Sensitivity of marine fish thermal habitat models to fishery data sources

Abstract Statistical models built using different data sources and methods can exhibit conflicting patterns. We used the northern stock of black sea bass (Centropristis striata) as a case study to assess the impacts of using different fisheries data sources and laboratory‐derived physiological metrics in the development of thermal habitat models for marine fishes. We constructed thermal habitat models using generalized additive models (GAMs) based on various fisheries datasets as input, including the NOAA Northeast Fisheries Science Center (NEFSC) bottom trawl surveys, various inshore fisheries‐independent trawl surveys (state waters), NEFSC fisheries‐dependent observer data, and laboratory‐based physiological metrics. We compared each model's GAM response curve and coupled them to historical ocean conditions in the U.S. Northeast Shelf using bias‐corrected ocean temperature output from a regional ocean model. Thermal habitat models based on shelf‐wide data (NEFSC fisheries‐dependent observer data and fisheries‐independent spring and fall surveys) explained the most variation in black sea bass presence/absence data at ~15% deviance explained. Models based on a narrower range of sampled thermal habitat from inshore survey data in the Northeast Area Monitoring and Assessment Program (NEAMAP) and the geographically isolated Long Island Sound data performed poorly. All models had similar lower thermal limits around 8.5℃, but thermal optima, when present, ranged from 16.7 to 24.8℃. The GAMs could reliably predict habitat from years excluded from model training, but due to strong seasonal temperature fluctuations in the region, could not be used to predict habitat in seasons excluded from training. We conclude that survey data source can greatly impact development and interpretation of thermal habitat models for marine fishes. We suggest that model development be based on data sources that sample the widest range of ocean temperature and physical habitat throughout multiple seasons when possible, and encourage thorough consideration of how data gaps may influence model uncertainty.


| INTRODUC TI ON
Defining marine habitat indicators is challenging due to the complex interaction between marine species and environmental variability, but various methodologies such as process-based laboratory experiments (Slesinger et al., 2019) and empirical field studies (Cullen & Guida, 2021;Kleisner et al., 2017;McHenry et al., 2019) have been developed to determine the relationships between marine ectotherms and the environment. Such relationships can be developed because metabolic processes in ectotherms are tightly coupled to environmental temperatures (Clarke & Johnston, 1999;Verberk et al., 2016) and they are more vulnerable to warming temperatures than their terrestrial counterparts (Pinsky et al., 2019). Understanding how living marine resources respond to changes in environmental conditions is useful for management and policy decisions in a rapidly changing climate, as well as anticipating potential species shifts and introductions/evacuations in a local ecosystem. A common response to ocean warming includes rapid distribution shifts, usually poleward, for many marine species (Kleisner et al., 2016;Nye et al., 2009), which can lead to management conflicts and changes in species compositions in newly occupied regions .
While laboratory and empirical studies have proven to be useful in elucidating these range shifts, they are also subject to many limitations. Laboratory studies are conducted in controlled environments to promote replication under optimal conditions whereby the fish are fed ad libitum, predators are absent, and interactive effects from other stressors are removed, sometimes resulting in skewed thermal optima (Jutfelt et al., 2018;Slesinger et al., 2019). Environmental response models for a single species developed in empirical field studies, on the other hand, can look quite different from one another depending on the specific type of model and training datasets used, leaving many models vulnerable to misinterpretation or bias (Bahn & McGill, 2013). These methodological issues are concerning because one of the overarching goals in habitat model development is to provide a tool to predict future distributions under various climate change scenarios. While they are incredibly useful and effective tools, the limitations are often glossed over rather than used to inform key areas of uncertainty.
The Mid-Atlantic Bight (MAB), located within the U.S. Northeast Shelf, provides a natural laboratory to investigate the impacts of fishery data source on thermal habitat models because this region has exceptionally high seasonal temperature variability and substantial long-term ocean warming , and offers long-term datasets of fishery-independent and dependent data to build thermal habitat models for many marine taxa. The MAB spans from Cape Hatteras, North Carolina to Cape Cod, Massachusetts, with a broad shelf that transitions to a steep slope ~150 km from shore. Ectotherms that inhabit the MAB have distributions and life histories coupled to ocean temperature and thus many species undergo seasonal migrations in which they move between nearshore habitat during the summer, often for spawning, and southern or warmer offshore shelfslope habitats during cooler months of the year (e.g., butterfish; Cross et al., 1999; black sea bass: Drohan et al., 2007; summer flounder: Packer et al., 1999;bluefish: Shepherd & Packer, 2006;scup: Steimle et al., 1999). On longer timescales, several marine species have already shifted their distribution poleward and/or into deeper waters (Kleisner et al., 2016;Nye et al., 2009) and are projected to continue to shift under continued ocean warming (Kleisner et al., 2017;McHenry et al., 2019;Morley et al., 2018). If we hope to understand short-and long-term species distribution shifts in such a highly dynamic region, it is important to accurately interpret the relationships between those species distributions and the surrounding environment.
Within the MAB, black sea bass (Centropristis striata) are an ideal study species to test the effects of varying data sources on model shape used for the development of thermal habitat models due to the abundance of existing data and research on black sea bass.
Black sea bass are an abundant demersal finfish often found near rock and artificial reefs (Drohan et al., 2007) that support lucrative recreational and commercial fisheries in the MAB (NEFSC, 2017).
The U.S. Northern stock of black sea bass occupies waters north of Cape Hatteras, NC (Roy et al., 2012), inhabiting shallow inshore water during the summer and migrating offshore during the late fall to overwinter at the shelf-slope front (Moser & Shepherd, 2008).
Like many species, black sea bass have shifted their distribution northward along the U.S. northeast coast as ocean temperature has warmed over the last few decades (Bell et al., 2015;Kleisner et al., 2016). Multiple black sea bass thermal habitat models have been developed using fisheries data paired with in situ and modeled environmental parameters. For example, Kleisner et al. (2017) and McHenry et al. (2019) each developed GAMs by associating presence-absence data with several environmental covariates.
While similar, the two models used different data sources as input to their respective models, resulting in some key differences in the shape of the response curves of the GAMs. Differences in model output describing the relationships of bottom temperature to black sea bass habitat can have significant implications for both our current interpretation of temperature as it applies to fish distribution (e.g., optimal temperatures) and for predicting future thermal habitat under varying climate scenarios. While differences between various models are not inherently a problem, examining the reasons behind those differences can improve our understanding of the models themselves and can include where and why there is more (or less) uncertainty in each model, and what types of applications different models may be most appropriately suited to rather than opting for a "one size fits all" approach. Altogether, this analysis could provide a framework for using these models to predict thermal habitat.
Here we show that thermal response curves developed using GAMs vary by model type and data source. Specifically, we assessed field-based measurements from fishery-independent and fisherydependent data sources as they relate to measured and modeled bottom temperature, and in situ laboratory-derived thermal optima from physiological studies (Slesinger et al., 2019). We limited our models to habitat based on bottom temperature because it allowed us to directly compare empirical models derived from survey data to recent process-based laboratory data that only measured the effects of temperature, and because bottom temperature has a well-established physiological relationship with fitness for many demersal species, including black sea bass (Bicego et al., 2007;Clarke & Johnston, 1999). As a standard ocean variable resolved in many global and regional hydrodynamic hindcasts, nowcasts, and longterm climate projections, habitat models based solely on temperature can be coupled to a wide variety of hydrodynamic simulations. Our multi-model analysis provides new information on uncertainty, interpretation, and application of habitat models that is relevant to many marine species. Following Kleisner et al. (2017) and McHenry et al. (2019), we associated black sea bass presence-absence data with environmental covariates, which we limited to bottom temperature in order to better compare models developed from different training data sources.

| ME THODS
We used the "mgcv" package in R version 3.6.1 (Wood, 2017) to develop presence-absence GAMs with a binomial distribution from five fisheries data sources paired with both measured and modeled bottom temperatures in order to determine whether GAM models developed using modeled temperatures were comparably effective to those developed with measured temperatures. The shape and performance of the curves were compared to each other and to the experimental physiological response curve from Slesinger et al. (2019) to assess compatibility between model results obtained via statistical models versus in situ laboratory data.

| Fisheries data sources
We used data from four fisheries-independent bottom trawl sources (Sections 2.1.1-2.1.4): a shelf-wide federal bottom trawl survey, a shallow inshore federal bottom trawl survey spanning the entire MAB coast, and inshore state government bottom trawl surveys by New Jersey and Connecticut. All fisheries-independent surveys followed a stratified random sampling design. Fisheries-dependent observer data from a regional observer program was used as well (Section 2.1.5). Where available, data from 1985 through 2015 were used for this analysis (Table 1). We started with 1985 because it was the earliest year included in the decadal climatologies used for bias-correcting that had full overlap with the hydrodynamic model used, and all surveys except one have data going back to 1989 or earlier, and thus, long-term temporal coverage was similar. were observed between 5 and 15℃ with salinity between 32 and 36 PSU; fall samples had a similar salinity range but a wider temperature range between ~5 and 25℃. More details on the survey design can be found in Azarovitz (1981). The gear on the research vessel being used since 2009 has higher catchability compared to the gear used before 2009, particularly for smaller fish (Miller et al., 2010),

| Ocean bottom temperature sources
Fisheries data samples were paired with bottom temperature measured in situ with each trawl (where available) and bias-corrected modeled bottom temperature (described below). Each of the four fisheries-independent surveys collected oceanographic data for each survey sample by lowering a temperature probe to near-bottom immediately before or after each tow was performed, so both in situ measured and bias-corrected modeled bottom temperatures were available for trawls from these surveys. There were no in situ temperature data available from the commercial fishery observer program, and thus, presence-absence data were matched only to bias-corrected modeled bottom.
We paired each tow from all surveys with bias-corrected monthly averaged bottom temperatures from a Regional Ocean Modeling System (ROMS) non-data-assimilative hindcast simulation span- Mission (SRTM) database (Farr et al., 2007), initial ocean boundary conditions from reanalysis data of Simple Ocean Data Assimilation (SODA 3.3.1) (Carton et al., 2018), and atmospheric forcing from the Drakkar forcing set (DFS 5.2) (Dussin et al., 2016). There was a persistent bottom temperature bias of around 2℃ across the shelf ( Figure S1), which we corrected using a regional climatology.
The NOAA Northwest Atlantic climatology (Seidov et al., 2016) at a monthly temporal resolution and 0.1° spatial resolution was used to bias-correct ROMS bottom temperatures. Several decadal climatologies, calculated using all data that was available in the World Ocean Database in 2013, are available, including three that overlapped with our time period : 1985-1994, 1995-2004, and 2005-2012

| Model development and comparison
We used the generalized additive models (GAM) function from the R package mgcv 1.8-28 (Wood, 2017)  The survey iterations were randomized as described for the initial Monte-Carlo described above, and the laboratory GAM iterations were randomized by selecting 9 samples of each temperature record (from originally 10-12 samples for each of five temperatures between 12 and 30℃) and then compared to testing data from the three surveys well-described by temperature (NJDEP, NEFSC, and observer). Presence-absence survey GAMs were all asymptotic near zero for low temperatures, while aerobic scope was on a much larger scale and required scaling from 0 to 1 over the minimum to maximum range of response from 0 to 30℃ in order to be comparable. This scaled response value was calculated for each trawl in each of the 100 iterations and then converted to a habitat quality ranking between 1 (poorest quality) and 10 (highest quality) depending on this scaled response value (i.e., scaled GAM response between 0.0-0.1 was given a quality ranking of 1, scaled responses 0.1-0.2 a quality ranking of 2,… scaled response 0.9-1.0 a quality ranking of 10).
For this version of the cross-validation, the proportion of trawls that caught black sea bass in each of the ten categories was calculated and pattern of the proportion of trawls with black sea bass versus habitat quality ranking was used to assess model performance. Better model performance was indicated by increasing black sea bass catch with habitat quality ranking, regardless of the specific proportion of tows containing black sea bass.

| Fisheries data coverage
Each of the data sources considered in this study provided thousands of samples within the MAB (Table 1 Island Sound survey covered a wide range of temperatures as well ( Figure S4b-d), but data from this survey were similarly limited in monthly temporal coverage and almost completely surrounded by land and therefore isolated from the rest of the shelf ( Figure S4a).
Of the five fisheries data sources, only the observer program provided consistent year-round coverage. Sampling was concentrated in key fishing areas such as the canyons and the shelf break front ( Figure S5a) but was relatively consistent throughout the year. Black sea bass caught in the surveys generally followed expected seasonal migration patterns, mainly inshore during the summer and transitioning into deeper water in the fall until the winter and spring when most of the catch was at the shelf break ( Figure S5b).

| Black sea bass GAM thermal response curves
Only GAMs based on NEFSC trawl survey and observer data exhibited a clear peak in thermal response, and both were equally welldescribed with approximately 15% deviance explained. While the patterns varied slightly, they were generally the same with a positive response between approximately 8.5℃ and 25℃. The thermal optimum (T opt ), the temperature at which the GAM peaked, was slightly warmer for the NEFSC data than it was for the observer data (18℃ vs. 16.7℃ ROMS bottom temperature). The NEFSC data resulted in a slightly bimodal GAM, with a weak and narrow spring peak in cool temperatures as well as a wider and stronger peak in warmer fall waters, as opposed to the smoother curve created by the year-round observer data (Table 2, Figure 2). This is similar to the partial GAM including only bottom temperature from the McHenry et al. (2019) model that also used fisheries-dependent observer data ( Figure S6b).
The GAM response derived from NJDEP data increased toward higher temperatures before plateauing without reaching a natural peak or high-temperature decrease. Despite the difference in shape, the lower thermal limit from this curve is remarkably similar to the lower limit in the NEFSC and observer data, at 7.7℃ for in situ bottom temperature and 8.7℃ for ROMS bottom temperature (   Figure S6a). The GAM based on laboratory data was somewhere between the two curves, with a gradual rise at low temperatures and minimal decrease at high temperatures.
A peak in the laboratory-based data existed but was not as strong as the other peaks and was a few degrees warmer than survey-based GAMs at 24.8℃ ( Figure S6c).
The remaining two fisheries data sources both performed poorly (<5% deviance explained) with erratic response curves (Table 2, Figure 2). For the NEAMAP-based GAM, this was likely due to the narrow range of temperatures sampled ( Figure S3), while for the CTDEEP-based GAM it was more likely attributed to the isolated nature of Long Island Sound ( Figure S4). Due to the poor performance of these models, they were excluded from the cross-validation.

| Cross-validation
Because the GAMs based on NEFSC and NJDEP survey data were similar when fitted to in situ bottom temperature and ROMS bottom temperature (Figure 2), and no in situ data were available for the observer dataset, cross-validations were only performed using ROMSbased models and temperatures. The scaled predictive response curves for each survey based on the full dataset that were used for the habitat quality ranking cross-validation are shown in Figure 3.
The NEFSC survey-and observer-based GAMs each performed well in both the random and time-grouped cross-validations, with nearly every iteration having an AUC value over 0.75. The NJDEPbased GAM also performed relatively well, though not as well as the shelf-wide models, with the majority of iterations exceeding an AUC of 0.65 (Figure 4). Cross-validation for predictions based on years not included in model training were comparable to random crossvalidation, but with a wider range in predictive ability (Figure 4).
Shelf-wide NEFSC and observer survey-based GAMs were the most skillful, followed by the NJDEP-based GAM ( Figure 5). None of the GAMs were successful at predicting presence-absence in the NEAMAP or CTDEEP data.
The three surveys (NEFSC, Observer, NJDEP) had an increasing proportion of tows containing black sea bass with increasing habitat quality ranking, further indicating good model performance ( Figure 6). The laboratory cross-validation did not perform as well as the surveys. There was a small uptick at the lowest habitat quality ranking, which was likely due to the very gradual slope of the curve as well as a small number of black sea bass that were observed in very cold water (<5℃, scaled response between 0 and F I G U R E 2 GAM log-odds response ±1 SE for data from each survey as a response to in situ measured bottom temperature (a) and monthly averaged bias-corrected ROMS bottom temperature (b)  Figure 3) in the observer dataset included in testing for the laboratory-based GAM. It should also be noted that 12℃ was the coldest temperature that black sea bass metabolic rates were measured for, so inferences colder than 12℃ were based on the model generated from warmer temperatures. Excluding the minor deviation at the lowest habitat quality ranking, black sea bass presence increased steadily with habitat quality ranking for the laboratory-derived GAM, until reaching the highest two levels where black sea bass became less prevalent. The laboratory-derived thermal optimum, by definition the highest habitat quality ranking, was several degrees warmer than seen in any of the survey GAMs (Figures 3,6) or even frequently sampled by any of the surveys (Figures S2-S5), so the highest quality rankings for that data were shifted into temperatures black sea bass were rarely observed in the field.

| D ISCUSS I ON
Our results demonstrate that differences in survey designs and the time of year that data are collected can strongly affect final model results, and similarities and differences between models can guide uncertainty estimates and model interpretations for various applications. Specifically in our data, varying the data source used as input to a thermal habitat model yielded different model output (e.g., T opt range 17-25℃) but led to similar lower positive temperature response bounds (~8℃). Similarities between models can improve our confidence in their use, while differences can inform uncertainty estimates. The GAM-derived thermal habitat models had skill at predicting years not included in model training, with only a few exceptions from data sources that performed poorly overall. However, due to the intense seasonal variability in the MAB, the models that were based on fishery data that did not cover all four seasons as well as a large spatial portion of the region, both inshore and offshore waters, are unlikely to be reliable if applied to seasons not included in the training dataset. The observer data source was the only dataset that included black sea bass presence-absence data for all 12 months of the year. However, as a fisheries-dependent data source, it is more appropriate for species distribution analyses such as our research than it would be for biomass and abundance studies that require fisheries-independent data such as the NEFSC bottom trawl survey. F I G U R E 4 Boxplot of AUC cross-validations metrics for the three well-performing surveys (NJDEP, observer, and NEFSC). AUC values near 0.5 indicate models no better than random prediction, and closer to 1 indicating highest performance. Blue boxplots show the range of AUC values for the random cross-validations, compared to the grouped cross-validation with consecutive groups of years removed for testing shown in green

F I G U R E 6
Habitat quality ranking cross-validation for the three surveys with well-performing GAMs (NEFSC, NJDEP, and observer) and the laboratory-based GAM, as shown in Figure 3. Good performance is indicated by increasing percentage of tows containing black sea bass with increasing quality ranking, regardless of the total scale of the percentage G Quality Ranking % Tows with Black Sea Bass  Figure 3 projected onto those temperature fields for the NEFSC-based GAM (c, spring; d, fall), the observer-based GAM (e, spring; f, fall), and the laboratorybased GAM (g, spring; h, fall) physiological effects of those features are not well-established (Helmuth et al., 2005). Furthermore, our study only included juveniles and adults that are capable of migrating out of suboptimal water temperatures. The effects of temperature and other environmental variables on early life history stages that have more restrictive thermal constraints must be considered to achieve a more comprehensive understanding of a species ecosystem response, as well as inform possible time lags in large-scale thermal responses (Berlinsky et al., 2004;Cowen et al., 1993;Houde, 1989).
Despite variability in the steepness of the curves and the exact location of the optimal temperature, the two shelf-wide thermal response curves, one based on fisheries-dependent data and the other based on fisheries-independent data, were very similar with a gradual increase in habitat quality with temperatures up to about 18℃ followed by a rapid decrease in quality with temperatures above that. The only skillful GAM based on inshore data (NJDEP) plateaued and failed to exhibit a decrease in thermal habitat quality at high temperatures. This GAM did not perform as well as the shelf-wide surveys even in the within-survey cross-validation and also had higher skill at predicting the shelf-wide catch data than it did predicting its own data. This suggests that bottom temperature alone is a more effective predictor shelf-wide than it is in shallow inshore waters where other features such as productivity, bottom topography, terrestrial and riverine inputs, competition, spawning dynamics, and predatorprey interactions may play a larger role (Cowen et al., 1993;Del Vecchio & Blough, 2004;Vodacek et al., 1997). Cooler temperatures may be more deterministic to predicting distribution limitations than warmer temperatures for the US Northern stock of black sea bass population, which inhabits the most northern latitude for the species, an observation supported by several previous studies (Miller et al., 2016;Sullivan & Tomasso, 2011;Younes et al., 2020)  by Atwood et al., 2003). Under laboratory conditions, fish were fed ad libitum and relieved of energy expenditure related to finding food and evading predators. Therefore, the warmer thermal optimum observed in the laboratory could be a function of increased available energetic reserves allowing higher tolerance to warmer temperatures (Clark et al., 2013). Additionally, the use of aerobic scope as an indicator to determine optimal temperatures has recently been critiqued (see Jutfelt et al., 2018). The laboratory-derived thermal optimum should therefore be assumed to be near a maximum thermal tolerance limit, should other limiting factors in the environment (e.g., food, reproductive costs) be minimal. There may be potential flexibility for fish to acclimatize to a higher thermal optimum as ocean warming continues, especially given that the black sea bass U.S. Southern stock extends into warmer temperatures further south and into the Gulf of Mexico (Drohan et al., 2007;McCartney et al., 2013). However, the perceived lower thermal limit is likely to remain a habitat distribution limit. Additionally, physiological data encapsulates a different metric of fitness than presence-absence data, such that presence-absence data may best reflect true shortterm in situ oceanographic limits whereas the laboratory data reflects longer-term effects on fitness (e.g., reduced reproduction) that downstream may affect the black sea bass population. Restricting GAMs to variables that are represented in hydrodynamic models (e.g., bottom temperature) allows for development of coupled habitat models that can have many applications, and the reliability of hydrodynamic models as they apply to niche models is supported by the similar response of in situ versus modeled temperature in this study. Thermal habitat models for various species have been coupled to ocean temperature hindcasts (Kleisner et al., 2016), to nowcasts and forecasts, and to long-term climate change projections in order to estimate future habitat availability (Crear et al., 2020;Kleisner et al., 2017;McHenry et al., 2019;Tanaka et al., 2020). We coupled three of our GAMs (NEFSC survey-based, observer-based, and laboratory-based) to seasonally averaged hindcast bottom temperature and the resulting spatial maps show that while the intensity of the spatial gradients varies, patterns of habitat quality are very similar. During the spring, habitat quality is weak across the entire shelf for all three models except for a narrow stretch of favorable habitat along the shelf break and at the southern boundary of the bight, while during the fall they each show favorable habitat across the majority of the MAB shelf and Georges Bank (Figure 7). The laboratory-based GAM shows a much less intense spatial gradient, reflecting the gradual increase in habitat quality with temperature in cool water due to the lack of metabolic rate measurements at temperatures <12℃. Despite the strong similarities between the two shelf-wide GAM response patterns, the observer-based GAM predicted stronger habitat quality than predicted by the NEFSC survey-based GAM. Thus, choosing an inappropriate model could be effective at reproducing the general spatial pattern of habitat, but may miss the strength of the spatial gradients in habitat quality or certain key patches of high-or low-quality habitat. The distribution of the northern stock of black sea bass has been shifting to the north (Kleisner et al., 2016) and projections suggest that this northern shift will persist under continued ocean warming (Kleisner et al., 2017;McHenry et al., 2019), but spatial habitat modeling research must consider the uncertainty caused by fishery data used to train models particularly as it applies to habitat coupled to climate projections or other scenarios where the environment is very different from the data the habitat model were initially trained with.
GAMs are frequently used to determine the relationship between marine species and the ocean environment. However, these models often yield overfitted results that can be easily misinterpreted. To avoid this, we suggest the following: (1) consider that data source can affect the shape of the model's curve and thus the ultimate intended application of the GAM (e.g., estuary, nearshore, offshore habitat modeling); (2) GAMs coupled to hydrodynamic models should only be applied to times of year similar to those that were actually sampled by the model training data sources (e.g., applying a GAM to seasons not included in the training of the model will need to be interpreted with extreme caution); (3) an ensemble of habitat model types (e.g., GAMs, linear models, random forest, machine learning) can also be used to address single model limitations such as GAM overfitting (Tanaka et al., 2020); and (4) thermal response variability between models developed from different data sources and modeling methods can be used to inform uncertainty levels in thermal habitat predictions.

ACK N OWLED G M ENTS
We thank the Northeast Fisheries Science Center, the National

CO N FLI C T O F I NTE R E S T
None.

DATA AVA I L A B I L I T Y S TAT E M E N T
Northeast Fisheries Science Center spring and fall bottom trawl survey data were downloaded from https://ocean adapt.rutge rs.edu/.
All other data were obtained via email requests to the various surveys included in this analysis.