Improved prediction of Canada lynx distribution through regional model transferability and data efficiency

Abstract The application of species distribution models (SDMs) to areas outside of where a model was created allows informed decisions across large spatial scales, yet transferability remains a challenge in ecological modeling. We examined how regional variation in animal‐environment relationships influenced model transferability for Canada lynx (Lynx canadensis), with an additional conservation aim of modeling lynx habitat across the northwestern United States. Simultaneously, we explored the effect of sample size from GPS data on SDM model performance and transferability. We used data from three geographically distinct Canada lynx populations in Washington (n = 17 individuals), Montana (n = 66), and Wyoming (n = 10) from 1996 to 2015. We assessed regional variation in lynx‐environment relationships between these three populations using principal components analysis (PCA). We used ensemble modeling to develop SDMs for each population and all populations combined and assessed model prediction and transferability for each model scenario using withheld data and an extensive independent dataset (n = 650). Finally, we examined GPS data efficiency by testing models created with sample sizes of 5%–100% of the original datasets. PCA results indicated some differences in environmental characteristics between populations; models created from individual populations showed differential transferability based on the populations' similarity in PCA space. Despite population differences, a single model created from all populations performed as well, or better, than each individual population. Model performance was mostly insensitive to GPS sample size, with a plateau in predictive ability reached at ~30% of the total GPS dataset when initial sample size was large. Based on these results, we generated well‐validated spatial predictions of Canada lynx distribution across a large portion of the species' southern range, with precipitation and temperature the primary environmental predictors in the model. We also demonstrated substantial redundancy in our large GPS dataset, with predictive performance insensitive to sample sizes above 30% of the original.


Abstract
The application of species distribution models (SDMs) to areas outside of where a model was created allows informed decisions across large spatial scales, yet transferability remains a challenge in ecological modeling. We examined how regional variation in animal-environment relationships influenced model transferability for Canada lynx (Lynx canadensis), with an additional conservation aim of modeling lynx habitat across the northwestern United States. Simultaneously, we explored the effect of sample size from GPS data on SDM model performance and transferability. We used data from three geographically distinct Canada lynx populations in Washington (n = 17 individuals), Montana (n = 66), and Wyoming (n = 10) from 1996 to 2015. We assessed regional variation in lynx-environment relationships between these three populations using principal components analysis (PCA). We used ensemble modeling to develop SDMs for each population and all populations combined and assessed model prediction and transferability for each model scenario using withheld data and an extensive independent dataset (n = 650). Finally, we examined GPS data efficiency by testing models created with sample sizes of 5%-100% of the original datasets. PCA results indicated some differences in environmental characteristics between populations; models created from individual populations showed differential transferability based on the populations' similarity in PCA space. Despite population differences, a single model created from all populations performed as well, or better, than each individual population. Model performance was mostly insensitive to GPS sample size, with a plateau in predictive ability reached at ~30% of the total GPS dataset when initial sample size was large. Based on these results, we generated well-validated spatial predictions of Canada lynx distribution across a large portion of the species' southern range, with precipitation and temperature the primary environmental predictors in the model. We also demonstrated substantial redundancy in

| INTRODUC TI ON
Species distribution models (SDMs), which compare environmental conditions at presence and background locations and calculate a relative probability of habitat suitability (Elith & Leathwick, 2009), are a useful tool to better understand the distribution of a species' habitat across landscapes (Elith & Leathwick, 2009;Guisan & Thuiller, 2005). These models can provide both an understanding of the specific environmental components that might define a species' habitat as well as generate spatial predictions of distribution at a landscape scale (Elith & Leathwick, 2009). Species distribution models have been used extensively to create maps of predicted habitat (Derville et al., 2018;Gantchoff et al., 2019), evaluate threats from climate change or increased anthropogenic disturbance (Diniz-Filho et al., 2009;Requena-Mullor et al., 2019), or consider habitat corridors and connectivity (Zeller et al., 2018). Accurate SDMs are particularly important for landscape-scale conservation planning given the large-scale changes associated with climate (Park Williams et al., 2013), anthropogenic alterations (Curtis et al., 2018), habitat loss and fragmentation (Sala et al., 2000), wildfire (Hansen et al., 2010), and insect outbreaks (Kurz et al., 2008). However, one of the limitations faced by SDMs, and indeed all ecological models, is uncertainty about their transferability when applied to novel conditions (Lonergan, 2014;Yates et al., 2018).
When SDMs are implemented across a species' range, they assume a uniform response to the variety of environmental conditions encountered. However, SDMs often encompass multiple, geographically distinct populations which may vary in their responses to local conditions (Barbosa et al., 2009;Habibzadeh et al., 2019;Valladares et al., 2014). Differentiation between individual populations may generate poor model performance outside the model training area, producing erroneous conclusions if that model is applied to other areas. The importance that regional variability plays in SDMs has been demonstrated frequently in plants (O'Neill et al., 2008;Valladares et al., 2014), amphibians (Davies et al., 2019), birds (Habibzadeh et al., 2019), and mammals (Barbosa et al., 2009).
Regional variation in intraspecific habitat relationships has been attributed to multiple biological processes, including local adaptation through genetic differentiation (Peterson et al., 2019), biotic interactions (Wisz et al., 2013), or functional responses to differences in habitat availability (Vanreusel et al., 2007). By understanding differences in environmental relationships associated with individual populations, we can improve the development of SDMs, generating improved model predictability and transferability (O'Neill et al., 2008;Vanreusel et al., 2007).
Additionally, SDMs may not generalize geographically because of model over-fitting, whereby model predictive ability is high in areas where data were collected, but low in areas outside those conditions (Wenger & Olden, 2012). Complex models with excessive environmental covariates, for instance, may result in models which are less generalizable to novel areas (Yates et al., 2018). Similarly, models with large amounts of localized data may not generalize to other landscapes because of the specificity of the species-environment relationships characterized (Boria & Blois, 2018;Wenger & Olden, 2012). While the impact of sample size on SDMs has been extensively considered, the general concern has been with too little data, rather than too much (Hernandez et al., 2006;Stockwell & Peterson, 2002). However, the recent availability of extensive Global Positioning System (GPS) datasets presents a novel challenge to conventional SDMs as there is little consensus regarding how to treat the large volume of animal relocations (Gantchoff et al., 2019;Li et al., 2017;Magg et al., 2016;Maiorano et al., 2015;Rice et al., 2013;Shoemaker et al., 2018) which may create redundant or spatially correlated nonindependent information with respect to species distributions, particularly if few animals are sampled. Yet, GPS data provide high spatial accuracy, reduced sampling bias, and less species misidentification; all these issues plague the opportunistic sampling schemes common in SDM literature (Aubry et al., 2017;Newbold, 2010). The challenge of modeling distributions of species with large GPS datasets has received little attention (but see Boria & Blois, 2018), but given the availability and benefits of extensive GPS data, an evaluation of the trade-offs between sampling efficiency and SDM performance is needed.
Our study goals are twofold: (a) evaluate SDM generalizability to model the distribution of Canada lynx (Lynx canadensis; hereafter lynx), a federally listed specialist forest carnivore in the contiguous United States, and (b) develop a process to assess GPS data efficiency with respect to SDM predictability and transferability. Lynx rely almost entirely on snowshoe hares (Lepus americanus) as a food source (Aubry et al., 2000;Squires & Ruggiero, 2007), and thus are closely tied to boreal forests with high horizontal vegetation cover (Holbrook et al., 2017;Squires et al., 2010). Lynx are an excellent species to assess geographic generalizability of SDMs across populations, because we expect habitat specificity and selection for a narrow range of environmental conditions to result in less intraspecific variation and more habitat generalizability compared to generalist species (Bonthoux et al., 2017;Yates et al., 2018). We used data from three geographically distinct populations at the species' southern range periphery in Washington, Montana, and Wyoming, USA. Our conservation aim was to model the distribution of habitat capable our large GPS dataset, with predictive performance insensitive to sample sizes above 30% of the original. of supporting lynx across the northwestern United States, including areas outside known populations. To inform predictions of SDM generalizability among lynx populations, we first evaluated regional variation in lynx-environment relationships between populations.
We hypothesized that, if regional variation was present, models built on individual populations would perform best for the training population but be less transferable outside that population. We suspected a combined model (using all populations) might perform more poorly on any single population but have higher overall performance across the entire region. We assessed model performance using withheld data as well as an independently collected dataset. To evaluate the efficiency of GPS data in SDMs, we compared model performance and transferability across a range of sample sizes to determine optimal sample size for SDMs when using GPS datasets.

| Study areas
Our study area covered a large region in the northwestern United States, including parts of Washington (WA), Idaho (ID), Montana (MT), and Wyoming (WY), as well as the area directly to the north, including parts of British Columbia and Alberta, Canada ( Figure 1). We bounded the study area using the level II ecoregion "western cordillera," which is primarily forested mountains with limited grasslands or other open areas (Omernik & Griffith, 2014). Within our study area were three monitored lynx populations: one in north-central Washington and into Canada, one in western Montana, and one in northwest Wyoming (Figure 1). These populations are discrete and, though genetic data indicates that north-south movement renders the contiguous United States and Canada populations panmictic (Schwartz et al., 2002), telemetry data from marked individuals exhibit no east-west dispersal between populations. Pairwise distances between population centroids were approximately 400 km, 600 km, and 1,000 km for Washington and Montana, Montana and Wyoming, and Wyoming and Washington, respectively. General environmental conditions averaged at lynx locations within each geographic area are given in Table 1; we calculated elevation from a digital elevation model (DEM; U.S. Geological Survey, National Elevation Dataset), and mean annual precipitation, mean annual temperature, and mean snow depth on April 1 from Wang et al. (2016).

| Occurrence data
We used GPS data from radio-collared lynx. Data

| GPS data efficiency
To explore the impact of sample size on model performance and determine the optimum sample size of GPS locations for model calibration, we performed a sensitivity analysis of predictive performance across varying sample sizes. From the original dataset (WA n = 7,476, MT n = 22,510, WY n = 670), we randomly sampled a percentage of each population (MT, WA, or WY) from 5% to 100% of the original sample size in increments of 5%. For each sample size, we selected an equal number of background locations within the extent of each population and fit the same ensemble model including 11 topographic and climate variables and six modeling algorithms, and evaluated models using withheld and independent datasets (see below for full modeling and validation details). We compared model performance using AUC (Marmion et al., 2009) to assess model predictive ability as well as transferability across sample sizes. We used the outcome from the sample size simulation to determine optimum trade-off between model performance and data parsimony, with the assumption that the sample size reached before a drop in performance had little to no data redundancy or spatial correlation, and adopted this sample size (WA n = 2,243, MT n = 6,753, WY n = 540) for each GPS dataset for the remainder of our analyses.

| Regional variation between populations
To explore the hypothesis that regional variation was present between populations, we performed a principal component analysis (PCA; Hällfors et al., 2016). If regional variation was present, we expected to observe distinct clustering of the three populations within the PCA dimensions. We used all 16 covariates from our models and ran the PCA on the lynx locations from the dataset used in the SDM modeling process using the "PCA" function from the R package "FactoMineR" (Lê et al., 2008). We plotted lynx locations with 95% confidence intervals of clustering on the first two dimensions of the PCA to visualize grouping of the populations. We used the correlation between individual covariates and the first two principal components to inform which covariates were contributing most to each component. This allowed us to identify the environmental gestalt associated with each population. We hypothesized that populations similar in principal component space would be more transferable to each other than populations farther away, regardless of geographic distance, because of environmental similarity.

| SDM development
We constructed separate SDMs for each individual population and a regional model with all combined populations. Since one of our TA B L E 1 Mean and range of environmental conditions averaged across Canada lynx locations at each of the three distinct populations used to make species distribution models across the northwestern United States modeling goals was to explore the effects of data efficiency given the use of large GPS datasets, we considered three sample size scenarios for models from the entire region: unequal sample sizes from each region ("Unequal," based on initial size of each population dataset; WA n = 2,243, MT n = 6,753, WY n = 540), equal sample size where possible based on Washington ("WA Equal," MT and WA n = 2,243, WY n = 540), and equal sample size based on Wyoming ("WY Equal," all sample sizes reduced to equal WY sample size n = 540; Figure 2). Presence locations for reduced datasets were chosen randomly from the initial population dataset. Since SDMs are often sensitive to the extent and locations chosen as randomly distributed background data (Iturbide et al., 2018), we also considered two scenarios to explore the effect of background extent of individual population models on model prediction and transferability: background data from either the entire region or an area associated with only the local population (Figures 1 and 2). We split our combined regional study area into three population areas subjectively based on landscape features such as large rivers and nonforested spaces that we hypothesized would be difficult for lynx to cross ( Figure 1). This resulted in a total of 9 modeling scenarios ( Figure 2).
Background locations were initially sampled at approximately 1 point per 1.5 km 2 across the study area to ensure adequate coverage. We then subsampled from these points to create a background sample equal to the number of lynx GPS locations per population, depending on which scenario was being modeled. We used the "biomod2" package   . This resulted in 60 models per scenario, which were combined into a weighted average based on area under the curve (AUC) of the receiver operating characteristic (ROC), so that better-performing models contributed more to the final ensemble, with the threshold for inclusion greater or equal to the median AUC calculated from all 60 models. Ensemble modeling has demonstrated equal or superior predictive performance relative to single models (Hao et al., 2020;Marmion et al., 2009).

| SDM validation
We assessed model predictive performance using AUC (Fielding & Bell, 1997), the continuous Boyce index (Hirzel et al., 2006), and the minimal predicted area (MPA; Engler et al., 2004). The AUC considers model discriminatory ability at all possible thresholds; we used the partial-area ROC (Peterson et al., 2008), which uses the proportion of background area predicted as present, rather than absence locations, as the x-axis metric. This variation makes the AUC metric more applicable to SDMs, since the models are based on presence and background (rather than presence and absence) data. For background data, we again randomly sampled the entire study area at a density of 1 point per 10 km 2 to provide a spatially well-distributed sample. The continuous Boyce index quantifies the delineation of capable habitat using a Spearman rank correlation between the ratio of predicted to expected number of presence locations and mean habitat capability grouped into equal-area bins (Boyce et al., 2002;Hirzel et al., 2006). MPA uses a chosen threshold (in our case 90% of presence locations) applied to the prediction surface to determine extent of the area above this threshold; this evaluation provides a metric of model efficiency, illustrating the trade-off between correctly identifying presence locations while doing so with a minimum of predicted area. We used the R package "pROC" (Robin et al., 2011) to calculate AUC and "ecospat" (Di Cola et al., 2017)

| SDM mapping
To identify key conservation areas for sensitive species, like lynx, that occupy extensive ranges, we generated predictions from the top-performing SDM in both continuous and categorical formats.
Continuous predictions provide a detailed look at the relative habitat suitability of lynx across the study area, while a categorical map provides simplified predictions that may be more useful to managers responsible for conservation planning (Freeman & Moisen, 2008). For example, an important application for the lynx SDM developed here is to generate habitat predictions in areas between the three main populations. Therefore, we applied a threshold to our top-performing model chosen to include 90% of lynx GPS locations (composed of reproductive populations on home ranges) as "high" probability lynx habitat and a threshold chosen to include 85% of independent data as "medium" probability lynx habitat. The independent location data for lynx included incidental sightings of animals outside the range of core populations and therefore may represent a larger array of behaviors and thus of habitat use. We chose the 90% and 85% cutoff for high and medium lynx data, respectively, to maintain a high conservation standard with low acceptable error (here 15% or less) and using values consistent with data cut-offs for home-range delineation and various habitat thresholds in the literature (Börger et al., 2006;Freeman & Moisen, 2008). However, to acknowledge a range of potential thresholds for different conservation goals, we also considered two thresholds that bracketed these criteria, one of 95% lynx locations and 90% independent data, and a second of 85% lynx locations and 80% independent data.

| GPS data efficiency
We found model performance to be mostly insensitive to sample size. Models trained on 100% of GPS location data were less than 0.05 AUC (<5% gain) better than those trained on only 5% when

| Regional variation between populations
Counter to our expectations for a specialist species, some regional variation was present across the three populations of lynx as demonstrated through clustering in PCA space. Wyoming and Montana populations were the most differentiated, while Washington exhibited a combination of characteristics between Wyoming and Montana ( Figure 5). The PCA explained 33% of the variation in the first two axes, with PC1 dominated by precipitation-related covariates (summer and winter precipitation, relative humidity, and soil pH) and PC2 dominated by vegetation-related covariates (long-term NDVI, forest heterogeneity, and road density; Appendix C: Tables C1   and C2). The Wyoming population was grouped on the PCA axes based on less moisture, lower long-term NDVI, and more forest heterogeneity than the Montana population. Interestingly, Washington fell in between Montana and Wyoming along these axes, despite its relative isolation in geographic space (Figure 1).

| Lynx SDM performance
Consistent with the PCA results, individual lynx population models performed well in the area from which they were developed and were less transferable to other populations (Table 2). Based on F I G U R E 3 Performance of species distribution models, as measured by the area under the curve (AUC), for a range of sample sizes from 5% to 100% of the original Canada lynx GPS dataset. The first panel shows model performance when evaluated on data within the area that the model was trained on (Calibration Area). The second through fourth panels show the performance of models trained on a given population ("MT" = Montana, "WA" = Washington, "WY" = Wyoming) when transferred to the remaining populations. For example, "WA Transferability" shows models calibrated in Washington but tested on data from Montana and Wyoming F I G U R E 4 Performance of species distribution models, as measured by area under the curve (AUC), for a range of sample sizes from 5% to 100% of the original Canada lynx GPS dataset. This figure shows a close-up of the first panel from Figure 3, of model performance when evaluated on data within the area that the model was calibrated on. Model performance for each region ("MT" = Montana, "WA" = Washington, "WY" = Wyoming) improves steeply from 5% to approximately 30%, but plateaus thereafter model performance assessed on both withheld and independent data, the regional model that used 30% of Washington data and a Montana sample size to match ("WA Equal," Table 2) was the most predictive of lynx use locations across each population and the entire region combined (see Appendix D for validation results for continuous Boyce Index and MPA). Individual population models made from 30% of the data from each population were slightly more predictive for Montana (AUC = 0.981) and Washington (AUC = 0.959) than the regional model (MT AUC = 0.974, WA AUC = 0.954), but the "WA Equal" regional model performed best in the Wyoming population and across all three populations together (Table 2). Regional models from the three combined populations were consistent in performance when tested separately on each population and exhibited good predictive performance of withheld data (AUC > 0.90) in each population and good predictive performance of independent data (AUC > 0.80) in each population (Table 2). Spatial predictions from the "WA Equal" model matched well with our expectations of lynx habitat and demonstrated areas of high habitat probability in the areas with known reproductive lynx populations as well as smaller islands of probable habitat in areas between populations ( Figure 6). Covariates of greatest relative importance were primarily related to snow and precipitation, with mean temperature in the coldest month contributing the most to model predictions, and lesser contributions from snow water equivalent, precipitation in summer and winter, and long-term NDVI (Figure 7). For population-specific models, background extent (population versus region) had very little effect on model performance within the calibration area, but model transferability was better for models made with population-level backgrounds ( Table 2).

| SDM mapping
Our best-performing SDM generated predictions consistent with known lynx habitat use (Mckelvey, 2000), with Canada lynx patchily distributed in mountainous areas throughout the Pacific Northwest and the Greater Yellowstone Area (see Figure 8 for details).
Categorical predictions created by 90% and 85% threshold values when applied to the "WA Equal" model delineated the location of habitat most likely to be selected by lynx in a reproductive population ("high" probability habitat) and habitat that was less favorable but potentially still used by lynx ("moderate" probability habitat), particularly for connectivity or as part of a matrix with "high" and "low" probability habitat (Figure 8). We delineated 34,930 km 2 of "high" probability habitat and 125,580 km 2 of "moderate" probability habitat across the study area. By state, Montana had the largest area of "high" habitat, with 11,961 km 2 , followed by Washington (4,411 km 2 ), Idaho (2,497 km 2 ), and Wyoming (2,424 km 2 ). Differences in amount of area in each category were more pronounced with changes in the threshold generated from independent data, since this dataset included more variation in habitat use (Appendix E: Figures E1 and E2). We expected generalizability between individual lynx population models given the known habitat specificity of lynx but found that, while lynx exhibit narrow habitat selection (Holbrook et al., 2017;Squires et al., 2010), there was enough variation in local animal-environment relationships to limit transferability of any single population model to our entire inference area. Regional models built using data from all populations combined, however, performed strongly across the entire study area, generated predictions for areas that were outside the three main populations and thus lacked data, and performed comparably to individual population models. Our use of principal components analysis (PCA) to examine regional variation between populations revealed differences and similarities between populations, and thus provided informed predictions of model transferability. The use of GPS data in our work resulted in models with very high predictive accuracy, which was maintained above 0.90 AUC even when data were reduced to approximately 5% of their original sample size.

| D ISCUSS I ON
SDMs are often constructed with opportunistic data collected across large spatial extents or with intensive data collection across smaller extents (Aubry et al., 2017;Thuiller et al., 2006). Few studies have the resources required for extensive data collection at multiple locations across a large area (Bonthoux et al., 2017). However, we combined GPS data from multiple collaborators to directly assess regional differences in habitat selection across populations within a large spatial area. We believe that large-scale species distribution modeling will increasingly benefit from similar collaborative approaches for creating accurate, regional-scale suitability models for other species and regions, given the widespread prevalence of GPS monitoring of a range of species by academic, government, and nonprofit institutions.
We found that individual population models performed well for a given population but were less predictive when generalized across the region, consistent with the presence of regional varia-  Generalizability of SDMs is also predicted to be related to specificity in diet or habitat selection (Bonthoux et al., 2017;Yates et al., 2018), although this pattern appears to be born out in some  (Bonthoux et al., 2017). Specialists are generally predicted to select a narrower range of environmental conditions (Kassen, 2002;Peers et al., 2012), and thus are predicted to favor homogenous environments with resource use similar and transferable across populations. Canada lynx reliance on snowshoe hares as prey make them similarly reliant on the environmental conditions that favor hares (Ivan & Shenk, 2016;Squires et al., 2010). Previous works show that lynx select boreal forest environments with deep snow and high horizontal cover (Holbrook et al., 2017;Mowat et al., 2000;Squires et al., 2010), leading to predicted transferability of SDMs. Instead, models from each individual population had marginal fit when applied to geographic areas outside their training location.
One possible explanation is that lynx may use alternate prey when necessary; while their dependence on hares is well known, when hare abundance is low they may turn to alternative prey such as blue grouse (Dendragapus obscurus) or red squirrels (Tamiasciurus hudsonicus) (Ivan & Shenk, 2016), and thus differ somewhat in habitat use. Alternatively, while the populations sampled may vary in some environmental characteristics, they may be similar enough in features important to hares, such as high horizontal cover in mature forests (Squires et al., 2010), that lynx can find adequate food while still exhibiting habitat differentiation. The lynx population in Wyoming, for instance, is located in habitat that appears strikingly similar in forest structure and horizontal cover to lynx habitat in Montana (J. Squires, pers. com.). Additionally, the lynx in Wyoming that were monitored with Argos collars were partly comprised of individuals originally reintroduced from Canada to Colorado and had exhibited long-distance post-reintroduction movements (Devineau et al., 2010). These animals might therefore have been exhibiting atypical habitat selection, which may have included a less specialized pattern of selection, possibly also contributing to the low transferability of the Wyoming model.
Interestingly, despite differences in animal-environment relationships between populations, the regional model which included data from all populations performed well across the entire study area. Given the lack of generalizability demonstrated by the individual population models, we might expect that a SDM F I G U R E 7 Estimated variable importance of each covariate to the best-performing species distribution model. Variable importance was estimated by permuting each covariate in turn, generating predictions, and comparing predictions to those from the original, unpermuted model. If a covariate was important, predictions would be changed and the correlation between sets of predictions would be lower created from all populations would perform more poorly in any given population than a model created only on those data (Torres et al., 2015). Instead, the regional model performed better than the individual population model for Wyoming and was nearly indistinguishable in performance from population-level models for Washington and Montana. The strong performance of the regional model might be explained by the larger geographic range that it sampled. Sampling a larger portion of the range is more likely to encompass the fundamental niche of lynx, thus increasing the predictive performance of the model across the study area. In other F I G U R E 8 Categorical spatial predictions of Canada lynx relative habitat probability across the study region in the northwest United States, as generated by the top-performing species distribution model. Model thresholds are based on correctly assigning 90% of Canada lynx withheld GPS locations for the "High" category and 85% of independent lynx locations for the "Moderate" category. Background image sources ESRI, USGS, NOAA words, while any one population is unlikely to represent the totality of a species' geographic distribution, a sufficient sample of multiple populations throughout a larger portion of its range is capable of describing individual populations quite well. Qiao et al. (2018) showed that SDMs were more transferable when more of the fundamental niche was used for model training, resulting in less extrapolation between calibration and transfer regions. Here, the covariates that had the most effect on lynx habitat capability were primarily temperature and moisture related, with the top four variables all related to snow, precipitation, or cold temperatures, as well as NDVI, a measure of long-term forest presence or productivity. These results have conservation implications for the species' future at the southern range periphery under a changing climate, as temperature is likely to increase and snow to decrease if anthropogenic climate change continues unabated. Previous work has shown that warming trends are more severe in areas with mean annual temperatures in the range of 0°C to 5°C, due to a snow-ice feedback loop where loss of snow causes lowered surface albedo, which in turn further speeds warming (Pepin & Lundquist, 2008). Our study area had a mean annual temperature ranging from −1°C to 12°C (Table A1), suggesting that snow-ice feedback might influence warming patterns in lynx habitat, resulting in faster warming and decreased habitat suitability. King et al. (2020) found a similar susceptibility to changes in temperature and snow pack for the persistence of Canada lynx at their range periphery in Washington.
We found the amount of data provided by most GPS studies may greatly exceed what is necessary for peak SDM model performance and may be deleterious to model generalizability at some sizes, possibly reflected in the decreased transferability of our large dataset from the Montana population, as compared to the smaller dataset of Washington. Boria and Blois (2018) found that an SDM using approximately 13,000 occurrences from deer mice (Peromyscus maniculatus) decreased in predictive ability at large sample sizes, and that models with 10%-20% of the presence locations performed as well as those with greater percentages. Our results were similar, in that models with approximately 30% or more of our ~22,000 occurrences performed similarly. This number may be influenced by the number of individuals or sample size, however, as Wyoming, which had the fewest individuals and smallest sample, required closer to 70%-80% of the dataset to reach peak predictive performance. While the sample size of our Wyoming population was small compared to other datasets in our study, the number of presences was large (n = 670) compared to what is often recommended as the minimum sample size necessary for species distribution modeling (n ≈ 25, Hernandez et al., 2006; 50 < n < 100, Stockwell & Peterson, 2002). The Wyoming model performed well when assessed within the model training area, but exhibited poor transferability, which reinforces the need for caution in extrapolating even models that validate highly to novel areas. An aspect of GPS data collection that we acknowledge we were unable to address here was the effect of fix rate on GPS data efficiency. The fix rate, which determines the number of GPS locations taken during a given time period, was similar for GPS data from all three study populations, with one fix per hour in Montana, one fix per four hours in Washington, and one fix per three hours in Wyoming. Previous work has shown that autocorrelation increases with increased fix rate (Fieberg et al., 2010); thus, when applying methods used here, a reduction to 30% of the data should be considered when fix rates are similar, while a further reduction in data will likely be necessary for datasets with faster fix rates and less reduction when fix rate is slower.
Sensitive carnivores require large-scale monitoring to evaluate population status (Golding et al., 2018). These efforts are aided by SDMs that spatially map the likelihood of species presence or habitat suitability so ecologists and managers can evaluate management actions such as recreation or timber production (Rowland & Vojta, 2013). Our work here provides the most comprehensive evaluation of lynx habitat at the species' southern range periphery in the northwestern United States. In addition, we used an extensive sample of known lynx locations across the study area to evaluate model performance. As such, this SDM for lynx will be central to may be too small to support long-term occupancy and reproduction, they may provide valuable areas of refuge or connectivity to maintain population persistence at the species' southern range periphery (Walpole et al., 2012). The delineation of habitat patches in Canada also provides important conservation information, since these areas often act as "source" populations for the lynx populations in the northwestern United States (Schwartz et al., 2002). The methods we used here should provide managers and conservationists with a more refined depiction of "high" probability habitat, allowing conservation actions, which are limited by time and resources, to be focused on areas which will be the most beneficial to lynx.

ACK N OWLED G EM ENTS
We acknowledge the United States Forest Service Region 1 for their funding of this work. We also thank M. Kosterman, M. Schwartz, K.
Pilgrim, J. Golding, and the Southwest Crown Collective for providing additional lynx detections to the independent dataset.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
The data that support the findings of this study are openly available  (Table A1).
Since many of these covariates were highly correlated with each other, we initially ran a single global model with all covariates using only machine-learning modeling methods (global boosted models, random forest, and multiple adaptive regression splines) since these are known to be robust to correlation among covariates (Li & Wang, 2013 Note: Dimension 1 is dominated by moisture-related covariates including summer and winter precipitation, soil pH, and relative humidity, while dimension 2 is dominated by forest-related covariates including long-term NDVI, forest heterogeneity, and road density. A PPE N D I X E F I G U R E E 1 Categorical spatial predictions of Canada lynx relative habitat capability across the study region in the northwest United States, as generated by the top-performing species distribution model. Model thresholds are based on correctly assigning 95% of Canada lynx withheld GPS locations for the "High" category and 90% of independent lynx locations for the "Moderate" category. These thresholds provide a more liberal delineation of lynx habitat than the 90%/85% thresholds provided in the main paper F I G U R E E 2 Categorical spatial predictions of Canada lynx relative habitat capability across the study region in the northwest United States, as generated by the top-performing species distribution model. Model thresholds are based on correctly assigning 85% of Canada lynx withheld GPS locations for the "High" category and 80% of independent lynx locations for the "Moderate" category. These thresholds provide a more conservative delineation of lynx habitat than the 90%/85% thresholds provided in the main paper