Integrating presence‐only and presence–absence data to model changes in species geographic ranges: An example in the Neotropics

Anthropogenic changes such as land use and climate change affect species' geographic ranges, causing range shifts, contractions, or expansions. However, data on range dynamics are insufficient, heterogeneous, and spatially and temporally biased in most regions. Integrated species distribution models (IDMs) offer a solution as they can complement good quality presence‐absence data with opportunistically collected presence‐only data, simultaneously accounting for heterogeneous sampling effort. However, these methods have seen limited use in the estimation of temporal change of geographic ranges and are not yet widespread as they have a steep learning curve. Here we present a generalisable model and case example.


| INTRODUC TI ON
Mapping the temporal change of species' geographic ranges in today's changing world (Cardinale et al., 2012;Urban, 2015) is a critical task for biogeography. To describe the dynamics of entire geographic ranges of species, we need both data over large and often heterogeneous regions, sometimes across entire continents or even the world, as well as data collected over a long-time period (Yoccoz et al., 2001). Despite increasing access to open data (Wüest et al., 2020), they are still sparse and spatially and temporally biased (Boakes et al., 2010;Maldonado et al., 2015;Meyer et al., 2016;Shirey et al., 2021). Moreover, the available data rarely come from a single large and standardised sampling effort (Ondei et al., 2018), but instead comprise a mix of local surveys that used different sampling methods (e.g., camera-traps, eDNA, or acoustic data loggers; Deiner et al., 2017;Gibb et al., 2019;Steenweg et al., 2017), as well as incidental occurrence records (e.g., derived from museum specimen collections and citizen-science records; Chandler et al., 2017;Osawa, 2019). To tackle the global challenge of stopping the loss of biodiversity (IPBES, 2019) without any further delay, we must seek to develop species' distribution models with the heterogeneous data that are available today (Heberling et al., 2021).
Integrated species distribution models (hereafter IDMs) comprise a recently developed family of parametric species distribution models that combine ecological information within multiple data types that were typically collected by different survey approaches (Isaac et al., 2020;Kéry & Royle, 2015;Miller et al., 2019). IDMs capitalise on each data type's strengths, i.e., standardised surveys can provide information on the local abundance of species but often only at a relatively small number of sites, while opportunistic occurrence records can cover larger geographic/environmental spaces and can inform on range boundaries. IDMs are usually hierarchical models that explicitly model the sampling process of each dataset to account for limitations, including imperfect detection and sampling bias (Fithian et al., 2015;Fletcher & Fortin, 2018;K. Pacifici et al., 2017), as well as varying effort and area of surveys (Keil & Chase, 2019). A characteristic of most IDMs is that they assume a common underlying spatial point process that determines the spatial locations of individuals of a species (Dorazio, 2014;Fletcher Jr. et al., 2019;Miller et al., 2019). Following this assumption, parameters affecting the intensity (or density) of the resulting point pattern (e.g., land cover or climate) are estimated using the joint likelihood for all included data types (Fletcher Jr. et al., 2019). As the point pattern has no spatial resolution, it can be aggregated to spatial units of any size (Baddeley et al., 2015).
Most applications so far fit IDMs in a Bayesian framework (van de Schoot et al., 2021), which also helps propagate the uncertainties associated with each data type into the predictions and parameter estimates.
Studies have already shown the many advantages of modelbased data integration. First, the increased sample size from making use of diverse data streams tends to increase the precision of parameter estimates (Farr et al., 2021;Zipkin et al., 2017) and the accuracy of the predictions (Zulian et al., 2021). Second, combining structured or semi-structured data with unstructured data helps to factor out spatial biases in the latter and consequently helps to make better use of data streams coming from opportunistic citizen science events (Dorazio, 2014;Zulian et al., 2021). Finally, the greater geographic coverage achieved by data integration may lead to a better sampling of environmental gradients and hence improve the accuracy and precision of the estimated effects of covariates such as land cover and climate (Bowler et al., 2019;Chevalier et al., 2021).
Despite the great promise of IDMs, their applications are still limited. Studies have used IDMs to address a wide range of data integration problems (e.g., Martino et al., 2021;Rose et al., 2020;Schank et al., 2017;Zulian et al., 2021), but they have mostly been used over local (Farr et al., 2021) or nationwide (Hertzog et al., 2021) extents, and at fine grains, but not to model entire geographic ranges of species over coarse grains. As an exception, Zulian et al. (2021) used data integration to model the full geographic distribution of a parrot species endemic to the tropical South American Atlantic Forest. Further, with some exceptions (Doser et al., 2022;Hertzog et al., 2021;Pagel et al., 2014), IDMs have not been used to model temporal change of distributions, although this could be their obvious application, given the scarcity of temporally replicated data.
Finally, IDMs can appear complex, with a lack of user-friendly tools available; thus, their implementation can be challenging, particularly for inexperienced users. Hence, the full potential of IDMs has yet to be realised and made accessible, and there is a need for models that balance pragmatism and realism for combining the data typically available for large-scale distribution models.
Here, we introduce an IDM that addresses these shortcomings and models the temporal dynamics of entire species' geographic ranges by integrating two common data types: presence-only observations (e.g., as available in GBIF) and presence-absence surveys (e.g., from systematic surveys, in our case from camera traps). The model also accounts for common data problems such as local and regional variation in sampling effort and unequal area of surveys. The model can predict the temporal change of geographic distributions, change in range size, and the associated uncertainty at any spatial resolution. Lastly, one of our important aims is to lower the learning threshold of IDMs for new users.
The American tropics (the Neotropics) have been identified among the most important hotspots of biodiversity in the world (Antonelli et al., 2018;Morrone, 2017). At the same time, they are one of the K E Y W O R D S camera-trap surveys, community-science records, data-poor regions, jaguarundi, Latin America, MCMC, Poisson point process, range size, sampling bias, SDM, spatial autocorrelation areas where biodiversity is declining at higher rates (IPBES, 2019).
Unfortunately, data challenges are particularly pronounced in this region (Meyer et al., 2016); thus, the range dynamics of many species that occur there remain unknown. The available data are scattered and heterogeneous, typically coming from countries such as Colombia, Brazil, and Mexico and mainly from the birds' group. As a test case for our IDM, we chose the jaguarundi (Herpailurus yagouaroundi, also known in Spanish as yaguarundí, gato moro, leoncillo, león brenero, and onza) (Figure 1), which has a large distribution across Latin America, but knowledge of it has been limited by the data. Evidence shows that carnivore species such as the jaguarundi, have been recently varying their geographic distribution, most often noted around range edges (Grattarola et al., 2016;Lombardi et al., 2022;Luengos Vidal et al., 2017), and their abundance, over their entire distribution range (Caso et al., 2015). However, whether these changes are a product of previous lack of monitoring efforts in the region or due to the expansion or contraction of this species' range over time has not been quantitatively studied. Here, we develop an IDM to fill this knowledge gap. Moreover, we use our study design to provide a clear working example with R code, which can easily be copied, extended, and applied to model the range dynamics of other species.

| The data
Occurrence records (i.e., presence-only data; Figure 2) were downloaded from GBIF (GBIF.org, 2021), filtering all records from Neotropical carnivores with geographic coordinates and with no spatial issues. Jaguarundi data were subset by removing records with coordinate precision smaller than three decimal places (i.e., 0.001), and coordinate uncertainty greater than 25,000 meters (to match the resolution of the environmental predictors of 100 × 100 km). Finally, we eliminated duplicates considering independent records as individuals recorded on a different date, latitude, and longitude. Since we aimed to compare the distributional change in time, we divided the data into two time periods (time1: 2000-2013 and time2: 2014-2021), which were chosen since most of the data were collected from 2000 onwards and, on average each period represented 50% of the data (presence-absence and presence-only). A more unbalanced data split would likely increase the uncertainty of the estimated distribution in the period with the lower sample size, which might lead to convergence issues.
For each time period, we mapped the data to 100 × 100 km resolution grid-cells (Lambert azimuthal equal-area projection; centre latitude 0°S and centre longitude 73.125°W) covering the entire Neotropical region (i.e., from Mexico to the south of Argentina) -100 km was chosen as a compromise between sufficiently coarse for computational efficiency and sufficiently fine to produce useful descriptions of a species' range at a continental scale. In total, there were 193 occurrence records for the first time period and 234 records for the second period. See Figure S1 in Supporting Information, a schema of the data processing. with geographic coordinates, with information about the study sampling area, with starting and ending month and year of the study, and with reported sampling effort (i.e., the number of active camera trap days). These variables were then considered in the model; the temporal span was used to split the data, and the location, area, and sampling effort were used either as offsets or as covariates in the model. For each survey, a buffer polygon was F I G U R E 1 Individuals of jaguarundi (Herpailurus yagouaroundi) displaying the main two-coat colour variants. Left observed in Mexico by albamaya (CC-BY-NC) and right in Argentina by hhulsberg (CC-BY-NC). Photos from iNatu ralist.org. created using the latitude and longitude of the survey as centroid and either the study area or the latitude/longitude precision for the studies at the sampling level of "area" as the expected area of the polygon (see the metadata in Nagy-Reis et al. 2020 for more details on these definitions). Individual polygons were then overlapped and combined into 'blobs' for each time period. Finally, absences were generated in those blobs where the jaguarundi was not recorded. For more details, see the chart of the dataprocessing workflow in Figure S1. We used data from 8346 surveys for our study period: 4303 for the first period and 4043 for the second. The jaguarundi was recorded in 614 of the surveys (356 times in the first period and 258 in the second). Overlapping surveys for each period were then combined in blobs (488 for the first period and 480 for the second), and for each one, we calculated the total surface area, the time span of the records, and the effort in camera trap days.

| Thinning variables
These variables were used to explain the observation process of the presence-only records. The real-world occurrences of F I G U R E 2 Distribution of the jaguarundi data for the two time periods: from 2000 to 2013 (time1) and from 2014 to 2021 (time2). The top row shows occurrence records from GBIF.org (2021), and the bottom, camera-trap surveys from Nagy-Reis et al. (2020) with presences in colour and absences in dark grey (i.e., blobs in which some carnivore species were reported, but not the jaguarundi). Blobs were buffered by 20 km to improve visibility. The geographic range distribution of the jaguarundi, according to IUCN, is shown in shaded grey for all maps (IUCN, 2022). Projection WGS 84. the jaguarundi can be thought of as a point pattern (Baddeley et al., 2015), which is then sampled such that only some points are observed (thinned) and end up in GBIF, and ultimately in the presence-only dataset. To adjust the presence-only data for sampling effort (i.e., thinning) in each 100 × 100 km grid cell, we used data on accessibility from urban areas based on travel time (Weiss et al., 2020). Based on many past studies (e.g., Geldmann et al., 2016), we expected that highly accessible grid cells would have more point records than inaccessible grid cells. In addition, for each grid cell, we also included the country of origin to account for differences among countries in data-sharing capacities and citizen-science levels of engagement.

| Environmental covariates
The jaguarundi has been reported to occur mainly in lowland areas (up to 3200 m) (Caso et al., 2015) and in a variety of habitats, from dense tropical rainforest to open grassland, although in open areas it prefers patches of thick cover (Macdonald & Loveridge, 2010).
To model its distribution, we chose a set of environmental covariates and assessed their relative importance for the species.
For each grid cells in the 100 x100 km grid, and for each blob, we extracted the 19 bioclimatic variables and elevation (SRTM) from WorldClim V2.1 (Fick & Hijmans, 2017), land cover at 500 m/yearly

| Covariate extraction to grids and blobs
Continuous covariate data were matched to the presence-only data by averaging values within the 100 × 100 km grid cells and to the presence-absence data by averaging values within blobs. For the land cover (categorical covariate), we assigned the mode value (i.e. the most common value) for each grid cell and blob. We used the 'rnaturalearth' package (South, 2017) to obtain Latin American countries' spatial polygons at a large scale. Spatial data analyses were done using 'sf' (Pebesma et al., 2018) and 'terra' packages (Hijmans et al., 2022). MODIS data were downloaded using 'MODIStsp' (Busetto & Ranghetti, 2016).

| Covariate selection
Using all these covariates would lead to convergence problems due to collinearities, so we narrowed down the scope of the covariates for the final integrated model. Yet, doing a formal stepwise variable selection in the IDM setting is challenging because of computationally intensive MCMC sampling; similarly, Bayesian variable selection (O'Hara & Sillanpää, 2009) led to convergence problems. We thus manually selected a subset of covariates using (i) published descriptions of jaguarundi's habitat requirements (Caso et al., 2015;Espinosa et al., 2018;Macdonald & Loveridge, 2010), (ii) Pearson correlations among covariates (we aimed at minimising them; r < 0.6), (iii) findings (the top selected variables) from simple treebased machine learning analyses (boosted trees, random forests) with the raw presence/absence as a response, and all the covariates as predictors. We ended up selecting bio7 (temperature annual range: maximum temperature of the warmest month -minimum temperature of the coldest month), bio15 (precipitation seasonality), elevation, and NPP (Net Primary Production) (see Figure S2 in Supporting Information).

| The model
We illustrate our model as a Directed Acyclic Graph (DAG) in Figure 3. Notation of the data matrices, parameters, and indices is in Table S9.

| Approach to integration
The model assumes that jaguarundi's distribution can be described by Both likelihoods are then used jointly (Miller et al., 2019) to estimate parameter values.

| Modelling of time
A key feature of our model is that it allows for different probability of occurrence at each location between the first (time1) and second (time2) periods. To understand this, let us point out the parts of the model that are constant in time, and parts that change. We defined the following model components to be constant in time (identical in time1 and time2): (i) the relationship between point process intensity and environmental covariates, (ii) the relationship between accessibility and point process thinning, (iii) the random effects of countries on point process thinning, and (iv) the relationship between sampling effort and presences/absence data. The only components that change in time are the autocorrelated spline surfaces, one for time1, and the other for time2. These capture any time-dependent spatial structure in the jaguarundi's occurrence that is not captured by (temporally constant) environmental covariates.

| Point pattern intensity
To create the point pattern intensity of jaguarundi occurrences, we used design matrices (X PA and X PO ) that contained as many columns as the fitted model has parameters; in our case, 21: an intercept, environmental covariates (elevation, NPP, bio7, bio15) and the spline bases, and as many rows as blobs or grid-cells respectively. When the design matrices are multiplied by the vector of parametric effects (b), they yield the linear predictors ( PA and PO ), i.e., the expected point pattern intensity, given the values of all explanatory variables in the model: Note here that b is the same in both Equations 1 and 2 (see also

| Smoothing splines
We used thin plate regression splines (Wood, 2003) to model the spatial structure in the distribution that was not accounted for by the environmental covariates. These spatial splines give our model the flexibility to predict absences in otherwise suitable environments, which can happen due to dispersal limits, demographic stochasticity, or biotic interactions. In other words, the splines enable us to model the realised distribution in each time period, i.e., the actual distribution affected by its ecological preferences and factors such as dispersal limitation, and not just the fundamental distribution given solely by the environmental conditions (Rushing et al. 2019). Since we fit a different spatial spline for each time period, we used the splines as a flexible way to model change without making any assumptions about the drivers of change. We first generated k = 9 spline basis variables prior to the model fitting, using the jagam function from the 'mgcv' package (Wood, 2017). These variables are then part of the X PA and X PO matrices, and they have their own corresponding coefficients in the b vector. These coefficients have their own multivariate normal prior, specified using smoothing penalty matrices and smoothing parameters; for the sake of simplicity we do not present the complex mathematical definition here; we refer readers to the help of the jagam R function. We selected k = 9 as it was the highest value that still gave good convergence of the MCMC, and also provided sufficiently flexible surfaces to model large-scale geographic range.

| Modelling presence-absence data
Letting y PA i refer to the observed presence (1) or absence (0) value in each i -th blob for the first or second period, we modelled the blobspecific probability of presence ( i ) as a function of the fixed effects of the presence-absence linear predictor ( PA i ) and sampling effort Graphical illustration of our IDM. In this Directed Acyclic Graph (DAG), the yellow boxes refer to the responses, the green boxes to data, the pink circles to parameters to be estimated, the blue circles to linear predictors, and the arrows to causal relationships. Note that the 'b' parameter links both models. In the bottom part, we illustrate how we predicted the probability of occurrence and calculated the species' range area at each period.
(effort i , i.e., number of camera trap days), and the logarithm of the area of each blob in m 2 (area PA i ) as an offset term, where the index i identifies blobs. Prior distribution of was ∼ Normal(0, 10), i.e., Gaussian prior with mean zero and standard deviation of 10. This prior distribution is sufficiently uninformative given the scale of the predictors (scaled to 0 mean and SD = 1), while also not too wide to hinder MCMC convergence.
The state of this variable follows a Bernoulli distribution with mean i where y PA i is the observed data,

| Modelling presence-only data
We assume that the spatial distribution of individuals may be modelled using a Poisson point process. In our model, the true intensity (i.e., mean number of presence points per grid-cell) for the species in each grid-cell j is denoted as j . We modelled it as a function of the exponential of the presence-only linear predictor ( PO j ) by the area of each grid-cell in m 2 (area PO j ).
where j denotes a grid-cell.
To model the thinning of the true intensity, we calculated the cell-specific probability of retaining/observing a point (P ret j ) as a decaying exponential function with a random intercept 0 c for each c-th country and a fixed slope 1 for grid-cell accessibility (acce j ), The prior distribution for 0 c was Beta distribution with shape parameters equal to one, i.e., 0 c ∼ Beta(1, 1) where c ∈ 1: n cntr , (total number of countries). This is a flat prior that gives equal probability density to every value between 0 and 1. The prior for the slope of the distance decay 1 was a Gamma distribution with shape 0.5 and scale 0.05, i.e., 1 ∼ Gamma(0.5, 0.05). This is a weakly informative prior that is skewed to take small values; it is wide enough to be effectively non-informative given the meaningful parameter values, but not too wide to hinder MCMC convergence.
Finally, we calculated the thinned intensity per grid-cell ( j ) as the product of the true intensity ( j ) times the probability of retaining a point per grid-cell (P ret j ): The state of this variable follows a Poisson distribution with mean j where y PO j is the observed data,

| Predictions
To predict the probability of occurrence of the species in the two time periods, we used the linear predictor pred , as The detection probability (P pred j ) was modelled for each grid-cell j with area (area PO j ) as an offset term.

| Assessing Model Performance
We performed posterior predictive checks to evaluate the fit of the model (Conn et al. 2018) and plotted expected and observed data to visually compare them. Also, for the PA data, we used Tjur's R 2 , a coefficient of discrimination for logistic regression models (Tjur 2009), and for the PO data, we did residual diagnostics using the 'DHARMa' package (Hartig, 2022).

| RE SULTS AND D ISCUSS I ON
We successfully fitted an IDM to study the dynamics of the geographic range of the jaguarundi in Latin America over the last two decades. Good convergence values (Rhat <1.1) were reached for all model parameters. Data integration enabled us to increase the sample size, geographic extent, and environmental scope for each time period, taking advantage of the complementary information and sampling locations in different data streams (see Table S10). As the open data revolution continues and citizen science contributes everincreasing amounts of data, we expect IDMs will become a standard tool for ecologists, provided that the tools become available. To our knowledge, we present the first application of the IDM approach with a temporal dimension and over the entire geographic range of a species.

| Most up-to-date knowledge of the species range
Expert range maps have played important roles in both research and policy by providing information on species distributions where there are data gaps. However, expert range maps are unsurprisingly coarse and infrequently updated, which means that they rapidly become outof-date and have been less useful for studying range change. Using all the data available (open-access) for the species from 2000 to 2021, our modelled geographic range (time1 and time2) is not entirely consistent with the current expert range map from IUCN (Figure 4c). , and (f) range difference split by the uncertainty of the prediction (darker pink and green colours show highly certain range losses and gains, respectively) -this shows that the jaguarundi has contracted its southern and northern range limits but expanded its area of distribution over lower latitudes. Projection WGS 84.

F I G U R E 4 Predicted
Specifically, in the southern range limit in Argentina and Uruguay and in the Caatinga region of north-eastern Brazil, where our model predicted a low probability of occurrence, or in the border of Mexico with Guatemala and the northern Andes, where it predicted a high probability. Thus, we have updated the knowledge presented by the IUCN expert's range map.

| Temporal change
The main innovation of our IDM is the estimation of temporal change in species' geographic range, which was possible even over a relatively short time span. We found that between the first and second periods, the jaguarundi has contracted its southern and northern range limits towards the equator but expanded its area of distribution over the entire species' range (Figure 4d We attribute the southern and northern range contractions to the species being rarer close to its environmental niche limits. Yet, at least in the southern limit, major land conversions have taken place in recent decades as a result of agriculture expansion (mainly soybean Baldi & Paruelo, 2008;Song et al., 2021), where the species also occurs at relatively low densities (Giordano, 2016;Luengos Vidal et al., 2017). The jaguarundi can also be shifting its distribution as a response to changes in environmental variables such as increasing temperatures and precipitation anomalies (Magrin et al., 2014) or due to the influence of the distribution of other species, e.g., competitive exclusion with Leopardus pardalis due to what is known as "the ocelot effect" (de Oliveira et al., 2010).
However, we note that even though our model can model temporal change of species ranges, it does not directly test causal hypotheses about drivers of the change. This is because we modelled temporal change solely using the different spline surfaces in first and second periods, while the environmental covariates in the model were long-term averages. In a way, our model thus predicts range kinetics (i.e. temporal change) rather than dynamics (i.e. temporal change and its causes). Here we see a clear opportunity for extensions of our IDM to directly assess causal drivers of the change. A simple approach can be to relate the predicted change of P pred to an observed change of environment directly within the model (specifically, in the "predicted quantities" part). Environmental covariates could be decomposed into both the spatial long-term means as well as the temporal anomalies (Oedekoven et al., 2017). A more sophisticated approach could involve velocity (i.e., magnitude and spatial direction) of both the range and environment (Loarie et al., 2009) or modelling co-occurrence effects (Ovaskainen et al., 2017). A challenge when testing species interactions will be the dependency of these on the scale of analysis. At smaller spatial scales, species' reactions to environmental factors may reflect their preferences for particular habitats or how they use space, while at larger scales, their responses may indicate broader ecological patterns such as metacommunity dynamics, biogeographical limitations, and energy constraints (Connor et al., 2019;Riva et al., 2023).

| Model performance and prediction uncertainty
Posterior predictive checks showed that the model performs well for the PA data (Tjur's R 2 = 0.368, Figure S3). The PO model was checked by simulating new data from the fitted model, followed by residual analysis, and overall showed a reasonable fit ( Figures S4   and S5). To highlight where there was most uncertainty in the model predictions, we calculated the predicted change in occupancy at the grid cell level and plotted the uncertainty of grid-level change (i.e., the standard deviation of the posterior distribution of occupancy change) (Figure 4f). By incorporating uncertainty in our predictions, we ensure that all predictions are robust to the data.

| Ecological inference
An advantage of our parametric IDM (over, e.g., a machine learning such as random forest) is that it can also be used for ecological inference. According to the coefficients for the environmental covariates ( Figure 6), the jaguarundi avoids deserts and semi-arid areas such as the Atacama or the Brazilian Northeast (negative effect of annual temperature range, or "bio7", in Figure 6); it prefers areas with seasonal precipitation, which is most of Latin America except for the deserts, high Andes, and extremely humid Colombian forests (positive effect of precipitation seasonality, or "bio15"). This aligns with findings from Espinosa et al. (2018), who show that habitat suitability for the species is negatively affected by extreme temperatures and low annual precipitation. The jaguarundi also prefers productive green areas with good vegetation cover, likely where it can hide (positive effect of NPP). Finally, we found a positive effect of elevation (but it was also weak, relatively to the effect of other covariates, Figure 6). This indicates that the jaguarundi may not be restricted to lowlands, as described by Caso et al., (2015). This is in line with several observations from higher elevations in our data (presence-only records in Figure 2). Hence our model suggests that if an area offers enough vegetation cover and suitable climatic conditions, jaguarundi will likely be present, irrespectively of altitude.
This insight would be impossible if a presence-absence approach to species distribution modelling were used, as these data miss the occurrence of jaguarundi in the northern Andes (Figure 2), unlike the presence-only data.

| Inference about sampling effort -the thinning function
A notable feature of our model is how we considered the probability of an observation being made if the species was present (i.e., the thinning process; Equation 6). We made it dependent on the gridcell accessibility since this is a known variable associated with the density of presence-only records and reflects the effect of the number of possible observers and their tendency to observe species near where they live. Moreover, we allowed its effect to vary among countries, allowing for differences in capacities to observe and report observations. Our results revealed that, for the predicted species range distribution (Figure 4), countries such as Argentina, French Guiana, and Suriname have reasonable levels of records for the species in accessible areas (see Figure S6). However, most countries show low observation-retention probability (P ret <0.25 when accessibility is maximum; see Figure S6), which means that even areas that are easy to reach only get less than 25% of observations.
We expected that countries such as Colombia had better levels of sampling effort for the species, as this is one of the countries in Latin America with the highest numbers of records in GBIF (GBIF Secretariat, 2022). However, the low levels of sampling found there can be related to the high abundance (point intensity) of the species in the area. Regardless, some particular regions of the species distribution range, as seen by the uncertainty of the estimates, need more sampling effort (see Figure S7), i.e., central Mexico, northeast Brazil, Peruvian Amazon, eastern Bolivia, and Uruguay.

| Limitations and potential extensions
Like with any other model, our IDM has clear limitations and scope for improvements. First, if larger datasets on species occurrences are available, the number of predictors can be increased, and the best covariates selected during model-fitting, e.g., with variable indicator selection (O'Hara & Sillanpää, 2009, Rushing et al. 2019. However, in data-poorer contexts such as ours, this can lead to convergence issues. Second, since it is implemented in JAGS, the model can be extended in various ways. Accounting for imperfect detection (Dorazio, 2014;Koshkina et al., 2017), modelling multiple species in F I G U R E 6 Effect of the environmental covariates on the intensity of the point process for the jaguarundi. Thick lines represent 90% of the highest posterior densities of the parameters and thin lines represent 95%. joint species distribution models (Doser et al., 2022;Ovaskainen & Abrego, 2020), e.g., to share information on the thinning process (Fithian et al., 2015), or accounting for false positives (Kéry & Royle, 2015) are some of the potential extensions that have recently been tested in the IDM framework (Doser et al., 2022), although not over large geographic extents. Also, for the sake of simplicity, we established two discrete temporal periods that enabled us to have balanced presence-only and presence-absence data and to produce an up-to-date map of the species' current distribution. However, time can also be included as a continuous predictor in our model (e.g., by allowing the spatial splines to interact with the year) to allow predictions of annual change in distributions. Third, the simple linear responses are enough to demonstrate the general idea of the model, i.e., the model converges without more complex response curves.
However, the presented model can be expanded to fit, for instance, unimodal responses, if more realistic (Rushing et al. 2019).

| Data imbalance
The PA and PO data show different spatial patterns; specifically, there are more PO points towards Central America, but slightly more PA data towards south-east South America. However, our modelled predictions should be robust to this data imbalance because: (i) each data type has a similar spatial pattern across time periods, and their joint spatial pattern covers the whole range of the species expected from the IUCN. Therefore, they jointly present low imbalances in the geographic space they cover; (ii) both of our predicted ranges align with the current expert knowledge (IUCN range) and not with the perceived imbalance in the raw data (compare Figure 2 with Figure 4a,b and Figure 4c); (iii) as our analysis is Bayesian, we were able to incorporate the uncertainty around the estimated temporal change, measured by the standard deviation of the predicted range change (Figure 4e). This uncertainty in a given location should increase when there is a lack of data in the location ( Figure S7). By incorporating uncertainty in our predictions, we ensure that all predictions are robust to the data imbalance.
The model tends to underpredict species presence, as the probability of occurrence was not always high in areas with high species counts/ presences. Our model was, therefore, better at predicting absences than presences. Range expansions are reflected in larger PO counts in the second period ( Figure S8). However, range contractions are most likely driven by the PA data (Figure 2 vs. pink areas in Figure 4d).

| Practical applications and challenges
Jumping from classic statistics to full Bayesian statistical inference comes with some hurdles. There are conceptual barriers/ obstacles, new terms/definitions and much "statistical rethinking" (McElreath 2020). There are also no pre-made R functions to choose from, and users must design and code on their own. This requires a greater knowledge of the inner workings of the models and their ecological interpretation. Bayesian methods can be computationally intensive; in our case, it took 230 min in a 16GB RAM 3.2Ghz 8-core laptop to run MCMC with 100,000 iterations. We used JAGS (an implementation of BUGS; Lunn et al., 2000) to specify and fit the model since it is more flexible than, e.g., INLA (Rue et al., 2009) and also more didactical than, e.g., STAN (Stan Development Team, 2022).
Spatial splines can be a particular challenge to code for JAGS, but the jagam function in the "mgcv" R package is enormously helpful for this. For more advanced users, however, the latter two may be more computationally efficient alternatives. Fortunately, code and data sharing are becoming more and more common. Understanding Bayesian IDMs can be challenging but hopefully, our analysis together with the commented code and the data will help to overcome these hurdles. We thus urge readers to pay close attention to our model's code (https://github.com/bienf loren cia/yagua rundi_IDM).
It comes with a glossary of terms, extensive comments, and further explanations, all aimed at providing the material that readers can reuse in their own projects. In this respect, we also recommend the excellent tutorials by Kéry (2010) and Kéry & Royle (2015 and McElreath (2020) together with the course (https://github.com/ rmcel reath/ stat_rethi nking_2022).

| OUTLO O K
In the tropics and the global South, a lack of temporal data (i.e., data at the same location for different points in time) has always limited understanding of how species change their geographic ranges through time (Antonelli et al., 2018;Hortal et al., 2015). Most recent studies have focus on the temperate global North (i.e., North America, Europe, and Australia-New Zealand) (Chen et al., 2011). Even studies on the global scale, such as those based on the BioTime (Dornelas et al., 2018) or the Living Planet databases (Loh et al., 2005), have severe data gaps in the tropics. If there are any studies that cover the entire world (Pacifici et al., 2020), they rely on static geographic ranges (e.g., IUCN range maps), which in many regions tend to misrepresent actual species distributions (Hughes et al., 2021). Fortunately, unsystematic records from citizen or community-based science platforms are a growing source of presence-only data. Global initiatives such as iNaturalist (www.inatu ralist.org) have become popular in Latin America, with national nodes in Argentina, Chile, Colombia, Costa Rica, Ecuador, Guatemala, Panama, Mexico and Uruguay, now counting more than 2.6 million research-grade observations (GBIF. org, 2022). These data have been deemed particularly problematic in the estimation of temporal trends of biodiversity and species geographic distributions (Peterson et al., 2011). However, with the type of IDMs that we present here, they can become a potentially valuable source for species distribution modelling in the region.

CO N FLI C T O F I NTE R E S T
The authors declare no conflict of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
Species PO data are available at GBIF.org. (2021) and PA data at Nagy-Reis et al. (2020). The extensively commented code used in this paper is openly available at https://github.com/bienf loren cia/ yagua rundi_IDM under a CC-BY licence.