Assessing the accuracy of density-independent demographic models for predicting species ranges

findings suggest that deterministic, density-independent DDMs are appropriate for applications where locating all possible sites the species might occur in is prioritized over reducing false presence predictions in absent sites. This makes DDMs a promising tool for mapping invasion risk. However, demographic data are often collected at sites where a species is abundant. Density-independent DDMs are inappropriate in this case.


Introduction
Spatial projections of species occurrence and persistence are essential for developing ecological theory and improving environmental management (Guisan et al. 2013, Meyer et al. 2015, Briscoe et al. 2019. While scientists and environmental agencies commonly correlate static presence/absence data with environmental variables to project species ranges (Elith and Leathwick 2009, Guisan et al. 2013, Hijmans et al. 2017, the accuracy and utility of these projections have been criticized, especially when projecting distributions in time or to novel environments (Pearson and Dawson 2003, Thuiller et al. 2014, Zurell et al. 2016, Cabral et al. 2017, Briscoe et al. 2019). Unfortunately, due to current, rapid, environmental change, projections in time and to novel environments are urgently needed (Ackerly et al. 2010). Demographic distribution models (DDMs), which model vital rates, such as survival, development and reproduction, as functions of environmental variables, have been proposed as a promising alternative for generating these projections (Merow et al. 2014, Zurell et al. 2016, Briscoe et al. 2019. DDMs are promising because they model the underlying mechanisms that drive occurrence (Normand et al. 2014, Cabral et al. 2017, mechanisms which may continue to hold in new environments. Yet the accuracy and limitations of DDMs are poorly understood.
The first step to building a DDM is regressing vital rates against environmental factors to determine trajectories of population size through time, which then may be projected into the future to predict species occurrence and abundance (Villellas et al. 2015, Ehrlén et al. 2016, Cabral et al. 2017, Csergő et al. 2017). This explicit focus on capturing how environmental variables affect vital rates differs from other approaches using demographic models to to project species' ranges. For example, coupled niche-population models use stochastic demographic models to project range dynamics, but vital rates are typically fixed, with environmental effects captured by constraining potential habitat and carrying capacity based on modelled habitat suitability (Keith et al. 2008, Fordham et al. 2013. Such approaches allow one to explicitly project range dynamics with limited demographic data accross sites, but their reliance on occurance data to model environmental cosntraints means that they can suffer from many of the same drawbacks as correlative SDMs (Briscoe et al. 2019). Unfortunately, DDMs also have important drawbacks. Even the simplest DDMs require population abundance data through time (Buckley et al. 2010), or detailed demographic data tracking many individuals and their offspring within a field season (Merow et al. 2014). In both cases, these data must be collected at multiple, geographically and climatically dissimilar sites, and classified by age, and/or size of development (Caswell 2001, Needham et al. 2018. This requirement of spatial and temporal replication is challenging in its own right. Therefore, DDMs typically ignore the effect of intraspecific competition on survival or reproduction, despite tools for incorporating density-dependence effects in demographic models (Cushing et al. 2002, Dahlgren et al. 2014, Teller et al. 2016. To our knowledge, the vast majority of DDMs, parameterized with field data, to project species ranges, ignore density-dependent effects (Buckley et al. 2010, Barbraud et al. 2011, Merow et al. 2014, Needham et al. 2018, Sheth and Angert 2018 but see (Pagel et al. 2020) for an exception. Such density-independent DDMs predict occupancy by linking environmental variables to long-term population growth rate, λ, through the variables' effects on vital rates in matrix-population or integral-projection models (Buckley et al. 2010, Merow et al. 2014. The implicit logic is that if λ > 1, the local population is predicted to persist; if λ < 1, the population is predicted to go locally extinct in the long-term.
However, estimates of λ from density-independent demographic models do not necessarily reflect long-term persistence if the population experiences density-dependent survival, development, and/or fecundity. If fecundity or survival are lower at high population densities due to, for example, competition for resources, populations can approach a long-term equilibrium abundance -which we will refer to as carrying capacity. It is likely that demographers often collect data where populations are abundant, i.e. close to carrying capacity (Quintana-Ascencio et al. 2018, Fournier et al. 2019. This is because large populations are easier to find and researchers often go where healthy populations are known to exist, not to fringe populations, likely to produce small datasets. However, in healthy populations near carrying capacity, there may be limited population growth, even if the site is highly suitable. Demographic models fit to data from these sites should produce λ ~ 1, and therefore, an estimate of λ < 1 could simply reflect measurement error, disturbance or temporary declines after populations exhausted their resources. In short, if demographic data are collected in field sites where the species is abundant, a density-independent DDM using these data could erroneously map prime habitat as uninhabitable. Therefore, it is no surprise that empirical studies often find estimates of λ uncorrelated or even negatively correlated with habitat suitability or species occurrence (Diez et al. 2014, Thuiller et al. 2014, Csergő et al. 2017. In contrast, there are at least two cases of density-independent DDMs built from demographic data restricted to sites with small populations (e.g. an invasive species and species experiencing high levels disturbance). In these cases, λ successfully predicted species occurrence (Merow et al. 2014(Merow et al. , 2017. Given the mixed success of initial attempts to predict species ranges using density-independent demographic models, we set out to determine general guidelines for when these models can predict species occupancy accurately. We achieved this by simulating range dynamic data, and observers sampling the data to build DDMs. We then compared DDM predictions against long term occupancy, assessed DDM accuracy and correlated DDM accuracy with various species and population characteristics. Finally, we compared DDM predictions to predictions from standard correlative species distribution models (SDMs). In face of limited data available to validate predictions of species range dynamics, our simulated approach provides a tool for assessing DDM accuracy. Our computational framework has many advantages over traditional validation and sensitivity analyses using real-world data, including: increased repeatability, transparency, sample sizes and control over environmental and historical factors (Zurell et al. 2010) -all of which help improve the generality of the results.

Material and methods
Our study involved three separate processes: 1) simulating population dynamics for hypothetical species using a stochastic, negative density-dependent, spatially-explicit, stagestructured population model with two life stages (juvenile, adult) and juvenile dispersal ( Fig. 1 for a graphical description of the model); 2) simulating sampling by field workers conducting demographic surveys across a subset of the species' habitat; and 3) fitting demographic distribution models, computed from the sampled field data. To determine the characteristics of species that can successfully be modeled using demographic distribution models, we simulated the range dynamics of 1.5 million hypothetical species that differed in their maximum survival rates at each life stage, maximum fecundity at low densities, maximum carrying capacity, response to environmental variables, stochastic variability in survival, initial population densities and proportion of the population that disperses at each time step. We then determined if we could draw general conclusions about the species for which DDMs made accurate versus inaccurate predictions of species occurrence.

Simulated population dynamics -survival
We simulated population dynamics using a simple stochastic model. The probability of survival for an individual in life stage s and site i, φ(i,s), was a function of life stage, and environmental conditions, In the above equations, x i,j is the value of environmental variable j in site i, (j = 1, …, n e ), where n e is the number of environmental variables. The parameter ψ s is the maximum expected survival probability in life stage s. The functional form in (Eq. 1) maps linear and quadratic combinations of the environmental variables, u, to survival, so that survival is always bounded between 0 and ψ s . The coefficient α 1,s,j , is the linear trend between the link to survival, of life stage s and the environmental variable j, whereas α 2,s,j is the quadratic trend. If the quadratic coefficient is negative, α 2,s,j < 0, survival is maximized at intermediate values of environmental variable j, along its gradient, to resemble first principles of the Hutchinsonian niche concept (Holt 2009). Whereas, if α 1,s,j > 0, and α 2,s,j = 0, then increases in environmental variable j strictly increase survival. Spatial and temporal variation in survival during life stage s, not Figure 1. A graphical depiction of the simulation model. Solid dark arrows represent the effects of environmental variables and population density on vital rates (thick yellow arrows) in the baseline scenario. Dashed arrows are for effects absent in the baseline scenario, but which are tested in the sensitivity analysis. Blue curves show the assumed functional relationships between the variables. Note that both elevation and forest cover affect survival in both life stages, but repeated arrows are omitted to improve readability. For the Beverton-Holt and logistic fecundity curves, the fecundity axis is total offspring (this is rescaled to expected per-capita offspring in the model description).
attributable to the environmental variables, x i,j is given by, ϵ i,s,t , a random variable with zero expectation. The intercept, α 0 , is set to zero throughout the paper with no loss of generality.

Simulated population dynamics -fecundity
As standard in ecological modeling and environmental management (Quinn II and Deriso 1999), we incorporated density-dependence in simulated fecundity. We considered two of the most widely used types of negative density-dependence in the literature. The first was logistic fecundity, where the expected number of juvenile offspring was determined by the logistic model (May 1974). This represented scramble competition, where population sizes above carrying capacity cause declines in total viable offspring. The second was Beverton-Holt (aka. reciprocal yield) density dependence (Shinozaki andKira 1956, Beverton andHolt 2012). This represented contest competition, decreasing per-capita fecundity, but increasing total fecundity, with respect to population size. Both of these fecundity functions were parameterized with the variables f max and k i , per capita fecundity at low adult abundance and adult carrying capacity in site i, respectively. We refer to the fecundity function in site i as f i (n), where n is the number of adults in the given site and time. Note, however, that we did not consider Allee effects and use density-dependence synonymously with strict negative density dependence. See Supporting information for details on the functional forms and parameterization of these standard ecological models.

Simulation algorithm
We considered a population with two life stages, juvenile (s = 0) and adult (s = 1), and assumed juveniles became adults after one time step or died in that period of time. Therefore, φ(i,0) was the probability of a juvenile in site i transitioning to an adult and, 1 − φ(i,0) was the associated mortality probability. The unit of each time step was one iteration of the demographic model, often thought of as one year (Salguero-Gómez et al. 2016). However, for generality, we do not specify the time unit since the species are hypothetical. Note that a deterministic version of this simulation, without dispersal, is equivalent to simulating a standard matrix population model governed by the density-dependent Lefkovitch matrix (Caswell 2001), where, f i (n) is per-capita adult fecundity, and with survival probabilities φ(i,0) and φ(i,1), determined by Eq. 1. Therefore, one can predict the persistence of the population (in an analogous deterministic scenario) by the leading eigenvalue of the linearized system at the extinction equilibrium, e.g. the eigenvalue of, which yields an expected long-term population growth rate, at low population densities, in site, i, If l i < 1 , a population in site i will eventually go extinct without dispersal from other sites or a series of random favorable years; similarly, if l i > 1, the population will persist in the absence of random fluctuations. The full stochastic simulation, including dispersal, involved four steps. We: 1. Drew the number of surviving juveniles that became adults in the next time step, in each site, from a binomial distribution with the number of trials equal to the number of juveniles in the previous time step and probability of survival, φ(i,0). 2. Drew the number of surviving adults in each site, from a binomial distribution with the number of trials equal to the number of adults in the previous time step and probability of survival, φ(i,1).

Drew the number of new juveniles from a Poisson distri-
bution with expectation f i (n)n (Supporting information for details about the fecundity function f i (n)). 4. Randomly selected a fraction of new juveniles (offspring in step 3), p d , to disperse, where each dispersing individual has an equal probability of landing in each site. 5. Updated the total number of adults to equal the surviving adults plus the surviving juveniles (new adults), and updated the total number of juveniles as the new juveniles from reproduction, accounting for offspring entering and exiting the site, due to dispersal.

Simulation scenarios
The main goal of the study was to test how different species varying in survival rates, fecundity, dispersal, responses to environmental variables, stochasticity and initial abundance affect the accuracy of demographic distribution model (DDM) predictions. We first calculated DDM performance for a baseline scenario, with logistic fecundity and parameters set to values in Table 1. We then performed a sensitivity analysis, where we simulated the population dynamics and fit distribution models under 1000 random combinations of parameter values, each under the two different density-dependent fecundity functions, three correlation structures between fecundity and survival, and three different ways of distributing initial population density in space, each containing approximately 50 different density distributions. This created 1.5 million experiments, each over 4915 sites and two age-classes (over 145-billion time series).
In both the baseline and the sensitivity analysis, we simulated population dynamics of hypothetical species over 4195, 10 km 2 , sites in Switzerland, affected by elevation, x i,1 , and forest cover, x i,2 (Supporting information), standardized to zero mean and unit standard deviation (Kéry et al. 2017). We let ϵ i,s,t be independently, identically, normally distributed with mean 0 and standard deviation, σ. Site-specific carrying capacity, k i , was set to maximum abundance, k max , times the proportion of the site covered by forest. In sites with k i < 5, we set carrying capacity to zero, to represent too little habitat for a persistent, long-term population. We set initial abundances in sites where the species would persist in the absence of stochasticity (sites with l i > 1) to specified proportions of carrying capacity. We randomly selected 5% of sites where forest cover was high enough to yield carrying capacity above 10 individuals, but where survival was too low to maintain a long term viable population (sites with l i < 1 ) and set their initial adult abundances to 10 individuals, to represent invaded sink populations. There were 200 time-steps in the simulations.
For the sensitivity analysis (Table 1), maximum adult and juvenile survival, ψ 1 and ψ 0 , were assigned random values uniformly drawn between 0.01 and 0.99. This wide range includes slow-growing, long-lived species and fast-growing, short-lived species. The linear effect of forest cover on survival, α 1,s,1 , was randomly drawn from a uniform distribution from 0 to 3. To produce ecologically sensible, yet wide ranges for the effect of elevation on survival, we drew the quadratic elevation effect, α 2,s,1 , from a uniform distribution from −3 to 0, and then also drew a preferred elevation, v, uniformly over the entire elevation range in the data, and chose the linear elevation effect to maximize survival at this preferred elevation, namely, α 1,s,1 = −2vα 2,s,1 . A quadratic factor of zero represented species that could survive across all observed elevations equally, whereas −3 was for species that could only tolerate a narrow range of elevations.
We considered three scenarios for maximum viable offspring at low population densities, f max , 1) positively correlated with survival, 2) negatively correlated with survival and 3) fixed across sites. For the fixed case, in each of the 1000 parameter sets, f max was randomly drawn from a uniform distribution ranging between the lowest possible number such that the species would be expected to persist in at least 5% of sites (i.e. l i > 1), and the largest possible number for which carrying capacity was guaranteed to be a stable equilibrium at the most favorable site, given ψ 1 , ψ 0 and the effects of environmental variables (which were drawn first). The last constraint simply eliminated the possibility of chaotic and unstable, periodic dynamics, and was determined through standard linear stability analysis (Strogatz 1994) techniques (Supporting information). In the correlated case, f max was different at each site according to the environmental variables at that site. This was achieved by drawing a number between the maximum and minimum values for f max described above, for each site, using a beta distribution, with beta distribution parameters as a function of environmental variables. This made f max more likely to be high in sites with high survival, and low in sites with low survival (see the section 'Environmentally driven fecundity scenarios' in Supporting information for details). The negatively correlated scenario was achieved similarly (Supporting information).
The standard deviations of environmental stochasticity and the dispersal proportion were uniformly randomly generated on (0, 0.5) and (0, 0.05), respectively, to represent wide ranges for the types of species an ecologist would consider fitting a deterministic demographic model without dispersal. Maximum carrying capacity across all sites, k max , was also uniformly distributed.
We considered three different scenarios for how initial population sizes were distributed through space: 1) fixed initial abundances at all sites where the species was expected to persist were varied factorially with each parameter combination, using 51 values between 5% and 135% of local carrying capacity; 2) initial abundance set at 25, 50 and 75% of the carrying capacity in a fixed proportion of sites and carrying capacity in the other sites (varied across all possible proportions); and 3) uniformly distributed abundance, with 46 mean values from 52.5% of carrying capacity (corresponding to a lower bound of 5%) to 100% of carrying capacity.

Simulated field sampling
We assumed that simulated population dynamics represented the true population, and sampled from this population by * Chosen to maximize survival at a uniform randomly chosen preferred elevation, over all possible elevations in the data, given a random quadratic elevation effect, drawn from the range in the row below.
simulating a demographer using common field-sampling methods (Zurell et al. 2010). Sampling occurred for ten time-steps from the start of the simulation with dispersal turned off, to simplify the analysis and mimic a situation where the researcher can account for the origin of individuals in the site. The virtual ecologist randomly chose n s sites where the species was present at the beginning of the simulation and then counted the number of surviving juveniles (new adults), surviving adults and new juveniles, at each sampled site, over ten time-steps. While this is a standard approach (Lavine et al. 2002), an alternative, but more laborious and computationally expensive, approach, would model individual organisms and track a sampled subset of these individuals. Tracking individuals is advantageous if one wants to quantify individual variability in demographic processes, but this was not a focus of our study. Also, considering we were analyzing nearly five-billion time series, computational efficiency was required to make sure results were general across species. We set n s = 200 sites, representing a highly optimistic, but realistic sample size. For example, previously, DDMs have used 138 sites (Merow et al. 2014). We selected a high value because the purpose of the study was to identify species for which DDMs could generate useful predictions given high-quality data, but we also tested DDM accuracy for scenarios with 50, 30, 25 and 20 sampled sites. We set the length of demographic surveys to two years, but we also tested survey lengths of three, five, 10 and 20 yr.

Distribution models
The demographic distribution model (DDM) assumed that the population dynamics in site i were governed by a twostage matrix population model. While a variety of densityindependent demographic models have been used to build DDMs in the literature, including integral projection models (Merow et al. 2014(Merow et al. , 2017 and matrix population models (Buckley et al. 2010), we chose a matrix approach for both the simulation and fitted DDM because it is the simplest and most computationally efficient model that maintains the essential demographic features of structured population dynamics. The fitted model was, was performed using a statistical model. From the simulated field sampling of demographic data, we computed the observed per-capita fecundity, number of surviving adults and juveniles over the sampling period. These quantities were uniquely determined because we turned off dispersal during the period of demographic sampling. This procedure created a vector of observed juveniles survived, adults survived and fecundity, each with the number of entries equal to the number of sites. The functions f 0 , f 1 and f , were then estimated using generalized additive models, function 'gam' in R (Wood 2017). The generalized additive models assumed binomially distributed counts of surviving adults and juveniles and Poisson distributed total offspring with a rate parameter equal to an estimated parameter, based on environmental predictors, times the number of adults at the site. The estimated parameter was therefore expected percapita fecundity. We restricted total offspring predictions to the range of observed values to avoid issues extrapolating beyond the data, as is common in distribution modelling (Stohlgren et al. 2011, Owens et al. 2013.
The DDMs predicted a unique matrix, A i , for every site based on the environmental variables at that site. We used the predicted A i to calculate the long-term population growth rate, λ i , by computing A i 's leading eigenvalue. Following standard practice for DDMs (Merow et al. 2014(Merow et al. , 2017, we interpreted λ i as a measure of persistence, where λ i < 1 predicted eventual extinction and λ i > 1 predicted long-term persistence at a given site. We then compared the predicted λ i values to the presence of the species, 200-time steps after demographic sampling, to determine whether DDM predictions of persistence were correlated with long-term persistence at a site. Note that, throughout the paper, we refer to λ i as long-term population growth rate from the fitted DDM, whereas l i is the expected population growth rate from a deterministic version of the true model used to simulate the data. We also compared how accurately the DDMs predicted occupancy in comparison to correlative species distribution models (SDMs). SDMs were generalized additive models, predicting the probability of species' presence given presence/absence data and environmental variable values, at the n s sampled sites, at the end of demographic sampling. In cases where the generalized additive models did not converge (less than 0.1 percent of scenarios), for both the DDM and SDM, we used generalized linear models with similarly distributed error. To calculate the prediction accuracy of the SDM we considered sites to be predicted present when the modeled probability of presence was greater than 0.5.

Results
For the baseline scenario, with all parameters set to intermediate values, and initial population sizes in each site set to 10% of local carrying capacity, the DDM performed well. For 95.1% of the 3518 sites where the population went extinct (light grey in Fig. 2a), the DDM predicted λ i < 1 (Fig. 2b). For the 1397 sites where the species was present at the end of the simulation (green in Fig. 2a), the DDM predicted λ i > 1 in 99.7% of these sites. The lack of red pixels in Fig. 2c denotes the 0.3% of present sites where the DDM incorrectly predicted λ i < 1.
In the same baseline scenario, but with initial populations at carrying capacity during the start of demographic sampling, the DDM over-predicted extinction ( Fig. 2d-f ). For the 1415 sites where the species was present at the end of the simulation, the DDM predicted λ i > 1 at only 39.0% of the sites. On the other hand, out of the 3500 sites where the population went extinct (light grey in Fig. 2d), the DDM correctly predicted λ i < 1, 97.3% of the time (Fig. 2d). This means the DDM predicted the status of absent sites well regardless of population density at demographically sampled sites.
To summarize, in the baseline scenario, DDMs predicted present sites accurately if initial density during sampling was close to zero, and inaccurately for initial density at carrying capacity. However, for what initial density in sampled sites, between 10% and 100% of carrying capacity, do predictions cease to be accurate at present sites? To identify the critical population density to achieve a specified target percentage of correctly predicted present sites, we ran the above simulation with initial densities of 5-135% of carrying capacity (Fig. 3) and computed the proportion of present sites where the DDM predicted λ i > 1 for each initial density (black dots in Fig. 3). The black dots in Fig. 3 formed a clear monotonic decreasing pattern, and we fit a smooth curve to these data (curve in Fig. 3, see Supporting information for details on curve fitting methods). We then computed the critical population abundance as the intersection of this curve with the specified target prediction accuracy (Fig. 3). A critical initial population size of 94.2% of carrying capacity was required to achieve a prediction accuracy of 80% in present sites (circle in Fig. 3). The critical population abundance needed to correctly predict 90% of present sites was 90.1% of carrying capacity (square in Fig. 3). Unlike true presence predictions, correctly predicting absent sites was not strongly related to population density, which we discuss further in the results of the sensitivity analysis.
To demonstrate that Fig. 3 was not an artefact of the baseline parameterization chosen, we then proceeded to calculate the critical population abundance, as in Fig. 3, for every parameter combination, fecundity function and correlation scenario in the sensitivity analysis. We set the true presence accuracy threshold equal to 80% accuracy, a round number close to the 79% prediction accuracy reported from past empirical DDMs (Merow et al. 2014). A histogram of critical population abundances across all parameter combinations tested, under both fecundity functions and all three correlation scenarios, shows that high critical initial population sizes (80-95% of carrying capacity) at demographically Figure 2. Maps of occupancy and predicted population growth rate, λ, from the demographic distribution model (DDM) of a virtual species in Switzerland, given demographic data sampled from sites with low (top row) and high (bottom row) population densities, showing an over prediction of extinction when demographic data come from locations at carrying capacity. (a, d) Map of occupancy at the end of the simulation, (b, e) predicted population growth rate λ from the DDM and (c, f ) sites where present populations at the end of the simulation are incorrectly predicted by the DDM to be absent. The initial population sizes at sampled sites were at 10% of carrying capacity in row 1 (a-c) and at 100% of carrying capacity in row 2 (d-f ), for the baseline parameterization. When demographic samples were conducted at sites where the populations were at 10% of carrying capacity, the DDM correctly predicted all present sites as present, whereas the DDM only correctly predicted 56.1% of present sites when demographic sampling occurred at carrying capacity. sampled sites were the most common (Fig. 4). This means that DDMs predicted presence accurately even when abundances at sampled sites were intermediate rather than small. However, in the fixed fecundity scenario, when maximum fecundity was not spatially correlated with survival, the DDMs did not accurately predict occupancy at present sites for a few parameter combinations, even when sites had small population sizes during sampling (small leftmost bar in the middle column of Fig. 4).
In the case where populations were at 50% of carrying capacity at the start of demographic sampling DDMs had higher prediction accuracy at present sites than correlative species distribution models (SDMs) in both the correlated (Fig. 5a) and negatively correlated (Supporting information) fecundity scenarios. However, SDMs predicted locations where the population was absent more accurately ( Fig. 5b and Supporting information). The trade-off in improved accuracy at present and absent sites for DDMs and SDMs, respectively, occurred in over 98 percent of the simulations (points in the lower-right, grey region in Fig. 5c and Supporting information). SDMs improved total accuracy over DDMs more frequently if survival was correlated with fecundity (more points below the 1-1 line in Fig. 5c). However, the DDM improved overall accuracy if survival and fecundity were negatively correlated (more points above the 1-1 line in Supporting information). In the negatively correlated fecundity scenario, the accuracy of both methods declined compared to the correlated scenario (see more left bars in the histograms in Supporting information than Fig. 5a-b). However, the relative improvement of DDMs over SDMs was due to major decreases in SDM accuracy at present sites under negatively correlated fecundity (see the leftward shift of pink bars in Supporting information compared to Fig. 5a).
All of the results discussed thus far assumed that populations were at a specified density, relative to carrying capacity, at all sites used to build the DDM. If only a portion of sites started at a specified density, perturbed below carrying capacity, while others started at carrying capacity, predictions worsened (Supporting information). For example, if fecundity was logistic, 25-61 percent of sites supplying demographic data needed to be perturbed from carrying capacity in order to achieve 80 percent prediction accuracy in 90 percent of the simulations (Supporting information) across the three fecundity correlation structures and three perturbation magnitudes tested. The results were qualitatively similar for Beverton-Holt fecundity, but with a higher proportion of sites that needed to be perturbed (34-75% of sites) to achieve the same accuracy (Supporting information). In the case where initial population size was uniformly distributed at each site, 80 percent prediction accuracy was achieved if mean initial densities exceeded 71% of carrying capacity, across all scenarios (Supporting information).
In general, given 200 sampled sites, across all scenarios, if mean initial abundance, averaged across sites, was 70% of carrying capacity or less, a density-independent DDM predicted present sites with at least 80% accuracy, in at least 90% of the simulations (Supporting information and Fig. 6a). However, demographic data from fewer than 200 sites, meant more sites had to be perturbed below carrying capacity during demographic sampling to achieve a desired DDM prediction accuracy. For example, for logistic fecundity, positively correlated with survival and populations at 50% of carrying capacity in perturbed sites, the DDM required 25, 61 and 87% of sites to be perturbed below carrying capacity, given 200, 50 and 30 sampled sites, respectively (Fig. 6a-c). Even if all populations were at 50% of carrying capacity during the start of demographic sampling, it was impossible to guarantee 80% accuracy in 90% of the simulations, if there were 25 sampled sites or fewer (Fig. 6d-e). Additionally, DDMs built with long-term monitoring data (e.g. 10 time-steps or greater) provided less accurate predictions (Supporting information) despite increased sample sizes, because sampled populations frequently saturated at carrying capacity during data collection.

Discussion
Using 1.5 million simulations of hypothetical species' rangedynamics, we evaluated when deterministic, density-independent, demographic distribution models (DDMs) accurately predicted species' distributions. While DDM predictions were biased towards species absence in simulations where Figure 3. The probability that the DDM correctly predicted species' presence at present sites declined as population size, relative to carrying capacity, increased at sampled sites. Each black dot is the percent of occupied sites at the end of a single simulation where the DDM predicted λ i > 1, given an initial population size, during demographic sampling, specified by the x-axis. The red line is a fitted smooth monotonic curve to these data. The green square and blue circle on the x-axis are the critical initial population sizes (90.1%, and 94.2% of carrying capacity, respectively) required, during demographic sampling, to achieve a 90% and 80% chance of correctly predicting occupied sites at the end of the simulation. All parameters were set to their baseline, and the fecundity function was logistic. Absent sites were always predicted absent by the DDM with 96-100% accuracy regardless of initial population abundance in sampled sites and hence are not displayed here. data used to build the DDM came from sites with populations near carrying capacity, our comprehensive simulations support the following generality: if mean initial population size is less than 50% of carrying capacity, averaged across > 25 sites, where demographic data is available, a density-independent DDM will predict present sites accurately. When this condition was satisfied, DDMs outperformed correlative species distribution models (SDMs) at accurately predicting present sites, but, not absent sites. Our results suggest that density-independent DDMs may be useful for predicting species ranges for some taxa and applications (Briscoe et al. 2019), despite obvious limitations (Ellner et al. 2016). Often species' have multiple locations available for sampling with populations well below carrying capacity (Haak 2000, Williams et al. 2011, and there are 11 species of plants and animals with matrix population models built from data at > 25 sites, already available in the global, open-access databases COMPADRE andCOMADRE (Salguero-Gómez et al. 2015, 2016).
One particularly important application of density-independent DDMs is invasive species risk mapping (Merow et al. 2017), an application where traditional SDM approaches have been criticized (Liu et al. 2020). Our work confirms that this is potentially an appropriate use of DDMs because invasive species, at the start of an invasion, are typically well below carrying capacity (Ramula et al. 2008, Davis 2009, Burns et al. 2013. Additionally, because successful invaders often invade multiple locations across wide geographic ranges, they are also species for which demographic data replicated across geographically distant and climatically dissimilar sites can be collected (Marshall 2015). Our simulations suggest more than 25 such sites are required for accurate predictions. While this will be feasible for many invasive species (Merow et al. 2017), DDMs will not be useful for mapping Figure 4. Histogram of critical population size at the start of demographic sampling required for the DDM to achieve 80% prediction accuracy at present sites, across 1000 randomly drawn parameter combinations. Critical population density (initial abundance as a proportion of carrying capacity) was typically between 0.75 and 0.95. The top and bottom rows are for Logistic and Beverton-Holt negative density-dependent fecundity functions. Maximum fecundity, at low population densities, is positively and negatively correlated with survival in the left and right columns. In the middle column, maximum fecundity is fixed across the landscape. As long as both survival and fecundity were correlated with environmental variables, DDMs achieved an 80% prediction accuracy in nearly all simulations. Note that critical population size means the population must be at or lower than this population size, to achieve the accuracy threshold.
invasion risk for species already abundant throughout their entire invaded ranges or those whose populations start small but quickly saturate to carrying capacity during demographic data collection.
DDMs may also be appropriate for predicting threatened species ranges, because threatened populations are often at low densities (IUCN 2012). Since DDMs tie these predictions to mechanisms of decline and growth, they may provide insight into which management actions will maximize species persistence (Briscoe et al. 2019). However, our results identify a key limitation of using DDMs for threatened species management. While DDMs accurately predicted presence at sites where the species persisted, correlative species distribution models SDMs outperformed DDMs at sites where the species was absent in over 95 percent of the simulations. Therefore, our results suggest that DDMs may be more appropriate for applications where identifying present sites correctly is more important than identifying absent sites. For example, in invasion risking mapping, predicting a local invasion at a site where an invasion fails to occur is a more tolerable error than missing the location of a future invasion. In this case, present site predictions are more important and a DDM will be appropriate, if demographic data from low density sites are available. In contrast, a manager looking for a site to release a threatened species, to establish a new population (often called a species 'translocation'), would not want to select a site where the species will go extinct. Therefore, when identifying translocation sites, predicting absences accurately is very important, and DDMs may be inappropriate. Our DDMs possibly overpredicted presence in sites where the species went extinct because they were deterministic and therefore did not predict extinctions caused by demographic stochasticity. Therefore, Figure 5. DDM and SDM prediction accuracy at (a) present and (b) absent sites and (c) difference in prediction accuracy between SDMs and DDMs at present sites (horizontal axis) and absent sites (vertical axis), given initial densities at 50% of carrying capacity. Positive values indicate that DDMs have higher prediction accuracy. In (c) each point corresponds to a single randomly sampled parameter set. Almost all points fall in the lower right quadrant, corresponding to parameter sets where DDMs more accurately predicted present sites, and SDMs more accurately predicted absent sites. Points above the red one-to-one line correspond to parameter sets where a DDM's improved prediction accuracy at present sites is higher than the DDM's decreased prediction accuracy at absent sites. Fecundity was logistic, and positively correlated with survival.
applications for which correct predictions in unsuitable habitat are more important than predictions in suitable habitat require a stochastic DDM, or a coupled niche-population model, to account for demographic stochasticity explicitly (Keith et al. 2008, Fordham et al. 2013, or an SDM, which can account for extinctions from demographic stochasticity implicitly.
Perhaps the biggest obstacle for wide-scale use of densityindependent DDMs is that, for many species, demographic data are sampled primarily at sites where the species is already abundant (Quintana-Ascencio et al. 2018, Fournier et al. 2019. Field ecologists do not typically risk designing randomized, labor-intensive, demographic studies at sites where populations are so small that the ecologists may not even find individuals to sample. However, there is a small subset of the demography literature focused on populations at the edge of species ranges, where population density is often small (Sexton et al. 2009, Eckhart et al. 2011, Pironon et al. 2017. Our results show that demographic data from such low-density sites are highly valuable for building accurate densityindependent DDMs. There are three major caveats behind our approach to identifying guidelines for when to use density-independent DDMs. First, we simulated range dynamics using a simple model with several underlying assumptions, such as common pool dispersal, random/fixed initial population densities across sites, density dependence in fecundity and demographic rates influenced by life stages rather than continuous traits (Easterling et al. 2000). Second, we assumed no systematic environmental change (e.g. climate change or deforestation). Lastly, the DDM used a model that closely matched the stochastic model employed to simulate the data. Some of these choices may have affected the relative performance of DDMs versus SDMs. For example, DDM performance may have improved, relative to SDMs, if we included systematic, environmental change, allowing estimated vital rates to change through time with environmental drivers (Evans et al. 2016). And the relative accuracy of DDMs might have declined if the DDM model did not closely match the simulation model. However, the main result of our paper, that density-independent DDMs will only be useful when demographic data are collected from areas where populations are below carrying capacity is likely robust to all of these caveats.
For modelers who wish to predict species ranges using process-explicit models for species with data collected from populations near carrying capacity, one option is to model density dependence explicitly. Simulation studies have found that models including carrying-capacity (Pagel andSchurr 2012, Schurr et al. 2012) can outperform density-independent methods when predicting species occurrence (Zurell et al. 2016). And there are a few examples of empirically-driven density-dependent demographic models (Vanderwel et al. 2013, García-Callejas et al. 2016, Pagel et al. 2020). These models, which incorporate density dependence explicitly are likely required to predict transient dynamics and project species abundances (rather than just presence absence). Unfortunately, fitting density-dependent models requires, not only data at multiple sites, across multiple environmental conditions and tracking multiple life stages, but also requires replication across populations at different densities. This may be impractical to obtain in many scenarios. Here, we have demonstrated that ignoring density dependence when predicting species ranges from demographic models is a practical first step and likely appropriate in several situations.

Data availability statement
This is a simulation study. All R scripts to reproduce the analysis can be found in the Supporting information. They are self-contained and do not use external data.
Funding -This work was supported by an ARC Centre of Excellence in Environmental Decisions workshop and ARC Discovery Project (DP180101852). MHH was supported by an ARC DECRA Figure 6. Percent of present sites correctly predicted by the DDM as a function of the percentage of sites perturbed 50% below carrying capacity given data from (a) 200, (b) 50, (c) 30, (d) 25 and (e) 20, sites respectively. Dark and light shaded regions are 80 and 95% confidence intervals, respectively. Black circles are the critical percentage of sites that must be perturbed below carrying capacity during demographic sampling to achieve 80% prediction accuracy or higher at present sites in 90% of the simulations. As the number of sampled sites decreases, a higher percentage of sites need to be below carrying capacity to achieve the desired accuracy. If there were 25 sampled sites, or fewer, it was not possible to achieve 80% prediction accuracy, in 90% of the simulations (d, e). Fecundity was logistic, and maximum fecundity was positively correlated with survival.