A cloudy forecast for species distribution models: Predictive uncertainties abound for California birds after a century of climate and land‐use change

Correlative species distribution models are widely used to quantify past shifts in ranges or communities, and to predict future outcomes under ongoing global change. Practitioners confront a wide range of potentially plausible models for ecological dynamics, but most specific applications only consider a narrow set. Here, we clarify that certain model structures can embed restrictive assumptions about key sources of forecast uncertainty into an analysis. To evaluate forecast uncertainties and our ability to explain community change, we fit and compared 39 candidate multi‐ or joint species occupancy models to avian incidence data collected at 320 sites across California during the early 20th century and resurveyed a century later. We found massive (>20,000 LOOIC) differences in within‐time information criterion across models. Poorer fitting models omitting multivariate random effects predicted less variation in species richness changes and smaller contemporary communities, with considerable variation in predicted spatial patterns in richness changes across models. The top models suggested avian environmental associations changed across time, contemporary avian occupancy was influenced by previous site‐specific occupancy states, and that both latent site variables and species associations with these variables also varied over time. Collectively, our results recapitulate that simplified model assumptions not only impact predictive fit but may mask important sources of forecast uncertainty and mischaracterize the current state of system understanding when seeking to describe or project community responses to global change. We recommend that researchers seeking to make long‐term forecasts prioritize characterizing forecast uncertainty over seeking to present a single best guess. To do so reliably, we urge practitioners to employ models capable of characterizing the key sources of forecast uncertainty, where predictors, parameters and random effects may vary over time or further interact with previous occurrence states.


| INTRODUC TI ON
Changes in species distributions and assemblages are important indicators of the effects of global change.Correlative species distribution models (SDMs, Elith & Leathwick, 2009) are widely used to characterize the impacts of environmental changes on distributions and communities, and for forecasting geographic range shifts and the emergence of non-analog communities (Lawler et al., 2009;Stralberg et al., 2009).As SDM structures, algorithms, data requirements, and software are extremely variable, a wide range of model specifications have been used to describe or forecast the impacts of global change on species and communities (avian examples include Briscoe et al., 2021;Clement et al., 2016;Jones et al., 2022;Rodenhouse et al., 2008;Rushing et al., 2020;Stralberg et al., 2009).
Specifying the process component of an SDM-a mathematical depiction of the ecological processes that generate species distributionspresents a set of modeling decisions.SDMs can attribute changes in species distributions to temporal variation in predictors (the values of different environmental variables and autocovariates like a previous occupancy state that accounts for memory effects), parameters (e.g., the effects of environmental variables often measured by their slopes), or unmeasured (latent or "random") effects.However, most applications focused on describing or projecting distribution changes commonly fix certain process components a priori.For example, models might assume that variation in species occurrence over time is explained by temporal variation in environmental predictors and their effects (MacLean et al., 2018), or that variation in occurrence is explained by temporally fixed environmental effects and time-varying random effects (Rushing et al., 2020).Different model specifications can exhibit differences in inference, fit, and projections of where and how much distributions have or will change (e.g., Briscoe et al., 2021;Brodie et al., 2020;Norberg et al., 2019).
What appears less widely recognized is that modeling decisions can embed strong prior beliefs about the importance of different components of forecast uncertainty into the analysis (Dietze, 2017;Zylstra & Zipkin, 2021).These components include: (1) parameter uncertainty (i.e., estimation error); (2) environmental uncertainty, which is the degree to which a response is sensitive to environmental effects and the predictability of future environmental conditions; (3) internal uncertainty, which is the degree to which a future state (e.g., species occurrence) is sensitive to the current state (this is sometimes called a "memory effect") and the predictability of current distributions; (4) parameter variation, which includes the degree to which estimated parameters vary in unexplainable ways over time or other dimensions; and (5) process error, which is the magnitude of unexplained residual error and its predictability.For example, consider an SDM that relates species occurrence only to time-invariant effects of environmental variables.This model presupposes a system that is well-explained and relatively easy to predict (i.e., there is no structured process error, no parameter variability, and no sensitivity to previous states).It effectively assumes that parameter uncertainty and environmental uncertainty are the only sources of forecast uncertainty, and that what limits the capacity to predict future distributions is the ability to forecast future environmental conditions and estimate fixed environmental associations.These assumptions may not always be realistic (Lovell et al., 2023;Rollinson et al., 2021).Next, consider an extension of this model that allows environmental associations to vary over time, adds random effects to explain structured residual error, and permits these random effects to vary over time.The second model does not assume that the system is well-explained, nor that there is no parameter variation or temporal variation in process error.Fitting the second model may reveal that both assumptions are safe to ignore, while fitting the first model precludes discovering if either assumption is meaningful.
Different model specifications imply different processes generating distributional dynamics and lead to different inferences.Thus, decisions about model structure play a key role in understanding and responding to factors affecting distributional change.These decisions also play an important role in clarifying how well change can be explained and how precisely future change can be predicted.
Conditioning on a single model with problematic assumptions about different uncertainties may not only lead to poor forecasts and ineffective actions focused on mitigation or adaptation but also lead to a false sense of confidence in forecast reliability and action efficacy.
Here, we begin by clarifying how different sources of forecast uncertainty are linked to SDM terms, and how model structures with more restrictive assumptions are nested within broader models that account for more uncertainty.We then empirically evaluate how well distinct SDM structures fit bird incidence data and how differently they predict retrodictive change.We focus on 185 bird species at locations initially surveyed by Joseph Grinnell and colleagues during the early 20th century, and resurveyed, on average, 93 years later as part of the Grinnell Resurvey Project (Beissinger et al., 2023;Iknayan & Beissinger, 2018;MacLean et al., 2018;Socolar et al., 2017;Tingley et al., 2009Tingley et al., , 2012)).Because the data reflect avian dynamics over a century of climate and land-use change, it may be useful for predicting how the state's avian communities will change over the coming century.A chief barrier, however, is that there are many ways to specify models to describe or forecast distributional change, and previous investigations have largely conditioned on single models with distinct predictors and structures.
Therefore, we fit 39 models that vary on the basis of whether: (1) (4) modern distributions were shaped by contemporary conditions, the magnitude of change in these conditions over time, or both.
We used post-hoc inference from more complete models to evaluate which uncertainties were impactful and the degree to which different model components changed over time.We employed predictive information criterion to assess whether accounting for these uncertainties meaningfully improves model fit and compared model-specific predictions of species richness and richness change over time to assess whether modeling decisions led to markedly different predictions.We find that several sources of forecast uncertainty commonly ignored within most SDMs appear important for describing a century of change, which suggests that many empirical longer-term forecasts understate overall uncertainty.

| LINKING S DM TERMS TO PRO CE SS E S OF CHANG E AND SOURCE S OF FOREC A S T UN CERTAINT Y
We start by broadly characterizing the SDM model space and progress to show how models nested within from this general form can embed different assumptions about key sources of forecast uncertainty (Figure 1).We use a binary occupancy state to describe species distributions and index species' i and locations j.We use a t superscript to generically index time.For simplicity, we consider a time-series length of two that matches the Grinnell Resurvey data set used in our application (i.e., "historic" and "modern" survey periods denoted with h/m superscripts, respectively).The occupancy probability for species i in site j at time t is t i,j , where t = h or m and the corresponding binary occupancy state z t i,j ~ Bernoulli ( t i,j ).
Throughout, bold face is used to distinguish vectors or matrices from scalars.
Many parametric correlative SDMs used to assess or project change in species occurrence over time can be described as some variant of: where g −1 is an inverse link function (e.g., probit) that maps the linear predictor in parentheses to the 0-1 scale.Equation (1) indicates that variation in t i,j can arise through temporal variation in a speciesspecific intercept ( t i ), through temporal variation in a row-vector of predictors (environmental terms, autocovariates, or further basis expansions) at site j (X t j ), through temporal variation in a column-vector of species-specific coefficients for the predictors ( t i ), or through variation in process error t i,j . Following common practice for models used for multiple species (Dorazio & Royle, 2005;Ovaskainen et al., 2017; (1) Typology of the candidate models used to fit species incidence data.Numbers inside boxes correspond to specific models in Table 1, Appendix S2: Table S2."Memory" indicates autologistic structure with additive dependence on previous occupancy state, "Memory x environment" denotes an interaction between the previous occupancy state and other effects.Factors and loadings refer to the structure of random effects.Warton et al., 2015), we assume that species' parameters exhibit a hierarchical structure and that t i,j is expressed as t j t i , the product of a row-vector of time-specific latent site variables ( t j , often called "latent factors") and a column-vector of time-specific species coefficients for the latent site variables ( t i , often called "loadings").Nested within Equation (1) are many simpler models where terms are either assumed time-invariant or completely omitted (Figure 1).
For example, if latent factors and loadings are omitted, Equation (1) reduces to a fully time-varying multi-SDM (e.g., Dorazio et al., 2010;Dorazio & Royle, 2005).If one further holds t i constant over time, this model reduces to a multi-SDM with additive time effects.
The elements of Equation ( 1) have different interpretations and implications for explaining change.Distribution changes attributable to predictors are straightforward to interpret as the explainable effects of environmental change, and forecast uncertainty depends upon uncertainty in future environmental conditions.Changes in effects t i capture changing environmental associations, and indicate forecast sensitivity to parameter variability (i.e., uncertainty in future environmental associations).This represents an unexplainable change in the covariance between species occurrence and the environment (Dietze, 2017), which can arise from ecological heterogeneity like species plasticity or adaptation (Guisan et al., 2014) or from model misspecification (Lovell et al., 2023;Yackulic et al., 2015).

Changes in the intercept t
i also indicate sensitivity to parameter variability; these describe unexplained, unstructured changes in occurrence across time that other terms do not describe.Random effects ( t j t i ) are employed to parsimoniously approximate process error that covaries across species, and can be used to derive a residual variance-covariance matrix as t i t i ′ (Ovaskainen et al., 2017;Warton et al., 2015).These also capture unexplainable variation (Clark et al., 2014): factors t j roughly represent omitted process terms, like missing predictors or non-linearities, and loadings ( t i ) roughly represent species' associations with these terms.Temporal variation in latent factors implies that site-specific residual errors change over time, changes in species loadings imply changes in how residual errors covary across species, and temporal changes in random effects indicate that forecast uncertainty depends upon how well their future values can be predicted (Dietze, 2017).
Finally, one can extend the general model to assume that the current occupancy state depends upon the previous occupancy state by introducing z h i,j as an autocovariate within X m j (MacKenzie et al., 2003;Royle & Kéry, 2007).Imagining only a single environmental predictor and no random effects, m describes a temporally "autologistic" model (Royle & Dorazio, 2008) where β m 1,i is the effect of z h i,j on m i,j .Introducing this term also leads to state-dependent transition probabilities that are easily interpreted as ecological processes: is equivalent to the probability of persistence, and is equivalent to the probability of colonization.One could further pose an interaction between z h i,j and environmental predictors where We call these "colonization/persistence" models, where persistence probability and colonization probability = Here, the effect of previous occupancy depends on the environmental context.Both models imply sensitivity to initial (or previous) state uncertainty, such that forecast uncertainty depends on how accurately the current distribution is estimated.
As more terms are included in Equation (1) and allowed to vary over time, both model complexity (Figure 1) and forecast uncertainty increases.A model specified as t i,j = g −1 i + X t j i where X t j only consists of environmental predictors implies that forecast uncertainty is only sensitive to parameter uncertainty and environmental uncertainty.Allowing i and i to vary over time allows for potential sensitivity to parameter variation.Adding the autocovariate z h i,j to X m j allows for potential sensitivity to initial state uncertainty, while specifying interactions between z h i,j and other predictors introduces potential linkages between initial state uncertainty and environmental uncertainty.Adding random effects t j t i introduces process error and potential sensitivity to its own variation over time.
Given that different model specifications imply different processes generating distributional dynamics, how does one determine which terms and sources of uncertainty may be meaningful?
Because the model set is nested (Figure 1), one could fit a "full" model and make inference post-hoc.Alternatively, model selection methods (Hooten & Hobbs, 2015) could be employed to determine which combinations of terms have the best predictive properties for describing the system.Within-sample model selection will not identify which model has the most forecast skill using the data at hand (Dietze, 2017).But it can identify which model best predicts what has occurred, which provides insight into the important predictive uncertainties that a forecast may need to confront.
Our exploration focuses on models that are commonly pursued in SDMs focused on understanding occupancy change across time or were previously posed for specific objectives in regions of our study system (Figure 1, Table 1).Below we describe the methods used for data collection and model construction and evaluation.Most locations were surveyed multiple times within eras (range:

| Bird surveys
1-32 historic era, 1-9 modern era), typically within a single year and on consecutive days.We limited the maximum number of surveys per location and era to 5 for analysis, because the decisions leading to historic sampling effort were unknown and potential covariance between survey effort and detectability can induce bias if not accounted for.We also omitted surveys preceded by gaps of many years to limit closure violations, noting that previous analysis suggested species occurrence changed little across successive or nearly successive years (Tingley et al., 2012).While we explored models where site and year-specific differences affected availability for detection given occupancy (sensu Nichols et al., 2008), these produced essentially the same results.
Our analysis focused on 185 diurnal bird species (Appendix S2, Table S1) known to breed within the region (Bird Life International and Handbook of the Birds of the World, 2021) for which the sampling was reasonably designed.We excluded nocturnal species and some waterbirds but included diurnal raptors (Beissinger et al., 2023).We did not augment the species pool to make inference about completely unobserved species.

| Model predictors
We derived site covariates (Figure 2) from a 1-km buffer around transect survey locations, a scale that has previously provided better fit than smaller definitions (Beissinger et al., 2023) and is tractable for conservation planning.We used California's Basin Characterization Model (Flint et al., 2013) counterparts.Briefly, we derived agricultural and urban land cover from maps of irrigated lands in California published by the US Department of Agriculture (1922) and U.S. Geological Survey maps (ranging from 1906 to 1932), which were subsequently georeferenced and digitized in ArcGIS Pro 2.8.For modern land use, we derived the proportions of agricultural (Ag) and urban (Urban) land cover within each buffer from the 2016 National Land Cover Database for the modern period.For the historic period, urban land cover was defined based on a 30 m buffer around buildings and major roads (Beissinger et al., 2023;MacLean et al., 2018), and subsequently rasterized to a 30 m resolution to maintain consistency with the resolution of modern land cover products.

| SDM process models
We fit a set of temporally stratified multispecies or joint-species occupancy models (Figure 1) to estimate site-level measures of avian occupancy and species richness (Dorazio & Royle, 2005;Tobler et al., 2019).These are hierarchical models where repeated incidence data is assumed to be generated by an ecological state process and an observation process conditional on the state variable.
Our set of process models broadly follows the typology of Notation broadly follows the previous description, but with three differences employed for brevity.We denote covariates describing environmental change from the historic to modern period as Δ X j and their effects as Δ i to distinguish them from measures and effects of contemporaneous environmental conditions.We denote autologistic effects using i rather than a beta term.Similarly, for models following colonization-persistence parameterizations, we use vectors i and i (rather than m i ) to distinguish coefficients when z h i,j = 0 or z h i,j = 1, with the intercept m i and additive effect of previous occurrence implicit in these vectors.Finally, we omit a superscript if a term was assumed time-invariant.
In all models where historic covariates were used to explain variation in historic occupancy, vector X h j included Temp h j , Precip h j , FPCTemp h j , Ag h j , Urban h j , Temp 2 h j and FPCTemp 2,h j .The latter two terms describe quadratic effects.When modern occupancy was a function of modern covariates, vector X m j included a Temp m j , Precip m j , FPCTemp m j , Ag m j , Urban m j , Temp 2,m j and FPCTemp 2,m j .When modern occupancy was a function of environmental change vector Δ X j included Δ Temp j , Δ Precip j , Δ FPCTemp j , Δ Ag j , and Δ Urban j .While we initially sought to pursue quadratic effects of each term and interactions between land-use and climatic variables, the empirical distribution of the data was such that interaction terms and many quadratic terms were highly collinear with main terms (r often >.9).Moreover, modern land coverand the change in land cover were highly correlated.Thus, models where modern occurrence varied as a function of Δ X j + X m j omitted Δ Ag j and Δ Urban j .We briefly describe a subset of the models considered (see Figure 1, Table 1, and further description in Appendix S2: Table S2 for the full model set) and refer to implementations elsewhere.
Model numbers below relate to labels on Figure 1.

Time-varying intercept:
Species distributions track concurrent environmental conditions but also exhibit spatially unstructured trends over time (Fuller et al., 2016).

Autologistic:
. Species distributions track concurrent environmental conditions with an additive effect of the previous occupancy state (Miller et al., 2018).

Time-varying intercept and coefficients:
Species distributions are driven by concurrent environmental conditions, but associations may change over time and there are also spatially unstructured trends (MacLean et al., 2018;Tingley & Beissinger, 2013).

Autologistic with time-varying coefficients:
. Species distributions are associated with concurrent environmental conditions, but associations may change over time, and there is an additive effect of previous occupancy (Iknayan & Beissinger, 2020).

Colonization/Persistence:
. Species distributions are associated with concurrent environmental conditions, but associations and baseline prevalence change over time, and the effects of the modern environment depend upon the previous occurrence state (Maphisa et al., 2019).
Models 7-12 (Figure 1, Table 1, Appendix S2: Table S2) are variants of models 4, 5, and 6 using different predictors (Δ X j or Δ X j + X m j ) for modern occupancy.The chief assumptions are largely preserved, Models 13-18 omit explicit environmental predictors, using time-varying random effects to describe variation in species occupancy (Figure 1, Table 1, Appendix S2: Table S2).We consider the cases where temporal variation in random effects follows t j i or t j t i .

Time-varying factors and loadings:
. Unmeasured site characteristics vary over time, and so do species associations with these characteristics.
To the best of our knowledge, time-varying species loadings have not been previously considered, although Hui et al. (2023) pose something similar.Models 15 and 16 simply add a time-varying intercept to models 13 and 14.Extending these to also include an autologistic effect yields models 17 and 18.
The remaining models partially cross models 13 and 14 with models 1-12.Models 19-21 extend models 1, 2, and 3 to include time-varying factors and fixed species loadings.Models 22-30 are variants of models 4-12 with time-varying factors and fixed species loadings.The remaining models are variants of models 4-12 with both time-varying factors and loadings.We omit models where latent factors are constant and species loadings time-varying, and models where species coefficients are constant while loadings are time-varying.

| Observation model
We assumed the occupancy state of birds was imperfectly observed in our surveys.Incidence y over repeated surveys k = 1, 2, … K at a specific site and primary period was observed as y t i,j,k ∼ Bernoulli z t i,j p t i,j,k , where p t i,j,k denotes the probability a species was observed if present.We expected detection to vary across sites, eras, and visits as logit −1 t 0,i + t j,k where t j,k ∼ N 0, 2 detection .Thus, we posited certain species were, on average, more or less likely to be detected in one period or another.The second term follows Riecke et al. (2021) in assuming that survey efficiency varied in shared ways across species, perhaps due to differences in effort, weather, and so forth.

| Model fitting and assessment
We used Nimble (de Valpine et al., 2017) to fit models using Markov chain Monte Carlo simulation called through the R computing environment (R Core Team, 2021).We used custom ('dDynOcc') distributions provided by the "nimbleEcology" package (Goldstein et al., 2021;Ponisio et al., 2020) to specify the data likelihood for each model as a 2-state hidden Markov model marginalizing over the latent occupancy states (Gimenez et al., 2014).All MCMC simulations were performed across three chains with a burn-in of 60,000 iterations.An additional 20,000-30,000 iterations per-chain were sampled and variably thinned to a total posterior sample of 6000 per model.We sampled from additional chains when the number of effective Monte Carlo samples was low (less than several hundred) for any parameter.If models did not converge within the initial run, we used the final samples as starting values for subsequent chains of the same length, repeating as needed.We assessed model convergence visually and with standard diagnostic tests (e.g., Gelman & Rubin, 1992).We report prior specifications and other key details in Appendix S1.
We used expected log posterior predictive density (ELPD) estimated via a pareto-smoothed importance sampling approximation to leave-one-out cross-validation (Vehtari et al., 2017) in R package 'loo' (Vehtari et al., 2023) to explore how different model specifications influenced predictive fit.We treated each speciesby-site combination (59,200) as a distinct datum to be "left out".
This decision effectively evaluates a model's predictive accuracy for the occupancy trajectory of a withheld species at a given site (see further discussion in Appendix S1).Models were ranked based on their LOOIC score (LOOIC = −2 × ELPD), with the weight of support for specific models assessed using bootstrap-corrected pseudo-Bayesian modeling averaging weights (pseudoBMA+; Yao et al., 2018).
We supplement model selection in two ways.Because differences in fit to the observations do not necessarily imply differences in the predicted ecological states of interest, we derived posteriorpredictive site-and period-specific estimates of species richness for each posterior sample (simulating z t i,j ~Bernoulli( t i,j ), for each iteration and deriving richness as Rj = ∑ 185 i=1 z t i,j ) and changes in species richness across periods for a subset of models across the range of model rankings.Following suggested practices, estimates of occupancy probability used to derive richness estimates included random effects where applicable (Gelfand & Shirota, 2019;Wilkinson et al., 2021).Secondly, we used post-hoc comparisons to evaluate the degree to which species coefficients and loadings practically changed over time, focusing on model 31 because the historic and modern occupancy processes mirrored one another.

| SDM model fit
All models adequately converged, but LOOIC calculations led to pointwise pareto-K estimates >0.7 for several models, indicating instability in the approximate leave-one-out procedure for certain species-site combinations (Vehtari et al., 2017).Posterior predictive checks based on deviance residuals (Broms et al., 2016) found no evidence that any model exhibited lack of fit.The most extreme Bayesian p-values were .30for the "space-for-time" model (model 1) and 0.67 for a model ( 7) where intercept and coefficients changed over time, with modern predictors describing environmental change rather than modern conditions.All others ranged between .40 and .60.
Because omnibus checks can lack power (Conn et al., 2018), we employed further posterior-predictive assessments, simulating observations from the fitted model, and aggregating these across sites and eras (i.e., simulating the species richness observed at a site) or across species and eras (i.e., simulating the number of sites at which a species was observed).All models were capable of recreating patterns in observed extent of species occurrence, but models without latent variables and species loadings could not recreate site-by-era specific patterns in the number of site-specific species observed (Appendix S2: Figure S2).Thus, models without latent variables appeared to miss important site-structured heterogeneity.Indeed, for the basic "space-for-time" model (model 1), the number of species detected at sites in the modern era (presumably fewer than occurred) exceeded the upper 95% CI of posterior predicted richness at 15% of sites (Appendix S2: Figure S3), suggesting the model's predictions understated richness.
Consequently, we assume that diagnostic instability for models without random effects arose partially from model misspecification.
For models with random effects, diagnostic instability may have instead reflected high-leverage observations that were difficult to predict out-of-sample due to a combination of model flexibility and sparsely observed species.Although recommended practice is to directly estimate leave-one-out predictive densities when diagnostics indicate the approximate leave-on-out approach is unstable (Vehtari et al., 2017), this was computationally impractical.We proceeded cautiously based on the selection criteria, assessed sensitivity to problematic pointwise pareto-K estimates by further comparing model weights after omitting these scores, and examined resulting estimates to ensure that additional structural terms had meaningful effects.

| SDM model selection
Selection criteria highlighted a few general patterns (Figure 3, Appendix S2: Table S3).Most strikingly, any type of latent variable model-regardless of assumptions about dynamics, covariates, or even the inclusion of any explicit environmental predictors-vastly outperformed models lacking latent variables (Δ LOOIC > 10,000, Appendix S2: Table S3, Figure 3).A broad LOOIC-based rank ordering suggested models where predictors, parameters, and random effects (factors and loadings, and then factors only) were each allowed to vary over time performed best.This was followed in turn by: ( 1) with time-varying factors and loadings versus time-varying factors alone was small in the scheme of all contrasts (Δ LOOIC > 100 ), there was consistent support for the former.Off-diagonal portions of the species-to-species residual correlation matrices derived from species-loading estimates commonly exhibited only moderate residual correlation across eras (e.g., for model 31 where the set of predictors is congruent across eras, r = 0.40, 95% CI = 0.34-0.46,Appendix S2: Figure S4), suggesting meaningful changes in crossspecies covariation over time.
The choice of model parameters (i.e., covariates) and dependence assumptions also exhibited a few consistent patterns in selection criteria, although differences in fit diminished when random effects were included.First, criteria consistently indicated that autologistic models outperformed models without memory, and that colonization-persistence models generally outperformed both.Models assuming fixed environmental associations (e.g., "space-for-time") generally performed more poorly than those allowing environmental associations to change over time.Estimates from model 31 (Figure 1 and Table 1), where the predictors for historic and modern occupancy were consistent but effects were allowed to vary, suggested correlations between era-specific coefficients of 0.86 (95% CI = 0.82-0.90).This was largely driven by intercept terms and when they were excluded, the cross-era correlation in environmental associations was much weaker (r = 0.33, 0.28-0.38,Appendix S2: Figure S5).Directional inference regarding species associations with temperature and seasonal mildness were particularly prone to changing over time (Appendix S2: Table S4).
The vast majority of pseudo-BMA+ support was spread across three models regardless of whether problematic point-wise estimates of out-of-sample fit were included or censured from the comparison (Appendix S2: Table S5).The full (most complex) model-a "colonization/persistence" model where occurrence transitions were functions of modern environmental conditions, environmental change, and time-varying factors and loadings-had weight = 0.23 (all point-wise ELPD estimates used) and 0.41 (point-wise ELPD estimates censured), respectively.We used this most-parameterized model to make inference about environmental effects (Appendix S2: Table S5, Figure 3), acknowledging that our regularizing priors may understate effect uncertainty.Inspection of hyper-parameter and coefficient estimates suggested that terms associated with environmental change (Δ X) had trivial influence, so we interpret a simpler candidate (model 37) differing only in omitting these effects as preferred (pseudo-BMA+ weights = 0.50 and 0.29, respectively).The third model with non-negligible support was an autologistic (contemporary + change) model (number 36), with dynamic factors/loadings and environmental predictors (pseudo-BMA+ weights = 0.26 and 0.30, respectively).As estimates from the prior two models strongly indicate that the effect of mean annual minimum temperature depended on the previous occupancy state, we consider this model also less preferable.While we do not pursue it here, pseudo-BMA+ weights suggest that combining these three models might produce the best predictions (Appendix S2: Table S5), and parameter inference suggests that predictions might be further improved by omitting several non-explanatory environmental terms.

| Inference on drivers of change and comparing predicted changes
Based on the most parameterized model (parameter estimates and CI in Appendix S2: Table S6), bird species were less likely to colonize sites as the modern annual minimum temperature increased, more likely to colonize sites with intermediate intra-annual variability in minimum temperature, and more likely to persist at sites with less modern variability in intra-annual minimum temperature (Figure 4).
Most models synoptically estimated greater expected modern species richness and larger richness gains in California's mid-longitude regions, like the Central Valley and Central Coast, and more depauperate modern richness and richness losses in hot and highly seasonal regions within the southern Great Basin and Mojave deserts (Figure 5).Models with time-varying random effects exhibited very similar patterns in posterior-predicted species richness and richness change (r ≈ 0.90, 95% CI ± 0.10 or less for all pairwise correlations, Figure 6), regardless of the covariates employed, inclusion of site-specific memory, or assumptions about changes in environmental associations.However, models omitting random effects predicted much less variation in species richness and richness changes over time, and they tended to predict lower species richness across periods than models with random effects.Predicted species richness correlations between models without latent factors were more erratic, suggesting predictions were less robust to model structure when random effects were omitted.For example, the basic colonization-persistence and time-varying models (models 6 and 4, respectively) made very similar estimates of predicted species richness (r = 0.96), but they were weakly correlated with the "space-fortime" estimates of richness (r = 0.52 and 0.54, respectively), which yielded far smaller contemporary communities than estimates from models with time-varying environmental effects (Figures 5 and 6).
Similarly, models without latent factors were sensitive to covariate choices.For example, colonization-persistence models where transitions were related to modern environmental conditions versus the change in conditions (models 6 and 11, Appendix S2: Table S2) yielded dissimilar estimates of expected richness and change across eras (r = 0.14, Figure 6).

| DISCUSS ION
Bird surveys repeated nearly a century apart across California provided an unusual opportunity to evaluate sensitivity to different sources of forecast uncertainty over the multi-decadal periods often used for strategic planning.Models assuming that key sources of forecast uncertainty could effectively be ignored often exhibited substantially worse fit (Figure 3).Models with time-varying random effects (factors and loadings) outperformed models that attributed change to environmental predictors and parameters alone.Models with environmental associations of species that changed over time outperformed models that assumed fixed associations, such as the widely used "space-for-time" SDM, which was among the weakest performers.Autologistic models that leveraged dependence upon site occupancy nearly a century earlier outperformed models without dependence, and colonization-persistence models generally outperformed both.
While we do not expect specific results here will generalize to all applications, they support the conclusion that long-term species distribution forecasts can be sensitive to environmental uncertainty, previous state uncertainty, parameter variability, and variation in process error (Dietze, 2017;Zylstra & Zipkin, 2021).
Our objective was not to identify which model would make the best forecasts (which would require evaluation using withheld time periods or waiting for the collection of future data), but to identify which sources of forecast uncertainty were important in our system.Our results indicate a system (1) that, by virtue of being sensitive to many sources of forecast uncertainty, is difficult to forecast accurately across longer horizons, and (2) where several key drivers are not fully understood.We discuss these findings below in relationship to their implications for avian communities and effective longer-term forecasts.

| Ecological complexity, uncertainty and shifting distributions of California birds
Changes in avian distributions in California were associated with variation in measured environmental conditions, changes in environmental associations across eras that interacted with the site-specific occupancy state a century ago, and unmeasured changes both in latent site characteristics and species responses to these characteristics (Figure 3).The potential simplifying assumptions that appeared most problematic were the omission of random effects and assuming species had time-invariant environmental associations.
When models included time-varying latent factors, there was little practical difference in predictions of species richness.This suggests such random effects are useful terms for guarding against model misspecification (Brodie et al., 2020;Lany et al., 2020).
F I G U R E 5 Point estimates (posterior predictive means) of species richness during the historic era, modern era, and changes in species richness across eras for a subset of models (models 1/4/6/11 at left, models 19/31/37/38 at right, Table 1 and Appendix S2: Table S2).Note models with latent factors suggest greater variation in richness and richness change, and finer-scale spatial variability in these predictions.1 and Appendix S2: Table S2.
Predictions were more disparate when random effects were omitted and variation in occupancy attributed to environmental effects appeared limited compared to random effects based on model selection scores.Moreover, models lacking random effects fit poorly and appeared to miss important site-structured heterogeneity in the observations.This manifested as poor empirical coverage (Harris et al., 2018) and in extreme cases, like the basic space-for-time model, resulted in apparent bias in estimates of species richness, which is a potential symptom of underfitting (Kéry & Royle, 2015).Model structural uncertainty-uncertainty arising from variation in model-specific predictions-can be a major source of forecast uncertainty (Brodie et al., 2022;Dietze, 2017).
To the extent we explored it here-focusing on practical differences in species richness rather than species-specific occupancy predictions-much of this uncertainty appears linked to the poor fit of models where we assumed explicit model predictors were the sole drivers of occurrence dynamics.That is, including data-driven terms that sop up residual error (like latent variables and species loadings) appears to reduce the magnitude of structural uncertainty.
One reason that species distributions and assemblages may be difficult to forecast over multi-decadal to centurial periods is that their dynamics are influenced by a mixture of ecological processes with different characteristic rates and scales.The paleo-ecological record provides evidence that environmental conditions drive shifts in regional species pools across longer time periods of millennia via filtering (Jackson & Blois, 2015).Across broad spatial and paleoecological scales, space-for-time predictions may perform reasonably well at coarse resolutions (Adler et al., 2020;Blois et al., 2013but see Klesse et al., 2020).Over decades to a century, however, the paleoecological record suggests that community dynamics are shaped by a combination of faster local processes and slower regional processeslocal communities may undergo substantial change while regional species pools remain relatively stable (Jackson & Blois, 2015).
Results here suggest a mixture of coarser and finer generating processes drove changes in avian communities.On average, we found birds were less likely to colonize the warmest locations and less likely to persist in locations where temperature was more seasonal (Figure 4).As a result, California's warm deserts appear to have suffered broad species losses driven by environmental change operating on physiological constraints (Riddell et al., 2019; Figure 5).
Conversely, the Central Valley and portions of the central California coast have seen an increase in species richness.However, there was pronounced local variation that is potentially attributable to changing species-environment associations and changes in both unmeasured site characteristics and species responses to these changes.
Aside from California's deserts, there was substantial local variability in richness changes across much of the state (Figure 5).In sum, local species assemblages did not appear to track long-term shifts in temperature, precipitation, and land use so much as reshuffle (Princé & Zuckerberg, 2015) in ways that were difficult to predict with the historic data alone.
Several terms estimated as being important for describing avian community shifts represent unexplainable variation.The best fitting model suggests that shifting environmental associations of species may have been related to non-equilibria starting conditions, because changes in distributions were shaped by environmentally distinct colonization and persistence processes (Yackulic et al., 2015).
However, parameter variability may also arise from adaptation, plasticity, or the use of predictors that were not strong causal drivers (Guisan et al., 2014;Lovell et al., 2023).The importance of sitespecific latent variables and species-specific loadings provides a further indication that several important drivers or processes were missing within our models; based on model selection patterns, residual site-by-time variation appeared more important than residual species-by-time co-variation.These missing drivers may include environmental phenomena like recent disturbance (Cohen et al., 2020;Jones et al., 2022; but see Iknayan & Beissinger, 2018); dispersal limitation, rescue effects, or other nuances shaping colonization and persistence processes (Bled et al., 2011;Chandler et al., 2015;DiRenzo et al., 2021); and non-linear responses to environmental gradients (Pederson et al., 2019;Stralberg et al., 2009).
While changing model structure may help reduce uncertainty associated with process error or parameter variability (Dietze, 2017), it is not clear whether further process complexity would improve forecast performance and reduce overall uncertainty (Harris et al., 2018;Perretti et al., 2013;Ward et al., 2014), or instead add more constituent unknowns (e.g., environmental covariates) that need to be predicted themselves to forecast distributions (Dietze, 2017).This is an important research priority.Although phenomenological, latent variables and species loadings provide a parsimonious means to capture residual variation and improve predictions (Lany et al., 2020;Warton et al., 2015).Uncertainty in the future values of random effects and their distributional parameters appears to be a key challenge for forecasting here, but recent work has introduced different autoregressive methods to better predict residual error (Hui et al., 2023;Thorson et al., 2016).Such approaches appear promising for efforts with more temporal replication.

| Implications for SDM models and forecasts
Given the widespread interest in forecasting species distributions across time horizons spanning 50-100 years to guide conservation planning, our results provide some sobering realities for how effectively SDMs can predict the future.Nevertheless, our results also suggest some strategies that might make these tools more useful.A central finding is that focusing on one or a narrow set of models to forecast species distributions, which is a common SDM approach, is problematic.It may not only understate model structure uncertainty (Brodie et al., 2022) but can also make it impossible to uncover other potentially important sources of forecast uncertainty (Dietze, 2017;Zylstra & Zipkin, 2021).As our results indicate that many of these sources of uncertainty can be non-negligible over the long term, we urge practitioners to consider models that are equipped to discover them.
Indeed, we argue that a characterizing uncertainty in future conditions is a higher priority for long-term forecasts than seeking to present a single-best guess.The surest way to improve forecast skill (and reduce uncertainty) is to iteratively collect data and compare forecasts from different models over nearer horizons (Brodie et al., 2022;Dietze et al., 2018;Petchy et al., 2015), and we recommend practitioners consider shorter-or multiple-horizons for strategic planning.Long-term forecasts are rarely subject to evaluation, their reliability is commonly unknown, and it is scarcely practical to wait for the (distant) future to evaluate model performance and managerial outcomes (Harris et al., 2018;Wiens et al., 2009).Although longer horizons may still be useful or necessary for some planning efforts, it may be useful to codify the purpose of long-term forecasts as characterizing uncertainty in future outcomes rather than trying to present a single best guess (Boettiger, 2022;Brodie et al., 2022;Schindler & Hilborn, 2015).Such presentation can inform planning and managerial action that is more robust to a range of outcomes.
Fuller models (where predictors, parameters, and random effects may vary over time, but where future states may also depend upon current states) are better suited for characterizing the magnitude of uncertainty and its constituent sources, although within-time regularization or model selection techniques may help direct focus on the key uncertainties given the data at hand (Hooten & Hobbs, 2015).
We also acknowledge there will be cases where salient assumptions cannot be assessed (e.g., changes in species-environment association cannot be assessed because one only has collected data during one time-interval).One way to bracket the uncertainty in these latter situations is to present a set of forecasts conditional on different scenarios for how terms may change over time (e.g., time-invariant, relatively little variation over time, more pronounced variation over time), and use sensitivity analysis to assess the relative influence of different assumptions on projected outcomes.Such techniques have long been used in population viability analysis for the similar objective of forecasting population persistence (Beissinger & McCullough, 2002).
In summary, specifying and selecting SDMs plays a key role in forecasting, regardless of whether the purpose is to present a best projection of future conditions or to bracket their uncertainty.We hope results here lead to better exploration of the potential future of species distributions, and better communication of their uncertainty to target audiences.
species' occupancy was driven by the effects of climate and landuse conditions only, random (latent, unmeasured) effects only, or both; (2) avian environmental associations remained the same or changed over the past century; (3) contemporary avian distributions K E Y W O R D S avian communities, climate change, colonization/persistence, cross-validation, forecasting, imperfect detection, joint species distribution model, land use change exhibited dependence upon their distributions a century earlier; and Historic and modern avian survey data were derived primarily from previously published studies, and we describe data collection briefly below, referring readers to details inTingley et al. (2012), Socolar et al. (2017),Iknayan and Beissinger (2018), MacLean et al. (2018), and Beissinger et al. (2023).Joseph Grinnell and colleagues surveyed 320 sites across California between 1896 and 1940 (Figure2).Survey data included lists of species encountered each day and standardized counts, which we reduced to detection/non-detection data.Bird incidence, site location, and other metadata were extracted from notebooks in the archives in the Museum of Vertebrate Zoology at the University of California, Berkeley.Contemporary resurveys occurred serially by region between2003 and 2016, following a point-count protocol at 10 points TA B L E 1 Summary of the candidate models pursued to explain avian community dynamics here.
separated by 250 m along a transect.Transects were placed to represent the location and elevation of the initial surveys based on the descriptions of historic surveyors.Modern surveys began at dawn and lasted 2-3 h.At each point along the transect, we recorded all birds seen or heard during a seven-min period.Each transect was surveyed daily over three consecutive days.Bird counts from modern surveys were collated for each day across all 10 points surveyed and reduced to detection/non-detection data per day per site for modeling.
Figure 2, Appendix S2: Figure S1).It was spatially averaged across the buffers and used as an additional predictor (FPCTemp) capturing seasonal climatic mildness.MacLean et al. (2018) and Beissinger et al. (2023) detail the approach for reconstructing historic land use and creating modern

Figure 1 .
Figure1.We imagined that intercepts might be time-invariant, time-varying, or that the modern intercept might further be additively influenced by the previous occupancy state.We further supposed that environmental associations might be omitted, time-invariant (i.e., following the assumptions of space-for-time substitution), time-varying, or that modern associations might further depend on the previous occupancy state.Similarly, we presumed that random effects might be omitted, that latent factors might be time-varying while loadings are constant (implying that residual covariance between species is constant), or that both factors and loadings might be time-varying (implying that residual covariance between species changes over time).Finally, we posited that species in the contemporary period might either respond primarily to contemporary environmental conditions, the change in environmental conditions relative to the historic period (implying distributions are largely shaped by the rate of change in a non-stationary environment rather than the delineation of contemporary environmental conditions), or a combination of both.This latter specification implies modern distributions are influenced both by modern conditions and historic conditions.
except they imagine modern distributions or transitions are potentially influenced by spatial patterns of environmental change.Model 11-where colonization and persistence processes are related to environmental change between eras-is analogous to what Iknayan and Beissinger (2018) and Beissinger et al. (2023) pursued.
where predictors and random effects varied over time, but parameters (or in some cases coefficients only) were constant; (2) models with time-varying random effects but no explicit covariates, (3) models with time-varying predictors and parameters but no random effects; and (4) models with time-varying predictors but fixed parameters and no random effects.Thus, of the broader terms that might be employed to describe change, random effects were far and away the most important for predicting occurrence of withheld species at sampled sites.While the difference between modelsF I G U R E 3 Model selection resultscategorized by which components were estimated as being time-varying.Assumptions pertaining to covariates and memory depicted in the symbology.Note, left two categories omit random effects completely, third and fourth categories omit environmental predictors completely.

F
I G U R E 4 Hyper-parameter estimates for occupancy or transition probabilities based on the full model (a/b); the number of positive and negative species-specific associations by specified effect ( = .05,c).Marginal effects of mean annual minimum temperature and a functional principal component based on mean monthly minimum temperature for a 'median' species (d-f).denotes historic occupancy effects, denotes effects on persistence, and denotes effects on colonization.

F
Cross-model correlations between posterior predictions of species richness across sites during the historic period (a) and modern period (b).Correlations between posterior prediction of richness change is depicted in panel c.Note, values depicted are posterior means: uncertainty intervals are within ±0.10.Models with and without random effects (* = latent factors and species loadings) exhibit much weaker correlations.Model numbers in parenthesis follow Figure 1, Table