Simple statistical models can be sufficient for testing hypotheses with population time‐series data

Abstract Time‐series data offer wide‐ranging opportunities to test hypotheses about the physical and biological factors that influence species abundances. Although sophisticated models have been developed and applied to analyze abundance time series, they require information about species detectability that is often unavailable. We propose that in many cases, simpler models are adequate for testing hypotheses. We consider three relatively simple regression models for time series, using simulated and empirical (fish and mammal) datasets. Model A is a conventional generalized linear model of abundance, model B adds a temporal autoregressive term, and model C uses an estimate of population growth rate as a response variable, with the option of including a term for density dependence. All models can be fit using Bayesian and non‐Bayesian methods. Simulation results demonstrated that model C tended to have greater support for long‐lived, lower‐fecundity organisms (K life‐history strategists), while model A, the simplest, tended to be supported for shorter‐lived, high‐fecundity organisms (r life‐history strategists). Analysis of real‐world fish and mammal datasets found that models A, B, and C each enjoyed support for at least some species, but sometimes yielded different insights. In particular, model C indicated effects of predictor variables that were not evident in analyses with models A and B. Bayesian and frequentist models yielded similar parameter estimates and performance. We conclude that relatively simple models are useful for testing hypotheses about the factors that influence abundance in time‐series data, and can be appropriate choices for datasets that lack the information needed to fit more complicated models. When feasible, we advise fitting datasets with multiple models because they can provide complementary information.


| INTRODUCTION
Understanding factors that govern population size through time is a central theme in ecology, with a rich history of inquiry that spans theoretical and mathematical (Hassell, 1975; May, 1975; Turchin, 1990) to empirical and applied approaches (Beissinger & McCullough, 2002; Morris & Doak, 2002). Over the past decade, advances in state-space models have woven together a number of these different research lineages. For example, efforts have successfully incorporated density-dependent population growth mechanisms into data-driven statistical models of population time series while accounting for imperfect detection (Dail & Madsen, 2011; Hostetler & Chandler, 2015; Kanno et al., 2015; Zipkin et al., 2014).
Model extensions account for excess zeroes due to immigration/emigration (Hostetler & Chandler, 2015) and allow simultaneous analysis of multiple populations, which facilitates viability analysis for less intensively sampled populations (Leasure et al., 2019; Wenger et al., 2017). Mark-recapture models have likewise been extended to hierarchical models in which demographic processes are the focus (Link & Barker, 2005), joint models of interacting species (Yackulic et al., 2018), and integrated population viability models (Saunders et al., 2018), among others.
Two fundamental challenges characterize these recent modeling advances: (1) they are data intensive, generally requiring additional sampling effort to estimate observation error, and (2) they are structurally complex, which puts them beyond the reach of many practitioners. The first point is the most important because many existing time series datasets lack the information needed to fit an observation model, rendering such approaches infeasible. However, the complexity of the modeling can be a barrier even when all requisite data are available. Most such models must be fit using custom-coded Bayesian methods, often requiring weeks to months of development and troubleshooting. With large datasets, they may require considerable computational time to fit a single model, although recent advances have reduced this time (e.g., Yackulic et al., 2020). Much of this complexity is a necessary result of incorporating observation and sampling models, which are essential for obtaining unbiased estimates of true abundance and population viability (Freckleton et al., 2006; Hobbs & Hooten, 2015).
However, there are many applications where incorporating observation and sampling models is not essential, and for which simpler models may provide useful insights. One such application, which is our focus here, is testing ecological hypotheses to explain changes in species abundance as a function of abiotic or biotic covariates. In this case, it is not necessary to know the true population abundance or the observation error, as long as the observation errors are homogeneous, or nearly so. Most importantly, the observation error cannot be correlated strongly with a predictor variable of interest.
For example, if one wishes to test whether individual counts through time are a function of temperature, temperature must not strongly influence detection. If this assumption can be met, then a simple model structure may yield useful insights. This is fortunate because, as mentioned above, many existing population time-series datasets lack replicates or other auxiliary data with which to properly fit observation models (e.g., repeat sampling, multiple observers, or mark-recapture data), but nevertheless contain information potentially useful for testing hypothesized drivers of population dynamics.
The number of such datasets has greatly increased in recent decades (Comte et al., 2021; Dornelas et al., 2018).
Most population time series have some degree of temporal autocorrelation, meaning that the abundance at any point in time is dependent on one or more previous time steps (Barker & Sauer, 1992; Tuljapurkar & Haridas, 2006). This presents a challenge for testing hypotheses to explain abundance through time because abundance may be high despite unfavorable environmental conditions if it was even higher in a previous time step, or low despite favorable environmental conditions if it was even lower in a previous time step.
Conversely, negative density dependence can cause populations to decline when abundances are high or increase when abundances are low, regardless of any environmental influence. Addressing these nuisances may be necessary for testing hypothesized drivers of population change.
We explore a range of regression models that differ mainly in how they account for temporal autocorrelation. At one end of the spectrum is a traditional generalized linear modeling (GLM) approach in which abundance at every time step is assumed to be independent of previous time steps. This simple model would likely be most suitable for highly fecund, short-lived species (i.e., r life-history strategists) whose populations undergo large fluctuations with low temporal autocorrelation. We refer to this as model A. At the other end of the spectrum, model C uses the difference in abundance between time steps (which can be interpreted as the population growth rate) as the response variable, an approach more appropriate for long-lived species with low fecundity (i.e., K life-history strategists). In between these extremes, model B supplements the abundance model with a temporal autoregressive term.

We describe the three models (Section 2) and evaluate them using simulated population data to test how they perform for species with different life-history characteristics (Section 3). We then apply the models to two case studies using empirical data (Sections 4 and 5).

TAXONOMY CLASSIFICATION: Applied ecology, Conservation ecology, Population ecology

The first case study uses a freshwater fish dataset to test hypotheses of associations between river flow conditions and abundance over time, and the second case study uses a small mammal dataset to test hypotheses of population response to precipitation and fire regimes.
Finally, we discuss the results and provide recommendations based on the model comparisons (Section 6). Throughout, our perspective is pragmatic rather than theoretical: we wish to identify models that are useful for testing hypotheses to understand the change in abundance through time. Our hope is that this study provides useful guidance to ecologists and managers who are interested in testing hypotheses using existing time-series datasets, particularly those datasets that lack information for fitting more complicated models.

| Model A. Generalized linear mixed model of abundance
For all models, we assume a dataset of counts of individuals (N s,t ) at one or more sites (s) at two or more points in time (t), with at least one candidate covariate (X 1s,t ) to explain variation in counts in space and/or time (in this section, we index the covariate by space and time, but in our examples, it is indexed only by time, with one exception in example 2). Because we are modeling count data, we use a generalized linear mixed model (GLMM) in which stochasticity is treated as conditionally Poisson (potentially with overdispersion) or negative binomially distributed. The simplest model, which we call "model A," assumes no latent temporal autocorrelation in abundance after accounting for fixed effects. This is an overdispersed Poisson GLMM. If there are multiple sites, latent random effects ε s,t may not be fully independent, as sites may differ in mean abundances even after accounting for covariates. Therefore, a random intercept for site identity will usually be necessary. Random slopes for covariates may also be considered. This model is very similar to the basic Bayesian model for time-series analysis at multiple sites presented by Kéry and Schaub (2012), which has been widely applied in ecological analyses. It can be fitted with common non-Bayesian statistical packages or by Bayesian methods.
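The displayed equation for model A (Equation 1) did not survive extraction; a plausible reconstruction, consistent with the description above (Poisson counts with a log link and a normally distributed overdispersion term ε s,t ), is:

```latex
N_{s,t} \sim \operatorname{Poisson}(\lambda_{s,t}), \qquad
\log(\lambda_{s,t}) = \beta_0 + \beta_1 X_{1s,t} + \varepsilon_{s,t}, \qquad
\varepsilon_{s,t} \sim \operatorname{Normal}(0,\ \sigma^2)
```

A site random intercept, when needed, would add a term α s to the linear predictor.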

| Model B. Generalized linear mixed model of abundance with autoregressive errors
Model B accounts for latent autoregressive dependence via the parameter ρ in Equation 2 below. Most commonly, the autoregressive dependence is assumed to be Markov, i.e., dependent only on the previous time step (an AR1 model), although different dependence structures are possible.
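The display for Equation 2 is likewise missing; a plausible form of the autoregressive error structure, under the same Poisson/log-link setup as model A, is:

```latex
\log(\lambda_{s,t}) = \beta_0 + \beta_1 X_{1s,t} + \varepsilon_{s,t}, \qquad
\varepsilon_{s,t} \sim \operatorname{Normal}(\rho\,\varepsilon_{s,t-1},\ \sigma^2)
```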
This can be coded with relative ease in Bayesian software such as JAGS (Plummer, 2003) or Stan (Stan Development Team, 2020), or it can be fit with the R packages glmmTMB (Brooks et al., 2017) and brms (Bürkner, 2017). One consequence of adding the autoregressive term is that this model requires more temporally complete data than model A because at least two consecutive time steps of data are required to estimate the AR1 error term. However, missing count data can be imputed if Bayesian methods are used or a custom model is written. The model implicitly assumes that time steps are equal.

| Model C. Generalized linear model of growth rate
Model C (Equation 3) characterizes the change in abundance through time, i.e., the population growth rate.
This model cannot be fit in this form using conventional non-Bayesian regression packages (although see Equation 5 for a reformulation), but can be estimated with Bayesian methods. We can include an autoregressive abundance term (on the original scale, not logged) to serve as a density dependence term (Hobbs & Hooten, 2015; Equation 4).
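The displays for Equations 3 and 4 were also lost in extraction; plausible reconstructions, with the lagged log-abundance entering as an offset so that the remaining linear predictor models the log growth rate, are:

```latex
% Equation 3 (model C):
N_{s,t} \sim \operatorname{Poisson}(\lambda_{s,t}), \qquad
\log(\lambda_{s,t}) = \log(N_{s,t-1}) + \beta_0 + \beta_1 X_{1s,t}

% Equation 4 (model C with density dependence on the original scale):
\log(\lambda_{s,t}) = \log(N_{s,t-1}) + \beta_0 + \beta_1 X_{1s,t} + \beta_2 N_{s,t-1}
```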
Coefficient β 2 will usually take on a negative value, representing negative density dependence, i.e., the reduction in growth rate as N increases. The intercept in this model (β 0 ) can now be interpreted as the intrinsic growth rate, or the growth rate when N is close to 0. This model formulation is a form of the Ricker model, with carrying capacity calculated as K = −β 0 /β 2 , after accounting for exogenous influences on population change (β 1 X 1s,t ). For simplicity, we use Equation 3 for model C in the simulations and examples in this study, but we return to the topic of density dependence in Section 6.
Like model B, model C implicitly assumes that time steps are approximately equal, and is not appropriate for datasets with many missing data points unless using Bayesian methods or a custom model that allows for imputation. Unlike models A and B, a random effect for site identity may not be necessary because differences in mean abundances among sites are accounted for by modeling changes in abundance rather than absolute abundance. However, if some sites have higher or lower long-term growth rates than can be explained by covariates, random effects may be required. Random slopes may also be considered if covariates affect the growth rate differently due to context-dependent processes that vary in space.
Although this model cannot be fit using conventional frequentist statistical packages, an approximation using conventional frequentist regression methods is possible if the dataset contains no zeros, or if 1 is added to all counts to preclude zero values. We rearrange the model to bring the lagged abundance term to the left side and omit the Poisson stochasticity (Equation 5).
In the previously presented models, the data used are the abundances, but here the data are the log of the ratio of abundance in time t to abundance in time t − 1, which is equivalent to log(N s,t ) − log(N s,t−1 ). This makes it possible to fit the model using any linear regression software.
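A plausible form of the rearranged model (Equation 5), matching the log-ratio response described above, is:

```latex
\log\!\left( \frac{N_{s,t}}{N_{s,t-1}} \right)
= \beta_0 + \beta_1 X_{1s,t} + \varepsilon_{s,t}, \qquad
\varepsilon_{s,t} \sim \operatorname{Normal}(0,\ \sigma^2)
```

In this form the response is continuous, so any ordinary linear regression routine can fit it.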
Models of similar form have been used in ecological time-series analysis previously, although often with a coupled observation model (e.g., Williams et al., 2003). We illustrate this approach in the case study presented in Section 4.

| SIMULATIONS
We simulated population time series to compare the performance of the Bayesian and non-Bayesian versions of the three models when fitted to datasets representing populations that varied by fecundity and survival. Our simulations were based on a matrix modeling approach that accounted for life span, reproductive age, juvenile survival rate, adult survival rate, fecundity (technically, this is fecundity multiplied by egg-to-juvenile survival), carrying capacity, initial population size, and length of simulation ( Figure 1).
We assumed that juvenile survival (S j ) was affected by a single time-varying environmental variable (env1), adult survival (S a ) was affected by a different time-varying environmental variable (env2), and that S j and S a did not covary. These environmental variables were drawn from normal distributions (one draw per time step, t) with a mean of zero and were added to S j and S a on the logit scale, after which the variables were back-transformed to probabilities. Survival was calculated independently for each year class as a binomial process based on the applicable (juvenile or adult) survival probability. Recruitment was modeled as a Poisson process based on the number of reproductive adults and fecundity. Density dependence was incorporated into the juvenile survival term by multiplying S j by 1 − (N t−1 /K), where K is the carrying capacity. All code is provided with the data in the Zenodo archive.
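The simulation loop described above can be sketched compactly. The following is our own minimal re-implementation, not the archived code: function and parameter names are ours, defaults roughly follow Scenario 1 below, and a normal approximation replaces the exact Poisson recruitment draw for brevity.

```python
import math
import random


def _logit(p):
    return math.log(p / (1.0 - p))


def _inv_logit(x):
    return 1.0 / (1.0 + math.exp(-x))


def _binomial(rng, n, p):
    # number of survivors among n individuals, each surviving with probability p
    return sum(rng.random() < p for _ in range(n))


def simulate_population(years=50, lifespan=3, fecundity=4.0,
                        s_juv=0.25, s_adult=0.4, k_cap=1000,
                        n_init=300, env_sd=0.5, seed=1):
    """Return the adult count in each of `years` time steps for an
    age-structured population with environmentally varying survival."""
    rng = random.Random(seed)
    ages = [n_init // lifespan] * lifespan  # index 0 = juveniles
    adult_counts = []
    for _ in range(years):
        n_total = sum(ages)
        # env1 perturbs juvenile survival, env2 adult survival (logit scale)
        sj = _inv_logit(_logit(s_juv) + rng.gauss(0.0, env_sd))
        sa = _inv_logit(_logit(s_adult) + rng.gauss(0.0, env_sd))
        # density dependence: juvenile survival scaled by 1 - N/K
        sj *= max(0.0, 1.0 - n_total / k_cap)
        # each age class survives binomially; the oldest class dies
        survived = [_binomial(rng, ages[i], sj if i == 0 else sa)
                    for i in range(lifespan - 1)]
        # recruitment ~ Poisson(fecundity * adults); normal approximation
        lam = fecundity * sum(ages[1:])
        recruits = max(0, round(rng.gauss(lam, math.sqrt(lam)))) if lam > 0 else 0
        ages = [recruits] + survived
        adult_counts.append(sum(ages[1:]))
    return adult_counts
```

Calling `simulate_population()` with the Scenario 2 settings (lifespan 12, fecundity 0.5, S j = 0.5, S a = 0.9, K = 500) yields the smoother, more autocorrelated trajectories expected for a K life-history strategist.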
In our first round of simulations, we ran a large number of iterations representing two scenarios. Scenario 1 specified high fecundity (4 juveniles per adult), low survival (S j = 0.25, S a = 0.4), a carrying capacity of 1000, and a lifespan of 3 years. Scenario 2 specified low fecundity (0.5 juveniles per adult), high survival (S j = 0.5, S a = 0.9), a carrying capacity of 500, and a lifespan of 12 years. To test our models, we summed the adult population at each time step and modeled N as a function of env1 and env2 using each of the three model structures. We predicted that models A and B would be favored for Scenario 1 (a population of a high-fecundity, short-lived species with low temporal autocorrelation), and that model C would be favored for Scenario 2 (a population of a low-fecundity, long-lived species with high temporal autocorrelation), with model B in second place. For the Bayesian models, we ran 1000 simulations of 50 years each for both scenarios using runjags (Denwood, 2016) and JAGS (Plummer, 2003). For each simulation, we ran four chains for a burn-in period that varied by the model (5000 for model A; 20,000 for model B; and 10,000 for model C, based on tests to ensure convergence) followed by a 20,000-iteration sampling period.
For the non-Bayesian models, we ran 1000 simulations of 50 years each for both scenarios using glmmTMB (Brooks et al., 2017). We considered numerous indicators of model performance but determined that most were inappropriate for comparison between Bayesian and non-Bayesian models that have different response variables (e.g., likelihood-based information criteria such as AIC or BIC are not an option). We elected to compare models based on the squared correlation between model predictions and actual values (a pseudo-R 2 ), a simple but admittedly imperfect metric because it does not consider model complexity or out-of-sample performance.
In all cases, model predictions only included fixed effects, not error terms.
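The two performance metrics used here and in the second round of simulations are simple to compute; a minimal sketch (helper names are ours):

```python
def pseudo_r2(observed, predicted):
    """Squared Pearson correlation between predictions and observations."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def mape(observed, predicted):
    """Mean absolute percent error; observed values must be nonzero."""
    errors = [abs((o - p) / o) for o, p in zip(observed, predicted)]
    return 100.0 * sum(errors) / len(errors)
```

As noted above, the pseudo-R2 rewards any linear association between predictions and observations and does not penalize model complexity, which is why the out-of-sample MAPE is used for model selection in the second round.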
We found that models A and B had better average performance than model C for scenario 1, consistent with predictions (Table 1).
Nevertheless, model C was the best-supported model for about a quarter of the datasets. Bayesian and non-Bayesian models had nearly identical average performance, but for non-Bayesian models, model B had a performance that was consistently just slightly lower than model A, and therefore was rarely selected as best overall.
However, the actual performance difference between models A and B was trivially small.

We ran a second round of simulations in which we compared models based on predictor variables that were strongly correlated with abundance (i.e., env1 and env2, as used in the first round of simulations) with a weakly correlated predictor variable (env3) that was based on the sum of env1 and env2 but had substantial random noise added. For this comparison, we only used the three non-Bayesian models and evaluated them based on their ability to predict abundance in the final year (i.e., the mean absolute percent error for year 50), which was withheld from model fitting. This also provided an out-of-sample estimate of model error. We ran 1000 iterations for each of the same two scenarios used in the first round. We hypothesized that the "strong" model would be selected the great majority of the time in all cases. However, our results indicated that the "weak" model was selected almost half the time in some cases, and at least a third of the time in all cases (Table 2). This was true in spite of the fact that mean error rates were higher for the "weak" models in both scenarios. This serves as a reminder to be cautious in interpreting the results of an analysis with a single time series (even a 49-year time series) as providing strong evidence in favor of one hypothesis over another. By random chance, a weak variable can have a tighter correlation with abundance than a strong variable in any individual dataset. Unless random variability is quite low, multiple datasets are needed to have high confidence in the outcomes of a test of two competing, correlated predictor variables.
Across our simulations, we found that all three models produced similar parameter estimates in most cases, although model A tended to have lower standard errors on parameters than models B and C.
We consider the standard errors of model A to be biased low since this model assumes independence of annual samples, an assumption we know is not met in Scenario 2. In these simulations, we cannot evaluate the accuracy of parameter estimates against known values because the simulation model is substantially more complicated than any of the three fitting models. Nevertheless, we have found these simulations to be a useful tool for evaluating model behavior and performance. The simulations we report here are just a few of those that are possible, and we encourage users to adapt the supplied simulation code to conduct other comparisons relevant to particular applications.

| CASE STUDY 1. FLOW ECOLOGY OF SHOAL FISHES IN THE ETOWAH RIVER, GEORGIA, UNITED STATES
Aquatic organisms that have evolved in riverine ecosystems are assumed to be adapted to a flow regime (Lytle & Poff, 2004; Poff et al., 1997). Although researchers have demonstrated that flow conditions influence the abundances of riverine organisms, species-specific or trait-specific models of organismal response to interannual variability in flows are still largely lacking. It has been suggested that these flow ecology questions could be better answered with the use of population rates (e.g., growth rate) rather than states (e.g., abundance) as response variables in time-series data analyses (Poff, 2018; Tonkin et al., 2019; Wheeler et al., 2018). Model C can be viewed as a rate model, whereas models A and B are repeated-state models (i.e., models of abundances observed repeatedly through time).

TABLE 1 Predictions and results of simulations of the three models under two scenarios. The "% each model selected as best" indicates the frequency with which each model had the lowest mean absolute percent error (MAPE) in 1000 random model runs. The "pseudo-R 2 " is the squared correlation between model predictions and actual values.

TABLE 2 Results of simulations to test a "strong" model (with predictors directly correlated with abundance) versus a "weak" model (with predictors indirectly correlated with abundance and added noise). The "% of times the 'strong' model selected as best" indicates the frequency with which the strong model had lower mean absolute percent error (MAPE) than the weak model, in 1000 random model runs. The "mean error rate" indicates the average MAPE for the strong and weak models in each scenario.
We used a long-term dataset of fish counts from the Etowah River in Georgia, United States. We proposed four hypotheses regarding how the abundances of fish species respond to flow:

1. Populations decline in years of exceptionally high summer flows due to direct mortality of eggs and young of the year, which reduces total abundance (Harvey, 1987; Humphries et al., 1999).
2. Populations increase in years following exceptionally high summer flows due to the scouring of fine sediment that increases the productivity of the system in the following year (Cattanéo et al., 2001). Alternatively, populations could rise due to a density-dependent response to declines, or to observation/sampling error that mimics density dependence (Freckleton et al., 2006); it may not be possible to disentangle these mechanisms.
3. Populations decline in years with exceptionally low summer flows.
4. Populations decline in years following exceptionally low summer flows (a lagged low-flow effect).

To represent these conditions, we calculated the number of days with flows above a high-flow percentile threshold (high-flow days, HFD t ) and below a low-flow percentile threshold (low-flow days, LFD t ) for the summer, which we defined as June through September. We used the same flow metric values for each site within a year (i.e., flow metrics vary by time but not space).
For each species, we fit the Bayesian versions of models A-C using runjags (Denwood, 2016) and JAGS (Plummer, 2003). We tested all hypotheses simultaneously by including fixed effects for HFD t , HFD t−1 , LFD t , and LFD t−1 . We also included a term for mean discharge on the day of sampling (Q), hypothesizing that capture efficiency would be negatively related to stage height (this term was only moderately correlated with other flow variables; Pearson's r < .5). Correlations among other predictor variables also were low (Pearson r = .42 or less). For consistency, we included a random intercept for site in all models. All predictor variables were standardized by subtracting the mean and dividing by the standard deviation.
We specified vague priors for all parameters. We ran four chains for a 30,000-iteration burn-in period (including a 2000-iteration adaptation) followed by a 100,000-iteration sampling period with a thinning factor of 10, for a total of 40,000 samples included in the posterior parameter estimates. We determined convergence based on the Brooks-Gelman-Rubin diagnostic (R-hat <1.1). When necessary, model runs were extended to achieve convergence. We calculated the deviance information criterion (DIC) as an indicator of the relative support for each set of three models for each species.
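For readers unfamiliar with the diagnostic, the Brooks-Gelman-Rubin statistic compares between-chain and within-chain variance for a single parameter; a minimal sketch (function name ours, without the split-chain refinement used by some software):

```python
import math


def gelman_rubin(chains):
    """Potential scale reduction factor (R-hat) for m chains of length n;
    values near 1 suggest convergence (the text uses R-hat < 1.1)."""
    m, n = len(chains), len(chains[0])
    means = [sum(c) / n for c in chains]
    grand_mean = sum(means) / m
    # between-chain variance B and mean within-chain variance W
    b = n / (m - 1) * sum((mu - grand_mean) ** 2 for mu in means)
    w = sum(sum((x - mu) ** 2 for x in c) / (n - 1)
            for c, mu in zip(chains, means)) / m
    var_hat = (n - 1) / n * w + b / n
    return math.sqrt(var_hat / w)
```

Chains sampling the same distribution give R-hat near 1; chains stuck in different regions inflate the between-chain term and push R-hat well above the 1.1 cutoff.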
We also calculated a pseudo-R 2 as the squared correlation between marginal (i.e., without random effects) model predictions and observations. Code is provided with the data in the Zenodo archive.
All algorithms converged. We found that, based on DIC, models A and B had very similar performance; the slightly better likelihood of model B was balanced by its slightly greater complexity ( Table 3).
Model C had similar support to models A and B for most species, although it had substantially lower support for the speckled madtom and the blackbanded darter. Based on pseudo-R 2 , model C had higher support than models A and B for all species except the Coosa chub.
We found mixed support for our hypotheses (Table 3). Hypothesis 4 (negative effect of lagged low flows) was moderately supported, with generally negative parameter estimates, although these were imprecisely estimated for half of the species. Support for Hypothesis 4 was generally more evident with model C than with other models. All species except the Coosa madtom had the expected negative parameter estimate for discharge on the day of sampling. Broadly speaking, parameter estimates were quite similar between models A and B for a given species, but model C tended to have parameter estimates that differed from the other two models. We explore this in Section 6.
We also ran non-Bayesian versions of all models for all species.
The results, reported in the Supporting Information (Table SI1), were very similar to the Bayesian model results in most cases, although the standard errors on parameter estimates tended to be substantially smaller for model A.

| CASE STUDY 2. SMALL MAMMALS IN KONZA PRAIRIE, KANSAS, UNITED STATES
Temporal changes in the community composition of small mammals have been linked to interannual climate variation, especially precipitation (Bruckerhoff et al., 2020; Cárdenas et al., 2021; Thibault et al., 2010). Mammal species representing different feeding guilds are hypothesized to respond differently to variations in precipitation given the distinct effects of rainfall on different mammal food resources (Reed et al., 2006b), but these predictions have not been fully tested. We proposed four hypotheses about how small mammal populations would fluctuate with precipitation and burning regime:

1. Populations of herbivores increase in years with higher precipitation, as more rainfall generates increased forage biomass (Reed et al., 2006a, 2007).
2. Populations of insectivores and omnivores also increase in high-rainfall years, as insect food sources are positively associated with increases in plant biomass (Kaufman et al., 2012; Reed et al., 2006a).
3. Granivore populations decline in years with high precipitation because increased plant litter negatively affects seed predation (Reed et al., 2006b).
4. Populations decline with increasing time since prescribed burning.

Code is provided with the data in the Zenodo archive.
All algorithms converged. The top-ranked model formulation varied among taxa (Table 4). As with the fish analysis, models A and B diverged little in model fit for five of six species based on DIC (i.e., ΔDIC ≤ 2). These two models performed better than model C (i.e., ΔDIC > 2) for four species (Elliot's short-tailed shrew, white-footed mouse, deer mouse, and Western harvest mouse), whereas models B and C performed similarly and were the top-ranked models for the prairie vole. All three models performed similarly in the case of the hispid cotton rat. Pseudo-R 2 based strictly on fixed effects varied considerably among taxa, and also generally indicated greater support for models A and B.
We found mixed support for our hypotheses (Table 4). Annual precipitation was associated with increases in counts of one of the two herbivores (prairie vole; Hypothesis 1) as well as the insectivore (Elliot's short-tailed shrew; Hypothesis 2), but not for the two omnivores (white-footed mouse and deer mouse; Hypothesis 2), for which the effect of precipitation did not differ from zero based on Bayesian credible intervals (BCI). The lack of a strong rainfall effect in the deer mouse was consistent with a short-term demographic analysis of this taxon at the Konza Prairie Biological Station (KPBS), in which the highest population growth rate occurred during a year of average precipitation, as opposed to a very wet or dry year (Reed et al., 2007). We expected the granivorous Western harvest mouse to decline with precipitation (Hypothesis 3), but this species instead showed a small positive association with rainfall in the top models. Species had mixed responses to time since burning, with only the deer mouse exhibiting the hypothesized negative response in the top-ranked models, compared to two species with positive responses based on BCI (prairie vole and white-footed mouse).

Parameter estimates for the predictors of interest in models A and B were similar: they had the same sign (i.e., positive, negative, or zero according to BCI) in 11 of 12 cases, and differed in magnitude by <25% in most cases. Parameter estimates from model C, however, deviated in several instances from those of A and B, particularly for the time-since-burning covariate. Results of the non-Bayesian models (Table SI2) were qualitatively similar to the Bayesian model results in most cases (e.g., most parameter estimates have the same sign and similar magnitude). As for Case Study 1, we found that model A had much smaller standard errors on parameter estimates than models B and C.

TABLE 4 Parameter estimates (posterior means and standard deviations) and performance scores for the Bayesian versions of the three model types for six mammal species. "Precipitation" and "Time since burning" are variables representing the amount of precipitation in the preceding year and the number of years since prescribed burns occurred at the site. Superscripts indicate support for hypotheses of the corresponding number (i.e., parameter estimates with the expected sign and 90% credible intervals that do not overlap zero). R 2 is the squared correlation between conditional model predictions and observations (a pseudo-R 2 ).

| DISCUSSION
Time-series datasets of species abundances have become more widely available in recent decades (Comte et al., 2021; Dornelas et al., 2018), offering increasing opportunities to test hypotheses to explain variation in species abundances through time. Our objective was to evaluate the potential for simple statistical models to test such hypotheses.
We found that even the simplest model (model A) was useful for detecting relationships between predictors and abundances, particularly for r-strategists (shorter-lived and higher-fecundity species), although a model with autoregressive errors (model B) and a model of growth rate (model C) were sometimes preferred in both simulations and case studies. In simulations of time series of K-strategists (longer-lived and lower-fecundity species), models B and C were usually preferred. The species in both of our case studies were closer to r-strategists than K-strategists, but we nevertheless found that model C was preferred for several species. Based on our results, our general recommendation is to test all three models, rather than trying to determine a priori which is likely to be the best supported based on species traits or characteristics of the time series. After the data are formatted, all three models are straightforward to implement. Of course, depending on the research question, there could be reasons to select one model over another (e.g., if questions concern population growth rate, model C would be preferred).

The fact that model C revealed relationships not evident in models A and B is an important result and is a further rationale to fit multiple models. Such discrepancies among models can provide insight into the mechanisms giving rise to observed population patterns, and (potentially) evidence in favor of one or more alternative hypotheses (Yen et al., 2021). Case Study 1 provides an illustration: evidence for hypothesis 2 (populations increase in years following exceptionally high flows) is only provided by model C. Models A and B do not reveal this relationship because they use abundance rather than growth rate as a response variable, and abundances tend to be low in years following exceptionally high flows because populations are still recovering from the even lower levels in the preceding year. The fact that populations tend to rebound in the year after the high flow is only evident when using the growth rate as the response variable (similar patterns of lagged high-flow effects are well documented in the fish ecology literature; Gido et al., 2013; Humphries et al., 2008; Rogosch et al., 2019).
Because the three models assess different quantities, we advise fitting multiple models and interpreting the outputs of each.
Model C has one advantage over the other models: because it measures change, abundance is entirely factored out of the regression equation. This implies the model can be used to simultaneously analyze multiple datasets collected with different sampling methods, as long as such methods have been used consistently within each dataset. Of course, the same predictor variables must be available in each dataset to make such comparisons, and the assumption of homogeneity of observation error still applies. Setting up such a multidataset model is straightforward with the non-Bayesian version of model C, although it requires more work with the Bayesian version.
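As a sketch of such a multidataset analysis (an invented simulation of ours: the catchabilities, effect size, and survey setup are assumptions, not from the case studies), growth rates computed within each survey can be pooled into a single regression because the method-specific scaling cancels:

```python
import numpy as np

rng = np.random.default_rng(1)
T = 25

def observe(x, q):
    """Latent abundance driven by predictor x, observed with catchability q.

    The 0.4 effect size and random-walk dynamics are invented for this sketch.
    """
    logn = 5.0 + np.concatenate([[0.0], np.cumsum(0.4 * x[1:])])
    return np.maximum(rng.poisson(q * np.exp(logn)), 1)

# Two surveys of the same species with different gear: counts sit on very
# different scales, but each method is consistent within its own dataset.
x1, x2 = rng.normal(size=T), rng.normal(size=T)
counts1, counts2 = observe(x1, q=0.1), observe(x2, q=0.5)

# A constant catchability q cancels in log(q*N[t+1]) - log(q*N[t]), so
# growth rates from both surveys can be pooled in one model-C regression.
r = np.concatenate([np.diff(np.log(counts1)), np.diff(np.log(counts2))])
xr = np.concatenate([x1[1:], x2[1:]])
coef = np.linalg.lstsq(np.column_stack([np.ones_like(xr), xr]), r, rcond=None)[0]
```

The same predictor must be available for both surveys, mirroring the requirement noted above; only the abundance scales may differ.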
On the other hand, model C will be less useful for datasets that (1) lack sufficient year-to-year variability or (2) have spatial rather than temporal variability in the relationships of interest. The first case is perhaps best illustrated with an extreme example: consider a hypothetical 10-year dataset in which environmental conditions are poor for 5 years running and then good for 5 years running, and assume that the population responds by remaining at a steadily low level for 5 years and then at a steadily high level for 5 years. Because such a dataset contains little information about change, model C will perform poorly, whereas models A and B should perform quite well. This is a toy example, but the more general point is that to test hypotheses about the factors governing population increases and decreases, the dataset must exhibit sufficient dynamics in both the population response and the predictors. Three mammal taxa in case study 2 illustrate the second case, in which model C may be inappropriate. For these species (prairie vole, white-footed mouse, and deer mouse), sites with above-average time-since-burning values had variable but typically higher abundances (or lower, in the case of deer mouse) than sites with more frequent burning. Models A and B identified this correlation between time since burning and overall abundance, while model C did not, because the greatest variability in both abundance and the predictor occurred across space rather than through time. Therefore, for datasets in which effects of interest vary spatially, model C may not be as useful as models A and B.
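The step-change example in case (1) can be made numerically explicit (a toy illustration with invented numbers; models A and C are reduced here to simple least-squares fits without the error structures described above):

```python
import numpy as np

# Case (1): conditions are poor for 5 years then good for 5, and the
# population tracks them as a clean step change with no other dynamics.
env = np.array([0.0] * 5 + [1.0] * 5)
logn = np.array([2.0] * 5 + [4.0] * 5)

def ols_slope(x, y):
    """Slope from a least-squares fit of y on an intercept and x."""
    X = np.column_stack([np.ones(len(y)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Model A (abundance as response) recovers the effect exactly.
slope_A = ols_slope(env, logn)

# Model C (growth rate as response) has only one nonzero growth observation
# out of nine, so its estimate rests on a single transition year and would
# be dominated by noise in any real dataset.
r = np.diff(logn)  # [0, 0, 0, 0, 2, 0, 0, 0, 0]
slope_C = ols_slope(env[1:], r)
```
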
Datasets with many gaps present a problem for the non-Bayesian versions of models B and C when conventional regression packages are used. Fitting models B and C requires data from both the current and the prior time step, so one missing value can effectively eliminate two observations from the time series (i.e., 2 years of data). Where sampling is conducted every other year or every few years, as is common in wildlife and fish population monitoring designs (Urquhart & Kincaid, 1999), model A becomes the only available choice if conventional frequentist methods are used. The Bayesian versions of the models allow imputation of missing data, although we have not tested their performance when gaps are frequent or long.
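To make the bookkeeping concrete, here is a small sketch (with hypothetical counts) of how a single missed survey removes two lagged pairs from models B and C:

```python
import numpy as np

# A 10-year count series with one missing survey (year index 5).
counts = np.array([12.0, 15, 14, 18, 21, np.nan, 25, 23, 27, 30])
logn = np.log(counts)

# Models B and C pair each year with the previous one; a pair is usable
# only if both years were surveyed.
prev, curr = logn[:-1], logn[1:]
usable = ~np.isnan(prev) & ~np.isnan(curr)

# The single gap invalidates two of the nine year-to-year pairs:
# (year 4 -> 5) and (year 5 -> 6).
n_pairs = int(usable.sum())
```

With strictly biennial sampling, every adjacent pair would contain a missing year, leaving models B and C with no usable observations under conventional frequentist fitting.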
The simplicity of model A is appealing, but outputs from this model must be interpreted carefully because the assumption of independent errors is unlikely to be met in many time-series datasets.
Sampling error alone can generate apparent density dependence in count data (Freckleton et al., 2006). Through random variation, a particular observed count may be exceptionally high or low, but such anomalies are unlikely to be followed by a similarly extreme count at the next time step. As a result, populations will appear to increase rapidly after an unusual dip and to decline after an unusual peak, a pattern resembling negative density dependence. However, unless density dependence is the focus of the analysis, it matters little whether the density-dependence term represents true or apparent density dependence: the term serves the dual purpose of accounting for both, with the caveat that the unique contributions of each may not be identifiable.
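This apparent-density-dependence effect is easy to demonstrate by simulation (our own sketch: a constant true population observed with Poisson sampling error, then fit with the density-dependence regression of model C):

```python
import numpy as np

rng = np.random.default_rng(42)

# True abundance is constant: every bit of year-to-year variation in the
# counts comes from observation (sampling) error.
T = 200
true_N = 50.0
counts = np.maximum(rng.poisson(true_N, size=T), 1)
logn = np.log(counts)

# Regress growth rate on the current log-abundance, i.e., the
# density-dependence term of model C.
r = np.diff(logn)
X = np.column_stack([np.ones(T - 1), logn[:-1]])
slope = np.linalg.lstsq(X, r, rcond=None)[0][1]

# The fitted slope is strongly negative even though the true population
# never changes: unusually high counts tend to be followed by lower ones,
# mimicking negative density dependence.
```
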
The choice between non-Bayesian and Bayesian methods is mostly a practical one. We have observed that the non-Bayesian and Bayesian versions of all three models generally yield very similar parameter estimates. We find the non-Bayesian model C somewhat unsatisfying in that it requires the ad hoc solution of adding 1 to counts of 0. Nevertheless, our simulations indicated that the non-Bayesian version of model C can detect relationships between predictors and response where they are present, and so is likely to be sufficient for many purposes. One analytical strategy could be to first test models using the non-Bayesian methods, and to invest the effort in the Bayesian models only if the results are sufficiently interesting or the application sufficiently important. This two-step process may be favored when there are large numbers of hypotheses to be tested, because preliminary screening can be conducted much more quickly with the non-Bayesian models.
Long-term abundance datasets were once a scarce resource, but they have been quietly accumulating in recent decades. There are now many hundreds, likely thousands, of such datasets, although many lack the auxiliary data necessary to parameterize a linked observation model. We were thus motivated to explore relatively simple, accessible methods that can be used to make valid inferences from time-series datasets without an observation model. Despite the limitations potentially imposed by unknown observation biases, we believe that unlocking the information in these datasets could contribute greatly to ecological understanding. We hope that our results will encourage others to use the models presented here as starting points for investigating environmental effects on population dynamics.

ACKNOWLEDGMENTS
We thank the National Science Foundation (DEB-1440484) for supporting the collection of, and access to, the mammal community data. We thank Jay Ver Hoef for a helpful critique of an early version of this manuscript, and two anonymous reviewers for their constructive comments.

CONFLICT OF INTEREST
The authors declare no conflicts of interest.