Random year intercepts in mixed models help to assess uncertainties in insect population trends

An increasing number of studies is investigating insect population trends based on time series data. However, the available data is often subject to temporal pseudoreplication. Inter‐annual variability of environmental conditions and strong fluctuations in insect abundances can impede reliable trend estimation. Temporal random effect structures in regression models have been proposed as solution for this issue, but remain controversial. We investigated trends in ground beetle abundance across 24 years using generalised linear mixed models. We fitted four models: A base model, a model featuring a random year intercept, a model featuring basic weather parameters, and a model featuring both random year intercept and weather parameters. We then performed a simple sensitivity analysis to assess the robustness of the four models with respect to influential years, also testing for possible spurious baseline and snapshot effects. The model structure had a significant impact on the overall magnitude of the estimated trends. However, we found almost no difference among the models in how the removal of single years (sensitivity analysis) relatively affected trend coefficients. The two models with a random year intercept yielded significantly larger confidence intervals and their p‐values were more sensitive during sensitivity analysis. Significant differences of the model with random year intercept and weather parameters to all other models suggest that the random year effects and specific weather effects are rather additive than interchangeable. We conclude that random year intercepts help to produce more reliable and cautious uncertainty measures for insect population trends. Moreover, they might help to identify influential years in sensitivity analyses more easily. We recommend random year intercepts in addition to any variables representing temporally variable environmental conditions, such as weather variables.


INTRODUCTION
The number of annually published studies that examine temporal changes in insect populations has more than doubled in the last decade (with 79 studies in 2012 and 180 in 2022, based on a search in Web of Science using the search string 'insect* AND population AND trend' performed on March 13th 2023). Some of these studies reported dramatic declines in insect abundance, biomass or diversity (e.g., Hallmann et al., 2017;Lister & Garcia, 2018;Seibold et al., 2019) triggering public interest in this topic and causing widespread concern.
Although more and more long-term data sets are becoming available (e.g., Van Klink et al., 2020) there yet remains the need for longitudinal long-term data to better understand the spatial and temporal patterns of insect trends and gain insights about their drivers (Montgomery et al., 2021;Mupepele et al., 2019;Thomas et al., 2019). Most longterm monitoring schemes started well after 1980 (Brooks et al., 2012;Hallmann et al., 2020;Homburg et al., 2019) and data sets exceeding 40 years are still rare (Bell et al., 2020;Macgregor et al., 2019;Martins et al., 2013). This represents a considerable challenge for deriving reliable trends, because many insect populations show a strong inter-annual variability (e.g., Aldercotte et al., 2022;Günther & Assmann, 2004;Pollard, 1991), penalising trend estimations. Daskalova et al. (2021) recently observed that studies investigating shorter time series tended to find the strongest trends-both, positive and negative. They attributed this heterogeneity to the increased impact of exceptional years in terms of environmental conditions and insect occurrences. Didham et al. (2020) also addressed this issue and emphasised the strong leverage of years close to the start or the end of a time series causing so-called false baseline effects or snapshot effects. Daskalova et al. (2021) therefore advocated the use of random year effects in generalised linear mixed models (GLMMs) to gain more reliable trend estimates as they help to account for inter-annual variability and temporal pseudoreplication. The issue of pseudoreplication in ecological studies working with experimental or observational data has been known and discussed for almost four decades (Hurlbert, 1984). Modern implementations of mixed model frameworks allow addressing this problem by including random effect structures (Chaves, 2010). Previously, Werner et al. (2020) had used random year intercepts in GLMMs to account for temporal pseudoreplication in vegetation data, and similar approaches to investigate trends have been utilised for bird or insect population data using generalised additive mixed models (GAMMs) (Bell et al., 2020;Knape, 2016). In a reply to Daskalova et al. (2021), Seibold et al. (2021) questioned the use of random year intercepts due to a lack of independence between single years and propose that specific environmental parameters (e.g., weather variables) are more suitable to account for varying environmental conditions and annual fluctuations in insect occurrences.
We used a 24-year data set of ground beetle (Coleoptera: Carabidae) abundances from Eberswalde, Germany, to explore trends estimated with GLMMs featuring different combinations of random year intercepts and weather variables. We tested the different model structures for their robustness towards influential years by performing a simple sensitivity analysis.

METHODS
We used data on ground beetle sampling abundances (in the following simply referred to as abundances) from a previously unpublished data set. Ground beetles were caught from 1999 to 2022, between the beginning of May and the end of July each year, on 13 forest plots within one forest site (with an extension of approximately 1 Â 1 km) close to Eberswalde, Germany (52.820000, 13.790000). Not all plots were sampled each year and we only included data from plots, which were sampled in 3 years or more.
We used pitfall traps consisting of 400 mL glass jars positioned in a piece of PVC pipe (see Boetzl et al., 2018), formaldehyde as trapping fluid and metal roofs. There were four pitfall traps at each plot, organised as either square or transect with approximately 20 m between the traps. The layout of traps (square or transect) remained consistent for each plot throughout all sampled years.
During the sampling period, the traps were emptied 3 times, usually after 4 weeks (28 days), however, the exact sampling length occasionally varied. Prior to data analysis, we excluded all data from traps that had been disturbed by factors such as rain, wild animals or vandalism. Abundances represent the sum of all individuals of species belonging to the Carabidae family that were caught in one pitfall trap during one sampling interval. We excluded data of one species (Nebria brevicollis, Fabricius 1792), which is known to display extreme fluctuations in numbers between years (Nelemans et al., 1989). We modelled ground beetle abundance using GLMMs of the negative-binomial family (link = log) fitted with the glmmTMB R package (glmmTMB function, Brooks et al., 2022). The simplest model (base model) featured linear terms for year (continuous) and sampling interval (factor) as well as a quadratic term for days of sampling (continuous) (see Schirmel et al., 2010) as fixed effects. Furthermore, the base model included a random intercept specific for each trap and year, as the trap numbering and their exact locations varied among years, nested within the plot. We tested for temporal auto-correlation among years with the testTemporalAutocorrelation function of the DHARMa package (Hartig, 2021), which showed no indication for any relevant temporal autocorrelation between years (Durbin-Watson test: 1.63, p-value = 0.3499). Based on this base model, we then performed a stepwise forward model selection with additional weather variables.
We derived all weather variables from publicly available data (daily mean temperature and daily sum of precipitation) recorded at a meteorological station located approximately 27 km from the study site (DWD, 2023). The mean daily temperature during sampling (as interaction with sampling interval), sum of precipitation during sampling and mean early spring temperature (March) were tested as additional variables. All of these parameters have been observed to affect ground beetle sampling abundances in earlier studies (Honěk, 1997;Irmler, 2022;Tsafack et al., 2019;Wang et al., 2014). We added each variable to the model as either linear or quadratic fixed effect resulting in different model candidates, which were then compared via AIC (note: cAIC is not available for glmmTMB, models were fitted with maximum likelihood (REML = FALSE) with all random effects remaining constant).
The variable yielding the lowest AIC was kept in the model formula, while all remaining variables were again added in the next step creating new model candidates. Finally, the base model as well as the model resulting from the model selection were each expanded by a random intercept for year (factor), which was crossed with all other random effects. This resulted in the following four model structures (we report the model formulas as R-syntax here and additionally provide mathematical equations in the Supporting Information S1): We used bootstrapping to compute 0.95 CIs: Full model CIs were bootstrapped using the ggpredict function of the ggeffects package (Lüdecke et al., 2022) within the bootMer function of the lme4 package (Bates et al., 2022) based on 1000 simulations. We used the boot-strap_model function of the parameters package (Lüdecke et al., 2023) with 120 simulations to calculate CIs for trend coefficients during the sensitivity analysis.
All analyses were performed with R (version 4.2.2, R Core Development Team, 2022). Further methodological details and the full R code can be found in the Supporting Information of this article (S1).

RESULTS AND DISCUSSION
Trend estimates for ground beetle abundance significantly differed among the four model structures (base model, random year model, weather model and combined model): The random year and the combined models, which both included a random year intercept, generally estimated trends over time to be significantly more negative than both models without random year intercepts (Figures 1, 2a-d and 3a).
In contrast, Daskalova et al. (2021)  of weather variables such as site-specific temperature and precipitation might be more suitable to account for inter-annual variation in environmental conditions and should therefore be used. However, ecosystems are complex and even well-chosen parameters might be insufficient to fully adjust for year-to-year variance in environmental conditions that affect insect communities (Daskalova et al., 2021).
We found that random year intercepts and weather variables both similarly modified trend coefficients, but patterns in CIs and p-values significantly differed between the two approaches. Moreover, signifi-  Roth et al., 2021) to uncover potential false baseline effects or snapshot effects (Didham et al., 2020;Fournier et al., 2019). Other options include the permutation of years (Aldercotte et al., 2022;Crossley et al., 2020) or the exclusion of trends of single species (Dennis et al., 2019), certain plots or whole study areas .
We conclude that random year intercepts help to account for inter-annual variation of environmental conditions, random population fluctuations and inherent temporal pseudoreplication in insect time series. Models with random year intercepts yield trend estimates with wider but more robust confidence intervals and more sensitive writingreview and editing; formal analysis. Andreas Linde: Data curation; supervision; resources; writingreview and editing.
F I G U R E 3 Results of the sensitivity analysis for the base model (1), random year model (2), weather model (3) and combined model (4)

ACKNOWLEDGEMENTS
We would like to acknowledge that the data were collected in the course of teaching activities at Eberswalde University for Sustainable Development. We therefore thank more than 1000 students who helped collecting the data between 1999 and 2022. We would also