Forecasting the spread of an invasive forest‐defoliating insect

Biological invasions are an escalating environmental challenge due to their substantial ecological and socio‐economic consequences. Accurate near‐term forecasts of future areas occupied by an invasive species could enhance the efficiency and efficacy of invasion monitoring and management but spread forecasting models have been developed and tested for few invasive species thus far.

expenditures of over $2 billion towards management and mitigation of non-native forest pests (Aukema et al., 2011).In addition to causing economic damages, invasive species can reduce or extirpate populations of native species, spread disease, and alter food webs (Jones et al., 2022;Venette & Hutchison, 2021).The ability to accurately predict where and when an invading species is likely to spread has the potential to support early detection efforts by managers, resulting in more efficient allocations of limited resources for pre-emptive management and rapid response to biological invasions (Briscoe et al., 2019;Ibáñez et al., 2009;Thuiller et al., 2005).
Correlative species distribution models (SDMs), also called ecological niche models, have been the primary tool for modelling species' range dynamics over recent decades in ecology and conservation science (Araújo & Peterson, 2012;Elith & Leathwick, 2009).This suite of techniques leverages the correlations between species' occurrence and environmental variables to predict the spatial suitability of areas, and therefore have been applied to predicting invasive distributions.However, the limitations of correlative SDMs, including coarse-grained predictions and the exclusion of complex interactions between biotic and abiotic factors, meant that even the most sophisticated invasion management programs reacted to, rather than anticipated, invasive spread (Sharov et al., 2002;Tobin et al., 2012).In response to these limitations, recent advances in SDM methods now integrate demography, physiological tolerances, and dispersal in their modelling frameworks (Singer et al., 2016).
These models are often called mechanistic or process-based SDMs and comprise a diverse range of approaches that include techniques such as dynamic occupancy models (Kéry et al., 2013), eco-physiological models (Kearney & Porter, 2009), and coupled SDMpopulation approaches (Keith et al., 2008).
Mechanistic SDMs have been relatively successful in the context of projecting stable-state distributions of invasive species under potential climate scenarios in the future (e.g.Holcombe et al., 2010;Jarnevich et al., 2018).However, these models would be more useful for invasive management if they provided predictions at a temporal scale that matches that of decision-making (e.g.daily or yearly; Barker et al., 2020;Bodner et al., 2021;Houlahan et al., 2017).Near-term ecological forecasting is a framework for developing iterative predictions regarding the state of ecological systems with specified uncertainties, specifically on shorter time scales to guide environmental decision-making (Clark et al., 2001;Dietze et al., 2018).Mechanistic SDM approaches may be leveraged to address this need for near-term ecological forecasts that capture the dynamism of invasion patterns.Another challenge of using these methods for the management of invasive species is the poor availability of high-resolution distribution and abundance data, making it necessary to evaluate the trade-offs between forecast performance and data requirements as model complexity increases.Here, we use the statistical methods underlying mechanistic SDMs to develop a near-term ecological forecasting model that captures spatiotemporal patterns of invasive spread of a highly damaging invasive insect by iteratively incorporating new data for each yearly forecast.We also assess the value of complexity in our forecasting model by comparing models with increasing levels of environmental information.
More than 450 non-native insects are currently established in forest ecosystems of the United States, and nearly every genus of woody plants in North American forests are impacted by at least one invasive insect or pathogen (Aukema et al., 2011;Hill et al., 2016;Kenis et al., 2009).And yet, invasive insects are among the most data deficient taxonomic groups (Brockerhoff & Liebhold, 2017;Troudet et al., 2017).The extensive record documenting the spread of the invasive forest insect Lymantria dispar dispar (recently renamed the spongy moth) is a notable exception.This species was introduced around 1869 in Medford, MA and gradually expanded its range in North America to cover over 900,000 km 2 as of 2012 (Tobin et al., 2012).Its current range today represents approximately a third of the US forest habitat susceptible to future spread of this species (Liebhold et al., 1997).Over the past two decades, the USDA Forest Service Slow-the-Spread program has annually deployed an extensive trapping grid over the leading edge of the invasion as a management tool, resulting in one of the most extensive and detailed spatiotemporal datasets on population spread of an invasive species (Grayson & Johnson, 2018).The combination of extensive knowledge of L. dispar ecology and unparalleled data on its distribution and abundance makes this species an ideal study system for developing and testing near-term spread forecasting models.However, despite extensive prior research on the invasion process and factors regulating rates of L. dispar spread (Johnson et al., 2006;Liebhold et al., 1992;Nunez-Mir et al., 2022), a large majority of these studies have focused on explanation of historical invasion dynamics, as opposed to prediction, and we are unaware of other studies forecasting L. dispar range dynamics on year-to-year or similar temporal scales.
In this study, we developed and evaluated forecasting models that use conditions in a given year to predict the probability of locations being occupied by L. dispar in the following year.Our study examined three main questions: (1) can we accurately forecast yearly changes in L. dispar occupancy?(2) How does forecast performance depend on population density thresholds for occupancy and the spatial extent of connectivity metrics accounting for dispersal?and (3) how biologically detailed does the model need to be to produce accurate forecasts?We used data on the spatial distribution of L. dispar from the first decade of invasion front-wide trapping (2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009)(2010) for model development, applied the model to iteratively predict occupancy for each year between 2011 and 2019 based on conditions of the previous year, and determined the accuracy of our forecast models based on observed occupancy.We considered models that included up to four factors that regulate the spread of biological invasions (Lockwood et al., 2013): resource availability, local diffusive spread, long-distance transport, and temperature-driven developmental performance.Prior research on L. dispar invasion dynamics informed how each was represented in our models.As model complexity increases in our study, so does the dependence of the model formulation on prior knowledge of L. dispar biology and invasion dynamics.Our study evaluates whether relatively simple forecasting models can yield accurate 1-year-ahead predictions of occupancy, informing the data and research needs for operationalizing spread forecasting in invasive species management programs.

| Study system
Lymantria dispar dispar is a forest-defoliating insect pest that was introduced to North America near Boston, MA in the late 1860s and is known for its periodic outbreaks that defoliate large areas of forest.
Since its introduction, it has gradually expanded its North American range to span from Minnesota to North Carolina in the US and portions of southeastern Canada (Figure 1).Larvae consume the foliage of >200 tree species, mainly preferring oaks and aspen (Liebhold et al., 1995).
Lymantria dispar spreads through a combination of natural and humanaided means.Natural dispersal is generally limited to tens of metres to a few kilometres because adult females are flightless (Whitmire & Tobin, 2006).On the other hand, accidental transportation of life stages by humans can span much longer distances, for example, when egg masses are laid on firewood, vehicles, or household items (Bigsby et al., 2011).Across the invasion front, spatial and temporal variation in the rate of spread depends on temperature (Nunez-Mir et al., 2022;Sharov et al., 1999;Walter, Meixler, et al., 2015), landscape structure (Nunez-Mir et al., 2022;Walter et al., 2016), and the dynamics of established populations (Johnson et al., 2006;Walter, Johnson, et al., 2015).Specifically, the pulsed advance and retreat of the range boundary is associated with population outbreaks behind the invasion front, which likely yield an influx of immigrants to the leading edge of the invasion.
Large-scale investments in invasion management by the Slow the Spread program provide detailed spatiotemporal data on L. dispar spread (Tobin et al., 2012).Each year, around 100,000 pheromone-baited traps are deployed in the transition zone spanning areas where L. dispar is generally established to where local densities are zero (Tobin et al., 2012).Trap deployment spans the adult maturation window, generally early summer through early fall.Although traps catch only adult males, these data are used to target nascent populations for eradication, thereby slowing overall spread of the invasion (Coleman & Liebhold, 2023).In addition, the spatiotemporally detailed data on population abundances from this program have resulted in L. dispar becoming a much-studied model system for the invasion process (Grayson & Johnson, 2018).

F I G U R E 1
The established range of Lymantria dispar for US counties coloured by the year of quarantine.A quarantine is a regulatory mechanism where the county is considered generally infested in an effort to limit movement of non-native species to uninfested areas (data provided by the US Code of Federal Regulations, Title 7, Chapter III, Section 301.45, digitized by the US Forest Service).The black lines indicate the Slow the Spread program's year 2018 projected Action Area.The Action Area is the area with the most extensive monitoring using pheromone-baited traps and where most management treatments are conducted.

| Data
We differentiated between areas occupied and unoccupied by Resource availability was represented by proportional tree canopy cover (2011 conditions) obtained from the National Land Cover Dataset (Yang et al., 2018) at 30 by 30 m resolution and aggregated to 5 by 5 km grid cells by averaging.We chose canopy cover of all trees, as opposed to a metric of host tree presence or density, to represent resource availability because previous studies have found little or no relationship between host tree density and invasion speed (Walter et al., 2016) or low-density population growth rates (Walter, Meixler, et al., 2015).This likely reflects a combination of factors: L. dispar is highly polyphagous (Liebhold et al., 1995), high host tree densities are not likely needed to maintain low-density populations (i.e.populations near the range edge), and preferred host tree species are common throughout our study area.
Local diffusive spread was estimated by applying Omniscape software (Landau et al., 2021) to tree canopy cover data and Slow the Spread trap catch density data.Omniscape is an elaboration on circuit theory-based connectivity estimators (McRae et al., 2008) that quantifies connectivity within fixed-radius moving windows, as a function of landscape structure and source strength.We considered the ability of a grid cell to conduct current (L.dispar movement) to equal the canopy cover proportion, and the strength of the current source to equal the trap catch abundance in the previous year; hence, connectivity changed through space and time as a function of changing L. dispar distribution.The Omniscape output for each grid cell gives an index of the potential strength of dispersal into a grid cell from surrounding cells, taking into account L. dispar densities in these cells.Four different values for search radius were tested: 5, 10, 25, and 50 km.Because natural dispersal by L. dispar is limited (Walter et al., 2016), as the search radius distance increases, the importance of incidental human-aided movements also increases.The selected distances correspond to multiples of the grid cell resolution.Resource availability and local diffusive spread were included in our baseline model and more complex models included long-distance transport, and developmental performance (Figure 2).Temporal variation in the potential volume of long-distance transport from established populations behind the invasion front was represented by the study area-wide total area defoliated by L. dispar in the previous year.Defoliated area is a widely used index of the magnitude of population outbreaks.Prior research showed that population outbreaks are associated with pulsed advances of the range boundary (Walter, Johnson, et al., 2015) and initiation of new populations ahead of the invasion front (Epanchin-Niell et al., 2022), likely because of increased numbers of long-distance transport events owing to extremely high population densities in areas where L. dispar is outbreaking.Hence, we expected grid cell establishment probabilities to increase as 1-year lagged defoliated area increased.Long-distance transport mechanisms can include both human-aided movements such as on firewood or household goods (Bigsby et al., 2011;Epanchin-Niell et al., 2022) and meteorologically aided long-distance dispersal in which storms blow life stages longer distances than normal dispersal capability (Frank et al., 2013).
We represented the temperature-dependent developmental performance of L. dispar in each grid cell for each year by combining gridded climate data (Thornton et al., 2020) with three measures of physiological performance: (1) a numerical model of temperature-dependent growth and development (Gray, 2004(Gray, , 2009)); (2) cold exposure risk based on experimental data on cold tolerance (Hafker et al., 2021); and (3) the pupal mass of L. dispar reared in different thermal environments (Thompson, Powers, et al., 2021).
Because previous studies provide evidence of local adaptation in L. dispar responses to temperature (Faske et al., 2019;Hafker et al., 2021;Thompson et al., 2017;Thompson, Powers, et al., 2021), we incorporated this variation into the second and third components of developmental performance.In the first component, the model of Gray (2009) was used to evaluate whether a typical individual would complete its life stage in a given year (See details in Data S1).
This model also provided estimates of onset and completion of larval development in a given year and these dates were used in the estimation of the two other components of local developmental performance based on climate.The second component of developmental performance was cold exposure risk, which accounted for the number of cold days during larval development and local adaptation in cold tolerance using data on chill coma recovery, an index of cold tolerance (MacMillan & Sinclair, 2011), from 15 populations collected across the invaded range (Hafker et al., 2021).The third main component of developmental performance was projected pupal mass, taken as an index of fitness (Faske et al., 2019), which was estimated taking into account temperatures during larval development and local adaptation in thermal performance using data from 14 populations collected across the invaded range (Thompson, Powers, et al., 2021).

| Model development
We used Bayesian occupancy models to forecast 1-year ahead L. dispar spread.The model is iterative in the sense that predictors and predictions change on an annual basis.We used our baseline model to test the spatial range of connectivity estimates (Omniscape radius) and population thresholds for occupancy, and then compared the accuracy of predictions from three models that varied in their complexity (number of predictors and the processes they represent).
We describe these experiments and their evaluation in the next section (Experiments and Evaluation, Figure 2).Here, we detail aspects of model construction and model fitting.
We model occupancy ( i,t ) of grid cell i in year t as a Bernoulli trial with occupancy probability i,t .Occupancy probability is a function of predictor covariates describing habitat quality, potential dispersal, and climatic suitability: Here, the n are regression coefficients and the X i,t (n) are covariates indexed by grid cell and year.We tested models with different combinations of covariates.We did not separately model observation errors.Pheromone-baited traps are efficient at capturing L. dispar males at low to moderate densities (Onufrieva et al., 2020) and we lack repeated observations of the same population in the same year that could be used to directly quantify observation error.

| Experiments and evaluation
We performed two experiments with our models, the first testing the importance of population thresholds for defining occupancy and distance thresholds for measuring connectivity, and the second testing the effect of model complexity on forecast accuracy.In all cases, model predictions were evaluated in terms of their total accuracy, true-positive rate, and false-positive rate.Total accuracy quantifies the total proportion of grid cells that are correctly identified as occupied or unoccupied.The true-positive rate quantifies the probability that an empirical occupancy will be classified as occupied by the model.The false-positive rate quantifies the probability that a cell will be classified as occupied by the model when it was actually unoccupied.In each of our two experiments, we considered how forecast performance changed over time and mapped correct and erroneous predictions to evaluate whether there were spatiotemporal patterns in forecast performance.
Occupancy probabilities (Ψ i,t+1 ) were interpreted as binary (occupied/not occupied) outcomes by establishing an optimal threshold that provided the best balance of sensitivity (true positives) and specificity (true negatives; Youden, 1950); Youden's Index (J) is the sum of sensitivity and specificity minus one.Independently for each trained model, we found the threshold that maximized Youden's Index (J).Because our dataset had many more unoccupied than occupied grid cells, optimal thresholds were <0.5 and varied from model to model (mean = 0.11 ± SD = 0.04).
Our first experiment tested the effects of the spatial scale of connectivity measures and occupancy thresholds on forecast performance.Both the Omniscape radius and occupancy threshold are subjective choices without a clear a priori optimum, so we investigated the impact of these choices.The Omniscape program quantifies connectivity in user-specified, fixed-width moving windows.To examine how a search radius could impact the ultimate accuracy of the models, four different radii were chosen: 5, 10, 25, and 50 km.
We also examined the influence of trap catch density thresholds used to determine grid cell occupancy by L. dispar.These were 1, 3,

| RE SULTS
Overall, 1-year-ahead Lymantria dispar occupancy can be predicted with high accuracy.Although forecast performance varied in space and time and according to details of model construction, our best performing models exhibited total accuracies as high as 96% (median = 90%, mean = 91%, standard deviation (SD) = 1.61%), true-positive rates as high as 98% (median = 90%, mean = 94%, SD = 3.82%), and false-positive rates as low as 3.7% (median = 9.0%, mean = 9.6%, SD = 2.1%).Because the true-positive rate was the most variable of these criteria across models, and because in an invasion management context the ability to successfully predict the locations occupied by an invader may be the primary concern (especially given the low false-positive rates observed here), we focused primarily on the true-positive rate for evaluating differences in performance among models.
Forecast performance (true-positive rate) was somewhat dependent on occupancy threshold and on the spatial scale of representations of local diffusive spread (Figure 3).True-positive rates tended to be slightly higher using occupancy thresholds of 1 and 3 moths than 10 moths but were most variable at 1 moth.Forecast performance was also greater when local connectivity was measured using 5 and 10 km radii in Omniscape, and declined when the radius was increased to 25 and 50 km.The highest mean (over years) true-positive rate (94% ± SD = 3%) was obtained with a 3-moth occupancy threshold and 10 km Omniscape radius.Based on these results, for comparisons of forecast accuracy across models of differing complexity, we focused on models using an occupancy threshold of 3 moths and an Omniscape radius of 10 km.Performance of the top model varied spatiotemporally.For example, in 2011, there was a tendency for the model to underpredict spread in the northwest portions of the study area, particularly near Lake Superior in both Wisconsin and Minnesota (Figure 5).In the same year, the model also overpredicted spread in Illinois and Indiana (Figure 5).See Figures S1-S8

| DISCUSS ION
Forecasting models can predict the occupancy distribution of L. dispar across its invasion front 1 year ahead with high accuracy.While further improvements and optimizations of the model presented here are possible, the high prediction accuracy strongly suggests that spread forecasting could be operationally useful for invasion management.Even though we focused on a much-studied species with an exceptional body of data and historical research that could be leveraged to create a detailed model, we found that a relatively simple model performed at least as well as more complex models incorporating added ecological detail, which offers promise that a similar approach could be successful for other, less well studied species.Our study adds to the growing research area of near-term ecological forecasting (Dietze et al., 2018), which has substantial promise for improving both fundamental ecological understandings and the success of conservation and natural resource management activities.Furthermore, it supports that forecasting the spread of invasive species could be both feasible and broadly valuable (e.g.Jones et al., 2021Jones et al., , 2022)).
Given the wide range of variables that have been mechanistically or statistically related to population dynamics and spread of L. dispar (Bigsby et al., 2011;Epanchin-Niell et al., 2022;Nunez-Mir et al., 2022), we were surprised that our simplest model, which used a coarse representation of habitat suitability and a computationally sophisticated but highly generalized depiction of local-scale dispersal, performed at least as well as more complex models.In models that predict potential distributions of invasive species, there is increasing recognition that increasing complexity does not always translate into an increase in model performance (Gallien et al., 2010).Although we chose variables to represent long-distance transport and developmental performance that were grounded in prior research (Gray, 2009;Hafker et al., 2021;Thompson, Powers, et al., 2021;Walter, Johnson, et al., 2015), it is possible that other representations of these mechanisms, or of other mechanisms entirely, would lead to better predictions.Notably, the developmental performance variable used several datasets with different We also found that forecast accuracy tended to be highest for lower occupancy thresholds and shorter Omniscape radii.One possible explanation for the stronger performance of models using shorter Omniscape radii is the short spatial scale of natural dispersal by L. dispar.Populations descended from the European strain of L. dispar, such as the populations in eastern North America, have flightless females (Robinet & Liebhold, 2009), so natural dispersal occurs primarily in the form of "ballooning" by early-instar larvae on silk threads that increase their buoyancy, which typically occurs over scales of tens to several hundreds of meters (Mason & McManus, 1981).While difficult to directly quantify, it is also likely that human-aided dispersal is more common over short (i.from the nearest area of well-established populations.These population initiation events can result from human-aided transport (Bigsby et al., 2011;Epanchin-Niell et al., 2022) or meteorological events that blow life stages ahead of the invasion front (Frank et al., 2013;Tobin & Blackburn, 2008).One area with frequent errors was the arrowhead region of northeastern Minnesota, which is believed to be influenced by blow-in from source populations across Lake Superior in Wisconsin and the upper peninsula of Michigan.Both blow-in and human-aided long-distance transport are rare and stochastic events, so it is unsurprising that our model does not account for these well.While it is possible to better understand where and when the risk of these rare stochastic events is greatest and allocate monitoring and management resources accordingly, they may be inherently difficult to forecast owing to their rarity.
While our model performed well under the tested conditions, improvement on and expansion of our model to address different applications is possible.Although our developmental performance variable did not improve predictions, we expect that climate and its interaction with physiology and development will be important for making accurate predictions further into the future than the 1-year-ahead predictions made in this study (DeMarche et al., 2019). of a spread forecast (Tobin et al., 2012), but currently there is no formal quantitative process for evaluating priority areas recommended by the decision algorithm.For forecasting the spread of other invasive species, we found it promising that an accurate forecast did not depend on detailed knowledge of the ecology of this species, which is often not available, particularly for new invaders.On the other hand, our predictions took advantage of spatiotemporally detailed and accurate data on current density and distribution, which are often not available to the same degree for other invasions.To better understand the transferability of our modelling approach to other species, future studies should investigate how forecast accuracy depends on the spatial and temporal grain, and measurement uncertainty, of distribution and density data.
Ecological forecasting has emerged as a rapidly growing new research area that cuts across traditional subdisciplines of ecology.
Mathematical models have played an important role in predicting the spread of invasive species and guiding management actions to minimize their ecological and economic damage (Thompson, Olden, & Converse, 2021); the value of forecasting biological invasions has also been recognized (Ibáñez et al., 2009(Ibáñez et al., , 2014)).Many species distribution modelling studies use climate averages to produce a single prediction of the potential distribution of invaders (e.g.Acosta et al., 2016;Bradley et al., 2010), but there are relatively fewer studies employing near-term forecasts to predict changes in the distribution of an invader (but see, e.g.Barker et al., 2020;Jones et al., 2021Jones et al., , 2022;;Messager & Olden, 2018;Rubenson & Olden, 2020).Our work demonstrates that near-term invasion forecasts can be highly accurate, even without detailed information about the biology and spread dynamics of a species.Given the strong potential benefits to management, this framework is a fruitful avenue for additional research for a wide range of ecologically damaging invasive species.

ACK N O WLE D G E M ENTS
This research was supported by the Thomas F. and Kate Miller Jeffress Memorial Trust, Awards Program in Interdisciplinary Research.Guillaume Latombe and an anonymous reviewer provided helpful feedback on the manuscript.

CO N FLI C T O F I NTE R E S T S TATE M E NT
The authors declare no conflict of interest.

PE E R R E V I E W
The peer review history for this article is available at https:// www.webof scien ce.com/ api/ gatew ay/ wos/ peer-review/ 10. 1111/ ddi.

L
. dispar from 2001 to 2019 using pheromone-baited trap catch data from the Slow the Spread program (https:// www.slowt hespr ead.org/ ).Typical distances between traps are 2-3 km, so we gridded the landscape into 5 by 5 km grid cells and considered a grid cell established if the mean trap catch density exceeded an occupancy threshold.We considered three thresholds (1, 3, and 10 moths) that are used for planning and decision making by the Slow the Spread program and assessed whether forecast performance depended on occupancy threshold.Because some areas are treated with mating disruption agents or insecticides to slow the spread of the invasion, we omitted trap catch data within 1.5 km of treated areas, thereby excluding the direct effects of treatments on spread.Treatments are documented by the Slow the Spread program.
We combined developmental completion, pupal mass (fitness), and cold exposure risk components into an overall developmental performance score by taking the weighted mean of the three scores; developmental completion was assigned a weight of 1, and pupal mass and cold exposure risk both received weights of 0.5.Developmental completion was assigned the highest weight because of the fundamental importance of whether L. dispar completed its life cycle based on the local climate.Assigning other components non-zero weights allowed grid cells without modelled developmental completion to have developmental performance >0, which we considered realistic because L. dispar has persisted in areas predicted by theGray (2009) model to be unsuitable(Streifel et al., 2019), possibly in part because the empirical data on which the life cycle model is based do not take into account adaptation to local climate.We computed developmental performance each year from 2000 to 2017 and took the mean for each grid cell over years to represent typical climatic suitability for L. dispar development across the study area.We then computed for each grid cell annual deviations from the mean to capture interannual variability in climate conditions.See Data S1 for further detail on the computation of developmental performance components.

E 2
Flowchart detailing the sequential development of a near-term forecasting model to predict Lymantria dispar occupancy.The Baseline model of occupancy probability included resource availability and local diffusive spread as the fundamental covariates to forecast 1-year ahead spread.Experiment 1 determined the best performing values for Omniscape radius and trap catch threshold in the baseline model.These values were used in Experiment 2, which compared forecast performance between the Baseline models and two models of increasing complexity based on the inclusion of additional covariates (long-distance transport and developmental performance).We tested models with different combinations of covariates (see Experiments and Evaluation, Figure1).Note that the covariates always reflect conditions during the year before the predicted year.The models were fit using the Integrated Nested Laplace Approximation (INLA) using the R-INLA package(Rue et al., 2009) for R (R Core Team, 2021).Data from the years 2001 to 2010 were used to parameterize the models, and data from 2011 to 2018 were set aside for forecast evaluation.We chose INLA methods over more commonly used Markov-Chain Monte Carlo methods because tests showed advantages in computational speed, which were particularly important for the full model.Coefficient estimates obtained using INLA were comparable to those obtained using Just Another Gibbs Sampler (JAGS;Plummer, 2003).
and 10 moths, and were selected because these thresholds are used by the Slow the Spread program to interpret L. dispar invasion front position.Using our baseline model containing resource availability and local diffusive spread as predictors, we ran a fully factorial experiment with the four search radii and three occupancy thresholds, resulting in 12 different combinations (Figure 2).The results of the first experiment informed the combination of Omniscape radius and occupancy threshold we focused on in our second experiment.In our second experiment, we evaluated three different models that differ in their complexity and in the ecological processes represented (Figure 2).The baseline model contained terms for resource availability (canopy cover) and local diffusive spread (Omniscape connectivity) as before.The "intermediate" model contained the first two terms and added a term for long-distance transport (defoliated area).The "full" model contained the three previously used terms and added two terms representing temperature-dependent thermal performance: the long-term (2001-2019) average and the annual departure from the mean.Our baseline model was designed to represent a minimum viable model of invasive spread based on data readily available to the Slow the Spread management program; the intermediate model added an additional easily obtained parameter and the full model leveraged knowledge obtained from considerable prior study of the ecology and physiology of this species.

Forecast
performance was not improved, and in some respects worsened, by increasing the complexity and biological detail of the model (Figure4).Although total accuracies remained high, and false-positive rates remained low, true-positive rates were more variable and there was a notable decline in the accuracy of predictions beyond 2011 for the intermediate and full models.There was no such decline for the baseline model.The full model did, however, have the lowest false-positive rate.Given the findings of Experiments 1 and 2, we henceforth focus on the baseline model with 3 moth occupancy threshold and 10 km Omniscape radius as the top model.
for a complete set of analogous Figures for 2012-2019.In general, no broad geographic regions of the invasion front consistently had higher concentrations of false positives or false negatives than others (Figure 6).

F
I G U R E 3 Forecast performance of the Baseline model with variation in (a) trap catch density threshold (1, 3, and 10 moths) and (b) Omniscape moving window radius (5, 10, 25, and 50 km; Experiment 1).Forecast performance is measured in terms of the true-positive rate of Lymantria dispar occupancy.The box plot shows median with 1.5 * IQR whiskers and outlying points.experimental designs, and thus many potential sources of variation.On the other hand, the potential for local diffusive spread depended on trap catch densities in the prior year, which implicitly contain a great deal of information about habitat quality.While landscape structure can be discontinuous, factors like climate and forest species composition are often spatially autocorrelated, so the presence of L. dispar in nearby grid cells is likely a strong indicator that a grid cell is hospitable to invasion.
e. a few to several km) distances than over several tens to hundreds of km.Although long-distance transport events, for example, due to household moves and transport of infested firewood or nursery F I G U R E 5 Map of forecast performance (true positive, false positive, false negative, and true negative) for Lymantria dispar occupancy in year 2011 using our top model.See maps in Figures S1-S8 for 2012 through 2019.F I G U R E 4 Forecast performance over time by model complexity (Experiment 2).Model complexity increases from the Baseline model with resource availability and local diffusive spread by successively adding terms accounting for long-distance transport (Intermediate model) and climatic suitability (Full model; Figure 2).The trend over time is shown by a locally weighted running smoother line.Based on the results from Experiment 1, the forecasting models used in this experiment used a density threshold of 3 moths for occupancy, and potential local diffusive spread is modelled using a moving window radius of 10 km.stock, are detectable because they advance L. dispar into areas well ahead of the invasion front, short trips are likely far more common.Previous work has found correlations between L. dispar spread and local (i.e.county-level) population and socioeconomic variables consistent with a role for short trips in spreading L. dispar (Bigsby et al., 2011; Epanchin-Niell et al., 2022).The better performance of our model at low-occupancy thresholds suggests that our model may represent dispersal processes better than population processes.Local population growth is generally needed to achieve larger population densities, so if factors driving population growth are not well represented, either due to model structure or choice of predictor variables, the resulting model might perform relatively poorly at explaining presence/absence of higher density populations.Forecast performance varied spatiotemporally despite the overall high accuracy of our top model.There was a substantial decline in accuracy of the full model following 2011, for unclear reasons.Considering predictions of our top model, errors arose when L. dispar began to occupy areas ahead of and discontinuous

Future
considerations could address the effects of climate change on occupancy over longer time horizons, as well as changes in spread dynamics when the invasion front reaches climatically marginal or unsuitable areas.Similarly, as L. dispar spreads toward areas where preferred host tree species (predominantly oaks and aspen) are less common, adopting a representation of resource availability that accounts for species composition may be necessary.Although our use of temporally static forest canopy data is justified by the low overall rate of forest cover change across our study area(Homer et al., 2020), this could produce important errors when applied to focal areas where land cover change has occurred.While we chose to focus on occupancy rather than an abundance for building a proof-of-concept forecast model, an accurate model of abundances could be even more valuable.Since our dispersal model accounts for distribution and abundance in the prior year, further developing this work into an abundance model would help support forecasts over multi-year time horizons.We observed forecast accuracies (true-positive rates) that are likely high enough to consider operationalizing forecasts of L. dispar occupancy in management.For example, since the scope of monitoring and management is constrained by resources, a spread forecast could be a decision support tool used in allocation of traps (e.g.trapping at higher density where spread is more likely) and budgeting for the number of areas potentially requiring treatment to slow spread.Currently, management of L. dispar spread is informed by a decision algorithm for determining management actions within a year that contains some elements F I G U R E 6 Total number of (a) false positive and (b) false-negative predictions of Lymantria dispar occupancy across years (2011-2019) by grid cell.Results are shown from the Baseline model using a 3 moth density threshold for occupancy and 10 km Omniscape radius.