Overcoming data gaps using integrated models to estimate migratory species' dynamics during cryptic periods of the annual cycle

Environmental and anthropogenic factors affect the population dynamics of migratory species throughout their annual cycles. However, identifying the spatiotemporal drivers of migratory species' abundances is difficult because of extensive gaps in monitoring data. The collection of unstructured opportunistic data by volunteer (citizen science) networks provides a solution to address data gaps for locations and time periods during which structured, design‐based data are difficult or impossible to collect. To estimate population abundance and distribution at broad spatiotemporal extents, we developed an integrated model that incorporates unstructured data during time periods and spatial locations when structured data are unavailable. We validated our approach through simulations and then applied the framework to the eastern North American migratory population of monarch butterflies during their spring breeding period in eastern Texas. Spring climate conditions have been identified as a key driver of monarch population sizes during subsequent summer and winter periods. However, low monarch densities during the spring combined with very few design‐based surveys in the region have limited the ability to isolate effects of spring weather variables on monarchs. Simulation results confirmed the ability of our integrated model to accurately and precisely estimate abundance indices and the effects of covariates during locations and time periods in which structured sampling are lacking. In our case study, we combined opportunistic monarch observations during the spring migration and breeding period with structured data from the summer Midwestern breeding grounds. Our model revealed a nonstationary relationship between weather conditions and local monarch abundance during the spring, driven by spatially varying vegetation and temperature conditions. Data for widespread and migratory species are often fragmented across multiple monitoring programs, potentially requiring the use of both structured and unstructured data sources to obtain complete geographic coverage. Our integrated model can estimate population abundance at broad spatiotemporal extents despite structured data gaps during the annual cycle by leveraging opportunistic data.

of migratory species throughout their annual cycles.However, identifying the spatiotemporal drivers of migratory species' abundances is difficult because of extensive gaps in monitoring data.The collection of unstructured opportunistic data by volunteer (citizen science) networks provides a solution to address data gaps for locations and time periods during which structured, design-based data are difficult or impossible to collect.2. To estimate population abundance and distribution at broad spatiotemporal extents, we developed an integrated model that incorporates unstructured data during time periods and spatial locations when structured data are unavailable.
We validated our approach through simulations and then applied the framework to the eastern North American migratory population of monarch butterflies during their spring breeding period in eastern Texas.Spring climate conditions have been identified as a key driver of monarch population sizes during subsequent summer and winter periods.However, low monarch densities during the spring combined with very few design-based surveys in the region have limited the ability to isolate effects of spring weather variables on monarchs.
3. Simulation results confirmed the ability of our integrated model to accurately and precisely estimate abundance indices and the effects of covariates during locations and time periods in which structured sampling are lacking.In our case study, we combined opportunistic monarch observations during the spring migration and breeding period with structured data from the summer Midwestern breeding grounds.Our model revealed a nonstationary relationship between weather conditions and local monarch abundance during the spring, driven by spatially varying vegetation and temperature conditions.4. Data for widespread and migratory species are often fragmented across multiple monitoring programs, potentially requiring the use of both structured and unstructured data sources to obtain complete geographic coverage.Our

| INTRODUC TI ON
Migratory species offer many ecosystem services that are highly valued by humans including nutrient cycling, pest control, seed dispersal, recreational opportunities, and food (Green & Elmberg, 2014;Mattsson et al., 2018;Thogmartin et al., 2022).Across taxonomic groups, migratory species have declined and face ongoing threats from myriad factors including climate change and habitat loss (Deinet et al., 2020;Zurell et al., 2018;Zylstra et al., 2022).Migratory species are among the most vulnerable species groups because they traverse large geographic areas throughout their annual cycle, making them susceptible to stressors across wide geographic regions including breeding grounds, nonbreeding grounds and throughout migratory pathways (Saunders et al., 2021;Zylstra et al., 2021).This poses unique challenges for disentangling the effects of various stressors and determining appropriate conservation actions.
Structured data, or data collected during systematic surveys based on consistent protocols, are paramount in identifying ecological processes that lead to declines in migratory species (Lindenmayer et al., 2012).However, structured data often require large investments of time and resources and thus tend to be limited in scope, often capturing only a subset of a species' range.This is especially true for migratory species, in that the bulk of data collection efforts often occur over a limited portion of their annual cycle.For instance, for North American birds, most monitoring occurs on breeding grounds when species are comparatively stationary, with less data collected on non-breeding grounds and during periods of migration (Marra et al., 2015;Rushing et al., 2016).Scaling up systematic monitoring efforts from well sampled areas to a species' full range during its annual cycle is challenging (but see Şekercioğlu, 2012;Suman et al., 2023), leaving spatiotemporal gaps in data, and hampering current analyses of migratory species ecology (Marra et al., 2015).As a result, knowledge of how environmental factors affect the abundance and distribution of migratory species is incomplete, especially during cryptic and hard-to-study periods of the annual cycle.
Collections of unstructured, opportunistic data via volunteerbased networks (e.g.iNaturalist, eBird) are providing a means to address geographic and temporal gaps in currently available structured data (Chandler et al., 2017).Unstructured data are those that are collected without a planned survey design or monitoring protocol, such as an opportunistic sighting of a given species.Though unstructured data can be plentiful, producing unbiased and meaningful inferences from these data can be challenging (Fithian et al., 2015).
Spatial sampling biases can lead to a disproportionate number of observations near urban areas (e.g.roads, city parks), where people are concentrated.Additionally, observers typically record when a species is present but not when it is absent, inhibiting estimation of occurrence probabilities or abundance indices when exclusively using presence-only data (Dorazio, 2014;Farr et al., 2021).Consequently, the value of unstructured data for population-level inferences is more limited.Integrated modelling, in which multiple data types are analysed in a single, unified framework, can be a useful tool to address limitations encountered when using structured (i.e.limited extent and quantity) or unstructured (i.e.limited information content) data to model dynamics of animal populations (Fletcher et al., 2019;Isaac et al., 2020;Miller et al., 2019;Schindler et al., 2022;Zipkin et al., 2021;Zipkin & Saunders, 2018).For migratory species, integrated modelling can resolve discrepancies in estimated population trends from separate periods of the annual cycle by linking seasonal dynamics (Saunders, Farr, et al., 2019).Integrated models have also been used to evaluate potential environmental drivers of population trends across a species' full annual cycle (Rushing et al., 2017;Rushing et al., 2021;Zylstra et al., 2021).However, an approach has yet to be developed that formally incorporates unstructured data to explore population dynamics during time periods and spatial domains when structured data are unavailable.
Here, we develop an integrated model to estimate spatiotemporal population abundance for a migratory species when structured data gaps exist within the annual cycle.Underlying our integrated framework is a Poisson point process model that describes variation in abundance using relevant covariates and spatial random effects (Dorazio, 2014;Fithian et al., 2015).We integrate structured count data and unstructured presence-only observations using a joint-likelihood, which shares parameters between data types within the point process model (Fletcher et al., 2019).The joint-likelihood provides a pathway to lend information on abundance from structured data to periods of the annual cycle when only unstructured data are available.We demonstrate the validity of this framework using a simulation study and then apply our model to a case study of monarch butterflies in eastern North America.
Our case study was motivated by our work on the effects of factors influencing adult monarch butterfly abundances during the spring breeding season when structured data are unavailable (Oberhauser et al., 2015).The eastern population of monarch butterflies in North America has declined by as much as 80% over the integrated model can estimate population abundance at broad spatiotemporal extents despite structured data gaps during the annual cycle by leveraging opportunistic data.

K E Y W O R D S
data integration, hierarchical modelling, integrated modelling, migratory species, monarch butterfly last 25 years (Brower et al., 2012;Thogmartin et al., 2017).The population undertakes a spectacular four-generation annual migration from their wintering grounds in central Mexico, through the spring breeding corridor in eastern Texas and Oklahoma, and then up to the summer breeding grounds in the northern half of the U.S. and southern Canada east of the Rocky Mountains before the last generation returns to the wintering grounds in Mexico (Brower et al., 2012).Analyses of recent data from the Midwestern summer breeding areas reveal that climate conditions during the spring and summer breeding seasons were more than seven times as important than other factors (e.g.size of the population at the end of winter, crop cover, and glyphosate application rates in the Midwest) in determining the peak summer population size between 2004 and 2018 (Zylstra et al., 2021).Despite the importance of the spring migratory and breeding area for eastern monarchs, few structured surveys have been conducted in this area.Thus, we lack direct evidence of the relationship between weather variables and the spatiotemporal dynamics of monarchs throughout the spring breeding season.We use our integrated model to estimate monarch abundance and associated effects of environmental conditions during their spring breeding period, allowing detailed inferences throughout this important, yet understudied and under sampled, period of the monarch's migratory cycle.

| Model description
The purpose of our integrated model is to generate spatially explicit estimates of population abundance within one or more seasons (hereafter, periods) of a species' annual migratory cycle when structured data are unavailable (Figure 1a,b).We define abundance, N t , during periods of the annual cycle, t = 1, … , T, as the number of individuals in a corresponding spatial domain S t (e.g.breeding range, non-breeding range, migratory pathway) occupied by the species during period t.Abundance is assumed to be a random variable arising from a Poisson process, N t ∼ Pois Λ t , with expected abundance, Λ t .We specify the spatial distribution of individuals within domain S t during period t using a point process model.Individuals of the population are distributed within the domain according to an intensity function, t (s), which describes the number of individuals at location s (s ∈ S t ) during period t.We link abundance to the point process by specifying expected abundance as the integral of the intensity function across the domain S t : Λ t = ∫ S t t (s)ds.

| Spatial heterogeneity in the distribution of individuals
Local abundance often changes as a function of one or more environmental factors.As these factors likely vary within the spatial domain, S t , abundance is heterogeneous, and thus can be referred to as an inhomogeneous point process.We begin by modelling local abundance (i.e. the intensity function), t (s), during a period of the annual cycle where both structured and unstructured data exist.In this description, we assume for simplicity that this occurs at t = 1 .
We model t=1 (s) with spatial covariates using an inhomogeneous point process: where X t=1 (s) is a vector of environmental covariates at location s during period, t = 1.′ is the vector of corresponding coefficients, and 0 is the expected number of individuals at location s when covariates X are at their mean values (standardized to mean of zero and standard deviation of 1).Both t=1 (s) and 0 are estimated on the log scale as their support ranges from 0 to ∞ while the support of covariates is from − ∞ to ∞.We account for spatial variation not explained by environmental covariates with a spatial random effect, (s), using a Gaussian random field (i.e. the log-Gaussian Cox process; Møller et al., 1998).We implemented the Gaussian random field using a stochastic partial differential equation approximation via a triangulated spatial mesh (which may vary by time period) of k = 1, … , K t nodes across S t (Krainski et al., 2018;Simpson et al., 2016; Supporting Information S1).

| Seasonal change in abundance
During subsequent periods of the annual cycle, t = 2, … , T, abundance may change because of demographic processes (i.e.mortality, recruitment, immigration, emigration).Various methods exist to describe population dynamics (Kéry & Royle, 2020), but here we simply describe changes in abundance between periods as a function of spatiotemporal variation in environmental factors along with a fixed effect for each subsequent period: X t (s) is a vector of environmental covariates for each period after the first, t = 2, … , T. Changes in abundance not captured by covariates are represented by t , which is the difference in the log of mean abundance between time periods t − 1 and t.

| Unstructured presence-only data
Presence-only data are typically the outcome of opportunistic, unstructured sampling where observations of the target species are recorded, whereas the absence of the species at a given location is not recorded.We modelled presence-only data resulting from opportunistic sampling using a thinned Poisson process, Y t ∼ Pois ∫ S t t (s)p t (s)ds , where Y t is the number of presenceonly observations during period t within spatial domain S t (Dorazio, 2014).The thinning rate function, p t (s), represents the observation process that relates expected abundance at location s, t (s), to the presence-only data (i.e. the number of observations at location s).Here, p t (s), is the probability that an individual was detected and is constrained to be between 0 and 1 where we assume double counting is trivial compared to imperfect detection.
We account for spatiotemporal variation in imperfect detection and sampling bias by specifying a linear model with a logit link for p t (s), with an intercept (p 0 ), covariates (W t (s)) and their effects ( ) (Dorazio, 2012).

| Structured count data
We assume the structured count data, C tj , occur at a set of sites (i.e.subsets of the spatial domain) j = 1, … , J t , each with an associated area D j where D 1,…,J t ⊂ S t .Count data are also described with a Poisson process: C tj ∼ Pois ∫ D j t (s)ds .Count data, like many structured data types, cannot be used as direct measures of true abundance due to observation error.When counts are replicated or additional information is recorded, an observation process model can be included to account for imperfect detection.We use a single-visit count model that does not account for imperfect detection.Thus, we were only able to estimate relative abundance (assuming that detection does not vary over time and space), similar to many large-scale structured monitoring programs (e.g.Christmas Bird Count).Because counts of individuals are aggregated within a site (i.e.observations are not associated with a point location, s i , for each individual i ), a change-ofsupport problem (i.e.mismatches in spatial scales of multiple data sources) must be addressed (Pacifici et al., 2019).Given that the expected sum of Poisson random variables is equivalent to the sum of their means, we specify an area offset and approximate the integral over each site j as D j tj ≈ ∫ D j t (s)ds.This assumes a homogeneous process within each site (i.e.tj is the mean intensity for D j ) and a linear relationship between area and the number of individuals (i.e.no density dependence).Counts at each period t and site j can be modelled as: C tj ∼ Pois D j tj , where log tj = log 0 + � X tj + t + j .
Covariates, X tj , are summarized across each D j , and a projection matrix, A kj , is needed to interpolate the random effects, (s), estimated at each node k in the spatial mesh to each site j ( j = k • A kj ;Krainski et al., 2018).

| Spatiotemporal data integration
Within our modelling framework, structured data provide an anchor point for estimating the intercept of the intensity function, 0 , which is unidentifiable in a standalone presence-only data analysis (Dorazio, 2014;Farr et al., 2021).We assume that structured data exist during one or more periods within a species' annual cycle but are not available during every period, whereas unstructured data are available during periods with and without structured data.Structured data are used to estimate 0 , which also appears in the model for unstructured data.This allows us to estimate spatially explicit abundance during periods when structured data are unavailable.A critical assumption of this framework is that changes in abundance between time periods are explained by covariates or captured as deviations from abundance in t = 1 (i.e. through a time period effect, t ).To create the integrated model, a joint-likelihood is formed using the product of the likelihoods of the structured (e.g.count) and unstructured (e.g.presence-only) data (Supporting Information S1).

| Simulation study
The objective of our simulation study was to assess whether the integrated model, which relies on a shared intensity function (Equation 2) between structured and unstructured data collected over multiple time periods, can produce unbiased estimates of unknown parameters.We evaluated our simulation study's objective by measuring the accuracy and precision of estimated parameters, including the expected number of individuals, 0 , and the change in the log of mean abundance between time periods, t .We simulated abundance in two periods within a spatial domain, S (Figure 2a,b).
Both unstructured and structured data were available during the first period, while only unstructured data were available during the second period.We simulated presence-only observations for the unstructured data and single-visit counts for the structured data.
We specified 0 and to generate abundances near 3600 and 9900 individuals within S for periods 1 and 2, respectively (Figure 2a,b).To create a spatially varying landscape, we specified an inhomogeneous point process via a single environmental covariate that varied over space and between periods, X t (s), with unmeasured spatial heterogeneity, (s).We specified a single covariate effect to establish a relationship between the environmental covariate and local abundance (i.e.intensity) that was constant between periods (Figure 1a,b; Supporting Information S2).The spatial random effect, (s), was simulated using a Gaussian random field with precision ( ) and scale ( ) hyperparameters (Supporting Information S1 and S2).
We then simulated the true number of individuals at locations for each season using a Poisson point process and the corresponding intensity function.
We generated presence-only data for both periods (Figure 1a,b) where the thinning process incorporated effect and a single covariate that did not vary with period, W(s), representing sampling bias (simulated from a Gaussian random field).We standardized the covariate by subtracting the minimum value and dividing by the mean, which forced areas with low sampling intensity (i.e.lower covariate values) towards the intercept, p 0 .We set p 0 = logit(0.01);thus, locations at the lower end of the covariate range represent low or no sampling effort (i.e.p t ≈ 0).We generated single-visit count data at 100 sites (i.e.circles with uniform areas) for the first period only (Figure 2a), where detections of individuals in a site were summed to generate a single count.For simplicity, we specified that all individuals present were detected during each survey (i.e.counts were perfect).
We simulated both true abundance and observed datasets 1000 times, and for each simulation, estimated population abundance to evaluate our ability to leverage structured data over space and time using the integrated model.The majority of fixed effect parameters ( 0 , , , , p 0 , ), were constant across simulations.We allowed the environmental effect parameter, , to vary between 0.5 and 1.5 on the log-scale to evaluate the accuracy and precision of estimates under a range of conditions.The covariates X t (s) and W(s) , the spatial random effects, (s), and individual animal locations were redrawn for each simulation (see Supporting Information S2).Parameters were estimated using maximum likelihood with R-INLA, Template Model Builder, and R (Kristensen et al., 2016;Lindgren & Rue, 2015;R Core Team, 2020).To assess convergence (i.e.negative log likelihood at a minimum), we checked that the absolute values of the final gradient for each parameter were near 0 and checked that the Hessian matrix was positive definite (Skaug & Fournier, 2006).To assess performance, we calculated percent relative bias for the mean estimate of each parameter: estimated mean − true true × 100.We also derived estimates of abundance and associated uncertainty to compare with true abundance.Finally, we estimated abundance for each of the 1000 simulated datasets using only the presence-only data to compare with estimates from our integrated framework.We did not estimate parameters using only the count data as there was no structured sampling during the second period.

| Case study
The Multiple hypotheses have been proposed to explain the decline of monarchs in eastern North America (Thogmartin et al., 2017).
Recent retrospective analyses demonstrate the importance of weather conditions during the spring breeding season, highlighting a critical period of population growth (Saunders et al., 2016(Saunders et al., , 2018)).
The peak size of the monarch population on the summer breeding grounds, which is highly correlated with the size of the population that ultimately returns to Mexico each winter, is strongly influenced by weather conditions on the spring breeding grounds (Zylstra et al., 2021).Spring temperature and precipitation affect the recruitment of monarchs by impacting development and survival as well as host plant resources (Zalucki, 1982), but the importance of these and other factors are only understood indirectly through analyses of count data collected on the summer breeding grounds (Saunders et al., 2018;Zipkin et al., 2012).
With spring conditions potentially becoming less favourable for monarch recruitment (Neupane et al., 2022;Zylstra et al., 2022), a deeper understanding of the direct links between local weather conditions and monarch abundance can aid management and policy decisions for eastern monarchs, a population of substantial conservation concern.Fortunately, multiple volunteer data-collection networks compile opportunistic observations of adult monarchs in the spring and early summer.Using our integrated framework, we leveraged these opportunistic data to estimate population abundance of monarchs during spring breeding and arrival on the summer breeding grounds.We also assessed the effects of weather and greenness (i.e. a proxy for host and nectar resources) on monarch abundance during this critical period of the migratory cycle.

| Data
Our analysis focuses on spring migrations from 2016 to 2018.Within a given year, we partitioned spring migration into three distinct spatiotemporal periods.We indexed years with r (R = 3) and used S t (T = 3) to denote each spatiotemporal period during spring migration in a given year (Figure 1d).The spatiotemporal periods of the spring migration were constant across years and defined similarly to previous studies (Saunders, Ries, et al., 2019;Zylstra et al., 2021).
Early Mexico originate from this region (Flockhart et al., 2017).This early summer period is also important as it contains both structured and unstructured data, whereas we did not have access to structured data during S 1 and S 2 .
We used opportunistic, presence-only data from five data collec- Prior to modelling, we pooled unstructured data into a single dataset rather than modelling each source with a separate submodel as protocols and data types were similar.
The structured monarch data collected during early summer (3 May-6 June) come from five different monitoring programs.
The first program we used was the North American Butterfly Association's Fourth of July (NFJ) surveys.These structured surveys are conducted annually in summer, with optimal timing depending on local conditions at specified sites throughout the summer breeding range.Volunteers survey areas within a fixed 25-km diameter circle and record the number of adult butterflies observed, by species (Oberhauser et al., 2015).We summed monarch observations among volunteers during each survey, resulting in a single count for each site and year C rj .The other four monitoring programs were 'Pollard walk' surveys from state-level butterfly monitoring networks (BMNs) in Illinois, Ohio, Iowa, and Michigan (Oberhauser et al., 2015, Pollard, 1977).Volunteers walked fixed transects and counted the number of adult butterflies observed.
Unlike NFJ surveys, volunteers surveyed Pollard transects multiple times each year; however, we only used the first survey of the year because our focus was estimating monarch abundance during early summer.Data from the NFJ and Pollard transects thus contain a single count for each site and year.The average number of counts each year in S 3 was: n S 3 = 229.Similar to the unstructured data, we pooled the structured data into a single dataset rather than modelling each with a separate submodel as similar protocols led to a common data type.
We included environmental covariates that are likely to influence monarch recruitment and survival, and hence spatiotemporal abundance.Monarch larvae are obligate feeders on milkweed (Asclepias spp.; Pleasants & Oberhauser, 2013); thus, milkweed availability is likely to drive monarch recruitment.Because data on milkweed distribution and abundance are lacking, we used the normalized difference vegetation index (NDVI), a measure of landscape greenness, as a proxy for milkweed distribution during the spring and early summer (Flockhart et al., 2013;Lemoine, 2015).Weather variables are also vital to the recruitment of monarchs as they influence the rates of development and survival from eggs to adults (Zalucki, 1982).
Similar to other monarch models, we used the number of growing degree days (GDD) to describe thermal conditions on the spring and summer breeding grounds (Saunders et al., 2016(Saunders et al., , 2018;;Zipkin et al., 2012;Zylstra et al., 2021).GDD is the accumulation of heat within a specific temperature range that allows for monarch development (McMaster & Wilhelm, 1997).We calculated GDD values as the heat accumulated at a given location over a 14-day period immediately preceding each observation or survey.NDVI and GDD variables were calculated for each observation, survey, and mesh node (Didan, 2015;Thornton et al., 2020, Supporting Information S3).
We also included environmental covariates in our model that  Browning et al., 2006) and assumed that butterflies within 1-m of the observer were detected.

| Analysis
We used the integrated modelling framework described above to estimate abundance of monarchs in spring and early summer in each year from 2016 to 2018 with a few case-specific modifications (Figure 1b,c).We selected a 100-m 2 baseline resolution for presence-only data and converted the area offset (D j ) accordingly.In addition to the fixed effects describing population change between spatiotemporal domains within a single annual cycle ( t ), we included a fixed effect to capture population change between years ( r ).We allowed the effects of covariates to vary by period and included quadratic effects for both NDVI and GDD to account for peaks in optimal conditions.Effects were fixed across years (i.e.no interaction between covariates and years).Thus, the updated equation for abundance (modified from Equation 2) is: We report abundance as the expected number of monarchs per 100-m 2 .Parameters were estimated using maximum likelihood with R-INLA, Template Model Builder, and R (Kristensen et al., 2016; Lindgren & Rue, 2015; R Core Team, 2020).See Supporting Information S3 for additional implementation details.

| Simulation study
The integration of structured data from the first time period with unstructured data from both periods allowed for nearly unbiased estimates of abundance (Figure 2c, left), even in the second period during which no structured data were available (1.3% and 1.2% relative biases in periods 1 and 2, respectively).In contrast, abundance could not be estimated when relying exclusively on either presence-only data (Figure 2c, right; Dorazio, 2014) or structured data that were limited to a single time period.Precision of estimates was also higher for the integrated model as compared to a presence-only analysis (Supporting Information S2: Table S2.1).Our integrated modelling framework successfully identified the intercept of the intensity function, 0 (1.7% relative bias), effects of an environmental covariate on local abundance, (−3.0%relative bias; simulated values between 0.5 and 1.5 on the log-scale), and the change in abundance between periods, (0.0% relative bias; Figure 2d; see Supporting Information S2: Table S2.1 for full results).The thinning parameters, p 0 and , were also estimated with low bias (−0.4% and −1.2% relative bias, respectively).Hyperparameters of the spatial random effect, and , had larger biases across simulations (−37.0%for and 14.8% for ), which may be a result of the generic spatial mesh used for analyses (Dambly et al., 2023;Jullum, 2020).

| Case study
Environmental conditions in the eastern Texas region during the spring breeding season, particularly in March, strongly influenced the overall size of the monarch population and the distribution of individuals across the breeding area (Figure 3a).We identified nonlinear effects of both NDVI and GDD during all three time periods (i.e.early spring, late spring, and early summer; Figure 2b,c).
The relationships between environmental variables and monarch abundance changed throughout the spring and early summer breeding seasons (Figure 3d-f).Highest monarch abundances in the early spring and summer were associated with above average NDVI and temperature values (i.e.100-km 2 resolution values averaged across 2016-2018 for each period's domain; Figure 3d,f), while the highest abundances in late spring were associated with average NDVI and temperature values (Figure 3e).Estimates for the GDD linear and quadratic effects (Figure 3a,c,d) reveal that peak monarch abundance in the early spring breeding period was constrained to a tight range of values (especially compared to summer; Figure 3d versus Figure 3f), suggesting that a precise range of cumulative temperature conditions at the start of the annual breeding season may be necessary for large population growth.
Using the integrated model, we estimated spatially explicit monarch abundances (reported as adults per 100 m 2 ) across the spring and summer breeding areas (Figure 4), providing the first such estimates for eastern monarchs in spring, a time period when only unstructured, opportunistic data are available.In each year, monarch abundance increased over the course of the spring breeding season, with the lowest expected population sizes in early spring and the highest in early summer (Figure 4).These results were expected based on monarch life history patterns that reflect the ongoing breeding of subsequent generations from the beginning of spring until the end of the summer breeding period.
The annual size of the eastern monarch population peaks in late summer just before the final generation enters diapause and initiates the southerly migration (Ries et al., 2015).Yearly estimates of spring and early summer population indices are consistent with estimates of overwintering population size in Mexico later that same year, with considerably higher abundance in 2018 than in the previous 2 years (Supporting Information S3: Table S3.1).These results further highlight the importance of the spring breeding season and suggest that annual variations in the overall size of the monarch population can be seen as early as the first generation of the year (Figure 4).

| DISCUSS ION
We developed an integrated model using a Poisson point process to estimate seasonal abundances of migratory species with unstructured, opportunistic data when structured data are sparse or unavailable during portions of the annual cycle.Integration of unstructured data with information-rich structured data can improve inferences on migratory populations relative to independent analyses of either data source (Figure 2).Analyses relying exclusively on structured data may result in inferences that are limited to certain periods of a species annual cycle (e.g.breeding), while analyses using just presence-only data cannot estimate absolute abundance due to unidentifiable parameters.By addressing data gaps and parameter identifiability through integrated modelling, our framework can be used to estimate relationships between environmental factors and population abundance and elucidate dynamics that occur across multiple periods of a species' migratory or annual cycle.We validated our model through a simulation study where we demonstrated that abundance or relative abundance (i.e. when imperfect detection is unmeasured or ignored) can be estimated for time periods in which structured data are unavailable.The simulations also confirmed that our modelling framework can yield unbiased estimates of overall abundance and the effects of spatially varying covariates.However, estimates of spatial hyperparameters were somewhat biased, suggesting uncertainty in the model's ability to parse spatial heterogeneity beyond that explained by covariates.
We used a non-customized mesh across simulations, which likely contributed to biases.For specific applications, such as our monarch case study (Supporting Information S3), tuning the spatial mesh to the system at hand can minimize such biases (Dambly et al., 2023;Jullum, 2020;Krainski et al., 2018).Krainski et al. (2018) provides in-depth instruction on how to build triangulated meshes using the R-INLA package while Dambly et al. (2023) documents tradeoffs encountered when specifying a spatial mesh.
We demonstrated the merit of our model by generating spatiotemporal estimates of relative monarch abundance during their spring breeding season when structured data are largely unavailable and densities are low.The spring breeding season has been identified as a critical period for the eastern monarch population, as evidenced by consistent and strong links between spring weather conditions and the abundance of monarchs in late summer and early winter (Saunders et al., 2016(Saunders et al., , 2018;;Zylstra et al., 2021).Our model corroborates these results from long-term analyses, despite the use of different data sources and a different spatiotemporal resolution than has been used in previous work.Further, our results revealed nonstationary effects of environmental variables on monarch abundances during the spring and early summer (Rollinson et al., 2021).We observed a shift in the effects of NDVI (a metric of landscape greenness) on monarch abundance, where peak abundances were associated with the greenest conditions in early spring and summer and slightly lower NDVI values in late spring (Figure 3b).Although there was some F I G U R E 4 Predicted abundance of monarchs (reported as the number of adults/100 m 2 ) from 2016 to 2018 during the spring in eastern Texas and Oklahoma (left two columns) and during the early summer breeding season in the midwestern U.S. and southern Ontario, Canada (right column).Abundance was calculated using parameter estimates and values of NDVI and GDD averaged across each period: early spring (March 8-April 4), late spring (April 19-May 2), and early summer (May 3-June 6) at a given location and year.Rows and columns correspond to years and periods, respectively.variation, the highest monarch abundances were associated with cumulative 14-day GDD values between 100 and 150 throughout the spring and early summer.
As demonstrated in the simulation and case studies, our integrated model offers a powerful approach for estimating abundance during periods and spatial locations that lack design-based monitoring data.However, we recommend that potential users evaluate the appropriateness of our framework for their systems prior to implementation.For example, some taxa may violate the assumption of a shared intensity function across space and time because they contain distinct subpopulations with unique demographic processes in different parts of their range.The amount and type of data available could also render data integration futile if low amounts of structured data preclude the estimation of parameters or if the amount of structured data is sufficient to estimate parameters across a species full annual cycle without the use of unstructured data.Additionally, a critical assumption of our integrated framework is that the relationship between unstructured data and species abundance is constant over space and time (i.e.Observation processes, such as imperfect detection of individuals during structured data collection, may reduce parameter accuracy and precision if not properly accounted for within the model structure.Uncertainty in covariate values along with potential correlations between environmental and observation covariates should be evaluated to minimize errors and collinearity (Dorazio, 2014).
Practitioners should also consider if large quantities of unstructured data in an area are likely to reflect high quality habitat (i.e.environmental covariate) or simply spatial biases from variable data collection effort (i.e.observation covariate).Finally, improper covariate selection may lead to parameter biases (Kéry & Royle, 2020), and for many species and systems, relevant observation covariates may be hard to identify or measure (Kelling et al., 2019;Troudet et al., 2017).
Nearly one sixth of examined species are classified as data deficient by the IUCN, and far more have little to no structured data available throughout much of their range (IUCN, 2022).Migratory species, in particular, often lack data across large portions of their geographic range despite being vulnerable to wide ranging threats (Marra et al., 2015).A shift in biodiversity monitoring from structured to unstructured data collection protocols is providing new sources of information to mitigate data gaps for migratory species.
However, new analytical approaches, such as integrated modelling, are needed to overcome observation biases in unstructured data and realize the potential of this emerging data type to help address and mitigate ongoing species declines.

F
An integrated modelling framework allows estimation of population abundance across space and time for species with spatiotemporal gaps in structured data.(a) Relationships between data and parameters for a general description of our integrated modelling framework are shown with a directed acyclic graph (DAG).The DAG also displays data availability across a species annual cycle (three spatiotemporal periods differentiated by colour [period 1, blue; period 2, green; period 3, orange]).The outer circle displays time while oblong shapes within the outer circle depict each data type (light grey shading indicates unstructured data; dark grey shading indicates structured data).The border colours of the oblong shapes match the period in which data were collected.The innermost circle represents the biological process for latent abundance that is shared between all data types.Data (squares) and parameters (circles) are shown with dashed lines displaying the flow of information.The border and shading of data and parameter objects indicate the period with which they are associated.Parameters in the abundance model with black borders and rainbow shading were estimated across periods.(b) Descriptions of parameters and data for each DAG (a, c) and the case study map (d) are shown within a legend.(c) Modifications of the integrated model for the case study (see footnotes) are shown within a separate DAG.Unstructured (i.e.presence-only) data exist in early and late spring (blue and green shading) along with early summer (orange shading).But, structured (i.e.single-visit counts) data are restricted to the early summer.Our analysis does not include data from late summer through the winter portion of monarch's annual cycle.(d) Location and timing of observation processes for each data type (unstructured data, smaller circles; structured data larger circles; colours match legend descriptions) are displayed within a map.
eastern North American migratory population of monarch butterflies has a multi-generational migratory cycle.Adult monarchs overwinter (December-February) in large clusters blanketing patches of high-elevation Oyamel fir (Abies religiosa) forests in central Mexico.In early spring (March-April), monarchs migrate north and arrive on the breeding grounds in and around eastern Texas, where they lay eggs, producing the first new generation of the year (Brower et al., 2012).This first generation continues the northerly migration to the summer breeding grounds, with most adults concentrated across the Midwestern and northeastern U.S. and southeastern Canada.About three more generations are produced on the summer breeding grounds (May-August); the final generation of the year enters reproductive diapause and commences F I G U R E 2 Visualization of our simulation study showing latent abundance and the amount of both structured and unstructured data within domain S for (a) period 1 and (b) period 2. The background gradient indicates the expected abundance at location s (reported as individuals per unit area).(a) Period 1 contains both presence-only data (black dots) as well as 100 sites with single-visit count data (dashed circles).(b) Period 2 exclusively contains presence-only data (black dots).(c, d) Percent relative bias associated with estimates from the integrated and presence-only models (red horizontal lines at zero indicate no bias in estimated parameters; positive values are overestimates; negative values are underestimates; boxes represent the interquartile range of estimates from 1000 simulations; centre lines are the median value; whiskers are values within 1.5 times the interquartile range).(c) Percent relative bias in estimates of abundance for period 1 (N1) and period 2 (N2) using the integrated model (left) and in an analysis of just the presence-only data (right).(d) Percent relative bias in estimates of the intensity function intercept ( 0 ), covariate effect ( ), and change in abundance between periods ( ) as estimated using the integrated model (blue) and in an analysis of the presence-only data (grey).thesoutherly migration in autumn (September-October) to the overwintering grounds in Mexico (arriving by mid-December).Monitoring of the migration varies dramatically across the annual cycle(Oberhauser et al., 2015).Structured monitoring occurs annually on the overwintering grounds in central Mexico (where researchers delineate the number of hectares occupied by monarch aggregations) and throughout the Midwestern summer breeding grounds (where volunteer observers count butterflies on established plots or transects).However, structured monitoring has only recently begun during their spring breeding and autumn migration.Further compounding monitoring challenges, the population is at its smallest size during the spring phase of migration and generally occurs at low densities(Ries & Oberhauser, 2015).
spring, S 1 , encompassed eastern Texas and Oklahoma (93.5°W to 100°W, 25.8° N to 37° N) between 8 March and 4 April and was specified to include adult monarchs arriving from Mexico as they lay eggs that become the first generation of the year.Late spring, S 2 , covered the same spatial extent but occurred between 19 April and 2 May, which primarily included sightings of adults that are part of the first generation produced that year.Early summer, S 3 , captured these adults shortly after they arrive on the summer breeding grounds, between 3 May and 6 June.We used data from an area that spans the Midwestern U.S. and Ontario, Canada (74.3° W to 97.2° W, 39.8° N to 49.4° N).Although monarchs disperse outside of this area during summer, we focused on the Midwest because the majority of individuals that arrive on the overwintering grounds in were likely to influence the number of monarchs observed during sampling.For the presence-only data, we used two covariates in the thinning function to account for spatiotemporal variation in sampling intensity.First, we used observations of non-monarch butterflies (i.e.number of detections for each species) within the superfamily Papilionoidea in iNaturalist (iNaturalist Community, 2019), assuming high numbers of butterfly observations correlate with high sampling intensity.We used human population density as another proxy for sampling intensity with the assumption that the number of people in the surrounding area is positively correlated with the number of local observers (Center for International Earth Science Information Network-CIESIN-Columbia University, 2018; Geldmann et al., 2016; Supporting Information S3).For the structured data, the amount of effort expended searching for monarchs on each survey is likely to affect the number of individuals observed.Because the area searched during each structured survey (e.g.length of each BMN transect or area covered during each NFJ survey) was not reported, we converted the number of person hours spent surveying into area sampled based on the average human walking speed (5-km/hour;

F
Effects of NDVI and GDD on monarch abundance (reported as the number of adults/100 m 2 ).(a) Parameter estimates (mean and 95% confidence intervals) for the linear and quadratic effects of NDVI and GDD during each period: early spring, late spring, and early summer.Dashed line at zero indicates no covariate effect on monarch abundance.(b, c) Estimated marginal effects (mean and 95% confidence intervals) of NDVI and GDD, respectively, on monarch population abundance during each period.Panels in the bottom row depict the marginal effect (mean) of both NDVI and GDD on monarch abundance in (d) early spring, (e) late spring, and (f) early summer.Dashed lines indicate the mean values of NDVI and GDD summarized across 2016-2018 for each period's spatial domain (at 100-km 2 resolution) and date range.
parameters in Equation 3 are stationary), with the Poisson thinning rate accurately capturing variation in the observation process of unstructured data.Violations of this assumption may lead to increased biases as estimation of abundance moves further away in space and time from where structured data were used to estimate 0 .We advise that practitioners develop their own simulations to assess the feasibility of this type of integrated modelling, as case specific considerations may require further model complexity (see either Supporting Information S2 and S3, Zenodo [DOI: https://doi.org/10.5281/zenodo.8433370],or GitHub [https:// github.com/ zipki nlab/ Farr_ etal_ 2024_ MEE] for code used in our analyses).