Predicting the dynamics of establishing tree populations: A framework for statistical inference and lessons for data collection

Under global change, there is an urgent need to forecast the dynamics of establishing tree populations. However, tree population dynamics are slow and historical data on these dynamics are rare. This raises the question of whether tree population dynamics can be reconstructed from data collected at a single time point. Doing so poses challenges for modelling, data collection and model‐data integration. We present a Bayesian framework that uses multiple data types to parametrize an individual‐based model (IBM) for the growth of establishing tree populations. The framework combines likelihood‐based Bayesian inference and approximate Bayesian computation (ABC). Using this framework, we assess the information content of three data types (recruitment data, dendrochronological data describing individual growth and molecular markers characterizing within‐population pedigrees) by comparing the bias and uncertainty of parameter estimates and model forecasts obtained under different simulated scenarios of data availability. The combination of all data types leads to accurate forecasts of the future state of tree populations, despite large uncertainties in some parameter estimates. Dendrochronological data were the most informative of the examined data types. Combining data types improved forecasts of population state. Nevertheless, for a given parameter related to a given process, combining data types did not improve estimates compared with using only the data type most closely related to that process. The presented Bayesian framework makes it possible to infer the dynamics of establishing tree populations from data collected at a single time point. It helps to allocate limited resources for data collection optimally in order to rapidly improve the understanding and forecasting of tree population dynamics.


| INTRODUCTION
Forecasting the dynamics of non-equilibrium populations is of prime interest in the context of global change. In particular, the dynamics of establishing tree populations play a key role in phenomena as diverse as the spread of invasive tree species (Chornesky et al., 2006), the migration of tree species under climate change (Nathan et al., 2011), biome shifts from tundra to boreal forests or from grasslands to woodlands (Frost & Epstein, 2014) and carbon sequestration (Bastin et al., 2019). Tree population dynamics are also critical in the context of land-use abandonment, which can lead to the emergence of new forests (Lira et al., 2012) and to forest recolonization in fragmented landscapes (Palmero-Iniesta et al., 2020). Assessing and ultimately managing these multiple facets of tree population dynamics under global change require a thorough understanding of forest establishment and its underlying ecological mechanisms, as well as reliable forecasts of tree population dynamics. This leads to challenges for modelling, data collection and model-data integration.
Modelling the dynamics of establishing tree populations is challenging because the demographic processes driving these dynamics (individual growth, fecundity, dispersal, recruitment, mortality) are size dependent and thus strongly vary among individuals in a population. Moreover, individuals interact with each other via distance-dependent processes such as gene flow and competition for resources. Individual-based models (IBMs) can describe the resulting feedbacks between the demographic performance of individuals and the spatial and size structure of populations (Grimm et al., 2005). Hence, IBMs are well suited for modelling the dynamics of tree populations.
Forecasts of tree population dynamics require not only models but also informative data. In particular, time series of tree population size structure need to be long enough to cover the long generation times of trees. Hence, when historical data on tree populations are lacking, it is an open question whether, within a short time period, one can collect sufficient data for inferring their past dynamics and forecasting their future dynamics. This may be feasible since several data types contain information about the past dynamics of tree populations (Grimm et al., 2005). Of particular importance are data on the size distribution and spatial structure of populations, dendrochronological data on annual growth increments of individual trees, data on tree fecundity and recruitment, as well as 'genetic data' on molecular polymorphic markers that can be used for genotyping individuals and reconstructing pedigrees. For instance, rates of individual growth and competition can be inferred by relating dendrochronological data on annual growth increments of individual trees to the spatial structure of the tree stand (Lamonica et al., 2020). Seed production, seed dispersal and seedling recruitment can be estimated by relating recruitment data on spatial variation in seedling density to the size distribution and spatial structure of adult trees (Uriarte et al., 2005; Schurr et al., 2008). Additionally, high-resolution molecular markers have been used to infer within-population pedigrees that quantify effective pollen and seed production as well as rates of pollen dispersal (Klein et al., 2011) and seed dispersal (Klein et al., 2013). These data types differ in the time, material costs and skills required for data collection and analysis (Table 1). To design efficient data collection, it is thus important to quantify the amount of information that each data type provides for forecasting population dynamics.
The final challenge to predicting and forecasting tree population dynamics is the integration of suitable models (notably IBMs) with multiple data types. IBM parametrization still widely uses a 'piecemeal' approach, in which different submodels are parametrized independently (Clark & Gelfand, 2006; Moran & Clark, 2011).
In contrast, Bayesian inference allows the use of multiple data types, collected at different scales, as well as the simultaneous estimation of all parameters (Gopalaswamy et al., 2012; Lamonica et al., 2016).
Bayesian approaches to inference typically depend on having explicit functional forms for the likelihood. However, commonly used likelihood-based methods of parameter estimation are difficult to apply to IBMs, because the likelihood functions of stochastic IBMs usually cannot be calculated explicitly (Hartig et al., 2011, 2014). Alternatively, approximate Bayesian computation (ABC) methods (Beaumont, 2018) use model simulations instead of likelihood computations and are becoming increasingly popular in ecology (Csilléry et al., 2010; Hartig et al., 2011). ABC has also been applied to parametrize IBMs of tree population dynamics from historical forest inventory data (Hartig et al., 2014; Lagarrigues et al., 2015). However, the efficiency of ABC methods for parametrizing IBMs of tree population dynamics from different data types remains to be explored.
TABLE 1 Data types that can be collected at a single time point to parametrize an IBM of tree population establishment
This paper has two aims. First, we develop a statistical framework that combines likelihood-based and ABC methods to parametrize an IBM of establishing tree populations that have not yet reached equilibrium and in which the initial founder tree may still be alive.
This framework can integrate multiple data types collected at a single time point (size distribution and spatial structure, genetic, recruitment and dendrochronological data). Second, we quantify how informative these different data types are for understanding and forecasting tree population dynamics. These results provide guidelines for the optimal allocation of limited resources for data collection in order to improve the understanding and forecasting of tree population dynamics.

| MATERIALS AND METHODS
We formulated an individual-based model (IBM) for the dynamics of an establishing tree population and then developed a statistical framework for the estimation of model parameters from combinations of different data types (Figure 1). To test the framework and to quantify the amount of information provided by different data types, we used a 'virtual ecologist' approach (Zurell et al., 2010), collecting virtual data from simulated forest population dynamics.
We consider the dynamics of an isolated establishing tree population (Lamonica et al., 2020; Ruiz-Carbayo et al., 2020), focusing on the establishment of a new forest patch rather than on gradual forest expansion (Palmero-Iniesta et al., 2020). We applied the framework to a range of scenarios of data availability that represent different combinations of collected data types and evaluated the bias and uncertainty of parameter estimates and of model forecasts.

| General overview
The individual-based, spatially explicit and grid-based model describes the dynamics of forest establishment within a patch. The grid is divided into square cells of 1 m². Each tree is characterized by its position on the grid (the cell number n_i), its mother and father trees and its size, namely its diameter at breast height (DBH) x(i, t) in time step t. The following processes are modelled: growth, adult mortality, fecundity, pollen and seed dispersal, and seedling survival. The time step of the model is 1 year. The model structure is shown in Figure 1 and model parameters are listed in Table 2.
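To make the grid geometry concrete, here is a minimal Python sketch. It assumes, as a hypothetical convention not stated in the text, that cells are numbered row by row from 0 and that distances such as D_i,n are measured between cell centres; the 55 m plot side is taken from the virtual data collection.

```python
import math

PLOT_SIDE = 55  # plot side length in metres; cells are 1 m x 1 m

def cell_to_xy(n, side=PLOT_SIDE):
    """Centre coordinates (m) of cell n, ASSUMING row-major numbering from 0."""
    row, col = divmod(n, side)
    return col + 0.5, row + 0.5

def cell_distance(n, m, side=PLOT_SIDE):
    """Euclidean distance (m) between the centres of cells n and m."""
    (x1, y1), (x2, y2) = cell_to_xy(n, side), cell_to_xy(m, side)
    return math.hypot(x2 - x1, y2 - y1)

n_cells = PLOT_SIDE * PLOT_SIDE  # number of 1 m^2 cells in the plot
print(n_cells, cell_distance(0, 1), cell_distance(0, PLOT_SIDE + 1))
# prints: 3025 1.0 1.4142135623730951
```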

Individual growth and competition
Annual tree growth depends both on current tree size and on competition from neighbouring trees. Following Uriarte et al. (2004), the mean of the logarithm of the absolute growth rate of individual i at time t is modelled as a function of its size x(i, t) and of a competition index y_c(i, t), with a1 the maximum growth rate, a2 the size at maximum growth rate, a3 the shape of the growth rate curve and b1 the sensitivity to competition. The competition index y_c(i, t) is the sum of competition kernels that depend on the distance D_i,j between the focal individual i and its neighbour j and on the size x(j, t) of the neighbour (Nottebrock et al., 2017), with b2 the distance at which competition is 37% (≈ e⁻¹) of the maximum competition at zero distance.
The growth increment y_g(i, t) (cm/year) is log-normally distributed: its logarithm follows a normal distribution with the mean described above and standard deviation a4. The size of individual i at the end of time step t is then its current size x(i, t) plus the growth increment y_g(i, t).
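The growth step can be sketched as below. The exact equations are not reproduced in this text, so the functional forms are assumptions: the competition kernel is taken as the neighbour size times exp(-D_i,j / b2), which equals about 37% of its zero-distance value at D_i,j = b2, and the size effect is a lognormal-shaped curve that peaks at a1 when x = a2 with width a3, in the spirit of Uriarte et al. (2004). Parameter values in the usage line are hypothetical.

```python
import math
import random

def competition_index(dists, neighbour_sizes, b2):
    """ASSUMED kernel: neighbour size * exp(-distance / b2), summed over neighbours.
    exp(-D / b2) is ~37% of its zero-distance value at D = b2."""
    return sum(x_j * math.exp(-d / b2) for d, x_j in zip(dists, neighbour_sizes))

def mean_log_growth(x, y_c, a1, a2, a3, b1):
    """ASSUMED form: log of a lognormal-shaped size effect (equal to a1 at x = a2)
    minus a linear competition effect b1 * y_c."""
    return math.log(a1) - 0.5 * (math.log(x / a2) / a3) ** 2 - b1 * y_c

def grow(x, dists, neighbour_sizes, a1, a2, a3, a4, b1, b2, rng=random):
    """One annual step: log(increment) ~ Normal(mean, a4); new DBH = x + increment."""
    y_c = competition_index(dists, neighbour_sizes, b2)
    mu = mean_log_growth(x, y_c, a1, a2, a3, b1)
    y_g = math.exp(rng.gauss(mu, a4))  # growth increment in cm/year, always > 0
    return x + y_g

rng = random.Random(42)
print(grow(5.0, [3.0, 6.0], [4.0, 12.0], a1=0.6, a2=20.0, a3=1.2, a4=0.3,
           b1=0.05, b2=3.0, rng=rng))
```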

Adult mortality
The probability that a tree i dies in year t depends on the tree's current size x(i, t) (Hülsmann et al., 2018) and is modelled as an inverse-logit (logistic) regression, with c1 the regression intercept and c2 the linear regression coefficient. Whether individual i dies at the end of time step t follows a Bernoulli process with this probability.
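A sketch of the mortality step, assuming (as the description of c1 and c2 suggests) that the linear predictor of the inverse-logit regression is c1 + c2 x(i, t) on untransformed DBH; whether size enters on a log scale is not stated here. The inverse logit is written in a numerically stable form.

```python
import math
import random

def inv_logit(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def death_probability(x, c1, c2):
    """ASSUMED form: annual death probability = inv_logit(c1 + c2 * x), x = DBH (cm)."""
    return inv_logit(c1 + c2 * x)

def dies(x, c1, c2, rng=random):
    """Bernoulli draw: True if the tree dies at the end of the time step."""
    return rng.random() < death_probability(x, c1, c2)
```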

Pollen production, pollen dispersal and ovule fertilization
The model describes the relative contribution of pollen from each parent in the population and pollen immigration from outside the population. These relative pollen contributions are the basis for assigning fatherhood, but we do not consider the total number of fertilized seeds to be limited by pollen availability. The pollen production y_p(i, t) of individual i at time t depends on its current size x(i, t) through an allometric relationship, with d2 the allometric exponent of the fecundity-size relationship, and is set to 0 if the individual is smaller than the size at maturity d1.
Methods in Ecology and Evolution LAMONICA et al.
Ovules cannot be self-fertilized. Fertilization of the ovules of a tree depends on the distance to neighbouring trees and on their respective pollen production. We use an exponential kernel k_p(i, j) (Klein et al., 2006) to model pollen dispersal from individual i to a neighbour j, with e1 the mean pollen dispersal distance. The relative contribution of pollen from individual i to the pollen cloud that can fertilize a seed of individual j at time t is obtained by weighting the pollen production y_p(i, t) with the dispersal kernel k_p(i, j). Each individual receives a constant amount g1 of immigrant pollen from outside the population.
FIGURE 1 A Bayesian framework to parametrize an individual-based model (IBM) for the dynamics of establishing tree populations from data collected at a single time point. In a first step, submodels for recruitment and/or growth and competition are parametrized using likelihood-based Bayesian inference. Posterior parameter distributions of this first step then provide some of the priors for a second step in which the full IBM is parametrized with approximate Bayesian computation (ABC) methods using further data on size distribution and spatial structure as well as pedigrees. Note that the framework can either use only size distribution and spatial structure data or combine these data with arbitrary combinations of the other data types (if dendrochronological and recruitment data are not used, the framework simplifies to classical ABC).
TABLE 2 Parameters of an individual-based model for the dynamics of establishing tree populations with reference values used for simulating data and prior distributions used for Bayesian parameter estimation
Size at maturity (d1): size (DBH) at which an individual starts producing seeds and pollen; unit: cm; reference value: 10

Seed production, seed dispersal and recruitment
For each grid cell, the model describes the contribution of seeds from each parent and seed immigration from outside the population. As for pollen production, the seed production y_s(i, t) of individual i at time t depends on current tree size x(i, t) and on the maturity threshold d1. The number of seeds N(i, t) produced by individual i at time t follows a Poisson distribution with mean y_s(i, t). We use an exponential kernel k_s(i, n) (Klein et al., 2006) to model seed dispersal from individual i to grid cell n, with e2 the mean seed dispersal distance and D_i,n the distance between individual i and grid cell n.
The contribution of individual i to the seed pool of grid cell n is obtained by weighting its seed production with the seed dispersal kernel k_s(i, n). Finally, we assume that all seeds in a grid cell become seedlings (if a lower proportion of seeds became seedlings, this would only alter the interpretation of seed production y_s).
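The seed step can be sketched as follows. Three assumptions are made for illustration only: seed production is taken as y_s = x^d2 above the maturity size d1 with no scaling constant; the exponential kernel uses the two-dimensional form 2/(π e2²) · exp(-2D/e2), which integrates to 1 over the plane and has mean dispersal distance e2; and a cell's contribution is the seed number N(i, t) times the kernel value (1 m² cells). The Poisson sampler counts unit-rate exponential arrivals, which is exact but takes time proportional to the mean.

```python
import math
import random

def poisson(lam, rng=random):
    """Poisson(lam) draw: number of unit-rate exponential arrivals before time lam."""
    count, t = 0, rng.expovariate(1.0)
    while t < lam:
        count += 1
        t += rng.expovariate(1.0)
    return count

def seed_production(x, d1, d2):
    """ASSUMED allometry: y_s = x**d2 for trees at or above maturity size d1, else 0."""
    return x ** d2 if x >= d1 else 0.0

def exp_kernel(dist, mean_dist):
    """2D exponential dispersal kernel (per m^2) parametrized by its mean distance."""
    return 2.0 / (math.pi * mean_dist ** 2) * math.exp(-2.0 * dist / mean_dist)

def seed_contributions(x, d1, d2, e2, cell_dists, rng=random):
    """Draw N ~ Poisson(y_s), then expected seeds arriving in each 1 m^2 cell."""
    n_seeds = poisson(seed_production(x, d1, d2), rng)
    return [n_seeds * exp_kernel(d, e2) for d in cell_dists]
```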

Seedling survival and parenthood assignment
In each empty grid cell (i.e. one that does not currently hold a tree), at most one sapling can establish. Lottery competition determines which (if any) of the seedlings in the grid cell survives and becomes a new sapling of 1 cm DBH.
The probability that a new tree in grid cell n at time step t originates from mother tree i depends on the contribution of i to the seed pool of cell n and on f1, the inverse seedling survival rate, i.e. the number of seedlings per cell for which there is a 50% chance that one seedling survives.
The contribution of immigrant seeds to the seed pool is g2, and the probability that a new tree originates from immigrant seeds is defined analogously. The remaining probability corresponds to the case in which no seedling survives and grid cell n remains empty. The mother of a new tree is drawn from a multinomial distribution whose categories are each potential mother tree i = 1, …, I in the population and seed immigration from outside the population, with the probabilities defined above. For new trees with mother tree i inside the population, the father is drawn from a multinomial distribution whose categories are each potential father tree j in the population and pollen immigration from outside the population, with probabilities given by the relative pollen contributions of the trees j to the pollen cloud of mother i and by g1 (after normalization of these contributions and g1; Oddou-Muratorio & Davi, 2014). We assume that new trees originating from immigrant seeds have both their mother and father outside the population.
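A sketch of the lottery and parentage assignment follows. The lottery form is an assumption chosen to match the stated meaning of f1: with S the summed seed contributions to a cell (local contributions plus g2), a seedling survives with probability S/(S + f1), which is 50% when S = f1, and each origin probability is the corresponding contribution divided by S + f1. Fathers are drawn with weights equal to the unnormalized pollen contributions of the other trees plus g1; `random.choices` normalizes the weights internally. All helper names are hypothetical.

```python
import random

IMMIGRANT, EMPTY = "immigrant", "empty"

def establishment_probabilities(seed_contrib, g2, f1):
    """ASSUMED lottery: P(mother i) = s_i / (S + f1), P(immigrant) = g2 / (S + f1),
    P(empty) = f1 / (S + f1), where S = sum(seed_contrib) + g2."""
    total = sum(seed_contrib) + g2 + f1
    probs = {i: s / total for i, s in enumerate(seed_contrib)}
    probs[IMMIGRANT] = g2 / total
    probs[EMPTY] = f1 / total
    return probs

def draw_origin(seed_contrib, g2, f1, rng=random):
    """Draw the mother tree index, IMMIGRANT, or EMPTY for one grid cell."""
    probs = establishment_probabilities(seed_contrib, g2, f1)
    outcomes = list(probs)
    return rng.choices(outcomes, weights=[probs[o] for o in outcomes])[0]

def draw_father(mother, pollen_contrib, g1, rng=random):
    """pollen_contrib[j]: unnormalized contribution of tree j to the mother's pollen
    cloud (e.g. y_p(j) * k_p(j, mother)); selfing excluded; g1 = immigrant pollen."""
    candidates = [j for j in range(len(pollen_contrib)) if j != mother]
    weights = [pollen_contrib[j] for j in candidates] + [g1]
    return rng.choices(candidates + [IMMIGRANT], weights=weights)[0]
```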

| Virtual data collection
For virtual data collection, we simulated a population with parameters set to their reference values (Table 2; Lamonica et al., 2021).
The plot measured 55 m by 55 m. The model was initialized with one individual of 2 cm DBH randomly located on the grid. After running the simulation for 70 years, we recorded for each individual with a DBH > 3 cm its size, its location on the grid, its absolute growth rate in each year and its father and mother trees. We also recorded the number of seedlings per grid cell.
Simulation outputs were used to generate four different data types (Table 1). In Table 1, we indicate the time investment, material costs and skills required to collect and analyse each data type, based on expert assessments.
To evaluate the type and amount of information provided by different combinations of data types, we estimated parameters under eight data scenarios. A base scenario included only the size distribution and spatial structure; we additionally considered all possible combinations of the base data with genetic (G), dendrochronological (D) and recruitment (R) data. Each data scenario is denoted by the combination of letters representing the included data types (e.g. DG for base plus dendrochronological plus genetic data).
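The eight data scenarios are the base data plus every subset of the three additional data types (2³ = 8). A small sketch that enumerates them, with letters ordered as in the scenario labels of Figure 2:

```python
from itertools import combinations

# Additional data types that can be combined with the base data
EXTRA_DATA = {"D": "dendrochronological", "R": "recruitment", "G": "genetic"}

def data_scenarios():
    """Base scenario plus every subset of the additional data types."""
    scenarios = []
    for k in range(len(EXTRA_DATA) + 1):
        for combo in combinations(EXTRA_DATA, k):
            scenarios.append("".join(combo) or "Base")
    return scenarios

print(data_scenarios())
# prints: ['Base', 'D', 'R', 'G', 'DR', 'DG', 'RG', 'DRG']
```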

| A Bayesian framework for IBM parametrization from different data types
The statistical framework for parameter estimation (Figure 1) combines likelihood-based Bayesian inference with ABC. For the base data scenario and scenario G, we estimated all parameters using only the ABC method based on summary statistics of the size distribution and spatial structure and/or summary statistics of the genetic data.
When the data scenario included recruitment and/or dendrochronological data, we first estimated the parameters related to fecundity and dispersal (from recruitment data) and/or to growth and competition (from dendrochronological data) with likelihood-based Bayesian methods. The posterior distributions obtained for those parameters were then used as the prior distributions for the subsequent ABC parameter estimation. As original priors, we used uniform distributions that span the realistic range of each parameter (Table 2).
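One way to implement this hand-over from the likelihood-based step to ABC is sketched below. It assumes (this is not stated in the text) that the first-step posterior is represented by its MCMC draws and that whole draws are resampled, which keeps posterior correlations among, for example, growth and competition parameters. Parameter names and bounds in the usage are hypothetical; the actual priors are those of Table 2.

```python
import random

def make_prior_sampler(uniform_bounds, posterior_draws=None, rng=random):
    """Return a function that draws one parameter combination for ABC.
    uniform_bounds: {name: (low, high)} original uniform priors.
    posterior_draws: optional list of dicts (MCMC draws from step 1); parameters
    present there are resampled jointly from one draw, replacing their uniform prior."""
    def sample():
        theta = {name: rng.uniform(lo, hi) for name, (lo, hi) in uniform_bounds.items()}
        if posterior_draws:
            theta.update(rng.choice(posterior_draws))
        return theta
    return sample

# Hypothetical usage: c1 keeps its uniform prior, a1 comes from step-1 draws
sampler = make_prior_sampler({"c1": (-5.0, 5.0), "a1": (0.0, 2.0)},
                             [{"a1": 0.7}, {"a1": 0.8}], random.Random(0))
print(sampler())
```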

| Likelihood-based parameter estimation
Dendrochronological data were used to estimate posterior distributions for the parameters of growth (a1, a2, a3, a4) and competition (b1, b2).
The data comprised the growth increments y_g(i, t) recorded for each living tree i in all years t and the sizes x(i, t) (calculated as summed growth increments). The likelihood of each observed growth increment y_g(i, t) was calculated from a log-normal distribution, where parameter a4 is the standard deviation and the expected growth on the log scale is predicted by the growth model (Equation 1).
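The growth likelihood can be written as a short log-likelihood function. This is a sketch: the expected log growth for each tree-year (`mu_log`) would come from the growth model, and the argument layout is a hypothetical choice.

```python
import math

def lognormal_loglik(increments, mu_log, sigma):
    """Log-likelihood of positive growth increments y with log(y) ~ Normal(mu, sigma).
    increments and mu_log are parallel sequences (one entry per tree-year)."""
    ll = 0.0
    for y, mu in zip(increments, mu_log):
        z = (math.log(y) - mu) / sigma
        # log-normal density: 1 / (y * sigma * sqrt(2 pi)) * exp(-z^2 / 2)
        ll += -math.log(y * sigma * math.sqrt(2 * math.pi)) - 0.5 * z * z
    return ll
```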
Recruitment data were used to estimate posterior distributions for the parameters of fecundity (d1, d2), seed dispersal (e2) and seed immigration (g2). The data comprised the number of seedlings Z_n recorded in each grid cell n in the last simulated year. The likelihood of the observed number of seedlings Z_n was calculated from a Poisson distribution whose mean is the sum, over all I trees, of the contribution of tree i to the seed pool of cell n (Equation 12) plus the contribution g2 of immigrant seeds to the seed pool.
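Similarly, a sketch of the seedling-count likelihood, where `contrib[i][n]` (a hypothetical layout) holds the contribution of tree i to the seed pool of cell n:

```python
import math

def expected_counts(contrib, g2):
    """Per-cell Poisson mean: summed contributions of all trees plus immigration g2."""
    n_cells = len(contrib[0])
    return [sum(row[n] for row in contrib) + g2 for n in range(n_cells)]

def poisson_loglik(counts, expected):
    """Sum over cells of log P(Z_n | lambda_n) for Z_n ~ Poisson(lambda_n), lambda_n > 0."""
    return sum(z * math.log(lam) - lam - math.lgamma(z + 1)
               for z, lam in zip(counts, expected))

lam = expected_counts([[1.0, 2.0], [3.0, 4.0]], 0.5)  # two trees, two cells
print(lam, poisson_loglik([4, 7], lam))
```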
Likelihood-based parameter estimation used the rjags package (Plummer, 2009). Three independent chains with different initial conditions were run in parallel using the snow and dclone packages (Sólymos, 2010; Tierney et al., 2016). The chains were run for 50,000 iterations, and convergence was verified with the Gelman-Rubin convergence diagnostic (Gelman & Rubin, 1992), requiring the 95% quantile of the potential scale reduction factor to be below 1.02.
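For readers unfamiliar with the convergence criterion, the point estimate of the potential scale reduction factor can be computed as below. This sketch implements the basic Gelman-Rubin formula for one parameter; the 95% quantile used as the criterion in the text additionally requires a sampling-distribution correction (as reported, for instance, by R's coda package), which is not reproduced here.

```python
import random
from statistics import mean, variance

def psrf(chains):
    """Point estimate of the Gelman-Rubin potential scale reduction factor.
    chains: list of equal-length lists of post-burn-in draws for one parameter."""
    n = len(chains[0])
    w = mean([variance(c) for c in chains])       # mean within-chain variance
    b = n * variance([mean(c) for c in chains])   # between-chain variance
    var_hat = (n - 1) / n * w + b / n             # pooled posterior variance estimate
    return (var_hat / w) ** 0.5

rng = random.Random(1)
same = [[rng.gauss(0, 1) for _ in range(2000)] for _ in range(3)]
shifted = [[rng.gauss(mu, 1) for _ in range(2000)] for mu in (0, 0, 3)]
print(round(psrf(same), 3), round(psrf(shifted), 3))
```

Chains that sample the same distribution give values close to 1, whereas a chain stuck in a different region inflates the between-chain variance and pushes the factor well above 1.02.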

| Approximate Bayesian computation
For ABC, we sampled 100,000 parameter combinations from the prior distributions and simulated the model for each parameter combination (Lamonica et al., 2021). We then computed summary statistics for each simulated population and compared them with the summary statistics calculated from the data. Five hundred parameter combinations (0.5% of the simulations) were accepted, and the posterior distributions were obtained using the local linear regression adjustment of the abc package (Csilléry et al., 2012). For size distribution and spatial structure (Base), we computed six summary statistics: basal area, the standard deviation and 9th decile of the size distribution, tree density, a clumping index (standard deviation of tree density across 16 m² plots) and the coefficient of the regression of individual size against the mean of the distances to neighbours divided by neighbour sizes. For genetic data (G), we computed eight summary statistics: the mean distance between offspring and mother trees, the mean distance between father and mother trees, the allometric exponent of the offspring number-size relationship, the percentage of local mother trees, the percentage of local father trees, the mean maternal and paternal sibship size
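The rejection and regression-adjustment step can be illustrated with a toy problem. The sketch below is reduced to one parameter and one summary statistic (the study used several of each) and mimics the local linear method: accept the simulations closest to the observed statistic, weight them with an Epanechnikov kernel, regress the parameter on the statistic and shift each accepted value to the observed statistic. The toy simulator and its settings are hypothetical and unrelated to the IBM.

```python
import random

def abc_loclinear(simulate, prior_sample, s_obs, n_sims=100_000, n_accept=500, rng=random):
    """Rejection ABC plus local-linear regression adjustment (one parameter, one statistic).
    Returns the adjusted accepted parameter values and their Epanechnikov weights."""
    sims = []
    for _ in range(n_sims):
        theta = prior_sample(rng)
        sims.append((theta, simulate(theta, rng)))
    # keep the n_accept simulations whose statistic is closest to the observed one
    sims.sort(key=lambda ts: abs(ts[1] - s_obs))
    accepted = sims[:n_accept]
    h = abs(accepted[-1][1] - s_obs) or 1e-12  # tolerance = largest accepted distance
    w = [1 - ((s - s_obs) / h) ** 2 for _, s in accepted]  # Epanechnikov weights
    sw = sum(w)
    s_bar = sum(wi * s for wi, (_, s) in zip(w, accepted)) / sw
    t_bar = sum(wi * t for wi, (t, _) in zip(w, accepted)) / sw
    cov = sum(wi * (s - s_bar) * (t - t_bar) for wi, (t, s) in zip(w, accepted))
    var_s = sum(wi * (s - s_bar) ** 2 for wi, (_, s) in zip(w, accepted))
    beta = cov / var_s if var_s > 0 else 0.0
    # shift each accepted parameter value to what it would be at s = s_obs
    return [t - beta * (s - s_obs) for t, s in accepted], w

def simulate(theta, rng):
    """Toy model: summary statistic = mean of 20 Normal(theta, 1) observations."""
    return sum(rng.gauss(theta, 1.0) for _ in range(20)) / 20

rng = random.Random(7)
post, w = abc_loclinear(simulate, lambda r: r.uniform(0, 10), s_obs=4.0,
                        n_sims=20_000, n_accept=500, rng=rng)
post_mean = sum(wi * t for wi, t in zip(w, post)) / sum(w)
print(round(post_mean, 2))
```

With several summary statistics, the statistics would also need to be rescaled before computing distances and the regression would become multivariate.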

| Evaluation
To compare data scenarios in terms of predictive accuracy, we used

| Parameter estimation
The combination of all data types (DRG) led to precise and accurate estimates of most model parameters (Figure 2, Supporting Information Table A1). The exceptions were the parameters describing adult mortality (c1 and c2) and the mean pollen dispersal distance (e1), whose posterior distributions were rather wide. The posterior of the inverse seedling survival rate (f1) was narrow but biased. The base data by themselves provided little information on model parameters, yielding posteriors that were only slightly narrower than the prior distributions.
With genetic data (G), we obtained more precise estimates of the size at maturity d1 and of the immigration parameters (g1 and g2) than with the base scenario. Additionally, uncertainties were slightly reduced for growth parameter a2 and for competition parameter b1. Dendrochronological data (D) led to very precise and accurate estimates of all growth and competition parameters (a1, a2, a3, a4 and b1, b2). Uncertainties were also reduced for pollen and seed immigration (g1 and g2).
FIGURE 2 Posterior distributions of model parameters obtained under different data scenarios (Base: only size distribution and spatial structure, G: base plus genetic, R: base plus recruitment, RG: base plus recruitment plus genetic, D: base plus dendrochronological, DR: base plus dendrochronological plus recruitment, DG: base plus dendrochronological plus genetic, DRG: all data). The blue lines represent the 'true' values of each parameter that were used for data simulation. See Table 2 for parameter definitions.
Recruitment data (R) were informative only about seed fecundity (d2) and seed dispersal (e2).
Combining genetic and dendrochronological data (DG) improved the estimation of parameters related to immigration and to seed fecundity and dispersal. The combination of genetic and recruitment data (RG) reduced uncertainty in the immigration parameters.
Combining recruitment and dendrochronological data (scenario DR) did not improve estimates, except for the inverse seedling survival rate f1.

| Forecasts of future dynamics
Forecasts of future population state are one way to summarize the information content of data scenarios across all model parameters.
All data scenarios yielded largely unbiased forecasts of population density (Figures 3 and 4, Supporting Information Table A2) as well as of the size and genetic structure of populations 40 years into the future (Figure 3). Forecasts of density (number of trees and basal area) and size structure (size distribution and offspring-size relationship) were least uncertain when dendrochronological data were used for model parametrization (Figures 3 and 4). Forecasts of the proportion of local mothers and fathers became more certain when genetic data were used in combination with other data. However, forecasts of the proportion of local mothers were also improved by using only dendrochronological data. In contrast, recruitment data helped little to reduce forecast uncertainty (Figures 3 and 4). We found similar results for reconstructions of past population dynamics (from 0 to 70 years, Supporting Information Figures A1 and A2). Overall, forecasts of future dynamics were less uncertain than reconstructions of past dynamics, presumably because forecast simulations were all initialized with exactly the same starting point (the 'true' population structure in year 70) and were run over only 40 years.

| DISCUSSION
In this study, we developed a framework for parametrizing an IBM for establishing tree populations from different data types by combining ABC and likelihood-based estimation methods (Figure 1). When comparing the value of different data types for model parametrization, we found that dendrochronological data were particularly informative, although genetic and recruitment data also improved estimates of certain parameters (Figure 2). Reliable forecasts of future population structure and density could be obtained from dendrochronological and genetic data, whereas recruitment data contributed little to improving forecasts (Figures 3 and 4). This holds not only for population states at the time horizon of forecasts (Figure 3) but also for the temporal dynamics up to this time point (Figure 4). However, even when combining all data types, we did not obtain reliable estimates of adult mortality and seedling survival (Figure 2). This is because these parameters partly compensate each other in their effect on the summary statistics used for model parametrization, causing moderately strong correlations between parameter estimates (Supporting Information Figure A3). Thus, different parameter combinations lead to the same population state: for example, high adult mortality combined with a high seedling survival rate would lead to a size distribution and spatial structure similar to those produced by lower adult mortality and a lower seedling survival rate. Another limit to inference occurs for the mean pollen dispersal distance e1. This parameter could not be estimated reliably because the data were not informative (only a few father trees from the population produced offspring) and because the true value (50 m) is close to the side length of the study site (55 m). Moreover, the sensitivity of the summary statistics to this parameter was quite low.
We obtained reliable forecasts of population dynamics even though the marginal posterior distributions of adult mortality and seedling survival parameters were wide (Figures 3 and 4, Supporting Information Figures A1 and A2). This is due to the above-mentioned correlation of parameter estimates. It may not be necessary to resolve this trivariate parameter uncertainty (among c1, c2 and f1) if one is interested in forecasting population dynamics rather than in learning about specific parameters (Sirén et al., 2018). However, if one needs precise estimates of individual parameters, certain data types are more valuable than others. In our study, for instance, dendrochronological data led to good forecasts but provided almost no information on fecundity parameters, whereas recruitment data did.
When designing data collection to infer the dynamics of establishing tree populations, dendrochronological data are the first choice (in addition to data on tree size distribution and spatial structure), given their high information content and relatively low costs in terms of time, money and skills (Table 1). Combining either recruitment or genetic data with dendrochronological data informed fecundity, dispersal, recruitment and/or immigration processes, thus further reducing uncertainty. This combination of data types leads to better forecasts of population structure because different data types inform about different processes. Yet when focusing on a single parameter related to a given process, combining data types did not lead to a better estimate than using only the data type related to that process.
Our analysis can help steer future data collection depending on (a) data costs and available skills, (b) the processes and parameters of interest and (c) available prior information and knowledge about the study system. For instance, if the studied population is known to be very isolated or if the seed dispersal distance is unknown, it may be most informative to complement dendrochronological data with recruitment data. Conversely, if parenthood relationships are of major interest, it may not be necessary to collect recruitment data in addition to genetic data, because the latter will provide enough information on parenthood.
We considered a 'best-case scenario' of data-driven modelling, in which (a) each data type is sampled exhaustively, (b) there are no measurement errors and (c) the 'true' model is fitted to the data. We chose this best-case approach because it enabled us to quantify the potential value of each data type. Exhaustive sampling of genetic data is necessary because it may allow the complete pedigree to be retrieved (Jones & Ardren, 2003). The ability to reconstruct parentage and accommodate genotyping errors depends on the resolution of the genetic data (number of single nucleotide polymorphisms or multi-allelic microsatellites) and on the method used to reconstruct parentage (exclusion, relatedness-based or likelihood-based methods) (Huisman, 2017). Genomics is a rapidly moving field, both in terms of resolution and data analysis, so that one can expect the quality of reconstructed pedigrees to get closer to the best-case scenario considered here.
FIGURE 3 Forecasts of the state of tree populations 40 years into the future. Forecasts of eight population characteristics were generated by an IBM parametrized with different data scenarios (see Figure 2), from the prior distribution of parameters (grey) and with the 'true' parameter values used for data simulation (blue). Each boxplot represents variation in forecasts for 500 replicate simulations of the stochastic simulation model.
Exhaustive sampling of dendrochronological data is also necessary because it allows past competition neighbourhoods to be inferred (Lamonica et al., 2020). Importantly, exhaustive sampling of genetic and dendrochronological data is feasible for populations comprising several hundred individuals (Gerzabek et al., 2020; Lamonica et al., 2020). For recruitment data, censusing the entire patch area might be too time-consuming, so one would rather sample a fraction of the patch. Yet reliable estimates of spatial recruitment parameters can be obtained by sampling less than 1% of the patch area (Schurr et al., 2008).
The estimation of IBM parameters with summary statistics-based ABC methods can lack efficiency (Hartig et al., 2014). Here, we found that the two-step combination of direct likelihood-based estimation and ABC provided very informative posterior distributions of certain parameters.
Specifically, likelihood-based methods provided posterior distributions for submodels, which were then used as priors for ABC estimation of the full IBM. We obtained very good estimates of growth and competition parameters because dendrochronological data directly result from the growth-competition process. In contrast, other data types result from the combination of several processes, which increases uncertainty of individual parameter estimates but means that these data inform about a wider range of parameters (for instance, genetic data provide modest information about growth, competition, adult mortality and fecundity).

| CONCLUSIONS
Inferring past dynamics and parametrizing predictive models from data collected at a single time point is a prerequisite for forecasting the population dynamics of long-lived organisms, because there is not enough time to monitor the long-term dynamics of these organisms, especially under global change. In the case of trees, dendrochronological and genetic data make it possible to reconstruct not only individual growth trajectories but also past population dynamics. While tree rings are specific to trees in climates with annual growth cycles, various other sessile organisms have morphological structures that can serve the same purpose, for instance annual growth rings in herb roots (Dietz & Ullmann, 1998), coral growth rings (Marschal et al., 2004) and annual stem growth increments in certain shrubs (Carlson et al., 2011). Annual growth increments can also be inferred from the otoliths of fish (Rountrey et al., 2014).
With annual growth data, one can reconstruct individual growth trajectories that can help to infer population dynamics. The presented framework for data-driven modelling and forecasting of tree population dynamics could thus be modified to describe the population dynamics of other ecologically important long-lived sessile organisms (whereas application to mobile organisms such as fish would require major modifications in submodels for competition, reproduction and dispersal). Moreover, it will be exciting to extend the framework beyond population dynamics to the modelling of community and range dynamics (Evans et al., 2016; Pagel et al., 2020; Schurr et al., 2012), and the forecasting of invasion dynamics and species range expansions (Hastings & Wysham, 2010; Nathan et al., 2011).

FIGURE 4 Forecasts of tree density (top) and basal area (bottom) over a time horizon of 40 years. Columns represent different data scenarios (see Figure 2). Lines represent medians and areas the 95% credibility intervals of forecasts obtained for prior distributions (light grey), posterior distributions under the given data scenario (orange) and 'true' parameter values used for data simulation (blue).
analysed the data and led the writing of the manuscript. All authors contributed critically to the drafts and gave final approval for publication.

PEER REVIEW
The peer review history for this article is available at https://publons.com/publon/10.1111/2041-210X.13656.
[Correction added on 18 Aug 2021, after first online publication: Peer review history statement has been added.]

DATA AVAILABILITY STATEMENT
All data can be found in the Zenodo data repository (Lamonica et al., 2021), and R code for IBM simulation and parameter estimation is publicly available at the following address: https://gitlab.com/lamonica_/data-predicting-the-dynamics-of-establishing-tree-populationsa-framework-for-statistical-inference-and-lessons-for-data-collection.