Tree regeneration in models of forest dynamics: A key priority for further research

Tree regeneration is a key process in forest dynamics, particularly in the context of forest resilience and climate change. Models are pivotal for assessing long-term forest dynamics, and they have been in use for more than 50 years. However, there is a need to evaluate their capacity to accurately represent


INTRODUCTION
Forests provide a wide range of products and services of vital importance to humankind (FOREST EUROPE, 2020).Under the pressure of climate change, increasing disturbance impacts, and changing societal demands on forest ecosystem services, it is becoming ever more important to understand how forest structure, composition, and function will change and to evaluate forest capacity to adapt to or be resilient after disturbance (Lindner et al., 2010;Seidl & Turner, 2022).A wide range of models of forest dynamics have been developed over the past decades considering the impacts of climate (Bugmann & Seidl, 2022).From these studies, it is evident that we have a reasonably good understanding how to model tree growth (Bugmann et al., 1996;Vanclay & Skovsgaard, 1997), and substantial efforts have been dedicated to improving the representation of tree mortality (Bugmann et al., 2019;Cailleret et al., 2017).In contrast, tree regeneration is much less studied and is often represented rather coarsely in models (Leishman et al., 1992;Price et al., 2001;Walck et al., 2011).This presents an important research gap, particularly in the context of climate-induced forest disturbances and forest resilience.
Tree regeneration arises from multiple processes including pollination, fruit maturation, seed production, dispersal, germination, juvenile growth, and survival (Price et al., 2001;Vacchiano et al., 2018).All these processes are difficult to assess, and some of them are scarcely understood and thus appear highly stochastic (Bogdziewicz et al., 2021).Identifying the appropriate level of complexity for the mathematical formulation of the key factors that are leading to successful tree regeneration is challenging.Currently, tree regeneration processes in dynamic forest models are handled in a multitude of ways (Bugmann & Seidl, 2022;König et al., 2022): from (1) entirely ignoring them (as done in classical forest growth models, for example, Pretzsch et al., 2002), across (2) the use of a few simple environmental filters, as done in most forest "gap" models (Shugart, 1984) and Dynamic Global Vegetation Models (e.g., Hickler et al., 2012;Smith et al., 2001), to (3) complex approaches that incorporate local feedback from the canopy, multiple ecological processes, and often also short time steps (e.g., Seidl et al., 2012;Wehrli et al., 2006), or (4) field-based statistical parameterizations, which however are not easy to extrapolate in space and time (e.g., Ribbens et al., 1994).
Overall, models are needed to (1) synthesize existing empirical data and explore their relationships, (2) assess future tree regeneration, for example, in the context of global change scenarios, and (3) identify the most important processes that are shaping vegetation patterns.Given the strategies that are used in models of forest dynamics to represent tree regeneration, their behavior often is prone to problems, such as very high levels of tree regeneration that necessitate excess mortality at early stages of tree life to simulate correct stand structure and composition (Kroiss & HilleRisLambers, 2015).Also, inadequately high species diversity in tree regeneration may be simulated, which is characteristic of "classical" forest gap models (Gutiérrez et al., 2016), at least as long as the simulation set-up comprises a multitude of species.Some models use calibration against local canopy-level data to constrain simulated tree regeneration, which is likely to hamper the general applicability of these models, for example, under scenarios of climate change.Furthermore, correctly capturing the species composition of tree regeneration as a function of the presence of seed trees in the canopy is often a particular challenge, potentially leading to unrealistic successional drift in the model, which must be corrected by factors that are hard to parameterize (e.g., Lischke & Löffler, 2006).A related issue is the excessive reduction of species diversity due to positive feedback effects such that eventually just single-species stands remain (Meier et al., 2011).This is sometimes corrected by the incorporation of a low level of seed influx of all species at all times (Schumacher et al., 2006) or by restricting the number of seeds per species in the seedbank (Lischke & Löffler, 2006).However, simulated species composition is usually exceedingly sensitive to assumptions about seed availability, whereas the parameters of such functions are poorly constrained by field data.Lastly, there is often a problem with insufficient observational constraints on parameter values for models that start from very small tree sizes (e.g., 10 cm in height, or even from seed) and track tree development in a process-oriented manner by considering a multitude of ecological influences, rather than emphasizing tree recruitment into a larger size class (e.g., Wehrli et al., 2006).
Modeling tree regeneration processes is challenging, and even in empirical ecology, it has not received much attention (Hanbury-Brown et al., 2022), although many subprocesses have been studied in detail (Miina et al., 2006).Yet, little data are available that cover all the processes within one species along environmental gradients, let alone for a vast suite of species.Data on forest regeneration are often fragmented, which constitutes a major problem for model building (Clark et al., 1999).For example, monitoring on permanent plots (such as National Forest Inventories) often measures the ingrowth of new trees into a specific size class (Zell et al., 2019), however with a design that captures the rate after the stand initiation phase (Hallsby et al., 2015).Data from permanent plots, although available across regions, are also highly heterogeneous, rendering their use difficult in a modeling context (Käber et al., 2021).Lastly, targeted studies to measure tree regeneration on experimental sites are often limited in spatial extent, species studied, or the subset of processes that are investigated (Berdanier & Clark, 2016;Collet & Chenost, 2006).
Thus, a focus on the modeling of tree regeneration processes is sorely needed and overdue (Price et al., 2001;Walck et al., 2011) if we are to make reliable projections of future forest dynamics, that is, when the models need to be operated in extrapolation mode, as well as from a fundamental ecological point of view for increasing systems understanding.In the present study, for the first time, a large number of forest models commonly used to assess forest dynamics under climate change are evaluated against a continental-scale, multispecies harmonized dataset on tree recruitment (Käber et al., 2023).By tree recruitment, we refer to the passing of trees across a specific diameter threshold ("ingrowth").We included models that are based on a range of "philosophies," from models operating at the stand to the global scale as well as the range of models from empirically derived to "process-based" (e.g., Bugmann et al., 1996;Fabrika & Ďurský, 2012;Lexer & Hönninger, 2001;Reyer et al., 2014).
Due to the large variability in tree regeneration patterns in nature and the large number of factors driving and constraining this process-including some that are not incorporated explicitly in most models, such as deer browsing-we do not aim for a detailed statistical evaluation of each model.Instead, we aim to evaluate the general recruitment patterns and magnitudes simulated by the models and benchmark the simulated regeneration niche of multiple species against empirical data along a wide environmental gradient of temperature, moisture, and light availability (Grubb, 1977).
More specifically, we aim to answer the following questions: (1) Are models of forest dynamics capturing accurately tree recruitment levels, initial tree species diversity, and mortality in the recruitment?(2) Do model traits explain differences in model performance?(3) How well do the models capture total recruitment and the regeneration niches of individual species across environmental gradients of light availability, temperature, and soil moisture?
We evaluate the performance of the models by analyzing simulated data alongside observed data.Specifically, we examine tree species diversity, mortality rates in tree recruitment, and recruitment levels.Then, we link model traits (e.g., complexity and scale) to performance, and we assess whether modeled total recruitment or individual species regeneration niches (i.e., proportional recruitment along an environmental gradient) align with observational data.

Models
Fifteen models of forest dynamics (including two models featuring model variants) developed for the stand, landscape, or global scale were used to simulate forest dynamics (Table 1).The approaches used for model construction and their origin differ strongly, with most of the models featuring a largely "process-based" approach, whereas two models are based on formulations derived from the statistical analysis of inventory data (SIBYLA and xComp).Some of the models largely rely on the original approach underlying forest gap models (e.g., ForClim 1), whereas some are based on plant ecophysiological processes (e.g., FORMIND and iLand).The different approaches underlying the models have strongly influenced the formulation of tree regeneration processes.
The models can be differentiated into "regeneration" and "recruitment" approaches (König et al., 2022;Vanclay & Jerome, 1994).Regeneration models include processes such as flowering and pollination, seed production, seed dispersal, germination, and seedling growth, which ultimately lead to the simulated number of established trees.Recruitment models, in contrast, introduce a number of new trees with certain characteristics such as biomass or diameter, without explicitly considering earlier development processes.We can further distinguish models that feature a feedback from stand properties to simulated recruitment, that is, where the level and species composition of recruitment are influenced by the adult tree community via the production of seeds (or seedlings/saplings), from models that do not contain such feedback.
The starting point for tree regeneration in the models differs as well, ranging from seedbank, seed, or seedling to sapling (i.e., trees much larger than 10 cm height, often ca.2-3-m tall).Models that start from seed need to include a larger number of ecological processes such as germination and survival of young seedlings, whereas models that start from saplings have to aggregate via parameterizations several ecological processes that are not treated explicitly.This latter approach reduces model complexity but comes at the cost of blurred process representation.
In the models used here, the overall complexity in the regeneration modules varies considerably.Following Bugmann and Seidl (2022), we can classify seven models (iLand, PICUS, LandClim, ForCEEPS, LPJ-GUESS, ForClim, and TreeMig) as having rather high complexity in their regeneration modules (mean regeneration formulation complexity across all processes >0, Table 1), whereas the other eight models feature relatively low complexity.Two models, ForClim and ForCEEPS, were used here with two alternative variants of regeneration.ForClim variant 1 (Bugmann et al., 1996) is based on a regeneration module that adheres closely to the concept introduced by Botkin et al. (1972) which is based on environmental filters considering species-specific thresholds of light availability and climatic variables.In contrast, ForClim variant 11 is adopting a slightly more complex approach where individual species properties and their relation to the environment are represented gradually along with the relative suitability of different species (i.e., competition among tree species) (Huber et al., 2020).These two model variants allow us to evaluate a more process-based and complex module (variant 11) against a simple module (variant 1), while the rest of the model structure is identical.Similarly, the two ForCEEPS variants allow us to isolate the importance of the canopy feedback (i.e., simulated actual composition and relative abundance of species in the plot) via seed trees for the quantity and quality (e.g., diversity and composition) of simulated regeneration, as one variant includes this feedback, whereas the other does not.

Observed data
Recruitment data covering a wide range of environmental conditions are hard to obtain, and this is one of the reasons why most models of forest dynamics have never been confronted with a dataset covering such gradients over a large number of sites, to evaluate how well regeneration is captured.The observations used here are derived from a novel network of sites in forest reserves that represent the range of environmental gradients in temperature and precipitation in Central Europe as compiled in the EuFoRIa network (EuFoRIa, 2019;Käber et al., 2023) (Figure 1).The environmental coverage and the variety of forests represented by this data are unprecedented within Europe.These forests have been unmanaged for at least 10 years prior to their designation as reserves, T A B L E 1 Models included in this study and their characteristics and simulation strategies used.

Note:
Scale refers to the models scale, Type to the models type, Population structure to the model population approach, Feedback to the inclusion or not of adult tree feedback to the level and species composition of regeneration, Approach to the model regeneration module approach, Start from to the stage from which trees are recruited, Complexity, to the mean regeneration formulation as estimated in Bugmann and Seidl (2022), Species to which species from the original 11 species were simulated, Runtime (spin-up) (years) to the number of years used per model in their simulations for the spin-up period and regeneration sampling, and Climate data type (use of years) to the type of data used in the simulations and how it was used.Reference provides the link to the publication with more model details. and most of them provide time series of natural forest dynamics over multiple decades and up to ca. 85 years.
The census periods range from 3 to 37 years, with an average of 14 years.The data provide information at the tree level, thus allowing for the sequential comparison of processes such as individual-tree recruitment and death between the measurements.We selected 200 sites from this network as the benchmarking dataset for the simulation, so as to be representative of the environmental variation contained in the data.This was achieved by applying k-means clustering to define 200 clusters of plots from the original set of 869 plots along the environmental dimensions of temperature, climatic water balance, soil quality, slope and aspect.Each of these sites featured at least two consecutive measurements.Recruitment size thresholds ranged from stem dbh from 0 to 10 cm.For our study, we defined two datasets, where one included 165 sites with a diameter threshold of 7 cm or lower, and the other included another 35 sites with diameter thresholds between 7 and 10 cm.Therefore, we had in total 200 sites with observations of newly recruited trees above 10 cm and a subset of 165 sites with recruitment data above 7 cm.In the Results, any observations from the data at the 7-cm threshold originated from 165 sites.
Plot size ranged from 0.02 to 5.52 ha, and the data were further processed and aggregated following Käber et al. (2023) to provide recruitment rates per hectare and per decade.The observations used in this study featured 30,900 newly established trees.Recruitment rates per site, sample, and decade ranged from 0 to 1246 trees, with a mean of 56 trees.Adult species composition was also available for each plot.For more details on this dataset and the detailed recruitment information, cf.Käber et al. (2023).

Simulation protocol
The overarching goal of the simulation experiments was to assess tree recruitment as it arises from empirical data against its representation in a wide range of models of forest dynamics.We define tree recruitment as the passing of a breast height diameter threshold of 7 and 10 cm (synonym: ingrowth).To this end, each modeling group was provided with a detailed protocol (Díaz-Y añez et al., 2022) with instructions on how to perform the simulations, input variables on climate and soil conditions, and the list of expected output variables.Neither were further site information (except for the data specified below) nor any data on tree recruitment or forest stand features provided prior to the simulation.
The input variables were collected from different data sources and aggregated to be suitable for the needs of the different models.Time series of climatic variables from 1981 to 2018 were provided in hourly (Era5-land data, Muñoz Sabater, 2019), daily, and monthly resolutions (CHELSA data, Karger et al., 2021).Some variables required for some models, such as relative humidity or vapor pressure deficit, were calculated from these variables.The final instructions for using the climate data were slightly different in each model, based on the approach that best suited the model (Table 1).Soil quality data were provided as continuous values between 1 and 5 (Soilgrids dataset, Hengl et al., 2017).The protocol also provided the elevation, slope, and aspect for each of the 200 sites (ASTER Science Team, 2019), but no other spatial information such as coordinates, with the exception of iLand and aDGVM2, which required blurred coordinates to derive highly detailed soil data.
The simulations were run in the absence of natural disturbances.Only the model LPJ-GUESS had to include a "background" disturbance probability to increase chances of shade-intolerant species to establish.The simulations were set up to sample species-specific recruitment rates per decade and per hectare in the equilibrium state of the model, typically entailing a "spin-up" run (as we did not provide any forest data).The modeling teams decided on the simulated area and how they derived these samples.The exact length of the simulation was also decided by the modeling teams (Table 1).Further details on how each modeling team prepared the simulations and the outputs are available in Appendix S1: Model reports.
The simulations were run in the absence of management to a simulated equilibrium ("Potential Natural Vegetation") with the current climate, as the ultimate goal was to evaluate tree recruitment under comparable and near-equilibrium conditions.This entails the assumption that (1) the observations from the forest reserves reflect no traces of forest management and (2) there is an equilibrium between forest dynamics and climate.While the former might be starting to be visible in many of the EuFoRIa reserves, the latter may be more debatable.However, in the absence of detailed data on the history of each plot in the EuFoRIa network, some broad assumptions had to be made.Both the width of the regeneration niche (i.e., in environmental space) as well as the intensity of recruitment (i.e., the number of recruited trees per area and per unit of time) were of interest.
The simulations were run for mixed-species forests (not multiple single-species simulations) using mixtures of 11 species or genera for which regeneration data of sufficient quality were available from EuFoRIa: Fagus sylvatica L., Picea abies L., Abies alba Mill., Carpinus betulus L., Tilia cordata Mill., Acer pseudoplatanus L., Betula spp.L., Fraxinus excelsior L., Quercus spp.L., Alnus glutinosa L., and Pinus sylvestris L. The same set of species was used at all 200 sites.Two models included their standard set of species for the simulations, which is much larger (i.e., ForClim 1, ForClim 11, and TreeMig), and the species simulated beyond the 11 species were lumped in a category "others."In three models, fewer than these 11 species were simulated (4C, xComp, and LPJ-GUESS) (Table 1).In 4C, only Fagus sylvatica, Picea abies, Betula spp., Quercus spp.and Pinus sylvestris are parameterized.xComp simulations did not consider Fraxinus excelsior due to a deprecated species parameterization.In LPJ-GUESS, Acer pseudoplatanus and Alnus glutinosa are not parameterized, and therefore, these species could not be included in the simulation.Finally, the model aDGVM2 does not simulate individual species.Rather, community assembly processes and trait filtering generate plant communities that are adjusted to the local biotic and abiotic conditions, and the simulated plants can be classified into ecological strategies based on their trait values in a post-processing step.
Each of the models reported the number of new trees crossing the two diameter thresholds by sampling 200 times in a 10-year interval for each species and per hectare for each of the 200 sites.Multiple samples per site were used to better understand the simulated variation within each site.This was done using different strategies, depending on the model, including (1) sampling simulated data from the same 1-ha plot in the equilibrium over time, (2) sampling several 1-ha plots from the simulated forest at one specific point in time (in the equilibrium) or (3) a combination of ( 1) and ( 2).This resulted in 880,000 observations per model that simulated the 11 species included in the protocol (200 sites, 200 samples per site, 11 species, and two diameter thresholds).For the models that simulated additional species, their regeneration rates were aggregated as "others." Two models did not provide results from all the simulated samples or sites to avoid unrealistic results, as follows.In the model 4C, a threshold of a maximum basal area of 90 m 2 ha − 1 was used to avoid unrealistic stand basal area data, and therefore, not all the samples and sites were reported.The reason for this is that 4C is not suitable for long-term simulations without management, due to misrepresentations in density-dependent mortality processes in long-term simulations and assumptions of tree geometry that lead to unrealistic single tree dimensions of very old individuals.LPJ-GUESS had 2% of the sample outputs without tree recruitment, and these were considered as zero stand basal area and zero regeneration for all the tree species simulated; one site produced grassland rather than a forest and was not included in the results.

Data analysis
The simulation results were analyzed regarding (1) recruitment levels (i.e., ingrowth number per unit time and space), (2) recruitment species diversity, (3) recruitment mortality, (4) the relationship between model performance and model traits, and (5) ingrowth gradients along the regeneration niches.We evaluated species diversity across the models and in relation to the observed data by calculating the Shannon index H based on the relative proportion of the species in terms of basal area.It was calculated for the regeneration (H R n ) (Equation 1) and at the stand level (H S n,s ) (Equation 3).The higher the value of the index, the higher the species diversity at a particular site and sample.
where s is the total number of species present that have a basal area larger than zero in sample n; p R n,i is the proportion of species i in sample n calculated as the regeneration basal area (r BA n,i ) for that species i relative to total recruited basal area (TotalR BA n ) of the sample n; p S n,i is the proportion of species i calculated as the basal area of all trees (BA n,i ) of that species i relative to total basal area (Total BA n ) of the sample n.Species diversity was not assessed for aDGVM2 as this model does not simulate individual species.Mortality in tree recruitment was assessed based on the ratio of recruitment between the 7-and 10-cm-diameter thresholds.We used the Reineke self-thinning rule (Pretzsch & Biber, 2003;Reineke, 1933) as a reference to estimate whether the ratio of recruitment between these two thresholds was above or below the expected theoretical rate.The Reineke self-thinning rule is usually calculated for even-aged, single-species stands and is based on a fixed relationship between the number of stems and the quadratic mean diameter in fully stocked pure stands.The value used in our comparisons was 1.77 (i.e., we expect stem numbers at 7 cm to be 77% higher than at 10 cm), calculated using Equation (5).

Reineke ¼ N7 N10
− 1:605 : We assessed model performance in relation to model traits focusing on (1) model complexity as defined by Bugmann and Seidl (2022), (2) model type (empirical or process-based), (3) the presence or absence of a canopy feedback for regeneration, and (4) the scale of application of the model (stand, landscape, or global).We tested for significant differences between simulated and observed values of the recruitment levels and species diversity by using t tests with the Bonferroni correction for multiple testing.Note that the t tests do not present conclusive evidence for a specific hypothesis but rather facilitate managing the extensive number of comparisons discussed in our study.
Total recruitment and the regeneration niches of the individual species were evaluated across the gradients of light availability, temperature, and soil moisture as captured in the data.Stand basal area (the basal area of all the trees in each sample above the recruitment threshold) was used as a proxy for light availability at the forest floor, the annual degree-day sum (Allen, 1976;Fischlin et al., 1995) as a proxy for growing season warmth, and the climatic water balance as a proxy for soil moisture (Speich, 2019).Recruitment values of each species were calculated as the mean across the 200 samples per site.The observed data were modeled using a Generalized Additive Model (Wood, 2011) with a negative binomial distribution and restricted maximum likelihood to better understand the relationship between the environmental gradients and the levels of observed regeneration, relative to the simulation results per model.
In order to analyze the regeneration niches across the climatic gradients, we focused on five common tree species or genera: Abies alba, Fagus sylvatica, Picea abies, Pinus sylvestris, and Quercus spp.For these species, we calculated the share in the recruited basal area per site, where each site represents a part of the environmental space, as the mean across the available samples per site: where s is the total number of species simulated, and r t,i is the mean basal area in the recruitment (subsequently referred to as "recruited basal area") of species i at site t across the available samples at that site.Furthermore, regarding the share in the recruited basal area per species (R BA share i ), we categorized this as zero not only when the recruited basal area of that species (r BA t,i ) was zero, but also when both the total recruited basal area ( P s i¼1 r BA t,i ) and the recruited basal area of that species (r BA t,i ) equaled zero.

Recruitment levels
Simulated levels of recruitment varied strongly across the 15 models and typically did not match the levels found in the forest reserve data (Figure 2).Recruitment was overestimated in most models for both the 7-and 10-cm diameter thresholds, with the exception of the empirical stand model SIBYLA, the landscape model Landis-II, and the global model aDGVM2, which estimated recruited levels at the lower end of the plausibility interval of the observed data (Figure 2).The models with the largest overestimation were the stand model PICUS and the landscape model TreeMig.For most models, the variability of simulated recruitment levels across the 200 sites (visible from the interquartile range in the box plots of Figure 2) was similar to or smaller than observed, with the notable exception of PICUS, where simulated recruitment variability was much larger.
Both the observations and the simulated data had no recruitment in some samples and at some sites (for details, cf.Appendix S1: Table S1).The observed data had 4% of the samples with no recruitment.Only three models had a larger proportion of no recruitment (4C, Landis-II, and aDVM2).Two models always simulated recruitment for both the 7-and 10-cm thresholds (xComp and TreeMig), that is, they did not feature any zero values.The other 10 models had a very low percentage of samples with no recruitment (0.01%-2.39%), that is, they had distinctly fewer occurrences of zeros than the observations.

Tree species diversity of recruitment
Most models matched the level of diversity of the observed data quite well (Figure 3).Five models overestimated recruitment diversity: ForCEEPS, ForClim 1, PICUS, TreeMig, and LandClim, the latter particularly for the 7-cm diameter threshold.The model 4C is a special case, as it simulated five species only, that is, its diversity values are not directly comparable with those of the other models, nor to the observations.Only one model, Landis-II, consistently underestimated recruited diversity.
In most models, there were only small but significant differences in the species diversity of recruitment between 7 and 10 cm across sites (Appendix S1: Table S2).Four models (iLand, Landis-II, TreeMig, and LPJ-GUESS) maintained the recruitment diversity between the 7-and 10-cm thresholds (i.e., the differences between them were not significant; see Appendix S1: Table S2), and the same was evident from the observed data.
For both the observed and the simulated data, and both recruitment thresholds, species diversity in the recruitment compared with stand-level diversity did not feature clear patterns across the models (Appendix S1: Figures S1 and S2).Most models captured reasonably well or overpredicted species diversity at the stand level (Figure 4A, left and center; cf.Appendix S1: Figure S3), but a characteristic feature was that the simulations had a much lower variability of diversity compared with observations.Models overpredicting stand-level species diversity included ForClim1, ForClim 11, FORMIND, and TreeMig (Figure 4A, left).Several models underpredicted species diversity, that is, LPJ-GUESS, ForCEEPS(f), and 4C (Figure 4A, right).
The majority of the models overpredicted recruitment diversity; as already observed in Figure 3, only two of them underestimated it (4C and Landis-II) (Figure 4B and Appendix S1: Figure S4).Both the models with and without feedback from the adult trees to regeneration (via seed production) overestimated, underestimated, or captured reasonably well recruitment species diversity compared with the observations.Again, simulated diversity had considerably lower variability than observed diversity in the recruitment.

Recruitment mortality
There were strong differences among models regarding the mortality rate between 7 and 10 cm (Figure 5).Most of the models featured a mortality rate significantly larger than the observed data, thus at least partially compensating for the general overestimation of recruitment levels at 7 cm (Figure 2 and Appendix S1: Table S3); six models did not feature a significant difference compared with the observed data (ForCEEPS, FORMIND, ForClim 1 and 11, iLand, and aDGVM2).Four models (xComp, PICUS, LandClim, and TreeMig) featured very high mortality rates (i.e., well above the Reineke self-thinning line), which compensated for the strong initial overestimation of recruitment (cf. Figure 2).Interestingly, two models that underestimated overall recruitment levels (SIBYLA and Landis-II; cf. Figure 2) featured mortality rates that were close to but still above the self-thinning line (Figure 5).
The fact that some models (and observed data) featured lower mortality than expected by Reineke's self-thinning rule may point to facilitation, or simply a F I G U R E 3 Mean Shannon index (H) of tree regeneration, calculated by basal area, as the mean value across the 200 samples per site for the diameter thresholds of 7 and 10 cm, respectively.The red dashed lines show the 25th and 75th percentiles for the 7-cm diameter threshold in the observed data.Boxplot features are the same as in Figure 2.
higher mortality rate before the trees had reached 7 cm diameter.Yet, the case of models such as xComp, PICUS and TreeMig, whose mortality was well above the self-thinning line, indicates that they feature higher recruitment mortality compared with even-aged, single-species forests.Some models yielded a mortality rate of recruitment that is broadly compatible with the self-thinning rule (e.g., 4C and LandClim).
The three models with the largest overestimation of the proportion of recruitment at the 7-cm threshold featured the highest mortality between the 7-and 10-cm thresholds (Figure 6A).The other models that overestimated recruitment had mortalities lower than the expected self-thinning ratio, with the exception of LandClim (Figure 6B).It is noteworthy that most of the models that featured a low ratio of recruitment between 7 and 10 cm (i.e., ForCEEPS, ForClim 1, ForClim 11, and iLand) had only a small overestimation of recruitment at the 7-cm threshold (Figure 5).aDGVM2 was the only model that underestimated recruitment at the 7-cm F I G U R E 5 Ratio of tree regeneration rates between the 7-and 10-cm thresholds.Dashed blue lines mark a ratio equal to 1, indicating no decrease in tree regeneration between 7 and 10 cm, and a ratio equal to 1.77, corresponding to the Reineke self-thinning ratio under even-aged conditions.Boxplot features are the same as in Figure 2.
F I G U R E 6 Ratio of tree regeneration between the 7-and 10-cm thresholds (regeneration 7 cm/regeneration 10 cm) and the overestimation proportion at 7 cm ([simulated − observed]/observed) for the mean regeneration per model across sites and samples.The horizontal dashed lines show a ratio equal to 1, indicating no decrease of regeneration between 7 and 10 cm, and a ratio equal to 1.77 corresponding to the Reineke self-thinning line under even-aged conditions.The vertical line indicates zero overestimation.
threshold while featuring a low mortality between the two thresholds (Figure 6C), and only a few models were close to the correct recruitment levels at 7 cm while also being close to the Reineke line (SIBYLA and Landis-II; cf. Figure 6D; and 4C as well as LandClim).
Lastly, some models that had a low number of recruits at the 7-cm threshold also had a small decrease in recruitment between 7 and 10 cm (ratio 7-10 cm) (Appendix S1: Figure S5).In these models, competition and self-thinning are either not pronounced or must have occurred before the trees had reached 7 cm.However, this pattern was not consistent across models (Appendix S1: Figure S5).It is noteworthy that there is a relationship between the recruitment levels at 7 cm and the mortality rate (Appendix S1: Table S4): most of the models showed a positive effect (higher regeneration at 7 cm is coupled to an increase in the mortality between 7 and 10 cm).While some models showed a negative effect, the linear trend was not pronounced and not always significant regarding its slope.

Model performance and model traits
There was no significant relationship between the mean complexity of the regeneration module (Table 1) and the overestimation proportion at a diameter threshold of 7 cm (Appendix S1: Figure S6).All models except 4C had significant differences between the observed and simulated mean recruitment values.Therefore, it was not possible to differentiate in terms of performance between model type (empirical or process-based), or their scales (stand, landscape, or global) (Appendix S1: Table S5).The values of species diversity in the recruitment simulated by each model were also assessed against the observed data to see if there was a difference between models that include a feedback compared with those that did not; all the models except ForCEEPs(f) and FORMIND had significant differences (Appendix S1: Table S6).It is noteworthy that for the model ForCEEPS, the pattern changed from the overprediction of species diversity in the regeneration to a diversity level that is closer to observations when the recruitment module included feedback (ForCEEPs(f )).

Regeneration gradients and regeneration niches
When evaluating total tree recruitment levels along key gradients of light availability (basal area), temperature (degree-days), and soil moisture (climatic water balance), distinct features emerged: the models reproduced the effect of basal area in both its magnitude and patterns (Figure 7 and Appendix S1: Figure S7) considerably better than the effects of the climatic gradients, where they featured varying patterns.
In the observed data, recruitment levels decreased clearly with increasing stand basal area (Figure 7).This trend was captured by the Landis-II model only, albeit at much lower values of basal area than in the observations.The other models featured distinctly different trends, such as (1) an increase in recruitment levels with increasing total basal area followed by the absence of recruitment at high values of basal area (ForCEEPS, ForCEEPS(f), FORMIND, TreeMig, and LPJ-GUESS), (2) almost constant recruitment levels with basal area (ForClim 11, xComp, LandClim, and aDGVM2), or (3) an increase in recruitment up to a certain value of stand basal area followed by a decrease at even higher values of stand basal area, with model-specific thresholds (4C, ForClim 1, PICUS, SIBYLA, and iLand).
The observed recruitment did not change much across the climatic water balance gradient and showed a slight increase with temperature.Neither did the models generally match the observed recruitment patterns across these gradients (Figures 8 and 9) nor was there a clear pattern across models.Regarding the soil moisture gradient, a group of models featured decreasing recruitment with increasing soil moisture (4C, ForClim 1, ForClim11, SIBYLA, xComp, PICUS, LandClim, TreeMig, and iLand), while a few models showed the opposite trend (ForCEEPS (f), FORMIND, and LPJ-GUESS).Across the temperature gradient, all the models that featured a decrease in recruitment levels with increasing soil moisture showed the opposite trend (i.e., more recruitment with increasing temperature), except for 4C.The other models did not feature a clear pattern.
Silver fir (Abies alba), beech (Fagus sylvatica), Scots pine (Pinus sylvestris), and oak (Quercus spp.) were the species observed most frequently in the EuFoRIa data.Most models captured well the share of basal area in the recruitment of these five species compared with the observed data along the two major climatic gradients (Figure 8, cf.Appendix S1: Figure S8).However, the models differed considerably in the way in which they simulated these climatic niches.
First, some models overestimated recruitment across the entire gradients of at least one of the five main tree species (Figure 10,e.g.,ForCEEPS,ForCEEPS(f), LandClim, or LPJ-GUESS), while other models overestimated recruitment of more than one species for a large part of the climate gradients, such as xComp, PICUS, SIBYLA, or LPJ-GUESS (cf.red colors in Figure 10).These trends were consistent for both recruitment thresholds, although the exact changes in the share of the recruited basal area were different (Appendix S1: Figure S8).
Second, there was some congruence in the simulation results by species across the models.The share in the recruitment of Silver fir (Abies alba) was captured well across the gradient by most models except for xComp, PICUS, and LandClim, which overestimated the recruitment share of this species.There were also some peculiarities evident for some models, such as 4C that did not simulate Abies alba.The patterns across models for Fagus sylvatica recruitment were more complex, as its recruitment was underestimated by many models across the environmental gradients featuring negligible regeneration at most sites, while others overestimated it in different parts of the environmental space.Most models overestimated the share of Picea abies recruitment in the cold-wet part of the gradients.Pinus sylvestris and Quercus spp.represented a small share of the recruitment in the observed data mostly at warm-dry sites.Many models, however, erroneously featured recruitment for these two species along most of the climatic gradients, although with a low share of basal area.Five models (4C, FORMIND, SIBYLA, xComp, and LandClim) had very small amounts of recruitment of Pinus sylvestris and Quercus spp. or did not feature any recruitment of these species at all.
Lastly, there is no model that performed well across the five species regarding the species-specific recruitment levels in the environmental space (Figure 10).Some models (e.g., ForClim11 and PICUS) tended to consistently overestimate the recruitment share of all five species, but most models overestimated the recruitment share of some species while underestimating the share of others (e.g., ForCEEPS and LPJ-GUESS).

DISCUSSION
Tree regeneration is a fundamental process in forest dynamics.Correctly capturing it in dynamic models is fundamental to, for example, evaluate post-disturbance dynamics and potential long-term recovery trajectories as it will define the forest state in the first decades (Seidl & Turner, 2022).If this initial phase is not captured well, we cannot properly assess aspects such as forest resilience or the timing, magnitude, and progression of carbon sequestration.In this study, for the first time, projections of tree recruitment from multiple models of forest dynamics were confronted with a unique dataset from unmanaged forest reserves across a large environmental gradient in Europe.
The EuFoRIa data (Käber et al., 2023) are exceptional, particularly with respect to the number of records (number of sites and repeated measurements on unmanaged forests), which is essential for capturing a highly "noisy" process such as tree regeneration.The use of this dataset for model benchmarking provided novel insights on the ability of state-of-the-art models to accurately simulate recruitment levels and its species composition and mortality in an early stage of tree life, that is, between tree diameters of 7 and 10 cm.Overall, by adopting this approach, a much broader understanding resulted than if we had used these data for model calibration: it is primarily from the shortcomings of the models that we can gain ecological insights (cf.Trugman, 2022).

Recruitment levels
Most of the models overestimated tree recruitment levels.This has potentially far-reaching implications, for example, regarding biomass (and thus carbon) turnover, with a potential overestimation of the capacity for forest carbon sequestration (Pan et al., 2011;Pugh et al., 2019).Yet, we focused on one specific stage, that is, recruitment into the 7-and 10-cm diameter classes.Trees at this size contribute little to carbon sequestration, and if the excess regeneration at this stage is compensated-in the models-soon thereafter by higher mortality, overall simulated tree population dynamics may still be trustworthy at the level of stand structure and attributes such as basal area or biomass.
From a modeling point of view, excess tree recruitment inevitably requires excess mortality rates in a later stage, either-as observed for some modelsbetween the 7-and 10-cm diameter thresholds, or soon after the 10-cm threshold has been crossed.In any case, correcting at early stages for the expected forest densities at later stages is equivalent to compensating for a first error (excess recruitment) by a second error (excess mortality).It is highly likely that biased projections will result, because the two errors are unlikely to be perfectly linked and thus will not always compensate each other.Hence, this structural problem of most models investigated here is problematic particularly if the models are to be used under novel conditions such as under climate change (e.g., Huber et al., 2021) or in a decision support context (e.g., Thrippleton et al., 2021).
Recruitment levels define the structure and composition of future forests, but it is equally important to correctly identify areas where regeneration is lacking (Rammer et al., 2021).There are multiple constraints to the regeneration niche of tree species (Price et al., 2001), and therefore, the absence of regeneration is likely to be common (Fortin & DeBlois, 2007), even over larger areas such as the one-hectare plots used here.Tree regeneration data are characterized by zero-inflation.This was clearly evident from the EuFoRIa dataset, but some of the models did not produce zeros at all, or featured a very low proportion of zero data.This substantial difference may be due to the fact that the simulation results were drawn from equilibrium forests, whereas in reality, many of the forest reserves are recovering from past management activities and have become denser over the past decades (e.g., Heiri et al., 2009), leading to less regeneration than in an equilibrium situation.Another possible reason for this difference may be the exclusion of factors like deer browsing or failure to accurately measure establishment filters, such as competition with the herb layer and site-specific or environmental limitations.

Species composition of recruitment
Correctly capturing species composition in tree recruitment is important to assess the future functional diversity of a forest, for example, its sensitivity to drought or resilience to disturbances (Redmond et al., 2015;Seidl & Turner, 2022).In the simulations, overall species diversity levels in the recruitment were well within the observed range for half of the models (7 out of 14).Thus, while most models are facing difficulties to quantitatively match recruitment levels (cf.above), their performance is better regarding the composition of recruitment as a function of abiotic and biotic conditions.Most models maintained or even decreased recruitment diversity between the 7-and 10-cm thresholds, and the same is visible from the empirical data, but the differences between the 7-and 10-cm thresholds were not significant.Diversity variations may be explained by the way the individual models consider regeneration processes F I G U R E 1 0 Mean recruitment levels (R) of the five main species in terms of their basal area share in the observations (top row) and the performance of each model across the environmental gradients (other rows) for the 7-cm regeneration threshold.The values shown are the mean of the 200 samples per site and across the sites in each bin (tile), with 10 bins per gradient.The sizes of the circles represent the ratio between the regeneration basal area of the species and the total regeneration basal area for all species.The absence of a circle indicates a zero basal area share in the regeneration, or the absence of regeneration altogether.The color gradient (for the models only) shows the difference between the simulated and observed ratio of regeneration basal area of the species and the total regeneration basal area for all the species.(König et al., 2022), for example, regeneration can be linked to the seed/seedling bank, seed rain from dispersal or from a pool of available species.
The species diversity of the entire stand was captured better by the models than the diversity of tree recruitment, and only a few models overpredicted stand-level diversity.Defining recruitment composition as being proportional to local adult abundance, regardless of productivity, might be a simple and conservative assumption to maintain relative species abundances (Hanbury-Brown et al., 2022), although this may be simplistic.Furthermore, based on the simulation results, there is no evidence that models with feedback the canopy (in terms of species composition of regeneration) captured the species diversity of the recruitment better than those without feedback, with the exception of ForCEEPS that featured significantly more accurate recruitment species diversity with the model version that included feedback (ForCEEPS(f)).The similar performance of models with and without feedback is likely because the models put more weight on the regeneration niche arising from abiotic and biotic filters, than from the habitat niche of the adult trees (Grubb, 1977).

Recruitment mortality
There are several factors that lead to mortality during the regeneration phase, such as competition (Casper & Jackson, 1997) and multiple abiotic factors (Cunningham et al., 2006;Schmid et al., 2021).As mentioned above, it is reasonable to expect that models that overestimate recruitment at 7 cm may have a particularly high mortality between the 7-and 10-cm thresholds.However, this was not consistently evident from the simulations.This implies that these models must have an excess mortality in later stages, if we assume that all models were able to capture the structure and composition of the adult stands along the EuFoRIa gradient; this however was not tested here.
Higher mortality toward the adult phase has important implications for forest dynamics and the goods and services provided by forests.On the one hand, mortality in later stages may erroneously enhance the share of less shade-tolerant species in the models (Klopčič et al., 2015), therefore shifting the species composition.Unrealistic high stem densities for a longer period of time may overestimate the role of tree regrowth in carbon sink dynamics (Pugh et al., 2019).Given our set of simulations and analyses, we cannot conclusively assess what is happening in the models, and further studies are required that focus on a wider range of tree sizes and the fate of tree regeneration along such a size continuum.

Model performance and model traits
In spite of the critical considerations above, it is remarkable that most models did not deviate exceedingly from the observations with respect to simulated recruited levels-after all, few if any of the models' regeneration routines are well constrained by data, with the exception of the empirical models xComp and SIBYLA.It is noteworthy that in spite of their empirical basis, these two models did not match empirical recruitment levels, in a similar magnitude as the other models and even in opposite directions (SIBYLA: underestimation; xComp: overestimation).It appears that using empirical data of limited geographical scope that are constrained to managed forests, as done in these models, leads to extrapolation problems already under current climatic conditions.For the other models, whose regeneration modules are not strongly constrained by empirical data, the multiple strategies that are available for modeling regeneration processes (König et al., 2022) could have implied that model performance would be much worse than what we found.
Our study showed that increasing complexity in the regeneration modules is not linked to a higher accuracy of the projections of recruitment levels, species composition, or mortality at early tree stages, as there was no significant relationship between model performance and model complexity.Increasing complexity in regeneration modules has been called for a long time ago (e.g., Price et al., 2001) motivated by better process understanding and with the goal of enhancing model accuracy (Bugmann & Seidl, 2022).However, more complex models do may not necessarily lead to better projections but rather to reduced transparency and lower predictive power (Franklin et al., 2020).Thus, the question regarding the level of detail that is appropriate and parsimonious for modeling tree regeneration remains open (Bugmann & Seidl, 2022;König et al., 2022).

Regeneration gradients on regeneration niches
Competition for light as a strong filter for tree regeneration has been widely documented (Berdanier & Clark, 2016;Collet & Chenost, 2006;Messier et al., 1999), but the models examined here did not reproduce this expectation (i.e., decreasing recruitment levels with increasing stand basal area).However, it is difficult to measure light availability at large spatial and temporal scales.We used total stand basal area as a proxy for light availability (cf.Schmid et al., 2021).Yet, we were unable to consider light availability restrictions caused by ground vegetation, which may be an important filter for forest dynamics at least in some EuFoRIa stands (Woltjer et al., 2008).We found pronounced differences not only in the ranges of stand basal area simulated by the models, but also between models and observations.This made it impossible to evaluate tree recruitment for the extremes of the stand density ranges in some models.This is unfortunate because recruitment levels at low stand density are relevant to assess how well forests are recovering, for example, after gap creation due to disturbance (Grubb, 1977;Seidl & Turner, 2022).At the level of the simulated 1-ha samples, average basal area is typically not very low as long as no larger disturbances occur, which was explicitly excluded in our simulation protocol.Toward the other end of the spectrum, that is, with increasing stand basal area, it would be reasonable to expect that the recruitment of the different species would become sparser and drop out entirely under low-light conditions (Klopčič et al., 2012;Zell et al., 2019).However, few models showed this trend, thus indicating that the relation between regeneration and light availability is not yet captured correctly in most models.Yet, several of the models that did feature an increase in regeneration with increasing basal area include a feedback between seed production of mature trees and regeneration.Thus, it seems that in these models, higher light competition does not sufficiently compensate for increased seed availability with higher basal area due to a higher abundance of mature trees.
Lastly, there were pronounced differences how the main tree species were represented by the models along the environmental gradients, in particular the dominant species Abies alba, Fagus sylvatica, and Picea abies.The recruitment levels were sampled from simulations in the equilibrium, and in this state, it is expected that nonclimax species such as Pinus sylvestris or Quercus would be of minor importance, or absent (Klopčič et al., 2015).Most models captured this low abundance, which is also found in the empirical data.Thus, although the broad patterns are matched by many models, improvements in the quantification of the regeneration niche of the species are needed, but this cannot be done in the absence of robust datasets across multiple species.

Methodological considerations
The EuFoRIa data as used here are well suited to better understand tree regeneration.However, three aspects of these data may represent considerable limitations.First, we made a comparison of tree regeneration in an equilibrium state, but we cannot assess how close the forests included in the EuFoRIa dataset were to such a state.The data were collected in forest reserves where no management has taken place for long periods of time.This makes our assumption of an equilibrium between forest properties and environmental drivers more reasonable than it might appear at first sight.In an analysis of primeval forests at demographic equilibrium, Brzeziecki et al. ( 2021) found higher recruitment rates than the ones observed in the EuFoRia data.Thus, the overestimation of recruitment rates by the models may not be so problematic.Second, the data were collected for recruitment above a 7-cm threshold, thus limiting the assessment of tree regeneration to a specific point of stand dynamics.This constituted a hard limit based on which we can understand only some aspects of tree regeneration, which in its entirety often comprises a rather long period since seed production (Price et al., 2001).In reality, many environmental constraints are acting on young trees (Käber et al., 2021) that we were unable to assess.Yet, recruitment data with lower thresholds are simply not available in a harmonized manner across large environmental gradients.Third, the empirical data were collected from rather small plots, while we sampled simulated recruitment levels from 1-ha areas, which may lead to an incorrect representation of space.Even though the strategy we adopted is not ideal, it represents a common challenge when harmonizing diverse data sources originating from varied sampling strategies (Portier et al., 2022).It would have been extremely challenging for such a variety of models to follow a protocol where the spatial sampling size was different at each of the 200 sites, and it would have introduced additional uncertainty in the results.
The design of our sampling protocol did not include spatial aspects such as seed dispersal or detailed soil data.While we considered a wide range of models of forest dynamics, from stand to global scales, the simulation set-up was limited a spatial scale of 1 ha.This lack of consideration of spatial scale relationships is appropriate for stand-scale models, but it potentially puts the landscape models at a disadvantage, as they have been built to be accurate at the landscape level: without any spatial context, we are limiting tree regeneration to the seed influx from the stand itself, unless the model has a background seed input.Yet, the global models should not be at a disadvantage here due to their inherently limited spatial considerations (but cf.Snell et al., 2018;or Lehsten et al., 2019), as they are usually lacking dispersal between cells and are based on a strong abstraction of horizontal space (Hanbury-Brown et al., 2022).Lastly, detailed data on soil conditions were not available from the observed data, and independent, admittedly coarse data for soil properties and the climatic water balance had to be used instead.It is noteworthy that many models represent drought using detailed indicators based, for example, on soil water holding capacity, which had to be derived from a rough soil quality measure.This may at least partially explain the unsatisfactory performance of many models along the drought axis (i.e., climatic water balance).

Research recommendations
With our study, we have demonstrated that models of forest dynamics need a focus on their regeneration modules to make them more robust.It remains uncertain what level of detail is required to model tree regeneration, and this must be addressed in future research.We recommend that the improvement of the regeneration modules is implemented as additional features that can be traced back, as done here for the variants of ForClim and ForCEEPS, and that model complexity and structure must always be connected with modeling objectives (Albrich et al., 2020).If it should be necessary to include more detail in the regeneration models, this will come with higher parameterization efforts.This will most likely lead to lower generalization because the required data will have to be collected from specific locations, as currently there is no general, comprehensive regeneration dataset available.
Therefore, we further recommend that more effort should be invested into collecting harmonized datasets in a site-specific manner covering the different aspects leading to tree regeneration.We emphasize that datasets such as EuFoRIa are invaluable and should be expanded in both their spatial extent (e.g., toward boreal and Mediterranean conditions) as well as in time (e.g., continuing the monitoring into the future).Such data will allow for a better evaluation of forest models and help to reduce the uncertainty in their projections, which is crucial when they are used as tools for predicting, for example, the impacts of anthropogenic climate change.
In the present study, we have considered tree regeneration in the equilibrium state only.It is equally important to understand how these models project tree regeneration after changes in forest structure by disturbances (Seidl & Turner, 2022), or under different management strategies (Lindner et al., 2000).However, this will require an entirely different set of observed data, and potentially not all forest models would be able to assess the relationship of these aspects on tree regeneration, for example, due to the lack of disturbance or appropriate management modules.
We recommend to investigate in detail the implications of the current modeling strategies for tree regeneration and, ultimately, simulated forest stand structure.This applies particularly to the erroneous patterns of excess tree regeneration and later excess mortality, by focusing on a wider range of tree sizes and the related regeneration dynamics.We also recommend, especially for the landscape-level models, the inclusion of explicit spatial considerations regarding tree regeneration (Beckage & Clark, 2003); this, however, is a serious challenge regarding appropriate datasets, as large-scale inventory data have a wide coverage but by definition do not allow for the assessment of spatial interactions.
Exercises like the one presented here, where the models are operated in "blind flight" mode, that is, without the possibility of being tuned toward capturing the expected patterns, should be repeated.Such benchmarking exercises should next focus on aspects such as specific model traits and the ecological formulations of particular (sub-)processes, to better understand the implications of the assumptions on which the models are based.Furthermore, the inclusion of a wide range of models with different (1) scales, (2) approaches for capturing population structure, (3) tree regeneration modules, and (4) complexity of their formulations will ensure a large benefit to the entire modeling community and beyond.

CONCLUSION
Models of forest dynamics are important tools in science and decision support, and the formulation of tree regeneration has strong implications for simulated forest properties.The 15 models and variants used here are facing similar challenges in their representation of tree regeneration: they generally overestimate tree recruitment levels, and the simulated regeneration niche is not always captured accurately as a function of biotic (light) and abiotic (temperature and moisture) factors.
However, most models properly capture the diversity of the initial tree community, and differences between model formulations, for example, the presence or absence of feedback from the adult trees did not have a strong effect on capturing the species composition of regeneration.
Regarding mortality in the early phase of tree life, many models that feature a particularly high overestimation of recruitment levels are compensating for this by a larger tree mortality.Often, this compensation is not sufficient to reduce the high recruitment levels to realistic values.Overall, there is no clear mortality pattern across all models.
When capturing tree regeneration, the specific design decisions taken in the development of any model are more important for its behavior (accuracy) than scale (stand, landscape, and global), modeling approach (empirical vs. process-based), and complexity.Having both empirical and process-based models in our set, the empirically based models could have been expected to have a better performance, as they were calibrated with inventory data, but this was not the case.Similarly, higher model complexity does not represent an improvement for capturing tree regeneration.
Even though the regeneration routines of most of the models investigated here have never been constrained well by robust data, their projections of recruitment are not overly off.This indicates that a lot can be gained by a focus on the modeling of regeneration processes.The representation of forest dynamics in these models would become much more robust particularly in the face of climate change and post-disturbance dynamics, thus strongly reducing the uncertainty in long-term projections of future forest dynamics.

AUTHOR CONTRIBUTIONS
Olalla Díaz-Y añez and Yannek Käber contributed equally and share the first authorship.Harald Bugmann, Yannek Käber, and Olalla Díaz-Y añez developed the study design.Yannek Käber and Olalla Díaz-Y añez prepared the input data used by all modeling teams.All authors contributed by conducting the simulations with their respective models.Olalla Díaz-Y añez led the analysis of the simulation results with contributions during the workshop from Yannek Käber, Tim Anders, Kristin H. Braziunas, Josef Bru ˚na, Samuel M. Fischer, Jessica Hetzer, Thomas Hickler, Heike Lischke, Mats Nieberg, Paola Mairota, Katarina Merganičov a, Tobias Mette, Xavier Morin, Werner Rammer, and Daniel Scherrer.Olalla Díaz-Y añez and Harald Bugmann led the writing of the manuscript and provided the first draft.All authors participated in the revision of the manuscript and approved its submission.

F
I G U R E 2 Mean regeneration levels (R) across all samples per site, plotted for the 200 sites and for each model.The red dashed lines show the 25th and 75th percentiles for the 7-cm diameter threshold in the observed data.There are two boxplots for each model: the left lighter boxplot corresponds to 7 cm and the right darker boxplot to 10 cm.The boxplot shows the median (midline), interquartile range (box limits), largest and smallest values within 1.5 times interquartile range (whiskers); larger values are represented as outliers.

F
I G U R E 4 Mean Shannon index (H) across all samples per site for observed and simulated data.Each plot shows one pattern represented by one exemplary model of each category (overpredicted, intermediate, and underpredicted).The full data with the grouping of the models are shown in Appendix S1: Figures S3 and S4.n indicates the number of models falling in each group.(A) Examples for the three trends across models for species diversity at the stand level.(B) Examples for the three trends across models for regeneration at the 7-cm threshold (165 sites).

F
I G U R E 7 Mean recruitment (R) across the 200 samples per site with the y-axis scaled differently by model, for the 200 sites against gradients of total basal area.The values were split into 10 bins; the red lines represent a Generalized Additive Model showing the trend in the observed data.

F
I G U R E 8 Mean recruitment (R) across the 200 samples per site with the y-axis scaled differently by model, for the 200 sites against gradients of climatic water balance.The values were split into 10 bins; the red lines represent a Generalized Additive Model showing the trend in the observed data.