How do trees respond to species mixing in experimental compared to observational studies?

Abstract For decades, ecologists have investigated the effects of tree species diversity on tree productivity at different scales and with different approaches ranging from observational to experimental study designs. Using data from five European national forest inventories (16,773 plots), six tree species diversity experiments (584 plots), and six networks of comparative plots (169 plots), we tested whether tree species growth responses to species mixing are consistent and therefore transferable between those different research approaches. Our results confirm the general positive effect of tree species mixing on species growth (16% on average), but we found no consistency in species‐specific responses to mixing between any of the three approaches, even after restricting comparisons to only those plots that shared similar mixture compositions and forest types. These findings highlight the necessity to consider results from different research approaches when selecting species mixtures that should maximize positive forest biodiversity and functioning relationships.


| INTRODUCTION
The provisioning of ecosystem services beneficial to human wellbeing strongly relies on plant diversity.
Decreases in primary producer diversity can impact ecosystem functioning and decrease ecosystem productivity and stability (Hooper et al., 2012), a phenomenon especially well studied in grassland ecosystems (e.g., Isbell et al., 2015;Reich et al., 2012;Tilman et al., 1997) where log species richness and log productivity are often linearly related (Craven et al., 2016;Hector et al., 1999;Tilman et al., 1997). In forest ecosystems, systematic research on the effects of species mixing on wood production dates back to the foundations of modern forestry (Hartig, 1791). Current global synthesis studies concluded that, across the different forest biomes, a positive relationship between tree diversity and stand productivity prevails (Liang et al., 2016;Scherer-Lorenzen, 2014;Zhang, Chen, & Reich, 2012).
The relationship between tree diversity and productivity has already been studied using different research approaches (Table 1), starting with the analysis of forest inventories (Hartig, 1791;Schwappach, 1912;Wiedemann, 1943), followed by silvicultural trials and tree diversity experiments (Bruelheide et al., 2014;Koricheva, 2002;Pretzsch, 2005;Scherer-Lorenzen et al., 2005;Tobner, Paquette, Reich, Gravel, & Messier, 2014;Verheyen et al., 2016) and more recently by the selection of comparative plots in mature forests (Baeten et al., 2013;Bruelheide et al., 2011;Fischer et al., 2010). Forest inventories usually cover large numbers of uniformly distributed plots across multiple forest types and large environmental gradients. Tree diversity experiments, in contrast, consist of spatially restricted, replicated plantations of different tree species compositions and levels of tree species diversity and have minimal variation in environmental conditions. Comparative study plots (Bruelheide et al., 2011) or "exploratories" (Fischer et al., 2010) consist of survey plots within mature forests selected to contain replicated levels of tree species diversity and compositions while at the same time controlling for differences in community structure and environmental conditions. They can thus be regarded as an intermediate approach that combines aspects of forest inventories and tree diversity experiments.
Regardless of the approach applied, most previous research on forest diversity-productivity relationships focussed on the effects of tree species diversity on the productivity of the community (e.g., Homeier, Breckle, Günter, Rollenbeck, & Leuschner, 2010;Jucker et al., 2016;Liang et al., 2016;Paquette & Messier, 2011;Ruiz-Benito et al., 2014;Vilà et al., 2013). In theory, any positive effect of species diversity could stem from either positive interactions between the co-occurring species (complementarity effects, Loreau & Hector, 2001) or from the admixing of one or few exceptionally productive or dominating species (selection effects, Loreau & Hector, 2001). Depending on the forest ecosystem, species-specific growth responses to increasing tree diversity can be consistently positive (Chamagne et al., 2017;Liang et al., 2016) or variable, depending on the species and context (Baeten et al., 2019;Jucker, Bouriaud, Avacaritei, Dănilă, et al., 2014;Ratcliffe, Holzwarth, Nadrowski, Levick, & Wirth, 2015;del Río et al., 2017;Tobner et al., 2016). It is unclear to what extent these differences in species responses to tree diversity are caused by differences in species-specific characteristics (Fichtner et al., 2017;Williams, Paquette, Cavender-Bares, Messier, & Reich, 2017) or differences in study design. Comparing species-specific responses to mixing between the different research approaches could help to determine which species generally benefit, suffer, or show divergent responses to increases in tree species diversity. Restricting these comparisons to only the set of tree species and forest types that are shared between research approaches should furthermore reduce the confounding effects of species compositions and large scale environmental context-dependency and leave mainly the effects of local environmental context-dependency and differences in stand structure.
In the FunDivEUROPE research network (functional significance of forest diversity in Europe, Baeten et al., 2013), all three previously described approaches (experiments, exploratories and inventories) were applied throughout Europe to study the effects of tree diversity on forest ecosystem functioning. The three approaches partly overlap in their species pools, although there are differences in species compositions as well as successional, structural, climatic and edaphic plot conditions. Syntheses across all three approaches can thus be applied to test whether most tree species respond consistently to species mixing. Identifying tree species that display consistent responses between different approaches and different forest types would furthermore allow the isolation of general patterns from context-dependent effects.
With this study, we provide a first comparison of the growth response of a large set of tree species to species mixing across three distinct research approaches (tree diversity experiments, networks of comparative plots and forest inventories). We tested the following hypotheses: (H1) across all species and research approaches, tree species growth is higher in mixed than in monospecific tree communities, (H2) across all species and research approaches, the effect of tree species mixing on species growth linearly increases with the logarithm of the number of admixed tree species (two, three or higher species mixtures), and (H3) species' aggregated responses to mixing are correlated between different research approaches. We furthermore hypothesized that species' responses to mixing will become more consistent between the three research approaches if we compare only matching species compositions (H4). The findings of this study should deepen our understanding of the species, environmental conditions, and research designs for which consistent positive diversity-ecosystem functioning relationships can be expected.
TABLE 1 Summary of the advantages, disadvantages, and exemplary findings on the relationship between tree species diversity and tree growth or stand-level biomass production in three different research approaches. Figures depict the characteristics of the research approaches: representativeness (i.e., the anticipated transferability of the findings to existing forests), comprehensiveness (i.e., the number of ecosystem functions and properties that can be feasibly quantified), and orthogonality (i.e., the ability to quantify the effect of tree diversity against a background of variation); Figures are based on Nadrowski et al. (2010) and Jucker et al. (2016)
(Verheyen et al., 2016), www.treedivnet.ugent.be
Positive (Pretzsch, 2005;Fichtner et al., 2017;Erskine, Lamb, & Bristow, 2006;Potvin & Gotelli, 2008;Haase et al., 2015)
Nonsignificant (Tobner et al., 2016;Nguyen, Herbohn, Firn, & Lamb, 2012;Guo & Ren, 2014)
Negative (Firn, Erskine, & Lamb, 2007)
Comparative forest plots (exploratories)
Positive (Liang et al., 2016;Paquette & Messier, 2011;Vilà et al., 2013;Ruiz-Benito et al., 2014;Ratcliffe et al., 2017;Madrigal-González et al., 2016;Guo & Ren, 2014;Vilà et al., 2007)
Nonsignificant (Szwagrzyk & Gazda, 2007;Moser & Hansen, 2009;Long & Shaw, 2010;Vayreda, Gracia, Canadell, & Retana, 2012)
Hump-shaped (Gamfeldt et al., 2013)
Negative (Mina, Huber, Forrester, Thürig, & Rohner, 2017)

| METHODS
Within the framework of the European FunDivEUROPE project (www.fundiveurope.eu), the significance of forest biodiversity for ecosystem functioning across Europe was investigated with three complementary research approaches (tree diversity experiments, networks of comparative plots in established forests, and forest inventories). All approaches share a similar subset of tree species and forest types and were established in regions with similar climatic conditions (see Appendices S1-S4 and Baeten et al., 2013).
The approaches differed in how well they represented existing mature forests, the comprehensiveness of the studied tree species and environmental gradients and the extent to which potentially confounding effects could mask the effects of tree species diversity ("orthogonality", see Table 1, Figure 1 and Nadrowski, Wirth, & Scherer-Lorenzen, 2010).

| Research approaches
The experimental research approach contained growth measure- conditions (e.g., geology, soil texture and depth and topography) and tree species richness and composition. The study design as well as the forest characteristics and tree species compositions are described in Appendices S1-S4 and in Baeten et al. (2013). Within each plot, all trees with a dbh of more than 7.5 cm were mapped and identified. From a subset of trees, wood core samples were taken and, (26), and thermophilous deciduous forest (33). For each plot, we calculated the proportion that was covered by each tree species and classified each plot as either a monospecific, two-, three- or higher-species mixture, where the most dominant species must cover more than 90% and none of the "nondominant" species more than 10% of a plot's summed basal area.
The inventory research approach contained harmonized forest plots from five national forest inventories (Finland, Sweden, Germany, Belgium-Wallonia, and Spain) that had been surveyed at least twice. Details can be found in Appendix S5. In short, for all trees with a dbh of 10 cm or more, we extracted the tree status (ingrowth, survivor, dead due to natural mortality or harvesting) and basal area (expressed as m²/ha) from the two most recent survey dates. We discarded all plots with indications of harvesting activities between survey dates. Tree species names were harmonized following the Atlas Florae Europaeae (Kurtto, Sennikov, & Lampinen, 2013). Within each plot, we calculated the proportion of total basal area that belonged to each tree species. Analogous to the exploratory approach, we classified each plot as either a monospecific, two-, three- or higher-species mixture. After discarding all plots that did not meet these criteria, we retained 47,754 plots in the inventory dataset (see Appendix S4 for a more detailed description of the classification criteria).

| Environmental data
For each plot of the three research approaches, we extracted mean annual temperature, temperature seasonality (standard deviation of mean monthly temperatures), annual precipitation, and precipitation seasonality (standard deviation of mean monthly precipitation) from the WorldClim dataset (interpolated from measurements taken between 1960 and 1990 at a spatial resolution of one square kilometer, Hijmans, Cameron, Parra, Jones, & Jarvis, 2005) and the slope from the GTOPO30 digital elevation model with a spatial resolution of one square kilometer (data available from the U.S. Geological Survey).

| Data preparation
For each plot of the experimental, exploratory and inventory approach, we calculated for every target/dominant species the yearly summed increase in basal area, dbh, tree height, or diameter at ground height (based on the respective growth measurement).
These summed growth estimates were divided by the number of trees in the experiments and by the summed basal area (m² ha⁻¹) of the respective tree species in the exploratory and inventory approach to obtain growth estimates (hereafter "species growth") that are not biased by potentially uneven species proportions.
Within each forest type and tree diversity experiment, we quantified the effect of species mixing on species growth as the mean log response ratio, defined as the logarithm of species growth in mixed plots divided by species growth in monospecific plots of comparable stand conditions (i.e., within the same dataset and forest type). In the exploratory approach, no monospecific plots of Acer pseudoplatanus L. were found in the beech forest and no monospecific plots of Betula spec. and Quercus robur L. were found in the hemiboreal forest. For these three species, we could not calculate the effect sizes in the respective forest types, which reduced our exploratory dataset to 169 plots.
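The effect size described above can be illustrated with a minimal sketch. It assumes a natural logarithm and assumes that each mixed plot is compared with the mean growth of the monospecific plots in the same dataset and forest type; the text does not state either detail, and the function names and growth values are hypothetical.

```python
import math

def log_response_ratio(growth_mixed, growth_mono):
    """Natural-log ratio of species growth in a mixed vs. a monospecific plot."""
    if growth_mixed <= 0 or growth_mono <= 0:
        raise ValueError("growth values must be positive to take a log ratio")
    return math.log(growth_mixed / growth_mono)

def mean_log_response_ratio(mixed_growths, mono_growths):
    """Mean log ratio of each mixed plot against the mean monospecific growth.

    Assumption: the monospecific reference is the arithmetic mean of all
    monospecific plots in the same dataset and forest type.
    """
    mono_mean = sum(mono_growths) / len(mono_growths)
    ratios = [log_response_ratio(g, mono_mean) for g in mixed_growths]
    return sum(ratios) / len(ratios)

# Hypothetical species growth values (basal area increment per unit basal area)
mixed = [0.030, 0.036, 0.033]
mono = [0.028, 0.032]
effect = mean_log_response_ratio(mixed, mono)
print(f"mean log response ratio: {effect:.3f}")
```

A positive value indicates higher species growth in mixtures than in monocultures; a value of zero indicates no difference.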
In the inventory approach, mixed and monospecific plots within the same forest type could differ considerably in stand conditions (e.g., in climate, tree community structure, and edaphic conditions).
To partly control for these potentially confounding differences, we first assigned pairs of monospecific and mixed plots that were most similar regarding stand and environmental conditions and subsequently calculated the effect size for each pair of plots. The dissimilarity in stand and environmental conditions was quantified as the Euclidean distance in normalized plot-level values (i.e., subtracted by the mean and divided by the standard deviation) of mean annual temperature, temperature seasonality, annual precipitation, precipitation seasonality, slope and the sum and coefficient of variation of trees' basal area (m²/ha). The latter two were included in order to account for potential effects of stand age and evenness (e.g., Zhang, Chen, & Reich, 2012 percentile of all distances (Figure S9). The locations of the remaining 16,773 plots are shown in Figure S6. All plots were assigned to one of the following forest types, listed in the EEA Technical Report
In order to narrow down the comparisons of mixing effects to only those tree species and community compositions that were shared between the three approaches, we created three data subsets that included only those species and mixtures that were present in two datasets, that is, (a) the experimental and exploratory, (b) the experimental and inventory, and (c) the exploratory and inventory approach (Table S4).
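The pairing of mixed and monospecific inventory plots can be sketched as a nearest-neighbor search on z-scored plot variables. This is an illustration under stated assumptions, not the authors' code: it normalizes over the pooled set of plots, uses the sample standard deviation, and omits the percentile-based distance cutoff because its value is not recoverable from the text. The function names and plot values are hypothetical.

```python
import math

def zscore_columns(rows):
    """Normalize each column: subtract the mean, divide by the sample SD."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / (n - 1))
           for c, m in zip(cols, means)]
    return [[(x - m) / s if s > 0 else 0.0 for x, m, s in zip(row, means, sds)]
            for row in rows]

def nearest_mono_plot(mixed_rows, mono_rows):
    """For each mixed plot, return (index of closest monospecific plot, distance)."""
    # Assumption: normalization is done over mixed and monospecific plots together.
    z = zscore_columns(mixed_rows + mono_rows)
    z_mixed, z_mono = z[:len(mixed_rows)], z[len(mixed_rows):]
    pairs = []
    for zm in z_mixed:
        dists = [math.dist(zm, zo) for zo in z_mono]
        j = min(range(len(dists)), key=dists.__getitem__)
        pairs.append((j, dists[j]))
    return pairs

# Hypothetical plots: [mean annual temperature, annual precipitation, total basal area]
mixed = [[5.0, 600.0, 25.0], [12.0, 900.0, 30.0]]
mono = [[11.5, 880.0, 31.0], [5.5, 620.0, 24.0]]
pairs = nearest_mono_plot(mixed, mono)
print(pairs)
```

In the study, seven variables entered the distance (four climate variables, slope, and the sum and coefficient of variation of basal area); three are used here only to keep the example short.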

| Statistical analysis
Separately for each tree diversity experiment and each forest type within the exploratory or the inventory dataset, we calculated for every tree species the separate mean log response ratio (hereafter "effect size") of the species' growth in either all 2, 3 or higher species mixtures divided by the growth in the respective monospecific plots of that forest type/diversity experiment. The whole data preparation procedure up to the point of the calculation of effect sizes is briefly summarized in Appendix S8.
We tested hypothesis H1 (i.e., a general positive effect of tree species mixing on species growth) by testing for significance of the grand mean effect size (i.e., the intercept) with a linear random-effects model. The model included effect sizes as the dependent variable and the identity of the experiment/forest type and, in the case of the inventory approach, the countries of the compared plots, as random effects. In the national forest inventory dataset, certain species could have multiple effect sizes within the same forest type and species richness level (because we did not pool effect sizes between different countries). Those multiple effect sizes were assigned an accordingly lower weight in the following linear model (calculated as one divided by the number of multiple effect sizes). The resulting grand mean effect size was deemed significant, if the approximated 95% confidence interval (intercept ± 1.96 × SE) did not include zero.
We tested the differences between approaches by including the research approach as a categorical predictor variable in the mixed-effects model.
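The weighting rule and the significance criterion used for H1 can be sketched as follows. The sketch does not fit the random-effects model itself; it only shows the inverse-count weights and the approximate confidence interval check. The grouping key, function names, and numeric values are hypothetical.

```python
from collections import Counter

def inverse_count_weights(keys):
    """Weight each effect size by 1 / (number of effect sizes sharing its key)."""
    counts = Counter(keys)
    return [1.0 / counts[k] for k in keys]

def approx_ci95(estimate, se):
    """Approximate 95% confidence interval: estimate +/- 1.96 x SE."""
    half_width = 1.96 * se
    return estimate - half_width, estimate + half_width

def is_significant(estimate, se):
    """True if the approximate 95% interval does not include zero."""
    lower, upper = approx_ci95(estimate, se)
    return lower > 0 or upper < 0

# Hypothetical keys: (species, forest type, richness level). Two countries
# contribute an effect size for the same key, so each gets weight 0.5.
keys = [
    ("Picea abies", "boreal", 2),
    ("Picea abies", "boreal", 2),
    ("Pinus sylvestris", "boreal", 2),
]
weights = inverse_count_weights(keys)
print(weights)

# Hypothetical intercept and standard error from a fitted model
print(approx_ci95(0.15, 0.05), is_significant(0.15, 0.05))
```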
Hypothesis H2 (i.e., a positive effect of log species richness on the species' mean log response ratios) was tested with linear mixed-effects models that included the effect sizes as the dependent variable, log species richness as the predictor variable and the identity of the forest type or experiment and, in case of the inventory approach, the countries of the compared plots as a nested random effect. In contrast to the model applied to test H1, we assigned equal weights to all effect sizes. In the inventory approach, we weighted effect sizes by the inverse of the number of effect sizes for the same species in the same forest type (this number could vary when plots from different forest inventories were assigned to the same forest type).
In order to test hypothesis H3 (i.e., the consistency in species-specific responses to mixing across the research approaches), we fitted separate mixed-effects models per approach (for the experimental, exploratory, and inventory approach, respectively). These models included the identity of the tree species as a predictor variable and the random-effects structure was adapted from the model that was applied to test H1. The intercept of each model was set to zero. From each model, we then extracted the coefficient estimates for the respective tree species included. The consistency in species responses was then assessed by testing the significance of the rank-based correlation coefficients (Kendall's tau) between the coefficient estimates of species that were shared between different approaches (separately for the experiments-exploratories, experiments-inventories, and exploratories-inventories comparison).
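The rank correlation step for H3 can be illustrated with a small pure-Python sketch of Kendall's tau computed over the species shared by two approaches. It implements the tau-a variant (no tie correction) and leaves out the significance test; which variant the authors used is not stated. The species coefficients below are invented for illustration only.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / total number of pairs.

    Tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    n = len(x)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

def shared_species_tau(coef_a, coef_b):
    """Tau between species coefficients, restricted to species present in both."""
    shared = sorted(set(coef_a) & set(coef_b))
    return kendall_tau_a([coef_a[s] for s in shared], [coef_b[s] for s in shared])

# Hypothetical per-species coefficient estimates from two approaches
experiments = {"Fagus sylvatica": 0.02, "Picea abies": -0.05,
               "Quercus robur": 0.10, "Pinus sylvestris": 0.04}
inventories = {"Fagus sylvatica": 0.20, "Picea abies": 0.25,
               "Quercus robur": 0.05, "Betula pendula": 0.30}
tau = shared_species_tau(experiments, inventories)
print(tau)
```

A tau near 1 would indicate that species rank similarly in their mixing response in both approaches; values near zero or below indicate no consistency or reversed rankings.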
Hypothesis H4 (i.e., the proposed increase in the consistency of species responses to mixing when the comparisons of approaches were restricted to only those community compositions and forest types that are shared between the approaches) was tested analogous to H3, but this time based on datasets restricted to tree species occurring in the same compositions and forest types in the compared research approaches (listed in Table S4). The obtained Kendall's tau values were then compared to the tau values that were obtained from the unrestricted datasets.

| RESULTS
(H1) When calculated across all three research approaches (experiments, exploratories, and inventories), the grand mean effect size of species mixing (i.e., the average log response ratio of species growth in mixed compared to monospecific plots) was significantly positive (approximated 95% confidence interval: 0.05-0.25). On average, species showed 16% higher growth in mixed compared to monospecific plots. When calculated separately for each research approach, both the inventory and exploratory dataset yielded significantly positive mean effect sizes (on average, species growth was 27% and 20% higher in mixed compared to monospecific plots of the exploratory and inventory approach, respectively, Figure 2), whereas the mean effect size of the experimental approach was nonsignificant (on average, species growth was 1% higher in mixed compared to monospecific plots, Figure 2). In the experimental approach, none of the mean effect sizes (average species log response ratios) of the individual diversity experiments was significantly different from zero.
In the exploratory approach, significantly positive mean effect sizes were found in Mediterranean coniferous, thermophilous deciduous, and boreal forests. In the inventory approach, significantly positive mean effect sizes were found in beech, thermophilous deciduous, alpine, Mediterranean coniferous, boreal, and mountain beech forests.
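The percentages reported above follow from back-transforming log response ratios. The sketch below assumes natural logarithms and takes the midpoint of the reported interval (0.05 to 0.25) as a stand-in for the grand mean, which is not given directly in the text; the function name is hypothetical.

```python
import math

def lrr_to_percent(lrr):
    """Convert a natural-log response ratio to a percent change in growth."""
    return (math.exp(lrr) - 1.0) * 100.0

# Bounds of the reported approximate 95% interval and their midpoint
for lrr in (0.05, 0.15, 0.25):
    print(f"log response ratio {lrr:.2f} -> {lrr_to_percent(lrr):.1f}% change")
```

Under these assumptions, the midpoint of 0.15 corresponds to roughly 16% higher growth in mixtures, in line with the average stated in the text.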

| DISCUSSION
In this study, we compiled tree growth data from three European Our results further suggested that species mixing mostly benefitted those species that grew in forest types with relatively cold (boreal and alpine forests) or hot climates (Mediterranean coniferous and thermophilous deciduous forests). These observations are in line with an analysis of an eastern Canadian forest inventory dataset that likewise found stronger positive effects of tree diversity on stand productivity in boreal as compared to temperate forests (Paquette & Messier, 2011). Together, these findings broadly support the stress-gradient hypothesis, stating that positive interactions prevail in more stressful conditions (e.g., cold or dry), resulting in higher relative diversity effects than in more benign conditions (Forrester & Bauhus, 2016). We found consistent species responses to mixing between the exploratory and inventory approach only for those three forest types with the most stressful climatic conditions. However, for the remaining three forest types that were shared between both approaches and found in intermediate conditions, we found no consistency in the significance or even direction of the mixing effect. This limited transferability of mixing effects between approaches already indicated that scaling of diversity effects across approaches might be problematic.
Consequently, we found that species-specific responses to mixing were largely inconsistent between all three approaches, even after restricting the datasets to plots of only those species compositions and forest types that were shared between the different approaches. These observed inconsistencies likely resulted from unaccounted but influential drivers of forest diversity and functioning relationships, like tree density, size heterogeneity, and successional status (Lasky et al., 2014).
FIGURE 3 Comparison of tree species mean effect sizes (log response ratios) of growth in mixed compared to monospecific plots obtained from three different research approaches (experimental, exploratory, and inventory approach). Depicted are the mean effect sizes of only those species that were shared between the compared research approaches (Table S2)
In accordance with a recent global meta-analysis (Duffy, Godwin, & Cardinale, 2017), we found tree diversity effects on productivity to be generally stronger in natural as compared to experimental study designs. We must point out that the tree diversity experiments included in this study were not planted to represent mature forests, but to isolate the effects of tree species richness and functional diversity on ecosystem functioning. Since those experimental forests were still in juvenile phases, they usually lacked successional trajectories that lead to the replacement of underperforming species.
Tree diversity experiments might therefore still harbor maladapted species that could not compete in mature forests with a similar climate. In the inventory dataset, however, trees were usually planted and managed to maximize wood production and financial return. We tried to minimize, but could not rule out, the effects of local plot conditions on tree productivity. A number of plots might display both a higher productivity and a higher tree species richness, simply because of the prevailing favorable climatic and edaphic conditions.
Differences in the climatic conditions can generally lead to different forest biodiversity-productivity relationships (Paquette & Messier, 2011;Jucker et al., 2016;Ratcliffe et al., 2017). Although With the approach applied in this study (i.e., the comparison of mean species growth between mixed and monospecific plots), we could not account for the potentially confounding differences in tree sizes and especially the interaction with prevailing climatic conditions.
Herbivore pressure is another factor that likely varied between the three approaches. Except for the Satakunta site, all tree diversity experiments were fenced to exclude game species and safeguard the successful establishment of all planted trees. In the inventory, and even more in the exploratory approach, the juvenile trees are exposed to pressure by game species, which are known to be affected by tree species richness (Milligan & Koricheva, 2013;Ohse, Seele, Holzwarth, & Wirth, 2017).
The effects of tree diversity on forest functioning are scale-dependent, meaning that significance can change with the size of the surveyed forest plots (Wang et al., 2016). Inconsistencies in species-specific responses could thus partly result from differences in plot size and spatial extent between the compared research approaches.
In summary, all of the proposed factors might have contributed to the inconsistency of species-specific responses to mixing between tree diversity experiments and established forests. On the one hand, these results impede clear recommendations for forest owners on how to jointly maximize forest diversity and productivity. On the other hand, our results unequivocally demonstrated that not even one of the 64 investigated tree species generally suffers from species mixing. Except for the hemiboreal forests in the inventory approach, most tree species were, on average, either not significantly or even positively affected by species mixing. We thus concluded that many, if not most, monospecific stands can be diversified without negative or with positive effects on wood production.
Future research will be needed to answer (a) what are the underlying causes that lead to different diversity-functioning relationships between observational and experimental research approaches and (b) what are the species-specific abiotic and biotic requirements that maximize the productivity in mixed and monospecific communities.
These findings will be essential to devise forest management practices that can maximize synergies between wood production and the safeguarding of forest diversity in Europe (Chamagne et al., 2017).

CONFLICT OF INTEREST
None declared.

DATA AVAILABILITY STATEMENT
Information on the availability of the National Forest Inventory datasets can be found on the following websites: Finland -www.metla.