Global assessment of relationships between climate and tree growth

Tree‐ring records provide global high‐resolution information on tree‐species responses to global change, forest carbon and water dynamics, and past climate variability and extremes. The underlying assumption is a stationary (time‐stable), quasi‐linear relationship between tree growth and environment, which, however, conflicts with basic ecological and evolutionary theory. Indeed, our global assessment of the relevant tree‐ring literature demonstrates non‐stationarity in the majority of tested cases, not limited to specific proxies, environmental parameters, regions or species. Non‐stationarity likely represents the general nature of the relationship between tree‐growth proxies and environment. Studies assuming stationarity, however, receive twice as many citations, influencing other fields of science and the science–policy interface. To reconcile ecological reality with the application of tree‐ring proxies for climate or environmental estimates, we clarify the stationarity concept, propose a simple confidence framework for the re‐evaluation of existing studies and recommend the use of a new statistical tool to detect non‐stationarity in tree‐ring proxies. Our contribution is meant to stimulate and facilitate discussion in light of our results and to help increase confidence in tree‐ring‐based climate and environmental estimates for science, the public and policymakers.


| INTRODUCTION: TREES AS GLOBAL ENVIRONMENTAL ARCHIVES
The Earth is home to over three trillion trees (Crowther et al., 2015), which constantly record environmental information in their cell structure, annual ring width, density and isotopic composition of the wood. Due to their long lifespan of decades to millennia and with forests covering about 30% of the world's land surface on six continents (MacDicken, 2015), trees have become a globally important archive of environmental information (Figure 1). Tree-rings contribute to our understanding of past and contemporary forest carbon and water dynamics (Babst et al., 2014; Frank et al., 2015), mortality events (Cailleret et al., 2017; Park Williams et al., 2012), late Holocene climate variability and its societal impacts, and responses of forest ecosystems and tree species to global climate change (Charney et al., 2016). Tree-ring-based assessments of ecosystem health and past climate variability have strongly contributed to every Intergovernmental Panel on Climate Change (IPCC) assessment report and most likely will strongly contribute to the Sixth Assessment Report in 2021.
Information about climate dynamics from tree-rings is based on a transfer function between tree-ring parameters (the so-called proxy) and a climate or environmental driver of growth. In short, tree-ring parameters (e.g., ring widths, maximum latewood density or isotopic composition of the wood) are mathematically transferred into reconstructions of past climate, ecosystem functions and range dynamics of tree species. This process of reconstruction nearly always assumes an approximate linear relationship between the tree-ring proxy and target environmental driver(s), either as the result of interacting non-linear processes cancelling each other out (Cook and Peterson in Hughes, Swetnam, & Diaz, 2011) or if all other confounding factors have been removed (Hughes, Kelly, Pilcher, & LaMarche, 1982). Crucially, this linear relationship is presumed to be stable through time. This is called the stationarity assumption (National Research Council, 2006).
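The transfer-function idea described above can be illustrated with a minimal sketch. It assumes a purely synthetic 60-year record in which ring width responds linearly to temperature; the helper `fit_line`, the variable names and all numbers are hypothetical and are not taken from any study. The function is calibrated on one 30-year period and then applied to the other, withheld period.

```python
import random


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rng = random.Random(0)

# Synthetic 60-year record (assumption): the proxy responds linearly to temperature.
temperature = [rng.gauss(12.0, 1.0) for _ in range(60)]
ring_width = [0.2 + 0.07 * t + rng.gauss(0.0, 0.05) for t in temperature]

# Calibrate the transfer function (climate as a function of proxy) on the last 30 years.
intercept, slope = fit_line(ring_width[30:], temperature[30:])

# Apply it to the first 30 years and compare with the withheld observations.
reconstruction = [intercept + slope * rw for rw in ring_width[:30]]
errors = [r - t for r, t in zip(reconstruction, temperature[:30])]
rmse = (sum(e * e for e in errors) / len(errors)) ** 0.5
print(f"slope={slope:.2f}, verification RMSE={rmse:.2f} degC")
```

Because the synthetic relationship is constructed to be identical in both periods, the sketch shows the stationary case only; real proxy data offer no such guarantee.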
However, it has been recognized that trees themselves are neither thermometers nor rain gauges (Briffa, Jones, Schweingruber, Karlén, & Shiyatov, 1996), and that tree growth patterns can never explain 100% of the variance in a target climate variable. In fact, the explained variance rarely exceeds 60%-70% and is more often in the range of 30%-50% (Esper et al., 2016; St. George, 2014). To maximize the explained climate variability and achieve a strong regional climate signal, tree-ring studies rely on high sample replication to average out noise, careful site and tree selection to ensure that selected trees are highly sensitive to climate (but see Stine & Huybers, 2017), and statistical treatments of varying complexity (Fritts, 1976).

| TREES LOSE TRACK: RECENT DECREASING SENSITIVITY OF TREE-RING ARCHIVES
Tree growth at high-latitude treelines is often limited by low temperatures and short growing seasons, leading to high temperature sensitivity. Therefore, tree-ring data from these sites have been important sources of information on past temperature variability. Over the last two decades, however, research has found that tree-ring data from high-latitude treelines in the Northern Hemisphere have failed to capture the recent post-1970 warming trend (D'Arrigo, Wilson, Liepert, & Cherubini, 2008), and have instead exhibited a general loss of temperature sensitivity or instabilities in the associations between climate and tree growth, a phenomenon that has since been reported in various sites and species around the globe (Briffa et al., 1998; Carrer & Urbinati, 2006; Hofgaard et al., 2019; Leonelli, Pelfini, D'Arrigo, Haeberli, & Cherubini, 2011; Schurman et al., 2019; Visser, Büntgen, D'Arrigo, & Petersen, 2010; Wilmking, D'Arrigo, Jacoby, & Juday, 2005). In other words, trees at these sites changed track, and their growth patterns began to diverge from temperature parameters.
Terms such as 'loss of sensitivity', 'instability' and 'divergence' are, however, generally used without a definition, and it is often unclear whether they denote a statistically significant deviation from the stationarity assumption (for a definition of stationarity, see Box 1).

FIGURE 1 Location of all sites analyzed in studies investigating climate sensitivity of tree-rings (a), testing for non-stationarity of this sensitivity (b) and detecting it (c). Non-stationarity is evident at global scale
BOX 1 Definition of stationarity and stationarity test
Stationarity (National Research Council, 2006), our addition in italics: The statistical relationship between the proxies and the climate variable is the same throughout the calibration period, validation period and reconstruction period or across specific sub-periods of the common overlap period of proxy and climate data.
While it is impossible to verify the stationarity assumption in the reconstruction period outside the instrumental data coverage, testing whether the statistical relationship between proxy and climate variable is the same in two (or more) sub-periods of the overlap period of instrumental and proxy data is commonly done in climate sensitivity or reconstruction studies. The big question however is how to determine if regressions are 'the same' (see definition above).

Stationarity test:
We promote the use of the Bootstrapped Transfer Function Stability (BTFS) test (Buras, Zang, & Menzel, 2017) as one new statistical tool to test for stationarity (Figure 2). Since each regression is characterized by three parameters (intercept, slope and r²), the BTFS simply compares bootstrapped estimates of the model parameters between different sub-periods. To test for significant differences between the regressions, BTFS compares the bootstrapped significance of corresponding models. No overlap signifies significant differences and therefore a failed stationarity test. A special case that is tested for in BTFS is related to the significance of the period-based regressions. If at least one of the bootstrapped regression parameters is non-significant (i.e., regression slope of the respective window is on average not significantly different from zero), the regression is rendered problematic for reconstruction purposes. For testing, we recommend a minimum window size of 30 years to establish a regression, similar to the 30-year period used to establish climate normals, and a common climate-proxy period of at least 2 × 30 = 60 years.
FIGURE 2 BTFS tests whether the relationship between proxy and climate driver is stationary across two time periods (blue-green and brown dots and regression lines, respectively). If intercept, slope, r² and their respective 95% confidence intervals overlap, stationarity can be assumed (a). If the intercept varies significantly, non-stationarity is detected, which could lead to different means between a calibration and a reconstruction (b). If the slope varies significantly, non-stationarity is detected, which could lead to different amplitudes of calibration and reconstruction (c). If the r² differs significantly, non-stationarity is detected, which could lead to differing confidence intervals between calibration and reconstruction (d)
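The overlap logic described in Box 1 and Figure 2 can be sketched as follows. This is a simplified illustration and not the published BTFS implementation of Buras et al. (2017): it assumes a plain pairs bootstrap with 95% percentile intervals and a split of the record into two halves, and all function names (`ols`, `bootstrap_intervals`, `split_period_check`) and the synthetic data are hypothetical.

```python
import random


def ols(x, y):
    """Return (intercept, slope, r2) of an ordinary least-squares fit of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope, sxy * sxy / (sxx * syy)


def bootstrap_intervals(x, y, rng, n_boot=1000, alpha=0.05):
    """Percentile intervals for intercept, slope and r2 from resampled (x, y) pairs."""
    n = len(x)
    draws = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        draws.append(ols([x[i] for i in idx], [y[i] for i in idx]))
    intervals = []
    for k in range(3):
        values = sorted(d[k] for d in draws)
        lower = values[int(alpha / 2 * n_boot)]
        upper = values[int((1 - alpha / 2) * n_boot) - 1]
        intervals.append((lower, upper))
    return intervals


def overlaps(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]


def split_period_check(proxy, climate, rng, min_window=30):
    """Compare bootstrapped regression parameters between the two halves of a record."""
    if len(proxy) < 2 * min_window:
        raise ValueError("need at least 2 x min_window years of common data")
    half = len(proxy) // 2
    early = bootstrap_intervals(proxy[:half], climate[:half], rng)
    late = bootstrap_intervals(proxy[half:], climate[half:], rng)
    names = ("intercept", "slope", "r2")
    return {name: overlaps(e, l) for name, e, l in zip(names, early, late)}


# Synthetic example (assumption): the slope of the relationship changes after year 30.
rng = random.Random(1)
proxy = [rng.gauss(1.0, 0.2) for _ in range(60)]
climate = [10 + 5 * p + rng.gauss(0, 0.1) for p in proxy[:30]]
climate += [10 + 1 * p + rng.gauss(0, 0.1) for p in proxy[30:]]

result = split_period_check(proxy, climate, rng)
print(result)  # a False entry marks a parameter whose intervals do not overlap
```

In this sketch, a `False` entry for any parameter would correspond to a failed stationarity check for that parameter; the 60-year minimum mirrors the 2 × 30 years recommended above.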

Non-stationarity has been interpreted in two ways: (a) The stationarity assumption is generally valid, and its observed violation is due to increasing noise over time, low-quality climate data (especially early in the record), incorrect statistical treatment of the tree-ring data or a combination thereof (Esper & Frank, 2009; Frank, Büntgen, Böhm, Maugeri, & Esper, 2007; Hughes et al., 2011; Wilson et al., 2007) and (b) a general stationarity assumption is not valid and growth responses of trees to climatic or environmental drivers are 'by nature' non-linear and variable through time (Smith, 2008; Stine & Huybers, 2017).
If for any reason the stationarity assumption does not hold or is violated, tree-ring-based climate reconstructions may (a) lead to incorrect estimates of past temperature trends, extremes and amplitudes, drought severities, river discharge, or snowpack variability, and therefore (b) potentially impede our ability to put recent climatic change into a long-term perspective (Esper et al., 2016); (c) con-

| GLOBAL SURVEY OF TREE GROWTH-CLIMATE RELATIONSHIPS
To contribute to the discussion, we surveyed the published record investigating climate or environmental sensitivity of tree-rings all across the globe. Our dataset encompasses 1,965 scientific papers published between 1945 and 2015, representing >50,000 citations.
We sought answers to the following questions: (a) Is stationarity only assumed or actually tested for? (b) If non-stationarity exists, are all tree-ring proxies and environmental drivers affected? (c) If non-stationarity exists, do obvious spatial patterns exist? (d) What is the balance in scientific impact between studies assuming stationarity and studies testing for it?
Studies investigating tree-ring sensitivity to climate or environment covered 6,054 sites in 94 countries on all continents (excluding Antarctica) and within all major climate zones (Figure 1), using 477 woody species. Annual tree growth (as ring width, density, isotopic composition or anatomy) was tested against >30 climate or environmental variables with mainly monthly resolution. Due to the large spatial coverage of our study in combination with the diversity of proxy-environment interactions, we believe that our dataset allows insight into how fundamental relationships between climate or environmental drivers and the growth of woody plants are investigated by the scientific community.
We found that about 2/3 (n = 1,269) of all studies published from 1945 to 2015 did not test for stationary relationships between climate and tree growth, that is, did not validate a transfer function across time (Figure 3a). The remaining 1/3 (n = 696) of studies tested for stationarity (Figure 1b), with more than half (56%) of these studies reporting non-stationarity (changes in trees' sensitivity to climate over time; Figure 1c). In our analysis, we have used the original authors' assessment of non-stationarity to provide a community-based stationarity assessment, while our recommendations concerning stationarity definitions and tests are outlined in Box 1. The result that over half of the cases testing for stationarity reported non-stationarity leads us to question the validity of a general stationarity assumption and supports the viewpoint of (a) a more cautious approach when using tree-rings for climate or environmental reconstructions and (b) a more complex and dynamic interplay between trees and the factors influencing their growth.
FIGURE 3 Complete survey of papers investigating climate or environmental sensitivity of tree-rings: Doughnut charts depict percentage of papers testing for stationarity between tree-ring proxy and climate/environment (light brown) in all papers (a) and the subset of actual climate reconstructions (b). Results of stationarity tests depicted in outer blue-green shaded semi-circles. Different shades of brown and blue-green indicate the answer categories for stationarity tests and signs of stationarity. Some reconstruction studies (b) tested multiple proxies or target climate variables and found non-stationarity in some but not all relationships ('not all')
In papers presenting actual climate reconstructions, about half (49%) tested for stationarity (Figure 3b). Of those testing for stationarity, about 18% identified non-stationary climate-growth relationships, a far lower share than in the general category, as expected.
However, even though non-stationarities existed, these studies nevertheless inferred past climatic conditions. Taken together, 37% of all tree-ring-based reconstructions in our survey are based on tested stationary relationships between climate and tree growth, and the remaining 63% therefore potentially include some bias in their estimates of past climate variability.

| Tree-ring proxies
To detect potential biases in our results, we explored the available meta-data. First, we analyzed whether testing and detection of non-stationarity varied across different tree-ring proxies. Generally, results were similar across proxies (Figure S1), with a slight tendency for more stationarity tests in wood density studies (47.4% vs. 36.3% for tree-ring width [TRW] and 42.4% for isotopes, respectively). Non-stationarity was detected in more than half of all cases (55.9%), and less often in isotope and wood density studies (48.9% and 47.0%, respectively), which were numerically strongly underrepresented.

| Climate or environmental parameters
Next, we scrutinized possible bias introduced by frequently studied climate or environmental parameters. The proportion of studies testing stationarity was roughly similar across parameters, ranging from 34% for precipitation to 49% for temperature. All parameters were affected by non-stationarity to varying degrees (Figure S2).

| Spatial patterns
To test for possible regional bias, we divided all studies according to their location in the main climatic zones. The proportions of papers testing stationarity were generally similar between climatic zones and signs of non-stationarity were detected throughout (Figure S3).
The number of case studies identifying non-stationarities varied between roughly 25% and 60% per climatic zone, with fewer cases in the A and B climates, which were numerically underrepresented. Results did not seem systematically biased by climatic zones, but we highlight the need for additional tests, especially in highly underrepresented tropical areas.

| Tree species
We found that the proportion of papers testing stationarity was similar across all species, but the results of testing did vary by species (Figure S4). Non-stationarities were detected for every species analyzed, with detection rates ranging from 18.8% for Chinese pine (Pinus tabulaeformis) to 84.6% for European beech (Fagus sylvatica).
Scots pine (Pinus sylvestris) was the most commonly used tree species with 153 studies overall, 64 of which tested for stationarity, with a 53.1% detection rate of non-stationarity.

| Selection bias through Boolean search
We are aware that a Boolean search might miss some potentially important contributions simply due to the keywords not matching. Therefore, we ran sensitivity analyses on the main results, testing whether we would have obtained the same results had we used random subsets of the original data of varying size. These analyses indicate that, to achieve stable results, we needed an initial sample size of 1/3 to 2/3 of all papers we finally analyzed (Figure S5).
We therefore consider our sample size large enough and the selection of papers as a robust representation of the relevant scientific literature.
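The kind of subsampling check described above can be sketched as follows. The exact procedure behind Figure S5 is not specified here, so this is only an assumed illustration: the paper counts come from the survey (1,269 papers not testing, 696 testing), while the split of tested papers into about 390 reporting non-stationarity is an approximation derived from the reported 56%. The functions `detection_rate` and `rate_spread` are hypothetical names.

```python
import random

# Each paper is None (stationarity not tested), True (non-stationarity reported)
# or False (no non-stationarity reported). The 390/306 split approximates 56% of 696.
papers = [None] * 1269 + [True] * 390 + [False] * 306


def detection_rate(sample):
    """Share of papers reporting non-stationarity among those that tested for it."""
    tested = [p for p in sample if p is not None]
    return sum(tested) / len(tested)


def rate_spread(papers, fraction, rng, repeats=200):
    """Range of the detection rate across random subsets of a given relative size."""
    k = int(len(papers) * fraction)
    rates = [detection_rate(rng.sample(papers, k)) for _ in range(repeats)]
    return max(rates) - min(rates)


rng = random.Random(42)
spread_small = rate_spread(papers, 0.1, rng)
spread_large = rate_spread(papers, 2 / 3, rng)
spread_full = rate_spread(papers, 1.0, rng)
print(f"spread at 10%: {spread_small:.3f}, at 2/3: {spread_large:.3f}, at 100%: {spread_full:.3f}")
```

The estimate stabilizes as the subset grows, which is the behavior a sensitivity analysis of this type is meant to reveal.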

| PAPERS THAT DID NOT TEST FOR STATIONARITY HAVE HIGHER CITATION NUMBERS
Papers not testing the basic assumption of stationarity scored a twofold higher citation number (33,324 citations) than those testing (16,803 citations). This is mainly a result of the number of papers not testing (n = 1,269) versus those testing for stationarity (n = 696), since papers from both categories achieve similar citation rates per paper (about 25 citations/paper). It is not a result of possible earlier publication dates of studies not testing for stationarity. While the total number of studies increased sharply over the last decades (Figure S6) and the proportion of studies performing stationarity tests increased slightly, the detection rate of non-stationarity has not changed over time. While we acknowledge that a citation does not necessarily mean agreement with the reported facts of the cited paper, taken overall, we believe that citation rate reflects the general influence of a scientific paper on the scientific community. Our results therefore suggest a biased influence of studies assuming stationarity without testing for it on other fields of science and the science-policy interface.
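The citation arithmetic can be checked directly from the counts reported in this section; the variable names below are illustrative only.

```python
# Counts and citation totals as reported in the text.
papers_not_testing, papers_testing = 1269, 696
citations_not_testing, citations_testing = 33324, 16803

per_paper_not_testing = citations_not_testing / papers_not_testing
per_paper_testing = citations_testing / papers_testing
citation_ratio = citations_not_testing / citations_testing
paper_ratio = papers_not_testing / papers_testing

print(f"citations per paper: {per_paper_not_testing:.1f} vs {per_paper_testing:.1f}")
print(f"citation ratio: {citation_ratio:.2f}, paper ratio: {paper_ratio:.2f}")
```

Both groups come out at roughly 24 to 26 citations per paper, so the near-twofold difference in total citations largely tracks the difference in the number of papers.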

| WHO IS THE CULPRIT?
Earlier, we introduced two ways of looking at the challenge of non-stationarity: (a) Stationarity is a valid principle and deviations are due to varying noise over time, data quality or data treatment.
(b) A general stationarity assumption is overly simplistic, and growth responses of trees to climatic or environmental drivers are primarily non-linear and variable through time. While noise, data quality and data treatment may contribute to non-stationarity in some cases, our review results suggest a relatively clear tendency that observed non-stationarity is the result of basic ecological processes that produce a dynamic relationship between tree growth and environment.

BOX 2 Ecological reality includes non-linearity
The relationship between tree growth and environment can in theory be simplified to response curves, similar to reaction norms in evolutionary biology (Figure 4), for example, bell or sigmoidal-shaped, but never linear. However, some parts of response curves can be considered quasi-linear (y ≈ ax + b), while other parts show non-linearity (y ≠ ax + b) or no response (y = b). It is thus possible to calibrate a linear response function and pass validation statistics if both calibration and verification are performed in that quasi-linear range. Within that range, estimates might be considered 'high confidence'. The stationarity assumption is violated, however, if the tree growth-environment relationship has shifted from quasi-linearity to non-linearity or no response over time, or if the driving variable has shifted its mean state and extremes to the non-linear part of the response curve. Then explained variance, slope and intercept of the calibration-verification functions can strongly differ. If that happens during the overlap period of proxy and instrumental data, it will be detected by stability testing. Shifts outside the calibration range, however, and responses outside the 'quasi-linear' part of the response curves will lead to biased reconstructions or projections. Exactly because of this possibility, any projection or reconstruction (even with passed calibration-verification statistics) outside the calibration range is error-prone and must be considered 'low confidence'. We acknowledge that reality is more complex and involves multiple growth-limiting factors, individual life histories, and plastic and genetic adaptation processes, leading to an even higher proportion of 'low confidence' estimates. Also, the no-analog situation of recent decades has increased, and will continue to increase, the likelihood of non-linearity or no response during the verification period, further increasing the chance of 'low confidence' estimates.
To evaluate tree-ring-based estimates of past and future climate or environmental variability, we therefore propose to reserve the term 'high confidence' for studies presenting rigorous, adequate and passed stationarity tests and a reconstruction or projection within the calibration range. To fully utilize the potential tree-rings have to offer, we urge the community to (a) increase efforts to develop and use non-linear functions (e.g., see Carrer & Urbinati, 2001; Jevšenak, Džeroski, Zavadlav, & Levanič, 2018; Jevšenak & Levanič, 2016; Ljungqvist et al., 2020), (b) refine mechanistic tree growth models to calibrate climate-proxy relationships, (c) constrain uncertainties surrounding the resulting reconstructions or projections, and (d) develop methods to test for stationarity also in non-linear cases.
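The response-curve argument of Box 2 can be made concrete with a small numerical sketch. It assumes a hypothetical bell-shaped (Gaussian) response with an arbitrary optimum of 18 and width of 4 in driver units; neither the curve nor the numbers come from the survey. A straight line is fitted on the rising flank, where the response is quasi-linear, and around the optimum, where there is almost no linear response.

```python
import math


def growth(x, optimum=18.0, width=4.0):
    """Hypothetical bell-shaped response of growth to an environmental driver x."""
    return math.exp(-0.5 * ((x - optimum) / width) ** 2)


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def grid(start, stop, step=0.1):
    """Evenly spaced driver values from start to stop (inclusive)."""
    n = round((stop - start) / step)
    return [start + i * step for i in range(n + 1)]


flank = grid(12.0, 16.0)         # quasi-linear part of the curve
near_optimum = grid(16.0, 20.0)  # flat part around the optimum

flank_intercept, flank_slope = fit_line(flank, [growth(x) for x in flank])
_, optimum_slope = fit_line(near_optimum, [growth(x) for x in near_optimum])

# Extrapolating the flank calibration to the optimum leaves the calibration range.
predicted_at_optimum = flank_intercept + flank_slope * 18.0

print(f"slope on flank: {flank_slope:.3f}, slope near optimum: {optimum_slope:.3f}")
print(f"linear prediction at optimum: {predicted_at_optimum:.2f}, true response: {growth(18.0):.2f}")
```

The linear model calibrated on the flank overshoots the true response once the driver reaches the optimum, which is the situation Box 2 labels 'low confidence'.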

FIGURE 4 Response curves between tree growth and environment: (a) optimum curve, (b) saturation curve. Independent of the shape of the curve, quasi-linear response spaces exist, where tree growth is highly sensitive to environmental changes. There, transfer functions can be calibrated and approximated with a linear model (dashed lines). Resulting estimates can be considered 'high confidence' (light gray areas). Other response spaces show non-linearity (dark gray) resulting in 'low confidence' estimates
sessile organisms and undergo ontogeny in a constantly changing world (Smith, 2008), influenced by a multitude of factors of varying importance through time (Carrer & Urbinati, 2006; Stine & Huybers, 2017; Trouillier et al., 2019). Evolutionarily, trees must possess high rates of plasticity (e.g., in growth rates, cell structure, physiological processes related to photosynthesis), and there are multiple lines of evidence that indicate trees do indeed exhibit this plasticity (Chevin, Collins, & Lefèvre, 2013; Lange et al., 2020). Ecologically, the relationship between tree growth and environment often reflects non-linearities, such as physiological optima and threshold or saturation effects (see Box 2). This basic ecological knowledge seems to contrast with the linear stationarity assumption of tree-ring science.
The fundamental tree-ring literature recognizes these limitations in theory (e.g., see seminal book by Fritts, 1976), but often still operates under the assumption that a combination of non-linear factors will lead to quasi-linearity (Cook and Peterson in Hughes et al., 2011).
Stationarity as a concept has often been deduced in the literature from the 'uniformitarian principle', which is the basic cornerstone of paleo-sciences. It is often rephrased as 'the present is the key to the past'. There does, however, exist some confusion on the correct interpretation of the uniformitarian principle: It does not imply a stationary relationship between a proxy and an environmental driver; it simply states that the laws governing today's processes are the same as those that governed these processes in the past.
So possible non-stationary relationships between climate and tree growth would have also happened in the past given the same circumstances, and detected non-stationarity is therefore not a violation of the uniformitarian principle, but rather reflects our incomplete knowledge of the processes governing tree growth (for a historical context of the uniformitarian principle and an extensive discussion, see ). Since any climate or environmental reconstruction using tree-ring proxies has two contributing time-series datasets (tree-ring data and climate/environmental data), either one or both of these datasets could be the reason for a non-stationary relationship. While growth trend removal of tree-ring series (the so-called 'detrending') is generally necessary prior to any proxy-driver analyses, the chosen method does influence long-term trends and amplitudes in reconstructions and could potentially induce non-stationarity (Allen et al., 2018; Esper & Frank, 2009). On the other hand, it is well acknowledged that climate data quality and availability decrease back in time, and low-quality climate data could potentially also contribute to the detection of apparent non-stationarity in tree-ring studies by influencing calibration-verification statistics (Wilson et al., 2007).
However, since the 1950s, the global station network has been quite dense and of high quality; notable exceptions include the breakdown of the Soviet Union and the subsequent closure of many weather stations after the 1990s. In general, non-stationarity should be better detectable in the second half of the 20th century and in the future, also because of the continuous increase in satellite-derived high-resolution climate data.
Weighing the evidence and considering evolutionary and ecological realities, we believe that it is possible, but unlikely that the majority of detected non-stationarity is simply the result of low-quality climate data or specific data treatment, even though this might be the case in specific studies or regions with short or poor-quality climate records. According to our survey results, non-stationarity is widespread, affecting all tree-ring proxies, all tested species and environmental parameters, and all areas of the globe. While well in line with ecological theory (Box 2), these results do pose a significant challenge for using tree-ring proxies to reconstruct or project climate or other environmental parameters.

| THE WAY FORWARD: RECONCILING ECOLOGICAL REALITY WITH THE APPLICATION OF TREE-RING-BASED ENVIRONMENTAL ESTIMATES
First, in our opinion, all climate or environmental reconstructions and projections not testing for stationarity should be viewed with a certain caution. According to our global survey results, it is more likely than not that their results are affected by a violation of the stationarity assumption and might be impacted by larger errors than originally reported. The affected number of 1,269 studies is high, and has to be multiplied by their legacy in the scientific literature. Second, tree-ring-based inferences about the past and future should not be performed without rigorous stationarity tests (Esper et al., 2016). Here, we promote the use of the 'BTFS Test' (Buras et al., 2017) for this purpose, since it allows statistically sound testing of climate-growth relationships in different periods (see Box 1). A re-evaluation of existing studies might be helpful to increase confidence and constrain estimates on non-stationarity.
While the overall number of studies has strongly increased, the percentage of papers testing for stationarity has also slightly increased over time (Figure S6). Also, according to our additional opinion poll carried out in 2016-2017, 71% of researchers in the field of tree-ring science believe that non-stationarity can affect the accuracy and reliability of tree-ring-based climate reconstructions (Figure S7). These two findings suggest (a) an increased recognition of potential non-stationarity in proxy-environment interactions and (b) an increased awareness of their potential effects on climate reconstructions. Both are important prerequisites for the careful use of tree-ring data according to the current state of knowledge about tree growth relationships to climate and environment.

| CONCLUSIONS
Tree-ring-based climate and environmental reconstructions and projections influence other scientific disciplines such as earth-system science and modeling, climatology, hydrology, forestry and ecology (Babst, Poulter, Bodesheim, Mahecha, & Frank, 2017), and the science-policy interface (e.g. IPCC). In this light, we summarize that at present the non-stationarity in tree-ring studies is widespread and most likely reflects complex biological environment-growth interactions rather than differing noise levels or low-quality climate or environmental data used for calibration. This might potentially lead to biased estimates of past climate variability, past and future forest growth and tree-species performance, and carbon and water dynamics of forest ecosystems. To overcome these challenges, careful selection of tree-ring chronologies based on adequate stationarity tests over the full range of climate target variability should be mandatory (Buras et al., 2017; Esper et al., 2017). Our data also indicate a need to re-evaluate the a priori stationarity assumption that often underpins tree-ring studies. We believe that the tree-ring community is well on its way toward a renewed fundamental discussion about non-stationarity, potentially even leading to a 'paradigm shift' acknowledging the higher likelihood of instability between tree growth and an environmental driver. This discussion will advance tree-ring science and help to increase confidence in tree-ring-based climate and environmental estimates for science, the public and policymakers at a time when robust reconstructions and projections are crucial to addressing global climate change and its local and regional impacts.

ACKNOWLEDGEMENTS
This study was supported by DFG GRT 2010 and DFG Wi2680/8-1.
We would like to thank two anonymous reviewers; their comments significantly improved the manuscript. The authors declare no conflict of interest.