Trends and uncertainties in budburst projections of Norway spruce in Northern Europe

Abstract Budburst is regulated by temperature conditions, and a warming climate is associated with earlier budburst. A range of phenology models has been developed to assess climate change effects, and they tend to produce different results. This is mainly caused by different model representations of tree physiology processes, selection of observational data for model parameterization, and selection of climate model data to generate future projections. In this study, we applied (i) Bayesian inference to estimate model parameter values to address uncertainties associated with selection of observational data, (ii) selection of climate model data representative of a larger dataset, and (iii) ensembles modeling over multiple initial conditions, model classes, model parameterizations, and boundary conditions to generate future projections and uncertainty estimates. The ensemble projection indicated that the budburst of Norway spruce in northern Europe will on average take place 10.2 ± 3.7 days earlier in 2051–2080 than in 1971–2000, given climate conditions corresponding to RCP 8.5. Three provenances were assessed separately (one early and two late), and the projections indicated that the relationship among provenance will remain also in a warmer climate. Structurally complex models were more likely to fail predicting budburst for some combinations of site and year than simple models. However, they contributed to the overall picture of current understanding of climate impacts on tree phenology by capturing additional aspects of temperature response, for example, chilling. Model parameterizations based on single sites were more likely to result in model failure than parameterizations based on multiple sites, highlighting that the model parameterization is sensitive to initial conditions and may not perform well under other climate conditions, whether the change is due to a shift in space or over time. By addressing a range of uncertainties, this study showed that ensemble modeling provides a more robust impact assessment than would a single phenology model run.

by different model representations of tree physiology processes, selection of observational data for model parameterization, and selection of climate model data to generate future projections. In this study, we applied (i) Bayesian inference to estimate model parameter values to address uncertainties associated with selection of observational data, (ii) selection of climate model data representative of a larger dataset, and (iii) ensembles modeling over multiple initial conditions, model classes, model parameterizations, and boundary conditions to generate future projections and uncertainty estimates. The ensemble projection indicated that the budburst of Norway spruce in northern Europe will on average take place 10.2 ± 3.7 days earlier in 2051-2080 than in 1971-2000, given climate conditions corresponding to RCP 8.5. Three provenances were assessed separately (one early and two late), and the projections indicated that the relationship among provenance will remain also in a warmer climate. Structurally complex models were more likely to fail predicting budburst for some combinations of site and year than simple models. However, they contributed to the overall picture of current understanding of climate impacts on tree phenology by capturing additional aspects of temperature response, for example, chilling. Model parameterizations based on single sites were more likely to result in model failure than parameterizations based on multiple sites, highlighting that the model parameterization is sensitive to initial conditions and may not perform well under other climate conditions, whether the change is due to a shift in space or over time. By addressing a range of uncertainties, this study showed that ensemble modeling provides a more robust impact assessment than would a single phenology model run.

K E Y W O R D S
Bayesian inference, climate models, International Phenological Gardens, phenology models, Picea abies, provenance

| INTRODUCTION
Compared with the global average, climate warming is expected to be higher during winter months, and more pronounced further north and in mountainous regions, such as in the Alps (IPCC, 2013). Plant spring phenology is highly tuned to winter and spring temperatures and is thus a good indicator of climate change. Many plants have responded to the recent warming by becoming active earlier in the year, but the climate change response varies among species and locations and depends on the time period considered (Ahas, Aasa, Menzel, Vg, & Scheifinger, 2002;Menzel & Fabian, 1999;Menzel et al., 2008).
Potential implications that may follow from phenological shifts in trees include longer growing seasons, which may increase forest productivity (Richardson et al., 2010). However, as frost hardiness in spring is negatively related to growth activity (Westin, Sundblad, Strand, & Hällgren, 2000), an earlier onset of the growing season may increase the risks and severity of frost damage during late spring cold spells (Jönsson & Bärring, 2011). For commercially important species like Norway spruce, for which large differences in phenology traits exist among provenances, comprehensive cultivation research is carried out to identify traits favorable in a warmer climate (e.g., Skrøppa & Steffenrem, 2016;Westin et al., 2000).
Reliable phenology models are needed to improve the simulations of terrestrial biosphere models, for more robust projections of climate change impacts on, for example, forest productivity and plantatmosphere interactions (Jeong, Medvigy, Shevliakova, & Malyshev, 2012;Migliavacca et al., 2012;Richardson et al., 2012). When modeling climate change impacts on tree phenology, uncertainties propagate from initial conditions (i.e., the observed system state at the start of the simulation), model classes (i.e., process representations), model parameters (i.e., parameterization), and boundary conditions (i.e., assumption about model forcing data) (Araújo & New, 2007). Budburst models vary in their representation of tree physiology processes, for example, how various requirements are attained by interacting photoperiod, chilling, and forcing temperatures. Chilling requirements are particularly difficult to quantify, as dormancy release cannot be readily observed (Linkosalo, Häkkinen, & Hänninen, 2006), although it may be correlated with blocking of plasmodesmata by callose (Singh, Svystun, AlDahmash, Jönsson, & Bhalerao, 2017).
Provenances of Norway spruce, adapted to different environmental conditions, differ in the temperature sums required to trigger bud development, and tree breeders have for a long time been selecting trees with high growth capacity and low risk of frost damage (Hannerz, 1999). The lack of provenance-specific requirements in phenology models may impose bias and uncertainty in simulations across geographical and climatic gradients (Chuine, Belmonte, & Mignot, 2000;Kramer et al., 2017;Olsson, Bolmgren, Lindström, & Jönsson, 2013;Olsson & Jönsson, 2015). Adding model parameters may however lead to increased uncertainty as more factors have to be taken into account (Beven, 2009). Using observations of known provenances for model parameterization, the uncertainty related to genetic differences is removed, and the spatial variation in phenology can be assumed to mainly represent the effect of local climate conditions (Chen, 2013). Statistical inference can be applied to account for parameter uncertainties (Beven, 2009), and by recognizing that all models have shortcomings but still provide useful information, ensemble simulations over multiple initial conditions, model classes, model parameterizations, and boundary conditions can be carried out to assess trends and uncertainties. That is, whereas model limitations are traditionally overcome by building better models, ensemble simulations provide an alternative way to generate robust projections (Araújo & New, 2007).
The main objective of this study was to perform ensemble modeling of budburst in Norway spruce (Picea abies). The study region covered the main distribution area in Europe, north of the Alps, and the analysis included four categories of uncertainties, related to the following: (i) initial conditions (IC), (ii) model classes (MC), (iii) model parameterization, and (iv) boundary conditions (BC). The analysis of IC was carried out using phenology observations of three cloned provenances, grouped into six sets of observations used for model parametrization ( Figure 1a). The analysis of MC included seven budburst models, varying in their representation of tree physiology processes, with each IC-MC combination parameterized using Bayesian inference. The analysis of BC was carried out using a subensemble of five climate model datasets representing RCP8.5.
Specifically, we assessed the following: (i) the provenance-specific budburst projections for the study region, (ii) differences among budburst model projections, in relation to model representation of tree physiological processes, and (iii) the relative contribution of IC, MC, and BC to uncertainties in the model projections. Through this, we also assessed which budburst models were more challenging to parameterize in relation to observational constraints and initial inference values.

| Initial conditions
Observed timing of budburst was used for model parameterization and validation, representing the initial conditions in a climate change context. Data on budburst of Norway spruce (Picea abies) were provided by the network of International Phenological Gardens (IPG; Chmielewski, Heider, Moryson, & Bruns, 2013). Clones of known provenances have been planted in similar garden environments, mainly on plain meadows with sparse trees. Budburst is recorded by professional observers and has been defined as the first spring sprout when the buds open and the bud edges are visible with the needles not yet expanded (Chmielewski et al., 2013).
To correctly capture interannual variability and to separate differences in climatic response from local adaptation, it is preferable to use datasets with long time series. In this study, we therefore selected IPGs with more than 20 years of budburst records per Norway spruce provenance; a provenance from Germany with early timing of budburst (IPG plant no. 121, hereafter referred to as P121) and two provenances with late timing of budburst, one from Germany (P122) and one from northern Norway (P123). This generated a dataset of 23 IPGs and 1506 records from 1968 to 2013 (Table 1, Figure A1 in Appendix S1). Each time series was checked for outliers, using the 30-day rule of Schaber and Badeck (2002). The five potential outliers identified were not removed from the analysis, considering the small deviations (between 30 and 36 days, Table 1) and the reliability of the records. The dataset was for each provenance grouped into six IC to constrain the model parameterization; parameterization across sites using observations from all IPGs (IPG All ), and for comparison, site-specific parameterization using five selected IPGs (IPG 2 , IPG 7 , IPG 14 , IPG 33, and IPG 56 ). The selected IPG stations includes the station with the largest elevation difference between site and climate grid cell (IPG 2 ), the most northern and eastern site (IPG 7 ), most western site (IPG 14 ), highest elevated site and with the latest budburst (IPG 33 ), and the most southern and earliest budburst (IPG 56 ).

| Model classes
Budburst models range from being purely empirical to more processbased in their representation of tree physiological processes. Previous studies indicate that models with few parameters in general have higher accuracy over larger regions than more complex models (Basler, 2016;Olsson & Jönsson, 2014) and that all models are less accurate when applied outside the range of conditions for which they are parameterized for (Olsson et al., 2013). To address uncertainties related to MC, we selected an ensemble of eight budburst models that have previously been applied across large regions; one empirical model and seven models based on temperature sums. The empirical model (MT), which is based on a linear regression between day of budburst and mean temperature in February to April (Table 2, Equation 1), was in a previous study among the better performing models for birch and Norway spruce in Europe (Olsson & Jönsson, 2014). The temperature sum models comprised of three varieties of a forcing model (Table 2, Equations 3-5), one forcing-photoperiod model ( Table 2, Equation   6), and three chilling-forcing models (Table 2, Equations 7-9). For all of these models, day of budburst was simulated to occur on the first day when the accumulated forcing exceeded the requirement (Table 2, Equation 2). Given that photoperiod can influence the timing of budburst, and thus protected from too early or too late budburst (Hänninen, 1990;Partanen, Koski, & Hänninen, 1998), we used August 1st as a cutoff threshold for model failure to simulating bud burst. During projection using climate model data, model failure was calculated as the percent of site and year combinations, for which a model was not able to predict budburst before August 1.
Two of the forcing models include a linear response to temperatures above a base temperature (T b ) (Landsberg, 1974); GDD 1 from January 1 and GDD DOY from a parameterized starting day. The other forcing model, SIG DOY , includes a sigmoidal temperature response for all temperatures from a parameterized starting day (Migliavacca et al., 2012). The forcing-photoperiod model (BC DOY , Blümel & Chmielewski, 2012) is an extension of GDD DOY, in which, longer days are associated with an enhanced temperature response, as defined by an exponential constant. The Alternating model (ALT, Cannell & Smith, 1983) is an extension of GDD 1 , with the forcing requirement exponentially reduced for each additional chilling day. The Sequential model (SEQ, Sarvas 1972in Hänninen, 1990 is an extension of SIG, with the forcing temperature response conditioned on the break of winter rest, estimated to occur when a chilling requirement is reached. The most complex model applied in this study, the Unified model (UNI, Chuine, 2000), is an extension of SIG 1 , and share features with both SEQ (chilling requirement) and ALT (forcing requirement conditioned on the number of chilling days).
F I G U R E 1 (a) The ensemble modeling scheme used in this study: Model simulations were produced for three Norway spruce provenance; one with an early timing of budburst originating from Germany (P121) and two with a late timing of budburst, originating from Germany (P122) and Norway (P123). Initial conditions (IC) for model parameterization came from observations at International Phenological Gardens (IPG); comparing the outcome of using data from all 23 IPGs with data from five single sites. Seven budburst models were successfully parameterized, representing a range of model classes (MC). Bayesian inference was applied for model parameterization. Model boundary conditions (BC) were provided by a subensemble of regional climate model data from SMHI-RCA4, used in dynamical downscaling of five global climate models

| Model parameterization using Bayesian inference
The budburst models were parameterized using Bayesian inference for each provenance and initial condition separately. By propagating the parameter uncertainties into future projections, we account for the uncertainty in the estimated parameters, that is, different parameter values that yield similar predictions under current conditions may not do so under future conditions. To obtain the parameter values most likely given our observations (the posterior probability distribution, p(θ|IC)), Bayesian inference combines our prior beliefs regarding possible parameter values (the prior probability distribution, p(θ)) with a likelihood (p(IC|θ)). Assuming Gaussian prediction errors, the log-likelihood (log p(IC|θ)) is the negative sum of all squared prediction residuals (i.e., a larger error gives a smaller log-likelihood value). Here, θ is the unknown parameter value and IC the initial conditions, that is, the observations used in the model parameterization (IPG and E-OBS). Each parameter was assigned a uniform prior based on values found in the literature (Appendix S1), and the range was explored using an adaptive Metropolis-Hasting algorithm (AMH) (Andrieu & Thoms, 2008;Haario, Saksman, & Tamminen, 2001). For each model, the AMH was run eight times using randomly selected starting points and up to 300,000 iterations (Appendix S1). For each iteration, new parameters are randomly generated and the likelihood function (L i ) scores the models ability to predict the observed budburst given the climate data. The L i is then compared to that of the previous iteration (L i−1 ), if larger the new parameters are kept, else a new set of parameters based on either the latest accepted set or the new parameters is created. When the new parameters yield a smaller L, they are accepted if the ratio L 1 /L i−1 is larger than a random number (a ε U[0,1]). By occasionally keeping parameter values that give a smaller L, the AMH algorithm reduces the risk of getting stuck in local optima, allowing it to better explore the possible parameter values. Before reaching the posterior distribution, we removed the burn-in period and thinned each chain to reduce autocorrelation. The burn-in period was assumed to consist of all samples until the first time at which the likelihood exceeds the mean likelihood over the last 10% of the chain. For T A B L E 1 Summary of data on three Norway spruce provenances, one early (P121) and two late (P122, P123), at the International Phenological Gardens (IPG), with number of recorded budburst (n), average budburst (±SD), and location of the IPGs; latitude (Lat, °N), longitude (Lon, °E), altitude (Alt, m above sea level), the difference in altitude between the climate grid cell and the IPG site (Alt diff), and mean annual temperature (MAT) for the period with observations   e.g., Hänninen, 1990). UNI (sharing features with ALT and SEQ) was not included in the final ensemble as it did not converge sufficiently within 300,000 iterations (only one chain converging per provenance, and only ten iterations remaining after thinning for the early German provenance (P121), which indicates a hard optimization problem or an overparameterized model).

| Boundary conditions
For parameterization, the budburst models were forced with interpolated observed daily mean air temperature with a spatial resolution of 0.44° (E-OBS vs. 10.0, 1950-2014) (Haylock et al., 2008 conditions but not the year-by-year variations. Furthermore, the climate model data were not adjusted for site-specific elevation differences, as the purpose was to generate simulations representing the grid-cell level, mapping the entire study region of northern Europe. (ii) The period 2011-2040 was selected to assess current climate and near future, a period little influenced by uncertainties associated with future greenhouse gas emissions (IPCC, 2013). (iii) The period 2051-2080 was selected to quantify uncertainties in plant phenological response in relation to a high emission scenario (RCP 8.5). With a rotation period of at least 50 years, this period is highly relevant to consider during regeneration of Norway spruce forest stands.
To address uncertainties related to climate model data, the phenology models were driven by five climate model datasets from EURO-CORDEX representing RCP8.5 at a spatial resolution of 0.44° (Jacob et al., 2014). The subensemble consisted of data from one regional cli- corresponding to the PRUDENCE regions (Christensen & Christensen, 2007;Pulatov et al., 2016). All ensemble members had been bias corrected, using empirical quantile mapping with EURO4M as reference dataset (Wilcke, Mendlik, & Gobiet, 2013). We hereafter refer to the BC datasets as CanESM2, CERFACS, IPSL, NorESM1, and GFDL.

| Analysis
An ensemble was produced for each Norway spruce provenance   provenance-specific analysis (Appendix S2), referred to in the text as Table A1-A4, Fig. A1-A2 and Fig. B1-B6, respectively.

| Model performance and failure
In general, the models failed to simulate budburst more often when constrained by initial conditions based on one IPG, compared to IPG All (  TP1   TP2   TP3   TP1   TP2   TP3   TP1   TP2   TP3   TP1   TP2   TP3   TP1   TP2   TP3   TP1   TP2 (Fig. A2). For simulations with a positive growth rate, irrespective of the sign of the inflection point, the response rate is lower with higher temperature.
The TP1 simulations, based on climate model data, were for some sites significantly earlier or later than observed budburst (KW test, α = 0.05, df = 1) (Figure 2). For some sites (i.e., IPG 2 , IPG 12 , IPG 23 , IPG 33, and IPG 36 ), all models generated budburst projections that were early in comparison with observed timing. This is related to differences between E-OBS data (adjusted for site-specific altitude, used for model parameterization) and gridded climate model data (used for large-scale and long-term climate impact assessments). In comparison with observed timing of budburst, the models generated different grid-cellspecific projections for TP1, with ALT and SIG DOY representing the two extremes: The ALT model projected an earlier than observed budburst for 19 of 23 grid cells with IPGs, whereas SIG DOY projected an earlier than observed budburst for five grid cells, and later than observed for 12 of the grid cells.

| Ensemble projections
The ensemble projections indicate that the temporal and spatial changes in budburst will be similar among the provenances, maintaining the interprovenance relationship over time, also in a warmer climate (Figure 3). The observed timing of budburst and simulations based on E-OBS in 1971-2000 agree to a large extent, but the range was slightly larger for the E-OBS simulations, which reflects both model uncertainties, uncertainties associated with the E-OBS data, and added variability by site-year projections without corresponding budburst observations (i.e., gaps in the observed datasets). The discrepancies between model projections based on E-OBS and modeled climate data were solely due to differences in the temperature estimates (see Section 2.4). For many of the sites, the climate model data indicated slightly warmer conditions that the E-OBS data, as the climate model data were not adjusted for site-specific altitudes. Sixteen of 23 sites were at lower altitudes than the corresponding grid-cell average altitude (Table 1). That is, the E-OBS data were adjusted to generate time series that represented the site-specific conditions, for optimal calibration of the phenology models, but the climate model data were not adjusted to provide future predictions that represented the gridcell level, enabling mapping of the entire study region (c.f. Figure 4).
A subset of projections (initial condition IPG All , model classes MT, GDD 1 , GDD DOY , SIG DOY , BC DOY, and ALT, and all boundary conditions) was combined to assess variations in climate sensitivity within the study region (Figure 4, Fig. B2). The budburst advancement was more pronounced in northern Europe, that is, in areas with a more pronounced warming during winter and spring, while the estimated interannual variation in budburst (as influenced by MC and BC) increased more in western Europe, that is, in areas where the current climate is relatively warm (Fig. B3). It is worth noting that even though the forcing requirement in GDD 1 and GDD DOY is highly correlated, GDD 1 projects a greater advancement by accounting for temperature increase in late winter and early spring (i.e., before the starting day of GDD DOY , ranging from late February to late March, Table A2-A4).  Figure 5c). Furthermore, SEQ was the only model to project a relative delay in budburst, at IPG 14 for all provenances and at IPG 18 for the two late provenances (P122 and P123, Fig. B4). Among all IPG sites, these two have the highest average temperature and lowest variability in winter (November-January) and spring (February-April).

| Model-specific projections
All temperature sum models projected the greatest advancement at the most northern site (IPG 7 ), and the least advancement (or even a delay according to SEQ) at the most western site (IPG 14 ). Minor differences were found for IPG 27 and IPG 28 , even though they belong to the same climate grid cell, due to the latitude-specific day length function implemented in BC DOY .  Figure 1b). All CDFs indicated an earlier timing of budburst in response to a warmer climate ( Figure 6). The estimated average advancement in TP2 was 4.1 ± 1.6 days, and in TP3 10.2 ± 3.7 days. All provenances indicated the same magnitude of change, but the CDFs of the early provenance differed somewhat from the two late provenances, with a slightly larger variation within IC (11 days) than MC (8 days) and BC (6 days; c.f. Figure 6 and Fig. B5).

| Sources of uncertainty
In this study, the variation was mainly attributed to the budburst models (MC), followed by the initial conditions (IC) and selection of climate model data (BC) (Figure 6). Within each category, the variability among CDFs increased over time, somewhat more for MC than for IC (Figure 6, Fig. B5). Among the initial conditions (IC-CDFs), IPG All changed the most and IPG 7 the least. Among the phenology models (MC-CDFs), BC DOY showed the largest change and SIG DOY the smallest change. A comparison of the climate change effect on the central tendency (i.e., change in average values between TP1 and TP3) revealed minor differences among IC-CDFs and BC-CDFs (all within the range of 8-11 days). Larger variations were found among the MC-CDFs, ranging between 2 days (SIG DOY ) and 22 days (GDD 1 ; Table 5). The weak signal by SIG DOY can be attributed to the increased frequency of model failure, and that the model in general is approaching failure (Table 4). Furthermore, the interannual variations predicted by BC DOY were substantially smaller than for GDD 1 (c.f. Figure 5b). Nevertheless, BC DOY differed significantly between TP1 and TP3, indicating a general shift in timing of budburst by 9 days (i.e., close to the ensemble average).
A post hoc analysis, generating a pairwise comparison among IC-CDFs and MC-CDFs, showed that the ranking among model runs did generally not differ across time periods. However, the intricate interplay between parameter values and temperature generated some inconsistencies: The P123-projections were generally earlier with IC IPG 56 than with IC IPG 33 in TP1, but later in TP2 and TP3 (Fig. B5), and the shift was induced by a stronger temperature response in IC-IPG 33  Table A4). The projections with ALT were for all provenances generally earlier than with GDD 1 in TP1, but later in TP3. With warmer springs, GDD 1 will unconditionally project earlier budburst, while the advancement with ALT is partly offset by warmer winters, as fewer chilling days increase the forcing requirement of ALT.
The BC-CDFs showed minor differences in ranking among the time periods, as influenced by the partly random interannual variation in climate model data, and all BC-CDFs indicated a similar magnitude of change between TP1 and TP3. The selection of initial conditions had a significant influence on all model projections (KW test, α = 0.05, df = 5), with SIG DOY and SEQ being more sensitive than the other models ( Fig. B6a,b).

| DISCUSSION
Assessments of climate impacts on tree phenology are required for a range of theoretical and practical applications, including ecosystem modeling for mitigation and adaptation purposes and selection of suitable provenance at the timing of regeneration of forest stands (Migliavacca et al., 2012;Skrøppa & Steffenrem, 2016). A range of phenological models has been developed to capture temperature effects on the timing of budburst, differing in terms of model structure and complexity (Olsson & Jönsson, 2014). As there are remaining uncertainties on how to model the tree species and provenance-specific phenology (Fu et al., 2015;Jochner, Sparks, Laube, & Menzel, 2016;Suvanto, Nöjd, Henttonen, Beuker, & Mäkinen, 2016), future projections should preferably be based on an ensemble of models with different structures (Basler, 2016). In this study, focusing on Norway spruce in northern Europe, we applied an ensemble approach using seven phenology models to provide a general overview of trends and uncertainties, addressing effects related to initial condition for model parameterization, model class, and boundary conditions. Our results are in line with other studies concluding that structurally different phenology models can generate similar results under current climate conditions, but differ in terms of future projections (Basler, 2016;Linkosalo, Lappalainen, & Hari, 2008;Vitasse et al., 2011). The ensemble projection indicated that the timing of budburst will be on average 10.2 ± 3.7 days earlier in 2051-2080 than in 1971-2000, given climate conditions corresponding to RCP 8.5. An earlier timing of budburst was associated with increased spatial variation, which is in line with observed changes for many species in Germany between 1951 and 2002 (Menzel et al., 2006). The variations captured by the ensemble projections were primarily caused by differences among phenology model classes (MC), secondly, by the initial conditions used for model parameterization (IC), and lastly, by climate model data (BC).
The ensemble analysis of this study focused on the complementarity of bud burst models, that is, the main source of variation. The included models captured different aspects of environmental regulation, such as forcing, chilling, and photoperiod, thereby contributing to the overall picture of current understanding of climate impacts on tree phenology (Fu, Campioli, Van Oijen, Deckmyn, & Janssens, 2012;Vitasse et al., 2011). Model parameterizations and projections were carried out for one early and two late Norway spruce provenances, as tree breeding with selection of suitable seed sources, (including provenances from e.g., Germany, Poland, and Belarus) is part of the climate adaptation strategy of the north European forestry sector.
Tree phenology is an important selection criterion (Hannerz, 1999), as trees that start growth early in the season face a higher risk of frost damage, whereas trees that start late may not take advantage of the full growing season (Karlsson, 2009). Seed sources with late bud flushing are commonly recommended in south Sweden, whereas early bud flushing varieties are commonly used in north Sweden, as the spring temperature progression of this region is less frequently interrupted by frost episodes, and the growing season is shorter (Jönsson, Linderson, Stjernquist, Chlyter, & Bärring, 2004). The results of this study indicated that the temporal differences in timing of budburst between early and late provenances will remain in a warmer climate; however, the magnitude of temperature change will influence the regions and provenances most at risk. This is due to the fine balance between warmer climate generally reducing the number of frost days and tree dehardening and budburst occurring earlier in the year, at a time when temperature backlashes are more frequent (Jönsson & Bärring, 2011). While it was beyond the scope of this study to assess the risk of frost damage, it pointed toward the need of analyzing both the general trend in timing of budburst (most pronounced in northern Europe) and the interannual variation (will increase most in western Europe), as both aspects influence the risk.  (Olsson & Jönsson, 2014). The higher degree of model failure with initial conditions based on single IPGs indicated that the overestimation effect became more pronounced when the parameterization was constrained by local conditions than by average regional conditions (IPG All ), especially when adding the effects of climate change.
In this study, UNI was not properly calibrated and thereby omitted from the ensemble projections. Previous studies have indicated that this model is prone to fit random noise data (Linkosalo et al., 2008), which results in poor performance at external sites (Fu et al., 2012), and this may have impaired our calibration process. Similar results have been found for SEQ (Fu et al., 2012), which in this study had the highest rate of failure to predict budburst, likely due to insufficient monitoring data for the calibration of chilling requirement (Fu et al., 2012;Linkosalo et al., 2008;Vitasse et al., 2011). The models that did not fail, GDD 1 , GDD DOY, and BC DOY , were easiest to calibrate and among the better performing models when comparing model simulations with observed timing of budburst. Even so, the calibration procedure of these models highlights some model weaknesses and gaps in physiological understanding. The parameterization of GDD 1 generated a posterior bimodality caused by the interdependence between base temperature and forcing requirement, and the parameterization of SIG DOY was strongly influenced by the correlation between growth rate and infliction point.  (IPCC 2013). The result of a tree phenology study including a subselection of cold years, to enable a validation based on currently climate (Olsson & Jönsson, 2014), suggests that the model performance may deteriorate gradually with temperature increase as years with weather and climate conditions outside the model predictive capacity increase in frequency.
In this study, the boundary conditions were ranked as the third most important source of variation. However, the study design will influence the attribution of uncertainties (Uusitalo, Lehikoinen, Helle, & Myrberg, 2015) and it has to be taken into account that the model runs were based on bias-corrected climate model data, selected to represent RCP 8.5 only . In a study on Harvard forest (Massachusetts, USA), largest uncertainty was attributed to the selection of climate data, followed by budburst models and model parameterization (Migliavacca et al., 2012). The two scenarios included, A1fi and B1, have a median temperature increase by 2100 similar to RCP8.5 and RCP4.5 (Rogelj, Meinshausen, & Knutti, 2012), thereby capturing parts of the scenario uncertainty that was not specifically addressed in this study. Limited time and computer resources commonly restrict the possibility to take the entire range of climate model data into account when carrying out impact assessments , and this study focused on a pronounced climate change scenario as this also captures aspects of what could happen in a less severe scenario. That is, the climate conditions outlined by RCP 8.5 for the midcentury correspond roughly with the conditions outlined by RCP 4.5 for the end of the century (IPCC, 2013).

| CONCLUSION
In this study, we applied (i) Bayesian inference to estimate model parameter values to address uncertainties associated with selection of observational data, (ii) climate data selection to identify a subensemble of climate model data representative of a larger dataset, and (iii) ensembles modeling over multiple initial conditions, model classes, model parameterizations, and boundary conditions to generate future projections and uncertainty estimates. Structurally complex models were more likely to fail predicting budburst for some combinations of site and year than simple models, however, contributing to the overall picture of current understanding of climate impacts on tree phenology by capturing additional aspects of temperature response, for example, chilling. Model parameterizations based on single sites were more likely to result in model failure than parameterizations based on multiple sites, highlighting that the model parameterization is sensitive to initial conditions and may not perform well under other climate conditions, whether the change is due to a shift in space or over time. By addressing a range of uncertainties, this study showed that ensemble modeling provides more a robust impact assessment than would a single phenology model run.

ACKNOWLEDGMENTS
This study was financially supported by FORMAS through the project Climate change impact on tree defense capacity (2010-822), and JL T A B L E 5 Differences between time periods (TP1 = 1971-2000, TP2 = 2011-2040, and TP3 = 2051-2080  Ensemble averages (ENS-AVG) are presented, along with average differences for the individual ensemble members. Values are provided for three provenances, one early (P121) and two late (P122, P123). has been partially funded by MERGE. We acknowledge the observers of the International Phenological Gardens of Europe (http://ipg.huberlin.de), the coordinator and data provider of the network, as well as the E-OBS dataset from the EU-FP6 project ENSEMBLES (http:// ensembles-eu.metoffice.com) and the data providers in the ECA&D project (http://ecad.eu).