Remote sensing data assimilation for a prognostic phenology model



[1] Predicting the global carbon and water cycle requires a realistic representation of vegetation phenology in climate models. However, most prognostic phenology models are not yet suited for global applications, and diagnostic satellite data can be uncertain and lack predictive power. We present a framework for data assimilation of the Fraction of Photosynthetically Active Radiation absorbed by vegetation (FPAR) and Leaf Area Index (LAI) from the MODerate Resolution Imaging Spectroradiometer (MODIS) to constrain empirical temperature, light, moisture and structural vegetation parameters of a prognostic phenology model. We find that data assimilation constrains structural vegetation parameters better than climate control parameters. Improvements are largest for drought-deciduous ecosystems, where the correlation of predicted versus satellite-observed FPAR and LAI increases from negative values to 0.7–0.8. Data assimilation effectively overcomes the cloud- and aerosol-related deficiencies of satellite data sets in tropical areas. Validation with a 49-year-long phenology data set reveals that the temperature-driven start of season (SOS) is light limited in warm years. The model has substantial skill (R = 0.73) in reproducing inter-annual and decadal SOS variability. Predicted SOS shows a higher inter-annual variability with a negative bias of 5–20 days compared to species-level SOS. However, it is accurate to within 1–2 days compared to SOS derived from net ecosystem exchange (NEE) measurements at a FLUXNET tower. The model has only weak skill in predicting the end of season (EOS). Use of remote sensing data assimilation for phenology model development is encouraged, but validation should be extended to phenology data sets covering mediterranean, tropical and arctic ecosystems.

1. Introduction

1.1. Phenology and Climate

[2] Timing and magnitude of cyclic events in the terrestrial biosphere are strongly related to climate variability [Scheifinger et al., 2002] because plant physiological processes are controlled by surface climatic states like moisture, temperature and light [Jarvis, 1976; Larcher, 2003]. Seasonal and inter-annual climatic variations influence the timing of plant developmental stages, which is referred to as vegetation phenology.

[3] Phenological networks provide one of the longest sources of direct evidence of climate variability and change [van Vliet et al., 2003; Betancourt et al., 2005; Menzel et al., 2006]. Concurrent with recent warming trends, bud burst or flowering has advanced by around 1–2 days per decade in temperate deciduous ecosystems [Menzel and Fabian, 1999; Menzel, 2000]. Historical records and phenological reconstructions also reveal substantial inter-annual to centennial variability of start of season (SOS) [Menzel et al., 2005; Aono and Kazui, 2008; Rutishauser et al., 2007]. End of season (EOS) events like leaf coloring, and hence their variability, are harder to detect and document, but they are nevertheless strongly coupled to climate [Sparks and Menzel, 2002; Taylor et al., 2008]. In general, the growing season has lengthened during the last decades of the 20th century [Intergovernmental Panel on Climate Change, 2007b].

[4] Modeling studies demonstrate the influence of vegetation phenology on the climate system. Regional and global temperature and precipitation patterns both affect and are sensitive to temporal as well as spatial phenological variability [Tsvetsinskaya et al., 2001; Lu and Shuttleworth, 2002; Kim and Wang, 2005]. Piao et al. [2007] show that modeled growing season length correlates with terrestrial CO2 uptake. White and Nemani [2003], however, find that the yearly net CO2 balance is only moderately affected by growing season length. This can be explained by enhanced spring CO2 uptake being partly offset by higher autumnal CO2 respiration rates [Schaefer et al., 2005], which also leads to larger seasonal CO2 amplitudes, as documented by Keeling et al. [1996]. On longer time scales, phenology can affect tree competition and vegetation dynamics in response to climate variability and change [Kramer et al., 2000]. Generally, either diagnostic or prognostic parameterizations of vegetation phenology are employed in these studies.

1.2. Diagnostic Phenology

[5] Satellite remote sensing vegetation indices exploiting the seasonal changes in the spectral signature of vegetation photosynthetic activity have been developed during the last two decades [Tucker et al., 1985; Reed et al., 1994]. They can be used to derive global maps of biophysical and phenological parameters like FPAR or LAI. These maps then prescribe phenological variability in climate models [Sellers et al., 1996; Buermann et al., 2001; Lu and Shuttleworth, 2002; Lawrence and Slingo, 2004], a method also termed “diagnostic phenology”. Satellite phenological observations differ from ground observations since they provide a spatially integrated view of continuous biophysical states instead of plant-specific phenological development stages. Studer et al. [2007] demonstrate that the inter-annual SOS variability of both methods is comparable, even over complex terrain such as the Swiss Alps, when individual ground-observed species are combined into a “statistical plant” [Studer et al., 2005]. However, the transfer functions used to detect SOS and EOS timing from satellite measurements have to be chosen carefully.

[6] Current satellite sensors like MODIS [Justice et al., 2002] or the Medium Resolution Imaging Spectrometer (MERIS) [Rast et al., 1999] used for phenological research provide data at 1 km spatial scale with a 1–16 day revisit frequency. However, small-scale (<500 m) topographic variability on the order of 50 m can result in a 1–2 week difference in SOS [Fisher et al., 2006]. Sub-pixel land cover heterogeneity leads to substantial uncertainty in the calculation of biophysical properties from satellite radiances [Cohen et al., 2006]. In temperate ecosystems, both satellite and ground phenological observations respond to large-scale climate forcing, while mediterranean and tropical phenology is known to have small-scale spatial variability [Los et al., 2001; Zhang et al., 2004; Maignan et al., 2008].

[7] Atmospheric disturbances like clouds or aerosols, as well as snow masking of vegetation, limit the applicability of diagnostic phenology data sets in climate models. Figure 1 visualizes the seasonal course and uncertainty range of MODIS-derived LAI for four major global ecosystem types, namely temperate deciduous, tropical evergreen, boreal evergreen and mediterranean savanna. Only a few high-quality observations (black crosses; the other curves are explained below) are available for the tropical ecosystem, and error bars are large because of clouds and aerosols. Gaps are also present in the boreal ecosystem during winter and spring because of snow cover and lack of light. Quality screening [Myneni et al., 2002; Delbart et al., 2006] and gap filling by curve fitting algorithms [Los et al., 2000; Jonsson and Eklundh, 2002; Zhang et al., 2003; Stöckli and Vidale, 2004; Bradley et al., 2007; Gao et al., 2008] can be applied to create the continuous and consistent time series needed to prescribe biophysical states in climate models.

Figure 1.

Prognostic LAI from four phenology parameterizations used in climate models (GSI, CN, IBIS and TRIFFID; see the introduction section for more detail) compared to MODIS-derived LAI at a (a) temperate deciduous broadleaf forest, (b) tropical evergreen broadleaf forest, (c) boreal evergreen needleleaf forest and (d) mediterranean savanna. Black crosses are the highest-quality MODIS observations, and error bars show the observation uncertainty (for details see the methods section).

[8] While some of these methods, such as using TIMESAT [Jonsson and Eklundh, 2002] for gap filling [Gao et al., 2008], are very promising, spatial or temporal interpolation introduces further uncertainty into the time series. Finally, diagnostic phenology data sets only cover the past satellite observation period and cannot be used, for example, for seasonal numerical weather forecasts or future climate predictions [Gienapp et al., 2005].

1.3. Prognostic Phenology

[9] Models simulating the timing of phenological events have mainly been developed to link phenological ground observations with climate variability. Hunter and Lechowicz [1992] find that ground-observed bud burst can be predicted from spring temperatures and photoperiod in combination with a chilling requirement. White et al. [1997] predict SOS over the continental US with an accuracy of 6–7 days and EOS with an accuracy of 5–6 days. They find that temperature sums can be used to predict SOS, while a more complex combination of temperature, photoperiod and precipitation is needed for EOS, depending on vegetation type. Chuine [2000] integrates previous approaches into a generalized model for SOS depending on chilling and forcing temperatures.

[10] So-called prognostic phenology models are employed in climate models for a continuous prediction of biophysical states like FPAR and LAI (examples shown in Figure 1). TRIFFID (Top-down Representation of Interactive Foliage and Flora Including Dynamics [Cox, 2001]; component of JULES, the Joint UK Land Environment Simulator) and IBIS (Integrated BIosphere Simulator [Foley et al., 1996]; component of the NCAR Community Land Model [Levis et al., 2004]) use temperature triggers to simulate growth and decay of leaves in temperate and boreal vegetation. TRIFFID prognoses continuous LAI changes by use of a leaf turnover rate while IBIS triggers instantaneous LAI changes. In CN (prognostic Carbon-Nitrogen dynamics based on BIOME-BGC [Thornton et al., 2002]; and component of the NCAR Community Land Model [Thornton et al., 2007]) leaf growth is predicted from vegetation biochemical cycling rates coupled to the terrestrial carbon-nitrogen cycle. SOS is triggered by cumulative soil temperature and EOS is triggered by day-length. Drought deciduous phenology in CN is triggered by temperature and soil moisture. In GSI (Growing Season Index [Jolly et al., 2005]) environmental factors based on temperature, light and humidity thresholds concurrently control the phenological state without the use of trigger functions.

[11] Careful interpretation is needed when comparing prognostic phenology models to satellite observations, since most models simulate individual (for example, deciduous or evergreen) vegetation types. This is exemplified for a deciduous broadleaf forest (DBF) in Figure 1a: satellite observations show a winter LAI of around 1.5 as a result of the evergreen vegetation fraction, while the modeled DBF LAI decays to 0 in winter. In contrast, CN correctly simulates a constant LAI for the boreal evergreen needleleaf forest (ENF) in Figure 1c, while the observations reveal a substantial fraction of deciduous vegetation during summer. Most models can simulate a mixed phenology based on the fractional cover of individual vegetation types (for example, CN, IBIS, TRIFFID). This is ultimately needed for their application in global models, but it adds another level of complexity.

[12] Therefore, differences in timing rather than in magnitude should be analyzed when comparing models and satellite observations. In Figure 1, GSI and CN are accurate to within one week in predicting SOS and EOS for DBF (a), but TRIFFID and IBIS predict a growing season that is too long. The models capture the almost constant LAI of the evergreen tropical forest (b), although IBIS decreases LAI during the dry season and GSI displays too much variability. The models partly fail to reproduce the constant LAI of ENF (c). None of the models matches either the timing or the phase of the drought-deciduous mediterranean grassland phenology (d).

[13] Each prognostic model [see also e.g., Potter and Klooster, 1999; Arora and Boer, 2005; Gibelin et al., 2006; Dickinson et al., 2008] includes a partial set of processes required to simulate global phenological variability. Figure 1 reveals significant timing differences between models and satellite observations especially for drought-deciduous mediterranean ecosystems independent of model complexity. The highly empirical model formulations were mostly developed for temperate DBF phenology but they are currently applied as part of decadal to centennial global climate model predictions.

1.4. Best of Both Worlds

[14] A realistic representation of seasonal to inter-annual phenological variability would be of benefit for models simulating the global carbon and water cycle. It however requires bridging the gap between knowledge available from local-scale phenological observations and their application in global-scale models [Cleland et al., 2007]. It also requires taking advantage of the wealth of data contained in diagnostic phenology data sets and applying them to reduce uncertainty in prognostic phenology models.

[15] This study explores whether it is possible to constrain uncertainty in model parameters by assimilating MODIS FPAR and LAI into the GSI phenology model. In the next section both the data assimilation model and the modifications to the GSI model are presented. Data assimilation and model experiments are carried out at local and regional scale covering a wide range of ecosystem types and climate zones (Table 1). This strategy is computationally efficient and can reveal advantages and deficiencies of the employed methodology prior to its application in a global scale experiment. It further allows for validation with ground observations which are only available at local scale. The results section firstly demonstrates the potential of data assimilation to constrain model parameters. Seasonal FPAR and LAI predictions using satellite-constrained parameters are then compared to model simulations with original parameters. Modeled inter-annual SOS and EOS variability is finally validated against three independent ground phenology data sets.

Table 1. Tower Sites and Regional Areas Used in This Study^a

No. | Site | Lon [°E] | Lat [°N] | Altitude [m] | Biome Type | Years | Climate

CarboEurope Sites (Europe)
1 | Vielsalm [Aubinet et al., 2001] | 6.00 | 50.30 | 450 | MF | 2000–2005 | Temperate
2 | Tharandt [Grunwald and Bernhofer, 2007] | 13.57 | 50.96 | 380 | ENF | 2000–2003 | Temperate
3 | Castel Porziano [Valentini, 2003] | 12.38 | 41.71 | 68 | EBF | 2000–2005 | Mediterranean
4 | Collelongo [Valentini, 2003] | 13.59 | 41.85 | 1550 | DBF | 2000–2003 | Mediterranean
5 | Kaamanen [Laurila et al., 2001] | 27.30 | 69.14 | 155 | TUN | 2000–2005 | North boreal
6 | Hyytiälä [Suni et al., 2003] | 24.29 | 61.85 | 181 | ENF | 2000–2005 | Boreal
7 | El Saler [Ciais et al., 2005] | −0.32 | 39.35 | 10 | ENF | 2000–2005 | Mediterranean
8 | Puechabon [Rambal et al., 2004] | 3.60 | 43.74 | 270 | DBF | 2001–2005 | Temperate
9 | Sarrebourg [Granier et al., 2000] | 7.06 | 48.67 | 300 | DBF | 2000–2005 | Temperate

LBA Sites (Brazil)
10 | Santarem KM83 [Goulden et al., 2004] | −54.97 | −3.02 | 130 | EBF | 2001–2003 | Tropical
11 | Tapajos KM67 [Hutyra et al., 2007] | −54.96 | −2.86 | 130 | EBF | 2002–2005 | Tropical

AmeriFlux Sites (USA)
12 | Morgan Monroe [Schmid et al., 2000] | −86.41 | 39.32 | 275 | DBF | 2000–2006 | Temperate
13 | Boreas OBS [Dunn et al., 2007] | −98.48 | 55.88 | 259 | ENF | 2000–2005 | Boreal
14 | Lethbridge [Flanagan et al., 2002] | −112.94 | 49.71 | 960 | GRA | 2000–2004 | Boreal
15 | Fort Peck [Gilmanov et al., 2005] | −105.10 | 48.31 | 634 | GRA | 2000–2005 | Temperate
16 | Harvard Forest [Urbanski et al., 2007] | −72.17 | 42.54 | 303 | DBF | 1990–2006 | Temperate
17 | Niwot Ridge [Monson et al., 2002] | −105.55 | 40.03 | 3050 | ENF | 2000–2004 | Sub-alpine
18 | Wind River [Paw U et al., 2004] | −121.95 | 45.82 | 371 | ENF | 2000–2004 | Temperate
19 | Bondville [Meyers and Hollinger, 2004] | −88.29 | 40.01 | 213 | CRO | 2000–2005 | Temperate
20 | Willow Creek [Bolstad et al., 2004] | −90.08 | 45.81 | 520 | DBF | 2000–2005 | Temperate
21 | Tonzi Ranch [Baldocchi et al., 2004] | −120.97 | 38.43 | 177 | SAV | 2002–2005 | Mediterranean
22 | Vaira Ranch [Baldocchi et al., 2004] | −120.95 | 38.41 | 129 | GRA | 2002–2005 | Mediterranean

Regional Area
23 | Swiss Lowl [Rutishauser et al., 2007] | 8.25 | 47.25 | 600 | MF | 1958–2006 | Temperate

^a Mixed forest (MF), evergreen needleleaf forest (ENF), deciduous broadleaf forest (DBF), tundra (TUN), evergreen broadleaf forest (EBF), grasslands (GRA), savanna (SAV), croplands (CRO).

2. Methods

2.1. Models

[16] This study integrates a process model and a data assimilation model. The prognostic phenology model predicts FPAR and LAI and is driven by meteorological predictor data. The data assimilation model then updates ensembles of predicted model states and empirical model parameters with information contained in MODIS FPAR and LAI observations.

2.1.1. Prognostic Phenology Model

[17] The GSI (Growing Season Index) by Jolly et al. [2005] serves as the foundation for the prognostic phenology model. It is simple and includes the three main climatic controls of seasonal phenological processes: minimum daily temperature Tm (K), mean daily global radiation Rg (W m−2) and mean daily vapor pressure deficit vpd (mb). These variables are readily available from local micrometeorological measurements and from climate reanalysis data sets. Note that Jolly et al. [2005] used photoperiod instead of Rg as the light-controlling variable. We use Rg because, unlike photoperiod, it varies from year to year and therefore responds, for example, to clouds or aerosols. GSI (−) is simply the product of three factors $f(\bar{T}_m)$, $f(\bar{R}_g)$ and $f(\overline{vpd})$:

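$$\mathrm{GSI} = f(\bar{T}_m)\, f(\bar{R}_g)\, f(\overline{vpd}) \tag{1}$$
$$f(\bar{T}_m) = \min\left[\max\left(\frac{\bar{T}_m - T_{m,\min}}{T_{m,\max} - T_{m,\min}},\ 0\right),\ 1\right] \tag{2}$$
$$f(\bar{R}_g) = \min\left[\max\left(\frac{\bar{R}_g - R_{g,\min}}{R_{g,\max} - R_{g,\min}},\ 0\right),\ 1\right] \tag{3}$$
$$f(\overline{vpd}) = 1 - \min\left[\max\left(\frac{\overline{vpd} - vpd_{\min}}{vpd_{\max} - vpd_{\min}},\ 0\right),\ 1\right] \tag{4}$$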

where the empirical climate parameters $T_{m,\max}$, $T_{m,\min}$, $R_{g,\max}$, $R_{g,\min}$, $vpd_{\max}$ and $vpd_{\min}$ are the upper and lower limits of the Tm, Rg and vpd ranges. The factors $f(\bar{T}_m)$, $f(\bar{R}_g)$ and $f(\overline{vpd})$ vary linearly between the constraining limits of 0 and 1 and thus regulate vegetation activity. $\bar{T}_m$, $\bar{R}_g$ and $\overline{vpd}$ are multi-day running means with averaging times $t_{ave}(T_m)$, $t_{ave}(R_g)$ and $t_{ave}(vpd)$ (days). Note that Jolly et al. [2005] use a 21-day running mean of GSI calculated from daily mean meteorological variables, whereas here running means of the meteorological variables are used to calculate GSI. Our aim is to define a separate, optimal averaging time for each climate variable. GSI can be interpreted as the potential phenological state under current meteorological conditions. It is extended into a prognostic phenology model by the following equations, which define a true prognostic phenological state P (−) that can be related to the biophysical state variables FPAR (−) and LAI (m2 m−2) by use of Beer's law [Sellers et al., 1996]:

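$$P = \frac{\mathrm{FPAR} - \mathrm{FPAR}_{\min}}{\mathrm{FPAR}_{\max} - \mathrm{FPAR}_{\min}} \tag{5}$$
$$\mathrm{LAI} = f_v\, \mathrm{LAI}_{\max}\, \frac{\ln\left(1 - \mathrm{FPAR}/f_v\right)}{\ln\left(1 - \mathrm{FPAR}_{sat}\right)} \tag{6}$$
$$\mathrm{FPAR}_{\max} = f_v\, \mathrm{FPAR}_{sat} \tag{7}$$
$$\Delta\mathrm{GSI} = \mathrm{GSI} - P \tag{8}$$
$$\frac{\partial\, \mathrm{FPAR}}{\partial t} = \gamma\, \Delta\mathrm{GSI}\, P\,(1 - P) \tag{9}$$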

where P is derived from FPAR by linear scaling with the structural vegetation parameters FPARmin and FPARmax. These are the minimum and maximum FPAR, corresponding respectively to the FPAR of the evergreen vegetation fraction and to the FPAR of the fully developed deciduous and evergreen vegetation. Both parameters are static, but they have high spatial variability. They are needed to make the phenology model compatible with satellite observations, as demonstrated in the introduction section. FPARmax is further closely related to the fraction of vegetation cover fv (−) by FPARmax = fv · FPARsat. ΔGSI is the growth vector (−). It is positive (or negative) when the potential phenological state (GSI) based on current meteorological conditions is above (or below) the current prognostic phenological state P. γ (day−1) is the maximum growth rate, and P(1 − P) is a logistic growth function that constrains growth at low and high phenological states. The latter is needed to provide a stable numerical solution and to prevent the model from unrealistically switching leaves on or off when meteorological conditions change rapidly. In a mechanistic model of plant physiology and phenology, both γ and P(1 − P) would be handled by modeled biochemical cycling rates based on nutrient availability and meteorological conditions. FPARsat = 0.98 (−) is the FPAR value at the maximum Leaf Area Index LAImax (m2 m−2).
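To make the relationships in equations (1)–(7) more concrete, the following Python sketch computes GSI from running-mean predictors and converts between FPAR, P and LAI. It is an illustration under stated assumptions, not the authors' implementation: the climate factors are linear ramps clipped to [0, 1] (with the vpd factor decreasing as vpd increases, following Jolly et al. [2005]), LAI is assumed to follow the Beer's-law form LAI = fv · LAImax · ln(1 − FPAR/fv) / ln(1 − FPARsat), and all function names and parameter keys are hypothetical.

```python
import math

FPAR_SAT = 0.98  # FPAR at maximum LAI, as stated in the text


def running_mean(values, window):
    """Trailing multi-day running mean; the first entries average fewer days."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def ramp(x, lo, hi):
    """Linear factor: 0 at or below lo, 1 at or above hi (assumes hi > lo)."""
    return min(max((x - lo) / (hi - lo), 0.0), 1.0)


def gsi(tm_bar, rg_bar, vpd_bar, p):
    """Growing Season Index as the product of three climate factors."""
    f_t = ramp(tm_bar, p["tm_min"], p["tm_max"])
    f_r = ramp(rg_bar, p["rg_min"], p["rg_max"])
    f_v = 1.0 - ramp(vpd_bar, p["vpd_min"], p["vpd_max"])  # dry air suppresses activity
    return f_t * f_r * f_v


def state_from_fpar(fpar, fpar_min, fpar_max):
    """Phenological state P, linearly scaled between FPARmin and FPARmax."""
    return (fpar - fpar_min) / (fpar_max - fpar_min)


def lai_from_fpar(fpar, fv, lai_max):
    """Assumed Beer's-law conversion with vegetation cover fraction fv."""
    return fv * lai_max * math.log(1.0 - fpar / fv) / math.log(1.0 - FPAR_SAT)


def fpar_from_lai(lai, fv, lai_max):
    """Inverse of lai_from_fpar."""
    return fv * (1.0 - (1.0 - FPAR_SAT) ** (lai / (fv * lai_max)))
```

With this assumed form, FPAR = FPARmax = fv · FPARsat maps to LAI = fv · LAImax, which matches the upper LAI bound used in the numerical constraints of [20].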

[18] A semi-implicit numerical scheme is used to integrate the above equations forward in time. Leaf growth ΔFPAR/Δt depends on the new meteorological conditions, through $\mathrm{GSI}_{t+1}$, and on the previous biophysical state $\mathrm{FPAR}_t$:

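$$P_t = \frac{\mathrm{FPAR}_t - \mathrm{FPAR}_{\min}}{\mathrm{FPAR}_{\max} - \mathrm{FPAR}_{\min}} \tag{10}$$
$$\Delta\mathrm{GSI}_{t+1} = \mathrm{GSI}_{t+1} - P_t \tag{11}$$
$$\frac{\Delta\mathrm{FPAR}}{\Delta t} = \gamma\, \Delta\mathrm{GSI}_{t+1}\, P_t\,(1 - P_t) \tag{12}$$
$$\mathrm{FPAR}_{t+1} = \mathrm{FPAR}_t + \Delta t\, \frac{\Delta\mathrm{FPAR}}{\Delta t} \tag{13}$$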

Since the data assimilation model (see below) updates FPAR and LAI, both states need to be prognostic. They are thus integrated separately in time. $\mathrm{FPAR}^L$, $P^L$ and $\Delta\mathrm{GSI}^L$ are derived from LAI at every time step:

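$$\mathrm{FPAR}^L_t = f_v \left[1 - \left(1 - \mathrm{FPAR}_{sat}\right)^{\mathrm{LAI}_t / (f_v\, \mathrm{LAI}_{\max})}\right] \tag{14}$$
$$P^L_t = \frac{\mathrm{FPAR}^L_t - \mathrm{FPAR}_{\min}}{\mathrm{FPAR}_{\max} - \mathrm{FPAR}_{\min}} \tag{15}$$
$$\Delta\mathrm{GSI}^L_{t+1} = \mathrm{GSI}_{t+1} - P^L_t \tag{16}$$
$$\frac{\Delta\mathrm{FPAR}^L}{\Delta t} = \gamma\, \Delta\mathrm{GSI}^L_{t+1}\, P^L_t\,(1 - P^L_t) \tag{17}$$
$$\mathrm{FPAR}^L_{t+1} = \mathrm{FPAR}^L_t + \Delta t\, \frac{\Delta\mathrm{FPAR}^L}{\Delta t} \tag{18}$$
$$\mathrm{LAI}_{t+1} = f_v\, \mathrm{LAI}_{\max}\, \frac{\ln\left(1 - \mathrm{FPAR}^L_{t+1}/f_v\right)}{\ln\left(1 - \mathrm{FPAR}_{sat}\right)} \tag{19}$$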

[19] Equations (14)–(18) can be neglected by setting $\mathrm{FPAR}^L_{t+1} = \mathrm{FPAR}_{t+1}$ in equation (19) if only a prognostic FPAR with a diagnostic LAI is needed, for example, if the model is used without data assimilation or if there are no LAI observations.

[20] The following numerical constraint is needed: P(1 − P) = max(P(1 − P), 0.01). During data assimilation (see below), model parameters and states are stochastically perturbed at the initial time, and their ensemble is continuously updated thereafter. Since this can result in unphysical parameters or violate the validity of the above equations, further numerical constraints were implemented in the model, although they are not needed for operating the model in prognostic mode: Tmmax = max(Tmmin + 1, Tmmax), vpdmax = max(vpdmin + 1, vpdmax), Rgmax = max(Rgmin + 1, Rgmax), FPARmin = min(FPARmin, FPARmax − 0.001), γ = min(max(γ, 0.01), 5), LAI = max(min(LAImax · fv, LAI), 0) and FPAR = min(max(FPARmin, FPAR), FPARmax).
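The time stepping of [18]–[20] can be sketched in Python as below. This is a hedged illustration, not the authors' code: it assumes the growth law ∂FPAR/∂t = γ (GSI − P) P(1 − P), the Beer's-law form LAI = fv · LAImax · ln(1 − FPAR/fv) / ln(1 − FPARsat) with FPARmax = fv · FPARsat, and it applies the FPAR bounds of [20] to the LAI branch as well, which is an additional assumption. Function names and parameter keys are hypothetical.

```python
import math

FPAR_SAT = 0.98


def clamp(x, lo, hi):
    return min(max(x, lo), hi)


def logistic(p):
    """Logistic growth factor with the numerical floor P(1 - P) >= 0.01."""
    return max(p * (1.0 - p), 0.01)


def step(fpar, lai, gsi_next, theta, dt=1.0):
    """Advance FPAR and LAI from t to t+1 using GSI at t+1 (semi-implicit)."""
    fv, lai_max = theta["fv"], theta["lai_max"]
    fpar_max = fv * FPAR_SAT
    fpar_min = min(theta["fpar_min"], fpar_max - 0.001)
    gamma = clamp(theta["gamma"], 0.01, 5.0)

    def grow(f):
        # New GSI, old state: growth is driven by (GSI_{t+1} - P_t)
        p = (f - fpar_min) / (fpar_max - fpar_min)
        new = f + dt * gamma * (gsi_next - p) * logistic(p)
        return clamp(new, fpar_min, fpar_max)

    fpar_new = grow(fpar)
    # LAI branch: map LAI to an equivalent FPAR, grow it, then map back
    fpar_l = fv * (1.0 - (1.0 - FPAR_SAT) ** (lai / (fv * lai_max)))
    fpar_l_new = grow(fpar_l)
    lai_new = fv * lai_max * math.log(1.0 - fpar_l_new / fv) / math.log(1.0 - FPAR_SAT)
    return fpar_new, clamp(lai_new, 0.0, lai_max * fv)
```

Because both branches share the same growth function, FPAR and LAI stay mutually consistent until the data assimilation updates them independently, which is why the text integrates them separately.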

2.1.2. Data Assimilation Model

[21] The Ensemble Kalman Filter (EnKF) after Evensen [1994, 2003] is a sequential data assimilation model. It is applied in this study with modifications for joint state and parameter estimation following Moradkhani et al. [2005]. EnKF is a Monte Carlo-based algorithm that propagates N a-priori model ensemble forecasts forward in time and creates a posterior model state and parameter ensemble by the analysis of both the observation and model uncertainty,

$$A^a = A^f + K\left(D - H A^f\right) \tag{20}$$

where $A^f$ is the ensemble matrix containing the predicted model states (from equations (13) and (19)) and model parameters. They are updated to $A^a$ when new observations $D$ become available. $H$ is the operator relating observed to model states, and $K$ is the Kalman gain (for details see Evensen [2003]). $A$ is a matrix holding $N$ ensemble members of the vector $\psi$ with $n$ states and parameters. $D$ is the matrix holding $N$ ensemble members of the vector $d$ with $m$ observations:

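$$A = \left(\psi_1, \psi_2, \ldots, \psi_N\right) \in \mathbb{R}^{n \times N} \tag{21}$$
$$D = \left(d_1, d_2, \ldots, d_N\right) \in \mathbb{R}^{m \times N} \tag{22}$$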

[22] The state and parameter ensemble members $\psi_i^0$, with $i = 1 \ldots N$, are initialized at the beginning of the model integration by defining an uncertainty vector $\omega$. This vector has a Gaussian distribution with mean 0 and initial variance $V_{\psi_0}$ and is added to the initial states $x_0$ and parameters $\theta_0$:

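$$\psi_i^0 = \begin{pmatrix} x_0 \\ \theta_0 \end{pmatrix} + \omega_i, \qquad \omega_i \sim \mathcal{N}\left(0, V_{\psi_0}\right) \tag{23}$$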

[23] Similarly, the observation ensemble members $d_i$, with $i = 1 \ldots N$, are built each time observations become available. $\varepsilon$ is the observation uncertainty vector, which has a Gaussian distribution with mean 0 and observation variance $V_d$:

$$d_i = d + \varepsilon_i, \qquad \varepsilon_i \sim \mathcal{N}\left(0, V_d\right) \tag{24}$$

[24] The elements of $\psi$, their initial values $\psi_0$ and the initial variances $V_{\psi_0}$ used in this study are defined in Table 2. The initial ensemble needs to be chosen carefully in order to apply the EnKF analysis successfully. Choosing unrealistic initial parameters or too small initial variances can result in over-dispersion of parameter ensembles and run-away effects during data analysis. We started from climate control parameters similar to those given by Jolly et al. [2005] and chose parameter variances encompassing the physical range of a global climate (for example, 50 K for temperature control parameters). The outcome of the analysis also depends strongly on the ensemble size N [Evensen, 2003]. For n = 15 we found that N > 1000 gave sufficiently stable assimilation results, so N = 2000 was chosen.
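Equations (23) and (24) amount to drawing Gaussian perturbations around a mean vector. A minimal Python sketch follows; it assumes independent perturbations for each element (diagonal variances, so the standard deviation is the square root of each variance) and stores one list per ensemble member, that is, one column of A or D. The function names are hypothetical.

```python
import random


def init_ensemble(psi0, var0, n_members, rng):
    """Draw members psi_i = psi_0 + omega_i with omega_i ~ N(0, diag(var0))."""
    return [[mu + rng.gauss(0.0, v ** 0.5) for mu, v in zip(psi0, var0)]
            for _ in range(n_members)]


def perturb_observations(d, var_d, n_members, rng):
    """Draw observation members d_i = d + eps_i with eps_i ~ N(0, diag(var_d))."""
    return [[obs + rng.gauss(0.0, v ** 0.5) for obs, v in zip(d, var_d)]
            for _ in range(n_members)]
```

Calling init_ensemble(psi0, var0, 2000, random.Random(seed)) with a 15-element psi0 would produce an ensemble of the size used in this study: 2000 members, each a 15-element vector.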

Table 2. State and Parameter Vector ψ, Initial Values ψ0 and Initial Variances Vψ0

No. | Element of ψ | ψ0 | Vψ0 | Unit
States x
2 | LAI | 2.5 | 5 | m2 m−2
Parameters θ
5 | Rgmax | 200 | 500 | W m−2
6 | Rgmin | 100 | 500 | W m−2
10 | LAImax | 2.5 | 2 | m2 m−2

[25] Estimating hidden parameters by EnKF-based joint state and parameter estimation can result in over-shrinkage of covariances. When parameter ensembles become too narrow, observations have progressively smaller impact [Anderson and Anderson, 1999; Aksoy et al., 2006]. In our initial experiments, vegetation structural parameters (for example, FPARmin) in particular converged more rapidly than climate control parameters (for example, Tmmin), resulting in an overly constrained ensemble matrix that cannot respond to new observations. The following “kernel perturbation” routine was developed (similar to the inflation factor of Anderson [2001]) to keep the ensemble variance above a predefined threshold:

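$$\alpha_j = \max\left(1,\ \sqrt{\frac{\beta\, V_{\psi_0, j}}{\mathrm{Var}\left(\psi_j\right)}}\right) \tag{25}$$
$$\psi_i \leftarrow \bar{\psi} + \alpha \circ \left(\psi_i - \bar{\psi}\right) \tag{26}$$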

where $\bar{\psi}$ is a vector containing the ensemble mean of each element of $\psi$, and $\alpha$ is a vector of scaling factors that keep the ensemble variances of $\psi$ above $\beta = 0.0001$ times the initial ensemble variances of $\psi$. This formulation does not modify the shape of the ensemble distribution, but only scales its variance.
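A possible Python sketch of the kernel perturbation follows. It assumes population variances, inflates only elements whose variance has fallen below β times the initial variance, and leaves fully collapsed (zero-variance) elements unchanged because scaling cannot restore them. These choices and the function name are assumptions, not details given in the text.

```python
import statistics


def kernel_perturbation(members, var0, beta=1e-4):
    """Keep each element's ensemble variance at or above beta * var0.

    members: list of N ensemble members, each a list of the n elements of psi.
    Deviations from the ensemble mean are multiplied by alpha_j >= 1, which
    rescales the variance without changing the mean or the shape of the spread.
    """
    out = [list(m) for m in members]
    for j, v0 in enumerate(var0):
        column = [m[j] for m in members]
        mean = statistics.fmean(column)
        var = statistics.pvariance(column, mu=mean)
        target = beta * v0
        if 0.0 < var < target:
            alpha = (target / var) ** 0.5  # variance scales with alpha squared
            for m in out:
                m[j] = mean + alpha * (m[j] - mean)
    return out
```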

2.1.3. Combining the Prognostic Phenology and the Data Assimilation Model

[26] At the beginning of each data assimilation experiment, phenological state and parameter ensembles are initialized after equation (23) with the initial values and variances defined in Table 2. The prognostic phenology model is then integrated forward in time. At each time step, the prognostic phenology model forecasts ensembles of states $x_{t+1} = [\mathrm{FPAR}_{t+1}, \mathrm{LAI}_{t+1}]$ from the previous states $x_t$ and parameters $\theta$, driven by the averaged meteorological predictors $\bar{T}_{m,t+1}$, $\bar{R}_{g,t+1}$ and $\overline{vpd}_{t+1}$. This is like running N separate instances of the phenology model, each with its own parameter set $\theta_i$ (i = 1…N). Only at times when there are two or more observations (for example, m = 2 for one FPAR and one LAI observation, but m ≤ 98 for the 7 × 7 km areas specified below) is the EnKF-based data assimilation model described above applied, as follows:

[27] 1. D is calculated after equations (22) and (24) with mean values and variances as defined in the data section (see below).

[28] 2. $A^f$ is defined after equation (21) from the forecasted state and parameter ensembles.

[29] 3. $HA^f$ is calculated as an m × N matrix containing, for each FPAR and LAI observation, the modeled $\mathrm{FPAR}_{t+1}$ or $\mathrm{LAI}_{t+1}$ of every ensemble member.

[30] 4. $A^a$ is calculated using the square root implementation of the EnKF scheme presented by Evensen [2004].

[31] 5. Kernel perturbation after equations (25) and (26) is applied to $A^a$.

[32] $A^a$ now contains the analyzed state and parameter ensembles, updated with information gained from the assimilated observations. The analyzed states and parameters are then used for the next prognostic phenology model forecast.

2.2. Data

2.2.1. Meteorological Predictor Data

[33] The prognostic phenology model needs daily minimum temperature Tm, daily mean global radiation Rg and daily mean vapor pressure deficit vpd as meteorological predictor data. For local-scale simulations at the CarboEurope, Large-Scale Biosphere-Atmosphere Experiment in Amazonia (LBA) and AmeriFlux flux tower sites (1–22, Table 1), these data are calculated from 30-min and 60-min gap-filled flux tower measurements [Stöckli et al., 2008]. The regional-scale simulation (23, Table 1) uses 1° × 1° gridded ECMWF ERA-40 (1958–2001) and Analysis/Forecast (2002–2006) data at 6-hourly intervals [Uppala et al., 2005].

[34] For data assimilation the ensemble members of Tm, Rg and vpd are stochastically perturbed at each time step with a variance of 0.5 K, 5 W m−2 and 0.5 mb, respectively.
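The preprocessing in [33] and the predictor perturbation in [34] can be sketched as follows. The sketch assumes sub-daily records that are already gap filled and in model units (K, W m−2, mb), treats the stated perturbation magnitudes as variances (standard deviation = square root of the variance), and uses hypothetical function names.

```python
import random
import statistics
from collections import defaultdict


def daily_predictors(records):
    """Aggregate sub-daily tower records into daily GSI predictors.

    records: iterable of (day, air_temperature_K, global_radiation_W_m2, vpd_mb).
    Returns {day: (Tm, Rg, vpd)}: Tm is the daily minimum, Rg and vpd daily means.
    """
    by_day = defaultdict(list)
    for day, t, rg, vpd in records:
        by_day[day].append((t, rg, vpd))
    return {day: (min(r[0] for r in rows),
                  statistics.fmean(r[1] for r in rows),
                  statistics.fmean(r[2] for r in rows))
            for day, rows in sorted(by_day.items())}


def perturb_predictors(tm, rg, vpd, rng):
    """Add Gaussian noise with variances 0.5 K, 5 W m-2 and 0.5 mb."""
    return (tm + rng.gauss(0.0, 0.5 ** 0.5),
            rg + rng.gauss(0.0, 5.0 ** 0.5),
            vpd + rng.gauss(0.0, 0.5 ** 0.5))
```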

2.2.2. Assimilation Satellite Data

[35] TERRA MODIS 1 km/8 day FPAR and LAI data (MOD15A2, Collection 4) are used to fill the observation matrix D. Tower-centered 7 km × 7 km MODIS ASCII data sets (Oak Ridge National Laboratory MODIS Land Subsets) are used for experiments 1–22, while 0.25° × 0.25° areas derived from standard MODIS tiles are used for the regional experiment 23. Data are screened by land cover class (Table 1) using the TERRA MODIS land cover product (MOD12Q1, Collection 4, year 2004). MOD15A2 data are quality screened and discarded if they have a fill value or a value outside the valid range, or if any of the following quality flag bits are set: (1) FparLai bit 1 (not produced: cloud or other reasons); (2) FparLai bit 2 (dead detectors); (3) FparLai bits 3 or 4 (clouds present or unclear); (4) FparLai bit 7 (could not retrieve pixel); (5) FparExtra bit 2 (snow or ice); (6) FparExtra bit 5 (internal cloud mask); and (7) FparExtra bit 6 (cloud shadow).
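The screening rule can be sketched as a bit test on the two quality-control words. The bit positions are copied from the list above and assumed to be counted from the least significant bit; the fill value and valid range are passed in because they are not specified here, and the function names are hypothetical.

```python
# Quality-flag bits that cause an observation to be discarded (from the text)
REJECT_FPARLAI_BITS = (1, 2, 3, 4, 7)
REJECT_EXTRA_BITS = (2, 5, 6)


def is_set(flags, bit):
    """True if `bit` (0 = least significant) is set in the integer `flags`."""
    return ((flags >> bit) & 1) == 1


def keep_observation(value, fparlai_qc, extra_qc, valid_range, fill_value):
    """Return True if a MOD15A2 value passes the screening described above."""
    lo, hi = valid_range
    if value == fill_value or not (lo <= value <= hi):
        return False
    if any(is_set(fparlai_qc, b) for b in REJECT_FPARLAI_BITS):
        return False
    return not any(is_set(extra_qc, b) for b in REJECT_EXTRA_BITS)
```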

[36] The observation vector d contains the m remaining observations. The ensemble observation matrix D is created by stochastically perturbing each element of d with an uncertainty measure Vd (equation (24)) based on the remaining MOD15A2 quality flag bits. The minimum uncertainty Vd of each observation is defined as 10% of the maximum observation range (1 for FPAR and 10 for LAI). For each of the following MOD15A2 quality flag bits that is set, 25% of the maximum observation range multiplied by a “severity factor” s is added to the minimum Vd: (1) FparLai bit 0 (produced but not the best), s = 4; (2) FparLai bit 6 (main RT is saturated), s = 3; (3) FparLai bit 7 (empirical method used), s = 3; (4) FparExtra bit 3 (aerosol is average or high), s = 8; and (5) FparExtra bit 4 (cirrus clouds detected), s = 7.
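A hedged sketch of this uncertainty rule, using the minimum uncertainty, the 25% increments and the severity factors listed above, is given below. Bit positions are assumed to be counted from the least significant bit, and the names are hypothetical.

```python
# Maximum observation range used to scale the uncertainty (from the text)
OBS_RANGE = {"FPAR": 1.0, "LAI": 10.0}
# Quality bit -> severity factor s (bit positions as listed in the text)
SEVERITY_FPARLAI = {0: 4, 6: 3, 7: 3}
SEVERITY_EXTRA = {3: 8, 4: 7}


def observation_uncertainty(kind, fparlai_qc, extra_qc):
    """Uncertainty measure Vd for one FPAR or LAI observation."""
    obs_range = OBS_RANGE[kind]
    vd = 0.10 * obs_range  # minimum uncertainty: 10% of the range
    for qc, table in ((fparlai_qc, SEVERITY_FPARLAI), (extra_qc, SEVERITY_EXTRA)):
        for bit, severity in table.items():
            if (qc >> bit) & 1:
                vd += 0.25 * obs_range * severity
    return vd
```

For example, an FPAR observation flagged only with FparLai bit 0 receives Vd = 0.1 + 0.25 × 1 × 4 = 1.1.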

[37] Pixel-to-pixel variability provides a further uncertainty measure and effectively perturbs the observation ensemble: for example, it is high when land cover is not homogeneous or when sub-pixel clouds (not captured by the MOD15A2 quality flag bits) are present. Characterizing observation uncertainty in this way acts as a weighting tool in the data assimilation procedure, so that low-quality observations receive little weight and mainly high-quality observations constrain the model.

2.2.3. Validation Phenology Data

[38] Independent validation data are obtained from ground measurements of biophysical state variables, from site-based phenological observations and from historical phenological reconstructions. Such data sets are sparse and do not cover the full climatic range. Validation is complicated by the fact that modeled biophysical states are not directly comparable with traditional phenological metrics, so transfer functions need to be developed in order to perform comparative studies [Studer et al., 2007]. To bridge this gap, SOS and EOS are determined from the model as the time of half greenness, that is, the day in each year when LAI crosses a 50% threshold between the maximum and the minimum LAI, both determined from the whole time series. For each of the phenological validation data sets, SOS and EOS are calculated as follows:
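The half-greenness rule can be sketched as follows, assuming one growing season per calendar year (as for the temperate sites) and taking SOS and EOS as the first and last days on which LAI is at or above the threshold. The function name and input layout are hypothetical.

```python
def half_greenness(series):
    """series: list of (year, doy, lai) daily tuples covering several years.

    The threshold lies halfway between the minimum and maximum LAI of the whole
    series; SOS/EOS are the first/last days of each year with LAI >= threshold
    (assumes a single growing season within each calendar year).
    """
    values = [lai for _, _, lai in series]
    lo, hi = min(values), max(values)
    threshold = lo + 0.5 * (hi - lo)
    days_above = {}
    for year, doy, lai in series:
        if lai >= threshold:
            days_above.setdefault(year, []).append(doy)
    return {year: (min(d), max(d)) for year, d in sorted(days_above.items())}
```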

[39] 1. Validation data for Morgan Monroe (12) consist of ground-based LAI measurements made with LI-COR LAI-2000 instruments [Jonkheere et al., 2004] for the period 2000–2006. LAI is estimated as the mean of measurements along 3 main transects. Dominant tree species in the transects and in the vicinity of the tower are sugar maple (Acer saccharum), tulip poplar (Liriodendron tulipifera), sassafras (Sassafras albidum), white oak (Quercus alba) and black oak (Quercus nigra). SOS and EOS are determined from the observed LAI in the same way as for modeled LAI (see above). Uncertainty in SOS and EOS was defined as the length of the measurement interval at the time of threshold crossing (which can vary between a few days and several weeks). SOS and EOS were also derived from Net Ecosystem Exchange (NEE) measured at Morgan Monroe: SOS is the day when daily NEE consistently becomes negative in spring, and EOS is the day when daily NEE consistently becomes positive in autumn.
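The text does not define how long NEE must stay negative or positive to count as “consistently”. The sketch below assumes a hypothetical run of 5 consecutive days and returns the first day of the first such run; both the run length and the function name are assumptions.

```python
def nee_season(doys, nee, run=5):
    """Return (SOS, EOS) day of year from one year of daily NEE.

    SOS: first day of the first run of `run` consecutive days with NEE < 0.
    EOS: first day of the first run of `run` consecutive days with NEE > 0
    after SOS. `run` is a hypothetical choice for "consistently".
    """
    def first_run(start, predicate):
        count = 0
        for idx in range(start, len(nee)):
            count = count + 1 if predicate(nee[idx]) else 0
            if count == run:
                return idx - run + 1
        return None

    i_sos = first_run(0, lambda x: x < 0.0)
    if i_sos is None:
        return None, None
    i_eos = first_run(i_sos, lambda x: x > 0.0)
    return doys[i_sos], (doys[i_eos] if i_eos is not None else None)
```

A short negative spell (for example, three warm days in early spring) does not trigger SOS under this rule, which is the kind of noise the word “consistently” is meant to exclude.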

[40] 2. Validation data for Harvard Forest (16) consist of a 17-year-long set of observations of spring and autumn phenological phases for the period 1990–2006, covering three dominant tree species located within 1.5 km of the tower site. The species used in this study are red oak (Quercus rubra), white oak (Quercus alba) and red maple (Acer rubrum). Only dominant species are used, in order to be consistent with the satellite view, and temporal consistency is guaranteed: for each year and each observed phenological phase, there is at least one observed individual of each species. SOS is calculated as the mean of the spring phases bud-burst and 75% leaf development. EOS is calculated as the mean of the autumn phases leaf coloring and leaf fall. The date of each phase is the day of year when 50% of all individuals of a species have reached it. Uncertainty in SOS and EOS is calculated as the standard deviation of SOS and EOS between individuals of a species and is averaged among species.

[41] 3. Validation data for the Swiss Lowland (23) consist of a 1958–2006 SOS time series derived from a statistical “Spring Plant” by Rutishauser et al. [2007]. The “Spring Plant” is defined as the weighted annual mean day of year of beech bud burst (Fagus sylvatica) and full flowering of cherry (Prunus avium) and apple (Malus domestica) trees. In Switzerland, these phases occurred on average on 28 April, 23 April, and 7 May, respectively, during the 1951–2006 period. “Spring Plant” dates are based on 26–73 observations per year from 8 to 23 reporting stations of the Swiss Phenological Network (SPN) [Defila and Clot, 2001]. Accounting for observation variability, uncertainties and station bias, decadal uncertainties correspond to a standard deviation of ±5 days.
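The date-extraction rules listed above can be sketched in Python. This is a minimal illustration under stated assumptions, not the code used in the study: daily values are stored per year with index 0 = day of year 1, crossings are reported as whole days without interpolation, "consistently" for NEE is interpreted as a hypothetical five-day persistence window (with negative NEE meaning net uptake), and the per-species dates are combined into one site value by a simple mean, which the text does not specify. All function names are hypothetical.

```python
import math
import statistics


def half_greenness_dates(series, fraction=0.5):
    """{year: daily LAI list} -> {year: (sos_doy, eos_doy)}; index 0 = DOY 1."""
    all_values = [v for values in series.values() for v in values]
    lai_min, lai_max = min(all_values), max(all_values)
    # threshold from the min and max of the whole record, not of each year
    threshold = lai_min + fraction * (lai_max - lai_min)
    dates = {}
    for year, values in series.items():
        sos = eos = None
        for i in range(1, len(values)):
            if sos is None and values[i - 1] < threshold <= values[i]:
                sos = i + 1  # first day at or above the threshold
            if values[i - 1] >= threshold > values[i]:
                eos = i + 1  # first day below it, at the last downward crossing
        dates[year] = (sos, eos)
    return dates


def nee_sos_eos(nee, window=5):
    """One year of daily NEE (negative = net uptake) -> (sos_doy, eos_doy).

    `window` is a hypothetical persistence criterion standing in for 'consistently'.
    """
    n = len(nee)
    sos = next((i + 1 for i in range(n - window + 1)
                if all(v < 0 for v in nee[i:i + window])), None)
    if sos is None:
        return None, None
    # search for the autumn sign change only after SOS
    eos = next((i + 1 for i in range(sos, n - window + 1)
                if all(v > 0 for v in nee[i:i + window])), None)
    return sos, eos


def day_half_reached(doys):
    """Day of year by which at least 50% of the individuals reached a phase."""
    ordered = sorted(doys)
    return ordered[math.ceil(0.5 * len(ordered)) - 1]


def species_phase_date(phases):
    """{phase: per-individual DOYs, same individuals in the same order}
    -> (mean of the per-phase 50% dates, std of per-individual mean dates)."""
    date = statistics.fmean(day_half_reached(d) for d in phases.values())
    individual = [statistics.fmean(v) for v in zip(*phases.values())]
    spread = statistics.stdev(individual) if len(individual) > 1 else 0.0
    return date, spread


def site_phase_date(species):
    """Mean of species dates (assumption) and mean of per-species spreads."""
    results = [species_phase_date(p) for p in species.values()]
    return (statistics.fmean(r[0] for r in results),
            statistics.fmean(r[1] for r in results))
```

Passing `fraction=0.25` or `fraction=0.75` to `half_greenness_dates` gives the alternative thresholds used later for the Swiss Lowland validation.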

2.3. Experimental Setup

[42] Three experiments are performed for each site in Table 1:

[43] 1. ORIGINAL: integrating the original GSI phenology model [Jolly et al., 2005] with original parameters for all years with available meteorological predictor data.

[44] 2. ANALYSIS: integrating the combined prognostic phenology and data assimilation model with 2000 ensembles to estimate model parameters for the MODIS observation period 2000–2006.

[45] 3. PROGNOSTIC: integrating only the prognostic phenology model with new parameters (estimated in ANALYSIS) for all years with available meteorological predictor data.

[46] All simulations are run three times in succession, to spin up the model states (PROGNOSTIC) and to extend the data assimilation period (ANALYSIS). Initial parameter and state values for ANALYSIS are given in Table 2. Climate control parameters for ORIGINAL are taken from Jolly et al. [2005] and complemented with the following structural vegetation parameters: FPARmin = 0.05, fv = 0.95, LAImax = 6.5. γ is not needed since GSI is not a prognostic model: FPAR and LAI are diagnosed directly from GSI at every time step by setting P = GSI and applying equations (5) and (6).

3. Results

[47] The following analysis reveals key phenological parameters resulting from the data assimilation experiment. We further evaluate the applicability of the data assimilation framework for predicting the seasonal course of FPAR and LAI in a number of climate zones and then validate our model results on the inter-annual to decadal time scale by use of independent phenological ground observations.

3.1. Parameter Uncertainty Reduction

[48] Parameters of the ANALYSIS model whose ensemble standard deviation is reduced to below 20% of their initial uncertainty (Table 2) are considered successfully constrained. A 3-year-long data assimilation for site 12 (Morgan Monroe) is shown as an example in Figure 2: model states (for example, FPAR) and model parameters (for example, FPARmin, FPARmax, Tmmin, Tmmax, Rgmin, Rgmax) are continuously adjusted, and their uncertainty is reduced, by assimilating MODIS observations.
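The success criterion, together with the ᵇ/ᶜ marks used in Tables 3 and 4, amounts to a ratio test. A minimal sketch follows; the function name is hypothetical, and `initial_sd` stands in for the initial uncertainty listed in Table 2 (not reproduced in this section), so the numbers in the test are invented.

```python
def constraint_class(posterior_sd, initial_sd):
    """Classify a parameter by its remaining relative uncertainty.

    Returns 'b' (below 10% of the initial uncertainty), 'c' (below 20%),
    or None (not successfully constrained).
    """
    ratio = posterior_sd / initial_sd
    if ratio < 0.10:
        return "b"
    if ratio < 0.20:
        return "c"
    return None
```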

Figure 2.

ANALYSIS model simulations: joint estimation of (a) biophysical states (FPAR), structural vegetation parameters (FPARmin, FPARmax) and (b) climate control parameters (Tmmin, Tmmax, Rgmin and Rgmax) by use of EnKF (Black crosses are the highest quality MODIS observations, error bars show the observation uncertainty; for details see methods section).
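The joint state and parameter estimation in Figure 2 rests on the ensemble Kalman filter idea of appending parameters to the state vector, so that parameters are updated through their ensemble covariance with the observed FPAR. A generic stochastic EnKF analysis step with perturbed observations can be sketched as below. It is an illustration, not the study's implementation: the toy forecast model (FPAR = 0.2 + 0.5 × param), the ensemble size, the observation error, and the repeated assimilation of one synthetic observation are hypothetical choices made only to show how the parameter spread shrinks.

```python
import random
import statistics


def enkf_update(ensemble, obs, obs_sd, rng):
    """One stochastic EnKF analysis step for an augmented state vector.

    ensemble: list of members, each [fpar, param_1, ..., param_k].
    The observation operator picks element 0 (FPAR); obs_sd is the 1-sigma
    observation uncertainty. Each member assimilates a perturbed observation.
    """
    n, dim = len(ensemble), len(ensemble[0])
    means = [statistics.fmean(m[j] for m in ensemble) for j in range(dim)]
    var_pred = sum((m[0] - means[0]) ** 2 for m in ensemble) / (n - 1)
    # Kalman gain per element: cov(x_j, predicted FPAR) / (var(FPAR) + obs variance)
    gains = []
    for j in range(dim):
        cov = sum((m[j] - means[j]) * (m[0] - means[0]) for m in ensemble) / (n - 1)
        gains.append(cov / (var_pred + obs_sd ** 2))
    analysis = []
    for m in ensemble:
        innovation = obs + rng.gauss(0.0, obs_sd) - m[0]
        analysis.append([m[j] + gains[j] * innovation for j in range(dim)])
    return analysis


def toy_forecast(ensemble):
    """Hypothetical toy model: FPAR is diagnosed from the parameter."""
    return [[0.2 + 0.5 * m[1], m[1]] for m in ensemble]


rng = random.Random(42)
ensemble = toy_forecast([[0.0, rng.gauss(0.8, 0.2)] for _ in range(500)])
for _ in range(5):  # synthetic FPAR = 0.5, i.e. a true parameter of 0.6
    ensemble = toy_forecast(enkf_update(ensemble, 0.5, 0.02, rng))
params = [m[1] for m in ensemble]
print(round(statistics.fmean(params), 3), round(statistics.stdev(params), 3))
```

Assimilating the same observation repeatedly overstates the information gained; in a real run each update would use a new MODIS observation along the time series.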

3.1.1. Climate Control Parameters

[49] Table 3 reveals a strong reduction in the uncertainty of both temperature control parameters, Tmmin and Tmmax, at the temperate deciduous broadleaf forest (DBF) and mixed forest (MF) sites 1, 8, 12, 16, 20 and 23. The strongest constraint is obtained for Morgan Monroe (site 12), where Tmmin = 276.3 ± 0.3 K and Tmmax = 278.9 ± 0.2 K span a narrow temperature band, with uncertainties of only about 4% of their initial ranges. DBF mean Tmmin is 272.8 ± 2.4 K and mean Tmmax is 278.8 ± 2.3 K, both slightly higher than, but similar to, the values of Tmmin = 271.1 K and Tmmax = 278.1 K estimated by Jolly et al. [2005].

Table 3. Climate Control Parameters and Uncertainties Resulting From the ANALYSIS Model Simulationᵃ

Site  Tmmin (K)      Tmmax (K)      Rgmin (W m−2)   Rgmax (W m−2)   vpdmin (mb)    vpdmax (mb)
1     269.8 ± 2.0    278.8 ± 0.8ᶜ   69.6 ± 9.9      228.9 ± 7.3     2.0 ± 1.6      33.2 ± 3.8
2     267.1 ± 4.0    282.8 ± 2.4    89.4 ± 16.2     178.3 ± 12.6    2.2 ± 3.2      31.0 ± 4.6
3     266.7 ± 3.4    279.1 ± 2.0    69.1 ± 10.8     210.2 ± 7.3     2.5 ± 1.8      37.4 ± 3.5
4     255.7 ± 5.3    267.5 ± 3.4    85.3 ± 8.0      187.6 ± 3.3ᶜ    6.2 ± 0.7ᵇ     35.3 ± 4.8
5     264.2 ± 1.6    289.7 ± 1.6    21.6 ± 14.1     192.4 ± 9.7     11.2 ± 4.4     30.9 ± 4.6
6     268.6 ± 1.1ᶜ   277.9 ± 0.6ᵇ   −38.6 ± 11.7    212.7 ± 5.9     10.1 ± 1.3ᶜ    37.0 ± 4.5
7     284.3 ± 1.4ᶜ   291.2 ± 1.0ᶜ   148.7 ± 16.8    256.9 ± 13.0    10.7 ± 3.8     31.4 ± 4.4
8     272.5 ± 2.2    282.5 ± 0.9ᶜ   101.8 ± 9.7     207.4 ± 6.8     13.9 ± 1.4ᶜ    39.7 ± 3.3
9     261.7 ± 4.0    277.9 ± 2.3    62.1 ± 11.7     199.7 ± 7.5     0.8 ± 1.6      34.1 ± 4.0
10    270.3 ± 5.8    284.7 ± 4.4    94.3 ± 16.2     165.3 ± 2.8ᶜ    19.8 ± 1.9     38.4 ± 3.6
11    265.7 ± 6.7    279.0 ± 5.3    72.3 ± 17.1     144.4 ± 5.6     10.8 ± 2.6     34.7 ± 4.7
12    276.3 ± 0.3ᵇ   278.9 ± 0.2ᵇ   92.1 ± 1.6ᵇ     176.6 ± 1.0ᵇ    15.7 ± 2.2     37.5 ± 3.2
13    256.0 ± 1.9    279.6 ± 1.0ᶜ   45.2 ± 12.8     196.8 ± 5.2     7.9 ± 0.9ᶜ     30.7 ± 2.9
14    262.5 ± 2.7    277.0 ± 1.4ᶜ   65.4 ± 12.8     226.9 ± 9.2     6.9 ± 1.4ᶜ     25.5 ± 2.2
15    259.5 ± 2.1    282.6 ± 1.1ᶜ   140.3 ± 8.2     229.6 ± 5.5     2.9 ± 1.5      30.3 ± 2.2
16    273.8 ± 0.7ᵇ   277.4 ± 0.4ᵇ   101.0 ± 2.8ᶜ    138.4 ± 1.5ᵇ    21.5 ± 1.6     34.5 ± 3.0
17    268.6 ± 0.5ᵇ   275.9 ± 0.3ᵇ   105.2 ± 8.9     246.4 ± 3.0ᶜ    4.1 ± 0.6ᵇ     32.6 ± 3.1
18    269.4 ± 1.7    277.5 ± 0.5ᵇ   −10.1 ± 13.5    177.1 ± 6.2     13.0 ± 2.0     35.6 ± 4.0
19    273.7 ± 1.2ᶜ   287.5 ± 0.5ᵇ   151.8 ± 2.7ᶜ    252.3 ± 1.8ᵇ    21.4 ± 2.8     35.2 ± 2.8
20    271.7 ± 0.4ᵇ   276.4 ± 0.4ᵇ   113.9 ± 3.9ᶜ    184.9 ± 1.4ᵇ    28.0 ± 3.8     37.9 ± 3.4
21    253.9 ± 4.0    271.9 ± 2.0    2.6 ± 7.6       191.7 ± 6.9     4.9 ± 0.3ᵇ     12.6 ± 0.6ᵇ
22    255.1 ± 4.1    268.5 ± 2.2    83.3 ± 5.0      187.5 ± 4.2ᶜ    7.2 ± 0.5ᵇ     13.5 ± 0.7ᵇ
23    270.8 ± 0.6ᵇ   278.5 ± 0.3ᵇ   45.8 ± 7.3      162.2 ± 2.9ᶜ    0.9 ± 1.3ᶜ     40.9 ± 2.7

ᵃSuccessfully constrained parameters are marked ᵇ or ᶜ.
ᵇBelow 10% of initial uncertainty.
ᶜBelow 20% of initial uncertainty.

[50] Evergreen needleleaf forest (ENF) sites 6, 13, 17 and 18 have a mean Tmmax = 277.7 ± 1.5 K. Only for sites 6 and 17 is Tmmin sufficiently constrained, to 268.6 ± 0.8 K. Both temperature control factors are lower for ENF than for DBF. The seasonality detected here reflects only the phenology of the deciduous vegetation fraction in evergreen needleleaf forests; the evergreen needleleaf canopy itself would have a constant FPAR equal to FPARmin.

[51] Most DBF sites which successfully constrained temperature parameters also have strong light constraints (Table 3). Again, site 12 has the clearest signal with Rgmin = 92.1 ± 1.6 W m−2 and Rgmax = 176.6 ± 1.0 W m−2. DBF mean Rgmin = 102.3 ± 11.0 W m−2 and Rgmax = 166.6 ± 22.8 W m−2 (sites 4, 12, 16 and 20).

[52] Light control parameters for the agricultural site Bondville (19) are more than 50 W m−2 higher than for DBF sites. Rgmax is weakly constrained for both the tropical site Santarem KM83 (10) and for the mediterranean grassland site Vaira Ranch (22). The lower bound Rgmin however cannot be clearly identified there.

[53] Mediterranean ecosystems, where none of the other climate controls could be clearly identified, show success with moisture control parameters: at Tonzi Ranch (21) and Vaira Ranch (22) the uncertainties of vpdmin and vpdmax are strongly reduced to less than 7% and 10% of their initial uncertainty respectively (Table 3).

[54] For some ENF sites vpdmin can be estimated, but not vpdmax. Surprisingly, the phenology of the two tropical sites 10 and 11 and of the agricultural site 19 does not reveal any moisture control.

3.1.2. Structural Vegetation Parameters

[55] Table 4 demonstrates that FPARmin and fv are successfully constrained, with uncertainties mostly below 10% of the initial uncertainty. For the tropical site KM83 (10), fv (and therefore FPARmax) can be constrained, but not FPARmin. Few observations provide a good constraint on FPARmin there, since low FPAR and LAI values in tropical areas coincide with higher observation uncertainty (for example, Figure 3b). Mean LAImax is 6.14 ± 0.79 for forests and 3.91 ± 0.74 for short vegetation.

Figure 3.

PROGNOSTIC model simulations: prognostic FPAR and LAI compared to MODIS FPAR and LAI at a (a) temperate deciduous broadleaf forest, (b) tropical evergreen broadleaf forest, (c) boreal evergreen needleleaf forest, and (d) mediterranean savanna (Black crosses are the highest quality MODIS observations, error bars show the observation uncertainty; for details see methods section).

Table 4. Structural Vegetation Parameters and Uncertainties Resulting From the ANALYSIS Model Simulationᵃ

Site  FPARmin        fv             LAImax         γ (day−1)
1     0.70 ± 0.01ᵇ   0.88 ± 0.02ᵇ   6.05 ± 0.24ᶜ   0.33 ± 0.07
2     0.65 ± 0.02ᵇ   0.81 ± 0.04ᶜ   5.67 ± 0.53    0.38 ± 0.10
3     0.76 ± 0.01ᵇ   0.95 ± 0.02ᵇ   6.85 ± 0.40    0.37 ± 0.08
4     0.51 ± 0.02ᵇ   0.97 ± 0.03ᶜ   6.52 ± 0.26ᶜ   0.40 ± 0.08
5     0.65 ± 0.01ᵇ   0.93 ± 0.03ᶜ   6.05 ± 0.31    0.12 ± 0.02ᵇ
6     0.73 ± 0.01ᵇ   0.94 ± 0.02ᵇ   6.67 ± 0.19ᶜ   0.08 ± 0.01ᵇ
7     0.25 ± 0.03ᶜ   0.75 ± 0.04ᶜ   4.39 ± 0.46    0.37 ± 0.08
8     0.80 ± 0.00ᵇ   0.84 ± 0.00ᵇ   3.99 ± 0.04ᵇ   0.48 ± 0.07
9     0.62 ± 0.02ᵇ   0.97 ± 0.03ᶜ   5.96 ± 0.33    0.27 ± 0.08
10    0.63 ± 0.05    0.91 ± 0.02ᵇ   6.50 ± 0.19ᶜ   0.34 ± 0.08
11    0.75 ± 0.04ᶜ   0.90 ± 0.02ᵇ   6.61 ± 0.21ᶜ   0.37 ± 0.11
12    0.50 ± 0.00ᵇ   0.98 ± 0.01ᵇ   6.36 ± 0.04ᵇ   0.76 ± 0.03ᶜ
13    0.64 ± 0.02ᵇ   0.98 ± 0.02ᵇ   6.76 ± 0.33    0.11 ± 0.01ᵇ
14    0.19 ± 0.01ᵇ   0.56 ± 0.03ᶜ   3.11 ± 0.25ᶜ   0.35 ± 0.07
15    0.15 ± 0.01ᵇ   0.62 ± 0.03ᶜ   3.39 ± 0.22ᶜ   0.61 ± 0.09
16    0.71 ± 0.01ᵇ   0.95 ± 0.01ᵇ   6.29 ± 0.09ᵇ   0.58 ± 0.07
17    0.60 ± 0.01ᵇ   0.93 ± 0.01ᵇ   5.56 ± 0.13ᵇ   0.27 ± 0.04ᶜ
18    0.84 ± 0.01ᵇ   0.92 ± 0.01ᵇ   6.78 ± 0.08ᵇ   0.14 ± 0.04ᶜ
19    0.31 ± 0.00ᵇ   0.73 ± 0.01ᵇ   3.88 ± 0.07ᵇ   0.59 ± 0.05
20    0.51 ± 0.01ᵇ   0.94 ± 0.01ᵇ   6.26 ± 0.10ᵇ   0.77 ± 0.05
21    0.58 ± 0.01ᵇ   0.92 ± 0.02ᵇ   5.00 ± 0.15ᶜ   0.22 ± 0.04ᶜ
22    0.67 ± 0.01ᵇ   0.82 ± 0.01ᵇ   4.17 ± 0.08ᵇ   0.76 ± 0.10
23    0.74 ± 0.01ᵇ   0.91 ± 0.01ᵇ   7.06 ± 0.25ᶜ   0.29 ± 0.06

ᵃSuccessfully constrained parameters are marked ᵇ or ᶜ.
ᵇBelow 10% of initial uncertainty.
ᶜBelow 20% of initial uncertainty.

[56] The maximum growth rate γ is more difficult to estimate than the other structural vegetation parameters. It is generally lower for boreal sites (for example, 0.12 day−1 for Kaamanen) than for temperate sites (for example, 0.76 day−1 for Morgan Monroe). Mean γ is 0.24 day−1, with a high site-to-site variability of 0.23 day−1. Since γ is a surrogate for complex biochemical processes controlling the rate of leaf development, a single generalized value may not be realistic. γ only influences the duration of phenological events and is therefore only weakly constrained by the short observation window in which these events occur each year.

3.1.3. Time Averaging Parameters

[57] Using data assimilation to estimate the time averaging parameters was highly unsuccessful: for none of the sites are the uncertainties in the time averaging parameters reduced below 20% of their initial uncertainty.

3.2. Prediction of Seasonal Phenology

[58] The PROGNOSTIC model with parameters constrained by the data assimilation is now compared to the ORIGINAL model with parameters after Jolly et al. [2005]. The aim of this section is to document the ability of the model to predict “MODIS-like” FPAR and LAI without the help of satellite data. The correlation coefficient (R) and the root mean square error (RMSE) are calculated by comparing daily MODIS-observed and model-predicted FPAR and LAI values for different climate zones.
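The skill scores used in Table 5 can be sketched as follows. This is a minimal stdlib illustration with a hypothetical function name: it pairs model and MODIS values only on days where both exist (missing observations as `None`), and returns R, RMSE and the t statistic for the null hypothesis R = 0. A two-tailed p-value would follow from the t distribution with n − 2 degrees of freedom (for example via `scipy.stats.t.sf`), which is omitted here to keep the sketch dependency-free.

```python
import math


def skill(observed, predicted):
    """Pearson R, RMSE and the t statistic for H0: R = 0.

    Only days where both series have a value are compared (None = missing).
    R and t are None when either series is constant, since R is undefined then.
    """
    pairs = [(o, p) for o, p in zip(observed, predicted)
             if o is not None and p is not None]
    n = len(pairs)
    mean_o = sum(o for o, _ in pairs) / n
    mean_p = sum(p for _, p in pairs) / n
    ss_o = sum((o - mean_o) ** 2 for o, _ in pairs)
    ss_p = sum((p - mean_p) ** 2 for _, p in pairs)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in pairs) / n)
    if ss_o == 0 or ss_p == 0:
        return None, rmse, None
    r = sum((o - mean_o) * (p - mean_p) for o, p in pairs) / math.sqrt(ss_o * ss_p)
    # t with n - 2 degrees of freedom; infinite for a perfect correlation
    t = r * math.sqrt((n - 2) / (1 - r * r)) if abs(r) < 1 else math.copysign(math.inf, r)
    return r, rmse, t
```

The `None` branch mirrors the point made for the evergreen tropical and boreal sites: for a (nearly) constant series, R carries little or no information, while RMSE remains meaningful.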

3.2.1. Temperate

[59] The ORIGINAL model already has accurate spring and fall timing for DBF ecosystems such as Morgan Monroe (red curve in Figure 1a), since the temperature control factors in PROGNOSTIC are only slightly higher than in ORIGINAL (see previous section). While R for FPAR rises only from 0.86 in ORIGINAL to 0.93 in PROGNOSTIC (Table 5), RMSE is at least halved for both FPAR and LAI. The mixed forest site Swiss Lowland (23) has the highest RMSE for LAI among all temperate forests. It covers a much larger area than the other sites and most likely has substantial subgrid-scale variability in FPARmin and FPARmax.

Table 5. FPAR and LAI Performance for the ORIGINAL and the PROGNOSTIC Model Simulationsᵃ

Site  ORIGINAL FPAR   ORIGINAL LAI   PROGNOSTIC FPAR   PROGNOSTIC LAI
1     0.55 (0.31)     0.32 (2.90)    0.56 (0.09)       0.37 (0.95)
2     0.62 (0.30)     0.34 (2.95)    0.68 (0.08)       0.39 (0.96)
3     0.82 (0.41)     0.71 (2.33)    0.85 (0.04)       0.77 (0.76)
4     0.86 (0.28)     0.80 (2.04)    0.92 (0.08)       0.87 (1.02)
5     0.48 (0.26)     0.31 (2.06)    0.49 (0.14)       0.46 (0.73)
6     0.72 (0.28)     0.61 (2.18)    0.66 (0.18)       0.66 (1.09)
7     0.47 (0.38)     0.49 (4.41)    0.82 (0.14)       0.85 (0.50)
8     0.46 (0.42)     0.64 (2.63)    0.55 (0.06)       0.66 (0.61)
9     0.80 (0.29)     0.64 (2.55)    0.84 (0.07)       0.70 (0.87)
10    0.00 (0.16)     0.00 (2.39)    0.18 (0.12)       0.27 (1.22)
11    0.40 (0.16)     0.33 (2.13)    0.15 (0.13)       0.34 (1.29)
12    0.86 (0.23)     0.91 (1.24)    0.93 (0.08)       0.97 (0.62)
13    0.76 (0.30)     0.69 (1.89)    0.77 (0.17)       0.78 (0.89)
14    0.64 (0.31)     0.47 (2.69)    0.82 (0.08)       0.80 (0.18)
15    0.76 (0.34)     0.66 (3.31)    0.90 (0.06)       0.90 (0.12)
16    0.73 (0.34)     0.90 (1.55)    0.74 (0.14)       0.92 (0.79)
17    0.67 (0.37)     0.67 (1.69)    0.70 (0.13)       0.78 (0.67)
18    0.60 (0.40)     0.44 (2.78)    0.65 (0.08)       0.60 (1.43)
19    0.76 (0.27)     0.74 (3.27)    0.93 (0.09)       0.83 (0.57)
20    0.89 (0.19)     0.89 (1.23)    0.90 (0.09)       0.95 (0.72)
21    −0.10 (0.40)    −0.15 (3.90)   0.80 (0.07)       0.83 (0.53)
22    −0.13 (0.41)    −0.22 (3.73)   0.71 (0.09)       0.79 (0.61)
23    0.59 (0.10)     0.47 (2.14)    0.66 (0.08)       0.56 (1.05)

ᵃR and RMSE (in brackets) are calculated from daily comparisons of modeled to MODIS FPAR and LAI. For bold numbers R is significant with p < 0.0001 (two-tailed T-test, μ0: R = 0).

[60] Figure 4a sheds light on the underlying climatological processes governing seasonal phenology: temperature controls the rapid greenup in April (temperature factor rises after light factor) and light controls senescence in October (light factor decreases before temperature factor). The narrow range of Tmmin and Tmmax and a concurrently high γ (Tables 3 and 4) create a short positive peak of the growth vector ΔGSI in April. Its negative counterpart in autumn is less well defined because of the slightly broader range of Rgmin and Rgmax.
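The factor logic can be sketched with linear ramp functions in the style of the GSI of Jolly et al. [2005]: the temperature and light factors rise from 0 to 1 between their min and max parameters, while the moisture factor falls from 1 to 0 as vpd rises from vpdmin to vpdmax. The exact equations of the prognostic model are not reproduced in this section, so the relaxation step driven by γ is a hypothetical stand-in for how the growth vector moves the state toward GSI. Parameter values are the Morgan Monroe (site 12) estimates from Tables 3 and 4; the weather values are invented.

```python
def ramp(x, lo, hi):
    """0 at or below lo, 1 at or above hi, linear in between."""
    if x <= lo:
        return 0.0
    if x >= hi:
        return 1.0
    return (x - lo) / (hi - lo)


def gsi(temp_k, rg, vpd, p):
    """Product of temperature, light and moisture factors (each 0..1)."""
    f_temp = ramp(temp_k, p["tm_min"], p["tm_max"])
    f_light = ramp(rg, p["rg_min"], p["rg_max"])
    f_moist = 1.0 - ramp(vpd, p["vpd_min"], p["vpd_max"])  # high vpd suppresses growth
    return f_temp * f_light * f_moist


def growth_step(state, target, gamma):
    """Hypothetical prognostic step: move the state toward GSI at rate gamma (day−1)."""
    return state + gamma * (target - state)


# Morgan Monroe (site 12) parameters from Table 3 (K, W m−2, mb)
site12 = {"tm_min": 276.3, "tm_max": 278.9, "rg_min": 92.1, "rg_max": 176.6,
          "vpd_min": 15.7, "vpd_max": 37.5}
target = gsi(277.6, 200.0, 10.0, site12)      # temperature halfway up its ramp
state = growth_step(0.0, target, gamma=0.76)  # one daily step from a dormant state
```

Under this assumed form, the narrow Tmmin to Tmmax band makes the temperature factor switch on within a few days of spring warming, and a large γ lets the state follow quickly, which is one way to read the short April peak of ΔGSI.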

Figure 4.

PROGNOSTIC model simulations: factors controlling the growth vector ΔGSI/Δt of the prognostic phenology model in response to the environmental predictors temperature, light and moisture at a (a) temperate deciduous broadleaf forest, (b) tropical evergreen broadleaf forest, (c) boreal evergreen needleleaf forest, and (d) mediterranean savanna.

3.2.2. Tropical

[61] Low or no correlation is found for the tropical sites (10, 11), with either the ORIGINAL or the PROGNOSTIC model (Table 5). Both sites have constantly high FPAR and LAI, constrained by only a few good-quality observations during data assimilation. Moreover, correlation is not a meaningful metric for a nearly constant time series. RMSE, on the other hand, decreases for FPAR and is roughly halved for LAI at both sites.

[62] Figure 3b shows that PROGNOSTIC simulates reduced LAI during the wet season (March–June) and leaf growth at the beginning of the dry season (July–August). The underlying climatological drivers shown in Figure 4b demonstrate that the phenological cycle of tropical EBF is controlled by light during the wet season, and not by humidity during the dry season as assumed by many other models.

3.2.3. Boreal

[63] Correlation for boreal ENF either becomes worse (site 6) or only slightly better (site 13) in PROGNOSTIC compared to ORIGINAL. As already found for the tropical sites, R is not a suitable metric for evergreen ecosystems. However, RMSE is substantially lowered, to the order of 10% of the FPAR and LAI magnitude, by using more realistic structural vegetation parameters such as the evergreen vegetation fraction (FPARmin). Figure 3c, for instance, shows a minimum LAI of around 2, corresponding to an FPARmin of around 0.6, for the BOREAS Old Black Spruce site (13).
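A quick plausibility check of the LAI to FPAR correspondence can use the generic Beer–Lambert relation FPAR = 1 − exp(−k·LAI). This is an assumption for illustration only: the model's own FPAR–LAI conversion (equations (5) and (6)) is not reproduced in this section, and the extinction coefficient k = 0.5 is a commonly assumed textbook value, not one estimated in this study. The function names are hypothetical.

```python
import math


def fpar_from_lai(lai, k=0.5):
    """Generic Beer–Lambert absorption; k is an assumed extinction coefficient."""
    return 1.0 - math.exp(-k * lai)


def lai_from_fpar(fpar, k=0.5):
    """Inverse of fpar_from_lai for 0 <= fpar < 1."""
    return -math.log(1.0 - fpar) / k


fpar_at_min_lai = fpar_from_lai(2.0)   # minimum LAI of about 2 (Figure 3c)
lai_at_fparmin = lai_from_fpar(0.64)   # FPARmin for site 13 (Table 4)
```

With these assumptions, an LAI of 2 maps to an FPAR of about 0.63 and the site 13 FPARmin of 0.64 maps back to an LAI of about 2.0, consistent with the correspondence stated above.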

[64] Figure 3c suggests that SOS of the ENF's deciduous cover is missed by 1–2 weeks and that EOS is predicted to proceed much too gradually. Figure 4c reveals that SOS is gradually initiated by temperature and later influenced by light, while EOS is initiated by light but later modulated by temperature. During the short summer growing period, neither light nor temperature but rather moisture constrains leaf development of the ENF's deciduous cover.

3.2.4. Mediterranean

[65] The largest increase in R and the strongest decrease in RMSE for both FPAR and LAI are found for the drought-deciduous mediterranean ecosystems (sites 7, 21 and 22, Table 5). For Tonzi Ranch (21), R increases from −0.10 (FPAR) and −0.15 (LAI) to 0.80 (FPAR) and 0.83 (LAI), while RMSE decreases from around 50% to less than 10% of the respective value range (0–1 for FPAR and around 0–7 for LAI).

[66] Figures 1d and 3d document the improvement in predicting drought-deciduous phenology for a mediterranean ecosystem such as Tonzi Ranch (21). A few years of satellite data suffice to estimate phenological parameters for this ecosystem. Figure 4d reveals that light constrains phenological development in winter (November–March), while a prolonged drought inhibits leaf growth in summer (June–November). The growth vector ΔGSI shows the opposite seasonal cycle to temperate deciduous phenology: it is negative in late spring and positive in autumn, when the dry season is over.

3.3. Validation of Inter-Annual Variability

3.3.1. Morgan Monroe

[67] PROGNOSTIC reproduces MODIS LAI with high skill (R = 0.97, Table 5) and also compares well to site-measured LAI (R = 0.90, not shown). The ORIGINAL model has a lower R = 0.84 (not shown) when compared to site-measured LAI. However, RMSE is higher for PROGNOSTIC (1.65, not shown) than for ORIGINAL (1.09, not shown), mainly because PROGNOSTIC (and MODIS) LAI rises above 6 while site-measured LAI saturates at around 5. Since the MODIS assimilation area covers 49 km2 and the validation transects span only a few km2, such differences in structural vegetation parameters can occur.

[68] The coarse and irregular LAI measurement interval leads to a high uncertainty of 1–2 weeks in measured SOS and EOS (Figure 5). PROGNOSTIC has some skill in predicting 2000–2006 SOS (Table 6, R = 0.40, compared to R = 0.23 for ORIGINAL), but predicted SOS is around 12 days early. Correlation rises to 0.64 (significant only with p < 0.1) and the bias is reduced to 2.4 days if PROGNOSTIC SOS is compared to SOS derived from site-measured NEE.

Figure 5.

ORIGINAL and PROGNOSTIC model simulations: SOS and EOS for a deciduous broadleaf forest in eastern US validated with radiatively measured ground LAI for 2000–2006.

Table 6. Validation: Correlation R and Bias (in Brackets) of Observed Versus Predicted SOS and EOS for the ORIGINAL and PROGNOSTIC Modelsᵃ

Site            Obs. Method  Threshold  Years      ORIGINAL SOS   ORIGINAL EOS   PROGNOSTIC SOS   PROGNOSTIC EOS
Morgan Monroe   LAI          50%        2000–2006  0.23 (6.4)     −0.05 (3.3)    0.40 (−12.3)     −0.42 (1.0)
Morgan Monroe   NEE          50%        2000–2006  −0.37 (18.4)   0.05 (14.7)    0.64 (2.4)       0.16 (−2.2)
Harvard Forest  SPC          50%        1990–2006  0.51 (−42.2)   −0.44 (−7.1)   0.82 (−23.7)     −0.17 (−28.9)
Swiss Lowland   SPC          50%        1958–2006  0.55 (−4.5)    -              0.60 (−4.8)      -
Swiss Lowland   SPC          25%        1958–2006  0.53 (−31.7)   -              0.73 (−18.5)     -
Swiss Lowland   SPC          75%        1958–2006  0.63 (20.8)    -              0.52 (4.3)       -

ᵃSite-measured leaf area indices (LAI), eddy-covariance measurements of CO2 fluxes (NEE) or species-level phenological observations (SPC) are used for validation. SOS and EOS are derived from modeled and observed LAI time series by choice of a threshold ranging between 25–75% (see methods section). For bold numbers R is significant with p < 0.0001 (two-tailed T-test, μ0: R = 0).

[69] Neither ORIGINAL nor PROGNOSTIC has skill in reproducing EOS variability (Table 6 and Figure 5). Compared to EOS derived from NEE, PROGNOSTIC EOS is similarly accurate in absolute timing (EOS bias = −1.7 days), but again the model has no skill in reproducing EOS variability (R = 0.1, not significant).

3.3.2. Harvard Forest

[70] Skill for predicting SOS is much higher for PROGNOSTIC (R = 0.82; significant with p < 0.0001; Table 6) than for ORIGINAL (R = 0.51). Although the bias is reduced in PROGNOSTIC, predicted SOS still occurs about three weeks earlier than site-observed SOS. As shown in Figure 6, PROGNOSTIC precisely captures the early springs of 1991, 1993, 1998, 2001 and 2004, with an inter-annual variability similar to that of the observations. ORIGINAL misses in particular the late years 1997 and 2005 as well as the early years 2001 and 2004.

Figure 6.

ORIGINAL and PROGNOSTIC model simulations: SOS and EOS for a deciduous broadleaf forest in eastern US validated with ground observed phenological phases of individual species for 1990–2006.

[71] As at Morgan Monroe, EOS variability cannot be reproduced by either ORIGINAL or PROGNOSTIC. Figure 6 shows that many years are anti-correlated, such as 1994 and 2002 for ORIGINAL and 1996, 2003 and 2004 for PROGNOSTIC. The bias furthermore increases for PROGNOSTIC, which indicates that leaf coloring and leaf fall of the selected species (oak and maple) may not be representative of the large-scale signal seen by MODIS.

3.3.3. Swiss Lowland

[72] A 49-year-long SOS time series for the Swiss Lowland [Rutishauser et al., 2007] allows the model's performance to be tested at decadal time scales. R only increases from 0.55 in ORIGINAL to 0.60 in PROGNOSTIC (Table 6). Modeled SOS in both models occurs around 5 days earlier than observed, a systematic difference already found at the other two validation sites. The magnitude of simulated inter-annual variability is larger than observed, but the patterns match well: 1959, 1961, 1974, 1981 and 1989 have an early SOS while 1958, 1965, 1973, 1980, 1982 and 2001 have a late SOS in both the observations and PROGNOSTIC (Figure 7). Changing the threshold at which SOS is diagnosed from modeled LAI substantially influences predictive skill: R = 0.73 for 25% and R = 0.52 for 75% (significant with p < 0.0001; Table 6). The late years 1984–1986 and 2002 and the early year 1990 can only be reproduced with a 25% threshold, for which, however, the bias increases to −18.5 days (Figure 7).

Figure 7.

SOS defined by a 50%, 25% and 75% LAI threshold from PROGNOSTIC model simulations for mixed forest pixels over Switzerland validated with a reconstructed time-series of SOS from ground observations for 1958–2006.

[73] The model can be used to explain the phenological processes responsible for the simulated temporal patterns, which would not be possible from the data alone. The temperature factor f(Tm) correlates with observed SOS at R = 0.58 and the light factor f(Rg) at R = 0.43. As expected, no correlation is found between the moisture factor f(vpd) and SOS. Therefore, while temperature or light alone has substantial skill in reproducing spring variability for this temperate ecosystem, both are needed: temperature is the controlling factor in years with late SOS, while light is limiting in years with early SOS.
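Because GSI is a product of factors, the factor with the smallest value caps the product. A simple diagnostic along these lines labels each year by the smaller of the temperature and light factors, averaged over a short window ending at modeled SOS. This is a sketch of one possible way to attribute limitation, not the procedure used in the study: the ten-day window, the function name and the synthetic factor series are hypothetical.

```python
from statistics import fmean


def limiting_factor_at_sos(f_temp, f_light, sos_doy, window=10):
    """Return (name, value) of the smaller factor, averaged over the `window`
    days ending at SOS. f_temp, f_light: daily factor values, index 0 = DOY 1."""
    start = max(0, sos_doy - window)
    t_mean = fmean(f_temp[start:sos_doy])
    l_mean = fmean(f_light[start:sos_doy])
    return ("light", l_mean) if l_mean < t_mean else ("temperature", t_mean)


# synthetic early year: warm spring, light still low at SOS
early = limiting_factor_at_sos([1.0] * 365, [0.4] * 365, sos_doy=110)
# synthetic late year: cold spring, ample light at SOS
late = limiting_factor_at_sos([0.3] * 365, [0.9] * 365, sos_doy=130)
```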

4. Discussion

[74] The advantages and disadvantages of remote sensing data assimilation for reducing uncertainty in model parameters are discussed first, followed by an analysis of simulated phenological variability and a discussion of the validation issues revealed by this study.

4.1. Remote Sensing Data Assimilation

[75] Climate control parameters influence the phenological timing and phase (R in Table 5), while structural vegetation parameters define the magnitude of FPAR or LAI (RMSE in Table 5). Simulated timing can be compared to ground-observed phenological events [Studer et al., 2007], as shown in the validation of our results. For applications in global land surface models, realistic timing is valuable because it improves vegetation biochemistry and therefore the seasonal course of terrestrial water and carbon exchanges. A more realistic vegetation structure, on the other hand, such as a clearly defined contribution of the deciduous cover for ENF or the maximum LAI for EBF, mostly influences vegetation biophysics in those models (such as albedo or aerodynamics). Structural parameters have a smaller influence on biochemistry, since vegetation is inactive when FPAR approaches FPARmin and photosynthesis is insensitive to LAI variability when LAI is close to LAImax.

[76] Data assimilation constrained structural vegetation parameters better than climate control parameters simply because vegetation structure is observed explicitly, while the climate control parameters driving the predicted and observable phenological states are an implicit part of the prognostic phenology model. Also, the phenological events that constrain climate control parameters occur only 1–2 times each year, and these events can be shorter than the 8-day MODIS compositing period (for example, the rapid spring greenup at site 12). The attempt to constrain climate control parameters (moderately successful in this study) and time averaging parameters (highly unsuccessful in this study) should therefore be repeated with homogenized and calibrated AVHRR-based long-term data records extending back to the 1980s [Tucker et al., 2005; Masouka et al., 2007]. However, data assimilation relies on a data quality layer, which most AVHRR-based data sets cannot provide and which is a unique feature of newer remote sensing products such as those derived from MODIS.

[77] It should be noted that the well-constrained structural vegetation parameters depend on the satellite retrieval algorithm [Tian et al., 2002; Morisette et al., 2006; Garrigues et al., 2008] and are highly heterogeneous below the 1 km MODIS pixel size [Cohen et al., 2003, 2006]. Variability of structural vegetation parameters also increases at larger scales. Confirming results by Fisher and Mustard [2007] and Fisher et al. [2007], we find, for instance, that at Harvard Forest (7 km × 7 km) predicted LAI correlates with the LAI of MODIS DBF pixels at R = 0.92, while for the Swiss Lowland (0.25° × 0.25°) predicted LAI correlates with the LAI of all MODIS MF pixels at only R = 0.56 (Table 5). Instead of using pixel-predominant vegetation classes, we suggest that a linear mixing of 10–20 plant functional types for each MODIS pixel can account for landscape heterogeneity. This would further eliminate structural vegetation parameters such as FPARmin, FPARmax or fv, which explicitly account for subgrid-scale landscape heterogeneity in our experiments.
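The suggested linear mixing can be sketched as an area-weighted sum of per-PFT values within one pixel. The PFT names, cover fractions and FPAR values below are hypothetical, and mixing FPAR (rather than, for example, LAI) is itself an assumption of the sketch.

```python
def pixel_fpar(fractions, pft_fpar):
    """Area-weighted linear mixture of per-PFT FPAR within one pixel.

    fractions: {pft: cover fraction}, which must sum to 1.
    pft_fpar:  {pft: FPAR simulated for that plant functional type}.
    """
    if abs(sum(fractions.values()) - 1.0) > 1e-6:
        raise ValueError("cover fractions must sum to 1")
    return sum(f * pft_fpar[pft] for pft, f in fractions.items())


mixed = pixel_fpar({"DBF": 0.6, "C3 grass": 0.4}, {"DBF": 0.9, "C3 grass": 0.5})
```

Each PFT would then carry its own, less heterogeneous, phenological parameters, and the pixel signal would emerge from the mixture instead of being absorbed by lumped structural parameters.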

4.2. Predicting Phenology

[78] Site 23 reveals that SOS in warm years can be driven by light and not just by temperature. EOS is mainly controlled by light, because light becomes limiting before temperature in autumn. DBF temperature control parameters are constrained to within 0.5–1.0 K, confirming previous estimates of these parameters. DBF light control parameter uncertainty is 1–2 W m−2; it is substantially larger for boreal ENF sites, which introduces some errors in predicting EOS of the deciduous fraction of ENF (for example, site 13, Figure 3c). Boreal ENF phenology is difficult to model because no single climatological predictor can be identified for either SOS or EOS. Also, climate control parameters are weakly constrained for boreal ENF sites because they only account for the deciduous fraction within those forests, which is itself highly variable from site to site.

[79] Consistent with studies of tropical EBF heat and water fluxes [da Rocha et al., 2004], we find that light is the main predictor of tropical EBF phenology. Large uncertainties on the order of 5–15 W m−2 in the tropical light control parameters are due to remaining uncertainties in the satellite data (see, for example, Figure 3b). Results should nevertheless be robust because of the quality screening and uncertainty estimation employed. For instance, 80.7% of the MODIS data for KM83 (10) and 85.5% for KM67 (11) were not used for assimilation. Figure 3b further shows that predicted FPAR and LAI are high and almost constant during most of the cloudy wet season and during the aerosol-contaminated dry season, when MODIS data have the largest error bars. In PROGNOSTIC, tropical EBF shed most of their leaves at the end of the wet season and grow them back during the dry season. This result agrees with previously published ground-observed [Hutyra et al., 2007] and satellite-observed phenological patterns for the Amazon [Huete et al., 2006; Myneni et al., 2007], but disagrees with other prognostic phenology models employed in climate research (Figure 1b).

[80] Reliable satellite observations in sub-tropical and mediterranean areas result in highly constrained light and moisture control parameters for drought-deciduous vegetation, with remaining uncertainties of 2–5 W m−2 and less than 1 mb, respectively. vpd is a good predictor of leaf senescence in mediterranean grasslands, while light controls SOS in early spring. With data assimilation, correlations change from negative and non-significant to highly significant values on the order of 0.8, and RMSE decreases to around 25% of the initial RMSE (Table 5). Instead of vpd, either soil moisture or precipitation could also be used as predictors. Soil moisture was not used here because it requires a complex parameterization of subsurface hydrology and vegetation biophysics. Furthermore, soil moisture magnitude and variability are largely model dependent and therefore not suitable for a generally applicable phenology model. Use of precipitation was unsuccessful in initial experiments (not shown), since an integrating storage process (in other words, a soil hydraulic model including biophysical sources and sinks) would be required to model the effect of precipitation on phenology. In support of Jolly et al. [2005], we find that vpd is a very suitable predictor of drought-deciduous phenology, since it is a good surrogate for soil water availability [Hunt et al., 1991].

4.3. Validating Phenology

[81] Arbitrarily chosen thresholds serve as transfer functions relating predicted phenological states such as FPAR and LAI to ground-observed phenological phases. The 50% threshold has been widely used in the literature and also in our study, but PROGNOSTIC (and therefore MODIS-derived) SOS diagnosed with a 50% threshold occurs 5–20 days earlier than observed SOS at the chosen validation sites. Despite this absolute bias, the inter-annual variability of SOS is accurate to within 2–3 days, in accordance with findings by Schwartz et al. [2002] and Studer et al. [2007].

[82] The bias might be explained by ground cover, which can emerge before the trees green up [Ahrends et al., 2008] and which is not accounted for in most ground-based radiative LAI measurements or in phenological observations of phases like bud-burst or flowering. While this might be an issue at Harvard Forest, daily tower-based photographs at Morgan Monroe (not shown) reveal a dormant understory at the time of PROGNOSTIC SOS. As shown in Figure 7 and Table 6, using a 25% threshold raises the correlation between modeled and observed SOS (R = 0.73) but creates a negative bias of −19 days. A 75% threshold, on the other hand, decreases R to 0.52 but has a smaller bias of +7 days. The 75% threshold captures a later stage of greenup, which is likely more in accordance with observed bud-burst of tall trees. Our findings at site 23 further provide evidence that light can modulate greenup after temperature has initiated leaf-out; a more complicated set of environmental controls would therefore explain the lower R for the 75% threshold. The 25% threshold is strongly correlated with temperature and defines a stage of greenup that is highly predictable. At Morgan Monroe, we find that modeled SOS derived with a 50% threshold and SOS derived from NEE differ by only 2.4 days. NEE integrates both plant physiological and phenological activity [Law et al., 2002] and is more comparable to modeled and satellite-derived FPAR. This further supports our methodology, since phenological parameterizations in climate models should be coupled to the seasonal cycle of terrestrial photosynthesis rather than to observable events like bud burst or flowering.

[83] Prediction of EOS inter-annual variability and absolute timing was not successful at either Harvard Forest or Morgan Monroe. Although modeled EOS at Morgan Monroe agrees with NEE-derived EOS to within 2 days on average, the inter-annual variability of modeled EOS is not, or even negatively, correlated with observed EOS (Table 6). The previous section demonstrated that modeled EOS at temperate DBF sites is controlled by light. Because solar radiation has a small inter-annual variability compared to temperature, which drives SOS, the predictive skill of the model for EOS is expected to be rather small. It is also likely that other biotic and climatological controls not simulated in the model are responsible for the observed EOS variability.

[84] The Swiss Lowland validation experiment demonstrated that PROGNOSTIC has skill not only at inter-annual and local scales but also at decadal and regional scales. The model reproduces the difference between years with late SOS in the 1980s and years with early SOS in the 1990s. A prognostic phenology model trained with only 7 years of MODIS data over a regional spatial domain can thus predict several decades of SOS with inter-annual and decadal variability comparable to a completely independent, 49-year-long phenological validation record composed of ground observations of flowering and bud burst of individual species.

5. Conclusions and Outlook

[85] Most current diagnostic satellite-derived phenology data sets are unreliable during the cloudy wet season and the aerosol-laden dry season in tropical climates, and during the dark winter and snow-covered spring in boreal and arctic climates. Simple empirical phenology models like the Spring Indices [Schwartz et al., 2006] or the statistical Spring Plant [Rutishauser et al., 2007] successfully reproduce inter-annual to decadal variability for the temperate DBF biome. On the other hand, as reviewed in the introduction of this study, most mechanistic representations of global phenology used in climate models fail to reproduce observations when applied to a broad range of climate zones and ecosystem types. This mismatch can be attributed both to a weak understanding of phenological processes [Kucharik et al., 2006] and to the uncertainty of empirical model parameters, which are often derived from a few local observations in temperate ecosystems and then applied globally in climate model simulations.

[86] A comparison of Figures 1 and 3 demonstrates that these uncertainties can be successfully mitigated across a range of climate zones by remote sensing data assimilation. Modeled drought-deciduous phenology improves the most, because current phenology models represent it poorly and satellite data are highly reliable in subtropical climates. Moisture and light control parameters can be successfully constrained for such ecosystems. Furthermore, vapor pressure deficit (vpd) is found to be a more suitable predictor of moisture control than rainfall, without requiring an explicit representation of soil moisture processes. Tropical EBF is revealed to be light controlled during the wet season and to green up during the dry season. High uncertainty of satellite observations and low seasonal variability of FPAR and LAI result in a moderate uncertainty of the underlying light control parameters. However, a phenology model constrained by only a few high-quality satellite observations effectively overcomes the deficiency of most current satellite-only phenological data sets in tropical areas.
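Why uncertain observations leave parameters only moderately constrained can be illustrated with a generic stochastic ensemble Kalman filter (EnKF) parameter update. This is a sketch under stated assumptions, not the authors' exact assimilation scheme: the function names, the linear toy observation operator and all numbers are hypothetical. The same prior parameter ensemble is narrowed much less when the observation carries a large error variance.

```python
import numpy as np


def enkf_parameter_update(theta, predicted, obs, obs_var, rng):
    """One stochastic ensemble Kalman filter analysis step for parameters.

    theta:     (n_members, n_params) prior parameter ensemble
    predicted: (n_members, n_obs) model-predicted observations per member
    obs:       (n_obs,) observations, e.g. FPAR or LAI
    obs_var:   (n_obs,) observation error variances
    rng:       numpy random Generator used to perturb the observations
    """
    n = theta.shape[0]
    d_theta = theta - theta.mean(axis=0)          # parameter anomalies
    d_pred = predicted - predicted.mean(axis=0)   # predicted-observation anomalies
    c_tp = d_theta.T @ d_pred / (n - 1)           # parameter/observation covariance
    c_pp = d_pred.T @ d_pred / (n - 1)            # predicted-observation covariance
    gain = c_tp @ np.linalg.inv(c_pp + np.diag(obs_var))
    # Perturb the observations with their error so the analysis ensemble
    # keeps a statistically consistent spread (stochastic EnKF).
    noise = rng.normal(0.0, np.sqrt(obs_var), size=(n, obs.size))
    return theta + (obs + noise - predicted) @ gain.T


def toy_model(theta):
    """Hypothetical linear operator mapping a parameter to an observation."""
    return 2.0 * theta


rng = np.random.default_rng(0)
prior = rng.normal(1.0, 0.5, size=(500, 1))  # prior ensemble of one parameter
obs = np.array([3.0])                         # synthetic observation (theta = 1.5)

precise = enkf_parameter_update(prior, toy_model(prior), obs, np.array([0.01]), rng)
noisy = enkf_parameter_update(prior, toy_model(prior), obs, np.array([1.0]), rng)
print(precise.mean(), precise.std())  # mean close to 1.5, small spread
print(noisy.mean(), noisy.std())      # pulled only partway toward 1.5, larger spread
```

With the precise observation the analysis ensemble collapses tightly around the true value, whereas the noisy observation leaves a much wider parameter distribution, analogous to the light control parameters of tropical EBF.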

[87] The model further reproduces the inter-annual to decadal variability of SOS, with correlations between 0.6 and 0.9 against independent ground observations of phenological phases. However, the long-term validation data sets employed [van Vliet et al., 2003; Rutishauser et al., 2007] are only available for temperate DBF and MF ecosystems, while arctic, mediterranean and tropical ecosystems generally lack such validation sources. Since we suggest that current parameterizations of drought-deciduous phenology need to be improved, a broader set of phenology data sets, such as those described in Cleland et al. [2007], needs to be available for validation. Modern phenological observation methods like NEE measurements from FLUXNET [Baldocchi et al., 2001; Friend et al., 2007] or camera-based vegetation indices [Richardson et al., 2007; Ahrends et al., 2008] are also valuable validation tools, since they provide an integrated view of ecosystem states.
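The skill scores quoted in this section (a correlation R and a bias in days) can be computed from paired yearly SOS series as sketched below. This assumes Pearson's R and a bias defined as the mean of modeled minus observed SOS, so a negative bias means modeled SOS is too early; the function name and the SOS values are hypothetical illustrative numbers, not data from this study.

```python
import numpy as np


def sos_skill(modeled, observed):
    """Return (Pearson R, mean bias in days) for two SOS series (day of year).

    Bias is mean(modeled - observed): negative means modeled SOS is earlier.
    """
    modeled = np.asarray(modeled, dtype=float)
    observed = np.asarray(observed, dtype=float)
    r = np.corrcoef(modeled, observed)[0, 1]
    bias = np.mean(modeled - observed)
    return r, bias


# Hypothetical yearly SOS values (day of year), for illustration only
observed = [110, 115, 105, 120, 112]
modeled = [100, 104, 96, 111, 101]
r, bias = sos_skill(modeled, observed)
print(f"R = {r:.2f}, bias = {bias:+.1f} days")  # R = 0.98, bias = -10.0 days
```

R and bias measure different things: a model can track inter-annual variability closely (high R) while being systematically early, which is the pattern reported for the 25% threshold.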

[88] This study is a first step toward improving phenological parameterizations such as those in IBIS, CN or TRIFFID, with a high potential to mitigate uncertainties in the global water and carbon cycles simulated by current climate models [Friedlingstein et al., 2006]. In support of Kathuroju et al. [2007], our prognostic phenology model, based on satellite remote sensing data from a new sensor like MODIS, can be superior to the data on which it is based. For instance, it can predict surface biophysical states when satellite observations are of poor quality. It can be used for numerical weather forecasting or seasonal climate prediction, where both vegetation structural parameters and climate control parameters need to be known to predict the state of land surface vegetation before satellite data become available. It is also a valuable tool for addressing climate change questions. For instance, a hypothesis arising from our findings is that the recently observed advance of northern hemisphere SOS might come to a halt if temperate deciduous vegetation switches from the temperature-driven SOS seen in the early, warm springs of extreme years today and in the past [Rutishauser et al., 2008] to a possibly light-limited SOS in a predicted future climate [Intergovernmental Panel on Climate Change, 2007a].


[89] The NASA Energy and Water Cycle Study (NEWS) grant No. NNG06CG42G is the main funding source of this study. Computing resources were mainly provided by sub-contract 2207-06-016 issued by Science System and Application Inc. through NASA contract NAS5-02041. The MODIS Science Team and the MODIS Science Data Support Team provided the MOD15A2 and the MOD12Q1 data. Meteorological predictor data were provided by the site PIs and their teams participating in the CarboEurope IP, AmeriFlux and LBA projects as part of FLUXNET: Marc Aubinet (Vielsalm), Christian Bernhofer (Tharandt), Riccardo Valentini (Castelporziano and Collelongo), Tuomas Laurila (Kaamanen), Timo Vesala (Hyytiälä), Maria Jose Sanz (El Saler), Serge Rambal (Puechabon), Andre Granier (Sarrebourg), Mike Goulden (Santarem Km83), Steven Wofsy (Santarem Km67 and BOREAS NSA Old Black Spruce), Hans Peter Schmid (Morgan Monroe State Forest), Brian Amiro (BOREAS NSA Old Black Spruce), Lawrence Flanagan (Lethbridge), Tilden Meyers (Fort Peck and Bondville), Bill Munger (Harvard Forest), Russ Monson (Niwot Ridge), Kyaw Paw U (Wind River) and Dennis Baldocchi (Tonzi and Vaira Ranch). The first author is grateful to Arif Albaryrak (NASA/GSFC GMAO) for his advice and comments on the ensemble data assimilation methodology.