Carbon cycle data assimilation with a generic phenology model



[1] Photosynthesis by terrestrial plants is the main driver of the global carbon cycle, and the presence of actively photosynthesizing vegetation can now be observed from space. However, challenges remain when translating remotely sensed data into carbon fluxes. One reason is that the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR), which documents the presence of photosynthetically active vegetation, relates more directly to leaf development and leaf phenology than to photosynthetic rates. Here, we present a new approach for linking FAPAR and vegetation-to-atmosphere carbon fluxes through variational data assimilation. The scheme extends the Carbon Cycle Data Assimilation System (CCDAS) by a newly developed, globally applicable and generic leaf phenology model, which includes both temperature and water-driven leaf development. CCDAS is run for seven sites, six of them included in the FLUXNET network. Optimization is carried out simultaneously for all sites against 20 months of daily FAPAR from the Medium Resolution Imaging Spectrometer on board the European Space Agency's ENVISAT platform. Fourteen parameters related to phenology and 24 related to photosynthesis are optimized simultaneously, and their posterior uncertainties are computed. We find that with one parameter set for all sites, the model is able to reproduce the observed FAPAR spanning boreal, temperate, humid-tropical, and semiarid climates. Assimilation of FAPAR has led to reduced uncertainty (by >10%) of 10 of the 38 parameters, including one parameter related to photosynthesis, and a moderate reduction in net primary productivity uncertainty. The approach can easily be extended to regional or global studies and to the assimilation of further remotely sensed data.

1. Introduction

[2] The terrestrial biosphere not only responds to climate fluctuations and change; it is also an active part of the Earth's climate system. By taking up or releasing carbon, it directly affects the levels of CO2 in the atmosphere, influencing climate [Friedlingstein et al., 2006]. Therefore, understanding how CO2 fluxes between the atmosphere and the terrestrial biosphere respond to interannual fluctuations of today's climate is crucial for sound projections of future climate change [Ciais et al., 2005; Knorr et al., 2005b; Zeng et al., 2005]. Capturing changes of the terrestrial biosphere by satellites is an especially interesting method because of its global coverage. The presence of healthy vegetation can be captured well from space, because it exhibits a strong contrast in reflectance between the visible and the near-infrared part of the solar spectrum [Verstraete et al., 1996]. This feature is robust across all land plants because it has its roots in an adaptive mechanism: visible light is photosynthetically active, while near-infrared light is not, and therefore its absorption is avoided to reduce overheating and transpirational loss [Jones, 1983]. This robustness of spectral features has led to the design of various vegetation indices [Deering et al., 1975; Verstraete and Pinty, 1996; Huete, 2007], but the quantity best related to the phenomenon just discussed is the Fraction of Absorbed Photosynthetically Active Radiation (FAPAR). In the context of this study, it is defined as the amount of radiation in the photosynthetically active range (0.4–0.7 μm) absorbed by healthy leaves or conifer needles divided by the total amount of radiation absorbed at the surface (i.e., one minus albedo in that spectral range) [Pinty et al., 2009]. Thus, an optically deep canopy composed entirely of leaves would be able to reach a theoretical maximum close to 1, while no vegetation always amounts to a value of 0.

[3] This study is based on the assumption that agreement between satellite-derived FAPAR and FAPAR computed by the vegetation model assures that, given the same illumination conditions, the modeled photon absorption by leaves is consistent with what was estimated during the satellite retrieval. However, it must be borne in mind that the proper interpretation of FAPAR data requires understanding of the various physical processes that impact the reflected light on its course through the atmosphere and the canopy before it arrives at the sensor. This understanding has been embedded in radiative transfer models that are then used in inversion methods to retrieve vegetation parameters from remote sensing data. Various mathematical tools have been developed that minimize the effects due to the scattering by atmospheric particles, the brightness of soils and the changing geometry of illumination and observation.

[4] Current operational remotely sensed FAPAR products are mainly derived from medium-resolution satellite instruments to provide regional and global operational FAPAR products at a variety of spatial and temporal scales. They have been designed for various sensors, such as the Moderate Resolution Imaging Spectroradiometer (MODIS [Knyazikhin et al., 1998a]), the Multiangle Imaging Spectro-Radiometer (MISR [Knyazikhin et al., 1998b]), the Sea-viewing Wide Field-of-view Sensor (SeaWiFS [Gobron et al., 2002]) and the Medium Resolution Imaging Spectrometer (MERIS [Gobron et al., 2008]), as part of various projects, such as LANDSAF [Roujean and Bréon, 1995] and CYCLOPES [Baret et al., 2007], and as a derived product of broadband surface albedo [Pinty et al., 2007].

[5] FAPAR products have been used to monitor large-scale changes in vegetation status [Gobron et al., 2005, 2010] and have then been related to large-scale changes in land-atmosphere CO2 fluxes via algorithms for net primary productivity (NPP) [Cao et al., 2004] or by validation with results from vegetation modeling [Zeng et al., 2005; Knorr et al., 2005a, 2007]. The latter approach raised the prospect of eventually identifying all major processes at the ecosystem level that are responsible for the climate response and including them in an assimilation system, opening up the way for improving future assessments of climate change impacts. This has been taken a step further by the Carbon Cycle Data Assimilation System (CCDAS [Rayner et al., 2005]). Here, observational evidence for interannual changes in atmosphere-biosphere carbon fluxes is assimilated into a terrestrial vegetation model by constraining prior uncertainty ranges of process parameters. This approach allows the use of data that are only rather indirectly related to the process studied (in the case cited, concentrations of atmospheric CO2 at remote flask sampling sites), which would normally exclude them from being used for anything but validation of model results.

[6] This advantage of using indirect information for constraining mathematical process descriptions is also valid for the use of FAPAR for constraining rates of carbon fluxes and even photosynthesis. Even though FAPAR has been used in the past to infer NPP by multiplying with incident photosynthetically active radiation (PAR) and some efficiency factor [Monsi and Saeki, 1953; Prince, 1991; Ruimy et al., 1994], the presence of leaves is determined not by the rate of photosynthesis, but by the adaptation of the plant to environmental stresses such as frost or drought [Woodward, 1987], a process described as leaf phenology. Therefore, CCDAS used FAPAR not as an input quantity for an efficiency NPP model, but as a constraint on a small number of model parameters independently for each grid cell of the global model. This has been carried out in a first assimilation stage [Knorr and Schulz, 2001], which was not included in the overall estimation of parameter uncertainties [Rayner et al., 2005] and their propagation to diagnostic and prognostic uncertainties of carbon fluxes [Scholze et al., 2007].

[7] The main aim of the present study is therefore to report on progress in the further development of CCDAS, where the assimilation of FAPAR data has been fully integrated into the variational stage of the system. What is meant here is that a large set of parameters is optimized for a multitude of locations simultaneously by minimizing a global cost function, and the second derivative of the cost function is used to infer improvements in parameter accuracy from the prior to the posterior estimate. Details of the approach are given in section 2. The wider aim is to assess more generally what information is added by including FAPAR in a modeling framework aimed at simulating NPP, in addition to information from climate, land use and soils data. This second aim is motivated by the widely used efficiency approach for modeling carbon fluxes from FAPAR [Field et al., 1998; Zhao et al., 2005; Turner et al., 2009] (i.e., where FAPAR is used as an input parameter and the models therefore can run only for time periods where such observations are available), and by the existence of satellite-derived net (NPP) and gross primary production (GPP) products [Zhao et al., 2005; Sims et al., 2008].

2. Description of CCDAS

[8] The setup, data and models used in CCDAS have been described by Scholze [2003], Rayner et al. [2005], and Scholze et al. [2007] to which we refer for details regarding the formulation. In brief, BETHY, the core CCDAS model, is a process-based model of the terrestrial biosphere [Knorr, 2000]. It simulates carbon assimilation and plant and soil respiration embedded within a full energy and water balance and phenology scheme. BETHY is a fully prognostic model, and is thus able to predict the future evolution of the terrestrial carbon cycle under a prescribed climate scenario. Global vegetation is mapped onto 13 plant functional types (PFT) based on Wilson and Henderson-Sellers [1985] (see Table 1 for the PFTs covered by this study). Each grid cell of arbitrary size can contain up to three different PFTs, with the amount specified by their fractional coverage. As mentioned, all previous CCDAS studies have used a two-stage inversion procedure, where the first stage uses the full BETHY model to assimilate fields of FAPAR derived from satellite data, thereby optimizing parameters controlling soil moisture and phenology. The second stage uses a reduced version of BETHY with no phenology scheme and no water balance, to assimilate atmospheric CO2 concentration observations from a global station network [GLOBALVIEW-CO2, 2004]. This simplified form of the model uses the leaf area index (LAI) and plant available soil moisture fields provided by the first stage after optimization.

Table 1. List of the Plant Functional Types With Their Corresponding Numbers in the Global Version of BETHY, and the Associated Phenology(a)

| PFT Number | PFT Name | Phenology Type |
|---|---|---|
| 1 | tropical broadleaf evergreen tree | warm-evergreen |
| 2 | tropical broadleaf deciduous tree | warm-deciduous |
| 4 | temperate deciduous tree | cold-deciduous |
| 5 | evergreen coniferous tree | cold-evergreen |
| 8 | deciduous understorey shrub | cold-deciduous |
| 9 | C3 grass | grass |
| 10 | C4 grass | grass |

(a) PFT, plant functional type.

[9] The assimilation of FAPAR data in CCDAS [Knorr and Schulz, 2001] is based on minimization of the difference between satellite and model-derived FAPAR. Within BETHY, FAPAR is calculated as the vertical integral of absorption of photosynthetically active radiation by healthy green leaves divided by the difference between the incoming and outgoing radiation flux at the top and bottom of the canopy. This integration is carried out by a two-flux scheme, which takes into account soil reflectance, solar angle and amount of diffuse radiation. Equating satellite and model FAPAR means that given the same illumination conditions, the same number of photons enter the photosynthetic mechanism of the vegetation, even if some of the assumptions differ between BETHY and the model used to derive FAPAR [Gobron et al., 2000]. It also means that FAPAR in the model is defined only with respect to the absorption by photosynthesizing plant parts [Pinty et al., 2009], which is consistent with the definition used for deriving the MERIS FAPAR product.

[10] The second stage then allows the rigorous propagation of uncertainties as described by Kaminski et al. [2002, 2003] and Rayner et al. [2005] and demonstrated by Kaminski et al. [2002], Rayner et al. [2005], and Scholze et al. [2007]. It uses a probabilistic framework, described in detail by Tarantola [1987] or Enting [2002], who also gives an exhaustive overview on applications to biogeochemistry.

[11] The state of information on a specific physical quantity is conveniently formulated in terms of a probability density function (PDF). The prior information is quantified by a PDF in the space of control variables (here: process parameters of BETHY and the initial atmospheric CO2 concentration), and the observational information by a PDF in the space of observations. Their respective means are denoted by x0 and d and their respective covariance matrices by C0 and Cd. Note that Cd has to account for uncertainties in the observations and uncertainties from errors in simulating their counterpart. We approximate the posterior PDF by a Gaussian with mean xpost and covariance matrix Cpost. The mean is the minimum of the following cost function:

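$$J(x) = \frac{1}{2}\left[\left(M(x)-d\right)^{T} C_d^{-1} \left(M(x)-d\right) + \left(x-x_0\right)^{T} C_0^{-1} \left(x-x_0\right)\right] \tag{1}$$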

where M(x) denotes the model operated as a mapping of the control variables onto simulated counterparts of the observations. In practice, the minimization of J is performed iteratively by a gradient algorithm, and the search direction is determined via the gradient of J, evaluated by adjoint code. The use of adjoint model code greatly enhances computational performance of the nonlinear optimization.
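The structure of this minimization can be illustrated with a minimal sketch. It assumes the standard Gaussian form of the cost function (a quadratic model-data misfit weighted by the inverse of Cd plus a quadratic prior term weighted by the inverse of C0). The linear toy model `A`, the synthetic data, the fixed step size and the function name `cost_and_gradient` are hypothetical choices for illustration; CCDAS itself uses adjoint code generated from BETHY and a more efficient gradient algorithm than plain gradient descent.

```python
import numpy as np

def cost_and_gradient(x, x0, C0_inv, d, Cd_inv, model, jacobian):
    """Return J(x) and its gradient for Gaussian prior and observation PDFs."""
    r_obs = model(x) - d   # misfit between simulated and observed quantities
    r_par = x - x0         # departure from the prior mean
    J = 0.5 * (r_obs @ Cd_inv @ r_obs + r_par @ C0_inv @ r_par)
    grad = jacobian(x).T @ Cd_inv @ r_obs + C0_inv @ r_par
    return J, grad

# Hypothetical toy problem: a linear "model" M(x) = A x with two parameters.
A = np.array([[1.0, 0.5], [0.0, 2.0], [1.0, 1.0]])
model = lambda x: A @ x
jacobian = lambda x: A            # stands in for adjoint-generated derivatives
x0 = np.zeros(2)                  # prior mean
C0_inv = np.eye(2)                # normalized parameters: C0 is the identity
d = np.array([1.0, 2.0, 2.0])     # synthetic observations
Cd_inv = np.eye(3) / 0.1**2       # observational standard deviation of 0.1

x = x0.copy()
for _ in range(200):              # fixed-step gradient descent (illustrative only)
    J, grad = cost_and_gradient(x, x0, C0_inv, d, Cd_inv, model, jacobian)
    x = x - 1.5e-3 * grad
J, grad = cost_and_gradient(x, x0, C0_inv, d, Cd_inv, model, jacobian)
```

For a linear model the minimum can also be obtained in closed form, which makes the toy case a convenient check of the gradient code; for the nonlinear BETHY model only the iterative route is available.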

[12] We approximate the covariance matrix of the model parameters as

$$C_{post} \approx H(x_{post})^{-1} \tag{2}$$

where H(xpost) denotes the Hessian matrix of J, i.e., the matrix composed of its second partial derivatives $\partial^2 J / \partial x_i \partial x_j$. Since the dimension of xpost never exceeds a few hundred, it is computationally feasible to evaluate the full Hessian by running efficient second derivative code.
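A small numerical sketch may help to show how the inverse Hessian translates into a reduction of parameter uncertainty. It assumes a linear toy model, for which the Hessian of the Gaussian cost function is exact and constant; the matrix `A`, the observational error and the helper names are hypothetical. In CCDAS the Hessian is evaluated by second derivative code for the nonlinear model.

```python
import numpy as np

def posterior_covariance(hessian):
    """Approximate the posterior covariance by the inverse Hessian of J."""
    return np.linalg.inv(hessian)

def uncertainty_reduction(C0, C_post):
    """Relative reduction 1 - sigma_post / sigma_prior for each parameter."""
    return 1.0 - np.sqrt(np.diag(C_post) / np.diag(C0))

# Hypothetical toy case: two observations that constrain only the first parameter.
A = np.array([[1.0, 0.0], [1.0, 0.0]])
C0 = np.eye(2)                      # normalized prior: unit variance
Cd_inv = np.eye(2) / 0.5**2         # observational standard deviation of 0.5

# For a linear model the Hessian of J is A^T Cd^-1 A + C0^-1.
H = A.T @ Cd_inv @ A + np.linalg.inv(C0)
C_post = posterior_covariance(H)
reduction = uncertainty_reduction(C0, C_post)
```

In this toy case the first parameter's standard deviation drops to one third of its prior value, while the unobserved second parameter keeps its prior uncertainty.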

[13] The inverse step is followed by a second step, the estimation of a diagnostic or prognostic target quantity y. The corresponding PDF is approximated by a Gaussian with mean

$$y = N(x_{post}) \tag{3}$$

and covariance

$$C_y = N'(x_{post})\, C_{post}\, N'(x_{post})^{T} + C_{y,mod} \tag{4}$$

where N(x) is the model operated as a mapping of the control variables onto the target quantity. In other words, the model is expressed as a function of the vector of its parameters x and returns a vector of quantities of interest, for example the rate of photosynthesis at some desired time step. N′(xpost), the Jacobian matrix of N, is its linearization around xpost, and Cy,mod is the uncertainty in the simulation of y resulting from errors in the model. In the hypothetical case that the model was perfect, only the first term would contribute to Cy. On the other hand, if the control variables were known to perfect accuracy, only the second term would contribute to Cy.
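The two contributions to the uncertainty of a target quantity can be sketched as follows, assuming the usual linearized error propagation (Jacobian times posterior covariance times transposed Jacobian, plus a model error term). The numbers and the function name `propagate_uncertainty` are hypothetical and serve only to make the two terms visible.

```python
import numpy as np

def propagate_uncertainty(N_jac, C_post, C_y_mod):
    """Linearized propagation of parameter uncertainty to a target quantity."""
    parameter_term = N_jac @ C_post @ N_jac.T   # vanishes for perfectly known parameters
    return parameter_term + C_y_mod             # C_y_mod vanishes for a perfect model

# Hypothetical example: one target quantity depending on two parameters.
N_jac = np.array([[2.0, 1.0]])        # assumed sensitivities of the target
C_post = np.diag([1.0 / 9.0, 1.0])    # assumed posterior parameter covariance
C_y_mod = np.array([[0.25]])          # assumed model error variance

C_y = propagate_uncertainty(N_jac, C_post, C_y_mod)
sigma_y = np.sqrt(C_y[0, 0])
```

Setting `C_y_mod` to zero reproduces the perfect-model case, and setting `C_post` to zero reproduces the case of perfectly known control variables.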

[14] The minimization of equation (1) and the propagation of uncertainties are implemented in a normalized parameter space with Gaussian prior. The normalization is such that parameter values are specified in multiples of their standard deviation, i.e., C0 is the identity (for details, see Kaminski et al. [1999] and Rayner et al. [2005]). We assume that Hessian eigenvalues less than 1 reflect small-scale noise. To remove this noise, Hessian eigenvalues around 1 are set to 1, as in the procedure detailed by Rayner et al. [2005].

[15] The basic setups of the previously used and the newly developed CCDAS are shown in Figure 1. The previous scheme consists of two stages, and satellite-derived FAPAR data are assimilated only in Stage 1. Since uncertainties of parameters and diagnostic quantities (such as NPP) are evaluated only in Stage 2, no information on the reduced uncertainty after assimilating FAPAR data is available. The purpose of this two-stage process is to reduce computational demand for the optimization in Stage 2, which is run only with part of the full BETHY model of Stage 1. Instead, the state variables of the phenology and hydrology model are passed from Stage 1 to Stage 2, where the reduced “carbon-BETHY” lacks those two components. The quantity assimilated in Stage 2 is atmospheric CO2, which requires an atmospheric transport model (TM2) as well as additional background CO2 fluxes. Those background fluxes, which enter directly into TM2, are not required for the new setup, but would be if this were applied at the global scale with assimilation of CO2 concentrations. This is currently under development.

Figure 1.

Sketch of the (left) previously used and (right) new CCDAS structures: ovals represent input and output data, and gray boxes represent calculation steps. Shown in bold are the state variables of the phenology and hydrology scheme. Diagnostics are quantities of interest such as carbon fluxes computed by CCDAS. Uncert., uncertainty; param., parameters.

[16] FAPAR in the new setup is directly assimilated into “CCDAS BETHY”, which includes the hydrology model of the previous full BETHY (with some slight modification, see below), and the newly developed phenology model which replaces the one of the full BETHY [Knorr, 2000]. Another modification, which is not depicted in Figure 1, is that BETHY can now be switched between modes where it runs for a multitude of sites simultaneously (as in this study, see section 4.3), globally, or simultaneously at sites and globally.

[17] By including the FAPAR assimilation into the previous Stage 2, the derivative-based framework can now be used to propagate uncertainties in observed FAPAR back to uncertainties in control variables and then forward to uncertainties in target quantities. This change, however, requires some modification to the model as it was present in full BETHY, which is the reason why a new phenology scheme had to be developed. The requirement is that the simulated FAPAR (and, with it, the cost function in equation (1)) and the carbon fluxes (see equation (3)) are at least twice differentiable functions of all process parameters. As explained below, this was not the case for the phenology scheme of the previous full BETHY. All derivative code of Stage 2 or the new CCDAS is generated from the model code by the automatic differentiation tool Transformation of Algorithms in Fortran (TAF) [Giering and Kaminski, 1998].

[18] The following changes to the BETHY version of Knorr [2000] were necessary: First, the phenology model was replaced by a new scheme described in section 3. Second, the soil evaporation model was simplified such that soil evaporation, Es, happens at the following rate:

$$E_s = E_{s,pot}\,\frac{w_s}{w_{s,max}} \tag{5}$$

where Es,pot is the potential rate determined by the net radiation at the soil surface, ws the amount of water in a second soil water bucket, and ws,max its maximum value. The modified BETHY works with two overlapping buckets, both of which have exactly the same inputs and outputs, but the shallower bucket, which overflows earlier, determines soil evaporation via equation (5), whereas the original, deeper bucket (maximum value Ws,max) determines transpiration rates. We set ws,max to the lower of 5 mm and Ws,max (see section 4.2). Third, the size of the large bucket is redistributed between tree/shrub and herbaceous PFTs such that the average Ws,max of the herbaceous PFTs at each grid cell is 30% of the average of the tree PFTs.

3. A Generic Global Phenology Scheme

[19] The purpose of this section is to explain the development of a new generic phenology scheme that captures all major phenology types of the global terrestrial biosphere with one set of equations. Its parameters are described in Table 2. Differences between functional types are formulated entirely through differences in parameters, as shown in Table 3. Other required features of the scheme are explained next.

Table 2. Summary of the Parameters of the Phenology Model

| Symbol | Description | Units |
|---|---|---|
| $\hat{\Lambda}$ | maximum leaf-area index | - |
| $T_\phi$ | temperature at leaf onset | °C |
| Tr | spatial range (1σ) of $T_\phi$ | °C |
| tc | day length at leaf shedding | hours |
| tr | spatial range (1σ) of tc | hours |
| ξ | initial linear leaf growth | d−1 |
| kL | inverse of leaf longevity, τL | d−1 |
| τW | length of dry spell before leaf shedding | days |

Table 3. Differentiation Between Phenology Types(a)

| PFT | Phenology Type | Growth Trigger: Temperature $T_\phi$ (°C) | Growth Trigger: Day Length tc (h) | Deciduousness: Temperature τL (days) | Deciduousness: Water τW (days) |
|---|---|---|---|---|---|
| 9, 10 | grass | 2 | 0* | 10 | 50 |

(a) Asterisks denote a fixed parameter value indicating that this limitation is not active. Dashes show that the parameter is not active because of the choice of $T_\phi$. All other parameter values are prior estimates and are optimized during data assimilation.

[20] The remainder of this section is laid out as follows: section 3.1 explains the consequences of using the derivative approach for the design of the phenology scheme. Section 3.2 explains the approach to subgrid variability necessary to ensure differentiability, and introduces parameters for temperature and light dependent growth triggers. These are discussed in detail in section 3.4, following section 3.3, which introduces the differential equation describing the time evolution of the LAI. Finally, section 3.5 discusses water and other limitations and their implementation via a maximum LAI, with a single parameter allowing for different water-use strategies between PFTs.

3.1. Requirements

[21] Currently available phenology schemes for global vegetation models have some issues that make them unsuitable for use in CCDAS. Firstly, the present phenology scheme of BETHY relies heavily on NPP to determine LAI under drought conditions [Knorr, 2000]. Several tests with CCDAS have shown that this leads to oscillations around an optimal point where LAI and NPP are in equilibrium. Those oscillations then cause sudden jumps in the cost function with extremely small changes in the control parameters. An alternative approach using soil water content directly to drive drought-limited LAI was not considered, because the amount of soil water sufficient to support a certain LAI depends on the local evaporative demand.

[22] More complex schemes such as the phenology scheme of SILVAN [Kaduk and Heimann, 1996] or logro-P (C. Reick, personal communication, 2010, implemented by Raddatz et al. [2007]) use a number of switches, or “triggers”, to change the state of the vegetation between dormant, growth or senescent. With such a formulation involving discrete state variables, however, the cost function in equation (1) cannot be differentiated where one of those states changes. One goal of the present work is therefore to find a globally applicable but reasonably simple phenology scheme that is differentiable everywhere and has as few sudden changes as possible.

[23] In the following we present such a newly devised phenology scheme with state variables that depend on the process parameters in differentiable form. The complexity of the formulation and the number of process parameters have been kept low. Nevertheless, the scheme allows for both temperature and water limitation and is able to represent the major global phenology types, namely cold-deciduous, warm-deciduous, cold-evergreen, moist-evergreen, grass and annual crop. This constitutes an important advance over similar global schemes that have been derived from remotely sensed information, but either consider only temperature limited phenology, or use only soil moisture without considering evaporative demand [Botta et al., 2000].

3.2. A General Spatial Approach

[24] The phenology models cited above use triggers that set vegetation instantly from a dormant to an active, and again to a senescent stage. However, in reality over an area the size of a model grid cell, such transitions will happen at different times because either the environmental conditions or the exact point of the trigger, or both vary. Apart from being unrealistic, the sudden change of state will create a nondifferentiable dependency of the state variables on the control parameters (see section 3.4).

[25] Here we assume that spatial variability within a grid cell is entirely the result of differences in the threshold parameter defining the trigger, effectively subsuming impacts of small-scale climatic variability under the same. This parameter is assumed to have a Gaussian probability distribution in space. There are two of those threshold parameters: $\tilde{T}_\phi$ and $\tilde{t}_c$. It is important to note that the transition to the active state requires both T > $\tilde{T}_\phi$ and td > $\tilde{t}_c$, where T is a temperature and td the length of day. The tilde (∼) denotes that these are parameters or state variables of individual plants within a grid cell. As shown next, these parameters are integrated over their probability distribution, replacing the integration across the space of the grid cell.

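[26] Before proceeding to the spatial integration, we define a generic differential equation in time for the LAI of individual plants, $\tilde{\Lambda}(t)$:

$$\frac{d\tilde{\Lambda}(t)}{dt} = \begin{cases} f_1 & \text{if } T > \tilde{T}_\phi \text{ and } t_d > \tilde{t}_c \\ f_2 & \text{otherwise} \end{cases} \tag{6}$$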

where f1 and f2 are some arbitrary functions of the state of the vegetation.

[27] In this discrete formulation, the response of LAI to changes in $\tilde{T}_\phi$ or $\tilde{t}_c$ is usually nondifferentiable at the threshold. The continuous version of equation (6), which is valid for the spatially integrated LAI, Λ(t), resolves to an integral over the Gaussian probability density functions (PDF), p and q, of the two trigger variables:

$$\frac{d\Lambda(t)}{dt} = f_1 \int_{-\infty}^{T} p(\tilde{T}_\phi)\,d\tilde{T}_\phi \int_{-\infty}^{t_d} q(\tilde{t}_c)\,d\tilde{t}_c + f_2 \left[1 - \int_{-\infty}^{T} p(\tilde{T}_\phi)\,d\tilde{T}_\phi \int_{-\infty}^{t_d} q(\tilde{t}_c)\,d\tilde{t}_c\right] \tag{7}$$

The spatial PDF p is characterized by a mean $T_\phi$ and its standard deviation Tr, while the mean of q is tc and the standard deviation tr. All four are CCDAS control parameters. Note the distinction between these two spatial PDFs and the fact that their four parameters have again PDFs in the Bayesian sense in the same way as all other control parameters.

[28] The previous expression simplifies to

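$$\frac{d\Lambda(t)}{dt} = f_1\,f + f_2\,(1-f) \tag{8}$$

$$f = \Phi\!\left(\frac{T - T_\phi}{T_r}\right) \Phi\!\left(\frac{t_d - t_c}{t_r}\right) \tag{9}$$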

where Φ is the cumulative normal distribution. f is the fraction of plants within the proportion of a grid cell occupied by each PFT that are actively growing or maintaining leaves.
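The active fraction can be sketched in a few lines, assuming that f is the product of two cumulative normal distributions, one for the temperature threshold and one for the day length threshold, as follows from integrating two independent Gaussian thresholds. The function names and the parameter values in the example are hypothetical.

```python
import math

def normal_cdf(z):
    """Cumulative standard normal distribution, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def active_fraction(T, t_d, T_phi, T_r, t_c, t_r):
    """Fraction of plants of one PFT that are growing or maintaining leaves.

    T, t_d     : phenology-determining temperature (deg C) and day length (h)
    T_phi, T_r : mean and spatial standard deviation of the temperature threshold
    t_c, t_r   : mean and spatial standard deviation of the day length threshold
    """
    return normal_cdf((T - T_phi) / T_r) * normal_cdf((t_d - t_c) / t_r)

# Hypothetical values: threshold of 10 deg C with 2 deg C spatial spread,
# day length threshold of 10.5 h with 0.5 h spread, on a 14 h day.
for T in (6.0, 8.0, 10.0, 12.0, 14.0):
    print(T, round(active_fraction(T, 14.0, 10.0, 2.0, 10.5, 0.5), 3))
```

Unlike a hard trigger, which would jump from 0 to 1 at the threshold, this fraction changes smoothly with both the driving variables and the four parameters, which is what the derivative code requires.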

3.3. Time Evolution of LAI

[29] Describing the generic formulation for the time evolution of LAI of a single plant, which is then integrated spatially via equation (8), requires definition of f1 (for plants in their growing season) and f2 (for senescent plants). For f1, we assume the simplest formulation that satisfies the following two conditions: leaf growth starts immediately and is not limited by substrate availability, such as LAI itself; and growth stops if a target LAI is reached that is in balance with the environmental limitations, described as Λmax. These conditions are met by the following formulation:

$$f_1(\Lambda) = \xi\,(\Lambda_{max} - \Lambda) \tag{10}$$

where ξ is a linear growth constant describing the increase in LAI per time unit shortly after bud burst. This rate is chosen to be independent of carbon gains (NPP), because initial leaf development relies on buds and reserves from the previous year [Kaduk and Heimann, 1996]. This formulation differs from those used in similar applications, such as logro-P or the one by Liu et al. [2008], where the initial growth is exponential, resulting in a logistic function for the time integral under constant conditions. Equation (10) (with f = 1) results in a time dependence described by Λ(t)/Λmax = 1 − exp(−ξt) for Λ(0) = 0, which is approximately linear in t for small t. The advantage of this approach is that it does not require setting a minimum LAI to set off growth (0.1 in the work by Liu et al. [2008]). This would not work here, because Λmax might be less than such a minimum value. The difference in approach can be explained by the fact that the work just cited is restricted to temperature-controlled phenology and does not include situations where either the temperature or the water balance allows only small values of LAI.
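As a worked check of the statements above, assume that the growth law for f = 1 reads dΛ/dt = ξ(Λmax − Λ), which is the form implied by the quoted solution. Integrating from Λ(0) = 0 and expanding for small ξt gives

$$\Lambda(t) = \Lambda_{max}\left(1 - e^{-\xi t}\right) = \Lambda_{max}\left(\xi t - \tfrac{1}{2}(\xi t)^2 + \dots\right) \approx \Lambda_{max}\,\xi\,t \quad \text{for } \xi t \ll 1 .$$

By contrast, a logistic law of the form dΛ/dt = kΛ(1 − Λ/Λmax), with k a hypothetical rate constant, has Λ = 0 as a fixed point, so growth can start only from a nonzero initial LAI. This is why such schemes need a prescribed minimum LAI and the formulation used here does not.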

[30] For those plants that are outside their growth stage, we again chose the simplest formulation that allows accommodating both deciduous and evergreen phenology:

$$f_2(\Lambda) = -\frac{\Lambda}{\tau_L} \tag{11}$$

The new parameter, τL, which is related to leaf longevity, describes how quickly leaves are shed, or whether they stay inactive until the next growing season. Deciduous vegetation will normally shed leaves (which includes leaves turning brown according to our definition of FAPAR) within days to weeks. Evergreen vegetation, on the other hand, should have values on the order of a year or more.

[31] We now consider evolution of the spatially integrated grid-cell average LAI. Inserting equation (10) and equation (11) into equation (8) yields:

$$\frac{d\Lambda(t)}{dt} = \xi\,(\Lambda_{max} - \Lambda)\,f - \frac{\Lambda}{\tau_L}\,(1-f) \tag{12}$$

In order to find a convenient form for integrating this expression, we define

$$r = \xi f + \frac{1-f}{\tau_L} \tag{13}$$

$$\Lambda_{lim} = \frac{\xi\,\Lambda_{max}\,f}{r} \tag{14}$$

so that equation (12) takes the form:

$$\frac{d\Lambda(t)}{dt} = r\,(\Lambda_{lim} - \Lambda) \tag{15}$$
equation image

[32] As long as f and Λmax (and therefore r and Λlim) do not depend on t, the equation above has the following solution:

$$\Lambda(t) = \Lambda_{lim} - \left[\Lambda_{lim} - \Lambda(0)\right] e^{-r t} \tag{17}$$

[33] Here it is sufficient to state that Λmax depends on quantities that are updated either daily or every few days, while f depends on daily values of temperature and day length. Therefore, the last equation can be used to integrate over a single daily time step of the phenology scheme, Δt. This mixture of analytical and numerical integration is not only highly efficient, it also ensures stability, i.e., it avoids negative Λ as long as Λlim and Λ(0) are nonnegative.
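One daily update of this kind can be sketched as follows. The sketch assumes growth at rate ξ(Λmax − Λ) for the active fraction f and decay at rate Λ/τL for the remainder, which leads to a relaxation rate r = ξf + (1 − f)/τL toward Λlim = ξΛmax f/r. The function name `lai_step` and the parameter values are hypothetical.

```python
import math

def lai_step(lai, f, lai_max, xi, tau_l, dt=1.0):
    """Advance grid-cell LAI by one time step dt (days), holding f and lai_max fixed.

    lai     : LAI at the start of the step
    f       : active fraction of plants (0..1)
    lai_max : target LAI set by water and structural limitations
    xi      : initial linear growth constant (1/day)
    tau_l   : leaf longevity time scale (days)
    """
    r = xi * f + (1.0 - f) / tau_l          # relaxation rate
    if r == 0.0:                            # neither growth nor shedding is active
        return lai
    lai_lim = xi * lai_max * f / r          # LAI approached for t -> infinity
    return lai_lim - (lai_lim - lai) * math.exp(-r * dt)

# Hypothetical spring-to-autumn sequence: 60 active days, then 60 dormant days.
lai = 0.0
for day in range(120):
    f = 1.0 if day < 60 else 0.0
    lai = lai_step(lai, f, lai_max=4.0, xi=0.5, tau_l=10.0)
```

Because each step is the exact solution of the linear equation for constant coefficients, the update is stable for any step length and never produces negative LAI when the inputs are nonnegative.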

3.4. Temperature and Day Length Requirements

[34] Determining the date of leaf onset in cold-seasonal climates is often approached by the concept of growing-degree days (GDDs), defined as the sum of the daily mean temperatures minus a threshold temperature, TGDD, as long as this contribution is positive (see the discussion by Botta et al. [2000]). In the simplest approach, the parameters of the scheme would be the critical GDD (GDDc), and TGDD. GDD approaches attach the same weight to temperatures in the past, and therefore must be reset at certain times per year, for example at the beginning of January for sites in the northern temperate zone. For a global model, this might lead to complications when determining the appropriate date of the reset. Further, the LAI at a given day with a given GDD will be some positive number if GDD ≥ GDDc, but 0 if GDD < GDDc. If GDD is close to GDDc, then a small change in GDDc can lead to a jump in the LAI at leaf onset. The LAI is therefore not always differentiable with respect to the parameter GDDc. The same applies to TGDD because it changes GDD. Thus, the GDD concept produces nondifferentiable dependencies of the state variables on the parameters.

[35] Instead of the GDD sum, our approach uses a phenology determining temperature, T, defined as the normalized integral over the 2m air temperature, T2m, with exponentially declining weights attached when going back into the past:

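$$T(t) = \frac{\int_{-\infty}^{t} T_{2m}(t')\,e^{-(t-t')/\tau_m}\,dt'}{\int_{-\infty}^{t} e^{-(t-t')/\tau_m}\,dt'} \tag{18}$$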

This is equivalent to an exponentially declining memory of the plants for the ambient temperature.

[36] The remaining parameter is τm, the averaging period for T. Decreasing τm means that the temperature trigger uses more recent temperature data, so that the threshold is reached earlier and the vegetation period is shifted forward in the season. However, if τm becomes too short, T oscillates because short-term fluctuations in T2m are not dampened any more. Because this leads to instabilities in the optimization, the parameter is held fixed at a value of 30 days. This is not an undue limitation of the model, because the optimization already has the possibility to both extend the length of the growing season and shift it in time through changing both $T_\phi$ and tc.

[37] It is computationally favorable to bring equation (18) to an incremental form:

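$$T(t+\Delta t) = e^{-\Delta t/\tau_m}\,T(t) + \frac{1}{\tau_m}\int_{t}^{t+\Delta t} T_{2m}(t')\,e^{-(t+\Delta t-t')/\tau_m}\,dt' \tag{19}$$

If Δt is the time step of the model and thus T2m constant over that period, then the expression simplifies to

$$T(t+\Delta t) = e^{-\Delta t/\tau_m}\,T(t) + \left(1 - e^{-\Delta t/\tau_m}\right) T_{2m} \tag{20}$$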

which allows continuous updating of T with only instantaneous values of T2m. In practice, this scheme is implemented by setting T = T2m at the beginning of the model run and performing a sufficiently long spin-up.
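The update can be sketched as an exponentially weighted running mean, assuming that T2m is constant over each time step so that the old value is weighted by exp(−Δt/τm) and the new air temperature by one minus that weight. The function name and the synthetic temperature series are hypothetical; the 30 day default follows the fixed value of τm given above.

```python
import math

def update_memory_temperature(T_mem, T_2m, dt=1.0, tau_m=30.0):
    """Update the phenology-determining temperature by one time step.

    T_mem : current value of the memory temperature T (deg C)
    T_2m  : 2 m air temperature, assumed constant over the step (deg C)
    dt    : time step (days)
    tau_m : averaging period of the temperature memory (days)
    """
    w = math.exp(-dt / tau_m)       # weight of the past
    return w * T_mem + (1.0 - w) * T_2m

# Hypothetical daily series: an idealized annual cycle repeated for spin-up.
series = [10.0 + 10.0 * math.sin(2.0 * math.pi * day / 365.0) for day in range(3 * 365)]
T_mem = series[0]                   # initialize with the first air temperature
for T_2m in series:
    T_mem = update_memory_temperature(T_mem, T_2m)
```

Only the current value of the memory temperature has to be stored, and the influence of the arbitrary initial value decays by a factor of e every τm days, which is why a spin-up of a few multiples of τm is sufficient.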

[38] The temperature condition for leaf growth is T > T̃φ, supplemented by an additional trigger for day length, td, with the condition td > t̃c. Because the warmest period of the year is considerably later than the period of maximum daylight hours, this double condition has the effect that leaf onset is triggered by T̃φ, and leaf shedding by t̃c. As a reminder: the control parameters are Tφ and tc (means), and Tr and tr (standard deviations within a grid cell).

[39] According to White et al. [1997], of the PFTs considered here only cold-deciduous and cold-evergreen trees and shrubs have a day length trigger for leaf shedding (see Table 3). Grass PFTs do not have such a day length requirement, so that for those we set tc = 0. Warm-evergreen and warm-deciduous PFTs also do not have an explicit temperature requirement for growth because they are not actively protected against cold conditions. For these, we set Tφ = −∞. For all other PFTs, both are control parameters. The temperature trigger for the cold-deciduous and cold-evergreen PFTs (Table 1) is estimated to lie in the region of 10°C, except for the understorey shrubs (PFT 8), where it would be somewhat lower, and only a little above freezing for grasses (PFTs 9, 10).

3.5. Water and Structural Limitations

[40] On a global scale, the main limiting factor on terrestrial plant growth is not temperature but water [Woodward, 1987]. Whenever photosynthesizing, plants lose water by transpiration through pore openings (“stomata”) in their leaves. This limitation together with any other limitation on leaf growth, here considered “structural”, is described by a single state variable, Λmax.

[41] If soil water is limiting, an increasingly negative soil water potential leads to a falling leaf potential in a complicated process involving root water uptake, xylem resistance to flow, and transpiration through the leaf stomata. If stomata close, leaves can retain water, but only to a degree that depends on their cuticular resistance, which in itself is dependent on the plant functional type (PFT). However, it is not possible to represent those complex mechanisms in a model designed for global-scale applications.

[42] The scheme chosen here goes back to Woodward [1987], who used annual potential evapotranspiration and precipitation to derive water limited LAI on a global scale. To accommodate the shorter timescale of our model, we have modified his scheme by using daily actual transpiration and soil moisture instead of annual potential evapotranspiration and precipitation. However, to have LAI react not to rapidly changing daily conditions but to the longer-term climatic state, the water limited LAI is averaged back in time using the same approach as for T (equation (18)).

[43] Generally speaking, leaf development will stop and leaves will be shed if there is insufficient soil water for transpiration. At which level this happens exactly will be a function of various drought adaptations of the PFT concerned. Independent of the details, however, adaptation will determine how long the plant, at a given LAI, can survive with a given amount of soil moisture without rain. This timescale, τW, can serve as a universal parameter of water limitation. This defines a water-limited LAI, ΛW through

$$W = \tau_W\, E(\Lambda_W) \qquad (21)$$

where W is plant-available soil moisture. What is needed then is the total water loss after time τW as a function of leaf area.

[44] To compute this water loss, we linearize the potential rate of transpiration, E, as a function of the LAI, Λ:

$$E(\Lambda) = \tilde{E}\, \frac{\Lambda}{\tilde{\Lambda}} \qquad (22)$$

Ẽ is the daily mean potential rate of transpiration last computed by the model at a LAI of Λ̃. This approximation is most accurate at low values of Λ and Λ̃, where net radiation of the leaf canopy, which drives evapotranspiration [Jarvis and McNaughton, 1986], can be assumed to scale linearly with LAI.

[45] Combining equation (21) and equation (22) yields

$$\Lambda_W = \frac{W\, \tilde{\Lambda}}{\tilde{E}\, \tau_W} \qquad (23)$$

The parameter τW represents the expected length of drought periods tolerated before leaf shedding. For τW → 0 the plant “expects” its water reserves to always be sufficient for continuing survival. In this case, ΛW → ∞, meaning the plant has no explicit drought adaptation in its phenology. This is assumed for the cold-deciduous and cold-evergreen PFTs. For warm-evergreen plants, we expect the value for τW in the region of 1 year, and for grasses and warm-deciduous plants between one and two months. Water limitation is implemented separately for each PFT to reflect differences in the water use strategy, defined mainly by τW.
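A minimal sketch of the water-limited LAI follows. It assumes that the linearized transpiration is proportional to LAI and that the water lost over the period τW equals the plant-available soil moisture; the function name, argument names and numerical values are hypothetical.

```python
def water_limited_lai(w, tau_w, e_last, lai_last):
    """Water-limited LAI under the linearization described in the text.

    w        : plant-available soil moisture (mm)
    tau_w    : drought length the plant is adapted to (days)
    e_last   : potential transpiration last computed by the model (mm/day)
    lai_last : LAI at which e_last was computed
    """
    if tau_w <= 0.0 or e_last <= 0.0:
        # No expected drought or no evaporative demand: no water limitation
        return float("inf")
    return w * lai_last / (e_last * tau_w)

# Hypothetical state: 100 mm of soil water, 4 mm/day transpiration at LAI 2
lai_50_days = water_limited_lai(100.0, 50.0, 4.0, 2.0)    # 50-day drought tolerance
lai_25_days = water_limited_lai(100.0, 25.0, 4.0, 2.0)    # 25-day drought tolerance
print(lai_50_days, lai_25_days)
```

Halving τW doubles the LAI that the same soil water store can support, which is why a more conservative strategy (larger τW) lowers the water-limited LAI.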

[46] Observe also that for a potential transpiration rate Ẽ → 0 we have ΛW → ∞, since without evaporative demand the leaf area is not water limited, as in the case of τW → 0. Since the LAI cannot grow indefinitely, it must be limited by other factors, such as light availability, nutrients and structure. These additional limitations are summarized into a single universal parameter Λ̂ (cf. [Knorr, 2000]) and incorporated into the model via:

$$\tilde{\Lambda}_{max} = \nu(\hat{\Lambda}, \Lambda_W) \qquad (24)$$

ν (x, y) is a smoothed minimum function defined by

$$\nu(x, y) = \frac{x + y - \sqrt{(x + y)^2 - 4 \eta x y}}{2 \eta} \qquad (25)$$

with η = 0.99. Λ̃max is recomputed daily with daily values of the soil moisture, W, whereas the potential transpiration rate Ẽ may be recomputed at longer intervals. This avoids recomputing the diurnal cycle of photosynthesis and energy balance for every simulated day, while keeping a daily time step of phenology and water balance, to save computing time with the full BETHY model [Knorr, 2000].
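The behavior of a smoothed minimum can be sketched as below. The quadratic form used here is an assumption (a common choice for smooth co-limitation functions) and may differ in detail from the function used in the model; only η = 0.99 is taken from the text.

```python
import math

def smooth_min(x, y, eta=0.99):
    """Smaller root of eta*m**2 - (x + y)*m + x*y = 0 (assumed functional form).

    For eta -> 1 this approaches min(x, y); for eta < 1 the kink at x == y
    is rounded off, so the result is differentiable in both arguments.
    """
    s = x + y
    return (s - math.sqrt(s * s - 4.0 * eta * x * y)) / (2.0 * eta)

structural_limit = 5.0
print(smooth_min(structural_limit, 1.0e6))  # no water limitation: close to 5
print(smooth_min(structural_limit, 1.0))    # strong water limitation: close to 1
print(smooth_min(3.0, 3.0))                 # near the kink: somewhat below 3
```

The smoothing matters for the same reason as the exponential temperature memory: a hard minimum would introduce a kink in the dependence of LAI on the control parameters.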

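[47] Instead of Λ̃max, equation (10) uses Λmax, its weighted time integration computed in the same way as T from T2m (equation (9)), with the analogous definition:

$$\Lambda_{max}(t) = \frac{\int_{-\infty}^{t} \tilde{\Lambda}_{max}(t')\, e^{(t'-t)/\tau_s}\, dt'}{\int_{-\infty}^{t} e^{(t'-t)/\tau_s}\, dt'} \qquad (26)$$

Updating happens in the same way as described by equation (20) for T:

$$\Lambda_{max}(t+\Delta t) = e^{-\Delta t/\tau_s}\, \Lambda_{max}(t) + \left(1 - e^{-\Delta t/\tau_s}\right) \tilde{\Lambda}_{max}(t) \qquad (27)$$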

[48] The advantage of this scheme is that it has only one free parameter, τW, in addition to τs. Since changing τs might lead to instabilities of the optimization in a way similar to τm, this parameter is also held constant at a value of 30 days.

4. Setup of the Data Assimilation System

[49] This section explains the sites, the input data, the satellite FAPAR product used with the assimilation, and the parameterization of the model. A complete list of the parameters included in the assimilation is shown in Table 4. Some describe the site characteristics and are fixed, others are control parameters of CCDAS and are optimized during assimilation. CCDAS has the possibility of freely assigning control parameters either to one PFT subgrid cell at a particular station, or to all subgrid cells that represent the same PFT, or even to groups of PFTs. The parameter setup for the new phenology parameters uses this PFT grouping. Parameters of the previous CCDAS are also optimized and grouped in the same way as in previous work [Rayner et al., 2005; Scholze et al., 2007].

Table 4. Process Parameters and Their Initial and Optimized Values and Prior and Posterior Uncertainties (a)

Number | PFTs | Parameter | Prior Value | Posterior Value | Relative Change (%) | Prior Uncertainty | Posterior Uncertainty | Uncertainty Reduction (%)
17 | all exc. 10 | EKO | 35,948 | 35,807 | −8 | 1,797 | 1,797 | 0
18 | all exc. 10 | EKC | 59,356 | 60,611 | 42 | 2,967 | 2,965 | 0
19 | 10 | Ek | 50,967 | 50,964 | 0 | 2,548 | 2,548 | 0
20 | all exc. 10 | αq | 0.28 | 0.292 | 86 | 0.014 | 0.014 | 1
21 | 10 | αi | 0.04 | 0.04 | 0 | 0.002 | 0.002 | 0
22 | all exc. 10 | KC25 | 460 | 462 | 9 | 23 | 23 | 0
23 | all exc. 10 | KO25 | 0.33 | 0.3303 | 2 | 0.0165 | 0.0165 | 0
24 | all exc. 10 | aΓ,T | 1.7 | 1.66 | −47 | 0.085 | 0.008 | 0
25 | All | Λ̂ | 5 | 4.2 | −320 | 0.25 | 0.24 | 5
26 | 4, 5 | Tφ | 10 | 9.21 | −158 | 0.5 | 0.29 | 41
27 | 8 | Tφ | 8 | 8.02 | 4 | 0.5 | 0.5 | 0
28 | 9, 10 | Tφ | 3 | 1.92 | −36 | 3 | 0.23 | 54
29 | 1, 2, 4, 5, 8 | Tr | 2 | 2.04 | 40 | 0.1 | 0.1 | 1
30 | 9, 10 | Tr | 2 | 0.3 | −85 | 2 | 0.05 | 50
31 | 4, 5, 8 | tc | 10.5 | 13.37 | 574 | 0.5 | 0.31 | 38
32 | 4, 5, 8 | tr | 0.5 | 0.48 | −20 | 0.1 | 0.1 | 1
34 | all exc. 5 | kL | 0.1 | 0.07 | −60 | 0.05 | 0.012 | 75
35 | 5 | kL | 3 × 10−3 | 1.3 × 10−4 | 191 | 1.5 × 10−3 | 8.9 × 10−4 | 41
38 | 9, 10 | τW(*) | 50 | 23 | −192 | 25 | 5 | 48

(a) Relative change is posterior minus prior divided by prior uncertainty. Units of physiology parameters are Vmax, μmol(CO2)m−2 s−1; k, mmol(air)m−2 s−1; aΓ,T, μmol(CO2)mol(air)−1°C−1; KC in μmol(CO2)mol(air)−1; KO in mol(O2)mol(air)−1; activation energies E in J mol−1; others unitless. Uncertainties represent one standard deviation, except for the lognormally distributed parameters denoted by (*), for which relative change in log-space and the analogous difference between mean and upper 67th percentile is given.

4.1. Site Descriptions

[50] Out of the total of 13 PFTs of the global version of CCDAS, seven distributed over eight sites occur in the assimilation study presented here (Table 1). On a global scale, the PFTs not considered (deciduous conifers, evergreen and tundra shrubs and wetlands) represent more marginal vegetation than the ones included.

[51] Each site consists of a rectangular study area over one to several satellite pixels as described in Table 5. The last site shown in Table 5 has been included for validation purposes and is therefore excluded from the data assimilation exercise. The areas were chosen in such a way that they constitute homogeneous land cover as identified through Google Earth images. BETHY represents the vegetation of each site by two to three PFTs and a corresponding surface cover fraction shown in Table 6, where the remainder corresponds to bare ground. The water holding capacity of the soil (Ws,max) and the soil brightness class were extracted from the global version of BETHY. For the Hainich grass site, we use half the water holding capacity of the forest site, while for Manaus, we assume that deep roots cause the maximum available water storage to greatly exceed the values at the other sites [Nepstad et al., 1994; Kleidon and Heimann, 1999]. The soil reflectance values for the brightness classes are 0.07 (dark) and 0.1 (medium) if the soil is wet, and 0.15 (dark) and 0.2 (medium) if the soil is dry. The model is run once per PFT at each site, and the results are weighted according to the PFT fractions shown in Table 6.
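The weighting of per-PFT results can be sketched as follows. The cover fractions are those read from the Loobos row of Table 6 (PFTs 5, 8 and 9 with 20%, 15% and 5%); the per-PFT FAPAR values are hypothetical, and the assumption that the uncovered remainder contributes zero follows from FAPAR being zero for bare ground.

```python
def site_average(values_by_pft, cover_percent_by_pft):
    """Cover-weighted site value; the bare-ground remainder contributes zero."""
    return sum(values_by_pft[pft] * cover_percent_by_pft[pft] / 100.0
               for pft in cover_percent_by_pft)

loobos_cover = {5: 20.0, 8: 15.0, 9: 5.0}   # percent of surface area per PFT
fapar_by_pft = {5: 0.8, 8: 0.6, 9: 0.5}     # hypothetical single-PFT model results

loobos_fapar = site_average(fapar_by_pft, loobos_cover)
print(loobos_fapar)
```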

Table 5. List of Sites for Assimilation With Central Coordinates (a)

Site | Country | Latitude | Longitude | Elevation (m) | N-S (km) | E-W (km) | n
Hainich forest site | Germany | 51.0793°N | 10.4520°E | 430 | 1.2 | 1.2 | 106
Hainich grass site | Germany | 51.0199°N | 10.4348°E | 302 | 2.4 | 1.2 | 119

(a) N-S and E-W are extent of the rectangular satellite scenes, and n is the number of daily data points after spatial averaging. The site in the last row has been included for validation only.

Table 6. Site Descriptions and Site-Specific Parameters (a)

Site | Description | PFT | Fraction (%) | Ws,max (mm) | Soil Class
Sodankylä | boreal evergreen forest | 5, 4, - | 15, 10, - | 156 | medium
Zotino | boreal mixed forest | 5, 4, - | 20, 15, - | 105 | medium
Aardhuis | C3 grassland | 9, 4, - | 60, 20, - | 101 | medium
Loobos | temperate pine forest | 5, 8, 9 | 20, 15, 5 | 101 | medium
Hainich forest | temperate deciduous forest | 4, 9, - | 80, 10, - | 101 | dark
Manaus | tropical rainforest | 1, 10, - | 70, 10, - | 800 | medium
Maun | tropical savanna | 2, 10, - | 20, 10, - | 150 | medium
Hainich grass | C3 grassland | 9, 4, - | 80, 5, - | 50.5 | dark

(a) First to third PFT in order of most to least dominant, “fraction” is the associated fraction of surface area in percent, and Ws,max is maximum plant-available soil moisture in millimeters. Soil class refers to the brightness of the soil.

4.2. Parameterization

[52] The way phenology types are differentiated by choice of prior or fixed parameter values is summarized in Table 3. Fixed parameters always indicate that the corresponding limitation is deactivated, so the parameter does not need to be included in the optimization. Whether a plant is evergreen or deciduous is controlled either by τL or by τW.

[53] In one case, a phenology type has a different parameterization depending on whether the vegetation is overstorey (PFT 4) or understorey (PFT 8). This reflects the different strategy of understorey plants, which need to develop leaves earlier to evade being shaded out. In general, however, parameters of different PFTs were grouped together whenever reasonable (Tφ for PFTs 4 and 5, tc for PFTs 4, 5 and 8, and τL for PFTs 4, 8, 9, 10). The two grass PFTs (9, 10) always use the same control parameters for phenology as they are only differentiated by photosynthetic pathway. We also know much less about the temperature control of grass phenology than we know about woody plants, where leaf onset and shedding tend to happen within a well defined period. Therefore, we assign a much larger uncertainty to Tφ and Tr for the two grass PFTs.

[54] The complete list of parameters and how they are differentiated by PFT is shown in Table 4 together with their prior means and uncertainties. Table 4 also represents the technical implementation of the control parameters, which differs from the model description in two cases: one is that τL is replaced by kL = 1/τL, which avoids division by small numbers in equation (12) and equation (13); and for τW, the control parameter is the natural logarithm, which in turn avoids negative numbers leading to negative LAI via equation (23). As the prior PDF is always Gaussian in the space of the control parameters, this transforms the prior PDF of both τL and τW accordingly. The logarithmic transformation has been described by Rayner et al. [2005].
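The two transformations can be sketched as below. The function names and the numerical values are illustrative assumptions; the point is only that any real-valued step in the logarithmic control variable maps back to a positive τW.

```python
import math

def to_control(tau_l, tau_w):
    """Map physical timescales (days) to control variables (k_L, ln tau_W)."""
    return 1.0 / tau_l, math.log(tau_w)

def to_physical(k_l, log_tau_w):
    """Inverse mapping; exp() guarantees a positive tau_W for any real input."""
    return 1.0 / k_l, math.exp(log_tau_w)

k_l, x_w = to_control(tau_l=10.0, tau_w=50.0)

# Even a large downward step of the optimizer in control space stays positive
_, tau_w_after_step = to_physical(k_l, x_w - 5.0)
print(k_l, tau_w_after_step)
```

A Gaussian prior on ln τW corresponds to a lognormal prior on τW itself, which is why the uncertainties of these parameters are reported differently in Table 4.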

[55] For the two parameters representing within-pixel variability of growth triggers, Tr and tr, we only differentiate Tr by PFT, with one control parameter for trees and shrubs and one for grass. Here, we assume a larger spatial variability for the larger group containing trees and shrubs. There is only a single control parameter for tr. Finally, Λ̂, the structural LAI limit, is 5 as in the work of Knorr [2000], and ξ is 0.5 d−1, which describes rapid leaf sprouting where 97% of the maximum LAI has been reached after one week.
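The 97% figure can be checked directly if one assumes, as a sketch, that unconstrained leaf growth relaxes exponentially toward the maximum LAI at rate ξ, starting from zero (the growth equation itself is given elsewhere in the model description, so this form is an assumption here):

$$\frac{\Lambda(t)}{\Lambda_{max}} = 1 - e^{-\xi t}, \qquad 1 - e^{-0.5\,\mathrm{d}^{-1} \times 7\,\mathrm{d}} = 1 - e^{-3.5} \approx 0.970$$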

[56] We assign rather large uncertainties in the region of 30–50% to most of the new parameters, considering the values of those parameters are known only approximately. The remaining Parameters 1–24 were adopted from the previous version of CCDAS with prior values and uncertainties taken from Scholze et al. [2007]. The most important ones in the context of this study are those controlling Vmax (1–7), the maximum capacity of the enzyme that fixes the atmospheric CO2 and makes it available for further metabolism.

4.3. MERIS FAPAR and Input Data

[57] We assimilate daily data from the Level 2 FAPAR land product derived from the Medium Resolution Imaging Spectrometer (MERIS) of the European Space Agency (ESA) at the operational resolution of 1.2 km for the period June 2002 to September 2003. Square 15 by 15 pixel scenes have been processed that are centered at the position of the six sites previously introduced (see Table 6, except Aardhuis which is contained within the Loobos scene). We have used Google Earth imagery to identify areas of uniform cover type around the centers of the scenes and have selected rectangular subscenes of those original scenes, which were then spatially averaged. The extent of those subscenes (in km) and the number of valid daily data points after averaging are listed in Table 5. The fraction of days with valid data is highest in the tropics (32% for Maun) and lowest at the northern boreal site of Sodankylä (17%). We use an uncorrelated data uncertainty of 0.1 irrespective of how many pixels were used in the spatial averaging of the FAPAR pixels [Gobron et al., 2008]. Thus, Cd in equation (1) contains only diagonal elements with values of either 0.1², or ∞ if no data are available for the day and site concerned (in practice set to a very large value).
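With a diagonal Cd, the data part of the cost function reduces to a sum of squared, variance-scaled residuals. The sketch below assumes the usual least-squares form with a factor of one half; the function name, the representation of missing days as NaN and the value standing in for an infinite variance are illustrative assumptions.

```python
import numpy as np

def data_cost(model_fapar, obs_fapar, sigma=0.1, missing_variance=1.0e30):
    """Data term of a least-squares cost function with diagonal covariance.

    Days without observations (NaN) receive a very large variance, so they
    contribute essentially nothing to the cost or to its gradient.
    """
    model = np.asarray(model_fapar, dtype=float)
    obs = np.asarray(obs_fapar, dtype=float)
    valid = ~np.isnan(obs)
    variance = np.where(valid, sigma ** 2, missing_variance)
    residual = model - np.where(valid, obs, 0.0)
    return 0.5 * np.sum(residual ** 2 / variance)

# Hypothetical three-day example with one missing observation
cost = data_cost([0.5, 0.6, 0.7], [0.4, np.nan, 0.7])
print(cost)   # one residual of exactly one standard deviation gives about 0.5
```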

[58] As described elsewhere [Knorr, 2000], the model is run with daily precipitation, minimum and maximum temperatures and incoming solar radiation. The data were generated through a combination of available monthly gridded and daily station data (R. Schnur, personal communication, 2010) following the method of Nijssen et al. [2001], using gridded data from the Summary of the Day Observations (Global CEAS), National Climatic Data Center and the latest updates of gridded data by Jones et al. [2001] and Chen et al. [2002], and taking the available data nearest to the site. Input data related to soil and vegetation characteristics have already been described in section 4.2.

5. Results

5.1. Calibration and Uncertainties

[59] In the setup described in section 4.3, the minimization of equation (1) is carried out three times from three different starting points, and the procedure is repeated on two different computers. All six minimization procedures converge to the same minimum. The minimization starting from the prior value takes 30 iterations to reduce the cost function J from 822.4 to 468.3 and the norm of its gradient by more than seven orders of magnitude from 209.8 to 1.76 × 10−5. The FAPAR observations lead to a substantial reduction of uncertainty (by 10% or more) in ten directions in parameter space (Table 4).

[60] Not surprisingly, most of the uncertainty reduction goes to parameters of the leaf phenology, namely: inverse leaf longevity (kL, Parameters 34 and 35), the parameter describing adaptation to the expected length of dry periods (τW, Parameters 36 to 38), the temperature threshold for leaf onset (Tφ, Parameters 26 and 28), the spatial variability for grasses (Tr, Parameter 30) and the critical photoperiod (tc, Parameter 31).

[61] Except for grasses, parameters describing the spatial distribution of leaf onset or shedding are constrained negligibly or not at all (Parameters 29, 32), as is the leaf growth parameter, ξ. We do not gain information on the phenology of understorey shrubs either (Parameter 27). The maximum leaf area index (Parameter 25) is also only weakly constrained. At high LAI, changes in LAI of the order of one or two can have little impact on FAPAR because FAPAR has already reached a value approaching its theoretical maximum (close to one). As a result, the uncertainty of the satellite-derived FAPAR is too large to resolve LAI values in the range of 4 to 5 [Gobron et al., 1997].

[62] Finally, a considerable reduction in uncertainty (by >10%) is only found for one physiology parameter, namely Vmax25 for PFT 2, which only occurs at Maun. We note that Maun also has the largest number of valid data points (see Table 5).

5.2. Fit to Observations

[63] After assimilation, we found overall good agreement between model and observations considering the assumed data uncertainty (Figures 2 and 3). There are, however, a number of issues that remain to be addressed: the model overestimates the growing season length of the two grassland sites, the agreement for Manaus is less satisfactory, and there are frequent outliers in the data, notably around days 200 and 350 for Loobos. We suspect that the poor match at Manaus and the outliers are both due to remaining contamination by clouds not captured by the MERIS FAPAR algorithm, in particular cloud shadows, which are more difficult to detect than clouds themselves. We also note that the satellite data indicate an early spring greening at the Hainich forest site that is not captured by the model, even though the deviation is only slightly beyond the uncertainty range of the data. For the grass sites, it is interesting to note that the fit is better at Hainich (not included in the assimilation, root-mean-squared deviation of 0.165 at optimum) than at Aardhuis (deviation 0.267 at optimum), which was included. At Hainich, the model captures the summer drought in 2003 very well, better indeed than for the forest site, where the decline in FAPAR is less pronounced in both data and model. At both grass sites the model overestimates the length of the growing season.

Figure 2.

Observed (crosses with uncertainty ranges) and modeled prior (dotted) and posterior (solid line) FAPAR for Sodankylä, Zotino, Aardhuis, and Loobos from north to south. Numbers are root-mean-squared deviation between model and satellite data for the prior (gray) and posterior (black) case.

Figure 3.

As in Figure 2, but for the Hainich forest site, Manaus, Maun, and the Hainich grass site. The Hainich grass site is shown for validation and not included in the assimilation.

[64] The data assimilation itself has led to an improvement of the fit at all sites, including the one where FAPAR data were not assimilated. The smallest improvement is found (in this order) for Loobos, Sodankylä and Zotino. In these cases, simulated FAPAR changes only slightly from prior to posterior, and prior agreement with the data is already good.

[65] Table 4 shows both the prior and posterior parameter values, and the change in each parameter after optimization relative to the prior uncertainty. This relative change gives a measure of the extent to which the optimization has accepted our prior parameter estimates. A range of this value between −200% and +200% corresponds to a change within a ≈95% confidence interval and indicates agreement between model and prior estimate, while a value clearly outside this range indicates disagreement.
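The two diagnostic columns of Table 4 can be reproduced with a short sketch. The relative change follows the definition given in the table footnote; the uncertainty reduction is assumed here to be one minus the ratio of posterior to prior standard deviation, which is consistent with the tabulated values but not stated explicitly. The input numbers are those listed for Parameter 31 (tc).

```python
def relative_change_percent(prior, posterior, prior_sigma):
    """Posterior minus prior, in percent of the prior standard deviation."""
    return 100.0 * (posterior - prior) / prior_sigma

def uncertainty_reduction_percent(prior_sigma, posterior_sigma):
    """Assumed definition: 1 - sigma_posterior / sigma_prior, in percent."""
    return 100.0 * (1.0 - posterior_sigma / prior_sigma)

# Parameter 31 (critical photoperiod tc, in hours)
change = relative_change_percent(prior=10.5, posterior=13.37, prior_sigma=0.5)
reduction = uncertainty_reduction_percent(prior_sigma=0.5, posterior_sigma=0.31)

print(round(change), round(reduction))
print(abs(change) > 200.0)   # outside the approximate 95% range of the prior
```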

[66] We observe that three parameters change by considerably more than two standard deviations of the prior PDF. These are Parameters 25 (maximum LAI, Λ̂), 31 (critical photoperiod, tc) and 36 (the water-use strategy parameter, τW, for tropical evergreen trees). For Parameters 25 and 31, we might have been too optimistic with our rather tight prior error margins.

[67] For Parameter 36, however, the result suggests an extremely conservative strategy for evergreen tropical vegetation, apparently anticipating extremely long drought periods of almost three years. Examination of Figure 3 (Manaus site) shows that even though FAPAR reaches very high values of >0.8, a large number of data points are below 0.5. This suggests that contamination by clouds and cloud shadows might be a serious problem at this site with its year-round humid climate. Obvious outliers to low FAPAR values were earlier identified for the Loobos site. Increasing τW during optimization had the effect of lowering ΛW and thus improving the fit to the data. The optimal value for τW at Manaus must therefore be considered unrealistic. For the warm-deciduous trees at Maun, however, the expected drought period is just over three months (τW = 112 days), consistent with the length of the dry season at the site (150 days).

[68] The problem with cloud contamination of FAPAR at Manaus also explains why Parameters 1 (Vmax25 for PFT 1) and 25 (Λ̂) were reduced significantly. The problem could have been reflected in a larger error bar for the satellite FAPAR, which would have resulted in a larger weight given to the priors and thus in less change in the parameters.

[69] For all sites we find that the vegetation period in cold climates is too long in the prior case. As a consequence, the optimization reduces Tφ for all PFTs, except for understorey shrubs (PFT 8), and tc is increased to 13.4 h by the optimization. Oleksyn et al. [1992] show even larger length-of-day thresholds in their study (14 h and more), but referring to cessation of growth and not to the start of decline in LAI, as is the case with tc. The latter implies that tc needs to be less than the values they cite. Taken together, tc seems consistent with independent ecophysiological data.

[70] There is also a large reduction in Parameter 35 (kL for PFT 5), which affects the three northernmost sites. As the deciduous vegetation dominates the seasonal amplitude, we expect that this parameter is more affected by the FAPAR observations outside the growing season. Such observations, however, are mainly available for Loobos. The prior value amounts to a significant drop in simulated FAPAR throughout the winter, while the posterior value predicts a much greater longevity of needles with little change in LAI throughout the winter. It appears that the adjustment toward an earlier end of the growing season by changing tc upward required a more constant LAI through the year. The posterior value is still smaller than the value reported by Niinemets and Lukjanova [2003] for the rate of decline in pine shoots (3.8 × 10−4), but it is of the same order of magnitude.

5.3. Simulated Net Primary Production

[71] In order to assess to what extent the MERIS FAPAR data have helped to constrain simulations of the net primary production (NPP) of vegetation, we select annual mean NPP at each site as target quantities (i.e., as y in equation (4)), including the site for which no FAPAR data were assimilated. The period chosen for those prognostic simulations is January 2001 to December 2003, which is almost twice as long as the period for which FAPAR data are available. Inferring information for outside the “diagnostic” period is a major strength of the process-based data assimilation technique, as demonstrated before by Knorr and Kattge [2005] and Scholze et al. [2007].

[72] The computed prior and posterior means and uncertainties of annual NPP are shown in Table 7. Relative change in NPP is again shown as a fraction of the prior uncertainty, which is computed at the optimal parameter point. The lowest NPP is found at the far northern sites. Values are also rather low for Loobos, owing to the low value of Vmax25 for PFT 5, and for the semiarid Maun site; they are intermediate for the temperate sites at Hainich and Aardhuis, and high for the evergreen tropical site at Manaus. Prior uncertainties are considerable for Sodankylä, and moderate for the remaining sites.

Table 7. Mean Annual Prior and Posterior NPP for the Period 2000–2003 (Inclusive) With Uncertainty, Change Relative to Prior Uncertainty, and Relative Uncertainty Reduction (a)

Site | Prior NPP | Posterior NPP | Relative Change (%) | Prior Uncertainty | Posterior Uncertainty | Uncertainty Reduction (%)
Hainich forest | 689 | 657 | −29 | 112 | 98 | 13
Hainich grass | 619 | 786 | 97 | 172 | 89 | 48

(a) Units are in gC m−2 yr−1 or percentage when stated.

[73] The only site where there is a large relative change (around 200%) in the simulated NPP is Manaus. We suspect that with either larger uncertainties for FAPAR or a more conservative screening algorithm, the posterior NPP would be closer to the prior value. This would also mean much less error reduction for Manaus, which here is shown as 34%. The other sites where we find a considerable uncertainty reduction (by more than 10%) are Aardhuis, a grass site, Hainich forest, and Hainich grass, the latter not included in the data assimilation. This is consistent with a large reduction in uncertainty of Tφ and Tr for grasses. Analyzing the model's Jacobian at the posterior parameter point (see equation (4)), we also find that the NPP at Hainich forest has by far the highest sensitivity to Parameter 26 (Tφ for PFTs 4, 5). This explains its relatively large uncertainty reduction (13%) partly as a result of the uncertainty reduction in this parameter (41%).
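How a parameter uncertainty reduction translates into a smaller reduction for a target quantity can be illustrated with linear error propagation. The sketch assumes the standard first-order form, target covariance = J C Jᵀ, which is taken here as the content of equation (4); the Jacobian and the second parameter are hypothetical, and only the standard deviations 0.5 and 0.29 are those listed for Parameter 26.

```python
import numpy as np

def propagate(jacobian, param_cov):
    """First-order propagation of parameter covariance to target quantities."""
    j = np.atleast_2d(np.asarray(jacobian, dtype=float))
    return j @ param_cov @ j.T

# Hypothetical sensitivities of one target (e.g., annual NPP) to two parameters
jacobian = [[3.0, 1.0]]

# First parameter constrained from sigma 0.5 to 0.29; second one unconstrained
prior_cov = np.diag([0.5 ** 2, 1.0 ** 2])
post_cov = np.diag([0.29 ** 2, 1.0 ** 2])

prior_var = propagate(jacobian, prior_cov)[0, 0]
post_var = propagate(jacobian, post_cov)[0, 0]
target_reduction = 100.0 * (1.0 - np.sqrt(post_var / prior_var))
print(prior_var, post_var, target_reduction)
```

In this toy case the target uncertainty falls by less than the parameter uncertainty, because the unconstrained second parameter continues to contribute its full share.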

6. Discussion

[74] The results of this study have shown that with less than two years of FAPAR data, we are able to constrain most of the parameters of a generic model of leaf phenology. With the optimized parameters, the model is able to reproduce the satellite-based FAPAR observations from MERIS within the uncertainty range of the data for three out of the seven sites for which data were assimilated. Agreement was somewhat less satisfactory for the deciduous forest site and for grassland, but agreement still increased substantially after data assimilation. The same was true for a grassland site for which no satellite data were assimilated.

[75] We further expect that the reduction in parameter uncertainty will increase if longer time series are used. We can approximate the effect of doubling the length of the time series on posterior parameter uncertainties by assuming similar values in the first (data) term of equation (1) for the additional half of the time series, assuming zero correlation with the uncertainties of the first half of the time series. As a result, the part of the cost function containing the (FAPAR) data is increased by about a factor of two, and the Hessian, which like the cost function can be written as the sum of a data-driven and a parameter-driven part, also has its data-driven part increased by approximately a factor of two. It is this data-driven part of the Hessian that produces the reduction in parameter uncertainty.
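The argument can be made concrete for a single parameter. The sketch below is a scalar simplification that ignores correlations between parameters: the posterior variance is taken as the inverse of the sum of a data-driven curvature and the inverse prior variance. The standard deviations are those listed for Parameter 31 (tc); the function names are hypothetical.

```python
import math

def posterior_sigma(prior_sigma, data_curvature):
    """Scalar posterior standard deviation from prior and data-driven curvature."""
    return 1.0 / math.sqrt(data_curvature + 1.0 / prior_sigma ** 2)

def data_curvature_from(prior_sigma, post_sigma):
    """Data-driven part of the (scalar) Hessian implied by a known posterior."""
    return 1.0 / post_sigma ** 2 - 1.0 / prior_sigma ** 2

# Parameter 31 (tc): prior sigma 0.5 h, posterior sigma 0.31 h
h_data = data_curvature_from(0.5, 0.31)

# Doubling the time series approximately doubles the data-driven curvature
sigma_doubled = posterior_sigma(0.5, 2.0 * h_data)
print(sigma_doubled, 100.0 * (1.0 - sigma_doubled / 0.5))
```

Under these assumptions the uncertainty reduction for this parameter would rise from 38% to roughly 51%, so the gain from additional data is real but less than proportional.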

[76] The main conclusion of this study is that FAPAR data appear to add significant constraints on those model parameters that control the phenological cycle, namely start, duration and maximum LAI during the growing season. It is also encouraging that some of the posterior parameter values could be shown to agree with independent ecophysiological studies. At the same time, we find that comparatively low FAPAR values at the Manaus site affect several parameters in unexpected ways. Both the Manaus and Hainich sites are known to be dense forests, and the reason why Hainich has somewhat higher FAPAR than Manaus in the summer may be a result of some residual contamination of FAPAR, in particular by cloud shadows. For future studies, we might consider ensuring the functioning of the data assimilation system by increasing the error bars in such cases, once an appropriate indicator has been identified.

[77] Another important result of this study is that simulations of the phenological cycle and NPP seem largely decoupled, causing much less reduction in uncertainty for NPP than for phenological parameters. This was not expected, since FAPAR controls the number of photons available for photosynthesis and net primary production. The result might have been different had we considered control of leaf phenology via the carbon balance [Kikuzawa, 1995]. Instead, soil moisture and temperature were chosen as the control state variables of the phenology scheme. We therefore expect that FAPAR would put more constraint on simulated soil moisture than on NPP, something that could be the subject of further investigations.

[78] The moderate to small change in the uncertainty of NPP has consequences for the use of light-use efficiency approaches in models and remote-sensing products mentioned in the introduction. For example, the standard MODIS NPP and GPP products [Zhao et al., 2005] rely on LAI and FAPAR data derived from MODIS reflectances, climate data, and land use information. This study hints that climate data and land use information, combined with the proper parameterization of the algorithm, are sufficient sources of information to derive NPP and GPP. The main advantage of using satellite-derived LAI or FAPAR information is that the algorithm used for those products does not require a phenology model, which can certainly be an advantage for purely diagnostic studies.

[79] Another use of remote sensing data was demonstrated by Sims et al. [2008], who used remotely sensed land surface temperature in addition to vegetation index data to improve the climatic information entering the GPP algorithm. In this case again, the information is not necessary but additional, if a prognostic model is used (i.e., one that does not require observations to run). This conclusion also applies to the use of light-use efficiency approaches in global vegetation models [Field et al., 1998; Zhao et al., 2005; Turner et al., 2009].

[80] The alternative to the diagnostic modeling approach used in those products and models is data assimilation in the way presented here, only extended to larger scales. The advantages are that such an approach makes it explicit which information is used for the final end product, that parameters can be used to create a link between observations across space and time, leading to better constrained target quantities (such as NPP or GPP), and that the assimilation system can also be used to make predictions or be run in situations where no observations are available.

[81] We expect that increasing the amount of FAPAR data used in the assimilation will further increase the constraint on NPP. This is because of the specific approach pursued in CCDAS: common parameters that occur at several sites or grid cells are constrained simultaneously. In this way, remotely sensed data from many locations are used to constrain a limited set of parameters. In this case, the data will deliver considerably more constraint on the parameters, and in all likelihood also on NPP.

[82] Therefore, the answer to the question whether FAPAR data products are a useful source of information for carbon fluxes is a qualified yes. They are, provided that the data are used in a way that increases the process knowledge represented by universal parameters. If, however, the data assimilation is carried out at each pixel independently, we would expect only a moderate to small reduction in NPP uncertainty.
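
The contrast between joint and per-pixel assimilation can be illustrated with a minimal sketch. It assumes a linear model with Gaussian errors and a single scalar parameter, which is far simpler than CCDAS; the function names and all numerical values are hypothetical and chosen only for illustration. In this setting the inverse variances of prior and data add, so pooling observations from many sites onto one shared parameter tightens it, whereas a parameter estimated separately at each pixel only ever sees that pixel's data.

```python
def posterior_variance(prior_var, obs_var, sensitivity, n_obs):
    """Posterior variance of a scalar parameter in a linear-Gaussian problem.

    Precisions (inverse variances) add: the prior contributes 1/prior_var and
    each observation contributes sensitivity**2 / obs_var.
    """
    precision = 1.0 / prior_var + n_obs * sensitivity ** 2 / obs_var
    return 1.0 / precision


def uncertainty_reduction(prior_var, post_var):
    """Relative reduction in standard deviation, 1 - sigma_post / sigma_prior."""
    return 1.0 - (post_var / prior_var) ** 0.5


# All numbers below are illustrative assumptions, not values from the study.
PRIOR_VAR = 1.0        # prior variance of the parameter
OBS_VAR = 25.0         # variance of the observational error
SENSITIVITY = 0.1      # d(observation) / d(parameter), taken as constant
N_OBS_PER_SITE = 20    # observations available at each site or pixel


def compare(n_sites):
    """Return (per-pixel, joint) uncertainty reductions for n_sites locations."""
    # Per-pixel: every location has its own parameter and only its own data.
    per_pixel = posterior_variance(PRIOR_VAR, OBS_VAR, SENSITIVITY, N_OBS_PER_SITE)
    # Joint: one shared parameter is constrained by the data of all locations.
    joint = posterior_variance(
        PRIOR_VAR, OBS_VAR, SENSITIVITY, N_OBS_PER_SITE * n_sites
    )
    return (
        uncertainty_reduction(PRIOR_VAR, per_pixel),
        uncertainty_reduction(PRIOR_VAR, joint),
    )


for n_sites in (1, 7, 100):
    per_pixel, joint = compare(n_sites)
    print(f"{n_sites:4d} sites: per-pixel {per_pixel:.3f}, joint {joint:.3f}")
```

In this toy setting the per-pixel reduction is independent of the number of locations, while the joint reduction grows with it. A weakly sensitive parameter that is barely constrained at one pixel can therefore still be constrained by a large enough set of pixels sharing it.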

[83] This is also important in the light of a recent study by Medvigy et al. [2009] that constrains terrestrial biosphere models with multiple observations, including satellite-derived LAI estimates, albeit by adjusting a smaller number of dedicated parameters. The results indicate better agreement of simulated carbon fluxes after data constraint. Given the results of that and the present study, we suspect that the main potential of satellite-derived FAPAR data to improve carbon flux simulations lies in the simultaneous use with other constraints, such as eddy covariance and CO2 data. This would also address the problem of validating the model in areas of high cloud contamination, such as evergreen rainforests.

[84] The results presented here are certainly dependent on the model and on the choice of prior parameters and their uncertainties, so that further application of the approach will be needed to confirm them. However, the study shows that much of the information contained in the FAPAR signal can be reproduced by the model, while the cause for some deviation between modeled and observed FAPAR still needs to be investigated. This is important for the perspective of using FAPAR in a data assimilation mode with a fully prognostic model that is able to simulate the observed signal without the use of observations. It is one of the requirements of data assimilation that the simulated and observed signals are similar and the model is in principle able to reproduce the observed signal [Tarantola, 1987]. We also note that other studies such as the one by Turner et al. [2009] have assimilated FAPAR data into an ecosystem model, but no study known to the authors has done so using a fully prognostic model that is able to simulate leaf phenology from climatic, vegetation type and soils information alone.

7. Conclusions and Outlook

[85] We have presented a further development of the Carbon Cycle Data Assimilation System (CCDAS) toward assimilating remotely sensed information, using the same variational approach as previously applied to atmospheric carbon dioxide observations. The assimilation is done simultaneously at several sites representing the major biomes of the Earth with a common set of parameters. Using highly efficient adjoint model code, the algorithm converges rapidly to a final gradient extremely close to zero.
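
The variational procedure summarized above can be sketched on a toy problem. The example below is a minimal illustration under stated assumptions, not the CCDAS implementation: a hypothetical two-parameter saturating curve stands in for the process model, the observations are synthetic and noise-free, the derivatives are coded by hand rather than supplied by adjoint code, and a Gauss-Newton iteration with step halving replaces whatever minimizer the full system uses. All names and numbers are invented for the illustration. The cost function combines the observational misfit and the departure from the prior, each weighted by its uncertainty, and the iteration stops once the gradient norm is close to zero.

```python
import math

# Hypothetical toy setup (not from the study): x = [a, k] are the parameters
# of a saturating curve that stands in for the process model.
TIMES = [float(t) for t in range(1, 11)]
X_PRIOR = [0.6, 0.3]        # assumed prior parameter values
SIGMA_PRIOR = [0.5, 0.5]    # assumed prior standard deviations
SIGMA_OBS = 0.05            # assumed observational standard deviation


def model(x):
    """Map the parameters onto counterparts of the observations."""
    a, k = x
    return [a * (1.0 - math.exp(-k * t)) for t in TIMES]


def jacobian(x):
    """Derivatives of the model output w.r.t. (a, k), hand-coded for the toy."""
    a, k = x
    return [(1.0 - math.exp(-k * t), a * t * math.exp(-k * t)) for t in TIMES]


OBS = model([0.8, 0.5])     # synthetic, noise-free "observations"


def cost(x):
    """Half the sum of squared, uncertainty-weighted data and prior misfits."""
    misfit = sum((m - d) ** 2 for m, d in zip(model(x), OBS)) / SIGMA_OBS ** 2
    prior = sum(((xi - pi) / si) ** 2
                for xi, pi, si in zip(x, X_PRIOR, SIGMA_PRIOR))
    return 0.5 * (misfit + prior)


def gradient_and_hessian(x):
    """Gradient of the cost and its Gauss-Newton Hessian approximation."""
    g = [(xi - pi) / si ** 2 for xi, pi, si in zip(x, X_PRIOR, SIGMA_PRIOR)]
    h = [[1.0 / SIGMA_PRIOR[0] ** 2, 0.0], [0.0, 1.0 / SIGMA_PRIOR[1] ** 2]]
    for m, d, row in zip(model(x), OBS, jacobian(x)):
        for i in range(2):
            g[i] += (m - d) * row[i] / SIGMA_OBS ** 2
            for j in range(2):
                h[i][j] += row[i] * row[j] / SIGMA_OBS ** 2
    return g, h


def invert_2x2(h):
    det = h[0][0] * h[1][1] - h[0][1] * h[1][0]
    return [[h[1][1] / det, -h[0][1] / det], [-h[1][0] / det, h[0][0] / det]]


def assimilate(x_start, tol=1e-8, max_iter=50):
    """Minimize the cost; return parameters, gradient norm, posterior sigmas."""
    x = list(x_start)
    for _ in range(max_iter):
        g, h = gradient_and_hessian(x)
        if math.hypot(g[0], g[1]) < tol:
            break
        h_inv = invert_2x2(h)
        step = [-(h_inv[i][0] * g[0] + h_inv[i][1] * g[1]) for i in range(2)]
        current, scale = cost(x), 1.0
        # Halve the step until the cost decreases (simple backtracking).
        while scale > 1e-6 and cost([x[i] + scale * step[i] for i in range(2)]) >= current:
            scale *= 0.5
        if scale <= 1e-6:
            break  # no further decrease found; stop at the current point
        x = [x[i] + scale * step[i] for i in range(2)]
    g, h = gradient_and_hessian(x)
    h_inv = invert_2x2(h)
    # Posterior standard deviations from the inverse of the approximate Hessian.
    sigma_post = [math.sqrt(h_inv[i][i]) for i in range(2)]
    return x, math.hypot(g[0], g[1]), sigma_post


x_post, grad_norm, sigma_post = assimilate(X_PRIOR)
print("posterior parameters:", x_post)
print("final gradient norm:", grad_norm)
print("posterior std. dev.:", sigma_post)
```

The posterior standard deviations here come from the inverse of the Gauss-Newton approximation to the Hessian of the cost function, a common shortcut when the full second derivatives are not available. Comparing them with the assumed prior values gives the kind of uncertainty reduction discussed in the text.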

[86] As such, the study is in principle a global one, except that the number of sample points is kept low. Increasing this number to a global grid is therefore straightforward and is planned to be the subject of a further study. Increasing the number of sites or grid points, as well as the length of the observation period, is also needed to further validate the new generic phenology scheme and to further investigate the causes of some notable deviations between the optimized model and the observations for grassland and deciduous forest sites.

[87] The study has shown that, given the right setup in which data are assimilated into a process model with universal parameters, the method can in principle be used to simultaneously constrain the parameters of a phenology model and model-based estimates of carbon fluxes. The approach presented relies on input data of land cover in the form of PFT fractions, soil characteristics in the form of maximum plant available soil moisture, and climate data.

[88] If enough FAPAR data are used to constrain the parameters of the phenology model, it may be possible to constrain in addition the PFT fractions at each site. This would open the perspective of fully automatic, model-based land cover mapping complete with characterization of uncertainty ranges. The same applies to the inclusion of maximum plant available soil moisture in the assimilation scheme; such properties cannot be inferred directly from space observations. Applications directed at water resources might require the assimilation of further remotely sensed information, for example from ESA's recently launched SMOS mission.


x  vector of control variables.

prior values of control variables.

covariance of the associated uncertainty.


covariance of the associated uncertainty.

xpost  optimal (posterior) values of control variables.

covariance of the associated uncertainty.

diagnostic/prognostic quantity of interest (target quantity), e.g., a net flux.

covariance of the associated uncertainty.

model operated as a mapping from control variables onto counterparts of observations.

model operated as a mapping from control variables onto target quantity.

its linearization around xpost.

J  cost function.

Hessian of J, i.e., the matrix composed of its second partial derivatives.


[90] The authors would like to thank the European Space Agency for financing this project under contract 20595/07/I-EC, Philippe Goryl and Olivier Colin from ESA/ESRIN, Frascati, for support with the ESA MERIS product, Monica Robustelli and Ioannis Andredakis for help with data processing, Reiner Schnur for provision of meteorological data, Michael Vossbeck for his help with code administration, and two anonymous reviewers and the Associate Editor for helpful comments and suggestions.