
Spatial extrapolation of light use efficiency model parameters to predict gross primary production



To capture the spatial and temporal variability of the gross primary production as a key component of the global carbon cycle, the light use efficiency modeling approach in combination with remote sensing data has been shown to be well suited. Typically, the model parameters, such as the maximum light use efficiency, are either set to a universal constant or to land-class-dependent values stored in look-up tables. In this study, we employ the machine learning technique support vector regression to explicitly relate the model parameters of a light use efficiency model calibrated at several FLUXNET sites to site-specific characteristics obtained from meteorological measurements, ecological estimations and remote sensing data. A feature selection algorithm extracts the relevant site characteristics in a cross-validation and yields an individual set of characteristic attributes for each parameter. With this set of attributes, the model parameters can be estimated at sites where a parameter calibration is not possible due to the absence of eddy covariance flux measurement data. This ultimately allows a spatially continuous model application. The performance of the spatial extrapolation scheme is evaluated with a cross-validation approach, which shows the methodology to be well suited to recapture the variability of gross primary production across the study sites.

1. Introduction

The knowledge of global patterns of carbon uptake by plants, also referred to as gross primary production, as an important component of the global carbon cycle is of great interest for environmental scientists, particularly with respect to the human-induced elevated CO2-concentration and its coupling to the Earth's climate system [Schimel et al., 2001; Zhang et al., 2009; Beer et al., 2010; Friedlingstein and Prentice, 2010; Zhao and Running, 2010]. In the last two decades large efforts in gathering data around the globe have therefore been undertaken to monitor exchanges of CO2, water vapor and energy between vegetation and the atmosphere. FLUXNET, a highly recognized, world-wide measurement network using eddy covariance (EC) techniques, arose from these endeavors [Friend et al., 2007; Baldocchi, 2008] and led to concerted research projects aimed at quantifying and characterizing the terrestrial carbon exchange [Falge et al., 2002; Luyssaert et al., 2007; Yi et al., 2010].

However, even these continuously operated, long-term micrometeorological measurement towers can only catch a glimpse of the spatial variability of ecosystem fluxes. Exploiting remote sensing data as an information source to provide spatially continuous data is therefore a logical consequence [Running et al., 1999; Turner et al., 2004; Gilmanov et al., 2005; Coops et al., 2007; Zhao and Running, 2010]. Combining these two data sources allows for the modeling of ecosystem fluxes beyond the measurement sites. In this context, light use efficiency (LUE) models [Monteith, 1972] are usually the method of choice for modeling the gross flux of carbon uptake on larger scales due to their simplicity and moderate data demands [Zhao et al., 2005; Yuan et al., 2007; Zhao and Running, 2008]. Although many studies on light use efficiency models have been carried out, the subject is still an active field of research [Beer et al., 2010; Hilker et al., 2010]. There are still questions to be answered “with issues remaining to be solved on the leaf, stand, and landscape level [...] targeting issues of upscaling from site observations to ecoregion, biome, and global level” [Hilker et al., 2008]. Several LUE models have been developed and cross-validated with the help of FLUXNET data across vegetation types [Yuan et al., 2007] or for specific vegetation types [Xiao et al., 2004a, 2004b; Mäkelä et al., 2008]. For large-scale mapping purposes, however, model parameters such as the maximum light use efficiency are typically set to a fixed value or to a land-class-dependent constant stored in look-up tables [Heinsch et al., 2006]. With respect to the maximum light use efficiency, Garbulsky et al. [2010, p. 255] considered this practice as being “far from optimum and is the possible cause of the low performance of the photosynthetic uptake models”.
Explicit regionalization strategies for the calibrated LUE model parameters across biomes have barely been pursued so far, mainly because this approach has not been feasible due to data limitations. Respective efforts – sparked by unprecedented, harmonized data assemblies – have recently received new impetus [Jung et al., 2009; Garbulsky et al., 2010; Groenendijk et al., 2011].

In this study we explore the merits of relating site-specifically calibrated parameters of a LUE model to biophysical site characteristics. We utilize these relationships to spatially extrapolate parameter values to sites outside the calibration domain, thus allowing the gross primary production to be estimated where direct EC measurements are not available. For this purpose we use support vector regression (SVR), a special application of support vector machines (SVM). SVM techniques are beneficial here because of their potential to generalize robustly and to represent highly non-linear structures with relatively few training samples. Because the objective function is convex, a unique solution is obtained. Other (non-)linear mapping procedures such as neural networks, fuzzy rule based techniques, or decision trees could in principle be applied as well; a comprehensive intercomparison of techniques, however, is beyond the scope of this paper.

Yang et al. [2007] have already shown SVR to be a powerful technique to regionalize evaporation and gross primary production from tower measurements. This study distinguishes itself from their purely empirical approach by not relating gross primary production directly to remotely sensed land surface characteristics, but through the application of an underlying physiological relationship, namely the aforementioned light use efficiency model. The determination of a relationship between calibrated model parameters and biophysical characteristics will allow us to run the LUE model at sites where a calibration is not possible due to the absence of measurement data, which is the case at the majority of sites.

2. Data and Methods

2.1. FLUXNET Data

A data set comprising 42 FLUXNET sites ranging from boreal forests to semi-arid grasslands in North America and Europe serves as the database for this study (Table A1). Sites were selected if at least three measurement years existed at the time of the data download and if the most relevant variables (net CO2 flux, FN; soil temperature, TS; photosynthetically active radiation, PAR; latent and sensible heat, LE and H) had no measurement gaps longer than three weeks. Table A1 lists their main characteristics. The half-hourly or hourly data were downloaded from the web gateways of AmeriFlux and CarboEurope, the regional sub-networks of FLUXNET. For a description of the data processing (gap-filling, partitioning of net carbon fluxes (FN) into the respiration component and the gross carbon uptake or gross primary production (FG), and aggregating to daily data) please refer to Horn and Schulz [2011]. In this study, solely daily data are used.

2.2. MODIS Data

The fraction of absorbed photosynthetically active radiation (FPAR [-]) as well as the leaf area index (LAI [m2 m−2]) were downloaded as MODIS Land Product subsets [Oak Ridge National Laboratory Distributed Active Archive Center, 2009] for each FLUXNET site. These so-called MOD15A2 and MYD15A2 subsets provide a grid of 7x7 pixels, each with an area of 1 km2, approximately centered on the flux tower location. MOD15A2 and MYD15A2 data are retrieved from the satellites Terra and Aqua and are merged into one composite time series according to Yang et al. [2006c]. If neither Terra nor Aqua delivered an FPAR value with the main algorithm, the mean of all available years (2000–2010) at this day of year was taken instead. Each of the 49 pixels with the same land class as the study site according to the MODIS product MOD12Q1 was taken into account. The mean was calculated at each time step; due to the noisiness of the data, each value was weighted according to its inverse difference to the multi-annual mean at the considered day of year over all measurement years. The final daily FPAR time series was retrieved by a cubic smoothing spline fitted through all data points. Horn and Schulz [2010] provide a detailed analysis of issues regarding the applied quality checks, spatial aggregation, sensor choice and interpolation.
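The pixel-weighting step can be illustrated with a minimal sketch. It assumes that "inverse difference" means a weight of 1/|value − multi-annual mean|; the function name and the small constant `eps`, which avoids a division by zero, are hypothetical and not taken from the study. The subsequent cubic smoothing spline is not shown.

```python
def weighted_pixel_mean(values, climatology, eps=1e-6):
    """Mean of pixel values, weighted by inverse distance to the multi-annual mean.

    values      -- FPAR values of the pixels sharing the site's land class
    climatology -- multi-annual mean FPAR at this day of year
    eps         -- assumed guard against division by zero (not from the source)
    """
    weights = [1.0 / (abs(v - climatology) + eps) for v in values]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)


# Hypothetical example: the outlier 0.10 barely affects the composite value.
print(weighted_pixel_mean([0.62, 0.60, 0.10], climatology=0.61))
```

With such weights, pixels far from the expected seasonal value contribute little to the composite, which is one plausible way to damp the noise mentioned above.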

Furthermore, the MODIS land product subsets MOD13Q1/MYD13Q1 containing the vegetation indices NDVI [-] (Normalized Difference Vegetation Index) and EVI [-] (Enhanced Vegetation Index) were downloaded and post-processed in the same way as the MODIS LAI/FPAR subsets. The NDVI is the difference between the near-infrared and red reflectances divided by their sum and thus takes values between −1 and 1; the calculation of EVI also takes canopy background and atmospheric influences into account and incorporates blue band reflectance [Huete et al., 2002].
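As a rough illustration of the two indices, the sketch below computes NDVI and EVI from surface reflectances. The function names and the sample reflectances are hypothetical; the EVI coefficients (gain 2.5, aerosol coefficients 6 and 7.5, canopy background term 1) are those of the MODIS formulation described by Huete et al. [2002].

```python
def ndvi(nir, red):
    """Normalized difference of near-infrared and red reflectance."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue, gain=2.5, c1=6.0, c2=7.5, background=1.0):
    """Enhanced Vegetation Index with the standard MODIS coefficients."""
    return gain * (nir - red) / (nir + c1 * red - c2 * blue + background)


# Hypothetical reflectances of a dense green canopy
print(ndvi(0.45, 0.05))
print(evi(0.45, 0.05, 0.03))
```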

2.3. Light Use Efficiency Model

Horn and Schulz [2011] derived a LUE model as an advancement of the model proposed by Jarvis et al. [2004]. Both model developments were based on the principles of data-based, mechanistic model development strategies as proposed by Young [1998, 2001], thus aiming at extracting the dominant modes of system behavior and deriving a parsimonious model parameterization. Thereby, more weight is given to the information content in the available data during the model building procedure, and the problem of non-uniqueness and equifinality [Beven and Binley, 1992] in subsequent model application is avoided. For the parameter study presented here, we use the model extension by Horn and Schulz [2011] that was derived directly from an analysis of FLUXNET data using state dependent parameter estimation (SDP) techniques [Young and Pedegral, 1999; Young, 2000; Young et al., 2001]. The model is based on the following equation:

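$$F_G = \epsilon_{max} \cdot \left( p \cdot f_T + (1 - p) \cdot f_W \right) \cdot \mathrm{FPAR} \cdot \mathrm{PAR} \qquad (1)$$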

with FG [gC m−2 d−1] denoting the gross flux of carbon uptake, ϵmax [gC MJ−1] being the maximum attained light use efficiency, PAR [MJ m−2 d−1] the photosynthetically active radiation, and p a weighting factor for the subfunctions fT and fW. The latter are scaled between 0 and 1 and describe the dependence of the light use efficiency on the soil temperature, TS, and a moisture surrogate, W. A lag function [Jarvis et al., 2004] is applied to TS in the case of temperate and boreal climates (mild C- and D-climates in the Köppen-Geiger climate classification), and to W in climates with a distinct dry season (C-climates with hot and dry summers, ‘Csa’, and B-climates ‘BSh’ and ‘BSk’):

equation image

where α [-] is the lag parameter. Z stands for the water availability surrogate W (equation (1)) in the case of Csa, BSh and BSk climates; in all other cases it stands for TS. ZF is the filtered W or TS, depending on the climate class. Thus, equation (2) is only applied to the dominant driver of the vegetation stands; this main driver is expected to trigger the start and end of dormant periods after which the vegetation has to regenerate and redevelop green tissue [Horn and Schulz, 2011].
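The effect of such a lag can be sketched in a few lines. The exact form of equation (2) is given by Jarvis et al. [2004]; the sketch below merely assumes a first-order recursive filter, ZF(t) = (1 − α)·Z(t) + α·ZF(t − 1), initialized with the first value of the series. Both the filter form and the initialization are assumptions made for illustration.

```python
def lag_filter(series, alpha):
    """Assumed first-order recursive lag: ZF_t = (1 - alpha) * Z_t + alpha * ZF_{t-1}."""
    filtered = []
    previous = series[0]  # assumed initialization with the first observation
    for value in series:
        previous = (1.0 - alpha) * value + alpha * previous
        filtered.append(previous)
    return filtered


# Hypothetical soil temperature step from 0 to 10 degrees C:
# the filtered driver approaches the new level only gradually.
print(lag_filter([0.0, 10.0, 10.0, 10.0], alpha=0.5))
```

Under this assumption, α close to 1 means a long memory of past conditions, while α = 0 leaves the driver unfiltered.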

fT is a sigmoidal peak function with the temperature Topt [°C] at which the light use efficiency is maximum, and kT [°C−1] is the rate of change from the lower level of fT to its maximum:

equation image

fW is defined as the following sigmoidal function:

$$f_W = \frac{1}{1 + \exp\left( k_W \cdot (W - W_I) \right)} \qquad (4)$$

with kW being the constant rate of change between lower and upper level and WI being the inflection point with units depending on the choice of W. The magnitude of influence fT and fW have on ϵ is determined by the factor p, which ranges between 0 and 1. If p is near 1, fT has a greater influence on the light use efficiency; vice versa, if p approaches 0, the light use efficiency is mostly influenced by fW. If both fT and fW are at their maximum, their weighted sum is 1 and ϵmax is realized. With this formulation the LUE model can account for contrasting biomes from boreal forests with a highly seasonal climate to semi-arid sites with the vegetation period being determined by the water availability. Figure 1 shows examples of the modeled FG in comparison to the measured flux, fT and fW for two contrasting sites. The evaporative fraction, EF [-], has been shown to be the superior water availability measure [Horn and Schulz, 2011] and is therefore applied in this study as the W-variable within the fW-subfunction. After performing a sensitivity analysis [Horn and Schulz, 2011], the parameter kW was set to a constant value (here: −13.1).
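How the pieces of the model interact can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: equation (1) is assumed to read FG = ϵmax·(p·fT + (1 − p)·fW)·FPAR·PAR, fW is assumed to be a logistic function with negative kW, and fT is assumed to be a scaled logistic derivative that peaks at Topt. All function names and sample values are hypothetical.

```python
import math


def f_w(w, w_i, k_w=-13.1):
    """Assumed logistic moisture function; a negative k_w makes it rise with w."""
    return 1.0 / (1.0 + math.exp(k_w * (w - w_i)))


def f_t(t_s, t_opt, k_t):
    """Assumed peak function (scaled logistic derivative), equal to 1 at t_opt."""
    e = math.exp(-k_t * (t_s - t_opt))
    return 4.0 * e / (1.0 + e) ** 2


def gross_flux(par, fpar, t_s, w, eps_max, p, t_opt, k_t, w_i):
    """Daily gross carbon uptake under the assumed form of equation (1)."""
    efficiency = eps_max * (p * f_t(t_s, t_opt, k_t) + (1.0 - p) * f_w(w, w_i))
    return efficiency * fpar * par


# Hypothetical summer day at a temperature-dominated site (p close to 1)
print(gross_flux(par=10.0, fpar=0.8, t_s=14.0, w=0.7,
                 eps_max=1.5, p=0.9, t_opt=15.0, k_t=0.2, w_i=0.3))
```

The weighting by p and (1 − p) guarantees that the bracketed term never exceeds 1, so ϵmax acts as an upper bound on the realized light use efficiency.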

Figure 1.

LUE model results compared to the measured values for two contrasting sites: (a, b, c) the continental forest site UMBS and (d, e, f) the Mediterranean grassland site Vaira Ranch. The subfunctions fT and fW (see equations (1)–(4)) are shown below (Figures 1b, 1c, 1e, and 1f); the grey lines represent hypothetical values, the black lines the actually realized values.

The remaining six free model parameters (ϵmax, p, Topt, kT, WI, α) were calibrated at each study site. The parameter optimization was performed by the Matlab nonlinear least-squares routine “lsqnonlin”. This algorithm uses a subspace trust-region method based on the interior-reflective Newton method with box constraints [Coleman and Li, 1994]. In addition to optimized parameter values, the routine provides an approximation of the Jacobian matrix. The latter was used to derive the parameter covariance matrix, which allowed the calculation of parameter confidence intervals based on Student's t-distribution [Thornley and Johnson, 2002]. The confidence intervals as well as the results of a Monte-Carlo sensitivity analysis showed the parameters to be well-defined for most sites and parameters, especially those parameters of the dominant subfunction [Horn and Schulz, 2011].
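For orientation, the usual linearized approximation for confidence intervals in nonlinear least squares is sketched below. It is assumed, not confirmed by the source, that the calculation followed this standard form; r_t denotes the residuals, n the number of daily observations, m = 6 the number of calibrated parameters, and J the Jacobian at the optimum.

```latex
\hat{\sigma}^2 = \frac{1}{n - m} \sum_{t=1}^{n} r_t^2 , \qquad
\mathrm{Cov}(\hat{\theta}) \approx \hat{\sigma}^2 \left( J^{\top} J \right)^{-1} , \qquad
\hat{\theta}_j \pm t_{0.975,\, n-m} \sqrt{ \mathrm{Cov}(\hat{\theta})_{jj} }
```

A narrow interval indicates a parameter that is well constrained by the flux data at that site.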

2.4. Support Vector Regression

Since it is the aim of this study to explain the calibrated parameters by site-specific, biophysical characteristic attributes, a relationship between the parameter values and the attributes has to be identified. To determine this relationship, a regression technique is needed. Classical regression approaches include multi-regression and neural networks. The former, however, is prone to over-fitting and over-sensitivity to noisy data [Hawkins, 2004], and the latter suffers from difficulties in finding a suitable network structure and a global optimum [Haykin, 2008]. To avoid these pitfalls, the machine learning technique support vector regression (SVR) was chosen in this study. SVR is a special application of the support vector machine technique (SVM) [Vapnik et al., 1997], which is a supervised statistical learning method [Vapnik, 2000]. SVM was originally used for classification tasks by constructing a separation line that optimally separates the training samples of two classes. All SVM methods implement the method of structural risk minimization [Shawe-Taylor et al., 1998] by setting an upper bound on the error rate of a model applied on training data, rather than solely minimizing the empirical training error itself. This approach leads to a better generalization performance and thus a higher predictive capability compared to other methods [Burges, 1998; Gunn, 1998].

Furthermore, all SVM methods have in common that they transform the training data points into a feature space using a set of nonlinear functions. In this potentially high-dimensional feature space, the problem can be solved linearly; in the classification case this is a separation line [Byun and Lee, 2002], whereas in the regression case, this is a linear regression function [Smola, 1996]. Thus, a possibly high-dimensional and highly nonlinear problem can be solved linearly [Smola and Schölkopf, 2003]. This linear problem is posed as a convex optimization problem, which has a global optimum and therefore a unique solution [Basak et al., 2003]. Moreover, SVM methods require only a subset of all training samples: the support vectors (see below and Hua and Sun [2001]). This makes SVM methods suitable for small data sets such as the one available in this study with 42 sites/samples. For many applications, the mentioned characteristics and advantages (generalization capability, independence of the dimensionality and of the non-linearity of the input space, provision of a unique solution, requirement of fewer training samples) make SVR superior to other mapping techniques such as neural networks and multi-regression [Smola, 1996; Drucker et al., 1997; Tay and Cao, 2001; Yao et al., 2004; Yang et al., 2006b; Yoon et al., 2010].

In the following, the principles behind SVR are outlined; more detailed information on the technique can be found in the papers cited. As with other regression methods, the goal of SVR is to find a function which maps a set of independent variables (x) into the output domain of a dependent variable (y). Thus, sample data of x and y are used to determine a function with which predictions of the output variable y can be made. In our case, y represents each of the six LUE model parameters, and x stands for the set of site characteristics which explain y, i.e., the model parameters. Finding an appropriate function can be a challenging task when being confronted with highly non-linear problems. In these cases SVR is the method of choice: The basic principle behind SVR is that the original space spanned by the training samples is transformed into a higher dimensional feature space by so-called kernel functions. In this new feature space, a linear instead of a highly non-linear regression function can be formulated [Gunn, 1998]:

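$$f(\mathbf{x}) = \langle \mathbf{w}, \mathbf{x} \rangle + b \qquad (5)$$

with x ∈ ℝⁿ and b ∈ ℝ. The function that optimally describes the sample data is the one that minimizes the functional

$$\Phi(\mathbf{w}, \xi^{-}, \xi^{+}) = \frac{1}{2} \lVert \mathbf{w} \rVert^{2} + C \sum_{i=1}^{N} \left( \xi_i^{-} + \xi_i^{+} \right) \qquad (6)$$

satisfying the conditions

$$y_i - \langle \mathbf{w}, \mathbf{x}_i \rangle - b \le \epsilon_{SVR} + \xi_i^{+}, \qquad \langle \mathbf{w}, \mathbf{x}_i \rangle + b - y_i \le \epsilon_{SVR} + \xi_i^{-}, \qquad \xi_i^{-}, \xi_i^{+} \ge 0 \qquad (7)$$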

with a norm vector w and an offset b. ϵSVR forms the margins of a band with the width of 2·ϵSVR wrapping the true output values (Figure 2a). Within this band, the optimization is insensitive to deviations of the data points from y, and only x-values at the margins or outside this band are recognized by the algorithm. Data points at the margins represent the support vectors. The magnitude of ϵSVR therefore has an effect on the complexity of the SVR model; it also affects the number of support vectors and consequently the generalization capability of the SVR model. The training errors, measured as distance to the margins of the ϵSVR-band, are represented by the so-called slack variables ξ−i ≥ 0 and ξ+i ≥ 0, which contribute linearly to the loss function. Hence, this type of SVR using slack variables combines the structural with the empirical risk. The latter is defined by the second term in equation (6) and is called the ‘soft-margin’ optimization criterion. C > 0 is the weighting factor for this training error term and determines the trade-off between the training error and the dimension of the model: a large value of C leads to a large penalty for training data points outside the ϵSVR-band at the cost of the simplicity of the model. C and ϵSVR have to be adjusted for each application.
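The role of the ϵSVR-band and of the cost factor C can be made concrete with a small sketch. It relies on the standard result that, at the optimum, each slack variable equals the ϵ-insensitive loss of the corresponding residual; the function names and the numbers are hypothetical.

```python
def epsilon_insensitive(residual, eps):
    """Loss is zero inside the band of half-width eps and grows linearly outside."""
    return max(0.0, abs(residual) - eps)


def svr_objective(weights, residuals, cost, eps):
    """Structural term 0.5 * ||w||^2 plus the C-weighted sum of slack values."""
    structural = 0.5 * sum(w * w for w in weights)
    empirical = cost * sum(epsilon_insensitive(r, eps) for r in residuals)
    return structural + empirical


# Hypothetical residuals: only the last one lies outside the band of 0.06
print([epsilon_insensitive(r, 0.06) for r in (0.01, -0.05, 0.16)])
print(svr_objective([3.0, 4.0], [0.01, -0.05, 0.16], cost=10.0, eps=0.06))
```

Widening the band (larger eps) sets more residuals to zero loss, which is why fewer data points remain as support vectors.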

Figure 2.

(a) The linear SVR model y (black line, with the norm vector w) in the transformed space. ϵSVR defines the margins of a band in which SVR is insensitive to data values (white circles). The slack variables ξ mark the distance of the data points (black circles) outside the ϵSVR-band from its margins; their sum (the empirical error) is weighted by a cost factor C. (b) An example for the evolution of the training quality measure SSEw during the attribute selection procedure, in which all attributes are successively left out one by one. If the SSEw indicates an equally good or even better result (circles), the left-out attribute is finally removed (without circle).

The mapping into the higher dimensional feature space is done by kernel functions. The idea behind kernel functions is to enable operations in the lower dimensional input space rather than the higher dimensional feature space without having to waive the advantage of the linear solution in the feature space. The Radial Basis Function (RBF) is such a kernel function; it has been shown to be highly flexible [Hsu et al., 2003], performed best in test runs, and is consequently used in this study. The RBF uses only one parameter, γ, which has to be identified in addition to the SVR parameters C and ϵSVR.
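For reference, the RBF kernel in the form used by LIBSVM is K(x, z) = exp(−γ·‖x − z‖²). The minimal sketch below evaluates it for two attribute vectors; the function name and the sample vectors are hypothetical.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel exp(-gamma * squared Euclidean distance)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


# Two hypothetical z-transformed site attribute vectors
print(rbf_kernel([0.2, -1.0], [0.5, -0.4], gamma=0.5))
```

A large γ makes the kernel decay quickly with distance, so each site is only influenced by very similar sites; a small γ yields a smoother regression function.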

This SVR parameter identification was achieved by a dynamically dimensioned search (DDS) global optimization algorithm [Tolson and Shoemaker, 2007] instead of the typically applied grid search because it proved to be more efficient in test runs. DDS automatically scales the search within the maximum number of model evaluations and makes computationally challenging optimization problems tractable. Since it quickly became clear in this study that the applied optimization algorithm (see above) always yielded C-values around 1000 and above, and the result was not sensitive to the exact value, C was always set to a value of 2000, and only γ and ϵSVR were optimized for each training routine. SVR was implemented with the LIBSVM package [Chang and Lin, 2001] for usage with MATLAB®. All data sets were z-transformed to zero mean and a standard deviation of 1.
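The z-transformation mentioned above can be sketched as follows. Whether the sample (n − 1) or the population standard deviation was used is not stated in the source; the sketch assumes the sample version, and the function name is hypothetical.

```python
import statistics


def z_transform(values):
    """Scale a series to zero mean and unit (sample) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.stdev(values)  # assumption: sample standard deviation (n - 1)
    return [(v - mean) / std for v in values]


# Hypothetical mean annual temperatures (degrees C) of three sites
print(z_transform([1.0, 2.0, 3.0]))
```

Scaling all attributes to a common range matters for the RBF kernel, because it depends on Euclidean distances between attribute vectors.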

2.5. Finding Explanatory Attributes

Using the described SVR algorithm, the six calibrated LUE model parameters were each regionalized depending on a bundle of characteristic attributes specific for each parameter. From the data available, attributes in various categories were collected: vegetation (coniferous, deciduous broadleaf, mixed, evergreen broadleaf, grass) and climate classifications (Köppen-Geiger), climate characteristics (temperature, precipitation, radiation, continentality and aridity index), characteristics concerning the physiological status of the vegetation (LAI, EVI, stand age), seasonal characteristics concerning the vegetation period and the seasonal course of climatic and physiological characteristics in relation to each other. Nominal attributes such as the vegetation classes were binarized. Table A2 lists all attributes with a short description.
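Binarizing a nominal attribute means replacing it by one 0/1 indicator per class. A minimal sketch, with hypothetical function name and class labels chosen for illustration:

```python
def binarize(labels, classes):
    """Turn nominal labels into one 0/1 indicator column per class."""
    return [[1 if label == c else 0 for c in classes] for label in labels]


# Hypothetical vegetation classes of three sites
classes = ["enf", "dbf", "grass"]
print(binarize(["dbf", "grass", "enf"], classes))
```

Each indicator can then be selected or dropped on its own, which is why several binary representatives of one characteristic may appear among the selected attributes.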

The attributes that are relevant for each model parameter are determined by testing the model performance with various attribute combinations. SVR is capable of achieving a very high training accuracy which often does not reflect the model performance for unknown data [Burges, 1998]. Therefore, the cross-validation performance was used to test the several combinations of attributes [Basak et al., 2003; Hsu et al., 2003; Smola and Schölkopf, 2003]. Considering the small number of study sites with often just very few sites representing a specific climate and vegetation type, a ‘leave-one-out’ cross-validation was applied. Thus, it was pretended that there were no data available to calibrate the model at a specific site. Instead, the parameters at this site were retrieved by means of SVR and its site characteristics. This was repeated for each site until each site was used once as validation site. The performance was evaluated by the weighted SSE (sum of squared error):

$$SSE_w = \sum_{i=1}^{N} w_i \left( O_i - P_i \right)^2 \qquad (8)$$

where O represents a vector containing a calibrated LUE model parameter from each site and P refers to the respective parameters reproduced by SVR in the cross-validation. i runs from 1 to the number of sites, N. The weighting factor, w, reflects the LUE model parameter uncertainty. Thus, data points with a small 95% confidence interval, i.e., a small parameter uncertainty, received more consideration in the SVR optimization procedure than those retrieved with a higher uncertainty in the LUE model calibration. The model's cross-validation performance is also determined by the standard coefficient of determination, r2, and a weighted coefficient of determination, r2w, defined as follows:

equation image

with the weights wi for i = 1 to the number of sites, N, and:

equation image
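The leave-one-out scheme and the weighted error measure described in this section can be sketched together. The sketch assumes that the weighted SSE has the form Σ wᵢ·(Oᵢ − Pᵢ)² and uses a trivial mean predictor as a stand-in for SVR; `fit_mean`, `predict_mean` and all numbers are hypothetical.

```python
def leave_one_out(features, targets, fit, predict):
    """Predict each site's parameter from a model trained on all other sites."""
    predictions = []
    for held_out in range(len(targets)):
        train_x = features[:held_out] + features[held_out + 1:]
        train_y = targets[:held_out] + targets[held_out + 1:]
        model = fit(train_x, train_y)
        predictions.append(predict(model, features[held_out]))
    return predictions


def weighted_sse(calibrated, predicted, weights):
    """Assumed form of the weighted sum of squared errors."""
    return sum(w * (o - p) ** 2 for o, p, w in zip(calibrated, predicted, weights))


# Stand-in regressor (NOT SVR): predicts the mean of the training targets
def fit_mean(train_x, train_y):
    return sum(train_y) / len(train_y)


def predict_mean(model, x):
    return model


features = [[0.1], [0.4], [0.9]]       # hypothetical site attributes
calibrated = [1.0, 2.0, 3.0]           # hypothetical calibrated parameter values
weights = [1.0, 1.0, 2.0]              # hypothetical weights (higher = less uncertain)
predicted = leave_one_out(features, calibrated, fit_mean, predict_mean)
print(predicted, weighted_sse(calibrated, predicted, weights))
```

In the study, the `fit` and `predict` callbacks would correspond to training and applying the SVR model for one LUE parameter.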

Since it is not known which and how many attributes are needed to explain the model parameters, many attribute combinations were tested. However, for computational reasons, it was not feasible to test all possible combinations. Also, the explanatory power of a specific attribute can depend on the inclusion of another attribute in the training process. This rules out a cumulative procedure starting with one attribute and adding further attributes step by step. Therefore, the attribute selection and training procedure was done iteratively, starting with all attribute candidates (Figure 2b). First, a cross-validation was done with all attributes and the resulting SSEw was stored as a training quality measure. In the next step, each attribute in turn was left out and a cross-validation with the remaining attributes was executed. If the resulting SSEw was smaller, i.e., the model performance was better, the omitted attribute was removed permanently; otherwise it was used again. When all attributes had been left out once, the same procedure was repeated with the remaining set of attributes. After five rounds at most, the set of attributes was stable.

Since the results of this approach depend on the order in which the attributes were left out, the starting configuration was randomly varied 1000 times. Despite the large number of possible configurations of the attribute matrix, the resulting sets of attributes soon repeated, and no new results appeared after 300 repetitions at the latest; hence, the exact position was not decisive. The attribute set with a maximum of 12 attributes that produced the lowest SSEw-value was chosen; amongst equally good results, the set with fewer members and with attributes chosen more frequently across all runs was selected.

The SVR-parameter ϵSVR was fixed for this procedure to the value 0.06. This was somewhat lower than the default value of 0.1 and was found by test runs; it represented a trade-off between the average number of attributes chosen and the goodness of fit. ϵSVR was optimized in a final run once the attributes had been selected. For computational feasibility, the other parameters were optimized during the selection procedure only when the number of attributes changed. The performance of the SVR extrapolation scheme with the final attribute combination was again assessed in a leave-one-out cross-validation. Finally, the LUE model performance using the extrapolated parameters was also evaluated at each site by comparing the modeled FG-values to the observed time series.
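The selection procedure amounts to a randomized backward elimination, which the following sketch illustrates. It is not the authors' code: `score` stands for the cross-validated SSEw of an attribute subset, `toy_score` is a hypothetical replacement for it, and ties are treated as removals, following the caption of Figure 2b.

```python
import random


def backward_elimination(attributes, score, max_rounds=5, seed=0):
    """Leave attributes out one by one; drop those whose omission does not hurt."""
    rng = random.Random(seed)
    selected = list(attributes)
    rng.shuffle(selected)                      # random order of the removal trials
    best = score(selected)
    for _ in range(max_rounds):
        changed = False
        for candidate in list(selected):
            if len(selected) == 1:
                break
            trial = [a for a in selected if a != candidate]
            trial_score = score(trial)
            if trial_score <= best:            # equally good or better: remove
                selected, best = trial, trial_score
                changed = True
        if not changed:                        # attribute set is stable
            break
    return sorted(selected), best


def toy_score(attrs):
    """Hypothetical stand-in for the cross-validated SSEw of an attribute set."""
    missing = {"lai_amp", "vp_start"} - set(attrs)
    return 10.0 * len(missing) + 0.1 * len(attrs)


candidates = ["lai_amp", "vp_start", "noise_a", "noise_b"]
print(backward_elimination(candidates, toy_score))
```

Repeating the call with different `seed` values mimics the random variation of the starting configuration; the best-scoring result across all repetitions would then be kept.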

3. Results

3.1. Selected Attributes

SVR is able to reproduce the six calibrated LUE model parameters in the cross-validation exercise by a combination of seven to twelve attributes (Table 1). It shall be noted that some of the attributes were binary representatives of one and the same characteristic (e.g., vegetation). Hence, it was shown that it is possible to estimate the model parameters at sites where measurement data for a model calibration were not used.

Table 1. Selected Attributes to Explain the LUE Model Parameters by Means of SVRa

[Table body not recoverable from the source; surviving entries: VPstart, dbf, grass, EVI_VPend, ebf, D, r2EF-LAI, c, enf]

a ‘VP’ refers to the vegetation period; ‘CI’ and ‘AI’ are the continentality and aridity indices, respectively. The attribute names used should be self-explanatory; explanations can be found in Table A2.

To explain the ϵmax parameter, radiation, moisture, temperature and physiological stand characteristics were selected. The impact of radiation is represented by the maximum PAR and the average fraction of sunshine. The number of days with precipitation and the aridity index reflect the importance of sufficient water for a high ϵ. The Köppen-Geiger main type ‘D’ indicates sites with a continental climate. Also, the distinction between the vegetation classes of deciduous broadleaf forests, evergreen needleleaf forests, and grasslands, as well as the stand age, were important for extrapolating ϵmax with SVR. Finally, the day of the year at which the growing season starts, and the amplitude of EVI and its maximum are taken into consideration. The number of days with precipitation and the binary variables for the Köppen-Geiger climate class ‘D’ and deciduous broadleaf forest led to the greatest decline in the model performance when omitted (Figure 3a).

Figure 3.

The quality criterion SSEw for each selected attribute when it has been left out, for the model parameters (a) ϵmax, (b) p, (c) Topt, (d) k, (e) EFI, (f) α. The thick black line indicates the SSEw-value when all attributes have been used.

Regarding the parameter p, which determines the influence of the temperature and moisture function on the light use efficiency, those attributes related to seasonal characteristics played a decisive role. In addition to general climatic features (mean temperature, average annual net radiation balance and the fraction of sunshine), the differentiation between deciduous and non-deciduous sites, and between continental climates with strong seasonal characteristics and those without, appeared to be essential for the differences in p between sites. Furthermore, the LAI amplitude within each year, the timing of the onset of the growing season and, finally, the correlation between the annual courses of evaporative fraction (EF) and LAI, and between EF and TS explain p. Omitting any one of the four latter variables strongly deteriorates the extrapolation performance, especially omitting the LAI amplitude attribute (Figure 3b).

The peak of the temperature function Topt can be recaptured by four attributes based on temperature characteristics: the mean soil temperature, the mean air temperature and the continentality index (depending on the temperature difference of the coldest and warmest month), as well as the correlation between TS and EVI. The selection procedure chose the maximum PAR as a further explanatory variable. The length of the vegetation period and the variable indicating deciduous broadleaf forest or not, complement the variables with which SVR can explain the Topt-variations best. Removing the vegetation period attribute had by far the strongest negative effect upon the SVR performance within this combination (Figure 3c).

Attributes related to the seasonal dynamics accounted for five of the eight attributes for the parameter k, which indicates the rate of change of the temperature function, fT (equation (3)): the start of the growing season, the EVI at the end of the vegetation period, the seasonal correlation between TS and EF as well as the attributes for the Köppen-Geiger main type ‘D’ and deciduous broadleaf vegetation class. The mean air and soil temperatures and the grass vegetation class were additionally selected to extrapolate k. Omitting the temperature to EF correlation deteriorated the SVR extrapolation capability the most (Figure 3d).

Eleven attributes had to be applied to extrapolate the inflection point, WI, of the moisture function, which is referred to as EFI in the following because EF serves as the water availability surrogate. Four of them are directly related to EF characteristics (amplitude, maximum, mean EF and the correlation between EF and TS) and one attribute is a water stress measure (aridity index). The two general climate features, mean air temperature and fraction of sunshine, were also among the selected attributes as well as the start of the vegetation period, the EVI, the vegetation class evergreen broadleaf forest and the Köppen-Geiger subtype ‘c’ indicating cool and short summers in temperate and continental climates. The omission of the mean air temperature and of the fraction of sunshine in the SVR extrapolation procedure had the strongest negative impact on the quality measure SSEw (Figure 3e).

The lag parameter, α, was extrapolated with eight attributes, five of which have a connection to seasonal dynamics: the continentality index (which depends on the temperature amplitude), the LAI amplitude, the length and start of the vegetation period, and the correlation between TS and EF. The average fraction of sunshine and the vegetation class attributes of evergreen needleleaf and broadleaf forests complemented the set of features explaining the α-dynamics between sites. Omitting the LAI amplitude or the TS-EF correlation deteriorated the extrapolation quality the most (Figure 3f).

3.2. SVR Performance

With the selected features, all six LUE model parameters can be extrapolated with reasonable results in the framework of the cross-validation (Figure 4). The r2-values ranged between 0.68 for Topt and 0.90 for EFI, and the r2w-values between 0.60 for α and 0.94 for εmax and EFI. The largest deviations occurred for Topt at Metolius Intermediate, for k at Oensingen, and for α at various sites, especially coniferous forests, although their parameter uncertainty with respect to the model calibration is in the medium range. However, α and Topt, the parameters with the lowest SVR cross-validation performance, were also the parameters with the highest calibration uncertainties.

Figure 4.

The calibrated model parameters compared to the parameters extrapolated with SVR by means of the respective other sites (‘leave-one-out’ cross-validation): (a) εmax, (b) p, (c) Topt, (d) k, (e) EFI, (f) α. As quality measures, the coefficient of determination, r2, and the coefficient of determination weighted by the confidence intervals resulting from the parameter calibration, r2w, are given.
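The ‘leave-one-out’ procedure can be illustrated with a minimal Python sketch. It is not the implementation used in this study: the nearest-neighbour predictor is a simple stand-in for SVR, r2 is assumed here to be the squared Pearson correlation, and all attribute and parameter values are hypothetical.

```python
from math import sqrt

def nearest_neighbour_predict(train_x, train_y, x):
    """Stand-in regressor (NOT SVR): return the target of the closest training site."""
    def dist(a, b):
        return sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    best = min(range(len(train_x)), key=lambda i: dist(train_x[i], x))
    return train_y[best]

def leave_one_out(features, targets, predict=nearest_neighbour_predict):
    """Predict the parameter of each site from all other sites."""
    predictions = []
    for i in range(len(features)):
        train_x = features[:i] + features[i + 1:]   # all sites except site i
        train_y = targets[:i] + targets[i + 1:]
        predictions.append(predict(train_x, train_y, features[i]))
    return predictions

def r_squared(observed, predicted):
    """Squared Pearson correlation (assumed definition of r2)."""
    n = len(observed)
    mo, mp = sum(observed) / n, sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    var_o = sum((o - mo) ** 2 for o in observed)
    var_p = sum((p - mp) ** 2 for p in predicted)
    return cov ** 2 / (var_o * var_p)

# Hypothetical site attributes: [mean air temperature, vegetation period length]
features = [[5.0, 150.0], [6.0, 160.0], [12.0, 220.0], [13.0, 230.0], [10.0, 200.0]]
# Hypothetical calibrated parameter values at the same sites
targets = [14.0, 15.0, 22.0, 23.0, 19.0]

loo = leave_one_out(features, targets)
print(loo, r_squared(targets, loo))
```

Any regressor with the same call signature, including an SVR from a machine learning library, could be passed as `predict` without changing the cross-validation loop.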

In addition to the cross-validation, the trained SVR model was run with the data from all sites to determine the number of support vectors, since they allow an additional evaluation of the generalization capability of the SVR model. Using the SVR-parameter ϵSVR as determined in the above selection procedure led to a virtually perfect match of the SVR model applied to all study sites, and the number of support vectors varied between 36 (k) and 41 (Topt, α). As a further test, the SVR-parameter ϵSVR was increased as long as the r2-value remained greater than 0.85, a compromise between precision and generality of the model. The number of support vectors was then determined again. This exercise led to numbers of support vectors between 17 (EFI) and 22 (p). Hence, not surprisingly, the parameters which yielded better results in the cross-validation also tended to need somewhat fewer support vectors. Data points (that is, sites) which experienced larger deviations in the cross-validation consequently served more often than others as support vectors, since they evidently cannot be explained well by the other sites.
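Why a larger ϵSVR tends to reduce the number of support vectors can be illustrated with a simplified sketch. In ϵ-insensitive regression, training points lying strictly inside the ϵ-tube around the fitted function do not act as support vectors. The sketch below assumes a fixed set of hypothetical residuals and only counts the points on or outside the tube; in a real SVR the fitted function itself changes with ϵ, so this is not a substitute for retraining the model.

```python
def support_vector_count(residuals, epsilon):
    """Count points on or outside the epsilon-tube of a given fitted function."""
    return sum(1 for r in residuals if abs(r) >= epsilon)

# Hypothetical residuals (calibrated minus fitted parameter value) at ten sites
residuals = [0.02, -0.15, 0.40, -0.05, 0.31, -0.22, 0.08, 0.55, -0.04, 0.12]

# Widening the tube leaves fewer points on or outside it
counts = {eps: support_vector_count(residuals, eps) for eps in (0.0, 0.1, 0.3)}
print(counts)
```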

3.3. LUE Model Performance

The performance of the final LUE model is certainly more relevant than the accuracy with which the calibrated model parameters are reproduced by the extrapolation scheme. Therefore, the model was run with the extrapolated parameters at all sites. Comparing the derived FG time series with the dynamics of the calibrated model time series revealed a high similarity of the variance, indicated by high r2-values between 0.91 and 1.00 (Table A3). However, the time series were biased in some cases. The largest bias (difference between the means of the time series) was found at Sylvania Wilderness, with a value of 0.99 gC m−2 d−1, accompanied by an RMSE (root mean square error) of 1.71 gC m−2 d−1 compared to an FG amplitude of about 13 gC m−2 d−1. More positive than negative biases occurred, indicating an overprediction of fluxes; the median of the bias values is 6.7. RMSE-values normalized with the range of FG-values (nRMSE) are between 0.01 (Willow Creek) and 0.14 (Wind River, Blodgett).
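For clarity, the performance measures named above can be written out as a short Python sketch. The sign convention (modeled minus reference, so that positive values indicate overprediction) and the use of the reference series for the range in the nRMSE are assumptions; the flux values are hypothetical.

```python
from math import sqrt

def bias(modeled, reference):
    """Difference between the means; positive values indicate overprediction."""
    return sum(modeled) / len(modeled) - sum(reference) / len(reference)

def rmse(modeled, reference):
    """Root mean square error between two time series of equal length."""
    return sqrt(sum((m - r) ** 2 for m, r in zip(modeled, reference)) / len(reference))

def nrmse(modeled, reference):
    """RMSE normalized with the range of the reference values (assumed)."""
    return rmse(modeled, reference) / (max(reference) - min(reference))

# Hypothetical daily FG values in gC m-2 d-1
fg_reference = [1.0, 3.0, 5.0, 7.0]
fg_extrapolated = [2.0, 4.0, 6.0, 8.0]

print(bias(fg_extrapolated, fg_reference),
      rmse(fg_extrapolated, fg_reference),
      nrmse(fg_extrapolated, fg_reference))
```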

The comparison with the measured FG time series is certainly more important. It shows that the FG-values derived with the extrapolated parameters explain large parts of the measured variations, with r2-values between 0.43 at Donaldson and 0.95 at UMBS; these sites already had the lowest and highest calibration performance. Indeed, the r2-values had an average of 0.82, very similar to that of the calibration (r2 = 0.84). Only in one case (Wind River) does the coefficient of determination differ by as much as 0.1. At this site, the temperature function is given more weight by the extrapolated p-value, and its shape leads to higher light use efficiencies at high temperatures, which results in a strong over-prediction of FG-fluxes in summer. This behavior is aggravated by an additional slight over-prediction of εmax. Considering all sites, the biases range between 0.01 and 1.12; the latter occurs at Griffin, which has an FG-range of about 12 gC m−2 d−1. The frequency distribution of all biases yields a mode of 0.00 gC m−2 d−1 (with a class-width of ±0.10) and a median of 0.08. RMSE-values range between 0.62 and 2.16, with a mean of 1.25. Normalized RMSE-values vary between 0.06 and 0.19; the latter is found again at Donaldson. Relating these nRMSE-values to those resulting from the model calibration, however, shows a stronger deterioration of this model performance measure due to the extrapolation process than for the r2-values. Hence, the correlation of the extrapolated time series is not strongly affected by the extrapolation. Rather, the absolute deviations from the measured time series are somewhat more influenced due to the observed biases, which are discussed below.

Diagnostic models such as the LUE model presented are a suitable tool for carbon budgeting [Beer et al., 2010]. Consequently, an important criterion for a light use efficiency model is not only that it simulates the temporal evolution of the gross primary productivity with adequate accuracy, but also that it reliably calculates the cumulative sum of carbon uptake over a considered area within a specific time frame. Therefore, the cumulative sums of the modeled and extrapolated FG time series were compared with the measured ones for the whole time series available. Figure 5 shows four examples of this comparison. The relative error of the sum of the carbon uptake per year was calculated. Values ranged between −23 % and 37 % difference per year; these extremes are found at Sylvania Wilderness and Griffin, both of which show large biases. Griffin already had a relatively low model calibration performance in terms of a quite large bias and wide uncertainty bounds of four parameters, including those of the dominating subfunction fT. At Sylvania Wilderness, in contrast, the model performance was acceptable; only the uncertainty bounds of fT were somewhat wider than usual. In this case, the extrapolation performance was not satisfactory. In particular, the parameters of fT could not be adequately recaptured. Taking all sites into account, the mean of the relative differences was 2 % and the median was 7 %; the error distribution (in terms of relative differences per year) between the extremes at these two sites is therefore positively biased. Considering the absolute values of these errors (instead of the positive and negative differences), the relative errors range between 0.4 % and 37 %; the mean of these values is 13 %, the median is 12 %.
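The relative error of the annual sums can be sketched as follows. The function name and the data are hypothetical, and the sign convention (modeled minus measured, relative to the measured sum) is an assumption consistent with the description above.

```python
def annual_relative_errors(years, measured, modeled):
    """Relative error (in %) of the annual sums of modeled versus measured FG."""
    sums = {}
    for year, obs, sim in zip(years, measured, modeled):
        obs_sum, sim_sum = sums.get(year, (0.0, 0.0))
        sums[year] = (obs_sum + obs, sim_sum + sim)
    return {year: 100.0 * (sim_sum - obs_sum) / obs_sum
            for year, (obs_sum, sim_sum) in sums.items()}

# Hypothetical daily values (two days per year only, for brevity)
years = [2001, 2001, 2002, 2002]
fg_measured = [2.0, 2.0, 4.0, 4.0]
fg_modeled = [2.5, 2.5, 3.0, 4.0]

errors = annual_relative_errors(years, fg_measured, fg_modeled)
mean_abs_error = sum(abs(e) for e in errors.values()) / len(errors)
print(errors, mean_abs_error)
```

Averaging the signed errors shows whether over- and underpredictions cancel, whereas averaging the absolute values, as in the last line, gives the typical magnitude of the error per year.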

Figure 5.

The measured (black) temporal evolution of FG compared to the time series resulting from the calibration (blue) and parameter extrapolation (red), shown by way of example for three study sites: (a) UMBS, (b) Neustift, (c) Donaldson. Overall, UMBS achieved the highest r2-value (0.95), Neustift a medium r2 of 0.82, and Donaldson the lowest r2 of 0.43.

4. Discussion

Using support vector regression, the optimized parameters of a light use efficiency model have been related to climatic and biophysical site characteristics in order to determine model parameters at sites where a calibration is not possible. This extrapolation approach was evaluated by a ‘leave-one-out’ cross-validation. Comparing the extrapolated parameters with the calibrated ones at each site shows a good correlation, with r2w-values between 0.6 and 0.94. The fluxes modeled with these parameters correlated very well with the fluxes originating from the calibration model results (r2-values > 0.91) but show biases of 12 % per year on average with respect to the cumulative sum, and 5 % with respect to the means of the GPP time series. The better correlation of the fluxes than that of the model parameters themselves is a consequence of the fact that a large part of the FG-dynamics is explained by PAR or APAR itself [Jenkins et al., 2007]. The comparison between the fluxes resulting from model runs with the extrapolated parameters and the measured FG-values revealed r2-values in the range of 0.43 to 0.95, which is only marginally lower than the r2-values of the calibration performance in most cases. Simulating the fluxes at Donaldson, where the lowest model and extrapolation performance occurred, also proved difficult in the studies of Yuan et al. [2007, 2010]. Due to the observed biases between the modeled and measured data sets, however, the normalized RMSE-values differ more. Nonetheless, the deterioration of the cumulative FG-sums due to the extrapolation procedure is on average 7 %.

The observed biases were introduced by inaccurate extrapolations of the model parameters. For example, an overestimation of εmax in the extrapolation procedure will also lead to an overestimation of the modeled GPP time series. Another large uncertainty factor within this extrapolation framework, which could lead to lower extrapolation and model performances, is the input of MODIS LAI/FPAR data. They have often been found to be inaccurate under certain conditions [Wang et al., 2005; Pandya et al., 2006; Pisek and Chen, 2007; Horn and Schulz, 2010], especially for needleleaf forests [Wang et al., 2004; Yang et al., 2006a]. Additionally, there is a distinct scale mismatch between EC measurements and MODIS data, and the linkage between these two data sources is complicated by the variability of the area for which EC measurements are representative [Chen et al., 2009, 2010]. However, remote sensing data are the only data source for these variables at all study sites, so it is common practice to use this product despite its limitations [Yuan et al., 2007; Xiao et al., 2008, 2011]. Against this background, it is also used in this study, with awareness of its drawbacks.

Considering other studies simulating the gross primary production on a daily basis and using similar performance measures, the results of this study compare very well (Table 2). Yuan et al. [2007] calibrated a light use efficiency model with 12 AmeriFlux sites and validated it with 16 other sites, which yielded r2-values of 0.84 and 0.77, respectively; the relative validation error (as % difference of the simulated and observed means relative to the observed mean) was about 18 %. Yang et al. [2007] trained an SVM model with 36 AmeriFlux sites for the years 2000–2003 and predicted the fluxes of 2004 with an r2-value of 0.71, an RMSE of 1.87 and an average error of 28 % with respect to the cumulative sums. Both models outperformed the MODIS-GPP algorithm. The calibration of two process-based photosynthesis models using five consecutive measurement years of a FLUXNET forest site by Verbeeck et al. [2008] resulted in r2-values of 0.72 and 0.73, an RMSE of about 2.3 and an average error of 17 and 18 % per year with regard to the cumulative GPP sums. It is clear that the results cannot be compared directly due to differences in the data processing methods, differing degrees of freedom, as well as the number and type of validation and calibration sites, but they do reveal a tendency. The presented modeling and extrapolation scheme performs at least as well as previously proposed models, with its average model calibration r2 of 0.84, a cross-validation r2 of 0.82, an average RMSE of 1.25 and an average relative error per year of 13 % with regard to the cumulative sums, and 6 % with regard to the means of the GPP time series.

Table 2. Comparison of the Model Performance of the Proposed Model and Extrapolation Scheme With the Calibration and Validation Performance of Other Studies Using FLUXNET Sites^a

Study | r2 (calibration) | r2 (validation) | RMSE | Bias (means) | RE | Sites and setup
current | 0.84 | 0.82 | 1.25 | 5 % | 13 % | 42 sites, leave-one-out cross-validation. DoF: 6.
Yuan et al. [2007] | 0.85 | 0.77 | - | 17 % | 18 % | 12 forest sites for calibration, 16 for validation of a LUE model. DoF: 2.
Yang et al. [2007] | - | 0.71 | 1.87 | - | 28 % | 36 sites, SVR model trained with data from 2000–2003, validated with 2004. DoF: -
Verbeeck et al. [2008] | 0.78 / 0.80 | 0.72 / 0.73 | 2.33 / 2.38 | 2 % | 17 / 18 % | Two leaf scale photosynthesis models applied at one site to 5 years; 1 of these years served for calibration. DoF: 24 (6 parameters calibrated, 18 defined a priori)

^a The RMSE is in gC m−2 d−1. RE refers to the difference between the annual sums of measured and modeled gross primary production relative to the measured sums. DoF: degrees of freedom.

The composition of the automatically selected attribute sets for the respective parameters showed that the selection appears to follow biophysically meaningful patterns. For εmax, the maximum PAR and the average annual fraction of sunshine are the most intuitive and directly related explanatory attributes; the fraction of sunshine or the cloud coverage, respectively, and therefore diffuse PAR, have been discussed several times in recent years in this context and were shown to be an important factor influencing the light use efficiency [Medlyn, 1998; Schwalm et al., 2006; Jenkins et al., 2007]. Three main vegetation classes were also selected. The vegetation type has often been shown to strongly influence εmax [Running et al., 2004]. Garbulsky et al. [2010] showed in their comprehensive analysis that εmax was also determined by the vegetation type, but in the first instance by precipitation. It is therefore not surprising that two attributes, the number of days with precipitation and the aridity index, account for this climatic variable. The inclusion of the EVI-attributes was supported by the studies of Wu et al. [2010] and Sims et al. [2008], who found EVI to be capable of capturing light use efficiency variations. The timing of the start of the growing season was shown by Falge et al. [2002] and Schwalm et al. [2006] to influence maximum carbon uptake, especially in boreal climates. The respective boreal Köppen-Geiger class was also selected for the SVR extrapolation. Finally, the effect of stand age has also often been shown to be important for light use efficiency differences between sites, e.g., by Desai et al. [2008].

The outcome of the selection process of the other five parameters cannot be directly compared to other studies since they are model specific. However, it can be discussed whether the sets of selected attributes appear to be biophysically meaningful and reveal biophysical characteristics of the respective parameters. In addition to the three general climate characteristics, mean air temperature, average annual fraction of sunshine and annual radiation net balance, the parameter p (balancing temperature and moisture influences) was explained by attributes describing the seasonality of the sites as well as the interdependence of temperature, EF and LAI. This selection is consistent with the fact that EF as a water availability proxy is not a pure moisture indicator, but integrates system dynamics. It is not solely an index of water deficit connected with soil moisture and thereby precipitation; it is also linked to the temperature gradient between surface and atmosphere and to the biophysical process of stomatal carbon exchange [Schwalm et al., 2010]. Hence, accounting for the correlations between EF, LAI and TS when determining the magnitude of p, and thereby the influence of EF and TS, is the logical conclusion. Intimately connected with these considerations are attributes indicating strong seasonal dynamics due to dormant periods, such as the D-climate and the vegetation class deciduous forest, because in these periods the components can be decoupled from each other. The carbon assimilation of needleleaf forests can be very low due to low temperatures in D-climates despite LAI-values greater than zero. In deciduous forests, FG is not necessarily correlated with EF in the dormant period, and there is no photosynthesis when the weather conditions are temporarily favorable for photosynthesis but the trees are still bare-branched.

It is not surprising that four temperature attributes were found to explain the variations of the parameter Topt. The attributes indicating the length of the vegetation period and deciduous broadleaf forests can be explained by the pattern observable in the study data. In coniferous forests, higher light use efficiencies occurred more often in spring or autumn when the temperature conditions were favorable but the solar radiation was not very high. The light use efficiency of deciduous broadleaf forests, however, followed more the course of the temperature with leaf development corresponding to the temperature increase in spring, which led to higher light use efficiencies occurring at higher temperatures when the leaves were fully developed. The parameter k, as the rate of change of the temperature function fT, was closely linked to the parameter Topt and intrinsically connected to seasonality indicators. It was therefore comprehensible that attributes corresponding to these characteristics dominated this attribute set.

Four attributes predicting site variations of the inflection point, EFI, of the moisture function, fW, were directly related to water availability surrogates. The attribute evergreen broadleaf forests was part of this set. That this vegetation class appeared in this attribute set can be justified by the adaptation of evergreen broadleaf forests to a warmer Mediterranean climate with elevated drought risks through a better water use efficiency [Pereira et al., 2007], and thus a generally lower EFI than average.

Finally, all attributes used to explain α (except for the fraction of sunshine) are related to seasonal dynamics and indicate large or small seasonal differences of temperature or moisture. This makes sense considering the fact that stronger seasonal differences in environmental conditions tend to lead to lag effects regarding the reaction of plants to these driving forces. However, the capability of SVR to recapture the parameter α was lowest compared to the other parameters. This is especially true for coniferous forests. However, it has to be kept in mind that, as a general trend, the parameter uncertainties were highest at coniferous and mixed forest sites. Furthermore, a suboptimal reconstruction of α is found at sites having a rather low p-value, thus a higher influence of the EF-function. A reason for this could be that α is not always used by the model parameter optimization process to account for actual lag effects but also to compensate for model deficiencies. The usage of this parameter has therefore to be reconsidered in further studies.

This study also analyzed which attribute in each set had the most negative effect on the performance of the respective extrapolation when the SVR model was executed without it. This certainly indicates the importance of the attribute, but only within the selected set of features. Another important indicator for the importance of an attribute for a specific parameter is the frequency with which it was selected amongst all performed selection loops. For example, the mean air temperature and the fraction of sunshine have the largest negative impact on the EFI-extrapolation. The EF-attributes, in contrast, seem to be the least important, but they compensate for each other to a certain extent when one of them is left out. Furthermore, none of the sets resulting from the multiple runs of the selection process lacks EF-attributes, but there are sets without the mean air temperature, and even more without the fraction of sunshine.
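The leave-one-attribute-out analysis, and the compensation effect among correlated attributes, can be illustrated with a toy Python sketch. The scoring function is entirely hypothetical: it stands in for a cross-validated quality measure of the SVR and simply encodes that two EF attributes carry redundant information.

```python
def toy_score(selected):
    """Hypothetical quality measure (higher is better) for an attribute set."""
    s = set(selected)
    value = 0.0
    if "meanTA" in s:
        value += 0.25
    if "sunshine" in s:
        value += 0.20
    if s & {"meanEF", "maxEF"}:   # redundant pair: one of the two is sufficient
        value += 0.40
    return value

def ablation_ranking(attributes, score):
    """Rank attributes by the loss in score when each one is left out alone."""
    full = score(attributes)
    loss = {a: full - score([b for b in attributes if b != a]) for a in attributes}
    return sorted(loss.items(), key=lambda item: item[1], reverse=True)

attributes = ["meanTA", "sunshine", "meanEF", "maxEF"]
ranking = ablation_ranking(attributes, toy_score)
# Loss when both redundant EF attributes are removed together
group_loss = toy_score(attributes) - toy_score(["meanTA", "sunshine"])
print(ranking, group_loss)
```

In this toy setting, removing a single EF attribute costs nothing, although removing both costs more than removing any other attribute. A single-attribute ablation can therefore understate the importance of a group of mutually compensating attributes.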

Overall, the day of year of the start of the vegetation period was the most frequently selected attribute. In the only set of attributes in which it was not a member, the length of the vegetation period was selected instead. The binary attribute ‘deciduous broadleaf forest’ was found in four of the six sets. The attributes mean air temperature, average annual fraction of sunshine, the correlation between the time courses of temperature and EF, and the continental Köppen-Geiger climate class ‘D’ each help to explain three of the six parameters. Hence, amongst the most frequently selected attributes, two are related to climate characteristics and four to seasonal dynamics, where we assign ‘D’ and the attribute deciduous forest to the latter group since they indicate strong seasonality. If we assign all selected attributes to four categories (climate, seasonality, phenology, and vegetation class), most attributes fall into the climate category (about half of which are related to moisture characteristics), followed by phenology and seasonality with an equal number of matches. The attributes ‘D’, the continentality index, the vegetation class deciduous forest, and the amplitudes of the variables EF, LAI and EVI can also be classified as indicators of seasonality; in this case, the seasonality category clearly dominates the attributes. This outcome is of practical importance for future diagnostic model building exercises.
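Counting how often an attribute appears across the parameter-specific sets is straightforward; a minimal sketch is given below. The three attribute sets are hypothetical and much smaller than the sets selected in this study.

```python
from collections import Counter

def selection_frequency(selected_sets):
    """Count in how many parameter-specific sets each attribute appears."""
    return Counter(attr for attrs in selected_sets.values() for attr in attrs)

# Hypothetical attribute sets for three model parameters
selected_sets = {
    "parameter_1": {"VPstart", "meanTA", "DBF"},
    "parameter_2": {"VPstart", "sunshine", "r2TS-EF"},
    "parameter_3": {"vegLength", "meanTA", "DBF"},
}

frequency = selection_frequency(selected_sets)
# Sort by descending frequency, then alphabetically for a stable order
ranking = sorted(frequency.items(), key=lambda item: (-item[1], item[0]))
print(ranking)
```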

In summary, this study showed the developed regionalization scheme to be suitable for extrapolating the calibrated, site-specific parameters of a LUE model to locations outside the calibration domain where no EC flux measurements are available. Based on the non-linear relationships between the model parameters and biophysical site characteristics derived using SVR, the model parameters could be determined with reasonable precision in most cases; this was shown by a cross-validation. The resulting time series of carbon uptake modeled with the extrapolated parameters yielded good correlations similar to those of the original model with the calibrated parameters. However, a bias was introduced in some cases, leading to deviations of the annual sum of assimilated carbon of 13 % per year on average. Amongst the attributes selected, those related to seasonality characteristics dominated. It is clear that, due to its high adaptation capacity, SVR could also perform well with similar sets of features. However, it is also obvious from the outcomes that SVR cannot extrapolate the parameters with arbitrary attributes. This fact undermines the argument that SVR is not suitable for such a task because it can somehow perform with any variable combination via the high-dimensional feature space. The non-arbitrary appearance of the attribute selection also attests a certain biophysical meaning to the parameters of the LUE model and does not indicate a purely empirical nature, although the lag parameter has to be reconsidered. In a future study, however, a more efficient algorithm to select the characteristic attributes could be tested. The stationarity of the derived mapping characteristics, especially under changing data and location availability, will also be interesting to explore.

The exercise challenging the SVR model with an increase of the SVR parameter ϵSVR, which is responsible for the SVR generalization capacity, shows a certain capability of the applied method in this regard. But clearly, the number of sites in this study was at its minimum given the broad range of vegetation and climate classes. Consequently, it would be worthwhile to repeat this exercise with a larger data set that covers more combinations of vegetation and climate classes and subtypes. A model application continuous in space and time would then also become possible over large and diverse areas. To realize this, the relationships found between the calibrated model parameters and the biophysical characteristics, which have been extracted for each parameter in this study by means of SVR, will allow the determination of an individual set of parameters for each model grid cell. The necessary characteristic attributes were chosen such that they can be retrieved from remote sensing and meteorological databases.

Appendix A: Study Sites, Site Characteristics, and Model Performance

The appendix provides information on the FLUXNET study sites and years as well as their vegetation and climate classes (Table A1). It specifies the site characteristics used to spatially extrapolate the model parameters (Table A2) and shows extrapolation performance measures with respect to the measured time series and those resulting from the model calibration (Table A3).

Table A1. Main Characteristics of the Study Sites^a

Site (FLUXNET ID) | Vegetation | Climate | Years | Reference
Black Hills (US-Blk) | ENF | Dfa | 2004–2006 | Wilson and Meyers [2007]
Blodgett (US-Blo) | ENF | Csb | 2002–2006 | Goldstein et al. [2000]
Boreas (CA-Man) | ENF | Dfc | 1995–2005 | Goulden et al. [2006]
Donaldson (US-SP3) | ENF | Cfa | 2001–2004 | Gholz and Clark [2002]
Flakaliden (SE-Fla) | ENF | Dfc | 2000–2002 | Wallin et al. [2001]
GLEES (US-GLE) | ENF | Dfc | 2006–2008 | Massman and Clement [2005]
Griffin (UK-Gri) | ENF | Cfb | 1998, 2000–2001 | Clement et al. [2003]
Hyytiälä (FI-Hyy) | ENF | Dfc | 1997–2006 | Suni et al. [2003]
Le Bray (FR-LBr) | ENF | Cfb | 2001–2003 | Berbigier et al. [2001]
Loobos (NL-Loo) | ENF | Cfb | 1997–2006 | Dolman et al. [2002]
Metolius Interm. (US-Me2) | ENF | Csb | 2002–2005, 2007 | Anthoni et al. [2002]
Metolius Young (US-Me5) | ENF | Csb | 2002–2002 | Anthoni et al. [2002]
Niwot Ridge (US-NR1) | ENF | Dfc | 1999–2006 | Sacks et al. [2006]
Norunda (SE-Nor) | ENF | Dfb | 1996–2005 | Lagergren et al. [2005]
Tharandt (DE-Tha) | ENF | Dfb | 1997–2003 | Grünwald and Bernhofer [2007]
Wetzstein (DE-Wet) | ENF | Dfb | 2002–2008 | Rebmann et al. [2010]
Wind River (US-Wrc) | ENF | Csb | 1999–2004, 2006 | Shaw et al. [2004]
Bartlett (US-Bar) | DBF | Dfc | 2004–2007 | Jenkins et al. [2007]
Duke Hardwood (US-Dk2) | DBF | Cfa | 2001–2005 | Stoy et al. [2005, 2007]
Hainich (DE-Hai) | DBF | Dfb | 2000–2007 | Mund et al. [2010]
Hesse (FR-Hes) | DBF | Cfb | 1997–2007 | Granier et al. [2008]
MMSF (US-MMS) | DBF | Dfa | 1999–2006 | Schmid et al. [2000]
Missouri Ozark (US-MOz) | DBF | Dfa | 2005–2008 | Gu et al. [2006, 2007]
Roccarespampani (IT-Ro1) | DBF | Csa | 2001–2003 | Keenan et al. [2009]
Soroe (DK-Sor) | DBF | Cfb | 1997–2005 | Pilegaard et al. [2003]
Sylvania Wilderness (US-Syv) | DBF | Dfb | 2002–2004 | Desai et al. [2005]
UMBS (US-UMB) | DBF | Dfb | 1999–2003 | Gough et al. [2008]
Willow Creek (US-WCr) | DBF | Dfb | 2000–2006 | Cook et al. [2004]
Castelporziano (IT-Cpz) | EBF | Csa | 2002–2003 | Seufert et al. [1997]
Puechabon (FR-Pue) | EBF | Csb | 2001–2008 | Allard et al. [2008]
Audubon (US-Aud) | G | BSh | 2004–2008 | Wilson and Meyers [2007]
Goodwin Creek (US-Goo) | G | Cfa | 2004–2006 | Wilson and Meyers [2007]
Lethbridge (CA-Let) | G | Dfb | 1999–2004 | Flanagan [2009]
Neustift (AT-Neu) | G | Dfb | 2002, 2005–2007 | Wohlfahrt et al. [2008]
Oensingen (CH-Oe1) | G | Dfb | 2002–2007 | Ammann et al. [2009]
Peck (US-FPe) | G | BSk | 2000–2006 | Wilson and Meyers [2007]
Vaira Ranch (US-Var) | G | Csa | 2001–2007 | Ma et al. [2007]
Brasshaat (BE-Bra) | MF | Cfb | 1997–2008 | Carrara et al. [2003, 2004]
Duke (US-Dk3) | MF | Cfa | 1999–2002 | Siqueira et al. [2006]
Harvard (US-Ha1) | MF | Dfb | 1992–2007 | Urbanski et al. [2007]
Howland (US-Ho3) | MF | Dfb | 1996–2004 | Hollinger et al. [2004]
Vielsalm (BE-Vie) | MF | Cfb | 2000–2008 | Aubinet et al. [2001]

^a Vegetation: deciduous broadleaf forest (DBF), mixed (MF), evergreen needleleaf (ENF), evergreen broadleaf (EBF), grass (G). Köppen-Geiger climate classes: steppe climate (BS), temperate (C), continental (D); summer dry (s), fully humid (f); hot (h), cold in winter (k); hot summer (a), warm summer (b), cool summer (c), cold winter (d).

Table A2. Site Characteristics Used to Regionalize the Model Parameters^a

Attribute | Description [unit]

Vegetation and Climate Classes
B, C, D | Main type of the Köppen-Geiger classification [-]
f, s | Subtype 1 of the Köppen-Geiger classification [-]
a, b, c | Subtype 2 of the Köppen-Geiger classification [-]
ENF, DBF, MF, EBF, G | Vegetation class [-]

Climate Characteristics
meanTS | mean annual soil temperature (TS) [°C]
diffTS | amplitude of TS [°C]
maxTS | maximum TS [°C]
meanTA | mean annual air temperature (T) [°C]
sumP | average sum of precipitation per year [mm]
Pdays | number of days with precipitation [d]
maxPAR | maximum photosynthetically active radiation (PAR) [MJ m−2 d−1]
sumPAR | average annual cumulative sum of PAR [MJ m−2 d−1]
sunshine | average fraction of sunshine per year (according to LocClim) [%]
Rnet | annual net radiation balance [MJ m−2 d−1]
maxEF | average maximum evaporative fraction (EF) [-]
minEF | average minimum EF [-]
diffEF | amplitude of the EF [-]
meanEF | average EF [-]
PE | potential monthly evaporation (according to LocClim) [mm]
AI | aridity index [-] [Budyko, 1958]: AI = 100·Rnet/(Psum·λ)
CI | continentality index [-] [Conrad, 1950]: CI = 1.7·TA/sin(ϕ+10)−14

Physiological Characteristics
maxLAI | maximum leaf area index (LAI) [m2 m−2]
diffLAI | amplitude of the LAI [m2 m−2]
meanLAI | average LAI [m2 m−2]
maxEVI | maximum enhanced vegetation index (EVI) [-]
minEVI | minimum EVI [-]
diffEVI | amplitude of the EVI [-]
meanEVI | average EVI [-]
age | age of the vegetation stand [a]

Seasonal Characteristics
vegLength | average length of the vegetation period [d]
VPstart | average day of year of the start of the vegetation period
VPend | average day of year of the end of the vegetation period
VPstartEVI | average EVI at the start of the vegetation period [-]
VPendEVI | average EVI at the end of the vegetation period [-]
VPstartLAI | average LAI at the start of the vegetation period [-]
VPendLAI | average LAI at the end of the vegetation period [-]
r2TS-LAI | correlation between the TS and LAI time series [-]
r2EF-LAI | correlation between the EF and LAI time series [-]
r2TS-EF | correlation between the TS and EF time series [-]
r2EF-EVI | correlation between the EF and EVI time series [-]
r2TS-EVI | correlation between the TS and EVI time series [-]

^a Köppen-Geiger climate and vegetation classes: see caption of Table A1. λ: latent heat of vaporization; ϕ: geographical latitude; TA: temperature difference between the coldest and warmest month.

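The two index formulas listed in Table A2 can be evaluated with a few lines of Python. This is a sketch under stated assumptions: the latitude is taken in degrees, the units of Rnet, Psum and λ are assumed to be mutually consistent (for example MJ m−2 per year, mm per year and MJ kg−1), and the input values are hypothetical.

```python
from math import radians, sin

def continentality_index(ta_range, latitude_deg):
    """CI = 1.7 * TA / sin(phi + 10) - 14, with phi in degrees (assumed)."""
    return 1.7 * ta_range / sin(radians(latitude_deg + 10.0)) - 14.0

def aridity_index(rnet, precipitation_sum, latent_heat=2.45):
    """AI = 100 * Rnet / (Psum * lambda); lambda defaults to 2.45 MJ kg-1 (assumed)."""
    return 100.0 * rnet / (precipitation_sum * latent_heat)

# Hypothetical site: 20 K annual temperature range at 50 degrees latitude
ci = continentality_index(20.0, 50.0)
# Hypothetical site: 2000 MJ m-2 net radiation and 800 mm precipitation per year
ai = aridity_index(2000.0, 800.0)
print(round(ci, 2), round(ai, 2))
```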
Table A3. SVR Extrapolation Performance Related to the Calibrated Model and Related to the Measured Time Series of Carbon Uptake^a

Calibrated Model | Measured Time Series of Carbon Uptake

^a The bias is the difference between the means, nRMSE is the RMSE normalized with the range of FG-values, RE (relative error) refers to the difference between the annual sums of measured and modeled gross primary production relative to the measured sums.


We greatly appreciate the careful and constructive review of this manuscript by Bano Mehdi. We gratefully acknowledge all FLUXNET investigators generously providing level 2 data for this analysis. Specifically, we thank: (1) Christof Ammann of the Swiss Federal Research Station ART, Zürich, for data of Oensingen; (2) Marc Aubinet of the Universitaire des Sciences Agronomiques for data of Vielsalm; (3) Dennis Baldocchi of the University of California, Berkeley, for data of Vaira Ranch; (4) Paul Berbigier of INRA-Bioclimatology for data of Le Bray; (5) Christian Bernhofer of the TU Dresden for data of Tharandt; (6) Ken Bible of the University of Washington for data of the Wind River Canopy Crane Research Facility; (7) Paul Bolstad of the University of Minnesota, Bruce Cook from NASA, Kenneth J. Davis of The Pennsylvania State University, and Ankur Desai of the University of Wisconsin for data of Sylvania Wilderness and Willow Creek; (8) Alexander Cernusca and Georg Wohlfahrt of the University of Innsbruck for data of Neustift; (9) Reinhart Ceulemans and all members of the Research Group of Plant and Vegetation Ecology of the Department of Biology, University of Antwerp, for data of Brasshaat; (10) Peter S. Curtis of The Ohio State University for data of University of Michigan Biological Station (UMBS); (11) Hans-Peter Schmid and Danilo Dragoni of Indiana University, funded by the U.S. Department of Energy Terrestrial Carbon Processes program (DOE TCP), for data of Morgan Monroe State Forest (MMSF); (12) Lawrence B. Flanagan of the University of Lethbridge for data of Lethbridge; (13) Allen H. Goldstein of the University of California, Berkeley, for data of Blodgett; (14) Andre Granier of the French National Institute for Agricultural Research for data of Hesse; (15) Lianhong Gu of the Oak Ridge National Laboratory for data of Missouri Ozark; (16) David Y. Hollinger of the U.S. Forest Service for data of Howland Forest; (17) Gabriel G. Katul of Duke University for data of Duke Forest (Loblolly Pine); (18) Werner L. Kutsch of Max-Planck-Institute for Biogeochemistry at Jena (now at the German Federal Research Institute for Rural Areas, Forestry and Fisheries) for data of Hainich; (19) Beverly E. Law of Oregon State University for data of Metolius funded by DOE grant DE-FG02-06ER64318; (20) Anders Lindroth of Lund University for data of Flakaliden and Norunda; (21) Timothy A. Martin of the University of Florida for data of Donaldson; (22) William J. Massman of U.S. Forest Service for data of GLEES; (23) Tilden Meyers of NOAA for data of the study sites Black Hills, Audubon, Goodwin Creek, Peck; (24) John B. Moncrieff and R. J. Clement of the School of GeoSciences, The University of Edinburgh, as part of the CarboEurope programme for data of Griffin; (25) Russell Monson of the University of Colorado for Niwot Ridge; (26) Eddy J. Moors of Wageningen University for data of Loobos; (27) J. William Munger of Harvard University for data of Harvard Forest; (28) Ram Oren, Paul Stoy, Gaby Katul, Kim Novick, Mario Siqueira and Jehnyih Juang of Duke University for data of Duke Forest (Hardwood); (29) Kim Pilegaard of Risø DTU National Laboratory for data of Soroe; (30) Serge Rambal of CEFE-CNRS for data of Puechabon; (31) Corinna Rebmann of the Max-Planck-Institute for Biogeochemistry at Jena for data of Wetzstein; (32) Andrew D. Richardson of Harvard University for data of Bartlett; (33) Riccardo Valentini of the University of Tuscia for data of Roccarespampani, Castelporziano; (34) Timo Vesala of the University of Helsinki for data of Hyytiälä; (35) Steve Wofsy from Harvard University (1994–2005) and Brian Amiro from University of Manitoba (2005–2008) for data of Boreas. The data were downloaded from the CarboEuropeIP Ecosystem Component Database supported by the European Commission, as well as from the AmeriFlux data archive at the Carbon Dioxide Information Analysis Center (CDIAC) of DOE's (U.S. Department of Energy) Oak Ridge National Laboratory (ORNL). This study was funded by the Deutsche Forschungsgemeinschaft (DFG, grant SCHU 1271/4–1/2).