What eddy-covariance measurements tell us about prior land flux errors in CO2-flux inversion schemes



[1] To guide the future development of CO2-atmospheric inversion modeling systems, we analyzed the errors arising from prior information about terrestrial ecosystem fluxes. We compared the surface fluxes calculated by a process-based terrestrial ecosystem model with daily averages of CO2 flux measurements at 156 sites across the world in the FLUXNET network. At the daily scale, the standard deviation of the model-data fit was 2.5 gC·m−2·d−1; temporal autocorrelations were significant at the weekly scale (>0.3 for lags less than four weeks), while spatial correlations were confined to within the first few hundred kilometers (<0.2 after 200 km). Separating out the plant functional types did not increase the spatial correlations, except for the deciduous broad-leaved forests. Using the statistics of the flux measurements as a proxy for the statistics of the prior flux errors was shown not to be a viable approach. A statistical model allowed us to upscale the site-level flux error statistics to the coarser spatial and temporal resolutions used in regional or global models. This approach allowed us to quantify how aggregation reduces error variances, while increasing correlations. As an example, for a typical inversion of grid point (300 km × 300 km) monthly fluxes, we found that the prior flux error follows an approximate e-folding correlation length of only 500 km, with correlations from one month to the next as large as 0.6.

1. Introduction

[2] Carbon dioxide (CO2) fluxes at the Earth's surface may be recovered (or inverted) from the observed spatial and temporal gradients of the CO2 concentrations in the atmosphere by applying Bayes' theorem [e.g., Enting et al., 1995; Bousquet et al., 2000; Gurney et al., 2002]. Atmospheric mixing makes the problem ill-constrained and therefore prior information about the CO2 flux originating from the land and water surface is also used in the inversion process. In statistical terms, this approach transforms the prior probability density p(x) about the CO2 fluxes, jointly called state vector x here, into the posterior probability density p(x∣y) conditioned on atmospheric measurements, jointly called y. The statistically optimal estimator of the fluxes, given the available information, corresponds to the maximum of the function p(x∣y). By design, it critically depends on the assumed prior density function p(x). Under the numerically convenient assumption of a multivariate Gaussian density, describing p(x) requires assigning means, variances and correlations. The atmospheric inversion studies of CO2 fluxes published so far have assumed various probability distributions centered on climatology, regional inventory statistics or the output of terrestrial ecosystem models, as well as ocean carbon cycle models [Gurney et al., 2002]. In practice, some of the key characteristics of the prescribed a priori flux error distributions p(x) in use stem from the capacity of the current flux-inversion systems to deal with large state vectors x, rather than from the statistics of the inference problem: the largest correlation patterns in space and time are specified in the case of classical analytical systems (i.e., coarse-region inversions [e.g., Gurney et al., 2002]), while the narrowest structures (i.e., pixel size) can be introduced in the variational (i.e., adjoint-based) schemes [Chevallier et al., 2005; Rödenbeck, 2005; Baker et al., 2006]. Ensemble methods lie in between [Zupanski et al., 2007; Peters et al., 2007; Feng et al., 2009]. This subjective choice of error correlation structures critically influences the way the information from a single atmospheric measurement is spread in space and time in the flux inversion systems.
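To make concrete how the prescribed prior correlations spread the information from one measurement, the following minimal sketch applies the standard linear-Gaussian update to a toy problem. The two-region setup, the observation operator `H`, the observation error covariance `R_obs` and all numerical values are illustrative assumptions, not quantities from this study.

```python
import numpy as np

def gaussian_posterior(x_prior, B, H, y, R_obs):
    """Posterior mean and covariance of a linear-Gaussian inversion.

    x_prior: prior fluxes; B: prior error covariance; H: linear observation
    operator; y: observations; R_obs: observation error covariance.
    """
    innovation_cov = H @ B @ H.T + R_obs
    gain = B @ H.T @ np.linalg.inv(innovation_cov)
    x_post = x_prior + gain @ (y - H @ x_prior)
    A = B - gain @ H @ B
    return x_post, A

# Toy setup (hypothetical values): two flux regions with unit prior error
# variance, and a single observation that senses region 0 only.
x_prior = np.zeros(2)
H = np.array([[1.0, 0.0]])
y = np.array([1.0])
R_obs = np.array([[1.0]])

for rho in (0.0, 0.6):
    B = np.array([[1.0, rho], [rho, 1.0]])
    x_post, A = gaussian_posterior(x_prior, B, H, y, R_obs)
    print(f"prior correlation {rho}: posterior fluxes {x_post}")
```

With no prior correlation, the unobserved region keeps its prior value; with a prior correlation of 0.6, part of the correction made to the observed region is transferred to the unobserved one.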

[3] Two studies have attempted to shed light on the characteristics of p(x) based on observations. Michalak et al. [2005] used CO2 concentration measurements within a flux inversion system by introducing some poorly known characteristics of the prior errors in the state vector x. They highlighted the power of their method but stressed its subjectivity. In the second study, Chevallier et al. [2006] relied on the non-gap-filled, raw CO2 flux measurements at the eddy-covariance flux sites (total 34) in the northern hemisphere to constrain p(x). They showed a heavy-tail distribution p(x) that contradicts the usual assumption of a multivariate Gaussian distribution. Further, the error correlations appeared to follow a linear temporal dependency after the second lag day without any particular spatial structure.

[4] Following the approach of Chevallier et al. [2006], we examine the characteristics of p(x) for terrestrial ecosystem CO2 fluxes, when p(x) is centered around the Organizing Carbon and Hydrology In Dynamic Ecosystems (ORCHIDEE), a process-based ecosystem model [Krinner et al., 2005]. Our study advances our previous knowledge in two ways. First, it uses a much wider archive of eddy-covariance sites (156 in total) with gap-filled records, which provides more detailed information on p(x) for a variety of biomes. Additionally, we explore the influence of temporal and spatial aggregation on the statistics in order to bridge the gap between the local scale of the daily eddy-covariance flux measurements that are used to define p(x) and the typically much larger spatial and temporal scales of the inversion systems.

2. Methods

[5] Surface CO2 fluxes were measured at tower sites using the eddy covariance technique, which derives the CO2 flux from the covariance between the fluctuations of the vertical wind velocity and of the CO2 mixing ratio around their means [Aubinet et al., 1999]. Eddy covariance fluxes are typically representative of surface areas ranging from a few hectares to a few km2, depending on the height of the sensors above the canopy, on the roughness of the surface and on the air stability. Flux measurements have been conducted over most major biomes across the world under a global network of flux towers, FLUXNET [Baldocchi et al., 2001; Baldocchi, 2008]. A synthesis data set comprising 253 flux sites, named after the Italian town La Thuile, was compiled in 2007. This archive also stores standardized gap-filled measurements of CO2 fluxes at a 30-min time step using a common protocol [Papale et al., 2006]. The La Thuile archive also collects the corresponding meteorological data, which we gap-filled based on the data from the Interim Reanalysis of the European Centre for Medium-Range Weather Forecasts [Berrisford et al., 2009] for use as input to the ORCHIDEE model. The present study focuses on the 156 sites of the La Thuile database that encompass at least three consecutive years of measurements between 1991 and 2007. Their locations cover much of the temperature-precipitation climate space across the world and most biomes (Figure 1). The full site list is given in Appendix A. Lasslop et al. [2008] estimated the overall uncertainty of the half-hourly gap-filled fluxes to be about 0.07 gC·m−2 (one sigma uncertainty [cf. Lasslop et al., 2008, Figure 2e]). This relatively large uncertainty does not contain much autocorrelation (i.e., the autocorrelation is <0.2 after only four hours [cf. Lasslop et al., 2008, Figure 5a]) so that the uncertainty of the fluxes amounts to only a few tenths of a gC·m−2·d−1 for daily totals [Hollinger and Richardson, 2005]. Therefore, we will not report results here at scales shorter than one day. Errors are site-specific and therefore have low spatial correlations. Some systematic errors affect the measurements: e.g., the measurements do not show energy balance closure and low nighttime wind conditions are not properly treated [Baldocchi et al., 2001]. We rely here on the quality control of the database to minimize these biases. In any event, biases should not affect the statistics (standard deviations and correlations) presented in the subsequent analyses.

Figure 1.

Location of 156 FLUXNET sites used in this study. A different color is used for each dominant biome type.

[6] The site-level meteorological variables were used as boundary conditions for ORCHIDEE simulations at each site. The plant functional type (PFT) is prescribed in the model using parameters that most closely represent the site vegetation. ORCHIDEE simulates the half-hourly, annual and longer variations of the carbon, water and energy fluxes, as well as soil carbon and water pools [Krinner et al., 2005]. This study uses the configuration of the model which was run to set up the prior information for flux inversions at Laboratoire des Sciences du Climat et de l'Environnement (LSCE), as illustrated by Piao et al. [2009] and Chevallier et al. [2010]. Each site is often a net sink of CO2 (and rarely a net source of CO2) and may be in a nonsteady state since the last disturbance event. However, the average long-term net ecosystem exchange (NEE) simulated by the ORCHIDEE model is in a steady state because the model was spun up for 2000 years until soil and biomass pools reached equilibrium with climate conditions at each site.

3. Results

3.1. Daily Fluxes at Site Scale

[7] When combining all site-years together, the standard deviation of the differences between simulated and observed daily fluxes is 2.5 gC·m−2·d−1. The distribution is biased by 0.7 gC·m−2·d−1, with the ORCHIDEE model having the smaller annual mean carbon uptake by vegetation. This bias is partially expected because of the equilibrium assumption set for the simulation and the fact that many FLUXNET sites are managed ecosystems with higher uptake than the regional means within the global context. The measurement biases may also explain some of the bias. The differences between modeled and observed values vary with the season, with larger values during the growing season. From May to August, the bias rises to 1.2 gC·m−2·d−1 with a standard deviation of 3.6 gC·m−2·d−1. Interestingly, the observed variability in NEE across the year has a standard deviation of only 2.3 gC·m−2·d−1, suggesting that, at the synoptic scale, the NEE uncertainty for a model like ORCHIDEE is no better than assuming a constant flux field.

[8] The spatial structure of the error appears to be a function of the lag distance between pairs of sites based on the Pearson correlation coefficient of the model-minus-observation differences (Figure 2). The median reveals spatial structure at short distances (less than 100 km) that did not show up in the 34-site study conducted by Chevallier et al. [2006]: the correlation median is 0.33 for distances <100 km, 0.26 for distances between 100 and 200 km, and negligible beyond 400 km. Some systematic and spatially coherent errors in the modeling of plant phenology [e.g., Demarty et al., 2007] may contribute to the correlations. As expected, the observed variability of NEE contains larger correlations: the median is 0.55 for distances less than 100 km, 0.44 for distances between 100 and 200 km, and is still larger than 0.1 after 2500 km. Therefore, the model errors are at a finer spatial scale than the measured signal itself, which implies that the model captures the main patterns of the true NEE for spatially averaged quantities.

Figure 2.

Distance correlogram, i.e., correlation between the daily net ecosystem exchange (NEE) errors of the ORCHIDEE model at pairs of distant sites at the same time. Each point includes all the common years of one of the site pairs. The black line represents the median of the points per 100-km bin.
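The construction of such a distance correlogram can be sketched as follows. This is a minimal illustration on synthetic data, assuming gap-free error series on a common time axis and great-circle distances on a spherical Earth; the function names and the three hypothetical sites are not from the study.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance in km between points given in degrees."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dlat = p2 - p1
    dlon = np.radians(lon2 - lon1)
    a = np.sin(dlat / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlon / 2) ** 2
    return 2 * radius_km * np.arcsin(np.sqrt(a))

def distance_correlogram(errors, lats, lons, bin_km=100.0):
    """Median pairwise Pearson correlation per distance bin.

    errors: array (n_sites, n_days) of model-minus-observation differences,
            assumed gap-free and aligned in time.
    Returns a dict {lower edge of the bin in km: median correlation}.
    """
    corr = np.corrcoef(errors)
    i, j = np.triu_indices(errors.shape[0], k=1)   # each site pair once
    dist = haversine_km(lats[i], lons[i], lats[j], lons[j])
    bins = np.floor(dist / bin_km) * bin_km
    pair_corr = corr[i, j]
    return {float(b): float(np.median(pair_corr[bins == b]))
            for b in np.unique(bins)}

# Synthetic example: sites 0 and 1 are about 56 km apart and share the same
# error series; site 2 is far away with independent errors.
rng = np.random.default_rng(0)
lats = np.array([45.0, 45.5, 50.0])
lons = np.array([0.0, 0.0, 10.0])
base = rng.normal(size=365)
errors = np.vstack([base, base, rng.normal(size=365)])
correlogram = distance_correlogram(errors, lats, lons)
print(correlogram)
```

With real data, each pair would first be restricted to its common years, as stated in the caption of Figure 2.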

[9] Chevallier et al. [2006] did not address the possibility that spatial error correlations could be significant within a given vegetation type. If this is the case, inverting surface fluxes directly in eco-regions would be legitimate [e.g., Peters et al., 2007]. Figure 3 examines this possibility by restricting the scatterplots to a single dominating PFT, such as cropland, deciduous broad-leaf forest, evergreen broad-leaf forest, evergreen needleleaf forest, and grassland. Only the deciduous broad-leaf forest type increased the average correlation coefficient, by about 0.4, compared to the all-site statistics. Clearly, assigning biome-dependent correlations does not seem justified, at least given the spatial density of the La Thuile flux network data.

Figure 3.

Distance correlogram for errors between sites within the same dominant PFT. The regression line of the points is shown, for visualization purpose only, in blue. The black line represents the median for all sites, as shown in Figure 2.

[10] Figure 4 shows the temporal structure of the error in a manner similar to Figure 2. To minimize edge effects, we condense the results in an all-site-year correlation, rather than in a binned median correlation. The former quantity gives more weight to the sites with long data records than the latter and yields slightly larger values (Figure 4). The all-site correlation is positive for lags <85 days and for lags >275 days, which reflects some seasonal pattern of the error. Limited negative correlations exist at lags of 85–275 days, the all-site correlation being not lower than −0.03. Again, the measurement-only statistics of NEE (not shown) contain larger temporal correlations: the all-site correlation remains positive until day 95 and the negative correlations reach −0.1 around the six-month lag. This highlights some skill of the model at a seasonal scale.

Figure 4.

Time correlogram, i.e., autocorrelation between the errors of the ORCHIDEE model at distant times at the same site. Each red line corresponds to a different site. The black line represents the all-site autocorrelation.
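One way to compute an all-site autocorrelation that gives more weight to long records is to pool the lagged pairs of all sites before computing a single Pearson coefficient. The sketch below follows that reading, which is an assumption about the exact estimator; the synthetic first-order autoregressive series only stand in for real error series.

```python
import numpy as np

def pooled_autocorrelation(series_list, lag):
    """Pearson correlation between e(t) and e(t + lag), pooled over all sites.

    series_list: list of 1-D arrays of daily errors, one per site (gap-free).
    lag: positive integer number of days.
    """
    usable = [s for s in series_list if len(s) > lag]
    head = np.concatenate([s[:-lag] for s in usable])
    tail = np.concatenate([s[lag:] for s in usable])
    return float(np.corrcoef(head, tail)[0, 1])

def ar1(n, phi, rng):
    """Synthetic AR(1) series used as a stand-in for real errors."""
    e = np.zeros(n)
    for t in range(1, n):
        e[t] = phi * e[t - 1] + rng.normal()
    return e

# Three hypothetical sites with records of different lengths.
rng = np.random.default_rng(1)
sites = [ar1(n, 0.9, rng) for n in (2000, 3000, 6000)]
r1 = pooled_autocorrelation(sites, 1)
r30 = pooled_autocorrelation(sites, 30)
print(r1, r30)
```

Because the pairs are concatenated across sites, a site with a six-year record contributes three times as many pairs as a site with a two-year record.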

3.2. Effect of Temporal and Spatial Aggregation of the Fluxes

[11] The FLUXNET database enables a synthetic estimation of the daily NEE error statistics of the ORCHIDEE model at the site scale, while Bayesian atmospheric CO2 flux inversion systems usually operate at coarser temporal and spatial scales, typically several 10,000 km2 and several days or weeks for global inversions. Therefore, the effect of space-time aggregation on the NEE prior error characterization needs to be investigated.

[12] If the site-level daily error statistics are normally distributed (i.e., Gaussian) with covariance matrix B, the corresponding low resolution error statistics are also normally distributed and have a covariance matrix as follows [e.g., Kaminski et al., 2001; Bocquet et al., 2011]:

$$\mathbf{U}\,\mathbf{B}\,\mathbf{U}^{\mathrm{T}} \qquad (1)$$

where U is the operator that upscales the fluxes from the high-resolution scale to the low resolution one (i.e., coarse-graining operator). Equation (1) provides a straightforward approach to upscale the error statistics, but its application for global atmospheric inversion systems is hampered by the large dimension of B. We therefore introduce an alternative approach to upscale the error statistics from the relatively high resolution of our previous results to any coarser regular spatial and temporal resolution. The model is described in Appendix B. It consists of two simple equations, equation (B3) and equation (B4), which bridge the gap between the scales. They show that aggregation dampens the higher frequencies of the error and therefore reduces it in relative values (equation (B3)), while increasing the low (correlated) frequencies (equation (B4)).
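For a small problem, the direct upscaling of equation (1) can be written out explicitly. The sketch below is only an illustration: it assumes that U averages consecutive blocks of daily errors, and it uses a hypothetical exponential correlation with a 30-day length in place of the correlations estimated in this study.

```python
import numpy as np

def averaging_operator(n_fine, block):
    """Matrix U that averages consecutive blocks of `block` fine-scale values."""
    n_coarse = n_fine // block
    U = np.zeros((n_coarse, n_fine))
    for k in range(n_coarse):
        U[k, k * block:(k + 1) * block] = 1.0 / block
    return U

sigma = 2.5                  # daily site-scale standard deviation, gC m-2 d-1
n_fine, block = 120, 30      # 120 daily errors aggregated into 30-day means

# High-resolution covariance B with a hypothetical exponential correlation.
days = np.arange(n_fine)
lags = np.abs(days[:, None] - days[None, :])
B = sigma**2 * np.exp(-lags / 30.0)

U = averaging_operator(n_fine, block)
B_coarse = U @ B @ U.T       # equation (1)

S = np.sqrt(np.diag(B_coarse))              # coarse standard deviations
R_next = B_coarse[0, 1] / (S[0] * S[1])     # correlation of adjacent blocks
print(S[0], R_next)
```

In this toy case the coarse standard deviation falls below 2.5 gC·m−2·d−1, while the correlation between adjacent 30-day means exceeds the daily correlation at a 30-day lag, which is the qualitative behavior described above. The cost of storing B grows with the square of the number of high-resolution elements, which is why this route is impractical at the global scale.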

[13] Figure 5 shows the effect of temporal and spatial aggregation on the error statistics, as computed from the error model for various lags and aggregation scales. We have assumed that each one of the coarse-resolution NEE fluxes can be split into high-resolution pixels of 1 day and 1 km2 for time and space, respectively. Correlation lags are defined from a reference point taken as the middle of a coarse-resolution tile of length g. For a given aggregation time or distance g, the correlation behaves like a staircase function: it is constant between lags 0 and g/2, between lags g/2 and g + g/2, between lags g + g/2 and 2g + g/2, etc. We illustrate the variations of the correlations R with two examples: at lag time 30 days (Figure 5a), R varies between 0.4 (without any aggregation) and 1.0 (with 60-day aggregation or more); at lag distance 500 km, R varies between 0.1 (without any aggregation) and 1.0 (with 1000-km aggregation or more). The standard deviations S decrease from 2.5 gC·m−2·d−1 to 1.6 gC·m−2·d−1 for 90-day aggregation and to 0.7 gC·m−2·d−1 for 1000-km aggregation.

Figure 5.

Effect of (a and c) temporal and (b and d) spatial aggregation of the fluxes on error correlation in the same dimension (Figures 5a and 5b) and on error standard deviation (Figures 5c and 5d). The aggregation distance (Figures 5b and 5d) is defined as the length of the side of a square on which the aggregation is performed.

[14] The high temporal density of the flux data allows us to evaluate the model behavior for temporal aggregation. Here we take the example of semi-monthly averaged fluxes, for which we still have a sufficient number of flux pairs to perform the statistics despite the averaging. For the standard deviation S of the semi-monthly flux errors at the site scale, our model yields a value of 2.1 gC·m−2·d−1 (Figure 5c), which is the same value as the actual model-minus-observation statistics (to be compared with σ = 2.5 gC·m−2·d−1 for 24-h site-scale). The correlations in the model and in the data for the semi-monthly fluxes are displayed in Figure 6. The model reproduces the behavior of the data fairly well, except for lag times longer than 200 days where differences of up to 0.1 are seen. Note for instance the good simulation for lags less than 15 days in Figure 6b, for which the shape of the data all-site correlation significantly differs between high and low temporal resolutions (Figure 4 versus Figure 6b). For longer lags, the two-week aggregation does not change the correlations much.

Figure 6.

(a) Distance and (b) time correlograms for semimonthly site-scale fluxes. Each point includes all the common years of a given pair. The black line represents the median of the correlations per 100-km bin (Figure 6a) or the all-site correlation (Figure 6b). The blue lines correspond to the statistical model of the errors. The model curves are extracted from Figures 5a and 5b with a two-week smoothing in the case of Figure 5b.

[15] We use the statistical model to compute error statistics at the typical space-time resolution of current global atmospheric inversion systems that use ORCHIDEE fluxes as prior information [Piao et al., 2009; Chevallier et al., 2010]. For the eight-day fluxes used by Chevallier et al. [2010] that had an e-folding temporal error correlation length of four weeks (the e-folding length being the lag required for the correlation to decrease by a factor of e; note that these authors separate daytime and nighttime, which is not performed here), the coarse-resolution errors (Figure 5a) are correlated by 0.43 at lag one month (instead of 0.35 in the case of daily fluxes), which is close to an exponentially decreasing function of e-folding length of 36 days. We note that an exponentially decreasing function does not capture the positive values at the one-year lag (Figure 4). The standard deviation of the eight-day flux error at the site scale is 2.2 gC·m−2·d−1 (Figure 5c) instead of 2.5 gC·m−2·d−1 at daily scale. For monthly fluxes (used by Piao et al. [2009] without any temporal error correlation), the coarse-resolution flux errors are correlated by 0.59 from one month to the next (Figure 5a). The standard deviation of the monthly error at site scale is 2.0 gC·m−2·d−1 (Figure 5c). For fluxes at the scale of a square grid box of 300 km × 300 km (comparable to work by Piao et al. [2009] and Chevallier et al. [2010] with, respectively, e-folding correlation lengths of 1000 km and 500 km) the between-grid-boxes spatial correlation exponentially decreases with the distance with an approximate e-folding length of 500 km (Figure 5b). The error standard deviation at the coarse scale reduces to 1.2 gC·m−2·d−1 (Figure 5d). Combining spatial (300 km) and temporal (eight-day or monthly) averaging does not change the computed correlations but further reduces the error standard deviation to 1.1 and 1.0 gC·m−2·d−1, respectively, for eight-day and monthly fluxes.
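The link between a correlation at a given lag and an e-folding length can be checked with a one-line calculation. The sketch below assumes a purely exponential correlation, exp(−lag/L), and takes "one month" as 30 days; both are simplifying assumptions, and the function name is only illustrative.

```python
import math

def e_folding_length(correlation, lag):
    """Length L such that exp(-lag / L) equals `correlation` (0 < correlation < 1)."""
    return -lag / math.log(correlation)

# Correlations at a one-month lag quoted in the text, with one month = 30 days.
length_eight_day = e_folding_length(0.43, 30.0)   # eight-day fluxes
length_daily = e_folding_length(0.35, 30.0)       # daily fluxes
print(round(length_eight_day, 1), round(length_daily, 1))
```

Under these assumptions, the correlation of 0.43 corresponds to a length of about 36 days, in line with the value quoted above, and the daily value of 0.35 corresponds to roughly four weeks.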

4. Discussion and Conclusions

[16] We estimated prior NEE errors for CO2-flux atmospheric inversions. Using the ORCHIDEE simulations as an example of prior information and based on observations made at 156 FLUXNET sites across the world, we described the mean, variances and correlations of the prior errors at the site scale with a 24-h temporal resolution. The model error statistics differed substantially from those of the fluxes themselves. A model generalized these results to a range of larger temporal and spatial scales.

[17] There are four main limitations to this study. First, it only addresses natural vegetation fluxes and cannot consider the emissions from agricultural crops, fossil fuel, cement manufacturing and fires. Second, the ORCHIDEE simulations at the site scale are based on local meteorological measurements as boundary conditions and rely on an accurate description of the vegetation type, but boundary conditions for regional to global simulations can carry large biases. Low accuracy (i.e., large biases) of the coarse resolution boundary conditions (usually provided by weather centers) would tend to increase the spatial correlations of the errors compared to the situation described here. Third, it is not clear if our results apply to models other than ORCHIDEE. We argue that the correlation lengths would decrease with better model accuracy and vice versa. Fourth, the results are tied to the La Thuile database of FLUXNET. This spatially heterogeneous network may poorly represent the error statistics outside North America and Europe, and, even there, it may not be dense enough to describe the fine features of the error statistics.

[18] Keeping these limitations in mind, the main characteristics of the prior errors may include: (i) that the simulated site-scale daily (24-h) flux is biased by a few tenths of a gC·m−2·d−1 toward a too low carbon uptake, with random errors of standard deviations of 2.5 gC·m−2·d−1 (by comparison, Chevallier et al. [2006] obtained 2.0 gC·m−2·d−1 as the standard deviation for their 34 study sites without gap-filling); (ii) that some small positive spatial correlations of the error (unnoticed in work by Chevallier et al. [2006]) are seen within the first few hundred kilometers only; (iii) that correlations between areas dominated by a same vegetation type are not larger than the others, except in the case of deciduous broad-leaf forests; (iv) that positive temporal correlations exist within the first few weeks and after a year; and (v) that negative correlations are negligible. Aggregating the fluxes in space or time makes the covariance matrix of the error a bit denser and significantly reduces the variances. It should be highlighted that a side effect of aggregation is the increase of the observation errors and their complexity (as discussed by Kaminski et al. [2001]).

[19] The unknown mean distribution of the NEE of the terrestrial biosphere violates the assumption of unbiased prior error statistics in the Bayesian CO2-flux inversions. One may therefore wonder if assigned variances and correlations could empirically be tuned to account for biases. To meet the requirement of optimal estimation, any bias on the prior is reduced by the inversion, by the same amount as the prior error variances. However, it is common practice to inflate the prior variances in order to allow the inversion system to yield larger reductions of the bias, at the expense of increased random errors. As long as the biases of the prior fluxes are much smaller than their error standard deviations, there is no advantage in tuning the prior correlations as well, and biases can be neglected in the inversion design.

Appendix A: List of the Selected FLUXNET Sites

[20] The identification codes of the FLUXNET sites used in this study are: AT-Neu [Wohlfahrt et al., 2008], AU-How, AU-Tum, AU-Wac, BE-Bra, BE-Lon, BE-Vie, BR-Ban, BR-Cax, BR-Ji2, BR-Ma2, BR-Sa1, BR-Sa3, BW-Ma1, CA-Ca1, CA-Ca2, CA-Ca3, CA-Gro, CA-Let [Flanagan and Johnson, 2005], CA-Man [Dunn et al., 2007], CA-Mer [Lafleur et al., 2003], CA-NS1, CA-NS2, CA-NS3, CA-NS4, CA-NS5, CA-NS6, CA-NS7, CA-Oas, CA-Obs, CA-Ojp, CA-Qcu [Giasson et al., 2006], CA-Qfo [Bergeron et al., 2007], CA-SF1 [Mkhabela et al., 2009], CA-SF2 [Mkhabela et al., 2009], CA-SF3 [Mkhabela et al., 2009], CA-SJ1, CA-SJ2, CA-TP2, CA-TP3 [Peichl et al., 2010], CA-TP4 [Peichl and Arain, 2007], CA-WP1 [Cai et al., 2010], CH-Oe1, CN-HaM, CN-Xfs, CZ-BK1, CZ-BK2, DE-Bay, DE-Geb, DE-Hai, DE-Kli, DE-Meh, DE-Tha, DE-Wet, DK-Sor, ES-ES1, ES-ES2, ES-LMa, ES-VDA, FI-Hyy, FI-Kaa, FI-Sod, FR-Hes, FR-LBr [Berbigier et al., 2001], FR-Lq1, FR-Lq2, FR-Pue, GF-Guy, HU-Bug, HU-Mat, IE-Ca1, IE-Dri, IL-Yat, IS-Gun, IT-Amp, IT-BCi, IT-Col, IT-Cpz [Garbulsky et al., 2008], IT-Lav, IT-LMa, IT-Mal, IT-MBo, IT-Noe, IT-Non, IT-Pia, IT-PT1, IT-Ren, IT-Ro1 [Rey et al., 2002], IT-Ro2 [Tedeschi et al., 2006], IT-SRo, JP-Tak, JP-Tom, KR-Hnm, KR-Kw1, NL-Ca1, NL-Hor, NL-Loo, PT-Esp, PT-Mi1, PT-Mi2, RU-Che, RU-Cok, RU-Fyo, RU-Ha1 [Belelli Marchesini et al., 2007], RU-Zot, SE-Deg, SE-Fla, SE-Nor, UK-ESa, US-ARM, US-Atq, US-Aud, US-Bkg, US-Blo, US-Bo1, US-Bo2, US-Brw, US-Dk1, US-Dk2, US-Dk3, US-FPe, US-FR2, US-Goo, US-Ha1 [Urbanski et al., 2007], US-Ho1, US-Ho2, US-IB1, US-IB2, US-Ivo, US-KS2, US-Los, US-LPH, US-Me2 [Thomas et al., 2009], US-Me4 [Law et al., 2001], US-MMS, US-MOz, US-Ne1, US-Ne2, US-Ne3, US-PFa, US-SO2, US-SO4, US-SP2, US-SP3, US-SRM, US-Syv, US-Ton [Ma et al., 2007], US-UMB, US-Var [Ma et al., 2007], US-WBW, US-WCr, US-Wi4, US-Wkg, US-Wrc, VU-Coc, ZA-Kru.

[21] The site citations above are those given at http://www.fluxdata.org/ (accessed 5 October 2010), as requested by the La Thuile data policy. A description of each site can be found at http://www.fluxdata.org:8080/SitePages/ (accessed 16 July 2010).

Appendix B: Aggregation Model

[22] In the following, the high-resolution (in space or time) error statistics are symbolized by lowercase letters (σ for a standard deviation and r for a correlation), while capital letters (S for a standard deviation and R for a correlation) represent the coarse resolution (in space or time) error statistics. We represent the high-resolution errors as a stationary field of normally and identically distributed random variables, all with zero mean and the same standard deviation σ. The stationarity property implies that the correlations depend on the time-space distance between the pairs of random variables but not on their absolute position in time and space. The high-resolution standard deviation σ is set to 2.5 gC·m−2·d−1 (i.e., the standard deviation of the model-minus-observation differences for daily fluxes at site scale). The high resolution correlations are represented (i) by a continuous parameterization of the all-site autocorrelation as a function of the lag time (equation (B1) and Figure 4) and (ii) by a continuous parameterization of the median correlation as a function of the lag distance between pairs of sites (equation (B2) and Figure 2):

display math
display math

with rτ the dependency of the correlation as a function of lag time τ, in days, and rδ the correlation as a function of the lag distance δ, in km. The parameterization of rτ (equation (B1)) is only valid for lag times less than 365 days. Equations (B1) and (B2) have been obtained by regression on the data. They fit the thick lines of Figures 2 and 4 with a root mean square of 0.01 and 0.06, respectively.

[23] By developing the variance estimator (i.e., $\sum_i x_i^2 / n$, with $x_i$ the n random samples of a random variable with zero mean) for the errors of the coarse-resolution NEE fluxes (i.e., time- and space-normalized), one can show that:

$$S = \sigma \sqrt{\bar{r}_{\mathrm{intra}}} \qquad (\mathrm{B3})$$

with $\bar{r}_{\mathrm{intra}}$ being the arithmetic mean of the correlations r between all possible pairs of high-resolution fluxes within the coarse-resolution fluxes. The definition of $\bar{r}_{\mathrm{intra}}$ is illustrated in Figure B1a.

Figure B1.

Illustration of the definition of the mean correlations (a) $\bar{r}_{\mathrm{intra}}$ and (b) $\bar{r}_{\mathrm{inter}}$. Dotted lines represent the boundaries of high-resolution fluxes and continuous lines correspond to two low resolution fluxes. The arrows illustrate the pairs over which the average correlations are computed.

[24] Similarly, the Pearson correlation between the errors of a pair of coarse-resolution NEE fluxes f1 and f2 can be expressed as:

$$R(f_1, f_2) = \frac{\bar{r}_{\mathrm{inter}}(f_1, f_2)}{\bar{r}_{\mathrm{intra}}} \qquad (\mathrm{B4})$$

where $\bar{r}_{\mathrm{inter}}(f_1, f_2)$ is the arithmetic mean of the correlations between all possible pairs of fine-resolution NEE fluxes formed by taking one flux inside coarse flux region f1 and the other inside coarse flux region f2. The definition of $\bar{r}_{\mathrm{inter}}$ is illustrated in Figure B1b.
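The reasoning behind equations (B3) and (B4) can be spelled out as a short derivation. It is a sketch under stated assumptions: each coarse flux error is taken as the arithmetic mean of n high-resolution errors $e_i$, the mean correlations are taken over all ordered pairs (including the pairs i = j, for which the correlation is 1), and the two coarse fluxes have the same geometry so that stationarity gives them the same $\bar{r}_{\mathrm{intra}}$. The symbols E, $e_i$ and $r_{ij}$ are introduced here for illustration only.

```latex
% Coarse-resolution error as the mean of n high-resolution errors
E = \frac{1}{n} \sum_{i=1}^{n} e_i , \qquad
\mathrm{Var}(e_i) = \sigma^2 , \qquad
\mathrm{Corr}(e_i, e_j) = r_{ij}

% Variance of the coarse error: leads to (B3)
S^2 = \mathrm{Var}(E)
    = \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} \sigma^2 r_{ij}
    = \sigma^2 \, \bar{r}_{\mathrm{intra}}

% Covariance between two coarse errors E_1 (inside f_1) and E_2 (inside f_2)
\mathrm{Cov}(E_1, E_2)
    = \frac{1}{n^2} \sum_{i \in f_1} \sum_{j \in f_2} \sigma^2 r_{ij}
    = \sigma^2 \, \bar{r}_{\mathrm{inter}}(f_1, f_2)

% Correlation between the two coarse errors: leads to (B4)
R(f_1, f_2) = \frac{\mathrm{Cov}(E_1, E_2)}{S^2}
            = \frac{\bar{r}_{\mathrm{inter}}(f_1, f_2)}{\bar{r}_{\mathrm{intra}}}
```

Two limiting cases serve as a check: with n = 1 the coarse and fine statistics coincide (S = σ), and with uncorrelated high-resolution errors $\bar{r}_{\mathrm{intra}} = 1/n$, which gives the familiar $S = \sigma / \sqrt{n}$.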

[25] The computation of $\bar{r}_{\mathrm{inter}}(f_1, f_2)$ and $\bar{r}_{\mathrm{intra}}$ stems from the above-described high-resolution statistics (σ and the correlations r) and the geometry of the fluxes. Equations (B3) and (B4) quantify how aggregation dampens the higher frequencies of the error (equation (B3)), while increasing the low (correlated) frequencies (equation (B4)).

[26] The last hypothesis of our error model is that spatial upscaling is independent from temporal upscaling. In this case, the coarse-scale standard deviation after aggregation in both space and time can be simply simulated by applying equation (B3) in each dimension successively, while coarse-scale correlations follow equation (B4) in each dimension separately.
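The aggregation model can be sketched in a few lines for the temporal dimension. Several assumptions are made here and should be read as such: equations (B3) and (B4) are taken in the form S = σ·sqrt(mean intra-tile correlation) and R = mean inter-tile correlation / mean intra-tile correlation, the mean includes self-pairs, and a hypothetical exponential correlation replaces the fitted parameterization of equation (B1). The function names are illustrative.

```python
import numpy as np

def mean_correlation(r, pixels_a, pixels_b):
    """Arithmetic mean of r(|a - b|) over all pairs (a, b) of pixel positions."""
    lag = np.abs(pixels_a[:, None] - pixels_b[None, :])
    return float(np.mean(r(lag)))

def aggregate_1d(r, sigma, g, tiles_apart=1):
    """Coarse standard deviation S and correlation R for tiles of g unit pixels.

    r: correlation as a function of the lag (in pixels) at high resolution.
    tiles_apart: separation of the two coarse tiles, in numbers of tiles.
    """
    tile1 = np.arange(g)
    tile2 = tile1 + tiles_apart * g
    r_intra = mean_correlation(r, tile1, tile1)   # includes self-pairs, r(0) = 1
    r_inter = mean_correlation(r, tile1, tile2)
    return sigma * np.sqrt(r_intra), r_inter / r_intra

def combine_space_time(sigma, s_time, s_space):
    """S after aggregation in both dimensions, applying the S/sigma ratio of
    each dimension successively (independence of space and time assumed)."""
    return s_time * s_space / sigma

sigma = 2.5                                  # gC m-2 d-1, daily site scale
r_time = lambda lag: np.exp(-lag / 30.0)     # hypothetical stand-in for (B1)

S8, R8 = aggregate_1d(r_time, sigma, g=8)    # eight-day means, adjacent tiles
print(S8, R8)

# Successive application using the single-dimension values quoted in the text:
# 2.2 (eight-day) or 2.0 (monthly) in time, 1.2 (300 km) in space.
S_eight_day_300km = combine_space_time(sigma, 2.2, 1.2)
S_monthly_300km = combine_space_time(sigma, 2.0, 1.2)
print(S_eight_day_300km, S_monthly_300km)
```

With the single-dimension standard deviations quoted in section 3.2, the successive application gives about 1.06 and 0.96 gC·m−2·d−1, which round to the combined values of 1.1 and 1.0 gC·m−2·d−1 reported there. Unlike equation (1), this route never builds the full high-resolution covariance matrix: only the correlation function and the tile geometry are needed.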


[27] The authors thank the participants of the Transcom session about prior errors that was held on 20 September 2009 in Jena, Germany, for fruitful discussions. The study was co-funded by the European Commission under the EU Seventh Research Framework Programme (grant agreement 212196, COCOS) and by the French Agence Nationale pour la Recherche (grant agreement ANR-08-SYSC-014, MSDAG). The authors would like to thank N. Viovy (LSCE) who provided the gap-filling tools for the local atmospheric variables, F. Marabelle and his team at LSCE for computational support, Gil Bohrer (Ohio State University) and two anonymous reviewers for their stimulating comments on the text. This work used eddy covariance data acquired by the FLUXNET community and in particular by the following networks: AmeriFlux (U.S. Department of Energy, Biological and Environmental Research, Terrestrial Carbon Program (DE-FG02-04ER63917 and DE-FG02-04ER63911)), AfriFlux, AsiaFlux, CarboAfrica, CarboEuropeIP, CarboItaly, CarboMont, ChinaFlux, FLUXNET–Canada Research Network/Canadian Carbon Program (supported by CFCAS, NSERC, BIOCAP, Environment Canada, and NRCan), GreenGrass, KoFlux, LBA, NECC, OzFlux, TCOS–Siberia, and the USCCC. We acknowledge the financial support to the eddy covariance data harmonization provided by CarboEuropeIP, FAO–GTOS–TCO, iLEAPS, Max Planck Institute for Biogeochemistry, the National Science Foundation, the University of Tuscia, Université Laval, Environment Canada, and the U.S. Department of Energy, and the database development and technical support from Berkeley Water Center, Lawrence Berkeley National Laboratory, Microsoft Research eScience, Oak Ridge National Laboratory, the University of California, Berkeley, and the University of Virginia.