## 1. Introduction

[2] Carbon dioxide (CO_{2}) fluxes at the Earth's surface may be recovered (or *inverted*) from the observed spatial and temporal gradients of CO_{2} concentrations in the atmosphere by applying Bayes' theorem [e.g., *Enting et al.*, 1995; *Bousquet et al.*, 2000; *Gurney et al.*, 2002]. Atmospheric mixing makes the problem ill-constrained, and therefore prior information about the CO_{2} fluxes originating from the land and water surfaces is also used in the inversion process. In statistical terms, this approach transforms the prior probability density *p*(**x**) of the CO_{2} fluxes, jointly called the *state vector* **x** here, into the posterior probability density *p*(**x∣y**) conditioned on the atmospheric measurements, jointly called **y**. The statistically optimal estimator of the fluxes, given the available information, corresponds to the maximum of the function *p*(**x∣y**). By design, it critically depends on the assumed prior density function *p*(**x**). Under the numerically convenient assumption of a multivariate Gaussian density, describing *p*(**x**) requires assigning means, variances and correlations. The atmospheric inversion studies of CO_{2} fluxes published so far have assumed various probability distributions centered on climatology, regional inventory statistics or the output of terrestrial ecosystem models, as well as ocean carbon cycle models [*Gurney et al.*, 2002].
In practice, some of the key characteristics of the prescribed a priori flux error distributions *p*(**x**) in use stem from the capacity of the current flux-inversion systems to deal with large state vectors **x**, rather than from the statistics of the inference problem: the largest correlation patterns in space and time are specified in the case of classical analytical systems (i.e., coarse-region inversions [e.g., *Gurney et al.*, 2002]), while the narrowest structures (i.e., pixel size) can be introduced in the variational (i.e., adjoint-based) schemes [*Chevallier et al.*, 2005; *Rödenbeck*, 2005; *Baker et al.*, 2006]. Ensemble methods lie in between [*Zupanski et al.*, 2007; *Peters et al.*, 2007; *Feng et al.*, 2009]. This subjective choice of error correlation structures critically influences the way the information from a single atmospheric measurement is spread in space and time in flux inversion systems.
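The role of the prior error correlations can be illustrated with a minimal sketch of the Gaussian update for a single scalar observation. Everything in it is an assumption made for illustration: the function names `prior_covariance` and `update_with_single_obs`, the one-dimensional layout of the flux regions, the exponentially decaying correlation model, the numerical values, and the observation operator `h`, which here samples one flux directly instead of representing atmospheric transport. It does not reproduce any of the inversion systems cited above.

```python
import math


def prior_covariance(positions, sigma, corr_length):
    """Prior error covariance B with B[i][j] = sigma^2 * exp(-|xi - xj| / corr_length).

    corr_length == 0 is treated as "no correlation", i.e. a diagonal B.
    The exponential shape is an illustrative assumption.
    """
    n = len(positions)
    B = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                rho = 1.0
            elif corr_length == 0:
                rho = 0.0
            else:
                rho = math.exp(-abs(positions[i] - positions[j]) / corr_length)
            B[i][j] = sigma ** 2 * rho
    return B


def update_with_single_obs(x_prior, B, h, y, obs_var):
    """Posterior mean of x for one observation y = h.x + noise (linear, Gaussian case).

    With a single observation, the innovation covariance is a scalar,
    so no matrix inversion is needed.
    """
    n = len(x_prior)
    Bh = [sum(B[i][j] * h[j] for j in range(n)) for i in range(n)]
    innovation_var = sum(h[i] * Bh[i] for i in range(n)) + obs_var
    innovation = y - sum(h[i] * x_prior[i] for i in range(n))
    return [x_prior[i] + Bh[i] / innovation_var * innovation for i in range(n)]


# Hypothetical setup: five flux regions on a line, one observation of the middle one.
positions = [0.0, 1.0, 2.0, 3.0, 4.0]
x_prior = [0.0] * 5
h = [0.0, 0.0, 1.0, 0.0, 0.0]

for corr_length in (0.0, 2.0):
    B = prior_covariance(positions, sigma=1.0, corr_length=corr_length)
    x_post = update_with_single_obs(x_prior, B, h, y=1.0, obs_var=0.25)
    print(corr_length, [round(v, 3) for v in x_post])
```

With a diagonal prior covariance, only the observed flux is corrected. With a nonzero correlation length, the same measurement also adjusts the neighbouring fluxes, in proportion to their assumed error correlation with the observed one.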

[3] Two studies have attempted to shed light on the characteristics of *p*(**x**) based on observations. *Michalak et al.* [2005] used CO_{2} concentration measurements within a flux inversion system by introducing some poorly known characteristics of the prior errors in the state vector **x**. They highlighted the power of their method but stressed its subjectivity. In the second study, *Chevallier et al.* [2006] relied on the non-gap-filled, raw CO_{2} flux measurements at eddy-covariance flux sites (34 in total) in the northern hemisphere to constrain *p*(**x**). They showed a heavy-tailed distribution *p*(**x**) that contradicts the usual assumption of a multivariate Gaussian distribution. Further, the error correlations appeared to follow a linear temporal dependency after the second lag day without any particular spatial structure.

[4] Following the approach of *Chevallier et al.* [2006], we examine the characteristics of *p*(**x**) for terrestrial ecosystem CO_{2} fluxes, when *p*(**x**) is centered around the Organizing Carbon and Hydrology In Dynamic Ecosystems (ORCHIDEE), a process-based ecosystem model [*Krinner et al.*, 2005]. Our study advances previous knowledge in two ways. First, it uses a much wider archive of eddy-covariance sites (156 in total) with gap-filled records, which provides more detailed information on *p*(**x**) for a variety of biomes. Additionally, we explore the influence of temporal and spatial aggregation on the statistics in order to bridge the gap between the local scale of the daily eddy-covariance flux measurements that are used to define *p*(**x**) and the typically much larger spatial and temporal scales of the inversion systems.
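Why aggregation matters for the error statistics can be seen from a minimal sketch, given here under stated assumptions. It computes the standard deviation of the mean of `n` daily errors that share the same standard deviation `sigma` and whose correlation decays as `lag1_corr ** lag`. The function name `aggregated_error_std`, the geometric correlation model and the numerical values are hypothetical choices for illustration; they are not the statistics estimated in this study.

```python
def aggregated_error_std(sigma, lag1_corr, n):
    """Standard deviation of the mean of n equally spaced errors.

    Each error has standard deviation sigma; the correlation between
    errors i and j is assumed to be lag1_corr ** |i - j|.
    """
    total = 0.0
    for i in range(n):
        for j in range(n):
            total += lag1_corr ** abs(i - j)
    # Var(mean) = sigma^2 / n^2 * (sum of all pairwise correlations)
    return sigma * (total ** 0.5) / n


# Hypothetical daily error of 1 (arbitrary units), aggregated over 30 days.
for lag1_corr in (0.0, 0.5, 0.9):
    print(lag1_corr, round(aggregated_error_std(1.0, lag1_corr, 30), 3))
```

Uncorrelated errors average out as `sigma / sqrt(n)`, whereas strongly correlated errors barely decrease with aggregation. The error statistics relevant at the scale of an inversion system therefore depend on the correlation structure as much as on the daily, local variance.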