The inversion of CO2 surface fluxes from atmospheric concentration measurements involves discretizing the flux domain in time and space. The resolution choice is usually guided by technical considerations despite its impact on the solution to the inversion problem. In our previous studies, a Bayesian formalism was introduced to describe the discretization of the parameter space over a large dictionary of adaptive multiscale grids. In this paper, we exploit this new framework to construct optimal space-time representations of carbon fluxes for mesoscale inversions. Inversions are performed using synthetic continuous hourly CO2 concentration data in the context of the Ring 2 experiment in support of the North American Carbon Program Mid Continent Intensive (MCI). Compared with the regular grid at finest scale, optimal representations can have similar inversion performance with far fewer grid cells. These optimal representations are obtained by maximizing the number of degrees of freedom for the signal (DFS) that measures the information gain from observations to resolve the unknown fluxes. Consequently, information from observations can be better propagated within the domain through these optimal representations. For the Ring 2 network of eight towers, in most cases, the DFS value is relatively small compared to the number of observations d (DFS/d < 20%). In this multiscale setting, scale-dependent aggregation errors are identified and explicitly formulated for more reliable inversions. It is recommended that the aggregation errors be taken into account, especially when the correlations in the errors of a priori fluxes are physically unrealistic. The optimal multiscale grids make it possible to adaptively mitigate the aggregation errors.
 Top-down approaches make it possible to infer the spatiotemporal distribution of carbon dioxide fluxes at the Earth's surface by combining diverse sources of information in a statistically optimal way, namely prior estimates of surface fluxes, CO2 concentration observations, and atmospheric transport models that link concentrations with surface fluxes [Tans et al., 1990]. Due to the sparsity of the available concentration observations, the spatial extent of fluxes and the dispersive nature of the atmospheric transport, the inversion of carbon fluxes is an ill-posed inverse problem [Enting, 2002].
 The imbalance between the fluxes and observations can be alleviated either by making more observations or by reducing the effective degrees of freedom of fluxes. New observations have been increasingly collected from extended networks or satellites [Lauvaux et al., 2011; Chevallier et al., 2007], and continuous observations from towers may provide additional gains [Law et al., 2003; Peylin et al., 2005].
 Assigning correlations in errors of a priori (or background) fluxes, either implicitly or explicitly, reduces the number of degrees of freedom of the flux variables. For instance, the usual prescription of the flux variations within large regions (so-called ecoregions) [Fan et al., 1998; Bousquet et al., 2000] implements such correlations. However, imposing prior error correlations can generate aggregation errors that, in some cases, can be of the same order as the flux magnitude [Kaminski et al., 2001]. There are too few independent estimates of flux variables to allow reliable modeling of the spatial statistics for flux variations at fine scales.
 Mesoscale or regional inversions, which enable simulation-observation comparisons [Lauvaux et al., 2009b] and probably capture local meteorological or orographic scenarios, have been recently developed aiming at regional constraints on anthropogenic and biogenic carbon emissions and the coupling between regional and global scales [Gerbig et al., 2003; Lauvaux et al., 2008].
 The number of flux variables increases with finer spatiotemporal scales, which degrades the conditioning of the carbon inverse problem. As mentioned above, the dimension of the flux vector can be reduced through the aggregation of flux variables. However, it is often expected that the aggregation does not cause great loss of information. Sensitivity analyses have been conducted with several different settings of regular resolutions for either temporal [Gourdji et al., 2010] or spatial aggregations [Tolk et al., 2008]. The aggregation errors, although qualified or even quantified, are not formulated explicitly for most carbon inversions.
Gerbig et al. [2003, 2006] were the first to use heterogeneous spatial grids to lessen aggregation errors. Unrealistic correlations make the estimation of the a posteriori uncertainties of the fluxes less accurate.
 The adaptive spatial grid from Gerbig et al. is fixed and obtained with a polar projection, which is centered around one tower to adapt to the heterogeneous influence of observations. We will revisit this heterogeneous inverse problem using general adaptive spatial grids with more towers that cover the domain. In this paper, the following questions are addressed: Can the adaptive grids be optimized so that the information from observations can be better propagated within the domain? What changes when the aggregation errors are explicitly formulated for carbon inversions? What is the role of correlations in background errors for inversions using optimal adaptive grids?
 Such questions are seldom investigated due to the lack of a multiscale framework for analysis. Based on a recent consistent Bayesian multiscale formalism to optimally design control space (in which control variables are to be estimated) [Bocquet, 2009; Bocquet et al., 2011; Bocquet and Wu, 2011], we will construct the optimal adaptive representations of the fluxes for inversions using synthetic concentration data. Such representations are taken from a large dictionary of adaptive multiscale grids. The criterion for representation optimization is chosen to be the number of degrees of freedom for the signal (DFS) that measures the information gain from observations to resolve the unknown fluxes. Consequently the information from observations is expected to be better propagated within the domain through these optimal representations. We will then conduct carbon inversions on the optimal representations. Several issues, e.g., the information propagation, the correlations in background errors, the explicit formulation of aggregation errors, will be examined in the optimal multiscale settings. Hopefully, such optimal adaptive representations would be helpful to set up fixed multiscale grids for practical carbon flux inversions.
 The paper is organized as follows. In Section 2, we present the methodology for the representation optimization and the inversion in the multiscale setting. Inversions are performed in the context of the Ring 2 experiment in support of the North American Carbon Program Mid Continent Intensive (http://www.ring2.psu.edu). The experimental setup is detailed in Section 3. We report the resulting optimal representation and the corresponding inversion results in Section 4. Finally, conclusions are given in Section 5.
2.1. Inversion at Finest Resolution
 When properly discretized into Nfg regular grid cells at finest resolution, the surface flux vector σ ∈ ℝ^Nfg can be related to the observation (or receptor) vector μ ∈ ℝ^d as:
$$\boldsymbol{\mu} = \mathbf{H}\boldsymbol{\sigma} + \boldsymbol{\varepsilon}, \tag{1}$$
where ε is the vector of errors originating from the imprecision of observations, the representativeness errors and the deficiency of transport models, and H is an operator that includes the atmospheric transport. In CO2 flux inversion, the transport models are usually assumed to be linear. Hence H takes the form of a source-receptor matrix (the so-called Jacobian matrix) of size d × Nfg.
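 If one assumes Gaussian and independent errors for the fluxes and observations, a BLUE (Best Linear Unbiased Estimator) analysis reads
$$\boldsymbol{\sigma}^a = \boldsymbol{\sigma}^b + \mathbf{K}\left(\boldsymbol{\mu} - \mathbf{H}\boldsymbol{\sigma}^b\right), \tag{2}$$
where K is the gain matrix
$$\mathbf{K} = \mathbf{B}\mathbf{H}^T\left(\mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^T\right)^{-1}, \tag{3}$$
σb is the a priori flux vector, and σa is the a posteriori flux vector. Here R is the error covariance matrix for observations μ, and B is the error covariance matrix for the background fluxes σb. The corresponding a posteriori error covariance matrix for fluxes σa after inversion is
$$\mathbf{P}^a = \left(\mathbf{I}_{N_{fg}} - \mathbf{K}\mathbf{H}\right)\mathbf{B}, \tag{4}$$
where I_Nfg is the identity matrix in ℝ^(Nfg × Nfg).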
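The BLUE analysis described above can be illustrated with a minimal NumPy sketch. It is not the authors' code: the function name `blue_analysis`, the toy dimensions and the random source-receptor matrix are assumptions made only for illustration, and the gain is written in the standard form K = BH^T(R + HBH^T)^−1.

```python
import numpy as np

def blue_analysis(sigma_b, mu, H, B, R):
    """Return the BLUE analysis, the gain matrix and the posterior covariance."""
    S = H @ B @ H.T + R                     # innovation covariance (d x d)
    K = np.linalg.solve(S, H @ B).T         # K = B H^T S^{-1}; valid because S and B are symmetric
    sigma_a = sigma_b + K @ (mu - H @ sigma_b)
    P_a = (np.eye(B.shape[0]) - K @ H) @ B  # a posteriori error covariance
    return sigma_a, K, P_a

# Toy problem (hypothetical sizes, far smaller than d = 2880 and Nfg = 16384)
rng = np.random.default_rng(0)
n, d = 6, 3
H = rng.random((d, n))                      # stand-in source-receptor matrix
B = 4.0 * np.eye(n)                         # diagonal background error covariance
R = 0.25 * np.eye(d)                        # diagonal observation error covariance
sigma_t = rng.normal(size=n)                # "true" fluxes
sigma_b = np.zeros(n)                       # first guess
mu = H @ sigma_t + rng.normal(scale=0.5, size=d)

sigma_a, K, P_a = blue_analysis(sigma_b, mu, H, B, R)
```

Solving the d × d system rather than inverting an Nfg × Nfg matrix is the natural choice when d is much smaller than Nfg, as in this study.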
2.2. Source-Receptor Matrix
 If the number of observations d is significantly smaller than the number of flux components Nfg, it is more efficient to compute the source-receptor matrix H row-wise. For transport models, the rows of H are identified with the set of adjoint solutions indexed by the observations.
 The influence of an upstream flux σ(s, t) on one tracer concentration observation at receptor location sr and a later time tr can be evaluated using stochastic Lagrangian transport models, in which an ensemble of particles is released from the receptor (sr, tr) and transported backward to source location (s, t) [Uliasz, 1994; Lin et al., 2003; Seibert and Frank, 2004]. The particle distribution is supposed to be well mixed (which implies time reversibility), and the turbulence is accounted for using a Markov chain process. Local meteorological fields are used to drive the displacements of each particle. Each particle has its position indexed exactly rather than only up to the grid resolution of the model and consequently Lagrangian models are less diffusive than their Eulerian counterparts. Therefore, Lagrangian transport models are quite popular for the computation of H in mesoscale inversion [Gerbig et al., 2003; Lauvaux et al., 2008].
 For each observation at (sr, tr), the influence of a flux defined at a discrete spatiotemporal grid cell (si, tn) with a finite volume and a finite time interval can be characterized by the density of the particles released from receptor location sr during the time period of the measurement around tr, e.g. one hour. Integrating up all possible flux influences multiplied by the corresponding flux σ(si, tn), one obtains the variation in the concentration value at (sr, tr) resulting from all the flux influences.
 A practical flux variable for inversion is the mass surface source-sink flux in units of (g C m−2 time−1), rather than the volume flux in units of (ppm m−3 time−1). If the concentration observations are in units of (ppm) (volume mixing ratio), and if the surface fluxes are diluted within a surface layer of height h in units of meters, then the footprint which relates a flux at a surface spatiotemporal grid cell to its resulting concentration variation at (sr, tr) can be calculated by:
where xi, yi, tn are the spatiotemporal coordinates of that surface grid cell, mCO2 and mair are the molecular masses of CO2 and dry air respectively, g is the gravitational acceleration, ΔP is the air pressure difference between ground level and height h above ground level, Δτp,i,n is the residence time for particle p staying in the spatiotemporal grid cell at (xi, yi, tn), and Ntot is the total number of particles released for the observation at (sr, tr). Here the hydrostatic approximation is assumed.
 Similarly to the surface fluxes, the footprint can be computed for the concentrations at boundary grid cells or at the initial time. In this study, synthetic data are used. In this synthetic case, the boundary and initial conditions can be assumed to be perfectly known, therefore they do not contribute to the variations in observation values. For each concentration observation, one can thus compute the footprint elements for the surface fluxes at all upstream spatiotemporal grid cells. These footprint elements form one row of the source-receptor matrix H.
2.3. Error Parameterization
 The background error covariance matrix B is a key ingredient of carbon inversions, because the correction σa − σb lies in the column space of B. For sparse observations, the correlations in B spread the information from the observation sites to their neighborhood.
 Unfortunately in most cases B is not well established. The only objective study to date [Chevallier et al., 2006], which compares the simulation of vegetation models and the in situ flux observations from eddy-covariance sites, shows that, for terrestrial models run at the resolution of global transport models, the length of the spatial error correlation is a few hundred kilometers at most, and that the temporal correlation is strong (up to several weeks, even months). This motivates a spatially diagonal B with perfect temporal correlations for a few days in some studies [Lauvaux et al., 2008]. One popular alternative is the exponentially decaying correlation model [Rödenbeck et al., 2003; Michalak et al., 2004; Peylin et al., 2005]. Note that Lauvaux et al. have combined the spatial correlations with ecosystem considerations [Peters et al., 2007].
 In this study, the two above-mentioned assumptions on B are tested: a spatially diagonal B and isotropic correlations in background errors. In the latter case, the error covariance between two spatial points s1 and s2 is computed according to the Balgovind parameterization [Balgovind et al., 1983]:
$$[\mathbf{B}]_{s_1, s_2} = \kappa^2 \left(1 + \frac{h_s}{L_s}\right) \exp\left(-\frac{h_s}{L_s}\right), \tag{6}$$
where Ls is the characteristic correlation length, hs = ∥s1 − s2∥ is the spatial distance between two points, and κ is the background error standard deviation which can be heterogeneous, for instance, when the information from local ecosystems is considered. In this study, we assume homogeneous κ (thus a constant). Note that similar formulations can also be introduced for temporal correlations [Gourdji et al., 2010].
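As an illustration of the isotropic case, the sketch below builds a Balgovind-type covariance matrix on a small regular grid. It assumes the commonly used form κ²(1 + hs/Ls) exp(−hs/Ls); the function name `balgovind_covariance` and the 8 × 8 toy grid with 8 km spacing are hypothetical choices, not the configuration of the study.

```python
import numpy as np

def balgovind_covariance(coords_km, kappa, L_s):
    """Covariance kappa^2 (1 + h/L_s) exp(-h/L_s) for all pairs of points."""
    diff = coords_km[:, None, :] - coords_km[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise distances in km
    return kappa ** 2 * (1.0 + h / L_s) * np.exp(-h / L_s)

# Hypothetical toy grid: 8 x 8 cells with 8 km spacing
x = np.arange(8) * 8.0
xx, yy = np.meshgrid(x, x, indexing="ij")
coords = np.column_stack([xx.ravel(), yy.ravel()])

B = balgovind_covariance(coords, kappa=10.0, L_s=20.0)
```

The diagonal of `B` equals κ², and the off-diagonal entries decay with distance, so a larger Ls couples more distant grid cells.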
 The estimation of the observational error covariance matrix R is a difficult issue, because the error of the transport model is involved [Lauvaux et al., 2009a]. As in most carbon inversions, we assume R diagonal, that is, the observation errors are spatiotemporally independent.
2.4. Multiscale Formalism
 We summarize the multiscale formalism in this subsection. For a detailed presentation, please refer to Bocquet et al. .
2.4.1. Multiscale Representation Structure
 Let Ω be a spatiotemporal 3D (2D + T) domain of surface fluxes discretized into a regular grid at a finest scale of Nfg grid cells. A hierarchical scale of resolution can be obtained by successive dyadic coarse-grainings of the grid cells at the finest scale. The source-receptor matrix H is computed at the finest scale. Its dyadic coarse-grainings can be obtained by simple averaging or summation.
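One dyadic coarse-graining step can be sketched as follows. This is only an illustration under stated assumptions: fluxes are coarse-grained by block averaging, one row of H (a footprint) by block summation, which is the consistent pairing when B is diagonal and homogeneous; the helper names `coarsen_mean` and `coarsen_sum` and the 8 × 8 toy grid are hypothetical.

```python
import numpy as np

def coarsen_mean(field, factor=2):
    """Average a 2D field over factor x factor blocks (flux coarse-graining)."""
    ny, nx = field.shape
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))

def coarsen_sum(field, factor=2):
    """Sum a 2D field over factor x factor blocks (footprint coarse-graining)."""
    ny, nx = field.shape
    return field.reshape(ny // factor, factor, nx // factor, factor).sum(axis=(1, 3))

rng = np.random.default_rng(0)
footprint_fine = rng.random((8, 8))                 # one row of H on the finest grid
flux_coarse = rng.normal(size=(4, 4))               # fluxes defined on the coarser grid
flux_fine = np.kron(flux_coarse, np.ones((2, 2)))   # same fluxes, piecewise constant on the fine grid

footprint_coarse = coarsen_sum(footprint_fine)
conc_fine = np.sum(footprint_fine * flux_fine)      # concentration computed on the fine grid
conc_coarse = np.sum(footprint_coarse * flux_coarse)  # same quantity on the coarse grid
```

For fluxes that are constant within each coarse cell, both grids give the same simulated concentration, so nothing is lost in the observation operator itself; the loss appears only for flux variations inside the coarse cells.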
 Let us define a representation ω as a set of N multiscale grid cells that cover Ω. For admissible representations, each point in Ω corresponds to one and only one grid cell of ω. Trees are appropriate tools to describe such multiscale representations. For instance in a 1D domain, a coarse grid cell (mother tree node) can be divided into two refined grid cells (daughter tree nodes). This forms a binary tree (Figure 1a). Multiscale representations in a 2D domain can be constructed using grid cells that are Kronecker products of grid cells of two binary trees. This leads to the so-called tiling representation (Figure 1b), which is anisotropic, since each grid cell in this representation can have arbitrary scale levels in each direction of the 2D domain. A more numerically effective representation is to use a quaternary tree, hereafter called qtree (Figure 1c), instead of products of grid cells of binary trees. That is, each mother grid cell is divided into four daughter grid cells.
2.4.2. Restriction and Prolongation
 In order to switch scale in this multiscale setting, we define a restriction operator that describes how a source (a spatiotemporal flux vector) is coarse-grained, and a prolongation operator that describes how a source is refined to the finer scales.
 Let σ be the source at the finest scale. The coarse-graining of σ on ω is described by σω = Γωσ, where Γω : ℝ^Nfg → ℝ^N is the restriction operator that can be unambiguously defined as simple dyadic averaging. Refining a source σω on ω back to the finest scale is described by σ = Γ*ωσω, where Γ*ω : ℝ^N → ℝ^Nfg is the prolongation operator. We sketch the restriction and prolongation operators in Figure 2.
 The prolongation operator Γ*ω is ambiguous, since, given only a source σω, no information is available at finer scales to refine that source. Therefore additional information or assumptions have to be exploited to reconstruct a source σ* at finest scale that corresponds to σω. Such additional information can be the prior probability density function (pdf) q(σ) on σ. Thus the source can be reconstructed using the posterior pdf q(σ|σω). A Bayesian analysis gives:
$$q(\boldsymbol{\sigma} \mid \boldsymbol{\sigma}_\omega) = \frac{q(\boldsymbol{\sigma})\, q(\boldsymbol{\sigma}_\omega \mid \boldsymbol{\sigma})}{q_\omega(\boldsymbol{\sigma}_\omega)}, \tag{7}$$
where qω(σω) is the prior pdf of σ on ω, and the conditional pdf q(σω | σ) can be defined by δ(σω − Γωσ). Here δ is the Dirac distribution.
 If we assume a Gaussian prior at finest scale: q(σ) ∼ 𝒩(σb, B), the prior on ω is also Gaussian: qω(σω) ∼ 𝒩(σωb, Bω), where σωb = Γωσb and Bω = ΓωBΓωT. In this case, the corresponding maximum likelihood estimation based on q(σ|σω) is:
$$\boldsymbol{\sigma}^* = \boldsymbol{\sigma}^b + \mathbf{B}\boldsymbol{\Gamma}_\omega^T\left(\boldsymbol{\Gamma}_\omega\mathbf{B}\boldsymbol{\Gamma}_\omega^T\right)^{-1}\left(\boldsymbol{\sigma}_\omega - \boldsymbol{\Gamma}_\omega\boldsymbol{\sigma}^b\right). \tag{8}$$
This defines an unambiguous affine prolongation operator:
$$\boldsymbol{\Gamma}^*_\omega \equiv \left(\mathbf{I}_{N_{fg}} - \boldsymbol{\Pi}_\omega\right)\boldsymbol{\sigma}^b + \boldsymbol{\Lambda}^*_\omega, \tag{9}$$
where Λ*ω = BΓωT (ΓωBΓωT)−1 and Πω = Λ*ωΓω. An application of Γ*ω on σω is given by a linear transform Λ*ωσω shifted by (I − Πω)σb, with I the identity matrix, which reproduces equation (8). We use the name of affine operator to emphasize the translation related to σb. It can be verified that the linear operator Πω is a projector and has the property of B−1-symmetry which means that ΠωB = BΠωT. The projector Πω is a composition of a coarse-graining Γω and a projection Λ*ω back from ω to the finest grid. Therefore it characterizes the variations at a coarse-scale representation ω. Consequently I − Πω is also a projector which conserves the small-scale variations smoothed out by Πω.
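The algebraic properties of the restriction and prolongation operators can be checked numerically on a small 1D example. The sketch below is illustrative only: the helper names, the representation made of cells of sizes 4, 2, 1 and 1 over 8 fine cells, and the diagonal B are assumptions chosen for the demonstration.

```python
import numpy as np

def restriction_matrix(cell_sizes):
    """Restriction operator for a 1D representation: each row averages one cell."""
    n_fine = sum(cell_sizes)
    gamma = np.zeros((len(cell_sizes), n_fine))
    start = 0
    for row, size in enumerate(cell_sizes):
        gamma[row, start:start + size] = 1.0 / size
        start += size
    return gamma

def prolongation_parts(gamma, B):
    """Return the linear part Lambda* = B G^T (G B G^T)^{-1} and Pi = Lambda* G."""
    lam = B @ gamma.T @ np.linalg.inv(gamma @ B @ gamma.T)
    return lam, lam @ gamma

cell_sizes = [4, 2, 1, 1]                       # hypothetical binary-tree representation
gamma = restriction_matrix(cell_sizes)
B = np.diag(np.linspace(1.0, 2.0, 8))           # diagonal background covariance
lam, pi = prolongation_parts(gamma, B)

sigma_b = np.arange(8, dtype=float)             # background at the finest scale
sigma_w = gamma @ sigma_b + np.array([1.0, 0.0, -1.0, 0.5])  # some source on omega
sigma_star = sigma_b + lam @ (sigma_w - gamma @ sigma_b)     # affine prolongation
```

Coarse-graining the prolongated source returns the source on ω, and the fine-scale structure of σb is kept wherever the coarse value agrees with the background.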
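 For efficient multiscale representation optimizations, B is preferred to be diagonal. When cross-correlations in background errors are present between different grid cells, B is non-diagonal. However, the multiscale optimization machinery can be kept by introducing a new coarse-graining operator Γ̃ω = ΓωB−1/2 where Γω is the original restriction operator. For a source σ ∈ ℝ^Nfg, the new coarse-graining is described by σω = Γ̃ωσ = ΓωB−1/2σ. Here a linear transform σ̃ = B−1/2σ is implicitly introduced to remove the cross-correlations in the sense that the error covariance of σ̃b = B−1/2σb is an identity matrix. The maximum likelihood estimation still applies with the new coarse-graining operator Γ̃ω:
$$\boldsymbol{\sigma}^* = \boldsymbol{\sigma}^b + \mathbf{B}^{1/2}\boldsymbol{\Gamma}_\omega^T\left(\boldsymbol{\Gamma}_\omega\boldsymbol{\Gamma}_\omega^T\right)^{-1}\left(\boldsymbol{\sigma}_\omega - \boldsymbol{\Gamma}_\omega\mathbf{B}^{-1/2}\boldsymbol{\sigma}^b\right). \tag{10}$$
The new prolongation operator can thus be derived as:
$$\tilde{\boldsymbol{\Gamma}}^*_\omega \equiv \left(\mathbf{I}_{N_{fg}} - \tilde{\boldsymbol{\Pi}}_\omega\right)\boldsymbol{\sigma}^b + \tilde{\boldsymbol{\Lambda}}^*_\omega, \tag{11}$$
where Λ̃*ω = B1/2ΓωT (ΓωΓωT)−1 and Π̃ω = Λ̃*ωΓ̃ω. Note that in this case the dyadic averaging of Γω is applied to the sources σ̃ ∈ ℝ^Nfg with decorrelated errors, therefore the adaptive grid of the representation ω cannot be related directly to the original spatiotemporal lat-lon domain.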
2.4.3. Aggregation Error
 For a given representation ω, the source-receptor matrix H becomes the affine operator HΓ*ω, whose linear part is Hω = HΛ*ω. The multiscale fluxes on ω are related to the observations by: μ = Hωσω + εω = Hσb + HΠω (σ − σb) + εω. Note that at finest scale μ = Hσ + ε. Therefore we can identify the total scale-covariant error [Bocquet et al., 2011]:
$$\boldsymbol{\varepsilon}_\omega = \boldsymbol{\varepsilon} + \boldsymbol{\varepsilon}_\omega^{\mathrm{agg}}, \tag{12}$$
where εωagg = H(I − Πω)(σ − σb) is the part of the total error resulting from the aggregation, with I the identity matrix. Assuming independence between the observational error ε at finest scale and the background error εb = σ − σb, one obtains
$$\mathbf{R}_\omega = \mathbf{R} + \mathbf{H}\left(\mathbf{I}_{N_{fg}} - \boldsymbol{\Pi}_\omega\right)\mathbf{B}\left(\mathbf{I}_{N_{fg}} - \boldsymbol{\Pi}_\omega\right)^T\mathbf{H}^T. \tag{13}$$
It can then be verified that the statistics of the innovation vector μ − Hσb are scale-invariant when the aggregation error is identified and formulated according to equation (12): Rω + HωBωHωT = R + HBHT (for details see Section 3.2 of Bocquet et al.). Failing to take εωagg into account leads to the inconsistent innovation statistics R + HωBωHωT.
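The scale-invariance of the innovation statistics can be verified numerically. The following sketch uses a toy problem with hypothetical sizes, a random stand-in for H, and a pairwise-averaging restriction; it assumes the aggregation error covariance H(I − Πω)B(I − Πω)^T H^T, which follows from the definition of εωagg and the independence assumption stated above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, d = 8, 3
H = rng.random((d, n))                           # stand-in source-receptor matrix
B = np.diag(rng.uniform(0.5, 2.0, n))            # diagonal background covariance
R = 0.1 * np.eye(d)                              # observation error covariance

# Restriction: average the fine cells pairwise (8 -> 4 cells)
gamma = np.kron(np.eye(4), np.full((1, 2), 0.5))
lam = B @ gamma.T @ np.linalg.inv(gamma @ B @ gamma.T)
pi = lam @ gamma
I = np.eye(n)

H_w = H @ lam                                    # linear part of the coarse operator
B_w = gamma @ B @ gamma.T                        # background covariance on omega
R_agg = H @ (I - pi) @ B @ (I - pi).T @ H.T      # aggregation error covariance
R_w = R + R_agg                                  # total scale-covariant error covariance

consistent = R_w + H_w @ B_w @ H_w.T             # with aggregation error
reference = R + H @ B @ H.T                      # finest-scale innovation statistics
inconsistent = R + H_w @ B_w @ H_w.T             # aggregation error ignored
```

Here `consistent` matches `reference`, whereas `inconsistent` underestimates the innovation covariance by exactly `R_agg`.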
2.5. Inversion With Multiscale Representations
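 The innovation vector μ − Hσb is scale-invariant due to the fact that Hωσωb = Hσb. The BLUE analysis on ω gives the following update:
$$\boldsymbol{\sigma}_\omega^a = \boldsymbol{\sigma}_\omega^b + \mathbf{B}_\omega\mathbf{H}_\omega^T\left(\mathbf{R}_\omega + \mathbf{H}_\omega\mathbf{B}_\omega\mathbf{H}_\omega^T\right)^{-1}\left(\boldsymbol{\mu} - \mathbf{H}\boldsymbol{\sigma}^b\right). \tag{14}$$
The posterior error covariance matrix is
$$\mathbf{P}_\omega^a = \mathbf{B}_\omega - \mathbf{B}_\omega\mathbf{H}_\omega^T\left(\mathbf{R}_\omega + \mathbf{H}_\omega\mathbf{B}_\omega\mathbf{H}_\omega^T\right)^{-1}\mathbf{H}_\omega\mathbf{B}_\omega. \tag{15}$$
When the aggregation error εωagg is considered, the term Rω + HωBωHωT can be computed by R + HBHT in the above two formulae. By contrast, when the aggregation error is not formulated, the inconsistent innovation statistics R + HωBωHωT are used.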
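 The inverted flux vector σωa on ω can be transformed into a flux vector σa at the finest scale by the prolongation operator:
$$\boldsymbol{\sigma}^a = \boldsymbol{\Gamma}^*_\omega\boldsymbol{\sigma}_\omega^a = \boldsymbol{\sigma}^b + \boldsymbol{\Lambda}^*_\omega\left(\boldsymbol{\sigma}_\omega^a - \boldsymbol{\Gamma}_\omega\boldsymbol{\sigma}^b\right). \tag{16}$$
For non-diagonal B, Γω is redefined as Γ̃ω = ΓωB−1/2 to obtain the inverted fluxes at the finest scale:
$$\boldsymbol{\sigma}^a = \tilde{\boldsymbol{\Gamma}}^*_\omega\tilde{\boldsymbol{\sigma}}_\omega^a = \boldsymbol{\sigma}^b + \mathbf{B}^{1/2}\boldsymbol{\Gamma}_\omega^T\left(\boldsymbol{\Gamma}_\omega\boldsymbol{\Gamma}_\omega^T\right)^{-1}\left(\tilde{\boldsymbol{\sigma}}_\omega^a - \boldsymbol{\Gamma}_\omega\mathbf{B}^{-1/2}\boldsymbol{\sigma}^b\right). \tag{17}$$
In practice, all the expressions in decorrelated space, such as σ̃ωa, can be calculated using the same formulae obtained in the diagonal case, except that B and H should be replaced by the identity matrix and HB1/2 respectively.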
2.6. Criteria for Optimal Representation of Sources
 Diverse criteria have been proposed to evaluate the performance of a multiscale representation ω [Bocquet et al., 2011]. In this study, we choose the number of degrees of freedom for the signal (DFS) as the criterion, which is defined by [Rodgers, 2000]
$$\mathrm{DFS} = \mathrm{E}\left[\left(\boldsymbol{\sigma}^a - \boldsymbol{\sigma}^b\right)^T\mathbf{B}^{-1}\left(\boldsymbol{\sigma}^a - \boldsymbol{\sigma}^b\right)\right], \tag{18}$$
where E denotes the expectation operator on the background and observational errors, and σa is the BLUE analysis in equation (2). Note that σa can be obtained by the following variational problem:
$$\boldsymbol{\sigma}^a = \arg\min_{\boldsymbol{\sigma}} \chi^2, \qquad \chi^2 = \left(\boldsymbol{\sigma} - \boldsymbol{\sigma}^b\right)^T\mathbf{B}^{-1}\left(\boldsymbol{\sigma} - \boldsymbol{\sigma}^b\right) + \boldsymbol{\varepsilon}^T\mathbf{R}^{-1}\boldsymbol{\varepsilon}, \tag{19}$$
where the observational error is defined by the observation equation (1): ε = μ − Hσ. The DFS is thus the part of χ2 that measures the relative correction of σa to σb. In the simplest case, for which a measurement is made of a scalar: μ = σ + ε, the DFS is vb/(vb + vε), where vb and vε are the prior and measurement error variances respectively. If the measurement is exact (vε = 0), or if there is no prior information (vb = ∞), we have one degree of freedom for the signal which is provided by the measurement. By contrast, if vε = ∞, we have zero DFS, or one degree of freedom for the noise.
 In the general vector case, the DFS can be computed by Tr(A) where A = KH is the averaging kernel matrix. (The averaging kernel is defined as the sensitivity of the inversion to the true state from Rodgers. In this paper, it is referred to simply as a mathematical term, and we do not seek its interpretation as smoothing functions as in the retrieval community.) This equals Tr[(B − Pa)B−1], which measures the relative reduction of uncertainty for the BLUE analysis. In the presence of noise ε, it can be demonstrated that the DFS value ranges between 0 and the number of observations d.
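The two expressions of the DFS and the scalar limit case can be checked with a few lines of NumPy. The function name `dfs` and the toy dimensions are hypothetical; the computation only uses the standard gain matrix K = BH^T(R + HBH^T)^−1.

```python
import numpy as np

def dfs(H, B, R):
    """Degrees of freedom for the signal, computed as Tr(KH)."""
    K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
    return np.trace(K @ H)

# Scalar case: one direct measurement with prior variance 4 and error variance 1
scalar_dfs = dfs(np.array([[1.0]]), np.array([[4.0]]), np.array([[1.0]]))

# Small vector case with hypothetical dimensions
rng = np.random.default_rng(5)
n, d = 7, 3
H = rng.random((d, n))
B = np.diag(rng.uniform(1.0, 3.0, n))
R = 0.5 * np.eye(d)

K = B @ H.T @ np.linalg.inv(H @ B @ H.T + R)
P_a = (np.eye(n) - K @ H) @ B
dfs_from_gain = dfs(H, B, R)
dfs_from_variance = np.trace((B - P_a) @ np.linalg.inv(B))
```

In the scalar case the result is vb/(vb + vε) = 4/5, and in the vector case both traces coincide and stay below the number of observations.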
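 In summary, the DFS measures the information gain from observations to resolve the unknown parameters. Using the property of scale-invariance of the innovation statistics, the DFS on ω can be computed by:
$$\mathrm{DFS}_\omega = \mathrm{Tr}\left[\boldsymbol{\Pi}_\omega\mathbf{B}\mathbf{H}^T\left(\mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^T\right)^{-1}\mathbf{H}\right], \tag{20}$$
where Πω = BΓωT (ΓωBΓωT)−1Γω following equation (9). For non-diagonal B, the DFS becomes
$$\mathrm{DFS}_\omega = \mathrm{Tr}\left[\boldsymbol{\Pi}_\omega\mathbf{B}^{1/2}\mathbf{H}^T\left(\mathbf{R} + \mathbf{H}\mathbf{B}\mathbf{H}^T\right)^{-1}\mathbf{H}\mathbf{B}^{1/2}\right], \tag{21}$$
where Πω is reduced to ΓωT (ΓωΓωT)−1Γω.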
 The DFS is to be maximized for the optimal multiscale representation ω*. In practice, Πω has an explicit algebraic formula, which allows an efficient evaluation of the DFS on ω.
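To make the criterion concrete, the sketch below evaluates the DFS on three nested 1D representations for a diagonal B, using the trace form Tr[ΠωBH^T(R + HBH^T)^−1H] that follows from the scale-invariant innovation statistics. The function name `dfs_on_representation`, the random H and the toy sizes are assumptions; this is not the optimization routine of the study, only an evaluation of the criterion for given representations.

```python
import numpy as np

def dfs_on_representation(gamma, H, B, R):
    """DFS on a representation with restriction matrix gamma (diagonal-B case)."""
    pi = B @ gamma.T @ np.linalg.inv(gamma @ B @ gamma.T) @ gamma
    S = R + H @ B @ H.T                       # scale-invariant innovation covariance
    return np.trace(pi @ B @ H.T @ np.linalg.solve(S, H))

rng = np.random.default_rng(2)
n, d = 8, 4
H = rng.random((d, n))
B = np.diag(rng.uniform(0.5, 2.0, n))
R = 0.2 * np.eye(d)

gamma_fine = np.eye(n)                                   # finest grid, 8 cells
gamma_pairs = np.kron(np.eye(4), np.full((1, 2), 0.5))   # 4 cells of 2 fine cells
gamma_one = np.full((1, n), 1.0 / n)                     # single cell covering the domain

dfs_fine = dfs_on_representation(gamma_fine, H, B, R)
dfs_pairs = dfs_on_representation(gamma_pairs, H, B, R)
dfs_one = dfs_on_representation(gamma_one, H, B, R)
```

Because the three representations are nested, the DFS can only decrease as the grid is coarsened; the optimization searches, for a fixed number of cells, for the representation that loses the least.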
2.7. Representation Optimization
 The optimization of ω over admissible representations is a constrained optimization problem. The numerical procedure for solutions is mainly composed of three parts: the calculation of the DFS on ω, a statistical regularization scheme and a gradient-based optimization routine (see Bocquet et al. and references therein for details). Thanks to the algebraic form of Πω, the efficient evaluation of the DFS makes the optimizations less costly than the inversion at the finest scale. For representations with a larger number of grid cells, fast asymptotic solutions are possible, which in most cases take only seconds [Bocquet and Wu, 2011].
 It is possible that the numerical solutions lead to suboptimal representations. We do not require the strict optimality. The resulting optimal representations can be validated by their improvements in the DFS value, and one can also check whether they are physically coherent.
3. Experimental Setup
 The Ring 2 campaign in support of the North American Carbon Program Mid Continent Intensive (MCI) will be used to test and discuss the concepts introduced earlier. The spatial domain covers an area of size 980 km × 980 km centered at [37.1906°N, 98.5925°W] (Figure 3). A ring of eight towers (one from PSU (Pennsylvania State University), five from MCI, and two from NOAA (National Oceanic and Atmospheric Administration)) around the state of Iowa collect hourly averaged CO2 concentration observations (in ppm) in and out of the corn belt area. The locations of these towers are shown in Figure 3. The time period of the experiment is from 1st June 2007 at 0000 UTC to 16th June 2007 at 0000 UTC. The time length is 15 days. Simulations of a vegetation model SiBcrop [Lokupitiya et al., 2009] within this spatiotemporal domain are used as the reference true fluxes (e.g. the fluxes during 15 days in Figure 3). The total number of observations d is thus 2880 (8 × 24 × 15).
 The atmospheric transport is simulated using the meteorological WRF model [Skamarock et al., 2005] with a horizontal resolution of 10 km. There are 60 vertical levels. Forty of them are in the lower 2 km, and the top of the first level is 20 m. Backward particle trajectories over 15 days are generated using the Lagrangian transport model LPDM [Uliasz, 1994] with an integration time step of 20 s. At each time step, 10 particles are released from the tower locations. Therefore, for each hourly averaged observation, the total number of particles Ntot is 1800. The surface layer height h is taken to be 50 m. This height represents the atmospheric surface layer depth where the well-mixed criterion allows us to consider that particles are influenced by the surface. In theory, one should count the touchdowns at the surface, but the misrepresentation of the lowest level dynamics in the model (from 0 to 55 m in our case or the first 3 levels) may cause unrealistic footprints during stable atmospheric conditions in particular, mainly due to under-estimated vertical mixing velocities. Larger scale models use an even deeper layer to compensate for this issue, which is problematic when using nighttime mixing ratios but reasonable for observations in the daytime well-mixed convective PBL.
 The 2D spatial domain is discretized into a finest regular grid of 128 × 128 points. In this finest reference grid, each grid cell is of size about 8 km × 8 km. The particles within these surface grid cells are recorded to compute the influences of the fluxes on concentration observations.
 The temporal correlations between fluxes are usually considered to be significant over days for mesoscale inversions [Carouge et al., 2010; Gourdji et al., 2010]. This leads to long time aggregations for inversions, e.g. over one week [Schuh et al., 2010; Lauvaux et al., 2011]. In this study, the mean 15-day fluxes are to be inverted. The flux dimension is the total number of grid cells: Nfg = 16384.
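The dimensions quoted in this section follow from simple arithmetic, which the short sketch below reproduces from the numbers given in the text (the variable names are ours).

```python
# Number of hourly observations: 8 towers, 24 hours per day, 15 days
n_towers, hours_per_day, n_days = 8, 24, 15
d = n_towers * hours_per_day * n_days

# Particles per hourly observation: 10 particles released every 20 s for one hour
time_step_s, particles_per_step = 20, 10
n_tot = (3600 // time_step_s) * particles_per_step

# Finest grid: 128 x 128 cells over a 980 km x 980 km domain
n_side = 128
n_fg = n_side * n_side
cell_size_km = 980.0 / n_side

print(d, n_tot, n_fg, cell_size_km)   # 2880 1800 16384 7.65625
```

The cell size of about 7.7 km is the value rounded to "about 8 km" in the text, and the ratio Nfg/d is close to 5.7, which illustrates the imbalance between unknown fluxes and observations.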
 Two settings are tested for the background error covariance matrix B: a diagonal one and a Balgovind form. We set the standard deviation κ of background flux errors to 10 g C m−2 15 d−1. In our case, the corn crop had not yet fully developed in June, therefore κ is set to a value smaller than those of Lokupitiya et al.  and Lauvaux et al. . Realistic correlations are assessed by testing two Balgovind correlation length (Ls) values: 20 and 50 km. This is approximately equivalent to an exponential model with correlation length set to 50 and 100 km respectively (Figure 4). Lauvaux et al.  found little impact on inversion results with correlation length larger than 100 km for exponential models. Note that the effective degrees of freedom of the background fluxes are reduced significantly with increasing correlation lengths. Quantitative results are omitted since the degrees of freedom (information content) of background fluxes are not under the same metric as that of DFS for a fair comparison.
 The observational error covariance matrix R is assumed to be diagonal. Two values of the standard deviation of the observational error are tested: 3 and 0.5 ppm. The larger value 3 ppm is supposed to include both the atmospheric transport error and the aggregation error (that leads to the representativity error); the smaller value only accounts for the instrumental error. These observational error values are consistent with those of Carouge et al.  and Schuh et al. .
 The synthetic observations are generated by the right multiplication of the source-receptor matrix H with the true reference fluxes. These synthetic observations are perturbed according to the observational error. We employed 10 different seeds for the random number generation to obtain different realizations of observation perturbations. Our preliminary tests showed that there are about 2% relative variations in the root mean square errors (RMSE) for inverted fluxes resulting from this randomness. For simplicity, we present inversion results based on only one realization of observation perturbation. The background fluxes σb are generated by perturbing the true reference fluxes according to the given error structure: σb = σt + B1/2n, with σt the true reference flux and n a draw from a random vector of independent normal components of standard deviation 1.
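The generation of synthetic observations and first guesses can be sketched as follows. The sizes, the random H and the random "true" fluxes are placeholders (the study uses the LPDM footprints and SiBcrop fluxes), and a Cholesky factor is used here as one possible square root of B; any factor L with LL^T = B yields perturbations with the same covariance.

```python
import numpy as np

rng = np.random.default_rng(42)              # one seed = one realization
n, d = 16, 5                                 # hypothetical toy dimensions
H = rng.random((d, n))                       # stand-in source-receptor matrix
sigma_t = rng.normal(size=n)                 # stand-in for the reference true fluxes

# Synthetic observations: H applied to the true fluxes, plus observational noise
obs_std = 3.0                                # ppm
mu = H @ sigma_t + rng.normal(scale=obs_std, size=d)

# First guess: true fluxes perturbed according to the background error structure
kappa = 10.0                                 # background error standard deviation
B = kappa ** 2 * np.eye(n)                   # diagonal setting; a Balgovind B works the same way
B_sqrt = np.linalg.cholesky(B)               # assumption: Cholesky factor used as square root
sigma_b = sigma_t + B_sqrt @ rng.standard_normal(n)
```

Changing the seed gives another realization of the perturbations, which is how the sensitivity to the random draw mentioned above can be assessed.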
 In our preliminary tests, we found no significant difference when adopting a tiling structure or a qtree structure. The results obtained with the DFS criterion are also quite similar to those obtained with the criterion described by Bocquet . That is why we shall only present the results obtained with qtree, based on the DFS criterion.
 In this section, we present the results on representation optimization and multiscale inversion. Different experimental setups are listed in Table 1, where we specify the background and observational error covariance matrices, as well as the generation of the first guess for inversions. These experiments help to demonstrate the impact of aggregation errors (Section 4.1 for regular grids and Section 4.3 for optimal grids), the information propagation from observation sites to the whole domain (Section 4.2 on optimal representations and Section 4.3 on reduction of uncertainties), and the importance of a realistic correlation length for inversions (Section 4.1 for regular grids and Section 4.3 for optimal grids). We discuss specific issues, e.g., the non-diagonal observational error covariance matrix and the inversion errors at different scales in Sections 4.4 and 4.5.
Table 1. We detail the configurations for the background and observational error covariance matrices (B and R), as well as the covariance matrices of the zero-mean Gaussian perturbation vectors added to the true reference fluxes to generate the first guesses. For all the settings, the standard deviations (std) of background errors are set to 10 g C m−2 15 d−1, and R is assumed to be diagonal.
std = 3 ppm
std = 0.5 ppm
Balgovind, Ls = 20 km | std = 3 ppm
Balgovind, Ls = 20 km | std = 0.5 ppm
Balgovind, Ls = 50 km | std = 3 ppm
Balgovind, Ls = 50 km | std = 0.5 ppm
Balgovind, Ls = 20 km | std = 3 ppm
Balgovind, Ls = 20 km | std = 0.5 ppm
Balgovind, Ls = 20 km | std = 3 ppm
Balgovind, Ls = 50 km | std = 3 ppm
4.1. Regular Representation at Different Scales
 We first consider regular grids at different coarse scales, for instance: a regular grid of 64 × 64 points in which four adjacent grid cells at finest scale are aggregated into one large grid cell. It is straightforward to vary scales and perform inversions on these resulting regular grids. We list their inversion performance in Figure 5.
 The expectation of the root mean square error (RMSE) of inverted fluxes is (E[(σa − σt)T (σa − σt)]/Nfg)1/2 = [Tr(Pa)/Nfg]1/2. Similarly, the expected RMSE of the first guess is [Tr(B)/Nfg]1/2. In general, the inverted fluxes have smaller RMSEs than those of the first guesses, since the information from concentration observations is assimilated. Note that the DFS that measures the information gain from observations is evaluated by Tr[(B − Pa)B−1], whereas the RMSE of inverted fluxes measures the residual uncertainty after analysis.
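The two diagnostics can be computed side by side, as in the sketch below. The helper name `expected_rmse`, the random H and the toy dimensions are hypothetical; only the error standard deviations (10 for the background, 3 ppm for the observations) are taken from the setup.

```python
import numpy as np

def expected_rmse(cov):
    """Expected RMSE [Tr(cov)/N]^(1/2) for an error covariance matrix."""
    return np.sqrt(np.trace(cov) / cov.shape[0])

rng = np.random.default_rng(3)
n, d = 10, 4
H = rng.random((d, n))                       # stand-in source-receptor matrix
kappa = 10.0
B = kappa ** 2 * np.eye(n)                   # diagonal background covariance
R = 3.0 ** 2 * np.eye(d)                     # observation error std of 3 ppm

K = B @ H.T @ np.linalg.inv(R + H @ B @ H.T)
P_a = (np.eye(n) - K @ H) @ B

rmse_first_guess = expected_rmse(B)          # equals kappa for a homogeneous diagonal B
rmse_analysis = expected_rmse(P_a)
dfs = np.trace((B - P_a) @ np.linalg.inv(B))
```

The RMSE is an absolute measure averaged over all grid cells, whereas the DFS is a relative measure, which is why a sizable DFS can coexist with a weak RMSE improvement when many cells remain unconstrained.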
 For a diagonal B, the improvement in RMSE is not significant (Figures 5a and 5c). However, the corresponding DFSs are considerable (e.g. DFS/d ∼ 8% in Figure 5b and DFS/d ∼ 37% in Figure 5d for the finest grid). Since there are no correlations in the background errors, the number of degrees of freedom of the system is quite large. In addition, the information is not spread to the domain away from the observation sites in the absence of correlations in background errors. The DFS values indicate that the observations are effectively assimilated locally. Nevertheless, there are still many unresolved degrees of freedom of system. This explains the weak improvement in RMSE.
 For B in Balgovind form, the correlations of the background errors play a role similar to the aggregation of variables, but with an influence that decays with distance. The number of degrees of freedom of the system is therefore reduced, and the information from observations is propagated further within the domain. Consequently, one observes larger improvements in RMSE (e.g., Figure 5e). The best improvements in RMSE are obtained with the longest error correlation length (e.g., Figure 5i); in this case, the number of unresolved degrees of freedom of the system is the smallest. When the standard deviation of the observational error is set to the realistic value of 3 ppm, the DFSs for Balgovind B (DFS/d ∼ 9% in Figure 5f and DFS/d ∼ 5% in Figure 5j) are comparable with that for the diagonal case (Figure 5b). When the ratio between the background and observational errors increases, the DFSs for Balgovind B (DFS/d ∼ 19% in Figure 5h and DFS/d ∼ 12% in Figure 5l) are lower than that for the diagonal case (Figure 5d).
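To make the role of the correlation length concrete, the sketch below builds a background covariance matrix with Balgovind correlations. It assumes the usual Balgovind correlation function ρ(r) = (1 + r/Ls) exp(−r/Ls), which is not restated in this section, and a hypothetical 8 × 8 grid with 20 km spacing; the function name `balgovind_covariance` is introduced here for illustration.

```python
import numpy as np

def balgovind_covariance(coords_km, std, length_km):
    """Covariance matrix with correlations rho(r) = (1 + r/L) * exp(-r/L).

    coords_km: array of shape (n, 2) with grid-cell centers in km.
    std:       uniform background error standard deviation.
    length_km: correlation length L (Ls in the text).
    """
    diff = coords_km[:, None, :] - coords_km[None, :, :]
    r = np.sqrt((diff ** 2).sum(axis=-1))          # pairwise distances
    rho = (1.0 + r / length_km) * np.exp(-r / length_km)
    return std ** 2 * rho

# Hypothetical 8 x 8 grid with 20 km spacing (not the paper's grid).
nx = 8
xs, ys = np.meshgrid(np.arange(nx) * 20.0, np.arange(nx) * 20.0)
coords = np.column_stack([xs.ravel(), ys.ravel()])

B20 = balgovind_covariance(coords, std=10.0, length_km=20.0)
B50 = balgovind_covariance(coords, std=10.0, length_km=50.0)

# Correlation between two adjacent cells (20 km apart) for each length.
print(B20[0, 1] / 100.0, B50[0, 1] / 100.0)
```

A longer Ls yields stronger correlations between neighboring cells, which is the mechanism by which information is spread away from the observation sites.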
 In practice, the error structure of B is unknown and difficult to parameterize. We simulate this situation by deliberately using misspecified error structures to perturb the reference fluxes when generating the first guess σb. The RMSE is affected by this misspecification, but the DFS does not depend on the perturbation error structure. The theoretical improvement in RMSE of the BLUE analysis is not guaranteed when the perturbations break the assumption on the background error structure. For instance, if in reality B has little correlation structure and the inversion is performed with a B that has a correlation length of 20 km, the RMSE of the inverted fluxes increases catastrophically (Figures 5m and 5n). The aggregation imposed on independent regions leads to dominant aggregation errors. The corresponding inversions may diverge and produce a spurious and large a posteriori error covariance matrix Pa. Confusing correlation lengths of 20 km and 50 km, in either direction, leads to poorer inversions (Figures 5o and 5p compared with Figures 5i and 5e, respectively).
 The impact of explicitly formulating the aggregation error in the inversion is also shown in Figure 5. The inversions at the finest scale are the best, because the aggregation error $\epsilon_\omega^{\mathrm{agg}}$ is null at that scale. Inversions that take aggregation errors into account perform systematically better than those that do not, especially when B is not physically realistic (Figures 5m–5p) or when the ratio between the background and observational errors increases (e.g., Figures 5c, 5g and 5k). This latter case may be important in practice, since our background error has been underestimated. Note also that the aggregation of variables with diagonal B generates considerable aggregation errors. Based on these results, it is recommended that, if possible, the aggregation error be considered explicitly in inversions, using the scale-independent innovation statistics $R + HBH^T$ rather than the inconsistent term $R + H_\omega B_\omega H_\omega^T$.
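The scale independence of the innovation statistics can be checked numerically. The sketch below is a toy verification under stated assumptions: the coarse-graining operator `Gamma` averages pairs of adjacent cells, the prolongation is taken as Λ*ω = BΓᵀ(ΓBΓᵀ)⁻¹ with Πω = Λ*ωΓ, and the observation error covariance including aggregation error is assumed to be Rω = R + H(I − Πω)BHᵀ. These forms follow the Bayesian multiscale formalism as we understand it and are not restated in this section; all sizes and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n_fine, d = 16, 5

# Random symmetric positive definite B, random H, diagonal R (toy values).
A = rng.standard_normal((n_fine, n_fine))
B = A @ A.T + n_fine * np.eye(n_fine)
H = rng.random((d, n_fine))
R = 3.0**2 * np.eye(d)

# Gamma averages pairs of adjacent fine cells into one coarse cell.
n_coarse = n_fine // 2
Gamma = np.zeros((n_coarse, n_fine))
for k in range(n_coarse):
    Gamma[k, 2 * k:2 * k + 2] = 0.5

B_w = Gamma @ B @ Gamma.T                       # coarse background covariance
Lambda_star = B @ Gamma.T @ np.linalg.inv(B_w)  # assumed prolongation operator
Pi_w = Lambda_star @ Gamma                      # projector onto the coarse representation
H_w = H @ Lambda_star                           # coarse source-receptor matrix

# Assumed form of R with the aggregation error included.
R_w = R + H @ (np.eye(n_fine) - Pi_w) @ B @ H.T

innov_fine = R + H @ B @ H.T                    # finest-scale innovation statistics
innov_consistent = R_w + H_w @ B_w @ H_w.T      # coarse scale, aggregation error included
innov_inconsistent = R + H_w @ B_w @ H_w.T      # coarse scale, aggregation error ignored

print(np.abs(innov_fine - innov_consistent).max())
print(np.abs(innov_fine - innov_inconsistent).max())
```

Under these assumptions the first difference vanishes to rounding error, whereas ignoring the aggregation error leaves a residual equal to H(I − Πω)BHᵀ.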
4.2. Optimal Representations
 We show optimal multiscale representations under the DFS criterion in Figure 6. The number of grid cells in these representations is far smaller than that of the finest scale, Nfg. Equally important, the multiscale representations are optimal in the sense that the gain of information from observations is maximized. In other words, the optimal representation characterizes how information from observations is propagated spatiotemporally within the domain.
 For diagonal B, there are no correlations in background errors introducing the aggregation effect. However, the information from μ can still be spread to the regions around observation sites by advection and diffusion through the source-receptor matrix H in the innovation vector μ − Hσb. As a result, the adaptive optimal grids are dense around the observation sites, with shapes following the wind conditions (leftmost column in Figure 6). Since little information from observations can be used to resolve the fluxes in regions distant from the observation sites, the optimal grids are sparse in these distant regions.
 By contrast, for B in Balgovind form, the information from observations can be propagated to distant regions due to the aggregation effect introduced by the correlation in background errors. The adaptive optimal grids are more uniformly distributed than those for diagonal B (right two columns in Figure 6). The longer the correlation length is, the more uniform the optimal representation becomes. The influence of the variations in meteorological conditions seems to be smoothed by the correlations in background errors. Nevertheless, the optimal representations are still dense around the observation sites (e.g. Figure 6h), which results from the balance between the meteorological conditions and the introduced aggregation effect [Saide et al., 2011].
 Preliminary tests by Lauvaux et al. found that the spatial correlation structure for crops is rather short. Inversion results with correlation lengths longer than 100 km are very similar. Further studies on the exact optimal correlation length are needed, e.g., an objective analysis using in situ flux observations and/or cross validations using observation sets from different towers.
4.3. Inversion With Multiscale Representations
 Once the multiscale representation is fixed, inversions can be performed as described in Section 2.5. For both regular and optimal multiscale grids, we perform diverse inversions with B chosen to be either diagonal or in Balgovind form. The inversion results are shown in Figures 7 and 8. It is verified that, for diagonal B, the gain through inversions peaks around the observation sites (Figure 8a), and the information from observations cannot be conveyed to distant regions. The corrections of inverted fluxes to background fluxes are small, especially in regions distant from observation sites (Figure 8b).
 By contrast, for Balgovind B even with spatial correlations (Ls = 20 and 50 km respectively), the gain through inversions can be spread to regions far from the observation sites (Figures 8e and 8k). Considerable corrections are obtained, and there are significant improvements in RMSE for the inverted fluxes (Figures 8g and 8l). In these cases, the first guesses are generated by perturbing the true reference fluxes with a zero-mean Gaussian vector whose covariance matrix is taken to be the same Balgovind B. In other words, B is assumed to be well tuned.
 The relative gain of variance at the i-th grid cell is computed as ([B]ii − [Pa]ii)/[B]ii, where [B]ii denotes the i-th diagonal element of B. For the cases with non-diagonal B, the sum of relative gains over all the grid cells is larger than the DFS, because the gains are expected to be correlated. By contrast, for diagonal B the sum of relative gains of variance equals the DFS.
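The equality for diagonal B follows because the i-th diagonal element of (B − Pa)B⁻¹ is then exactly ([B]ii − [Pa]ii)/[B]ii. A short numerical check on a toy problem (random `H` and hypothetical sizes, chosen only for illustration) is:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d = 30, 6
H = rng.random((d, n))                          # stand-in source-receptor matrix
R = 3.0**2 * np.eye(d)                          # diagonal R
B = np.diag(rng.uniform(50.0, 150.0, size=n))   # diagonal B, non-uniform variances

# BLUE a posteriori covariance.
K = B @ H.T @ np.linalg.inv(R + H @ B @ H.T)
Pa = B - K @ H @ B

# Relative gain of variance per grid cell and the DFS.
relative_gain = (np.diag(B) - np.diag(Pa)) / np.diag(B)
dfs_value = np.trace((B - Pa) @ np.linalg.inv(B))

print(relative_gain.sum(), dfs_value)
```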
 For diagonal B, since little gain is found in regions far from the observation sites, there is no need to allocate computational resources to these regions. Optimal multiscale representations can therefore be much more efficient than regular grids. Accordingly, one observes a large improvement in DFS for optimal representations over regular representations in Figure 7b. Compared with the grid at finest scale, an optimal representation may have only 25% of the total number of grid cells, but can keep 94% of the DFS value (Figure 7b) when the aggregation error is taken into account explicitly. The correction of the inverted fluxes to the background fluxes for optimal representations (Figure 8d) is similar to that for the finest grid (Figure 8b) and much better than that for regular representations of the same size (Figure 8c).
 For Balgovind B, the corrections of inverted fluxes to background fluxes for the regular and optimal representations are very similar (Figures 7d and 8g). This is not a surprise, since the optimal representations are more uniformly distributed in this case. The influence of observations transported by the meteorological conditions is smoothed by the imposed correlations in background errors. The optimal representation may have only 25% of the total number of grid cells, but can keep at least 96% of the DFS value (Figures 7d and 7f) when the aggregation error is taken into account explicitly.
 Without accounting for the aggregation error, inversions with optimal adaptive grids perform systematically better than those with regular grids (Figure 7). The optimal grids seem to be less affected by the aggregation effect. In fact, the aggregation effect can be quantified by:
Here Rω is given by equation (13). To minimize the aggregation effect is hence equivalent to the maximization of the Fisher criterion Tr(ΠωBHTR−1H). Bocquet et al.  gave more details on the Fisher criterion, which is the limiting case of the DFS criterion when R is inflated or when B vanishes. Our preliminary tests showed that the optimal grids are quite similar under the Fisher and DFS criteria (results omitted). In summary, the optimal grids seem to have the desirable property that mitigates the aggregation effect.
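The limiting relation between the two criteria can be illustrated at the finest scale (where Πω reduces to the identity). In the sketch below the DFS is written as Tr[BHᵀ(R + HBHᵀ)⁻¹H] and the Fisher criterion as Tr(BHᵀR⁻¹H); the toy sizes, the random `H`, and the inflation factors are assumptions made for the example.

```python
import numpy as np

def dfs_criterion(B, H, R):
    """DFS at the finest scale: Tr[B H^T (R + H B H^T)^{-1} H]."""
    return np.trace(B @ H.T @ np.linalg.inv(R + H @ B @ H.T) @ H)

def fisher_criterion(B, H, R):
    """Fisher criterion at the finest scale: Tr(B H^T R^{-1} H)."""
    return np.trace(B @ H.T @ np.linalg.inv(R) @ H)

rng = np.random.default_rng(3)
n, d = 40, 8
H = rng.random((d, n))      # stand-in source-receptor matrix
B = np.eye(n)               # unit background variances (toy choice)

# Ratio DFS / Fisher for increasingly inflated observation errors.
ratios = {}
for obs_std in (3.0, 30.0, 3000.0):
    R = obs_std**2 * np.eye(d)
    ratios[obs_std] = dfs_criterion(B, H, R) / fisher_criterion(B, H, R)

for obs_std, ratio in ratios.items():
    print(f"obs std = {obs_std:7.1f}  DFS/Fisher = {ratio:.6f}")
```

The ratio approaches one as R is inflated, consistent with the Fisher criterion being the limiting case of the DFS criterion.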
 Most of the CO2 inversions [Rödenbeck et al., 2003; Michalak et al., 2004; Peylin et al., 2005] rely on lengthy correlations in background errors for effective corrections of fluxes. However, such lengthy correlations may not be realistic as indicated by Chevallier et al. . Indeed, when the first guess is perturbed differently with a diagonal B, the correction (Figure 8h) becomes dramatically different from that with the perturbation using correct Balgovind B (Figure 8f). The RMSE of inverted fluxes can even be worse than the background RMSE (Figure 5m). This result is consistent with the investigations of Gerbig et al.  and Carouge et al.  using regular grids. It is clear from our study that the choice of such an unphysical correlation length is likely to yield a significant over-estimation of the inversion gain.
 If the realistic B has a short correlation length and the inversion is performed using a B with a longer correlation length, the resulting correction (Figure 8j) tends to omit the small-scale variations (compared with Figure 8f). Conversely, underestimating the correlation length of B (Figure 8i) improperly imposes small-scale variations on the corrections (compared with Figure 8l). Therefore, the specification of a realistic and robust B is a crucial problem in top-down CO2 inversions that needs further in-depth investigation.
4.4. On Correlated R
 In this study using synthetic data, the model transport errors are not formulated explicitly, and the observational error covariance matrix R is assumed to be diagonal. In the absence of in-depth investigations of model errors, most recent regional inversions adopt a diagonal R [Carouge et al., 2010; Gourdji et al., 2010; Schuh et al., 2010].
 In practice, R is probably non-diagonal if the transport model errors are considered. The transport errors due to imprecision in horizontal transport, vertical mixing, and sub-grid physics may persist across hours, which probably leads to spatiotemporal correlations in R. In this regard, Gerbig et al.  used exponentially decaying models to parameterize spatial correlations. Based on an ensemble of atmospheric transports [Lauvaux et al., 2009a], Lauvaux et al.  introduced temporal correlations within 12 hours in the observational error to account for its impact on inversions. In general, such observational spatiotemporal correlations would make the inversions less constrained by the observations [Lauvaux et al., 2011].
 One may roughly assess the effect of observational spatiotemporal correlations by artificially inflating the error variances of a diagonal R. Recall that the Fisher criterion is the limiting case of the DFS criterion with larger error variances in R, and the optimal grids under the two criteria were found to be stable. We conjecture that the optimal grids are less influenced by the introduction of observational spatiotemporal correlations than the inversion on regular grids.
4.5. Inversion Errors at Different Scales
 All the errors (RMSEs related to the Euclidean norm ∥σa − σt∥2) in Figures 5 and 7 are compared at the finest scale. This provides an equal basis for comparing all coarse-scale inversions, which helps to assess the aggregation errors. However, when inversions are performed on a coarser representation ω, it would be interesting to compare the inversion errors at the same coarse scale, that is, through quantities related to ∥σωa − σωt∥.
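 We evaluate the inversion errors at different scales by decomposing the errors at the finest scale, measured in the $B^{-1}$-norm, into two parts: one for the variability at the finest scale, and the other for coarser variations. In fact, substituting equations (16) or (17) into $\|\sigma^a - \sigma^t\|^2_{B^{-1}}$, we have
$$\|\sigma^a - \sigma^t\|^2_{B^{-1}} = \left\|(I - \Pi_\omega)(\sigma^b - \sigma^t) + \Lambda^*_\omega(\sigma^a_\omega - \sigma^t_\omega)\right\|^2_{B^{-1}}.$$
Let $\gamma^f = (I - \Pi_\omega)(\sigma^b - \sigma^t)$ and let $\gamma^c = \Lambda^*_\omega(\sigma^a_\omega - \sigma^t_\omega)$. One can verify that $\gamma^f$ and $\gamma^c$ are $B^{-1}$-orthogonal, that is, $(\gamma^f)^T B^{-1} \gamma^c = 0$. Therefore we have
$$\|\sigma^a - \sigma^t\|^2_{B^{-1}} = \|\gamma^f\|^2_{B^{-1}} + \|\gamma^c\|^2_{B^{-1}}.$$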
Here γf characterizes the variability at the finest scale, and γc is related to the variation σωa − σωt at coarse scales.
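The orthogonality of the fine-scale and coarse-scale components can be checked numerically. The sketch below assumes the prolongation operator Λ*ω = BΓᵀ(ΓBΓᵀ)⁻¹ and the projector Πω = Λ*ωΓ, with Γ a hypothetical pairwise-averaging operator; these definitions are not restated in this section, and the random vectors merely stand in for σb − σt and σωa − σωt.

```python
import numpy as np

rng = np.random.default_rng(4)
n_fine = 16
n_coarse = n_fine // 2

# Random symmetric positive definite B (toy background covariance).
A = rng.standard_normal((n_fine, n_fine))
B = A @ A.T + n_fine * np.eye(n_fine)
B_inv = np.linalg.inv(B)

# Gamma averages pairs of adjacent fine cells (hypothetical aggregation).
Gamma = np.zeros((n_coarse, n_fine))
for k in range(n_coarse):
    Gamma[k, 2 * k:2 * k + 2] = 0.5

Lambda_star = B @ Gamma.T @ np.linalg.inv(Gamma @ B @ Gamma.T)
Pi_w = Lambda_star @ Gamma

# Stand-ins for the background error and the coarse-scale analysis error.
background_error = rng.standard_normal(n_fine)    # plays the role of sigma_b - sigma_t
coarse_error = rng.standard_normal(n_coarse)      # plays the role of sigma_w^a - sigma_w^t

gamma_f = (np.eye(n_fine) - Pi_w) @ background_error
gamma_c = Lambda_star @ coarse_error

def sq_norm_Binv(v):
    """Squared B^{-1}-norm of a vector."""
    return v @ B_inv @ v

cross_term = gamma_f @ B_inv @ gamma_c
total = sq_norm_Binv(gamma_f + gamma_c)
parts = sq_norm_Binv(gamma_f) + sq_norm_Binv(gamma_c)
print(cross_term, total, parts)
```

With an exact B the cross term vanishes to rounding error, so the two squared norms add up to the squared norm of the total error.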
 We compare the B−1-norm of σb − σt, σa − σt, γf and γc in Figure 9. Because the inversion errors are normalized by B, the improvements in B−1-norm are less significant (especially for the case of diagonal B) than those in RMSE in Figures 5 and 7. When the number of grid cells is small, the inversion error is mainly due to the failure of coarser representations to account for the variability γf at the finest scale. As the number of grid cells increases, γf decreases since finer grid cells are used, and γc increases because there are more grid cells to be compared in the term σωa − σωt.
 In general, the inversion error for optimal grids has a greater γf but smaller γc and σa − σt compared with those for regular grids with the same total number of grid cells. The sum of $\|\gamma^f\|^2_{B^{-1}}$ and $\|\gamma^c\|^2_{B^{-1}}$ is slightly greater than $\|\sigma^a - \sigma^t\|^2_{B^{-1}}$, because the approximation of the square root of B in inversions slightly breaks the $B^{-1}$-orthogonality between γf and γc. In summary, the RMSEs in Figures 5 and 7 are related to $\|\sigma^a - \sigma^t\|_{B^{-1}}$, which combines the variations in errors at the finest and coarser scales. Therefore they are reasonable criteria for evaluating inversion performance. Although it is not possible to compute σωa − σωt directly from those RMSEs, one can assess the variations at coarse scales γc, which typically follow the curves plotted in Figure 9. For the case of diagonal B, the optimization algorithm is less efficient and produces a zigzag pattern for very small numbers of grid cells in Figure 9a.
 We have implemented a consistent Bayesian formalism to construct optimal multiscale space-time representation of carbon fluxes for mesoscale inversions under an information criterion: the number of degrees of freedom for the signal (DFS). This methodology has been tested using synthetic CO2 concentration data in the context of the Ring 2 experiment in support of the North American Carbon Program Mid Continent Intensive.
 The DFS, ranging from 0 to the number of observations d, measures the information gain from observations to resolve the unknown fluxes. It has been found that, for continuous hourly concentration observations from the Ring 2 network of eight towers with realistic observational errors, in general only a small fraction of the observations is effectively assimilated (DFS/d < 20%). By contrast, the root mean square errors (RMSE) of inverted fluxes measure the residual uncertainty after analysis.
 In the absence of correlations in the errors, there are many more degrees of freedom of the system to be resolved compared to the relatively small DFS value. Therefore, one observes large residual uncertainties (small improvements in RMSE). In general, either aggregating flux variables or introducing correlations in background errors is needed to reduce the effective degrees of freedom of the system. In this multiscale setting, we have formulated the scale-dependent aggregation errors explicitly. The inversion performance has been systematically improved by explicitly taking the aggregation error into account. The multiscale representation of carbon fluxes allows the aggregation errors to be mitigated adaptively. This is justified by the better performance of the optimal grids compared with the regular grids in the inversions that do not account for the aggregation error.
 The optimal multiscale representations have been found to have far fewer grid cells (e.g. 25% of the total grid cells), but can keep most of the DFS value (e.g. 94%). This enables more efficient inversions, since far fewer flux variables are to be inverted. Equally important, the optimal multiscale representations characterize how information from observations can be optimally spread to the whole domain. For instance, in the absence of correlations in background errors, the information from observations has little impact on regions distant from observation sites. Consequently the optimal representations are dense around observation sites but very sparse in distant regions. By contrast, when correlations in background errors are introduced, the optimal multiscale representations are more uniformly distributed, which results from the balance between atmospheric transport and the imposed aggregation effect.
 The correlations in background errors have been shown to be crucial for carbon inversions. Failure in the specification of realistic correlations leads to significant aggregation errors. In this case, scale-dependent aggregation errors should be formulated explicitly for more reliable carbon inversions.
 In this study, transport errors were not taken into account in the formulation of the representativity error. With finer spatial scales, the aggregation errors are expected to decrease since fewer regions are aggregated. Nevertheless, the smoothing of carbon fluxes by the complex mesoscale atmospheric transport makes it more difficult to retrieve flux variations from concentration observations. Therefore the estimation errors are expected to increase [Peylin et al., 2001; Enting, 2002]. When considering scale-dependent transport errors, the balance between the aggregation and estimation errors may result in an optimal multiscale representation with an inversion performance better than that of the regular grid at finest scale [Bocquet et al., 2011]. Moreover, the transport model error probably induces spatiotemporal correlations in the observational error. We will examine its impact on the representation optimization in detail in subsequent studies.
 Further in-depth investigations on realistic correlations in background errors are needed for reliable inversions. More sources of information may be exploited, e.g. the carbon flux observations, simulations of diverse vegetation models, and different correlation parameterizations. Non-stationary correlations may be helpful to incorporate additional constraints (e.g. ecosystem information). Clues on correlations may also be inferred through leave-one-out cross validations, in which we compare the concentration observations from one excluded tower and the simulated concentrations for this tower using observations from other towers (see Pickett-Heaps et al.  for an example of validation studies).
 This study will be extended to real concentration observations. This means that the boundary and initial conditions will also be included in the inversion. The methodology for multiscale analysis in this paper is not limited to spatial aggregations. We can deal with multiscale temporal aggregations within the same framework. Inversions with finer time scales will be a future subject, which needs temporal correlations in background errors [Gourdji et al., 2010]. The optimal multiscale representations depend on the meteorological scenarios. Experiments during a longer time period, e.g. several months, could be performed. Statistics on the resulting optimal representations will help to set up fixed multiscale grids for practical carbon flux inversions.
 This paper is a contribution to the MSDAG project supported by the Agence Nationale de la Recherche, grant ANR-08-SYSC-014. The authors would like to thank the reviewers for their substantial suggestions that helped to improve the manuscript.