## 1. Introduction

[2] Satellite missions, such as the Orbiting Carbon Observatory (OCO) and the Greenhouse Gases Observing Satellite (GOSAT), will provide global data on the column-averaged CO_{2} dry-air mole fraction (X_{CO2}) at high spatial resolution. These data will be used in inverse modeling studies to improve the precision and resolution of current estimates of global carbon budgets [*Rayner and O'Brien*, 2001; *Houweling et al.*, 2004; *Chevallier et al.*, 2007]. The amount of information that satellite retrievals contribute towards improving CO_{2} flux estimates will depend on their error characteristics; therefore, an accurate evaluation of the error statistics of retrieved soundings is central to obtaining reliable estimates of CO_{2} sources and sinks and their associated uncertainties [*Chevallier et al.*, 2007; *Engelen et al.*, 2002].

[3] In inverse modeling studies, observation errors (also termed model-data mismatch) are a combination of: (1) measurement errors arising from the satellite instrument and from approximations or errors in the retrieval algorithm, (2) transport model errors due to modeling simplifications and the uncertainties of model parameters, (3) aggregation errors caused by estimating CO_{2} fluxes at temporal and spatial resolutions coarser than those of the transport model, and (4) representation errors due to the resolution mismatch between observations and model gridcells [*Enting*, 2002; *Engelen et al.*, 2002; *Michalak et al.*, 2005]. Representation errors are attributed to the inability of atmospheric transport models to resolve the spatial and temporal variations captured by CO_{2} observations, due to the low spatial and temporal resolution of the models relative to that of the measurements [*Engelen et al.*, 2002; *Gerbig et al.*, 2003]. In theory, the concentration value assigned to a model gridcell should be equal to the true X_{CO2} mean over the area of the gridcell and over the model time step. In reality, the true mean is not known and is instead estimated from the satellite retrievals. The representation error is therefore equal to the uncertainty associated with the inferred gridcell mean, given the satellite retrievals over the gridcell, and is a function of X_{CO2} variability over the sampled region. For example, sparse retrievals over a gridcell located in a region with high X_{CO2} variability will be less likely to capture the true mean for that gridcell, and will have a higher representation error.

[4] A number of studies have evaluated the representation error of observations used in inverse modeling studies. *Rödenbeck et al.* [2003] approximated representation errors using the standard deviation of simulated CO_{2} concentrations in the gridcells surrounding a measurement location. Although this approximation provides a measure of simulated CO_{2} variability at the gridcell resolution in the region of a measurement, it does not evaluate how well a measurement represents the mean CO_{2} concentration within a gridcell.

[5] *van der Molen and Dolman* [2007] studied the representation error of CO_{2} measurements based on model simulations. Their analysis showed that representation errors increase with CO_{2} variability. The study quantified these errors empirically as the average standard deviation of simulated CO_{2} fields within different radii of measurement locations. This approach, however, requires knowledge of the entire sampled distribution (e.g., X_{CO2} over the whole gridcell) and does not evaluate the representativeness of multiple measurements within a given gridcell.

[6] *Gerbig et al.* [2003] and *Lin et al.* [2004] evaluated the spatial covariance of partial CO_{2} columns using aircraft measurements. The studies used the evaluated spatial covariance to statistically generate simulated fields with a similar spatial covariance at fine spatial resolution. The simulated fields were divided into subareas used to represent model gridcells. The representation error was then evaluated as the average standard deviation of the simulated values within each model gridcell. This evaluation reflects the variance of the potential retrievals within a model gridcell, but does not represent the uncertainty in estimating the gridcell mean given multiple measurements within each gridcell.
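The within-gridcell statistic described above can be illustrated with a minimal sketch. It assumes a fine-resolution field is already available as a 2-D array and that each model gridcell is a square block of pixels; the function name `mean_within_block_std` and the `block` parameter are hypothetical and are not taken from the cited studies, which also generated the fields themselves from an estimated covariance.

```python
import numpy as np

def mean_within_block_std(field, block):
    """Average, over all blocks, of the standard deviation inside each block.

    field : 2-D array of fine-resolution values (e.g., simulated CO2)
    block : number of fine pixels along one side of a model gridcell
            (hypothetical parameter; square gridcells are an assumption)
    """
    field = np.asarray(field, dtype=float)
    ny_b, nx_b = field.shape[0] // block, field.shape[1] // block
    # Drop any partial blocks along the edges.
    trimmed = field[:ny_b * block, :nx_b * block]
    # Regroup pixels so that the last axis holds all values of one block.
    blocks = (trimmed.reshape(ny_b, block, nx_b, block)
                     .swapaxes(1, 2)
                     .reshape(ny_b, nx_b, block * block))
    return float(blocks.std(axis=-1).mean())
```

The returned value depends only on the spread of values inside each block; it is unchanged whether a block contains one retrieval or many, which is the limitation noted in the text.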

[7] In the context of satellite data, a number of studies have evaluated the representation error as the within-gridcell X_{CO2} variance (or sampling variance). *Corbin et al.* [2008] and *Miller et al.* [2007] evaluated representation errors empirically based on high-resolution X_{CO2} simulations. *Miller et al.* [2007] sampled model gridcells according to a North-South swath, and assumed that the representation error is equal to the difference between the true simulated gridcell mean and the sample mean. *Corbin et al.* [2008] extended this approach to include temporal variability and the effect of clouds, by excluding cloudy pixels from the sampled North-South swaths. Both studies calculated the swath means for all possible swath locations within model gridcells, and used the statistics of the resulting swath mean distribution. *Corbin et al.* [2008] subtracted the known simulated gridcell means from these distributions, and used the standard deviation of the residuals as an estimate of the representation error. The methods presented in these two studies cannot be reproduced for actual satellite sampling conditions, however, because the true gridcell means are unknown.
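As a rough illustration of the swath-based evaluation described above, the following sketch slides a North-South swath across a simulated gridcell, optionally excludes cloudy pixels, and returns the standard deviation of the residuals between swath means and the gridcell mean. The function name, the pixel-based swath width, and the boolean cloud mask are assumptions made for illustration; the cited studies' actual sampling configurations are not reproduced here.

```python
import numpy as np

def swath_representation_error(field, swath_width, cloud_mask=None):
    """Std. dev. of (swath mean - gridcell mean) over all swath positions.

    field       : 2-D array, rows run north-south, columns east-west
    swath_width : swath width in pixels (hypothetical parameter)
    cloud_mask  : optional boolean array, True where a pixel is cloudy
    """
    field = np.asarray(field, dtype=float)
    if cloud_mask is None:
        cloud_mask = np.zeros(field.shape, dtype=bool)
    # The gridcell mean is known here only because the field is simulated.
    true_mean = field.mean()
    residuals = []
    for start in range(field.shape[1] - swath_width + 1):
        cols = slice(start, start + swath_width)
        clear = ~cloud_mask[:, cols]
        if clear.any():  # skip swaths that are fully cloud covered
            residuals.append(field[:, cols][clear].mean() - true_mean)
    if not residuals:
        return float("nan")
    return float(np.std(residuals))
```

The dependence on `true_mean` makes the limitation explicit: with real retrievals that quantity is unavailable, so the residuals cannot be formed.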

[8] This study introduces a statistical method for evaluating the representation errors associated with using satellite retrievals to represent the mean X_{CO2} within atmospheric transport model gridcells. The proposed method is based on: (1) the spatial distribution of satellite retrievals within a model gridcell, and (2) knowledge of the degree of X_{CO2} variability in the vicinity of the model gridcell. The proposed method uses a geostatistical evaluation of the X_{CO2} variability to quantify the spatial covariance between any two satellite retrievals as a function of their separation distance. This spatial covariance function can be inferred from available in situ data, X_{CO2} model simulations, or potentially from the satellite retrievals themselves. By combining this covariance function with the known retrieval locations, the method evaluates representation errors in a way that: (1) reflects the amount of information provided by available retrievals about the true unknown gridcell mean, and (2) does not require knowledge of the true value of that mean. The method is demonstrated using the regional spatial covariance statistics derived by *Alkhaled et al.* [2008] using modeled X_{CO2}, together with assumed spatial distributions of satellite retrievals within hypothetical model gridcells.
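To make the idea concrete, the sketch below computes the variance of the error made when the equally weighted average of the retrievals is used to estimate the gridcell mean, given only the retrieval locations and a covariance function. This is the standard geostatistical estimation variance and is offered as one possible instance of such a calculation, not necessarily the formulation developed in this study. The exponential covariance model, the square gridcell, the equal weighting, and all names and parameter values (`sill`, `corr_len`, `n_grid`) are assumptions made for illustration.

```python
import numpy as np

def exp_cov(h, sill, corr_len):
    """Exponential covariance as a function of separation distance h (assumed model)."""
    return sill * np.exp(-h / corr_len)

def pairwise_dist(a, b):
    """Euclidean distances between every point in a and every point in b."""
    return np.linalg.norm(a[:, None, :] - b[None, :, :], axis=-1)

def mean_error_variance(obs_xy, cell_size, sill, corr_len, n_grid=20):
    """Variance of (average of retrievals - true gridcell mean).

    obs_xy    : (n, 2) retrieval coordinates inside a square gridcell
    cell_size : side length of the gridcell, same units as obs_xy
    sill, corr_len : covariance parameters (hypothetical values supplied by caller)
    n_grid    : points per side used to discretize the gridcell area
    """
    obs_xy = np.atleast_2d(np.asarray(obs_xy, dtype=float))
    # Approximate the gridcell area by a regular mesh of points.
    g = (np.arange(n_grid) + 0.5) * cell_size / n_grid
    cell_xy = np.array([(x, y) for x in g for y in g])
    c_oo = exp_cov(pairwise_dist(obs_xy, obs_xy), sill, corr_len).mean()
    c_ob = exp_cov(pairwise_dist(obs_xy, cell_xy), sill, corr_len).mean()
    c_bb = exp_cov(pairwise_dist(cell_xy, cell_xy), sill, corr_len).mean()
    # Var(obs mean) - 2 Cov(obs mean, cell mean) + Var(cell mean)
    return c_oo - 2.0 * c_ob + c_bb
```

A call such as `mean_error_variance([(10.0, 50.0), (90.0, 50.0)], cell_size=100.0, sill=1.0, corr_len=200.0)` returns a variance in the units of `sill`. No true gridcell mean enters the calculation, and the result changes with both the number and the placement of the retrievals.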