Satellite measurements of column-averaged CO2 dry-air mole fraction (XCO2) will be used in inversion and data assimilation studies to improve the precision and resolution of current estimates of global fluxes of CO2. Representation errors due to the mismatch in spatial scale between satellite retrievals and atmospheric transport models contribute to the uncertainty associated with flux estimates. This study presents a statistical method for quantifying representation errors as a function of the underlying spatial variability of XCO2 and the spatial distribution of retrieved soundings, without knowledge of the true XCO2 distribution within model gridcells. Representation errors are quantified globally using regional XCO2 spatial variability inferred using the PCTM/GEOS-4 model and a hypothetical atmospheric transport model with 1° × 1° resolution, 3 km² retrieval footprints, and two different sounding densities.
 Satellite missions, such as the Orbiting Carbon Observatory (OCO) and the Greenhouse Gases Observing Satellite (GOSAT), will provide global data of column-averaged CO2 dry-air mole fraction (XCO2) at high spatial resolutions. These data will be used in inverse modeling studies to improve the precision and resolution of current estimates of global carbon budgets [Rayner and O'Brien, 2001; Houweling et al., 2004; Chevallier et al., 2007]. The amount of information that satellite retrievals contribute towards improving CO2 flux estimates will depend on their error characteristics; therefore, an accurate evaluation of the error statistics of retrieved soundings is central to providing accurate estimates of CO2 sources and sinks and their associated uncertainties [Chevallier et al., 2007; Engelen et al., 2002].
 In inverse modeling studies, observation errors (a.k.a. model-data mismatch) are a combination of: (1) measurement errors due to the satellite instrument, and any approximations or errors in the retrieval algorithm, (2) transport model errors due to modeling simplifications and the uncertainties of model parameters, (3) aggregation errors caused by estimating CO2 fluxes at temporal and spatial resolutions coarser than the transport model, and (4) representation errors due to the resolution mismatch between observations and model gridcells [Enting, 2002; Engelen et al., 2002; Michalak et al., 2005]. Representation errors are attributed to the inability of atmospheric transport models to resolve the spatial and temporal variations captured by CO2 observations, due to the low spatial and temporal resolution of the models relative to that of measurements [Engelen et al., 2002; Gerbig et al., 2003]. In theory, the concentration value assigned to a model gridcell should be equal to the true XCO2 mean over the area of the gridcell and during the model time-step. In reality, the true mean is not known and is instead estimated from the satellite retrievals. The representation error is therefore equal to the uncertainty associated with the inferred gridcell mean, given the satellite retrievals over the gridcell, and is a function of XCO2 variability over the sampled region. For example, sparse retrievals over a gridcell located in a region with high XCO2 variability will be less likely to capture the true mean for that gridcell, and will have higher representation error.
 A number of studies have provided an evaluation of the representation error of observations used in inverse modeling studies. Rödenbeck et al.  approximated representation errors using the standard deviation of simulated CO2 concentrations of gridcells surrounding a measurement location. Although this approximation provides a measure of simulated CO2 variability at the gridcell resolution in the region of a measurement, it does not evaluate the representativeness of a measurement of the mean CO2 concentration within a gridcell.
van der Molen and Dolman  studied the representation error of measurements of CO2 based on model simulations. Their analysis showed that representation errors increase with CO2 variability. The study quantified these errors empirically as the average standard deviation of simulated CO2 fields within different radii of measurement locations. This approach, however, requires knowledge of the entire sampled distribution (e.g. XCO2 over a gridcell) and does not evaluate the representativeness of multiple measurements within a given gridcell.
Gerbig et al.  and Lin et al.  evaluated the spatial covariance of partial CO2 columns using aircraft measurements. The studies used the evaluated spatial covariance to statistically generate simulated fields with a similar spatial covariance at small spatial resolutions. The simulated fields were divided into subareas used to represent model gridcells. The representation error was then evaluated as the average standard deviation of the simulated values within each model gridcell. This evaluation reflects the variance of the potential retrievals within a model gridcell, but does not represent the uncertainty in estimating the gridcell mean given multiple measurements within each gridcell.
 In the context of satellite data, a number of studies have evaluated the representation error as the within-gridcell XCO2 variance (or sampling variance). Corbin et al.  and Miller et al.  evaluated representation errors empirically based on high-resolution XCO2 simulations. Miller et al.  sampled model gridcells according to a North-South swath, and assumed that the representation error is equal to the difference between the true simulated gridcell mean and the sample mean. Corbin et al.  extended this approach to include temporal variability and the effect of clouds, by excluding cloudy pixels from the sampled North-South swaths. Both studies calculated the swath means of all possible swath locations within model gridcells, and used the statistics of the resulting swath mean distribution. Corbin et al.  subtracted the known simulated gridcell means from these distributions, and used the standard deviation of the residuals as an estimate of the representation error. The methods presented in these two studies cannot be reproduced for actual satellite sampling conditions, however, because the true gridcell means are unknown.
 This study introduces a statistical method for evaluating the representation errors associated with using satellite retrievals to represent the mean XCO2 within atmospheric transport model gridcells. The proposed method is based on: (1) the spatial distribution of satellite retrievals within a model gridcell, and (2) knowledge of the degree of XCO2 variability in the vicinity of the model gridcell. The proposed method uses a geostatistical evaluation of the XCO2 variability to quantify the spatial covariance between any two satellite retrievals as a function of their separation distance. This spatial covariance function can be inferred from available in situ data, XCO2 model simulations, or potentially from the satellite retrievals themselves. Together with known retrieval locations, the method evaluates representation errors in a way that: (1) reflects the amount of information provided by available retrievals about the true unknown gridcell mean, and (2) does not require knowledge of the true value of that mean. The method is demonstrated using the regional spatial covariance statistics derived by Alkhaled et al.  using modeled XCO2, together with assumed spatial distributions of satellite retrievals within hypothetical model gridcells.
2. Data and Methods
 When XCO2 measurements are used as observations within a model, the XCO2 value assigned to a given gridcell is intended to represent the true mean of the XCO2 distribution within that gridcell. In reality, however, individual OCO XCO2 soundings will have a much smaller footprint relative to a typical atmospheric transport model gridcell, and these soundings will not sample the full area of gridcells. Therefore, statistically, the representation error is the uncertainty associated with inferring the mean XCO2 for a given gridcell using retrieved soundings. The proposed method evaluates representation errors using block kriging [e.g., Chilès and Delfiner, 1999], a spatial estimation method that uses the spatial covariance information of XCO2 over sampled regions together with information about the locations of retrieved soundings to quantify the uncertainty associated with the mean XCO2 within each model gridcell (i.e. representation error σRE).
 To construct the block kriging system, each model gridcell is divided into m pixels with areas equal to the satellite sounding footprint (e.g. 3 km² for OCO). The retrievals are assumed to be an n × 1 vector of noisy samples z taken at locations x of a random spatial process y representing the XCO2 distribution within the gridcell at the resolution of satellite soundings:
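In symbols, and writing ε for the n × 1 vector of retrieval measurement errors (a label assumed here for illustration), a sampling model consistent with this description would be:

```latex
\[
  \mathbf{z} = \mathbf{y}(\mathbf{x}) + \boldsymbol{\varepsilon} \tag{1}
\]
```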
 The retrieval measurement errors have an n × n covariance matrix R, which can be diagonal if the errors are assumed to be uncorrelated, or can have off-diagonal elements to represent spatially-correlated retrieval errors. The XCO2 distribution within the gridcell at the resolution of satellite soundings (y) is assumed to have a mean E[y] = Xβ, where X is a matrix of covariate values at the sampling locations, β is a vector of coefficients, and E[.] is the expectation operator. For the current application, the spatial mean (E[y]) within each gridcell is assumed constant (although it can vary between gridcells); therefore, X is an m × 1 vector of ones and β is an unknown large-scale mean. y is also described using an m × m spatial covariance matrix Q = E[(y − Xβ) (y − Xβ)ᵀ]. Each element of the covariance matrix (Qij) is calculated based on the regional spatial covariance and the separation distances (hij) between the gridcell pixels. For example, for an exponential covariance structure, the elements of Q will have the form [e.g., Chilès and Delfiner, 1999]:
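Written out for the standard exponential model, and using the regional variance and range parameter named in the sentence that follows, this form would read (a reconstruction from the stated definitions):

```latex
\[
  Q_{ij} = \sigma_{reg}^{2}\,\exp\!\left(-\frac{h_{ij}}{L_{reg}}\right) \tag{2}
\]
```

At a separation of 3Lreg the correlation is exp(−3), about 0.05, which agrees with the correlation length quoted in the text.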
where σ²reg and Lreg represent the regional XCO2 variance and range parameter, respectively, and where the distance beyond which the correlation between any two XCO2 measurements approaches zero (i.e. the correlation length) is 3Lreg.
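As a minimal sketch of how such a covariance matrix could be populated, the following NumPy function evaluates an exponential covariance between pixel centres. The function name, the planar (x, y) coordinates in kilometres, and the numerical values are illustrative assumptions, not taken from the study.

```python
import numpy as np

def exponential_covariance(coords_km, sigma2_reg, l_reg_km):
    """Return the m x m matrix Q with Q_ij = sigma2_reg * exp(-h_ij / l_reg_km).

    coords_km : (m, 2) array of pixel-centre coordinates in km (planar assumption).
    sigma2_reg: regional XCO2 variance (ppm^2).
    l_reg_km  : range parameter (km); the correlation length is 3 * l_reg_km.
    """
    coords_km = np.asarray(coords_km, dtype=float)
    diff = coords_km[:, None, :] - coords_km[None, :, :]
    h = np.sqrt((diff ** 2).sum(axis=-1))      # pairwise separation distances h_ij
    return sigma2_reg * np.exp(-h / l_reg_km)

# Hypothetical example: three pixels, unit variance, 300 km range parameter
coords = np.array([[0.0, 0.0], [300.0, 0.0], [0.0, 900.0]])
Q = exponential_covariance(coords, sigma2_reg=1.0, l_reg_km=300.0)
```

In this example the second pixel lies one range parameter from the first and the third lies three range parameters away, so the third is nearly uncorrelated with the first.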
 The uncertainty associated with the estimated XCO2 distribution within a gridcell at the resolution of the satellite soundings can be quantified by solving the following kriging system:
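A form of this system that matches the dimensions given in the following paragraph is sketched here, assuming ordinary kriging with a constant unknown mean and noisy data. It is reconstructed from the stated definitions of S, Q, R, X, Λ and M:

```latex
\[
  \begin{bmatrix}
    S Q S^{T} + R & S X \\
    (S X)^{T}     & 0
  \end{bmatrix}
  \begin{bmatrix}
    \Lambda^{T} \\ M
  \end{bmatrix}
  =
  \begin{bmatrix}
    S Q \\ X^{T}
  \end{bmatrix} \tag{3}
\]
```

The first block row minimizes the estimation variance at each pixel, and the second enforces unbiasedness (ΛSX = X).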
where S is an n × m indicator matrix of zeros and ones, with each row of S corresponding to a single satellite retrieval, and a one indicating the location of the sampled pixel. Equation 3 is solved for an m × n matrix of coefficients Λ and a 1 × m vector of Lagrange multipliers M. The Λ's represent the weighting that each of the n retrieved soundings receives in estimating the XCO2 value at each of the m locations within the gridcell, and M represents the additional uncertainty resulting from the fact that the mean of the spatial process y is assumed unknown.
 The evaluated parameters (Λ and M) define the m × m covariance matrix (V) of the uncertainties of the XCO2 signal at the resolution of the retrievals within each gridcell:
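Assuming Λ and M satisfy the ordinary kriging conditions (SQSᵀ + R)Λᵀ + SXM = SQ and ΛSX = X (a reconstruction from the stated dimensions), expanding the covariance of the estimation error Λz − y gives:

```latex
\[
  V = \Lambda\,(S Q S^{T} + R)\,\Lambda^{T} - \Lambda S Q - Q S^{T}\Lambda^{T} + Q
    = Q - Q S^{T}\Lambda^{T} - X M \tag{4}
\]
```

The second equality follows by substituting (SQSᵀ + R)Λᵀ = SQ − SXM and then ΛSX = X.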
The representation error, which is equal to the uncertainty associated with the estimated average (or block) XCO2 within each gridcell, is evaluated by aggregating V as:
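With V the m × m error covariance and all m pixels weighted equally in the gridcell average (an assumption consistent with equal-area pixels), this aggregation can be written as:

```latex
\[
  \sigma_{RE}^{2} = \frac{1}{m^{2}}\,\mathbf{1}_{m}^{T}\, V\, \mathbf{1}_{m} \tag{5}
\]
```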
where 1m is an m × 1 vector of ones. This estimated representation error, expressed as a variance, takes explicit account of both the spatial covariance structure of XCO2 (Q), and the physical distribution (and redundancy) of retrievals within each gridcell.
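The steps above can be sketched numerically as follows. This is a minimal illustration under stated assumptions: ordinary block kriging with a constant unknown mean, equal pixel weights, and a dense linear solve. The function name, the 5 × 5 pixel layout, and the parameter values are hypothetical and are not the configuration used in the study.

```python
import numpy as np

def representation_error_variance(Q, sample_idx, R=None):
    """Block-kriging variance (ppm^2) of the gridcell mean.

    Q          : (m, m) covariance of XCO2 between gridcell pixels.
    sample_idx : indices of the n sampled pixels (assumed distinct).
    R          : (n, n) measurement-error covariance; zeros if None.
    """
    Q = np.asarray(Q, dtype=float)
    m = Q.shape[0]
    sample_idx = np.asarray(sample_idx, dtype=int)
    n = sample_idx.size
    S = np.zeros((n, m))                     # indicator matrix of sampled pixels
    S[np.arange(n), sample_idx] = 1.0
    X = np.ones((m, 1))                      # constant-mean design vector
    if R is None:
        R = np.zeros((n, n))
    SX = S @ X
    A = np.block([[S @ Q @ S.T + R, SX],
                  [SX.T, np.zeros((1, 1))]])
    b = np.vstack([S @ Q, X.T])
    sol = np.linalg.solve(A, b)              # stacked [Lambda^T; M]
    lam_T, M = sol[:n, :], sol[n:, :]
    V = Q - Q @ S.T @ lam_T - X @ M          # pixel-scale error covariance
    ones = np.ones(m)
    return float(ones @ V @ ones) / m ** 2   # aggregate to the block mean

# Hypothetical 5 x 5 pixel gridcell with 10 km pixel spacing
xx, yy = np.meshgrid(np.arange(5) * 10.0, np.arange(5) * 10.0)
coords = np.column_stack([xx.ravel(), yy.ravel()])
h = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
Q = 1.0 * np.exp(-h / 300.0)                 # sigma2_reg = 1 ppm^2, L_reg = 300 km

var_corner = representation_error_variance(Q, [0])                  # one corner pixel
var_swath = representation_error_variance(Q, [2, 7, 12, 17, 22])    # central column
```

Taking the square root of either variance gives the representation error in ppm. In this toy setup the central column yields a smaller value than the single corner pixel, and sampling every pixel with R = 0 drives the variance to zero.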
2.2. XCO2 Spatial Variability
 To implement the method described in section 2.1, the spatial covariance of XCO2 must be known. This covariance can be evaluated using aircraft measurements, or potentially satellite retrievals, in the geographic region of a gridcell. Alternatively, as will be presented in this study, the covariance can be approximated based on model simulations of the global XCO2 distribution. The spatial covariance information used in this study is based on work by Alkhaled et al. , where the spatial variability of pressure-averaged dry-air mole fractions (XCO2) was evaluated using simulations from the PCTM/GEOS-4 global chemistry and transport model run at a 2° latitude by 2.5° longitude resolution [Kawa et al., 2004], as well as a second global model, a finer-resolution regional model and aircraft measurements. Alkhaled et al.  evaluated the spatial variability of XCO2 as modeled by PCTM/GEOS-4 by fitting the exponential covariance parameters σ²reg and Lreg (Section 2.1) in regions surrounding each gridcell. These parameters are used here to populate the covariance matrix (Q) as shown in equation (2).
2.3. Model Gridcell and Sampling Conditions
 In addition to XCO2 variability over the sampled region, representation errors also depend on the satellite's retrieval footprint, the transport model resolution and the spatial distribution of retrievals within each gridcell.
 To demonstrate the proposed methodology, representation errors are quantified using a hypothetical transport model with 1° × 1° resolution and a 3 km² retrieval footprint. Representation errors are evaluated assuming two spatial distributions of retrievals within each model gridcell, which represent idealized and adverse sampling conditions (Figure 1): gridcells are sampled assuming (1) a full North–South swath of retrievals in the middle of each gridcell, and (2) a single satellite retrieval at the corner of each gridcell. For illustration, the two sampling conditions are applied to all model gridcells, even at locations that would not be sampled due to the satellite track. The dimensions of the swath are representative of the sampling design of OCO, with 8 soundings across each swath. Each sounding is assumed to measure 2.4 km in latitude by 1.25 km in longitude.
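The two sampling layouts can be sketched as index sets on a regular pixel grid. The helper names below are hypothetical, the gridcell is treated as a plane, and the 111 km side length is an assumed approximation for a 1° × 1° cell near the equator. Only the 2.4 km by 1.25 km footprint and the 8-sounding swath width come from the text.

```python
import numpy as np

def gridcell_pixels(cell_ns_km, cell_ew_km, pix_ns_km=2.4, pix_ew_km=1.25):
    """Tile a gridcell with whole footprints; return pixel centres and grid shape."""
    n_rows = int(cell_ns_km // pix_ns_km)    # pixels along the North-South direction
    n_cols = int(cell_ew_km // pix_ew_km)    # pixels along the East-West direction
    rows, cols = np.meshgrid(np.arange(n_rows), np.arange(n_cols), indexing="ij")
    coords = np.column_stack([(cols.ravel() + 0.5) * pix_ew_km,
                              (rows.ravel() + 0.5) * pix_ns_km])
    return coords, n_rows, n_cols

def swath_indices(n_rows, n_cols, swath_width=8):
    """Flat indices of a full North-South swath centred in the gridcell."""
    start = (n_cols - swath_width) // 2
    cols = np.arange(start, start + swath_width)
    return (np.arange(n_rows)[:, None] * n_cols + cols[None, :]).ravel()

def corner_index():
    """Flat index of a single retrieval in the south-west corner pixel."""
    return np.array([0])

# Assumed 111 km x 111 km gridcell (roughly 1 degree x 1 degree at the equator)
coords, n_rows, n_cols = gridcell_pixels(111.0, 111.0)
swath = swath_indices(n_rows, n_cols)
corner = corner_index()
```

Either index set can then be used to build the indicator matrix S. At this pixel size the covariance matrix Q is roughly 4000 × 4000, so a dense solve remains feasible for a single gridcell.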
 To analyze the effects of the factors specifically controlling representation errors, no measurement error is included in the presented analysis (R = 0). For actual satellite retrievals, however, an accurate evaluation of the representation errors using the proposed method requires satellite measurement errors to be incorporated in equation 3.
 Although the example used here makes specific assumptions about model setup and satellite retrievals, the method can accommodate any transport model resolution, retrieval footprint, and retrieval distribution.
3. Results and Discussion
 To demonstrate the effects of seasonal changes in XCO2 variability on representation errors, the presented method is applied for the months of January and July for both the swath and corner sampling described in Section 2.3. The regional XCO2 variances evaluated by Alkhaled et al.  range from 0.24 ppm² to 1.3 ppm² in January, and from 1.6 ppm² to 9 ppm² in July. The shortest observed correlation lengths were 700 km in January, and 1800 km in July. The corresponding representation errors are presented in Figure 2 and show that: (1) representation errors are high over regions with high XCO2 variability (see Figures 3, 4, and 5 of Alkhaled et al. ), and (2) adverse sampling conditions increase representation errors even over areas with low XCO2 variability.
 Seasonal changes in XCO2 variability cause the location of maximum representation errors to vary seasonally. During the Northern Hemisphere (NH) summer, high representation errors occur over East Asia, Eastern North America and extend over the Atlantic Ocean, due to CO2 variability caused by North American fluxes (Figures 2c and 2d). During the NH winter, high representation errors occur over the Tropics and East Asia (Figures 2a and 2b). Relatively high errors are also expected during the NH winter over Eastern North America. Representation errors are generally low over oceans and other continental areas.
Figure 2 also demonstrates that the impact of the number of retrievals and their distribution within a gridcell is comparable to the impact of differences in XCO2 variability. Figures 2a and 2c show that for one satellite retrieval located in the corner of a gridcell, the representation errors range from 0.07 ppm to 0.82 ppm during the NH winter and 0.14 ppm to 0.86 ppm during summer. For a complete satellite swath in the middle of each gridcell, Figures 2b and 2d show that the representation errors range up to 0.16 ppm during both the NH winter and summer.
 Results also show that representation errors are a function of gridcell area. The representation errors decrease for all model gridcells when moving from a single retrieval to a complete satellite swath, but this decrease is different for cells at the equator (large gridcell area) and for cells near the poles (small gridcell area).
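To make the dependence on gridcell area concrete, the following sketch computes the area of a latitude-longitude cell on a sphere. The spherical-Earth approximation, the mean radius value, and the function name are assumptions introduced here for illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation (assumption)

def gridcell_area_km2(lat_south_deg, dlat_deg=1.0, dlon_deg=1.0):
    """Area of a lat-lon cell whose southern edge is at lat_south_deg."""
    phi1 = math.radians(lat_south_deg)
    phi2 = math.radians(lat_south_deg + dlat_deg)
    return (EARTH_RADIUS_KM ** 2) * math.radians(dlon_deg) * (math.sin(phi2) - math.sin(phi1))

area_equator = gridcell_area_km2(0.0)    # 1 x 1 degree cell at the equator
area_60n = gridcell_area_km2(60.0)       # 1 x 1 degree cell at 60 degrees N
```

A 1° × 1° cell at 60° latitude covers roughly half the area of one at the equator, so a swath of fixed width samples a larger fraction of the high-latitude cell.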
 The presented method provides a flexible framework for accounting for the impact of geographic differences in XCO2 variability and differences in the spatial distributions of retrieved soundings within gridcells. As such, the results presented here can be compared to representation errors reported in previous studies for cases involving similar sampling conditions. For example, Miller et al.  estimated the representation error using XCO2 simulated by a regional model over North America (NA), as described in Section 1. Representation errors were calculated for 1 km and 16 km grid resolutions for domains of 38 km and 600 km, respectively. Errors were found to be approximately 0.18 ppm for both the coarse and fine resolutions. Using a similar gridcell retrieval distribution and similar XCO2 variability, the method presented in the current work produces similar results (0.12 ppm), as shown in Figure 2d over NA in July for a 10 km-wide swath and 3 km² sounding footprint. A possible reason for the difference is the small height (7.2 km) of the XCO2 column used by Miller et al.  (i.e. higher XCO2 variability) relative to the 48 km column used here. This comparison shows that for similar gridcell sampling conditions, region, and time, the presented method produces similar results, with the advantage that the actual gridcell mean need not be known to perform the analysis.
 Results can also be compared to those of Corbin et al.  over NA and South America (SA) under swath sampling conditions. Corbin et al.  evaluated the representation error for two model gridcell resolutions, 1 km and 5 km, and two grid sizes, 97 km for the fine resolution and 355 km to 450 km for the coarse resolution. For these regions, August representation errors range from 0.09 ppm to 0.19 ppm over NA (results not shown), which are comparable to the values reported by Corbin et al.  for the same month (0.06 ppm for the fine grid and 0.43 ppm for the coarse grid). Over SA, August representation errors range from 0.09 ppm to 0.20 ppm, which are also similar to the values of Corbin et al.  (0.21 ppm and 0.24 ppm for the fine and coarse grids, respectively). The advantage of the current method, however, is the ability to estimate representation errors without knowledge of all possible swath means over a gridcell, which is required by Corbin et al.  and will not be known for real satellite retrievals.
Gerbig et al.  and Lin et al.  evaluated the spatial covariance of aircraft XCO2 measurements over NA and the Pacific Ocean, and used these covariances to produce statistical realizations of XCO2 at two model gridcell resolutions (5 km and 50 km) and a range of gridcell sizes (up to 1000 km). As discussed in Section 1, the representation error was then assumed to equal the average standard deviation of XCO2 values within all possible gridcells of the domain. In the case of a single retrieval per gridcell, the uncertainty associated with the inferred gridcell mean is equivalent to the variance of XCO2 at the retrieval resolution. Therefore, the representation errors reported by Gerbig et al.  and Lin et al.  are comparable to representation errors under adverse sampling conditions. Despite the mismatch between the sample and gridcell areas, the representation errors reported by Gerbig et al.  and Lin et al.  (0.5 ppm for NA and 0.25 ppm for the Pacific Ocean, as approximated from Figure 3 of Lin et al. ) are comparable to the representation errors calculated here (Figures 2a and 2c), with the advantage that the approach presented here can accommodate any spatial distribution of samples within gridcells.
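The single-retrieval case can be worked out explicitly as a check on this comparison. The sketch assumes the block kriging formulation of Section 2.1 with R = 0, equal pixel weights, and one retrieval at pixel s (the index s is introduced here for illustration). The unbiasedness constraint then forces every pixel estimate to equal the single retrieval, and the block variance reduces to:

```latex
\[
  \sigma_{RE}^{2}
  = \frac{1}{m^{2}} \sum_{i=1}^{m} \sum_{j=1}^{m} Q_{ij}
    \;-\; \frac{2}{m} \sum_{i=1}^{m} Q_{is}
    \;+\; Q_{ss},
  \qquad Q_{ss} = \sigma_{reg}^{2}
\]
\[
  \frac{1}{m} \sum_{s=1}^{m} \sigma_{RE}^{2}(s)
  = \sigma_{reg}^{2} - \frac{1}{m^{2}} \sum_{i=1}^{m} \sum_{j=1}^{m} Q_{ij}
\]
```

The second line averages over all possible retrieval positions and equals the expected within-gridcell variance of XCO2 at the retrieval resolution. Under these assumptions, this is the sense in which a within-gridcell standard deviation is comparable to the single-retrieval representation error.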
 Representation errors occur due to the mismatch between the spatial footprint of XCO2 retrievals and the resolution of atmospheric transport models. The magnitude of these errors depends on the ability of retrieved soundings to capture the true XCO2 mean within model gridcells, which in turn depends on the number and spatial distribution of retrievals within model gridcells and the underlying spatial variability of XCO2 over the gridcell areas.
 This study introduces a geostatistical method for evaluating representation errors. Unlike previous studies, the method provides a statistical tool that quantifies grid-scale representation errors by linking the actual spatial distribution of retrievals within each gridcell and the regional XCO2 variability. The proposed method can evaluate errors associated with any model resolution and any satellite sounding footprint, and can account for uncorrelated or correlated measurement errors. The XCO2 variability can be estimated using modeled XCO2 distributions, as was presented here, or could be inferred from actual satellite retrievals. The method does not require knowledge of the XCO2 distribution within gridcells at the resolution of the satellite footprint.
 The presented method was applied using spatial covariance information from Alkhaled et al. , assuming a hypothetical model with 1° × 1° resolution and a sounding footprint representative of OCO soundings. Results show that representation errors vary spatially and temporally, as a function of seasonal and geographic changes in XCO2 variability, and the spatial distribution of satellite retrievals within each gridcell.
 Although this study focused on spatial representation errors, temporal XCO2 variability would also contribute to representation errors if retrievals taken across multiple days were used jointly to estimate XCO2 for a given location and time. Extending the presented method to include temporal variability will be the topic of future work.
 This research was partially performed for the Orbiting Carbon Observatory Project at Caltech-JPL, under a contract with NASA. Additional support was provided through NASA grant NNX08AJ92G, and the Kuwait University Scholarship Committee. The PCTM work was enabled by G. J. Collatz and Z. Zhu, and supported by NASA Carbon Cycle Science.