Precision requirements are determined for space-based column-averaged CO2 dry air mole fraction (XCO2) data. These requirements result from an assessment of spatial and temporal gradients in XCO2, the relationship between XCO2 precision and surface CO2 flux uncertainties inferred from inversions of the XCO2 data, and the effects of XCO2 biases on the fidelity of CO2 flux inversions. Observational system simulation experiments and synthesis inversion modeling demonstrate that the Orbiting Carbon Observatory mission design and sampling strategy provide the means to achieve these XCO2 data precision requirements.
Carbon dioxide (CO2) is a natural component of the Earth’s atmosphere and a strong greenhouse forcing agent. Concentrations of atmospheric CO2 fluctuated between 185 and 300 parts per million (ppm) over the last 500,000 years [Intergovernmental Panel on Climate Change (IPCC), 2001]. However, since the dawn of the industrial era 150 years ago, human activity (fossil fuel combustion, land use change, etc.) has driven atmospheric CO2 concentrations from 280 ppm to greater than 380 ppm. Such a dramatic short-term increase in atmospheric CO2 is unprecedented in the recent geologic record, prompting Crutzen to label the current era as the Anthropocene [Crutzen, 2002]. Better carbon cycle monitoring capabilities and insight into the underlying dynamics controlling atmosphere exchange with the land and ocean reservoirs are needed as society begins to discuss active management of the global carbon system [Dilling et al., 2003].
Data from the existing network of surface in situ CO2 measurement stations [GLOBALVIEW-CO2, 2005] indicate that the terrestrial biosphere and oceans have absorbed almost half of the anthropogenic CO2 emitted during the past 40 years. The nature, geographic distribution, and temporal variability of these CO2 sinks are not adequately understood, precluding accurate predictions of their responses to future climate change [Friedlingstein et al., 2006; Cox et al., 2000; Fung et al., 2005]. Inverse modeling of the surface in situ CO2 data [GLOBALVIEW-CO2, 2005] provides compelling evidence for a Northern Hemisphere terrestrial carbon sink, but the network is too sparse to quantify the distribution of the sink over the North American and Eurasian biospheres or to estimate fluxes over the Southern Ocean [Gurney et al., 2002, 2003, 2004, 2005; Law et al., 2003; Baker et al., 2006a]. Existing models and measurements also have difficulty explaining why the atmospheric CO2 accumulation varies from 1 to 7 gigatons of carbon (GtC) per year in response to steadily increasing emission rates.
Space-based remote sensing of atmospheric CO2 has the potential to deliver the data needed to resolve many of the uncertainties in the spatial and temporal variability of carbon sources and sinks. Several sensitivity studies have evaluated the improvement in carbon flux inversions that would be provided by precise, global space-based column CO2 data [Dufour and Breon, 2003; Houweling et al., 2004; Mao and Kawa, 2004; O’Brien and Rayner, 2002; Rayner et al., 2002; Rayner and O’Brien, 2001; Baker et al., 2006b]. The consensus of these studies is that satellite measurements yielding the column-averaged CO2 dry air mole fraction (XCO2), with bias-free precisions in the range of 1–10 ppm (0.3–3.0%), will reduce uncertainties in CO2 sources and sinks due to uniform and dense global sampling. As Houweling et al. [2004] demonstrated, the precision requirements for space-based data vary depending on the spatial and temporal resolution of the data and the spatiotemporal scale of the surface flux inversion. Clearly, the highest precision data (for example, 1 ppm or better) would best address the largest number of carbon cycle science questions. However, this requirement must be balanced against the significant technical challenges of delivering a satellite measurement/retrieval/validation system that can produce bias-free, sub-1% precision data. The space-based data must also be accurately calibrated to the WMO reference scale for atmospheric CO2 measurements so that they can be ingested simultaneously with suborbital data in synthesis inversion or data assimilation schemes without producing spurious fluxes.
 The Orbiting Carbon Observatory (OCO) was selected by NASA’s Earth System Science Pathfinder (ESSP) program in July 2002 to deliver space-based data products with the precision, temporal and spatial resolution, and coverage needed to characterize the variability of CO2 sources and sinks on regional spatial scales and seasonal to interannual timescales [Crisp et al., 2004]. The mission is designed for a 2-year operational period with launch scheduled for 2008, the first year of the Kyoto Protocol commitment period. OCO will join the EOS Afternoon Constellation (A-Train), flying in a sun-synchronous polar orbit with a constant 1:26 p.m. local solar time (1326 LST) flyover, a 16-day (233 orbit) repeat cycle and near global sampling.
 The OCO science team analyzed a broad range of measurement and modeling data to define the science requirements for space-based data precision. The products of this investigation address two fundamental questions:
 1. What precision does the OCO data product need to improve our understanding of CO2 surface fluxes (sources and sinks) significantly?
 2. Does the measurement/retrieval/validation approach adopted in the OCO mission design provide the needed data precision?
This paper analyzes atmospheric CO2 observations and modeling studies of CO2 sources and sinks to derive the science requirements for space-based data precision (question 1). Analyses of space-based and suborbital measurements, as well as the development and validation of retrieval algorithms demonstrating the potential of the OCO mission design to achieve the required precision (question 2), are the subjects of recent [Kuang et al., 2002; Boesch et al., Space-based near-infrared CO2 retrievals: Testing the OCO retrieval and validation concept using SCIAMACHY measurements over Park Falls, Wisconsin, submitted to Journal of Geophysical Research, 2007, hereinafter referred to as Boesch et al., submitted manuscript, 2007; Washenfelder et al., Carbon dioxide column abundances at the Wisconsin tall tower site, submitted to Journal of Geophysical Research, 2007, hereinafter referred to as Washenfelder et al., submitted manuscript, 2007] and ongoing studies.
2. Precision Requirements
 Two community-wide undertakings define the current state of knowledge for the atmospheric CO2 budget: the GLOBALVIEW-CO2 in situ measurement network [GLOBALVIEW-CO2, 2005] and the TransCom 3 transport/flux estimation experiment [Gurney et al., 2002, 2003, 2004, 2005; Law et al., 2003; Baker et al., 2006a]. The GLOBALVIEW-CO2 network’s emphasis on acquiring accurate measurements through rigorous experimental methods, constant calibration using procedures and materials traceable to WMO standards, and continual vigilance against biases has created the recognized reference standard data set for atmospheric CO2 observations. The network collects surface in situ CO2 measurements at approximately 120 stations worldwide, spanning latitudes from the South Pole to 82.4°N (Alert, Canada). Typical measurement uncertainties are on the order of 0.1 ppm (0.03%).
 The TransCom 3 project reported estimates of carbon sources and sinks from variations in the GLOBALVIEW-CO2 data via inverse modeling with multiple atmospheric transport models. This assessment confirmed that carbon fluxes integrated over latitudinal zones are strongly constrained by observations in the middle to high latitudes. Flux uncertainties were also constrained by inadequacies in the transport models and the lack of observations in tropical forests. The latter result is not surprising since the GLOBALVIEW-CO2 network strategy was originally designed specifically to avoid measurement contamination from air locally influenced by large CO2 sources or sinks. The inversions also exhibited significant uncertainties when trying to distinguish meridional contributions to the fluxes.
Rayner and O'Brien [2001] showed that space-based data could dramatically improve our understanding of CO2 sources and sinks if these measurements provided adequate precision and spatial coverage. This study used a synthesis inversion model to estimate the surface-atmosphere CO2 flux uncertainties in 26 continent/ocean basin scale regions. The baseline was established by using measurements from 56 stations in the ground-based GLOBALVIEW-CO2 network. The results were compared to simulations that used spatially resolved, global data. Rayner and O’Brien found that global, space-based data with 2.5-ppm precisions (and no biases) on 8° × 10° scales would be needed to match the performance of the existing ground-based network at monthly or annual timescales. Space-based data with 1-ppm precisions were predicted to reduce the inferred uncertainties of annual mean CO2 fluxes from greater than 1.2 GtC region−1 year−1 to less than 0.5 GtC region−1 year−1. Additionally, the uncertainties in all regions were more uniform for inversions using the space-based data.
 While these simulations clearly illustrate the advantages of precise space-based data, they do not explicitly quantify the data precision required for OCO because they do not simulate the spatial and temporal sampling strategy proposed for the OCO mission nor do they adequately characterize the sensitivity of source-sink inversions to data precision. To address these concerns, we combined simulated atmospheric CO2 data and transport models to estimate spatial gradients as well as global and regional scale variability. A series of observational system simulation experiments (OSSEs) sampled the synthetic CO2 fields using strategies simulating the GLOBALVIEW-CO2 surface network [GLOBALVIEW-CO2, 2005] as well as the OCO satellite. Inverse modeling of these data characterized the relationship between the inferred surface flux uncertainties and uncertainties in the space-based data. We also investigated the use of inversions to detect bias in the data and what level of sensitivity such analyses would provide.
2.1. Spatial and Temporal Gradients
Distributions of atmospheric CO2 were simulated with the Model of Atmospheric Transport and Chemistry (MATCH) three-dimensional atmospheric transport model [Olsen and Randerson, 2004]. MATCH represents advective transport using a combination of horizontal and vertical winds and has parameterizations of wet and dry convection and boundary layer turbulent mixing [Rasch et al., 1994]. MATCH operates off-line using archived meteorological fields, which for this study were derived from the NCAR Community Climate Model version 3 with T21 horizontal resolution (approximately 5.5° × 5.5°) and 26 vertical levels from the surface up to 0.2 hPa (about 60 km) on hybrid sigma pressure levels. The top of the first model level is approximately 110 m. The meteorological fields represent a climatologically “average” year rather than any specific year. These meteorological data were archived every 3 model hours and interpolated to the 30-min MATCH time step. In this configuration MATCH has an interhemispheric transport time of approximately 0.74 years, near the middle of the 0.55 to 1.05 year range of the models that participated in the TransCom 2 experiment [Denning et al., 1999]. A single year of dynamical inputs was recycled for the multiyear runs used in this study.
 Constraints on CO2 sources and sinks incorporated fossil fuel emissions as estimated by Andres et al. , atmosphere-oceanic exchange as estimated from sea-surface pCO2 measurement by Takahashi et al. , and biospheric fluxes modeled using the Carnegie-Ames-Stanford Approach (CASA) model [Randerson et al., 1997], including a diurnal cycle of photosynthesis and respiration. Simulated data were obtained from the model output by integrating vertically according to the OCO averaging kernel at the T21 horizontal resolution. In these simulations terrestrial ecosystem exchange was annually balanced; in other words, we omitted a “missing” carbon sink necessary to balance fossil carbon sources with the atmospheric CO2 growth rate [Gurney et al., 2002; Tans et al., 1990].
The 15th of each month was taken as a sample representative day for each month. Data were extracted for 1300 local time globally, as a preliminary approximation of XCO2 as would be observed by OCO. The resulting data are presented for January and July 2000 in Figure 1. These data differ slightly from the monthly mean maps presented in Figure 9 of Olsen and Randerson [2004], capturing more of the instantaneous seasonal variability. All model values are reported relative to the annual mean surface mixing ratio south of 60° south. Figure 1 shows that the Northern Hemisphere XCO2 variability is typically about 6 ppm. This is only 50% of the amplitude of the seasonal cycle in the near-surface concentrations of CO2 [see Olsen and Randerson, 2004, Figure 5]. Somewhat larger high-frequency variations, associated with passing weather systems and strong source regions, are also common. The monthly peak-to-peak amplitude of Southern Hemisphere XCO2 is typically 2–3 ppm, while the annual peak-to-peak amplitude varies from 6–7 ppm over tropical forests to less than 3 ppm over the Southern Ocean.
These results indicate that space-based data with precisions better than 2 ppm are needed to resolve the peak-to-peak amplitudes in monthly and annual XCO2. This precision is also sufficient to resolve regional scale meridional variations over the Northern Hemisphere boreal forests or the Southern Ocean. The OCO sampling strategy (section 4) is specifically designed to return space-based measurements with the high sensitivity and dense sampling in space and time required to attain this precision even when cloud and aerosol interference prevents observations of the complete atmospheric column in the majority of the observed scenes.
2.2. Global and Regional Variability
Global spatial variability was quantified by analyzing the CASA/MATCH data calculated for 2000. Raw and experimental variograms were calculated to quantify the global variability of XCO2. The raw variogram is defined for any two measurements as [Cressie, 1991]

$$\gamma(h) = \frac{1}{2}\left[z(x) - z(x')\right]^{2}$$

where γ(h) is the raw variogram, z(x) is a measurement value at location x, z(x′) is a measurement value at location x′, and h is the separation distance between x and x′. The distance was calculated using the great circle distance between points on the surface of the earth [e.g., Michalak et al., 2004]:

$$h(x_i, x_j) = r \cos^{-1}\left[\sin\phi_i \sin\phi_j + \cos\phi_i \cos\phi_j \cos(\lambda_i - \lambda_j)\right]$$

where the coordinates xi = (ϕi, λi) are the latitude and longitude, respectively, of the sample locations, and r is the mean radius of the earth. The raw variogram values are averaged for different ranges of separation distance to obtain the experimental variogram. Because the variogram is designed to represent the portion of the distribution that cannot be represented by a deterministic trend, the consistent North-South gradient was accounted for by detrending the data with respect to latitude, using the simulated latitudinal gradient for 1300 LST each month. In this way, the variograms represent the stochastic, spatially correlated portion of the distribution, which is the portion that will need to be estimated to obtain a continuous distribution based on the point measurements taken by OCO. The resulting variograms are presented in Figure 2.
The variograms were modeled with an exponential theoretical variogram,

$$\gamma(h) = \sigma^{2}\left[1 - \exp\left(-\frac{h}{L}\right)\right]$$

where σ2 is the semivariance and L is the length parameter. The theoretical variogram describes the decay in spatial correlation between pairs of measurements as a function of physical separation distance between these samples. The overall variance at large separation distances is 2σ2 and the practical correlation range is approximately 3L. The σ2 and L parameters were estimated using a least squares fit to the raw variogram. The fitted variograms are presented in Figure 2, and the global variance and correlation range for each month are summarized in Table 1. The correlation length represents the distance at which the expected covariance between z(x) and z(x′) approaches zero, and the measurement z(x′) no longer provides useful information about the value z(x). The variance indicates the maximum uncertainty at unsampled locations, in the absence of nearby measurements, assuming the overall mean or trend is known.
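The variogram procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names, the Earth radius value, the bin width, and the grid search over L are all assumptions introduced here, and the latitudinal detrending step is omitted. The fit exploits the fact that, for a fixed L, the exponential model is linear in σ2, so the least squares value of σ2 has a closed form.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(p, q, r=EARTH_RADIUS_KM):
    """Great-circle distance between two (lat, lon) points given in degrees."""
    phi1, lam1 = (math.radians(v) for v in p)
    phi2, lam2 = (math.radians(v) for v in q)
    c = (math.sin(phi1) * math.sin(phi2)
         + math.cos(phi1) * math.cos(phi2) * math.cos(lam1 - lam2))
    return r * math.acos(max(-1.0, min(1.0, c)))  # clamp rounding error


def raw_variogram(points, values):
    """Return (h, gamma) for every pair, with gamma = 0.5 * (z_i - z_j)**2."""
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            h = great_circle_km(points[i], points[j])
            pairs.append((h, 0.5 * (values[i] - values[j]) ** 2))
    return pairs


def experimental_variogram(pairs, bin_width_km):
    """Average raw variogram values within separation-distance bins."""
    sums = {}
    for h, g in pairs:
        k = int(h // bin_width_km)
        s, n = sums.get(k, (0.0, 0))
        sums[k] = (s + g, n + 1)
    return [((k + 0.5) * bin_width_km, s / n)
            for k, (s, n) in sorted(sums.items())]


def exponential_variogram(h, sigma2, length):
    """Exponential model: sigma2 * (1 - exp(-h / L))."""
    return sigma2 * (1.0 - math.exp(-h / length))


def fit_exponential(pairs, length_grid_km):
    """Least squares fit: grid search over L, closed-form sigma2 for each L."""
    best = None
    for length in length_grid_km:
        f = [1.0 - math.exp(-h / length) for h, _ in pairs]
        denom = sum(x * x for x in f)
        if denom == 0.0:
            continue
        sigma2 = sum(x * g for x, (_, g) in zip(f, pairs)) / denom
        sse = sum((g - sigma2 * x) ** 2 for x, (_, g) in zip(f, pairs))
        if best is None or sse < best[0]:
            best = (sse, sigma2, length)
    return best[1], best[2]  # (sigma2, L)
```

With fitted values in hand, the quantities reported in Table 1 would follow as 3L for the correlation length and 2σ2 for the variance.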
Table 1. Global XCO2 Variability at 1:00 PM Local Time for a Representative Day in Each Month of the Year 2000 From CASA/MATCH Model Runs
Correlation Length (3L), km
Variance (2σ2), ppm2
 This analysis shows that the CASA/MATCH field exhibits significant spatial correlation and that the degree of spatial correlation varies throughout the year. The variance is higher during the Northern Hemisphere summer and lower in winter. The seasonality of the global correlation length is less pronounced, but follows a similar pattern to that of the variance. These two factors have opposite effects on the required sampling intensity (i.e., a higher variance leads to a larger number of required sampling locations whereas a longer correlation length reduces the number of required samples). Overall, because of the stronger seasonality of the variance of the distribution, it is expected that a larger fraction of samples will need to be processed in summer months in order to achieve a specified level of uncertainty in the interpolated field.
 On the basis of this global analysis, an average sampling interval of approximately 1500 km will be required globally to achieve an interpolated uncertainty with a standard deviation below 1 ppm. This represents the average sampling interval over the entire year, although individual months will require either sparser or denser measurements depending on the statistical characteristics of the variability in each month. This estimate is based on:
where L and σ2 are taken from the theoretical variogram and Vmax is the maximum allowable uncertainty, expressed as a variance. In the above calculation, we used the annual mean parameters and Vmax = 1 ppm2. Note that this analysis assumes that either (1) the value sampled by OCO is representative of the average on the scale modeled by CASA/MATCH, or that (2) the covariance structure as estimated at the CASA/MATCH model grid scale is valid at the smaller OCO measurement scale. In reality, OCO will measure at a significantly smaller scale than the CASA/MATCH model resolution, and data at smaller scales tend to exhibit more variability relative to measurements that represent averages at larger scales. For these reasons, we expect the sampling interval required to achieve a maximum Vmax = 1 ppm2 uncertainty in the interpolated product to be smaller for the OCO sampling scale relative to the CASA/MATCH data. Note also that we have not considered measurement errors in this calculation, which would again increase the amount of sampling required to constrain the interpolated error to a given uncertainty threshold.
To assess the regional variability of XCO2, a separate variogram was constructed for each grid cell of the 5.5° × 5.5° model output (2048 cells globally), centered at that grid cell. In calculating the raw variogram for each grid cell, only pairs of data points with at least one member within a 2000-km radius of the grid cell were considered. Therefore the raw variogram consisted of data pairs where either (1) both measurements were within 2000 km of the central grid cell, or (2) one measurement was within 2000 km of the central grid cell and the other was not. In essence, this approach quantifies the variability between measurements in the vicinity of a grid cell and the global distribution.
The regional-scale raw variograms were fitted using weighted least squares and an exponential variogram, giving greater weight to pairs of points at shorter separation distances and constraining the correlation length to less than 20,000 km. The resulting correlation lengths and variances of the fitted theoretical variograms are presented in Figures 3 and 4, respectively. These global maps of parameters describe the regional correlation structure of the CASA/MATCH XCO2. The correlation structure exhibits temporal variability, as was seen in Figure 2, as well as strong spatial variability. This can be observed both in the variance and correlation lengths of the distributions. For example, XCO2 in the Northern Hemisphere is correlated over shorter distances than the global average (Figure 3). The variability exhibited by the distribution is caused both by the scales and degree of variability of the underlying fluxes and by the variability induced by atmospheric transport.
The results of the regional variability analysis are qualitatively consistent with the results of Lin et al., who also found longer correlation lengths over the Pacific relative to continental North America in their analysis of aircraft-derived fields. A quantitative comparison is difficult to establish because Lin et al. used a nonstationary power variogram to represent CO2 variability. Such a variogram does not have a finite maximum variance (i.e., sill) or correlation length to compare to those presented in Figures 3 and 4. One quantitative comparison that can be made is a calculation of the separation distance at which the squared difference in XCO2 at two sampling locations is expected to reach a specified variance. Based on the variogram used in Lin et al., the separation distance at which the squared difference between vertically integrated CO2 concentrations (<9 km) is expected to reach 1 ppm2 is 57 km over the North American continent in June 2003, and 727 km over the Pacific Ocean. Data over the Pacific Ocean were a composite of multiple years of springtime (February to April) and fall (August to October) data. For the CASA/MATCH data, the separation distance is 460 km over the North American continent for June 2000, and 13,600 km (March 2000) and 5200 km (September 2000) over the Pacific Ocean. The two sets of results are consistent in showing greater spatial variability over the continental regions, but Lin et al. show more overall variability at smaller scales. The higher variability inferred by Lin et al. is most likely due largely to the scale at which the aircraft measurements were taken relative to the scale of the CASA/MATCH modeled data, and to the limited vertical extent of the aircraft profiles. As was previously discussed, data at finer scales typically exhibit more variability relative to coarser data. This will need to be considered further in interpreting global model data in the context of fine scale OCO measurements.
Regional scale variability will also be driven by local conditions and meteorology [Nicholls et al., 2004]. Spatial and temporal heterogeneity of the covariance structure must therefore be taken into account in the design of a sampling strategy and retrievals.
3. CO2 Fluxes from Inversions
3.1. Relationship Between OCO Precision and Surface CO2 Flux Uncertainties
The synthesis inversion methods of Rayner and O’Brien [O’Brien and Rayner, 2002; Rayner et al., 2002; Rayner and O'Brien, 2001] were used to evaluate the impact of particular OCO mission design choices and the resultant regional scale data precisions on the surface CO2 flux uncertainties. The present study used a higher resolution than Rayner and O’Brien [2001] (116 source regions versus 26) and used an orbit simulator to sample the model in accordance with satellite orbit and viewing geometry. This is more stringent than the uniform monthly mean sampling assumed by Rayner and O’Brien [2001]. The new study still retrieves monthly mean fluxes. We note that Chevallier et al. (Chevallier et al., The contribution of the Orbiting Carbon Observatory to the estimation of CO2 sources and sinks: Theoretical study in a variational data assimilation framework, submitted to Journal of Geophysical Research, 2007, hereinafter referred to as Chevallier et al., submitted manuscript, 2007) have increased both the spatial and temporal resolution of the retrieved sources and still show considerable potential for OCO measurements.
Rayner et al. [2002] also studied the ability to retrieve actual fluxes from a set of synthetic or pseudodata sampled to mimic various in situ or remotely sensed products. They used fluxes representing fossil fuel combustion, the ocean air-sea gas exchange and the seasonal flux from the terrestrial biosphere. We follow that setup here.
Following Rayner and O'Brien [2001], a baseline for comparisons with the simulated space-based data was established by performing synthesis inversion experiments to estimate the CO2 flux uncertainties for the GLOBALVIEW-CO2 surface CO2 monitoring network over the seasonal cycle. These flux errors are expressed in grams of carbon per square meter per year (gC m−2 yr−1). The prior uncertainty for all regions was assumed to be 2000 gC m−2 yr−1 for monthly fluxes. This is a very weak prior estimate so as not to artificially inflate the performance of the inversion system. Monthly mean CO2 flask data were simulated with an uncertainty that assumed that the monthly mean had been constructed from four samples (i.e., one per week) from each of the 72 surface stations. Results are shown for January (Jan) and July (Jul) in Figures 5a–5b. For this baseline case most regions have flux uncertainties in excess of 1000 gC m−2 yr−1, with uncertainties greater than 1500 gC m−2 yr−1 typical for most land regions.
Figures 5c–5d show the flux uncertainties inferred by inverting a simulated network containing 25 continuous CO2 monitoring sites plus 47 sites reporting weekly CO2 flask measurements. The continuous monitoring sites were located based on current in situ measurements. Continuous surface measurements produce their greatest benefits in the vicinity of measurement stations that are located well away from strong sources and sinks. Even with the addition of the continuous CO2 measurements, flux uncertainties remain greater than 1000 gC m−2 yr−1 in most continental regions near strong surface fluxes because of the limited spatial coverage offered by the continuous monitoring stations. The largest uncertainties are seen in South America, central Africa, and southern Asia.
Figure 6 shows January (Jan) and July (Jul) CO2 flux uncertainties from synthesis inversion simulations assuming data sampled along the OCO orbit track with 1 ppm (0.3%, Figures 6a–6b) and 5 ppm (1.5%, Figures 6c–6d) precisions for monthly averages on 4° latitude × 5° longitude scales. For well-constrained regions in which the prior estimate has little impact, the flux uncertainty is proportional to the data uncertainty. The relationship breaks down for small regions (such as the subdivision of Australia in Figures 5 and 6) and at high latitudes where the measurement frequency is lower. Such problems can be reduced by calculating fluxes over larger spatial scales after performing the inversion.
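The proportionality between flux uncertainty and data uncertainty, and its breakdown where the prior matters, can be illustrated with the standard linear Gaussian result for a single region. The sketch below is a scalar toy model and not the 116-region synthesis inversion used here; the function name and the sensitivity values (ppm of column response per unit flux) are hypothetical, and only the 2000 gC m−2 yr−1 prior is taken from the text.

```python
import math


def posterior_flux_sd(prior_sd, data_sd, sensitivity, n_obs=1):
    """Posterior flux standard deviation for one region (scalar Bayesian update).

    posterior variance = 1 / (1/prior_sd**2 + n_obs * (sensitivity/data_sd)**2)
    """
    precision = 1.0 / prior_sd ** 2 + n_obs * (sensitivity / data_sd) ** 2
    return 1.0 / math.sqrt(precision)


PRIOR_SD = 2000.0          # gC m^-2 yr^-1, the weak prior quoted in the text
WELL_OBSERVED = 0.01       # hypothetical sensitivity, ppm per (gC m^-2 yr^-1)
POORLY_OBSERVED = 0.0005   # hypothetical sensitivity for a small or high-latitude region

for label, s in (("well observed", WELL_OBSERVED), ("poorly observed", POORLY_OBSERVED)):
    sd_1ppm = posterior_flux_sd(PRIOR_SD, 1.0, s)
    sd_5ppm = posterior_flux_sd(PRIOR_SD, 5.0, s)
    print(f"{label}: ratio of 5-ppm to 1-ppm flux uncertainty = {sd_5ppm / sd_1ppm:.2f}")
```

Under these assumed sensitivities the well-observed region returns a ratio close to 5, matching the ratio of the data uncertainties, whereas the poorly observed region stays near the prior and the ratio falls well below 5.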
The Northern Hemisphere terrestrial carbon sink is thought to absorb about 1 GtC yr−1 from the atmosphere [Gurney et al., 2002]. The results presented in Figure 6 indicate that space-based data with monthly averaged precision of 1–2 ppm (0.3 to 0.5%) will yield flux uncertainties no greater than ∼100 gC m−2 yr−1 or 0.1 GtC (10^6 km2)−1 yr−1 (with the exception of Greenland). Inversions using such space-based data should be able to detect the 1 GtC yr−1 carbon sink if it is confined to an area or areas smaller than a few 1000 × 1000 km regions, for example, Northeastern North America. We anticipate even greater sensitivity to detecting such a sink if space-based and in situ surface CO2 data are combined in the inversion.
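The unit conversion behind this detectability argument is simple arithmetic and can be checked directly. The helper names below are illustrative, and the choice of three 1000 × 1000 km regions is an assumed example of "a few" regions rather than a figure from the study.

```python
M2_PER_KM2 = 1.0e6   # square meters per square kilometer
G_PER_GT = 1.0e15    # grams per gigaton


def areal_flux_to_gtc_per_year(flux_gc_m2_yr, area_km2):
    """Convert an areal flux (gC m^-2 yr^-1) over an area (km^2) to GtC yr^-1."""
    return flux_gc_m2_yr * area_km2 * M2_PER_KM2 / G_PER_GT


def gtc_per_year_to_areal_flux(flux_gtc_yr, area_km2):
    """Spread a total flux (GtC yr^-1) evenly over an area (km^2), in gC m^-2 yr^-1."""
    return flux_gtc_yr * G_PER_GT / (area_km2 * M2_PER_KM2)


# 100 gC m^-2 yr^-1 over one 1000 x 1000 km region is 0.1 GtC yr^-1.
uncertainty_gtc = areal_flux_to_gtc_per_year(100.0, 1.0e6)

# A 1 GtC yr^-1 sink spread over three such regions (assumed example).
sink_density = gtc_per_year_to_areal_flux(1.0, 3.0e6)
print(uncertainty_gtc, sink_density)
```

A sink of 1 GtC yr−1 spread over three such regions corresponds to roughly 330 gC m−2 yr−1, several times the ∼100 gC m−2 yr−1 uncertainty, which is why confinement to a few regions matters for detection.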
 Comparisons of the flux uncertainties inferred from the 5-ppm precision data (Figures 6c–6d) with results from the baseline inversion (Figures 5a–5b) show that, even at this degraded precision, the satellite data still provide a better constraint on surface fluxes for most regions. Augmenting the surface network with continuous monitoring stations (Figures 5c–5d) improves the surface flux constraints for regions around one of these continuous monitoring sites, but still provides inadequate constraints in other regions, particularly tropical forests. Because of uniform spatial sampling over land and ocean and sheer data volume, space-based data will make a substantial impact on reducing continental scale flux uncertainties even at 5-ppm precision.
3.2. Effects of Systematic Bias on CO2 Flux Inversions
The precision requirements for space-based data presented in section 2 assumed random errors with no significant spatially or temporally coherent biases. This section addresses the potential impact of systematic biases in the space-based data on inferred CO2 flux uncertainties. Systematic biases might result from such measurement considerations as signal-to-noise ratio, viewing geometry, whether the observations were made over land or ocean, spatial variations in clouds or aerosols, topographic variations, diurnal effects on the vertical CO2 profile, etc. The effects of such biases on flux uncertainties depend on their spatial and temporal scale since CO2 sources and sinks are inferred from gradients. Constant global biases do not compromise flux inversions because they introduce no spurious gradients in the fields that could be misinterpreted as sources or sinks. However, a constant bias in space-based data would complicate inversions or assimilations that also included suborbital data or data from other satellite platforms. Biases occurring on spatial scales smaller than ∼30 km are not a major concern because they will be indistinguishable from random noise contributions like scene-dependent variability. Coherent biases on 100–5000 km horizontal scales pose the greatest threat to the integrity of space-based data and must be corrected below detectable levels. Temporal biases occurring on seasonal timescales will also complicate CO2 flux inversion and assimilation studies.
 Different biases were considered to define requirements for the OCO calibration and validation programs. For example, CASA/MATCH simulations indicate that a 1 GtC yr−1 Northern Hemisphere carbon sink superimposed on a background emissions source of 6 GtC yr−1 coming from the northern hemisphere would create an additional 0.4 ppm gradient between 45°N and 45°S (i.e., it would contribute about 1/6 of the gradient shown in Figure 1). If the space-based data were systematically biased by +0.2 ppm in the Northern Hemisphere and −0.2 ppm in the Southern Hemisphere, inversion modeling would fail to detect the sink. If they were systematically biased by −0.2 ppm in the Northern Hemisphere and +0.2 ppm in the Southern Hemisphere, inversion modeling would infer a Northern Hemisphere sink of 2 GtC yr−1 rather than 1 GtC yr−1. In either of these hypothetical cases, the large discrepancy between the inferred fluxes and prior estimates would signal potential problems.
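The two hypothetical cases above follow from treating the inferred sink as a linear function of the interhemispheric gradient. The sketch below assumes exactly that linearity, using the 0.4 ppm per GtC yr−1 scaling quoted from the CASA/MATCH simulations; the function name is illustrative and a real inversion would not reduce to a single scalar relation.

```python
PPM_PER_GTC = 0.4  # gradient change per 1 GtC/yr of Northern Hemisphere sink (from the text)


def inferred_sink_gtc(true_sink_gtc, bias_north_ppm, bias_south_ppm,
                      ppm_per_gtc=PPM_PER_GTC):
    """Sink inferred from a biased north-south gradient (linear assumption).

    A sink lowers northern CO2 relative to southern CO2, so a positive
    north-minus-south bias hides part of the sink and a negative one
    exaggerates it.
    """
    spurious_gradient = bias_north_ppm - bias_south_ppm
    return true_sink_gtc - spurious_gradient / ppm_per_gtc


print(inferred_sink_gtc(1.0, +0.2, -0.2))  # sink is masked
print(inferred_sink_gtc(1.0, -0.2, +0.2))  # sink is doubled
```

Note that a uniform bias (equal in both hemispheres) leaves the inferred sink unchanged in this sketch, consistent with the statement that constant global biases introduce no spurious gradients.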
The model described in section 3.1 was used to assess the impact of a small, spatially coherent land-ocean bias on surface flux inversions, and to determine whether such a bias could be detected. We performed two separate forward simulations. The control case was derived assuming a 1400 LST orbit while the test case used data biased by +0.1 ppm over land. The two resulting flux fields were input to the CRC-MATCH transport models and subsampled at 4-hour intervals at 72 stations. The (bias–control) near-surface CO2 concentration differences for the annual mean are shown in Figure 7. One might expect biases to reflect much larger errors near the surface, since spatial variations in CO2 (and many sources of bias) are largest there. In this test, the near-surface CO2 concentration differences are generally less than ±0.2 ppm. However, the results are spatially coherent, with positive differences inferred over land and negative differences inferred over the oceans. Thus, the comparison of surface CO2 concentration data and XCO2 data flux inversions clearly reveals a land-ocean bias in the retrievals, even when the bias is only 0.1 ppm.
 We corrected the artificially biased retrievals by inverting the time series of (bias–control) differences simulated at each of the 72 surface sites with a 4-hour sampling frequency. The fluxes produced from this inversion were added to the fluxes from the test (biased) inversion. Figure 8 depicts the difference between these corrected fluxes and the original flux estimates from the unbiased data. Most regions show differences smaller than ±10 gC m−2 yr−1. More importantly, the differences are no longer spatially coherent. Larger differences occur for land regions in the tropics where the surface network (black circles shown in Figure 7) is sparse. The annual mean land-ocean partition is corrected by the inversion of the surface data (shifting 1.6 GtC yr−1 from land to ocean).
 These tests increase our confidence in the ability to validate space-based data to a precision of 1 ppm because, while biases on the order of 0.1 ppm will be difficult to detect directly, they can be detected and corrected by combining the space-based measurements with observations from a reasonable number of surface stations. Fewer than 72 stations currently monitor CO2 continuously, but for both practical and scientific reasons [e.g., Law et al., 2002] there is a trend toward more continuous monitoring. We note that the procedure employed here is demanding on transport models. These tests also show the importance of CO2 source-sink inversions (level 4 data products) in validating the retrievals (level 2 data products).
3.3. CO2 Flux Constraints on Regional and National Spatial Scales
 Another OSSE was performed to assess the precision requirement for data to constrain CO2 fluxes on regional and national scales. We examined CO2 fluxes in Asia in spring. Studies by Suntharalingam et al. and Palmer et al. previously showed that high-density aircraft observations from the March to April 2001 TRACE-P aircraft campaign in Asian outflow over the NW Pacific [Jacob et al., 2003] provide valuable constraints on the CO2 flux from different countries in Asia. We evaluate here the extent to which OCO-like data can disaggregate the CO2 fluxes from China, India, Japan, Korea, and Southeast Asia (Figure 9).
 Pseudoobservations for March to April 2001 were generated using the GEOS-CHEM global three-dimensional chemistry transport model. Details of the GEOS-CHEM model and the CO2 simulation may be found in the work of Suntharalingam et al. We used version 4.21 of GEOS-CHEM driven by GEOS-3 assimilated meteorological fields for 2000 and 2001, at a horizontal resolution of 2° latitude × 2.5° longitude. GEOS-CHEM was also employed as the forward model in the inversion analysis. The seasonal CO2 surface flux in the model, aggregated over the regions considered here, is listed in Table 2. These flux estimates are based on the source inventories used by Suntharalingam et al. These fluxes were adopted as the true surface fluxes of CO2 in our inversion analysis. The inversion was conducted using an a priori estimate of the fluxes obtained by perturbing the “true” fluxes within the a priori errors.
Table 2. GEOS-CHEM CO2 Fluxes for March to April, 2001
Values are net fluxes and include contributions from fossil fuel and biofuel combustion, biomass burning, and exchange with the biosphere. The GEOS-CHEM model fluxes are taken as the “true” fluxes for the purpose of the OSSE.
 The GEOS-CHEM simulation was conducted from 1 January 2000 to 30 April 2001, starting from observed CO2 latitudinal gradients. The first 13 months (to 31 January 2001) were used for initialization of the model CO2 background. Starting on 1 February 2001, CO2 surface fluxes for the different regions of Figure 9 were transported as separate tracers in the model; the background concentration as of 31 January was carried forward as an additional tracer with no further sources and sinks. We generated retrievals for OCO in March to April 2001 by sampling this pseudoatmosphere along the satellite orbit, transforming the modeled profiles with the OCO column averaging kernel, and adding noise. The retrieved pseudodata were limited to the region between the equator and 64°N and from 2.5° to 167.5°E for the purposes of this test. Model output was provided with 3-hour temporal resolution, and we used the modeled CO2 profile closest to the 1326 LST OCO sampling time. Contributions from the different model tracers to the simulated CO2 concentrations were used to construct the Jacobian of the forward model for the purpose of the inversion.
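 As an illustration of the pseudo-retrieval step, the sketch below column-averages a model profile with an averaging kernel and optionally adds Gaussian noise. It uses a commonly quoted form, column = Σ h_i [x_a,i + a_i (x_i − x_a,i)], where h is a pressure weighting, a the column averaging kernel, and x_a a prior profile. The function name, the four-level profile, and the kernel values are hypothetical and are not taken from the OCO retrieval algorithm.

```python
import random


def pseudo_xco2(profile_ppm, prior_ppm, avg_kernel, pressure_weights,
                noise_fraction=0.0, rng=None):
    """Column-average a model CO2 profile as a column sounder might see it."""
    if not (len(profile_ppm) == len(prior_ppm) == len(avg_kernel)
            == len(pressure_weights)):
        raise ValueError("all inputs must share the same vertical levels")
    total_weight = sum(pressure_weights)
    column = 0.0
    for x, xa, a, h in zip(profile_ppm, prior_ppm, avg_kernel, pressure_weights):
        # Where a == 1 the retrieval sees the model value; where a == 0 it
        # falls back on the prior profile.
        column += (h / total_weight) * (xa + a * (x - xa))
    if noise_fraction > 0.0:
        rng = rng or random.Random()
        # Gaussian noise whose standard deviation is a fraction of the column.
        column += rng.gauss(0.0, noise_fraction * column)
    return column


# Hypothetical 4-level profile (surface first) with a boundary-layer enhancement.
profile = [385.0, 378.0, 376.0, 375.0]
prior = [376.0, 376.0, 376.0, 376.0]
kernel = [1.0, 1.0, 0.9, 0.7]          # illustrative values only
weights = [0.25, 0.25, 0.25, 0.25]     # equal-mass layers (assumption)

print(pseudo_xco2(profile, prior, kernel, weights))
print(pseudo_xco2(profile, prior, kernel, weights, 0.003, random.Random(0)))
```

 Repeating the noise-free calculation once per tagged tracer would give one column of a Jacobian of the kind described above.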
 To assess the precision requirements for XCO2 in terms of monthly mean data with 4° × 5° resolution, the pseudoretrievals were correspondingly averaged; the results for March 2001 are shown in Figure 10a. In generating these observations we neglected the loss of data from cloud cover. Such data loss is inconsequential because of the large number of observations, as long as there are no correlations between CO2 column and cloud cover [Rayner et al., 2002].
 We assumed that the CO2 flux errors from the different regions in Table 2 were uncorrelated, with a uniform a priori uncertainty of 50%. The actual uncertainties will vary with the relative contributions of different sectors to the regional CO2 sources. Emissions from fossil fuel use in the industrial and vehicular sectors are known to within about 10% [Streets et al., 2003]. Emissions from the domestic fuel use sector (residential coal and biofuels), a major source in east Asia, may have uncertainties of about 50% on national scales [Palmer et al., 2003; Suntharalingam et al., 2004]. Emissions from biomass burning are uncertain by at least a factor of 2 [Palmer et al., 2003]. Net fluxes from the terrestrial biosphere in East Asia are uncertain by ∼100% [Gurney et al., 2002]. We also assumed no error covariance between individual OCO observations.
 Modeled values for March 2001, shown in Figure 10a, were convolved with 0.3% Gaussian measurement noise to generate the pseudoobservations. The a posteriori CO2 surface fluxes determined from the inversion are compared with the a priori and true fluxes in Figure 10b. We aggregated the fluxes from Japan and Korea because the 4° × 5° monthly mean pseudodata prohibited discriminating between emissions from these regions. The inverse model accurately updated the CO2 fluxes for the resulting four Asian regions. The flux uncertainty was significantly reduced for all regions, with the exception of the combined Japanese and Korean region (JPKR). The a posteriori errors for China, India, and Southeast Asia improved to 16, 20, and 27%, respectively, from the a priori uncertainty of 50%.
 The result of the inversion analysis depends strongly on the observation error and on the a priori uncertainties assumed for the regional CO2 sources. Figure 11 shows the relative uncertainties of the a posteriori sources as a function of the observation error for three values of the a priori source uncertainty (50, 100, and 150%). The observations constrain the inferred CO2 fluxes from the Asian regions, with the exception of JPKR, when errors are 0.3% or less. With larger observation errors, it becomes more difficult for the inversion to resolve the contributions from individual regions, and the curves associated with the different a priori assumptions diverge. For example, with an observation uncertainty of 0.6% the a posteriori estimate for India is sensitive to the assumed a priori error and the estimates for both the Indian and Southeast Asian regions become strongly correlated with those from China (not shown). As expected, the error estimates for JPKR are most sensitive to the assumed a priori error, since this is the least well-constrained region in the inversion. These results demonstrate the potential for OCO observations to accurately disaggregate CO2 surface fluxes from India and China. This is important since these two regions are rapidly industrializing and experiencing significant land use changes.
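 The dependence of the a posteriori errors on observation error and a priori uncertainty can be illustrated with a two-region linear Bayesian inversion, for which the posterior covariance is (KᵀSe⁻¹K + Sa⁻¹)⁻¹. The sketch below is a toy under stated assumptions: the Jacobian values are hypothetical, the state is a pair of flux scale factors, and a 375 ppm column is assumed when converting percent error to ppm. It does not reproduce the GEOS-CHEM Jacobian or Figure 11.

```python
import math


def posterior_two_regions(jacobian, obs_sigma_ppm, prior_sigma):
    """Posterior std devs and correlation for two flux scale factors.

    jacobian: list of (k1, k2) pairs, the ppm change in each observation per
    unit change in the scale factor of region 1 and of region 2.
    """
    w = 1.0 / obs_sigma_ppm ** 2    # uncorrelated observation errors
    p = 1.0 / prior_sigma ** 2      # uncorrelated, equal prior errors
    # Inverse posterior covariance: K^T Se^-1 K + Sa^-1.
    h11 = w * sum(k1 * k1 for k1, _ in jacobian) + p
    h22 = w * sum(k2 * k2 for _, k2 in jacobian) + p
    h12 = w * sum(k1 * k2 for k1, k2 in jacobian)
    det = h11 * h22 - h12 * h12
    # Closed-form inverse of the symmetric 2 x 2 matrix.
    s11, s22, s12 = h22 / det, h11 / det, -h12 / det
    return math.sqrt(s11), math.sqrt(s22), s12 / math.sqrt(s11 * s22)


# Hypothetical sensitivities: region 1 dominates most grid cells and the
# two source footprints overlap downwind.
K = [(2.0, 0.2), (1.5, 0.5), (1.0, 0.8), (0.5, 0.6), (0.2, 0.3)] * 4

for obs_percent in (0.1, 0.3, 0.6, 1.0):
    sigma_ppm = obs_percent / 100.0 * 375.0   # assumes a 375 ppm column
    for prior in (0.5, 1.0, 1.5):
        s1, s2, corr = posterior_two_regions(K, sigma_ppm, prior)
        print(f"obs {obs_percent}%  prior {prior:.0%}  "
              f"post {s1:.0%}, {s2:.0%}  corr {corr:+.2f}")
```

 In this toy problem the posterior errors grow with observation error and approach the prior when the data carry little information, and the two regions are anticorrelated in the posterior because their signals overlap.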
4. The OCO Sampling Approach
 The modeling studies of spatial and temporal variability and surface CO2 flux inversions define the science measurement requirements for space-based data. The OCO sampling strategy is designed to return observations that maximize precision and minimize bias in the space-based data so as to obtain the most accurate possible constraints on regional scale surface CO2 fluxes [Crisp et al., 2004]. In situ measurements from tower [Bakwin et al., 1998; Haszpra et al., 2005] and aircraft [Anderson et al., 1996; Andrews et al., 1999; Andrews et al., 2001a; Andrews et al., 2001b; Bakwin et al., 2003; Machida et al., 2003; Matsueda et al., 2002; Ramonet et al., 2002; Sawa et al., 2004; Vay et al., 2003] have shown that vertical concentrations of CO2 can vary significantly, especially in the boundary layer. Therefore space-based measurements that sample the full atmospheric column are required. Space-based data must also capture variations in XCO2 on seasonal to interannual timescales globally without diurnal biases. Measurements made from a polar, sun-synchronous orbit address these requirements. Scattering of solar radiation by clouds and optically thick aerosols prevents measurements that sample all the way to the surface. Spatial inhomogeneities within individual soundings (variations in topography, surface albedo, etc.) can compromise the accuracy of retrievals. A small sampling footprint mitigates both of these issues. The space-based data must be precise and unbiased over land and ocean (section 3.2), despite low surface albedos or other effects that may limit signal-to-noise levels. OCO includes both nadir and glint observing modes to mitigate concerns about signal-to-noise issues and a point-and-stare (target) mode for routine validation of retrievals over a range of latitudes, viewing angles, and geophysical conditions. We also address whether space-based data acquired via the OCO sampling strategy are representative of the regional scale fields they sample.
4.1. Space-Based Sampling Strategy
 The observatory will fly at the head of the Earth Observing System (EOS) Afternoon Constellation (A-Train), a polar, sun-synchronous orbit that follows the World Reference System 2 (WRS-2) ground track, providing global sampling with a 16-day repeat cycle and 1326 LST observations. This local time of day is ideal for spectroscopic observations of CO2 in reflected sunlight because the sun is high, maximizing the measurement signal-to-noise ratio, and because XCO2 is near its diurnally averaged value at this time of day. This orbit also facilitates direct comparisons of OCO observations with complementary data products from Aqua (for example, AIRS temperature, humidity, and CO2 retrievals; Moderate Resolution Imaging Spectroradiometer (MODIS), Cloudsat, and Cloud-Aerosol Lidar and Infrared Pathfinder Satellite Observations (CALIPSO) clouds and aerosols; MODIS surface type), Aura (TES CH4 and CO), and other A-Train instruments. The 16-day repeat cycle enables tracking global variations twice per month, with nearby revisits (∼100 km horizontal separation) occurring at least once every 6 days.
 Each OCO sounding includes bore-sighted spectra of solar radiation reflected from the Earth’s surface in the 0.76 μm O2 A-band and the CO2 bands at 1.61 and 2.06 μm. XCO2 is retrieved from the CO2/O2 ratio. The OCO instrument and observing strategy were designed to obtain a sufficient number of useful soundings to characterize the distribution accurately on regional scales, even in the presence of patchy clouds. The OCO instrument records up to eight soundings along a 10-km wide (nadir) cross-track swath at 3.0 Hz, yielding up to 24 soundings per second. As the spacecraft moves along its ground track at 6.78 km/s, each sounding will have a surface footprint with dimensions of 1.25 × 2.26 km at nadir, yielding up to 390 soundings over each 1° latitude increment along the orbit track.
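 The sampling numbers in this paragraph follow from the frame rate, swath width, and ground speed. The short check below reproduces them; the only value not taken from the text is the assumed mean length of one degree of latitude (111.2 km).

```python
FRAME_RATE_HZ = 3.0          # from the text
FOOTPRINTS_PER_FRAME = 8     # from the text
GROUND_SPEED_KM_S = 6.78     # from the text
SWATH_WIDTH_KM = 10.0        # from the text (nadir)
KM_PER_DEG_LAT = 111.2       # assumption: mean length of 1 degree of latitude

soundings_per_second = FRAME_RATE_HZ * FOOTPRINTS_PER_FRAME
along_track_km = GROUND_SPEED_KM_S / FRAME_RATE_HZ        # distance flown per frame
cross_track_km = SWATH_WIDTH_KM / FOOTPRINTS_PER_FRAME    # swath split into 8 footprints
footprint_area_km2 = along_track_km * cross_track_km
soundings_per_degree = soundings_per_second * KM_PER_DEG_LAT / GROUND_SPEED_KM_S

print(f"soundings per second:  {soundings_per_second:.0f}")
print(f"footprint:             {cross_track_km:.2f} x {along_track_km:.2f} km")
print(f"footprint area:        {footprint_area_km2:.2f} km2")
print(f"soundings per degree:  {soundings_per_degree:.0f}")
```

 The result is consistent with the quoted 1.25 × 2.26 km footprint (about 3 km2) and roughly 390 soundings per degree of latitude.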
 OCO will collect science observations in nadir, glint, and target modes. The same sampling rate is used in all three modes. In nadir mode, the spacecraft points the instrument boresight to the local nadir, so that data can be collected along the ground track directly below the spacecraft. Science observations will be collected at all latitudes where the solar zenith angle is less than 85°. This mode provides the highest spatial resolution on the surface and is expected to return more useable soundings in regions that are partially cloudy or have significant surface topography.
 Glint mode was designed to provide superior signal-to-noise ratio (SNR) performance at high latitudes and over dark ocean, where nadir mode observations might have difficulty meeting the precision requirements. In glint mode the spacecraft points the instrument boresight toward the bright “glint” spot, where solar radiation is specularly reflected from the surface. Glint measurements will provide 10–100 times higher signal over the ocean than nadir measurements [Kleidman et al., 2000; Cox and Munk, 1954]. Glint soundings will be collected at all latitudes where the local solar zenith angle is less than 75°. The nominal OCO mission operations plan is to switch between nadir and glint modes on alternate 16-day repeat cycles such that the entire Earth is sampled in each mode on monthly timescales. Operating in both nadir and glint modes each month is an ideal way to detect global bias in the product since the retrieved data and inferred carbon fluxes should be independent of the observation technique.
 Target mode will acquire “point and stare” validation observations of specific stationary surface targets as the observatory flies overhead. Simultaneous acquisition of solar-viewing Fourier transform spectrometer (FTS) data from a targeted OCO validation site provides a means to transfer calibration of the space-based data to the WMO standard for atmospheric CO2 [Washenfelder et al., 2007; Boesch et al., 2007]. Target passes will last up to 8 min, providing up to 10,000 soundings over a given site at local observing angles between 0° and ±85°. Target mode enables the OCO team to assess the impact of viewing geometry on retrievals. Furthermore, the FTS validation sites have been distributed from pole to pole to identify and remove any biases that might arise as a function of latitude or region. Target passes will be conducted over each of the OCO validation sites 1–2 times per month. The Observatory will also regularly acquire target data over homogeneous Earth scenes such as the Sahara desert [Cosnefroy et al., 1996; Dinguirard and Slater, 1999] and Railroad Valley, NV [Abdou et al., 2002] for vicarious radiometric calibration.
 The OCO observing strategy provides thousands of samples on regional scales for each 16-day orbit track repeat cycle. The observatory will collect up to 3400 soundings every time it flies over each 1000 × 1000 km region. There are at least five overflights of each region every 16 days, resulting in up to 17,000 regional soundings per repeat cycle. Analysis of high spatial resolution MODIS cloud data aggregated to the 3 km2 size of the OCO footprint indicates that on average only about 24% of these soundings will be sufficiently clear for accurate retrieval. Breon et al. recently analyzed GLAS data and determined that the global fraction of clear sky scenes (τ < 0.01) is ∼15% with an additional ∼20% of scenes having total cloud and aerosol optical depth τ < 0.2, the approximate threshold for which precise retrievals are possible [Crisp et al., 2004; Kuang et al., 2002]. Thus, the OCO sampling strategy should yield between 2600 and 6000 soundings per region per 16-day repeat cycle as candidates for retrievals. Multiple passes through each region also provide constraints on subregional spatial and temporal variations that are associated with local topography, passing weather systems or other phenomena, and provide the data needed to identify systematic biases that could compromise the data, even in persistently cloudy regions. The large number of mostly clear scenes provides sufficient sampling statistics to support the OCO baseline plan of alternating between nadir and glint observations on alternating 16-day repeat cycles.
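 The quoted range of candidate soundings can be checked by multiplying the regional sounding count by the clear-sky fractions cited above. The sketch below uses only numbers from the text; the function name and dictionary labels are illustrative.

```python
SOUNDINGS_PER_PASS = 3400    # per overflight of a 1000 x 1000 km region (from the text)
PASSES_PER_CYCLE = 5         # "at least five" overflights per 16 days (from the text)


def candidate_soundings(clear_fraction,
                        per_pass=SOUNDINGS_PER_PASS,
                        passes=PASSES_PER_CYCLE):
    """Soundings per region per 16-day cycle that survive cloud screening."""
    return per_pass * passes * clear_fraction


fractions = {
    "GLAS clear sky (tau < 0.01)": 0.15,
    "MODIS clear at 3 km2": 0.24,
    "GLAS clear plus tau < 0.2": 0.15 + 0.20,
}
for label, fraction in fractions.items():
    print(f"{label:30s} {candidate_soundings(fraction):7.0f}")
```

 The three estimates come out near 2550, 4080, and 5950 soundings, which round to the 2600 to 6000 range given in the text.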
 If all errors in the space-based retrievals were purely random and individual soundings had uncertainties of 16 ppm, then one would need to perform retrievals on only 256 out of the 2600–6000 candidate soundings to achieve estimates with precisions of 1 ppm on 16-day intervals. The OCO team has adopted a more stringent 6 ppm worst case single sounding data precision requirement to ensure that useful data can be collected even in persistently cloudy regions (e.g., the Pacific Northwest coast of North America or Northern Europe in the winter), where typical data yields are anticipated to be much less than 10% of the total number of soundings. Sensitivity analyses indicate that a 6-ppm single sounding precision requirement also provides adequate precision to identify and characterize systematic biases within individual 1000 × 1000 km2 regions.
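 For purely random, uncorrelated errors the precision of an average of N soundings improves as 1/√N, which is where the figure of 256 soundings comes from. The sketch below works through that arithmetic; the function names are illustrative, and the calculation deliberately ignores correlated errors and biases, which do not average down.

```python
import math


def soundings_needed(single_sounding_ppm, target_ppm=1.0):
    """Number of independent soundings whose mean reaches the target precision."""
    return math.ceil((single_sounding_ppm / target_ppm) ** 2)


def averaged_precision(single_sounding_ppm, n_soundings):
    """Precision of the mean of n independent soundings."""
    return single_sounding_ppm / math.sqrt(n_soundings)


REGIONAL_SOUNDINGS = 17000   # upper limit per region per 16-day cycle (from the text)

for sigma in (16.0, 6.0):
    n = soundings_needed(sigma)
    print(f"{sigma:4.0f} ppm single sounding -> {n:3d} soundings "
          f"({100.0 * n / REGIONAL_SOUNDINGS:.2f}% yield) for 1 ppm")
```

 With a 6 ppm single sounding precision only 36 clear soundings are needed for a 1 ppm regional mean, so the requirement can in principle still be met when the data yield falls well below 10%.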
4.2. Orbit Sampling Time of Day and Latitude Range
 As noted above, OCO will fly in a sun-synchronous, polar orbit with an ascending 1326 LST equator crossing time. A series of synthesis inversion calculations, using the set-up of Rayner et al. previously described, were performed to ensure that space-based data acquired at this time of day yield the precision needed to characterize regional scale CO2 sources and sinks. These OSSEs also allowed us to assess the sensitivity of flux inversions to the range of solar zenith angles (SZA) sampled by the data. Three orbit choices were tested using the full data set (i.e., no cloud obscuration):
 1. 1100 orbit, with a solar zenith angle cut-off <75° (399,171 data points).
 2. 1400 orbit, with a solar zenith angle cut-off <75° (391,796 data points).
 3. 1400 orbit, with a solar zenith angle cut-off <60° (258,143 data points).
 All times are local solar times (LST) and refer to ascending equatorial crossing. Note that with the 1-hour time step in CRC-MATCH, 1400 LST seemed the best match to the OCO equatorial crossing time of 1326 LST. All inversions assumed an OCO data precision of 1 ppm.
 Figure 12 shows January and July CO2 flux uncertainties, in gC m−2 yr−1, for each of the three orbit/SZA cases. The prior uncertainty for all regions was 2000 gC m−2 yr−1. In general, larger uncertainties are seen in smaller regions because the smaller regions are sampled less frequently than larger ones. The results for the 1100 and 1400 orbits that sample the globe at SZA < 75° are very similar. There are large uncertainties at high latitudes in the winter hemisphere where the SZAs are largest. This is more noticeable in the Northern Hemisphere in January than in the Southern Hemisphere in July because the average region size is smaller in the northern high latitudes than the southern high latitudes. As expected, the SZA < 60° case gives larger uncertainties in winter at middle to high latitudes than the SZA < 75° cases. It is noticeable that the loss of information impacts regions closer to the equator (to 30°) in the SZA < 60° case relative to the SZA < 75° cases. We would expect a SZA < 60° case to produce similar effects on the 1100 orbit.
 These simulations verify that space-based measurements from the OCO orbit are sufficient to meet the mission sampling requirements, and they provide explicit constraints on the range of SZAs over which measurements must be recorded. The OCO Science Requirements now specify that the observatory shall be capable of acquiring data at solar zenith angles as large as 75° in glint mode, and at solar zenith angles as large as 85° in nadir mode.
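 The way an SZA cut-off limits winter coverage can be illustrated with the standard relation cos(SZA) = sin φ sin δ + cos φ cos δ cos h, where φ is latitude, δ is solar declination, and h is the hour angle. The sketch below is a rough illustration only: it uses a simple cosine approximation for declination and assumes the local solar time is the same at every latitude, which is not strictly true along a sun-synchronous orbit. The function names are hypothetical and the calculation is not the orbit model used in the OSSEs.

```python
import math


def solar_zenith_deg(lat_deg, local_solar_time_h, day_of_year):
    """Approximate solar zenith angle in degrees."""
    # Simple cosine approximation to the solar declination (assumption).
    decl = math.radians(-23.44) * math.cos(2.0 * math.pi * (day_of_year + 10) / 365.0)
    hour_angle = math.radians(15.0 * (local_solar_time_h - 12.0))
    lat = math.radians(lat_deg)
    cos_sza = (math.sin(lat) * math.sin(decl)
               + math.cos(lat) * math.cos(decl) * math.cos(hour_angle))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_sza))))


def northern_limit(sza_cutoff_deg, local_solar_time_h, day_of_year):
    """Highest whole-degree northern latitude still below the SZA cut-off.

    Scans northward from the equator, which is assumed to pass the cut-off.
    """
    lat = 0
    while lat < 90 and solar_zenith_deg(lat + 1, local_solar_time_h,
                                        day_of_year) < sza_cutoff_deg:
        lat += 1
    return lat


LST = 13.0 + 26.0 / 60.0   # 1326 local solar time
for cutoff in (60.0, 75.0, 85.0):
    print(f"SZA < {cutoff:.0f}: mid-January limit near "
          f"{northern_limit(cutoff, LST, 15)}N")
```

 Under these assumptions, tightening the cut-off from 75° to 60° moves the mid-January northern limit of coverage equatorward by more than ten degrees of latitude, in line with the loss of midlatitude information described for Figure 12.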
4.3. Diurnal Sampling Bias
 In addition to providing data with adequate precision to resolve key spatial and temporal gradients, the OCO mission design also minimizes sensitivity to diurnal variations in the data. For example, Haszpra found that only measurements obtained in the early afternoon can be considered as regionally representative of the CO2 mixing ratio in the planetary boundary layer based on measurements made at two monitoring sites located 220 km apart in the Hungarian plain. The 1326 LST sun-synchronous polar orbit selected in the OCO mission design minimizes diurnal sampling bias, since the near-surface CO2 concentrations are close to their diurnally averaged values near this time of day [Olsen and Randerson, 2004]. Additionally, the largest diurnal variations in CO2 occur near the surface, and the amplitude of these variations decreases rapidly with height. XCO2 data are therefore inherently much less sensitive to diurnal variations. CASA/MATCH simulations show that the residual uncertainty after correcting XCO2 retrieved from 1326 LST observations to a 24-hour-averaged value will be <0.1 ppm, and that existing models can correct for OCO diurnal sampling bias.
 To assess the impact of the 1326 LST sampling bias on the inferred surface CO2 flux inversions, benchmark surface fluxes were estimated from orbits sampling twice a day at 0600/1800 and 1100/2300, respectively. These orbits are used only to define sampling times for the CO2 fields for comparison: for example, it would be impossible to measure reflected sunlight at 2300 globally. These fluxes are compared to the flux estimates generated from the 1400 orbit with SZA < 75°. We find that the differences between the monthly mean source estimates associated with diurnal sampling bias are usually smaller than the uncertainties on the 1400 orbit source estimates (for example, Figure 12b). Where larger differences do occur, it is not always possible to attribute these solely to diurnal biases. For example, sampling biases at high latitudes in the winter hemisphere due to the lack of sunlight are likely to swamp any diurnal effect there. This suggests that diurnal sampling biases alone are not a serious problem in estimating CO2 sources and sinks at monthly intervals.
4.4. Impact of Clouds on OCO Sampling
 We analyzed 1-km resolution MODIS cloud data to assess the science impact of cloud interference on the OCO sampling strategy. We adopted the Aqua MODIS products as the most representative of the cloud fields that OCO will encounter because OCO will fly in formation with the Aqua platform. See Figures 1 and 2 and Table 1 of Breon et al. for global distributions of clear sky and almost clear sky frequency.
 To determine the relationship between clear-sky frequency and spatial resolution, we used MODIS cloud mask results for nonpolar daytime surfaces on 5 November 2000. The pixel size for the MODIS Aqua product is 1 × 1 km. For the present analysis, the MODIS pixels were aggregated into progressively larger square arrays (2 × 2, 3 × 3, 4 × 4 km2, etc.) with the array labeled clear if at least 95% of the 1-km pixels were “confident clear” in the MODIS cloud mask process. Globally averaged results for this analysis are presented in Figure 13.
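 The aggregation procedure described above can be sketched as follows: 1-km cloud-mask pixels are grouped into non-overlapping n × n blocks, and a block counts as clear when at least 95% of its pixels are clear. The function name and the tiny example mask are hypothetical; real MODIS cloud-mask granules would need to be read and filtered for nonpolar daytime scenes first.

```python
def clear_fraction(mask, block, threshold=0.95):
    """Fraction of non-overlapping block x block arrays labeled clear.

    mask: list of rows of booleans, True meaning a "confident clear" 1-km pixel.
    Partial blocks at the right and bottom edges are ignored.
    """
    rows, cols = len(mask), len(mask[0])
    clear_blocks = 0
    total_blocks = 0
    for r0 in range(0, rows - block + 1, block):
        for c0 in range(0, cols - block + 1, block):
            n_clear = sum(mask[r][c]
                          for r in range(r0, r0 + block)
                          for c in range(c0, c0 + block))
            total_blocks += 1
            # A block is clear if at least 95% of its pixels are clear.
            if n_clear >= threshold * block * block:
                clear_blocks += 1
    return clear_blocks / total_blocks


# Hypothetical 4 x 4 km scene with a single cloudy pixel.
scene = [[True] * 4 for _ in range(4)]
scene[1][2] = False

for n in (1, 2, 4):
    print(f"{n} x {n} km field of view: clear fraction {clear_fraction(scene, n):.4f}")
```

 Even this tiny example shows the effect plotted in Figure 13: a single cloudy pixel removes 1 of 16 fields of view at 1 km, 1 of 4 at 2 km, and the only one at 4 km.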
 The clear-sky frequency decreases rapidly with increasing field of view (FOV) area up to about 36 km2 (6 × 6 km). For FOVs larger than 36 km2, the clear-sky frequency continues to decrease with increasing area, asymptotically approaching the 10% clear sky fraction commonly quoted for global averages, but the dependence on FOV area is significantly weaker. Figure 13 suggests that the clear-sky fraction for the 3 km2 OCO FOV is approximately 24%, a value more than two times larger than the 10% clear-sky fraction assumed in early OCO mission design calculations. This analysis thus increases confidence that the OCO small footprint sampling strategy will provide a sufficient number of clear soundings for accurate retrievals. The potential impact of a clear sky bias on the CO2 fluxes inferred from OCO data is a question that requires further investigation.
4.5. Flux Errors for Nadir and Glint Modes
 OCO science observations will alternate between nadir and glint observing modes on subsequent 16-day repeat cycles. In nadir mode, the spacecraft will collect data along the spacecraft ground track. This mode will provide the highest spatial resolution, and is expected to yield the most reliable data over continents, in regions occupied by patchy clouds, where spatial inhomogeneities could introduce systematic errors in the product. The primary shortcoming of this mode is that it is expected to yield lower measurement signal-to-noise ratios (SNR) over dark ocean surfaces. Glint mode addresses this issue by pointing the instrument boresight at the point on the surface where sunlight is specularly reflected toward the spacecraft. This mode is expected to yield measurements with a much higher SNR over the ocean, especially at high latitudes.
 Using the setup of Rayner et al., we performed three simulations to compare the uncertainties in the fluxes returned from observations made in nadir and glint modes. The comparison is not perfect since the glint calculations were explicitly screened for cloud as the ground track was calculated while the nadir calculations were not. To accommodate these differences, we normalized the data uncertainty to mimic equal sampling density in nadir and glint modes. The principal remaining difference between the two inversions is the coverage, which depends primarily on the choice of SZA cut-off. To focus exclusively on differences associated with the viewing geometry, we chose a SZA < 70° for both inversions. This is a somewhat pessimistic choice, since the nominal mission will acquire data at SZA as large as θo = 75° in glint mode and as large as θo = 85° in nadir mode. An additional nadir mode inversion was performed in which all soundings over oceans were omitted. This mimics a worst case scenario, where low albedos preclude reliable retrievals from nadir observations over ocean.
 Flux errors from these three inversion experiments are compared in Figure 14. The inversions show little difference in the surface flux uncertainties for the nominal glint and nadir cases. This is not surprising since both modes provide similar coverage and were constrained to provide data with the same precision. The reduced spatial coverage degrades high latitude winter performance relative to the baseline case (Figure 6a). It is also interesting to note that even if nadir observations over ocean are omitted (Figure 14c), the dense spatial coverage provided over continents by the space-based measurements still offers an advantage over land regions compared to the existing flask and augmented continuous monitoring networks (Figure 5).
4.6. Regional Scale Representativeness Errors
 Accurate surface flux inversions do not require space-based data with contiguous spatial sampling due to atmospheric transport (Chevallier et al., submitted manuscript, 2007) and representativeness scale lengths [Gerbig et al., 2003; Lin et al., 2004]. OCO uses 3-km2 footprints and a 10-km cross track swath to minimize potential biases associated with clouds and other sources of heterogeneity in the atmosphere and surface. It samples the atmosphere and surface rather than mapping them. Inferring surface CO2 fluxes from OCO data requires careful consideration, since inversion models typically use grids significantly larger (100–1000 km) than the OCO cross track swath (10 km). It is important that models aggregate OCO data to accurately represent spatial and temporal averages at the inversion model resolution. If this is not the case, the representativeness errors could be substantially increased.
 To address the question of short-term mesoscale representativeness errors in surface CO2 fluxes inferred from OCO data, we performed a 5-day simulation of surface fluxes and atmospheric CO2 concentration using the Regional Atmospheric Modeling System (RAMS) coupled to the Simple Biosphere Model (SiB2) on a nested set of four grids centered on the WLEF tall tower site in Park Falls, Wisconsin, for 26–30 July 1997. Overall results and comparison to the tower observations are reported by Nicholls et al. Here we report an analysis of potential representativeness error of north-south swaths of XCO2 in the central part of the domain, formed by a 38 × 38 grid with a 1 × 1 km grid spacing. There are 45 vertical levels in this simulation, extending to 7.2 km.
 Several small lakes in the vicinity of the tower produced anomalous surface fluxes and, more importantly, anomalous circulations on some afternoons, leading to variations of CO2 of as much as 6 ppm in the planetary boundary layer. These variations are apparent, although much weaker, in the column mean. Figure 15 shows the spatial variations in simulated XCO2 for four times during a 24-hour period extending over 28–29 July 1997.
 We evaluated possible representativeness errors associated with mesoscale variations by comparing the mean of 1-km-wide N-S swaths of simulated column mean CO2 with the “true” domain-averaged column mean mixing ratio over the 38 × 38 km grid at 1400 LST on each of the five days. The range of column mean mixing ratio at 1400 LST over the 5 days was 0.98 ppm. Spatial autocorrelation of swath means was quite high among swaths within 5 km of the target swath, so we used 19 degrees of freedom (38 swaths per day times 5 days divided by 10 autocorrelated neighboring swaths) in a t test. Under these conditions, we found that 95% of the swaths represented had mean mixing ratios within 0.18 ppm of the true domain-averaged mixing ratio.
 We also performed a similar calculation on a regional domain of 600 × 600 km with 16-km grid spacing. Substantial spatial variability is imposed on the regional scale by the presence of the Great Lakes on the east side of the domain. The range of column mean mixing ratio at 1400 LST was larger than on the mesoscale domain, with variations of 3.4 ppm over 5 days. Nevertheless, 95% of the N-S swaths captured the domain average within 0.17 ppm. These simulations provide confidence that the baseline OCO sampling strategy will deliver precise space-based data even in the presence of representativeness errors associated with realistic spatial variations in XCO2, since typical representativeness errors are much smaller than 1 ppm. These results also provide additional confidence in our ability to validate OCO space-based retrievals against ground-based solar-viewing FTS spectra obtained at Park Falls [Boesch et al., 2007; Washenfelder et al., 2007].
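 The swath comparison can be sketched as below. The code computes, for a gridded column-mean field, the difference between each N-S swath mean and the domain mean, and then forms a 95% bound using the two-sided Student t critical value for 19 degrees of freedom (2.093). The synthetic field, the function names, and the exact form of the bound are assumptions made for illustration; the paper does not spell out the statistic beyond the degrees of freedom.

```python
import random
import statistics


def effective_dof(swaths_per_day, days, autocorrelated_neighbors):
    """Degrees of freedom after discounting spatially autocorrelated swaths."""
    return swaths_per_day * days // autocorrelated_neighbors


def swath_errors(field):
    """Swath mean minus domain mean for every N-S swath.

    field[row][col]: rows run north-south, so one column is one swath.
    """
    n_rows, n_cols = len(field), len(field[0])
    domain_mean = sum(sum(row) for row in field) / (n_rows * n_cols)
    return [sum(field[r][c] for r in range(n_rows)) / n_rows - domain_mean
            for c in range(n_cols)]


T_CRIT_95_DOF19 = 2.093   # two-sided 95% Student t critical value, 19 d.o.f.

# Synthetic 38 x 38 km column-mean field: weak east-west gradient plus noise.
rng = random.Random(1)
field = [[376.0 + 0.005 * col + rng.gauss(0.0, 0.1) for col in range(38)]
         for _ in range(38)]

errors = swath_errors(field)
bound = T_CRIT_95_DOF19 * statistics.stdev(errors)
print("degrees of freedom:", effective_dof(38, 5, 10))
print(f"95% bound on swath representativeness error: {bound:.3f} ppm")
```

 Because each swath averages 38 pixels along track, small-scale variability is strongly damped in the swath mean, which is why the representativeness error stays far below the pixel-to-pixel scatter.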
5. Conclusions
 Precision requirements for space-based data have been assessed for the Orbiting Carbon Observatory mission from the results of observational system simulation experiments and synthesis inversion models. The precision requirements were determined by evaluating the variability of spatial and temporal gradients in the relationship between data precision and inferred surface CO2 flux uncertainties, and the OCO sampling strategy. The OCO measurement concept was tested using OSSEs and synthesis inversion models to infer regional scale CO2 sources and sinks from global and regional data.
 Simulated OCO data were ingested into synthesis inversion models to quantify the relationship between the precision and inferred surface-atmosphere CO2 flux uncertainties. On a global scale, uniform spatial sampling and the sheer number of space-based retrievals will still reduce the uncertainties in inferred surface-atmosphere CO2 fluxes compared to the fluxes inferred from the GLOBALVIEW-CO2 network even if the space-based data had precisions as poor as 5 ppm on regional scales. XCO2 precisions of 1–2 ppm are needed on regional scales to improve our knowledge of carbon cycle phenomena. Simulated sampling of CO2 data fields demonstrated the ability of the OCO measurement concept to constrain regional fluxes of CO2 and quantified the relationship between data precision and the ability to distinguish regional CO2 fluxes from China and India.
 The impact of systematic biases on CO2 flux uncertainties depends on the spatial and temporal extent of a bias since CO2 sources and sinks are inferred from regional-scale gradients. Source-sink inversion modeling demonstrated that a land/ocean systematic bias as small as 0.1 ppm could be identified and removed from the data product. Biases on spatial scales smaller than ∼10^4 km2 may be discounted since they will appear the same as random noise. Constant global scale biases do not affect the CO2 flux uncertainties since they introduce no error into the gradients. Persistent geographic biases at the regional to continental scale will have the largest impact on the inferred CO2 surface fluxes. Therefore the OCO validation program must identify and correct regional to continental scale biases.
6. Note Added in Proof
 Since the completion of the work reported here, Chevallier et al. (submitted manuscript, 2007) performed inversion experiments which demonstrate that OCO data reduce inferred surface CO2 flux uncertainties even more than anticipated in the analyses presented above.
 This work was supported by the Orbiting Carbon Observatory (OCO) project through NASA’s Earth System Science Pathfinder (ESSP) program. SCO and JTR were supported by a NASA IDS grant (NAG5-9462) to JTR. We thank R. Frey for the assistance with the MODIS cloud data.