##### 3.2.1. Semivariogram Model

[18] The spatial variability of X_{CO2} is quantified by modeling the semivariogram of the X_{CO2} distribution, which describes the degree to which two X_{CO2} values are expected to differ as a function of their separation distance (*h*). To evaluate this relationship, the raw semivariogram *γ*(*h*) is computed for all pairs of X_{CO2} data:

$$\gamma(h) = \frac{1}{2}\left[X_{CO2}(x_i) - X_{CO2}(x_j)\right]^2$$

where the distance (*h*) between locations *x*_{i} and *x*_{j} is the great circle distance between these points on the surface of the Earth:

$$h(x_i, x_j) = r \cos^{-1}\left[\sin\phi_i \sin\phi_j + \cos\phi_i \cos\phi_j \cos(\vartheta_i - \vartheta_j)\right]$$

and where (ϕ_{i}, *ϑ*_{i}) are the latitude and longitude of location *x*_{i}, and *r* is the Earth's mean radius. The semivariogram is used to model the spatial autocorrelation of X_{CO2} that is not explained by a deterministic trend in the data. Therefore, the X_{CO2} north–south gradient is estimated for each month using linear regression and subtracted from the data prior to the analysis.
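As an illustration of the steps in this paragraph, the following is a minimal sketch, not the code used in the study. It assumes the spherical law of cosines for the great circle distance, a mean Earth radius of 6371 km, and a single ordinary least squares line in latitude for the north–south gradient (the study fits one per month). The function names `great_circle_km`, `detrend_by_latitude`, and `raw_semivariogram` are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed value for the Earth's mean radius


def great_circle_km(lat1, lon1, lat2, lon2, r=EARTH_RADIUS_KM):
    """Great circle distance in km (inputs in degrees), spherical law of cosines."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon1 - lon2)
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    # Clamp to [-1, 1] so round-off cannot push acos out of its domain.
    return r * math.acos(max(-1.0, min(1.0, c)))


def detrend_by_latitude(lats, values):
    """Subtract an ordinary least squares line fitted against latitude."""
    n = len(lats)
    mean_lat = sum(lats) / n
    mean_val = sum(values) / n
    sxx = sum((a - mean_lat) ** 2 for a in lats)
    sxy = sum((a - mean_lat) * (v - mean_val) for a, v in zip(lats, values))
    slope = sxy / sxx
    return [v - (mean_val + slope * (a - mean_lat)) for a, v in zip(lats, values)]


def raw_semivariogram(lats, lons, values):
    """Return (h, gamma) for every unordered pair, gamma = half the squared difference."""
    pairs = []
    n = len(values)
    for i in range(n):
        for j in range(i + 1, n):
            h = great_circle_km(lats[i], lons[i], lats[j], lons[j])
            pairs.append((h, 0.5 * (values[i] - values[j]) ** 2))
    return pairs
```

The number of pairs grows quadratically with the number of grid cells, so a practical implementation would likely vectorize or subsample this loop.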

[19] A theoretical variogram model is selected on the basis of the observed variability to represent the spatial autocorrelation structure. The theoretical variogram describes the decay in spatial correlation between pairs of X_{CO2} measurements as a function of physical separation distance between these measurements. The exponential semivariogram [e.g., *Cressie*, 1993] is selected here to model MATCH/CASA X_{CO2} spatial variability, based on an examination of a binned version of the raw variogram. The exponential variogram is defined as:

$$\gamma(h) = \sigma^2\left[1 - \exp\left(-\frac{h}{L}\right)\right]$$

where *σ*^{2} represents the expected variance of the difference between X_{CO2} measurements at large separation distances, and 3*L* represents the practical correlation range between X_{CO2} measurements. These parameters also define the corresponding exponential covariance function: *C*(*h*) = *σ*^{2} exp(−*h/L*).
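The binning step and the exponential model can be sketched as follows. This is an illustrative sketch only: the 500 km bin width and the function names are assumptions, not values or names taken from the study. The input `pairs` is a list of (*h*, *γ*) tuples from a raw semivariogram.

```python
import math
from collections import defaultdict


def binned_semivariogram(pairs, bin_km=500.0):
    """Average raw (h, gamma) pairs in distance bins of width bin_km (assumed width).

    Returns a list of (bin_center_km, mean_gamma, pair_count), sorted by distance.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for h, g in pairs:
        k = int(h // bin_km)
        sums[k] += g
        counts[k] += 1
    return [((k + 0.5) * bin_km, sums[k] / counts[k], counts[k]) for k in sorted(counts)]


def exp_variogram(h, sill, L):
    """Exponential semivariogram: sill * (1 - exp(-h / L))."""
    return sill * (1.0 - math.exp(-h / L))


def exp_covariance(h, sill, L):
    """Matching exponential covariance: sill * exp(-h / L)."""
    return sill * math.exp(-h / L)
```

At *h* = 3*L* the model reaches 1 − e^{−3}, about 95% of *σ*^{2}, which is why 3*L* is treated as the practical correlation range.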

[20] The exponential model parameters are fitted to the raw semivariogram of the latitudinally detrended X_{CO2} data using nonlinear least squares. The fitted variogram parameters define the spatial covariance structure of the modeled X_{CO2} signal. The uncertainties of the least squares estimates of the variance (*σ*^{2}) and range parameter (*L*) are not reported in this study because the results are based on an exhaustive sample from the simulated field, and the uncertainty resulting from limited sampling is negligible. The majority of the uncertainty associated with variogram parameters stems from assumptions about fluxes and transport, and the sensitivity to these choices is explored in sections 3.3 and 4.3.
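The fit can be sketched in a few lines. The study does not state which nonlinear least squares solver was used. The sketch below swaps in a simple profile approach: for a fixed *L* the exponential model is linear in *σ*^{2}, so *σ*^{2} has a closed form, and *L* is found by a log-spaced grid scan refined with a golden-section search. The search bounds, grid size, and function names are assumptions. The optional `weights` argument allows a weighted fit.

```python
import math


def exp_variogram(h, sill, L):
    """Exponential semivariogram: sill * (1 - exp(-h / L))."""
    return sill * (1.0 - math.exp(-h / L))


def fit_exponential(pairs, weights=None, L_min=50.0, L_max=20000.0, n_grid=200):
    """Weighted least squares fit of (sill, L) to (h, gamma) pairs, distances in km."""
    if weights is None:
        weights = [1.0] * len(pairs)

    def profile(L):
        # For fixed L, the best sill is a weighted linear regression through the origin.
        f = [1.0 - math.exp(-h / L) for h, _ in pairs]
        num = sum(w * fi * g for w, fi, (_, g) in zip(weights, f, pairs))
        den = sum(w * fi * fi for w, fi in zip(weights, f))
        sill = num / den
        sse = sum(w * (g - sill * fi) ** 2 for w, fi, (_, g) in zip(weights, f, pairs))
        return sill, sse

    # Coarse scan on a log-spaced grid of candidate range parameters.
    grid = [L_min * (L_max / L_min) ** (k / (n_grid - 1)) for k in range(n_grid)]
    k = min(range(n_grid), key=lambda i: profile(grid[i])[1])
    a = math.log(grid[max(k - 1, 0)])
    b = math.log(grid[min(k + 1, n_grid - 1)])

    # Golden-section refinement of log(L) inside the bracketing grid interval.
    phi = (math.sqrt(5.0) - 1.0) / 2.0
    for _ in range(60):
        c = b - phi * (b - a)
        d = a + phi * (b - a)
        if profile(math.exp(c))[1] < profile(math.exp(d))[1]:
            b = d
        else:
            a = c
    L = math.exp(0.5 * (a + b))
    return profile(L)[0], L
```

For example, `fit_exponential(pairs)` applied to noise-free pairs generated from `exp_variogram` should recover the generating parameters to within the search tolerance.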

##### 3.2.2. Spatial Variability Analysis

[21] The global spatial variability is defined through semivariogram parameters fitted to the raw semivariogram. For each day, the raw semivariogram is constructed using detrended MATCH/CASA X_{CO2} at 1300 local time for all model grid cells. The analysis is repeated for each day of the model year to identify both the seasonal trends in global variability at daily resolution, and the relationships between these trends and seasonal changes in global CO_{2} flux and transport.

[22] Regional variability in the spatial covariance structure is evaluated through localized variograms representing subareas of the global domain. This analysis requires areas (regions) large enough to capture the scales of variability within a given subdomain of the model, while at the same time small enough to reveal the characteristics of local spatial variability.

[23] A regional variability analysis with a similar methodological goal was previously adopted by *Doney et al.* [2003] to measure the mesoscale global spatial variability of satellite measurements of ocean color. In that study, daily anomalies from the monthly block mean of the natural log of chlorophyll concentrations were used to fit spherical variograms for nonoverlapping 5° regions globally.

[24] In the case of X_{CO2}, regional covariance parameters were fit for each model grid cell, resulting in a regional spatial variability analysis at a 5.5° resolution. Because regional spatial variability may reflect global general circulation patterns as well as differences in surface fluxes between regions, correlation lengths of X_{CO2} may extend beyond individual continents or ocean basins. To account for this, the local semivariogram parameters in the current work are constructed to reflect both the local variability and its relationship to global spatial variability. First, regions are defined as overlapping 2000 km radius circles centered at each model grid cell, resulting in a total of 2048 regions covering the globe. A 2000 km radius was selected because it is sufficiently large to capture much of the variability in the vicinity of a given grid cell, while being small enough to capture regional variability in the spatial covariance structure. Second, the raw semivariogram (*γ*_{region}(*h*)) is constructed using pairs of points with one point always within the defined region (*X*_{CO2}(*x*_{region})) and the other either within or outside that region (*X*_{CO2}(*x*_{region} + *h*)) (see Figure 1). This approach focuses on the variability observed within each subregion, while also accounting for larger scales of variability:

$$\gamma_{region}(h) = \frac{1}{2}\left[X_{CO2}(x_{region}) - X_{CO2}(x_{region} + h)\right]^2$$

Third, to emphasize the covariance of X_{CO2} within the analyzed region, weighted nonlinear least squares is used to fit the local semivariogram parameters, with higher weights assigned to points within a separation distance less than or equal to 4000 km. Numerically, correlation lengths are also restricted to a maximum of half the Earth's circumference.
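The pair selection and weighting described above can be sketched as follows. The 2000 km radius and 4000 km cutoff come from the text. The weight values (10 for near pairs, 1 for far pairs) are purely hypothetical, since the actual weighting scheme is not specified here, and the function names are likewise assumptions.

```python
import math


def great_circle_km(lat1, lon1, lat2, lon2, r=6371.0):
    """Great circle distance in km (inputs in degrees); r is an assumed mean radius."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon1 - lon2)
    c = math.sin(p1) * math.sin(p2) + math.cos(p1) * math.cos(p2) * math.cos(dlon)
    return r * math.acos(max(-1.0, min(1.0, c)))


def regional_pairs(center, lats, lons, values, radius_km=2000.0):
    """Raw (h, gamma) pairs in which at least one point lies inside the region."""
    clat, clon = center
    inside = [great_circle_km(clat, clon, la, lo) <= radius_km
              for la, lo in zip(lats, lons)]
    pairs = []
    n = len(values)
    for i in range(n):
        if not inside[i]:
            continue  # the first point of every pair must be inside the region
        for j in range(n):
            # Skip self-pairs, and count each inside-inside pair only once.
            if j == i or (inside[j] and j < i):
                continue
            h = great_circle_km(lats[i], lons[i], lats[j], lons[j])
            pairs.append((h, 0.5 * (values[i] - values[j]) ** 2))
    return pairs


def distance_weights(pairs, h_cut_km=4000.0, w_near=10.0, w_far=1.0):
    """Hypothetical two-level weights: larger for pairs separated by <= h_cut_km."""
    return [w_near if h <= h_cut_km else w_far for h, _ in pairs]
```

The resulting pairs and weights would then be passed to a weighted least squares fit of the exponential model, with the fitted range parameter capped at half the Earth's circumference.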

[25] Conceptually, a higher variance is representative of more overall variability, as is a shorter correlation length, which is indicative of more variability at smaller scales. The parameter *h*_{o} is introduced to provide a single representation of the degree of variability observed in different regions, and to merge information about both the variance and correlation lengths of X_{CO2} variability. If we consider a single sounding at a known location, *h*_{o} is defined as the maximum distance from the sounding location at which the mean squared X_{CO2} prediction error is below a preset value, *V*_{max}. The mean squared prediction error is the uncertainty associated with using the sounding to predict the unknown value at a given distance away from the sounding location, using ordinary kriging. Ordinary kriging is a minimum variance unbiased interpolator that takes advantage of knowledge of the spatial covariance structure to interpolate available measurements while providing an estimate of the interpolation error [*Chiles and Delfiner*, 1999]. For an exponential variogram:

$$h_o = -L_R \ln\left(1 - \frac{V_{max}}{2\sigma_R^2}\right)$$

where *σ*_{R}^{2} and *L*_{R} are the fitted regional variance and range parameter, respectively.
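The closed form for *h*_{o} under the exponential model can be traced in a few steps. The sketch below relies on standard ordinary kriging results rather than on a derivation given in the text. With a single sounding at *x*_{1}, the unbiasedness constraint forces its weight to equal one, so the prediction at *x*_{0} is the observed value itself and the mean squared prediction error is the variance of the increment over the separation distance *h*:

$$
\begin{aligned}
\sigma_{OK}^2(h) &= E\left[\left(X_{CO2}(x_0) - X_{CO2}(x_1)\right)^2\right] = 2\gamma(h) = 2\sigma_R^2\left[1 - \exp\left(-\frac{h}{L_R}\right)\right] \\
2\sigma_R^2\left[1 - \exp\left(-\frac{h_o}{L_R}\right)\right] &= V_{max} \\
h_o &= -L_R \ln\left(1 - \frac{V_{max}}{2\sigma_R^2}\right), \qquad V_{max} < 2\sigma_R^2
\end{aligned}
$$

When *V*_{max} ≥ 2*σ*_{R}^{2}, the prediction error never reaches the threshold at any distance, so *h*_{o} is unbounded under this model.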

[26] Both a higher regional variance *σ*_{R}^{2} and a shorter regional range parameter *L*_{R} lead to a decrease in the overall spatial scale over which a given measurement is representative of the surrounding X_{CO2} values. It should be noted that no measurement error is assumed in the calculation of the regional variance *σ*_{R}^{2} and range parameter *L*_{R}. Therefore, the resulting *h*_{o} values demonstrate the overall spatial scale of the information provided by a noise-free X_{CO2} measurement over the measurement region and time.
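The dependence of *h*_{o} on the two parameters can be checked numerically. The sketch below assumes that the single-sounding ordinary kriging error variance for the exponential model is 2*σ*_{R}^{2}[1 − exp(−*h*/*L*_{R})], and uses the 0.25 ppm^{2} threshold adopted in this study as the default *V*_{max}. The function name `h_o` and the parameter values in the usage note are illustrative only.

```python
import math


def h_o(sill, L, v_max=0.25):
    """Distance at which the single-sounding kriging error variance reaches v_max.

    sill: regional variance (ppm^2); L: regional range parameter (km);
    v_max: error variance threshold (ppm^2). Returns a distance in km.
    """
    ratio = v_max / (2.0 * sill)
    if ratio >= 1.0:
        # The error variance saturates at 2 * sill and never reaches v_max.
        return math.inf
    return -L * math.log(1.0 - ratio)
```

For instance, `h_o(1.0, 1000.0)` is larger than both `h_o(2.0, 1000.0)` and `h_o(1.0, 500.0)`, consistent with the statement that a higher variance or a shorter range parameter reduces the representative spatial scale.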

[27] In subsequent sections of this work, variability inferred from the MATCH/CASA model is compared to other models and field data, where different theoretical variogram models are used to represent X_{CO2} spatial variability. Because parameters used to describe the variability differ between variogram models, the *h*_{o} parameter also provides a convenient universal metric that can be compared across models. The equivalent *h*_{o} parameters for the other variogram models used in this study are presented in subsequent sections.

[28] Conceptually, the *h*_{o} parameter can also be thought of as a measure of the expected relative spatial density of retrieved soundings that would be required to capture the spatial variability of X_{CO2} over different regions. The choice of *V*_{max} is somewhat flexible, but should represent a level of interpolation uncertainty that is relevant to potential applications of the data. In the presented results, *V*_{max} is chosen to be 0.25 ppm^{2} (√V_{max} = 0.5 ppm). This level is comparable to the 1 ppm regional-scale uncertainty described as a goal for OCO [*Chevallier et al.*, 2007]. It should be noted that *V*_{max} represents the interpolation uncertainty assuming no measurement error. Thus, the lower variance was chosen to compensate for the additional uncertainty that would be contributed by measurement errors and other sources of error.