Global Biogeochemical Cycles

Using high temporal frequency data for CO2 inversions



[1] Almost all previous inversions for CO2 sources and sinks from atmospheric measurements have used monthly or annual mean data. Here, we assess the potential benefits and challenges of using higher time resolution CO2 concentration data. Inversions are performed with synthetic data since there are currently only limited continuous CO2 measurements available. The use of synthetic data also enables the inversion quality to be evaluated in terms of both systematic biases in the estimated sources and the magnitude of the source uncertainties. When estimating sources for 22 large regions, we find that using high time resolution data gives large reductions in uncertainty compared to using monthly mean data. Greater reductions are achieved (around 80%) for smaller rather than larger networks (around 45%), although larger networks give lower uncertainties overall (0.35 Gt C yr−1 on average compared to 1.2 Gt C yr−1). In most cases, inversion biases are larger than the estimated uncertainties (by up to 6 Gt C yr−1) due to the assumptions made in the inversion about the spatial distribution of sources within regions. These biases can be significantly reduced by estimating the sources for smaller regions. This is demonstrated using a case study that subdivides Australia into the model grid cells. Sources are successfully estimated from high time resolution data for the individual grid cells and for the total Australian source.

1. Introduction

[2] A useful context for this study is provided by a recent article [Cihlar et al., 2002] that outlines the requirements for an integrated carbon observing system. Cihlar et al. [2002] identify three reasons why better estimates of CO2 sources and sinks, particularly terrestrial ones, are needed: the “policy imperative” of the Kyoto protocol and other international agreements, “improved knowledge of the carbon cycle,” and “information on the biosphere to support sustainable development and resource management.” They also acknowledge that current observations (both terrestrial and atmospheric) and analysis methods are insufficient to quantify the CO2 sources and sinks with the necessary accuracy to meet these needs.

[3] One of the current analysis methods listed by Cihlar et al. [2002] is atmospheric inversion. This method uses atmospheric measurements of CO2 at a network of sites to estimate CO2 sources and sinks by modeling atmospheric transport. Inversion methods have been extensively developed over the last couple of decades, progressing from two- to three-dimensional transport models, from annual-mean to fully time-dependent cases, and with improving uncertainty analysis. However, the results remain limited by the available data and model deficiencies, such that inversions can only estimate sources for large regions (typically about 10–20 globally) and with substantial uncertainties [Cihlar et al., 2002]. This can be illustrated by results from the TransCom inversion intercomparison, which found, in an annual-mean inversion for 22 regional sources, that source uncertainties related to limited CO2 data ranged from 0.2 to 1.1 Gt C yr−1 while source uncertainties related to model transport ranged from 0.1 to 0.8 Gt C yr−1 [Gurney et al., 2002]. The uncertainties were often of similar magnitude to the source estimates themselves.

[4] Current inversions rely on CO2 concentrations provided by flask sampling. Flasks are collected at approximately 100 locations worldwide by a number of laboratories. The samples are usually taken under so-called “clean-air” or “baseline” conditions. The aim is for the air sample to be representative of large-scale air masses and not nearby (terrestrial) sources and sinks. For this reason the network is biased towards sites that sample marine boundary layer air. Cihlar et al. [2002] propose that future inversions will also need to incorporate other CO2 measurements, such as those from continuous analyzers or proposed satellite instruments. They recognize that development of the analysis methods and models is required in order to make the best use of these additional data. This study is intended as a first step in that methodological development.

[5] Global CO2 inversions to date have used monthly mean, or less frequent, concentration data, where the monthly means are determined from a smooth curve fit to baseline (usually flask) data, such as those the GLOBALVIEW-CO2 [2000] data set provides. There are a couple of reasons why we might anticipate that more frequent data from a continuous analyzer would provide the inversion with new information. The first potential advantage is that high time resolution data may be expected to have greater signal strength. Since atmospheric transport appears highly diffusive on timescales longer than the dominant synoptic timescale (about a week in the extratropics), monthly mean concentration observations see diluted signals from source regions. This problem is compounded when baseline sampling protocols avoid air with recent contact with potentially important source regions. Thus, flask samples intentionally remove part of the signal that the inversion may be able to interpret while high frequency data include it. The other advantage is the use of atmospheric flow variations as a differential sampling tool. If we attempt to use monthly mean concentrations to estimate monthly mean sources, we are left using the average transport field for that month; that is, we are really sampling an average source field only once. With higher time resolution we may sample different regions at the one measurement location depending on the wind regime.

[6] Continuous CO2 measurements have been collected at a limited number of sites for some decades and there are now about 20 continuous analyzers worldwide operated by different groups. Their deployment has been limited because the instruments are expensive to operate and require frequent human intervention. This is largely due to the quantities of well-characterized calibration gases required to maintain the stability of the concentration measurement over time. The data from these instruments are sometimes incorporated into CO2 inversions but only as monthly means derived from baseline selected measurements.

[7] In addition to incorporating current nonbaseline data into inversions, new continuous measurement systems are now becoming available that promise the capacity for extremely precise measurement and greatly reduced consumption of calibration gases [Da Costa and Steele, 1997, 1999]. Such instruments expand the possibilities for measurement sites since they are more capable of remote deployment. An example of data from such a prototype instrument system operating at Cape Grim, Tasmania (40.7°S, 144.7°E) is shown in Figure 1. While data are available at 1-min frequency, we have plotted 4-hourly data for easier comparison with the model-generated data used in our inversion tests (see Figure 3). We also only show data for those periods when the hourly average wind speed is greater than 5 m s−1 so as to avoid those periods when the data are dominated by local CO2 fluxes. The measurements show a small seasonal cycle and positive trend in the baseline data and large (5–10 ppm) positive deviations from baseline concentrations. Negative deviations are less common and mostly occur in late winter and spring (August–September). Given the increasing availability of this type of data, it is timely that we consider the ways in which continuous data can be used effectively for global inversions.

Figure 1.

Four-hourly CO2 concentrations (in ppm) averaged from 1-min observations at Cape Grim, Tasmania (40.7°S, 144.7°E) during 2001. Data are excluded when the wind speed is below 5 m s−1.

[8] Nonbaseline data are already being used on regional scales. Some of this work is reviewed by Enting [2002]. Many of these studies have used “tracer ratio” techniques where one tracer, with a known source, is used to calibrate sources of another tracer on the basis of the ratio of their concentrations or more commonly the ratios of deviations in concentration above the baseline level. This technique is common in urban pollution studies. On the larger scales that we consider, most such studies have used data from either Cape Grim, Tasmania [e.g., Wang and Barret, 2001] or Mace Head, Ireland. In each case there are adjacent ocean regions from which “background” concentrations are obtained and, in other directions, regions with agricultural, industrial, and natural fluxes. Regional studies that are closer to the “synthesis” approach used in global inversions are those of Stijnen et al. [1997], Seibert [2000], and Ryall et al. [2001].

[9] We follow a synthesis inversion approach here, estimating sources and their uncertainties from model-generated concentration data. This follows previous network design studies by Rayner et al. [1996] and Gloor et al. [2000], who examined how best to extend the flask-sampling network. Gloor et al. [2000] also considered airborne sampling. More speculatively, Rayner and O'Brien [2001] investigated the potential impact of monthly mean column-integrated CO2 amounts uniformly tiling the globe. This was intended as some measure of the potential for space-based measurement.

[10] As a first step in testing new inversion techniques, the use of synthetic data has a number of advantages. Since the data are generated from known sources, any bias in the inversion estimates can be assessed. New data can also be added to the inversion for locations where monitoring does not currently exist, or in the case here, does not exist at high temporal frequency. Reductions in estimated source uncertainty will indicate how useful the new data would be. We use the same transport model to both generate the synthetic data and to perform the inversion, which means that transport can be considered perfect. This is an advantage in these initial tests as it allows us to clearly identify (and hopefully solve) other difficulties with the inversion. It is also clearly a limitation that needs to be addressed in the future, since the requirement to accurately model synoptic or shorter timescale concentration variations will place new demands on global atmospheric transport modeling. A further limitation of our tests here is that the sources that we use to generate the synthetic data do not vary on timescales less than a month. This neglects such important features as the large diurnal cycle in land fluxes. Nevertheless, the first step remains important; until we test whether we are able to successfully invert monthly varying sources using high frequency data, we are right to be cautious about including sources with submonthly variability. The inclusion of this variability will be an important topic for future work.

[11] The outline of the paper is to consider a series of extensions of the usual monthly mean inversion framework. In section 2 we briefly sketch the general form of these inversions. Section 3.1 compares inversions with monthly mean and high time resolution data in cases using small and moderately large observing networks. Despite the simplifications of the synthetic data test, biases occur in the estimated sources principally due to “aggregation” errors. This type of error was demonstrated by Kaminski et al. [2001] for monthly mean inversions. We test a possible solution for these errors using Australia as an example. Section 3.3 presents the more spatially detailed description of Australian fluxes, made both possible and necessary by the inclusion of the high frequency data. In the appendix, we address an additional problem posed by the nonlinearity of transport model advection codes.

2. Method

[12] The general form of our experiments is to use the technique of synthesis inversion to attempt to recover a set of known sources from concentrations. The concentrations were generated by running these known sources through the same transport model used in the inversion so that errors in transport itself are eliminated. Here we use the MATCH transport model [Rasch et al., 1997] in the same configuration as used by Law and Rayner [1999].

2.1. Synthesis Inversion

[13] The technique of synthesis inversion has been used in many studies for estimating surface sources of CO2 [e.g., Enting et al., 1995; Bousquet et al., 1999]. The background of the technique is described by Enting [2000, 2002]. Briefly, we divide the Earth's surface into a number of regions for which we wish to estimate fluxes. We similarly divide our study period into a number of time periods, usually months or years. We term the flux distributions within these regions, in space and time, basis functions. Basis functions may often have internal structure reflecting, for example, positions of deserts versus forests or (in time) the shape of the seasonal cycle.

[14] Each basis function is used as a flux boundary condition for an atmospheric transport model and we sample the generated concentration fields at chosen sampling locations. The sampled concentrations are termed the response function corresponding to a given basis function. The linearity of concentrations as a function of sources implies that the linear combination of response functions that most closely matches the observed concentration field corresponds to the best estimate of the sources. We can hence use linear least squares fitting to find the magnitudes for each basis function.

[15] Linear least squares fitting involves minimizing the distance between the linear combination of response functions and the observed concentration field (arranged as a vector). Since not all concentrations are equally well determined, or can be equally well modeled, we weight each datum by a corresponding uncertainty. Also, we can include prior knowledge of the source fields by choosing initial magnitudes for each basis function, again weighted by a corresponding uncertainty.

[16] As well as returning best estimates for the basis function multipliers, the linear fit also returns their uncertainties in the form of a covariance matrix. One important corollary of the linear fitting we use is that these estimated uncertainties are not a function of the data themselves, only of the prior source uncertainties, the data uncertainties, and the response functions. This property has been previously used by, for example, Gloor et al. [2000] and Rayner and O'Brien [2001] to study the properties of potential new measurements to constrain source estimates.
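The fit described in paragraphs [14]–[16] can be sketched in a few lines. The following is a minimal illustration, not the code used in this study: it assumes uncorrelated (diagonal) data and prior uncertainties, and the function name, the toy response matrix `G`, and all numerical values are hypothetical. It also shows that the posterior covariance is unchanged when the data values change.

```python
import numpy as np

def synthesis_inversion(G, d, data_sigma, s_prior, prior_sigma):
    """Bayesian linear least squares fit (illustrative sketch).

    G           : (n_obs, n_basis) response functions, one column per basis function
    d           : (n_obs,) observed concentrations
    data_sigma  : (n_obs,) data uncertainties (assumed uncorrelated)
    s_prior     : (n_basis,) prior basis function multipliers
    prior_sigma : (n_basis,) prior source uncertainties (assumed uncorrelated)
    """
    G = np.asarray(G, dtype=float)
    d = np.asarray(d, dtype=float)
    s_prior = np.asarray(s_prior, dtype=float)
    R_inv = np.diag(1.0 / np.asarray(data_sigma, dtype=float) ** 2)
    P_inv = np.diag(1.0 / np.asarray(prior_sigma, dtype=float) ** 2)
    # Posterior covariance depends only on G and the two sets of uncertainties.
    post_cov = np.linalg.inv(G.T @ R_inv @ G + P_inv)
    # Posterior estimate: prior plus a weighted correction from the data mismatch.
    s_hat = s_prior + post_cov @ G.T @ R_inv @ (d - G @ s_prior)
    return s_hat, post_cov

# Hypothetical toy problem: 3 observations, 2 basis functions.
G = np.array([[1.0, 0.2],
              [0.5, 0.5],
              [0.1, 1.0]])
true_sources = np.array([2.0, -1.0])
d = G @ true_sources  # noise-free pseudodata

s_hat, cov = synthesis_inversion(G, d, np.full(3, 0.1), np.zeros(2), np.full(2, 10.0))
# Same setup with different data values: the covariance is identical.
_, cov_shifted = synthesis_inversion(G, d + 5.0, np.full(3, 0.1), np.zeros(2), np.full(2, 10.0))
```

Because `cov` does not depend on `d`, the uncertainty reduction offered by a candidate network can be evaluated before any measurements exist, which is the property exploited in the network design studies cited above.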

2.2. Problem Specifics

[17] In order to set up a specific synthesis inversion experiment, we need to choose basis functions, prior source estimates (with uncertainties), and data (with uncertainties). Since we are performing synthetic data experiments to assess the value of potential data sources, it is important to make choices reflecting a real situation. In particular, we should not insert too much knowledge about the correct answer.

[18] Of the choices, the most troublesome is the basis functions. This is because the main cost of synthesis inversion is calculating a response function for each basis function since this involves a run of an atmospheric transport model. The model needs to be run for long enough so that all important signatures of the basis function in the concentration field have been captured, characteristically until concentrations have come to equilibrium, i.e., 3–4 years for tropospheric measurements. For 22 spatial regions and monthly time resolution, the required 264 atmospheric transport model runs can be quite demanding. Worse yet, Kaminski et al. [2001] have shown that there can be a serious bias caused by using too few regions. In general one should use as many basis functions as possible. This will lead to higher uncertainty (there are more degrees of freedom) but one can always combine regions afterwards and hopefully reduce the resulting uncertainty.

[19] In this paper we perform inversions at monthly time resolution; that is, we estimate monthly mean sources. Spatially, we use two resolutions. The first uses the same 22 basis functions (11 over land and 11 over ocean) as used in the TransCom3 model comparison [Gurney et al., 2002] while the second includes the subdivision of one region, Australasia. The 22 basis functions are shown in Figure 2a. Each ocean basis function has uniform flux within its region and regions are contiguous. Land basis functions are structured internally according to the net primary productivity (NPP) simulated by the CASA model of Potter et al. [1993] and Field et al. [1995] as used by Randerson et al. [1997]. This structure reflects knowledge we would use in a real situation. It does not enforce a correct answer since the pseudodata we use are constructed using fluxes of net ecosystem production that differ in structure from month to month and are always different from the NPP basis functions.

Figure 2.

Distribution of regions and sites used in the inversions. Islands are generally part of the nearest mainland region: (a) 12 sites and region names, and (b) 84 sites. In the 84-site case, vertical profiles are represented by only one dot.

[20] We require prior estimates of sources and their uncertainties. Here we use prior source magnitudes of zero for all regions and prior source uncertainties of 10 Gt C yr−1. This very loose prior estimate both avoids biasing the answer too much and allows enough freedom to match the large seasonal cycle over some land regions. The total annual source is constrained to match the global growth rate of the data being inverted. This is similar to the explicit specification of an atmospheric growth rate in inversions using real data.

2.3. Pseudodata and Uncertainties

[21] The synthetic data, or “pseudodata,” that we invert are from a forward model simulation that is forced with the fossil, neutral biosphere, and ocean source fluxes used in TransCom3. The fossil emissions are constant in time while the biosphere and ocean fluxes vary with month. The fossil emissions are from Andres et al. [1996] and total 5.812 Gt C yr−1 globally. The biosphere emissions are from the CASA model as used by Randerson et al. [1997]. The ocean emissions are from the air-sea fluxes derived from pCO2 measurements by Takahashi et al. [1999].

[22] The forward simulation ran for 4 years, and concentration data from the fourth year are used for the inversion. Data were saved at 4-hourly resolution at the 228 locations specified in the TransCom3 experimental protocol [Gurney et al., 2000]. Monthly mean concentrations were saved for all model grid points. We compare inversions using 4-hourly data with inversions using monthly mean data; for each site, 12 data points are input to the inversion in the monthly case compared to 2190 data points in the 4-hourly case.

[23] We perform test inversions with a small network of 12 sites and a large network of 84 sites. The 12 sites were chosen to be spread reasonably evenly across the globe, and the 84 sites were all the GLOBALVIEW-CO2 [2000] sites for which model data had been saved at high time frequency. The two networks are shown in Figure 2. Note that a number of the observing sites have been moved offshore according to the TransCom3 protocol, to approximate the baseline air measured by the flask network. The shift of the sites is not important in this context as the data we invert are for the same locations as the responses.

[24] Cape Grim is one of the sites that has been moved offshore. Nevertheless it is interesting to compare the pseudodata for this location shown in Figure 3 with the observations shown in Figure 1. The same basic features are seen in the pseudodata as in the observations. The seasonal cycle and trend in the baseline data are reasonably well represented though the pseudodata seasonal cycle is a little larger than observed. The nonbaseline deviations in the pseudodata are much smaller than in the observations, which is to be expected since the sampled grid cell is much farther from land than in reality and Tasmania is also not represented in the model. The negative deviations occur at approximately the correct time of year, which is pleasing.

Figure 3.

Four-hourly modeled CO2 concentration (in ppm) at a grid cell offshore from Cape Grim from a simulation with fossil, neutral biosphere, and ocean CO2 emissions.

[25] The inversion method used here requires that a data uncertainty be applied to each data value; how to choose the appropriate data uncertainty is not obvious. Early synthesis inversions tended to use constant data uncertainties in time and space, usually taking values of about 0.3–0.6 ppm, somewhat larger in magnitude than measurement precision. More recently, we have come to understand that the data uncertainties need to reflect not only the measurement precision but also our ability to model point data using coarse-grid transport models and even coarser descriptions of sources. Errors of this latter type can dominate measurement uncertainties [Kaminski et al., 2001]. Thus, more recent inversions have chosen data uncertainties that vary spatially, often derived from the residual standard deviation of flask samples around a smooth curve fit. This gives, for example, higher values for Northern Hemisphere, continental sites and lower ones for remote, southern sites. Data uncertainties applied to annual (or longer) means have typically been 0.3–2 ppm [e.g., Gurney et al., 2002]. Somewhat larger values would be applied to monthly mean data; for example, Bousquet et al. [2000] set a minimum of 0.5 ppm but do not report an upper value. The sensitivity of annual-mean inversions to the choice of data uncertainty has recently been explored as part of the TransCom3 intercomparison (manuscript in preparation) but similar testing with monthly mean inversions has not yet been done.

[26] Given this evolving understanding of data uncertainty choice, but with no previous applications to high frequency data, we test here both constant and spatially variable data uncertainties. For the constant data uncertainty we use a value of 2 ppm for the 4-hourly data. The choice of this value is arbitrary but was intended to be indicative of how well we might hope to be able to model a 4-hourly concentration. Assuming (optimistically) that the uncertainties are independent in time, then the data uncertainty will decrease as the 4-hourly values are averaged over longer periods, i.e.,

$$\sigma_n = \frac{\sigma_1}{\sqrt{n}}$$

where σ1 is the data uncertainty applied to the 4-hourly data and σn is the uncertainty for the average of n data values.

[27] For monthly data, the uncertainty is consequently reduced to 0.15 ppm. This is a lower uncertainty than one would normally use for a monthly mean inversion (as indicated above). However, we chose this form (of uncertainty reduction with increasing n) since we were interested in comparing the sampling characteristics of high- and low-frequency data. If we used a more realistic uncertainty for the monthly mean, then the high-frequency data would be favored not because of its characteristics but simply because of its greater density in time.
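The 0.15 ppm figure can be checked with a short calculation. The sketch below assumes that the uncertainty of an n-value mean scales as σ1/√n (the standard result for independent errors) and that each month holds one twelfth of the 2190 four-hourly values in a year; the variable names are ours.

```python
import math

sigma_4hourly = 2.0               # ppm, constant uncertainty applied to 4-hourly data
values_per_year = 2190            # 4-hourly values per site per year (365 days x 6)
values_per_month = values_per_year / 12   # 182.5 on average

# Uncertainty of a monthly mean if the 4-hourly errors are independent in time.
sigma_monthly = sigma_4hourly / math.sqrt(values_per_month)
print(f"{sigma_monthly:.3f} ppm")
```

The result rounds to 0.15 ppm, matching the value quoted for the monthly mean inversions.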

[28] For the variable data uncertainties, the uncertainty applied to the 4-hourly data was based on the short timescale variability in the pseudodata records with which the inversions are forced. A running mean, calculated over 59 4-hour time steps (approximately 10 days), was removed from the pseudodata record at each site. The standard deviation of the remaining concentration was calculated, and the uncertainty used was one plus this standard deviation. The addition of the value one is an arbitrary choice but follows the practice in annual- and monthly mean inversions of setting a minimum data uncertainty of 0.2–0.5 ppm. The minimum is understood as incorporating modeling limitations as well as possible intercalibration uncertainties between sites. The resulting data uncertainties ranged from 1.07 ppm at South Pole to 3.64 ppm at Black Sea (46.0°N, 33.8°E) with a mean data uncertainty across the 84-site network of 1.89 ppm, similar to the 2 ppm used in the constant uncertainty case.
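The procedure in paragraph [28] can be sketched as follows. This is an illustration only: the text does not specify how the running mean is treated at the ends of the record or whether a population or sample standard deviation was used, so the centered window truncated at the record edges and the population standard deviation are our assumptions, as are the function name and the example series.

```python
import math
import statistics

def variable_data_uncertainty(series, window=59, floor=1.0):
    """Return floor + std of the series after removing a centered running mean.

    window : running-mean length in 4-hour steps (59 steps is about 10 days)
    floor  : constant added to the standard deviation (1 ppm in the text)
    Assumption: near the ends of the record the window is simply truncated.
    """
    half = window // 2
    residuals = []
    for i in range(len(series)):
        lo = max(0, i - half)
        hi = min(len(series), i + half + 1)
        running_mean = sum(series[lo:hi]) / (hi - lo)
        residuals.append(series[i] - running_mean)
    return floor + statistics.pstdev(residuals)

# Hypothetical one-year record: slow seasonal cycle plus faster synoptic variability.
record = [370.0
          + 3.0 * math.sin(2 * math.pi * t / 2190)   # seasonal cycle
          + 0.5 * math.sin(2 * math.pi * t / 20)     # variability with a ~3-day period
          for t in range(2190)]
sigma_site = variable_data_uncertainty(record)
```

A record with no short timescale variability returns exactly the 1 ppm floor, while sites with strong synoptic variability are assigned larger uncertainties.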

3. Results and Discussion

[29] Three sets of results will be presented. In the first set, we compare inversions using small and large networks with monthly mean and 4-hourly data and constant data uncertainties. In the second set, we test whether the 84-site, 4-hourly data inversion can be improved by using smaller prior source uncertainties or variable data uncertainties or by removing individual sites. In the final set, we demonstrate the value of subdividing regions, using the Australian region as a test case.

[30] To assess the inversions, we define two measures of inversion quality, the root mean square bias (RMSB) and the root mean square uncertainty (UNC). The first is a measure of how close the estimated sources for a given region are to the correct sources across all months,

$$\mathrm{RMSB} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(E_n - C_n\right)^2}$$

where Cn is the correct source for month n, En is the estimated source and N is the number of months. The second is derived from the uncertainties on the monthly source estimates generated by the inversion,

$$\mathrm{UNC} = \sqrt{\frac{1}{N}\sum_{n=1}^{N} U_n^2}$$

where Un is the uncertainty for month n. This uncertainty gives an indication of how well a region is constrained by the data being inverted. Using these measures a good quality inversion requires RMSB to be smaller than UNC and UNC to be as small as possible.
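The two quality measures are simple to compute. The sketch below takes RMSB as the root mean square of En − Cn over the N months and UNC as the root mean square of the monthly uncertainties Un, which is how we read the definitions in paragraph [30]; the function names and the example values are hypothetical.

```python
import math

def rmsb(correct, estimated):
    """Root mean square bias between estimated and correct monthly sources."""
    if len(correct) != len(estimated):
        raise ValueError("correct and estimated must have the same length")
    n = len(correct)
    return math.sqrt(sum((e - c) ** 2 for c, e in zip(correct, estimated)) / n)

def unc(uncertainties):
    """Root mean square of the monthly source uncertainties."""
    n = len(uncertainties)
    return math.sqrt(sum(u ** 2 for u in uncertainties) / n)

# Hypothetical region with four months of sources (Gt C/yr).
correct = [1.0, 0.5, -0.5, -1.0]
estimated = [1.2, 0.1, -0.3, -1.6]
uncertainty = [0.3, 0.3, 0.4, 0.4]

region_rmsb = rmsb(correct, estimated)
region_unc = unc(uncertainty)
# By the criterion in the text, a good quality inversion has region_rmsb < region_unc.
```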

3.1. Small Network Versus Large Network

[31] Figure 4 shows the RMSB and UNC for each region from inversions with 12 and 84 sites and at 4-hourly and monthly data frequency. Land regions are ordered approximately north to south in the top half of Figure 4 and ocean regions are in the bottom half of Figure 4.

Figure 4.

Root mean square bias (RMSB) and uncertainty (UNC) for four inversions, 12 sites with monthly data (dashed line), 12 sites with 4-hourly data (solid line), 84 sites with monthly data (bold, dashed line), and 84 sites with 4-hourly data (bold, solid line). The units are Gt C yr−1 and the region names correspond to those of Figure 2.

[32] Except for the 12-site monthly case, the RMSBs are similar to or larger than the uncertainties. In the 12-site, monthly case, UNC is large for many regions, particularly in the tropics, indicating that at monthly time resolution, there are inadequate sites in the 12-site network to provide a meaningful constraint on all 22 regions in the inversion. There is a significant reduction in uncertainties when 4-hourly data rather than monthly data are used in the 12-site case. This indicates that the inversion is able to retrieve useful information from the high temporal resolution data, effectively constraining more regions than there are sites. Uncertainties are also reduced when moving from monthly to 4-hourly data in the 84-site case, but the reduction is smaller.

[33] The source uncertainties in the 84-site monthly case are smaller than would normally be expected from inversions with monthly mean data. This is due to the small data uncertainties (around 0.15 ppm) that we have chosen to use with the monthly mean data in order to make a fair comparison with the 4-hourly data. The small source uncertainties in the 84-site monthly case are accompanied by large biases in the source estimates. This case gives the largest RMSB for almost all regions, with very large RMSB for tropical land regions due to noisy source estimates that swamp the correct seasonal cycle of sources.

[34] The reason for the large biases in the source estimates is the coarse resolution (only 22 regions) at which the inversion is performed. The distribution of sources within each region has to be assumed and if this is different from the correct source distribution then biased source estimates can result. Kaminski et al. [2001] showed, for monthly mean inversions, that the bias can be large but that it is reduced if larger data uncertainties are used. This explains why the biases are particularly bad for this monthly case since it used very small data uncertainties. The problems are less serious in the 12-site case because some regions are so poorly constrained by the small network that the source estimates remain close to the prior estimate of zero.
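A toy calculation illustrates how an incorrect within-region distribution biases the regional total. The example below is not drawn from our model runs: the two-site, two-subarea sensitivity matrix `H`, the fluxes, and the assumed uniform pattern are all hypothetical, and for simplicity it uses unweighted least squares with no prior constraint.

```python
import numpy as np

# Hypothetical sensitivities (ppm per Gt C/yr): rows are sites, columns are
# two subareas of a single large region. Site 0 mainly sees subarea 0.
H = np.array([[1.0, 0.1],
              [0.2, 0.6]])

# "Correct" fluxes: a sink in one subarea and a source in the other.
true_flux = np.array([-1.0, 3.0])
true_total = true_flux.sum()          # 2.0 Gt C/yr
data = H @ true_flux                  # noise-free pseudodata

# Coarse inversion: one basis function that assumes the regional flux is
# split evenly between the subareas, so only the total is estimated.
pattern = np.array([0.5, 0.5])
response = H @ pattern
coarse_total = (response @ data) / (response @ response)

# Fine inversion: each subarea is its own basis function.
fine_flux = np.linalg.solve(H, data)
fine_total = fine_flux.sum()

print(f"true {true_total:.2f}, coarse {coarse_total:.2f}, fine {fine_total:.2f}")
```

With these values the coarse inversion recovers only about a quarter of the correct regional total, whereas solving for the subareas separately recovers it exactly. In a real inversion the finer solution would carry larger uncertainties, but the estimates can be aggregated afterwards.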

[35] In the 4-hourly data cases, the largest biases (up to 4 Gt C yr−1) are also found for the tropical land regions. The 84-site case mostly gives better results over land while the 12-site case is better for tropical ocean regions. Overall, the larger RMSB than UNC for most cases and most regions is a major problem; the inversion “thinks” it can produce better estimates than it is actually achieving.

3.2. Improving the Inversion: Large-Scale Options

[36] Three tests have been performed in an attempt to improve the inversion results. Modifications are made to the inversion using the 84-site network with 4-hourly data. The first modification is to reduce the prior source uncertainties especially for the ocean regions. An inversion has been performed using prior source uncertainties of 2 Gt C yr−1 for ocean regions and 5 Gt C yr−1 for tropical and southern land regions. Northern land region uncertainties were unchanged, remaining at 10 Gt C yr−1. The impact on the source estimates was very small, with an average decrease in RMSB of only 2.5%. The largest decrease in RMSB was 11% for South America. Thus, reducing the prior source uncertainties does not significantly improve the inversion in this case.

[37] The second modification to the inversion is to the data uncertainties. While measurement precision is likely to be similar across the observing network, our ability to model those data is more variable. For example, sites that are remote from CO2 sources should be easier to model than those close to large and seasonally changing sources. We can reflect this variation by assigning variable data uncertainties to each site (as described earlier). Figure 5 compares the RMSB and UNC for the 84-site, 4-hourly data inversion using constant (as in Figure 4) or variable data uncertainties. Note that the x range is smaller than in Figure 4. There are small reductions in the uncertainties when moving to variable data uncertainties. The reduction is mostly in the Southern Hemisphere regions, which is expected since most Southern Hemisphere sites have smaller data uncertainties in the variable case compared to the constant case. The variable data uncertainties have a large impact on the RMSB for some regions, particularly tropical America, while for other regions the RMSB is virtually unchanged. For most regions the biases remain larger than the uncertainties.

Figure 5.

RMSB and UNC for inversions with 4-hourly data: 84 sites with constant data uncertainties (solid line), 84 sites with variable data uncertainties (dashed line), and 83 sites (no Cape Rama) with variable data uncertainties (dotted line). The units are Gt C yr−1.

[38] The large biases, particularly in the tropical regions, often occur due to very bad estimates in only 1 or 2 months of the year. For example, in the tropical Asian region, the October source is incorrectly estimated by −8 Gt C yr−1. To determine whether any of these biases could be attributed to the inclusion of a particular site in the inversion, a set of inversions was performed in which each site was dropped from the 84-site network in turn. The site that produced the largest differences when removed was Cape Rama, India (12.6°N, 73.1°E). The RMSB and UNC for this case are also shown in Figure 5. This shows that the source estimate for temperate Asia is worse when Cape Rama is removed but that the estimates for tropical Asia and southern Africa are significantly improved. Improvements are also seen for the tropical ocean regions nearest to Cape Rama.

[39] This case illustrates the difficulties inherent in the inversion as it is currently set up. The biases occur because the spatial distribution of sources within a region is different between the sources that we are trying to estimate and those that were assumed in generating the response functions. Cape Rama is sited at the southern boundary of our large temperate Asia region. This region also has a number of sites located in and just outside of the eastern end of the region. The problems associated with a large region constrained by widely separated sites become apparent in the October source estimates.

[40] In this month, the CO2 flux that we are trying to estimate is a sink in the Indian part of the temperate Asia region but a source in the eastern part of the region. The spatial distribution used for the response function does not allow for fluxes of different signs. With more sites around the eastern end of the region, the inversion finds a source for the temperate Asia region and tries to match the low pseudodata (due to the local sink) at Cape Rama by inserting a sink into a nearby more weakly constrained region. This results in a large sink being placed in tropical Asia since a larger, more remote sink can have a similar impact at Cape Rama as a small local sink. The −8 Gt C yr−1 bias noted above for tropical Asia is due to this large sink.

[41] Global constraints on the total source also mean that the large sink in tropical Asia requires compensation elsewhere. The compensation will tend to occur in other weakly constrained regions, in this case predominantly southern Africa which in October shows an overestimated source by about 6 Gt C yr−1. Thus, when the inversion is run without Cape Rama, both the tropical Asia and the southern Africa regions are improved with the October biases dropping to −1 and 2 Gt C yr−1, respectively. It is worth noting, though, that the temperate Asia region estimates across the whole year are worse without the Cape Rama data, indicating that the site does provide useful information for its local land region even while causing problems elsewhere. It is evident that the temperate Asian region is too large and should be split for inversions with this type of network. In general, it would seem unwise to have areas with different source seasonality within the same region.
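The mechanism described in the preceding paragraphs can be reproduced in a deliberately small toy problem. The sketch below is an assumption-laden illustration, not the inversion used here: it has three flux cells, four sites, a hand-written transport matrix `H`, and plain least squares in place of the Bayesian inversion. Region A covers two cells with a fixed, single-signed internal pattern, while region B is a weakly observed neighbouring cell.

```python
import numpy as np

# Hypothetical transport: rows are sites, columns are flux cells (a1, a2, b).
# Site 0 sits next to cell a1 and sees the remote cell b only weakly;
# sites 1-3 sit next to cell a2.
H = np.array([
    [1.0, 0.0, 0.2],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0],
])

true_flux = np.array([-1.0, 1.0, 0.0])   # sink in a1, source in a2, nothing in b
data = H @ true_flux                      # noise-free pseudodata

# Basis functions: region A spreads a unit source evenly over a1 and a2,
# so it cannot represent fluxes of opposite sign; region B is cell b alone.
basis = np.array([
    [0.5, 0.0],
    [0.5, 0.0],
    [0.0, 1.0],
])
G = H @ basis                             # response of each site to each region

def invert(G, d):
    """Unregularised (minimum-norm) least-squares source estimate."""
    return np.linalg.lstsq(G, d, rcond=None)[0]

true_regions = np.array([true_flux[:2].sum(), true_flux[2]])   # both are zero
est_all_sites = invert(G, data)
est_without_site0 = invert(G[1:], data[1:])

print("true region totals:   ", true_regions)
print("all sites:            ", est_all_sites)
print("without site 0:       ", est_without_site0)
```

With all sites, the well-observed end of region A forces a source, and the low value at site 0 can only be matched by a large spurious sink in region B. Removing site 0 removes the spurious sink. This toy does not capture the degradation of the temperate Asia estimate seen when Cape Rama is dropped, because here region A is constrained only by the remaining sites in both cases.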

[42] The dominant cause of bias in these inversions is the difference in within-region spatial distribution. However, we have found that even if these distributions are correct, small RMS biases (up to 0.6 Gt C yr−1) still persist due to the nonlinearity of the advection algorithms used in most transport models. An example inversion and an iterative solution to the problem are given in Appendix A.

3.3. Improving the Inversion: Australian Subdivision

[43] As in the case of temperate Asia above, Australasia is also a region that has quite different seasonality for various areas across the region. While the total source from this region is small and hence its RMSB is relatively small compared to other regions, we would still consider the inversion for this region to be poor. This is shown in Figure 6 for the inversion case above using the network without Cape Rama. Only in June and July do the correct sources lie within the uncertainty ranges of the estimated sources. Given that there are a number of sites in the vicinity of this region (Figure 7), it provides a good test to see whether the inversion method can be extended, both to correct the bias on the total region and to glean some subregional information. The expectation that this is possible is seen in Figure 8, which shows the pseudodata for four sites around Australia and the fit by the inversion to the data. It is clear that there is additional information in the pseudodata time series at the sites around the Australian region which the inversion is unable to fit. For example, the inability at Cape Grim to get the magnitude of the non-baseline deviations correct suggests that there should be larger seasonal sources and sinks in South East Australia. At Darwin the consistent underestimation of the data between September and February suggests that the seasonality of sources in northern Australia should be different from southern Australia. To enable the inversion to fit all records (and thus make use of the information contained in them), it is necessary to subdivide the Australian region.

Figure 6.

Monthly estimated (dash-dotted) and correct (bold, solid line) sources for the Australasian region in the 83-site, variable data uncertainty inversion. The shaded region indicates the uncertainty on the sources as given by the inversion. The units are Gt C yr−1.

Figure 7.

Australasian grid cells and nearby sites (labeled dots). The letters indicate grid cells whose source estimates are presented in Figure 11. The cross indicates the location of the additional site referred to in section 3.4.

Figure 8.

Concentration data input to the inversion (shaded line) and fitted by the 22-region, 83-site inversion (black line) for four sites marked in Figure 7: (a) Darwin, (b) Cape Ferguson, (c) Cape Grim, and (d) Baring Head. Note that the y range is different between the top and bottom panels. The units are ppm.

3.3.1. Extension to Inversion Method

[44] The Australasian region is divided into its 45 constituent grid cells, 44 over mainland Australia and 1 for New Zealand. The grid cells are 5.625° longitude by approximately 2.8° latitude and are shown in Figure 7. A 1 Gt C yr−1 source is run forward through the transport model for each grid cell and month to create the concentration responses. To reduce the computational cost, the concentration responses are only run for 3 months compared to the 3 years for the original 22 regions. These short responses are possible because when the inversion is performed, we invert for both the individual grid cells and the original total Australasia region. Atmospheric transport acts differently on the larger Australasian region and the individual gridpoints. Transport is sufficiently diffusive for these single gridpoint sources that their concentration signatures are hard to differentiate two months after the pulse is turned off. Thus any information about localizing sources to individual grid cells will come from data nearly contemporary with the source. We do need to account for the contribution to background atmospheric gradients somewhere, and this information is present in the response to the larger Australasian region. We wish to test whether this method, using 67 regions (the 22 original regions plus the 45 grid-cell regions), produces a better estimate of the total Australasian source as well as whether this inversion can provide usable subregion information.

3.3.2. Choice of Source Prior Uncertainties

[45] We present here three inversions using different source prior uncertainties for the Australian grid cells. In the first case, the source priors are 0 ± 2 Gt C yr−1. In the second case the uncertainty is reduced to 0.4 Gt C yr−1. In the final case, variable uncertainties are used based on the NPP distribution used in the total Australasian region. Individual grid cell uncertainties range from 0.02 Gt C yr−1 in desert areas to 0.45 Gt C yr−1 in biologically active regions, and the sum of the 45 grid-cell uncertainties is 5 Gt C yr−1. The other elements of the inversion are the same as the previous case, i.e., 83 sites (no Cape Rama) with 4-hourly data. Prior source uncertainties for the 22 large regions remain at 10 Gt C yr−1.
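One way to construct NPP-based prior uncertainties of this kind is sketched below. The sketch assumes a simple proportional scaling of NPP so that the grid-cell uncertainties sum to 5 Gt C yr−1; the scaling rule is an assumption, and the five NPP values are hypothetical stand-ins for the 45 grid cells.

```python
def npp_scaled_uncertainties(npp, total=5.0):
    """Scale per-cell NPP so that the resulting uncertainties sum to `total`."""
    npp_sum = sum(npp)
    return [total * value / npp_sum for value in npp]

# Hypothetical NPP values: two desert-like cells and three more active cells.
npp = [0.1, 0.1, 0.5, 2.0, 1.3]
sigma_variable = npp_scaled_uncertainties(npp)

# For comparison, the summed uncertainty of the two uniform cases over 45 cells.
n_cells = 45
sum_uniform_2 = n_cells * 2.0
sum_uniform_04 = n_cells * 0.4

print(sigma_variable)
print(sum_uniform_2, sum_uniform_04)
```

Summed in the same way, the uniform cases correspond to 90 and 18 Gt C yr−1 over the 45 grid cells, so the variable case is the tightest of the three priors overall as well as the most spatially selective.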

[46] The total Australasian source estimated by the inversion is the sum of sources estimated for the large region and each of the individual gridpoints. This total Australasian source is shown in Figure 9 for the three inversions. In each case, the source estimates in almost all months are improved compared to the inversion without the continental subdivision (Figure 6). The inversion with the variable uncertainties provides the best estimate; in all months except February the correct source lies in or very close to the estimated source range shown by the shaded region. The RMSBs for the variable, 0.4 and 2.0 Gt C yr−1 uncertainty cases, respectively, are 0.17, 0.27, and 0.43 Gt C yr−1, which compares with an RMSB of 0.62 Gt C yr−1 in the 22-region case. UNC is increased compared to the 22-region inversion value of 0.09 Gt C yr−1, to 0.23, 0.31, and 0.41 Gt C yr−1 for the variable, 0.4 and 2.0 Gt C yr−1 uncertainty cases, respectively. The increase in uncertainty is due to the greater choice of regions that a source could be placed in to fit a particular data set. The larger number of regions (relative to the number of data sites) is also the reason why the prior source uncertainty now provides some constraint on the inversion compared to the earlier result in the 22-region case where the prior source uncertainty had little impact. It is pleasing that in each of the three 67-region inversions UNC is greater than or comparable to the RMSB, showing that the uncertainty generated by the inversion is now a more realistic estimate of the accuracy of the source estimates.

Figure 9.

Monthly correct and estimated sources for the Australasian region in the 67-region inversion with different cases of prior source uncertainty. The line identification is given in the key. The shaded region, indicating the uncertainty on the sources estimated by the inversion, is only shown for the variable prior source uncertainty case. The units are Gt C yr−1.

[47] Figure 10 shows the data fit for the inversion with variable prior source uncertainties at the four sites shown in Figure 8. There is a vast improvement in the inversion's ability to fit the data at these four sites with the greater flexibility that the extra regions allow.

Figure 10.

Concentration data input to (shaded line) and fitted by (black line) the 67-region inversion with variable prior source uncertainty for four sites marked in Figure 7: (a) Darwin, (b) Cape Ferguson, (c) Cape Grim, and (d) Baring Head. Note that the y range is different between the top and bottom panels. The units are ppm.

3.3.3. Individual Grid Cell Source Estimates

[48] In addition to improving the total Australian source estimate, useful information is gained for many of the individual grid cells. This is illustrated in Figure 11 for three grid cells close to observing sites and one grid cell distant from the sites. The grid cells are marked in Figure 7. The sources for each of the 67-region inversions comprise the source estimate for the appropriate grid cell plus the contribution for that grid cell from the large Australian region as determined by its NPP weighting.
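The bookkeeping for combining the large-region and grid-cell estimates can be written out explicitly. In the sketch below the function name, the three-cell example, and all numbers are hypothetical; only the two rules are taken from the text: the total is the large-region estimate plus the sum of the grid-cell estimates, and each grid cell receives an NPP-weighted share of the large-region estimate.

```python
def combine(region_estimate, cell_estimates, npp):
    """Return per-cell sources and the regional total for one month."""
    npp_total = sum(npp)
    # Each cell: its own estimate plus its NPP-weighted share of the large region.
    cell_sources = [
        cell + region_estimate * weight / npp_total
        for cell, weight in zip(cell_estimates, npp)
    ]
    # Regional total: large-region estimate plus all grid-cell estimates.
    total_source = region_estimate + sum(cell_estimates)
    return cell_sources, total_source

# Hypothetical one-month example with three grid cells (Gt C/yr).
region_estimate = 0.3
cell_estimates = [0.05, -0.10, 0.02]
npp = [1.0, 3.0, 0.5]

cell_sources, total_source = combine(region_estimate, cell_estimates, npp)
print(cell_sources, total_source)
```

Because the NPP weights sum to one, the per-cell sources add up to the regional total, so the two ways of reporting the result are consistent.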

Figure 11.

Monthly sources, from the 67-region inversion with different cases of prior source uncertainty, for the individual grid cells marked in Figure 7: (a) NT, (b) WA, (c) VIC, and (d) NZ. The sources from the 22-region inversion are also shown for comparison. The line identification is given in the key. The units are Gt C yr−1 and the y range varies between panels.

[49] The source estimate from the original 22-region inversion is also shown and highlights the limitation of that case. The internal division of source within the region is based on NPP so the source magnitude for the individual grid cells varies but the seasonality is fixed. The correct sources show quite different seasonality and so it is clear that inverting for just the total region results in an unsatisfactory compromise. The NZ region shows a dramatic improvement in all of the three 67-region cases. The good result for this grid cell is not surprising since it is isolated from the rest of the region and has an observing site in the neighboring grid cell. The VIC grid cell is a little farther from an observing site but still shows a good match to the correct source for all three 67-region cases. The NT grid cell has very small seasonality and the results are noisy, particularly for the 2 Gt C yr−1 uncertainty case. However, again we find that the seasonality, which was almost completely out of phase, is corrected.

[50] The WA grid cell, which is more remote from the observing sites, illustrates the constraint on the inversion by the prior source uncertainties. In the 2 Gt C yr−1 uncertainty case, the source estimates are very noisy and are compensated for in surrounding grid cells. As the prior source uncertainty is reduced the source estimates improve. While the best estimate remains somewhat different from the correct source, there is some indication in the summer months that the grid cell is deriving some useful information from the data.

[51] The results for all grid points are summarized in Figure 12 for the inversion with variable prior source uncertainties. The RMSB is always smaller than the uncertainty, which is encouraging. The largest biases and uncertainties are in the biologically active parts of the continent. This in part reflects the variable prior uncertainties used in this inversion. Figure 13a shows the posterior uncertainty as a proportion of the prior uncertainty, indicating which grid cells are most constrained by the data. As expected, those cells closest to the observing sites show the largest reductions in uncertainty but the influence of the observations can also extend some grid cells from the site location. The uncertainty reduction in SW Australia is interesting. The prevailing westerly winds at this latitude appear to allow this region to be constrained by the surface and vertical profile data at Cape Grim. It is also worth noting that all grid cells give uncertainty reductions of greater than 50% due to the reduction in uncertainty for the full Australasian region from its original uncertainty of 10 Gt C yr−1.

Figure 12.

Root mean square source bias (RMSB) and root mean square uncertainty (UNC) for each Australasian grid cell in the 67-region inversion with variable prior uncertainty. Smaller values are lighter shades and the units are Gt C yr−1.

Figure 13.

Ratio of estimated UNC to prior UNC for (a) the 83-site inversion and (b) an inversion with an additional site at the grid cell marked by a cross in Figure 7.

[52] There are perhaps two cautionary notes that we should add to this encouraging result. The first is that we are probably fortunate that the area of Australia that is seen least by the data is the desert region. In our best case with variable prior source uncertainties, part of the good outcome is due to the small uncertainties used for these desert regions where there is little data constraint. If it were the biologically active part of Australia that had little data constraint, then the inversion might not have performed as well. The second point to note is that the source estimates for Australia in this 67-region inversion are more sensitive to the inclusion of Cape Rama data than in the earlier 22-region inversion. In the 22-region case, the single Australian region is tightly constrained by trying to fit the local sites. In the 67-region case, these local sites can be fitted using the grid-cell regions, effectively giving more freedom to the large Australasian region to be influenced by other data and global constraints. The result is that the large sink estimated in October for the tropical Asia region when Cape Rama is included generates a partially compensating source in the large Australasian region that the grid-cell regions are unable to fully correct. This serves as a warning that there are potential problems in subdividing one region while maintaining large regions relatively close by.

3.4. Improving the Inversion: Adding New Sites

[53] In this section we demonstrate the information added to the inversion by one extra measurement site within the disaggregated Australian region. The example is largely illustrative, pointing out the potential value of more continental measurements but also the relationship between atmospheric transport and the regions constrained by such a site. Although meant as an example, our chosen site is a location where a CO2 flux tower is intended to be sited. It is marked with a plus sign in Figure 7. When this site is added to the 22-region inversion, the RMSB for Australia drops from 0.62 to 0.38 Gt C yr−1 while the UNC drops from 0.085 to 0.070 Gt C yr−1. The extra site improves the source estimate but the bias remains substantially larger than the uncertainty. In the 67-region inversion, the addition of the extra site reduces the RMSB from 0.17 to 0.13 Gt C yr−1 and the UNC from 0.23 to 0.19 Gt C yr−1, giving both a more accurate and more certain result. The influence of the site can be seen more clearly in the uncertainties for the individual grid cells, shown as the proportion of the prior uncertainty in Figure 13b. Comparison of this figure with Figure 13a shows a large reduction in uncertainty for the grid cell in which the site is located. Reductions in uncertainty are also found for the area around the grid cell, particularly in the eastward direction. This is consistent with the prevailing winds at this latitude coming from the east.

4. Summary and Future Work

[54] The small uncertainties achieved for inversions using high temporal resolution data indicate that there is great potential in this type of data. The reduction in uncertainties was particularly noticeable for small networks, which is encouraging since there are relatively few locations where continuous measurements are currently available and there will be some time delay before new instruments are deployed. A useful next test for this type of pseudodata inversion will be to run cases which include both high frequency and monthly mean data in configurations similar to current availability. Tests should incorporate issues of intercalibration of records from different laboratories.

[55] Low source uncertainties could also be achieved with larger measurement networks and monthly averaged data. However, with both monthly averaged and high frequency data, these low uncertainties were accompanied by considerable bias in the predicted sources. The bias is due to the mismatch in source structure within our large regions compared to that in the flux distribution we are attempting to recreate.

[56] The bias is reduced when we subdivide a region, thereby avoiding the imposition of an incorrect structure on the sources. The subdivision of Australia showed that sources for the total region could be estimated successfully and that the 4-hourly data also contained information to produce good source estimates at grid-cell scale. Choice of prior source uncertainties became important for the higher spatial resolution. An important next step is to test the effectiveness of subdividing other regions for improving source estimates.

[57] All the inversions presented here use the same transport in both the pseudodata creation and in the inversion. In any real world application we cannot expect a perfect representation of transport even if analyzed winds are used. There is much to be done to investigate how the inversion is degraded by imperfect transport and what strategies might be used to limit the problems. One direction will be to test intermediate time frequencies such as daily or five-daily, for which the magnitude of transport errors might be smaller.

[58] Another important area of future work is to test inversions using synthetic data generated from sources with submonthly variations. Of particular concern is how the large diurnal cycle of terrestrial fluxes will impact on the inversion. As with transport errors, using intermediate time frequencies may be one way forward; another option may be to avoid data from night time when the boundary layer is shallow.

[59] In summary, this potential new data source holds considerable promise and may perhaps be gathered more cost-effectively than potential satellite measurements discussed by, for example, Rayner et al. [2002]. In either surface or satellite data cases, improvements to transport models and inversion algorithms are required to take full advantage of the data. This work has demonstrated that one limitation of current synthesis inversion methods can be overcome by using smaller basis function regions.

Appendix A: Impact on Inversions of Nonlinear Advection

[60] Synthesis inversion involves fitting observational data with a sum of concentration responses from a number of individual sources. We will refer to this sum of responses as the reconstructed time series. The inversion seeks source estimates to minimize the difference between the reconstructed time series and the observed time series. Most transport models use advection schemes that are nonlinear (due to the desire to be monotonic and at least second-order accurate). This means that the reconstructed time series is different from a time series that would be generated by running the summed sources through the transport model. This is illustrated in Figure A1 for the Cape Grim grid cell. The forward simulation with the summed sources produces a larger seasonal cycle than the time series constructed from the responses for each source. There are also differences in the magnitude of the nonbaseline excursions. These differences between reconstructed and modeled time series mean that errors are consequently introduced into the source estimation.
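The failure of superposition for a nonlinear advection scheme can be demonstrated in one dimension. The sketch below uses a minmod slope-limited finite-volume scheme on a periodic grid as a stand-in for the monotonic, higher-order schemes referred to above; it is not the scheme of the transport model used here, and the two tracer fields `a` and `b` are hypothetical.

```python
import numpy as np

def minmod(a, b):
    """Return the smaller-magnitude argument where signs agree, else zero."""
    return 0.5 * (np.sign(a) + np.sign(b)) * np.minimum(np.abs(a), np.abs(b))

def advect(q, courant=0.5, steps=1, limited=True):
    """Advect q to the right on a periodic 1-D grid with a finite-volume scheme.

    limited=True uses a minmod-limited slope (nonlinear in q);
    limited=False reduces to first-order upwind (linear in q).
    """
    q = np.asarray(q, dtype=float).copy()
    for _ in range(steps):
        dq_minus = q - np.roll(q, 1)       # q[i] - q[i-1]
        dq_plus = np.roll(q, -1) - q       # q[i+1] - q[i]
        slope = minmod(dq_minus, dq_plus) if limited else np.zeros_like(q)
        face = q + 0.5 * (1.0 - courant) * slope   # value at right face of cell i
        q = q - courant * (face - np.roll(face, 1))
    return q

# Two hypothetical tracer fields with overlapping gradients.
a = np.array([0.0, 1.0, 3.0, 0.0, 0.0, 0.0])
b = np.array([0.0, 2.0, 3.0, 0.0, 0.0, 0.0])

def superposition_error(limited):
    """Largest difference between advecting a+b and summing separate runs."""
    together = advect(a + b, limited=limited)
    separately = advect(a, limited=limited) + advect(b, limited=limited)
    return float(np.max(np.abs(together - separately)))

error_limited = superposition_error(True)
error_upwind = superposition_error(False)
print("limited scheme:", error_limited, " upwind scheme:", error_upwind)
```

The limited scheme conserves tracer mass in both cases, but the sum of the separately advected fields differs from the advected sum, which is the one-dimensional analogue of the difference between the reconstructed and forward-modeled time series. The first-order upwind scheme is linear and shows no such difference, at the cost of being more diffusive.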

Figure A1.

CO2 concentration (in ppm) at Cape Grim from a forward model run using a given set of sources (shaded line) and from a reconstruction using the response functions for the same set of sources (black line).

[61] We can use a pseudodata inversion to illustrate the magnitude of error involved. In this case, the pseudodata are generated using sources of the same magnitude as those used previously (fossil plus biosphere plus ocean emissions) but the spatial distribution of the source is NPP over land and flat over ocean. Thus the same within-region spatial distribution is used for both the creation of the pseudodata and for the inversion. Any remaining inability to correctly estimate the known sources must be due to the nonlinearity of the transport model. We invert for 22 regions as before, meaning that 264 sources are used since there is a separate source for each region and each month. The inversion uses data for 84 sites (Figure 2b) at 4-hourly time resolution. Variable data uncertainties are used. The magnitude of the uncertainty is determined from the variability in the concentration data for that location. The RMSB and UNC are shown for each region in Figure A2. RMSBs are up to 0.6 g C m−2 yr−1 while UNC is generally larger. There are three regions for which RMSB is greater than UNC: boreal North America, boreal Asia, and Europe. The biases cannot be attributed just to an insufficiently large network. When the inversion is run with all the 228 locations for which time series were saved, there are reductions in both biases and uncertainties over land but the RMSBs increase over the ocean (Figure A2). The reductions in uncertainty mean that many more regions have RMSB greater than UNC, which is undesirable.

Figure A2.

RMSB and UNC for inversions (with correctly shaped basis functions) using 4-hourly data at 84 sites (solid line) and 228 sites (dashed line). The units are g C m−2 yr−1.

[62] Since we expect that the nonlinearity biases will scale with the concentration signals that we are inverting, we test whether an iterative method provides better source estimates. We sum the sources estimated from the 84-site inversion and run this total source through the transport model to generate concentration time series at the 84 locations used in the inversion. We then invert the difference between these time series and the reconstructed time series. The aim is to estimate source corrections that are then subtracted from the initial source estimates to provide new source estimates. The process can be iterated to continue to improve the source estimate. Two iterations are shown in Figure A3. After the first iteration, the RMSBs for all regions are improved, on average dropping by 70%. The largest RMSB is just over 0.3 g C m−2 yr−1 for South America. RMSB is smaller than UNC for all regions. The second iteration shows that the source estimates continue to be improved. The maximum RMSB is now 0.09 g C m−2 yr−1 for Southern Africa.
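The iterative correction can be sketched with a toy forward model. Everything in the block below is an assumption made for illustration: the response matrix `G`, the weak quadratic term standing in for nonlinear advection, and the use of unregularised least squares instead of the Bayesian inversion. The mismatch is taken between the forward-modeled series and the pseudodata, which for the first iteration of this unregularised toy gives the same correction as differencing against the reconstructed series.

```python
import numpy as np

G = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])   # hypothetical linear responses
EPS = 0.05                                            # strength of the toy nonlinearity
s_true = np.array([1.0, -0.5])                        # known sources

def forward_model(s):
    """Toy stand-in for the transport model: linear response plus a weak quadratic term."""
    y = G @ s
    return y + EPS * y**2

def invert(d):
    """Unregularised least-squares inversion using the linear responses only."""
    return np.linalg.lstsq(G, d, rcond=None)[0]

def rms_bias(s):
    """Root mean square difference between estimated and known sources."""
    return float(np.sqrt(np.mean((s - s_true) ** 2)))

pseudodata = forward_model(s_true)

# Initial estimate: biased because the inversion assumes linear transport.
estimate = invert(pseudodata)
biases = [rms_bias(estimate)]

for _ in range(2):
    # Run the summed estimate forward, invert the mismatch, subtract the correction.
    mismatch = forward_model(estimate) - pseudodata
    estimate = estimate - invert(mismatch)
    biases.append(rms_bias(estimate))

print(biases)
```

In this toy the bias shrinks at each iteration because the nonlinear term is small relative to the linear response; the rate of improvement is a property of the toy and should not be read as an estimate of the reductions obtained with the transport model.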

Figure A3.

RMSB for 4-hourly data from 84 sites for the original inversion (solid line) and two iterations (line styles indicated in key). The units are g C m−2 yr−1.

[63] While we have demonstrated that this iterative method improves the source estimates in this case when the biases are due only to nonlinear transport, it is not clear that iteration will help when other biases are present. The spatial structure biases are currently significantly larger than those due to nonlinearity, and biases due to imperfect transport are also likely to be large. Until these biases are reduced, it would seem possible to ignore the nonlinearity biases. However, we should be aware that as we increase the number of basis functions used in an inversion (spatially or temporally), nonlinear biases may increase and it may become important to use linear advection schemes. When inversions are run with real data rather than pseudodata, it is difficult to assess whether biases due to nonlinearity are becoming large. One test that would be possible is to run the sum of all source estimates forward through the transport model to compare modeled time series with those constructed by the inversion. The differences seen may give an indication of the magnitude of any problems.


[64] We acknowledge helpful comments on the manuscript by Cathy Trudinger and Bernard Pak. We thank Darren Spencer and Paul Krummel of CSIRO Atmospheric Research and Stuart Baly of the Cape Grim Baseline Station for their dedicated efforts in maintaining the operation of the prototype CO2 analyzer system at Cape Grim.