Inversion of diurnally varying synthetic CO2: Network optimization for an Australian test case



[1] Hourly CO2 fluxes, generated from a biosphere model applied to the Australian region, are used to produce synthetic CO2 atmospheric concentration data. The concentration data are inverted using a Bayesian synthesis method, to test whether the CO2 fluxes can be successfully retrieved at monthly temporal resolution. The inversion is performed globally (with a base network of about 100 sites), but the tests focus on the Australian continent, subdivided into 12 regions. The inversion is tested using dense networks of approximately 40 new sites in and around Australia. Land-based and offshore networks are compared. The land-based network produces biased source estimates, for regions with large diurnal source variability, if the inversion only solves for a constant flux throughout the month. The bias is eliminated when we solve for two fluxes for each region, a constant monthly flux and a daytime-only flux. The offshore network gives large uncertainties and biases for inland regions of Australia. These latter biases are not significantly improved by solving for the daytime flux. The inversion that includes daytime fluxes is used to design a network to minimize the average annual mean uncertainty for Australian sources. Networks designed using incremental optimization are compared with some reference networks. Site locations are found to be sensitive to the data uncertainty applied to each site. The incremental optimization method appears to be most effective for networks of fewer sites than about half the number of regions being solved for.

1. Introduction

[2] The ability to manage carbon dioxide (CO2) emissions into the future requires a good understanding of present carbon fluxes between the atmosphere and both the ocean and biosphere. One well-established method of estimating these fluxes is to invert atmospheric concentration measurements using a numerical model of atmospheric transport. Given the time and expense involved in maintaining CO2 monitoring networks, it is important both that the inversions make best use of the current data and that any network expansion incorporates knowledge available from inversion studies.

[3] There are around 30 sites worldwide that measure CO2 concentration continuously. The variability of these records, on diurnal and synoptic timescales, contains much information about regional and local CO2 sources and sinks. However, this information has not been included in global CO2 inversions because the source estimation techniques currently being used focus on interpreting monthly mean background concentrations. We are working to extend the estimation techniques to incorporate concentration data with high temporal frequency, and are using synthetic data experiments in which the CO2 sources are known so that the inversion methods can be tested.

[4] Law et al. [2002] (hereinafter referred to as L02) tested whether a Bayesian synthesis inversion method [Enting, 2002] could be used to estimate monthly sources for 22 continental-scale regions with 4-hourly concentration data at around 80 sites. They found large biases in the source estimates even though the inversion was set up with “perfect” data and model transport. They found that the source biases were due to aggregation errors, caused by solving for large regions with “incorrect” spatial source distributions and that the biases could be significantly reduced by subdividing the continental-scale region into its constituent grid cells. This was implemented for an Australian test case. Law et al. [2003] used this same Australian test case to evaluate the inversion in cases where the data or the model transport were imperfect. They found that imperfect data, for example due to calibration differences, could be accommodated more easily by the inversion when more frequent data (daily or sub-daily) were used. Imperfect model transport was most easily accommodated with data at synoptic (2 to 5 day) intervals.

[5] Both of these studies, while promising, simplified the estimation problem by generating the synthetic concentration data from monthly averaged sources. Since any real sources will have both diurnal and synoptic variations, this simplification needs to be addressed. An analogous situation has been previously encountered in inversion studies, when monthly concentration data were used to estimate annual average sources. Peylin et al. [2002] found, in a series of inversions, that more outlying estimates occurred when annual fluxes were estimated from monthly data than when the temporal resolutions of the data and fluxes were the same. They concluded that the use of fixed seasonal flux patterns made the inversion highly sensitive to the data in particular months, with the possibility that the resulting flux estimates were biased.

[6] It is conceivable that similar biases could occur here, particularly with the large diurnal cycle of land biosphere fluxes. It would be useful to determine how significant any biases are and how they might be minimized, either through the choice of sampling location or through extensions to the inversion method. These questions are addressed, again in the context of an Australian test case, as the first part of this paper (section 3). The second part of the paper uses the findings on inversion set-up to investigate network design questions (section 4). Our aim is to evaluate where sites are best located in order to minimize the uncertainty on annual mean flux estimates for 12 subregions of Australia. From this specific case, some general conclusions about network design are discussed.

2. Method

[7] Two series of inversions are performed using model-generated concentration data to retrieve a known set of CO2 fluxes. The first set of inversions is used to assess the inversion method. The estimated sources and their uncertainties can be compared to the known fluxes to assess whether the inversion successfully retrieves the sources. While the inversion of model-generated data is valuable for assessing the inversion method, we should be aware that it simplifies the problem compared to inverting real data because it assumes that we are able to perfectly model atmospheric transport. In reality, the relatively coarse resolution transport models in current use may have difficulty simulating the observed concentrations, particularly for continental sites.

[8] The second set of inversions is used to address network design questions. In this case, the aim is to minimize the uncertainty on the estimated sources. We present here aspects of the method that are common to both the sets of inversions, while details of the actual experiments performed will be covered in sections 3 and 4.

2.1. Fluxes

[9] The CO2 fluxes that are used to create the synthetic concentration data represent fossil [Andres et al., 1996], biospheric (CASA [Randerson et al., 1997], except for Australia), and ocean [Takahashi et al., 1999] fluxes. These are the same monthly resolution fluxes as used by L02. In the Australian region the CASA biosphere fluxes are replaced with fluxes generated by the CSIRO Biosphere Model (CBM) [Wang and Leuning, 1998].

[10] CBM is part of the land surface scheme in the CSIRO Atmospheric Research Limited Area Model (DARLAM). DARLAM was run for the Australian region at 75-km resolution for 1989 to 1998 [Wang and McGregor, 2003]. The integration time step in DARLAM is 900 s. DARLAM was initialized using interpolated ECMWF data. The interpolated ECMWF data were also used to update the model twice daily (0000 and 1200 UTC) at the boundaries and once per day in the interior (0000 UTC). Carbon fluxes were archived at every time step. The prescribed canopy leaf area index was derived from remotely sensed data and varied monthly. Two respiration parameters (non-leaf plant respiration at 20°C and soil respiration at 20°C) and optimal soil moisture varied monthly and spatially. All other physical and physiological parameters were kept constant for each vegetation or soil type (see Wang and Barrett [2003] for further details). Diurnal and synoptic variability in CO2 fluxes at a point were driven by meteorological variables such as incoming solar radiation, air temperature, and precipitation.

[11] Hourly CO2 fluxes for 1997 were taken from the DARLAM simulation and aggregated to the transport model grid (5.6° × 2.8°). A 3-year forward simulation was run using the CRC-MATCH transport model [Law and Rayner, 1999] to create the synthetic concentration data. Figure 1 shows the standard deviation of the flux in each transport model grid cell. This measure of the flux variability is dominated by the diurnal cycle. High variability occurs in the northern, tropical part of the continent and along the east coast.

Figure 1.

Standard deviation of hourly surface CO2 flux for each transport model grid cell over Australia. The units are MtC yr−1.

2.2. Standard Inversion and Prior Sources

[12] In our standard case, the Bayesian synthesis inversion method [Enting, 2002] is used, solving for monthly fluxes from 116 regions globally (Figure 2). Australia is divided into 12 regions with three to five model grid cells in each region. These relatively small regions across Australia are used in order to minimize aggregation error (a significant problem in the work of L02 until the Australian region was subdivided into its individual grid cells). The basis functions are monthly source pulses for each region; that is, a spatially and temporally constant source of 1000 MtC yr−1 is input to the atmosphere during the chosen month and then the simulation continues with no input source. Forward simulations of 1 year's duration are run for each region and for source pulses from each month, to create response functions at each observing site (or potential site). The inversion seeks the best fit of these 1392 (116 × 12) response functions to the concentration data, within the constraints provided by the prior source estimates and uncertainties. Zero prior source estimates are used with two alternative choices for the prior source uncertainties.
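The fit described above can be illustrated with a minimal sketch of a linear Bayesian least-squares inversion, reduced to two source components so that the matrix inverse can be written by hand. The response matrix `G`, the data and prior uncertainties, and the "true" sources are all made-up values for illustration, and the function names are hypothetical; this is not the code used in the study.

```python
def invert_2x2(m):
    """Invert a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def synthesis_inversion(G, data, data_unc, prior, prior_unc):
    """Bayesian least-squares estimate for two source components.

    G[i][j] is the response at observation i to a unit source j.
    Returns (posterior sources, posterior uncertainties).
    """
    n_obs = len(data)
    # A = G^T R^-1 G + P^-1, with diagonal data (R) and prior (P) covariances.
    A = [[0.0, 0.0], [0.0, 0.0]]
    for j in range(2):
        for k in range(2):
            A[j][k] = sum(G[i][j] * G[i][k] / data_unc[i] ** 2 for i in range(n_obs))
        A[j][j] += 1.0 / prior_unc[j] ** 2
    C = invert_2x2(A)  # posterior covariance; it never uses `data`
    # Mismatch between the data and the prior prediction.
    resid = [data[i] - sum(G[i][j] * prior[j] for j in range(2)) for i in range(n_obs)]
    rhs = [sum(G[i][j] * resid[i] / data_unc[i] ** 2 for i in range(n_obs)) for j in range(2)]
    post = [prior[j] + sum(C[j][k] * rhs[k] for k in range(2)) for j in range(2)]
    unc = [C[j][j] ** 0.5 for j in range(2)]
    return post, unc


# Hypothetical toy problem: two regions, three observations, zero prior.
G = [[0.010, 0.002], [0.003, 0.008], [0.005, 0.005]]  # ppm per (MtC/yr), made up
true_sources = [120.0, -60.0]                          # MtC/yr, made up
data = [sum(G[i][j] * true_sources[j] for j in range(2)) for i in range(3)]
estimate, uncertainty = synthesis_inversion(
    G, data, data_unc=[0.5, 0.5, 0.5], prior=[0.0, 0.0], prior_unc=[400.0, 400.0]
)
```

In this sketch the posterior covariance is built only from the responses and the two sets of uncertainties, so the estimated uncertainty is the same whatever data values are supplied.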

Figure 2.

Regions for which fluxes are estimated and locations of concentration data for (a) base network (stars: monthly data, dots: 4-hourly data) and (b) Australian land (solid squares) and offshore networks (open circles).

[13] The first choice, “variable,” sets the prior source uncertainties based on the largest “correct” monthly flux for each region, specifically, to twice the maximum absolute value monthly flux. Given the large seasonal cycles over some regions, this form of uncertainty only provides a weak constraint on the solution, though it does provide information about the relative variability of each region. For the Australian subregions (labeled in Figure 2b), the monthly prior source uncertainties range from 32 MtC yr−1 for the S region to 392 MtC yr−1 for the C1 region. The prior source uncertainty for the Australian continent is 720 MtC yr−1. The second choice of prior source uncertainty, “constant,” maintains the same uncertainties for non-Australian regions but sets the Australian subregions to a constant flux uncertainty of 1 kgC m−2 yr−1. The integrated Australian regional uncertainties vary by region size and range from 472 MtC yr−1 for the SE region to 903 MtC yr−1 for the NW region. For some regions, the “constant” uncertainties are an order of magnitude larger than the “variable” ones. The prior source uncertainty for the Australian continent is 3188 MtC yr−1. (Note that these uncertainties, expressed as a rate per year, are for monthly sources, and annual mean prior uncertainties are smaller by a factor of $\sqrt{12}$.)
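The relation between monthly and annual mean prior uncertainties can be written out with standard error propagation. The sketch below assumes, for illustration only, that the twelve monthly uncertainties for a region are equal ($\sigma_m = \sigma$) and uncorrelated:

```latex
\sigma_{\mathrm{annual}}
  = \frac{1}{12}\sqrt{\sum_{m=1}^{12}\sigma_m^{2}}
  = \frac{1}{12}\sqrt{12\,\sigma^{2}}
  = \frac{\sigma}{\sqrt{12}}
```

Under that assumption, the 392 MtC yr−1 monthly uncertainty quoted for the C1 region would correspond to an annual mean prior uncertainty of about 113 MtC yr−1.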

2.3. Extended Inversion

[14] One concern with the standard inversion is whether the source estimates will be biased due to the diurnal cycle of fluxes, which can be large. It is not feasible to solve for independent hourly fluxes. We can imagine two alternative strategies to deal with this problem that can be used separately or in combination. The first is to take some estimate of the diurnal cycle of flux, perhaps from a biospheric model, and run a forward simulation. The resulting time series of concentration at each site is subtracted from the time series of observed concentration before performing the inversion. This is analogous to the pre-subtraction of the impact of the seasonal cycle of fluxes when performing annual-mean inversions [e.g., Gurney et al., 2002] or cyclostationary inversions [e.g., Gurney et al., 2004]. The second approach is to estimate two fluxes for each region within the same inversion. The first is the standard basis function that is constant throughout the month and the second is a daytime-only basis function. This allows for the estimation of a simplified (square) mean monthly diurnal cycle. Both approaches have advantages and disadvantages. The first probably captures a more accurate description of the diurnal cycle, since it allows a more realistic temporal evolution of fluxes. However, its magnitude is fixed. The magnitude is a rough measure of the gross productivity. Observed and modeled estimates of this quantity vary enormously, suggesting that fixing the magnitude is a risky choice. The second strategy faces the opposite problem, with a fixed (and approximate) shape to the diurnal cycle but flexibility in the magnitude. In this study we follow this second approach. Future studies could use a combination of the two approaches.

[15] To apply the second strategy here, we perform an “extended” inversion. In addition to estimating the constant monthly fluxes for all regions, daytime fluxes are estimated for each of the 12 Australian regions. The daytime flux is constant during daylight hours during the month but zero at nighttime, allowing for changes in day length with season. The same prior sources and uncertainties are applied to both the constant and daytime fluxes.
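As a rough illustration of how a constant flux and a daytime-only flux combine into a square diurnal cycle, the sketch below builds both hourly basis shapes for one month. The fixed sunrise and sunset hours, the function names, and the flux values are assumptions made for the example; the study itself allows day length to change with season.

```python
def basis_functions(n_days, sunrise_hour, sunset_hour):
    """Return hourly (constant, daytime_only) basis shapes for one month.

    Both shapes are normalised to a monthly mean of 1, so the daytime-only
    shape is larger during daylight when the day is shorter.
    """
    n_hours = 24 * n_days
    daylight = [sunrise_hour <= (h % 24) < sunset_hour for h in range(n_hours)]
    frac = sum(daylight) / n_hours  # proportion of daylight hours
    constant = [1.0] * n_hours
    daytime = [1.0 / frac if lit else 0.0 for lit in daylight]
    return constant, daytime


def combined_flux(const_flux, day_flux, constant, daytime):
    """Hourly flux implied by one constant and one daytime-only estimate."""
    return [const_flux * c + day_flux * d for c, d in zip(constant, daytime)]


# Hypothetical month: 30 days, daylight from 06:00 to 18:00.
constant, daytime = basis_functions(30, 6, 18)
# A constant source of 50 with a daytime flux of -50 (both as monthly means)
# gives a square cycle of +50 at night and -50 by day, with zero monthly mean.
flux = combined_flux(50.0, -50.0, constant, daytime)
```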

2.4. Data

[16] Our base network is similar to the current observing network with about 100 sites, the majority of which are flask sampling sites (Figure 2a). At locations where continuous CO2 measurements are available, concentration data sampled at 4-hourly intervals from the transport model simulation are used. The inversion requires a data uncertainty to be attached to each data value. This data uncertainty needs to provide a measure of how well we expect to be able to represent any data value given such limitations as a relatively low-resolution transport model, unrepresented spatial variability of sources, and measurement precision. Comparing estimated source biases in a synthetic data test with estimated source uncertainties can provide guidance in determining the magnitude of data uncertainties. For the base network sites, we determine the data uncertainties for the 4-hourly values from the high-frequency variability in concentration at that site. The concentration time series is fitted with a linear trend and two harmonics, and the standard deviation of the residuals (rsd) is calculated. The data uncertainty is set to 3 times the rsd, giving a range of data uncertainties of 0.3 to 9.6 ppm. For the flask locations, monthly mean concentration data are used in the inversion, consistent with most current inversion practice. To determine data uncertainties for the monthly mean values, the 4-hourly uncertainties would normally be reduced based on the number of samples contributing to the mean (in this case 6 × 30). However, in reality, flask samples are only taken approximately weekly, and so here the data uncertainties are determined assuming that four data values contributed to the monthly mean (i.e., we divide by $\sqrt{4} = 2$). This gives a range of data uncertainties from 0.2 to 4.9 ppm for the flask locations.
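The scaling steps in this paragraph can be sketched as follows, taking the residual standard deviation as given rather than performing the trend-plus-harmonics fit. The function names are hypothetical, and the 3.2 ppm rsd is simply the value implied by the upper end of the quoted 4-hourly range (9.6 ppm divided by 3).

```python
import math


def four_hourly_uncertainty(rsd_ppm, factor=3.0):
    """Data uncertainty for 4-hourly values: 3 times the residual s.d."""
    return factor * rsd_ppm


def monthly_mean_uncertainty(four_hourly_unc_ppm, n_effective=4):
    """Reduce a 4-hourly uncertainty by the square root of the number of
    samples assumed to contribute to the monthly mean."""
    return four_hourly_unc_ppm / math.sqrt(n_effective)


unc_4h = four_hourly_uncertainty(3.2)                 # about 9.6 ppm
unc_weekly = monthly_mean_uncertainty(unc_4h)         # four flask samples per month
unc_all = monthly_mean_uncertainty(unc_4h, 6 * 30)    # every 4-hourly value
```

Using four samples rather than 180 keeps the monthly uncertainty several times larger, which reflects the approximately weekly flask sampling described in the text.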

[17] In the Australian region, continuous CO2 data are available at Cape Grim, Tasmania (40.7°S, 144.7°E), and the nearest model grid point at 40.5°S, 146.3°E is used to represent this site (Figure 2b, solid circle). This is an ocean grid point since Tasmania is not resolved due to the coarse resolution of the transport model. The data uncertainty applied to the 4-hourly data from this location is 2.2 ppm. Also available are flask measurements at Cape Ferguson (19.3°S, 147.1°E), and the offshore grid point (18.1°S, 151.9°E) is used to represent this coastal site (Figure 2b, star) since the flask samples are typically taken during onshore flow. A data uncertainty of 1.6 ppm and monthly mean data are used at this location.

[18] The base network is augmented by hypothetical sites (continental or offshore) in the Australian region (Figure 2b). The lowest model level, approximately 60 m above the surface, is used for each of these additional sites. For most of the inversions presented, the data uncertainties for these extra sites are set according to the method described for the base network. This results in “variable” data uncertainties that range from 0.9 to 6.1 ppm for the continental sites and 0.6 to 2.2 ppm for the offshore sites. Some tests are also performed with constant data uncertainties. In the “constant” case, all the Australian grid points and offshore points use a data uncertainty of 1.7 ppm, the average value of the variable data uncertainties across the sites.

3. Assessment of Inversion Method

3.1. Experiments

[19] The first set of experiments has been designed to assess the inversion method. For this reason, dense networks in and around Australia are used. All additional sites use 4-hourly data samples. Two networks are compared: The first (A) comprises the base network (described above) plus sites at all 44 Australian grid points; the second (O) comprises the base network plus the 40 grid points just off the coast of Australia. In the O case the Cape Ferguson grid point uses 4-hourly rather than monthly data. The two networks are shown in Figure 2b. For each network, two inversions are performed, one with the standard basis function set (S) and one with the extended set (E), which includes the daytime-only basis functions. We will use the codes AS, AE, OS, and OE to refer to the four experiments. All four experiments use the variable prior source uncertainties and the variable data uncertainties. The experiments are summarized in Table 1.

Table 1. Summary of Experiments
| Label | Basis set | Sites (a) | Data uncertainty | Min (ppm) | Mean (ppm) | Max (ppm) | Prior source uncertainty |
|---|---|---|---|---|---|---|---|
| Method assessment experiments | | | | | | | |
| AS | standard | 44 land | variable | 0.9 | 2.1 | 6.1 | variable |
| AE | extended | 44 land | variable | 0.9 | 2.1 | 6.1 | variable |
| OS | standard | 40 ocean | variable | 0.6 | 1.3 | 2.2 | variable |
| OE | extended | 40 ocean | variable | 0.6 | 1.3 | 2.2 | variable |
| Network design: addition of 1 site | | | | | | | |
| Network design: incremental optimization | | | | | | | |
| RIO-V | extended | 1–12 land | variable | 0.9 | | 6.1 | variable |
| RIO-C | extended | 1–12 land | constant | 1.7 | | 1.7 | constant |
| Network design: reference networks (b) | | | | | | | |
| 7 | extended | 7 (Figure 6) | variable | 1.0 | 2.2 | 5.2 | variable |
| 12low | extended | 12 (Figure 6) | variable | 0.9 | 1.6 | 3.6 | variable |
| 12high | extended | 12 (Figure 6) | variable | 1.3 | 2.9 | 6.1 | variable |
| AE, OE | as above | | | | | | |
| AE + OE | extended | 84 | variable | 0.6 | 1.7 | 6.1 | variable |

(a) In addition to the base network.

(b) Reference cases are also performed with constant data uncertainty (1.7 ppm) and constant prior source uncertainty.

3.2. Measures of Inversion Quality

[20] To assess the inversion method, we need to quantify both the bias and the uncertainty of the source estimates. The bias is defined as the difference between the retrieved sources and the known sources that were used to create the pseudodata. The uncertainty represents the standard deviation of the estimated flux distribution and is determined by the inversion, dependent on the prior source uncertainty, prior data uncertainty, and the extent to which air is sampled at the network from that source region. The uncertainty does not depend on the actual data values. We measure the quality of the inversion by the size of the source uncertainties; we require the source uncertainties to be as small as possible but larger than or comparable with the source biases. Law et al. [2003] provide a detailed discussion of the reasoning behind this definition and how the choice of data uncertainty can be used to influence the relative magnitude of the source uncertainty and bias.

[21] The bias and uncertainty can be defined at various temporal and spatial resolutions. In assessing the inversion method, we will consider the bias and uncertainty of the annual mean source estimates for the 12 Australian regions and the whole continent. We also construct, from the monthly source biases and uncertainty, a root mean square bias (RMSB) and uncertainty (RMSU) for each of the 12 Australian regions.
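The regional measures named above can be sketched as below, assuming twelve monthly values per region. The function names and the example numbers are illustrative only. The annual mean uncertainty is not computed here because it also depends on the covariances between months, which the inversion provides.

```python
import math


def annual_mean_bias(estimated, correct):
    """Mean over the year of (estimated - correct) monthly sources."""
    return sum(e - c for e, c in zip(estimated, correct)) / len(correct)


def rms(values):
    """Root mean square of a list of values."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def rmsb(estimated, correct):
    """Root mean square of the monthly source biases."""
    return rms([e - c for e, c in zip(estimated, correct)])


def rmsu(monthly_uncertainties):
    """Root mean square of the monthly source uncertainties."""
    return rms(monthly_uncertainties)


# Made-up example: monthly errors of +10 and -10 alternate through the year.
correct = [20.0 * m for m in range(12)]
estimated = [c + (10.0 if m % 2 == 0 else -10.0) for m, c in enumerate(correct)]
bias = annual_mean_bias(estimated, correct)  # errors cancel in the annual mean
monthly_rms_bias = rmsb(estimated, correct)  # but not in the RMS measure
```

The example shows why both measures are reported: monthly errors of opposite sign cancel in the annual mean bias while still contributing to the RMSB.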

3.3. Results and Discussion

3.3.1. Total Australian Continent

[22] The source estimates and uncertainties for the total Australian continent are shown in Figure 3 for the experiments with the A (land) and O (ocean) networks. In case AS, when only the standard basis functions are used, the sources are overestimated in all months. This is consistent with the bias expected due to diurnal “rectification”; a diurnal cycle of fluxes with a net source of zero will usually result in a positive daily mean surface concentration because the nighttime CO2 source is mixed into a shallower boundary layer than the daytime CO2 sink [Denning et al., 1996]. This positive mean surface concentration is misinterpreted in the AS inversion as a positive monthly mean source, and hence we find the overestimate compared to the correct sources. The annual mean bias is 229 MtC yr−1, which is much larger than the annual mean uncertainty of 10 MtC yr−1. The shape of the seasonal cycle is reasonably well reproduced. In case AE, with the extended basis function set, the source estimates are much closer to the correct sources. In this case the extended basis set allows a simple mean diurnal cycle of fluxes to be estimated and the diurnal rectification is no longer misinterpreted. This is confirmed by the annual mean bias, which is 6 MtC yr−1 and now less than the annual mean uncertainty of 12 MtC yr−1. This indicates that the use of daytime-only basis functions is one method for reducing biases associated with inverting concentration data with large diurnal cycles. The inversion produces a good fit to the concentration data, well within the chosen data uncertainties. It is the source bias, rather than the data fit, that prevents the use of smaller data uncertainties, as was discussed by Law et al. [2003].

Figure 3.

Estimated and correct sources for the Australian continent for the (a) AS, (b) AE, (c) OS, and (d) OE inversions. The uncertainty on the source estimates is shown by the shaded area. The units are MtC yr−1.

[23] The continental sources estimated with the O network are very similar whether the extended basis function set is used or not. Annual mean biases are 85 and 65 MtC yr−1 for the OS and OE cases, respectively. The annual mean uncertainties of 14 and 15 MtC yr−1 are slightly larger than in the A experiments because the offshore sites are outside the region being estimated. Biases larger than the uncertainties are indicative of aggregation error, which can be either spatial or temporal. Aggregation errors arise because the spatial and temporal patterns of source assumed by the basis functions will be different from those of the sources to be estimated. The aggregation errors are magnified as sampling becomes less homogeneous [Kaminski et al., 2001]. Here the temporal sampling is homogeneous so we anticipate small temporal aggregation errors. The spatial sampling is less homogeneous in the O inversions than the A ones, relative to the land regions that we are estimating. This implies that spatial aggregation errors will be larger in the O than the A cases, which is consistent with the larger source bias for the OE inversion compared to the AE inversion.

[24] The similarity in the OS and OE results indicates that when concentration data are used from outside the region with diurnally varying sources, there is no real benefit from using the extended basis function set. This is supported by comparing the response functions at the offshore locations from the all-day versus daytime-only basis functions. The responses are almost identical, indicating that there is sufficient mixing of air to the offshore locations that the time of day of source release becomes largely irrelevant. Response variability is instead dominated by synoptic variations.

3.3.2. Australian Subregions

[25] A similar pattern of results is seen in the source estimates for the 12 Australian subregions. Annual mean bias and root mean square monthly bias are plotted in Figure 4 for each region and each experiment. In experiment AS, four regions stand out with larger annual mean and RMS biases: NW, N, NE, and E3. These are all regions with large diurnal flux variability. The relationship between large bias and large diurnal flux variability is confirmed by the high correlation (0.90) between RMSB and the standard deviation of flux in a region. There is one exception: The SE region has a large flux variability but does not produce a large annual mean bias.

Figure 4.

(a) Annual mean bias (symbol) and uncertainty (error bar) and (b) RMS monthly bias (symbol) and RMS monthly uncertainty (horizontal line) for each Australian region for the AS (solid square), AE (open square), OS (solid star), and OE (open circle) inversions. The regions are identified by the labels shown in Figure 2b. The units are MtC yr−1.

[26] The annual mean and RMS biases in case AE are almost all smaller than the AS biases. In particular, the regions with much larger annual mean biases no longer stand out as being any worse than the other regions. The SE region gives a worse annual mean bias and an almost unchanged RMS bias compared to the AS case.

[27] The anomalous result for the SE region is due to the choice of data uncertainties for the three sites within this region. The data uncertainty was chosen to be proportional to the synoptic- and shorter-timescale variability in concentration at a site. This variability is very different for the three sites in the SE region resulting in data uncertainties of 1.6, 2.5, and 6.1 ppm (a much larger uncertainty range than is found in other regions). This results in much less weight being given to the data for the site with larger diurnal and synoptic variability. However, this site also has a larger seasonal cycle and larger annual mean than the other two sites, and so the lower weighting results in sources that underestimate the seasonal cycle and the annual mean. The underestimate in the annual mean corrects the bias due to the diurnal cycle in the AS case and results in the negative bias in the AE case. This is confirmed by performing an inversion where the data uncertainty at the three sites is set to the average value of 3.4 ppm. In this case the annual mean bias is 30 MtC yr−1 for the AS inversion and 4 MtC yr−1 for the AE inversion, which is comparable behavior to that produced by the other high source variability regions. Similarly, the RMSB behavior also now fits the pattern of other regions with the RMSB smaller (10 MtC yr−1) for the AE inversion than for the AS inversion (31 MtC yr−1). Using constant data uncertainties within the region makes the sampling more homogeneous, and this appears to reduce aggregation errors.

[28] The annual mean source uncertainties and RMSU are 14–36% larger, depending on the region, for the AE inversion relative to the AS inversion. The increase is due to the extra fluxes that are solved for in the AE inversion. The increase in uncertainty is relatively small because there are six samples per day that can contribute to the reduction in source uncertainty. The increase in uncertainty for the AE inversion relative to the AS inversion becomes larger when less frequent data are used. For example, if the data frequency at the Australian sites is reduced from 4-hourly to daily data, then the uncertainties are increased by 32–114% for the AE daily case relative to the AS daily case. We consider the 14–36% increase in the 4-hourly case acceptable, given the reduction in bias achieved.

[29] The OS and OE source biases and uncertainties are also shown in Figure 4. In general, there is less difference between the OS and OE inversions than there was between the AS and AE inversions. This confirms, at the regional scale, the result already shown for the total Australian region. The source uncertainties are generally larger for the O inversions than the A ones, especially for those regions (C1, C2, E2) without coincident offshore grid points. The NE and E1 regions also have larger uncertainties. This occurs because the winds in those regions are predominantly from the east, which means that the regions are rarely sampled by the offshore grid points to the east. The regions with larger uncertainties also tend to produce larger source biases in both the annual mean and the RMSB. The annual mean biases suggest that there is a misallocation of source between the E1, E2, and SE regions. While the O inversions gave acceptable results for the total Australian region, the subregion analysis suggests that better results are obtained by using data from land grid points, provided that the extended basis function set is used.

3.3.3. Diurnal Cycle

[30] The extended basis set allows an estimate to be made of the mean diurnal cycle for each month and region. Figure 5 shows the diurnal amplitude of the estimated sources from the AE inversion for four of the regions that have larger diurnal source variability. The diurnal amplitude is calculated as follows. For the correct sources, the mean diurnal cycle for each month is calculated first. The sources are then averaged separately for all hours for which the sources are larger ($\bar{S}_{+}$) or smaller ($\bar{S}_{-}$) than the monthly mean source. The diurnal amplitude is defined as half the difference between $\bar{S}_{+}$ and $\bar{S}_{-}$. This average diurnal amplitude is smaller than the peak amplitude but follows the same seasonal evolution. For the estimated sources, the diurnal amplitude is half the magnitude of the estimated source for the daytime-only basis function, divided by the proportion of daylight hours in the given month. (For a fixed monthly pulse of 1000 MtC yr−1, the amplitude is larger when the source is spread over fewer daylight hours.) The average and peak amplitudes are the same for the estimated diurnal cycle because the pulse is rectangular in time; the average amplitude is the appropriate comparison with the correct sources. We calculate the uncertainty on the diurnal amplitude from the uncertainty on the daytime-only source as given by the inversion using the same scaling as for the diurnal amplitude calculation.
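The two amplitude definitions in this paragraph can be sketched as follows. The function names are hypothetical, the 24-value mean diurnal cycle is made up, and the daytime source is assumed to be expressed as a monthly mean rate, in line with the fixed monthly pulse described in the text.

```python
def correct_diurnal_amplitude(mean_diurnal_cycle):
    """Half the difference between the mean of the hours above the monthly
    mean source and the mean of the hours below it."""
    mean = sum(mean_diurnal_cycle) / len(mean_diurnal_cycle)
    above = [f for f in mean_diurnal_cycle if f > mean]
    below = [f for f in mean_diurnal_cycle if f < mean]
    if not above or not below:
        return 0.0  # flat cycle: no diurnal amplitude
    return 0.5 * (sum(above) / len(above) - sum(below) / len(below))


def estimated_diurnal_amplitude(daytime_source, daylight_fraction):
    """Half the magnitude of the daytime-only source estimate, divided by
    the proportion of daylight hours in the month."""
    return 0.5 * abs(daytime_source) / daylight_fraction


# Made-up square cycle: +50 for the 12 night hours, -50 for the 12 day hours.
cycle = [50.0] * 6 + [-50.0] * 12 + [50.0] * 6
amp_correct = correct_diurnal_amplitude(cycle)
# The same cycle as a daytime-only source: a flux of -100 relative to night,
# applied for half of the hours, is a monthly mean daytime source of -50.
amp_estimated = estimated_diurnal_amplitude(-50.0, 0.5)
```

For a perfectly rectangular cycle the two definitions agree, which is the sense in which the average and peak amplitudes coincide for the estimated diurnal cycle.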

Figure 5.

Diurnal amplitude of correct (dashed line) and estimated (solid line) sources from the AE inversion for regions (a) N, (b) NE, (c) SE, and (d) E3. The shaded region represents the uncertainty on the diurnal amplitude. The units are MtC yr−1.

[31] Figure 5 shows that the seasonal changes in diurnal amplitude are generally well reproduced by the inversion estimates but that the amplitude is almost always underestimated. This is particularly the case for the SE region where the estimated amplitudes are less than half the correct values. This poor result is consistent with the other anomalous behavior for the SE region, which was traced to the variable weighting given to the three sites within this region. Here, as before, the results are improved when the inversion is run with a constant data uncertainty for these three sites; the diurnal amplitude is increased so that the correct amplitude lies within the uncertainty range for most months. Overall, we consider the ability to estimate the diurnal amplitude to be encouraging given the very simple basis functions used.

4. Network Design

4.1. Experiments

[32] The second set of experiments begins with the base network and considers where new sites should be located to provide optimal estimates of annual mean flux from each Australian subregion. First, one site is added to the network at each Australian grid point (continental and offshore) and inversions are performed with variable (VS) or constant (CS) prior source uncertainties and variable (VD) or constant (CD) data uncertainties. This gives four cases that will be identified by VS/VD, VS/CD, CS/CD, and CS/VD (see Table 1). This enables us to check how dependent the optimal site location is on inversion set-up. Further sites are then added to the VS/VD and CS/CD inversions, always seeking the next location that minimizes the annual mean flux uncertainty averaged across the 12 Australian subregions. Patra and Maksyutov [2002] found, for an annual-mean inversion, that this “incremental optimization” method for creating an optimal observing network was as effective as the simulated annealing method used by Rayner et al. [1996]. These cases are labeled IO-V and IO-C (Table 1).
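The incremental search described above can be sketched as a greedy loop. This is a toy illustration, not the inversion code used here: it assumes a linear Bayesian synthesis inversion in which the posterior covariance is the inverse of (G^T R^-1 G + P^-1), treats each candidate site as a single observation row of a hypothetical response matrix `G_all`, and ignores the monthly and daytime structure of the real source vector.

```python
import numpy as np

def mean_posterior_uncertainty(G, data_sigma, prior_sigma):
    """Mean posterior standard deviation across regions.

    G: (n_obs, n_regions) response matrix; data_sigma: (n_obs,);
    prior_sigma: (n_regions,). All names are assumptions of this sketch.
    """
    precision = np.diag(1.0 / prior_sigma**2)           # prior term P^-1
    if G.shape[0] > 0:
        precision = precision + G.T @ (G / data_sigma[:, None]**2)  # G^T R^-1 G
    cov = np.linalg.inv(precision)
    return float(np.sqrt(np.diag(cov)).mean())

def incremental_optimization(G_all, data_sigma, prior_sigma, n_new, base=()):
    """Add n_new sites one at a time, each minimizing the mean uncertainty."""
    chosen = list(base)
    history = []
    for _ in range(n_new):
        best_site, best_u = None, np.inf
        for site in range(G_all.shape[0]):
            if site in chosen:
                continue
            rows = chosen + [site]
            u = mean_posterior_uncertainty(G_all[rows], data_sigma[rows], prior_sigma)
            if u < best_u:
                best_site, best_u = site, u
        chosen.append(best_site)
        history.append((best_site, best_u))
    return history

rng = np.random.default_rng(0)
G_demo = rng.normal(size=(20, 4))                        # 20 candidate sites, 4 regions
print(incremental_optimization(G_demo, np.full(20, 1.0), np.full(4, 10.0), 3))
```

Because earlier choices are never revisited, a site picked early as a compromise stays in the network even when later additions make it largely redundant.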

[33] We do not know how well incremental optimization will perform in this situation. To provide some check on the generated networks, comparisons are made with results from some reference networks of different sizes and with networks generated by a restricted incremental optimization. The reference networks are additions to the base network (including Cape Grim). Table 1 lists the networks, and inversions are performed with each network with both variable (V) and constant (C) data and source uncertainties. Here “7” is seven additional sites located such that every grid point neighbors at least one site (Figure 6). Two networks of 12 additional sites are used (Figure 6). One site is placed in each region; the site is chosen as the location with lowest (12low) or highest (12high) data uncertainty. (The choice is based on the variable data uncertainties; in the constant data uncertainty case the same networks are used but there is no difference in data uncertainty.) The remaining three networks are the 40 offshore sites (the OE inversion described above), the 44 Australian sites (the AE inversion), and the combination of these two networks (AE + OE, 84 sites).

Figure 6.

Reference networks of 7 (solid square) and 12 sites with low data uncertainty (open circle) and high data uncertainty (cross). The base network sites, Cape Grim (solid circle) and Cape Ferguson (open star), are also shown.

[34] In the restricted incremental optimization (RIO), a search is made, as before, for each site in turn that gives the smallest annual mean flux uncertainty, but two restrictions are placed on the search. First, only land grid points are searched. Second, only one site is allowed to be placed in each region. Again, the inversions are performed with both variable and constant data and prior source uncertainties (see Table 1).
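The two restrictions can be expressed as a filter on the candidate list at each step of the search. The following is a minimal sketch; the data layout (a dictionary mapping a site identifier to a land flag and a region label) is hypothetical, and `chosen` refers only to sites already placed by the search.

```python
def allowed_candidates(candidates, chosen):
    """Return candidate sites permitted under the two RIO restrictions.

    candidates: dict site_id -> (is_land, region)  (hypothetical layout)
    chosen: site_ids already placed by the search
    """
    used_regions = {candidates[s][1] for s in chosen}
    return [
        s
        for s, (is_land, region) in candidates.items()
        if is_land                      # restriction 1: land grid points only
        and region not in used_regions  # restriction 2: one site per region
        and s not in chosen
    ]

demo = {"a": (True, "NW"), "b": (True, "NW"), "c": (False, None), "d": (True, "SE")}
print(allowed_candidates(demo, ["a"]))
```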

4.2. Measure of Improved Inversion

[35] When designing a network, we need some measure to determine whether a new site improves the inversion result. In choosing this measure, consideration is only made of the estimated source uncertainties, since these are independent of the actual concentration data set being inverted (whereas the source biases are not). This should allow the production of a network design that is applicable to a wider range of cases than just one particular concentration data set. It is necessary to choose one uncertainty metric to minimize. Here the average annual mean uncertainty across the 12 Australian regions ($\bar{U}$) is used, defined as

$$\bar{U} = \frac{1}{12} \sum_{n=1}^{12} U_n$$

where $U_n$ is the uncertainty on the annual mean source for each Australian region. (The annual mean uncertainty for individual regions is calculated within the inversion code and accounts for any covariances between the monthly source estimates.) We choose to minimize $\bar{U}$ since one of the primary tasks of an extended Australian network might be to detect changes in regional CO2 fluxes on annual to decadal timescales.
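To illustrate how covariances between monthly estimates enter the annual mean uncertainty for a region, and how the regional values are then averaged, here is a minimal Python sketch. It assumes equal weighting of the 12 months and a posterior covariance matrix per region; the function names are hypothetical and the calculation in the actual inversion code may differ in detail.

```python
import numpy as np

def annual_mean_uncertainty(monthly_cov):
    """Uncertainty on the annual mean from a (12, 12) monthly covariance matrix.

    Assumes equal month weights w = 1/12, so the variance is w^T C w.
    """
    C = np.asarray(monthly_cov, dtype=float)
    w = np.full(C.shape[0], 1.0 / C.shape[0])
    return float(np.sqrt(w @ C @ w))

def average_regional_uncertainty(monthly_covs):
    """Average of the annual mean uncertainties over all regions."""
    return float(np.mean([annual_mean_uncertainty(C) for C in monthly_covs]))

# Monthly uncertainty of 12 MtC/yr in both cases, differing only in correlation.
independent = np.eye(12) * 144.0           # uncorrelated months
correlated = np.full((12, 12), 144.0)      # fully correlated months
print(annual_mean_uncertainty(independent), annual_mean_uncertainty(correlated))
```

With uncorrelated months the annual uncertainty shrinks by a factor of sqrt(12) relative to the monthly value, whereas with fully correlated months it does not shrink at all, which is why the covariances cannot be ignored.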

4.3. Results for Addition of One Site

[36] Figure 7 shows the reduction in the average annual mean uncertainty, $\bar{U}$, when one extra site (using 4-hourly data) is added to the base network at each Australian (land or offshore) grid point in turn. The quantity plotted is the ratio of $\bar{U}$ for the new network to $\bar{U}$ for the base network so that the locations that give greatest reductions are those with the smallest ratios. Four cases are shown, these being the results from the four different inversion set-ups (VS/VD, VS/CD, CS/VD, CS/CD). Overall, the patterns of results are similar. All cases show that the larger reductions are achieved for sites in the northwest part of the continent. This is reasonable because the base network includes a site (Cape Grim) off the southeast coast which provides a good constraint on the SE region and some information about other southern and eastern regions (as was also found by Wang and McGregor [2003]). The NW region of the continent is not sampled by the base network, and adding a site there also provides some information about regions farther to the east, since this latitude predominantly experiences easterly winds.

Figure 7.

Decrease in uncertainty from adding a new station at each grid point on or around Australia to the base network. The decrease is shown as the ratio of the average annual mean uncertainty $\bar{U}$ for the new network relative to $\bar{U}$ for the base network for four inversion set-ups, (a) VS/VD, (b) CS/VD, (c) VS/CD, and (d) CS/CD.

[37] There are some differences in the patterns. The CS cases tend to have lower ratios overall. This is because the CS prior uncertainties are generally larger than the VS ones. Thus the addition of a site can have relatively greater impact. The CS cases also tend to extend the region of low ratio farther south than the VS cases. This is because the VS prior uncertainties are very small for the mostly desert regions in the western and central southern areas of the continent. Thus in the VS cases, there is less value in placing a site in these areas since they are already well constrained compared to other regions. By contrast, in the CS cases, the value of putting a site in this area is more determined by transport considerations (i.e., which location samples air from which regions) than by relative prior source uncertainties and hence the greater southward extent of the low ratios.

[38] The effects of changing data uncertainties are more subtle. In the VD cases, there is some favoring of offshore sites because these tend to have lower data uncertainty. Conversely, the CD cases produce lower ratios for locations in the N region since these locations have lower data uncertainties in the CD than in the VD set-up.

4.4. Results of Network Creation

[39] Figure 8 shows the networks created by incremental optimization for the IO-V and IO-C inversion set-ups and the corresponding networks created with the “one site per region” restriction (RIO cases). Figure 9 shows the reduction of the average annual mean uncertainty $\bar{U}$ with increasing network size along with $\bar{U}$ for the reference networks. Note that a log scale is used.

Figure 8.

Location of sites in networks created by incremental optimization for (a) IO-V inversion, (b) RIO-V case (one site per region), (c) IO-C inversion, and (d) RIO-C case (one site per region). The number indicates the order in which sites are added to the network.

Figure 9.

Average annual mean uncertainty ($\bar{U}$) for each network created using incremental optimization and each reference network for (a) V inversion set-up and (b) C inversion set-up. The line identification is given in the key (the larger AE, OE, and AE+OE networks are shown as horizontal lines rather than extending the x-axis to 40+ sites). The 12low network is indicated by the A, the 12high network by the B, and the 7 network by the C.

[40] We begin by discussing the values of the average annual mean uncertainty $\bar{U}$ for the reference networks since they demonstrate some general patterns. The OE network (all offshore points) performs relatively badly, the higher value in the constant case being due to the larger prior source uncertainties and the larger data uncertainties (1.7 ppm) than in the variable case (1.3 ppm average). The network with all Australian grid points (AE) gives a low $\bar{U}$. For this network the C case gives the smaller value because the data uncertainty for land points is smaller than for the V set-up (average of 2.1 ppm). The addition of the offshore points (AE + OE) gives a further small decrease in $\bar{U}$.

[41] The 7 network gives an average annual mean uncertainty $\bar{U}$ of 13.2 and 11.5 MtC yr−1 in the V and C cases, respectively. Again, the difference in $\bar{U}$ is consistent with the difference in data uncertainty (2.2 ppm for V compared to 1.7 ppm for C). The impact of data uncertainty is also clearly seen in the 12-site cases (12low, 12high). In the V case, 12high (with average data uncertainty of 2.9 ppm) gives a $\bar{U}$ of 7.9 MtC yr−1, almost 50% larger than 12low (5.4 MtC yr−1, data uncertainty 1.6 ppm). The 12low and 12high C cases (data uncertainty 1.7 ppm) give comparable $\bar{U}$ (5.5 and 5.7 MtC yr−1) to the low uncertainty V case. The small difference in $\bar{U}$ between the 12-site C cases appears to be transport related. The annual mean uncertainties on most regions are smaller when the site is placed on the west side of the region. A test inversion in which 12 sites were located on the western side of each region reduced $\bar{U}$ further to 5.0 MtC yr−1 with the C set-up.

[42] The incremental optimization with the IO-V inversion set-up produces the network shown in Figure 8a. The optimization places three sites offshore to take advantage of the lower data uncertainties at these locations. The early additions to the network are placed in locations which reduce the annual mean uncertainty for more than one region. The addition of seven sites is sufficient to give an average annual mean uncertainty $\bar{U}$ smaller than that achieved for all offshore grid points (the OE inversion) and 30% smaller than the 7-reference network. This latter result is largely driven by the data uncertainty; the average value for the optimal 7-site network is 1.4 ppm compared to 2.2 ppm for the 7-reference network. The 12-site network created by IO gives a slightly larger $\bar{U}$ than the 12low reference. This is a result of using an incremental approach to the optimization. The decision to place sites to reduce uncertainties on multiple regions early in the optimization limits the procedure later, when more sites become available. For example, it is unlikely that site 6 contributes much to the reduction in $\bar{U}$ once sites 9 and 10 have been included. It is worth noting that when a site is placed in a region it is almost always at the grid point with the lowest data uncertainty.

[43] The network for the RIO-V inversion (Figure 8b) shows only some similarity to the unrestricted case. Offshore points are no longer allowed, so the optimization starts with a site in central Australia rather than the northwest coastal site. Again, we can see evidence early in the optimization of locations that are designed to reduce uncertainties on two or more regions; for example, site 2 reduces the annual mean uncertainty for both the NW and N regions. The restricted IO case results in a larger average annual mean uncertainty $\bar{U}$ for networks with up to seven sites but smaller $\bar{U}$ for networks of 8 to 11 sites. Interestingly, with 12 sites, $\bar{U}$ is again larger than for the unrestricted IO network, possibly because of the inclusion of site 2. This site, chosen early as a compromise location, has the largest data uncertainty of the NW region grid points.

[44] Figures 8c and 8d show the networks generated with the C inversion set-up. Of most note here is that in the IO-C case, two sites are placed in the regions W and C1. The second site is being chosen primarily to constrain neighboring regions rather than the region itself. No sites are placed offshore because there is no longer any data uncertainty advantage in doing this. Both networks give a smaller average annual mean uncertainty $\bar{U}$ with six sites than the network of all offshore points, and both give similar $\bar{U}$ for seven sites as the 7-reference network. The similar results occur because all sites are given the same data uncertainty in this set-up. In the same way, all the 12-site networks give comparable $\bar{U}$. The RIO-C case gives smaller $\bar{U}$ than the unrestricted case for all networks larger than six sites. In these larger networks the unrestricted case is penalized for placing multiple sites in a region.

[45] In comparing the V and C results, the main determinant of differences in site placement is the different data uncertainties used. The impact of the different prior source uncertainties is harder to discern, but these differences may be the reason that the W, SW, and S regions are generally sampled earlier in the C cases than in the V cases. These regions have smaller prior uncertainties compared to other regions in the V set-up and so there is less benefit to the inversion in sampling in these regions. The initial “penalty” of larger prior source uncertainty in the C cases results in a larger average annual mean uncertainty $\bar{U}$ for the IO-C inversion than the IO-V inversion until the network size reaches 15 sites, when the generally lower data uncertainties for land points in the C case result in a slightly smaller $\bar{U}$ (4.77 MtC yr−1) for the IO-C inversion than the IO-V one (4.82 MtC yr−1).

[46] The four network cases have shown that any network design is dependent on the inversion set-up and on the method used to create the network. However, a number of general points can be inferred. It appears that incremental optimization is most useful for networks containing up to half the number of sites as the number of regions being solved for (in this case, six sites for 12 regions). Networks with more sites, created by IO, are unlikely to be optimal, although they may produce uncertainties close to the minimum possible uncertainty. In the V cases, the data uncertainty applied to each site has a much stronger influence on the site placement choices and the magnitude of the uncertainty than any differences in the transport of signal to each location, although this may be influenced by the relatively low resolution of the transport model.

[47] If the hypothesis is accepted that data uncertainty should be larger for sites with more variable concentration records, then this would imply that sites should be located in areas with lower variability; that is, they should be some distance from large flux regions rather than in the midst of them. Intuitively, this seems reasonable, as it allows for some mixing of fluxes before being sampled and would make the inversion less susceptible to errors associated with not being able to represent a given data record with whichever atmospheric transport model is being used. It does seem clear, though, that continued work is required to choose appropriate data uncertainties in a less ad-hoc manner than currently. The result shown earlier, that the SE region source estimates were less biased using constant rather than variable data uncertainties within that region, was initially unexpected but makes sense when we consider the impact of inhomogeneous sampling on aggregation error. Whether this finding about the use of constant data uncertainties is applicable to the real world is doubtful; it is unlikely that a region will be uniformly sampled in space, and so variable data uncertainties will continue to be required.

5. Conclusions

[48] Even though the work presented here has only considered an Australian test case, some general conclusions are appropriate. First, it is possible to achieve unbiased estimates of monthly mean fluxes, even when the fluxes vary on diurnal and synoptic timescales. However, it is necessary to solve for a mean daytime flux in addition to the standard constant monthly flux if land-based sites are included in the observing network. Solving for these extra fluxes increases the uncertainty on the estimates, but not sufficiently to negate the benefits of including the daytime fluxes. The uncertainties are at least an order of magnitude smaller than those obtained using standard inversions of monthly baseline data. An estimate of the mean diurnal amplitude of sources is also achieved. This could provide an indicator of the carbon fixing capacity of the biosphere.

[49] Second, incremental optimization is useful for finding good locations to add a few additional sites to a network, but the method may give misleading results as the number of additional sites approaches the number of regions being solved for. For these larger networks, the uncertainty on flux estimates and consequently the choice of site locations becomes strongly dependent on the data uncertainty applied to each site. Determining the most appropriate data uncertainty values requires further work.

[50] We can also draw some conclusions that are specific to the Australian region. Given the current location of continuous measurements at Cape Grim, most improvement to Australian source estimates comes from locating a new site in the northwest or central part of the continent. We find, to a rough approximation, that doubling the number of additional sites results in a halving of the average regional annual mean uncertainty. Thus a network of approximately six extra sites could achieve regional uncertainties of 10 MtC yr−1 and a continental uncertainty of around 30 MtC yr−1 while 12 sites could achieve regional uncertainties of about 5 MtC yr−1 and a continental uncertainty of around 18 MtC yr−1. These values can be compared to the 2000 Australian anthropogenic emissions, which were estimated from inventories to be 104 MtC yr−1. Thus our continental uncertainty for a 12-site network is under 20% of the anthropogenic continental emissions.

[51] It is also important to note that the estimated uncertainty values are somewhat arbitrary since they scale with the data uncertainty applied to each site. Currently, the data uncertainties are chosen, in part, to ensure that any bias in the source estimates is not significantly larger than the uncertainty. The biases may be reduced if, for example, the inversion solved for more regions, or more care was taken to make regions homogeneous (unlike the SE region in the current set-up). This could then allow for a reduction in data uncertainty and a consequent reduction in estimated source uncertainty.


[52] We thank Paul Steele for helpful comments on this manuscript.