3.1 GEOS-Chem Model and A Priori Emissions
We use the GEOS-Chem CTM v9-01-02 (http://acmg.seas.harvard.edu/geos/index.html) as the forward model for the inversion. GEOS-Chem is driven by GEOS-5 meteorological data from the NASA Global Modeling and Assimilation Office. The GEOS-5 data have 1/2° latitude × 2/3° longitude horizontal resolution and 6 h temporal resolution (3 h for surface variables and mixing depths). Here we use the native 1/2° × 2/3° resolution for GEOS-Chem over North America and adjacent oceans (10–70°N, 40–140°W), with 3 h dynamic boundary conditions from a global simulation with 4° × 5° resolution. This nested North American functionality of GEOS-Chem has been used previously in a number of air quality studies including extensive evaluation with observations [Park et al., 2004, 2006; L. Zhang et al., 2011, 2012; Y. Zhang et al., 2012; van Donkelaar et al., 2012]. These show a good simulation of regional transport with no apparent biases.
The GEOS-Chem methane simulation was originally described by Wang et al. and updated by Pickett-Heaps et al. [2011]. The main methane sink is tropospheric oxidation by OH, computed using a 3-D archive of monthly average OH concentrations from a GEOS-Chem simulation of tropospheric chemistry [Park et al., 2004]. The mean mass-weighted tropospheric OH concentration is 10.8 × 10⁵ molecules cm−3. Additional minor sinks for methane are soil absorption [Fung et al., 1991] and oxidation in the stratosphere. We use stratospheric methane loss frequencies archived from the NASA Global Modeling Initiative model [Considine et al., 2008; Allen et al., 2010] as described by Murray et al. The resulting global mean atmospheric lifetime of methane is 8.9 years, and the lifetime against oxidation by tropospheric OH is 9.9 years. Model intercomparisons in the literature give corresponding values of 8.6 ± 1.2 years and 9.8 ± 1.6 years [Voulgarakis et al., 2013]. Prather et al. estimate corresponding values of 9.1 ± 0.9 years and 11.2 ± 1.3 years from observational constraints.
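As a rough consistency check on the lifetimes quoted above, note that loss frequencies from parallel sinks add, so the two numbers together imply a lifetime against the minor sinks (soil uptake plus stratospheric loss). The sketch below assumes both lifetimes are defined against the same total atmospheric burden; the function name is illustrative and not part of GEOS-Chem.

```python
def residual_lifetime(total_years, partial_years):
    """Lifetime against the remaining sinks, given the overall lifetime and
    the lifetime against one sink (loss frequencies add: 1/t = sum of 1/t_i)."""
    residual_frequency = 1.0 / total_years - 1.0 / partial_years
    if residual_frequency <= 0.0:
        raise ValueError("partial lifetime must exceed the total lifetime")
    return 1.0 / residual_frequency


# Values quoted in the text: 8.9 years overall, 9.9 years against tropospheric OH.
minor_sink_lifetime = residual_lifetime(8.9, 9.9)
oh_share = 8.9 / 9.9  # fraction of the total loss carried by tropospheric OH

print(f"lifetime against minor sinks: {minor_sink_lifetime:.0f} years")
print(f"share of loss by tropospheric OH: {oh_share:.0%}")
```

Under that assumption the minor sinks correspond to a lifetime of roughly 88 years, and tropospheric OH accounts for about 90% of the total loss.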
For the a priori emissions we use the 2004 anthropogenic inventory from Emission Database for Global Atmospheric Research (EDGAR) v4.2 with 0.1° × 0.1° resolution and no seasonality [European Commission, Joint Research Centre/Netherlands Environmental Assessment Agency, 2009]. Natural sources include temperature-dependent emissions from wetlands [Kaplan, 2002; Pickett-Heaps et al., 2011], termites [Fung et al., 1991], and daily Global Fire Emissions Database Version 3 open fire emissions [van der Werf et al., 2010; Mu et al., 2010]. Figure 3 shows total methane emissions for North America and the contributions from the five largest source types.
Figure 3. North American methane emissions used as a priori for the inversion: (top left) total emissions and contributions from the major source types. Inventories are from Kaplan [2002] and Pickett-Heaps et al. [2011] for wetlands and from EDGAR v4.2 for all other (anthropogenic) sources. Values are averages for 22 June to 14 August 2004. Annual emission rates for 2004 (Tg a−1) are shown inset for the North America domain as encompassed by the figure.
Table 1 lists U.S. anthropogenic emission totals by source type in the EDGAR v4.2 and EPA inventories (the EPA inventory is available only as a national total). Total U.S. anthropogenic emissions from EDGAR v4.2 and EPA are 25.8 and 28.3 Tg a−1, respectively. EDGAR v4.2 and EPA give similar estimates for emissions by source type, except for oil and gas and coal mining. EDGAR reports oil and gas emissions of 6.3 Tg a−1, 30% lower than the EPA estimate of 9.0 Tg a−1. It reports U.S. coal mining emissions of 3.9 Tg a−1, 40% higher than the EPA estimate of 2.7 Tg a−1.
Table 1. U.S. Fluxes of Methane in 2004 [Tg a−1]
| Source Type | EPA^a | EDGAR v4.2^b | Miller et al.^c | This Work^d |
| Total | | | 47.2 ± 1.9 | 37.0 ± 1.4 |
| Anthropogenic | 28.3 (24.6, 32.3) | 25.8 | 44.5 ± 1.9 | 30.1 ± 1.3 |
| Livestock | 8.8 (7.7, 10.4) | 8.5 | 16.9 ± 6.7 | 12.2 ± 1.3 |
| Natural Gas and Oil | 9.0 (7.2, 13.4) | 6.3 | | 7.2 ± 0.6 |
| Landfills | 5.4 (2.5, 7.9) | 5.3 | | 5.8 ± 0.3 |
| Coal Mining | 2.7 (2.3, 3.2) | 3.9 | | 2.4 ± 0.3 |
| Other^e | 2.4 (1.4, 4.2) | 1.9 | | 2.5 ± 0.2 |
| Natural^f | | | 2.7 | 6.9 ± 0.5 |
Figure 4 shows surface air methane concentrations from the global and nested GEOS-Chem simulations with a priori emissions as described above, compared to observations from the NOAA Global Monitoring Division network (http://www.esrl.noaa.gov/gmd/). Boundary concentrations for the nested grid are archived at the edge of the North America domain. Comparison of GEOS-Chem with the NOAA data over the remote oceans shows that the model simulates realistic latitudinal gradients. This is further supported by comparison to aircraft observations over the Pacific from the High-performance Instrumented Airborne Platform for Environmental Research (HIAPER) Pole-to-Pole Observations (HIPPO) campaign [Wofsy et al., 2011; Turner et al., 2013].
Figure 4. Methane concentrations in surface air averaged over the inversion period (22 June to 14 August 2004). The GEOS-Chem simulation with a priori sources (background) is compared to NOAA GMD observations (circles). (left) Global simulation at 4° × 5° resolution used to archive a priori boundary concentrations for the nested simulation. (right) Nested simulation at 1/2° × 2/3° resolution for the North America domain. The thick line represents the boundaries for the nested simulation. Note the difference in scale between panels. The NOAA GMD data were obtained from http://www.esrl.noaa.gov/gmd/.
3.2 Inversion Method
We seek to use the SCIAMACHY observations over North America to optimize methane emissions on the 1/2° × 2/3° GEOS-Chem grid. Consider the ensemble of SCIAMACHY observations (column mean methane mixing ratios) assembled into an observation vector y. We assemble the gridded emissions and the gridded boundary conditions for GEOS-Chem into a state vector x. Let F represent GEOS-Chem serving as forward model for the inversion. We have
$$\mathbf{y} = F(\mathbf{x}) + \boldsymbol{\varepsilon} \qquad (3)$$
where ε is the observational error and includes contributions from forward model error, representation error (sampling mismatch between observations and the model), and measurement error. Error statistics are represented by the observational error covariance matrix $\mathbf{S}_O = E[\boldsymbol{\varepsilon}\boldsymbol{\varepsilon}^T]$, where E[ ] is the expected value operator.
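Bayesian optimization weighs the constraints on x from the SCIAMACHY observations with the a priori estimates xA (error covariance matrix SA). Applying Bayes' theorem and assuming Gaussian errors leads to an optimized estimate for x by minimizing the cost function J(x) [Rodgers, 2000]:
$$J(\mathbf{x}) = (F(\mathbf{x}) - \mathbf{y})^T \mathbf{S}_O^{-1} (F(\mathbf{x}) - \mathbf{y}) + (\mathbf{x} - \mathbf{x}_A)^T \mathbf{S}_A^{-1} (\mathbf{x} - \mathbf{x}_A) \qquad (4)$$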
Minimization of J(x) is done with the GEOS-Chem adjoint model, developed by Henze et al. and previously applied to methane source optimization by Wecht et al. [2012]. The adjoint calculates ∇xJ(xA), passes it to a steepest-descent algorithm that returns an improved estimate x1 for x, calculates ∇xJ(x1), and iterates until convergence to find ∇xJ(x) = 0. We describe below in more detail the different components of the inversion.
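The iteration described above can be illustrated on a toy problem. The sketch below is not the GEOS-Chem adjoint: it assumes a small linear forward model F(x) = Kx with diagonal error covariances, assumes the standard Gaussian cost function J(x) = (F(x) − y)ᵀ SO⁻¹ (F(x) − y) + (x − xA)ᵀ SA⁻¹ (x − xA), and computes the gradient analytically where the real system would use the adjoint. All names and numbers are hypothetical.

```python
def cost_and_gradient(x, x_a, y, K, var_obs, var_prior):
    """Cost J(x) and its gradient for a toy linear forward model F(x) = K x.

    Error covariances are diagonal; var_obs and var_prior hold the variances.
    """
    n_obs, n_state = len(y), len(x)
    # Model-minus-observation residuals, F(x) - y
    resid = [sum(K[i][j] * x[j] for j in range(n_state)) - y[i] for i in range(n_obs)]
    # Departure from the a priori, x - x_A
    dev = [x[j] - x_a[j] for j in range(n_state)]
    cost = (sum(r * r / v for r, v in zip(resid, var_obs))
            + sum(d * d / v for d, v in zip(dev, var_prior)))
    grad = [2.0 * sum(K[i][j] * resid[i] / var_obs[i] for i in range(n_obs))
            + 2.0 * dev[j] / var_prior[j]
            for j in range(n_state)]
    return cost, grad


def steepest_descent(x_a, y, K, var_obs, var_prior, step=0.03, tol=1e-12, max_iter=10000):
    """Start at the a priori and step down the gradient until it vanishes."""
    x = list(x_a)
    for _ in range(max_iter):
        _, grad = cost_and_gradient(x, x_a, y, K, var_obs, var_prior)
        if sum(g * g for g in grad) < tol:
            break
        x = [xj - step * gj for xj, gj in zip(x, grad)]
    return x


# Hypothetical toy problem: 3 observations, 2 state elements (not from the paper).
K = [[1.0, 0.5], [0.2, 1.0], [1.0, 1.0]]
x_true = [1.5, 0.7]
y = [sum(k * xt for k, xt in zip(row, x_true)) for row in K]
x_a = [1.0, 1.0]
var_obs = [1.0, 1.0, 1.0]
var_prior = [0.09, 0.09]  # 30% a priori standard deviation

x_hat = steepest_descent(x_a, y, K, var_obs, var_prior)
cost_prior, _ = cost_and_gradient(x_a, x_a, y, K, var_obs, var_prior)
cost_post, grad_post = cost_and_gradient(x_hat, x_a, y, K, var_obs, var_prior)
print("optimized state:", x_hat)
print("cost before and after:", cost_prior, cost_post)
```

The fixed step size is chosen by hand for this toy problem; a production minimizer would normally use a line search or a quasi-Newton update instead.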
The ability of the inversion to constrain methane emissions over North America is contingent on the model variability being driven by these emissions. Starting from initial conditions, we find that it takes about a week for variability of methane columns over North America in the nested model to be driven by fresh emissions and boundary conditions (as opposed to the initial conditions). We therefore initialize our simulation on 22 June 2004, 9 days prior to assimilating the first observations on 1 July. The inversion period over which we solve for emissions is 22 June to 14 August; observations are assimilated from 1 July to 14 August. The lifetime of methane against oxidation by OH is sufficiently long to play no significant role in the variability of methane concentrations over the North America domain. Prescribed OH concentrations used in the model are therefore of no significant consequence to the inversion results.
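We attempted at first to optimize North American emissions and boundary conditions as a single state vector in the inversion. This was not successful because boundary conditions have a much larger impact in determining methane concentrations, even if they are less important for determining variability. We therefore iteratively minimize two separate cost functions, J(xB) and J(xE), to optimize boundary concentrations and emissions, respectively:
$$J(\mathbf{x}_B) = (F(\mathbf{x}_B) - \mathbf{y})^T \mathbf{S}_O^{-1} (F(\mathbf{x}_B) - \mathbf{y}) + (\mathbf{x}_B - \mathbf{x}_{B,A})^T \mathbf{S}_{B,A}^{-1} (\mathbf{x}_B - \mathbf{x}_{B,A}) \qquad (5)$$
$$J(\mathbf{x}_E) = (F(\mathbf{x}_E) - \mathbf{y})^T \mathbf{S}_O^{-1} (F(\mathbf{x}_E) - \mathbf{y}) + (\mathbf{x}_E - \mathbf{x}_{E,A})^T \mathbf{S}_{E,A}^{-1} (\mathbf{x}_E - \mathbf{x}_{E,A}) \qquad (6)$$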
Here the state vectors are xB, scale factors of boundary concentrations at the edge of the North American model domain relative to the a priori, and xE, logarithms of scale factors of methane emissions relative to the a priori within the North America domain. We optimize the logarithms of the emission scale factors to ensure positivity in the optimized emissions. A priori values for xB and xE are labeled xB,A and xE,A, respectively, and the corresponding a priori error covariance matrices are SB,A and SE,A.
Each element of xB represents a temporally averaged scale factor applied to a 4° × 5° grid cell along the boundaries of the North American model domain and extending over 47 vertical levels, for a total of 3290 elements. A priori boundary concentrations are specified from the global GEOS-Chem simulation with a priori emissions (shown in Figure 4). The a priori error covariance matrix SB,A is constructed using error statistics from HIPPO-GEOS-Chem comparisons over the central Pacific presented by Turner et al. [2013]. The diagonal is populated with a model error standard deviation of 16 ppb (0.9%), and off-diagonal terms are parameterized with exponential error correlation length scales of 275 km in the horizontal and 78 hPa in the vertical [Wecht et al., 2012]. We assume that the above error statistics apply to all four boundaries.
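One way to build a covariance matrix of this kind is sketched below. The text specifies only the standard deviation and the two correlation length scales, so the separable product of a horizontal and a vertical exponential is an assumption, as are the one-dimensional along-boundary distance and the function name. The variance is kept in ppb² for readability, whereas the actual state vector holds scale factors.

```python
import math


def boundary_prior_covariance(distance_km, pressure_hpa, sigma_ppb=16.0,
                              length_km=275.0, length_hpa=78.0):
    """Covariance with exponentially decaying correlations (assumed separable).

    distance_km: along-boundary position of each element
    pressure_hpa: pressure level of each element
    """
    n = len(distance_km)
    cov = [[0.0] * n for _ in range(n)]
    for a in range(n):
        for b in range(n):
            dh = abs(distance_km[a] - distance_km[b])
            dp = abs(pressure_hpa[a] - pressure_hpa[b])
            cov[a][b] = sigma_ppb ** 2 * math.exp(-dh / length_km) * math.exp(-dp / length_hpa)
    return cov


# Three hypothetical boundary elements: two neighbors on one level, one directly above.
cov = boundary_prior_covariance([0.0, 275.0, 0.0], [900.0, 900.0, 822.0])
for row in cov:
    print([round(value, 1) for value in row])
```

Elements separated by exactly one length scale, horizontally or vertically, end up with a correlation of 1/e, and the diagonal carries the 16 ppb variance.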
Each element xE,i,j of xE represents the logarithm of a temporally averaged scale factor applied to total emissions from each 1/2° × 2/3° emitting grid cell (i,j) in North America, for a total of 7906 elements. It is expressed as follows:
$$x_{E,i,j} = \ln\left(\frac{E_{i,j}}{E_{A,i,j}}\right) \qquad (7)$$
where Ei,j is the true emission flux and EA,i,j is the a priori emission flux described above.
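A minimal sketch of this change of variable is given below, with hypothetical function names and flux values. It shows why optimizing the logarithm of the scale factor guarantees positive emissions: the a priori maps to a state of zero, and any real-valued state maps back to a positive flux.

```python
import math


def to_state(emission, prior_emission):
    """State-vector element: logarithm of the emission scale factor."""
    return math.log(emission / prior_emission)


def to_emission(state, prior_emission):
    """Invert the transform; exp() keeps the emission positive for any state."""
    return prior_emission * math.exp(state)


prior = 2.0  # hypothetical a priori flux, arbitrary units
print("a priori state:", to_state(prior, prior))
for state in (-3.0, 0.0, 0.3):
    print(state, to_emission(state, prior))
```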
The a priori error covariance matrix for the emissions, SE,A, is constructed by assuming a uniform relative error standard deviation of 30% for emissions from each model grid cell and no a priori error correlations (diagonal matrix). The sensitivity of the optimized solution to the specification of a priori error will be discussed later by considering an inversion without a priori constraints.
The observational error covariance matrix SO includes contributions from representation error, measurement error, and GEOS-Chem model error [Heald et al., 2004]. Representation error is assumed to be negligible because SCIAMACHY XCH4 observations have horizontal footprints (30 km × 60–120 km) comparable to the size of GEOS-Chem grid cells. We use reported IMAP v5.5 values for the measurement error (standard deviation 30.2 ppb or 1.7%) since these are consistent with our INTEX-A validation (section 2). GEOS-Chem comparison to HIPPO vertical profiles across the Pacific indicates a model error standard deviation of 16 ppb for methane column mixing ratios, and we assume that this holds for North America too. All errors are assumed to be Gaussian and are added in quadrature to calculate the observational error for each observation. We do not include error correlation between observations since the overall observational error variance is dominated by the measurement error for which no correlation is found in the validation presented above.
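The quadrature sum described above can be worked through directly with the two standard deviations quoted in the text, taking the representation error as zero as assumed there. The variable names are illustrative.

```python
import math

measurement_sd_ppb = 30.2    # reported IMAP v5.5 measurement error
model_sd_ppb = 16.0          # GEOS-Chem model error from the HIPPO comparison
representation_sd_ppb = 0.0  # assumed negligible in the text

# Independent Gaussian errors add in quadrature.
observational_sd_ppb = math.sqrt(measurement_sd_ppb ** 2
                                 + model_sd_ppb ** 2
                                 + representation_sd_ppb ** 2)
measurement_share = measurement_sd_ppb ** 2 / observational_sd_ppb ** 2

print(f"observational error: {observational_sd_ppb:.1f} ppb")
print(f"share of variance from measurement error: {measurement_share:.0%}")
```

With these inputs the total is about 34.2 ppb, and measurement error contributes roughly 78% of the variance, which is the sense in which it dominates.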
The iterative optimization is implemented as follows. First, we perform five adjoint iterations to reduce J(xB). We then use the updated values of xB to calculate J(xE) and perform five iterations to reduce J(xE). We use the updated values of xE to recalculate J(xB) and repeat. When the reduction of the cost function at each iteration becomes small (0.5% of the cumulative cost function reduction up to that point), after 40 iterations, we hold xB constant and iteratively solve $\nabla_{\mathbf{x}_E} J(\mathbf{x}_E) = 0$. Optimization of xB corrects background methane for the inversion and is of peripheral interest here. We focus our discussion on the optimization of xE.
Figure 5 shows the results from the inversion described above as optimized correction factors to the a priori methane emissions at 1/2° × 2/3° horizontal resolution. Correction factors are weak, less than 30% for 93% of grid cells. This is because the observations have insufficient information to constrain emissions at that resolution. As the discretization of emissions becomes finer, the observations become less sensitive to emissions from each grid cell. The inversion therefore has less ability to pull emissions in each grid cell away from their a priori value, and the optimal solution will be more tightly constrained by the a priori. This can be seen quantitatively from the minimization of (6):
$$\nabla_{\mathbf{x}_E} J(\mathbf{x}_E) = 2 (\nabla_{\mathbf{x}_E} F)^T \mathbf{S}_O^{-1} (F(\mathbf{x}_E) - \mathbf{y}) + 2 \mathbf{S}_{E,A}^{-1} (\mathbf{x}_E - \mathbf{x}_{E,A}) = \mathbf{0} \qquad (8)$$
where $\nabla_{\mathbf{x}_E} F$ is the Jacobian matrix of the forward model. As the dimension of xE increases, the Jacobian matrix values become smaller, and thus the individual terms of $(\nabla_{\mathbf{x}_E} F)^T \mathbf{S}_O^{-1} (F(\mathbf{x}_E) - \mathbf{y})$ decrease in magnitude as $(\nabla_{\mathbf{x}_E} F)^T$ distributes $\mathbf{S}_O^{-1} (F(\mathbf{x}_E) - \mathbf{y})$ over a larger number of state vector elements. By contrast, the magnitude of individual terms of $\mathbf{S}_{E,A}^{-1} (\mathbf{x}_E - \mathbf{x}_{E,A})$ does not change. Thus, the a priori increases in importance relative to the observations.
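The argument above can be made concrete with a deliberately simple toy case, which is an illustration and not the configuration used in the paper. Assume a single observation y that is sensitive to the mean of n scale factors (so every Jacobian entry is 1/n), an a priori value of 1 for each, and uncorrelated a priori errors. Setting the gradient of J = (mean(x) − y)² / var_obs + Σ (x_j − 1)² / var_prior to zero then gives a closed-form optimum, which is the same for every element by symmetry.

```python
def posterior_scale_factor(n_cells, y=1.5, var_obs=0.01, var_prior=0.09):
    """Optimal (uniform) scale factor when one observation y constrains the
    mean of n_cells scale factors, each with a priori value 1.

    Solves (x - y) / (n_cells * var_obs) + (x - 1) / var_prior = 0 for x.
    All default values are hypothetical.
    """
    return (y * var_prior + n_cells * var_obs) / (var_prior + n_cells * var_obs)


for n in (1, 10, 100):
    print(n, round(posterior_scale_factor(n), 3))
```

As n grows, the optimum moves from close to the observed value (1.45 for n = 1) toward the a priori value of 1 (about 1.04 for n = 100), even though the observation itself is unchanged.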
Figure 5. Emission scale factors relative to the a priori (Figure 3, top left) from inversions optimizing emissions (left) for the 1/2° × 2/3° native resolution of GEOS-Chem and (right) for 1000 clustered regions. Gray areas (ocean/ice) are not included in the state vector for the inversion.
The problem could be mitigated by accurately specifying error correlations in the a priori or by imposing them in the solution, as is done in geostatistical inversions [Michalak et al., 2004], but there is little confidence to be had in the specification of error correlations for methane sources. It could also be avoided altogether by optimizing grid cell fluxes rather than scaling factors (equation (7)) in the inversion, but this would require specification of absolute rather than relative errors for each grid cell.
We opted therefore to reduce the dimension of our emission state vector by clustering of grid cells, taking advantage of the results from the native resolution inversion (Figure 5) to group together neighboring grid cells with similar emission scale factors and thus minimize the aggregation error associated with clustering. We tried successively smaller numbers of clusters and repeated the inversion in the same manner described above for the native resolution inversion, seeking to find the best number of clusters for the inversion as measured by the fit to observations. As we initially decrease the number of clusters starting from the native resolution, we can expect an improved fit of the inversion results to the observations for the reasons discussed above. However, as the spatial resolution of the state vector becomes too coarse (too few clusters), the fit to observations degrades because of aggregation error.
We use a hierarchical clustering algorithm [Johnson, 1967] as a data-driven aggregation technique to optimally define clusters from the native resolution emissions grid. The algorithm initially assigns each 1/2° × 2/3° grid cell to its own region, calculates the “distance” to all other regions, and joins the two most similar. Distance is calculated as follows. We define the location for a region l by the vector $\mathbf{v}_l = (\mathbf{p}, 0.05\,s)^T$, where p is the location of the region centroid on a sphere and s is the mean value of the optimized scale factor from the native resolution inversion presented in Figure 5. All variables are normalized to unit variance and zero mean. The factor 0.05 was selected to adjust the weight of scale factors relative to geographic distance. The distance between two regions l and m is calculated as the norm $\lVert \mathbf{v}_l - \mathbf{v}_m \rVert$. The process of joining the two most similar regions proceeds iteratively, reducing the number of regions by one during each step. The algorithm can be stopped at any stage so that any number of clusters can be constructed.
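The procedure can be sketched in a few lines for a handful of cells. This is a simplified illustration under stated assumptions, not the code used for the inversion: positions on the sphere are represented by Cartesian unit vectors, a region's vector is taken as the mean of its member cells, and the pairwise search is brute force, which is only practical for small inputs. Function names and the sample cells are hypothetical.

```python
import math


def normalize(values):
    """Shift to zero mean and scale to unit variance."""
    mean = sum(values) / len(values)
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values)) or 1.0
    return [(v - mean) / sd for v in values]


def cell_vectors(lats_deg, lons_deg, scale_factors, weight=0.05):
    """Per-cell vector (x, y, z, weight * s), each component normalized first."""
    lat = [math.radians(v) for v in lats_deg]
    lon = [math.radians(v) for v in lons_deg]
    xs = [math.cos(a) * math.cos(b) for a, b in zip(lat, lon)]
    ys = [math.cos(a) * math.sin(b) for a, b in zip(lat, lon)]
    zs = [math.sin(a) for a in lat]
    columns = [normalize(xs), normalize(ys), normalize(zs),
               [weight * v for v in normalize(scale_factors)]]
    return [list(row) for row in zip(*columns)]


def cluster(vectors, n_clusters):
    """Repeatedly join the two closest regions until n_clusters remain."""
    regions = [[i] for i in range(len(vectors))]

    def centroid(members):
        return [sum(vectors[i][k] for i in members) / len(members)
                for k in range(len(vectors[0]))]

    while len(regions) > n_clusters:
        centers = [centroid(r) for r in regions]
        best = None
        for a in range(len(regions)):
            for b in range(a + 1, len(regions)):
                d = math.dist(centers[a], centers[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        regions[a] = regions[a] + regions[b]  # merge region b into region a
        del regions[b]
    return regions


# Six hypothetical cells in two geographic groups with different scale factors.
lats = [30.0, 30.5, 31.0, 50.0, 50.5, 51.0]
lons = [-100.0, -100.0, -100.0, -80.0, -80.0, -80.0]
factors = [1.0, 1.1, 0.9, 1.5, 1.4, 1.6]
regions = cluster(cell_vectors(lats, lons, factors), n_clusters=2)
print(regions)
```

Because the loop removes one region per step, stopping it at any count yields a valid partition, which is what allows the number of clusters to be varied freely.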
Figure 6 (black) shows the contribution of the observation term, $(F(\hat{\mathbf{x}}_E) - \mathbf{y})^T \mathbf{S}_O^{-1} (F(\hat{\mathbf{x}}_E) - \mathbf{y})$, to the optimized cost function for inversions performed using different numbers of clustered regions. Here $\hat{\mathbf{x}}_E$ is the optimal estimate from the inversion. We do not include the a priori term since it depends on the number of clusters used. The best results are achieved for 300–1000 clusters. As the number of clusters decreases from 7906 (native resolution) to 1000, the observations become more sensitive to elements in the state vector, producing a better model fit. As the number of clusters decreases below 300, aggregation error degrades the model fit. The range in the cost function for the different inversions is relatively small because the measurement error dominates for any individual data point. We use the inversion with 1000 clusters as our best estimate in terms of optimization and spatial detail. Figure 7 shows the 1000 clusters used in this analysis.
Figure 6. Sensitivity of inversion results to the resolution with which North American methane emissions are optimized from the SCIAMACHY data for 1 July to 14 August 2004. Resolution is expressed as the number of spatial clusters used in the inversion. The maximum of 7906 clusters represents the native 1/2° × 2/3° grid of GEOS-Chem. Optimal aggregation of grid cells based on proximity and emission correction tendencies yields successively smaller numbers of clusters. Black points show the observation term of the cost function, which describes the ability of the inversion to fit the SCIAMACHY observations. Red points show optimized U.S. anthropogenic emissions for each inversion.
Figure 7. Map of 1000 clusters used in the optimal inversion. Colors represent a unique index number for each cluster.
Figure 5 (right) shows the correction factors to the a priori methane emissions from the 1000 cluster inversion. Patterns are similar to the native resolution inversion (Figure 5, left), but correction factors are much larger, reflecting the stronger influence from the observations. Total U.S. anthropogenic emissions are only weakly sensitive to the number of clusters used. The variability of inversion results using different numbers of clusters will be used in section 4 to derive uncertainty estimates for our optimal emissions.
3.4 Evaluation With SCIAMACHY and INTEX-A Data
Figure 8 shows optimized emissions, calculated as the product of optimized correction factors and prior emissions in each grid cell. We checked for improvement of the model fit to the SCIAMACHY data by comparing GEOS-Chem simulations with optimized versus a priori emissions and boundary conditions. For this we calculated the GEOS-Chem-SCIAMACHY root-mean-square difference (RMSD) and correlation coefficient (R) for the ensemble of 1/2° × 2/3° grid cells with SCIAMACHY data, averaged over the 1 July to 14 August 2004 period and weighted by the number of SCIAMACHY observations in each grid cell. We find that the inversion reduces the model-observation RMSD from 11.6 to 9.7 ppb, while R increases from 0.65 to 0.76. This demonstrates improvement, limited by the random noise in the individual SCIAMACHY measurements.
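The two fit statistics used above can be computed as in the following sketch. The text does not spell out the exact weighting formulas, so the count-weighted definitions here are an assumption, and the sample grid-cell values are made up for illustration rather than taken from the SCIAMACHY comparison.

```python
import math


def weighted_rmsd(model, obs, weights):
    """Root-mean-square model-observation difference, weighted per grid cell."""
    total = sum(weights)
    return math.sqrt(sum(w * (m - o) ** 2 for m, o, w in zip(model, obs, weights)) / total)


def weighted_correlation(model, obs, weights):
    """Pearson correlation coefficient with per-grid-cell weights."""
    total = sum(weights)
    mean_m = sum(w * m for m, w in zip(model, weights)) / total
    mean_o = sum(w * o for o, w in zip(obs, weights)) / total
    cov = sum(w * (m - mean_m) * (o - mean_o) for m, o, w in zip(model, obs, weights)) / total
    var_m = sum(w * (m - mean_m) ** 2 for m, w in zip(model, weights)) / total
    var_o = sum(w * (o - mean_o) ** 2 for o, w in zip(obs, weights)) / total
    return cov / math.sqrt(var_m * var_o)


# Hypothetical grid-cell mean columns (ppb) and observation counts.
model_ppb = [1800.0, 1810.0, 1790.0]
obs_ppb = [1805.0, 1805.0, 1795.0]
n_obs = [10, 20, 5]

rmsd = weighted_rmsd(model_ppb, obs_ppb, n_obs)
r = weighted_correlation(model_ppb, obs_ppb, n_obs)
print(f"RMSD = {rmsd:.1f} ppb, R = {r:.2f}")
```

Weighting by the number of observations gives well-sampled grid cells more influence on both statistics than cells with only a few retrievals.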
Figure 8. Optimized North American methane emissions from the 1000 cluster inversion: (top left) total emissions and contributions from the major source types. The annual emission rate for 2004 (Tg a−1) is shown inset for the North America domain as encompassed by the figure.
We further used the boundary layer observations from INTEX-A (Figure 1) to provide verification of the inversion results. The model-observation RMSD for individual observations decreases from 33.5 to 28.5 ppb, while R increases slightly from 0.73 to 0.74. Here the improvement appears to be limited by small-scale model and representation error for individual observations. Averaging of the data allows us to reduce that error and is a more useful comparison. Figure 9 shows boundary layer (>850 hPa) GEOS-Chem-INTEX-A differences averaged on an 8° × 10° horizontal grid and for the INTEX-A period. The resulting model-observation RMSD weighted by the number of INTEX-A observations in each 8° × 10° grid cell decreases from 23.2 to 12.3 ppb when using optimized instead of a priori emissions. The correlation coefficient R increases from 0.69 to 0.88.
Figure 9. Evaluation of the SCIAMACHY inversion of methane emissions using INTEX-A aircraft data. The panels show the mean differences between GEOS-Chem and INTEX-A observations below 850 hPa and for 8° × 10° grid squares in the simulation (left) with a priori emissions and (right) with optimized emissions from the 1000 cluster inversion. A priori and optimized emission maps are shown in Figures 3 and 8. The model-observation root-mean-square difference and weighted correlation coefficient (R) are inset.
We performed sensitivity inversions to investigate the effects of a priori constraints on emissions and model bias. A native resolution inversion without a priori constraints on emissions shows similar signs and patterns of emission corrections to the inversion with a priori constraints, but the magnitudes of corrections are larger. Evaluation using INTEX-A data averaged into 8° × 10° regions as above does not show as good a fit to observations, with an RMSD of 14.5 ppb and R of 0.77. This indicates that the a priori inventory contributes useful information. A sensitivity inversion including a uniform positive bias correction of 15 ppb in GEOS-Chem on the basis of INTEX-A free tropospheric data shows negligible effect on the correction factors to emissions because most of the bias is absorbed by correction to the boundary conditions.