Observational constraints on the distribution, seasonality, and environmental predictors of North American boreal methane emissions


Correspondence to: S. M. Miller



Wetlands comprise the single largest global source of atmospheric methane, but current flux estimates disagree in both magnitude and distribution at the continental scale. This study uses atmospheric methane observations over North America from 2007 to 2008 and a geostatistical inverse model to improve understanding of Canadian methane fluxes and associated biogeochemical models. The results bridge an existing gap between traditional top-down, inversion studies, which typically emphasize total emission budgets, and biogeochemical models, which usually emphasize environmental processes. The conclusions of this study are threefold. First, the most complete process-based methane models do not always describe available atmospheric methane observations better than simple models. In this study, a relatively simple model of wetland distribution, soil moisture, and soil temperature outperformed more complex model formulations. Second, we find that wetland methane fluxes have a broader spatial distribution across western Canada and into the northern U.S. than represented in existing flux models. Finally, we calculate total methane budgets for Canada and for the Hudson Bay Lowlands, a large wetland region (50–60°N, 75–96°W). Over these lowlands, we find total methane fluxes of 1.8±0.24 Tg C yr−1, a number in the midrange of previous estimates. Our total Canadian methane budget of 16.0±1.2 Tg C yr−1 is larger than existing inventories, primarily due to high anthropogenic emissions in Alberta. However, methane observations are sparse in western Canada, and additional measurements over Alberta will constrain anthropogenic sources in that province with greater confidence.

1 Introduction

Atmospheric methane (CH4) is the second most important long-lived greenhouse gas, and since the preindustrial era, its radiative forcing has increased to 0.507 W m−2, approximately one third that of CO2 [Butler, 2012]. Therefore, greenhouse gas reduction strategies and future climate predictions will require accurate estimates of methane emissions. Total global emissions are constrained to approximately ±15% using observations of the global CH4 burden and rate of increase, combined with an estimate of the CH4 atmospheric lifetime [e.g., Kirschke et al., 2013]. However, uncertainties in emissions from individual source types can be greater than a factor of 2 [O'Connor et al., 2010; Dlugokencky et al., 2011; Melton et al., 2013]. For example, wetlands likely constitute the largest single source of atmospheric methane, but estimates of global fluxes vary from 60 to 213 Tg C yr−1 (80 to 284 Tg CH4 yr−1), meaning they comprise anywhere from 14 to 50% of the total budget [e.g., O'Connor et al., 2010; Melton et al., 2013; Bridgham et al., 2013; Kirschke et al., 2013; Ciais et al., 2013, and references therein]. Anthropogenic sources (e.g., fossil fuel extraction and processing, ruminants, and landfills), by comparison, likely account for 50–65% of total global emissions [Ciais et al., 2013; Kirschke et al., 2013]. Uncertainties in methane fluxes are larger at the regional scale; estimates of methane from the Hudson Bay Lowlands (HBL), a large boreal wetland region in Canada, range from 0.28 to 8.5 Tg C yr−1 [Roulet et al., 1992; Worthy et al., 2000; Pickett-Heaps et al., 2011; Melton et al., 2013].
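The paired figures above (Tg C yr−1 and Tg CH4 yr−1) differ only by the ratio of molar masses. The following minimal sketch reproduces the conversion; the function name is hypothetical, and the nominal molar masses of 12 and 16 g/mol are an assumption chosen because they reproduce the rounded values quoted in the text.

```python
# Nominal molar masses in g/mol (assumed; they reproduce the rounded values in the text).
M_C = 12.0
M_CH4 = 16.0

def tg_c_to_tg_ch4(tg_c):
    """Convert a methane flux from Tg C per year to Tg CH4 per year."""
    return tg_c * M_CH4 / M_C

low = tg_c_to_tg_ch4(60.0)    # lower bound of the global wetland range
high = tg_c_to_tg_ch4(213.0)  # upper bound of the global wetland range
print(low, high)
```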

The present study focuses on improving methane flux estimates from boreal wetlands. These regions are a particular concern because of their large soil carbon stocks. Methane fluxes in wetlands occur primarily in waterlogged, anaerobic soil conditions due to the decomposition of organic material by methanogenic Archaea. Boreal and arctic regions are far less productive than many other ecosystems but nonetheless play a vital role in the global carbon cycle. These northerly regions may contain half of all wetlands and soil carbon in the world (~1700 Pg C), twice the amount of carbon currently held within the atmosphere [Tarnocai et al., 2009].

Evidence suggests that high-latitude wetlands are already changing due to an evolving climate and that ecosystem changes may accelerate [Tarnocai, 2009; Avis et al., 2011; Schuur et al., 2013]. For example, most studies predict that climate change will increase methane fluxes from boreal and arctic regions; estimates range from 6% to 35% increase in methane fluxes per °C of global temperature increase [e.g., Gedney et al., 2004; Khvorostyanov et al., 2008; O'Connor et al., 2010; Koven et al., 2011; Zhu et al., 2011].

Three factors may explain the large differences among model estimates of boreal methane fluxes. First, models differ in their underlying environmental variables. For example, existing models of global wetland area range from 2.6 × 10^6 to 9 × 10^6 km^2 [Petrescu et al., 2010] and have differing spatial distributions (especially over boreal North America [Melton et al., 2013]). Second, models further differ in functional form (see section 2.4), due in part to uncertainties and/or complexity in biophysical methane processes. For example, many models relate maps of soil temperature to wetland methane fluxes using a coefficient known as Q10. This coefficient describes the factor by which a reaction rate increases per 10°C rise in temperature. Estimates of this coefficient range from 1 to 35, largely due to microbial and soil heterogeneity [van Hulzen et al., 1999; Whalen, 2005; O'Connor et al., 2010; Lupascu et al., 2012]. Finally, differences among existing flux models also stem from difficulties extrapolating from plot level to regional scale. Most flux models calibrate to individual wetland sites and extrapolate to regional or global scales [O'Connor et al., 2010; Zhang et al., 2012]. However, small-scale study sites exhibit substantial heterogeneity, and fluxes can vary by an order of magnitude over microtopography on the centimeter scale [Waddington and Roulet, 1996; Comas et al., 2005; Hendriks et al., 2010].
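To illustrate how strongly the Q10 coefficient controls a temperature response, the sketch below applies the definition given above (the factor by which a rate increases per 10°C rise). The function name and the 5°C example are illustrative assumptions, not values from any particular flux model.

```python
def q10_factor(q10, t_celsius, t_ref_celsius):
    """Factor by which a rate changes between a reference temperature and t_celsius."""
    return q10 ** ((t_celsius - t_ref_celsius) / 10.0)

# Response to a 5 degree C warming at the two ends of the published Q10 range (1 to 35)
# and at an intermediate value of 2 (chosen only for illustration).
for q10 in (1.0, 2.0, 35.0):
    print(q10, q10_factor(q10, 15.0, 10.0))
```

With Q10 = 1 the rate does not respond to temperature at all, whereas with Q10 = 35 the same 5°C warming multiplies the rate nearly sixfold, which helps explain why models with different Q10 values diverge.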

Top-down approaches like inverse modeling provide one means of reducing the wide uncertainty in wetland methane fluxes. Top-down studies use atmospheric methane measurements and meteorological models to improve existing flux estimates at regional [Zhao et al., 2009; Bergamaschi et al., 2010; Villani et al., 2010; Kim et al., 2011] and global [Chen and Prinn, 2006; Bergamaschi et al., 2013; Fraser et al., 2013] scales. Most existing methods emphasize total emissions budgets and provide relatively little information on wetland processes, but two recent publications begin to bridge this gap. Spahni et al. [2011] conduct a global-scale inversion that estimates fluxes by wetland type. Pickett-Heaps et al. [2011] use atmospheric methane measurements from northern Ontario to assess the magnitude and seasonal structure of a wetland flux model over the HBL. Results imply a premature seasonal onset of fluxes in this model, referred to as the “Kaplan model.” The authors suggest removing fluxes from snow-covered regions as one possible solution. In spite of these recent studies, existing top-down approaches provide limited assessment of the underlying environmental variables or the functional form of existing wetland flux models.

The present study moves closer to integrating top-down flux estimates with process-based, bottom-up modeling methods. First, we explore how atmospheric methane measurements can be used to construct and assess biogeochemical process models at continental scale. Second, we use a broad network of measurement sites in Canada and the U.S. to understand the spatial and seasonal distribution of North American boreal wetland fluxes. To achieve these goals, we combine in situ methane measurements across Canada and the United States from 2007 and 2008, a regional atmospheric transport model, and a geostatistical inverse modeling framework.

2 Model and Measurements

The methods sections and subsequent discussion are organized as follows. First, we describe the atmospheric model and measurements (sections 2.1, 2.2, and 2.3). Using this model, we compare two existing wetland flux estimates, Kaplan and DLEM, against atmospheric methane observations. Both flux estimates are described in detail below (section 2.4). We subsequently use a geostatistical inverse modeling framework to estimate North American boreal methane fluxes (section 3.1). This flux estimate has two components. The first component, termed the deterministic model, is a combination of environmental predictors (e.g., soil moisture and temperature) that best represent the methane fluxes, as seen through the atmospheric methane observations (section 3.2). The second component, termed the stochastic component, estimates the spatial and/or temporal flux patterns that may be lacking in the environmental predictors and therefore cannot be modeled using the deterministic model. The geostatistical inverse model produces a final best estimate, termed the posterior fluxes, which is the sum of the deterministic and stochastic components.

2.1 The Regional Atmospheric Model

We simulate in situ methane mixing ratios using STILT, the Stochastic, Time-Inverted, Lagrangian Transport model [Lin et al., 2003]. STILT is a particle model; an ensemble of air-following particles is released from each methane observation site. In this study, a new 500-particle ensemble is initiated for each of the hourly methane measurements. These particles travel backward in time along the wind fields of a meteorology model, in this case for 10 days. STILT further includes stochastic motions that simulate boundary layer turbulence.

Wind fields from the Weather Research and Forecasting model (WRF version 2.1.2) are used to drive STILT trajectories in this study. Nehrkorn et al. [2010], Hegarty et al. [2013], and the supporting information describe this meteorology in greater detail. The WRF fields used here have a nested resolution, 10 km within 24–48 h of the observation sites and 40 km in more distant regions (see supporting information).

STILT subsequently uses the trajectories to calculate a footprint map. The footprints relate the surface fluxes in North America to the concentration increment seen at the measurement location and have units of mixing ratio per unit surface flux. This footprint is based on the number of particles in a region and their altitudes relative to the planetary boundary layer.

The STILT setup here incorporates fluxes from existing inventories on a math formula by math formula longitude-latitude grid (11 to 65°N and 145 to 51°W).

2.2 Model Boundary Condition

STILT only models emissions over the North American continent. The model therefore requires a boundary condition to represent the concentration of methane in incoming air over the Pacific and Arctic oceans before reaching North American sources. This study uses an empirical boundary curtain that interpolates a variety of trace gas measurements from ground-based sites and aircraft in the NOAA Earth System Research Laboratory Global Monitoring Division's Cooperative Global Air Sampling Network. The resulting boundary curtain varies latitudinally and vertically and has a daily temporal resolution (see the supporting information). The estimated boundary condition value associated with each STILT particle run depends on the ending latitude, altitude, and day of each particle. This boundary value is then added to the modeled methane signal from North American sources. The sum can be directly compared against measured methane mixing ratios at tower sites across Canada and the northern U.S. (e.g., section 3.1).
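The relationship among footprints, surface fluxes, and the boundary condition can be sketched as a matrix-vector product plus an offset. This is a toy illustration only: the array shapes, numbers, and the function name are assumptions, and the actual STILT footprints are far larger.

```python
import numpy as np

def modeled_mixing_ratio(footprints, fluxes, boundary):
    """Modeled mixing ratio at each observation (ppb).

    footprints: (n_obs, n_cells) sensitivities, ppb per (umol m-2 s-1)
    fluxes:     (n_cells,) surface fluxes, umol m-2 s-1
    boundary:   (n_obs,) boundary condition values, ppb
    """
    return footprints @ fluxes + boundary

# Toy values (hypothetical): two observations, three flux grid cells.
footprints = np.array([[10.0, 0.0, 5.0],
                       [0.0, 20.0, 0.0]])
fluxes = np.array([0.01, 0.02, 0.0])
boundary = np.array([1800.0, 1805.0])

total = modeled_mixing_ratio(footprints, fluxes, boundary)
print(total)
```

The product of footprints and fluxes is the increment from North American sources; adding the boundary value gives a quantity that can be compared directly with tower measurements.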

2.3 Measurements

This study uses observed methane mixing ratios for 2007 to 2008 from five observation sites sensitive to boreal wetland fluxes: hourly measurements from four Canadian observation towers and daily flask measurements from one U.S. tall tower. Sites (from east to west) include Chibougamau, Quebec (CHM, 50°N, 74°W, 30 m above ground level (agl)); Fraserdale, Ontario (FSD, 50°N, 83°W, 40 m agl); Park Falls, Wisconsin (LEF, 46°N, 90°W, 244 m agl); East Trout Lake, Saskatchewan (ETL, 54°N, 104°W, 105 m agl); and Candle Lake, Saskatchewan (CDL, 54°N, 105°W, 30 m agl, 2007 only) (Figure 1).

Figure 1.

Summer mean wetland fluxes from the Kaplan and DLEM wetland methane models (for July, August, and September, averaged over 2007–2008). Both models estimate similar annual totals for the HBL, but DLEM has a more pronounced summer peak.

Small-scale heterogeneities caused by turbulent eddies and incomplete mixing make it difficult to model hourly-scale variability in the in situ data. STILT also has difficulty estimating the very shallow nighttime boundary layer and therefore rarely captures variations in nighttime concentrations. Hence, this study uses afternoon averages of the methane data and model output (1 P.M. to 7 P.M. local time), a total of 2485 observations after averaging.
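The afternoon averaging step might be sketched as follows. The function and argument names are hypothetical, and treating the window as hours 13 through 18 inclusive (that is, excluding the hour beginning at 7 P.M.) is an assumption, since the text does not state how the endpoints are handled.

```python
import numpy as np

def afternoon_means(day_index, local_hour, ch4_ppb, start_hour=13, end_hour=19):
    """Average hourly values within [start_hour, end_hour) local time for each day."""
    day_index = np.asarray(day_index)
    local_hour = np.asarray(local_hour)
    ch4_ppb = np.asarray(ch4_ppb, dtype=float)
    keep = (local_hour >= start_hour) & (local_hour < end_hour)
    days = np.unique(day_index[keep])
    means = np.array([ch4_ppb[keep & (day_index == d)].mean() for d in days])
    return days, means

# Synthetic example: two days of hourly data whose value equals the local hour.
hours = np.tile(np.arange(24), 2)
days_in = np.repeat([0, 1], 24)
days_out, means_out = afternoon_means(days_in, hours, hours)
print(days_out, means_out)
```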

2.4 Existing Flux Models

2.4.1 The Kaplan Model

The first inventory used in this study is the Kaplan model, described in Kaplan [2002] and Pickett-Heaps et al. [2011] (Figure 1). The model has the following functional form:

display math(1)

where E is the wetland flux expressed here in units of μmol m−2 s−1. C1 and C2 represent the moles of carbon per unit area in soil and litter, normalized by their respective lifetimes (τ1 and τ2). This soil carbon estimate is taken from the Lund-Potsdam-Jena (LPJ) dynamic global vegetation model [Sitch et al., 2003]. W is the maximum possible extent of wetlands in a given grid box (a fraction, from LPJ), and δ is a measure of whether wetlands are actually present (δ = 0 if soil moisture (M) < 10% and δ = 1 otherwise). An emissions factor b represents the fraction of methane per mole of carbon respired (b = 3 × 10^−2). f(T) represents an Arrhenius equation of temperature (in Kelvin). In this equation, F adjusts the inventory based on soil temperature to better match differences between boreal (B) and tropical (T) wetlands (refer to Pickett-Heaps et al. [2011]). We build this inventory with soil moisture (for δ) and soil temperature (T) from WRF (the same meteorology used to drive STILT) at a soil depth of 25 cm. This soil depth provided the best match between the Kaplan model and atmospheric methane observations. This setup differs from Pickett-Heaps et al. [2011], who used surface skin temperature instead of soil temperature below the ground surface.

The LPJ model outputs used here for wetland coverage and soil carbon are updated from previous studies that also used the LPJ/Kaplan model [e.g., Bergamaschi et al., 2007; Pickett-Heaps et al., 2011]. Among other updates, soil carbon is approximately a factor of 4 lower than in the previous studies listed above. This adjustment matches the LPJ model against upland soil profiles, but the change appears inconsistent with methane observations over boreal wetlands [e.g., Pickett-Heaps et al., 2011]. We readjust the LPJ soil carbon estimate upward by a factor of 4.15 to match the LPJ/Kaplan model in Pickett-Heaps et al. [2011]. This previous study compares the Kaplan model against measurements from Fraserdale (FSD), Ontario, and likely better represents high-latitude soil carbon than the new LPJ estimate.

2.4.2 The DLEM Model

DLEM, the Dynamic Land Ecosystem Model, includes more complexity than the Kaplan model described above [Tian et al., 2010] (Figure 1). It models the production of methane in soil pore water (P, expressed here in μmol m−2 s−1), and only a fraction of methane produced is released to the atmosphere (E). This fraction depends on a multitude of factors that are discussed in depth by Tian et al. [2010]: plant-mediated transport, diffusive flux, ebullition, oxidation by methanotrophy, and oxidation during plant-mediated transport.

Methane production in soil pore water (P) is simpler to describe:

$$P = P_{\max} \, \frac{[\mathrm{DOC}]}{[\mathrm{DOC}] + k} \, f(T) \, f(\mathrm{pH}) \, f(M) \tag{2}$$

where Pmax is the maximum possible rate of CH4 production in soils, a spatially variable parameter [see Tian et al., 2010]. [DOC] is dissolved organic content, determined by gross primary productivity, litter fall, and soil organic matter decomposition rates [Tian et al., 2012]. k is the half-saturation coefficient, f(T) is the effect of soil temperature, f(pH) is the effect of soil pH, and f(M) is the effect of soil water content. The functions of temperature, pH, and soil moisture have the following forms [Tian et al., 2010]:

display math(3)
display math(4)
display math(5)

where Mfc is the field capacity, and Ms is the saturated water content of soil. Tian et al. [2010] provide a graphical depiction of these functional dependences.
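The structure of the production term can be sketched in code. This is a hedged illustration that assumes the saturating (Michaelis-Menten-type) dependence on [DOC] implied by the half-saturation coefficient k. The environmental scalars f(T), f(pH), and f(M) are passed in as precomputed numbers rather than computed here, and the function name and all numerical values are hypothetical.

```python
def ch4_production(p_max, doc, k, f_t, f_ph, f_m):
    """Methane production in soil pore water under the assumed saturating form.

    p_max: maximum production rate; doc: dissolved organic content;
    k: half-saturation coefficient (same units as doc);
    f_t, f_ph, f_m: dimensionless scalars for temperature, pH, and soil moisture.
    """
    return p_max * doc / (doc + k) * f_t * f_ph * f_m

# When doc equals k and no environmental factor is limiting, production is half of p_max.
half = ch4_production(p_max=1.0, doc=5.0, k=5.0, f_t=1.0, f_ph=1.0, f_m=1.0)
limited = ch4_production(p_max=1.0, doc=5.0, k=5.0, f_t=0.5, f_ph=1.0, f_m=0.0)
print(half, limited)
```

Because the environmental scalars multiply one another, a single limiting factor (here a soil moisture scalar of zero) shuts production off regardless of the other terms.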

The environmental data for DLEM are derived from a number of sources: meteorological data from North American Regional Reanalysis (NARR) [Mesinger et al., 2006] and land cover/vegetation data from a combination of sources (see Tian et al. [2010] for more details).

3 Statistical Framework

3.1 Conceptual Overview

We implement a geostatistical inverse model to infer information about methane fluxes and to assess the environmental drivers in existing wetland models. The statistical approach follows that of Kitanidis and Vomvoris [1983], Michalak et al. [2004], and Gourdji et al. [2012]. The inversion estimates the spatial and temporal distribution of emissions that is most likely given the atmospheric methane measurements and the transport information provided by the atmospheric model.

The inversion first requires a linear expression for the model-measurement framework:

$$z = Hs + \epsilon \tag{6}$$

where s (m×1) are the true, unknown fluxes. Unlike the wetland-specific fluxes estimated by Kaplan and DLEM (E), s encompasses fluxes from all source types. z is the n×1 vector of observed mixing ratios minus the estimated boundary condition value (see section 2.2). H (n×m) are the footprints computed by STILT (section 2.1). Finally, ϵ (n×1) describes model-data mismatch—all errors unrelated to an imperfect emissions estimate (e.g., transport error and aggregation error). This vector is assumed to follow a multivariate normal distribution with a mean of 0:

$$\epsilon \sim \mathcal{N}(0, R) \tag{7}$$

where R (n×n) is the covariance matrix of these errors.

Using the above framework, the inversion then models the unknown fluxes (s, equation (6)) using the following structure:

$$s \sim \mathcal{N}(X\beta, Q) \tag{8}$$

The first component of the statistical model (Xβ) is a weighted least squares regression and is termed the “deterministic model” or “inversion prior” (section 3.2). Each column of X (dimensions m×p) is a predictor in the weighted regression [e.g., Gourdji et al., 2008, 2012]. In this study, X includes data sets termed “auxiliary data” (e.g., soil temperature, moisture, and/or an anthropogenic emissions inventory) that help explain the spatial and seasonal distribution of methane fluxes. Additionally, one column of this matrix is constant, equivalent to the intercept of the regression. The regression coefficients (β, dimensions p×1) are unknown and are estimated in the inversion using the atmospheric methane data.

The second component of the geostatistical inverse model, $\mathcal{N}(0, Q)$, is termed the “stochastic component” or the “spatially correlated residual.” The stochastic component adjusts, at grid scale, the fluxes estimated by the deterministic model. This component, for example, can correct the deterministic model if any environmental data in X have the incorrect distribution. The covariance matrix Q (dimensions m×m) describes the variance and the spatiotemporal correlation of the stochastic component. It includes off-diagonal elements that follow an exponential covariance model: any fluxes estimated by the stochastic component will be spatially correlated with a given decorrelation length. This spatial correlation means that the stochastic component can adjust the flux estimate on a fine grid scale relative to the density of atmospheric observations [e.g., Michalak et al., 2004; Mueller et al., 2008; Villani et al., 2010; Bergamaschi et al., 2013; Miller et al., 2013].
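An exponential covariance model of the kind described above can be sketched as follows. The sketch is purely spatial and uses planar distances, whereas the Q in this study also carries temporal correlation; the function name and parameter values are hypothetical.

```python
import numpy as np

def exponential_covariance(coords_km, sigma, length_km):
    """Covariance sigma^2 * exp(-d / length) for every pair of locations.

    coords_km: (m, 2) planar coordinates in km; sigma: standard deviation of the
    stochastic component; length_km: decorrelation length parameter.
    """
    coords = np.asarray(coords_km, dtype=float)
    diff = coords[:, None, :] - coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return sigma ** 2 * np.exp(-dist / length_km)

# Two grid cells 5 km apart with a 5 km length parameter.
Q_toy = exponential_covariance([[0.0, 0.0], [3.0, 4.0]], sigma=2.0, length_km=5.0)
print(Q_toy)
```

The diagonal holds the variance, and the off-diagonal elements decay with distance, so nearby grid cells receive correlated adjustments even where observations are sparse.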

The best estimate of a geostatistical inversion is obtained by minimizing a cost function (L) with respect to the methane fluxes (s) and the coefficients (β) [e.g., Kitanidis and Vomvoris, 1983; Michalak et al., 2004]:

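$$L_{s,\beta} = \frac{1}{2}(z - Hs)^{T} R^{-1} (z - Hs) + \frac{1}{2}(s - X\beta)^{T} Q^{-1} (s - X\beta) \tag{9}$$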

The supporting information discusses further details of the statistical setup. In particular, we implement the inversion with Lagrange multipliers to prevent negative fluxes (see supporting information) [Miller et al., 2014]. Furthermore, we estimate the covariance matrices (R and Q) using restricted maximum likelihood estimation (REML) [Kitanidis, 1995; Michalak et al., 2004].
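The standard geostatistical cost function penalizes the model-measurement residual (z − Hs), weighted by the inverse of R, plus the departure of the fluxes from the deterministic model (s − Xβ), weighted by the inverse of Q. The sketch below evaluates that quadratic form. It omits the non-negativity constraint and the REML covariance estimation mentioned above, and the function name and toy inputs are hypothetical.

```python
import numpy as np

def gim_cost(s, beta, z, H, X, R, Q):
    """Evaluate 0.5*(z-Hs)' R^-1 (z-Hs) + 0.5*(s-X beta)' Q^-1 (s-X beta)."""
    r_obs = z - H @ s        # model-measurement residual
    r_flux = s - X @ beta    # departure from the deterministic model
    data_term = 0.5 * r_obs @ np.linalg.solve(R, r_obs)
    prior_term = 0.5 * r_flux @ np.linalg.solve(Q, r_flux)
    return data_term + prior_term

# Toy problem: two observations, two flux elements, intercept-only deterministic model.
z = np.array([1.0, 2.0])
H = np.eye(2)
X = np.ones((2, 1))
R = np.eye(2)
Q = np.eye(2)
cost_zero_flux = gim_cost(np.zeros(2), np.zeros(1), z, H, X, R, Q)
cost_exact_fit = gim_cost(z, np.zeros(1), z, H, X, R, Q)
print(cost_zero_flux, cost_exact_fit)
```

In the toy case, fitting the data exactly removes the first term but incurs the same penalty through the second term, which is the trade-off the inversion balances.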

We use this statistical framework to estimate monthly methane fluxes (s) on a 1° by 1° longitude-latitude grid over the years 2007 and 2008, yielding 41,328 total locations in space and time. The geographic domain of the inversion spans from 35 to 65°N latitude and 145 to 51°W longitude.
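A quick arithmetic check on the size of the state vector: dividing the stated total by the number of months gives the number of grid cells estimated per month. The comparison with the full rectangular grid is our own inference and is not stated in the text.

```python
n_months = 2 * 12                    # monthly fluxes over 2007 and 2008
n_unknowns = 41328                   # total locations in space and time (from the text)
cells_per_month = n_unknowns // n_months

# Full rectangular 1 degree grid over 35-65 N and 145-51 W, for comparison.
full_grid_cells = (145 - 51) * (65 - 35)
print(cells_per_month, full_grid_cells)
```

The stated total corresponds to 1722 cells per month, fewer than the 2820 cells of the full rectangle. The difference presumably reflects cells excluded from the estimate (for example, ocean cells), although the text does not say so.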

3.2 The Deterministic Model of Fluxes

The following sections discuss the deterministic model in greater detail.

3.2.1 Auxiliary Environmental Data

We consider a number of auxiliary data sets or predictors for use in the deterministic model. Ultimately, only a selection of these data sets is used in the inversion depending on how well each explains the atmospheric methane data (see section 3.2.2). These data sets include both environmental drivers of wetland fluxes and inventory data on anthropogenic emissions. The full array of possible data sets for X are shown in Table 1. These include meteorological data from WRF (used in this version of Kaplan model [Nehrkorn et al., 2010]) and NARR (used in DLEM [Mesinger et al., 2006]). We consider soil carbon estimates from the LPJ model (used in Kaplan [Sitch et al., 2003; Pickett-Heaps et al., 2011]) and the Northern Circumpolar Soil Carbon Database (NCSCD) [Tarnocai et al., 2009; Hugelius et al., 2013]. Wetland coverage estimates include model output from LPJ and surface water data from the Global Inundation Extent from Multi-Satellites (GIEMS) database [Prigent et al., 2007; Papa et al., 2010]. Refer to the supporting information for maps of these auxiliary data sets.

Table 1. Auxiliary Data or Predictors Tested for Use in the Deterministic Model^a

Description | Static/Variable | Source Model
Liquid soil moisture (i.e., not frozen) (M) | variable | WRF, NARR^b
Total soil moisture (liquid + frozen) (MTot) | variable | WRF, NARR
Soil temperature | variable | WRF, NARR
The maximum fraction of a region that could be covered by wetlands (W) | static | LPJ, GIEMS
Soil carbon content (C) | static | LPJ, NCSCD
The estimated distribution of anthropogenic emissions | static | EDGAR v4.2
Smooth tricubic functions over anthropogenic source regions | static |

^a See sections 2.4 and 3.2.1. The second column (Static/Variable) lists whether the auxiliary data in question are seasonally constant or vary temporally.
^b Soil moisture and temperature are available at multiple vertical soil levels in WRF and NARR: 5, 25, 70, and 150 cm depth in WRF and 0, 10, 40, and 100 cm depth in NARR.

In addition to wetland-related data sets, we also consider multiple data sets or predictors for the distribution of anthropogenic emissions. Specifically, we consider including the EDGAR v4.2 anthropogenic inventory in the deterministic model as well as the individual sector-by-sector emissions estimates from EDGAR. A companion study found that EDGAR v4.2 did not match the estimated distribution of anthropogenic emissions in the United States [Miller et al., 2013]. Hence, we consider additional proxies other than EDGAR v4.2 for the spatial distribution of anthropogenic emissions. For example, we construct smooth tricube functions centered over known anthropogenic source regions (e.g., Alberta, Oklahoma, California, and the U.S. East Coast; refer to the supporting information). The subsequent section discusses how to choose among this array of auxiliary data sets when constructing the deterministic model.

3.2.2 Selection of Auxiliary Data

It would be ill-advised to use all auxiliary data sets from Table 1 in the deterministic model; the resulting model would be overfit and suffer from problematic collinearity [e.g., Zucchini, 2000]. We instead use a statistical selection method to choose an optimal set of auxiliary data sets for the deterministic model. Such methods select as many data sets for X as can explain variability in the methane fluxes while preventing an overfit or unreliable coefficient estimates. We implement one of the most common methods, the Bayesian information criterion (BIC) (as in Gourdji et al. [2012]). The BIC numerically scores all possible combinations of auxiliary data based on how well they reduce the model-measurement residuals and applies an increasing penalty for model complexity (refer to the supporting information). Specifically, this penalty increases with the number of columns in X and with the log of the number of observations. Unlike frequentist statistics, these scores do not support p values or traditional hypothesis testing. The best model is simply the one with the lowest score. Kass and Raftery [1995] provide a qualitative assessment of model strength based on the difference in BIC scores. A score difference greater than 2 is “worth mentioning” and greater than 10 is “very strong.”
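The logic of BIC-based selection can be illustrated with an ordinary least squares analogue. This sketch uses the textbook Gaussian form, n ln(RSS/n) + p ln(n), which is an assumption made for illustration: the geostatistical form used in this study accounts for the covariance matrices and is given in the supporting information. All data and names below are synthetic and hypothetical.

```python
import numpy as np

def bic_least_squares(X, y):
    """Textbook BIC for a Gaussian linear model: n*ln(RSS/n) + p*ln(n)."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    return n * np.log(rss / n) + p * np.log(n)

# Synthetic data: y depends on x1 only; x2 is an irrelevant predictor.
rng = np.random.default_rng(0)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(scale=0.5, size=n)
ones = np.ones(n)

candidates = {
    "intercept only": np.column_stack([ones]),
    "intercept + x1": np.column_stack([ones, x1]),
    "intercept + x1 + x2": np.column_stack([ones, x1, x2]),
}
scores = {name: bic_least_squares(X, y) for name, X in candidates.items()}
best = min(scores, key=scores.get)
for name, score in scores.items():
    print(f"{name}: {score:.1f}")
print("lowest BIC:", best)
```

Each added column must reduce the residual term by more than ln(n) to lower the score, which is how the criterion guards against overfitting.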

In many cases, one might expect that the product of two or more different environmental variables may be a better predictor than an additive model, so we test multiplicative interactions among the wetland-related auxiliary data sets. Additionally, several of the auxiliary data sets are collinear (e.g., total soil moisture and unfrozen soil moisture), and we are careful not to include similar or collinear predictors in the same candidate model for X. For consistency, we do not mix WRF and NARR data sets in the same candidate model.

4 Results and Discussion

4.1 Model-Data Comparison Using Existing Flux Estimates

Methane concentrations modeled with existing flux estimates exhibit a variable fit against the atmospheric data (see Figure 2). For example, both the Kaplan and DLEM models match the general shape of the seasonal cycle at eastern tower sites (LEF, FSD, and CHM) but underestimate the magnitude of the measurements. Among these sites, models match observations most closely at Fraserdale, Ontario (FSD), possibly because Pickett-Heaps et al. [2011] validated the Kaplan model at Fraserdale. Existing methane flux estimates, however, perform far worse at the western sites (CDL and ETL). For example, the models underestimate both observed summer and winter maxima at these sites. The observed summer maxima are likely caused by peak summer wetland fluxes, while the winter maxima likely reflect a combination of advected anthropogenic emissions and limited vertical mixing within the troposphere. This result implies that existing inventories underestimate both wetland and anthropogenic fluxes in western Canada.

Figure 2.

A comparison of modeled mixing ratios against measurements at the observation sites. The estimated boundary condition values have been subtracted from the observations; the difference indicates the effect of North American methane sources on the measurement sites. EDGAR v4.2 is an anthropogenic emissions inventory, while Kaplan and DLEM model wetlands. The model and observations are smoothed using a third-order Savitzky-Golay filter with a 61-point window.
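The smoothing named in the caption (a third-order Savitzky-Golay filter with a 61-point window) can be sketched with NumPy alone: each output point is the value at the window center of a cubic fitted by least squares to the surrounding 61 points. The function name is hypothetical, and returning only the interior points (no edge handling) is an assumption, since the caption does not describe how the series ends were treated.

```python
import numpy as np

def savgol_smooth(y, window=61, order=3):
    """Savitzky-Golay smoothing; returns only points with a full window around them."""
    half = window // 2
    offsets = np.arange(-half, half + 1, dtype=float)
    # Columns are 1, t, t^2, ..., t^order for each offset t within the window.
    design = np.vander(offsets, order + 1, increasing=True)
    # First row of the pseudoinverse gives the fitted polynomial's value at t = 0.
    weights = np.linalg.pinv(design)[0]
    return np.convolve(np.asarray(y, dtype=float), weights[::-1], mode="valid")

# A cubic signal passes through a third-order filter unchanged.
t = np.linspace(0.0, 1.0, 200)
signal = t ** 3 - 2.0 * t + 1.0
smoothed = savgol_smooth(signal)
print(smoothed.shape)
```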

The model-data comparison in Figure 2 also reveals important conclusions about the interdependence of wetland and anthropogenic emissions estimates. Gaps in anthropogenic emissions inventories can affect the perceived amplitude or seasonality of the wetland flux model. Even in remote regions like the HBL, the estimation of wetland fluxes hinges on a reliable anthropogenic emissions estimate. For example, the Kaplan/EDGAR v4.2 modeled concentrations are consistently too low at the Fraserdale site, but the amplitude of the summer maximum is similar to the amplitude of the data. This discrepancy could reflect one of two problems: either the wetland inventory has the incorrect magnitude and seasonal structure or the anthropogenic inventory (EDGAR v4.2) is simply too low. The time series at Park Falls (LEF), Wisconsin, further illustrates the importance of the anthropogenic emissions estimate. It appears that the wetland flux models begin producing methane too early in the spring of 2008 at LEF. A closer examination of Figure 2, however, reveals large (~25 ppb) modeled concentrations from anthropogenic sources during this period. This model-data discrepancy could stem from misspecified anthropogenic emissions, not problems in the seasonal structure of the wetland model. These examples highlight the difficulty of disentangling anthropogenic and wetland methane fluxes.

Subsequent sections discuss the deterministic model and geostatistical inversion results in greater detail.

4.2 Environmental Predictors of Wetland Fluxes

This section explores the results of the deterministic model (Xβ, sections 3.1 and 3.2). As discussed in the methods sections, the deterministic model is analogous to a weighted multivariate regression. Model selection methods (like the BIC, section 3.2.2) play a crucial role in constructing this deterministic model; they select auxiliary data sets (Table 1) for the deterministic model that can best explain the atmospheric methane data. In this way, model selection provides a means to objectively understand and assess biogeochemical methane models at continental scale.

The BIC selection chooses the following deterministic model for methane fluxes in Canada (Table 2):

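$$X\beta = \beta_0 + \beta_1[\text{smooth functions over anthropogenic source regions}] + \beta_2 [W][M] f_{\mathrm{Kaplan}}(T) \tag{10}$$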
Table 2. BIC Scores for a Selection of Candidate Deterministic Models^a

Candidate Model | BIC
β0+β1[smooth functions over anthropogenic source regions]+β2[W][M]fKaplan(T) | 16,725
β0+β1[smooth functions...]+β2[W][MTot]fKaplan(T) | 16,728
β0+β1[smooth functions...]+β2[W][M]fKaplan(T)fKaplan(C) | 16,729
β0+β1[smooth functions...]+β2[full Kaplan model] | 16,735
β0+β1[smooth functions...]+β2[W][M]fKaplan(T) using NARR surface soil layer | 16,744
β0+β1[smooth functions...]+β2[W]fDLEM(M)fDLEM(T) | 16,750
β0+β1[EDGAR v4.2]+β2[W][M]fKaplan(T) | 16,885

^a We test all possible combinations and interactions of the auxiliary variables in Table 1 and display only a sample here. The table is intended to show the range of BIC scores for the best scoring models and a few other notable models. The drift coefficients (β) scale the magnitude of the auxiliary data to match the methane observations. All models above use inputs from NARR (10 cm depth) and LPJ, unless otherwise noted. fKaplan(⋯) refers to the functional form used in the Kaplan model and fDLEM(⋯) the functional form in DLEM.

This selected model for methane fluxes is relatively simple. The first term (β0) is a constant component, equivalent to the intercept in a regression. It describes the average magnitude of all sources not explicitly included in other components of the deterministic model. For example, this component might include agriculture, landfills, and wastewater treatment sources (among other possibilities).

The second term (β1[smooth functions...]) parameterizes anthropogenic sources (section 3.2.1). This term places smooth geometric functions over known source regions, including Alberta, California, Oklahoma, and the U.S. east coast. The BIC does not choose the EDGAR v4.2 anthropogenic inventory for the deterministic model because it fits the atmospheric data less well than the smooth geometric functions (Table 2). Hence, we do not utilize EDGAR within the atmospheric inversion.

The final component of the deterministic model (β2[W][M]fKaplan(T)) parameterizes wetland fluxes. This term in the deterministic model includes three auxiliary data sets: the distribution of wetlands (W), a map of unfrozen soil moisture (M), and an Arrhenius equation based upon soil temperature (fKaplan(T)). The optimal deterministic model uses the wetland map from the LPJ model and soil variables from NARR (at 10 cm soil depth). All other possible combinations and interactions of the auxiliary variables in Table 1 produce higher BIC scores (Table 2). For example, we test wetland models that include soil carbon, environmental variables at different depths in the soil profile, and different estimates for wetland distribution. Furthermore, we test a deterministic model that uses the functional form of temperature and/or soil moisture from the DLEM model.
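The BIC scores in Table 2 can be read against the Kass and Raftery [1995] thresholds quoted in section 3.2.2 (a difference greater than 2 is "worth mentioning" and greater than 10 is "very strong"). The sketch below computes each candidate's difference from the best score; the short model labels and the wording of the lowest category are our own.

```python
# BIC scores from Table 2, keyed by abbreviated (hypothetical) labels.
bic = {
    "[W][M]fKaplan(T)": 16725,
    "[W][MTot]fKaplan(T)": 16728,
    "[W][M]fKaplan(T)fKaplan(C)": 16729,
    "full Kaplan model": 16735,
    "NARR surface soil layer": 16744,
    "[W]fDLEM(M)fDLEM(T)": 16750,
    "EDGAR v4.2 for anthropogenic term": 16885,
}

def evidence(delta):
    """Label a BIC difference using only the two thresholds quoted in the text."""
    if delta > 10:
        return "very strong"
    if delta > 2:
        return "worth mentioning"
    return "not distinguished by the quoted thresholds"

best_score = min(bic.values())
deltas = {name: score - best_score for name, score in bic.items()}
for name, delta in deltas.items():
    print(f"{name}: delta = {delta}, {evidence(delta)}")
```

The candidate that substitutes EDGAR v4.2 for the smooth functions scores 160 points above the selected model, far beyond the "very strong" threshold, whereas the alternatives that change only the soil moisture or soil carbon terms differ by 3 to 10 points.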

This selected wetland model is similar to the Kaplan flux model but with soil carbon removed. Section 4.4 synthesizes the wetland flux results from this study and highlights what this parameterized wetland model might indicate about biogeochemical methane modeling.
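
The structure of the selected deterministic model can be illustrated with a minimal sketch for a single grid cell. The sketch below is not the study's implementation: the function names are hypothetical, and the temperature response is a generic Arrhenius form normalized at a reference temperature, with an activation energy and reference temperature chosen purely for illustration rather than taken from the Kaplan model. In the study, the coefficients β are estimated from the atmospheric data; here they are simply passed in.

```python
import math

R_GAS = 8.314  # universal gas constant, J mol-1 K-1


def arrhenius_factor(soil_temp_k, activation_energy=60e3, ref_temp_k=283.15):
    """Generic Arrhenius temperature scaling, equal to 1 at ref_temp_k.

    activation_energy (J mol-1) and ref_temp_k (K) are illustrative
    assumptions, not the parameter values of the Kaplan model.
    """
    return math.exp(
        -activation_energy / R_GAS * (1.0 / soil_temp_k - 1.0 / ref_temp_k)
    )


def deterministic_flux(beta, smooth, wetland_frac, soil_moisture, soil_temp_k):
    """Evaluate beta0 + beta1*[smooth] + beta2*[W][M]f(T) for one grid cell.

    beta          -- (beta0, beta1, beta2), assumed already estimated
    smooth        -- value of the smooth anthropogenic function in this cell
    wetland_frac  -- wetland fraction W
    soil_moisture -- unfrozen soil moisture M
    soil_temp_k   -- soil temperature T in kelvin
    """
    beta0, beta1, beta2 = beta
    wetland_term = wetland_frac * soil_moisture * arrhenius_factor(soil_temp_k)
    return beta0 + beta1 * smooth + beta2 * wetland_term


# Hypothetical inputs: the wetland term grows as the soil warms.
cool = deterministic_flux((0.002, 0.01, 0.05), 0.0, 0.3, 0.4, 278.15)
warm = deterministic_flux((0.002, 0.01, 0.05), 0.0, 0.3, 0.4, 288.15)
```

Because the wetland term is a product of W, M, and the temperature function, a grid cell with no mapped wetland or no unfrozen soil moisture contributes no wetland flux regardless of temperature. This is one way to see why an underestimate in either map would suppress modeled fluxes.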

Table 3 lists the Canadian methane budget associated with each component of the deterministic model and compares these estimates against existing inventories. Note that the methane budgets from the deterministic model are estimated from the atmospheric data via the unknown coefficients, β. The smooth functions represent the largest component of the deterministic model, followed by the constant component, and finally the wetland component. When interpreting these budgets, however, keep in mind that the constant component (β0) could represent either anthropogenic emissions or wetland fluxes.

Table 3. Canada Methane Budgets From the Deterministic Model (South of 65°N) and Several Inventory Estimates

Flux Model                            Canada Budget (Tg C yr−1)
Deterministic model
  β0 (constant component)             5.4 ± 1.5
  β1[smooth functions...]             7.9 ± 0.9
  β2[W][M]fKaplan(T)                  3.2 ± 0.6
Existing wetland models
  Kaplan model                        4.4
  DLEM model                          5.6
Existing anthropogenic inventories
  Environment Canada                  3.3
  EDGAR v4.2                          3.9

Figures 3 and 4 visualize the deterministic model, both spatially and in relation to the atmospheric methane data. Figure 3 displays the annual average of the deterministic model. The smooth geometric functions that parameterize anthropogenic emissions are evident over the province of Alberta and over the Dakotas. The wetland model is more difficult to distinguish in this annual mean plot but is largest south of Hudson Bay in eastern Canada and near Great Slave Lake in the Northwest Territories. The deterministic model is nonzero everywhere across Canada, which reflects the constant term β0 of the deterministic model. This term has an estimated magnitude of (2±0.5)×10−3 μmol m−2 s−1 (5.4±1.5 Tg C yr−1 over Canada, Table 3).
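
The correspondence between the per-area flux and the national budget quoted above can be checked with a short unit conversion. The sketch below is illustrative only: the function names are hypothetical, the year length of 365.25 days is an assumption, and the land area is not given in the text, so the code backs out the area implied by the two rounded values instead of assuming one. With these inputs the implied area is on the order of 7×10^12 m2, which should be read as approximate because both reported values are rounded.

```python
MOLAR_MASS_C = 12.011                   # g of carbon per mol of CH4
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # assumed year length


def flux_to_g_c_per_m2_yr(flux_umol_m2_s):
    """Convert a CH4 flux from umol m-2 s-1 to g C m-2 yr-1."""
    return flux_umol_m2_s * 1e-6 * MOLAR_MASS_C * SECONDS_PER_YEAR


def budget_tg_c(flux_umol_m2_s, area_m2):
    """Integrate a spatially uniform flux over an area; result in Tg C yr-1."""
    return flux_to_g_c_per_m2_yr(flux_umol_m2_s) * area_m2 / 1e12


def implied_area_m2(budget_tg, flux_umol_m2_s):
    """Area over which a uniform flux would produce the given budget."""
    return budget_tg * 1e12 / flux_to_g_c_per_m2_yr(flux_umol_m2_s)


# Constant term from the text: 2e-3 umol m-2 s-1 and 5.4 Tg C yr-1.
per_area = flux_to_g_c_per_m2_yr(2e-3)   # g C m-2 yr-1
area = implied_area_m2(5.4, 2e-3)        # m2, approximate
```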

Figure 3.

The 24 month (2 year) mean estimated methane flux from (top) the deterministic model and (bottom) the final posterior estimate.

Figure 4.

A comparison of modeled mixing ratios against measurements at the observation sites. This figure is similar to Figure 2 but compares the deterministic model and posterior emissions estimate instead of existing flux models.

Despite the simplicity of the deterministic model, the mixing ratios estimated with the model compare favorably against atmospheric measurements (Figure 4). The deterministic model fits the atmospheric methane observations (R = 0.72, root mean squared error (RMSE) = 20.9 ppb) better than either the model setup with Kaplan and EDGAR v4.2 (R = 0.12, RMSE = 37.1 ppb) or DLEM and EDGAR v4.2 (R = 0.08, RMSE = 37.2 ppb). The formulation of anthropogenic emissions in the deterministic model may account for much of this improved fit against the atmospheric data. Despite the improvement, the deterministic model displays two notable shortfalls. First, it does not reproduce the summer maxima observed at western observation sites (CDL and ETL). Second, it underestimates the summer maxima at the Wisconsin (LEF) and Quebec (CHM) observation sites. These shortfalls suggest that the spatial distribution of wetland fluxes in the deterministic model may be too restrictive. In other words, wetland fluxes likely extend farther west, east, and south than in the deterministic model, which places the largest wetland fluxes in the HBL.
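
The two fit statistics quoted above, the correlation coefficient R and the RMSE, can be computed from paired modeled and observed mixing ratios as in the following sketch. The function names and the sample values are hypothetical and are not data from this study; the sketch also assumes that R denotes the Pearson correlation coefficient.

```python
import math


def rmse(modeled, observed):
    """Root mean squared error between paired model and observation values."""
    n = len(observed)
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(modeled, observed)) / n)


def pearson_r(modeled, observed):
    """Pearson correlation coefficient between paired series."""
    n = len(observed)
    mean_m = sum(modeled) / n
    mean_o = sum(observed) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(modeled, observed))
    var_m = sum((m - mean_m) ** 2 for m in modeled)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return cov / math.sqrt(var_m * var_o)


# Hypothetical mixing ratios in ppb, for illustration only.
observed_ppb = [1850.0, 1872.0, 1901.0, 1888.0, 1860.0]
modeled_ppb = [1845.0, 1880.0, 1890.0, 1895.0, 1858.0]
fit = (pearson_r(modeled_ppb, observed_ppb), rmse(modeled_ppb, observed_ppb))
```

The two statistics are complementary: R measures whether the model reproduces the pattern of variability, whereas RMSE also penalizes a bias in magnitude.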

Subsequent sections discuss the final methane flux estimate from the geostatistical inversion. This final, best estimate (R = 0.89, RMSE = 12.0 ppb) is henceforth referred to as the “posterior” fluxes.

4.3 The Spatial and Temporal Distribution of Emissions

The posterior flux estimate identifies two major source regions in Canada (Figure 3): over Alberta in western Canada and over the HBL in eastern Canada. This discussion analyzes each geographic region individually.

In western Canada, the inversion identifies a large, seasonally constant methane source region over Alberta. In the deterministic model, this source is represented by a smooth function. In the posterior estimate, however, this source region becomes a better-defined crescent shape over Alberta (Figure 3). These emissions likely originate from anthropogenic activity, and a future study will give an in-depth analysis of anthropogenic emissions in Canada. The posterior flux estimate also includes a large summer source in Alberta and Saskatchewan. As discussed previously, these fluxes are not represented by the auxiliary environmental data sets in the deterministic model. This omission in western Canada dominates the discrepancy in summertime Canadian methane between the deterministic model and posterior fluxes (Figure 5). The omission implies that either the LPJ wetland map or the NARR soil moisture map is an underestimate in western Canada. Unfortunately, the atmospheric data in this region have limited capability to pinpoint the exact location of these western wetland fluxes; atmospheric observations are sparse in western Canada, and wetland fluxes are colocated with large anthropogenic sources. In sum, this study identifies Alberta as a region with poorly known wetland fluxes and as a possible hot spot of anthropogenic emissions. We recommend that future methane measurement efforts focus on Alberta because this province is a key uncertainty in current understanding of Canadian methane sources.

Figure 5.

The monthly average methane budget estimated for the HBL and Canada in 2007–2008. Existing models underestimate wetland fluxes in western Canada. This regional shortfall explains much of the summertime discrepancy between the flux models and the posterior estimate in the lower panel.

Eastern Canada, in contrast, is dominated by seasonal methane fluxes that presumably emanate from wetlands. Figure 5 compares the seasonal cycle of DLEM, Kaplan, the deterministic model, and the posterior flux estimate over the HBL. The seasonal cycles of the deterministic model and the posterior flux estimate are similar to that of the Kaplan model but are broader than that of DLEM. Over the HBL, the posterior flux estimate matches the Kaplan model more closely than it matches the deterministic model (though the deterministic model is a better match than Kaplan/EDGAR v4.2 in other regions of Canada and the northern U.S.).

Seasonal structure aside, the flux models also diverge in spatial distribution. Figure 6 displays the mean summer (July, August, and September) methane flux estimated by the inversion for eastern Canada. It also displays the difference between this estimate and the DLEM and Kaplan models. Our flux estimate is more spatially dispersed than DLEM across the Hudson Bay region. The differences between the posterior estimate and Kaplan are more subtle. The posterior estimate indicates methane fluxes across a broader region than Kaplan: into Minnesota, Wisconsin, Manitoba, and farther west.

Figure 6.

(top) The posterior flux estimate averaged over all summer months (July to September for 2007–2008). (bottom) The difference between the posterior estimate and the DLEM and Kaplan methane models.

Figure 7 summarizes the findings of this study as an annual methane budget estimate for the HBL and for all of Canada (south of 65°N). Our methane estimate for Canada is 1.5 to 2.2 times existing estimates. Anthropogenic emissions in western Canada may explain much of this discrepancy. In contrast, our annual HBL budget is consistent with those of DLEM and of Pickett-Heaps et al. [2011], who use the Kaplan wetland model, but our estimate diverges from a site-based study by Roulet et al. [1992] and a box model study by Worthy et al. [2000] (see the supporting information). Furthermore, the HBL budget estimated here is low compared to the array of biogeochemical models listed in Melton et al. [2013]. The HBL budgets in those models range from 1.7 to 8.5 Tg C yr−1. This range in wetland methane estimates is likely greater than the interannual variability in wetland fluxes. For example, Tian et al. [2010] estimate an 11% standard deviation in annual North American methane fluxes.

Figure 7.

Total methane budgets from this study and others for the HBL and for Canada. The HBL estimates listed here are from DLEM and from observational studies. Melton et al. [2013] list numerous additional model-based HBL methane budgets, which range from 1.7 to 8.5 Tg C yr−1. “Env. Canada” refers to Environment Canada's [2013] National Inventory Report. The confidence intervals for this study do not encompass uncertainties in model selection and therefore may underestimate the total budget uncertainty.

4.4 A Synthesis Perspective on Biogeochemical Methane Models

This section explores the study's implications for biogeochemical methane modeling. The inversion results (e.g., section 4.2) raise the question of why a simple flux model fits the atmospheric methane data as well as sophisticated process models. The deterministic model developed here excludes a number of factors that can affect methane fluxes: soil carbon, plant-mediated transport, and heterogeneities in microbial communities, among many other processes. This question could be answered in two ways.

First, simple parameterizations may be sufficient when regional-scale flux patterns are the primary goal. For example, a synthesis study of existing chamber measurement sites found that methane fluxes across all sites are influenced most strongly by only a few environmental variables: water table height, soil temperature, and vegetation type [Olefeldt et al., 2013]. Furthermore, Bubier et al. [1993] and Waddington and Roulet [1996] argue that most centimeter-scale flux variability ultimately depends on two primary parameters: temperature and water table position. These studies imply that a simple model may adequately parameterize regional-scale flux variability.

Second, complex methane flux processes can be challenging to upscale, meaning that the most complete methane model is not always the most accurate at regional scales. The spatial distribution of many flux-related processes is highly uncertain [e.g., Melton et al., 2013] due to a paucity of both land surface and methane flux data. This uncertainty means that models with many processes and parameters risk overfitting the limited data available (Zucchini [2000] illustrates the hazards of overfitting). For example, processes like ebullition, plant-mediated transport, and microbial community dynamics are all thought to play a role in methane emissions [Bridgham et al., 2013, and references therein], but how these processes vary on regional spatial scales is often poorly understood.

To that end, model selection methods, like the BIC used here, provide a means to diagnose weaknesses in flux model upscaling from plot level to regional or continental scale [e.g., Olefeldt et al., 2013]. Model selection methods choose the set of predictors that can best explain variability in the available methane data. If model selection does not choose a given predictor, that outcome implies one of two conclusions: either the distribution of the predictor does not match the distribution implied by the methane data, or the available methane data are insufficient to constrain the effect of that predictor. In either case, any conclusions based upon the predictor would likely overfit the available data at the expense of describing the large-scale flux process of interest.
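
The trade-off that model selection enforces can be sketched with the textbook form of the BIC for a least-squares fit with independent Gaussian errors. This is a simplified illustration, not the formulation used in the inversion, which may differ (for example, in how it treats correlated residuals). The function names and the candidate values are hypothetical.

```python
import math


def bic_gaussian(rss, n_obs, n_params):
    """Textbook BIC for a least-squares fit with independent Gaussian errors.

    rss      -- residual sum of squares of the fitted model
    n_obs    -- number of observations
    n_params -- number of fitted coefficients
    Lower values indicate a better balance of fit and complexity.
    """
    return n_obs * math.log(rss / n_obs) + n_params * math.log(n_obs)


def select_model(candidates, n_obs):
    """Return the name of the candidate with the lowest BIC.

    candidates -- dict mapping a model name to (rss, n_params)
    """
    return min(
        candidates,
        key=lambda name: bic_gaussian(candidates[name][0], n_obs, candidates[name][1]),
    )


# Hypothetical candidates: three extra predictors barely reduce the misfit,
# so the complexity penalty favors the simpler model.
candidates = {
    "simple": (50.0, 3),
    "complex": (49.5, 6),
}
best = select_model(candidates, n_obs=100)
```

In this form, each additional predictor must reduce the misfit by enough to offset a penalty of ln(n) per parameter. A predictor that is physically important at the plot scale can therefore still be rejected if its mapped distribution does not help explain the available atmospheric data.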

5 Conclusions

This study uses atmospheric methane observations and geostatistical inverse modeling to understand North American boreal methane fluxes and associated biogeochemical models. The conclusions of this study fall under three general themes. First, we find that a simple wetland flux model, when combined with WRF-STILT, agrees with atmospheric methane observations as well as more complex flux process models do. This result has two possible causes: either simple models adequately parameterize regional-scale flux patterns, or the spatiotemporal distribution of important but complex flux processes is difficult to model accurately with available data at this geographic scale.

Second, we estimate both the spatial and seasonal distributions of methane fluxes over much of boreal North America. We find wetland fluxes that are more broadly distributed than in existing inventories, even extending into Minnesota, Wisconsin, Manitoba, and western Canada. This result implies that existing maps may under-represent the extent of soil moisture and/or the distribution of wetlands.

Finally, we calculate regional and Canadian methane budgets. Our HBL budget is at the upper end of the range of observational studies but at the lower end of the range of biogeochemical model estimates (Figure 7 and Melton et al. [2013]). In addition, we estimate total Canadian emissions that exceed existing inventories, largely due to sources in or near Alberta. Available atmospheric data are limited near Alberta during the study period, and this work highlights a need for more intensive methane measurements over that region.


This work was supported by the American Meteorological Society Graduate Student Fellowship/DOE Atmospheric Radiation Measurement Program, the DOE Computational Science Graduate Fellowship, and the National Science Foundation Graduate Research Fellowship Program. We thank Thomas Nehrkorn and Janusz Eluszkiewicz for their help with the WRF meteorology and the NASA Advanced Supercomputing Division for their computing support. We further thank Catherine Prigent for providing GIEMS data. Support for this research was provided by NASA grants NNX08AR47G and NNX11AG47G, NOAA grants NA09OAR4310122 and NA11OAR4310158, NSF grant ATM-0628575, and Environmental Defense Fund grant 0146-10100. NOAA measurements were funded in part by the Atmospheric Composition and Climate Program and the Carbon Cycle Program of NOAA's Climate Program Office.