Methane (CH4) emissions from wetland ecosystems in the northern high latitudes provide a potentially positive feedback to global climate warming. Large uncertainties remain in estimates of wetland CH4 emissions at regional scales. Here we develop a statistical model of CH4 emissions using an artificial neural network (ANN) approach and field observations of CH4 fluxes. Six explanatory variables (air temperature, precipitation, water table depth, soil organic carbon, soil total porosity, and soil pH) are included in the development of the ANN models, which are then extrapolated to the northern high latitudes to estimate monthly CH4 emissions from 1990 to 2009. We estimate that the annual wetland CH4 source from the northern high latitudes (north of 45°N) is 48.7 Tg CH4 yr−1 (1 Tg = 10¹² g), with an uncertainty range of 44.0~53.7 Tg CH4 yr−1. The estimated wetland CH4 emissions show large spatial variability over the northern high latitudes due to variations in hydrology, climate, and soil conditions. Wetland CH4 emissions exhibited significant interannual and seasonal variations over the past two decades, and the emissions in this period are most sensitive to variations in water table position. To improve future assessments of wetland CH4 dynamics in this region, research priorities should be directed to better characterizing hydrological processes of wetlands, including temporal dynamics of water table position and spatial dynamics of wetland areas.
 Methane (CH4) is the second most significant greenhouse gas after carbon dioxide (CO2). According to the latest Intergovernmental Panel on Climate Change report, the global warming potential of CH4 is about 25 times that of CO2 on a 100 year time horizon [Solomon et al., 2007]. The atmospheric concentration of CH4 has increased from a preindustrial value of about 700 ppb to a current value of about 1790 ppb [Dlugokencky et al., 2009], contributing 0.48 W m−2 of radiative forcing [O'Connor et al., 2010]. The global CH4 budget can be relatively well constrained by observations of atmospheric CH4 concentrations. However, the high spatial and temporal variability of CH4 makes it difficult to fully quantify the strength and trends of the individual natural and anthropogenic sources [Solomon et al., 2007]. Among these sources, wetlands are thought to be the single largest natural source and one dominated by climate [Bartlett and Harriss, 1993; Wuebbles and Hayhoe, 2002]. It has been estimated that more than half of the world's wetlands are located in the northern high latitudes above 50°N [Aselmann and Crutzen, 1989].
 The amount of CH4 emitted from wetland soils is determined by the balance between CH4 production and consumption. In anoxic environments, e.g., saturated soils below the water table, CH4 is produced by methanogens, which require oxygen-free conditions [Whitman et al., 1992]. In aerobic environments, e.g., unsaturated soils above the water table, CH4 is oxidized by methanotrophic bacteria in the presence of oxygen [Hanson and Hanson, 1996]. Both CH4 production and oxidation are mainly controlled by soil temperature, pH, and substrate availability [Christensen et al., 1995; MacDonald et al., 1998; Wagner et al., 2005]. CH4 can escape to the atmosphere via diffusion, plant-mediated transport, and ebullition; the sum of these three pathways is the total amount of CH4 emitted from the soil to the atmosphere.
 Estimates of wetland CH4 emissions are often obtained using “bottom-up” approaches, ranging from simple empirical or statistical models [e.g., Andronova and Karol, 1993; Granberg et al., 1997; Levy et al., 2011] to detailed process-based models [e.g., Cao et al., 1996; Walter et al., 2001; Zhuang et al., 2004]. Previous process-based model simulations produced widely divergent estimates of wetland CH4 emissions at regional and global scales: estimated northern high-latitude wetland CH4 budgets ranged from 20 to 157 Tg CH4 yr−1, with the minimum and maximum reported by Christensen et al. [1996] and Petrescu et al. [2010], respectively.
 The uncertainty in these estimates could arise from many sources, including model structure, assumptions, parameterization, and choice of forcing data. Among these, the paucity of CH4 flux measurements could be an important factor. The scarcity of measurements of CH4 fluxes and related environmental factors limits our understanding of the ecological processes in specific wetland ecosystems, weakens model assumptions, and hampers model parameterization, all of which limit the ability of process-based models to estimate wetland CH4 emissions. In addition to the large uncertainty in wetland CH4 emissions, the sensitivity of CH4 fluxes to environmental controls is not well understood, which also limits the explicit representation of many mechanistic processes in models.
 Environmental changes, including higher temperatures, shifting precipitation patterns, thawing permafrost, and longer growing seasons, are more pronounced in the northern high latitudes than at low latitudes [Fedorov, 1996; Hansen et al., 1996; Romanovsky et al., 2000]. Most of these changes affect both the magnitude and the temporal variation of wetland CH4 emissions [Friborg et al., 1997; Whalen and Reeburgh, 1992; Zimov et al., 2006]. The complex interactions among climate, soil, and ecosystems in the northern high latitudes pose a significant challenge for CH4 modeling; without a sound understanding of these interactions, it is difficult to represent them explicitly in process-based models.
 In view of these facts, we use an artificial neural network (ANN) here to estimate wetland CH4 emissions in the northern high latitudes. Over the past decades, most field measurements of CH4 fluxes have been made in the northern high latitudes, and the accumulation of these measurements provides an opportunity to estimate CH4 emissions with a data-driven ANN approach. The ANN approach has emerged as a strong alternative to classical statistical models [Delon et al., 2007; Dupont et al., 2008], and it is particularly useful for quantifying the responses of nonlinear processes such as wetland CH4 emissions. In this study, we first use the ANN approach to find the optimal nonlinear regression between CH4 fluxes and key environmental controls. Driven by spatially explicit data on climate, hydrology, and soil properties, the developed ANN is then extrapolated to the northern high latitudes (north of 45°N) to estimate wetland CH4 emissions in this region.
2.1 Data Organization
 To begin, we collected chamber measurements of CH4 fluxes from wetland ecosystems in the northern high latitudes reported in the peer-reviewed literature [e.g., Glagolev et al., 2011; Levy et al., 2011]. Our data set contains chamber measurements from 34 sites, covering a range of wetland types under various field conditions (Table 1 and Figure 1). Each site contributes a collection of flux measurement records, originally reported as hourly, daily, monthly, or growing-season flux values per unit wetland area; we converted all of them to monthly values. Because most of the original measurements were hourly or daily values, we averaged all hourly or daily values within a month and converted the mean to a monthly value for that month, without considering within-month flux variations. At the few sites where fluxes were reported as growing-season values, we disaggregated them evenly into monthly values without considering variations among months.
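The two conversions described above can be sketched in Python as follows. This is a minimal illustration, not the authors' code: it assumes daily fluxes in g CH4 m−2 d−1 and a growing-season total that the caller splits over an explicit list of months, and the function names are hypothetical. The monthly value is the mean daily flux multiplied by the number of days in that calendar month, so within-month variation is ignored, as in the text.

```python
import calendar
from collections import defaultdict
from statistics import mean


def daily_to_monthly(records):
    """Convert daily fluxes to monthly totals (hypothetical helper).

    records: iterable of (year, month, flux), flux in g CH4 m-2 d-1 (assumed unit).
    Returns {(year, month): flux in g CH4 m-2 month-1}.
    """
    by_month = defaultdict(list)
    for year, month, flux in records:
        by_month[(year, month)].append(flux)
    monthly = {}
    for (year, month), fluxes in sorted(by_month.items()):
        days_in_month = calendar.monthrange(year, month)[1]
        # Mean daily rate scaled to the whole month; within-month variation is ignored.
        monthly[(year, month)] = mean(fluxes) * days_in_month
    return monthly


def season_to_monthly(season_total, year, months):
    """Split a growing-season total (g CH4 m-2) evenly across the given months."""
    share = season_total / len(months)
    return {(year, month): share for month in months}


# Example: two June measurements and one July measurement at one site.
print(daily_to_monthly([(2005, 6, 0.1), (2005, 6, 0.3), (2005, 7, 0.2)]))
print(season_to_monthly(12.0, 2005, [6, 7, 8, 9]))
```

Which months make up the growing season at each site is left to the caller here, since the text does not specify it.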
Table 1. Description of the Sites Used in This Analysis
 The climate, hydrology, and soil property variables we used were mean air temperature (T), precipitation (P), water table depth (WTD), soil organic carbon (SOC), soil total porosity (TP), and soil pH. These site-level data were first retrieved from the original research papers and then supplemented with other spatially explicit data sets based on the geographic coordinates and dates of the measurements. WTD data were taken exclusively from the original research papers. Supplementary climate information was derived from a historical climate database (CRU TS3.1) from the Climate Research Unit (CRU) [Mitchell and Jones, 2005]. Supplementary SOC, TP, and pH values for the top soil (0–30 cm) were taken from the International Soil Reference and Information Centre World Inventory of Soil Emission Potentials (ISRIC-WISE) spatial soil database [Batjes, 2006]. The total number of measurement records for each variable is listed in Table 2. Only records containing both a CH4 flux and all six environmental variables (N = 1049, limited mainly by the availability of WTD) were used to develop the neural network models.
Table 2. Spearman Correlations Between CH4 Flux Measurements (N = 1790) and Different Environmental Factors: Air Temperature (T), Precipitation (P), Water Table Depth (WTD), Soil Organic Carbon (SOC), Soil Total Porosity (TP), and Soil pHᵃ
ᵃAll values are statistically significant at the 1% level.
 The generalized regression neural network (GRNN) [Specht, 1991] was employed to map the independent variables (the six environmental variables) to the dependent variable (CH4 flux). Like other neural networks, the GRNN is a data-driven “black box” model: it can estimate the underlying nonlinear relationship between model inputs and outputs without prior knowledge of the form of that relationship. Compared with other neural networks, the GRNN has some advantages, including fast learning (no iterative training procedure) and good convergence with large training data sets [Specht, 1991]. The GRNN is therefore a suitable model for relating CH4 fluxes to environmental factors, given that accurate prior knowledge of the link between CH4 emissions and these factors is usually unavailable. The GRNN has a four-layer architecture consisting of input, pattern, summation, and output layers [Zhuang et al., 2012]. The training data set, including the input and output values of the measurements, is fed into this multilayer network, which is trained to obtain a set of optimized network weights that are used to produce the most probable value of the output. More details about the GRNN algorithm and network optimization can be found in Specht [1991].
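In Specht's formulation, the GRNN prediction is a kernel-weighted average of the training targets, ŷ(x) = Σ y_i exp(−D_i²/2σ²) / Σ exp(−D_i²/2σ²), where D_i is the distance between x and training input x_i and σ is a smoothing parameter. The NumPy sketch below maps the four layers onto that formula. Standardizing the inputs to z-scores and the default σ = 0.5 are our assumptions; the MATLAB configuration used in the study is not reproduced here.

```python
import numpy as np


class GRNN:
    """Sketch of a generalized regression neural network (Specht, 1991).

    Input layer: standardized inputs (z-scores; an assumption of this sketch).
    Pattern layer: one Gaussian kernel per stored training sample.
    Summation layer: kernel-weighted sum of targets, and sum of kernels.
    Output layer: ratio of the two sums.
    """

    def __init__(self, sigma=0.5):
        self.sigma = sigma  # smoothing parameter; assumed value, normally tuned on held-out data

    def fit(self, X, y):
        X = np.asarray(X, dtype=float)
        self.mu_ = X.mean(axis=0)
        self.sd_ = X.std(axis=0)
        self.sd_[self.sd_ == 0] = 1.0  # avoid division by zero for constant inputs
        self.X_ = (X - self.mu_) / self.sd_  # "training" only stores the patterns
        self.y_ = np.asarray(y, dtype=float)
        return self

    def predict(self, X):
        Z = (np.atleast_2d(np.asarray(X, dtype=float)) - self.mu_) / self.sd_
        # Squared Euclidean distance from each query to each stored pattern.
        d2 = ((Z[:, None, :] - self.X_[None, :, :]) ** 2).sum(axis=2)
        # Subtracting the row minimum rescales all weights by the same factor,
        # which cancels in the ratio but prevents underflow for small sigma.
        w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * self.sigma ** 2))
        return (w @ self.y_) / w.sum(axis=1)


# Hypothetical usage on synthetic data (columns stand in for T, P, WTD, SOC, TP, pH).
rng = np.random.default_rng(0)
X_train = rng.normal(size=(50, 6))
y_train = X_train[:, 0] - X_train[:, 2]  # synthetic target, not real flux data
model = GRNN(sigma=0.5).fit(X_train, y_train)
print(model.predict(X_train[:3]))
```

Fitting only stores the standardized training patterns, which is why the GRNN needs no iterative training; σ is then the quantity that controls how smooth the fitted surface is.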
 To evaluate the performance of the ANN model, we adopted the widely used train-and-test validation method. Specifically, the full measurement data set was randomly divided into two subsets: a training set (75%; N = 787) used to construct the ANN model and a testing set (25%; N = 262) used to validate it. In addition, to compare the ANN model with a traditional regression approach, we used stepwise regression to model the relationship between monthly CH4 fluxes and environmental variables based on the same training data set (see the supporting information). The ANN model was developed in MATLAB (The MathWorks, 2006).
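The random train-and-test split can be sketched as below. The seed and the rounding rule are assumptions; with 1049 records, rounding 0.75 × 1049 = 786.75 gives the 787/262 partition reported above.

```python
import random


def train_test_split(records, train_frac=0.75, seed=0):
    """Randomly partition records into training and testing subsets (sketch)."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)  # fixed seed only for reproducibility (assumed)
    n_train = round(train_frac * len(records))
    train = [records[i] for i in idx[:n_train]]
    test = [records[i] for i in idx[n_train:]]
    return train, test


train, test = train_test_split(list(range(1049)))
print(len(train), len(test))
```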
2.3 Regional Extrapolation
 The developed ANN model was used to simulate monthly CH4 emissions from wetland ecosystems in the northern high latitudes from 1990 to 2009 at a 0.5° × 0.5° spatial resolution. We used the Global Lakes and Wetlands Database (GLWD) [Lehner and Döll, 2004] to define the spatial extent of wetland ecosystems in the northern high latitudes (north of 45°N). The cartography-based GLWD data set provides a global database of natural wetlands at 30 s resolution (GLWD-3). We aggregated the 30 s GLWD-3 raster map to 0.5° × 0.5° resolution, recording for each 0.5° grid cell the percentage of its 30 s pixels classified as wetland in GLWD-3.
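Because 0.5° spans 60 cells of 30 arc-seconds, each 0.5° grid cell covers a 60 × 60 block of GLWD-3 pixels. The NumPy sketch below computes the wetland fraction of each block. It assumes the GLWD-3 classes have already been reduced to a 0/1 wetland mask aligned with the 0.5° grid; which classes count as wetland is an assumption left to the caller, and the function name is hypothetical.

```python
import numpy as np

PIXELS_PER_CELL = 60  # 0.5 degree / 30 arc-seconds


def wetland_fraction(mask_30s):
    """Return the wetland fraction (0 to 1) of each 0.5 degree cell.

    mask_30s: 2-D 0/1 array on the 30 s grid whose dimensions are
    multiples of 60 and whose origin is aligned with the 0.5 degree grid.
    """
    m = np.asarray(mask_30s, dtype=float)
    rows, cols = m.shape
    k = PIXELS_PER_CELL
    if rows % k or cols % k:
        raise ValueError("grid dimensions must be multiples of 60")
    # Axis 1 and 3 index the pixels inside each 60 x 60 block.
    blocks = m.reshape(rows // k, k, cols // k, k)
    return blocks.mean(axis=(1, 3))


mask = np.zeros((120, 60))
mask[:30, :] = 1  # top half of the first 0.5 degree cell is wetland
print(wetland_fraction(mask))
```

Multiplying the result by 100 gives the percentage recorded per grid cell in the text.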
 To extrapolate the ANN model, we organized spatially explicit climate, hydrology, and soil properties data. The climate data, including monthly air temperature and precipitation, were extracted from the CRU TS3.1 data sets [Mitchell and Jones, 2005]. The spatially explicit soil properties in the top soil (0–30 cm), including SOC, TP, and pH, were taken from the ISRIC-WISE spatial soil database [Batjes, 2006].
 The spatial extent of wetlands and the fractional wetland area within each 0.5° grid cell were determined from the GLWD-3 data set [Lehner and Döll, 2004], while the WTD of wetlands within each 0.5° grid cell was derived by combining hydrological model simulations with a TOPMODEL-based method. The grid cell mean WTD was first simulated by a hydrological model that simulates the soil moisture profile and WTD of wetland ecosystems [Zhuang et al., 2002, 2004]. We then used the TOPMODEL-based formulation [Lu and Zhuang, 2012] to represent the spatial distribution of WTD across the 1 km pixels within each 0.5° grid cell:
where f is the decay parameter, k_i is the topographic wetness index (TWI) of pixel i, and λ is the average of k_i over the 0.5° grid cell. Z_WTD is the grid cell mean WTD calculated by the hydrological model, and Z_WTD,i is the local WTD at 1 km resolution. Following Fan and Miguez-Macho, the decay parameter (f) was modeled as
where s is the terrain slope and T is the mean surface air temperature in January. The 1 km topographic information and TWI were obtained from the HYDRO1k database (available at http://eros.usgs.gov/#Find_Data/Products_and_Data_Available/gtopo30/hydro), which provides comprehensive and consistent global coverage of topographically derived data based on the U.S. Geological Survey 30 s digital elevation model of the world (GTOPO30). After obtaining the local WTD (Z_WTD,i) for each 1 km pixel within a 0.5° grid cell, we sorted the Z_WTD,i values in ascending order (Z_WTD,j) and calculated the WTD of wetlands in that grid cell (WTD_wet) as
where n is the number of 1 km wetland pixels within each 0.5° grid cell, given by the number of 1 km pixels in the cell multiplied by the wetland fraction of the cell derived from the GLWD-3 data set.
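The equations referenced above are not reproduced in this copy. The sketch below assumes the standard TOPMODEL redistribution, Z_WTD,i = Z_WTD − (k_i − λ)/f, which is consistent with the symbol definitions given, and takes the decay parameter f as a given input because its slope- and temperature-dependent form is not shown here. WTD_wet is taken as the mean of the n shallowest local depths, one reading of the sorting step described above; rounding n to the nearest integer is also an assumption. Depths are positive downward, and the function names are hypothetical.

```python
import numpy as np


def local_wtd(mean_wtd, twi, f):
    """Distribute a grid cell mean WTD over 1 km pixels (assumed TOPMODEL form).

    z_i = mean_wtd - (k_i - lambda) / f, with lambda the cell mean of the TWI k_i.
    Wetter pixels (higher TWI) get shallower water tables; negative values
    would indicate a water table above the surface.
    """
    k = np.asarray(twi, dtype=float)
    return mean_wtd - (k - k.mean()) / f


def wetland_wtd(local_depths, wetland_fraction):
    """Mean depth of the n shallowest pixels, n = round(N * wetland fraction)."""
    z = np.sort(np.asarray(local_depths, dtype=float))  # ascending: shallowest first
    n = int(round(z.size * wetland_fraction))
    if n == 0:
        return float("nan")  # no wetland pixels in this cell
    return float(z[:n].mean())


z = local_wtd(mean_wtd=1.0, twi=[2.0, 4.0, 6.0, 8.0], f=4.0)
print(z, wetland_wtd(z, wetland_fraction=0.5))
```

Because the TWI deviations average to zero, the pixel depths preserve the grid cell mean WTD, while the wetland subset is biased toward the wettest pixels.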
2.4 Sensitivity and Uncertainty Analysis
 A regional inventory of wetland CH4 emissions typically has a wide range of emission estimates. Before exploring the uncertainty in model estimates, we conducted a sensitivity analysis of the ANN model to assess its sensitivity to each input variable. We conducted 36 additional regional simulations (six variables × three levels × increase or decrease) by altering the climate, hydrology, and soil input data uniformly in every grid cell. Each of the six variables was individually increased or decreased at three levels: ±10%, ±25%, and ±50%. In each sensitivity simulation, only one variable was changed, and the other variables were held at their baseline values. The sensitivity was then calculated as the percentage change in the estimated mean CH4 flux of each sensitivity simulation relative to the baseline simulation.
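The 36 one-at-a-time perturbations can be sketched as follows. The `model` argument is a hypothetical callable standing in for the trained ANN, and the column order of the input array (T, P, WTD, SOC, TP, pH) is an assumption of this sketch.

```python
import numpy as np

VARIABLES = ["T", "P", "WTD", "SOC", "TP", "pH"]  # assumed column order
LEVELS = [0.10, 0.25, 0.50]


def sensitivity(model, inputs):
    """Percent change in mean flux for each single-variable perturbation.

    model: hypothetical callable mapping an (n_cells, 6) array to fluxes.
    inputs: baseline (n_cells, 6) array.
    Returns {(variable, signed_level): percent change relative to baseline}.
    """
    base = np.asarray(inputs, dtype=float)
    base_mean = model(base).mean()
    out = {}
    for j, name in enumerate(VARIABLES):
        for level in LEVELS:
            for sign in (+1, -1):
                perturbed = base.copy()
                # Uniform relative change to one variable; all others stay at baseline.
                perturbed[:, j] *= 1 + sign * level
                new_mean = model(perturbed).mean()
                out[(name, sign * level)] = 100.0 * (new_mean - base_mean) / base_mean
    return out


# Toy model that depends only on WTD (column 2); for illustration only.
result = sensitivity(lambda X: X[:, 2], np.ones((2, 6)))
print(len(result), result[("WTD", 0.1)])
```

A relative perturbation is straightforward for strictly positive variables such as SOC or TP; for air temperature in °C, scaling sub-zero values by 1.1 makes them colder, so the unit in which temperature is perturbed matters and is not specified in the text.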
 The uncertainty in our regional inventory of wetland CH4 emissions arises mainly from uncertain regional forcing variables and from model structure/parameters. Ideally, uncertainties from both sources should be propagated to the model output. However, because accurate prior knowledge of the regional model inputs (here, the six environmental variables) is lacking, we did not analyze forcing uncertainty and focused only on the uncertainty associated with ANN model structure/parameters. The ANN model is a data-driven, highly nonlinear system defined only by its optimized weights, and these weights are determined by the training data set (subsampled from site measurements); it is therefore difficult to quantify the model's uncertainty range directly through parametric inference. Instead, we assessed model (structure/parameter) uncertainty by developing a number of alternative models using the “delete-one” cross-validation method [Zhuang et al., 2012]. Specifically, we randomly sampled three quarters of the organized measurement records as training data to develop a new ANN model. Each resampled training set yields a different set of network weights, which was then used to extrapolate CH4 fluxes spatially; in this way, the uncertainty in ANN model structure/parameters is quantified implicitly. These steps were repeated 100 times to obtain 100 sets of regional estimates. The 95% confidence interval of these estimates was taken as the range of model uncertainty and defines the lower and upper uncertainty bounds of the regional wetland CH4 inventory.
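The resampling loop described above can be sketched as follows. It assumes a generic `fit(X, y)` that returns a predictor; `fit_linear` is a hypothetical stand-in for retraining the ANN, and the mean predicted flux stands in for the area-weighted regional total. The 95% interval is taken from the 2.5th and 97.5th percentiles of the replicate estimates, which is one common reading of a 95% interval over an ensemble.

```python
import numpy as np


def fit_linear(X, y):
    """Hypothetical stand-in for ANN training: ordinary least squares with intercept."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Xr: np.column_stack([Xr, np.ones(len(Xr))]) @ coef


def ensemble_interval(X, y, X_region, fit, n_models=100, frac=0.75, seed=0):
    """Retrain on random subsamples and summarize the regional estimates.

    Returns (mean, lower, upper), where lower/upper are the 2.5th and
    97.5th percentiles of the n_models regional mean fluxes.
    """
    rng = np.random.default_rng(seed)
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    X_region = np.asarray(X_region, dtype=float)
    n_sub = round(frac * len(y))
    estimates = np.empty(n_models)
    for m in range(n_models):
        # Draw three quarters of the records without replacement.
        idx = rng.choice(len(y), size=n_sub, replace=False)
        predictor = fit(X[idx], y[idx])
        estimates[m] = predictor(X_region).mean()
    lower, upper = np.percentile(estimates, [2.5, 97.5])
    return float(estimates.mean()), float(lower), float(upper)


# Synthetic example: noisy linear data, regional inputs at two grid cells.
rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 * X[:, 0] + 1 + rng.normal(0, 1, size=200)
print(ensemble_interval(X, y, [[0.0], [10.0]], fit_linear))
```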
3.1 Artificial Neural Networks
 Before developing the ANN model, we conducted a Spearman rank correlation analysis to explore the relationships between CH4 flux measurements and the environmental factors. The pairwise correlations show that CH4 emissions are significantly correlated with climate, hydrology, and soil properties (Table 2). Among the six input variables, temperature and WTD are the two most important controls on CH4 emissions. CH4 emissions are positively correlated with air temperature and negatively correlated with WTD (i.e., the deeper the water table, the less CH4 is emitted from wetland soils). There are also significant positive correlations between CH4 emissions and SOC, soil total porosity, precipitation, and soil pH. Because all six environmental variables are significantly correlated with wetland CH4 fluxes (p < 0.01), we used all of them as ANN model inputs.
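Spearman's rank correlation is the Pearson correlation of the ranks, with tied values sharing their average rank, which makes it sensitive to any monotonic relationship rather than only linear ones. A dependency-free sketch follows; significance testing is omitted here, although `scipy.stats.spearmanr` reports a p-value alongside the coefficient.

```python
def average_ranks(values):
    """1-based ranks, with tied values assigned the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation of the average ranks (undefined for constant input)."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5


# A monotonic but nonlinear relationship still gives rho = 1.
print(spearman([1, 2, 3], [1, 4, 9]))
```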
 The CH4 fluxes simulated by the ANN model are close to the observed data (root-mean-square error (RMSE) = 0.51 g CH4 m−2 month−1 for the training set and 1.1 g CH4 m−2 month−1 for the testing set), and the coefficients of determination (r2) between simulated and measured fluxes are 0.92 for the training set (Figure 2a) and 0.68 for the testing set (Figure 2b) (p < 0.01). The linear regression between simulated and measured CH4 fluxes is close to the 1:1 line, with some underestimation at higher fluxes in the testing set. Although its performance is imperfect, the ANN model performs much better than the fitted stepwise regression model (Figure S1), which has an RMSE of 1.48 g CH4 m−2 month−1 and an r2 of 0.43 with the same training data set.
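The two goodness-of-fit measures above can be computed as sketched below. Taking r² as the squared Pearson correlation between observed and simulated values is an assumption; some studies instead use 1 − SS_res/SS_tot, which differs when predictions are biased.

```python
import math


def rmse(obs, sim):
    """Root-mean-square error, in the units of the fluxes (e.g., g CH4 m-2 month-1)."""
    return math.sqrt(sum((s - o) ** 2 for o, s in zip(obs, sim)) / len(obs))


def r_squared(obs, sim):
    """Squared Pearson correlation between observed and simulated values (assumed definition)."""
    n = len(obs)
    mo, ms = sum(obs) / n, sum(sim) / n
    cov = sum((o - mo) * (s - ms) for o, s in zip(obs, sim))
    vo = sum((o - mo) ** 2 for o in obs)
    vs = sum((s - ms) ** 2 for s in sim)
    return cov * cov / (vo * vs)


# A uniformly biased simulation: perfect correlation but nonzero error.
obs, sim = [1.0, 2.0, 3.0], [2.0, 3.0, 4.0]
print(rmse(obs, sim), r_squared(obs, sim))
```

Because this r² ignores a constant offset, a simulation that is uniformly 1 unit too high still scores r² = 1, which is one reason to report RMSE alongside it.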
3.2 Temporal Variations of Regional CH4 Dynamics
 CH4 emissions from wetland ecosystems exhibit large spatial variability over the northern high latitudes (Figure 3). The simulated emission patterns show that the Canadian lowlands, Alaska, West Siberia, and far eastern Siberia are the predominant sources of CH4. The highest emissions occur in two of the world's largest wetlands, the Hudson Bay Lowlands and the West Siberian Lowlands, where wetland ecosystems emit up to 40 g CH4 m−2 yr−1 to the atmosphere.
 Annual wetland CH4 emissions show significant interannual variability from 1990 to 2009 (Figure 4a), but no significant trend over the period. The mean annual emissions are 48.8 Tg CH4 yr−1, ranging from 46.7 Tg CH4 yr−1 in 1994 to 51.0 Tg CH4 yr−1 in 2006. Wetland CH4 emissions also exhibit substantial seasonal variations, with weak fluxes in winter and strong fluxes in summer (Figure 4b). The highest emissions (5.1 Tg CH4 month−1) occur in June and the lowest (2.1 Tg CH4 month−1) in January. The year-to-year variations in monthly emissions during 1990–2009 are larger in winter than in summer.
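Regional totals in Tg follow from multiplying each grid cell's flux (per unit wetland area) by its wetland area and summing over cells. The sketch below assumes a spherical Earth of radius 6371 km and identifies each 0.5° cell by its southern latitude edge; these choices, and the function names, are ours rather than the paper's.

```python
import math

EARTH_RADIUS_M = 6.371e6  # mean Earth radius; spherical Earth assumed
TG_PER_G = 1e-12  # 1 Tg = 10^12 g


def cell_area_m2(lat_south, dlat=0.5, dlon=0.5):
    """Area of a latitude-longitude cell between lat_south and lat_south + dlat."""
    phi1 = math.radians(lat_south)
    phi2 = math.radians(lat_south + dlat)
    return EARTH_RADIUS_M ** 2 * math.radians(dlon) * (math.sin(phi2) - math.sin(phi1))


def regional_total_tg(cells):
    """Sum emissions over cells.

    cells: iterable of (lat_south, wetland_fraction, flux), where flux is in
    g CH4 per m2 of wetland over the period of interest (e.g., one month).
    Returns the regional total in Tg CH4 for that period.
    """
    total_g = sum(cell_area_m2(lat) * frac * flux for lat, frac, flux in cells)
    return total_g * TG_PER_G


# Hypothetical example: two cells with 30% and 10% wetland cover.
print(regional_total_tg([(60.0, 0.3, 5.0), (65.5, 0.1, 3.0)]))
```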
3.3 Uncertainty of Regional CH4 Estimates
 The sensitivity analysis of the ANN model was conducted by altering the input environmental variables individually (Figure 5). Among the six input variables, the model is most sensitive to WTD, and emissions respond consistently to WTD at all three perturbation levels: increasing WTD (a lower water table) inhibits emissions, while decreasing WTD (a higher water table) enhances them. Higher pH enhances emissions, while lower pH inhibits them. CH4 emissions increase (decrease) with increasing (decreasing) TP at the “medium” (±25%) and “large” (±50%) levels, with no significant change at the “small” (±10%) level. For SOC and the climate variables, no consistent relationship exists across the three levels.
 The uncertainty analyses of regional CH4 emissions, based on 100 ANN models, indicate that larger uncertainties usually accompany higher CH4 emission rates (Figures 3 and 6). Estimates of grid cell mean CH4 fluxes do not vary significantly among the ANN models, with standard deviations normally below 2.4 g CH4 m−2 yr−1. The 100 ANN models provide a probability distribution of regional CH4 emissions (Figure 7a), and we define the uncertainty of regional CH4 emissions due to ANN model structure/parameters as the range between the lower and upper bounds of the 95% confidence interval. The mean annual CH4 emissions from the 100 ANN models (Figure 7b) differ little from the estimates of the optimized ANN model (Figure 4a), but the interannual variability of annual CH4 emissions increases once model uncertainties are taken into account: the difference between the highest and lowest annual emissions grows from 4 to 14 Tg CH4 yr−1 (Figures 4a and 7b). Over 1990–2009, the mean annual CH4 emissions are 48.7 Tg CH4 yr−1, with a 95% confidence interval of 44.0 to 53.7 Tg CH4 yr−1.
 The correlation analyses show that temperature and water availability strongly constrain CH4 emissions from wetland soils. The strong positive correlation between CH4 fluxes and temperature is consistent with laboratory studies [e.g., Whalen and Reeburgh, 1996] and field observations [e.g., Bellisario et al., 1999; Christensen et al., 2003]. The negative correlation between water table depth and CH4 fluxes agrees with field experiments [Heikkinen et al., 2002; Nykänen et al., 1998], indicating an inverse relationship between water table depth and CH4 fluxes (deeper water tables lead to smaller emissions). In addition, the heterogeneity of soil properties is an important control on variations in CH4 emissions [Levy et al., 2011].
 Estimates of net CH4 emissions from boreal wetland ecosystems over the past decades, based on measurements or model simulations, range widely from 20 Tg CH4 yr−1 [Christensen et al., 1996] to 157 Tg CH4 yr−1 [Petrescu et al., 2010] (Table 3). Our estimate of 47~51 Tg CH4 yr−1 lies within this range and is comparable to the estimates of Bartlett and Harriss [1993] and Zhuang et al. for the same region.
Table 3. Emissions of CH4 From Wetland Ecosystems in the Northern High Latitudes
 Although there is no significant interannual trend, some anomalies in annual wetland CH4 emissions can be identified during 1990–2009. Emissions in 1993 are higher than in 1992, as also reported by Bousquet et al. Consistent with many other studies [e.g., Chen and Prinn, 2006; Mikaloff Fletcher et al., 2004], our estimates show a significant increase in emissions in 1998. Many previous studies [e.g., Cunnold et al., 2002; Dlugokencky et al., 2001] have attributed the elevated CH4 emissions in 1998 to the strong El Niño of late 1997 and 1998 [Bell et al., 1999], which influenced climate globally. The higher emissions in 2005 and 2006 agree with the simulation results of Petrescu et al. [2010]. In addition, the interannual variability of annual wetland CH4 emissions increases when uncertainties in model structure/parameters are taken into account (Figures 4a and 7b).
 Sensitivity analyses indicate that the ANN model is most sensitive to water table depth and soil properties (e.g., pH and TP). WTD is the most sensitive and consistent factor, suggesting that water table position is the key control on CH4 emissions. To date, most earth system models use simplified representations of WTD when simulating wetland CH4 emissions, assuming a uniform water table within each grid cell and ignoring subgrid spatial heterogeneity [Petrescu et al., 2010; Walter et al., 2001]. This simple representation neglects the effects of microtopography on water table dynamics, to which CH4 production and oxidation are sensitive [Zhuang et al., 2007]. Although the developed ANN model may overfit the training data to some extent (Figure 2), the uncertainty analyses of model structure/parameters show little difference between the mean annual CH4 emissions from the 100 ANN models and those from the optimized ANN model, suggesting that the optimized ANN model is well constructed for estimating CH4 emissions. Another source of uncertainty in the ANN model estimates is the difference in spatial scale between model development and application: the ANN model was developed and parameterized at the site level (from a few square meters to several kilometers) but applied at the regional scale (0.5° × 0.5° in this case). In this study, we disregarded these scale differences by assuming that the relationships between CH4 fluxes and environmental variables do not change across spatial scales. This assumption should be evaluated in future studies.
 In addition to the uncertainty in the ANN-estimated CH4 flux density (per unit wetland area), the wetland extent used for regional estimates could be an important source of uncertainty, because wetland areas and their dynamics are difficult to characterize [Zhu et al., 2011]. Regional total CH4 emissions may be strongly affected by the choice of wetland data set. For instance, a model study by Petrescu et al. [2010] gave a broad range of current CH4 emissions from circum-Arctic wetlands (mean annual air temperature < 5°C), between 38 and 157 Tg CH4 yr−1, based on several different wetland extent data sets. In this study, we use the GLWD data set [Lehner and Döll, 2004] to define the spatial extent of wetlands, and we use a simulated monthly mean water table position of wetlands within each grid cell to represent the hydrological dynamics of wetlands. The cartography-based GLWD data set is expected to represent the maximum extent of wetlands [Lehner and Döll, 2004], which may include wetlands that are inundated only briefly (or never) during a year. Our simulations use a fixed fractional wetland area for each grid cell, without considering the expansion and contraction of wetland areas over the course of a year. Our estimate of wetland CH4 emissions could therefore exceed the actual value, especially in winter months when wetland areas contract. Satellite-based wetland data sets [e.g., Prigent et al., 2007] could better represent the temporal dynamics of wetland areas, although satellites fail to detect never-inundated wetlands that also emit CH4. The choice of wetland extent data set also influences the estimated WTD of wetlands within a grid cell, which is calculated from the grid cell mean WTD and the fractional wetland area using the TOPMODEL-based method (equations (1)–(5)).
 Based on published site-level CH4 flux measurements from wetland ecosystems and associated environmental data, we develop a model to estimate wetland CH4 emissions using an artificial neural network approach. The developed ANN model agrees well with the observed CH4 fluxes. The mean annual wetland CH4 emissions in the northern high latitudes are estimated to be 48.7 Tg CH4 yr−1, with an uncertainty range of 44.0~53.7 Tg CH4 yr−1, and emissions show significant interannual and seasonal variations during 1990–2009. We find that regional wetland CH4 emissions are most sensitive to variations in water table position. The simulated wetland CH4 emissions show large spatial variability over the northern high latitudes due to variations in hydrology, climate, and soil conditions. To improve future assessments of wetland CH4 dynamics in this region, research priorities should be directed to better characterizing the hydrological dynamics of wetlands (i.e., variations in wetland area and water table position).
 This study was supported through projects funded by the NASA Land-Cover/Land-Use Change Program (NASA-NNX09AI26G), the Department of Energy (DE-FG02-08ER64599), the NSF Division of Information and Intelligent Systems (NSF-1028291), and the NSF Carbon and Water in the Earth Program (NSF-0630319). Computing support was provided by the Rosen Center for Advanced Computing (RCAC) at Purdue University.