Upscaling Wetland Methane Emissions From the FLUXNET-CH4 Eddy Covariance Network (UpCH4 v1.0): Model Development, Network Assessment, and Budget Comparison

Wetlands are responsible for 20%–31% of global methane (CH 4 ) emissions and account for a large source of uncertainty in the global CH 4 budget. Data-driven upscaling of CH 4 fluxes from eddy covariance measurements can provide new and independent bottom-up estimates of wetland CH 4 emissions. Here, we develop a six-predictor random forest upscaling


Introduction
The post-industrial rise in atmospheric methane (CH 4 ) concentrations has had a large climate warming effect, 60% the size of that for carbon dioxide (CO 2 ) (IPCC, 2021).The short atmospheric lifetime of CH 4 also promises relatively fast climate change mitigation effects following CH 4 emissions reductions, rather than century-or-more timescales for CO 2 reductions (Abernethy et al., 2021;Turner et al., 2019).However, current emissions trajectories more closely track high emissions scenarios (Zhang et al., 2023).Since 2014, there has been an accelerating increase in the CH 4 growth rate that reached a record level in 2022, at 18.2 ppb y −1 (Lan et al., 2023), and these increases could continue as global temperatures rise (Bansal et al., 2023;Zhang et al., 2017).
Large uncertainties around total (8%-39%) and individual CH 4 sources (Table 1) prevent CH 4 budget closure at regional-to-global scales.Better constrained CH 4 budgets are needed to more accurately attribute and mitigate the sources causing the accelerating rise in CH 4 emissions (Nisbet et al., 2022).Improving wetland CH 4 emissions estimates will help constrain the global CH 4 budget as wetlands comprise both the largest natural CH 4 emissions source (20%-30% of total emissions) and the second largest uncertainty in the CH 4 budget (Saunois et al., 2020).
Plain Language Summary Wetlands account for a large share of global methane emissions to the atmosphere, but current estimates vary widely in magnitude (∼30% uncertainty on annual global emissions) and spatial distribution, with diverging predictions for tropical rice growing (e.g., Bengal basin), rainforest (e.g., Amazon basin), and floodplain savannah (e.g., Sudd) regions.Wetland methane model estimates could be improved by increased use of land surface methane flux data.Upscaling approaches use flux data collected across globally distributed measurement networks in a machine learning framework to extrapolate fluxes in space and time.Here, we train and evaluate a methane upscaling model (UpCH4) and use it to generate monthly, globally gridded wetland methane emissions estimates for 2001-2018.The UpCH4 model uses only six predictor variables among which temperature is dominant.Global annual methane emissions estimates and associated uncertainty ranges from upscaling fall within state-of-the-art model ensemble estimates from the Global Carbon Project (GCP) methane budget.In some tropical regions, the spatial pattern of UpCH4 emissions diverged from GCP predictions, however, inclusion of flux measurements from additional ground-based sites, together with refined maps of tropical wetlands extent, could reduce these prediction uncertainties.
Wetlands, as defined here, include both seasonally and permanently inundated soils that are vegetated including riparian and floodplain forests, inland marsh systems, and peat-forming wetlands (peatlands) but exclude rice agriculture, tidal and non-vegetated waterbodies such as ponds, lakes, streams, rivers, and estuaries (Zhang et al., 2021).
Current methods to estimate global wetland CH 4 emissions generally fall into one of two approaches: top-down (TD) atmospheric observation-based inversions and bottom-up (BU) land surface models.Although both approaches involve state-of-the-art methods, emissions estimates vary significantly within and between the two approaches (Saunois et al., 2020).For the decade 2008-2017, a comprehensive TD model ensemble, compiled as part of the Global Carbon Project (GCP) methane budget activity, estimated wetland emissions of 159-200 (mean 181) TgCH 4 yr −1 .TD inversions use approaches such as Bayesian inference or ensemble Kalman filters to estimate the (posterior) wetland emissions required to reproduce in-situ and satellite atmospheric CH 4 concentration retrievals.Since observations are limited, wetland source attribution strongly depends on prior assumptions as additional sources of information, constraining sectoral contributions (i.e., natural, industrial, agricultural and waste emissions) in space and time (Jacob et al., 2022).Considerable variability exists between TD models, primarily due to differences in assumptions and whether satellite retrievals are included as constraints (Kirschke et al., 2013;Saunois et al., 2020).
In contrast to TD models, BU model estimates are not constrained by atmospheric CH 4 concentration data and attempt to directly represent wetland CH 4 fluxes and underlying flux processes with varying complexity (Riley et al., 2011;Ueyama et al., 2023).For the same decade (2008)(2009)(2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017), the GCP's 13-member BU model ensemble estimated emissions of 102-182 (mean 149) TgCH 4 yr −1 , ∼20% lower than the TD ensemble mean (Saunois et al., 2020).The widespread observed among BU models arises from differences in model parameterization, which is informed by process knowledge and literature estimates of parameter values and sometimes by calibration to observed wetland CH 4 fluxes at a limited number of sites (e.g., 3 northern wetlands for    (Saunois et al., 2020) 10.1029/2023AV000956 4 of 24 Wetland-DNDC in Zhang et al., 2002).However, Chang et al. (2021) showed a large variability in CH 4 flux temperature dependency across a network of eddy covariance tower sites (FLUXNET-CH4; Delwiche et al., 2021;Knox et al., 2019), indicating that upscaling from few sites is likely to introduce errors during global extrapolation.To date, process-based models have yet to use networked data, for instance FLUXNET-CH4, for multi-site calibration, in part because this approach is more technically challenging than for machine learning models.
Equally large uncertainties (∼50% of total uncertainty) are introduced when BU models simulate independent wetland extents in prognostic runs versus using prescribed global wetland extents in diagnostic runs (Melton et al., 2013).Notably, the substantial GCP BU spread (±30%-50% of ensemble mean emissions) was observed even in diagnostic model runs where all models were prescribed a common Wetland Area and Dynamics for Methane Modeling (WAD2M) wetland extent (Zhang et al., 2021), underscoring the need to reduce wetland CH 4 flux uncertainties as well as wetland extent uncertainties.
No global benchmark data set exists to favor or falsify, with a strong degree of confidence, any BU or TD model (Saunois et al., 2020).Insights about model parameterization and sources of uncertainty can only be gained at present via model intercomparisons, such as the BU Wetland CH 4 Inter-Comparison of Models Project (WETCHIMP) activity (Melton et al., 2013) and the TD Wetland CH 4 emission and uncertainty ensemble data set for Atmospheric Chemistry and Transport modeling (WetCHARTS) activity (Bloom et al., 2017), the GCP wetland CH 4 synthesis (Poulter et al., 2017), by regional scale evaluation of converging or diverging TD and BU estimates (Stavert et al., 2021; example in Figure 1), or by comparison to satellite retrievals (Parker et al., 2018).Independent estimates of global wetland CH 4 emissions, incorporating new data for calibration and model constraints, and implementing new modeling approaches, such as machine learning algorithms, are emerging alternatives for refining models and reducing uncertainties around wetland CH 4 sources (Saunois et al., 2020).
One data stream that can improve wetland CH 4 models and global emission estimates is the growing availability of CH 4 fluxes measured near the land surface.In situ eddy covariance flux towers provide long-term, semi-automated, and quasi-continuous fluxes at ecosystem scales (<1 km 2 ) with minimal disturbance to soils or canopy structure/function (Baldocchi, 2014;Chu et al., 2021).Although BU models have been parameterized using CH 4 flux data at individual sites, network CH 4 data has not been fully utilized.Since the late-1990s, FLUX-NET has provided standardized CO 2 flux data measured using eddy covariance across hundreds of locations around the world, enabling independent benchmarking of satellite measurements and Earth system models (Jung et al., 2020;Pastorello et al., 2020).Upscaling is a workflow combining statistical models and data to transfer information across scales, often using machine learning, and has been used by projects such as FLUXCOM to extrapolate FLUXNET data from 224 sites (∼850 site years) and predict global terrestrial ecosystem carbon and energy fluxes (Bodesheim et al., 2018;Jung et al., 2020;Tramontana et al., 2016).The FLUXNET-CH4 data set now provides similar opportunities to refine model parameterization and generate independent, data-driven estimates of regional-to-global CH 4 emissions (Chang et al., 2021;Delwiche et al., 2021;Knox et al., 2019).Peltola et al. (2019) independently acquired eddy covariance data from 25 high-latitude sites and developed a wetland CH 4 flux upscaling workflow to predict monthly, regional (>45°N) wetland CH 4 emissions for 2012-2013.Annual emissions of 31-38 TgCH 4 y −1 agreed well with previous bottom-up estimates (e.g., Chen et al., 2015;Treat et al., 2018;Zhang et al., 2017) but were higher than those of top-down estimates (23-28 TgCH 4 y −1 ) for the region (e.g., Bruhwiler et al., 2014;Spahni et al., 2011).Peltola et al. (2019) thus demonstrated that upscaling estimates from eddy covariance data could produce plausible CH 4 fluxes at regional scales.To date, however, no upscaling project has taken advantage of the full FLUXNET-CH4 site network to make and evaluate global wetland CH 4 emissions predictions.
Here, we develop a wetland CH 4 upscaling workflow (UpCH4) that combines FLUXNET-CH4 and globally gridded predictor data to train random forest model ensembles, including validation and test routines optimized for spatial prediction applications.Given that surface flux measurement may networks take decades to grow, our first goal is to robustly evaluate the ability of machine learning models to extrapolate beyond training conditions in space and time.We then use this CH 4 flux upscaling workflow (UpCH4) to predict wetland CH 4 emissions globally and compare them to current top-down and bottom-up model estimates.Given knowledge of model structure and network coverage, our second goal is to identify regions of convergence and diagnose regions of divergence between UpCH4 and existing estimates.Finally, we use knowledge of the model structure to conduct a FLUXNET-CH4 network dissimilarity analysis with the goal of informing strategic improvements in eddy covariance site coverage.

Eddy Covariance CH 4 Flux Data
UpCH4 was designed to predict CH 4 fluxes globally at inundated and non-inundated (i.e., shallow water table) freshwater vegetated wetlands.Forty-five natural and restored freshwater wetland sites from the FLUXNET-CH4 database qualified for model training (Delwiche et al., 2021).Two sites (RU-VrK and SE-St1) were excluded after quality control filtering and 1 year of data was excluded from a restored wetland site (US-Sne;    Note. Mean annual temperature (MAT; °C) and mean annual precipitation (MAP; mm) were extracted from WorldClim 2.0 (Fick & Hijmans, 2017).Weekly mean CH 4 fluxes were computed from half-hourly FLUXNET-CH4 Version 1.0 fluxes, which are available as a standardized gap-filled product (Delwiche et al., 2021).Although current wetland CH 4 emission products are resolved at monthly timesteps, a finer-resolution (here, weekly) increases training data size and helps capture sub-monthly functional dependencies between predictors and flux (Jung et al., 2020;Tramontana et al., 2016).

Table 2 Continued
Weekly fluxes were only retained when there was a minimum of 1 day (48 half-hours or ∼14%) of CH 4 observations.This gap-filling threshold was chosen to retain as much training data as possible while minimizing the errors introduced by filling long gaps (Dengel et al., 2013;Peltola et al., 2019).Most gaps were less than 5 hr in length, and the maximum possible gap-length was 12 days (Figure S1 in Supporting Information S1).A detailed study of machine learning-based gap-filling of eddy covariance CH 4 fluxes found that bias introduced by gap-filling remains small and consistent across gap-lengths of 12 days or less (Irvin et al., 2021).After applying the gap-filling threshold, the final flux data set consisted of 6,210 weekly observations spanning 2006-2018, with 96% of the data recorded after 2010, and 38% recorded after 2015.
Sites within 300 km of each other were grouped together, resulting in 26 clusters that were used for spatial leave-one-out cross-validation (LOOCV) of the machine learning model, where each training/validation fold consisted of all data except one hold-out cluster (Meyer et al., 2019) (Figure 2).Spatial LOOCV has previously been applied to evaluate models used in global upscaling of CO 2 and energy fluxes (Tramontana et al., 2016) and is suitable for making spatio-temporal predictions from spatially sparse time series data (Roberts et al., 2017).Further information on the wetland site class, geolocation, climate, site investigators, and data source is provided in Table 2.Additional information about the FLUXNET-CH4 sites considered for this study, including data digital object identifiers, site references, and source locations, are detailed in (Table S1), on the FLUXNET website and in Delwiche et al. (2021).

Predictor Data
A total of 140 candidate predictors were considered for data-driven upscaling.These candidate predictors were organized into five broad classes: climatic (e.g., temperature, precipitation), biometeorological (i.e., flux tower-measured air temperature, and ecosystem carbon and energy fluxes), land cover class and properties (e.g., vegetation class) (Tuanmu & Jetz, 2014, 2015), soil physical and chemical properties (e.g., clay content) (Hengl et al., 2017; Lamarque et al., 2013), topography (e.g., slope), and vegetation indices (e.g., vegetation greenness) (Gao, 1996;Hall & Riggs, 2016;Huete et al., 2002;Jensen & Mcdonald, 2019;Myneni et al., 2015;Vermote, 2015;Wan et al., 2015).Each predictor class contains data from different sources (e.g., models, tower-observations, and/or remote sensing) and information content (e.g., spatial only, temporal only, or spatio-temporal).Gridded data products were extracted at pixels corresponding to tower locations and used directly in model development whereas tower-measured data, when available, were used preferentially in model training and then substituted by a globally gridded product to force the model during upscaling runs (e.g., tower-measured air temperature was mapped to the MERRA-2 reanalysis air temperature gridded product).Predictor data sources for each class are described in detail in (Text S1 and Table S2 in Supporting Information S1; Table S3).
Lead and lag times of one, two, or 3 weeks or months (for weekly and monthly data, respectively) were imposed on all temporally-resolved predictors, corresponding to multi-day and seasonal lead and lag timescales identified between wetland CH 4 fluxes, and temperature, eddy covariance-derived gross primary production (GPP), and soil moisture-related drivers (Delwiche et al., 2021;Knox et al., 2021).For MODIS data, monthly mean seasonal cycles (MSC) and other annual metrics (e.g., site-year mean, minimum, maximum, amplitude) were also derived.Quality control figures (example shown in Figure S2 in Supporting Information S1) were generated for all predictors and used primarily to identify and replace outliers with values from a proximate site within a cluster, an adjacent pixel (when sites were isolated within a cluster), or the site-year median (when varying in time).After deriving predictors, a total of 273 predictors were available for model training.

Predictor Selection
We used forward feature selection (FFS) to identify the optimal predictor subset from across all possible predictors to use in the final model (Gregorutti et al., 2017;Meyer et al., 2018).For each FFS step, we trained a random forest model algorithm (Breiman, 2001) with all possible predictors and computed the cost function (here, mean absolute error (MAE)) on validation data to identify the predictor(s) associated with the smallest MAE.In the first FFS step, we evaluated all possible predictor pairs (33,670 possible combinations) and selected the pair that resulted in the smallest MAE.In the second FFS step, we evaluated all possible single predictors (of 273), and selected the predictor that, when combined with the first pair, resulted in the smallest MAE.The second step was repeated 10 times to ensure the identification of a global MAE minima.More details are provided in Text S2 in Supporting Information S1 and predictors identified via FFS are visualized in Figure S4 in Supporting Information S1.FFS is suitable for spatial prediction tasks because it adds predictors when they reduce the cost function computed on held-out data.In contrast, recursive selection relies on importance rankings generated from the training data itself, which increases chances of overfitting (Meyer et al., 2018).

Cross Validation
After FFS, random forest models were re-trained using the final predictor set and their predictive performance was evaluated using leave one (cluster) out cross validation (cross validation, hereafter) (Meyer et al., 2018).During model training, a full hyperparameter grid-search was performed, which allowed for deeper, more complex trees (with a small minimum leaf node size).Model performance was evaluated by comparing predicted and observed CH 4 fluxes using the coefficient of determination (R 2 ), MAE normalized by flux standard deviation (nMAE), and bias, computed as the mean of residuals.Nash-Sutcliffe Efficiency (NSE) was also computed as an integrative measure of model performance.NSE is equal to R 2 when model bias is zero, and a NSE > 0 corresponds to a model performance better than simply taking the average of the data.Performance was evaluated with respect to three data set components: site mean flux, mean seasonal cycle (MSC) calculated as the average monthly anomaly from site mean, and interannual monthly anomalies from the MSC (Jung et al., 2020;Peltola et al., 2019;Tramontana et al., 2016).These components distinguish spatial prediction performance (annual site means) from monthly mean seasonal cycles (MSC), and interannual variability (weekly or monthly anomalies).

Final Model Ensemble
A final random forest model ensemble was trained to propagate uncertainties in training data to a gridded product.First, Monte Carlo simulations were used to create 1,000 simulated training datasets (each composed of CH 4 flux plus the final predictor variables) where each weekly observation was drawn from a normal (Gaussian) distribution, except CH 4 flux for which the measurement uncertainty is defined as the variance of a double-exponential (Laplace) distribution (Irvin et al., 2021;Knox et al., 2019).For gridded products (e.g., static WorldClim data), dispersion around the true observations (distribution mean) was parameterized as a standard deviation of a 0.25° bounding box around each extracted pixel, for unitless MODIS enhanced vegetation index (EVI) products with the overall measurement uncertainty (0.015), for CH 4 flux as the weekly mean uncertainty (incorporating both random and gap-filling uncertainties (Irvin et al., 2021;Knox et al., 2019), and for tower-measured air temperature as a conservative estimate of standard temperature sensor precision (0.5°C) (Campbell Scientific, Utah).Second, 500 datasets were bootstrap sampled with replacement from the 1,000 simulations, and in each sample one site cluster was dropped at random.The site US-OWC was also always excluded due to its exceptionally high fluxes which could not be reproduced accurately by the best model (Figure 3a) and could introduce unnecessary bias error at other sites.Training on each bootstrap data set resulted in a final 500-member random forest model ensemble.Full details on Monte Carlo simulations are provided in Table S4 in Supporting Information S1.

Model Forcing
We applied the 500-member final model ensemble to an 18-year (216-month) time series of global grids of final predictor variables covering 2001-2018 from which the mean and standard deviation of predictions at each pixel globally was used as mean CH 4 flux and data-driven uncertainty.The reconstruction period (2001-2018) aligns with the current Global Carbon Project CH 4 modeling protocol and the monthly timestep of the WAD2M wetland extent product (Saunois et al., 2020;Zhang et al., 2021), though weekly data were used to train the machine learning model.For MODIS data, Google Earth Engine (Gorelick et al., 2017) was first used to prepare global monthly grids at 10-km resolution, excluding low quality observations, aggregating from 8-day values to monthly averages using the average of all good 8-day observations within a month, and aggregating from 500-m resolution to 10-km resolution using the average of all good quality 500-m or 1000-m observations within the 10-km pixel.Gaps of time-series of MODIS images were filled using the same methods as for site-based MODIS time series.All global grids were then resampled to a common 0.25° resolution and cropped to exclude Antarctica.Data were reprojected to WGS-84 geographic coordinates.Monthly positive and negative lags were imposed on grid stacks by shifting the stack by whole-month time steps, and linear interpolation between these shifts was used to create weekly time shifts.For each temporal predictor, a mean seasonal cycle stack was created by averaging rasters for each month across all available years for use in the global dissimilarity and tower constituency analyses (see Sections 2.2.3 and 2.2.4).

Wetland Area Products
We weighted each grid cell CH 4 flux prediction by fractional grid cell wetland extent to estimate CH 4 emissions using the primary WAD2M (Zhang et al., 2021) product and an alternate Global Inundation Estimate from Multiple Satellites GIEMS version 2; Prigent et al., 2020) global wetland map.We used WAD2M as the primary map because it was also used in the GCP bottom-up model ensemble allowing for direct flux prediction comparisons (Saunois et al., 2020).However, global wetland area is the largest source of uncertainty in wetland CH 4 emissions along with flux rates (Bloom et al., 2017;Melton et al., 2013), and the two maps enable a preliminary illustration of the sensitivity of our predicted emissions to different wetland area products.A robust evaluation of wetland extent uncertainties on upscaled emission estimates would require a detailed intercomparison such as that of Melton et al. (2013).Both WAD2M and GIEMS-2 maps were modified with several correction data layers to represent the monthly area covered by vegetated wetlands, excluding open water and coastal wetlands (Text S3 in Supporting Information S1) (Pekel et al., 2016).The maps were generated based on distinct multi-sensor methodologies estimating monthly inundated wetland area, to which a set of tailored correction layers and steps were applied to isolate only vegetated wetland area, following the methodology of Zhang et al. (2021).

Global Applicability and Tower Constituency
Similar to the challenges faced in global wetland CH 4 prediction using BU process models upscaled model predictions were extrapolated across a much larger spatial (Stell et al., 2021) and temporal (Chu et al., 2017) domain than that captured in the training data.Model extrapolation is likely to reduce the accuracy of flux predictions and can distort uncertainty estimates (Stell et al., 2021).To measure extrapolation during CH 4 upscaling, we first computed point-based dissimilarity (Hoffman et al., 2013) globally, defined as the minimum Euclidean distance between each grid cell-to-flux tower pair in predictor space, normalized by the mean distance among flux towers (Meyer & Buchta, 2020;Meyer et al., 2018Meyer et al., , 2019)).We then used the dissimilarity map to define a monthly, global area of model applicability (AOA) and corresponding area of extrapolation, using a dissimilarity threshold (Meyer & Pebesma, 2022).Finally, we defined the global constituency of each site cluster to identify which training conditions dominate global predictions and further evaluate the plausibility of global extrapolations.Each pixel was assigned as a constituent of the site cluster that was closest in predictor space (Hargrove & Hoffman, 2004).We propose that combining constituencies with AOA provides a semi-quantitative and informative approach to evaluating global representativeness and extrapolation confidence in upscaling models.We demonstrate this approach by identifying the regions that are most like training conditions and which training conditions each region is most similar to.This approach allows us to diagnose CH 4 flux prediction patterns in extrapolations.

Model Predictors
Only the first six predictors from the FFS were used in the final upscaling model, as they accounted for ∼85% of the MAE reduction.These predictors were tower-measured air temperature, with and without a 2-week lag, MODIS EVI with a 3-week lag, mean temperature of the driest quarter, precipitation of the wettest month, and vegetation canopy height (Figure S4 in Supporting Information S1).It is notable that three of the final six predictors were temperature related.An additional five predictors not included in the final model extended the FFS to the MAE minimum (Table S5 in Supporting Information S1) and included MODIS snow cover, tower-measured GPP with a 2-week lead, and the annual minimum in MODIS EVI.The random forest variable importance rankings deviated slightly in order from FFS, with air temperature and the two static climatological predictors ranked as the most important final predictors, while canopy height and MODIS EVI were less important (Figure S5 in Supporting Information S1).A strong exponential dependency was observed between CH 4 flux and air temperature, likely explaining its dominance in the variable importance, while more complex and/or less tightly correlated dependencies were observed between CH 4 flux and the other predictors (Figure S6 in Supporting Information S1).Temperature hysteresis was not reproduced in the model in many sites where it has been observed in the CH 4 flux observations (Chang et al., 2021) (Figure S7 in Supporting Information S1).

Cross Validation Performance
Model residuals (errors) were normally distributed around zero at bogs, fens, swamps, and wet tundra sites (Figure S8 in Supporting Information S1).At freshwater marshes, residuals displayed more negative outliers due to one site (US-OWC) that displayed exceptionally high CH 4 fluxes (>10× higher than overall median) that the model did not reproduce.The fluxes at US-OWC are plausible because the site is situated in a eutrophic estuarine marsh on the southern shore of Lake Erie, USA, (Rey-Sanchez et al., 2018), which displays very high rates of sediment methanogenesis (Angle et al., 2017).However, evaluating the global scale emissions from eutrophic wetland is beyond the scope of this first wetland upscaling effort and therefore, hereafter, cross validation metrics are reported with the exclusion of the exceptionally high fluxes at site US-OWC.

Unweighted Global Methane Flux Predictions
Gridded freshwater wetland CH 4 flux predictions, before being weighted by wetland extent (Figure S9 in Supporting Information S1), were compared pixel-wise to the original 43 training and four additional test flux tower sites (Figure 4).Globally, the model achieved an R 2 of 0.53, NSE of 0.51, nMAE of 0.35, and Bias of −6.0 n mol m −2 s −1 .The slight improvement compared to cross-validation performance is expected as all the training data appears many times across the 500 bootstrap-sampled data sets models.Examples of gridded product predictions at training sites and their uncertainties are shown in Figure S10 in Supporting Information S1.Notably, the model also performed well at three of four additional test sites (Figure 4), reproducing the site mean with nMAE ranging from 0.39 to 0.49 at the boreal and temperate sites.However, as was observed in training evaluations, predictions at the tropical forest test site (PE-QFR) did not reproduce the seasonal signal and exhibited the largest nMAE (1.35).

Global Model Applicability and Tower Constituency
Confidence in gridded model predictions was also evaluated semi-quantitatively using global dissimilarity and tower constituency analyses.Global dissimilarity was low, even in areas geographically distant from existing towers (Figure S11 in Supporting Information S1), suggesting that the model was not forced to extrapolate far from training conditions, even when making global predictions.The most dissimilar regions were Eastern ).These semi-arid or arid tropical constituency assignments are reasonable on the basis of climate; however, it is important to note that the wetland sites in these environments where the aforementioned towers are located are associated with large inland deltas (i.e., the Okavango Delta, Botswana, or the Sacramento Delta, California), which provide water to support marsh and swamp wetlands.The final model did not include surface hydrological variables and therefore extrapolation to these regions may not be easily evaluated by the dissimilarity and constituency analysis employed here.No relationship was observed between model error or variance and site-month dissimilarity (Figure S13 in Supporting Information S1) that could be used to scale errors in regions of extrapolation (Jung et al., 2020).

Wetland Area-Weighted CH 4 Fluxes
Time series of mean freshwater wetland CH 4 fluxes from UpCH4 (2001UpCH4 ( -2018)), weighted by WAD2M pixel wetland area, displayed regional patterns that reflected the interaction of wetland area and the model's flux predictions (Figure 5a).The highest wetland area-weighted fluxes (>30 mg CH 4 m −2 d −1 ) were predicted in both high-and low-latitude regions with extensive wetland area (e.g., Hudson Bay Lowlands (HBL), Congo Basin) and in semi-arid regions where wetland cover is low but model flux predictions were very high (Figure S9 in Supporting Information S1).Relative uncertainties (Figure 5b) were the smallest for high-latitude high emission (>20 mgCH 4 m −2 d −1 ) hotspots associated with the HBL and West Siberian Lowlands (WSL) wetland complexes and were largest in monsoon regions of relatively low flux (<5 mgCH 4 m −2 d −1 ) sandwiched between the semiarid and humid tropics.
Upscaled (UpCH4) fluxes were compared to three alternative global wetland CH 4 emission datasets using either the same WAD2M wetland area (i.e., GCP BU ensemble; Figure 5c), or variable wetland products (i.e., GCP TD ensemble; Figure 5e).At high-latitude wetland complexes (HBL and WSL), UpCH4 predicted slightly lower emissions than the mean of the GCP BU ensemble, similar to the GCP TD ensemble, but higher than WetCHARTS.Given the same WAD2M wetland area was applied in upscaling as in the GCP BU ensemble, this difference can be attributed to lower predicted CH 4 fluxes by UpCH4.At mid-to-low latitudes, UpCH4 predicted 10-30 mgCH 4 m −2 d −1 higher fluxes than the other products from the semi-arid tropics, including the Sahel and Horn of Africa, central and western Australia, and western Asia, while also predicting 20-30 mgCH 4 m −2 d −1 lower fluxes from the humid tropics, including the large wetland complexes of the Amazon and Congo Basins, and SE Asia, especially in Indonesia and Malaysia.Again, as the GCP BU ensemble used the same wetland area, these product differences can be attributed to wetland CH 4 flux rates rather than extent.The regional pattern of tropical emissions in WetCHARTS was more similar to the GCP TD ensemble pattern than the GCP BU ensemble pattern, which differ as described in Figure 1.

Temporal Trends and Spatial Patterns in UpCH4 Emissions
Upscaling model (UpCH4) emissions using both WAD2M and GIEMS-2 wetland area products are seasonal, with a JJA peak that corresponds with the expansion of wetland area, the warmest soil temperatures, and peak productivity at northern high-latitudes, (Figure 6).As indicated by globally integrated fluxes, UpCH4-WAD2M emissions correspond well with the average of the GCP BU ensemble range, whereas UpCH4-GIEMS-2 emissions are significantly lower, although the amplitude of the seasonal cycle is larger.No clear long term annual trend is predicted by UpCH4 (or other models), though interannual variability is apparent, driven by wetland extent changes.
The latitudinal pattern of UpCH4 emissions using either wetland extent lacks the year-round elevated equatorial band found in the GCP products (Figure 7).The temporal pattern of this band varies slightly between the GCP BU and TD ensemble, which show a clear seasonal peak in AMJ.The UpCH4-WAD2M upscaling still produces a similar global total to the GCP BU ensemble from a much larger latitudinal range for moderate fluxes.Increased UpCH4 emissions are observed for narrow tropical N hemisphere (10-20°N) and subtropical S hemisphere (20-30°S) bands, likely related to high flux predictions in the semi-arid climate whereas GCP BU and TD ensembles show enhanced fluxes around 30°N and 10°N.

Model Performance
UpCH4 achieved comparable metrics based on our spatial cross-validation (global R 2 = 0.50; Northern site R 2 = 0.48) to a recent northern high-latitude upscaling of CH 4 fluxes from 25 eddy covariance towers (Peltola et al., 2019), despite a larger and more variable global flux data set.UpCH4 also achieved better performance than global upscaling models for net ecosystem exchange of CO 2 (R 2 < 0.5) (FLUXCOM; Tramontana et al., 2016).
As noted by Peltola et al. (2019), net CH 4 fluxes measured with eddy covariance may be difficult for machine learning models to reproduce with limited datasets as they display complex and non-linear behavior, and are subject to storage effects and lags (Knox et al., 2021;Sturtevant et al., 2016) due to underlying CH 4 production, oxidation, and transport processes (Bridgham et al., 2013;Chang et al., 2021) identified substantial hysteresis in the seasonal temperature dependency of wetland CH 4 flux and Delwiche et al. (2021) identified lags and leads of various lengths between peak growing season air temperature and peak CH 4 flux.Temperature lags may be due to the substrate control of CH 4 production and would agree with coherence with ecosystem production (Knox  , 2021;Mitra et al., 2020).In UpCH4, leads and lags of various lengths were imposed on all temporal predictors to address part of this complexity and two lagged predictors were selected in the final model (i.e., lagged temperature and EVI), indicating their utility when using non-temporal machine learning algorithms such as random forests.However, observed hysteresis between temperature and CH 4 flux was not reproduced (Figure S7 in Supporting Information S1), suggesting that simply imposing lags on predictors cannot capture the complex biogeochemical processes that drive intra-seasonal variability (Chang et al., 2021).Deep learning models able to learn temporal dependencies, such as Long Short Term Memory (LSTM) neural networks, and/or able to incorporate process knowledge as constraints, could be considered as the core algorithms for CH 4 upscaling; however, this would require adapting model architecture for spatio-temporal predictions (Reichstein et al., 2019).
Performance improvements may also be expected if wetland CH 4 fluxes measured using eddy covariance can be partitioned into ebullition, diffusion, and plant-mediated transport pathways as has been done to partition diffusion and ebullition in lakes and bogs (Iwata et al., 2018;Ueyama et al., 2022).Each of these transport  (2010)(2011)(2012)(2013)(2014)(2015)(2016)(2017).The UpCH4 mean and spread are difficult to distinguish from those of the GCP BU ensemble due to small differences between the two estimates.processes is regulated by distinct drivers and produces flux signals that may be more directly attributed to these processes when separated from each other.Improved models may also be possible if we can reconcile current wetland classifications with real wetland differences in the mean and/or variance of methane fluxes.

Budget Comparisons
Global freshwater wetland CH 4 emissions and overall uncertainties during the period 2001-2018 from UpCH4 were 146 ± 42.7 TgCH 4 y −1 using WAD2M wetland area (Figure 8a), which closely matches emissions from the GCP bottom-up model ensemble for 2007-2018 (mean 149 TgCH 4 y −1 ; range 102-182 TgCH 4 y −1 ) but is substantially lower (31-44 TgCH 4 y −1 ) than GCP top-down emissions (mean 181 TgCH 4 y −1 ; range 159-200 TgCH 4 y −1 ) (Saunois et al., 2020) and WETCHIMP emissions (190 ± 39 TgCH 4 y −1 ) (Melton et al., 2013).Notably, UpCH4-WAD2M global emissions were also similar to that of Nzotungicimpaye et al. (2021)  At the global scale, UpCH4 mean CH 4 emissions agreed more closely with GCP BU emissions for tropical (60S-30N), temperate (30-60 N), northern (>45 N), and Arctic (60-90 N) latitudinal ranges compared to GCP TD estimates, which were consistently higher, most notably in the tropics.However, UpCH4 uncertainty ranges overlapped with both TD and BU estimates and supported the previously reported observation that ∼68% of wetland CH 4 emissions originate from tropical wetlands (Saunois et al., 2020).Within major wetland complexes at northern latitudes (Figure 8b), UpCH4 agreed more closely with lower TD estimates for the WSL, and UpCH4 emissions for the HBL and Prairie Pothole Region were also lower than both BU and TD estimates.For tropical wetland regions, estimates diverged more significantly between UpCH4 and the GCP ensembles (Figure 8c) with larger uncertainties likely due to lack of data in tropics.Although UpCH4 agreed closely with BU estimates for the tropical latitude total, UpCH4 emissions were much (∼3x) higher for the semi-arid monsoon Sahel compared to either GCP ensemble product, and much lower emissions estimated for the humid tropical forested wetlands of the Amazon, Congo, and the Indonesian archipelago.

Interpreting Product Differences
Interpreting similarities and differences in spatial patterns between global estimates of wetland CH 4 emissions is challenging because they arise from differences in both wetland fluxes and wetland area (akin to Melton et al., 2013).However, some broad conclusions can be drawn.GCP BU ensemble comparisons are informative at subtropical and tropical latitudes because in this study the shared use of the WAD2M wetland area enables model differences to be fully attributed to flux processes.As UpCH4 global upscaling emissions agreed most closely with GCP BU ensemble, it is notable that tropical and subtropical regional patterns diverged substantially (Figure 5c).The semi-arid to humid tropics gradient is inverted in UpCH4 when compared with GCP and thus these models produce similar global totals but different regional distributions.Given that inversion and observational studies in the Amazon Basin indicate a very large CH 4 source (Devol et al., 1988;Gauci et al., 2022;Pangala et al., 2017), it is plausible that  the upscaling pattern is biased, and that semi-arid and humid tropical wetland emissions are over-and underestimated, respectively.However, observations of exceptional CH 4 emissions from seasonal wetlands in the hot semi-arid tropics (e.g., BR-Nxr and BR-Gum, the Okavango Delta wetlands in this study) underscore that more measurements could support more robust validation of these patterns.If semi-arid/monsoon wetland CH 4 sources are currently being underestimated in BU models, this missing source would significantly close the gap between current BU and TD global wetland CH 4 emissions estimates.

Data Limitations
The training data contained a geographic bias with many more temperate and boreal northern hemisphere sites than tropical and southern hemisphere sites.Only 9% of the training data (5 sites) were acquired from south of 23°N, despite the tropics accounting for the vast majority of the non-frozen global wetland area and an estimated two-thirds of global CH 4 emissions (Melton et al., 2013;Saunois et al., 2020).As a result, tropical CH 4 flux predictions were based on flux data from a few towers that are unlikely to be representative of the entire tropical region and, correspondingly, tropical prediction uncertainties from UpCH4 were high (Figure 5b).For instance, training data for semi-arid subtropical regions were dominated by site clusters in the Sacramento Delta, United States (US-Myb, US-Sne, US-Tw1, US-Tw4, US-Tw5), and the Okavango Delta, Botswana (BW-Gum, BW-Nxr), which together accounted for ∼20% of all training data.In both regions, minerotrophic deltaic wetlands are dependent on large seasonal or permanent allochthonous riverine inputs from regional-scale drainage basins, and thus sustain very productive marsh or swamp conditions conducive to high CH 4 fluxes that contrast sharply from the surrounding dryland environments (Hemes et al., 2019;Knox et al., 2015).The lack of wetland data from more varied hydrologic classes of wetlands (e.g., riverine, isolated or rainfed, i.e., ombrotrophic) under similar climate conditions, combined with the biases described above, may have led to the high predicted CH 4 fluxes for the semi-arid subtropics.
The CH 4 flux predictions and uncertainties (Figure 5), combined with the tower constituency and model applicability maps (Figures S10 and S11 in Supporting Information S1), provide the first global survey for situating new eddy covariance measurement sites based on both CH 4 flux and environmental information.Moreover, an expanded site constituency representativeness analysis confirmed that global wetland CH 4 upscaling will benefit most from additional tropical wetland flux data (Figure 9).The large humid tropical site constituencies (e.g., ID-Pag-a SE Asian peat swamp forest) imply very wide model extrapolation across the Amazon and Congo Basins, where there is currently no available CH 4 flux data and where fluxes could be quite different.Similarly, the largest CH 4 flux uncertainties were observed for transitional climate regions between the semi-arid and humid tropics (Figure 5) and establishing towers to bracket or traverse these regions could help capture important gradients in tropical wetland conditions relevant to CH 4 flux variability.Overall, the lack of observations for Constituencies and their upscaling-based emission estimates are likely to be insensitive to adding additional sites (more transparent colors in map (a)) when emissions differences are close to zero (short segments in (b)).In contrast, constituencies and emission estimates are likely to be more sensitive to additional sites (opaque colors in map (a)) where absolute emissions differences are large (long segments in (b)).The size of the segment end points is proportional to the annual constituency CH 4 flux.A full constituency map without variable color transparency is provided in Figure S12 in Supporting Information S1. large regions of the tropics combined with the distinctive hydrological regimes characterizing different tropical wetland regions (Apers et al., 2022;Dalmagro et al., 2018), likely account for the upscaling discrepancy with past and typically higher tropical emission estimates, such as for the Amazon Basin (Figure 6).Refined methods to evaluate CH 4 flux tower network representativeness along different dimensions of variability could result in improved estimates, as has been undertaken at regional scales (Malone et al., 2022;Villarreal & Vargas, 2021).Similarly, use of more finely resolved spatial forcing data can more accurately represent wetland conditions and may improve model functional responses (e.g., 30-m resolution in Bansal et al., 2023).
In addition to geographic bias at global scales, wetland flux towers are also likely biased with respect to average grid cell conditions.Wetlands are often a minority cover type at landscape scales meaning that training data based on grid cell averages alone, for example, MODIS vegetation products may not be representative of tower conditions (Chu et al., 2021).Applied to the case described above of minerotrophic swamp or marsh surrounded by drylands, scale-mismatch will likely result in erroneous wetland vegetation indices, such as greenness and phenology metrics, as well as derived products such as GPP, which many studies, including this study, indicate as important for predicting CH 4 fluxes(e.g., Bridgham et al., 2013;Chang et al., 2021;Delwiche et al., 2021;Knox et al., 2019;Whiting & Chanton, 1993).Spatial biases at global and landscape scales could be addressed in future work by improving the geographic and grid cell representativeness of CH 4 flux and predictor data, respectively.Developing methods to reconcile and integrate chamber flux data with tower flux data could be prioritized to gain information from the large amount of existing chamber data, such as in Bansal et al. (2023), Kuhn et al. (2021), andTuretsky et al. (2014).Chamber methods offer a relatively inexpensive and accessible means to gather data in underrepresented regions (Harriss & Matson, 2009), may provide insights into patch-scale effects when paired with tower data via environmental response functions methods (Xu et al., 2017), and can extend the temporal representativeness of flux data (Chu et al., 2017).

Conclusions
We develop a wetland CH 4 flux upscaling workflow (UpCH4) for eddy covariance flux data and evaluate global CH 4 emission predictions.Extratropical estimates from UpCH4 can provide insights from comparisons to existing CH 4 model predictions.The use of UpCH4 tropical wetland emissions estimates should include consideration of uncertainties and joint use of additional regional data constraints is strongly encouraged.UpCH4 estimates average annual freshwater wetland CH 4 emissions of 146 ± 42.7 TgCH 4 y −1 for 2001-2018 which aligns closely with the most recent GCP BU estimates (149 TgCH 4 y −1 ) (Saunois et al., 2020) and a hybrid study that constrained a large ensemble of process models with satellite data (148 TgCH 4 y −1 ) (Ma et al., 2021).UpCH4 emission uncertainties were larger than, and overlapped with, both GCP BU and TD estimates, and the sensitivity to wetland extent products is illustrated by the halving of the global emissions total (71.8 ± 22.6 TgCH 4 y −1 ) when using GIEMS-2 corrected for only vegetated wetland area.All gridded emissions products are available via ORNL DAAC (https://doi.org/10.3334/ORNLDAAC/2253). UpCH4 is most suitable for comparison to other bottom-up and top-down models within temperate, boreal, and arctic climate zones from 2010 onwards, and will improve in tropical regions as EC data coverage is expanded over time.and validation are available via Zenodo (https://doi.org/10.5281/zenodo.7978099).The Global Carbon Project methane budget is available at https://doi.org/10.18160/GCP-CH4-2019.

Figure 1 .
Figure 1.(a) Large regional discrepancies exist between (b) bottom-up (BU) and (c) top-down (TD) model estimates of wetland CH 4 emissions, in addition to different global totals.Mean daily natural wetland CH 4 emissions for 2010-2017 estimated from a 17-member TD ensemble are subtracted from the mean from a 13-member BU process model ensemble (Saunois et al., 2020).High northern latitude bounding boxes correspond to locations of Hudson Bay Lowland (left) and West Siberian Lowland (right) wetland complexes.

Figure 2 .
Figure 2. Location, class, and size of 26 globally distributed freshwater wetland clusters.Symbol sizes reflect the number of weekly CH 4 fluxes in each cluster expressed as a percent of the total data set considered in this study.Clusters combine data from sites that occur within 300 km of each other.At least one site was available from each major climate zone (Arctic-boreal, temperate, and tropical) and all major wetland classes were represented.Mean annual maximum wetland area fraction over 2000-2017 is shown from the Wetland Area Dynamics for Methane Modeling (WAD2M) product (Zhang et al., 2021).

Figure 3 .
Figure 3. Random forest model predicted versus observed values for(a-d) the mean seasonal cycle (MSC) of methane (CH 4 ) flux for sites in (a) tundra, (b) boreal, (c) temperate, and (d) tropical climate regions and (e) the overall sites mean CH 4 flux, during cross validation.Although the US-OWC site is plotted in (a), it is excluded from calculation of Nash-Sutcliffe Efficiency (NSE), Coefficient of Determination (R 2 ), and mean absolute error (MAE; nmol m −2 s −1 ) performance metrics, and sample count (n).The 1:1 fit is shown as a dashed black line.

Figure 4 .
Figure 4.The mean seasonal cycle of model-predicted CH 4 flux (solid black line) and uncertainty range (gray ribbon) compared against monthly observed mean fluxes (open circles) and (a-c) standard deviation or (d) 25th-75th percentiles (vertical bars) for four AmeriFlux test sites.The model reproduced mean fluxes and the seasonal cycle best at (a) a boreal fen (CA-CF2; Tenuta, 2020), (b) a temperate fen (US-ALQ; Olson, 2018), and (c) a boreal bog site (US-MBP; Roman et al., 2021), whereas seasonal cycle performance was not reproduced at (d) one humid tropical forest site (PE-QFR; Romanet al., 2020), which is an upland site that experiences very wet soil conditions and supports substantial but highly variable CH 4 fluxes.

Figure 5 .
Figure 5. Global maps of: (a) Upscaled (UpCH4) mean 2001-2018 CH 4 flux using WAD2M wetland area (b) CH 4 flux uncertainty computed as 1 standard deviation of the random forest ensemble expressed as a percent of the mean (i.e., coefficient of variation) 2001-2018 flux; (c, e) the mean 2001-2018 CH 4 flux from the Global Carbon Project (GCP) bottom-up/top-down process model ensemble (also using WAD2M) subtracted from the UpCH4 mean (a); (d, f) the correlation, expressed as the correlation coefficient of determination (R 2 ), between the mean seasonal cycle (MSC) of UpCH4 and the GCP bottom-up/top-down ensemble.

Figure 9 .
Figure 9. (a) Global wetland CH 4 flux site-cluster constituencies for which pixels are assigned to one of 26 site-clusters (colors) based on similarity to UpCH4 predictor conditions at the sites within that cluster.(b) The ranked difference in predicted CH 4 emissions (TgCH 4 y −1 ) between UpCH4 predictions for a given constituency (colors) and a simple extrapolation of the monthly mean flux for the site cluster to the entire constituency (ignoring flux data from other sites).Constituencies and their upscaling-based emission estimates are likely to be insensitive to adding additional sites (more transparent colors in map (a)) when emissions differences are close to zero (short segments in (b)).In contrast, constituencies and emission estimates are likely to be more sensitive to additional sites (opaque colors in map (a)) where absolute emissions differences are large (long segments in (b)).The size of the segment end points is proportional to the annual constituency CH 4 flux.A full constituency map without variable color transparency is provided in FigureS12in Supporting Information S1.

Table 1
Natural and Anthropogenic Methane Emissions by Source Sector for the Decade 2008-2017 From Global Carbon Project Top-Down (TD) and Bottom-Up (BU) Model Ensembles Table 2) that had not yet developed vegetation cover.One site (DE-Hte) is coastal but was freshwater dominated during the observation period.The final eddy covariance tower data set consisted of 43 freshwater wetland sites covering

Table 2 Cluster
). High northern latitude bounding boxes correspond to locations of Hudson Bay Lowland (left) and West Siberian Lowland (right) wetland complexes.
Assignments, Number of Weeks and Percent of Total Data Set, and Wetland Site ID, Location, Class, Climate, and Digital Object Identifiers (DOI) for the 43 Wetlands Included in Upscaling 2576604x, 2023, 5, Downloaded from https://agupubs.onlinelibrary.wiley.com/doi/10.1029/2023AV000956by UNIVERSITY OF MINNESOTA 170 WILSON LIBRARY, Wiley Online Library on [06/09/2023].See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions)on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License Ma et al. (2021)1 ) who implemented WETMETH-a CH 4 process model, within the UVic Earth System Climate Model, andMa et al. (2021)(148 TgCH 4 y −1 ) who used satellite-based observations to refine estimates from 42 BU process models.Using GIEMS-2 corrected for only vegetated wetland area, rather than WAD2M, cut UpCH4 global emissions by more than half (71.8 ± 22.6 TgCH 4 y −1 ), highlighting the sensitivity of total emissions to the wetland map used (Figures S14-S16 in Supporting Information S1).