A Closer Look at the Effects of Lake Area, Aquatic Vegetation, and Double‐Counted Wetlands on Pan‐Arctic Lake Methane Emissions Estimates

Lake methane emissions are commonly upscaled from lake area, with recognition that smaller, non-inventoried lakes emit more per unit area. There is also growing awareness of the importance of lake aquatic vegetation and potential “double-counting” with wetlands, but no consensus on which factor is most impactful. Here, we combine high-resolution data with the comprehensive lake inventory HydroLAKES to rank these three variables based on emissions sensitivity. The effect of including non-inventoried small lakes <0.1 km² (+30 [range: 9.0 to 82]% change) is greatest, followed by double-counting (−20 [−11 to −34]%) and lake aquatic vegetation (+14 [2.7 to 43]%). Significantly, emissions from non-inventoried lakes contribute far less than the ∼40% previously determined globally through statistical area extrapolation. We produce a first pan-Arctic estimate of lake aquatic vegetation in 1.37 million km² of lakes, but after correcting for persistent double-counting, its net effect is to decrease emissions estimates by 9%. Thus, previous global emissions estimates are likely too high.

• High-resolution lake maps are scaled to the pan-Arctic, showing small lakes <0.1 km² comprise less area (12% of total) than thought
• Double counting between lake aquatic vegetation and wetlands is still prevalent (11% of lake area) and occurs in lakes of all sizes
• Previous lake methane emissions estimates are likely biased high, and uncertainty can best be reduced by improving lake map resolution

Supporting Information:
Supporting Information may be found in the online version of this article. 10.1029/2023GL104825

… upscaled methane estimates. Current global lake inventories include HydroLAKES (Messager et al., 2016, 0.1 km²), derived from national inventories supplemented by coarse-scale remote sensing; and GLOWABO (Verpoorter et al., 2014, 0.002 km²), derived from resolution-enhanced satellite remote sensing. The absence of small lakes and ponds (hereafter: lakes) in such inventories is a known problem (Downing, 2008; Holgerson & Raymond, 2016). Holgerson and Raymond (2016) extrapolated GLOWABO using log-log linear regression and the Pareto distribution to estimate that very small ponds (<0.001 km² in area) comprise 8.6 [range: 5.9-11.2]% of global lake area and contribute a striking 40.6 [0-68.8]% of global lake methane emissions. Using similar extrapolation methods and a larger emissions data set, Rosentreter et al. (2021) report a 37% emissions contribution from this very small size class. However, parametric extrapolations of small lake size classes show regional variations (Kyzivat et al., 2019b; Muster et al., 2019) and are poorly validated, and the Pareto distribution can markedly overestimate the abundance of small water bodies (Seekell & Pace, 2011).
A second factor gaining attention is the role of aquatic vegetation in lake methane emissions (Bastviken et al., 2023; Bodmer et al., 2021; Kyzivat et al., 2022). Aquatic vegetation and lake littoral zones have long been recognized as "hot spots" of methane production and emission (Colmer, 2003; Hutchinson, 1957; Juutinen et al., 2003), but to our knowledge have not yet been used as upscaling variables, in part due to a lack of comprehensive mapping (Melack & Hess, 2023). Lake aquatic vegetation has received much less attention than lake area in upscaling studies, but is generally understood to exert a net positive contribution to methane emissions (Bastviken et al., 2023; Bodmer et al., 2021).
A third, chronic problem with published lake inventories, statistical extrapolations, and remotely sensed lake products is inclusion of open water bodies otherwise classified as wetlands in other data products, leading to "double-counting" of the same area in merged total aquatic estimates (Saunois et al., 2020). Double-counting is thought to be particularly common between non-inventoried lakes and coarse pixels in Arctic wetland maps (Thornton et al., 2016) and may contribute to a large observed discrepancy between bottom-up (statistically upscaled) and top-down (inverse atmospheric modeled) methane budgets (Saunois et al., 2020). Since wetlands and ponds are high methane emitters (Rosentreter et al., 2021), even small overlaps in area may translate to significant overestimation of lake methane emissions, yet the exact degree of such overlap has not previously been quantified at the pan-Arctic scale.
Here, we examine the relative importance of non-inventoried small lakes, aquatic vegetation, and double-counting in upscaled Arctic lake methane emission estimates. First, we obtain high-resolution Arctic lake/pond maps comprising 626,000 water bodies >0.0001 km². Next, we scale these maps with HydroLAKES to estimate the relative area contribution from lakes <0.1 km². We provide a simple estimate for corresponding methane emissions based on individual lake area and temperature. Then, we estimate lake aquatic vegetation coverage and quantify double-counting, and discuss how these factors change our pan-Arctic estimate. We find that area contributions from non-inventoried small lakes <0.1 km² are significant (12 [range: 3.9-31]%), but very small ponds <0.001 km² only contribute 0.5 [0.0-1.4]% of area and 2.6 [0.1-6.8]% of emissions, far less than previously thought. We conclude by comparing the effects of the three selected variables on pan-Arctic lake methane emissions and giving recommendations for future research.

Estimation of Small Water Bodies
For pan-Arctic lake mapping we use HydroLAKES (Messager et al., 2016), which is often used for lake methane upscaling studies both globally (Johnson et al., 2022; Zhuang et al., 2023) and in the Arctic (Matthews et al., 2020). While the GLOWABO (Verpoorter et al., 2014) data set contains smaller lakes (>0.002 km²), previous studies note its anomalously large total area (21%-66%) relative to others (DelSontro et al., 2018; Johnson et al., 2022; Matthews et al., 2020). Based on our own examination of this product, incomplete river masking likely contributes to this anomaly. Our analysis uses the study domain (Figure S1 in Supporting Information S1) and grid of the Boreal-Arctic Wetland and Lake Methane Dataset (BAWLD; Olefeldt et al., 2021a, 2021b).
While suitable for small lake extrapolation, these data sets under-represent large lakes, due to their small, sometimes airborne-derived, extents. The Water Body Data set for the North American High Latitudes (Feng & Sui, 2020; Sui et al., 2022; WBD-NAHL, hereafter WBD), which inventories lakes ≥0.001 km² over all tundra and boreal forest ecoregions in North America using Sentinel-2 satellite observations, was used for validation. These data sets were primarily derived from optical remote sensing and exclude vegetated portions of lakes. In total, some 97,000 km² of HR airborne and satellite images were compiled, including lakes >0.0001 km² within 75 diverse regions around the pan-Arctic.
Small lake areas were extrapolated from HydroLAKES, based on the HR lake area distribution. HydroLAKES was only used for lakes ≥0.5 km², assuming some under-representation near its 0.1 km² resolution limit. This threshold was determined by sensitivity test (Figure S2 in Supporting Information S1).
Extrapolation entailed binning HR lake area into 100 logarithmically-spaced bins between 0.0001 and 0.5 km² (Figure 1, dashed line). The HR bins were normalized by comparing the total area of lakes sized 0.5-5 km² to HydroLAKES to ensure a smooth transition. The upper threshold of this bin was chosen to exclude the sparse larger lakes under-represented in the HR data sets (there are only ∼21,000 in the pan-Arctic). This method, which we consider empirical, not parametric, estimates binned areas and not individual lake counts. Two HR regions with extremely high (Surgut, Siberia) and low (Tuktoyaktuk Peninsula, Yukon) small lake fractions were used to compute upper and lower confidence intervals by taking their distributions instead of the merged HR distributions (Figure 1, gray envelope), thus representing the most extreme distributions observed. This approach was validated over the entire WBD data set, with excellent agreement in distribution, that is, +3.3% difference in extrapolated lake area (Figure 1).
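The binning and normalization steps can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the archived toolbox: the function name `extrapolate_small_lake_area`, its arguments, and the use of a single multiplicative scale factor for the normalization are hypothetical choices made here for clarity.

```python
import numpy as np

def extrapolate_small_lake_area(hr_areas_km2, inventory_areas_km2,
                                lo=1e-4, cutoff=0.5, overlap_hi=5.0, n_bins=100):
    """Scale a binned high-resolution (HR) lake-area distribution to an inventory.

    Returns the bin edges and the estimated total lake area (km^2) per bin
    for lakes smaller than `cutoff`. Areas, not lake counts, are estimated.
    """
    hr = np.asarray(hr_areas_km2, dtype=float)
    inv = np.asarray(inventory_areas_km2, dtype=float)

    # Logarithmically spaced bins between `lo` and `cutoff` (0.0001-0.5 km^2).
    edges = np.logspace(np.log10(lo), np.log10(cutoff), n_bins + 1)

    # Sum lake area within each bin by weighting the histogram with area.
    small = hr[hr < cutoff]
    binned_area, _ = np.histogram(small, bins=edges, weights=small)

    # Normalize by matching total area in the overlap size range (0.5-5 km^2).
    # Assumption: one multiplicative factor is applied to every bin.
    hr_overlap = hr[(hr >= cutoff) & (hr < overlap_hi)].sum()
    inv_overlap = inv[(inv >= cutoff) & (inv < overlap_hi)].sum()
    if hr_overlap == 0:
        raise ValueError("HR data contain no lakes in the overlap size range")
    scale = inv_overlap / hr_overlap

    return edges, binned_area * scale
```

Upper and lower bounds could be obtained the same way by passing the lake areas of a single extreme region as `hr_areas_km2` instead of the merged HR data.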

Lake Aquatic Vegetation and Double-Counting
Aquatic vegetation coverage is poorly constrained in Arctic lakes. We provide a first estimate of its coverage using the 30 m resolution Global Surface Water (GSW) data set (Pekel et al., 2016), which derives water occurrence frequency from 30 years of Landsat satellite imagery. The optical wavelengths of the Landsat spectrometer prohibit sensing of water under aquatic vegetation (Kasischke et al., 1997), so pixels with fluctuating inundation likely contain littoral wetland vegetation (Vanderhoof et al., 2023). To avoid double-counting of lakes and wetlands, the major global wetlands data set WAD2M (Zhang et al., 2021) and Global Carbon Project methane budget (Saunois et al., 2020) both use a 50% GSW occurrence threshold to mask out permanent surface water (assumed lakes), which we adopt as an indicator of potential wetland double-counting.
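As a rough illustration of the threshold test, the share of a lake's area flagged as potentially double-counted can be computed from the occurrence values of the pixels inside its polygon. The function name, the NaN convention for no-data pixels, and the strict "less than" comparison are assumptions made for this sketch; the extraction of pixels from the GSW raster is not shown.

```python
import numpy as np

def double_counted_fraction(occurrence_pct, threshold=50.0):
    """Fraction of a lake's valid pixels with water occurrence below `threshold`.

    `occurrence_pct` holds water occurrence (0-100 %) for the pixels inside
    one lake polygon. NaN marks no-data pixels (an assumed convention).
    """
    occ = np.asarray(occurrence_pct, dtype=float)
    valid = ~np.isnan(occ)
    if not valid.any():
        # No occurrence data for this lake: report zero rather than guess.
        return 0.0
    # Pixels below the threshold would not be masked out as permanent water
    # by a wetland product, so they may also appear in wetland maps.
    return float(np.mean(occ[valid] < threshold))
```

For example, a lake with pixel occurrences of 100, 90, 40, and 10% would have half of its area flagged.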
The probability of littoral wetland vegetation based on water occurrence was assessed from a reference HR airborne synthetic aperture radar (SAR)-derived data set of lake emergent vegetation (LEV, Kyzivat et al., 2022). Emergent vegetation, defined as protruding from the water surface, is readily detected with radar, but only comprises a portion of lake aquatic vegetation, which includes submerged and floating-leafed vegetation. To estimate lake aquatic vegetation coverage, the distribution of occurrence values was obtained for LEV pixels in each of the four regions analyzed in Kyzivat et al. (2022). Bayes' Theorem was used to convert this distribution to a prediction across HydroLAKES based on underlying water occurrence values (Text S1 in Supporting Information S1).
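One way to picture the Bayesian step is with occurrence histograms: for each occurrence bin, P(vegetation | occurrence) = P(occurrence | vegetation) × P(vegetation) / P(occurrence). The sketch below is an assumption-laden illustration, not the procedure of Text S1: the bin count, function names, and the averaging of per-pixel probabilities over a lake are hypothetical, and the vegetated reference pixels are assumed to be a subset of all reference lake pixels.

```python
import numpy as np

def vegetation_probability(occ_veg, occ_all, n_bins=10):
    """P(vegetation | occurrence bin) from reference pixels via Bayes' theorem.

    occ_veg: occurrence (%) of reference pixels mapped as vegetation.
    occ_all: occurrence (%) of all reference lake pixels (includes occ_veg).
    """
    occ_veg = np.asarray(occ_veg, dtype=float)
    occ_all = np.asarray(occ_all, dtype=float)
    if occ_all.size == 0:
        raise ValueError("no reference pixels supplied")
    edges = np.linspace(0.0, 100.0, n_bins + 1)
    if occ_veg.size == 0:
        return edges, np.zeros(n_bins)

    n_veg, _ = np.histogram(occ_veg, bins=edges)
    n_all, _ = np.histogram(occ_all, bins=edges)
    prior = occ_veg.size / occ_all.size      # P(vegetation)
    likelihood = n_veg / occ_veg.size        # P(occurrence bin | vegetation)
    evidence = n_all / occ_all.size          # P(occurrence bin)
    with np.errstate(divide="ignore", invalid="ignore"):
        posterior = np.where(evidence > 0, likelihood * prior / evidence, 0.0)
    return edges, posterior

def lake_vegetation_fraction(occ_lake, edges, posterior):
    """Mean predicted vegetation probability over one lake's pixels."""
    occ = np.asarray(occ_lake, dtype=float)
    idx = np.digitize(occ, edges[1:-1])      # bin index 0..n_bins-1 per pixel
    return float(posterior[idx].mean())
```

Fitting the posterior separately on the lowest- and highest-coverage reference regions would give a corresponding pair of bounding predictions.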
Similar to the area extrapolations, confidence intervals were derived from reference regions (Kyzivat et al., 2022) having the lowest (1.0%) and highest (59%) LEV coverage, that is, the Canadian Shield (Daring Lake) and Peace-Athabasca Delta, respectively, which represent extreme end-members of the Arctic landscape. Lake aquatic vegetation and double-counting in extrapolated area bins were simply taken as the mean LEV fraction within that size bin based on reference data, a rough approximation only required for a small portion (∼12%) of total lake area. Estimated vegetation in lakes with no corresponding occurrence values, due either to misregistration or no data, was set to 0. Observations for 98 lakes drawn from all regions were held out for validation, yielding r² of 0.60, bias of +4.0 points, and RMSE of 15% coverage against fractional LEV values (Figure S7 in Supporting Information S1). The high bias is attributed to including data from the extremely wetland-rich Peace-Athabasca Delta in the estimator. Given that the GSW data set is only available up to 78°N and that LEV is a subset of all lake aquatic vegetation (including submerged plants), our prediction of lake aquatic vegetation is conservative and uncertainty is dominated by variation between reference regions.
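The three validation statistics can be computed from paired predicted and observed coverage values as sketched below. The function name is hypothetical, and taking r² as the squared Pearson correlation is an assumption; a coefficient of determination about the 1:1 line would give a different value.

```python
import numpy as np

def validation_metrics(predicted_pct, observed_pct):
    """Bias, RMSE, and squared Pearson correlation for held-out lakes.

    Inputs are fractional vegetation coverage in percent, one pair per lake.
    """
    p = np.asarray(predicted_pct, dtype=float)
    o = np.asarray(observed_pct, dtype=float)
    diff = p - o
    return {
        "bias": float(diff.mean()),                   # mean error, percentage points
        "rmse": float(np.sqrt(np.mean(diff ** 2))),   # root-mean-square error
        "r2": float(np.corrcoef(p, o)[0, 1] ** 2),    # squared Pearson r (assumed)
    }
```

A positive bias, as reported here, means the estimator predicts more vegetation on average than the reference maps show.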

Reference Methane Emission Calculations
To explore the sensitivity of upscaled methane emission estimates to inclusion of small lakes, aquatic vegetation, and double-counting, a baseline methane emission must be assigned to every lake. Lake area and temperature were chosen as easily obtainable upscaling variables, with areas either provided in HydroLAKES or obtained from the extrapolation estimate. Candidate water and land temperature variables were obtained from ERA5 reanalysis (Hersbach et al., 2020) for the calendar year 2022. Equations for lake area and temperature were tested using multiple linear regression against open-water (diffusive plus ebullitive) methane emissions from the comprehensive BAWLD-CH4 data set (Kuhn et al., 2021a; Kuhn et al., 2021b; Text S2 in Supporting Information S1).
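A multiple linear regression of this kind can be sketched with ordinary least squares. The functional form used below, log10(flux) = b0 + b1·log10(area) + b2·temperature, is an assumption chosen for illustration and is not the fitted equation of this study; the function name and arguments are likewise hypothetical.

```python
import numpy as np

def fit_flux_model(area_km2, temp_k, flux):
    """Least-squares fit of log10(flux) = b0 + b1*log10(area) + b2*temp.

    Returns the coefficients (b0, b1, b2) and the R^2 of the fit in log space.
    Assumed form for illustration only.
    """
    area = np.asarray(area_km2, dtype=float)
    temp = np.asarray(temp_k, dtype=float)
    y = np.log10(np.asarray(flux, dtype=float))

    # Design matrix: intercept, log-transformed area, temperature.
    X = np.column_stack([np.ones_like(area), np.log10(area), temp])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)

    resid = y - X @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return coef, float(r2)
```

Candidate temperature variables could be compared by refitting with each one in turn and ranking the resulting R² values.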
An alternative upscaling approach is to scale by lake aquatic vegetation coverage. Estimates for relative emissions from lake aquatic vegetation were obtained from two studies synthesizing flux chamber measurements (which integrate diffusion, plant flux, and occasionally ebullition). Kyzivat et al. (2022) report 6.1x greater emissions from global LEV than open water from the same lakes over all pathways. BAWLD-CH4 reports median open water emissions of 28.9 mg CH₄/m²/day, compared to 106 mg CH₄/m²/day from marsh sites (the category that includes littoral wetlands), giving a factor of 3.7x. In an effort to be conservative, and given its matching spatial domain, this smaller 3.7x factor from BAWLD was used to upscale reference emissions by lake aquatic vegetation coverage for all 265,701 HydroLAKES polygons and for the 100 extrapolated area bins examined in this sensitivity study.
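The 3.7x factor follows directly from the two BAWLD-CH4 medians (106 / 28.9 ≈ 3.67). How the factor enters a per-lake flux is sketched below as a simple area-weighted mixture of open-water and vegetated zones; that mixture formula and the function names are assumptions for illustration and are not expected to reproduce the percentages reported in this study.

```python
OPEN_WATER_MEDIAN = 28.9   # mg CH4 / m^2 / day, BAWLD-CH4 open water
MARSH_MEDIAN = 106.0       # mg CH4 / m^2 / day, BAWLD-CH4 marsh sites

def vegetation_ratio(marsh=MARSH_MEDIAN, open_water=OPEN_WATER_MEDIAN):
    """Ratio of vegetated-zone to open-water median flux."""
    return marsh / open_water

def vegetation_adjusted_flux(open_water_flux, veg_fraction, ratio):
    """Area-weighted mean flux for a lake that is partly vegetated.

    Assumes the vegetated fraction emits at `ratio` times the open-water
    flux and the remainder emits at the open-water flux.
    """
    if not 0.0 <= veg_fraction <= 1.0:
        raise ValueError("veg_fraction must lie between 0 and 1")
    return open_water_flux * ((1.0 - veg_fraction) + ratio * veg_fraction)
```

Under this mixture, a lake with 10% vegetation cover and an open-water flux of 10 mg CH₄/m²/day would average 12.7 mg CH₄/m²/day at a ratio of 3.7.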
A first pan-Arctic estimate of lake aquatic vegetation covers some 108,000 [20,000-325,000] km², comprising 7.8 [1.5-24]% of lake area (Figure 3b) and the equivalent of 3.3% of BAWLD wetlands. Figure 3a is normalized by total lake area, thus highlighting regions with high-vegetation lakes, which do not always coincide with traditionally lake-rich regions like the Canadian Shield. This map offers a first data product to enable pan-Arctic upscaling of aquatic vegetation contributions to lake methane emissions.
… 1). Figures S8 and S9 in Supporting Information S1 illustrate such double-counting and show how coarse vector maps like HydroLAKES can overestimate inventoried lake area, due to an artificially convex shape caused by lack of polygon vertices. This double-counting in conjunction with potential over-estimation of inventoried lake areas suggests even HydroLAKES, the leading global lake inventory, needs to be modified before using it to upscale methane emissions.
The best-performing regression equation for open-water emissions used ERA5-model soil temperature from 1 to 7 cm, outperforming reported in situ lake water temperatures, likely because methane is produced in lake sediments, whose temperatures may better resemble soil than the water surface.
The following equation (R² = 0.221, p < 0.001) was used to obtain baseline estimates of methane emissions for the 1.37 million km² of lakes in this study: … where F is open-water emissions (mg CH₄/m²/day), A is lake area (km²), and T is modeled soil temperature (K). The total estimated emissions for this baseline scenario are 2.4 Tg CH₄/yr.

Discussion
Our empirical, satellite-based analysis finds that very small ponds <0.001 km² comprise only 0.5 [0.0-1.4]% of pan-Arctic lake area and 2.6 [0.1-6.8]% of lake methane emissions, closely matching the findings of Ludwig et al. (2023) in Arctic delta lakes (1.1% of area and 2.3% of emissions). In contrast, global studies using parametric extrapolations find much greater emissions from very small ponds. Holgerson and Raymond (2016), for example, estimate 8.6 [5.9-11.2]% of area and 40.6 [0-68.8]% of diffusive emissions from ponds <0.001 km². Similarly, Rosentreter et al. (2021) estimate 37% of open-water (diffusive plus ebullitive) emissions from this size class. Although our study examines the pan-Arctic, this lake-dense region includes 78% of HydroLAKES by count and 40% by area, so differences in geographic domain cannot explain the entire discrepancy. Some likely explanations for our surprising results are: (a) a weaker relationship between area and open-water emissions present in the newer BAWLD-CH4 emissions data sets relative to earlier concentration data sets; (b) previous use of the GLOWABO surface water inventory, which has recently been shown to be biased high, particularly in the 0.1-1 km² size bin (Johnson et al., 2022); and (c) derivation of upper and lower bounds from the Pareto distribution, which can overestimate small lake abundance by orders of magnitude (Seekell & Pace, 2011). A downward revision of total global lake area based on our pan-Arctic study alone would reduce these two global methane estimates by ∼40%.
Use of the Pareto distribution, or indeed any other parametric extrapolation, to estimate the abundance of small lakes may no longer be necessary, given the growing availability of remotely sensed HR lake inventories. Nearly all heavy-tailed distributions fit power laws in their tails, for example, the right portions of the plots in Figures 1 and 2 (Alstott et al., 2014). This power-law behavior occurs for lakes >∼0.5 km² regardless of region (Kyzivat & Smith, 2023; Kyzivat et al., 2019b), signifying that HydroLAKES can barely inform a power-law extrapolation. The approach presented here, a hybrid of direct observation and statistical extrapolation, fits data empirically, regardless of the onset of power-law behavior. Furthermore, classic lake-size distribution studies emphasizing the importance of small lakes (Downing, 2008; Downing et al., 2006; Hanson et al., 2007) have focused on their abundance, not total area, which tells the opposite story (Table 1). For methane upscaling, lake surface area, not abundance, is key, and parametric extrapolations from coarse-resolution inventories overestimate both.
As supported by recent reviews (Bastviken et al., 2023; Melack & Hess, 2023), aquatic vegetation contributes a significant proportion of lake emissions and area, provided it is not already double-counted as wetlands in other data products. Our previous work (Kyzivat et al., 2022) examined lakes <∼10 km² in four Arctic regions and found lake emergent vegetation (LEV) comprises 1.0%-59% of lake area. Its net effect after correcting for double-counting was to add +21% to upscaled emissions estimates derived from unvegetated zone reference data. The present study, despite a less direct method for mapping aquatic vegetation, benefits from pan-Arctic observations with better-constrained regional variability and double-counting, and no small lake sampling bias. Our present estimate of 7.8 [1.5-24]% lake aquatic vegetation adding 14 [2.7-43]% to emissions is most likely biased high, given the equally-weighted contribution from the extremely wetland-rich Peace-Athabasca Delta. In sum, vegetated zone emissions, though still highly uncertain, appear to be comparable in magnitude to the total open-water emissions from non-inventoried small lakes, before correcting for double-counting.
Thornton et al. (2016) introduce Arctic double-counting as a scale mismatch problem due to non-inventoried lakes being conflated with coarsely gridded wetlands. We find that most double-counted area (Table 1, Figure S3 in Supporting Information S1) comes from small inventoried lakes (0.1-10 km²), due to their abundance and potential for extensive littoral zones. Correcting for double-counting alone by adjusting total lake area would decrease emissions by 20%. Above all, the amount of area in question (∼11% of lake area) is minor compared to differences in global lake area estimates, which vary by ∼60% (DelSontro et al., 2018; Johnson et al., 2022).
The double-counting problem can be solved by defining lakes in such a way as to completely avoid littoral wetlands (Bastviken et al., 2023; Olefeldt et al., 2021b) or by using the same lake inventory both for bottom-up lake estimates and for subtracting from wetland maps. As a simple correction for HydroLAKES, if we define lake aquatic vegetation to be mutually exclusive from double-counted areas, it comprises 6.3% of area, in addition to the 11% double-counting. Consequently, its net effect is to decrease estimated lake emissions by 9%, since increases caused by high-emitting littoral wetlands not considered in most upscaling frameworks are more than offset by correcting double-counting. This modest number explains why stricter area accounting measures implemented by Saunois et al. (2020) decreased the global wetlands estimate by only 36 Tg CH₄, leaving a remaining 154 Tg CH₄ discrepancy between bottom-up and top-down models. It is worth noting that despite these accounting measures, double-counting persists because no lake area estimates use the GSW data set, even though it is used by wetland models to mask out lakes. Clearly, HydroLAKES is insufficient as an open-water lake data set and needs further corrections, such as demonstrated here, for mutually-exclusive comparison to wetland emissions.
By focusing solely on lake area, lake aquatic vegetation, and double-counting, our broad-scale study ignores other important variables (e.g., nutrients, chlorophyll, lake type, seasonal fluctuations in lake area, sample bias, regression uncertainty) that also affect methane upscaling. Prediction based on lake area has also been shown to overestimate total emissions (Ludwig et al., 2023). Therefore, our findings should not be interpreted as a new upscaling estimate, but rather a first sensitivity experiment. Furthermore, the Bayesian technique used for estimating aquatic vegetation coverage is limited by potential conflation with non-vegetation drivers, such as hydrology, that impact water occurrence (high bias to emissions). Our calculations are also sensitive to the precise shorelines of HydroLAKES, which may be overly generous and include terrestrial vegetation (high bias), especially if misregistration errors are prevalent (Figure S8 in Supporting Information S1). Conversely, the HR data sets used for extrapolation exclude vegetated regions and thus under-estimate lake area (low bias). Our double-counting estimate assumes all low-occurrence water would be included in a hypothetical wetland map (high bias), and the emissions sensitivity to double-counting and vegetation relies on a highly uncertain scaling factor (unknown bias, Text S3 in Supporting Information S1). Future studies should examine emissions driven by submerged as well as emergent vegetation and focus on large lakes (>1 km²) (e.g., Bastviken et al., 2004), given their large emissions and aquatic vegetation contributions. Most importantly, better techniques are needed to estimate lake emissions beyond upscaling methods like those investigated here, especially process models and intercomparison efforts, for example, as done for wetlands (Bohn et al., 2015; Melton et al., 2013; Zhang et al., 2017).

Conclusion
The task of upscaling "bottom-up" lake methane emissions is data-limited. This study examines how non-inventoried small lakes, absence of lake aquatic vegetation maps, and double-counting of vegetated areas already included in wetland maps likely affect upscaled lake methane estimates. Based on newly available in situ and satellite observations, we assess the most impactful of these three variables to be the area of small lakes (+30%), followed by double-counting (−20% of emissions) and lake aquatic vegetation (+14%), which partially offset each other. While previous upscaling studies (Bastviken et al., 2011; DelSontro et al., 2018; Holgerson & Raymond, 2016; Johnson et al., 2022; Matthews et al., 2020; Rosentreter et al., 2021) account for these variables to different degrees, this sensitivity study highlights their relative contributions and notes that double-counting and biased extrapolation are still present in estimates.
Importantly, our finding of just 2.6% emissions from very small ponds is far lower than previous global estimates (37%-41%). High resolution remote sensing of small lakes is thus preferable to statistical extrapolation, which overestimates small lake abundance and consequently total methane emissions. The growing availability of remotely sensed, high resolution lake mapping suggests that the use of power-law extrapolations can soon be abandoned in favor of direct observation (e.g., WBD, Sui et al., 2022). Upscaling uncertainty should be especially reduced when lake inventories to 0.001 km² become globally available (e.g., only ∼0.5% of area and ∼2.6% of emissions missed), and focusing on finer resolutions would yield diminishing returns. Due to their small total surface area, we submit that very small non-inventoried ponds <0.001 km² have received excessive interest relative to their CH₄ emissions at the pan-Arctic scale. Lake aquatic vegetation has received less attention but is likely equally useful for upscaling emissions, provided double-counting can be eliminated. Precisely measuring global lake area remains the largest obstacle to lake methane emissions estimates.

Data Availability Statement
The following publicly-available data sets were used in this analysis: HydroLAKES (Messager et al., 2016) from the World Wildlife Fund; GSW (Pekel et al., 2016) from the European Commission's Joint Research Centre; BAWLD (Olefeldt et al., 2021a) and BAWLD-CH4 (Kuhn et al., 2021a) from the National Science Foundation Arctic Data Center; WBD (Feng & Sui, 2020) from the Chinese National Tibetan Plateau/Third Pole Environment Data Center, available in English; LEV (Kyzivat et al., 2021) from the Oak Ridge National Lab (ORNL) Distributed Active Archive Center (DAAC); HR lakes (Kyzivat et al., 2019a; Mullen et al., 2022; Muster et al., 2017a) from ORNL DAAC, ORNL DAAC, and PANGAEA, respectively; and ERA5 (Muñoz Sabater, 2019) from Copernicus Climate Change Service Climate Data Store. New data and a Python toolbox generated for this manuscript are archived in the Arctic Data Center (Kyzivat & Smith, 2023).

Figure 1. Our HR-derived small lake (<0.5 km²) area extrapolation over North American tundra and boreal regions (black dashed line) is similar in total and extrapolated area (+3.3% difference) to the WBD validation data set (red line).

Figure 2. Cumulative distribution of (a) pan-Arctic lake and lake aquatic vegetation areas; and (c) fraction of reference methane emissions. Inset (b) highlights the shape of the vegetation curve and uses a re-normalized right y-axis so that coverage from small lakes can be easily read. Dotted lines indicate the 0.1 km² resolution limit for HydroLAKES, although extrapolations (dashed lines) begin from 0.5 km².

Figure 3. A first pan-Arctic estimate of lake aquatic vegetation, both as (a) percentage of inventoried HydroLAKES; and (b) percentage of BAWLD grid cell areas. Missing data signify that no HydroLAKES lakes are present within the grid cell.