Global Biogeochemical Cycles

The pCO2 in boreal lakes: Organic carbon as a universal predictor?



[1] During the past two decades, it has become evident that a majority of lakes are net conduits of CO2 to the atmosphere. This insight has implications both for lake metabolism per se and for assessing the role of lakes in the global C cycle. The concentration of dissolved organic carbon (DOC), which constitutes >90% of the total organic carbon (TOC), has been identified as a key driver of partial pressure of CO2 (pCO2). A crucial question is whether one may identify a global DOC-pCO2 relationship for lakes or whether this relationship has to be determined regionally or locally. A second major aspect is how to best predict CO2 as a function of DOC. Based on a survey of pCO2 and a range of lake and catchment variables in 112 lakes, we support the view that DOC is by far the most important determinant of pCO2 while groundwater influx has a minor contribution. Contrary to expectations, total phosphorus (P) also apparently contributed positively to pCO2, owing to the fact that most P in these lakes is in the form of allochthonous organic P, and thus correlates strongly with DOC. Physical principles dictate that even a lake completely devoid of DOC should have a nonzero pCO2. This is not reflected in power models, which imply that the pCO2 approaches zero with zero DOC. Based on this study as well as published data on DOC-pCO2 relationships, we argue that identity link gamma–generalized linear models are appropriate for predicting pCO2 in lakes and that their application makes it possible to reach reasonably accurate global models for how pCO2 relates to DOC and other environmental factors.

1. Introduction

[2] Freshwater systems constitute a comparatively small proportion of our planet's surface. The impact of freshwater systems has therefore until recently, and still not infrequently, been considered marginal in the context of ecosystem feedbacks in the global carbon (C) cycle. Beginning in the early 1990s it became clear, however, that freshwater ecosystems were not only important conduits of CH4 [Kelly et al., 1997; Bastviken et al., 2004], but they were also commonly net heterotrophic, serving as sources of CO2 due to bacterial mineralization of terrestrially derived organic C [Hessen et al., 1990; Kling et al., 1991; Cole et al., 1994]. Building on an increasing amount of evidence, the consensus is now that CO2 supersaturation in circumboreal lakes is the norm rather than the exception, and not only a property of heavily colored humic lakes. Several reports suggest that lakes, in disproportion to their relative area, are significant components in the carbon budget of terrestrial ecosystems and vent a considerable amount of CO2 to the atmosphere [Dean and Gorham, 1998; Cole et al., 2007; Tranvik et al., 2009]. Hence, the conversion of organic to dissolved inorganic C (DIC), and the fate and flux of DIC has become a major issue in the context of the global C cycle as well as for the understanding of lake metabolism and C budget.

[3] The literature offers a wide range of factors which potentially regulate the degree of supersaturation of CO2 in freshwaters. The physical and morphometric properties of lakes (e.g., lake area, depth and temperature) influence the net atmospheric gas exchange in several ways. Large amounts of organic carbon are stored in lake sediments [Dean and Gorham, 1998; Tranvik et al., 2009], which, depending on redox conditions and mixing processes, may serve as a source of CH4 or CO2 to the water column. A major fraction of produced CH4 may subsequently be converted to CO2 by water column methanotrophs [cf. Rudd and Hamilton, 1978; Hessen and Nygaard, 1992]. The bathymetric properties of a lake determine the ratio of sediment surface to lake volume and thus, the impact of the sediment. Likewise, the ratio of lake area to lake volume might determine the efficiency of gas exchange over the surface. Lake area per se could also affect gas exchange as increased fetch facilitates the wind induced mixing of the top layer. Finally, solar irradiation may generate high levels of photo-oxidation of dissolved organic carbon (DOC) in surface layers [Granéli et al., 1998], while the solar heating of the upper water column will in itself play a key role for CO2 exchange via the negative relationship between gas saturation and temperature.

[4] Since the net CO2 exchange in ecosystems is primarily governed by the balance between photosynthesis and respiration, biotic processes have been the focus of most studies on CO2 saturation in lakes. The concentration of allochthonous DOC has been suggested as the main driver for the flux of CO2 from lakes to the atmosphere [del Giorgio et al., 1997; Prairie et al., 2002; McCallister and del Giorgio, 2008], and in humic, or brown water lakes, the production of heterotrophic bacteria is high and “subsidized” by allochthonous C [Hessen et al., 1990; Tranvik, 1990; Jansson et al., 2008]. The very C rich nature of both dissolved and particulate matter in many freshwater localities may further induce high respiratory disposal of “excess C” [cf. Hessen and Anderson, 2008]. In-lake processes are not the only source of lake CO2, however. Some studies suggest that weathering processes as well as soil and root respiration may generate high levels of CO2 in the catchment, which is subsequently transported to the lakes via groundwater or surface flow. Such processes serve as another route of allochthonously generated CO2 in lakes [Rantakari and Kortelainen, 2008], which has been reported to be of similar magnitude as in-lake DOC mineralization [Stets et al., 2009; Humborg et al., 2009].

[5] There seems to be a general consensus that, among these factors, the concentration of DOC primarily sets the stage for the net exchange of inorganic C, but that there is also a major variability in the DOC-CO2 relationships owing to abiotic and biotic properties of the lakes and their catchments. Furthermore, there may be interregional inconsistencies in the slopes and intercepts of regression models relating CO2 (actually, the partial pressure of CO2, pCO2) to DOC [Roehm et al., 2009]. A crucial question is whether one may identify a global DOC-pCO2 relationship for lakes, or whether this relationship has to be determined regionally or locally. A second major aspect is how to best predict CO2 as a function of DOC. The general approach has been to log transform both variables (to attain homoscedasticity in the data set) and fit the data to a linear model (i.e., log(y) = log(a) + b log(x)), which is equivalent to a power function model when back-transformed to a linear scale (i.e., y = ax^b). This approach implies that the pCO2 approaches zero with zero DOC. However, physical principles dictate that even a lake completely devoid of DOC should have a nonzero pCO2 close to that of the air. Thus, it is reasonable to assume that a power function is not the most appropriate model for describing the relationship between pCO2 and DOC. In this study, we have modeled pCO2 in lakes under the assumption that the relation between DOC and pCO2 is linear with a nonzero intercept.

[6] Thus, while it is widely recognized that the vast majority of boreal lakes are supersaturated with CO2, the role of DOC relative to other parameters still remains to be settled, and so do the issues of global versus local relationships, as well as the statistical model chosen to represent this relationship. To assess these issues and explore the impact of a wider range of ambient parameters on lake pCO2, we surveyed 112 lakes with respect to pCO2 and a range of parameters related to lake water chemistry (total phosphorus, total nitrogen, chlorophyll, pH and sulfate), physical lake properties (surface area, altitude, depth and drainage area ratio), and watershed characteristics (runoff, annual air temperature, vegetation density (NDVI, Normalized Difference Vegetation Index), slope and the catchment proportions of bog, forest and farmland). The wide span of parameters and lake properties allowed us to examine the relative contribution of in-lake processes, lake physical properties and watershed characteristics on the pCO2 in lakes. We also compare our results with various published models relating pCO2 in lakes to physical, chemical and catchment properties.

2. Methods

[7] This study was based on a regional lake survey comprising 112 lakes in southern and central Norway (Figure 1). The lakes were chosen to ensure that all geographical regions and a wide span in lake properties were represented. Each lake was sampled once during the daytime in October 2004. Most lakes were reached by hydroplane that also was used as a platform for the sampling. Samples for water quality parameters and pCO2 were taken at 0.7–1.0 m depth at the deepest part of the lake.

Figure 1.

Map of southern Norway with the 112 localities represented as yellow disks with areas proportional to the measured pCO2 (μatm). The color code of the map represents the fraction of land covered with bog. The rectangle in the map insert shows the global location.

[8] Analyses of pCO2 were done according to Sobek et al. [2003] in duplicate 1.125 L bottles, flushed by at least twice its volume by gently lowering the water sampler tube to the bottom of the bottle while ensuring that no bubbles were generated. Each bottle was sealed with a rubber stopper and inserted into separate thermoinsulated containers. Water temperature at the time of sampling was measured with a thermometer fitted inside the water sampler, which could then be compared to the water temperature at the time of analysis. The partial pressure of CO2 was analyzed in the field by infrared gas analysis of a 50 mL head space. The head space was generated by gently injecting ambient air into the bottles with a plastic syringe while simultaneously withdrawing water. The bottles were shaken vigorously for two minutes to ensure gas equilibrium before gas was extracted and analyzed in an EGM-4 high-precision gas analyzer (PP Systems). Each replicate was analyzed twice to quantify measurement errors (mean coefficient of variation = 0.23%). Atmospheric CO2 partial pressure was recorded and the pCO2 in the water sample calculated by applying Henry's law to the partial pressure of CO2 in the head space and using correction factors for temperature, atmospheric pressure, and CO2 introduced by injecting ambient air into the bottles. The mean coefficient of variation for duplicate samples was 0.96%.
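The headspace back-calculation described above can be illustrated with a simple mass balance. The following is a minimal sketch only, not the procedure of Sobek et al. [2003]: it assumes that the CO2 held by the water and the injected air before shaking equals the CO2 held by the water and the headspace after shaking, it ignores any shift in the carbonate equilibrium, and it treats the Henry's law constant `k_h` as a value supplied by the user from solubility tables for the sample temperature. The function name, the water volume of 1.075 L (the 1.125 L bottle minus the 50 mL head space) and the illustrative `k_h` of 0.06 mol L−1 atm−1 are assumptions made for this example.

```python
R_GAS = 0.082057  # ideal gas constant, L atm mol^-1 K^-1


def initial_water_pco2(p_eq_uatm, p_air_uatm, temp_c, k_h,
                       v_water_l=1.075, v_head_l=0.050):
    """Estimate the pCO2 of the water before headspace equilibration.

    p_eq_uatm  : pCO2 measured in the equilibrated head space (uatm)
    p_air_uatm : pCO2 of the ambient air injected as head space (uatm)
    temp_c     : water temperature during equilibration (deg C)
    k_h        : Henry's law constant for CO2 (mol L^-1 atm^-1), user supplied
    """
    t_k = temp_c + 273.15
    gas_capacity = v_head_l / (R_GAS * t_k)  # mol of CO2 per atm in the head space
    water_capacity = k_h * v_water_l         # mol of CO2 per atm in the water
    # Mass balance: p0*water + p_air*gas = p_eq*(water + gas), solved for p0.
    return (p_eq_uatm * (water_capacity + gas_capacity)
            - p_air_uatm * gas_capacity) / water_capacity


# Illustrative numbers only (not measurements from the survey).
print(round(initial_water_pco2(800.0, 380.0, 7.5, k_h=0.06), 1))
```

Because the head space is small relative to the water volume, the correction is modest; it vanishes entirely when the injected air already has the same pCO2 as the equilibrated head space.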

[9] Two separate 200 mL samples for chlorophyll quantification were vacuum filtered on 25 mm Whatman GF/F filters on the day of sampling. The filtered volume was reduced to 100 mL for samples with a visible high content of chlorophyll. After filtration, the filters were placed in zip-locked plastic bags and frozen in liquid nitrogen for later chlorophyll extraction in acetone and fluorometric analysis. All other water chemistry parameters were analyzed by the accredited laboratory at the Norwegian Institute for Water Research (NIVA) using standard methodology. In brief, total phosphorus (TotP) was measured as PO4 by manual spectrophotometry after wet oxidation with peroxodisulfate. Total organic carbon (TOC) was measured as CO2 after catalytic high-temperature combustion and detected by infrared gas analysis. In general, the dissolved fraction of TOC (DOC) makes up >90% of TOC. Total nitrogen (TotN) was analyzed colorimetrically in a segmented flow autoanalyzer after conversion to NO3 by wet oxidation. The base cations Na, K, Mg and Ca, as proxies of groundwater influence, were analyzed by atomic absorption spectrophotometry. Cation concentrations were corrected for seawater influence by subtracting the product of the Cl concentration and the respective Cl to cation seawater ratios.
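The seawater correction of the base cations amounts to a single subtraction per cation. The sketch below assumes that the seawater ratio is expressed as mass of cation per mass of Cl; the function name, the sample concentrations and the Na:Cl ratio of about 0.556 are illustrative assumptions and are not taken from the survey.

```python
def sea_salt_corrected(cation_mg_l, chloride_mg_l, seawater_ratio):
    """Subtract the marine share of a cation: cation - Cl * (cation:Cl in seawater)."""
    return cation_mg_l - chloride_mg_l * seawater_ratio


# Hypothetical sample: 1.72 mg/L Na and 2.0 mg/L Cl, with an assumed
# seawater Na:Cl mass ratio of 0.556.
na_non_marine = sea_salt_corrected(1.72, 2.0, 0.556)
print(round(na_non_marine, 3))
```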

[10] Key catchment properties (slope, area, altitude, NDVI, runoff, temperature and the proportion of forest, bog and farmland) were analyzed from digital maps with the ESRI ArcGIS 9.3 geographic information system (GIS) using the extension Hawth's Analysis Tools (H. L. Beyer, Hawth's analysis tools for ArcGIS: Version 3.08, 2004). Runoff was represented by averages over the period 1960–1990 obtained from the Norwegian Water Resources and Energy Directorate (NVE). Mean catchment slope and altitude were calculated from a 1 × 1 km digital elevation model of Norway from the Norwegian Mapping Authority (Statkart). Land area use was extracted from 1:50k vector maps from Statkart. NDVI was acquired as monthly composites from the U.S. Geological Survey Eurasia Land Cover Characteristic database. Data on mean annual air temperature were downloaded as a 1 × 1 km raster map from the BioClim database [Hijmans et al., 2005]. Total catchment specific yearly runoff was calculated as the product of area specific yearly runoff (mm yr−1) and catchment area (km2). Lake volumes were estimated from a power function of lake surface area, explaining 88% of the lake volume variation in an independent data set from 490 Norwegian lakes (S. Larsen et al., manuscript in preparation).
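The runoff calculation involves a unit conversion that is easy to get wrong, so a short sketch may help. It multiplies the area specific runoff (converted from mm to m) by the catchment area (converted from km2 to m2). The second function, which takes water residence time as lake volume divided by yearly runoff, is an assumption made for this example; the text above does not state how residence time was derived. Function names and the lake volume are hypothetical.

```python
def total_runoff_m3_per_year(specific_runoff_mm_yr, catchment_area_km2):
    """Catchment runoff volume per year from runoff depth and catchment area."""
    depth_m = specific_runoff_mm_yr / 1000.0  # mm -> m
    area_m2 = catchment_area_km2 * 1.0e6      # km2 -> m2
    return depth_m * area_m2


def residence_time_years(lake_volume_m3, runoff_m3_yr):
    """Assumed definition: lake volume divided by yearly inflow."""
    return lake_volume_m3 / runoff_m3_yr


# Median runoff and catchment area from Table 1; the lake volume is hypothetical.
runoff = total_runoff_m3_per_year(1411.0, 8.9)
print(round(runoff), round(residence_time_years(7.0e6, runoff), 2))
```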

[11] The pCO2 and TOC data exhibited a large degree of heteroscedasticity such that a logarithmic transformation would be the common remedy called for to reduce skewness. Since we consider a nonzero intercept to be an important feature of the pCO2-DOC relationship, we instead used generalized linear models (GLMs) [McCullagh and Nelder, 1989] based on the gamma distribution with an identity link function rather than the canonical inverse link. The link function in a GLM is a transformation linking the expectation of the response variable to a linear predictor of the explanatory variables. Identity link means that it is the expectation of the untransformed response variable which is predicted by the explanatory variable(s). Modern statistics [e.g., Venables and Ripley, 1999] uses the linear model as the common term for statistical models where the dependent variable is predicted by a linear combination of the independent variables, such as in multiple regression, analysis of variance and combinations of these. The unexplained variation in the dependent variable of a linear model is assumed to be independent, normally distributed noise with constant variance. Generalized linear models [McCullagh and Nelder, 1989] extend this concept to situations where the dependent variable belongs to any distribution of the exponential family, which includes binomial, Poisson, and gamma distributions as well as the normal distribution. The linear predictor of a GLM is functionally related to the expectation of the dependent variable by a link function, which can be nonlinear.

[12] The gamma distribution can be used to model continuous, positive variables where the standard deviation is proportional to the mean, i.e., that have a constant coefficient of variation. This heteroscedasticity property is characteristic for many types of chemical analysis, including measurements of the partial pressure of CO2 in water. A gamma-GLM with an identity link function is particularly suitable for modeling pCO2 because it both captures the heteroscedasticity of the measurement errors and allows the predicted relationship to have a nonzero intercept with the y axis.
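The constant coefficient of variation of the gamma distribution can be checked with a small simulation. In this sketch, which uses only the Python standard library, the shape parameter is held fixed while the mean is changed through the scale parameter; the coefficient of variation should then stay near 1/sqrt(shape) regardless of the mean. The shape value of 25, the chosen means and the function name are arbitrary choices for illustration.

```python
import random
import statistics


def simulated_cv(mean, shape, n=20000, seed=1):
    """Coefficient of variation of n gamma draws with the given mean and shape."""
    rng = random.Random(seed)
    scale = mean / shape  # gamma mean = shape * scale
    draws = [rng.gammavariate(shape, scale) for _ in range(n)]
    return statistics.stdev(draws) / statistics.fmean(draws)


shape = 25.0  # theoretical CV = 1 / sqrt(25) = 0.2
for mean in (400.0, 800.0, 2400.0):
    # The standard deviation grows with the mean, but their ratio does not.
    print(mean, round(simulated_cv(mean, shape, seed=int(mean)), 3))
```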

[13] In statistics tools like R, S+, or SAS, GLM models are fitted by maximizing the likelihood of the observations given the model parameters by iteratively reweighted least squares. The negative logarithm of the likelihood at the best parameter fit can be partitioned into so-called deviance components, representing the contributions from the independent variables and the unexplained error to the total variation in the dependent variable. Deviance components can be compared by analysis of deviance, in direct analogy with the classical analysis of variance for normally distributed variables. The negative log likelihood of a model is also a key component of the Akaike information criterion (AIC) and other parsimony indicators, used for selecting the best balance between goodness of fit and model complexity (number of model parameters or independent variables) in statistical modeling [Johnson and Omland, 2004].
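To make the fitting procedure concrete, the following is a minimal sketch of iteratively reweighted least squares for a single-predictor gamma-GLM with identity link, written in plain Python rather than R. It is not the code used in this study; in R the equivalent fit would normally be obtained with `glm` and a `Gamma(link = "identity")` family. For the gamma family the variance function is mu^2, and with an identity link the working response is simply y, so each iteration is a weighted least squares fit with weights 1/mu^2. The function names and the synthetic data are assumptions for illustration.

```python
import math


def fit_gamma_identity(x, y, n_iter=50, tol=1e-9):
    """Fit E[y] = a + b*x with gamma errors and identity link by IRLS."""
    w = [1.0] * len(x)  # first pass is ordinary least squares
    a = b = 0.0
    for _ in range(n_iter):
        sw = sum(w)
        xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
        ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
        sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
        sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
        b_new = sxy / sxx
        a_new = ybar - b_new * xbar
        change = abs(a_new - a) + abs(b_new - b)
        a, b = a_new, b_new
        mu = [a + b * xi for xi in x]
        if min(mu) <= 0.0:
            raise ValueError("fitted mean became non-positive")
        w = [1.0 / m ** 2 for m in mu]  # gamma variance function is mu**2
        if change < tol * (abs(a) + abs(b) + 1.0):
            break
    return a, b


def gamma_deviance(y, mu):
    """Gamma deviance: 2 * sum((y - mu)/mu - log(y/mu))."""
    return 2.0 * sum((yi - mi) / mi - math.log(yi / mi) for yi, mi in zip(y, mu))


def deviance_explained(x, y):
    """Fraction of the null deviance removed by the fitted line."""
    a, b = fit_gamma_identity(x, y)
    fitted = [a + b * xi for xi in x]
    null = [sum(y) / len(y)] * len(y)  # intercept-only model predicts the mean
    return 1.0 - gamma_deviance(y, fitted) / gamma_deviance(y, null)


# Synthetic data with proportional (constant-CV style) errors, not survey data.
toc = [0.5, 1.0, 2.0, 3.0, 5.0, 8.0, 12.0, 20.0]
pco2 = [(400.0 + 90.0 * t) * f for t, f in zip(toc, [1.1, 0.9] * 4)]
print(fit_gamma_identity(toc, pco2), round(deviance_explained(toc, pco2), 3))
```

The deviance ratio computed by `deviance_explained` corresponds to the "fraction of total deviance explained" that is reported as R2 in the tables.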

[14] Initially, we also tested generalized additive models (GAMs) [Hastie and Tibshirani, 1997] which are able to capture nonlinear relationships between dependent and independent variables, but they failed to improve the predictions beyond those of the GLM models.

[15] Measured pCO2 in lakes was first fitted to univariate gamma-GLMs with an identity link to assess the predictive powers of each of the independent variables. As this step established that TOC was by far the best predictor of pCO2, a second set of GLMs using the same distribution and link function was fitted to groups of covariate variables in interaction with TOC to identify factors influencing the pCO2-TOC relationship. The covariate groups were chosen to reflect aspects of the origin and fate of TOC (lake physical properties, in-lake processes and water chemistry, catchment properties and parameters related to groundwater influence; see Table 1 for further details). Regression models were simplified by stepwise backward elimination using model selection by the Bayesian information criterion (BIC) [Johnson and Omland, 2004]. All statistical analyses were performed with the R statistical programming environment version 2.6.1 [R Development Core Team, 2008].
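The model selection step can be sketched generically. The code below gives the standard definitions of AIC and BIC in terms of the maximized log likelihood, followed by a greedy backward elimination that drops one term at a time for as long as the criterion improves. This is an illustration of the general technique and not the exact routine used in R; the `score` argument is a hypothetical user-supplied function that would refit the model and return its BIC, and the toy scoring function at the end stands in for real model fits.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def backward_eliminate(terms, score):
    """Drop one term at a time while doing so lowers score(terms)."""
    current = list(terms)
    best = score(current)
    while len(current) > 1:
        # Score every model that has exactly one term fewer.
        trials = [(score([t for t in current if t != drop]), drop)
                  for drop in current]
        trial_best, drop = min(trials)
        if trial_best >= best:
            break  # no single removal improves the criterion
        current.remove(drop)
        best = trial_best
    return current, best


# Toy stand-in for model fitting: each term buys a hypothetical reduction in
# -2 ln(L) and costs ln(112) for the 112 lakes in the survey.
gain = {"TOC": 100.0, "K": 20.0, "Na": 1.0, "Ca": 0.5}


def toy_bic(terms):
    return len(terms) * math.log(112) - sum(gain[t] for t in terms)


print(backward_eliminate(["TOC", "K", "Na", "Ca"], toy_bic)[0])
```

Because the BIC penalty per parameter grows with ln(n), it tends to select smaller models than the AIC once n exceeds about 7 observations.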

Table 1. Single-Predictor Gamma-GLM Models With Identity Link for pCO2 Sorted by Goodness of Fit (a)

| Variable | Unit | Mean | Median | Range | Transformation | p Value | R2 |
|---|---|---|---|---|---|---|---|
| TOC | mg L−1 | | | –24.6 | None | <2E-16 | 0.73 |
| TotP (b) | μg L−1 | | | –20 | Log10 | <2E-16 | 0.55 |
| Bog (c) | % | 3.63 | 2.32 | 0–17 | Arcsin Sqrt | <2E-16 | 0.51 |
| Forest (c) | % | 41.4 | 36.8 | 0–88 | Arcsin Sqrt | <2E-16 | 0.50 |
| TotN (b) | μg L−1 | 265 | 255 | 42–645 | None | <2E-16 | 0.49 |
| Runoff (c) | mm yr−1 | 1750 | 1411 | 341–6350 | Log10 | 5.71E-08 | 0.26 |
| Water temperature | °C | 7.55 | 7.60 | 2.3–12.2 | None | 6.45E-08 | 0.25 |
| K (e) | mg L−1 | | | –0.84 | Log10 | 6.04E-07 | 0.24 |
| Air temperature (c) | °C | 33.0 | 33.5 | −2–7.5 | None | 2.69E-07 | 0.23 |
| SO4 (b) | mg S L−1 | 1.75 | 1.43 | 0.5–8.08 | Log10 | 4.31E-07 | 0.22 |
| Ca (e) | mg L−1 | 1.23 | 0.89 | 0.13–5.14 | Log10 | 3.29E-06 | 0.18 |
| Mg (e) | mg L−1 | 0.30 | 0.24 | 0.06–1.05 | Log10 | 8.1E-06 | 0.17 |
| Farmland (c) | % | 0.508 | 0 | 0–0.5 | Arcsin Sqrt | 7.53E-04 | 0.16 |
| Area ratio (c) | ratio | 0.096 | 0.074 | 0.004–0.470 | Log10 | 3.49E-04 | 0.13 |
| Na (e) | mg L−1 | 1.72 | 1.26 | 0.28–7.22 | Log10 | 0.0124 | 0.08 |
| Catchment area (c) | km2 | 37.5 | 8.9 | 1.25–463 | Log10 | 0.154 | 0.02 |
| N deposition (c) | mg m−2 yr−1 | 0.68 | 0.70 | 0.251–1.08 | None | 0.343 | 0.01 |
| Lake area (d) | km2 | 1.7 | 0.7 | 0.02–21.7 | Log10 | 0.316 | 0.01 |
| Chl (b) | μg L−1 | 0.79 | 0.60 | 0.07–3.7 | Log10 | 0.923 | 0.00 |
| pH (b) | | 5.89 | 5.90 | 4.6–7.0 | None | 0.0612 | 0.4 × 10−3 |
| Residence time (c) | years | 1.05 | 0.56 | 0.004–9.27 | Log10 | 0.913 | 0.9 × 10−4 |

(a) Each predictor variable is identified by name, unit, mean, median, range and transformation, as well as significance probability (p value) and fraction of total deviance explained by the corresponding regression model (R2).
(b) Lake water chemistry.
(c) Catchment properties.
(d) Lake physical properties.
(e) Groundwater influence.

3. Results

[16] All but three of the 112 lakes surveyed were supersaturated with CO2, with pCO2 ranging from 351 μatm (slightly undersaturated) to 2512 μatm (sevenfold supersaturation). Mean pCO2 was 774 μatm, and 74% of the lakes were more than 150% supersaturated (Figure 1).

[17] pCO2 had statistically significant relationships with 21 of the 26 tested variables in the single-predictor models (Table 1), but TOC was by a good margin the best predictor. The identity link gamma-GLM model with TOC as the predictor (pCO2 = 426 (23.5) + 90.5 (7.7) TOC; numbers in parentheses are standard errors) explained 69% of the total deviance. The model intercept (426) was within the 95% confidence interval for air pCO2 (356–434). For comparison with the power function used in other studies, we also fitted a linear model on log-transformed pCO2 and TOC: log10(pCO2) = 2.72 (0.014) + 0.329 (0.023) log10(TOC). The power function model explained 65% of the variation in pCO2, but the model did not capture the asymptotic behavior at low TOC in the data set. Furthermore, applying the two modeling approaches to subranges of the data set (above and below the median TOC) revealed the power function model to be less robust to the underlying range of data than the identity link gamma-GLM model (Figure 2). The sample with the highest value for pCO2 (2512 μatm) was identified as an influential outlier with high leverage. Omitting the outlier from the model resulted in an increase in the amount of variance explained to 73% for the GLM model and 67% for the power function model, while the model coefficients remained within the original confidence intervals.
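The difference between the two fitted models at low TOC can be seen by evaluating them directly. The sketch below uses only the point estimates reported above (pCO2 = 426 + 90.5 TOC for the gamma-GLM, and log10(pCO2) = 2.72 + 0.329 log10(TOC) for the power function, back-transformed to pCO2 = 10^2.72 TOC^0.329); standard errors are ignored, and the function names and the TOC values chosen for evaluation are illustrative.

```python
def glm_pco2(toc_mg_l):
    """Identity link gamma-GLM point estimate, pCO2 in uatm."""
    return 426.0 + 90.5 * toc_mg_l


def power_pco2(toc_mg_l):
    """Back-transformed log-log model: pCO2 = 10**2.72 * TOC**0.329."""
    return 10.0 ** 2.72 * toc_mg_l ** 0.329


# The power function falls toward zero as TOC approaches zero, whereas the
# GLM prediction levels off at its intercept, close to air pCO2.
for toc in (0.0, 0.01, 0.1, 1.0, 5.0, 10.0, 24.6):
    print(f"TOC {toc:5.2f} mg/L  GLM {glm_pco2(toc):7.1f}  power {power_pco2(toc):7.1f}")
```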

Figure 2.

Relationship between pCO2 and TOC fitted to the identity link gamma-GLM model (solid lines) and the power function model (dashed lines). (left) The full data set and (middle and right) corresponding models fitted to subsets above or below the median TOC (represented by vertical solid lines). Horizontal lines indicate the mean and 95% confidence interval for ambient air pCO2.

[18] Having established TOC as the best predictor of pCO2 in our data set, we proceeded to investigate how other factors may modify the effect of TOC on pCO2. We did this by fitting gamma-GLM models using three sets of variables chosen to represent different environmental factors (lake, water and catchment properties) either alone or in interaction with TOC (Table 2). Among the variables representing physical properties of the lakes, only altitude and depth gave significant contributions and were thus included in the final model, which explained 46% of the variation in pCO2 (Table 2). With TOC included in the model, 79% of the variation in pCO2 was explained by TOC and its interactions with altitude and lake area. The effect of TOC on pCO2 decreased with increasing lake area and increasing altitude.

Table 2. Identity Link Gamma-GLM Models Using Different Sets of Predictor Variables Representing Lake, Water, and Catchment Properties as Well as Groundwater Indicators, Alone or in Interaction With TOC (a)

| Model | Single Variable Effects | TOC Interactions | R2 | BIC |
|---|---|---|---|---|
| Lake properties | altitude (−), depth (−), lake area (ns) | Not included in model | 0.46 | 1538.8 |
| Lake properties and TOC | TOC (+), altitude (−), depth (ns), lake area (ns) | TOC:altitude (−), TOC:lake area (−) | 0.79 | 1430.8 |
| Water chemistry | TotP (+), TotN (+), pH (ns), SO4 (ns), chl (−) | Not included in model | 0.62 | 1465.6 |
| Water chemistry and TOC | TOC (+), TotP (+), TotN (ns), pH (ns), SO4 (ns), chl (−) | No significant interactions | 0.74 | 1424.4 |
| Catchment | slope (ns), air temperature (ns), runoff (ns), NDVI (ns), Ndep (ns), forest (+), bog (+), catchment area (−), farmland (+), area ratio (−), residence time (ns) | Not included in model | 0.69 | 1474.7 |
| Catchment and TOC | TOC (+), slope (ns), air temperature (ns), runoff (+), NDVI (+), N deposition (−), forest (+), bog (ns), catchment area (−), arable (+), area ratio (−), residence time (+) | TOC:Ndep (+), TOC:runoff (−), TOC:area ratio (+), TOC:residence time (−) | 0.85 | 1375.6 |
| Groundwater indicator variables | anc (ns), Ca (ns), Mg (ns), K (+), Na (ns) | Not included in model | 0.23 | 1481.6 |
| Groundwater indicators and TOC | TOC (+), anc (ns), Ca (−), Mg (−), K (+), Na (ns) | None | 0.83 | 1329.5 |

(a) Significant single variable or interaction effects are marked with plus signs or minus signs for positive or negative regression coefficients, while nonsignificant effects are marked with ns. R2 represents the fraction of total deviance explained in the models resulting from a backward elimination process based on the Bayesian information criterion (BIC).

[19] The model based on parameters related to lake water chemistry and in-lake processes explained 62% of the variation. Unsurprisingly, Chla had a negative effect on pCO2, while the opposite was true for TotP and TotN. There were no significant interactions with TOC in this subset of variables.

[20] When catchment properties were considered alone, five variables (Table 2) contributed significantly in a model that explained 69% of the variation in pCO2, while 85% was explained when TOC was also included in the model (Table 2). The interactions in the latter model suggest that the effect of TOC on pCO2 increases with N deposition and area ratio while it declines with the area-specific runoff and residence time. Air temperature, spanning 9.5°C across the data set, gave no significant contribution.

[21] Parameters related to the influx of groundwater (Ca, Mg, Na, K and ANC) explained 23% of the variation in pCO2 (Table 2). However, combining groundwater indicator variables together with TOC increased the amount of variance explained to 83%.

4. Discussion

[22] By far, the best single predictor of pCO2 was TOC. The six next-ranking predictors in Table 1 probably all owe their positions to direct or indirect relationships with TOC. In these pristine catchments, both total N and P are mostly in organic form and are closely correlated with TOC [Meili, 1992; Hessen et al., 2009], and may thus act primarily as proxies for TOC. This confounding with TOC probably overrides the expected negative effect of total P on pCO2 through stimulating primary production [cf. del Giorgio and Peters, 1994; Hanson et al., 2004]. The positive effect of N and P on pCO2 in high TOC lakes may also partly work through stimulating bacterial mineralization of organic matter. The strong positive single-predictor effects of catchment vegetation properties (forest, bog and NDVI) probably reflect their role as TOC sources, while the negative effect of altitude is related to the general decrease in vegetation density with elevation. The five nonsignificant single-predictor variables in Table 1 (catchment area, lake area, N deposition, chlorophyll and water residence time) all had significant contributions as covariates with other explanatory variables, or in interaction with TOC (Table 2).

[23] The multiple-predictor model using only the physical properties of the lakes (altitude, lake depth and lake area) as covariates explained 46% of the pCO2 variation, which increased to 79% when interactions with TOC were included. The negative interaction effect of altitude with TOC could indicate qualitative changes in TOC related to its terrestrial source. The negative interaction effect between TOC and lake area could be related to increased physical degassing of CO2 with wind fetch, which would be positively related to lake surface area.

[24] The multiple-predictor model using only water quality variables (total P, chlorophyll, pH and SO4) as covariates explained 62% of the total deviance. The contrasting effects of chlorophyll and total P are attributed to the fact that total P correlates positively with TOC (r = 0.69, p < 2.2e−16), and probably acts as a proxy for it when TOC is absent from the model. By including TOC as a predictor variable, the water chemistry related variables explained 74% of the deviance, which was only slightly more than the model with TOC as a single predictor. While chlorophyll, as a proxy for phytoplankton biomass, had no significant effect on pCO2 by itself (Table 1), it had negative contributions both when combined with other water quality variables, and in interaction with TOC (Table 2). The apparently minor role of chlorophyll in this survey could also be affected by the samples being taken late in the growing season.

[25] The multipredictor model with catchment related parameters alone explained 69% of the variation in pCO2, with significant positive contributions from fractional coverage of forest, bog and farmland and negative contributions from catchment area and the lake to catchment area ratio. Although the degree of explanatory power of this model was somewhat less than the one with TOC as a single predictor (Table 1), it still had higher power for predicting pCO2 than what has been reported in many other studies [del Giorgio et al., 1999; Jonsson et al., 2003; Kelly et al., 2001; Prairie et al., 2002; Rantakari and Kortelainen, 2005; Roehm et al., 2009; Sobek et al., 2003]. The predictive power is particularly notable when considering that this model involves no variables that require actual water sampling; all the relevant catchment property parameters can be extracted from remote sensing products and digital land use, elevation and hydrology maps.

[26] Including both catchment properties and their interactions with TOC gave by far the best model for predicting pCO2, explaining 85% of the total deviance. While DOC derived from bogs is expected to be older and more recalcitrant than DOC from other sources, the absence of significant interaction effects between bog or forest cover and TOC indicates that such effects played a minor role in our data set. The significant positive interaction between TOC and N deposition may indicate that TOC mineralization increases with increased nitrogen deposition.

[27] Inflow of CO2-supersaturated groundwater could also serve as an important source of CO2 in lakes. Based on a study of more than 20,000 Swedish lakes, Humborg et al. [2009] claimed that the importance of groundwater was in the same order of magnitude as DOC. Groundwater influx could explain some 20–30% of the variation in pCO2, which is consistent with the findings in our study (R2 = 0.27). The reported contribution of TOC to pCO2 was no more than 21%, however, in contrast to the 73% of variance explained by TOC alone in our study. Humborg et al. did not measure pCO2 directly but estimated this from lake chemistry variables. While this may have introduced some scatter in their data, it still does not explain why they arrived at such a low impact of DOC. The majority of the lakes in our study are located at relatively high altitude with a thin layer of topsoil and in areas dominated by bedrock which does not facilitate deep water infiltration. It is also noteworthy that, in our study, the best model according to the AIC value was based on TOC and K combined, which explained 83% of the observed variation in pCO2. Assuming that K serves as a good proxy of groundwater impact, this lends support to the idea that groundwater has a significant contribution to pCO2 in the studied lakes, although the contribution seems to be relatively minor compared to other studies [e.g., Humborg et al., 2009].

[28] Acknowledging DOC as the key predictor of pCO2, it would be of major interest to assess the potential of this parameter for global prediction of pCO2 in lakes. Roehm et al. [2009] claimed that the different sources and types of DOC only allow for regional models, and supported this by a comparison of empirical models of lake pCO2 as a function of DOC from different regions of the world. In their commentary to Figure 6 of Roehm et al. [2009], they point to the regionally different intercepts and slopes, leading them to conclude that “This pattern would suggest that changes in DOC loading or input to lakes have very different consequences in terms of surface water pCO2 in lakes of different regions, and this in turn could be related to fundamental differences in the nature of the DOC among regions.” As mentioned in our introduction, there is however no a priori reason to presume that a power function is the most adequate model for describing the relationship between pCO2 and DOC. In fact, elementary physical considerations speak against this, suggesting a nonzero intercept model such as the identity link gamma-GLM models which we have used in this work.

[29] Neither the gamma-GLM model nor the power function models managed to fully capture the relationship between pCO2 and TOC in Figure 2. The sensitivity of the power function models to subsets of the data range resulted in very different models with overabundances of positive residuals at either the high or the low end of the TOC gradient. The gamma-GLM model seemed to overestimate pCO2 at low TOC values, but was less affected by the subsetting of the data set. Thus, the gamma-GLM is more robust, with no difference in slope over data ranges, while the slopes of the power models depend on data range. Furthermore, a direct comparison of the fraction of deviance explained (R2) is in favor of the gamma-GLM, which has the added advantage of producing physically plausible predictions when extrapolated to zero TOC.

[30] To further explore this, we gathered a wide range of literature data sets (some transcribed from tables, but most digitized from published figures) for comparing the performances of the two modeling approaches (Figure 3). The amount of variance explained by the gamma-GLM models was equal to or larger than the amount of variance explained by the power function models (Table 3). As expected from Roehm et al. [2009], there is a wide span in the power function model slopes and intercepts, which is not reflected in the corresponding gamma-GLM models. It should be noted that while the power function approach does not catch the asymptote at low TOC, the identity link gamma-GLM models do. The intercepts in the gamma-GLM models can be taken as estimates of pCO2 when no TOC is present. The gamma-GLM model based on the entire data collection has an intercept at 361 ppm CO2. This value is consistent with the range of literature values on the partial pressure of CO2 in the ambient air, and with a priori considerations. This analysis suggests that the identity link gamma-GLM approach should be preferred over the power function model. Furthermore, the apparent region-specific differences in the correlation between DOC and pCO2 could be an artifact stemming from the power function model applied to data sets with different ranges of DOC. This concern should be addressed in future studies before drawing conclusions about regional differences in the nature of DOC and how this may influence the relationship between organic carbon and pCO2 in lakes.
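The deviance-based R2 used to compare the gamma-GLM models with the power function models can be sketched as follows. This is a minimal illustration, assuming the standard gamma deviance and an intercept-only null model whose fitted mean is the sample mean; the observations and fitted values are hypothetical numbers chosen only to exercise the functions.

```python
import math


def gamma_deviance(y, mu):
    """Gamma deviance (ignoring the dispersion factor):
    2 * sum((y - mu) / mu - log(y / mu))."""
    return 2.0 * sum((yi - mi) / mi - math.log(yi / mi) for yi, mi in zip(y, mu))


def deviance_r2(y, mu):
    """Fraction of deviance explained, relative to an intercept-only
    model that predicts the sample mean for every observation."""
    mean_y = sum(y) / len(y)
    null_deviance = gamma_deviance(y, [mean_y] * len(y))
    return 1.0 - gamma_deviance(y, mu) / null_deviance


# Hypothetical pCO2 observations and model predictions (ppm).
observed = [450.0, 700.0, 1200.0, 1500.0]
fitted = [500.0, 750.0, 1100.0, 1600.0]

r2 = deviance_r2(observed, fitted)
print("fraction of deviance explained:", round(r2, 3))
```

A perfect fit gives a value of 1 and a model no better than the mean gives 0, which is what makes the measure conceptually comparable to the ordinary R2.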

Figure 3.

The relation between literature values of DOC and pCO2. (left) The power function approach generates models with highly variable intercepts and slopes, and (right) the identity link gamma-GLM approach reduces the apparent dissimilarity between models based on data from different regions. Green lines, del Giorgio et al. [1999]; red lines, Sobek et al. [2003]; blue lines, Jonsson et al. [2003]; yellow lines, Roehm et al. [2009]; orange lines, Humborg et al. [2009]; black lines, this study; gray lines, all data.

Table 3. Power Function Model and Gamma-GLM Model Summaries on pCO2 as a Function of DOC^a

Reference | Power function: R2 | Power function: log10(a) | Power function: b | Gamma-GLM: R2 | Gamma-GLM: a | Gamma-GLM: b
del Giorgio et al. [1999] | 0.06 | 2.52 (0.26) | 0.39 (0.38)^b | 0.12 | 349 (166.6)^b | 64.3 (35.0)^b
Sobek et al. [2003] | 0.51 | 2.26 (0.09) | 0.80 (0.09) | 0.52 | 212 (110.8) | 110.1 (11.9)
Jonsson et al. [2003] | 0.57 | 2.48 (0.04) | 0.42 (0.05) | 0.58 | 276 (34.34) | 65.1 (8.6)
Roehm et al. [2009] | 0.39 | 2.26 (0.08) | 0.61 (0.09) | 0.43 | 238 (74.3) | 53.5 (9.0)
Humborg et al. [2009] | 0.23 | 2.50 (0.10) | 0.68 (0.11) | 0.27 | 517 (163.6) | 123.1 (19.8)
This paper | 0.65 | 2.7 (0.01) | 0.33 (0.02) | 0.69 | 426 (23.5) | 100.5 (8.6)
All data | 0.46 | 2.59 (0.02) | 0.46 (0.03) | 0.48 | 360 (29.7) | 95.4 (5.2)

^a The data originated from literature data sets; some transcribed from tables, but most digitized from published figures. In power function models, log-transformed variables are fitted to a linear model (i.e., log10(pCO2) = log10(a) + b log10(DOC), which is equal to pCO2 = a * DOC^b), while gamma-GLM models treat the relation between DOC and pCO2 as linear with a nonzero intercept (i.e., pCO2 = a + b*DOC). R2 under the power function models represents the fraction of total variance explained (i.e., one minus the ratio of residual to total sum of squares). It is conceptually comparable to R2 under the gamma-GLM model, which can be generalized as the ratio of the deviance explained by the model to the total deviance. Values in parentheses are standard errors.

^b Not significant.
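To make the difference between the two fitted forms concrete, the sketch below evaluates both models using the coefficients from the "All data" row of Table 3. The column assignment (R2, log10(a), b for the power function; R2, a, b for the gamma-GLM) is read from the table footnote and is an assumption, as are the DOC values chosen for evaluation, which are taken to be in the same units as the source data sets.

```python
# Coefficients from the "All data" row of Table 3 (column assignment assumed).
LOG10_A_POWER = 2.59   # power function: log10(a)
B_POWER = 0.46         # power function: exponent b
A_GLM = 360.0          # gamma-GLM: intercept a
B_GLM = 95.4           # gamma-GLM: slope b


def predict_power(doc):
    """Power function model: pCO2 = a * DOC**b."""
    return 10 ** LOG10_A_POWER * doc ** B_POWER


def predict_glm(doc):
    """Identity link gamma-GLM: pCO2 = a + b * DOC."""
    return A_GLM + B_GLM * doc


# Illustrative DOC values, in the units of the underlying data sets.
for doc in (0.0, 1.0, 5.0, 10.0):
    print(f"DOC = {doc:5.1f}  power = {predict_power(doc):8.1f}  "
          f"gamma-GLM = {predict_glm(doc):8.1f}")
```

At zero DOC the power function returns zero, whereas the gamma-GLM returns its intercept, which is the property discussed in the text.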

[31] TOC was by far the most significant predictor variable for the partial pressure of carbon dioxide in Norwegian surface waters, while a suite of catchment property parameters also gave significant, but minor contributions. This study supports the hypothesis that the elevated pCO2 in lakes primarily stems from microbial mineralization of allochthonous DOC. It also suggests that identity link gamma-GLM models are more appropriate for predicting pCO2 in lakes, and that their application makes it possible to reach reasonably accurate global models for how pCO2 relates to DOC and other environmental factors.


[32] We acknowledge the Norwegian Institute of Water Research for providing access to chemical data from the sampled lakes. This work was covered by grants from the Norwegian Research Council project 165139 “Biogeochemistry in Northern Watersheds, a Reactor in Global Change” to D.O.H. and project 196336 “Biodiversity, community saturation, and ecosystem function in lakes” to T.A.