Abstract
[1] During the past two decades, it has become evident that a majority of lakes are net conduits of CO_{2} to the atmosphere. This insight has implications both for lake metabolism per se and for assessing the role of lakes in the global C cycle. The concentration of dissolved organic carbon (DOC), which constitutes >90% of the total organic carbon (TOC), has been identified as a key driver of the partial pressure of CO_{2} (pCO_{2}). A crucial question is whether one may identify a global DOC-pCO_{2} relationship in lakes or whether this has to be determined regionally or locally. A second major aspect is how to best predict CO_{2} as a function of DOC. Based on a survey of pCO_{2} and a range of lake and catchment variables in 112 lakes, we support the view that DOC is by far the most important determinant of pCO_{2}, while groundwater influx makes a minor contribution. Contrary to expectations, total phosphorus (P) also apparently contributed positively to pCO_{2}, owing to the fact that most P in these lakes is in the form of allochthonous organic P and thus correlates strongly with DOC. Physical principles dictate that even a lake completely devoid of DOC should have a nonzero pCO_{2}. This is not reflected in power models, which imply that the pCO_{2} approaches zero with zero DOC. Based on this study as well as published data on DOC-pCO_{2} relationships, we argue that identity link gamma–generalized linear models are appropriate for predicting pCO_{2} in lakes and that their application makes it possible to reach reasonably accurate global models for how pCO_{2} relates to DOC and other environmental factors.
1. Introduction
[2] Freshwater systems constitute a comparatively small proportion of our planet's surface. The impact of freshwater systems has therefore until recently, and still not infrequently, been considered marginal in the context of ecosystem feedbacks in the global carbon (C) cycle. Beginning in the early 1990s it became clear, however, that freshwater ecosystems were not only important conduits of CH_{4} [Kelly et al., 1997; Bastviken et al., 2004], but they were also commonly net heterotrophic, serving as sources of CO_{2} due to bacterial mineralization of terrestrially derived organic C [Hessen et al., 1990; Kling et al., 1991; Cole et al., 1994]. Building on an increasing amount of evidence, the consensus is now that CO_{2} supersaturation in circumboreal lakes is the norm rather than the exception, and not only a property of heavily colored humic lakes. Several reports suggest that lakes, in disproportion to their relative area, are significant components in the carbon budget of terrestrial ecosystems and vent a considerable amount of CO_{2} to the atmosphere [Dean and Gorham, 1998; Cole et al., 2007; Tranvik et al., 2009]. Hence, the conversion of organic to dissolved inorganic C (DIC), and the fate and flux of DIC has become a major issue in the context of the global C cycle as well as for the understanding of lake metabolism and C budget.
[3] The literature offers a wide range of factors which potentially regulate the degree of supersaturation of CO_{2} in freshwaters. The physical and morphometric properties of lakes (e.g., lake area, depth and temperature) influence the net atmospheric gas exchange in several ways. Large amounts of organic carbon are stored in lake sediments [Dean and Gorham, 1998; Tranvik et al., 2009], which, depending on redox conditions and mixing processes, may serve as a source of CH_{4} or CO_{2} to the water column. A major fraction of produced CH_{4} may subsequently be converted to CO_{2} by water column methanotrophs [cf. Rudd and Hamilton, 1978; Hessen and Nygaard, 1992]. The bathymetric properties of a lake determine the ratio of sediment surface to lake volume and thus, the impact of the sediment. Likewise, the ratio of lake area to lake volume might determine the efficiency of gas exchange over the surface. Lake area per se could also affect gas exchange as increased fetch facilitates the wind induced mixing of the top layer. Finally, solar irradiation may generate high levels of photooxidation of dissolved organic carbon (DOC) in surface layers [Granéli et al., 1998], while the solar heating of the upper water column will in itself play a key role for CO_{2} exchange via the negative relationship between gas saturation and temperature.
[4] Since the net CO_{2} exchange in ecosystems is primarily governed by the balance between photosynthesis and respiration, biotic processes have been the focus of most studies on CO_{2} saturation in lakes. The concentration of allochthonous DOC has been suggested as the main driver for the flux of CO_{2} from lakes to the atmosphere [del Giorgio et al., 1997; Prairie et al., 2002; McCallister and del Giorgio, 2008], and in humic, or brown water lakes, the production of heterotrophic bacteria is high and “subsidized” by allochthonous C [Hessen et al., 1990; Tranvik, 1990; Jansson et al., 2008]. The very C-rich nature of both dissolved and particulate matter in many freshwater localities may further induce high respiratory disposal of “excess C” [cf. Hessen and Anderson, 2008]. In-lake processes are not the only source of lake CO_{2}, however. Some studies suggest that weathering processes as well as soil and root respiration may generate high levels of CO_{2} in the catchment, which is subsequently transported to the lakes via groundwater or surface flow. Such processes serve as another route of allochthonously generated CO_{2} in lakes [Rantakari and Kortelainen, 2008], which has been reported to be of similar magnitude as in-lake DOC mineralization [Stets et al., 2009; Humborg et al., 2009].
[5] There seems to be a general consensus that, among these factors, the concentration of DOC primarily sets the stage for the net exchange of inorganic C, but that there is also major variability in the DOC-CO_{2} relationship owing to abiotic and biotic properties of the lakes and their catchments. Furthermore, there may be interregional inconsistencies in the slopes and intercepts of regression models relating CO_{2} (actually, the partial pressure of CO_{2}, pCO_{2}) to DOC [Roehm et al., 2009]. A crucial question is whether one may identify a global DOC-pCO_{2} relationship in lakes, or whether this has to be determined regionally or locally. A second major aspect is how to best predict CO_{2} as a function of DOC. The general approach has been to log transform both variables (to attain homoscedasticity in the data set) and fit the data to a linear model (i.e., log(y) = log(a) + b log(x)), which is equivalent to a power function model when backtransformed to a linear scale (i.e., y = ax^{b}). This approach implies that the pCO_{2} approaches zero with zero DOC. However, physical principles dictate that even a lake completely devoid of DOC should have a nonzero pCO_{2} close to that of the air. Thus, it is reasonable to assume that a power function is not the most appropriate model for describing the relationship between pCO_{2} and DOC. In this study, we have modeled pCO_{2} in lakes under the assumption that the relation between DOC and pCO_{2} is linear with a nonzero intercept.
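The contrast between the two model forms can be written out explicitly. The following is only a restatement of the argument above in symbols; x denotes DOC, y denotes pCO_{2}, and the names β_0 and β_1 for the intercept and slope of the linear alternative are notation introduced here for illustration, not taken from the text.

```latex
\begin{align}
\log y &= \log a + b \log x
  \;\Longleftrightarrow\; y = a x^{b},
  & \lim_{x \to 0^{+}} a x^{b} &= 0 \quad (b > 0), \\
\mathrm{E}[y] &= \beta_0 + \beta_1 x,
  & \mathrm{E}[y]\,\big|_{x = 0} &= \beta_0 .
\end{align}
```

Under the power function the prediction is forced to zero as DOC vanishes, whereas the linear form leaves the value at zero DOC as a free parameter that can be compared with the pCO_{2} of the air.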
[6] Thus, while it is widely recognized that the vast majority of boreal lakes are supersaturated with CO_{2}, the role of DOC relative to other parameters still remains to be settled, and so do the issues of global versus local relationships, as well as the statistical model chosen to represent this relationship. To assess these issues and explore the impact of a wider range of ambient parameters on lake pCO_{2}, we surveyed 112 lakes with respect to pCO_{2} and a range of parameters related to lake water chemistry (total phosphorus, total nitrogen, chlorophyll, pH and sulfate), physical lake properties (surface area, altitude, depth and drainage area ratio), and watershed characteristics (runoff, annual air temperature, vegetation density (NDVI, Normalized Difference Vegetation Index), slope and the catchment proportions of bog, forest and farmland). The wide span of parameters and lake properties allowed us to examine the relative contribution of in-lake processes, lake physical properties and watershed characteristics to the pCO_{2} in lakes. We also compare our results with various published models relating pCO_{2} in lakes to physical, chemical and catchment properties.
2. Methods
[7] This study was based on a regional lake survey comprising 112 lakes in southern and central Norway (Figure 1). The lakes were chosen to ensure that all geographical regions and a wide span in lake properties were represented. Each lake was sampled once during the daytime in October 2004. Most lakes were reached by hydroplane that also was used as a platform for the sampling. Samples for water quality parameters and pCO_{2} were taken at 0.7–1.0 m depth at the deepest part of the lake.
[8] Analyses of pCO_{2} were done according to Sobek et al. [2003] in duplicate 1.125 L bottles, each flushed with at least twice its volume by gently lowering the water sampler tube to the bottom of the bottle while ensuring that no bubbles were generated. Each bottle was sealed with a rubber stopper and inserted into a separate thermoinsulated container. Water temperature at the time of sampling was measured with a thermometer fitted inside the water sampler, which could then be compared to the water temperature at the time of analysis. The partial pressure of CO_{2} was analyzed in the field by infrared gas analysis of a 50 mL head space. The head space was generated by gently injecting ambient air into the bottles with a plastic syringe while simultaneously withdrawing water. The bottles were shaken vigorously for two minutes to ensure gas equilibrium before gas was extracted and analyzed in an EGM-4 high-precision gas analyzer (PP Systems). Each replicate was analyzed twice to quantify measurement errors (mean coefficient of variation = 0.23%). Atmospheric CO_{2} partial pressure was recorded, and the pCO_{2} in the water sample was calculated by applying Henry's law to the partial pressure of CO_{2} in the head space and using correction factors for temperature, atmospheric pressure, and CO_{2} introduced by injecting ambient air into the bottles. The mean coefficient of variation for duplicate samples was 0.96%.
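The head space calculation described above can be illustrated with a simple mass balance. This is a minimal sketch, not the authors' actual procedure: it assumes the freshwater CO_{2} solubility fit of Weiss (1974), ideal gas behavior in the head space, a water volume of 1.075 L (the 1.125 L bottle minus the 50 mL head space), and it ignores both the atmospheric pressure correction and any shift in carbonate equilibria. The function names are hypothetical.

```python
import math

R_GAS = 0.082057  # ideal gas constant, L atm mol^-1 K^-1


def henry_k0(temp_c):
    """CO2 solubility in fresh water (mol L^-1 atm^-1).

    Assumption: Weiss (1974) fit at zero salinity, not stated in the text.
    """
    t = (temp_c + 273.15) / 100.0
    return math.exp(-58.0931 + 90.5069 / t + 22.2940 * math.log(t))


def water_pco2(p_eq, p_air, temp_sample_c, temp_analysis_c,
               v_water=1.075, v_head=0.050):
    """Back-calculate the original pCO2 of the water (same unit as p_eq).

    p_eq  : pCO2 measured in the equilibrated head space
    p_air : pCO2 of the ambient air injected to create the head space
    """
    # CO2 held per unit partial pressure by the gas and water phases
    gas_capacity = v_head / (R_GAS * (temp_analysis_c + 273.15))
    water_capacity = henry_k0(temp_analysis_c) * v_water
    # Total CO2 in the bottle after equilibration
    total = p_eq * (gas_capacity + water_capacity)
    # Remove the CO2 that arrived with the injected air
    from_air = p_air * gas_capacity
    # Convert the remainder back to a partial pressure at sampling temperature
    return (total - from_air) / (henry_k0(temp_sample_c) * v_water)
```

In this sketch a sample that equilibrates at exactly the ambient air value is returned unchanged, while a head space reading above ambient implies a somewhat higher original pCO_{2} in the water, because part of the dissolved CO_{2} has moved into the injected air.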
[9] Two separate 200 mL samples for chlorophyll quantification were vacuum filtered on 25 mm Whatman GF/F filters on the day of sampling. The filtered volume was reduced to 100 mL for samples with a visibly high content of chlorophyll. After filtration, the filters were placed in ziplocked plastic bags and frozen in liquid nitrogen for later chlorophyll extraction in acetone and fluorometric analysis. All other water chemistry parameters were analyzed by the accredited laboratory at the Norwegian Institute for Water Research (NIVA) using standard methodology. In brief, total phosphorus (TotP) was measured as PO_{4} by manual spectrophotometry after wet oxidation with peroxodisulfate. Total organic carbon (TOC) was measured as CO_{2} after catalytic high-temperature combustion and detected by infrared gas analysis. In general, the dissolved fraction of TOC (DOC) makes up >90% of TOC. Total nitrogen (TotN) was analyzed colorimetrically in a segmented flow autoanalyzer after conversion to NO_{3} by wet oxidation. The base cations Na, K, Mg and Ca, as proxies of groundwater influence, were analyzed by atomic absorption spectrophotometry. Cation concentrations were corrected for seawater influence by subtracting the product of the Cl concentration and the respective cation-to-Cl seawater ratio.
[10] Key catchment properties (slope, area, altitude, NDVI, runoff, temperature and the proportion of forest, bog and farmland) were analyzed from digital maps with the ESRI ArcGIS 9.3 geographic information system (GIS) using the Hawth's Analysis Tools extension (H. L. Beyer, Hawth's analysis tools for ArcGIS: Version 3.08, 2004; available at http://www.spatialecology.com/htools). Runoff was represented by averages over the period 1960–1990 obtained from the Norwegian Water Resources and Energy Directorate (NVE). Mean catchment slope and altitude were calculated from a 1 × 1 km digital elevation model of Norway from the Norwegian Mapping Authority (Statkart). Land area use was extracted from 1:50k vector maps from Statkart. NDVI was acquired as monthly composites from the U.S. Geological Survey Eurasia Land Cover Characteristic database (http://edc2.usgs.gov/glcc/). Data on mean annual air temperature were downloaded as a 1 × 1 km raster map from the BioClim database [Hijmans et al., 2005] (http://www.worldclim.org/). Total catchment-specific yearly runoff was calculated as the product of area-specific yearly runoff (mm yr^{−1}) and catchment area (km^{2}). Lake volumes were estimated from a power function of lake surface area, explaining 88% of the lake volume variation in an independent data set from 490 Norwegian lakes (S. Larsen et al., manuscript in preparation).
[11] The pCO_{2} and TOC data exhibited a large degree of heteroscedasticity and skewness, for which a logarithmic transformation would be the common remedy. Since we consider a nonzero intercept to be an important feature of the pCO_{2}-DOC relationship, we instead used generalized linear models (GLMs) [McCullagh and Nelder, 1989] based on the gamma distribution with an identity link function instead of the canonical inverse link. The link function in a GLM is a transformation linking the expectation of the response variable to a linear predictor of the explanatory variables. Identity link means that it is the expectation of the untransformed response variable which is predicted by the explanatory variable(s). Modern statistics [e.g., Venables and Ripley, 1999] uses the linear model as the common term for statistical models where the dependent variable is predicted by a linear combination of the independent variables, such as in multiple regression, analysis of variance and combinations of these. The unexplained variation in the dependent variable of a linear model is assumed to be independent, normally distributed noise with constant variance. Generalized linear models [McCullagh and Nelder, 1989] extend this concept to situations where the dependent variable belongs to any distribution of the exponential family, which includes binomial, Poisson, and gamma distributions as well as the normal distribution. The linear predictor of a GLM is functionally related to the expectation of the dependent variable by a link function, which can be nonlinear.
[12] The gamma distribution can be used to model continuous, positive variables where the standard deviation is proportional to the mean, i.e., that have a constant coefficient of variation. This heteroscedasticity property is characteristic of many types of chemical analysis, including measurements of the partial pressure of CO_{2} in water. A gamma-GLM with an identity link function is particularly suitable for modeling pCO_{2} because it both captures the heteroscedasticity of the measurement errors and allows the predicted relationship to have a nonzero intercept with the y axis.
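For a single predictor, such a model can be fitted with a few lines of iteratively reweighted least squares. The sketch below is illustrative only and is not the R code used in this study; the function names are hypothetical. It relies on two standard facts about the gamma family: the variance function is the squared mean, and with an identity link the working response equals the observation itself, so each iteration reduces to a weighted straight-line fit with weights 1/μ^{2}.

```python
def weighted_line(x, y, w):
    """Weighted least-squares fit of y = a + b*x; returns (a, b)."""
    sw = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / sw
    my = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_gamma_identity(x, y, max_iter=100, tol=1e-9):
    """Gamma GLM with identity link for one predictor, fitted by IRLS."""
    # Start from the ordinary (unweighted) least-squares line
    a, b = weighted_line(x, y, [1.0] * len(x))
    for _ in range(max_iter):
        # Fitted means; kept positive because the gamma mean must be > 0
        mu = [max(a + b * xi, 1e-8) for xi in x]
        # Gamma variance is proportional to mu^2, so weights are 1/mu^2
        w = [1.0 / m ** 2 for m in mu]
        a_new, b_new = weighted_line(x, y, w)
        converged = abs(a_new - a) + abs(b_new - b) < tol
        a, b = a_new, b_new
        if converged:
            break
    return a, b
```

The weights shrink quickly for large fitted values, which is how the fit expresses a constant coefficient of variation: a residual of 100 μatm counts for less at 2000 μatm than at 400 μatm.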
[13] In statistical tools like R, S+, or SAS, GLM models are fitted by maximizing the likelihood of the observations given the model parameters by iteratively reweighted least squares. The negative logarithm of the likelihood at the best parameter fit can be partitioned into so-called deviance components, representing the contributions from the independent variables and the unexplained error to the total variation in the dependent variable. Deviance components can be compared by analysis of deviance, in direct analogy with the classical analysis of variance for normally distributed variables. The negative log likelihood of a model is also a key component of the Akaike information criterion (AIC) and other parsimony indicators, used for selecting the best balance between goodness of fit and model complexity (number of model parameters or independent variables) in statistical modeling [Johnson and Omland, 2004].
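As an illustration of the deviance concept, the sketch below computes the gamma deviance of a set of fitted values and the fraction of deviance explained relative to an intercept-only model. It assumes the standard gamma unit deviance, 2[(y − μ)/μ − ln(y/μ)], and uses the sample mean as the null-model fit; the function names are hypothetical and this is not the code used in the study.

```python
import math


def gamma_deviance(y, mu):
    """Gamma deviance of observations y against fitted means mu."""
    return 2.0 * sum((yi - mi) / mi - math.log(yi / mi)
                     for yi, mi in zip(y, mu))


def deviance_explained(y, mu):
    """Fraction of the null deviance accounted for by the fitted means."""
    # The intercept-only gamma model predicts the sample mean everywhere
    mean_y = sum(y) / len(y)
    null_dev = gamma_deviance(y, [mean_y] * len(y))
    return 1.0 - gamma_deviance(y, mu) / null_dev
```

A perfect fit gives a deviance of zero and an explained fraction of one, while fitted values no better than the overall mean give an explained fraction of zero, in analogy with R^{2} for ordinary regression.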
[14] Initially, we also tested generalized additive models (GAMs) [Hastie and Tibshirani, 1997] which are able to capture nonlinear relationships between dependent and independent variables, but they failed to improve the predictions beyond those of the GLM models.
[15] Measured pCO_{2} in lakes was first fitted to univariate gamma-GLMs with an identity link to assess the predictive power of each of the independent variables. As this step established that TOC was by far the best predictor of pCO_{2}, a second set of GLMs using the same distribution and link function was fitted to groups of covariates in interaction with TOC to identify factors influencing the pCO_{2}-TOC relationship. The covariate groups were chosen to reflect aspects of the origin and fate of TOC (lake physical properties, in-lake processes and water chemistry, catchment properties and parameters related to groundwater influence; see Table 1 for further details). Regression models were simplified by stepwise backward elimination using model selection by the Bayesian information criterion (BIC) [Johnson and Omland, 2004]. All statistical analyses were performed with the R statistical programming environment version 2.6.1 [R Development Core Team, 2008].
Table 1. Single-Predictor Gamma-GLM Models With Identity Link for pCO_{2} Sorted by Goodness of Fit^{a}
Variable  Unit  Mean  Median  Range  Transformation  p Value  R^{2}
TOC  mg L^{−1}  3.9  3.1  0.250–24.6  None  <2E−16  0.73
TotP^{b}  μg L^{−1}  3.3  4.0  0.9–20  Log10  <2E−16  0.55
Bog^{c}  %  3.63  2.32  0–17  Arcsin Sqrt  <2E−16  0.51
Forest^{c}  %  41.4  36.8  0–88  Arcsin Sqrt  <2E−16  0.50
TotN^{b}  μg L^{−1}  265  255  42–645  None  <2E−16  0.49
NDVI^{c}  Index  130  132  104–149  None  3.70E−16  0.47
Altitude^{d}  m  464  392  10–1329  None  2.33E−14  0.41
Runoff^{c}  mm yr^{−1}  1750  1411  341–6350  Log10  5.71E−08  0.26
Water temperature  °C  7.55  7.60  2.3–12.2  None  6.45E−08  0.25
K^{e}  mg L^{−1}  0.23  0.18  0.04–0.84  Log10  6.04E−07  0.24
Air temperature^{c}  °C  33.0  33.5  −2–7.5  None  2.69E−07  0.23
SO_{4}^{b}  mgS L^{−1}  1.75  1.43  0.5–8.08  Log10  4.31E−07  0.22
Ca^{e}  mg L^{−1}  1.23  0.89  0.13–5.14  Log10  3.29E−06  0.18
Mg^{e}  mg L^{−1}  0.30  0.24  0.06–1.05  Log10  8.1E−06  0.17
Farmland^{c}  %  0.508  0  0–0.5  Arcsin Sqrt  7.53E−04  0.16
Area ratio^{c}  ratio  0.096  0.074  0.004–0.470  Log10  3.49E−04  0.13
ANC^{e}  μeq L^{−1}  48.9  33.3  −14.59–237  None  2.14E−03  0.12
Slope^{c}  degree  6.7  5.6  0.745–24.7  Log10  3.7E−03  0.10
Na^{e}  mg L^{−1}  1.72  1.26  0.28–7.22  Log10  0.0124  0.08
Depth^{d}  m  33.4  29  4–102  Log10  0.0234  0.06
Catchment area^{c}  km^{2}  37.5  8.9  1.25–463  Log10  0.154  0.02
N deposition^{c}  mg m^{−2} yr^{−1}  0.68  0.70  0.251–1.08  None  0.343  0.01
Lake area^{d}  km^{2}  1.7  0.7  0.02–21.7  Log10  0.316  0.01
Chl^{b}  μg L^{−1}  0.79  0.60  0.07–3.7  Log10  0.923  0.00
pH^{b}  -  5.89  5.90  4.6–7.0  None  0.0612  0.4 × 10^{−3}
Residence time^{c}  years  1.05  0.56  0.004–9.27  Log10  0.913  0.9 × 10^{−4}
3. Results
[16] All but three of the 112 lakes surveyed were supersaturated with CO_{2}, with pCO_{2} ranging from 351 μatm (slightly undersaturated) to 2512 μatm (sevenfold supersaturation). Mean pCO_{2} was 774 μatm, and 74% of the lakes were more than 150% supersaturated (Figure 1).
[17] pCO_{2} had statistically significant relationships with 21 of the 26 tested variables in the single-predictor models (Table 1), but TOC was by a good margin the best predictor. The identity link gamma-GLM model with TOC as the predictor (pCO_{2} = 426 (23.5) + 90.5 (7.7) TOC; numbers in parentheses are standard errors) explained 69% of the total deviance. The model intercept (426) was within the 95% confidence interval for air pCO_{2} (356–434). For comparison with the power function used in other studies, we also fitted a linear model on log-transformed pCO_{2} and TOC: log_{10}(pCO_{2}) = 2.72 (0.014) + 0.329 (0.023) log_{10}(TOC). The power function model explained 65% of the variation in pCO_{2}, but the model did not capture the asymptotic behavior at low TOC in the data set. Furthermore, applying the two modeling approaches to subranges of the data set (above and below the median TOC) revealed the power function model to be less robust to the underlying range of data than the identity link gamma-GLM model (Figure 2). The sample with the highest value for pCO_{2} (2512 μatm) was identified as an influential outlier with high leverage. Omitting the outlier from the model resulted in an increase in the amount of variance explained to 73% for the GLM model and 67% for the power function model, while the model coefficients remained within the original confidence intervals.
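The different behavior of the two fitted models at low TOC can be seen by evaluating them directly. The sketch below uses only the point estimates reported in the paragraph above (standard errors are ignored); the function names are hypothetical, and the TOC values in the loop are the minimum, median and maximum from Table 1 plus zero.

```python
def glm_pco2(toc):
    """Identity link gamma-GLM prediction (uatm) for TOC in mg L^-1."""
    return 426.0 + 90.5 * toc


def power_pco2(toc):
    """Back-transformed log-log (power function) prediction (uatm)."""
    return 10 ** 2.72 * toc ** 0.329


# Compare the two predictions across the observed TOC range and at zero
for toc in (0.0, 0.25, 3.1, 24.6):
    print(f"TOC={toc:5.2f}  GLM={glm_pco2(toc):7.1f}  power={power_pco2(toc):7.1f}")
```

At zero TOC the power function returns exactly zero, whereas the GLM returns its intercept of 426 μatm, which is the value compared with air pCO_{2} in the text.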
[18] Having established TOC as the best predictor of pCO_{2} in our data set, we proceeded to investigate how other factors may modify the effect of TOC on pCO_{2}. We did this by fitting gamma-GLM models using three sets of variables chosen to represent different environmental factors (lake, water and catchment properties) either alone or in interaction with TOC (Table 2). Among the variables representing physical properties of the lakes, only altitude and depth gave significant contributions and were thus included in the final model, which explained 46% of the variation in pCO_{2} (Table 2). With TOC included in the model, 79% of the variation in pCO_{2} was explained by TOC and its interactions with altitude and lake area. The effect of TOC on pCO_{2} decreased with increasing lake area and increasing altitude.
Table 2. Identity Link Gamma-GLM Models Using Different Sets of Predictor Variables Representing Lake, Water, and Catchment Properties as Well as Groundwater Indicators, Alone or in Interaction With TOC^{a}
Model  Single Variable Effects  TOC Interactions  R^{2}  BIC
Lake properties  altitude (−), depth (−), lake area (ns)  Not included in model  0.46  1538.8
Lake properties and TOC  TOC (+), altitude (−), depth (ns), lake area (ns)  TOC:altitude (−), TOC:lake area (−)  0.79  1430.8
Water chemistry  TotP (+), TotN (+), pH (ns), SO_{4} (ns), chl (−)  Not included in model  0.62  1465.6
Water chemistry and TOC  TOC (+), TotP (+), TotN (ns), pH (ns), SO_{4} (ns), chl (−)  No significant interactions  0.74  1424.4
Catchment  slope (ns), air temperature (ns), runoff (ns), NDVI (ns), Ndep (ns), forest (+), bog (+), catchment area (−), farmland (+), area ratio (−), residence time (ns)  Not included in model  0.69  1474.7
Catchment and TOC  TOC (+), slope (ns), air temperature (ns), runoff (+), NDVI (+), N deposition (−), forest (+), bog (ns), catchment area (−), arable (+), area ratio (−), residence time (+)  TOC:Ndep (+), TOC:runoff (−), TOC:area ratio (+), TOC:residence time (−)  0.85  1375.6
Groundwater indicator variables  ANC (ns), Ca (ns), Mg (ns), K (+), Na (ns)  Not included in model  0.23  1481.6
Groundwater indicators and TOC  TOC (+), ANC (ns), Ca (−), Mg (−), K (+), Na (ns)  None  0.83  1329.5
[19] The model based on parameters related to lake water chemistry and in-lake processes explained 62% of the variation. Unsurprisingly, chlorophyll a had a negative effect on pCO_{2}, while the opposite was true for TotP and TotN. There were no significant interactions with TOC in this subset of variables.
[20] When catchment properties were considered alone, five variables (Table 2) contributed significantly in a model that explained 69% of the variation in pCO_{2}, while 85% was explained when TOC was also included in the model. The interactions in the latter model suggest that the effect of TOC on pCO_{2} increases with N deposition and area ratio while it declines with the area-specific runoff and residence time. Air temperature, spanning 9.5°C across the data set, gave no significant contribution.
[21] Parameters related to the influx of groundwater (Ca, Mg, Na, K and ANC) explained 23% of the variation in pCO_{2} (Table 2). However, combining the groundwater indicator variables with TOC increased the amount of variance explained to 83%.
4. Discussion
[22] By far, the best single predictor of pCO_{2} was TOC. The six next-ranking predictors in Table 1 probably all owe their positions to direct or indirect relationships with TOC. In these pristine catchments, both total N and P are mostly in organic form and are closely correlated with TOC [Meili, 1992; Hessen et al., 2009], and may thus act primarily as proxies for TOC. This confounding with TOC probably overrides the expected negative effect of total P on pCO_{2} through stimulating primary production [cf. del Giorgio and Peters, 1994; Hanson et al., 2004]. The positive effect of N and P on pCO_{2} in high TOC lakes may also partly work through stimulating bacterial mineralization of organic matter. The strong positive single-predictor effects of catchment vegetation properties (forest, bog and NDVI) probably reflect their role as TOC sources, while the negative effect of altitude is related to the general decrease in vegetation density with elevation. The five nonsignificant single-predictor variables in Table 1 (catchment area, lake area, N deposition, chlorophyll and water residence time) all had significant contributions as covariates with other explanatory variables, or in interaction with TOC (Table 2).
[23] The multiple-predictor model using only the physical properties of the lakes (altitude, lake depth and lake area) as covariates explained 46% of the pCO_{2} variation, which increased to 79% when interactions with TOC were included. The negative interaction effect of altitude with TOC could indicate qualitative changes in TOC related to its terrestrial source. The negative interaction effect between TOC and lake area could be related to increased physical degassing of CO_{2} with wind fetch, which would be positively related to lake surface area.
[24] The multiple-predictor model using only water quality variables (total P, chlorophyll, pH and SO_{4}) as covariates explained 62% of the total deviance. The contrasting effects of chlorophyll and total P are attributed to the fact that total P correlates positively with TOC (r = 0.69, p < 2.2e−16), and probably acts as a proxy for it when TOC is absent from the model. By including TOC as a predictor variable, the water chemistry-related variables explained 74% of the deviance, which was only slightly more than the model with TOC as a single predictor. While chlorophyll, as a proxy for phytoplankton biomass, had no significant effect on pCO_{2} by itself (Table 1), it had negative contributions both when combined with other water quality variables, and in interaction with TOC (Table 2). The apparently minor role of chlorophyll in this survey could also be affected by the samples being taken late in the growing season.
[25] The multiple-predictor model with catchment-related parameters alone explained 69% of the variation in pCO_{2}, with significant positive contributions from fractional coverage of forest, bog and farmland and negative contributions from catchment area and the lake to catchment area ratio. Although the explanatory power of this model was somewhat less than that of the model with TOC as a single predictor (Table 1), it still had higher power for predicting pCO_{2} than what has been reported in many other studies [del Giorgio et al., 1999; Jonsson et al., 2003; Kelly et al., 2001; Prairie et al., 2002; Rantakari and Kortelainen, 2005; Roehm et al., 2009; Sobek et al., 2003]. The predictive power is particularly notable when considering that this model involves no variables that require actual water sampling; all the relevant catchment property parameters can be extracted from remote sensing products and digital land use, elevation and hydrology maps.
[26] Including both catchment properties and their interactions with TOC gave by far the best model for predicting pCO_{2}, explaining 85% of the total deviance. While DOC derived from bogs is expected to be older and more recalcitrant than DOC from other sources, the absence of significant interaction effects between bog or forest cover and TOC indicates that such effects played a minor role in our data set. The significant positive interaction between TOC and N deposition may indicate that TOC mineralization increases with increased nitrogen deposition.
[27] Inflow of CO_{2}-supersaturated groundwater could also serve as an important source of CO_{2} in lakes. Based on a study of more than 20,000 Swedish lakes, Humborg et al. [2009] claimed that the importance of groundwater was of the same order of magnitude as that of DOC. Groundwater influx could explain some 20–30% of the variation in pCO_{2}, which is consistent with the findings in our study (R^{2} = 0.23). The reported contribution of TOC to pCO_{2} was no more than 21%, however, in contrast to the 73% of variance explained by TOC alone in our study. Humborg et al. did not measure pCO_{2} directly but estimated it from lake chemistry variables. While this may have introduced some scatter in their data, it still does not explain why they arrived at such a low impact of DOC. The majority of the lakes in our study are located at relatively high altitude with a thin layer of topsoil and in areas dominated by bedrock which does not facilitate deep water infiltration. It is also noteworthy that, in our study, the best model according to the AIC value was based on TOC and K combined, which explained 83% of the observed variation in pCO_{2}. Assuming that K serves as a good proxy of groundwater impact, this lends support to the idea that groundwater makes a significant contribution to pCO_{2} in the studied lakes, although the contribution seems to be relatively minor compared to other studies [e.g., Humborg et al., 2009].
[28] Acknowledging DOC as the key predictor of pCO_{2}, it would be of major interest to assess the potential of this parameter for global prediction of pCO_{2} in lakes. Roehm et al. [2009] claimed that the different sources and types of DOC only allow for regional models, and supported this by a comparison of empirical models of lake pCO_{2} as a function of DOC from different regions of the world. In the commentary to their Figure 6, Roehm et al. [2009] point to the regionally different intercepts and slopes, leading them to conclude that “This pattern would suggest that changes in DOC loading or input to lakes have very different consequences in terms of surface water pCO_{2} in lakes of different regions, and this in turn could be related to fundamental differences in the nature of the DOC among regions.” As mentioned in our introduction, there is however no a priori reason to presume that a power function is the most adequate model for describing the relationship between pCO_{2} and DOC. In fact, elementary physical considerations speak against this, suggesting a nonzero intercept model such as the identity link gamma-GLM models which we have used in this work.
[29] Neither the gamma-GLM model nor the power function models managed to fully capture the relationship between pCO_{2} and TOC in Figure 2. The sensitivity of the power function models to subsets of the data range resulted in very different models with overabundances of positive residuals at either the high or the low end of the TOC gradient. The gamma-GLM model seemed to overestimate pCO_{2} at low TOC values, but was less affected by the subsetting of the data set. Thus, the gamma-GLM is more robust, with no difference in slope over data ranges, while the slopes of the power models depend on data range. Furthermore, a direct comparison of the fraction of deviance explained (R^{2}) is in favor of the gamma-GLM, which has the added advantage of producing physically plausible predictions when extrapolated to zero TOC.
[30] To further explore this, we gathered a wide range of literature data sets (some transcribed from tables, but most digitized from published figures) for comparing the performances of the two modeling approaches (Figure 3). The amount of variance explained by the GLM models was equal to or larger than the amount of variance explained by the power function models (Table 3). As expected from Roehm et al. [2009], there is a wide span in the power function model slopes and intercepts, which is not reflected in the corresponding gamma-GLM models. It should be noted that while the power function approach does not catch the asymptote at low TOC, the identity link gamma-GLM models do. The intercepts in the gamma-GLM models can be taken as estimates of pCO_{2} when no TOC is present. The gamma-GLM model based on the entire data collection has an intercept at 361 ppm CO_{2}. This value is consistent with the range of literature values on the partial pressure of CO_{2} in the ambient air, and with a priori considerations. This analysis suggests that the identity link gamma-GLM approach should be preferred over the power function model. Furthermore, the apparent region-specific differences in the correlation between DOC and pCO_{2} could be an artifact stemming from the power function model applied to data sets with different ranges of DOC. This concern should be addressed in future studies before drawing conclusions about regional differences in the nature of DOC and how this may influence the relationship between organic carbon and pCO_{2} in lakes.
Table 3. Power Function Model and Gamma-GLM Model Summaries on pCO_{2} as a Function of DOC^{a}
Reference  Power R^{2}  Power Intercept  Power Coefficient  GLM R^{2}  GLM Intercept  GLM Coefficient
del Giorgio et al. [1999]  0.06  2.52 (0.26)  0.39 (0.38)^{b}  0.12  349 (166.6)^{b}  64.3 (35.0)^{b}
Sobek et al. [2003]  0.51  2.26 (0.09)  0.80 (0.09)  0.52  212 (110.8)  110.1 (11.9)
Jonsson et al. [2003]  0.57  2.48 (0.04)  0.42 (0.05)  0.58  276 (34.34)  65.1 (8.6)
Roehm et al. [2009]  0.39  2.26 (0.08)  0.61 (0.09)  0.43  238 (74.3)  53.5 (9.0)
Humborg et al. [2009]  0.23  2.50 (0.10)  0.68 (0.11)  0.27  517 (163.6)  123.1 (19.8)
This paper  0.65  2.7 (0.01)  0.33 (0.02)  0.69  426 (23.5)  100.5 (8.6)
All data  0.46  2.59 (0.02)  0.46 (0.03)  0.48  360 (29.7)  95.4 (5.2)
[31] TOC was by far the most significant predictor variable for the partial pressure of carbon dioxide in Norwegian surface waters, while a suite of catchment property parameters also gave significant but minor contributions. This study supports the hypothesis that the elevated pCO_{2} in lakes primarily stems from microbial mineralization of allochthonous DOC. It also suggests that identity link gamma-GLM models are more appropriate than power function models for predicting pCO_{2} in lakes, and that their application makes it possible to reach reasonably accurate global models for how pCO_{2} relates to DOC and other environmental factors.