Assessing the nonconservative fluvial fluxes of dissolved organic carbon in North America



[1] Fluvial transport of dissolved organic carbon (DOC) is an important link in the global carbon cycle. Previous studies largely increased our knowledge of fluvial exports of carbon to the marine system, but considerable uncertainty remains about in-stream/in-river losses of organic carbon. This study presents an empirical method to assess the nonconservative behavior of fluvial DOC at continental scale. An empirical DOC flux model was trained on two different subsets of training catchments, one with catchments smaller than 2,000 km2 (n = 246, avg. 494 km2) and one with catchments larger than 2,000 km2 (n = 207, avg. 26,525 km2). A variety of potential predictors and controlling factors of fluvial DOC fluxes is discussed. The predictors retained for the final DOC flux models are runoff, slope gradient, land cover, and areal proportions of wetlands. According to the spatially explicit extrapolation of the models, in North America south of 60°N, the total fluvial DOC flux from small catchments (25.8 Mt C a−1, std. err.: 12%) is higher than that from large catchments (19.9 Mt C a−1, std. err.: 10%), giving a total DOC loss of 5.9 Mt C a−1 (std. err.: 78%). As DOC losses in headwaters are not represented in this budget, the estimated DOC loss is rather a minimum value for the total DOC loss within the fluvial network.

1. Introduction

[2] Rivers are the major pathway for land-ocean carbon fluxes, which are an important part of the global carbon cycle. Moreover, rivers are biogeochemical reactors with net fluxes of CO2 to the atmosphere and carbon burial in sediments [cf. Battin et al., 2008; Cole et al., 2007; Kempe, 1984]. It was estimated that globally about 2.7 Gt C a−1 are exported from the terrestrial system to inland waters, whereas only 0.9 Gt C a−1 are further exported to the marine system [Battin et al., 2009]. Accumulation of 0.6 Gt C a−1 in inland waters and a net efflux of 1.2 Gt C a−1 from inland waters to the atmosphere close the mass balance [Battin et al., 2009]. For streams and rivers, net CO2 evasion was estimated to amount to 0.56 Gt C a−1 [Aufdenkampe et al., 2011].

[3] Dissolved organic carbon (DOC) accounts for about 29% of the total fluvial carbon exports to the coasts [cf. Ludwig et al., 1996a], the main part stemming from soil drainage [Guéguen et al., 2006; Mulholland, 1997]. A substantial part of the terrestrial DOC flushed into streams and rivers may be lost due to respiration and photo-oxidation during fluvial transport or after being adsorbed to suspended or bed sediments, increasing the proportion of refractory, less decomposable components [Battin, 1999; Battin et al., 2008; Cole and Caraco, 2001; Dawson et al., 2001; Duan et al., 2007; Dubois et al., 2010; Rasera et al., 2008; Worrall et al., 2007]. Thus, lateral DOC fluxes calculated from data sampled at the rivers' mouths underestimate terrestrial DOC exports [Worrall et al., 2007].

[4] Studies analyzing the spatial variability of fluvial DOC fluxes and its controlling factors identified drainage intensity [Ludwig et al., 1996b] or discharge [Harrison et al., 2005], slope [Mulholland, 1997], soil organic carbon content [Aitkenhead et al., 1999], C/N ratios of soils [Aitkenhead-Peterson et al., 2005; Aitkenhead and McDowell, 2000], and specific land cover [Mattsson et al., 2009] as potential predictors for a spatially explicit estimation. However, approaches to predict the land-ocean DOC transfer from such environmental controlling factors [e.g., Aitkenhead and McDowell, 2000; Harrison et al., 2005; Ludwig et al., 1996b] assume fluvial DOC fluxes to be conservative, and thus disregard in-stream/in-river losses of DOC. The fact that similar predictors are identified from local to global scale supports this assumption [cf. Aitkenhead and McDowell, 2000].

[5] On the other hand, recent global estimates of carbon losses during the lateral transport from terrestrial systems to the coasts [Battin et al., 2008, 2009; Cole et al., 2007] are based on literature reviews and do not depict the spatial variability of carbon fluxes. At the regional scale, Worrall et al. [2007] made an attempt to assess the spatial variability in DOC losses within the fluvial system of England and Wales, calculating estimates on DOC respiration from measurements of the biochemical oxygen demand.

[6] In this study, DOC fluxes from small and large catchments, i.e., DOC fluxes at more upstream and more downstream parts, are compared in a high resolution, spatially explicit estimation approach. For this, an empirical equation is set up which estimates fluvial DOC fluxes from geospatial data that represent potential controlling factors. The empirical equation was trained on two subsets of river catchments, divided by catchment size into smaller and larger catchments. Assuming that the large training catchments have already experienced more in-stream/in-river DOC loss than the small catchments, it is hypothesized that estimates of in-river DOC loss can be calculated as the difference between the respective estimates. Further, the applied predictors and their different quantitative effects in both empirical fits of the estimation equation are analyzed and discussed.

2. Methods and Data

2.1. Data Preparation

[7] Hydrochemical data of 453 sampling locations from various sources [Alexander et al., 1997; Pacific and Yukon Water Quality Monitoring and Surveillance Program, Environment Canada, available at, accessed on 25 May 2009; Atlantic Environdat Water Quality Database, Environment Canada, available at, accessed on 20 May 2010; Environment River Network Station Water Quality Data, Government of Alberta, available at, accessed 1 April 2009; Provincial Stream Water Quality Monitoring Network (PWQMN), Ontario Ministry of the Environment, available at, accessed on 31 March 2009; National Water Information System, U.S. Geological Survey, available at, accessed on 24 June 2009] and the UNH/GRDC runoff data set [Fekete et al., 2002] were used to calculate long-term annual DOC fluxes. The UNH/GRDC runoff data set is a global, spatial data set (grid cells, resolution 30′) which combines long-term discharge gauging data (which provide certain information on total runoff from river basins) with climate-driven water balance model outputs (which provide information on the spatial heterogeneity of runoff) [Fekete et al., 2002]. These runoff composites give long-term mean annual runoff and long-term mean monthly runoffs.

[8] The sampling locations were selected following three criteria: (i) DOC concentrations for at least twelve consecutive months were available; (ii) the sampling location could be satisfactorily positioned on the stream network (Hydrosheds by Lehner et al. [2008], 15″ resolution) and the derived catchments were consistent with satellite imagery (no streams crossing the catchment boundaries); and (iii) the sampling location was not within an urban area, nor was the catchment dominated by urban or industrial areas.

[9] From the hydrochemical time series data of each sampling location, as many nonoverlapping series of twelve consecutive monthly measurements as possible were identified (on average 3.1 years per station). Long-term mean monthly DOC fluxes were calculated from mean monthly DOC concentrations and monthly UNH/GRDC runoff data. The long-term mean monthly DOC fluxes were then summed up to long-term mean annual DOC fluxes. By this procedure, the monthly DOC concentrations are implicitly weighted by the monthly UNH/GRDC runoff data. This procedure is considered the best practice, because instantaneously measured or daily discharges, which could also be used for the weighting of DOC concentrations, are often missing or inconsistently reported in the hydrochemical data sets.
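The flux calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name, the units assumed for the inputs (DOC in mg/L, runoff in mm per month), and the example values are assumptions made here for clarity.

```python
def annual_doc_flux(monthly_doc_mg_per_l, monthly_runoff_mm):
    """Return the annual specific DOC flux in t C km^-2 a^-1.

    1 mg/L equals 1 g/m^3, and 1 mm of runoff equals 0.001 m^3 of water
    per m^2 of catchment, so concentration * runoff / 1000 gives g C m^-2,
    which is numerically identical to t C km^-2.
    """
    if len(monthly_doc_mg_per_l) != 12 or len(monthly_runoff_mm) != 12:
        raise ValueError("expected twelve monthly values for each input")
    # Summing monthly products weights each concentration by its runoff.
    return sum(c * q / 1000.0 for c, q in zip(monthly_doc_mg_per_l, monthly_runoff_mm))


# Hypothetical catchment: constant 5 mg/L DOC and 40 mm runoff every month.
flux = annual_doc_flux([5.0] * 12, [40.0] * 12)
print(round(flux, 2))  # 2.4
```

Because each monthly concentration is multiplied by the runoff of the same month before summing, months with high runoff dominate the annual flux, which is the implicit weighting mentioned in the text.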

[10] Based on different sets of geodata (Table 1), the catchments of each of the sampling locations were delineated (based on Hydrosheds 15 s flow directions) and their catchment properties were calculated. For this, all of the raster data sets representing catchment properties were projected to Lambert Azimuthal Equal Area projection and resampled to a resolution of 1 km × 1 km cell size. Data sets with a similar or coarser resolution in decimal degrees were resampled to 1 km × 1 km resolution using the nearest neighbor method (Standard option in ArcGIS 9.3/Data management tools/Resample (cf., ESRI, 1995–2011, accessed 1 December 2011)). Data sets with a finer resolution in decimal degrees were first resampled to cell sizes corresponding to the original cell size (GlobCover: 200 m, SRTM DEM: 100 m, Worldclim: 200 m), also using the nearest neighbor technique. The resulting raster cells were then aggregated to 1 km × 1 km grid cells assigned the average of the finer cells.
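The aggregation step for the finer data sets can be illustrated with a block average. This is only a sketch under simplifying assumptions (a raster already projected to an equal-area grid, dimensions that are exact multiples of the aggregation factor, no missing values); the function name is hypothetical and the original processing was done in ArcGIS.

```python
import numpy as np


def aggregate_mean(fine, factor):
    """Average non-overlapping factor x factor blocks of a 2-D raster."""
    rows, cols = fine.shape
    if rows % factor or cols % factor:
        raise ValueError("raster dimensions must be multiples of the factor")
    # Split each axis into (block index, position within block) and average
    # over the two within-block axes.
    blocks = fine.reshape(rows // factor, factor, cols // factor, factor)
    return blocks.mean(axis=(1, 3))


# Hypothetical 10 x 10 raster of 200 m cells, aggregated to 2 x 2 cells of 1 km.
fine = np.arange(100, dtype=float).reshape(10, 10)
coarse = aggregate_mean(fine, 5)
print(coarse.shape)  # (2, 2)
```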

Table 1. Geodata Used to Derive Catchment Properties

| Parameter | Notation | Data Set | Scale/Resolution |
| --- | --- | --- | --- |
| Flow directions | | Hydrosheds DEM [Lehner et al., 2008] | 15″ |
| Catchment area | | Derived from flow directions (see above) | |
| Slope gradient | s | Derived from SRTM DEM [Jarvis et al., 2006] | 3″ |
| Modified compound topographic index | CTImod | For each sampling location, calculated from catchment area and catchment slope gradient (equation (1)) | |
| Runoff | q | UNH/GRDC runoff composites [Fekete et al., 2002] | 30′ |
| Land cover | ABF, ACF, AHV, A (see text) | GlobCover [Arino et al., 2007] | 300 m |
| Wetlands | AWL | Global Lake and Wetland Data set (GLWD) [Lehner and Döll, 2004] | 30″ |
| Avg. annual precipitation | | Worldclim [Hijmans et al., 2005] | 30″ |
| Avg. annual air temperature | | Worldclim [Hijmans et al., 2005] | 30″ |
| Topsoil organic carbon content | topsoil Corg | Harmonized World Soil Database v 1.1 (HWSD) [Food and Agriculture Organization, 2009] | Based on soil maps at 1:5 mil. to 1:1 mil. |
| Topsoil clay content | topsoil clay | Harmonized World Soil Database v 1.1 (HWSD) [Food and Agriculture Organization, 2009] | Based on soil maps at 1:5 mil. to 1:1 mil. |
| Net primary production | NPP | Calculated as mean annual NPP based on monthly Terra/MODIS data from Nov. 2007 to Oct. 2011 (NASA Earth Observatory Team) | 6′ |
| Lake area proportion | | SRTM Water Body Data Set (SRTM water body data product specific guidance, version 2.0, NASA and NGA, 2003) | Based on SRTM DEM with 3″ resolution |

[11] Within the study area, the Global Lake and Wetland Data Base (GLWD) [Lehner and Döll, 2004] distinguishes different coherent wetlands by type, while for extensive parts, only the proportion range of total wetland cover (‘25% to 50%’ and ‘50% to 100%’) is given. To gain a consistent wetland proportion data set, it was assumed that each of the distinguished wetland types takes an areal proportion of 1 within the respective raster cells. For the wetland proportion classes, the mean of the given range, i.e., 0.375 and 0.75, was assigned.
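The assignment rule can be written as a small lookup. The class labels below are hypothetical placeholders rather than actual GLWD codes, and treating unclassified cells as wetland-free is an assumption made for this sketch.

```python
# Proportion ranges for the two wetland proportion classes named in the text.
# The string labels are hypothetical, not the data set's own class codes.
RANGE_CLASSES = {"25-50%": (0.25, 0.50), "50-100%": (0.50, 1.00)}


def wetland_proportion(label):
    """Map a wetland class label to an areal proportion between 0 and 1."""
    if label is None:
        return 0.0  # assumption: cells without a wetland class hold no wetland
    if label in RANGE_CLASSES:
        low, high = RANGE_CLASSES[label]
        return (low + high) / 2.0  # midpoint of the reported range
    return 1.0  # any explicitly typed wetland is treated as full cover


print(wetland_proportion("25-50%"), wetland_proportion("50-100%"))  # 0.375 0.75
```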

[12] Based on catchment area and average catchment slope gradient (s), an index was calculated (equation (1)) that is similar to the compound topographic index (CTI), which is used to predict soil erosion and deposition areas [cf., e.g., Gessler et al., 1995, 2000]

equation image

Originally, this index uses the slope gradient at a specific point instead of the average slope gradient of the point's contributing area. However, in flat areas where the slope gradient is generally low, the original index is of little use. That is generally the case for rivers themselves. Thus, the modified version of this parameter, based on the catchment slope gradient, was considered as a potential predictor for fluvial DOC fluxes.

2.2. Statistical Analysis

2.2.1. DOC Flux Estimation

[13] Multiple linear regression was used to identify catchment properties as predictors of the spatial variability in fluvial DOC fluxes (applying the software Statistica 9.0) (equation (2)). For each predictor xi in the regression equation, a factor bi (b-estimate) is derived by regression analysis, expressing the variable's effect on the specific DOC fluxes fDOC [t C km−2 a−1].

$$f_{DOC} = \sum_{i=1}^{n} b_i \, x_i \qquad (2)$$

From the multitude of possible predictor sets, one model setup was chosen as the best model based on the Akaike Information Criterion (AIC) [Akaike, 1974] and the condition that each of the b-estimates assigned to the predictor variables is statistically significant (p < 0.05).
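The selection procedure can be illustrated with a short sketch. It assumes a regression without an intercept term (none is listed among the reported b-estimates) and uses the common least-squares form of the AIC, n ln(RSS/n) + 2k, which is defined only up to an additive constant; the function names and the synthetic data are hypothetical, and the significance condition (p < 0.05) is not implemented here.

```python
from itertools import combinations

import numpy as np


def fit_no_intercept(X, y):
    """Least-squares b-estimates for y = X @ b (no intercept term)."""
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b


def aic_least_squares(X, y, b):
    """AIC for a Gaussian least-squares fit, up to an additive constant."""
    n, k = X.shape
    rss = float(np.sum((y - X @ b) ** 2))
    return n * np.log(rss / n) + 2 * k


def rank_models(X, y, names):
    """Fit every non-empty predictor subset and sort the results by AIC."""
    results = []
    for size in range(1, len(names) + 1):
        for cols in combinations(range(len(names)), size):
            Xs = X[:, list(cols)]
            b = fit_no_intercept(Xs, y)
            results.append((aic_least_squares(Xs, y, b), [names[c] for c in cols]))
    return sorted(results, key=lambda r: r[0])


# Synthetic example: the response depends on "q" and "s" but not on "noise".
rng = np.random.default_rng(0)
n = 200
X = np.column_stack([rng.uniform(0, 1, n), rng.uniform(0, 20, n), rng.uniform(0, 1, n)])
y = 3.0 * X[:, 0] - 0.2 * X[:, 1] + rng.normal(0, 0.1, n)
ranking = rank_models(X, y, ["q", "s", "noise"])
print(ranking[0])  # lowest-AIC model and its predictors
```

An exhaustive search like this is feasible only for a small number of candidate predictors; the source does not state how the candidate sets were enumerated.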

[14] The regression equation was refitted for two subsets of training catchments: one subset with catchment areas below 2,000 km2 (n = 246, ‘small catchments’) and one subset with catchment areas above 2,000 km2 (n = 207, ‘large catchments’) (Figure 1). The threshold of 2,000 km2 was chosen in order to gain two subsets of similar sample size. Note that the ‘small catchments’ have an average area of 494 km2. They are several times larger than catchments of small headwater streams which were often studied in local studies [e.g., Aitkenhead et al., 1999; Dawson et al., 2001]. The ‘large catchments’ have a mean area of 26,525 km2 and are thus on average more than 53 times larger than the ‘small catchments’.

Figure 1.

Specific DOC fluxes versus catchment area. The whole set of 453 training catchments was divided into two subsets, one with catchments smaller than 2,000 km2 (‘small catchments’) and one with catchments larger than 2,000 km2 (‘large catchments’). The threshold of 2,000 km2 (straight vertical line) was chosen to gain subsets of similar sample size.

[15] The regression equations, representing the same set of predictors but different fits for different catchment size classes, were used for DOC flux estimations and compared against one another. An important criterion for a regression equation used for predictions is the normal distribution of the regression residuals (equation (3)) and the absence of notable correlations of any potential predictor with the residuals or relative residuals (equation (4)) of the regression.

equation image
equation image

The regression equations fitted for the small (Regression S) and the large (Regression L) catchments were used to calculate spatially explicit estimates of fDOC from the geospatial source data of the predictors (1 km × 1 km raster for continental North America south of 60°N). This geographical restriction was necessary, as the applied digital elevation data (SRTM) are only available up to this latitude. Negative values in the resulting rasters, which can occur if one or more of the predicting variables have a negative influence on the expected variable, are forced to 0 t C km−2 a−1, as negative mobilization (i.e., DOC flux from the fluvial system to the terrestrial system) is not expected. As the focus is on the mobilization of DOC from the terrestrial system into surface waters, the estimated DOC mobilization within mapped lake areas (according to SRTM water body data product specific guidance, version 2.0, NASA and NGA, 2003) is also forced to 0 t C km−2 a−1. In the following, the spatially explicit estimates of fluvial DOC fluxes derived from the small catchments are denoted as fDOC,mod[S]. Estimates derived from the large catchments are denoted as fDOC,mod[L].
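A cell-by-cell application of a fitted equation, including the two forcing rules, might look like the sketch below. The land cover and wetland b-estimates are the Regression S values from Table 4; the runoff and slope coefficients are rounded figures implied by the worked example in the text (0.31 t C km−2 a−1 per 100 mm of runoff, −0.21 t C km−2 a−1 per degree), and the dictionary keys, function name, and three-cell raster are hypothetical.

```python
import numpy as np

# Regression S b-estimates. Land cover and wetland values are from Table 4;
# the runoff (per m a^-1) and slope (per degree) values are rounded figures
# implied by the text and are used here for illustration only.
B_S = {"q": 3.1, "s": -0.21, "A_BF": 0.82, "A_CF": 2.95, "A_HV": 1.99, "A_WL": 1.68}


def predict_fdoc(layers, b, lake_mask):
    """Apply the linear model cell by cell, then clamp and mask the result."""
    fdoc = sum(b[name] * layers[name] for name in b)
    fdoc = np.maximum(fdoc, 0.0)           # negative mobilization is not expected
    return np.where(lake_mask, 0.0, fdoc)  # no mobilization within lake areas


# Hypothetical cells: forested lowland, steep dry terrain, and a lake.
layers = {
    "q": np.array([0.5, 0.1, 0.5]),
    "s": np.array([2.0, 20.0, 0.0]),
    "A_BF": np.zeros(3),
    "A_CF": np.array([1.0, 0.0, 0.0]),
    "A_HV": np.zeros(3),
    "A_WL": np.zeros(3),
}
lake_mask = np.array([False, False, True])
result = predict_fdoc(layers, B_S, lake_mask)
print(result)
```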

2.2.2. DOC Loss Estimation

[16] A constraint on DOC loss within the fluvial system was calculated in two different budget approaches. For the large catchments, the in-river DOC loss was estimated as the difference between fDOC,mod[S] and the specific DOC flux calculated from the hydrochemical source data (fDOC,calc) (equation (5)). For the extrapolation area, the in-river loss is estimated based on fDOC,mod[L] and fDOC,mod[S] (equation (6)).

equation image
equation image

Note that these budgets address the DOC loss within river stretches along which the contributing area increases from the size of the ‘small catchments’ to the size of the ‘large catchments’. As DOC losses within smaller streams and the lower reaches of larger rivers remain unaccounted for, these budgets represent conservative constraints on DOC losses from the fluvial system.
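The two budgets can be sketched numerically. Expressing the loss as a rate relative to fDOC,mod[S] is an assumption made here; it is chosen because it reproduces the percentages reported in the results (19% for the large catchments in Table 6 and 23% for the extrapolation area) and because the caption of Figure 3 mentions invalid loss rates caused by division by 0. The function name is hypothetical.

```python
import math


def doc_loss(f_upstream, f_downstream):
    """Return (absolute loss, loss rate) between two specific DOC fluxes.

    f_upstream is the estimate based on the small catchments (fDOC,mod[S]);
    f_downstream is either fDOC,calc or fDOC,mod[L]. The rate is undefined
    (NaN) where the upstream estimate is zero.
    """
    loss = f_upstream - f_downstream
    rate = loss / f_upstream if f_upstream != 0 else math.nan
    return loss, rate


# Large training catchments: average fDOC,mod[S] versus average fDOC,calc (Table 6).
loss_large, rate_large = doc_loss(1.35, 1.09)
# Extrapolation area: average fDOC,mod[S] versus average fDOC,mod[L].
loss_area, rate_area = doc_loss(1.82, 1.40)
print(f"{rate_large:.0%} {rate_area:.0%}")  # 19% 23%
```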

3. Results

3.1. Setup of Empirical DOC Flux Equations

3.1.1. Regression Fitting and Identification of Predictors

[17] Using the set of all 453 training catchments, the best model identified, i.e., the model with the lowest AIC and for which all fitted b-estimates are statistically significant (p < 0.05), contains the predictors average slope gradient (catchment slope gradient, s), annual runoff (q), areal proportions of coniferous forests (ACF), broadleaf forests (ABF), herbaceous vegetation (AHV), and wetlands (AWL), net primary production (NPP), and clay content of the topsoil (topsoil clay) (Regression A.1 in Table 2; for a list of all considered parameters see Table 1). Note that the b-estimates express the effects the predictors take on the predicted specific DOC flux in t C km−2 a−1 per unit of the respective predictor. Slope gradient and topsoil clay content are assigned negative b-estimates, while the remaining predictors are assigned positive b-estimates.

Table 2. Fit of Regression Equations for All Training Catchments (n = 453)^a

| Predictor | A.1^b: b | A.1: Std. Err | A.1: VIF | A.2^c: b | A.2: Std. Err | A.2: VIF | A.3^d: b | A.3: Std. Err | A.3: VIF |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| s (deg) | − | | | − | | | − | | |
| q (m a−1) | 2.69 | 0.19 | 3.76 | 2.84 | 0.18 | 3.35 | 3.06 | 0.18 | 3.18 |
| ABF (1) | 0.85 | 0.35 | 7.14 | 1.42 | 0.25 | 3.76 | 0.72 | 0.20 | 2.29 |
| ACF (1) | 2.68 | 0.35 | 8. | | | | | | |
| AHV (1) | 3.43 | 0.56 | 4.27 | 3.58 | 0.56 | 4.22 | 1.73 | 0.36 | 1.69 |
| AWL (1) | 1.18 | 0.41 | 1.62 | 1.18 | 0.41 | 1.62 | 1.56 | 0.41 | 1.54 |
| Topsoil clay (1) | −3.22 | 0.69 | 7.45 | −2.94 | 0.68 | 7.24 | | | |
| NPP (kg C m−2 a−1) | 0.89 | 0.37 | 14.37 | | | | | | |

^a Regression A.1 represents the regression equation chosen based on the AIC and the condition that each predictor is statistically significant (p < 0.05). For regression A.2, NPP was discarded from the set of predictors because of its high variance inflation factor (VIF). Regression A.3 represents the set of predictors which was used for fitting the regression equation for the subsets of small and large training catchments (Table 4).
^b r2 = 0.62, and AIC = 1,253.
^c r2 = 0.61, and AIC = 1,268.
^d r2 = 0.59, and AIC = 1,309.

[18] With regard to land cover, it has to be noted that only three land cover classes (ACF, ABF, AHV) were included in the model. The remaining land cover classes were taken in sum as reference land cover (Aother), for which fDOC solely depends on the remaining predictors that are not related to land cover. The reference land cover is mainly composed of agricultural land and shrub land (within training catchments: 69% and 16%, respectively). Compared to the reference land cover, the three included land cover classes contribute higher specific DOC fluxes according to the assigned b-estimates. Coniferous forests yield most fDOC, followed by herbaceous vegetation and broadleaf forests.

[19] Note that the predictors are to some extent correlated with each other (Tables 3 and A2). This multicollinearity affects the b-estimates and the standard errors associated with the b-estimates of the single predictors. A measure representing these effects is the variance inflation factor (VIF). A VIF of 10 is usually taken as the threshold above which the multicollinearity effect is considered severe and the respective predictor should be discarded [cf. O'Brien, 2007]. In the case of regression A.1 (Table 2), the multicollinearity effect is severe for NPP (VIF = 14.37). Discarding this predictor (Regression A.2 in Table 2) does not significantly change the r2 nor the b-estimates of the predictors slope gradient, topsoil clay content, and areal proportion of wetlands. In contrast, the b-estimates for the areal proportions of coniferous forests and broadleaf forests are substantially increased in regression A.2 (compared to regression A.1), while the VIFs for these predictors are substantially decreased. This indicates that specifically the intercorrelation with these two predictors led to the high VIF for NPP in regression A.1. It is concluded that NPP is a redundant predictor and thus regression A.2 is preferred.
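For reference, the VIF of a predictor is 1/(1 − R²), where R² comes from regressing that predictor on all other predictors. The sketch below uses the textbook definition with an intercept in each auxiliary regression; how the statistics package handled this for the reported fits is not stated in the text, so the numbers it produces are illustrative only. The function name and synthetic data are hypothetical.

```python
import numpy as np


def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), regressing column j on the other columns."""
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        # Auxiliary regression of predictor j on an intercept and the rest.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Synthetic predictors: x2 nearly duplicates x0, while x1 is independent.
rng = np.random.default_rng(0)
x0 = rng.normal(size=500)
x1 = rng.normal(size=500)
x2 = x0 + 0.1 * rng.normal(size=500)
vifs = variance_inflation_factors(np.column_stack([x0, x1, x2]))
print(np.round(vifs, 1))
```

In this example the two nearly redundant columns both exceed the threshold of 10, which mirrors the situation described for NPP and the forest proportions: removing one of the redundant predictors lowers the VIF of the other.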

Table 3. Correlations of the Applied Predictors (for Regression S and L) Within Both Subsets of Training Catchmentsa
Columns: Mean | Std. Dev. | q (m a−1) | s (deg) | ABF (1) | ACF (1) | AHV (1) | Aother (1) | AWL (1)
^a Statistically significant correlations (p < 0.05) are highlighted in bold.

Small Catchments, n = 246
q (m a−1)0.540.391      
s (deg)5.175.610.281     
ABF (1)0.340.280.070.151    
ACF (1)0.320.330.270.340.521   
AHV (1)  
Aother (1) 
AWL (1)−0.01−0.05−0.071
fDOC,calc (t C km−2 a−1)−
Large Catchments, n = 207
q (m a−1)0.300.231      
s (deg)6.634.790.211     
ABF (1)    
ACF (1)0.300.260.240.660.411   
AHV (1)  
Aother (1) 
AWL (1)0.080.12−−0.070.12−0.081
fDOC,calc (t C km−2 a−1)1.090.920.620.

[20] For the analysis of scale effects in the spatially explicit assessment of fluvial DOC fluxes, regression equation A.2 was refitted for the subsets of small and large training catchments. However, for the large training catchments, the predictor topsoil clay content was statistically insignificant (p > 0.05) and consequently discarded. For comparability reasons, the same set of predictors was applied for both subsets of training catchments, i.e., a modified regression equation comprising the predictors slope gradient, runoff, areal proportions of coniferous forests, broadleaf forests, herbaceous vegetation, and wetlands (Table 4, corresponds to Regression A.3 in Table 2). The proportion of explained variance in fDOC for both fits is close to that of the regression fits for the total set of 453 catchments (Table 2). The VIF are below 10 for every predictor in regression equations S and L (Table 4). The residuals (equation (3)) of the fitted regression equations show nearly normal distributions (Figure 2), indicating that the conditional expected values fit the conditional mean values of fDOC,calc well over the value range of the applied predictors within each subset of training catchments. Thus, these regression fits can be used for prediction of fDOC.

Table 4. Fit of Regression Equations for Small (Regression S, n = 246) and Large (Regression L, n = 207) Catchments^a

| Predictor | S^b: b | S: Std. Err | S: VIF | L^c: b | L: Std. Err | L: VIF |
| --- | --- | --- | --- | --- | --- | --- |
| s (deg) | − | | | − | | |
| q (m a−1) | | | | | | |
| ABF (1) | 0.82 | 0.30 | 2.15 | 0.50 | 0.19 | 3.00 |
| ACF (1) | 2.95 | 0.32 | 2.51 | 1.32 | 0.24 | 5.55 |
| AHV (1) | 1.99 | 0.62 | 1.55 | 1.07 | 0.28 | 2.09 |
| AWL (1) | 1.68 | 0.63 | 1.47 | 1.81 | 0.37 | 1.76 |

^a Note that predictor topsoil clay had to be discarded, because fitting this predictor for the large training catchments did not yield a statistically significant b-estimate.
^b r2 = 0.55.
^c r2 = 0.60.

Figure 2.

Histograms of residuals of the regression equations (a) S (small catchments) and (b) L (large catchments) (equation (3)). Means, minima, maxima and standard deviations of this difference are reported in units of the x axis. The curve indicates the theoretical normal distribution based on the mean and the standard deviation.

3.1.2. Analysis of Predictors Weight on Estimated DOC Fluxes

[21] According to the structure of the multiple linear regression equations S and L, an increase in one of the predictors would cause a proportional increase or, in the case of slope gradient, decrease in estimated fDOC. The b-estimates (Table 4) give the amount of estimated fDOC in t C km−2 a−1 that the respective predictors contribute per unit.

[22] Assuming all other predictors to be 0, for a runoff of 100 mm a−1, a specific DOC flux of 0.31 t C km−2 a−1 (Regression S) or 0.27 t C km−2 a−1 (Regression L) would be estimated (Table 4). Increasing the average slope gradient by 1° would reduce this specific flux by 0.21 t C km−2 a−1 (Regression S) or 0.09 t C km−2 a−1 (Regression L). Thus, each increase of 1° in the average slope gradient would counterbalance the effect of an increase in runoff by 67 mm a−1 (Regression S) or 33 mm a−1 (Regression L) (Table 5).
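The equivalence between predictors follows directly from the ratio of their b-estimates. The sketch below uses the rounded figures quoted in the paragraph above (b-estimates for runoff of about 3.1 and 2.7 t C km−2 a−1 per m a−1, and for slope of −0.21 and −0.09 t C km−2 a−1 per degree), so the results differ slightly from the published equivalents, which were presumably computed from unrounded values. The function name is hypothetical.

```python
def runoff_equivalent_mm(b_predictor, b_runoff_per_m):
    """Runoff change (mm a^-1) with the same effect as one unit of the predictor."""
    return b_predictor / b_runoff_per_m * 1000.0


# One degree of slope expressed as an equivalent change in runoff.
slope_eq_s = runoff_equivalent_mm(-0.21, 3.1)  # Regression S, close to -67 mm a^-1
slope_eq_l = runoff_equivalent_mm(-0.09, 2.7)  # Regression L, close to -33 mm a^-1
print(round(slope_eq_s, 1), round(slope_eq_l, 1))
```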

Table 5. Comparison of Effects the Retained Predictors Take on the Estimated Specific DOC Flux in the Regression Equations S and L^a

| Regression | Equivalent in | q (mm a−1) | s (deg) | ACF (1) | ABF (1) | AHV (1) | AWL (1) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| S (small catchments) | q (mm a−1) | 1 | −67 | 953 | 267 | 642 | 543 |
| S (small catchments) | s (deg) | −0.015 | 1 | −14.2 | −4.0 | −9.6 | −8.1 |
| L (large catchments) | q (mm a−1) | 1 | −33 | 481 | 184 | 392 | 662 |
| L (large catchments) | s (deg) | −0.030 | 1 | −14.4 | −5.5 | −11.7 | −19.8 |

^a For each predictor, the effects per unit are expressed in equivalent values of q and s taking the same effect on the estimated specific DOC fluxes. Note that for q the unit mm a−1 is used instead of m a−1 being used in the regression equations. For the areal proportions of the distinguished land cover classes (ACF, ABF, AHV), the equivalents refer to a proportion of 100% of the land cover class compared to a 100% proportion of the reference land cover (Aother, see text). For AWL this refers to 100% wetland proportion compared to 0% wetland proportion.

[23] Compared to slope and runoff, the areal proportions of the land cover classes and wetlands have a theoretically more constrained value range, with 0 as minimum proportion and 1 as maximum proportion. For instance, a land cover of 100% coniferous forests would add 2.95 t C km−2 a−1 (Regression S) or 1.32 t C km−2 a−1 (Regression L) to the estimated fDOC compared to a land cover with 0% of the explicitly included land cover classes. Thus, the effect of a 100% areal proportion of coniferous forests on estimated fDOC is equivalent to the effect of an increase in runoff by 953 mm a−1 (Regression S) or 481 mm a−1 (Regression L) (Table 5), and would counterbalance the negative effect of an increase in average slope gradient by about 14° for both regression equations. If the negative effect of slope gradient is not fully counterbalanced by the combined positive effects of the other predictors, the estimated specific DOC flux becomes negative. However, that was the case for only 18 of the 246 small catchments (using Regression S) and 11 of the 207 large catchments (using Regression L).

[24] Relative to runoff (Table 5), the predictors slope gradient and areal proportions of coniferous forests (ACF), broadleaf forests (ABF), and herbaceous vegetation (AHV) have substantially lower effects on the estimated fDOC in the large catchments than in the small catchments. The effects of wetland proportions on the estimated fDOC, on the contrary, are rather similar within both sets of training catchments.

3.2. Spatially Explicit Application of the Regression Equations

[25] The regression equations S and L were applied spatially explicitly to the area of continental North America south of 60°N (Figures 3a and 3b). In the following, the respective estimates are denoted as fDOC,mod[S] (Regression S) and fDOC,mod[L] (Regression L). The general spatial patterns of fDOC,mod[S] and fDOC,mod[L] (Figures 3a and 3b) are quite similar, with low values for the dry southwestern part and high values in the northeastern part (Hudson Bay lowland area and Eastern Canada) and along the East Coast and West Coast of the U.S.

Figure 3.

Spatial variation of (a) fDOC,mod[S] and (b) fDOC,mod[L], (c) calculated in-stream/in-river DOC loss, and (d) distribution of training catchments. Note that in the spatially explicit estimates, negative DOC fluxes and DOC fluxes from lake areas have been forced to 0 t C km−2 a−1. Nonvalid values of DOC loss rates due to division by 0 are assigned no data (gray). Superscript “a”: literature values for the depicted areas are discussed in section 4.1.1.

[26] For the whole extrapolation area, the averages of fDOC,mod[S] and fDOC,mod[L] are 1.82 t C km−2 a−1 and 1.40 t C km−2 a−1, respectively. The estimated total DOC fluxes within the study area are 25.8 Mt C a−1 (referring to fDOC,mod[S]) and 19.9 Mt C a−1 (referring to fDOC,mod[L]), giving a total DOC loss of 5.9 Mt C a−1 and an average DOC loss rate of 23% within respective river stretches.
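The reported totals and averages can be cross-checked with simple arithmetic. In the sketch below, the helper name is hypothetical and the implied extrapolation area is a derived quantity that is not stated in the text; it serves only as a consistency check between the two estimates.

```python
def implied_area_mkm2(total_flux_mt, mean_specific_flux_t_km2):
    """Area (million km^2) implied by a total flux (Mt C a^-1) and a mean specific flux."""
    # 1 Mt divided by 1 t km^-2 equals 1 million km^2.
    return total_flux_mt / mean_specific_flux_t_km2


area_s = implied_area_mkm2(25.8, 1.82)  # from fDOC,mod[S]
area_l = implied_area_mkm2(19.9, 1.40)  # from fDOC,mod[L]
total_loss = 25.8 - 19.9                # Mt C a^-1
loss_rate = total_loss / 25.8           # relative to the small-catchment estimate
print(round(area_s, 2), round(area_l, 2), round(total_loss, 1), f"{loss_rate:.0%}")
```

Both estimates imply nearly the same area (about 14.2 million km²), and the difference of 5.9 Mt C a−1 corresponds to the reported loss rate of 23%.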

[27] Standard errors of the predictions were derived from Monte Carlo simulations (100,000 runs each) based on the regression equations S and L. These Monte Carlo simulations apply the b-estimates and their standard errors to the average predictor values within the entire study area (lumped estimation). Note that it is not feasible to directly calculate the standard errors of the spatially explicit estimation due to limited computing capacity. The relative standard errors of these lumped predictions are considered to be representative for the spatially explicit application of the regression equations to the study area.
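A lumped Monte Carlo estimate of this kind might be sketched as follows. Several simplifications are assumptions made here and not taken from the source: the b-estimates are sampled from independent normal distributions (covariances between them are ignored), only the four Regression S predictors whose standard errors are listed in Table 4 are included, and the mean predictor values are hypothetical.

```python
import numpy as np


def lumped_monte_carlo(b, se, x_mean, runs=100_000, seed=0):
    """Sample b-estimates from independent normals and predict at the mean predictors."""
    rng = np.random.default_rng(seed)
    draws = rng.normal(loc=b, scale=se, size=(runs, len(b)))
    predictions = draws @ np.asarray(x_mean, dtype=float)
    return predictions.mean(), predictions.std(ddof=1)


# Regression S b-estimates and standard errors for ABF, ACF, AHV, AWL (Table 4).
b = np.array([0.82, 2.95, 1.99, 1.68])
se = np.array([0.30, 0.32, 0.62, 0.63])
x_mean = [0.20, 0.30, 0.10, 0.05]  # hypothetical mean areal proportions

mean_pred, sd_pred = lumped_monte_carlo(b, se, x_mean)
print(f"relative standard error: {sd_pred / mean_pred:.1%}")
```

With independent draws, the simulated standard deviation converges to the analytic value, the square root of the sum of (se_i · x_i)², which offers a quick check that the sampling behaves as intended.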

[28] The relative standard errors for the mean specific and total DOC fluxes are 12% for regression S and 10% for regression L. The relative standard error for the derived DOC loss is 79%. This high standard error is not surprising considering the method to indirectly derive the DOC loss from two predictions. However, based on the standard errors of each predictor, it was estimated that with a probability of 90% the estimated DOC loss is positive, which means that regression S predicts higher DOC fluxes than regression L.

[29] The spatially explicitly calculated DOC loss rate (Figure 3c) is positive in 70.3% of the area. An elevated DOC loss rate is particularly visible in the northern parts of the extrapolation area. The reverse case, i.e., fDOC,mod[S] being lower than fDOC,mod[L] (14.4% of the extrapolation area), is mainly restricted to areas characterized by steep slopes (cf. Figure A1). Areas with fDOC,mod[S] < fDOC,mod[L] have on average a slope gradient of 10.3°, which is substantially higher than the average slope gradients of the training catchments (small catchments: 5.2°, large catchments: 6.6°) and the whole extrapolation area (4.2°) (Table A1). This suggests that the extrapolation of a budget on DOC fluxes between the small and large catchments fails for high slope gradients.

[30] The maximum catchment average slope gradient covered by both sets of training catchments is 22.6° (cf. Table A1). Only about 2.8% of the extrapolation area shows higher slope gradients. If these areas were discarded from the spatially explicit application of the regression equations, the mean value of fDOC,mod[L] would remain 1.40 t C km−2 a−1, whereas the mean value of fDOC,mod[S] would increase by 1.7% to 1.85 t C km−2 a−1. The calculated DOC loss would also slightly increase by 0.3 Mt C a−1 to 6.2 Mt C a−1 in total, giving an average DOC loss rate of 24%. However, as these differences are rather negligible, a notable bias for the DOC loss estimation is not expected.

3.3. Mass Balance of Estimated and Calculated Specific DOC Fluxes

[31] Estimates of in-river DOC loss can also be derived from fDOC,mod[S] and calculated specific DOC fluxes (fDOC,calc) (equation (5) and Table 6). Note that in-river DOC loss is calculated from the averages of predicted and calculated fluxes. For the small catchments, on which the empirical estimation equation for fDOC,mod[S] was trained, the estimates predict the calculated DOC fluxes quite well. The 2% ‘estimated DOC loss’ (Table 6) is interpreted as a minor negative bias. For the large catchments, the estimated DOC loss is 19%. Further dividing the large catchments into those smaller than 10,000 km2 (L1) and those larger than 10,000 km2 (L2) reveals that this difference is even higher for the latter subset (Table 6). While the average catchment size increases by about one order of magnitude from the small catchments (S) to subset L1 and again from subset L1 to subset L2, the increase in the DOC loss rate is rather linear.

Table 6. Averages of Calculated (fDOC,calc) and Estimated (fDOC,mod[S]) Specific DOC Fluxes and Estimated DOC Loss for Different Catchment Size Classes (a)

Subsets of Catchments | N | Average Catchment Size (km2) | Average fDOC,calc (t C km−2 a−1) | Average fDOC,mod[S] (t C km−2 a−1) | Estimated DOC Loss
Small catchments (S) <2,000 km2 | 246 | 494 | 2.24 | 2.29 (±11%) | 2%
Large catchments (L) >2,000 km2 | 207 | 26,525 | 1.09 | 1.35 (±20%) | 19%
L1: 2,000–10,000 km2 | 90 | 4,880 | 1.22 | 1.38 (±20%) | 11%
L2: >10,000 km2 | 117 | 43,175 | 0.99 | 1.33 (±20%) | 25%

(a) The estimated DOC loss was calculated from the averages of calculated (fDOC,calc) and estimated specific DOC fluxes (fDOC,mod[S]) (equation (5)). In brackets, the relative standard errors are given (relative to the mean). The standard errors were derived by Monte Carlo simulations based on the mean catchment properties of the respective subset of catchments, the b-estimates, and the standard errors of the b-estimates in regression S (Table 4), see text.
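The arithmetic behind the last column of Table 6 can be retraced with a short sketch. Equation (5) is not reproduced in this section, so the form used here, loss = (fDOC,mod[S] − fDOC,calc) / fDOC,mod[S], is an assumption inferred from the tabulated values; the function name `relative_doc_loss` is hypothetical.

```python
def relative_doc_loss(f_mod_small: float, f_calc: float) -> float:
    """Relative in-river DOC loss.

    Assumed form (inferred from Table 6, not quoted from equation (5)):
    (modelled flux - calculated flux) / modelled flux.
    """
    return (f_mod_small - f_calc) / f_mod_small


# Averages from Table 6 as (fDOC,calc, fDOC,mod[S]) in t C km-2 a-1
table6 = {
    "S": (2.24, 2.29),
    "L": (1.09, 1.35),
    "L1": (1.22, 1.38),
    "L2": (0.99, 1.33),
}

losses = {name: relative_doc_loss(mod, calc) for name, (calc, mod) in table6.items()}

for name, loss in losses.items():
    print(f"{name}: {loss:.1%}")
```

Because the tabulated averages are rounded, the recomputed rates agree with the published percentages only to within about one percentage point.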

[32] For both subsets of large catchments, the differences fDOC,mod[S] − fDOC,calc per catchment show a bell-shaped distribution around a positive mean value (Figure 4), indicating a general tendency of fDOC,mod[S] to overestimate fDOC,calc within both subsets of large catchments. This justifies the presented approach to calculate in-river DOC loss.

Figure 4.

Histograms of the difference fDOC,mod[S] − fDOC,calc for two subsets of the large catchments. Means, minima, maxima and standard deviations are reported in units of the x axis. The curve indicates the theoretical normal distribution based on the mean and the standard deviation.

4. Discussion

4.1. Validity of Empirical Models

4.1.1. Specific DOC Fluxes

[33] Ludwig et al. [1996b] estimated an average specific DOC flux of 1.79 t C km−2 a−1 for the whole North American continent. This estimation is based on measured DOC fluxes of the Mississippi, Columbia, St. Lawrence, Yukon, Mackenzie, and Brazos Rivers and a spatially explicit estimation for the remaining area using a globally trained regression equation [Ludwig et al., 1996b]. As these DOC fluxes refer to the mouths of large world rivers (avg. catchment size 1.13 million km2), the proportion of terrestrial DOC retained or oxidized within the fluvial network is likely higher than within the ‘large catchments’. The fact that the average specific DOC flux estimated by Ludwig et al. [1996b] is still substantially higher than that derived from the ‘large catchments’ in this study (1.40 t C km−2 a−1) is probably due to the elevated DOC yields from areas north of 60°N, which are not covered by this study. For tundra and taiga ecosystems, which dominate this area, Ludwig et al. [1996b] report a global average DOC flux of 2.0 t C km−2 a−1, which is notably higher than their estimate of the North American average. The elevated specific DOC fluxes in high latitudes [Cooper et al., 2008] can be explained by high soil carbon storage combined with a considerable amount of runoff [Ludwig et al., 1996b], or by the permafrost related abundance of wetlands [Frey and McClelland, 2009; Striegl et al., 2007]. Indeed, for the northern parts of the study area, which can already be referred to as taiga, estimated fluvial DOC fluxes are also clearly above average (cf. Figures 3a and 3b).

[34] The total DOC fluxes within the study area of 25.8 Mt C a−1 (referring to fDOC,mod[S]) and 19.9 Mt C a−1 (referring to fDOC,mod[L]) are less than half (43% or 33%, respectively) of the HCO3 fluxes of 59.8 Mt C a−1 (derived from the spatially explicit estimate by Moosdorf et al. [2011]), which are not assumed to be subject to significant in-river changes. At global scale, the ratio of fluvial DOC/HCO3 exports to the coast is about 64% [Ludwig et al., 1996a]. Thus, DOC/HCO3 ratios within the study area are low compared to this global value.

[35] The maximum specific DOC fluxes estimated in this study are 17.6 t C km−2 a−1 (fDOC,mod[S]) and 15.2 t C km−2 a−1 (fDOC,mod[L]). These values, each of which was estimated for a single 1 km × 1 km raster cell, are not much higher than the highest specific DOC flux of 13.4 t C km−2 a−1 derived from the hydrochemical data (fDOC,calc) referring to a whole catchment. These maximum values are still substantially lower than the fluvial DOC export of 32.5 t C km−2 a−1 reported by Worrall et al. [2007] for a hot spot region at the English West Coast. Therefore, it is concluded that the spatially explicit estimates are well within the natural range of specific DOC fluxes.

[36] Areas in which high values of fDOC,mod[S] and fDOC,mod[L] predominate are underrepresented by the training catchments (cf. Figure 3). This is especially true for the Hudson Bay Lowlands and eastern Canada. However, for these areas, literature values can be used for a comparison (Table 7 and Figure 3d). Note that because the average size of the considered river catchments is of the same order of magnitude as that of the ‘large catchments’, fDOC,mod[L] is taken as reference.

Table 7. Comparison of Spatially Explicit Estimates of Fluvial DOC Flux, the Applied Runoff Data (UNH/GRDC Runoff), and Implicitly Derived Mean DOC Concentrations With Literature Values for Different Areas in Eastern Canada (a)

Area Defined by River Basin Outlets | Rivers | Average Catchment Area (106 km2) | Literature q (mm a−1) | Literature DOC (mg L−1) | Literature fDOC (t km−2 a−1) | Literature Source | fDOC,mod[L] q (mm a−1) | fDOC,mod[L] DOC (mg L−1) | fDOC,mod[L] fDOC (t km−2 a−1)
SE Hudson Bay | Great Whale R., Little Whale R., R. du Nord, Nastapoka | 21 | 484 | 3.6 | 1.76 | b, c | 457 (−5%) | 4.2 (+15%) | 1.91 (+9%)
Ungava Bay | R. aux Feuilles, R. aux Melezes, R. Caniapiscau, Whale R., Georges R. | 62 | 555 | 2.5 | 1.37 | b | 518 (−7%) | 3.8 (+55%) | 1.99 (+45%)
St. Lawrence Gulf | Petit Mecatina, Olomane, Natashquan, Romaine, Magpie, Moisie, Manicouagan, Aux Outardes, Bersimis, Saguenay | 25 | 674 | 5.3 | 3.57 | b | 648 (−4%) | 3.5 (−34%) | 2.27 (−36%)
SW Hudson Bay | Churchill R., Nelson R., Hayes R., Winisk R. | 370 | 259 | 10.5 | 2.72 | c, d | 108 (−58%) | 14.1 (+33%) | 1.51 (−44%)
James Bay | Nottaway R., Broadback R., Rupert R. | 43 | 663 | 7.1 | 4.67 | c | 536 (−19%) | 4.4 (−37%) | 2.38 (−49%)

(a) The values from this study and those from the literature refer to the same river basins listed in Table 7. All values are flux-weighted averages. The over/underestimations of the literature values by this study are given in brackets.
(b) After Hudon et al. [1996].
(c) After Mundy et al. [2010].
(d) After Granskog et al. [2007].

[37] For the rivers tributary to the southeastern Hudson Bay, fDOC,mod[L] reproduces the literature values well. The lateral DOC fluxes in the rivers tributary to the St. Lawrence Gulf, the southwestern Hudson Bay, and James Bay are substantially underestimated by fDOC,mod[L], whereas the fluvial DOC fluxes in rivers draining to the Ungava Bay are substantially overestimated by fDOC,mod[L]. Note that for the rivers tributary to the southwestern Hudson Bay and James Bay, the long-term average annual runoff data of UNH/GRDC are substantially lower than the annual runoff reported in the literature (cf. Table 7), contributing to the underestimation of specific DOC fluxes. Moreover, for the rivers running to the southwestern Hudson Bay, the flux-weighted average DOC concentration derived from fDOC,mod[L] is even higher than that derived from the literature values (cf. Table 7). However, it is not clear from the literature to what extent the reported DOC fluxes are representative of the long-term DOC fluxes.
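The three quantities compared in Table 7 are linked by a unit conversion: 1 mm of runoff over 1 km2 equals 10^6 L, so a concentration in mg L−1 multiplied by runoff in mm a−1 and divided by 1000 gives a specific flux in t km−2 a−1. The following sketch, with hypothetical function names, uses this relation to check the literature columns of Table 7 and to derive the implicit mean DOC concentration from a modelled flux.

```python
def specific_flux(doc_mg_per_l: float, runoff_mm_per_a: float) -> float:
    """Specific DOC flux in t km-2 a-1 from concentration and runoff."""
    return doc_mg_per_l * runoff_mm_per_a / 1000.0


def implied_concentration(flux_t_km2_a: float, runoff_mm_per_a: float) -> float:
    """Mean DOC concentration in mg L-1 implied by a flux and a runoff value."""
    return flux_t_km2_a * 1000.0 / runoff_mm_per_a


# Literature values from Table 7: (q in mm a-1, DOC in mg L-1, fDOC in t km-2 a-1)
literature = {
    "SE Hudson Bay": (484, 3.6, 1.76),
    "Ungava Bay": (555, 2.5, 1.37),
    "St. Lawrence Gulf": (674, 5.3, 3.57),
    "SW Hudson Bay": (259, 10.5, 2.72),
    "James Bay": (663, 7.1, 4.67),
}

recomputed = {area: specific_flux(doc, q) for area, (q, doc, _) in literature.items()}

# Implicit concentration for SE Hudson Bay from fDOC,mod[L] and UNH/GRDC runoff
doc_model_se_hudson = implied_concentration(1.91, 457)
```

The recomputed fluxes match the tabulated ones up to rounding, which also illustrates why an underestimated runoff (as for the southwestern Hudson Bay) lowers the estimated flux even when the implied concentration is higher than the literature value.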

4.1.2. Discussion of Predictors

Runoff

[38] Of the predictors considered, runoff shows the highest correlation to the specific fluvial DOC fluxes (r = 0.58, Table A2, also see Table 2), well in accordance with previous studies [Harrison et al., 2005; Ludwig et al., 1996b; Worrall and Burt, 2007]. This is not surprising, as specific DOC fluxes are calculated from DOC concentrations and runoff. However, the uncertainties in the estimation of long-term fluvial DOC fluxes are specifically associated with the runoff data used. The applied runoff data set (UNH/GRDC) has by far the coarsest resolution (30′) of the applied predictor data sets. Uncertainties in the representation of long-term average annual runoff are specifically expected for river basins smaller than 10,000 km2 [Fekete et al., 2002]. Indeed, 75% of the 453 training catchments are smaller than 10,000 km2. For 180 of the sampling locations, daily discharge time series of at least 5 years were available. Taking these data as reference, the correlation between UNH/GRDC and gauged runoff is not very high but still satisfactory (r = 0.76). The UNH/GRDC data overestimate the long-term annual runoff by 6.6% on average, with a 10th percentile of −54% and a 90th percentile of +53%. However, a relation between these deviations and catchment size could not be confirmed.

[39] A critical point in the relation between runoff and fluvial DOC fluxes is the temporal variability in runoff. DOC concentrations tend to increase with runoff; specifically during stormflow events substantial amounts of DOC are flushed from the topsoils into streams [cf., e.g., Idir et al., 1999; Raymond and Saiers, 2010]. Such flushing events can contribute a substantial proportion of long-term DOC fluxes, specifically from small catchments [e.g., Raymond and Saiers, 2010]. By calculating fluvial DOC fluxes from average monthly DOC concentrations and long-term average monthly UNH/GRDC runoff data, stormflow events are not taken into account. This probably led to an underestimation of fluvial DOC fluxes. To assess this effect, we used the 180 stations having more than five years of discharge gauging data and derived average annual fDOC by a rating curve approach, which better accounts for flushing effects. For this, we utilized the software LOADEST [Runkel et al., 2004]. For each of the 180 stations, two predetermined multiple regression models, which use discharge as predictor, were fitted to the respective time series of DOC concentrations and instantaneously measured discharge using the adjusted maximum likelihood estimator. Based on the AIC, one of the two fits was chosen for each sampling location. A similar practice was described in a previous study of DOC fluxes from arctic river catchments [Holmes et al., 2012].
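The idea of fitting candidate rating curves and selecting one by AIC can be illustrated with a minimal sketch. This is not LOADEST and does not reproduce its predetermined models or the adjusted maximum likelihood estimator: the two candidates here (linear and quadratic in ln Q) are assumptions chosen for illustration, the fit is ordinary least squares, no retransformation-bias correction is applied, and the data are synthetic. All function names are hypothetical.

```python
import numpy as np


def fit_log_model(ln_q, ln_load, order):
    """OLS fit of ln(load) on a polynomial in ln(Q); returns (coefficients, AIC)."""
    X = np.vander(ln_q, order + 1, increasing=True)  # columns: 1, lnQ, lnQ^2, ...
    beta, *_ = np.linalg.lstsq(X, ln_load, rcond=None)
    rss = float(np.sum((ln_load - X @ beta) ** 2))
    n, k = X.shape
    # Gaussian AIC up to an additive constant; +1 counts the error variance
    aic = n * np.log(rss / n) + 2 * (k + 1)
    return beta, aic


def choose_rating_curve(q_m3s, doc_mg_l):
    """Fit both candidate models and return (order, coefficients) of the lower-AIC fit."""
    ln_q = np.log(q_m3s)
    ln_load = np.log(doc_mg_l * q_m3s)  # mg/L * m3/s = g/s
    fits = {order: fit_log_model(ln_q, ln_load, order) for order in (1, 2)}
    best = min(fits, key=lambda order: fits[order][1])
    return best, fits[best][0]


def predict_load(beta, q_m3s):
    """Predicted load in g/s for given discharges (no bias correction)."""
    ln_q = np.log(np.atleast_1d(np.asarray(q_m3s, dtype=float)))
    X = np.vander(ln_q, len(beta), increasing=True)
    return np.exp(X @ beta)


# Synthetic station: concentration rises with discharge (flushing effect)
rng = np.random.default_rng(0)
q = rng.uniform(1.0, 100.0, size=200)
doc = 2.0 * q**0.3 * np.exp(rng.normal(0.0, 0.1, size=200))

order, beta = choose_rating_curve(q, doc)
load_at_10 = predict_load(beta, 10.0)[0]
```

Applying `predict_load` to a daily discharge series and averaging over several years would give a multiyear mean load, which divided by catchment area yields a specific flux that reflects high-flow days more strongly than a product of monthly means does.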

[40] The chosen regression model was applied to the time series of daily discharges to predict daily DOC loads. Based on the daily DOC loads and the catchment area, a multiyear average specific DOC flux was calculated for each sampling location. The fDOC derived by LOADEST is highly correlated with fDOC based on UNH/GRDC data (r = 0.88). For the 180 sampling locations, the average fDOC derived by LOADEST (1.60 t C km−2 a−1) is substantially higher than that based on the UNH/GRDC runoff data (1.32 t C km−2 a−1). The same is true for the flux-weighted DOC concentrations per sampling location (4.68 mg L−1 by LOADEST; 4.20 mg L−1 based on UNH/GRDC data). For the 87 of these sampling locations that belong to the ‘small catchments’, this discrepancy is even higher (2.05 t C km−2 a−1, 4.71 mg L−1 by LOADEST; 1.56 t C km−2 a−1, 3.87 mg L−1 based on UNH/GRDC data), while for the 93 sampling locations belonging to the ‘large catchments’ the average fDOC and flux-weighted DOC concentrations from both approaches are rather similar (1.18 t C km−2 a−1, 4.66 mg L−1 by LOADEST; 1.10 t C km−2 a−1, 4.51 mg L−1 based on UNH/GRDC data). This indicates that fDOC is specifically underestimated for the small catchments and, thus, fDOC,mod[S] underestimates fDOC from small catchments. Consequently, the DOC loss estimation presented here is conservative even for the river stretches addressed by our budget approach.

Slope Gradient

[41] Negative effects of slope gradient on specific DOC fluxes, as reported here, were also identified by Ludwig et al. [1996b] and Mulholland [1997]. Mulholland [1997] attributes those to a generally higher abundance of wetlands in flat areas. For wetlands, high soil organic carbon contents and terrestrial flow paths close to the surface through organic-rich soil horizons are characteristic, which were considered the most important natural controlling factors of DOC inputs into streams [Mulholland, 1997]. However, in this study, only a weak negative relation between slope gradient and areal proportions of wetlands (AWL) was found (Table A2).

Land Cover

[42] According to their b-estimates, each of the distinguished land cover classes (coniferous forest, broadleaf forest, and herbaceous vegetation) adds to the specific fluvial DOC fluxes compared to the remaining area, mainly comprised of agricultural lands and shrublands. Coniferous forests add most fDOC, followed by herbaceous vegetation and broadleaf forests. A higher specific fluvial DOC export from coniferous forest in comparison to broadleaf forest and grassland was also estimated by Aitkenhead and McDowell [2000] at the global scale. In a local study in France, however, natural deciduous forests were reported to yield twice as much fluvial DOC as planted coniferous forests [Amiotte-Suchet et al., 2007]. From agricultural areas, Royer and David [2005] expect generally lower specific DOC exports than from forests, which is supported by a negative correlation (r = −0.36) between DOC concentrations and agricultural land reported for stream systems across Europe [Mattsson et al., 2009]. A probable explanation for low DOC exports from agricultural land is a lower soil organic carbon content due to soil erosion, aeration caused by agricultural tilling, and removal of plant litter [cf. Follett, 2001]. No study on fluvial DOC fluxes from shrublands is known to the authors. However, as shrublands mainly occur in the arid to semi-arid western part of the conterminous USA, DOC exports from these ecosystems are expected to be low.

Net Primary Production

[43] Net primary production (NPP) is a predictor that could theoretically represent vegetation effects as a single continuous parameter. Ludwig et al. [1996b] suggested NPP as a potential predictor for global spatial patterns in fluvial DOC fluxes, but did not include it in their multivariate prediction function because of its high multicollinearity with runoff. Positive relations between terrestrial NPP and DOC mobilization from soils were further suggested by different regional and local studies [e.g., Fröberg et al., 2006; Jansson et al., 2008]. A significant positive correlation between NPP and fDOC was also found in this study (r = 0.44, Table 4). However, NPP had to be discarded from the multiple regression equation, because it was highly affected by multicollinearity effects with other predictors, specifically with land cover.

Wetlands

[44] The correlation between wetland area proportions (AWL) and fDOC is low (r = 0.28, Table A2). With regard to multicollinearity effects, AWL is a robust predictor, which was assigned the lowest VIF in each of the regression models (in each case VIF < 2; Regressions A.1, A.2, and A.3 in Table 2; Regressions S and L in Table 4). Areal proportions of wetlands were identified as a highly important predictor in local studies, for which detailed information on their distribution exists [e.g., Aitkenhead et al., 1999; Creed et al., 2008; Johnston et al., 2008]. Harrison et al. [2005] used wetland proportions as the only predictor besides discharge in their model of fluvial DOC exports to coastal zones at the global scale. In this study, the relative importance of the predictor wetland proportion was higher for the large training catchments whereas the importance of the land cover classes was decreased compared to the small catchments (Table 5). It is hypothesized that this predictor becomes even more important if catchments larger than those used in this study are considered [cf. Harrison et al., 2005], because of the lower biolabile proportion of wetland DOC [cf., e.g., Striegl et al., 2007].
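The variance inflation factor (VIF) used as a multicollinearity diagnostic is defined as VIF_j = 1 / (1 − R_j^2), where R_j^2 is the coefficient of determination obtained by regressing predictor j on all other predictors. The sketch below computes it with ordinary least squares on synthetic data; the variable names (`runoff`, `npp`, `wetland`) only mimic the predictors discussed here and the values are invented for illustration, not taken from the training catchments.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return VIF_j = 1 / (1 - R_j^2) for each column j of the predictor matrix X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        # Regress column j on an intercept plus all remaining columns
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Synthetic predictors: 'npp' is built to be strongly collinear with 'runoff',
# 'wetland' is independent of both
rng = np.random.default_rng(1)
runoff = rng.normal(size=500)
npp = 0.9 * runoff + 0.3 * rng.normal(size=500)
wetland = rng.normal(size=500)

vifs = variance_inflation_factors(np.column_stack([runoff, npp, wetland]))
```

In this synthetic case the two collinear predictors receive high VIFs while the independent one stays close to 1, which is the pattern described above for AWL (VIF < 2).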

[45] However, the partly coarse classification of wetland proportions in the GLWD data set is a considerable source of uncertainty. Further, considering the probably substantial DOC inputs from riparian wetlands [cf. Aitkenhead and McDowell, 2000; Mulholland, 1997; Preiner et al., 2008], DOC concentrations do not necessarily decrease downstream despite substantial in-stream/in-river respiration of DOC. As the typology and exact position of wetlands are not distinguished in this study, the specific contribution of riparian wetlands to fluvial DOC fluxes could not be identified. This is a further source of uncertainty, as different types of wetlands might differ in their importance as DOC sources [e.g., Johnston et al., 2008].

Soil Properties

[46] Topsoil clay content was among the predictors of the best model identified by AIC for the set of all training catchments (Regression A.1 in Table 2). It was, however, not included in the regression equations S and L, because this predictor would not be significant if regression A.2 was refitted for the large catchments. Topsoil clay content shows a notable negative correlation to fDOC (r = −0.45, Table A2). These negative effects are attributed to adsorption of DOC onto clay minerals in the soils, which reduces the amount of DOC exported to streams and rivers [cf., e.g., Nelson et al., 1993; Remington et al., 2007]. Other studies also report an effect of soil clay content on the C:N ratio of fluvial DOM [e.g., Molinero and Burke, 2009; Perakis and Hedin, 2007].

[47] Topsoil organic carbon content shows a notable positive correlation to fDOC in this study (r = 0.52, Table A2). It was identified as an important predictor in previous studies [e.g., Aitkenhead et al., 1999; Ludwig et al., 1996b; Mulholland, 1997]. However, it was not within the set of predictors chosen as best model. It is likely redundant due to its substantial correlations with runoff and topsoil clay content (Table A2).

[48] Note that for North America the HSWD represents soil properties at a coarse spatial resolution as it is based on the FAO World Soil map at a scale of 1:5 million [Food and Agriculture Organization, 1971–1981]. Further, for North America the soil information provided by the HSWD is less reliable compared to other parts of the world [Food and Agriculture Organization, 2009]. Thus, soil properties taken from this source may not be an appropriate predictor for fDOC at the spatial resolution addressed by this study.

[49] Land cover and wetland proportions are suitable surrogates, because they are closely related to soil properties, and because they are available at a higher spatial resolution. However, the land cover classification applied for this study is quite coarse and a high variance in soil properties within the distinguished land cover classes is expected [cf. Aitkenhead-Peterson et al., 2005; Homann et al., 2007; Moore et al., 2008], adding uncertainty to the prediction.

Climate

[50] Climatic variables like precipitation and air temperature, which were not included as predictors in the chosen regression equations, were previously identified as controlling factors for smaller study areas and in analyses of temporal variations of DOC fluxes within specific river catchments [e.g., Creed et al., 2008; Raymond and Oh, 2007; Worrall et al., 2004]. Annual precipitation shows a significant positive correlation to fDOC (r = 0.51, Table A2). However, it is redundant due to its close correlation to runoff (r = 0.70, Table A2).

[51] Mean annual air temperature does not show a significant correlation to fDOC (Table A2) and was not identified as predictor in a multivariate regression equation. However, this parameter shows a weak negative correlation with the model residuals of regression equation S, but not with those of regression equation L (Figure 5). This implies a slight tendency for fDOC,mod[S] to underestimate specific DOC fluxes in colder climates and to overestimate them in warmer regions.

Figure 5.

Scatterplots of absolute and relative residuals (equations (3) and (4)) of the applied regression equations versus multiannual means of air temperature [°C]. As relative residuals partly take extreme values, the respective scatterplots are restricted to catchments with relative residuals of −2 to 2.

Catchment Area

[52] As catchment area is discussed in this study for its correlation to calculated DOC losses (see below), this parameter would be of potential interest as a predictor for fluvial DOC flux estimation. However, it was not within the set of predictors chosen by the AIC. For relative residuals (equation (4)) of the regression equations S and L, no significant correlations to this factor were identified (Figure 6). Thus, for the training catchments used in this study, the direct influence of catchment area on DOC is probably low compared to the effect of other catchment properties which were identified as predictors.

Figure 6.

Scatterplots of relative residuals of the regression equations S and L versus decadic logarithm of catchment area [m2]. As relative residuals partly take extreme values, these plots and correlation coefficients are restricted to subsets of the training catchments having relative residuals of −2 to 2.

Other Sources of DOC

[53] Two sources of DOC not addressed by this study are autochthonous production of DOC and anthropogenic point sources. In-stream DOC release from the decomposition of allochthonous POC can be significant in streams. This DOC release does not only depend on the amount of terrestrial POC exported to streams, but also on its quality [Yoshimura et al., 2010]. Further, the microbial decomposition and DOC production is influenced by the trophic level of streams [Baldy et al., 2007]. Significant production of DOC may also result from algal blooms, as reported for small agricultural streams during low water stages [Royer and David, 2005]. Although for the landscape carbon budget terrestrial DOC sources are more important than autochthonous DOC production, autochthonous sources are important because they provide more biolabile DOC for stream respiration [Sachse et al., 2005]. In-stream production of DOC is likely related to land use, because of terrestrial POC and nutrient inputs. Thus, the predictor land cover may partly include these effects.

[54] Mulholland [1997] concluded that lateral DOC fluxes in rivers across North America are governed by catchment processes and thus terrestrial DOC sources whereas no significant contribution of autochthonous DOC production could be identified. However, for the Mississippi River, Bianchi et al. [2004] suggest that autochthonous production of dissolved organic matter (DOM) by algae is of higher importance than previously thought. They hypothesize that damming and eutrophication have increased the proportion of algae derived DOM, which is likely a widespread phenomenon. However, they further suggest that this algae derived DOM fosters the decomposition of terrestrial DOM by the process of co-metabolism, i.e., nutrients delivered by algae DOM enable decomposition of terrestrial DOM. Generally, lakes and reservoirs are most often thought to be net sinks of DOC rather than net sources [e.g., Algesten et al., 2004; Kastowski et al., 2011; Sobek et al., 2007].

[55] For the set of 453 training catchments, there is a positive correlation between lake area proportion and fluvial DOC fluxes (r = 0.44, Table A2). This correlation does not necessarily hint at an in-lake production of DOC, as this functional relation does not necessarily represent a causal relation. However, only few catchments show a notable lake proportion (8% of catchments have a lake area proportion >5%). Thus, the set of training catchments is not suitable to analyze the effects of lake area proportions on fluvial DOC fluxes.

[56] Anthropogenic point sources of DOC, which can substantially contribute to fluvial DOC fluxes in densely populated areas [e.g., Tipping et al., 1997], are expected to be negligible in this study. Artificial areas, including urban areas, are scarce (below 1.6% for 90% of all training catchments) and sampling locations within or directly downstream of urban areas were discarded from the analysis. Further, as wastewater DOC is more labile than terrestrial DOC [cf. Garcia-Esteves et al., 2007], the effects from upstream anthropogenic point sources are expected to quickly decline downstream.

4.2. Implications for In-Stream Losses of DOC

[57] The specific fluvial DOC fluxes estimated in this study refer to points within the fluvial network with contributing areas similar in size to the used training catchments. Thus, an average DOC flux of 1.82 t C km−2 a−1 (std. err. ± 12%) for the study area represents specific DOC fluxes from ‘small catchments’ (fDOC,mod[S]) with an average size of 494 km2. For catchment areas similar to the ‘large catchments’ of this study (avg. size 26,525 km2), a substantially lower average specific DOC flux of 1.40 t C km−2 a−1 (std. err. ± 10%) (fDOC,mod[L]) was estimated for the study area, and the 23% difference between both was consequently attributed to in-river losses of DOC. The plausibility of that number is supported by literature, reporting a downstream DOC loss of 24.5% in the Pearl River (Mississippi/Louisiana, catchment area: 22,700 km2) [Duan et al., 2007] and 28% in the lower Hudson River (New York, catchment area: 33,500 km2) [Cole and Caraco, 2001]. Based on the estimates on fluvial DOC exports and in-river respiration of DOC, Worrall et al. [2007] concluded that at least 32% of the terrestrial DOC exports are mineralized within the fluvial network of England and Wales. As it is expected that sinks other than respiration and photo-oxidization are negligible [cf. Worrall et al., 2007], the total DOC loss of 5.9 Mt C a−1 (std. err. ± 78%) estimated for North America south of 60°N is assumed to be lost as CO2 from respective river stretches to the atmosphere.
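The budget figures quoted above are mutually consistent, as a short calculation shows. The sketch uses only numbers stated in the text; the implied extrapolation area is a derived quantity that is not reported in this section and is computed here purely as a plausibility check.

```python
# Mean specific fluxes (t C km-2 a-1) and total fluxes (Mt C a-1) from the text
f_small, f_large = 1.82, 1.40
total_small, total_large = 25.8, 19.9

# Relative difference between the two mean specific fluxes (reported as 23%)
loss_rate = (f_small - f_large) / f_small

# Total in-river DOC loss (reported as 5.9 Mt C a-1)
total_loss = total_small - total_large

# Area implied by total flux / mean specific flux, in million km2
# (derived for illustration; 1 Mt = 1e6 t, so Mt / (t km-2) gives 1e6 km2)
area_from_small = total_small / f_small
area_from_large = total_large / f_large

print(f"loss rate: {loss_rate:.1%}, total loss: {total_loss:.1f} Mt C a-1")
print(f"implied area: {area_from_small:.2f} vs {area_from_large:.2f} million km2")
```

Both pairs of values imply nearly the same extrapolation area, so the 23% loss rate follows equally from the mean specific fluxes and from the totals.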

[58] Results from the budget approach averaging fDOC,mod[S] and fDOC,calc for three catchment size classes suggest a nonlinear relationship between in-river DOC losses and catchment area (cf. Table 6). A downstream decrease of DOC losses is explained by the preferred degradation of the more labile proportion of DOC while the proportion of refractory DOC increases [cf. Worrall et al., 2007].

[59] According to the global estimates by Battin et al. [2008], respiration within small headwater streams is of the same magnitude as respiration within rivers, distinguishing streams from rivers by a mean discharge lower than 0.5 m3 s−1. As only 15% of the ‘small catchments’ used in this study are attributed to streams according to this definition (average discharge of the 246 small catchments: 7.3 m3 s−1), the estimated DOC loss is fully attributed to rivers, and the total in-stream/in-river losses within the study area might be considerably higher.

[60] On the other hand, the fact that similar predictors of fluvial DOC fluxes are reported in literature from local to global scales suggests that a substantial proportion of terrestrial DOC still behaves rather conservatively during fluvial transport.

4.3. Uncertainties

[61] Due to the nearly normal distribution of residuals of the used regression equations (Figure 2), no general bias is expected from the estimation of fluvial DOC fluxes. However, as only 55% (Regression S, small catchments) or 60% (Regression L, large catchments) of the variance in fDOC,calc is statistically explained, there is a certain tendency to overestimate low values and underestimate high values (Figure 7 and equations (3) and (4)).

Figure 7.

Scatterplots of absolute and relative residuals (cf. equations (3) and (4)) of the applied regression equations versus calculated specific DOC fluxes (fDOC,calc) [t C km−2 a−1]. As relative residuals partly take extreme values, the respective scatterplots are restricted to catchments with relative residuals of −2 to 2.

[62] For a small proportion of the training catchments (small catchments: 18 of 246, large catchments: 11 of 207 catchments, Figures 8a and 8b), the lumped estimation of fDOC (i.e., for each catchment based on averages of the applied predictors) resulted in negative specific DOC fluxes. After applying the fitted regression equations spatially explicitly, raster cells with negative values were forced to 0 t C km−2 a−1 as negative fluvial DOC fluxes are not expected. This procedure affected 16.1% (fDOC,mod[S]) and 12.5% (fDOC,mod[L]) of the raster cells. However, for the larger proportion of these areas (fDOC,mod[S]: 69%, fDOC,mod[L]: 82%) the UNH/GRDC data give average annual runoffs below 10 mm a−1. Thus, specific DOC fluxes from the affected areas are negligible at the scale addressed by this study, even if DOC concentrations in the river water were high.

Figure 8.

Scatterplots of (a and b) lumped estimates versus calculated specific DOC fluxes, (c and d) lumped estimates versus spatially explicit estimates of specific DOC fluxes, and (e and f) spatially explicit estimates versus calculated specific DOC fluxes (fDOC,calc). Straight lines represent theoretical 1:1 ratios.

[63] Further, lake areas were forced to 0 t C km−2 a−1, as by the applied predictors only terrestrial sources of DOC are addressed and lakes are expected to be sinks of fluvial DOC [cf. Kortelainen and Mannio, 1988; Sobek et al., 2005, 2007]. Within the training catchments, lake proportions are generally small (average: 1.3% within both sets of training catchments; above 5% only in about 10% of the small and 5% of the large catchments). Thus, no substantial bias from this procedure is expected.
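The two post-processing steps applied to the raster (setting negative predictions and lake cells to zero) can be sketched as follows. The sketch assumes a linear regression equation of the form intercept + sum of b-estimates times predictor values; the coefficients, the two-predictor setup, and the 2 × 2 grid are invented for illustration and are not the b-estimates of regressions S or L.

```python
import numpy as np


def apply_flux_model(predictors, coefficients, intercept, lake_mask):
    """Apply a linear flux model per raster cell, then clamp negatives and lakes to 0.

    predictors:   array of shape (k, rows, cols), one layer per predictor
    coefficients: array of shape (k,)
    lake_mask:    boolean array of shape (rows, cols), True for lake cells
    """
    flux = intercept + np.tensordot(coefficients, predictors, axes=1)
    negative = flux < 0.0
    flux = np.where(negative, 0.0, flux)  # negative fluxes are not expected
    flux[lake_mask] = 0.0                 # lakes are not treated as DOC sources
    return flux, negative


# Hypothetical layers: runoff (mm a-1) and slope gradient (degrees)
predictors = np.array([
    [[100.0, 5.0], [400.0, 800.0]],
    [[2.0, 10.0], [1.0, 20.0]],
])
coefficients = np.array([0.004, -0.05])  # illustrative values only
lake_mask = np.array([[False, False], [False, True]])

flux, negative = apply_flux_model(predictors, coefficients, 0.1, lake_mask)
share_forced_to_zero = negative.mean()
```

In this toy grid the cell with very low runoff and a steep slope yields a negative prediction and is set to zero, mirroring the observation above that most affected cells have runoff below 10 mm a−1.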

[64] The spatially explicit estimates reproduce those derived from the lumped estimation well for estimated specific DOC fluxes above 0.5 t C km−2 a−1 (cf. Figures 8c and 8d). For lumped estimates below 0.5 t C km−2 a−1, spatially explicit estimates are mostly higher, probably due to forcing negative cell values to 0 t C km−2 a−1. The correlations with fluxes calculated from the hydrochemical data (fDOC,calc) are similar for spatially explicit estimates and lumped estimates (Figures 8e and 8f). This confirms the validity of the spatially explicit application of the regression equations for the area represented by the training catchments.

[65] The validity of the spatial extrapolation of the DOC flux estimation requires that the training catchments are representative of the extrapolation area, especially with regard to physiographic properties which were used as predictors (Table A1). The maximum average catchment runoff covered by both sets of training catchments is 1,035 mm a−1. Within the extrapolation area, runoff reaches values of up to 4,738 mm a−1. However, only 3.6% of the extrapolation area shows runoff values above 1,035 mm a−1. The maximum average slope gradient covered by both sets of training catchments is 22.6° (Table A1). Only about 2.8% of the extrapolation area shows higher slope gradients. As 94.6% of the extrapolation area shows values for both predictors which are covered by both subsets of training catchments, it is concluded that a probable bias due to a nonvalid extrapolation to the remaining area is negligible for the overall results.

[66] The averages and distributions of land cover classes and wetlands are comparable among the two sets of training catchments. Notable proportions of the training catchments are dominated (>60% cover) by coniferous forests (ACF) (small c.: 62/243, large c.: 38/207) or broadleaf forests (ABF) (small c.: 57/243, large c.: 30/207). This suggests that the effects of these land cover classes are well distinguished from effects of the other land cover. On the contrary, there are no training catchments dominated by wetlands (AWL) or herbaceous vegetation (AHV). With regard to the extrapolation area, especially AWL is underrepresented by the training catchments (Table A1). However, as the comparison with literature values for wetland dominated river catchments in Canada did not reveal a general over- or underestimation of fluvial DOC fluxes (cf. Table 7), a general bias from this predictor is excluded at least for fDOC,mod[L].

[67] The statistical distributions of predictors ACF, ABF, AHV, and AWL are similar within both sets of training catchments. The value ranges of the predictors runoff and slope gradient are larger within the small catchments and fully cover those within the large catchments. Thus, it is concluded that the DOC loss estimation based on fDOC,mod[S] and fDOC,calc within the large catchments is valid.

[68] The relative difference between the averages of fDOC,mod[S] and fDOC,mod[L] for the extrapolation area (23%) is close to the relative difference between fDOC,mod[S] and fDOC,calc within the large catchments (19%). This indicates that for the extrapolation area, the estimated in-river DOC loss is reasonable and not generally biased by an invalid extrapolation of the empirical DOC flux equations. Nevertheless, this indirect method to assess DOC loss bears a high level of uncertainty (relative standard error 79%). However, it could be shown with a high level of certainty (probability of 90%) that a downstream decrease in fDOC from the small to the large catchments takes place.
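The relation between a relative standard error of 79% and a probability of about 90% for a positive DOC loss can be illustrated under a simplifying assumption. If the loss estimate is treated as normally distributed around its mean (an assumption made here for illustration; the text derives its uncertainties by Monte Carlo simulation and does not state the distribution), the probability that the true loss exceeds zero is Φ(1 / relative standard error).

```python
import math


def prob_positive(relative_se: float) -> float:
    """P(X > 0) for X ~ Normal(mean, (relative_se * mean)^2) with mean > 0."""
    z = 1.0 / relative_se
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


# Relative standard error of the DOC loss estimate as reported in the text
p_loss_positive = prob_positive(0.79)
print(f"P(loss > 0) = {p_loss_positive:.2f}")
```

Under this assumption the result is close to the 90% stated above, so the large relative standard error and the fairly high confidence in a downstream decrease do not contradict each other.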

5. Conclusion

[69] As expected, the mean and total DOC fluxes estimated for the study area (North America south of 60°N) are higher using the regression equation for small catchments <2,000 km2 (1.82 t C km−2 a−1, 25.8 Mt C a−1, Std.Err. 12%) than using that for the large catchments >2,000 km2 (1.40 t C km−2 a−1, 19.9 Mt C a−1, Std.Err. 10%). The total difference of 5.9 Mt C a−1 (standard error 79%) is interpreted as the average DOC loss that occurs in river stretches leading from contributing areas of 494 km2 (average size of the 246 small catchments) to contributing areas of 26,525 km2 (average size of the 207 large catchments), giving an overall DOC loss rate of 23% within these river stretches.
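The arithmetic behind the loss estimate can be retraced from the four flux values given above. The following sketch only uses the reported numbers; the variable names are illustrative, and the implied study area is a derived consistency check rather than a value reported in this section.

```python
# Reported totals (Mt C a-1) and area-specific means (t C km-2 a-1).
total_small = 25.8  # extrapolation with the small-catchment equation
total_large = 19.9  # extrapolation with the large-catchment equation
mean_small = 1.82
mean_large = 1.40

# Absolute DOC loss and loss rate relative to the small-catchment estimate.
loss = total_small - total_large
loss_rate_total = loss / total_small
loss_rate_mean = (mean_small - mean_large) / mean_small

# Implied extrapolation area (km2): total flux in t C a-1 divided by the
# mean specific flux. Both estimates should refer to the same area.
area_small = total_small * 1e6 / mean_small
area_large = total_large * 1e6 / mean_large

print(f"DOC loss: {loss:.1f} Mt C a-1")
print(f"Loss rate from totals: {loss_rate_total:.1%}")
print(f"Loss rate from means:  {loss_rate_mean:.1%}")
print(f"Implied area: {area_small / 1e6:.1f} vs {area_large / 1e6:.1f} million km2")
```

Both ways of computing the loss rate round to 23%, and the two implied areas agree to within rounding of the reported values.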

[70] The identified and applied predictors for the DOC flux estimation confirm those described in the literature: runoff, slope gradient, land cover, and wetland area proportions. Applying the same set of predictors for different catchment size classes allows the estimation of in-river DOC loss at regional to global scales, which is a main advantage of the methodological concept presented here over studies addressing single river systems.

[71] It was shown that from small to large catchments the relative importance of land cover as a predictor decreased whereas that of runoff and wetland area proportions increased. This hints at differences in the lability of fluvially transported DOC depending on its terrestrial source, with wetland-derived DOC being less labile. However, further research is needed to confirm this hypothesis at regional to global scales.

[72] As the smallest headwaters are not represented in the data set, and as the literature suggests DOC losses in such headwaters to be substantial, the estimated specific DOC fluxes from small catchments are still a conservative estimate of DOC exports from the terrestrial system. Further, stormflow events, which substantially contribute to DOC fluxes, are not well represented, leading to an underestimation of fluvial DOC fluxes, specifically from small catchments. These two facts imply that our DOC loss estimate represents a conservative estimate of total DOC loss within the whole fluvial network.

[73] To better assess in-stream/in-river DOC losses at regional to global scales, the total exports of DOC from the terrestrial system into streams and rivers need to be assessed in future studies. For this, more data on DOC fluxes are needed that cover the smallest headwater catchments and the large variety of land properties. With regard to these smallest headwaters, there is a need for improved geodata sets with a higher spatial resolution, especially for runoff and soil properties. Further, more data on the quality of DOC are needed to allow for an evaluation of its susceptibility to decomposition during fluvial transport.

Appendix A

[74] This appendix provides an overview of the statistical distribution of the catchment parameters considered in this study (Table A1) and the correlations between the different catchment parameters (Table A2). Further, a map of slope gradients within the study area is provided (Figure A1), which helps in interpreting this parameter's effect on the DOC flux estimations.

Table A1. Statistical Distribution of Parameters Within the Small and the Large Catchments, and the Spatially Explicit Extrapolation Raster
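| Parameter | Subset (a) | Mean | Median | Min | Max | 10th Perc. | 90th Perc. | Std. |
|---|---|---|---|---|---|---|---|---|
| q (mm a−1) | small c. | 536 | 496 | 0 | 2,543 | 78 | 1,071 | 387 |
| | large c. | 297 | 268 | 0 | 1,035 | 24 | 642 | 228 |
| | raster | 306 | | 0 | 4,738 | | | 391 |
| s (deg) | small c. | 5.17 | 2.66 | 0.49 | 26.61 | 1.10 | 15.55 | 5.61 |
| | large c. | 6.63 | 6.41 | 0.68 | 22.63 | 1.12 | 12.50 | 4.79 |
| | raster | 4.21 | | 0.00 | 61.82 | | | 5.96 |
| ABF (1) | small c. | 0.34 | 0.28 | 0.00 | 0.97 | 0.02 | 0.75 | 0.28 |
| | large c. | | | | | | | |
| | raster | 0.25 | | 0.00 | 1.00 | | | 0.33 |
| ACF (1) | small c. | 0.32 | 0. | | | | | |
| | large c. | 0.30 | 0.23 | 0.00 | 0.92 | 0.01 | 0.69 | 0.26 |
| | raster | 0.31 | | 0.00 | 1.00 | | | 0.41 |
| AHV (1) | small c. | | | | | | | |
| | large c. | | | | | | | |
| | raster | 0.15 | | 0.00 | 1.00 | | | 0.26 |
| AWL (1) | small c. | | | | | | | |
| | large c. | | | | | | | |
| | raster | 0.15 | | 0.00 | 1.00 | | | 0.25 |
| DOC conc. (mg C L−1) | small c. | 4.53 | 3.83 | 0.60 | 17.59 | 1.32 | 9.03 | 3.20 |
| | large c. | 4.59 | 3.90 | 0.58 | 18.11 | 1.65 | 7.87 | 3.11 |
| fDOC,calc (t C km−2 a−1) | small c. | 2.24 | 1.42 | 0.00 | 13.38 | 0.24 | 5.07 | 2.15 |
| | large c. | 1.09 | 0.83 | 0.00 | 4.87 | 0.14 | 2.35 | 0.92 |
| fDOC,mod[S] (t C km−2 a−1) | small c. | | | | | | | |
| | large c. | 1.35 | 1.24 | 0.02 | 3.96 | 0.38 | 2.42 | 0.82 |
| | raster | 1.82 | | 0.00 | 17.61 | | | 1.61 |
| fDOC,mod[L] (t C km−2 a−1) | small c. | 1.88 | 1.70 | 0.01 | 5.51 | 0.45 | 3.38 | 1.11 |
| | large c. | | | | | | | |
| | raster | 1.40 | | 0.00 | 15.24 | | | 1.61 |

(a) Subset of training catchments (c.) or the output raster of the spatially explicit application and thus extrapolation of the respective DOC flux estimation equations which represents the entire study area.
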
Table A2. Correlations Between Potential Predictors and the Calculated Specific DOC Fluxes (a)

| Parameter | Mean | Std. Dev. | s (deg) | q (m a−1) | ABF (1) | ACF (1) | AHV (1) | Aother (1) | AWL (1) | Topsoil Clay (1) | Topsoil Corg (kg m−2) | NPP (kg C m−2 a−1) | Mean Air Temp. (°C) | Precipitation (m a−1) | Lake Area Prop. (1) | log10(Catchm. Area (m2)) | CTImod | fDOC,calc (t C km−2 a−1) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| s (deg) | 5.84 | 5.30 | 1.00 |
| q (m a−1) | 0.43 | 0.34 | 0.19 | 1.00 |
| ABF (1) | 0.32 | 0.25 | − |
| ACF (1) | 0.31 | 0.30 | 0.45 | 0.26 | 0.48 | 1.00 |
| AHV (1) |
| Aother (1) |
| AWL (1) | −0.030.01−0.081.00 |
| Topsoil clay (1) |
| Topsoil Corg (kg m−2) | 1.67 | 1.22 | 0.11 | 0.44 | 0.03 | 0.30 | 0.27 | 0.31 | 0.30 | 0.44 | 1.00 |
| NPP (kg C m−2 a−1) | 0.49 | 0.22 | −0.06 | 0.45 | 0.38 | 0.34 | 0.59 | 0.62 | 0. |
| Mean air temp. (°C) | 8.34 | 5.22 | 0.50 | 0.17 | 0.27 | 0.40 | 0.11 | 0.18 | −0.08 | 0.48 | 0.34 | 0.15 | 1.00 |
| Precipitation (m a−1) | 0.95 | 0.38 | 0.15 | 0.70 | 0.35 | 0.08 | 0.30 | 0.37 | 0. |
| Lake area prop. (1) | − |
| log10(catchm. area (m2)) | −− |
| fDOC,calc (t C km−2 a−1) | 1.721.790.290.580.−0.050.510.440.310.191.00 |

(a) Based on the entire set of 453 catchments. Statistically significant (p < 0.05) correlations are highlighted in bold.

Figure A1. Slope gradient within the study area (based on SRTM data, see text).


DOC

dissolved organic carbon.

DOM

dissolved organic matter.

fDOC

specific fluvial DOC flux.

fDOC,calc

fDOC calculated from hydrochemical data.

fDOC,mod[S]

fDOC estimated spatially explicit by regression equation fitted for catchments <2,000 km2 (‘small catchments’).

fDOC,mod[L]

fDOC estimated spatially explicit by regression equation fitted for catchments >2,000 km2 (‘large catchments’).

s

slope gradient.

ABF

areal proportion of broadleaf forests.

ACF

areal proportion of coniferous forests.

AHV

areal proportion of herbaceous vegetation.

Aother

areal proportion of other land cover.

AWL

areal proportion of wetlands.

Topsoil clay

topsoil clay content.

Topsoil Corg

topsoil organic carbon content.

NPP

net primary production.

VIF

variance inflation factor.

AIC

Akaike information criterion.


[75] This work was supported by the German Science Foundation (Cluster of Excellence “CliSAP” (EXC177) and the DFG grant HA7742/6-1). We thank the USGS, ‘Environment Canada’, the Alberta Ministry of the Environment, and the Ontario Ministry of the Environment for providing data. The two anonymous reviewers, the editor Dennis Baldocchi, and the associate editor are thanked for helping to improve the manuscript.