Controls of dissolved organic matter quality: evidence from a large-scale boreal lake survey



Inland waters transport large amounts of dissolved organic matter (DOM) from terrestrial environments to the oceans, but DOM also reacts en route, with substantial water column losses by mineralization and sedimentation. For DOM transformations along the aquatic continuum, lakes play an important role as they retain waters in the landscape, allowing more time to alter DOM. We know DOM losses are significant at the global scale, yet little is known about how the reactivity of DOM varies across landscapes and climates. DOM reactivity is inherently linked to its chemical composition. We used fluorescence spectroscopy to explore DOM quality in 560 lakes distributed across Sweden and encompassing a wide climatic gradient typical of the boreal ecozone. Six fluorescence components were identified using parallel factor analysis (PARAFAC). The intensity and relative abundance of these components were analyzed in relation to lake chemistry, catchment, and climate characteristics. Land cover, particularly the percentage of water in the catchment, was a primary factor explaining variability in PARAFAC components. Likewise, lake water retention time influenced DOM quality. These results suggest that processes occurring in upstream water bodies, in addition to the lake itself, have a dominant influence on DOM quality. PARAFAC components with longer emission wavelengths, or red-shifted components, were most reactive. In contrast, protein-like components were most persistent within lakes. Generalized characteristics of PARAFAC components based on emission wavelength could ease future interpretation of fluorescence spectra. An important secondary influence on DOM quality was mean annual temperature, which ranged between −6.2 and +7.5 °C. These results suggest that DOM reactivity depends more heavily on the time taken to pass through the landscape than on temperature.
Projected increases in runoff in the boreal region may force lake DOM toward a higher overall amount and proportion of humic-like substances.


Dissolved organic matter (DOM) has a central role in biogeochemical processes within lakes and has an active role in the global carbon cycle (Cole et al., 2007; Battin et al., 2009; Tranvik et al., 2009). Lakes receive and process DOM generated from catchment areas substantially greater than the area of the lake itself (Prairie, 2008). The mineralization of organic matter stored within soils is inhibited by numerous stabilization processes (Kögel-Knabner et al., 2008), however as organic matter is mobilized from catchment soils to the aquatic environment, it becomes far more reactive (Kalbitz et al., 2005) with the potential to become an important source of CO2 (Prairie et al., 2002; Sobek et al., 2005). The molecular composition of lake DOM is highly variable and dynamic, reflecting the source, amount of time processed within lakes, and biological reactivity (Berggren et al., 2007; Cory et al., 2007; Jaffe et al., 2008). While lake DOM inputs are largely of terrestrial origin, inputs of DOM from algae and macrophytes can also be highly relevant energy sources for the microbial loop (Yamashita & Tanoue, 2003; Lapierre & Frenette, 2009; Guillemette & Del Giorgio, 2011; Karlsson et al., 2012; Catalán et al., 2013). Thus, DOM can be utilized and transformed by numerous in-lake processes including microbial respiration, photolytic degradation, and flocculation (Bertilsson & Tranvik, 2000; Cory et al., 2007; Von Wachenfeldt & Tranvik, 2008; Koehler et al., 2012). Variability in the hydrological connectivity of different water sources may also have a regulating influence on the composition of DOM (Stedmon & Markager, 2003; Massicotte & Frenette, 2011).

Currently, there is widespread awareness that concentrations of DOM exported to aquatic ecosystems have increased in parts of Northern Europe and North America (De Wit et al., 2007; Monteith et al., 2007; Couture et al., 2012). Associated changes to the molecular composition of DOM with increased concentrations remain unclear, as monitoring programs have traditionally ignored or included limited measures of DOM quality. Consequently, the task of identifying mechanisms for increased DOM fluxes has been difficult (Tranvik & Jansson, 2002; Roulet & Moore, 2006). Previous links to declining sulfate deposition have relied on correlative analysis (Monteith et al., 2007), yet recent experimental work has provided support that increasing dissolved organic carbon (DOC) export from soils is associated with decreasing soil acidity (Evans et al., 2012). It is expected that catchment soils will continue to recover from acidification in the upcoming decade; however, regions such as the boreal ecozone, where precipitation is expected to increase, may also see continued additional DOM inputs due to greater runoff (Hongve et al., 2004; Eimers et al., 2008; Löfgren & Zetterberg, 2011).

Increasing DOM fluxes are particularly problematic for water utilities because there is a higher cost associated with removing DOM for drinking water purposes. Untreated DOM can contribute to poor taste and odor problems, increase the potential to form harmful disinfection by-products and contribute to biological regrowth within the distribution system (Matilainen et al., 2011). Thus, there is a growing demand to identify key DOM components found in surface waters that are most difficult to treat, as well as those that are easily removed during water treatment (Bieroza et al., 2009; Murphy et al., 2011).

The concentration of lake water DOM is influenced by a multitude of drivers varying in scale from regional climate influences, to catchment-specific variables, and lake water chemistry. Lake DOM concentration has been linked to mean annual temperature (Weyhenmeyer & Karlsson, 2009; Laudon et al., 2012), water discharge (Eimers et al., 2008), and hydrological connectivity between key landscape features such as wetlands and streams (Schiff et al., 1997; Köhler et al., 2008; Laudon et al., 2011). Increases in precipitation and discharge are anticipated in northern boreal regions (Kundzewicz et al., 2007), which would lead to shorter lake water retention times. For this reason, it is particularly relevant to understand how altered lake water retention times could influence the quality of DOM. At the catchment scale, DOM concentration has been strongly linked to land cover, particularly wetland area and the percentage of other water bodies in the catchment (Kortelainen, 1993; Dillon & Molot, 1997), water retention times (Hanson et al., 2011), differences in topographical gradients (Creed et al., 2003), the ratio of catchment to lake area, landscape positioning, slope, and catchment soil characteristics (Trumbore et al., 1992; Kalbitz et al., 2000). Accordingly, a multitude of factors may influence the overall concentration of DOM spatially and temporally.

Studies dedicated to examining how landscape and climate control the quality of DOM are also emerging. Land use can influence DOM quality. For example, more bioavailable DOM has been found in streams draining agricultural (Wilson & Xenopoulos, 2009; Williams et al., 2010) and urbanized areas (Baker & Inverarity, 2004). DOM draining forested landscapes has been identified as more bioavailable than DOM draining wetland-dominated catchments (Berggren et al., 2007). A recent detailed hydrological study of the St. Lawrence River identified the spatial connectivity of mixing water bodies as an important regulator of DOM composition (Massicotte & Frenette, 2011). In addition, studies on arctic rivers have found that hydrological transport, as well as photochemical (Cory et al., 2007) and biological processes (Mann et al., 2012), control the chemical composition of DOM. Changes to the composition of DOM can have societally relevant consequences. For instance, shifts in DOM from terrestrial to algal-derived composition within a drinking water reservoir have been linked to disinfection by-product formation potential (Kraus et al., 2011). A detailed understanding of how different DOM pools react over time can provide unique insight into how the biogeochemical function of DOM is also dynamic (Miller et al., 2009). During transport from terrestrial sources to the sea, DOM has been described as undergoing a shift in composition analogous to weathering (Smith & Benner, 2005) and typically becomes less colored during passage through the landscape (Weyhenmeyer et al., 2012).

Fluorescence spectroscopy is a relatively easy and cost-efficient method to characterize DOM in natural waters, whereby the fluorescence properties reflect differences in the composition of organic matter (Hudson et al., 2007). Fluorescence spectroscopy generates valuable insight into the source and level of processing of DOM and holds great promise for use in routine monitoring programs as well as for drinking water purposes, due to its selectivity and sensitivity (Baker & Inverarity, 2004; Jaffe et al., 2008; Henderson et al., 2009; Fellman et al., 2010). An especially useful advance has been to decompose the fluorescence spectra into underlying fluorescent components with multivariate data analysis, such as parallel factor analysis (PARAFAC) (Bro, 1997; Stedmon & Bro, 2008), and subsequently to identify patterns in these underlying fluorescence components (Stedmon & Markager, 2005; Walker et al., 2009).

The aim of this study was to link the fluorescence characteristics of lake water DOM to lake chemistry, catchment, and climate characteristics across a wide range of land-use and climate gradients spanning the full geographical area of Sweden. By using multivariate statistical approaches, we aim to decipher the relative importance of climate, catchment, and lake water chemistry in driving DOM quality. Our final goal was also to identify unique PARAFAC components that could be useful proxies for DOM reactivity, by recognizing patterns between components most readily lost within lakes and those found to be most persistent within lakes. These findings are useful for predicting the effects of changing climate patterns and land-use on lake DOM quality and to assess its treatability for drinking water purposes.

Materials and methods

Study sites

Lakes were sampled within the annual Swedish lake water-monitoring program conducted by the Swedish University of Agricultural Sciences (SLU). Of the approximately 100 000 lakes in Sweden larger than 1 hectare, more than 1000 lakes are sampled annually. In 2009 and 2010, every third lake visited by the national monitoring program was analyzed for DOM fluorescence and absorbance characteristics. A total of 560 samples were measured, spanning the 13° latitudinal gradient of Sweden (Fig. 1). All sampling was performed by helicopter, whereby lake water was collected at a depth of ca. 1 m near the center of the lake during the autumn when lakes are well-mixed, starting in northern Sweden and moving south. We did not monitor fluxes of water or DOM quality from sources of lake inputs (e.g., streams, groundwater, and precipitation), and we assumed water and DOM within lakes were well mixed. Each lake was sampled once, and lakes were not necessarily hydrologically connected to each other. Sampling occurred between September 6 and November 11, 2009, and between September 26 and November 25, 2010. Lake water samples were transported to the laboratory within 2 days of collection, where they were stored at 4 °C in the dark. All analyses including water chemistry and spectroscopic analyses were conducted within days, and no longer than 2 weeks from collection.

Figure 1.

Spatial distribution of 560 study lakes across Sweden.

Climate and geographical variables

Latitude is negatively correlated with the wide range of mean annual temperatures (MAT) included in this study, spanning from −6.2 °C in the north to 7.5 °C in the south. MAT was adjusted for altitude with a −0.6 °C correction per 100 m (Livingstone et al., 1999). In addition to MAT, we used mean growing degree days (GDD) corresponding to the number of days in the year above 0 °C. GDD ranged from 110 days in the north to 210 days in the south. Mean annual precipitation (MAP) and GDD were derived from the Swedish Meteorological and Hydrological Institute (SMHI) as a mean between the years 1961 and 1990. Estimates of the mean long-term lake discharge (Q) [m3 s−1] from the reference period January 1, 1961 to December 31, 1990, and lake volumes (V) were available from SMHI. Where lake volumes were missing (73% of the lakes), we estimated lake volumes from lake area and maximum slope within a 50 m buffer (Sobek et al., 2011). Lake water retention times (WRT) were calculated as V/Q. The altitude of lakes was provided by Lantmäteriet. Land cover data for lake catchments was generated from the CORINE database (Hagner et al., 2005) developed by SLU, and expressed as percentage forest (conifer, deciduous, and mixed), wetland (open wetland, conifer/deciduous, and mixed forest on wetland), agriculture (landscaped green space, cultivated areas, pasture, and arable land), other (urban, poor vegetation, exploited land, and logging), and water (the surface area of the catchment covered by open water, not including the lake itself). Percentage agricultural coverage was converted to a ranked variable, because >50% of sites had <5% agriculture.
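The two calculations described above (WRT = V/Q and the altitude adjustment of MAT) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' workflow: the function names are hypothetical, the altitude term is expressed as a difference from an unspecified reference elevation because the text does not say what the reference is, and the conversion to years assumes V is in m3 and Q in m3 s−1.

```python
# Assumption: a mean year of 365.25 days is used to convert seconds to years.
SECONDS_PER_YEAR = 365.25 * 24 * 3600


def water_retention_time_years(volume_m3, discharge_m3_per_s):
    """Return WRT = V / Q, converted from seconds to years."""
    if discharge_m3_per_s <= 0:
        raise ValueError("discharge must be positive")
    return volume_m3 / discharge_m3_per_s / SECONDS_PER_YEAR


def altitude_corrected_mat(mat_c, altitude_diff_m, lapse_c_per_100m=-0.6):
    """Shift a temperature by the lapse rate for an altitude difference (m).

    altitude_diff_m is the lake altitude minus a reference altitude; the
    reference itself is an assumption, as the text does not define it.
    """
    return mat_c + lapse_c_per_100m * altitude_diff_m / 100.0


if __name__ == "__main__":
    # Hypothetical lake: 5 million m3 volume, 0.2 m3 s-1 mean discharge.
    print(round(water_retention_time_years(5.0e6, 0.2), 2), "years")
    # Hypothetical site 200 m above the reference elevation.
    print(round(altitude_corrected_mat(5.0, 200.0), 1), "deg C")
```

Keeping the unit conversion in one place makes it easy to check computed retention times against the range reported in Table 1.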

Lake water chemistry

Routine chemical analysis was conducted by the Department of Aquatic Sciences and Assessment (Swedish Agricultural University, SLU), a nationally accredited laboratory, using standard protocols for the analyses of lake water (SLU). Chemical variables included total organic carbon (TOC), pH, conductivity, alkalinity, the sum of base cations (BC; Ca+2, Mg+2, Na+1, and K+1), anions (SO4−2, F, and Cl), total nitrogen (TN), total inorganic nitrogen (TIN = NO3-N + NH4+-N), total organic nitrogen (TON = TN − TIN), total phosphorus (TP), total heavy metals (Cu+2, Zn+2, Cd+2, Pb+2, Ni+2, and Co+2), aluminum (Al), and iron (Fe). Lake water was not filtered before the analysis of TOC, TN or TIN; however, the vast majority of organic matter in boreal lakes is in the dissolved state (>90%) (Köhler et al., 2002; Kortelainen et al., 2006; Eimers et al., 2008; Von Wachenfeldt & Tranvik, 2008). Thus, TOC and TON are used here as measures of dissolved organic carbon (DOC) and dissolved organic nitrogen (DON). All samples were filtered prior to spectral analysis using a 0.45-μm filter.
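The derived nitrogen variables follow directly from the definitions above. The sketch below is illustrative only: the function names are hypothetical, concentrations are assumed to be in mg l−1, and the ratio is computed on a mass basis because the text does not state whether DOC : DON is a mass or a molar ratio.

```python
def total_organic_nitrogen(tn, no3_n, nh4_n):
    """TON = TN - TIN, where TIN = NO3-N + NH4-N (all in mg per litre)."""
    tin = no3_n + nh4_n
    return tn - tin


def doc_to_don_ratio(toc, ton):
    """DOC : DON, using TOC and TON as proxies for DOC and DON (mass basis)."""
    if ton <= 0:
        raise ValueError("organic nitrogen must be positive")
    return toc / ton


if __name__ == "__main__":
    # Hypothetical sample; the split of inorganic N between nitrate and
    # ammonium is invented for illustration.
    ton = total_organic_nitrogen(tn=0.46, no3_n=0.05, nh4_n=0.02)
    print(round(ton, 2), round(doc_to_don_ratio(13.2, ton), 1))
```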

UV-visible absorbance

The absorbance spectra of filtered DOM were measured across a wavelength range from 200 to 600 nm, at 1 nm intervals at a scan speed of 240 nm min−1 and slit width of 2 nm using a Lambda 40 UV-visible spectrophotometer (Perkin Elmer). Samples were measured in a 1 cm quartz cuvette, and Milli-q water was used as the blank. The shape of the absorption spectrum between 250 and 600 nm was characterized with an exponential model, $a_{\lambda} = a_{\lambda_0} e^{S(\lambda_0 - \lambda)}$, where $a_{\lambda}$ (m−1) is the absorbance coefficient at the wavelength $\lambda$ (nm), and $a_{\lambda_0}$ is the absorbance coefficient at a reference wavelength, $\lambda_0$ (250 nm). We used the measured decadal absorbance at 254 (A254) and 420 nm (A420). The slope (S250–600) is a measure of how steeply the absorbance decreases with increasing wavelength. SUVA (m2 g−1 C) was calculated as the DOC normalized absorbance at 254 nm.
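One way to estimate the spectral slope S from the exponential model above is a least-squares fit of ln(a) against wavelength. The sketch below takes that route as an assumption; the text does not say whether a log-linear or a nonlinear fit was used, and the function names, the Napierian conversion, and the example values are illustrative only.

```python
import math


def napierian_coefficient(absorbance, path_m=0.01):
    """Convert decadal absorbance in a cuvette to a Napierian coefficient (m-1)."""
    return math.log(10) * absorbance / path_m


def spectral_slope(wavelengths_nm, coefficients):
    """Estimate S (nm-1) from a = a0 * exp(S * (l0 - l)).

    Taking logs gives ln(a) = const - S * l, so S is minus the
    least-squares slope of ln(a) against wavelength.
    """
    ys = [math.log(a) for a in coefficients]  # requires positive values
    n = len(wavelengths_nm)
    mean_x = sum(wavelengths_nm) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in wavelengths_nm)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(wavelengths_nm, ys))
    return -sxy / sxx


def suva(a254_decadal, doc_mg_per_l, path_m=0.01):
    """DOC-normalized decadal absorbance at 254 nm (L mg-1 C m-1)."""
    return a254_decadal / path_m / doc_mg_per_l


if __name__ == "__main__":
    # Synthetic spectrum with a known slope, for illustration only.
    wl = list(range(250, 601, 1))
    a = [20.0 * math.exp(0.013 * (250 - w)) for w in wl]
    print(round(spectral_slope(wl, a), 4))
    print(round(suva(0.364, 10.0), 2))
```

A log-linear fit weights the low-absorbance, long-wavelength end of the spectrum more heavily than a nonlinear fit would, so the two approaches can give slightly different slopes on noisy data.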

Fluorescence measurements

Excitation-emission matrices (EEM) from lake water samples were analyzed in sample mode (S) using a fluorescence spectrophotometer (SPEX FluoroMax-2, Horiba Jobin Yvon) and a 1 cm quartz cuvette. The sample signal was then corrected for the reference (R) lamp signal, to get S/R. The excitation wavelengths (λex) spanned from 250 to 445 nm, at increments of 5 nm, while emission wavelengths (λem) ranged from 300 to 600 nm, at increments of 4 nm. Excitation and emission slit widths were set to 5 nm, and the integration time was 0.1 s. All EEMs were blank-subtracted using the EEM of Milli-q pure water run on the same day. Manufacturer supplied instrument correction factors were used to correct for instrument-specific biases (using Sc/Rc). Spectra were corrected for inner filter effects using the absorbance-based approach (Lakowicz, 2006; Kothawala et al., 2012), which was tested to be effective within the absorbance range used in this study (Kothawala et al., 2013). Fluorescence intensity was calibrated to Raman units by dividing the intensity by the Raman area of pure water integrated at an excitation of 350 nm, and over an emission range of 380 to 420 nm. Any residual traces of a Rayleigh peak were removed before PARAFAC analysis by transforming negative values to zero.
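The correction chain described above can be illustrated for a single excitation-emission pair. The sketch uses the commonly cited absorbance-based inner filter expression, F_corr = F_obs × 10^((A_ex + A_em)/2), which the text references but does not print, so its exact form here is an assumption, as are the function names and the trapezoidal integration of the water Raman peak.

```python
# Wavelength grids as described in the text.
EX_NM = list(range(250, 446, 5))  # 250-445 nm in 5 nm steps
EM_NM = list(range(300, 601, 4))  # 300-600 nm in 4 nm steps


def inner_filter_correct(f_obs, a_ex, a_em):
    """Absorbance-based correction using decadal absorbances at ex and em."""
    return f_obs * 10 ** ((a_ex + a_em) / 2.0)


def trapezoid_area(xs, ys):
    """Integrate ys over xs with the trapezoidal rule."""
    return sum(
        (xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
        for i in range(len(xs) - 1)
    )


def raman_area(em_nm, water_signal, lo=380, hi=420):
    """Area of the water Raman scan (excitation 350 nm) between lo and hi nm."""
    pts = [(w, s) for w, s in zip(em_nm, water_signal) if lo <= w <= hi]
    return trapezoid_area([p[0] for p in pts], [p[1] for p in pts])


def to_raman_units(intensity, area):
    """Express a corrected intensity in Raman units."""
    return intensity / area


if __name__ == "__main__":
    print(len(EX_NM), len(EM_NM))
    # Hypothetical flat water signal, only to show the call sequence.
    area = raman_area(EM_NM, [1.0] * len(EM_NM))
    corrected = inner_filter_correct(100.0, a_ex=0.10, a_em=0.05)
    print(round(to_raman_units(corrected, area), 3))
```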

PARAFAC analysis

Parallel factor analysis was applied to characterize the DOM fluorescence signal using MATLAB software (MATLAB® 7.7.0, The MathWorks, Natick, USA, 2008) and the DOMFluor toolbox (Stedmon & Bro, 2008). PARAFAC decomposes the EEM dataset into a series of tri-linear components, which explain the variability in fluorescence characteristics of the dataset. Prior to PARAFAC modeling, data in the region of second-order scatter was deleted (emission wavelength >500 nm). First-order scattering from water was removed by inserting a band of zeros where excitation was greater than emission, starting at 298 nm. Steps to perform PARAFAC analysis included the identification of outliers, minimization of sum of squares residuals, identification of the appropriate number of PARAFAC components, random initialization with 10 iterations, and model validation using split-half analysis (Stedmon & Bro, 2008). Accordingly, six PARAFAC components were found to provide a robust description of DOM fluorescence within the dataset. Four of the six components had multiple excitation loadings, and all emission loadings had smooth uni-modal peaks, as expected for fluorophores (Andersen & Bro, 2003). In addition to assessing the intensity of each PARAFAC component (Ci), the relative fluorescence intensity of each component was expressed as a percentage (%Ci) of the sum of all six component intensities (∑Ci) using %Ci = Ci/∑Ci × 100%. To evaluate the robustness of the six-component PARAFAC model in describing fluorescence characteristics of lake DOM across the Swedish landscape, we calculated the accumulated sum of squares error across excitation and emission wavelengths for three distinct subregions of the country and compared the results. The three subregions were as follows: Northern Alpine, Northern Lowlands, and Southern Lowlands.
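The relative abundance calculation, %Ci = Ci/∑Ci × 100%, is simple enough to show directly. The function name and the six example intensities below are hypothetical.

```python
def relative_abundance(intensities):
    """Return each component intensity as a percentage of the total."""
    total = sum(intensities)
    if total <= 0:
        raise ValueError("total fluorescence must be positive")
    return [100.0 * c / total for c in intensities]


if __name__ == "__main__":
    # Hypothetical C1..C6 intensities in Raman units for one lake.
    sample = [0.80, 0.55, 0.40, 0.30, 0.15, 0.10]
    percentages = relative_abundance(sample)
    print([round(p, 1) for p in percentages])
```

Because the percentages always sum to 100, a loss of one component raises the share of the others even when their intensities are unchanged, which is why the intensity and percentage models are interpreted together.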

Testing the robustness of PARAFAC Components

A total of six fluorescing PARAFAC components were identified (Fig. S1; Tables S1 and S2). Visual inspection of residual EEM spectra ensured that unaccounted fluorescence was near the baseline for each sample, with residual fluorescence <10% of measured fluorescence intensity. In addition, we did not find any systematic pattern of residual fluorescence by visually examining the average root mean squared error summed across excitation (n = 40) and emission (n = 76) wavelengths, even when the dataset was divided into three distinct subregions of Sweden (Fig. S2). The analysis of residuals along with split-half validation (Stedmon & Bro, 2008) ensured the six-component PARAFAC model was a robust description of lake DOM fluorescence.

Principal component analysis (PCA) and partial least squares (PLS) analyses

Principal component analysis (PCA) was used initially as a diagnostic tool to examine key relationships and to identify strongly correlated variables, excluding optical measurements. Partial least squares (PLS) was further applied to predict PARAFAC components (Y response variables) from lake water chemistry, catchment characteristics, and climate variables (X predictor variables listed in Table 1). The PLS approach was chosen due to its robustness to collinearity among variables: it identifies an optimal set of orthogonal principal components able to explain the maximum variability between X and Y variables. Samples with a disproportionately strong leverage on the overall model were identified as outliers (>95% confidence limit) using Hotelling's T2 analysis. To ensure assumptions of the models (PCA and PLS) were met, data were normalized using log-transformations where necessary, with skewness targeted to a min/max <0.1 (Eriksson et al., 2001). PLS models were constructed using all six PARAFAC components as the Y variables, expressed either as fluorescence intensities (Ci), or as percentages (%Ci). Climate, catchment, and lake water chemistry were X variables.

Table 1. Distribution of all variables included in partial least squares analysis including in-lake chemistry, catchment, climate and optical characteristics for boreal lakes included in the study (n = 560)

| Variables | Abbreviation | Mean ± Std Dev. | Min | Max |
|---|---|---|---|---|
| In-Lake Chemistry | | | | |
| DOC (mg l−1) | DOC | 13.2 ± 6.08 | 2.4 | 32.4 |
| pH | pH | 6.5 ± 0.7 | 4.52 | 8.01 |
| Total nitrogen (mg l−1) | TN | 0.46 ± | | |
| Total phosphorus (μg l−1) | TP | 13.1 ± | | |
| DOC : DON | DOC : DON | 34 ± 9 | 15 | 60 |
| Sum of Ca+2, Mg+2, K+1, Na+1 (meq l−1) | BC | 0.47 ± 0.31 | 0.04 | 1.88 |
| Dissolved inorganic nitrogen | DIN | 0.07 ± | | |
| Dissolved organic nitrogen | DON | 0.39 ± | | |
| Total Fe (mg l−1) | Fe | 0.66 ± 0.60 | 0.01 | 3.30 |
| Total Al (mg l−1) | Al | 0.14 ± | | |
| Total Mn (mg l−1) | Mn | 0.05 ± | | |
| Sum of SO4−2, Cl−1 (meq l−1) | Anions | 0.12 ± | | |
| Sum of Cu, Zn, Cd, Pb, Cr, Ni, Co, As, V (μg l−1) | Metals | 5.33 ± 2.91 | 0.94 | 14.16 |
| Catchment Characteristics | | | | |
| Water retention time (years) | WRT | 0.86 ± 0.89 | 0.00 | 3.92 |
| Altitude (m) | Alt. | 215 ± 161 | 7 | 848 |
| Slope (°) | Slope | 8 ± 4 | 1 | 29 |
| Lake Area : Catchment Area | Lk : Catch | 0.15 ± | | |
| Runoff (m3 sec−1) | Runoff | 1.195055.546 | | |
| Water (% of Catchment) | % Water | 9.4 ± 7.1 | 0 | 48 |
| Forest (% of Catchment) | % Forest | 63 ± 15 | 0 | 98 |
| Wetland (% of Catchment) | % Wetland | 12 ± 12 | 0 | 87 |
| Agriculture (% of Catchment) (a) | % Agr | 5 ± 11 | 0 | 97 |
| Other (% of Catchment) | % Other | 12 ± 13 | 0 | 91 |
| Climate Variables | | | | |
| Mean annual temperature (°C) (b) | MAT | 2.5 ± 3.5 | −6.2 | 7.5 |
| Mean annual precipitation (mm) | MAP | 792 ± 150 | 450 | 1250 |
| Mean growing degree days | GDD | 175 ± 28 | 110 | 210 |
| Absorbance Indexes | | | | |
| Specific absorbance at 254 nm (L mg−1 C m−1) | SUVA | 3.64 ± 0.77 | 1.42 | 5.52 |
| Absorbance at 420 nm (m−1) | A420 | 0.05 ± | | |
| Spectral slope (nm−1) | S250–600 | 0.013 ± 0.001 | 0.011 | 0.017 |

(a) % Agr converted to a ranked variable for PLS analysis.
(b) MAT corrected for altitude.

To interpret loadings plots of PLS analysis, the distance and positioning of Y variables (PARAFAC components) relative to X variables was revealing of how well, or poorly, they related to each other. The greater the distance a variable (X or Y) was from the origin, the greater its overall influence. Variables situated close together on the PLS plot were positively correlated, while variables opposite to each other were negatively correlated. We performed internal cross-validation, to test the repeatability of the analysis, by removing a random subset of data (1/7th of the samples) to be used as the response dataset, while parallel models were run on the reduced calibration dataset. A comparison of predicted values from the calibration and response datasets allowed computation of the predictive residual sum of squares, expressed as a Q2Y. Overall, PLS model performance was based on the cumulative goodness of fit (R2Y, explained variation), and the cumulative goodness of prediction (or Q2Y, predicted variation) for each model. The variable influence on projections (VIP scores) provided a means of interpreting the importance of X variables on the overall model. X-predictor variables with a VIP score ≥1 were considered highly influential, between 0.8 and 1.0 moderately influential, and <0.8 less influential predictors.
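The fit statistics and the VIP thresholds described above can be expressed compactly. This sketch assumes the usual definitions R2 = 1 − RSS/SS_total and Q2 = 1 − PRESS/SS_total, where PRESS is computed from cross-validated predictions; the function names are hypothetical and the PLS model itself is not reproduced here.

```python
def explained_fraction(observed, predicted):
    """Return 1 - sum((obs - pred)^2) / sum((obs - mean)^2).

    With fitted values this is R2; with cross-validated predictions
    (each sample predicted by a model that did not see it) it is Q2.
    """
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((o - mean_obs) ** 2 for o in observed)
    ss_resid = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


def classify_vip(score):
    """Apply the thresholds from the text to a single VIP score."""
    if score >= 1.0:
        return "highly influential"
    if score >= 0.8:
        return "moderately influential"
    return "less influential"


if __name__ == "__main__":
    # Hypothetical observed values and cross-validated predictions.
    obs = [10.0, 12.0, 15.0, 20.0, 26.0]
    pred_cv = [11.0, 12.5, 14.0, 21.0, 24.5]
    print(round(explained_fraction(obs, pred_cv), 3))
    for vip in (1.3, 0.9, 0.4):
        print(vip, classify_vip(vip))
```

A Q2 close to R2 suggests the model is not overfitted, whereas a Q2 far below R2 indicates that the explained variation does not carry over to samples left out of the calibration set.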

A permutation test was performed to estimate how much of the explained variability could be attributed to random chance. After creating a PLS model for each Y variable separately, the Y variable was randomly permuted and the model was re-run. The y-intercepts of the R2 and Q2 regression lines passing through the unpermuted dataset and 100 permuted datasets provided an estimate of the background correlation due to chance for each Y variable. Small model background correlations indicated robust models (Eriksson et al., 2001). All PCA and PLS analyses were carried out in SIMCA version 13.0.2 (Umetrics AB, Umeå, Sweden). Once key relationships between Y and X variables were identified, simple linear and exponential correlation analyses were performed independently of PLS analysis using JMP version 10.0 (SAS Institute, Cary, NC).
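The logic of the permutation test can be illustrated with a deliberately simplified stand-in. The sketch below replaces the PLS model with a single-predictor linear regression (so R2 is the squared Pearson correlation) and covers only the R2 intercept, not Q2; it is not the SIMCA procedure, and all names and data are hypothetical.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def linear_fit(xs, ys):
    """Return (slope, intercept) of the least-squares line."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def background_r2(xs, ys, n_perm=100, seed=0):
    """Return (R2 of the real model, intercept estimating chance R2).

    Each permuted model contributes one point: its similarity to the
    original y (absolute correlation) against its R2. The line through
    these points and the unpermuted point (similarity 1) is extrapolated
    to similarity 0.
    """
    rng = random.Random(seed)
    similarity = [1.0]
    r2_values = [pearson(xs, ys) ** 2]
    for _ in range(n_perm):
        y_perm = list(ys)
        rng.shuffle(y_perm)
        similarity.append(abs(pearson(ys, y_perm)))
        r2_values.append(pearson(xs, y_perm) ** 2)
    _, intercept = linear_fit(similarity, r2_values)
    return r2_values[0], intercept


if __name__ == "__main__":
    data_rng = random.Random(42)
    x = [float(i) for i in range(40)]
    y = [2.0 * xi + data_rng.gauss(0.0, 5.0) for xi in x]
    r2, background = background_r2(x, y)
    print("R2 =", round(r2, 2), "background =", round(background, 2))
```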


Prediction of PARAFAC components

The relative positioning of Y and X variables differed substantially when Y variables were expressed as PARAFAC component intensities (C1 to C6; referred to as the C-intensity model) (Fig. 2a), or as a percentage (%C1 to %C6; referred to as the C-percentage model) (Fig. 2b). Insight from both PLS models provided a more robust interpretation of potential factors driving DOM quality.

Figure 2.

Partial least squares loadings plots predicting the variability of six PARAFAC components (Y variables) expressed as (a) fluorescence intensities, and (b) as a percentage, with several climate, geographical, in-lake chemistry and spectral characteristics as predictors (X variables).

The C-intensity model resulted in four principal components, explaining a high amount of the overall variation (R2Y = 0.61), with good predictive ability (Q2Y = 0.61) (Table S3) and low background correlations (Table S4). C3, C4, and C1 clustered near C2, which was situated near highly influential X predictor variables, including DOC concentration and covarying variables (A420, SUVA, Fe, Al, metals, HIX) (Fig. 2a). C6 was situated furthest away from highly influential X predictor variables, and C5 lay between C6 and C2 (Fig. 2a). The strength of explained variation (R2Y) and the goodness of prediction (Q2Y) for C1 to C6 in the C-intensity model (Table S4) were consistent with their relationships to DOC concentration (Table S5), being strongest for C3 (R2 = 0.81, P < 0.0001), and weakest for C6 (R2 = 0.05, P < 0.0001) (Table S5). Consequently, the C-intensity model resulted in poor separation between Y response variables, with little explanation by X predictor variables, due to the strong influence of DOC concentration on PARAFAC component intensities.

For the C-percentage model, two broad-scale patterns were evident. Firstly, the relative positioning of both X and Y variables situated along the first principal component (horizontal axis, 26%) was a function of DOC concentration (Fig. 2b) and correlated variables (Fe, Al, pH, SUVA). In addition, the %Water in the catchment and DOC / DON were highly influential X variables (VIP > 1.0). Secondly, separation along the second principal component (vertical axis, 10%) corresponded to MAT and correlated variables (e.g., GDD, TN and TP) (Fig. 2b). In addition, PARAFAC components with longer λem (red-shifted) were positioned to the right on the first principal component and hence were most abundant in lakes with higher DOC concentration, while shorter λem components were associated with lower DOC concentration. Overall, the C-percentage model resulted in four principal components, with an explained variance of R2Y = 0.48 and predictive ability of Q2Y = 0.45 (Table S3), with low background correlation (0.08 to 0.11) (Table S4), and the first and second principal component axes could be related to DOC concentration and MAT, respectively (Fig. S3).

In contrast to C6 in the C-intensity model, %C6 was the Y response variable with greatest amount of overall explained variance and predictability (R2Y = 0.75, Q2Y = 0.73), followed by %C3 (R2Y = 0.64, Q2Y = 0.62) (Table S4). Accordingly, %C6 and %C3 were situated at opposing ends of the loadings plot (x axis) (Fig. 2b), and their relationships with X predictor variables were likewise opposite to each other. C3 is a humic-like component and C6 is a protein-like component. The explained variation and predictability of %C2 was also moderate (R2Y = 0.52, Q2Y = 0.49) and positively correlated with %C6 (Fig. 2b).

There was a group of highly influential X predictor variables clustered near %C3, with strong positive correlations including DOC concentration (Fig. 3e), and covariates (Fe, Al, metals, and SUVA) (Fig. 2b). DOC concentration and lake water pH were negatively correlated based on simple linear regressions (R2 = 0.16, P < 0.0001), as evident from the PCA loadings plot, where they appear opposite to each other (Fig. S4). Accordingly, these two X variables are situated at opposing ends of the loadings plot with %C6 situated near lake water pH. DOC / DON was strongly influential in predicting %C6 and %C3 (Fig. 2b), as confirmed in a scatter plot (Fig. 3g and h). Simple absorbance metrics, SUVA and S250–600, were also highly influential X predictors. SUVA had a strong positive relationship with %C3, while S250–600 was negatively related to %C6 (Fig. 2b). The removal of absorbance metrics from PLS analysis reduced the R2Y on the first principal component axis by 0.01, yet we include them simply to show how %C3 and %C6 are paired with these commonly used metrics.

Figure 3.

Relationships between the relative abundance of two key PARAFAC components %C3 and %C6, and the percentage of water in the surrounding catchment (%Water) (a, b), lake water retention time (WRT) (c, d), dissolved organic carbon (DOC) (e, f), and DOC to dissolved organic nitrogen (DOC:DON) (g, h) (Regression statistics in Table 2).

%Water was identified as a highly influential X variable in the C-percentage model. Lakes in this study with a high % Water in the surrounding catchment also tended to have a high WRT (WRT = 27.9 × %Water + 40.4, R2 = 0.36, P < 0.0001), despite being measures that are independent of each other, and thus, while weaker, there were significant relationships between WRT and both %C3 (negative) and %C6 (positive) (Fig. 3c, d). When we examined how the intensity of C3 and C6 varied directly with %Water, WRT, DOC, and DOC/DON (Fig. 4), we found a highly significant exponential decrease in humic-like C3 with greater %Water, and likewise with longer WRT (Fig. 4a and c). In contrast, with increasing %Water and longer WRT, the amount of C6 fluorescence remained stable (Fig. 4b and d). Regression statistics for relationships presented in Figs 3 and 4 are provided in Table 2.

Table 2. Regression statistics for relationships between the intensity (C3 and C6) and relative intensity of PARAFAC components (%C3 and %C6) with select variables (Figs 3 and 4).

| | %Water | WRT | DOC | DOC : DON |
|---|---|---|---|---|
| %C3 | R2 = 0.23; (−0.2, 17.5); P < 0.0001 | R2 = 0.17; P < 0.0001 | R2 = 0.18; P < 0.0001 | R2 = 0.32; (0.2, 8.4); P < 0.0001 |
| %C6 | R2 = 0.20*; P < 0.0001 | R2 = 0.12; P < 0.0001 | R2 = 0.54; P < 0.0001 | R2 = 0.35; P < 0.0001 |
| C3 | R2 = 0.19; P = 0.01 | R2 = 0.12 | R2 = 0.81*; P < 0.0001 | R2 = 0.24; P < 0.0001 |
| C6 | R2 = 0.01; P = 0.01 | R2 = 0.00 | R2 = 0.05; P < 0.0001 | R2 = 0.03; P < 0.0001 |

The adjusted coefficient of determination (R2) is given first, followed by the slope and intercept (k, b) in brackets and the significance level as a P-value; ns indicates nonsignificant relationships. All relationships are exponential, with the exception of linear relationships marked with *.

Figure 4.

Relationships between the intensity of PARAFAC components C3 and C6 in Raman units (RU), with the percentage of water in the surrounding catchment (%Water) (a, b), lake water retention time (WRT) (c, d), dissolved organic carbon (DOC) (e, f), and DOC to dissolved organic nitrogen (DOC : DON) (g, h) (Regression statistics in Table 2).

Along the second principal component axis, %C5 was found to cluster with several variables, particularly MAT, which is strongly correlated to GDD, ions (anions and base cations), nutrients (TON, TIN, TN, TP), and % agriculture (Fig. 2b). While relationships derived along the second principal component were not as strong as the first, there was a significant relationship between %C5 and MAT (%C5 = 0.32 MAT + 0.00, R2 = 0.10, P < 0.0001).


Using a multivariate approach comparing 560 lakes spanning a geographical area of approximately 450 000 km2 across Sweden, we were able to discriminate between the relative influence of lake water chemistry, catchment characteristics, and climatic factors, on the quality of lake DOM across a broad geographical scale.

The most influential land cover variable predicting the composition of DOM was the percentage of water in the surrounding catchment (%Water), as revealed from both PLS analysis and direct regression analysis (Fig. 2b, Fig. 3a, b). %Water was defined as the relative areal coverage of open water in the catchment, not including the lake itself. For instance, aside from small ponds in the catchment, headwater lakes generally had a negligible %Water (≈ 0%). The %Water has previously been identified as the main feature explaining variation in DOC concentration across several Finnish lake survey studies (Kortelainen & Mannio, 1988; Rantakari et al., 2004; Mattsson et al., 2005). As in the Finnish studies, this study also finds lower DOC concentrations in lakes with a higher %Water, likely due to the greater possibility for DOM to be lost, produced, and transformed during in-lake processing in upstream water bodies (Kortelainen, 1993). Lakes with a higher %Water will have a lower areal transport of DOM derived from terrestrial sources. With increasing %Water and longer WRT, we found that the quality of DOM shifted, with greater proportions of a protein-like component (%C6) and lower proportions of the terrestrially derived humic-like component (%C3) (Fig. 3). Based on component intensities, C3 was lost, but the amount of C6 remained stable regardless of %Water and WRT (Fig. 4). The hydrological retention time of lake water has been identified as an important factor influencing the loss of terrestrial DOM (Hanson et al., 2011). The net loss of DOC and preferential loss of colored DOM (based on SUVA, Fig. 2b) in lakes with longer WRT has been documented previously and attributed to losses from microbial degradation and photodegradation, as well as losses to flocculation, and new algal inputs from primary productivity (Curtis & Schindler, 1997; Kraus et al., 2011; Weyhenmeyer et al., 2012; Köhler et al., 2013).
However, traditional calculations of lake WRT neglect the retention time of water in upstream water bodies. In fact, a recent study revealed that the explanatory power of the exponential relationship used to describe the loss of colored DOM with longer WRT almost doubled (R2 from 0.11 to 0.21) when the accumulated WRT of upstream inputs was included alongside conventional measures (Müller et al., 2013). Hence, while traditional WRT estimates only consider water retention within the specific lake of interest, %Water is a better proxy for the total time DOM is exposed to in-lake processing within the catchment. This study confirms that WRT influences DOM quality, but further emphasizes the importance of treating the catchment as a whole when evaluating potential drivers of DOM quality.
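The definition of %Water given above can be written as a short calculation. This is a minimal sketch: the function name, the argument layout, and the choice of total catchment area as the denominator are assumptions, since the text specifies only that the lake itself is excluded from the open water being counted.

```python
def percent_water(catchment_area, upstream_water_areas):
    """Percentage of the catchment covered by open water, excluding the lake itself.

    catchment_area: total catchment area (assumed denominator).
    upstream_water_areas: areas of open water bodies in the catchment other
        than the sampled lake, in the same units as catchment_area.
    """
    if catchment_area <= 0:
        raise ValueError("catchment_area must be positive")
    return 100.0 * sum(upstream_water_areas) / catchment_area

# Hypothetical catchments, areas in arbitrary but consistent units.
headwater = percent_water(12.0, [])            # no upstream water bodies
downstream = percent_water(50.0, [2.0, 3.0])   # two upstream lakes
```

A headwater lake with no upstream water bodies gets a %Water of zero regardless of its own size, which is why %Water can act as a proxy for upstream processing rather than for retention in the sampled lake.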

The inverse patterns of %C3 and %C6 across gradients of %Water, WRT, and DOC were also reflected in DOC/DON. This may be a result of N-rich compounds comprising the protein-like component C6 being selectively retained. DON can be retained efficiently in lakes with increasing WRT (Windolf et al., 1996). Likewise, during passage through the landscape, DON may have been retained and even recycled, despite continued net utilization of DOC. An arctic stream study also found that more processed DOM contained more DON, while terrestrial sources of DOM were generally poorer in DON (Cory et al., 2007). The wide range of DOC/DON observed in this particular dataset (Table 1) is attributed to the inclusion of hyper-oligotrophic lakes common to northern Sweden, along with southern eutrophic lakes where historical N deposition was greater (Bergström et al., 2005).

A secondary influence on DOM quality was MAT and a group of associated geographical variables (Fig. 2b). As MAT lies along the second principal component axis, results from this study suggest that %Water and WRT are more influential on DOM quality than regional-scale MAT. Warmer temperatures, more growing degree days, and a greater percentage of agriculture increased the relative abundance of C5. The positioning of C5 on the EEM is in the region of Peak A (Coble, 1996), a very common and highly prevalent region associated with humic-like fluorescence from terrestrial sources. Regional-scale associations between MAT and DOM concentration were expected, because optimal conditions for DOM production in soils and export to inland waters are between 0 and +3 °C (Laudon et al., 2012). MAT may have an indirect influence on DOM quality through a wide range of potential factors, including shifts in vegetation cover and litter quality, the balance between primary productivity and decomposition rates, the length of the ice-free season, and even the length of exposure to solar radiation, to mention only a few.

The actual and relative intensity of two PARAFAC components, C3 (%C3) and C6 (%C6), provided the most insight into DOM quality. The positioning of C6 resembles that of the amino acid tryptophan, which indicates autochthonous algal and microbially derived proteins (Yamashita & Tanoue, 2003); however, C6 may also include some polyphenolic compounds, including tannins and lignins (Maie et al., 2007; Hernes et al., 2009). Controlled bottle incubation studies have found fluorescence in the protein-like region to be biologically reactive, both produced and degraded over time (Lønborg et al., 2010; Guillemette & Del Giorgio, 2011). The stable levels of C6 (and increasing %C6) in lakes with greater %Water and longer WRT suggest that it may be a more persistent component, constantly being renewed despite the loss of other components. The negative association of %C6 with A420 and SUVA, and the positive association with S250–600 observed here, provide further evidence that C6 consists of weakly colored by-products of microbial degradation, photo-bleaching (Galgani et al., 2011), and production from algae.

The intensity and percentage of C3 were greatest in lakes with lower %Water and shorter WRT. This suggests that C3 was most susceptible to in-lake loss processes and may have been lost rapidly from the water column by mineralization to CO2, flocculation, or transformation into other DOM by-products. A potentially important loss process for C3 may have been flocculation. A substantial proportion of DOM (8 to 22%), generally of terrestrial origin, can be lost from the water column solely by flocculation to lake sediments (Von Wachenfeldt et al., 2008). The abundance of free cations (iron, aluminum, hydrogen, and metals) strongly associated with C3 increases the opportunity for ionic associations such as ligand exchange or cation bridging, reducing the solubility of DOM comprising C3 and improving its ability to naturally flocculate out of the water column (Tipping, 1981; Lofts & Tipping, 1998; Luider et al., 2003). C3 fluorescence was situated at the longest emission wavelength of the humic-like components, indicating molecules with higher molecular weight that contain more conjugated structures and are more hydrophobic (Wu et al., 2003; Hunt & Ohno, 2007). C3 also had the strongest positive relationship with SUVA, an indicator of darker DOM with aromatic structures (Weishaar et al., 2003) (Fig. 3). Other studies have likewise found that the most aromatic, highly colored, large molecular weight fractions of DOM have shorter half-lives in aquatic systems (Weyhenmeyer et al., 2012; Köhler et al., 2013).

The potential susceptibility of C3 to flocculate out of lake water is of direct relevance to drinking water treatment, particularly where iron or aluminum is used as a flocculating agent. An extensive study examining changes during water treatment found that two protein-like PARAFAC components increased relative to humic-like components (Beggs & Summers, 2011). While the two humic-like components (λex : λem of 240, 340 : 464 and 240, 300 : 420) identified by Beggs & Summers (2011) are not directly comparable with C3 in this study (λex : λem of 280, 400 : 504), the observation that protein-like components (such as C6) were more resistant to flocculation is consistent with predictions arising from our nationwide lake survey. From an ecological and biological reactivity perspective, lakes with higher concentrations of DOC and higher %C6 may be important sources of labile and biologically reactive DOM. The overall structural characteristics of C6, being less aromatic than other components, suggest that it would be most resistant to flocculation in the natural environment, relative to other PARAFAC components. In a recent review, three recurring PARAFAC components were identified (Ishii & Boyer, 2012), two of which were identified in this study, corresponding to C5 and C3. The review also reports C3 to be of terrestrial origin, with strong sorption characteristics to gibbsite, goethite, and sediments (Ishii & Boyer, 2012). This supports our finding that C3 was strongly associated with aluminum and iron, which likely included gibbsite and goethite, resulting in sorption and flocculation to sediments. Stedmon et al. (2005) also report a component similar to C3 to be most abundant in terrestrial systems, less abundant in estuaries, and missing entirely in the open ocean, which also suggests preferential loss of C3 with progressive internal processing.

The relative intensity of PARAFAC components (%C's) was particularly useful for assessing DOM quality independently of DOC concentration. The separation of %C's along the first principal component axis (Fig. 2b) revealed that emission wavelength positioning could be a useful general indicator of physicochemical properties (Fig. 5). Components at longer emission wavelengths, or red-shifted components, were more conjugated (higher SUVA) (Weishaar et al., 2003) and more colored (higher A420). Red-shifted components were also most readily lost from lake water and thus were most reactive in the environment (Fig. 5). Consequently, emission wavelength positioning may be a simple way to characterize the complexity and reactivity of PARAFAC components (Fig. 5), particularly to distinguish between components that would otherwise be vaguely classified as humic-like (e.g., %C1 to %C5).
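The reason relative intensities separate quality from quantity can be illustrated with a small calculation. This sketch assumes that each %C is the component's intensity expressed as a percentage of the summed intensity of all six components; the function name and the intensity values are hypothetical and are not taken from the survey.

```python
def relative_abundance(intensities):
    """Express each component intensity as a percentage of the summed intensity.

    intensities: mapping of component name to intensity (e.g., in Raman units).
    The normalization by the sum of all components is an assumption.
    """
    total = sum(intensities.values())
    if total <= 0:
        raise ValueError("total fluorescence must be positive")
    return {name: 100.0 * value / total for name, value in intensities.items()}

# Two hypothetical lakes with identical composition but twofold different
# total fluorescence (illustrative values only).
lake_a = {"C1": 0.40, "C2": 0.30, "C3": 0.20, "C4": 0.05, "C5": 0.03, "C6": 0.02}
lake_b = {name: 2.0 * value for name, value in lake_a.items()}

pct_a = relative_abundance(lake_a)
pct_b = relative_abundance(lake_b)
```

Both lakes yield the same percentages even though their absolute intensities differ, so any difference in %C between lakes reflects a change in composition rather than in the amount of fluorescing DOM.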

Figure 5.

Conceptual diagram of key physicochemical properties associated with the relative abundance of PARAFAC components contributing to dissolved organic matter (DOM) quality across boreal lakes, including their level of reactivity within the landscape. *PARAFAC components are sorted from longer to shorter emission wavelengths (on vertical axis), detailed spectra are presented in Fig. S1.

DOC concentration was a primary predictor of DOM quality based on the relative abundance of PARAFAC components. Numerous studies have found relationships between DOC concentration and DOM quality measures using fluorescence and absorbance metrics (Yamashita et al., 2010; Erlandsson et al., 2012; Inamdar et al., 2012). However, these findings are not universal. For example, a study examining fluorescence characteristics across multiple ecosystems and across freshwater and estuarine environments did not find a relationship between DOM quality and concentration (Jaffe et al., 2008). Several lake water chemistry variables, including pH, iron, aluminum, and metals, were strongly correlated with lake water DOC concentration (Fig. S3) and appeared to have relationships with both %C3 and %C6 (Fig. 2b); however, their influence may be complex. Water chemistry may have influenced DOM fluorescence intensity through static quenching by iron (Fe3+), metals, and H+ (Miller, 1981; Lakowicz, 2006; Kelton et al., 2007). Upon investigating the linearity of relationships between fluorescence and absorbance in an earlier study, it was clear that the protein-like region of EEMs (near C6) was particularly attenuated for acidic lakes (pH 4.6 to 6.0) (Kothawala et al., 2013). Fe3+ is a particularly effective quencher at low pH (Cabaniss, 1992) but is only abundant at pH below 4.5 (Neubauer et al., 2013), which contributed to the poor observed relationship for C6 (Kothawala et al., 2013). However, it remains unclear which regions of the EEM were affected most severely by Fe3+ quenching. As this study focused on broad-scale controls on PARAFAC components, only the most dominant overlying relationships were revealed here. Sample manipulation to adjust pH or convert Fe3+ to Fe2+ (Mcknight et al., 2001; Doane & Horwath, 2010) was not possible prior to analysis in this case. Therefore, results are representative of whole DOM under ambient conditions.

Some fraction of the unexplained variation in our study may be attributed to hydrological mixing and dilution of PARAFAC components originating from different sources. For instance, some components may have been transported in part from topsoils (e.g., C3) or groundwater inputs (e.g., C6). The hydrological connectivity of mixing water bodies can play an important role in regulating DOM quality (Massicotte & Frenette, 2011). In this nationwide study, we were unable to monitor the relative quality of DOM from streams and groundwaters. Yet, our results show that annual lake runoff was one of the least influential variables in this dataset (Fig. 2a and b). Studies including a network of hydrologically connected lakes over a smaller spatial scale may be necessary to fully evaluate the relative influence of catchment and climate characteristics, along with hydrological mixing and dilution, on lake DOM quality.

This is the first comprehensive study of PARAFAC components representing boreal lakes across a wide geographical area. While previous studies have examined relationships between DOM quality and drivers unique to a particular site, this study identified the relative importance of lake chemistry, catchment, and climate drivers over an extremely large spatial scale. Consequently, these results should be interpreted bearing in mind that many variables are spatially correlated within this dataset (Fig. S3). For example, southern Sweden is warmer, with greater agricultural activity and more nutrients, while more mountainous lakes are located in the north. Thus, the relative influence of nutrients and altitude is inherently linked to the overlying influence of temperature in this dataset. In addition, most lakes included in the survey are relatively pristine oligotrophic lakes. Thus, there is some uncertainty as to how well these results generalize to the entire boreal landscape. As this study is correlative, we are unable to specifically distinguish between the relative importance of in-lake processes (e.g., primary production, flocculation, decomposition, photodegradation) and hydrological dilution, which may have collectively contributed to observed shifts in DOM quality. Alternative drivers are almost certainly responsible for DOM quality at more localized scales; however, this study included a vast number of overlapping chemical, geographical, and climatic gradients, and we aimed to identify only the most dominant controls.

This study highlights the influence of land cover, particularly the percentage of water in the catchment, and of lake water retention time in regulating DOM quality, with mean annual temperature being a secondary influence. These results are particularly relevant in light of projected increases in precipitation and runoff for boreal ecosystems (Kundzewicz et al., 2007). Warmer and wetter conditions may lead to increased DOM inputs from soils and peatlands (Neff & Hooper, 2002; Pastor et al., 2003; Davidson & Janssens, 2006). Shifts in the timing of peak spring melt and the frequency of rain-on-snow events could alter the source and quality of terrestrial DOM inputs (Sebestyen et al., 2008; Casson et al., 2012). Greater annual runoff could result in shorter lake water retention times, decreasing the opportunity for in-lake processing and primary production, and shifting the quality of DOM to contain a greater proportion of humic-like C3 relative to protein-like C6. This shift in DOM composition is likely to increase the prevalence of humic and aromatic DOM structures with darker color, which we reason here to be easier to treat with flocculation for drinking water purposes (Weishaar et al., 2003). We expect the most aromatic and colored compounds (e.g., C3) to be most effective at absorbing solar radiation (Moran & Zepp, 1997; Vähatalo et al., 2003), and they may be a primary source of CO2 due to photomineralization. From an ecological perspective, a higher prevalence of humic lakes in the boreal ecozone could stimulate greater food web subsidies from energy mobilized by aquatic bacteria (Jansson et al., 2007).


We thank Jan Johansson for the analysis of fluorescence and absorbance scans, Christian Demandt and analytical chemists at the Swedish Agricultural University (SLU) for all data on lake water chemistry, and Jens Fölster (SLU) for the organization of samples for spectral analysis. We thank the Swedish Meteorological and Hydrological Institute (SMHI) for hydrological and climate data, Jakob Nisell (SLU) for providing GIS, land cover, and CORINE database support, and Lantmäteriet for maps and geographical data. Valuable analytical support and statistical advice were provided by Sebastian Sobek, Anne Kellerman, Hannes Peter, Silke Langenheder, Jürg Brendan Logue, Jason Xing Ji and Cristian Gudasz. This study was funded by the Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning, FORMAS, and is part of the project ‘The color of water – interplay with climate, and effects on drinking water supply.’ Funding for G.A.W. was provided by the Swedish Research Council (VR), Swedish Research Council for Environment, Agricultural Sciences and Spatial Planning (FORMAS) and the Nordic Centre of Excellence “CRAICC” supported by NordForsk. We thank four reviewers for their valuable contributions, which helped improve the manuscript.