Nutrient source attribution: Quantitative typology distinction of active and legacy source contributions to waterborne loads

Distinction between active and legacy sources of nutrients is needed for effective reduction of waterborne nutrient loads and associated eutrophication. This study quantifies main typological differences in nutrient load behaviour versus water discharge for active and legacy sources. This quantitative typology is used for source attribution based on monitoring data for water discharge and concentrations of total nitrogen (TN) and total phosphorus (TP) from 37 catchments draining into the Baltic Sea along the coastline of Sweden over the period 2003–2013. Results indicate dominant legacy source contributions to the monitored loads of TN and TP in most (33 of the total 37) study catchments. Dominant active sources are indicated in 1 catchment for TN, and mixed sources are indicated in 3 catchments for TN and 4 catchments for TP. The TN and TP concentration contributions are quantified to be overall higher from the legacy than from the active sources. Legacy concentrations also correlate well with key indicators of human activity in the catchments: agricultural land share for TN (R² = 0.65) and population density for TP (R² = 0.56). Nutrient concentrations in legacy-dominated catchments also change more slowly than in catchments with dominant active or mixed sources. Various data-based results converge in indicating legacy source contributions as largely dominant, mainly anthropogenic, and with near-zero average change trends in the present study of catchments draining into the Baltic Sea along the coastline of Sweden, as in other parts of the world. These convergent indications emphasize the need to identify and map the different types of sources in each catchment, and to differentiate strategies and measures to target each source type for possible achievement of shorter- and longer-term goals of water quality improvement.

Accumulation of nutrient legacies in soil, groundwater and sediments is a consequence of both slow advective transport and various physical and biogeochemical immobilization-remobilization processes that have long been quantified to occur in the subsurface parts of hydrological catchments (Cvetkovic et al., 2012; Destouni & Cvetkovic, 1991). The remobilization (desorption, dissolution, diffusive mass transfer) components of these delay processes imply that nutrients are not just irreversibly retained in immobile water/soil/sediment legacy zones, but can also be released back to mobile water and continued, delayed nutrient transport. For mitigation efforts, the long time-lags involved in such delayed transport imply that nutrient decreases and water quality improvements in recipient surface and coastal waters are slow after implementation of mitigation measures (Darracq et al., 2008; Meals et al., 2010; Sharpley et al., 2013; Van Meter & Basu, 2015). This has made goal achievement in water quality and eutrophication management a major sustainability challenge requiring a long-term perspective (Destouni & Jarsjö, 2018; Haygarth et al., 2014; Murray et al., 2019).
To meet this important challenge, mitigation measures need to be selected and placed so they can be effective for the different types of nutrient sources. For currently active surface sources, relevant measures may be regulations and incentives for enhanced nutrient removal in wastewater treatment, reduced fertilizer use, changes in manure spreading time, and cultivation of catch crops. For legacy sources in soil, groundwater and sediments, other types of measures are needed, such as well-placed reactive buffer zones, wetlands, and pump-and-treat technologies. To effectively select and locate mitigation measures, different types of nutrient sources need to be distinguished, identified, and appropriately targeted (Chanat & Yang, 2018; Levi et al., 2018).
In this paper, a quantitative typology approach to source attribution is outlined that distinguishes between currently active and legacy sources of nutrients, based on commonly available stream monitoring data for water discharges and nutrient concentrations. For concrete case study quantification and testing of this approach we use data for discharge and concentrations of total nitrogen (TN) and total phosphorus (TP) representing 37 hydrological catchments that drain into the Baltic Sea along the coastline of Sweden over the time period 2003-2013. In a parallel study, a similar basic source attribution approach is applied to data for chloride and metals (Destouni et al., 2021). In addition to considering different hydrochemical constituents (nutrients), the present study develops the basic source attribution approach further to a discrete quantitative source typology (whereas the parallel study considers continuous degrees of active and legacy source contributions). The present approach application also regards a greater number of catchments over a larger geographic scale (along the whole coastline of Sweden) and with order-of-magnitude larger study catchments (on the order of $10^3$-$10^4$ km$^2$) than those in the parallel study (covering a local geographic area around a major Swedish lake, and including 19 monitored catchments for chloride and fewer for metals, with study catchment size on the order of $10^2$-$10^3$ km$^2$). In combination, the parallel studies test the applicability, generality and transferability of the underlying source attribution approach for different hydrochemical constituents in and across various hydrological catchment settings.

| Quantitative source typology
This section outlines the general quantitative typology of active, legacy, or mixed source dominance in a catchment. This typology is based on first-order quantification of some key characteristics that should, in general, mechanistically differ in the hydrochemical behaviour versus discharge between the contributions from these different types of sources. This difference can be used for relatively simple source type distinction based on commonly available stream monitoring data for water discharge and hydrochemical concentrations. At each catchment outlet (i.e., monitoring point), these monitored water flow and quality quantities integrate the effects of all source inputs, and subsequent physical and biogeochemical transport processes that occur over the catchment and along all hydrological transport pathways to the catchment outlet. In the following outline, we aim to quantify the general first-order characteristic source type differences that may emerge in the integrated hydrochemical signal at the outlet of a catchment, not to model the details and specifics of all sources and transport processes under various catchment conditions. For concreteness, we also focus the outline text on nutrients, since they are the focus of the present study application, but note that the source attribution approach is more general and applicable to various hydrochemical constituents, such as the chloride and metals focused on in the parallel study by Destouni et al. (2021).

| Currently active sources
Consider a nutrient source (or collection of sources) spread over a catchment and delivering a relatively stable average input mass flow rate, $I_{\text{in-A}}$, over time into the catchment. In the transport to the outlet/monitoring point, some fraction ($1-\alpha_A$ of $I_{\text{in-A}}$) may be retained and delayed within the catchment (Levi et al., 2018; Quin et al., 2015), in zones of low conductivity and slow flow, and/or by biogeochemical or physical sorption processes (adsorption, chemical precipitation, diffusive mass transfer to immobile water zones), and to some degree even be lost from the catchment (e.g., by denitrification for TN). For this to be considered a currently active type of source, delivery of non-retained nutrient mass ($\alpha_A I_{\text{in-A}}$) from each point in time of the continuous source input flow $I_{\text{in-A}}$ should on average reach the outlet within one or just a few years after that input time. Conservation of mass for a stable input mass flow ($I_{\text{in-A}}$) with temporally constant relative retention ($1-\alpha_A$) then implies an average outlet load ($L_{\text{out-A}}$) that is also stable, as:

$$L_{\text{out-A}} = C_{\text{out-A}}\, Q_{\text{out}} = \alpha_A I_{\text{in-A}} \qquad (1)$$

where $C_{\text{out-A}}$ is the flux-averaged nutrient concentration contribution from the active sources in the water discharge ($Q_{\text{out}}$) at the outlet. Equation (1) implies chemodynamic concentration behaviour, where the concentration ($C_{\text{out-A}}$) varies with $Q_{\text{out}}$ in order for $L_{\text{out-A}}$ to remain stable on average (Basu et al., 2010; Levi et al., 2018; Selroos & Destouni, 2015). The expected type of average $L_{\text{out-A}}$ behaviour versus $Q_{\text{out}}$ would then be a line with near-zero slope and positive intercept $\alpha_A I_{\text{in-A}}$ (Figure 1(a)). This expectation can be checked against data-given L-Q regression lines, as outlined in the next section for the case study of 37 Swedish monitoring stations (Figure 1(b)).
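The chemodynamic behaviour described above can be illustrated with a minimal numerical sketch. It assumes only that the average outlet load equals the non-retained share of a stable input, so that concentration is load divided by discharge. The function name and all numerical values are hypothetical and are not data from the study.

```python
def active_concentration(input_rate, alpha, discharge):
    """Flux-averaged concentration from an active source with a stable load.

    input_rate: stable input mass flow (hypothetical units: tonnes/yr)
    alpha:      non-retained fraction delivered to the outlet (0..1)
    discharge:  outlet water discharge (hypothetical units: km3/yr)
    """
    if discharge <= 0:
        raise ValueError("discharge must be positive")
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    stable_load = alpha * input_rate  # does not depend on discharge
    return stable_load / discharge    # concentration falls as discharge rises


# Hypothetical illustration: doubling discharge halves the concentration,
# while the load (concentration * discharge) stays the same.
for q in (1.0, 2.0, 4.0):
    c = active_concentration(input_rate=100.0, alpha=0.6, discharge=q)
    print(f"Q={q:.1f}  C={c:.1f}  L={c * q:.1f}")
```

In an L-Q plot, such a source would trace a roughly horizontal line, which is the pattern the regression-based attribution looks for.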

| Legacy sources
For the typology of legacy sources, consider the nutrient mass fraction ($1-\alpha_A$) that has been retained within a catchment in soil, groundwater, and sediments, from continuous nutrient input ($I_{\text{in-A}}$) for decades. Some part of that immobilized nutrient mass may be continuously released back into mobile water (by dispersive/diffusive mass transfer, desorption, dissolution from slow/immobile water and/or sorption zones). The resulting outlet load contribution ($L_{\text{out-L}}$) can then be quantified based on Equation (2) of Destouni and Jarsjö (2018) as:

$$L_{\text{out-L}} = C_{\text{out-L}}\, Q_{\text{out}} = \frac{C^*_0\, k\, T}{n}\, Q_{\text{out}} \qquad (2)$$

where $C_{\text{out-L}}$ is the flux-averaged nutrient concentration contribution from the legacy sources, $n$ is average volumetric water content in the soil/aquifer/sediment zones containing the legacy source ($n$ is average porosity if the pore volume in these is largely water-filled), $C^*_0$ is average bulk concentration in the legacy zones (i.e., average nutrient mass per unit bulk soil/aquifer/sediment volume) at the start of the study period, $T$ (unit: time) is average advective transport time from the legacy zones to the catchment outlet, and $k$ is average relative release rate (unit: per time; with $1/k$ (unit: time) quantifying a characteristic time scale until total source depletion under assumed zeroth-order release). With $C^*_0$, $n$, $k$ and $T$ assumed not to vary much in time around their respective average values, Equation (2) implies relatively stable average $C_{\text{out-L}}$ under variable discharge $Q_{\text{out}}$.

FIGURE 1 (a) Schematic illustration of regression line types for different types of (active, legacy, or mixed) nutrient sources, and (b) locations and numbering of the 37 most near-coastal monitoring points over Sweden included in this study. In (b), nutrient concentration and water discharge are measured in close proximity for the 'Load set' (red circles), while for the 'Load-est set', discharge is measured at a more upstream station (blue stars) than the concentration measurement (green triangles).
The expected type of average $L_{\text{out-L}}$ behaviour versus $Q_{\text{out}}$ would then be a line with slope $C^*_0 kT/n$ and intercept near zero (case A, brown line, Figure 1(a)) or negative (case B, blue line, Figure 1(a)). These legacy cases can also be checked against data-given L-Q regression lines, and if the B case emerges from this, it would indicate that legacy nutrient release occurs only at and above a minimum threshold $Q_{\text{out}}$ value (solid part of blue line in Figure 1(a)).
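The two legacy cases can be sketched numerically as follows. The sketch assumes a legacy concentration equal to the slope $C^*_0 kT/n$, and represents case B by letting release start only above a threshold discharge. This threshold formulation is one simple reading of the negative-intercept line, not a formula given in the text. All names and values are hypothetical.

```python
def legacy_concentration(c0_bulk, k, transport_time, n):
    """Stable legacy concentration, i.e. the slope of the L-Q line."""
    if n <= 0:
        raise ValueError("water content n must be positive")
    return c0_bulk * k * transport_time / n


def legacy_load(discharge, c0_bulk, k, transport_time, n, q_threshold=0.0):
    """Legacy load versus discharge.

    q_threshold = 0 gives case A (line through the origin).
    q_threshold > 0 gives case B (assumed here): zero load below the
    threshold and a line with negative intercept above it.
    """
    c_legacy = legacy_concentration(c0_bulk, k, transport_time, n)
    return c_legacy * max(discharge - q_threshold, 0.0)


# Hypothetical illustration of both cases over a range of discharges.
for q in (0.25, 1.0, 2.0):
    case_a = legacy_load(q, c0_bulk=10.0, k=0.01, transport_time=5.0, n=0.25)
    case_b = legacy_load(q, c0_bulk=10.0, k=0.01, transport_time=5.0, n=0.25,
                         q_threshold=0.5)
    print(f"Q={q:.2f}  L_caseA={case_a:.2f}  L_caseB={case_b:.2f}")
```

In contrast to an active source, the load here scales with discharge while the concentration stays fixed.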

| Mixed sources
The typology of mixed active and legacy sources can finally be quantified in terms of output load as:

$$L_{\text{out-M}} = L_{\text{out-A}} + L_{\text{out-L}} = C_{\text{out-A}}\, Q_{\text{out}} + C_{\text{out-L}}\, Q_{\text{out}} \qquad (3)$$

where $\gamma = L_{\text{out-A}} / L_{\text{out-M}}$ is a dimensionless fraction ($0 \le \gamma \le 1$) quantifying the relative active source contribution to total load ($L_{\text{out-M}}$). The expected type of $L_{\text{out-M}}$ behaviour versus $Q_{\text{out}}$ is thereby also a line (purple in Figure 1(a)), with slope $S_M = C_{\text{out-L}}$ and intercept $I_M = C_{\text{out-A}}\, Q_{\text{out}}$ for $I_M \ge 0$. This expectation can also be checked against data-given L-Q regression lines for various monitoring stations (Figure 1(b)), as outlined in the following section.
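A minimal sketch of how a fitted L-Q line could be read in the mixed-source case: the slope is taken as the legacy concentration and the intercept as the stable active load. The active fraction is evaluated here at the period-average load, which is an assumption of this sketch, and the function names and data are hypothetical.

```python
def fit_line(q, load):
    """Ordinary least-squares fit of load = intercept + slope * q."""
    if len(q) != len(load) or len(q) < 2:
        raise ValueError("need two equally long series with >= 2 points")
    mean_q = sum(q) / len(q)
    mean_l = sum(load) / len(load)
    sqq = sum((x - mean_q) ** 2 for x in q)
    sql = sum((x - mean_q) * (y - mean_l) for x, y in zip(q, load))
    slope = sql / sqq
    return slope, mean_l - slope * mean_q


def mixed_decomposition(q, load):
    """Split a mixed-source L-Q line into legacy and active parts."""
    slope, intercept = fit_line(q, load)
    mean_load = sum(load) / len(load)
    return {
        "c_legacy": slope,                # slope S_M: legacy concentration
        "active_load": intercept,         # intercept I_M: stable active load
        "gamma": intercept / mean_load,   # active share of the average load
    }


# Hypothetical data: stable active load of 3 plus legacy concentration of 2.
discharge = [1.0, 2.0, 3.0, 4.0]
loads = [3.0 + 2.0 * x for x in discharge]
print(mixed_decomposition(discharge, loads))
```

Because a least-squares line passes through the mean point, the legacy share of the average load is then one minus the active share.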

| Data
As a concrete case study for quantification and testing of the source typology outlined above, we use a dataset of coastal loads from 37 Swedish catchments draining to the Baltic Sea along the whole coastline of Sweden (Figure 1(b)). The associated 37 near-coastal monitoring stations are obtained from official Swedish environmental monitoring of streams and selected for having continuous data availability for TN and TP concentrations ($C_{\text{out}}$) over the study period 2003-2013 (see Data Availability Statement at the end of the main text for all data sources). We refer to this set of coastal measurement points as the 'Total set', of which only 19 also have closely associated measurements of water discharge ($Q_{\text{out}}$). By this we mean a $Q_{\text{out}}$ measurement close enough to the concentration ($C_{\text{out}}$) measurement point to allow direct quantification of nutrient load as $L_{\text{out}} = Q_{\text{out}} C_{\text{out}}$. The set of 19 coastal data points that fulfils this condition is referred to as the 'Load set'. For the remaining 18 data points, we apply discharge upscaling to extend the number of data points to the total 37 concentration points, and refer to the additional set of 18 data points as the 'Load-est set'. For the 'Load-est set', we upscale discharge $Q^*$ measured upstream of the catchment outlet (where $C_{\text{out}}$ is measured) as

$$Q_{\text{out}} = Q^* \frac{A_c}{A^*} \qquad (4)$$

where $Q_{\text{out}}$ is the estimated discharge at the outlet with total contributing catchment area $A_c$, and $A^*$ is the smaller (sub)catchment area contributing to $Q^*$. The locations of the 'Load set', 'Load-est set' and 'Total set' data points are shown in Figure 1(b).
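The area-proportional upscaling used for the 'Load-est set' can be sketched as below, assuming discharge scales linearly with contributing catchment area. Function names, units and numbers are hypothetical illustrations.

```python
def upscale_discharge(q_upstream, area_upstream, area_outlet):
    """Estimate outlet discharge from an upstream gauge by area ratio.

    q_upstream:    discharge Q* measured upstream (e.g. m3/s)
    area_upstream: area A* contributing to Q* (e.g. km2)
    area_outlet:   total catchment area A_c at the outlet (same unit as A*)
    """
    if area_upstream <= 0:
        raise ValueError("upstream area must be positive")
    if area_outlet < area_upstream:
        raise ValueError("outlet area cannot be smaller than upstream area")
    return q_upstream * area_outlet / area_upstream


def nutrient_load(q_out, c_out):
    """Load as discharge times concentration (m3/s * mg/L gives g/s)."""
    return q_out * c_out


# Hypothetical station: gauge covers 800 of 1000 km2, concentration 2 mg/L.
q_est = upscale_discharge(q_upstream=40.0, area_upstream=800.0, area_outlet=1000.0)
print(q_est, nutrient_load(q_est, c_out=2.0))
```

The guard on areas reflects that the upstream gauge drains a subcatchment of the outlet catchment.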

| Typology testing and quantification based on data
Based on the available monitoring data for water discharge ($Q_{\text{out}}$), and nutrient concentrations ($C_{\text{out}}$) and loads ($L_{\text{out}} = Q_{\text{out}} C_{\text{out}}$) for TN and TP, we test the expected type of $L_{\text{out}}$ behaviour versus $Q_{\text{out}}$ and assess the possible source dominance for each nutrient within each catchment (Figure 1(b)). This is done based on the regression line of $L_{\text{out}}$ versus $Q_{\text{out}}$.

FIGURE 2 Schematic illustration of the approach to determine the shortest normalized distance (ΔD, red) from the origin of a regression line (examples in blue) along both the vertical axis (for nutrient load L) and the horizontal axis (for discharge Q). Alternative approaches that are also used in this study consider the shortest distance from the origin along only the vertical (L) axis or only the horizontal (Q) axis.

FIGURE 3 Regression lines for total nitrogen (TN) load ($L_{\text{out}}$, vertical axis) versus water discharge ($Q_{\text{out}}$, horizontal axis) and associated source attribution based on data for the 'Total set' of stations (Figure 1(b)). Both $L_{\text{out}}$ and $Q_{\text{out}}$ are normalized with their respective average values over the study period 2003-2013; the value 1 on either axis thus represents the average value of that variable.

FIGURE 4 Regression lines for total phosphorus (TP) load ($L_{\text{out}}$, vertical axis) versus water discharge ($Q_{\text{out}}$, horizontal axis) and associated source attribution based on data for the 'Total set' of stations (Figure 1(b)). Both $L_{\text{out}}$ and $Q_{\text{out}}$ are normalized with their respective average values over the study period 2003-2013; the value 1 on either axis thus represents the average value of that variable.

FIGURE 5 Source type classification for total nitrogen (TN, left panels) and total phosphorus (TP, right panels). This is based on the three different approaches to determine the shortest normalized distance from the origin of the data-based regression lines of nutrient load (L) versus discharge (Q) along (see schematic illustration in Figure 2): both the L axis and the Q axis (top panels a-b); just the L axis (middle panels c-d); just the Q axis (bottom panels e-f). The numbers shown in the panels for data points outside the Legacy A source type are the corresponding station numbers from Figure 1(b).

TABLE 1 Summary of source type classification for total nitrogen (TN) and total phosphorus (TP) based on the results from the three approaches shown in Figure 5. Classification based on regression line distance from origin: , 5, 13, 33; 4, 9, 11, 12, 15, 25, 35, 36, 37; 5, 14, 19, 33

FIGURE 6 Statistics of the coefficient of determination ($R^2$) for the best fit regression lines of nutrient load versus discharge for the different sources of: (a) total nitrogen (TN; Figure 3); and (b) total phosphorus (TP; Figure 4). The boxplots show the median (line) and associated interquartile (box) and total (whiskers) ranges, and the red + symbol in (b) shows an outlier value.

FIGURE 7 (a) Spatial distribution of different source types for total nitrogen (TN). Calculated TN source concentrations for (b) legacy sources, and (c) currently active sources. (d) Contribution fractions (γ) for the active sources in the mixed-source catchments.

The corresponding legacy load contribution fraction ($1-\gamma$) can also be estimated from the regression line slope $S_M$ as:

$$1-\gamma = \frac{S_M\, Q_{\text{out}}}{L_{\text{out-M}}}$$

For all data points and their associated catchments (Figure 1(b)), we also map and analyse the spatial distribution of the

| Source attribution
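For each coastal measurement station (Figure 1(b)), regression lines are fitted to the available $L_{\text{out}}$ and $Q_{\text{out}}$ data. Figures 3 and 4 show the resulting regression lines for TN and TP, respectively, for comparison with the schematic line types in Figure 1(a).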
For TN, the combined results from the three alternative approaches used to quantify the regression line deviation from zero intercept identify 33 catchments as legacy-dominated (32 as case A and 1 as case B), 1 catchment as active source-dominated, and 3 catchments as having mixed sources (Figure 5, Table 1). For TP, 33 catchments emerge as legacy-dominated (24 as case A and 9 as case B). No catchment exhibits active source dominance for TP, and 4 catchments emerge as having mixed sources. For both TN and TP, Figure 6 further shows that the coefficient of determination ($R^2$) for the regression lines that indicate dominant legacy sources is overall considerably greater (mostly in the interval 0.7-0.9) than for the other source types ($R^2$ near zero for active source lines, and <0.5 for mixed source lines). This result supports the source attribution, since concentration contributions from active sources should respond faster than those from legacy sources to mitigation measures, which have been taken in Sweden to mitigate nutrient loads to the Baltic Sea, targeting known active sources with insufficient results (Destouni et al., 2017).

FIGURE 10 Statistics of: (a) average precipitation (P); (b) average runoff (R); (c) average temperature (T); and (d) area for catchments with different dominant source types with regard to total nitrogen (TN) and total phosphorus (TP). The boxplots show the median (line) and associated interquartile (box) and total (whiskers) ranges, and the red + symbol in (b) shows an outlier value.
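The regression-based attribution can be sketched as follows, under several stated assumptions. Load and discharge are normalized by their period averages, as in Figures 3 and 4. The distance "along both axes" is interpreted here as the perpendicular distance from the origin to the fitted line. The tolerance `tol` and the rule separating active from mixed sources are hypothetical choices made for illustration only, since the classification thresholds are not given in the text.

```python
import math


def fit_normalized_line(q, load):
    """Least-squares line of load versus discharge, both mean-normalized.

    Returns (slope, intercept, r2) in normalized coordinates.
    """
    mean_q = sum(q) / len(q)
    mean_l = sum(load) / len(load)
    x = [v / mean_q for v in q]
    y = [v / mean_l for v in load]
    mx = sum(x) / len(x)
    my = sum(y) / len(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = sxy ** 2 / (sxx * syy) if syy > 0 else 0.0
    return slope, intercept, r2


def origin_distances(slope, intercept):
    """Distances from the origin to the line y = intercept + slope * x."""
    along_l = abs(intercept)                                # L axis only
    along_q = abs(intercept / slope) if slope != 0 else math.inf  # Q axis only
    both = abs(intercept) / math.sqrt(1.0 + slope ** 2)     # perpendicular
    return {"both": both, "L": along_l, "Q": along_q}


def classify(slope, intercept, tol=0.1, axis="both"):
    """Hypothetical rule set; tol is an assumed tolerance."""
    if origin_distances(slope, intercept)[axis] <= tol:
        return "legacy A"   # line passes close to the origin
    if intercept < 0:
        return "legacy B"   # negative intercept: release above a threshold
    if abs(slope) <= tol:
        return "active"     # near-horizontal line: stable load
    return "mixed"          # positive intercept and positive slope


# Hypothetical stations, one per source type.
q = [1.0, 2.0, 3.0, 4.0]
stations = {
    "s1": [2.0, 4.0, 6.0, 8.0],
    "s2": [1.0, 3.0, 5.0, 7.0],
    "s3": [6.1, 5.9, 6.0, 6.0],
    "s4": [5.0, 7.0, 9.0, 11.0],
}
for name, load in stations.items():
    s, i, r2 = fit_normalized_line(q, load)
    print(name, classify(s, i), f"slope={s:.2f} intercept={i:.2f} R2={r2:.2f}")
```

Running the three `axis` options ("both", "L", "Q") side by side mirrors the idea of comparing alternative distance measures before settling on a station's source type.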

| Source relationships with hydro-climatic and human activity conditions
In terms of hydro-climatic conditions, the catchments with dominant legacy sources have on average somewhat higher temperature and lower precipitation and runoff (the latter two for TN, while the different catchment types are on average more similar in these respects for TP) than those with active and mixed sources (Figure 10). These differences are mainly due to the spreading of the many legacy-source catchments over the whole of Sweden, and their relatively greater prevalence in the warmest and driest southern parts, where agricultural land share and population density are also greater than in the north. Overall, the legacy-source catchments are on average smaller than the catchments with active and mixed sources. This size difference is consistent with parallel source attribution findings for metals and chloride (Destouni et al., 2021), and may be due to larger catchments allowing for greater source variation within them.

FIGURE 11 Legacy source concentration ($C_{\text{out-L}}$, Equation (2)) of total nitrogen (TN) and total phosphorus (TP) versus: (a) agricultural land share; and (b) population density in the catchments with dominant legacy sources. Regression lines are fitted to all data points of case A legacy sources. Table 2 shows the associated coefficients of determination ($R^2$) for the case A legacy data points in each of the sub-datasets 'Load set' and 'Load-est set' (Figure 1(b)).

(Destouni et al., 2017; Van Meter et al., 2018) in spite of many actions taken for such improvement in the Baltic region (Iho et al., 2015; Linke et al., 2014)

water quality responses to local mitigation measures can be relatively fast, is needed to achieve relatively fast water quality improvements and meet shorter-term regulatory goals. For longer-term, large-scale water quality improvements, the more impactful legacy sources must also be identified and targeted with appropriate mitigation strategies and measures.
The quantitative typology approach developed and tested in this study is general and transferable, and can help identify, map, and target active and legacy sources of nutrients (and other hydrochemical constituents) for both shorter- and longer-term water quality improvement.
Conceptualization, methodology development, and writing of the paper by Georgia Destouni and Yuanying Chen; data compilations by

TABLE 2 Average coefficient of determination ($R^2$) for regression lines between legacy source concentration of total nitrogen (TN) and total phosphorus (TP) versus agricultural land share or population density in the legacy catchments of the total data set (Figure 11). TN TP Set of stations from Figure 1(