End-member mixing models have been widely used to separate the different components of a hydrograph, but their effectiveness suffers from uncertainty in both the identification of end-members and spatiotemporal variation in end-member concentrations. In this paper, we outline a procedure, based on the generalized likelihood uncertainty estimation (GLUE) framework, to more inclusively evaluate uncertainty in mixing models than existing approaches. We apply this procedure, referred to as G-EMMA, to a yearlong chemical data set from the heavily impacted agricultural Lissertocht catchment, Netherlands, and compare its results to the “traditional” end-member mixing analysis (EMMA). While the traditional approach appears unable to adequately deal with the large spatial variation in one of the end-members, the G-EMMA procedure successfully identified, with varying uncertainty, contributions of five different end-members to the stream. Our results suggest that the concentration distribution of “effective” end-members, that is, the flux-weighted input of an end-member to the stream, can differ markedly from that inferred from sampling of water stored in the catchment. Results also show that the uncertainty arising from identifying the correct end-members may alter calculated end-member contributions by up to 30%, stressing the importance of including the identification of end-members in the uncertainty assessment.
1. Introduction
 Using mixing model approaches to separate the different components of a hydrograph has been instrumental in the development of hydrological science, as environmental tracers provide a unique view of the catchment-integrated response of hydrological flow paths. The use of mixing models has evolved from two-component mixing models [e.g., Johnson et al., 1969; Pinder and Jones, 1969; Sklash and Farvolden, 1979], mostly aimed at separating event and preevent water, to the now commonly used multitracer end-member mixing analysis (EMMA) outlined by Christophersen et al. [1990] and Christophersen and Hooper [1992]. EMMA has in recent years been applied in various geographical settings and across spatial scales [Barthold et al., 2010; Burns et al., 2001; Guinn Garrett et al., 2012; James and Roulet, 2006; Long and Valder, 2011; Soulsby et al., 2003b]. Mixing model approaches are not limited to hydrology; they are also extensively used in other geosciences, such as geology [Keay et al., 1997; Weltje, 1997] and sedimentology [IJmker et al., 2012], and in ecology [Rasmussen, 2010].
 Mixing model approaches rely on the assumptions that (1) stream water can be explained as a linear mixture of extreme source solutions or end-members, (2) solutes used as tracers in the analysis are conservative, and (3) chemical signatures of end-members are invariant in time and space (at least for single events) and can be reliably characterized [Hooper et al., 1990; Sklash and Farvolden, 1979]. As noted by various authors, these assumptions are commonly violated in real-world applications, giving rise to uncertainty in the resulting hydrograph separations [e.g., Hooper et al., 1990; Soulsby et al., 2003a; Uhlenbrook and Hoeg, 2003]. Two separate uncertainty components can be distinguished. First, the end-members contributing to the stream water mixture have to be properly identified. EMMA theory requires end-members to be chosen that best bound stream water tracer data, given a conceptual understanding of the catchment functioning [Hooper et al., 1990]. Problems in duplicating hydrograph separations using different sets of tracers, however, point to the difficulty in identifying the complete set of relevant end-members using a limited number of tracers [Barthold et al., 2011; Rice and Hornberger, 1998]. By applying more tracers than mathematically necessary, the EMMA approach avoids this problem to a certain extent [Christophersen and Hooper, 1992; Christophersen et al., 1990], and the diagnostic tools developed by Hooper [2003] provide a means to investigate the number of contributing end-members as evidenced from the stream water data set. Nevertheless, Barthold et al. [2011] show that the appropriate choice of end-members varies considerably with tracer set size and composition, resulting in significant uncertainty. In this paper, we term this type of uncertainty “identification uncertainty.”
 Second, in addition to the analytical error always associated with reported concentrations, spatial and temporal variability in end-member concentrations is ubiquitous at the scales considered, and is nigh impossible to characterize adequately using inevitably sparse sampling [Beven, 1989; Burns et al., 2001; Hoeg et al., 2000; Hooper et al., 1990; James and Roulet, 2006; Kendall et al., 2001]. Moreover, even when the variability of a suggested end-member is adequately characterized from sampling, it cannot be assumed that the characterized variability is mirrored in the flux-weighted contribution to the stream water [Kendall et al., 2001; Rinaldo et al., 2011]. And although some authors have argued that spatiotemporal variability may smooth out at larger scales, resulting in “emergent” end-members [Soulsby et al., 2003b], similar characterization problems will apply. We use the term “characterization uncertainty” for this type of uncertainty.
 Various authors have quantified characterization uncertainty in mixing models. For instance, Hooper et al. [1990] and later Genereux and Uhlenbrook and Hoeg [2003] mathematically propagated the uncertainty in end-member concentrations, Soulsby et al. [2003a] developed a hierarchical Bayesian approach, while other authors applied a Monte Carlo approach to propagate the uncertainty in the chemical signatures of end-members [Bazemore et al., 1994; Durand and Torres, 1996]. Joerin et al. extended the latter approach by allowing for nonnormal end-member concentration distributions, and by taking uncertainty in the applied model hypotheses regarding spatial and temporal variation into account, albeit in a simple manner. Iorgulescu et al. [2005, 2007] tried to allow for time-changing end-members over a sequence of events using a data-based hydrochemical model within a GLUE framework. Barthold et al. [2011] proposed an iterative methodology to explore, though not quantify, the identification uncertainty in EMMA.
 However, none of the existing approaches account for both identification and characterization uncertainty quantitatively. In addition, none can be applied to end-member mixing analyses using more solutes than mathematically necessary to solve the mixing equations (i.e., overdetermined), even though this is a central property of the widely used EMMA approach [Christophersen and Hooper, 1992]. In this paper, we therefore propose a new method to quantify uncertainty in end-member mixing models, one that specifically considers uncertainty in both identification and characterization of end-members, and allows for overdetermined mixing models. We based our approach on the generalized likelihood uncertainty estimation (GLUE) methodology of Beven and Binley [1992], which recognizes that, given the fundamental limitations of models as descriptors of environmental systems, multiple models and parameter sets may exhibit equifinality in that they all acceptably describe the available observational data [Beven and Binley, 1992; Beven, 1989, 2006].
 We apply the proposed approach to a small (10 km²), heavily impacted agricultural catchment in the coastal region of the Netherlands. The catchment provides a difficult test case for our approach, as heavily impacted catchments pose specific challenges to the application of end-member mixing models, with agricultural activities and active water management causing marked changes in hydrology and chemistry [Durand and Torres, 1996]. In addition, the “open boundary” nature of this particular catchment, receiving extraneous fluxes of both regional groundwater flow and water intake, further hampers the application of mixing models. Interest in the hydrological functioning of this catchment is motivated by a projected increase in saline seepage [Oude Essink et al., 2010], which would render the surface water in the catchment unfit for agricultural use.
2. Materials and Methods
2.1. Lissertocht Catchment
 The artificial Lissertocht canal drains a 10 km² intensively drained agricultural catchment (Figure 1). The catchment is part of the former lake Haarlemmermeer, reclaimed in 1852, and is located 25 km southwest of the city of Amsterdam in the Netherlands (52°13′ latitude, 4°36′ longitude). Relief in the catchment is all but flat, with an altitudinal range of 6–3.5 m below mean sea level (BSL). Mean annual precipitation amounts to 840 mm, mean annual potential evapotranspiration to 590 mm [Royal Netherlands Meteorological Institute, 2010]. Excess precipitation is quickly drained through an extensive system of tile drains and ditches. A pumping station at the end of the Lissertocht maintains water levels throughout the catchment at a relatively constant 6.55 m BSL in winter (October–April) and 6.4 m BSL in summer (April–October). An auxiliary pumping station at the western end of the catchment is used only during extreme discharge events. Water is let into the catchment through four culverts from April to October, to maintain surface water levels and improve water quality. Additional fresh water can be taken into the catchment at the location of the auxiliary pump.
 The catchment is underlain by an aquifer of Pleistocene fluvial sands (with transmissivity of 4600 ± 150 m²/d; data from the Netherlands Hydrological modeling Instrument (NHI) model, available at http://www.nhi.nu/). The aquifer is covered by a 6.7 m (± 0.7 m) thick layer of heterogeneous Holocene estuarine clays, loamy sands, and peat deposits on top of a thin (5–10 cm) layer of compressed peat deposits, referred to as basal peat [Stafleu et al., 2009]. This Holocene layer presents a considerable hydraulic resistance (i.e., thickness/vertical hydraulic conductivity; 2400 ± 750 d; NHI model data) to vertical groundwater flow. Aquifer hydraulic heads exceed shallow groundwater levels (mostly within 2 m below ground surface) throughout the catchment, causing a permanent upward seepage flux [Oude Essink et al., 2010]. Part of this seepage is concentrated in boils, which form preferential flow paths between aquifer and surface water [De Louw et al., 2010, 2011].
 We hypothesize five end-members to represent flow path contributions to stream water at the catchment outlet, two of which are external inputs to the catchment: (1) precipitation, entering the stream with minimum interaction with the soil (denoted as PR) and (2) inlet water, extraneous water taken into the catchment through inlet culverts (IL). The other three end-members represent different local groundwater stores, each with a characteristic flow path contributing to stream water: (3) deep aquifer groundwater, discharged through boil seepage (AD), (4) groundwater below ditches, representing diffuse seepage (BD), and (5) shallow, phreatic groundwater discharged mostly through tile drains (SL) (Figure 2, chemistry in Table 2).
 Resulting from differences in geologic history, lithology, palaeohydrology, water management, and agricultural activities, the three local groundwater types show distinct chemical signatures. Groundwater type AD infiltrated the aquifer when a marine transgression approximately 8–3.8 kyr B.P. flooded the area [Post et al., 2003]. This brackish groundwater type has a salinized, deeply anoxic, calcite saturated facies, indicated by a negative base exchange index (BEX) [Stuyfzand, 1999], and significantly lower-than-expected SO4 concentrations, given the admixing of sea water (Table 2). AD exfiltrates directly into the surface water through boils [De Louw et al., 2010], thus preventing any subsequent chemical interaction altering its signature. The brackish AD water type is overlain in the aquifer by a layer of fresh groundwater, which infiltrated after coastal barriers started to form from 5.5 kyr B.P. onward and extensive marshlands developed behind them, covering the study area. This fresh groundwater has a different facies: freshened, deeply anoxic, and calcite saturated, demonstrated by a positive BEX and calcite saturation. This groundwater type seeps upward through a reactive layer of basal peat before exfiltrating into the stream. We therefore opted to sample this water type directly below ditches, just before exfiltration, as water type BD. BD shows the highest concentrations of HCO3, SiO2, B, and Li, testifying to peat interaction, dissolution of diatom skeletons, and desorption of marine components after fresh water intrusion [Stuyfzand, 1993] (Table 2). Shallow phreatic groundwater (SL) is even fresher (Table 2), but bears the chemical signature of agricultural activities including fertilizer application and drainage, leading to raised levels of SO4 by pyrite oxidation [Pons and Van der Molen, 1973].
2.2. Sampling and Analytical Methods
 Stream water was sampled at the catchment's main pumping station, from 11 October 2011 until 4 October 2012. Samples were automatically obtained at the end of each pumping cycle (Teledyne ISCO automatic sampler) and collected within 3 weeks of sampling. Pumping cycles occurred approximately daily, resulting in a total of 362 samples. We investigated the response of solutes in representative stream water in a sample bottle for a worst-case collection scenario of 4 weeks waiting time. The sample collection test showed significant responses of EC, alkalinity, Ca, and NO3, while the response of other possible tracers was in the order of analytical uncertainty. We therefore discarded EC, alkalinity, Ca, and NO3 in subsequent data interpretation. Catchment discharge was obtained by multiplying pumping times, logged at 10 min intervals by the automated water management system of the local water authority, with pumping capacity, measured in three repetitions using a boat-mounted acoustic Doppler current profiler (TeleDyne RD) [Mueller and Wagner, 2009]. Maximum discharge of the various intake culverts was determined in three repetitions by measuring the time necessary to fill a 100 L polyethylene bag.
 The five end-members were sampled with varying frequency either before or throughout the stream water sampling period, depending on their observed temporal variance (Table 1). Shallow groundwater end-members (SL and BD) were sampled with a peristaltic pump, in piezometers screened approximately 1–2 m below the ground surface or ditch bottom respectively. AD was sampled with a peristaltic pump in one existing well, screened at 30, 40, and 60 m below surface level. Historic data from this and four additional nearby wells (< 9 km) was obtained from the Dutch database on subsurface data (DINO, available at http://www.dinoloket.nl). IL was sampled by grab sampling, while PR was sampled using a bulk collector connected to a rain gauge, constructed to minimize evaporation [Gröning et al., 2012].
Table 1. Sampling Locations, Frequency, and Period of Stream Water and End-Members in Lissertocht Catchment
 All samples were filtered through a 0.45 μm membrane filter and stored in the dark at 4°C on the day of collection. Alkalinity was determined by end-point titration (Titralab) on the day of sample collection. Anions were analyzed using a DIONEX DX-120 ion chromatograph within 2 days after sample collection. A vial for cations was acidified with 65% HNO3 suprapure (0.7 mL/100 mL) on the day of sampling, for preservation until analysis by a VARIAN 730-ES ICP-OES. Analytical uncertainty was determined by analysis of internal calibration standards and set to at least 3% (relative standard deviation) to account for dilution errors.
2.3. Generalized Likelihood Uncertainty Estimation (GLUE)
 The GLUE methodology was developed by Beven and Binley [1992] as an extension of the regionalized sensitivity analysis (RSA) of Spear and Hornberger. Given uncertainties and errors in model structure, model parameterization, and observational data, GLUE recognizes that multiple models or model parameterizations will be equally good descriptors of the modeled system and thus exhibit equifinality [Beven, 2006]. Rather than trying to optimize a single parameter set for a given model structure, GLUE therefore retains multiple model structures or model parameterizations that adequately fit the observational data and are consequently deemed behavioral. Instead of just accepting or rejecting a parameter set (or, more precisely, a combination of model structure and parameter set) as in the original RSA, a likelihood measure is used to express a degree of confidence in the parameter set. All behavioral parameter sets are used to predict a likelihood-weighted distribution of model response(s). Interaction between parameters is implicitly accounted for, because GLUE focuses on parameter sets rather than individual parameters. The collection of behavioral parameter sets is obtained by Monte Carlo sampling of prior parameter ranges, running model simulations, and evaluating the simulated result against a likelihood measure to accept or reject the parameter set. A more complete description of GLUE is presented by Beven and Binley [1992], Beven [2006, 2009], and Freer et al.
2.4. A GLUE Approach to End-Member Mixing Analysis (G-EMMA)
 An end-member mixing model, explaining stream water chemistry as a conservative mixture of end-member concentrations, is a very simple conceptual description of the origin of stream water. Unsurprisingly, mixing models suffer from similar issues with model equifinality due to uncertainty and errors in model structure, parameters, and observations as the rainfall-runoff models GLUE was first applied to. GLUE minimizes the need for prior assumptions about model structure and structure of errors, and is therefore especially suited to quantify the uncertainty in mixing models pertaining to end-member characterization, that is, the variability in end-member concentrations. Additionally, as GLUE permits different model structures to be simultaneously evaluated as adequate system descriptors, uncertainty in end-member identification can be quantified by testing different sets of end-members against the available stream chemistry. Note that what we term identification uncertainty in this paper is paralleled by “structural uncertainty” in GLUE terminology, and characterization uncertainty by “parameter uncertainty.”
 Our GLUE approach to end-member mixing (G-EMMA) starts with a definition of possible end-members. In EMMA, the Euclidean distance between end-members and their projection in the mixing space is used as a measure of the ability of the end-member to explain stream water concentrations [Barthold et al., 2011; Christophersen and Hooper, 1992; James and Roulet, 2006]. This procedure might, however, obscure end-members that are not characterized properly by their median observed tracer concentrations. Instead, our approach minimizes the necessary prior assumptions by allowing for different end-member combinations during different periods, while relying on the time-variant data to reject invalid end-members.
 Subsequently, appropriate tracers must be identified, subject to two of the usual conditions prescribed by mixing model theory: (1) tracers must mix conservatively and (2) tracers must differ in concentration between end-members [Hooper et al., 1990; Sklash and Farvolden, 1979]. The usual third condition, that end-member concentrations must be invariant in time and space, does not apply to the G-EMMA approach, which explicitly accounts for end-member variation. The diagnostic tools of Hooper [2003] can aid in defining appropriate tracers. All identified end-members are characterized by a prior concentration distribution for each tracer. Although we were able to characterize concentration distributions of end-members, either stored in the system or as direct inputs, we decided against using these distributions as priors in the procedure. Instead, as we lack information on how these concentrations are convoluted to observed concentration distributions, conditional on the sampling time at the catchment outlet [Rinaldo et al., 2011], we adopted a minimal-assumption approach and used a uniform distribution over the full range of observed concentrations in samples belonging to an end-member. The G-EMMA methodology then allows the posterior effective concentrations for different end-members to be identified, conditional on this minimal prior assumption for effective end-member concentrations.
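 The minimal-assumption prior described above can be illustrated with a short sketch. The function name, tracer names, and concentration values below are hypothetical and serve only to show a uniform draw over the full observed range of each tracer; this is not the released G-EMMA code.

```python
import random


def sample_end_member(observed, rng):
    """Draw one concentration per tracer, uniform over the observed range.

    observed maps a tracer name to the list of concentrations measured in
    samples assigned to this end-member.
    """
    return {tracer: rng.uniform(min(values), max(values))
            for tracer, values in observed.items()}


# Hypothetical observations for one end-member (illustrative values only).
observed_sl = {"Cl": [40.0, 95.0, 310.0], "SO4": [120.0, 480.0, 260.0]}
rng = random.Random(0)
draw = sample_end_member(observed_sl, rng)
```

 Only the minimum and maximum of the observations enter the prior, so the shape of the sampled distribution is deliberately ignored.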
 A G-EMMA mixing model consists of: (1) a combination of end-members as a subset of all possible end-members, (2) end-member fractions, and (3) end-member tracer concentrations, and is, following the notation of Christophersen and Hooper [1992], represented in matrix notation by equation (1):
$$\mathbf{x}_i = \mathbf{l}_i \mathbf{B} \qquad (1)$$
where $\mathbf{l}_i$ represents the k-sized row vector of end-member fractions, $\mathbf{B}$ the k × p sized matrix of end-member concentrations, and $\mathbf{x}_i$ the p-sized vector of tracer concentrations in the stream water sample, with k and p as the number of end-members and tracers, respectively. End-member fractions are sampled from a uniform Dirichlet distribution, yielding a uniform distribution of mixtures while ensuring mass balance closure (end-member fractions always sum to one). Note that we opted to sample end-member fractions, rather than infer them from a least-squares regression technique, so as to retain a direct dependence of the results on the chosen likelihood measure (see below).
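 As an illustration of the mixing model and the sampling of fractions, the sketch below draws fractions from a flat Dirichlet distribution (obtained by normalizing independent exponential draws) and computes the resulting mixture. The function names and the concentration matrix are hypothetical; the original software may be organized differently.

```python
import random


def sample_fractions(k, rng):
    """Draw k end-member fractions from a flat Dirichlet distribution.

    Normalizing k independent Exponential(1) draws yields Dirichlet(1, ..., 1),
    which is uniform over all mixtures whose fractions sum to one.
    """
    draws = [rng.expovariate(1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def mix(fractions, concentrations):
    """Compute the mixture x = l B for one stream water sample.

    fractions is the length-k vector l; concentrations holds k rows of
    p tracer values (the matrix B).
    """
    p = len(concentrations[0])
    return [sum(f * row[j] for f, row in zip(fractions, concentrations))
            for j in range(p)]


# Hypothetical concentrations for k = 3 end-members and p = 2 tracers.
B = [[10.0, 1.0],
     [200.0, 5.0],
     [50.0, 20.0]]
rng = random.Random(42)
l_i = sample_fractions(len(B), rng)
x_i = mix(l_i, B)
```

 Because the fractions are non-negative and sum to one, every simulated tracer value lies between the smallest and largest end-member concentration for that tracer.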
 For each separate stream water sample, a large number of mixing models is generated by uniform Monte Carlo sampling and evaluated against the observed stream water concentrations in terms of a fuzzy likelihood measure. A fuzzy measure, after Zadeh, can be used to express a “degree of belief” in the model as a valid simulator of the system [Beven and Binley, 1992] and has been used in various previous GLUE applications [Blazkova and Beven, 2002, 2009; Freer et al., 2004; Liu et al., 2009; Page et al., 2003, 2007; Pappenberger et al., 2007]. It can be a useful approach to model evaluation when there is an expectation of epistemic (nonrandom), rather than aleatory (random), errors in the modeling process and observational data [Beven, 2006, 2012]. We define our fuzzy likelihood measure as the average over all tracers of individual trapezoids around the analytical values for each tracer, with a relative likelihood of one for calculated values within one standard deviation of the analytical value, decreasing linearly to zero at three standard deviations. Simulations are considered behavioral only if calculated values for all tracers fall within their respective trapezoids (Figure 3). The repetition of this procedure for each stream water sample allows for time-varying end-member fractions and end-member concentrations, as a reflection of catchment processes. Likelihoods are rescaled to sum to unity over the ensemble of behavioral models identified for each time step independently. All software and source code written to facilitate the G-EMMA procedure are available for download at http://g-emma.deltares.nl/.
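 The fuzzy likelihood measure can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, and a run is treated as behavioral only when every tracer has a strictly positive membership.

```python
def trapezoid(simulated, observed, sd):
    """Membership of one tracer: 1 within one sd, linear to 0 at three sd."""
    z = abs(simulated - observed) / sd
    if z <= 1.0:
        return 1.0
    if z >= 3.0:
        return 0.0
    return (3.0 - z) / 2.0


def fuzzy_likelihood(simulated, observed, sds):
    """Average membership over tracers; 0.0 if any tracer is non-behavioral."""
    memberships = [trapezoid(s, o, sd)
                   for s, o, sd in zip(simulated, observed, sds)]
    if min(memberships) <= 0.0:
        return 0.0
    return sum(memberships) / len(memberships)


def rescale(likelihoods):
    """Rescale behavioral likelihoods so that they sum to one.

    Assumes at least one behavioral run, so the total is positive.
    """
    total = sum(likelihoods)
    return [value / total for value in likelihoods]
```

 For example, with an observed value of 100 and a standard deviation of 3, a simulated value of 106 lies two standard deviations away and receives a membership of 0.5, while a simulated value of 109 or beyond receives zero and rules the run out.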
2.5. Application to the Lissertocht Data Set
 We compared applications of both the G-EMMA and original EMMA approaches to the Lissertocht data set to assess the significance of accounting for uncertainty in mixing models in a challenging catchment. We first used the diagnostic tools of Hooper [2003] to identify appropriate tracers, complemented by expert knowledge on the chemical stability of solutes in the catchment. Following the EMMA procedure outlined by Christophersen and Hooper [1992], we constructed a correlation matrix, by standardizing the stream water samples to zero mean and a standard deviation of one, before performing a principal components analysis (PCA) on the correlation matrix using all appropriate tracers. We investigated the dimensionality in the data set by analysis of both the eigenvalues (“the rule of one”) and the apparent structure in the residuals for increasing dimensionality, and calculated relative RMS errors (RRMSE) for all residuals [Hooper, 2003]. We subsequently used the methodology proposed by Barthold et al. [2011] to evaluate all possible combinations (minimum of three) of end-members for all possible combinations (minimum of four) of tracers on three criteria: (1) the Euclidean distance between end-members in solute space and their projections in the mixing space is less than 15% [James and Roulet, 2006], (2) the smallest deviations of the calculated end-member fractions from the plausible 0%–100% range, and (3) the smallest Euclidean distance between end-members and the median of stream water in the mixing space. We calculated end-member fractions for the best performing end-member combination for comparison with G-EMMA results.
 In the G-EMMA procedure, we retained all possible end-members and tracers. We identified behavioral end-member fractions using the G-EMMA procedure outlined above, using the full range of observed concentrations for our five end-members (Table 2). The number of end-members was allowed to vary randomly between three and five, and we used 1 × 10⁹ Monte Carlo runs for each stream sample. We set the uncertainty of stream water samples to their respective analytical uncertainty and calculated the likelihood of each run following the procedure outlined above. G-EMMA results were evaluated by comparing modeled stream water chemistry with observed stream water chemistry (a valid test because the likelihood is averaged over all tracers), by determining the identification of the end-member fractions and by evaluating the calculated catchment response in terms of its physical plausibility. To explore the relative contribution of identification and characterization uncertainty, we investigated the variety of end-member combinations that yielded behavioral results. In addition, we compared the uncertainty calculated for all possible end-member combinations to that for the end-member combination most likely based on conventional EMMA criteria, and investigated the time-variant response of behavioral end-member concentrations.
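 The procedure for a single stream water sample can be sketched end to end as below. This is a simplified illustration, not the released software: the function names, the four hypothetical end-members with their concentration ranges, and the small number of runs are assumptions, as is the choice to draw the number of end-members uniformly and then a random subset of that size.

```python
import random


def membership(sim, obs, sd):
    """Trapezoid: 1 within one sd of the observation, 0 from three sd onward."""
    z = abs(sim - obs) / sd
    return 1.0 if z <= 1.0 else max(0.0, (3.0 - z) / 2.0)


def g_emma_sample(stream, sds, ranges, n_runs, rng, k_min=3):
    """Return behavioral runs as (rescaled likelihood, fractions) pairs.

    stream: observed tracer values; sds: their standard deviations;
    ranges: end-member name -> list of (low, high) bounds, one per tracer.
    """
    names = sorted(ranges)
    behavioral = []
    for _ in range(n_runs):
        # Model structure: a random subset of k_min up to all end-members.
        k = rng.randint(k_min, len(names))
        chosen = rng.sample(names, k)
        # Fractions from a flat Dirichlet (normalized exponential draws).
        raw = [rng.expovariate(1.0) for _ in chosen]
        fractions = [r / sum(raw) for r in raw]
        # Concentrations drawn uniformly within each observed range.
        conc = [[rng.uniform(lo, hi) for lo, hi in ranges[n]] for n in chosen]
        sim = [sum(f * c[j] for f, c in zip(fractions, conc))
               for j in range(len(stream))]
        m = [membership(s, o, sd) for s, o, sd in zip(sim, stream, sds)]
        if min(m) > 0.0:  # behavioral only if every tracer fits
            behavioral.append((sum(m) / len(m), dict(zip(chosen, fractions))))
    total = sum(w for w, _ in behavioral)
    return [(w / total, f) for w, f in behavioral]


# Hypothetical end-members "A" to "D" with two tracers (illustrative values).
ranges = {"A": [(900.0, 1100.0), (40.0, 60.0)],
          "B": [(50.0, 150.0), (200.0, 400.0)],
          "C": [(1.0, 10.0), (1.0, 5.0)],
          "D": [(100.0, 200.0), (20.0, 60.0)]}
stream = [300.0, 120.0]
sds = [9.0, 3.6]  # 3% relative standard deviation
runs = g_emma_sample(stream, sds, ranges, 20000, random.Random(1))
```

 Each behavioral run records which end-members were included, so the same output supports both the likelihood-weighted distributions of fractions and the analysis of which combinations survive.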
3. Results
3.1. Catchment Hydrometry and Chemistry
 Measured chemical composition of the catchment and end-members is summarized in Table 2. Concentration ranges for the end-members SL and BD were relatively wide, reflecting their high spatial variability. The chemical composition of the stream water was highly variable and showed a distinct response to precipitation events (Figure 4). Generally, solutes B, Br, Cl, Mg, Na, and Sr showed a decrease, whereas Li and SO4 concentrations rose with increasing discharge. April 2012 signified a marked drop in all solute concentrations, coinciding with the start of intake of inlet water into the catchment. Maximum capacity of the four intake culverts together was measured at 95.7 ± 4.3 L/s, which equals 0.83 ± 0.04 mm/d. Pumping capacity of the main pump was measured at 1.01 ± 0.02 m³/s and 1.35 ± 0.05 m³/s in normal and maximum operation, respectively.
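 The conversion of the inlet capacity from a volumetric flow to an areal depth can be checked with a few lines. The function name is hypothetical, and the calculation assumes the nominal catchment area of 10 km² stated in section 2.1.

```python
def litres_per_second_to_mm_per_day(q_l_s, area_km2):
    """Convert a flow in L/s to an areal depth in mm/d over a catchment."""
    m3_per_day = q_l_s / 1000.0 * 86400.0   # L/s -> m3/s -> m3/d
    area_m2 = area_km2 * 1.0e6              # km2 -> m2
    return m3_per_day / area_m2 * 1000.0    # m/d -> mm/d


inlet = litres_per_second_to_mm_per_day(95.7, 10.0)
inlet_unc = litres_per_second_to_mm_per_day(4.3, 10.0)
print(round(inlet, 2), round(inlet_unc, 2))  # 0.83 0.04
```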
3.2. EMMA, Hooper's Diagnostic Tools, and Evaluation of Possible End-Members
 After investigation of bivariate solute-solute plots, we selected B, Br, Cl, Li, Mg, Na, SO4, and Sr as suitable tracers. Other possible tracers showed no significant linear correlation with other solutes and were therefore discarded. After performing a PCA on the Lissertocht stream samples, the rank of the data set was analyzed by studying the structure in the residuals of the solute concentrations in the reduced model space. The “rule of one” suggested a two-dimensional model space explaining 96% of the variance in the stream concentrations, a result corroborated by visual inspection of the residuals and calculated RRMSEs (average 5.6%). Some structure was, however, still apparent for solute B, which disappeared in a three-dimensional model space (average RRMSE 3.5%). The evaluation of possible end-member combinations, following Barthold et al. [2011], resulted in end-members AD, SL, and IL (100%, 98%, and 95%, respectively) featuring in nearly all and BD (74%) in the majority of plausible combinations, while PR featured in markedly fewer (13%). Differences between tracers were small: all tracers were present in between 55% and 65% of plausible results. The combination of IL, SL, BD, and AD was by far the most prominent, making up 57% of plausible results. Calculated end-member fractions using this combination and all tracers are shown in Figure 5. The fractions of all end-members except AD often fall outside the plausible 0–1 range, most notably during the high discharge period of December 2011 to January 2012.
3.3. GLUE End-Member Mixing Analysis (G-EMMA)
 GLUE analysis of the 362 stream samples resulted in a median value and 25–75 percentile range of 3.8 × 10³ (3.1 × 10²–1.2 × 10⁴) behavioral runs (with positive fuzzy membership for all eight tracers) out of a possible 1 × 10⁹. Two samples (on 17 December 2011 and 17 July 2012) yielded no behavioral runs (i.e., all of the tried combinations of fractions failed to match the defined fuzzy support for one or more tracers). Measured stream water concentrations could, with these two exceptions, consistently be explained by mixtures of our chosen end-members, as is reflected in the excellent agreement of modeled and measured stream water concentrations of tracers B, Br, Cl, Li, Mg, Na, and SO4. Only Sr is consistently underpredicted, albeit slightly (Figure 4). The possibilistic distributions of end-member fractions that yielded behavioral results for the different samples are plotted in Figure 5. This plot can be regarded as a time-variant version of the well-known “dotty-plots” of GLUE applications [e.g., Beven, 2006], showing the likelihood-weighted marginal distributions of behavioral model parameters changing over time, as each sample is represented by a separate Monte Carlo calculation. The calculated uncertainty in end-member fractions, indicated by the 5–95 and 25–75 percentile ranges (shaded bands) in Figure 5, varied over time and between end-members. The complete marginal distributions of all end-member fractions lay (necessarily) within the 0–1 range, and were asymmetrical. While there was considerable uncertainty in the fractions of all end-members except AD, all end-member contributions were sensitive parameters in the GLUE sense and could therefore be adequately identified throughout the time series. Except for AD, behavioral end-member fractions differed markedly from fractions calculated with conventional EMMA. G-EMMA calculated SL fractions were lower than those calculated with EMMA, which at times exceeded a fraction of 1. In contrast, G-EMMA calculated BD and IL fractions were higher than the equivalent EMMA fractions, which at times fell below 0.
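 Likelihood-weighted percentile ranges such as the 5–95 and 25–75 bands can be derived from the behavioral runs of one sample with a weighted quantile. The sketch below uses a simple cumulative-weight definition without interpolation; the function name and example values are hypothetical, and the original software may use a different quantile convention.

```python
def weighted_quantile(values, weights, q):
    """Return the smallest value whose cumulative normalized weight reaches q."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight / total
        if cumulative >= q:
            return value
    return pairs[-1][0]


# Hypothetical behavioral fractions of one end-member and their likelihoods.
fractions = [0.4, 0.1, 0.3, 0.2]
likelihoods = [1.0, 1.0, 4.0, 2.0]
median = weighted_quantile(fractions, likelihoods, 0.5)
band = (weighted_quantile(fractions, likelihoods, 0.05),
        weighted_quantile(fractions, likelihoods, 0.95))
```

 Because the weights need not be symmetric around the median, the resulting bands can be asymmetrical, as observed for the end-member fractions.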
 We took a closer look at the distribution of end-members, end-member combinations, and end-member concentrations in the posterior parameter set, that is, the models and parameters that make up the behavioral runs. Averaged over the entire time series, frequencies of end-members occurring in behavioral end-member combinations were: AD: 100%, SL: 90%, IL: 86%, BD: 82%, and PR: 52%. Results resemble those obtained through the criteria of Barthold et al. [2011], although the contribution of PR is much more prominent in the G-EMMA analysis. The end-member combination of IL, SL, BD, and AD yielded the most behavioral runs, closely followed by the combination of all five end-members (Table 3). Results from G-EMMA include more combinations, and frequencies are spread out more evenly over the different combinations.
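 Tallying how often each end-member and each combination occurs among behavioral runs is a simple counting exercise, sketched below. The function names and the four example runs are hypothetical, and the counts here are unweighted, which is an assumption about how the reported frequencies were computed.

```python
from collections import Counter


def end_member_frequencies(combinations):
    """Share of behavioral runs in which each end-member occurs."""
    n = len(combinations)
    counts = Counter(name for combo in combinations for name in combo)
    return {name: counts[name] / n for name in sorted(counts)}


def combination_shares(combinations):
    """Share of behavioral runs per end-member combination, most common first."""
    n = len(combinations)
    counts = Counter(frozenset(combo) for combo in combinations)
    return {combo: count / n for combo, count in counts.most_common()}


# Hypothetical behavioral runs, each listing the end-members it included.
runs = [("AD", "SL", "IL", "BD"),
        ("AD", "SL", "IL", "BD"),
        ("AD", "SL", "BD", "PR"),
        ("AD", "IL", "BD", "SL", "PR")]
freq = end_member_frequencies(runs)
shares = combination_shares(runs)
```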
Table 3. Frequencies of End-Member Combinations Producing Behavioral Results
 The effect of including the identification uncertainty in G-EMMA was investigated by comparing the behavioral end-member fractions for all possible combinations to the subset of behavioral fractions for the combination IL, SL, BD, AD, the most dominant combination from the criteria of Barthold et al. [2011]. Maximum effects were seen in IL: the median fraction of IL resulting from all possible end-member combinations is consistently lower (average −30 ± 8%) than from the subset of one possible combination, and its uncertainty (5–95 percentile range) is consistently larger (average 64 ± 90%) than from the subset (Figure 6). Smallest effects were observed for AD, but effects were still on average −4 ± 24% on median fractions (uncertainty range 5 ± 7% larger). For comparison, the EMMA result (which also pertained to this end-member combination) for IL is also shown in Figure 6b. Even with identical end-member combinations, results differ markedly between EMMA and G-EMMA.
 Analysis of the likelihood weighted marginal distributions of end-member tracer concentrations revealed a general insensitivity of the likelihood of modeled stream water concentrations to IL and PR concentrations, and limited sensitivity to most AD and BD concentrations, as behavioral simulations were found throughout the respective parameter distributions (not shown). Note that model likelihood is associated with a combination of a model structure (end-member combination) and model parameters, rather than a single model parameter [Beven, 2006]. So while the model likelihood (i.e., the fit of stream water concentrations) may be insensitive to end-member concentrations, end-member fractions do not necessarily have to be. Most SL tracer concentrations were, however, constrained to part of their initial range during discharge events, when the fraction of SL in stream water is highest (SO4 shown in Figure 7a). Behavioral results were limited to lower concentrations of B, Li, and Mg, and to higher concentrations of SO4 and Sr during discharge events. Subsequent indicative calculations using these constrained concentrations of SL, instead of the full range, clearly lessened the sensitivity of SL concentrations, while hardly affecting modeled stream concentrations. The resulting median SL fraction was slightly lower (−8 ± 14%) than using the full range, while its uncertainty decreased (−20 ± 15%).
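As an illustration of what a likelihood-weighted marginal distribution looks like in practice, the sketch below bins the sampled concentrations of one tracer for one end-member and weights each behavioral run by its likelihood. The function name, the equal-width binning, and the normalization are assumptions made for illustration only.

```python
def weighted_marginal(values, likelihoods, lower, upper, n_bins):
    """Likelihood-weighted histogram of one parameter over behavioral runs.

    values      : sampled concentrations (one per behavioral run)
    likelihoods : likelihood weight of each run
    lower/upper : bounds of the prior (sampled) concentration range
    Returns bin weights that sum to 1.
    """
    if upper <= lower or n_bins < 1:
        raise ValueError("need upper > lower and at least one bin")
    total = sum(likelihoods)
    if total <= 0:
        raise ValueError("likelihoods must sum to a positive value")
    width = (upper - lower) / n_bins
    hist = [0.0] * n_bins
    for v, w in zip(values, likelihoods):
        if not lower <= v <= upper:
            raise ValueError("value outside the prior range")
        # A value equal to the upper bound is placed in the last bin.
        idx = min(int((v - lower) / width), n_bins - 1)
        hist[idx] += w / total
    return hist
```

A nearly flat histogram over the prior range would indicate that the model likelihood is insensitive to that concentration, whereas weight concentrated in a few bins would indicate a constrained parameter.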
3.4. Catchment Response
 Calculating the discharge for each end-member by multiplying its fraction by the total discharge provides a comprehensive view of the catchment's response to rainfall events and enables a subjective plausibility check of G-EMMA results (Figure 8). Generally, the observed patterns in catchment response are physically plausible and consistent with our previously formed perceptual model of the hydrologic functioning of the catchment. The catchment response showed a relatively constant flux of AD, consistent with the relatively constant head difference between the aquifer and the tightly managed surface water levels, and in agreement with results for a similar catchment [De Louw et al., 2011]. Precipitation events resulted in a dominant contribution of SL to discharge, a behavior exhibited by numerous catchments over a range of different geographical settings (overview in Weiler et al.). A long dry period before the onset of precipitation delayed the response of SL and BD considerably, indicating a thorough depletion of shallow groundwater stores.
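The multiplication of fractions by discharge can be sketched as follows. The data layout (one dictionary of fractions per time step, for example the median behavioral fractions) and the function name are hypothetical.

```python
def end_member_discharge(fractions_series, discharge_series):
    """Multiply end-member fractions by total discharge, time step by time step.

    fractions_series : list of {end_member: fraction} dictionaries
    discharge_series : list of total discharges (e.g., in mm/d), same length
    Returns a list of {end_member: discharge} dictionaries.
    """
    if len(fractions_series) != len(discharge_series):
        raise ValueError("series must have the same length")
    return [
        {em: frac * q for em, frac in fractions.items()}
        for fractions, q in zip(fractions_series, discharge_series)
    ]
```

Note that median fractions of individual end-members need not sum exactly to 1, so the component discharges computed this way need not sum exactly to the measured discharge.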
 Active water management in the catchment is evidenced in the hydrograph by a rising contribution of IL (and, to a lesser extent, PR) around 1 April 2012, coinciding with the start of intake of fresh water into the catchment. The discharge of IL rose to a relatively constant value of about 0.5 mm/d, which was on the order of the measured maximum capacity of the intake culverts. Additional intake of water at the auxiliary pump started on 29 May 2012 and lasted approximately 1 week (M. Riethoff, Rijnland Water Authority, personal communication, 2012), coinciding well with the temporary rise in IL discharge in June 2012. The contribution of IL during the winter months and during summer precipitation events was unexpected, however, as its input is controlled by actively managed hydraulic structures. This unexpected result may be caused by (1) the lack of separation between IL and PR, the unexpected IL contribution in fact being PR, (2) an unidentified source of water with chemical properties similar to those of IL, most likely subterranean inputs from the canal supplying IL water, or (3) storage in the extensive surface water system that is flushed out during discharge events. Given the uncertainty associated with the proximate locations of IL and PR in mixing space, an additional tracer that better distinguishes between the two would be necessary to separate these end-members. Gadolinium has proved successful in a similar setting [Rozemeijer et al., 2012], and 18O could also yield better contrasts [Stuyfzand, 1993]. The contribution of PR appears small even during the larger precipitation events, indicating the absence of significant fast flow routes like overland flow, although the lack of separation between PR and IL necessitates caution when drawing this conclusion.
 Given the well-established problems in identifying and characterizing end-members, end-member mixing models are, at best, simple hypotheses about catchment functioning. Nevertheless, they can still offer valuable insights, provided that uncertainty is adequately accounted for [Soulsby et al., 2003a; Uhlenbrook and Hoeg, 2003]. This paper presents G-EMMA, a novel method of quantifying uncertainty in end-member mixing models, both in identifying and in characterizing end-members, based on the GLUE methodology of Beven and Binley. An additional advantage is that our method allows for using more tracers than necessary, a central feature of EMMA that is lacking in existing quantitative uncertainty assessments. We showed that G-EMMA is able to adequately model stream water concentrations and identify contributions of five different end-members, albeit with varying uncertainty. Therefore, as was also shown by Soulsby et al. [2003a], mixing models can help to better understand catchment functioning even in catchments heavily impacted by agricultural activities and intricate water management.
 Several existing approaches have quantified the uncertainty resulting from an inability to adequately characterize end-member concentrations [Bazemore et al., 1994; Genereux, 1998; Hooper et al., 1990; Joerin et al., 2002; Soulsby et al., 2003a]. In these approaches, uncertainty in end-member concentrations was inferred from sampling of stored water and, in most cases, approximated by a Gaussian distribution. However, as Joerin et al. recognize, and as illustrated by Figure 7b, end-member concentrations do not always follow a Gaussian distribution, so that this approximation may lead to incorrect uncertainty estimates. Furthermore, an accurate characterization of stores of end-member water in a catchment does not necessarily equate to a proper characterization of the flux-weighted input to the stream [Rinaldo et al., 2011], although these approaches implicitly assume that it does. Instead, recognizing the impossibility of adequately characterizing the flux-weighted input to the stream, we adopted a minimal assumption approach in G-EMMA and assumed a uniform prior distribution over the complete range of sampled end-member concentrations. As is evident from Figure 7b, the posterior distribution of behavioral end-member concentrations can indeed differ markedly from the distribution obtained through sampling, signifying our inability to adequately characterize end-member concentrations a priori.
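The minimal assumption approach can be illustrated with a short Monte Carlo sketch: concentrations are drawn from a uniform prior over the sampled range, combined with candidate fractions, and retained if the modeled stream concentrations are close enough to the observations. This is a sketch of the general GLUE idea only; the way fractions are drawn, the tolerance-based behavioral criterion, and all names are assumptions made for illustration and are not the likelihood measure used in G-EMMA.

```python
import random


def sample_concentrations(ranges, rng):
    """Draw one concentration per end-member and tracer from a uniform prior.

    ranges: {end_member: {tracer: (minimum, maximum)}} from field sampling.
    """
    return {
        em: {tr: rng.uniform(lo, hi) for tr, (lo, hi) in tracers.items()}
        for em, tracers in ranges.items()
    }


def sample_fractions(end_members, rng):
    """Draw non-negative fractions that sum to 1 (assumed sampling scheme)."""
    draws = [rng.expovariate(1.0) for _ in end_members]
    total = sum(draws)
    return {em: d / total for em, d in zip(end_members, draws)}


def mix(fractions, concentrations):
    """Conservative mixing: stream concentration is the fraction-weighted sum."""
    tracers = next(iter(concentrations.values())).keys()
    return {
        tr: sum(fractions[em] * concentrations[em][tr] for em in fractions)
        for tr in tracers
    }


def behavioral_runs(ranges, observed, tolerance, n_runs, seed=0):
    """Keep runs whose modeled concentrations fall within a tolerance of the
    observed stream concentrations (hypothetical behavioral criterion)."""
    rng = random.Random(seed)
    kept = []
    for _ in range(n_runs):
        conc = sample_concentrations(ranges, rng)
        frac = sample_fractions(list(ranges), rng)
        modeled = mix(frac, conc)
        if all(abs(modeled[tr] - observed[tr]) <= tolerance[tr] for tr in observed):
            kept.append((frac, conc))
    return kept
```

The concentrations in the retained runs form the posterior sample, which can then be compared with the distribution obtained from field sampling.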
 We did not explicitly include the temporal variation of end-member tracer concentrations in our mixing models, as temporal variance was relatively low in measured end-member concentrations. Furthermore, adequate quantification of the effect of temporal variance in catchment inputs on stream concentrations would require a theoretical framework that accounts for both nonlinearity and nonstationarity in travel times [Iorgulescu et al., 2005, 2007; Rinaldo et al., 2011], which is outside the scope of this research. Temporal variation is, however, implicitly accounted for in G-EMMA, as every stream water sample is independently modeled using the full range of observed end-member tracer concentrations. If a temporal signal is significant enough to be expressed in stream water concentrations despite all uncertainty, end-member concentrations should be sensitive parameters in GLUE. Our model results were, however, generally insensitive to end-member concentrations, with the exception of SL. While we cannot exclude temporal variation in SL concentrations, the constraining of SL concentrations during discharge peaks is more likely a result of the high proportion of SL in stream water, increasing the sensitivity to SL concentrations. The constrained concentrations of SL during discharge events may therefore be a closer representation of “real” SL water (i.e., the flux-weighted input to the stream) than the range obtained from sampling.
 Explicitly including time-variant patterns in end-member fractions and concentrations in G-EMMA potentially offers several advantages and is an important direction for future research. First, extending the work of Iorgulescu et al. [2005, 2007], combining G-EMMA with the recent progress made in research on transit time distributions [Heidbüchel et al., 2012; Rinaldo et al., 2011; van der Velde et al., 2012] may be a way to shed more light on the time-variant behavior of end-member concentrations or their convolution to stream chemistry through nonstationary transit times. Second, combining results for successive samples in a time-filtered way may reduce the uncertainty of end-member fractions relative to the current independent simulation of successive samples.
 In mixing model analyses, the choice of end-members is often a translation of the researcher's hypothesis of catchment functioning and therefore, by definition, also an uncertain one. The GLUE approach of G-EMMA quantifies this identification uncertainty by simultaneously evaluating different possible end-member combinations. A comparison between results for a selected end-member combination and the complete result set (Figure 6) illustrated the possible significance of identification uncertainty, in this particular case amounting to a maximum 30% difference in median calculated end-member fractions. End-member IL occupies a proximate location to PR in the mixing space of the Lissertocht catchment, resulting in interference and hence a relatively high uncertainty of both end-members. Similar uncertainty due to interference has, to our knowledge, not been reported, as conventional EMMA guidelines [e.g., Christophersen and Hooper, 1992; Christophersen et al., 1990] recommend the use of end-members that are sufficiently different from each other. We would argue, however, that even if adequate separation is simply impossible based on the available measurement data, retaining proximal end-members presents a more realistic notion of the uncertainty in our understanding of catchment functioning.
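To make the notion of evaluating different end-member combinations concrete, the sketch below enumerates candidate combinations and counts how often each end-member occurs in a set of behavioral runs. The minimum combination size and the function names are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations


def candidate_combinations(end_members, min_size=2):
    """All end-member combinations of at least min_size members (assumed minimum)."""
    return [
        combo
        for size in range(min_size, len(end_members) + 1)
        for combo in combinations(end_members, size)
    ]


def end_member_frequency(behavioral_combinations):
    """Percentage of behavioral runs in which each end-member occurs.

    behavioral_combinations: one tuple of end-member names per behavioral run.
    """
    n_runs = len(behavioral_combinations)
    if n_runs == 0:
        return {}
    counts = Counter(em for combo in behavioral_combinations for em in set(combo))
    return {em: 100.0 * c / n_runs for em, c in counts.items()}
```

With five end-members and a minimum size of two, this enumeration yields 26 candidate model structures, each of which would be evaluated alongside the sampled concentrations.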
 The heavily impacted Lissertocht catchment is, due to the significant spatial variation in end-member concentrations and extraneous inputs of regional groundwater and fresh water intake, a difficult test case for applying end-member mixing models. Indeed, conventional EMMA suffered from repeated excursions of end-member fractions outside the plausible 0–1 range (Figure 5). As these excursions predominantly occurred during discharge events with a large fraction of SL water, this end-member is probably not well represented by its sampled concentration median. The skewed constraining of behavioral SL concentrations in the G-EMMA analysis also points in this direction. In contrast, the application of G-EMMA was not significantly affected by the uncertainty in end-member concentrations, and it was still able to identify the (uncertain) contributions of five different end-members to the Lissertocht. Therefore, in addition to quantifying uncertainty in end-member mixing models, G-EMMA can potentially be applied over a wider range of catchments than conventional EMMA, while still yielding meaningful results. Moreover, G-EMMA is not limited to hydrology but may also be applied to end-member mixing problems in other (earth) sciences.
 Using a GLUE-based approach to end-member mixing models allowed a more complete investigation of end-member mixing uncertainty than existing methods, as the approach includes both characterization and identification uncertainty. Despite this uncertainty, G-EMMA was able to characterize end-member contributions to the Lissertocht, where conventional EMMA results suffered from repeated excursions outside the plausible 0–1 range. We therefore recommend using G-EMMA to more robustly test hypotheses about catchment functioning, especially in complex catchments with considerable concentration ranges. In spite of the well-rehearsed difficulties in applying end-member mixing models to agricultural catchments, our approach enabled us to improve our understanding of the functioning of an actively managed Dutch polder catchment throughout the course of a year.
 We thank J. Visser for laboratory assistance, Rijnland Water Authority for helping with the measurement setup and supplying additional data, and two anonymous reviewers for providing valuable suggestions to this paper. This work was carried out within the Dutch “Knowledge for Climate” program.