Classifying freshwater salinity regimes in central and western U.S. streams and rivers

Freshwater salinization of rivers is occurring across the globe because of nonpoint source loading of salts from anthropogenic activities such as agriculture, urbanization, and resource extraction that accelerate weathering and release salts. Multidecadal trends in river salinity are well characterized, yet our understanding of annual regimes of salinity in rivers draining diverse central and western U.S. landscapes and their associated catchment attributes is limited. We classified annual salinity regimes in 242 stream locations through dynamic time warping and fuzzy c‐medoids clustering of salinity time series. We found two dominant regimes in salinity characterized by an annual summer–fall peak or spring decline. Using random forest regression, we found that precipitation amount, stream slope, and soil salinity were the most important predictors of salinity regime classification. Advancing our understanding of salinity regimes in rivers will improve our ability to predict and mitigate the effects of salinization in freshwater ecosystems through management interventions.

Salinization of freshwaters is of growing global concern (Cañedo-Argüelles et al. 2013), and in the United States, changes in stream salinity levels are driven by variation in land use and climate (Kaushal et al. 2018; Olson 2019; Stets et al. 2020). Elevated salinity levels can lead to reduced fitness and survival of freshwater organisms (Cañedo-Argüelles et al. 2019), declines in species diversity (Cañedo-Argüelles et al. 2013), and reduced ecosystem services, such as litter decomposition rates (Berger et al. 2019). In a multiple stressor context, salinity can cause further harm to organisms depending on the timing of each stressor (Velasco et al. 2019). Furthermore, freshwater salinization complicates the use of water resources for drinking water, irrigation, and recreation (Cañedo-Argüelles et al. 2019). Characterizing annual stream salinity regimes, namely the timing and duration of elevated salinity concentrations, can provide insight into regional drivers of characteristic patterns in salinity and inform management interventions.
Patterns in freshwater salinity across broad spatial scales are partially determined by natural variation in the lithology, topography, and climate of the contributing watershed (Olson and Hawkins 2012), but concerns are growing over broad-scale anthropogenic drivers of salinization (Cañedo-Argüelles et al. 2019). Specifically, the use of de-icing road salts in urban areas, displacement of saline groundwater and salinized return flows from irrigated agriculture, and increased weathering of soils and infrastructure contribute to freshwater salinization in anthropogenically impacted landscapes (Kaushal et al. 2018; Moffett et al. 2020). Streams in the highly populated northeastern United States and the agricultural Midwest have already experienced extensive salinization (Kaushal et al. 2018; Stets et al. 2020). In the western United States, the implementation of salinity mitigation projects in highly managed river basins, such as the Upper Colorado River Basin, has reduced previously elevated salinity levels (Rumsey et al. 2021). However, regional changes in the salinity of streams across the central and western United States are still occurring (Olson 2019; Stets et al. 2020), and western streams are predicted to experience some of the greatest changes in salinity from natural background conditions relative to other regions by the end of the 21st century (Olson 2019). Across the central and western United States, stream salinity is predicted to increase because of land use, increased evapotranspiration, and declines in precipitation in some subregions (Olson 2019), which may shift legacy stream salinity dynamics toward new annual regimes (e.g., higher summer salinity peaks during low flow, reduced spring dilution, or shifts in the timing of peak salinity with implications for biota).
Although stream salinization has been characterized at multidecadal scales across the United States (Kaushal et al. 2018; Stets et al. 2020), there is limited understanding of the patterns and drivers of annual salinity regimes. Annual salinity regimes can provide insight into site-specific periods of time when salinity concentrations may reach thresholds that are stressful to aquatic life (e.g., fluctuations in Arkansas River salinity driven by the timing of irrigation; Lin and Garcia 2012). Dissolved salt export regimes from catchments are commonly characterized using concentration-discharge (C-Q) relationship metrics, which quantify how solute concentrations change with discharge in a river (Godsey et al. 2009; Thompson et al. 2011; Musolff et al. 2015). These metrics provide insight into linkages between catchment hydrology and biogeochemistry; yet, additional characteristics, including the timing of peak salinity and the spatial distribution of similar regime types, can be gleaned using time series clustering methods. Clustering has previously been used to classify regimes of stream hydrology (Olden et al. 2012), temperature (Maheu et al. 2016; Willis et al. 2021), nutrient concentrations (Van Meter et al. 2019), and productivity (Savoy et al. 2019).
In this study, we used dynamic time warping (DTW) and fuzzy c-medoid (FCMdd) clustering to identify and classify emergent annual regimes of salinity in streams and rivers across the central and western United States. We examined the relationships between the resulting regimes and their catchment attributes as well as common C-Q metrics. We predicted that three distinct salinity regimes would emerge from our analysis and would be dominated by (1) peak salinity during the late summer and early fall, (2) a salinity decline during the spring snowmelt period, and (3) constant salinity with brief winter peaks.

Data
We based our initial site selection on the availability of relevant data from USGS National Water Information System gages and only included sites classified as streams, canals, or springs in the National Hydrography Dataset Plus Version 2 (NHD) (McKay et al. 2012). We restricted sites to the central and western United States (hydrologic units 10-18) based on their position west of the main stem of the Mississippi River (Bolotin and Blaszczak 2022). We used specific conductance (SC) as a proxy for salinity (Olson 2019) and downloaded daily mean SC data and discharge data from 1970 to 2020 for 645 USGS gage locations using the dataRetrieval package (De Cicco et al. 2018) in R (R Core Team 2020). We excluded any USGS SC and discharge data not marked as "Approved for Publication." We also included two non-USGS sites with SC data from sensors in the Rio Grande that we calibrated following USGS standards (Wagner et al. 2006), corrected for instrument drift and fouling using Aquarius Workstation 3.3 (Aquatic Informatics), and paired with discharge data from the nearest USGS gage. We filtered time series data for site-years with ≥ 219 d (60% of the year) and gaps of ≤ 4 d, yielding 999 site-years from 242 sites (Table S1; 3 canals, 3 springs, 236 streams). To capture the typical annual range and timing of salinity at each site, we created one representative time series of SC at each site by calculating the mean daily SC for each day of the year across all years of filtered data. We filled gaps in the representative time series of less than 4 d using linear interpolation, converted SC to salinity (practical salinity units) using a generalized equation for salinity conversions in freshwater (Schemel 2001; Wagner et al. 2006), and then z-normalized the time series.
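The averaging, gap-filling, and normalization steps above can be sketched as follows. The study's workflow used R (dataRetrieval and related packages); this is a minimal Python stand-in, assuming a pandas Series of daily SC values indexed by date, with a hypothetical function name:

```python
import numpy as np
import pandas as pd

def representative_series(daily_sc: pd.Series, max_gap_days: int = 4) -> np.ndarray:
    """Build one representative annual trace from a multiyear daily SC series.

    Steps mirror the text: average by day of year across all retained years,
    fill gaps of less than `max_gap_days` by linear interpolation, then
    z-normalize (zero mean, unit variance).
    """
    # Mean SC for each day of the year across all years of filtered data
    doy_mean = daily_sc.groupby(daily_sc.index.dayofyear).mean()
    # Reindex to a full 1..365 calendar so missing days appear as NaN
    doy_mean = doy_mean.reindex(range(1, 366))
    # Linear interpolation, limited to gaps shorter than the threshold
    filled = doy_mean.interpolate(method="linear", limit=max_gap_days - 1)
    # z-normalize so clustering compares shape, not magnitude
    z = (filled - filled.mean()) / filled.std()
    return z.to_numpy()
```

z-normalization matters here because the clustering is meant to group sites by the timing and shape of their annual salinity pattern, not by their absolute salinity levels.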

Clustering
We used DTW to quantify the similarity between the z-normalized time series (Sakoe and Chiba 1978; Berndt and Clifford 1994). DTW is more elastic than Euclidean distance, which may overstate the difference between two time series that are similar in shape but out of phase on the time axis, a common feature of hydrologic time series (Olden et al. 2012; Savoy et al. 2019). We clustered sites using FCMdd clustering, which assigns a relative cluster membership to each time series (Izakian et al. 2015).
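To illustrate the elasticity DTW provides, a from-scratch sketch of a band-constrained DTW distance is shown below. The analysis itself used the "dtwclust" R package; this Python version is illustrative only:

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray, window: int) -> float:
    """Dynamic time warping distance between two equal-length series,
    constrained to a Sakoe-Chiba band of half-width `window` steps."""
    n = len(a)
    # cost[i, j] = minimal cumulative cost aligning a[:i] with b[:j]
    cost = np.full((n + 1, n + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        # Only compare points within `window` steps of the diagonal
        for j in range(max(1, i - window), min(n, i + window) + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # stretch a
                                 cost[i, j - 1],      # stretch b
                                 cost[i - 1, j - 1])  # step both
    return float(cost[n, n])
```

For two identical curves offset slightly in time (e.g., the same seasonal peak arriving a few days later at one site), this banded DTW cost is far smaller than the pointwise (Euclidean-style) difference, which is exactly why DTW suits annual regime comparison.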
We first determined the optimal warping window (w), which is the window of time across which DTW makes comparisons between time series, by using the Sakoe-Chiba band (Sakoe and Chiba 1978). We chose 7 d for w by testing a range of w values in 7-d increments up to 28 d and comparing results from five internal cluster validity indices (CVIs) specific to FCMdd for each w using the "dtwclust" package in R (Sarda-Espinosa 2019). Three out of five CVIs indicated that two clusters with a 7 d w was optimal for the FCMdd analysis (Table S2). We clustered sites with 1000 iterations, an exponent of 2, and a convergence criterion of 0.001.
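The degree-of-membership idea underlying fuzzy clustering can be made concrete with the standard fuzzy membership update: given each series' distance to every cluster medoid and the fuzzifier exponent (2 here), memberships follow directly. This sketch shows only that update step, not the full FCMdd iteration used by dtwclust:

```python
import numpy as np

def fuzzy_memberships(dist_to_medoids: np.ndarray, m: float = 2.0) -> np.ndarray:
    """Degrees of membership given distances to medoids (rows: series,
    columns: medoids), using the standard fuzzy update with fuzzifier m:
        u_ik = 1 / sum_j (d_ik / d_ij)^(2 / (m - 1))
    Memberships in each row sum to 1; a series coinciding with a medoid
    gets full membership to it.
    """
    d = np.asarray(dist_to_medoids, dtype=float)
    u = np.zeros_like(d)
    power = 2.0 / (m - 1.0)
    for i, row in enumerate(d):
        if np.any(row == 0.0):  # series sits exactly on a medoid
            u[i, row == 0.0] = 1.0 / np.count_nonzero(row == 0.0)
        else:
            inv = (1.0 / row) ** power  # closer medoids weigh more
            u[i] = inv / inv.sum()
    return u
```

With two clusters, a series equidistant from both medoids receives 50% membership to each, which is what the 45-55% "aseasonal" band in the Results corresponds to.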

Post-clustering analysis
We compiled NHD catchment attribute data related to climate, lithology, soils, topography, land use, atmospheric deposition, and hydrologic modification (hereafter referred to as catchment attribute data) from Wieczorek et al. (2018) for NHD catchments (see summary in Table S3). We paired the catchment attribute data with our sites using the common identifier (ComID) associated with each site in NHD, which we identified using the nhdplusTools package in R (Blodgett 2018). Five sites in the initial clustering (Table S1) were not considered in any post-clustering analysis (n = 237) because of a lack of catchment attribute data or an unverifiable ComID.
We developed random forest regression models to investigate the relative importance of the catchment attributes for predicting each site's degree of membership to each cluster according to our FCMdd clustering analysis (see Supporting Information). Degree of cluster membership is defined as the proportion, out of 100% cluster membership, that a site belongs to each cluster. We applied our final random forest model to predict the cluster membership of stream reaches not used in the initial model tuning process that fell within the same HUC 2 basins (10-18) and had catchment attributes within the range of the initial predictor variables (n = 980,281).
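The modeling step can be sketched as below. The study's models were fit in R; this Python version uses scikit-learn as a stand-in, with synthetic data and illustrative predictor names (annual precipitation, stream slope, soil salinity), so the numbers carry no scientific meaning:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n = 237  # number of sites retained for post-clustering analysis
# Hypothetical catchment attributes (columns: precipitation mm,
# stream slope %, soil salinity index)
X = np.column_stack([
    rng.uniform(200, 2000, n),
    rng.uniform(0, 10, n),
    rng.uniform(0, 5, n),
])
# Synthetic response: spring-decline membership (0-1), driven here
# mostly by the precipitation column plus noise
y = np.clip(0.9 - X[:, 0] / 2500 + rng.normal(0, 0.05, n), 0, 1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
# Permutation importance: drop in held-out score when a predictor
# is shuffled, analogous to the %IncMSE-style importance reported
imp = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=0)
```

Ranking `imp.importances_mean` recovers which attributes the model leans on, which is the basis for statements like "precipitation, stream slope, and soil salinity were the most important predictors."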
We used several common C-Q metrics to contextualize the resulting regimes, including the regression slope (β) of site-specific log-log salinity-Q relationships (Godsey et al. 2009) and the ratio between the coefficients of variation in salinity and Q (CV_S : CV_Q; Thompson et al. 2011). We calculated these metrics for all sites to examine whether clusters were representative of chemostatic or chemodynamic export regimes (Musolff et al. 2015). We considered dilution to be the dominant control on solute export from catchments when β is negative, while flushing (mobilization) of solutes is the dominant catchment behavior when β is positive. Chemostatic behavior is exhibited when β ≈ 0 and CV_S : CV_Q << 1, meaning that concentrations remain relatively stable across the range of discharge, while chemodynamic behavior is exhibited when CV_S : CV_Q > 1, which is representative of concentration variation independent of discharge.
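Both metrics reduce to a few lines given paired salinity and discharge series. A minimal Python sketch (function name and return convention are illustrative, not from the study):

```python
import numpy as np

def cq_metrics(conc: np.ndarray, q: np.ndarray) -> tuple[float, float]:
    """C-Q slope (beta) and CV ratio for paired salinity and discharge.

    beta: least-squares slope of log10(C) vs. log10(Q); negative values
    indicate dilution, positive values flushing/mobilization.
    cv_ratio: CV_S / CV_Q; beta near 0 with cv_ratio << 1 indicates
    chemostasis, while cv_ratio > 1 indicates chemodynamic behavior.
    """
    mask = (conc > 0) & (q > 0)          # logs require positive values
    logc, logq = np.log10(conc[mask]), np.log10(q[mask])
    beta = np.polyfit(logq, logc, 1)[0]  # slope of the log-log regression
    cv = lambda x: np.std(x, ddof=1) / np.mean(x)
    return float(beta), float(cv(conc[mask]) / cv(q[mask]))
```

As a sanity check, a pure power-law dilution C = a·Q^(-0.5) yields β = -0.5, and a constant concentration yields β ≈ 0 with a CV ratio of 0.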

Results
The FCMdd clustering of annual salinity time series across 242 stream and river locations in the central and western United States identified two emergent salinity regime types (Fig. 1a). Cluster 1 was characterized by a summer-fall peak in salinity and included 40 sites with a high degree of cluster membership, which we define as greater than 90% cluster membership (Fig. 1c). Cluster 2 was characterized by a spring decline in salinity and included 52 sites with a high degree of cluster membership. Sites that were weakly associated with either cluster (45-55% membership to either cluster 1 or 2; n = 25) were characterized by the aseasonality of salinity. Sites with less than 90% membership to either cluster were common (n = 150; 62% of sites). Sites with high summer-fall peak salinity regime membership tended to be located along the Pacific Coast and Gulf Coast regions, while sites with a spring decline regime were generally located along rivers draining the Rocky Mountains and Sierra Nevada Mountain regions (Fig. 1b). Sites with aseasonal salinity regimes were distributed throughout the gage locations.
Summer-fall peak (cluster 1) and spring decline (cluster 2) regimes had similar random forest model performance, with a root mean squared error of 0.11 for the training data set and 0.25 for the test data set for both models (Fig. S3). Since membership to one cluster is the inverse of membership to the other, we only maintained the spring decline model, which had slightly better prediction performance for the testing dataset (adjusted R² = 0.47; Fig. S3). The top predictors of regime cluster membership, each with a greater than 10% increase in mean squared error if removed from the model, were annual precipitation, soil salinity, stream slope, latitude, and the percentage of lithological magnesium oxide (MgO) in surface or near-surface geology (Fig. 2). Probability of site membership to the summer-fall peak regime increased above an average annual precipitation threshold of 1000 mm yr⁻¹, an average stream slope of 2%, and percent lithological MgO of approximately 6% (Fig. S4). Probability of spring decline regime membership increased between latitudes of 35-45°N and above an accumulated mean soil salinity of zero (Fig. S4).
Overall, the majority of sites exhibited solute export regimes representative of chemostatic or solute dilution behavior (Fig. 3), given the range in β (−0.80 to 0.29) and CV_S : CV_Q (0.01-1.19) across sites. Few sites (n = 20) had positive β values (mobilization-driven behavior), a majority of which were spring decline regimes, and one site with an aseasonal salinity regime had CV_S : CV_Q > 1, indicative of chemodynamic solute export behavior (USGS-09250400). Among sites with a high degree of cluster membership (sites with > 90% membership), the median and within-cluster variation in CV_S : CV_Q were highest for the spring decline regime, which also had a more negative median β, indicative of dilution-driven behavior (Fig. 3).
Extrapolated predictions of stream regime cluster membership by HUC 2 basin revealed varying proportions of dominant salinity regimes across the central and western United States (Fig. 4). Almost no streams were predicted to have strong cluster membership (< 0.1% of reaches per HUC 2 basin). Approximately half (> 45%) of streams within the majority of HUC 2 basins were predicted to have moderate summer-fall peak regime cluster membership (55-90% cluster 1), except for the Missouri, Pacific Northwest, and Texas-Gulf HUC regions, in which the spring decline regime was more prevalent.

Fig. 4. The proportion of river reaches per USGS 2-digit hydrologic unit (HUC 2 basin) predicted to have a certain degree of cluster membership (n = 980,281). Very few to no river reaches (< 0.1% of reaches per HUC 2 basin) are predicted to have strong membership (> 90%) to either cluster; therefore, their proportions are not visible in each pie chart.

Salinity regime classification
We identified two divergent annual salinity regimes, characterized by either an annual summer-fall peak or a spring decline, from 242 sites in streams and rivers across the central and western United States using daily time series of salinity estimated from SC time series. These two river salinity regimes differ in the timing and duration of peak salinity within a year and provide insight into the seasonal dynamics of salinity by identifying time periods when salinity is elevated (Fig. 1). One extreme of cluster membership was characterized by a summer-fall peak in salinity, indicative of a regime dominated by low flows in the late summer and fall. The opposite extreme of cluster membership was characterized by a spring decline in salinity, indicative of a regime dominated by spring snowmelt. Sites with weak membership to either cluster (45-55% membership) were characterized by aseasonal patterns in salinity. We did not find strong evidence of winter and spring peaks driven by the application of de-icing salts to roadways (Kaushal et al. 2018), as is commonly observed in streams in the eastern and midwestern United States.
Summer-fall peak salinity regimes occurred along the Pacific Coast and Gulf Coast regions, and spring decline salinity regimes were generally observed in rivers draining the Rocky Mountains and Sierra Nevada Mountain regions (Fig. 1b). This spatial distribution of regime extremes reflects the dominant climate in these regions. The Pacific Coast experiences a Mediterranean climate with extended periods of low flows that can concentrate solutes in the summer and fall (Ahearn et al. 2004). Saline waters from irrigation return flows (Vengosh 2003) may also elevate salinity during the summer and fall. Rivers and streams draining mountain ranges with deep snowpacks (e.g., Rocky Mountains) are diluted during snowmelt in the spring (Campbell et al. 1995; Poff 1996). Time series with aseasonal (constant) salinity suggest a regime lacking strong seasonal variation in flow, possibly because of high stream-groundwater connectivity (Hare et al. 2021).

Controls on salinity regimes in western United States streams
Catchment and climate characteristics both shaped annual salinity signals in rivers across the central and western United States (Fig. 2). Average annual precipitation, soil salinity, stream slope, the latitude of the monitoring location, and the percentage of lithological MgO were the top five predictors of cluster membership. We found that predicted spring decline regime membership was highest in sites within a 35-45°N latitude range, which encompasses the Rocky Mountains and Sierra Nevada Mountain regions where snowmelt strongly influences streamflow (Figs. 1b, S4). We did not expect the threshold increase in predicted summer-fall peak regime membership past an average annual precipitation of 1000 mm yr⁻¹ (Fig. S4), yet this likely reflects the influence of streams in the coastal Pacific Northwest, where wet winters, dry summers, and a limited snowpack lead to low flows with high salinity in the summer and fall despite high annual precipitation amounts. Both soil salinity and lithological MgO represent variation in natural sources of catchment salinity, while the increase in summer-fall peak regime membership above a 2% stream slope likely reflects the sensitivity of small, steep catchments to seasonal drying that drives evapo-concentration of solutes. We expected anthropogenic influences such as land use and dams to be important predictors of regime type, yet land use was of limited variable importance or was excluded from the final random forest models following feature selection.
Salinity exhibited less variability than discharge in all but one USGS site in this study (USGS-09250400), which included data from before salinity-control efforts implemented in the 1980s reduced salinity in the upper Colorado River Basin (Rumsey et al. 2021; Fig. 3). Almost all locations exhibited chemostatic or diluting salinity behavior. These solute export behaviors are consistent with previous studies that found similar negative C-Q slopes (β) or relative coefficients of variation (CV_S : CV_Q) for SC (Godsey et al. 2019; Diamond and Cohen 2018). Sites exhibiting a moderate to strong diluting behavior had high cluster membership in the spring decline regime, as expected for sites receiving low salinity snowmelt. Most sites with high membership in the summer-fall peak regime exhibited chemostatic or weakly diluting behavior. Our investigation of solute export behavior is limited by our use of salinity converted from SC, which does not provide insights into the behavior of individual solutes. Continental scale catchment solute export behavior is dominated by chemostasis (Godsey et al. 2009), yet whether the export behavior of individual solutes diverges with different salinity regimes, and whether those individual solute-salinity relationships are consistent across different regions with variable lithology, is unclear and warrants further investigation.
We extrapolated predictions of salinity regime types to stream reaches beyond our testing and training datasets and found only a negligible number (< 0.1% of reaches) of streams predicted to have strong cluster membership (> 90%) to either the spring decline or summer-fall peak regime. We likely underpredict salinity regime extremes given the more limited range of catchment attributes encompassed by our relatively small initial sample size (n = 237) compared to the number of stream reaches we extrapolated to (n = 980,281). However, variation among HUC 2 basins in the relative proportions of moderate spring decline and moderate summer-fall peak cluster membership reflects the diverse landscapes spanning the central and western United States that lead to variation in controls on the timing and duration of elevated salinity.

Implications
The classification of river and stream salinity regimes provides a tool for monitoring the impact of climate and land use change on the delivery of salts to rivers and streams. This is a pertinent task when considering the timing of peak salinity relative to the timing of other stressors in aquatic ecosystems (e.g., temperature), which can have additive, antagonistic, or synergistic effects on organism fitness and survival depending on the stressor (Velasco et al. 2019). In this study, we focused on the central and western United States, where land use and climate change are predicted to account for 55% and 12%, respectively, of projected increases in river salinity (Olson 2019). Although land use was not a strong predictor of river salinity regime types across the central and western U.S. region based on historical data in this study, the timing of peak salinity may shift along with mean concentrations in response to changes in climate, specifically the seasonality and form of precipitation. Flow-adjustment of salinity in our preliminary analyses did not change the number or types of regimes we identified, yet further investigation into controls on flow-adjusted salinity regimes may provide additional insight into the relative influence of hydroclimatic vs. land use change as drivers of salinity regimes. Understanding how mean conditions and the characteristic timing and variability in peak salinity are changing can inform efforts to mitigate the detrimental effects of freshwater salinization in response to climatic and anthropogenic change.