Quantifying hydrologic alteration in an area lacking current reference conditions—The Mississippi alluvial plain of the south‐central United States

Quantifying hydrologic alteration in the Mississippi Alluvial Plain (MAP) of the south‐central United States is particularly difficult because of the lack of current reference, or even relatively undisturbed, streams and associated streamflow data. Impacts, such as water withdrawals for agriculture, weirs, dams, channelization, and other forms of regulation, within the MAP increased substantially beginning around 1960, suggesting that streamflow has since been altered. Using historical streamflow and climate data and explanatory variables, the U.S. Geological Survey developed random forest regression models to estimate expected reference monthly streamflows (pre‐1960) at 76 sites in the MAP and two adjacent Level III Ecoregions. To compensate for the lack of current reference stream sites in the study area, the pre‐1960 streamflow data were used as a surrogate to estimate current streamflow conditions without anthropogenic influence (inferring current reference conditions). Overall, nearly every site within the study area had fewer zero‐flow days than have been observed historically, and there were more low‐pulse spells. However, the frequency of floods remained relatively consistent.

There are some geographic areas, however, where there are few or no streamflow data or where human disturbances are so pervasive that reference conditions can only be found in the historical streamflow record. Minihane (2012) used index-gage methods, a hydrologic model, and in-situ observations to estimate a 10-year historic baseline of mean monthly streamflows in the Rovuma River, an undeveloped stream with few streamflow observations, in southern Africa. In developed or altered streams, a variety of methods have been used to establish reference streamflow conditions using historical (pre-alteration) streamflow records (Henriksen, Heasley, Kennen, & Nieswand, 2006; Richter et al., 1997) so that the degree of alteration can be identified. The most common approach is to compare streamflow regimes between unimpacted and impacted time periods (Gao, Vogel, Kroll, Poff, & Olden, 2009). Candela, Tamoh, Olivares, and Gómez (2016) conducted a study in the Mediterranean basin and defined the baseline, historical period as 1984-2008 to compare streamflows with future climate and land-use change scenarios.
The Mississippi Alluvial Plain (MAP) of Arkansas, Louisiana, Mississippi, Missouri, and Tennessee (Figure 1) has been substantially modified by humans to facilitate agricultural production in the fertile floodplain of the Mississippi River. Land levelling, channelization, and irrigation withdrawals from surface-water and groundwater resources have changed the streamflow regime. For example, irrigation withdrawals from groundwater, and to a lesser extent surface water, in the MAP began prior to the 1920s. Compared with current (2014) conditions, groundwater withdrawals (Figure 2) were relatively low from the 1920s to about 1960 (Clark et al., 2011). Since 1960, withdrawals have increased and there have been subsequent observed decreases in streamflow and local areas of streamflow depletion within the MAP.
To identify the degree of hydrologic alteration of streams in the MAP, we used random forest (RF) regression methods (Breiman, 2001) to model the relation between six selected streamflow characteristics and explanatory variables (such as drainage area, precipitation, soils, and other watershed characteristics). RF models were chosen for this study because they have been shown to be more robust and accurate than traditional linear regression models (Carlisle, Falcone, et al., 2010; Lawler, White, Neilson, & Blaustein, 2006; Prasad, Iverson, & Liaw, 2006; Cutler et al., 2007). Expected monthly mean streamflow was estimated for a reference period (pre-1960) using daily streamflow data, climate data, and explanatory variables at 76 sites within the MAP and two adjacent Level III Ecoregions (study area). The reference period RF model was then used to estimate streamflow characteristics (flood frequency, high-flow duration, number of zero-flow days, frequency of low-pulse spells, and high flow-discharge index) and expected monthly streamflow for the 76 sites where post-1960 data were available. We discuss our results by comparing observed values with the expected values estimated by the RF models to identify stream sites that are altered within the study area. These analyses and considerations are crucial to better understand the effects of hydrologic alteration that impact human and biological water resource needs.
FIGURE 1 Extent of the Mississippi Alluvial Plain and two surrounding Level III Ecoregions (Omernik, 2004) and select U.S. Geological Survey streamflow-gaging stations used to quantify hydrologic alteration

| Site selection
Streamflow-gaging stations with at least 5 years of continuous data prior to 1960 were considered candidate reference stations for this study (Figure 1). Cursory selection of stations with at least 5 years of data prior to 1960 within the MAP Level III Ecoregion (Omernik, 2004) yielded 52 stations (excluding the Mississippi River). Upon further investigation, stations with "chute" or "ditch" in their respective USGS station identifier names were removed; other stations were subsequently removed based upon review of historical literature, aerial photography, and topographic maps for the presence of anthropogenic influences, such as reservoirs, dams, weirs, and major channelization that occurred before 1960. Additionally, some stations were removed because much of their watershed was outside the boundary of the MAP. These reviews ultimately removed 37 stations, leaving only 15 stations within the MAP, which was deemed insufficient for the analyses in this study. To improve accuracy and minimize error of the regression models used to estimate expected streamflow, additional stations were needed. Thus, we extended the study area to adjacent Level III Ecoregions (the Mississippi Valley Loess Plains and the South Central Plains) to include additional stations having similar topographies and correlations of precipitation and evapotranspiration as those of the MAP (Wolock, 2003). After following the same review process used for selecting the original 15 stations within the MAP, 61 additional stations were included from adjacent ecoregions for a total of 76 potential stations (Figure 1; Table 1).
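The two automated screens described above (record length and station name) can be illustrated with a minimal sketch. The station records, field names, and helper functions below are hypothetical and are not taken from the study; the manual review of literature, aerial photography, and maps is not represented, and "continuous" is interpreted here as consecutive calendar years, which is an assumption.

```python
from dataclasses import dataclass

CUTOFF_YEAR = 1960                   # start of the altered period, per the text
MIN_YEARS = 5                        # minimum continuous pre-1960 record, per the text
EXCLUDED_WORDS = ("chute", "ditch")  # station-name screen, per the text


@dataclass
class Station:
    station_id: str   # hypothetical identifier
    name: str
    years: list       # years with a complete daily record (hypothetical field)


def longest_pre_cutoff_run(years):
    """Length of the longest run of consecutive years before the cutoff year."""
    best = run = 0
    previous = None
    for year in sorted({y for y in years if y < CUTOFF_YEAR}):
        run = run + 1 if previous is not None and year == previous + 1 else 1
        best = max(best, run)
        previous = year
    return best


def is_candidate(station):
    """Apply only the station-name and record-length screens."""
    lowered = station.name.lower()
    if any(word in lowered for word in EXCLUDED_WORDS):
        return False
    return longest_pre_cutoff_run(station.years) >= MIN_YEARS


# Hypothetical stations used only to exercise the screens.
stations = [
    Station("A", "Example River near Town", list(range(1948, 1975))),
    Station("B", "Example Ditch at Farm", list(range(1940, 1970))),
    Station("C", "Sample Creek at Bridge", list(range(1957, 1990))),
]
candidates = [s.station_id for s in stations if is_candidate(s)]
print(candidates)
```

In this sketch, station B fails the name screen and station C has only 3 years of record before 1960, so only station A remains a candidate.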

| Estimation of expected values of streamflow and streamflow characteristics using RFs
RF regression methods (Kuhn & Johnson, 2016) were used to model relations between explanatory variables (such as climate, topography, soils, geology, and other watershed characteristics) and selected streamflow characteristics to produce a model that calculated expected monthly mean streamflow from daily mean streamflow for the reference period (preregulation or pre-1960) and the current (post-1960) period. For a detailed description of RF, see Cutler et al. (2007) and Liaw and Wiener (2002). For this study, R statistical software (R Development Core Team, 2014) was used to complete RF regressions using the "randomForest" package for R (Liaw & Wiener, 2002).
Observed mean monthly streamflow was calculated for every site for every month (for example, 10-1-1938 is the average flow for the    Table 2). These are common indices used to represent biologically relevant streamflow attributes and are considered suitable for all stream types, except for fh11, which is considered suitable for superstable or stable groundwater perennial streams (Olden & Poff, 2003).
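As an illustration of how daily mean streamflow can be reduced to the quantities discussed here, the following minimal sketch computes monthly mean streamflow and counts zero-flow days. The function names and the daily record are hypothetical, and grouping zero-flow days by calendar year rather than water year is an assumption; the other indices in Table 2 are not sketched.

```python
from collections import defaultdict
from datetime import date, timedelta


def monthly_means(daily):
    """Average the daily mean flows within each (year, month)."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for day, flow in daily.items():
        key = (day.year, day.month)
        sums[key] += flow
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def zero_flow_days_per_year(daily):
    """Count the days with zero flow in each calendar year (assumed grouping)."""
    counts = defaultdict(int)
    for day, flow in daily.items():
        counts[day.year] += 1 if flow == 0 else 0
    return dict(counts)


# Hypothetical record for October 1938: flow of 10 on 30 days, zero on the last day.
start = date(1938, 10, 1)
daily = {start + timedelta(days=i): 10.0 for i in range(30)}
daily[date(1938, 10, 31)] = 0.0

means = monthly_means(daily)
zeros = zero_flow_days_per_year(daily)
print(means[(1938, 10)], zeros[1938])
```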

| Explanatory variables
Explanatory variables pertained to watershed size and geometry, meteorology (precipitation and temperature), geology, and soils.
Watershed geometry and size were determined for this study using a geographic information system. Climate data were specific to the month and watershed for sites used in the regression analysis and were obtained from the Parameter-elevation Relationships on Independent Slopes Model dataset (PRISM; Daly et al., 2008). For each month of record for a station, mean monthly values of precipitation and of minimum and maximum temperature were determined from the PRISM grids. Using the extracted climatic data, evapotranspiration for the watershed was calculated using the Hargreaves method (Droogers & Allen, 2002; Hargreaves, 1994). In addition, monthly precipitation and minimum and maximum air temperature were extracted for the previous 1 to 6 months. Evapotranspiration was then computed for the previous 1 to 6 months from the  Reed and Bush (2005). Topographic data were obtained from the National Elevation Dataset (U.S. Geological Survey, 2014b) and included the mean, minimum, maximum, standard deviation, and range for slope and elevation. Explanatory variable data used as input for the RF models and the estimated values for streamflow and the streamflow characteristics that support the findings in this paper are available from Hart and Breaker (2018).
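A minimal sketch of the climate-derived variables follows. It assumes the widely used temperature-based form of the Hargreaves equation, ET0 = 0.0023 Ra (Tmean + 17.8) (Tmax - Tmin)^0.5, with extraterrestrial radiation Ra supplied by the user in millimeters per day of equivalent evaporation; the specific variant applied in this study is not stated here, so the formula choice, function names, and input values are assumptions for illustration only.

```python
import math


def hargreaves_et0(tmin_c, tmax_c, ra_mm_day):
    """Reference evapotranspiration (mm/day) from the assumed Hargreaves form.

    tmin_c, tmax_c: minimum and maximum air temperature in degrees Celsius.
    ra_mm_day: extraterrestrial radiation as equivalent evaporation (mm/day).
    """
    if tmax_c < tmin_c:
        raise ValueError("tmax_c must be greater than or equal to tmin_c")
    tmean = (tmax_c + tmin_c) / 2.0
    return 0.0023 * ra_mm_day * (tmean + 17.8) * math.sqrt(tmax_c - tmin_c)


def lagged(series, index, max_lag=6):
    """Values for the previous 1..max_lag months; None where no record exists."""
    return [series[index - k] if index - k >= 0 else None
            for k in range(1, max_lag + 1)]


# Hypothetical monthly inputs for one watershed.
et0 = hargreaves_et0(tmin_c=10.0, tmax_c=26.0, ra_mm_day=15.0)
precip = [50.0, 60.0, 70.0, 80.0]          # monthly precipitation, oldest first
previous_precip = lagged(precip, index=3)  # lags 1 to 6 for the fourth month
print(round(et0, 4), previous_precip)
```

A monthly total would follow from multiplying the daily rate by the number of days in the month. The same `lagged` helper could be applied to temperature or evapotranspiration series.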

| Model evaluation and assessment of hydrologic alteration
To select the best number of trees and splits for the final RF model, a "leave-one-out" validation process was used. The process involved iterative runs of the RF model using specific values for number of trees and number of splits with the data from an individual station removed. After each iteration, the model was evaluated against data from the station left out of a specific run, and the root mean square error (RMSE) of the observed reference period (O_R) and expected (or predicted) reference period (E_R) data were recorded. This process was iterated through all stations, and the resulting values of RMSE were used to select the number of trees and splits for the final model.
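The leave-one-station-out procedure can be sketched as a loop over stations nested inside a loop over candidate settings. To keep the sketch self-contained, a simple nearest-neighbour averaging model stands in for the random forest, and its single setting k stands in for the number of trees and splits; the stand-in model, the data, and all names are hypothetical. Averaging the held-out RMSE across stations to rank settings is also an assumption about how the RMSE values were summarized.

```python
import math
from statistics import mean


def rmse(observed, expected):
    """Root mean square error between paired observed and expected values."""
    return math.sqrt(mean([(o - e) ** 2 for o, e in zip(observed, expected)]))


def knn_predict(train, x, k):
    """Stand-in model (NOT a random forest): mean response of the k nearest rows."""
    nearest = sorted(train, key=lambda row: abs(row["x"] - x))[:k]
    return mean([row["y"] for row in nearest])


def leave_one_station_out(rows, k):
    """Mean held-out RMSE across stations for one candidate setting."""
    scores = []
    for held_out in sorted({row["station"] for row in rows}):
        train = [r for r in rows if r["station"] != held_out]
        test = [r for r in rows if r["station"] == held_out]
        predicted = [knn_predict(train, r["x"], k) for r in test]
        scores.append(rmse([r["y"] for r in test], predicted))
    return mean(scores)


# Hypothetical data: response y = 2x at four stations with two records each.
rows = [
    {"station": s, "x": float(x), "y": 2.0 * x}
    for s, xs in {"A": (1, 2), "B": (3, 4), "C": (5, 6), "D": (7, 8)}.items()
    for x in xs
]
grid = [1, 2, 4]  # candidate settings for the stand-in model
scores = {k: leave_one_station_out(rows, k) for k in grid}
best_k = min(scores, key=scores.get)
print(best_k, scores)
```

Holding out all records from a station at once, rather than single records, keeps data from the evaluated station out of the training set in every run.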
Performance metrics used to evaluate monthly streamflow included the weighted correlation coefficient (R²; Tables 1 and 2), normalized RMSE (NRMSE), and percent bias (PBIAS; Moriasi et al., 2007). Performance metrics were computed for all regression equations for all values of O_R/E_R independently for monthly mean streamflow and for the entire reference period for fh11, dh20, dl18, fl3, and mh16 (Table 2). A value of O_R/E_R near 1 indicated that the RF regression was a good predictor of streamflow statistics for the expected post-1960 monthly mean streamflow (E_PR). The resulting outputs from the RF regression model provided a reasonable comparison of expected to observed monthly mean streamflow (median value of 0.85) for sites in the study area using current climate data (Table 1).
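Two of the performance metrics can be written compactly, as in the minimal sketch below. PBIAS follows the form given by Moriasi et al. (2007), 100 × Σ(observed − expected) / Σ(observed), in which positive values indicate that the model underestimates. Normalizing RMSE by the mean of the observed values is an assumption, because the normalization used in this study is not stated here; the input values are hypothetical.

```python
import math


def nrmse(observed, expected):
    """RMSE divided by the mean observed value (normalization is an assumption)."""
    n = len(observed)
    rmse = math.sqrt(sum((o - e) ** 2 for o, e in zip(observed, expected)) / n)
    return rmse / (sum(observed) / n)


def pbias(observed, expected):
    """Percent bias: 100 * sum(O - E) / sum(O); positive means underestimation."""
    return 100.0 * sum(o - e for o, e in zip(observed, expected)) / sum(observed)


# Hypothetical observed and expected monthly mean streamflows.
observed = [10.0, 20.0, 30.0]
expected = [12.0, 18.0, 33.0]
print(nrmse(observed, expected), pbias(observed, expected))
```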

| RESULTS AND DISCUSSION
For most streams, the monthly ratio of O_PR/E_PR was less than 0.8, which indicated less streamflow was measured in the streams than what was currently expected (Table 1). We then chose to examine a range of hydrologic alteration representing a high (April) and a low (September) streamflow period (Figure 3). Observed monthly mean streamflows for April were shown to deviate from model predictions at a rate that increased to the end of the reference period (Figure 3a) for all stations. Station 07368000 (Site No. 47 in Table 1 and  (Figure 3c). This coincides with the time when anthropogenic influences (i.e., groundwater withdrawals in Figure 2) in the hydrologic system began to increase in the study area.
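The comparison of observed with expected post-1960 streamflow reduces to a ratio per site and month, as in the following minimal sketch. The 0.8 value is the ratio discussed above; treating it as a flagging threshold, along with the site label, streamflow values, and function names, is hypothetical and for illustration only.

```python
LOW_RATIO = 0.8  # ratio below which less flow was observed than expected, per the text


def oe_ratio(observed, expected):
    """Observed-to-expected ratio; None when the expected value is not positive."""
    if expected <= 0:
        return None
    return observed / expected


def below_expected(observed, expected, limit=LOW_RATIO):
    """True when the ratio is defined and falls below the limit."""
    ratio = oe_ratio(observed, expected)
    return ratio is not None and ratio < limit


# Hypothetical (observed, expected) monthly mean flows for one site.
monthly = {
    ("site_x", "April"): (120.0, 200.0),
    ("site_x", "September"): (45.0, 50.0),
}
flags = {key: below_expected(o, e) for key, (o, e) in monthly.items()}
print(flags)
```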
We further examined Station 07368000 Boeuf River near Girard, LA (hereafter referred to as Boeuf River-Girard; Site No. 47 in Figure 1 and Table 1)   indicating an increase in the high flow-discharge index at the respective sites relative to O_R.
For an area like the MAP, any alterations in streamflow and streamflow characteristics can be directly related to changes in land use and water use. For instance, changes in the number of zero-flow days in a stream could be a result of increases in water-use conservation and in surface runoff to streams from land-levelling practices. There could also be influence from water-capture structures like weirs and dams.

| CONCLUSIONS
Our analysis demonstrates that RF regression models can be created from historical streamflow data at undisturbed reference sites.
Although unaltered reference stations are not present in contemporary time, there were enough data available prior to landscape and water use changes (pre-1960) to calibrate a model of streamflow