## 1. Introduction

[2] A wide variety of hydrologic applications (e.g., management of dams and flood control planning) require knowledge of daily flows in ungauged stream reaches. Accurate interpretation of many stream ecological and water quality characteristics can also depend on quantifying flow at the time of sampling. For example, stream nutrient concentrations, such as total phosphorus, are strongly dependent on flow magnitude. High-velocity flows observed during periods of bankfull and larger floods include riparian and general overland flows that carry nutrient-laden sediments into the stream channel, which, in turn, greatly increase nutrient concentrations. Conversely, during periods of low flow, streamflow often originates from subsurface sources that have comparatively lower total phosphorus concentrations [*Banner et al*., 2009]. Hence, nutrient concentrations observed in individual samples are best interpreted in the context of the flow conditions at the time of water quality sampling. Similarly, the amount of algal accumulation on stream substrates depends strongly on the flow history prior to the time of sampling [*Biggs*, 2000].

[3] Flow conditions in different streams vary strongly in both space and time because of differences in the location and timing of precipitation events, differences in the hydrological characteristics of the stream network, and differences in physical characteristics of different watersheds. Thus, predicting daily streamflow in ungauged basins has presented a long-standing analytical and conceptual challenge [*Sivapalan*, 2003]. Predictions of particular flow statistics (e.g., 100 year flood, base flow) in ungauged basins have been based on statistical models that relate catchment characteristics to the flow statistic of interest [*Santhi et al*., 2008; *Haddad et al*., 2012], and prediction approaches based on spatial statistical analyses have also been proposed [*Skøien et al*., 2006; *Skøien and Blöschl*, 2007; *Castiglioni et al*., 2009]. However, when predicting historical daily flows, the flow on each day is inherently a different flow statistic, and hence, in theory, one must calibrate a different model for each daily flow one wishes to predict.

[4] Other approaches more efficiently predict flows on many different days by requiring that one select one or more “index gauges” for each ungauged location. Index gauges are gauges at which the timing of different flow events is assumed to be the same as that of the ungauged location. Then, one computes daily flows by combining the timing information with an estimate of the relative magnitude of the flow in the ungauged basin. For example, one commonly used approach for predicting daily flows assumes that the timing of flow in an ungauged basin is the same as that in the gauged basin, and that the ratio of the magnitude of flows in the two locations is equivalent to the ratio of the basin areas [*Hirsch*, 1979].
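The drainage-area ratio approach described above can be illustrated with a minimal sketch. The function name, the basin areas, and the flow values below are hypothetical and chosen only for illustration; the sketch assumes a single index gauge and a flow ratio exactly equal to the ratio of basin areas.

```python
def drainage_area_ratio(index_flows, index_area, ungauged_area):
    """Estimate daily flows at an ungauged site from one index gauge.

    Assumes the timing of flow matches the index gauge and that the ratio
    of flow magnitudes equals the ratio of basin areas.
    """
    if index_area <= 0 or ungauged_area <= 0:
        raise ValueError("basin areas must be positive")
    ratio = ungauged_area / index_area
    return [q * ratio for q in index_flows]


# Hypothetical values: index basin of 200 km^2, ungauged basin of 50 km^2.
index_flows = [12.0, 30.0, 18.0]  # made-up daily flows at the index gauge (m^3/s)
estimated = drainage_area_ratio(index_flows, index_area=200.0, ungauged_area=50.0)
```

Here each daily flow at the index gauge is multiplied by 50/200 = 0.25, so the estimated series keeps the timing of the index gauge and changes only its magnitude.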

[5] Index gauges have historically been selected based on geographic proximity to the location of interest and on best professional judgment, but different mathematical and statistical approaches for selecting index gauges have recently been proposed. For example, recent studies have demonstrated that the use of multiple index gauges can improve the accuracy of flow predictions [*Smakhtin*, 1999; *Zhang and Kroll*, 2007; *Shu and Ouarda*, 2012]. Further improvements in predictions were observed when the contributions of different gauges were weighted by the distance between index gauges and the ungauged site or weighted by the degree to which certain preselected physical characteristics of the gauged basin were similar to those of the ungauged basin [*Shu and Ouarda*, 2012].
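One way to picture the distance-weighted use of multiple index gauges is the sketch below. It assumes inverse-distance weights, which is only one possible weighting scheme and is not necessarily the one used in the cited studies; the function names, distances, and flow estimates are hypothetical.

```python
def inverse_distance_weights(distances, power=1.0):
    """Return weights proportional to 1 / distance**power, normalized to sum to 1."""
    if any(d <= 0 for d in distances):
        raise ValueError("distances must be positive")
    raw = [1.0 / d ** power for d in distances]
    total = sum(raw)
    return [w / total for w in raw]


def combine_estimates(estimates, weights):
    """Weighted average, day by day, of per-gauge flow estimates of equal length."""
    n_days = len(estimates[0])
    return [
        sum(w * est[day] for w, est in zip(weights, estimates))
        for day in range(n_days)
    ]


# Hypothetical example: two index gauges located 10 km and 30 km from the site.
weights = inverse_distance_weights([10.0, 30.0])
per_gauge_estimates = [[4.0, 8.0], [8.0, 4.0]]  # made-up estimates from each gauge
combined = combine_estimates(per_gauge_estimates, weights)
```

With these distances the nearer gauge receives a weight of about 0.75 and the farther gauge about 0.25, so the combined estimate leans toward the nearer gauge. Weights based on similarity of basin characteristics could be substituted for the distance-based weights without changing `combine_estimates`.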

[6] An alternate approach for selecting index gauges is based on the idea that the gauges at which flows are most strongly correlated with the daily flows at the ungauged site should be selected as index gauges. The application of this approach is most easily conceptualized when a discrete or partial flow record is available at the ungauged site. Then, one calculates the correlation between flows at the ungauged site and at each candidate gauged site using just the available flow data and selects as the index gauge the one that exhibits the strongest correlation [*Reilly and Kroll*, 2003; *Eng et al*., 2011]. One method for predicting the strength of correlation at sites for which no flow data are available has recently been proposed [*Archfield and Vogel*, 2010]. In this approach, spatial interpolation is used to predict the expected correlation between daily flows at the ungauged site and flows at a candidate index site. This interpolation is repeated for all available candidate index sites, and the candidate with the largest predicted correlation coefficient is selected as the final index site.
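For the case in which a partial flow record exists at the site of interest, the selection step can be sketched as follows. This is a minimal illustration, assuming Pearson correlation on untransformed flows and a hypothetical minimum of three overlapping days; the gauge identifiers and flow values are made up, and the cited studies may differ in these details.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return sxy / math.sqrt(sxx * syy)


def select_index_gauge(partial_record, candidates, min_overlap=3):
    """Pick the candidate gauge most strongly correlated with the partial record.

    partial_record: {day: flow} at the site with sparse measurements.
    candidates: {gauge_id: {day: flow}} for the continuously gauged sites.
    """
    scores = {}
    for gauge_id, series in candidates.items():
        # Use only the days on which both sites have a flow value.
        days = sorted(d for d in partial_record if d in series)
        if len(days) < min_overlap:
            continue
        scores[gauge_id] = pearson(
            [partial_record[d] for d in days], [series[d] for d in days]
        )
    if not scores:
        raise ValueError("no candidate gauge has enough overlapping days")
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical data: four spot measurements and two candidate gauges.
partial = {1: 2.0, 5: 6.0, 9: 3.0, 14: 8.0}
candidates = {
    "A": {1: 4.1, 2: 5.0, 5: 11.8, 9: 6.2, 14: 16.5},
    "B": {1: 9.0, 5: 7.0, 9: 10.0, 14: 8.5},
}
best, scores = select_index_gauge(partial, candidates)
```

In this made-up example, gauge "A" rises and falls with the spot measurements while gauge "B" moves in the opposite direction, so "A" is selected.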

[7] Several questions arise that are directly connected with the use of the strength of correlation of daily flows to select index gauges. First, what is the minimum degree to which daily flows at index gauges must be correlated with flows at the ungauged site to ensure that the index gauge provides useful information regarding the flow timing? Second, how many index gauges should one select? Third, and perhaps most important, how can one best select an index gauge for an ungauged site when partial flow data are not available? Here I analyze historical daily flows recorded at gauges in the Ohio River Valley, USA, to answer the first two questions. I then describe a novel method for identifying index gauges that uses statistical models to predict, from the basin physical characteristics and the site location, the strength with which daily flows at an ungauged site are correlated with flows at the available gauges. Based on this prediction, the best index gauges can be selected.