Updated proxy reconstructions of water year (October–September) streamflow for four key gauges in the Upper Colorado River Basin were generated using an expanded tree ring network and longer calibration records than in previous efforts. Reconstructed gauges include the Green River at Green River, Utah; Colorado near Cisco, Utah; San Juan near Bluff, Utah; and Colorado at Lees Ferry, Arizona. The reconstructions explain 72–81% of the variance in the gauge records, and results are robust across several reconstruction approaches. Time series plots as well as results of cross-spectral analysis indicate strong spatial coherence in runoff variations across the subbasins. The Lees Ferry reconstruction suggests a higher long-term mean than previous reconstructions but strongly supports earlier findings that Colorado River allocations were based on one of the wettest periods in the past 5 centuries and that droughts more severe than any 20th to 21st century event occurred in the past.
 The Colorado River, perhaps the most important regional source of surface water supply in the western United States, was the subject of the first tree ring based effort aimed at the quantitative reconstruction of streamflow records [Stockton and Jacoby, 1976]. The reconstruction of annual flows at Lees Ferry, which reflects conditions in the entire Upper Colorado River basin (Figure 1), contained several noteworthy features. The highest sustained flows in the entire record, 1520 to 1961, occurred in the early decades of the 20th century, a period that coincides with the negotiation of the 1922 Colorado River Compact and the resulting allocation of Colorado River flows. In effect, water that was not likely to be in the river on a consistent basis was divided among the basin states. In addition, the most persistent and severe drought occurred in the late 16th century, with flows during this period much lower than for any event in the 20th century.
 Two decades later, this landmark reconstruction was the basis for a series of studies that investigated the hydrologic, social, and economic impacts of a severe sustained drought in the Colorado River basin [Young, 1995]. These studies indicated that under the current Law of the River (the set of legal compacts and regulations that govern the Colorado River), a drought like the 16th century event in Stockton and Jacoby's record would greatly challenge the capacity of the Colorado River to meet water supply needs, and have significant impacts on Compact obligations.
 Severe drought conditions in the Colorado River basin, coupled with a large increase in water use over the past two decades, have recently resulted in water demands that have outstripped natural inflows [Fulp, 2005]. Moreover, new water projects, additional management concerns such as endangered species, and large increases in population have altered the potential impacts of drought. These conditions have reinvigorated interest in reconstructions of Colorado River flow. Stockton and Jacoby's original Lees Ferry reconstruction ended in 1961, which has made it difficult to assess recent droughts in a long-term context. In addition, reconstruction methods have evolved greatly in recent decades. Hidalgo et al. have shown that features of the Stockton and Jacoby reconstruction, including relative drought severity and duration, are sensitive to modeling methodology. Thirty additional years of gauge data, new and updated tree ring collections, and improved methodologies now enable a longer and more robust reconstruction of Colorado River streamflow. The purpose of this paper is to describe and analyze a recently generated set of updated streamflow reconstructions for Lees Ferry and other key gauges in the Upper Colorado River Basin.
2. Data and Methods for Reconstructions
2.1. Streamflow Data
 We selected four gauges in the Upper Colorado River basin for reconstruction: the Green River at Green River, Utah; Colorado River near Cisco, Utah; San Juan River near Bluff, Utah; and Colorado River at Lees Ferry, Arizona. The selected gauges represent flows in the three major subbasins as well as the total flow of the Upper Colorado Basin (Figure 1). The U.S. Bureau of Reclamation provided estimates of natural flows for these locations that span the years 1906 to 1995 (J. Prairie, personal communication, 2005). These flow values have been adjusted to account for human impacts through a combination of statistical and expert system approaches, but the records may still include some anthropogenic signals. Water year (October–September) flow data in millions of cubic meters (MCM) were examined graphically and statistically to assess variability, normality and the degree of persistence in the time series (Table 1). The water year flows are essentially normal, and all display a small amount of persistence at a lag of one year. The San Juan represents a considerably more arid region than the other two basins, as evidenced by the lower mean annual flow and higher coefficient of variation. The San Juan is also the only subbasin for which the first-order autocorrelation is not significantly greater than zero.
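The descriptive statistics discussed above (mean, coefficient of variation, and first-order autocorrelation) can be illustrated with a minimal sketch. The function name `describe_flows` and the sample flows are hypothetical and are not the gauge data; the estimator choices (sample standard deviation, autocorrelation normalized by the full sum of squares) are assumptions, not necessarily those used for Table 1.

```python
import math

def describe_flows(flows):
    """Return mean, sample standard deviation, CV, and lag-1 autocorrelation."""
    n = len(flows)
    mean = sum(flows) / n
    dev = [x - mean for x in flows]
    ss = sum(d * d for d in dev)
    std = math.sqrt(ss / (n - 1))
    # Lag-1 autocorrelation: lagged cross products over the total sum of squares
    r1 = sum(dev[t] * dev[t + 1] for t in range(n - 1)) / ss
    return {"mean": mean, "std": std, "cv": std / mean, "r1": r1}

# Hypothetical water year flows in MCM (illustrative values only)
flows = [18000.0, 21000.0, 15000.0, 12000.0, 19000.0, 23000.0, 17000.0, 14000.0]
stats = describe_flows(flows)
```

A higher `cv` together with a lower `mean`, as reported for the San Juan, is the pattern expected for a more arid subbasin.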
Table 1. Metadata and Descriptive Statistics of Annual Flows
2.2. Tree Ring Data
 In much of the western United States, tree ring widths can provide a proxy for gauge records because the same climatic factors, primarily precipitation and evapotranspiration, control both the growth of moisture-limited trees and processes related to streamflow [Meko et al., 1995]. Recent collections of new tree ring data and efforts to update older collections have produced a set of 62 moisture-sensitive tree ring chronologies in Colorado, southwestern Wyoming, and northeastern Utah that span the common interval from 1600 to 1997 (Figure 1 and Supplementary Data 1 in the online data set at http://www.ncdc.noaa.gov/paleo/pubs/woodhouse2006/woodhouse2006.html). Of the 62 chronologies, 17 are from ponderosa pine (Pinus ponderosa), 21 from Douglas fir (Pseudotsuga menziesii), 21 from pinyon pine (Pinus edulis), and three from limber pine (Pinus flexilis). Fifteen or more trees were typically sampled at each site using an increment borer and taking two cores from each tree. In the lab, cores were processed, crossdated, and measured using standard dendrochronological techniques [Stokes and Smiley, 1968; Swetnam et al., 1985]. All ring width series were uniformly processed using the ARSTAN program as follows [Cook, 1985]. Measured series were standardized using conservative detrending methods (negative exponential/straight line fit or a cubic spline two thirds the length of the series) before using a robust weighted mean to combine all series into a single site chronology [Cook et al., 1990]. Low-order autocorrelation in the chronologies that may, in part, be attributed to biological factors [Fritts, 1976] was removed, and the resulting residual chronologies were used in most of the subsequent analyses. However, the low-order autocorrelation in the gauge records was closely matched by persistence in the tree ring data. Consequently, the sensitivity to persistence in the tree ring data was tested in the Lees Ferry reconstruction by generating reconstruction models using both the standard (persistence retained) and prewhitened (persistence removed) chronologies. Because the number of series in these chronologies decreases with time, chronologies in the resulting reconstruction models were assessed with regard to subsample signal strength [Wigley et al., 1984].
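The difference between a standard and a prewhitened (residual) chronology can be sketched as follows. This is a simplified illustration that assumes a first-order autoregressive model; the order actually fitted by ARSTAN may be higher, and the function name `prewhiten_ar1` and the sample values are hypothetical.

```python
def prewhiten_ar1(chronology):
    """Remove lag-1 persistence from a chronology with a fitted AR(1) model.

    Returns the estimated AR(1) coefficient and the residual series,
    which is one value shorter than the input.
    """
    n = len(chronology)
    mean = sum(chronology) / n
    dev = [x - mean for x in chronology]
    # The lag-1 autocorrelation serves as the AR(1) coefficient estimate
    phi = sum(dev[t] * dev[t - 1] for t in range(1, n)) / sum(d * d for d in dev)
    # Subtract the part of each departure predicted from the previous year
    residual = [mean + dev[t] - phi * dev[t - 1] for t in range(1, n)]
    return phi, residual

# Hypothetical standard chronology (dimensionless ring width indices)
standard = [1.10, 1.25, 0.95, 0.70, 0.80, 1.05, 1.30, 1.15, 0.90, 0.85]
phi, resid = prewhiten_ar1(standard)
```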
 Statistical analyses support the high quality and suitability of these chronologies for hydroclimatic reconstructions (Supplementary Data 1). The mean interseries correlation within each chronology averages 0.79, and mean sensitivity (average relative ring width difference from one ring to the next [Fritts, 1976]) averages 0.41. These statistics indicate the strong common signal between the trees that make up each chronology and the high degree of variability in ring widths from one year to the next. Both characteristics are consistent with strong tree ring sensitivity to climatic variability [Cook and Briffa, 1990].
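Mean sensitivity, as defined above, is the average relative difference between adjacent ring widths. A minimal sketch of the conventional formula follows; the function name `mean_sensitivity` and the sample widths are illustrative assumptions.

```python
def mean_sensitivity(widths):
    """Average of |2 (x[t+1] - x[t]) / (x[t+1] + x[t])| over adjacent rings."""
    pairs = zip(widths[:-1], widths[1:])
    terms = [abs(2.0 * (b - a) / (b + a)) for a, b in pairs]
    return sum(terms) / len(terms)

# Hypothetical ring widths in millimeters
widths = [0.82, 0.45, 0.91, 0.60, 1.10, 0.38]
ms = mean_sensitivity(widths)
```

Values near zero indicate nearly uniform rings, while larger values indicate strong year-to-year variability of the kind expected in moisture-limited trees.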
2.3. Reconstruction Approaches
 Multiple linear regression, with predictors entered forward stepwise [Weisberg, 1985], was used to generate the reconstruction models. In an automated process such as stepwise regression, increasing the size of the potential predictor pool also increases the likelihood of a meaningless predictor entering the model by chance alone [Rencher and Pun, 1980]. To assess the sensitivity of the reconstruction to the size and makeup of the predictor pools, two alternative reconstruction approaches were tested for each gauge. First, the “full pool” approach used all chronologies significantly correlated (p < 0.05) with the gauge record as potential predictors. Correlations were evaluated over the entire gauge period (1906–1995) and over both early (1906–1950) and late (1951–1995) sets of years to ensure the stability of the correlation. A second approach, a “watershed-limited” approach, followed the same correlation rules, but the potential predictor set was restricted to chronologies within a 100 kilometer buffer around the watershed upstream from the gauge.
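The correlation screening used to build the predictor pools can be sketched as below. This is an illustration under stated assumptions: the critical value `t_crit = 2.0` only approximates a two-tailed p = 0.05 test for samples of about 45 to 90 years, the same-sign requirement is one possible reading of "stability of the correlation", and the names `pearson_r`, `is_significant`, and `passes_screen` are hypothetical.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def is_significant(r, n, t_crit=2.0):
    """Two-tailed t test of a correlation; t_crit near 2 approximates p = 0.05."""
    t = r * math.sqrt(n - 2) / math.sqrt(max(1e-12, 1.0 - r * r))
    return abs(t) > t_crit

def passes_screen(chron, flow, split):
    """Require a significant, same-signed correlation in the full, early, and late periods."""
    periods = [(chron, flow),
               (chron[:split], flow[:split]),
               (chron[split:], flow[split:])]
    rs = [pearson_r(c, f) for c, f in periods]
    same_sign = all(r > 0 for r in rs) or all(r < 0 for r in rs)
    return same_sign and all(is_significant(r, len(c)) for r, (c, _) in zip(rs, periods))
```

For a 1906–1995 record with the early period 1906–1950, `split` would be 45. A watershed-limited pool would apply the same screen to the subset of chronologies inside the buffer.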
 Reduction of the predictor pool by a watershed boundary constraint was not feasible for the Lees Ferry gauge, as the watershed essentially encompasses all chronologies. The approach taken for that gauge was to reduce the predictor pool by principal components analysis (PCA). After first removing chronologies uncorrelated with Lees Ferry streamflow, a PCA was run on the correlation matrix of the chronologies for their full common period of overlap. Mardia et al. [1979, p. 244] suggest that in a regression context, the components having the largest correlations with the predictand, rather than the components with the largest variances, are best suited for retention. Accordingly, only those components significantly (p<0.05) correlated with streamflow were retained in the pool of potential predictors. The resulting pool has essentially been reduced to concisely express orthogonal modes of common variation in the tree ring data. Because each component is a linear combination of all tree ring chronologies correlated with streamflow, the PCA approach is relatively robust to nonclimatic influences (e.g., disturbance, insect outbreaks) at individual sites. For the Lees Ferry reconstruction, model sensitivity to the use of the standard versus the prewhitened chronologies was tested for both the non-PCA and PCA approaches described above. Validation statistics and features of the reconstructed time series were compared to assess sensitivity of results to the alternative model formulations.
 The strength of the regression models was summarized by the adjusted R² and F level of the regression equation [Weisberg, 1985]. Possible multicollinearity of predictors was assessed with the variance inflation factor (VIF) [Haan, 2002]. A forward stepwise approach was used to enter predictors from the predictor pools, with threshold F values for entry or removal of predictors. Variables were entered in order of their explained residual variance. As a guide, the F level for a predictor was allowed to have a maximum p value of 0.05 for entry and 0.10 for retention in the equation. Residuals for all regression models were inspected graphically for nonnormality, trend, autocorrelation, and obvious dependence on values of the predictors or predicted flows. Any of these conditions could indicate a need for data transformation. Residuals were tested for normality with the Lilliefors test [Conover, 1980].
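The forward entry of predictors can be illustrated with a simplified sketch. It is not the procedure used for the reconstructions: it omits the removal step, replaces the p = 0.05 entry criterion with a fixed partial F threshold (`f_enter = 4.0`, roughly the 5% critical value for one numerator degree of freedom and a large residual sample), and solves the normal equations directly. All function names and the sample data are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Raises ZeroDivisionError if A is singular (e.g., collinear predictors).
    """
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def residual_ss(columns, y):
    """Residual sum of squares of a least squares fit with intercept."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(X[0])
    XtX = [[sum(row[a] * row[b] for row in X) for b in range(p)] for a in range(p)]
    Xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(p)]
    beta = solve(XtX, Xty)
    return sum((yi - sum(b * v for b, v in zip(beta, row))) ** 2 for row, yi in zip(X, y))

def forward_stepwise(pool, y, f_enter=4.0):
    """Enter predictors one at a time while the best candidate's partial F is at least f_enter."""
    selected = []
    sse_current = residual_ss([], y)
    while len(selected) < len(pool):
        candidates = [j for j in range(len(pool)) if j not in selected]
        trials = {j: residual_ss([pool[i] for i in selected + [j]], y) for j in candidates}
        best = min(trials, key=trials.get)
        df_resid = len(y) - len(selected) - 2
        if df_resid <= 0:
            break
        mse = trials[best] / df_resid
        if mse == 0.0:  # perfect fit: accept the predictor and stop
            selected.append(best)
            break
        if (sse_current - trials[best]) / mse < f_enter:
            break
        selected.append(best)
        sse_current = trials[best]
    return selected

# Hypothetical pool of two chronologies and a flow series driven by the first
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
noise = [0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1, 0.1]
y = [2.0 * a + e for a, e in zip(x1, noise)]
chosen = forward_stepwise([x1, x2], y)
```

The returned list holds indices into the pool in order of entry, which is the ordering that the split-sample validation scheme reuses.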
 As a safeguard against model overfitting, the entry of predictors was terminated when it resulted in decreased validation accuracy. The reduction of error (RE) [Fritts et al., 1990] and root mean squared error (RMSE) [Weisberg, 1985] were generated using two different calibration/validation schemes. In one scheme, a stepwise model was first fit to the full calibration period, recording the order of entry of predictors. The model was then fit to the first half of the data using the same predetermined order of entry for the predictors, and validated on the second half of the data. The calibration and validation halves were then exchanged and the process repeated. In the other validation scheme, leave-one-out cross validation [Michaelsen, 1987] was used to generate a single validation series. In both schemes, the RE and RMSE were calculated for each step and plotted to assess when the validation scores stopped improving. One last method of validation involved using the predictors selected by the stepwise regression process to run a linear neural network (LNN). LNN is an iterative model fitting process based on statistical bootstrapping techniques that was used here to assess bias in the explained variance. If the relationship between tree growth and climate is robust and stable, the results of LNN and stepwise regression should be equivalent [Goodman, 1996; Woodhouse, 1999].
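The leave-one-out scheme and the two validation statistics can be sketched for the simplest case of a single predictor. The sketch assumes the usual definition of RE, in which the reference prediction is the mean of the calibration data; the function names `fit_line` and `loo_validation` and the sample values are hypothetical.

```python
import math

def fit_line(x, y):
    """Least squares intercept and slope for one predictor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope

def loo_validation(x, y):
    """Leave-one-out cross validation; returns (RE, RMSE)."""
    sq_err, sq_ref = 0.0, 0.0
    for i in range(len(y)):
        # Calibrate on all years except year i
        x_cal = x[:i] + x[i + 1:]
        y_cal = y[:i] + y[i + 1:]
        intercept, slope = fit_line(x_cal, y_cal)
        pred = intercept + slope * x[i]
        cal_mean = sum(y_cal) / len(y_cal)
        sq_err += (y[i] - pred) ** 2
        # Reference "no knowledge" prediction: the calibration mean
        sq_ref += (y[i] - cal_mean) ** 2
    return 1.0 - sq_err / sq_ref, math.sqrt(sq_err / len(y))

# Hypothetical predictor (e.g., a PC score) and flow values
pc1 = [-1.2, 0.4, 0.9, -0.3, 1.5, -0.8, 0.1, 0.6]
flow = [12000.0, 17500.0, 19000.0, 15000.0, 22000.0, 13500.0, 16000.0, 18500.0]
re_stat, rmse = loo_validation(pc1, flow)
```

An RE above zero indicates that the model predicts withheld years better than the calibration mean does. Tracking RE and RMSE as each predictor is added shows the step at which validation accuracy stops improving.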
3.1. Full Pool Stepwise Regression Model Results
 Statistics for the initial full pool stepwise regression results using residual chronologies as predictors are listed in Table 2 in the first three lines under full pool models (subbasins) and the first line under the Lees Ferry models. The regression models all have highly significant F levels, account for between 72% and 81% of the variance of flow, and possess significant skill when applied to cross-validation testing. The predictor pools for the models contain between 24 and 38 chronologies, but the stepwise selection yields four to seven predictor chronologies in the final models.
Table 2. Regression Statistics for Reconstruction Models
[Table body not recovered; surviving column headers: Predictors in Pool; Number of Steps/Number of Predictors.]
Subbasin models based on full pool and watershed-limited pool of potential predictors and statistics for four alternative modeling choices for Lees Ferry record are given. Validation statistics RE and RMSE are based on cross validation.
 The residuals analysis indicated that normality of residuals could not be rejected (Lilliefors test, α = 0.05) for any of the series. Residuals for one gauge, Colorado-Cisco, showed borderline significance of autocorrelation at a 1-year lag. For three of the four gauges, residuals had a significant (p < 0.05) downward trend, suggesting greater tree growth than expected from flow in recent decades. A scatterplot indicated that the variance of residuals increased with the predicted values for the Colorado-Cisco. As neither square-root nor log10 transformation of the flow record offered more than marginal improvement for that gauge, the decision was made to use the untransformed flows. The stepwise validation results indicated that strict adherence to the F enter and F remove criteria did not result in any obvious overfitting of the models. The numbers of steps and predictor chronologies for the four gauges are listed in Table 2. Regression coefficients are given in Supplementary Data 2 (http://www.ncdc.noaa.gov/paleo/pubs/woodhouse2006/woodhouse2006.html). Linear neural networks using the suite of predictors included in the regression equations yielded explained variance values that were the same as those from the regression approaches. An example of the comparison between a gauge record and a reconstruction is shown for Lees Ferry in Figure 2.
3.2. Sensitivity of the Reconstruction Models to Predictor Pool
 The predictor pool sensitivity tests apply only to the gauges on the Colorado-Cisco and San Juan, as the same predictors were selected from both pools for the Green River gauge. Limiting the pool by watershed boundary reduced the number of potential predictors from 38 to 32 chronologies for the Colorado-Cisco and from 24 to 8 chronologies for the San Juan (Supplementary Data 2). Stepwise regression for the Colorado-Cisco and the San Juan gauge yielded two and three predictors, respectively, in the full pool regression equation that were not in the limited pool equation. In these cases, as expected, the explained variance is reduced in the watershed-limited models. To address reconstruction sensitivity, reconstructions based on full pool and limited pools of predictors were compared with attention to critical precalibration periods, such as the well-known drought in the late 16th century [Stockton and Jacoby, 1976; Gray et al., 2004; Stahle et al., 2000]. A comparison of reconstructions for the San Juan and Colorado-Cisco from the two different models indicates only slight differences, particularly during periods of drought (Figure 3). On consideration of calibration/validation accuracy, the relative insensitivity of reconstructions to predictor pool reduction, and ability to reproduce statistical features of the observed record, we decided to adopt the full pool predictor subsets for the final subbasin reconstructions and analysis.
3.3. Sensitivity of the Lees Ferry Reconstruction to Modeling Approaches
 Reconstructions of Lees Ferry streamflow were tested using four different forms of the predictor tree ring data: residual chronologies (Lees-A, described in section 3.1), standard chronologies (Lees-B), principal components of residual chronologies (Lees-C), and principal components of standard chronologies (Lees-D). Exploratory analysis suggested 1490 as a reasonable start year for the reconstructions; of the original 62 chronologies, 31 residual chronologies and 30 standard showed significant correlations with annual streamflow and passed the screening test for time coverage to at least 1490 (Table 2). Stepwise regression on the standard chronologies (Table 2, Lees-B) yielded a reconstruction model with the same number of predictors (7) as for the residual chronology version, and a slight increase in F level and variance explained by regression (see Figure 1 for locations of predictor chronologies and Supplementary Data 2 for regression coefficients).
 The PCA indicated that the residual chronologies have somewhat more spatial structure than the standard chronologies (Supplementary Data 3, http://www.ncdc.noaa.gov/paleo/pubs/woodhouse2006/woodhouse2006.html). PC 1 is by far the most important component, accounting for 47% of the variance of the residual chronologies and 45% of the variance of the standard chronologies. For both sets of data, five PCs have eigenvalues exceeding 1.0, and these PCs account for a cumulative 69% (residual chronologies) and 68% (standard chronologies) of the tree ring variance.
 PC loadings on all chronologies are positive for PC 1 whether the PCA is on residual or standard chronologies. This pattern attests to the strong overriding common signal in tree growth over the Upper Colorado Basin. There appears to be some species dependence, with highest weights on Pinus edulis chronologies. Spatial organization in PCs 2-5 is most obvious for the residual chronologies: maps of loadings (not shown) indicate an east-west contrast in PC 2, a north-south contrast in PC 3, and spatial clustering in PCs 4 and 5.
 The predictor pools, based on significant correlation of PCs with streamflow, were PCs 1, 15 and 16 for the residual chronologies, and PCs 1, 17, 28, and 29 for the standard chronologies. Except for PC 1, a high percentage of tree ring variance accounted for by a PC did not imply strong correlation with streamflow.
 In the stepwise procedure for both the residual and standard chronologies, only PC 1 entered as a predictor of flow (Table 2 and Supplementary Data 2). The final models (Table 2, Lees-C and Lees-D) account for 7–9% less variance of flow than the corresponding non-PCA models but, with just one predictor variable, have considerably higher F levels. Both PCA models verify well as indicated by the high cross validation RE statistics (Table 2). We repeated the PCA regression exercise with predictor pools made up of the PCs 1–5, rather than PCs screened by correlation with flow, and arrived at the same results, a final model with just PC 1 as the predictor.
 Descriptive statistics for the observed flows and the four alternative Lees Ferry reconstructions for the 1906–1995 calibration period are listed in Table 3. For the calibration period, the reconstructed and observed means are forced to be equal by the regression process, and differences in standard deviation simply reflect differences in proportion of variance explained by regression. The skew for all four reconstructions is opposite in sign from that of the observed flows, but given the short sample provided by the calibration period, only the skewness of Lees-C is significantly different from zero at α = 0.05 [Snedecor and Cochran, 1989]. On the basis of Lilliefors test [Conover, 1980] the assumption of normality could not be rejected for any of the four reconstructions (α = 0.05). A large contrast is seen in first-order autocorrelation of the two reconstructions based on residual chronologies versus the reconstructions using standard chronologies. The reconstructions by residual chronologies have essentially no first-order autocorrelation, while the observed flows and the reconstructions by standard chronologies are significantly positively autocorrelated (p < 0.01, one-tailed test).
Table 3. Statistics of Observed and Reconstructed Flow of Colorado River at Lees Ferry for 1906–1995 Calibration Period
Lees-A is reconstruction from residual chronologies, Lees-B is from standard chronologies, Lees-C is from PCs of residual chronologies, and Lees-D is from PCs of standard chronologies. Obs is the observed natural flow record (see text).
Statistics are mean and standard deviation in MCM, skewness, lag 1 autocorrelation.
Normal is defined as mean of observed flows for calibration period 1906–1995.
 Annual observed flows range from 37% to 166% of the 1906–1995 mean. In general, for any reconstruction we expect departures from the calibration period mean to be underestimated due to compression of variance in regression modeling, but in Table 3 the lowest annual flows in all four reconstructions are lower than the lowest observed flow. This unexpected result might be due to the exaggerated negative skew of the reconstructions. In contrast, no reconstructed flow is as high as the highest observed flow. The 5-year running means are as expected, with neither highs nor lows as extreme as in the observed data. As expected when using residual chronologies, the 20-year running means are conservative, and the lows appear to be exaggerated by the reconstructions based on standard chronologies (Table 3).
 The four time series of smoothed full-length (1490–1997) Lees Ferry reconstructions track one another closely (Figure 4). All reconstructions indicate a long-term mean flow below the 1906–1995 observed mean. The long-term reconstructed mean ranges from 94.0% to 96.5% of the observed mean, and so is relatively insensitive to choice of model. If the standard error of an m-year mean of reconstructed values is assumed to be 1/√m times the root-mean-square error of the annual reconstructed values (Table 2) and the errors are normally distributed, all four reconstructed means are significantly (α = 0.05) different from the observed mean.
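The significance statement above can be written out explicitly. The notation is introduced here for illustration only, and the two-sided normal critical value 1.96 with the observed mean treated as fixed is an assumption about how the comparison was made: with \(\bar{\hat{y}}_m\) the mean of m reconstructed annual flows and \(\bar{y}_{\mathrm{obs}}\) the observed 1906–1995 mean,

```latex
\mathrm{SE}\left(\bar{\hat{y}}_m\right) = \frac{\mathrm{RMSE}}{\sqrt{m}},
\qquad
\left|\bar{\hat{y}}_m - \bar{y}_{\mathrm{obs}}\right| > 1.96\,\mathrm{SE}\left(\bar{\hat{y}}_m\right)
\;\Rightarrow\; \text{significant difference at } \alpha = 0.05 .
```

For the full-length reconstructions, m = 508 years (1490–1997), so the standard error of the long-term mean is about 1/22.5 of the annual RMSE.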
 Depending on reconstruction model, the long-term standard deviation is greater than (non-PCA models) or less than (PCA models) the standard deviation of observed flows (Table 4). If climate were equally variable before and during the calibration period, compression of the variance in regression would tend to yield a long-term reconstruction with lower variance than that of the observed flows. The greater standard deviation for the non-PCA models implies more variable climate before the start of the calibration period than after. All four reconstructions are negatively skewed, but the assumption of zero skew can be rejected (p < 0.05, N = 508) only for the reconstructions from the residual chronologies (Table 4).
Table 4. Statistics of Reconstructed Flow of Colorado River at Lees Ferry, 1490–1997, and Observed Flow, 1906–1995
 Differences in first-order autocorrelation among models were noted for the 1906–1995 calibration period (Table 3), and those differences also apply to the long-term reconstructions (Table 4). A comparison of first-order autocorrelations of reconstructed data for the full reconstruction and the calibration period suggests the autocorrelation in the calibration period is representative of the long-term record. It is also evident, however, that the autocorrelation of the reconstructed flows from residual chronologies is biased low relative to that of the observed flows (Table 3). The impact of the disparity in first-order autocorrelations for model Lees-A was investigated by restoring the persistence to the reconstructed flow with an autoregressive model, and comparing the original reconstruction with the persistence-restored reconstruction. The two series were extremely similar, and although 2-year droughts were slightly less common and 3-year droughts slightly more common in the persistence-restored reconstruction, there were no distinct differences for longer droughts. The reconstructions from standard chronologies more accurately reflect the first-order autocorrelation of the observed record (Table 3). Model Lees-D is perhaps strongest in this regard because the reconstructed flows are slightly more autocorrelated than the observed flows. This is reasonable because the reconstruction errors are assumed to not be autocorrelated.
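One way to restore persistence to a reconstruction is to treat its departures from the mean as innovations of an AR(1) process. The sketch below illustrates that idea only; the autoregressive model and coefficient actually used are not specified in the text, no variance rescaling is applied, and the function name `restore_persistence` and the sample values are hypothetical.

```python
def restore_persistence(recon, phi):
    """Add AR(1) persistence: z[t] = phi * z[t-1] + (recon[t] - mean)."""
    mean = sum(recon) / len(recon)
    restored = []
    prev = 0.0  # assumed starting departure
    for value in recon:
        prev = phi * prev + (value - mean)
        restored.append(mean + prev)
    return restored

# Hypothetical reconstructed flows in MCM and an assumed AR(1) coefficient
recon = [16000.0, 19000.0, 12000.0, 14000.0, 21000.0, 17000.0]
restored = restore_persistence(recon, 0.25)
```

Drought run lengths can then be counted in both series to compare the frequency of 2-year, 3-year, and longer events.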
 Extreme n-year running means are quite similar for the alternative Lees Ferry reconstructions, but somewhat more extreme for the reconstructions using the standard chronologies (Table 4). Regardless of model, the lowest 1-year, 5-year and 20-year means for the full reconstructions are much below those in the observed flows. The lowest reconstructed 20-year means for all models are in the late 1500s (Figure 4; note that this drought is somewhat more severe in the standard chronology PCA model). In the standard chronology models, the highest reconstructed n-year means exceed those in the observed record, with the exception of 5-year means. Smoothed time series of the four reconstructions are in agreement in the exceptional wetness of the early 1900s (Figure 4). The implication is that a period of such sustained wetness had not occurred since the start of the 1600s.
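The extreme n-year running means compared above can be computed as in the following sketch. The function names and sample values are hypothetical, and labeling each window by its final year is an assumed convention.

```python
def running_means(flows, n):
    """Means of all consecutive n-year windows."""
    return [sum(flows[i:i + n]) / n for i in range(len(flows) - n + 1)]

def extreme_running_means(flows, years, n):
    """Return (lowest mean, end year) and (highest mean, end year)."""
    means = running_means(flows, n)
    end_years = years[n - 1:]
    low = min(zip(means, end_years))
    high = max(zip(means, end_years))
    return low, high

# Hypothetical annual flows in MCM for a short span of years
years = list(range(1580, 1590))
flows = [15000.0, 13000.0, 11000.0, 12000.0, 10000.0,
         14000.0, 18000.0, 20000.0, 17000.0, 16000.0]
low5, high5 = extreme_running_means(flows, years, 5)
```

Dividing the returned means by the 1906–1995 observed mean expresses them as a percentage of normal.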
 In summary, the above comparison shows that key features of the updated flow reconstructions for Lees Ferry are fairly robust to modeling choices. The models using the standard chronologies appear to more closely match the persistence in the gauge record, and the non-PCA version using standard chronologies (Lees-B) has the greatest calibration period accuracy as measured by regression R². On the other hand, the models based on standard chronologies overestimate the severity of multidecadal droughts (20-year means) in the calibration period, which is worrisome considering that the regression procedure itself tends to compress reconstructed values toward the calibration period mean. Smoothed time series plots (Figure 4) suggest the PCA reconstruction on standard chronologies (Lees-D) is somewhat of an outlier, and gives a worst-case scenario for the severity of extended droughts and wet periods. In view of the fact that the subbasin gauges were reconstructed using the residual chronologies and a non-PCA approach, for consistency of analysis we used the reconstruction version Lees-A as the baseline record in the subbasin analysis that follows.
3.4. Spatial Fidelity Among Gauges and Reconstructions
 The relationship between gauges within the Upper Colorado River basin, and how those relationships were preserved in the reconstructions, was evaluated by examining the shared variance between the set of gauge records and the set of reconstructions. Spatial relationships were then examined with regard to the magnitudes of flow from the subbasins and their relationship to the total Colorado River flow at Lees Ferry.
 In the gauge records, all flows are highly correlated (r > 0.77) except between the San Juan and the Green (r = 0.55), the most widely separated gauges (Table 5a). In the reconstructed flow records for the same time period (1906–1995), the same relationships are preserved between the Green, Colorado-Cisco, and Lees Ferry reconstructions. Correlations between the San Juan and the other reconstructions are somewhat inflated, particularly between the Green and San Juan (Table 5a). The relationships for the full reconstructions are quite similar to those for the period of the gauge records (Table 5b). The greater shared variance between the San Juan reconstruction and the other reconstructions, compared to the relationships in the gauge record, may be due to an absence of tree ring chronologies located within the San Juan River basin. The result of this may be a weaker representation of local basin variability.
Table 5a. Interbasin Correlations of Observed and Reconstructed Flows: Correlation Matrices of Observed and Reconstructed Flows for the Period 1906–1995
GRUT, Green River at Green River; COCI, Colorado River near Cisco; SJBL, San Juan near Bluff; Lees, Colorado at Lees Ferry.
Table 5b. Interbasin Correlations of Observed and Reconstructed Flows: Correlation Matrix of Reconstructed Flows for the Period 1569–1997
GRUT, Green River at Green River; COCI, Colorado River near Cisco; SJBL, San Juan near Bluff; Lees, Colorado at Lees Ferry.
 The observed flows from the three subbasin gauges (Green River, Colorado-Cisco, and San Juan) account for nearly all (average of 95.5%) of the total water year flow observed at Lees Ferry from 1906 to 1995 (Table 6). Over the same years, the average values of contributed flows in the reconstructions are closely matched, as expected due to the regression process. Over the full common reconstruction period, 1569–1997, contributions are also very similar (Table 6). Figure 5 shows the variations in flow at the four gauges and the sum of the three subbasin flows over the full reconstruction period as 5-year running averages. The match between the three-gauge sum and the Lees Ferry flow is good (r = 0.98, p < 0.001), though there are several periods when the sum appears to be somewhat less than Lees Ferry total flow (e.g., the 1630s and the last quarter of the 1600s, both periods of higher flows).
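The subbasin contributions can be computed as in the sketch below. It assumes that the reported average is the mean of annual percentages rather than the ratio of long-term means, which the text does not specify; the function name and the flow values are hypothetical.

```python
def mean_percent_contribution(subbasin, total):
    """Average over years of 100 * subbasin flow / total flow."""
    ratios = [100.0 * s / t for s, t in zip(subbasin, total)]
    return sum(ratios) / len(ratios)

# Hypothetical annual flows in MCM for three subbasins and the downstream gauge
green = [6000.0, 7000.0, 5000.0]
cisco = [8000.0, 9000.0, 7000.0]
san_juan = [2500.0, 3000.0, 2000.0]
lees = [17500.0, 20000.0, 14500.0]

three_gauge_sum = [g + c + s for g, c, s in zip(green, cisco, san_juan)]
pct_total = mean_percent_contribution(three_gauge_sum, lees)
```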
Table 6. Percentage of Observed and Reconstructed Annual Flow at Lees Ferry Contributed by Subbasins
[Table body not recovered; surviving column header: 3 Gauge Total.]
4. Long-Term Hydroclimatic Variability in the Upper Colorado River Basin
4.1. Frequency Characteristics of Reconstructed Flows
 We used a multitaper method (MTM) spectral analysis to examine the frequency characteristics of reconstructed flows at Lees Ferry and the three subbasin gauges [Mann and Lees, 1996]. MTM provides a robust means for isolating signal peaks from a time series that may contain both periodic and aperiodic behavior. The MTM spectrum for the Lees Ferry reconstruction (Figure 6a) shows that significant (p < 0.05) high-frequency variability in Upper Colorado River flows (2–7 years) is accompanied by a strong bidecadal peak centered around ∼24 years. MTM also identifies a significant multidecadal peak around 64 years. Peaks similar to those in the two to seven year band at Lees Ferry are also present in the spectra for each subbasin (Figures 6b–6d). All of the subbasin reconstructions show significant bidecadal peaks, though relative power is reduced for the Green River gauge. The reconstructions for both the Colorado-Cisco and the San Juan show strong multidecadal peaks centered on ∼64 years. Cross-spectral MTM reveals significant coherency across the subbasins at lower frequencies (Figure 7). Coherency and phasing of bidecadal and multidecadal peaks are particularly strong.
 The wavelet spectra for each of these reconstructed gauge records further highlight their coherence in the frequency domain (Figure 8). Wavelet analysis also shows marked nonstationarity in the strength of these signals through time. In particular, each of the wavelet spectra is characterized by multidecadal variability (30–70 year) in the first two centuries followed by a period from the 18th through mid-19th centuries dominated by significant energy in the decadal to bidecadal bands. Beginning in the late 19th century, however, we see a return to significant multidecadal variability. These lower-frequency modes persist until the late 20th century, when the effects of zero padding likely reduce power in the multidecadal bands [Torrence and Compo, 1998].
4.2. Basin-Scale Flow Variability
 The Lees Ferry and subbasin streamflow reconstructions enable an examination of the spatial characteristics of long-term drought variability in the upper Colorado River basin. We first compared 5-year, 10-year, and 20-year averages of streamflow in the Lees Ferry reconstruction with averaged flows in the three subbasins to determine the degree of drought variability across the upper Colorado River basin. In general, there is a strong tendency for extreme low flows at Lees Ferry to be matched by extreme low flows in all three of the subbasins. Of the driest 5-year periods (lowest 15% of flows) in the Lees Ferry reconstruction, none ranked above the driest tercile in the Colorado-Cisco reconstruction, which accounts for the greatest proportion of Lees Ferry flow. Two of the driest Lees Ferry flow periods ranked in the middle tercile in the Green River reconstruction record (1728–1732, 1628–1632). Nine periods that were dry in the Lees Ferry record fell within the middle tercile in the San Juan reconstruction. Four of these periods occurred in the 1580s and 1590s, a time known for extreme drought throughout the western United States [e.g., Stahle et al., 2000]. While there were some extremely dry years in the San Juan reconstruction over this period (e.g., 1590), this period was also marked by several wet years (e.g., 1589, 1595, 1599).
 Important regional variations do exist within extreme dry periods (Table 7). Rankings of 5-year averages show that the driest 5-year period in the Lees Ferry record, 1844–1848, was extremely dry in the Green and Colorado-Cisco records (driest and third driest, respectively), but was somewhat less extreme in the San Juan (17th driest). The second most extreme 5-year low-flow period in the Lees Ferry record, 1622–1626, was similarly dry in the Colorado-Cisco and San Juan records (second driest and driest, respectively), but to a much lesser extent in the Green (63rd driest). Regional variability in extreme low flows is also evident over longer timescales. The period 1622–1631 was the driest 10-year period in the Lees Ferry reconstruction. As in the 5-year periods, low flows in the 1620s are less extreme in the Green River record, but are markedly low in both the San Juan and Colorado-Cisco records (ranks 71st, third, and sixth, respectively). In contrast, the Green River appears to be most strongly impacted by decadal-scale droughts in the 1870s and 1880s. As suggested above, the San Juan appears to be less sensitive to the low flows in the 1580s and 1590s, and this is evident at both 10-year and 20-year timescales. The 20-year period ending in 1592 is the driest such period in the Lees Ferry and Colorado-Cisco reconstructions, and the sixth driest in the Green reconstruction, but it was the 48th driest period in the San Juan reconstruction.
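 The comparison described above can be sketched as follows: compute moving averages for Lees Ferry and a subbasin, rank the subbasin periods from driest to wettest, and look up the subbasin rank for each of the driest Lees Ferry periods. This is a minimal sketch with hypothetical function names and made-up flow values; it is not the code used for the analysis.

```python
def moving_average(values, window):
    """Means of consecutive windows; element i covers values[i : i + window]."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]


def dryness_ranks(values):
    """Rank each value, with rank 1 assigned to the lowest (driest) value."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


def subbasin_ranks_for_driest(lees, subbasin, window, n_driest):
    """For the n_driest Lees Ferry windows, return (end index, subbasin rank)."""
    lees_ma = moving_average(lees, window)
    sub_rank = dryness_ranks(moving_average(subbasin, window))
    driest = sorted(range(len(lees_ma)), key=lambda i: lees_ma[i])[:n_driest]
    return [(i + window - 1, sub_rank[i]) for i in driest]


# Hypothetical annual flows (arbitrary units), for illustration only
lees_flow = [5, 1, 1, 5, 5, 5]
sub_flow = [5, 5, 5, 1, 1, 5]
result = subbasin_ranks_for_driest(lees_flow, sub_flow, window=2, n_driest=1)
print(result)
```

 In this toy case the driest Lees Ferry window coincides with one of the wettest subbasin windows, the kind of regional contrast that a ranking table such as Table 7 is designed to expose.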
Table 7. Ranked Subbasin Flows During Lowest 5-year, 10-year, and 20-year Moving Averages of Reconstructed Flow at Lees Ferry^a
^a The ending years of the 10 lowest-flow periods at Lees Ferry are listed under “5 year,” “10 year,” and “20 year.” Subbasin gauges are Green River at Green River (GR), Colorado near Cisco (CC), and San Juan near Bluff (SJ).
 Regional drought variability was also examined in the context of its impacts on Lees Ferry flows. Rankings for 10-year moving averages of flow in the three subbasins were divided into terciles. Periods when the value for one basin fell in the dry tercile while flow in another basin fell in the wet tercile were tabulated (Table 8). Again, droughts tend to be widespread, affecting, to some degree, all three subbasins simultaneously. However, in 15 of these 10-year periods, contrasting conditions exist between two basins. Most commonly (eight periods), high flows in the San Juan reconstruction coincide with low flows in the Green reconstruction. Dry conditions in the San Juan and wet in the Green are far less common (three periods). In two periods, the Green is dry while the Colorado-Cisco is wet, and there is one case each when the San Juan is wet and Colorado-Cisco dry and vice versa. The contrasting conditions in the pairs of subbasins appear to balance each other with respect to Lees Ferry flow for the most part, with Lees Ferry flow for these periods most often falling in the middle tercile. However, in four of the eight periods when the Green River is low and the San Juan River is high, flow at Lees Ferry is in the driest tercile, and in all but one of the eight periods, Lees Ferry flow is lower than the median. This suggests that low-flow conditions in the Green River can override wet conditions in the San Juan, and to some extent, moderate conditions in the Colorado-Cisco, to influence Lees Ferry flows. Greater sensitivity of Lees Ferry flow to the Green than to the San Juan is of course expected, given the much larger percentage of flow contribution from the Green (Tables 1 and 6).
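 The tabulation of contrasting conditions can be sketched as below: label each 10-year mean by tercile within its own record, then list the periods in which one basin is in its dry tercile while the other is in its wet tercile. Function names and the sample values are hypothetical, and the handling of tercile boundaries is an assumption.

```python
def tercile_labels(values):
    """Label each value 'dry', 'middle', or 'wet' by its rank within the record."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    labels = [""] * n
    for rank, i in enumerate(order):
        if rank < n / 3:
            labels[i] = "dry"
        elif rank >= 2 * n / 3:
            labels[i] = "wet"
        else:
            labels[i] = "middle"
    return labels


def contrasting_periods(end_years, means_a, means_b):
    """End years of periods where basin A is 'dry' while basin B is 'wet'."""
    labels_a = tercile_labels(means_a)
    labels_b = tercile_labels(means_b)
    return [year for year, a, b in zip(end_years, labels_a, labels_b)
            if a == "dry" and b == "wet"]


# Hypothetical 10-year means for two basins (arbitrary units)
end_years = [1801, 1802, 1803, 1804, 1805, 1806]
basin_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
basin_b = [6.0, 5.0, 4.0, 3.0, 2.0, 1.0]
print(contrasting_periods(end_years, basin_a, basin_b))
```

 Calling the function with the basins swapped gives the opposite contrast (B dry, A wet), so the two directions can be counted separately, as in Table 8.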
Table 8. Terciles and Percentiles of 10-year Moving Average Reconstructed Flow at Lees Ferry During Periods of Contrasting Flow Anomalies in Subbasins^a
Lees Flow Tercile
Lees Flow Percentile
^a Year listed is last of 10.
Green dry/San Juan wet
Green wet/San Juan dry
Green dry/Colorado wet
San Juan wet/Colorado dry 1859
San Juan dry/Colorado wet 1820
 As shown in the spectral analysis (section 4.1), streamflow at Lees Ferry and the three subbasins also varies significantly over multidecadal timescales. To highlight this lower-frequency variability, each of the reconstructions was smoothed with a 50-year cubic spline (Figure 9). The smoothed time series display a pattern of high magnitude variations in the 16th and 17th centuries and the 19th and 20th centuries, with dampened variability centered on the 18th century. The driest multidecadal period in the Lees Ferry reconstruction occurs in the late 16th century. The low-flow period at the end of the 19th century shares a similar magnitude. In this multidecadal context the 1950s drought is also notable as the 4th lowest flow period at Lees Ferry. Generally high-flow regimes occurred across the basin in the early 17th and early 20th centuries. The most recent decades of the reconstruction were also quite wet. As in the case of the multiyear and decadal flow regimes discussed above, the magnitude of departures for these multidecadal flow regimes varies somewhat across the basin. This is particularly true for the early 1700s through the mid 1800s, which is the period when the wavelet analyses show a significant loss of multidecadal power in the basin. However, the timing and duration of multidecadal flow regimes is markedly coherent across the Upper Colorado River Basin.
4.3. Comparison with Previous Lees Ferry Reconstructions
 Because of the central importance of the Lees Ferry record to the allocation of Colorado River water supply, it is important that the reconstruction be as accurate as possible, and that the uncertainty be appreciated. The discussion in section 3.3 dealt with uncertainty due to modeling choices: the use of standard versus residual chronologies and the decision to use individual chronologies or chronologies reduced by PCA in the regressions. Previous reconstruction efforts [Stockton and Jacoby, 1976; Hidalgo et al., 2000] not only used different modeling procedures from ours, but also a different tree ring network and a much shorter calibration period. In this section we compare our updated reconstructions, versions Lees-A and Lees-D, with reconstructions by Stockton and Jacoby [1976] and Hidalgo et al. [2000]. We refer to these two previous reconstructions as SJ1976 and HDP2000. The comparison focuses on two statistics: the long-term mean annual flow, and the most severe sustained drought as measured by the lowest reconstructed 20-year moving average of flow. Lees-A is our model using regression of flow on residual chronologies. Lees-D is our model using regression of flow on PCs of standard chronologies. Those two versions were selected for the comparison because they represent the most conservative (wettest) and least conservative (driest) of the alternative reconstructions from the updated chronologies (see section 3.3).
 Time series plots of smoothed reconstructions (Figure 10) generally agree in timing of highs and lows, but disagree considerably on the magnitude of some flow anomalies. The plots for the updated reconstructions generally show wetter conditions than the previous reconstructions. HDP2000 represents the driest scenario, with greatly amplified low-flow features in the late 1500s, late 1700s and near 1900. Much less disagreement among the four reconstructions is evident in the calibration period than in the precalibration period.
 Selected calibration and reconstruction statistics for the four models are listed in Table 9. Flow statistics are given in units of both billion cubic meters (BCM) and million acre-feet (MAF) to facilitate comparison with previously published studies. Note that the reconstructions differ considerably in calibration period as well as in the number of tree ring chronologies on which the final reconstructions depend. Agreement of the reconstructions in the calibration period (Figure 10) is not surprising as all four models have high R^2 values (Table 9). Perhaps the most striking disagreement among the models is the magnitude of the late 1500s drought (the period of the lowest 20-year mean), which is estimated at 11.2 BCM (9.1 MAF) by HDP2000 and 15.6 BCM (12.6 MAF) by Lees-A. The updated reconstructions suggest the long-term mean annual flow is not as low as previously estimated. Our driest updated reconstruction model (Lees-D) gives a long-term mean of 17.6 BCM (14.3 MAF), which is some 0.9 BCM (0.8 MAF) higher than the original estimate by Stockton and Jacoby [1976].
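 For readers moving between the two unit systems, the conversion can be checked with a few lines of code. The sketch assumes the standard definition of an acre-foot (43,560 cubic feet, with 1 ft = 0.3048 m); the function names are illustrative.

```python
# 1 acre-foot = 43,560 ft^3; with 1 ft = 0.3048 m this is about 1233.48 m^3
ACRE_FOOT_M3 = 43560 * 0.3048 ** 3


def bcm_to_maf(bcm):
    """Convert billion cubic meters to million acre-feet."""
    return bcm * 1e9 / ACRE_FOOT_M3 / 1e6


def maf_to_bcm(maf):
    """Convert million acre-feet to billion cubic meters."""
    return maf * 1e6 * ACRE_FOOT_M3 / 1e9


# Flow values quoted in the text, in BCM
for bcm in (11.2, 15.6, 17.6):
    print(f"{bcm} BCM = {bcm_to_maf(bcm):.1f} MAF")
```

 To one decimal place, the converted values agree with the MAF figures quoted for the HDP2000 and Lees-A drought estimates and for the Lees-D long-term mean.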
Table 9. Comparative Statistics of Lees Ferry Reconstructions
Calibration period, number of contributing chronologies, and proportion of variance explained by regression model.
Statistics of long-term reconstruction, expressed in units of billion cubic meters and million acre-feet along with 95% confidence interval estimated from the cross-validation root-mean square error (see text); statistics for common period 1520–1961.
SJ1976 is a mean of two reconstructions with R^2 values of 0.78 and 0.87 [Stockton and Jacoby, 1976]; no cross-validation was performed for these models.
 Differences in the reconstructions are undoubtedly related to differences in the basic data and the statistical models used for reconstruction. The most obvious data difference between this and past efforts is that different chronologies were used as predictors. Two of Stockton and Jacoby's [1976] original sites were recollected, but it is not evident that any of the same trees were sampled. Gauge data were different as well, and both the tree ring data and gauge data in previous efforts resulted in a calibration period nearly half the length of the calibration period used in this study. Differences could also result from data processing and decisions in detrending the raw ring widths. SJ1976 and HDP2000 used standard chronologies, and models with PCs of lagged chronologies as predictors. Over the common period 1906–1961, the SJ1976 reconstruction showed a lag 1 autocorrelation of 0.36 and HDP2000 0.41, which are similar to the 0.22 and 0.31 for the models from this study that were based on standard chronologies. All of these lag 1 values are also consistent with values for the gauge record (0.25). The inclusion of lagged predictors may have the effect of enhancing the persistence in the extreme low flow years. The reliance on just seven tree ring chronologies to sample the runoff variations over the entire Upper Colorado River Basin, as with updated reconstruction Lees-A, might also present a case of potential undersampling of the watershed. However, we note that Lees-A is closely tracked by Lees-D, a PC-based reconstruction with weights on 31 chronologies distributed over the watershed (Figure 4).
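 The lag 1 autocorrelation compared above measures year-to-year persistence. A minimal sketch of one common estimator follows; the exact estimator used for the values in the text is not stated, so this form is an assumption, and the sample series is made up.

```python
def lag1_autocorrelation(x):
    """Lag 1 autocorrelation: lagged cross-products about the mean,
    divided by the total sum of squares about the mean."""
    n = len(x)
    mean = sum(x) / n
    numerator = sum((x[t] - mean) * (x[t + 1] - mean) for t in range(n - 1))
    denominator = sum((v - mean) ** 2 for v in x)
    return numerator / denominator


# Hypothetical series: strictly alternating values give strong negative persistence
alternating = [1.0, -1.0] * 5
r1 = lag1_autocorrelation(alternating)
print(f"lag 1 autocorrelation: {r1:.2f}")
```

 Values near zero indicate little carryover from one year to the next, while positive values such as those reported for the reconstructions and the gauge record indicate that dry years tend to follow dry years.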
 We can rule out the choice of calibration period as a major source of differences among the reconstructions; recalibrating our model Lees-A on 1914–1961 (following SJ1976) instead of 1906–1995 did not appreciably affect the inferred magnitudes of past droughts. The accuracy of the naturalized flow values is clearly important to the estimated severity of reconstructed droughts: SJ1976 reported important differences in drought severity and in long-term mean reconstructed flow depending on the version of the natural flow record (several existed at that time) used to calibrate their reconstruction model.
5. Recent Drought (2000–2004) in a Multicentury Perspective
 To assess the long-term standing of the most recent drought on the Colorado River, the observed natural flows at Lees Ferry averaged over the heart of the recent drought (water years 2000–2004) can be compared with 5-year running means of the Lees Ferry reconstruction. Because of the unexplained variance in the regression, however, we must allow for the possibility that the true 5-year mean for any reconstructed period may have been lower than the reconstructed 5-year mean. For this assessment, error bars were placed on the reconstructed 5-year running means. The standard error of a 5-year mean was estimated as $s_m = \mathrm{RMSE}_{cv}/\sqrt{5}$, where $\mathrm{RMSE}_{cv}$ is the cross-validation root-mean-square error of the annual reconstructed values. The computed standard error and the assumption that the errors are normally distributed yield confidence intervals and threshold levels of reconstructed 5-year mean flow with specific empirical nonexceedance probabilities.
 The reconstructed (Lees-A) 5-year means for Lees Ferry along with the threshold levels of flow with 0.25 and 0.10 nonexceedance probability are plotted in Figure 11 with the observed 1999–2004 mean as a baseline (“Lowest Observed”) for comparison. The 5-year mean for 1999–2004 was 12,187 MCM, or 64.9% of the 1906–1995 mean natural flow. The time series plots indicate that only one 5-year period, 1844–1848, was drier than 1999–2004 (Figure 11a). Annual reconstructed flow during this period averaged 63% of normal.
 A probabilistic interpretation of the reconstruction indicates, however, that several other periods have a reasonably large probability of being drier than 1999–2004. Two additional periods, in the early 1500s and early 1600s, have a 25% or greater chance of being as dry as 1999–2004 (Figure 11b). Six periods in addition to the 1840s have a 10% or greater chance of being drier than 1999–2004 (Figure 11c). During the signature drought of 1844–1848, the probability is 10% that the true 5-year mean flow was as low as 54.8% of normal (10,290 MCM or 8.3422 MAF). It should be emphasized that Lees-A is the most conservative (wettest) of our Lees Ferry reconstructions, and that other versions give even more frequent past occurrences of flow lower than in 1999–2004.
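 The probabilistic reading of the reconstruction can be sketched as follows, assuming independent, normally distributed annual errors so that the standard error of an n-year mean is the annual cross-validation RMSE divided by the square root of n. The function names are hypothetical, and the RMSE and reconstructed mean used in the example are placeholders, not values from the paper; only the 12,187 MCM baseline is taken from the text.

```python
import math
from statistics import NormalDist


def mean_error_distribution(recon_mean, rmse_cv, n_years=5):
    """Normal distribution for the true n-year mean, given the reconstructed mean."""
    standard_error = rmse_cv / math.sqrt(n_years)
    return NormalDist(mu=recon_mean, sigma=standard_error)


def nonexceedance_level(recon_mean, rmse_cv, prob, n_years=5):
    """Flow that the true n-year mean falls below with probability prob."""
    return mean_error_distribution(recon_mean, rmse_cv, n_years).inv_cdf(prob)


def prob_as_dry_as(recon_mean, rmse_cv, baseline, n_years=5):
    """Probability that the true n-year mean was at or below the baseline flow."""
    return mean_error_distribution(recon_mean, rmse_cv, n_years).cdf(baseline)


# Placeholder inputs in MCM (assumed for illustration, not from the paper)
rmse_cv = 2500.0
recon_5yr_mean = 13500.0
baseline = 12187.0  # observed 5-year mean quoted in the text

level_10 = nonexceedance_level(recon_5yr_mean, rmse_cv, 0.10)
p_dry = prob_as_dry_as(recon_5yr_mean, rmse_cv, baseline)
print(f"0.10 nonexceedance level: {level_10:.0f} MCM")
print(f"probability as dry as baseline: {p_dry:.2f}")
```

 A period whose 0.10 nonexceedance level lies below the baseline has at least a 10% chance of having been as dry as the baseline, which is the criterion applied to the running means in Figure 11.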
6. Discussion and Conclusions
6.1. Updated Reconstructions
 An updated and expanded set of tree ring chronologies has enabled the generation of high-quality water year streamflow reconstructions for four key gauges in the Upper Colorado River basin: the Green River at Green River, UT; Colorado River near Cisco; San Juan near Bluff; and the full Upper Colorado River at Lees Ferry (available online at http://www.ncdc.noaa.gov/paleo/pubs/woodhouse2006/woodhouse2006.html). These reconstructions span the common years 1569 to 1997, and account for more than 70% of the variance in the gauge records. On the basis of the extensive sensitivity analyses, differences in predictor pools and data reduction methods had little impact on important features of the reconstructions (e.g., long-term mean and runs of drought years). The use of standard versus prewhitened chronologies does have some impact on the magnitude of reconstructed high and low flows, and the standard chronology models retain a degree of low-order autocorrelation similar to that in the gauge record.
 The Lees Ferry reconstructions presented here differ from the efforts of Stockton and Jacoby [1976] and Hidalgo et al. [2000] in suggesting a higher long-term mean for Upper Colorado River flows, and to some degree, less extreme multiyear droughts. While the choice of predictor pools and calibration data sets may factor into these differences, statistical reconstruction methodology, particularly the treatment of autocorrelation, also contributes to reduced drought magnitude and an increased long-term mean.
 Spatially, the relationships between reconstructed subbasin flows are similar to those in the gauge records, except for the San Juan reconstruction, which is somewhat more highly correlated with the other gauge reconstructions over the instrumental period. This enhanced similarity is lessened over the full reconstruction period. It is possible that the higher correlation between the San Juan and other basins is due to the lack of tree ring chronologies actually located in the San Juan River basin. However, exploratory analyses using several recently generated tree ring chronologies in the San Juan basin did not change these results (C. Woodhouse, unpublished). The reconstructions also capture the contribution of subbasin flows to total Colorado River flow at Lees Ferry. The subbasin flows together account for about 96% of upper Colorado River flow and contributions from the three basins are relatively stable over the 431-year common period.
 As seen in the comparisons of the Lees Ferry and subbasin reconstructions, over the past four centuries severe multiyear and decadal-scale droughts in the upper Colorado River basin have tended to be widespread events. The most severe 5-, 10- and 20-year droughts recorded at Lees Ferry are always reflected in the subbasin gauges, although there are subregional differences in the magnitude of droughts. When the influence of subbasin conditions on Lees Ferry flow is examined, most periods of low flow in one subbasin coincide with low flows in the other subbasins. There are some exceptions, in particular when flow in the Green River is low and the San Juan flow is high. In most of these periods of contrasting drought conditions, Lees Ferry flows are average, but a few cases (e.g., the 1930s) suggest that drought in the Green River can have an overriding influence on flows at Lees Ferry, even when high flows prevail on the San Juan. Likewise multidecadal flow regimes tend to be strongly coherent across the basin.
 Again, the magnitude of these persistent high and low-flow events varies across the basin, but the timing and duration of these regimes is consistent among the reconstructions.
6.2. Upper Colorado River Droughts and Possible Climatic Drivers
 The coherency of many single and multiyear droughts across the reconstructions points to common drivers for high-frequency variations in regional hydroclimate. Spectral analysis of the Lees Ferry reconstruction (Figure 6) shows significant variability in a three to seven year band associated with the El Niño Southern Oscillation (ENSO) [Cayan et al., 1999]. Similar high-frequency peaks exist in the subbasin reconstructions. Examination of gauged values and ENSO indicates a good correspondence between La Niña events and low flows on the San Juan, but the relationship is less clear in the other gauges. This agrees with Cayan and Webb  who found that streamflow in the southwestern part of Colorado typically shares the strong southwestern United States response to ENSO (i.e., increased winter precipitation during El Niño events), while the response is much weaker at gauges north of this region, and Hidalgo and Dracup  who reported the ENSO response is much weaker in the Colorado Headwaters and Upper Green River areas.
 Coherency between flows at multidecadal and longer timescales also suggests that remote forcing or region-wide circulation features influence lower-frequency variations in the Upper Colorado River. Although statistical associations have been demonstrated between North American drought and North Atlantic [Enfield et al., 2001], North Pacific [Cayan et al., 1998; McCabe et al., 2004] and Indian Ocean [Hoerling and Kumar, 2003] variability, more research is needed to understand how slow changes in sea surface temperatures are tied to Upper Colorado River flow regimes.
 Overall, intrabasin variations in reconstructed drought magnitude, combined with spectral analyses suggesting variability over a broad range of timescales (interannual to multidecadal), indicate complex and possibly nonstationary linkages between the Upper Colorado River and regional to remote forcings. Independent proxy data for ocean variability (i.e., not from western North American tree rings) and modeling studies are needed to better examine the long-term relationships between Colorado River flows and potential climatic drivers.
6.3. Implications for Management
 The recent drought has been a wake-up call for many water management agencies throughout the Colorado River basin. This drought (2000–2004), as measured by 5-year running means of water year total flow at Lees Ferry, is a markedly severe event in the context of the tree ring reconstruction extending to 1490, and the probability is low (p < 0.10) that any 5-year period since 1850 has been as dry. However, the current drought is not without precedent in the tree ring record. Average reconstructed annual flow for the period 1844–1848 was lower than the observed flow for 1999–2004. In view of reconstruction error, it is helpful to evaluate tree ring reconstructions probabilistically, and such an evaluation suggests that eight periods between 1536 and 1850 had at least a 10% probability of being as dry as 1999–2004. In addition, longer duration droughts have occurred in the past. The Lees Ferry reconstruction contains one sequence each of six, eight, and eleven consecutive years with flows below the 1906–1995 average (1663–1668, 1776–1783, and 1873–1883). Overall, these analyses demonstrate that severe, sustained droughts are a defining feature of Upper Colorado River hydroclimate. Flows in the Upper Colorado are also shown to be nonstationary over decadal and longer timescales, making short-term records inappropriate for most planning and forecast applications.
 Although our results differ in some respects from those of Stockton and Jacoby [1976], the underlying messages are the same. The long-term perspective provided by tree ring reconstructions points to looming conflict between water demand and supply in the upper Colorado River basin. This suggestion has even greater relevance today. Demands on the Colorado River over the past decades have risen to meet or exceed average water availability. Any variations or shifts in climate can have a significant impact on the system [Harding et al., 1995; Christensen et al., 2004]. The sensitivity of the Colorado River system became abundantly clear with the onset of the recent drought. Though the southern portion of the Upper Colorado, as well as many areas in the Lower Basin, gained a measure of drought relief in the winter of 2004–2005, major reservoirs on the Colorado River remained far below capacity in 2005. In the future, predicted climatic changes, including a shift in the ratio of snowfall to rainfall and earlier snowmelt and runoff [Cayan et al., 2001; Stewart et al., 2004], will likely compound the strain on water resources throughout the entire Colorado River Basin.
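 Sequences of consecutive below-average years, such as those listed above, can be found with a simple run-length scan. The sketch below uses a hypothetical function name and invented flow values; the reference mean would be the 1906–1995 average in the paper's setting.

```python
from itertools import groupby


def below_mean_runs(years, flows, reference_mean, min_length=2):
    """Return (first year, last year, length) for each run of consecutive
    years with flow below reference_mean, keeping runs of at least min_length."""
    runs = []
    paired = zip(years, flows)
    for is_low, group in groupby(paired, key=lambda p: p[1] < reference_mean):
        if is_low:
            run_years = [year for year, _ in group]
            if len(run_years) >= min_length:
                runs.append((run_years[0], run_years[-1], len(run_years)))
    return runs


# Hypothetical annual flows (arbitrary units) with a reference mean of 15
years = list(range(1870, 1880))
flows = [20, 10, 10, 10, 20, 20, 10, 10, 20, 10]
runs = below_mean_runs(years, flows, reference_mean=15)
print(runs)
```

 Sorting the returned tuples by length gives the longest sustained low-flow sequences in a record.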
 Many such climatic changes may have already begun in the western United States [Mote et al., 2005], and rising temperatures will also increase demands for irrigation and hydropower generation. Proxy reconstructions can aid in planning for these scenarios by providing insights into the range of natural variability and a means to explore extreme climatic events and persistent climatic changes that are poorly captured in observational records. Reconstructions of annual streamflow for large rivers are particularly useful in that they integrate climatic variability over large regions, provide essential data for water managers, and complement existing reconstructions of seasonal climate variability [e.g., Cook et al., 2004]. In concert with information on projected future changes, information on long-term variability must guide planning for drought management and economic development in the basin if we are to adequately face the social, legal and environmental challenges that coming decades will undoubtedly present.
 S. T. Gray was funded by the U.S. Geological Survey and Wyoming Water Development Commission. D. M. Meko was funded by a grant from the Arizona Board of Regents Technology and Research Initiative Fund. C. A. Woodhouse received funding from the NOAA Office of Global Programs Climate Change Data and Detection program (grant GC02-046). We greatly appreciate the comments of Edward Cook and two anonymous reviewers. We also thank Jeff Lukas, Mark Losleben, Margot Kaye, Gary Bolton, Kurt Chowanski, Stephen Jackson, Julio Betancourt, and R.G. Eddy for field and laboratory assistance in tree ring chronology data collections and chronology development and James Prairie (USBR) for providing the estimates of natural flow for Colorado River basin gauges used in the calibrations.