Journal of Geophysical Research: Atmospheres

Detecting inhomogeneities in the Twentieth Century Reanalysis over the central United States

Authors


Abstract

[1] The Twentieth Century Reanalysis (20CR), which spans the 138-year period from 1871 to 2008, was intended for a variety of climate applications, including long-term trend assessment. Because, over land, the 20CR assimilates only surface pressure observations, and their count increases by an order of magnitude over the course of the record, a key question is whether the 20CR is homogeneous and hence suitable for detecting climate-related changes. We use three statistical methods (the Pettitt and Bai-Perron tests and segmented regression) to detect abrupt shifts in the mean and uncertainty fields of multiple hydrometeorological variables over the central United States. For surface air temperature and precipitation, we use the Climate Research Unit (CRU) time series data set for comparison. We find that for warm-season months, there is a consensus change point among all variables between 1940 and 1950, which is not substantiated by the CRU record. While we cannot say with certainty that these shifts in the 20CR analysis fields are artificial, our statistical analyses, coupled with a visual inspection of the underlying assimilated observational count time series, strongly point to this conclusion. Our recommendation is therefore for users to restrict climate trend applications over the central United States to the second half-century of the 20CR record, after observational density has stabilized.

1. Introduction

[2] A number of recent modeling studies suggest that, as our climate warms, the global hydrologic cycle will accelerate, and perhaps already is accelerating (i.e., increased precipitation, runoff, and evapotranspiration [Huntington, 2006]). For some regions, near-past observed trends are consistent with acceleration [e.g., Dery et al., 2009; Intergovernmental Panel on Climate Change, 2007; Labat et al., 2004]. Over the continental United States and Australia, however, there is empirical evidence to the contrary, indicating that the cycle is decelerating owing to a decrease (i.e., stilling) in global mean winds [McVicar et al., 2008; Pryor et al., 2009], or because a moisture constraint to energy- and temperature-driven acceleration has recently been met [Jung et al., 2010]. Our capacity to mitigate the socioeconomic impacts of climate change on the hydrologic cycle hinges on our ability (1) to diagnose the sign and magnitude of (historical) trends in hydrologic fluxes at local-to-regional scales, (2) to attribute these shifts, whether slowly varying or abrupt, to key processes, feedback mechanisms, and observable quantities, and ultimately, (3) to tie regional changes into a global perspective (i.e., teleconnections). Having long-term comprehensive and consistent observationally based records of the global surface fluxes and multilevel atmospheric state that are spatially and temporally continuous is critical at all levels of understanding.

[3] Global atmospheric reanalyses such as NASA's Modern Era Retrospective-analysis for Research and Applications (MERRA) [Rienecker et al., 2008, 2011] and the Japanese 25 year Reanalysis (JRA-25) [Onogi et al., 2005, 2007] provide the only such record of the climate system suitable for examining the causality of changes in a variety of complementary variables. Unfortunately, their application in long-term trend assessment is precluded by two key shortcomings: (1) they suffer from observational “shocks,” or unphysical time-varying biases associated with an evolving observational system (e.g., the introduction of Special Sensor Microwave Imager (SSM/I) in July 1987 and Advanced Microwave Sounding Unit (AMSU-A) in November 1998 [Bosilovich et al., 2008; Robertson et al., 2011]), and (2) their temporal coverage (1979 to present) is insufficient for separating abrupt artificial shifts from natural climate variations [Thorne and Vose, 2010].

[4] The Twentieth Century Reanalysis Project (20CR) [Compo et al., 2011] seeks to address both of the above limitations (data inhomogeneity and short record) through the assimilation of surface synoptic observations only (surface and sea level pressure data; monthly sea surface temperature and sea ice extent also used to prescribe atmospheric boundary conditions), and extension of the temporal coverage to 1871. Unlike other reanalyses, the 20CR is intended for use in long-term climate applications “ranging from assessments of storm track and extreme event variations to studies on drought and decadal variability to investigations into meteorological history” [Compo et al., 2011]. A unique attribute of the 20CR is that it provides quantitative uncertainty estimates (i.e., the ensemble spread) for each variable field at every time step. These uncertainty estimates are a function of atmospheric dynamics, forecast model errors, and observational errors (e.g., assimilated observation distribution and density). These complementary data represent a key source of information to separate climate-related changes from nonclimate related artifacts.

[5] In this study, we address two key questions: (1) Can the 20CR be used for the assessment of changes in different hydrometeorological variables over the 20th century? (2) What would an experimental framework for differentiating between artificial and true climate signals comprise? We answer the former through applying the latter.

2. Study Region and Data

2.1. Study Region

[6] We focus on a 15° × 15° region (30°–45°N, 105°–90°W) in the central United States where improved understanding of 20th century hydroclimatological trends and variability is a top priority to both scientific and economic stakeholders. The region comprises more than half of the total harvested area of principal crops in the United States [U.S. Department of Agriculture, 2011], a large portion of which is sustained through groundwater inputs from the Ogallala Aquifer. Unlike much of the global land area, the central United States experienced a 0.2–0.8°C decrease in summer (June, July, August) daily mean temperature during the period 1976–2000 [Folland et al., 2001, Figure 2.10]. Attribution of the so-called surface “warming hole” remains an open question [Kunkel et al., 2006], but is likely the result of changes in the atmospheric circulation (stronger and more frequent Great Plains low-level jets) and associated changes in soil wetness and surface energy balance [Pan et al., 2004], amplified by large-scale irrigation effects [DeAngelis et al., 2010]. The domain also encompasses one of only a few Global Energy and Water Cycle Experiment (GEWEX) Global Land-Atmosphere Coupling Experiment (GLACE) [Koster et al., 2006] land-atmosphere interaction “hot spots,” where according to several atmospheric general circulation models, soil moisture anomalies have the greatest impact on subsequent precipitation. Accordingly, the region represents an exemplary case study for regional climate change, land-atmosphere interaction (coupling) and its sensitivity to climate change, the sensitivity (masking) of climate change to (by) large-scale groundwater usage, as well as the sensitivity of agricultural productivity to hydrologic variability. Given that the United States leads the world in food exports [Konar et al., 2011], any fluctuations in the region's agricultural productivity could have reverberating consequences globally.

2.2. Twentieth Century Reanalysis

[7] The 20CR version 2 [Compo et al., 2011] is a 2.0° horizontal, 24-layer vertical, 6-hourly global reanalysis covering the period 1871–2008. The 20CR is remarkable in that it is the first reanalysis to span the entire 20th century and assimilates the following surface observational data only: surface and sea level pressure observations every 6 h from the International Surface Pressure Databank (ISPDv2.2.4) and monthly sea surface temperature (SST) and sea ice concentration fields from the Hadley Centre Sea Ice and SST data set [HadISSTv1.1; Rayner et al., 2003]. It was motivated by the need for a reanalysis with minimal influence from observational shocks, such as those caused by short or intermittent records and changes in observational technology (i.e., the introduction of radiosondes in the 1940s and Earth observing satellites in the 1970s), with quantified random errors in the fields at each time step. The 20CR uses a deterministic ensemble Kalman filter (EnKF), as described by Compo et al. [2011], based on the ensemble square-root filter algorithm of Whitaker and Hamill [2002]. Background (first guess) fields are obtained from a parallel short-term forecast ensemble of fifty-six 9 h integrations of the April 2008 experimental version of the NCEP GFS [Kanamitsu et al., 1991; Moorthi et al., 2001; Saha et al., 2006], each initialized from the previous 6 h analysis. The GFS is coupled with the four-layer NOAH v2.7 land surface model [Ek et al., 2003] and run at a horizontal resolution of T62 (192 × 94 Gaussian longitude/latitude), vertical resolution of 28 hybrid sigma-pressure levels, and temporal resolution of 3 h. For each time iteration of the assimilation (6-hourly) and forecast (3-hourly) systems, the ensemble mean and spread are recorded for all variable fields.
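To make the source of the 20CR ensemble mean and spread concrete, the following minimal R sketch performs a single-observation ensemble square-root filter update in the spirit of Whitaker and Hamill [2002]. The function name, dimensions, observation operator, and toy values are purely illustrative and are not taken from the 20CR system.

```r
# Sketch of a serial (one observation at a time) ensemble square-root
# filter update, following Whitaker and Hamill [2002]. Dimensions and
# values are illustrative only, not those of the 20CR.
ensrf_update <- function(X, H, y, r) {
  # X: n_state x n_ens background ensemble; H: 1 x n_state obs operator
  # y: observed value (scalar); r: observation error variance
  n_ens  <- ncol(X)
  x_mean <- rowMeans(X)
  Xp     <- X - x_mean                             # ensemble perturbations
  hx     <- as.vector(H %*% X)                     # ensemble in observation space
  hxp    <- hx - mean(hx)
  phht   <- sum(hxp^2) / (n_ens - 1)               # H P H' (scalar)
  pht    <- as.vector(Xp %*% hxp) / (n_ens - 1)    # P H'
  K      <- pht / (phht + r)                       # Kalman gain (mean update)
  alpha  <- 1 / (1 + sqrt(r / (phht + r)))         # reduced gain (spread update)
  x_mean_a <- x_mean + K * (y - mean(hx))          # analysis mean
  Xp_a     <- Xp - (alpha * K) %*% t(hxp)          # analysis perturbations
  sweep(Xp_a, 1, x_mean_a, "+")                    # analysis ensemble
}

# Toy example: 3 state variables, 56 members, one pressure observation
set.seed(1)
Xb <- matrix(rnorm(3 * 56, mean = c(1000, 288, 0.5), sd = 2), nrow = 3)
Xa <- ensrf_update(Xb, H = matrix(c(1, 0, 0), 1), y = 1002, r = 1)
apply(Xa, 1, sd)  # ensemble spread, analogous to the archived 20CR uncertainty
```

The spread of the analysis ensemble (last line) is the quantity that 20CR archives as its uncertainty estimate; it shrinks as more, and more accurate, observations are assimilated.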

[8] We analyze the monthly record of six primary land surface hydrological variables: 2 m air temperature (Ta), surface runoff (Q), precipitation (P), surface latent heat flux (LE), surface sensible heat flux (H), total column precipitable water vapor/ice (PWV); and three derived metrics descriptive of land-atmosphere interactions: lifting condensation level (LCL), low-level humidity index (HI), and convective triggering potential (CTP) [e.g., Betts, 2009; Ferguson and Wood, 2011]. We use the 3 h forecast (first guess) fields, except in the case of HI and CTP, which are derived from the analysis (i.e., surface pressure and multilevel fields of geopotential height, specific humidity, and air temperature). For the primary variables, we analyze the associated monthly time average of the 3-hourly uncertainty fields as well. LCL, HI, and CTP are not computed by 20CR directly and thus, no estimate of uncertainty is available. We use the monthly assimilated observation counts, available on a 5° × 5° geographic grid, to visually confirm correspondence between assimilated observation count and uncertainty.
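For orientation, the following R fragment gives simple, commonly used approximations of the LCL and the low-level humidity index. These are illustrative forms only; the metrics in this study are derived following Betts [2009] and Ferguson and Wood [2011], whose exact formulations may differ, and the level choices and input values below are assumptions.

```r
# Illustrative approximations only, not the formulations used in the paper.

# LCL height (m) from 2 m temperature and dewpoint (deg C), using the
# common ~125 m per deg C dewpoint-depression rule of thumb
lcl_height <- function(t2m, td2m) 125 * (t2m - td2m)

# Low-level humidity index (deg C): sum of dewpoint depressions at two
# lower-tropospheric levels (here assumed to be 950 and 850 hPa)
humidity_index <- function(t950, td950, t850, td850) {
  (t950 - td950) + (t850 - td850)
}

lcl_height(30, 18)              # 1500 m on a warm, dry afternoon
humidity_index(28, 20, 20, 12)  # 16 deg C: drier low levels
```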

2.3. Climate Research Unit Data

[9] The Climate Research Unit (CRU) time series data set version 3.1 (TS3.1) is a 0.5° gridded record of monthly surface climate (precipitation, mean temperature, diurnal temperature range, and other secondary variables) for the period 1901–2009, derived entirely from daily ground meteorological observations [Mitchell and Jones, 2005; Mitchell et al., 2004; New et al., 2000]. TS3.1 fields are the product of an angular distance weighted interpolation of monthly climate anomalies relative to the 1961–1990 mean, subsequently recombined with an equivalent grid of normals for the same baseline period [New et al., 1999]. In estimating each grid point, TS3.1 uses the eight nearest station records, regardless of direction, within an empirically derived correlation decay distance (CDD) of 450 km for precipitation and 1200 km for temperature [New et al., 2000]. If a grid point lies beyond the CDD of all stations, the grid point is “relaxed” to the 1961–1990 mean. The underlying motivation for an anomaly (rather than absolute) approach is that while the normal may vary considerably at local scales, most aspects of climate variation at the yearly time scale occur over much larger spatial scales [Mitchell and Jones, 2005]. Unlike previous releases (i.e., TS2.1) that applied a modified version of the Global Historical Climatology Network (GHCN) method of homogenization [Mitchell and Jones, 2005], no homogenization is performed in TS3.1. Major sources of error in the TS3.1 include instrumental measurement error, insufficient station density, and interpolation errors [New et al., 2000]. For each grid cell and time step, two contributing station counts are provided for each variable, one that considers only those stations contributing from within the grid cell itself and another that tabulates all contributing stations within the variable-specified CDD.
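The following R sketch illustrates the anomaly-then-recombine logic described above. The exponential distance weight is only a stand-in for the full angular distance weighting scheme of New et al. [2000], and the station anomalies and distances are invented; the fragment shows the structure of the approach, not CRU's algorithm.

```r
# Simplified illustration of the CRU anomaly approach: interpolate station
# anomalies (relative to the 1961-1990 normal) to a grid point, then
# recombine with that grid point's normal. The exp(-d/CDD) weight is an
# illustrative stand-in for the angular distance weighting of New et al.
# [2000]; the CDDs quoted in the text are 450 km (P) and 1200 km (Ta).
interp_anomaly <- function(stn_anom, stn_dist_km, grid_normal, cdd_km) {
  use <- stn_dist_km <= cdd_km
  if (!any(use)) return(grid_normal)      # "relaxed" to the 1961-1990 mean
  # keep at most the eight nearest in-range stations, regardless of direction
  idx <- order(stn_dist_km[use])[seq_len(min(8, sum(use)))]
  d   <- stn_dist_km[use][idx]
  w   <- exp(-d / cdd_km)                 # illustrative weight only
  grid_normal + sum(w * stn_anom[use][idx]) / sum(w)
}

# e.g., three stations with Ta anomalies (deg C) at given distances (km)
interp_anomaly(stn_anom = c(0.4, 0.1, -0.2),
               stn_dist_km = c(150, 600, 1500),
               grid_normal = 26.0, cdd_km = 1200)
```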

[10] We use the TS3.1 shared variable fields (with 20CR), Ta and P, along with associated within-cell contributing station counts. The TS3.1 (hereafter, simply referred to as “CRU”) and 20CR data sets overlap from 1901 to 2008. Because CRU is derived entirely from daily ground observations of Ta and P, it is not subject to the modeling artifacts affecting the 20CR data. Therefore, if changes are detected in both 20CR and CRU, it is more likely that they are climate related. On the other hand, if changes are detected in 20CR but not in CRU, and those changes coincide with changes in the 20CR uncertainty estimates, this points to data artifacts rather than climate-related signals.

3. Methodology

[11] Many statistical techniques (i.e., tests) exist for detecting inhomogeneities in climate data (see Peterson et al. [1998] and Reeves et al. [2007] for a review). These tests differ in terms of distributional assumptions (e.g., Gaussian, nonparametric), whether they can detect single or multiple change points, whether or not the timing of the abrupt change is known a priori, and whether abrupt changes and linear trends are estimated simultaneously. In this study, we use the following three techniques to examine the presence of abrupt changes in the monthly data and associated uncertainty fields (primary variables only): the Pettitt test [Pettitt, 1979], the Bai-Perron test [Bai, 1997; Bai and Perron, 2003], and segmented regression [Muggeo, 2003]. We provide only a brief description of the techniques here and point the interested reader to the original references for more details.

[12] The nonparametric Pettitt test is based on the Mann-Whitney test for evaluating whether two independent samples come from the same population. It is designed to detect a single abrupt change in the mean of the distribution of the variable of interest at an unknown point in time. It is also possible to compute the statistical significance of the test, as described by Pettitt [1979]. We set the test significance level to 5%. While the Pettitt test has been applied previously in several hydrometeorological studies [e.g., Busuioc and von Storch, 1996; Bárdossy and Caspary, 1990; Villarini et al., 2009], this is not the case for the Bai-Perron test and segmented regression, which instead are widely used in other disciplines (e.g., econometrics, ecology, and biostatistics).
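For concreteness, a minimal R implementation of the Pettitt test, written directly from the published rank-based statistic, is sketched below; it is for illustration only and is not the code used in this study. The synthetic series and seed are arbitrary.

```r
# Minimal sketch of the Pettitt [1979] test for a single abrupt change in
# the mean at an unknown time, using the standard rank-based statistic.
pettitt_test <- function(x) {
  n <- length(x)
  r <- rank(x)                              # Mann-Whitney-type ranks
  # U_t = 2 * sum of the first t ranks - t * (n + 1), for t = 1..n-1
  U <- 2 * cumsum(r)[-n] - seq_len(n - 1) * (n + 1)
  K <- max(abs(U))                          # test statistic
  t_change <- which.max(abs(U))             # last point of the first segment
  p <- 2 * exp(-6 * K^2 / (n^3 + n^2))      # approximate p value
  list(change_point = t_change, K = K, p.value = min(1, p))
}

# e.g., a series with a step change in the mean partway through
set.seed(42)
x <- c(rnorm(70, mean = 0), rnorm(68, mean = 1))
pettitt_test(x)   # change point near index 70, p well below 0.05
```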

[13] The Bai-Perron test allows for the detection of multiple change points at unknown points in time. It is based on a standard linear regression model in which the null hypothesis being tested is that the regression coefficients remain constant. The alternative hypothesis is that at least one of the regression coefficients changes over time. It represents a generalization of the statistical F test [e.g., Andrews, 1993], allowing for the detection of multiple breaks. The number of change points is chosen by using the Schwarz Bayesian criterion (SBC) [Schwarz, 1978] as the penalty criterion for model selection [Bai and Perron, 2003]. For each change point, we include its 95% confidence interval.
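The sketch below shows what such an analysis looks like with the strucchange R package [Zeileis et al., 2002, 2003], the package used in this study (section 3); the synthetic series, seed, and break location are illustrative only, standing in for a monthly 20CR regional mean.

```r
# Sketch of a Bai-Perron analysis with the strucchange package; the
# synthetic series stands in for a monthly 20CR regional mean.
library(strucchange)

set.seed(7)
y    <- c(rnorm(70, 10, 1), rnorm(68, 12, 1))   # step change circa 1940
ts_y <- ts(y, start = 1871)

# Fit all admissible break configurations; the number of breaks is then
# chosen by minimizing the Schwarz Bayesian criterion (BIC).
bp <- breakpoints(ts_y ~ 1)
summary(bp)       # RSS/BIC for 0, 1, 2, ... breaks
breakdates(bp)    # estimated break date(s) under the BIC-optimal choice
confint(bp)       # 95% confidence intervals for the break date(s)
```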

[14] Segmented regression allows for the detection of multiple change points at unknown points in time, together with the estimation of continuous regression lines. The estimation procedure allows inference for all of the model's parameters, including the location of the change point (i.e., it provides confidence intervals of the change points). In presenting the results, we include the 95% confidence intervals of the change points. The model residuals are assumed to be Gaussian, but are not required to be homoscedastic (i.e., having constant variance). One of the difficulties in applying this method is that the user must provide initial guess values for the change point(s). We address this issue by performing a visual inspection of each time series and by examining the sensitivity of the results to different initial guess values [Muggeo, 2003].
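The following sketch shows the corresponding workflow with the segmented R package [Muggeo, 2008] used in this study, including the user-supplied initial guess (psi) discussed above; the synthetic ensemble-spread-like series and the initial guess of 1935 are illustrative assumptions.

```r
# Sketch of a segmented (broken-line) regression with the segmented
# package; the synthetic series mimics an ensemble spread record that
# declines and then levels off.
library(segmented)

set.seed(11)
year   <- 1871:2008
spread <- c(seq(4, 1, length.out = 70), rep(1, 68)) + rnorm(138, 0, 0.15)

fit0 <- lm(spread ~ year)                             # single straight line
fit1 <- segmented(fit0, seg.Z = ~ year, psi = 1935)   # initial guess: 1935
summary(fit1)   # segment slopes and the estimated break point
confint(fit1)   # 95% confidence interval for the change point year
```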

[15] We use the Pettitt and Bai-Perron tests to detect abrupt changes in the hydrometeorological variables because the inhomogeneities generally occurred as step changes. On the other hand, we use segmented regression to examine the presence of inhomogeneities in the associated uncertainty fields, owing to the broken-line nature of their time series. The use of these three tests concurrently, together with a visual assessment of the time series, provides a measure of the robustness of the results from any single test.

[16] One issue that deserves further discussion is related to the effects of data autocorrelation on the results of the analysis [e.g., Busuioc and von Storch, 1996; Lund et al., 2007]. When data are significantly autocorrelated, any test statistics derived from those data will be based on an effective sample size that is less than the actual sample size. Previously, the impact of serial correlation on the Pettitt test was examined by Busuioc and von Storch [1996]. They performed a Monte Carlo simulation consisting of multiple time series generated from an AR(1) (autoregressive of order 1) process with varying lag 1 autocorrelation values. They showed that the probability of erroneously detecting a change point rapidly increases with increasing lag 1 autocorrelation values. They recommend “prewhitening” as a solution to this issue. However, the issue becomes much more complex when real, rather than simulated, data are considered. This is because the presence of abrupt changes as well as trends can introduce spurious autocorrelation in the record. The question becomes, Do we detect a change point because the data are serially correlated, or do the data exhibit autocorrelation because of the presence of a change point? This leads to a circular argument.
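A small Monte Carlo in the spirit of Busuioc and von Storch [1996] illustrates the point. The fragment below reuses the pettitt_test() sketch given earlier; the AR(1) coefficients, series length, and simulation count are arbitrary choices for illustration.

```r
# Stationary AR(1) series with no true change point are flagged by the
# Pettitt test far more often than the nominal 5% as the lag 1
# autocorrelation grows. Uses pettitt_test() defined above.
false_alarm_rate <- function(phi, n = 138, nsim = 500, alpha = 0.05) {
  hits <- replicate(nsim, {
    x <- as.vector(arima.sim(list(ar = phi), n = n))
    pettitt_test(x)$p.value < alpha
  })
  mean(hits)
}

set.seed(3)
sapply(c(0.1, 0.3, 0.6), false_alarm_rate)
# rejection rate rises well above 0.05 as phi increases
```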

[17] The prewhitening method of Busuioc and von Storch [1996] is well suited for data that are either generated or well approximated by an AR(1) process. In our case, as in most applications with real data, we do not know the nature of the generating process, although we do know that making an erroneous assumption (e.g., AR(1)) would negatively affect our results. Therefore, we adopt the following two-part approach. First, we compute the lag 1 (year) autocorrelation coefficient for both the raw data and the change point detection model's residuals. Second, we visually inspect the time series of all variable and uncertainty fields for all months, and compare them with the results of the statistical analyses. If the residuals are effectively uncorrelated, it means that serial correlation is primarily caused by the change point(s). We will show (section 4) that the lag 1 residual autocorrelation is generally much smaller than the lag 1 autocorrelation of the data themselves, and that autocorrelation does not significantly affect our results. The time series plots, which are included as auxiliary material (Figures S1–S17 in Text S1), support the validity of our statistical analyses.
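The first part of this check amounts to a few lines of R, sketched below on a synthetic step-change series; the change point index is assumed known here, whereas in section 4 it comes from the Pettitt or Bai-Perron test applied to each 20CR series.

```r
# Compare the lag 1 autocorrelation of a series with that of its
# residuals once the detected step change is removed. Synthetic data;
# the 20CR analysis uses monthly regional means and ensemble spreads.
lag1 <- function(x) acf(x, lag.max = 1, plot = FALSE)$acf[2]

set.seed(5)
x  <- c(rnorm(70, 0), rnorm(68, 1))        # step change, no true AR(1)
cp <- 70                                   # change point from, e.g., Pettitt
resid_x <- x - ave(x, seq_along(x) > cp)   # remove the two segment means

lag1(x)        # inflated by the step change (well above zero)
lag1(resid_x)  # near zero once the change point is accounted for
# approximate 95% bounds for a white-noise series of this length:
c(-1, 1) * qnorm(0.975) / sqrt(length(x))
```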

[18] If some residual autocorrelation exists, the primary impact would be on the standard errors and confidence intervals of the estimated parameters. One possible remedy is to compute the variance-covariance matrix of the estimated coefficients using estimators that are robust to unmodeled autocorrelation and heteroskedasticity [e.g., Andrews, 1991; Zeileis, 2004]. Because in our study the residual autocorrelation tends to be small in most cases, we do not implement this or any alternative corrective measure. It is possible, therefore, that in a minority of cases, our confidence intervals are affected by some residual autocorrelation.
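Were such a correction needed, it could be applied, for example, with the sandwich and lmtest R packages (sandwich implements the heteroskedasticity- and autocorrelation-consistent estimators described by Andrews [1991] and Zeileis [2004]); the synthetic trend-plus-AR(1) series below is purely illustrative.

```r
# Recompute coefficient standard errors with an autocorrelation- and
# heteroskedasticity-robust (HAC) covariance estimator.
library(sandwich)
library(lmtest)

set.seed(9)
year <- 1871:2008
y    <- 0.01 * (year - 1871) + arima.sim(list(ar = 0.4), n = 138)

fit <- lm(y ~ year)
coeftest(fit)                        # conventional standard errors
coeftest(fit, vcov. = vcovHAC(fit))  # HAC-robust standard errors
```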

[19] All the calculations are performed in R [R Development Core Team, 2008]. We use the segmented package [Muggeo, 2008] to perform segmented regression, and the strucchange package [Zeileis et al., 2002, 2003] to perform the Bai-Perron test.

4. Results and Discussion

[20] We first perform a visual inspection of the 20CR and CRU time series. Figure 1 illustrates the time series of Ta and P for the month of July (see auxiliary material for a full collection of plots for all variables and months, together with results of the statistical analyses). The time series share a common mean of 25.9°C and agree to within 0.5°C for one third of the record, but are generally not well correlated (R2 of 0.28). The precipitation time series exhibit even greater discrepancies, with the 20CR having generally much larger values than CRU. The 20CR uncertainty estimates are shown to be inversely related to the number of assimilated observations (Figure 1, bottom). In the case of Ta, the uncertainty is reduced fivefold, from 3.8 to 0.7°C, between 1871 and 1940. The impact of assimilated observations on the uncertainties is even more evident for the precipitation data. The spread varies around a mean value of about 150 mm/month before 1940 and a value of about 90 mm/month after 1940. This behavior is consistent across all variables, for which the July uncertainty decreases by a factor of 2 to 3 between 1871 and 1940 (see auxiliary material). The two data sets benefit from different (clearly independent in some cases) underlying observational records whose density varies substantially over time (Figure 2). For example, the 20CR assimilated observation count nearly quintuples between 1929 and 1940, from 2,444 to 11,638 per month. The CRU Ta (P) station count increases steadily from 220 (320) in 1901 to 340 (440) in 1940, plateaus, and then falls off sharply in 1997 (1995) to a mere 45 (28) stations. The 20CR does not suffer from as dramatic a decline in recent years. Both data sets benefit from a high density of supporting observations during the period 1940 to 1990 (Figure 2).

Figure 1.

Over the study region (30°–45°N, 105°–90°W), the July (top) spatially averaged Ta and P from the Twentieth Century Reanalysis (20CR) and Climate Research Unit (CRU) and (bottom) 20CR ensemble spread (black line) and assimilated observation count (gray line). The ensemble spread is the time average of the spatially averaged 3-hourly values. The assimilated observation count is the total per July, not the mean per analysis.

Figure 2.

Comparison between the 20CR total assimilated observation count and the CRU contributing station count for July.

[21] The results of the statistical change point analyses are summarized in Figure 3. The overwhelming consensus across variables and tests is that an abrupt shift in the 20CR occurred circa 1950. During the warm season (May–September), 91% of Pettitt change points were detected between 1940 and 1950. Detected shifts in the cold season were less common, but generally fell within the period of 1905–1920. The Bai-Perron 95% confidence windows generally served to confirm the timing of Pettitt change points, bracketing them in 72% of all occurrences. In the case of primary variables (Ta, Q, P, LE, H, and PWV) for which uncertainty estimates are available, the Pettitt and Bai-Perron change points were found to correspond with segmented regression change points 21% and 71% of the time, respectively (the Pettitt test is designed to detect a single change point). Shared change points between the variable and uncertainty field time series bring the physical realism of the shift into question. Notably, if an abrupt shift is detected in the uncertainty field (i.e., segmented regression) but not in the respective variable field (i.e., Pettitt and Bai-Perron tests), it does not rule out an artificial effect on the variable, but rather implies that any such effect is undetectable given the variable's natural range of variability.

Figure 3.

Summary of the Pettitt (green), Bai-Perron (red), and segmented regression (black) detected change points for the following variables: (a) Ta, (b) Q, (c) P, (d) surface latent heat flux (LE), (e) H, (f) lifting condensation level (LCL), (g) convective triggering potential (CTP), (h) humidity index (HI), and (i) total column precipitable water vapor/ice (PWV). The Pettitt and Bai-Perron tests were performed on the 20CR ensemble mean fields, whereas segmented regression was performed on the uncertainty estimates (i.e., ensemble spread). The whiskers extend to 95% confidence limits for the Bai-Perron test and segmented regression.

[22] As discussed in section 3, we evaluate the extent of serial correlation in the 20CR and its impact on our analysis by computing the lag 1 autocorrelation coefficients of the monthly mean data. Figure 4 shows that the degree of autocorrelation is both variable dependent and seasonal, with peak values ranging from 0.4 to 0.6 during the warm season. After the presence of abrupt shifts in the data is accounted for, the remaining autocorrelation is generally not statistically different from zero at the 5% level (Figure 4). This is truer of the Bai-Perron residual time series than of the Pettitt ones; accounting for multiple change points results in the largest reduction of serial correlation in nearly all cases. As shown in Figure 5, the uncertainty (i.e., ensemble spread) fields are more strongly autocorrelated, with values of 0.9 not uncommon, and with little seasonal variability. This should not come as a surprise given the sheer number of (segmented regression) change points detected in the uncertainty fields relative to the variable time series (Figure 3; see also auxiliary material). Once again (as in Figure 4), the model residuals exhibit a much smaller autocorrelation, even though there is some residual autocorrelation for several variables and months. As mentioned toward the end of section 3, it is possible that the confidence intervals for these cases are affected by unmodeled residual autocorrelation. The realism of these abrupt changes is supported by visual examination of the time series of the uncertainty fields (auxiliary material).

Figure 4.

The lag 1 (year) autocorrelation values of the 20CR (a) Ta, (b) Q, (c) P, (d) LE, (e) H, (f) LCL, (g) CTP, (h) HI, and (i) PWV time series (black), as well as the time series of their residuals, after accounting for the presence of change points detected by the Pettitt (green) and Bai-Perron (red) tests. All three values (black, green, and red) overlap in the case that neither method detects a change point. Green and red values overlap in the case that the Bai-Perron test yields a single change point that concurs with the Pettitt test; they are different in the case that the Bai-Perron test yields multiple change points or a single break in a year different from the Pettitt test. The dashed lines represent the 95% confidence intervals of the autocorrelation function.

Figure 5.

The lag 1 (year) autocorrelation values of the 20CR (a) Ta, (b) Q, (c) P, (d) LE, (e) H, and (f) PWV ensemble spread time series (black), as well as the time series of their residuals, after accounting for the presence of change points detected through segmented regression (red). The points overlap in the case that no change points are detected (i.e., for Q, June, July, August, October, and November; see also Figure 3). The dashed lines represent the 95% confidence intervals of the autocorrelation function.

[23] We tested the CRU Ta and P data sets using the Pettitt and Bai-Perron tests and detected no significant change points in the 1901–2008 record (not shown). Moreover, the mean (linear) trends for both variables were not statistically different from zero at the 5% significance level. Hence, according to CRU, there has been neither an abrupt shift in Ta or P nor an acceleration of the hydrologic cycle over this region during the past century. While we cannot unequivocally conclude that the shift in the 20CR analysis fields is artificial, the fact that (1) the timing of the abrupt change closely follows a dramatic shift in the density of the underlying observational record and (2) the change point is not corroborated by the CRU data sets for Ta and P makes the homogeneity of the 20CR in this region at the very least questionable. This interpretation is further substantiated by the results of the Pettitt test performed by DeAngelis et al. [2010] on the National Climatic Data Center (NCDC) monthly P for the period 1900–2000 over three subdomains ((1) 104°–98°W, 33°–44°N; (2) 98°–92°W, 36°–45°N; and (3) 92°–85°W, 36°–45°N) of our study region. Considering warm-season months (May–September) exclusively, only a single change point (1947) was detected at the 5% significance level for one region (region 3) and one month (July) [DeAngelis et al., 2010]. It is therefore striking that in our analysis of 20CR we detected a statistically significant change point for nearly all warm-season months during the period 1940–1950, consistently across multiple surface and atmospheric variables. Our hypothesis is that the observed instantaneous, full-system shift is the artifact of a tightly coupled reanalysis reacting to an abrupt observational shock in its only assimilated data stream over land (i.e., surface pressure).

[24] The fact that change point years do not align exactly for all months and variables, we argue, is likely an artifact of spatial sampling bias (see Figures S18 and S19 in Text S1), and serves only to strengthen our observation network hypothesis. For example, sampling can explain the cold-season change points detected in the early part of the record because not only is the record most sensitive to additional observations at this time (because of low total assimilated observation counts and nonuniform spatial coverage) but also the region, which spans 15° in latitude and longitude, is typified by sharp north–south (cold-warm) gradients in Ta and east–west (wet-dry) gradients in P during the cold season. If additional observations were assimilated from the southeast and south central subregions, which did occur starting March 1908 and November 1913, respectively, then this could explain the increase in September–December P (in liquid form), increase in Q, and increase (decrease) in LE (H), related to evapotranspiration from a wetter (and unfrozen) root zone. Evaporative cooling could explain why the regional mean Ta does not exhibit an abrupt increase. Sampling could also explain the range in detected change point years in the 1940s. During this period, the spatial distribution of assimilated observations is in constant flux (Figure S19 in Text S1). Not until circa 1948, when all nine 5° × 5° grid boxes that compose the region contribute over 800 observations per month, is some relative spatial uniformity in sampling achieved. The region's semiarid climate (i.e., it varies temporally between water- and energy-limited states) makes attribution of intervariable differences in change points difficult. So-called “data denial” experiments offer the only definitive means of testing 20CR's sensitivity to observations from a particular station or subregion.

[25] Importantly, we cannot entirely rule out the possibility that SST biases [e.g., Kennedy et al., 2011a, 2011b] and their correction (or lack thereof) play a shared role in bringing about the step shift in the 1940s. After all, 1942 is, coincidentally, the year in which Folland and Parker's [1995] bias corrections to the HadISST v1.1 were discontinued. And from 1942 until the early 1950s, there remains a great deal of uncertainty concerning how SST measurements were made [Kennedy et al., 2011b].

[26] Using a coupled atmosphere-ocean general circulation model, Folland [2005] quantified the impact of SST bias corrections applied to the Hadley Centre's global sea ice and SST GISST3.1 (which was based on similar data to HadISSTv1.1) (N. Rayner, personal communication, 2011) on modeled surface air temperature globally. In that study, the central U.S. was found to be especially sensitive to the correction relative to the rest of the world, with a cooling response of more than 1.2°C in the 1930–1939 decadal mean temperature [Folland, 2005, Figure 2a]. Similarly, the most direct assessment of the assimilated SST's role in causing the abrupt shift in 20CR would be to replay the reanalysis using the latest and third version of the Hadley Centre gridded SST data set, HadSST3 [Kennedy et al., 2011b], which includes adjustments for post-1941 changes in instrumentation. Understandably, such an effort falls beyond the scope of this paper.

[27] We reviewed the region's historical record for prolonged drought events and expansion of irrigated area, which are both potential triggers for climate shift [e.g., Nicholson et al., 1998; DeAngelis et al., 2010; Puma and Cook, 2010]. According to the CRU-derived Standardized Precipitation Evapotranspiration Index database (SPEI) [Vicente-Serrano et al., 2010], there were four extended (18 months) drought events (SPEI < −1): March–September 1911, April 1934 to August 1935, July 1954 to April 1957, and December 1963 to October 1964. None of these events overlap with the 20CR's change point. Irrigation-related impacts could explain warm-season change points. The largest expansion of irrigated area over the region occurred between 1949 and 1974, when the groundwater withdrawals increased by 475% [McGuire, 2009]. After 1974, withdrawal rates remained approximately stable [McGuire, 2009]. Irrigation over Texas comprised more than half of the total Great Plains withdrawals until 1969 [DeAngelis et al., 2010, Figure 1a]. Considering that Texas lies upstream from the remainder of the region, it is feasible that localized surface moisture anomalies (i.e., irrigation or drought) there contributed to rainfall anomalies across the remainder of the region. Neither the CRU nor NCDC records of P, however, provide evidence of a related change point.

5. Conclusions

[28] We have presented evidence that suggests the 20CR is affected by inhomogeneities, which resulted in abrupt changes between 1940 and 1950 over the central United States during the warm season. On the basis of our findings, we cannot recommend the use of the full 138-year 20CR for calculating climate trends because of the large impacts that these abrupt changes would have on the trend analyses. Instead, we recommend that users focus on the second half-century of the record (1960 to present), for which a dense network of pressure observations exists. We accept that the role of inhomogeneities in the 20CR likely varies regionally, and we have planned a follow-up study to address this issue. It is appropriate to acknowledge that the 20CR, by assimilating surface synoptic observations only, represents a first step toward achieving climate-quality reanalyses through improved input data source consistency. It is the first and only reanalysis to span the 20th century, and there is presently no alternative reanalysis source for the period 1871–1947. Besides trend analysis, the 20CR is proving to be an important source of information on regional atmospheric [e.g., Hakkinen et al., 2011] and hydroclimatological [e.g., Ionita et al., 2011; Cook et al., 2011] variability and extremes. If the 20CR could be bias corrected and downscaled with observations [e.g., Sheffield et al., 2006], it would provide the first observationally based off-line land surface model forcing data set to span the late 19th century to present. Future versions of the 20CR may be significantly improved through continued data rescue and/or data archeology.

[29] Reanalyses serve as the only observationally based tool for comprehensive evaluation of the climate system. Their utility extends far beyond that of single-variable data sets (e.g., Ta or P), but as we have shown, single-variable data sets are crucial to verifying their accuracy. Differentiating between real and artificial climate variability is a critical issue that requires continued consideration.

Acknowledgments

[30] The lead author is supported by the Japan Society for the Promotion of Science Postdoctoral Fellowship for Foreign Researchers P10379: Climate change and the potential acceleration of the hydrological cycle. The second author acknowledges financial support from the Willis Research Network. Support for the Twentieth Century Reanalysis (20CR) Project data set is provided by the U.S. Department of Energy, Office of Science Innovative and Novel Computational Impact on Theory and Experiment (DOE INCITE) program, and Office of Biological and Environmental Research (BER), and by the National Oceanic and Atmospheric Administration Climate Program Office. The 20CR version 2.0 data were obtained from the Research Data Archive (RDA; http://dss.ucar.edu), which is maintained by the Computational and Information Systems Laboratory (CISL) at the National Center for Atmospheric Research (NCAR). NCAR is sponsored by the National Science Foundation (NSF). The CRU TS3.1 data set was obtained in May 2011 from the British Atmospheric Data Centre (BADC; http://badc.nerc.ac.uk). Gilbert Compo provided the 20CR assimilated observation count data set, as well as a helpful critique of the manuscript. The authors acknowledge helpful correspondences with Vito Muggeo and Achim Zeileis and thank them for making the segmented [Muggeo, 2008] and strucchange [Zeileis et al., 2002, 2003] packages freely available in R [R Development Core Team, 2008].