The Twentieth Century Reanalysis Project



The Twentieth Century Reanalysis (20CR) project is an international effort to produce a comprehensive global atmospheric circulation dataset spanning the twentieth century, assimilating only surface pressure reports and using observed monthly sea-surface temperature and sea-ice distributions as boundary conditions. It is chiefly motivated by a need to provide an observational dataset with quantified uncertainties for validations of climate model simulations of the twentieth century on all time-scales, with emphasis on the statistics of daily weather. It uses an Ensemble Kalman Filter data assimilation method with background ‘first guess’ fields supplied by an ensemble of forecasts from a global numerical weather prediction model. This directly yields a global analysis every 6 hours as the most likely state of the atmosphere, and also an uncertainty estimate of that analysis.

The 20CR dataset provides the first estimates of global tropospheric variability, and of the dataset's time-varying quality, from 1871 to the present at 6-hourly temporal and 2° spatial resolutions. Intercomparisons with independent radiosonde data indicate that the reanalyses are generally of high quality. The quality in the extratropical Northern Hemisphere throughout the century is similar to that of current three-day operational NWP forecasts. Intercomparisons over the second half-century of these surface-based reanalyses with other reanalyses that also make use of upper-air and satellite data are equally encouraging.

It is anticipated that the 20CR dataset will be a valuable resource to the climate research community for both model validations and diagnostic studies. Some surprising results are already evident. For instance, the long-term trends of indices representing the North Atlantic Oscillation, the tropical Pacific Walker Circulation, and the Pacific–North American pattern are weak or non-existent over the full period of record. The long-term trends of zonally averaged precipitation minus evaporation also differ in character from those in climate model simulations of the twentieth century. Copyright © 2011 Royal Meteorological Society and Crown Copyright.

1. Introduction

Long-term climate datasets are critical both for understanding climate variations and for evaluating their simulation in climate models. Since the 1990s, major national and international efforts have led to the creation of climate datasets called retrospective analyses or ‘reanalyses’. The dataset jointly created by the National Centers for Environmental Prediction and the National Center for Atmospheric Research (NCEP–NCAR; Kalnay et al., 1996), for example, spans the period of availability of substantial upper-air observations (1948 onward). While such datasets have already proved valuable in climate research and applications, the fact that they extend back to only the mid-twentieth century limits their utility in constraining climate models, in understanding the global impacts of major earlier events such as the 1877/1878 El Niño and Indian famine and the 1883 Krakatoa eruption, and in reliably establishing long-term trends in the statistics of extreme weather associated with severe storms, floods, hurricanes, heat waves, and cold spells. Recently, US efforts such as the Global Change Research Program (USGCRP; Climate Change Science Program, 2003, 2008a,b), international efforts to develop the Global Climate Observing System (GCOS, 2010) and Global Earth Observation System of Systems (Group on Earth Observations, 2005), as well as conferences sponsored by the World Climate Research Program (WCRP) and other programs (e.g. Trenberth et al., 2008; Bengtsson et al., 2007) have called for new reanalysis datasets ‘spanning the instrumental record’ to compare the patterns and magnitudes of recent and projected changes in both the mean climate and climate variability, including especially the changes in extreme-event statistics. 
It is hoped that such longer reanalysis datasets will enable researchers to more reliably assess the natural range of variation of extreme-event statistics, and also to understand how variations in the El Niño Southern Oscillation (ENSO) and other climate modes affect those statistics (Easterling et al., 2008). The Twentieth Century Reanalysis (20CR) Project described here represents one of the first comprehensive responses to this call.

The concept of ‘reanalysis’ may be traced back to a proposal by Brandes in 1816 (Monmonier, 1999) to construct synoptic weather maps. Since that time, the mapping of meteorological observations taken at approximately the same time to infer the current state of the atmosphere, what is now referred to as performing a synoptic ‘analysis’, has become a mainstay of weather and climate research. Until the mid-twentieth century, such analysis methods were subjective, relying on a meteorologist's experience in determining the contours of pressure and temperature, vectors of wind direction and speed, and other meteorological features. The advent of objective methods for what has come to be called ‘data assimilation’ has enabled the routine generation of thousands of synoptic maps every day by meteorological centres around the world (Daley, 1991). These methods provide an estimate of the state of the atmosphere at any particular time by forming a weighted average that combines millions of observations taken from weather stations, ships, buoys, balloons, radiosondes, aircraft, satellites, and other measurement platforms, with an a priori estimate of that state obtained from a short-term forecast generated using a numerical weather prediction (NWP) model. While crucial for generating accurate real-time forecasts, these modern synoptic analyses have limited utility in climate research owing to the artificial variability associated with frequent changes in observing systems, data assimilation methods, and in the NWP models used to generate the a priori ‘first guess’ analysis estimates (Bengtsson and Shukla, 1988; Trenberth and Olson, 1988). Modern reanalyses have been proposed as a remedy for this problem, with an emphasis on regenerating the synoptic analyses over several decades using a fixed data assimilation system and NWP model (Bengtsson and Shukla, 1988; Trenberth and Olson, 1988; Bengtsson et al., 2007; Thorne and Vose, 2010).

Strictly speaking, the first retrospective analyses (i.e. reanalyses) were generated in 1819, when Brandes followed up on his original proposal and constructed 365 daily maps of pressure contours using barometric pressure observations taken in 1783 at several European stations (Monmonier, 1999). Many retrospective analysis projects followed upon his pioneering accomplishment. Major efforts to recover historical data and construct retrospective hemispheric sea-level pressure (SLP) analyses were undertaken for the Southern Hemisphere for the years 1901–1904 in association with the expeditions of the Discovery and Gauss to Antarctica (National Antarctic Expedition, 1913). Similar efforts were also undertaken for the Northern Hemisphere, most notably a retrospective SLP reanalysis for 1899–1939 entitled the US Historical Weather Maps, Daily Synoptic Series, Northern Hemisphere, Sea Level (Wexler and Tepper, 1947). The US Weather Bureau, Army, and Navy produced these maps to better understand weather events in support of aviation and other needs during World War II. The importance of reanalysis was well recognized by Wexler and Tepper (1947), who enthused that ‘by breathing life into a mass of inert data’ such projects ‘provided an indispensable aid to future research’.

Almost 50 years after the publication of the US Historical Weather Maps, the techniques of data assimilation were brought to bear on the problem of retrospective analysis. As part of the Global Atmospheric Research Program, pioneering data assimilation systems were used at the European Centre for Medium-Range Weather Forecasts (ECMWF; Bengtsson et al., 1982) and the National Oceanic and Atmospheric Administration (NOAA) Geophysical Fluid Dynamics Laboratory (Ploshay et al., 1983) to produce reanalysis fields for the coordinated study year of 1979. This vanguard achievement was followed by the modern three-dimensional reanalysis datasets. These widely utilized datasets extend from no earlier than 1948 to the present, e.g. the 1948–present NCEP–NCAR reanalysis dataset (NNR; Kalnay et al., 1996); the 1957–2002 ECMWF reanalysis (ERA-40; Uppala et al., 2005); the 1979–present NCEP–Department of Energy (DOE) reanalysis dataset (Kanamitsu et al., 2002); the 1979–present Japan Meteorological Agency reanalysis Archive 25 (Onogi et al., 2007); the 1979–present NASA Modern Era Retrospective-analysis for Research and Applications (Bosilovich et al., 2008; Rienecker et al., 2008); the 1979–present NCEP Climate Forecast System Reanalysis dataset (Saha et al., 2010); the 1979–1993 ERA-15 (Gibson et al., 1997); the 1980–1996 NASA/Data Assimilation Office reanalysis dataset (Schubert et al., 1993); and the 1989–present ECMWF Reanalysis Interim (ERA-Interim; Dee and Uppala, 2009; Simmons et al., 2007). These reanalysis systems assimilate all independent usable observations available at the reanalysis time. Many employ variational assimilation techniques, the most frequent being 3D-Var, with a fixed error assigned to the first-guess fields throughout the reanalysis period. Such an invariant specification may be suboptimal for reanalyzing over longer periods, within which significant variations of data density (and accuracy) can occur. 
Over the latter half of the twentieth century, the number of observations available for each state estimation has varied from several thousand surface observations and a few hundred upper-air observations in the 1950s to additional millions of satellite observations by the 1990s. The application of suboptimal assimilation techniques with fixed parameters over periods of such variations in observational networks has caused problems ranging from understated extratropical storm track variability (Harnik and Chang, 2003; Hodges et al., 2003) to incorrect tropical (Newman et al., 2000) and Antarctic variability (Bromwich and Fogt, 2004), to spurious long-term trends (Trenberth and Smith, 2005; Bengtsson et al., 2004a,b; Kinter et al., 2004; Sturaro, 2003; Kistler et al., 2001). As one attempts to reanalyze the early twentieth century, and especially the nineteenth century, scant upper-air data, upon which the existing 3D-Var systems depend to produce reasonable upper-air fields (Bengtsson et al., 2004b; Kanamitsu and Hwang, 2006), will be available (Brönnimann et al., 2005). The reduced data densities will further compromise the ability of 3D-Var systems with fixed parameters to produce reliable upper-air reanalyses of these earlier periods.

In contrast, several studies have demonstrated the feasibility of generating reliable reanalyses of these earlier periods using only surface observations and more advanced data assimilation methods such as the Ensemble Kalman Filter (Whitaker et al., 2004; Anderson et al., 2005; Compo et al., 2006) or 4D-Var assimilation (Thépaut, 2006; Whitaker et al., 2009). These studies have emphasized several advantages of using surface pressure and SLP observations for such historical reanalyses. Measurements of surface pressure have been made consistently since the late 1800s, and standard corrections are known for even earlier observations. The surface pressure information, through geostrophy, yields a reasonable approximation to the barotropic part of the flow, which accounts for a substantial part of the total flow. The surface pressure tendency, being related to the vertically integrated mass flux divergence, provides further information about the tropospheric circulation that can be captured by a multivariate data assimilation system (Bengtsson, 1980).

Motivated by the above considerations, the 20CR utilizes an Ensemble Kalman Filter data assimilation system (Whitaker and Hamill, 2002), a new version of the NCEP atmosphere–land model to generate first-guess fields with interpolated monthly sea-surface temperature and sea-ice concentration fields from the Hadley Centre Sea Ice and SST dataset (HadISST; Rayner et al., 2003) as prescribed boundary conditions, and newly compiled surface pressure and SLP reports and observations, to produce a reanalysis dataset spanning 1871 to the present. The pressure data come from the new International Surface Pressure Databank (ISPD), which incorporates pressure observations extracted from leading international archives of meteorological variables and contributed national and international collections. The databank has been established through extensive international cooperation under the auspices of the Atmospheric Circulation Reconstructions over the Earth (ACRE) initiative, and working groups of GCOS and WCRP. A preliminary first version of the reanalysis (20CRv1; Compo et al., 2008) spanned the period 1908 to 1958. In the second and complete version (20CRv2) described here, global atmospheric fields for 1871 to 2008 have been generated.

In our implementation, the Ensemble Kalman Filter provides an estimate of the atmospheric state every six hours and also the uncertainty of that estimation. Figure 1 provides an example of this capability. The maps show the analyzed SLP and 500 hPa geopotential height fields and their estimated uncertainties for 0000 UTC on 29 January 1922 (Figure 1(a, b)) during the time of the so-called ‘Knickerbocker’ storm along the US East Coast (Kocin and Uccellini, 2004), and also for the same day of 1972 (Figure 1(c, d)), when many more observations were available, to illustrate the impact of observational volumes on state estimation. Similar fields are available for all output variables from the 20CR dataset at all times. As the figure shows, the uncertainty in the estimated state decreases as the network density increases from 1922 to 1972. The uncertainty is also flow-dependent, as evident, e.g. in the relatively high uncertainty in Figure 1(b) along the tightly packed geopotential height contours over the northeast Pacific. In addition to the contribution from atmospheric dynamics, as described below, the uncertainty field also includes a representation of errors arising from imperfect observations and a finite-ensemble first guess generated using an imperfect NWP model.

Figure 1.

Synoptic charts of the ensemble mean analysis and ensemble uncertainty for (a) sea-level pressure and (b) 500 hPa geopotential mean and spread at 0000 UTC on 29 January 1922. (c, d) are as (a, b), but for 29 January 1972. Line contours indicate the analysis and shading indicates the uncertainty measured as the ensemble standard deviation (or spread) at each location. The line contour interval in (a,c) is 4 hPa, with the 1000 hPa contour thickened, and in (b,d) it is 50 m, with the 5600 m contour thickened. The shading interval in (a,c) is 0.25 hPa and in (b,d) is 5 m.

The remainder of the paper is organized as follows. Section 2 describes the 2008 experimental NCEP Global Forecast System NWP model used for generating the first-guess background fields. Section 3 provides some specifics of our implementation of the Ensemble Kalman Filter for 20CRv2. Section 4 describes the ISPD compilation of surface pressure and SLP observations and the 20CRv2 quality-control (QC) system. Section 5 describes the multi-stream production scheme employed to generate the dataset. Section 6—the first ‘results’ section—compares the synoptic variability and uncertainty in the 20CRv2 reanalysis fields with that in the NNR and ERA-40 reanalyses, in the station and marine observations of the ISPD, and in additional upper-air observations. Section 7 presents some basic features of the climatology and climate variability as represented in this dataset, and compares them with other reanalyses and statistical reconstructions. Section 8 provides details of data access and conclusions.

2. Global numerical weather prediction model and boundary conditions

The coupled atmosphere–land model used here is the April 2008 experimental version of the NCEP Global Forecast System (GFS), courtesy of the NOAA NCEP Environmental Modeling Center (EMC). The model has a complete suite of physical parametrizations, as described in Kanamitsu et al. (1991), with recent updates detailed in Moorthi et al. (2001). Additional updates to these parametrizations are described in Saha et al. (2006). The version used here includes an updated prognostic cloud condensate scheme (Moorthi et al., 2001) and revisions to the solar radiative transfer (Hou et al., 2002), boundary-layer vertical diffusion (Hong and Pan, 1996), Simplified Arakawa–Schubert convection scheme with momentum mixing (Moorthi et al., 2001), gravity wave drag (Kim and Arakawa, 1995), mountain blocking (Lott and Miller, 1997; Alpert, 2004), and long-wave radiative transfer (Mlawer et al., 1997). A prognostic ozone scheme includes parametrizations of ozone production and destruction (Saha et al., 2010). The model contains a complex representation of land processes through coupling with the four-layer Noah land model (Ek et al., 2003). We used an experimental version of the GFS model primarily because of its ability to treat time-varying CO2 and volcanic aerosols, features not found in the operational version at that time. Annual averages of the time-varying global mean CO2 concentration, volcanic aerosols, and incoming solar radiation were specified as in Saha et al. (2010). For computational efficiency in the 20CRv2 ensemble system, we used the model at a horizontal resolution of T62 and a vertical resolution of 28 vertical hybrid sigma-pressure levels (Juang, 2005).

The lower boundary conditions of sea surface temperature and sea-ice concentration fields needed to run the model in atmosphere–land-only mode were specified using the UK Met Office HadISST1.1 dataset (Rayner et al., 2003). The monthly mean data were interpolated to daily resolution using a monthly-mean preserving algorithm (Taylor et al., 2000). The values used for the first fortnight of the reanalysis, 1–15 January 1871, were determined by interpolating between the average of 1871–1900 December values and the mid-month value of 15 January 1871 from HadISST. As noted by Rayner et al. (2003), the quality of the HadISST varies throughout the period from a sparse network of marine SST observations in the 1870s to a network comprising additional numerous marine, buoy, and satellite SST observations in the late twentieth century (with the International Comprehensive Ocean–Atmosphere DataSet (ICOADS; Worley et al., 2005) providing the bulk of the in situ SST observations). The sea-ice concentrations used have a similar time-dependence in their quality. As described in Appendix A, the sea-ice concentration ingested into 20CR differs from that provided in HadISST, owing to our accidental mis-specification of the concentration near coastal regions.
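To illustrate what a monthly-mean-preserving interpolation involves, the following minimal sketch solves for mid-month values whose piecewise-linear interpolant reproduces the prescribed monthly means. It is not the 20CR implementation: it assumes twelve equal-length months on a periodic annual cycle, the function names `midmonth_targets` and `interpolate` are hypothetical, and the SST values are made up.

```python
import numpy as np

def midmonth_targets(monthly_means):
    """Solve for mid-month values whose piecewise-linear interpolant
    reproduces the given monthly means (equal-length, periodic months)."""
    m = np.asarray(monthly_means, dtype=float)
    n = m.size
    # The mean over month i of the linear interpolant is
    # (1/8) x[i-1] + (3/4) x[i] + (1/8) x[i+1]; build that periodic system.
    A = np.zeros((n, n))
    for i in range(n):
        A[i, i] = 0.75
        A[i, (i - 1) % n] += 0.125
        A[i, (i + 1) % n] += 0.125
    return np.linalg.solve(A, m)

def interpolate(targets, t):
    """Evaluate the periodic piecewise-linear interpolant at time t,
    measured in months, with mid-month points at t = i + 0.5."""
    n = targets.size
    s = (np.asarray(t, dtype=float) - 0.5) % n   # position relative to first mid-month
    i0 = np.floor(s).astype(int) % n
    w = s - np.floor(s)
    return (1.0 - w) * targets[i0] + w * targets[(i0 + 1) % n]

monthly = np.array([27.1, 27.4, 27.9, 28.3, 28.0, 27.2,
                    26.5, 26.0, 26.1, 26.4, 26.7, 26.9])  # made-up SSTs, deg C
targets = midmonth_targets(monthly)

# Recover the monthly means by averaging finely sampled interpolated values.
fine = np.linspace(0.0, 12.0, 12 * 1000, endpoint=False) + 0.0005
recovered = interpolate(targets, fine).reshape(12, 1000).mean(axis=1)
```

Interpolating linearly between the raw monthly means would damp the peaks of the seasonal cycle. Solving for adjusted mid-month targets avoids that, which is the property the text requires of the boundary conditions.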

3. Ensemble Kalman Filter implementation

The Kalman Filter provides a rigorous method for optimally combining, in a least-squares sense, imperfect observations and an imperfect estimate of the current state (the ‘background’ or ‘first guess’) assuming linear error models (Daley, 1991). Owing to computing limitations, for many years it did not have widespread use in meteorological applications. Rapid progress has been made since Evensen (1994) proposed a Monte Carlo approximation, now called the Ensemble Kalman Filter, in which the background-error covariance is approximated using an ensemble of, e.g., short-range NWP forecasts. Recent reviews (e.g. Evensen, 2003; Houtekamer and Mitchell, 2005; Hamill, 2006; Evensen, 2009) discuss many theoretical and practical issues associated with this data assimilation methodology. In particular, when the observations and background errors are Gaussian, then the Kalman Filter is the solution to the Bayesian problem of combining a given observation distribution and a prior background distribution to form a posterior distribution of the state of the atmosphere. There are two distinct classes of algorithms for implementing an Ensemble Kalman Filter: ‘stochastic’ algorithms that account for observational errors by assimilating perturbed observations as part of the ensemble generation (e.g. Houtekamer and Mitchell, 1998; Burgers et al., 1998; Houtekamer and Mitchell, 2005; Houtekamer et al., 2005), and ‘deterministic’ algorithms that modify the update equation to reduce potential sampling errors in the stochastic approach, as reviewed, e.g., by Tippett et al. (2003). Both classes are areas of active research (e.g. Evensen, 2009).

The Ensemble Kalman Filter algorithm used in 20CRv2 is of the ‘deterministic’ type (specifically, through the use of $\tilde{\mathbf{K}}$ instead of $\mathbf{K}$ in Eq. (2) below), as described in Whitaker et al. (2004) and Compo et al. (2006), and is based on the ‘Ensemble Square Root Filter’ algorithm of Whitaker and Hamill (2002). It has a simple parallelizable implementation when observations are processed one at a time (Whitaker and Hamill, 2002; Anderson and Collins, 2007).

To fix ideas, using the notation of Ide et al. (1997), assume that we have an n-member ensemble of first-guess or background fields, with the jth member $\mathbf{x}_j^b$ representing the complete m-dimensional state vector of the NWP forecast model (e.g. wind, geopotential height, humidity, and surface pressure fields on the model domain) in a 6-hour window centred on the analysis time. We use n = 56, based on tests which found that ensembles smaller than this degraded the quality of the upper-air analyses (not shown). The sample mean of these fields is denoted $\bar{\mathbf{x}}^b$, and the deviations from the mean are

$$\mathbf{x}_j^{\prime b} = \mathbf{x}_j^b - \bar{\mathbf{x}}^b .$$

The sample background-error covariance matrix is denoted

$$\mathbf{P}^b = \frac{1}{n-1} \sum_{j=1}^{n} \mathbf{x}_j^{\prime b} \left(\mathbf{x}_j^{\prime b}\right)^{\mathrm{T}} .$$
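As a concrete illustration of these sample statistics, the short sketch below computes the ensemble mean, the deviations, and the unbiased sample covariance for a synthetic ensemble. The state size m = 10 and the random data are assumptions chosen only to keep the example small.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 56, 10                       # n members (as in the text); m is a toy state size
xb = rng.normal(size=(n, m))        # synthetic first-guess ensemble, one member per row

xb_mean = xb.mean(axis=0)           # sample mean over members
xb_dev = xb - xb_mean               # deviations from the mean
Pb = xb_dev.T @ xb_dev / (n - 1)    # unbiased sample covariance, shape (m, m)
```

For a realistic state dimension the m × m matrix would never be formed explicitly. Only the deviations are stored, and the covariances are evaluated where an observation requires them.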

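We denote the first surface pressure observation to be assimilated as $y^o$ and the ensemble mean and deviations interpolated to the observation location and time as

$$\bar{y}^b = \mathbf{H}\bar{\mathbf{x}}^b \quad \text{and} \quad y_j^{\prime b} = \mathbf{H}\mathbf{x}_j^{\prime b} ,$$

respectively, with $\mathbf{H}$ being the operator that interpolates the first-guess surface pressure field to the observation location and time. We then combine the first-guess ensemble and the observation to form an n-member analysis ensemble, whose mean $\bar{\mathbf{x}}^a$ and deviations $\mathbf{x}_j^{\prime a}$ are calculated via

$$\bar{\mathbf{x}}^a = \bar{\mathbf{x}}^b + \mathbf{K}\left(y^o - \bar{y}^b\right) \tag{1}$$


$$\mathbf{x}_j^{\prime a} = \mathbf{x}_j^{\prime b} - \tilde{\mathbf{K}}\, y_j^{\prime b} \tag{2}$$

where the sample Kalman gain $\mathbf{K}$ is given by

$$\mathbf{K} = \mathbf{P}^b \mathbf{H}^{\mathrm{T}} \left(\mathbf{H}\mathbf{P}^b\mathbf{H}^{\mathrm{T}} + R\right)^{-1} \tag{3}$$

and the sample modified Kalman gain $\tilde{\mathbf{K}}$ is given by

$$\tilde{\mathbf{K}} = \left(1 + \sqrt{\frac{R}{\mathbf{H}\mathbf{P}^b\mathbf{H}^{\mathrm{T}} + R}}\right)^{-1} \mathbf{K} \tag{4}$$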

as in Whitaker and Hamill (2002). The specified observational-error variance R accounts for both the measurement error associated with the observation and the error associated with representing a large area (i.e. an NWP model grid box) by a point measurement. We further assume that the error in $y^o$ is uncorrelated with the errors in all other pressure observations to be assimilated. Note that the background-error variance at the observation location,

$$\mathbf{H}\mathbf{P}^b\mathbf{H}^{\mathrm{T}} = \frac{1}{n-1} \sum_{j=1}^{n} \left(y_j^{\prime b}\right)^2 \equiv \sigma_b^2 ,$$

in (3) and (4) is a scalar. In a significant modification to the system of Whitaker et al. (2004) and Compo et al. (2006), we incorporated an improved accounting of hourly observations made throughout the 6-hour window centred on the analysis time, but not necessarily at the analysis time itself. This was done, without changing Eqs (1)–(4), by allowing the covariance in Eq. (3) to be computed between $y_j^{\prime b}$ at the observation time and $\mathbf{x}_j^{\prime b}$ at a possibly different time. Such a modification allows information from, e.g., fast-moving weather systems to be used to improve the analysis, letting the analysis ‘know’ where the system was and will be within the 6-hour window. Technically, this makes our implementation a light Ensemble Kalman Smoother, rather than an Ensemble Kalman Filter, but the equations are unaltered (Sakov et al., 2010).

The model means, variances, and covariances in Eqs (1)–(4) are all unbiased sample estimates from the n = 56 member ensemble. Within the limitations of using an imperfect model and finite ensembles, this formulation represents a minimum-error estimate of the ‘true state’ (Lorenc, 1986), represented here by the analysis ensemble mean $\bar{\mathbf{x}}^a$. The uncertainty of this estimate is given by the covariance of the analysis ensemble deviations $\mathbf{x}_j^{\prime a}$,

$$\mathbf{P}^a = \frac{1}{n-1} \sum_{j=1}^{n} \mathbf{x}_j^{\prime a} \left(\mathbf{x}_j^{\prime a}\right)^{\mathrm{T}} .$$

The uncertainty at the observation location is $\mathbf{H}\mathbf{P}^a\mathbf{H}^{\mathrm{T}} = \sigma_a^2$.

To assimilate subsequent observations, the 56 members of the analysis ensemble $\mathbf{x}_j^a$ become the new first-guess ensemble $\mathbf{x}_j^b$ and Eqs (1)–(4) are applied iteratively for each observation. After all observations have been assimilated, the 56-member set of analyses $\mathbf{x}_j^a = \bar{\mathbf{x}}^a + \mathbf{x}_j^{\prime a}$ becomes the 56 initial conditions for the subsequent forecast/analysis cycle, with 9-hour forecasts made to generate analyses every 6 hours (Whitaker et al., 2004, give a complete description). As well as archiving the analysis fields, R, $\sigma_b^2$, and $\sigma_a^2$ are archived for every observation and observation location.
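The serial procedure described above can be sketched in a few lines. This is a toy illustration of Eqs (1)–(4) for scalar observations, not the production code: the state is a small vector of made-up pressures, the observation operator simply selects one grid point, the observation values and error variances are invented, and covariance inflation and localization are omitted. The function name `ensrf_update` is hypothetical.

```python
import numpy as np

def ensrf_update(xb, h, yo, R):
    """Assimilate one scalar observation yo (error variance R) into the
    ensemble xb (shape n x m); h (length m) is a linear observation operator."""
    n = xb.shape[0]
    xb_mean = xb.mean(axis=0)
    xb_dev = xb - xb_mean
    yb_mean = h @ xb_mean                     # first guess at the observation
    yb_dev = xb_dev @ h                       # deviations at the observation
    sigma_b2 = yb_dev @ yb_dev / (n - 1)      # H Pb H^T, a scalar
    PbHt = xb_dev.T @ yb_dev / (n - 1)        # Pb H^T, length m
    K = PbHt / (sigma_b2 + R)                 # Eq. (3)
    K_tilde = K / (1.0 + np.sqrt(R / (sigma_b2 + R)))   # Eq. (4)
    xa_mean = xb_mean + K * (yo - yb_mean)    # Eq. (1)
    xa_dev = xb_dev - np.outer(yb_dev, K_tilde)         # Eq. (2)
    return xa_mean + xa_dev, sigma_b2

rng = np.random.default_rng(1)
n, m = 56, 8
ens = rng.normal(loc=1000.0, scale=3.0, size=(n, m))   # toy "pressure" states, hPa
obs = [(2, 1003.0, 1.0), (5, 998.5, 2.0)]              # (grid index, value, R): made up

for idx, yo, R in obs:
    h = np.zeros(m)
    h[idx] = 1.0                                       # observe a single grid point
    # The analysis ensemble becomes the first guess for the next observation.
    ens, sigma_b2 = ensrf_update(ens, h, yo, R)
    sigma_a2 = ens[:, idx].var(ddof=1)                 # H Pa H^T after the update
```

With the modified gain of Eq. (4), the analysis variance at the observation location satisfies $\sigma_a^2 = \sigma_b^2 R / (\sigma_b^2 + R)$, the value given by the Kalman Filter for a single scalar observation, without any perturbation of the observation.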

As discussed by, e.g., Anderson and Anderson (1999) and Whitaker and Hamill (2002), sampling and model errors prevent the ensemble-estimated background-error covariances from being optimal in the Kalman update equation (1). Such issues must be addressed to prevent ‘filter divergence’ wherein the update equations (1)–(4) weight the background too much and the observations too little. Cycling during such a condition reinforces the background and causes the filtered ‘analysis’ estimate to drift farther and farther from observations. Two methods are used to account for these sources of error: covariance inflation (Anderson and Anderson, 1999) and distance-dependent covariance localization (Houtekamer and Mitchell, 2001; Hamill et al., 2001). Covariance inflation partly corrects for under-specified sampling error and also serves as a crude parametrization of neglected model error, assuming that the model-error covariance is in the state space of the ensemble. Although Whitaker et al. (2008) found that so-called ‘additive inflation’ (adding random perturbations with a specified covariance structure to individual ensemble members) was superior to a simple multiplicative covariance inflation (Whitaker et al., 2004) for a modern complete observing network, our own tests of this method with the limited surface pressure network did not reveal a significant difference in analysis quality (not shown). We therefore used the simpler and less computationally expensive method of a simple multiplicative covariance inflation, as described in Whitaker et al. (2004).

Covariance localization (Houtekamer and Mitchell, 2001; Hamill et al., 2001) is a spatial filter that smoothly sets the ensemble covariances to zero beyond a specified distance. This reduces the potential for filter divergence arising from spurious long-distance correlations obtained using finite ensemble sizes. We set the localization distance to 4000 km in the horizontal using the same localization function as Whitaker and Hamill (2002), i.e. Eq. (4.10) of Gaspari and Cohn (1999). In the vertical, localization is set to 4 scale heights (units of $-\ln(p/p_s)$), which amounts to smoothly reducing the effect of a 1000 hPa surface pressure observation to zero at about 18 hPa. We note that we localized the variable geopotential height instead of virtual temperature. Localization of temperature gave large geopotential height increments (analysis minus first guess) that were not in geostrophic balance with the wind increments at locations where the latter had been localized to zero. The relatively large localization distance may also improve the overall balance of the increments compared to smaller localizations of less than 1000 km sometimes used.
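For readers unfamiliar with the localization function, the sketch below evaluates the fifth-order compactly supported function of Gaspari and Cohn (1999). It assumes that the quoted localization distance is the separation at which the taper reaches zero, i.e. twice the function's half-width parameter c; the function name and argument names are hypothetical.

```python
import numpy as np

def gaspari_cohn(dist, cutoff):
    """Fifth-order compactly supported correlation function.
    Returns 1 at dist = 0 and exactly 0 for dist >= cutoff."""
    c = cutoff / 2.0                       # assumption: the taper reaches zero at 2c
    r = np.abs(np.atleast_1d(np.asarray(dist, dtype=float))) / c
    w = np.zeros_like(r)
    inner = r <= 1.0
    outer = (r > 1.0) & (r < 2.0)
    ri, ro = r[inner], r[outer]
    w[inner] = -0.25 * ri**5 + 0.5 * ri**4 + 0.625 * ri**3 - (5.0 / 3.0) * ri**2 + 1.0
    w[outer] = (ro**5 / 12.0 - 0.5 * ro**4 + 0.625 * ro**3
                + (5.0 / 3.0) * ro**2 - 5.0 * ro + 4.0 - 2.0 / (3.0 * ro))
    return w

# Horizontal weights at several separations (km) for a 4000 km cutoff
horiz = gaspari_cohn(np.array([0.0, 2000.0, 4000.0, 6000.0]), cutoff=4000.0)

# Vertical coordinate -ln(p/ps): 4 scale heights above a 1000 hPa surface
p_top = 1000.0 * np.exp(-4.0)              # about 18 hPa
```

In a serial filter, these weights would multiply the ensemble covariance between the observation and each state element before the gain is applied, so that elements beyond the cutoff are left unchanged by that observation.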

The inflation of first-guess error covariance by a factor r > 1.0 was applied to the first-guess ensemble deviations xjb before assimilating the first observation in a 6-hour window. The specified variation of r was designed to account for the enormous range of observational densities from the 1870s to 2000s, and also different observational densities in the Tropics and the extratropical Northern and Southern Hemispheres, as indicated in Table I. These values were set based on the observing system experiments of Compo et al. (2006) and additional experiments using September 1938. They were not adjusted further or ‘tuned’ during production. As a hard limit to prevent spiraling inflation in the complete absence of observations in an analysis window, r was set to 1.0 in a region if the first-guess ensemble standard deviation of surface pressure exceeded 17 hPa anywhere in that region. This limit was set to be larger than typical regional maximum values on a global map of the climatological standard deviation of surface pressure, which were estimated using ERA-40 for the 1981–2000 period as ∼16 hPa in the vicinity of the Icelandic low and South Pacific storm track (not shown).
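The inflation step with its hard limit might be sketched as follows. This is an illustration under stated assumptions rather than the production code: it assumes that r multiplies the ensemble deviations directly (a convention the text does not spell out), it treats the region as a plain array of surface pressure points, and the value r = 1.1 is purely illustrative. The function name `inflate` is hypothetical.

```python
import numpy as np

def inflate(xb, r, ps_index, spread_limit=17.0):
    """Scale first-guess deviations by r unless the surface pressure spread
    already exceeds spread_limit (hPa) somewhere in the region."""
    xb_mean = xb.mean(axis=0)
    xb_dev = xb - xb_mean
    ps_spread = xb[:, ps_index].std(axis=0, ddof=1)
    if np.any(ps_spread > spread_limit):
        r = 1.0                                  # hard limit: no inflation
    return xb_mean + r * xb_dev

rng = np.random.default_rng(2)
points = np.arange(6)
region = rng.normal(1010.0, 2.0, size=(56, 6))   # toy regional surface pressures, hPa
inflated = inflate(region, r=1.1, ps_index=points)

wild = rng.normal(1010.0, 25.0, size=(56, 6))    # spread far above the 17 hPa limit
capped = inflate(wild, r=1.1, ps_index=points)
```

The cap matters only when observations are absent for many cycles. Without it, repeated inflation with no observational constraint would let the spread grow without bound.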

Table I. Temporal and spatial variation of the covariance inflation parameter r.
  1. NH = 90°–30°N; Tropics = 30°N–30°S; SH = 30°–90°S.


An important issue to address when assimilating surface pressure observations over land is to make appropriate adjustments to these observations to account for the difference between the model orography and the station elevation at the observation location (Ingleby, 1995). Our approach is similar to that of Ingleby (1995). The adjustment was done using the pressure reduction algorithm of Benjamin and Miller (1990), as

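$$p_{\mathrm{ob}}^{\mathrm{adj}} = p_{\mathrm{ob}} \left[\frac{T_0 + \Gamma \left(z_{\mathrm{station}} - z_{\mathrm{model}}\right)}{T_0}\right]^{g/(R_d \Gamma)} \tag{5}$$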

where $z_{\mathrm{model}}$ is the model orography interpolated to the station location, $T_0$ is a virtual temperature determined from the model first-guess forecast interpolated to the station location, $\Gamma$ is a model first-guess virtual temperature lapse rate interpolated to the station location, $g$ is gravity, $R_d$ is the gas constant for dry air, and $z_{\mathrm{station}}$ is the station elevation. The virtual temperature $T_0$ was determined from

$$T_0 = T_k \left(\frac{p_{\mathrm{ob}}}{p_k}\right)^{R_d \Gamma / g} \tag{6}$$

where $T_k$ and $p_k$ are model first-guess virtual temperature and pressure at model level k, horizontally interpolated to the station location. The virtual temperature lapse rate $\Gamma$ was estimated as

equation image(7)

where $z_k$ is the model first-guess geopotential height at model level k interpolated to the station location. The model level k in this calculation was chosen to be the first model level that is usually above the planetary boundary layer, to avoid diurnal effects on the pressure adjustment. Benjamin and Miller (1990) used values at 700 hPa. We used values at level k = 16, approximately 250 hPa above the surface, with k = 1 being the model surface.

The adjusted observations $p_{\mathrm{ob}}^{\mathrm{adj}}$ were computed using ensemble mean first-guess fields. In order to modify the observation-error variance R to reflect the uncertainty introduced by the adjustment, an ensemble of observations and station elevations was constructed. The ensemble of observations was generated by perturbing $p_{\mathrm{ob}}$ with Gaussian random noise with zero mean and standard deviation given by the nominal observation error. The ensemble of station elevations was generated by perturbing $z_{\mathrm{station}}$ with Gaussian random noise with zero mean and standard deviation given by the uncertainty in the station elevation, specified here as 3 m (J. Comeaux, pers. comm., 2006). This specification may have been conservative, given that the uncertainty in station elevation is often between 10 and 20 m and may be as large as 80 m in some historical observations (Ingleby, 1995). The pressure adjustment was done separately for each ensemble member, using each ensemble member's virtual temperature and pressure interpolated to the station location. The adjusted R was then simply taken to be the variance of the adjusted surface pressure observations. Larger differences between $z_{\mathrm{model}}$ and $z_{\mathrm{station}}$ led to larger increases of R above the nominal value, reflecting the increased uncertainty introduced by the adjustment. Increases of R were associated with increases of the ensemble variances of $T_0$ and $\Gamma$, since uncertainty in these quantities implies uncertainty in the observation adjustment.
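The adjustment and the resulting inflation of R can be illustrated with a short sketch. The reduction formula used here is an assumption: a standard constant-lapse-rate hydrostatic form in the spirit of Benjamin and Miller (1990), which may differ in detail from the one used for 20CRv2. The function names, the physical constants, the large toy ensemble size, and all input values are likewise illustrative, and the lapse rate is held fixed rather than diagnosed from the model.

```python
import numpy as np

G = 9.80665      # gravity, m s^-2
RD = 287.05      # gas constant for dry air, J kg^-1 K^-1

def adjust_pressure(p_ob, z_station, z_model, t_k, p_k, lapse):
    """Move an observed pressure from station elevation to model orography
    using a constant-lapse-rate hydrostatic reduction (assumed form)."""
    alpha = RD * lapse / G
    t0 = t_k * (p_ob / p_k) ** alpha          # effective temperature at the station
    return p_ob * ((t0 + lapse * (z_station - z_model)) / t0) ** (1.0 / alpha)

def adjusted_error_variance(p_ob, ob_err, z_station, z_model, t_k, p_k, lapse,
                            z_err=3.0, seed=0):
    """Variance of the adjusted observation over an ensemble in which p_ob and
    z_station are perturbed and t_k, p_k, lapse vary by member."""
    rng = np.random.default_rng(seed)
    n = len(t_k)
    p_pert = p_ob + ob_err * rng.normal(size=n)
    z_pert = z_station + z_err * rng.normal(size=n)
    adj = adjust_pressure(p_pert, z_pert, z_model, t_k, p_k, lapse)
    return adj.var(ddof=1)

n = 5000   # large toy ensemble so sampling noise is small; 20CRv2 used 56
rng = np.random.default_rng(3)
t_k = 270.0 + rng.normal(size=n)          # member virtual temperatures at level k, K
p_k = 700.0 + 0.5 * rng.normal(size=n)    # member pressures at level k, hPa
lapse = np.full(n, 0.0065)                # K m^-1, held fixed here for simplicity

# Station 100 m above the model orography: adjusted pressure is higher.
p_adj = adjust_pressure(900.0, 1000.0, 900.0, 270.0, 700.0, 0.0065)

# Same station, with model orography at the station height and 500 m below it.
R_small = adjusted_error_variance(900.0, 1.0, 1000.0, 1000.0, t_k, p_k, lapse)
R_large = adjusted_error_variance(900.0, 1.0, 1000.0, 500.0, t_k, p_k, lapse)
```

In this toy setting the adjusted variance grows with the elevation mismatch, consistent with the behaviour described in the text, because errors in the station elevation and in the temperature profile are amplified over a deeper layer.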

Alternatively, instead of adjusting the surface pressure observations, we could have used the same algorithm to adjust the first-guess surface pressure to the station elevation as part of the computation of the forward observation operator. We tested this alternative approach, and found that it produced similar results.

4. International Surface Pressure Databank collection of observations and quality control

The International Surface Pressure Databank (ISPD) is the world's largest collection of pressure observations. It has been developed by extracting observations from established international archives of meteorological variables and combining them with observations made available through additional international cooperation, with data recovery facilitated by the ACRE initiative and the other contributing organizations and projects listed in Table II. ISPD version 2 (ISPDv2) was used here.

Table II. Organizations contributing pressure observations to the International Surface Pressure Databank version 2.
All-Russian Research Institute of Hydrometeorological Information World Data Center
Atmospheric Circulation Reconstructions over the Earth (ACRE)
Australian Bureau of Meteorology
British Antarctic Survey
Danish Meteorological Institute
Deutscher Wetterdienst (DWD; German Weather Service)
European and North Atlantic Daily to Multidecadal Climate Variability (EMULATE)
Environment Canada
ETH Zurich, Switzerland
GCOS Atmospheric Observation and Ocean Observation Panels for Climate WG on Surface Pressure
Hong Kong Observatory
International Comprehensive Ocean–Atmosphere Data Set (ICOADS)
Instituto Geofisico da Universidade do Porto, Portugal
Japanese Meteorological Agency
Jersey Met Department
Koninklijk Nederlands Meteorologisch Instituut (KNMI; Royal Netherlands Meteorological Institute)
Meteorological and Hydrological Service, Croatia
Met Office Hadley Centre, UK
National Center for Atmospheric Research (NCAR), USA
NOAA Climate Database Modernization Program (CDMP), USA
NOAA Earth System Research Laboratory (ESRL), USA
NOAA National Climatic Data Center (NCDC), USA
NOAA National Centers for Environmental Prediction (NCEP), USA
NOAA Northeast Regional Climate Center at Cornell Univ., USA
NOAA Midwest Regional Climate Center at UIUC, USA
Norwegian Meteorological Institute
Ohio State Univ.—Byrd Polar Research Center, USA
Proudman Oceanographic Laboratory, UK
Signatures of environmental change in the observations of the Geophysical Institutes (SIGN)
South African Weather Service
Univ. of Colorado—Climate Diagnostics Center (CDC) of the Cooperative Institute for Research in Environmental Sciences (CIRES)
Univ. of East Anglia—Climatic Research Unit, UK
Univ. of Lisbon—Instituto Geofisico do Infante D. Luiz, Portugal
Univ. of Lisbon—Instituto de Meteorologia, Portugal
Univ. of Milan—Department of Physics, Italy
Univ. Rovira i Virgili—Center for Climate Change (C3), Spain
ZentralAnstalt für Meteorologie und Geodynamik (ZAMG; Austrian Weather Service)

The ISPDv2 comprises three components: station (land) observations, marine observations, and tropical cyclone ‘best track’ pressure observations and reports. The station component is a new blend of many national and international collections of SLP and surface pressure, with the largest contributor being the International Surface Database (Lott et al., 2008). Yin et al. (2008) describe the duplicate elimination procedures for this component. To produce the blend, the ISPDv2 station component employs a two-step duplicate elimination process that first removes duplicates within a collection and then the duplicates among collections. Stations within a 0.1° radius are considered for duplicate elimination to account for possible location errors in the station metadata. For the marine component, observations of SLP are extracted from ICOADS, which applies extensive duplicate elimination procedures (Worley et al., 2005). The latest available version of ICOADS was used as the 20CRv2 was produced, resulting in ICOADS version 2.4 being used for the period 1952–2008 and version 2.5 for 1871–1951 (Woodruff et al., 2010). Similarly, for the tropical cyclone component, the latest available version of the International Best Track Archive for Climate Stewardship (IBTrACS; Knapp et al., 2010) was used as our reanalysis fields were generated. This resulted in the use of version ‘Beta’ for 1952–2006, ‘v01r01’ for 1871–1883, and ‘v02r01’ for 1884–1885 and 2007–2008. In the IBTrACS entries of tropical cyclone central pressure, some data values represent actual measurements while others represent time-interpolated values provided by the tropical cyclone warning centres. We refer to the latter as pressure ‘reports’. For IBTrACS entries where neither a central pressure observation nor report was provided, a central pressure report was derived from the wind speed entry using a second-order polynomial approximation to the gradient wind equation with empirically estimated parameters (N. Matsui, pers. comm., 2010). Parameters were estimated separately for each IBTrACS sub-basin using wind and pressure data from the period 1979–2007, to account for the different characteristics of the tropical cyclones in the sub-basins. These pressure values (from observations, interpolated values, and values derived from wind speed) included in the ISPD represent less than 0.22% of the total number of data points in 1878 and less than 0.009% in 2008 in the complete ISPD compilation assimilated here. For simplicity, the combination of these scant reports related to tropical cyclone central pressure, along with the actual measurements of pressure, is referred to as ‘observations’ throughout the remainder of the article, where there is no possibility of confusion.
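As a rough illustration of deriving a central pressure report from wind speed, the sketch below fits a second-order polynomial by least squares and then evaluates it. The training pairs are synthetic and the coefficients are purely illustrative; they are not the parameters estimated for the ISPD, and the function names are hypothetical.

```python
import numpy as np


def fit_wind_pressure(wind, pressure):
    """Least-squares fit of p = c0 + c1*v + c2*v**2; returns (c0, c1, c2)."""
    # np.polyfit returns coefficients from the highest power downwards.
    c2, c1, c0 = np.polyfit(wind, pressure, deg=2)
    return c0, c1, c2


def pressure_from_wind(wind, coeffs):
    """Evaluate the fitted quadratic at the given wind speed(s)."""
    c0, c1, c2 = coeffs
    wind = np.asarray(wind, dtype=float)
    return c0 + c1 * wind + c2 * wind ** 2


# Synthetic (wind in m/s, pressure in hPa) pairs standing in for one sub-basin.
wind_train = np.array([20.0, 30.0, 40.0, 50.0, 60.0, 70.0])
pres_train = 1010.0 - 0.1 * wind_train - 0.015 * wind_train ** 2

coeffs = fit_wind_pressure(wind_train, pres_train)
print(pressure_from_wind(45.0, coeffs))
```

Fitting one coefficient set per sub-basin, as described above, would amount to repeating the fit on each sub-basin's own wind and pressure pairs and storing the results in a lookup keyed by sub-basin.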

All other ISPD observations of surface pressure and SLP were quality controlled (QC) during the assimilation cycle in a five-step process that included basic checks for meteorological plausibility and comparisons with the first-guess ensemble and with neighbouring observations (Appendix B; note that the IBTrACS pressure values were not subjected to the QC procedure). The QC indicators contained in the metadata of, e.g., ICOADS and the International Surface Database (ISD) were not used in 20CRv2, so that the 20CR system's own QC examined all marine and station observations. For those observations passing the QC procedure, surface pressure observations were assimilated at the elevation provided in the ISPD's station metadata, whereas observations of SLP were considered as observations at an elevation of zero metres. When both surface pressure and SLP were available with an accompanying station elevation, surface pressure was used.

As described in Appendix B, the complete ISPDv2 includes for every observation the results of the quality control outlined above plus so-called ‘feedback’ assimilation quantities: the observation-error variance R, the first-guess error variance at the observation location σb², the analysis-error variance at the observation location σa², the observation minus the first guess, and the observation minus the ensemble-mean analysis.

5. Production of 20CRv2

To take advantage of the massively parallel computers that the US DOE made available to this project, the 20CRv2 dataset was generated in parallel production ‘streams’ (Table III). Each stream was started from the same 56-member ensemble drawn from a climatological ensemble of states on 1 November 0000 UTC of the stream start year (Table III). For a particular stream, after incrementing the climatological ensemble using Eqs (1)–(4), an ensemble of 56 9-hour forecasts was generated by integrating the NCEP GFS model described in section 2. Each stream was cycled for 14 months to reduce the effect of the initial condition in the lower layers of the land model. Our implementation of Eqs (1)–(4) did not involve any stochastic perturbations or re-arrangements of the ensemble, which had the effect that the jth ensemble member within each stream remained distinct and temporally continuous over the course of a stream's production. However, the jth ensemble member was not continuous across streams, e.g. member 1 from stream 5 ending 31 December 1895 1800 UTC was not continuous with member 1 from stream 6 on 1 January 1896 0000 UTC.

Table III. Multi-stream production scheme including start date of the spin-up period and start and end dates of the production period.
Stream number | Spin-up start date (1 November, 0000 UTC) | Production start date (1 January, 0000 UTC) | Production end date (31 December, 1800 UTC)

Production data for almost all streams began from 1 January 0000 UTC fourteen months after the initial climatological ensemble assimilation, and continued for five years (Table III). One exception to this was production stream 16, spanning 1946 to 1951. Additional station pressure observations, courtesy of the Australian Bureau of Meteorology (B. Trewin, pers. comm., 2009), became available after 1951 had been originally generated in stream 17, so stream 16 was extended including the newly available observations. Additionally, stream 27 was extended to 2008 and will be updated to the present on a regular basis.
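The regular part of this schedule can be written down in a few lines. The sketch assumes that streams are numbered consecutively in five-year blocks starting from a first production year of 1871, which is consistent with the stream 5 and stream 6 boundary quoted above but is otherwise an assumption; the extensions of streams 16 and 27 are not modelled, and the function name is hypothetical.

```python
from datetime import datetime

FIRST_PRODUCTION_YEAR = 1871  # assumed first production year
YEARS_PER_STREAM = 5          # nominal production length of a stream


def stream_schedule(stream_number):
    """Return nominal spin-up and production dates for a stream (1-based)."""
    first_year = FIRST_PRODUCTION_YEAR + YEARS_PER_STREAM * (stream_number - 1)
    last_year = first_year + YEARS_PER_STREAM - 1
    return {
        # Spin-up starts 14 months before production: 1 November, two years earlier.
        "spin_up_start": datetime(first_year - 2, 11, 1, 0),
        "production_start": datetime(first_year, 1, 1, 0),
        # The last 6-hourly analysis of the final year is at 1800 UTC.
        "production_end": datetime(last_year, 12, 31, 18),
    }


for number in (5, 6):
    print(number, stream_schedule(number))
```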

Unlike the individual ensemble members, the ensemble mean analysis is continuous across the stream boundaries, as expected. This continuity across production streams was tested by extending several streams past their fifth-year boundary into the January of a year 6 and comparing the ensemble mean analyses to the same dates from the year 1 of the overlapping actual production stream. Comparison of root-mean-square (r.m.s.) differences and visual inspection of these analyses with those from the production stream confirmed that the year 1 January analyses were not significantly different from those that would have been obtained by extending the previous stream (not shown).

6. Evaluation of synoptic variability

The 20CRv2 dataset represents the first attempt to generate synoptic analyses of the global troposphere back to 1871. It is also the first reanalysis dataset to estimate the uncertainty in the analysis fields at each analysis time. The ensemble nature of the Filter makes such an estimate particularly convenient compared to variational methods such as 3D-Var or 4D-Var. Additionally, the ensemble of analyses for any particular time can be viewed. Figure 2 illustrates some of these new aspects of this dataset, showing the 500 hPa geopotential height fields for several ensemble members on 29 January 1922 0000 UTC during the ‘Knickerbocker’ storm (Kocin and Uccellini, 2004), and also the ensemble mean analysis (corresponding to the line contours in Figure 1(b)) and the ensemble standard deviation (corresponding to the shading in Figure 1(b)). The fields illustrate that the reanalysis ensemble provides a sample of a probability distribution that includes probable upper-air fields that are dynamically consistent with concurrent and previous pressure observations. From the Monte Carlo aspect of the Ensemble Kalman Filter theory, each member is equally likely. When errors are distributed as a Gaussian, the ensemble mean analysis (Figure 2) gives the most likely state, while the ensemble standard deviation is one measure of the uncertainty in that state.

Figure 2.

Top row shows the 56-member ensemble mean analysis and the ensemble spread for 500 hPa geopotential height from Figure 1 for 0000 UTC on 29 January 1922. Subsequent rows show the 500 hPa geopotential height field from every second ensemble member. The contour interval is 50 m, with a bold contour at 5550 m. For the ensemble spread, the contour interval is 5 m, with the 20 m line thickened.

To evaluate whether the ensemble uncertainty varies as expected with the time-changing observation network, in Figure 3 we compare the expected error (red curves) to the actual r.m.s. differences of the first-guess ensemble mean and available observations (blue curves). The expected error is calculated as the sum of the observation-error variance and first-guess ensemble variance at the observation location (σb² + R). This is averaged for all observations during each year, and the square root is taken to display the results in units of hPa. Similarly, the squared difference of the observation and the ensemble mean first guess at the observation location is averaged over all observations in each year, and the square root is taken. Results are shown for both the Northern Hemisphere (≥20°N, Figure 3(a)) and the Southern Hemisphere (≤20°S, Figure 3(b)). Also shown is the number of observations used in the analysis (black curves; note the log scale). The figure demonstrates the good overall consistency of the expected and actual errors, and also their decrease in time over the twentieth century, as expected from increases in the number of observations.
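The two curves compared in Figure 3 can be computed from the feedback quantities as in the following sketch. It assumes the per-observation first-guess variance, observation-error variance, observation and ensemble-mean first guess for one year are already available as arrays; the function and argument names are hypothetical.

```python
import numpy as np


def expected_rms(first_guess_var, obs_err_var):
    """Square root of the mean of (first-guess variance + R) over all obs."""
    total = np.asarray(first_guess_var, dtype=float) + np.asarray(obs_err_var, dtype=float)
    return float(np.sqrt(np.mean(total)))


def actual_rms(obs, first_guess_mean):
    """Square root of the mean squared (observation minus first guess)."""
    diff = np.asarray(obs, dtype=float) - np.asarray(first_guess_mean, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# Synthetic check: innovations drawn with variance equal to first-guess variance + R.
rng = np.random.default_rng(0)
n_obs = 20000
fg_var = np.full(n_obs, 4.0)      # hPa^2, illustrative value
r_var = np.full(n_obs, 2.25)      # hPa^2, illustrative value
first_guess = np.zeros(n_obs)
obs = first_guess + rng.normal(0.0, np.sqrt(4.0 + 2.25), n_obs)
print(expected_rms(fg_var, r_var), actual_rms(obs, first_guess))
```

When the ensemble spread and the specified R are consistent with the true errors, the two numbers agree to within sampling noise; a persistent gap, as in the two periods discussed next, points to a mis-specified spread or observation error.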

Figure 3.

Time series of the 6-hourly first-guess root-mean-square (r.m.s.) difference from pressure observations (blue) and expected r.m.s. difference (red) calculated over individual years from 1870 to 2008 for the extratropical (a) Northern Hemisphere (20°N–90°N) and (b) Southern Hemisphere (20°S–90°S). The square root is calculated on the annual mean square values. The thin black curve shows the average number of pressure observations for each analysis in the indicated year (note the logarithmic scale).

Over the Northern Hemisphere, two periods of discrepancies between the actual and expected errors stand out: 1871 to 1890, and 1960 to 1975. The discrepancy in the earlier period is likely a result of setting the inflation factor r to 1.0 in the extratropical Northern Hemisphere when the first-guess spread is larger than 17 hPa anywhere in the region. In the late nineteenth century, the spread in the sparsely observed Arctic and North Pacific often exceeded this threshold, which triggered this setting and resulted in no inflation being applied over the extratropical Northern Hemisphere despite a reduction in spread occurring in the relatively observation-rich areas of the United States and Europe. In these areas, then, the spread became an underestimate of the first-guess uncertainty.

The discrepancy in the 1960 to 1975 period may have a different explanation. The errors in the marine pressure observations in this period may have been larger than specified in our reanalysis, as suggested through independent analyses of the marine observations themselves (Kent and Berry, 2005; Chang, 2005, 2007). Assuming that the largest part of the observation error is the so-called error of representativeness, observation errors were kept fixed over the entire reanalysis period. The results of Kent and Berry (2005) and Chang (2005, 2007) combined with Figure 3(a) suggest that some time variation in the specified observation-error variance R may be necessary to improve the consistency of the actual and expected errors in Figure 3.

The reanalysis quality can be additionally assessed by considering the forecast skill for a longer lead time than the 6-hour lead time shown in Figure 3. Every 24 hours, using the 1200 UTC ensemble mean analysis as the initial condition, a 24-hour forecast of surface pressure and SLP was made as the 20CRv2 analyses were generated. Figure 4 shows time series of the forecast skill measured as the r.m.s. differences between the forecasts and surface pressure observations in the Northern and Southern Hemispheres poleward of 20° for the period 1870 to 2008. For comparison, the expected skill of a persistence forecast at each observation location is shown by the black curves. This persistence skill was estimated using the average squared 24-hour pressure tendencies in the period 1971 to 2000 from NNR at the locations of the observations used to verify a given forecast. Because of this, the persistence skill metric varies with the observation network. For the Northern Hemisphere, the 24-hour forecasts are superior to those expected from persistence even in 1870 and continue to improve over the period of record. For the Southern Hemisphere, forecast skill is similar to persistence starting in 1900 but is not better than persistence until about 1950.
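A persistence baseline of this kind reduces to the r.m.s. of the 24-hour pressure tendency. The sketch below assumes a regularly sampled 6-hourly pressure series at a single location; the function names are hypothetical and the climatological averaging over 1971 to 2000 is not reproduced.

```python
import numpy as np


def persistence_rms(pressure_6h, steps_per_day=4):
    """R.m.s. error of forecasting 'no change' over 24 hours."""
    p = np.asarray(pressure_6h, dtype=float)
    # 24-hour tendency: value one day later minus the current value.
    tendency = p[steps_per_day:] - p[:-steps_per_day]
    return float(np.sqrt(np.mean(tendency ** 2)))


def forecast_rms(forecast, obs):
    """R.m.s. difference between forecasts and verifying observations."""
    diff = np.asarray(forecast, dtype=float) - np.asarray(obs, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


# A forecast adds value over persistence when forecast_rms < persistence_rms.
series = [1012.0, 1010.5, 1008.0, 1006.5, 1007.0, 1009.5, 1011.0, 1013.5]
print(persistence_rms(series))
```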

Figure 4.

Time series of 24-hour r.m.s. forecast skill verified using all pressure observations (blue) and the expected skill from 24-hour persistence (black) from 1870 to 2008 calculated over the extratropical (a) Northern Hemisphere (20°N–90°N) and (b) Southern Hemisphere (20°S–90°S). The square root is calculated on the annually averaged squared differences between the observation and the forecast. The expected persistence skill is determined from the climatological variance of the 24-hour pressure tendency from NCEP–NCAR reanalyses at the locations of the observations.

It is also important to compare the 20CRv2 analyses with independent measurements. Figure 5 compares the 20CRv2 with a long record of upper-air data: a compiled series of kite, aircraft, balloon, and radiosonde observations of 500 hPa geopotential height and 850 hPa air temperature made from 1905 to 2006 at Lindenberg, Germany (Adam and Dier, 2005) from the Comprehensive Historical Upper Air Network dataset (CHUAN; Stickler et al., 2010). The 20CRv2 analysis is interpolated to the time and location of these observations. The annual cycle has been computed separately for each series and removed to form anomalies. The figure shows that the correlation between the series is quite high, with correlations from the kite, aircraft, pilot balloon, and radiosonde compilation during the period 1905 to 1938 at 0.90 for the geopotential height and 0.86 for temperature, and correlations for the period of radiosonde measurements alone exceeding 0.96 for geopotential height and 0.9 for temperature. (We note that the historical upper-air data likely have considerable uncertainties of their own, although quantification of these uncertainties is difficult.) The high correlations, even for the early period, suggest that a high-quality analysis has indeed been achieved throughout the twentieth century for this relatively well-observed region. For Figure 5(a), the radiosonde-based correlation is similar to the current pattern correlation skill of operational NWP forecasts of 500 hPa geopotential height made by the NCEP GFS three days in advance (NCEP, 2010).
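The anomaly correlation used in this comparison can be sketched as below. The text does not say how the annual cycle was estimated, so the sketch assumes a simple calendar-month mean computed separately for each series; the function names are hypothetical.

```python
import numpy as np


def remove_annual_cycle(values, month):
    """Subtract the calendar-month mean of the series from each value."""
    values = np.asarray(values, dtype=float)
    month = np.asarray(month)
    anomalies = np.empty_like(values)
    for m in np.unique(month):
        sel = month == m
        anomalies[sel] = values[sel] - values[sel].mean()
    return anomalies


def anomaly_correlation(obs, analysis, month):
    """Correlate two co-located series after removing each one's annual cycle."""
    obs_anom = remove_annual_cycle(obs, month)
    ana_anom = remove_annual_cycle(analysis, month)
    return float(np.corrcoef(obs_anom, ana_anom)[0, 1])
```

Removing the cycle from each series separately means that a difference in mean or in seasonal amplitude between the observations and the analysis does not affect the correlation; only the agreement of the departures from each dataset's own climatology is measured.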

Figure 5.

Comparison of (a) 500 hPa geopotential height anomalies (m) and (b) 850 hPa air temperature anomalies (K) at Lindenberg, Germany (52.22°N, 14.12°E) from upper-air observations in the Comprehensive Historical Upper Air Network (CHUAN; Stickler et al., 2010) and 20CRv2 analyzed anomalies interpolated to the time and location of the observations. Anomalies are with respect to the mean annual cycle of the period shown. Blue dots show anomalies from 1950 to present, and red dots show anomalies from data spanning 1905 to 1938. There are no observations from this station in the CHUAN compilation from 1938 to 1949. The number of red dots prior to the commencement of regular radiosoundings at the Lindenberg Observatory (Dubois et al., 2002) in 1932 is (a) 1166 and (b) 13715.

In Figure 6, we examine the r.m.s. error of the 20CRv2 compared to the long series of frequent lower-tropospheric temperature measurements contained in Figure 5(b). Figure 6 shows the expected value of the r.m.s. error (red curves), calculated as the sum of the observation error variance, specified as 2 K² (Diamond et al., 1938; Brönnimann, 2003), and the analysis ensemble variance at the observation location. This is averaged for all 850 hPa air temperatures from Lindenberg during each year, and the square root is taken to display the results in units of K. Similarly, the mean-square differences from Figure 5(b) are calculated over all observations in each year, and the square root is taken (blue curves). The actual and expected differences have similar decadal variability, consistent with the increasing analysis quality and improving measurement quality over the twentieth century. In contrast, the interannual agreement is fair before the 1930s, but poorer once the radiosonde era commences and 20CRv2 has a more consistent number of pressure observations (Figure 3(a)). A particularly noteworthy disagreement occurs in 1971 and 1972. Whether this is related to the changing radiosonde observing practices at this time (e.g. Kistler et al., 2001), a change in instrumentation that occurred at the site in 1971 (Adam and Dier, 2005), or to a concurrent discrepancy in the 20CRv2 first-guess spread and error correspondence seen in Figure 3(a), is an issue of current investigation. We suspect that the change in instrumentation is the likely cause, given the good correspondence of the 1971 and 1972 radiosonde data from the nearby Berlin station (52.48°N, 13.4°E) with 20CRv2 (not shown). The generally lower values of actual error compared to expected error in the 1980s to the 2000s probably reflect the increased accuracy of the instrument, as the 20CRv2 use of observations in the extratropical Northern Hemisphere is nearly constant during this time (Figure 3(a)).

Figure 6.

Time series of r.m.s. difference between 850 hPa air temperature anomalies at Lindenberg, Germany (52.22°N, 14.12°E) taken from upper-air observations in the Comprehensive Historical Upper Air Network (CHUAN; Stickler et al., 2010) and 20CRv2 analyzed anomalies interpolated to the time and location of the observations shown in Figure 5 (blue curve). Also shown is the expected r.m.s. difference calculated over individual years from 1905 to 2008 (red curve). Anomalies are computed separately for each dataset with respect to the mean annual cycle of the period shown. The square root is calculated on the annual mean square values. Units are K.

A global assessment of the quality of the 20CRv2 upper-air fields can be made by comparing the geopotential height fields with other reanalyses that also assimilated upper-air observations: ERA-40 (Uppala et al., 2005) and NNR (Kalnay et al., 1996). We first compare the 20CRv2 geopotential height fields with the ERA-40 fields for the period 1958–1978, i.e. before substantial amounts of satellite observations were available to the ERA-40 system (Figure 7). Shaded contours show the local anomaly correlations of the four-times-daily 20CRv2 and ERA-40 anomalies of 300 hPa geopotential height (anomalies are computed with respect to each dataset's annual cycle). We obtain similar results for comparisons with NNR, with higher correlations in the Southern Hemisphere (not shown). The correlations with both reanalysis datasets are also higher at lower levels in the troposphere (not shown). To illustrate where the ERA-40 might be considered a highly reliable estimate of the observed variability, the thick black line contour in Figure 7 shows the region poleward of which the correlation of the ERA-40 and NNR 300 hPa geopotential height anomalies is greater than 0.975 for this period. Over most of this region, the sub-daily anomalies of 20CRv2 and ERA-40 correlate higher than 0.9, with some large areas, mainly over the Pacific and Atlantic storm track regions, correlating at higher than 0.95. Correlations over the Tropics and Oceania region exceed 0.65.

Figure 7.

Map of the local anomaly correlation between four-times-daily anomalies of 300 hPa geopotential from ERA-40 and 20CRv2 over the period 1958 to 1978. Anomalies are computed separately for each dataset with respect to the mean annual cycle of the period shown. The area north of the thick black line is where ERA-40 and NCEP–NCAR reanalyses correlate highly (≥0.975).

In contrast, away from the Southern Hemisphere midlatitude land regions where ERA-40 would have been expected to have assimilated radiosonde observations (Uppala et al., 2005), the correlations of the 20CRv2 and ERA-40 anomalies are generally below 0.5. Indeed they are lower than expected from our previous feasibility study using observing system experiments (Compo et al., 2006).

To further investigate this issue and expand the comparison with independent observations in general, Figure 8 shows correlations computed over the same 1958–1978 period, but between 20CRv2 and 300 hPa geopotential height observations obtained directly from the CHUAN dataset of radiosonde observations (Stickler et al., 2010), which during this period consists of observations from the Integrated Global Radiosonde Archive (Durre et al., 2006) with the RAOBCORE v1.4 correction applied (Haimberger, 2007). Correlations are plotted in Figure 8 only if a sounding site contained at least 730 observations over the 21-year period. While the correlations over the Northern Hemisphere are consistent with those obtained with the upper-air-based reanalyses (e.g. Figure 7), correlations for the Tropics and extratropical Southern Hemisphere are not. In particular, correlations with the in situ observations from the Southern Hemisphere Extratropics are considerably higher than with ERA-40. For example, the correlation with the radiosonde data exceeds 0.85 for the station on the Antarctic Peninsula, where the correlations with ERA-40 are less than 0.5. In contrast, correlations with the in situ data near the Equator are lower than with ERA-40.

Figure 8.

Correlation between 300 hPa geopotential height subdaily anomalies from 20CRv2 and radiosondes for 1958–1978. Anomalies are computed separately for each dataset with respect to the mean annual cycle of the period shown. Values are only shown for radiosonde stations having at least 730 observations during the 21-year period.

Correlations between 20CRv2 and ERA-40 for the period 1979 to 2001, when the latter used substantial amounts of satellite observations and a considerably larger number of radiosonde observations (Uppala et al., 2005), are shown in Figure 9. The correlations are now high in both extratropical hemispheres, with values in some regions exceeding 0.95, and generally exceeding 0.9 in the Northern Hemisphere middle and high latitudes where ERA-40 and NNR agree well. The overall high correspondence over the globe, with most regions correlating at greater than 0.75, provides further evidence of the overall quality of the 20CRv2 fields when pressure observations are available.

Figure 9.

As Figure 7, but calculated for the period 1979–2001. In the Southern Hemisphere, the area between the two thick black lines is where ERA-40 and NCEP–NCAR reanalyses correlate highly (≥0.975).

Another common way to assess analysis quality is to compare the skill of forecasts initialized from the analyses with that of other forecasting systems. In Figure 10, the r.m.s. errors of 24-hour forecasts of marine SLP observations are shown separately for the Northern and Southern Hemispheres for three different sets of forecasts: those from ERA-40 (red curve), NNR (yellow curve), and 20CRv2 (blue curve). Note that NNR forecasts are actually 21-hour forecasts for the period 1948–1957, as analyses were generated at 1500 UTC instead of 1200 UTC during this period (Kistler et al., 2001). Also note that while daily forecasts are shown for 20CRv2 and ERA-40, 12-hour and longer forecasts were generated only every 5 days in the NNR project. The expected r.m.s. errors of 24-hour persistence forecasts, computed at the observation locations as described previously, are also shown (black curve). It is readily apparent that over the Northern Hemisphere, the ERA-40 and NNR forecasts are superior to those of 20CRv2 for the entire period, with the exception of NNR forecasts during 1972, when the forecast skill was comparable. A similarly degraded forecast skill in the NNR system during the early 1970s is also seen in 5-day forecasts of 500 hPa heights verifying against the NNR analyzed fields (Kistler et al., 2001). Comparing with the ERA-40 forecast skill, we interpret these results as suggestive of additional undetected coding errors (Kistler et al., 2001) in the NNR radiosonde archive of this period that were corrected in the ERA-40 archive, rather than of real variations in predictability.

Figure 10.

Time series showing the r.m.s. difference between 24-hour forecasts of SLP and observations of SLP from marine platforms over the extratropical (a) Northern Hemisphere (20°N–90°N) and (b) Southern Hemisphere (20°S–90°S). Forecast verification is shown for forecasts from 20CRv2 (blue), ERA-40 (red), and NCEP–NCAR (orange) reanalyses. The thin black curve shows the expected skill of 24-hour persistence forecasts at the observation locations. Mean square differences are computed for each calendar year and then the square root is taken.

The Southern Hemisphere forecast skill in Figure 10(b) presents a surprising contrast to that of the Northern Hemisphere. Forecasts made with 20CRv2 initial conditions now have smaller errors than both persistence forecasts and forecasts made with NNR and ERA-40 initial conditions until about 1975 and 1979, respectively, and remain comparable until 1984. The ERA-40 and NNR begin to show improved skill with the assimilation of satellite data, with observations from the Vertical Temperature Profile Radiometer (VTPR) satellite being assimilated into ERA-40 starting in 1973 (Uppala et al., 2005) and into the NNR starting in mid-1975 (R. Kistler, pers. comm., 2010). Note that the time variation of the persistence errors reflects the increasing density and expanded spatial coverage of marine observations for this region (e.g. Woodruff et al., 2010). These lower 24-hour forecast errors for 20CRv2 are consistent with the high correlations of 20CRv2 to radiosonde observations seen in Figure 8. We note a peculiar degradation of the forecasts in 1983 for all three systems. This variation suggests either an issue with the underlying pressure observations or the prescribed SSTs, as 20CRv2 would have no dependency on any other observing system. An alternative explanation, that the lower forecast skill reflects a substantial interannual variation in predictability coincident with the outstanding El Niño episode of 1982–1983, seems unlikely given the absence of another such unanimous drop in forecast skill in any other year in Figure 10.

The results in Figures 7, 9, and 10 are indicative of improvement in the quality of ERA-40 in the Southern Hemisphere as satellite data become available (Bromwich and Fogt, 2004; Bengtsson et al., 2004b; Uppala et al., 2005). A reasonable hypothesis for this effect is that the use of constant background-error covariances that were consistent with higher-quality forecasts in the satellite era was inappropriate for, and reduced the quality of, the ERA-40 analyses in the pre-satellite era. The lower skill of 24-hour forecasts prior to the 1970s using ERA-40 and NNR initial conditions, compared to 20CRv2 initial conditions, in the Southern Hemisphere lends support to this hypothesis. To the extent that better upper-level fields lead to better SLP forecasts beyond a few hours, these results also suggest that the upper-tropospheric 20CRv2 fields before about 1975 may be more accurate in the Southern Extratropics.

Overall, the analysis quality shown in Figures 7–10 is largely consistent with previous observing system experiments using only surface pressure observations (Whitaker et al., 2004; Compo et al., 2006; Whitaker et al., 2009). These results are expected to be representative of the 20CRv2 quality during early periods for regions where similar pressure data densities are available.

7. Representation of mean climate and climate variability

Assimilating only surface pressure and SLP observations and prescribing monthly-mean SST and sea-ice concentration, we have generated full three-dimensional estimates of the state of the troposphere every six hours from 1871 to 2008 in the 20CRv2. Figure 11 shows latitude–height sections of the zonally averaged 138-year mean zonal wind and temperature in this reanalysis dataset. The principal features of these zonally averaged quantities are generally as expected, and also generally consistent with those in the ERA-40 and the NNR datasets. The mean differences of the 20CRv2 from these datasets, over a common 1979 to 2001 post-satellite era, are detailed in Appendix A. One common difference worth highlighting here is a warm bias in the lower-tropospheric polar temperatures of the 20CRv2 with respect to both the ERA-40 and the NNR. We have identified this as an error arising from a mis-specification of the sea-ice concentration near coastal areas discussed in section 2, which will be corrected in future versions of our reanalysis.

Figure 11.

Zonal mean of (a) zonal wind speed and (b) air temperature from 20CRv2 averaged over the period 1871 to 2008. The contour interval is (a) 2 m s−1 and (b) 5 K. In (a), the zero contour is thickened and negative contours are dotted.

We recognize that the reproduction of well-known climatological features in Figure 11, while reassuring, does not provide a hard test for this reanalysis: as already demonstrated in numerous studies, most of those features can also be captured in ‘AMIP’-style atmospheric general circulation model (AGCM) integrations with prescribed observed SSTs and sea-ice concentrations alone (e.g. Gates et al., 1999; Anderson et al., 2004; Hurrell et al., 2006), i.e. without any other observational input. A harder test would be the reproduction of the climatological features of synoptic variability. If the surface pressure observations were having no impact on our analyses, then the synoptic variability in our 56-member ensemble-mean analysis would be muted by the averaging out of random phase errors in each ensemble member, as is the case with the ensemble mean of ‘AMIP’ integrations. Indeed, for an infinite-member ensemble, the resulting variability would consist of low-frequency variations associated only with the imposed SST and sea-ice variations. It is therefore of considerable interest to examine the statistics of synoptic variability present in the ensemble-mean 20CRv2 analyses. Figure 12 provides important reassurance in this regard. It shows that the statistics of extratropical ‘storm tracks’ in the 20CRv2, as represented by the variance of 2–7 day band-pass filtered anomalies (e.g. Blackmon, 1976; Compo and Sardeshmukh, 2004) of 500 hPa geopotential height, are very similar in both the Northern and Southern Hemispheres to those obtained from NNR for the entire 1948–2008 period. Similar results are obtained (Figure 13) for the ‘storm tracks’ represented in the variance of the band-pass filtered 500 hPa vertical velocity. 
This is a pleasant surprise, in that the 500 hPa vertical velocity was not considered a well-analyzed variable in NNR (a so-called class ‘B’ quantity, in the terminology of Kalnay et al., 1996), but the storm-track features are remarkably robust in the two datasets. Analysis of this ‘vertical velocity storm track’ measure is particularly useful; its interannual variability is closely related to that of precipitation (Compo and Sardeshmukh, 2004) and, as noted by Chang (2009), it does not appear to suffer from a Doppler effect observed in ‘geopotential height storm track’ measures (Burkhardt and James, 2006).
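The storm-track measure described above, the variance of 2–7 day band-pass filtered anomalies, can be illustrated with a short sketch. This is not the filter used for Figures 12 and 13: it assumes a regularly sampled time series at a single grid point and swaps in a simple spectral (FFT) brick-wall filter. The function name `bandpass_fft` and its parameters are hypothetical.

```python
import numpy as np

def bandpass_fft(x, dt_days, min_period=2.0, max_period=7.0):
    """Keep only Fourier components with periods between min_period and
    max_period (in days). Assumes x is regularly sampled every dt_days."""
    x = np.asarray(x, dtype=float)
    freqs = np.fft.rfftfreq(x.size, d=dt_days)       # cycles per day
    spec = np.fft.rfft(x - x.mean())                 # spectrum of the anomalies
    keep = (freqs >= 1.0 / max_period) & (freqs <= 1.0 / min_period)
    return np.fft.irfft(np.where(keep, spec, 0.0), n=x.size)

# Synthetic 6-hourly series: a 4-day "synoptic" wave plus a 30-day wave.
t = np.arange(480) * 0.25                            # 120 days, in days
series = np.sin(2 * np.pi * t / 4.0) + np.sin(2 * np.pi * t / 30.0)

filtered = bandpass_fft(series, dt_days=0.25)
storm_track_std = np.sqrt(filtered.var())            # square root of the variance
print(storm_track_std)
```

Applied at every grid point of a 500 hPa field, the square root of this variance gives a map comparable in kind to those plotted in the figures. A brick-wall filter is the simplest choice for a sketch; a tapered filter would reduce ringing on real data.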

Figure 12.

Climatological winter mean (December–February) variance of 2–7-day bandpass filtered Northern Hemisphere 500 hPa geopotential height. Note that the square root of each field is plotted. In (a) and (b), the climatology is calculated for the period 1948–2008 using (a) NCEP–NCAR and (b) 20CRv2 reanalysis data. (c) is as (b) but for the period 1887–1947. (d)–(f) are as (a)–(c), but for the Southern Hemisphere. Contour interval in all panels is 15 m.

Figure 13.

As Figure 12, but for the 500 hPa vertical velocity (omega) for the Northern Hemisphere only. The contour interval in all panels is 2 cPa s−1.

The storm tracks estimated from the ensemble-mean 20CRv2 analyses for the previous 61 years (1887–1947) are notably weaker than for the later (1948–2008) period, in terms of both 500 hPa ‘geopotential height’ (Figures 12(c, f)) and 500 hPa ‘vertical velocity’ (Figure 13(c)) storm tracks. Such a result should not be taken as indicative of an actual climate change. Rather, as the observational density gets lower, less synoptic variability is present in the ensemble mean analyses as fewer observations are available to increment the ensemble-mean first guess in Eq. (1). In the limit of no observations, the 20CRv2 ensemble mean analysis fields become the ensemble mean of a 56-member AGCM integration forced by SST and sea ice. In this limit, the synoptic variability in Figures 12(c), 12(f), and 13(c) should be reduced to about 1/√56 ≈ 0.13 times that observed from a more complete observing system, or an individual ensemble member. The storm track estimates for 1887–1947 have considerably greater amplitude than this limit, particularly in the Northern Hemisphere, suggesting that even the sparse synoptic information in this period has made an impact on the analyzed synoptic variability. Nonetheless, it is clear that caution is indicated in investigations of the interannual variability and trends of this and other storm-track-related quantities using the ensemble-mean 20CRv2 analyses rather than each ensemble member. A Monte Carlo method to calculate derived quantities, such as tracking storm features, in which the statistic is calculated for each ensemble member and then averaged, may prove more fruitful in this regard.
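The no-observation limit quoted above can be checked numerically. The sketch below assumes that synoptic variability in each member behaves like independent, unit-variance random noise (an idealization, not a property of the 20CRv2 members themselves). It compares the standard deviation of the ensemble mean with the same statistic computed for each member and then averaged, which is the Monte Carlo ordering suggested in the text. The function name and defaults are hypothetical.

```python
import numpy as np

def ensemble_std_ratio(n_members=56, n_times=20000, seed=0):
    """Ratio of the standard deviation of the ensemble mean to the
    member-averaged standard deviation, for independent synthetic noise."""
    rng = np.random.default_rng(seed)
    members = rng.standard_normal((n_members, n_times))
    std_of_mean = members.mean(axis=0).std()     # statistic of the ensemble mean
    mean_of_std = members.std(axis=1).mean()     # statistic per member, then averaged
    return std_of_mean / mean_of_std

ratio = ensemble_std_ratio()
print(ratio, 1.0 / np.sqrt(56))                  # the two values should be close
```

The per-member ordering retains the full single-member amplitude however sparse the observations are, whereas the amplitude of the ensemble mean shrinks towards the 1/√N limit as the members decorrelate.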

Beyond the synoptic scale, the statistics of lower frequency variations, including blocking events, are also often represented in terms of the variance of band-pass-filtered anomalies (e.g. Blackmon, 1976; Compo et al., 2001). Figure 14 shows the standard deviation of 500 hPa geopotential height anomalies in the 7.5–45 day period band. The 20CRv2 fields for 1948–2008 compare well with NNR fields albeit with slightly weaker maxima, similar to what was seen in Figure 12. Also as in Figure 12, the standard deviation for 1887 to 1947 is generally smaller than for 1948 to 2008 over the North Pacific and western North America, again reflecting the weaker variability of the ensemble-mean analyses in the case of sparse observations. Still, over the relatively observation-rich North Atlantic region, Figures 14(b,c) are quite similar in the two periods.

Figure 14.

As Figure 12, but showing the square root of the variance of 7.5–45-day bandpass filtered 500 hPa geopotential height.

One might suspect that while synoptic and submonthly variability throughout the troposphere could be reasonably well captured using only synoptic surface observations, lower-frequency variability might be more poorly represented (Kanamitsu and Hwang, 2006). However, Figure 15 shows that the spatial patterns of the Northern Hemisphere monthly anomalies of 300 hPa geopotential height for 20CRv2 correspond well with those of ERA-40 and NNR, and continue to show the impact of the surface pressure observations in the reanalysis even at this relatively long time-scale beyond that associated with the prescribed observed SST and sea-ice boundary conditions. Time series of the pattern correlation between monthly anomalies from 20CRv2 and NNR for the months of December and June are shown. The results for all other months lie between these two extremes. The correlations with the ERA-40 fields for December and June are similar. The increase of June correlations from an average of 0.84 in 1958–1978 to an average of 0.89 in 1979–2001 most likely reflects the increasing use of satellite observations in ERA-40 and the NNR.

Figure 15.

Time series showing anomaly pattern correlations between monthly mean anomaly fields of Northern Hemisphere extratropical 300 hPa geopotential height from two upper-air-based reanalyses and 20CRv2 for the months of December (cool colours) and June (warm colours). Correlations with ERA-40 (NCEP–NCAR) reanalysis fields are shown by the blue (cyan) and red (orange) curves. For each ERA-40 series, the horizontal coloured lines show the mean value of the correlations when averaged before the period of substantial satellite observations (1958–1978) and then afterwards (1979–2001). The black curves show the expected anomaly pattern correlations when only the observed sea-surface temperature fields are available for the months of (thick) December and (thin) June. The horizontal black lines show the mean value of these expected correlations, with the thicker line for December and the thinner for June. Anomalies are with respect to the mean annual cycle of the period shown.

Figure 15 further demonstrates the impact of synoptic surface pressure observations even on upper-tropospheric monthly mean anomalies beyond that associated with prescribed observed SST and sea-ice boundary conditions. To assess whether the pattern correlations of the 20CRv2 fields with the ERA-40 and NNR fields merely reflect the correlations associated with response patterns to boundary forcing, the same pattern correlations were computed, but in a perfect model context using a 24-member ensemble of ECHAM4.5 AGCM (Roeckner et al., 1996) integrations generated by the International Research Institute using prescribed observed SSTs. To calculate the perfect model curves, the monthly-mean 300 hPa geopotential height anomaly field from a single ensemble member was treated as the ‘observed’ field and the mean of the remaining 23 anomaly fields was correlated with it. This procedure was repeated for all 24 members and all months. The average pattern correlation is plotted for the months of December and June. These perfect model correlations are clearly much lower than the pattern correlations of the 20CRv2 with the ERA-40 and NNR fields. The slight apparent increase in those correlations from ∼0.18 before 1979 to ∼0.23 afterwards is not statistically significant at the 5% level.
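The leave-one-out procedure behind the perfect model curves can be sketched as follows. This is a minimal illustration under stated assumptions: the anomaly fields for one month are supplied as a hypothetical array of shape (members, grid points), the pattern correlation is centred, and area weighting by latitude is omitted for brevity. The function names are not taken from the original analysis.

```python
import numpy as np

def pattern_correlation(a, b):
    """Centred pattern correlation between two anomaly fields (no area weights)."""
    a = np.ravel(a) - np.mean(a)
    b = np.ravel(b) - np.mean(b)
    return float(a @ b / np.sqrt((a @ a) * (b @ b)))

def perfect_model_correlation(members):
    """Treat each member in turn as the 'observed' field and correlate it
    with the mean of the remaining members; return the average correlation."""
    members = np.asarray(members, dtype=float)
    n = members.shape[0]
    total = members.sum(axis=0)
    scores = []
    for i in range(n):
        rest_mean = (total - members[i]) / (n - 1)   # mean of the other n-1 members
        scores.append(pattern_correlation(members[i], rest_mean))
    return float(np.mean(scores))

# Synthetic example: a common 'forced' pattern plus member-specific noise.
rng = np.random.default_rng(1)
forced = rng.standard_normal(5000)
ensemble = forced + 2.0 * rng.standard_normal((24, 5000))
print(perfect_model_correlation(ensemble))
```

In this idealized setting the result depends only on the ratio of forced to internal variance, which is why such a curve serves as an estimate of the correlation attainable from boundary forcing alone.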

Climate variability is often represented by climate indices of seasonally averaged data (e.g. as recently reviewed by Brönnimann et al., 2009). In Figure 16, we examine three climate indices calculated from the 20CRv2 and compare them with five other estimates. Time series representing the Pacific Walker Circulation index (PWC; Figure 16(a)), the North Atlantic Oscillation (NAO; Figure 16(b)), and the Pacific–North America Pattern (PNA; Figure 16(c)) are shown from the reanalysis datasets of 20CRv2, ERA-40, NNR, and ERA-Interim, from statistical reconstructions, and from a nine-member ensemble of boundary- and chemistry-forced integrations using the Solar Climate Ozone Links (SOCOL) atmospheric-chemical GCM (Schraner et al., 2008). The SOCOL chemistry-climate GCM was integrated in the ‘all forcings’ configuration over the period 1901–1999 (Fischer et al., 2008), with a T30 horizontal truncation and 39 vertical levels. Each integration had prescribed boundary conditions of monthly mean SST and sea ice from HadISST (Rayner et al., 2003) and prescribed land surface conditions, stratospheric aerosols, solar variability, surface concentrations of greenhouse gases and ozone-depleting substances, emissions of short-lived species, and the Quasi-Biennial Oscillation in the stratosphere. Their climate variability was previously examined by (e.g.) Brönnimann et al. (2009) and Scaife et al. (2009).

Figure 16.

Time series of seasonally averaged climate indices representing (a) the tropical September to January Pacific Walker Circulation (PWC), (b) the December to March North Atlantic Oscillation (NAO), and (c) the December to March Pacific North America (PNA) pattern. Indices are calculated from various sources: 20CRv2 (pink); statistical reconstructions using Brönnimann et al. (2009) for the PWC, Griesser et al. (2010) for the PNA, and HadSLP2 (Allan and Ansell, 2006) for the NAO (all cyan); NCEP–NCAR reanalyses (NNR; dark blue); ERA-40 (green); ERA-Interim (orange); and SOCOL ensemble mean (dark grey). The light grey shading indicates the minimum and maximum range of the SOCOL ensemble. All indices are computed with respect to the overlapping 1989–1999 period. Indices are defined as in Brönnimann et al. (2009).

The definitions of the indices are the same as used in Brönnimann et al. (2009). The PWC is defined following Oort and Yienger (1996) as the difference in the area-averaged 500 hPa vertical pressure velocity between the regions of (10°S–10°N, 180–100°W) and (10°S–10°N, 100–150°E). The NAO is defined as the difference in the standardized monthly SLP anomalies at Ponta Delgada (Azores) and Reykjavik (Iceland). As the datasets are gridded, the nearest grid points are used. The PNA is defined following Wallace and Gutzler (1981) as

$$\mathrm{PNA} = \tfrac{1}{4}\,\bigl[\,Z(P) - Z(Q) + Z(R) - Z(S)\,\bigr] \qquad (8)$$

where Z is the standardized 500 hPa monthly geopotential height anomaly at points P (20°N, 160°W), Q (45°N, 165°W), R (55°N, 115°W), and S (30°N, 85°W). All anomalies were determined with respect to the maximal overlap period for all datasets (1989–1999). Note that the statistical reconstructions (RECs) of the PWC (Brönnimann et al., 2009) and PNA (Griesser et al., 2010) span the period of 1901 to 1947 and 1880 to 1957, respectively. Unlike the reanalyses and SOCOL integrations, neither REC uses SST information, employing only the available monthly-mean SLP, land air temperature, and upper-air observations (Brönnimann et al., 2009; Griesser et al., 2010). The statistics for reconstructing PWC are referenced to the NNR (1948–2004) and those for the PNA are referenced to the ERA-40 dataset (1958–2001). The Hadley Centre sea-level pressure dataset (HadSLP2; Allan and Ansell, 2006), used as the statistical reconstruction for the NAO index, spans the full period. HadSLP2 also does not use SST information, incorporating only monthly mean SLP observations from marine and land platforms.
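The index definitions above can be sketched in a few lines. The sketch assumes one value per year at each point, that standardization uses the mean and standard deviation of the 1989–1999 reference period (the text specifies that period for the anomalies but not for the scaling), and that the PNA follows the four-point combination of Wallace and Gutzler (1981), one quarter of Z(P) − Z(Q) + Z(R) − Z(S). All function names are hypothetical.

```python
import numpy as np

def standardized_anomaly(x, years, ref=(1989, 1999)):
    """Anomaly from the reference-period mean, scaled by the
    reference-period standard deviation (an assumption of this sketch)."""
    x = np.asarray(x, dtype=float)
    years = np.asarray(years)
    mask = (years >= ref[0]) & (years <= ref[1])
    return (x - x[mask].mean()) / x[mask].std(ddof=1)

def pna_index(z_p, z_q, z_r, z_s, years):
    """PNA from 500 hPa heights at P (20N, 160W), Q (45N, 165W),
    R (55N, 115W) and S (30N, 85W)."""
    zp, zq, zr, zs = (standardized_anomaly(z, years) for z in (z_p, z_q, z_r, z_s))
    return 0.25 * (zp - zq + zr - zs)

def nao_index(slp_azores, slp_iceland, years):
    """NAO as the difference of standardized SLP anomalies at the grid points
    nearest Ponta Delgada (Azores) and Reykjavik (Iceland)."""
    return standardized_anomaly(slp_azores, years) - standardized_anomaly(slp_iceland, years)

# Synthetic example with hypothetical data.
years = np.arange(1980, 2009)
rng = np.random.default_rng(0)
heights = 5600.0 + 50.0 * rng.standard_normal((4, years.size))
print(pna_index(*heights, years)[:3])
```

Because every series is standardized over the same reference period, indices from datasets with different record lengths are placed on a common scale before being compared.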

Correlations for the period of overlap between the estimates of the indices from the different datasets are shown in Table IV. It is no surprise that the SLP-based NAO agrees extremely well with the observational estimates, all of which use pressure observations. For example, the correlation between ERA-40 and 20CRv2 is 0.998 for the 1958–2001 period. In contrast, the forced SOCOL integration agrees poorly with all of the observational estimates, consistent with relatively weak SST forcing of the NAO (e.g. Scaife et al., 2009). Interestingly, there is also good agreement among the observational estimates for the time series representing the upper-air pattern of the PNA. In all cases, the observation-based estimates are in better agreement with each other than they are with the anomalies from the SOCOL integrations. While the relatively high correlation of 0.701 between the SOCOL PNA and ERA-40 PNA provides evidence of the well-known SST influence on this pattern of variability (e.g. Alexander et al., 2002; Barsugli and Sardeshmukh, 2002), the substantially higher correlation of 0.992 of the 20CRv2 with ERA-40 attests to the considerable observational information in the 20CRv2.

Table IV. Correlations between various estimates of seasonal indices of climate variability for the full period of overlap from each source. Time periods used are 20CRv2 (1871–2007), ERA-40 (1958–2001), ERA-Interim (1989–2007), REC (PWC: 1901–1947, PNA: 1880–1957), HadSLP2 (1871–2007), NNR (1948–2007), SOCOL (1901–1999).

(a) Pacific Walker Circulation

              ERA-40   ERA-Interim   REC       NNR      SOCOL
ERA-40        1        0.989         NA        0.962    0.935
ERA-Interim            1             NA        0.991    0.975
REC                                  1         NA       0.828
NNR                                            1        0.880
SOCOL                                                   1

(b) North Atlantic Oscillation

              ERA-40   ERA-Interim   HadSLP2   NNR      SOCOL
ERA-40        1        0.997         0.992     0.999    0.293
ERA-Interim            1             0.980     0.983    0.090
HadSLP2                              1         0.989    0.213
NNR                                            1        0.250
SOCOL                                                   1

(c) Pacific–North America pattern

              ERA-40   ERA-Interim   REC       NNR      SOCOL
ERA-40        1        0.995         NA        0.993    0.701
ERA-Interim            1             NA        0.992    0.895
REC                                  1         0.932    0.622
NNR                                            1        0.691
SOCOL                                                   1

In contrast to the other two indices, the PWC index agrees well with all the estimates, including those from SOCOL. This is not surprising, as there is a considerable SST influence on the variations of the Walker Circulation (e.g. Oort and Yienger, 1996). Note that the higher REC correlation with SOCOL than with 20CRv2 is not statistically significant at even the 20% level, assuming 45 degrees of freedom (the number of years of overlap between the PWC REC and SOCOL minus two).

Considering longer-term variations in Figure 16, it is readily apparent that none of the indices has a demonstrable trend over the 1871 to 2008 period of 20CRv2, and a more rigorous analysis of the statistical significance of the trends supports this (not shown). Additional studies incorporating the uncertainty estimates in the 20CRv2 are under way to quantify further any significant decadal variability in Figure 16.

The results in Figures 12–16 demonstrate that the 20CRv2 reanalysis has successfully incorporated the information in synoptic surface pressure observations and its beneficial impact on estimates of the global tropospheric circulation, not only on the synoptic but also much longer time-scales. We end this section with a tantalizing look at perhaps the hardest test for such a surface-pressure-based reanalysis system: its ability to represent the mean hydroclimate and its variability. Figure 17(a) compares the 1980 to 2000 mean of zonally averaged precipitation P in the 20CRv2 and the Global Precipitation Climatology Project (GPCP, Adler et al., 2003) v.2 datasets. The comparison is generally favourable, and within the uncertainties estimated from intercomparisons among other observational precipitation datasets (e.g. Adler et al., 2003). Figure 17(b) shows the 1980–2000 mean of zonally averaged precipitation minus evaporation, P − E, in the 20CRv2, and also its change Δ(P − E) from that during the first 20-year period (1871–1891) of the reanalysis. The surprise here is that the meridional structure of Δ(P − E) does not resemble that of P − E itself. Such a resemblance might have been anticipated from simple arguments and climate model simulations (Held and Soden, 2006) as a ‘robust’ feature of the global hydrological response to global warming. Indeed at 10°N the sign of Δ(P − E) is opposite to that of P − E. Assessing the realism of such aspects of the 20CRv2 and other historical reanalysis datasets will clearly continue to be of interest.

Figure 17.

(a) Zonal mean precipitation rate P (mm day−1) averaged over 1980–2000 from 20CRv2 (black curve) and from GPCPv2 (red curve) datasets. (b) Zonal mean P minus evaporation rate E averaged over 1980–2000 from 20CRv2 (blue curve), and the 1980–2000 P − E average minus the 1871–1891 P − E average (red curve).

8. Summary and concluding remarks

To begin to address the needs of climate science for reanalysis products spanning the instrumental record, NOAA's Earth System Research Laboratory (ESRL) Physical Sciences Division and the University of Colorado CIRES Climate Diagnostics Center have led an international Twentieth Century Reanalysis Project to produce version 2 of the reanalysis (20CRv2) using US DOE supercomputers. The 20CRv2 dataset contains the first synoptic-observation-based estimate of global tropospheric variability spanning 1871 to 2008, and is derived using only observations of synoptic surface pressure and prescribing monthly SST and sea-ice distributions as boundary conditions for the atmosphere. The beneficial impact of these synoptic observations on the analysis of the tropospheric circulation has been demonstrated not only on the synoptic but on even much longer time-scales.

The first version of the dataset (20CRv1) has already been used to investigate issues as diverse as the US Dust Bowl (Cook et al., 2010), the early twentieth century Arctic warming (Wood and Overland, 2010), historical ENSO events (Giese et al., 2010), decadal Atlantic hurricane variability (Emanuel, 2010), and ocean ecology (Baird et al., 2010). The 138-year span of the 20CRv2 dataset should make it even more useful for a variety of climate applications ranging from assessments of storm track and extreme event variations to studies of drought and decadal variability to investigations into meteorological history. The dataset may also be useful for detecting inhomogeneities in independent observed time series, such as from radiosondes, utilizing, e.g., the method of Haimberger (2007).

The analysis fields are available at six-hourly temporal and 2° horizontal resolution and 24 pressure levels from NOAA ESRL and NCAR, and will also be distributed via the NOAA National Model Archive and Distribution System (NOMADS). Selected fields from individual ensemble members will be made available via the NERSC science gateway, NCAR, and NOMADS. Additionally, the complete spectral files for every ensemble member were archived, so any additional variable can be obtained for every member. These will be made available by courtesy of NCAR and NOMADS. Finally, the results of the assimilation of each observation in the ISPD, including the first guess and analysis uncertainty at the observation location, are also available by courtesy of NCAR.

With the production of the 20CRv2 version of the dataset complete, it is reasonable to consider how, in addition to using an improved NWP model in future versions, the dataset could be further improved. It is readily apparent that the current quality depends on the availability of pressure observations. We have demonstrated that, as more pressure observations become available, the reanalysis fields improve and become more certain (e.g. Figures 1, 3, 4, and 6). Building on a history of national data rescue projects (e.g. Manabe, 1999) and international exchange of marine and terrestrial data, recent ‘data archaeology’ by university researchers, national meteorological services, and ACRE and its partners suggests that millions of additional surface pressure observations from the early twentieth century and nineteenth century remain to be digitized. As an example, only 62 land stations were used to generate the 1871 fields of 20CRv2. Our recent research has uncovered 363 additional stations that could have been used if the data had been digitized. Considering even earlier periods, García-Herrera et al. (2005), for example, document an effort that digitized eighteenth and nineteenth century European national marine meteorological observations, and Wilkinson et al. (2010) describe marine SLP observations becoming available from the British East India Company for 1790–1834 and a range of other sources (also Woodruff et al., 2005). Przybylak (2009) describes the network of Polish meteorological stations back to the eighteenth century, part of the network of European stations measuring pressure extending back even to the seventeenth century (Jones, 2001). Organizations such as the World Meteorological Organization and its Joint Technical Commission (with the Intergovernmental Oceanographic Commission) for Oceanography and Marine Meteorology (JCOMM), the NOAA Climate Database Modernization Program (CDMP, Dupigny-Giroux et al., 2007), the International Environmental Data Rescue Organization, universities, national meteorological services, and ACRE are working to recover these observations and uncover additional observations over land and ocean (e.g. Brunet and Kuglitsch, 2008). Taking advantage of other observation types, such as near-surface wind or temperature, may provide another fruitful avenue for improvement (Anderson et al., 2005).

Several components of the current algorithm concerned with accounting for uncertainties arising from the use of a finite ensemble, an imperfect model, and imperfect observations are rather simplistic. In all three areas, there are opportunities for improvement. Instead of the current covariance localization and inflation, a more sophisticated accounting of model and sampling errors, such as using perturbed model parametrizations (e.g. Houtekamer et al., 2009), or prescribing spatially coherent additive stochastic noise (e.g. Whitaker et al., 2008), may reduce the discrepancy between the expected and actual errors in Figures 3 and 6. Allowing the observation error to vary adaptively (Li et al., 2009) may account for the apparent time variations in that quantity, particularly as shown for marine observations (Kent and Berry, 2005; Chang, 2005, 2007). Known issues with balance in the Ensemble Kalman Filter (e.g. Mitchell et al., 2002) are indirectly addressed with our rather large localization distance of 4000 km, but additional steps to maintain balance could be implemented, such as the so-called ‘Chi correction’ (Sardeshmukh, 1993). The uncertainties in the prescribed SST, sea-ice concentration, CO2, volcanic aerosols, and solar variations are also only mildly accounted for with the present covariance inflation. A Monte Carlo approach that accounts for all sources of uncertainty in these components of the first guess may also improve the representation of uncertainty and the correspondence of the actual and expected errors.

Additionally, using a broader time window may improve the analyses. The implementation of the Ensemble Kalman Filter used here is actually a smoother (Sakov et al., 2010), since it incorporates observations up to 3 hours after the analysis time. Khare et al. (2008) have proposed an efficient lagged ensemble smoother algorithm, potentially extending the time window of observations used out to many hours or even days after the analysis time. While errors in the representation of the covariances associated with such long times will be an obstacle, the use of a wider observation window may provide a way to extract more information from sparse observations, particularly from mobile marine platforms. The ERA-CLIM project, which will develop the next ECMWF reanalysis back to 1900, will investigate the benefit of using longer time windows in the analysis step (D. Dee, pers. comm., 2010).

The overall quality of the 20CRv2 dataset may surprise some readers. While the relevance for weather studies appears to be consistent with that anticipated from advanced data assimilation algorithms applied in observing system experiments using only surface observations (Whitaker et al., 2004, 2009; Anderson et al., 2005; Thépaut, 2006; Compo et al., 2006), the relevance for climate studies, e.g. as suggested by the high correlations of monthly-mean anomalies in Figure 15 and climate indices in Figure 16, could not have been anticipated from those short feasibility experiments. The ability to generate skilful 24-hour forecasts of surface pressure (relative to persistence forecasts) even in years as data-poor as 1871 was another pleasant surprise. Still, many additional evaluations could be performed. These are planned as future work, e.g. comparisons with the EMSLP daily SLP dataset which extends from 1850 to the present (Ansell et al., 2006), and with the monthly mean land temperature datasets produced by NCDC (Smith et al., 2008), NASA Goddard Institute for Space Studies (Hansen et al., 2001), and the Climatic Research Unit of the University of East Anglia (Brohan et al., 2006).

With further improvements to the analysis algorithm and the potential for utilizing additional recoverable observations, a backward extension of the dataset to the start of the nineteenth century or even earlier seems plausible. For such earlier periods, the quality of the dataset in the Northern Hemisphere could be expected to be comparable to that shown here for the Southern Hemisphere in the late nineteenth century. We note that in his original ‘reanalysis’, Brandes (1820) constructed synoptic maps of 1783 using the Meteorological Society of the Palatinate observations, a collection of extensive weather observations, including pressure, spanning 1781 to 1792 taken by weather observers in 18 countries on both sides of the North Atlantic (Monmonier, 1999). Additional observing-system and observing-system-simulation experiments, with careful assessment of the quality of the recovered historical observations, will be needed to provide scientific support for such an undertaking.


The Twentieth Century Reanalysis Project received support and observational data from many people, organizations, and projects. The NOAA NCEP/EMC staff's years of work improving the GFS, and particularly the help of S. Moorthi, H.L. Pan, S. Saha, R. Kistler, and S. Lord is gratefully acknowledged. The NOAA ESRL/Physical Sciences Division (PSD) and CIRES/Climate Diagnostics Center (CDC) IT staff provided invaluable computer support, especially K. Healy, R. Jesse, A. McColl, C. McColl, B. McInnes, and N. Wilde. Assistance with ISPD calculations by C. McColl is also gratefully acknowledged. The work of D. Hooper and C. Smith to make the dataset available at NOAA ESRL/PSD and CIRES/CDC, and the work of the NCAR Data Support Section, especially J. Comeaux, D. Schuster and C.-F. Shi, to make the dataset available at NCAR is gratefully acknowledged. Consulting and support provided by the staff of the US DOE National Energy Research Scientific Computing Center, especially H. He and F. Verdier, and by the staff of the US DOE National Center for Computational Sciences, especially I. Carpenter and D. Kothe, is gratefully acknowledged. Useful discussions are acknowledged with colleagues in ESRL/PSD and CIRES/CDC (especially T. Hamill, R. Webb, R. Dole, C. McColl, M. Newman, and W. Neff), colleagues at NCAR (especially J. Anderson, J. Comeaux, R. Jenne, and K. Trenberth), colleagues at ECMWF (especially D. Dee, A. Simmons, and J-N. Thépaut), colleagues at the UK Met Office Hadley Centre (especially P. Brohan and T. Ansell), colleagues at the University of Maryland (especially P. Arkin and E. Kalnay), and colleagues at the US DOE (especially A. Bamzai and K. Yelick). The US Department of Commerce-Boulder and NCAR librarians kindly assisted with manuscript research. The authors would like to thank the organizations and projects listed in Table II and the following individuals for invaluable assistance in exchanging observations for the ISPD: W. Adam of the Lindenberg Observatory, DWD; K. Andsager of NOAA's Midwestern Regional Climate Center; T. Brandsma of KNMI; J. Burroughs, S. Doty, J. Elms, K. R. Knapp, D. H. Levinson, N. Lott, T. C. Peterson, and R. Truesdell of NOAA's NCDC; J. Comeaux and C.-F. Shi of NCAR; F. Le Blancq of the Jersey Met Service; S. J. Lubker of NOAA ESRL; G. Lentini of Universita degli Studi Milano; N. Nichols of Monash University; L. Srnec of the Meteorological and Hydrological Service of Croatia; A. Stickler of ETH Zurich for CHUAN; V. Swail of Environment Canada; B. Trewin and D. Jones of the Australian Bureau of Meteorology; D. Tse of the Hong Kong Observatory; M. A. Valente of the University of Lisbon and M. Barros of the University of Porto for SIGN contributions; J. S. Woollen of NOAA/NCEP; and V. Wagner and R. Zoellner of DWD. The IBTrACS dataset benefitted from the considerable work of C. Landsea and the HURDAT Reanalysis Project. The assembly of the ISPD under the auspices of the GCOS AOPC/OOPC Working Group on Surface Pressure and the WCRP/GCOS Working Group on Observational Data Sets for Reanalysis by NOAA/ESRL, NOAA's National Climatic Data Center (NCDC), and the Climate Diagnostics Center (CDC) of the University of Colorado's Cooperative Institute for Research in Environmental Sciences (CIRES) is gratefully acknowledged. The HadISST fields are courtesy of N. Rayner, BADC, and the UK Met Office Hadley Centre. The assistance of M. Fiorino with daily interpolation of HadISST fields is acknowledged. Access to the ECHAM4.5 AGCM integrations was courtesy of the IRI data library. The ERA-40 and ERA-Interim datasets are courtesy of ECMWF. Access to the NCEP–NCAR Reanalysis dataset is courtesy of NOAA ESRL/PSD. The authors would like to thank referees D. Dee, H. Mitchell, and an anonymous reviewer for constructive comments on an earlier version of the manuscript. R. J. Allan receives support under ACRE from the Queensland Climate Change Centre of Excellence, and from the UK Joint Department of Energy and Climate Change and Department for Environment, Food and Rural Affairs Integrated Climate Programme (DECC/Defra GA01101). NCAR is sponsored by the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the authors and do not necessarily reflect the views of the National Science Foundation. S. Brönnimann and A. N. Grant were supported by the Swiss National Science Foundation. Support for the Twentieth Century Reanalysis Project dataset is provided by the US Department of Energy, Office of Science Innovative and Novel Computational Impact on Theory and Experiment (DOE INCITE) program, and Office of Biological and Environmental Research (BER), and by the NOAA Climate Goal. The project used resources of the National Energy Research Scientific Computing Center and of the National Center for Computational Sciences at Oak Ridge National Laboratory, which are supported by the Office of Science of the US Department of Energy under contract No. DE-AC02-05CH11231 and contract No. DE-AC05-00OR22725, respectively.

Appendix A: Systematic Differences with Previous Reanalyses (NNR and ERA-40)

The overall structure of the time-mean flow shown in Figure 11 suggests that the basic features of the general circulation are captured in the 20CRv2 dataset. Such a result would be expected even for three-dimensional fields generated by an AGCM forced solely with SSTs (e.g. Gates et al., 1999; Anderson et al., 2004; Hurrell et al., 2006). Figure A1 shows that the features are in general agreement with those estimated from ERA-40 and NNR. Figure A1(a) shows the difference between the 20CRv2 and ERA-40 zonal mean zonal wind speed averaged over 1979–2001. Away from the southern polar troposphere, the overall biases in the troposphere are relatively small. Similar results are seen when comparing 20CRv2 to NNR fields (Figure A1(b)). The south polar tropospheric wind biases throughout the column are smaller relative to NNR than to ERA-40. In this same vein, tropical biases are of opposite sign up to 300 hPa, though the differences are slightly larger against ERA-40. In the stratosphere, large biases are seen in both Figure A1(a) and (b), with a magnitude and structure similar to the AGCM stratospheric biases noted in the Atmospheric Model Intercomparison Project (AMIP; Gates et al., 1999).

Figure A1.

Average difference of 20CRv2 zonal means of (a, b) zonal wind speed and (c, d) air temperature from (a, c) ERA-40 and (b, d) NCEP–NCAR reanalyses (NNR) over the period 1979–2001. Contour interval is (a, b) 1 m s−1 and (c, d) 1 K. The zero contour is thickened in all panels, and dotted contours indicate negative differences. Black shaded regions indicate where more than 50% of the pressure-level grid points at that latitude are below ground.

Temperature biases (Figures A1(c,d)) are also relatively small throughout the troposphere, except in the lower troposphere in the vicinity of both poles. The tropospheric differences are generally much smaller than those found for AMIP and other AGCM simulations forced by SSTs (e.g. Gates et al., 1999; Anderson et al., 2004). In the region of the tropical tropopause, some biases are observed, with a positive bias near the Equator and negative biases approaching the Subtropics. These are still relatively small, and smaller against NNR than against ERA-40. These biases have opposite signs in some parts of the tropical upper troposphere. Near the surface in the Arctic and Antarctic, however, the differences are relatively large compared even to AGCM biases (e.g. Gates et al., 1999; Anderson et al., 2004). In the polar lower troposphere, the differences are considerably smaller against ERA-40 than against NNR. These differences reflect an issue with the handling of the specified sea-ice concentration. In 20CRv2, the HadISST sea-ice concentration was specified in each gridbox. However, an error in the transformation of this concentration near coastal areas resulted in specified concentrations lower than in the HadISST daily-interpolated dataset. This resulted in a warm bias in the lower troposphere of both poles compared to the upper-air-based reanalyses. In the lowest layers, the difference from ERA-40 is smaller than that from NNR, but, in the mid-troposphere over the Arctic, the difference from NNR is smaller. The sea-ice concentration issue will be corrected in a future version of the dataset.

Appendix B: Quality-Control Procedure

A five-step quality-control procedure was utilized on the SLP and surface pressure observations from marine and station platforms. First, as a basic test, every pressure observation was reduced to sea level using the US Standard Atmospheric lapse rate in the troposphere (6.5 K km−1). If this reduced observation fell outside the plausible range of 880 to 1060 hPa, the observation was flagged and rejected without further consideration. This basic check was intended to prevent grossly erroneous observations from influencing subsequent adaptive quality-control decisions. In the following discussion, our references to observations will refer to only those pressure observations that passed this plausibility test.
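As an illustration only, the basic plausibility test can be sketched as below. The text gives the lapse rate and the accepted range; the sea-level temperature of 288.15 K, the other physical constants, and the function names are assumptions taken from the US Standard Atmosphere rather than from the 20CRv2 code, and treating the range limits as inclusive is also an assumption.

```python
# Constants of the US Standard Atmosphere (troposphere).
G = 9.80665           # gravitational acceleration, m s^-2
R_DRY = 287.053       # gas constant for dry air, J kg^-1 K^-1
LAPSE = 0.0065        # lapse rate, K m^-1 (6.5 K km^-1, as in the text)
T_SEA_LEVEL = 288.15  # assumed sea-level temperature, K (not stated in the text)

P_MIN_HPA, P_MAX_HPA = 880.0, 1060.0  # plausible range quoted in the text


def reduce_to_sea_level(p_station_hpa, elevation_m):
    """Reduce a station pressure to sea level along the standard-atmosphere profile."""
    exponent = G / (R_DRY * LAPSE)
    return p_station_hpa * (1.0 - LAPSE * elevation_m / T_SEA_LEVEL) ** (-exponent)


def passes_plausibility_check(p_station_hpa, elevation_m):
    """Return True when the reduced pressure lies inside the plausible range."""
    p_sea_level = reduce_to_sea_level(p_station_hpa, elevation_m)
    return P_MIN_HPA <= p_sea_level <= P_MAX_HPA
```

An observation for which `passes_plausibility_check` returns False would be rejected before any of the adaptive steps are applied.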

In the second step, after an observation was reduced to the model orography using (5) to form yo, its absolute difference from the first guess equation image was compared to the combined error variance equation image. If equation image, the observation was flagged as having failed the ‘background check’.
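The structure of such a background check can be sketched as follows. The rejection criterion itself is in an equation that is not reproduced here, so `threshold_factor` is a hypothetical parameter, and comparing the departure with the square root of the combined variance is an assumption made so that both sides have units of pressure.

```python
import math


def fails_background_check(y_obs, y_first_guess, var_first_guess, var_obs,
                           threshold_factor):
    """Return True when the observation departs too far from the first guess.

    threshold_factor is hypothetical: the value used for 20CRv2 is not
    recoverable from the text above.
    """
    combined_std = math.sqrt(var_first_guess + var_obs)
    return abs(y_obs - y_first_guess) > threshold_factor * combined_std
```

For example, with a first-guess variance of 9 hPa² and an observation error variance of 16 hPa², the combined standard deviation is 5 hPa, so with a factor of 3 a departure of 20 hPa would be flagged and a departure of 10 hPa would not.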

Observations that passed the background check were then used in the third step, a ‘buddy check’ evaluation of all observations (Dee et al., 2001). We considered each observation in turn, constructing a single observation analysis equation image for the ith observation equation image using Eqs (1)–(4). We then determined the ratio Bi of the mean-square observation departure from equation image to the mean-square observation departure from the first guess equation image for the K nearby observations equation image located within a radius D from equation image,

equation image (B1)

If Bi was less than 1 at the 50% level according to the F-test, the observation was accepted even if it had been rejected by the background check, as it would improve the fit of the analysis to the neighbouring observations despite its significant difference from the first-guess ensemble mean. By the same token, if Bi was significantly greater than 1 at the 10% level, the observation was rejected on the grounds that it would degrade the analysis fit to the neighbouring observations. In instances of Bi = 1, the observation was accepted only if it had passed the background check. We iterated this buddy test twice through all observations, so that observations that were accepted became part of the pool of nearby observations and could then influence the automated decision-making on other nearby observations. After trial and error in developing this testing procedure, we set D = 1000 km.
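For reference, the verbal definition of the buddy-check ratio can be written out as below. The notation is assumed for illustration and is not recovered from the original equation: y^o_k denotes the kth of the K nearby observations, \bar{y}^{a(i)}_k the single-observation analysis built from the ith observation and evaluated at the location of the kth nearby observation, and \bar{y}^b_k the first guess at that location.

```latex
B_i = \frac{\dfrac{1}{K}\sum_{k=1}^{K}\left(y^{o}_{k} - \bar{y}^{a(i)}_{k}\right)^{2}}
           {\dfrac{1}{K}\sum_{k=1}^{K}\left(y^{o}_{k} - \bar{y}^{b}_{k}\right)^{2}}
```

Under this reading, a ratio below 1 means that the analysis using the ith observation fits its neighbours more closely than the first guess does, and a ratio above 1 means that it fits them less closely.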

The fourth step in this quality-control procedure was data thinning using the F-test on the ratio equation image of the analysis uncertainty to the first-guess uncertainty at the observation location. Observations were assimilated in the order of increasing F. Since smaller values of F imply a larger impact on the analysis ensemble, this means that observations expected to have a relatively large impact on the analysis were assimilated first. F was recalculated for all remaining observations as each observation was assimilated sequentially, i.e. Eqs (1)–(4) were iterated for all observations that passed the background and buddy checks or were restored by the buddy check. This recalculation accounted for the effect of already assimilated observations on the value of F. If F was not less than 1 at the 55% level (slightly higher than even odds) based on the F-test with 56 degrees of freedom, then the observation was not assimilated. This thinning allowed us to reserve a set of observations that are independent of the assimilation for use in later validation studies. As an illustration of the effect of these parameters, about 98% of the observations were retained in 1891 and 32% in 2005. We performed several tests with different levels of thinning and determined that the thinning used here did not adversely affect the quality of the analysis (not shown). Further, the thinning procedure removed previously undetected duplicate observations, and limited the inhomogeneity in the observing system by effectively capping the number of observations assimilated in the Northern Hemisphere at mid-twentieth century levels (as evidenced by the plateau in the thin black curve starting at roughly 1960 in Figure 3(a)).
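The sequential logic of the thinning step can be illustrated with a small sketch. It is not the 20CRv2 implementation: it works with a first-guess error covariance between observation locations rather than with an ensemble, it uses the scalar Kalman covariance update in place of Eqs (1)–(4), and `f_max` is a hypothetical fixed cut-off standing in for the F-test critical value. For a single observation with error variance R and first-guess variance s, the ratio of analysis to first-guess variance is R / (s + R).

```python
def thin_observations(hph, obs_var, f_max):
    """Select observations greedily in order of increasing F.

    hph     : first-guess error covariance between observation locations,
              given as a list of n rows of n values
    obs_var : observation error variance for each of the n observations
    f_max   : hypothetical cut-off standing in for the F-test critical value
    Returns the indices of the assimilated observations in assimilation order.
    """
    n = len(obs_var)
    cov = [[float(value) for value in row] for row in hph]
    remaining = list(range(n))
    assimilated = []
    while remaining:
        # F = analysis variance / first-guess variance for each candidate,
        # recomputed after every assimilation.
        f_values = {j: obs_var[j] / (cov[j][j] + obs_var[j]) for j in remaining}
        j = min(remaining, key=f_values.get)
        if f_values[j] >= f_max:
            break  # no remaining observation reduces the uncertainty enough
        # Scalar Kalman update of the covariance after assimilating observation j.
        denom = cov[j][j] + obs_var[j]
        col_j = [cov[r][j] for r in range(n)]
        row_j = list(cov[j])
        for r in range(n):
            for c in range(n):
                cov[r][c] -= col_j[r] * row_j[c] / denom
        remaining.remove(j)
        assimilated.append(j)
    return assimilated
```

With two perfectly correlated observations, `thin_observations([[1, 1], [1, 1]], [0.25, 0.25], 0.5)` assimilates only the first: once it has reduced the first-guess variance, F for the second rises from 0.2 to about 0.56. This is how a duplicate report would be screened out.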

The fifth quality-control step was a bias correction of the station observations. Surface pressure observations can have systematic differences from first-guess forecasts. These differences arise from actual measurement error, metadata errors in reported station elevation or position, coding errors, and model errors (e.g. Ingleby, 1995). Such differences will make the assimilation sub-optimal, as Eqs (1)–(4) assume that errors are random. We accounted for these biases through a systematic, time-varying bias correction algorithm. After 60 days of assimilation, at the start of every assimilation cycle, station observations were investigated for statistically significant biases in their differences from the first guess. All available pairs of first-guess forecast and raw observations were aligned in time. The observation bias was estimated as the mean difference between forecast and observations over the previous 60 days. If the difference was significant at the 5% level using a paired sample t-test, the estimated bias was removed from the raw observation. The number of degrees of freedom used in the t-test was the number of days N for which observation/forecast pairs were available. The maximum possible number was 60, if pairs were available for all 240 analysis times in the 60-day window. The bias was estimated only if N exceeded 30.
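A minimal sketch of this bias estimate is given below, under several assumptions that are not stated in the text: the bias is defined as observation minus first guess, so that it is subtracted from the raw observation; the standard error is scaled by the number of days rather than the number of pairs; and the t-test critical value is passed in as a fixed number (about 2.0 for the 31 to 60 degrees of freedom possible here) rather than looked up. The function name and argument layout are hypothetical.

```python
import math
from statistics import mean, stdev


def estimate_station_bias(pairs, min_days=30, t_critical=2.0):
    """Estimate a station bias from (day, first_guess, observation) tuples.

    pairs      : tuples covering the previous 60 days, up to four per day
    min_days   : the bias is estimated only if the number of days exceeds this
    t_critical : assumed two-sided 5% critical value of the t distribution
    Returns the bias to subtract from the raw observation, or 0.0 if no
    significant bias is found.
    """
    n_days = len({day for day, _, _ in pairs})
    if n_days <= min_days:
        return 0.0
    diffs = [obs - guess for _, guess, obs in pairs]
    bias = mean(diffs)
    spread = stdev(diffs)
    if spread == 0.0:
        return bias  # identical differences: any non-zero offset is systematic
    # Days, not individual pairs, are counted as independent samples.
    t_stat = bias / (spread / math.sqrt(n_days))
    return bias if abs(t_stat) > t_critical else 0.0
```

A station reporting consistently about 2 hPa above the first guess on 40 days would return a bias close to 2 hPa, whereas the same record restricted to 20 days would return 0.0 because too few days are available.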

We expect that many of these biases are related to errors in the station elevation metadata. Occasionally, mean SLP observations appear to be coded as station pressure (or vice versa) or to have some other error, e.g. station elevation misreported by 40 m or more (Ingleby, 1995) or the 50 ft rule whereby Canadian stations reported surface pressure as SLP if the station elevation was 50 ft or less (Slonosky and Graham, 2005); our algorithm was able to account for the systematic effects of situations such as these. Some of the biases are also likely related to systematic model errors. If such errors dominated, however, one might expect the calculated biases to be larger in regions of large orography, which we did not find (not shown). Additionally, biases can arise from systematic errors in the model's simulation of planetary waves (D. Dee, pers. comm., 2010), or through systematic errors in the temperature field which is used for the reduction to the model orography. Both of these may vary with season and contribute to the time-varying biases removed. We made no attempt to discriminate between model and observation bias in this procedure, and as a result we likely included some model systematic errors in our observation bias correction. Distinguishing between these two sources of bias, perhaps using the thinned independent observations, is an ongoing area of research.

There was also no attempt to bias-correct marine observations, as not all marine observations have consistent identification or other observational metadata (e.g. Kent et al., 2007). Future enhancements to ICOADS may permit such corrections in subsequent versions of this reanalysis dataset.