No current tree ring (TR) based reconstruction of extratropical Northern Hemisphere (ENH) temperatures that extends into the 1990s captures the full range of late 20th century warming observed in the instrumental record. Over recent decades, a divergence between cooler reconstructed and warmer instrumental large-scale temperatures is observed. We hypothesize that this problem is partly related to the fact that some of the constituent chronologies used for previous reconstructions show divergence against local temperatures in the recent period. In this study, we compiled TR data and published local/regional reconstructions that show no divergence against local temperatures. These data have not been included in other large-scale temperature reconstructions. Utilizing this data set, we developed a new, completely independent reconstruction of ENH annual temperatures (1750–2000). This record is not meant to replace existing reconstructions but allows some degree of independent validation of these earlier studies as well as demonstrating that TR data can better model recent warming at large scales when careful selection of constituent chronologies is made at the local scale. Although the new series tracks the increase in ENH annual temperatures over the last few decades better than any existing reconstruction, it still slightly underpredicts values in the post-1988 period. Finally, we discuss possible reasons why it is so difficult to model post-mid-1980s warming, provide some possible alternative approaches with regard to the instrumental target, and detail several recommendations that should be followed in future large-scale reconstruction attempts and that may result in more robust temperature estimates.
 For the ENH temperature reconstruction of D'Arrigo et al., some of the constituent TR chronologies expressed divergence at the local scale. It is therefore not surprising that divergence was noted in the resultant hemispheric reconstruction. In this paper we test whether it is possible to develop an ENH reconstruction that better tracks recent temperature trends. We attempt to address the “divergence problem” noted in large-scale reconstructions by developing a new independent TR based reconstruction of ENH temperatures using published local/regional reconstructions and newly updated/sampled TR chronologies that show no divergence against local temperatures. This new record is not meant to replace existing reconstructions; in fact, it allows some degree of independent validation of these earlier studies. Rather, it is intended to quantify and test how much more skillfully TR data can model recent warming at large scales when careful selection of constituent chronologies is made.
2. TR Data Selection
 A number of criteria were defined to assess whether a published reconstruction or TR chronology would be considered for analysis:
 1. To ensure the independence of this study, only TR proxy series that had not been used in previous reconstructions of ENH temperatures would be considered.
 2. As undertaken by D'Arrigo et al., each TR series must have acceptable replication (≥10 series within each site chronology) from 1750 to present.
 3. Unpublished TR data must extend to 1995 or beyond.
 4. The TR proxy series must correlate at >0.40 against an optimal seasonal parameter of “local” gridded mean temperature data from the CRU3 [Brohan et al., 2006] land only data set. A series correlating at <0.40 was rejected from further analysis even if the inferred association was significant at the 95% confidence limit.
 5. No significant autocorrelation (as measured using the Durbin-Watson statistic) must be observed in the model residuals from regressing the TR proxy time series against the local seasonal temperature data. If a significant divergence exists between the TR data and local temperatures, this would be expressed as trend in the model residuals and hence they would be autocorrelated.
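Criteria 4 and 5 amount to a two-part screening test for each candidate series. The following Python sketch shows one way such a test could be coded; it is not the authors' implementation. The function names, the synthetic data and the Durbin-Watson acceptance interval `dw_bounds` are assumptions made for illustration (tabulated critical values depend on sample size and are not given in the text).

```python
import numpy as np

def durbin_watson(residuals):
    """Durbin-Watson statistic; values near 2 indicate no lag-1 autocorrelation."""
    e = np.asarray(residuals, dtype=float)
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

def screen_proxy(proxy, temp, r_min=0.40, dw_bounds=(1.5, 2.5)):
    """Apply criteria 4 and 5 to one proxy series.

    proxy, temp: equal-length 1-D arrays over the calibration years.
    dw_bounds is a hypothetical acceptance interval, not a tabulated
    critical value.
    """
    proxy = np.asarray(proxy, dtype=float)
    temp = np.asarray(temp, dtype=float)
    r = np.corrcoef(proxy, temp)[0, 1]
    # Ordinary least squares fit of temperature on the proxy.
    slope, intercept = np.polyfit(proxy, temp, 1)
    resid = temp - (slope * proxy + intercept)
    dw = durbin_watson(resid)
    passed = bool(r > r_min and dw_bounds[0] <= dw <= dw_bounds[1])
    return {"r": float(r), "dw": float(dw), "passed": passed}

# Synthetic illustration: one proxy that follows temperature throughout,
# and one that loses the warming trend over the final third of the record.
rng = np.random.default_rng(0)
years = np.arange(1900, 2000)
temp = 0.01 * (years - 1900) + rng.normal(0, 0.2, years.size)
tracking = temp + rng.normal(0, 0.1, years.size)
diverging = tracking.copy()
late = years >= 1967
diverging[late] -= 0.03 * (years[late] - 1967)

print(screen_proxy(tracking, temp))
print(screen_proxy(diverging, temp))
```

A trend in the residuals, as produced by divergence, pulls the Durbin-Watson statistic toward zero, which is why criterion 5 can reject a series even when its correlation passes criterion 4.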
 By strict observance of these criteria, we identified 15 TR based temperature proxy series (Figure 2 and Table 1) that portray reasonable estimates of “local” temperatures. Twelve of these series are published reconstructions. Four (ALP, SCA, MON and MBC) of these published series have been “updated” or “reprocessed” for this study. A further three sites (WSI, KYR and NQU) were added to the hemispheric data set by using unpublished TR data acquired from either the International Tree-Ring Data Bank (http://www.ncdc.noaa.gov/paleo/treering.html) or nonarchived sources. Although a minimum of only 10 series was defined as acceptable, the signal strength (as measured using the Expressed Population Signal statistic [Briffa and Jones, 1990]) is strong in all of these series from 1750 onward. The time series for each of these 15 temperature TR proxies can be accessed in the auxiliary material, and Appendix A briefly describes how each series was derived.
Table 1. Summary Information of Local/Regional Tree Ring Based Proxies Used in This Study^a
Column headings: Time Series Length; Local Gridded Correlations; Correlation With NH Temps, Unfiltered and Filtered (20-Year Spline)
^a WSL, Swiss Federal Institute for Forest, Snow and Landscape Research. Time series length denotes the period replicated by 10 or more TR series. In some cases (WSI, NEP, COL, IDA, YUS, YUN) comparison was made to gridded areas larger than the standard 5° × 5°. DW, Durbin-Watson statistical test for first-order autocorrelation in regression model residuals. Correlations against extratropical NH (20–90°N) annual land temperatures were made over the common 1850–1988 period.
 It should be noted that two published TR reconstructions that meet the post-1750 replication screening criterion were not used for this study as they failed selection because of calibration issues. These series are the reconstructions from (1) Hokkaido (1557–1997 [Davi et al., 2002]) which, although correlating significantly at high frequencies with local gridded August–September temperatures (r = 0.48 (1876–1997), after the data had been transformed to 1st differences), expresses serious trend differences between the TR and instrumental data (before ∼1920 and after ∼1980) that result in significant autocorrelation in the model residuals (see Davi et al.  for more discussion where the time series was purposely interpreted to provide high-frequency climatic information only for the region); and (2) Kamchatka (1630–1992 [Gostev et al., 1996]) which correlates poorly (r = 0.25, 1906–1992) with local gridded temperatures despite the much stronger published correlation (r = 0.63, 1942–1983) noted against the nearest meteorological station (Esso [Gostev et al., 1996]).
3. Reconstruction Method
 We employ a method similar to that used by D'Arrigo et al. to derive a new ENH annual temperature reconstruction. Averaging was performed to composite the TR proxy time series into continental and NH mean series, which were used to calibrate against the instrumental record. A nested approach, which accounts for the decrease in the number of chronologies (in this case forward in time), was used to extend the reconstruction as far forward as possible [Meko, 1997; Cook et al., 2002]. This procedure entails normalizing the TR time series to the common period of all series in each nest and then averaging the series together to create a nest mean. To develop the final reconstruction, the mean and variance of each nested time series were scaled to that of the most replicated nest (1750–1988) and the relevant sections for each nest spliced together. For each nest, separate average time series were generated for North America and Eurasia and these continental-scale time series were averaged to produce a final large-scale mean that was not biased to one particular continent because of the varying number of series. This process, undertaken iteratively as each TR time series left the data matrix, resulted in eight nested mean series upon which calibration and verification were made separately. Following an approach similar to that of D'Arrigo et al., full period calibration, against extratropical (20–90°N) land-only mean annual (January–December) temperatures, was made over 1850–1988 (the common period of the TR and instrumental series), while verification was made over the period 1896–1942 after appropriate calibration using the combined 1850–1895/1943–1988 period. Verification was made using the square of the Pearson's correlation coefficient (r²), the Reduction of Error (RE) statistic and the Coefficient of Efficiency (CE) statistic [Cook and Kairiukstis, 1990; Cook et al., 1994].
Both RE and CE are measures of shared variance between the actual and modeled series, but are usually lower than the calibration r². A positive value for either statistic signifies that the regression model has some skill. CE provides the more rigorous verification test. To test the robustness of the decadal to long-term signal in the reconstructed nested series, assessment of the regression model residuals (from the full period calibration) was also employed using the Durbin-Watson statistic. As the modeled temperature signal is predominantly at timescales ≥∼20 years [Cook et al., 2004], it is particularly important to identify models that have significant trends in the model residuals, as they would therefore not portray long-term variability in a robust manner.
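The nesting, rescaling and splicing steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the procedure as actually coded for this study: the function names are hypothetical, every series is assumed to start in the first year and to be complete until it ends, and missing years are marked with NaN.

```python
import numpy as np

def nest_mean(block, labels):
    """Composite one nest from a (years x series) block with no gaps."""
    # Normalize each series over the nest's common period (the whole block).
    z = (block - block.mean(axis=0)) / block.std(axis=0, ddof=1)
    # Average within each continent first, then across continents, so the
    # continent with more series does not dominate the large-scale mean.
    cont_means = [z[:, labels == c].mean(axis=1) for c in np.unique(labels)]
    return np.mean(cont_means, axis=0)

def rescale(series, reference):
    """Give `series` the mean and variance of `reference` over their overlap."""
    s = series[:reference.size]
    return ((series - s.mean()) / s.std(ddof=1)
            * reference.std(ddof=1) + reference.mean())

def splice_nests(data, continents):
    """Forward nesting: drop series as they end, rescale, then splice."""
    n_years, n_series = data.shape
    labels = np.asarray(continents)
    # Index of the last year with data in each column.
    last = np.array([np.flatnonzero(~np.isnan(data[:, j]))[-1]
                     for j in range(n_series)])
    recon = np.full(n_years, np.nan)
    reference = None
    prev_end = -1
    for end in np.unique(last):              # nest end points, ascending
        keep = last >= end                   # series still available
        nest = nest_mean(data[:end + 1][:, keep], labels[keep])
        if reference is None:
            reference = nest                 # most replicated nest
        else:
            nest = rescale(nest, reference)
        # Use only the section this nest adds beyond the previous one.
        recon[prev_end + 1:end + 1] = nest[prev_end + 1:]
        prev_end = end
    return recon

# Synthetic example: four series on two continents, two ending early.
rng = np.random.default_rng(1)
years = np.arange(1750, 2001)
signal = np.cumsum(rng.normal(0, 0.1, years.size))
data = np.column_stack([signal + rng.normal(0, 0.5, years.size)
                        for _ in range(4)])
data[years > 1988, 0] = np.nan
data[years > 1995, 2] = np.nan
recon = splice_nests(data, ["NA", "NA", "EUR", "EUR"])
print(recon[-5:])
```

In this sketch the output is still in normalized units; calibration against the instrumental target would be a separate step applied to each nest.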
4. Results and Discussion
Table 1 shows the correlation of each of the TR time series against their optimal seasonal parameters of local gridded temperatures. Correlations range from 0.41 (IDA and TSH) to 0.77 (MBC) with an overall mean coherence with local temperatures >30%. It should be noted that some of the correlations and identified optimal seasons are slightly different to the original publications. This most likely reflects the difference between using gridded versus local temperature records, the latter often providing improved correlation results, as well as identifying the optimal season with respect to the Durbin-Watson statistic results rather than just the correlation coefficient. The fit between the TR and gridded temperature series at local scales can be qualitatively assessed in Figure 3. In all cases, no decadal length divergence occurs for any of the records over the recent period.
 The premise of reconstructing large-scale NH temperatures from a relatively sparse network of proxies is that each proxy explains a certain percentage of the local temperature variance, which is itself a constituent part of the large-scale hemispheric mean. Therefore, when these proxy data sets are combined, they theoretically should explain a reasonable amount of the large-scale instrumental temperature variance. However, for a given region or tree ring site, the individual proxy series themselves (and also the instrumental data) may not correlate strongly with NH temperatures. The last two columns in Table 1 show the correlation of each proxy series with ENH mean annual land only temperatures for both unfiltered and filtered (20-year spline) versions of the data. SCA, MON, MBC and YUN correlate most strongly with NH temperatures at ∼0.30 (unfiltered). In general, coherence increases substantially when the data are smoothed, although some records still correlate poorly. In fact, TAT and TSH are inversely correlated (albeit weakly) with NH temperatures over the mid-19th to late 20th century period. However, as each of these proxies tracks its respective local temperature record relatively well (Figure 3), these commonalities in trend (or lack thereof) with the NH instrumental data may simply reflect regional variation in temperatures (see also later discussion).
Table 2 presents the calibration (using ordinary least squares regression) and verification results for each of the eight nested models needed to derive the full NH reconstruction (1750–2000). Each nested model passes all verification tests (r², RE and CE). The CE values are similar to the RE values because the mean of the instrumental data in the verification period (−0.18°C: 1896–1942) is very close to the mean of the combined calibration period (−0.17°C: 1850–1895/1943–1988). The amount of the instrumental temperature variance explained by each nest decreases as the proxy records leave the data matrix. However, in general, up to 1997 the explained variance is around 20–25% using unfiltered series and 70–75% using smoothed (20-year spline) series. Although the explained variance drops to ∼10% (50% for smoothed) after 1997 for the last two nests, composed of 7 and 6 records respectively, the verification statistics still indicate some fidelity of the models. We therefore, for comparison, utilize the whole length of the reconstruction (1750–2000), with the caveat that the quality of the record decreases after 1988.
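The relationship between RE and CE noted above follows directly from their definitions: both compare the verification-period squared error with a reference sum of squares, RE using the calibration-period mean and CE the verification-period mean. A short Python sketch (the function name and the toy numbers are illustrative assumptions, not values from this study):

```python
import numpy as np

def verification_stats(obs_ver, pred_ver, cal_mean):
    """Return (r2, RE, CE) for a verification period.

    obs_ver, pred_ver: observed and predicted values in the verification
    period; cal_mean: mean of the observations in the calibration period.
    """
    obs = np.asarray(obs_ver, dtype=float)
    pred = np.asarray(pred_ver, dtype=float)
    sse = np.sum((obs - pred) ** 2)
    r2 = np.corrcoef(obs, pred)[0, 1] ** 2
    # RE benchmarks the model against the calibration-period mean.
    re = 1.0 - sse / np.sum((obs - cal_mean) ** 2)
    # CE benchmarks it against the verification-period mean, which is the
    # harder reference to beat, so CE never exceeds RE.
    ce = 1.0 - sse / np.sum((obs - obs.mean()) ** 2)
    return r2, re, ce

obs = np.array([0.0, 1.0, 2.0, 3.0])
pred = np.array([0.5, 1.0, 2.0, 2.5])
print(verification_stats(obs, pred, cal_mean=1.5))  # RE equals CE here
print(verification_stats(obs, pred, cal_mean=0.5))  # RE exceeds CE here
```

When the two period means nearly coincide, as with the −0.17°C and −0.18°C quoted above, the two denominators are almost identical and RE and CE converge.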
Table 2. Calibration and Verification Results for Each Nested Model^a
Column headings: Number of Series; Full Calibration: 1850–1988
^a r², square of the multiple correlation coefficient; SE, standard error of the estimate; DW, Durbin and Watson statistic for residual autocorrelation; RMSE, root-mean-square error; RE, reduction of error; CE, coefficient of efficiency.
Figure 4a compares the new NH reconstruction (hereafter WNH2007) with annual ENH instrumental temperatures after being scaled (same mean and variance) over the 1850–1988 period. As expected from the reasonable calibration/verification results (Table 2), the TR proxy series tracks the trends in the instrumental data quite well. Over the last 250 years, WNH2007 shows moderately cool temperatures until the early 19th century followed by a sharp decline around 1810. Reconstructed temperatures then increase from ∼1830, with decadal-scale variability (i.e., decadal-scale cooling around 1900–1918 and 1961–1976), until present. The coldest decade in WNH2007 is 1812–1821 (−0.94°C) which coincides with a period of known volcanic activity (e.g., Tambora, 1815) and low solar irradiance (the Dalton Minimum). The warmest decade is 1989–1998 (+0.24°C). WNH2007 shows strong coherence with previous TR based reconstructions of ENH temperatures (Figure 4b) with interproxy correlations ≥0.75 (Table 3). WNH2007 has no data overlap with these previous studies, so this strong common signal provides important mutual validation, at least over the last 250 years, of extratropical temperature trends expressed by all these TR based proxies. Focusing on the most recent period (Figure 4c), however, shows that WNH2007 still underpredicts temperature values. Between 1970 and 1995, the linear increase in instrumental temperatures is 0.29°C/decade (Figure 4c), compared with 0.22°C/decade in the WNH2007. However, the new reconstruction is still an improvement on earlier attempts (see trends listed in Figure 4d).
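The two operations used in this comparison, scaling a proxy series to the mean and variance of the instrumental series over a reference period and expressing a linear trend in °C/decade, can be sketched in Python as below. The function names and the synthetic series are assumptions for illustration only; they do not reproduce the data behind Figure 4.

```python
import numpy as np

def scale_to(target, series, years, start, end):
    """Rescale `series` to the mean and variance of `target` over [start, end]."""
    m = (years >= start) & (years <= end)
    z = (series - series[m].mean()) / series[m].std(ddof=1)
    return z * target[m].std(ddof=1) + target[m].mean()

def trend_per_decade(years, values, start, end):
    """Least squares linear trend over [start, end], in units per decade."""
    m = (years >= start) & (years <= end)
    slope_per_year = np.polyfit(years[m], values[m], 1)[0]
    return 10.0 * slope_per_year

# Synthetic example: an "instrumental" series and a noisier "proxy".
rng = np.random.default_rng(2)
years = np.arange(1850, 2001)
instrumental = 0.005 * (years - 1850) + rng.normal(0, 0.15, years.size)
proxy = 2.0 * instrumental + rng.normal(0, 0.4, years.size)

scaled = scale_to(instrumental, proxy, years, 1850, 1988)
print(trend_per_decade(years, instrumental, 1970, 1995))
print(trend_per_decade(years, scaled, 1970, 1995))
```

Because scaling matches variance rather than regressing, it does not shrink the amplitude of the proxy series toward the mean, which matters when trends are compared.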
Table 3. Correlation Matrix (1750–1992) Between WNH2007 and Previous TR Based Reconstructions of ENH Temperatures
 Although the WNH2007 reconstruction slightly underpredicts values in the post-1988 period, it is the first large-scale TR based reconstruction of ENH temperatures that extends (with reasonable chronology replication) to 2000 and that not only shows increasing reconstructed temperatures in general agreement with the instrumental data but also indicates that the late 1990s have been the warmest period of the last 250 years. Despite the improvement of WNH2007 compared to previous NH reconstructions, some discussion is needed to address why predicted values are still underestimated in the recent period:
4.1. Issues Related to the Instrumental Predictand Data
4.1.1. Urban Heat Island Effect
 The Urban Heat Island (UHI) effect is another source of uncertainty in the recent period that could bias recent instrumental temperatures upward. However, with a quantified large-scale influence of 0.0055°C/decade [Folland et al., 2001; Brohan et al., 2006], this effect, as often cited [e.g., Intergovernmental Panel on Climate Change, 2001, 2007], has only a minimal effect upon hemispheric temperatures and therefore has only a small potential influence upon the divergence problem for ENH reconstructions. The UHI effect, however, could be more relevant for local TR based studies that show divergence.
4.1.2. Early Instrumental Record
 The robustness of the ENH instrumental record decreases going back in time as the number of constituent station records decreases [Brohan et al., 2006]. It has been suggested for studies in Europe [Moberg et al., 2003; Büntgen et al., 2005, 2006b; Frank and Esper, 2005], coastal Alaska [Wilson et al., 2007] and for the Northern Hemisphere [Esper et al., 2005; D. C. Frank et al., Warmer early instrumental measurements versus colder reconstructed temperatures: Hemispheric to regional evidence, submitted to Quaternary Science Reviews, 2007, hereinafter referred to as Frank et al., submitted manuscript, 2007] that instrumental temperatures in the 19th century are possibly “too warm” because of homogenization correction procedures. If this is the case, inclusion of earlier instrumental data may bias reconstructed values. Using the WNH2007 series, calibrations using annual temperatures made over the 1850–1988 and 1880–1988 periods result in respective correlations of 0.50 and 0.55 (Table 4), with little resultant difference in the overall trend/amplitude of the final time series. The slight improvement in calibration using the shorter period is not statistically significant. However, this difference becomes arguably more significant when summer temperatures are the target seasonalized parameter, which has important repercussions for large-scale TR based reconstructions (see next section).
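The text does not state which test was used to compare the two calibration correlations. As one common option, the Fisher z transformation can be applied, sketched here in Python. This is an assumption for illustration: the test treats the two samples as independent (the two calibration periods actually overlap) and ignores serial autocorrelation, so the result is indicative only. The sample sizes are simply the number of years in each period.

```python
import math

def fisher_z_difference(r1, n1, r2, n2):
    """Two-sided test for a difference between two correlation coefficients.

    Assumes independent samples and no serial autocorrelation.
    Returns the z statistic and its two-sided normal p value.
    """
    z1, z2 = math.atanh(r1), math.atanh(r2)
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (z1 - z2) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p

# 1850-1988 spans 139 years (r = 0.50); 1880-1988 spans 109 years (r = 0.55).
z, p = fisher_z_difference(0.50, 139, 0.55, 109)
print(f"z = {z:.2f}, p = {p:.2f}")
```

Under these assumptions the p value is well above 0.05, consistent with the statement that the improvement is not statistically significant; allowing for autocorrelation would reduce the effective sample sizes and weaken the evidence for a difference further.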
Table 4. Correlations Between WNH2007 and ENH Instrumental Temperatures^a
^a The summer season is May–August.
4.1.3. Target Season
 Most reconstructions of NH temperatures calibrate against an annualized (e.g., January–December) parameter despite the fact that many of the constituent proxy series may be best quantified as summer temperature series at the local/regional scale (Table 1 and Figure 3). It has, however, been argued that trees from selected tree line sites may integrate climate conditions during non-growing-season months [e.g., Jacoby and D'Arrigo, 1989; Payette et al., 1996; Frank and Esper, 2005] and therefore better calibration may be obtained, over large hemispheric scales, against annualized temperatures rather than the summer season. The WNH2007 series appears initially to be no different in this regard. Over the 1850–1988 period, the correlation of this series to annual and summer (May–August) ENH temperatures is 0.50 and 0.35 respectively (Table 4). This weakening in coherence for the summer season is likely related to the proxy data not being able to model well the relatively warmer 19th century temperatures expressed in the instrumental summer ENH data (see Figure 5a and also Esper et al. and Frank et al. (submitted manuscript, 2007) for a more detailed discussion). If, however, calibration is made over the shorter period, 1880–1988, the WNH2007 series correlates with annual and summer (May–August) ENH temperatures at 0.55 and 0.60 respectively (Table 4). Although these two correlation values are not statistically different, this result, along with those of the previous section, leads to a simple choice when reconstructing large-scale ENH temperatures: should a data set of proxy series that are generally weighted to the summer season (but which may integrate conditions throughout the year) be used to model annual temperatures at large scales, or should TR based summer-weighted proxies be used exclusively to reconstruct large-scale summer temperatures, but with calibration excluding the early period where instrumental temperatures are possibly “too warm”?
4.1.4. Target Temperature Parameter
Wilson and Luckman  hypothesized that when trees grow in temperature-limited environments, in regions where there is a significant difference in trend between daytime maximum and nighttime minimum temperatures, they will show greatest response to maximum temperatures since the bulk of cambial activity (i.e., photosynthesis and respiration) occurs during the daytime. Although this hypothesis is essentially untested, Youngblut and Luckman  showed that superior calibrations were obtained against maximum temperatures in the southern Yukon (see auxiliary material). Similar calibration results are also reported utilizing a composite tree ring maximum density chronology for the central Spanish Pyrenees (U. Büntgen et al., Eight centuries of Pyrenees summer temperatures from tree-ring density, submitted to Climate Dynamics, 2007, hereinafter referred to as Büntgen et al., submitted manuscript, 2007) where a significant trend difference between maximum and minimum temperatures exists. Frank et al.  have also documented a stronger response with maximum temperatures from trees in the Central Altay mountains in Russia. At hemispheric scales, studies of 20th/21st century instrumental climate records [Karl et al., 1993; Easterling et al., 1997; Vose et al., 2005] have shown that minimum temperatures have been rising significantly faster than mean or maximum temperatures. Vose et al.  show that over the 1950–2004 period, the increasing linear trend of NH annual maximum temperatures is 0.15°C/decade, while minimum temperatures have increased by 0.23°C/decade. Focusing on the recent period, Figure 6 further highlights the greater linear increase of minimum temperatures since 1970 compared to maximum temperature for both the annual (0.38°C versus 0.31°C/decade) and summer (0.31°C versus 0.24°C/decade) seasons. 
If indeed trees do respond predominantly to daytime maximum temperatures, then the difference in trend between maximum and minimum temperatures at hemispheric scales may be a further factor that needs to be taken into account when calibrating against large-scale temperatures. However, the results from calibration trials using both ENH minimum and maximum temperatures are ambiguous. Figure 7 compares WNH2007 against ENH maximum and minimum temperatures for the annual and summer seasons after scaling over the 1880–1988 period. The highest correlation (r = 0.63) is found against ENH summer minimum temperatures. However, WNH2007 clearly does not track recent summer minimum temperatures well after 1988 (Figure 7d) with results generally similar to the standard approach of calibrating against mean temperatures (Figures 4 and 5). Scaling WNH2007 against summer ENH maximum temperatures (Figure 7b), however, despite the correlation being weaker (r = 0.53), results in the proxy reconstruction actually expressing a slightly greater increasing trend (0.23°C versus 0.21°C/decade) over the 1970–2000 period compared to the instrumental data.
4.2. Issues Related to the Number of Proxy Records
4.2.1. Proxy Replication
D'Arrigo et al.  argued that because of the low number of temperature sensitive chronologies existing for the pre-1400 period in their ENH reconstruction, the temperature estimates for this earlier period (i.e., the Medieval Warm Period) must be interpreted cautiously. This observation was emphasized in the recent NRC [2006, p. 110] report where they stated that current “large-scale temperature reconstructions should always be viewed as having a ‘murky’ early period.” In this study, the same can be said for the recent period (i.e., it gets “murkier” as we extend beyond 1988). The successful reconstruction of ENH temperatures, therefore, must use a sufficiently replicated network of proxies. Despite the encouraging verification results, the quality of the WNH2007 reconstruction weakens as the number of proxy series decreases toward 2000 (Table 2). Is it possible, therefore, that some of the noted post-1988 divergence is simply related to the low number of proxy records used? To test this, we utilized the optimal seasonal (mostly summer) instrumental series used to calibrate the 15 TR proxy series (Table 1 and Figure 3), to derive an instrumental based reconstruction of ENH temperatures. Because of the varying length of the local/regional instrumental series, a nesting procedure as used for WNH2007 was not used. Rather, the series were normalized to their common period (1942–1995), averaged for each continent, and a final NH mean series derived by averaging the continental series together (again after renormalizing these series to 1942–1995). The variance of the continental mean series was stabilized using the Osborn et al.  method. This instrumental based reconstruction (hereafter INSTNH) was calibrated (scaled) to both annual and summer ENH mean temperatures over the 1889–1988 period (the inner year denotes the point at which replication was at least two instrumental series per continent). 
Figure 8 presents the calibration results and shows a weaker correlation with annual ENH temperatures (r = 0.47, Figure 8a) compared to the summer season (r = 0.60, Figure 8b). As the predominant season of the local grid temperature series is weighted to summer, the higher correlations with ENH summer temperatures (Figure 8b) are perhaps not surprising. However, for both annual and summer ENH temperatures, a slight divergence is observed after 1988. The linear trends (1970–2000) of the actual and reconstructed ENH series for the annual (0.32°C versus 0.19°C/decade) and summer (0.26°C versus 0.17°C/decade) seasons clearly suggest that recent warming cannot be modeled well using such a sparse network of time series.
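The variance stabilization step used in building INSTNH corrects for the fact that a mean of few series is more variable than a mean of many. The Python sketch below shows a commonly cited form of the sample-size adjustment associated with Osborn et al.; whether it matches the exact variant applied here is an assumption, as are the function name, the example replication counts and the value of the mean interseries correlation `rbar`. The input is assumed to be an anomaly series centered on zero.

```python
import numpy as np

def stabilize_variance(mean_series, counts, rbar):
    """Scale a mean series so its variance does not depend on sample size.

    mean_series: average of the available (normalized) series in each year.
    counts: number of series contributing in each year.
    rbar: mean correlation between the contributing series (0 < rbar <= 1).
    """
    counts = np.asarray(counts, dtype=float)
    # Effective number of independent series in each year.
    n_eff = counts / (1.0 + (counts - 1.0) * rbar)
    # Scale toward the variance expected for an infinitely replicated mean.
    return np.asarray(mean_series, dtype=float) * np.sqrt(n_eff * rbar)

# Synthetic example: replication rises from 2 to 5 series partway through.
rng = np.random.default_rng(3)
counts = np.where(np.arange(100) < 40, 2, 5)
mean_series = rng.normal(0, 1, 100) / np.sqrt(counts)
adjusted = stabilize_variance(mean_series, counts, rbar=0.3)
print(adjusted[:40].std(ddof=1), adjusted[40:].std(ddof=1))
```

Years with few series are damped more strongly than well replicated years, and the scaling factor approaches one as replication becomes large.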
4.2.2. Spatial Sampling of Proxy Series and Their Coherence to ENH Temperatures
 On the whole the above analyses suggest that (1) summer ENH temperatures may possibly be the optimal target parameter (although further work is needed to address both the quality of the pre-1880 instrumental data and the physiological relationship of tree growth to summer and annual temperatures) and (2) the post-1988 divergence might also simply be a product of low replication. The problem of low replication may be exacerbated by the balance of those proxies that correlate positively with NH temperatures versus those that do not, which may also bias the final mean function. As stated earlier, two of the TR proxies (TAT and TSH) show slight decreasing trends through the 20th century and so correlate inversely with ENH temperatures (Table 1). NEP and IDA also correlate weakly with ENH temperatures. It is possible, therefore, that the inclusion of these proxy series may slightly bias the final WNH2007 series downward in the recent period. If these series are excluded when generating WNH2007, the final hemispheric time series is very similar (r = 0.95) to the full data set version (Figure 9). However, over the 1970–2000 period, the linear trend is actually marginally greater (0.17°C versus 0.11°C/decade) in the original WNH2007 series, suggesting that the removal of these four series weakens the overall ENH reconstruction. These results are ambiguous, but essentially highlight the sensitivity of the final mean function to the removal of a few series when replication is generally low. Therefore one interpretation is that TR proxy series that correlate negatively or weakly with ENH temperatures, so long as they are robust representations of their local temperature records, should not necessarily be excluded from NH reconstructions. However, considerable caution is required.
For example, it would not be possible to develop a robust large-scale reconstruction of NH temperatures using a network of local/regional proxies, despite being robust estimates of local temperatures, that were all inversely correlated or weakly correlated with large-scale temperatures. Therefore a careful balance of sites, representing the actual balance shown in the instrumental record, is needed.
 Utilizing a data set of TR based published local/regional reconstructions and new TR chronologies, we have developed a new and completely independent reconstruction of ENH temperatures (WNH2007, 1750–2000) that compares well with previous reconstructions up to the mid-1980s. The constituent TR proxy time series were chosen as they portrayed relatively robust estimates of local/regional temperatures, showed no divergence in the recent period with their respective instrumental predictand records and allowed reasonable replication up to 2000 in the final reconstructed time series. We hypothesized that the utilization of TR based proxies that show no divergence at the local scale could result in better estimates of hemispheric temperatures in the recent (post-mid-1980s) period where all other TR based ENH reconstructions [Briffa, 2000; Esper et al., 2002; D'Arrigo et al., 2006] diverge below the increasing trends in the instrumental data. The WNH2007 reconstruction does indeed portray a significant increase in predicted values for the last two decades, with the warmest decade over the last 250 years being 1989–1998, although there is still some underestimation in the predicted values compared to measured values over this period.
 1. Only TR data that express a robust nonbiased estimate of local/regional temperatures should be used. Provided that a record correlates robustly with local temperatures and that proxy replication is high, its degree of coherence with NH temperatures is of only minimal importance.
 2. The “divergence problem” needs to be addressed and explored at the local/regional scale. For those TR records where the divergence effect can be attributed to anthropogenic influences (i.e., related to pollution or dimming etc.) the data can be truncated at the point where divergence starts, and the rest of the data used [see Wilson and Elling, 2004]. Alternatively if these effects are seen to be the result of detrending “end effects” [Melvin, 2004; K. Briffa and T. Melvin, Climatic Research Unit, personal communication, 2006], correction can be made using improved detrending techniques. With respect to temporally unstable relationships, palaeoclimatology must ultimately rely on James Hutton's principle of uniformitarianism whereby relationships between proxies and their targets, drawn during the calibration interval, are assumed to remain relatively stable over time. Therefore, for those TR chronologies which express a significant response change with climate (e.g., a weakening in temperature response due to an increase in moisture stress), these series should be used with caution (or in some cases not at all) for such large-scale reconstructions of past temperatures since it is not possible to quantify whether such nonlinear response changes have also occurred in the past, unless it is presumed that such a nonlinear response is unique to the recent anthropogenic period.
 3. Currently, most NH temperature reconstructions target the annual season despite the individual proxies generally portraying a summer signal at local scales. Although it has been argued that trees from selected tree line sites may integrate climate conditions during non-growing-season months [Jacoby and D'Arrigo, 1989; Payette et al., 1996; Frank and Esper, 2005], this tendency may also be partly related to a better empirical “fit” between the proxy and instrumental annual data prior to 1880, a period where the quality of large-scale hemispheric instrumental data can be questioned. Calibration trials using WNH2007 against ENH temperatures (Figure 5 and Table 4), excluding the pre-1880 period, show similar results for both the annual and summer seasons. Therefore more detailed explorative work assessing the quality of instrumental series prior to the 1880s is needed before a balanced decision can be made on which is the optimal target seasonal parameter for reconstruction. Further calibration trials (Figure 8), but utilizing a mean of the gridded temperature series used for calibration of the individual TR proxy series, strongly suggest that ENH summer temperatures would be the optimal large-scale target instrumental predictand season.
 4. The research of Wilson and Luckman , and the simple analyses made in this study suggest that optimal calibration, with regards to tracking recent temperature trends using TR data, can be gained by targeting maximum rather than mean temperatures. To test this hypothesis, however, more explorative work on tree ring growth/temperature relationships is needed in regions where there is a significant difference in trend between nighttime and daytime temperatures [e.g., Youngblut and Luckman, 2007; Büntgen et al., submitted manuscript, 2007]. If indeed a predominant optimal tree response is found with maximum temperatures at temperature limiting locations (i.e., altitudinal and latitudinal tree lines), this would have major implications for dendroclimatology that must be addressed in the ongoing discussion of late 20th/early 21st-century changes in tree ring/climate relationships.
 5. Finally, not only are many more data needed in the early pre-1400 period [Cook et al., 2004; NRC, 2006; D'Arrigo et al., 2006] to increase replication and therefore improve large-scale reconstruction confidence during these earlier periods, but existing data sets also need to be updated to the present, and new data sets incorporated, to allow more robust comparison with the instrumental record over recent decades.
Appendix A: Description of the 15 TR Proxy Records
A1. European Alps
 Two summer temperature reconstructions have recently been developed for the Alpine region. Büntgen et al., using 1,527 ring width (RW) measurements from living trees and relict wood, produced a June–August temperature reconstruction back to AD 951. The reconstruction is composed of larch data (Larix decidua) from four Alpine valleys in Switzerland and pine data (Pinus cembra) from the western Austrian Alps. These regions are situated in high-elevation Alpine environments where a spatially homogeneous summer temperature signal exists. The regional curve standardization technique (RCS [Mitchell, 1967; Briffa et al., 1992; Cook et al., 1995; Esper et al., 2003a]) was applied to the RW measurements in an attempt to capture the full frequency range of summer temperatures over the past millennium. In a related study, Büntgen et al. [2006b] utilized maximum density (MXD) data (processed using RCS) from 180 recent and historic high-elevation tree ring (TR) series from the Swiss Alps to develop a summer temperature reconstruction over the AD 755–2004 period. This reconstruction correlates at ∼0.7 with Alpine high-elevation summer temperatures back to 1818. Both Alpine records suggest that summer temperatures during the last decade are unprecedented over the past millennium. Over the 951–2002 common period, the two reconstructions correlate at 0.55 and therefore, for this study, were averaged together after they had been normalized to their common period. The resulting mean time series correlates with gridded (45–50°N/5–10°E) temperatures (May–September) at 0.64, and shows no autocorrelation in the model residuals (Table 1). Some of the pine data used by Büntgen et al. were included in the generation of the D'Arrigo et al. RCS NH reconstruction. These data were not included in their STD version, however.
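The compositing step described above (normalizing each record to the common period, then averaging) can be illustrated with a minimal Python sketch. The function names and the toy values are hypothetical and are not taken from the original analysis; only the general procedure follows the text.

```python
from statistics import mean, stdev

def normalize_to_common_period(series, start, end):
    """Return z-scores of `series` (dict: year -> value), using the mean and
    standard deviation computed over the years start..end inclusive."""
    ref = [v for y, v in series.items() if start <= y <= end]
    mu, sd = mean(ref), stdev(ref)
    return {y: (v - mu) / sd for y, v in series.items()}

def average_series(a, b):
    """Average two year-indexed series over the years they share."""
    return {y: (a[y] + b[y]) / 2 for y in sorted(a.keys() & b.keys())}

# Toy values standing in for two reconstructions (hypothetical, not real data).
rw_recon = {1990: 10.0, 1991: 12.0, 1992: 14.0, 1993: 16.0}
mxd_recon = {1991: 0.5, 1992: 0.7, 1993: 0.9, 1994: 1.1}

common_start, common_end = 1991, 1993
rw_z = normalize_to_common_period(rw_recon, common_start, common_end)
mxd_z = normalize_to_common_period(mxd_recon, common_start, common_end)
composite = average_series(rw_z, mxd_z)
```

Normalizing before averaging prevents the record with the larger raw variance (here the RW-like series) from dominating the mean.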
A2. Tatra Mountains
 From a network of 24 RW and four MXD chronologies, Büntgen et al. developed two summer temperature reconstructions for the Tatra region in Poland and Slovakia. The trees in the network consisted of four conifer species (Picea abies, Larix decidua, Abies alba, and Pinus mugo) from which ring width and maximum density measurements were standardized using individual series detrending approaches. Principal component analysis identified five dominant eigenvectors that express somewhat contrasting climatic signals. The first principal component contains the highest loadings from 12 Picea abies RW chronologies and explains 42% of the network's variance. The mean of these 12 high-elevation chronologies correlates at 0.62 with June–July temperatures, while the mean of the three MXD chronologies that load most strongly on the fourth principal component correlates at 0.69 with April–September temperatures. These groupings allowed the development of RW and MXD based reconstructions of June–July (1661–2004) and April–September (1709–2004) temperatures, respectively. For this study, we utilized the MXD based reconstruction, which has the broader seasonal window (it correlates with local April–September gridded temperatures at 0.67, Table 1), as the RW based temperature proxy showed significant autocorrelation in the model residuals when it was regressed against the gridded data.
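The residual check mentioned above can be sketched as follows. The specific autocorrelation test is not restated in this appendix, so the Durbin–Watson statistic is used here purely as an assumed, commonly used choice; the proxy and temperature values are hypothetical.

```python
from statistics import mean

def fit_line(x, y):
    """Ordinary least squares fit of y = a + b*x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b

def durbin_watson(residuals):
    """Durbin-Watson statistic (range 0 to 4). Values near 2 indicate little
    lag-1 autocorrelation; values well below 2 indicate positive autocorrelation."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2
              for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Hypothetical proxy index and gridded temperature values (not real data).
proxy = [0.1, 0.4, 0.2, 0.8, 0.5, 0.9, 0.7, 1.2]
temps = [10.2, 10.9, 10.3, 11.7, 11.0, 11.8, 11.3, 12.5]

a, b = fit_line(proxy, temps)
residuals = [t - (a + b * p) for p, t in zip(proxy, temps)]
dw = durbin_watson(residuals)
```

A proxy whose regression residuals give a statistic far from 2 would be a candidate for rejection under a screening rule of this kind; the threshold depends on sample size and is not specified here.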
A3. Northern Scandinavia
Kirchhefer developed July–August temperature reconstructions using RW data (each TR series detrended individually) from Scots pine (Pinus sylvestris) for three regions in Norway: Forfjorddalen in the Vesterålen archipelago (AD 1358–1992), Stonglandseidet on Senja (AD 1548–1994) and Vikran near Tromsø (AD 1700–1992). The correlations of these reconstructions with local station data were 0.56, 0.50 and 0.71, respectively. As detailed by Kirchhefer [2001, Figure 6], these three series were combined to develop a regionally representative temperature proxy time series (1700–1989). For this study, we “extended” the Kirchhefer study to 2001 by complementing the original chronology data with recently archived RW chronologies from Norway and Finland. The old and new data were obtained from the International Tree-Ring Data Bank (ITRDB) (http://www.ncdc.noaa.gov/paleo/treering.html). Only chronologies that correlated significantly with July–August gridded temperatures (65–70°N/15–30°E) were considered for further analysis. The chronologies (with ITRDB code, latitude/longitude, and period with >10 series) that passed this screening were (1) Tutkimusasema (FINL054, 67°N/26°E, 1586–2000), Vytamoselka (FINL055, 67°N/27°E, 1577–2001), Rorstaddalen (NORW011, 67°N/15°E, 1550–1997) and Borealoa River (NORW012, 67°N/15°E, 1625–1997) [Melvin, 2004], and (2) Karhunpesäkivi Inari (FINL021, 68°N/27°E, 1509–2001) and Karasjok (NORW007, 69°N/25°E, 1733–2001) (updated sites from Lindholm et al. ); Stonglandseidet (NORW009, 69°N/17°E, 1537–1997) and Forfjorddalen 2 (NORW010, 68°N/15°E, 1254–1993) [Kirchhefer, 2001]. Note that the Forfjorddalen 2 chronology was included despite its termination in 1993 as it was utilized in the original Kirchhefer study. These chronologies were detrended using so-called standard techniques (either negative exponential or regression functions of negative or zero slope).
The resultant chronologies were normalized over their common period (1733–1993), averaged to derive a regional mean series and the variance of the time series stabilized using the method outlined by Osborn et al. . This final mean series (1733–2000) correlates with July–August temperatures at 0.61 (Table 1) and coheres strongly (r = 0.75) with the original Kirchhefer  regional series (see Figure A1).
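Variance stabilization is needed because a regional mean built from fewer chronologies (for example after 1993 here, when some series have ended) has inflated variance. The sketch below is a simplified illustration in the spirit of a sample-size adjustment, assuming a single, time-invariant mean inter-series correlation `rbar`; the published method may differ in detail, and all values are hypothetical. It relies on the standard result that the mean of n unit-variance series with mean pairwise correlation rbar has variance (1 + (n − 1)·rbar)/n, which tends to rbar as n grows.

```python
import math

def variance_adjustment(n, rbar):
    """Factor that rescales a mean of n series so that its variance matches
    the variance expected for a very large sample (which tends to rbar)."""
    return math.sqrt(n * rbar / (1.0 + (n - 1) * rbar))

def stabilize(mean_series, counts, rbar):
    """Rescale each year's regional mean by the factor for that year's
    number of contributing chronologies."""
    return {y: v * variance_adjustment(counts[y], rbar)
            for y, v in mean_series.items()}

# Hypothetical regional mean values and chronology counts (not real data).
regional_mean = {1733: 0.9, 1900: 0.3, 2000: -0.4}
n_chronologies = {1733: 8, 1900: 8, 2000: 2}

adjusted = stabilize(regional_mean, n_chronologies, rbar=0.4)
```

Years supported by few chronologies are shrunk the most, while well-replicated years are left almost unchanged.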
A4. Western Siberia
 Both RW and MXD parameters were measured (at the Swiss Federal Institute for Forest, Snow and Landscape Research) from a newly sampled larch (Larix sibirica) site in western Siberia (Putorama, 70°31′N/92°57′E). The period where replication is at least 10 series is 1713–2000. The two data types were detrended using standard individual series methodologies and mean chronologies computed. As both chronologies correlated reasonably strongly with each other (r = 0.61) and their response to summer temperatures was similar, they were averaged together, after being normalized to the common period, to derive a mean summer temperature proxy for the region. This series correlates with mean May–September gridded temperatures at 0.43 (Table 1). The Western Siberia series tracks the gridded temperature series quite well (Figure 3) except for the last two years, where the proxy values are substantially lower than the actual instrumental data. Two years of misfit is too short a period to determine whether this represents a significant divergence.
 This record was derived by combining two temperature-sensitive elevational tree line RW records from Mongolia: Khalzan Khamar (Larix sibirica; ITRDB code: MONG009, 49.55°N/91.34°E) and Horin Bugatyin Davaa (Pinus sibirica; MONG009, 49.22°N/94.53°E). These two series were chosen from data presented by D'Arrigo et al. as they expressed the strongest response to summer temperatures. The chronologies were detrended using standard methods and, after normalizing to the common period, averaged to derive a mean series. The period replicated with >10 series is 1636–1998. This composite record is independent of the Solongotyin Davaa record used in D'Arrigo et al. and correlates well (r = 0.70) with gridded June–July temperatures (Table 1).
 A search through the ITRDB found surprisingly few temperature sensitive TR data sets that extended to at least 1995. Kyrgyzstan was one region where such chronologies were found (sampled and measured by the Swiss Federal Institute for Forest, Snow and Landscape Research). In this region, both RW and MXD data were obtained from two spruce (Picea schrenkiana) sites: Sarejmek (ITRDB code: RUSS152, 41.36°N/75.09°E) and Tschongkys (RUSS164, 42.11°N/78.11°E). Chronologies were computed using standard techniques. The period covered by at least 10 series in each chronology is 1689–1995. To account for the varying coherence between the chronologies, they were not averaged to derive site mean series; rather, the RW and MXD chronologies were utilized separately as potential predictor series in a stepwise multiple regression against gridded temperatures. The final optimal model was calibrated against June–July mean temperatures, with the final series being a linearly weighted combination of the Sarejmek MXD and RW data as well as the RW data from Tschongkys. The final Kyrgyzstan reconstruction explains 36% (r = 0.61, Table 1) of the gridded temperature variance.
A7. Tien Shan
Esper et al. [2003b] developed a chronology from 203 Juniperus turkestanica RW series to reconstruct temperature variations in the Tien Shan Mountains (Kirghizia). The Tien Shan RW data were detrended to remove tree age related biases and to emphasize high- to low-frequency climatic signals over the past millennium. Esper et al. [2003b] showed that the data correlated most strongly with the June–September temperatures of the Fergana meteorological station in eastern Uzbekistan (r = 0.46). In this study, taking into account residual analysis, we identified the optimal correlation (r = 0.41) with July gridded temperatures (Table 1).
Cook et al. described the development of a multispecies (Abies spectabilis, Juniperus recurva, Populus ciliata, Pinus roxburghii, Picea smithiana, Pinus wallichiana, Tsuga dumosa and Ulmus wallichiana) TR chronology network in Nepal and the development of two temperature reconstructions. The network was composed of 32 TR chronologies (processed using the RCS method) and was represented by five indigenous tree species. Principal component analysis of the chronologies over the common interval 1796–1992 indicated that there was a coherent large-scale common signal among the TR chronologies, which was hypothesized to reflect, in part, broad-scale climate forcing related to temperatures. Using monthly temperature data from Kathmandu, two reconstructions were developed: February–June (1546–1991) and October–February (1605–1991). In this study, we utilize the February–June reconstruction as it incorporates the summer season (to be consistent with the other TR proxies). We found the optimal season of coherence to be with March–May temperatures (r = 0.49).
 Using Pinus aristata RW data (detrended using standard techniques) from the San Francisco Peaks in Arizona, Salzer and Kipfmueller developed a >2000-year-long reconstruction of annual temperatures for the region. In the original study, the reconstruction explains 46% of the variance in the temperature data over the 1909–1994 calibration period. In this study, the correlation with gridded temperatures was weaker (0.42, 1889–1996) but still passed the screening criteria.
Biondi et al., using RW data (detrended using standard techniques) from whitebark pines (Pinus albicaulis) and Douglas firs (Pseudotsuga menziesii), developed an 858-year proxy record of July temperatures for east-central Idaho. The correlation of their proxy series with instrumental July temperatures was 0.47 (1895–1992), improving to 0.55 when the 1895–1903 period was removed. In this study, using gridded data from a relatively large region (40–45°N/110–120°W), we show that these correlations are generally consistent back into the 19th century (r = 0.41, 1868–1992).
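The sensitivity of a calibration correlation to the chosen period, as noted above, can be examined with a short Python sketch that computes Pearson's r over a full window and over a window with the earliest years removed. The helper names and toy values are hypothetical and serve only to illustrate the comparison.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def correlate_over(proxy, temps, start, end):
    """Correlate two year-indexed series over their shared years in start..end."""
    years = [y for y in sorted(proxy.keys() & temps.keys()) if start <= y <= end]
    return pearson_r([proxy[y] for y in years], [temps[y] for y in years])

# Toy series (hypothetical): the early years fit poorly, the later years fit well.
proxy = {1895: 1, 1896: 2, 1897: 3, 1898: 4, 1899: 1,
         1900: 2, 1901: 3, 1902: 4, 1903: 2, 1904: 3}
temps = {1895: 4.0, 1896: 3.0, 1897: 2.0, 1898: 1.0, 1899: 1.1,
         1900: 2.0, 1901: 3.2, 1902: 3.9, 1903: 2.1, 1904: 2.9}

r_full = correlate_over(proxy, temps, 1895, 1904)
r_late = correlate_over(proxy, temps, 1899, 1904)
```

A marked difference between the two values would point to problems confined to the early part of the record, for instance in the instrumental data, rather than a uniformly weak proxy signal.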
A11. British Columbia
Wilson and Luckman demonstrated the possibility of reconstructing both May–August maximum (Tmax) and minimum (Tmin) temperatures using RW and MXD series (detrended using standard techniques) from Engelmann spruce (Picea engelmannii Parry) at tree line sites across Interior British Columbia, Canada. Multiple linear regression of three orthogonal principal components (derived from 12 RW and 7 MXD chronologies) was used to reconstruct each climate parameter separately. Calibration explained 64% (Tmax) and 39% (Tmin) of the variance in the instrumental climate record (1895–1991). In this study, we utilize the same data set as Wilson and Luckman, except that we removed the Harts Pass (Washington State) RW chronology (1585–1991) from the data set. The chronologies were used over the period denoted by at least 10 series (1750–1997) and principal component analysis identified three principal component scores comparable to those of the original study. Using stepwise multiple regression, calibration against mean May–August gridded temperatures (1894–1997) resulted in a reconstruction explaining 58% of the temperature variance (r = 0.77, Table 1). The reconstruction code (Table 1) is denoted by MBC to minimize confusion with the IBC maximum temperature reconstruction detailed by Wilson and Luckman.
A12. Southern Yukon
Youngblut and Luckman utilized a network of high-elevation Picea glauca TR chronologies from the southwest Yukon to reconstruct June–July maximum temperatures back to AD 1684. The chronologies (processed using standard methods) are characterized by low interannual RW variability and display similar patterns of RW variability across the sample area over the last 300 years. The driving force of this common signal appears to be a common tree growth response to summer temperatures across the region. Using seven chronologies, a reconstruction of maximum June–July temperatures was developed back to AD 1684, explaining 46.6% of the climatic variance over the 1946–1995 calibration period. In this study, we correlate this record with gridded temperatures from a broader region (60–65°N/130–140°W), which allows assessment of the series prior to the original calibration period. Over the 1898–2000 period, the series correlates at 0.54 (Table 1) with June temperatures (the optimized climate parameter when using this extended gridded data set).
A13. Northern Yukon
Szeicz and MacDonald  used RW data from five Picea glauca chronologies in northwestern Canada to develop a June–July reconstruction back to 1638. Age-dependent modeling was used to optimize the calibrated signal as well as capture more low-frequency variability in the final proxy time series. The original reconstruction was calibrated against local station data and explained 47% of the June–July variability. In this study, the correlation (r) of the record with local gridded June–July temperatures is 0.60 (Table 1).
A14. Wrangell Mountains
Davi et al. developed a warm-season (July–September) temperature reconstruction (1593–1992) based on the first eigenvector from principal component analysis of six Picea glauca MXD chronologies (processed using standard techniques). Their reconstruction explained 51% of the temperature variance over the 1958–1992 calibration period. In this study, we derived a simple mean of the four longest chronologies from the Davi et al. study: Caribou Creek (ITRDB code: AK077, 62.33°N/143.17°W), Nabesna Mine (AK074, 62.22°N/143.03°W), Big Bend Lake (AK078, 61.20°N/142.43°W) and Hawkins Hill (AK075, 61.09°N/142.05°W). This allowed the series to be brought forward to 1997. The resultant time series correlates optimally with July–August mean gridded temperatures at 0.63 over the extended period 1900–1995. It should be noted that Davi et al. also produced an eigenvector score from a RW chronology network in the region. The resultant PC (utilized by D'Arrigo et al. ), although showing a similar temporal history to the MXD data, showed divergence since 1970 when compared to the instrumental data. Davi et al. stated that this might be related to increasing moisture stress or other factors.
A15. Northern Quebec
 It is obvious from Figure 2 that there is a spatial bias in the availability of relevant temperature sensitive TR data sets for North America. This is partly due to the geography of North America, with the mountains and resultant upper elevational tree line sites being mostly restricted to the west, but also reflects the fact that the recently updated TR chronologies along the northern latitudinal tree line were utilized by D'Arrigo et al. In the ITRDB, there are no archived TR chronologies expressing a temperature signal that extend to at least 1995. Luckily, newly sampled data sets were obtained for Northern Quebec from Lake Tesialuk (Picea glauca, 1620–1997, 58.23°N/67.03°W) and Pyramid Lake (Picea glauca, 1608–2002, 57.27°N/65.12°W [Payette, 2007]). These two chronologies were supplemented by RW data from Fort Chimo (Larix laricina, 1641–1974, 58.22°N/68.23°W, ITRDB code CANA002). The data from these three data sets were pooled and a regional northern Quebec chronology was developed (detrended using standard methods) with >10 series for the 1641–2002 period. This series correlates with gridded July temperatures at 0.42 over the 1942–2002 period (Table 1).
 R.W., U.B. and J.E. are funded by the European Community under research contract 017008-2 MILLENNIUM. R.D. and B.B. were funded by NOAA and by the National Science Foundation's Earth System History and Paleoclimatology programs. D.F. is funded by the Swiss National Science Foundation (NCCR-Climate). The Canadian studies were funded by the Natural Sciences and Engineering Research Council of Canada (B.L., D.Y. and S.P.) and Meteorological Service of Canada and the Inter-American Institute for Global Change Research (B.L. and D.Y.). We thank Philip Brohan and Andreas Kirchhefer (funded through MILLENNIUM) for informative discussions on this topic. This is Lamont-Doherty Earth Observatory contribution 7050.