3.1. Pros and Cons of Various Antarctic Temperature Data Sets
 In order to understand Antarctic climate variability and to diagnose global climate models (GCMs), having records that are representative of near-surface temperature over the entire Antarctic continent is desirable. One method of doing this is to simply take the linear average of all station records available [e.g., Jones and Reid, 2001]. Such analyses are useful for assessing year-to-year variability, but are not reliable for evaluating the spatial distribution of trends because of the relatively sparse network of observing stations. Temporal trends calculated by linear averaging indicate spurious warming for recent decades because a disproportionate number of stations are located on the Antarctic Peninsula, a region whose ice comprises only ∼5% of the total surface area of the ice sheet [Vaughan et al., 1999], where strong warming has occurred over the past 50 years [e.g., Vaughan et al., 2003]. Individual station records suggest that there has not been statistically significant warming elsewhere on the continent [e.g., Turner et al., 2005]. Because of the problems cited, linearly averaged Antarctic temperature records are not employed in this study.
 Objective analysis methods [Doran et al., 2002; Chapman and Walsh, 2007] suffer less from these problems than linear averaging, as these methods interpolate/extrapolate to voids using station data (either trends calculated from the station data, or raw station data) that are weighted as a function of inverse distance or a natural neighbor scheme [Cressie, 1999]. These analyses do not show strong warming trends and indicate that Antarctic temperatures collectively have not changed significantly since the 1960s. Statistically insignificant cooling over most of the continent has occurred on an annual basis from about 1970 to 2002 [Chapman and Walsh, 2007]. The annual and seasonal time series from Chapman and Walsh [2007] are used in this study, as they provide the most recent and complete analysis of Antarctic temperatures.
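The inverse-distance weighting mentioned above can be illustrated with a minimal sketch. It is not the scheme used by Doran et al. or Chapman and Walsh; the function name `idw_estimate`, the planar coordinates, the distance exponent, and the station values are all assumptions made for illustration.

```python
import math

def idw_estimate(stations, target, power=2.0):
    """Inverse-distance-weighted estimate at a location without data.

    stations: list of (x, y, value) tuples in arbitrary planar units
              (hypothetical; real analyses work on the sphere).
    target:   (x, y) location of the data void.
    power:    exponent applied to distance; 2.0 is assumed here.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            return value  # the target coincides with a station
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total

# Synthetic temperature anomalies (K) at three made-up station locations.
stations = [(0.0, 0.0, 1.0), (10.0, 0.0, -1.0), (0.0, 10.0, 0.5)]
estimate = idw_estimate(stations, (2.0, 2.0))
```

Because the weights fall off with distance, the estimate at (2, 2) is dominated by the nearest station, which is why sparse networks leave large regions controlled by one or two records.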
 Numerical atmospheric model fields provide useful assessments of temperature over Antarctica, and they account for topography, storm activity, teleconnections, and other natural phenomena that impact climate. However, one problem that has plagued model reanalysis fields in Antarctica is the dearth of observational data assimilated into the models prior to the modern satellite era (∼1979). This leads to relatively poor simulations before ∼1979, and improved simulations thereafter [e.g., Bromwich and Fogt, 2004; Bromwich et al., 2007]. Thus the evaluation and use of ERA-40 temperatures is limited to the period 1980–2001 in this study. The 1980–2001 ERA-40 annual and monthly temperature fields are used to create the background field for the statistical reconstruction, allowing temperature to be interpolated/extrapolated to data voids from station observations in a physically based manner.
 Skin temperature from AVHRR instruments onboard the National Oceanic and Atmospheric Administration's suite of polar orbiting satellites is the final Antarctic temperature data set used. AVHRR records provide the most spatially comprehensive observations of Antarctic temperatures. AVHRR temperature records must be used with caution as they are only valid for clear-sky conditions, an issue that can be problematic in the coastal Antarctic regions where conditions are more often cloudy than not [e.g., Guo et al., 2003]. However, statistical sampling is relatively good, especially in the Antarctic region where overlapping orbits enable as many as 12 measurements of the same surface per day. It should be noted that the skin temperatures inferred from thermal-infrared sensor data may be significantly different from the 2 meter air temperature observed by meteorological stations, especially in spring and summer. Also, a fixed emissivity close to unity is assumed for the surface for all seasons in the retrieval algorithm. This may cause a slight error in melt areas (near the coast) in the spring and summer. A thorough description of the AVHRR record and its quality over Antarctica is given by Comiso [2000]. The most recent realization of the AVHRR temperature data set is used in this study. The most recent published version of the data set for Antarctica is Kwok and Comiso [2002a].
 Monthly temperature records from sixteen stations were selected from the READER database to validate our Antarctic temperature reconstruction (Table 2 and Figure 2). None of the sixteen records were used in our reconstruction, and therefore they provide an independent means of assessment. Eight of the sixteen records were used in the reconstruction of Chapman and Walsh [2007], and therefore only the eight independent stations (indicated in Table 2) are used to calculate statistics in cases where the data sets are compared. The sixteen stations were chosen on the basis of completeness of record, and to provide a representative sampling of the climatic variability across Antarctica. Eight stations are located on the coast, and eight are in the interior of Antarctica, six of which are >1000 m ASL. Five of the stations have records that begin prior to 1980, the beginning of the calibration period for the reconstruction. For ease of comparison, the following nomenclature will be used henceforth: “READER” denotes the observed temperature records; “RECON” is our new near-surface temperature reconstruction; “CHAPMAN” is the reconstruction of Chapman and Walsh [2007]; and “COMISO” is the AVHRR temperature data set [Comiso, 2000; Kwok and Comiso, 2002a].
Table 2. Description of the Independent READER Temperature Observations Used to Validate the Reconstruction
|Station Number||Station||Latitude||Longitude||Elevation, m||Type||Country||Duration||n (60-02)||n (82-02)||σRECON/σREADER||σCHAPMAN/σREADER||σCOMISO/σREADER|
 Figure 4 shows the monthly and annual correlation (Figure 4a), root mean square error (RMSE; Figure 4b), and ratio of RMSE to standard deviation (RMSE/σ; Figure 4c) between the READER (observed) near-surface temperature anomalies and those from RECON, CHAPMAN, and COMISO for the independent station data available for the common period 1982–2002. Of the eight stations that are independent of both the RECON and CHAPMAN data sets, six have data during this period (stations 6, 7, 8, 9, 14, and 16). The statistics for January, for example, are calculated for all available January observations from the six stations. The total number of observations for all months from each station is shown in Table 2 (column “n (82–02)”). The comparison for each data set and each station is exact (months in each data set for which there are no observations are excluded). The results presented in Figure 4 provide an estimate of the average reconstruction skill at a single grid point. In RECON, correlations are r > 0.7 during 7 months; in CHAPMAN, r > 0.7 during 10 months; and in COMISO, r > 0.7 during 4 months. In most of the remaining months, r > 0.6 in all three data sets. In general, correlations are lowest in the summer and highest during the cold months, in part related to minimum sea ice cover in summer, which enhances localized temperature effects at coastal stations. Annual correlations in all three data sets are lower than expected (0.25–0.35), an issue caused by having few total station years (n = 33) from which to calculate the statistics, and also because one of the 33 observations is questionable. If the questionable record is removed, the annual correlations are 0.50, 0.58, and 0.47 for RECON, CHAPMAN, and COMISO, respectively.
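The three validation statistics described above can be sketched for a single bin (for example, all January observations pooled across stations). This is an illustrative sketch only: the function name `validation_stats` and the sample values are hypothetical, and taking σ as the sample standard deviation of the observations is an assumption, since the text does not state which standard deviation is used.

```python
import math

def validation_stats(observed, reconstructed):
    """Return (r, rmse, rmse_over_sigma) for paired anomaly series.

    Assumes sigma is the sample standard deviation of the observations.
    """
    n = len(observed)
    mean_obs = sum(observed) / n
    mean_rec = sum(reconstructed) / n
    # Sums of squares and cross products about the means.
    s_oo = sum((o - mean_obs) ** 2 for o in observed)
    s_rr = sum((r - mean_rec) ** 2 for r in reconstructed)
    s_or = sum((o - mean_obs) * (r - mean_rec)
               for o, r in zip(observed, reconstructed))
    r_value = s_or / math.sqrt(s_oo * s_rr)
    rmse = math.sqrt(sum((r - o) ** 2
                         for o, r in zip(observed, reconstructed)) / n)
    sigma_obs = math.sqrt(s_oo / (n - 1))
    return r_value, rmse, rmse / sigma_obs

# Synthetic anomalies (K); months with no observation would be dropped
# from both series before calling the function.
obs = [1.0, 2.0, 3.0, 4.0]
rec = [2.0, 3.0, 4.0, 5.0]
r_value, rmse, relative_rmse = validation_stats(obs, rec)
```

In this synthetic case the reconstruction is offset by a constant 1 K, so the correlation is perfect while the RMSE is 1 K, which shows why the three statistics are reported together.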
Figure 4. For the three observational data sets, RECON, CHAPMAN, and COMISO, the (a) correlation, (b) RMSE (K), and (c) RMSE/σ between the observed and reconstructed temperature anomalies for all available observations for the six common independent stations placed into monthly and annual (“Y”) bins. The stations are shown in Figure 2 and described in Table 2 (stations 6, 7, 8, 9, 14, and 16). Confidence intervals (p < 0.05) for the correlations are indicated by the error bars (only the lower bound of the uncertainty is shown).
 The RMS errors in all three data sets have strong seasonality (Figure 4b), being largest in winter and smallest in summer. However, when standardizing the errors to account for dampened temperature variability during summer (due to the enhanced maritime effect), it is seen in Figure 4c that the largest “relative” RMS errors occur during the late summer and early autumn months (January–April), when the greatest fraction of open water is present around Antarctica [Gordon, 1981; Parkinson, 1992]. The RECON data typically have higher RMS errors than the CHAPMAN data (Figure 4b), but they have lower relative RMS errors (Figure 4c), a condition that arises because the RECON data are adjusted to match the observed temperature variance (otherwise the kriging method dampens the variability), and thus have larger variability than CHAPMAN. Examination of the ratios of the standard deviations of the reconstructed data sets versus observations (the last three columns in Table 2) indicates that the RECON variability is close to that observed (0.94 on average). The CHAPMAN and COMISO data slightly underestimate the observed variability (on average, 0.72 and 0.79, respectively). Accounting for the seasonal cycle of variability in the RECON data set eliminates the seasonal cycle in the relative RMS errors (Figure 4c). In the CHAPMAN data, after accounting for the seasonal cycle of variability, the largest relative RMS errors occur in the late summer and early autumn months (JFMA). Correspondingly the average correlation coefficients in CHAPMAN during these months (Figure 4a) are lower compared to the other eight months (r = 0.68 versus r = 0.76). The average correlation coefficients in RECON are nearly identical between the two periods (r = 0.71 versus r = 0.70). 
The COMISO data have lower correlation coefficients during the warm months (on average, r = 0.57 for JFMA versus r = 0.66 for the remaining months), which may be due in part to surface melt conditions (for the coastal stations) which cause a decrease in surface emissivity and hence a slight error in the retrieval. Also, during melt the near-surface air temperature may be significantly different from the skin temperature (which is fixed at 0°C). Furthermore, the relatively coarse grid of 12.5 km for the AVHRR data would cause measurements in coastal stations to be partly that of ocean regions which are ice free and relatively warm in the summer.
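The standard deviation ratios in the last three columns of Table 2, and the idea of adjusting a damped series to match an observed variance, can be sketched as follows. The text does not describe how the RECON variance adjustment is actually performed, so the simple rescaling about the mean shown here is an assumption, and the function names and values are hypothetical.

```python
import statistics

def std_ratio(reconstructed, observed):
    """Ratio of reconstructed to observed standard deviation."""
    return statistics.stdev(reconstructed) / statistics.stdev(observed)

def rescale_to_sigma(series, target_sigma):
    """Rescale a series about its mean so its standard deviation
    equals target_sigma (an assumed, simplified adjustment)."""
    mean = statistics.fmean(series)
    sigma = statistics.stdev(series)
    return [mean + (value - mean) * target_sigma / sigma for value in series]

# Synthetic anomalies (K): the "damped" series has half the observed spread.
observed = [-2.0, 0.0, 2.0]
damped = [-1.0, 0.0, 1.0]
ratio_before = std_ratio(damped, observed)
adjusted = rescale_to_sigma(damped, statistics.stdev(observed))
ratio_after = std_ratio(adjusted, observed)
```

Rescaling leaves the correlation with the observations unchanged but enlarges the absolute errors, consistent with a series that has higher RMS errors yet lower relative RMS errors.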
 Figure 5 shows the correlations between the READER (observed) near-surface temperature anomalies and those from RECON and CHAPMAN for the independent station data available for the common period 1960–2002. The objective of Figure 5 is to evaluate the performance of the data sets over a longer period than in Figure 4. Figure 5a is similar to Figure 4a, showing the monthly and annual correlations for all available observations from the 8 common independent stations (see figure caption for stations). Figure 5b shows the monthly and annual correlations from all available observations from the 8 independent stations after they have been averaged together first. Figure 5b estimates the ability of RECON and CHAPMAN to reproduce regional temperature variability, whereas Figure 5a estimates their average ability at a single grid point. Figure 5c shows the correlations at each station for all of the monthly observations available (counts are shown in the “n (60–02)” column in Table 2), and thus provides an estimate of the ability of RECON and CHAPMAN to reproduce the temperature variability across all months at a given station. The objective of presenting Figure 5a is to show that the correlations are similar to those in Figure 4a, and are thus not sensitive to the period chosen; RECON and CHAPMAN have consistent skill throughout 1960–2002. It is noteworthy that the annual correlations are higher than for the 1982–2002 period because more annual averages are available for the analysis (n = 75 for 1960–2002, versus n = 33 for 1982–2002). The RECON and CHAPMAN data sets are able to reproduce regional variability (Figure 5b) with strong statistical significance. RECON (CHAPMAN) has correlations exceeding 0.6 in 12 months (11 months), and correlations exceeding 0.8 in 4 months (5 months). The correlations are highest during the winter months. 
Evaluation of correlations at individual stations (Figure 5c) demonstrates that RECON and CHAPMAN are consistently able to reproduce observed variability with strong statistical significance (r > 0.6 in all instances, r > 0.7 in most instances). The correlations at the 4 independent low-elevation coastal stations (stations 1, 2, 8, and 14) are similar to those at the 4 independent high-elevation interior stations (stations 6, 7, 9, and 16). The RECON correlations are r_low = 0.75 versus r_high = 0.76, and the CHAPMAN correlations are r_low = 0.80 versus r_high = 0.80. In the RECON data set, for which all 16 stations are independent, the correlations between West Antarctica (stations 1, 2, 5, 7, 10, 13, 14, and 15) and East Antarctica (stations 3, 4, 6, 8, 9, 11, 12, and 16) are compared and found to be similar (r_west = 0.73 versus r_east = 0.76).
Figure 5. Correlation coefficients between the observations and the RECON and CHAPMAN near-surface temperature anomalies for the common period 1960–2002 for (a) all available observations from the eight common independent stations placed into monthly and annual (“Y”) bins; (b) all available observations from the eight common independent stations averaged together first, then placed into monthly and annual bins; and (c) all monthly observations at each individual, independent station (the eight common stations, 1, 2, 6, 7, 8, 9, 14, and 16, plus an additional eight stations that are independent in the RECON evaluation only). Confidence intervals and station information are as described in Figure 4.
 Figure 6 shows the temporal trends of the temperature anomalies for the READER (observed), RECON, and CHAPMAN data sets for several cases. Figure 6a shows the monthly and annual trends for 1982–2002 for all available observations for the six common independent stations (the COMISO data are also included in Figure 6a since they cover the 1982–2002 period). The trends are calculated from the same data for which statistics are presented in Figure 4, and they demonstrate the ability of the data sets to reproduce observed trends at a single grid point. It is noteworthy that these statistics do not accurately depict the actual Antarctic temperature trends, as they represent an assemblage of discontinuous observational data sets. Figure 6b is similar to Figure 6a, but for the 1960–2002 period (based on the same data used to calculate the statistics in Figure 5a). Inspection of Figures 6a and 6b for both periods (all months and annually) indicates that none of the data sets have trends that are statistically different from zero (p < 0.05), nor are the trends among data sets statistically different from each other. The RECON, CHAPMAN, and COMISO trends are of the same sign as the READER trends in all but a few instances, demonstrating that they are able to capture the weak observed trends at a grid point even though the trends are not statistically significant. Such a result implies that any statistically significant trends that occur will be easily reproduced by RECON, CHAPMAN, and COMISO. Figure 6c shows the 1960–2002 trends for the 8 common stations averaged together first (based on the same data used to calculate the statistics in Figure 5b), and thus provides an estimate of the ability of the data sets to reproduce regional trends. 
As with Figures 6a and 6b, despite statistical insignificance, RECON and CHAPMAN produce trends of the same sign and similar magnitude as observed in all but one instance (CHAPMAN has a small positive trend versus a small observed negative trend in August). The results presented in Figure 6 indicate that all of the data sets can reproduce observed Antarctic temperature trends at individual grid points, and regionally, in all seasons.
Figure 6. Temporal trends of the temperature anomalies (K a−1) for the READER (observed), RECON, CHAPMAN, and COMISO data sets for (a) all available observations for the six common independent stations for 1982–2002 placed into monthly and annual (“Y”) bins; (b) all available observations from the eight common independent stations for 1960–2002 placed into monthly and annual (“Y”) bins; and (c) all available observations from the eight common independent stations for 1960–2002 averaged together first, then placed into monthly and annual bins. Note that y axis scales vary. COMISO data are only shown in Figure 6a because they start in 1982. The error bars indicate 95% confidence intervals for the trends, estimated as t05*SEb1, where t05 is the t value for p = 0.05 and SEb1 is the standard error of the regression slope (i.e., of the trend). In subsequent figures and in Table 3, uncertainty is estimated as t05*SEtot, where SEtot = (SEb1^2 + SEm^2)^(1/2), and SEm accounts for additional uncertainty due to imperfect methodology/algorithms for RECON, CHAPMAN, and COMISO, estimated as the average standard error between the three data sets.
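The trend confidence interval defined in the caption (t05*SEb1) can be sketched with an ordinary least squares fit. This is a minimal illustration, not the authors' code: the function name `trend_with_ci` and the synthetic series are hypothetical, and the critical t value is passed in by the caller (2.093 is the two-tailed p = 0.05 value for 19 degrees of freedom, i.e., 21 annual values). The additional methodological term SEm is not included.

```python
import math

def trend_with_ci(years, values, t_crit):
    """Least squares trend and its confidence half-width t_crit * SE_b1.

    t_crit must match n - 2 degrees of freedom; it is supplied by the
    caller rather than computed here.
    """
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    s_xx = sum((x - mean_x) ** 2 for x in years)
    slope = sum((x - mean_x) * (y - mean_y)
                for x, y in zip(years, values)) / s_xx
    intercept = mean_y - slope * mean_x
    # Residual sum of squares about the fitted line.
    sse = sum((y - (intercept + slope * x)) ** 2
              for x, y in zip(years, values))
    se_slope = math.sqrt(sse / (n - 2) / s_xx)
    return slope, t_crit * se_slope

# Synthetic annual anomalies (K) that alternate in sign with no trend.
years = list(range(1982, 2003))
values = [(-1.0) ** i for i in range(len(years))]
slope, half_width = trend_with_ci(years, values, t_crit=2.093)
significant = abs(slope) > half_width
```

A trend is treated as statistically different from zero only when its magnitude exceeds the half-width, which is the comparison made visually by the error bars.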
 In summary, the RECON, CHAPMAN, and COMISO data sets have similar overall performance according to our validation. In nearly all cases the correlations of RECON, CHAPMAN, and COMISO with individual station data are highly statistically significant. The performance during the coldest months is similar among the data sets. During late summer and early autumn, when Antarctic sea ice cover is lowest, the RECON data set on average has the highest correlations and lowest relative RMS errors compared to observations. The strong performance of RECON during summer may be due to our methodology, which through the use of model fields to establish spatial relationships likely minimizes the impacts of localized influences on temperatures compared to conventional objective analysis techniques. All of the data sets reliably reproduce near-surface temperature trends at independent stations even though they are statistically insignificant, suggesting that they will easily reproduce stronger, statistically significant trends as well. The results of this validation provide quantitative evidence that the continent-averaged Antarctic temperature data presented next are accurate.
3.3. Comparison of Antarctic Temperature Data Sets
 Figure 7 shows the annual Antarctic near-surface temperature anomalies for various data sets for the 1950–2005 period (Figure 7a), and the more recent period from 1980 to 2005, which contains several additional data sets (Figure 7b). There is close agreement between RECON and CHAPMAN for the 1960–2005 period (r = 0.96; Figure 7a). Considering the small-scale noise and isotope diffusion that inherently occur in ice cores [e.g., van der Veen and Bolzan, 1999], the stable isotope reconstruction of Schneider et al. matches the RECON and CHAPMAN data sets quite well (r ≈ 0.65 compared to either data set for 1960–1999), especially after 1975 (r ≈ 0.78 for 1975–1999). For the 1980–2005 period (Figure 7b), the time series have similar interannual variability, including the reconstructions, the ERA-40 temperature data, and a “synthetic” reconstruction, using the same technique as RECON, that employs ERA-40 records from the 15 observation sites (“RECON_SYN”). If our reconstruction methodology were perfect, RECON_SYN would exactly match the ERA-40 record. The close match indicates that the synthetic reconstruction reproduces the ERA-40 record very well (r = 0.95). The inclusion of the “RECON_NO_BYRD” record in Figure 7b demonstrates that our result is insensitive to the omission of the Byrd Station record, as the RECON and RECON_NO_BYRD records are nearly identical (r = 0.99). Because it is based on AVHRR skin temperatures, the COMISO record provides an independent constraint on the other records. The correlations of COMISO with RECON, CHAPMAN, and ERA-40 are r = 0.61, r = 0.64, and r = 0.70, respectively.
Figure 7. Annual Antarctic near-surface temperature (K) anomalies (with respect to the 1980–1999 mean) for various data sets for (a) 1950–2005 and (b) 1980–2005. Abbreviations are as follows: “RECON_NO_BYRD” is the reconstruction with the Byrd Station record omitted, and “RECON_SYN” is the reconstruction using “synthetic” temperature records extracted from the 15 ERA-40 grid points that correspond to the observation sites. The COMISO data begin in 1982; thus the COMISO anomalies are with respect to the 1982–1999 mean.
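Anomalies with respect to a base-period mean, as used in Figure 7, can be computed as in the following minimal sketch. The function name `anomalies` and the sample values are hypothetical; only the base-period idea comes from the caption.

```python
def anomalies(years, temps, base_start, base_end):
    """Subtract the mean over [base_start, base_end] from every value."""
    base = [t for y, t in zip(years, temps) if base_start <= y <= base_end]
    if not base:
        raise ValueError("no data fall inside the base period")
    base_mean = sum(base) / len(base)
    return [t - base_mean for t in temps]

# Synthetic annual mean temperatures (K) for three years.
years = [1980, 1981, 1982]
temps = [1.0, 2.0, 6.0]
anoms = anomalies(years, temps, 1980, 1981)
```

A record that starts later than the nominal base period, such as one beginning in 1982, needs a shortened base period (for example, 1982–1999), so its anomalies are offset slightly relative to the other series.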
 The annual and seasonal Antarctic near-surface temperature trends are calculated for 1960–2002 and 1982–2001 (Table 3). The difference in end years (2002 versus 2001) between the two periods is due to the ERA-40 records ending in 2001 (actually, in mid-2002). The 1960–2002 annual and seasonal trends are statistically insignificant in all of the available data sets, and the 95% confidence intervals are at least twice as large as the trends in nearly every instance. The trends are of similar magnitude for the two reconstructions, CHAPMAN and RECON, indicating that at the continental scale the results are insensitive to which technique is employed.
Table 3. Temporal Trends and 95% Confidence Intervals of Average Annual and Seasonal Antarctic Near-Surface Air Temperature (K decade−1) From Various Data Sets for Two Time Periods
|Data Set||Annual||DJF||MAM||JJA||SON|
|1960–2002|| || || || || |
|RECON||0.02 ± 0.18||0.01 ± 0.29||0.02 ± 0.37||0.12 ± 0.37||0.14 ± 0.27|
|CHAPMAN||0.04 ± 0.14||0.05 ± 0.17||0.01 ± 0.28||0.08 ± 0.30||0.05 ± 0.23|
|SCHNEIDER||0.01 ± 0.13|| || || || |
|1982–2001|| || || || || |
|RECON||−0.21 ± 0.57||−0.66 ± 0.92||−1.09 ± 1.07||0.63 ± 1.08||0.21 ± 0.81|
|ERA-40||0.21 ± 0.44||−0.41 ± 0.84||−0.26 ± 0.77||1.07 ± 0.81||0.49 ± 0.75|
|CHAPMAN||−0.05 ± 0.42||−0.07 ± 0.55||−0.78 ± 0.78||0.40 ± 0.88||0.23 ± 0.67|
|COMISO||0.24 ± 0.57||−0.16 ± 1.02||−0.19 ± 0.81||0.77 ± 0.81||0.50 ± 0.75|
|RECON_SYN||0.12 ± 0.58||−0.48 ± 0.83||−0.46 ± 0.91||1.01 ± 0.90||0.38 ± 0.65|
|SCHNEIDER||−0.06 ± 0.50|| || || || |
 The annual and seasonal trends are stronger over the 1982–2001 period, but they are statistically insignificant in all but four cases. The 95% confidence intervals are larger than the 1982–2001 annual temperature trends by a factor of two or more in all six data sets, indicating that the annual trends are far from statistically significant. For each of the four seasons, the trends for all of the data sets have the same sign (+ or −), suggesting robust results. The RECON and CHAPMAN near-surface air temperature trends are significantly (p < 0.05) negative in MAM (−1.1 and −0.78 K decade−1, respectively), but it is noteworthy that the negative RECON trend is much smaller (−0.33 K decade−1) and statistically insignificant if calculated through 2005. The negative trends in DJF and MAM are consistent with the strong upward trend in the SAM during summer and autumn [Marshall, 2003, 2007]. In JJA, the positive trends are consistent with middle and upper tropospheric warming (1970–2003) over Antarctica in winter based on weather balloon observations [Turner et al., 2006]. The SAM has not been strengthening during the winter months (until perhaps more recently; Figure 1), raising the question of whether the JJA warming is an analog of how Antarctic temperatures may change in other seasons if the positive SAM trends were to subside. Marshall notes that over East Antarctica the surface temperature response to SAM forcing displays little seasonality; that is, if SAM forcing in other seasons were similar to winter, the temperature response in those seasons might also be similar. One GCM study [Shindell and Schmidt, 2004] suggests the trends in the SAM might level off by midcentury if the Antarctic ozone hole mends itself. Other studies of GCM projections suggest the SAM will continue to strengthen throughout this century [e.g., Lynch et al., 2006; Fyfe and Saenko, 2006]. 
Figure 1 suggests the DJF, MAM, and annual SAM trends may have been leveling off since about the mid-1990s, an issue that is discussed in more detail below when the spatial plots are presented. The positive temperature trends in ERA-40 and RECON_SYN are statistically significant in JJA. Johanson and Fu suggest that ERA-40 wintertime tropospheric temperature trends are too large by a factor of about two; thus the veracity of these model-based trends is questionable. However, the good agreement between the ERA-40 and RECON_SYN trends indicates that our reconstruction methodology reliably reproduces the continental-scale trends.
 In summary, the two station-based near-surface temperature reconstructions (RECON and CHAPMAN) correlate strongly for annual and seasonal timescales for 1960–2005, and they agree reasonably with the Schneider et al. stable isotope reconstruction for annual timescales. RECON is representative of the entire continent, as indicated by the similar trends and the strong correlation between the ERA-40 and “synthetic” ERA-40 (RECON_SYN) data sets. All records correlate significantly with all other records during all seasons from 1982 to 2001 (not shown). Near-surface temperature trends are statistically insignificant (p > 0.05) on annual timescales within every data set analyzed, for both the longer (1960–2002) and shorter (1982–2001) periods. Continental-scale seasonal trends are of the same sign in all data sets. Collectively, these results suggest that RECON is a robust record. In the next section, the regional variability of Antarctic near-surface trends is evaluated.