5.1. First Difference Method
 To estimate the error introduced into the extended time series by the first difference (FD) procedure, we ran Monte Carlo tests using NCEP/NCAR reanalysis data [Kistler et al., 2001] to simulate the radiosonde station data. Using grid points collocated with the RATPAC stations, we selected artificial break dates after 1996 at random, deleted the data for the 12 months surrounding those dates, and then combined the reanalysis time series using the FD method described above. The number of break dates used at a given station was equal to the number actually found in the metadata for that station. The process was repeated 10,000 times with different randomly selected dates for the cuts. For each iteration, we calculated trends in the resulting series for 1960–2004 (not shown) and 1979–2004. The input time series were masked to match the missing months in the actual IGRA data sets for the corresponding stations. All tests used endpoint outlier trimming at 1.0 standard deviation and the spatial averaging and interpolation procedures described in section 3.2 above.
 We then took the difference between the 5th and 95th percentiles of the trends from the 10,000 iterations as a measure of the uncertainty introduced by the procedure. Since the FD procedure is used in only approximately one third of the time period 1979–2004, this uncertainty metric is less than that which would occur if FD were used for the entire period. We chose this measure, rather than a metric focused on the period when FD was used, because it most closely resembles the quantity of interest to most users of the data set. As the RATPAC-A data set is extended further in the future using FD, the trend uncertainty can be expected to increase.
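 The logic of this test can be illustrated with a short Python sketch. It is not the code used for RATPAC. It assumes annual rather than monthly values (so one deleted year stands in for the 12 deleted months), synthetic noise in place of reanalysis data, an unweighted average of station first differences with no outlier trimming or interpolation, and hypothetical function names (fd_combine, trend_per_decade, fd_trend_spread).

```python
import numpy as np

def fd_combine(series):
    """Combine station series (stations x years, NaN = missing) by first differences.

    Assumption: station differences are averaged without weights, and a step
    with no valid station difference contributes a change of zero.
    """
    diffs = np.diff(series, axis=1)              # NaN if either year is missing
    counts = np.sum(~np.isnan(diffs), axis=0)    # stations contributing per step
    sums = np.nansum(diffs, axis=0)
    mean_diff = np.where(counts > 0, sums / np.maximum(counts, 1), 0.0)
    # Cumulate the mean differences into a series relative to the first year
    return np.concatenate([[0.0], np.cumsum(mean_diff)])

def trend_per_decade(years, values):
    """Least squares linear trend, in units per decade."""
    x = years - years.mean()
    return 10.0 * np.polyfit(x, values, 1)[0]

def fd_trend_spread(series, years, n_cuts, first_cut_year=1997, n_iter=500, seed=0):
    """Difference between the 95th and 5th percentile trends over random cuts."""
    rng = np.random.default_rng(seed)
    candidates = np.flatnonzero(years >= first_cut_year)
    trends = np.empty(n_iter)
    for i in range(n_iter):
        cut = series.copy()
        for s, k in enumerate(n_cuts):
            # Delete the data at k randomly chosen artificial break dates
            idx = rng.choice(candidates, size=k, replace=False)
            cut[s, idx] = np.nan
        trends[i] = trend_per_decade(years, fd_combine(cut))
    p5, p95 = np.percentile(trends, [5, 95])
    return p95 - p5

# Synthetic stand-in for reanalysis data at 20 station locations
years = np.arange(1979, 2005)
rng = np.random.default_rng(1)
truth = 0.02 * (years - years[0])                # 0.2 K/decade imposed trend
series = truth + rng.normal(0.0, 0.3, size=(20, years.size))
n_cuts = rng.integers(0, 3, size=20)             # 0 to 2 cuts per station
spread = fd_trend_spread(series, years, n_cuts)
print(f"5th to 95th percentile trend spread: {spread:.3f} K/decade")
```

 Because the cuts are confined to the years after 1996, only the last part of the 1979–2004 series is perturbed, which mirrors the point made above that the metric describes the full-period trend rather than the FD period alone.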
 The uncertainties for annual mean trends at individual pressure levels in the hemispheres and tropics fall between 0.01 K/decade for the SH near the surface and 0.25 K/decade for the NH at 30 mbar (Figure 5 and Table S4 in Supplemental Material). They are usually a few hundredths of a K/decade in the troposphere, but over 0.1 K/decade in the stratosphere. The ratio of uncertainty to trend is usually much less than one, but exceeds one in a few cases, generally where the trends themselves are very small. The uncertainties are typically less than the standard error of the trends, except for a few cases near the surface. The largest uncertainties from the FD procedure occur for the NH extratropics in the troposphere, because of the large number of metadata events for which cuts are made in this area, and for the tropics in the stratosphere.
Figure 5. Estimated uncertainty in trends in annual mean temperature for 1979–2004 related to the use of the FD method for the globe (open circles), tropics (solid circles), NH extratropics (open squares) and SH extratropics (solid squares).
 Similar tests for surface-troposphere trend differences indicate uncertainties of 0.05 K/decade for the globe and 0.08 K/decade for the tropics for 1979–2004. Uncertainties for trend differences at adjacent levels in the troposphere are smaller. For example, differences between trends at 700 and 500 mbar had estimated uncertainties of 0.02–0.04 K/decade. The data set therefore appears suitable for analysis of changes in lapse rates within the troposphere.
 On the basis of similar tests with reanalysis data, trends for seasonal layer means show uncertainties generally larger than those for annual pressure level data, reaching 0.18–0.51 K/decade in the stratosphere in boreal fall (Table S5 in Supplemental Material). As with the annual pressure level data, trend uncertainties in other seasons are generally largest for the NH and NH extratropics, where the most metadata cuts are made. In the troposphere, seasonal trend uncertainties are no more than 0.07 K/decade for regions other than the NH extratropics and NH. For trends in individual months, uncertainties estimated by this method (not shown) are larger, up to 0.20 K/decade for global 850–300 mbar layer means. Because of these high uncertainties, we do not include monthly data in the RATPAC-A data set.
 In general, uncertainties are larger for the NH than the SH, presumably because of the greater number of metadata events there. In the SH, lack of metadata limits the number of cuts and therefore the estimated uncertainty from the FD procedure, but the lack of metadata itself creates uncertainty of an unknown size. Because of this lack of metadata, as well as the limited number of stations in some parts of the SH, overall uncertainty could well be greater in the SH than in the NH.
 Another possible source of uncertainty in the FD method is the sensitivity to methodological details such as the endpoint outlier trimming procedure described in section 3.2.3 above. Different trimming choices can change the resulting trends by up to 0.06 K/decade in the troposphere and 0.07 K/decade in the stratosphere (Figure 6), with the largest effect in the NH. In most cases these differences are small in comparison with the trends, so the choice of trim parameter does not appear to be a major source of uncertainty.
Figure 6. Least squares linear trends in annual mean temperatures for 1979–2004 from RATPAC-A using trim factors of 1.0, 1.2, 1.6 and 2.0 times the standard deviation, along with trends for series combined with FD but no trimming (“no trim”) and for series combined without using FD (“no FD”), in K/decade. The “no FD” series contain LKS adjusted data through 1995 and IGRA data afterward. Note that this differs slightly from RATPAC-B which uses LKS adjusted data through 1997.
 Results from the FD procedure will also be sensitive to the timing of cuts made for metadata events. Because the metadata is out of date, incomplete or unclear for the majority of the stations in the LKS network, many changes have undoubtedly occurred that are not accounted for in our procedure. At least half of the discontinuities found by the LKS team and the HadAT team were not supported by available metadata. This suggests that metadata problems could significantly impair the ability of our procedure to remove inhomogeneities. The size of this error has not been quantified, but it could exceed the other uncertainties described in this subsection.
5.2. Limited Spatial Coverage
 Previous work has shown the potential importance of spatial sampling issues in results from other radiosonde data sets [e.g., Trenberth and Olsen, 1991; Santer et al., 1999]. Because of similar questions about the adequacy of spatial coverage in the 85-station RATPAC network, we considered expanding the network to include additional carefully selected stations. Our original plan was to use the first difference method applied to data from stations with good station history metadata, few metadata events, and relatively complete records. We hoped that the first difference method would allow incorporation of new stations without the arduous work that had already been done for the original LKS stations. However, we found that in most of the apparent voids in the network, such as the North Pacific Ocean, no stations with long records and good metadata exist, so opportunities to improve the gross spatial coverage are limited. Thirty-five potential new stations were identified on the basis of data and metadata archives. Most were in the Northern Hemisphere, particularly in North America and China. Although the new stations were in areas already covered relatively well, we hoped the additional stations could still reduce the sampling error of the large-scale mean time series.
 We tested the effect of the proposed expansion on sampling error using the NCEP reanalysis data. For this test, we created “global mean” or “regional mean” data sets by subsampling the NCEP reanalysis data at (1) the LKS network locations and (2) the expanded network locations (LKS plus new), and compared them to (3) the global mean of the NCEP data for all points. Table 4 shows sampling error and changes in sampling error for global mean trends at four pressure levels. We measure the sampling error by the absolute value of the difference between the trend in the full global mean and the trend in the subsampled data set. The “improvement” from the expansion is the LKS sampling error minus the extended set sampling error.
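 The two quantities defined in this paragraph can be written out as a minimal Python sketch. The synthetic field, the unweighted averaging over points (no area weighting or interpolation), the randomly chosen point indices and the function names are all assumptions made for illustration, not the procedure or data used for Table 4.

```python
import numpy as np

def trend_per_decade(years, values):
    """Least squares linear trend, in units per decade."""
    x = years - years.mean()
    return 10.0 * np.polyfit(x, values, 1)[0]

def sampling_error(years, full_mean, network_mean):
    """Absolute difference between the full-field trend and the network trend."""
    return abs(trend_per_decade(years, full_mean)
               - trend_per_decade(years, network_mean))

def improvement(years, full_mean, lks_mean, extended_mean):
    """LKS sampling error minus extended-network sampling error.

    Positive values mean the extended network is closer to the full mean.
    """
    return (sampling_error(years, full_mean, lks_mean)
            - sampling_error(years, full_mean, extended_mean))

# Synthetic stand-in for a reanalysis field: 500 points, 1979-1997
rng = np.random.default_rng(0)
years = np.arange(1979, 1998)
field = 0.015 * (years - years[0]) + rng.normal(0.0, 0.2, size=(500, years.size))

# Hypothetical network locations: 85 points, then 35 additional points
lks_idx = rng.choice(500, size=85, replace=False)
remaining = np.setdiff1d(np.arange(500), lks_idx)
extra_idx = rng.choice(remaining, size=35, replace=False)

full_mean = field.mean(axis=0)
lks_mean = field[lks_idx].mean(axis=0)
extended_mean = field[np.concatenate([lks_idx, extra_idx])].mean(axis=0)

lks_err = sampling_error(years, full_mean, lks_mean)
gain = improvement(years, full_mean, lks_mean, extended_mean)
print(f"LKS error {lks_err:.3f} K/decade, improvement {gain:+.3f} K/decade")
```

 With this sign convention, a negative improvement means the subsampled trend moved further from the full-field trend after the extra points were added.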
Table 4. Spatial Sampling Error in LKS and Extended Networks, as Measured by the Difference Between the Trend in the Complete Global Mean Reanalysis Data Set and the Trend in the Smaller Network, Along With the Change in That Error From Expansion of the Network From 85 LKS Stations to the 120-Station Extended Network (“Improvement”)a
|LKS Errorb||Extended Errorb||Improvementc||LKS Errorb||Extended Errorb||Improvementc|
 For global means, the extended set includes 35 additional stations. The sampling error in the trends ranges from less than 0.01 to 0.06 K/decade for 1979–1997. The changes in sampling error with the addition of extension stations are no more than 0.04 K/decade, and are slightly negative in four of the eight cases (implying a degradation of the result with the expanded network in those cases, although the change is statistically insignificant). Most of these sampling errors and their differences are much smaller than the ∼0.05 K/decade standard error of the trends. These trend comparisons do not show a consistent or significant improvement from adding the extension stations to the LKS data set. Results from the NH and tropics confirmed this conclusion. This is consistent with the findings of Free and Seidel  showing little or no improvement in large-scale sampling error for upper air networks of more than 100 unevenly spaced stations in comparison to the LKS 87-station network.
 For comparison, we used the reanalysis data to estimate the error that could be introduced by the first difference method if we used it to extend the data set by 35 new stations, in a test like those described in section 5.1 above. We used two randomly timed cuts for each of the new series that had metadata events in real life. The results indicated potential errors of up to 0.02 K/decade from the procedure.
 We also examined the root-mean-square differences between the subsampled and full annual time series. Like the trends, these showed no consistent improvement in the error for the extended data set as compared to the LKS network.
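 For reference, the root-mean-square difference between two annual series can be computed as in the sketch below. The function name and the choice to skip years where either series is missing are assumptions, and the input values are made up for illustration.

```python
import numpy as np

def rms_difference(full_series, network_series):
    """Root-mean-square difference between two series of equal length.

    Assumption: years where either series is NaN are ignored.
    """
    diff = (np.asarray(network_series, dtype=float)
            - np.asarray(full_series, dtype=float))
    return float(np.sqrt(np.nanmean(diff ** 2)))

# Made-up annual anomalies (K) for a full mean and a subsampled mean
full = [0.00, 0.10, 0.20, 0.30]
sub = [0.10, 0.10, 0.10, 0.30]
print(f"RMS difference: {rms_difference(full, sub):.3f} K")
```

 Unlike the trend-based sampling error, this measure is sensitive to year-to-year departures even when the two series have the same linear trend.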
 These estimates suggest that the error in global and hemispheric temperature would likely be no smaller in the expanded network than in the LKS network. This finding is consistent with results from Trenberth and Olsen [1991] suggesting that the Angell and Korshover  network of only 63 stations is reasonably adequate to define global and hemispheric upper air temperature variations. On the basis of our subsampling results, the potential for additional error from first differencing, and the fact that the candidate extension stations are in regions that are already reasonably well sampled by the LKS network, we concluded that the RATPAC data set would probably be more accurate without the additional stations.
5.4. Residual Inhomogeneities in LKS Data
 As recognized by Lanzante et al. [2003a, 2003b], the LKS adjustment process undoubtedly missed some significant inhomogeneities. LKS showed that the adjustments generally reduced the differences between the radiosonde trends and trends from the Christy et al.  satellite data for individual stations, but did not eliminate them. Since this work, two alternate satellite temperature data sets [Mears et al., 2003; Vinnikov and Grody, 2003] have been published that suggest even larger differences between sonde and satellite trends. Recent work by other authors comparing sonde data to satellite data [Randel and Wu, 2005] and examining day-night differences in sonde data [Sherwood et al., 2005] suggests that the trend effects of remaining inhomogeneities in LKS and other adjusted radiosonde data sets could be as large as the effects of the LKS adjustments. Thus shortcomings in the adjusted LKS data set are a significant possible source of additional uncertainty in the RATPAC products. However, RATPAC trends are generally similar [Santer et al., 2005] to those in the HadAT radiosonde data set, which was constructed with a different adjustment approach. We believe that the current RATPAC product, despite its limitations, is as reliable as any available at this time.