A recent study of 1979–2010 tropical tropospheric temperature trends in climate model simulations and satellite microwave sounding unit (MSU) observations concluded that, although both showed greater warming in the upper than lower troposphere, the vertical amplification of warming was exaggerated in most models. We repeat that analysis of temperature trends, vertical difference trends, and trend ratios using five radiosonde datasets. Some, but not all, comparisons support the notion that vertical amplification in models exceeds that observed. However, larger ranges of radiosonde trends compared with those for MSU, and the sensitivity of results to the upper-tropospheric level analyzed, make it difficult to conclude unambiguously that models are inconsistent with radiosonde observations. The larger ranges are due to the availability of more radiosonde datasets with different approaches for adjusting measurement biases. Together these two studies highlight challenges of using imperfect observations of tropical tropospheric temperature over a few decades to assess climate model performance.
 Examining 1979–2010 tropical tropospheric temperature trends in 19 climate models and in satellite microwave sounding unit (MSU) observations, Fu et al. [2011, hereinafter FMJ] found that observations and models alike indicate warming in both the lower-middle and upper-middle troposphere, with greater warming aloft. However, vertical amplification was exaggerated in most of the models compared with MSU observations. Because the vertical structure of temperature trends is related to tropospheric lapse rate and humidity trends, their finding suggests models may not accurately simulate associated climate feedback processes. Here we use radiosonde data to test the reproducibility of the main conclusion of FMJ: that models show larger vertical amplification of tropical tropospheric warming trends than do observations.
 Radiosonde data offer several distinct advantages for this investigation. First, better vertical resolution allows clearer differentiation of upper and lower tropospheric trends. Radiosonde reports have 11 mandatory pressure levels between 1000 and 100 hPa, while MSU has two channels that sense emissions from deep layers, which FMJ manipulated to obtain temperatures representative of two deep tropospheric layers [Fu et al., 2004; Fu and Johanson, 2005]. Second, there are more long-term radiosonde datasets than MSU datasets, offering a better opportunity to assess uncertainty associated with removing time-varying biases.
 Many studies and several comprehensive reports have examined modeled and observed temperature trend differences between the surface and the troposphere, with recent work suggesting that the largest uncertainties are associated with tropical tropospheric trends [see Climate Change Science Program, 2006; Thorne et al., 2011, and references therein]. Before FMJ, only a few studies had focused on differences between the upper and lower tropical troposphere (rather than the surface). Gaffen et al. examined 700 and 500 hPa temperature and lapse rate trends and found cooling at both levels in unadjusted radiosonde data during 1979–1997, a shorter time span than FMJ. Bengtsson and Hodges found greater warming in the lower than middle troposphere in MSU data over tropical ocean regions during 1979–2008, but their “middle” tropospheric layer is actually higher than FMJ's “upper-middle” layer, because FMJ removed the stratospheric influences (including cooling that counteracts tropospheric warming).
2. Radiosonde Data, Model Simulations and Methods
 This study is based on modeled and observed tropical (20°N to 20°S) monthly tropospheric temperature anomalies for 1979–2010. We focus on 300 and 700 hPa, near the centers of the MSU layers FMJ and Po-Chedley and Fu refer to as the upper-middle troposphere (T24) and lower-middle troposphere (T2LT) (Figure 1 in FMJ), and 200 hPa, where FMJ report maximum warming in models. The T24 weighting results in temperatures that are more purely representative of the upper troposphere than of the deeper layer sensed by MSU Channel 2 [Fu et al., 2004; Fu and Johanson, 2005].
 Using tropical-average upper and lower tropospheric temperature anomaly time series, and upper-minus-lower-tropospheric difference time series, we computed trends as in FMJ, using least squares linear regression, with 95% confidence intervals that account for time series autocorrelation. Trends from different datasets and models are compared via two-sided t tests [Lanzante, 2005; Santer et al., 2008; FMJ]. The ratio of the two trends (not the trend in time series of anomaly ratios) was also computed for comparison with FMJ.
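 The trend calculation described above can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes monthly anomalies, an adjustment of the effective sample size based on the lag-1 autocorrelation of the regression residuals (one common approach; the text does not spell out the exact formula), and a 95% interval approximated as ±2 standard errors. The function names and the clipping of negative autocorrelation to zero are hypothetical choices made for the sketch.

```python
import math

def lag1_autocorr(x):
    """Lag-1 autocorrelation coefficient of a sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    denom = sum(d * d for d in dev)
    if denom == 0.0:
        return 0.0
    return sum(dev[i] * dev[i + 1] for i in range(n - 1)) / denom

def trend_with_ci(y, steps_per_decade=120):
    """Return (trend per decade, approximate 95% half-width).

    Assumption: the standard error uses an effective sample size
    n_eff = n * (1 - r1) / (1 + r1), with r1 the lag-1 autocorrelation
    of the residuals; negative r1 is clipped to zero (conservative).
    """
    n = len(y)
    t = [i / steps_per_decade for i in range(n)]  # time in decades
    t_mean = sum(t) / n
    y_mean = sum(y) / n
    sxx = sum((ti - t_mean) ** 2 for ti in t)
    slope = sum((ti - t_mean) * (yi - y_mean) for ti, yi in zip(t, y)) / sxx
    intercept = y_mean - slope * t_mean
    resid = [yi - (intercept + slope * ti) for ti, yi in zip(t, y)]
    r1 = max(lag1_autocorr(resid), 0.0)
    n_eff = n * (1.0 - r1) / (1.0 + r1)
    if n_eff <= 2.0:
        raise ValueError("too few effective samples for a trend uncertainty")
    resid_var = sum(e * e for e in resid) / (n_eff - 2.0)
    std_err = math.sqrt(resid_var / sxx)
    return slope, 2.0 * std_err
```

 Under these assumptions, the same function would be applied to the upper-level series, the lower-level series, and their difference series; the trend ratio is then the quotient of the first two slopes rather than a trend fitted to a series of ratios.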
 We use five radiosonde datasets created using four different methods: RATPAC [Free et al., 2005], HadAT [Thorne et al., 2005], RAOBCORE [Haimberger, 2007], and RICH [Haimberger et al., 2008]. (See references for dataset descriptions and acronym explanations.) All are adjusted datasets from which time-varying biases have been removed, but each involves a unique adjustment method and a particular subset of radiosonde stations (Figure 1). Note that RAOBCORE and RICH use information from a reanalysis to locate data discontinuities, and RAOBCORE incorporates reanalysis information to make adjustments. We use two versions of RATPAC: RATPAC-B station data with no adjustments after 1997 and RATPAC-A zonal-mean products adjusted for the full record. We use the most recent versions (1.5) of RAOBCORE and RICH [Haimberger et al., 2012] and note that previous related studies employed earlier versions (e.g., Santer et al. used versions 1.2, 1.3, and 1.4), and that there are large trend differences between versions 1.4 and 1.5 (not shown here). For RAOBCORE, RICH and HadAT, tropical averages were calculated from area-weighted means from four 10° latitude bands derived from gridded data. Tropical averages from RATPAC-B were calculated by combining 0000 and 1200 UTC data (when available) for each station, averaging time series for all stations in each 10° band, and computing the weighted average of the bands. RATPAC-A averages station data in three longitudinal regions that are then combined to obtain a tropical mean.
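 As an illustration of the area-weighted averaging of four 10° latitude bands, the following sketch weights each band by the difference in the sine of its bounding latitudes, which is proportional to its surface area on a sphere. The weighting formula, the handling of missing bands, and the function names are assumptions for this example; the dataset documentation should be consulted for the exact procedure.

```python
import math

def band_area_weight(lat_south, lat_north):
    """Relative area of a latitude band on a sphere (latitudes in degrees)."""
    return math.sin(math.radians(lat_north)) - math.sin(math.radians(lat_south))

def tropical_mean(band_values, band_edges=(-20, -10, 0, 10, 20)):
    """Area-weighted mean of band values ordered south to north.

    A band value of None (no data) is skipped and the remaining
    weights are renormalized; this choice is an assumption.
    """
    weights = [band_area_weight(s, n)
               for s, n in zip(band_edges[:-1], band_edges[1:])]
    pairs = [(w, v) for w, v in zip(weights, band_values) if v is not None]
    total = sum(w for w, _ in pairs)
    return sum(w * v for w, v in pairs) / total
```

 Because the bands lie within 20° of the equator, the four weights differ by only a few percent, so the result is close to a simple average of the bands.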
 Following FMJ, simulations are from Phase 3 of the Coupled Model Intercomparison Project [Meehl et al., 2007] from 36 runs of 19 climate models for 1979–2010, derived by merging 20th Century simulations with 21st Century projections. Multi-model means are computed by averaging ensembles to form means for each model, then averaging (and calculating standard deviations from) the 19 model means.
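 The two-stage averaging can be sketched as below: runs are first averaged within each model, so that models with many ensemble members do not dominate, and the multi-model mean and standard deviation are then taken across the model means. The function name and the use of the sample (n − 1) standard deviation are assumptions.

```python
import statistics

def multi_model_stats(runs_by_model):
    """Mean and standard deviation across per-model ensemble means.

    runs_by_model maps a model name to a list of trend values,
    one per ensemble run of that model.
    """
    model_means = [statistics.fmean(runs) for runs in runs_by_model.values()]
    return statistics.fmean(model_means), statistics.stdev(model_means)
```

 With hypothetical inputs such as `{"model_a": [0.1, 0.3], "model_b": [0.4], "model_c": [0.2, 0.2, 0.2]}`, each model contributes one value to the final mean regardless of how many runs it supplied.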
3.1. Tests of Tropical Sampling Differences
 Compared with MSU observations, radiosonde data have significant gaps due to irregular station spacing and irregular observing programs. We explore the resulting uncertainty in tropical-average temperature trends by (1) subsampling MSU data at radiosonde locations, (2) imposing temporal sampling requirements on station data, and (3) comparing three methods of combining station data to form tropical averages. In each approach, we test the consistency of tropical-average upper-minus-lower-tropospheric temperature trends from differently sampled data. In brief, all three sets of tests showed no statistically significant (p = 0.05) trend differences. The second and third sets of tests were performed on four of the five radiosonde datasets, but not on RATPAC-A, for which individual station data are not provided.
 Subsampling MSU T24 and T2LT data [Mears and Wentz, 2009] at the radiosonde station locations of the sparsest dataset, RATPAC, yielded positive trends in T24, T2LT and T24-minus-T2LT from both the complete and subsampled MSU data. Moreover, the trend differences between the complete and subsampled results were not statistically significant, indicating that the irregular spatial distribution of the radiosonde network does not bias tropical-average trend estimates.
 Some radiosonde station records have large gaps during 1979–2010. To test their impact, we compared trends based on all of the stations in each dataset to trends based on subsamples of stations having at least 10 daily observations for a given month, pressure level, and observation time (0000 or 1200 UTC) for at least two thirds of the months during 1979–2010. This reduced the number of stations from 192 to 37 in RAOBCORE and RICH, from 192 to 31 in HadAT, and from 21 to 13 in RATPAC. For all four datasets, trends in tropical-average 300-minus-700 hPa temperature differences, and in 200-minus-700 hPa differences, from the full and reduced station networks showed no statistically significant differences.
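 The temporal sampling requirement can be expressed as a simple filter. The sketch below assumes that, for one station, pressure level, and observation time, the number of daily observations in each month of 1979–2010 is available as a list; the function name and data layout are hypothetical.

```python
def passes_sampling_test(daily_counts, min_obs=10):
    """True if at least two thirds of the months have >= min_obs daily reports.

    daily_counts holds one entry per month (384 entries for 1979-2010).
    Integer arithmetic avoids floating-point issues at the 2/3 boundary.
    """
    good_months = sum(1 for count in daily_counts if count >= min_obs)
    return 3 * good_months >= 2 * len(daily_counts)
```

 For a 384-month record this requires at least 256 months with 10 or more daily observations.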
 To examine the sensitivity of trends to the method of computing tropical averages, recognizing that some stations take one and others take two observations daily, we combined station data three ways: (a) equally weighting all time series within each of three 120° longitude regions, allowing two time series at stations with both 0000 and 1200 UTC data, then computing the average of the regions, (b) creating 0000 and 1200 UTC averages for each region, averaging these, then averaging the three regions, and (c) equally weighting time series from all tropical stations to create 0000 and 1200 UTC averages, then averaging these. These methods were compared with each other and with the standard RATPAC-B method described above. As with the tests of temporal sampling effects, there were no statistically significant upper-minus-lower-tropospheric temperature trend differences associated with averaging method.
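 The three combination methods differ only in the order of averaging, which the following sketch makes explicit for a single month. Each input row is a (region, observation time, anomaly) tuple, with a station that reports at both 0000 and 1200 UTC contributing two rows. The data layout, function names, and the treatment of regions that have only one observation time are assumptions.

```python
from statistics import fmean

def _mean_by_time(rows):
    """Average rows separately for each observation time, then average those means."""
    times = sorted({utc for _, utc, _ in rows})
    return fmean([fmean([v for _, u, v in rows if u == utc]) for utc in times])

def method_a(rows):
    """(a) Equal weight to every time series within a region, then average regions."""
    regions = sorted({r for r, _, _ in rows})
    return fmean([fmean([v for r, _, v in rows if r == reg]) for reg in regions])

def method_b(rows):
    """(b) 0000 and 1200 UTC means per region, averaged, then average regions."""
    regions = sorted({r for r, _, _ in rows})
    return fmean([_mean_by_time([row for row in rows if row[0] == reg])
                  for reg in regions])

def method_c(rows):
    """(c) 0000 and 1200 UTC means over all tropical stations, then averaged."""
    return _mean_by_time(rows)
```

 With unevenly distributed stations the three functions generally return different values for a given month, which is why the sensitivity of the resulting trends was worth testing.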
 In summary, for each radiosonde dataset and for MSU data, our tests demonstrated that neither spatial nor temporal sampling choices result in significant differences in tropical-average tropospheric temperature difference trends. For the rest of this paper, we report results from each radiosonde dataset based on its full station network and the RATPAC-B averaging method.
3.2. Upper and Lower Tropospheric Temperature Trends and Difference Trends
FMJ reported 1979–2010 T24 and T2LT warming trends from three and two MSU datasets, respectively, of ∼0.1 to 0.2 K/decade. The climate models also showed T24 and T2LT warming but larger than in MSU observations (Table 1 in FMJ). The radiosonde datasets yield a wider range of 1979–2010 tropical tropospheric temperature trends than MSU, and not all are warming trends. Trend ranges are: −0.01 (HadAT) to 0.15 K/decade (RICH) at 200 hPa; 0.11 (HadAT) to 0.20 K/decade (RICH) at 300 hPa; and 0.06 (RAOBCORE) to 0.14 K/decade (RICH) at 700 hPa. We find these trend ranges in the models: 0.10 to 0.69 K/decade at 200 hPa; 0.10 to 0.67 K/decade at 300 hPa; and 0.06 to 0.40 K/decade at 700 hPa. Thus some model simulations show greater lower tropospheric (700 hPa) warming than any of the radiosonde datasets, and upper-tropospheric trends are more consistent between 300 and 200 hPa in the simulations than in the radiosonde observations.
Figure 2 shows the evolution of upper-minus-lower-tropospheric temperature anomaly differences for the tropics, from radiosondes and models (mean ±2 standard deviations). The small spread indicates good agreement among the radiosonde time series, which show strong interannual variability associated with ENSO (FMJ), an expected consequence of the vertical amplification of tropical temperature anomalies (the stronger ENSO signal in the upper troposphere remains in the upper-minus-lower-tropospheric difference [Santer et al., 2005]). The model results show larger spread because of inconsistent timing of ENSO events in these coupled atmosphere-ocean models. Po-Chedley and Fu have recently replicated the main findings of FMJ using atmospheric model simulations with observed sea surface temperature boundary conditions, suggesting that the timing of ENSO events is not the main source of discrepancy between model and MSU trends. Radiosonde temperature anomaly differences have a range of ∼2 K at any point in time, about five times the range of the MSU differences (compare our Figure 2 with Figure 3 in FMJ), because the amplification is more pronounced between the vertically-separated radiosonde pressure levels than between the broad MSU layers.
Figure 3 (left) presents trends in the vertical differences, for both 300-minus-700 and 200-minus-700 hPa, for all 36 model simulations (and the multi-model mean) and all 5 radiosonde datasets. All model trends are positive, the majority of trends have 95% confidence intervals that do not span zero, and model trends are very similar for the two pairs of pressure levels. Model difference trend values range from 0.04 to 0.28 K/decade for 300-minus-700 hPa, and from 0.00 to 0.29 K/decade for 200-minus-700 hPa (vertical lines in Figure 3), substantially greater than the model difference trend range for T24 minus T2LT, ∼0.015 to 0.090 K/decade (FMJ). The multi-model mean difference trend is 0.161 ± 0.060 K/decade for 300-minus-700 hPa, and 0.150 ± 0.050 K/decade for 200-minus-700 hPa, both exceeding the T24-minus-T2LT trend of 0.051 ± 0.007 K/decade (FMJ).
 Ranges of radiosonde difference trends are smaller than the model ranges and shifted toward lower values (0.026 to 0.122 K/decade for 300-minus-700 hPa, and −0.094 to 0.018 K/decade for 200-minus-700 hPa), with systematically more positive difference trends for 300 than for 200 hPa. This might be due to remaining time-varying biases in the radiosonde data (which we expect contribute to spurious cooling that is larger at 200 than 300 hPa); alternatively, model trends at 200 hPa might be biased by tendencies to overestimate the heights to which radiative-convective equilibrium dominates radiative equilibrium and to underestimate stratospheric influences on the upper troposphere.
Table 1 compares each radiosonde difference trend estimate with the other four observational results and with the 36 model simulations. Trend estimates and their confidence intervals (K/decade) from each radiosonde dataset (columns 2 and 5) are compared with estimates from 4 other radiosonde datasets (columns 3 and 6) and with estimates from 36 model simulations (columns 4 and 7). Entries in columns 3, 4, 6 and 7 are the percentage of statistically significant trend differences, evaluated with two-sample t tests based on ±2 standard errors and accounting for time series autocorrelation.
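 A simplified version of such a pairwise comparison is sketched below. It assumes each trend is reported with a ±2 standard error half-width (already adjusted for autocorrelation), treats the two estimates as independent, and flags a difference as significant when it exceeds 2 combined standard errors. These simplifications and the function names are assumptions, so the sketch should not be expected to reproduce the percentages in Table 1.

```python
import math

def trends_differ(trend_a, half_a, trend_b, half_b):
    """True if two trends differ by more than 2 combined standard errors.

    half_a and half_b are +/- 2 standard error half-widths.
    """
    se_a, se_b = half_a / 2.0, half_b / 2.0
    combined = math.sqrt(se_a ** 2 + se_b ** 2)
    if combined == 0.0:
        return trend_a != trend_b
    return abs(trend_a - trend_b) / combined > 2.0

def percent_significant(target, others):
    """Percentage of (trend, half_width) pairs in others that differ from target."""
    hits = sum(trends_differ(target[0], target[1], t, h) for t, h in others)
    return 100.0 * hits / len(others)
```

 For example, with hypothetical values, a trend of 0.05 ± 0.05 compared against 0.25 ± 0.08 is flagged as different, while the same trend compared against 0.08 ± 0.06 is not.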
 The observational trend estimates tend to show few statistically significant trend differences with the other observational estimates. Not shown in Table 1 is the degree of agreement among the model runs. Briefly, comparisons of pairs of model trends show no statistically significant differences for most of the runs. But comparisons involving 9 of the 36 runs do show significant 200-minus-700 hPa temperature difference trend differences from the majority of other runs. For 300-minus-700 hPa temperature difference trends, 8 of 36 runs show significant differences from the rest.
Table 1. Comparison of Observational and Model Estimates of 1979–2010 Trends in Upper (300 or 200 hPa) Minus Lower (700 hPa) Tropical Tropospheric Temperature
Trend (K/decade)
300 Minus 700 hPa      200 Minus 700 hPa
0.067 ± 0.054          0.018 ± 0.064
0.080 ± 0.100          −0.080 ± 0.120
0.060 ± 0.045          −0.064 ± 0.060
0.122 ± 0.058          −0.018 ± 0.056
0.026 ± 0.045          −0.094 ± 0.061
 More relevant to this study are the comparisons of observed and modeled difference trends in Table 1, a large fraction of which show statistically significant differences. For 200-minus-700 hPa, all five radiosonde datasets have significantly smaller trends than the majority (25 to 33) of the 36 model simulations. The disagreement between observations and models is less striking for 300-minus-700 hPa; only 42% of the 180 comparisons (5 radiosonde estimates times 36 model estimates) show significant differences, compared with 84% of the 200-minus-700 hPa comparisons. Overall, there is more disparity between model simulations and observations than within either the group of simulations or the group of observations, and more disparity using 200 hPa than 300 hPa as representative of the upper troposphere.
3.3. Temperature Trend Ratios
FMJ found the ratio of T24 to T2LT trends to be larger in the model simulations (1.20) than in two versions of MSU observations (1.10 and 1.06). Figure 3 presents analogous 300-to-700 hPa, and 200-to-700 hPa trend ratios. The range of model values is smaller for 300-to-700 hPa than for 200-to-700 hPa, but the multi-model means are the same, 1.64, and larger than for the MSU layers, again highlighting the greater amplification obtained using distinct pressure levels. Radiosonde data yield much larger ranges of trend ratios. The range of 300-to-700 hPa ratios easily encompasses all the model results, but for 200-to-700 hPa ratios, four of the five observational results fall outside the model range.
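 As a rough illustration of how a trend ratio is formed, the sketch below divides an upper-level trend by a lower-level trend and declines to return a value when the lower-level trend is too close to zero for the quotient to be meaningful. The threshold `min_abs_lower` and the function name are hypothetical.

```python
def trend_ratio(upper_trend, lower_trend, min_abs_lower=0.01):
    """Ratio of upper- to lower-tropospheric trend (both in K/decade).

    Returns None when the lower-level trend is smaller in magnitude than
    min_abs_lower, because the ratio then becomes unstable.
    """
    if abs(lower_trend) < min_abs_lower:
        return None
    return upper_trend / lower_trend
```

 Using the rounded RICH trends quoted in Section 3.2 (0.20, 0.15, and 0.14 K/decade at 300, 200, and 700 hPa), this gives approximate ratios of 1.4 for 300-to-700 hPa and 1.1 for 200-to-700 hPa. These are only indicative, since they are computed from rounded values rather than from the underlying series.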
4. Discussion and Summary
 Our tests using radiosonde rather than MSU observations lend some support to FMJ's finding that climate models exaggerate the vertical amplification of tropical tropospheric warming during 1979–2010. However, our results are not as clear, due to two main factors: the wide range of trends in datasets using different methods of adjusting time-varying observational biases, and the dependence of the findings on the discrete pressure level (i.e., 200 vs. 300 hPa) chosen to represent the tropical upper troposphere. Had our scope been more similar to FMJ's (i.e., had we examined only two radiosonde datasets and considered only 200 hPa) the range of results might have been smaller and more supportive of their conclusion. Taken together, the two studies highlight the challenges inherent in using imperfect observations of tropical tropospheric temperature over a few decades to assess climate model performance, and the value of independent satellite and in situ observations.
 J.S.W. was supported by the NOAA Air Resources Laboratory through a National Research Council Research Associateship. We are grateful to Qiang Fu, Syukuro Manabe, and Celeste Johanson for generously providing processed model results and for helpful discussions. We thank Yehui Zhang for computer code, and Leopold Haimberger for RICH data. Qiang Fu, Jim Angell and two anonymous reviewers provided constructive review comments.
 The Editor thanks two anonymous reviewers for their assistance in evaluating this paper.