Corresponding author: D. M. Mitchell, Department of Atmospheric, Oceanic and Planetary Physics, University of Oxford, Oxford, UK. (firstname.lastname@example.org)
 Controversy remains over a discrepancy between modeled and observed tropical upper tropospheric temperature trends. This discrepancy is reassessed using simulations from the Coupled Model Inter-comparison Project phase 5 (CMIP 5) together with radiosonde and surface observations that provide multiple realizations of possible “observed” temperatures given various methods of homogenizing the data. Over the 1979–2008 period, simulated tropical temperature trends are not consistent with observations throughout the depth of the troposphere, and this primarily stems from a poor simulation of the surface temperature trends. This discrepancy is substantially reduced when (1) atmosphere-only simulations are examined or (2) the trends are considered as an amplification of the surface temperature trend with height. Using these approaches, it is shown that within observational uncertainty, the 5–95 percentile range of temperature trends from both coupled-ocean and atmosphere-only models is consistent with the analyzed observations at all but the uppermost tropospheric level (150 hPa), and models with ultra-high horizontal resolution (≤ 0.5° × 0.5°) perform particularly well. Other than model resolution, it is hypothesized that this remaining discrepancy could be due to a poor representation of stratospheric ozone or remaining observational uncertainty.
 Discrepancies between observational and model-predicted temperature trends in the deep tropics have sparked much research over the recent past [e.g., Thorne et al., 2011a]. This is of particular concern as our predictions of future climate change rely heavily on how well climate models simulate historical climate change. In particular, given that climate models show roughly constant tropospheric relative humidity and that the atmosphere is not radiatively saturated in the upper tropical troposphere region, this has substantive implications for transient and equilibrium climate sensitivity [Randall et al., 2007].
 In the tropics, air temperatures are dominated by convective processes, which are parameterized in global climate models. On seasonal-to-annual time scales, both the modeled and observed lapse rate variations follow those of a moist adiabat in the lower- and mid-troposphere [Santer et al., 2005]. However, this does not remain the case for observations when considered on multi-decadal time scales. In particular, Fu et al.  and Seidel et al.  suggest that models consistently exaggerate the (positive) trend in tropical tropospheric static stability from the mid- to the upper-troposphere compared to satellite and radiosonde observations. Recently, Lott et al.  examined a sub-selection of CMIP 5 models (based on the then available CMIP 5 archive data). They showed that there was a large inter-model spread in vertical profiles of tropical and mid-latitude temperature trends over the last 50 years. They also demonstrated a detectable anthropogenic signal in the vertical profiles of mid- and low-latitude trends, confirming previous studies [e.g., Tett et al., 1996; Stott et al., 2001; Thorne et al., 2002, 2003; Jones et al., 2003; Hegerl and Zwiers, 2011], while showing that many of the most up-to-date climate models continue to overestimate tropical tropospheric trends compared to observations.
 The numerous studies highlighting this issue over the recent past led Karl et al.  to suggest that the characterization of observational uncertainty may be inadequate. Thorne et al. [2011b] showed that uncertainties in the observed tropospheric temperature trends were of the same order of magnitude as the trends themselves and questioned whether a discrepancy between modeled and observed behavior really exists.
 In this study, we reexamine the reported discrepancy by considering the uncertainty in the most up-to-date models and observations available at the time of writing. Specifically, the structure and reasoning of our analysis is as follows:
 The observed temperature record is one single realization of many possible realizations that could have emerged given internal climate variability. Indeed, the observed tropical surface temperature record (over the satellite era) is somewhat unusual in its ENSO sequence, such that there is a preponderance of La Niñas in the second half of the record [Rahmstorf et al., 2012].
 Other than poorly characterized observational uncertainty, the discrepancy between observed and modeled tropical surface- and tropospheric-temperature trends can arise due to either errors in model physics, or errors in constraining the model realistically through time-varying forcings and imposed boundary conditions.
 Two complementary ways in which we address this are as follows: (1) by comparing atmosphere-only models (i.e., with imposed boundary conditions) with coupled-ocean models and (2) by considering the ratio between surface- and tropospheric-temperature trends for each model. We show that when models are constrained in this more realistic manner, and observational uncertainty is well characterized, there is substantial improvement in the agreement between models and observations.
2.1 Observational Data Sets
 For surface temperature, we use the HadCRUT4 data set [Morice et al., 2012, available from http://www.metoffice.gov.uk/hadobs/], which is a combined sea surface temperature (SST) and land-surface air temperature data set. HadCRUT4 consists of 100 possible realizations of 5°×5° gridded surface temperature from 1850–present day. The different realizations are generated by sampling the observational uncertainty in the data set [Morice et al., 2012], for example, to account for changing SST measurement techniques, or for homogenization of land-station records.
 For air temperature, we use the Radiosonde Innovation Composite Homogenization (RICH) data set (available from http://www.univie.ac.at/theoret-met/research/raobcore/), which spans 1958–present day on a 10°×10° grid with eight vertical levels in the troposphere (up to 150 hPa). RICH uses observations minus background forecast statistics to identify non-climatic break points in the data [Haimberger et al., 2012]. Uncertainty is parameterized through varying a number of choices in the neighbor-based adjustment methodology, which results in a 32 member ensemble. It should also be emphasized that RICH warms more than most other older radiosonde data sets, and a comparison can be found in Haimberger et al. .
2.2 Model Data Sets
 We make use of simulations from the Coupled Model Inter-comparison Project phase 5 (CMIP 5), which encompasses the most up-to-date coupled ocean-atmosphere models. All model simulations are run with the most important climate forcings which include changes in greenhouse gases, tropospheric aerosols, solar irradiance, stratospheric ozone, and volcanic aerosols. We choose to use one ensemble member of each model so that we are comparing like-for-like (and we always choose the first available ensemble member so as not to be biased). If we instead used ensemble averages, a meaningful comparison between each of the models would be non-trivial due to the decrease of internal variability with the increase in ensemble size. In total, 22 CMIP models have the required data available for this analysis, and details of each model are given in Table S1 of the supporting information.
 We also make use of the atmosphere-only simulations performed as part of the CMIP 5 initiative (known as AMIP). These are identical to the CMIP simulations, but have imposed SSTs taken from observations derived by merging data from the HadISST1 data set and the National Oceanic and Atmospheric Administration (NOAA) weekly optimum interpolation (OI) SST analysis [Hurrell et al., 2008]. Because the AMIP simulations were not designated a primary experiment in the CMIP 5 project, only 18 models are available (Table S1).
 Our analysis is carried out on surface and air temperatures at tropical latitudes, defined as the area-weighted average between 20°N and 20°S. As most of the AMIP models are only run from 1979–2008, we focus our analysis on this period. However, we also extend the analysis to 1959–2008 using just the CMIP data. All model simulations use standard historical forcings up to the end of 2005. For the period 2006–2008, we use the historical extension simulations (where available), or the Representative Concentration Pathway 8.5 (RCP8.5) simulations.
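 The area-weighted tropical average described above can be illustrated with a minimal sketch. It assumes zonal-mean values on a regular latitude grid and uses cos(latitude) weights, a common approximation to grid-box area; the function name `tropical_mean` and the input layout are hypothetical and are not taken from the study.

```python
import math


def tropical_mean(values_by_lat, lat_min=-20.0, lat_max=20.0):
    """Area-weighted mean of values between lat_min and lat_max (degrees).

    values_by_lat: iterable of (latitude_in_degrees, value) pairs, where
    value may be None for missing data. Weights are cos(latitude), which is
    proportional to grid-box area on a regular latitude-longitude grid
    (an assumption of this sketch).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for lat, value in values_by_lat:
        # Skip missing data and latitudes outside the tropical band.
        if value is None or not (lat_min <= lat <= lat_max):
            continue
        weight = math.cos(math.radians(lat))
        weighted_sum += weight * value
        weight_total += weight
    if weight_total == 0.0:
        raise ValueError("no valid data in the requested latitude band")
    return weighted_sum / weight_total
```

 On a 10°×10° grid, for example, the boxes centered at 15°S, 5°S, 5°N and 15°N would contribute, with the two equatorward boxes weighted slightly more heavily.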
 As observational data are spatially incomplete, before performing the analysis, we mask all model tropospheric temperature data onto the RICH grid, and all model surface temperature data onto the HadCRUT4 grid using the same technique as in Mitchell et al. , thereby ensuring that both models and observations are sampling the same spatial and temporal regions.
 Figure 1 shows histograms of the observed tropical decadal temperature trends at different altitudes over the period (orange bars) 1959 to 2008, and (red bars) 1979 to 2008. The trends are calculated using a least squares linear fit. It is clear that over both periods, the uncertainty in surface temperature trends is of a similar order of magnitude as the mean trend, and this remains true throughout the depth of the atmosphere. For instance, at the surface the mean temperature trend over the 1959–2008 period is ∼0.10 K/dec with a range between 0.08 and 0.12 K/dec. Likewise, for the later period of 1979–2008, the mean surface trend is ∼0.12 K/dec with a range between 0.09 and 0.14 K/dec. The shift in the histograms over the two periods also emphasizes the greater rate of increase in surface temperature in the more recent decades.
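 As an illustration of the trend calculation, the following sketch computes an ordinary least squares slope and expresses it in K per decade. It assumes annual-mean temperatures indexed by calendar year; the function name `decadal_trend` is hypothetical, and the time resolution actually used for the fits is not stated here.

```python
def decadal_trend(years, temps):
    """Ordinary least squares slope of temps against years, in K per decade.

    years: sequence of time coordinates in years (e.g. 1979, 1980, ...).
    temps: sequence of temperatures (or anomalies) in K, same length.
    """
    n = len(years)
    if n != len(temps) or n < 2:
        raise ValueError("need two or more matching (year, temperature) pairs")
    mean_x = sum(years) / n
    mean_y = sum(temps) / n
    # Slope = covariance(x, y) / variance(x) for a straight-line fit.
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, temps))
    if sxx == 0.0:
        raise ValueError("all time coordinates are identical")
    return 10.0 * sxy / sxx  # K per year -> K per decade
```

 Applying such a function to each observational realization in turn would give the set of trends that populate one histogram.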
 The greatest uncertainty (i.e., histogram spread) is at the higher altitudes, reflecting the decreased reliability in radiosonde measurements with height [Karl et al., 2006; Thorne et al., 2011b].
 In the tropics, the transfer of surface heat via convection is of particular importance in determining vertical temperature profiles throughout the troposphere. We therefore combine the temperature trends of the 100 HadCRUT4 realizations with the 32 RICH realizations to create 3200 separate profiles of vertical temperature trends, which extend from the surface to the tropopause (150 hPa). Note that we make the assumption that all HadCRUT4 trend realizations can be unconditionally matched with all RICH trend realizations (i.e., a surface temperature trend at the lower end of the uncertainty distribution can be paired with an air-temperature trend at the upper end of the uncertainty distribution, and vice versa). This assumption is plausible as the two ensembles were derived independently and there is no a priori more defensible way to match the sets of data products produced. However, since in reality there are physical constraints as to how the surface temperatures and lower tropospheric temperatures co-vary, combining the two data sets as we have here may lead to an overestimate of the observational uncertainty.
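 The unconditional pairing described above amounts to a Cartesian product of the two ensembles. A minimal sketch, assuming each surface realization has been reduced to a single trend and each radiosonde realization to a list of trends at the eight tropospheric levels (the function name `combine_profiles` is hypothetical):

```python
from itertools import product


def combine_profiles(surface_trends, upper_air_profiles):
    """Pair every surface trend with every upper-air trend profile.

    surface_trends: list of surface trends, one per surface realization.
    upper_air_profiles: list of per-level trend lists, one per radiosonde
    realization.
    Returns one combined profile (surface value first) for each pairing.
    """
    return [
        [surface] + list(profile)
        for surface, profile in product(surface_trends, upper_air_profiles)
    ]
```

 With 100 surface realizations and 32 radiosonde realizations this yields 100 × 32 = 3200 profiles, each holding the surface trend followed by the eight upper-air trends.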
 Figure 2a shows the resulting range of possible vertical temperature trend profiles in the observations (grey region) for the period of 1959–2008, and from a theoretical air-parcel following a moist adiabat from the mean observed surface temperature trend (black dashed line; note the use of a moist adiabat may be misleading in the upper troposphere [O'Gorman and Singh, 2013]). The red lines show the corresponding trends from all CMIP 5 models available at the time of writing (for temperature trend profiles of individually named models, see Figure S1). Note that nearly all the AMIP models start their simulations in 1979, therefore, we exclude them from the analysis of the longer period.
 It is clear that the majority of the CMIP models overestimate the tropical temperature trends over this period, although we note that the observed temperature trend at all heights lies within the 5–95 percentile range of simulated temperature trends. For example, at 300 hPa, the mean RICH temperature trend is ∼0.2 K/dec, but some of the models simulate trends of over double this. Some of this discrepancy arises because the models tend to overestimate the surface temperature trends. This is a common problem in global climate models and could stem from a poor representation of the Inter-Tropical Convergence Zone (ITCZ) [Richter et al., 2012], or through poor characterization of El Niño Southern Oscillation (ENSO) [in both frequency and magnitude; Guilyardi et al., 2012]. However, as we are using one ensemble member of each model (not ensemble averages, see section 2.2), the observations lie in the extreme lower tails of the model-predicted temperature changes. As convection is the dominant form of vertical heat transfer in the tropics, incorrectly simulating the surface temperatures will lead to a poor representation of temperatures aloft (as they will approximately follow a moist adiabat; Figure 2, black line). Given the bias in SST warming in the CMIP 5 models, it is therefore not surprising that the models on average over-estimate the tropospheric warming.
 Considering just the more recent period of 1979–2008 (Figure 2b), where the potential discrepancy is considered to be more pronounced [Thorne et al., 2011a], the CMIP 5 models (red lines) again exhibit greater warming than the apparent observed trends, and at no point does the observed trend lie within the 5–95 percentile range of simulated trends.
 When the atmosphere-only (AMIP) models are considered (blue lines), the 5–95 percentile range of vertical profiles of temperature trends encompasses the observations up to 300 hPa. This is largely because the models have the correct SSTs (by construction) both in terms of trend and spatial pattern, so that the simulated tropical convection is likely to be more realistic and the AMIP inter-model spread is much lower than the CMIP inter-model spread. This is consistent with the study of Po-Chedley and Fu  who found a similar result, but using satellite data and CMIP5/AMIP models.
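 The 5–95 percentile comparison at a single pressure level can be sketched as follows. This is an illustration under stated assumptions, not the procedure of the study: percentiles are computed by linear interpolation between sorted values, and "consistent" is taken to mean that the model 5–95 range overlaps the full range of the observational realizations. The function names `percentile` and `ranges_overlap` are hypothetical.

```python
def percentile(values, q):
    """q-th percentile (0-100) using linear interpolation between sorted values."""
    data = sorted(values)
    if not data:
        raise ValueError("no values supplied")
    position = (len(data) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(data) - 1)
    fraction = position - lower
    return data[lower] + fraction * (data[upper] - data[lower])


def ranges_overlap(model_trends, obs_trends):
    """True if the model 5-95 percentile range overlaps the observed range.

    Assumption of this sketch: the observational uncertainty is represented
    by the minimum and maximum of the observational realizations.
    """
    model_low = percentile(model_trends, 5)
    model_high = percentile(model_trends, 95)
    return model_low <= max(obs_trends) and min(obs_trends) <= model_high
```

 Repeating the check level by level would indicate the heights at which the model ensemble and the observational ensemble are consistent under this definition.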
 Of particular note in this analysis are the ultra-high horizontal resolution (hereafter, ultra HR) models, which include two versions of the Geophysical Fluid Dynamics Laboratory High Resolution Atmospheric Model (GFDL-HIRAM), and two versions of the Meteorological Research Institute Atmospheric General Circulation Model (MRI-AGCM), which all have a horizontal resolution of ≤0.5°×0.5° (see Table S1). They have been marked with dashed blue lines in Figure 2 to distinguish them from the other models. Possible reasons for why these models outperform the other models are as follows: (1) the horizontal resolution is far higher than any other models and (2) the use of an improved convective parameterization scheme, allowing for deep convection to occur on resolved scales [e.g., Zhao et al., 2009].
 It is interesting to note however that even though the AMIP models have a correctly forced lower-boundary (i.e., with observed SSTs), they still slightly overestimate the vertical temperature trends compared to the apparent observations, especially in the upper troposphere. Other than the possibility of poorly prescribed forcing data, this suggests that either the models may be incorrectly capturing some of the fundamental physical processes, for instance in the parameterization of convection, or that there are unresolved errors in the observational measurements and in the statistical techniques used in homogenizing them. Indeed it is well known that raw radiosondes not only have a systematic cooling bias but also that the magnitude of this bias increases with height [Karl et al., 2006].
 To understand better how incorrect simulation of the surface temperature trends can influence the vertical temperature trend profiles, we consider the profiles in terms of an amplification of the temperature trend from the surface as in Santer et al. : at each discrete pressure level the temperature trend, T(z), is divided by the surface temperature trend, Ts (Figure 3; see Figure S2 for individually named models). Performing the analysis in this way allows us to compare the shape of the vertical temperature trend profiles of all the models because the surface signal is unity, and to first order, all models conform to the static stability constraint (i.e., the black dashed lines in Figures 2 and 3).
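 The amplification calculation is a simple normalization, sketched below under the assumption that trends are already available at the surface and at each pressure level. The function name `amplification_profile` and the example numbers are illustrative only and are not values from the study.

```python
def amplification_profile(surface_trend, level_trends):
    """Divide each level's trend T(z) by the surface trend Ts.

    Returns a profile whose first element (the surface) is 1 by construction,
    followed by the ratio T(z) / Ts at each pressure level.
    """
    if surface_trend == 0.0:
        raise ValueError("surface trend is zero; the ratio is undefined")
    return [1.0] + [trend / surface_trend for trend in level_trends]


# Illustrative numbers only: a surface trend of 0.10 K/dec and upper-air
# trends of 0.15 and 0.20 K/dec give ratios of roughly 1.5 and 2.0.
example = amplification_profile(0.10, [0.15, 0.20])
```

 Because every profile starts at unity, differences between models in the normalized profiles reflect the shape of the warming with height rather than the size of the surface trend. The ratio becomes poorly conditioned when the surface trend is close to zero, which this sketch only guards against in the exact-zero case.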
 For the 50 year period, the 5–95 percentile range of model simulated temperature trends is consistent with observations throughout the depth of the troposphere (Figure 3a). This indicates that the vertical temperature trend profiles would be well captured by the models if the surface trends were well captured (within observational uncertainty). Interestingly, when the shorter period is used (1979–2008, Figure 3b), the models and observations are also in good agreement (at least in the lower- and mid-troposphere), in contrast to the results of Santer et al.  who compared radiosonde data and CMIP3 models (although we note that they were unable to treat observational uncertainty as thoroughly as we have here because no such information on the observational products was available at the time). Nevertheless, the models still exhibit more warming than the observations in the upper troposphere (above ∼250 hPa), and at the uppermost level (150 hPa) the model simulated temperature trends are not consistent with observations over the 5–95 percentile range. Three possible explanations arise for why this may be: (1) the observations may still contain systematic biases; (2) the model parameterized convection processes may be wrong, and the convective lid may be higher in models than in observations (our analysis of the ultra HR models supports this); and (3) Forster et al.  proposed that upper tropical tropospheric temperature trends could be influenced by stratospheric ozone. Solomon et al.  recently highlighted a potentially large discrepancy between the set of prescribed ozone forcings (employed in the CMIP and AMIP simulations) and the real world evolution of tropical stratospheric ozone. Hence, if the modeled ozone is poorly constrained, the upper tropospheric temperatures might also be, although we emphasize that there is still much uncertainty in observations of stratospheric ozone concentrations.
 We also note that preliminary analysis from Mitchell et al.  suggested (although did not explicitly show) that tropical tropospheric temperature trends may be better simulated with models that have a well resolved stratosphere. In the analysis here, if the models are divided into groups of high- or low-top models (using the definition of Charlton-Perez et al., 2013), or fine or coarse horizontal resolution models (using the definition of Anstey et al., 2013), no statistically significant separation in the mean tropical temperature trend between the groups is apparent (see Figures S3 and S4). However, we note that even the fine resolution models considered here may be too coarse for processes such as convection to occur on resolved scales (with the exception of the ultra HR models).
 In this study we have taken advantage of the multiple realizations of surface- and air-temperature data available from the HadCRUT4 and RICH data sets. This has allowed a much more rigorous characterization of the uncertainty in observed trends of vertical temperature profiles in the tropics (20°N–20°S) over the periods of 1959–2008 (50 years) and 1979–2008 (30 years).
 Using coupled-ocean atmosphere models from CMIP 5, we highlight the discrepancy in surface temperature trends between model simulations and observations (especially over the 1979–2008 period). When the models are constrained in a more physically meaningful manner, through using either fixed SSTs or considering air temperatures as an amplification of the surface temperature, we show robust evidence for good agreement in the low-mid tropical troposphere but with some (reduced compared to previous studies) discrepancy in the upper tropical troposphere. This holds true even for the 1979–2008 period where the issue appears to be particularly pronounced [Thorne et al., 2011a]. However, we do note that the RICH data set warms more than preceding radiosonde data sets, at least in the upper troposphere [Haimberger et al., 2012]. Likely explanations for this remaining discrepancy are either systematic biases in the observations that remain unresolved, poor model parameterizations of convection, or poorly constrained ozone forcings. The ultra high horizontal resolution models, which allow for deep convection on resolved scales, performed particularly well in this analysis and were remarkably consistent with observations throughout the troposphere. This suggests that an accurate representation of convective processes at a suitable resolution may be fundamental when attempting to correctly simulate tropical tropospheric temperature trends.
 We thank the two anonymous reviewers for their insightful comments, and Ben Santer for providing the MALR profile used in our analysis. D.M.M. was supported by a grant from the UK Natural Environment Research Council (NERC). P.A.S. was supported by the Joint DECC/Defra Met Office Hadley Centre Climate Programme (GA01101).
 The Editor thanks two anonymous reviewers for their assistance in evaluating this paper.