We compare temperature changes in the free atmosphere over the period 1961–2010 in a high-top (lid at 84 km) and a low-top (lid at 40 km) version of the HadGEM2 atmosphere-ocean global climate model. Model simulations of historical climate change that include key anthropogenic and natural external forcings are compared with three different radiosonde data sets. We also apply a regression-based “optimal fingerprinting” method to determine whether model-predicted temperature-change signals in response to external forcing are identifiable in observations. This method employs simulations that isolate the signals associated with different sets of external forcings (well-mixed greenhouse gases [GHGs] and natural external factors). In both the high- and low-top models, we detect the signals arising from anthropogenic influences. We find statistically significant differences between the latitude-height temperature signals simulated by the high- and low-top models, particularly in the tropical lower stratosphere. The scaling factors associated with each of the separate forcings are more tightly constrained in the high-top model than in the low-top model, but are otherwise similar. Finally, we show that in detecting the GHG fingerprint, the entire vertical temperature profile is important, not simply the lowermost tropospheric levels.
 A number of investigations have shown that the tropospheric warming and stratospheric cooling observed over the past 30 to 50 years cannot be explained by natural climatic effects alone, and that anthropogenic influences must play a large role [Santer et al., 1996; Tett et al., 1996; Stott et al., 2001; Jones et al., 2003; Hegerl and Zwiers, 2011].
 While global mean temperatures in the troposphere have increased from 1960 onward [with substantial noise on decadal time scales; Santer et al., 2011], the stratospheric temperature decrease has been observed primarily as two distinct step changes following the large volcanic eruptions of El Chichón (1982) and Mt. Pinatubo (1991) [Seidel et al., 2011]. Tropospheric temperature changes are primarily associated with increased CO2, whereas in the stratosphere the cooling contributions from increasing CO2 and from anthropogenic ozone loss are of comparable magnitude [Jonsson et al., 2009], although with different vertical structures.
 A number of previous studies have detected an anthropogenic fingerprint in radiosonde- and satellite-based estimates of the vertical profiles of zonal temperature changes [e.g., Santer et al., 1996; Tett et al., 1996; Gillett et al., 2011]. A recent multimodel study has shown that this remains the case in the most up-to-date coupled ocean-atmosphere models [Lott et al., 2013]. However, one continuing concern in these models is their inability to reproduce observed lapse rates in the tropical troposphere [Thorne et al., 2011].
 With the importance of stratosphere-troposphere coupling becoming more widely accepted [Baldwin and Dunkerton, 2001; Mitchell et al., 2013], more climate models are including a fully resolved stratosphere. As the majority of detection and attribution studies have been performed using models with relatively coarse stratospheric representation, it is important to consider whether the stratospheric resolution has a significant effect on the climate response to external forcing and on the detectability of anthropogenic signals.
 To address this question, we use a high-top version of the Hadley Centre climate model and compare against a low-top version.
 Three separate radiosonde data sets are used. Radiosonde data are preferable to satellite measurements here because they provide a longer record for estimating a climate fingerprint, which may be important for detecting GHG signals in the lower stratosphere [Gillett et al., 2011]. However, radiosonde coverage in high-latitude regions is poor compared with that of satellites, so the analysis is confined to 60°N–60°S. We consider the period 1961–2010 for consistency with Lott et al.
 The radiosonde data sets employed are HadAT2, which uses a nearest-neighbor approach to identify and adjust for any nonclimatic, step-like changes in the data set, and RICH-obs v1.5 and RAOBCORE v1.5, which use ERA-40 and ERA-Interim reanalysis products to identify and adjust for such changes. The adjustments to these data sets have led to more reliable estimates of decadal-scale changes in upper air temperatures [Haimberger, 2007].
 Since HadAT2 has incomplete spatial coverage and coarse horizontal resolution (5° × 5°), we performed all model-versus-observation comparisons on the HadAT2 grid and used the HadAT2 coverage to mask all other data sets. A zonal mean for a given latitude band is computed only if at least three grid points have valid data; otherwise, it is set to missing. An annual average requires at least 9 of the 12 months to be present, with at least 1 month available in every season.
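The masking and averaging rules above can be sketched as follows. This is a minimal illustration, not the processing code used for the study: it assumes NumPy arrays with NaN marking missing data, longitude as the last axis of gridded fields, and calendar-year seasons (DJF taken as December, January, and February of the same year). The function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating the coverage rules; NaN marks missing data.


def masked_zonal_mean(field, obs, min_points=3):
    """Zonal mean of `field` using only grid boxes where `obs` is valid.

    field, obs: arrays of shape (..., nlat, nlon) on the same (HadAT2) grid.
    Latitude bands with fewer than `min_points` valid boxes are set to NaN.
    """
    valid = ~np.isnan(obs) & ~np.isnan(field)
    total = np.where(valid, field, 0.0).sum(axis=-1)
    count = valid.sum(axis=-1)
    zmean = total / np.maximum(count, 1)  # avoid division by zero
    return np.where(count >= min_points, zmean, np.nan)


# Month indices (0 = January) for DJF, MAM, JJA, SON within one calendar year
SEASONS = [(11, 0, 1), (2, 3, 4), (5, 6, 7), (8, 9, 10)]


def annual_mean(monthly, min_months=9):
    """Annual mean of a (..., 12) monthly array.

    Requires at least `min_months` valid months and at least one valid
    month in every season; otherwise returns NaN.
    """
    monthly = np.asarray(monthly, dtype=float)
    valid = ~np.isnan(monthly)
    every_season = np.all(
        [valid[..., list(s)].any(axis=-1) for s in SEASONS], axis=0
    )
    n_valid = valid.sum(axis=-1)
    mean = np.where(valid, monthly, 0.0).sum(axis=-1) / np.maximum(n_valid, 1)
    return np.where((n_valid >= min_months) & every_season, mean, np.nan)
```

A band with only two observed grid boxes, or a year in which all three summer months are missing, is returned as NaN even if enough data exist elsewhere.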
2.2 Model Configuration and Simulations
 We use a high- and a low-top version of the coupled atmosphere-ocean Hadley Centre Global Environmental Model, HadGEM2-CC (lid at 84 km, with 32 levels above the tropopause) and HadGEM2-ES (lid at 40 km, with 10 levels above the tropopause), respectively [Jones et al., 2011; Hardiman et al., 2012]. Apart from the representation of the stratosphere, the main differences between the simulations are the inclusion of interactive tropospheric chemistry in the low-top runs and the inclusion of the solar cycle in stratospheric ozone poleward of 50° in the high-top runs. To examine whether these differences could affect our results, we also employ a low-top version of HadGEM2-CC, which differs from the high-top version only in model lid height and stratospheric resolution. This low-top HadGEM2-CC was not used for the optimal fingerprinting, because the full suite of simulations required for the fingerprint analysis was not available. A comparison of the trends and variability of globally averaged temperature profiles between the two low-top models (using simulations with the same forcings) yields very similar results (not shown). HadGEM2-CC/HadGEM2-ES differences can therefore be attributed with confidence to model lid height and stratospheric resolution rather than to other model differences.
 Four separate simulations covering the 1961–2010 period were performed, with (1) natural and anthropogenic forcings (ALL), (2) natural-only forcings (Nat), (3) well-mixed greenhouse gas-only forcings (GHG), and (4) constant forcings set at preindustrial values (control). The external forcings, including prescribed ozone, all follow the recommendations of the Coupled Model Intercomparison Project Phase 5 (CMIP5), and further details of the model simulations are given in Table S1. In addition, we infer the other anthropogenic (OA) climate response by subtracting simulations 2 and 3 from simulation 1 (OA = ALL − Nat − GHG).
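Written out, the OA estimate relies on the assumption, implicit in the subtraction, that the responses to the separate forcings add linearly. In the sketch below, overbars denote ensemble means and T_OA is the underlying forced OA response. If, as a further simplifying assumption, each of the three ensemble means carries independent noise of variance σ²/n, the noise variance of the OA estimate is roughly three times that of a single ensemble mean:

```latex
\overline{T}_{\mathrm{OA}} = \overline{T}_{\mathrm{ALL}} - \overline{T}_{\mathrm{Nat}} - \overline{T}_{\mathrm{GHG}},
\qquad
\operatorname{Var}\!\left(\overline{T}_{\mathrm{OA}} - T_{\mathrm{OA}}\right) \approx \frac{3\sigma^{2}}{n}
```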
 We use optimal detection analysis [Allen and Tett, 1999; Tett et al., 2002], which assumes that a given set of observations, $y$, can be represented as a linear sum of climate responses (“fingerprints”) to different forcing mechanisms, $x_i$, multiplied by a set of scaling factors, $\beta_i$. The scaling factors are estimated using total least squares regression:
$$ y = \sum_i \beta_i \, (x_i - u_i) + u_0 \qquad (1) $$
where $u_i$ is a measure of between-realization variability in the fingerprints and $u_0$ is the noise in the observations. Equation (1) cannot be solved explicitly because the covariance matrix associated with $u_0$ cannot be inverted [see Allen and Tett, 1999]. We therefore optimize the analysis by projecting all data onto the leading empirical orthogonal functions (EOFs) of an estimate of the noise covariance matrix. The $\beta_i$ are then determined at a specific EOF truncation chosen so that a sufficiently large fraction of the variance of the original data is retained (details of the truncation are given with the relevant analysis in section 'Analysis'). As a consistency check, we compare the variance of the regression residuals from equation (1) with the variance of internal variability estimates using an F test [Allen and Tett, 1999; Allen and Stott, 2003].
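A minimal sketch of the EOF-truncated total least squares step is given below, assuming NumPy. It illustrates the technique under stated assumptions rather than reproducing the study's implementation: the noise samples are assumed to be zero-mean anomalies, the noise in each ensemble-mean fingerprint is assumed to scale as 1/n with ensemble size n (so fingerprints are rescaled by √n to match the noise level of the observations before the regression), and the function names are hypothetical. Truncating to k EOFs restricts the inverse of the noise covariance to a subspace in which it is well defined. The residual F test and the confidence intervals on the scaling factors are omitted.

```python
import numpy as np


def leading_eofs(noise, k):
    """Leading k EOFs (columns) and eigenvalues of the sample noise covariance.

    noise: (n_samples, n_dims) array of zero-mean internal-variability samples.
    """
    cov = noise.T @ noise / noise.shape[0]
    evals, evecs = np.linalg.eigh(cov)      # eigenvalues in ascending order
    order = np.argsort(evals)[::-1][:k]     # indices of the k largest
    return evecs[:, order], evals[order]


def tls_scaling_factors(y, X, noise_opt, k, n_members):
    """Total least squares scaling factors on k prewhitened EOFs (sketch).

    y: (n_dims,) observations; X: (n_dims, m) ensemble-mean fingerprints;
    noise_opt: noise samples used to build the EOF basis (e.g. the IEV);
    n_members: ensemble size behind each fingerprint (length m).
    """
    m = X.shape[1]
    if k < m + 1:
        raise ValueError("truncation k must exceed the number of signals")
    eofs, lam = leading_eofs(noise_opt, k)
    W = eofs / np.sqrt(lam)                 # project and prewhiten: (n_dims, k)
    yw = W.T @ y
    root_n = np.sqrt(np.asarray(n_members, dtype=float))
    Xw = (W.T @ X) * root_n                 # give fingerprints the noise level of y
    Z = np.column_stack([Xw, yw])           # (k, m + 1)
    _, _, vt = np.linalg.svd(Z, full_matrices=False)
    v = vt[-1]                              # direction of smallest singular value
    beta_scaled = -v[:m] / v[m]
    return beta_scaled * root_n             # undo the sqrt(n) rescaling
```

With noise-free synthetic data the sketch recovers the prescribed scaling factors; with real data the estimates depend on k, which is why they are examined over a range of truncations.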
 We use the intraensemble variability (IEV) to optimize the fingerprints and the control run noise for significance testing; using independent noise estimates for these two steps avoids introducing artificial skill [Allen and Tett, 1999]. The IEV is calculated by subtracting the ensemble average of n members from each individual member, where n = 3 and 4 for the high- and low-top models, respectively. The result is then scaled by $(n/(n-1))^{1/2}$ to compensate for the variance removed when the ensemble mean, estimated from the same members, is subtracted [Tett et al., 2002].
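The IEV construction can be sketched as below (NumPy assumed; the function name is hypothetical). The √(n/(n−1)) factor follows because the deviation of a member from an n-member mean has variance (n−1)/n times the single-realization variance.

```python
import numpy as np


def intra_ensemble_noise(members):
    """Deviations of each member from the ensemble mean, rescaled.

    members: array of shape (n, ...) holding n ensemble members.
    Each deviation has variance (n - 1)/n of the single-realization
    variance, so it is multiplied by sqrt(n / (n - 1)) to compensate.
    """
    members = np.asarray(members, dtype=float)
    n = members.shape[0]
    if n < 2:
        raise ValueError("at least two ensemble members are needed")
    deviations = members - members.mean(axis=0)
    return deviations * np.sqrt(n / (n - 1))
```

The n deviations from one ensemble sum to zero, so they provide only n − 1 independent noise realizations.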
 In the final part of our analysis, we simultaneously estimate scaling factors for the temperature-change signals from the high- and low-top models, $T_{high}$ and $T_{low}$, respectively. The regression can be written as
$$ T_{obs} = \beta_1 T_{high} + \beta_2 T_{low} + u_0 . $$
The high-top signal can also be expressed as the low-top signal plus an extra signal associated with the high-top model, $T_{high} = T_{low} + T_{ex}$, so that
$$ T_{obs} = (\beta_1 + \beta_2)\, T_{low} + \beta_1 T_{ex} + u_0 . $$
Throughout this study, we refer to the scaling factor associated with $T_{low}$, $\beta_1 + \beta_2$, as the common signal, and that associated with $T_{ex}$, $\beta_1$, as the extra high-top signal. If the high-top model were a “perfect” model (i.e., one that matched the observations exactly), then $\beta_1 = 1$ and $\beta_2 = 0$, so both the common and the extra high-top scaling factors would equal 1. Performing the analysis in this way allows us to test directly whether the high-top model adds detectable information beyond the low-top model.
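The equivalence of the two parametrizations can be checked numerically. The sketch below uses ordinary least squares for simplicity, rather than the total least squares used in the analysis, because OLS coefficients transform exactly under the change of basis T_high = T_low + T_ex. The function name and the synthetic signals are hypothetical.

```python
import numpy as np


def fit_both(t_obs, t_high, t_low):
    """Fit T_obs ~ b1*T_high + b2*T_low and T_obs ~ common*T_low + extra*T_ex.

    T_ex = T_high - T_low. Ordinary least squares is used purely for
    illustration; returns ((b1, b2), (common, extra)).
    """
    t_ex = t_high - t_low
    b1, b2 = np.linalg.lstsq(np.column_stack([t_high, t_low]), t_obs, rcond=None)[0]
    common, extra = np.linalg.lstsq(np.column_stack([t_low, t_ex]), t_obs, rcond=None)[0]
    return (b1, b2), (common, extra)


rng = np.random.default_rng(42)
t_low = rng.standard_normal(50)
t_high = t_low + 0.3 * rng.standard_normal(50)   # hypothetical high-top signal
t_obs = 0.8 * t_high + 0.1 * t_low + 0.05 * rng.standard_normal(50)
(b1, b2), (common, extra) = fit_both(t_obs, t_high, t_low)
# The reparametrized coefficients satisfy common = b1 + b2 and extra = b1
```

Passing `t_high` itself as the observations reproduces the perfect-model case, giving b1 = 1, b2 = 0, and common = extra = 1.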
 One potential drawback of this type of multivariate linear analysis is signal degeneracy. To ensure that this is not a problem for two potentially similar signals, such as the high- and low-top signals, we perform signal degeneracy tests on the signal correlation matrix, as in Tett et al. [1999, 2002] [see also Mardia et al., 1980, for the underlying theory]. None of the signal combinations in this analysis were found to be degenerate.
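A simplified diagnostic in the spirit of this check is sketched below. It is not the formal test of Tett et al. [1999, 2002]; it only inspects the eigenvalues of the signal correlation matrix, where an eigenvalue close to zero indicates that some combination of signals is nearly collinear and the corresponding scaling factors would be poorly determined. NumPy is assumed and the function name is hypothetical.

```python
import numpy as np


def correlation_eigenvalues(signals):
    """Eigenvalues (ascending) of the correlation matrix of the signals.

    signals: (n_dims, m) array, one fingerprint per column. A smallest
    eigenvalue near zero flags a near-degenerate signal combination.
    """
    corr = np.corrcoef(signals, rowvar=False)
    return np.linalg.eigvalsh(corr)
```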
 Figure 1 shows the decadal temperature trends in the ALL simulations over the period 1961–2010 for three latitude bands representative of the tropics (20°N–20°S) and midlatitudes (60°S–20°S and 20°N–60°N). Because of the spatial coverage of radiosonde observations, this analysis cannot be extended to the poles. The trends for the ensemble means of the high- and low-top models (red dashed and solid lines, respectively) were calculated after masking the spatially complete model output with the coverage of the spatially incomplete HadAT2 data. Light red regions give an estimate of the 95% uncertainty range of the trends, calculated by randomly resampling 50-year segments of the control run and repeating the analysis. The masked model data are compared directly against HadAT2 (black solid line), RAOBCORE (black dashed line), and RICH (black dotted line). For completeness, the decadal temperature trends for the Nat and GHG simulations are shown in Figure S1.
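The trend calculation and the control-run uncertainty estimate can be sketched as follows, assuming annual-mean series in NumPy arrays with NaN for masked years and an ordinary least-squares linear trend. The 2.5th and 97.5th percentiles used for the 95% range, the number of resamples, and the function names are illustrative assumptions; the sketch also treats each control segment as a single realization and does not rescale for ensemble size.

```python
import numpy as np


def decadal_trend(series, years):
    """Ordinary least-squares linear trend per decade, ignoring NaNs."""
    series = np.asarray(series, dtype=float)
    years = np.asarray(years, dtype=float)
    ok = ~np.isnan(series)
    slope_per_year = np.polyfit(years[ok], series[ok], 1)[0]
    return 10.0 * slope_per_year


def control_trend_range(control, seg_len=50, n_draws=1000, seed=0):
    """Approximate 95% range of seg_len-year trends from a control run.

    Segment start years are drawn at random (with replacement).
    """
    control = np.asarray(control, dtype=float)
    rng = np.random.default_rng(seed)
    starts = rng.integers(0, len(control) - seg_len + 1, size=n_draws)
    yrs = np.arange(seg_len)
    trends = [decadal_trend(control[s:s + seg_len], yrs) for s in starts]
    return np.percentile(trends, [2.5, 97.5])
```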
 The radiosonde data sets agree well with each other in the troposphere but diverge in the stratosphere, especially in the tropical and southern midlatitude regions, where data coverage is poor. In particular, RAOBCORE shows less stratospheric cooling than the other two data sets, consistent with comparisons of observational trends in other studies [e.g., Thorne et al., 2011].
 In all latitude bands, the high-top model cools more in the stratosphere than the low-top model, a result also found in a separate clean high-top/low-top comparison using the EC-EARTH global climate model from CMIP5 (not shown). In general, the high-top model agrees better with HadAT2 and RICH, and the low-top model with RAOBCORE, although observational uncertainty is large where the model differences are greatest.
 One notable difference between the models and radiosondes is in the tropical mid-to-upper troposphere (~500–100 hPa). Although the high- and low-top models agree well with each other, both simulating a warming trend of about 0.3 K/decade in this region, these trends are about three times larger than those in the radiosonde data sets. This is a common disparity between present-day climate models and observations, especially during the satellite era [Santer et al., 2005; Thorne et al., 2011].
 If the same analysis is performed over 1981–2010 (Figure S2), a period with denser data coverage and stronger anthropogenic influence on climate, the high-top model warms less than the low-top model throughout the tropical troposphere and is in better agreement with the radiosondes.
 Since the largest difference between the high- and low-top models occurs in the tropical lower stratosphere at around 50 hPa, we study this region in more detail. Figure 2 shows annual mean time series of the temperature anomaly (relative to 1966–1995) in this region for the radiosonde data sets and for the simulations described in section 'Model Configuration and Simulations'. Because the model temperatures are ensemble averages, they show less variability than the radiosonde data.
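As a minimal illustration (NumPy assumed; the function name is hypothetical), anomalies relative to the 1966–1995 reference period are obtained by subtracting the mean over that inclusive period, with time on the last axis:

```python
import numpy as np


def anomalies(series, years, base=(1966, 1995)):
    """Subtract the mean over the inclusive base period (time on the last axis)."""
    series = np.asarray(series, dtype=float)
    years = np.asarray(years)
    in_base = (years >= base[0]) & (years <= base[1])
    clim = np.nanmean(series[..., in_base], axis=-1, keepdims=True)
    return series - clim
```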
 Prominent features of observed stratospheric temperature changes over this period are the volcanically induced stratospheric warming signals after the eruptions of Mt. Agung (1963), El Chichón (1982), and Mt. Pinatubo (1991). These features are evident in the ALL and Nat simulations. As in the observations, the combined natural and anthropogenic simulation (Figure 2a) shows that most of the stratospheric cooling occurs after the peaks in lower stratospheric warming caused by the eruptions of El Chichón and Mt. Pinatubo. In both models, the time evolution of tropical temperature changes at 50 hPa is closer to observations than in similar comparisons involving CMIP3 models [Cordero et al., 2006]. One plausible explanation is the more realistic representation of the observed spatiotemporal history of stratospheric ozone loss in the models used here. In the ALL simulations, the high-top model cools more than the low-top model immediately after Mt. Pinatubo, in better agreement with the observations. For both the high- and low-top models, stratospheric cooling is small in the GHG- and natural-only simulations compared with the OA simulation, implying that ozone depletion is a primary driver of cooling at this altitude (see also Randel et al.).
 Using the ALL, GHG, and Nat simulations, we apply the optimal detection framework detailed in section 'Methodology'. We use the temperature time series at all heights and latitude bands and perform the regression separately for each of the radiosonde data sets. We use 5-year averages of the time series, which maximizes the fraction of observed variability retained after projection onto the EOFs while still retaining the fingerprint of large volcanic eruptions (for technical details on the reliability of spatiotemporal fingerprinting for these data, and additional analysis using different averaging periods, see Lott et al.). Figure 3a shows the scaling factors, $\beta_i$, calculated for the Nat, GHG, and OA simulations for a range of EOF truncations, using HadGEM2-ES regressed onto HadAT2 (see section 'Methodology'). Because the model reproduces the observed vertical profiles of temperature change poorly (Figure 1), we also express the temperature time series as anomalies relative to the lowermost layer (850 hPa) (Figure 3b). This tests whether it is the near-surface layers or the entire vertical profile (which may be poorly reproduced in the model) that matters for detecting the climate signals in the atmosphere. Interestingly, the GHG scaling factors are very similar in the two analyses, indicating detection throughout the atmosphere. In contrast, when temperatures relative to the base layer are used (Figure 3b), the OA fingerprint yields $\beta_i > 1$, indicating that its amplitude must be scaled up to match observations, and the natural forcings fingerprint is no longer detected (this conclusion holds for all the radiosonde data sets in this study).
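The two preprocessing steps described above, non-overlapping 5-year means and anomalies relative to the 850 hPa level, can be sketched as follows. NumPy is assumed, the function names are hypothetical, and a trailing block shorter than 5 years is simply dropped (for 1961–2010 there is none). After the second step, the 850 hPa level itself is identically zero and carries no information.

```python
import numpy as np


def block_means(annual, years, width=5):
    """Non-overlapping `width`-year means along the last (time) axis.

    Returns the block means and the first year of each block; a trailing
    partial block is dropped.
    """
    annual = np.asarray(annual, dtype=float)
    n = (annual.shape[-1] // width) * width
    blocks = annual[..., :n].reshape(annual.shape[:-1] + (n // width, width))
    starts = np.asarray(years)[:n].reshape(-1, width)[:, 0]
    return np.nanmean(blocks, axis=-1), starts


def relative_to_level(temps, levels, ref_level=850.0):
    """Subtract the reference pressure level (axis 0 = pressure, hPa)."""
    temps = np.asarray(temps, dtype=float)
    i = int(np.flatnonzero(np.asarray(levels) == ref_level)[0])
    return temps - temps[i:i + 1]
```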
 To compare the Nat, GHG, and OA scaling factors between the high- and low-top models, we consider their values at a single truncation. The truncation is chosen to be as large as possible while the regression residuals remain consistent with the estimated internal variability [Allen and Tett, 1999]; all chosen truncations lie in the range 30–40. Figure 3 shows these scaling factors for (c) 1961–2010 and (d) 1981–2010. Each panel is further subdivided into scaling factors for (left) the high-top model, (middle) the low-top model, and (right) the difference between the two. Within each section, the model is compared against the three radiosonde data sets in turn (HadAT2, RICH, and RAOBCORE, respectively). The figure shows 5–95% ranges; a signal is detected if the lower (5%) limit is greater than zero, and if the range includes 1, the simulated response is consistent with the observed response.
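The decision rules above can be written as a small function. The 5th and 95th percentiles are taken as inputs, and the labels for ranges lying entirely below or above 1 (a fingerprint amplitude that must be scaled down or up) are our own naming, used only for illustration.

```python
def interpret_scaling_factor(lo, hi):
    """Classify a scaling factor from its 5th (lo) and 95th (hi) percentiles.

    Returns (detected, amplitude): detected if lo > 0; amplitude is
    'consistent' if the range includes 1, 'overestimated' if the whole
    range lies below 1 (model response must be scaled down), and
    'underestimated' if it lies above 1 (model response must be scaled up).
    """
    if lo > hi:
        raise ValueError("lower limit exceeds upper limit")
    detected = lo > 0.0
    if lo <= 1.0 <= hi:
        amplitude = "consistent"
    elif hi < 1.0:
        amplitude = "overestimated"
    else:
        amplitude = "underestimated"
    return detected, amplitude
```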
 We first note that the GHG, OA, and Nat fingerprints are all detected in the high-top model, as has been reported many times for low-top models [Stott et al., 2001; Tett et al., 2002; Jones et al., 2003]. This detection is consistent across all radiosonde data sets and both analysis periods, although the attribution of these fingerprints is less consistent. The signal amplitudes estimated using the high-top model [Figure 3c and 3d (left)] are also more tightly constrained than those estimated using the low-top model (middle). This is particularly true for the Nat simulations (blue lines), which are not detected in the low-top model (i.e., the scaling factors are consistent with zero) over 1981–2010. We also note that using more ensemble members in the regression tends to constrain β more tightly, because averaging over more members reduces the internal variability in the model fingerprints.
 Over 1961–2010 (Figure 3c), the Nat fingerprint is consistent with observations, the GHG fingerprint is overestimated (and hence needs to be scaled down), and the OA result is rather inconclusive but also suggests a general overestimate. Interestingly, when the shorter period 1981–2010 is used (Figure 3d), the high-top model response to Nat is too large, as is also suggested by the time series analysis (Figure 2c). In contrast, in the low-top model the Nat fingerprint is no longer detectable over this period.
 As a further test of the model difference, we also regress the observations onto the common and extra high-top signals derived from the high- and low-top ALL forcing simulations (see section 'Methodology' for definitions); the results are shown in Figure 3c and 3d (right). The red bars show the common signal between the high- and low-top models, which is consistently detected. The black bars, which show the signal associated only with the high-top model, are also clearly detected in all three radiosonde comparisons for 1961–2010 and in two of the three comparisons for 1981–2010, albeit with less certainty than the common signal.
 We have used a high- and a low-top version of the same model to study temperature changes in the free atmosphere from 1961 to 2010, and have performed a spatiotemporal optimal detection analysis on the data. Using the high-top model, we showed that the fingerprints of the GHG, Nat, and OA climate signals were all detected, and that their amplitudes were robustly more tightly constrained than in the low-top model. Interestingly, the GHG signal was still detected when temperature anomalies were taken relative to the lowest available level (850 hPa). We were also able to detect a signal associated only with the high-top model, although the signal amplitudes were similar enough to those of the low-top model that we infer that previous detection and attribution studies of temperatures in the free atmosphere remain relevant.
 When the period 1981–2010 was studied, the Nat component of the climate signal was no longer detectable in the low-top runs. That it can still be detected in the high-top model is likely due to a better representation of the stratospheric response to El Chichón and Mt. Pinatubo. Interestingly, over the same period, temperature trends in the tropical upper troposphere were smaller in the high-top than in the low-top model and were therefore in better agreement with the radiosondes. Although assigning statistical significance to this result is nontrivial given the limited number of ensemble members available, it motivates further work, because the disparity between modeled and observed lapse rate changes in this region is of great concern.
 We thank Peter Thorne, Susan Solomon, Simon Tett, and Jonathan Gregory for insightful discussions, and Bo Christiansen for supplying an alternative clean high-top/low-top model comparison. We also thank the two reviewers for their insightful comments. DMM was supported by a grant from NERC. PAS, FCL, NB and SCH were supported by the joint DECC/Defra Met Office Hadley Centre Climate Programme (GA01101).