Corresponding author: W. R. Hobbs, Institute of Marine and Antarctic Science, University of Tasmania, Private Bag 129, Hobart, TAS 7001, Australia. (firstname.lastname@example.org)
 Recent work comparing historical hydrographic data with modern Argo observations shows a long-term change in the global ocean temperature. The magnitude of this change is greater than estimates of late 20th century warming, and implies a century-scale change in the global oceans. Using global coupled climate models from the Coupled Model Intercomparison Project Phase 5 suite of simulations, we assess to what extent this observed temperature difference can be attributed to a genuine long-term warming trend. After accounting for natural variability and sampling errors, we find convincing evidence that there has indeed been a century-scale anthropogenic warming of the global ocean up to the present day, and a strong possibility of anthropogenic warming from 1873 to 1955. The estimated 1873–1955 ocean warming implies a net top-of-atmosphere energy imbalance of 0.1 ± 0.06 W m–2, and a thermosteric global mean sea level rise of 0.50 ± 0.2 mm a–1.
 The ocean is the primary heat sink of the global climate system; hence, any forced net energy imbalance at the top of the atmosphere (TOAnet) is expressed largely as a change in global ocean temperature. Because depth-averaged global ocean temperature has relatively little natural variability compared with other components of the climate system, change in ocean heat content is a reliable indicator of long-term energy imbalance [Trenberth et al., 2009; Palmer et al., 2011; Loeb et al., 2012]. Furthermore, thermal expansion of the oceans is a key component of the persistent rise in global mean sea level observed since the 1880s [Church and White, 2011]. Much work has been carried out to make accurate estimates of long-term ocean warming, largely relying on in situ data from the mid-20th century onwards, when extensive subsurface temperature observations became available [Levitus et al., 2000; Levitus et al., 2005; Levitus et al., 2009; Domingues et al., 2008; Levitus et al., 2012]. Several recent studies have extended estimates of upper ocean temperature change to before the late 20th century [Gouretski et al., 2012; Roemmich et al., 2012, hereafter RGG12], but only Roemmich et al. have considered the ocean below 400 m. They did so by comparing early 21st century ocean temperature data from the global Argo network with temperature observations made during the 1873–1876 survey cruise of HMS Challenger, obtaining a difference between the two periods of +0.33 ± 0.14°C in the upper 700 m. This warming is greater than that estimated for the mid-20th century to the present day, implying significant ocean warming before the mid-20th century. This raises two important questions. First, to what extent can the estimated temperature change, which is after all based on the difference between observations made over two relatively short periods, be attributed to genuine long-term anthropogenic warming rather than to natural variability?
Second, because the survey track of the Challenger cruise was confined to the Pacific and Atlantic Oceans equatorward of ~40° latitude, and significant warming has been reported outside this domain [Gille, 2008], can the RGG12 estimate be considered representative of the global mean state of the ocean? In this study, we address these questions using simulations from four state-of-the-art, fully coupled Earth System models run for the Coupled Model Intercomparison Project Phase 5 (CMIP5), supported by global estimates from objectively analyzed and model-assimilated data.
 Because there are no direct observations of century-scale natural variability of the ocean, we analyzed simulations from the CMIP5 database of coupled climate experiments. Warming over the 20th century was analyzed using a single ensemble member of each model's “historical” experiment, which includes natural and anthropogenic climate forcings from the mid-19th century to 2005 [Taylor et al., 2012]. Although the historical simulations end in 2005, slightly before the end of the RGG12 analysis in 2010, the additional 5 years make little difference to the long-term temperature change. The influence of anthropogenic forcing was assessed by comparing the “historical” simulations with the “historicalNat” experiment, which includes the same natural climate forcings as the “historical” case (e.g., volcanic sulphate aerosols) but no anthropogenic forcing (e.g., greenhouse gases, black carbon). Unforced natural variability of the ocean was estimated from the preindustrial control experiment (“piControl”). Models were selected to cover a range of vertical and horizontal resolutions and model physics, so as to sample a range of plausible simulations of natural variability. To ensure that the chosen models have a reasonable ocean thermodynamic response to long-term climate forcing, the total ocean heat content change in each “historical” simulation was compared with TOAnet averaged over the 135 years of the analysis period. The four models whose ocean heat content change was closest to 80–95% of their respective TOAnet were selected, as these have the ocean warming response closest to that of the observed climate system. The selected models and their ocean components are summarized in Table 1.
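The model-screening step can be sketched as follows. This is a minimal Python illustration, assuming that the ocean heat content change is compared with the time-integrated TOAnet (mean imbalance × Earth's surface area × duration) and interpreting "closest to 80–95%" as the smallest distance from that band. The function names, the approximate 5.1 × 10^14 m² surface area, and any example inputs are illustrative assumptions rather than values from this study.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, s
EARTH_SURFACE_AREA = 5.1e14            # approximate, m^2 (assumed)


def ocean_uptake_fraction(ohc_change_j, toa_mean_wm2, years):
    """Fraction of the time-integrated TOA imbalance stored as ocean heat."""
    toa_energy_j = toa_mean_wm2 * EARTH_SURFACE_AREA * years * SECONDS_PER_YEAR
    return ohc_change_j / toa_energy_j


def distance_to_band(x, lo=0.80, hi=0.95):
    """Zero inside [lo, hi]; otherwise the distance to the nearest edge."""
    if x < lo:
        return lo - x
    if x > hi:
        return x - hi
    return 0.0


def select_models(models, years=135, n_keep=4):
    """models maps a (hypothetical) model name to (ohc_change_j, toa_mean_wm2).
    Returns the n_keep names whose uptake fraction is closest to the 80-95% band."""
    def score(name):
        ohc, toa = models[name]
        return distance_to_band(ocean_uptake_fraction(ohc, toa, years))
    return sorted(models, key=score)[:n_keep]
```

Ranking by distance to a band, rather than to a single target value, treats every model that already falls inside 80–95% as equally acceptable.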
Table 1. Summary of Coupled Models Used, Including the Name and Vertical Coordinate System of Each Model's Ocean Component
 Following the approach of RGG12, the Atlantic and Pacific temperature differences were estimated as the unweighted average over all stations in each basin, and the global mean temperature change was estimated as the surface-area-weighted mean of the Atlantic and Pacific averages. To estimate the spatial sampling error inherent in the global estimate, the model ocean temperatures were interpolated horizontally to the locations of the Challenger survey stations used in RGG12, and vertically to the 100 fathom (~182 m) depth intervals of the original observations. A small number of stations, mostly at high southern latitudes, were excluded by RGG12 because the Argo temperature profiles indicated that they were unsuited to the measurement approach employed by the Challenger survey; these stations were also excluded from this analysis. Each station was subsampled at the same month relative to the first station along its track. (This can be envisaged as a synthetic Challenger cruise starting once a year over the period of analysis, each following exactly the same course.) Each track's global estimate was then compared with the “true” global mean (i.e., the mean of all model data points) calculated over the simultaneous 4 year period; for example, a track estimate for the original February 1873 to May 1876 cruise would be compared with the 1873–1876 annual mean global temperature.
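A minimal sketch of the track-based global estimate and its sampling error, assuming the station values for one synthetic cruise have already been interpolated from the model fields. The Atlantic and Pacific surface areas are left as inputs because they are not given here, and the function names are hypothetical.

```python
from statistics import fmean


def track_global_estimate(atlantic, pacific, area_atlantic, area_pacific):
    """Unweighted station mean within each basin, then a
    surface-area-weighted mean of the two basin averages."""
    atl = fmean(atlantic)
    pac = fmean(pacific)
    return (area_atlantic * atl + area_pacific * pac) / (area_atlantic + area_pacific)


def sampling_error(track_estimate, true_global_mean):
    """Track-based estimate minus the 'true' global mean over the same period."""
    return track_estimate - true_global_mean
```

Repeating this for each yearly synthetic cruise yields the sampling error time series whose trend and spread are examined in section 3.1.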
 To estimate unforced natural variability along the Challenger track, output from the “piControl” experiment was used. In this case, to include the effects of any potential seasonal bias in the cruise track, station sampling was not restricted to the same time of year as the original observations. Instead, complete tracks were constructed starting at each month of the data period (i.e., approximately 200 “cruises” departing one month apart, as opposed to 1 year apart as in the spatial sampling analysis). For each model, 1000 temperature differences between pairs of randomly selected tracks separated by at least 135 years were calculated, giving 4000 random, unforced temperature differences in total. From these differences, a probability density function was constructed for each 100 fm depth interval, as well as for depth averages taken from the surface downward, from which 99% confidence intervals were estimated. To estimate the natural variability of the truly global ocean (i.e., not just along the survey track), the same random-difference approach was applied to the models' unsampled global mean.
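The random-difference procedure can be illustrated with the sketch below. It assumes each model's piControl output has already been reduced to a monthly series of track-mean temperatures (one value per departure month) and uses simple rejection sampling to draw pairs at least `min_sep` months apart. The seed, the sampling scheme, and the function names are assumptions for illustration only.

```python
import random
import statistics


def unforced_differences(track_means, min_sep, n_pairs=1000, seed=0):
    """Draw n_pairs differences (later minus earlier) between track means
    whose departure indices are at least min_sep apart."""
    n = len(track_means)
    if n <= min_sep:
        raise ValueError("series is too short for the requested separation")
    rng = random.Random(seed)
    diffs = []
    while len(diffs) < n_pairs:
        i = rng.randrange(n)
        j = rng.randrange(n)
        if j - i >= min_sep:  # j is the later departure
            diffs.append(track_means[j] - track_means[i])
    return diffs


def confidence_interval_99(diffs):
    """Central 99% range (0.5th to 99.5th percentiles) of the differences."""
    cuts = statistics.quantiles(diffs, n=200, method="inclusive")
    return cuts[0], cuts[-1]
```

With monthly departures, a 135 year minimum separation corresponds to `min_sep=135 * 12`; pooling the 1000 differences from each of four models (for example, with a different `seed` per model) gives a 4000-member sample like the one described above.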
3.1 Estimation of Global-Mean Temperature
 We first consider whether a reasonable estimate of global temperature can be obtained from the Challenger data. Uncertainties come primarily from errors during data collection (observation errors), the spatial distribution of the observations, and the natural variability of the global ocean. The effect of natural variability on the estimated global temperature difference is dealt with at length in the “detection” section of this paper (section 3.2). Using contemporary reports, RGG12 estimated the accuracy of the Challenger observations and identified the major sources of observation error as pressure effects on the thermometers and sounding line bias. Post-voyage analysis showed that a correction factor of 0.04°C km–1 accounted for compression effects on the thermometers, and this correction is applied to the Challenger observations here. Additionally, RGG12 noted that the precision of the Challenger thermometers was 0.14°C. Dividing this by the square root of the number of independent stations (54 and 40 for the Atlantic and Pacific, respectively) gives a track-mean uncertainty of ±0.02°C for each individual basin, or ±0.014°C over the global track.
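The precision arithmetic above can be checked with a few lines of Python. This minimal sketch treats the 0.14°C precision as a per-reading standard deviation and the stations as independent, as the text does; pooling all 94 stations for the global track is a simplification that reproduces the quoted ±0.014°C.

```python
import math

PRECISION_C = 0.14  # single-reading thermometer precision quoted by RGG12, degC


def track_mean_uncertainty(n_stations, precision=PRECISION_C):
    """Standard error of a mean over n independent station readings."""
    return precision / math.sqrt(n_stations)


atlantic = track_mean_uncertainty(54)          # about 0.019 degC
pacific = track_mean_uncertainty(40)           # about 0.022 degC
global_track = track_mean_uncertainty(54 + 40) # about 0.014 degC
```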
 Sounding line bias is almost impossible to quantify precisely, because that would require knowledge of the sounding line's response to ocean currents (in particular the buoyancy and drag of the rope and thermometer), as well as of the local winds and currents at the time of observation. However, a simple analysis of the time-mean ocean temperature and circulation was made to identify the stations that may have been particularly prone to this error. The problem occurred where vertical current shear dragged the sounding line away from the ship, so that the true depth of the thermometer was less than that indicated by the line length; where there was also a strong local vertical temperature gradient, this would produce a positive bias in the Challenger measurement, and thus a spurious reduction in the Argo-Challenger difference. Using 1958–2008 time-mean temperature and velocity data from the Simple Ocean Data Assimilation (SODA) [Carton et al., 2000], the vertical temperature gradient and the velocity difference relative to the surface were calculated for each station depth and location; the temperature gradient and velocity shear were multiplied to give a sounding line bias parameter (°C s–1) combining both environmental factors (see Figure S1 in the auxiliary material). The overwhelming majority of observations have a sounding line bias parameter less than 0.01°C s–1, but there is a relatively small number of outliers with a larger parameter. Unsurprisingly, almost all of these outliers were at the shallowest depth (i.e., 100 fathoms, or ~180 m), where vertical gradients tend to be strong; 25 locations had a noticeable sensitivity to sounding line bias at 100 fm, most of which indicated a significant cooling in the RGG12 analysis, and a small number of 200 fm observations also showed a strong sensitivity.
All of these stations were equatorial, six in the Atlantic and the remainder in the strong shear region of the Pacific Equatorial Undercurrent [Cromwell, 1953]. Exclusion of just these 25 observations at 100 fm (i.e., 2% of the total) increased the 0–400 fm temperature difference by 0.06°C, approximately 17% of the total warming, giving some idea of the possible low bias in the RGG12 estimate.
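A sketch of the sounding line bias screening, under stated assumptions: the bias parameter is taken as the magnitude of the vertical temperature gradient times the magnitude of the horizontal velocity difference from the surface, and observations above the 0.01°C s–1 level are treated as flagged. The text identifies outliers rather than applying a fixed cutoff, so the threshold, the use of absolute values, and the function names are all assumptions.

```python
import math

FLAG_THRESHOLD = 0.01  # degC s^-1; assumed cutoff for illustration


def bias_parameter(dtdz, u, v, u_surf, v_surf):
    """|dT/dz| (degC m^-1) times |velocity minus surface velocity| (m s^-1),
    giving a sounding line bias parameter in degC s^-1."""
    shear = math.hypot(u - u_surf, v - v_surf)
    return abs(dtdz) * shear


def mean_excluding_flagged(differences, parameters, threshold=FLAG_THRESHOLD):
    """Mean Argo-minus-Challenger difference over unflagged observations only."""
    kept = [d for d, p in zip(differences, parameters) if p <= threshold]
    if not kept:
        raise ValueError("every observation was flagged")
    return sum(kept) / len(kept)
```

Because flagged shallow stations mostly show cooling, dropping them raises the remaining mean, which is the direction of the 0.06°C change reported above.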
 We now consider the spatial sampling error due to the incomplete coverage of the Challenger voyage. Figure 1 compares the Challenger observations with global maps of simulated warming and of observation-based warming trends from the objectively interpolated subsurface temperature and salinity analyses [Ishii et al., 2006, available online from http://rda.ucar.edu/datasets/ds285.3/] and from the 1958–2008 SODA data. The Argo-Challenger station temperature differences show a general pattern similar to the model ensemble and analyzed estimates, with strong warming concentrated in the Atlantic basin (especially the North Atlantic) and a weaker warming signal in the Pacific. Challenger stations that show cooling are concentrated in the equatorial Pacific, which we have shown to be prone to sounding line bias, and in the regions of the Kuroshio and Gulf Stream. Cooling in the North Atlantic is not matched by any of the three maps, but we note that in this region Gulf Stream-related transient eddies and meanders coincide with strong meridional temperature gradients, so station temperature differences there may be much more prone to short-term “noise” than elsewhere along the track. Most importantly, the maps show warming in the Indian and Southern Oceans, so it is important to test whether the lack of stations in these basins compromises the estimate of the global mean; we do so using the models and analyzed data shown in Figure 1.
 For this analysis, we define sampling error as the difference between the track-based estimate and the global mean at any given time. Spatial uncertainty in the Argo-Challenger temperature change could arise from track-related bias and from random error. The bias would come from any trend in the sampling error: a significant trend over time would imply that the Challenger track followed regions of weaker or stronger temperature change than the global average, and would therefore bias the track estimate of global mean temperature change. We take the random error as the standard deviation of the sampling error time series (the sampling error time series are shown in the auxiliary material, Figure S2). The four models indicated small biases, with magnitudes of approximately ±0.06°C for 0–400 fm over the 135 year period, much smaller than the 0.33 ± 0.14°C temperature difference reported by RGG12. The sampling biases for the SODA and Ishii et al. [2006] data were also small (0.04°C and 0.05°C, respectively). The biases were similar for the 0–1000 fm mean temperature, at ±0.06, 0.05, and 0.07°C for the models, SODA, and Ishii et al. [2006] data, respectively. The random sampling uncertainties of the models were in the range 0.04–0.06°C for 0–400 fm and 0.01–0.03°C for 0–1000 fm. The observation-based estimates have both higher variability and fewer observations, with sampling uncertainties for 0–400 fm of 0.08°C and 0.04°C for the SODA and Ishii et al. [2006] data, respectively, and 0.04°C and 0.03°C for 0–1000 fm. Assuming that the uncertainties of the two global track estimates are uncorrelated (and may thus be summed in quadrature), the total sampling error in the Challenger track estimate of the global mean temperature difference is bias + √(2 × uncertainty²). Taking the largest bias and uncertainty from the individual models and the assimilated data, the sampling error is 0.17°C for the 0–400 fm layer and 0.13°C for the 0–1000 fm layer.
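The total sampling error formula can be written out directly. This minimal sketch takes the magnitude of the bias and adds the quadrature sum of two equal, uncorrelated track uncertainties; with the largest values quoted above (0.06 and 0.08°C for 0–400 fm, 0.07 and 0.04°C for 0–1000 fm) it reproduces the 0.17°C and 0.13°C figures.

```python
import math


def total_sampling_error(bias, uncertainty):
    """|bias| + sqrt(2 * uncertainty**2): two uncorrelated track estimates
    (one per period) combined in quadrature, plus the track bias."""
    return abs(bias) + math.sqrt(2.0 * uncertainty ** 2)


layer_0_400 = total_sampling_error(0.06, 0.08)   # about 0.173 degC
layer_0_1000 = total_sampling_error(0.07, 0.04)  # about 0.127 degC
```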
These estimates are not increased appreciably when the instrument precision uncertainty is added. The resulting uncertainty is not greatly different from the error estimate of RGG12 (0.14°C for 0–700 m), and we conclude that the Challenger track provides a reasonable proxy for global ocean warming signals that are substantially larger than these uncertainties.
3.2 Significance of Observed Temperature Difference
 Figure 2 shows the profile of the Argo-Challenger temperature change, including 99% confidence intervals for the unforced temperature difference. At most depths the observed warming is outside the range of error and natural variability combined. The signal is particularly strong in the Atlantic, where the temperature change is statistically significant at all subsurface depths. Although natural variability below the mixed layer is smaller in the Pacific than in the Atlantic (as shown by the relative widths of their confidence intervals), the Pacific warming signal is much weaker, and there is even cooling below ~1350 m. For comparison, Figure 2 also shows the Argo-Challenger temperature difference from full model data using long-term trends, such that there are no temporal or spatial sampling errors in the model profiles. The simulations generally show less subsurface warming than observed, especially in the Atlantic basin, which may in part be due to short-term variability in the Challenger data. The most obvious difference between models and observations, however, is the warming minimum in both basins at 180 m (100 fm), which is very pronounced in the Pacific. Because the previous section identified this as the depth most sensitive to sounding line bias, the average change at each depth was recalculated excluding the 28 observations identified as being at risk of sounding line bias. This increased the 100 fm Pacific average difference by 0.48°C, and the global difference by 0.29°C, sufficient to smooth out the inflection point in both cases (black dashed line in Figure 2).
 The subsurface temperature change over time is compared with the simulations in Figure 3. The temperature change from 1873 to 1955 was calculated by subtracting the most recent estimate of 1955–2010 ocean warming [Levitus et al., 2012] from the Argo-Challenger difference, giving values of 0.2 ± 0.17°C and 0.16 ± 0.13°C for the 0–400 fm and 0–1000 fm depth layers, respectively. (The uncertainty here is the quadrature sum of the Levitus et al. [2012] and Argo-Challenger errors.) (Because the Levitus et al. [2012] interpolation scheme defaults to zero in data-sparse regions, some of which have shown significant warming [Gille, 2008], their estimate may be a lower bound on the true late 20th century warming. However, any such underestimate is likely to be small compared with the Challenger uncertainty.) As with the profiles, the warming in the historical simulations lies within the error bounds of the depth-averaged observations, but is somewhat smaller than observed (Figures 3a and 3b). A recent study by Gouretski et al. [2012] using optimally interpolated hydrographic data indicates a 0–400 m warming of 0.3 ± 0.1°C between 1900 and 1955, which seems more consistent with our 1873–1955 estimate than the simulated changes are. Furthermore, most of the models show a small cooling in the “historicalNat” simulations, which appears to stem from a slow recovery following major volcanic eruptions (Figure 3c). The design of volcanic forcing in the CMIP5 experiments has previously been identified as a cause of spurious cooling [Gregory et al., 2012], and when this naturally forced response is subtracted from the “historical” simulations, the agreement among the models improves, the agreement between models and observations improves, and the magnitude of simulated warming increases (Figures 3e and 3f). This suggests that warming in the historical experiment may be too low because of the representation of volcanic forcing in the models.
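The subtraction and quadrature step can be sketched as below. The Levitus et al. [2012] layer values are not reproduced in this section, so the usage line uses placeholder numbers only; the function name is hypothetical.

```python
import math


def earlier_change(total, total_err, later, later_err):
    """Change over the earlier interval = total change minus later change;
    independent errors combine as a quadrature (root-sum-square) sum."""
    return total - later, math.hypot(total_err, later_err)


# Placeholder inputs only, not values from this study
change, err = earlier_change(0.5, 0.3, 0.2, 0.4)  # about (0.3, 0.5)
```

Because the errors add in quadrature, a small later-period error barely widens the total, which is why the 1873–1955 uncertainties quoted above are essentially the Argo-Challenger sampling errors.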
(Because volcanic forcing is not included in the piControl simulations this effect has no bearing on the confidence interval estimates.)
 Figure 3 includes 99% confidence intervals for unforced simulations estimated using full model data rather than Challenger station subsampling, such that the whiskers indicate observation uncertainty and the grey shading indicates the uncertainty in estimating a global change from two time periods. The observed 135 year temperature change is well outside the range of normal natural variability in both layers. Our best guess of the ~80 year temperature change preceding 1955 is also well outside the 99% confidence intervals, but because of the conservative uncertainty ranges, the lower bound of the warming cannot be completely separated from natural variability. If the estimated 0.06°C sounding line bias were included in the temperature differences, the lower bound of the 80 year temperature change would lie outside the confidence intervals. As noted above, the “historical” simulations agree reasonably well with the observed temperature change, especially when the apparently spurious cooling is accounted for. However, none of the “historicalNat” simulations (Figures 3b and 3d) reproduces the magnitude of the 1873–1955 temperature change over any length of time within the 135 year period of the study (Figures 3c and 3d). This clearly indicates that the 135 year ocean warming estimated by RGG12 is part of a long-term anthropogenic signal, well outside the range of natural variability. Furthermore, while the conservative error estimates applied here preclude a complete separation of the 1873–1955 temperature change from natural variability, the model simulations over the same period suggest that it is very likely that this earlier temperature change is part of the same long-term anthropogenic signal.
 Using a number of coupled climate simulations, we have shown that the 135 year ocean temperature difference estimated by RGG12 indicates a long-term change that is significantly larger than simulated natural fluctuations. Although the Challenger survey stations are largely restricted to the Atlantic and Pacific equatorward of 45°, models and assimilated observations suggest that they give a valid proxy for the global mean temperature difference, albeit with a large uncertainty due to spatial sampling error. After accounting for these uncertainties, the magnitude of subsurface ocean warming along the track of the Challenger survey is well outside the normal range of simulated century-scale natural variability, and agrees reasonably well with modeled warming over the same period. We also provide a unique estimate of observed warming from 1873 to 1955, which is also well outside the 99% confidence limits but cannot be completely separated from natural variability because of the high sampling uncertainty. The estimated temperature difference is a lower-bound estimate because it does not correct for the negative bias due to sounding line error; a simple analysis suggests that this error may cause the true warming to be underestimated by roughly 0.06°C, approximately 17% of the difference reported in RGG12. Because many of the stations prone to sounding line bias are in the eastern Pacific, a region showing one of the strongest upper ocean warming trends globally, the true sounding line bias may in fact be larger. Although a formal attribution study has not been performed, all of the models reproduce this warming only when anthropogenic forcings are included in the simulation. This analysis strongly supports the conclusion of RGG12 that there has been century-scale warming of the global oceans, and furthermore that this warming is forced by human activity.
 A major motivation of this work was to consider the climate impacts of the detected warming prior to the well-analyzed period since the 1950s. Using salinity data from the World Ocean Atlas [Antonov et al., 2010], 1873–2010 changes in ocean heat content and thermosteric sea level were calculated from the Argo-Challenger temperature difference, and adjusted to an estimate for the 1873–1955 period using the Levitus et al. [2012] late 20th century values. The Argo-Challenger change implies a TOAnet imbalance of 0.17 ± 0.05 W m–2, which combined with the Levitus et al. [2012] estimate of 0.27 W m–2 gives an 1873–1955 mean TOAnet of 0.1 ± 0.06 W m–2. The 1873–2010 thermosteric sea level rise calculated from the Argo-Challenger temperature difference is 70 mm, which combined with the Levitus et al. [2012] estimate gives an 1873–1955 thermosteric rise rate of 0.50 ± 0.2 mm a–1. Reconstructions from historical tide gauge data give a total rise rate over the same period of 1.24 ± 0.2 mm a–1 [Church and White, 2011], implying a contribution from ocean mass changes of 0.74 ± 0.3 mm a–1 (Figure S3). This value is consistent with a recent attempt to close the budget of 20th century sea level rise [Gregory et al., 2012], which gave a total 1900–1970 contribution from glaciers, the Greenland ice sheet, and terrestrial storage of 0.49 ± 0.3 mm a–1 (no estimate exists for the Antarctic ice sheet). Notably, in the absence of observational estimates, Gregory et al. [2012] had to rely on model simulations to calculate the early 20th century thermosteric rise, highlighting the need for studies such as RGG12 that incorporate historical records. The work presented here shows that RGG12's use of such measurements provides a robust and useful observational estimate of early 20th century change, which we hope will be of value for future studies of the global sea level and energy budgets.
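One way to combine the full-period and late-period values is a length-weighted difference of period means. The text does not spell out the exact procedure, so the sketch below is an assumption; with the quoted 0.17 and 0.27 W m–2 it gives about 0.10 W m–2 for 1873–1955, consistent with the stated 0.1 W m–2. The same sketch forms the ocean mass residual from the tide gauge and thermosteric rates, combining their two ±0.2 mm a–1 uncertainties in quadrature (about ±0.28, rounded to ±0.3). The uncertainty on the TOAnet value is not reproduced here.

```python
import math


def early_period_mean(full_mean, late_mean, t0, t_split, t_end):
    """Mean over [t0, t_split] recovered from the mean over [t0, t_end] and the
    mean over [t_split, t_end], weighting each period mean by its length (years)."""
    full_len = t_end - t0
    late_len = t_end - t_split
    early_len = t_split - t0
    return (full_mean * full_len - late_mean * late_len) / early_len


# TOAnet in W m-2: Argo-Challenger (1873-2010) and Levitus et al. (1955-2010)
toa_1873_1955 = early_period_mean(0.17, 0.27, 1873, 1955, 2010)  # about 0.103

# Sea level rates in mm a-1 for 1873-1955
total_rate, total_err = 1.24, 0.2    # tide gauge reconstruction
thermo_rate, thermo_err = 0.50, 0.2  # thermosteric estimate
mass_rate = total_rate - thermo_rate           # about 0.74
mass_err = math.hypot(total_err, thermo_err)   # about 0.28
```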
 The authors wish to thank two anonymous reviewers for their helpful suggestions on improving this manuscript. This work was carried out in part at the Jet Propulsion Laboratory, California Institute of Technology under a contract with the National Aeronautics and Space Administration, and supported in part by the Australian Research Council Centre of Excellence for Climate System Science (grant CE110001028).