The evolution of ocean temperature measurement systems is presented with a focus on the development and accuracy of two critical devices in use today (expendable bathythermographs and conductivity-temperature-depth instruments used on Argo floats). A detailed discussion of the accuracy of these devices and a projection of the future of ocean temperature measurements are provided. The accuracy of ocean temperature measurements is discussed in detail in the context of ocean heat content, Earth's energy imbalance, and thermosteric sea level rise. Up-to-date estimates are provided for these three important quantities. The total energy imbalance at the top of atmosphere is best assessed by taking an inventory of changes in energy storage. The main storage is in the ocean, the latest values of which are presented. Furthermore, despite differences in measurement methods and analysis techniques, multiple studies show that there has been a multidecadal increase in the heat content of both the upper and deep ocean regions, which reflects the impact of anthropogenic warming. With respect to sea level rise, mutually reinforcing information from tide gauges and radar altimetry shows that presently, sea level is rising at approximately 3 mm yr−1 with contributions from both thermal expansion and mass accumulation from ice melt. The latest data for thermal expansion sea level rise are included here and analyzed.
The broad topic of climate science includes a multitude of subspecialties that are associated with various components of the climate system and climate processes. Among these components are Earth's oceans, atmosphere, cryosphere, and terrestrial regions. Processes include all forms of heat transfer and fluid mechanics within the climate system, changes to thermal energy of various reservoirs, and the radiative balance of the Earth. The incredible diversity of climate science makes it nearly impossible to cover all aspects in a single manuscript, except perhaps in massive assessment reports [e.g., Intergovernmental Panel on Climate Change (IPCC), 2007]. Nevertheless, it is important to periodically provide detailed surveys of the aforementioned topical areas to establish the current state of the art and future directions of research.
It is firmly established that changes to the Earth's atmospheric concentrations of greenhouse gases can cause, and have caused, a global change to the stored thermal energy in the Earth's climate system [Hansen et al., 2005; Levitus et al., 2001]. To assess the impact of human emissions on climate change and to evaluate the overall change to Earth's thermal energy (whether from natural or human causes), it is essential to comprehensively monitor the major thermal reservoirs. The largest thermal reservoirs are the Earth's oceans; because of their large total volume and high heat capacity, they require a larger injection of energy than other reservoirs to produce the same change in temperature.
Despite the importance of accurately measuring the thermal energy of the ocean, it remains a challenging problem for climate scientists. Measurements covering extensive spatial and temporal scales are required for a determination of the energy changes over time. While there have been significant advancements in the quantity and quality of ocean temperature measurements, coverage is not yet truly global. Furthermore, past eras of ocean monitoring have provided extensive data but variable spatial coverage. Finally, changes in measurement techniques and instrumentation have introduced biases, many of which have been identified and at least partially accounted for.
This review focuses on subsurface ocean temperature measurements that are required for climate assessment, with an emphasis on the status of oceanographic temperature measurements as obtained from two of the key historical and modern measurement instruments. Those instruments (the expendable bathythermograph (XBT) and the Argo floats) are among the most important instruments for assessing ocean temperatures globally, and they provide up-to-date ocean subsurface temperature measurements. A historical discussion of other families of probes will also be provided along with discussions of the accuracy of those families.
While most of the analyses reviewed here are done by individuals or small groups of investigators, they would not have been possible without strong international coordination and cooperation. International observational programs and projects are vital sources of the data used in these analyses. Early examples are the International Geophysical Year in 1957–1958, with its extensive Nansen bottle sections, and the 1971–1980 International Decade of Ocean Exploration, which endorsed the North Pacific Experiment (greatly increasing North Pacific shallow XBT use in the 1970s) and the Geochemical Ocean Sections Study (a global, high-quality, full-depth, if sparse, baseline oceanographic survey).
Since its inception, the World Climate Research Program (WCRP) has taken international leadership. Its Tropical Ocean-Global Atmosphere project focused on observations in the equatorial region in the 1980s, including initiating the Tropical Atmosphere Ocean/Triangle Trans-Ocean Buoy Network (TAO/TRITON) moored array, and its World Ocean Circulation Experiment (WOCE) took a truly global set of coast-to-coast, full-depth oceanographic sections and expanded the XBT network in the 1990s. WOCE provides a global-scale benchmark against which change can be assessed. More recently, the WCRP formulated the Climate Variability and Predictability Project (CLIVAR), further fostering the Argo float array and the reoccupation of some of the full-depth WOCE hydrographic sections under the auspices of the Global Ocean Ship-Based Hydrographic Investigations Program. The Global Climate Observing System, in partnership with WCRP, has formulated a global ocean observing system and encouraged contributions to it, particularly through the OceanObs workshops in 1999 and 2009.
Oceanographic data centers, both national and international, are also vital to the studies reviewed here. These centers accept, collect, and actively seek out data (from large programs and small); archive and quality control them; and make the results readily and publicly available. The collection, assembly, and quality control of a comprehensive data set are invaluable for all sorts of global analyses, including those of ocean temperature, heat content, and thermal expansion.
2 The Evolving Subsurface Temperature Observing System: A Historical Perspective
An understanding of ocean heat content changes is only as good as the subsurface ocean temperature observations upon which these calculated changes are based. The subsurface temperature observing system is still relatively young when compared to atmospheric observing systems. What follows is a look at the developments and ideas that enabled implementation and precipitated changes in the observing system. As a guide, Figure 1 shows geographical coverage during the height of each iteration of the observing system.
2.1 Early Measurements (From 1772)
On Captain James Cook's second voyage (1772–1775), water samples were obtained from the subsurface Southern Ocean, and it was found that surface waters were colder than waters at 100 fathoms (~183 m) [Cook, 1777]. These measurements, although not very accurate, are among the first instances of oceanographic profile data recorded and preserved. Slightly more than 100 years later, the Challenger expedition (1873–1876) circumnavigated the globe, taking temperature profiles from the surface to the ocean bottom along the way, ushering in an increased interest in subsurface oceanography and new technology developments that facilitated measurement. The Challenger was equipped with a pressure-shielded thermometer [Anonymous, 1870; Wollaston, 1782; Roemmich et al., 2012] to partially counteract the effect of pressure on thermometer readings at great depths.
2.2 The Nansen Bottle Observation System (From 1900)
Around the time of the Challenger expedition, the reversing thermometer [Negretti and Zambra, 1873] was introduced and remained the standard instrument for subsurface temperature measurements until 1939. It is still in limited use today. A protected reversing thermometer was typically accurate to 0.01°C or better when properly calibrated. Pairs of protected and unprotected reversing thermometers were used to determine temperature and pressure, with pressure determined to an accuracy of ±5 m depth in the upper 1000 m. The development of the Nansen bottle [Mill, 1900; Helland-Hansen and Nansen, 1909] which attached the thermometers to a sealed water sample bottle completed the instrumentation package which constituted the subsurface upper ocean temperature observing system for the 1900–1939 time period. The problems during this time period with regard to a global ocean observing system were that Nansen bottle/reversing thermometer systems could only measure at a few discrete levels at each oceanographic station and that it was time consuming to deploy the instrumentation and make the measurements. It was also difficult to get properly equipped ships to most areas of the ocean. Many of the open ocean temperature profiles were measured during a small number of major research cruises [Wust, 1964]. Hence, the long-term mean seasonal variations, the year-to-year variance, and vertical structure of the ocean were not well described.
2.3 Mechanical Bathythermograph Observation System (From 1939)
Quickly and accurately mapping the temperature variation of the upper ocean became a military priority in the lead-up to World War II for the accurate interpretation of sonar readings to locate submarines and their potential hiding places. As related in Couper and LaFond, sonar operators were aware of an “afternoon effect,” where sonar ranges were shorter in the afternoon than in the morning, but did not understand that the effect was due to diurnal warming. The wide vertical spacing of Nansen bottle casts did not capture the gradients at the bottom of the mixed layer or indeed the vertical extent of the mixed layer.
Early in the 1930s, Carl-Gustaf Rossby had experimented with an “oceanograph” which could draw a continuous pressure/temperature trace on a smoked brass foil [Rossby and Montgomery, 1935]. Rossby enlisted Athelstan Spilhaus to develop this idea into a cheap, reliable, reusable instrument. Spilhaus created the first version of the instrument that we now call the mechanical bathythermograph (MBT) [Spilhaus, 1938]. Oceanographers now had the means with which to acquire detailed sets of measurements to map the mixed layer and shallow thermocline [Spilhaus, 1940].
The U.S. Navy funded research to improve the design and operation of the MBT, as Drs. Vine, Ewing, and Worzel modified Spilhaus's design to allow operational use of the instrument by the Navy and oceanographers [Spilhaus, 1987]. The U.S. Navy, in conjunction with the Scripps Institution of Oceanography and the Woods Hole Oceanographic Institution, facilitated the first coordinated worldwide subsurface temperature measurement system, which grew up during World War II and continued afterward. The MBT itself is a cylinder approximately 31.5 inches (~0.8 m) long and 2 inches (~0.05 m) in diameter with a nose weight, towing attachment, and tail. Inside the cylinder is a Bourdon tube enclosing a capillary tube filled with xylene (a hydrocarbon obtained from wood or coal tar). As temperature increases, the xylene expands and the pressure inside the tube rises, causing the Bourdon tube to unwind. A stylus attached to the Bourdon tube records this movement as a horizontal (temperature) displacement scratched on a plate of smoked glass. Simultaneously, a spring and piston that respond to pressure move the stylus vertically relative to the glass, completing the depth/temperature profile. The instrument free-falls from a winch that is also used to recover it; it can be used at ship speeds up to 15 kt. Initially, MBTs were built to reach depths of 400 feet (~122 m). By 1946, MBTs could reach 900 feet (~275 m), although the shallower version was deployed more often in every year except 1964 (when it accounted for 49% of deployments). The 900 foot MBTs had significant depth calibration issues if they were lowered the full 900 feet, and for this reason, most MBTs were not lowered deeper than 400–450 feet. The accuracy of the MBT instrument was ±5 dbar in pressure and ±0.3°C in temperature.
The Navy's interest in MBTs was for temperature gradient information, but a system of careful calibration was put in place to accurately preserve the full temperature information for future study. Later, more than 1.5 million MBT temperature traces from 1939 to 1967 were digitized at 5 m intervals and stored on index cards. These cards were, in turn, electronically digitized and archived at the U.S. National Oceanographic Data Center [Levitus, 2012]. It was reported that 73% of all 1939–1967 MBTs were U.S. devices, but other countries, notably Japan and the Soviet Union, also dropped MBTs. However, these traces were not distributed under the U.S. Navy system. MBTs continued to be used after 1967, with ~800,000 traces gathered in 1968–1990. Geographic coverage of MBTs was limited by areas of interest to navies, merchant ship routes, and research cruises. So, while a sketch of the upper ocean waters was being recorded by the MBT network, geographic distribution was uneven and temperature measurements from depths deeper than 250 m were still reliant on sparse Nansen bottle observations.
The development of the salinity-temperature-depth (STD) and later the conductivity-temperature-depth (CTD) instruments augmented existing observations by eventually replacing the discrete reversing thermometer observations with continuous profiles of temperature. The development of the CTD also laid the groundwork for our current observing system and for the backbone large-scale measurement cruises of the World Ocean Circulation Experiment (WOCE) among others. But, since it was an instrument that was mainly deployed from research ships, the CTD could not replace the MBT observing network. The development of the CTD was precipitated by advances in temperature measurement before and during World War II. The basic physical concept of a thermal resistor was known as early as 1833 when Faraday noted that the conductivity of certain elements was affected by changes in temperature [Faraday, 1833]. However, it was not until 1946 that technological advances made commercial production of these thermal resistors (coined “thermistors”) possible [Becker, 1946]. Similarly, platinum resistance thermometers, which had been understood for some time [Callendar, 1887], became practical for oceanographic applications owing to more recent technological advances [Barber, 1950].
An early attempt to measure a continuous temperature profile [Jacobsen, 1948] inspired Hamon and Brown [Hamon, 1955; Hamon and Brown, 1958] to engineer a similar instrument. Hamon and Brown deployed their first STD in 1955 [Baker, 1981]. Their instrument, which was lowered by a winch, used a thermistor, as well as a conductivity sensor and pressure sensor connected by a sealed cable to an analog strip chart on deck. The pressure sensor was a Bourdon tube connected to a potentiometer. Commercial production of CTDs began in 1964. Brown later modified the CTD design to use both a fast-response thermistor and a platinum resistance thermometer as well as a wire strain gauge bridge transducer to measure pressure in order to correct transients in the conductivity signal [Brown, 1974]. Most modern CTDs now use thermistors, often in pairs, and strain gauge pressure sensors. While Hamon's original STD experiments had an accuracy of 0.1°C and 20 m in depth, the modern CTD is accurate to 0.001°C and 0.15% of full scale for pressure (1.5 m at 1000 m depth) and fully digital. Modern shipboard CTD temperature sensors have a time response of 0.065 s (compared to 0.2–0.4 s for the MBT stylus), which allow the acquisition of accurate pressure/temperature profiles at a fairly rapid deployment rate from the surface to the deep ocean. When combined with the lowering speed (~1 m s−1), a vertical resolution of 0.06 m is obtained, although in practice, data are often reported in 1 or 2 m averages, since ship-roll-induced motions alias the temperature data on finer vertical scales.
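The vertical-resolution figure above is essentially the sensor time constant multiplied by the lowering speed (0.065 s × ~1 m s−1 ≈ 0.06 m), and the reported 1 or 2 m values are bin averages of the raw samples. The sketch below illustrates both steps under simplifying assumptions: the time constant is treated as the only limit on resolution (ship roll and sensor lag mismatch are ignored), pressure in dbar is treated as depth in meters, and `vertical_resolution` and `bin_average` are hypothetical helper names rather than part of any CTD processing package.

```python
from collections import defaultdict
from statistics import mean


def vertical_resolution(response_time_s: float, lowering_speed_m_s: float) -> float:
    """Distance (m) the sensor travels during one time constant."""
    return response_time_s * lowering_speed_m_s


def bin_average(pressures_dbar, temps_c, bin_size=1.0):
    """Average raw samples into fixed pressure bins, keyed by bin centre.

    Averaging over 1-2 m suppresses the fine-scale aliasing caused by
    ship-roll-induced motion of the instrument.
    """
    bins = defaultdict(list)
    for p, t in zip(pressures_dbar, temps_c):
        bins[int(p // bin_size)].append(t)
    return {(k + 0.5) * bin_size: mean(v) for k, v in sorted(bins.items())}


# Values quoted in the text: 0.065 s time constant, ~1 m/s lowering speed
print(f"{vertical_resolution(0.065, 1.0):.3f} m")  # 0.065 m
print(bin_average([0.2, 0.7, 1.1, 1.9], [10.0, 12.0, 8.0, 9.0]))  # {0.5: 11.0, 1.5: 8.5}
```

The binning step trades the nominal ~6 cm resolution for robustness against the instrument's heave with the ship.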
2.5 The Expendable Bathythermograph Observing System (From 1967)
As Snodgrass relates, by the early 1960s, the search was on for a replacement for the MBT. The replacement needed to be cheaper, easier to deploy and calibrate, and easier to retrieve data from, and it had to be able to profile deeply from ships moving faster than 15 knots. Technological advances in wire and wire insulation made it possible to create an instrument electrically connected to the ship and able to transmit information through a thin conducting wire. Advances in thermistor manufacture made it practical to deploy these temperature sensors cheaply, with no need to retrieve instruments after deployment. More than 12 companies attempted to create the expendable bathythermograph (XBT). Three succeeded, but only one, Sippican (now Lockheed Martin Sippican (LMS)), went on to dominate the XBT market after winning a contract with the U.S. Navy [Kizu et al., 2011]. Its design was a torpedo-shaped probe, smaller than the MBT, with a thermistor mounted in a central hole through the zinc nose. A wire connected the probe to the ship deck: part of the wire is wound on a spool on the XBT itself and part on a spool in a canister on board the ship.
U.S. Navy traces were sent to the Fleet Numerical Weather Center (FNWC) where they were digitized, used for weather prediction and other projects, and then passed to the U.S. National Oceanographic Data Center (NODC) for archive and public release [Magruder, 1970]. About 60% of all publicly available XBT data in 1967–1989 were U.S. drops. In 1990, a global system of distributing XBT data was implemented (see below discussion of the Global Temperature and Salinity Profile Program (GTSPP)).
The new probe almost immediately revolutionized subsurface ocean temperature observations with its low cost and easy deployment from Navy, merchant, and research ships. Estimates of upper ocean global mean yearly heat content anomaly exhibit reduced sampling uncertainty starting from around 1967, the first year of widespread use of the XBT [Lyman and Johnson, 2008; Boyer et al., 1998]. The success of the XBT and the concurrent Fleet Numerical Weather Center (FNWC) Ship-of-Opportunity Program (SOOP) led to more systematic designs of XBT observing networks for the Pacific [White and Bernstein, 1979] and the Atlantic [Bretherton et al., 1984; Festa and Molinari, 1992], which were implemented and continue today. The switch to digital recorders in the 1980s made the use and dissemination of XBT data even easier.
With the advent of the ARGOS positioning and data transmission system, set up by the French and U.S. space agencies in 1978, XBT profiles began to be transmitted from ships in real time and distributed on the World Meteorological Organization's Global Telecommunication System (GTS). The Global Temperature and Salinity Profile Program (GTSPP) began in 1990 to systematically capture subsurface temperature data from the GTS, perform quality control, and distribute XBT temperature profiles (and other subsurface data) to the scientific and operational communities in near-real time. The XBT response time (0.15 s) is slower than that of modern shipboard CTDs, and its accuracy is likewise lower: 0.15°C in temperature and 2% of depth or 5 m, whichever is greater. LMS is still the main manufacturer of XBTs. TSK, a Japanese company (Tsurumi-Seiki Co.), started manufacturing T6s in 1972 and T7s in 1978 [Kizu et al., 2011]; these designators belong to a letter/number naming scheme that identifies probe types. A Canadian company, Sparton, also briefly manufactured XBTs of its own design.
Despite their widespread use, XBTs are not free of problems; Section 3 of this review discusses these problems in detail. From 1967 to 2001, the XBT was a major contributor to the subsurface temperature observing system and was responsible for the growth of this system. However, it was still limited to major shipping routes and Navy and research cruise paths, leaving large parts of the ocean undersampled for many years. The XBT is also depth limited. While there are deep-falling XBTs such as the T5 that reach nearly 2000 m, they are of limited use due to their cost and the lower ship speed required for the drops.
There is another expendable probe, the XCTD, that simultaneously measures conductivity and temperature. It is available from TSK; however, it has been deployed in far smaller numbers than the XBT devices described here.
2.6 Tropical Moored Arrays (From 1984)
The tropical moored arrays were set up to continuously monitor the tropical ocean. The first tropical moored array, the Tropical Atmosphere Ocean (TAO) array (later TAO/TRITON), was set up to help monitor and understand the El Niño phenomenon [McPhaden et al., 1998]. After initial experiments in 1979, an array of moored buoys, spaced at 2°–3° latitude and 10°–15° longitude, was set up across the equatorial Pacific. Work began on the array in 1984, and it was completed in 1994. The temperature sensor is often just a thermistor but is sometimes paired with a conductivity or pressure sensor, depending on geographic location and depth. The sensors are attached to each buoy's mooring line at depths ranging from the surface to 500 m. The measurements are relayed to a satellite and then to the GTS at 12 min intervals. The TAO/TRITON array requires regular maintenance and calibration cruises.
The PIRATA array (Pilot Research Moored Array in the Tropical Atlantic) [Bourles et al., 2008] was set up in the Atlantic starting in the mid-1990s. The RAMA array (Research Moored Array for African-Asian-Australian Monsoon Analysis and Prediction) [McPhaden et al., 2009], begun in the Indian Ocean in the early 2000s, is still not complete. Both follow setup and data transmission patterns similar to those of TAO/TRITON. The TAO/TRITON array is important for local heat content calculations [e.g., Xue et al., 2012]; excluding even one meridional line of buoys from the heat content calculation during the 1997–1998 El Niño led to a significant underestimate of the heat content anomaly.
2.7 Argo Profiling Float Observing System (From 2001)
By the 1990s, all the pieces were in place for a global ocean observing system: a scientifically based blueprint for systematic observations, a satellite network for real-time data delivery, technology for easy and accurate temperature and pressure (depth) measurements, and a reliable data distribution network. But because most measurements still had to be taken from ships, the observing system remained geographically limited, seasonally biased, and often costly to outfit and deploy. As with previous obstacles to the observing system, the answer to these limitations lay in a combination of older ideas and new technological applications. The Swallow float was a neutrally buoyant float developed in the 1950s [Swallow, 1955]. These floats sank to a neutrally buoyant level and were tracked by a nearby surface ship. Later, the SOFAR (Sound Fixing and Ranging) float [Webb and Tucker, 1970; Rossby and Webb, 1970] improved on this system by enabling tracking of the float by underwater listening devices. In the 1980s, the RAFOS float reversed this idea by having the float listen for stationary underwater sound sources [Rossby et al., 1986].
The Autonomous Lagrangian Circulation Explorer removed the need for a system of underwater sound sources entirely by having the float surface periodically so that its position could be determined by ARGOS satellites [Davis et al., 1992]. The floats surface by increasing their buoyancy relative to the surrounding water, transferring mass and volume between the float's pressure case and an external bladder; the process is reversed for submersion. It was then a relatively simple step to add CTD sensors to the float to record pressure, temperature, and salinity as the float profiled from depth to the surface and to transmit this information to a satellite [Davis et al., 2001]. The accuracy of temperature and pressure measurements is that of the attached CTDs (0.002°C, 2.4 dbar). A blueprint for constructing and maintaining an observing system with these floats was set forth in 1998 [Argo Science Team, 1998], and the Argo Program was born. This program, which moved beyond regional float deployments in 2001, scaled up to global coverage (ice-free ocean outside of marginal seas) by 2005 and reached its goal of 3000 functioning floats in 2007 [Roemmich et al., 2009]. The expected lifetime of an Argo float is 3–5 years, so the fleet must be continually renewed to maintain the 3000 float goal.
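The need for continual renewal can be quantified with a steady-state argument (Little's law): if the fleet size is held constant, the average number of floats that must be deployed each year equals the fleet size divided by the mean float lifetime. A minimal sketch, assuming a steady state and using the 3000-float goal and the 3–5 year lifetimes quoted above; the function name is illustrative only.

```python
def annual_deployments_needed(fleet_size: int, mean_lifetime_years: float) -> float:
    """Steady-state deployment rate (floats per year) that keeps the fleet constant.

    By Little's law, fleet_size = deployment_rate * mean_lifetime, so the rate is
    fleet_size / mean_lifetime, whatever the shape of the lifetime distribution.
    """
    if mean_lifetime_years <= 0:
        raise ValueError("lifetime must be positive")
    return fleet_size / mean_lifetime_years


for lifetime in (3, 4, 5):
    print(lifetime, round(annual_deployments_needed(3000, lifetime)))
# 3 1000
# 4 750
# 5 600
```

Under these assumptions, a 3000-float array needs roughly 600 to 1000 new floats per year simply to hold its size.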
The floats operate on a nominal 10 day cycle. They drift subsurface (usually at 1000 dbar) for most of that cycle. Each cycle, they dive to a nominal 2000 dbar target and typically measure pressure, temperature, and salinity from there to the surface where the information is transmitted to a satellite. The Argo Program is a holistic program which does not end with satellite transmission. The data are released on the GTS but also collected at data assembly centers. The floats are constantly monitored, with internationally agreed standards of quality control applied to their data both in real-time and delayed modes. So, the Argo Program governs the floats from deployment planning through quality control and dissemination, a true end-to-end observation system.
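The nominal 10 day cycle sets the sampling rate of the array. As a back-of-the-envelope sketch (assuming every float completes every cycle, which real floats do not always do), the number of profiles per year for the 3000-float array, and per float over a 3–5 year lifetime, follows directly; the function names are hypothetical.

```python
DAYS_PER_YEAR = 365.25


def profiles_per_year(active_floats: int, cycle_days: float = 10.0) -> float:
    """Profiles collected per year if each float completes one profile per cycle."""
    return active_floats * DAYS_PER_YEAR / cycle_days


def profiles_per_float(lifetime_years: float, cycle_days: float = 10.0) -> float:
    """Profiles a single float returns over its lifetime."""
    return lifetime_years * DAYS_PER_YEAR / cycle_days


print(round(profiles_per_year(3000)))                              # 109575
print(round(profiles_per_float(3)), round(profiles_per_float(5)))  # 110 183
```

The array therefore delivers on the order of 100,000 profiles per year, while each float contributes only about 110 to 180 profiles before it must be replaced.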
2.8 Summary of Ocean Temperature Measurements
The subsurface temperature observing system has evolved from ad hoc, low-vertical-resolution sampling of the ocean with Nansen bottles; to the more systematic, but low-accuracy, depth-limited, and geographically limited coverage of the MBT; to the XBT, the first sustained observing system with spatial coverage sufficient to reduce errors in global upper ocean heat content calculations; and finally to the systematic, tightly controlled, seasonally unbiased, near-global upper ocean coverage of the Argo floats. Interspersed within the main observing system data are high-quality bottle and CTD temperature measurements from projects such as WOCE (1990–1998). Historical studies of ocean heat content and other related variables need to take into consideration the changes in the observing system and the limitations of the system during each time period to fully interpret their results. Gliders, undulating CTDs, and sensor-outfitted animals are already starting to extend and expand the observing system, and full-depth Argo floats are under development, with the goal of allowing an ever-improving understanding of ocean heat content variability and its place in the Earth's climate system.
3 The Expendable Bathythermograph (XBT)
3.1 The XBT: The History of the Instrument and Its Accuracy
3.1.1 The XBT Instrument
As discussed earlier, an XBT is a probe that measures temperature as it free-falls through the water column. XBTs were originally designed for military use, to determine the thermal structure of the upper ocean. However, they have subsequently been widely used for nonmilitary applications [Campbell et al., 1965] and were the dominant oceanographic instrument for collecting upper ocean temperature profiles from the 1970s to the 1990s [Johnson and Wijffels, 2011].
A variety of different types of XBT have been manufactured to meet different needs [Lockheed Martin Sippican, Inc., 2005]. The types differ in the maximum depth they can reach and in the maximum ship speed at which the claimed depth range is guaranteed. The most common types of XBTs are the T4 and T6 models, designed to reach 460 m, and the T7 and Deep Blue (DB) models that are nominally able to reach 760 m. The two subtypes within each of the depth categories are designed for ships moving at different speeds. There are also other types, including probes designed for longer (T5; 1830 m) and shorter (T10; 200 m) depth ranges and for greater vertical resolution (T11). According to Ishii and Kimoto [2009, Table 1], approximately 23% of XBTs are known to be T4s or T6s manufactured by LMS (and 2% by TSK), while 21% are T7s or DBs manufactured by LMS with 1% by TSK.
Unfortunately, for about 51% of XBT profiles in the historical archives, the type is unknown (Figure 2). After 2000, most XBT profiles have information on probe type. However, before 2000, the unknown-type profiles constitute at least half of the data set; the proportion is about 17% and 62% for deep and shallow XBTs, respectively. The percentage varies from year to year, and the number of unknown-type profiles peaks near 1990. The geographical distribution of the unknown XBTs (Figures 3a and 3b) shows that they occur across all the oceans, although their frequency is relatively low in the northern Pacific Ocean.
Unlike other oceanographic instruments such as MBTs, reversing thermometers, or CTDs deployed from ships or mounted on profiling floats, XBT probes do not measure pressure (depth). Instead, the XBT sample depth D is inferred from the time t elapsed since the probe hits the water, using a fall rate equation (FRE), often expressed as
$$D = a\,t - b\,t^{2}$$
where a represents the speed of the probe as it enters the water and b describes the deceleration of the probe principally due to the reduction in probe mass as the spool of wire on the XBT unwinds. The fall rate coefficients as estimated by LMS for T4, T6, T7, and DB probe types are a = 6.472 m s−1 and b = 2.16 × 10−3 m s−2.
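Evaluating the FRE is straightforward. The sketch below uses the LMS coefficients quoted above and shows how a depth is assigned to each temperature sample purely from its timestamp. The function name and the 10 Hz sampling rate in the example are illustrative assumptions, not manufacturer specifications.

```python
LMS_A = 6.472     # m s^-1, probe speed on entering the water (LMS)
LMS_B = 2.16e-3   # m s^-2, deceleration term (LMS)


def fre_depth(t_s: float, a: float = LMS_A, b: float = LMS_B) -> float:
    """Depth (m) inferred from time t_s (s) elapsed since the probe hit the water."""
    return a * t_s - b * t_s ** 2


# Assumed 10 Hz sampling: each temperature sample gets a depth from its timestamp
sample_rate_hz = 10.0
times = [i / sample_rate_hz for i in range(0, 601, 100)]  # 0, 10, ..., 60 s
for t in times:
    print(f"t = {t:4.0f} s  ->  D = {fre_depth(t):6.1f} m")
```

No depth sensor is involved at any point, so any error in a or b maps directly into a depth error that grows with elapsed time.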
3.1.2 Methods of Determining the Biases of XBT Data
LMS quotes the accuracy of XBTs as 5 m or 2% of depth (whichever is greater) in depth and ±0.2°C in temperature; these values are slightly larger than those given in other reports on XBT accuracy. However, numerous studies of the accuracy of XBT data have revealed systematic errors that exceed the manufacturer-specified limits.
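The manufacturer's depth specification switches from an absolute to a relative tolerance at the depth where 2% equals 5 m, i.e., 250 m. A minimal sketch of checking an XBT sample against a reference measurement (for example, a side-by-side CTD) under that specification; the helper names, and the choice of evaluating the tolerance at the reference depth, are assumptions made for illustration.

```python
DEPTH_ABS_TOL_M = 5.0   # absolute depth tolerance (m)
DEPTH_REL_TOL = 0.02    # relative depth tolerance (2% of depth)
TEMP_TOL_C = 0.2        # temperature tolerance (deg C)


def depth_tolerance(depth_m: float) -> float:
    """5 m or 2% of depth, whichever is greater."""
    return max(DEPTH_ABS_TOL_M, DEPTH_REL_TOL * depth_m)


def within_spec(xbt_depth_m, ref_depth_m, xbt_temp_c, ref_temp_c) -> bool:
    """True if both depth and temperature errors fall inside the quoted limits."""
    depth_ok = abs(xbt_depth_m - ref_depth_m) <= depth_tolerance(ref_depth_m)
    temp_ok = abs(xbt_temp_c - ref_temp_c) <= TEMP_TOL_C
    return depth_ok and temp_ok


crossover = DEPTH_ABS_TOL_M / DEPTH_REL_TOL
print(crossover)                              # 250.0
print(within_spec(705.0, 700.0, 10.1, 10.0))  # True
print(within_spec(720.0, 700.0, 10.1, 10.0))  # False
```

A systematic error can stay inside these per-sample limits and still matter for climate work, because it does not average out over many profiles.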
In order to quantify XBT biases, one needs to select unbiased data as a reference. A number of methods have been used to assess XBT biases. The results obtained in each study can be crudely subdivided into the following categories:
Side-by-side studies. The XBT probes and a CTD instrument are deployed concurrently from the same platform. The quality data from the CTD are then used to assess the accuracy of the XBT. The method gives detailed information on biases for specific vintages of XBTs, but this information cannot be used with confidence to correct the majority of XBT profiles archived that are lacking critically important metadata [Flierl and Robinson, 1977; Fedorov et al., 1978; Hanawa and Yoritaka, 1987; Hallock and Teague, 1992; Thadathil et al., 2002; Reseghetti et al., 2007; Reverdin et al., 2009; Kizu et al., 2011; Cowley et al., 2013].
Comparison of XBT data with quasi-colocated and quasi-simultaneous reference data. In this method, XBT profiles are compared with CTD/bottle measurements drawn from a large-scale historical database such as the World Ocean Database (WOD) [Boyer et al., 2009]. The pairs are not strictly colocated and simultaneous but lie within a specified spatial and temporal distance of each other (e.g., within 1° and 1 month). The sample size available for the bias calculation is therefore larger than when relying only on side-by-side studies, which partly compensates for the additional uncertainty, caused by mesoscale and seasonal signals, that arises from using data less closely colocated in time and position [Ishii and Kimoto, 2009; Gouretski and Koltermann, 2007; Levitus et al., 2009; Gouretski and Reseghetti, 2010; DiNezio and Goni, 2010; Hamon et al., 2011; Wijffels et al., 2008]. The use of a robust statistic, the median, helps reduce this uncertainty [Levitus et al., 2009].
Use of bathymetric data. In order to assess depth biases, comparisons have been made between the last sampled XBT depth in water shallower than the maximum depth of the instrument to the bathymetry. Results of such comparisons are contained in Good  and Gouretski .
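The quasi-colocated comparison described in the second category above can be illustrated with a minimal Python sketch. It assumes each profile has already been reduced to a single temperature at one standard depth and stored as a dictionary with hypothetical keys `lat`, `lon`, `time`, and `temp`; the 1° and 30-day windows stand in for the "1° and 1 month" example in the text. A real analysis would compare full profiles level by level and use a faster spatial search than this brute-force double loop.

```python
from datetime import datetime
from statistics import median


def find_pairs(xbt_profiles, ref_profiles, max_deg=1.0, max_days=30.0):
    """Return (xbt, ref) pairs closer than max_deg in both latitude and
    longitude and closer than max_days in time (brute-force search)."""
    pairs = []
    for x in xbt_profiles:
        for r in ref_profiles:
            dlat = abs(x["lat"] - r["lat"])
            # Wrap the longitude difference into [-180, 180) so that profiles
            # on opposite sides of the dateline are compared correctly.
            dlon = abs((x["lon"] - r["lon"] + 180.0) % 360.0 - 180.0)
            ddays = abs((x["time"] - r["time"]).total_seconds()) / 86400.0
            if dlat <= max_deg and dlon <= max_deg and ddays <= max_days:
                pairs.append((x, r))
    return pairs


def median_bias(pairs):
    """Median XBT-minus-reference temperature difference (deg C); the median
    limits the influence of pairs contaminated by mesoscale or seasonal signals."""
    return median(x["temp"] - r["temp"] for x, r in pairs)


# Hypothetical example data (temperatures at one standard depth, deg C).
xbt = [
    {"lat": 10.2, "lon": 179.8, "time": datetime(1975, 3, 1), "temp": 20.35},
    {"lat": -30.0, "lon": 45.0, "time": datetime(1975, 6, 1), "temp": 12.10},
]
ref = [
    {"lat": 10.0, "lon": -179.6, "time": datetime(1975, 3, 20), "temp": 20.20},
    {"lat": -30.5, "lon": 45.5, "time": datetime(1975, 9, 1), "temp": 12.00},
]
pairs = find_pairs(xbt, ref)
print(len(pairs), round(median_bias(pairs), 2))  # prints: 1 0.15
```

The longitude difference is wrapped so that the first pair (179.8°E and 179.6°W) is still matched, while the second pair is rejected because the two casts are 92 days apart.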
Both depth and thermal biases have been revealed in XBT data by side-by-side intercomparisons since the late 1970s [Anderson, 1980]. Nevertheless, more attention has been paid to eliminating the depth bias than the thermal bias, probably because XBT depths are inferred quantities given by the FRE and are therefore an obvious potential source of error. A number of side-by-side field experiments during 1985–1992 [Hanawa et al., 1995] resulted in new values of the FRE coefficients: a = 6.691 m s−1 and b = 0.00225 m s−2 for T4, T6, T7, and DB types. This new equation was recommended by the Intergovernmental Oceanographic Commission for use in place of the original LMS FRE. Unfortunately, the recommendation by Hanawa et al. [1995] to continue archiving XBT data with depths calculated using the original FRE was not always followed. The international data centers now hold XBT data with depths calculated using both versions of the FRE, and information about which FRE was used is missing for thousands of XBT profiles obtained since 1995.
After the introduction of the new FRE in 1995, the depth bias problem in XBT data was thought to be essentially solved, and the historical collection of XBT data was used extensively, together with other hydrographic data, to estimate the time evolution of global ocean heat content [Levitus et al., 2000; Levitus et al., 2005]. However, a strong time-varying temperature bias in the global XBT data set was later revealed by comparing XBT data with quasi-colocated CTD and bottle data [Gouretski and Koltermann, 2007]. These biases are comparable in magnitude to the climatic temperature changes and lead to spurious decadal variability in the global heat content time series.
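Because the FRE is an explicit function of elapsed time, a depth computed with one set of coefficients can be converted to another by first solving for the elapsed time. The Python sketch below does this using the Hanawa et al. [1995] coefficients from the text and the widely quoted original LMS coefficients for the same probe types (a = 6.472 m s−1, b = 0.00216 m s−2); the LMS values are not given in this section and should be checked against the manufacturer's documentation before use. The sketch also shows why the missing metadata matter: the conversion is only possible if the FRE used to archive a profile is known.

```python
import math

# Hanawa et al. [1995] coefficients for T4, T6, T7 and DB probes (from the text).
A_H95, B_H95 = 6.691, 0.00225   # m s^-1, m s^-2
# Widely quoted original LMS coefficients for the same probe types
# (not given in this section; verify against manufacturer documentation).
A_LMS, B_LMS = 6.472, 0.00216   # m s^-1, m s^-2


def fre_depth(t, a, b):
    """FRE depth (m) after elapsed time t (s): z = a*t - b*t**2."""
    return a * t - b * t * t


def fre_time(z, a, b):
    """Invert the FRE: smaller (physical) root of b*t**2 - a*t + z = 0."""
    return (a - math.sqrt(a * a - 4.0 * b * z)) / (2.0 * b)


def recompute_depth(z_old, old=(A_LMS, B_LMS), new=(A_H95, B_H95)):
    """Convert a depth computed with the `old` FRE into the `new` FRE depth."""
    t = fre_time(z_old, *old)
    return fre_depth(t, *new)


z_old = fre_depth(100.0, A_LMS, B_LMS)   # about 625.6 m with the LMS FRE
z_new = recompute_depth(z_old)           # about 646.6 m with the Hanawa FRE
print(round(z_old, 1), round(z_new, 1), round(z_new / z_old, 4))
```

For a fall time of 100 s, the conversion moves the sample from about 625.6 m to about 646.6 m, i.e., roughly 3.4% deeper.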
Since then, several efforts have been made to develop XBT bias correction schemes for the global XBT data set by comparing XBT data with quasi-colocated and quasi-simultaneous reference data [Ishii and Kimoto, 2009; Cowley et al., 2013; Levitus et al., 2009; Gouretski and Reseghetti, 2010; Hamon et al., 2011; Wijffels et al., 2008] and with global bathymetric data [Good, 2011; Gouretski, 2012].
3.1.3 Biases in the XBT Data and Their Causes
3.1.3.1 Depth Bias
The cause of this bias is the inadequacy of the FRE in describing the actual probe depth, which depends on numerous factors including the physical characteristics of the water column, launching conditions, and the physical characteristics of the probes. The initial fall rate of the XBT (the a coefficient in the FRE) depends on launching conditions (launching height, waves, ship motion, etc.). The changing fall rate as the probe descends through the water column (parameterized by the b coefficient) can be affected by factors such as the water viscosity [Abraham et al., 2012a; Cowley et al., 2013; Seaver and Kuleshov, 1982; Hamon et al., 2012; Gouretski and Reseghetti, 2010], wire stretching, and ocean currents.
Historically, studies of the depth bias (e.g., Figure 4) concentrated on depths below the upper 50–100 m. Numerous side-by-side experiments were designed and conducted to provide new values of the FRE coefficients that would, on average, give a more accurate estimate of the sample depth [Flierl and Robinson, 1977; Fedorov et al., 1978; Hanawa and Yoritaka, 1987; Hallock and Teague, 1992; Thadathil et al., 2002; Reseghetti et al., 2007; Reverdin et al., 2009; Green, 1984; Seaver and Kuleshov, 1982; Heinmiller et al., 1983; Bailey et al., 1994]. A summary of the FRE coefficients can be found in Gouretski and Reseghetti [2010].
Recent studies have introduced depth offsets into the FRE as part of their bias correction schemes to improve its accuracy [Reseghetti et al., 2007; Cowley et al., 2013; Gouretski and Reseghetti, 2010; Hamon et al., 2011; Gouretski, 2012; Cheng et al., 2011], even though the original form of the FRE (equation (1)) implies that the XBT depth calculation has no time-independent component. A depth offset, c, can be introduced into the original FRE as a constant term (equation (2)):

$$z(t) = a\,t - b\,t^{2} + c \qquad (2)$$

where z is the probe depth, t is the elapsed time since the probe entered the water, and a and b are the FRE coefficients.
The physical grounds for the introduction of this term are the time lag of the thermistor and the whole acquisition system (which is greater than 0.1 s, the thermistor response time [Reseghetti et al., 2007]) and the launching conditions, such as height of the platform, sea state, ship motion, and entry angle. These factors can translate into a depth offset of the order of a meter or more.
Neither the original FRE nor the version suggested by Hanawa et al. [1995] takes into account the possible influence of water temperature on fall rate through its effect on viscosity [Thadathil et al., 2002; Gouretski and Reseghetti, 2010; Hamon et al., 2011; Abraham et al., 2012a; Kizu et al., 2005]; the fall speed has been found to be slower in colder water (generally at high latitudes). Unfortunately, specific tests in cold waters are rarely available, and LMS provides no details about the conditions under which the FRE coefficients were determined. This issue has been addressed in different ways: by including a term that depends on the layer-averaged temperature in a multiplicative depth correction factor [Gouretski and Reseghetti, 2010] or by separating probes according to the water temperature in which they were launched and calculating separate adjustments for the two groups [Hamon et al., 2011].
3.1.3.2 Thermal Bias
The thermal bias in XBT data originates from the thermistor, wire, and data acquisition systems. The existence of this error is documented in some side-by-side studies [e.g., Reseghetti et al., 2007; Reverdin et al., 2009]. For instance, XBTs from 23 French cruises were found to exhibit a warm bias [Reverdin et al., 2009]. There is a hint of temperature dependence in this bias (Figure 5), as also indicated by probe calibrations in the laboratory [Gouretski and Reseghetti, 2010].
Like the depth bias, the thermal bias seems to be time variable. It has predominantly positive values (i.e., the XBT profile is too warm). One study [Gouretski and Reseghetti, 2010] identified a time-varying thermal bias between approximately −0.05 and 0.15°C, peaking in the mid-1970s for the T4/T6 types of XBTs. The T7/DB types are also found to exhibit temperature biases in global studies but with perhaps smaller magnitude and less consistency between studies.
Some types of acquisition systems occasionally showed a much larger positive temperature bias (compared with the above estimates), called “bowing,” that is believed to be caused by leakage [Bailey et al., 1994; Kizu and Hanawa, 2002a]. This bias can most easily be assessed in thermally homogeneous waters such as the subtropical thermostads [Heinmiller et al., 1983]. The estimation of the thermal bias is complicated by its suggested temperature dependence [Reseghetti et al., 2007; Reverdin et al., 2009].
3.1.3.3 Manufacturer Differences
Side-by-side comparisons show that nominally identical T5 XBTs manufactured by TSK and LMS fall at different rates, with the TSK probes being heavier but falling more slowly. However, other XBT types (T6, T7, T10) made by TSK fall faster than those made by LMS [Kizu et al., 2011; Gouretski, 2012]. Recent comparison tests show that the two companies' probes differ structurally in many respects, and these differences are thought to cause the intermanufacturer fall rate differences [Kizu et al., 2011; Kizu et al., 2005].
3.1.3.4 Near-Surface Transients
XBT temperature readings can sometimes differ from the true water temperature near the surface (Figure 6) [Kizu and Hanawa, 2002b; Bailey et al., 1994]. Suggested causes are the thermal inertia of the probes, discussed above, and the difference between the probe storage temperature and the water temperature. The depth range over which the surface transient is significant depends on the accuracy requirement and on the system type. The United Nations Educational, Scientific and Cultural Organization  recommends that the upper 3.7 m of a profile not be used because of these effects, consistent with an accuracy requirement of 0.2°C. However, for a more stringent accuracy requirement of 0.02°C, the surface transient effect can remain significant down to ~10 m [Kizu and Hanawa, 2002b].
3.1.3.5 Change From Strip Chart to Digital Recording
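Applying a near-surface cutoff is straightforward once depths are known. The sketch below assumes NumPy and a single profile held as parallel depth and temperature arrays; the function name and the example profile are hypothetical, and the default cutoff of 3.7 m is the recommendation quoted above, with ~10 m as the stricter alternative for a 0.02°C requirement.

```python
import numpy as np


def trim_surface_transient(depth, temp, min_depth=3.7):
    """Discard samples shallower than min_depth (m) to avoid the surface transient.

    3.7 m matches the recommendation for a 0.2 deg C accuracy requirement;
    about 10 m may be needed for a 0.02 deg C requirement.
    """
    depth = np.asarray(depth, dtype=float)
    temp = np.asarray(temp, dtype=float)
    keep = depth >= min_depth
    return depth[keep], temp[keep]


# Hypothetical profile (m, deg C).
depths = [0.0, 1.5, 3.0, 4.5, 6.0, 9.0, 12.0]
temps = [21.4, 21.0, 20.6, 20.5, 20.5, 20.4, 20.1]
d, t = trim_surface_transient(depths, temps)                      # 3.7 m cutoff
d10, t10 = trim_surface_transient(depths, temps, min_depth=10.0)  # stricter cutoff
```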
Although digital prototypes of XBT recording systems were developed near the end of the 1960s, XBT temperature profiles were registered by mechanical recorders on moving paper bands (strip charts) until the mid-1980s. Unfortunately, the literature contains neither reliable documentation of the behavior of the strip chart recorders nor intercomparisons of these mechanical recorders with the digital acquisition systems that replaced them. This gap is a serious barrier to developing a proper correction scheme for the period between 1966 and the mid-1980s, when strip chart recorders were in use. Digitizing the paper records introduces additional but small uncertainties, e.g., 0.055°C and 0.95 m [Anderson, 1980]. Some of these issues have also been addressed by Cowley et al. [2013].
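To see how the digitizing uncertainties quoted above combine, the depth uncertainty can be converted into a temperature uncertainty through the local vertical gradient and added to the temperature uncertainty in quadrature. This is a sketch under the assumption that the two reading errors are independent and random; the function name and the example gradient are hypothetical.

```python
import math


def digitizing_uncertainty(dTdz, sigma_t=0.055, sigma_z=0.95):
    """Combined temperature uncertainty (deg C) from strip-chart digitizing.

    sigma_t: temperature reading error (deg C); sigma_z: depth reading error (m);
    dTdz: local vertical temperature gradient (deg C per m).
    Assumes the two errors are independent, so they add in quadrature.
    """
    return math.hypot(sigma_t, dTdz * sigma_z)


mixed = digitizing_uncertainty(0.0)      # well-mixed layer: 0.055 deg C
thermo = digitizing_uncertainty(-0.05)   # hypothetical thermocline gradient
```

In a well-mixed layer only the 0.055°C reading error remains, while in a thermocline with a gradient of −0.05°C m−1 the 0.95 m depth error adds about 0.0475°C, raising the combined value to roughly 0.073°C.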
3.1.4 Summary of Biases
The biases listed above make different contributions to the total temperature bias. Even a depth offset of a few tenths of a meter is important within the seasonal thermocline, whereas the depth bias due to uncertainty in the FRE coefficients becomes more important at greater depths. The transient effects are important within a near-surface layer (~10 m).
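The relative importance of these error sources follows from a first-order Taylor expansion: if a sample is assigned a depth that differs from its true depth by Δz (assigned minus true, depth positive downward), the temperature error at the assigned depth is approximately −(∂T/∂z)Δz. The Python sketch below evaluates this relation; the gradient values are hypothetical round numbers chosen only to contrast a constant offset in the seasonal thermocline with a 2%-of-depth error (the LMS depth tolerance quoted above) deeper down.

```python
def temperature_error(dTdz, depth_error):
    """First-order temperature error (deg C) at the assigned depth.

    dTdz: local vertical temperature gradient (deg C per m), depth positive
          downward, so dTdz < 0 in a typical thermocline.
    depth_error: assigned depth minus true depth (m).
    A positive result means the XBT reads too warm at the assigned depth.
    """
    return -dTdz * depth_error


# 0.3 m constant offset in a seasonal thermocline (hypothetical -0.1 deg C/m).
offset_case = temperature_error(-0.1, 0.3)            # about +0.03 deg C
# 2% depth error at 500 m with a weaker hypothetical gradient (-0.01 deg C/m).
scale_case = temperature_error(-0.01, 0.02 * 500.0)   # about +0.1 deg C
# The same offset in near-isothermal water has essentially no effect.
mixed_case = temperature_error(0.0, 0.3)
```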
3.1.5 Global View of the Total XBT Temperature Bias
Global XBT versus CTD/bottle intercomparison studies provide an overall picture of the total XBT temperature bias arising from errors in the fall rate equation and in the temperature measurement (e.g., Figures 7 and 8). The following discussion describes the temperature biases that occur when depths are calculated with the manufacturer's FRE; different bias magnitudes would be obtained if the Hanawa et al. [1995] FRE were used [Wijffels et al., 2008]. The distribution versus time (Figure 7) suggests a time-varying total temperature bias, with positive biases (XBTs too warm) extending down to the maximum sampled depth until the beginning of the 1980s. This period is believed to correspond to the time when strip chart recorders were in use, and the biases may tentatively be attributed to these recorders. Another reason for the time variation could be adjustments of the probe design by the manufacturer.
The total temperature bias is positive almost everywhere in the top 50–100 m (Figure 7). Below this depth, the bias changes sign after about 1982–1984. There is also a clear stepwise change in the total bias near 460 m, caused by a difference between the biases of shallow-range probes (T4 and T6) and deep-range probes (T7 and DB).
A meridional section of the zonally averaged total bias (Figure 8) reveals a pattern that is symmetrical about the equator, suggesting that the total temperature bias depends on the vertical thermal structure of the water column. In regions of low vertical temperature gradient (south of about 45°S and north of about 35°N), a warm bias is usually observed throughout the water column, suggesting the presence of a pure thermal bias in the data. In contrast, the total bias in the subtropical and tropical regions is negative below 50–100 m. The largest absolute bias values occur in the regions with the strongest vertical temperature gradient, which implies that depth error is the main cause of the total temperature error there.
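The reasoning above, that weak-gradient regions expose the pure thermal bias while strong-gradient regions are dominated by depth error, can be made quantitative by regressing the total bias against the local temperature gradient, assuming total bias ≈ thermal bias − (∂T/∂z)Δz. The following is a minimal sketch on synthetic data using NumPy's least-squares solver; it is not the method of any study cited here, and real biases also vary with depth, time, and probe type.

```python
import numpy as np


def separate_biases(dTdz, total_bias):
    """Fit total_bias = thermal_bias - depth_error * dTdz by least squares.

    Returns (thermal_bias, depth_error): the intercept is the pure thermal bias
    (deg C) and minus the slope is the depth error (m, assigned minus true).
    """
    dTdz = np.asarray(dTdz, dtype=float)
    design = np.column_stack([np.ones_like(dTdz), dTdz])
    coef, *_ = np.linalg.lstsq(design, np.asarray(total_bias, dtype=float), rcond=None)
    intercept, slope = coef
    return intercept, -slope


# Synthetic data: +0.05 deg C thermal bias and a -3 m depth error (probe deeper
# than assigned), over a range of hypothetical gradients with small noise.
rng = np.random.default_rng(0)
gradients = rng.uniform(-0.1, 0.0, size=200)
observed = 0.05 - (-3.0) * gradients + rng.normal(0.0, 0.005, size=200)
thermal_bias, depth_error = separate_biases(gradients, observed)
```

With these synthetic values the total bias is warm where the gradient vanishes and about −0.10°C where the gradient reaches −0.05°C m−1, mirroring the pattern described for Figure 8.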
3.1.6 Bias Correction Schemes for the Global XBT Data Set
XBT data are a major component of the global subsurface temperature database from the 1970s through the 1990s. There is therefore considerable interest in the oceanographic community in developing bias correction schemes to improve the utility of XBT data for climate applications.
Several correction schemes (Figure 9) have been proposed using different statistical methods. Wijffels et al. [2008] contributed two sets of corrections, both of which attempt to remove bias by applying time-varying multiplicative factors to the measurement depths (effectively assuming that there are no pure temperature or depth offset adjustments). The first set [Wijffels et al., 2008, Table 1] spans 1968–2005, with separate corrections for “deep” (maximum depth > 550 m) and “shallow” XBTs. This categorization separates out the most common types of XBT without relying on the metadata that are missing for many XBTs. The second set [Wijffels et al., 2008, Table 2] spans 1993–2006 (not shown in Figure 9 owing to the short time span). It relies on satellite altimetry data to relate high-quality data and XBTs indirectly, which explains the short coverage period. Separate corrections are derived for a number of specific XBT types, with XBTs of unknown type separated into the depth categories described above. Ishii and Kimoto [2009] derived corrections that are also applied to the depths only and are proportional to the time the XBT had been falling through the ocean when each measurement was taken. Their corrections cover 1966–2006 for a variety of XBT types and manufacturers, and a single set of corrections is provided for XBTs of unknown type. Owing to the difficulties with metadata, Levitus et al. [2009] opted not to derive corrections for different types of XBTs. Their single set of adjustments is applied to the temperatures only and varies with time (in the latest version, available at http://data.nodc.noaa.gov/woa/WOD/XBT_BIAS/antonov_xbtbias_2.dat, it covers 1966–2008) and depth. These corrections are not shown in Figure 9 as they are not directly comparable to the others.
While the previous three studies intercompared XBT data with higher-quality data, such as those from CTDs, to derive corrections, Good [2011] took an alternative approach and used bathymetry data as a reference to derive multiplicative depth factors for 1968–2008 for three specific types of XBTs and for those of unknown type. Because the two approaches rely on independent reference data, corrections derived with one can be used to validate those derived with the other. However, the types of XBT used in water shallow enough for the method to work tend to differ from those used in the open ocean, and fewer data are available to derive corrections. Gouretski [2012] also used the bathymetry approach, obtaining time-varying thermal and depth corrections; the latter include both an offset and a multiplicative term. These corrections cover 1967–2008, and there are four sets for different types of XBTs, including those with missing metadata.
Gouretski and Reseghetti [2010] took a multiple-component approach to correcting the XBT data. They derived pure temperature corrections and depth corrections that vary with depth, modeling the depth corrections with time-invariant offset and depth-dependent terms. Time-varying coefficients (as shown in Figure 9) were later made available from http://www.nodc.noaa.gov/OC5/XBT_BIAS/gouretski_reseghetti.html. Adjustments were also applied to the depth corrections as a function of water temperature. The corrections cover 1967–2008, and there are sets for T4/T6 and T7/Deep Blue XBT types. The approach of Hamon et al. [2011] combines features of some of the previous studies. They used maximum profile depth to separate XBTs into categories and further divided each category into two groups according to water temperature. Additional groups were formed using only XBTs from the western Pacific. Their corrections span 1968–2007 and include pure temperature and depth (offset and depth-dependent) terms. Cowley et al. [2013] derive corrections for the common shallow and deep types of XBT; XBTs with missing metadata are assigned to a type using their maximum depth and country of origin. They propose two sets of corrections (covering 1967–2010): the first includes pure temperature and depth (offset and multiplicative) terms, while the second follows the Cheng et al. [2011] method and includes a pure temperature offset and an XBT depth equation that includes an offset term.
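The bathymetric approach can be reduced to a simple ratio estimate: for probes that strike the seafloor, the charted depth divided by the FRE depth at the moment of impact gives a multiplicative depth factor. The sketch below assumes bottom strikes have already been detected and uses hypothetical values; it ignores bathymetric errors and any depth offset, so it is only a simplified illustration of the idea, not the procedure of Good [2011] or Gouretski [2012]. The median is used so that a single poorly charted location does not dominate.

```python
from statistics import median


def depth_factor_from_bathymetry(fre_bottom_depths, charted_depths):
    """Median ratio of charted depth to FRE depth at the moment of bottom strike.

    A factor above 1 means the FRE depths are too shallow.
    """
    ratios = [b / z for z, b in zip(fre_bottom_depths, charted_depths) if z > 0]
    return median(ratios)


# Hypothetical bottom-strike cases (m); the last charted depth is an outlier.
fre_bottom = [200.0, 350.0, 480.0, 600.0]
charted = [206.0, 361.9, 494.4, 650.0]
factor = depth_factor_from_bathymetry(fre_bottom, charted)
corrected = [z * factor for z in fre_bottom]
```

Here three of the four hypothetical probes indicate FRE depths about 3% too shallow, and the outlying fourth ratio (about 1.083) barely moves the median, which is 1.032.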
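Although the schemes differ in detail, most can be expressed as a combination of three components: a multiplicative depth factor, a depth offset, and a pure temperature adjustment. The sketch below applies such a generic correction with NumPy; the function name, parameter names, and example values are hypothetical, and in practice each scheme supplies its own coefficients as functions of year and probe type (and, in some schemes, water temperature or depth), often with its own sign conventions.

```python
import numpy as np


def apply_xbt_correction(depth, temp, depth_factor=1.0, depth_offset=0.0,
                         thermal_bias=0.0):
    """Apply a generic three-component XBT correction.

    corrected depth = depth_factor * depth + depth_offset
    corrected temp  = temp - thermal_bias
    thermal_bias may be a scalar or a per-level array (NumPy broadcasting).
    """
    depth = np.asarray(depth, dtype=float)
    temp = np.asarray(temp, dtype=float)
    return depth_factor * depth + depth_offset, temp - thermal_bias


# Hypothetical coefficients for one probe type and year.
z_corr, t_corr = apply_xbt_correction(
    [0.0, 100.0, 200.0], [20.0, 15.0, 10.0],
    depth_factor=0.98, depth_offset=1.5, thermal_bias=0.05,
)
```

Setting `depth_offset` and `thermal_bias` to zero reduces this to a purely multiplicative depth correction of the kind used by Wijffels et al. [2008], while leaving the depths untouched gives a temperature-only adjustment like that of Levitus et al. [2009].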
In spite of quantitative differences among the schemes, there is qualitative agreement among the fall rate corrections, suggesting a slower fall rate before 1980 and after about 2000, with the fastest fall rates between 1985 and 1990. There is also qualitative agreement among the estimates of thermal bias (Figure 9, bottom panels). Thermal bias is largest around 1970–1975, both for the shallow- and deep-type probes. As mentioned above, the higher thermal bias values in the beginning of the time series may be connected to the use of the strip chart recorders, but this hypothesis needs to be verified. The results suggest that while XBT biases can be identified and corrected, uncertainties remain.
Over more than 40 years of study, the basic features of XBT biases described above have been identified. These can be broadly divided into pure thermal and depth biases. The former results from problems in the thermistor, wire, and data acquisition systems, while the latter arises because the FRE does not realistically describe the motion of the XBT. The relative importance of each depends on the local temperature gradient, with the pure temperature bias more prominent where the gradient is weak. Many correction schemes have been proposed to adjust for these biases.
However, many questions remain unanswered, and uncertainties in the correction schemes persist. In the future, it will be vitally important to (1) understand and characterize each source of XBT bias and determine the role each has played in the history of XBT biases, (2) determine how best to correct the biases in the global XBT data set, and (3) quantify the uncertainty in the bias adjustment schemes so that uncertainties in time series generated from the data can be properly assessed.
Addressing these points requires continued investigation into XBT biases and should be facilitated by international collaborations, for example, through continuing to hold workshops where the XBT bias issues are discussed (see summary at http://www.nodc.noaa.gov/OC5/XBT_BIAS/xbt_bias.html) and through a proposed project to improve the historical ocean data record. This will allow the XBT data to be exploited fully by the climate and oceanographic communities.
3.2 Dynamic Models for XBT Devices
Traditionally, the descent of XBT probes through the water column is described by standardized FREs, but it is also possible to use dynamic models, which allow independent predictions of probe depth. Dynamic models differ from FREs in that they are not based on experimental correlations between depth and time. Rather, they are based on a momentum balance that includes the effect of the mass change as the XBT wire unspools during descent. In this section, the values proposed by Hanawa et al. [1995] are taken as the standard FRE coefficients for LMS T4/T6/T7/DB XBT probes.
Each technique has its own advantages and disadvantages. A critical advantage of the FRE method is that it involves no modeling errors or simplifications: the FRE is based on experimental results, often obtained by comparing temperature profiles from collocated and contemporaneous XBT and CTD casts. Additionally, the FRE method is simple to incorporate into standard data processing procedures.
With the dynamic modeling technique, it is possible to account for drop conditions that differ from those of the experiments from which the FRE was obtained. For instance, variations in probe mass, drop height, water temperature, or the linear density of the wire can be included in the analysis. When XBTs are released into tropical waters, or into waters whose temperature profile differs substantially from that of the calibrating experiments, a bias may be incurred because water viscosity depends on temperature. Similarly, drops made from heights that differ from the recommended standard (2.5 m) can affect the probe depth. For example, probes deployed from ships participating in SOOP are usually launched from heights of ~10 m. Unfortunately, the historical archives do not generally contain drop height information, so a correction for this bias is not possible. Additionally, we are unaware of dynamical comparisons from a moving vessel (the real conditions of XBT deployment).
The first dynamic models [Green, 1984; Hallock and Teague, 1992; Kezele and Friesen, 1993] were limited by the accuracy with which the drag coefficients were known. Recently, a series of detailed studies of the drag coefficients has been carried out for the major types of XBT manufactured by LMS, and both spinning and nonspinning descent have been incorporated [Abraham et al., 2012a, 2012b, 2011; Stark et al., 2011]. These studies, however, are limited to modeling fully submerged probes; entry effects, for instance, are not incorporated. With this new information and a dynamic model, users can calculate the depth of XBT devices independently of FRE models.
When the new drag coefficients are employed in a dynamic model, results from the new dynamic model can be compared with collocated and contemporaneous CTD measurements (Figure 10). Differences between the CTD and the XBT data are difficult to detect visually.
The new dynamical model also provides detailed information about the flow patterns in the immediate vicinity of the probe [Abraham et al., 2012a, 2012b, 2011; Stark et al., 2011]. Finally, the new method allows the user to quantify the impact of various parameters, including probe mass, drop height, and linear mass density of the wire, on depth. Because the method uses the temperature measurements of the probe, variations in water viscosity are naturally included in the model. Probe mass and drop height may have a significant impact on probe depths; however, a recent set of experiments suggests that the effect of drop height might be overpredicted [Abraham et al., 2012a]. Surface effects (such as sudden impact forces at entry, angle of impact with the ocean surface, ship motion, and entrainment of air) may negate the larger impact velocities.
Even though the dynamical model currently ignores these surface effects, applying it to the XBT profiles stored in the world database could improve estimates of ocean heat content or support experimental studies. For instance, it has recently been shown that the impact of ocean temperature on fall rate is modest [Cowley et al., 2013], a finding strongly reinforced by the numerical model [Abraham et al., 2012a]. The method could also be applied to existing XBT data sets to provide FRE coefficients specific to a particular drop case.
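The idea that a decreasing probe mass produces the decelerating fall described by the b coefficient can be illustrated with a deliberately simplified, quasi-steady model in which drag balances the probe's submerged weight at every instant and the probe loses wire mass in proportion to depth. This is a sketch under stated assumptions, not the model of Abraham et al.; all parameter values are hypothetical, and inertia, entry effects, and temperature-dependent drag are ignored. The simulated depths are then fitted with the FRE form to show how drop-specific coefficients could be obtained.

```python
import numpy as np

G = 9.81      # gravitational acceleration, m s^-2
RHO = 1025.0  # seawater density, kg m^-3


def simulate_descent(m_sub0=1.0, wire_sub=2.0e-4, cd=0.25, area=1.7e-3,
                     t_end=100.0, dt=0.01):
    """Quasi-steady descent: drag balances submerged weight at every instant.

    All defaults are hypothetical placeholders, not measured probe properties.
    m_sub0:   submerged mass (mass minus displaced water) at entry, kg
    wire_sub: submerged wire mass paid out per metre of descent, kg m^-1
    cd, area: drag coefficient and frontal area (m^2)
    Inertia, entry effects and temperature-dependent drag are ignored.
    """
    n = int(round(t_end / dt))
    t = np.linspace(0.0, n * dt, n + 1)
    z = np.zeros(n + 1)
    for i in range(n):
        m_sub = m_sub0 - wire_sub * z[i]                   # wire lost so far
        v = np.sqrt(2.0 * G * m_sub / (RHO * cd * area))   # terminal speed
        z[i + 1] = z[i] + v * dt                            # forward Euler step
    return t, z


def fit_fre(t, z):
    """Least-squares fit of z = a*t - b*t**2 (no constant term); returns (a, b)."""
    design = np.column_stack([t, -t * t])
    coef, *_ = np.linalg.lstsq(design, z, rcond=None)
    return coef[0], coef[1]


t, z = simulate_descent()
a_fit, b_fit = fit_fre(t, z)
print(round(a_fit, 3), round(b_fit, 6))
```

Under these assumptions the model has an exact solution, z(t) = v0 t − (v0²/4L) t², where v0 is the initial terminal speed and L = m_sub0/wire_sub, so the fitted a reproduces v0 and b is set directly by the rate of wire mass loss.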
3.3 The Global XBT Measurement Network
XBT deployments are classified by their spatial and temporal sampling goals, or modes of deployment (low density, frequently repeated, and high density), and sample along repeated, well-observed transects on either large or small spatial scales, or at special locations such as boundary currents and chokepoints (Figure 11). Low-density transects typically target 12 realizations per year, with XBTs deployed at 150–225 km spacing, and are designed to detect the large-scale, low-frequency modes of ocean variability. Frequently repeated transects typically target 12–18 realizations per year, with XBTs deployed at 100–150 km spacing, and are designed to obtain high-spatial-resolution observations in consecutive realizations in regions where temporal variability is strong and resolvable with sampling on the order of 20 days. High-density (HD) transects target four realizations per year, with XBTs deployed at ~25 km spacing, and are designed to obtain synoptic, high-spatial-resolution sections that resolve the spatial structure of mesoscale eddies, fronts, and boundary currents.
Given the advances in the global observing system, the global XBT network now focuses on monitoring boundary currents and heat transport rather than exclusively on the upper ocean thermal field. The OceanObs09 Ship Of Opportunity (SOOP) community white paper [Goni et al., 2010] contains many references to XBT scientific manuscripts.
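The nominal targets above translate directly into probe budgets. The following sketch estimates the number of XBTs per year for a single transect; the mode names and the transect length in the example are hypothetical, and the assumption of one probe per spacing interval plus one at the end of the line is a simplification (real cruises vary with ship routes and conditions).

```python
# Nominal targets from the text: (realizations per year range, spacing range in km).
MODES = {
    "low_density": ((12, 12), (150.0, 225.0)),
    "frequently_repeated": ((12, 18), (100.0, 150.0)),
    "high_density": ((4, 4), (25.0, 25.0)),
}


def probes_per_year(transect_km, mode):
    """Return (min, max) XBTs per year for a transect of the given length.

    Assumes one probe per spacing interval plus one at the end of the line.
    """
    (r_min, r_max), (s_min, s_max) = MODES[mode]
    fewest = int(transect_km // s_max) + 1   # widest spacing, fewest probes
    most = int(transect_km // s_min) + 1     # narrowest spacing, most probes
    return r_min * fewest, r_max * most


for mode in MODES:
    print(mode, probes_per_year(2000, mode))  # hypothetical 2000 km transect
```

For a 2000 km line, high-density sampling needs about 324 probes per year, comparable to the upper end of frequently repeated sampling (168 to 378) and about two to three times the low-density budget (108 to 168).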
XBT HD transects extend from ocean boundary (continental shelf) to ocean boundary in order to resolve boundary currents and to estimate basin-scale geostrophic velocity and mass transport integrals. Many HD transects now have time series extending for more than 15 years. PX06 (Auckland to Fiji), which began in 1986, is the earliest HD transect in the present network with almost 100 realizations. The scientific objectives of HD sampling and examples of research targeting these objectives are as follows [Goni et al., 2010]:
Measure the seasonal and interannual fluctuations in the transport of mass and heat across transects which define large enclosed ocean areas and investigate their links to climate indices.
Determine the long-term mean annual cycle and interannual fluctuations of temperature, geostrophic velocity, and large-scale ocean circulation in the top 800 m of the ocean. However, in some regions XBTs reaching 800 m cannot capture the complete vertical structure of narrow but intense oceanic jets, and a combined approach using high-density XBT sampling and deeper profiling float measurements is necessary.
Obtain long time series of temperature profiles at approximately repeated locations in order to unambiguously separate temporal from spatial variability.
Determine the space-time statistics of variability of the temperature and geostrophic shear fields, recognizing that the lack of synoptic salinity profiles introduces uncertainty in the shear-temperature relationship.
Provide appropriate in situ data (together with Argo profiling floats, tropical moorings, air-sea flux measurements, sea level, etc.) for testing ocean and ocean-atmosphere models.
Determine the synergy between XBT transects, satellite altimetry, Argo, and general circulation models.
Identify permanent boundary currents and fronts and describe their persistence and recurrence and their relation to large-scale transports.
Estimate the significance of baroclinic eddy heat fluxes.
3.4 Future of the XBT Network
The XBT network reflects the recommendations of OceanObs99 and OceanObs09 [Goni et al., 2010] and includes several transects that the scientific community has added during the last 12 years (Figure 11). Some transects may be difficult to occupy continuously due to logistical and budgetary constraints; however, they are kept as recommendations based on the justifications given by OceanObs99, supported by their scientific value. Ship recruitment is an ongoing issue in implementing the XBT network, resulting in gaps or shifts in sections. Sampling histories and data along individual transects are made available through http://www-hrx.ucsd.edu and http://www.aoml.noaa.gov/phod/hdenxbt.
Thirteen years after OceanObs99, the XBT HD transects continue to increase in value, not only through the growing length of decadal time series, but also due to integrative relationships with other elements of the ocean observing system, including the following:
The implementation of global broad-scale temperature and salinity profiling by the Argo Program underlines a need for complementary high-resolution data in boundary currents, frontal regions, and mesoscale eddies. HD transects together with Argo float data provide views of the large-scale ocean interior and small-scale features near the boundary, as well as of the relationship of the interior circulation to the boundary-to-boundary transport integrals.
Almost 20 years of continuous global satellite altimetric sea surface heights are matched by contemporaneous HD sampling on many transects. The sea surface height and the subsurface temperature structure that causes most of the sea surface height variability are jointly measured and analyzed.
Improved capabilities in ocean data assimilation modeling allow these and other data sets to be combined and compared in a dynamically consistent framework.
4 Accuracy/Biases of Argo Floats
The Argo Program, an array of over 3000 autonomous floats designed to return materially important oceanic climate data, is a vitally important component of the present oceanic Earth observing system and a strong complement to satellite observations. This program, designed to complement the Jason altimeter missions, provides observed climate signals that, when globally averaged, are sensitive to the presence of data bias. Thus, the Argo Program expends much effort to minimize the likelihood of measurement bias and spatial and temporal sampling bias.
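The sensitivity of globally averaged quantities to small biases can be seen with a back-of-envelope calculation: a uniform temperature bias ΔT over a layer of thickness h and area A corresponds to a spurious heat content of ρ c_p ΔT h A. The sketch below uses approximate constants (ρ ≈ 1025 kg m−3, c_p ≈ 3990 J kg−1 K−1, ocean area ≈ 3.6 × 10^14 m²) and a 0–2000 m layer chosen for illustration; it ignores bathymetry, so it overstates the volume of the deeper layer.

```python
RHO = 1025.0         # seawater density, kg m^-3 (approximate)
CP = 3990.0          # specific heat of seawater, J kg^-1 K^-1 (approximate)
OCEAN_AREA = 3.6e14  # global ocean surface area, m^2 (approximate)


def heat_content_bias(temp_bias, layer_thickness, area=OCEAN_AREA):
    """Spurious heat content (J) implied by a uniform temperature bias (deg C)
    over a layer of the given thickness (m) and area (m^2). Bathymetry is
    ignored, so the layer volume is overestimated at depth."""
    return RHO * CP * temp_bias * layer_thickness * area


bias_j = heat_content_bias(0.01, 2000.0)
print(f"{bias_j:.2e} J")
```

A bias of only 0.01°C over 0–2000 m corresponds to roughly 3 × 10^22 J.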
The Argo Program has advanced the breadth, quality, and distribution of oceanographic data compared with the broad-scale XBT network, while continuing to supplement ship-based CTD programs (section 2). The autonomous profiling float brought a high-quality CTD into the design of a broad-scale subsurface data collection network, reducing measurement uncertainty. Because they operate independently of research vessels once deployed, the floats measure continuously over a 3–5 year lifetime, which has greatly reduced the temporal and spatial sampling bias of the historical hydrographic data set (Figure 1). Nowhere is the reduction in temporal bias more apparent than in the Southern Ocean south of 30°S during the austral winter month of August [Roemmich and Gilson, 2009].
By design, the Argo array is composed of multiple autonomous float models and manufacturers, ideally utilizing different sensor models, provided by over 30 national Argo programs. The array is currently dominated by three float families, Autonomous Profiling Explorer (APEX; comprising 68%), Sounding Oceanographic Lagrangian Observer (SOLO; 23%), and PROVOR (8%) (PROVOR represents Profiler Sea in French), although the recent introduction of several new float types and additional manufacturers may realign future percentages. The multiplicity within the array reduces the likelihood that a single failure vector or bias would render the array valueless for climate studies. However, in practice, near homogeneity in smaller regions does occur.
Consistent data processing across time, space, and float providers is a high priority. Argo data are telemetered via satellite and made publicly available from the Argo Global Data Assembly Centers (GDAC) within 24 h of acquisition. This immediate distribution results in a two-tiered quality control system, each tier with distinct expectations of bias within the data. The initial data release has undergone a series of automated checks, termed “real-time quality control” (RTQC), performed at one of the 11 Argo regional Data Assembly Centers (DAC) [Wong et al., 2012]. The RTQC tests are coarse. A more careful analysis, termed “delayed-mode quality control” (DMQC), is performed by the float provider 6–18 months after data acquisition [Wong et al., 2012]. The delay allows corrections to be made in light of the temporal behavior of the sensors, particularly the salinity sensors. In general, data bias is identified and addressed in DMQC, unless a correction can be applied with minimal subjectivity in RTQC. This makes DMQC data the most suitable for climate-related studies.
What follows is a history of temperature and pressure biases identified within the Argo Program autonomous float array, together with past examples of float logic that introduced bias into the data set. These biases have already been, or are presently being, addressed by the Argo Program through either data adjustment or labeling.
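In practice, selecting data suited to climate studies means keeping delayed-mode profiles and good-quality levels. Argo files label each profile with a `DATA_MODE` of `R` (real time), `A` (real time with adjustments), or `D` (delayed mode), and carry per-level QC flags in which 1 means good and 2 probably good. The sketch below applies that selection to profiles stored as plain Python dictionaries whose keys mimic the Argo netCDF variable names; the flat dictionary layout and the function name are hypothetical simplifications of the actual file format.

```python
GOOD_QC = {"1", "2"}  # Argo QC flags: 1 = good, 2 = probably good


def climate_ready_levels(profiles):
    """Yield (pressure, temperature) pairs from delayed-mode profiles whose
    adjusted pressure and temperature both carry good or probably-good flags."""
    for prof in profiles:
        if prof["DATA_MODE"] != "D":   # skip real-time and adjusted real-time data
            continue
        for p, t, pq, tq in zip(prof["PRES_ADJUSTED"], prof["TEMP_ADJUSTED"],
                                prof["PRES_ADJUSTED_QC"], prof["TEMP_ADJUSTED_QC"]):
            if pq in GOOD_QC and tq in GOOD_QC:
                yield p, t


# Hypothetical profiles; QC flags are stored one character per level.
profiles = [
    {"DATA_MODE": "R", "PRES_ADJUSTED": [5.0], "TEMP_ADJUSTED": [18.3],
     "PRES_ADJUSTED_QC": "1", "TEMP_ADJUSTED_QC": "1"},
    {"DATA_MODE": "D", "PRES_ADJUSTED": [5.0, 10.0, 20.0],
     "TEMP_ADJUSTED": [18.2, 18.1, 17.5],
     "PRES_ADJUSTED_QC": "111", "TEMP_ADJUSTED_QC": "124"},
]
levels = list(climate_ready_levels(profiles))
```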
4.1 Argo Float CTD Sensors
The majority of floats within the “Core Argo” array measure temperature, salinity, and pressure with Sea-Bird Electronics, Inc. (SBE) conductivity-temperature-depth (CTD) packages. In recent years, the Argo array has become nearly homogeneous in its use of SBE CTDs because of their high accuracy, modest conductivity sensor drift (both in the number of floats with drift and, if present, the rate of drift), and the lack of a suitable alternative sensor provider. Falmouth Scientific, Inc. (FSI) provided an alternative CTD sensor option. However, the performance of FSI CTD-equipped Argo floats was substandard, and they were phased out. The last FSI-equipped Argo float was deployed in December 2006. While CTD model homogeneity simplifies discussion of Argo float CTD sensor bias, it also makes the Argo array susceptible to unforeseen CTD hardware/software issues.
Two models of SBE CTD designed for energy-limited autonomous operation are commonly installed on Argo floats: the SBE-41 and the SBE-41CP. The former is designed to minimize energy use by turning off the sensor pump between sparse profile measurements (spot sampling). The SBE-41CP samples continuously (1 Hz) and can return profile data averaged over pressure intervals. The SBE-41CP can also be used in spot sampling mode. The SBE-41 has historically been installed on more Argo floats. However, the number of SBE-41CPs is increasing along with the number of floats using the bidirectional, higher-bandwidth Iridium transmission system, which can support the collection of higher vertical resolution profiles.
4.2 Sensor Drift: The Causes, Identification, and Correction
Sensor drift is a continuous concern for instruments designed to obtain extended duration measurements in the climate system. Of particular importance to this study are the magnitude and impact of pressure sensor drift, discussed below.
A few different pressure sensor models have been used within Sea-Bird CTD packages on board Argo floats. Early floats contained pressure transducers manufactured either by Paine Corporation or Ametek. Both models drifted toward anomalously high pressure values (positive pressure drift) of 5–10 dbar at the ocean surface [Barker et al., 2011]. Beginning in 2002, Argo floats were deployed with SBE CTD packages paired with Druck Corporation PDCR 1820 pressure transducers [Barker et al., 2011]. The Druck sensor was quite stable, as demonstrated by surface pressures (SP) reported over the lifetime of an Argo float falling within a ±1 dbar envelope.
In early 2009, it was discovered that over the previous couple of years of Argo deployments, an increased number (25–35%, up from 3% prior to 2007) of Druck pressure transducers were exhibiting moderate to strong anomalously low pressure values at the ocean surface (negative pressure drift). The cause was traced to oil leaking through microfractures in the glass-metal seal feedthroughs in the inner sensor, allowing the sensor diaphragm to fill the interstitial regions (see http://www.argo.ucsd.edu/seabird_notice.html). The affected floats are termed “microleak” floats. By the time of the 2009 discovery, the Druck pressure sensor was in the vast majority of active Argo floats; thus, all float models were potentially affected. The magnitude and rate of the observed pressure drift range from a rapid drift with quick float failure over tens of cycles to a gradual drift over the float's lifetime. In either case, microleak floats fail prematurely once enough oil has escaped to allow the sensor to short. At the time of float failure, the magnitude of drift can range from the high single digits to tens of decibars. The manufacturing process of the Druck pressure sensor was modified, and Argo floats with rigorously tested Druck pressure sensors were again being deployed by late 2009. In addition, Sea-Bird has introduced an additional pressure sensor option manufactured by Kistler International.
Argo floats commonly transmit the measured surface pressure (SP) during their surface transmission period. An estimated pressure correction is applied by assuming a pressure-independent offset equal to the SP value, although the applicability of this correction to the full Argo array has not been rigorously confirmed. Studies by Sea-Bird on microleak-affected Druck pressure sensors before deployment and on a few recovered sensors found an offset correction to be within stated errors until the pressure drift exceeded −5 to −10 dbar, beyond which an additional slope term is needed to account for the larger negative drift at depth. For pressure drifts beyond −10 dbar, the nonlinear correction also requires a temperature component, with cold water at depth requiring a greater correction (N. Larson, Sea-Bird Electronics, personal communication, 2012).
Several Argo float models are programmed to autocorrect for pressure drift on board the float using the measured SP offset (e.g., the SOLO and PROVOR families). This can be accomplished by zeroing the pressure transducer at the surface or by applying a cumulative pressure correction. Regardless of the method, the float transmits profile and trajectory data that use corrected pressures. For these self-correcting floats, the data need to be flagged as questionable only when the offset correction is insufficient. The remaining Argo float models, which include APEX floats, do not autocorrect for pressure drift. For these floats, the pressure (and salinity) data need to be adjusted and flagged appropriately within the real-time file.
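The offset correction described above amounts to subtracting SP from every pressure in the profile. The sketch below is a minimal illustration under that assumption only; the function name and the `offset_limit_dbar` cutoff (taken from the lower end of the −5 to −10 dbar range and applied symmetrically for simplicity) are hypothetical, not Argo parameters, and the accompanying salinity recomputation from the adjusted pressures is omitted.

```python
import numpy as np

def adjust_pressure_offset(pres_dbar, surface_pres_dbar, offset_limit_dbar=5.0):
    """Apply a pressure-independent offset correction using the surface pressure (SP).

    Every level is shifted by -SP, so a sensor reading low at the surface
    (negative SP) has its pressures moved deeper. Also returns whether |SP| is
    small enough that an offset alone is plausibly adequate; offset_limit_dbar
    is a hypothetical cutoff, not an Argo threshold.
    """
    pres = np.asarray(pres_dbar, dtype=float)
    adjusted = pres - surface_pres_dbar
    offset_only_ok = abs(surface_pres_dbar) <= offset_limit_dbar
    return adjusted, offset_only_ok

# A float reporting SP = -3 dbar: every level moves 3 dbar deeper.
adjusted, ok = adjust_pressure_offset([5.0, 100.0, 1000.0], surface_pres_dbar=-3.0)
# adjusted -> [8.0, 103.0, 1003.0]; ok -> True
```

Beyond the cutoff, the text indicates that a slope term (and, past −10 dbar, a temperature term) would be needed; those terms are not modeled here.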
When APEX float pressures are left uncorrected, Barker et al.  found a net global positive temperature bias, although the signal was mitigated by compensating pressure drifts in floats using different pressure sensor models. The globally averaged temperature bias reached a magnitude of 0.02°C at the base of the mixed layer. The compensation was weaker in regional areas, where the biases were larger [Barker et al., 2011], and it should not be expected to remain as effective in the near future as the older, positively drifting Argo floats disappear from the array.
An audit of the Argo GDAC is routinely performed to identify incorrectly offset-adjusted pressure data and to confirm that the technical and meta-information necessary to substantiate the adjustment is present. Nonoffset corrections, and the flagging of data as uncorrectable due to pressure drift, are carried out in DMQC.
As with pressure sensors, salinity sensors have been investigated with respect to drift and sensor response correction. While that topic is not a major focus of this paper, the following articles provide a more detailed discussion [Wong et al., 2012; Lueck and Picklo, 1990; Johnson et al., 2007b; Owens and Wong, 2009; Wong et al., 2003; Böhme and Send, 2005; Guinehut et al., 2006].
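Why uncorrected pressure drift shows up as a temperature bias, and why drifts of opposite sign can offset each other, follows from a first-order expansion: if every pressure is reported too high by δ, the temperature labelled p was actually measured at p − δ, so the apparent bias is about −δ ∂T/∂p. The sketch below assumes a constant (depth-independent) error and a hypothetical linear thermocline; it is illustrative only and does not reproduce the Barker et al. analysis.

```python
import numpy as np

def temperature_bias_from_pressure_error(pres_dbar, temp_c, pressure_error_dbar):
    """First-order apparent temperature bias caused by a constant pressure error.

    If pressures are reported too high by pressure_error_dbar (positive drift),
    the value labelled p was really measured at p - error, so
        bias(p) = T(p - error) - T(p) ~= -error * dT/dp.
    """
    pres = np.asarray(pres_dbar, dtype=float)
    temp = np.asarray(temp_c, dtype=float)
    dT_dp = np.gradient(temp, pres)  # vertical temperature gradient, degC per dbar
    return -pressure_error_dbar * dT_dp

# Hypothetical thermocline cooling by 0.01 degC per dbar.
p = np.linspace(100.0, 500.0, 41)
t = 20.0 - 0.01 * p
warm = temperature_bias_from_pressure_error(p, t, +2.0)  # positive drift: +0.02 degC
cold = temperature_bias_from_pressure_error(p, t, -3.0)  # negative drift: -0.03 degC
```

Where temperature decreases with depth, positive pressure drift yields a warm bias and negative drift a cold one, which is why a mix of drift signs can partially cancel in a global average while remaining visible regionally.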
4.3 Temperature Sensor Bias
No example of significant temperature drift has been identified within the Argo array. The thermistor used in the SBE-41 and SBE-41CP has a manufacturer's stated accuracy of 0.002°C and stability of 0.0002°C yr−1. Identifying temperature drift without postmission calibration is difficult, and to date no standard test designed to identify temperature drift is performed within RTQC or DMQC [Wong et al., 2012]. However, small numbers of instruments recovered and recalibrated after 4–9 month missions have shown no appreciable drift beyond the manufacturer's stated temperature accuracy [Oka and Ando, 2004]. More recently, the temperature sensors of a few floats recovered after 3–5 years in the field have also not drifted outside these stated accuracies.
Argo float models report the Core Argo profile parameters (temperature, salinity, and pressure) either as point measurements or as vertical pressure averages (bin averaged). This difference in data reporting results in an apparent temperature bias proportional to the vertical curvature of temperature and to the square of the width of the averaging interval. Both sampling methods are equally valid, and each has advantages depending on the application, but the sampling mode should be known for the most accurate use. The bin-averaged value is directly applicable to heat content estimates, while the spot-sampled value measures the actual temperature on a pressure level. Many Argo floats shorten the sampling interval in the upper water column, where vertical gradients are strongest.
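The dependence of this bias on curvature and bin width can be made explicit with a Taylor expansion. Assuming a smooth profile T(p) and a spot sample taken exactly at the bin centre p_0 (an idealization; a real float may label its bins differently), the average over a bin of width h is

$$
\bar{T}(p_0) = \frac{1}{h}\int_{p_0-h/2}^{p_0+h/2} T(p)\,dp = T(p_0) + \frac{h^2}{24}\,\frac{\partial^2 T}{\partial p^2}\bigg|_{p_0} + O(h^4),
$$

so the apparent bias of the bin-averaged value relative to the spot value is approximately (h²/24) ∂²T/∂p². The odd-order terms cancel over the symmetric bin, so the gradient itself does not contribute at leading order. Where temperature decreases with pressure at a diminishing rate (positive curvature, as in the main thermocline), the bin average is warmer than the spot value; where curvature is negative, as at the base of a mixed layer, it is cooler.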
The apparent temperature bias resulting from improperly analyzing data reported using the different strategies cannot be estimated directly, but it can be modeled or approximated using high-resolution shipboard CTD data (and Argo 2 dbar profiles). Here a calculation is presented to illustrate the pressure ranges most susceptible to this bias. The Roemmich-Gilson Argo climatological data set (RG) includes Argo-derived climatological salinity and temperature values on 58 pressure levels spanning 0–2000 dbar [Roemmich and Gilson, 2009]. The RG levels approximate (although in most cases underestimate) the number of levels reported by a typical Argo float, with finer resolution nearer the surface. At each 1° latitude × 1° longitude grid point, the RG climatological temperature profile was interpolated to 20,000 values with a cubic spline to approximate the scans recorded by a continuously sampling (1 Hz) CTD on a float rising at 10 cm s−1 (a 2000 dbar ascent at that rate takes roughly 20,000 s). The globally averaged temperature gradient from 2000 dbar up to approximately 200 dbar (Figure 12b, black lines) results in a warm bias for floats recording bin-averaged data. The sign is reversed in the surface waters, with the bias reaching its maximum magnitude at 30 dbar. The net bias summed over pressure is small. The bias in a 1° square in the equatorial Pacific (Figure 12, red lines) reaches 3 times the magnitude of the global average in the transition between thermocline and mixed layer. The apparent temperature bias is clearly accentuated in wider pressure-averaging bins.
It has been difficult to distinguish bin-averaged from spot-sampled profile data with the information currently available at the Argo GDAC. However, Argo will soon be using updated procedures that explicitly state whether the data are bin averaged or spot sampled. The CTD model can often be found in the float metafile; at present, 86% of floats that report using an SBE CTD indicate the model. However, the correspondence between CTD model and sampling method is inexact, as the SBE-41CP can be used to retrieve either bin-averaged or spot-sampled data. Although there are exceptions, float models equipped with similar hardware and telemetry tend to use a consistent sampling method. Floats that commonly report bin-averaged data include the SOLO float family and, with exceptions, the PROVOR float family. APEX floats that transmit data via Service Argos primarily use the SBE-41 and record spot-sampled data, but Argo Iridium floats are often equipped with SBE-41CP CTDs that can be used in either spot sampling or bin averaging configurations, sometimes both in a single profile. Argo floats using Iridium telecommunications often report data at a vertical resolution of 2 dbar. This fine-resolution bin averaging, typical of that used for processed shipboard CTD data, considerably reduces the bias under consideration compared with the coarser vertical resolution reported by other Argo floats.
4.4 Biases Introduced by Float Firmware
Two recent, unrelated issues each affected a single (but different) Argo float model and introduced pressure bias into the Argo data set. The Argo Program has since addressed the situation, asking program principal investigators and DACs to correct the affected data or flag them as bad or questionable, and has issued recommendations on proper data interpretation to the Argo community. Both instances highlight the need for a robust Argo array populated by multiple float types using multiple sensors. In the absence of such variety in floats and sensors, it will remain a task for the scientific community to implement robust methods for identifying problems and to ensure that the metadata needed to identify and correct them exist.
4.4.1 Truncated Negative Drift Pressure (TNDP) in APEX Floats
The microleak issue discussed earlier led to a reduced mean lifetime for Argo floats deployed from 2007 through early 2009. However, the microleak issue was complicated by a legacy programming issue in APEX floats that made the amplitude of negative pressure drift unknowable and thus uncorrectable. APEX floats with older controller boards (identified as APF5 through APF8), deployed as late as 2009, did not report signed pressure values and truncated the SP reading to zero if it was negative [Barker et al., 2011]. In these floats, SP is saved and transmitted with an offset of +5 dbar, because the SP + 5 dbar value is used by the float on the following ascent to shut down the CTD before it nears the surface. APEX floats that report a constant (over many cycles) SP value of +5 dbar indicate that the pressure sensor is consistently reading negative values. These floats have been identified as “truncated negative drift pressure” (TNDP) floats. Newer APEX controllers (some later versions of the APF8 and all APF9 boards), used in all APEX Argo floats deployed since late 2009, do not truncate negative SP values.
TNDP APEX floats did not originate with the arrival of the microleak floats. However, the large-amplitude negative drift of the microleak floats spurred analysis of the possible pressure bias caused by TNDP floats. In a census conducted in January 2009, Barker et al.  identified 26.9% of APEX-measured profiles as likely TNDP. This value is a lower bound, as they were unable to make a determination for 15.6% of profiles because of insufficient GDAC metadata and/or SP values. By comparing the identified TNDP profiles with nearby non-TNDP profiles, Barker et al.  estimated a mean pressure error of −3 dbar and cautioned against using TNDP APEX float data in ocean heat content studies.
How a TNDP float's data are treated depends on the suspected severity of the unknown drift [Wong et al., 2012]. Progress in documenting TNDP APEX floats is ongoing. Data users may make their own determination of TNDP status by referring to the SP variables included in the float technical parameter netCDF file available at the GDAC.
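The TNDP signature described above (SP stuck at the +5 dbar transmitted value over many cycles) lends itself to a simple screening heuristic. The sketch below is hypothetical: it assumes a sequence of reported SP values from an APF5 to APF8 controller, and the `min_fraction` and `min_cycles` thresholds are illustrative choices, not the criteria used by Barker et al. or in the Argo quality control manual.

```python
import numpy as np

APEX_SP_OFFSET_DBAR = 5.0  # older APEX controllers store and transmit SP + 5 dbar

def looks_like_tndp(reported_sp_dbar, min_fraction=0.8, min_cycles=10):
    """Heuristic TNDP screen for an APEX float with an APF5-APF8 controller.

    After removing the +5 dbar offset, a value of 0 means the true SP was zero
    or negative (negatives are truncated to zero). A float whose SP sits at 0
    on most cycles is suspected of negative drift of unknown size.
    min_fraction and min_cycles are illustrative, not Argo criteria.
    Returns None when there are too few cycles to decide.
    """
    sp = np.asarray(reported_sp_dbar, dtype=float) - APEX_SP_OFFSET_DBAR
    if sp.size < min_cycles:
        return None
    truncated = np.isclose(sp, 0.0)
    return bool(truncated.mean() >= min_fraction)

print(looks_like_tndp([5.0] * 12))                # True: constant +5 dbar on every cycle
print(looks_like_tndp([5.4, 5.1, 5.6, 5.3] * 3))  # False: positive SP still visible
```

Note that such a screen can only flag a float; by construction, the truncation hides the drift amplitude, so no correction can be derived from the SP record.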
4.4.2 Incorrect Assignment of Pressure Bins in Woods Hole Oceanographic Institution (WHOI) SOLO Floats
A number of Argo float models report bin-averaged profile data during ascent. Some of these floats do not transmit measured pressure but instead rely on a pressure lookup table. A subset of SOLO Argo floats manufactured prior to 2007 by the Woods Hole Oceanographic Institution (WHOI) assigned temperature and salinity values to incorrect pressure levels [Willis et al., 2009]. The issue affected most SOLO WHOI FSI models and a subset of SOLO WHOI SBE models. The incorrect data assignments in the SOLO WHOI FSI floats could not be corrected without certain engineering data that were not universally transmitted. In the FSI models, the pressure bias varied by float model and from cycle to cycle, but the net effect was an apparent cooling of the water column that was partially responsible for the Atlantic Ocean OHC (ocean heat content) variability during 2003–2007 discussed in Lyman et al. .
The float profile data that were in error due to incorrect pressure level assignment either have been corrected to the proper pressure (all SOLO WHOI SBE models and a subset of SOLO WHOI FSI) or have been assigned bad quality control flags for those models that are uncorrectable (subset of SOLO WHOI FSI). Lists of the different floats in each category can be found online (http://www-argo.ucsd.edu/Acpres_offset2.html).
The Argo Program float array is an important component of the present oceanic Earth observing system, extending broad-scale monitoring of ocean temperature, among other variables, beyond what was achieved by previous research programs. Hence, it is illustrative to place the Argo data biases described in this review into context. Perhaps most pertinent to the climatic temperature record discussed in the next sections is that recent studies have estimated XBT pressure biases during some periods to be up to ~10 times greater than those identified in Argo floats [Wijffels et al., 2008; DiNezio and Goni, 2011]. Argo CTD temperature sensors are well calibrated before deployment and appear to be stable within errors over the floats' lifetime. The spatial and temporal distribution of pelagic ocean data is improved. A Northern Hemisphere sampling bias is greatly lessened with Argo but still remains, due largely to sparse Southern Hemisphere ship availability for deployments, float limitations (e.g., lack of ice avoidance routines on some float models), and a bias in float funding toward the Northern Hemisphere.
The near-term goals of the Argo Program are to sustain a Core Argo array near its present float density, data quality, and consistency while extending sampling both spatially and toward greater pressure. Improvement in mean Argo float lifetimes is a primary reason that maintaining the current array is feasible and that spatial extensions are possible. The most apparent spatial bias in Argo float density in the pelagic ocean is found within seasonal ice zones. The inclusion of ice avoidance schemes in float firmware should help reduce this bias. Extending the Argo Program to greater pressure is occurring on two fronts. Several float types (ARVOR and New profilINg float of JApan, NINJA) are being modified to extend their pressure range. Simultaneously, “Deep Argo” float development is underway, leading to floats capable of reaching 6000 dbar, which will allow sampling from the surface to the ocean floor over all but less than 2% of the ocean area. The importance of broad-scale temperature monitoring extending to the deep ocean is discussed in later sections. The collection of high-quality abyssal ocean data requires a low-energy CTD with better sensor stability than the CTDs presently used for the upper ocean. Development of a Sea-Bird CTD for use in Deep Argo floats is ongoing. Emphasis has been placed on improving predeployment sensor calibration methods and the unit's sampling techniques (D. Murphy, Sea-Bird Science Director, personal communication, 2013). Initial Deep Argo prototype floats will be equipped with the new sensor package, allowing its accuracy and stability to be assessed.
Finally, the future Argo array will likely continue to expand its use of higher-bandwidth, bidirectional data transmission services. Advantages include recording profiles at higher resolution (2 dbar or finer), reduced float mortality due to shorter surface periods, and the ability to modify float sampling midmission in response to scientific objectives. In addition, a wider range of engineering diagnostic data will be transmitted.
5 Global Ocean Heat Content, Earth Energy Budget, and Thermosteric Sea Level Rise
The amount of heat accumulating in the global ocean is vital for diagnosing Earth's energy imbalance and sea level rise. Over 90% of the total heat accumulated in the Earth's climate system goes toward warming the ocean [Bindoff et al., 2007; Church et al., 2011], and over the past four decades, this process has resulted in a marked increase in upper ocean heat content [Domingues et al., 2008; Ishii and Kimoto, 2009; Levitus et al., 2009] and ocean thermal expansion, thus contributing to sea level rise [Antonov et al., 2005; Domingues et al., 2008; Church et al., 2011; Hanna et al., 2013]. Ocean warming has also been observed below the thermocline [von Schuckmann and Le Traon, 2011; Levitus et al., 2012] and even in the abyss [Purkey and Johnson, 2010; Kouketsu et al., 2011]. The uptake of heat by the ocean acts as a buffer to climate change, slowing the rate of surface warming [Raper et al., 2002], and so is an important element in the evolution of the climate over land and between the Northern and Southern Hemispheres.
This section presents an abbreviated update of ocean heat content estimates, the present Earth energy balance, and thermosteric sea level rise, with a particular focus on the accuracy of the temperature measurements and its impact on the certainty of these estimates.
5.1 Upper Ocean Warming
Changes in ocean heat content (OHC) can be calculated from measurements of the temperature evolution of the ocean. The OHC anomaly is obtained from the difference between the measured potential temperature profile and a potential temperature climatology. This difference is integrated vertically from the surface to a reference depth (for instance, 700 m) and multiplied by a constant reference ocean density and heat capacity.
A multidecadal increase in global ocean heat content in the upper 700 m (OHC 0–700 m) is evident in various observational estimates [e.g., Palmer et al., 2010, Figure 2], superimposed on interannual-to-decadal fluctuations. Prior to the full deployment of the Argo array in 2005, these estimates relied on a sparse and unevenly distributed set of subsurface temperature data (section 2), collected by a large and changing mix of instruments with various accuracies and biases (sections 3 and 4). The degree to which these observational estimates differ from each other in their global evolution and spatiotemporal variability mainly reflects the sensitivity of the OHC calculations to different choices of (i) instrumental bias correction, (ii) mapping approach, (iii) climatological reference, and (iv) the quality, types, and amount of data included in the analyses [Palmer et al., 2010; Lyman et al., 2010]. For example, Ishii and Kimoto  incorporate gridded sea surface temperature (COBE-SST based on Ishii et al. ) within the mixed layer in addition to in situ data. Levitus et al.  include in situ data from all available instrument types, whereas Domingues et al.  use only in situ data from bottles, CTDs, XBTs, and Argo floats. Historical and Argo databases are regularly updated, so the quality, types, and amount of data used in OHC analyses depend on the database and its version. In addition, research groups often apply their own quality control procedures before using the databases. Consequently, the amount of apparently erroneous (delayed-mode or real-time) data removed also varies. To increase the number of available profiles for OHC 0–700 m analyses, shallower temperature profiles are sometimes extrapolated to 700 m [e.g., Willis et al., 2004; Lyman et al., 2010].
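The column calculation just described can be sketched as follows. The text specifies only that a constant reference density and heat capacity are used; the values below (1025 kg m−3 and 3990 J kg−1 K−1) are typical seawater figures assumed for illustration, the trapezoidal rule is one choice of vertical integration, and a global estimate would additionally sum these column values times their areas.

```python
import numpy as np

RHO_0 = 1025.0  # kg m^-3; assumed reference seawater density
CP_0 = 3990.0   # J kg^-1 K^-1; assumed seawater specific heat capacity

def ohc_anomaly_per_area(depth_m, theta_c, theta_clim_c, ref_depth_m=700.0):
    """Heat content anomaly of one water column, in J m^-2.

    Integrates (theta - theta_clim) from the surface to ref_depth_m with the
    trapezoidal rule and multiplies by RHO_0 * CP_0. The depth grid is assumed
    to start at the surface and to include a level at ref_depth_m.
    """
    z = np.asarray(depth_m, dtype=float)
    anomaly = np.asarray(theta_c, dtype=float) - np.asarray(theta_clim_c, dtype=float)
    keep = z <= ref_depth_m
    z, anomaly = z[keep], anomaly[keep]
    integral = np.sum(0.5 * (anomaly[1:] + anomaly[:-1]) * np.diff(z))  # K m
    return RHO_0 * CP_0 * integral

# Hypothetical climatology warmed uniformly by 0.1 K.
z = np.linspace(0.0, 1000.0, 101)
clim = 15.0 * np.exp(-z / 500.0)
print(ohc_anomaly_per_area(z, clim + 0.1, clim))  # about 2.86e8 J m^-2
```

A uniform 0.1 K warming of the top 700 m gives 1025 × 3990 × 0.1 × 700 ≈ 2.86 × 10⁸ J m−2 per column under these assumed constants.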
Since the recent discovery of time-dependent XBT biases [Gouretski and Koltermann, 2007], numerous corrections have been proposed [Wijffels et al., 2008; Levitus et al., 2009; Ishii and Kimoto, 2009; Gouretski and Reseghetti, 2010; Cheng et al., 2011; Good, 2011; Hamon et al., 2011; Gouretski, 2012; Cowley et al., 2013], but there is as yet no universal agreement on which adjustment is the most accurate (section 3). Nevertheless, all these methods improve the overall quality of the estimates of OHC. Outstanding issues include incomplete understanding of bias sources, unknown impacts of XBT manufacturing changes on measurements, limitations in the quality and quantity of the observations and metadata, and differences among bias correction models and parameter estimation methods. Remaining XBT correction biases contribute to OHC uncertainty (as discussed earlier in this paper and later in this section). In response, the international community is working toward a better understanding of these biases and the best possible way of correcting them (e.g., http://www.nodc.noaa.gov/OC5/XBT_BIAS/xbt_bias.html).
Early OHC 0–700 m estimates [Levitus et al., 2005; Ishii et al., 2006] included in the Intergovernmental Panel on Climate Change (IPCC) Fourth Assessment Report (IPCC AR4) [Bindoff et al., 2007] exhibited substantial decadal variability during the 1970s–1980s that climate models were unable to simulate [Gregory et al., 2004; AchutaRao et al., 2006; Hegerl et al., 2007; Solomon et al., 2007]. All subsequent proposed XBT corrections, despite their differences, have markedly reduced the amplitude of this decadal variability. It is now widely accepted that the large decadal variability in the 1970s–1980s in the earlier estimates was mostly an artifact caused by XBT biases. It had long been known that XBTs are prone to small systematic errors [e.g., Hanawa et al., 1995], but what was not recognized prior to the Gouretski and Koltermann  study, and the IPCC AR4, was that these biases were time dependent. Although small, these time-dependent biases, if left uncorrected and when integrated in depth and over the global ocean, lead to substantial errors in OHC estimates, in terms of both temporal variability and trends [e.g., Domingues et al., 2008; Wijffels et al., 2008; Levitus et al., 2009].
Instrumental biases have also been discovered (and corrected where possible) for certain types of Argo floats. Willis et al.  reported systematic pressure labeling errors in a small subset of SOLO Argo floats (section 4.4.2), which were partly responsible for the spurious cooling of the global upper ocean during 2003–2007 [Lyman et al., 2006]. Barker et al.  reported positive and negative pressure drifts in the dominant (APEX) type of Argo float (sections 4.2 and 4.4.1). Correctable pressure drifts in APEX floats were time variable but presumably depth independent. Because the drifts were of mixed sign, they largely cancelled in the global mean (within error bars) but led to larger biases in upper ocean thermal expansion at regional scales [Barker et al., 2011].
Given the geographical and temporal gaps in the ocean subsurface temperature observing system (section 2), estimates of OHC on regular grids are influenced by mapping choices, for example, how these observational gaps are infilled [Gregory et al., 2004]. Objective mapping is widely applied to produce spatially complete fields, but implementations differ. For example, Ishii et al. [2003, 2006], Ishii and Kimoto , and Levitus et al. [2000, 2005, 2009, 2012] assume an initial guess of zero temperature anomaly in unsampled areas (i.e., relaxation toward climatological values), whereas Lyman et al.  and Johnson et al. [2012, 2013] assume that the mean anomaly of sampled areas is representative of unsampled areas for global integrals [Lyman and Johnson, 2008]. Willis et al. , Guinehut et al. , Lombard et al. , and Johnson et al. [2012, 2013] (for their global maps) infill in situ gaps based on spatially variable linear regressions with satellite altimeter sea level, but this is possible only from 1993 onward.
Techniques other than objective mapping are also used. Palmer et al. , von Schuckmann and Le Traon , and Gouretski et al.  calculate area-weighted anomaly averages within 2° × 2° or 5° × 5° grid boxes and sum the results to derive global estimates. While unsampled grid boxes in von Schuckmann and Le Traon  and Gouretski et al.  are assigned zero anomaly, Palmer et al.  apply the averaged anomaly of sampled areas to the unsampled grid boxes, similar to the representative average approach of Lyman and Johnson . Domingues et al.  and Church et al.  use a reduced-space optimal interpolation in which a reduced set of near-global spatial functions (derived from satellite altimeter sea level measurements) is combined with thermal expansion observations to produce spatially complete fields from 1950 onward. OHC 0–700 m is then estimated based on spatially variable linear regressions with thermal expansion. Reduced-space optimal interpolation is commonly used to reconstruct other sparse data sets, such as sea surface temperature [Smith et al., 1996; Kaplan et al., 1998], sea level pressure [Kaplan et al., 2000], and sea level [Chambers et al., 2002; Church and White, 2006, 2011; Ray and Douglas, 2011].
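The two limiting infill choices described in the preceding paragraphs (zero anomaly versus the representative average of sampled areas) can be contrasted on a toy grid. This is a simplified sketch, not any group's method: it assumes a regular latitude-longitude grid with box area proportional to cos(latitude), ignores land masks and objective-mapping covariances, and uses hypothetical data.

```python
import numpy as np

def global_mean_anomaly(anom, lat_deg, infill="zero"):
    """Area-weighted mean of a lat-lon anomaly grid in which NaN marks unsampled boxes.

    infill="zero": unsampled boxes are given zero anomaly (climatology).
    infill="representative": unsampled boxes are given the area-weighted mean
    anomaly of the sampled boxes, so the result equals that mean.
    lat_deg holds one latitude per grid row.
    """
    anom = np.asarray(anom, dtype=float)
    # On a regular grid, box area is proportional to cos(latitude) of each row.
    row_w = np.cos(np.deg2rad(np.asarray(lat_deg, dtype=float)))
    weights = np.broadcast_to(row_w[:, None], anom.shape)
    sampled = ~np.isnan(anom)
    weighted_sum = np.sum(weights[sampled] * anom[sampled])
    if infill == "representative":
        return weighted_sum / np.sum(weights[sampled])
    if infill == "zero":
        return weighted_sum / np.sum(weights)
    raise ValueError(f"unknown infill choice: {infill!r}")

# Hypothetical 2 x 3 grid in which only the first column is sampled (+0.3 K).
grid = np.array([[0.3, np.nan, np.nan],
                 [0.3, np.nan, np.nan]])
lats = [-30.0, 30.0]
print(global_mean_anomaly(grid, lats, "zero"))            # ~0.1
print(global_mean_anomaly(grid, lats, "representative"))  # ~0.3
```

With zero infill the sampled mean is scaled down by the sampled area fraction, which is why this choice tends to damp global anomalies in poorly sampled periods, whereas the representative average assumes unsampled regions behave like sampled ones.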
Mapping uncertainties due to sampling coverage should be larger in the most data-sparse periods, depth levels, and ocean basins. These portions include the early years of the historical record (before 1970s), below ~400 m before the frequent use of deep XBTs in the mid-1990s, below ~700 m before the Argo array achieved near-global ocean coverage in 2005, and in the Southern Hemisphere (especially south of 30°S) before the Argo array (Figure 13). As current Argo float technology does not yet allow for full-depth profiling, the most poorly sampled ocean regions continue to be below ~2000 m (~50% of the total ocean volume). Mapping differences, however, also exist for OHC 0–700 m estimates even in historically well-sampled regions, such as the North Atlantic [Gleckler et al., 2012].
Differences in quality control, profile selection, and climatological reference can contribute to uncertainty in estimates of OHC. Hints of the significance of these differences can be seen in updated OHC curves that use similar mapping methods. For instance, changes in the 2002–2008 OHC trends were noted in the analysis of Johnson et al.  compared with that of Lyman et al. , related to quality control issues. Similarly, OHC trends reported in Levitus et al.  were updated in Levitus et al. . The differences between their 2009 and 2012 estimates are due to changes in climatological reference and in the volume and quality control of the data sets, from both NODC WOD and Argo, although further work is necessary to pinpoint the contribution of each factor.
In addition to direct observational estimates, changes in OHC can be derived by assimilating (in situ and/or satellite) observations in ocean (or coupled) general circulation models [Stammer et al., 2010]. These methods, sometimes known as ocean state syntheses, vary in complexity and computational cost, from inexpensive multivariate sequential schemes, with solutions strongly constrained toward observations [Carton et al., 2005; Carton and Giese, 2008], to more computationally expensive and sophisticated adjoint methods, with solutions that respect physical constraints such as energy conservation [Stammer et al., 2002]. Synthesis analyses vary in terms of assimilation methods and model systems as well as the type of observations assimilated and period of integration (e.g., see a brief summary in Lee et al. ). All of these synthesis estimates are highly dependent on the accuracy of the observations assimilated and their formal errors [Stammer et al., 2010].
All synthesis estimates show a multidecadal warming over the past 50 years, superimposed on shorter-term variability [e.g., Stammer et al., 2010; Corre et al., 2012; Xue et al., 2012]. Significant differences in variability and trends are also observed, largely reflecting the diversity in estimation approaches among the groups. Ocean synthesis is a relatively recent activity that flourished under the auspices of the World Ocean Circulation Experiment (WOCE) in the 1990s and is now being advanced as part of the World Climate Research Programme-Climate Variability and Predictability Project and the Global Ocean Data Assimilation Experiment [Stammer et al., 2010] (see also http://www.clivar.org/organization/gsop/synthesis/synthesis.php). Over time, the increasing fidelity of such synthesis analyses should lead to an optimal estimation system for understanding OHC variability and change.
5.1.2 Current Observational Estimates
Updated and recent observational analyses of global upper OHC (Table 1) all show significant multidecadal warming, with a steady increase in OHC since the 1970s (Figures 14a1–14a3). Choices for vertical integrations are usually based on the most frequently observed maximum depths: ~400 m for shallow XBTs [Gouretski et al., 2012], ~700 m for deep XBTs (most estimates), and either ~900 m (for estimates using earlier floats) [Boening et al., 2012] or ~2000 m [von Schuckmann and Le Traon, 2011] for the Argo array (Figure 13). Although the top 400 to 700 m represents only 10% to 20% of the total volume of the ocean, it accounts for a large fraction of the increase in global OHC at multidecadal timescales [e.g., Levitus et al., 2012]. Long-term changes in subsurface temperature are expected to be largely forced from the ocean surface (~70% of Earth's surface) through air-sea fluxes. As the ocean surface warms, some of that added heat is transported into deeper layers [e.g., Johnson and Wijffels, 2011]. All ocean basins are warming over multidecadal timescales [Levitus et al., 2012; Gleckler et al., 2012], on average faster in the Northern Hemisphere than in the Southern Hemisphere, and particularly in the North Atlantic (in part due to increased heat transport from the Indian Ocean [e.g., Palmer and Haines, 2009; Lee et al., 2011]). The Pacific Ocean is the largest basin and makes the largest contribution to the observed increase in global upper OHC [Levitus et al., 2012; Gleckler et al., 2012]. Although the Southern Ocean (south of 30°S) is the least observed basin, warming has also been detected there [Gille, 2002; Böning et al., 2008; Meijers et al., 2011]. This Southern Ocean warming is explained by heat uptake from the atmosphere as well as by changes in ocean circulation driven by changes in the mean wind stress field [Morrow et al., 2008; Swart and Fyfe, 2012].
Table 1. Details of the Globally Integrated and Yearly Averaged Upper Ocean Heat Content Time Series Shown in Figure 14a
The time-variable error bars of some of the OHC time series (Figures 14a1–14a3) reveal that sampling uncertainties in upper OHC changes are larger in the earlier part of the record, before the widespread use of XBTs in the 1970s, owing to sparser ocean coverage (e.g., less complete geographical and depth sampling, as well as fewer observations with which to average out noise arising from mesoscale eddy variability) [Domingues et al., 2008; Lyman and Johnson, 2008; Palmer and Brohan, 2011; Gleckler et al., 2012]. With these time-varying uncertainties in mind, linear trends ending in 2012 (Figures 14b–14f) are estimated for three multidecadal periods (starting in 1970, 1980, and 1993) and also for the past 8 years (2005–2012), in which the ocean observations are dominated by Argo floats. These OHC trends are expressed as a heat flux over the entire Earth's surface area (5.1 × 10¹⁴ m²) to indicate the upper ocean's contribution to changes in planetary heat storage, and they are determined by linear least squares fitting, either taking the time-variable error bars into account (“weighted least squares” (WLS)) or not (“simple least squares” (SLS)). The time-variable error bars (1 standard deviation uncertainty) used for the WLS trend calculations are from Domingues et al.  (Figure 14a1), chosen as an intermediate case (i.e., neither the largest nor the smallest uncertainties) because the uncertainties from different groups do not necessarily account for the same formal errors. The time-variable uncertainties of Johnson et al.  (Figure 14a3) are very similar to those of Domingues et al.  (Figure 14a1).
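The two trend estimators described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used to produce Figures 14b–14f: `linear_trend` and `joules_per_year_to_w_m2` are hypothetical helper names, the inputs are assumed to be annual OHC anomalies in joules, and a 365.25-day year is assumed when converting a heating rate (J yr⁻¹) into a flux over Earth's 5.1 × 10¹⁴ m² surface. Passing per-year 1σ uncertainties gives a WLS fit with weights 1/σ² (so poorly sampled early years count less); omitting them gives the SLS fit.

```python
# Minimal sketch (hypothetical helper names): SLS vs WLS linear trends of an
# annual OHC anomaly series in joules, expressed as a flux over Earth's surface.
import math

EARTH_AREA_M2 = 5.1e14                # Earth's surface area used in the text
SECONDS_PER_YEAR = 365.25 * 86400.0   # assumption: Julian year


def linear_trend(t, y, sigma=None):
    """Least squares slope of y against t.

    sigma=None -> simple least squares (SLS): equal weights, no error returned.
    sigma given -> weighted least squares (WLS): weights 1/sigma**2, plus the
    1-sigma slope uncertainty implied by those per-point uncertainties.
    """
    w = [1.0] * len(t) if sigma is None else [1.0 / s ** 2 for s in sigma]
    s = sum(w)
    # Centre on the weighted means to avoid round-off with calendar years.
    t_bar = sum(wi * ti for wi, ti in zip(w, t)) / s
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / s
    stt = sum(wi * (ti - t_bar) ** 2 for wi, ti in zip(w, t))
    sty = sum(wi * (ti - t_bar) * (yi - y_bar) for wi, ti, yi in zip(w, t, y))
    slope = sty / stt
    slope_err = None if sigma is None else math.sqrt(1.0 / stt)
    return slope, slope_err


def joules_per_year_to_w_m2(rate_j_per_yr):
    """Convert an OHC trend (J/yr) into W m^-2 averaged over Earth's surface."""
    return rate_j_per_yr / (SECONDS_PER_YEAR * EARTH_AREA_M2)
```

For example, `joules_per_year_to_w_m2(1e22)` is about 0.62 W m⁻², so each 10²² J yr⁻¹ of upper ocean heat gain corresponds to roughly 0.6 W m⁻² of the planetary flux used in the figures.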
For the OHC time series in Figure 14, a 2005–2007 reference period is chosen because it is well sampled by Argo and the shortest estimate [Willis et al., 2004] ends in 2007. The exception is the robust average curve in Figure 14a3, for which a 1993–2001 reference is used because the non-XBT data it relies on are relatively unchanged from 1993 to 2001, in contrast to larger data set changes after 2001. The term “robust average” was used in Lyman et al.  because all of the OHC curves mapped using the same Lyman and Johnson [2008, Figure 2] technique showed upper ocean warming independent of XBT correction and for two climatological references.
5.1.2.1 Multidecadal Rates (Since 1970)
Over multidecadal periods, heat is accumulating in the upper ocean, but estimates disagree quantitatively on the warming rates. For the longest period (1970–2012), the contribution of the upper 700 m of the ocean to changes in planetary heat storage is 0.27 ± 0.04 W m−2 based on the median value of the WLS trends and 0.22 W m−2 for the SLS trends (Figure 14b). For the WLS trends, this is equivalent to an increase in global upper OHC of about 19 × 10²² J and implies an average warming of ~0.2°C over 43 years (or ~0.048°C per decade) in the upper 700 m. However, some observational estimates show clear short-term departures from a linear trend, and these are also seen in the CMIP3 model responses to radiative forcing (Figures 14a1–14a3), where cooling episodes are evident following the major volcanic eruptions of Mounts Agung (1963), El Chichón (1982), and Pinatubo (1991). Over the 33 year period 1980–2012, the median WLS trend is 0.30 ± 0.04 W m−2 (Figure 14c), slightly higher than, but not statistically different from, that estimated for 1970–2012.
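The conversion from a mean flux to a heat gain and a layer-mean temperature change can be checked with a back-of-envelope sketch. The Earth surface area is from the text; the ocean surface area (3.6 × 10¹⁴ m²), seawater density (1025 kg m⁻³), specific heat (3990 J kg⁻¹ K⁻¹), and 365.25-day year are rounded assumptions, and `flux_to_heat` and `layer_warming` are hypothetical helper names.

```python
# Back-of-envelope sketch: convert a mean flux (W m^-2 over Earth's surface)
# sustained for N years into joules and a layer-mean temperature change.
EARTH_AREA_M2 = 5.1e14               # from the text
SECONDS_PER_YEAR = 365.25 * 86400.0  # assumption: Julian year
OCEAN_AREA_M2 = 3.6e14               # assumption: approximate ocean surface area
RHO_SEAWATER = 1025.0                # assumption: kg m^-3
CP_SEAWATER = 3990.0                 # assumption: J kg^-1 K^-1


def flux_to_heat(flux_w_m2, years):
    """Total heat (J) from a global-mean flux sustained for `years`."""
    return flux_w_m2 * EARTH_AREA_M2 * years * SECONDS_PER_YEAR


def layer_warming(heat_j, depth_m, area_m2=OCEAN_AREA_M2):
    """Mean temperature change (K) if heat_j is spread uniformly through a layer."""
    return heat_j / (RHO_SEAWATER * CP_SEAWATER * area_m2 * depth_m)


heat = flux_to_heat(0.27, 43)          # 1970-2012 median WLS trend, 0-700 m
print(f"heat gain: {heat / 1e22:.1f} x 10^22 J")
print(f"0-700 m mean warming: {layer_warming(heat, 700.0):.2f} K")
```

With these assumed constants the sketch gives about 18.7 × 10²² J and a 0–700 m mean warming of about 0.18°C, consistent with the rounded values quoted above. The temperature figure is sensitive to the assumed ocean area: the area at 700 m depth is smaller than at the surface, which would raise the layer-mean warming somewhat.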
The start of the shortest multidecadal period (1993–2012) coincides with the advent of high-precision satellite sea level altimetry, the “satellite altimetry era.” Over this 20 year period, warming rates for the upper 700 m range from 0.25 W m−2 [Ishii and Kimoto, 2009] to 0.46 W m−2 [Johnson et al., 2013] (Figure 14d). The median rate over 1993–2012 is 0.33 ± 0.06 W m−2 for the WLS fit and 0.34 W m−2 for the SLS fit. Decadal fluctuations are influenced by the Interdecadal Pacific Oscillation [Corre et al., 2012] or the Pacific Decadal Oscillation [Feng et al., 2011; Xue et al., 2012] and, apparently, by some major volcanic events.
As mentioned earlier, Lyman et al.  computed an ensemble mean OHC 0–700 m estimate, which they called the robust average (Figure 14a3), based on profiles with different XBT bias corrections together with non-XBT profiles sourced from the WOD 2005 and the Argo Programme [Gould et al., 2004]. Their estimation method produces larger rates than the methods employed by other groups. The mean ensemble rate computed in Lyman et al.  for 1993–2008 is 0.64 ± 0.11 W m−2 (90% confidence interval), compared with 0.39 ± 0.09 W m−2 for the median WLS rate over the same period (Figure 14e). This difference is at least partly owing to Lyman et al.'s  use of a representative average [Lyman and Johnson, 2008] to infill gaps in data coverage, although changes in quality control may also play a role. An updated representative average [Johnson et al., 2013], which takes advantage of quality control advances made in the interim, still produces a higher 1993–2008 rate (0.51 ± 0.09 W m−2) than other estimates, but it overlaps some of them within standard errors and is lower than the rates (0.78–0.90 W m−2; their Table 1) estimated by Lyman et al. .
5.1.2.2 Interannual Rates (Since 2005)
Over the past 8 years (2005–2012), the median SLS or WLS trend for OHC 0–700 m is 0.21 ± 0.20 W m−2 (Figure 14f). Individually, trends vary from 0.16 W m−2 [Levitus et al., 2012; von Schuckmann and LeTraon, 2011] to 0.39 W m−2 [Domingues et al., 2008], and uncertainties are larger for the shorter periods. In addition, an updated estimate from von Schuckmann and LeTraon  finds a WLS trend of 0.3 ± 0.1 W m−2 for the 10–2000 m layer, based on their Argo analysis for 2005–2012. Although these trends seem to be consistent with those estimated for the multidecadal periods, they are unlikely to represent long-term changes in global upper OHC. Linear trends are particularly sensitive to the periods being analyzed [Lyman, 2012], and over such a short 8 year interval, changes in upper OHC can be strongly influenced by fluctuations in the state of the El Niño-Southern Oscillation (ENSO) [Roemmich and Gilson, 2011] and other short-term variations in the ocean state. Specifically for the ENSO events observed during 2004–2011, the global ocean tends to lose heat at a rate of >1 W m−2 during El Niños, mainly through evaporative cooling [Trenberth et al., 2002], and to gain a similar amount of heat during La Niñas [Roemmich and Gilson, 2011]. These net changes in OHC associated with ENSO are an order of magnitude larger than the multidecadal changes estimated for 1970–2012. They depend on the east-west oscillation of the tropical Pacific thermocline, which adiabatically redistributes heat between the surface (~0–100 m) and subsurface ocean (~100–500 m) and thus allows the near-surface ocean to significantly alter its net heat exchange with the atmosphere depending on the phase of ENSO [Roemmich and Gilson, 2011].
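The sensitivity of short linear trends to ENSO-like variability can be illustrated with a synthetic series. This is a sketch under purely illustrative assumptions, not a reproduction of any published estimate: the underlying heating rate (0.3 W m⁻²), the oscillating surface flux amplitude (1 W m⁻²), and the 4 year period are assumed values, and `ols_slope` and `fitted_rate` are hypothetical names. OHC is carried in units of W m⁻² yr (heat per unit Earth surface area divided by seconds per year), so a fitted slope is directly a flux in W m⁻².

```python
# Illustrative sketch: how an ENSO-like oscillation aliases into short linear trends.
# All parameters are assumptions chosen for illustration, not values from the text.
import math

TRUE_RATE = 0.3             # assumed underlying heating, W m^-2
FLUX_AMP = 1.0              # assumed ENSO-like flux amplitude, W m^-2
PERIOD_YR = 4.0             # assumed ENSO-like period, years
OMEGA = 2.0 * math.pi / PERIOD_YR
OHC_AMP = FLUX_AMP / OMEGA  # integrating FLUX_AMP*cos(OMEGA*t) gives OHC_AMP*sin(OMEGA*t)


def ols_slope(t, y):
    """Ordinary least squares slope of y against t."""
    n = len(t)
    t_bar = sum(t) / n
    y_bar = sum(y) / n
    num = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    den = sum((ti - t_bar) ** 2 for ti in t)
    return num / den


def fitted_rate(years, samples_per_year=12):
    """OLS trend (W m^-2) of synthetic monthly OHC (W m^-2 yr) over `years`."""
    t = [i / samples_per_year for i in range(years * samples_per_year)]
    ohc = [TRUE_RATE * ti + OHC_AMP * math.sin(OMEGA * ti) for ti in t]
    return ols_slope(t, ohc)


for years in (8, 43):
    rate = fitted_rate(years)
    print(f"{years:2d}-yr window: {rate:.3f} W m^-2 (true {TRUE_RATE})")
```

With these assumptions the 8 year fit underestimates the underlying rate by roughly a quarter, while the 43 year fit is within about 1% of it. Starting the window at a different phase of the oscillation changes the sign and size of the short-window error, which illustrates the sensitivity to the chosen period noted above.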
In addition to its open ocean component, the OHC signal related to ENSO has a large contribution from a more coastally trapped component, particularly across the Indonesian Throughflow and along the coastal waveguide of some boundary currents [e.g., Wijffels and Meyers, 2004]. This coastal component is not observed by Argo floats. Since these shallow areas display ENSO variability that can be large enough to impact a global integral, it is possible that part of the differences between OHC time series and their 2005–2012 rates arises from how different groups deal with these shallower regions (e.g., inclusion of non-Argo data and/or gap infilling techniques).
5.1.2.3 Short-Term Variability
The detection of the global imprint of interannual variability, such as ENSO or other short-lived fluctuations in OHC, is more challenging than the detection of the superimposed long-term changes. Inconsistencies in the magnitude and timing of short-lived variability among the individual OHC analyses are apparent during most of the historical record (Figures 14a1–14a3), including the episodic impact of explosive volcanic eruptions (e.g., 1963, 1982, and 1991), for which the expected cooling signal seems more obvious in some of the estimates [e.g., Domingues et al., 2008; Palmer et al., 2007; Palmer and Haines, 2009] and in climate model simulations (Figure 14a1). This large spread in short-term variability among OHC estimates mainly reflects the greater influence that differences in data quality, bias corrections, limited ocean coverage, mapping techniques, etc., have on interannual and shorter timescales [Palmer et al., 2010]. In addition, some of the OHC time series (Figures 14a1–14a3) exhibit a step change around 2003, which coincides with a major transition in the global observing system from XBTs to Argo floats (Figure 13). Such major shifts in observing systems can introduce artifacts into the spatiotemporal variability and trends of climate records and should be examined more closely.
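One simple way to screen a series for a step such as the one near 2003 is to fit a linear trend and a step offset at a candidate break year simultaneously by least squares. The sketch below is an assumed diagnostic, not the method used by any of the groups cited here; `solve3` and `trend_and_step` are hypothetical names, and time is centred on its mean to keep the normal equations well conditioned. A fitted step can also absorb genuine variability (e.g., ENSO), so it flags, rather than proves, an observing-system artifact.

```python
# Sketch: estimate a step offset at a candidate break year (e.g., the XBT-to-Argo
# transition) jointly with a linear trend, y = a + b*(t - t_mean) + c*H(t - t_break).


def solve3(a, b):
    """Solve the 3x3 system a x = b by Gaussian elimination with partial pivoting."""
    m = [list(row) + [bi] for row, bi in zip(a, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def trend_and_step(t, y, t_break):
    """Return (slope, step) from a least squares fit of a trend plus a step."""
    t_mean = sum(t) / len(t)  # centre time to keep the normal equations well conditioned
    rows = [[1.0, ti - t_mean, 1.0 if ti >= t_break else 0.0] for ti in t]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    aty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    _, slope, step = solve3(ata, aty)
    return slope, step


# Synthetic example: trend of 0.3 per year plus an offset of 2.0 from 2003 onward.
years = list(range(1993, 2013))
series = [0.3 * (yr - 1993) + (2.0 if yr >= 2003 else 0.0) for yr in years]
print(trend_and_step(years, series, 2003))
```

In practice the candidate break would be scanned over a range of years and the fitted step compared with its uncertainty; the synthetic series above only checks that the fit recovers a known trend and offset.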
Observations of interannual OHC variability in the tropical Pacific have greatly improved since the TAO/TRITON array of moored buoys was established in the mid-1980s. More generally, confidence in interannual variability of global upper OHC has increased since 2005, following the dramatic improvement in open ocean coverage by Argo floats, at least for the upper ~2000 m.
5.1.3 Concluding Remarks on Upper Ocean Heating
The recent discovery of time-dependent biases in bathythermograph data [Gouretski and Reseghetti, 2010] led to improved estimates of OHC and ocean thermal expansion. These helped to refine our understanding of the major role of ocean heat storage in the Earth's energy balance and to close the historical sea level budget (within uncertainties), which had been one of the key uncertainties in the IPCC AR4 [Solomon et al., 2007]. Bias-corrected estimates also considerably increased confidence in climate model simulations and in the detection and attribution of anthropogenic ocean warming since the 1970s [Gleckler et al., 2012; Pierce et al., 2012].
Current OHC observational estimates consistently show that the global ocean has significantly warmed in the upper 700 m, at least since 1970. One study [Levitus et al., 2012] documents that the upper 2000 m of the world ocean has warmed since 1955 and estimates that 30% of the warming has occurred in the 700–2000 m layer. Although different estimation approaches and instrumental corrections have been applied to the subsurface ocean temperature data in an attempt to account for the irregular coverage and measurement methods of the observing system (sections 2, 3, and 4), the combined impact of these structural uncertainties does not prevent the detection of a statistically significant increase in upper OHC at multidecadal timescales; however, it does contribute to a spread in estimates of warming rates. Further systematic comparisons between OHC analyses are needed to understand the spread in multidecadal rates, to isolate the impact of individual structural uncertainties, and to develop best practices for analyses. Despite the greatly improved open ocean coverage of the upper 700 m by the Argo array since 2005, a wide spread in interannual rates for 2005–2012 remains. Further systematic comparisons may also help to understand differences in estimates over this relatively short and well-sampled time interval.
Present observational estimation approaches and instrumental bias corrections are perhaps two significant sources of uncertainty in OHC estimates, and neither is likely to be perfect. Future refinements of methods and of the observational database can narrow the spread in warming rates and strengthen the key conclusion of increasing heat storage in the upper ocean since the late twentieth century. Indeed, longer-term upper ocean warming is found through an analysis [Roemmich et al., 2012] comparing data from the 1872–1876 HMS Challenger global voyage [Wyville and Murray, 1885] with data from modern Argo floats [Gould et al., 2004]. This result agrees with findings of century-timescale ocean warming from observations in the upper 400 m, albeit within a rather broad uncertainty range due to extremely sparse ocean observations before the 1970s [Gouretski et al., 2012].
5.2 Deep Ocean Heating
Though variations in deep ocean temperatures are small compared with those in the upper ocean, the large volume of the deep ocean makes its contribution to the global energy balance significant [Purkey and Johnson, 2010]. The deep ocean (>700 m) has been estimated to have absorbed 42% (80.4 × 10²¹ J) of the 193 × 10²¹ J stored in the ocean between 1955 and 2003 [Church et al., 2011]. Variability of the heat content of the deep ocean modulates both the energy budget of the climate system and global sea level [IPCC, 2007]. Therefore, a comprehensive closure of the global energy budget [e.g., Trenberth and Fasullo, 2010] and a precise attribution of observed changes in sea level are not possible unless variations of the deep ocean heat content are formally evaluated. Moreover, by inducing changes in the baroclinic pressure gradients, regional variations in deep ocean heat content can alter the meridional overturning circulation (MOC) and therefore lead to complex feedbacks to the climate system [Rintoul et al., 2012]. Although the sparseness in space and time of observations accurate enough to resolve the subtle deep ocean temperature changes makes this variability difficult to evaluate, the consistency of recent analyses is encouraging.
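The inventory quoted above can be checked and placed on the same W m⁻² scale used for the upper ocean trends with a short calculation. The 48 year interval (2003 minus 1955) and the 365.25-day year are assumptions; the heat totals and Earth surface area are from the text.

```python
# Arithmetic check of the 1955-2003 inventory quoted above, plus the implied
# average flux over Earth's surface (assumes a 48-year interval, 2003 - 1955).
EARTH_AREA_M2 = 5.1e14               # from the text
SECONDS_PER_YEAR = 365.25 * 86400.0  # assumption: Julian year

total_j = 193e21       # heat stored in the whole ocean, 1955-2003
deep_j = 80.4e21       # portion stored below 700 m
years = 2003 - 1955    # assumption: end-to-end interval length

deep_share = deep_j / total_j
total_flux = total_j / (years * SECONDS_PER_YEAR * EARTH_AREA_M2)
deep_flux = deep_j / (years * SECONDS_PER_YEAR * EARTH_AREA_M2)
print(f"deep share: {deep_share:.0%}")
print(f"mean flux: total {total_flux:.2f} W m^-2, below 700 m {deep_flux:.2f} W m^-2")
```

This reproduces the 42% share and corresponds to roughly 0.25 W m⁻² for the whole ocean, of which about 0.10 W m⁻² is stored below 700 m, averaged over 1955–2003.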
Comparisons of recent deep hydrographic observations with earlier data reveal changes in Antarctic Bottom Water (AABW) characteristics occurring on decadal timescales. Bottom waters of Antarctic origin in the deep South Atlantic [e.g., Johnson and Doney, 2006; Meredith et al., 2008], Pacific [Fukasawa et al., 2004; Kawano et al., 2006; Johnson et al., 2007a], and South Indian Oceans [e.g., Johnson et al., 2008a] have all warmed over the last few decades, indicating a change in the Antarctic contribution to the MOC. In the Argentine Basin, the recent findings are in agreement with the warming and decreasing volume of dense waters observed during the 1980s [Coles et al., 1996]. The relatively large temperature variability (~0.1°C) in warm deep waters of the Scotia Sea resembles changes observed earlier upstream in the Weddell Sea [e.g., Fahrbach et al., 2004], which are in turn associated with interannual changes in the circulation set by large-scale atmospheric circulation patterns [Meredith et al., 2008]. Warming in the Atlantic abyssal waters has also been observed in the Vema Channel, the conduit through which AABW connects the Argentine and Brazil Basins [Hogg and Zenk, 1997; Zenk and Morozov, 2007]. These data show a warming of ~0.03°C per decade from 1991 to 2007. In addition, bottom waters close to Antarctica in the Indian and Pacific sectors [Johnson et al., 2007a; Rintoul, 2007; Swift and Orsi, 2012; Purkey and Johnson, 2013] appear to have freshened, consistent with decadal timescale freshening in the source regions for this bottom water [Jacobs et al., 2002].
The above observations suggest a circumpolar spreading of deep-sea warming patterns propagated from the Weddell Sea, most notable in the western Atlantic, the eastern Indian, and the central Pacific Oceans [Purkey and Johnson, 2010]. Furthermore, pressure changes carried by planetary waves can propagate temperature changes occurring in regions of water mass formation around Antarctica to waters farther north on relatively short timescales [Kawano et al., 2006]. Thus, regional deep ocean temperature variations could affect the global ocean on timescales much shorter than those of advection: decades as opposed to centuries. Finally, there are suggestions in the South Atlantic and South Pacific [Kouketsu et al., 2011], the North Atlantic [Johnson et al., 2008b; Frajka-Williams et al., 2011], and the North Pacific [Kouketsu et al., 2009] that the MOC associated with AABW may have slowed, consistent with the near-global contraction of this water mass [Purkey and Johnson, 2012].
Above the deep water, Argo data document a large-scale warming and freshening around Antarctica that may be partly associated with a southward shift of the Antarctic Circumpolar Current (ACC) [Gille, 2008]. Given the barotropic nature of the ACC and the vertical coherence of the associated thermohaline structure, it is likely that this warming pattern extends to deeper waters and potentially affects the MOC. However, the sparseness of available data precludes quantification of such an impact. The analysis of Böning et al.  confirms the reported trends in the Southern Ocean but also indicates that there are no significant changes in the tilt of isopycnals, suggesting that ACC transport is insensitive to these observed trends.
Only about two thirds of the long-term altimetry-derived sea level rise can be explained by upper ocean warming and added mass from ice melt. The combined effect of the model deep ocean steric height with observational upper ocean data and mass trends estimated from the Gravity Recovery and Climate Experiment explains the altimetry-derived global mean trend and the regional trends of sea level. These results strongly suggest that changes in the deep ocean heat content manifest themselves in the observed long-term sea level rise trend. Assimilation of temperature and salinity data into an ocean general circulation model leads to a significant increase in deep ocean heat content in the 1990s (~0.8 × 10²² J decade−1). These changes can be estimated from present-day repeat hydrographic observations, but uncertainties are large. This perhaps explains why the various estimates of recent deep ocean heat content are in overall agreement. However, given the continuing increase of greenhouse gases and the significance of deep ocean heat content changes for steric sea level rise and the Earth's energy budget, an improved set of deep ocean observations is required [Garzoli et al., 2010]. At present, only hydrographic observations provide data of the required accuracy and at a decadal frequency (the minimum needed to observe climate change). These observations are limited to specific regions, latitudes, or longitudes but will be complemented through the development of deep-reaching autonomous Lagrangian floats. This new platform may significantly improve the spatial coverage of observations in the remote deep and abyssal ocean.
5.3 Impact of Ocean Measurements on Earth Energy Balances
The key issues for the Earth from an overall energy standpoint are the energy imbalances at the top and bottom of the atmosphere and their changes over time. The energy imbalance can be estimated by an inventory of the rates of changes of energy stored in all components of the climate system, the most important of which is the ocean, and thus changes in the OHC.
As noted earlier, OHC changes account for about 90% of the total energy imbalance. IPCC AR4 [Bindoff et al., 2007] provided an assessment of how much different parts of the climate system contributed to changes over 1961–2003 and 1993–2003. Trenberth  provided a more complete inventory of all the components of the climate system and of changes in solar radiation, and of their contributions to both global energy storage change and sea level rise. This included tracking the slight decrease in solar insolation from 2000 until 2009 with the ebbing 11 year sunspot cycle, enough to temporarily offset 10–15% of the estimated net human-induced warming [Trenberth, 2009].
The changes in energy storage over time do not require absolute accuracy, but they do require precision and reproducibility in observations. That is, they require a consistently stable set of instrumental measurements that may be biased in some, perhaps unknown but relatively time-invariant, way.
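The point that storage changes need stability rather than absolute accuracy can be illustrated with synthetic numbers. In this sketch (illustrative values only; `ols_slope` is a hypothetical helper name), a constant instrumental bias shifts a series without changing its trend, whereas a bias that drifts in time leaks directly into the trend.

```python
# Sketch: why storage changes need stable (precise, reproducible) measurements more
# than absolutely accurate ones. A constant bias shifts the series but leaves its
# trend unchanged; a bias that drifts in time leaks straight into the trend.
# The numbers below are synthetic and purely illustrative.


def ols_slope(t, y):
    """Ordinary least squares slope of y against t."""
    n = len(t)
    t_bar, y_bar = sum(t) / n, sum(y) / n
    num = sum((ti - t_bar) * (yi - y_bar) for ti, yi in zip(t, y))
    return num / sum((ti - t_bar) ** 2 for ti in t)


years = list(range(2000, 2011))
truth = [0.5 * (yr - 2000) for yr in years]            # true storage, arbitrary units
constant_bias = [v + 3.0 for v in truth]               # fixed instrumental offset
drifting_bias = [v + 0.1 * (yr - 2000) for v, yr in zip(truth, years)]  # slow drift

for name, series in [("truth", truth), ("constant bias", constant_bias),
                     ("drifting bias", drifting_bias)]:
    print(f"{name:14s} trend = {ols_slope(years, series):.2f}")
```

The constant offset reproduces the true trend exactly, while the drifting offset adds its own drift rate to it, which is why a change in instruments or processing partway through a record is more damaging to a storage inventory than a stable bias.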
By 2005, the ocean observing system had reached new capabilities, providing regular temperature soundings of the upper 2000 m, giving considerably greater confidence in the OHC assessment. However, the pre-Argo and Argo eras may not be compatible for inventory analysis in determining changes over time. Other observing systems in place can nominally measure the major storage and flux terms, but owing to errors and uncertainty, it remains a challenge to track anomalies with confidence.
5.3.1 Climate System Energy Inventory
Extensive use has been made of conservation of energy and the assumption that on a timescale of years, the change in heat storage within the atmosphere is very small [Trenberth et al., 2009]. According to Hansen et al. , the Lyman et al.  upper ocean heat changes from 1993 to 2008 of 0.64 ± 0.11 W m−2 (robust average, Figure 14e) yield a planetary energy imbalance of 0.80 W m−2 when the other components of the climate system (deeper ocean, melting sea ice and glacial ice, etc.) are taken into account. Levitus et al.  found a smaller upper 700 m heat gain of 0.41 W m−2, yielding a planetary energy imbalance of only 0.57 W m−2 according to Hansen et al. . Hansen et al.  built on the von Schuckmann and Le Traon  and Lyman et al.  values to estimate the planetary energy imbalance as 0.80 ± 0.20 W m−2 for 1993–2008 and 0.58 ± 0.15 W m−2 for 2005–2010 (1-sigma standard errors), where these include the nonocean component discussed earlier.
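The two pairs of numbers quoted above (upper 700 m heating and the planetary imbalance Hansen et al. derived from it) imply, by simple subtraction, the size of the remaining inventory terms. The sketch below makes that arithmetic explicit; the difference is inferred here and is not a value stated in the text, and `implied_other_terms` is a hypothetical name.

```python
# Sketch: the non-upper-ocean terms implied by the pairs of numbers quoted above.
# The difference is inferred here by subtraction; it is not a value stated in the text.
pairs = {
    "Lyman et al. robust average": (0.64, 0.80),  # (upper 700 m, imbalance), W m^-2
    "Levitus et al.": (0.41, 0.57),
}


def implied_other_terms(upper_700m, imbalance):
    """Contribution of all other inventory terms implied by subtraction (W m^-2)."""
    return imbalance - upper_700m


for source, (upper_700m, imbalance) in pairs.items():
    other = implied_other_terms(upper_700m, imbalance)
    print(f"{source}: other terms {other:.2f} W m^-2, "
          f"upper-ocean share {upper_700m / imbalance:.0%}")
```

Both pairs imply the same ~0.16 W m⁻² for the deeper ocean, ice melt, and other components, so on these numbers the difference between the two imbalance estimates comes entirely from the upper ocean term.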
A recently published OHC estimate from Levitus et al. , which has been updated in Figure 15, has values that are nominally for 0–2000 m but in reality cover the depth range to 1750 m. Their estimates of change are very conservative owing to the assumption of a zero-anomaly first guess, which is problematic in a changing climate. Instead, it is important to account for large-scale changes, for example through a first guess based on the global or zonal averages of all other observations [Hurrell and Trenberth, 1999]. A key result, however, is the growing disparity between the OHC changes in the upper 700 m and 0–2000 m after 2005. From 1993 to 2011, the rate of increase of OHC 0–2000 m in their estimate is about 0.57 W m−2 per unit area of the globe, and this rate applies also for 2005–2010.
In addition, ORAS4, a much more comprehensive ocean reanalysis from the European Centre for Medium-Range Weather Forecasts, reinforces this result. ORAS4 assimilates not only ocean temperature and salinity data but also sea level and sea surface temperature fields into a global ocean model driven by surface winds and fluxes based on atmospheric reanalyses [Balmaseda et al., 2013a]. Balmaseda et al. [2013b] analyzed the OHC from ORAS4 for 1958 through 2009. Volcanic eruptions and El Niño events are identified as sharp cooling events punctuating a long-term ocean warming trend. Heating continues during the recent hiatus in upper ocean warming: in the last decade, about 30% of the warming has occurred below 700 m, contributing significantly to an acceleration of the warming trend. The warming below 700 m remains even when the Argo observing system is withdrawn. Sensitivity experiments indicate that surface wind variability is largely responsible for the changing vertical distribution of ocean heat. In the 2000s, the ocean warming rate is 0.84 W m−2.
In summary, after the effects of Mount Pinatubo died away in ~1994, several estimates support the view that the energy imbalance was 0.8 to 0.9 W m−2 until ~2004 [Trenberth et al., 2011; Trenberth and Fasullo, 2011; Hansen et al., 2011; Balmaseda et al., 2013b]. From 2004 to 2010, the quiet sun reduced the energy imbalance by 0.1 to 0.15 W m−2, and there was a noticeable slowing of the increase in OHC above 700 m depth, which has led to reduced estimates of the overall energy imbalance of 0.6 to 0.8 W m−2 [Trenberth and Fasullo, 2011; Hansen et al., 2011]. Moderate volcanic eruptions during this time may also have reduced the warming [Neely et al., 2013]. However, estimates of OHC trends above 700 m from 2005 to 2012 (Figure 14f) range from 0.2 to 0.4 W m−2, with large, overlapping uncertainties, highlighting the remaining issues of adequately dealing with missing data in space and time and of how OHC is mapped, in addition to remediating instrumental biases, quality control, and other sensitivities.
5.4 Ocean Temperature Measurements and Thermosteric Sea Level Rise
Both the volume and mass of the global ocean, and thus global mean sea level, change across a variety of timescales, due to the expansion and contraction of water as ocean temperatures and heat content change, and to changes in the mass of glaciers and ice sheets. Relative sea level, the height of the ocean relative to the land, also changes as part of climate variability and change (as water is redistributed around the ocean) and in response to vertical movement of the Earth's crust and changes in the Earth's gravitational field.
The uptake of heat by the ocean results in expansion of the ocean (thermosteric sea level rise). Because the ocean has the largest heat capacity of the climate system, and the ocean thermosteric rise is one of the largest contributors to the late 20th (and projected 21st) century sea level rise [Church et al., 2011; Meehl et al., 2007], the Earth's energy and sea level budgets must be consistent. However, this consistency is only a weak constraint because for the same amount of heat, melting of land ice gives a much larger sea level rise than ocean thermosteric rise.
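Why the energy budget only weakly constrains the sea level budget can be seen by comparing the sea level rise produced by the same amount of heat through the two routes. This sketch uses rounded, illustrative constants that are assumptions rather than values from the text: a uniform thermal expansion coefficient of 2 × 10⁻⁴ K⁻¹ (the real coefficient varies strongly with temperature and pressure and is much smaller in cold water), an ocean area of 3.6 × 10¹⁴ m², and the latent heat of fusion of ice. The function names are hypothetical.

```python
# Sketch: sea level rise per unit of added heat, thermal expansion vs melting land ice.
# All constants are rounded, illustrative assumptions; the expansion coefficient in
# particular varies strongly with temperature and pressure (smaller in cold water).
ALPHA = 2.0e-4            # assumed thermal expansion coefficient, K^-1
RHO_SEAWATER = 1025.0     # kg m^-3
CP_SEAWATER = 3990.0      # J kg^-1 K^-1
OCEAN_AREA_M2 = 3.6e14    # approximate ocean surface area, m^2
LATENT_FUSION = 3.34e5    # J kg^-1, latent heat of fusion of ice
RHO_FRESHWATER = 1000.0   # kg m^-3


def thermosteric_rise_m(heat_j):
    # Heat warms a layer of thickness H by Q/(rho*cp*A*H); the layer expands by
    # alpha*dT*H, so H cancels: dh = alpha*Q/(rho*cp*A) (uniform alpha assumed).
    return ALPHA * heat_j / (RHO_SEAWATER * CP_SEAWATER * OCEAN_AREA_M2)


def meltwater_rise_m(heat_j):
    # Heat melts grounded ice already at 0 C and the meltwater spreads over the
    # ocean; ignores the heat needed to warm the ice to its melting point.
    melt_mass_kg = heat_j / LATENT_FUSION
    return melt_mass_kg / RHO_FRESHWATER / OCEAN_AREA_M2


q = 1e22  # J, roughly 0.6 W m^-2 over Earth's surface for one year
print(f"thermal expansion: {thermosteric_rise_m(q) * 1e3:.1f} mm")
print(f"land-ice melt:     {meltwater_rise_m(q) * 1e3:.0f} mm")
```

With these assumptions the same 10²² J raises sea level by about 1.4 mm through thermal expansion but by roughly 80 mm if it melts land ice, a factor of about 60.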
Early estimates of global averaged ocean heat uptake and the related thermosteric rise [Levitus et al., 2005; Antonov et al., 2005; Ishii et al., 2006] were adversely affected by XBT biases already discussed. These calculations are also sensitive to the techniques used to interpolate across data gaps [Gregory et al., 2004].
Biases in the XBT data have been substantially (but not completely) reduced by various proposed corrections (see section 3). For 1955 to 2010, the most recent trends in thermosteric sea level rise are about 0.41 mm yr−1 for the upper 700 m [Ishii and Kimoto, 2009; von Schuckmann and Le Traon, 2011] and 0.54 mm yr−1 for the upper 2000 m [Levitus et al., 2012].
To address both the XBT biases and the implicit assumption of zero anomaly where there were no data, Domingues et al.  used a reduced-space optimal interpolation scheme in combination with the XBT corrections of Wijffels et al. . The linear trends in global averaged thermosteric height are 0.5 ± 0.1 mm yr−1 from 1961 to 2003 [Domingues et al., 2008], 0.68 ± 0.1 mm yr−1 from 1972 to 2008, and 0.82 ± 0.3 mm yr−1 from 1993 to 2008 [Church et al., 2011]. The Lyman et al.  robust average ocean warming estimate would give a somewhat larger ocean thermal expansion.
For waters deeper than 700 m, ocean heat content estimates remained dependent on deep ocean bottle and CTD casts until the implementation of the Argo Program [Gould et al., 2004], which dramatically improved global sampling to a depth of 2000 m, particularly in the Southern Ocean. The most recent estimates using Argo data are a thermosteric rate of rise of 0.69 ± 0.14 mm yr−1 from 2005 to 2010 [von Schuckmann and Le Traon, 2011]. Below 2000 m, deep ocean CTD transects remain the dominant data set and provide only sparse (spatial and temporal) coverage. Purkey and Johnson  estimate a deep ocean (below 2000 m) contribution of 0.1 ± 0.1 mm yr−1 from circa 1992 to 2005.
6 Concluding Remarks and Future Directions
This paper brings together a broad set of perspectives and information on oceanographic temperature measurements and their implications for climate change. Included are discussions of the history of temperature measurements, the primary instruments that have been used to make the measurements, and their associated accuracy. Additionally, it has been shown how ocean temperature observations are a key to understanding climate change. In particular, observational studies show an increase in ocean heat content since the 1970s that is an order of magnitude larger than that of any other component of Earth's energy budget (e.g., atmosphere, land, cryosphere) [Bindoff et al., 2007; Church et al., 2011; Hansen et al., 2011], and the associated ocean thermal expansion (thermosteric sea level) is one of the major contributions to historical global sea level rise [Church et al., 2011].
It is apparent from this review that the oceanographic observational network has improved substantially over the past century, both in the quantity of measurements and in their quality. This improved network provides a much better understanding of Earth's energy imbalance, ocean warming, and thermosteric sea level rise. At the same time, much of the ocean is still unmonitored, so uncertainty remains. In particular, the deep ocean, marginal seas, and areas beneath sea ice need improved monitoring, and given its role in heat content and sea level, special attention should be given to the deep ocean. Palmer et al.  show that integrating OHC over increasing depth provides an increasingly close estimate of the net TOA radiation in climate models (within 0.05 W m−2 for the full ocean depth).
Overall, the major future challenges are (i) to maintain the current ocean subsurface observing system, (ii) to expand it to a truly global coverage—including coastal and ice-covered regions and extending vertically to abyssal regions—through existing and emerging instrumental technologies, and (iii) to sustain some level of overlap for the complementary in situ and satellite observing systems to facilitate rapid detection of instrumental biases and intercalibration of sensors [Church et al., 2011; Wijffels et al., 2010] and to potentially track possible major climatic events, such as the impact of an explosive volcanic eruption.
With respect to (ii), even with the advances to the observing system culminating in the Argo array, more than 50% of the ocean is without routine observations. Important areas such as boundary currents, which are responsible for large poleward heat transport, need higher-frequency observations than are currently provided by Argo.
6.1 Some Future Directions of Instrumentation
Fortunately, technological advances are being made on all fronts. Arvor3500, a profiling float capable of diving to 3500 m, has been constructed in prototype, and Deep NINJA Argo (4000 m capable float) test floats have already been deployed (Argonautics newsletter 13, August 2012, http://www.argo.ucsd.edu/newsletter.html). These floats will contribute to a subset of the Argo fleet (Deep Argo) that will periodically measure the global ocean to depths beyond the current 2000 m. Deep APEX and SOLO floats are also under development with target profiling depths of 6000 m. Under-ice measurements are also being realized. Klatt et al.  and Wong and Riser  detail floats that use an ice detection algorithm to stay submerged, recording under-ice data until it is safe to surface and relay them via satellite.
Another method for gathering under-ice temperature profiles is with the use of instrumented animals, most commonly pinnipeds [Fedak, 2012]. These animals are able to sample the upper 700 m or so in areas of ice cover that are inaccessible to conventional observing platforms.
An important development for future improved sampling of the deep ocean, continental shelves, boundary currents, and marginal seas is the ocean glider. Gliders [Rudnick et al., 2004; Rudnick and Cole, 2011] are similar to Argo floats in that they use an external bladder to regulate buoyancy for descent and surfacing, and they are also similar in shape, but with the addition of wings and/or tails. These attachments aid navigation, as the gliders can be “flown” by changes in dive angle, which are sent to the onboard software when the glider communicates with a satellite at the surface. Gliders are under the control of a pilot, so they can be deployed to carry out a set geographic and depth sampling plan, with updated instructions when needed. This makes them ideal for measurements in boundary currents, on shelves, and in marginal seas. Autonomous oceanographic instruments capable of making hydrographic measurements can now reach depths of 6000 m (see http://www.whoi.edu/page.do?pid=38144).
Development of deep gliders [Osse and Eriksen, 2007] is ongoing, and these instruments may make an important contribution to future monitoring of the deep ocean. However, unlike Argo floats, gliders are used primarily for individual projects, with minimal coordination as part of the global observing system. Some coordination planning has begun, but it is still a long way from incorporating gliders into a coherent blueprint for sampling the global ocean on a timely basis.
6.2 Improved Observational OHC Estimates and Analysis Methodologies
A high-quality subsurface ocean temperature database, along with accurate and comprehensive metadata, is an important prerequisite for advancing knowledge of instrumental biases (e.g., XBT/MBT) and devising more accurate corrections that further reduce uncertainties in OHC estimates. Although much has been done to retrieve original data and metadata and to improve quality control procedures, more work is required to realize the potential of the historical subsurface ocean observing system (originally designed and funded with a focus on short-term forecasting) to advance our understanding of climate change and to develop improved prediction systems that can underpin 21st century decision making and policy assessments [Wijffels et al., 2010; Levitus et al., 2012]. Improvements to the quality of the historical ocean temperature database and metadata information for climate research purposes are currently being planned through a global coordinated effort (http://www.clivar.org/organization/gsop/activities/clivar-gsop-coordinated-quality-control-global-subsurface-ocean-climate).
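One reason metadata matter for bias corrections is that an XBT does not measure depth directly: depth is inferred from the elapsed time since the probe entered the water through a quadratic fall-rate equation, z(t) = a·t - b·t², so revising the coefficients a and b shifts every sample in a profile. The sketch below compares two coefficient sets commonly quoted for T-7/Deep Blue probes (the original manufacturer equation and the revision of Hanawa et al., 1995); the function and constant names are hypothetical, and the coefficients should be checked against the primary sources before any real use.

```python
def xbt_depth(t_s, a, b):
    """Depth (m) inferred from elapsed time t_s (s) via z = a*t - b*t**2."""
    return a * t_s - b * t_s ** 2


# Coefficient pairs (a in m/s, b in m/s^2) commonly quoted for T-7/Deep Blue probes.
MANUFACTURER = (6.472, 2.16e-3)   # original manufacturer fall-rate equation
HANAWA_1995 = (6.691, 2.25e-3)    # revised coefficients (Hanawa et al., 1995)

t = 100.0  # seconds after the probe enters the water
z_old = xbt_depth(t, *MANUFACTURER)   # about 625.6 m
z_new = xbt_depth(t, *HANAWA_1995)    # about 646.6 m
```

At t = 100 s, the revised equation places the same temperature sample about 21 m (roughly 3%) deeper, which illustrates how a fall-rate error alone can masquerade as a warm or cold bias wherever temperature varies with depth.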
Another large source of uncertainty in OHC estimates is the variety of methodological approaches (e.g., mapping practices and climatological references) used by different research groups. To better understand and quantify the structural uncertainties arising from these methods, a comprehensive project is underway [Boyer et al., 2013]. In this project, a series of systematic intercomparisons is being carried out for a number of sensitivity tests that vary parameter choices while using agreed temperature databases (i.e., the same input data). It is hoped that this project will provide guidance for developing best practices, for instance on how best to infill observational gaps.
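How an infilling choice alone can change an estimate is easy to demonstrate on a toy field. The sketch below applies two simple gap-filling rules to the same anomaly data: setting unsampled cells to zero anomaly (i.e., to the climatology) or to the mean of the sampled cells. The data, the method names, and the use of equal cell weights (no area weighting) are all hypothetical simplifications; real mapping methods are considerably more sophisticated.

```python
from statistics import fmean


def global_mean(anomalies, infill):
    """Mean anomaly after filling gaps (None) according to `infill`.

    infill="zero": gaps set to 0 (i.e., assumed equal to the climatology)
    infill="observed_mean": gaps set to the mean of the observed cells
    Equal cell weights are assumed for simplicity (no area weighting).
    """
    observed = [a for a in anomalies if a is not None]
    if infill == "zero":
        fill = 0.0
    elif infill == "observed_mean":
        fill = fmean(observed)
    else:
        raise ValueError(f"unknown infill method: {infill}")
    return fmean(a if a is not None else fill for a in anomalies)


# Toy anomaly field (deg C); None marks unsampled cells.
field = [0.30, 0.20, None, None, 0.10, None]
estimates = {m: global_mean(field, m) for m in ("zero", "observed_mean")}
spread = max(estimates.values()) - min(estimates.values())
```

With identical input data, the two rules give mean anomalies of 0.1 and 0.2 deg C, so the spread between them (0.1 deg C) is purely structural. In a warming ocean with sparse sampling, filling gaps with the climatology tends to pull the estimate toward zero.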
Such improvements in the historical ocean observations and in the methods used to estimate OHC will lead to better understanding of, and increased confidence in, past and present OHC changes at global and regional scales. They will also allow more critical assessments of heat uptake and ocean thermal expansion in climate model simulations and will help to better constrain ocean data assimilation efforts and future climate model predictions and projections. Expanded efforts to compare energy balances estimated from reservoir heating with TOA measurements from satellites will further improve confidence in the energy budget.
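Comparing reservoir heating with TOA measurements requires expressing a heat-storage trend (in joules per year) as a mean flux per unit area (in W m−2). The sketch below divides by the number of seconds in a year and by Earth's total surface area, the usual normalization for the TOA imbalance. The input trend is an illustrative round number, not a published estimate, and the names are hypothetical.

```python
EARTH_SURFACE_AREA_M2 = 5.1e14          # approximate surface area of Earth (m^2)
SECONDS_PER_YEAR = 365.25 * 24 * 3600   # Julian year (s)


def heating_rate_to_flux(joules_per_year, area_m2=EARTH_SURFACE_AREA_M2):
    """Convert a heat-storage trend (J/yr) to a mean flux (W/m^2) over area_m2."""
    return joules_per_year / (SECONDS_PER_YEAR * area_m2)


# Illustrative trend only (not a measured value).
flux = heating_rate_to_flux(1.0e22)  # about 0.62 W/m^2
```

A storage trend of 10^22 J yr−1 thus corresponds to roughly 0.6 W m−2 averaged over the globe; normalizing by ocean area instead would give a larger number, so the chosen area must be stated when comparing estimates.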
Even before these future improvements to ocean monitoring are realized, past and present measurements show that the Earth is experiencing a net gain in heat, largely from anthropogenic factors [Hansen et al., 2005; Levitus et al., 2001], although the magnitude differs among individual studies. For ocean heat content, there have been multidecadal increases in energy content over the entire water column. Two recent detection and attribution analyses [Gleckler et al., 2012; Pierce et al., 2012] have significantly increased confidence, relative to the IPCC AR4 report, that the warming (and associated thermal expansion) observed in the upper 700 m of the ocean during the late twentieth century is largely due to anthropogenic factors. For sea level, despite spatial and temporal nonuniformity, the global rate of rise has been approximately 3 mm yr−1 over the past 20 years, with a large contribution from thermal expansion.
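To first order, the thermal expansion contribution to sea level is the depth integral of the thermal expansion coefficient α times the temperature change, Δη ≈ ∫ α ΔT dz, which can be evaluated as a sum over layers. The sketch below uses hypothetical layer thicknesses, expansion coefficients, and warming values chosen only to show the arithmetic; in practice α depends on temperature, salinity, and pressure and would be computed from an equation of state.

```python
def thermosteric_rise_m(layers):
    """First-order thermosteric sea level change (m).

    Sums alpha * dT * thickness over (thickness_m, alpha_per_K, dT_K) layers,
    i.e., a discrete form of the integral of alpha * dT over depth.
    """
    return sum(h * alpha * dT for h, alpha, dT in layers)


# Hypothetical layers: (thickness m, thermal expansion coefficient 1/K, warming K).
layers = [
    (700.0, 2.0e-4, 0.10),   # upper ocean: larger alpha in warmer water
    (1300.0, 1.0e-4, 0.02),  # intermediate depths: smaller alpha and warming
]
rise_mm = 1000.0 * thermosteric_rise_m(layers)  # about 16.6 mm
```

In this toy case, a 0.1 K warming of the upper 700 m contributes 14 mm and the weaker deeper warming adds 2.6 mm, showing why both accurate temperatures and adequate deep sampling matter for thermosteric estimates.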
It is difficult to overstate the importance of ocean temperature measurements for anyone attempting to understand the present and future impacts of human emissions on the climate. A long-term high-quality record of ocean temperature observations is crucial for constraining our understanding of climate change. This is particularly so because the oceans account for the majority of heat uptake and thermal buffering. This manuscript serves as a historical perspective and a future road map for the oceanographic community.
N.L.B. acknowledges support from the ARC Centre of Excellence for Climate Systems Science. L.J.C. was supported by the MOST project (grant 2012CB417404). J.A.C. and S.W. were funded by the Australian Climate Change Science Program. C.M.D. was funded by the Australian Antarctic and Ecosystems Research Cooperative Centre. J.G. was supported through NOAA grant NA17RJ1231 (Scripps Institution of Oceanography). S.A.G. was supported by the Joint DECC/Defra Met Office Hadley Centre Climate Programme (GA01101). V.G. was supported through the Cluster of Excellence “CLISAP” (EXC177), University of Hamburg, funded through the German Science Foundation. T.B., J.M.L., and G.C.J. were supported by the NOAA Climate Program Office and NOAA Research. A.P. was supported by the Inter-American Institute for Global Change Research through the US National Science Foundation grant GEO-0452325. K.E.T. and J.T.F. were sponsored by NASA under grant NNX09AH89G. F.R. was supported by EC FP7 project MyOcean2 and operationally supported in part by NOAA/AOML.
The Editor on this paper was Eelco Rohling. He thanks three anonymous reviewers for their review assistance on this manuscript.