Optimal interpolation of global dissolved oxygen: 1965–2015

Oxygen inventory of the global ocean has declined in recent decades potentially due to the warming‐induced reduction in solubility as well as the circulation and biogeochemical changes associated with ocean warming and increasing stratification. Earth System Models predict continued oxygen decline for this century with profound impacts on marine ecosystem and fisheries. Observational constraint on the rate of oxygen loss is crucial for assessing the ability of models to accurately simulate these changes. There are only a few observational assessments of the global oceanic oxygen inventory reporting a range of oxygen loss. This study develops a gridded data set of dissolved oxygen for the global oceans using optimal interpolation method. The resulting gridded product includes full‐depth map of dissolved oxygen as 5‐year moving average from 1965 to 2015 with uncertainty estimates. The uncertainty can come from unresolved small‐scale and high‐frequency variability and mapping errors. The multi‐decadal trend of global dissolved oxygen is in the range of −281 to −373 Tmol/decade. This estimate is more conservative than previous works. In this study, the grid points far from the observations are essentially set equal to zero anomaly from the climatology. Calculating global inventory with this approach produces a relatively conservative estimate; thus, the results from this study likely provide a useful lower bound estimate of the global oxygen loss.


| INTRODUCTION
There is a growing consensus that the global oxygen inventory has declined over the past several decades (Bindoff et al. 2019). Distribution of oceanic oxygen (O 2 ) reflects the interplay between ocean circulation, biological respiration and air-sea gas exchange. Long-term loss of O 2 is ultimately caused by the imbalance between the O 2 supply and biological consumption, likely driven by the ocean heat uptake (Keeling et al., 2010;Oschlies et al., 2018). In coastal waters, excess nutrient input from land also plays a major role Rabalais et al., 2010).
Many physical and biochemical processes are involved with the ocean O 2 cycling. Relatively efficient surface gas exchange maintains the surface ocean close to saturation with the overlying atmosphere for most of ice-free regions. Decomposition of organic matter in the interior ocean consumes oxygen below the surface. At steady state, the biological consumption must be balanced by the supply via ocean ventilation. The terminology, ventilation, refers to the vertical exchange of waters between the surface layer and the ocean interior (Talley et al., 2011). As the seawater warms up, its ability to hold O 2 decreases due to the temperaturesolubility relationship. In addition, ocean ventilation and biogeochemical processes can change in response to the warming and increasing upper-ocean stratification. Ocean ventilation covers a wide range of transport processes including wind-driven shallow overturning circulation (Brandt et al. 2015;Duteil et al., 2014;Eddebbar et al., 2019), the formation of mode and intermediate waters (Claret et al., 2018;Gnanadesikan et al., 2012;Sallée et al., 2010), the lateral eddy stirring (Gnanadesikan et al., 2013;Gnanadesikan et al., 2015;Rudnickas et al., 2019) and the deep meridional overturning circulations (Gnanadesikan et al., 2007;Gordon, 1966;Palter & Trossman, 2018; van Aken et al., 2011). These circulation systems are driven by atmospheric winds and buoyancy fluxes with significant interannual, decadal and multi-decadal variability, which then causes fluctuations in the rate of O 2 supply into the ocean interior (Duteil et al., 2018;Kwon et al., 2016;McKinley et al., 2003;Ridder & England, 2014).
Earth System Models have become powerful tool to simulate the complex interplay of these processes behind the long-term trend and interannual variability of O 2 . These models predict that the loss of oxygen continues throughout this century under unabated emission of greenhouse gases Cocco et al. 2013). These changes will have profound influences on marine ecosystem and nutrient cycling (Breitburg et al. 2018;Levin 2018). In order to assess the ability of the models to reproduce these processes, it is necessary to validate the model against observations. Observational estimates of the past oxygen loss over the last several decades are therefore crucial for assessing the model skill. However, there are only a few assessments of the global oceanic oxygen inventory, reporting a range of oxygen loss over the last several decades. Schmidtko et al. (2017) estimated the linear trend of 961 ± 429 Tmol/decade since 1960, which is approximately 2% ± 1% loss of global oxygen inventory over last past 50 years.
The objective of this study is to develop the threedimensional, time-varying data set of dissolved oxygen for the global oceans including uncertainty estimates. Optimal interpolation is applied to quality-controlled bottle O 2 data provided by the World Ocean Database 2018 (hereafter, WOD18) . In order to control the source of uncertainty, this study uses the bottle O 2 data only, unlike the previous studies of Schmidtko et al. (2017) and Ito et al. (2017, hereafter I17) who blended the bottle and CTD measurements. Because of the limited number of source data, there will be large data gaps, and optimal interpolation method is used to fill data gaps. There are three major differences from the earlier work of I17 including the mapping parameters for objective mapping, data sources and the method of interpolation. In addition, there are minor changes such as the choice of climatology to calculate oxygen anomalies. The differences resulting from different data processing are examined in detail. Below is the structure of this paper. Section 2 introduces the method. Section 3 is the analysis of the data, including comparison to prior study. Section 4 briefly describes data access, and the section 5 concludes this paper.

| METHOD
Dissolved oxygen is one of the most frequently measured chemical tracers in the ocean. There are approximately 2.8 million temperature, 2.4 million salinity, and 0.9 million oxygen vertical profiles in the Ocean Station Data (OSD) reported to the World Ocean Database 2018 (hereafter, WOD18) . WOD18 is a compilation of quality-controlled oceanographic data set contributed by the international scientific community. Dissolved oxygen concentrations in the OSD profile are typically measured by modified 'Winkler titration' method (Carpenter, 1965;Winkler, 1888). Most modern oxygen chemical titration measurements are based on Carpenter's whole bottle titration method and an amperometric or photometric end-detection with uncertainty of about 1 µM. There are other data sources such as Conductivity-Temperature-Depth instruments (CTD) and profiling floats, but the majority of the existing oxygen measurements are in the Winkler titration data, which will be used in this study.
While the number of OSD measurement is the largest among all historical observational platforms, the CTD measurements have increased after 1990s and that of profiling floats are rapidly increasing in recent years. It is beyond the | 169 ITO scope of this paper to discuss how best data from these different platforms can be integrated into the data set, and it is left for the future studies.

| Preprocessing
The preprocessing of the data is necessary to prepare the WOD18 data before mapping. This step also includes additional checks for data quality. The original WOD18 profiles are placed into bins which are the 1° x 1° longitude-latitude grid cells with 102 vertical depth levels referenced to the standard depths of WOD18. Basin mask is used to include data points only from the four major ocean basins including Atlantic, Pacific, Indian and Southern Oceans. Other ocean basins and marginal seas are excluded from this analysis. The target analysis period is after 1965 when the modern oxygen titration method is established by Carpenter as referenced above. Some of the data from most recent years are not yet included in the database, so the analysis ends in 2015.
First, questionable or unrepresentative data marked by the WOD18 quality flags are excluded and only acceptable data are retained for the analysis. Additional quality control is applied similar to I17. Each Winkler titration data are compared to the monthly climatological oxygen concentration from the World Ocean Atlas 2018 (WOA18)  and the outliers are identified and excluded from the binned data. I17 calculated its own climatological mean value whereas this study uses WOA18 which is widely accepted as the standard climatological product in the oceanographic community, which is a minor deviation from the earlier work. The outliers are identified with above or below three times the sample standard deviation (s) relative to the monthly climatological mean. The choice of this threshold is arbitrary and this specific choice (3s) follows I17. The monthly climatology is available for the upper 1,000 m of the water column, and the annual mean is used for the deeper waters.
Second, the quality-controlled data points are averaged for each bin at 1°x1° and monthly resolution where statistical mean, sample-variance and sample size are recorded from 1962 to 2017. Sample variance can reflect small-scale variability within the 1°x1° longitude-latitude bins and sampling noise, which is used as a measure of background noise in the later analysis. While the target period is from 1965 to 2015, additional two years are included in preparation for the pentadal analysis. The averaged oxygen concentrations are recorded as the anomalies from the monthly climatological mean. Figure 1 shows the sampling pattern for each decade. The number of bottle data itself is higher during 1970s and 80s but the sampling density is skewed towards North Atlantic and North Pacific basins. The global ocean is more evenly sampled during 1990s and 2000s.
Finally, the monthly anomalies are averaged into yearly anomalies. The binned data are very sparse at the monthly timescale. For each year, the monthly anomaly data are averaged into yearly anomaly neglecting the months with missing data. The yearly averaged anomalies and its yearly variance are recorded. This step increases data coverage significantly while averaging out high-frequency variability in the data. The variance field is retained and will be used to estimate the high-frequency variability and will be used for the uncertainty analysis. In addition, a 5-year moving window averaging is applied to the yearly anomaly neglecting the years with missing data. This further increases the data coverage while averaging out variability on the timescale shorter than 5 years. Its variance is recorded as a part of high-frequency variability. The resulting yearly binned data cover the 50-year period from 1965 to 2015.

| Optimal interpolation
Once the preprocessing step is complete, an objective map is assembled to estimate oxygen anomalies for all 1°x1° grid cells following Breatherton et al. (1976). This is based on optimal interpolation providing the least-square estimate of oxygen anomaly field on a regularly spaced grid cells. This process minimizes the mean square error of the mapped data for given observational data points. The binned oxygen anomaly, X(t), is expressed as a (N × 1) vector where N is the number of binned data. The objective map of oxygen anomaly, Y(t), is a (M × 1) vector, where M is the number of grid cells.
There are two covariance matrices; D is the M × N datagrid covariance and C is the N × N data-data covariance matrix. ϵ is the noise-to-signal variance ratio. Since the time series in each bin is too sparse to compute the autocovariance matrices, the autocovariance function between the two points denoted by indices m and n, separated by a distance L mn , is prescribed using isotropic Gaussian function with the e-folding length scale of L ref where C mn = exp( − L 2 mn ∕2L 2 ref ) . Matrix D follows the same definition but for data-grid covariance. The reference e-folding length scale (L ref ) is set to 1,000 km. The value of ϵ is estimated from the noise-tosignal variance ratio of the binned anomaly data. Optimal interpolation (Equation 1) minimizes the mean square error, and Y approaches to 0 with a strong noise (ϵ ≫1) or when the data-grid covariance is small. The latter would be the case for grid points far from observations. Conversely, a (1) Y = D(C + ∈ I) − 1 X. ITO smaller noise level tends to increase the sensitivity. In some cases where the estimated ϵ falls below a certain threshold, its magnitude is prescribed to the minimum value of 0.1. This calculation is applied to four basins separately where all observational data points are used to map each of the four basins except for the Atlantic map excluding the data points from the Pacific basin and for the Pacific map excluding the data points from the Atlantic basin. The calculation is repeated for all depth levels and the 50-year period annually . Figure 2 shows an example of the output from the objective map (Equation 1). It also shows the distribution of the binned data. The mapped field generally reflects the binned anomaly data. Data density varies significantly.
Uncertainty analysis is therefore important and necessary to correctly interpret the mapped data.

| Error analysis
Using the Gaussian autocovariance matrices from the previous section, the mean square error of the objective mapping is calculated following Breatherton et al. (1976). The coefficient of determination (R 2 ) for the least-square fit (Y m ) is calculated as R 2 = D m (C + I) -1 D m T where D m is the m-th row (1-R 2 ) where σ Y 2 is the total variance of Y. This represents the mapping error and its example from 2005 is shown as Figure 3a. The mapping error is elevated away from the cruise track and is the dominant source of uncertainty in the tropical and the southern hemisphere oceans.
There are small-scale and high-frequency variability that have been averaged out in the preprocessing step. The general formula for error propagation can be applied to estimate uncertainties from these fluctuations (Taylor, 1997). Equation 1 can be re-written such that each element of Y can be expressed as the weighted sum of the elements in X; Y m = w mn X n using Einstein summation convention. If the unresolved 'noise' is random and uncorrelated, the standard error (σ m ) can be calculated as (w 2 mn s 2 n ) 0.5 where s 2 n represents the noise variance of the binned data. This formula is used to map the standard error due to unresolved sub-grid scale variance within 1° x 1° monthly bins, and its example is shown as Figure 3b. This noise variance is calculated including all years, and it does not vary over time. The overall amplitude is relatively small (<5 µM). Its spatial pattern reflects regions of increased climatological gradients such as near the edges of the tropical Pacific oxygen minimum zones (OMZs) and frontal regions in the Southern Ocean. The preprocessing step also excluded high-frequency variability from monthly to 5-year timescale in order to increase the spatial coverage. Potential drivers of such high-frequency variability can be influenced by regional or large-scale atmospheric and oceanic circulation. Thus, the assumption of uncorrelated noise is not applicable for this type of variability. A more conservative estimate of the uncertainty can be given by the arithmetic sum of the weighted standard deviation, σ m = w mn s n , which is an upper bound for the uncertainty estimate, and this is adopted for the unresolved high-frequency variability (Figure 3c). Similar to the sub-grid scale noise, the statistics is calculated for all years and it does not vary over time. The regions of intense high-frequency variability include frontal regions, western boundary currents, eastern boundary upwelling regions, edges of tropical OMZs and in the Southern Ocean fronts. These regions have strong climatological oxygen gradients and/or are the region of elevated physical variability such as wind-driven upwelling, subduction, and ocean eddies and jets (Cabré et al., 2015;Claret et al., 2018;Deutsch et al., 2011;Kwon et al., 2016;Rudnickas et al., 2019;Sasano et al., 2015;Stramma et al., 2010).

| Global loss of oxygen from 1965 to 2015
The new results include quantitative estimates of uncertainty for each grid cell; thus, it is more straightforward to compute the uncertainty of the global inventory (Taylor, 1997). The small-scale noises are assumed to be random and uncorrelated. Gaussian autocovariance functions are used to assemble the uncertainty associated with mapping errors and high-frequency variability. The e-folding scale of 1,000 km is used for horizontal directions, and in the vertical direction, the e-folding scale is set to 300 m. First, the anomalies are calculated for the vertically integrated column inventory as a function of longitude and latitude, and then they are integrated horizontally to yield the global integral. The 95% confidence intervals are calculated as two times the standard error of the combined uncertainty. In conclusion, while the new gridded results are consistent with the earlier gridded product of I17, the treatment of missing values in the global -20 -10 0 + 10 +20 -20 -10 0 + 10 +20 -20 -10 0 + 10 +20 Objective map of O2 anomaly (2005, 350m), µM integration of I17 resulted in significantly stronger estimate of oxygen loss. Figure 4 shows the global oxygen inventory time series from 1965 to 2015 with uncertainty estimates as 95% confidence interval. Based on this analysis, it is visually certain that global oxygen inventory has declined in the last 50 years. The estimate from this study is significantly more conservative than the previous work, and even so, the 99-percentile linear trend is negative for both 0-700 m depth range (−54 TmolO 2 per decade) and full water column (−273 TmolO 2 per decade).

| Comparison with an earlier study
In this section, the optimal interpolation of oxygen anomaly field is compared with an earlier product by I17. Figure 5 shows the comparison for the upper thermocline at 350 m depth from a specific year (1980,2005). The result from this study is shown in Figure 5a,b. The uncertainty due to mapping error is calculated for each grid point, which can be used as a mask for optimally interpolated oxygen field. Grid cells with significant mapping error (R 2 < 0.3) are marked with hatch. The results from this study are compared with an early study of I17 (Figure 5c,d). Comparing Figure 5a,b and c,b, the two maps are similar in terms of general patterns and locations of positive and negative anomalies. However, there are some notable differences. For example, the negative anomalies in the Sea of Okhotsk in 2005 have larger amplitude in I17 than this study. In contrast, this study and I17 show similar values in the tropical Atlantic in 2005. There are a number of technical changes in the data processing and analysis.
There are three major factors that this study is different from I17. First, the parameters used to map oxygen anomalies are different. Zonally elongated features of oxygen anomalies are evident in I17 (Figure 5b) because of the different Gaussian functions used in the mapping. I17 used longer length scale for the east-west direction than the north-south direction by a factor of 2, causing zonally elongated patterns. Secondly, the data source is updated. I17 used WOD13, and this study updated to WOD18 with some addition of newer data. I17 included both Winkler O 2 and CTD measurements, whereas this study uses Winkler O 2 only. There is a trade-off between including more data points and introducing potential biases with CTD-O 2 . Fewer data points may result in more conservative estimates. Finally, the treatment of data gaps is different. This study assigns smoothly varying values to all grid cells but I17 does not assign values to a grid cell if there is no observation within the radius of influence. As a result, the oxygen anomaly field in this study has more smoothed appearance while spatial patterns are similar. In practical terms, optimal interpolation essentially assigns zero anomaly to the grid cells far from observational constraints based on the Gaussian autocovariance function. This has implication for the calculation of global inventory as discussed below.
The global oxygen inventory is compared between this study and I17 ( Figure 6). The earlier product of I17 only covers to 1,000 m depth so this comparison is focussed on this depth range. The two cases are calculated from the I17 data set according to the treatment of missing data. The first case integrates oxygen anomalies while missing data points are replaced with zeros (red line in Figure 6). In any given year, more than 40% of the ocean grid cells in I17 are missing values because there are no measurements within the radius of influence. This has led to a concern about the amplitude of large-scale change if missing grid cells are set to zero anomaly. The second case (gold line in Figure 6) attempts to address this issue by essentially replacing missing values with the global mean concentration from the same year. This F I G U R E 6 The time series of global oxygen inventory in the upper 1,000 m; (black) this study with 95% confidence interval as grey shading, (red) I17 data with 'no correction' meaning no correction is applied for data gaps. Missing data points are set to zero anomaly. (gold) I17 data. This is the published version with missing data points being replaced with the global mean. This results in large amplitudes 1965 1970 1975 1980 1985 1990 1995  ITO increases the amplitude of the signal significantly as seen in Figure 6. For example, if 70% of grid cells are missing, the missing grid points are assigned to the average of the grid cells filled with the data (30%), effectively amplifying the signal by a factor of 3 (=100/30). This can generate significant amplification of the signal especially for the years when sampling density is low.
New results from this study (the black line in Figure 6) overlap with the conservative version of I17 replacing the missing value with zeros, but it clearly does not overlap with the version replacing the missing data with the global mean. The conservative version of I17 is essentially similar to the optimal interpolation generating a more conservative estimate of the globally integrated oxygen loss. For grid cells far from observational constraints, the signal decays and anomalies are close to zero, producing a more conservative global inventory change. Thus, the treatment of data gaps plays a crucial role for the assessment of the global inventory.

| DATASET ACCESS
This data set is available from Biological & Chemical Oceanography Data Management Office (BCO-DMO) with the Digital Object Identifier (DOI) of 10.26008/1912/bcodmo.816978.2. This data set includes the gridded oxygen anomaly at yearly resolution including its uncertainty estimates. The data set is gridded in 1° x 1° longitude-latitude grid with 102 vertical levels. The gridded data as well as metadata are contained in a single netCDF file. The data are freely open and available with no restrictions.

| CONCLUSIONS
The loss of oxygen from the global ocean is expected under the greenhouse gas emissions and the resultant ocean heat uptake. It is crucial to develop observationally derived gridded data product to evaluate the patterns and magnitudes of the past oxygen loss from recent decades. In this study, a new full-depth gridded oxygen product is generated using Winkler titration measurements only. The results are broadly consistent with the previous work of I17 in terms of the spatial patterns. The new product has more smoothed horizontal structure and conservative amplitudes, and it includes uncertainty estimates from mapping errors, unresolved small-scale and high-frequency variability.
The calculation of global inventory depends on the treatment of missing data. In this study, grid cells far from the observational constraints are essentially filled with zero anomaly. This study calculates the global inventory smaller than previous estimates, and the results from this work should be considered as a lower bound estimate of the global oxygen loss. It is beyond the scope of this work to reconcile the difference with different global inventory estimates and it is left for the future study. This issue may be related to the fact that current Earth System Model simulations forced by the historical greenhouse gas emissions tend to underestimate the rate of oxygen . This underestimation is not only in inventories but also in terms of the mean concentrations (see fig 4.13 of Long et al., 2019). It is again beyond the scope of this paper to reconcile observed and modelled oxygen changes, and future studies are warranted. The data set developed in this study is freely available from Biological & Chemical Oceanography Data Management Office, and it is hoped to stimulate further discussion and research.