We develop a new observationally derived monthly ocean surface climatology for the partial pressure of CO2 (pCO2) that allows an independent data-based constraint on contemporary air-sea CO2 fluxes. Our approach uses a neural network, trained on ~17,800 bottle-derived measurements of pCO2, to diagnose monthly pCO2 levels from standard ocean hydrographic data. Although the pattern of contemporary air-sea CO2 fluxes is generally consistent with the independent underway pCO2 data network, we find a strong shift in the magnitude of oceanic sources and sinks of CO2. In particular, we find a contemporary Southern Hemisphere oceanic CO2 uptake of 0.93 PgC/year, driven by a prominent CO2 sink in the subpolar region (25°S–60°S), that is five times the magnitude of the Northern Hemisphere oceanic sink (0.18 PgC/year). Globally, our results suggest a net open-ocean CO2 sink of 1.55 ± 0.32 PgC/year for the nominal year of 2000.
1 Introduction
 Understanding how the ocean modulates atmospheric carbon dioxide (CO2) on higher-frequency (seasonal to interannual) scales is important since the ocean is absorbing up to one third of anthropogenic CO2 emissions based on a range of data-based and modeling estimates [Wanninkhof et al., 2012, and references therein]. Although our constraint on the global air-sea CO2 flux has improved over recent years, large uncertainties remain, particularly in understanding higher-frequency regional air-sea fluxes. Indirect methods have been useful in providing insights for regional fluxes using inversions of atmospheric CO2 [Gurney et al., 2008] or ocean interior data [Gruber et al., 2009]; however, they require the use of uncertain transport models and data synthesis methods. We have therefore relied solely on global underway pCO2 measurements and their synthesis [Takahashi et al., 2009] (hereinafter referred to as T-09) as the only direct data-based constraint for contemporary air-sea CO2 fluxes. Although the underway pCO2 data network has given us tremendous insight into the distributions of contemporary air-sea CO2 fluxes, considerable uncertainties remain. The Northern Hemisphere oceans, for example, are well constrained for oceanic pCO2 due to autonomous sampling undertaken mainly by commercial ships of opportunity. In the Southern Hemisphere, however, where ships of opportunity are sparse, large spatial and temporal data gaps exist in the underway pCO2 network. Where underway pCO2 measurements are sparse, simplistic interpolation schemes are required, contributing to large regional uncertainty in constraining contemporary air-sea CO2 fluxes. Here we present a new, independent monthly ocean surface pCO2 distribution using an observationally derived empirical technique that diagnoses pCO2 from biogeochemical information. This new pCO2 climatology provides an additional data-based constraint on spatial and temporal patterns of contemporary air-sea CO2 fluxes throughout the ocean.
2 Global Training Data Set
 For over 20 years, global oceanographic measurement programs like the World Ocean Circulation Experiment and Climate Variability and Predictability have collected and analyzed hundreds of thousands of in situ bottle carbon measurements of total dissolved carbon dioxide (CT) and total alkalinity (AT) along with standard hydrographic parameters (SHPs; temperature, salinity, dissolved oxygen, and nutrients) [Key et al., 2004, 2010]. In situ concentrations of CT and AT allow us to calculate pCO2 concentrations through well-known dissociation constants of CO2 in seawater (see auxiliary material), thereby providing a global-scale independent pCO2 data set (Figure 1). Although spatiotemporal coverage of the global bottle-derived pCO2 data set is too sparse on its own, the coinciding SHPs provide powerful biogeochemical information that can help diagnose pCO2 where only SHP data exist. Empirical predictions of CO2 from hydrographic properties have been successfully deployed to quantify decadal anthropogenic CO2 accumulation [McNeil et al., 2001; Wallace, 1995] and contemporary air-sea CO2 fluxes [Bates et al., 2006; McNeil et al., 2007]. If a robust empirical relationship can be established, it can be applied to much larger hydrographic data sets, thereby providing an independent data-based constraint on monthly ocean surface pCO2 distributions and air-sea CO2 fluxes.
3 Neural Network Overview
 Here we couple a neural network clustering algorithm with a principal component regression (PCR) to diagnose monthly surface pCO2 distributions in the ocean. In our approach, the algorithm captures larger-scale ocean dynamics via clustering data into “biogeochemical fingerprints” in a self-organizing map (SOM) [Kohonen, 1988]. In brief, the SOM approach utilizes bottle-derived pCO2 measurements and SHP distribution information, along with geographical constraints, to iteratively cluster the bottle measurements into a set of J neurons based on similarities and homogeneity within the data set. Using an algorithm that employs discrete clustering is appealing, as it removes the need for any ad hoc data partitioning to help empirically constrain the system. This has led to application of SOMs in a wide range of disciplines [e.g., Abramowitz, 2005; Hsu et al., 2002; Telszewski et al., 2009]. After the SOM routine has clustered the multidimensional data set, PCRs are derived between pCO2 and the SHPs using data within each neuron, each of which can be thought of as a local-scale optimizer that follows the global nonlinear optimization analysis performed by the SOM. To then predict pCO2 using any independent set of SHP measurements, a similarity measure is first used to determine which neuron best represents the SHP measurements, and then the pCO2 value is predicted using the regression parameters established with training data of that neuron. We call this approach self-organizing multiple linear output (SOMLO) [Sasse et al., 2012] (see auxiliary material for more details).
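 The two-stage logic described above (assign each sample to its most similar neuron, then apply that neuron's own regression) can be sketched as follows. This is a simplified illustration and not the SOMLO implementation: the neuron weight vectors are supplied directly rather than learned by a SOM, ordinary least squares stands in for the principal component regression, and all function names and the synthetic data are hypothetical.

```python
import numpy as np

def best_matching_unit(x, neurons):
    """Index of the neuron closest (Euclidean distance) to each row of x."""
    d = np.linalg.norm(x[:, None, :] - neurons[None, :, :], axis=2)
    return d.argmin(axis=1)

def fit_local_regressions(shp, pco2, neurons):
    """Fit one linear model per neuron from the training samples assigned to it."""
    labels = best_matching_unit(shp, neurons)
    coefs = {}
    for j in range(len(neurons)):
        mask = labels == j
        if mask.sum() <= shp.shape[1]:      # too few samples for a stable fit
            continue
        design = np.column_stack([np.ones(mask.sum()), shp[mask]])
        coefs[j] = np.linalg.lstsq(design, pco2[mask], rcond=None)[0]
    return coefs

def predict(shp_new, neurons, coefs):
    """Predict pCO2 using the regression of each sample's best-matching neuron."""
    labels = best_matching_unit(shp_new, neurons)
    out = np.full(len(shp_new), np.nan)     # NaN where a neuron has no regression
    for i, j in enumerate(labels):
        j = int(j)
        if j in coefs:
            out[i] = coefs[j][0] + shp_new[i] @ coefs[j][1:]
    return out

# Synthetic example: two "regimes" in a 2-variable, standardized predictor space.
rng = np.random.default_rng(0)
shp = np.vstack([rng.normal(-2, 0.3, (200, 2)), rng.normal(2, 0.3, (200, 2))])
true = np.where(shp[:, 0] < 0, 350 + 10 * shp[:, 0], 400 - 5 * shp[:, 1])
neurons = np.array([[-2.0, -2.0], [2.0, 2.0]])  # stand-in for trained SOM weights
coefs = fit_local_regressions(shp, true, neurons)
pred = predict(shp, neurons, coefs)
```

 Because each synthetic regime is exactly linear, the local fits recover it almost perfectly; a single global linear model could not, which is the motivation for clustering before regressing.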
4 Application to the Global Data Set
 Training the SOMLO model is conducted using bottle-derived pCO2 measurements within the mixed layer where coinciding SHPs exist. We further refined the pCO2 training data set to be post-1980, due to large uncertainties in early measuring techniques, and excluded coastal margins to mitigate terrigenous biases on coastal samples (see auxiliary material for more details). The final number of usable mixed-layer measurements to train our global model (22,688) was derived from 293 cruises (see auxiliary material for list) and data from the Bermuda Atlantic (BATS) [Bates, 2007] and Hawaiian Ocean (HOT) [Keeling et al., 2004] time series stations.
 In order to account for the influence of oceanic uptake of anthropogenic CO2, we normalized the pCO2 data to the nominal year of 2000 in a similar way to T-09 by assuming constant CO2 disequilibrium with the atmosphere (see auxiliary material). Because some regions are known to break this assumption (e.g., some high-latitude regions [Lenton et al., 2012]), we performed a sensitivity analysis by training and testing our SOMLO model using data without anthropogenic corrections. Comparison between the two approaches suggests insignificant impacts on the model's ability to predict ocean surface pCO2, and therefore on our final air-sea flux results (see auxiliary material for more details).
 To determine the optimal parameter combination and SOM size, we employed the same approach as outlined in Sasse et al. For our SOMLO analysis, the SHP parameter set that captured global pCO2 with the highest skill was a combination of temperature, salinity, dissolved oxygen, and phosphate. The inclusion of geographical information in classifying the data set into 49 neurons also enhanced the global skill of the SOMLO technique by ~9% (see auxiliary material). Due to missing phosphate or dissolved oxygen measurements in some bottle samples, the number of data points available to train our optimal model was 17,753.
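 One way to read the constant-disequilibrium assumption is that the ocean-minus-atmosphere pCO2 difference observed in the sampling year is carried unchanged to the reference year. The sketch below illustrates that reading only; the function name and the atmospheric values are placeholders, and the exact procedure used here is the one described in the auxiliary material.

```python
def normalize_to_reference_year(pco2_obs, year_obs, atm_pco2_by_year, ref_year=2000):
    """Shift an observed surface pCO2 (µatm) to ref_year while holding the
    air-sea disequilibrium (ocean minus atmosphere) constant."""
    disequilibrium = pco2_obs - atm_pco2_by_year[year_obs]
    return atm_pco2_by_year[ref_year] + disequilibrium

# Placeholder atmospheric pCO2 values (µatm), for illustration only.
atm = {1990: 354.0, 2000: 369.0}
normalized = normalize_to_reference_year(340.0, 1990, atm)  # 369.0 + (340.0 - 354.0)
```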
5 Testing the New Approach
 To independently test our model, we individually excluded the entire dataset from each cruise or time series station during the SOMLO training process and then used the excluded SHP information to predict pCO2 concentrations. Comparison between our independent predictions and bottle-derived pCO2 measurements indicates that SOMLO estimates follow a normal distribution with a bias of 0.08 µatm and standard deviation of 22.5 µatm (n = 17,350; see auxiliary material for more details). Although there were twice as many summertime measurements as winter, we found no seasonal bias from our independent testing of SOMLO (see auxiliary material).
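 The withholding scheme above is a leave-one-group-out test, with each cruise or time series station acting as a group. A minimal sketch follows, assuming a generic `fit`/`predict` pair; here a plain least-squares model stands in for SOMLO, and the cruise labels and data are synthetic.

```python
import numpy as np

def leave_one_group_out(x, y, groups, fit, predict):
    """Predict every sample from a model trained with that sample's whole
    group (e.g. cruise) withheld; return residual mean (bias) and std."""
    pred = np.empty(len(y), dtype=float)
    for g in np.unique(groups):
        test = groups == g
        model = fit(x[~test], y[~test])
        pred[test] = predict(model, x[test])
    residuals = pred - y
    return residuals.mean(), residuals.std(ddof=1)

def fit_ols(x, y):
    """Stand-in model: ordinary least squares with an intercept."""
    design = np.column_stack([np.ones(len(x)), x])
    return np.linalg.lstsq(design, y, rcond=None)[0]

def predict_ols(beta, x):
    return beta[0] + x @ beta[1:]

rng = np.random.default_rng(1)
x = rng.normal(size=(300, 2))
y = 360 + 8 * x[:, 0] - 3 * x[:, 1] + rng.normal(0, 5, 300)  # noise sd = 5
cruise = rng.integers(0, 10, 300)                            # hypothetical cruise IDs
bias, sd = leave_one_group_out(x, y, cruise, fit_ols, predict_ols)
```

 Withholding whole cruises, rather than random samples, avoids testing on measurements taken alongside the training data, so the residual statistics better reflect skill at unsampled locations.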
 To illustrate SOMLO's ability to diagnose temporal pCO2 concentrations using SHP information, we compare our independent pCO2 predictions to the bottle-derived measurements at the 18-year Bermuda Atlantic time series station (Figure 2). Although there is a ~20% underestimate of the peak summertime pCO2 levels, SOMLO is able to reconstruct the 18-year seasonal pattern within its uncertainty range (blue shaded) at a location where no data were used to train the test model. It is important to emphasize, however, that BATS data were included when training our final SOMLO model.
6 Diagnosing Monthly pCO2 Climatologies
 The World Ocean Atlas 2009 (WOA09) project objectively analyzed millions of SHP measurements taken over a 50-year period to constrain monthly 1° × 1° SHP climatologies [Antonov et al., 2010; Garcia et al., 2010a, 2010b; Locarnini et al., 2010]. Using our SOMLO analysis, we exploit these SHP monthly surface ocean climatologies to estimate monthly pCO2 distributions for the nominal year of 2000.
 Large-scale features in our estimated annual mean pCO2 distribution (Figure 3a) are consistent with our broad understanding of CO2-rich waters in the eastern equatorial Pacific via upwelling [Feely et al., 2002] and lower pCO2 levels via solubility drivers in temperate regions. We find a global correlation of 64% between our bottle-derived SOMLO-pCO2 climatology and the underway T-09 pCO2 climatology, indicating that over most of the ocean, these two independent data-based approaches confirm the general spatiotemporal pattern of pCO2 in the ocean. However, in over one third of the ocean, the magnitude of pCO2 concentrations differs distinctly, particularly in the Southern Ocean and equatorial Pacific (see auxiliary material for seasonal distribution plots).
7 Air-Sea CO2 Flux Patterns
 We use our monthly pCO2 distributions to calculate air-sea CO2 fluxes using the equation F = kαΔpCO2, where k = gas-transfer velocity, α = solubility of CO2 in seawater, and ΔpCO2 = difference between ocean surface and atmospheric pCO2. To quantify monthly 1° × 1° air-sea CO2 flux climatologies, we employed the widely used quadratic wind parameterization of k proposed by Wanninkhof et al. in conjunction with the 10 m CCMP wind field product of Atlas et al., and calculated α using WOA09 surface temperature and salinity values (see auxiliary material). We then integrated our monthly flux estimates to quantify the annual open-ocean contemporary flux distribution (Figure 3b), where negative values represent CO2 flux into the ocean.
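 As a rough illustration of the bulk formula, the sketch below evaluates F = kαΔpCO2 for a single grid cell. It assumes the common quadratic form k = a·U²·(Sc/660)^−0.5; the coefficient `coeff`, the Schmidt number, the solubility, and the pCO2 values are placeholders rather than the values used in this study, and the function names are hypothetical.

```python
HOURS_PER_YEAR = 24 * 365
CM_PER_M = 100.0

def gas_transfer_velocity(wind_speed, schmidt, coeff):
    """Quadratic wind-speed form k = coeff * U^2 * (Sc/660)^-0.5.
    Returns cm/hr for wind_speed in m/s; coeff is supplied by the caller."""
    return coeff * wind_speed ** 2 * (schmidt / 660.0) ** -0.5

def air_sea_flux(k_cm_per_hr, alpha, pco2_sea, pco2_air):
    """F = k * alpha * dpCO2 in mol m^-2 yr^-1.
    alpha in mol m^-3 µatm^-1, pCO2 in µatm; negative F is flux into the ocean."""
    k_m_per_yr = k_cm_per_hr / CM_PER_M * HOURS_PER_YEAR
    return k_m_per_yr * alpha * (pco2_sea - pco2_air)

# Placeholder inputs for one grid cell and one month.
k = gas_transfer_velocity(wind_speed=7.0, schmidt=660.0, coeff=0.3)
flux = air_sea_flux(k, alpha=3.0e-5, pco2_sea=350.0, pco2_air=369.0)
```

 With surface pCO2 below the atmospheric value, ΔpCO2 is negative and so is the flux, matching the sign convention in which negative values represent uptake by the ocean.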
 Our results show large CO2 outgassing in the equatorial region and strong uptake in the subpolar regions (25°–60°), in particular throughout the Southern Hemisphere (Figures 3b and 4). For the equatorial Pacific Ocean (18°S–18°N), our SOMLO analysis estimates a +1.1 PgC/year outgassing, more than double that estimated using T-09 pCO2 values (Figure 4). However, relatively sparse bottle measurements in the eastern equatorial Pacific and large interannual flux variability in the equatorial Pacific [Feely et al., 2002] make it difficult to establish whether our estimates in this region are a true reflection of the long-term trend. For the Southern Ocean, where there is better spatiotemporal data coverage and weaker El Niño/Southern Oscillation influence, our results suggest a contemporary sink for atmospheric CO2 of −0.81 PgC/year, which is much larger than the T-09–based estimate of −0.28 PgC/year, but consistent with a linear empirical approach [McNeil et al., 2007] and estimates from some ocean biogeochemical models [Lenton et al., 2013].
8 Net Oceanic CO2 Uptake Estimate and Uncertainty
 Integrating our contemporary CO2 flux estimates over the global open ocean suggests a net uptake of 1.10 PgC/year for the year 2000. To quantify errors in our contemporary CO2 flux estimates, we evaluated both systematic and random errors in our pCO2 climatologies. The systematic bias of +0.08 µatm calculated using an independent subsample test (see section 5) translates to a 0.02 PgC/year overestimation in the global net flux. For random errors in diagnosing surface ocean pCO2, we use the first standard deviation (σ) in the residual error distribution (22.5 µatm) along with quoted uncertainty in bottle-derived pCO2 (±8 µatm; see auxiliary material). To constrain the net variance in ΔpCO2, we need to additionally consider uncertainty in atmospheric pCO2, which has been estimated to be ±0.2 µatm [Takahashi et al., 2009] with no known systematic offset. Assuming that these uncertainty estimates are all one σ around a normal distribution, we quantify the net variance in ΔpCO2 to be 930.29 µatm² [= (22.5 + 8)² + 0.2²]. The corresponding variance in flux estimates for any 1° × 1° grid cell (i) can be calculated using the following:
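 A relation of this kind follows from standard error propagation through F = kαΔpCO2, assuming that k and α are held fixed within each grid cell at this step (their uncertainties are treated separately in this section) so that only the ΔpCO2 variance propagates. The σ² notation and the subscripts are chosen here for illustration:

```latex
\sigma^2_{F_i} = \left(k_i\,\alpha_i\right)^2 \sigma^2_{\Delta p\mathrm{CO}_2},
\qquad \sigma^2_{\Delta p\mathrm{CO}_2} = 930.29\ \mu\mathrm{atm}^2
```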
 When integrating flux estimates over grid cells (1… i… I), the net variance for uncorrelated uncertainties is the sum of individual variances:
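 For uncorrelated grid-cell uncertainties, the summation can be written as below. Here σ²_{F_i} is taken to be each cell's flux variance expressed in the same area-integrated units as the total, which is an assumption about units; the notation is illustrative:

```latex
\sigma^2_{F} = \sum_{i=1}^{I} \sigma^2_{F_i}
```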
 For our global integrated flux estimate, the variance due to random uncertainties in ΔpCO2 is 1.04 × 10⁻⁴ (PgC/year)², which translates to an uncertainty in our global flux estimate of ±0.03 PgC/year at the 99.7% (3σ) confidence level.
 A second, potentially significant, source of uncertainty in our predicted ocean surface pCO2 climatology relates to reliability in the WOA09 objectively analyzed products. Parameter sensitivity tests identified temperature and phosphate as the two most important parameters for capturing ocean surface pCO2 in our SOMLO model. Despite high confidence in the global WOA09 temperature climatology [Locarnini et al., 2010], sparse in situ phosphate measurements in some ocean regions (e.g., Southern Ocean) contribute to uncertainty in the WOA09 interpolated monthly distributions [Garcia et al., 2010a, 2010b] and therefore our ocean surface pCO2 predictions. As uncertainty estimates in WOA09 objectively analyzed products remain elusive, we are currently unable to quantify uncertainty in our flux estimates due to this issue.
 Combining atmospheric and ocean surface pCO2 sources of error, along with estimated uncertainties relating to k parameterization (±0.2 PgC/year) and wind speeds (±0.15 PgC/year) [Wanninkhof et al., 2012], we calculate a global flux uncertainty of ±0.25 PgC/year [= (0.02² + 0.03² + 0.2² + 0.15²)^0.5].
 Our open-ocean contemporary CO2 uptake of 1.10 ± 0.25 PgC/year for the year 2000 is similar to the estimate of 1.21 ± 0.59 PgC/year derived using T-09 pCO2 distributions, where the uncertainty estimate is taken from the most recent error analysis by Wanninkhof et al. These pCO2-constrained global contemporary air-sea CO2 fluxes are a combination of both natural and anthropogenic CO2 uptake. By including an estimated natural steady state CO2 outgassing of 0.45 ± 0.2 PgC/year from organic matter deposition from rivers [Jacobson et al., 2007], we estimate a net contemporary oceanic CO2 sink of 1.55 ± 0.32 PgC/year for the year 2000.
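 The arithmetic behind the quoted uncertainties in this section can be checked with a few lines. The inputs are the numbers given in the text; combining the riverine and open-ocean uncertainties in quadrature is an assumption made here, chosen because it reproduces the quoted ±0.32 PgC/year.

```python
import math

# Net variance in dpCO2 (µatm^2): residual sd and bottle uncertainty added
# linearly, atmospheric uncertainty added in quadrature, as in the text.
var_dpco2 = (22.5 + 8.0) ** 2 + 0.2 ** 2

# Three standard deviations of the random flux error from the quoted variance.
three_sigma_flux = 3 * math.sqrt(1.04e-4)

# Open-ocean flux uncertainty: quadrature sum of the four quoted terms (PgC/year).
open_ocean_unc = math.sqrt(0.02**2 + 0.03**2 + 0.2**2 + 0.15**2)

# Adding the riverine term (0.45 ± 0.2) to the open-ocean uptake (1.10 ± 0.25).
net_sink = 1.10 + 0.45
net_unc = math.sqrt(0.25**2 + 0.2**2)

print(round(var_dpco2, 2), round(three_sigma_flux, 2),
      round(open_ocean_unc, 2), round(net_sink, 2), round(net_unc, 2))
# prints: 930.29 0.03 0.25 1.55 0.32
```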
 Inversions of ocean interior data [Gruber et al., 2009], atmospheric CO2 inversions [Gurney et al., 2008], atmospheric O2/N2 [Ishidoya et al., 2012; Manning and Keeling, 2006], indirect tracer-based techniques [Khatiwala et al., 2009; McNeil et al., 2003], and ocean general circulation models participating in the RECCAP project (REgional Carbon Cycle Assessment and Processes) suggest that ocean anthropogenic CO2 uptake ranges from 1.9 to 2.5 PgC/year for the year 2000 [Wanninkhof et al., 2012], somewhat higher than both the underway pCO2 estimate of 1.83 ± 0.62 PgC/year [Wanninkhof et al., 2012] and our bottle-derived pCO2 estimate of 1.55 ± 0.32 PgC/year. It is important to emphasize, however, that our new technique constrains the ocean's contemporary CO2 uptake, which includes the net effect of both the anthropogenic and natural CO2 flux. The discrepancy between our net contemporary CO2 constraint and other anthropogenic CO2 estimates could be due to a range of issues including natural variability in the oceanic CO2 sink, discrepancies in quantifying the coastal air-sea CO2 budget, uncertainties in the riverine outgassing signal, undersampling biases, or uncertain transport models. Aside from providing a new constraint on the global net contemporary oceanic CO2 sink, our technique also provides a new independent way to diagnose ocean surface pCO2 distributions, which will be important in helping understand any future changes in the efficiency of the oceanic CO2 sink.
 This work would not be possible without the efforts of the captains, crew, technicians, and researchers who measured and synthesized the global carbon and hydrographic data sets. In particular, we thank the researchers involved in the Global Ocean Data Analysis Project (GLODAP) [Key et al., 2004], CARbon dioxide IN the Atlantic (CARINA) [Key et al., 2010], and PACIFic ocean Interior Carbon (PACIFICA; http://pacifica.pices.jp/). B.I.M. acknowledges the funding provided to him from an Australian Research Council Queen Elizabeth II Fellowship (ARC/DP0880815).