Combined constraints on global ocean primary production using observations and models
Erik T. Buitenhuis,
Tyndall Centre for Climate Change Research, School of Environmental Sciences, University of East Anglia, Norwich, UK
Corresponding author: E. T. Buitenhuis, Tyndall Centre for Climate Change Research, School of Environmental Sciences, University of East Anglia, Norwich Research Park, Norwich NR4 7TJ, UK. (http://tinyurl.com/contacterik)
 Primary production is at the base of the marine food web and plays a central role for global biogeochemical cycles. Yet global ocean primary production is known to only a factor of ~2, with previous estimates ranging from 38 to 65 Pg C yr−1 and no formal uncertainty analysis. Here, we present an improved global ocean biogeochemistry model that includes a mechanistic representation of photosynthesis and a new observational database of net primary production (NPP) in the ocean. We combine the model and observations to constrain particulate NPP in the ocean with statistical metrics. The PlankTOM5.3 model includes a new photosynthesis formulation with a dynamic representation of iron-light colimitation, which leads to a considerable improvement of the interannual variability of surface chlorophyll. The database includes a consistent set of 50,050 measurements of 14C primary production. The model best reproduces observations when global NPP is 58 ± 7 Pg C yr−1, with a most probable value of 56 Pg C yr−1. The most probable value is robust to the model used. The uncertainty represents 95% confidence intervals. It considers all random errors in the model and observations, but not potential biases in the observations. We show that tropical regions (23°S–23°N) contribute half of the global NPP, while NPPs in the Northern and Southern Hemispheres are approximately equal in spite of the larger ocean area in the South.
 Primary production forms the basis of the food web and as such controls how much material and energy is available for the biosphere as a whole. It also forms the entry point of inorganic nutrients and carbon into the food web, and therefore, primary production is the ecosystem flux that is most directly influenced by anthropogenic changes in global biogeochemical cycles. Primary production is of similar magnitude in the land and marine biospheres, or ~100 Pg C yr−1 for gross production and ~50 Pg C yr−1 for net production (net primary production (NPP) = gross production minus autotrophic respiration [Prentice et al., 2001]). This is where the similarity stops. Heterotrophic activity in the land biosphere is dominated by detritivory by bacteria, while in the marine biosphere, it is dominated by herbivory by zooplankton. This leads to very different standing stocks of autotrophs: ~500 Pg C in land plants [Prentice et al., 2001] and ~1 Pg C in marine autotrophic protists [Le Quéré et al., 2005], with an average turnover time for autotrophs of ~10 years on land and ~1 week in the ocean.
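The turnover times quoted above follow directly from dividing the standing stock by NPP. The short sketch below reproduces that arithmetic with the rounded values given in the text; the function name is illustrative only.

```python
# Turnover time = standing stock / net primary production.
# Stocks and fluxes are the rounded values quoted in the text.
DAYS_PER_YEAR = 365.25


def turnover_time_years(stock_pg_c: float, npp_pg_c_per_yr: float) -> float:
    """Mean residence time of carbon in the autotroph pool, in years."""
    return stock_pg_c / npp_pg_c_per_yr


land_years = turnover_time_years(500.0, 50.0)   # land plants
ocean_years = turnover_time_years(1.0, 50.0)    # marine autotrophic protists
ocean_days = ocean_years * DAYS_PER_YEAR

print(f"land: {land_years:.0f} yr, ocean: {ocean_days:.1f} d")
```

With these inputs the ocean value comes out at roughly one week, which matches the order of magnitude stated above.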
 Primary production in the ocean has mostly been measured by incubating samples with 14C and measuring the difference in uptake between light and dark incubated bottles [Steemann Nielsen, 1952]. Interpretation of what 14C incubations measure varies, but there is evidence supporting the conclusion that daytime 14C uptake (which is the case for most in situ measurements) measures NPP [Marra, 2009]. Because there is no consensus on this question, we will assume that 14C uptake measures particulate net primary production (NPP), though this might introduce some unquantified bias into our results.
 NPP is the flux that is the most relevant to global ocean biogeochemical cycles and, therefore, the focus of this study. Initial estimates of global NPP were made by extrapolating in situ measurements in space and time (Figure 1). Large gaps in the data coverage led to a considerable spread in the estimated NPP from in situ measurements. More recently, global NPP has been estimated using remote sensing of surface chlorophyll [Antoine et al., 1996; Behrenfeld and Falkowski, 1997; Mélin, 2003] or surface carbon [Behrenfeld et al., 2005]. These satellite-based estimates all rely on assumptions about the vertical distribution of NPP and the C:Chl (or C:backscattering coefficient) ratio. The improved spatial and temporal coverage provided by remote sensing data has narrowed down the range of estimated values to 46–65 Pg C yr−1, but a large spread remains, mainly because of the uncertainty in the vertical distribution of NPP. Furthermore, no formal error analysis has been done previously, no central (most likely) estimate exists, and the satellite-based estimates have not ruled out the lower NPP values determined from in situ measurements alone.
 Here, we combine model results and observations to constrain global NPP and its uncertainty. We first improve the vertical representation of NPP in the PlankTOM5 global biogeochemical model of Buitenhuis et al.  by implementing and validating the dynamic iron-light photosynthesis formulation of Buitenhuis and Geider . This mechanistic formulation reproduces the effects of iron and light availability on the C:Chl:Fe stoichiometry of phytoplankton and the dynamical influence of this stoichiometry on photosynthetic performance. It is thus well suited to study the spatiotemporal (including depth) distribution of phytoplankton carbon biomass and their activity in a changing light-Fe environment. We use the model and a range of perturbation experiments to calculate global NPP when the model best fits the observations and estimate its uncertainty. Finally, we verify the results using the NSI-MEM (Nitrogen, Silicon and Iron Regulated Marine Ecosystem Model) ocean biogeochemistry model [Shigemitsu et al., 2012].
2 Model Description
2.1 PlankTOM5 Biogeochemical Model
 PlankTOM5.3 is a global ocean biogeochemical model representing five plankton functional types (PFTs): three phytoplankton PFTs (diatoms, coccolithophores, and mixed phytoplankton) and two zooplankton PFTs (microzooplankton and mesozooplankton). The model has 25 state variables, including 10 phytoplankton components (three PFTs with variable Fe:C, Chl:C, and Si:C ratios and fixed macronutrient:C ratios), 2 zooplankton components, 6 dissolved/nutrient components (dissolved inorganic carbon, alkalinity, oxygen, nitrate (NO3), silicate (SiO3), and Fe), and 7 dead particulate/detritus components (small and big particulate organic carbon and iron, dissolved organic matter, and detrital CaCO3 and SiO2). PlankTOM5.3 was developed from PlankTOM5.2 of Buitenhuis et al.  and includes (1) the dynamical photosynthesis model of Buitenhuis and Geider , (2) the biophysical feedback through heat absorption by chlorophyll for each phytoplankton plankton functional type (pPFT) of Manizza [2006; see also Manizza et al., 2008], and (3) a new parameterization of the ballasting effect based on the observations of drag coefficient as a function of the Reynolds number from Ploug et al. . Here, we present only these new equations. Documentation of the other compartments can be found at http://lgmacweb.env.uea.ac.uk/green_ocean/.
2.1.1 Dynamic Photosynthesis Model
 The change in phytoplankton concentration is calculated as
in which P_i^e is the concentration of element e in phytoplankton PFT i, where the elements e are C, Fe, or Chl (N, P, and O occur in fixed ratios to C); d is the fraction of primary production that is exuded as dissolved organic carbon (DOC); PP_i^e is the assimilation rate of phytoplankton (equations (2), (4), and (5)); w_Pi is the total loss rate (equation (6)); and g_Zj^Pi is the grazing rate of zooplankton Z_j on phytoplankton P_i [Buitenhuis et al., 2006, 2010]. The loss rates are the same for all elements.
 The assimilation rates are based on the iron-light colimitation model of Buitenhuis and Geider  as follows: for carbon,
where PCmax is the maximum photosynthesis rate (equation (3)), αChl is the constant initial slope of the PI (photosynthesis-irradiance) curve, θC is the variable chlorophyll-to-carbon ratio, and I is the light intensity (blue plus red) in mol photons m−2 s−1. Light intensity is converted from W m−2 to mol photons m−2 s−1 using a conversion factor of 3.99 for blue light and 5.24 for red light.
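As an illustration of the quantities just defined, the following sketch converts band irradiances to a photon flux and evaluates a light-limited carbon assimilation rate. Two points are assumptions rather than statements from the text: the conversion factors are read as µmol photons per joule (the physically consistent interpretation of 3.99 and 5.24), and the photosynthesis-irradiance curve is taken to be the exponential saturation form commonly used in Geider-type models. The function names and the example units are hypothetical.

```python
import math

# Conversion factors quoted in the text. Assumption: they are in
# micromol photons per joule, so W m-2 becomes micromol photons m-2 s-1.
BLUE_FACTOR = 3.99
RED_FACTOR = 5.24


def photon_flux(e_blue_w_m2: float, e_red_w_m2: float) -> float:
    """Blue plus red light intensity as a photon flux."""
    return BLUE_FACTOR * e_blue_w_m2 + RED_FACTOR * e_red_w_m2


def carbon_assimilation(pc_max: float, alpha_chl: float,
                        theta_c: float, irradiance: float) -> float:
    """Assumed PI curve: P = Pmax * (1 - exp(-alpha * theta * I / Pmax)).

    The initial slope is alpha_chl * theta_c and the rate saturates at pc_max.
    """
    if pc_max <= 0.0:
        return 0.0
    return pc_max * (1.0 - math.exp(-alpha_chl * theta_c * irradiance / pc_max))
```

In this form a higher chlorophyll-to-carbon ratio steepens the initial slope without changing the saturated rate, which is how the variable ratio enters the carbon assimilation.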
 PlankTOM5.3 has fixed C:N ratios and variable C:Fe ratios. Fixed ratio models typically use saturation kinetics to model dependence of growth rate on external nutrient concentrations, while variable ratio quota models use a linear dependence of growth rate on internal nutrient quota. Droop  showed that the law of the minimum can be used to calculate the effect of multiple potentially limiting nutrients. Here we assume that limitation by a saturation model and a quota model can be combined in a single minimum function to represent the maximum photosynthesis rate:
where PCref is the maximum assimilation rate at 0°C, Q10 is the temperature dependence of growth, T is temperature, Q is the internal phytoplankton Fe:C quota, NO3 and SiO3 are the seawater nutrient concentrations, and K½ are the half saturation concentrations for growth.
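A minimal sketch of such a combined minimum function is given below. It assumes a Q10 temperature factor of the form Q10^(T/10), Michaelis-Menten saturation terms for NO3 and SiO3, and a quota term that rises linearly from 0 at a minimum quota to 1 at an optimum quota. The exact quota term and all function and argument names are assumptions made for illustration, not the model code.

```python
def saturation(conc: float, k_half: float) -> float:
    """Michaelis-Menten limitation by an external nutrient (0..1)."""
    return conc / (k_half + conc)


def quota_limitation(q: float, q_min: float, q_opt: float) -> float:
    """Assumed linear Fe:C quota limitation, clipped to the range 0..1."""
    return min(1.0, max(0.0, (q - q_min) / (q_opt - q_min)))


def max_photosynthesis_rate(pc_ref, q10, temp_c, q_fe, q_min, q_opt,
                            no3, k_no3, sio3=None, k_sio3=None):
    """Temperature-scaled maximum rate times the single most limiting factor."""
    limits = [quota_limitation(q_fe, q_min, q_opt), saturation(no3, k_no3)]
    if sio3 is not None:  # assumption: silicate is only passed in for diatoms
        limits.append(saturation(sio3, k_sio3))
    return pc_ref * q10 ** (temp_c / 10.0) * min(limits)
```

Because only the smallest term is applied, relieving a nutrient that is not the most limiting one leaves the rate unchanged, which is the behavior implied by the law of the minimum.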
 The assimilation rate for iron is
where ρhimax is the maximum iron uptake rate at iron limitation, ρlomax (= μmax Qmax) is the steady state iron uptake rate at saturating iron concentrations, Fe′ is the dissolved iron concentration, and K½, Fe is the half saturation concentration for iron uptake.
 The assimilation rate for chlorophyll is
where θCmax is the maximum chlorophyll-to-carbon ratio.
 We simplified the phytoplankton loss rates to a single term that represents respiration, aggregation, and mortality because insufficient data were available to separately represent these processes. Compared to the previous model version [Buitenhuis et al., 2010], the loss term dependent on nutrient limitation was removed, but the quadratic loss term was retained, as follows:
where L is the loss rate and Ldia is the additional loss term for nutrient-limited diatoms.
2.1.2 Biophysical Feedback
 The change in temperature through heat absorption by chlorophyll for each phytoplankton plankton functional type (pPFT) was calculated according to Manizza [2006; see also Manizza et al., 2008], as
 in which T is the temperature, Σ are the sums over the wavelength bands λ (blue, red, and infrared) or pPFTs, E is the light intensity in W m−2, z is depth at the top of each model box, Δz is the depth of each model box, ρ is the density of seawater, Cp is the specific heat of seawater, kw(λ) is the extinction coefficient of pure water, and χ(λ,PFT) is the PFT-specific extinction coefficient of chlorophyll.
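The heating term can be sketched for a single model box as follows. The sketch assumes that light in each band decays exponentially with an attenuation coefficient equal to the pure-water value plus a chlorophyll contribution that is linear in the concentration of each pPFT, and that all attenuated light is converted to heat. The numerical values of density, specific heat, and the optical coefficients are illustrative placeholders, not the model parameters.

```python
import math

RHO_SW = 1025.0  # kg m-3, typical seawater density (illustrative value)
CP_SW = 3990.0   # J kg-1 K-1, typical seawater specific heat (illustrative value)


def heating_rate(e_top, dz, k_w, chi, chl):
    """Return (dT/dt in K s-1, irradiance at the bottom of the box).

    e_top: band -> irradiance at the top of the box (W m-2)
    k_w:   band -> attenuation by pure water (m-1)
    chi:   band -> {pPFT -> chlorophyll-specific attenuation (m2 per mg Chl)}
    chl:   pPFT -> chlorophyll concentration (mg Chl m-3)
    """
    absorbed = 0.0
    e_bottom = {}
    for band, irradiance in e_top.items():
        k = k_w[band] + sum(chi[band][pft] * c for pft, c in chl.items())
        e_bottom[band] = irradiance * math.exp(-k * dz)
        absorbed += irradiance - e_bottom[band]
    return absorbed / (RHO_SW * CP_SW * dz), e_bottom


# Hypothetical two-band, two-pPFT example
e_top = {"blue": 100.0, "red": 100.0}
k_w = {"blue": 0.02, "red": 0.4}
chi = {"blue": {"dia": 0.07, "mix": 0.07}, "red": {"dia": 0.04, "mix": 0.04}}
rate_clear, _ = heating_rate(e_top, 10.0, k_w, chi, {"dia": 0.0, "mix": 0.0})
rate_green, below = heating_rate(e_top, 10.0, k_w, chi, {"dia": 0.5, "mix": 0.5})
```

Applying the function box by box from the surface downward, with the bottom irradiance of one box used as the top irradiance of the next, gives the vertical heating profile.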
2.1.3 Ballasting of Sinking Particulate Organic Carbon by CaCO3 and SiO2
 We recalculated the parameters of the model's ballasting formulation, which increases the sinking speed of big particles as their content of detrital CaCO3 and SiO2 increases. All detrital CaCO3 and SiO2 end up in big particles. Small organic carbon particles sink at a fixed rate of 3 m d−1. First, the dependence of the drag coefficient (Cd) on the Reynolds number (Re) was fit to the data from Ploug et al. [2008, Figure 2], using their measurements on fecal pellets and the data they reproduce from Taghon et al. . This gave Cd = 51 Re^(−0.56). Then, the drag equations [Buitenhuis et al., 2001] were solved offline by iteration, which gave pairs of particle density and sinking speed. Lastly, a simple function relating sinking speed to particle density was derived for use in the model. With the new data, the function changed from concave to convex (i.e., from changing faster at high densities to changing faster at low densities). We therefore changed the function to
in which vsink is the sinking speed of big particles; ρ is the density of the particle, which is calculated from the composition in organic matter, CaCO3, and SiO2; and ρsw is the density of seawater. The densities were calculated from Ploug et al. [2008, Table 3], assuming that the organic matter in the three types of fecal pellets had the same density, giving ρorganic = 1.08 kg L−1, ρCaCO3 = 1.34 kg L−1, ρSiO2 = 1.2 kg L−1, a = 0.0303, and b = 0.6923.
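The first step of this procedure, fitting a power law for the drag coefficient as a function of the Reynolds number, can be illustrated with an ordinary least-squares regression in log-log space. This is a sketch of one common way to obtain such a fit; the fitting method actually used is not stated, and the data points below are synthetic values generated from the reported relationship rather than the measurements of Ploug et al.

```python
import math


def fit_power_law(x, y):
    """Least-squares fit of y = a * x**b via linear regression on log10 values."""
    lx = [math.log10(v) for v in x]
    ly = [math.log10(v) for v in y]
    n = len(lx)
    mean_x, mean_y = sum(lx) / n, sum(ly) / n
    sxx = sum((u - mean_x) ** 2 for u in lx)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(lx, ly))
    b = sxy / sxx
    a = 10.0 ** (mean_y - b * mean_x)
    return a, b


# Synthetic points on the reported curve (not the original measurements)
reynolds = [0.1, 0.3, 1.0, 3.0, 10.0]
drag = [51.0 * r ** -0.56 for r in reynolds]
a_fit, b_fit = fit_power_law(reynolds, drag)
```

On noise-free synthetic input the regression recovers the prefactor and exponent that generated the points; with real measurements the scatter around the line would determine the uncertainty of both.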
2.2 Nucleus for European Modelling of the Ocean (NEMO) Physical Model and Model Forcing
 The PlankTOM5.3 biogeochemical model was run online in the ocean general circulation model (OGCM) NEMOv2.3 [Madec, 2008]. We use the ORCA configuration with a horizontal resolution of 2° longitude and on average 1.1° latitude, and a vertical resolution of 10 m in the top 100 m, increasing to 500 m at 5 km depth. The model has a free surface height [Roullet and Madec, 2000]. It is coupled to a dynamic-thermodynamic sea ice model [Timmermann et al., 2005]. The vertical mixing is calculated at all depths using a turbulent kinetic energy model [Gaspar et al., 1990]. Subgrid eddy-induced mixing is parameterized according to Gent and McWilliams .
 The standard simulation was run from 1920 to 2009. The model was spun up from 1920 to 1947 using constant atmospheric forcing of daily wind and precipitation from the National Centers for Environmental Prediction reanalysis [Kalnay et al., 1996] repeated every year using values for year 1980, followed by varying atmospheric forcing corresponding to each year from 1948 to 2009. Perturbation experiments were initialized with the output of the standard simulation and were run from 2004 to 2009. Results, unless otherwise stated, are analyzed for year 2009.
3 Materials and Methods
3.1 Database and Treatment of Observations
 We synthesized observations of 14C NPP in the ocean (Table S1 in the supporting information, Figures 2a and 2d). The average relative error of the measurements is 19%, decreasing to 12% for values over 10 µg C L−1 d−1. The error was calculated from the replicated, nonzero California Cooperative Oceanic Fisheries Investigations data only (n = 8094).
Table 1. Average Concentrations and Globally Integrated Rates in Perturbation Tests with PlankTOM5.3
 The NPP database contains 50,050 data points. The data are available at http://lgmacweb.env.uea.ac.uk/green_ocean/.biogeodata.html. The data were binned on the model grid (2° in longitude, 1.1° average in latitude, 31 vertical levels, and a monthly climatological year), leaving 22,017 grid points. From a simple average of the observations alone, about half of NPP takes place in the top 20 m and 90% in the top 100 m (Figure 2d). The database contains more data in the low latitudes (Figure 2a), with 69% of the gridded data points in the tropics (23°S–23°N), which make up 42% of the surface area of the ocean. There are no observations south of 25°S in winter. We also compiled a database of vertically integrated NPP containing 7509 data points. The vertical integration greatly reduces the amount of data, leaving 3132 data points on the model grid, about 7 times fewer than in the depth-resolved database.
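The binning step can be sketched as averaging all observations that fall into the same grid cell and climatological month. The sketch below assumes a regular grid with uniform 2° by 1.1° cells and a short hypothetical list of depth-level boundaries; the actual model grid has variable latitude spacing and 31 levels, so the index calculation here is a simplification.

```python
from collections import defaultdict


def bin_observations(obs, dlon=2.0, dlat=1.1,
                     depth_edges=(0.0, 10.0, 20.0, 30.0, 50.0, 100.0)):
    """Average observations per (lon index, lat index, depth level, month).

    obs: iterable of (lon_deg, lat_deg, depth_m, month, value).
    The regular grid and the depth edges are illustrative assumptions.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for lon, lat, depth, month, value in obs:
        i = int((lon % 360.0) // dlon)
        j = int((lat + 90.0) // dlat)
        # Level index = number of interior edges at or above this depth
        k = sum(1 for edge in depth_edges[1:] if depth >= edge)
        cell = sums[(i, j, k, month)]
        cell[0] += value
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}


# Three hypothetical measurements: two share a cell, one is one level deeper
sample = [(10.5, 0.3, 5.0, 1, 10.0), (11.0, 0.6, 8.0, 1, 20.0), (10.5, 0.3, 15.0, 1, 4.0)]
gridded = bin_observations(sample)
```

Averaging within cells is what reduces the number of data points from the raw count to the gridded count, since repeated measurements at the same place and month collapse into one value.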
 The 14C technique measures only particulate NPP. We therefore compared the observations to the particulate NPP in the model. In addition, the model phytoplankton produce 5% of primary production as DOC (equation (1), Table S2), based on the nutrient sufficient data compiled by Nagata . Validating this dissolved NPP falls outside the scope of this paper.
Table 2. Cost Functions in Perturbation Tests with PlankTOM5.3 (Equation (11))
 In addition to NPP, the model was evaluated using the World Ocean Atlas 2005 data set of in situ chlorophyll (Figure 3d), which contains 104,689 data points on the model grid. We used World Ocean Atlas (WOA) chlorophyll (Chl) because it is depth resolved and therefore helps to evaluate our depth-resolved model. Sea-viewing Wide Field-of-view Sensor (SeaWiFS) chlorophyll is used for graphical comparison only, because its horizontal coverage is more complete. Its coverage between ~50°S and ~50°N is sufficient that we also use it to calculate the interannual variability of surface Chl. Mixed layer depth (Figure 3d) was calculated from World Ocean Atlas 2005 temperature and salinity, using a density criterion of 0.03 kg m−3 [de Boyer Montégut et al., 2004].
3.2 Parameterization of the PlankTOM5.3 Biogeochemical Model
 The maximum assimilation rate was calculated as the maximum growth rate divided by the fraction of primary production that is particulate (i.e., not DOC):
 The maximum growth rate parameters (μmax, 0°, Q10) were fit to laboratory data (Table S2, Figure 4). For diatoms and mixed phytoplankton, we used the data from the Liverpool Phytoplankton Database [Bissinger et al., 2008]. This database did not include any measurements on coccolithophores. Therefore, for coccolithophores, we used the data of Buitenhuis et al. .
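The two steps described above, fitting the growth rate at 0°C and the Q10 to laboratory data and then converting the growth rate to an assimilation rate, can be sketched as follows. The fit shown is a simple linear regression of ln(μ) on temperature, which is an assumption about the fitting method; the data are synthetic, and the 5% DOC fraction is the value quoted in section 3.1.

```python
import math


def fit_mu0_q10(temps_c, rates):
    """Fit mu(T) = mu0 * Q10**(T/10) by regressing ln(mu) on temperature."""
    n = len(temps_c)
    logs = [math.log(r) for r in rates]
    mean_t, mean_l = sum(temps_c) / n, sum(logs) / n
    slope = (sum((t - mean_t) * (l - mean_l) for t, l in zip(temps_c, logs))
             / sum((t - mean_t) ** 2 for t in temps_c))
    mu0 = math.exp(mean_l - slope * mean_t)
    q10 = math.exp(10.0 * slope)
    return mu0, q10


def max_assimilation_rate(mu_max, doc_fraction=0.05):
    """Growth rate divided by the particulate fraction of primary production."""
    return mu_max / (1.0 - doc_fraction)


# Synthetic growth rates generated with mu0 = 0.5 d-1 and Q10 = 1.9
temps = [0.0, 5.0, 10.0, 20.0, 30.0]
rates = [0.5 * 1.9 ** (t / 10.0) for t in temps]
mu0_fit, q10_fit = fit_mu0_q10(temps, rates)
pc_ref = max_assimilation_rate(mu0_fit)
```

The division by the particulate fraction makes the assimilation rate slightly larger than the growth rate, so that the modeled particulate production still matches the fitted growth rate after exudation.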
 The light limitation parameters (αChl, θCmax) were taken as the average for the three pPFTs from Geider et al. . The iron limitation parameters (ρhimax/ρlomax, K½, Fe, Qmin, Qopt, Qmax) for diatoms and coccolithophores were taken from the data for Thalassiosira oceanica and Emiliania huxleyi in Buitenhuis and Geider . The iron limitation parameters for mixed phytoplankton were based on the Pelagomonas calceolata data in Sunda and Huntsman .
3.3 Tuning of the Standard Simulation of PlankTOM5.3
 The above selection of parameters without prior model adjustments led to a strong overestimate in coccolithophores, accounting for ~60% of the phytoplankton biomass, an unrealistically high figure [Le Quéré et al., 2005]. As a consequence, CaCO3 export at 3.4 Pg C yr−1 was also unrealistically high [Lee, 2001]. We adjusted coccolithophore model parameters to obtain a realistic ecosystem distribution. We decreased the competitiveness of coccolithophores by increasing the relative preference of mesozooplankton for coccolithophores to 2.5 times that of mixed phytoplankton, doubling their K½, Fe to 2.6 nmol L−1 relative to the observed value for E. huxleyi [Buitenhuis and Geider, 2010], and increasing their K½, NO3 to 0.4 µmol L−1. Rather than change one parameter a lot, we chose to change these three parameters in an effort to keep them within the (poorly constrained) range of observed/realistic values. This gave a realistic coccolithophore biomass of 25% of the phytoplankton and a more realistic CaCO3 export of 1.6 Pg C yr−1. As in previous versions of the model, we tuned the particle degradation rate to get a realistic air-sea CO2 flux of 2.1 Pg C yr−1 in the 1990s, within 5% of Denman et al. .
3.4 Model Evaluation and Statistics
 As in Buitenhuis et al. , we evaluate the model using the following cost function:
 This cost function gives the same penalty when the model is half the observed value or when the model is twice the observed value. It is also a relative measure of error and thus gives the same penalty for small and large values. We also use the square root of the average residual sum of squares:
 This formulation is dominated by errors in the large values. It has the same units as the observations. For both these evaluations, the observations were binned onto the model grid.
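The contrasting behavior of the two metrics can be illustrated with a short sketch. The relative cost function is written here as 10 raised to the mean absolute log10 ratio of model to observation, which has the symmetry property described above but is an assumed form, since the equation itself is not reproduced in this text. The second function is the square root of the mean squared residual, as described.

```python
import math


def cost_function(model, obs):
    """Assumed form: 10 ** mean(|log10(model / obs)|). Requires positive values."""
    logs = [abs(math.log10(m / o)) for m, o in zip(model, obs)]
    return 10.0 ** (sum(logs) / len(logs))


def rss_sqrt(model, obs):
    """Square root of the average residual sum of squares (units of obs)."""
    return math.sqrt(sum((m - o) ** 2 for m, o in zip(model, obs)) / len(obs))
```

Under this form a value of 1 indicates a perfect fit and a value of 2 indicates that the model is, on average, a factor of 2 away from the observations in either direction.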
 We calculated the 95% confidence intervals of NPP from the ratio of two residual sum of squares (RSS) values, using the following formula [Abramowitz and Stegun, 1972]:
where RSSmin is the value for the model simulation that best fit the observations, RSS are the values for the model simulations that are inside or just outside the confidence interval, 1.645 is the F distribution value for p = 0.05, and n is the degrees of freedom. An F distribution is appropriate for the ratio of two χ2 distributions, such as squared residuals [Berry and Lindgren, 1990]. We approximated n with the number of observations (binned by month on the model grid) because most model parameters were constrained by additional observations and were therefore not free parameters. In addition, the number of parameters in the biogeochemical model (~100) is small relative to the number of observations (20,394 for NPP and 104,689 for Chl). This procedure for calculating confidence intervals assumes that the model simulations are independent and that the average residual of the simulation with RSSmin is 0. It is clear that the model simulations are not independent, and the average residual of the simulation with RSSmin is −13% relative to the observations. Nevertheless, the model results that best match the observations and their confidence intervals should be robustly estimated given that we used very large sample sizes [Donaldson, 1968]. It should be noted that the confidence intervals only account for the uncertainty arising from the model-observation mismatch, including the measurement error in the observations, but not from any biases in the observations, the most likely of which are spatial bias and the possible underestimation of NPP by up to 13% noted in the introduction. To analyze the potential for spatial bias in the observations to introduce a bias in the best estimate of global NPP, we repeat the analysis by regions.
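One way to turn the RSS ratio into a confidence interval is sketched below. It assumes a large-sample normal approximation to the F distribution with n degrees of freedom in both numerator and denominator, using the standard expressions for the mean and variance of F, and treats 1.645 as the one-sided 5% normal deviate. Whether this is exactly the approximation used here is an assumption. The simulation values in the example are hypothetical and are not results from this study.

```python
import math


def f_threshold(n, z=1.645):
    """Approximate upper critical value of F(n, n) as mean + z * standard deviation."""
    mean = n / (n - 2.0)
    var = 2.0 * n ** 2 * (2.0 * n - 2.0) / (n * (n - 2.0) ** 2 * (n - 4.0))
    return mean + z * math.sqrt(var)


def confidence_interval(npp, rss, n):
    """Return (low, best, high) NPP from simulations sorted by increasing NPP.

    A bound is None when no simulation on that side exceeds the threshold.
    """
    i_best = min(range(len(rss)), key=rss.__getitem__)
    limit = rss[i_best] * f_threshold(n)

    def bound(step):
        i = i_best
        while 0 <= i + step < len(rss):
            j = i + step
            if rss[j] > limit:
                # Interpolate between the last simulation inside the
                # interval (i) and the first one outside it (j)
                frac = (limit - rss[i]) / (rss[j] - rss[i])
                return npp[i] + frac * (npp[j] - npp[i])
            i = j
        return None

    return bound(-1), npp[i_best], bound(+1)


# Hypothetical perturbation results (global NPP, RSS in arbitrary units)
npp_runs = [40.0, 50.0, 56.0, 60.0, 70.0]
rss_runs = [1.10, 1.03, 1.00, 1.01, 1.06]
low, best, high = confidence_interval(npp_runs, rss_runs, n=20394)
```

Because the threshold approaches 1 as n grows, a large database yields a narrow interval, and a bound returned as None corresponds to the case where the perturbation simulations do not constrain that side of the interval.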
3.5 NSI-MEM Biogeochemical Model
 To check the robustness of our statistical methodology and results, we also use the NSI-MEM model [Shigemitsu et al., 2012]. NSI-MEM is based on the NEMURO (North Pacific Ecosystem Model for Understanding Regional Oceanography) model [Kishi et al., 2007] and includes an iron cycle [Shigemitsu et al., 2012]. The model was developed in a 1-D setting, but here we run it in the OGCM COCO (CCSR Ocean Component Model) [Aita et al., 2007]. Thus, both the biogeochemical and physical models are different from PlankTOM5-NEMO. The parameters used by Shigemitsu et al.  were tuned for application in the western North Pacific. Here, we tuned the parameters for application in the global ocean as given in Table S3.
Table 3. Regional Analysis of Primary Production (Pg C yr−1)
[Table body not recovered. Surviving column headers: Latitude, Open Ocean, Total; models: PlankTOM5.3 and NSI-MEM.]
The perturbation simulations did not constrain the lower 95% confidence intervals, but a priori knowledge says they must be ≥0.
 The statistical evaluation requires a number of model simulations that span the expected ranges of global NPP. These ranges were obtained by performing perturbation experiments of PFT turnover rates.
 In PlankTOM5.3, PFT turnover rates were increased to match the upper 99% of observed rates, following the concept introduced by Eppley  that the phytoplankton with the highest growth rates will outgrow the others and thus are most representative of the population growth rate. Combining these high growth rates with observed resource efficiencies would create a model organism that could not exist in nature [cf. Litchman et al., 2007]. Therefore, in the standard simulation, we have used growth rates that were fit to all data. We used the 99% approach in the perturbation experiments, not only for phytoplankton, but for zooplankton rates as well. We increased the turnover rate until 99% of the data (or at least two points for coccolithophore growth rate) were under the curve at a constant Q10. Note that by keeping the Q10 constant, it becomes easier to interpret the results than if we had increased both the turnover rate and the Q10. The following turnover rates were increased: phytoplankton growth rate, zooplankton grazing rates, zooplankton respiration and mortality, and organic particulate detritus degradation rates. The latter was insufficiently constrained by observations, so we increased it by factors 1.5, 2, and 3. This resulted in a wide range of NPP rates for the statistical evaluation. For the regional analysis of NPP, a wider range of perturbations was needed. Parameter values are given in Table S4.
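The 99% approach at constant Q10 can be sketched as follows: each observed rate is normalized to 0°C with the fixed Q10, and the rate at 0°C is then raised to the smallest value for which at least 99% of the points lie on or under the curve. This is an illustrative reading of the procedure described above; the function name and the integer percentage argument are hypothetical.

```python
def envelope_rate_at_0c(temps_c, rates, q10, percent=99):
    """Smallest rate at 0 degC such that at least `percent` % of the data
    lie on or under the curve rate0 * q10 ** (T / 10)."""
    normalized = sorted(r / q10 ** (t / 10.0) for t, r in zip(temps_c, rates))
    n = len(normalized)
    # Ceiling of percent * n / 100 in integer arithmetic
    k = (percent * n + 99) // 100
    return normalized[k - 1]
```

Keeping the Q10 fixed means that only a single number changes between the standard and the perturbed parameter set, which is what makes the perturbation results straightforward to interpret.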
 In NSI-MEM, perturbation experiments used the same general approach as for PlankTOM5.3, increasing phytoplankton maximum nutrient uptake rate and particulate organic matter degradation rates. Because of the different model structure, we do not use the same observational data to calculate the parameters in the perturbation experiments but use simple scaling factors (Table S5).
 About 3 times more perturbation experiments were performed with both models than are reported here. The experiments cover changes in zooplankton and remineralization parameters and all the phytoplankton parameters in equations (1)–(5). We report only the simulations that showed the lowest RSS at each global NPP. This ensures that the reported RSSmin is the one best constrained by the observational database and that the reported 95% confidence interval is based on as complete a sample of the total parameter space as we could achieve and is thus at its widest.
4.1 Evaluation of New Model Components
 We evaluated the PlankTOM5.3 model against 4-D fields (latitude, longitude, depth, and month) of NPP, chlorophyll concentration, phytoplankton growth rate, microzooplankton-caused phytoplankton mortality rate, and microzooplankton concentration; 2-D fields (latitude, longitude) of mesozooplankton concentration and export at 100 m; and the global rate of mesozooplankton grazing on phytoplankton. Data sources for observations are given in Table 1. Compared to the PlankTOM5.2 model [Buitenhuis et al., 2010], PlankTOM5.3 shows improvements in all cost functions relative to the observational databases (Table 2). The average rates and concentrations also improved for NPP, chlorophyll concentration, microzooplankton concentration, microzooplankton-caused phytoplankton mortality rate, and global mesozooplankton grazing rate; the average phytoplankton growth rate is unchanged; and the average mesozooplankton concentration and export deteriorated (Table 1).
 The chlorophyll concentration in PlankTOM5.3 has clearly improved relative to PlankTOM5.2 (Tables 1 and 2; Figures 3a–3c). We performed sensitivity simulations without the biophysical feedback or the new ballasting formulation (see sections 2.1.2 and 2.1.3). These simulations showed virtually the same improvements in chlorophyll concentration, NPP, and other cost functions (Table 2), showing that the improvements are due to the new photosynthesis model. The model now includes a clear deep chlorophyll maximum in the tropics with no deep productivity maximum (Figure 2), as observed. However, the deep chlorophyll maximum is too deep, probably due to the fact that the physical model produces an upper mixed layer that is too deep (Figures 3d and 3f).
 There is a marked improvement in the interannual variability of chlorophyll in PlankTOM5.3 compared to PlankTOM5.2 (Figures 3g–3i). Comparison of the relative interannual variability in Chl:C, phytoplankton C, and NPP shows that the former is low, while the latter two closely resemble the interannual variability in Chl in the respective model versions (data not shown). This suggests that the improvement in the model is not primarily due to a better representation of phytoplankton chlorophyll content but that the signal originates in a better representation of NPP and propagates into phytoplankton carbon. This interpretation is consistent with the fact that PlankTOM5.2 already included variable Chl:C, while PlankTOM5.3 has added the effect of the Chl:C ratio on NPP. It thus suggests that the new formulation, which is more directly based on observations for its equations and parameters, better captures the sensitivity of (carbon) NPP to variability in the atmospheric forcing. This will be an important improvement when exploring the feedbacks between climate change and global biogeochemical cycles in future studies.
 As noted above, the cost function of export at 100 m has improved relative to the inversion results of Schlitzer . Since ocean biogeochemistry, in particular air-sea CO2 flux, is very sensitive to export, this result warrants further analysis. We compared the global average nutrient profile of PlankTOM5.2, which has slightly higher export, and the standard simulation of PlankTOM5.3 to the World Ocean Atlas observations (Figure 5). Both model runs were initialized with these observations, and the standard simulation has remained closer to the observations, even though PlankTOM5.3 was run longer (90 years) than PlankTOM5.2 (60 years). As a consequence, the cost function has decreased from 1.5 to 1.3. This would only be an indication that the model has improved if the ocean had been in steady state over that time, which we know it was not [Le Quéré et al., 2007]. However, the interannual variability of chlorophyll has also improved considerably in the standard simulation (Figures 3g–3i), which is sensitive to the nutrient concentration gradient across the permanent thermocline. This gradient is sharper and thus closer to the observations in the standard simulation (Figure 5c). We therefore conclude that the lower export constitutes a real improvement in the model.
 The supporting information presents additional graphs comparing observations and model for microzooplankton concentration, microzooplankton-caused phytoplankton mortality, phytoplankton growth rate, and mesozooplankton and phosphate concentration and export (Figure S1).
4.2 Global and Regional Estimates of NPP
 The previous model version, PlankTOM5.2, already was the ocean biogeochemical model with the best fit to the observations of NPP at Bermuda Atlantic Timeseries Station (BATS) and Hawaii Ocean Timeseries (HOT) out of 12 models tested [Saba et al., 2010]. Here, we make a further slight improvement in the cost function (Table 2) and the depth-resolved and vertically integrated RSS0.5 (Figures 6a and 6b) of PlankTOM5.3 relative to the observational database of NPP (equations (11) and (12)).
 The perturbation experiments with both models cover a range of NPP from ~10 to ~85 Pg C yr−1 (parameter values in Tables S4 and S5). The simulations have clear increases of the RSS0.5 at the high and low ends which are used to estimate the 95% confidence intervals. The global modeled ocean NPP most consistent with the depth-resolved 14C observations is 56 Pg C yr−1 with a 95% confidence interval of 51–65 Pg C yr−1. Perturbation experiments with the NSI-MEM model suggest the same global NPP of 56 Pg C yr−1, but with a wider 95% confidence interval of 42–68 Pg C yr−1 (Figure 6a).
 The global modeled ocean NPP most consistent with the vertically integrated 14C observations is 51 Pg C yr−1 (95% confidence interval 37–63 Pg C yr−1) for PlankTOM5.3 and 44 Pg C yr−1 (95% confidence interval 23–53 Pg C yr−1) for NSI-MEM (Figure 6b). The vertically integrated estimate for PlankTOM5.3 agrees well with the depth-resolved estimate. However, the 95% confidence interval is almost twice as wide because of the smaller number of data points. Because NSI-MEM overestimates NPP in the tropics (Figure 2c) where there are about twice as many observations as in the rest of the ocean (Figure 2a), the lower estimate of vertically integrated NPP in NSI-MEM can be discounted as biased.
 We also used the perturbation experiments to calculate the best estimates of NPP in 10 regions: 9 open ocean regions in the Pacific, Atlantic, and Indian Ocean basins divided into tropics (23°S–23°N) and the extratropics in both hemispheres, and the coastal (<200 m depth) ocean (Table 3). We also estimated the 95% confidence intervals for each region with the PlankTOM5.3 model. The confidence intervals from the NSI-MEM model for some of the regions were not sufficiently constrained by the perturbation experiments. In both models, the largest contributions to global NPP are made by the Pacific Ocean and the tropical region, each about 50%. The Atlantic and Indian Oceans each contribute roughly a quarter, as do the southern and northern extratropics, while the coastal ocean contributes 8% in PlankTOM5.3 and 9% in NSI-MEM. Relative to the region areas, both models agree that the largest contribution is made by the coastal ocean (77% larger than expected from its area in PlankTOM5.3 and 98% larger in NSI-MEM), followed by the North Pacific (54% and 40% larger) and the equatorial Indian Ocean (26% and 49% larger). Both models also agree that the smallest relative contribution is made by the South Atlantic (37% and 67% smaller than expected from its area). The South Indian and South Atlantic are least constrained by the observations, with 87% and 84% fewer observations than expected from their areas. The models disagree about the absolute amount of NPP in the South Pacific and South Indian, with NSI-MEM giving lower NPP in the South Pacific and higher NPP in the South Indian, and NSI-MEM NPP in the equatorial Pacific is also slightly below the PlankTOM5.3 95% confidence interval.
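The "larger than expected from its area" figures compare a region's share of global NPP with its share of ocean area. The sketch below shows that arithmetic using rounded shares quoted in this paper (tropics: about half of NPP on 42% of the area; coastal ocean in PlankTOM5.3: 8% of NPP, 77% above expectation). The function names are illustrative, and because the inputs are rounded the results are approximate.

```python
def relative_excess(npp_share: float, area_share: float) -> float:
    """Fractional excess of a region's NPP share over its area share."""
    return npp_share / area_share - 1.0


def implied_area_share(npp_share: float, excess: float) -> float:
    """Area share implied by an NPP share and a stated relative excess."""
    return npp_share / (1.0 + excess)


tropics_excess = relative_excess(0.50, 0.42)        # tropical region
coastal_area = implied_area_share(0.08, 0.77)       # coastal ocean, PlankTOM5.3

print(f"tropics: {100 * tropics_excess:.0f}% above area share")
print(f"coastal ocean: about {100 * coastal_area:.1f}% of ocean area")
```

A value of zero would mean that a region produces exactly in proportion to its area, so positive values mark the regions that are more productive per unit area than the global mean.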
 We compare our results to four satellite-based algorithms of NPP. Although satellite algorithms are sometimes treated as if they were observational data by ocean biogeochemical modelers, these algorithms are in fact themselves models with considerable uncertainties associated with their equations, parameters, and input data choices [Friedrichs et al., 2009; Saba et al., 2010]. Satellites do not cover the whole ocean, mostly because of cloud cover. The results with PlankTOM5.3 suggest that for the SeaWiFS satellite, NPP at grid cells where chlorophyll data are not available from the SeaWiFS satellite is about half of the average NPP over the SeaWiFS-covered part of the ocean. Therefore, the range of global NPP for each satellite algorithm was estimated between a low estimate assuming zero NPP when no chlorophyll data were available and a high estimate assuming an average NPP (Figure 6b). Our calculated global NPP rates for the satellite algorithms are slightly different from the original publications because we calculated the results on our model grid. For Antoine et al. we calculate 38.0–39.0 Pg C yr−1, for Behrenfeld and Falkowski 46.0–47.6 Pg C yr−1, for Mélin 50.3–51.3 Pg C yr−1, and for Behrenfeld et al. 65.7–70.7 Pg C yr−1. For the Behrenfeld and Falkowski algorithm, our high estimate matches the corrected estimate from this algorithm in Field et al. The RSS0.5 for the Behrenfeld and Falkowski algorithm is comparatively high. This result is dominated by overestimations of more than 5000 mg C m−2 d−1 in only 23 coastal sites (<1% of the observations). The Antoine et al. algorithm falls below both of the depth-resolved 95% confidence intervals, and the Behrenfeld and Falkowski algorithm falls below the PlankTOM5.3 depth-resolved confidence interval. The Behrenfeld et al. algorithm falls just above the PlankTOM5.3 depth-resolved confidence interval, only partially overlaps with the NSI-MEM depth-resolved confidence interval, and is well above both vertically integrated confidence intervals. The Mélin and Westberry et al. algorithms are closest to our best estimates of depth-resolved global NPP and fall within the PlankTOM5.3 depth-resolved confidence interval. Our results confirm, on a global scale, the analysis of Saba et al. at BATS and HOT that “the difference in overall skill between the best BOGCM and the best ocean color model at each site was not significant” (Figure 6b).
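 The low and high estimates for each satellite algorithm can be illustrated with a short Python sketch. It assumes that "an average NPP" means the area-weighted mean rate over the satellite-covered cells, which is one possible reading. The function name and the four-cell grid are hypothetical.

```python
def global_npp_range(npp_rate, cell_area):
    """Bracket the area-integrated NPP of a partially observed grid.

    npp_rate:  per-cell NPP rate, None where no satellite chlorophyll exists.
    cell_area: per-cell area, same length as npp_rate.
    Returns (low, high): low assumes zero NPP in uncovered cells, high assumes
    the area-weighted mean rate of the covered cells (an assumption here).
    """
    pairs = list(zip(npp_rate, cell_area))
    covered_total = sum(r * a for r, a in pairs if r is not None)
    covered_area = sum(a for r, a in pairs if r is not None)
    missing_area = sum(a for r, a in pairs if r is None)
    low = covered_total
    high = covered_total + (covered_total / covered_area) * missing_area
    return low, high

# Hypothetical grid: the third cell has no satellite data
rates = [2.0, 4.0, None, 6.0]
areas = [1.0, 1.0, 2.0, 2.0]
low, high = global_npp_range(rates, areas)
print(low, high)
```

 Under this reading, the PlankTOM5.3 result that uncovered cells carry about half of the average NPP would place the total roughly midway between the two bounds.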
 Our analysis suggests that the Behrenfeld and Falkowski model underestimates global NPP, whereas Milutinovic and Bertino suggest it has a positive bias. This discrepancy occurs despite the fact that Milutinovic and Bertino exclude the coastal ocean, which we find to be overestimated and the biggest contributor to RSS0.5 in the Behrenfeld and Falkowski model. The discrepancy is methodological: Milutinovic and Bertino assume that the Behrenfeld and Falkowski model is flawless and that the uncertainty of the output stems from the uncertainty in the input data, whereas we directly assign a cost function to the output using observed local 14C primary production. We make no a priori assumption that our standard simulation is the best model (it is not) but use the observations to estimate both the most likely global NPP and its confidence interval.
 By combining state-of-the-art models of ocean biogeochemistry and physics, we are able to analyze a model that is depth resolved and has a physically realistic, mechanistic representation of photosynthesis, light extinction, temperature distribution, and nutrient transport. The inclusion of a new photosynthesis model in PlankTOM5.3 has improved the representation of NPP and all the other cost functions of biogeochemical concentrations and rates (Table 2). The interannual variability of the detrended NPP in the standard simulation is 0.4 Pg C yr−1, or a peak-to-peak variability of 1.7 Pg C yr−1. Thus, interannual variability is small relative to the 95% confidence interval of the year 2009 model results.
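 The two variability measures quoted above can be computed from an annual time series as in the sketch below. It assumes that the detrending is a linear least-squares fit and that the first measure is a sample standard deviation; neither choice is stated in the text. The function name and the toy series are hypothetical.

```python
from statistics import mean, stdev

def detrended_variability(years, values):
    """Remove a least-squares linear trend, then return
    (standard deviation, peak-to-peak range) of the residuals."""
    y_bar, v_bar = mean(years), mean(values)
    slope = (
        sum((y - y_bar) * (v - v_bar) for y, v in zip(years, values))
        / sum((y - y_bar) ** 2 for y in years)
    )
    residuals = [v - (v_bar + slope * (y - y_bar)) for y, v in zip(years, values)]
    return stdev(residuals), max(residuals) - min(residuals)

# Toy annual series: a rising trend with alternating anomalies
years = [0, 1, 2, 3]
values = [1.0, 0.0, 3.0, 2.0]
sd, peak_to_peak = detrended_variability(years, values)
print(sd, peak_to_peak)
```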
 The two main ways to narrow the confidence interval are to improve the model (decrease RSSmin, equation (13)) and to increase the number of observations. Analysis of the spatial distribution of residuals between NSI-MEM and the observations suggests that its 95% confidence interval is larger than that of PlankTOM5.3 because, in the simulation that fits the observations best (RSSmin, equation (13)), there is a trade-off between some regions being overestimated and others being underestimated. Because of this, perturbation experiments lead to improvements in some regions, resulting in a slow increase in the RSS away from RSSmin and thus a large confidence interval. To some extent, this trade-off is probably an inevitable product both of using point observations with measurement errors to evaluate relatively large regions in space and time, and of inadequacies in the models. However, the results are consistent with the expectation that a better model would suffer less from such a trade-off between localized improvements and deteriorations, and would therefore show a faster increase in RSS away from RSSmin and a smaller 95% confidence interval, as is found for PlankTOM5.3. In addition to model improvements, the other way in which we have constrained the confidence interval is by effectively increasing the number of observations. The depth-resolved confidence intervals, which are based on 7 times more data, are smaller than the vertically integrated confidence intervals for both models. For the depth-resolved evaluation of PlankTOM5.3, which has the lowest RSSmin, the model is now able to provide a confidence interval that is smaller than the range of satellite algorithms published since 1996.
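 The qualitative argument above can be illustrated with a generic residual-sum-of-squares threshold of the kind used in nonlinear regression. This sketch is not equation (13), which is not reproduced in this section. It assumes that a perturbation is accepted when its RSS stays below RSSmin × (1 + F × p / (n − p)), with F ≈ 3.84 for one parameter and many observations. The function name and the synthetic quadratic RSS curves are hypothetical.

```python
def rss_confidence_interval(experiments, n_obs, f_crit=3.84, n_params=1):
    """Range of global NPP whose RSS stays below an F-ratio threshold.

    experiments: list of (global_npp, rss) pairs from perturbation runs.
    f_crit: critical F value; 3.84 is the large-sample 95% value for one
            parameter (an assumption, not the paper's equation (13)).
    """
    rss_min = min(rss for _, rss in experiments)
    threshold = rss_min * (1.0 + f_crit * n_params / (n_obs - n_params))
    inside = [npp for npp, rss in experiments if rss <= threshold]
    return min(inside), max(inside)

# Synthetic RSS curves with a minimum at 56: one rises slowly, one quickly
flat_curve = [(x, 100.0 + 0.1 * (x - 56) ** 2) for x in range(40, 73)]
steep_curve = [(x, 100.0 + 0.4 * (x - 56) ** 2) for x in range(40, 73)]

ci_flat = rss_confidence_interval(flat_curve, n_obs=50)
ci_steep = rss_confidence_interval(steep_curve, n_obs=50)
ci_more_data = rss_confidence_interval(flat_curve, n_obs=350)  # 7 times more data
print(ci_flat, ci_steep, ci_more_data)
```

 In this toy setting, a faster rise in RSS away from the minimum and a larger number of observations both shrink the interval, which matches the two routes described in the text.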
 The regional analysis with PlankTOM5.3 is well within the 95% confidence interval of the global analysis with both models. This shows that despite a spatial bias in the number of observations, the model is mechanistically realistic enough that it can produce reliable results in regions where NPP is less well constrained by the observations. Though the regional analysis with NSI-MEM gives a lower global total, the upper 95% confidence interval of this estimate is unconstrained and thus does not contradict these results. More observations in the Southern Ocean could improve the constraints on the estimated NPP and resolve the disagreement between the regional analysis of the two models in the South Pacific and South Indian.
 Our analysis shows that implementing the dynamic iron-light colimitation model of Buitenhuis and Geider leads to a clear improvement in the ability of the PlankTOM5.3 model to reproduce the observations, in particular the interannual variability of chlorophyll. However, the parameterization of the model is based on few data and could probably be improved further by additional constraints on phytoplankton physiological parameters. Improvements in the mixed layer depth in the OGCM might also result in a narrower confidence interval. This scope for improvement can be seen in the depth of the deep chlorophyll maximum, which is too deep in PlankTOM5.3 (Figure 3). The overly deep chlorophyll maximum also leads to an overestimation of NPP below 60 m (Figure 2). NPP below 60 m contributes ~20% to the global total, but only 2% to the RSS, because NPP variability decreases with NPP (data not shown). Therefore, this overestimation has only a small impact on the confidence interval of global NPP.
 Figure 7 summarizes the current understanding of the global marine ecosystem carbon fluxes and standing stocks. As has been shown before [Calbet, 2001; Calbet and Landry, 2004], ~80% of NPP is grazed by zooplankton. We are not aware of estimates of how the remaining 20% is partitioned between phytoplankton mortality, viral loss, and direct export of (aggregating) phytoplankton. After passing through the epipelagic food web, about one sixth of NPP is exported below 100 m, which implies that five sixths of NPP is regenerated production.
 In conclusion, our results with vertically integrated NPP support the wide range of previously reported global estimates, but they also provide a quantitative measure of error and a most likely estimate. The depth-resolved evaluations of model NPP against observations contain more information than the vertically integrated evaluation and therefore give a better constrained estimate. Both models agree that our best estimate of global ocean particulate NPP from the depth-resolved evaluations is 56 Pg C yr−1, while the 95% confidence interval from the better constrained PlankTOM5.3 model is 51–65 Pg C yr−1, or 58 ± 7 Pg C yr−1 (Figure 6a).
 We thank the many scientists and institutions that made, compiled, reanalysed, and made publicly available all the measurements that were used to initialize, force, and evaluate the model, in particular the Ocean Primary Production Working Group (OPPWG), Jan Bissinger, and Richard Geider. We thank Clare Enright for programming support, Gareth Janacek for help with the statistics, the Laboratoire d'Océanographie et du Climat for making the code of the NEMO model available, and the Research Computing Service at the University of East Anglia for use of the Linux cluster. We thank two anonymous reviewers and Toby Westberry for their helpful reviews. We thank the Laboratoire d'Océanologie de Villefranche-sur-mer and NERC (contracts NE/C516079/1 and NE/G006725/1) for financial support.