Validating and improving the uncertainty assumptions for the assimilation of ocean‐colour‐derived chlorophyll a into a marine biogeochemistry model of the Northwest European Shelf Seas

The correct specification of all sources of uncertainty is critical to the success of data assimilation (DA) in improving the realism and accuracy of forecasts and reanalyses. This work focuses on improving the uncertainty assumptions made during the assimilation of ocean-colour-derived chlorophyll a into an operational marine coupled physical–biogeochemical DA system, which produces daily biogeochemistry forecasts on the Northwest European Shelf Seas. Analysis of the observation–model misfits shows significant biases in chlorophyll a, which vary strongly with season. The behaviour of these misfits agrees well with previous studies and can be attributed to systematic errors within the coupled model. Diagnostic metrics, used frequently within numerical weather prediction, are applied to separate out the random component of the observation and model errors, allowing for the derivation of new error covariance matrices. These new error covariance matrices are then modified to account for the biases in the model that cannot be treated explicitly within the operational DA system. This has the effect of inflating both the error variances and the correlation length-scales. Experiments show that the new error covariances can result in significant improvements in the accuracy of the analysis and forecast. In particular, the new error covariance matrices reduce the bias in the spring phytoplankton bloom present when using the previous error covariances. Validation against independent glider observations in the North Sea also shows reductions in bias in chlorophyll a and oxygen that extend below the surface to the depth of the mixed layer. Accounting for the biases in the model in the error correlations can lead to much larger improvements than not accounting for them; however, there are also regions where large degradations are seen that may indicate model instabilities.
This may be improved by estimating the bias separately for the different regions on the shelf.


INTRODUCTION
Shelf seas are crucial to all life on our planet. They are known to be responsible for 90% of global fisheries, 20% of marine primary production, and 20% of atmospheric carbon dioxide uptake (Pauly et al., 2002; Borges et al., 2006; Jahnke, 2010). Here we focus on the operational analysis and forecasting of surface total chlorophyll a on the Northwest European (NWE) shelf. Chlorophyll a is a proxy for phytoplankton biomass, which forms the base of the marine food web and has important economic, environmental, and public-safety consequences. Each year, phytoplankton blooms are a major ecosystem driver for the NWE shelf (Lutz et al., 2007; Henson et al., 2009). The blooms can occur throughout the year, but generally the most significant occur in February–May.
The regular assimilation of observations to initialise models of the coupled physics–biogeochemistry marine system has proven to be essential to the realism of biogeochemistry forecasts (e.g., Nerger and Gregg, 2007; Teruzzi et al., 2014; Skákala et al., 2018). An important source of information for constraining the models via data assimilation (DA) is satellite remote-sensing data (e.g., McClain, 2009; Aiken et al., 2014; Groom et al., 2019), which are capable of viewing the ocean with high temporal and spatial coverage. Ocean-colour-derived total chlorophyll a, in particular, has been demonstrated to be essential (e.g., Carmillet et al., 2001; Hoteit et al., 2005; Nerger and Gregg, 2007; Ciavatta et al., 2011; 2016; Ford et al., 2012; Ford and Barciela, 2017).
The assimilation of data at many operational centres is performed using variational DA methods (e.g., Waters et al., 2015). An important input into variational DA is the error statistics of the observations and the background (the prior estimate of the model state of interest made before the observations are given, usually provided by a previous forecast). The assumption that both the observation and background errors are unbiased and follow a Gaussian distribution means that their error statistics are fully described by the predefined error covariance matrices R_a ∈ ℝ^(p×p) and B_a ∈ ℝ^(n×n) respectively, where p is the number of observations and n is the size of the model state (number of model variables × number of grid points). The subscript a denotes that these are the matrices assumed during the assimilation of the data, which may differ from the true statistics, as discussed later. The diagonal elements of these matrices represent the error variances, whilst the off-diagonal elements represent the error covariances in space and between different variables. Normalising the covariance of two variables by the standard deviation of each variable gives the (nondimensional) correlations. The error correlations are often parameterised in terms of a correlation length-scale, given a predefined correlation function. The matrices B_a and R_a specify how closely the analysis of the model initial state should be constrained by the background and the observations. It is therefore crucial that B_a and R_a account correctly for all sources of uncertainty, and that the assumption of unbiasedness holds, so that the weighting of the model and the observations is valid and the most accurate analysis is obtained.
Many different approaches have been proposed for estimating B_a and R_a (see Section 2). Here we focus on the use of the consistency diagnostics proposed by Desroziers et al. (2005), which use samples of output from an already operationally running marine biogeochemistry analysis system. The Desroziers et al. (2005) method has the benefit of representing all sources of uncertainty in the assimilation system implicitly. A disadvantage is that the estimate of B_a is restricted to the space of the observations; however, for large observation datasets, such as ocean-colour-derived total chlorophyll a, that observe a key model variable directly, and by carefully binning the sample, the technique can provide meaningful estimates with minimal sampling noise. An issue specific to this application that we must contend with is significant biases in the marine biogeochemistry model, which cannot be accounted for explicitly within the current setup of the variational assimilation system. We therefore propose a method for modifying the Desroziers et al. (2005) estimate to account for these biases.
The structure of this article is as follows. In Section 2 we provide a brief overview of how B_a and R_a may be estimated in practice, with a focus on the consistency diagnostics of observation-model misfits (Desroziers et al., 2005). This method is then modified to account for biases within the estimate of B_a. In Section 3 the 2018 case study is presented, for which we provide new estimates of B_a and R_a in Section 4. In Section 5 we apply these new estimates to demonstrate their impact on the forecast, focusing on the impact of accounting for biases in B_a. In Section 6 we investigate whether repeated estimation of B_a and R_a results in convergence of the estimates. Finally, in Section 7 we summarise our key findings and discuss the limitations and implications of this work.

ESTIMATING UNCERTAINTIES IN DA
The importance of B_a and R_a can be appreciated by looking at the analytical form of the analysis, x_a, given by the minimum of the variational cost function:

x_a = x_b + K_a(y − Hx_b),    (1)

where y ∈ ℝ^p is a vector of the observations and x_b ∈ ℝ^n is the background state (see Kalnay, 2003 for a derivation). The gain matrix K_a ∈ ℝ^(n×p) is given by B_a H^T (HB_a H^T + R_a)^(−1), where H ∈ ℝ^(p×n) is the observation operator, the mapping from the modelled variables to those observed. The observation operator can account for observations that have a different spatial distribution from the model grid on which x is defined, and can also allow for the observations to measure different variables from those modelled. If the observations are distributed in time over the assimilation window, then the observation operator may also include a dynamical model to account for this. To simplify the notation, we assume the observation operator to be linear; however, the theory can be modified for small nonlinearities in the observation operator, as is typical in DA (Lorenc et al., 2000). x_a provides the most probable state given the background and observations, assuming that the errors in the data are unbiased and Gaussian and that the error covariances are correctly specified by B_a and R_a (Kalnay, 2003). Within this section we give a brief overview of the various possibilities for estimating B_a and R_a. We begin by defining the error and its statistics.
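As a concrete illustration of the analysis equation and the gain K_a = B_a H^T (HB_a H^T + R_a)^(−1), the following minimal NumPy sketch uses toy dimensions and covariance values; all of the numbers are illustrative assumptions, not values from the system described in this article.

```python
import numpy as np

# Toy sizes: n = 3 model points, p = 2 observations (illustrative only).
n, p = 3, 2
B_a = 0.5 * np.eye(n)                  # assumed background-error covariance
R_a = 0.1 * np.eye(p)                  # assumed observation-error covariance
H = np.array([[1.0, 0.0, 0.0],         # linear observation operator:
              [0.0, 1.0, 0.0]])        # the first two points are observed

x_b = np.array([1.0, 2.0, 3.0])        # background state
y = np.array([1.2, 1.8])               # observations

# Gain matrix K_a = B_a H^T (H B_a H^T + R_a)^(-1), then the analysis.
K_a = B_a @ H.T @ np.linalg.inv(H @ B_a @ H.T + R_a)
x_a = x_b + K_a @ (y - H @ x_b)
```

Because the assumed background-error variance (0.5) exceeds the observation-error variance (0.1), the analysis moves most of the way towards the observations at the two observed points, while the unobserved third point is unchanged.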
Let z be the estimate of the truth contained in the vector z_t. The error in z can then be defined as

ε_z = z − z_t.

The background estimate of the true state, x_t, given by x_b, comes from a forecast initiated from the previous analysis of x. The errors in the background, ε_b = x_b − x_t, are therefore an amalgamation of the errors in the previous analysis propagated by the (potentially) erroneous model: for example, due to missing processes or boundary-condition uncertainty, including atmospheric-forcing uncertainty, as well as errors in the physical and biogeochemical models.
For the observations, y, the true state we wish to estimate, mapped to the observed variables, is given by y_t = Hx_t. Therefore ε_y = y − Hx_t incorporates not only instrument error but also forward-model error (due to uncertainty in specifying the mapping H) and representation error (due to the difference in scales and processes represented by y and x: Janjic et al., 2017). Ocean-colour data, in particular, are subject to uncertainty in the retrieval and in the postprocessing to the final gridded product. If the observations are distributed in time within an assimilation window and a dynamical model is included in H, then the observation error should arguably also incorporate model error (Gejadze et al., 2017; Howes et al., 2017).
Due to the multifarious origins of the errors in both y and x_b, it is difficult to estimate all components individually. Fortunately, for DA we only need the combined error covariance matrices. However, these matrices may still vary in time and space.
The bias in z is given by the expectation of ε_z, b_z = E[ε_z]. The error covariance of z is then given by

Z = E[(ε_z − b_z)(ε_z − b_z)^T].    (2)

To estimate b_z and Z, an estimate of the truth is necessary. For this purpose, it could be assumed that in situ data have negligible error compared with satellite data and the model, and so can be used as a proxy for the truth (e.g., Brewin et al., 2017). Due to the limited sampling of in situ data, it is difficult to obtain a large enough sample to estimate the observation- and background-error variances, let alone the covariances. However, Brewin et al. (2017) used a total of 2,791 samples collected in the North Atlantic region, analysed by high-performance liquid chromatography and spanning 1995–2014, to estimate ocean-colour-derived chlorophyll a biases and root-mean-square errors (RMSE), by parameterising the uncertainties as a function of optical water type (OWT) (Jackson et al., 2017). These estimates of bias and RMSE as a function of OWT were then used to map uncertainties in ocean-colour-derived chlorophyll a on a per-pixel basis (i.e., at every data point of the ocean-colour-derived chlorophyll a product). This approach can, to a certain degree, overcome issues with the distribution of the data used in the validation, and accounts for uncertainties varying with the conditions and the magnitude of the observed chlorophyll a. The assumption that in situ measurements of phytoplankton group chlorophyll a have negligible error is of course problematic, but in practice it can be difficult to quantify the uncertainties in in situ data meaningfully (Boss et al., 2008; Brewin et al., 2014). Additionally, for the estimation of satellite observation-error statistics, in situ comparisons do not allow for an estimate of the uncertainty due to the mismatch in scales between the satellite observations and the model, as the in situ observations represent a different scale from either (Groom et al., 2019). For estimates of surface chlorophyll a, this includes the differences in horizontal and temporal scales, as well as in the depths represented by the in situ and satellite data (Sathyendranath et al., 2019). To understand the consequences of this mismatch of scales for the estimated uncertainty in the ocean-colour product, in situ methods capable of measuring the optical and biogeochemical properties of the water continuously would be needed.
Many studies have attempted to estimate the background-error covariances using model outputs. Examples include using an ensemble of perturbed model runs (e.g., Ciavatta et al., 2018), using differences between lagged forecasts (e.g., Ford and Barciela, 2017), and computing empirical orthogonal functions (EOFs) from a long time series (e.g., Cossarini et al., 2019). These methods all assume that the variability in the model fully encapsulates its uncertainty. However, it is entirely possible that sources of uncertainty are missed, and model biases will not be captured.
Another approach to estimating the covariance matrices is based on consistency diagnostics of observation-background misfits, making use of the same theory used to assimilate the data. This method makes no attempt to untangle the separate sources of uncertainty in B_a and R_a, so should, by definition, account for everything.
Let the innovation be defined as d = y − Hx_b. This can be seen to be equal to the difference of the errors in the observations and the background:

d = ε_y − Hε_b.    (3)

An estimate of the difference between the biases in the observations and the background (b_y and b_b respectively) is then given by the sample mean of a population of innovations:

d̄ = (1/N) Σ_{i=1}^{N} d_i ≈ b_y − Hb_b,    (4)

where N is the size of the sample.
An estimate for the sum of R and HBH^T (from which we can then attempt to separate out the two matrices) can similarly be given by the sample covariance of the innovations, assuming that the errors in the background and observations are mutually uncorrelated:

D = (1/(N − 1)) Σ_{i=1}^{N} (d_i − d̄)(d_i − d̄)^T ≈ R + HBH^T.    (5)

Here we neglect the "a" suffix on R and B, as these are sample estimates of the true error covariances rather than the matrices used within the assimilation.
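The sample mean of the innovations and their covariance D (Equation 5) can be estimated directly from a set of innovation vectors. In this sketch the innovations are synthetic draws with a known bias (0.3) and a known total variance R + HBH^T = 0.2; all values are illustrative, and in practice the innovations would come from the assimilation system.

```python
import numpy as np

# N synthetic innovation vectors of dimension p, with a known bias of 0.3
# and total error variance R + HBH^T = 0.2^2 + 0.4^2 = 0.2 (illustrative).
rng = np.random.default_rng(42)
N, p = 20000, 4
d = 0.3 + rng.normal(0.0, np.sqrt(0.2**2 + 0.4**2), size=(N, p))

d_mean = d.mean(axis=0)            # sample mean: estimates b_y - H b_b
d_anom = d - d_mean
D = d_anom.T @ d_anom / (N - 1)    # sample covariance: estimates R + HBH^T
```

Up to sampling noise, `d_mean` recovers the bias 0.3 and the diagonal of `D` recovers the total variance 0.2, while the off-diagonal elements of `D` are near zero for these uncorrelated draws.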
If HBH^T is thought to be characterised well by a previous estimate of HB_aH^T, then this can be subtracted from Equation (5), leaving just R, or vice versa. If this is not the case, then various possible solutions have been proposed to untangle R and HBH^T from the sample covariance of the innovations, which we now discuss.
The approach of Hollingsworth and Lönnberg (1986) is to assume that the observation errors are spatially uncorrelated, so that, by fitting a correlation function to the spatial correlations of the innovations, the observation- and background-error variances can be separated out. In the review of Sitwell and Ménard (2020) this is referred to as an a priori method.
If significant biases are present, then ideally these would be corrected offline before the assimilation (as is attempted with the ocean-colour-derived chlorophyll a observations in our case; see Section 3) or corrected during the assimilation with methods such as weak-constraint 4DVar (e.g., Trémolet, 2007) and VarBC (Auligné et al., 2007). However, if biases are present in the innovations (violating the DA theory) and are not subtracted when estimating the innovation covariances, then

D_bias = (1/(N − 1)) Σ_{i=1}^{N} d_i d_i^T ≈ D + d̄ d̄^T.    (6)

The bias present in the innovations will result in the estimates of the variances, as well as of the correlation length-scales, increasing. Therefore, the Hollingsworth and Lönnberg (1986) approach will diagnose more significant error correlations in the background than if the biases had been removed. In addition, it will also diagnose a proportionally larger background-error variance. Assuming the biases originate from the background and not the observations, this inflation of the background-error covariances is consistent with the suggestion of Dee and Da Silva (1998), who propose that significant biases present in the background that are not corrected before or during the assimilation should be incorporated into the covariance matrices used within the assimilation. This has the effect that the biased background is downweighted and the bias in the analysis estimated by Equation (1) is reduced, although the analysis-error covariance is increased. Overall, however, inflating the background-error covariances in this way will result in the analysis with the smallest RMSE possible without correcting the biases explicitly.
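The Hollingsworth and Lönnberg (1986) separation can be sketched with synthetic binned covariances generated from known parameters; the Gaussian correlation model and the values of the variances and length-scale below are illustrative assumptions, not diagnostics from the system described here.

```python
import numpy as np

# Synthetic 'truth': background-error variance, observation-error variance
# and a Gaussian background-error correlation with length-scale L (km).
sigma_b2_true, sigma_o2_true, L_true = 0.15, 0.05, 60.0

# Binned innovation covariances at nonzero separations reflect only the
# (spatially correlated) background errors; at r = 0 the observation-error
# variance is added, since observation errors are assumed uncorrelated.
r = np.array([20.0, 40.0, 60.0, 80.0, 100.0])          # bin centres (km)
cov = sigma_b2_true * np.exp(-r**2 / (2 * L_true**2))  # covariance at r > 0
var_total = sigma_b2_true + sigma_o2_true              # innovation variance

# Fit the Gaussian by regressing log(cov) on r^2:
# log c(r) = log sigma_b^2 - r^2 / (2 L^2).
slope, intercept = np.polyfit(r**2, np.log(cov), 1)
sigma_b2 = np.exp(intercept)         # extrapolated covariance at r = 0
L = np.sqrt(-1.0 / (2 * slope))      # diagnosed length-scale (km)
sigma_o2 = var_total - sigma_b2      # remainder: observation-error variance
```

Extrapolating the fitted curve back to zero separation yields the background-error variance; the difference between the total innovation variance and this intercept is then attributed to the (spatially uncorrelated) observation error.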
An alternative to the method of Hollingsworth and Lönnberg (1986), which makes weaker assumptions about the structure of the covariance matrices, is the method of Desroziers et al. (2005) (from now on referred to as DBCP05). DBCP05 proposed a method to estimate the covariance matrices directly from the output of the assimilation system. Hence these are referred to as a posteriori methods in the review of Sitwell and Ménard (2020).
Let us refer to d_a^o = y − Hx_a as the residual and d_b^a = Hx_a − Hx_b as the increment (in observation space). Using the equation for the analysis (Equation 1), we can rewrite these as d_a^o = (I − HK_a)d and d_b^a = HK_a d. DBCP05 showed that an estimate for the observation-error covariance matrix can then be given by the sample covariance of the residuals with the innovations,

R_new = (1/(N − 1)) Σ_{i=1}^{N} (d_{a,i}^o − d̄_a^o)(d_i − d̄)^T,    (7)

where HK_a is a function of the covariance matrices used within the assimilation, which may differ considerably from the true error covariances (without the a suffix) present in D (Equation 5). In a similar way, an estimate for the background-error covariance matrix in observation space can be given by the sample covariance of the increments with the innovations,

(HBH^T)_new = (1/(N − 1)) Σ_{i=1}^{N} (d_{b,i}^a − d̄_b^a)(d_i − d̄)^T.    (8)

From Equations (7) and (8), it is clear that the DBCP05 diagnostics result in the innovation covariance, D, being partitioned into observation and background (in observation space) components according to the identity minus the weights they are given in deriving the analysis in observation space (see Equation 1). R_new and (HBH^T)_new can also be interpreted in terms of an adjustment to the matrices used within the assimilation:

R_new = R_a D_a^(−1) D,    (9)

(HBH^T)_new = (HBH^T)_a D_a^(−1) D,    (10)

where D_a = (HBH^T)_a + R_a.
If the covariances are given correctly, then D_a = D and Equations (9) and (10) will simply return the matrices used within the assimilation. However, if the covariance matrices are misspecified, then Equations (9) and (10) will return different matrices from those used within the assimilation. These will not be the true error covariances. How they relate to the true statistics was explored theoretically by Waller et al. (2016), who showed a complex interaction between misspecification of the observation- and background-error variances and length-scales. However, the new estimates can provide guidance on how to adjust the covariance matrices currently used, and have proven beneficial in numerical weather prediction, in particular for elucidating observation-error correlations (e.g., Weston et al., 2014; Bormann et al., 2016; Campbell et al., 2017; Simonin et al., 2019).
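The DBCP05 diagnostics can be illustrated in a toy setting with scalar observation-space statistics and H = I. The variances below are illustrative, and the assumed covariances are deliberately set equal to the true ones, so that the diagnostics should simply recover them up to sampling noise.

```python
import numpy as np

# Synthetic unbiased observation and background errors (illustrative values).
rng = np.random.default_rng(0)
N = 200_000
R_true, B_true = 0.1, 0.4
eps_o = rng.normal(0.0, np.sqrt(R_true), N)
eps_b = rng.normal(0.0, np.sqrt(B_true), N)
d = eps_o - eps_b                     # innovations y - H x_b

# Analysis in observation space, using the (here, correct) assumed matrices:
K = B_true / (B_true + R_true)        # scalar gain H K_a
d_ab = K * d                          # increments  H x_a - H x_b
d_oa = d - d_ab                       # residuals   y - H x_a

R_new = np.mean(d_oa * d)             # diagnosed observation-error variance
HBHT_new = np.mean(d_ab * d)          # diagnosed background-error variance
```

Here `R_new` recovers approximately 0.1 and `HBHT_new` approximately 0.4; with misspecified assumed matrices the diagnosed values would instead differ from both the assumed and the true statistics, as discussed by Waller et al. (2016).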
The method of DBCP05 has also been applied by Mattern et al. (2018) to the tuning of background- and observation-error variances in a physical-biogeochemical model for the assimilation of physical and chlorophyll a observations. They found that tuning only the variances in this way resulted in improved prior and posterior model-observation fits, as well as a reduction in overfitting to the observations and a better balance in the weighting of the physical and chlorophyll a observations. A limitation of that study, which is addressed here, is the assumption that the underlying spatial structure of B_a is correct.
The method of DBCP05 has also been used for marine biogeochemical monitoring by Cossarini et al. (2019). In that study, the DBCP05 diagnostic was used to estimate the observation-error variances for new biogeochemical-Argo float data. Performing this monthly allowed them to account for the dynamically varying representation error associated with the biogeochemical-Argo float data.
If Equations (7) and (8) are used simultaneously to update the full (HBH^T)_a and R_a matrices and the data are unbiased, then performing the assimilation again with these new matrices should make the data assimilation system appear consistent with the innovation statistics, as (HBH^T)_new + R_new = D from Equations (7) and (8). Therefore, applying Equations (7) and (8) again will return the same estimates for HBH^T and R. If the full HBH^T and R matrices are not updated simultaneously, then the estimates will differ as the diagnostic is reapplied. In the study of Mattern et al. (2018), where the covariances are rescaled without altering the length-scales, the estimates are updated iteratively to convergence. In other studies it has been assumed that one of the matrices, usually (HBH^T)_a, is correct, so that the observation-error covariance matrix can be found by reapplying the estimate iteratively (Gauthier et al., 2018). However, the estimated covariances will only converge on the true error covariance matrices if the error in (HBH^T)_a is negligible (Bathmann, 2018; Gauthier et al., 2018).
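This iterative reapplication can be sketched in the scalar case with H = 1 and a noise-free innovation variance (all values illustrative). The scalar form of Equation (9) is R_new = R_a D/(B_a + R_a); holding a correctly specified background variance fixed, iterating it converges to the true observation-error variance.

```python
# Scalar sketch of iterating the diagnostic with B_a held fixed (and
# correct); illustrative values, noise-free innovation variance D.
B_true, R_true = 0.4, 0.1
D = B_true + R_true              # true innovation variance

B_a, R_a = B_true, 0.5           # observation variance initially misspecified
for _ in range(50):
    R_a = R_a * D / (B_a + R_a)  # scalar form of Equation (9)
# R_a now approximates R_true = 0.1.
```

The fixed point satisfies R(B_a + R) = R_a_initial-independent balance R = R_true only because B_a = B_true here; with an erroneous B_a the iteration would instead absorb that error into R, consistent with the convergence caveat of Bathmann (2018) and Gauthier et al. (2018).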
In Equations (7) and (8), the biases in the innovations, residuals, and increments are removed when estimating the error covariances, as this is consistent with the assumptions made in data assimilation theory. However, as discussed above, if significant biases are present in the data then this information should not be ignored during the assimilation. We could compute R and HBH^T from Equations (7) and (8) without removing the biases. This would attribute some of the innovation bias to the background and some to the observations, depending on the weights I − HK_a and HK_a. However, in our case, as the observations are independently bias-corrected using the estimates from Brewin et al. (2017), in the following we attribute the biases in the innovations solely to the prior. Therefore, in line with Dee and Da Silva (1998), we propose a modified estimate for HBH^T that incorporates the innovation biases fully:

(HBH^T)_new,bias = (HBH^T)_new + d̄ d̄^T.    (11)

The effect of including the biases in the estimate of HBH^T is to inflate the matrix, so that during the assimilation the weighting of the background is reduced and the analysis is less likely to be pulled towards a biased state. The bias in the analysis will therefore be reduced, at the expense of increasing its error covariance. Increasing the background-error correlation length-scales by incorporating the biases will also have the effect of spreading information across larger regions, reducing spatial biases. Again, this comes at the expense of no longer extracting as much small-scale information from the observations.
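The bias-inflated modification of the background-error estimate can be sketched with synthetic biased innovations (H = I, independent scalar statistics at p = 2 observation-space points; all values illustrative). The mean innovation, attributed entirely to the background, is folded back in as the outer product of the innovation bias, shown here for the diagonal elements only.

```python
import numpy as np

# Synthetic innovations with a known bias of 0.3 attributed to the
# background, observation variance 0.1 and background variance 0.4.
rng = np.random.default_rng(1)
N, p = 100_000, 2
d = (0.3
     + rng.normal(0.0, np.sqrt(0.1), (N, p))     # observation errors
     - rng.normal(0.0, np.sqrt(0.4), (N, p)))    # background errors

K = 0.4 / (0.4 + 0.1)                 # scalar gain H K_a at each point
d_ab = K * d                          # increments in observation space
d_bar = d.mean(axis=0)                # innovation bias estimate

# De-biased DBCP05 background estimate, then the bias-inflated version.
HBHT_new = ((d_ab - d_ab.mean(axis=0)) * (d - d_bar)).mean(axis=0)
HBHT_bias = HBHT_new + d_bar**2       # diagonal of the inflated estimate
```

The de-biased estimate recovers approximately 0.4, while the inflated estimate is approximately 0.49, so the biased background would be downweighted in a subsequent assimilation.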
As (HBH^T)_new,bias + R_new equals D_bias rather than D, applying the diagnostic a second time would not simply return the same matrices; that is, the estimates of the covariance matrices will not converge in one step. By accounting for the biases in (HBH^T)_new,bias, we expect the biases to be reduced, and so in the next iteration d̄ d̄^T will be reduced compared with the previous iteration. However, if the inflation is removed then the bias will return, and so it is not desirable to apply the bias-inflated estimate iteratively. In Section 6, we will investigate the convergence of the diagnostics further for our application.

Equations (7), (8), and (11) can be estimated from samples of innovations, residuals, and increments. The advantage of this approach over in situ comparisons is that the sample size, N, is now the number of ocean-colour data, which is orders of magnitude greater than the number of inhomogeneously placed in situ observations. There are therefore many more data with which to provide robust estimates of correlations, as well as of variances and biases, for both the observations and the model equivalent. This could be used to allow dependences of the uncertainties on time, location, and so forth to be estimated. To increase the sample size within any given stratum, assumptions of isotropy, ergodicity (perhaps within a given season), and homogeneity can be made. Careful consideration should therefore be given to how to bin the data.

2018 CASE STUDY
An operational prediction system for the biogeochemistry of the NWE shelf is maintained by the Met Office (Edwards et al., 2012; McEwan et al., 2021). This system is based on the coupling of an ocean physics component, the Nucleus for European Modelling of the Ocean (NEMO: Madec et al., 2015; O'Dea et al., 2017), and a biogeochemical component based on the European Regional Seas Ecosystem Model (ERSEM: Baretta et al., 1995; Blackford, 1997; Butenschön et al., 2016). The coupling is achieved using the Framework for Aquatic Biogeochemical Models (FABM) coupler (Bruggeman and Bolding, 2014). The coupling is one-way, with the physics affecting the biogeochemistry but no feedback from the simulated biogeochemistry to the physics (Skákala et al., 2022).
To compute the new error covariance matrices, we use output from a previously published run of the NEMO-ERSEM analysis system for 2018 (Skákala et al., 2021).Details of this setup are given in the following subsections.

3.1 The physical component: NEMO

NEMO is a primitive-equation model adapted to regional and global ocean-circulation problems. We use version 3.6, in which the prognostic variables are the three-dimensional velocity field, nonlinear sea-surface height, potential temperature, and practical salinity. In our experiments, NEMO has 7-km spatial resolution on the Atlantic Margin Model (AMM7) domain, using a terrain-following z* coordinate system with 51 vertical levels (Siddorn and Furner, 2013; O'Dea et al., 2017). The lateral boundary conditions for physical variables at the Atlantic boundary were taken from the outputs of the Met Office operational 1/12° North Atlantic model, and the atmospheric forcing was provided by the ERA5 reanalysis (Hersbach et al., 2020).

3.2 The biogeochemical component: ERSEM
In our experiments, the Atlantic boundary values for nitrate, phosphate, silicate, and oxygen were taken from the World Ocean Atlas (Garcia et al., 2013) and dissolved inorganic carbon from the Global Ocean Data Analysis Project (GLODAP) gridded data set (Key et al., 2015;Lauvset et al., 2016), while plankton and detritus variables were set to have zero fluxes at the Atlantic boundary.

3.3 The assimilative component: NEMOVAR

The physical and biogeochemical data were assimilated on a daily basis into NEMO-ERSEM using NEMOVAR (Mogensen et al., 2009; Waters et al., 2015; King et al., 2018), a three-dimensional variational (3DVAR) first-guess-at-appropriate-time (FGAT) scheme. A weakly coupled data assimilation approach is taken, whereby the physical and biogeochemical variables are assimilated separately. The innovations are calculated using the background from the coupled model; separate analyses are carried out for the physical and biogeochemical variables, with the resulting increments being added back into the relevant component of the coupled model.
Observations of total chlorophyll a were provided by the European Space Agency's Ocean Colour Climate Change Initiative (ESA OC-CCI) v4.2. The ESA OC-CCI global dataset comprises merged Medium Resolution Imaging Spectrometer (MERIS), Aqua Moderate Resolution Imaging Spectroradiometer (MODIS), Sea-viewing Wide Field-of-view Sensor (SeaWiFS), and Visible Infrared Imaging Radiometer Suite (VIIRS) data, providing observations of water-leaving radiance in the visible domain, derived surface chlorophyll a, and inherent optical properties (Mélin et al., 2017; Sathyendranath et al., 2019). The data are provided daily on a 4 km × 4 km grid, with gaps in cloud-contaminated and low-light pixels. The data were then superobbed by taking the median of any observations within a 7-km radius, to match the resolution of the model and reduce the representation uncertainty. The assimilation of chlorophyll a was performed in log space, considering that the chlorophyll a concentration is approximately log-normally distributed (Campbell, 1995) and it is therefore natural to assume that the errors will also be log-normally distributed.
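This preprocessing can be sketched as follows; for simplicity the 7-km-radius median is approximated here by a median within 7-km cells along a single dimension, and all of the observation values and positions are illustrative.

```python
import numpy as np

# Chlorophyll a observations (mg m-3), including one suspicious outlier,
# and their along-track positions (km); values are illustrative.
chl = np.array([0.8, 1.2, 1.0, 25.0, 3.0, 2.5])
x_km = np.array([1.0, 3.0, 5.0, 6.0, 10.0, 12.0])

log_chl = np.log10(chl)                  # assimilation is done in log space
cell = (x_km // 7.0).astype(int)         # 7-km model cells

# One superob per cell: the median is robust to outliers such as 25.0.
superobs = {c: np.median(log_chl[cell == c]) for c in np.unique(cell)}
```

Taking the median rather than the mean means a single anomalous retrieval, such as the 25.0 mg m-3 value above, does not contaminate the superob for its cell.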
The background-error variances for the biogeochemical variables were derived in Skákala et al. (2018). These are based on a monthly climatology of log-transformed error variances obtained from the 100-member ensemble Kalman filter (EnKF) Proudman Oceanographic Laboratory Coastal Ocean Modelling System (POLCOMS)-ERSEM reanalysis of Ciavatta et al. (2018). These variances were then regularised and smoothed using a moving-averages algorithm and rescaled to the range 0.02–1.5 log10(mg·m−3), so that the average ratio of background error to observation error was similar to that calculated in the region when assimilating OC-CCI data into a global coupled physical-biogeochemical assimilative system (Ford and Barciela, 2017).
In NEMOVAR, the background-error correlations are modelled using a diffusion operator (Weaver et al., 2016). Successive applications of an implicit diffusion operator are used to simulate the matrix multiplication of an autoregressive correlation matrix. The method is easily able to account for complex boundary conditions and allows for geographically varying length-scales (Mirouze and Weaver, 2010) and multiple length-scales (Mirouze et al., 2016). As the number of applications of the implicit diffusion operator tends to infinity, the modelled correlation function tends to a Gaussian. In our application, we use 10 applications of the diffusion operator, which provides a good approximation to a Gaussian correlation function. For both the physical and biogeochemical variables, the horizontal correlation length-scales are specified a priori and are based on a combination of two different length-scales: a longer 111-km correlation scale (L_2) and a shorter length-scale (L_1) based on the first baroclinic Rossby radius of deformation, with a minimum value of 21 km set (King et al., 2018). The correlation between two points separated by Δx is given by

c(Δx) = w exp(−Δx²/(2L_1²)) + (1 − w) exp(−Δx²/(2L_2²)).

For the physical correlations, the weighting, w, between the two length-scales varies spatially (King et al., 2018). For chlorophyll a, the weighting is set to 0.5.
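The two-length-scale horizontal correlation can be written, in our reading of King et al. (2018), as a weighted mixture of two Gaussians; the function below is a sketch under that assumption, using the chlorophyll a settings from the text (w = 0.5, L2 = 111 km, and L1 floored at 21 km). The function name and signature are illustrative.

```python
import numpy as np

def correlation(dx_km, L1=21.0, L2=111.0, w=0.5):
    """Assumed mixture-of-Gaussians correlation at separation dx_km (km)."""
    g = lambda L: np.exp(-dx_km**2 / (2.0 * L**2))
    return w * g(L1) + (1.0 - w) * g(L2)

# correlation(0) = 1 by construction; the short scale controls the rapid
# initial decay, while the long 111-km scale keeps roughly a third of the
# correlation at 100-km separation.
```

The short scale lets the analysis fit small-scale structure near observations, while the long scale spreads information into data gaps, which is why inflating the length-scales (as in the bias-aware estimates of this work) spreads increments over larger regions.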
For physics variables, NEMOVAR computes a 3D analysis using vertical correlation length-scales that are a function of the mixed-layer depth in the background (King et al., 2018). For chlorophyll a, a 2D surface analysis is computed, and these increments are applied equally throughout the mixed layer (Skákala et al., 2018; 2020).
The values of the ocean-colour-derived chlorophyll a RMSE and bias are supplied by OC-CCI (Brewin et al., 2017). These per-pixel values are derived from in situ matchups that provide an estimate of the bias and RMSE as a function of OWT. Higher OWT values (indicative of turbid waters) tend to dominate around the coast throughout the year, but are also abundant in the more open ocean during spring and summer. Lower OWTs (indicative of oligotrophic waters) tend to dominate in the open ocean from late summer. For each pixel of the ocean-colour product, a daily weight is assigned to describe how well the pixel is characterised by each OWT (Jackson et al., 2017). To ensure large enough samples, there is no seasonal variation in these statistics. The RMSE and bias values are converted to error standard deviations using the method of Ciavatta et al. (2016). To this estimate, a fixed value of 0.01 log10(mg·m−3) is added to account for representation uncertainty (Skákala et al., 2018) not included in the OC-CCI uncertainties, whilst maintaining the average ratios of the background- to observation-error variances suggested by Ford and Barciela (2017). The estimate of the bias supplied by OC-CCI is also used to correct the mean value of the data before assimilation. The observation errors are assumed to be uncorrelated, so that R_a is diagonal.
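The conversion of the supplied per-pixel RMSE and bias into an observation-error standard deviation can be sketched as follows. The decomposition RMSE² = bias² + σ² is our reading of the Ciavatta et al. (2016) method, and adding the fixed representation term directly to the standard deviation is likewise an assumption; the function name is illustrative.

```python
import numpy as np

def obs_error_std(rmse, bias, representation=0.01):
    """Random error component from per-pixel RMSE and bias (log10(mg m-3)),
    assuming RMSE^2 = bias^2 + sigma^2, plus a fixed representation-
    uncertainty term (assumed additive in standard deviation)."""
    sigma = np.sqrt(np.maximum(rmse**2 - bias**2, 0.0))
    return sigma + representation
```

For example, RMSE = 0.3 and bias = 0.1 (log10 units) give a random component of sqrt(0.08), to which the fixed 0.01 representation term is added; the clamp at zero guards against matchup noise making the bias exceed the RMSE.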

The innovations
The operationally used error variances and correlations described above are referred to, hereafter, as the current B a and R a. New estimates of B a and R a derived in this section are based primarily on the DBCP05 diagnostics.
As discussed in Section 2, these are based on separating the innovation statistics into their observation and background components. Figure 1 shows the mean and standard deviation of the innovations as a function of five bathymetry bins and month for the 2018 period. The bathymetry bins have been chosen so that they cover roughly equal numbers of grid points. It can be seen that the innovation mean and standard deviation have similar magnitudes, especially in winter and late spring/early summer. The model appears to consistently underestimate the winter concentration of chlorophyll a and overestimate chlorophyll a during the spring bloom, compared with the satellite-derived estimates. The background biases (assuming the observations are unbiased) are therefore significant in a way not seen for sea-surface temperature when similar analysis is performed (not shown).
It can be expected that the DBCP05 diagnostics will be sensitive to the choice of bins used to separate out the data. In choosing the bins, it is important that they describe the variability in the error statistics appropriately and also allow for a large enough sample within each bin. To compute the background-error statistics, we have partitioned a year's worth of innovation, residual, and increment statistics by month and by bathymetry: on-shelf (bottom depth < 250 m) and off-shelf (bottom depth > 250 m), as shown by the white contour in the lower panel of Figure 1. Using just two bins ensures a large sample size, which is especially crucial when estimating error correlations. To compute the observation-error statistics for log10 chlorophyll a derived from ocean colour, we have partitioned the same sample of innovations, residuals, and increments by month and the dominant (largest weight) OWT, as is done by Brewin et al. (2017). Partitioning the data differently to estimate B and R may aid the ability of the DBCP05 diagnostic to untangle the two different components of the total innovation covariances. Within these bins, we assume homogeneity, ergodicity, and isotropy over the given sample.
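The binned diagnostics can be sketched as follows. This is a minimal, scalar-observation illustration of the DBCP05 consistency relations, verified on synthetic data with known error statistics; the handling of the bias term is an assumption based on the description of Equations (8) and (11), not the paper's exact formulae.

```python
import numpy as np

def dbcp05_variances(d_ob, d_oa, d_ab, include_bias=False):
    """DBCP05 (Desroziers et al., 2005) diagnostics for one bin of
    innovations d_ob = y - H(xb), residuals d_oa = y - H(xa), and
    increments d_ab = H(xa) - H(xb), all in observation space:
      sigma_o**2 ~ E[d_oa * d_ob],  sigma_b**2 ~ E[d_ab * d_ob].
    With include_bias=True the squared innovation mean is folded into
    the background-error variance (a sketch of Equation 11; Equation 8
    removes the means first)."""
    d_ob, d_oa, d_ab = map(np.asarray, (d_ob, d_oa, d_ab))
    var_o = np.mean((d_oa - d_oa.mean()) * (d_ob - d_ob.mean()))
    if include_bias:
        var_b = np.mean(d_ab * d_ob) + np.mean(d_ob)**2
    else:
        var_b = np.mean((d_ab - d_ab.mean()) * (d_ob - d_ob.mean()))
    return np.sqrt(max(var_o, 0.0)), np.sqrt(max(var_b, 0.0))

# Synthetic self-check: scalar obs with sigma_o = 0.1, sigma_b = 0.2 and
# an optimal scalar analysis x_a = x_b + k*d_ob, k = var_b/(var_b+var_o)
rng = np.random.default_rng(1)
n, so, sb = 200_000, 0.1, 0.2
eo, eb = so * rng.standard_normal(n), sb * rng.standard_normal(n)
d_ob = eo - eb                       # innovation
k = sb**2 / (sb**2 + so**2)
d_ab = k * d_ob                      # increment
d_oa = d_ob - d_ab                   # residual
sig_o, sig_b = dbcp05_variances(d_ob, d_oa, d_ab)
```

With an optimal gain, E[d_oa d_ob] = σ_o² and E[d_ab d_ob] = σ_b² hold exactly, so the estimates should recover 0.1 and 0.2 to within sampling noise.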
To ensure the new estimates of B a and R a are free from sampling noise, we impose a lower limit on the sample size (given by the number of superobbed ocean colour data within a given bin) when computing the error covariances. This is set at 100 for the error variances and 250 for the correlations. In practice, this lower limit only comes into play when computing the observation-error variances for very high and very low OWTs (see Section 4.3).

Estimated background-error covariances
Within this section, we show the results of estimating new background-error covariances for surface log10 chlorophyll a. We do not adjust the vertical correlations, as ocean colour data do not provide information at depth. Two estimates of both the variances and correlations are computed: one in which biases are not included, as in Equation (8), and one in which the biases are included, as in Equation (11).
The NEMOVAR-ERSEM DA system takes as input monthly varying background-error standard deviations for total chlorophyll a defined at each model grid point (Skákala et al., 2018). We wish to keep some of the spatial variability in the current estimate of the standard deviations, but this is at a higher resolution than can be achieved with two bathymetry bins. Therefore, we only update the magnitudes of the variances according to the bathymetry bins, by performing the following steps.
1. First, we estimate the background-error standard deviations as a function of month and the two bathymetry bins, σ_b,new(month, bathymetry), from a year's innovation and residual samples using the DBCP05 method (Equation 8). These have been estimated by removing all biases from the innovations and residuals. We also provide a separate estimate of the background-error standard deviations, σ_b,new,bias(month, bathymetry), which is derived including the absolute innovation biases (Equation 11).
2. Figure 2 shows how different the new background-error standard deviations are from the currently used values as a function of month and bathymetry. Including the bias acts to inflate the error variances for the periods when the biases are largest (cf. Figure 1). In most seasons the DBCP05 estimate gives smaller values than those currently used, the main exception being the off-shelf values in February when the innovation bias is included.
3. To provide a new estimate of monthly varying background-error standard deviations for total log10 chlorophyll a defined at each model grid point, the spatial perturbations of the current per-grid-point values about their bin means are added to σ_b,new(month, bathymetry). This provides new estimates of the background-error standard deviations that have the same spatial variability as the previous values within a given bathymetry bin, but with updated magnitudes. To avoid a sudden jump between the bins, the perturbations added to σ_b,new are linearly interpolated as a function of bathymetry in the range 125 m < bathymetry < 3,125 m. A lower limit of 0.05 [log10(mg⋅m−3)] is imposed. Again, this is repeated for the σ_b,new,bias(month, bathymetry) estimate to give σ_b,new,bias(month, gridpoint).
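The per-grid-point update in the steps above can be sketched as follows. This is one plausible reading, not the operational code: the bin-mean removal, the linear blend of the two bin magnitudes across the 125 m–3,125 m bathymetry range, and the on/off-shelf split at 250 m are all assumptions based on the description above.

```python
import numpy as np

def update_std_magnitudes(sigma_cur, bathy, sigma_new_on, sigma_new_off,
                          floor=0.05, b_lo=125.0, b_hi=3125.0):
    """Rescale a per-grid-point standard-deviation field so that each
    bathymetry bin takes the new DBCP05-derived magnitude while the
    current field's spatial anomalies within the bin are retained.
    bathy is in metres; sigma_new_on/off are the binned estimates."""
    on = bathy < 250.0
    # perturbation (anomaly) of each grid point about its bin mean
    anom = np.where(on, sigma_cur - sigma_cur[on].mean(),
                        sigma_cur - sigma_cur[~on].mean())
    # blend the two bin magnitudes linearly with bathymetry to avoid a
    # sudden jump between bins (the paper interpolates over 125-3,125 m)
    w = np.clip((bathy - b_lo) / (b_hi - b_lo), 0.0, 1.0)
    target = (1.0 - w) * sigma_new_on + w * sigma_new_off
    return np.maximum(target + anom, floor)   # impose the lower limit

# Five illustrative grid points spanning the shelf break
bathy = np.array([50.0, 100.0, 200.0, 1000.0, 4000.0])
sigma_cur = np.array([0.30, 0.35, 0.25, 0.20, 0.15])
out = update_std_magnitudes(sigma_cur, bathy,
                            sigma_new_on=0.2, sigma_new_off=0.1)
```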
Examples of the output of this process for March–August are shown in Figure 3. We see that we maintain the spatial variability in the error standard deviations present in the currently used values on and off the shelf (marked by the black contour), but have changed the magnitudes.
Using the same DBCP05 output used to update the background-error standard deviations, we also update the background-error correlations. Figure 4 shows raw estimates of the correlations for each month and the two bathymetry bins. Again, two separate estimates of the correlations are provided, for when the biases are (red) and are not (cyan) included. Including the bias acts to increase the error correlation length-scales for the periods when the biases are largest (cf. Figure 1). These estimates can be compared with the currently used correlations (black lines in Figure 4), which do not vary with bathymetry or month. The new estimates have a large amount of monthly variability, but in general the length-scales are longer than those currently used, especially when the biases are included.
The current correlations (black lines) are based on those used for the assimilation of SSTs. We performed the same analysis for SST data using data from the VIIRS instrument on board Suomi-NPP. This verified that the correlations used for SST are a good estimate, with the assumption of monthly invariant length-scales being a good approximation. The bias in the SST innovations is also much less significant. The difference between the SST and chlorophyll a correlation estimates is therefore interesting, as it had been assumed that it is the physics that drives the correlations in chlorophyll a. These results show that other processes are responsible for the correlations and that they are much more complex for chlorophyll a than for SST.
To use these new estimates of the correlations, Equation (14) has been fitted to the correlations using curve_fit from the Python scipy.optimize package, summarising the curves in terms of three parameters for each month: w, L1, L2 (w_bias, L1,bias, L2,bias when bias is included). These parameters are provided in Table 1. Recall that the currently used parameters are w = 0.5, L1 ≈ 25 km, and L2 = 111 km. There is a fair amount of monthly variability in the optimised parameters but, comparing the yearly averages, we see that including the innovation bias has the effect of giving less weight to the smaller length-scale (w) and increasing the larger length-scale (L2). Including the innovation bias has a negligible effect on the smaller length-scale (L1). The picture is similar if we compare the optimised parameters for on the shelf (B < 0.25 km) and off the shelf (B > 0.25 km): off the shelf, less weight is given to the smaller length-scale and the larger length-scale is increased compared with on the shelf. Again, there is little change in the smaller length-scale.
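The fitting step can be sketched as follows, assuming the two-scale model is a weighted sum of two Gaussians; synthetic correlation values built from the currently used parameters stand in for the binned DBCP05 estimates, which are not reproduced here.

```python
import numpy as np
from scipy.optimize import curve_fit

def two_scale_corr(dx, w, L1, L2):
    """Weighted sum of two Gaussian correlation functions, an assumed
    form for the paper's Equation (14)."""
    return (w * np.exp(-dx**2 / (2.0 * L1**2))
            + (1.0 - w) * np.exp(-dx**2 / (2.0 * L2**2)))

# Synthetic "raw" correlations from the currently used parameters
# (w = 0.5, L1 = 25 km, L2 = 111 km), with a small perturbation to
# mimic sampling noise in the binned estimates
sep = np.linspace(0.0, 300.0, 31)                 # separation, km
raw = two_scale_corr(sep, 0.5, 25.0, 111.0)
raw_noisy = raw + 0.01 * np.sin(sep / 10.0)

# Fit the three parameters per bin, as done with curve_fit in the paper
popt, _ = curve_fit(two_scale_corr, sep, raw_noisy,
                    p0=(0.5, 20.0, 100.0),
                    bounds=([0.0, 1.0, 1.0], [1.0, 200.0, 500.0]))
w_fit, L1_fit, L2_fit = popt
```

The bounds keep the weight in [0, 1] and the length-scales positive; a starting guess near the currently used parameters avoids the degenerate solution in which the two scales swap roles.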

Estimated ocean colour observation-error variances
Here, we use the DBCP05 diagnostic to produce new estimates of the ocean-colour error standard deviations that, by default, include the representation, instrument, and forward-model uncertainty. The current estimates of the observation-error variances, estimated from in situ data, vary with OWT but are constant in time. The increased sample size gained by comparing with the model instead of in situ observations means, again, that we are able to provide monthly estimates of the standard deviations. We do not update the bias estimates, as we have assumed that the innovation bias is dominated by the model bias and so have no additional information on this. We also do not provide estimates of the error correlations, because a good estimate of B a is needed to estimate the correlations in R a. This will be addressed in Section 6, once we have a better estimate of the background-error correlations.
Estimates of the ocean-colour error standard deviations as a function of OWT are given in Figure 5 for each month. Values are not available for every OWT, due to the sample size being too small. These values can be compared with the currently used values provided by OC-CCI. As with the background errors, the currently used values seem generally to overestimate the ocean-colour error standard deviations. As discussed in Section 2, the OC-CCI estimates do not include the uncertainty due to the mismatch in scales between the ocean colour data and the modelled surface chlorophyll a, nor do they account for uncertainty in the observation operator. It is therefore interesting that the OC-CCI estimates appear to overestimate the ocean-colour error standard deviations. This could perhaps be explained by the mismatch in the scales represented by the ocean colour data and the in situ data used to provide the OC-CCI estimates, such that the OC-CCI estimate of the ocean-colour error standard deviation has the undesirable property of including the representation uncertainty of the in situ data. The OC-CCI values were also estimated for the ocean colour data before they were superobbed. If the errors in the data are uncorrelated, then we can expect the superobbing to reduce the error variances. This highlights an advantage of the DBCP05 diagnostic: it provides an estimate of the observation-error variances for the actual observations assimilated, without the need to hypothesise the missed sources of uncertainty or the effect of processing the observations before assimilation.
One important consideration is how the ratio of the background-error standard deviations to the observation-error standard deviations changes for the new estimates compared with the current values. From Figures 2 and 5, we see that in general the estimates of the background-error variances are reduced more than the observation-error variances, so that in general less weight will be given to the observations. However, this is very variable, and for some months and locations the opposite is true. If the weighting given to the observations is not justified, this may result in initialisation shock, in which the analysis output from assimilation of the observations is not a consistent solution to the model equations, so that a large error quickly manifests in the forecast as the model tries to regain balance. This will be investigated in Section 5, where we look at the impact of the new error statistics on forecast skill.
These estimates of the error standard deviations are converted to per-pixel values using daily maps of weighted per-pixel OWT. As each pixel is characterised by more than one OWT, we are still able to provide a value of the error standard deviation for each pixel, despite the fact that for some OWTs we did not have a large enough sample to estimate the error standard deviation (cf. Figure 5). Example composites of the currently used error standard deviations (top rows) and the new estimates (bottom rows) are given in Figure 6. With the current estimates, it can be seen that there is a large variability in the spatial distribution of the standard deviations each month, but not in the magnitudes. In contrast, there is much more seasonal variability in both the magnitudes and the spatial distribution in the new estimates.
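The mapping from per-OWT estimates to per-pixel values can be sketched as a weighted mean over a pixel's OWT memberships. Treating the OWTs with no estimate by renormalising the remaining weights is an assumption, not necessarily the paper's exact procedure.

```python
import numpy as np

def per_pixel_std(owt_weights, owt_sigma):
    """Combine per-OWT error standard deviations into a per-pixel value
    using the pixel's daily OWT membership weights. NaN marks OWTs whose
    sample was too small to give an estimate; the remaining weights are
    renormalised over the OWTs that do have estimates."""
    owt_weights = np.asarray(owt_weights, dtype=float)
    owt_sigma = np.asarray(owt_sigma, dtype=float)
    ok = np.isfinite(owt_sigma)
    return np.sum(owt_weights[ok] * owt_sigma[ok]) / np.sum(owt_weights[ok])

# A pixel characterised mostly by OWT 3, partly by OWTs 4 and 6;
# no estimates are available for OWTs 5 and 6 (illustrative values)
weights = [0.0, 0.0, 0.7, 0.2, 0.0, 0.1]
sigmas = [0.20, 0.18, 0.15, 0.25, np.nan, np.nan]
sigma_pix = per_pixel_std(weights, sigmas)
```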
This monthly variability in the magnitude of the error standard deviations could plausibly be due to changes in the optical properties, for example the zenith angle, or to changes in the representation uncertainty. It could also be an artefact of the DBCP05 diagnostic effectively leaving an imprint of the background-error standard deviations.

Note to Table 1: The weights are given to two significant figures and length-scales are in metres to three significant figures.

TRIALS WITH NEW UNCERTAINTY ESTIMATES
To understand the impact of the new estimates of the error statistics for the observations and background, a year's worth of analyses and six-day forecasts have been initiated daily throughout 2018, using the same model and assimilation setup used to compute the new covariances.
Four different experiments have been performed (also summarised in Table 2).
• Control: original uncertainties currently used operationally.
• New-nobias: New R and B without accounting for biases (Equations 7 and 8).
• New-biasstd: New R and B accounting for bias in the background-error standard deviations only (Equations 7 and 11 for the background-error standard deviations and Equation 8 for the background-error correlations).
• New-bias: New R and B accounting for bias in both the background-error standard deviations and correlations (Equations 7 and 11).
To recap: the main differences between the control and new statistics are essentially a change in the ratios of the error variances, the temporal variability in observation-error variances (cf. Figure 5), and the temporal variability in the background-error correlations (cf. Figure 4).

Impact on analysis
Within this section, we first look at the impact the new error covariance estimates have on the analysis, which is then used to initialise the forecasts.

Validation against ocean colour data
Figure 7 shows a time series of the average chlorophyll a of the different experiments over the whole model domain (left) and split into on- and off-shelf regions (middle and right, respectively). These can be compared with the time series of the OC-CCI data (black line). A small negative winter bias in the analysis and a larger positive bias in spring/early summer and late summer can be seen on comparing the control experiment (red) with the observations. Using the new statistics when the biases are not included in the background-error correlations (green and yellow lines) can be seen to reduce this bias consistently. The timing of the spring bloom was already captured fairly accurately by the control; however, the magnitude was far too strong. The new statistics do not change the timing of the spring bloom across the region, but do help to reduce its magnitude, bringing it more in line with that observed.
The performance when the biases are/are not accounted for in the variances is very similar (green/yellow lines). The only difference between the two analyses is the magnitude of the background-error variances used. This suggests it is the seasonal variability in the observation-error variances and background-error correlations that is important, rather than the overall change in the ratios of the background- to observation-error variances. This is consistent with the findings of Skákala et al. (2018), who found little sensitivity when tuning the ratio of the background- to observation-error variances.
The behaviour of the analysis time series is much more sensitive to accounting for the biases in the correlations (cyan lines). In many places the increase in correlation length-scales is seen to be beneficial. However, during the spring/summer bloom, the large oscillations in chlorophyll a on the shelf are unrealistic. This may be due to the larger length-scales in the B matrix spreading the information in the observations too far and impacting the stability of the model. This is supported by spatial maps of analysis increments, which indicate that the longer correlation length-scales appear, on occasion, to spread information from very localised features more widely than is appropriate.
Figure 8 separates out the analysis time series into different regions on the NWE shelf for April–July only (the months in which the worst performance from including the biases in the correlations was seen in Figure 7). The behaviour of chlorophyll a and the characteristics of the model bias are seen to vary across the different regions. We see that the unrealistic behaviour of the analysis when accounting for the biases in the correlations is mostly confined to the Norwegian Trench and North Sea regions, whereas accounting for the biases in the correlations is seen to be beneficial to the performance of the analysis in the NW Approaches, the Irish Sea, and the English Channel. This suggests that, in order to gain the benefits of accounting for the biases in the correlations, the B matrix should be specified separately for the different regions to allow for spatial variability in the bias characteristics.

5.1.2 Validation against independent glider data

So far, we have validated the analysis against the OC-CCI observations that are also assimilated. Glider observations, although limited in number, provide an independent validation for the analyses of chlorophyll a and oxygen, as well as allowing us to see the impact below the surface. A number of gliders were deployed during 2018 as part of the Alternative Framework to Assess Marine Ecosystem Functioning in Shelf Seas (AlterEco) programme (http://projects.noc.ac.uk/altereco/). The AlterEco gliders that operated in the central North Sea between May and August 2018 provided data for temperature, salinity, chlorophyll a (derived from fluorescence), and oxygen concentrations (Skákala et al., 2021).
Figure 9 shows the analysis validated against the chlorophyll a observations provided by the gliders. In the control simulation (Figure 9b), a large positive bias in chlorophyll a can be seen around the base of the mixed layer during the summer months. As discussed in Skákala et al. (2021), this may occur as a consequence of model dynamical adjustment to surface chlorophyll a assimilation, that is, resulting from the imbalance between the incoming irradiance and nutrient abundance beneath the mixed layer and the limited ability of ocean-colour data assimilation to correct chlorophyll a below the mixed layer. The improvement in the bias compared with the control for the new statistics is shown in Figure 9c–e. When the bias is not included in the background-error correlations (Figure 9c,e), we see a clear reduction in the positive bias throughout May and June, but some increases in the bias afterwards. When the bias is included in the background-error correlations (Figure 9d), we see a similar effect to that in the surface chlorophyll a, that is, a mixture of large improvements along with large degradations. Again, this is indicative of the larger length-scales in the B matrix impacting the stability of the model. Recall that the North Sea, the location of the gliders, was the region most sensitive to the inclusion of biases in the background-error correlations (cf. Figure 8).
Figure 10 shows the analysis validated against the oxygen observations provided by the gliders. Accurate monitoring and forecasts of oxygen are paramount to identifying oxygen-depletion events (i.e., hypoxia: Vaquer-Sunyer and Duarte, 2008), which can have disastrous impacts on marine life. We see that the control (Figure 10b) has a positive bias at the majority of depths and times, increasing in magnitude from May onwards. Assimilating the surface chlorophyll a observations with the new error covariances is seen to reduce the large positive bias from May onwards. Again, including the biases in the background-error correlations can lead to some degradations over the control (Figure 10e).

Impact on forecast skill
Within this section we show how the changes to the analysis (mostly improvements) propagate into the skill of the forecast. Figure 11 shows the bias versus root-median-square difference (RMSD) in six-day on-shelf forecasts of surface chlorophyll a, validated against ocean-colour-derived surface chlorophyll a for February–September. As with the analysis time series, we again see that the behaviour of the forecast skill when the biases are/are not accounted for in the variances is very similar (green and yellow lines), but much more sensitivity is seen when accounting for the biases in the correlations. Note the different scales on each axis; the grey box indicates the same value ranges in each subplot.
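The skill metrics in Figure 11 can be sketched as follows, assuming the RMSD denotes the square root of the median squared forecast-minus-observation difference; synthetic misfits whose spread grows with lead time stand in for the real verification data.

```python
import numpy as np

def forecast_skill(obs, fcst):
    """Bias and root-median-square difference (RMSD) of forecast-minus-
    observation misfits. fcst has shape (n_samples, n_lead_days), so a
    value of each metric is returned per forecast lead time."""
    d = fcst - obs[:, None]
    bias = d.mean(axis=0)
    rmsd = np.sqrt(np.median(d**2, axis=0))
    return bias, rmsd

# Synthetic verification sample: unbiased forecast errors whose
# standard deviation grows over lead days 1-6 (illustrative values)
rng = np.random.default_rng(0)
obs = rng.normal(0.0, 0.3, size=1000)
lead_sd = 0.1 + 0.02 * np.arange(6)
fcst = obs[:, None] + rng.normal(0.0, 1.0, (1000, 6)) * lead_sd
bias, rmsd = forecast_skill(obs, fcst)
```

The median-based RMSD is less sensitive to the occasional very large misfit than a mean-based RMSE, which matters for strongly non-Gaussian chlorophyll-a errors.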
We would expect that accounting for the biases in the background-error covariances would reduce the bias in the analysis and forecast but increase the RMSD. This is generally true; however, again the improvement over the control is mixed and, in general, not accounting for the biases in the correlations gives more reliable improvements over the control. Similar conclusions can also be drawn from the forecast skill computed for the off-shelf region (not shown).
The forecast skill with lead time can provide a good indication of initialisation shock due to overfitting to the observations and an imbalanced analysis. As the degradation in forecast skill with lead time is consistent across the different experiments, there is no obvious sign of an increased initialisation shock with the new error covariance statistics. Mostly, the improvements at the initial time are maintained throughout the six-day forecast.
Another interesting feature seen in these forecast-skill plots is the sensitivity to accounting for the biases in the background-error correlations for the July period. Figure 4 showed that there was little difference in the correlations in July, due to the bias being negligible. Therefore this sensitivity must be propagated from May and June, when the differences in the correlations when accounting for biases were large.
The forecast skill was also validated against the glider observations (not shown). It was found that the improvements seen at the analysis time were again maintained in the six-day forecast.

CONVERGENCE OF UNCERTAINTY ESTIMATES
Given the improvements in the analysis and forecast seen with the new estimates of B and R, obvious questions are as follows.
• Can we do any better if we run the DBCP05 diagnostics again?
• Do we expect the estimates of B and R to converge?
As discussed in Section 2, the presence of significant biases in the innovations, which, although reduced, still remain, disrupts the usual assumption of convergence of the DBCP05 diagnostics. The answers to these two questions are therefore not obvious.
Figure 12 shows the new standard-deviation estimates for B from the first and second iterations of the DBCP05 diagnostics for the three experiments, as a function of month, for on the shelf only. The first iteration comprises the new estimates used in the three experiments (see Figure 2). For the second iteration, we provide values for when the biases were and were not included in the variance estimation, using the output from the experiments described in the previous section. We see that, in the case in which the biases were not added to the standard deviations in the first iteration (New-nobias, left panel), the first and second iterations are very similar when the bias was again not included in the second iteration. Similarly, when the biases were added to the standard deviations in the first iteration (New-biasstd and New-bias, middle and right panels), the first and second iterations are very similar when the bias is included in the second iteration. Therefore, the estimates of the background-error standard deviations are close to convergence, as long as the biases are treated consistently.

Figure 13 shows the new variance estimates for R as a function of OWT, averaged over the year, for the first and second iterations of the DBCP05 diagnostics for the three experiments. Note that all experiments used the same R, so there is only one line for the first iteration. This time there is a clearer shift in the standard deviations estimated between the first and second iterations, although interestingly the second estimates are largely insensitive to the different B matrices used in the different experiments.
The outputs from the second iteration indicate a greater weighting to the observations than was achieved using the estimates of B and R from the first iteration.
Figure 12 caption: Estimates of the background-error standard deviations for log10 chlorophyll a [log10(mg⋅m−3)] as a function of month for on the shelf. We compare the original values used (black line, same as the black line in Figure 2), estimates derived from the first iteration of the DBCP05 diagnostic (black dashed line, same as the red and cyan lines in Figure 2, depending on whether the bias was or was not included in the experiment), and estimates derived from the second iteration of the DBCP05 diagnostic with and without the bias included (red lines, solid when the bias is not included and dashed when the bias is included).

Figure 14 shows the new correlation estimates for B for the first and second iterations of the DBCP05 diagnostics for the three experiments for May, June, and July, for on the shelf only. The first iterations are the values used in the three experiments (see Table 1). For the second iteration, we provide values for cases in which the biases were and were not included in the correlation estimate. Similarly to the error standard deviations, we see that, in the case when the biases were not added to the correlations in the first iteration (New-nobias and New-biasstd, top and middle panels), the first and second iterations are very similar when the bias is again not included in the second iteration (red solid line). Similarly, when the biases were added to the correlations in the first iteration (New-bias, bottom row of panels), the first and second iterations are very similar when the bias is included in the second iteration (red dashed line). Therefore, again the correlation estimates are close to convergence, as long as the biases are treated consistently.
Also shown in Figure 14 are the new correlation estimates for R for the second iteration of the DBCP05 diagnostics for the three experiments. Note that the original and first iterations both assumed no correlations were present in R. In each case, we see that correlations in the ocean colour data are significant (correlations greater than 0.2 are seen at separation distances of tens of km). The inclusion of these error correlations can be expected to allow much finer-scale information to be used from the observations (Rainwater et al., 2015; Fowler et al., 2018). Therefore, benefits from including correlations when assimilating ocean colour data could be expected, although this is not currently easy to implement. However, for May in particular, it is clear that the estimate of the correlations in the observations depends upon the B matrix used within the assimilation, with longer length-scales diagnosed when the length-scales in B used in the assimilation are longer. This makes it hard to have confidence in the results, as R should be independent of B. The sensitivity of the estimates of observation-error correlations to the length-scales in the B matrix when using the DBCP05 diagnostics is discussed extensively in Waller et al. (2016).
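A separation-binned estimate of observation-error correlations can be sketched as follows. This is a simplified illustration assuming the observation locations recur each day and that means are removed; the real calculation must handle changing data coverage. For the self-check, the same synthetic correlated field is fed for both arguments, so the recovered correlation is simply that of the field itself (here an exponential kernel, chosen for numerical stability).

```python
import numpy as np

def obs_error_correlation(d_oa, d_ob, xy, bins):
    """DBCP05-style estimate of spatial observation-error correlations:
    cov(d_oa_i, d_ob_j) ~ R_ij, averaged into bins of the separation
    distance between observation pairs (i, j). d_oa and d_ob have shape
    (n_days, n_obs); xy holds the fixed observation positions (n_obs, 2);
    bins are distance-bin edges."""
    a = d_oa - d_oa.mean(axis=0)
    b = d_ob - d_ob.mean(axis=0)
    cov = (a.T @ b) / a.shape[0]                   # (n_obs, n_obs)
    corr = cov / np.sqrt(np.outer(np.diag(cov), np.diag(cov)))
    dist = np.hypot(xy[:, None, 0] - xy[None, :, 0],
                    xy[:, None, 1] - xy[None, :, 1])
    idx = np.digitize(dist.ravel(), bins)
    return np.array([corr.ravel()[idx == k].mean()
                     for k in range(1, len(bins))])

# Synthetic check: errors correlated over ~40 km at 20 fixed sites on a
# line (10-km spacing); the same field is used for both arguments
x = np.arange(20) * 10.0
xy = np.column_stack([x, np.zeros(20)])
D = np.abs(x[:, None] - x[None, :])
C = np.exp(-D / 40.0)                              # exponential kernel
rng = np.random.default_rng(2)
eps = rng.standard_normal((5000, 20)) @ np.linalg.cholesky(C).T
binned = obs_error_correlation(eps, eps, xy,
                               np.array([0.0, 50.0, 100.0, 150.0]))
```

The binned curve should start high (the first bin contains the zero-lag pairs) and decay towards zero at large separations.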

SUMMARY AND DISCUSSION
The aim of this work was to derive new estimates of B and R for the assimilation of ocean-colour-derived chlorophyll a into a marine biogeochemistry model. Using the method of DBCP05 allowed us to derive new estimates of the error statistics as a function of month, as well as bathymetry and optical water type (OWT) for the background and observations respectively. Within these bins, we made strong assumptions of ergodicity, isotropy, and homogeneity. The new covariances derived from these consistency diagnostics differed from those currently used operationally in the following ways.
1. The observation-error variances varied as a function of month as well as OWT, rather than just OWT, and included the effect of representation uncertainty implicitly.
2. The background-error variances were also updated and no longer assumed that the observation- to background-error variances satisfied a particular ratio.
3. The background-error correlations were no longer based on those used for the assimilation of SSTs, allowing for monthly varying estimates of the correlation length-scales on and off the shelf.
4. The significant bias in the model was accounted for by including the bias within the background-error covariance.

Figure 14 caption: Estimates of the background-error correlations for log10 chlorophyll a on the shelf, May–July. We compare the original values used (black line, same in all panels and same as the black line in Figure 4), estimates derived from the first iteration of the DBCP05 diagnostic (black dashed line, here the same as the purple solid and dashed lines in Figure 4, depending on whether the bias was or was not included in the experiment), and estimates derived from the second iteration of the DBCP05 diagnostic for the three different experiments (red lines, solid when the bias is not included and dashed when the bias is included). Also plotted in yellow are the estimates of the observation-error correlations, which were originally and in the first iteration assumed to be uncorrelated.
Including the biases in the background-error covariance resulted in the error variances and correlation length-scales being inflated, so that the analysis was less constrained by the biased model. The increase in correlation length-scales also implies that less small-scale information is extracted from the observations, allowing them to correct biases over a larger region. The alternative approach of removing the biases in the innovations before they are assimilated (e.g., Lea et al., 2008) was not investigated, due to the difficulty in estimating the temporal and spatial variability of the biases accurately.
We tested the impact of the new error covariance matrices within the framework of the operational CMEMS system for the NWE shelf. We found that, in general, when the covariances do not account for the biases, large improvements in the skill of the analysis and forecast can be gained by updating the background-error covariances and observation-error variances. Specifically, we showed the potential for a reduction in the bias of area-averaged chlorophyll a compared with the satellite data in most regions for most months.
The results are apparently insensitive to whether the background-error variances were inflated to include the model bias. This suggests that it is the seasonal variability in the covariances that is important. More sensitivity was achieved by increasing the correlation length-scales to account for model bias. However, the results were mixed, giving large improvements as well as degradations in the analysis. This may be due to the different nature of the bias in the different regions of the shelf, so that the treatment of model bias should not be applied in the same way across the whole shelf. The assumption of isotropy of the errors on the shelf may also not be justified. Future work could relax this assumption by estimating the correlations separately in the latitudinal and longitudinal directions.
No attempt was made to update the vertical correlations in B, as the application of the DBCP05 diagnostic within this article is limited to the space of the OC-CCI dataset (i.e., surface total chlorophyll a). However, validation against independent glider observations of chlorophyll a and oxygen showed that the impact on analysis skill seen at the surface largely followed through to the impact at lower depths, although the impact is more subtle. Again, the impact on variables below the surface was much more mixed when the biases were included in B. This suggests that the larger length-scales may be causing imbalances within the model. However, the improvements seen at the analysis time were found to carry through to the six-day forecast, with no evidence of initialisation shock with the new error covariance matrices. This suggests that the new error covariance matrices were more consistent with the data and that the improvements at the analysis time were not simply due to overfitting the observations.
Repeated application of the DBCP05 diagnostic could be expected to provide better estimates of B and R, although this is complicated by the presence of significant (although reduced after the first iteration) biases in the innovations. We found that applying the diagnostic a second time resulted in little change in the estimates of the background-error variances and correlations, as long as the biases in the innovations were treated consistently. A second iteration of the diagnostic, however, resulted in a large reduction in the estimate of the observation-error variances. This could be because we have neglected the observation-error correlations, which may be significant for the OC-CCI dataset. At present it is not possible to account for observation-error correlations within NEMOVAR, and so inflation of the observation-error variances is common practice (e.g., Hilton et al., 2009), in parallel with thinning and spatially averaging (superobbing) the data. Inclusion of observation-error correlations in the future would allow finer-scale information to be extracted from the dataset, which is not possible by simply inflating the error variances (Rainwater et al., 2015; Fowler et al., 2018).
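The DBCP05 diagnostics used throughout are the consistency relations of Desroziers et al. (2005): with innovations d_ob = y − Hx_b, analysis residuals d_oa = y − Hx_a, and analysis increments in observation space d_ab = Hx_a − Hx_b, one expects E[d_oa d_obᵀ] ≈ R and E[d_ab d_obᵀ] ≈ HBHᵀ when the gain is consistent with the true covariances. A minimal synthetic sketch (H = I, an exponential background-error correlation, and all numerical values purely illustrative, not the operational setup):

```python
import numpy as np

rng = np.random.default_rng(42)
n, m = 40, 20_000                 # grid points, realisations
sig_b, sig_o, L = 0.3, 0.2, 5.0   # illustrative error std devs and length-scale

# True covariances: correlated background errors, uncorrelated obs errors
i = np.arange(n)
B = sig_b**2 * np.exp(-np.abs(i[:, None] - i[None, :]) / L)
R = sig_o**2 * np.eye(n)

# Sample innovations d_ob = eps_o - eps_b (taking H = I for simplicity)
eps_b = np.linalg.cholesky(B) @ rng.standard_normal((n, m))
eps_o = sig_o * rng.standard_normal((n, m))
d_ob = eps_o - eps_b

# Analysis with the (here, correctly specified) gain K = B (B + R)^-1
K = B @ np.linalg.inv(B + R)
d_ab = K @ d_ob               # H x_a - H x_b
d_oa = d_ob - d_ab            # y - H x_a

# DBCP05 diagnostics: sample versions of E[d_oa d_ob^T] and E[d_ab d_ob^T]
R_est = (d_oa @ d_ob.T) / m   # ~ R
B_est = (d_ab @ d_ob.T) / m   # ~ H B H^T
```

When the covariances assumed in K are mis-specified, these estimates are pulled towards the assumptions rather than the truth, which is why iterated application, and its sensitivity to neglected observation-error correlations, matters in practice.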
A remaining question is how transferable these estimates are to other years and datasets. Cossarini et al. (2019) found that online estimation of the error variances of biogeochemical-Argo float data was necessary to account for the flow-dependent nature of the representation error. However, online estimation of both B and R would be prohibitively costly. The suitability of the error covariances estimated here for other years and datasets should therefore be evaluated. Nevertheless, given the improvements shown in the analysis and forecast, we recommend that the new error covariances be trialled for operational use in their current form. The new estimates of the OC-CCI uncertainties will also be of interest to the wider ocean-colour community.
Further improvements could also be expected from combining the monthly varying B defined in this work with a dynamically evolving ensemble representation of the background-error covariances, an approach commonly known as hybrid DA (Clayton et al., 2013). Hybrid DA aims to overcome both the issues with a predefined B (Skákala et al., 2020) and the limitations of a small ensemble representation of B. This is clearly more expensive than using the monthly varying B defined here alone, as an ensemble system must also be developed and maintained. The benefit also relies on the accuracy of both the predefined and ensemble representations of B (Bowler et al., 2017). Therefore, these new estimates of B will also be beneficial to such future developments.
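Hybrid DA of this kind is commonly implemented as a weighted linear combination of the static covariance and a sample covariance from a small ensemble; a minimal sketch, in which the weight beta and the ensemble layout are illustrative assumptions rather than any particular operational configuration:

```python
import numpy as np

def hybrid_B(B_static, ensemble_states, beta=0.5):
    """Hybrid background-error covariance: a linear combination of a
    predefined (e.g. monthly varying) B and a sample covariance from an
    ensemble of model states (rows = state variables, columns = members).
    beta is the weight on the static part; its value here is illustrative."""
    B_ens = np.cov(ensemble_states)   # flow-dependent sample covariance
    return beta * B_static + (1.0 - beta) * B_ens
```

Setting beta = 1 recovers the purely static B, while beta = 0 gives the purely ensemble B, which is rank-deficient for small ensembles; avoiding both limitations is the motivation for the hybrid form.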

The innovation can be written in terms of the error in the observations (ε_y = y − Hx_t) and the error in the background in observation space (ε_b^H = Hx_b − Hx_t), such that d = ε_y − ε_b^H.

F I G U R E 1 Mean of the innovations (top left) and standard deviation of the innovations (top right) in log chlorophyll a [log₁₀(mg·m⁻³)] as a function of bathymetry and month. The bathymetry bins (bottom) have been chosen so that they cover roughly equal numbers of grid points; the white contour marks the continental shelf, defined by the 250-m bathymetry.

σ_b^{new,bias}(month, bathymetry) = σ_b^{new}(month, bathymetry) + |E[d(month, bathymetry)]|.  (15)

2. For each month and bathymetry bin, a perturbation to the currently used background-error standard deviations is found based on the mean value of the currently used background-error standard deviations for each bin:

Δσ_b(month, bathymetry) = σ_b^{new}(month, bathymetry) − σ̄_b^{curr}(month, bathymetry).  (16)

This is repeated for the σ_b^{new,bias} estimate. Figure 2 shows how different the DBCP05 estimates of the …

F I G U R E 2 Comparison of new estimates of the background-error standard deviations for log₁₀ chlorophyll [log₁₀(mg·m⁻³)], shown in cyan (not accounting for innovation bias) and red (accounting for innovation bias), with the average currently used background-error standard deviations (black), as a function of month for the two bathymetry bins (on-shelf: solid; off-shelf: dashed).

1477870x, 2023, 750, Downloaded from https://rmets.onlinelibrary.wiley.com/doi/10.1002/qj.4408 by University of Reading, Wiley Online Library on [08/02/2023]. See the Terms and Conditions (https://onlinelibrary.wiley.com/terms-and-conditions) on Wiley Online Library for rules of use; OA articles are governed by the applicable Creative Commons License.
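The bias-inflation step of Equations (15) and (16) can be sketched directly; the per-bin innovation values and standard deviations below are illustrative numbers, not results from the article:

```python
import numpy as np

def inflate_for_bias(sig_b_new, innovations):
    """Equation (15): inflate the DBCP05-diagnosed background-error standard
    deviation for one (month, bathymetry) bin by the magnitude of the mean
    innovation, folding the untreated model bias into the variance."""
    return sig_b_new + abs(np.mean(innovations))

def perturbation(sig_b_new, mean_sig_b_curr):
    """Equation (16): perturbation of the new estimate relative to the bin
    mean of the currently used background-error standard deviations."""
    return sig_b_new - mean_sig_b_curr

# Illustrative values for one (month, bathymetry) bin:
d = np.array([0.12, -0.05, 0.23, 0.10])   # innovations in the bin
sig_bias = inflate_for_bias(0.20, d)      # 0.20 + |mean(d)| = 0.30
delta = perturbation(sig_bias, 0.25)      # 0.30 - 0.25 = 0.05
```

The same perturbation calculation is applied to both the bias-free and bias-inflated estimates, so the two sets of standard deviations can be compared against the currently used values on a common footing.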

F I G U R E 3 Comparison of new estimates of the background-error standard deviations for log₁₀ chlorophyll a [log₁₀(mg·m⁻³)], shown in the middle row (not accounting for innovation bias) and bottom row (accounting for innovation bias), with the currently used background-error standard deviations (top row), as a function of latitude and longitude. From left to right: estimates for March–August.

F I G U R E 4 Comparison of raw output from the DBCP05 diagnostics of background-error correlations in log₁₀ total chlorophyll a for each month and bathymetry bin (on-shelf: solid; off-shelf: dashed), shown in cyan (not accounting for innovation bias) and red (accounting for innovation bias). The black lines are the currently used correlations, which do not vary with bathymetry or month.

F I G U R E 5 Comparison of new estimates of ocean-colour-derived log₁₀ chlorophyll a error standard deviations [log₁₀(mg·m⁻³)] as a function of OWT for each month (coloured lines; see legend) with the currently used values provided by OC-CCI (black dashed line).

F I G U R E 6 Comparison of monthly composites of the new estimates of ocean-colour-derived log₁₀ chlorophyll a error standard deviations [log₁₀(mg·m⁻³)] (bottom row) with the currently used values (top row). Left to right: estimates for March–August.

TA B L E 2 Description of the variances and correlations used in the trials with the new uncertainty estimates (Section 5). Currently used: function of month and grid point; w = 0.5, L₁ based on the Rossby radius, L₂ = 111 km. New-nobias: diag(R_new) estimated using the DBCP05 diagnostic, Equation (7), as a function of OWT and month, with no observation-error correlations; diag((HBHᵀ)_new) estimated using the DBCP05 diagnostic, Equation (8), as a function of month and grid point.

F I G U R E 7 Time series of the analysis of median chlorophyll a [mg·m⁻³] for the different experiments (see legend) compared with the OC-CCI observations (black line), for (left to right) the full model domain, on-shelf, and off-shelf, smoothed with 15-day running averages.

F I G U R E 8 As in Figure 7, but separated out into the different on-shelf regions from April to July.

F I G U R E 9 … chlorophyll a [mg·m⁻³]. (a) Chlorophyll a observed by glider, (b) bias in the control, (c-e) change in absolute bias for the three experiments (positive indicates an increase in bias; negative a reduction in bias).

F I G U R E 10 As in Figure 9, but for glider observations of oxygen [mmol·m⁻³].

… surprisingly, the error variances are close to convergence, especially when compared with the original values used (black line).

F I G U R E 11 On-shelf bias versus RMSD of forecast error, validated against ocean-colour data. Each symbol represents a forecast of different length in integer days; the skill of the analysis is indicated by the cross and the skill at day 6 by the pentagon. Note the different scales on each axis; the vertical dashed line indicates zero bias and the grey box indicates the same value ranges in each subplot.

F I G U R E 13 Estimates of the observation-error standard deviations averaged over the year as a function of OWT for log₁₀ chlorophyll a [log₁₀(mg·m⁻³)] on the shelf. We compare the original values used (black line, the same as the black dashed line in Figure 5), estimates derived from the first iteration of the DBCP05 diagnostic (black dashed line, the same as the black line in Figure 5), and estimates derived from the second iteration of the DBCP05 diagnostic for the three different experiments.