Uncertainty Quantification of Ocean Parameterizations: Application to the K‐Profile‐Parameterization for Penetrative Convection

Parameterizations of unresolved turbulent processes often compromise the fidelity of large‐scale ocean models. In this work, we argue for a Bayesian approach to the refinement and evaluation of turbulence parameterizations. Using an ensemble of large eddy simulations of turbulent penetrative convection in the surface boundary layer, we demonstrate the method by estimating the uncertainty of parameters in the convective limit of the popular “K‐Profile Parameterization.” We uncover structural deficiencies and propose an alternative scaling that overcomes them.


Introduction
Earth System Models (ESMs) require parameterizations for processes that are too small to resolve. Uncertainties arise both from deficiencies in the scaling laws encoded in the parameterizations and from nonlinear interactions with resolved model components, sometimes leading to unanticipated and unphysical results. The first challenge can be addressed by improving the representation of the unresolved physics, while the second requires "tuning" of the parameterizations when implemented in the full ESM (e.g., Hourdin et al., 2017). In this paper, we illustrate how to leverage recent advances in computation and uncertainty quantification to make progress toward the first challenge. Our focus will be on oceanic processes, but the approach can be applied to any ESM parameterization, provided that a high-resolution submodel can be constructed.
The traditional approach to the formulation of parameterizations of subgrid-scale processes is to derive scaling laws that relate the net effect of such processes to variables resolved by the ESMs. These scaling laws are then tested with either field observations (e.g., Large et al., 1994; Price et al., 1986), laboratory experiments (e.g., Cenedese et al., 2004; Deardorff et al., 1980), or results from high-resolution simulations (e.g., Harcourt, 2015; Li & Fox-Kemper, 2017; Reichl et al., 2016; Wang et al., 1996). Rarely are parameterizations tested over a wide range of possible scenarios, due to the logistical difficulty and high cost of running many field experiments, the time necessary to change laboratory setups, and computational demand. The computational limitations have become much less severe over the last few years through a combination of new computer architectures such as Graphics Processing Units (GPUs; Besard, Churavy, et al., 2019), new languages that take advantage of these architectures (e.g., Julia; Bezanson et al., 2017), and improved large eddy simulation (LES) algorithms (Sullivan & Patton, 2011; Verstappen, 2018). Modern computational resources have opened up the possibility of running libraries of LES simulations to explore a vast range of possible scenarios. This paper discusses how such computational advances can be applied to assess parameterizations in ocean models.
The Bayesian approach can also be used to test the functional dependence of the parameterization on large-scale variables. One estimates the posterior distribution on subsets of the LES simulations run for different scenarios. If the posterior probabilities for the different scenarios do not overlap, the functional form of the parameterization must be rejected. We will illustrate how this strategy can be used to improve the formulation of a parameterization.
Bayesian methods are particularly suited to constrain ESM parameterizations of subgrid-scale ocean processes. Most of these processes, such as boundary layer or geostrophic turbulence, are governed by well-understood fluid dynamics and thermodynamics. Thus, LES simulations provide credible solutions for the physics. The atmospheric problem is quite different: leading-order subgrid-scale processes, such as cloud microphysics, are governed by poorly understood physics that may not be captured by LES simulations.
In this paper, we will apply Bayesian methods to constrain and improve a parameterization for the surface boundary layer turbulence that develops when air-sea fluxes cool the ocean. LES simulations that resolve all the relevant physics will be used as ground truth to train the parameterization. Our paper is organized as follows: In section 2 we describe the physical setup and the LES model. In section 3 we introduce Bayesian parameter estimation for the parameters in the K-Profile Parameterization (KPP). We then perform the parameter estimation in the regime described by section 2 and show how the Bayesian approach provides insight on how to improve the KPP parameterization. Finally, we end with a discussion in section 4.

Surface cooling drives convection, and turbulent entrainment at the interface between the mixed layer and the stratification below further cools the boundary layer. This process, in which the layer is cooled both at the surface and by turbulent mixing from the entrainment layer below, is called penetrative convection. Here we evaluate the ability of the K-Profile Parameterization (Large et al., 1994) to capture penetrative convection by comparing its predictions against large eddy simulations (LES) of idealized penetrative convection into a resting stratified fluid. This comparison provides the context in which we outline the Bayesian approach to parameter estimation that we advocate.

Penetrative Convection Into a Resting Stratified Fluid
We impose a constant surface cooling Q_h > 0 on a resting, linearly stratified boundary layer with the initial state u|_{t=0} = 0 and

b|_{t=0} = 20αg + N²z + 𝒩(0, 10⁻¹⁰ g) exp(4z∕L_z),    (1)

where z ∈ [−L_z, 0], u = (u, v, w) is the resolved velocity field simulated by LES, b is buoyancy, N² is the initial vertical buoyancy gradient, and 𝒩(0, 10⁻¹⁰ g) is a Gaussian white noise process added to induce a transition to turbulence. The surface buoyancy flux Q_b is related to the imposed surface cooling Q_h, which has units W m⁻², via

Q_b = αg Q_h ∕ (ρ_ref c_p),    (2)

where α = 2 × 10⁻⁴ (°C)⁻¹ is the thermal expansion coefficient (assumed constant), g = 9.81 m s⁻² is gravitational acceleration, ρ_ref = 1035 kg m⁻³ is a reference density, and c_p = 3993 J∕(kg °C) is the specific heat capacity. Our software and formulation of the large eddy simulation model are discussed in Appendix A.
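The setup above can be sketched in a few lines of code. This is an illustrative sketch only, not the paper's LES code: the function names are ours, and we read 𝒩(0, 10⁻¹⁰ g) as a zero-mean Gaussian with variance 10⁻¹⁰ g.

```python
import math
import random

# Physical constants quoted in the text.
ALPHA = 2e-4      # thermal expansion coefficient, 1/degC
G = 9.81          # gravitational acceleration, m/s^2
RHO_REF = 1035.0  # reference density, kg/m^3
C_P = 3993.0      # specific heat capacity, J/(kg degC)

def surface_buoyancy_flux(Q_h):
    """Convert a surface cooling Q_h (W/m^2) into a buoyancy flux Q_b (m^2/s^3)."""
    return ALPHA * G * Q_h / (RHO_REF * C_P)

def initial_buoyancy(z, N2, Lz, noise_variance=1e-10 * G):
    """Initial buoyancy: linear stratification plus surface-trapped Gaussian noise."""
    noise = random.gauss(0.0, math.sqrt(noise_variance)) * math.exp(4.0 * z / Lz)
    return 20.0 * ALPHA * G + N2 * z + noise
```

For the 100 W m⁻² cooling used throughout the paper, this gives Q_b ≈ 4.7 × 10⁻⁸ m² s⁻³.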
Results from a large eddy simulation of turbulent penetrative convection in a domain L_x = L_y = L_z = 100 m with 256 × 256 × 512 grid cells are presented in Figure 1. The resulting horizontally averaged temperature profiles are not affected by the domain size. The left panel shows the three-dimensional temperature field θ = θ₀ + b∕(αg) associated with the buoyancy b, where θ₀ = 20 °C is the surface temperature at z = 0. The right panel shows the horizontally averaged buoyancy profile

b̄(z, t) = (L_x L_y)⁻¹ ∫∫ b dx dy.    (3)

Figure 2. Boundary layer depth and its evolution in time after initial transients. The blue squares are the analytic scaling of Equation 4, the red line is an estimate of the boundary layer depth directly from the LES (described in the text), and the purple line is the classic scaling of Equation 8, which ignores the entrainment layer.
The visualization reveals the two-part boundary layer produced by penetrative convection: close to the surface, cold and dense convective plumes organized by surface cooling sink and mix ambient fluid, producing a well-mixed layer that deepens in time. Below the mixed layer, the momentum carried by sinking convective plumes leads them to overshoot their level of neutral buoyancy (nominally, the depth of the mixed layer), "penetrating" the stably stratified region below the surface mixed layer and generating the strongly stratified entrainment layer. The total depth of the boundary layer is h and includes the mixed layer and the entrainment layer of thickness Δh. Turbulent fluxes are assumed negligible below z = −h.
In Figure 2, we show the evolution of h(t), defined as the first depth from the bottom where the stratification equals a weighted average of the maximum stratification and the initial stratification. (The weights are 2∕3 for the initial stratification N² and 1∕3 for the maximum stratification N²_m, so that h satisfies ∂_z b̄(−h) = 2N²∕3 + N²_m∕3. This guarantees that h is a depth where the local stratification lies between the background stratification and the maximum stratification, since it is defined as the first depth, starting from the bottom, that satisfies this criterion.) The dotted line confirms that the evolution after an initial transient is best fit by the formula

h(t) = √( c Q_b t ∕ N² ),    (4)

where N² is the initial stratification and the numerical factor c ≈ 3.0 is a best-fit parameter.
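The weighted-threshold diagnostic for h can be written as a short scan over a stratification profile. This is a sketch under our reading of the criterion (the function name and grid conventions are ours, not the paper's code): starting from the bottom, return the first depth where the local stratification reaches 2N²∕3 + N²_m∕3.

```python
def boundary_layer_depth(z, N2_profile, N2_background):
    """Estimate the boundary layer depth h from a stratification profile.

    z: depths ordered from the bottom (most negative) to the surface (0).
    N2_profile: local stratification dB/dz at those depths.
    Returns the first depth (scanning upward from the bottom) where the local
    stratification reaches the weighted threshold 2/3*N2_background + 1/3*N2_max.
    """
    N2_max = max(N2_profile)
    threshold = (2.0 * N2_background + N2_max) / 3.0
    for zi, n2 in zip(z, N2_profile):
        if n2 >= threshold:
            return -zi
    return 0.0  # no crossing found: boundary layer has not formed
```

Because the background stratification lies below the threshold while the entrainment-layer maximum lies above it, the scan necessarily stops inside the entrainment layer, as the text requires.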
Equation 4 is easily understood through dimensional considerations (up to prefactors), but more information flows from an analysis of the horizontally averaged buoyancy equation,

∂_t b̄ = −∂_z ( wb + q ),    (5)

where b̄ is the horizontally averaged buoyancy, wb is the horizontally averaged vertical advective flux, and q(z) is the horizontally averaged vertical diffusive flux. Integrating the equation in time between t′ = 0 and some later time t′ = t, and in the vertical between the surface, where q(z) = −Q_b, and the base of the entrainment layer, where all turbulent fluxes vanish, one finds

∫_{−h}^{0} [ b̄(z, t) − b̄(z, 0) ] dz = −Q_b t.    (6)

Substituting b̄(z, 0) = b₀ + N²(z + h) and b̄(z, t) = b₀ + Δb, an approximation of the profile shown in Figure 1b except at very early times in the simulation, yields

N²h²∕2 − Δb h = Q_b t.    (7)

The first term on the left of Equation 7 describes boundary layer deepening due to buoyancy loss at the surface, while the second term corresponds to the further cooling caused by turbulent mixing in the entrainment layer. Other authors have arrived at a similar expression for the boundary layer depth upon taking into account turbulent entrainment; see, for example, Appendix F in Van Roekel et al. (2018).
Ignoring turbulent mixing in the entrainment layer, that is, setting Δb = 0, yields the deepening rate

h(t) = √( 2 Q_b t ∕ N² ),    (8)

which differs by roughly 20% from the best-fit expression in Equation 4 due to the effects of turbulent mixing in the entrainment layer. Equation 8 is the deepening rate associated with a convective adjustment parameterization and is known as the empirical law of free convection. We now review how these processes are represented in the KPP model.
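The two scalings are easy to compare numerically. In this sketch (functions ours), we take the best-fit prefactor to be c ≈ 3, consistent with the ~20% depth difference quoted above, since √(3∕2) ≈ 1.22.

```python
import math

def h_classic(Qb, N2, t):
    """Free-convection depth with no entrainment layer (Equation 8)."""
    return math.sqrt(2.0 * Qb * t / N2)

def h_entraining(Qb, N2, t, c=3.0):
    """Best-fit scaling including entrainment; c is the fitted prefactor (assumed ~3)."""
    return math.sqrt(c * Qb * t / N2)
```

With Q_b ≈ 4.75 × 10⁻⁸ m² s⁻³ and N² = 2 × 10⁻⁵ s⁻², after 8 days the classic scaling gives h ≈ 57 m while the entraining fit gives h ≈ 70 m, close to the final boundary layer depth of the simulations.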

The K-Profile Parameterization of Penetrative Convection
In penetrative convection in a horizontally periodic domain, the K-Profile Parameterization models the horizontally averaged temperature profile θ̄(z, t) with the coupled equations

∂_t T = −∂_z F(T, h; C),    (9)
ℛ(T, h; C) = 0,    (10)

where T(z, t) is the modeled temperature meant to approximate θ̄(z, t), h(t) is the boundary layer depth, C = {C_S, C_N, C_D, C_H} is a set of free parameters, F(T, h; C) is the parameterized temperature flux, and ℛ(T, h; C) is a nonlinear constraint that determines the boundary layer depth at each time t. Our formulation, which isolates the four free parameters {C_S, C_N, C_D, C_H}, is superficially different from, but mathematically equivalent to, the formulation in Large et al. (1994) (see Appendix C for details). Finally, we emphasize that the K-Profile Parameterization is deemed successful only if it accurately models the evolution of the entire observed temperature profile θ̄(z, t), rather than, say, only the boundary layer depth or the buoyancy jump across the base of the mixed layer.
The K-Profile Parameterization (KPP) represents F through the sum of a downgradient flux and a non-local flux term (Large et al., 1994),

F = −K ∂_z T + Φ    (11)

for −h ≤ z ≤ 0 and F = 0 otherwise, where σ = min{C_S, z∕h}. The diffusivity K follows the "K-profile" shape function of σ (K is the namesake downgradient diffusivity of the K-Profile Parameterization, with magnitude set by C_D), and Φ is a "non-local" flux term, with magnitude set by C_N, that models convective boundary layer fluxes not described by downgradient diffusion.
The KPP model estimates the boundary layer depth h using the nonlinear constraint (10). The boundary layer geometry introduced in the right panel of Figure 1 motivates the form of the nonlinear constraint. The jump in buoyancy, Δb, is the difference between the buoyancy in the mixed layer and at the base of the entrainment region. The buoyancy jump may thus be written in terms of the entrainment region thickness, Δh, and the entrainment region buoyancy gradient, N²_e, as Δb = N²_e Δh. Using the plume theory outlined in Appendix B to motivate the scaling Δh ∝ w⋆∕N_e, where w⋆ = (Q_b h)^{1∕3} is the convective velocity scale, we thus find

Δb = C̃_H w⋆ N_e    (12)

for some universal proportionality constant C̃_H. KPP posits that the boundary layer depth h is the first such depth from the surface at which Equation 12 holds.

Journal of Advances in Modeling Earth Systems
10.1029/2020MS002108

Large et al. (1994) estimate the mixed layer buoyancy with an average over the "surface layer," (C_S h)⁻¹ ∫_{−C_S h}^{0} B(z) dz, where B = αgT and 0 < C_S < 1 is a free parameter that defines the fractional depth of the surface layer relative to the total boundary layer depth, h. The buoyancy jump becomes, therefore,

Δb = (C_S h)⁻¹ ∫_{−C_S h}^{0} B(z) dz − B(−h).    (13)

Large et al. (1994) then express the stratification in the entrainment region, N_e, in terms of the stratification at the base of the boundary layer, N_h, such that

N_e ∝ N_h.    (14)

The scaling in Equation 14 introduces a new free parameter in addition to C̃_H; however, because this free parameter is not independent from C̃_H, we combine the two into a new free parameter C_H, which we call the "mixing depth parameter." To prevent division by zero, the small dimensional constant 10⁻¹¹ m² s⁻² is added to the denominator of Equation 12 (Griffies et al., 2015). Combining Equations 12, 13, and 14, we can write

C_H = Δb ∕ (w⋆ N_h).    (15)

Equation 15 is the implicit nonlinear constraint in Equation 10 that determines the boundary layer depth, h. In Appendix B we discuss the physical content of Equation 15 for the case of penetrative convection.
The boundary layer depth criterion in Equation 15 is often referred to as the bulk Richardson number criterion, because in mechanically forced turbulence the denominator is replaced by an estimate of the mean shear squared, and C_H becomes a critical bulk Richardson number (Large et al., 1994). In penetrative convection there is no mean shear, and C_H is not a Richardson number. See Appendix C for more details.
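The depth criterion can be sketched as a downward scan over a buoyancy profile. This is a rough illustration under stated assumptions, not the paper's implementation: we take the convective-limit criterion to compare Δb∕(w⋆ N_h) against C_H, use a simple surface-layer average for the mixed layer buoyancy, a one-sided difference for N_h, and a small regularizer in the denominator.

```python
import math

def kpp_boundary_layer_depth(z, B, Qb, C_S=0.1, C_H=0.96, eps=1e-11):
    """Return the first depth d (scanning downward) where Delta_b/(w_star*N_h) >= C_H.

    z: depths ordered surface (0) to bottom (negative), uniform spacing assumed.
    B: horizontally averaged buoyancy at those depths.
    """
    for i in range(2, len(z)):
        d = -z[i]
        # Mean buoyancy over the surface layer [-C_S*d, 0].
        surface = [b for zi, b in zip(z, B) if zi >= -C_S * d]
        delta_b = sum(surface) / len(surface) - B[i]
        w_star = (Qb * d) ** (1.0 / 3.0)          # convective velocity scale
        dBdz = (B[i - 1] - B[i]) / (z[i - 1] - z[i])
        N_h = math.sqrt(max(dBdz, 0.0))           # local stratification at depth d
        if delta_b / (w_star * N_h + eps) >= C_H:
            return d
    return -z[-1]
```

Applied to an idealized profile that is well mixed down to 60 m with linear stratification below, the scan stops a few meters inside the stratified region, as expected for an entrainment-layer criterion.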
The representation of penetrative convection in KPP has four free parameters: the surface layer fraction C_S, the flux scalings C_N and C_D in Equation 11, and the mixing depth parameter C_H in Equation 15. Ranges for their default values are reported in Large et al. (1994). We choose reference parameters within those ranges as

(C_S, C_N, C_D, C_H) = (0.1, 6.33, 1.36, 0.96).    (16)
These parameters are not the original set of independent parameters proposed by Large et al. (1994), but rather algebraic combinations thereof. Nevertheless, we emphasize that our formulation is mathematically identical to that proposed by Large et al. (1994). The mapping between the current set of parameters and the original ones is one-to-one; hence, no information is lost in transforming from the current set of parameters to the original ones; see Appendix C for details. With regard to the numerical implementation, we do not use the enhanced diffusivity described in the appendices of Large et al. (1994). Our objective is to calibrate the four free parameters against LES solutions, as described in the next section.

Model Calibration Against LES Solutions
We outline a Bayesian method for optimizing and estimating the uncertainty of the four free parameters through a comparison of the parameterization solution T(z, t; C) with the output θ̄(z, t) of the LES simulations. First we introduce a loss function to quantify the parameterization-LES difference,

ℒ(C) = max_{t ∈ [t₁, t₂]} ∫_{−L_z}^{0} [ T(z, t; C) − θ̄(z, t) ]² dz.    (17)

We choose the squared error in space to reduce the sensitivity to vertical fluctuations in the temperature profile. We take the maximum value of the squared error in time for t ∈ [t₁, t₂] to guarantee that the temperature profile never deviates too far from the LES simulation at any instant in time. The parameterization is taken to be the KPP model given by Equations 9 through 15, and the data are the horizontally averaged LES output. The initial time t₁ is chosen after the initial transition to turbulence of the LES simulations.
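The loss, a maximum over time of a depth-integrated squared error, is straightforward to implement on discrete profiles. A minimal sketch (data layout and function name are ours): profiles are given on a uniform grid of spacing dz, indexed by time.

```python
def loss(T_model, T_les, dz, times):
    """Max-over-time of the depth-integrated squared error between two
    sets of temperature profiles (a discrete analogue of the loss above).

    T_model, T_les: mappings time -> list of temperatures at matching depths.
    """
    def squared_error(t):
        return sum((a - b) ** 2 for a, b in zip(T_model[t], T_les[t])) * dz
    return max(squared_error(t) for t in times)
```

Taking the maximum (rather than, say, a time average) means a parameter choice is penalized by its single worst instant, which is what guarantees the profile "never deviates too far" from the LES.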
A natural way to extend the concept of loss functions to account for parameter uncertainty is to introduce a likelihood function for the parameters. Just as the form of the loss function is critical to the estimation of optimal parameters, the form of the likelihood function is critical for estimating the parameter uncertainties. The likelihood function quantifies what we mean by "good" or "bad" parameter choices, and the Bayesian method uses this information to estimate parameter uncertainties. These estimates are only as good as the choice of likelihood function, much like optimal parameters are only as good as the choice of loss function. We introduce the likelihood function as the probability that parameter values explain the data, P(data|C), as

P(data|C) ∝ exp( −ℒ(C) ∕ σ₀ ),

where ℒ(C) is the loss function, which depends both on the data and on the parameters C, and σ₀ > 0 is a hyperparameter associated with the likelihood function, as opposed to a parameter in the parameterization. The posterior distribution, P(C|data), is then given by Bayes' formula,

P(C|data) ∝ P(data|C) P(C),

where P(C) is the prior distribution. In terms of probability densities, letting P(C) ∝ ρ₀(C) and P(C|data) ∝ ρ(C) denote our prior and posterior distributions for the parameters C (the proportionality sign is introduced because Bayes' formula applies to probabilities, while ρ₀(C) is a probability density function), Bayes' formula becomes

ρ(C) ∝ exp( −ℒ(C) ∕ σ₀ ) ρ₀(C).    (18)

In our context Bayes' formula updates prior guesses about KPP parameter values and yields a posterior distribution based on the LES data.
We choose the hyperparameter σ₀ as the minimum of the loss function ℒ(C). The minimum is found using a modified simulated annealing procedure (in simulated annealing one finds the minimum of the loss function by decreasing σ₀ to zero as one explores the parameter space through a random walk; here we instead update σ₀ to the new local minimum every time the random walk stumbles on a set of parameters for which ℒ(C) < σ₀) (Kirkpatrick et al., 1983). Once the parameter values C* that minimize the loss function have been found, that is, σ₀ = ℒ(C*), the likelihood of any other parameter choice C₁ is given by

P(data|C₁) ∝ exp( −ℒ(C₁) ∕ σ₀ ).

For example, if the choice C₁ increases the minimum of the loss function by a factor of two, that is, ℒ(C₁) = 2σ₀, then it is 1∕e less likely. The probability distribution ρ(C) is then sampled with a Random Walk Markov Chain Monte Carlo (RW-MCMC) algorithm (Metropolis et al., 1953), described further in Appendix E.
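A Random Walk Metropolis sampler for a density proportional to exp(−loss∕σ₀) times a uniform prior fits in a few lines. This is a minimal sketch, not the implementation of Appendix E: proposal scales, bounds handling, and names are ours.

```python
import math
import random

def rw_mcmc(loss_fn, C0, step, n_iter, s0, bounds):
    """Random Walk Metropolis sampling of the density ~ exp(-loss_fn(C)/s0)
    with a uniform prior on `bounds` (proposals outside the bounds are rejected)."""
    C = list(C0)
    L = loss_fn(C)
    samples = []
    for _ in range(n_iter):
        proposal = [c + random.gauss(0.0, s) for c, s in zip(C, step)]
        if all(lo <= p <= hi for p, (lo, hi) in zip(proposal, bounds)):
            Lp = loss_fn(proposal)
            log_alpha = (L - Lp) / s0  # Metropolis log-acceptance ratio
            if log_alpha >= 0 or random.random() < math.exp(log_alpha):
                C, L = proposal, Lp
        samples.append(list(C))
    return samples
```

For a quadratic loss the sampler recovers a Gaussian centered on the minimizer, with spread controlled by σ₀, which is exactly the role the hyperparameter plays in the text.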
To illustrate our choices, as well as the RW-MCMC algorithm, we show a typical output from an RW-MCMC algorithm for a 2-D probability distribution in Figure 3. We use the probability density function for the KPP parameterization presented in the next section, but keep two of the four parameters fixed (C_D and C_H) to reduce the problem from four to two parameters (C_N and C_S). The prior distributions for C_N and C_S are uniform over the ranges reported at the end of this section. The parameters C_D and C_H are set to the values that minimize the loss function. We show results for two arbitrary values of σ₀ for illustrative purposes. Starting from a poor initial guess, the RW-MCMC search proceeds toward regions of higher probability (lower loss function) by randomly choosing which direction to go. Once a region of high probability is found, in this case parameter values in the "blue" region, the parameters hover around the minimum of the loss function, as suggested by the high values of the likelihood function. The orange hexagons represent the process of randomly walking toward the minimum of the loss function and correspond to the "burn-in" period. The burn-in period is often thrown away when calculating statistics since it corresponds to an initial transient before the RW-MCMC settles around the minimum of the loss function. We see that the choice of σ₀ does not change the overall structure of the probability distribution but does affect how far from the optimal parameters the random walk is allowed to drift.

Figure 3. An example of an RW-MCMC search trajectory based on a sample probability distribution for KPP parameters using 10⁵ RW-MCMC iterations. The trajectory starts from a region of very low probability (white areas) and moves toward progressively higher probabilities (the darker the blue shading, the higher the probability). The blue probability distributions on the left side and the top are the corresponding marginal distributions of the two sampled parameters. The green star is the best known optimum of the probability distribution (i.e., its mode). The value of ℒ(C*) is the value of the loss function at the green star.
Parameterizations such as KPP exhibit a dependence on resolution in addition to their nondimensional parameters. Here we perform all calculations with a vertical resolution Δz = 6.25 m and timestep Δt = 10 min, representative of those used in state-of-the-art ESMs. We do not use enhanced diffusivity as in Large et al. (1994) for this resolution. The parameterization is relatively insensitive to halving Δz and Δt for a fixed set of parameters, but the results are sensitive to doubling either one. Thus, the optimal parameter values and their uncertainties are only appropriate for the resolution used for the calibration and would need to be updated, especially if the parameterization were run at a coarser resolution. This dependence on resolution could be handled within the Bayesian method by introducing Δz and Δt as additional parameters in the probability distribution, but we do not pursue this approach.
The temporal window used to compute the loss function extends from t₁ = 0.25 days (so as to eliminate initial transients in the LES) to the final simulation day, chosen to be when h ≈ 70 m. We apply the Bayesian parameter estimation procedure to KPP using data from one LES simulation in section 3.1 and from multiple LES simulations with different initial stratifications in section 3.2. We use uniform prior distributions for the KPP parameters over finite ranges: the surface layer fraction C_S, being a fraction, must stay between zero and one, while the other parameter limits are chosen to span the whole range of physically plausible values around the reference values given in Equation 16. The choice of uniform distributions is made to avoid favoring any particular value at the outset.

Calibration of KPP Parameters From One LES Simulation
In this section we apply the Bayesian calibration method to the LES simulation of penetrative convection described in section 2.1 and quantify uncertainties in the KPP parameters introduced in section 2.2. The horizontal averages from the LES simulation are compared with predictions from solutions of the KPP boundary layer scheme, Equations 9 and 10. The boundary and initial conditions for KPP are taken to be the same as those for the LES simulation, that is, 100 W/m² cooling at the top, ∂_z T = 0.01 °C m⁻¹ at the bottom, and an initial temperature profile with the same linear stratification as the LES. To estimate the full probability distribution function, we use the RW-MCMC algorithm with 10⁶ iterations to sample the probability distributions of the four KPP parameters (C_S, C_N, C_D, C_H). The large number of forward runs is possible because the forward model consists of a one-dimensional equation, namely, KPP in single-column mode. The Markov chain yields roughly 10⁴ statistically independent samples, as estimated using an autocorrelation length; see Sokal (1997). The RW-MCMC algorithm generates the entire four-dimensional PDF, Equation 18.
The parameter probability distribution can be used to choose an optimal set of KPP parameters. Of the many possible choices, we pick the most probable value of the four-dimensional probability distribution, the mode, because it minimizes the loss function; see Appendix D for the detailed calculation. In Figure 4a we show the horizontally averaged temperature profile from the LES simulation (continuous line) and the temperature profiles obtained by running the KPP parameterization with reference and optimal parameters (squares and dots) at t = 8 days. The optimized temperature profiles are closer to the LES simulation than the reference profiles, especially in the entrainment region. Figure 4b confirms that the square root of the instantaneous loss function, the error, grows much faster with the reference parameters. The oscillations in the error are a consequence of the coarseness of the KPP model: only one grid point is being entrained at any given moment.
The improvement in boundary layer depth through optimization of the parameters is about 10%, or 10 m over 8 days. As discussed in section 2.1, the rate of deepening can be predicted analytically within 20% by simply integrating the buoyancy budget over time and depth and assuming that the boundary layer is well mixed everywhere, that is, ignoring the development of enhanced stratification within an entrainment layer at the base of the mixed layer. KPP improves on this prediction by including a parameterization for the entrainment layer. The reference KPP parameters contribute a 10% improvement on the no entrainment layer prediction, and the optimized parameters contribute another 10%. While these may seem like modest improvements, they can prevent large biases for the boundary layer depth when integrated over a few months of cooling in winter rather than just 8 days. We will return to this point in the next section when we discuss structural deficiencies in the KPP formulation.
To visualize the probability distribution we focus on the 2-D marginal distributions, for example,

ρ(C_S, C_N) = ∫∫ ρ(C) dC_D dC_H,

along with the other five possible pairings, as well as the 1-D marginal distributions, such as

ρ(C_S) = ∫∫∫ ρ(C) dC_N dC_D dC_H,

and similarly for the other three parameters.
The marginal distribution can intuitively be thought of as the total probability of a parameter (or pair of parameters) while taking into account the total uncertainty of the other parameters. Furthermore, the marginal distribution accounts for potential compensating effects that different parameters may have on one another. The marginal distribution does not capture the effect of individually varying a parameter while keeping all the other parameters fixed at particular values (unless the other parameters have essentially delta-function 1-D marginal distributions); that effect is represented by a conditional distribution.
Constructing the marginal distributions only requires building histograms of the trajectories generated by the RW-MCMC algorithm. The 2-D marginal distributions are visualized with heatmaps in Figure 5, and the 1-D marginal distributions of the corresponding parameters are shown along the outermost edges. For the 2-D marginal distributions, dark blue corresponds to regions of high probability and light blue to regions of low probability; the white space corresponds to regions that the RW-MCMC algorithm never visited. The 2-D marginal distributions show that parameters must be changed in tandem with one another to produce similar model output. Furthermore, their structure is distinctly non-Gaussian.
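Because marginalization over MCMC samples amounts to simply ignoring the other coordinates, a 1-D marginal histogram is a one-liner over the chain. A sketch with our own naming and simple fixed-width, positive-value binning:

```python
from collections import Counter

def marginal_1d(samples, index, bin_width):
    """Histogram-based 1-D marginal of one parameter from MCMC samples.

    Integrating out the other parameters is achieved by ignoring them:
    each sample contributes only its `index`-th coordinate to the histogram.
    Assumes nonnegative parameter values (as for the KPP parameters here).
    """
    counts = Counter(int(s[index] / bin_width) for s in samples)
    total = sum(counts.values())
    return {b * bin_width: c / total for b, c in sorted(counts.items())}
```

The 2-D marginals used in the heatmaps follow the same pattern with a pair of indices as the histogram key.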
The 1-D marginal distribution of the mixing depth parameter C_H (the bottom left rectangular panel) is much more compact than those of the other three parameters, suggesting that it is the most sensitive parameter. The mixing depth parameter's importance stems from its control over both the buoyancy jump across the entrainment layer and the rate of deepening of the boundary layer. (Again, it may be useful to remember that C_H is often referred to as the bulk Richardson number in the KPP literature, even though it takes on a different meaning in convective simulations; see Appendix C.) The parameters C_D and C_N set the magnitudes of the local and nonlocal fluxes. Results are not sensitive to their specific values, as long as they are large enough to maintain a well-mixed layer. The marginal distribution of the surface layer fraction C_S is peaked at lower values, but results are less sensitive to its variations than to those of C_D or C_H.
The uncertainties of the parameters can be used to infer the uncertainties of the temperature profile predicted by KPP at each depth and time. To do this, we subsample the 10⁶ parameter values down to 10⁴ and evolve KPP forward in time for each set of parameter choices. We construct histograms for the temperature field at the final time for each location in space individually. We then stack these histograms to create a visual representation of the model uncertainty. This uncertainty quantifies the sensitivity of the parameterization to parameter perturbations as defined by the parameter distributions. The histogram of temperature profiles at time t = 8 days, as calculated from both our prior distribution (uniform distribution) and the posterior distribution (as obtained from the RW-MCMC algorithm), is visualized in Figure 6. We see that there is a reduction of the uncertainty in the temperature profile upon taking into account information gained from the LES simulation. The salient features of the posterior temperature uncertainty are as follows:
1. 0-10 m depth: There is some uncertainty associated with the vertical profile of temperature close to the surface.
2. 20-60 m depth: The mean profile of temperature in the mixed layer is very well predicted by KPP.
3. 60-70 m depth: The entrainment region contains the largest uncertainties.
4. 70-100 m depth: There is virtually no uncertainty. The stratified region below the boundary layer does not change from its initial state.
Now that we have applied the Bayesian methodology to one LES simulation and explored its implications, we are ready to apply the method to multiple LES simulations covering different regimes in the following section.
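The subsample-and-forward-run procedure can be sketched generically: draw parameter sets from the chain, run the forward model for each, and summarize the resulting profiles depth by depth. Names and the quantile summary are ours (the paper stacks histograms; quantiles are an equivalent compact summary).

```python
import random

def predictive_quantiles(samples, forward_model, n_sub, quantiles=(0.05, 0.5, 0.95)):
    """Propagate parameter uncertainty through a forward model.

    samples: list of parameter vectors (e.g., an MCMC chain).
    forward_model: parameters -> temperature profile (list over depths).
    Returns, for each depth, the requested quantiles over the subsampled runs.
    """
    draws = random.sample(samples, min(n_sub, len(samples)))
    profiles = [forward_model(C) for C in draws]
    out = []
    for k in range(len(profiles[0])):
        values = sorted(p[k] for p in profiles)
        out.append(tuple(values[int(q * (len(values) - 1))] for q in quantiles))
    return out
```

Running this with the prior samples versus the posterior samples gives the two uncertainty envelopes compared in Figure 6.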

Calibration of KPP Parameters From Multiple LES Simulations
We now use our Bayesian framework to explore possible sources of bias in the KPP model. To this end we investigate what happens when we change the initial stratification in penetrative convection simulations. This is motivated by recent work on boundary layer depth biases in the Southern Ocean (DuVivier et al., 2018;Large et al., 2019). In those studies, KPP failed to simulate deep boundary layers in winter when the subsurface summer stratification was strong.
We perform 32 large eddy simulations and calculate parameter distributions for each case. In the previous section we saw that C_H is the most sensitive parameter; thus, our focus now is on the optimization and uncertainty quantification of C_H, although all parameters are estimated in the background. We keep the surface cooling constant at 100 W/m² for all regimes and vary only the initial stratification, which ranges from N² ≈ 1 × 10⁻⁶ s⁻² to N² ≈ 3.3 × 10⁻⁵ s⁻². Each integration was stopped when the boundary layer filled about 70% of the domain. We used 128³ grid points in the LES, corresponding to ≈0.8 m resolution in each direction, a lower resolution than in the previous section, but the results were not sensitive to this change (although the parameter estimates would vary with substantially lower LES resolution, the qualitative trends are expected to be robust). In the Bayesian inference, each probability distribution was calculated with 10⁵ RW-MCMC iterations, leading to an effective sample size on the order of 10³.
We find, as visualized in Figure 7, that C H is not constant but depends on the background stratification, N 2 . The blue dots are the median values of the probability distributions, and the stars are the modes (minimum of the loss function). The error bars correspond to 90% probability intervals, meaning that 90% of parameter values fall between the error bars. The large discrepancy between the median and the mode is due to the mode being the optimal value of the entire four-dimensional distribution whereas the median only corresponds to the marginal distribution. The reference KPP value is plotted as a dashed line.
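The 90% error bars in Figure 7 are central probability intervals computed from the posterior samples of C_H. A sketch of that summary statistic (function name and the simple order-statistic estimator are ours):

```python
def probability_interval(values, p=0.9):
    """Central p-probability interval from posterior samples.

    Returns the (lower, upper) bounds between which a fraction p of the
    sampled values fall, estimated from order statistics.
    """
    v = sorted(values)
    lo_idx = round((1.0 - p) / 2.0 * (len(v) - 1))
    hi_idx = round((1.0 + p) / 2.0 * (len(v) - 1))
    return v[lo_idx], v[hi_idx]
```

For strongly skewed marginals like those found here, such intervals (and the median) are more informative than a mean plus standard deviation, which is why the figure reports them alongside the mode.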
The median values and optimal values increase monotonically with the initial stratification, revealing a systematic bias. Furthermore, the trend exposes the source of the bias: no single value of C_H in Equation 15 can correctly reproduce the deepening of the boundary layer for all initial stratifications. This suggests that the scaling law underlying the boundary layer depth criterion is incommensurate with the LES data.
The failure of Equation 15 can be understood by going back to the buoyancy budget in Equation 7. Using the KPP estimate for the buoyancy jump across the entrainment layer, and introducing N_h² ≡ ∂_z B(−h) for the stratification at the base of the entrainment layer, to distinguish it from the interior stratification N², we find that the boundary layer depth criterion, Equation 15, implies an expression which, substituted into the buoyancy budget, Equation 7, yields an implicit equation for the evolution of the boundary layer depth h. The LES simulation described in section 2.1, and many previous studies of penetrative convection, for example, Deardorff et al. (1980) and Van Roekel et al. (2018), show that the boundary layer depth grows as √t. To be consistent, N_h would have to scale as h^(2/3), but this is neither observed in the LES simulations nor supported by theory. This suggests that we must modify the formulation of the boundary layer depth, as we now describe.

Modification of the KPP Parameterization to Reduce Biases
From the multi-regime study of the previous section we found that there is no optimal KPP mixing depth parameter C_H that works for arbitrary initial stratification. This prompted us to look for an alternative formulation of the depth criterion that satisfies the well-known empirical result that the boundary layer deepens at the rate given by Equation 28, where c is a dimensionless constant found to be close to 3.0 with the LES simulation in section 2.1; furthermore, c was found to be close to 3.0 across all the numerical experiments from section 3.2. Substituting this expression into the buoyancy budget, Equation 7, we arrive at Equation 29. This expression can then be used as a new boundary layer depth criterion, Equation 30, to replace Equation 15, where C⋆ replaces C_H as the dimensionless parameter whose value sets the boundary layer depth. The value of N² here is the background stratification. Based on Equation 29 and our LES data, we expect C⋆ ≃ 1/6 (Equation 31). Equation 30 is an implicit equation for h which guarantees that Equation 28 holds.
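The consistency of the new criterion with √t deepening is easy to check numerically. The sketch below is illustrative only: it assumes the growth law takes the dimensionally consistent form h(t) = c √(B₀ t)/N with c ≈ 3, equivalent to h dh/dt = c² B₀ / (2N²), and uses hypothetical values for the surface buoyancy flux B₀ and stratification N².

```python
import math

# Assumed values: growth constant, buoyancy flux (m^2/s^3), stratification (1/s^2)
c, B0, N2 = 3.0, 1e-7, 1e-5
dt, h, t = 1.0, 1.0, 0.0     # time step (s), small initial depth (m), time (s)

# Forward-Euler integration of dh/dt = c^2 B0 / (2 N^2 h)
for _ in range(200_000):
    h += dt * c**2 * B0 / (2 * N2 * h)
    t += dt

analytic = c * math.sqrt(B0 * t) / math.sqrt(N2)   # h = c sqrt(B0 t) / N
# after roughly two days of model time, h tracks the sqrt(t) law to well under 1%
```

The integrated depth follows the square-root-of-time law closely once the (arbitrary) initial depth is forgotten, illustrating why a criterion built on Equation 28 removes the stratification-dependent bias in C_H.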
We now repeat the model calibration of section 3.2 with this new boundary layer depth criterion to test whether there is an optimal value of C⋆ that is independent of initial stratification. We estimate all KPP parameters and show the new mixing depth parameter for simulations with different initial stratifications in Figure 8. Encouragingly, there is no obvious trend in the optimal values of C⋆, and the error bars overlap for all cases. This supports the new criterion in the sense that parameters estimated in different regimes are now consistent with one another. The uncertainties in C⋆ translate into an uncertainty in the predicted boundary layer depth; in particular, values between 0.05 ≤ C⋆ ≤ 0.2 imply a corresponding range of boundary layer depth growth rates. Additionally, one can check whether the constants estimated following the methodology of section 3 are consistent with an independent measure diagnosed directly from the LES simulations. In particular, the LES simulations suggest that C⋆ ≃ 1/6 as per Equation 31. From Figure 8 we see that the optimal C⋆ is smaller than 1/6 = 0.167 (the dashed black line), and the value 1/6 is not within the confidence intervals for many of the simulations. There are several potential reasons for the discrepancy, for example, the neglect of curvature in the buoyancy budget (since we assumed a piecewise-linear buoyancy profile) or the finite resolution of the parameterization. Perhaps the most likely explanation is the difference in how the boundary layer depth was diagnosed in the LES, which need not have the same meaning as the one in KPP. A different definition in the LES simulation, such as the depth of maximum stratification, would yield a different scaling law, though still proportional to √t. Whatever the choice, the Bayesian parameter estimation bypasses these ambiguities by direct comparison with the entire horizontally averaged temperature profile from the LES.
This is similar to Figure 7, but using the modification from section 3.3. Here one can see that the mixing depth parameter, when estimated across various regimes, produces similar results. This is a desirable feature in a parameterization.
We do not explore other modifications to the boundary layer depth criterion, as this would greatly expand the scope of this article. Furthermore, biases in KPP are not limited to the cases explored here; see Van Roekel et al. (2018) for discussions and remedies. The criterion described in this section assumes a constant initial stratification and a constant surface heat loss, which leads to the √t growth of the boundary layer depth. It would be interesting to extend the criterion to arbitrary initial stratifications and variable surface heat fluxes, not to mention the interaction with wind-driven mixing. The goal here is not to derive a new parameterization, but rather to illustrate and argue for a Bayesian methodology in the refinement and assessment of parameterizations.

Discussion
We presented a Bayesian approach to assess the skill of the K-Profile Parameterization (KPP) for turbulent convection triggered by surface cooling in an initially stably stratified ocean. The KPP model for this physical setting consists of a one-dimensional model with an algebraic constraint for the mixing-layer depth together with four nondimensional parameters. These parameters are an algebraic reorganization of the original KPP parameters, chosen so that terms in the equations can be associated with individual parameters. Parameters were estimated by reducing the mismatch between the vertical buoyancy profile predicted by KPP and the area-averaged buoyancy profile simulated with a three-dimensional LES code for the same initial conditions and surface forcing. Using Bayes' formula we then estimated the full joint probability distribution of the four parameters, which we used to quantify interdependencies among the parameters and their uncertainty around the optimal values.
Repeating the Bayesian parameter optimization and uncertainty quantification for different initial stratifications, we found that no unique set of parameters could capture the deepening of convection in all cases. This implied that the KPP formulation does not capture the dependence of convection on the initial stratification even in the simple test case considered here: constant surface cooling, constant initial stratification, no wind, and no background flow. The parameter that required re-tuning for each case was the one associated with the boundary layer depth criterion, thereby suggesting that this criterion has the wrong functional dependence on stratification. We thus reformulated the boundary layer depth criterion to capture the semi-analytical result, supported by the LES simulations, that the boundary layer deepens as the square root of time when the initial stratification is constant. The new formulation was vindicated: the Bayesian approach found a single set of parameters that captured the evolution of the boundary layer, as compared with the LES simulations, for all initial stratifications. In this way, the Bayesian methodology allowed us to identify and remove a bias in the KPP formulation.
The methodology outlined here could be applied just as easily to other parameterizations of boundary layer turbulence, such as those reviewed in CVMix (Griffies et al., 2015), Pacanowski and Philander (1981), Mellor and Yamada (1982), Price et al. (1986), and Kantha and Clayson (1994). The inclusion of additional physics, such as wind-driven mixing and its interaction with convection, should also be amenable to the techniques described in this manuscript. Our experience is that progress is faster if one starts with simple idealized setups, like the ones considered here, and then moves to progressively more realistic ones that account for variable stratification and surface heat fluxes, wind stress forcing, background shear, surface waves, and so forth. The Bayesian method then provides a rigorous evaluation of parameter uncertainty, parameter dependencies, and biases in the formulation of the parameterization.
Ultimately, our hope is that parameter probability distributions estimated in local regimes will serve as useful prior information for the calibration/tuning of Earth System Models (ESMs). Local simulations of turbulence must be carefully designed and must cover the suite of subgrid-scale processes that have leading-order impact on global ocean dynamics: surface and bottom boundary layer turbulence, surface wave effects, deep convection, mesoscale and submesoscale turbulence, and so forth. Bayesian calibration of the parameterization for each subgrid-scale process will then result in probability distributions for all the nondimensional parameters associated with the parameterizations. These distributions can in turn serve as prior information on the reasonable range of values each parameter can take when the parameterizations are implemented in an ESM.
With regard to the calibration of ESMs, the parameterizations of different subgrid-scale processes may interact nonlinearly with each other and with the resolved physics. Additional calibration is then required for the full ESM. Presently, this is achieved by perturbing the parameters within plausible ranges (Mauritsen et al., 2012; Schmidt et al., 2017). The Bayesian approach provides an objective way to determine those plausible ranges. The same algorithm cannot be used to calibrate the ESM itself, because the methodologies described here are not computationally feasible when applied to larger systems. Promising approaches to address this challenge through the use of surrogate models are described in Sraj et al. (2016) and Urrego-Blanco et al. (2016). Such models bring internal sources of uncertainty, however, and it is not clear to what extent one can trust a surrogate of a full ESM. One potential way to address this additional challenge is the Calibrate, Emulate, and Sample (CES) approach outlined in Cleary et al. (2020), where the surrogate model's uncertainty is estimated with Gaussian processes and included as part of a consistent Bayesian calibration procedure.
Should the global problem still exhibit significant biases, even when all available prior information about parameterizations and global data is leveraged through emulators or traditional methods of tuning, then one would have to conclude that there is a fundamental deficiency in our understanding of how the different components of the climate system interact with one another, or that the models are missing some key process. For example, Rye et al. (2020) argue that glacial melt might be one such missing process not currently represented in ESMs. The advantage of the systematic calibration approach outlined here is that it allows us to quantify uncertainty in ESM projections and identify the sources of that uncertainty.

Journal of Advances in Modeling Earth Systems
The equations of motion are Equations A1-A3. The buoyancy b appearing in Equation A1 is related to conservative temperature θ by the linear equation b = gα(θ − θ₀), where θ₀ = 20°C is a reference temperature, α = 2 × 10⁻⁴ (°C)⁻¹ is the thermal expansion coefficient, and g = 9.81 m s⁻² is the gravitational acceleration at the Earth's surface.
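For concreteness, the linear equation of state above can be expressed in a few lines, assuming the form b = gα(θ − θ₀) implied by the stated constants:

```python
g = 9.81        # gravitational acceleration (m/s^2)
alpha = 2e-4    # thermal expansion coefficient (1/°C)
theta0 = 20.0   # reference conservative temperature (°C)

def buoyancy(theta):
    """Buoyancy (m/s^2) from conservative temperature via the linear EOS."""
    return g * alpha * (theta - theta0)

# one degree of warming relative to theta0 corresponds to
# a buoyancy of g * alpha = 1.962e-3 m/s^2
```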

A1. Subfilter Stress and Temperature Flux
The subfilter stress and temperature flux are modeled with downgradient closures, where S is the strain rate tensor and ν_e and κ_e are the eddy viscosity and the eddy diffusivity of conservative temperature. The eddy viscosity ν_e and eddy diffusivity κ_e in Equation A5 are modeled with the anisotropic minimum dissipation (AMD) formalism introduced by Rozema et al. (2015) and Abkar et al. (2016), refined by Verstappen (2018), and validated and described in detail for ocean-relevant scenarios by Vreugdenhil and Taylor (2018). AMD is simple to implement, accurate on anisotropic grids (Vreugdenhil & Taylor, 2018), and relatively insensitive to resolution (Abkar et al., 2016).
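A downgradient closure simply fluxes a quantity against its resolved gradient. The one-dimensional sketch below illustrates the idea for the temperature flux with a constant, hypothetical eddy diffusivity; in the actual model the AMD formalism supplies spatially varying ν_e and κ_e from the resolved fields.

```python
def downgradient_flux(theta, dz, kappa_e):
    """Subfilter flux -kappa_e * dtheta/dz, evaluated on cell interfaces."""
    return [-kappa_e * (theta[k + 1] - theta[k]) / dz
            for k in range(len(theta) - 1)]

# Linear profile: theta increasing upward at 0.01 °C per meter,
# with a uniform (hypothetical) eddy diffusivity of 1e-2 m^2/s.
dz, kappa_e = 1.0, 1e-2
theta = [20.0 + 0.01 * k * dz for k in range(8)]
flux = downgradient_flux(theta, dz, kappa_e)
# every interface carries the same downgradient flux, approximately -1e-4
```

The flux is negative (downward) wherever temperature increases upward, which is exactly the smoothing action expected of a subfilter closure.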

A2. Numerical Methods
To solve Equations A1-A3 with the subfilter model in Equation A5, we use the software package "Oceananigans.jl," written in the high-level Julia programming language to run on Graphics Processing Units (GPUs) (Besard, Churavy, et al., 2019; Besard, Foket, et al., 2019; Bezanson et al., 2017). Oceananigans.jl uses a staggered C-grid finite volume spatial discretization (Arakawa & Lamb, 1977) with centered second-order differences to compute the advection and diffusion terms in Equations A1 and A2, a pressure projection method to ensure the incompressibility of u, a fast Fourier-transform-based eigenfunction expansion of the discrete second-order Poisson operator to solve the discrete pressure Poisson equation on a regular grid (Schumann & Sweet, 1988), and second-order explicit Adams-Bashforth time stepping. For more information about the staggered C-grid discretization and second-order Adams-Bashforth time stepping, see section 3 in Marshall et al. (1997) and references therein. The code and documentation are available at https://github.com/CliMA/Oceananigans.jl.
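The second-order Adams-Bashforth update referenced above has a simple form. The sketch below is a generic illustration, not Oceananigans.jl code: it applies the scheme to the test equation du/dt = −u, bootstrapping the first step with forward Euler.

```python
import math

def ab2_step(u, f_now, f_prev, dt):
    """Second-order Adams-Bashforth: u^{n+1} = u^n + dt*(3/2 F^n - 1/2 F^{n-1})."""
    return u + dt * (1.5 * f_now - 0.5 * f_prev)

dt, u = 0.01, 1.0
f_prev = -u                      # F^{-1} := F^0, so the first step is forward Euler
for _ in range(100):
    f_now = -u                   # right-hand side of du/dt = -u
    u = ab2_step(u, f_now, f_prev, dt)
    f_prev = f_now
# after 100 steps (t = 1), u approximates exp(-1) to second order in dt
```

The scheme needs only one right-hand-side evaluation per step, which is part of its appeal for large simulations.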

Appendix B: Parcel Theory Derivation for the KPP Boundary Layer Depth Criterion
Here we summarize the derivation of the KPP boundary layer depth criterion for penetrative convection, because we could not find a succinct description in the published literature. Following Deardorff et al. (1980), we consider the vertical momentum equation for a parcel punching through the entrainment layer, where b′ is the buoyancy of the parcel, assumed equal to the mixed layer value, and b̄ is the area-mean buoyancy profile in the entrainment layer. This equation holds if the area occupied by sinking plumes is small compared to the total area, so that b̄ is a good proxy for the buoyancy in the environment around the plumes and b′ − b̄ represents the buoyancy force experienced by the parcel. The parcel decelerates from w′ ≡ w_e at the mixed layer depth (z = −h + Δh) to zero at the boundary layer depth (z = −h), where turbulence vanishes. Assuming that the background stratification N_e² is approximately constant in the entrainment layer, we also have b′ − b̄ = N_e²(−h + Δh) − N_e² z. The momentum equation can then be integrated from z = −h + Δh to z = −h. Introducing Δb as the difference between the environment buoyancy in the mixed layer and that at the base of the entrainment layer, we have Δb = N_e² Δh, and Deardorff et al. (1980) assume that w_e ∝ w*(−h + Δh). The criterion for diagnosing the boundary layer depth follows from this relationship; h is defined as the first depth z below the ocean surface where Equation B4 holds for some universal constant C_H. In the main text we show that this scaling fails to predict the rate of deepening of the boundary layer in LES simulations. Further analysis, not reported here, shows that this failure stems from relationship (B3), which is not supported by the simulations.
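The integration step summarized above can be written out explicitly. A sketch in the notation of the text, assuming the parcel momentum equation takes the form w′ ∂_z w′ = b′ − b̄ along the descent (the proportionality constant in B3 is left unspecified):

```latex
% Integrate from z = -h + \Delta h (where w' = w_e) to z = -h (where w' = 0):
\begin{aligned}
-\tfrac{1}{2} w_e^2
  &= \int_{-h+\Delta h}^{-h}
     \left[ N_e^2 \, (-h + \Delta h) - N_e^2 \, z \right] dz
   = -\tfrac{1}{2} N_e^2 \, \Delta h^2 , \\[4pt]
\Rightarrow \quad
w_e^2 &= N_e^2 \, \Delta h^2 = \Delta b \, \Delta h ,
\qquad \text{using } \Delta b = N_e^2 \, \Delta h .
\end{aligned}
```

Combining this result with the assumption w_e ∝ w*(−h + Δh) relates Δb, Δh, and w*, from which the criterion B4 follows.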
Equation B4 is often referred to as a critical Richardson number criterion, which may seem odd given that no Richardson number appears in the expression. This is best understood by extending the criterion to the case where there is momentum shear in the boundary layer, typically induced by mechanical stresses, such that in addition to a buoyancy jump Δb(z) there is also a momentum jump Δu(z) across the entrainment layer. The base of the entrainment layer is then found where the Richardson number matches a critical value Ri_c. The rationale behind this extended criterion can be found in Large et al. (1994). In the purely convective limit Δu(−h) = 0 and the dependence on Ri_c drops out.

Appendix C: Relationship Between the Model in Section 2.2 and Large et al. (1994)'s Formulation of KPP
The formulation of KPP in Section 2.2 represents an algebraic reorganization of the formulation proposed by Large et al. (1994); the two formulations are mathematically equivalent. In this appendix we discuss in detail how the four free parameters C_H, C_S, C_D, and C_N are algebraically related to the free parameters proposed by Large et al. (1994). Large et al. (1994)'s formulation of KPP for the case of penetrative convection with no horizontal shear introduces six nondimensional parameters: the von Kármán constant κ = 0.4, the ratio of the entrainment flux to the surface flux β_T = 0.2, a constant that sets the amplitude of the nonlocal flux C* = 10, a constant that ensures the continuity of the buoyancy flux profile c_s = 98.96, the surface layer fraction ε = 0.1, and a parameter C_v that controls the smoothing of the buoyancy profile at the base of the boundary layer. Large et al. (1994) argue that C_v can take any value between 1 and 2. We set the reference value C_v = 1.7, which corresponds to the strong stratification limit in the model proposed by Danabasoglu et al. (2006) and given by equation (8.184) in Griffies et al. (2015).
In our formulation we introduce four parameters that are related to the original Large et al. (1994) parameters as follows. We are able to reduce the number of parameters from six (κ, c_s, C_v, β_T, ε, C*) to four (C_H, C_S, C_D, C_N) because, in the case of penetrative convection, the two combinations C_v(β_T)^{1/2} and c_s ε κ⁴ always appear together.

Using the reference KPP parameter values reported above, our parameters take the values C_H = 0.956, C_S = 0.1, C_D = 1.36, and C_N = 6.3275. We refer to these as the reference parameters.
It is worth commenting on why the critical Richardson number, the focus of much literature on KPP, does not appear when considering penetrative convection. The boundary layer depth is determined implicitly through equations (21) and (23) in Large et al. (1994), where B is buoyancy and B_r is the average of B between the surface and the depth z. The boundary layer depth is defined as the depth z = −h where Ri_b(−h) = Ri_c. For convection without shear, the case considered in this paper, |V_r − V(z)|² = 0 and w_s = w*(c_s ε)^{1/3} κ^{4/3}. The two equations can therefore be combined, and the critical Richardson number drops out of the resulting expression. This further supports our decision to introduce the single parameter C_H in favor of the combination of original parameters appearing on the left-hand side of (C4). In penetrative convection it is the parameter C_H that controls the boundary layer depth, rather than the critical Richardson number.
The optimal parameters and probability distributions for (C_H, C_S, C_D, C_N) can be mapped onto (ε, C_v(β_T)^{1/2}, c_s ε κ⁴, C*) using the inverse transformation (C5).

Appendix D: A Primer on Uncertainty Quantification
The probability distribution of the parameters in a parameterization must quantify the likelihood that the parameters take on values other than those that minimize the loss function ℒ. To achieve this, the probability distribution must satisfy two key properties: (1) in the limit of no uncertainty, the probability distribution should collapse to a delta function centered at the optimal parameter values that minimize the loss function; (2) the uncertainty of a parameter value C should increase with the value of ℒ(C).
There are many probability distributions that satisfy the above properties. We choose P(C) ∝ P₀(C) exp(−ℒ(C)/ℒ₀), where P₀ is a uniform prior distribution, ℒ is the loss function, and ℒ₀ is a hyperparameter.
The hyperparameter ℒ₀ sets the shape of the likelihood function exp(−ℒ(C)/ℒ₀) and its associated uncertainty quantification. The limit ℒ₀ → 0 corresponds to no uncertainty, because the likelihood function and the probability distribution collapse to a delta function peaked at the optimal parameter values that minimize the loss function. The limit ℒ₀ → ∞ instead corresponds to a likelihood function that adds no information, so that the posterior distribution P(C) is equal to the prior P₀(C). Thus ℒ₀ must take finite values between zero and infinity if the likelihood function is to add useful information.
For any finite value of ℒ₀, the probability distribution has its mode (maximum) at the optimal parameters if the prior distribution is uniform. This can be easily demonstrated. Let C* denote the parameter values for which the loss function attains its global minimum and C denote any other set of parameter values. Then ℒ(C*) ≤ ℒ(C) for any C, so that exp(−ℒ(C*)/ℒ₀) ≥ exp(−ℒ(C)/ℒ₀). Hence the most probable value of the probability distribution is achieved at the minimum of the loss function, independent of ℒ₀, for a uniform prior distribution.
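The argument above is easy to verify numerically. The snippet below is a toy one-dimensional illustration with a hypothetical quadratic loss whose minimum sits at C = 0.3: with a uniform prior, the posterior mode lands on the loss minimum for every choice of ℒ₀.

```python
import math

loss = lambda c: (c - 0.3) ** 2          # hypothetical loss, minimum at C = 0.3
grid = [i / 1000 for i in range(1001)]   # uniform prior on [0, 1]

modes = []
for L0 in (0.01, 0.1, 1.0):
    # With a uniform prior, the prior factor is constant and drops out.
    posterior = [math.exp(-loss(c) / L0) for c in grid]
    modes.append(grid[posterior.index(max(posterior))])
# every entry of modes equals 0.3, independent of L0
```

Larger ℒ₀ flattens the posterior (more uncertainty) without moving its peak, which is exactly the property exploited in section 3.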
As mentioned in section 3, it is convenient to set the hyperparameter ℒ₀ equal to the minimum of the loss function, ℒ(C*). This choice satisfies two key requirements. First, the uncertainties of the parameters should be independent of the units of the loss function. Second, the hyperparameter ℒ₀ should grow with ℒ(C*), because the latter is a measure of the parameterization bias, and the former should be larger if there is more uncertainty about acceptable parameter values.
In practice it is seldom possible to find the global minimum of ℒ, so instead we adopt a "best guess" of the optimal parameters, Ĉ, and set ℒ₀ = ℒ(Ĉ). Since ℒ(C*) ≤ ℒ(Ĉ), this choice is conservative, because a larger ℒ₀ corresponds to more uncertainty.