Advancing Eddy Parameterizations: Dynamic Energy Backscatter and the Role of Subgrid Energy Advection and Stochastic Forcing

Viscosity in the momentum equation is needed for numerical stability, as well as to arrest the direct cascade of enstrophy at grid scales. However, a viscous momentum closure tends to over‐dissipate eddy kinetic energy. To return excessively dissipated energy to the system, the viscous closure is equipped with what is called dynamic kinetic energy backscatter. The amplitude of backscatter is based on the amount of unresolved kinetic energy (UKE). This energy is tracked through space and time via a prognostic equation. Our study proposes to add advection of UKE by the resolved flow to that equation to explicitly consider the effects of nonlocality on the subgrid energy budget. UKE can consequently be advected by the resolved flow before it is reinjected via backscatter. Furthermore, we suggest incorporating a stochastic element into the UKE equation to account for missing small‐scale variability, which is not present in the purely deterministic approach. The implementations are tested on two intermediate complexity setups of the global ocean model FESOM2: an idealized channel setup and a double‐gyre setup. The impacts of these additional terms are analyzed, highlighting increased eddy activity and improved flow characteristics when advection and carefully tuned, stochastic sources are incorporated into the UKE budget. Additionally, we provide diagnostics to gain further insights into the effects of scale separation between the viscous dissipation operator and the backscatter operator responsible for the energy injection.


Introduction
Mesoscale eddies play an important role in determining ocean circulation.They contain a large part of the kinetic energy (KE) of the ocean, contribute to the transfer of heat and properties, and impact the form and evolution of ocean currents.Their horizontal size is proportional to the Rossby radius of deformation, which reaches up to 200 km in the low latitudes, decreasing to less than 10 km in high latitudes.In addition, the Rossby radius decreases in shelf areas reflecting weak density stratification and small depth.These variations in the Rossby radius govern the varying size distributions of mesoscale eddies in the global oceans.
Mesoscale eddies are generated through different types of instabilities, with one of the most prominent sources being baroclinic instabilities.Baroclinic instability releases available potential energy (APE) maintained by the mean forcing of the ocean, transferring it into eddy kinetic energy (EKE) across a range of scales near the Rossby deformation radius (Ferrari & Wunsch, 2009).
A direct cascade of enstrophy to small scales and an inverse cascade of energy to large scales usually accompany the dynamics of mesoscale eddies.Eddy kinetic energy is partly transferred to mean kinetic energy, but the rest of the upscale transfer is stopped by large-scale friction, eddy killing by winds at the surface, interactions with topography, or wave generation.Enstrophy and some energy go downscale, reaching grid scales where they need to be dissipated through horizontal eddy viscosity in global ocean models.In nature, at even smaller scales of the cascade, the flow transitions to ageostrophic turbulence and waves, and finally to three-dimensional turbulence, the energy of which is converted to heat by molecular dissipation.The dynamical mechanisms associated with upand down-scale energy cascade of meso-and submeso-scale eddies were investigated, for example, in the works of Schubert et al. (2020), Contreras et al. (2023), and Srinivasan et al. (2023).
In climate studies, ocean models are integrated over hundreds of years, which limits their resolution to coarse (around 1°) or eddy-permitting resolutions (around 1/4°) (Hewitt et al., 2020).Baroclinic instability in an ocean model is not resolved at coarse resolution, and eddy-driven transfers of buoyancy and other properties are absent.The APE cannot be converted to EKE; it has to be taken out by parameterizations compensating for the missing eddies.This is generally done by the Gent-McWilliams (GM) parameterization (Gent, 2011;Gent & McWilliams, 1990), which introduces the so-called eddy bolus velocities.Bolus velocities model the eddy-driven property fluxes and release the APE.In addition to GM, the missing mixing by eddies along isopycnal surfaces is parameterized by isopycnal diffusion (Redi, 1982).
Horizontal grids with a cell size around 1/4°or 1/6°are often described as "eddy-permitting."Such grids are sufficiently fine to represent large eddies and simulate baroclinic instability in parts of the ocean.The GM parameterization must be carefully tuned on eddy-permitting meshes, as described in Hallberg (2013).However, the range of resolved scales on such meshes is not large enough, and viscous closures (e.g., Fox-Kemper et al., 2008) intended to eliminate enstrophy and energy at grid scales also affect the scales where eddies are generated by baroclinic instability and where the bulk of EKE is residing.As a result, both EKE and eddy generation are excessively dissipated and damped.Until the resolution reaches the level of resolving submesoscale dynamics (generally finer than 5 km at midlatitudes), the entire range of scales, including large scales, will be exposed to over-dissipation, as illustrated, for example, by Soufflet et al. (2016).It leads to an underestimated transfer of heat, salt, momentum and misrepresentation of the mean dynamics of the ocean and the forcing sensitivity of models.
For a more accurate ocean simulation and better representation of eddy dynamics, energy dissipated due to horizontal viscosity should be returned back to the system.The kinetic energy backscatter parameterization proposed for the ocean in Jansen et al. (2015) and developed further by Juricke et al. (2019) is intended to help in such situations.By reinjecting energy, energy backscatter transfers the excessively dissipated energy back to the scales of eddy generation.It thereby compensates for the over-dissipation of the large scales and energizes the entire range of scales.
The concept of energy backscatter in its deterministic and stochastic forms has a long history of research in atmospheric and ocean sciences.Physical and numerical approaches to the compensation of excessive energy losses for atmospheric parameterization were mentioned in the works of, for example, Berner et al. (2009), Leutbecher et al. (2017), and Dwivedi et al. (2019).Idealized ocean models were enhanced by backscatter to account for the dynamics of unresolved mesoscale eddies in the works of for example, Frederiksen et al. (2013), Jansen and Held (2014), Jansen et al. (2015), and Zanna et al. (2017).Kinetic energy backscatter need not involve additional prognostic variables, such as the kinematic backscatter implementation proposed in Juricke et al. (2020aJuricke et al. ( , 2020b)).Juricke et al. (2020aJuricke et al. ( , 2020b) ) compute a new antiviscous operator by taking the locally averaged original viscous operator and multiplying it by a tuning coefficient.Such a backscatter operator is then subtracted from the original viscous operator.This modification reduces viscous over-dissipation in some range of scales.The parameterization does not increase computational cost and significantly improves ocean simulations toward the high-resolution truth.However, it acts without using an additional budget equation to keep track of past developments of the flow, that is, this version of backscatter does not have temporal memory.
More physically grounded and reliable is the concept of dynamic energy backscatter.Here, the amplitude of backscatter depends on a prognostic subgrid energy budget (see Jansen et al., 2015;Juricke et al., 2019).The subgrid kinetic energy budget controls how the excessively dissipated energy is returned back to the resolved scales.Other subgrid budgets were recently developed, for instance, by Uchida et al. (2022) who propose a subgrid potential vorticity model in a quasi-geostrophic framework.This study focuses on the subgrid kinetic energy budget of Juricke et al. (2019).It controls how the excessively dissipated energy is returned back to the resolved scales.Our contribution to the theory and practical use of the dynamic kinetic energy backscatter is threefold: (a) adding an advection term to the prognostic subgrid energy budget, (b) adding a stochastic term to the subgrid energy budget accounting for missing small-scale energy transfers, and (c) changing the scale of backscatter to investigate how the scale of energy reinjection is affecting the energy spectrum of the resolved flow.
A set of numerical simulations is used to investigate the consequences of these three changes to backscatter.We run the Finite-volumE Sea ice-Ocean Model (FESOM2, Danilov et al., 2017;Scholz et al., 2019) for two middle complexity setups: a channel setup and a double-gyre setup, described in detail in Section 2.4.Channel simulations allow us to compare results with the previous works mostly tested on the channel setup (e.g., Juricke et al., 2020aJuricke et al., , 2020b)).However, it has several disadvantages, such as high variability of area-integrated kinetic energy due to the channel's narrowness or a lack of spatial separation between regions of release and dissipation of energy.On the other hand, the double-gyre setup has more defined areas of creation and dissipation of kinetic energy and a longer zonal direction that allows eddies to develop and evolve in space.It also has the advantage of being more intuitively understandable and closer to reality, as it represents the idealized physical processes of subpolar and subtropical gyres in the North Atlantic or North Pacific basins.The double-gyre setup can be extended to include more complicated coastlines and bottom topography to create an even more realistic representation of basin dynamics.
The outline of the article is as follows.We begin in Section 2 with the model essentials, which include the methodology used to create the new components of the subgrid energy budget for energy backscatter, the description of the two modeling setups that we use to test the implementations and the diagnostics used to investigate the effect of the new components.Section 3 describes the results and improvements achieved in simulations whereas the advection and stochastic components in the subgrid energy budget are applied independently and simultaneously.The paper closes with discussions and conclusions in Section 4.

Equations of Motion
We solve the primitive equations in idealized ocean basins with eddy viscosity and backscatter.The horizontal momentum equation reads where u = (u, v, w) denotes the full three-dimensional velocity field, u h = (u, v) and ∇ h = (∂ x , ∂ y ) the horizontal velocity field and horizontal gradient, respectively, f the Coriolis parameter, e z the unit vertical vector, p the pressure, ρ 0 the reference density, V(u h ) the horizontal eddy viscosity, B(u, e) the backscatter operator, described in more detail below, and ν v the coefficient of vertical viscosity.
The vertical momentum equation reduces to hydrostatic balance in the form where g is the gravitational acceleration and ρ is the deviation of density from its reference value ρ 0 ; b denotes buoyancy.
The equations for potential temperature T and salinity S take the form where K is the diffusivity tensor in the form of a symmetric 3 × 3 matrix that aims at minimal mixing of tracers across surfaces of isoneutral density, and ∇ = (∇ h , ∂ z ) denotes the gradient in three dimensions.We assume a linear equation of state, in particular, density is linearly dependent only on temperature; salinity tracer stays constant in time.In this case, isoneutral K implies no mixing.
To compute (horizontal) kinetic energy contained in Equation 1, one takes the scalar product of the horizontal momentum equation (Equation 1) By integrating over the entire domain, the viscous operator acts as a domain-averaged sink of kinetic energy, whereas the backscatter operator serves as a domain-averaged source of energy.The term including pressure in Equation 5 serves as an input term for the kinetic energy equation, and a more detailed discussion will be provided in Section 2.7).Here, our focus will be on the viscous and backscatter terms, which are connected through the subgrid energy budget equation presented below.
The horizontal viscosity operator in Equation 1 is biharmonic and has the form described in Juricke et al. (2020aJuricke et al. ( , 2020b)), which was found to be minimally dissipative for FESOM.The backscatter term in Equation 1 is antidissipative.As in Jansen et al. (2015) and Juricke et al. (2019), the amplitude of its coefficient varies locally for each grid cell and is based on a budget of unresolved (subgrid) kinetic energy (UKE) e = e(x, y, z, t), which satisfies the prognostic equation UKE has the dimension of energy density per unit mass.In the numerical implementation, e is determined for each velocity control volume.In the first term on the right-hand side of the equation, Ėdis is a kinetic energy source diagnosed from the dissipative term in the horizontal momentum Equation 1. c dis is a parameter that represents the fraction of direct energy cascade to microscales.If c dis is smaller than 1, part of the kinetic energy goes to small scales and is dissipated.
(1 c dis ) Ėdis can be interpreted as a hidden sink term for the flow.The second term Ėback is a UKE sink (on average) and represents the rate of energy returned to the resolved flow via the backscatter operator.The last term is UKE harmonic diffusion, which redistributes subgrid energy and has a significantly smaller magnitude than the other terms.ν C is a diffusion coefficient following Jansen et al. (2015) but the amplitude of this coefficient is of minor importance (cf. the discussion in Juricke et al. (2019)).
The feedback of Equation 6to Equation 1 is established through the backscatter operator B(u, e), which is a discrete harmonic operator chosen as in Juricke et al. (2019, Equation B.2).This operator requires the specification of a cell-local backscatter coefficient ν B , which, for grid cell c, is chosen as where the c 0 determines the rate at which dissipated energy is reinjected into the resolved flow; it is constant across the grid.Further, S c corresponds to the area of the top face of cell c and e c is the UKE for cell c.
To reduce the contribution from grid-scale fluctuations and to control the scales at which energy is injected into the momentum equation via backscatter, it is necessary to apply a smoothing filter within the following terms: the UKE source term Ėdis , the backscatter term Ėback , and the backscatter contribution B(u, e) to the momentum Equation 1 (for a discussion, see Juricke et al., 2019).The filtering procedure is implemented by repeated application of a single averaging operator that averages cell centroid quantities to the common cell vertex and then averages the new vertex quantities back to the cell centroids.The optimal choice of filtering for Ėdis , Ėback and B (u, e) employed here follows the recommendation of Juricke et al. (2019) with some modifications that will be discussed later.

Deterministic Backscatter With Subgrid Energy Advection
The existing implementations of dynamic kinetic energy backscatter by Jansen et al. (2015), Juricke et al. (2019Juricke et al. ( , 2020b)), and Klöwer et al. (2018) are either considering the balance of unresolved (subgrid) EKE (i.e., UKE) as taking place locally or being distributed by the barotropic (vertically averaged) flow (Jansen et al., 2019).This is arguably a simplification, as UKE should be transported by the fully resolved 3D flow, and a question arises whether ignoring this transport is a good approximation.Indeed, one may expect that input (i.e., generation) of subgrid energy and its dissipation are not colocated, and the UKE density at a given point is influenced by its input in regions upstream.Only when the flow statistics are homogeneous in the direction of the mean flow (e.g., a uniform zonally re-entrant channel flow) can the advection be assumed to be of minor importance.But even in such cases, eddies may be strong enough to introduce inhomogeneities affecting the distribution of UKE in space.
There is strong evidence that the non-local nature of energy transfers is influenced, for example, by the background state, specific region (such as the Gulf Stream, Kuroshio, subtropical gyre, etc.), as well as the locally prevalent dynamic regime (Chen et al., 2014(Chen et al., , 2016;;Jamet et al., 2022;Kang & Curchitser, 2015).
Considering the inclusion of UKE advection as an extension to the dynamic backscatter of Equation 6 introduces the challenge of selecting adequate advective velocities, adding complexity to the model.We chose to implement advection of UKE by the full resolved 3D flow, assuming that additional subgrid advection by unresolved processes is negligible.Alternatives exist, such as, for example, Jansen et al. (2019), where barotropic subgrid energy is advected by the resolved 2D barotropic flow.While stochastic advection is another potentially interesting approach, constraining it presents its own set of challenges, as discussed in the literature on stochastic advection (e.g., Hu and Patching (2023)).
Consequently, we extend Equation 6 by incorporating full advection of UKE in three dimensions by the velocity field of the resolved flow.The subgrid energy budget equation with the new term has the following form where u is the resolved three-dimensional velocity.We study the effect of UKE advection using the channel and double-gyre setups described in Section 2.4.The flow in the channel setup is statistically homogeneous in the zonal direction so that the regions of predominant kinetic energy production and dissipation statistically coincide.This makes it more challenging to analyze the direct effect of the subgrid advection term on local energy transfers.
In the double-gyre setup, these regions are separated, which can help to interpret the effects of UKE advection.
We will demonstrate in Section 3.3 that accounting for UKE advection leads to consistent but modest improvements compared to control simulations in which the advection of UKE was ignored.This conclusion holds even for the channel setup with zonally homogeneous mean flow.The introduction of the advection term in the subgrid equation does not negatively affect the numerical stability of the model, even though Juricke et al. (2019) speculated that this might be the case.

Stochastic Backscatter
Stochastic backscatter can offer more freedom in how to return energy to the resolved scales than deterministic backscatter and can also be used to represent missing variability and subgrid uncertainties.However, the question of the optimal form of the stochastic contribution in backscatter schemes remains open.Among existing studies, stochastic eddy forcing is applied to the QG model in Mana and Zanna (2014); stochastic parameterizations extracting information from the subgrid eddy statistics are studied in Grooms and Majda (2013), Grooms et al. (2015); stochastic forcing is applied to velocity and temperature equations in Cooper (2017); stochastic perturbations are tested on various parameterization schemes in Juricke et al. (2017).Perezhogin (2019) develops and compares deterministic and stochastic kinetic energy backscatter schemes for the primitive equations of the ocean.Li et al. (2023) proposed a stochastic modeling approach for mesoscale eddies within the framework of a quasi-geostrophic model, relying on the decomposition of the Lagrangian velocity.The interest of the ocean modeling community in stochastic schemes remains high and is expected to increase further during this decade (Fox-Kemper et al., 2019).
We propose to combine the deterministic backscatter with a stochastic approach by adding a new stochastic term to the UKE budget.Among several available options for representing stochastic aspects of unresolved eddies, our assumption is that the absent subgrid energy should closely mirror the structure of the high-resolution EKE.Therefore, the new term is designed to improve the simulated eddy variability using data from a high-resolution reference simulation denoted as truth.The additional stochastic term is added to the UKE equation (Equation 6).It aims to add missing spatial and temporal variability.
To generate correlated patterns for the stochastic forcing, we first ran a higher-resolution, 10 km simulation and calculated surface EKE for every mesh element for each simulated day of a 9-year simulation.Then we coarsegrained the field to the eddy-permitting mesh (i.e., 20 km, see Section 2.4) by calculating the average amount of EKE over four neighboring cells.The coarse-grained high-resolution EKE is then normalized (which resolves the discrepancy variability amplitude between EKE and UKE) and decomposed into empirical orthogonal functions (EOFs) and the corresponding set of principal components (PCs) that reflect the temporal dynamics of each EOF.
We then retain only the leading EOF with the largest contribution to the total variance.Here, we choose the cutoff at 50% of the total explained variance, thereby reducing the number of EOFs from thousands to dozens.
The main motivation for using the EKE structure of the high resolution for the stochastic contribution to the UKE budget is based on the assumption that the EKE distribution is not accurately represented at a coarser resolution.
We use the stochastic term to partly-via the amplitude of the stochastic term-compensate for missing EKE activity at a coarser resolution.Such terms could be added directly to the right-hand side of the momentum equations.However, we opted for an inclusion into the existing UKE budget for this study, to employ the backscatter strategy as a means to ultimately inject the energy at the adequate scales.
We also attempted to use data on the difference between coarse and fine resolution runs for the EOF decomposition (see Section 3.6 for more information).However, we decided against it due to the lack of a clear physical argument in favor due to the distinct evolution of processes in these simulations.Another supplementary experiment was conducted using the coarse-grained kinetic energy tendencies derived from the fine-resolution simulation.The tendency data exhibited substantial fluctuations, leading to an increase in the number of EOFs on multiple occasions.We did not find sufficient evidence confirming enhanced model performance despite the considerable increase in EOF dimension that needed to be included.Hence, we ultimately opted to use the fineresolution EKE data.
Based on the EOF decomposition, we introduce a new stochastic term in the UKE equation, Equation 6, which now reads The summation is over i, the ordinal number of the EOF, and C 1 is a constant with dimensions s 1 .Each EOF i has dimensions equal to the number of control volumes in a horizontal layer, and each PC i is a numerical value representing a realization of a stochastic process.
The corresponding PCs follow Ornstein-Uhlenbeck processes where the dW is an increment of the standard Wiener process and the mean reversion rates μ i are determined by fitting the Euler-Maruyama discretization of Equation 10, which is a first-order autoregressive process (AR(1)), to the daily mean data of the high resolution simulation.The fitting procedure assumes the acquisition of two parameters (lag-1 correlation and variance) that comprehensively characterize the AR(1) process.For simplicity, the variance parameter σ is taken the same across all the PC i , and is absorbed into the tuning parameter C 1 .
We assumed that the accumulated variance of the AR(1) process (which is equal σ 2 1 μ 2 ) has an upper limit equivalent to the variance of the EKE field in the fine-resolution simulation.The rationale behind this is to avoid injecting more than the entire variance of the fine resolution into the coarse resolution through the stochastic term.This upper limit is stringent in a nonlinear context due to two factors: (a) considering only the first N EOFs (N depends on the setup), which explain 50% of the variability, and (b) injecting via UKE.As the noise reaches resolved velocities, it undergoes several rounds of smoothing, generally reducing variance.Hence, we have redirected our attention from attempting to quantitatively replicate a portion of the high-resolution variance through C 1 to understanding C 1 as a tuning parameter.This approach enables a comparison of the stochastic term of UKE with other UKE components (the specific values of C 1 are detailed in Section 3).
The prefactor e of the stochastic term in Equation 9is a heuristic choice, corresponding to multiplicative noise.Multiplicative noise tends to outperform simple additive noise (see, e.g., Berner et al., 2017) as it scales with the prognostic properties.The argument is that with low levels of UKE, also the associated uncertainty tends to be low.At high levels of UKE, however, one assumes large levels of associated uncertainties (see, e.g., the description of the stochastically perturbed parameterization tendencies in Shutts, 2005, and the stochastic schemes of Juricke et al., 2017).Scaling with e also allows for a more flow-dependent injection of noise, rather than simple additive noise.It also leads to an immediate vertical scaling of the noise amplitude.To enforce vertical structure, alternative approaches could be used such as using analytical or modeled data from fineresolution vertical profiles (e.g., Yankovsky et al., 2023) or the average magnitude of UKE at a specific level, but using the readily available information via the UKE variable e is a more interpretable choice.
In Section 3, the effect of the implementations described above will be compared to the impact of the older version of the UKE budget for kinetic energy backscatter following Juricke et al. (2019), stated as Equation 6.The latter already substantially improves the mean state.Despite the general capacity of the backscatter to inject as much kinetic energy as we want, the subgrid equation is designed to limit this amount of energy input.With stochastic forcing in the subgrid equation, we could continue to increase the amount of input arbitrarily.However, it will not necessarily bring a simulation closer to the high-resolution truth but simply make it more energetic.Consequently, model stability may also become an issue.Therefore, the diagnostics introduced in Section 2.7 and the tuning of C 1 focus not only on the mean kinetic energy.Other flow variables and their variability will be considered in order to capture the overall effect of the addition of the stochastic term as part of the UKE budget.Different intensities of the data-driven stochastic term will be tested.We find that certain intensity ranges benefit the flow.However, exceeding these intensity intervals can lead to serious artifacts.

The FESOM2 Modeling Framework
This study utilizes the FESOM2 model (Danilov et al., 2017), which solves the primitive equations on a quasi-Bgrid.The surface mesh is triangular, and there are 40 vertical layers, with layer depth varying from 9 m in the top layer to 370 m in the bottom layer, dividing the domain into small triangular prisms.The two configurations employed here are vertically bounded by a flat bottom situated at a depth of 4,000 m.The bottom boundary conditions are taken as linear friction.
The viscous operator is a discrete biharmonic operator.The biharmonic viscosity coefficient is specified in terms of harmonic viscosity as , where c and c′ are the neighboring grid cells, l c′c is the length of the edge between the cells, and γ 0 , γ 1 , γ 2 are the tuning coefficients, the values of which are provided in Table A1.The coefficient γ 0 determines the amplitude of the background viscosity, γ 1 represents the non-dimensional scaling factor and γ 2 facilitates effective dissipation in situations with significant velocity differences.The missing dimensional factor is hidden in computations of the inner Laplacian, which contains cell area (for details, see Juricke et al., 2020aJuricke et al., , 2020b)).We use the Pacanowsky-Philander vertical mixing scheme (Pacanowski & Philander, 1981) for both setups.For a discussion of alternative mixing schemes, see Scholz et al. (2022).
For the Coriolis parameter, we use the β-plane approximation f = f 0 + βd, where d is the meridional distance from the zero-degree latitude.The constants here and throughout the text are chosen to agree with those originally proposed for the respective test cases discussed below, and are specified in Table A1.
We use an advection scheme with a combination of 3rd order upwind and 4th order central fluxes.Further study of other high-order schemes and their effect on the backscatter parameterization is planned (see the related discussion in Ménesguen et al., 2018;Soufflet et al., 2016).
All simulations were conducted under a "linear free surface" condition.In general, this implies a loss of exact conservation for tracers but has a negligible impact on the KE and UKE budgets.

Characteristics of the Channel Setup
The first of two test configurations is a zonally periodic channel following Soufflet et al. (2016) (Figures 1a and  1b).The size of the channel is 4.5°(about 500 km) in the zonal direction and 18°(about 2,000 km) in the meridional direction.
The initial density profile changes gradually along the meridional direction as well as vertically (Figure 2a).It is directly associated with the temperature gradient by a linear equation of state.The density gradient allows the model to form a jet in the middle of the channel.To continuously maintain a quasi-stationary turbulent regime, the zonally averaged velocity and temperature fields are relaxed to the initial mean temperature and velocity state in the entire domain.
The Rossby radius of deformation (approximately 20 km in the center and ±5 km from south to north) is governed by the predefined vertical stratification to which the model is relaxed.Thus, we choose a coarse grid consisting of equilateral triangles with 20 km edge length, which is eddy-permitting (Figure 1a), and a fine grid where the edge length is 10 km thus (barely) eddy-resolving (Figure 1b).We use Cartesian geometry for the channel setup (i.e., we replace the cosine of latitude by one).

Characteristics of the Double-Gyre Setup
The second setup follows Levy et al. (2010) and represents a double-gyre configuration, in the following referred to as the DG setup (Figures 1c-1f).It uses a rectangular domain with its left corner at 30°N, rotated by 45°.The size of the domain is 28.3°(about 3,140 km) on the long side and 21.2°(about 2,350 km) on the short side.Here, we use a mesh formed of right-angled triangles instead of equilateral triangles to avoid castellated boundaries.We use spherical geometry for the DG setup.The short sides of the right-angled triangles are equal to 20 and 10 km, corresponding to the coarse (Figures 1c and 1e) and high-resolution simulations (Figures 1d and 1f).Vertical walls bound the domain on all four sides.
The initial temperature profile follows Pacanowski and Philander (1981) and Levy et al. (2010).It is rapidly nonlinearly decreasing from the surface to a depth of 500 m and slowly linearly decreasing to 0°C below (Figure 2b).There is no initial meridional temperature gradient.The initial vertical temperature stratification adjusts during the simulation based on forcing and internal mixing, but due to the depth of the setup, this process takes several decades.Surface forcing is based on a mean northern hemisphere wind stress (Figure A1b) and heat flux.Wind forcing is an essential flow driver through Ekman pumping.A sinusoidal wind stress profile forces a subpolar gyre in the north and a subtropical gyre in the south, thereby imitating North Atlantic dynamics.The heat flux can be divided into several components, that is, latent, sensible, and radiative heat flux (Levy et al., 2010).As a simplification, we only use sensible and radiative heat fluxes here.Both enter the surface directly, while radiative heating is also distributed vertically over the first couple of layers according to a solar penetration profile.The exact sensible heat flux expression used in the simulation is γ(T ocean T atm ), where γ is a transfer coefficient, which is taken to be equal to 4 W m 2 K 1 , T ocean is the sea surface temperature and T atm is the apparent air temperature (Figure A1a).The radiative heat flux (Figure A1c) considers the losses due to cloudiness, reflection, and albedo.Latent heat flux due to evaporation is neglected, and so is any freshwater flux (i.e., salinity is constant throughout).The annual cycle was removed compared to a similar configuration of Levy et al. (2010).

Summary of Configurations
Figures 2c and 2d show the stratification of both setups.It is evident that the double-gyre setup has a more complex vertical stratification that changes with integration time until it reaches a (quasi-)equilibrium state, while for the channel, stratification is continuously relaxed back to the initial state, effectively constraining the timemean circulation while allowing eddy variability.The full effect of backscatter on the large-scale mean circulation can therefore only be analyzed in detail with the double-gyre setup.
Overall, we use the DG setup as an extension of the idealized zonally periodic channel setup as, in this case, regions where eddies are generated and where they are dissipated are not necessarily statistically colocated.It is also longer in the zonal direction, which allows multiple eddies to develop and evolve in space.One of our aims is to understand how the complexity of the setup affects the efficiency of the initial UKE equation (Equation 6) and the new UKE components of Equations 8 and 9 implemented in this study.
Table 1 provides a summary of simulation configurations.The 10 km simulation is the high-resolution reference for both configurations, the 20 km is the low-resolution reference without backscatter.The other simulations are also on the low-resolution 20 km grid and include backscatter with and without the advection and stochastic terms in the UKE equation.

Spin-Up
Both setups start with the prescribed temperature stratification and a small initial perturbation.The perturbation leads to the emergence of turbulence in a short time, as evidenced by the growth of kinetic energy over the first year (Figure 3) and by the presence of eddies in the vorticity field (not shown).The channel simulation reaches a statistically steady state after a little more than 1 year, maintained by the relaxation of the velocity and temperature fields.For our diagnostics, we thus take 9 years after a single spin-up year.In the DG setup, isopycnals become inclined because of Ekman pumping in the southern part of the domain and Ekman suction in the northern part of the domain as a consequence of the sinusoidal wind forcing.This process is much slower, so we require a 50-year spin-up to reach a quasi-equilibrium state.
Besides the difference in spin-up time, Figure 3 also indicates different levels of surface KE fluctuation between the two setups.The comparatively larger fluctuations in the channel versus double-gyre setup are explained by the fact that the channel is narrow in the zonal direction and, therefore, cannot host many eddies simultaneously.As a result, the resolved daily EKE changes greatly along the eddy life cycles.To mitigate the impact of fluctuations, we use 9-year averaging for both setups, that is, a simulation length of 9 years after the respective spin-up.

How Much Filtering Is Necessary?
In both deterministic and stochastic energy backscatter parameterizations, one has to decide about the scale of energy injection.Spatial smoothing applied to the injection ensures a scale separation between energy reinjection and energy dissipation (see discussion, e.g., in Grooms ( 2023)).Spatial filtering operators commonly involve only the nearest discrete cells for the reason of parallel implementation.Every cycle of spatial filtering applied to the operators increases the scales on which these operators act.Both over-smoothing and insufficient smoothing hamper performance of the backscatter term (e.g., Juricke et al., 2019).
Understanding scale separation is also essential when several parameterizations are applied simultaneously.Jansen et al. (2019) consider a generalized energy-based parameterization that combines the GM parameterization and backscatter approach proposed in Jansen et al. (2015).The GM parameterization dissipates APE at the grid scales and represents the effect of the conversion of APE into EKE.However, GM classically ignores the respective EKE input into the momentum equations.A significant result of Jansen et al. (2019) is the opportunity to freely tune the model between non-eddy-resolving and eddy-resolving regimes by coupling GM to the backscatter parameterization.
We aim to better understand the effects of scale separation between energy injection and dissipation, especially when additional advection of UKE is incorporated into the subgrid energy budget.The initial choice of smoothing filter cycles in this study follows Juricke et al. (2019), with two cycles assigned to the UKE source term Ėdis and the backscatter term Ėback in the UKE equation (Equations 6, 8, or 9), and four cycles for B(u, e) in Equation 1.This choice corresponds to (2,2,4) in Table 1.Further optimization may be possible.However, as discussed by Juricke et al. (2019), the choice of filtering options may also be influenced by the desired resolution of the simulations.Juricke et al. ( 2019) increased filtering because, at higher resolutions, a greater number of filtering cycles was required to achieve sufficient scale separation and maintain stability in backscatter simulations at 10 km compared to 20 km resolutions.Furthermore, the use of filters raises the question of whether smoothing grid-scale contributions could interfere with the impact of the newly implemented advection term in the UKE equation.Advection and smoothing both affect where and at which scales energy is reinjected.
In our study, we ran additional simulations where we reduced the number of filter cycles for the contribution of backscatter in the momentum equation to zero (i.e., for B(u, e) in Equation 1).We then analyze the effect of filtering via spectral diagnostics.This completes the study of Juricke et al. (2019) with additional spectral estimates based on newly developed Fourier techniques tailored for unstructured grids (Juricke et al., 2023).

Diagnostics
We examine a set of mean quantities calculated for each vertical layer z to diagnose the effect of our changes in the subgrid equation.As a main diagnostic, we use vertical profiles of the area-averaged layer-wise mean eddy kinetic energy where A i denotes the area of grid cell i, and the overbar denotes the time average of 9 years.We also examine the vertical profiles of the root mean square of vertical velocity anomalies where j denotes the vertex index and B j is the area of the median-dual cell associated with vertex j.As they show the amplitude of the time-averaged vertical velocity fluctuations for each vertical layer, they enable the detection of vertical fluctuation anomalies that may appear due to the wrong viscosity and backscatter settings.The different cell areas in Equation 11versus Equation 12 arise because in FESOM2, scalars and pressure are located on vertices, while horizontal velocities are located on centroids of triangles.The vertical velocities are computed at the scalar locations, too.Lastly, vertical profiles of buoyancy flux, which characterize the vertical profile of the release of APE, are computed as A too-high RMS vertical velocity or big changes in the profiles of APE to EKE conversion (Equation 13), as a rule, indicate an excitation of unphysical small-scale waves, which signal the need for tuning dissipation and/or backscatter.
To diagnose the impact of the backscatter schemes on APE to EKE conversion, we take a closer look at the pressure gradient work term 1 5, which is a source term for kinetic energy.Using incompressibility and hydrostatic balance (Equation 2) we obtain wb is a local measure of APE to EKE conversion.Crucially, the EKE equation contains a redistribution term which is associated with the divergence of pressure fluxes.The relative role of the pressure gradient power term in Equation 14 can be reduced by integration over a sufficiently large domain, and it will be zero if the integration is extended to the entire domain.A positive buoyancy flux releases APE and is linked to an increase in KE.
A similar expression holds for the eddy part of the pressure gradient power and buoyancy flux.In this study, we focus on the eddy part w′ b′ and take it as a local diagnostic for the transfer from APE to EKE even though, strictly speaking, it only holds in an (sufficiently large) area-integrated sense.
Another metric, to be utilized later, is the root mean square of sea surface height, defined as where i is an element of the surface layer and N is the total number of elements.
As an essential part of diagnostics, we compute the horizontal power spectra of the different contributions to the viscous and backscatter parameterizations.Spectra are computed based on the data of the surface layer.In order to use the discrete Fourier transform, we interpolate velocity and dissipation tendency fields to a regular quadrilateral grid.Then the 2D spectra are condensed to 1D spectra over the module of wave number.We apply cubic interpolation for zonal and meridional velocities to compute kinetic energy spectra and nearest-neighbor interpolation for zonal and meridional velocities and dissipation tendencies to compute the dissipation power following the results of Juricke et al. (2023).The motivation for different interpolation methods is the smooth nature of the kinetic energy field and the non-smooth, discrete representation of the dissipation and backscatter operators.
As the DG setup was simulated and analyzed assuming a spherical geometry, it was necessary to convert the grid and vector fields into Cartesian coordinates before performing interpolation.We first transformed the mesh and velocities to a new spherical system of coordinates such that the center of the domain is at the equator.After this transformation, we selected the central rectangular area of the domain (see the box in Figures 1e and 1f) for further interpolation and Fourier transform.Spectra are computed as an average of the daily output for 9 years with the smallest resolved wavelength corresponding to wavenumber π/h, where h is the height of an equilateral grid triangle (see discussion in Juricke et al., 2023).We denote as h c the height of the coarse grid triangle and as h f the height of the fine grid triangle in the channel.In the case of the DG setup, the minimum wavelength is 2h, that is, wavenumber π/h, where h is the smaller side.The limiting wavenumber depends on direction: it is π/h along small sides and

̅̅ ̅ 2
√ π/h in the direction along and perpendicular to the large side.Since we are willing to discuss spectra averaged over angles, we have to stop at π/h.As a final diagnostic, here specifically for the DG setup, we evaluate vertical density profiles.As mesoscale eddy parameterizations ultimately strive to reproduce a precise representation of the ocean stratification, we examine the alignment of the isopycnal contours with those of the reference simulation.

Eddy-Permitting Simulations Suffer From Reduced Kinetic Energy
To assess the effects of incorporating the new components into the subgrid energy budget, we first look at changes in eddy kinetic energy (Figures 4a and 4b) for the simulations that only have the viscosity parameterization.Comparing the simulation results for "20 km" (gray line) and "10 km" (black line), we observe that the lowresolution simulation has a significant EKE deficit for the DG setup, even more than in the channel.In numerical terms (Table 2), surface EKE of the low-resolution simulation corresponds to 77% for the channel and 33% for the DG setup of the high-resolution simulation.Regarding surface KE, the respective numbers are 82% and 44%.
Variability of the vertical velocity also differs greatly between the two resolutions (Figures 4c and 4d), but here with opposite tendencies between the two setups.For the DG setup, vertical fluctuations at low resolution are larger at the majority of the depth levels, while it is the opposite for the channel (consistent with Soufflet et al. (2016)), but also located at greater depth as compared to the high-resolution reference.The changes in w RMS may reflect the change in stratification in the DG setup associated with the resolution modification.Note.The top-performing results for each diagnostic are set in red.It should be noted that the high-resolution reference is not the final truth when it comes to, for example, KE, as at even higher resolution, we expect even higher KE levels.Consequently, achieving above 100% can be seen as beneficial.The vertical level at which the maximum of b′w′ occurs (max b′w′) varies between the channel and DG configurations.The vertically integrated buoyancy flux (vert.int.) is taken over the top 500 m for the double-gyre setup.

Journal of Advances in Modeling Earth Systems
10.1029/2023MS003972 BAGAEVA ET AL.
Buoyancy fluxes, which serve as an indicator of APE release, are substantially reduced at low resolution for both simulations (Figures 4e and 4f), especially the near-surface peak is much weaker.In the DG setup, moreover, a significant reduction of energy production is observed along the entire water column.

Dynamic Backscatter Enhances Kinetic Energy
We first switch on dynamic backscatter as in Juricke et al. (2019).This improves all diagnostics on the coarse grid toward the values on the fine grid (solid blue line in Figure 4).We note, in particular, that the point of maximum vertical velocity variability in the DG setup moves closer to the surface, as it should (Figure 4d).Moreover, the upper part of the buoyancy flux profile for the channel becomes more distinct with backscatter, hence agreeing with Soufflet et al. (2016) who observe a dominant peak (due to mesoscale instability) at 1,000 m depth and a secondary isolated peak (due to submesoscale instability) closer to the surface.For the DG setup, mesoscale production experiences the largest improvements (Figure 4f).

Impacts of Added Advection in the UKE Equation
When the advection term is included in the subgrid UKE equation, it improves the backscatter effect, bringing it even closer to the high-resolution truth for both setups (solid orange line on Figures 4a and 4b).For the channel setup, subgrid energy advection increases EKE beyond what is observed in the 10 km reference.This is not necessarily a negative result because we do not resolve the full eddying flow even at 10 km resolution (Soufflet et al., 2016).We expect that even the 10 km is underestimating EKE.
For both setups, the presence of advection in the subgrid equation correctly shifts the profile of RMS vertical velocity to the direction of the high-resolution truth, although the amplitude of the shift is small (Figures 4c and  4d).The profile of RMS vertical velocity is a convenient diagnostic of numerical instabilities in the deep ocean.Such instabilities may occur when background viscosity is too small (see Juricke et al., 2020aJuricke et al., , 2020b)).Here, we do not see any indication of the onset of numerical instability, with or without subgrid energy advection.In the DG setup, vertical velocity variability even decreases when advection is included, which indicates that subgrid energy advection does not induce spurious waves.At the same time, subgrid energy advection enhances the release of APE near the peaks (Figures 4e and 4f), thereby reducing biases in energy production and enhancing EKE.Based on the vertical profile diagnostics, we conclude that adding the advection term to the subgrid equation has a positive effect, with differing changes depending on the test case.This conclusion is supported qualitatively by a two-dimensional horizontal view of the production term (see Figure 5), which shows the buoyancy flux at the depth level where it reaches its maximum (differing between the two setups) and the dissipation power on the surface level for the different configurations.The reason for looking at different vertical levels is motivated by the different EKE and APE diagnostics peaks at different levels.Dissipation power involves the combined impact of both the viscous closure and the backscattering term in simulations incorporating backscatter.In addition, we examine the full terms, incorporating both the dissipative and flux components of the operators, recognizing that upon integrating over the domain, the flux term (i.e., the redistribution term) will disappear, leaving only the dissipative component of the viscous operator.Both diagnostic fields exhibit significant fluctuations.In order to better distinguish between areas of dissipation and anti-dissipation, we conservatively remapped the dissipation field to a coarse mesh with 100 km resolution.It removes the flux contributions that are largest at grid scales, arising from the triangular grid structure of FESOM2 (see the discussion in Juricke et al., 2019).Due to the added advection term in the UKE equation, the central jet's energy production areas are extended.They reach further into the jet domain, albeit the jet is in the wrong position compared to the high-resolution simulation (Figure 5c).Additionally, advection in the UKE equation prevents backscatter work in the boundary layer, as demonstrated in Figure 5g.With the addition of UKE advection, backscatter now focuses primarily on the eddy regions within the domain, resulting in a more physical process representation.
Furthermore, we conducted an assessment comparing the time scale of UKE advection with the scale of subgrid kinetic energy.We obtained (not shown) that the scales of subgrid advection are of similar order as the scales associated with the residence time of kinetic energy within the UKE budget.This assessment suggests that there is sufficient time for advection to play a significant role before energy is reinjected.

Spectral Diagnostics
Spectral diagnostics of EKE (Figures 6a and 6b) show the expected scalings (i.e., 5/3 and 3) in certain wavenumbers ranges.However, the simulated spectral slopes deviate relatively early (i.e., at low wavenumbers) from theoretical expectations in the 20 km simulation without backscatter.Backscatter significantly increases the energy level, especially at mid-range, without adding much energy to the small scales (which is generally not desirable for reasons of numerical stability).Including advection results in a minor positive change to KE across all scales.For the channel, a rather close agreement is reached between the simulations with backscatter and the high-resolution simulation.Adding UKE advection even leads to a slight overshoot.For the DG setup, the level of KE is still deficient at very small and large scales, even with added UKE advection.
In all DG setup simulations, one can observe a spectral density pile-up near the finest grid scale (2h c ).While this is generally seen as related to insufficient dissipation, the same effect can be observed in channel simulations when considering the finest grid scale and has been (at least partially) identified as an artifact of the interpolation from the triangular to a rectangular grid when computing Fourier spectra (Juricke et al., 2023).
Dissipation power spectra (Figures 6c and 6d) show the total dissipation (in the case of simulations with the purely viscous closure without backscatter) or the sum of total dissipation and backscatter (in the case of simulations with backscatter) across scales.One might expect that viscous dissipation is concentrated at small scales.However, if the resolution is insufficient, it affects all scales and peaks at scales where the energy content is maximal (also see the discussion in Soufflet et al., 2016).On the other hand, backscatter has a distinct injection maximum at large scales and a dissipation maximum at small scales.The points where the dissipation power spectrum crosses the kaxis mark the scales at which there is a change from energy dissipation to energy injection.When there are more smoothing cycles, the point of intersection moves toward larger scales.Conversely, reducing smoothing causes the intersection point to shift toward smaller scales.The 20 km simulation without backscatter is more dissipative than the 10 km, and the influence of dissipation is mostly at the long-wave part of the spectrum.This changes completely with backscatter: energy is injected on large scales, propagates in both directions of the energy cascade, and actively dissipates along the direct cascade on smaller scales.We observe that the added subgrid energy advection component enhances the backscatter effect on the large scales and dissipation near the grid scales.In summary, adding UKE advection enhances both total kinetic energy and total dissipation-which follows from the energizing effect of backscatter-across all scales.It does not, however, noticeably affect the scale at which the overall dissipation (small scales) changes to backscatter (large scales).

Sufficient Filtering Is Important
Insufficient backscatter smoothing causes significant deviations for all diagnostics.When disabling the filter in the backscatter operator, we observe a loss of energy for all simulations (dashed lines in Figures 4a-4d).For the channel, the performance is even worse than the 20 km simulation without backscatter (Figures 4a and 4b).
Concerning vertical velocity (Figures 4c and 4d), it either substantially enhances variability (DG setup) or reduces it (channel).We also observed significant unphysical fluctuations on small scales in the energy spectra (Figures 6a  and 6b).The further detrimental impact of insufficient smoothing is seen in snapshots of the vorticity fields: Eddies and filaments get a highly distorted "patchy" structure and do not propagate in a physically fully coherent way (not shown).
The simulations with insufficient backscatter filtering illustrate the minimal scales where non-smoothed backscatter injects energy into the system.In the case of the channel setup (Figure 6c), we observe the additional isolated peaks of energy injection (wave number 34) and energy dissipation (wave number 29).They coincide with the double peaks in KE (Figure 6a).The general nature of the twin peaks in kinetic energy spectra (consistent between the setups) can only be speculated at this point.They may be related to the formulation or filtering of the UKE components Ėdis and Ėback which ultimately determine the backscatter coefficient.Alternatively and most likely, they are a product of the procedure of collapsing spectra from 2D to 1D, where different directions in the grid may show up as two peaks rather than one.
We do not exclude the potential interference between the role of UKE advection and the degree of backscatter operator smoothing, as both affect the locality of the backscatter parameterization.However, we can clearly state that insufficient scale separation between dissipation and backscatter causes serious flow deviations and is an inadequate parameterization option for FESOM2.It should be noted that both a variable viscosity coefficient and smoothing have an impact on the transition wavenumber.Consequently, careful consideration is needed as to whether the transition wavenumber should actually be independent of the solution or if it should also be "tuned" for the resolved regime (i.e., backscattering into the inverse rather than the forward cascade).We have seen in our experiments (see also Juricke et al. (2019)) that the latter seems to be the case.Therefore, filtering additionally accounts for the fact that the transition wavenumber should be moved according to the flow, either by adjusting the viscosity/backscatter coefficients or by smoothing.At this point, a generalization regarding smoothing to other grid types or models can not be made.

EOF Analysis
The first EOFs of the kinetic energy of the high-resolution simulation correspond to the highest variability of KE and are determined by the fluctuations of the mean flow.The presence of a strong localized jet in the DG setup allows the first few EOFs to be relatively large-scale and to capture a large part of the variability in KE, whereas the removed mean flow variability of the channel setup makes the first EOFs already much smaller-scale.
Constructing the spatial correlation of the stochastic subgrid term based on EOFs with very fine local structures can excite undesirable noise.This needs to be kept in mind and treated with caution.To explain a reference percentage of variability (i.e., 50% in our case), one needs to consider 18 EOFs for the channel setup and only 7 EOFs for the DG setup (see Figure 7).The decreasing values of the fitted coefficients of the autoregression process for the corresponding PC accompany the decrease in the explainable capacity of the EOFs.In conclusion, as the EOF mode increases, we observe a shift toward EOFs featuring smaller-scale patterns and PCs with reduced lag coefficients.
As an alternative approach, we also calculated the kinetic energy difference between the coarse-grained data and the output of a coarse-resolution simulation instead of just the pure coarse-grained high-resolution kinetic energy.Through this second approach, we could take the systematic differences in kinetic energy between the outputs of simulations with different resolutions.Consequently, we generate meaningful EOF patterns of missing kinetic energy variability.In the following, however, we will focus on the initial approach, that is, the patterns generated directly from the kinetic energy data of the coarse-grained high-resolution data.The reason for this is that for the second approach, it was necessary to keep substantially more EOF modes (more than twice as many) to retain 50% of variability, leading to a much more small-scale structure of stochastic forcing patterns that could potentially cause model numerical instabilities and undesired excitation of grid-scale noise.Another reason is that eddy formation differs between high and coarse resolutions.For instance, a large eddy in high resolution does not always align with the eddy pattern observed in coarse resolution, as examined in the analysis of relative vorticity dynamics (not shown).Thus, we have not found enough reasons to alter our method of selecting data.Furthermore, we did not find sufficient evidence to justify the use of KE tendencies data as well.Increasing the number of EOFs on multiple occasions did not lead to an improvement in model performance.However, other options on how to create the EOF patterns are possible.

Impact of the Stochastic Subgrid Energy Source
Based on the magnitude of the other terms in the subgrid energy equation, we selected three options for the noise amplitude coefficient C 1 in Equation 9: C 1 = 0.001, which corresponds to "low"-intensity noise (simulation "20 km + SBS(low)"), C 1 = 0.005 corresponding to "middle"-intensity noise (simulation "20 km + SBS(middle)"), and C 1 = 0.01 (channel) or C 1 = 0.008 (DG setup) corresponding to the maximum amplitude that does not cause the model to become numerically unstable ("20 km + SBS(high)").
The first result of our simulations is that we can significantly enhance the model's kinetic energy levels via stochastic backscatter while preserving stability.We are inclined to view this form of stochastic forcing as, ultimately, an injection of energy.Given the positive contribution of the stochastic term due to the positively skewed distribution of UKE (as negative values are generally prevented by the shut down of backscatter in such instances), a positive input to UKE leads, to first order, to an increase in KE due to the backscatter.
We observe an energizing of the surface layers in the vertical energy profiles (Figures 8a and 8b) for all noise categories and, in particular, the energy increase beyond the reference simulation for the "strong" noise.We also find a good agreement in kinetic energy with the high-resolution reference simulation for the spectra at large scales (Figure 9), which indicates that we are able (at least partially) to reproduce the spectral slope using the added stochastic term in the UKE equation.On the other hand, the vertical energy profile shows unphysical energy growth in the lower layers of the model when using the "strong" stochastic term.It is possible that a more careful tuning of the amplitude as a function of z might mitigate this problem.However, this would be at the expense of introducing yet more tuning parameters so that we restrict ourselves to testing with the stated form of multiplicative noise with simple amplitude tuning.Diagnostics of vertical velocity anomalies (Figures 8c and 8d) reveal that, especially in the case of the channel setup, the high-amplitude stochastic term doesn't reflect the expected flow behavior at depth.Consequently, this amplitude is outside of an acceptable range.
Snapshots of relative vorticity for the DG setup (Figure 10) show that stochastic backscatter energizes the field with eddies, especially along the jet area.However, we observe increased eddy activity in the northern part of the DG domain that does not correspond to the high-resolution truth.This effect can be caused by insufficient EOF selection, poor fitting of the principal components, a locally overly large noise amplitude, or by the performance of the EOF approach itself.We nevertheless confirmed the presence of additional eddy dynamics along the jet (see Figures 10d and 10e) and an improvement of the kinetic energy spectra curve across the full range of scales (see Figure 9).Our concern about near-grid-scale noise caused by the stochastic component was not confirmed for the DG setup.
The results for the channel setup showed a worse performance of the EOF approach: we obtained small-scale growth of kinetic energy (Figure 9a), which could be explained as a spurious wave generation caused by the additional stochastic backscatter term.Thus, the robustness of the stochastic component, in particular, depends on the flow characteristics and noise amplitude of extracted EOF patterns, which should be sufficiently large-scale.This property was also validated when analyzing the simulations using data of KE difference between two resolutions rather than total high-resolution KE data.In this case, the model diagnostics showed a worsening in the relative vorticity fields and in energetics compared to the high-resolution KE-based EOFs (not shown).

Combined Effect of Subgrid Energy Advection and Stochastic Forcing
Our final set of simulations assesses the combined effect of stochastic and advection subgrid terms.Global diagnostics are summarized in Table 2.It shows generally favorable improvements when using some form of backscatter and, in particular, when using both new subgrid terms together.However, it is difficult to pick a clear winner.Consequently, we will also discuss sea surface height (SSH) differences as well as vertical density profiles.For conciseness, we limit the discussion to the DG setup and also restrict to the low (C 1 = 0.001) and middle-intensity (C 1 = 0.005) cases for the stochastic term.The large amplitude case has already been rejected as performing poorly (see Figure 8 and the discussion in Section 3.7).
The SSH diagnostics show the time-averaged SSH for the coarse-grid simulation without backscatter (Figure 11a), coarse-grid simulation with backscatter (Figure 11b), and the fine-resolution simulation (Figure 11c).The middle and bottom rows of Figure 11 show the time-averaged SSH difference between the coarse-grained high-resolution simulation and the different combinations of subgrid terms as indicated in the subplot headings.Two features deserve particular attention: first, we look at the flow separation from the wall near the left corner of the domain.This point of separation is moved north when the resolution is finer.The reason for this is the reduction of viscous dissipation in higher-resolution simulations (see further discussion in Sein et al. ( 2016)).For vertical walls, the sensitivity to the level of viscosity is higher than for sloped topography.Thus, the backscatter, which has a limited impact on the location of the mean flow and mainly affects the eddy part of the flow, can not completely fix the point of separation.However, we observe the magnitude of the mean SSH difference decreases with backscatter (dark red in the left corner in Figures 11e-11i vs. Figure 11d).This moves the point of jet separation a little further north.The presence of the UKE advection term (Figures 11f and 11i) decreases the difference to the high-resolution simulation along the jet area.At the same time, it slightly worsens the SSH difference in the south of the domain (Figure 11f).The stochastic term helps to improve the southern area SSH difference (Figures 11g-11i).
Overall, combining the original backscatter with the additional components reduces the domain-averaged RMSE SSH compared to simulations with only the original backscatter or no backscatter at all (Table 2, Figure 11i).
Second, the density profiles are compared on a North-South at 15°longitude (Figure 12).We observe a significant difference between coarse resolution without backscatter and any of the simulations with backscatter: without backscatter, one can see a nearly barotropic jet penetrating along the entire water column at around 30°N (Figure 12a).The lack of eddies together with the wind forcing lead to steep isopycnals.With backscatter, eddies can form, which improves the form of isopycnals in the upper layers toward the slopes seen in the reference simulation.In addition, the backscatter DG simulations after 9 years might still contain some drift in the stratification, although probably small (i.e., the figures might still change a bit if we let it run for longer).
Adding UKE advection (Figure 12d vs. Figure 12c, Figure 12h vs. Figure 12e, and Figure 12i vs. Figure 12f) results in the straightening of the slope of isopycnals and enhances the variability of the contours of the isopycnal surfaces, resembling more closely those of the reference simulation (Figure 12b).Furthermore, it is noted that UKE advection contributes to the smoothing of the density profile peak in the southern region of the domain (propagating downward from the surface into the deep ocean), which had been intensified in the simulation with the classical backscatter (Figure 12c).
Adding the stochastic term straightens isopycnals along the entire domain.The optimal results are obtained using the stochastic term of moderate amplitude within the range of noise amplitudes.The low-amplitude noise does not have a big impact, while the high-amplitude noise leads to excessive mixing near the surface (flattening the isopycnals on Figure 12g).When the UKE advection is switched on in addition to the stochastic components (Figures 12h and 12i), the isopycnals exhibit a smoother pattern, particularly in the southern corner.
Based on SSH diagnostics (Figure 11i) as well as EKE diagnostics (Table 2), the coarse-resolution setup that utilizes a combination of the middle-intensity stochastic term and advection component in the subgrid equation produces very good results.Furthermore, the new terms individually have the potential to improve certain flow features (Table 2) and rectify the flow behavior in different regions of the DG field (Figures 12d and 12f).However, compared to the reference high-resolution simulation, coarse-resolution simulations with backscatter still have weaker stratification (Figure 12b vs. Figures 12c-12i).

Discussion and Conclusion
In this work, we tested the performance of two additional contributions to the unresolved subgrid energy equation that is used in the framework of the kinetic energy backscatter of Juricke et al. (2019), which is based on earlier work by Jansen et al. (2015).These terms are additional subgrid energy advection and stochastic forcing and were employed in two test cases, a channel and a double-gyre simulation.
The idea behind advecting subgrid kinetic energy by the three-dimensional resolved flow is motivated by the fact that the locations of kinetic energy dissipation and forcing do not necessarily coincide.Our results show that, indeed, this additional contribution to the subgrid energy equation has modest but consistent positive effects: it corrects the behavior of isopycnals, decreases the difference of SSH to the high-resolution simulation in eddy-rich regions and improves the mean vertical profiles.The additional advection of subgrid energy prevents the excessive activity of backscatter along the western boundary of the double-gyre domain.Energetically, adding subgrid advection leads to enhanced energy creation and energy dissipation on all scales compared to the simulation without subgrid advection.The potential concern regarding a negative impact on model stability after incorporating subgrid energy advection, stated by Juricke et al. (2019), was not confirmed in our experiments.At the same time, the advection of subgrid energy adds only a 1.5% additional costs to simulation time.Moreover, no tuning is necessary as it is physically based.Our conclusion is, therefore, that subgrid kinetic energy should be treated with advection.
The second additional, stochastic contribution to the subgrid energy budget has been designed to enhance the simulated eddy variability by incorporating data on regions of enhanced eddy activity from a high-resolution simulation.Our proposed stochastic term can improve diagnostics in the flow's calm and active areas.Moreover, the spectral characteristics of the flow with stochastic forcing in the subgrid energy equation improve across a wide range of scales.However, we need to be cautious when using stochastic forcing: if its amplitude is too large, it can cause serious distortions and artifacts, even while a consequently improved energy spectrum may be close to expectations.Moreover, the acceptable level depends on the setup and is difficult to assess a priori.To some extent, it is possible to guard against such failures by looking for anomalies in the amplitude of vertical velocity fluctuations at depth or an excess of eddies in calm regions of the domain.But careful monitoring and tuning is critical.Furthermore, it will generally be necessary to recompute noise correlation patterns for different domains individually.The question of how to create relevant and cost-efficient patterns for stochastic noise, for example, in studies such as Storto and Andriopoulos (2021), Leutbecher et al. (2017), Subramanian and Palmer (2017), and Christensen ( 2020)) remains a key area of investigation and will continue to drive future research.
None of the parameterizations examined in this study result in a degradation of the vertical hydrography (Figure 12, all simulations with backscatter).This outcome is contrary to what would be anticipated if backscatter were to cause erroneously large vertical mixing.A careful analysis of diapycnal mixing and contributions of backscatter to such terms within different regimes and under different conditions will be carried out in future studies.
Stochastic forcing in the subgrid equation not only improves the flow characteristics, when done carefully, but also allows generating ensemble simulations.This enables the construction of distribution functions for output variables and measures the uncertainty of the backscatter performance, an important potential direction for further research.
Several other aspects, which are worth further investigation, relate to the design of the stochastic term.One potential alternative to the EOF method is the use of dynamical mode decomposition as a tool to understand the flow variability and reduce the dimensionality of the system (e.g., Franzke et al., 2022).Following the EOF approach, the selection of data for decomposition and the number of the EOF modes, which explains a sufficient amount of missing variability, remain at the modeler's discretion.
Machine learning methods could capture the missing variability as an alternative to classical stochastic methods.Deep learning methods driven by the data from an idealized simulation (Bolton & Zanna, 2019) and from realistic coupled climate models (Guillaumin & Zanna, 2021) were applied to ocean momentum forcing to represent the subgrid variability.The authors showed that convolutional neural networks can be constructed to satisfy the momentum conservation law and capture spatial and temporal eddy variability.
Finally, the necessary scale separation between the work of the backscatter and viscous operators is crucial and can be diagnosed by spectral methods.When there is not enough scale separation, the energy injection occurs in the dissipation scale range.This results in highly disturbed flow filaments and prevents eddies from propagating in a physically coherent manner.Ensuring adequate scale separation between injection and dissipation is crucial, as insufficient differentiation between these scales can lead to significant flow distortion, rendering it unsuitable as an eddy parameterization.However, the optimal configuration may vary based on factors such as the filter, model specifications, the physical regime, resolution, and other relevant aspects.
Potential research on parameterizing mesoscale eddies beyond the scope of dynamic energy backscatter could be related to the position of large oceanic structures (for instance, the jet in the case of the double-gyre setup) in coarse resolution simulations.Dynamic backscatter, in any of its variations considered here, so far did not yield fundamental improvements, for example, of the point of jet separation in the double-gyre test case.This is mostly likely due to the variety of processes interacting in such highly dynamic regions, which cannot all be improved by backscatter alone.However, improvements to the mean flow by our default dynamic backscatter and an alternative form of dynamic backscatter have also been observed by Juricke et al. (2020b) and Chang et al. (2023), respectively.Nevertheless, new or extended approaches in this regard remain a focus of further research.

Appendix A
In the following we provide supplementary details concerning the setup configurations.

Figure 1 .
Figure 1.Channel (a, b) and double-gyre (c-f) setups.Annual-mean EKE [m 2 /s 2 ] (after spin-up) for the coarse grid simulation (a, c) and for the fine grid simulation (b, d), which was determined by the formula: u 2 +v 2 u 2 v 2 2 .A snapshot of the relative vorticity for the coarse grid simulation (e) and the fine grid simulation (f), showing the designated area for Fourier decomposition (black box).Aerial view of the surface layer.

Figure 2 .
Figure 2. Vertical temperature and density profiles.Panel (a) shows the initial vertical temperature stratification in the channel (domain averaged), while panel (b) displays both the initial (dashed line) and equilibrium (solid line) vertical temperature stratification in the double-gyre setup (domain averaged).Panel (c) shows the annual mean of the vertical density profile along 2.5°longitude in the channel, and panel (d) shows the annual mean of the vertical density profile along 15°longitude in the double-gyre setup after spin-up.

Figure 3 .
Figure 3.The variability of surface kinetic energy over time (the calculation was performed for the total integrated field, considering area weights).The blue line represents a 10-year simulation of the channel setup.The highlighted solid blue box indicates the 9 years chosen for analysis, excluding the first spin-up year.After a 50-year spin-up, the orange line corresponds to the double-gyre setup, with the 9 years chosen for analysis indicated by a solid orange box.The gray line indicates the amplitude of the initial drift of the double-gyre setup.

Figure 4 .
Figure 4. Vertical profiles for the channel setup (left column) and the double-gyre setup (right column).Each setup includes layer and time-averaged (9 years) diagnostics for EKE [m 2 /s 2 ] (a, b), the RMS vertical velocity anomalies [m/s] (c, d), and buoyancy flux [m 2 /s 3 ] (e, f).Figures a, b, and f have a gap on the vertical axis to better highlight changes closer to the surface.

Figure 5 .
Figure 5.The 9-year average of 2D buoyancy flux [m 2 /s 3 ] (a-d) and the 9-year average of the dissipation power [m 2 /s 3 ] (e-h) computed as the dot product of the velocity field and its dissipation tendency.The dissipation field is also coarse-grained to the 100 km grid.Plots are provided for the following DG setup configurations: coarse resolution simulation (a, e), coarse resolution with deterministic backscatter (b, f), coarse resolution with deterministic backscatter and subgrid energy advection (c, g), and fine resolution simulation (d, h).

Figure 6 .
Figure 6.Kinetic energy and dissipation spectra for the channel and DG setups averaged over 9 years.The vertical lines show the largest wavenumbers (smallest wavelength) on coarse and fine meshes.The second peak in the kinetic energy spectra plot corresponds to wavenumber 29.The wavenumber k is displayed in units of 2π/domain size.

Figure 7 .
Figure 7.The explained variance (red line), the cumulative variance (gray bars), and the fitted coefficient (μ i ) (blue) for the AR(1) process are listed for each EOF mode.

Figure 8 .
Figure 8. Vertical profiles for the channel setup (left column) and the double-gyre setup (right column) after incorporating a stochastic UKE term with varying amplitude.Each setup includes layer and time-averaged (9 years) diagnostics for EKE [m 2 /s 2 ] (a, b), the RMS vertical velocity anomalies [m/s] (c, d), and buoyancy flux [m 2 /s 3 ] (e, f).Figures a, b, and f have a gap on the vertical axis to better highlight changes closer to the surface.

Figure 9 .
Figure 9. Kinetic energy spectra for different amplitudes of the stochastic UKE term.A dashed red line represents the spectra of the low-amplitude stochastic term on the subgrid equation and is almost identical to the spectra of the deterministic backscatter simulation data.The dashed-dotted and solid red lines represent data simulated with middle and high-amplitude stochastic terms, respectively.

Figure 10 .
Figure 10.Snapshots of relative vorticity for coarse resolution without backscatter (a), coarse resolution with deterministic backscatter (b), fine resolution (c), and coarse resolution with varying stochastic backscatter amplitudes (d-f).The black boxes show the designated area for Fourier decomposition.

Figure 11 .
Figure 11.Over 9 years, the sea surface height [m] was averaged and compared between three simulations: coarse-resolution simulation (a), coarse-resolution simulation with deterministic backscatter (b), and fine resolution (c).The difference in SSH between the high-resolution coarse-grained simulation and the various coarseresolution simulations (d-i) was also analyzed.Green and red circles indicate specific regions of improvements and impairments compared to the low resolution.Plot (i) corresponds to the best simulation incorporating middle amplitude stochastic term and UKE advection.

Figure 12 .
Figure 12. 9-year averaged vertical density profiles along 15°longitude for the double-gyre setup.The green circles indicate where specific improvements were made toward the high-resolution reference simulation, while the red circles indicate areas with impairments.The figures have a gap on the vertical axis to better highlight changes closer to the surface.

Figure A1 .
Figure A1.The analytical forcing functions are based on latitude in the double-gyre setup.These functions include air surface layer temperature (a), wind stress (b), and solar radiation (c).

Table 1
An Overview of the Essential Parameters for the Simulation Setups Δx is the side of equilateral triangle or the smallest side of rectangular triangles in the channel and DG configurations, respectively.

Table 2
Summary of Global Diagnostics, Averaged Over a 9-Year Period, Comparing the Various Coarse-Resolution Model Configurations Relative to the High-Resolution Reference, Which Is Normalized to 100%, Except for RMSE SSH, Where High-Resolution Simulation Corresponds to 0