Ocean models require subgrid-scale parametrizations of vertical mixing expressed in terms of a quantity that is easily diagnosable from model output, such as the Richardson number. To date parametrizing mixing for low (<1) Richardson number flows, such as the Equatorial Undercurrent, has received the most attention. Here a new Richardson number parametrization is proposed that provides estimates of vertical turbulent diffusivity in the high Richardson number stratified shear flow that is associated with mesoscale ocean features such as eddies and fronts. This parametrization is based on direct observations of vertical turbulent diffusivity from three separate ocean regions in the North Atlantic and Southern Ocean and is found to be robust for values of the Richardson number greater than 1 at depths below the ocean surface boundary layer. The new parametrization gives substantially improved agreement with the observed mixing in the presence of mesoscale ocean features compared to existing Richardson number parametrizations.
 Vertical turbulent mixing processes in the ocean occur on a wide range of temporal and spatial scales. Ocean general circulation models, which typically use longer time and larger space scales than those of turbulent mixing processes, are capable of explicitly resolving only a subset of these processes and hence require suitable subgrid-scale parametrizations of vertical mixing. Such subgrid-scale parametrizations of vertical mixing are commonly expressed in terms of a quantity that is easily diagnosable from model output, such as the Richardson number [Pacanowski and Philander, 1981; Large et al., 1994; Yu and Schopf, 1997; Jackson et al., 2008; Zaron and Moum, 2009].
 The gradient Richardson number (Ri), often used to describe the stability of stratified shear flow [Monin and Yaglom, 1971], is defined as the ratio of buoyancy frequency (N) squared to vertical shear (Sh) squared
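 $$Ri = \frac{N^2}{Sh^2} = \frac{-\dfrac{g}{\rho}\dfrac{\partial \rho}{\partial z}}{\left(\dfrac{\partial u}{\partial z}\right)^2 + \left(\dfrac{\partial v}{\partial z}\right)^2} \tag{1}$$
 [Gill, 1982], where g is acceleration due to gravity, ρ is potential density, z is the vertical coordinate (taken positive upward), and u, v are components of horizontal velocity. Hereafter Richardson number (Ri) refers to the gradient Richardson number as defined above (equation (1)).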
 Richardson number based mixing parametrizations have typically been developed for application in global-scale models with large-scale (~1°) horizontal resolution. The aim of such parametrizations has been to improve model representation of large-scale ocean features that are significant to climate often with particular reference to the equatorial ocean [Pacanowski and Philander, 1981; Large et al., 1994; Jackson et al., 2008; Zaron and Moum, 2009].
 Due to the absence of Coriolis effects, mixing around the Equatorial Undercurrent is observed to be largely dependent on Richardson number and up to an order of magnitude higher than in regions away from the Equator [Peters et al., 1988; Zaron and Moum, 2009]. Direct observations of the Equatorial Undercurrent report a range of Richardson numbers between 0.1 and ~14, with a large proportion of the observations being for Richardson numbers less than 1 [Peters et al., 1988; Zaron and Moum, 2009]. Vertical turbulent mixing has been observed to be greatly enhanced in regions of low (<1) Richardson number both in the laboratory [Turner, 1973; Thorpe, 2005] and in the ocean [Toole and Schmitt, 1987; Peters et al., 1988].
 Large-scale models of the equatorial ocean have been found to be insensitive to the exact form of Richardson number parametrization of vertical mixing provided that high mixing is associated with low Richardson number and vice versa [Yu and Schopf, 1997]. However, recently the application of existing Richardson number based mixing parametrizations to other oceanic flows which generate vertical mixing in response to low Richardson number, for example gravity driven overflow currents, has proved to be less than successful [Chang et al., 2005; Xu et al., 2006; Jackson et al., 2008]. Problems with the wider application of existing Richardson number based mixing parametrizations have led to the development of new Richardson number based parametrizations which either include additional flow-related terms such as the turbulent kinetic energy [Jackson et al., 2008] or relate Richardson number to a nondimensional expression of vertical mixing [Zaron and Moum, 2009].
 The flow associated with mesoscale ocean features (horizontal scale of ~10 to 100 km), away from the equator, does not typically generate the low Richardson number associated with the enhanced mixing observed for the Equatorial Undercurrent. Previous studies of strong mesoscale features at higher latitudes report Richardson numbers in the range 3 to 40 for the Gulf Stream [Pelegri and Csanady, 1994] and 2 to 20 for the Florida current [Winkel et al., 2002]. Nevertheless, the shear from mesoscale flow may interact with shear generated by the internal wave field to produce elevated mixing where the flow has Richardson number <20 [Polzin et al., 1996]. The geostrophically balanced flow of the upwelling-driven coastal jet off Oregon has Richardson numbers >1 [Avicola et al., 2007]. However, interactions between the jet and internal gravity waves can result intermittently in reduced Richardson numbers (<1) and enhanced mixing [Avicola et al., 2007].
 To date, despite the increasing use of mesoscale resolving ocean models, little attention has been paid to parametrizing the vertical turbulent mixing that mesoscale flow may generate.
1.1 Parametrizing Vertical Mixing
 The magnitude of vertical turbulent mixing is often approximated using a turbulent diffusion coefficient (K). The turbulent diffusion coefficient of a quantity (such as momentum or tracer concentration) resulting from shear flow in stable stratification (i.e., buoyancy frequency > 0) has been traditionally [Munk and Anderson, 1948] related to the Richardson number through an equation of the form
 $$K_s = K_{os}\,(1 + \alpha_s Ri)^{-n_s} \tag{2}$$
where Ks is the turbulent diffusion coefficient for the quantity s, Kos is the turbulent diffusion under neutral stability, and αs, ns are constants [Munk and Anderson, 1948; Monin and Yaglom, 1971; Peters et al., 1988]. As Ri reduces towards a critical value (Ricrit), mixing has been observed to increase in excess of what would be predicted by equation (2) [Peters et al., 1988; Lozovatsky et al., 2006; Soloviev et al., 2001]. Hence, equation (2) is generally only considered to be appropriate for high Ri (Ri > Ricrit) [Peters et al., 1988; Lozovatsky et al., 2006]. To obtain a parametrization which can represent turbulent diffusion coefficients for the full range of Richardson number (0 < Ri < ∞), the estimate of diffusivity for any given Ri is often considered to be the sum of the estimate of diffusivity from a relationship for low Ri (Ri < Ricrit) and the estimate of diffusivity from a relationship for high Ri, i.e., K(Ri) = K(Ri)low + K(Ri)high [Peters et al., 1988; Large et al., 1994; Soloviev et al., 2001]. Several forms for the low Ri relationship and values for Ricrit have been proposed [Peters et al., 1988; Large et al., 1994; Soloviev et al., 2001; Lozovatsky et al., 2006].
 Mixing in the ocean has been identified to potentially arise from several sources, of which stratified shear flow is only one [Pacanowski and Philander, 1981; Peters et al., 1988; Large et al., 1994]. Consequently, when calibrating a Richardson number based parametrization of turbulent diffusion coefficients against direct observations [Peters et al., 1988] or when using such a parametrization in an ocean model [Pacanowski and Philander, 1981; Large et al., 1994], the observed (or modeled) diffusivity is represented as a sum of diffusion terms. Most commonly, an expression representing a background diffusivity is added to the Richardson number parametrization [Pacanowski and Philander, 1981; Peters et al., 1988], although explicit expressions for non-shear-driven mixing processes such as convective overturning and double diffusion could be included [Large et al., 1994]. Hence, without incorporating explicit expressions for convection and double diffusion, the full expression for turbulent diffusivity is of the form
 $$K_s = K(Ri) + K_{bs}$$
where Kbs is a background diffusivity which is often considered to be a constant [Pacanowski and Philander, 1981; Peters et al., 1988; Large et al., 1994]. The background diffusivity is considered to arise from unresolved small-scale shear processes [Large et al., 1994] such as the finescale-shear instability arising from wave-wave interactions of the internal wave field [Polzin et al., 1997].
 The turbulent diffusion coefficients for tracers are usually considered to be equal but different from the diffusion coefficient for momentum [Munk and Anderson, 1948; Peters et al., 1988]. Typically, the turbulent diffusion coefficient for momentum is referred to as the turbulent viscosity (Kv), while the turbulent diffusion coefficient for tracers is referred to as the turbulent diffusivity (Kt). As formulated in equation (2), turbulent diffusivity and turbulent viscosity are independent. However, empirical relationships have been proposed relating the constants αs and ns for turbulent diffusivity and turbulent viscosity [Munk and Anderson, 1948], and theoretical arguments have been used to determine the ratio of turbulent diffusivity to turbulent viscosity [Monin and Yaglom, 1971]. Lacking direct observations of Kv and Kt to estimate the parameters in equation (2), previous studies have proposed various values for the constants Kos, αs, and ns (Table 1). Only one set of extant parameters has been estimated from calibration of equation (2) to direct observation [Peters et al., 1988], although the authors do not state which of the parameters were fixed a priori and which were fitted to observation. The parametrization of Pacanowski and Philander [1981] is a variation on the form of equation (2) where turbulent diffusivity and turbulent viscosity are related by equating turbulent diffusivity under neutral stability to turbulent viscosity. The parametrization of Pacanowski and Philander [1981] has been re-cast into the form of equation (2) by Yu and Schopf [1997] using parameters given in Table 1.
Table 1. Constants Used in the Turbulent Mixing/Richardson Number Parametrizations of the Form of Equation (2) From the Literature With the Respective Constants Used for the Background (Kbs) Turbulent Viscosity and Diffusivity
 Expressions such as equation (2), commonly used in all major ocean general circulation models (see Griffies et al. for a review), are formulated in terms of dimensional constants such as Kos, which potentially limits their universal application [Chang et al., 2005; Jackson et al., 2008; Zaron and Moum, 2009]. More recently, a new approach to parametrizing mixing in the Equatorial Undercurrent has been proposed where a nondimensionalized mixing coefficient is calculated as a function of Richardson number [Zaron and Moum, 2009]. The use of a nondimensionalizing factor |V|^2/Sh, where V is the horizontal velocity vector at the depth at which Ri is calculated, is considered to provide the necessary dimensional information to allow the prediction of dimensional mixing coefficients from a nondimensional Richardson number [Chang et al., 2005; Zaron and Moum, 2009].
 The new Richardson number parametrization of vertical turbulent mixing presented in this paper is specifically designed to improve the representation of vertical turbulent mixing for flows with high (Ri > Ricrit) Richardson number. The aim of presenting this parametrization is to provide a parametrization which can readily be used in conjunction with existing parametrizations to improve the representation of vertical mixing in mesoscale resolving ocean models. To this end, the parametrization presented here is based on observations made in the presence of mesoscale flow from three separate non-equatorial ocean regions and is formulated to be compatible with other commonly implemented Richardson number parametrizations.
 For compatibility with other commonly implemented Richardson number parametrizations [Pacanowski and Philander, 1981; Large et al., 1994], the parametrization presented here is formulated following the form of equation (2). However, as direct observations of vertical mixing in the ocean will potentially contain mixing from other sources, equation (2) is combined with a background mixing term (Kbs) which represents the contribution to vertical mixing from other unresolved processes. Following previous studies, for simplicity Kbs is considered to be a constant [Pacanowski and Philander, 1981; Peters et al., 1988; Large et al., 1994]. Hence,
 $$K_s = K_{os}\,(1 + \alpha_s Ri)^{-n_s} + K_{bs} \tag{3}$$
where the high label is dropped for simplicity. Equation (3) is assumed to be appropriate for all high values of Ri. Turbulent diffusion under neutral stability is typically considered to be a constant for flows in the ocean interior below the surface mixed layer [Peters et al., 1988; Pelegri and Csanady, 1994; Yu and Schopf, 1997]. Therefore, as Kos is defined as constant, this relationship is only appropriate for use below the ocean upper mixed layer. Where the region of neutral stability is bounded by a surface, for example in the upper mixed layer which is bounded by the atmosphere, Kos is often considered to be a function of distance from the bounding surface and surface stress [Munk and Anderson, 1948; Robinson, 1966; Monin and Yaglom, 1971; Soloviev et al., 2001].
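 As an illustration of how a high-Ri relationship with a constant background term might be evaluated, the following minimal sketch assumes the Munk and Anderson type form K = Kos (1 + αs Ri)^(−ns) + Kbs. The function name, argument names, and the numerical values in the usage lines are hypothetical and are not taken from Table 1.

```python
def diffusivity_from_ri(ri, k_neutral, alpha, n, k_background):
    """Evaluate K = K_neutral * (1 + alpha * Ri)**(-n) + K_background.

    Assumed form only; all arguments are in SI units (m^2 s^-1 for the
    diffusivities, Ri dimensionless). Valid for stable stratification.
    """
    if ri < 0:
        raise ValueError("the assumed form applies to stable flow (Ri >= 0)")
    return k_neutral * (1.0 + alpha * ri) ** (-n) + k_background


# Hypothetical constants, chosen only to show the limiting behaviour.
k_at_neutral = diffusivity_from_ri(0.0, 1e-3, 5.0, 2.0, 1e-5)   # Kos + Kbs
k_at_high_ri = diffusivity_from_ri(1e9, 1e-3, 5.0, 2.0, 1e-5)   # tends to Kbs
```

 In this form the diffusivity decreases monotonically from Kos + Kbs at Ri = 0 towards Kbs as Ri becomes large, which is why a constant Kbs sets the floor of the parametrized mixing.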
2.1 Data Set Description
 Three sets of turbulence measurements were used in the calibration of equation (3), two from the North Atlantic and one from the Southern Ocean (Figure 1). Each station in each data set consists of between 5 and 19 profiles, to a maximum depth of 150 to 300 m, taken over the course of approximately 1 h. Representative and mean profiles of turbulent kinetic energy dissipation, vertical shear, and buoyancy frequency are shown in Figure 2.
2.1.1 Porcupine Abyssal Plain (PAP) Site Data Set
 Measurements were taken as part of UK RRS Discovery cruise D306 to the Porcupine Abyssal Plain in June–July 2006 [Burkill, 2006]. During the period of the measurements, a cyclonic eddy was present within the survey region [Painter et al., 2010]. Turbulent mixing was measured using a microstructure shear profiler (as described in section 2.2) at 15 stations taken as part of an 11 day time series on the site of the long-term PAP observatory (12 stations) and associated mesoscale survey of the area (3 stations). While each station was in progress, horizontal current velocities down to ~300 m were measured using a ship-mounted 150 kHz RDI Acoustic Doppler Current Profiler (ADCP). The instrument was configured to sample over 120 s intervals with 96 depth intervals of 4 m thickness starting at 14 m depth using pulse length 4 m and blank beyond transmit of 4 m. Calibration of the ADCP was carried out over the continental shelf en route to the survey site [Burkill, 2006; Painter et al., 2010].
2.1.2 Iceland Basin Data Set
 Measurements were taken as part of UK RRS Discovery cruise D321 to the Iceland Basin in July to August 2007 [Allen, 2007]. On arrival at the survey site, it was found that within the survey area there was an eddy dipole, consisting of a cyclonic eddy and an anticyclonically rotating mode-water eddy [Forryan et al., 2012]. During the 3 week survey, turbulent mixing was measured using a microstructure shear profiler at 15 stations in various locations in and around the eddy dipole structure. While each station was in progress, horizontal current velocities down to approximately 300 m were measured using a ship-mounted 150 kHz RDI ADCP as described for cruise D306 above [Allen, 2007; Forryan et al., 2012].
2.1.3 Southern Ocean Data Set
 Measurements were taken as part of UK RRS James Cook cruise JC29 to the Kerguelen Plateau, in November to December 2008 [Naveira Garabato, 2008]. Turbulent mixing was measured using a microstructure shear profiler at nine stations off the northern edge of the Kerguelen Plateau. While each station was in progress, horizontal current velocities down to approximately 300 m were measured using a ship-mounted 150 kHz RDI Ocean Surveyor ADCP. The instrument was configured to sample every 2 s, averaged into 120 s intervals on processing, with 60 bins of 8 m thickness and a blanking distance at the surface of 6 m. Calibration was carried out over the continental shelf en route to the survey site [Naveira Garabato, 2008].
2.2 Calculation of Turbulent Diffusivity and Turbulent Viscosity: Turbulent Kinetic Energy Dissipation
 Turbulent kinetic energy dissipation can be estimated directly from measurements of microstructure velocity shear made using a microstructure shear profiler. The microstructure profiler used here for all turbulence measurements was an MSS90L free-fall microstructure profiler (serial number 35) produced by Sea and Sun Technology GmbH and ISW Wassermesstechnik. The profiler is cylindrical in shape with two velocity microstructure shear probes as well as standard high precision conductivity-temperature-depth (CTD) sensors mounted at the descending end protected by a guard ring. The two shear probes are on slim shafts, ~150 mm in front of the CTD sensors, measuring the velocity fluctuations in the “clean,” undisturbed water in advance of the other sensors. A vibration control sensor and a two-component tilt sensor provide data to remove noise contamination from the signal. The profiler has buoyant foam rings at the opposite end from the sensor array where a light tether is attached for data and power transmission. On deployment, the profiler is allowed to free-fall vertically through the water, the sensor array downwards, by maintaining sufficient slack in the tether. This also isolates the profiler from the motions of the ship and minimizes contamination of the signal by vibrations caused through cable tension (pseudo-shear). Data from the sensors are recorded continuously while the profiler is falling by a PC, connected via the tether, using software provided by Sea and Sun Technology GmbH [Prandke, 2008]. The calibration of the CTD sensors was carried out by Sea and Sun Technology GmbH using standard calibration equipment and procedures for CTD probes. The vibration control sensor, the tilt sensors, and the shear sensors were calibrated by ISW Wassermesstechnik.
 Vertical microstructure shear was calculated from the measurements taken using the microstructure profiler following the method of Stips. Assuming isotropic turbulence, the rate of turbulent kinetic energy dissipation can be calculated from the variance of the vertical microstructure shear
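 $$\varepsilon = 7.5\,\nu\,\overline{\left(\frac{\partial u'}{\partial z}\right)^2} \tag{4}$$
where ε is the rate of turbulent kinetic energy dissipation, ν is the molecular viscosity of water, u′ are the turbulent velocity fluctuations, and the overbar indicates a spatial mean value [Lueck et al., 2002]. The assumption of isotropy in equation (4) can be justified if the critical ratio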
 $$I = \frac{\varepsilon}{\nu N^2}$$
is greater than 20 [Yamazaki and Osborn, 1990]. For the data presented here, taking the molecular viscosity for seawater to be 1.2 × 10−6 m2 s−1, I is greater than 20 for 91% of the measurements made below the surface boundary layer.
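 The isotropy check can be sketched as follows, assuming the critical ratio takes the usual form I = ε/(νN²) and that dissipation and buoyancy frequency squared are available as paired lists. The function names are illustrative, and the sample values in the usage lines are invented.

```python
NU_SEAWATER = 1.2e-6  # molecular viscosity in m^2 s^-1, value quoted in the text


def isotropy_ratio(epsilon, n2, nu=NU_SEAWATER):
    """Return I = epsilon / (nu * N^2); assumed form of the critical ratio."""
    return epsilon / (nu * n2)


def fraction_isotropic(epsilons, n2s, threshold=20.0, nu=NU_SEAWATER):
    """Fraction of paired (epsilon, N^2) samples for which I exceeds threshold."""
    ratios = [isotropy_ratio(e, n, nu) for e, n in zip(epsilons, n2s)]
    return sum(r > threshold for r in ratios) / len(ratios)


# Invented samples: one clearly isotropic, one below the threshold of 20.
share = fraction_isotropic([1.2e-8, 1.2e-10], [1e-5, 1e-5])
```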
 The variance of the vertical microstructure shear was determined by integration of the vertical microstructure shear power spectrum (Φ(k)), where k is the wave number, estimated using the Welch modified periodogram method [Welch, 1967] from the vertical microstructure shear fluctuations. Hence, equation (4) can be represented as
 $$\varepsilon = 7.5\,\nu \int \Phi(k)\,dk$$
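 A minimal numerical sketch of the spectral form of the dissipation estimate is given below, assuming ε = 7.5 ν ∫ Φ(k) dk evaluated by trapezoidal integration over a chosen wave number band. It integrates whatever spectrum is supplied; the fitting of a universal spectrum and the probe response correction are not attempted. The function name and the default band limits are illustrative assumptions.

```python
def dissipation_from_shear_spectrum(wavenumbers, spectrum, nu=1.2e-6,
                                    k_min=2.0, k_max=30.0):
    """Estimate dissipation (W kg^-1) as 7.5 * nu * integral of the spectrum.

    wavenumbers: increasing wave numbers in cpm
    spectrum:    shear power spectral density in s^-2 cpm^-1
    Only intervals lying wholly inside [k_min, k_max] contribute.
    """
    variance = 0.0
    for i in range(len(wavenumbers) - 1):
        k_lo, k_hi = wavenumbers[i], wavenumbers[i + 1]
        if k_lo >= k_min and k_hi <= k_max:
            # Trapezoidal rule on one wave number interval.
            variance += 0.5 * (spectrum[i] + spectrum[i + 1]) * (k_hi - k_lo)
    return 7.5 * nu * variance


# Invented flat spectrum between 2 and 30 cpm, for illustration only.
k = [2.0 + i for i in range(29)]
phi = [1e-3] * len(k)
eps = dissipation_from_shear_spectrum(k, phi)
```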
 The measured vertical microstructure shear power spectrum was used to scale and dimensionalize a nondimensional analytical form of the empirical Nasmyth universal turbulence spectrum (Φnas(kn))
where kn is the nondimensional wave number [Roget et al., 2006].
 The universal spectrum was scaled by curve fitting, using a least squares fit, to the measured shear power spectra for each 1 s segment of recorded data (~1024 data points representing 0.5 m depth with a configured probe drop speed of 0.5 m s−1) between the limits 2–30 cpm (cycles per meter) for dissipations above 1 × 10−8 W kg−1 and 2–15 cpm for dissipations below 1 × 10−8 W kg−1. The lower limit wave number of 2 cpm, the smallest wave number resolvable within a depth interval of 0.5 m, eliminates low-frequency noise from the probe oscillating during descent [Prandke, 2007]. The upper integration limits were selected heuristically from observation of shear power spectra. The maximum limit of 30 cpm was selected to be below the resonant frequency of the shear probe guard ring, which is visible in the shear power spectrum as a peak between wave numbers 50 and 100 cpm. For dissipations below 1 × 10−8 W kg−1, the power spectra are only distinct from instrument noise up to a maximum wave number of ~15 cpm. Within the integration limits, observed shear power spectra compare well with the universal Nasmyth spectrum (Figure 3). From consideration of the instrument shear power spectra, the noise threshold of the instrument is estimated to be of order 10−10 W kg−1, which is below the lowest dissipation recorded here.
 The rate of kinetic energy dissipation was calculated by integration of the fitted universal spectrum between 2 cpm and the Kolmogorov wave number (kc). The Kolmogorov wave number, the reciprocal of the Kolmogorov microscale, is given by
and represents the smallest scale of turbulent motions unaffected by the dissipative effects of molecular viscosity.
 A correction for the attenuation of the shear probe response as the wavelength of the velocity fluctuations decreases was applied to the dissipation estimate using an empirical polynomial function derived for the shear probe [Prandke, 2007, 2008]. The kinetic energy dissipation rates calculated for each of the two independent shear sensors were combined to provide a single estimate of dissipation for each cast following the method described by Prandke.
 Errors in calculating estimates of the dissipation rate arise from a number of sources. Calibration of the shear sensors is to within ±5%, and the influence of non-isotropic turbulence is estimated to add up to a maximum of 35% error to calculations [Yamazaki and Osborn, 1990]. In addition, uncertainty in the flow speed past the shear probe, estimated to be ~±5%, adds a further ~20% error to the calculation [Oakey, 1982; Moum et al., 1995], as the calculated dissipation depends on the variance of flow shear squared. Lesser errors (<10%; Dewey and Crawford) arise from drift in shear probe calibration and uncertainties in the estimates of viscosity. Combining all the estimates of error gives a generally accepted estimate of ±50% error in the calculation of turbulent dissipation [Oakey, 1982; Moum et al., 1995; Rippeth et al., 2003].
2.3 Averaging Turbulent Quantities
 To obtain robust estimates of turbulent quantities, the measurements from all casts for each station were combined. This is necessary because the mixing processes involved are intermittent in both time and space (Figure 4). Consecutive casts often show considerably different structure, with the distribution of turbulent quantities at the same depth for different casts usually being strongly non-Gaussian. Prior to averaging into 8 m depth intervals, the distribution of the turbulent diffusivity and turbulent kinetic energy dissipation data for each station was compared to both normal and lognormal distributions over 8 m depth intervals using the Kolmogorov-Smirnov test [Press et al., 1989]. Although the distribution of the turbulence data presented here is strongly non-normal, it was not found to be lognormal at any high significance of the Kolmogorov-Smirnov test. However, the log transform of the turbulence data is closer in approximation to a normal distribution than the untransformed data, which indicates that the mean of the log transformed data is likely to be a better approximation to the average. Consequently, when averaging turbulent diffusivity and dissipation data into 8 m depth intervals, the method of Baker and Gibson [1987] was used. The mean (M) of the data is given by
 $$M = \exp\left(\mu + \frac{\sigma^2}{2}\right)$$
where μ and σ2 are the arithmetic mean and variance of the log transformed data [Baker and Gibson, 1987].
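 The averaging step can be illustrated with the following short sketch, which assumes the lognormal estimator M = exp(μ + σ²/2) applied to the natural log of the samples within one depth interval. The function name is illustrative, and the use of the population variance (rather than the sample variance) is an assumption of this sketch.

```python
import math
import statistics


def lognormal_mean(values):
    """Mean of positive data assuming M = exp(mu + sigma^2 / 2).

    mu and sigma^2 are the arithmetic mean and (population) variance of the
    natural log of the data; the variance convention is an assumption.
    """
    logs = [math.log(v) for v in values]
    mu = statistics.mean(logs)
    sigma2 = statistics.pvariance(logs, mu)
    return math.exp(mu + 0.5 * sigma2)


# Invented dissipation samples (W kg^-1) from one 8 m depth interval.
eps_bin = [2e-9, 5e-9, 1e-8, 4e-8, 3e-7]
m = lognormal_mean(eps_bin)
```

 For intermittent data of this kind the estimate lies above the geometric mean, because the σ²/2 term accounts for the long upper tail of the distribution.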
2.3.1 Turbulent Diffusivity
 Turbulent diffusivity can be calculated from estimates of the turbulent kinetic energy dissipation rate
 $$K_t = \Gamma\,\frac{\varepsilon}{N^2} \tag{6}$$
where Γ is the mixing efficiency [Osborn, 1980; Peters et al., 1988]. In line with previous studies [Osborn, 1980; Moum et al., 1995; Rippeth et al., 2003], a constant value of 0.2 was used for the mixing efficiency. Recent studies using numerical simulations suggest this equation can be considered as a valid representation for vertical turbulent diffusivity for all scalars [Lindborg and Fedina, 2009].
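 A minimal sketch of this calculation, assuming the Osborn [1980] relation Kt = Γ ε / N² with the constant mixing efficiency of 0.2 quoted above, is given below. The function name and the sample values are illustrative.

```python
def turbulent_diffusivity(epsilon, n2, gamma=0.2):
    """Return K_t = gamma * epsilon / N^2 in m^2 s^-1 (assumed Osborn form).

    epsilon: dissipation rate in W kg^-1
    n2:      buoyancy frequency squared in s^-2 (must be positive)
    """
    if n2 <= 0:
        raise ValueError("stable stratification (N^2 > 0) is required")
    return gamma * epsilon / n2


# Invented values: epsilon = 1e-8 W kg^-1 and N^2 = 1e-5 s^-2.
kt = turbulent_diffusivity(1e-8, 1e-5)
```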
2.3.2 Turbulent Viscosity
 Turbulent viscosity is commonly used within ocean models to represent the vertical turbulent transfer of momentum [Pacanowski and Philander, 1981; Large et al., 1994; Zaron and Moum, 2009]. However, turbulent viscosity is generally regarded as a poor descriptor of the turbulent processes responsible for momentum transport [Tennekes and Lumley, 1972; Hinze, 1975]. This is due partly to the underlying simplicity of the gradient model for turbulent mixing and the assumption that there is a single length scale for turbulent processes [Hinze, 1975]. An additional problem with the concept of turbulent viscosity is that momentum can be transferred by turbulent overturning independent of turbulent shear production through changes in pressure [Tennekes and Lumley, 1972; Hinze, 1975].
 Assuming both buoyancy flux and pressure transport are negligible and that the rate of turbulent kinetic energy dissipation is equal in magnitude to the rate of production of turbulent kinetic energy by the flow, turbulent viscosity can be determined from
 $$K_v = \frac{\varepsilon}{Sh^2} \tag{7}$$
[Peters et al., 1988; Thorpe, 2005], where the vertical scale over which Sh is measured is the same as the vertical scale of the flow responsible for the production of turbulent kinetic energy. Here the vertical scale of the flow is considered to be mesoscale, and for consistency the vertical shear used in equation (7) is calculated in the same manner as the vertical shear used when estimating the Richardson number (section 2.6).
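 The following sketch assumes the production-dissipation balance form Kv = ε / Sh². If the turbulent diffusivity is taken as Kt = Γ ε / N², it follows algebraically that Kv / Kt = Ri / Γ, so the two estimates are not independent once Ri is known. The function names and sample values are illustrative.

```python
def turbulent_viscosity(epsilon, shear):
    """Return K_v = epsilon / Sh^2 in m^2 s^-1 (assumed balance form).

    epsilon: dissipation rate in W kg^-1
    shear:   vertical shear magnitude in s^-1 (must be positive)
    """
    if shear <= 0:
        raise ValueError("a non-zero vertical shear is required")
    return epsilon / shear ** 2


def viscosity_to_diffusivity_ratio(ri, gamma=0.2):
    """Ratio K_v / K_t implied by K_v = eps/Sh^2 and K_t = gamma*eps/N^2."""
    return ri / gamma


# Invented values: epsilon = 1e-8 W kg^-1 and Sh = 1e-3 s^-1.
kv = turbulent_viscosity(1e-8, 1e-3)
```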
 Consistent with previous studies [Peters et al., 1988; Zaron and Moum, 2009], the results of using equation (7) to calculate Kv from the data should be viewed with the caveat that the strong assumptions qualifying the application of equation (7) may be hard to justify in an ocean context. For further discussion on the validity and application of this equation, see section 4.3.
2.4 Calculating the Vertical Shear
 The individual ADCP velocity components recorded while each station was in progress were averaged in time for each 8 m depth interval to produce a station mean velocity profile of 8 m resolution. Where the raw ADCP data were recorded with higher vertical resolution than 8 m (cruises D321 and D306), the ADCP data were first averaged into 8 m intervals. The gradient in velocity from the mean profile was calculated between successive depth levels from the individual horizontal velocity components by first-order differencing. The absolute gradients for the mean profile were then combined by taking the root sum of the two components squared to give the absolute vertical shear at the midpoint of each depth interval.
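 The differencing step described above can be sketched as follows, assuming station mean profiles of the two horizontal velocity components on a common depth grid. The function name and the sample profile are illustrative.

```python
import math


def vertical_shear(depths, u, v):
    """First-order differenced shear magnitude at interval midpoints.

    depths: bin-centre depths in m (increasing)
    u, v:   station mean horizontal velocity components in m s^-1
    Returns (midpoint_depths, shear) with shear in s^-1.
    """
    midpoints, shear = [], []
    for i in range(len(depths) - 1):
        dz = depths[i + 1] - depths[i]
        du_dz = (u[i + 1] - u[i]) / dz
        dv_dz = (v[i + 1] - v[i]) / dz
        midpoints.append(0.5 * (depths[i] + depths[i + 1]))
        # Root sum of the squared component gradients.
        shear.append(math.hypot(du_dz, dv_dz))
    return midpoints, shear


# Invented mean profile on an 8 m grid.
mids, sh = vertical_shear([18.0, 26.0, 34.0], [0.30, 0.26, 0.26], [0.10, 0.13, 0.13])
```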
2.5 Calculating the Buoyancy Frequency
 Prior to the calculation of the buoyancy frequency, for consistency with the ADCP data, the microstructure measurements of temperature and salinity for each cast were averaged into a profile divided into 8 m intervals from which density was then calculated. The buoyancy frequency was calculated using these measurements of density. The values for N2 were averaged across the casts for each station for each depth interval to produce a station mean buoyancy profile.
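 A sketch of this step is given below. It assumes that potential density has already been computed from the bin-averaged temperature and salinity (the equation of state is not reproduced here), that depth is positive downward so that N² = (g/ρ) ∂ρ/∂(depth), and that station means are simple arithmetic means across casts. All names and sample values are illustrative.

```python
G = 9.81  # acceleration due to gravity in m s^-2


def buoyancy_frequency_squared(depths, density):
    """N^2 at interval midpoints from a density profile on a depth grid.

    depths:  bin-centre depths in m, positive downward and increasing
    density: potential density in kg m^-3 at those depths
    """
    n2 = []
    for i in range(len(depths) - 1):
        dz = depths[i + 1] - depths[i]
        rho_mean = 0.5 * (density[i] + density[i + 1])
        n2.append(G / rho_mean * (density[i + 1] - density[i]) / dz)
    return n2


def station_mean(profiles):
    """Arithmetic mean across casts at each depth level."""
    return [sum(level) / len(level) for level in zip(*profiles)]


# Two invented casts on the same 8 m grid.
cast_a = buoyancy_frequency_squared([20.0, 28.0, 36.0], [1026.00, 1026.08, 1026.12])
cast_b = buoyancy_frequency_squared([20.0, 28.0, 36.0], [1026.02, 1026.06, 1026.14])
n2_station = station_mean([cast_a, cast_b])
```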
2.6 Estimation of Richardson Number
 The Richardson number is highly scale-dependent, with the instantaneous value of the Richardson number calculated at a point in a stratified shear flow depending on both the Richardson number of the mean flow, which is calculated over scales of the same order as the flow velocity and flow length scale [Turner, 1973], and the vertical resolution of the measurements of shear and buoyancy used in the calculation [De Silva et al., 1999].
 The correlation between shear-driven mixing and Richardson number is strongest when the measurement scale of the Richardson number is of the same order as the vertical scale of the shear generating the mixing. For example, when vertical turbulent mixing is a result of finescale shears (instability on vertical scales of 2 to 3 m [Polzin, 1996]), the Richardson number calculated at a vertical resolution of 3 m shows a close correlation to observed turbulent dissipation, while the Richardson number calculated at a vertical resolution of 10 m shows no correlation [Toole and Schmitt, 1987; Polzin, 1996]. However, regardless of vertical resolution, the Richardson number alone is regarded as a poor predictor of observed mixing as it does not capture the variability in mixing [Peters et al., 1995].
 Here, the vertical scale of the overturning generating the observed mixing, estimated using the Thorpe length scale [Thorpe, 1977], is of order 1–3 m for all observations below the seasonal thermocline, except JC29 station 7 at ~140 m depth where the Thorpe length scale is ~10 m. This would suggest that finescale shear may well be generating the observed mixing and that the observed mesoscale flow shear may not be directly responsible. Nevertheless, the magnitude of the mesoscale flow Richardson number may well correlate with the magnitude of the observed mixing through influencing the strength of the finescale wave-flow interactions which may actually be generating the mixing [Polzin et al., 1996].
 At the mesoscale, in strong western boundary currents, mixing has been observed to be associated with vertical shear generated during frontogenesis [Van Gastel and Pelegri, 2004; Nagai et al., 2009]. In the meanders of the Gulf Stream current, mixing associated with shear is observed to occur on vertical scales of greater than 25 m [Van Gastel and Pelegri, 2004]. Observations of enhanced turbulent dissipation during frontogenesis in the Kuroshio suggest a vertical scale of ~50 m for vertical shear, although it is not clear whether the mixing is associated with current shear or other submesoscale mixing processes [Nagai et al., 2009]. A recent study has proposed that for equation (6) to be accurate, the vertical measurement scales should be of order 20 times the vertical scales of the turbulent overturning [Lindborg and Fedina, 2009]. Hence, a Thorpe scale of order 1–3 m would imply that for equation (6) to be accurate the vertical measurement scales should be between 20 and 60 m. This suggests that vertical length scales of at least 25 m are appropriate for calculating the Richardson number relevant to the shear generated by mesoscale flow (mesoscale shear).
 The Richardson number was calculated from profiles of vertical shear and buoyancy frequency using equation (1). Profiles of vertical shear and buoyancy frequency calculated from ADCP data and microstructure measurements of temperature and salinity show not only large-scale velocity and density trends but also the signatures of smaller scale processes as variability about the mean profile. Consequently, prior to calculating the Richardson number, profiles of vertical shear and buoyancy were smoothed using a running average filter to remove the effects of processes occurring at vertical scales smaller than mesoscale. As the observations suggest a range of vertical scales that may be appropriate to mesoscale shear, the size of the smoothing window applied when calculating the Richardson number was varied between 8 m (raw unsmoothed data) and 72 m when fitting the data to equation (3).
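 The smoothing and Richardson number calculation can be sketched as below, assuming profiles on an 8 m grid so that a window of 1 bin is unsmoothed and a window of 7 bins corresponds to 56 m. Several choices here are assumptions of the sketch rather than details given in the text: the window is centred, it shrinks symmetrically near the ends of the profile, and the filter is applied to Sh and to N² before Ri = N²/Sh² is formed.

```python
def running_mean(values, window_bins):
    """Centred running average; the window shrinks symmetrically at the ends."""
    half = window_bins // 2
    smoothed = []
    for i in range(len(values)):
        h = min(half, i, len(values) - 1 - i)
        segment = values[i - h:i + h + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def richardson_number(n2, shear, window_bins=1):
    """Ri = N^2 / Sh^2 from smoothed profiles (window given in grid bins)."""
    n2_smooth = running_mean(n2, window_bins)
    sh_smooth = running_mean(shear, window_bins)
    return [a / (b * b) if b > 0 else float("inf")
            for a, b in zip(n2_smooth, sh_smooth)]


# Invented profiles on an 8 m grid; window of 3 bins is a 24 m smoothing window.
n2_profile = [4e-5, 3e-5, 2e-5, 2e-5, 1e-5]
sh_profile = [2e-3, 4e-3, 1e-3, 3e-3, 2e-3]
ri_24m = richardson_number(n2_profile, sh_profile, window_bins=3)
```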
2.7 Fitting to Data
 Equation (3) was fitted to observations of the Richardson number and contemporaneous observations of turbulent viscosity and turbulent diffusivity, respectively, by allowing the parameters αs, ns, Kos, and Kbs to vary within limits suggested by the literature (Table 1). In each case, Kos was constrained to be within the range 1 × 10−5 to 1 × 10−1 m2 s−1, αs and ns were constrained to be within the range 1 to 100, and Kbs was constrained to be within the range 1 × 10−8 to 1 × 10−3 m2 s−1. These ranges were chosen to provide flexibility for the optimizer routine without artificially overconstraining the fit.
 Observations were fitted to equation (3) using a least squares fit which minimizes the sum of the squares of the differences between the log of the observed turbulent diffusivities, or viscosities, (Kobs) and the log of the turbulent diffusivities, or viscosities, calculated from equation (3) using the fitted parameters (Kfit) [Emery and Thomson, 1997]. Consequently, the residual sum of squares (RSS) is dimensionless and can be expressed as
 $$\mathrm{RSS} = \sum_{i} \left[ \log K_{\mathrm{obs},i} - \log K_{\mathrm{fit},i} \right]^{2}$$
 where the sum is taken over all observations i.
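A minimal sketch of a log-space least squares fit of this kind is given below. It rests on several assumptions that are not confirmed by this section: equation (3) is taken to have the form K = Kos (1 + αs Ri)^(−ns) + Kbs, natural logarithms are used (the base only rescales the residual sum of squares and does not change the minimizing parameters), and an exhaustive search over a coarse log-spaced grid stands in for the optimizer routine, which is not specified here. All function names are hypothetical.

```python
import itertools
import math


def k_model(ri, alpha, n, k0, kb):
    """Assumed form of equation (3): K = K0 * (1 + alpha * Ri)**(-n) + Kb."""
    return k0 * (1.0 + alpha * ri) ** (-n) + kb


def rss_log(params, ri_obs, k_obs):
    """Sum of squared differences between log(K_obs) and log(K_fit)."""
    return sum(
        (math.log(k) - math.log(k_model(ri, *params))) ** 2
        for ri, k in zip(ri_obs, k_obs)
    )


def log_grid(lo, hi, count):
    """count values spaced evenly in log space between lo and hi inclusive."""
    step = (math.log(hi) - math.log(lo)) / (count - 1)
    return [math.exp(math.log(lo) + i * step) for i in range(count)]


def grid_fit(ri_obs, k_obs, count=9):
    """Exhaustive search over the bounded parameter ranges quoted in the text."""
    grids = (
        log_grid(1.0, 100.0, count),   # alpha_s
        log_grid(1.0, 100.0, count),   # n_s
        log_grid(1e-5, 1e-1, count),   # K_os, m2 s-1
        log_grid(1e-8, 1e-3, count),   # K_bs, m2 s-1
    )
    # Return the parameter tuple (alpha, n, k0, kb) with the smallest RSS.
    return min(itertools.product(*grids), key=lambda p: rss_log(p, ri_obs, k_obs))
```

A grid search is used here only because it needs no external libraries; a bounded nonlinear least squares routine would normally be preferred for a real fit.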
 The parameter set and smoothing window combination with the lowest residual sum of squares was selected as the best fit. Confidence limits for a fit and for the individual parameters were estimated using the percentile method from a bootstrapped distribution [Efron and Gong, 1983] calculated by fitting to 10,000 resampled data sets.
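The percentile method can be sketched as follows, assuming sampling with replacement and linear interpolation between order statistics. In the setting described above, the data would be the paired observations of Ri and K and the statistic would be a refit of equation (3) to each resampled set; the function names, the seed, and the interpolation rule are illustrative assumptions.

```python
import random


def percentile(sorted_values, q):
    """Linear-interpolation percentile of an already sorted list, q in [0, 100]."""
    pos = (len(sorted_values) - 1) * q / 100.0
    lower = int(pos)
    upper = min(lower + 1, len(sorted_values) - 1)
    frac = pos - lower
    return sorted_values[lower] * (1.0 - frac) + sorted_values[upper] * frac


def bootstrap_limits(data, statistic, n_resamples=10000, level=90.0, seed=0):
    """Percentile-method confidence limits for statistic over resampled data."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_resamples):
        resample = [rng.choice(data) for _ in data]  # sample with replacement
        estimates.append(statistic(resample))
    estimates.sort()
    tail = (100.0 - level) / 2.0
    return percentile(estimates, tail), percentile(estimates, 100.0 - tail)
```

For 90% limits the function returns the 5th and 95th percentiles of the bootstrapped estimates.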
 For quantitative comparison between two parametrizations, a dimensionless quality metric (QM) was calculated from the residual sum of squares:
 The mean QM for a parametrization, with confidence limits, was estimated from a bootstrapped distribution of the residual data set. The QM quantifies variability in the ratio of observation to parametrization. The uncertainty in a given Kfit is expressed by the range [Kfit/QM, Kfit × QM].
3.1 Turbulent Diffusivity
 When equation (3) is fitted to all the data for turbulent diffusivity from all three data sets simultaneously, the best fit is for a smoothing window of 56 m with corresponding parameter values of αs = 1, ns = 1.49 (90% confidence limit 1.35–1.66), Kos = 3.62 (90% confidence limit 2.86–4.8) × 10−4 m2 s−1, and Kbs = 8.14 (90% confidence limit 7.89–8.59) × 10−6 m2 s−1 (Table 2). This gives a parametrization for turbulent diffusivity of
with a QM of 2.48 (90% confidence limit 2.37–2.60). In order to assess any potential bias in the parametrization, the distribution of the difference between the log of the observed diffusivity and the log of the diffusivity calculated using the parametrization was calculated for the whole data set. The mean of the distribution is zero, which suggests that there is no consistent bias in the parametrization. Comparing the observations of turbulent diffusivity to the values calculated using the parametrization, 60% of the calculated values are within a factor of two of the observations (Figure 5). The parametrization appears to be representative of the mean observed diffusivity at any given Ri across the range of Richardson numbers observed (Figure 6). However, there is considerable variability in observations about this mean.
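As an illustration of how the fitted values and the QM might be used together, the sketch below evaluates the parametrization at a few Richardson numbers and attaches the range [Kfit/QM, Kfit × QM]. It assumes that equation (3) has the form K = Kos (1 + αs Ri)^(−ns) + Kbs, which is not restated in this section; the function names are hypothetical, and the default values are the best-fit values and QM quoted above.

```python
def k_diffusivity(ri, alpha=1.0, n=1.49, k0=3.62e-4, kb=8.14e-6):
    """Assumed equation (3) form with the quoted best-fit values (m2 s-1)."""
    return k0 * (1.0 + alpha * ri) ** (-n) + kb


def qm_range(k_fit, qm=2.48):
    """Uncertainty range [K_fit / QM, K_fit * QM] as described in section 2.7."""
    return k_fit / qm, k_fit * qm


for ri in (1.0, 5.0, 10.0, 50.0):
    k = k_diffusivity(ri)
    low, high = qm_range(k)
    print(f"Ri = {ri:5.1f}: K = {k:.2e} m2 s-1, range {low:.2e} to {high:.2e}")
```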
Table 2. Results of Fitting Equation (3) to Observations of Turbulent Diffusivity (Kt) Using Different Sized Windows (Sw) to Vertically Smooth Observed Shear and Buoyancy(a)
Kos (m2 s−1) | Kbs (m2 s−1)
6.02 (2.19–7.34) × 10−4 | 15.38 (13.11–17.37) × 10−6
14.6 (7.3–16.3) × 10−4 | 7.51 (5.89–9.76) × 10−6
15.4 (3.4–16.9) × 10−4 | 5.72 (4.51–9.51) × 10−6
3.62 (2.86–4.8) × 10−4 | 8.14 (7.89–8.59) × 10−6
4.02 (3.11–5.36) × 10−4 | 9.02 (8.73–9.36) × 10−6
(a) The figures in parentheses indicate bootstrapped 90% confidence limits. For the 56 m and 72 m smoothing windows, the bootstrapped fits returned αs = 1 in all cases. The quality metric (QM) is calculated as described in section 2.7.
 Previous studies have commonly used a value of αs = 5 (Table 1). In order to assess the impact of fixing αs = 5, equation (3) was fitted to the full data set, as described above, with αs = 5 and the remaining parameters, ns, Kos, and Kbs, allowed to vary within the ranges given above. For all smoothing window sizes, the best fit QM for a fit with αs = 5 is 2.50 (90% confidence limit 2.34–2.62), which is slightly larger than the best fit QM for the fit with all parameters varying (Tables 2 and 3). For all sizes of smoothing window, confidence limits to the αs = 5 fit QM overlap the confidence limits to the equivalent QM for the fit with all parameters varying (Table 3).
Table 3. Results of Fitting Equation (3) to Observations of Turbulent Diffusivity (Kt) Using Different Sized Windows (Sw) to Vertically Smooth Observed Shear and Buoyancy(a)
Kos (m2 s−1) | Kbs (m2 s−1)
4 (3.6–5.3) × 10−4 | 14.8 (13–15.4) × 10−6
6 (6.1–9.3) × 10−4 | 6.7 (5.7–8.2) × 10−6
8 (6.7–13.2) × 10−4 | 5.88 (4.44–8.13) × 10−6
9 (7–14.8) × 10−4 | 5.84 (4.15–8.1) × 10−6
1 (7.2–17.4) × 10−3 | 6.74 (4.73–9) × 10−6
(a) Parameter αs is fixed at 5 and the remaining parameters allowed to vary as described in section 2.7. The figures in parentheses indicate bootstrapped 90% confidence limits. The quality metric (QM) is calculated as described in section 2.7.
 Kbs is considered to be a constant term representing the diffusivity from processes other than mesoscale shear (section 1.1). It is not unreasonable to expect that the diffusivity from such processes may vary from place to place in the ocean. In an attempt to estimate the likely variability in Kbs, equation (3) was fitted to the three individual data sets (using a 56 m smoothing window) with the values from the best fit parametrization for αs, ns, and Kos (αs = 1, ns = 1.5, Kos = 3.62 × 10−4 m2 s−1; see above) held fixed and only Kbs varying. Fitting equation (3) in this manner gave the following estimates of Kbs: for the D306 data set, 7.71 (90% confidence limit 6.28–9.30) × 10−6 m2 s−1 with a QM of 2.46 (90% confidence limit 2.33–2.60); for the D321 data set, 8.69 (90% confidence limit 7.1–10.59) × 10−6 m2 s−1 with a QM of 2.20 (90% confidence limit 2.02–2.39); and for the JC29 data set, 18.8 (90% confidence limit 11.15–26.92) × 10−6 m2 s−1 with a QM of 2.87 (90% confidence limit 2.52–3.12). In all cases, the QM 90% confidence limits overlap the QM values obtained by fitting the parametrization with Kbs = 8 × 10−6 m2 s−1 to the individual data sets (D306 QM 2.46 (90% confidence limit 2.33–2.61); D321 QM 2.20 (90% confidence limit 2.03–2.39); and JC29 QM 2.92 (90% confidence limit 2.54–3.19)).
3.2 Turbulent Viscosity
 When equation (3) is fitted to the data for turbulent viscosity (Kv) from all three data sets simultaneously, the best fits do not appear to be representative of the observations (Figure 7). Approximating Kv to a constant value of 1 × 10−3 m2 s−1 (Figure 7), the mean of the log transformed observations, gives a QM of 2.66 (90% confidence limit 2.55–2.78), which is lower than any QM which can be obtained from fitting equation (3) to observed Kv.
4.1 The Richardson Number Range for Mixing at the Mesoscale
 Direct observations of vertical mixing and Richardson number (Ri) used in previous studies to derive parametrizations of vertical mixing have been taken exclusively from around the equator and have focused on the Equatorial Undercurrent [Peters et al., 1988; Zaron and Moum, 2009]. The range of Richardson numbers reported by Peters et al. [1988] varied from 0.1 to ~14, with a large proportion of the observations being for Richardson numbers less than 1. The observational data used here come from three separate ocean regions, with one data set taken in the presence of strong mesoscale features (D321), one data set from a less dynamically active region of the ocean (D306), and one in close proximity to a vigorous frontal system (JC29). Considering the three data sets individually, the D306 and D321 data sets cover broadly the same range of Richardson number (1 < Ri < 50), while the JC29 data set covers a narrower range of smaller Richardson numbers (0.28 < Ri < 10). The range of Richardson numbers covered by the observations (0.28 < Ri < 50, section 3) is of the same order as reported in previous studies of the Gulf Stream (3 < Ri < 40, Pelegri and Csanady [1994]) and of the Florida Current (2 < Ri < 20, Winkel et al.). This would suggest that a parametrization for vertical mixing based on these observations ought to be more broadly representative of ocean mixing associated with the mesoscale than one based on observations of the Equatorial Undercurrent, as intended.
4.2 Turbulent Diffusivity
 The parametrization for vertical turbulent diffusivity presented here consists of a Richardson number parametrization of mixing combined with a constant representing vertical turbulent diffusivity arising from other unresolved shear-driven processes (section 2). Together, these two terms represent a parametrization of vertical turbulent diffusivity for Richardson numbers between 1 and 50, which is appropriate to the observations. Considering the two terms separately, the Richardson number mixing term is dominant (contributes more than 50% of the observed diffusivity) for Richardson number <10, while the constant background term dominates when the Richardson number >10 (Figure 6). The majority of the observations (73%) are for Richardson numbers in the range of 1 to 10, with very few (~3%) being for Richardson numbers less than 1 (Figure 9).
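The Richardson number at which the two terms contribute equally can be checked with a short calculation. The sketch assumes that equation (3) has the form K = Kos (1 + αs Ri)^(−ns) + Kbs (not restated in this section), so that the terms are equal when Ri = [(Kos/Kbs)^(1/ns) − 1]/αs. The function names are hypothetical, and the default values are the best-fit values quoted in section 3.1.

```python
def crossover_ri(alpha=1.0, n=1.49, k0=3.62e-4, kb=8.14e-6):
    """Ri at which the Ri-dependent term equals the constant background term."""
    return ((k0 / kb) ** (1.0 / n) - 1.0) / alpha


def shear_fraction(ri, alpha=1.0, n=1.49, k0=3.62e-4, kb=8.14e-6):
    """Fraction of the total diffusivity contributed by the Ri-dependent term."""
    shear_term = k0 * (1.0 + alpha * ri) ** (-n)
    return shear_term / (shear_term + kb)


print(f"crossover Ri = {crossover_ri():.1f}")
for ri in (1.0, 10.0, 50.0):
    print(f"Ri = {ri:4.1f}: Ri-dependent term fraction = {shear_fraction(ri):.2f}")
```

Under these assumptions the crossover falls between Ri = 11 and Ri = 12, which is consistent with the approximate threshold of 10 given above.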
 Considering the Richardson number mixing term, the parameter values estimated by fitting observations from all data sets to equation (3) (αs = 1, ns = 1.5, Kos = 3.6 × 10−4 m2 s−1) can be compared with those from previous studies summarized in Table 1 [Peters et al., 1988; Pelegri and Csanady, 1994; Yu and Schopf, 1997]: ns is within the range of previous estimates, and Kos is of the same order of magnitude. However, αs is lower than the commonly used value of 5 and at the lower limit of the range of values used to constrain the term while fitting (section 2.7).
 Fixing the value of αs = 5 and fitting to observations results in a parametrization where the parameter values (ns, Kos, and Kbs) producing the best fits are within the range of values used for these parameters in previous studies (Tables 1 and 3). However, fixing the value of αs = 5 and fitting to observations does not produce a better (lower QM) fit than fitting with all parameters varying (Tables 2 and 3). Comparing the observations of turbulent diffusivity to the values calculated using the parametrization where αs = 5, 60% of the calculated values are within a factor of 2 of the observations. If we consider Ri above 1, then the parametrization where αs = 5 produces estimates of turbulent diffusivity that in all cases are within the 90% confidence limits of the parametrization obtained when all parameters are allowed to vary (Figure 8). This suggests that the parametrization where αs = 5 represents the observations about as well as the parametrization obtained when all parameters are allowed to vary. However, the parametrization obtained when all parameters are allowed to vary is preferred as it represents the fit to observations with the lowest QM.
 The value of Kbs derived from fitting equation (3) to the full data set is of approximately the same magnitude as values that have been used in previous parametrizations (Table 1) and close to estimates of the open ocean value of vertical mixing from wave-wave interactions of the internal wave field (7 × 10−6 m2 s−1; Polzin et al.). Estimating Kbs for the individual data sets results in values of Kbs from 7.7 to 19 × 10−6 m2 s−1.
 In all cases, the QM for the parametrization with the data set-specific Kbs is within the 90% confidence limits of the QM for fitting the parametrization with the whole data set value of Kbs = 8 × 10−6 m2 s−1. For the D321 and D306 data sets, the confidence limits for the data set-specific Kbs encompass the full data set value. This would suggest that the value of Kbs derived from the full data set is not unreasonable as a value of background vertical mixing for these observations. The value of the JC29 data set Kbs is outside the 90% confidence limits to the full data set value. However, there are only a small number of values in the JC29 data set for Richardson number >10 (Figure 6) where Kbs is the dominant term in the relationship. Applying the full parametrization to the JC29 data results in a QM that is no worse than that of the data set-specific Kbs fit (full parametrization QM 2.92 (90% confidence limit 2.54–3.19) data set-specific QM 2.87 (90% confidence limit 2.52–3.12)). This suggests that estimates of Kbs from the JC29 data set alone are likely to be ill-constrained.
4.2.1 Comparison to Previous Parametrizations of Diffusivity
 Comparing the observations used here to previous parametrizations of vertical turbulent diffusivity [Pacanowski and Philander, 1981; Peters et al., 1988; Large et al., 1994] shows that for the range of Richardson numbers covered by the observations (1 < Ri < 50), all of the previous parametrizations underestimate the vertical turbulent diffusivity observed (Figure 9). Yet these parametrizations are commonly implemented in ocean models.
 The QM calculated by comparing the parametrization of Pacanowski and Philander [1981] to the observations presented here (using a 56 m smoothing window for Ri) is 3.53 (90% confidence limit 3.36–3.70), when comparing the parametrization of Large et al. [1994] the QM is 5.39 (90% confidence limit 5.1–5.72), and when comparing the parametrization of Peters et al. [1988] the QM is 27.64 (90% confidence limit 26.17–29.28). Changing the size of the smoothing window changes the QMs, but in no case is the QM from these previous parametrizations smaller than the QM for the parametrization presented here.
 The parametrization of Zaron and Moum [2009] was tested in two forms, the “alternative” and the “revised” (“alt” and “rev,” respectively, in Zaron and Moum [2009]). The alternative form of the parametrization was derived directly from observation and relates Ri to nondimensional turbulent diffusivity, while the revised form of the parametrization was designed to reproduce the Ri dependence of the observed vertical turbulent flux. Of the two forms, the alternative form is the more appropriate for comparison to the results presented here as it was fitted directly to observations of turbulent diffusivity. The observations of turbulent diffusivity presented here were nondimensionalized by dividing by |V|2/Sh, where both V and Sh were calculated using a 56 m smoothing window, and compared to the parametrization of Zaron and Moum [2009]. The QM for the alternative form of the parametrization, calculated for the nondimensional observations, is 4.95 (90% confidence limit 4.30–5.74) (Figure 10). This indicates that the parametrization of Zaron and Moum [2009] is better than the parametrizations of Large et al. [1994] and Peters et al. [1988] but worse than both the parametrization presented here and the parametrization of Pacanowski and Philander [1981] in representing the observations. In summary, for the observations presented here, the new parametrization provides a significantly more accurate reproduction of mixing than all previous parametrizations.
4.2.2 Applicability of the Turbulent Diffusivity Parametrization
 Richardson number is considered to be only a first-order predictor of mixing, as functional relationships between Richardson number and mixing fail to capture the observed variability in the mixing [Peters et al., 1995]. This is clearly true for the parametrization presented here and for similar, commonly used, Richardson number parametrizations (Figure 9). However, the parametrization presented here is proposed for implementation in mesoscale resolving ocean general circulation models where the output of the parametrization represents mixing integrated over large time and space scales (of order km/days) compared to the scales of turbulent variability (of order m/min). Consequently, a parametrization that is representative of the mean of the observations is most appropriate to use in a model.
 The Thorpe scales of the observed mixing (of order 1–3 m) suggest that the mixing parametrized here is likely to be occurring at the finescale [2–3 m, Toole and Schmitt, 1987; Polzin, 1996]. Consequently, what the parametrization may well be reflecting is the relationship between the internal wave-mean flow-driven mixing and the strength of the mesoscale flow, which has been suggested to be dependent, in part, on mean flow Richardson number [Polzin et al., 1996]. This would imply that the main source of the observed variability in the mixing is likely to be due to variability in the underlying internal wave field. The strength of internal wave-flow interactions may potentially dominate internal wave-wave interactions for flow with Richardson number <20 [Polzin et al., 1996]. This is consistent with the suggested limits of applicability for which the Richardson number mixing term in the parametrization presented here is dominant (1 < Ri < 10).
 Shear-driven mixing resulting from wave-wave interactions of the internal wave field occurs typically at a vertical scale which is too small to be represented explicitly within a mesoscale resolving ocean model with a vertical scale of order 10 m and as such is represented as the constant Kbs term. The magnitude of the processes combined within the Kbs term is likely to vary both temporally and spatially, and representing such variability using a uniform constant term, although common in implemented mixing parametrizations [Pacanowski and Philander, 1981; Peters et al., 1988; Large et al., 1994], is unlikely to be representative under all circumstances. The magnitude of the Kbs term derived from the observations presented here is of the same order of magnitude as similar terms used in previous parametrizations and, as such, is likely to be no worse a representation of finescale-shear-driven mixing. In regions of the ocean away from mesoscale flow, in the absence of convection and double diffusion, where the Richardson number is >10, the vertical turbulent diffusivity is likely to be dominated by this background mixing term.
4.3 Turbulent Viscosity
 The usefulness of equation (7) in calculating turbulent viscosity accurately at the mesoscale relies on two strong assumptions: that pressure transport of momentum is negligible and that production and dissipation of turbulent kinetic energy by the mesoscale flow are equal [Peters et al., 1988; Thorpe, 2005; Zaron and Moum, 2009]. The presence of a strong internal wave field may potentially invalidate both these assumptions since internal waves are likely to contribute both to the pressure transport of momentum and to shear production of turbulent kinetic energy at vertical scales smaller than the mesoscale [Peters et al., 1988, 1995; Zaron and Moum, 2009].
 From the data used here, it is not possible to determine the magnitude of any internal wave driven pressure transport of momentum. However, the Thorpe scales for the observed mixing are consistent with a finescale-shear production which suggests that internal wave shear may be significant in the production of the observed dissipation. Dissipation caused through internal wave-wave interactions is generally accepted to scale with N2, such that ε ∝ N2 [Gregg and Sanford, 1988; Polzin et al., 1995], although this scaling may not be appropriate for internal wave-flow interactions [Polzin et al., 1996]. While not consistent with the ε ∝ N2 scaling, the data presented here show a monotonic increase in dissipation with increasing N2, which is suggestive of an internal wave dissipation source (Figure 11). Combined with the observed Thorpe scales, this apparent monotonic relationship between ε and N2 suggests that the observed turbulent viscosity may indeed be being produced at the finescale with the magnitude of the finescale mixing influenced by the mesoscale flow shear. Consequently, under these circumstances, approximating vertical turbulent momentum transport using a turbulent viscosity would not be appropriate.
4.3.1 Comparison to Previous Parametrizations of Viscosity
 For completeness, the observations of viscosity reported here are compared to the estimate of vertical turbulent viscosity from previous parametrizations [Pacanowski and Philander, 1981; Peters et al., 1988; Large et al., 1994]. This shows that for the range of Richardson numbers covered by the observations, none of the previous parametrizations appear to represent the observations well (Figure 12).
 The QM when comparing the parametrization of Pacanowski and Philander [1981] to the observations of turbulent viscosity presented here (using a 56 m smoothing window for Ri) is 7.2 (90% confidence limit 6.84–7.58), when comparing the parametrization of Large et al. [1994] the QM is 8.14 (90% confidence limit 7.76–8.57), and when comparing the parametrization of Peters et al. [1988] the QM is 48.64 (90% confidence limit 45.96–51.53). Hence, none of the previous parametrizations of turbulent viscosity represent the observations better than a constant turbulent viscosity of 1 × 10−3 m2 s−1, which has a QM of 2.66 (90% confidence limit 2.56–2.78). Observations of turbulent viscosity were nondimensionalized, as described above, and compared to the parametrization of Zaron and Moum [2009]. As with the other parametrizations, neither form of this parametrization appears to be representative of the nondimensional observations of turbulent viscosity (Figure 13). The QM for the alternative form of the parametrization, calculated for the nondimensional observations, is 12.78 (90% confidence limit 11.19–14.66). Changing the size of the smoothing window changes the QM when comparing to the previous parametrizations, but in no case is the QM from comparing a previous parametrization smaller than the QM for a constant turbulent viscosity.
 A new Richardson number based parametrization for turbulent diffusivity has been developed (equation (3)) using observations from three separate ocean regions (section 2.1). The parametrization provides estimates of vertical turbulent diffusivity in stratified shear flow typical of mesoscale ocean features, incorporating eddies and fronts, and has been shown to give a more accurate representation than previous parametrizations [Pacanowski and Philander, 1981; Peters et al., 1988; Large et al., 1994; Zaron and Moum, 2009]. This parametrization is considered to be robust for values of the Richardson number greater than 1 at depths below the ocean surface boundary layer.
 The new Richardson number parametrization for turbulent diffusivity has been formulated using dimensional constants to be compatible with the majority of existing Richardson number parametrizations of turbulent diffusivity and could be used in conjunction with existing expressions for low Richardson number mixing to give a parametrization that is applicable across the full (zero to infinity) range of Richardson number. The implementation of this new parametrization in an eddy resolving ocean model remains to be tried.
 The observations of turbulent viscosity reported here are found to be best represented by a constant turbulent viscosity of 1 × 10−3 m2 s−1. This may well be due to the turbulent mixing occurring at the finescale rather than the mesoscale, with the parametrization reflecting the relationship between the internal wave-flow-driven mixing and the strength of the mesoscale flow. The presence of turbulent mixing with multiple turbulent length scales undermines the assumptions behind equation (7) and makes the estimation of turbulent viscosity from observations of turbulent kinetic energy dissipation and mesoscale flow-related vertical shear problematic. Consequently, the finding that there appears to be no robust relationship for mesoscale shear enhanced turbulent viscosity is regarded as tentative at best, and further work is required to investigate this relationship.
 The authors wish to thank the officers, crew, and entire scientific complement aboard the R.R.S. Discovery and R.R.S. James Cook during cruises D306, D321, and JC29.