Surface-sampled simulations of turbulent flow at high Reynolds number

A new approach to turbulence simulation, based on a combination of large-eddy simulation (LES) for the whole flow and an array of non-space-filling quasi-direct numerical simulations (QDNS), which sample the response of near-wall turbulence to large-scale forcing, is proposed and evaluated. The technique overcomes some of the cost limitations of turbulence simulation, since the main flow is treated with a coarse-grid LES, with the equivalent of wall functions supplied by the near-wall sampled QDNS. Two cases are tested, at friction Reynolds number Re$_\tau$=4200 and 20,000. The total grid node count for the first case is less than half a million and less than two million for the second case, with the calculations only requiring a desktop computer. A good agreement with published DNS is found at Re$_\tau$=4200, both in terms of the mean velocity profile and the streamwise velocity fluctuation statistics, which correctly show a substantial increase in near-wall turbulence levels due to a modulation of near-wall streaks by large-scale structures. The trend continues at Re$_\tau$=20,000, in agreement with experiment, which represents one of the major achievements of the new approach. A number of detailed aspects of the model, including numerical resolution, LES-QDNS coupling strategy and sub-grid model are explored. A low level of grid sensitivity is demonstrated for both the QDNS and LES aspects. Since the method does not assume a law of the wall, it can in principle be applied to flows that are out of equilibrium.


INTRODUCTION
Despite advances in hardware, and in particular the use of massively parallel supercomputers, applications of direct numerical simulation (DNS) remain limited in the Reynolds number (Re) that can be reached, owing to the cost of the simulations. Measured in terms of the number of grid points, the cost scales strongly with Re; for boundary-layer flow, for example, the number of grid points required scales as Re_L^{37/14}, where Re_L is the Reynolds number based on the distance L from the leading edge [1], and smaller timesteps are also required as the grid becomes finer. A cheaper approach is large-eddy simulation (LES), in which only the larger scales are simulated while smaller scales are modelled. However, near a wall the smaller scales play a predominant role, and to obtain sufficient accuracy many LES in practice end up being 'wall-resolved' LES: grid node counts are significantly lower than for DNS (typically of the order of 1%), but a strong scaling with Re remains, meaning that LES is also too expensive for routine application, for example to the flow over a commercial aircraft wing. The alternative of wall-modelled LES has much more attractive scaling characteristics (fixed in terms of boundary-layer thickness, for example), but relies very heavily on the wall treatment. Given that there is no accurate reduced-order model for turbulence near a wall (which would require some kind of breakthrough solution of the 'turbulence problem'), a lot of reliance would be placed on the near-wall model, with little likelihood of significant improvement over second-moment closure approaches based on the Reynolds-averaged equations. In this paper we consider an alternative approach whereby small-domain simulations are used to represent the near-wall turbulence, in a non-space-filling manner, and linked to an LES away from the wall, where the sub-grid models might be expected to work with reasonable accuracy.
To understand the new approach, an appreciation of recent progress in understanding the physics of near-wall turbulence is useful. The inner region, consisting of the viscous sublayer and the buffer layer, extends out to a wall-normal distance of z+ ≈ 100, where z is the wall-normal distance and the superscript + denotes scaling in wall (viscous) units.
The overall approach follows the style of heterogeneous multiscale methods (HMM), a general framework in which different modelling techniques/algorithms are applied to different scales and/or areas of the computational grid [13,14,15,16]. More specifically, the crux of HMM is the coupling of an overall macroscale model (i.e. the LES in this case) with several microscale models (i.e. the QDNS blocks); it is these microscale models that can provide missing/more accurate data (i.e. the shear stress boundary conditions) back to the macroscale model.
A similar multiscale reduced-order approach was formulated independently by [17] and applied to a quasi-geostrophic model of the Antarctic Circumpolar Current. The velocity field and potential vorticity gradient were advanced in time using a coarse grid model; this model comprised small embedded subdomains at each 'coarse' grid location (in contrast to the use of blocks encompassing multiple coarse grid points in this work), within which smaller-scale eddies evolved on a separate spatial and temporal scale. This is similar to the method proposed here in that the domain comprises smaller turbulence-resolving simulations that are coupled with a coarser grid simulation.
Furthermore, the components of the eddy potential vorticity flux divergence were computed and averaged in the subdomains and fed back to the coarse grid model, much like the near-wall averaged shear stresses computed in the approach described here. However, unlike the present work, the state of the eddy-resolving embedded subdomains was not carried over between coarse grid time-steps and was reset each time to a given initial condition.
In this paper we set out the method and present results from a proof-of-concept simulation of turbulent channel flow, also showing the sensitivity of the method to various numerical parameters.
Section 2 provides details of the numerical approach and its implementation as a Fortran code. Section 3 presents the proof-of-concept results from the simulation of turbulent channel flow. The potential for extension of the method to very high Reynolds number is then discussed in Section 4.
The paper closes with some conclusions in Section 5.

Numerical method
The same numerical method is used for both the LES and the near-wall QDNS domains shown in Figure 1, all of which have periodic boundary conditions applied in the wall-parallel directions x and y. Within these domains the incompressible Navier-Stokes equations are solved on grids stretched in z, using staggered variables (with pressure p defined at the cell centre and velocity components u_i at the centres of the faces), by an Adams-Bashforth method. The governing equations are the continuity equation
$$ \frac{\partial u_i}{\partial x_i} = 0 $$
and the momentum equations
$$ \frac{\partial u_i}{\partial t} + \frac{\partial (u_i u_j)}{\partial x_j} = -\frac{\partial p}{\partial x_i} + \frac{1}{Re_\tau}\frac{\partial^2 u_i}{\partial x_j \partial x_j} + \delta_{i1}, $$
where all variables are dimensionless (normalised using the channel half height, friction velocity, density and kinematic viscosity) and the term δ_{i1} provides the driving pressure gradient. Enforcing a constant pressure gradient or a constant mass flow rate are the two main approaches to ensuring that the flow field evolves with a near-constant wall shear velocity [18]. In the present work the constant pressure gradient δ_{i1}, frequently used in similar channel-flow simulation setups (e.g. [19,20,21]), is applied only in the LES, whereas for the QDNSs it is set to zero and a constant mass flow rate is employed instead, for consistency, so that mass is conserved between the LES and QDNS.
It was found that using the LES stresses alone to drive the QDNS resulted in too high a flow velocity: any inaccuracies in the shear stresses would grow over time, since there was no mechanism in place to keep the wall shear velocity (and therefore Re_τ) near the desired constant value.
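The constant-mass-flow-rate constraint in the QDNSs can be illustrated with a minimal sketch; this is an assumption about how such a constraint is typically implemented, not the paper's actual Fortran code. After each step a uniform shift is added to the streamwise velocity so that the bulk flow matches the target value.

```python
import numpy as np

def enforce_mass_flow(u, dz_weights, q_target):
    """Shift the streamwise velocity so the bulk flow matches q_target.

    Hypothetical helper for illustration:
    u          : streamwise velocity on an (nx, ny, nz) grid
    dz_weights : quadrature weights for the stretched z grid
    q_target   : desired plane-averaged velocity integrated over z
    """
    # current flow rate per unit cross-stream area
    q = np.mean(u, axis=(0, 1)) @ dz_weights
    # a uniform shift restores the target flow rate exactly
    u += (q_target - q) / dz_weights.sum()
    return u
```

Because the correction is uniform in space, it changes the bulk velocity without directly altering the turbulent fluctuations.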
Grids are uniform in the wall-parallel directions x and y and stretched in the wall-normal (z) direction according to a mapping z = z(ζ; a), where a is a stretching parameter and ζ is uniformly spaced on an appropriate interval (−1 ≤ ζ ≤ 1 in the LES, for example).
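The functional form of the mapping is not reproduced above; a hyperbolic-tangent stretching of the kind commonly used for channel flow is sketched below as an assumed example (the parameters a = 1.577 and a = 1.4 quoted later are from the text, but the exact mapping used in the code is not given).

```python
import numpy as np

def stretched_grid(n, a, zeta_min=-1.0, zeta_max=1.0):
    """Map a uniform computational coordinate zeta to a stretched z.

    Assumed tanh form, clustering points near both walls; 'a' is the
    stretching parameter. This is an illustrative choice, not
    necessarily the mapping used in the actual code.
    """
    zeta = np.linspace(zeta_min, zeta_max, n)
    return np.tanh(a * zeta) / np.tanh(a)

# e.g. the 42-point LES grid across the full channel, -1 <= z <= 1
z = stretched_grid(42, 1.577)
```

With this form the spacing is finest at the walls and coarsest at the channel centre, as required for resolving the near-wall region.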
The Adams-Bashforth method advances the solution in time using two steps. In the first step a provisional update of the velocity field is made according to
$$ u_i^* = u_i^n + \Delta t \left( \tfrac{3}{2} H_i^n - \tfrac{1}{2} H_i^{n-1} \right), $$
where H_i denotes the convective, viscous and forcing terms of the momentum equation. A final correction is then made to give
$$ u_i^{n+1} = u_i^* - \Delta t \, \frac{\partial p}{\partial x_i}, $$
where the pressure is obtained by solution of the Poisson equation
$$ \frac{\partial^2 p}{\partial x_j \partial x_j} = \frac{1}{\Delta t} \frac{\partial u_i^*}{\partial x_i}. $$
Application of a fast Fourier transform in horizontal planes leads to a tridiagonal matrix system that is solved directly.
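The structure of the pressure solve (Fourier transform in the periodic direction, then one direct tridiagonal solve per mode) can be sketched in two dimensions. This is an illustrative reimplementation assuming uniform spacing and homogeneous Dirichlet conditions in z, not the paper's actual 3-D staggered solver.

```python
import numpy as np

def thomas(sub, diag, sup, rhs):
    """Direct solve of a tridiagonal system (Thomas algorithm)."""
    n = len(diag)
    b = diag.astype(float).copy()
    d = rhs.astype(complex).copy()
    for i in range(1, n):
        w = sub[i - 1] / b[i - 1]
        b[i] -= w * sup[i - 1]
        d[i] -= w * d[i - 1]
    x = np.empty(n, dtype=complex)
    x[-1] = d[-1] / b[-1]
    for i in range(n - 2, -1, -1):
        x[i] = (d[i] - sup[i] * x[i + 1]) / b[i]
    return x

def poisson_fft(f, lx, dz):
    """Solve the discrete Poisson equation p_xx + p_zz = f on a 2-D grid,
    periodic in x, with p = 0 on the z boundaries.

    Second-order central differences; the FFT diagonalises the periodic
    direction so each Fourier mode needs only a tridiagonal solve in z.
    """
    nx, nz = f.shape
    dx = lx / nx
    fh = np.fft.fft(f, axis=0)          # diagonalise the periodic direction
    modes = np.arange(nx)
    # eigenvalues of the periodic second-difference operator in x
    kmod2 = 2.0 * (1.0 - np.cos(2.0 * np.pi * modes / nx)) / dx**2
    sub = np.full(nz - 1, 1.0 / dz**2)
    sup = np.full(nz - 1, 1.0 / dz**2)
    ph = np.empty_like(fh)
    for i in range(nx):                 # one tridiagonal solve per mode
        diag = np.full(nz, -2.0 / dz**2 - kmod2[i])
        ph[i] = thomas(sub, diag, sup, fh[i])
    return np.fft.ifft(ph, axis=0).real
```

The cost is dominated by the FFTs plus O(nx · nz) work for the tridiagonal solves, which is why the direct method is attractive for channel geometries.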

Model implementation
The model code was written in Fortran 90, with conditional statements used to enable/disable the LES parameterisation depending on the flag set in the simulation setup/configuration file.
Each iteration of the combined LES-QDNS approach entailed first running each QDNS simulation individually with its own setup file (containing the number of timesteps to perform, for example); the LES was then run immediately afterwards to complete the iteration (and thus a single LES timestep, as explained in the next subsection). The setup and execution of these simulations was performed using a Python script that ensured the simulations were run in the correct order, and also performed statistical averaging and postprocessing of the simulation results. Such postprocessing includes the averaging of the shear stresses from all the QDNS and writing out these results to a file in a format that the LES expects, as discussed in the next section. Note that, while the model itself was written in Fortran and could only be executed in serial, the Python script that handled the execution of the simulations was parallelised such that all of the QDNS were executed at the same time, with the results then being combined/postprocessed via MPI Send/Receive operations. The mpi4py library [22] was used for this purpose. For a setup involving N × N QDNS per wall, the LES-QDNS approach requires (N × N × 2) + 1 MPI processes (N × N × 2 processes for the total number of QDNS, and one process for the LES).
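The rank bookkeeping implied by the (N × N × 2) + 1 process count can be sketched as follows. The role-assignment convention here (rank 0 for the LES, walls ordered lower then upper) is a hypothetical choice for illustration, and the mpi4py communication itself is omitted.

```python
def process_layout(n):
    """Map MPI ranks to simulations for an n x n array of QDNS per wall.

    Returns (total_ranks, roles), where roles[rank] is either "LES"
    or a (wall, i, j) tuple identifying a QDNS block. The assignment
    order is an illustrative convention, not the paper's actual one.
    """
    total = n * n * 2 + 1              # N*N*2 QDNS processes plus one LES
    roles = {0: "LES"}
    rank = 1
    for wall in ("lower", "upper"):
        for i in range(n):
            for j in range(n):
                roles[rank] = (wall, i, j)
                rank += 1
    return total, roles

# the baseline 4 x 4 arrangement: 32 QDNS ranks plus one LES rank
total, roles = process_layout(4)
```

In the actual script, each QDNS rank would run its block and send the averaged wall shear stresses to the LES rank via MPI Send/Receive.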

Interconnection between LES and QDNS
The basic arrangement for the simulations is as shown on figure 1. To illustrate the details we consider a baseline case at Re_τ = 4200, corresponding to the highest current Re_τ for DNS of channel flow [23]. The DNS used a domain of size 2π × π × 2 with a 2048 × 2048 × 1081 grid. The smallest resolved length scale in a DNS needs to be O(η), where η is the Kolmogorov length scale [24]. The choice of O(η) grid spacing in the DNS of [23] therefore satisfied this requirement, and is consistent with known guidelines for the choice of wall units in turbulent channel flow simulations (see e.g. [19,25,26]). Here we attempt the same configuration using an LES in a domain of 6 × 3 × 2 on a 24 × 24 × 42 grid (with stretching parameter a set to 1.577), with a 4 × 4 array of QDNS on each wall, each QDNS using a 24³ grid (with stretching parameter a set to 1.4) covering a domain of 1000 × 500 × 200 in wall units. The total number of grid points is less than half a million, or about 0.01% of the DNS. In this baseline case the QDNS grid spacing in wall units is Δx+ = 41.7 and Δy+ = 20, with the first cell centre at z+ = 1.5.
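The grid-count and spacing claims in this baseline case can be checked with a few lines of arithmetic:

```python
# Grid bookkeeping for the baseline Re_tau = 4200 case described above.
les_points = 24 * 24 * 42              # coarse LES grid
qdns_points = 4 * 4 * 2 * 24**3        # 4 x 4 blocks per wall, two walls
total = les_points + qdns_points       # under half a million points
dns_points = 2048 * 2048 * 1081        # reference DNS grid of [23]

dx_plus = 1000 / 24                    # streamwise QDNS spacing in wall units
dy_plus = 500 / 24                     # spanwise spacing (~20.8, quoted as 20)
fraction = total / dns_points          # roughly 0.01% of the DNS grid
```

The dominant cost is clearly the QDNS blocks rather than the deliberately coarse LES.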
The choice of QDNS resolution follows guideline values in the literature (e.g. Δx+ typically less than 50, compared to around 20 for DNS [27]), such that the cost of the QDNS is approximately an order of magnitude less than a full DNS near the wall [28]. It was found that refining this further had little impact on the accuracy of the results, as discussed in Section 3. Seen in plan view, each QDNS block occupies one LES cell (i.e. L_x,QDNS = Δx_LES and L_y,QDNS = Δy_LES). In the wall-normal direction the QDNS overlaps the LES, in this case by three cells, to avoid using the immediate near-wall points that are most susceptible to errors in the sub-grid modelling. These three cells cover the region out to z+ = 200, with the centre of the first LES cell at z+ = 30. The LES grid was deliberately kept very coarse in order to highlight the potential savings of the proposed method and how it takes advantage of the separation of scales, although it was found a posteriori that it needed refining to a 96 × 96 × 56 grid in order to yield a much better mean flow prediction (see Section 3).
The required resolution for DNS and QDNS scales strongly with Reynolds number [27]. It is possible that the effects of such turbulent small-scale structures are being dissipated by the averaging procedure, or simply by the region of lower resolution outside the QDNS block, with downstream blocks becoming increasingly inaccurate as a result. It is unclear how many QDNS blocks will be required in general, but the number is likely to scale with Re_τ in order to obtain adequate sampling near the wall.

PROOF OF CONCEPT AND SENSITIVITY TO NUMERICAL PARAMETERS
The mean streamwise velocity u+ and the root mean square (RMS) of the streamwise velocity fluctuations u+_RMS were used as performance measures. These are defined, for each point k in the z-direction, by
$$ \overline{u}^+_k = \frac{1}{N_x N_y} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} u^+_{i,j,k}, \qquad u^+_{\mathrm{RMS},k} = \sqrt{ \frac{1}{N_x N_y} \sum_{i=1}^{N_x} \sum_{j=1}^{N_y} \left( u^+_{i,j,k} - \overline{u}^+_k \right)^2 }, $$
where u+_{i,j,k} is the dimensionless velocity at grid point (i, j, k), and N_x and N_y are the numbers of grid points in the x and y directions. The statistics were not accumulated at every timestep, but instead every S timesteps, where S was chosen to be sufficiently small to ensure a steady average. In addition, the mean velocity relative to the friction velocity was also considered. The mean streamwise velocity for the baseline case is shown in figure 3 in linear and semi-logarithmic form. The modulation effect is also captured by the LES approach of [11], but not by the multi-block model of [10], which uses the same replicated near-wall block everywhere on the wall.
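The plane-averaged statistics defined above translate directly into array operations; a minimal numpy sketch, with the array layout assumed:

```python
import numpy as np

def plane_stats(u):
    """Mean and RMS of streamwise velocity over wall-parallel planes.

    u is assumed to be an (nx, ny, nz) array of u+ values; the average
    at each k runs over the Nx * Ny points of that plane, as in the
    definitions above.
    """
    u_mean = u.mean(axis=(0, 1))                           # <u+>(z)
    u_rms = np.sqrt(((u - u_mean) ** 2).mean(axis=(0, 1))) # u+_RMS(z)
    return u_mean, u_rms
```

In practice these would be accumulated over many sampling instants (every S timesteps) before being compared with the DNS profiles.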
The extremely coarse-grid LES shows significant errors in the structure of the turbulence as the wall is approached, as illustrated in figure 5. Any simulation-based model of turbulence is only useful if it provides a suitable degree of grid independence. In the current case the resolution required for the QDNS is reasonably well known, based on previous DNS. Figure 6(a) shows a negligible effect on the mean flow of increasing the near-wall QDNS resolution from 24³ to 32³, which is still well below the level required for a resolved DNS (64³ would give a resolution of Δx+ = 15.6, Δy+ = 7.8 and a first grid point at z+ < 1). One effect of the increased resolution is the reduced near-wall peak of the RMS streamwise velocity, shown on figure 6(b), with the correct trend to agree with the DNS in the limit of very fine resolution.
Additionally there is a slight improvement (<4%) in the RMS from around z+ = 80 onwards. Refining the LES to the 96 × 96 × 56 grid increases the LES grid point count by a factor of 21, and the timestep is reduced by a factor of 4 due to Courant number restrictions, but relatively little change is seen in the mean flow. The main effect is for the centreline velocity prediction to change from a 3% undershoot to a <1% undershoot. Similarly, the disagreement at the interface between the LES and QDNS blocks is reduced from about 5% to 2%. The agreement at the near-wall peak in the streamwise velocity RMS is sensitive to the averaging of the velocity derivatives from the LES used to enforce the mass flow rate in the QDNSs. For each block, a number of LES grid points were used for the averaging. It was observed that too small an averaging window caused the RMS velocity curve to be significantly higher than the DNS results, likely because small grid-to-grid point oscillations (in turn caused by under-resolution of the turbulence) were picked up near the wall. On the other hand, too large an averaging window can smooth the turbulence, reducing the turbulent kinetic energy levels in the QDNS and therefore causing the curve to be lower than that of the DNS. The latter may have had an effect here, since the number of grid points used in each averaging window (N_x/4 × N_y/4) was necessarily greater in the refined case (with the length and width of the averaging window remaining the same).
Note that the choice of averaging size did not significantly alter the mean streamwise velocity results which were consistently better than the results from the coarser LES grid. The ultimate convergence of the LES back to the DNS would require much finer grids and large parallel simulations, which is beyond the scope of the current investigation. Nevertheless, the limited sensitivity to the grid at these very low resolutions is promising.
Finally in this section, we consider the effect of the basic arrangement of the QDNS blocks.
The baseline configuration has 4 × 4 blocks, as sketched in figure 1. This configuration appears capable of resolving near-wall flow features, as illustrated by the velocity contours shown on figure 2. Figure 9 shows the effect of reducing the number of near-wall blocks to 2 × 2 and 1 × 1, which clearly under-samples the flow features. The mean flow on figure 9(a) shows that the principal effect of reducing the near-wall block count is to slightly diminish the accuracy of the near-wall turbulence, possibly due to aliasing effects when sampling the very high-frequency turbulent structures. Whilst this result is not catastrophic, it leads to the conclusion that 4 × 4 is probably the minimum number of blocks for a reasonable prediction of the mean flow for the current domain size. On the other hand, increasing the number of near-wall blocks to 6 × 6 yields an improved mean flow prediction, particularly near the LES-QDNS interface. While the RMS curve for the 6 × 6 case displays the correct shape, the values continue to overshoot the DNS data near the wall. As already noted, these RMS values are sensitive to the averaging procedure used to enforce the mass flow rate in the QDNS blocks.

EXTENSION TO HIGHER REYNOLDS NUMBER
Since the method has been proposed here as a way of simulating high Reynolds number flows, it is of interest to test the approach at even higher Reynolds numbers. In this section we consider a simulation at Re_τ = 20 000, a factor of nearly 5 higher than that used in the previous section, keeping the same near-wall QDNS configuration with 4 × 4 blocks on each wall. Part (a) of figure 10 shows the streamwise velocity field from the LES at z+ = 322, with the QDNS blocks superimposed, although these are too small to be clearly visible. Statistical results for the simulation at Re_τ = 20 000 are shown on figure 11, comparing the results with the logarithmic law of the wall, u+ = (1/κ) log z+ + b, with κ = 0.39 and b = 4.5 (where these values have been chosen to agree with the DNS data from [23]). The solution overshoots the log law by about 6% near the LES-QDNS interface. It seems unlikely that sub-grid models can be blamed for the overshoot, although this is something that could be tested. Compared to [12] the mean flow prediction is approximately 9% too low, although we should note that the Reynolds number in this simulation is well above the highest Reynolds number used by Dean to make his correlations. In general the RMS streamwise velocity fluctuations shown on figure 11(b) follow the expected trend, with the RMS increasing as Re_τ increases. The near-wall peak clearly increases with the fivefold increase in Re_τ, although the RMS at z+ = 100 decreases by 20% before once again rising slightly above the Re_τ = 4200 line. The method proposed here therefore has limitations that depend on Re_τ, which could be mitigated through the use of finer LES resolution or more QDNS blocks, both of which were shown to improve the results in the Re_τ = 4200 case.
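For reference, the logarithmic law used for comparison in figure 11 is straightforward to evaluate (natural logarithm, with the κ = 0.39 and b = 4.5 quoted above):

```python
import math

def u_plus_log_law(z_plus, kappa=0.39, b=4.5):
    """Logarithmic law of the wall, u+ = (1/kappa) ln z+ + b."""
    return math.log(z_plus) / kappa + b
```

A 6% overshoot near the LES-QDNS interface therefore corresponds to roughly one wall unit of u+ at the z+ values of interest.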
In summary, the effect of increasing Reynolds number is partly captured by the method presented in this section, at a computational cost (including the number of time steps) that scales approximately linearly with Re_τ. However, the results at Re_τ = 20 000 deviate from the logarithmic law, suggesting that the quality of the results will decrease with further increases in Re_τ. To improve on this probably requires more computational resource, and in this respect it is interesting that the tendency to over-predict the logarithmic law of the wall was also seen when the number of QDNS blocks was reduced to 2 × 2 and 1 × 1, as shown in figure 9. This suggests that one way to increase the accuracy of the simulations is to increase the number of QDNS blocks, for example to 32 × 32 at Re_τ = 20 000. Although this parallelises trivially, it formally represents a scaling of the computational grid with Re_τ² for channel flow, albeit with a much lower constant of proportionality than wall-resolved LES. Another way of increasing the accuracy at higher Re_τ would be to increase the domain size of the QDNS, also resulting in a higher scaling exponent.
These estimates may be reduced if the resolution of structures associated with the mixed scaling of [8] is the limiting factor. Otherwise, for very high Re τ one may need to apply the method recursively, with successively smaller domains as the wall is approached.

CONCLUSIONS
A new approach to simulating near-wall flows at high Reynolds number has been presented and tested. The method relies on LES for the whole domain, but with the skin friction supplied from a set of quasi-DNS of the near-wall region (out to a wall-normal distance of z+ = 200). These near-wall simulations use periodic boundary conditions and are not space-filling, but provide an estimate of the two components of skin friction, given the instantaneous near-wall velocity gradients. The method has an extremely small communication overhead between the LES and quasi-DNS and is thus suitable for scaling to large core counts. The accuracy of the method was demonstrated at Re_τ = 4200, including the modulation of small-scale near-wall features by large structures residing either in the logarithmic or outer regions of the flow. This makes it possible, for example, to study the effects of wall-based flow control schemes in a high-Reynolds-number external environment. The method is found to be robust to changes in grid resolution. An O(Re_τ) total-cost extrapolation to Re_τ = 20 000 demonstrated some limitations, suggesting that accurate simulations at higher Re_τ probably have a higher total cost scaling (including an increase in grid points and in the number of timesteps), though still at much lower cost relative to wall-resolved LES. For the particular case considered here, that of turbulent channel flow, wall functions for LES based on the logarithmic law of the wall would be expected to work well. The advantage of the current approach is that the log law is not assumed, so the effects of a range of non-equilibrium flow conditions could be captured, so long as the surface sampling is sufficient relative to the dominant large-scale structure in the flow.
Overall the new method offers the potential for engineering calculations at high Reynolds number at a substantially lower computational cost than current LES techniques. Some of the simulations reported here were performed on the Iridis 4 compute cluster. The data files generated as part of this work will be available through the University of Southampton's institutional repository service.