Keywords:

  • discontinuous Galerkin;
  • high performance computing;
  • multirate time stepping;
  • explicit Runge–Kutta;
  • shallow water equations;
  • Great Barrier Reef

SUMMARY

  1. Top of page
  2. SUMMARY
  3. 1 INTRODUCTION
  4. 2 EXPLICIT TIME INTEGRATION
  5. 3 MULTIRATE TIME INTEGRATION
  6. 4 MULTIRATE GROUPS
  7. 5 NUMERICAL EXPERIMENTS AND RESULTS
  8. 6 CONCLUSION AND FUTURE WORK
  9. ACKNOWLEDGEMENTS
  10. REFERENCES

This paper presents multirate explicit time-stepping schemes for solving partial differential equations with discontinuous Galerkin elements in the framework of large-scale marine flows. It addresses the variability of the local stable time steps by gathering the mesh elements in appropriate groups. The real challenge is to develop methods exhibiting mass conservation and consistency. Two multirate approaches, based on standard explicit Runge–Kutta methods, are analyzed. They are well suited and optimized for the discontinuous Galerkin framework. The significant speedups observed for the hydrodynamic application of the Great Barrier Reef confirm the theoretical expectations. Copyright © 2012 John Wiley & Sons, Ltd.

1 INTRODUCTION

The development of suitable and fast time integration methods for ocean modeling constitutes an important challenge. It is indeed impossible to use one single time discretization scheme that is effective for all physical processes in a complex marine model, as different subsystems have widely different characteristics in terms of time scales, dynamic behavior, and accuracy requirements. The primitive equations for ocean flows allow for the existence of phenomena exhibiting a wide spectrum of propagation speeds. Typically, external gravity waves propagate at 10–100 m/s and internal waves at a few meters per second, whereas advection is characterized by speeds ranging from $10^{-3}$ to 1 m/s. Large-scale and small-scale processes interact significantly, so it is essential to simulate them simultaneously. Today, it seems impossible to reproduce all scales with structured uniform grids because the computational cost becomes prohibitive at the high resolution that is required. Therefore, variable resolution is needed both temporally and spatially.

Unstructured grids are well suited to capture complex topography and also allow the representation of a wide spectrum of time and length scales in a single model. Finite volume and finite element methods are the two main approaches that make use of unstructured grids. Many groups are now developing finite volume codes for coastal applications such as the Finite Volume Community Ocean Model (FVCOM) [1] and others [2, 3]. In the area of large-scale ocean modeling, continuous finite element methods (FEM) are used in models such as the Finite Element Ocean Model (FEOM) [4], the Imperial College Ocean Model (ICOM) [5, 6], and others [7, 8]. Our research team is developing the Second-generation Louvain-la-Neuve Ice-ocean Model (SLIM) [9–12], which is a discontinuous Galerkin-based finite element model.

The variable resolution and the complexity of unstructured mesh generation processes generally lead to grids with a wide dispersion of element sizes. More specifically, although finite element meshers are able to control the average element size of a mesh, they are usually unable to control the smallest element size. In this context, the classical, conservative, explicit time discretization methods are limited because of stability requirements. The Courant–Friedrichs–Lewy (CFL) condition, which combines the finest cell size and the highest wave velocity, may severely restrict the global allowable explicit time step. Accordingly, the computational efficiency of explicit time-stepping methods may be drastically low.

For instance, consider the case of a typical mesh of the Great Barrier Reef (GBR), illustrated by Figure 1, made up of about 1 million triangles. This mesh is built by means of the open source software GMSH [13]. Element sizes were determined to capture the relevant bathymetric and topographic features as well as the associated hydrodynamic processes, such as eddies and tidal jets [10]. For the mesh and bathymetry presented in Figure 1, the estimated minimum and maximum stable time steps among all elements are 0.154 and 7.972 s, respectively. To run a 24-h simulation with a classical explicit method, 561,039 time steps would have to be performed on almost 1 million elements. One possibility to reduce these expensive computations is to adapt the time steps under local stability conditions.
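The step count quoted above follows directly from the smallest stable time step. A minimal sketch of that arithmetic, using only the figures given in the text (the variable names are illustrative):

```python
import math

SIMULATED_SECONDS = 24 * 3600   # 24-h simulation
DT_MIN = 0.154                  # smallest stable time step quoted in the text (s)
DT_MAX = 7.972                  # largest stable time step quoted in the text (s)

# A singlerate explicit scheme must advance every element with DT_MIN.
steps_singlerate = math.ceil(SIMULATED_SECONDS / DT_MIN)

# Number of steps the least constrained element would need on its own.
steps_least_constrained = math.ceil(SIMULATED_SECONDS / DT_MAX)

print(steps_singlerate, steps_least_constrained)
```

The gap between the two counts is what a locally adapted time step tries to exploit.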

Figure 1. Bathymetry and mesh of the Great Barrier Reef with a first zoom (bottom left, white) on the Holbourne Island and a second one (upper right, red) on the Whitsunday Islands Archipelago. The mesh is made up of 909,185 triangles, with inner radii comprised between approximately 29 m and 1.3 km, and 444,598 nodes.

Multirate schemes represent a class of methods that use various time steps on different grid cells. The strategy consists in gathering the grid cells in different groups that satisfy the local CFL stability conditions for a certain range of time steps. Standard explicit Runge–Kutta (ERK) methods are applied on bulk groups with a local time step in such a way that the total computational effort is drastically reduced. Buffer groups are introduced, with adapted ERK methods, to accommodate the transition between the different bulk groups. However, the development of such methods is still challenging because convergence and conservation properties should remain satisfied during the communication between the groups. In this context, two multirate approaches that attempt to partly solve the transition issues are explored. The first one, introduced by Constantinescu and Sandu [14], preserves the system invariants but is at most second-order accurate. On the other hand, Schlegel et al. [15] have proposed a method that borrows some ideas of the implicit–explicit (IMEX) splitting scheme [16, 17]. A third-order multirate scheme can be achieved, at least experimentally, with an appropriate base ERK method. Unfortunately, this method turns out to be nonconservative in our strategy.

The aim of this paper is to develop and adapt these multirate methods to large unstructured meshes in the framework of the discontinuous Galerkin method (DGM). The standard ERK methods and their time step restrictions are described in Section 2. Two multirate approaches [14, 15], with different features, are introduced and analyzed in the DGM framework (Section 3). The construction of multirate groups for multiple levels of refinement and a way to optimize the speedup are addressed in Section 4. Finally, numerical experiments for the GBR are presented and discussed in Section 5.

2 EXPLICIT TIME INTEGRATION

When solving time-dependent partial differential equations (PDEs), it is common practice to first discretize the spatial variables to obtain a semidiscrete method of lines (MOL) by leaving the time variable continuous. The spatial and temporal discretizations are then independent. The advantage of this procedure is that the problem reduces to a system of ordinary differential equations (ODEs) to which a numerical method for initial value problems can be applied.

As an illustration, consider the case of a one-dimensional (1D) scalar advection equation written in a conservative form

  • display math(1)

with initial condition

  • display math(2)

and appropriate boundary conditions. Assume a conservative DG spatial discretization of Equation (1) which can be represented by a function $f(u_i, t)$. It contains the volume and interface parts of the steady-state residual of the problem, multiplied by the inverse of the mass matrix associated with the element. For notational convenience, we define $u_i = (u_{i,1}, \dots, u_{i,n})$ as the set of all local discrete values defined in element $\Omega_i$. The semidiscrete DG approximation can be written, for each grid element $\Omega_i$, as the following Cauchy problem

  • display math(3)

which needs to be solved in time. A class of numerical methods to integrate the solution in time is the family of ERK schemes. The MOL approach may be extended to any conservation law with larger dimensions and/or multiple unknown fields. Other discretization techniques may also be used to approximate the spatial terms of the PDEs.
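To make the method-of-lines idea concrete, the sketch below builds a semidiscrete right-hand side for 1D advection on a nonuniform periodic mesh. It is not the authors' DG discretization: it assumes the equation has the form $u_t + (c\,u)_x = 0$ with constant $c > 0$ (Equation (1) is not reproduced here) and uses the lowest-order, piecewise-constant case, where each element carries a single value and the interface term reduces to an upwind flux. The name `advection_rhs` and the periodic boundary are assumptions made for illustration.

```python
def advection_rhs(u, dx, c=1.0):
    """Semidiscrete right-hand side du_i/dt for u_t + (c u)_x = 0 with c > 0.

    u  : list of element values (one per element)
    dx : list of element sizes (may be nonuniform)
    Periodic boundaries; upwind interface flux c * u_left.
    """
    rhs = []
    for i in range(len(u)):
        flux_in = c * u[i - 1]    # left interface (u[-1] wraps around periodically)
        flux_out = c * u[i]       # right interface
        # Dividing by dx[i] plays the role of the inverse mass matrix.
        rhs.append((flux_in - flux_out) / dx[i])
    return rhs


# The interface fluxes telescope, so the total mass sum(dx_i * u_i)
# has a zero rate of change.
dx = [0.5, 1.0, 1.0, 1.0]
u = [1.0, 2.0, 0.5, 3.0]
mass_rate = sum(h * r for h, r in zip(dx, advection_rhs(u, dx)))
```

Any ODE integrator can then be applied to `advection_rhs`, which is the point of separating the spatial and temporal discretizations.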

Table 1. Butcher tableau and development of the two-stage, second-order RK method.

2.1 Explicit Runge–Kutta schemes

ERK methods are among the most popular time-stepping schemes [18, 19]. They are self-starting, meaning that they give the solution at the next time step only in terms of the current solution. Therefore, only the initial condition is needed to start the time integration. Other well-known schemes, such as the Adams–Bashforth methods, are multistep and use solutions at several previous time steps. ERK methods are also relatively flexible. For instance, the time step may be changed at each iteration of the scheme. These explicit schemes may also be developed up to high orders of accuracy, with the constraint that an $r$th-order Runge–Kutta (RK) method needs $s \geq r$ inner stages, that is, evaluations of the steady-state residual and multiplications by the inverse mass matrix. A widely used ERK method is the classical four-stage, fourth-order scheme (RK44). Flux limiters, strongly recommended for hyperbolic conservation laws with DG discretization in space, may be applied to the solution in a simple manner. They ensure that nonoscillatory properties are achieved for strong shocks. An $s$-stage ERK method computes the next step solution $u^{n+1}$, at time $t^{n+1} = t^n + \Delta t$, from the current solution $u^n$ at $t^n$ by applying the following algorithm:

  • For k = 1 : s do

    • display math(4)
    • display math(5)
  • Compute $u^{n+1}$ as

    • display math(6)
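The algorithm above can be sketched as a generic ERK step. The display equations are not reproduced in this copy, so the code assumes the standard form: stage solution $u^{(k)} = u^n + \Delta t \sum_{l<k} A_{kl} K_l$, stage slope $K_k = f(u^{(k)}, t^n + c_k \Delta t)$, and update $u^{n+1} = u^n + \Delta t \sum_k b_k K_k$, where $A$, $b$, $c$ are the Butcher tableau coefficients defined in the next paragraph. The tableau used in the example is Heun's two-stage method, assumed here as a stand-in because Table 1 is not reproduced; `erk_step` and `integrate` are illustrative names.

```python
import math


def erk_step(f, u, t, dt, A, b, c):
    """One step of an s-stage explicit Runge-Kutta method.

    f(u, t) returns du/dt as a list; A must be strictly lower triangular.
    """
    s = len(b)
    K = []                                   # stage slopes K_1 ... K_s
    for k in range(s):
        # Stage solution u^(k) = u^n + dt * sum_{l<k} A[k][l] * K_l
        u_k = [u_i + dt * sum(A[k][l] * K[l][i] for l in range(k))
               for i, u_i in enumerate(u)]
        K.append(f(u_k, t + c[k] * dt))      # K_k = f(u^(k), t^n + c_k dt)
    # Update u^{n+1} = u^n + dt * sum_k b_k K_k
    return [u_i + dt * sum(b[k] * K[k][i] for k in range(s))
            for i, u_i in enumerate(u)]


# Heun's two-stage, second-order tableau (assumed; Table 1 is not reproduced).
A = [[0.0, 0.0], [1.0, 0.0]]
b = [0.5, 0.5]
c = [0.0, 1.0]


def decay(u, t):
    """Test problem du/dt = -u."""
    return [-x for x in u]


def integrate(n_steps):
    """Integrate the test problem from t = 0 to t = 1 with u(0) = 1."""
    u, t, dt = [1.0], 0.0, 1.0 / n_steps
    for _ in range(n_steps):
        u = erk_step(decay, u, t, dt, A, b, c)
        t += dt
    return u[0]


exact = math.exp(-1.0)
error_ratio = abs(integrate(50) - exact) / abs(integrate(100) - exact)
```

Halving the time step should reduce the error by a factor close to 4 for a second-order method, which is what `error_ratio` measures.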

Butcher tableaus conveniently represent this family of ERK methods. They are defined by a matrix $A \in \mathbb{R}^{s \times s}$ and two vectors $b, c \in \mathbb{R}^{s}$ [20]:

$$\begin{array}{c|c} c & A \\ \hline & b^{T} \end{array}$$

where $A$ is strictly lower triangular. For consistency, it is required that $c_k = \sum_{l} A_{kl}$. The order of the method is related to the constraints imposed on $A_{kl}$, $b_l$, and $c_k$. The variable $u^{(k)}$ represents the solution at stage $k$ of the method, which corresponds to the intermediate time $t^n + c_k \Delta t$. At each stage of the ERK method, $K_k$ is computed, that is, the steady-state residual evaluated for $u^{(k)}$ and multiplied by the inverse of the mass matrix. Afterwards, the next step solution $u^{n+1}$ is obtained by adding to $u^n$ a linear combination of the $K_k$. Some of these methods can be rewritten as a convex combination of Euler steps [21]. They may therefore be categorized in the family of strong stability preserving (SSP) time-stepping schemes. This property ensures that a certain norm of the solution, such as the total variation norm [21], does not increase in time. SSP numerical methods are often required for problems with discontinuous solutions, such as shock waves in hyperbolic problems. Nonphysical behaviors like spurious oscillations can be avoided in this way. Gottlieb et al. [22] discussed the RK SSP schemes in detail, and several examples of these methods can be found in [21]. As an example, consider the two-stage, second-order method, RK2a, defined by the Butcher tableau represented in Table 1.
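The constraints on the tableau coefficients can be checked mechanically. The sketch below tests explicitness, the row-sum consistency condition, and the classical first-order and second-order conditions ($\sum_k b_k = 1$ and $\sum_k b_k c_k = 1/2$). The function name `check_tableau` is hypothetical, and Heun's method is again used as an assumed stand-in for the tableau of Table 1.

```python
def check_tableau(A, b, c, tol=1e-12):
    """Report which structural and order conditions a Butcher tableau satisfies."""
    s = len(b)
    return {
        # Explicit method: A[k][l] = 0 for l >= k (strictly lower triangular)
        "explicit": all(A[k][l] == 0 for k in range(s) for l in range(k, s)),
        # Consistency: c_k equals the k-th row sum of A
        "row_sum": all(abs(sum(A[k]) - c[k]) < tol for k in range(s)),
        # First-order condition: sum_k b_k = 1
        "order1": abs(sum(b) - 1.0) < tol,
        # Second-order condition: sum_k b_k c_k = 1/2
        "order2": abs(sum(bk * ck for bk, ck in zip(b, c)) - 0.5) < tol,
    }


# Heun's two-stage method (assumed form of RK2a) and forward Euler for contrast.
heun = check_tableau([[0.0, 0.0], [1.0, 0.0]], [0.5, 0.5], [0.0, 1.0])
euler = check_tableau([[0.0]], [1.0], [0.0])
```

Forward Euler passes every check except the second-order condition, as expected for a first-order scheme.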

2.2 Time step restrictions

Even if ERK time integration methods are known to be very efficient for solving several types of PDEs, they have a major drawback because of their stability requirements. Indeed, the global time step must be kept below a critical value, determined by the CFL condition. For advection-dominated advection–diffusion equations, the CFL constraint on the time step can be expressed as a ratio of the grid spacing $\Delta x$ and the amplitude of the wave/advective velocity $c$. In almost all realistic scenarios, the CFL limit varies both spatially and temporally.

On the one hand, unstructured meshes have elements with a wide spectrum of sizes. Several numerical applications require that some regions of the domain are examined more closely. Local refinement is often needed to capture the topography of complex geometry and/or some specific physical behaviors. On the other hand, even for structured meshes, the wave speed may vary considerably across the entire domain. As an example, consider the case of the two-dimensional (2D) shallow water equations, where the wave speed is defined as $c = \sqrt{gH}$, with $g$ the gravitational acceleration and $H(x,y)$ a strongly varying water depth depending on the local horizontal coordinates. The global time step is then determined by the element where $H(x,y)$ reaches its maximum.

For problems with unstructured meshes, made up of N grid elements, and nonconstant wave velocities, the CFL condition can be written as:

  • display math(7)

where $\Omega_i$ represents element number $i \in \{1, \dots, N\}$ of the mesh. The constant $C$ depends on the particular PDE and on the ERK scheme that defines the shape of the stability zone [23]. This is a severe restriction on the time step to guarantee overall stability. In the case of the GBR mesh of Figure 1, the global time step is critically smaller than the one required for most elements.
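A small sketch of how such a global restriction arises from local ones. Equation (7) is not reproduced in this copy, so the code assumes the common form $\Delta t \leq C \min_i (\Delta x_i / c_i)$ with the shallow water wave speed $c_i = \sqrt{g H_i}$. The function names, the default value of the constant, and the sample sizes and depths are illustrative and are not data from the paper.

```python
import math

G = 9.81  # gravitational acceleration (m/s^2)


def local_stable_steps(sizes, depths, cfl_constant=1.0):
    """Local stable time step C * dx_i / c_i, with c_i = sqrt(g * H_i)."""
    return [cfl_constant * dx / math.sqrt(G * depth)
            for dx, depth in zip(sizes, depths)]


def global_stable_step(sizes, depths, cfl_constant=1.0):
    """Singlerate restriction: the most constrained element sets the step."""
    return min(local_stable_steps(sizes, depths, cfl_constant))


# Illustrative values only: element sizes in m and water depths in m.
sizes = [100.0, 100.0, 50.0]
depths = [10.0, 40.0, 10.0]
local_steps = local_stable_steps(sizes, depths)
dt_global = global_stable_step(sizes, depths)
```

In this example, quadrupling the depth halves the local stable step, just as halving the element size does, so both mesh grading and bathymetry feed into the restriction.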

The use of fully implicit time integration schemes constitutes a way to avoid the restriction mentioned before. In such strategies, the only limitation on the time step is accuracy. The drawback, however, is that implicit methods require solving large (non)linear systems of equations, whose dimension increases with the number of degrees of freedom.

Another alternative, while using explicit schemes, is to use a multirate approach. The main idea is to consider different regions in the discretized spatial domain where the CFL condition is locally satisfied. Mesh elements are sorted, by their own characteristic stable time step, in different groups, specified by a maximum time step, for which ERK methods are stable and achieve the target accuracy. With locally adapted time steps, the computational efforts of the global algorithm could be considerably reduced.

As an illustration, consider a 1D mesh, represented in Figure 2, where elements $\Omega_i$ have a size equal to $h$, except $\Omega_0$, which is half that size. If Equation (1) is to be solved on this mesh, assuming a constant wave speed $c$, one can determine the stable time steps for both kinds of cells. If $\Delta t/2$ is stable for $\Omega_0$, then $\Delta t$ may be assumed stable for $\Omega_i$, $i > 0$. Consider that we are able to apply the same $s$-stage ERK method with a time step $\Delta t/2$ on $\Omega_0$ and $\Delta t$ on the other elements. For a large value of $N$, the number of elements in the mesh, one can show that a speedup approaching 2 is obtained compared with the same $s$-stage ERK method applied with the same time step $\Delta t/2$ everywhere.
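The speedup claimed for this 1D example can be reproduced with a simple cost count. The sketch assumes that the cost is proportional to the number of stage evaluations per element and ignores any extra work at the interface between the two kinds of elements; `ideal_speedup` is an illustrative name.

```python
def ideal_speedup(n_elements, n_stages=2):
    """Idealized speedup for a mesh with one half-size element.

    Cost is counted as stage evaluations per element needed to advance
    the solution over one large step dt.
    """
    # Singlerate: every element takes two steps of size dt / 2.
    singlerate_cost = 2 * n_stages * n_elements
    # Multirate: one element takes two half steps, the others take one full step.
    multirate_cost = 2 * n_stages * 1 + n_stages * (n_elements - 1)
    return singlerate_cost / multirate_cost


speedup_small = ideal_speedup(10)
speedup_large = ideal_speedup(10**6)
```

The ratio equals 2N/(N + 1), independent of the number of stages, and tends to 2 as N grows.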

Figure 2. One-dimensional unstructured mesh.

But, in such a strategy, there exists an inconsistency at the interface between the small and large element because they use a different time step. Therefore, a coherent transition should be ensured at the interface between groups of elements. In particular, convergence and conservation properties should be fulfilled at the interfaces. This constitutes one of the major difficulties when developing multirate schemes.

3 MULTIRATE TIME INTEGRATION

Multirate schemes for conservation laws have been reported in the literature since the early 1980s, but they were either locally inconsistent or not mass conservative. Mass-preserving multirate schemes were developed by Osher and Sanders [24] as well as by Dawson and Kirby [25], but it turns out that the time-stepping accuracy of the overall method is only first-order because of the treatment of the interfaces. Tang and Warnecke [26] proposed multirate schemes, based on standard two-stage ERK methods, that achieve second-order consistency in time. The drawback, however, is that the resulting schemes are not mass preserving. Hundsdorfer et al. [27] discuss, within the framework of partitioned RK methods, the defects of multirate methods of first and second orders because of either the local inconsistency or the lack of mass conservation. They pay particular attention to monotonicity properties of the considered multirate schemes.

Multirate methods have also been developed in the framework of self-adjusting strategies. Savcenco et al. [28] consider implicit time-stepping methods suitable for stiff ODEs. Those methods use an error estimator to determine if smaller time steps are required to keep the error below a given tolerance for all components. The aim is to minimize the execution time without losing accuracy. At the interfaces, it is necessary to interpolate the solutions associated with different times. In this context, Hundsdorfer et al. [29] studied a particular multirate scheme: the θ-method with one level of temporal local refinement. Stability, local accuracy, and propagation of interpolation errors are analyzed in detail.

A first step in constructing multirate schemes is to ensure that the different local time steps are well synchronized. A solution for a coherent time progression is to combine different time steps that are integer multiples of each other. For the sake of simplicity, the analysis will be restricted to groups that have time step ratios of $\kappa = 2$. If a reference time step $\Delta t^*$ is assumed for the group with the largest stable time step, the other partitions will be time-integrated with stable time steps $\Delta t^*/2^z$, $z = 1, \dots, z^*$, with $z^* + 1$ the number of groups and $z$ the multirate exponent of the group. Multirate schemes may also be developed for arbitrary integer time step ratios $\kappa$, where the different groups would have stable time steps defined as $\Delta t^*/\kappa^z$.
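The synchronization rule can be illustrated by assigning a multirate exponent to each element from its local stable time step. This is a minimal sketch under stated assumptions, not the grouping procedure of Section 4: it ignores buffer groups and mesh connectivity, and it chooses the reference step as the smallest stable step multiplied by the largest power of two that does not exceed the largest stable step. The function name `multirate_exponents` is hypothetical.

```python
import math


def multirate_exponents(local_steps):
    """Return (dt_ref, exponents) such that dt_ref / 2**z_i <= local_steps[i]."""
    dt_min, dt_max = min(local_steps), max(local_steps)
    # Number of halvings separating the least and most constrained groups.
    z_star = int(math.floor(math.log2(dt_max / dt_min)))
    dt_ref = dt_min * 2 ** z_star     # reference step of the z = 0 group
    exponents = []
    for dt in local_steps:
        z = 0
        while dt_ref / 2 ** z > dt:   # halve until the local limit is respected
            z += 1
        exponents.append(z)
    return dt_ref, exponents


# The two extreme values are the GBR steps quoted in the introduction;
# the intermediate ones are made up for illustration.
steps = [0.154, 0.3, 1.0, 7.972]
dt_ref, exponents = multirate_exponents(steps)
```

With these values the reference step is 32 × 0.154 s, so six groups (z = 0 to 5) are enough to span the whole range of stable steps.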

The difficulty, when developing multirate strategies, is to manage the transition between groups of elements that use a different stable time step. Indeed, if a final time is to be reached, the number of stages of the respective ERK methods would not be the same on two neighboring elements that belong to different multirate groups. Some information is thus missing to ensure a coherent transition. This problem of communication reveals two underlying issues: conservation of the fluxes and accuracy of a multirate method. Both Constantinescu [14] and Schlegel [15] proposed a solution by introducing buffer groups. In these regions, adapted ERK methods are applied to bridge the transition between bulk groups where the standard ERK methods are used. Both multirate approaches are based on partitioned Runge–Kutta (PRK) schemes that are used to solve problems with two different ERK methods [18, 30].

3.1 Second-order conservative multirate Runge–Kutta schemes

The idea of Constantinescu and Sandu [14] is to extend singlerate ERK methods to multirate ERK methods. They developed a general systematic approach, based on PRK methods [30], to construct a family of second-order multirate PRK schemes (MPRK-2). For the sake of convenience, the following notation is used: inline image means that a given ERK method x is used with a $\Delta t^*$ time step and its associated Butcher tableau. The methodology consists in choosing a second-order accurate $s$-stage ERK base method inline image and extending it to a $2s$-stage ERK method in the buffer region. It should ensure the transition between two partitions that have a time step ratio $\kappa = 2$. Constantinescu [14] has shown that, if the ERK base method belongs to the family of the SSP time discretization methods, the corresponding multirate scheme will maintain this property.

For the sake of simplicity, the development of the multirate approach of Constantinescu is detailed through a basic 1D example on a mesh similar to Figure  2. This will clarify the features and the size of the buffer regions.

3.1.1 Introductory example

First of all, let us define some conventions and notations. Assume that the right-hand side $f_i$, in Equation (3), computed on element $\Omega_i$, can be split into the volume contribution $f_{i,i}$ and the left and right interface contributions $f_{i-1,i}$ and $f_{i+1,i}$, depicted in Figure 3, that use information from the neighboring elements:

  • display math(8)

The volume and interface terms for each element at each stage k of an ERK method may be defined as follows:

  • display math(9)

At each stage of the ERK method, inline image. For a classical singlerate ERK method, inline image because the normals at the interface between two neighboring elements are opposite each other. This property ensures global conservation of the fluxes after each iteration of the method.

Figure 3. One-dimensional unstructured mesh with interface fluxes, $f_{i,i-1}$ and $f_{i-1,i}$, between neighboring elements $\Omega_{i-1}$ and $\Omega_i$. inline image is applied on element $\Omega_0$, whereas inline image is used for elements $\Omega_1$, $\Omega_2$, and $\Omega_3$.

Consider the second-order accurate SSP ERK base method RK2a represented in Table 2(a) and the mesh depicted in Figure 2. The key idea is to extend the RK2a method, inline image, to a four-stage adapted Butcher tableau inline image, Table 2(b), where the base method is repeated twice on the same time interval. Actually, inline image is strictly equivalent to the base method inline image if it is used on all elements. The Butcher tableau shown in Table 2(c) corresponds to the base method applied twice successively with the same time step $\Delta t^*/2$. This Butcher tableau implicitly contains the update for $u^{n+1/2}$. In other words, RK2a is first applied to $u^n$ to obtain $u^{n+1/2}$, which corresponds to time $t^n + \Delta t^*/2$, and then again to $u^{n+1/2}$ to compute $u^{n+1}$. The methods inline image and inline image now have the same number of stages, which enables the transition between the two types of elements.

Table 2. Butcher tableaus corresponding to (a) the two-stage, second-order base method, (b) the buffer-adapted method, and (c) the two-stage, second-order method applied twice successively with a time step half as large.
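Since the tableaus of Table 2 are not reproduced in this copy, the sketch below builds tableaus of types (b) and (c) from a base method, following the verbal description in the text: the buffer method repeats the base method twice on the same interval, and the fast method applies the base method twice in succession with half the time step. The block structure is an assumption consistent with that description, and the base method is taken to be Heun's scheme as an assumed form of RK2a. The function name `mprk2_tableaus` is hypothetical.

```python
def mprk2_tableaus(A, b, c):
    """Build the buffer and twice-half-step tableaus from a base method (A, b, c)."""
    s = len(b)
    zero = [0.0] * s
    half_b = [w / 2 for w in b]

    # (b) Buffer method: base method repeated twice on the same interval.
    A_buffer = [row + zero for row in A] + [zero + row for row in A]
    c_buffer = c + c

    # (c) Fast method: base method applied twice with step dt / 2.
    # The second application starts from u^n + (dt / 2) * sum_k b_k K_k.
    A_fast = ([[a / 2 for a in row] + zero for row in A]
              + [half_b + [a / 2 for a in row] for row in A])
    c_fast = [x / 2 for x in c] + [0.5 + x / 2 for x in c]

    # Both methods share the same weights.
    b_shared = half_b + half_b
    return (A_buffer, b_shared, c_buffer), (A_fast, b_shared, c_fast)


# Heun's two-stage, second-order method (assumed form of RK2a).
base_A = [[0.0, 0.0], [1.0, 0.0]]
base_b = [0.5, 0.5]
base_c = [0.0, 1.0]
(A_buffer, b_buffer, c_buffer), (A_fast, b_fast, c_fast) = mprk2_tableaus(
    base_A, base_b, base_c)
```

With this base method the third row of the buffer tableau is entirely zero and the two c vectors differ at the second and third stages, which matches the stage-by-stage discussion in the text.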

Consider the setup of Figure 3: we apply inline image to element $\Omega_0$ and inline image to elements $\Omega_1$, $\Omega_2$, and $\Omega_3$. With the use of the Butcher tableaus of Table 2 and the definition of the volume, Equation (8), and interface contributions, Equation (9), on each element, it is now possible to develop the computations related to each stage of the method. To distinguish the intermediate times, defined by the $c$ vector of the Butcher tableau, we define inline image that represents the current inner time used at stage $k$ on element $\Omega_i$.

At the first stage of the coupled methods, there is no ambiguity: it is identical to applying the same base method RK2a everywhere. The inline image are all computed at the same intermediate time level inline image. Virtual incoming fluxes, $f_{-1,0}$ and $f_{4,3}$, are supplied at the boundary of the domain as represented in Figure 3. It is assumed that the virtual element $\Omega_{-1}$ (resp. $\Omega_4$) is of the same type, same size, and uses the same ERK method as its neighbor $\Omega_0$ (resp. $\Omega_3$).

  • display math

At the second stage, the intermediate time levels are not the same on each element, that is, inline image, whereas inline image for i = 1,2,3.

  • display math

Several simplifications can be performed at the third stage and are highlighted with bold characters. Every entry of the third row in Table  2 (b) is zero and therefore inline image for i = 1,2,3. It follows that inline image if and only if i and j belong to the set {1,2,3,4} implying that inline image and inline image. At this stage, the intermediate times are different: inline image, whereas inline image for i = 1,2,3.

  • display math

At the fourth and last stage, it can be deduced from the previous simplifications that inline image and that inline image. It follows that inline image for $i$ and $j$ belonging to the set $\{2, 3, 4\}$. The only simplification that can be performed at this level is thus inline image. The intermediate times, inline image, are logically all equal to $t^n + \Delta t^*$ at this last stage.

  • display math

The final operation is the update where the next step solution $u^{n+1}$ is computed by using Equation (6). From the earlier simplifications, highlighted in bold characters, it follows that

  • display math(10)
  • display math(11)
  • display math(12)
  • display math(13)

Equation (13) is thus equivalent to Equation (6) for the base method inline image in $\Omega_3$. This means that the four-stage adapted method collapses into the original two-stage base method RK2a if and only if the element is at a distance of at least two connected elements from $\Omega_0$. In other words, applying RK2a with inline image on $\Omega_0$ only has an influence on the integration scheme used on the next two elements, that is, $\Omega_1$ and $\Omega_2$. Therefore, a buffer region of size 2 is needed between the two bulk groups that are integrated with inline image and inline image, respectively. Elements $\Omega_1$ and $\Omega_2$ are stable for $\Delta t^*$ but require twice as many computations as $\Omega_3$. Despite that, this multirate approach requires fewer computations than the classical singlerate method.

About the conservation of the fluxes at the interfaces between elements, we can check that inline image for i = 1,2,3 and k = 1,2,3,4. Because the b vectors of the Butcher tableaus are equal, ba = b2b, for all elements, the sum of the fluxes cancels at each interface:

  • display math(14)
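The conservation argument can be checked numerically on a toy problem. The sketch below is an illustration under several assumptions and not the authors' implementation: the spatial operator is a piecewise-constant upwind discretization of 1D advection on a periodic mesh (a stand-in for DG), the four-stage tableaus are those obtained from Heun's method (assumed form of Table 2), and, for simplicity, every element other than the first uses the four-stage buffer tableau, which the text describes as equivalent to the base method when used alone. All names are hypothetical.

```python
def rhs(u, dx, c=1.0):
    """Upwind semidiscrete operator on a periodic mesh (stand-in for DG)."""
    return [c * (u[i - 1] - u[i]) / dx[i] for i in range(len(u))]


# Four-stage tableaus derived from Heun's method (assumed form of Table 2).
A_FAST = [[0, 0, 0, 0], [0.5, 0, 0, 0], [0.25, 0.25, 0, 0], [0.25, 0.25, 0.5, 0]]
A_BUFFER = [[0, 0, 0, 0], [1, 0, 0, 0], [0, 0, 0, 0], [0, 0, 1, 0]]
B_SHARED = [0.25, 0.25, 0.25, 0.25]   # identical weights for both methods


def multirate_step(u, dx, dt, tableau_of):
    """One partitioned RK step; tableau_of[i] is the A matrix used on element i."""
    n = len(u)
    K = []
    for k in range(4):
        # Each element builds its stage value with its own A coefficients.
        stage = [u[i] + dt * sum(tableau_of[i][k][l] * K[l][i] for l in range(k))
                 for i in range(n)]
        K.append(rhs(stage, dx))
    # The update uses the same weights everywhere, so interface fluxes cancel.
    return [u[i] + dt * sum(B_SHARED[k] * K[k][i] for k in range(4))
            for i in range(n)]


dx = [0.5] + [1.0] * 7                    # element 0 is half the size of the others
tableau_of = [A_FAST] + [A_BUFFER] * 7    # fast method on element 0 only
u = [1.0, 2.0, 3.0, 2.0, 1.0, 0.5, 0.5, 0.5]
mass_before = sum(h * v for h, v in zip(dx, u))
for _ in range(20):
    u = multirate_step(u, dx, 0.8, tableau_of)
mass_after = sum(h * v for h, v in zip(dx, u))
```

Because the update weights are shared, the total mass changes only at the level of round-off, even though element 0 is effectively advanced with two half steps.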

The so-called first-order and second-order conditions are verified for the two methods considered separately [14]. At the critical interface between $\Omega_0$ and $\Omega_1$, the order of the coupling between inline image and inline image has to be considered. The first-order coupling conditions are implicitly satisfied. It can be verified that the second-order PRK coupling conditions, that is,

  • display math(15)

are satisfied [30]. The RK2a multirate method of Constantinescu is thus globally second-order accurate. Indeed, for PRK methods, the global order is defined as the minimum among the orders of the two methods considered separately and the order of their coupling [30].

3.1.2 Generalization

The strategy of Constantinescu [14] may be used to manage different integer time step ratios. A time step ratio $\kappa = 2$ between the different multirate groups seems to be sufficient for our target applications. Stable time steps of two neighboring cells are assumed to be relatively close for the vast majority of the mesh elements. This multirate approach may be extended, not only to any $s$-stage ERK base method, as shown in Table 3, but also to multiple levels of refinement. It is nevertheless required, for an $s$-stage base method, that a buffer region of at least $s$ connected elements separates two bulk groups. It is only at that distance that it is possible to collapse the adapted method into the base method. This general property can be proved using the same arguments as in the earlier introductory example. Imbricated multirate groups for buffers of size 2, 3, and 4 are illustrated around Holbourne Island in Figure 13(a)–(c).

Elements are connected through their interfaces (nodes in 1D, segments in 2D, and faces in three-dimensional (3D)) in a DG formulation. This is a major advantage, in the context of multirate methods, compared with the standard continuous FEM where all types of elements are connected through nodes. With continuous elements, buffer regions are therefore generally considerably larger than in the discontinuous case, and more elements need twice as many operations as required by their stable time step. The efficiency of the multirate methods is therefore lower. Another issue, with continuous elements, is the handling of the mass matrix, which is not block diagonal and thus couples the whole solution. This would complicate the use of several time steps on different multirate groups. However, we did not investigate in practice the multirate approach for continuous finite elements.

Table 3. Butcher tableaus for (a) the arbitrary s-stage explicit Runge–Kutta base method, (b) the adapted buffer method, and (c) the base method with half of the time step applied twice successively.

Because this multirate strategy is based on the PRK method, the order of the coupled method can be obtained as the minimum among the orders of the base methods used and the order of their coupling [30, 31]. Constantinescu [14] has shown that the MPRK-2 schemes, defined by the Butcher tableaus in Table 3, are (1) second-order accurate if the base method is at least second-order accurate and (2) have at most a second-order accurate coupling regardless of the order of the base method. The third-order coupling conditions are never all satisfied for this multirate strategy [14]. It is actually at each critical interface, between a buffer group and a more constrained bulk group, that the coupling reduces to second-order accuracy.

In spite of the order restrictions, the MPRK-2 schemes present the advantage of being conservative. It is shown in [14] that any PRK method with the same weights (ba = b2b) is conservative. In particular, MPRK-2 (described by Table 3) is conservative.

Multiple levels of refinement may be defined recursively on nested multirate groups. The base method, inline image, and the associated adapted method, inline image, are applied successively to the different buffer and bulk groups. Consider an arbitrary problem with multiple levels of refinement. Bulk groups, inline image, are integrated with a $\Delta t^*/2^z$ stable time step as well as their neighboring buffer groups, inline image. This procedure is illustrated in Figure 4 for a general case. The overall speedup that this technique would yield compared with a classical singlerate ERK method strongly depends on the number of elements that are allocated to each multirate group.
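The dependence of the speedup on the group sizes can be made explicit with a simple cost model. The sketch assumes that every element costs the same per base-method application, that an element counted under exponent z performs $2^z$ such applications per reference step $\Delta t^*$ (buffer elements being counted under the exponent that matches their actual work), and that communication and bookkeeping overheads are negligible. It is an illustration of the idea, not the optimization addressed in Section 4, and `theoretical_speedup` is a hypothetical name.

```python
def theoretical_speedup(elements_per_exponent):
    """Speedup of a multirate scheme over a singlerate scheme.

    elements_per_exponent[z] is the number of elements that perform
    2**z base-method applications per reference step.
    """
    z_star = len(elements_per_exponent) - 1
    n_total = sum(elements_per_exponent)
    # Singlerate: every element is advanced with the smallest step.
    singlerate_cost = n_total * 2 ** z_star
    # Multirate: each element only pays for its own exponent.
    multirate_cost = sum(n * 2 ** z for z, n in enumerate(elements_per_exponent))
    return singlerate_cost / multirate_cost


# Illustrative distributions (not data from the paper).
mostly_coarse = theoretical_speedup([6, 2])      # 6 elements at z = 0, 2 at z = 1
all_fine = theoretical_speedup([0, 0, 5])        # every element in the finest group
```

The more elements sit in the groups with small exponents, the closer the speedup gets to its upper bound $2^{z^*}$; if every element belongs to the finest group, no gain is possible.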

Figure 4. Multiple levels of refinement. The multirate exponent of a group is z.

3.2 Recursive flux splitting multirate

Knoth and Wolke [17] developed IMEX integration methods in the context of advection–diffusion equations in air pollution applications. An efficient solution is expected by splitting the right-hand side of the differential equation (16) in a nonstiff advection part, the inline image term, and a stiff diffusion part, the inline image term

  • display math(16)

that are solved with an explicit and an implicit method, respectively. The IMEX method should ensure that the cumulative integration interval for inline image equals the explicit time step used for inline image. The key idea of Schlegel et al. [15] is to consider that inline image is nonstiff like inline image but is restricted by a smaller time step, then solve them together with an inner method for inline image and an outer method for inline image that are both ERK methods. The same base method can either be used for the two parts or two different schemes can be mixed. These choices strongly depend on the stability requirements with respect to inline image. An imbricated system with s × q stages is obtained by combining an s-stage outer method inline image with a q-stage inner method inline image. For a complete explanation about the construction of the new Butcher tableaus, see [15]. The resulting method is called the recursive flux splitting multirate (RFSMR) and may be written in a PRK form:

  • display math(17)
  • display math(18)

for k = 1, …, s, where inline image and inline image are the RK parameters of the resulting method. They are obtained by combining the parameters of the original outer and inner methods: AO, bO, cO and AI, bI, cI.

Order conditions can be established for these mixed schemes. They consist of both the classic order conditions for the base ERK methods and additional coupling conditions [31]. It can be shown that the resulting methods for inline image and inline image as well as their coupling are second-order accurate if and only if the underlying base methods are at least second-order accurate. Knoth and Wolke [17] have derived an additional third-order consistency condition that, when satisfied by the base method, leads experimentally to a third-order accurate multirate scheme. Yet, the theoretical proof of this property remains an open question. In particular, the RK43 scheme represented in Table 4, used as inner and outer method, fulfills this condition and leads to a third-order accurate multirate scheme.

Table 4. RK43 Butcher tableau.

The two resulting schemes, inline image and inline image, each with s² = 16 stages, may be constructed [15]. The ten-stage methods, represented in Tables 5 and 6, are obtained by eliminating redundant rows and columns in the resulting Butcher tableaus. More explanations about the RFSMR method and its properties can be found in [15]. Our analysis is restricted to the interpretation of the resulting schemes and their effective application in a multirate approach.

Table 5. inline image-Outer buffer.
Table 6. inline image-Inner buffer.

The inline image (resp. inline image) method is strictly equivalent to the base method RK43 applied once with a time step Δt* (resp. twice successively with a time step Δt*/2) if this method alone is used for all the domain variables. Indeed, if the bold entries of Table 5 (resp. 6) are gathered, by eliminating rows and columns that are redundant when the method is considered independently, the Butcher tableau of method RK43 (resp. 2 × RK43 with half of the time step) is obtained. These methods are used in inner and outer buffer groups that accommodate the transition between two bulk groups that have a time step ratio κ = 2.

The critical interface is now located between the inner and the outer buffer group. Because inline image, the solutions at each inner stage of the method are all computed at the same intermediate time steps inline image. This is not sufficient to draw a conclusion about the order of the coupled method, but it considerably simplifies the third-order conditions that have to be satisfied. A drawback of the method is that the fluxes are not conserved at the critical interface because inline image. However, if we consider a partitioning based on fluxes rather than on components, mass preservation is guaranteed at any stage of the explicit PRK method [15, 27]. This approach has not been investigated in this paper.

The buffer groups have a different meaning compared with the multirate approach of Constantinescu [14]. The total buffer has a size of two connected elements, but it is not necessary that it separates two bulk groups. An inner buffer either separates a bulk group and an outer buffer group or two outer buffer groups that have different stable time steps. This property can easily be verified by developing the Butcher tableaus presented earlier. Construction of appropriate multirate groups for the approach of Schlegel et al. [15] will be detailed in the next section. Figure 13(d) illustrates the different multirate groups for the method of Schlegel around Holbourne Island.

The Butcher tableaus have ten stages in the buffer regions. This is more than the eight stages needed when the base method RK43 is applied twice successively. However, Table  5 indicates that only K1, K5, K6, and K10 have to be computed, whereas in Table  6, K5 and K10 are superfluous. Indeed, some columns in the corresponding Butcher tableaus are equal to zero everywhere. This has to be taken into account when implementing this method, but the ten intermediate solutions u(k) are effectively needed to ensure a coherent transition between the multirate groups.

To see how the two buffers communicate with their respective bulk neighbors, we just need to copy Tables 5 and 6 and replace all the nonbold entries by zeros. The two resulting Butcher tableaus are then respectively equivalent to RK43 applied with a Δt* time step and RK43 applied twice successively with a Δt*/2 time step.

4 MULTIRATE GROUPS

The key idea to achieve the best speedup is to take advantage of the multiple levels of refinement. But the speedup that can be reached with such multirate strategies strongly depends on the distribution of the characteristic stable time steps among the elements of the mesh. In particular, the gap between the minimum and the maximum stable time steps as well as the number of elements present in each multirate group has a significant influence on the computational efficiency of the methods. Therefore, the mesh elements are to be organized in an optimized way.

The major difficulty, when implementing multirate methods, is to manage the different groups and the communication between them. In the 2D-DGM framework, we propose to gather elements in groups that share the same multirate characteristics and then treat them one by one. Inside each group of elements, three types of groups of interfaces are distinguished: (i) interfaces that are common to elements of the same element group; (ii) interfaces that are common to two different element groups; and (iii) interfaces that are part of a physical boundary. The multirate groups communicate through the interface groups of type (ii).

A generic way to construct these multirate groups is developed in Section 4.1. Afterwards, in Section 4.2, two efficiency issues are addressed: the influence of the reference time step on the speedup and the duplicate computations of interface residuals.

4.1 Construction of multirate groups

Consider that the stable time steps may be computed for each element of a mesh. The minimum and maximum stable time steps are denoted Δtm and ΔtM. A reference time step Δt* < ΔtM is fixed. By using a time step ratio κ = 2, the different time step ranges can be defined recursively. The maximum multirate exponent z* is determined as follows:

  • display math(19)

such that inline image. Elements may now be sorted into z* + 1 groups. Indeed, the stable time step of each element in the mesh belongs to one of the following sets:

  • display math(20)

where z stands for the multirate exponent of the group. Because buffer groups have to be inserted, a tag θ is attributed to each multirate group Ωθ:

  • display math(21)

where the integer σ defines whether the group is characterized as a bulk, σ = 0, an inner buffer, σ = 1, or an outer buffer, σ = 2. This is a general notation that is adapted to manage the two multirate strategies. For an s-stage MPRK-2 method, the inner buffer groups are always empty sets and the outer buffer groups have a size s. Note that two bulk groups that are integrated with two different time steps never have neighboring elements. The building procedure is illustrated step by step, for the two multirate approaches, on a simple mesh represented by Figure  5(a). For the method of Constantinescu, the illustration is limited to the multirate scheme that uses a two-stage base method, but it can be extended to a buffer of any size.
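The sorting and tagging described above can be sketched in a few lines of Python. Because Equations (19)–(21) are not reproduced here, the sketch assumes that each element receives the smallest exponent z for which Δt*/2^z does not exceed its stable time step, and that the tag is θ = 3(z* − z) + σ, which is consistent with the tags 3(z* − z) and 3(z* − z) + 2 quoted in the text. The function names and the sample time steps are illustrative only.

```python
def max_multirate_exponent(dt_min, dt_ref):
    """Smallest z* >= 0 with dt_ref / 2**z* <= dt_min (assumed reading of Eq. 19)."""
    z_max = 0
    while dt_ref / 2 ** z_max > dt_min:
        z_max += 1
    return z_max


def multirate_exponent(dt_elem, dt_ref, z_max):
    """Smallest z in [0, z_max] with dt_ref / 2**z <= dt_elem (assumed reading of Eq. 20)."""
    z = 0
    while z < z_max and dt_ref / 2 ** z > dt_elem:
        z += 1
    return z


def tag(z, z_max, sigma=0):
    """Tag theta = 3 (z* - z) + sigma; sigma = 0 bulk, 1 inner buffer, 2 outer buffer."""
    return 3 * (z_max - z) + sigma


# Hypothetical stable time steps (in seconds) for five elements; dt_M = 9.0.
stable_dt = [9.0, 8.0, 4.5, 2.0, 1.0]
dt_ref = 8.0  # reference time step, chosen below dt_M
z_max = max_multirate_exponent(min(stable_dt), dt_ref)
bulk_tags = [tag(multirate_exponent(dt, dt_ref, z_max), z_max) for dt in stable_dt]
print(z_max, bulk_tags)
```

With these sample values the elements fall into the four bulk groups Ω9, Ω6, Ω3, and Ω0; buffer tags are only introduced afterwards by the smoothing step.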

Figure 5. Construction of multirate groups for discontinuous elements. Inner buffer groups are empty for the method of Constantinescu (b). Inner buffer groups may cover an entire bulk group for the method of Schlegel, that is, tags 1 and 7 (d).

The first step, common to both methods, is to assign a bulk tag, defined by Equation (20), to each element depending on its characteristic time step. Buffer groups are neglected at this level. As illustrated in Figure 5(a), there are four initial groups: Ω0, Ω3, Ω6, and Ω9. The transition between them is then ensured by introducing the buffer groups.

The procedure is quite simple for the method of Constantinescu. Because there are no inner buffer elements, the tags are either equal to 3(z* − z) or 3(z* − z) + 2. The buffers have a size of two connected elements because the base method, RK2aC, has two stages. The group with the smallest multirate tag, Ω0, remains the same and then successively the buffer elements are introduced. At the same time, it is ensured that two neighboring elements have neighboring tags. In other words, the tags of the elements are smoothed, that is, two connected elements have either the same tag, if they are both members of the same buffer or bulk group, or two successive tags. Figure 5(b) shows the distribution of the mesh elements in their respective multirate groups.
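One possible way to realize this smoothing is sketched below in Python. It is not the authors' implementation: it is a hypothetical level-by-level sweep that, starting from Ω0, grows a buffer of the requested size into the coarser neighbors and then caps the tag of the elements touching that buffer to the next bulk tag. The mesh is assumed to be given as a face-adjacency list, and the one-dimensional chain is purely illustrative.

```python
def smooth_tags(bulk_tags, neighbors, buffer_size):
    """Insert outer buffers (tag 3k + 2) so that connected elements carry
    equal or successive tags. Hypothetical sketch for a single buffer type."""
    tags = list(bulk_tags)
    for level in range(0, max(tags), 3):
        # Elements already assigned to this level or to a finer one.
        settled = {i for i, t in enumerate(tags) if t <= level}
        frontier = set(settled)
        # Grow the buffer layer by layer into the coarser region.
        for _ in range(buffer_size):
            layer = {j for i in frontier for j in neighbors[i] if j not in settled}
            for j in layer:
                tags[j] = level + 2
            settled |= layer
            frontier = layer
        # Elements touching the last buffer layer cannot be coarser than
        # the next bulk group.
        for i in frontier:
            for j in neighbors[i]:
                if j not in settled and tags[j] > level + 3:
                    tags[j] = level + 3
    return tags


# Illustrative 1D chain of 12 elements with bulk tags 0, 3, and 9 only.
initial = [0, 0, 3, 3, 3, 3, 9, 9, 9, 9, 9, 9]
n = len(initial)
chain = [[j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)]
smoothed = smooth_tags(initial, chain, buffer_size=2)
print(smoothed)
```

In this toy chain, several elements end up in groups with a smaller time step than their bulk tag prescribed, which is the effect discussed in the text.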

The smoothing procedure is more complex for the method of Schlegel because two types of buffers are introduced. It is divided into two steps. The first one, shown in Figure 5(c), introduces the outer buffer elements. The technique is identical to that of the previous method but with an outer buffer of size 1. Afterwards, as illustrated by Figure 5(d), inner buffer elements are introduced by changing the tags of the elements that are still in the current bulk groups but have an interface in common with an upstream outer buffer group. A bulk group may be empty for the multirate method of Schlegel.

As expected, introducing buffer groups has a significant influence on the distribution of the elements. Indeed, many elements are attributed to groups that have a smaller time step than prescribed a priori. Multirate tags that were initially present may disappear because of the inserted buffer groups. In the example of Figure 5, no element remains in group Ω9. Nevertheless, inserting buffer groups is a necessary condition to construct a coherent multirate scheme.

If continuous elements were used for the spatial discretization, the efficiency would be worse. As shown in Figure 6(a) and (b), the impact of introducing buffer groups is much more severe than in the discontinuous case. Elements are no longer connected by faces but by nodes, and therefore the size of the buffer regions drastically increases.

Figure 6. Construction of the multirate groups for continuous elements. The buffer elements cover a drastically larger part of the domain than in the discontinuous case.

Given the algorithm that constructs the groups, it seems difficult to determine a priori the effective distribution of the elements in multirate groups. Therefore, the groups have to be built to compute the theoretical speedup. Furthermore, it will be shown in the next section that the choice of the reference time step Δt* has a significant influence on the distribution of the elements and therefore on the theoretical speedup.

4.2 Efficiency

4.2.1 Choice of the reference time step

In practice, the effective speedup of multirate versus singlerate is determined by taking the ratio of the two corresponding CPU times. However, it is worthwhile to have an a priori estimate of the theoretical speedup that could be achieved. The work performed at each stage of an ERK method for each individual element may be approximated as constant. The total work is then determined as the sum of the number of elements in each multirate group multiplied by the number of ERK stages that have to be performed in this group. The theoretical speedup, Sth, of a multirate method versus its singlerate equivalent can then be expressed as the ratio of their respective work:

  • display math(22)

where N stands for the total number of elements in the mesh. The discrete repartition function γ(i) defines the effective multirate exponent associated with element i. This means that 2^γ(i) · s stages have to be performed on element i to achieve a Δt* time step. The function γ depends not only on the stable time step of the element but also on its corresponding multirate group. This is because, in buffer regions, more stages have to be performed than prescribed a priori by their effective stable time steps.
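The work model above can be illustrated with a short Python sketch. Since Equation (22) is not reproduced here, the function below is only one reading of it: the singlerate scheme is assumed to need Δt*/Δtm steps of s stages on each of the N elements to cover one reference step, whereas the multirate scheme needs 2^γ(i) · s stages on element i, so that the stage count s cancels. The names and sample values are hypothetical.

```python
def theoretical_speedup(gamma, dt_ref, dt_min):
    """Ratio of singlerate to multirate work over one reference step dt_ref.

    gamma[i] is the effective multirate exponent of element i. The number
    of stages s is the same in both schemes and cancels out.
    """
    singlerate_work = len(gamma) * dt_ref / dt_min
    multirate_work = sum(2 ** g for g in gamma)
    return singlerate_work / multirate_work


# Hypothetical mesh: three coarse elements and one element that must be
# advanced with the smallest step, with dt_ref = 16 * dt_min (z* = 4).
speedup = theoretical_speedup([0, 0, 0, 4], dt_ref=16.0, dt_min=1.0)
print(round(speedup, 3))
```

If every element carries the largest exponent, the function returns 1, that is, no gain over the singlerate scheme.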

Assume a fixed mesh with specific Courant numbers per element that do not vary in time. Two factors may then influence the theoretical multirate speedup defined by Equation (22): the reference multirate time step Δt* and the function γ. But for each Δt*, there exists an optimal configuration of the multirate groups. Actually, the repartition function γ is entirely dependent on Δt*. For a given Δt*, the number of multirate groups may vary as well as their organization and the number of elements present in each of them. Variations of Δt* may therefore have a significant influence on the theoretical speedup. The analysis can be reduced to a fixed range of Δt*.

Firstly, all Δt* such that 2Δt* < ΔtM have to be excluded. In this situation, Sth(2Δt*) > Sth(Δt*) because the reference time step is twice as large and γ differs only for the elements with the largest characteristic time steps, where it takes a larger value. The reference time step Δt* has to be chosen in a range defined as follows:

  • display math(23)

where α is a factor that determines the time step, αΔtm, used for the multirate groups with the largest multirate exponent. When α = 1, these groups are simply integrated with Δtm, the maximum time step allowed for global stability. For inline image, the multirate groups are exactly the same as for an α belonging to the set inline image and therefore inline image is an empty set. The objective is to determine the maximum of Equation (22) under the constraint defined by Equation (23). This can be summarized as

  • display math(24)

Multirate groups are generally almost the same for very small variations of α, but it seems very difficult to predict the maximum of the objective function Sth(α). The theoretical speedup can be computed for each α, in practice, and optimization techniques can be used to determine the optimal α*. Figure 7 shows the evolution of the reference time step Δt* and the corresponding multirate subdivision as a function of α for an arbitrary Δtm and ΔtM. For a certain inline image, the reference time step, Δt*, jumps to a new curve starting at ΔtM. Consequently, a new multirate subdivision appears and z* jumps from 2 to 3, which means that for inline image (resp. inline image) there will be three (resp. four) levels of refinement. This phenomenon partly shows the complexity of predicting the theoretical speedup as a function of parameter α.

Figure 7. Influence of α on Δt* and the number of multirate groups. For a certain value inline image, an additional multirate group appears.

4.2.2 Avoid duplicate computations

Two types of operations are performed during ERK time stepping: (1) summing up vectors when the current solution is computed at an inner stage of a method, Equation (4), or when the next step solution is updated, Equation (6); and (2) computing the steady-state residuals, Equation (5). Almost all the computational effort is spent in operations of the second type, which can be split into an interface term and a volume term. It was shown in the previous sections that the effective computational gain, when using multirate methods, relies on the amount of computation that is avoided compared with a singlerate method. In the DGM formulation, the interface contributions of the steady-state residual have to be computed only once at each stage of the method. Indeed, the interface fluxes seen by the two elements sharing a boundary are opposite to each other. Ideally, the interface terms of type (ii), at the boundary between two multirate groups, should only be computed once. However, this is not simple to implement because each group is treated separately and runs with a different time step. In our implementation, this superfluous computation of interface terms is avoided for the methods of Constantinescu but not for the method of Schlegel. Consequently, the effective speedup will be lower than the theoretical one for the method of Schlegel.
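The "compute once, apply twice" idea can be illustrated by the following minimal Python sketch. It is not taken from the authors' code: the interface list, the scalar state, and the upwind flux are hypothetical stand-ins for the DG interface term. Each interface flux is evaluated a single time and then added with opposite signs to the residuals of the two adjacent elements.

```python
def interface_residuals(n_elements, interfaces, numerical_flux):
    """Accumulate interface contributions with one flux evaluation per interface."""
    residual = [0.0] * n_elements
    for left, right in interfaces:
        flux = numerical_flux(left, right)  # evaluated once per interface
        residual[left] -= flux   # flux leaving the left element
        residual[right] += flux  # the same flux entering the right element
    return residual


# Hypothetical 1D example: four elements, upwind flux taken from the left state.
state = [1.0, 2.0, 4.0, 8.0]
calls = []


def upwind_flux(left, right):
    calls.append((left, right))  # record how often the flux is evaluated
    return state[left]


interfaces = [(0, 1), (1, 2), (2, 3)]
residual = interface_residuals(len(state), interfaces, upwind_flux)
print(residual, len(calls))
```

Because every flux enters the residual twice with opposite signs, the interior contributions sum to zero, which is the discrete counterpart of flux conservation. Treating each multirate group separately would instead evaluate the type (ii) interfaces once per adjacent group.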

5 NUMERICAL EXPERIMENTS AND RESULTS

The two multirate ERK methods are applied for the temporal integration in the framework of ocean modeling. A depth-averaged barotropic 2D model is used to compute the mean horizontal velocity vector u and the free-surface elevation η for shallow waters. Consider the nonconservative shallow water equations [9, 32]:

  • display math(25)
  • display math(26)

where f, g, ν, and ρ are respectively the Coriolis parameter, the gravitational acceleration, the horizontal eddy viscosity, and the mean water density. The actual water depth is H = h + η, where h is the reference water depth. The bottom and wind stresses are parametrized as τb and τs, respectively. The equations are discretized with inline image discontinuous finite elements for both elevation and velocity fields. Three methods of Constantinescu, RK2aC, RK33C, and RK44C based on the corresponding base methods, and the method of Schlegel RK43S are compared. The convergence and the performance of the methods are first analyzed on a simple shallow water test case. The step-by-step procedure of the different multirate strategies is then illustrated on a more realistic application, the GBR.

5.1 Convergence and performance of the multirate methods: an island in a rectangular basin

We introduce a simple shallow water test case to compare the different multirate methods in terms of efficiency, convergence, and conservation properties. Consider the water circulation in a rectangular closed basin defined on the domain [−W, W] × [−L, L], where W = 350 m and L = 75 m with an elliptic island in the middle. Figure 8(a) represents this domain with an associated bathymetry that varies between 10 and 5 m (around the island). The bottom stress is a quadratic dissipation term that depends on the bathymetry. A Coriolis force is also acting on the system. The initial condition corresponds to an exponential elevation where 0 m ≤ η(x, y) ≤ 0.05 m, as represented in Figure 8(b).

Figure 8. Illustrations related to the simple shallow water test case: an island in a rectangular basin. (a) Bathymetry. (b) Initial condition. (c) Coarse mesh.

Table 7. Theoretical speedups (for α = 1) corresponding to the four multirate methods evaluated for four meshes obtained by successive refinements of the original mesh of Figure 8(c).

Mesh size     h        h/2      h/4      h/8
# elements    892      3568     14272    57088
Δt* [s]       0.1084   0.0541   0.0270   0.0135
RK2aC         2.6283   2.8370   3.0110   3.1232
RK33C         2.4189   2.7022   2.9133   3.0638
RK44C         2.2296   2.5789   2.8239   3.0072
RK43S         3.2451   3.2484   3.2542   3.2553

We use this example to compare the different time-stepping schemes through three experiments. The first experiment compares the efficiency of the methods by measuring the integrated L² errors for both elevation and velocities as well as the CPU time after 25 s of simulation on different meshes. We consider four meshes that are obtained by successive refinements of the original mesh of Figure 8(c). The number of mesh elements, the reference time steps, and the theoretical speedups associated with each mesh and each multirate method are listed in Table 7. The maximum multirate exponent z* is 4 for all meshes and methods. This means that the time step for the singlerate methods is Δtm = Δt*/2⁴. The stability requirements of the explicit temporal discretization limit the time step to very small values associated with each mesh. Therefore, the temporal error is much smaller than the spatial error and the total error is expected to scale as the spatial error [33]. In this case, we use a inline image discretization, and we expect second-order convergence for all fields when the mesh is refined. Indeed, as represented in Figure 9, a convergence of order two is observed for both elevation and velocity fields regardless of the time-stepping method used. Multirate methods have thus no adverse effect on the global space-time error. Moreover, all the multirate methods give a better ratio between CPU time and error than the singlerate ones because they need fewer operations. Figure 9 reveals that the RK2aC method is the most efficient. This is because it yields the best effective speedup compared with the number of stages of the method and the corresponding buffer size. The RK43S method needs four stages and has an effective speedup that is lower than the theoretical one, for the reasons mentioned in Section 4.2.

Figure 9. Integrated L² errors for the elevation (a) and velocities (b) as functions of the CPU time for the four selected multirate schemes and their singlerate equivalents. The errors are computed after 25 s of simulation. The first mark of each curve corresponds to the reference mesh of Figure 8(c) with an element size h. The three next marks are associated with three successive refinements of the original mesh: h/2, h/4, and h/8. The maximum multirate exponent is z* = 4 for all the multirate schemes applied on all meshes (with α = 1). Second-order convergence is observed for all the schemes as expected.

Figure 10(a) and (b) gives the L² error as a function of the CPU time for both elevation and velocity fields after 0.5 s of simulation. The same mesh, represented by Figure 8(c), is used for all computations but with different time steps. The original time step associated with the mesh is divided successively by a factor 2 such that the pure temporal error is visible. The expected convergence rates are observed for all time-stepping schemes. However, the multirate methods produce larger temporal errors than their singlerate counterparts. For multirate methods, the error associated with the largest time step Δt* propagates to all mesh elements after a certain time. The global temporal error is of the order of the largest time step. In our case, the largest time step of the multirate RK2aC method is 16 times bigger than the time step of the singlerate RK2a method. Because both methods are second-order accurate, the error is 16² = 256 times larger for the multirate one. This may be verified in Figure 10 by comparing the two blue convergence curves. The third-order accurate RK43S method gives the best precision for a fixed CPU time. This method is even more accurate than the RK2a method after three temporal refinements. All the methods of Constantinescu achieve second-order accuracy, but RK33C is slightly more efficient than RK2aC and RK44C.
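The scaling argument used in this paragraph can be made explicit with a small Python sketch. It assumes the usual asymptotic model in which the temporal error behaves like C·Δt^p for a method of order p; the helper names and the sample error values are illustrative and are not data from the experiments.

```python
import math


def error_ratio(step_ratio, order):
    """Expected ratio of temporal errors when the time step is multiplied
    by step_ratio, assuming error ~ C * dt**order."""
    return step_ratio ** order


def observed_order(error_coarse, error_fine, refinement=2.0):
    """Convergence order estimated from two errors obtained with time steps
    dt and dt / refinement."""
    return math.log(error_coarse / error_fine) / math.log(refinement)


# Multirate RK2aC uses a largest step 16 times the singlerate step (z* = 4).
print(error_ratio(16, 2))  # second-order methods
# Hypothetical errors measured with dt and dt / 2.
print(observed_order(4.0e-3, 1.0e-3))
```

The same helper gives the slope that is read off the convergence curves: a halving of the time step that divides the error by about four indicates second order, and by about eight indicates third order.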

Figure 10. Integrated L² errors for the elevation (a) and velocities (b) on the original mesh of Figure 8(c) as functions of the CPU time for the four selected multirate schemes and their singlerate equivalents. The errors are computed after 0.5 s of simulation. Each mark is associated with a fraction of the original time step: Δtm, Δtm/2, Δtm/4, and Δtm/8 for the singlerate methods and Δt*, Δt*/2, Δt*/4, and Δt*/8 for the multirate methods. The expected convergence rates are observed for the eight schemes: second-order convergence for the three schemes of Constantinescu and third-order convergence for the scheme of Schlegel.

Finally, we compare the conservation properties of the four selected multirate schemes. The total water volume at a time t is computed as follows:

  • display math(27)

We evaluate the conservation defect of a particular method as

  • display math(28)

Figure  11 shows the conservation defects for the selected multirate methods for a simulation of 200 s. The experiment confirms the theory. There is a perfect conservation of the total water volume for the schemes of Constantinescu and not for the scheme of Schlegel.
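As an illustration of how such a diagnostic may be evaluated, the Python sketch below computes a discrete water volume and a relative conservation defect. Equations (27) and (28) are not reproduced here, so two assumptions are made: the volume is approximated by summing element area times mean water depth H = h + η, and the defect is the change in volume divided by the initial volume, in line with the caption of Figure 11. All names and values are hypothetical.

```python
def total_volume(areas, mean_depths):
    """Discrete water volume: sum over elements of area * mean depth H = h + eta."""
    return sum(a * d for a, d in zip(areas, mean_depths))


def conservation_defect(volume_t, volume_0):
    """Relative volume added (positive) or removed (negative) since t = 0."""
    return (volume_t - volume_0) / volume_0


# Hypothetical two-element mesh (areas in m^2, depths in m).
areas = [2.0, 3.0]
v0 = total_volume(areas, [10.0, 5.0])
# Water only moves from one element to the other: the volume is unchanged.
v_conservative = total_volume(areas, [9.7, 5.2])
# A non-conservative update that creates water in the first element.
v_defective = total_volume(areas, [10.5, 5.0])
print(conservation_defect(v_conservative, v0), conservation_defect(v_defective, v0))
```

A conservative scheme should keep the defect at round-off level over the whole simulation, whereas a non-conservative scheme shows a defect that departs from zero.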

Figure 11. Conservation defects evaluated as the mean water volume per cubic meter that is added or removed from the original total water volume as a function of time. As expected, the methods of Constantinescu are conservative. An oscillation of the relative mass is observed for the one of Schlegel.

Table 8. Comparison of the four selected multirate schemes.

Multirate method   α*      Δt* [s]   z*   % Outer buffer   % Inner buffer   Theoretical speedup   Experimental speedup
RK2aC              0.75    7.381     6    20.0 %           0 %              4.606                 4.461
RK33C              0.735   7.233     6    26.3 %           0 %              4.396                 4.327
RK44C              0.695   6.840     6    31.3 %           0 %              4.218                 4.183
RK43S              0.855   4.207     5    12.9 %           13.0 %           5.364                 4.343

From those three experiments, the RK2aC scheme seems the most appropriate for applications with inline image spatial discretization. It delivers the best total speedup (lowest number of stages, minimum buffer size) and respects an important property in oceanography: mass conservation. Moreover, although some numerical experiments show that the RK43S method has good monotonicity behavior in terms of the TV norm [15], the RK2aC scheme should theoretically behave better for problems that develop shocks because it inherits the monotonicity properties [21, 22] of the base method [14, 27]. Note that first-order time-stepping schemes are too dissipative and therefore inappropriate. However, the other schemes may present some advantages for other applications, for example, when the temporal error is dominant.

5.2 Hydrodynamics of the Great Barrier Reef

Consider the unstructured mesh of the GBR, depicted in Figure  1, on which the 2D shallow water equations (25) and (26) are solved. The parametrization of the equations as well as multiple details about the model and the mesh can be found in [10]. Bathymetry, wind stress, and open sea boundary conditions are obtained from terrain data or measured data. A zero mass flux and a tangential momentum, proportional to the mean tangential velocity, are imposed along the impermeable boundaries (coasts and islands). The parametrization of Smagorinsky [34] for the kinematic viscosity ν is used to incorporate unresolved features.

The four multirate approaches have been tested. Figure  12 shows the theoretical speedup depending on parameter α. As expected, we observe that inline image. The three methods of Constantinescu yield a curve of almost the same shape, but a shift of the maximum is observed. Because the number of elements in the buffer groups increases with the number of stages of the method, the speedup declines. The method of Schlegel, RK43S, achieves a significantly higher speedup because of buffers that do not perform more expensive operations than actually needed.

Figure 12. Theoretical speedup as a function of parameter α ∈ inline image for the Great Barrier Reef test case.

The optimal values for α associated with the reference time step, the maximum multirate exponent z* as well as the corresponding theoretical speedups are listed in Table 8 for the four different multirate methods. An illustration of the corresponding multirate groups is shown in Figure 13 for a zoom around Holbourne Island. Observe that for RK2aC, RK33C, and RK44C, the size of the buffer groups increases with the number of stages of the base method. For RK43S, a distinction can be made between the inner and outer buffer groups that are both of size 1. Figure 14 shows the multirate groups on the whole GBR for method RK2aC. The global percentage of inner and outer buffer elements for the whole GBR mesh is given in Table 8. The optimal values of α yield a maximum multirate exponent z* = 6 for the three methods of Constantinescu, whereas z* stays at 5 for the method of Schlegel.
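The entries of Table 8 can be cross-checked with a short Python sketch. It relies on the statement of Section 4.2.1 that the groups with the largest multirate exponent are integrated with αΔtm, which is read here as Δt* = αΔtm·2^z*. Under that assumption, each row of the table implies a value of Δtm, and the four values should agree because the mesh is the same. The dictionary layout is illustrative.

```python
# (alpha*, dt_ref in seconds, z*) as listed in Table 8.
table8 = {
    "RK2aC": (0.75, 7.381, 6),
    "RK33C": (0.735, 7.233, 6),
    "RK44C": (0.695, 6.840, 6),
    "RK43S": (0.855, 4.207, 5),
}

# Minimum stable time step implied by dt_ref = alpha * dt_min * 2**z*.
implied_dt_min = {
    name: dt_ref / (alpha * 2 ** z_max)
    for name, (alpha, dt_ref, z_max) in table8.items()
}

for name, dt_min in implied_dt_min.items():
    print(f"{name}: implied dt_min = {dt_min:.4f} s")
```

All four rows point to a minimum stable time step of roughly 0.154 s, which supports this reading of the relation between α, Δt*, and z*.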

Figure 13. Multirate groups around Holbourne Island associated with the four multirate strategies obtained for the optimal Δt*. Outer buffer groups are colored in red and inner buffer groups in blue. Bulk groups have colors that vary from light gray to dark gray depending on their multirate exponent.

Figure 14. Multirate groups for the RK2aC method on the whole Great Barrier Reef mesh. Elements have colors that depend on their multirate groups. Small (resp. large) time steps are used on blue (resp. red) elements.

The four selected methods were used to run the GBR test case, and the CPU times have been measured. The same runs have also been performed with the corresponding singlerate methods where the global time step is simply the minimum among all. The experimental speedups, listed in Table 8, are obtained by taking the ratio of the singlerate and multirate CPU times. The theoretical and experimental speedups are relatively close for the three methods of Constantinescu, whereas the method of Schlegel yields a lower experimental speedup. As mentioned in Section 4.2, this loss of efficiency results from the computational overhead caused by the superfluous operations performed at the interfaces between multirate groups. Recall that the speedup strongly depends on the kind of mesh that is used. Significantly higher speedups may be obtained for the same problem with other meshes where the ratio between the average and the smallest element size is drastically larger.
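The gap between theory and measurement can be quantified directly from the speedups listed in Table 8. The Python sketch below simply divides the experimental speedup by the theoretical one for each method; the variable names are illustrative.

```python
# (theoretical speedup, experimental speedup) as listed in Table 8.
speedups = {
    "RK2aC": (4.606, 4.461),
    "RK33C": (4.396, 4.327),
    "RK44C": (4.218, 4.183),
    "RK43S": (5.364, 4.343),
}

# Fraction of the theoretical speedup that is actually achieved.
achieved = {name: exp / theo for name, (theo, exp) in speedups.items()}

for name, fraction in achieved.items():
    print(f"{name}: {100 * fraction:.1f} % of the theoretical speedup")
```

The three methods of Constantinescu reach about 97 to 99 % of their theoretical speedup, whereas RK43S reaches about 81 %.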

The RK2aC scheme is used to perform a 24-h simulation on the mesh presented in Figure  1 with data corresponding to the first of March 2000. A plot of the velocity vectors and the sea surface elevation is presented in Figure  15 corresponding to time 21:51:23. Tidal jets and eddies resulting from the interaction of the flow with the topography near the open-sea boundary are clearly visible.

Figure 15. Sea surface elevation (color levels) and bidimensional velocity field (arrows) around the Whitsunday Islands archipelago. Velocity vectors have a norm that varies between 0 and 0.822 m/s.

6 CONCLUSION AND FUTURE WORK

In this paper, two ERK multirate approaches have been implemented, in the DGM framework, for solving large-scale problems with different time steps. The first strategy is conservative and reaches second-order accuracy, whereas the second one is not conservative but is third-order accurate. Even if the multirate methods are more complex to implement than their singlerate equivalents, they inherit many properties that make them particularly well suited to multiscale simulations. A significant speedup, for a well-chosen reference time step, has been observed for the two kinds of multirate methods on an unstructured mesh of the GBR. However, the speedup turns out to be highly dependent on the nature of the mesh. Furthermore, other parameters, such as the choice of the reference time step, have a significant impact on the speedup.

Large-scale applications such as the GBR require the use of parallel computers. Some kind of load balancing strategy has to be supplied to accommodate multirate schemes. Indeed, small elements have a higher cost than large elements in such a strategy and will require more frequent updates at interprocessor interfaces. The key idea consists in creating an optimized mesh partition such that the number of grid cells of the different multirate groups is ideally the same on each computer core. However, a compromise should probably be found between the effective work on each processor and the amount of communication between them.

Until now, we have not considered that some parameters related to the local stability condition may change in time. The meshes could be adapted at some time steps, and the multirate groups would have to be updated accordingly. A more physical constraint is that the wave/advective velocity changes considerably in time and could cause the solution to blow up after a certain number of iterations. A criterion could possibly be found to determine whether it is worthwhile to recompute the multirate groups at a certain moment to remain stable throughout the simulation.

ACKNOWLEDGEMENTS

Bruno Seny is a research fellow with the Belgian Fund for Research in Industry and Agriculture (FRIA). Jonathan Lambrechts is supported by grants from the Belgian National Fund for Scientific Research (FNRS). The present study was carried out within the scope of the project ‘Taking up the challenges of multi-scale marine modeling’, which is funded by the Communauté Française de Belgique, as Actions de Recherche Concertées, under contract ARC 10-15/028.

REFERENCES
