Resolving Wave Propagation in Anisotropic Poroelastic Media Using Graphical Processing Units (GPUs)

Biot's equations describe the physics of hydromechanically coupled systems establishing the widely recognized theory of poroelasticity. This theory has a broad range of applications in Earth and biological sciences as well as in engineering. The numerical solution of Biot's equations is challenging because wave propagation and fluid pressure diffusion processes occur simultaneously but feature very different characteristic time scales. Analogous to geophysical data acquisition, high resolution and three dimensional numerical experiments lately redefined state of the art. Tackling high spatial and temporal resolution requires a high‐performance computing approach. We developed a multi‐ graphical processing units (GPU) numerical application to resolve the anisotropic elastodynamic Biot's equations that relies on a conservative numerical scheme to simulate, in a few seconds, wave fields for spatial domains involving more than 1.5 billion grid cells. We present a comprehensive dimensional analysis reducing the number of material parameters needed for the numerical experiments from ten to four. Furthermore, the dimensional analysis emphasizes the key material parameters governing the physics of wave propagation in poroelastic media. We perform a dispersion analysis as function of dimensionless parameters leading to simple and transparent dispersion relations. We then benchmark our numerical solution against an analytical plane wave solution. Finally, we present several numerical modeling experiments, including a three‐dimensional simulation of fluid injection into a poroelastic medium. We provide the Matlab, symbolic Maple, and GPU CUDA C routines to reproduce the main presented results. The high efficiency of our numerical implementation makes it readily usable to investigate three‐dimensional and high‐resolution scenarios of practical applications.


Introduction
Majority of the most powerful supercomputers on the world host hardware accelerators to sustain calculations at the petascale level and beyond.Graphical processing units (GPUs) are amongst widely employed hardware accelerators, initiating a revolution in high-performance computing (HPC) in the last decade.The three-dimensional calculations targeting billions of grid cells -technically impossible resolutions decades ago -became reality.This major breakthrough in HPC and supercomputing comes however with the cost of developing and reengineering scientific codes to efficiently utilize the available computing power.Increasing the low-level parallelism is the key.In Earth sciences, HPC and GPU-accelerated applications target in particular forward and inverse seismic modeling and geodynamics -fields where high spatial and temporal resolutions as well as large spatial domains are required.We here develop a multi-GPU implementation for applications in seismic modeling in porous media.
Understanding seismic wave propagation in fluid-saturated porous media enables more accurate interpretation of seismic signals in Earth sciences.The two phase medium is represented by an elastic solid matrix (skeleton) saturated with a compressible viscous fluid.The dynamic response of such an isotropic two phase medium results in two longitudinal waves and one shear wave, as predicted by Frenkel (1944) (see also Pride and Garambois (2005)).The wave of the first kind (fast wave) is a true longitudinal wave where the solid matrix motion and the fluid velocity vector fields are in-phase.The wave of the second kind (slow wave) is a highly attenuated wave where the solid matrix motion and the fluid velocity vector fields are Abstract Biot's equations describe the physics of hydromechanically coupled systems establishing the widely recognized theory of poroelasticity.This theory has a broad range of applications in Earth and biological sciences as well as in engineering.The numerical solution of Biot's equations is challenging because wave propagation and fluid pressure diffusion processes occur simultaneously but feature very different characteristic time scales.Analogous to geophysical data acquisition, high resolution and three dimensional numerical experiments lately redefined state of the art.Tackling high spatial and temporal resolution requires a high-performance computing approach.We developed a multi-graphical processing units (GPU) numerical application to resolve the anisotropic elastodynamic Biot's equations that relies on a conservative numerical scheme to simulate, in a few seconds, wave fields for spatial domains involving more than 1.5 billion grid cells.We present a comprehensive dimensional analysis reducing the number of material parameters needed for the numerical experiments from ten to four.Furthermore, the dimensional analysis emphasizes the key material parameters governing the physics of wave propagation in poroelastic media.We perform a dispersion analysis as function of dimensionless parameters leading to simple and transparent dispersion relations.We then benchmark our numerical solution against an analytical plane wave solution.Finally, we present several numerical modeling experiments, including a three-dimensional simulation of fluid injection into a poroelastic medium.We provide the Matlab, symbolic Maple, and GPU CUDA C routines to reproduce the main presented results.The high efficiency of our numerical implementation makes it readily usable to investigate three-dimensional and highresolution scenarios of practical applications.
ALKHIMENKOV ET AL. out-of-phase.Depending on the medium's properties, the slow wave may propagate as a longitudinal wave, or it may diffuse and attenuate quickly.Maurice Anthony Biot performed systematic studies of solid-fluid deformation in porous media based on the Hamiltonian principle of least action.He first investigated a static loading known as the theory of consolidation (Biot, 1941;Biot & Willis, 1957).The mathematical description of the macroscopic coupled solid-fluid deformation in a porous medium is analogous to the theory of thermoelasticity (Biot, 1941;Zimmerman, 2000).Biot later developed the theory of poroelasticity or Biot's theory for wave propagation in fluid-saturated media (Biot, 1956a(Biot, , 1956b)).Biot summarized these results in Biot (1962a); Biot (1962b) and provided a final set of unknown fields, parameters, as well as, a guidance to expand poroelasticity to include viscoelasticity and non-linear effects (Biot, 1965).Fluid flow in porous media in Biot's theory is assumed to be laminar, described by Darcy's law (Biot, 1956b), and is usually referred to as the low frequency Biot's theory.If the fluid flow is accelerated, viscous boundary layers form in the pores and a slight modification of Biot's equations is needed to account for this high frequency effect (Biot, 1956a).We focus in this study on the low frequency Biot's theory (Biot, 1956b).A detailed analysis of the coupled solid-fluid deformation in a porous media can be found in various recent studies, for example Bourbié et al. (1987); Wang (2000); Cheng (2016).Approximations based on this theory are widely used in biology and medical imaging, and in Earth sciences (e.g., Carcione (2014)), they are used in seismic exploration, seismic monitoring of geological CO 2 sequestration and nuclear waste disposal, geothermal energy production and hydrogeology.
One of the main application of Biot's equations in Earth sciences is the estimation of seismic dispersion and attenuation in porous media due to wave-induced fluid flow.Several wave attenuation mechanisms related to fluid flow arise from Biot's theory (Müller et al., 2010;Pride et al., 2004).The first attenuation mechanism introduced by Biot is the global fluid flow, which occurs at the wavelength scale of a propagating wave.In this mechanism, the dissipation is caused by the relative fluid motion between the solid matrix and the fluid (Biot, 1956b).The second mechanism is the wave-induced fluid flow at the mesoscopic scale.This scale is defined as much larger than the sizes of individual pores but much smaller than the wavelength of a propagating wave (Pride et al., 2004;White et al., 1975).In this mechanism, the dissipation is caused due to fluid-pressure gradients arising between mesoscopic heterogeneities in the medium.For example, fluid-pressure gradients appear between highly permeable structures such as fractures and the embedding solid matrix of much lower permeability.Wave-induced fluid flow at microscopic scale also occurs and is referred to as squirt flow, in which fluid-pressure gradients take place between compliant and stiff pores (Dvorkin et al., 1995;Mavko & Nur, 1975).Other mechanisms involve different kinds of wave scattering and wave mode conversions at interfaces.Possible non-linear viscous and plastic effects are small for most of the applications in applied seismic and are then neglected under the linear approximation assumption.
The aforementioned analytical approaches for wave-induced fluid flow at global and mesoscopic scales mainly exist for simple geometries.For more complex geometries, a numerical approach is needed to estimate seismic dispersion and attenuation.In principle, it can be done numerically in two ways.One approach relies on direct modeling of wave propagation in porous media and estimation of dispersion and attenuation of a propagating wavelets (Caspari et al., 2019;Masson et al., 2006).The other approach is based on a quasi-static numerical modeling and estimation of effective frequency-dependent elastic properties.The modeled frequency-dependent properties are used to retrieve dispersion and attenuation of seismic waves (Hunziker et al., 2018;Masson & Pride, 2007;Quintal et al., 2011;Rubino et al., 2009).

10.1029/2020JB021175
2 of 35 second-order system was considered.Moczo et al. (2019) and Gregor et al. (2021) investigated the accuracy of the discrete characterization of material heterogeneities and subcell-resolution for the finite-difference modeling of Biot's equations.
A major challenge in the numerical modeling of Biot's equations relies in the treatment of the dissipation term in the equations of motions.This term is represented by a parabolic operator coupled to viscosity, permeability, and density and, affects the numerical stability of the entire system of equations.The diffusion process exhibits a much larger characteristic time scale then the wave propagation process, which makes Biot's equations "stiff," thus challenging to solve.A straightforward explicit time integration of a "stiff" system is possible but requires very small time steps and is computationally inefficient.Various studies discuss stability conditions in the scope of poroelastic wave propagation and report a series of issues (Carcione & Quiroga-Goode, 1995;Chiavassa & Lombard, 2011;Masson et al., 2006).A more detailed discussion regarding the stability of discrete schemes of Biot's equations can be found in Alkhimenkov et al. (2020).
We here propose a multi-GPU numerical implementation of the anisotropic elastodynamic Biot's equations building upon three key ideas: Concise numerical implementation, high numerical resolution, and high computational efficiency.A concise numerical implementation means that we designed a simple and short numerical code ensuring it is suitable for parallel GPU devices.We use a variant of a conservative staggered space-time grid discretization (Virieux, 1986), which is equivalent to a finite volume approach (Dormy & Tarantola, 1995).High numerical spatial resolution up to six billion grid cells permits us to resolve very complex geometries.High computational efficiency allows our numerical model to simulate, in a few seconds only, wave fields in domains involving over 1.5 billion grid cells.We further explore several aspects of Biot's equations, namely, wave propagation in poroelastic isotropic and anisotropic media, fluid diffusion, dimensional and dispersion analyses, and numerical stability.The resulting code is implemented in CUDA C, which is suitable for programmable Nvidia GPU devices.The choice of a rectangular grid is determined by the usage of GPUs, so that the numerical implementation is straightforward.We provide the Matlab, symbolic Maple, and GPU CUDA C routines to reproduce the main presented results.These routines are available for download from Bitbucket at https://bitbucket.org/yalkhimenkov/fastbiot_gpu3d_v1.0 (last access: February 8, 2021).The routines archive (v1.0) (Alkhimenkov et al., 2021) is available from a permanent DOI repository (Zenodo) at http://doi.org/10.5281/zenodo.4519367(last access: February 8, 2021).
The novelties of the present article are summarized as following: 1. We present a dimensional analysis, reducing the number of needed material parameters from ten to four. 2. We perform a dispersion analysis as a function of dimensionless parameters.3. We achieve a close-to-ideal parallel efficiency (98% and 96%) on a weak scaling tests up to 128 GPUs and an effective memory throughput efficiency of 90% for the 3D anisotropic poroelastic wave propagation code.4. We achieve a very fast execution time (seconds) using high-resolution models involving more than 1.5 billion grid cells.

Constitutive Equations
We describe the elastodynamic Biot's equations for an isotropic medium saturated with a single phase fluid.We use a classical velocity-stress formulation for the Biot's equations.The equations describing a two phase continuum mainly differ from the single phase continuum formulation (see Appendix A) by the presence of both solid and fluid velocities and, as well as both solid and fluid pressure fields.Furthermore, the scalar parameters linking stresses and velocities in the single phase continuum become a symmetric coefficient matrix in the two phase continuum.The set of equations describing a two phase continuum (solid and fluid) was formulated in the theory of poroelasticity (Biot, 1956b(Biot, , 1962a)).The symmetric coefficient matrix is positive definite, which directly follows from the elastic potential energy.Biot's equations can be written in a symmetric form by separating volumetric and deviatoric parts of the stress tensor.Lists of symbols are given in Tables 1 and 2. The constitutive equations are (Biot, 1962a;Pride et al., 2004;Wang, 2000;Yarushina & Podladchikov, 2015) (1) and (2) linking the stress-strain relations for the solid and fluid phases with the conservation of mass (Equation 1) and representing the deviatoric stressstrain relation for the solid phase (Equation 2).The constitutive Equations 1 and 2 are written for the total pressure p and fluid pressure f p , as it was originally suggested in Biot (1962a).The porosity  in Darcy's flux q is constant in time but can be different spatially throughout the model domain.
For an isotropic material saturated with a single fluid, in which the solid frame consists of a single isotropic mineral, the Biot-Willis coefficient is where u K is the undrained bulk modulus M is the fluid storage coefficient and the Skempton's coefficient B reads

Dynamic Equations
The conservation of linear momentum (Newton's second law or dynamic equations) can be written in a symmetric form (Biot, 1962a;Pride et al., 2004;Wang, 2000;Yarushina & Podladchikov, 2015) where , T is the tortuosity and   , , 1, ,3 i j k . The off-diagonal parameter fluid density  f can be considered as the added mass ALKHIMENKOV ET AL.   coefficient.Equation 8 is analogous to that of a single phase media (Equation (A11)), the only difference being the substitution of the scalar density by a coefficient matrix with same dimensions.
Equations 1-8 are the elastodynamic Biot's equations for an isotropic medium saturated with a single phase fluid.The experiments to obtain poroelastic parameters are given in Appendix B. We emphasize that the matrices of coefficients in Equations 1 and 8 are symmetric.This symmetry combined with the non-dimensional analysis make it possible to derive the dispersion relations in a simple explicit form using symbolic calculations (Maple).

Dimensional Analysis of the Elastodynamic Biot's Equations
Dimensional analysis of PDEs unveils the impact of various physical parameters on the considered physical system.The original Biot's Equations 1-8 contain many material parameters making it difficult to understand how they affect the response of a poroelastic continuum.For enhanced clarity, we present a dimensional analysis of the elastodynamic Biot's equations.This analysis reduces the number of material parameters from ten to four, isolating the governing independent physical quantities.
For conciseness, we present our physical system as a one dimensional example to express the total stress tensor as a combination of the volumetric and deviatoric stresses (the entire analysis can be applied to three-dimensional continuum) we introduce the compliance (leaving only dimensionless parameters inside).We reformulate the system using Equation 9as and is a dimensionless parameter (the apparent Biot-Willis coefficient).
We then substitute where is the characteristic length,      * s is the characteristic time and the superscript ˜ refers to the dimensionless quantities.The resulting system of equations reads and and The four dimensionless parameters ,  a ,  ft , and  at define the coupling between the solid and fluid phase.
The two key dimensionless parameters 1 I and 2 I denote the ratio between advection and diffusion time scales and relate to hyperbolic (advection) and parabolic (diffusion) processes, respectively.The pore fluid pressure transport time scale refers to the characteristic time scale of diffusive processes.The elastic travel time scale refers to the characteristic time scale of advection processes.In order to further reduce the number of parameters, we set  where  * is a free parameter.We further choose  * as Equation 25 becomes Taking into account that  , we reformulate Equation 18as Equations 17 and 28 are the dimensionless elastodynamic Biot's equations for an isotropic medium saturated with a single fluid featuring only four dimensionless parameters: ,  a,  ft, and  at.

Dispersion Analysis of the Elastodynamic Biot's Equations
We perform dispersion analysis to understand the behavior of the dimensionless elastodynamic Biot's Equations 17 and 18.For simplicity, we only consider longitudinal waves.A single harmonic plane wave solution is where A is the amplitude, i is the imaginary unit,    2 f is the real angular frequency ( f is the frequency), k is the complex wave number, and l is the propagation direction.This solution is substituted into the system ( 17)-( 18), which gives The dispersion relation for longitudinal waves is where Equation 31 is bi-quadratic with respect to k, the four roots (±k 1 and ± 2 k ) are the complex functions of the non-dimensional angular frequency  The non-dimensional fast and slow wave phase velocities are The inverse quality factors are defined as , the fast and slow waves become the real and non-dispersive functions of the angular frequency .
eliminates  D q in 18, the system of Equations 17 and 18 becomes fully hyperbolic without the diffusive term. 2 I and the Biot-Willis coefficients  and  a control the imaginary part of the wave numbers ± )) scales permits to further simplify the dispersion relations ( 31)-( 33) to which results in a biquadratic equation with respect to k.The four roots (±k 1 and ± 2 k ) are the complex functions of the angular frequency .The dispersion relation (37) is the most important result of the dimen- sional analysis and relates to the final set of non-dimensional elastodynamic Biot's equations (Equation 17 and 28).
Figure 1a shows the non-dimensional phase velocities and inverse quality factors based on the system of Equations 17 and 28 for a homogeneous medium, which are typical for Biot's mechanism.The properties of the medium are given in Table 3.The non-dimensional phase velocity  1 V exhibits some dispersion (less than 10%) and attenuation.The non-dimensional phase velocity  2 V behaves as a diffusion mode at low frequencies, having zero velocity.At higher frequencies,  2 V behaves as a true propagating wave.The low frequency limit of  1 V corresponds to the non-dimensional undrained phase velocity The high frequency limit of V from the dispersion relations under the assumption of    .The explicit formula is given in the following section.

Multiplying non-dimensional phase velocities (
) by the drained velocity d V (Equation 20) permits to recover the dimensional form of the dispersion curves (Figure 1b).We retrieve the dimensional angular , where  is the non-dimensional angular frequency (the y-axis in Figure 1a) and  * is the transformation frequency We highlight that the introduced transformation frequency  * is similar to Biot's characteristic frequency we detail a dimensional analysis where the transformation frequency coincides with Biot's characteristic frequency in Appendix C.
Figure 2 illustrates the advantage of the non-dimensional equations over their dimensional analog.The inverse quality factor 1/ 1 Q for the non-dimensional elastodynamic Biot's equations (Figure 2b) collapsed into the one curve considering the dimensional equations (Figure 2a).

The roots
1 k and 2 k of the dispersion relation ( 37) are the functions of the four material parameters and the non-dimensional angular frequency ,  V also increases (Figure 3a).1/ 1 Q linearly depends on , as  increases, 1/Q 1 only slightly decreases (Figure 3b).V non-dim correspond to non-dimensional velocities, which were rescaled by the dimensional characteristic velocity d V and the transformation frequency  * . c is the Biot's characteristic frequency.The material parameters are those from Table 3.

Rock properties
Carbonate with dependent units V is significant and the quality factor 1/ 2 Q is almost zero, which corresponds to the regime where the slow wave behaves as a true longitudinal wave.The characteristic frequency where the transition from the diffusive to propagation regimes occurs is not affected by .   3, except for viscosities and permeabilities.3.

Numerical Implementation of the Elastodynamic Biot's Equations
We solve the first order velocity-stress formulation of the elastodynamic Biot's Equations 1-8 on a rectangular time-space grid.We base our approach on a conservative staggered space-time grid discretization (Virieux, 1986); for Darcy's flux, we use a semi-implicit discretization (Alkhimenkov et al., 2020).A conservative staggered space-time grid discretization is equivalent to a finite volume approach (Dormy & Tarantola, 1995) (see also LeVeque (1992)).This approach follows from the early Marker and Cell (MAC) method which is a classical method in computational fluid dynamics (Harlow & Welch, 1965;McKee et al., 2008).Field variables are located either at the cell center or corners and, fluxes are computed at the cell boundaries resulting in a conservative staggered grid formulation.Other similar methods were developed such as the standard staggered grid scheme (Levander, 1988;Virieux, 1986;Virieux & Madariaga, 1982), the rotated staggered grid scheme (Saenger et al., 2000), and the Lebedev scheme (Davydycheva et al., 2003;Lebedev, 1964;Lisitsa & Vishnevskiy, 2010).The elastodynamic Biot's equations using the standard staggered grid scheme were solved by Masson et al. (2006).Moczo et al. (2007) provides a review on staggered finite-difference methods for wave propagation in elastic media.

The First Order Elastodynamic Biot's Equations with a Volumetric-Deviatoric Split
Numerically solving the elastodynamic Biot's Equations 1 and 8 requires the coefficient matrices in 1 and 8 to be inverted.This formulation leads to a system of equations describing poroelastic wave propagation in three-dimensional media and can be solved explicitly: ALKHIMENKOV ET AL.   3. and where f .Note that the coefficient matrices in Equations 41 and 43 are symmetric by analogy Equations 1 and 8. Symmetry combined to non-dimensional analysis is a requirement that allows us to derive a time stepping condition in the explicit form.

Discretization
The numerical implementation consists of a time evolution operator to perform time steps within a time loop and space operators to relate fields at old and new times.We rely on a rectangular time-space grid.The time discretization is l t = Δ l t and the spatial grid is i x = Δ i x, j y = Δ j y, and k z = Δ k z.The velocity vector field and the Darcy's flux are defined at half-integer spatial nodes and integer time nodes: , 1/ 2, , , 1/2 1/2, , , 1/2, , , 1/2 ( ) , ( ) , ( ) , ( ) , ( ) , ( ) .
The total and fluid pressure scalar fields are defined at integer spatial nodes and half-integer time nodes: . The stress deviator tensor fields are defined as . A schematic representation of spatial positions is shown in Figure 5.The proposed discrete scheme is second order accurate in space and time.The material parameters are constant inside the finite volumes and may be discontinuous between them.The discrete operators for Biot's Equations 41-43 are given in Appendix E.

Stiffness of Biot's Equations
Wave propagation and fluid pressure diffusion in poroelastic media occur simultaneously but feature very different time scales.This phenomenon is called stiffness of the Biot's equations (e.g., Carcione and Quiroga-Goode (1995)).Stiffness of an equation is a serious issue for numerical solutions because the discrete time step may drop to values hindering the numerical simulation to complete.A simple solution exist to circumvent this issue for Biot's equations (Alkhimenkov et al., 2020;Masson et al., 2006), briefly reported here.The one-dimensional discrete version of 43 is ALKHIMENKOV ET AL.  .In this case, the stable time step becomes very small due to the stiffness of Biot's equations.The opposite end-member  = 1 corresponds to an implicit scheme where the stiffness no longer affects the time step stability; calculating (45) requires . Since Biot's equations do not contain spatial derivatives of the Darcy's flux q in 45, the implicit scheme  = 1 can be achieved in an iterative fashion (i.e., updates in the iteration loop are explicit).The one dimensional code for  = 1/2 is shown in Figure 7.The weight parameter  plays the key role in the stability and convergence rate of the numerical scheme which is explored in the next section.

Von Neumann Stability Analysis
The von Neumann stability method analyzes a time evolution of a discrete numerical solution of a given PDE.The method provides the stability of linear schemes with constant coefficients.We here summarize the von Neumann stability analysis' main results (Alkhimenkov et al., 2020) for Biot's poroelastic equations' discrete scheme (see also Masson et al. (2006)).For that let us introduce the matrices of coefficients and the parameter Θ is already defined in 43.The determinants of these matrices are and the Hadamard product (element-wise multiplication) of  ij and ij  is (49) The parameter A is defined as By using ( 48) and ( 50), the fast wave phase velocity in the high-frequency limit 1

HF
V can be calculated as The matrices  ij , ij  , and    k /k fully describe the dimensional elastodynamic Biot's Equations 41-43.The main issue with the numerical modeling of the Biot's equations is the treatment of the parabolic operator in (Equation E14) and (Equation E15).If  k = 0, then the system (41)-( 43) corresponds to the two coupled hyper- bolic equations, having two longitudinal waves.The stability analysis shows that the Courant-Friedrichs-Lewy (CFL) condition for such system is 1 Δ Δ / HF t x V  (Alkhimenkov et al., 2020), where 1 HF V is given by expression (51).
If  k ≠ 0 and  = 0, then the parabolic operator  [ ] D f D q in (Equation E14) and (Equation E15) affects stability and the system of equations becomes stiff.If  k reaches a certain value, the stable time step Δt dramatically decreases as a function of  k (Figure 6a).The increase in porosity  also reduces Δt but this ALKHIMENKOV ET AL.

10.1029/2020JB021175
reduction is small compared to the reduction due to the increase of  k .However, for the  = 1/2 scheme or  = 1 scheme, the parameter  k does not affect the stable time step Δt (Figure 6b).In this case, the parabolic operator  [ ] D f D q is calculated implicitly, thus, the CFL condition is not affected by  k .The  = 1/2 or  = 1 schemes are stable in one space dimension under the CFL condition where the expression for 1

HF
V is given by Equation 51, which is the same as for the inviscid case.The  = 1/2 scheme is more preferable than the  = 1 scheme, since the  = 1/2 scheme provides a second order accu- racy, which is explored below.
For any considered above schemes, the matrices  ij and ij  must be positive definite as well in order to preserve hyperbolicity of the system (Alkhimenkov et al., 2020).The positive definiteness of the matrix in Equation 41 and ij  also follows from physics, for example, from the classical irreversible thermodynamics (Jou et al., 2001;Yarushina & Podladchikov, 2015).Note, that the positive definiteness of the matrix in Equation 41 is a more restrictive condition than the positive definiteness of  ij (46) and are the same if the shear modulus G is zero.
The extension of the CFL condition (52) to the two, three and n-dimensions is straightforward The conditions ( 52)-( 54) can be generalized to a fourth-order accurate in space, second-order accurate in time numerical scheme using the coefficients of the fourth-order approximation to the first derivative (Levander, 1988;Masson et al., 2006).

Sources, Initial and Boundary Conditions
We initialize the majority of our simulations with a Gaussian perturbation, where x, y, and z are the arrays of spatial coordinates, x l , y l , and z l are the parameters controlling the shape (width) of the Gaussian and A 0 defines its amplitude.We set Depending on the model configuration, we implemented two types of sources in the right-hand side of the total pressure (isotropic media) or total stress (anisotropic media) equation (see Appendix D for the full set of equations).The first type of source is the Morlet wavelet and the second type of source is the Ricker wavelet where c f is the the source peak frequency, t is time, t0 is the wavelet delay, and b f is the time-decay parameter of the Morlet wavelet.The Morlet wavelet features a distinct narrow bandwidth in the frequency domain which significantly reduce the wavelet shape changes during the pulse propagation in a lossy medium.The disadvantage results in a significant time spread in time domain.We use reflecting boundary conditions in our simulations.
The one-dimensional time loop implementation of the proposed scheme (Equation E8-E15) in MATLAB (R2018a) using the Gaussian initial condition (55) is shown in Figure 7.

Multi-GPU Implementation
GPUs are many-core processors originally designed to refresh screen pixels at very high frame-rates for computer games.Nowadays, GPUs are widely used in high-resolution numerical modeling due to their ability to efficiently execute a large number of operations simultaneously.Several studies focused on the implementation of wave propagation solvers using GPUs (Komatitsch et al., 2010;Mehra et al., 2012;Michéa & Komatitsch, 2010;Rubio et al., 2014;Weiss & Shragge, 2013).The CUDA extension to the C language (CUDA, 2020) makes it possible to write C-style codes that are executed in parallel on GPUs.A brief description of the GPU architecture is given in Appendix F.

Computing Systems
We calculated our results on various computing systems depending mainly on the targeted numerical resolution.We performed most of our simulations on an Nvidia DGX-1 -like node hosting 8 Nvidia Tesla V100 Nvlink (32 GB) GPUs, 2 Intel Xeon Silver 4112 (2.6 GHz) CPUs, and 768 GB DDR4 RAM.The second computing system hosts a single Nvidia Tesla V100 PCIe (16 GB) GPU, 2 Intel XEON E5-2620V2 4112 (2.1 GHz) CPUs, and 64 GB DDR3 RAM.The third computing system is composed of 32 nodes, each featuring 4 Nvidia GeForce GTX Titan X Maxwell (12 GB) GPUs, 2 Intel XEON E5-2620V3 4112 (2.4 GHz) CPUs, and 128 GB DDR4 RAM.

Code Implementation on a Single GPU
The CUDA C code structure (Figure 8a) is similar to the MATLAB one (Figure 7).The time loop calls two kernels -or GPU functions -to sequentially update all stresses and the fluid pressure and then update the fluid and solid velocities.Darcy's fluxes D x q , D y q , D z q are time-dependent fields present in both Equation E14 and Equation E15 exhibiting history or time dependence that require them to be stored from previous iteration.We perform the update relying on a pointer swap at every iteration to prevent race conditions and to avoid copying the array itself, which would significantly deprecate the performance.To reduce redundant memory accesses, we locally precompute and store corresponding field variables.In the compute_ StressPrF() kernel, we store the derivatives of the velocities s

The Multi-GPU Code Implementation
The single GPU code enables thousands of threads to simultaneously compute physics on every grid points of the computational domain in a shared (GPU) memory approach.To overcome the on-GPU DRAM memory limitation and leverage the simultaneous utilization of multiple GPUs we implemented a distributed memory parallelization using the message passing interface (MPI) standard.The parallelization among multiple GPUs requires the exchange of the internal boundary values of the solid velocities s v and the Darcy's fluxes D q (represented by black lines in Figure 9).Global boundary conditions are applied if the local sub-domains coincide with the global domain boundaries.We rely on CUDA-aware non-blocking MPI messages for internal boundary condition updates among neighboring GPUs.The CUDA-awareness implies that GPU device pointers can directly be exchanged with MPI bypassing a local CPU copy on both sender and receiver side.
We implemented an overlap among computation and MPI communication to avoid a drop in performance with an increase in the number of MPI processes (Räss, Omlin, & Podladchikov, 2019).Only minimal changes are required to implement this computation/communication overlap and fully hide the MPI boundary exchange latency (Figure 8b).We divided the local computational domain on each GPU in two parts, a boundary points region (1 in Figure 9) and an inner points region (2 in Figure 9).We then use CUDA Streams to perform an asynchronous kernel call in an iterative fashion using two distinct execution pipelines (Räss, Omlin, & Podladchikov, 2019).The first update kernel call computes the boundary flagged nodes only and executes on the high priority stream.Then, the MPI boundary updates starts on the same high priority stream (the update_sides3 function).In the meanwhile, the update kernel call is executed a second time within the istep loop, now flagging and computing the remaining inner points.A wise definition of the number of grid points to include (i.e., the boundary width) enables optimal performance results.
The Nvidia visual profiler (nvvp) is an informative tool to visualize the execution timeline of a GPU process (Figure 10).We compare multi-GPU codes without (Figure 10a) and with (Figure 10b) computation/communication overlap running on an eight GPUs (information shown only for two GPUs).The visual timeline depicts the D q and s v boundary points update on the high priority CUDA stream 21 followed by the MPI message sending and receiving among GPUs (the time line is shown by red box in Figure 10b).During the same time, the D q and s v inner points update happens on the lower priority CUDA stream 22.The update kernel is executed two times (green boxes in Figure 10b).The cumulative time of the sequential executions is identical to the unsplit execution time (Figures 10a and 10b).

Performance Benchmark
We assess the solver's performance and realize the weak scaling tests in a similar fashion as proposed by Räss, Duretz, and Podladchikov (2019); Duretz et al. (2019); Räss et al. (2020).These studies highlight the memory-bounded nature (in opposition to compute-bounded) of a waste majority of PDE solver implementations nowadays on many-core (e.g., GPU) hardware; Memory transfers are limiting the performance of an application, while floating point (arithmetic) operations are not performance relevant.We therefore focus on the memory access efficiency in our numerical calculations.The effective memory throughput (MTP effective ) metric (Omlin, 2016;Omlin et al., 2020) evaluates how efficiently data is transferred between the memory and the computation units, in gigabytes per second (GB/s):  scheme), n p is the floating-point arithmetic precision (either four or eight bytes) and n t t is the time (in seconds) needed to perform the t n iterations.closer the of MTP effective gets to the memory copy only value, the the performance is.We out all the performance tests on the anisotropic Biot 3D implementation using the  1/2 scheme and scalar material properties (see Appendix D for the full set of equations).In that case n IO = 42.We used a numerical spatial resolution of 576 3 grid cells on a Tesla V100 32 GB Nvlink GPU, allocating 29 GB on-chip DRAM memory.We used a numerical spatial resolution of 511 × 511 × 127 grid cells on the Titan X (Maxwell) 12 GB GPU allocating 5 GB on-chip DRAM memory.The maximum global domain spatial resolution on 128 Titan X (Maxwell) 12 GB GPUs involved 4.5 billion grid cells.

Benchmark Results for a Single GPU Implementation
Figure 11 depicts the effective memory throughput (MTP) of the Biot 3D numerical application as a function of the number of threads per blocks in x-, y-, and z-direction on a Tesla V100 32 GB Nvlink GPU.The MTP ref corresponds to the reference MTP, that is, the best combination of threads per blocks (32, 2, 16) for a given resolution of 576 3 ; the MTP of all simulations (MTP effective ) are normalized by MTP ref .The maximal performance drop from the reference MTP is about 17%.It is interesting, that the (32, 2, 8) combination uses only 512 threads out of the 1,024 available but shows almost the same performance as combinations involving 1,024 threads.Good performance with under-utilization of the threads per block resources is known and may result by the increase in the number of concurrent blocks launched allowing for optimal scheduling.
ALKHIMENKOV ET AL.  Figure 12 shows memory access efficiently between the GPU global memory and the computation units as a function of on-chip RAM memory.Our 3D numerical application achieves on average 90% of the "ideal" memory copy only efficiency (copying two 3D arrays without performing any calculations, 740 GB/s) on a single Tesla V100 32 GB NVlink GPU.The average performance is 660 GB/s.A huge drop in the memory access performance at low on-chip RAM memory utilization reflects computations without enough data to saturate the memory bandwidth.
We additionally assessed the effective memory throughput of our 3D routine on a Tesla V100-SXM2 16 GB accessed on the Amazon Elastic Compute Cloud environment (Amazon EC2); our 3D routine performs on average at 740 GB/s (memory copy at 795 GB/s) validating the benchmark results obtained on our local GPU cluster.The discrepancy we observe may be caused by different versions of Nvidia drivers and compilers.

Benchmark Results for a Multi-GPU Implementation
We further investigate the influence of the boundary width on the performance (Figure 13).The split among computation domains allowing for overlap of computation and communication affects the performance.Considering too few or too many boundary points hinders optimal kernel execution as too few resources may be used in the first or the second sequential call.The code execution on a single Tesla V100 GPU with boundary width ratios of 0.2-0.8returns equivalent performance as the execution without the computational split.The performance of the code on 8 Tesla V100 GPUs including MPI communication shows a 2% performance drop compared to the single GPU process.We achieved the best performance using approximately a ratio of 0.3 between boundary and inner points.This splitting allows for enough data to keep all threads busy during the boundary point calculation (the first kernel execution) and provides sufficient time to hide the MPI message sent during the update of the inner points (the second kernel execution).
We performed a weak scaling test using the 1-8 Tesla V100 32 GB NVlink GPUs and the 1-128 Titan X 12 GB GPUs (Figure 14).The parallel efficiency of 1-8 GPUs is 98% and on average 96% on 16-128 Titan X GPUs with a standard deviation of 2%.A standard deviation was calculated as a result of ten simulations.We globally achieved a performance of about 5,280 GB/s on 8 Tesla V100 32 GB NVlink GPUs.Such performance im-ALKHIMENKOV ET AL.  Figure 12.The memory access efficiently as a function of the allocated on-chip DRAM memory.The blue curve corresponds to the "ideal" memory copy efficiency (copying two 3D arrays without performing any calculations), red and yellow curves represent the memory copy efficiency involving all the physics, which is on average 90% of the "ideal" memory copy efficiency.
plies that only 95 s are needed to perform 1,000 (double-precision) explicit time iterations of a model involving 1.5 billion grid cells (1152 3 ).

Comparison Against an Analytical Solution
We perform a direct comparison of our numerical solver against analytically derived non-dimensional phase velocities and the inverse quality factors of 1D Biot's equations in homogeneous poroelastic media.Biot's mechanism, often called global flow, is the unique cause leading to wave attenuation and velocity dispersion.We validated our numerical solver in 1D but the plane wave analysis is multidimensional as plane wave characteristics are identical in 1D, 2D, and 3D.In the numerical simulation, we use the proposed scheme (Equation E8-E15) with  = 1/2, the Morlet wavelet as a source function ( 56) and quantify velocity and the inverse of the quality factor of a propagating wavelet in the time domain.We obtain excellent agreement between numerical and analytical results (Figure 15).

Convergence Analysis
We performed a grid convergence analysis to validate the numerical implementation of the solver.We evaluate the magnitude of the phase velocity truncation errors ( V e ) as functions of decreasing spatial discretization steps Δx.We calculate the truncation errors by subtracting numerically calculated fields from analytical fields and characterize the magnitude of the truncation errors by their L 1 norms, using the velocity estimation (Räss et al., 2017) where V a corresponds to the analytical velocity obtained via the dispersion analysis and V n corresponds to the numerically estimated velocity.
Figure 16a shows the truncation error magnitudes of the estimated velocity in a lossless (/k = 0) and lossy (/  0 k ) media using the  = 1/2 scheme (Equation E8-E15).The source has the form of a Ricker wavelet (57) and the central frequency of the source corresponds to a very low frequency (much lower than the frequency of 1/Q maximum).Our numerical solutions for velocity exhibits second-order spatial and temporal accuracy.The truncation error magnitudes decrease by a factor k as the grid spacing is reduced by the same factor.We obtain similar results for a very high frequency source (much higher central frequency than the frequency of 1/Q maximum).
Figure 16b shows the truncation error magnitudes of the estimated velocity in a lossy medium for the scheme (Equation E8-E15) with  = 1/2 and  = 1.0.Here, the central frequency of the source corresponds to the frequency of 1/Q maximum.In this analysis, we use the numerically estimated velocity of a very high resolution simulation.The  = 1/2 scheme exhibits second-order accuracy in space and in time.In contrast, the  = 1.0 scheme shows only about 1.8 order accuracy.Only the  = 1/2 scheme exhibits second-order spatial and temporal accuracy across all frequencies while the  = 1.0 scheme exhibit second-order spatial and temporal accuracy only at low or high frequencies where attenuation (and dispersion) is very low.For schemes with  other than 1/2 (we used  = 0.6, 0.7, 0.8, and 0.9), tests show that the accuracy is lower than sec- ALKHIMENKOV ET AL.   ond-order.Therefore, the scheme with  = 1/2 is used for the numerical solution of Biot's equations in the rest of the manuscript.

Numerical Experiments
We here present a series of simulations based on Biot's equations in two and three dimensions.We discuss some basic aspects of poroelasticity, namely, wave propagation in homogeneous poroacoustic and poroelastic media, in isotropic and anisotropic poroelastic media, and at low-and high-frequency regimes.

PoroAcoustic and PoroElastic Media
We examine the difference between poroelastic and poroacoustic wave propagation at low-and high-frequencies in two dimensions.The material properties are those of an isotropic sandstone (Table 4).For the poroacoustic material, we set the shear modulus 55 c to zero.A 2D square domain of 9.35 × 9.35 m is used.We define 32 threads per blocks in x-and z-directions with 128 blocks in x-and z-directions, which result in 4,095 × 4,095 grid resolution having ≈16 × 10 6 grid cells.We apply a Gaussian distribution (55) with x l = 0.08, y l = 0.08, and A = 1 at the center of the model domain to the solid velocity ( as initial condition for the poroacoustic and low frequency poroelastic simulations.For the high frequency poroelastic simulations we also apply the Gaussian distribution to the fluid pressure f p . Figure 17 shows the total pressure ( p) and solid velocity ( x V ) fields for poroacoustic and poroelastic simulations.In total, 5,000 time steps were performed and the total physical simulation time was approximately t = 9 × 10 −4 s.The simulations were performed on a single Tesla V100 PCIe GPU.The running time was approximately 55 s for each simulation.For a performance comparison, a few simulations were executed on a single Tesla V100 Nvlink GPU, and the running time was approximately 51 s.Note, that the 2D codes performance is not optimized as it is done for 3D codes.For optimized 2D codes, the performance might be much higher.In the poroacoustic simulations (Figures 17a and 17b), the initial condition corresponds to the low frequency regime and ALKHIMENKOV ET AL.The phase velocity V 1 is normalized by the velocity in the high frequency limit 1 HF V and the dimensional angular frequency  d is normalized against Biot's frequency  c.The material parameters are those from Table 3. f of the source is close to the frequency of 1/ 1 Q maximum.The material parameters are those from Table 3.
only the fast (longitudinal) wave V 1 can be observed.Also note that the 2D poroacoustic medium cannot unload the initial condition applied to the solid velocity field, which is represented by non-zero amplitudes at the center of the model (Figure 17b).In the poroelastic simulations (Fig- ures 17c and 17d), the initial condition corresponds to the low frequency regime and only two waves can be observed-the fast (longitudinal) wave 1 V and the shear wave s V .In the poroelastic simulations (Figures 17e  and 17f), the initial condition of a Gaussian shape with x l = 8 × 10 −4 and y l = 8 × 10 −4 corresponds to the high frequency regime.Three waves can be clearly observed-the fast (longitudinal) wave 1 V , the shear wave s V and the slow (longitudinal) wave 2 V (Figures 17e and 17f).

Anisotropic Poroelastic Media
In this section, we reproduce similar two dimensional results shown in de la Puente et al. (2008); Lemoine et al. (2013), so the present simulations can be qualitatively compared to the previous works.The material properties of anisotropic rocks are similar to those of de la Puente et al. ( 2008); Lemoine et al. (2013) (Table 4).We apply a Gaussian distribution to  zz and f p with x l = 0.08, y l = and 0 A = 1 to the center of the numerical model.Other parameters are the same as in the previous 2D simulations.The simulations were performed on a single Tesla V100 Nvlink GPU.The running time was approximately 51 s for both (glassepoxy and sandstone-VTI) models, 5,000 time steps were performed.The total physical simulation time was t = 6.15 × 10−04 seconds for the anisotropic sandstone and t = 7.061 × 10−04 seconds for the glassepoxy model.The results of the solid velocity fields x V and z V are shown in Figures 18 and 19.In analogy to de la Puente et al. (2008); Lemoine et al. (2013), we show numerical results for inviscid models ( = 0) and viscid models (  0).Simulations in inviscid media mimic the high frequency regime, therefore, fast, quasi-shear, and slow waves can be observed (Figures 18a and 18b and Figures 19a and 19b).Simulations in viscid media correspond to the low frequency regime, therefore, only fast and quasi-shear waves are observed (Figures 18c and 18d and Figures 19c and 19d).

Wave Propagation in 3D Anisotropic Poroelastic Media
We simulate a wave propagating in 3D for the anisotropic poroelastic material whose properties are of the glass-epoxy (Table 4), the properties in the x-direction are duplicated to the y-direction.The simulations were performed on eight Tesla V100 Nvlink GPUs.A three dimensional cubic domain of 9.35 m × 9.35 m × 9.35 m is used.The total resolution is 1,022 × 1,022 × 1,022 grid cells in x-, y-, and z-dimensions, respectively, which results in ≈ 1 × 10 9 grid cells.We apply a Gaussian distribution to the fluid pressure f p (fluid injection) with x l = 0.18, y l = 0.18, z l = 0.18 and 0 = 10 10 at the center of the numerical model.The running time was approximately 73 s for all simulations, 1,050 time steps total simulation time was 6.8 × 10 −4 seconds.This model configuration corresponds to the low frequency regime.
Figure 20 shows the solid velocity field The velocity field is projected into several slices, also the isosurfaces of the wave amplitudes of ±3 × 10 −3 shown.Figure 21a shows the solid velocity field V for the same model (Figure 20) while Figure 21b shows x V of the 100 times smaller model (the size is 0.0935 3 m), which corresponds to the high frequency regime.The initial condition was scaled accordingly, x l = 0.018, y l = 0.018, and z l = 0.018 (A 0 is the same) and the total physical simulation time was also scaled to 6.8 × 10 −6 seconds.The behavior of fast and quasi-shear waves is similar in Figures 21a and 21b but the slow P-wave behavior is different.In Figure 21a, the slow P-wave degenerated into a diffusion mode representing viscous fluid flow in porous media while in Figure 21b the slow P-wave behaves as a true propagating wave.

Conclusions
We developed a multi-GPU solver for the anisotropic elastodynamic Biot's equations in 1D, 2D, and 3D using the CUDA C programming language leveraging the parallel processing power of GPUs.We implement a simple approach to circumvent the stiffness of Biot's equations by using an implicit scheme for Darcy's flux while keeping explicit updates in the iteration loop.We achieve a close-to-ideal parallel efficiency (98% and 96%) on weak scaling tests up to 128 GPUs by overlapping MPI communication and computations.We also achieve an effective memory throughput of 90% of the memory copy throughput.Our multi-GPU implementation of Biot's equations permits to tackle high spatial resolution and exhibits fast execution times.
We perform 1,000 explicit time steps in 95 s for a model involving 1.5 billion grid cells (1152 3 ) on 8 Tesla ALKHIMENKOV ET AL.   4).Panels (a) and (b) correspond to the inviscid medium (  0), panels (c) and (d) correspond to the viscid medium (  0).The total physical simulation time is t = 7.061 × 10 −04 seconds.V100 32 GB Nvlink GPUs using double-precision arithmetics.We analyze the stability and accuracy of the three different numerical schemes and suggest the best out of three.We benchmark the numerical solver against an analytical solution of Biot's equations and present a comprehensive dimensional analysis of Biot's equations to reduce the number of material parameters from ten to four.Our numerical application to resolve Biot's equations enables practical applications in geophysics, engineering, biophysics, and the further understanding the underlying hydromechanically coupled processes in 3D.
Figure 20.Snapshots showing the total solid velocity field  x V V + y V + z V in the medium having the properties of the glass-epoxy (Table 4).The velocity field is projected into X-Z and Y-Z slices.Red and blue isosurfaces denote the wave amplitudes of ±0.4.The anisotropic nature of the model is clearly visible due to the non-symmetric velocity field pattern.The total physical simulation time is 6.8 × 10 −4 s.V in the having the properties of the glass-epoxy (Table 4).Panel (a) shows x V of the same model as in Figure 20, red and blue isosurfaces denote the wave amplitudes of ±0.4, the total physical simulation time is 6.8 × 10 −4 seconds.Panel (b) shows x V of the 100 times smaller model, which corresponds to the high frequency regime, the total physical simulation time is 6.8 × 10 −6 seconds.Red and blue isosurfaces denote the wave amplitudes of ±3.0.

A2. Dynamic Equations
The conservation of linear momentum for a single phase material is Equation (A10) can also be called equation of motion or elastodynamic force balance law.By separating the stress tensor into deviatoric and volumetric parts, Equation (A10) can be written as In summary, the constitutive Equations ( A8) and (A9) and the conservation of linear momentum (A11) fully describe the behavior of a single phase material.Depending on the initial conditions (or the source terms) and the material parameters, the response of a single phase material may include one fast (longitudinal) wave and one shear wave.

Appendix B: Poroelastic Parameters
Three experiments permit to determine the poroelastic parameters required for Biot's equations (Makhnenko & Podladchikov, 2018).The drained bulk modulus d K can be measured under drained experiments.In such experiments the pore fluid is allowed to leave the rock during loading and that pore fluid pressure is maintained at a constant level ( f p = const, see Equation ( 1)) The undrained bulk modulus u K can be obtained under undrained experiments.In such experiments the fluid content inside the rock does not change during loading, meaning that fluid does not flow through the boundaries of the considered element (  0 The Biot-Willis parameter  can be obtained under unjacketed experiments, in which an increase in the total pressure p is equal to the increase in fluid pressure f p : (  f dp dp , see Equation ( 1)).For more information about how to measure poroelastic constants in rock samples, we refer to R. W. Zimmerman (1990).

Appendix C: An Alternative Dimensional Analysis of Biot's Equations
In (12), instead of the base quantity  t, an alternative choice is possible, namely,  a .In this case, Equa- tion (12) reads In the resulting system, Equation ( 17) is still the same, while Equation ( 18) becomes The alternative four dimensionless parameters ,  a , fa and ta now define the coupling between the solid and fluid phases.If we similarly set Thus, we choose the new  * as We end up with 2 I = 1 and the transformation frequency now is equivalent to the Biot's characteristic frequency (40).Indeed, the dimensional angular frequency , where  is the non-dimensional angular frequency and  * is the transformation frequency (analogous to (39)) which is exactly the Biot's characteristic frequency  c (40).This is the main advantage of the new dimen- sional analysis.The disadvantage is that the drained wave velocity d V formula disappears in (C4), which makes the interpretation of 1 I in terms of usual physical quantities less transparent.By using this new dimensional analysis, Figures 1-4 will remain almost the same with the only slight shift of the transition frequency closer to   1.This shift in  is defined by the ratio between  t and (  / f T ).

D1. Arbitrary Anisotropic Media
Elastodynamic Biot's equations in arbitrary anisotropic media can be written in the first order form.The stress-strain relations are where u ijkl c is the 4 th order undrained stiffness tensor and  ij is the Biot-Willis parameter represented by a second order tensor.The conservation of linear momentum reads where

D2. Orthorhombic Media
An orthorhombic medium is described by nine elastic components of the stiffness tensor.We use the shortened Voigt notation as a The stress-strain relations are The modulus * K is usually called the generalized bulk modulus, which, in fact, represents the Voigt average of the bulk modulus for an orthorhombic symmetry system.The conservation of linear momentum reads where   [0;1] is the weight parameter.The following operators for the spatial derivatives are introduced The discretized system of equations is Discretization of  yy and  zz is in analogy to that of  xx .The stress deviator tensor field is discretized as The Darcy's flux and the velocity vector fields in the discrete form are  t out of the parentheses in Equations 1, 2, and 8 thus control dispersion and attenuation of the coupled system of equations.Setting   0

Q
us analyze the solutions (35) and (36) as a function of the material parameters and .The non-dimensional phase velocities ( ) as a function of the non-dimensional frequency  and the Biot-Willis coefficient  are shown in Figure3.Ac- cording to (17),  controls the coupling between solid and fluid phas- es, low values of  (0-0.3)correspond to weak coupling and high values of  (0.7-1.0) correspond to strong coupling.We vary  in the range of (0.05, 0.95) while the other parameters remain the same. 1 V non-linearly depends on  in the whole frequency range, as  increases,  1 

2V
and 1/ 2 Q are almost independent of  (Figures3c and 3d).At low frequencies,  2 V is almost zero and the quality factor 1/ 2 Q is very high (Figures3c and 3d)

Figure 1 .
Figure 1.Phase velocities and the corresponding inverse quality factors 1/Q obtained via the dispersion analysis.(a) Dispersion relations for the non-dimensional elastodynamic Biot's equations,  1 V is the wave of the first kind (nondimensional),  2 V is the wave of the second kind (non-dimensional).(b) Dispersion relations for the dimensional elastodynamic Biot's equations.1 V dim and 2 V dim correspond to dimensional velocities, 1 V non-dim and2

Figure 4
Figure 4 is similar to Figure 3 but instead of , the variations of  ft are shown.We vary  ft in the range of

Figure 2 .
Figure 2. The inverse quality factors 1/Q of the the wave of the first kind.(a) 1/Q for the non-dimensional elastodynamic Biot's equations having different viscosities and permeabilities, all collapsed into one curve.(b) 1/Q for the dimensional elastodynamic Biot's equations for the same data set of viscosities and permeabilities.The material parameters are those from Table3, except for viscosities and permeabilities.

Figure 3 .
Figure 3. Non-dimensional phase velocities (  1 V and  2 V ) and the corresponding quality factors (1/ 1 Q and 1/ 2 Q ) as a function of the non-dimensional Biot-Willis coefficient  and the non-dimensional angular frequency .The material parameters are those from Table3.

Figure 4 .
Figure 4. Non-dimensional phase velocities (  1 V and  2 V ) and the corresponding quality factors (1/ 1 Q and 1/ 2 Q ) as a function of the non-dimensional parameter  ft and the non-dimensional angular frequency .The material parameters are those from Table3.

Figure 5 .
Figure 5.A sketch representing (a) the finite volume, where the solid velocities preserve mass balance and (b) the spatial positions of different fields in the X-Z plane.Darcy's fluxes obey the same behavior.

Figure 6 .
Figure 6.The von Neumann stability analysis for the elastodynamic Biot's Equations 41-43 as a function of  k =  /k and porosity .Panel (a) corresponds to the  = 0 scheme and panel (b) corresponds to the  = 1/2 scheme.The stability of the  = 1 scheme is identical to that one of the  = 1/2 scheme.The material parameters are those from

Figure 7 .
Figure 7.The one dimensional code using the proposed scheme with  = 1/2 in MATLAB.The initial condition of the Gaussian form is set to the fluid pressure.zeta_ij are the matrix coefficients  ij in Equation 46, varrho_ij are the matrix coefficients ij  in Equation 47, etaf_k corresponds to /k, chi corresponds to , lamx stands for x l , stress_xx stands for  xx , Prf stands for f p , Qx stands for D x q , and Vx stands for s x v .
i v and Darcy fluxes D i q .In the Update_QV () kernel, we store derivatives of stresses  ij and fluid pressure f p .ALKHIMENKOV ET AL.

Figure 8 .
Figure 8.Time loop computations for (a) a single GPU CUDA C code and (b) a multi-GPUs CUDA C code implementation.compute_StressPrf corresponds to the update of all stresses  ij and fluid pressure f p .update_QV corresponds to the update of velocities s i v and Darcy fluxes D i q swap (…) stands for a pointer swap of Darcy's fluxes

Figure 9 .
Figure 9. Schematic representation of a domain decomposition on four GPUs.First, the computation of the boundary points (1) of the local domains using streams is performed, then the computation of the inner points (2) of the local domains is carried out together with the nonblocking MPI messages to exchange the boundary values (represented by red boundary lines) among neighboring GPU units.GPU, Graphical processing units; MPI, message passing interface.

Figure 10 .
Figure 10.The Nvidia visual profiler (nvvp) output for various GPU code implementations: (a) single GPU (without computation/communication overlap), (b) mulit-GPUs (without computation/communication overlap), and (c) multi-GPUs (with computation/communication overlap).All implementations share the same compute_StressPrf kernel.The update_QV kernel is (a) executed once per time step updating both boundary and inner points, (b) executed once per time step and followed by internal boundary exchange using MPI, and (c) executed in a serial fashion, first updating the boundary points, then internal boundary exchange occurs using MPI while the inner points are asynchronously computed in the second call of the update_QV kernel.The computation/communication overlap referred to as computational split involves 48, 16, and 16 grid cells in in x-, y-, and z-directions, respectively.GPU, Graphical processing units; MPI, message passing interface; nvvp, Nvidia visual profiler.

Figure 11 .
Figure 11.The effective memory throughput as a function of the number of threads per blocks (in x-, y-, and z-direction).The MTP of all simulations (MTP effective ) are normalized by MTP ref (corresponds to Block (32, 2, 16)).The bold color corresponds to thread-block combinations of 512 threads out of the 1,024 available.MTP, memory throughput.

Figure 13 .
Figure 13.The impact of the boundary width on the memory access efficiency.All the performance results are normalized by MTP ref of the non-MPI code implementation.MPI, message passing interface; MTP, memory throughput.

Figure 14 .
Figure 14.The MPI weak scaling tests of the anisotropic Biot 3D implementation.We show the parallel efficiency of the two Nvidia hardware accelerators, the 1-8 Tesla V100 32 GB NVlink GPUs (Volta) and the 1-128 Titan X 12 GB GPUs (Maxwell).All the performance results are normalized by the single-MPI code performance.MPI, message passing interface.

Figure 15 .
Figure15.A comparison between numerically calculated dimensional phase velocities (up) and 1/Q (down) against an analytical solution of Biot's equations.Each red circle corresponds to a numerical simulation.The phase velocity V 1 is normalized by the velocity in the high frequency limit 1

Figure 16 .
Figure 16.The truncation error magnitudes of the numerically estimated velocities.(a) The low frequency source and (b) cf of the source is close to the frequency of 1/ 1 Q maximum.The material parameters are those from Table3.

Figure 17 .
Figure 17.Numerical simulation of a propagating waves.(a), (c), (e) show the total pressure field p; (v), (d), (f) show the velocity field Vx.Plots (a) and (b) correspond to the poroacoustic medium, (c) and (d) correspond to the poroelastic medium (low frequency regime), and (e) and (f) correspond to the poroelastic medium (high frequency regime).The total physical simulation time is approximately t = 9 × 10 −4 seconds.The material properties are those of an isotropic sandstone (Table4).

Figure 21 .
Figure 21.Snapshots showing the solid velocity field xV in the having the properties of the glass-epoxy (Table4).Panel (a) shows


i represents a spatial derivative in i− direction.The relation between the drained stiffness matrix ij c and the undrained stiffness matrix u

Figure F1 .
Figure F1.Schematic chip representation for both the central processing unit (CPU) and graphical processing unit (GPU) architectures.The GPU architecture consist of thousands of arithmetic and logical units (ALU).On the CPU, most of the on-chip space is devoted to controlling units and cache memory, while the number of ALUs is significantly reduced.

Table 1
List of Principal Notation

Table 3
Poroelastic Properties of Carbonateto the diffusive regime of  2 V .At high frequencies, 

Table 4
Properties of Anisotropic Poroelastic Rocks Used for Numerical Simulations

Discretization of Biot's Equations
, the following operators for the time evolution are introduced ij  is given by (D5).ALKHIMENKOV ET AL.10.1029/2020JB021175AppendixE: For simplicity, equations only in x-direction are shown in the discrete form.A few additional operators are introduced Discretization of  xz and  yz is in analogy to that of  xy .The total stress tensor field 