Hamiltonian Monte Carlo to Characterize Induced Earthquakes: Application to a ML 3.4 Event in the Groningen Gas Field and the Role of Prior

The Hamiltonian Monte Carlo algorithm is known to be highly efficient when sampling high‐dimensional model spaces due to Hamilton's equations guiding the sampling process. For weakly non‐linear problems, linearizing the forward problem enhances this efficiency. This study integrates this linearization with geological prior knowledge for optimal results. We test this approach to estimate the source parameters of a 3.4 magnitude induced event that originated in the Groningen gas field in 2019. The source parameters are the event's centroid (three components), its moment tensor (six components), and its origin time. In terms of prior knowledge, we tested two sets of centroid priors. The first set exploits the known fault geometry of the Groningen gas field, whereas the second set is generated by placing initial centroid priors on a uniform horizontal grid at a depth of 3 km (the approximate depth of the gas reservoir). As for the forward problem linearization, we use an approach in which the linearization is run iteratively in tandem with updates of the centroid prior. We demonstrate that, in the absence of a sufficiently accurate initial centroid prior, the linearization of the forward model necessitates multiple initial centroid priors. Eventually, both prior sets yield similar posteriors. Most importantly, however, they agree with the geological knowledge of the area: the posterior peaks for model vectors containing a centroid near a major fault and a moment tensor that corresponds to normal faulting along a plane with a strike almost aligning with that of the major fault.

An earthquake source can be parameterized in several ways (Aki & Richards, 2002).In this study, we consider a moment tensor (MT) representation (Jost & Herrmann, 1989).This implies that the seismic event is collapsed to a single position (point-source representation), which is usually referred to as "the centroid."Such a representation is justified in case the waveform data is analyzed at periods for which the seismic source is effectively a point source (Aki & Richards, 2002).Additionally, assuming instantaneous rupturing, we end up with 10 source parameters.The first six are the moment tensor components, where the MT's magnitude is a measure of the amount of energy released.This MT can be decomposed into isotropic (ISO), double-couple (DC), and compensated linear vector dipole (CLVD) components (Jost & Herrmann, 1989).The other four parameters are the event's east, north, and depth coordinates and the origin time.
Various data sets and techniques have been utilized to estimate the source characteristics of Groningen earthquakes.Willacy et al. (2018) adopt a deterministic approach to estimate moment tensors and centroids.These authors employed a detailed 3D subsurface model of Groningen but restricted the search space to DC sources.In contrast, Dost et al. (2020) used a probabilistic approach to estimate the centroid and full moment tensor (implying that they allowed for the ISO and CLVD components as well) but employed (locally) 1D models.Deterministic approaches often provide faster computations compared to probabilistic approaches.However, probabilistic approaches quantify the uncertainty of the different parameters; in this case, these are the uncertainties of the 10 earthquake source parameters.Also, the use of 3D subsurface models has a clear advantage over 1D subsurface models.This is because 3D models take into account the subsurface lateral heterogeneity that will affect the shape (amplitude and phase) of the seismogram generated from simulating an earthquake event using those 3D models.
In this study, we investigate the combination of a probabilistic approach with 3D subsurface models to estimate the source parameters of a real event in Groningen.To mitigate the aforementioned "inefficiency" of probabilistic approaches, we modify the workflow described in Masfara et al. (2022).This workflow relies on a variant of the Hamiltonian Monte Carlo (HMC) algorithm and has previously been tested using synthetic recordings generated using the 3D Groningen subsurface velocity model.For this study, we consider the 2019 3.4 local magnitude earthquake below the village of Westerwijtwerd (Figure 1).Since we estimate the full moment tensor, our estimation does not limit the search space to just DC components but includes the ISO and CLVD components.Also, the inclusion of origin time in the estimation quantifies the trade-off between origin time and estimated depth.
In what follows, we first describe the theory underlying the workflow.We subsequently introduce and discuss the (retrieval of the) recordings used to estimate the parameters, including the prior information that is used to increase the computational efficiency of the workflow.Finally, we compare our results to results obtained in other studies and draw conclusions, including the outlook of applying the same approach to a larger set of events in the Groningen area.

Methodology
To enable source characterization, the formal relationship between the observed data and the source (model) parameters is introduced and detailed in the first subsection.Subsequently, we introduce Bayes' theorem and, MASFARA AND WEEMSTRA 10.1029/2023EA003184 3 of 21 assuming Gaussianity, cast it in a form allowing us to utilize it.In Section 2.3, we then introduce the HMC algorithm.Finally, in Sections 2.4 and 2.5, we describe how the algorithm's efficiency can be enhanced via line arization of the forward problem and by choosing meaningful prior information, respectively.

The Forward Problem
In this study, the posterior probability of the model parameters is estimated by means of a Markov process.The generation of such a Markov chain is detailed further below (Section 2.3), but, at this point, it should be understood that for each sample in the chain, forward-modeled data is compared against measured data.In the context of our problem, a specific model m (or sample) implies assigning a specific value to each of the 10 aforementioned source parameters (MT, centroid, and origin time).The measured data d obs consists of the induced event's waveform data, which, in our case, are recordings of particle displacement recorded by KNMI instruments.Computation of the likelihood ρ(d obs |m) yields the probability of these recordings given a model m and involves quantification of the misfit between the recorded particle displacements and numerically modeled particle displacements.The latter is computed by numerically solving the wave equation, that is, they are the result of solving (what is usually referred to as) "the forward problem."Mathematically, the forward problem can be written as r) , ;  (a) ) , (1) where u i is the ith component of the particle displacement vector (u = (u 1 , u 2 , u 3 ) where 1, 2, 3, correspond to the east, north, and down direction, respectively), M jk represents an element of the 3 × 3 moment tensor M at position x (a) , that is, the centroid.Note that j and k indicate the axis along which the force is acting and the direction in which the arm is pointing, respectively (Aki & Richards, 2002).Furthermore, x (r) denotes the position where the displacement is recorded, G is the Green's tensor, * represents temporal convolution, and T 0 denotes the origin time.The comma after the second subscript of an individual element of the 3 × 3 Green's tensor implies a spatial derivative in the k direction with respect to x (a) .To make the computation of   (  (r) ,  ) for a large number of potential centroids (i.e., a large number of x (a) ) more efficient, we invoke reciprocity (Aki & Richards, 2002).In this study, the numerically modeled particle displacements are generated using SPECFEM3D (Komatitsch & Tromp, 2002).For this purpose, we use the 3D subsurface models of the Groningen gas field by Romijn (2017).

Bayes' Theorem
The probabilistic workflow used in this study relies on Bayes' theorem (or rule).In general, Bayes' theorem describes how, in the presence of prior knowledge, the probability of a hypothesis (or model) m depends on the available data d obs .The prior knowledge is accounted for by the prior probability distribution (often simply referred to as "the prior").Ignoring the marginal probability (or "evidence"), Bayes' theorem can be written as where ρ(m|d obs ) is the posterior probability distribution (or simply "the posterior"), ρ(d obs |m) the likelihood, and ρ(m) the prior probability distribution.The model vector m is a ten-component vector containing the centroid x (a) (where a Cartesian east-north-down coordinate system implies that   (a) = (  (a)  1 ,  (a)  2 ,  (a)   3 ) ; hence three model parameters), the moment tensor M (six independent elements and hence six model parameters), and the origin time T 0 (one model parameter).This implies that ρ(m) represents the prior probability of these 10 parameters.
Assuming Gaussian observational errors and a Gaussian distributed prior probability, the posterior in Equation 2 can be written as (Fichtner & Simutė, 2018;Masfara et al., 2022): Here,  () contains the numerically modeled displacement recordings (solution of Equation 1) and d obs the observed ones.Explicitly, for a total of N r three-component instruments,  () is a concatenation of all 3 × N r modeled seismograms and d obs a concatenation of all 3 × N r recorded seismograms.C d , C m , and m (0) are the data covariance matrix, prior covariance matrix, and prior mean, respectively.Evaluating Equation 3 results in the (a posteriori) probability of the model parameters, that is, their probability given observations and prior knowledge of the system (Tarantola, 2006).

Hamiltonian Monte Carlo
Although Bayes' theorem describes how the posterior probability distribution depends on the available data d obs (through the likelihood) and prior knowledge ρ(m), that posterior can usually not be estimated directly (Tarantola & Valette, 1981).In particular, a large number of model parameters and non-linearity prohibit this.To overcome this, we generate a sequence of specific models (often called "samples") in what is referred to as a "Markov chain."The density of these samples reflects the density of the posterior distribution we seek to find.
Numerous sampling algorithms are available to estimate ρ(m|d obs ), all with their own advantages and disadvantages.In this study, we implement a workflow that relies on the Hamiltonian Monte Carlo (HMC) algorithm.HMC was derived from classical mechanics, applied to statistical mechanics (Betancourt, 2017), and considered one of the most efficient probabilistic algorithms for exploring high-dimensional model spaces.HMC relies on the sequential calculation of two quantities.These are the "potential energy" U, which is a function of the model vector m, and the "kinetic energy" K, which, in our framework, is solely a function of the momentum vector p.This momentum vector is an auxiliary vector that has the same dimension as m (10 in our case).Together, m and p make up what is often referred to as the "phase space," and their joint probability is described by the "canonical distribution"  (, ) .
The canonical distribution can be written in terms of an invariant function  (, ) , that is, Here,  (, ) is referred to as "the Hamiltonian," and its value in phase space is usually called "the energy" at that point (Neal, 2011).As such, a model m can be looked upon as the position of a "particle" (Betancourt, 2017) (5) Here, U(m) ≡ − ln ρ(m|d obs ).
Equation 5 describes the more general case; in our implementation,  (, ) is merely a function of the momentum vector and hence  (, ) → () .Specifically, it is given by (Fichtner & Simutė, 2018;Masfara et al., 2022) where the mass matrix   acts as a tuning parameter (Fichtner et al., 2019(Fichtner et al., , 2021)), allowing the particle to move through the desired areas of phase space with corresponding potential and kinetic energy (Betancourt, 2017).
Starting from an initial estimate of m with some prescribed initial momentum, Hamilton's equations, which read will efficiently explore areas with relatively low potential energies (corresponding to the a posteriori more probable areas of the model space; see Equation 5).Here, the quantity τ is the "artificial time" that is used to propagate (the particle) from the initial model along trajectories of constant H.This propagation occurs for some (to-be-determined) time τ lp , where the subscript "lp" stems from "leap" as we use the leapfrog algorithm to evaluate 7. The model reached at τ lp , that is, m(τ lp ), is subsequently accepted with probability ) which is usually referred to as the "metropolis rule" (Tarantola, 2005).If the model m(τ lp ) is not accepted, the process will be repeated by introducing a new (different) momentum vector to the initial model.If accepted, the model m(τ lp ) will serve as the starting point for a new deterministic trajectory after being endowed with momentum.
One of the main advantages of using HMC over generic probabilistic sampling algorithms such as Metropolis-Hasting (MH) algorithms is its ability to sample the posterior distribution more efficiently, which is illustrated in Figure 2. In MH (left figure), new samples are randomly drawn based on specified proposal distributions.This behavior is represented by the red balls around the current sample, which for the first iteration, are close to the starting model.In HMC (right figure), the iterative endowment with the momentum of the current sample results in a combination of short and long trajectories, reducing the correlation between samples.Furthermore, the inclusion of the gradient of the target distribution via the computation of Hamilton's equations enables HMC to stay in the "typical set," constituting areas where the posterior probability is elevated.Therefore, with a combination of short and long trajectories, and the ability to slide along the typical set, HMC can explore the posterior distribution more efficiently with fewer iterations/samples.This type of exploration is particularly useful when sampling complex posterior distribution in high-dimensional model spaces.The main computational burden of HMC is the computation of   ⁄ .This computational burden increases as the number of model parameters increases.To mitigate this, we adopt an approach in which we linearize the forward problem, which we will now briefly describe.

Linearization of the Forward Problem
To ease the computation of the gradient of the potential energy in the model space, Fichtner and Simutė (2018) linearize Equation 1by means of a Taylor expansion around the prior mean m (0) (see Appendix A).Simutė et al. (2022) use this same modification and 3D Earth models to characterize tectonic earthquakes below the Japanese peninsula.In these studies, m (0) is obtained from an earthquake catalog, which is not always directly available for induced earthquakes.Replacing, in d(m), the numerically modeled displacements u(x (r) , t) by numerically modeled displacements resulting from a linear approximation of Equation 1 implies that we assume m (0) to be "sufficiently close" to the true model parameters.This merely applies for the centroid x (a) and origin time T 0 .That is, since the particle displacement depends linearly on the moment tensor components, the linearization does not impose an approximation when it comes to the moment tensor components.Importantly, "sufficiently close" means that the centroid x (a) and origin time T 0 should be at sub-wavelength and sub-period distance from the true centroid and origin time, respectively.
In our case, the assumption that m (0) is sufficiently close to the true model parameters is usually not met.This will render the application of HMC ineffective (to state the least).In order to apply HMC (including a linear approximation of Equation 1) to induced earthquakes, two main challenges, therefore, need to be addressed.First, the recorded seismograms are often dominated by high-frequency signals (>1 Hz), increasing the non-linearity of the forward problem.Second, as mentioned earlier, the prior information is often unavailable or rather inaccurate.To address these challenges, in this study, we use the multi-stage workflow introduced by Masfara et al. (2022).This means that we iteratively update m (0) , which is detailed in the remainder of this section.In addition, we run this workflow multiple times (in parallel), each starting from a different m (0) .This is explained in Section 2.5.In the remainder of this paper, we will refer to the HMC variant that involves a linearization of the forward problems as "linearized HMC."It should be understood, however, that this does not involve a linearization of Hamilton's equations itself.
Figure 3 illustrates the embedding of linearized HMC in the proposed multi-stage workflow.Iteratively updating m (0) partly overcomes deviations of the estimated posterior from the true posterior, thus addressing the first challenge.Given a first m (0) , the three quantities in Equations A4-A6 need to be computed only once in order to sample a "local posterior" around that m (0) .These quantities are used to compute the gradient of the potential energy and hence evaluate Hamilton's equations and the Hamiltonian itself (Equations 7 and 5, respectively).Importantly, in the absence of a linearization of the forward problem, the computation of Equations 7 and 5 requires the forward problem to be evaluated during each deterministic trajectory.Linearization of Equation 1, resulting in the three aforementioned quantities, renders this unnecessary for each individual stage (Masfara et al., 2022).When m (0) does not coincide with the true model parameters, the linearized HMC algorithm will explore a "local posterior" that deviates from the true posterior distribution despite being computationally efficient.This is illustrated in Figure 3a, where the linearized HMC can only explore the area above the orange curve.To obtain a better approximation of the posterior, the workflow uses the result of exploring the local posterior in Figure 3a to obtain a new m (0) (essentially taking the mean of the local posterior and using that as m (0) ).Linearization of the forward problem about the updated m (0) and re-computation of the aforementioned quantities allows for a new exploration of the model space in Figures 3b and 3c.After five Taylor expansions about the new m (0) , six local posteriors are estimated.The associated distributions are, for each stage, depicted in Figure 3d.Having the results from all stages in (d), the workflow then uses variance reduction (e.g., Masfara et al., 2022;Mustać & Tkalčić, 2016) as a criterion to select stages that should be included in the estimate of the final posterior.This is depicted in Figure 3e.

The Importance of the Prior
Having an inaccurate m (0) can only partly be overcome by updating m (0) in progressive stages.That is, the multi-stage workflow will still be ineffective when the initial m (0) is located in a "local mode" of the posterior distribution (i.e., associated with a local minimum of the potential energy).The chance of this happening increases with an increase in the non-linearity between the model parameters and the observed displacement recordings (i.e., higher frequencies).In practice, this happens when the centroid x (a) and origin time T 0 in m (0)  are separated from the true centroid and true origin time by more than (approximately) half a wavelength or half a period, respectively.To address this, we additionally use multiple initial m (0)  list .We illustrate the process of using multiple m (0) in Figure 4. We depict three initial m (0) , with one located in the "correct" lobe, that is,   (0) 2 .Each of the m (0) will then be updated in a similar fashion as shown in Figure 3.While   (0) 1 and   (0) 3 ended up sampling the wrong lobe, the updated   (0) 2 enables the linearized HMC algorithm to sample the correct lobe.In Figure 4b, we detail the last stage of the multi-stage workflow that started with   (0) 2 in the red circle.We end this section by emphasizing that although being very efficient in sampling the posterior distribution (through the potential energy), the proposed multi-stage workflow (including the use of multiple initial priors   (0)  ) ultimately only results in an approximate posterior distribution.This is because the true observational  1) is assumed to coincide with the true velocity model.Since this will not be the case, another "source of error" is introduced, which in practice will result in a deviation of the estimated posterior from the true posterior.Moreover, since a Markov process only approaches the true posterior asymptotically, a Markov-chain-based estimate of the posterior is, by definition, an approximation.Whereas the latter two cannot be circumvented (we don't have the exact subsurface model and also cannot run a Markov chain for an infinite amount of time), the linearization is, in principle, not necessary, and also Gaussian observational errors do not need to be assumed.Not doing so, however, would make the computational demands prohibitively large.

Data
In this study, d obs contains the 3 × N r recordings of displacements (u obs ) due to an induced event that occurred close to the village of Westerwijtwerd in 2019, the province of Groningen (see Figure 1).The KNMI estimates the magnitude of the earthquake to be 3.4 local magnitude.We collected u obs from ten G-network seismometers.These seismometers are selected based on their distance and azimuthal coverage with respect to the estimated epicenter.In Figure 5a, we depict the 10 seismometers as white inverted triangles and the location of the KNMI-estimated epicenter by a blue star.The seismometers are part of the KNMI borehole network: each borehole contains four vertically-separated seismometers.The number at the end of their ID indicates their depth, that is, their IDs run from 1 to 4, with the instruments numbered 1 being at 50 m depth and the instruments numbered 4 being at 200 m depth.We illustrated the configuration of a string of borehole seismometers in Figure 5b.
From the four seismometers in each borehole, we solely used the seismograms recorded by the deepest seismometers: they have a higher signal-to-noise ratio than the shallower seismometers (Dost et al., 2012).Furthermore, all seismometers experience a horizontal rotation while lowering them in the borehole.Consequently, a rotation needs to be carried out for projecting the horizontal recordings to specific preferred orientations, which in our case are to the east-west(x 1 -axis) and north-south(x 2 -axis) orientations, respectively.In Figure 5b, we illustrate the orientation of the deepest borehole seismometer.The axes H1 and H2 are proxies of east and north.We then rotate the data to the true east and north using the angles given in Ruigrok et al. (2019).We depict the original seismograms (obtained from the KNMI) and the rotated seismograms of the selected seismometers in Figure 6.Dost et al. (2020) have used the same recordings to characterize the Westerwijtwerd event probabilistically.These authors, however, use local 1D velocity models to solve the forward problem.Furthermore, they separately use 0.5 and 1 s windows of P and S waves, respectively, where the P-wave is given more weight and evaluated at higher frequencies (i.e., 2-4 Hz for P and 1-3 Hz for S-wave).The P-wave waveform is given a higher weight because of the higher accuracy of the employed P-wave velocity models (compared to the S-wave velocity models).Also, these   2020), which uses a coherence method.This study focuses on determining the hypocenter.They find most Groningen earthquakes to systematically originate approximately 200 m above the reservoir layer.In this study, we exclusively use P-wave seismograms due to the significantly higher accuracy of the P-wave model.Furthermore, we use both the vertical and horizontal components and filter the recordings using a passband of 1-4 Hz, similar to the frequency range used by Dost et al. (2020).
As for the length of the measurement window, we use 2.5 s for all components and taper both ends with a 0.5 s cosine taper.For the data covariance, we use a diagonal matrix representing uncorrelated noise and estimate this to be 5% of each component's maximum amplitude.By taking a certain fraction of the maximum amplitude, we overestimate the "true noise."The reason for this is that we want to account for (part) of the waveform misfit arising from the deviation of the employed velocity model Romijn (2017) from the true (unknown) velocity model.Before applying it to the field data, we perform a synthetic experiment, which is detailed in the next section.

Synthetic Experiment
In this section, we test the validity of the proposed workflow and data processing parameters (i.e., frequency band, length of the measurement window, and noise criteria) on a synthetic event.For this, we first generate synthetic data using the KNMI-estimated hypocenter as the centroid of our synthetic earthquake.We then set T 0 to 3 s, and for the MT, we use the values of 0.2E13 Nm, 2.86E13 Nm, −3.07E13 Nm, 0.76E13 Nm, −0.45E13 Nm, −1.71E13 Nm for M nn , M ee M dd , M ne , M nd , and M ed respectively.These values represent pure shear normal faulting (rake of −90°) along a geological fault with a strike of 165°, a dip of 60°, and a moment magnitude of 3.
We then corrupt the data in the frequency domain to simulate the presence of uncorrelated noise.This is implemented using the same approach as Mustać and Tkalčić (2016).In the time domain, the uncorrelated noise results in amplitude variations that affect the estimation of our centroid and MT, and shift the observed recordings in time (resulting in uncertainty in T 0 ).To effectively test the workflow, we first choose a (single) m (0) that significantly deviates from the actual value (i.e., the synthetic earthquake parameters).For the centroid, we impose a shift of 200 m along each axis, that is, the centroid in m (0) deviates 200 m from   (a)  ,   (a)  , and   (a)  .For the MT, we simply assign a uniform value to each MT component, and for T 0 , we impose a shift of 0.5 s.We then run our workflow for 20 stages (i.e., the prior mean m (0) is updated 20 times).The results are presented in Figure 7.
The yellow stars represent the initial m (0) , and the red lines represent the true synthetic earthquake parameters.The black dots are the samples generated from all 20 stages, which are equivalent to samples used to build all the  3d.Whereas the green dots are the samples from selected stages based on a VR criterion, equivalent to the samples from the selected stages in Figure 3e.The red dots represent samples resulting from a generic HMC run (i.e., HMC without linearizing the forward problem).This run was terminated as soon as the number of times for which the forward problem needed to be solved coincided with the number of times the forward problem was solved while running the multi-stage workflow in which the forward problem was linearized.Mind that each solution of the forward problem involves the computation of 3 × 10 seismograms (recall from Section 3 that N r = 10).
Let us demonstrate the computational benefit of the multi-stage workflow (in conjunction with a linearization of the forward problem) over generic HMC (which does not involve this linearization).The number of times the forward problem needs to be solved in order to generate four model samples using generic HMC (represented by the star and the red dots in Figure 7) is 404.Here, each "solution of the forward problem" in practice involves a separate computation of the   (  (r) ,  ) in Equation 1.We arrive at 404 as follows: it depends on the number of generated samples N s (4 in this case), the number of leaps N lp to arrive at m(τ lp ) (here we use 5), and the number of model parameters N m (10 in our case).First, with the prescribed five leaps to arrive at a new model started from the current model, we evaluate Equation 7 five times.Second, the evaluation of Equation 7requires the computation of    .For that, we use a central difference approximation, which means that for each of the 10 parameters in m, we must evaluate U twice.Additionally, after the five leaps, we still have to compute to evaluate Equation 8, which requires one additional solution of the forward problem per sample.Consequently, the total number of forward problem solutions coincides with N s × N lp × 2N m + N s = 404.Linearization of the forward problem reduces this number dramatically.In fact, for every stage of the multi-stage workflow, the number of samples that can be generated is unlimited in the sense that it does not require additional solutions to the forward problem.The forward problem just needs to be run 2 × N m = 20 times per stage.This number stems from the (one-time) computation of the derivatives of U.These derivatives are included in the A pq , b p , and c (Equations A4, A5, and A6 in appendix Appendix A, respectively).Therefore, to generate all samples for a total of 20 stages (i.e., 20 updates of m (0) ), the number of times the solution to the forward problem needs to be computed is just 400.
We use the mean of the approximate posterior resulting from our multi-stage workflow to generate displacement recordings.In Figure 8, we compare these recordings with the observed (synthetic) recordings.The observed recordings are depicted in brown (recall that noise is added to these seismograms).The recordings associated with the mean values of our estimated posterior are depicted in gray and align well with the noise-free recordings associated with the true source parameters (depicted in green).

Prior Knowledge
In Subsection 2.5, we discussed the importance of using   (0) list to avoid getting trapped in a local mode.For the purpose of generating   (0) list , we make use of the available fault map of Groningen's subsurface by Bourne and Oates (2017).This is inspired by research that reveals a strong correlation between hypocenters and major faults in Groningen's subsurface (Pickering, 2015;Spetzler & Dost, 2017;Willacy et al., 2018).In this context, we also evaluate the importance of the displacement along the horizontal components for the estimated posterior.
The reason for this is potential errors arising from possible incorrect rotations of the horizontal displacements (see Section 3).Combined, we, therefore, investigate three different cases: two centroid prior configurations (i.e., with different   (0) list ) of which one is used in conjunction with both the vertical component recordings and the three-component recordings.The configuration that uses known faults in the reservoir as a basis to generate the   (0) list , in conjunction with the vertical component recordings only, is referred to as "1C-fault."The same configuration, but used to estimate the posterior based on the recordings by all three components, is referred to as "3C-fault."The other centroid prior configuration we consider consists of a square grid that covers not just the fault but also the surrounding area.This configuration of   (0) list is only used in conjunction with the recordings by all three components and is referred to as the "3C-grid'.This centroid prior configuration is considered to evaluate whether the recovered posterior might peak at a centroid position that deviates from the known fault geometry.The two different centroid prior configurations are depicted in Figure 9.
To generate the entries (individual m (0) ) in   (0) list of the two considered prior configurations, we first draw a circle with a 1 km radius around the epicenter estimated by the KNMI.The enclosed area is colored dark green in Figure 9. Next, we discretize the fault inside the circle using a spatial sampling criterion based on the approximate seismic P-wave velocity within the circle and the highest frequency we use while fitting the waveforms.This criterion provides a rough estimate of the minimum "wavelength" of the posterior distribution.By discretizing the fault such that the individual centroids (associated with individual   (0)  ) in   (0)  list are separated by less than half this wavelength, we therefore, ensure that at least one of the initial priors is located in the "correct" lobe, that is, similar to what we have illustrated in Figure 4. Given the P-wave velocities at reservoir depth and a maximum frequency of 4 Hz (recall that we filter the recordings using a passband of 1-4 Hz), we arrive at a value of 200 m for this criterion.This is hence the separation along the fault at which individual centroid priors are placed.We depict these initial centroid priors in Figure 9 as yellow stars.At the same time, the fault orientations at these positions are used to determine the six moment tensor entries in the initial priors.As for the depth and origin time T 0 in the   (0)  , we use the values estimated by the KNMI for both configurations (i.e., 3 km for the depth and 2019-05-22T03:49:00.075s for the origin time).In total, 19 individual m (0) are concatenated in   (0) list for 1C-fault and 3C-fault.For the third case, we consider a centroid prior configuration consisting of a square grid of 2 km × 2 km, with the center again being the epicenter estimated by the KNMI.We use the same criterion (200 m) to determine the horizontal spacing between the individual centroid priors.In Figure 9, we depict these as green stars.For the depth and origin time, we use identical values.Furthermore, for the MT, we assign a uniform value to each MT component for each individual m (0) .In total, we obtain 121 initial m (0) for this configuration.

Application to Field Data
For all cases described above (1C-fault, 3C-fault, 3C-grid), our multi-stage workflow consists of 20 stages.For the centroid prior configuration derived from the geometry of the known faults within the reservoir   (0) list contains N = 19 m (0) , which implies a total of 380 stages.For the 3C-grid, a total of 121 initial priors serve as the starting model of the 121 multi-stage workflows (see Figure 9), resulting in a total of 2,420 stages for this configuration. of the 20 m (0) in   (0) list for which the VR score attains its maximum.Note that this model's centroid is usually not at the location of the initial centroid prior mean (i.e., the centroid in   (0) 0 ) because the models for which the waveforms best fit the observed recordings are often found in one of the later stages; see also Figure 3.For all three cases considered here, the highest VR scores are obtained in those chains for which the initial centroid prior mean is close to a fault.

Estimated Posterior
In Figure 11, we display the 1D marginal posterior distributions obtained from the selected stages of each configuration.In general, the mean value of these posteriors is fairly consistent across configurations, especially for 3C-fault and 3C-grid.For the 1C-fault case, the mean of the posteriors slightly deviates while at the same time having a slightly broader distribution compared to the other two cases.We attribute this to the fact that, for 3C-fault and 3C-grid, the additional data reduces the uncertainty of the estimates.In Figure 12, and for 3C-fault, we also plot the progression of the different stages associated with one of the individual centroid priors included in one of the m (0) in   (0)  list .Specifically, we show the progression of that workflow (i.e., starting from that   (0) i ) that contains the stage that results in the overall maximum VR score.The vertical lines represent the start of different stages, and the red horizontal lines are the posterior means computed using the selected stages (after evaluating the VR scores for all stages).The progression follows a trend identical to the illustration in Figure 3d, especially  for the origin time T 0 , with a slight variation for some others, such as for the depth and M nd that shift monotonically to lower values.It is important to add that an initial estimate of T 0 was obtained using the envelope of the traces.This is described in detail in Section 6.1 of Masfara et al. (2022).

Traces Associated With the Posterior Distribution
Using the posterior mean in Figure 11, we generate synthetic data and compare these with the observed data in Figure 13.In our workflow, the misfit in Equation A1 is based on 2.5 s of the observed particle displacement, bandpass filtered between 1 and 4 Hz.Here, for consistency, we adopted the same values for these parameters.Additionally, we show in Figure 13 the maximum and minimum bounds using synthetic data generated from 1,000 models drawn from the posterior distribution.We depict those bounds as a shaded area in Figure 13.

Source Characteristics
To investigate the source characteristics of the analyzed induced event, we first decompose the MTs of the posteriors shown in Figure 11.In this study, we do not limit our solutions to a single mechanism.We, therefore, decompose our moment tensor solutions into their ISO, DC, and CLVD components.We do this for each case (1C-fault, 3C-fault, and 3C-grid) and depict the decompositions in the Hudson plots in Figure 14.The mean MT for each case is represented by the beachball with the red outline.For all cases, the DC "region" is densely clustered (i.e., the center of the plot), with negative ISO components clearly outnumbering positive ISO components.This is often attributed to the compaction due to the gas extraction (Dost et al., 2020).We show the posterior distributions of the different MT components in Figure 15 (bottom row).Furthermore, in the top row, we depict the translation of the MT solutions in Figure 14 to distributions of strike, dip, and rake.Here, we only show solutions with strikes between 90° and 180°, which are in accordance with the orientation of the fault close by (given the centroid posterior distributions).
We visualize the centroid posterior distributions using horizontal and vertical slices of the Groningen subsurface (Figure 16).In the top row, we show the depth of the top reservoir as a contour map, including the location of faults from Bourne and Oates (2017) at that depth.On top of these contour maps, we show the samples used to generate the 2D marginal posterior distributions of the lateral position of the centroids.We also plot the result from Dost et al. (2020) and the KNMI as the black beachball and blue star, respectively.The red beachball represents the mean MT which is also depicted in Figure 14 (beachball with red outlines).Not only do the posterior means of the (lateral) centroid positions coincide with the known fault, but also, the moment tensor solution agrees quite well with the strike of the nearby fault.On the vertical slices (middle and bottom rows), we depict the depth of the top reservoir as solid black lines.The location of the east-west vertical cross section and the north-south vertical cross section are shown as red and blue lines in the contour maps, respectively.For this specific earthquake, we find the posterior mean of the centroid to be slightly shallower than the centroid estimated by Dost et al. (2020).In fact, instead of being within the reservoir, we find the probability of having the earthquake nucleated above the reservoir is higher.The earthquake (model) parameter that has the strongest trade-off with depth is origin time.This is because an earlier origin time can be translated to an earthquake occurring at greater deeper and vice versa.In this study, origin time uncertainty is considered, and the result shows that the estimated T 0 from the KNMI is lagging by a few milliseconds.As a caveat, however, we do not consider the uncertainty in the 3D velocity models, which may not only introduce amplitude variations but also affect the origin time and/or depth.For a more detailed comparison, in Table 1, we list the mean and standard deviation of our estimated parameters (for the MTs, we convert these into strike, dip, and rake solutions) and compare them with the result of Dost et al. (2020) and the KNMI (hypocenter only).

Discussion and Conclusion
Using a probabilistic workflow incorporating the HMC algorithm, we estimate the source characteristics of a 3.4 M L induced earthquake associated with gas extraction from the Groningen gas field.Specifically, we estimate the posterior probability density of 10 earthquake parameters using two different sets of initial prior probabilities, of which one is used in conjunction with two sets of data: one consisting only of vertical component displacement recordings and a second one composed of the particle displacement in all three directions (east, north, down).We find that the posteriors estimated using both horizontal and vertical components of the seismograms (i.e., the latter data set) have similar shapes.At the same time, the one that only depends on the vertical component MASFARA AND WEEMSTRA 10.1029/2023EA003184 16 of 21 recordings yields a posterior that deviates (slightly) from the results of the other two cases while simultaneously being slightly broader.However, we find no substantial difference in the modeled seismograms associated with the different posterior means.In terms of runtime, using an 8-core MacBook Pro (2018 version), it took us a maximum of 3 min to run the 19 multi-stage workflows of the 1C-fault and 3C-fault case, and 12 min for the 121 multi-stage workflows of 3C-grid.
The main factor that affects the shape of the posteriors is uncertainty, which, in this case, is formulated as data and model uncertainty.In our study, we choose a uniform distribution for the model parameters to encode a state  11).The shaded area is within the maximum and minimum bounds of a total of 1,000 waveforms generated using 1,000 models drawn from the posterior distributions in Figure 11.Each seismogram is filtered and tapered using the same parameters used in the multi-stage HMC workflow.The duration of each trace plotted here is 3.25s.
of ignorance (i.e., σ m → ∞).Whereas the data uncertainty is estimated individually for each component on each seismometer (and hence captured by σ ri in A pq , see Equation A4, where the indices r and i are associated with a specific receiver and component, respectively).It is assumed that the noise is uncorrelated.Prescribing the noise to be correlated will make the workflow more complex and computationally more costly and require us to estimate data covariance matrices.In addition, a study by Gu et al. (2018) reveals that in the case of induced seismicity, accounting for (potentially) correlated noise has relatively little effect compared to the uncertainty arising from the inaccuracy of the velocity model.Ideally, the latter is also formally included.The relation between a specific source model (i.e., a specific set of model parameters) and the particle displacement at the surface will, in that case, be quantified by means of a probability density function (Tarantola & Valette, 1981).Due to limited computational resources, however, we disregard the uncertainty associated with the velocity model.Including it (for our 3D velocity model) will require enormous computational effort as each "cell" in the model must be varied according to their variance when computing the forward problem represented by Equation 1 (effectively, the Green's functions will become probability density functions).While using 1D velocity models, lateral heterogeneity is not considered, and therefore, the number of cells will be exponentially reduced hence the computational burden.In general, using 3D models has improved the characterization of earthquake sources since they better represent the subsurface compared to 1D models (Hejrani et al., 2017;Hingee et al., 2011;Wang & Zhan, 2020).Many studies involving MT inversions limit the model space to purely double-couple sources.Often, this limitation is justified by (presumed) a priori information of the source type.For example, a DC mechanism is usually sufficient to explain faulting in tectonically active areas where volumetric components can be expected to be negligible.In the context of induced seismicity, however, numerous studies have found that non-DC    Note.Specifically for T 0 , the value is relative to the origin time estimated by the KNMI.

Table 1
Comparison Between Earthquake Parameters Estimated in This Study With Estimation From Dost et al. (2020) and the KNMI

Figure 1 .
Figure 1.Map of the research area.The inverted triangles indicate the location of the KNMI seismometers, and the blue star is the epicenter of the 2019 Westerwijtwerd earthquake, as estimated by the KNMI.Axes indicate location using the Dutch RD coordinate system.This specific coordinate system gives the geodetic coordinates for European Netherlands and is used in official national maps.The inset at the bottom right shows the location of the study area.

Figure 2 .
Figure 2. Illustration of model space exploration using Metropolis-Hastings (a) and Hamiltonian Monte Carlo (b)algorithms.Note that with a similar number of accepted samples, HMC explores the distribution more efficiently via a combination of iterative short and long trajectories.This is achieved by prescribing a different momentum for each trajectory and iterative computation of Hamilton's equations.Mind that we only show the rejected samples of the first two moves/ accepted samples for both algorithms.

Figure 3 .
Figure 3. Illustration of linearized HMC embedded in the multi-stage workflow detailed in Masfara et al. (2022).Panel (a) to (c) depict the sampling of a local posterior associated with different m (0) .In (a), m (0) is the initial prior mean.In the next stage (b), m (0) is updated using the results of the exploration of the local posterior associated with this initial prior mean until m (0) (almost) coincides with the most likely model (c).The workflow's progression up to five stages is shown in (d).Panel (e) is the final posterior composed using variance reduction criterion, which discriminates the first two stages from stages 2-5.

Figure 4 .Figure 5 .
Figure 4. (a) Illustration of using multiple initial m (0) while sampling a complex/multimodal posterior distribution using linearized HMC.(b) Zoom of the last stage of the multiple stages associated with the initial model prior   (0) 2 .

Figure 6 .
Figure 6.Observed seismograms before (green) and after rotation/polarity switch (black).Recordings are normalized (individually) with respect to maximum particle displacement (written in blue).

Figure 7 .
Figure 7. Marginal posterior probabilities obtained through applying the proposed linearized HMC workflow to synthetic recordings.The stars represent the initial m(0) .The black and green dots represent all accepted samples from all 20 stages and samples from selected stages (i.e., the VR-score exceeds 0.95) used to compose the final posterior, respectively.The red lines represent the true (synthetic) model parameters, and the red dots are the samples generated running the generic (non-linearized) HMC algorithm.

Figure 8 .
Figure 8. Seismograms modeled using the posterior mean (gray) compared to the modeled observed recordings with noise added (brown) and the modeled observed recordings without noise (green).

Figure 9 .
Figure 9. Horizontal positions of the different centroid priors for the two different prior configurations considered.The first configuration is guided by the known fault geometry inside the green circle and is represented by the yellow stars.This circle has a 1 km radius and is centered at the epicenter estimated by the KNMI (blue star).The second centroid prior configuration uses a 2 km × 2 km grid with the KNMI-estimated epicenter at its center.These centroid priors are depicted as green stars.

Figure 10 .
Figure 10.Maximum VR score in each of the chains associated with the different initial m (0) for the three different cases considered (from left to right: 1C-fault, 3C-fault, and 3C-grid, respectively).Note that here we represent them by plotting the initial prior means (of the lateral positions) of the centroid.

Figure 11 .
Figure11.1D marginal posterior distributions for the three different cases considered."1C-fault": initial centroid prior configuration derived from the geometry of the known faults within the reservoir, and only the vertical particle displacement recordings are used."3C-fault": initial centroid prior configuration derived from the geometry of the known faults within the reservoir, but both horizontal and vertical particle displacement recordings are used."3C-grid": initial centroid prior located on a regular grid in a horizontal plane at the approximate (expected) depth of the event, and both horizontal and vertical particle displacement recordings are again used.

Figure 12 .
Figure12.Progression of 20 stages from using one of the m (0) in the 3C-fault configuration.The vertical lines represent different stages, whereas the red lines are the posterior mean (i.e., the mean of the green distributions in Figure11) obtained from the selected stages based on the VR criteria for 3C-fault configuration.

Figure 13 .
Figure13.The comparison between observed and numerically modeled seismograms.The modeled seismograms are generated given the posterior mean for estimated for each of the considered cases (see Figure11).The shaded area is within the maximum and minimum bounds of a total of 1,000 waveforms generated using 1,000 models drawn from the posterior distributions in Figure11.Each seismogram is filtered and tapered using the same parameters used in the multi-stage HMC workflow.The duration of each trace plotted here is 3.25s.

Figure 14 .
Figure 14.Hudson plot that shows the decomposition of the source mechanisms given the posterior distributions in Figure 11.The beachball with the red outlines represents the mean MT.

Figure 15 .
Figure 15.Top: The distributions of strike, dip, and rake solutions given the beachballs in Figure 14.Here we only show one part of the solutions closer to the orientation of the nearby major faults.Bottom: The marginal posterior distributions for different earthquake mechanisms given the decomposition in Figure 14.

Figure 16 .
Figure 16.Comparison of samples used to generate centroid posterior distributions in Figure 11 (east, north, and depth) with the centroid estimated by Dost et al. (2020) and the KNMI.The samples are color-coded with the density of centroid posteriors.The black line in the last two rows represents the top reservoir obtained from slicing the top reservoir map based on the red and blue lines in the top row. .