## 1 INTRODUCTION

Full waveform inversion is a tomographic technique that is based on numerical wave propagation through complex media combined with adjoint or scattering integral methods for the computation of Fréchet kernels. The accurate and complete solution of the seismic wave equation ensures that information from the full seismogram can be used to improve tomographic models. Although the method was conceived in the late 1970s and early 1980s (Bamberger *et al.* 1977, 1982; Tarantola 1984), realistic applications have become feasible only recently. Full waveform inversion can now be used to solve local-scale engineering and exploration problems (e.g. Smithyman *et al.* 2009; Takam Takougang & Calvert 2011), to study crustal-scale deformation processes (e.g. Bleibinhaus *et al.* 2007; Chen *et al.* 2007; Tape *et al.* 2009), to reveal the detailed structure of the lower mantle (e.g. Konishi *et al.* 2009; Kawai & Geller 2010) or to refine continental-scale models for tectonic interpretations and improved tsunami warnings (e.g. Fichtner *et al.* 2010; Hingee *et al.* 2011). While the tomographic method itself has advanced substantially, an essential aspect of the inverse problem has been ignored almost completely, despite its obvious socio-economic relevance: the quantification of resolution and uncertainties.

### 1.1 Resolution analysis in full waveform inversion

Early attempts to analyse—and in fact define—resolution were founded on the equivalence of diffraction tomography and the first iteration of a full waveform inversion (e.g. Devaney 1984; Wu & Toksöz 1987; Mora 1989). This equivalence, however, holds only in the impractical case where the misfit χ is equal to the *L*_{2} waveform difference. Furthermore, the rigorous analysis of diffraction tomography is restricted to homogeneous or layered acoustic media. The resulting resolution estimates are too optimistic for realistic applications that suffer from modelling errors and the sparsity of noisy data (Bleibinhaus *et al.* 2009).

Resolution analysis in full waveform inversion is complicated by many factors: (1) The data depend non-linearly on the model, meaning that the well-established machinery of linear inverse theory is not applicable (Backus & Gilbert 1967; Tarantola 2005). (2) A direct consequence of non-linearity is the appearance of multiple local minima (e.g. Gauthier *et al.* 1986). These may be avoided with the help of various multiscale approaches (e.g. Bunks *et al.* 1995; Sirgue & Pratt 2004; Ravaut *et al.* 2004; Fichtner *et al.* 2009), also known as frequency-hopping in the microwave imaging literature (e.g. Chew & Lin 1995). The convergence of all currently used multiscale approaches is, however, purely empirical. (3) Contrary to most linearized tomographies, the sensitivity matrix is not computed explicitly in full waveform inversion for reasons of numerical efficiency. This prevents a local analysis based, for instance, on the computation of the resolution and covariance operators for large linear systems (Nolet *et al.* 1999; Boschi 2003). (4) The size of the model space and the costs of the forward problem solution prohibit the application of probabilistic approaches that account for non-linearity using Monte Carlo sampling (Sambridge & Mosegaard 2002; Tarantola 2005) or neural networks (Devilee *et al.* 1999; Meier *et al.* 2007a,b).

In the absence of a quantitative means to assess resolution, arguments concerning the reliability of full waveform inversion images mostly rely on synthetic inversions for specific input structures, on the visual inspection of the tomographic images or on the analysis of the data fit. Synthetic inversions are known to be potentially misleading even in linearized tomographies (Lévêque *et al.* 1993). Visual inspection is equally inadequate because the appearance of small-scale heterogeneities is too easily mistaken as an indicator of high resolution. Finally, a good fit between observed and synthetic waveforms merely proves that the tomographic system has been solved, but not necessarily resolved.

Although they are crucial for the interpretation of the tomographic images, methods for the quantification of resolution in realistic applications of full waveform inversion do not exist so far. This deficiency is the source of much scepticism as to whether full waveform inversion is really worth the effort.

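Despite the difficulties introduced by the non-linearity of full waveform inversion combined with the computational costs of the forward problem solution, ample information about local resolution can be inferred from the quadratic approximation of the misfit functional χ in the vicinity of the optimal model $\tilde{\mathbf{m}}$,

$$\chi(\tilde{\mathbf{m}} + \delta\mathbf{m}) \approx \chi(\tilde{\mathbf{m}}) + \frac{1}{2} \int_G \int_G \delta\mathbf{m}^T(\mathbf{x})\, \mathbf{H}(\mathbf{x}, \mathbf{y})\, \delta\mathbf{m}(\mathbf{y})\, d^3\mathbf{x}\, d^3\mathbf{y}, \tag{1}$$

where

$$\mathbf{m}(\mathbf{x}) = [m_1(\mathbf{x}), m_2(\mathbf{x}), \ldots, m_N(\mathbf{x})]^T \tag{2}$$

represents an earth model composed of *N* physical quantities. The components of **m** may, for instance, be the *P* velocity α, the *S* velocity β and density ρ, that is, (*m*_{1}, *m*_{2}, *m*_{3})^{T}= (α, β, ρ)^{T}. The optimal earth model $\tilde{\mathbf{m}}$ is characterized by a zero Fréchet derivative, meaning that

$$\nabla_m \chi(\tilde{\mathbf{m}})\, \delta\mathbf{m} = \lim_{\nu \to 0} \frac{1}{\nu} \left[ \chi(\tilde{\mathbf{m}} + \nu\, \delta\mathbf{m}) - \chi(\tilde{\mathbf{m}}) \right] = 0 \tag{3}$$

for all model perturbations $\delta\mathbf{m}$. The Hessian **H**, evaluated at $\tilde{\mathbf{m}}$, is a symmetric and bilinear operator that acts on the perturbation $\delta\mathbf{m}$ via a double integral over the model volume *G*.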
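To make the quadratic approximation (1) concrete, the following minimal sketch checks it numerically in a discretized setting, where the double integral over *G* reduces to a matrix product. The two-parameter function `misfit`, its optimum `M_OPT` and the matrix `H` are hypothetical illustrations chosen for this sketch; they are not taken from the inversion discussed in this paper.

```python
# Toy check of the quadratic approximation of a misfit around its optimum.
# ASSUMPTION: misfit, M_OPT and H are hypothetical, chosen for illustration only.


def misfit(m):
    """Mildly non-quadratic toy misfit with its minimum at M_OPT."""
    a, b = m[0] - 1.0, m[1] + 0.5
    return a * a + 2.0 * b * b + a * b + 0.1 * a ** 4


M_OPT = [1.0, -0.5]            # optimal model, where the gradient vanishes
H = [[2.0, 1.0], [1.0, 4.0]]   # analytic Hessian of misfit at M_OPT


def quadratic_form(hessian, dm):
    """Discrete analogue of the double integral: dm^T H dm."""
    n = len(dm)
    return sum(dm[i] * hessian[i][j] * dm[j] for i in range(n) for j in range(n))


def quadratic_misfit(dm):
    """Second-order approximation chi(M_OPT) + 0.5 * dm^T H dm."""
    return misfit(M_OPT) + 0.5 * quadratic_form(H, dm)


dm = [0.01, -0.02]
exact = misfit([M_OPT[0] + dm[0], M_OPT[1] + dm[1]])
approx = quadratic_misfit(dm)
error = abs(exact - approx)    # only the quartic term of the toy misfit remains
```

For this toy misfit the discrepancy stems entirely from the quartic term, so it decays rapidly as the perturbation shrinks. In a realistic inversion the range of validity of the approximation is not known in advance.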

### 1.2 The role of the Hessian in resolution analysis

The importance of the Hessian in local resolution analysis arises directly from the second-order approximation (1) but also from its relations to the posterior covariance, extremal bounds analysis and point-spread functions (PSFs).

#### 1.2.1 Inferences on resolution and trade-offs from the local approximation

Locally, that is, in the vicinity of the optimum $\tilde{\mathbf{m}}$, the Hessian describes the geometry of χ in terms of its curvature or convexity. In this sense, **H** provides the most direct measure of resolution and trade-offs as it describes the change of the misfit when $\tilde{\mathbf{m}}$ is slightly perturbed to $\tilde{\mathbf{m}} + \delta\mathbf{m}$. The diagonal element *H*_{ii}(**x**, **x**) defines the local resolution of the model parameter *m*_{i} at position **x**. The off-diagonal elements measure the trade-offs between *m*_{i} and model parameters *m*_{j}|_{j≠i} at position **x**, that is, the extent to which the model parameters are dependent. Similarly, the off-diagonal elements encapsulate spatial dependencies between model parameters *m*_{i} and *m*_{j} at different positions **x** and **y**. Large off-diagonal elements imply that simultaneous perturbations of different parameters or in different regions can compensate each other, to leave the misfit χ nearly unchanged.
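The compensation effect can be illustrated with a minimal sketch for two parameters at a single position. The matrix `H` below is a hypothetical discretized Hessian with deliberately large off-diagonal elements, and `misfit_change` evaluates the second-order misfit increase for a given perturbation.

```python
# Trade-offs encoded in the off-diagonal elements of a discretized Hessian.
# ASSUMPTION: H is a hypothetical 2x2 example (one position, two parameters).

H = [[1.0, 0.9], [0.9, 1.0]]


def misfit_change(hessian, dm):
    """Second-order misfit increase 0.5 * dm^T H dm for a perturbation dm."""
    n = len(dm)
    return 0.5 * sum(dm[i] * hessian[i][j] * dm[j] for i in range(n) for j in range(n))


single = misfit_change(H, [1.0, 0.0])         # perturb the first parameter only
same_sign = misfit_change(H, [1.0, 1.0])      # perturb both in the same direction
compensating = misfit_change(H, [1.0, -1.0])  # perturb both in opposite directions
```

With positive off-diagonal elements, as assumed here, perturbations of opposite sign nearly cancel in the misfit (about 0.1 compared with 0.5 for the single-parameter perturbation), which is the signature of a strong trade-off.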

#### 1.2.2 Relation of the Hessian to the posterior covariance

A further interpretation of the Hessian is related to Bayesian inference (e.g. Jaynes 2003; Tarantola 2005) where the available information on a model **m** is expressed in terms of a probability density $\sigma(\mathbf{m})$. In the specific case of a linear forward problem and Gaussian distributions describing prior knowledge and measurement errors, $\sigma(\mathbf{m})$ takes the form

$$\sigma(\mathbf{m}) = \mathrm{const} \cdot \exp[-\chi(\mathbf{m})], \tag{4}$$

with the misfit functional

$$\chi(\mathbf{m}) = \frac{1}{2} (\mathbf{m} - \tilde{\mathbf{m}})^T\, \mathbf{S}^{-1}\, (\mathbf{m} - \tilde{\mathbf{m}}) \tag{5}$$

and the posterior covariance **S**. The comparison between (5) and the quadratic approximation (1) suggests the interpretation of **H** in terms of the inverse posterior covariance in a local probabilistic sense.
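The link between the Hessian and the posterior covariance can be sketched for a small linear Gaussian problem, where the Hessian of the least-squares misfit is the standard expression G^T C_d^{-1} G + C_m^{-1} (Tarantola 2005). The forward operator `G` and the standard deviations `SIGMA_D` and `SIGMA_M` are hypothetical values chosen for illustration, with uncorrelated data errors and prior.

```python
# Hessian and posterior covariance for a linear forward problem with
# Gaussian data errors and Gaussian prior.
# ASSUMPTION: G, SIGMA_D and SIGMA_M are hypothetical illustration values.

G = [[1.0, 1.0]]   # a single datum that measures the sum m1 + m2
SIGMA_D = 0.1      # standard deviation of the data error
SIGMA_M = 1.0      # prior standard deviation of each model parameter


def hessian_linear_gaussian(g, sigma_d, sigma_m):
    """H = G^T C_d^{-1} G + C_m^{-1} with C_d and C_m proportional to identity."""
    n = len(g[0])
    h = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            h[i][j] = sum(row[i] * row[j] for row in g) / sigma_d ** 2
        h[i][i] += 1.0 / sigma_m ** 2
    return h


def inverse_2x2(a):
    """Inverse of a 2x2 matrix via its determinant."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [[a[1][1] / det, -a[0][1] / det], [-a[1][0] / det, a[0][0] / det]]


H = hessian_linear_gaussian(G, SIGMA_D, SIGMA_M)
S = inverse_2x2(H)  # posterior covariance

var_sum = S[0][0] + S[1][1] + 2.0 * S[0][1]   # posterior variance of m1 + m2
var_diff = S[0][0] + S[1][1] - 2.0 * S[0][1]  # posterior variance of m1 - m2
```

In this example the datum constrains only the sum of the two parameters. The posterior variance of the sum is therefore small, the variance of the difference remains at its prior value, and the negative covariance `S[0][1]` expresses the trade-off between the two parameters.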

#### 1.2.3 Extremal bounds analysis

In addition to being the carrier of covariance information, the Hessian **H** provides the extremal bounds within which the optimal model $\tilde{\mathbf{m}}$ can be perturbed without increasing the misfit beyond a pre-defined limit $\chi(\tilde{\mathbf{m}}) + \delta\chi$, where δχ is usually related to the noise in the data (Meju & Sakkas 2007; Meju 2009). The model **m**^{extr} that extremizes the integral of the model parameter *m*_{i} over a specific region *G*_{δ} ⊂ *G*, that is, $\int_{G_\delta} m_i(\mathbf{x})\, d^3\mathbf{x}$, while increasing the misfit to $\chi(\tilde{\mathbf{m}}) + \delta\chi$, is given by (Fichtner 2010)

$$m_j^{\mathrm{extr}}(\mathbf{x}) = \tilde{m}_j(\mathbf{x}) \pm \sqrt{\frac{2\, \delta\chi}{\int_{G_\delta} \int_{G_\delta} H^{-1}_{ii}(\mathbf{y}, \mathbf{z})\, d^3\mathbf{y}\, d^3\mathbf{z}}}\; \int_{G_\delta} H^{-1}_{ji}(\mathbf{x}, \mathbf{y})\, d^3\mathbf{y}, \tag{6}$$

with *j* = 1, ..., *N* and no summation over the repeated indices *ii*. Eq. (6) involves the inverse Hessian **H**^{−1}, interpreted already in terms of the local posterior covariance. Extremal bounds analysis therefore attaches a deterministic and quantitative meaning to an originally probabilistic concept. Large variances *H*^{−1}_{ii} imply that large perturbations of parameter *m*_{i} within the region *G*_{δ} do not increase the misfit beyond the admissible bound δχ, meaning that *m*_{i} is poorly constrained inside *G*_{δ}. The presence of non-zero covariances *H*^{−1}_{ij}|_{j≠i} indicates that joint perturbations of all model parameters compensate each other, to allow for even larger perturbations of the parameter *m*_{i} that we wish to extremize in the above-mentioned sense.

#### 1.2.4 Point-spread functions

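Finally, we can relate the Hessian to PSFs or spike tests that are commonly used as a diagnostic tool for resolution and trade-offs in linearized tomographic problems (e.g. Spakman 1991; Zielhuis & Nolet 1994; Yu *et al.* 2002; Fang *et al.* 2010). For this, we consider a special type of synthetic inversion where the initial model **m**^{(0)}(**y**) is nearly equal to the optimal model $\tilde{\mathbf{m}}(\mathbf{y})$. The only deficiency of **m**^{(0)}(**y**) is the absence of a perturbation of parameter *m*_{i} that is point-localized at position **x**, that is,

$$\mathbf{m}^{(0)}(\mathbf{y}) = \tilde{\mathbf{m}}(\mathbf{y}) - \varepsilon\, \mathbf{e}_i\, \delta(\mathbf{y} - \mathbf{x}), \tag{7}$$

where ε is the amplitude of the perturbation and **e**_{i} is the unit vector in the direction of parameter *m*_{i}. Using the quadratic approximation (1) and the definition of the Fréchet derivative (3), we find that the *j*-component of the Fréchet derivative ∇_{m}χ evaluated at the initial model **m**^{(0)}, that is, $\nabla_{m_j} \chi(\mathbf{m}^{(0)})$, is given by

$$\nabla_{m_j} \chi(\mathbf{m}^{(0)})(\mathbf{y}) = \int_G H_{jk}(\mathbf{y}, \mathbf{z})\, [m_k^{(0)}(\mathbf{z}) - \tilde{m}_k(\mathbf{z})]\, d^3\mathbf{z} = -\varepsilon\, H_{ji}(\mathbf{y}, \mathbf{x}) = -\varepsilon\, H_{ij}(\mathbf{x}, \mathbf{y}), \tag{8}$$

where the summation over repeated indices is implicitly assumed from hereon, and the last equality follows from the symmetry of **H**. The first iteration of a gradient-based optimization scheme would then update **m**^{(0)} to an improved model **m**^{(1)}, the components of which are given by

$$m_j^{(1)}(\mathbf{y}) = m_j^{(0)}(\mathbf{y}) + \gamma\, \varepsilon\, H_{ij}(\mathbf{x}, \mathbf{y}). \tag{9}$$

The scalar γ is the step length that minimizes χ along the local direction of steepest descent, $-\nabla_m \chi(\mathbf{m}^{(0)})$. Eq. (9) reveals that the Hessian **H**(**x**, **y**) represents our blurred perception of a point-localized perturbation at position **x** in a linearized tomographic inversion. The effect of the off-diagonal elements *H*_{ij}|_{i≠j} is to introduce unwanted updates of model parameters *m*_{j}|_{i≠j} that have initially not been perturbed.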
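The linearized spike test can be mimicked in a discretized setting, as in the following minimal sketch. It assumes a purely quadratic misfit on a hypothetical 1-D grid with five cells and one parameter per cell, so that the Dirac delta is replaced by a single-cell perturbation. The matrix `H`, the amplitude `EPS` and the index `SPIKE` are illustration values and not part of this study.

```python
# One steepest-descent iteration of a linearized spike test on a quadratic misfit.
# ASSUMPTION: H, EPS and SPIKE are hypothetical; the grid is 1-D with 5 cells.

H = [
    [2.0, 1.0, 0.2, 0.0, 0.0],
    [1.0, 2.0, 1.0, 0.2, 0.0],
    [0.2, 1.0, 2.0, 1.0, 0.2],
    [0.0, 0.2, 1.0, 2.0, 1.0],
    [0.0, 0.0, 0.2, 1.0, 2.0],
]
M_OPT = [0.0] * 5   # optimal model
EPS = 0.1           # amplitude of the missing spike
SPIKE = 2           # index of the cell that carries the spike


def matvec(a, v):
    """Matrix-vector product for nested lists."""
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def dot(u, v):
    """Euclidean inner product."""
    return sum(ui * vi for ui, vi in zip(u, v))


# Initial model: the optimum without the spike.
m0 = list(M_OPT)
m0[SPIKE] -= EPS

# Gradient of the quadratic misfit at m0: g = H (m0 - M_OPT).
g = matvec(H, [a - b for a, b in zip(m0, M_OPT)])

# Exact line-search step length along -g for a quadratic misfit.
gamma = dot(g, g) / dot(g, matvec(H, g))

# First update and its normalized version, the discrete point-spread function.
m1 = [m - gamma * gi for m, gi in zip(m0, g)]
psf = [(b - a) / (gamma * EPS) for a, b in zip(m0, m1)]
```

The normalized update `psf` reproduces the column of `H` that belongs to the perturbed cell, so the spike is recovered only partially and is smeared into the neighbouring cells.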

In the restricted sense of eq. (9), the Hessian *H*_{ij}(**x**, **y**) is the PSF, that is, the response of model parameter *m*_{j} to a linearized spike test with a point perturbation of *m*_{i} at position **x**. A fully non-linear spike test based on gradient optimization with multiple iterations will generally lead to a sharper reconstruction of the input spike, so that **H**(**x**, **y**) can be considered a conservative estimate of the non-linear PSF. Throughout the following developments, we use the term PSF in the linearized sense as a synonym for the Hessian because it offers an intuitive interpretation of **H**(**x**, **y**).

Although the significance of the Hessian in local resolution analysis is evident, the efficient computation of **H** in time-domain modelling of seismic wave propagation remains challenging. The most efficient approach involves a modification of the well-known adjoint method (e.g. Tarantola 1988; Tromp *et al.* 2005; Fichtner *et al.* 2006; Liu & Tromp 2006; Plessix 2006; Chen 2011) that allows us to compute **H** applied to a model perturbation $\delta\mathbf{m}$, that is,

$$(\mathbf{H}\, \delta\mathbf{m})(\mathbf{x}) = \int_G \mathbf{H}(\mathbf{x}, \mathbf{y})\, \delta\mathbf{m}(\mathbf{y})\, d^3\mathbf{y}, \tag{10}$$

using two forward and two adjoint simulations, as described in Santosa & Symes (1988), Fichtner & Trampert (2011) and Appendix A.

A model perturbation $\delta\mathbf{m}$ samples the Hessian via the integral (10). By sampling **H** with a suitable set of model perturbations, we can gather as much second-derivative information as needed for our purposes, though at the expense of potentially prohibitive computational requirements. The purpose of this paper is therefore to develop a sampling strategy for the Hessian that operates with as few model perturbations as possible while leading to an approximation of **H** that is physically meaningful and interpretable.
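The idea of sampling the Hessian through its action on model perturbations can be sketched as follows. Note that this sketch swaps in a forward finite difference of gradients for the adjoint-based computation described above, and that the two-parameter `gradient` function is a hypothetical toy example rather than a wave-equation gradient.

```python
# Sampling a Hessian through its action on model perturbations.
# ASSUMPTION: the toy gradient and the finite-difference approximation are
# illustrative stand-ins for the adjoint-based computation of H applied to dm.

M_OPT = [1.0, -0.5]   # optimum of the hypothetical toy misfit


def gradient(m):
    """Analytic gradient of the toy misfit a^2 + 2 b^2 + a b + 0.1 a^4."""
    a, b = m[0] - 1.0, m[1] + 0.5
    return [2.0 * a + b + 0.4 * a ** 3, a + 4.0 * b]


def hessian_vector_product(m, dm, h=1e-4):
    """Approximate H applied to dm from two gradient evaluations."""
    g0 = gradient(m)
    g1 = gradient([mi + h * di for mi, di in zip(m, dm)])
    return [(b - a) / h for a, b in zip(g0, g1)]


def sample_hessian(m, n):
    """Rebuild H column by column by probing with unit perturbations."""
    columns = [
        hessian_vector_product(m, [1.0 if k == i else 0.0 for k in range(n)])
        for i in range(n)
    ]
    return [[columns[j][i] for j in range(n)] for i in range(n)]


H_SAMPLED = sample_hessian(M_OPT, 2)
```

Recovering the full matrix in this way requires one Hessian application per model parameter, which is affordable for two parameters but not for the model spaces of realistic inversions. This is the cost that a sampling strategy with few perturbations is meant to avoid.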

### 1.3 Outlook

This paper is organized as follows: We start in Section 2 with a brief description of a full waveform inversion for upper-mantle structure beneath Europe that will serve as both motivation and testing ground for the subsequent developments. In Section 3, we approximate the Hessian by a position-dependent Gaussian, the parameters of which can be computed efficiently via the Fourier transform of **H**(**x**, **y**) for a small set of wavenumber vectors. The Gaussian approximation can be generalized with the help of Gram–Charlier expansions that express **H**(**x**, **y**) in terms of a parent function and its successive derivatives. Following the theoretical developments, we demonstrate in Section 4 how the Gaussian approximation of the Hessian can be used to infer the image distortion introduced by the tomographic method, as well as the distribution of direction-dependent resolution lengths. Section 5 provides an intuitive interpretation of the physics behind the Fourier transformed Hessian. This interpretation partly motivates several improvements to full waveform inversion techniques proposed in Section 6. These include a new family of Newton-like methods, a pre-conditioner for gradient methods, an approach to adaptive parametrization independent from ray theory and a criterion for the design of misfit functionals aiming at maximum resolution. Finally, in Appendices A and B, we review the computation of Hessian kernels and multidimensional Gram–Charlier expansions.