We propose a new method for the quantitative resolution analysis in full seismic waveform inversion that overcomes the limitations of classical synthetic inversions while being computationally more efficient and applicable to any misfit measure. The method rests on (1) the local quadratic approximation of the misfit functional in the vicinity of an optimal earth model, (2) the parametrization of the Hessian in terms of a parent function and its successive derivatives and (3) the computation of the space-dependent parameters via Fourier transforms of the Hessian, calculated with the help of adjoint techniques. In the simplest case of a Gaussian approximation, we can infer rigorously defined 3-D distributions of direction-dependent resolution lengths and the image distortion introduced by the tomographic method. We illustrate these concepts with a realistic full waveform inversion for upper-mantle structure beneath Europe. As a corollary to the method for resolution analysis, we propose several improvements to full waveform inversion techniques. These include a pre-conditioner for optimization schemes of the conjugate-gradient type, a new family of Newton-like methods, an approach to adaptive parametrization independent from ray theory and a strategy for objective functional design that aims at maximizing resolution. The computational requirements of our approach are lower than those of a typical synthetic inversion, while yielding a much more complete picture of resolution and trade-offs. While the examples presented in this paper are rather specific, the underlying idea is very general. It allows for problem-dependent variations of the theme and for adaptations to exploration scenarios and other wave-equation-based tomography techniques that employ, for instance, georadar or microwave data.
Full waveform inversion is a tomographic technique that is based on numerical wave propagation through complex media combined with adjoint or scattering integral methods for the computation of Fréchet kernels. The accurate and complete solution of the seismic wave equation ensures that information from the full seismogram can be used for the purpose of improved tomographic models. Although the method was originally conceived in the late 1970s and early 1980s (Bamberger et al. 1977, 1982; Tarantola 1984), realistic applications have become feasible only recently. Full waveform inversion can now be used to solve local-scale engineering and exploration problems (e.g. Smithyman et al. 2009; Takam Takougang & Calvert 2011), to study crustal-scale deformation processes (e.g. Bleibinhaus et al. 2007; Chen et al. 2007; Tape et al. 2009), to reveal the detailed structure of the lower mantle (e.g. Konishi et al. 2009; Kawai & Geller 2010) or to refine continental-scale models for tectonic interpretations and improved tsunami warnings (e.g. Fichtner et al. 2010; Hingee et al. 2011). While the tomographic method itself has advanced substantially, an essential aspect of the inverse problem has been ignored almost completely, despite its obvious socio-economic relevance: the quantification of resolution and uncertainties.
1.1 Resolution analysis in full waveform inversion
Early attempts to analyse—and in fact define—resolution were founded on the equivalence of diffraction tomography and the first iteration of a full waveform inversion (e.g. Devaney 1984; Wu & Toksöz 1987; Mora 1989). This equivalence, however, holds only in the impractical case where the misfit χ is equal to the L2 waveform difference. Furthermore, the rigorous analysis of diffraction tomography is restricted to homogeneous or layered acoustic media. The resulting resolution estimates are too optimistic for realistic applications that suffer from modelling errors and the sparsity of noisy data (Bleibinhaus et al. 2009).
Resolution analysis in full waveform inversion is complicated by many factors: (1) The data depend non-linearly on the model, meaning that the well-established machinery of linear inverse theory is not applicable (Backus & Gilbert 1967; Tarantola 2005). (2) A direct consequence of non-linearity is the appearance of multiple local minima (e.g. Gauthier et al. 1986). These may be avoided with the help of various multiscale approaches (e.g. Bunks et al. 1995; Sirgue & Pratt 2004; Ravaut et al. 2004; Fichtner et al. 2009), also known as frequency-hopping in the microwave imaging literature (e.g. Chew & Lin 1995). The convergence of all currently used multiscale approaches is, however, purely empirical. (3) Contrary to most linearized tomographies, the sensitivity matrix is not computed explicitly in full waveform inversion for reasons of numerical efficiency. This prevents a local analysis based, for instance, on the computation of the resolution and covariance operators for large linear systems (Nolet et al. 1999; Boschi 2003). (4) The size of the model space and the costs of the forward problem solution prohibit the application of probabilistic approaches that account for non-linearity using Monte Carlo sampling (Sambridge & Mosegaard 2002; Tarantola 2005) or neural networks (Devilee et al. 1999; Meier et al. 2007a,b).
In the absence of a quantitative means to assess resolution, arguments concerning the reliability of full waveform inversion images rest mostly on synthetic inversions for specific input structures, on the visual inspection of the tomographic images or on the analysis of the data fit. Synthetic inversions are known to be potentially misleading even in linearized tomographies (Lévêque et al. 1993). Visual inspection is equally inadequate because the appearance of small-scale heterogeneities is too easily mistaken for an indicator of high resolution. Finally, a good fit between observed and synthetic waveforms merely proves that the tomographic system has been solved, but not necessarily resolved.
Despite being crucial for the interpretation of the tomographic images, methods for the quantification of resolution in realistic applications of full waveform inversion do not exist so far. This deficiency is the source of much scepticism as to whether full waveform inversion is really worth the effort.
Despite the difficulties introduced by the non-linearity of full waveform inversion combined with the computational costs of the forward problem solution, ample information about local resolution can be inferred from the quadratic approximation of the misfit functional χ,

χ(m̃ + δm) ≈ χ(m̃) + (1/2) ∫G ∫G δmT(x) H(x, y) δm(y) d³x d³y, (1)

where

m(x) = [m1(x), m2(x), …, mN(x)]T (2)

represents an earth model composed of N physical quantities. The components of m may, for instance, be the P velocity α, the S velocity β and density ρ, that is, (m1, m2, m3)T = (α, β, ρ)T. The optimal earth model m̃ is characterized by a zero Fréchet derivative, meaning that

∇mχ|m̃ (δm) = ∫G ∇mχ(x) · δm(x) d³x = 0 (3)

for all model perturbations δm. The Hessian H is a symmetric and bilinear operator that acts on the perturbations via a double integral over the model volume G.
1.2 The role of the Hessian in resolution analysis
The importance of the Hessian in local resolution analysis arises directly from the second-order approximation (1) but also from its relations to the posterior covariance, extremal bounds analysis and point-spread functions (PSFs).
1.2.1 Inferences on resolution and trade-offs from the local approximation
Locally, that is, in the vicinity of the optimum m̃, the Hessian describes the geometry of χ in terms of its curvature or convexity. In this sense, H provides the most direct measure of resolution and trade-offs as it describes the change of the misfit when m̃ is slightly perturbed to m̃ + δm. The diagonal element Hii(x, x) defines the local resolution of the model parameter mi at position x. The off-diagonal elements Hij(x, x)|j≠i measure the trade-offs between mi and the model parameters mj|j≠i at position x, that is, the extent to which the model parameters are dependent. Similarly, the off-diagonal elements Hij(x, y) encapsulate spatial dependencies between model parameters mi and mj at different positions x and y. Large off-diagonal elements imply that simultaneous perturbations of different parameters or in different regions can compensate each other, to leave the misfit χ nearly unchanged.
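The compensation effect described above can be made concrete with a small numerical sketch (Python; the discretized two-parameter Hessian below is purely illustrative):

```python
import numpy as np

# Toy discretized misfit: chi(m_opt + dm) - chi(m_opt) = 0.5 * dm^T H dm, with
# two model parameters and a strong negative off-diagonal element, i.e. a
# strong trade-off (all numbers are illustrative).
H = np.array([[4.0, -3.9],
              [-3.9, 4.0]])

def dchi(dm):
    """Misfit increase caused by the model perturbation dm."""
    return 0.5 * dm @ H @ dm

# Perturbing one parameter alone increases the misfit noticeably:
print(dchi(np.array([1.0, 0.0])))   # 2.0
# A joint perturbation along the trade-off direction barely changes chi:
print(dchi(np.array([1.0, 1.0])))   # ~0.1 -- the perturbations compensate
```

The near-cancellation in the second case is exactly the trade-off encoded by the large off-diagonal element.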
1.2.2 Relation of the Hessian to the posterior covariance
A further interpretation of the Hessian is related to Bayesian inference (e.g. Jaynes 2003; Tarantola 2005) where the available information on a model m is expressed in terms of a probability density σ(m). In the specific case of a linear forward problem and Gaussian distributions describing prior knowledge and measurement errors, σ(m) takes the form

σ(m) ∝ e^(−χ(m)), (4)

with the misfit functional

χ(m) = (1/2) ∫G ∫G [m(x) − m̃(x)]T S−1(x, y) [m(y) − m̃(y)] d³x d³y, (5)

and the posterior covariance S. The comparison between (5) and the quadratic approximation (1) suggests the interpretation of H in terms of the inverse posterior covariance in a local probabilistic sense.
1.2.3 Extremal bounds analysis
In addition to being the carrier of covariance information, the Hessian H provides the extremal bounds within which the optimal model m̃ can be perturbed without increasing the misfit beyond a pre-defined limit χ(m̃) + δχ, where δχ is usually related to the noise in the data (Meju & Sakkas 2007; Meju 2009). The model mextr that extremizes the integral of the model parameter mi over a specific region Gδ ⊂ G, that is, ∫Gδ mi(x) d³x, while increasing the misfit to χ(m̃) + δχ, is given by (Fichtner 2010)

mextr(y) = m̃(y) ± [2δχ / ∫Gδ ∫Gδ H−1ii(x, x′) d³x d³x′]^(1/2) ∫Gδ H−1(y, x″) ei d³x″, (6)

where ei is the unit vector in the direction of parameter mi, and no summation over the repeated indices ii is implied. Eq. (6) involves the inverse Hessian H−1, interpreted already in terms of the local posterior covariance. Extremal bounds analysis therefore attaches a deterministic and quantitative meaning to an originally probabilistic concept. Large variances H−1ii imply that large perturbations of parameter mi within the region Gδ do not increase the misfit beyond the admissible bound δχ, meaning that mi is poorly constrained inside Gδ. The presence of non-zero covariances H−1ij|j≠i indicates that joint perturbations of all model parameters compensate each other, to allow for even larger perturbations of the parameter mi that we wish to extremize in the above-mentioned sense.
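In a discretized setting, the extremal perturbation follows from a standard constrained-optimization argument: extremizing aTδm subject to (1/2)δmTHδm = δχ gives δm = ±λH−1a with λ = [2δχ/(aTH−1a)]^(1/2). A minimal sketch (Python; the numbers are illustrative, and the selection vector a stands in for the region Gδ):

```python
import numpy as np

# Discrete sketch of extremal bounds analysis: find the perturbation dm that
# extremizes a^T dm (parameter m_i averaged over a region, encoded by the
# selection vector a) subject to 0.5 * dm^T H dm = dchi.
# Lagrange stationarity gives dm = lam * H^{-1} a with
# lam = sqrt(2*dchi / (a^T H^{-1} a)).  All numbers are illustrative.
H = np.array([[4.0, 1.0],
              [1.0, 3.0]])
a = np.array([1.0, 0.0])   # extremize parameter 1 only
dchi = 0.5                 # admissible misfit increase

Hinv_a = np.linalg.solve(H, a)
lam = np.sqrt(2.0 * dchi / (a @ Hinv_a))
dm_extr = lam * Hinv_a

# The extremal perturbation indeed attains the admissible misfit increase:
print(0.5 * dm_extr @ H @ dm_extr)   # 0.5
# Extremal bound on a^T dm, equal to sqrt(2 * dchi * a^T H^{-1} a):
print(a @ dm_extr)
```

Note how a larger variance aTH−1a directly enlarges the attainable bound, mirroring the interpretation of H−1 as a local posterior covariance.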
1.2.4 Point-spread functions
Finally, we can relate the Hessian to PSFs or spike tests that are commonly used as a diagnostic tool for resolution and trade-offs in linearized tomographic problems (e.g. Spakman 1991; Zielhuis & Nolet 1994; Yu et al. 2002; Fang et al. 2010). For this, we consider a special type of synthetic inversion where the initial model m(0)(y) is nearly equal to the optimal model m̃(y). The only deficiency of m(0)(y) is the absence of a perturbation of parameter mi that is point-localized at position x, that is,

m(0)j(y) = m̃j(y) − δij δ(y − x). (7)
Using the quadratic approximation (1) and the definition of the Fréchet derivative (3), we find that the j-component of the Fréchet derivative ∇mχ evaluated at the initial model m(0), that is, [∇mχ|m(0)]j, is given by

[∇mχ|m(0)]j(y) = −Hij(x, y), (8)

where the summation over repeated indices is implicitly assumed from here on.
where the summation over repeated indices is implicitly assumed from hereon. The first iteration of a gradient-based optimization scheme would then update m(0)(x) to an improved model , the components of which are given by
The scalar γ is the step length that minimizes χ along the local direction of steepest descent, −∇mχ|m(0). Eq. (9) reveals that the Hessian H(x, y) represents our blurred perception of a point-localized perturbation at position x in a linearized tomographic inversion. The effect of the off-diagonal elements Hij|i≠j is to introduce unwanted updates of model parameters mj|j≠i that have initially not been perturbed.
In the restricted sense of eq. (9), the Hessian Hij(x, y) is the PSF, that is, the response of model parameter mj to a linearized spike test with a point perturbation of mi at position x. A fully non-linear spike test based on gradient optimization with multiple iterations will generally lead to a sharper reconstruction of the input spike, so that H(x, y) can be considered a conservative estimate of the non-linear PSF. Throughout the following developments, we use the term PSF in the linearized sense as a synonym for the Hessian because it offers an intuitive interpretation of H(x, y). Although the significance of the Hessian in local resolution analysis is evident, the efficient computation of H in time-domain modelling of seismic wave propagation remains challenging. The most efficient approach involves a modification of the well-known adjoint method (e.g. Tarantola 1988; Tromp et al. 2005; Fichtner et al. 2006; Liu & Tromp 2006; Plessix 2006; Chen 2011) that allows us to compute H applied to a model perturbation δm, that is,

(H δm)i(x) = ∫G Hij(x, y) δmj(y) d³y. (10)
A model perturbation δm samples the Hessian via the integral (10); by sampling H with a suitable set of model perturbations, we can gather as much second-derivative information as needed for our purposes, though at the expense of potentially prohibitive computational requirements. It is therefore the purpose of this paper to develop a sampling strategy for the Hessian that operates with as few model perturbations as possible while leading to an approximation of H that is physically meaningful and interpretable.
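The sampling idea can be sketched in a matrix-free form, where a finite difference of gradients stands in for the adjoint-based computation of H δm (Python; the toy quadratic misfit is purely illustrative):

```python
import numpy as np

# Matrix-free sketch: in full waveform inversion, H is never formed explicitly;
# one can only evaluate H applied to a chosen perturbation dm.  Here a finite
# difference of gradients stands in for the second-order adjoint computation
# (toy quadratic misfit chi(m) = 0.5 * m^T A m; all values illustrative).
A = np.array([[2.0, 0.5, 0.0],
              [0.5, 1.5, 0.3],
              [0.0, 0.3, 1.0]])

def grad_chi(m):
    """Gradient of the toy misfit chi(m) = 0.5 * m^T A m."""
    return A @ m

def hessian_vector_product(m, dm, h=1e-6):
    """Sample the Hessian with dm: H dm ~ (g(m + h*dm) - g(m)) / h."""
    return (grad_chi(m + h * dm) - grad_chi(m)) / h

m_opt = np.zeros(3)
dm = np.array([1.0, 0.0, 0.0])   # 'point perturbation' of parameter 1
column = hessian_vector_product(m_opt, dm)
print(column)                     # ~ first column of A, i.e. one PSF sample
```

Each Hessian-vector product here costs one extra gradient evaluation, mimicking the fixed cost per perturbation of the adjoint scheme.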
This paper is organized as follows: We start in Section 2 with a brief description of a full waveform inversion for upper-mantle structure beneath Europe that will serve as both motivation and testing ground for the subsequent developments. In Section 3, we approximate the Hessian by a position-dependent Gaussian, the parameters of which can be computed efficiently via the Fourier transform of H(x, y) for a small set of wavenumber vectors. The Gaussian approximation can be generalized with the help of Gram–Charlier expansions that express H(x, y) in terms of a parent function and its successive derivatives. Following the theoretical developments, we demonstrate in Section 4 how the Gaussian approximation of the Hessian can be used to infer the image distortion introduced by the tomographic method, as well as the distribution of direction-dependent resolution lengths. Section 5 provides an intuitive interpretation of the physics behind the Fourier transformed Hessian. This interpretation partly motivates several improvements to full waveform inversion techniques proposed in Section 6. These include a new family of Newton-like methods, a pre-conditioner for gradient methods, an approach to adaptive parametrization independent from ray theory and a criterion for the design of misfit functionals aiming at maximum resolution. Finally, in Appendices A and B, we review the computation of Hessian kernels and multidimensional Gram–Charlier expansions.
2 THE TESTING GROUND: A REALISTIC FULL WAVEFORM INVERSION
To illustrate the theoretical developments of the following sections with practical applications, we consider a long-wavelength full waveform inversion for European upper-mantle structure (Fig. 1). While this example is very specific, all subsequent developments are valid also for smaller scale applications, for instance, in the context of active source exploration and engineering problems.
As data we used 2200 mostly vertical-component seismograms from 40 events that occurred in Europe and western Asia between 2002 and 2010. To quantify the discrepancies between observed and synthetic waveforms, we measured time- and frequency-dependent phase misfits within a period band from 100 to 300 s (Fichtner et al. 2009). In addition to surface waves, we also included long-period body waves and unidentified phases. The numerical modelling was based on a spectral-element discretization of the seismic wave equation, as described in Fichtner & Igel (2008).
To ensure the rapid convergence of the iterative misfit minimization, we implemented a 3-D initial model. The initial crustal structure is a long-wavelength equivalent (Fichtner & Igel 2008) of the maximum likelihood model of Meier et al. (2007a,b), who inverted surface wave dispersion for crustal thickness and the isotropic S velocity β. Within the continental crust, the isotropic P wave speed α and density ρ are scaled to β as α = 1.5399β + 840 m s−1 and ρ = 0.2277β + 2016 kg m−3 (Meier et al. 2007a). Velocities and density within the oceanic crust are fixed to the values of CRUST 2.0 (Bassin et al. 2000). As 3-D initial mantle structure, we use the isotropic S velocity variations of model S20RTS (Ritsema et al. 1999) added to an isotropic version of the 1-D model PREM (Dziewonski & Anderson 1981) where the 220 km discontinuity has been replaced by a linear gradient. The scaling to α is depth-dependent, as inferred from P, PP, PPP and PKP traveltimes (Ritsema & van Heijst 2002). The initial density structure in the mantle is radially symmetric.
During the inversion, we successively update β and α, as well as ρ. After seven conjugate-gradient iterations, the total misfit reduction reached nearly 50 per cent, whereas the seventh iteration alone contributed less than 2 per cent. This indicates that the final model, shown in Fig. 1(b), is indeed close to the optimum. Despite being computed from long-period data, the model clearly reveals prominent features previously imaged in both body and surface wave tomographic studies (e.g. Spakman 1991; Zielhuis & Nolet 1994; Boschi et al. 2004; Peter et al. 2008; Schivardi & Morelli 2011; Schäfer et al. 2011). These include the low velocities beneath the Iceland hotspot and the western Mediterranean Basin, the slow-to-fast transition from Phanerozoic central Europe to the Precambrian east European platform, and the high velocities beneath the Hellenic arc.
Having reached a nearly optimal model, we can compute the PSF, H(x, y), in the sense of Section 1.2.4. The result for a point-localized S velocity perturbation at 180 km below northern Germany is shown in Fig. 2.
The spatial extent of the ββ-component Hββ(x, y), displayed in Fig. 2(a), describes the trade-offs between an S velocity perturbation located at position x and the S velocity in the surrounding model volume. With few exceptions, Hββ is positive, meaning that negative perturbations in β compensate positive ones, and vice versa. The black and grey-shaded volume, where trade-offs between S velocity perturbations are most significant, is roughly N-S oriented with a weak tail extending beneath Scandinavia and the North Atlantic. Interpreted in terms of a PSF, the Hessian component Hββ(x, y) is the S velocity structure recovered in a single-iteration synthetic inversion where the input was a perturbation in β localized at position x. The width of the PSF loosely defines a resolution length within which S velocity perturbations cannot be constrained independently. We will return to the resolution length concept in Section 4.
Fig. 2(b) shows the βα-component of the Hessian, Hβα(x, y), which represents the trade-offs between P and S velocity perturbations. While the horizontal slice qualitatively resembles Hββ(x, y), the vertical slices reveal that trade-offs between α and β are restricted to shallow depths of less than 50 km. This was to be expected because our data set is dominated by Rayleigh waves that are affected by P velocity structure only near the surface.
The βρ-component Hβρ(x, y), visualized in Fig. 2(c), reveals a surprising complexity that is reminiscent of the strongly oscillatory density kernels for Rayleigh waves (e.g. Takeuchi & Saito 1972; Cara et al. 1984). The absolute values of Hβρ are generally small compared to those of the other components, meaning that the trade-offs between S velocity and density are hardly significant.
In general, the details of the PSFs depend on both the data and the position of the point perturbation relative to sources and receivers. However, from a series of numerical experiments, we conclude that several characteristics of the PSFs are position-independent. These include the comparatively small amplitude of Hβρ, the restriction of Hβα to shallow depths and the roughly bell-shaped geometry of Hββ.
3 FOURIER-DOMAIN APPROXIMATION OF THE HESSIAN - I: THEORY
To fully quantify resolution and trade-offs, PSFs should ideally be computed for every model parameter and for every point x within the volume of interest. Since the resulting computational costs would clearly be prohibitive, we propose an efficient scheme for the approximation of PSFs. This is based on a parametrization of the Hessian, followed by the estimation of the space-dependent parameters with the help of Fourier transforms.
3.1 Parametrization of the PSF
As briefly discussed in Section 1.2, a series of model perturbations applied to H as in eq. (10) can be considered samplers of the Hessian. Our task is to approximate H from a small number of samples such that the result is physically meaningful and interpretable. This goal can be reached most efficiently by introducing a parametrization of the Hessian that incorporates prior knowledge on its general characteristics. Motivated, for instance, by the bell-shaped geometry of Hββ, shown in Fig. 2(a), we may parametrize the PSF for a specific point perturbation at position x by the y-dependent Gaussian

H(x, y) ≈ M(0)(x) [det C(x)]^(1/2) (2π)^(−3/2) exp{−(1/2) [y − z(x)]T C(x) [y − z(x)]}, (11)

which is a common approximation for optical systems including microscopes and telescopes (e.g. Bendinelli et al. 1987; Trujillo et al. 2001; Zhang et al. 2007; Pankajakshan et al. 2000). The symmetric matrix C(x) describes the width and orientation of the PSF centred at z(x). Ideally, the diagonal elements of C(x) should be large, to minimize the spatial extent of the PSF. The difference z(x) − x describes the distortion introduced by the tomographic inversion, which maps a point perturbation at x into a blurred heterogeneity with a centre of mass at position z(x). Finally, the zeroth moment or weight M(0)(x) is equal to the integral over the PSF for a point perturbation at location x. While being convenient, the parametrization of H in terms of Gaussians is also restrictive, and its adequacy is clearly problem-dependent. More general parametrizations may be found within the framework of Gram–Charlier expansions, discussed in Section 3.2 and in Appendix B.
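A minimal numerical sketch of this parametrization (Python; the values of M(0), z and C are hypothetical) confirms that the Gaussian integrates to its zeroth moment:

```python
import numpy as np

# Sketch of the Gaussian PSF parametrization: a point perturbation at x maps to
# a blurred heterogeneity centred at z(x), with width/orientation set by the
# symmetric positive definite matrix C(x) and total weight M0(x).
# Parameter values are hypothetical.
def gaussian_psf(y, M0, z, C):
    """Evaluate the Gaussian PSF approximation at points y (shape (n, 3))."""
    d = y - z
    norm = M0 * np.sqrt(np.linalg.det(C)) / (2.0 * np.pi) ** 1.5
    return norm * np.exp(-0.5 * np.einsum('ni,ij,nj->n', d, C, d))

M0 = 2.0
z = np.array([0.0, 0.0, 0.0])
C = np.diag([1.0, 0.5, 0.25])   # small entries: broad PSF, poor resolution

# Check on a grid that the PSF integrates to the zeroth moment M0:
ax = np.linspace(-12.0, 12.0, 81)
X, Y, Z = np.meshgrid(ax, ax, ax, indexing='ij')
pts = np.column_stack([X.ravel(), Y.ravel(), Z.ravel()])
dv = (ax[1] - ax[0]) ** 3
print(gaussian_psf(pts, M0, z, C).sum() * dv)   # ~ 2.0
```

The normalization is chosen precisely so that the weight M(0) carries the integral of the PSF, independently of C and z.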
We have, at this point, transferred the problem of computing the PSF for every point in space to the determination of the space-dependent parameters in the approximation (11). An efficient approach to the computation of M(0)(x), z(x) and C(x) rests on the simple recognition that the application of H to sinusoidal model perturbations allows us to compute the Fourier transform of the Hessian: Choosing, for instance,

δm(y) = e^(−ik·y) = cos(k·y) − i sin(k·y), (12)

with a fixed wavenumber vector k, yields the Fourier transform H̃(x, k) of the Hessian with respect to y:

H̃(x, k) = ∫G H(x, y) e^(−ik·y) d³y. (13)
As mentioned previously, the integrals over the Hessian times a model perturbation can be computed efficiently with a variant of the adjoint method that requires two forward and two adjoint simulations [see Fichtner & Trampert (2011) and Appendix A].
Equating the right-hand side of (13) with the Fourier transform of the parametrized Hessian—the Gaussian from eq. (11), in our case—yields

H̃(x, k) = M(0)(x) e^(−ik·z(x)) e^(−(1/2) kT C−1(x) k). (14)
Eq. (14) can be transformed into the linear system

k · z(x) = −arg H̃(x, k),  kT C−1(x) k = −2 ln |H̃(x, k)/M(0)(x)|. (15)
Given the zero-wavenumber component M(0), three non-zero and linearly independent wavenumber vectors are required to determine z, as well as three linear combinations of the matrix elements C−1ij. To complete the six independent elements of C−1, the real parts of H̃ββ for three additional wavenumber vectors are required. It follows that the Hessian must be applied to a total of 10 model perturbations: one for the zero-wavenumber component M(0), three plus three for the real and imaginary parts of H̃ββ required for the computation of z, and three for the real parts of H̃ββ needed to complete C−1. The resulting space-domain approximation of the PSF can then be used to infer position-dependent resolution and trade-offs, as illustrated in Section 4.
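The ten-sample recovery can be verified on a synthetic Gaussian PSF whose exact Fourier transform is known (Python; all parameter values are synthetic stand-ins):

```python
import numpy as np

# Recover the Gaussian PSF parameters (z, C^{-1}) from a handful of Fourier
# samples H~(k) = M0 * exp(-1j*k@z) * exp(-0.5*k@Cinv@k), mimicking the
# ten-perturbation strategy.  The 'true' parameters are synthetic.
M0 = 2.0
z_true = np.array([1.0, -0.5, 0.3])
Cinv_true = np.array([[2.0, 0.3, 0.1],
                      [0.3, 1.5, 0.2],
                      [0.1, 0.2, 1.0]])

def H_hat(k):
    return M0 * np.exp(-1j * k @ z_true) * np.exp(-0.5 * k @ Cinv_true @ k)

kk = 0.2  # small |k|: low-wavenumber sampling keeps the phase unwrapped
E = np.eye(3)

# Centre of mass z from the phase at three independent wavenumber vectors:
z_rec = np.array([-np.angle(H_hat(kk * E[i])) / kk for i in range(3)])

# C^{-1} from the log-amplitude: diagonal entries from e_i, off-diagonal
# entries from e_i + e_j via (ei+ej)^T Cinv (ei+ej) = Cii + 2*Cij + Cjj.
def quad(k):
    return -2.0 * np.log(np.abs(H_hat(k)) / M0) / kk**2

Cinv_rec = np.diag([quad(kk * E[i]) for i in range(3)])
for i, j in [(0, 1), (0, 2), (1, 2)]:
    Cij = 0.5 * (quad(kk * (E[i] + E[j])) - Cinv_rec[i, i] - Cinv_rec[j, j])
    Cinv_rec[i, j] = Cinv_rec[j, i] = Cij

print(np.round(z_rec, 6))      # ~ z_true
print(np.round(Cinv_rec, 6))   # ~ Cinv_true
```

The count matches the text: one zero-wavenumber sample for M0, three wavenumber vectors (real and imaginary parts) for z and the diagonal of C−1, and three more for the off-diagonal elements.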
3.2 Generalization: Gram–Charlier approximations
Although the parametrization of the PSF in terms of space-dependent Gaussians will be sufficient for many practical purposes, it remains desirable to generalize the approach from Section 3.1 such that it allows for improved approximations. Gram–Charlier expansions, described in Appendix B, offer such a generalization in a natural way. In the following, we assume that the Fourier transform of Hij(x, y) with respect to y has been computed through the application of the Hessian to sinusoidal model perturbations, as illustrated in eq. (13). Omitting the subscripts in Hij for notational convenience, the most general wavenumber-domain Gram–Charlier approximation of H̃(x, k) then takes the form

H̃(x, k) = exp{Σn≥1 (1/n!) [H(n)(x) − G(n)(x)] · (−ik)^n} G̃(x, k), (17)
where G̃(x, k) is the Fourier transform of the parent function G, and H(n)(x) and G(n)(x) are the position-dependent cumulants of H and G, also introduced in Appendix B. The first three cumulants of H(x, y) for a fixed value of x are related to the moments of H(x, y) with respect to y, as detailed in Appendix B.
Similar relations hold for the cumulants and moments of the parent function G. The first and second cumulants, H(1) and H(2), are the centre of mass and the inverse covariance of H(x, y), respectively.
To allow for a good approximation with few coefficients, the parent function is usually chosen to have the same general characteristics as the function that we wish to expand. Assuming, for instance, that H(x, y) is roughly bell-shaped, as the PSF in Fig. 2(a), we can choose the space-domain parent function to be the normalized Gaussian

G(x, y) = (2π)^(−3/2) [det C(x)]^(1/2) exp{−(1/2) [y − z(x)]T C(x) [y − z(x)]}, (19)

with the x-dependent cumulants

G(1)(x) = z(x) (20)

and

G(2)(x) = C−1(x). (21)
Since the values of the first and second cumulants are still undetermined at this point of the development, we are free to impose G(1)=H(1) and G(2)=H(2), so that (17) reduces to the classical Gram–Charlier A series (e.g. Samuelson 1943; Cramér 1957)
Eq. (17) and its specialized form (22) are general expansions of the PSF in terms of its cumulants. The Gaussian parametrization from Section 3.1 is, in this sense, a first-order or low-wavenumber approximation. Higher order terms in the Gram–Charlier series involve higher cumulants and account for the shorter wavelength variations of the PSF.
Similar to the first-order approximation of eq. (14), the coefficients of a general Gram–Charlier expansion can be computed from the Fourier transforms H̃(x, k) for sufficiently many independent wavenumber vectors k. The space-domain approximation corresponding to (22) is given by (see Appendix B)
While the expansions (17) and (23) are very general, their physical interpretation is complicated by their complexity. We will therefore adhere to the low-wavenumber Gaussian approximation from Section 3.1 throughout the following examples.
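As a one-dimensional illustration of the A series, the sketch below (Python) corrects a Gaussian parent with a third-cumulant Hermite term; the target cumulants are those of a gamma distribution with shape 4 and unit scale, and the grid-based moment checks confirm that the correction leaves weight, mean and variance untouched while matching the third cumulant:

```python
import math
import numpy as np

# One-dimensional Gram-Charlier A series: Gaussian parent with matched mean
# and variance, plus a third-cumulant correction via He3(u) = u**3 - 3*u.
mu, var, kappa3 = 4.0, 4.0, 8.0     # cumulants of gamma(shape=4, scale=1)
sig = math.sqrt(var)

x = np.linspace(-16.0, 24.0, 8001)  # mu +/- 10 standard deviations
u = (x - mu) / sig
parent = np.exp(-0.5 * u**2) / (sig * math.sqrt(2.0 * math.pi))
He3 = u**3 - 3.0 * u
gc = parent * (1.0 + kappa3 / (6.0 * sig**3) * He3)   # A series, third order

dx = x[1] - x[0]
# He3 is orthogonal to 1, u and u**2 under the Gaussian weight, so weight,
# mean and variance are preserved; the third central moment becomes kappa3:
print(round(float((gc * dx).sum()), 3))                 # 1.0
print(round(float((x * gc * dx).sum()), 3))             # 4.0
print(round(float(((x - mu)**3 * gc * dx).sum()), 3))   # 8.0
```

Each additional Hermite term adjusts exactly one further cumulant, which is the one-dimensional analogue of the higher order terms in eq. (17).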
4 FOURIER-DOMAIN APPROXIMATION OF THE HESSIAN - II: APPLICATION
To make the previous developments explicit, we return to the full waveform inversion example introduced in Section 2. As a preparatory step, we subdivide the study region into blocks of nearly equal volume, shown in Fig. 3. These introduce a coordinate system that is approximately Cartesian over the expected width of the PSFs. The blocks, also used to parametrize the tomographic model from Fig. 1, are 1° × 1° wide and 10 km deep. We then adopt the Gaussian parametrization of Hββ, proposed in eq. (11).
4.1 The zero-wavenumber component: relative weight of PSFs
We obtain the Fourier transform at zero wavenumber, H̃ββ(x, 0), from the application of Hββ to an S velocity perturbation that is constant throughout the model volume, as shown in Fig. 4(a). The result is—irrespective of any parametrization or approximation—the space-dependent zeroth moment

M(0)(x) = ∫G Hββ(x, y) d³y. (24)
Horizontal slices through M(0)(x) at 100, 200 and 300 km depth are displayed in Fig. 4(b). As expected, the zeroth moment is mostly positive, with few exceptions at shallow depth confined to the immediate vicinity of sources or receivers. Clearly visible are diffuse versions of ray bundles that connect the epicentres along the periphery of the European plate to those regions with the highest station density. Furthermore, the zeroth moment decreases with depth below 200 km, as expected for a surface wave dominated data set.
According to eq. (24), the zeroth moment is equal to the integral over the PSF for a point perturbation at position x. It follows that M(0)(x) is small when the misfit χ is nearly unaffected by a point perturbation at x. In contrast, M(0)(x) is large when the PSF has a high amplitude, a large spatial extent, or both. In this sense, the spatial variability of M(0)(x) reflects the relative weight of neighbouring PSFs. Information on the resolution length is not contained in the zeroth moment.
From the perspective of gradient-based optimization, M(0)(x) is proportional to the direction of steepest descent evaluated at a model that differs from the optimal model by a constant S velocity perturbation. In the case of perfect data coverage, we would therefore expect M(0)(x) to be constant throughout the model volume. The obvious departures from this ideal scenario, seen in Fig. 4(b), result from the limited sensitivity to deep structure in a surface wave dominated tomography, and from the clustering of sources and receivers. These clusters dominate the misfit, which then leads to a heterogeneous direction of steepest descent.
4.2 Real and imaginary parts of the Fourier transform: image distortion
Tomographic inversions map point perturbations into blurred versions of themselves. The centre of mass of the blurred image—that is the position where the heterogeneity is visually perceived—does not generally coincide with the location of the original point perturbation. This distortion can be quantified, at least in a linearized sense, using the approach described in Section 3.
Based on eq. (12), we compute the spectrum H̃ββ(x, k) for a non-zero wavenumber k by applying the Hessian to the sinusoidal model perturbations cos(k·y) and sin(k·y). For our example, we choose the wavelength of the oscillations to be 2000 km. The model perturbations together with the real and imaginary parts of H̃ββ(x, k) are shown in Fig. 5 for a wavenumber vector pointing in the x-direction of the coordinate system introduced in Fig. 3 (nearly N–S in geographic coordinates).
Both the real and imaginary parts reveal a similar oscillatory pattern as the sinusoidal model perturbations from which they originate. This behaviour is to be expected provided that any PSF is roughly a space-shifted version of the PSF for a perturbation at the coordinate origin x = 0, that is, H(x, y) ≈ H(0, y − x). Deviations from exact sinusoidal oscillations therefore reflect the spatially variable shape and amplitude of the PSFs.
When interpreted within the framework of the Gaussian approximation (eqs 11 and 14), we can use the spectrum from Fig. 5 to compute the distortion

d(x) = z(x) − x, (25)

which is the distance between a point-localized perturbation at position x and the centre of mass of the corresponding PSF, z(x). The distortion measures the displacement of a point perturbation caused by the tomographic inversion. Upon inserting the definition (25) into eq. (14), we obtain

k · d(x) = −arg H̃ββ(x, k) − k · x. (26)

The three components of the distortion d(x) can then be obtained from the evaluation of eq. (26) for three linearly independent wavenumber vectors. We note that the calculation of the complex argument needs to be regularized because phase jumps appear where |H̃ββ| is small. This regularization can be achieved by adding a constant ε > 0 to the real part of H̃ββ in (26), so that arg H̃ββ is forced to zero as |H̃ββ| drops below ε.
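A sketch of this regularization (Python; the spectral values are illustrative):

```python
import numpy as np

# Regularized complex argument: adding eps > 0 to the real part drives the
# phase to zero wherever the spectral amplitude falls below eps, suppressing
# the meaningless phase jumps of near-zero spectral values.
# All numbers are illustrative.
def reg_arg(spectrum, eps):
    spectrum = np.asarray(spectrum, dtype=complex)
    return np.angle(spectrum.real + eps + 1j * spectrum.imag)

strong = 10.0 * np.exp(1j * 1.0)   # amplitude >> eps: phase nearly unchanged
weak = 1e-6 * np.exp(1j * 1.0)     # amplitude << eps: phase forced to ~0

print(round(float(reg_arg(strong, eps=0.5)), 3))   # close to 1.0
print(round(float(reg_arg(weak, eps=0.5)), 6))     # close to 0.0
```

The price of the stabilization is a slight bias of the recovered phase, which is why the distortion estimates must be interpreted only where the spectrum is significantly non-zero.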
The lateral distortion, that is, the horizontal component of the distortion, computed from the spectra for two linearly independent wavenumber vectors, is shown in Fig. 6. The interpretation must account for the regularization, which in this particular case means that the distortion is strongly attenuated in regions where the absolute value of the spectrum shown in Fig. 5 drops below 0.5 × 10−15 rad s m−4. Loosely speaking, the distortion is meaningful only in regions where the PSFs are significantly non-zero.
Throughout central Europe, the lateral distortion is below 300 km, owing to the high station density. Large distortions appear to be rather localized, for instance, east of the Caspian Sea, where ray paths are almost exclusively N–S oriented. As intuitively expected, the distortion increases slightly with depth.
4.3 Resolution lengths
One of the most relevant quantities to be considered in resolution analysis is the resolution length, that is, the distance within which two point perturbations cannot be constrained independently.
The resolution length is the direction-dependent extent of the PSF that defines a region where the original point perturbation trades off significantly with neighbouring heterogeneities. Within the Gaussian approximation (11), the width of a PSF is controlled by the positive definite matrix C(x), the inverse of which can be estimated from the spectrum (14).
We use the standard deviation of the Gaussian in a direction specified by the unit vector e,
σ(e, x) = [eᵀ C(x) e]^(1/2), (27)
as our definition of the direction-dependent resolution length. In the context of our specific example, this means that S velocity perturbations separated by less than the resolution length effectively appear as one heterogeneity, instead of having clearly separate identities.
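Given an estimate of the matrix C(x) at a single point, the direction-dependent resolution length reduces to a one-line evaluation; the following is a minimal sketch with hypothetical numbers:

```python
import numpy as np

def resolution_length(C, e):
    # Direction-dependent resolution length: the standard deviation of
    # the Gaussian PSF with matrix C(x) along the unit vector e,
    # i.e. sigma(e) = sqrt(e^T C e).
    e = np.asarray(e, dtype=float)
    e = e / np.linalg.norm(e)
    return float(np.sqrt(e @ C @ e))
```

Because C is positive definite, the resolution length is positive in every direction, and its extrema are attained along the eigenvectors of C.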
Distributions of horizontal resolution lengths in two perpendicular directions are shown in Fig. 7. The resolution length in x-direction (nearly N–S in central and western Europe) reaches its minimum of 300 km beneath the Balkan peninsula, in agreement with the dense clustering of E–W-oriented ray paths in that region. Additional minima can be observed, for instance, in the vicinity of events along the Mid-Atlantic ridge from where waves travel westwards to stations in central and southern Europe. The resolution length in the local y-direction (nearly E–W in central and western Europe) is longer than in x-direction beneath the Balkans but significantly shorter beneath the North Atlantic. This, again, is in accord with the predominance of N–S-oriented ray paths that connect sources along the northern Mid-Atlantic ridge to stations on the continent. Throughout most of Europe, the resolution length in any direction varies between 400 and 600 km, which is close to the wavelength of the 120 s surface waves in our data set.
5 THE PHYSICS BEHIND: THE RETURN OF SYNTHETIC INVERSIONS?
In the following paragraphs, we complement the formal treatment of the Fourier transform by an intuitive interpretation. This is intended to shed light onto the physics behind the oscillatory behaviour of the Fourier transformed Hessian, and to reveal a link between the width of Fresnel zones and the resolution length. To facilitate our arguments, we restrict ourselves to the scalar case with one model parameter only.
Our interpretation rests on the second-order approximation of the misfit χ (eq. 1), based upon which the Fréchet derivative ∇mχ, evaluated at an initial model m(0), can be expressed in terms of the Hessian.
For a sinusoidal perturbation of the optimal model, the Fréchet derivative (28) equals the Fourier-transformed Hessian evaluated at the corresponding wavenumber vector. Since the Fréchet derivative is the local direction of steepest ascent, the first iteration of a conjugate-gradient-type synthetic inversion updates m(0) to
where γ is the optimal step length. Thus, in the ideal but unrealistic case of perfect convergence within one iteration, the spectrum should exactly reproduce the oscillations of the sinusoidal input perturbation. The practically observed differences between m(1)(x) and the optimum result from the smearing of the perturbation by the PSF H(x, y) via the integral in eq. (28). The nature of the smearing operation depends on the characteristics of the PSFs, which are governed by the source–receiver geometry and the waveforms included in the misfit χ.
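The one-step update (29) can be mimicked in a toy 1-D setting. The Gaussian blurring kernel below is a hypothetical stand-in for H(x, y), not the Hessian of any actual waveform misfit; the point is only that the gradient is a smeared sinusoid, and that one optimally scaled step removes most of the perturbation:

```python
import numpy as np

# Toy 1-D analogue of eqs (28)-(29): the gradient at m0 = m_opt + dm is
# the Hessian applied to the sinusoidal perturbation dm, and one
# steepest-descent step with optimal length gamma shrinks the residual.
n, width = 400, 0.02
x = np.linspace(0.0, 1.0, n)
dx = x[1] - x[0]
dm = np.sin(2.0 * np.pi * 5.0 * x)            # sinusoidal perturbation

# Hypothetical PSF: a symmetric Gaussian kernel stands in for H(x, y).
H = np.exp(-0.5 * ((x[:, None] - x[None, :]) / width) ** 2) * dx

grad = H @ dm                                  # discretized eq. (28)
gamma = (grad @ grad) / (grad @ (H @ grad))    # optimal step length
residual = dm - gamma * grad                   # m(1) - m_opt, eq. (29)
```

For this symmetric positive definite kernel the exact line search guarantees a misfit decrease, and the residual is a strongly attenuated version of the input sinusoid.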
For further illustrations of the smearing procedure, we consider the simplistic configuration shown in Fig. 8(a). The sources and receivers are oriented horizontally, parallel to the zero lines of δm. Each receiver records one isolated S wave that propagates from one source with the S velocity β.
We can infer the geometry of the PSFs for this configuration from plausibility arguments: The PSF H(x, y) is the response of the tomographic system to a point perturbation at position x. While this has to be understood in the linearized sense (see Section 1.2.4), the Hessian nevertheless accounts for both first- and second-order scattering. When x is located inside the first Fresnel zone, the corresponding PSF is non-zero, because first- and second-order scattering affect the measurement. The shape of the PSF roughly follows the first Fresnel zone, which is the region where different perturbations cannot be independently identified. In particular, the maximum width d of the PSF is close to the maximum width of the first Fresnel zone, that is, d ≈ (βTL)^(1/2), where T and L are the dominant period and the epicentral distance, respectively. When the position of the point perturbation, x, is outside the first Fresnel zone, both first- and second-order scattering become highly inefficient (e.g. Fichtner & Trampert 2011). The resulting PSF is then effectively zero. In conclusion, the PSFs for our simplistic setup occupy nearly cigar-shaped regions connecting sources and receivers.
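A quick numerical check, assuming the standard estimate d ≈ √(βTL) for the maximum Fresnel-zone width (the specific values of β, T and L below are hypothetical, chosen to resemble the surface-wave setting of the example):

```python
import math

def fresnel_width(beta, T, L):
    # Approximate maximum width of the first Fresnel zone,
    # d = sqrt(beta * T * L), with S velocity beta, dominant period T
    # and epicentral distance L; consistent units are the caller's job.
    return math.sqrt(beta * T * L)

# Hypothetical numbers: 120 s surface waves, beta = 4.5 km/s, L = 3000 km.
d = fresnel_width(4.5, 120.0, 3000.0)   # result in km
```

For these values d is on the order of 1300 km, which illustrates how broad the PSFs of long-period surface waves can become at teleseismic distances.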
In our first conceptual example from Fig. 8(a), the dominant period T is sufficiently small to generate PSFs that fit within less than half an oscillation period of δm. The integral in eq. (28) therefore reproduces a slightly blurred but still clearly recognizable version of the sinusoidal δm. Seen from the data perspective, waves propagating perpendicular to the oscillations are continuously distorted, thus increasing the misfit above its minimum value.
For a vertically oriented source–receiver configuration, schematically illustrated in Fig. 8(b), the PSFs are parallel to the wavenumber vector. Positive and negative contributions to the integral (28) nearly cancel, leading to a Fréchet derivative that is close to zero. The misfit, too, is hardly affected, because the wavefield distortions accumulated within the positive and negative parts of δm compensate each other.
The previous line of argument allows us to interpret the spectrum of the Hessian loosely as a measure of the information transmitted across the model in the directions perpendicular to the wavenumber vector k. Large absolute values of the spectrum correspond, in this sense, to a large amount of information transported by waves propagating nearly perpendicular to k.
From the amplitudes of the oscillations in the spectrum, we can infer the resolution length. In fact, based on the spectrum of the Gaussian (14) and the definition of the resolution length (27), we find that the spectral amplitude is approximately proportional to exp(−|k|²σ²/2), with σ the resolution length in the direction of k. Thus, as the PSF or the first Fresnel zone widens with increasing period T, the smoothing integral (28) reduces the amplitude of the spectrum, as shown in Fig. 8(c). The amplitude reduction then corresponds to an increasing resolution length.
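Under the Gaussian approximation, this amplitude relation can be inverted directly for the resolution length; the sketch below assumes the proportionality |H̃(k)| ≈ M0 exp(−|k|²σ²/2) stated above:

```python
import numpy as np

def sigma_from_amplitude(amp, M0, k):
    # Invert |H(k)| = M0 * exp(-0.5 * |k|^2 * sigma^2) for the resolution
    # length sigma along the direction of k (Gaussian approximation).
    k = np.asarray(k, dtype=float)
    return np.sqrt(-2.0 * np.log(amp / M0)) / np.linalg.norm(k)
```

Large spectral amplitudes thus map to short resolution lengths, and strongly attenuated amplitudes map to long ones.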
An immediate consequence of this relation is the variation of the resolution length along the ray path. It attains a maximum near the centre of the ray path, and minima in the vicinity of sources and receivers where the lateral extent of the Fresnel zone or PSF is minimal.
It is a curious aspect of the previous interpretation that we can obtain information on the Hessian, and thus on resolution and trade-offs, from a series of linearized synthetic inversions in the sense of eqs (28) and (29). This information can be made as complete as needed by replacing the Gaussian approximation (11) by Gram–Charlier expansions (Section 3.2) or any other suitable parametrization of the Hessian. The required synthetic inversions differ from the common resolution tests of the chequer board type in that they start from an initial model m(0) that is equal to an already computed optimal model plus a sinusoidal perturbation.
While synthetic inversions with one single input model are known to be incomplete and potentially misleading (Lévêque et al. 1993), the Fourier-domain approximation of the Hessian builds on a small set of linearly independent oscillatory input models that sample the PSFs in different directions. Synthetic inversions are, from this perspective, indeed a useful tool, but they need to be combined for a meaningful resolution analysis.
6 POTENTIALS AND PERSPECTIVES
The approximation of the Hessian with the help of parametrizations such as those suggested in Section 3 does not rely on the proximity of an optimal model. It may, in fact, be used at any stage of an iterative inversion. Thus, as a corollary to the previous developments, we can propose several improvements to full waveform inversion techniques. These include a new class of Newton-like methods, a pre-conditioner for optimization schemes of the conjugate-gradient type, an approach to adaptive parametrization and a strategy for objective functional design. This section is primarily intended to be thought-provoking. All of the proposed improvements require further research that is beyond the scope of this work.
6.1 Improved optimization schemes
6.1.1 Pre-conditioning of conjugate-gradient methods
While the efficiency of Newton’s method and its variants in large-scale full waveform inversion is still debated, conjugate-gradient schemes enjoy widespread popularity. In their pure form, however, conjugate-gradient algorithms may converge slowly because Fréchet kernels are small in weakly covered areas and extremely large near sources and receivers. The iterative inversion therefore tends to generate excessive heterogeneities around hypocentres and station arrays, unless the descent directions are pre-conditioned. The pre-conditioner—ideally close to the inverse Hessian—acts to balance the extreme and the weak contributions, thus leading to faster convergence and to physically more reasonable updates.
Owing to the difficulties involved in the computation of the inverse Hessian, several intuitive pre-conditioning strategies are commonly applied. These include smoothing and the correction for the geometric spreading of the forward and adjoint fields (Igel et al. 1996; Fichtner et al. 2009, 2010).
The Fourier transform of the Hessian, interpreted in terms of an ascent direction in Section 5, provides a pre-conditioner that can be computed efficiently. In the hypothetical case of perfect coverage, the Fourier-transformed Hessian is expected to be a scaled version of the oscillatory model perturbation used for its computation. The synthetic inversion from eq. (29) would then converge in one iteration. In a realistic case with irregular coverage, however, the first update m(1) and the target model are not identical, as illustrated in Fig. 5.
Seen from the perspective of the Gaussian approximation (14), the deviations of the spectrum from a harmonic function result from the spatially variable zeroth moment M(0)(x), the non-zero distortion, and the variable width of the PSFs encapsulated in the symmetric matrix C(x). While corrections of the ascent direction for the distortion and the width of the PSF are wavenumber-specific, a correction for the zeroth moment is generally possible for any initial model.
The proposed pre-conditioning consists of dividing the pure ascent direction by M(0)(x), computed for the current model. This operation, illustrated in Fig. 9, generally requires regularization to avoid division by zero. Fig. 9(a) shows the oscillatory model perturbation used to compute the real part of the spectrum in Fig. 9(b). The amplitudes of the spectrum are weak in poorly covered regions, and large where sources and receivers cluster. The division by M(0)(x) (Fig. 9d) balances the amplitudes, which leads to a pre-conditioned ascent direction that reproduces the oscillations of the input model much more closely than the unpreconditioned direction. The computationally inexpensive division by the zeroth moment can thus be expected to result in faster convergence of gradient-based inversions.
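The operation itself is elementary; the following 1-D toy, with a hypothetical coverage profile M0(x) and an arbitrary regularization constant, illustrates the rebalancing:

```python
import numpy as np

def precondition(ascent, M0, eps):
    # Divide the ascent direction by the zeroth moment M0(x); the constant
    # eps > 0 regularizes the division where coverage, and hence M0, vanishes.
    return ascent / (M0 + eps)

# Toy 1-D illustration: an ascent direction whose amplitude follows an
# uneven, hypothetical coverage profile is rebalanced towards the input.
x = np.linspace(0.0, 1.0, 200)
M0 = 0.1 + np.exp(-0.5 * ((x - 0.5) / 0.1) ** 2)   # dense coverage near x = 0.5
target = np.sin(2.0 * np.pi * 4.0 * x)             # oscillatory input model
ascent = M0 * target                               # amplitude follows coverage
balanced = precondition(ascent, M0, eps=1e-3)
```

The choice of eps trades off noise amplification in uncovered regions against fidelity in well-covered ones.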
6.1.2 A new class of Newton-like methods
While the convergence rate of Newton-like methods can be nearly quadratic, their applicability is severely limited by the need to solve Newton’s equation for the descent direction. Since the (approximate) Hessian can neither be computed nor stored explicitly for realistic problems, Newton’s equation is solved iteratively using, for instance, conjugate-gradient algorithms, where each iteration requires the evaluation of the (approximate) Hessian times a model perturbation. The resulting computational costs have so far prevented the application of Newton-like methods in large-scale waveform inversions.
Parametrized versions of the Hessian, such as the ones proposed in Section 3, may offer a computationally more efficient alternative to conventional Newton-like schemes, because an approximate Hessian can be computed explicitly. The application of the parametrized Hessian to a descent direction—needed for the conjugate-gradient solution of Newton’s equation—merely involves an integral over space, which greatly accelerates the computations. Furthermore, the storage requirements for the parametrized Hessian are negligible compared to those involved in the computation of Fréchet and Hessian kernels via adjoint techniques.
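A 1-D, matrix-free sketch of such a Hessian application is given below. It assumes PSFs of Gaussian shape with space-dependent amplitude M0, centre z and width c; the multiparameter bookkeeping of Section 3 and the actual estimation of these parameters are omitted:

```python
import numpy as np

def apply_parametrized_hessian(v, y, M0, z, c):
    # (H v)(x_i) = M0_i * Integral N(y; z_i, c_i^2) v(y) dy, evaluated by
    # quadrature on the grid y. The Gaussian shape of the PSFs, with
    # space-dependent amplitude M0, centre z and width c, is the assumed
    # parametrization; no Hessian matrix is ever formed or stored.
    dy = y[1] - y[0]
    out = np.empty_like(M0)
    for i in range(len(M0)):
        psf = np.exp(-0.5 * ((y - z[i]) / c[i]) ** 2)
        psf /= psf.sum() * dy                  # normalize the PSF to unit mass
        out[i] = M0[i] * np.sum(psf * v) * dy
    return out
```

The cost per application is a quadrature per model point, rather than a pair of wavefield simulations, which is the source of the efficiency gain discussed above.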
6.2 Adaptive parametrization
The problem arising in full waveform inversion is that the strongly application-specific misfit measures are applicable to complete seismograms, without necessarily being related to isolated phases for which seismic rays can be computed (e.g. Crase et al. 1990; Fichtner et al. 2008; Brossier et al. 2010; van Leeuwen & Mulder 2010; Bozdağ et al. 2011). The resolution length concept, introduced in Section 4.3, offers a natural solution to this problem. Instead of using the ray coverage as a proxy for resolution, we propose to choose the shape of the basis functions according to the resolution length in different directions.
The design of an adaptive parametrization can be incorporated into a multiscale approach where the dominant period decreases with increasing number of iterations (e.g. Bunks et al. 1995; Sirgue & Pratt 2004; Fichtner 2010): First, a very coarse parametrization is chosen ad hoc for the first iterations with the longest period data. When the model is sufficiently close to the optimum, the resolution lengths are computed, and then used to refine the parametrization. The iterative inversion is then continued with shorter period data until the next optimum is reached. This refinement/optimization procedure can then be repeated as often as needed.
6.3 Objective functional design
A further potential application of the previously developed methodology is the design of objective functionals, commonly constructed to meet criteria in the data space, such as robustness (Crase et al. 1990; Brossier et al. 2010) and separation of phase and amplitude (Luo & Schuster 1991; Gee & Jordan 1992; Fichtner et al. 2008; Bozdağ et al. 2011). However, the design of objective functionals with the aim to maximize resolution has so far received little attention in full waveform inversion (Maurer et al. 2009), contrary to common practice in linear inverse problems where Backus–Gilbert theory provides all necessary tools (Backus & Gilbert 1967). This is mostly because a quantitative resolution analysis in full waveform inversion has not been available. The resolution length computed from the Hessian may serve as a natural measure for the relative performance of objective functionals that already satisfy principal desiderata in the data space. It may as such also be used to assess the resolution capabilities of novel seismic observables, including rotations and strain (Bernauer et al. 2009; Ferreira & Igel 2009).
7 DISCUSSION
7.1 Computational requirements
Our approach to resolution analysis is based on Fourier transforms of the Hessian for a set of linearly independent wavenumber vectors. Accepting the Gaussian parametrization of H(x, y), as proposed in Section 3.1, 10 applications of the Hessian to sinusoidal model perturbations are needed to uniquely determine all space-dependent parameters, that is, M(0)(x), z(x) and C(x).
As described in Appendix A, each application of the Hessian to a model perturbation requires four wave field simulations: one for the regular forward field u, one for the regular adjoint field, one for the scattered forward field and one for the scattered adjoint field. The simulations of the regular forward and adjoint fields can be reused for every application of the Hessian to any model perturbation. Only the scattered forward and adjoint fields are perturbation-specific. With two shared simulations plus two perturbation-specific simulations for each of the 10 wavenumber vectors, the total number of simulations is 22.
It is instructive to compare 22 to the number of simulations required for a typical synthetic inversion based on a conjugate-gradient method. Each iteration requires one forward and one adjoint simulation, plus two additional forward simulations to determine the optimal step length. It follows that 24 simulations allow us to perform six conjugate-gradient iterations in a synthetic inversion.
In realistic applications, however, the number of iterations is on the order of 20 (e.g. Tape et al. 2010; Fichtner et al. 2010), meaning that a synthetic inversion with only six iterations would be of little use. We therefore conclude that our approach is computationally more efficient than synthetic inversions, while providing a much more complete characterization of resolution.
7.2 Gram–Charlier expansions
A weak point in our development lies in the parametrization of the Hessian by a Gram–Charlier series—truncated, in the simplest case, after the first-order term. As shown in Appendix B, Gram–Charlier series can be considered a variant of Taylor series. It follows that these two expansions potentially share some of their disadvantageous properties: slow convergence or even divergence, depending on the function that we wish to expand.
The efficiency of Gram–Charlier series generally depends on the choice of the parent function. This choice was straightforward in the case of the ββ-component of the Hessian, Hββ, which is always roughly bell-shaped, that is, well-approximated by a Gaussian (see Fig. 2a). However, the off-diagonal elements Hβα and Hρα, shown in Figs 2(b) and 2(c), are less predictable. This makes the choice of a suitable and generic parent function difficult or even impossible. As a result of this complication, we did not attempt to parametrize Hβα and Hρα, meaning that we are currently not able to study interparameter trade-offs in a comprehensive way. Clearly, a more powerful parametrization of the Hessian would be a substantial improvement to our method.
7.3 Relation to migration deconvolution
Full waveform inversion can be regarded as similar to various forms of migration (e.g. Tarantola 1984; Chavent & Plessix 1999; Nemeth et al. 1999). Reverse-time migration, in particular, can be interpreted as the first iteration of a full waveform inversion, provided that measurements are made only on body waves with an L2-norm misfit. A close link therefore exists between our approach and the migration deconvolution used in exploration. Schuster & Hu (2000) computed migration PSFs for homogeneous media analytically. In the case of heterogeneous media, both ray theory (Hu et al. 2001) and spatially variable matched filters (Aoki & Schuster 2009) have been proposed to approximate PSFs. These were then used for migration deconvolution or ‘deblurring’ (Hu et al. 2001; Aoki & Schuster 2009), much like the pre-conditioning of a descent direction with the inverse approximate Hessian.
While the use of ray theory is restricted to well-defined and isolated high-frequency phases, the fitting of PSFs or their inverse with the help of matched filters could easily be generalized to full waveform inversion. In fact, our method proposed in Section 3 corresponds to a matched filter approach where one single input model is replaced by a set of input models that help to constrain the direction dependence of PSFs.
Our method for resolution analysis is, however, more general than resolution analysis in migration, mostly because it is applicable on all scales, to all types of waves and to any misfit functional that one finds suitable for a particular application.
7.4 The problem of multiple minima
The weak point of our analysis is the assumption that the global minimum of χ has been found. In large-scale applications this assumption can hardly be verified, and there is no sufficiently efficient optimization algorithm that does not risk being trapped in a local minimum. This restriction must be kept in mind when using the local Hessian in resolution analysis.
8 CONCLUSIONS
We have developed a new method for the quantitative resolution analysis in full seismic waveform inversion that overcomes the limitations of chequer board type synthetic inversions while being computationally more efficient. Our approach is based on the quadratic approximation of the misfit functional and the parametrization of the Hessian operator in terms of parent functions and their successive derivatives. The space-dependent parameters can be computed efficiently using Fourier transforms of the Hessian for a small set of linearly independent wavenumber vectors.
In the simplest case where the Hessian is parametrized in terms of Gaussians, we can infer 3-D distributions of direction-dependent resolution length and the distortion introduced by the tomographic method. Our approach can be considered a generalization of the ray density tensor (Kissling 1988) that quantifies the space-dependent azimuthal coverage, and therefore serves as a proxy for resolution in ray tomography. The advantages of the parametrized Hessian compared to the ray density tensor include the applicability to any type of seismic wave, and the rigorous quantification of resolution, for instance, in terms of the resolution length. Furthermore, the parametrized Hessian may be used for covariance and extremal bounds analysis, as proposed by Meju & Sakkas (2007) and Meju (2009).
A curious aspect of our approach is that information on the Hessian, and thus on resolution and trade-offs, can be obtained from linearized synthetic inversions where the initial model differs from the optimal model by a sinusoidal perturbation. This suggests that synthetic inversions are, at least in this sense, a useful tool, despite their well-known deficiencies (Lévêque et al. 1993). Yet, they should never be limited to single chequer board tests that bear little useful information.
As a corollary to our developments, we propose several improvements to full waveform inversion techniques—all of which require further investigations that are beyond the scope of this work. These include a new family of Newton-like methods, a pre-conditioner for optimization schemes of the conjugate-gradient type, an approach to adaptive parametrization independent from ray theory and a strategy for objective functional design that aims at maximizing resolution.
The most desirable improvement to our method would be a more flexible but still rapidly converging parametrization of the Hessian. This would allow us, for instance, to study trade-offs between model parameters of different physical nature.
ACKNOWLEDGEMENTS
We would like to thank our colleagues Moritz Bernauer, Christian Böhm, Cédric Legendre, Helle Pedersen and Florian Rickers for inspiring comments and discussions. We are particularly grateful to Theo van Zessen for maintaining The STIG and GRIT, its little brother. Numerous computations were done on the Huygens IBM p6 supercomputer at SARA Amsterdam. Use of Huygens was sponsored by the National Computing Facilities Foundation (NCF) under the project SH-161-09 with financial support from the Netherlands Organisation for Scientific Research (NWO). Furthermore, we would like to thank Gerard Schuster for making us aware of the relation between the Hessian and PSFs. The comments and thought-provoking questions of Yann Capdeville and an anonymous reviewer helped us to improve the text. Finally, Andreas Fichtner gratefully acknowledges Deutsche Bahn for all the endless delays that provided ample time to finish this manuscript.
APPENDIX A: HESSIAN KERNELS
In this appendix, we provide a condensed recipe for the efficient computation of the Hessian applied to arbitrary model perturbations. In the interest of notational compactness, we introduce the notion of Hessian kernels, defined as the Hessian H(x, y) applied to a model perturbation.
The semicolon in eq. (A4) indicates that the variables to its left are fixed for a given realization of the space-dependent kernel K. We will adhere to this semicolon convention throughout the following developments. The adjoint wave field is the solution of the adjoint wave equation, symbolically written in terms of the adjoint of the wave equation operator L
The adjoint source acts at the receiver positions, and its time evolution is completely determined by the misfit χ. This explains the dependence of the adjoint source on the earth model m. Note that both the forward and the adjoint wave fields are also dependent on m, that is, u = u(m; x), and similarly for the adjoint field.
To obtain a relation between the Fréchet kernel K, the Hessian kernel h and the Hessian operator H, we differentiate eq. (A2) once more with respect to m:
It follows from (A6) that the Hessian kernel h from eq. (A2) is equal to the total Fréchet derivative of the Fréchet kernel K in the direction of the model perturbation.
To compute h explicitly, we exploit the dependence of the Fréchet kernel on m, u and the adjoint field. The application of the chain rule to eq. (A7) then yields
where the first term involves the partial Fréchet derivative of K in the direction of the perturbation, as opposed to the total Fréchet derivative, and the remaining terms involve the Fréchet derivatives of the forward and adjoint fields, respectively.
Eq. (A8) reveals that the Hessian kernel has three contributions. The first of these is localized at the perturbation and has no obvious physical interpretation. The second contribution closely resembles the Fréchet kernel from eq. (A4). However, instead of u it involves the derivative of the forward field, which can be interpreted as the scattered forward field that is excited when u impinges upon the perturbation. In practice, this scattered field can be computed most conveniently using finite differences such as
where ν must be sufficiently small to ensure the validity of the first-order approximation. From a physical point of view, this second contribution to the Hessian kernel describes the effect of second-order scattering from the imposed perturbation to an arbitrary perturbation and from there to the receiver.
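The finite-difference approximation (A9) is a generic directional derivative; the sketch below replaces the wave equation solver by an arbitrary model-dependent function, which is an illustrative stand-in only:

```python
def fd_derivative(field, m, dm, nu=1e-6):
    # First-order finite-difference approximation of the derivative of a
    # model-dependent quantity `field` at model m in the direction dm,
    # in the spirit of eqs (A9)-(A10); nu must be small enough for the
    # first-order approximation to hold, but large enough to avoid
    # round-off dominated differences.
    return (field(m + nu * dm) - field(m)) / nu
```

In an actual implementation, `field` would be the forward (or adjoint) wavefield solver evaluated for the perturbed model.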
The third contribution to the Hessian kernel also resembles the Fréchet kernel (A4), but instead of the adjoint field it involves the derivative of the adjoint field. While this derivative may again be approximated with finite differences, for example,
its computation and interpretation are more complex than they may seem. In fact, a model perturbation from m to the perturbed model has two effects: (1) the adjoint field is scattered when it impinges upon the perturbation and (2) the adjoint source changes as well, because a model perturbation affects the misfit χ that determines the adjoint source. It follows that the computation of the derivative of the adjoint field requires the solution of the adjoint wave equation for the perturbed earth model and the perturbed adjoint source, that is,
In contrast to the second contribution, the third contribution accounts for both the first-order scattering that leads to the perturbation of the adjoint source and the second-order scattering between the perturbation and the receiver.
A2 A realistic example
To attach physical intuition to the above equations, we consider an isotropic medium parametrized in terms of the P velocity α, the S velocity β and density ρ. The Fréchet kernel Kβ is given by (e.g. Tromp et al. 2005; Liu & Tromp 2006; Fichtner 2010)
where the strain tensor computed from the adjoint field appears. For the specific case of a 25 s Love wave recorded at 25.2° epicentral distance, the Fréchet kernel for a cross-correlation time shift measurement (Luo & Schuster 1991) occupies a cigar-shaped volume that extends from the source to the receiver (Fig. A1).
Making use of eq. (A8), we find that the β-component of the Hessian kernel corresponding to an S velocity perturbation δβ is given by
The first contribution can be computed directly from the Fréchet kernels. Additional simulations are needed for the other two contributions because these involve the differentiated forward and adjoint fields, which can be computed most conveniently with the help of the finite-difference approximations (A9) and (A10).
The three contributions to the Hessian kernel hβ(δβ) for a point perturbation δβ located beneath southern Sardinia are shown separately in the top row of Fig. A2. One contribution is a superposition of two components, labelled F and S. These correspond to the primary influence zone represented by the approximate Hessian (F) and to the secondary influence zone where second-order scattering affects the measurement (S). Another contribution extends only from the receiver to δβ: second-order scattering from δβ to an S velocity perturbation within its non-zero parts has an effect on the measurement. Finally, the remaining contribution is restricted to the volume occupied by δβ. The complete Hessian kernel hβ(δβ) is shown in the bottom row of Fig. A2. It can be interpreted as the continuous representation of the row of the Hessian matrix that corresponds to the basis function coincident with δβ. The fully discrete row is obtained by projecting hβ(δβ) onto the basis functions. For a more detailed discussion of Hessian kernels, the reader is referred to Fichtner & Trampert (2011).
APPENDIX B: MULTIVARIATE GRAM–CHARLIER SERIES
The resolution analysis proposed in Section 3 rests on the approximation of the Hessian H(x, y) by a linear sum of terms that consist of a parent function and its successive derivatives. Approximation problems of this type have been studied at least since the end of the 19th century (e.g. Gram 1883; Charlier 1905; Samuelson 1943), and the special case where the parent function is a Gaussian became known as the Gram–Charlier series. While originally introduced in a purely mathematical context, Gram–Charlier series became a popular tool for the approximation of 1-D densities in a variety of fields including econometrics (e.g. Sargan 1975), analytical chemistry (e.g. Dondi et al. 1981) and mineral physics (e.g. Pavese et al. 1995). Generalizations to higher dimensions and applications in a geophysical context have, however, remained an exception (e.g. Sauer & Heydt 1979; Gee & Jordan 1992; Del Brio et al. 2009). In the following paragraphs, we describe Gram–Charlier approximations in N≥ 1 dimensions that are based on the representation of a function in terms of its cumulants.
B1 Moments and cumulants
We consider the spatial Fourier transform of a function f(x), defined as
Expanding the spectrum into a Taylor series yields
where the components of the tensorial Taylor coefficients M(n) are given by
Introducing (B1) into (B3) reveals that the coefficients M(n)ijk… are equal to the moments of f.
Similar to (B2), the cumulants F(n) of f are defined through the Taylor coefficients of the logarithm of the spectrum.
Comparing the two expansions, we find that the first three cumulants and moments are related as follows:
By definition (eq. B5), the cumulants are additive under convolution.
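The additivity of cumulants under convolution can be verified numerically; the sketch below uses two sampled Gaussians (arbitrary choices for this illustration) and the first three cumulants, i.e. the mean, the variance and the third central moment:

```python
import numpy as np

def first_cumulants(f, x):
    # First three cumulants of a sampled 1-D density f(x): mean, variance
    # and third central moment, computed by simple quadrature.
    dx = x[1] - x[0]
    m0 = np.sum(f) * dx
    mean = np.sum(x * f) * dx / m0
    var = np.sum((x - mean) ** 2 * f) * dx / m0
    k3 = np.sum((x - mean) ** 3 * f) * dx / m0
    return mean, var, k3

# Additivity under convolution, illustrated with two Gaussians: every
# cumulant of f1 * f2 equals the sum of the corresponding cumulants.
x = np.linspace(-10.0, 10.0, 2001)
dx = x[1] - x[0]
f1 = np.exp(-0.5 * ((x - 1.0) / 0.5) ** 2)
f2 = np.exp(-0.5 * ((x + 2.0) / 0.8) ** 2)
g = np.convolve(f1, f2) * dx                   # f1 * f2, on a doubled grid
xg = np.linspace(-20.0, 20.0, g.size)
```

The means add, the variances add, and the third cumulants add as well, which is the property exploited in the construction of the series.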
B2 Wavenumber-domain Gram–Charlier series
Our goal is to approximate f in terms of a suitably chosen parent function g. With the help of eq. (B5), we can express the spectrum as
where F(n) and G(n) are the cumulants of f and g, respectively. Eq. (B8) provides a very general expression for the spectrum, which can be greatly simplified through the judicious choice of g. Thus, in the context of resolution analysis, the parent function should already capture the principal characteristics of the PSF. We obtain the wavenumber-domain representation of the classical Gram–Charlier A series when the parent function g is equal to the normalized Gaussian
with the cumulants
This eliminates the cumulants of g with orders greater than 2 from eq. (B8). Since we are still free to choose the specific values of G(1) and G(2), we let G(1)=F(1) and G(2)=F(2), so that eq. (B8) reduces to
where M(0) is the zeroth moment of f. Eq. (B11) expresses the spectrum of f in terms of its cumulants with orders n≥ 3, and its first three moments: the norm M(0), the expected value z and the covariance C. By expanding the first exponential on the left-hand side of (B11) into a Taylor series, we obtain the following approximation:
While the coefficients of orders 3, 4 and 5 are equal to F(3), F(4) and F(5), respectively, the higher order coefficients involve products of cumulants.
B3 Space-domain Gram–Charlier series
To translate the wavenumber-domain Gram–Charlier series (B11) into the space domain, we exploit the Fourier correspondence
Expanding the first exponential on the left-hand side of (B14) again yields a more useful expression:
The Gram–Charlier series from (B15) can be interpreted as the summation of contributions with increasing wavelength. The truncated expansion
is, in this sense, a long-wavelength approximation.
B4 Properties of the Gram–Charlier expansion
To illustrate the nature of the Gram–Charlier expansion, we consider the step function
as a simple 1-D example. The wavenumber- and space-domain Gram–Charlier expansions of orders 0, 3 and 7 are shown in Fig. B1. The 0th-order approximation in the wavenumber domain is a Gaussian that closely matches the exact spectrum within the wavenumber range ±1 m−1. The corresponding space-domain approximation is a Gaussian where the first three moments are equal to those of the step function (B17). Higher order contributions account for the skewness of f(x), but the improvements are limited to the wavenumber interval [−π, π], outside of which the Taylor series from eq. (B11) does not converge. It follows that the Gram–Charlier expansion of (B17) is always a low-wavenumber approximation, even when infinitely many coefficients are included. This behaviour is typical in the sense that a Gram–Charlier series does not necessarily converge, and if it does, the convergence may be slow (e.g. Samuelson 1943; Cramér 1957).
The essence of this example is that the Gram–Charlier series should be considered a tool to capture the gross features of a function with a few coefficients, where the meaning of gross feature is determined by the choice of the parent function.