Comparative appraisal of linear inverse models constructed via distinctive parameterizations (comparing distinctly inverted models)

Authors


Abstract

[1] Most geophysical inverse problems deal with models of the continuous Earth structure. The classical Backus-Gilbert theory demonstrates that the resolvable model variation is the truth viewed through the resolution kernel that is woven from the data kernels. The actual numerical resolution amounts to the inversion of the Gram matrix formed by the inner products among data kernels. However, because of the usually sizable number of data constraints and/or imperfections of the forward theory, practical implementations usually rely on a priori finite parameterizations based on rather arbitrary choices of bases such as spatial voxels, splines, spherical harmonics, or spherical wavelets. The cross assessment of the consistency among inverse models parameterized or regularized differently has long been downplayed. It is shown in this study that straightforward conversions among different model representations also enable direct conversions of the Gram matrices. This leads to significant flexibility in formulating the forward data rule in one representation and then carrying out the actual inversion in an alternate domain. Furthermore, it is also fairly easy to convert both the model covariance and resolution matrices across different representations. These conversions thus enable direct assessments across inverse models obtained via different parameterizations and different regularization schemes. An example using preliminary results of an ambient noise tomography experiment in a plate boundary region of complex tectonics, the northeast coast and offshore area of Taiwan, is shown to demonstrate such comparisons.

1. Introduction

[2] It is well known that spatial and/or temporal functions representing the structure of the Earth constrained by finite geophysical observations must be square integrable. That is, observations manifested from Earth structures according to well-established, deterministic physical laws provide finite bandwidth information concerning the Earth model. In other words, a fair approximation of the Earth can only be constituted in the finite degrees of freedom (d.o.f.) provided by the finite amount of noisy and band-limited data constraints. However, it is not possible to make an adequate, a priori estimate of the exact amount of d.o.f. to parameterize the Earth model which is compatible with observed data and is not biased by artifacts resulting from the inversion scheme. The classical Backus-Gilbert (B-G) formulation of linear geophysical inverse problem [Backus and Gilbert, 1968; Backus, 1970a, 1970b; Gilbert, 1971] treats the model variation as continuous spatial and/or temporal function directly and interprets the inverse model as the actual Earth structure viewed through an imperfect window, the resolution kernel, which is constructed from weaving the available data kernels. Whereas conventional models are usually constructed by considering just the data fitting (in addition to a certain regularization scheme to deal with the intrinsic non-uniqueness embedded in almost all geophysical inverse problems), the resolution kernel revealed from the B-G formulation relates the estimated model to the “true” structure of the Earth. The appraisal of the resulting inverse model is made directly in the model space through the resolution kernel, rather than the conventionally indirect assessment in the data space by circuitously examining the data fitting. Because implementation of the B-G formulation requires the inversion of the Gram matrix formed by inner products among data kernels, it might be numerically impractical. 
First, Gram matrices of modern inverse problems constructed from data constraints are usually sizable and full, rather than sparse. Second, direct evaluations of inner products for certain forward approximations are sometimes inaccessible. For example, it is not possible to evaluate the inner product integral of two crossing seismic rays for tomography based on the ray theoretical formulation. The usual practice is to discretize the forward rule by choosing a certain ad hoc finite set of basis functions. After the finite parameterization, a resolution matrix analogous to the continuous resolution kernel is equally important in aiding the interpretation of the inverse model. The column vectors of the resolution matrix correspond to point spread functions (PSFs) [Parker, 1994; Oldenborger and Routh, 2009], which reveal how the impulse response of a specific model parameter is spread into the resolved inverse model estimates. The row vectors, on the other hand, correspond to the averaging functions (AFs) [Backus and Gilbert, 1968] and indicate how a specific estimate of a model parameter is in fact composed of the weighted average of all “nearby” model parameters. This important information is accessible only by examining the complete resolution matrix, rather than through the popularly improvised checkerboard test. For example, the popular damped least squares (DLS) solution [e.g., Lawson and Hanson, 1974] is regularized by the minimum model norm criterion. It can be shown by examining the AF that the DLS solution is a realization through the Dirichlet kernel, which bears significant ringing positive and negative sidelobes arising from attempting a finite approximation to an anticipated spatially localized window with a broadband spectrum. This is a general aliasing problem. Imperfect model resolution is attributed mainly to the band limitation of the model information available from the finite number of usually highly nonuniformly distributed data constraints.

[3] An additional complication that further biases the inverse model is the inevitable effect of magnifying the noise contamination embedded within the data. This usually results from those not well-constrained, usually high wave number, components invoked to push for highly localized spatial resolution. This is revealed by the model covariance matrix and is due to the fact that the content of uncorrelated data noise is inevitably manifested in high wave number model components. In other words, the fundamental reason for the intrinsic tradeoff of resolution versus variance for inverse models is embedded in the inevitable tradeoff of spatial versus spectral localizations of resolution kernels.

[4] The most fundamental mathematical concern of a finite approximation to a continuous function is the truncation error. For a chosen basis, say $\beta_i(\mathbf{r})$, $i = 1, \ldots, M$, the continuous model variation, $m(\mathbf{r})$, is expanded such that $\| m(\mathbf{r}) - \sum_{i=1}^{M} a_i \beta_i(\mathbf{r}) \| \le \epsilon$, where the $a_i$ are the finite set of coefficients and $\epsilon$ is the acceptable tolerance, which converges to zero as $M$ increases toward infinity. In a continuous inverse problem, since we will never be able to know exactly the “truth,” $m(\mathbf{r})$, it is difficult to validate whether our finite approximation, that is, both the selected basis and the truncation level, $\beta_i(\mathbf{r})$, $i = 1, \ldots, M$, is in fact appropriate. Trampert and Snieder [1996] point out that the arbitrary truncation of high degree spherical harmonics (SpH) invoked in the parameterization of global inverse problems leads to spectral leakage. It is worth pointing out here that whereas model aliasing is caused by the lack of adequate sampling of the model structure provided by the given data constraints, spectral leakage arises from inadequate truncation of the fine scale basis functions. Chiao and Kuo [2001] derive the formal relation of the actual resolution operator of the truncated model with respect to the original continuous Earth structure. To compromise between the intrinsic tradeoff of the spatial versus spectral localizations, they also devise the multiresolution wavelet parameterization [Chiao and Kuo, 2001; Chiao and Liang, 2003; Chiao et al., 2006; Gung et al., 2009]. Although it is difficult to justify the appropriateness of a particular choice of finite parameterization scheme, it is straightforward to convert inverse models among different parameterization domains along with conversions of the important model resolution and covariance matrices. These conversions are capable of revealing the potential inconsistency that may arise from a particular parameterization or regularization scheme.

2. Mapping Between Distinctly Parameterized Spaces

[5] Geophysical inverse problems are usually formulated such that the ith observational data, di, is governed by a linearized data rule as an inner product between the model function, m(r), and the Frechet derivative of the data functional, or simply, the data kernel, gi(r), [e.g., Parker, 1977, 1994],

$$d_i = \int g_i(\mathbf{r})\, m(\mathbf{r})\, d\mathbf{r}, \tag{1}$$

denoting the spatial variation by the function of the position vector, r.

[6] Adopting a certain kind of truncated, and thus finite, basis function, $\beta_i^0(\mathbf{r})$, $i = 1, \ldots, M$, the discrete model vector, $\mathbf{m}^0 = [m_1^0, \ldots, m_M^0]^T$, is established in the hope that the expansion, $\sum_{i=1}^{M} m_i^0 \beta_i^0(\mathbf{r}) \cong m(\mathbf{r})$, fairly approximates the continuous Earth model. The forward data rule, (1), governing the theoretical prediction of the N observed data, $\mathbf{d}$, is transformed by the discretization into

$$\mathbf{d} = \mathbf{G}^0 \mathbf{m}^0. \tag{2}$$

The element of the N × M G-matrix, Gij0, now the discrete Frechet derivative of the forward data rule with respect to the model vector, m, is

$$G_{ij}^0 = \int g_i(\mathbf{r})\, \beta_j^0(\mathbf{r})\, d\mathbf{r}, \tag{3}$$

in other words, the continuous data kernels, gi, projected onto the same chosen basis invoked in the model expansion. Clearly, the particular choice of having the set of data kernels themselves as the chosen basis functions, that is, βi0 = gi, i = 1, …, N, in representing the Earth model, will in fact lead naturally to the B-G implementation [Parker, 1977]. Notice that this implies that if the chosen basis consists in a pair of bi-orthonormal bases, then the model function, m(r), and the data kernels, gi(r), would each be expanded in terms of the bi-orthogonal primary and dual basis separately to follow the convention of the operation of the inner product. We will refer to the arbitrarily chosen parameterization in equation (2) as the reference parameterization. For example, the chosen basis functions, βi0, may well be the spatially localized spline or boxcar functions. And the model representation, m0, may then be the usual discrete pixels, voxels, or nodes representation of the Earth structure on a regular or irregular mesh covering the region of interest.
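As a minimal sketch of the projection described above, the following Python snippet builds a small G-matrix by integrating data kernels against boxcar basis functions in one dimension. The kernels `exp(-(i + 1) x)`, the unit interval, the cell counts, and the midpoint quadrature are all illustrative assumptions and are not taken from this study.

```python
import numpy as np

M = 8    # number of boxcar basis functions (hypothetical)
N = 3    # number of data kernels (hypothetical)
Q = 50   # midpoint quadrature points per boxcar cell

# Midpoint quadrature nodes on [0, 1], ordered cell by cell
dx = 1.0 / (M * Q)
x = (np.arange(M * Q) + 0.5) * dx

# Hypothetical smooth data kernels g_i(x) = exp(-(i + 1) x), shape (N, M*Q)
kernels = np.array([np.exp(-(i + 1) * x) for i in range(N)])

# G0[i, j] ~ integral of g_i(x) * beta_j(x) dx, with beta_j = 1 inside cell j
G0 = kernels.reshape(N, M, Q).sum(axis=2) * dx

# A piecewise-constant model is represented exactly by the boxcar basis,
# so G0 @ m0 should match direct quadrature of the continuous data rule.
m0 = np.linspace(1.0, 2.0, M)
m_of_x = np.repeat(m0, Q)
d_discrete = G0 @ m0
d_continuous = (kernels * m_of_x).sum(axis=1) * dx
```

For a model that is not piecewise constant, the two predictions would differ by the truncation error of the chosen basis.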

[7] An alternate parameterization and approximation, $\mathbf{m}' = [m'_1, \ldots, m'_i, \ldots]^T$, with $\sum_i m'_i \beta'_i(\mathbf{r}) \cong m(\mathbf{r})$, is related to the reference representation, $\mathbf{m}^0$, by the transforming operator, or simply the Jacobian matrix, $\mathbf{W} = \partial \mathbf{m}' / \partial \mathbf{m}^0$, such that

$$\mathbf{m}' = \mathbf{W} \mathbf{m}^0. \tag{4}$$

For example, if the original basis set, $\beta_i^0$, is an orthonormal set, then the content of the transforming operator, $\mathbf{W}$, is defined by the projection of the new primed basis functions onto the original basis functions, $W_{ij} = \int \beta'_i(\mathbf{r})\, \beta_j^0(\mathbf{r})\, d\mathbf{r}$. Assuming the existence of the inverse operator, $\mathbf{W}^{-1} = \partial \mathbf{m}^0 / \partial \mathbf{m}'$, then for any row vector, $\mathbf{g}_i^T$, of the matrix $\mathbf{G}^0$, with $(\mathbf{G}^0)^T = [\mathbf{g}_1, \ldots, \mathbf{g}_N]$, the inner product with the model vector, $\mathbf{m}^0$, is

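$$\mathbf{g}_i^T \mathbf{m}^0 \cong \mathbf{g}_i^T \mathbf{W}^{-1} \mathbf{W} \mathbf{m}^0 = \left(\mathbf{g}_i^T \mathbf{W}^{-1}\right) \mathbf{m}' = \left[(\mathbf{W}^{-1})^T \mathbf{g}_i\right]^T \mathbf{m}'. \tag{5}$$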

The first approximation sign, ≅, in equation (5) simply indicates that the inverse operator might not exactly restore the effect of the forward transformation operator. That is, to validate $\mathbf{W}^{-1}\mathbf{W} = \mathbf{I}$, we have to take special care to ensure that consistent d.o.f. are invoked in both parameterizations. The discrete wavelet transform (DWT), which preserves the exact d.o.f. for both forward and inverse transformations, is one such example. Notice that the adjoint operator in equation (5) is defined by $(\mathbf{W}^{-1})^* = (\mathbf{W}^{-1})^T$, and that the operation $\mathbf{g}_i^T \mathbf{W}^{-1}$ is in fact the transpose of the operation $(\mathbf{W}^{-1})^T \mathbf{g}_i$, that is, $(\mathbf{W}^{-1})^T$ operating on the column vector $\mathbf{g}_i$, or on the rows of the G-matrix, $\mathbf{G}^0$. Taking advantage of equation (5), we can rewrite equation (2) as

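$$\mathbf{d} = \mathbf{G}^0 \mathbf{m}^0 \cong \mathbf{G}^0 \mathbf{W}^{-1} \mathbf{W} \mathbf{m}^0 = \mathbf{G}' \mathbf{m}', \qquad \mathbf{G}' = \mathbf{G}^0 \mathbf{W}^{-1}. \tag{6}$$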

It may seem trivial to reach the transformation in equation (6) directly, without the arguments leading up to equation (5). However, the extra effort clarifies the conversion, $\mathbf{G}' = \mathbf{G}^0 \mathbf{W}^{-1}$, and especially its actual implementation, in the framework of the theory of the adjoint operator. That is, the essence of the transformation, $\mathbf{G}' = \mathbf{G}^0 \mathbf{W}^{-1}$, is simply the operator $(\mathbf{W}^{-1})^T$ operating on each row vector of $\mathbf{G}^0$. This is very useful for practical implementations of bi-orthogonal expansions such as the wavelet transform, or of smoothing regularizations. All transformations are denoted here in the form of a matrix operator in the standard linear algebraic style, and most of these linear transformations can be written in matrix form. In practice, the transformations are usually implemented numerically by taking advantage of existing fast transform schemes such as the fast Fourier transform [Cooley and Tukey, 1965], the fast wavelet transform [Mallat, 1989], or the lifting scheme [Sweldens, 1996; 1997]. Furthermore, the transformation in equation (6) is not restricted to the inverse problems emphasized in this study. It is straightforward to envision the matrix equation (2) as, for example, a finite difference implementation of the forward numerical calculation of a certain set of governing equations. The transformation in equation (6) then provides a way of directly converting the finite difference formulation into, say, a multiresolution representation in the wavelet domain that might provide alternative numerical advantages and physical insights concerning the forward modeling.
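The row-by-row conversion described above can be sketched numerically. The following Python snippet is only an illustration under stated assumptions: the matrix sizes, the random stand-in values for the G-matrix and model, and the arbitrary invertible matrix used as W are hypothetical and not taken from this study.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M = 5, 8                                   # hypothetical: 5 data, 8 parameters
G0 = rng.normal(size=(N, M))                  # stand-in reference G-matrix
m0 = rng.normal(size=M)                       # stand-in reference model vector
W = rng.normal(size=(M, M)) + M * np.eye(M)   # arbitrary invertible transform
W_inv = np.linalg.inv(W)

# Transformed model vector, m' = W m0
m_prime = W @ m0

# Apply (W^-1)^T to each row of G0, one row at a time
G_prime = np.vstack([W_inv.T @ g_i for g_i in G0])

# The row-wise result is the same as the matrix product G0 W^-1,
# and the predicted data are unchanged by the change of representation.
same_matrix = np.allclose(G_prime, G0 @ W_inv)
same_data = np.allclose(G_prime @ m_prime, G0 @ m0)
```

In a practical setting the explicit matrix `W_inv` would typically be replaced by a fast transform applied to each row, as the text notes.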

[8] An example of such a transformation is the bi-orthonormal wavelet transform (BWT) [e.g., Cohen et al., 1992] as demonstrated in the multiscale inversion [Chiao and Kuo, 2001; Chiao and Liang, 2003; Chiao et al., 2006; Gung et al., 2009]. In this case, the forward transformation, m′ = Wm0, is the forward wavelet transform (FWT) of the reference model vector, m0, expanded in terms of the primary wavelet basis; the inverse transform, m0 = W−1m′, is naturally the inverse wavelet transform (IWT) to recover from the same primary basis expansion. The operation G0W−1, on the other hand, is simply the FWT of each row vector of G0 in terms of the dual wavelet basis. Alternatively, the pair, m′ = Wm0 and m0 = W−1m′, could be the common forward Fourier transform (FFT) and the inverse Fourier transform (IFT), or the forward decomposition and the inverse synthesis invoking the SpH basis. Based on the orthonormality of the invoked bases and the generalized Parseval's theorem, the operation G0W−1 in these cases corresponds to the same forward transform applied to each row of the reference matrix, G0.
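For an orthonormal basis the statement above can be checked directly, since W−1 = WT. The sketch below uses an orthonormal Haar matrix as a stand-in; this is a simplifying assumption, because the study uses bi-orthogonal wavelets for which the primary and dual bases differ. The helper `haar_matrix` and all sizes and values are hypothetical.

```python
import numpy as np

def haar_matrix(n):
    """Orthonormal Haar analysis matrix for n a power of two."""
    if n == 1:
        return np.array([[1.0]])
    h = haar_matrix(n // 2)
    top = np.kron(h, [1.0, 1.0])                     # coarser-scale rows
    bottom = np.kron(np.eye(n // 2), [1.0, -1.0])    # finest-scale differences
    return np.vstack([top, bottom]) / np.sqrt(2.0)

M = 8
W = haar_matrix(M)                    # forward transform, m' = W m0
rng = np.random.default_rng(3)
G0 = rng.normal(size=(4, M))          # stand-in reference G-matrix
m0 = rng.normal(size=M)               # stand-in reference model

# Conversion G' = G0 W^-1 computed with an explicit inverse
G_prime = G0 @ np.linalg.inv(W)

# Same result from forward-transforming each row of G0
rows_transformed = np.vstack([W @ g_i for g_i in G0])
```

With a bi-orthogonal pair, the row transform would instead use the dual basis, as described in the text.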

[9] The approach extends beyond switching representation bases. A roughening operator, W, can be chosen such that the model vector, m0, is convolved with a roughening filter, s, that emphasizes the short wavelength components. The inverse operator, W−1, then represents the smoothing operation that convolves the roughened model, m′, with a smoothing filter, s−1, assuming that it is adequately defined to undo the effect of the roughening, W−1W = I. That is, if we denote the convolution operator by * and define the following operation,

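$$\mathbf{m}' = \mathbf{W}\mathbf{m}^0 \equiv s * m^0, \qquad \mathbf{m}^0 = \mathbf{W}^{-1}\mathbf{m}' \equiv s^{-1} * m', \tag{7}$$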

then the inner product between the data kernel, $g_i(\mathbf{r})$, and the model function, $m^0(\mathbf{r})$, becomes

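$$\int g_i(\mathbf{r})\, m^0(\mathbf{r})\, d\mathbf{r} = \int g_i(\mathbf{r}) \left[ \int s^{-1}(\mathbf{r} - \mathbf{r}')\, m'(\mathbf{r}')\, d\mathbf{r}' \right] d\mathbf{r} = \int \left[ \int s^{-1}(\mathbf{r} - \mathbf{r}')\, g_i(\mathbf{r})\, d\mathbf{r} \right] m'(\mathbf{r}')\, d\mathbf{r}'. \tag{8}$$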

This suggests that we can choose a symmetric filter such that $s^{-1}(\mathbf{r}) = s^{-1}(-\mathbf{r})$, and thus $(\mathbf{W}^{-1})^T = \mathbf{W}^{-1}$. Then, whereas $\mathbf{m}' = \mathbf{W}\mathbf{m}^0$ represents roughening the reference model, $\mathbf{G}' = \mathbf{G}^0\mathbf{W}^{-1}$ is the smoothing of each row vector of the G-matrix. This scenario is useful for comparing inverse models obtained through different regularization schemes, which will be discussed later.
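The role of a symmetric smoothing filter can be illustrated with a small one-dimensional sketch. Everything specific here is an assumption made for illustration: a periodic grid of 16 nodes, a sampled Gaussian with a standard deviation of one grid interval as the smoother, and random stand-in values for the G-matrix and model.

```python
import numpy as np

n = 16
k = np.arange(n)
dist = np.minimum(k, n - k)                 # periodic distance from node 0
kernel = np.exp(-0.5 * dist ** 2)           # Gaussian, sigma = 1 grid interval
kernel /= kernel.sum()

# Circulant smoothing matrix, S[i, j] = kernel[(i - j) mod n]; plays the role of W^-1
S = kernel[(k[:, None] - k[None, :]) % n]
W = np.linalg.inv(S)                        # implied roughening operator

rng = np.random.default_rng(4)
G0 = rng.normal(size=(3, n))                # stand-in reference G-matrix
m0 = rng.normal(size=n)                     # stand-in reference model

G_prime = G0 @ S                            # G' = G0 W^-1
m_prime = W @ m0                            # m' = W m0 (roughened model)

def circ_conv(a, b):
    """Circular convolution of two real sequences via the FFT."""
    return np.real(np.fft.ifft(np.fft.fft(a) * np.fft.fft(b)))

# Because the filter is symmetric, each row of G' is that row of G0 smoothed
rows_smoothed = np.vstack([circ_conv(g_i, kernel) for g_i in G0])
```

Note that only the smoother `S` is needed to form `G_prime`; the roughener `W` is computed here solely to confirm that the predicted data are preserved.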

[10] It is straightforward to show that an estimate of the model vector, $\hat{\mathbf{m}}^0_{\mathrm{direct}}$, can be constructed via a generalized inverse operator [e.g., Lawson and Hanson, 1974], $(\mathbf{G}^0)^{\dagger}$. It operates on the observation, $\mathbf{d}$, that is generated from the actual Earth structure:

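$$\hat{\mathbf{m}}^0_{\mathrm{direct}} = (\mathbf{G}^0)^{\dagger} \mathbf{d} = (\mathbf{G}^0)^{\dagger} \mathbf{G}^0 \mathbf{m}^0 = \mathbf{R}^0_{\mathrm{direct}} \mathbf{m}^0, \tag{9}$$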

where $\mathbf{R}^0_{\mathrm{direct}}$ is the resolution matrix for the particular reference parameterization. Similarly, we would have the model estimate, $\hat{\mathbf{m}}'_{\mathrm{direct}}$, for the alternate parameterization defined in equation (6),

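$$\hat{\mathbf{m}}'_{\mathrm{direct}} = (\mathbf{G}')^{\dagger} \mathbf{d} = (\mathbf{G}')^{\dagger} \mathbf{G}' \mathbf{m}' = \mathbf{R}'_{\mathrm{direct}} \mathbf{m}', \tag{10}$$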

in the transformed space. Now the model estimate in the original space can be recovered from the solution obtained from the transformed space as

$$\hat{\mathbf{m}}^0_{\mathrm{indirect}} = \mathbf{W}^{-1} \hat{\mathbf{m}}'_{\mathrm{direct}}. \tag{11}$$

That is, the directly obtained solution, $\hat{\mathbf{m}}^0_{\mathrm{direct}}$, and the indirect solution, $\hat{\mathbf{m}}^0_{\mathrm{indirect}}$, converted from the alternate estimate, $\hat{\mathbf{m}}'_{\mathrm{direct}}$, that has been built with a different parameterization, can be compared on the same ground. The interesting question now is how we should relate the model resolution matrices, $\mathbf{R}^0$ and $\mathbf{R}'$, as well as the model covariance matrices in the two different spaces. Substituting equations (4) and (10) into equation (11), we have

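$$\hat{\mathbf{m}}^0_{\mathrm{indirect}} = \mathbf{W}^{-1} \mathbf{R}'_{\mathrm{direct}} \mathbf{m}' = \mathbf{W}^{-1} \mathbf{R}'_{\mathrm{direct}} \mathbf{W} \mathbf{m}^0, \tag{12}$$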

that is, the resolution matrix obtained with the transformed parameterization can thus be easily converted into the original model space,

$$\mathbf{R}^0_{\mathrm{indirect}} = \mathbf{W}^{-1} \mathbf{R}'_{\mathrm{direct}} \mathbf{W}, \tag{13}$$

and similarly,

$$\mathbf{R}'_{\mathrm{indirect}} = \mathbf{W} \mathbf{R}^0_{\mathrm{direct}} \mathbf{W}^{-1}. \tag{14}$$

For the model covariance matrix, since

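$$\hat{\mathbf{m}}^0_{\mathrm{indirect}} \left(\hat{\mathbf{m}}^0_{\mathrm{indirect}}\right)^T = \mathbf{W}^{-1}\, \hat{\mathbf{m}}'_{\mathrm{direct}} \left(\hat{\mathbf{m}}'_{\mathrm{direct}}\right)^T (\mathbf{W}^{-1})^T, \tag{15}$$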

we thus obtain

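$$\mathbf{C}^0_{\mathrm{indirect}} = \mathbf{W}^{-1} \mathbf{C}'_{\mathrm{direct}} (\mathbf{W}^{-1})^T. \tag{16}$$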

And similarly, we obtain

$$\mathbf{C}'_{\mathrm{indirect}} = \mathbf{W} \mathbf{C}^0_{\mathrm{direct}} \mathbf{W}^T. \tag{17}$$

Equation (17) has been previously derived in the context of checking the consistency problem when different weightings in the model and data space lead to the change of metrics in each of their own space [Snieder and Trampert, 1999]. Appraisal of the model covariance across inverse models through distinctive parameterizations and regularizations in general has however not been given much attention. Furthermore, cross assessment of the different resolution operators can also reveal the nature of most of the resolution problems encountered in inverse models, such as artifacts appearing in the poorly constrained region [Stark and Hengartner, 1993], the ringing structure due to fitting spatially localized resolution kernel with insufficient density of model sensitivity offered from local data constraints, and the spectral leakage effect.
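The conversions of the estimate, resolution, and covariance can be sketched numerically. The snippet below is a hedged illustration, not the study's implementation: it assumes the conversions take the forms R0 = W−1R′W and C0 = W−1C′(W−1)T, uses a damped least squares operator as the generalized inverse, assumes uncorrelated data noise of variance `sigma2`, and uses hypothetical sizes and random stand-in values throughout.

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 6, 10                                  # hypothetical underdetermined problem
G0 = rng.normal(size=(N, M))                  # stand-in reference G-matrix
m0 = rng.normal(size=M)                       # stand-in for the "true" model
d = G0 @ m0                                   # noise-free data
W = rng.normal(size=(M, M)) + M * np.eye(M)   # arbitrary invertible transform
W_inv = np.linalg.inv(W)
theta2 = 0.1                                  # damping, chosen arbitrarily
sigma2 = 0.04                                 # assumed data-noise variance

def dls_inverse(G, theta2):
    """Damped least squares generalized inverse (G^T G + theta2 I)^-1 G^T."""
    return np.linalg.solve(G.T @ G + theta2 * np.eye(G.shape[1]), G.T)

# Direct solution, resolution, and covariance in the primed space
G_p = G0 @ W_inv
Gp_dag = dls_inverse(G_p, theta2)
m_p_hat = Gp_dag @ d
R_p = Gp_dag @ G_p
C_p = sigma2 * Gp_dag @ Gp_dag.T

# Conversions back to the reference space
m0_indirect = W_inv @ m_p_hat
R0_indirect = W_inv @ R_p @ W
C0_indirect = W_inv @ C_p @ W_inv.T

# Direct resolution in the reference space, for comparison
R0_direct = dls_inverse(G0, theta2) @ G0
```

Because this W is not orthogonal, minimum norm damping in the primed space is not equivalent to minimum norm damping in the reference space, so `R0_indirect` and `R0_direct` differ; that difference is what the cross assessment is meant to expose.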

[11] We can summarize the previous transformation rules in more general forms including the scenario for the case of data reweighting (Snieder and Trampert, 1999; see also special case discussed by Jackson [1972]). That is,

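$$\begin{aligned}
\mathbf{d}' &= \mathbf{H}\mathbf{d}, \qquad \mathbf{m}' = \mathbf{W}\mathbf{m}^0, \qquad \mathbf{G}' = \mathbf{H}\mathbf{G}^0\mathbf{W}^{-1}, \\
\hat{\mathbf{m}}^0_{\mathrm{indirect}} &= \mathbf{W}^{-1}\hat{\mathbf{m}}'_{\mathrm{direct}}, \\
\mathbf{R}^0_{\mathrm{indirect}} &= \mathbf{W}^{-1}\mathbf{R}'_{\mathrm{direct}}\mathbf{W}, \qquad \mathbf{R}'_{\mathrm{indirect}} = \mathbf{W}\mathbf{R}^0_{\mathrm{direct}}\mathbf{W}^{-1}, \\
\mathbf{C}^0_{\mathrm{indirect}} &= \mathbf{W}^{-1}\mathbf{C}'_{\mathrm{direct}}(\mathbf{W}^{-1})^T, \qquad \mathbf{C}'_{\mathrm{indirect}} = \mathbf{W}\mathbf{C}^0_{\mathrm{direct}}\mathbf{W}^T.
\end{aligned} \tag{18}$$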

The data transformation operator, H, might be chosen according to some quality factor, such as the reciprocals of the standard deviations if the corresponding data statistics are available. Note again the exact symmetry of the transformation of the covariance matrix as opposed to the conditional symmetry of the transformation of the resolution matrix resulting from the nature of the (bi-)orthogonality of the invoked basis functions, for example in the multiresolution representation utilizing the wavelet bases. The symmetry of the smoothing operator invoked in the smoothing regularization in equation (8), determines, on the other hand, the symmetry of the transformation of resolution matrix.

3. Potential Applications of Transformation Across Distinct Representations

3.1. Flexibility in Implementing Forward and Inverse Computations

[12] Both local bases, such as spherical blocks [e.g., Obayashi and Fukao, 1997; Bijwaard et al., 1998; Boschi and Dziewonski, 2000; Gualtiero and Vesnaver, 1999; Kárason and van der Hilst, 2001; Zhou, 1996] or spherical splines [Wang and Dahlen, 1995; Gu et al., 2001; Panning and Romanowicz, 2006], and global bases such as SpH [e.g., Woodhouse and Dziewonski, 1984; Li and Romanowicz, 1996; Ritsema et al., 1999; Mégnin and Romanowicz, 2000] are common in global seismic tomography [Romanowicz, 2003]. One of the advantages of invoking SpH in the lateral parameterization is that the approach fits naturally into spherical coordinates and usually provides straightforward numerical computation, such as for normal mode summation. SpH is, however, not very efficient for parameterizing inverse models for regional high-resolution studies. The conversion in equation (18) serves as an easy transformation such that forward kernels evaluated in the global spectral space can still be used while the inversion is carried out in terms of alternate local bases or even multiresolution bases [Chiao and Kuo, 2001; Chiao et al., 2006; Gung et al., 2009]. This is done by transforming the G-matrix row by row as in equations (5) and (6) instead of rederiving the data rule from scratch in the alternate domain. This makes it possible to take advantage of the particular merits of distinct parameterizations in seeking the optimum set of solutions, in the sense that multiple perspectives of the Earth structure might be insightful. One such practice, invoking SpH-parameterized kernels to invert surface waveform data for three-dimensional mantle velocity structure in the Pacific region using the multiscale wavelet representation, has been carried out recently [Gung et al., 2009]. This method indicates considerable improvements in local spatial resolution.

[13] Along with the dichotomy between global basis SpH and local basis such as spherical pixels in global tomography during the past two decades [e.g., Zhou, 1996; Boschi and Dziewonski, 1999; Chiao and Kuo, 2001], one interesting development has been the efforts to take advantage of irregularly spaced blocks [e.g., Bijwaard et al., 1998]. With this approach, areas with dense rays, or finite kernels, coverage are usually gridded finer than other places within the Earth where data constraints are sparse. This approach is sometimes also referred to as “multiscale inversion.” We suggest that this practice should probably be categorized as “spatially variable scale.” The main difference is that at any given site, there is still only one intrinsically fixed scale implemented in the inversion scheme. This is different from the inversion carried out in terms of the multiresolution wavelet representation because there is a built-in scale hierarchy in the wavelet representation such that the robustly resolvable finest scale at any site is automatically adapted to the local constraints available. In fact, a first attempt which was close to the multiscale hierarchical parameterization approach was to overlay different grids with gradually finer spacing in the same study area [Zhou, 1996]. However, unlike the modern wavelet bases, this approach is based on a redundant and non-orthogonal basis set.

3.2. Comparative Assessment Among Inverse Models via Different Parameterizations

[14] Besides converting distinctively parameterized forward kernels, it is also straightforward to invoke equation (18) to convert among inverse models constructed through different parameterizations, along with their corresponding resolution operators and covariance matrices. It used to be considered impractical to evaluate the complete resolution and model covariance matrices for sizable inverse problems. Resolution appraisals are sometimes improvised by the checkerboard test, although its validity has been questioned [e.g., Lévêque et al., 1993]. Recent efforts on better numerical approximations of the singular value decomposition of sizable matrices, such as PROPACK [e.g., Larsen, 1998] and others, may eventually enable progress toward the full assessment of inverse models by actually examining the resolution and model covariance matrices. It is worth emphasizing that distinctively parameterized inverse models have completely distinct resolution operators and model covariance matrices. The conventionally emphasized spatial resolution, examined by the degree of spatial localization of the resolving kernel, is fully assessable from the characteristics of the model resolution operator for a finite parameterization based on local bases. However, it is fairly straightforward to convert between the spatial resolution operator and the spectral (Fourier or SpH) or wavelet multiresolution operators using equation (18). This would usually reveal that inverse models parameterized locally may exhibit poor spectral resolution, whereas globally parameterized models tend to behave the other way around [Chiao and Kuo, 2001]. 
The nonstationary, nonuniform, multiscale resolution might be a useful compromise [Chiao and Kuo, 2001; Chiao and Liang, 2003; Chiao et al., 2006], given that the model information sampled by any particular set of geophysical observational constraints is usually sparse and nonuniformly distributed. There are various measures of resolution depending on the chosen model parameter set, and they should be cross examined to develop a multiperspective view of how the resolved inverse model actually relates to the true Earth structure.

3.3. Regularization and Preconditioning

[15] Equation (2) is usually highly underdetermined and is most commonly solved via DLS to get a model estimate, $\hat{\mathbf{m}}$, that minimizes the data misfit while penalizing certain model characteristics that might not be reliably constrained. For example, D could be chosen as a roughening operator such that minimizing ∥Dm∥ results in solutions favoring long-wavelength components. That is, the cost function to be minimized is

$$\| \mathbf{d} - \mathbf{G}\mathbf{m} \|^2 + \theta^2 \| \mathbf{D}\mathbf{m} \|^2. \tag{19}$$

By choosing a suitable damping parameter, θ2, via tradeoff analysis, the DLS solution is

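$$\hat{\mathbf{m}} = \left( \mathbf{G}^T \mathbf{G} + \theta^2 \mathbf{D}^T \mathbf{D} \right)^{-1} \mathbf{G}^T \mathbf{d}. \tag{20}$$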

By setting D = I, we would invoke the minimum norm regularization scheme that exactly rejects those relatively poorly constrained null-space model components. However, the minimum norm solution is known to usually lead to especially disjointed results, whereas roughness penalization has become a common practice to retain smooth and well-constrained model components [e.g., Constable et al., 1987; Smith and Wessel, 1990; Ory and Pratt, 1995; Gouveia and Scales, 1997]. An alternate implementation of equation (20) has been discussed in section 2, assuming that the selected roughening operator is defined such that its inverse operator, a smoothing operator, D−1, is also well defined. For example, instead of invoking the popular numerical implementation of the Laplacian operator as the roughener, the smoothing-roughening operators can be defined by taking advantage of a Gaussian-type filter. Taking into account the fact that the data might also have been reweighted by their quality factor, equation (19) can be rewritten as

$$\| \mathbf{d}' - \mathbf{G}'\mathbf{m}' \|^2 + \theta^2 \| \mathbf{m}' \|^2, \tag{21}$$

where m′ = Dm; d′ = Hd; G′ = HGD−1. Note that besides the additional data weighting introduced by the operator H, the role of the roughener D is analogous to the transforming operator W in the general context of equation (18) as discussed in equations (7) and (8), so that what has been derived in equation (18) can be easily applied in comparing equations (19) and (21). Furthermore, it is worth mentioning that equation (21) is of the form of the minimum norm regularization in the transformed primed system. Accordingly, the solution is now

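$$\hat{\mathbf{m}} = \mathbf{D}^{-1} \hat{\mathbf{m}}' = \mathbf{D}^{-1} \left( \mathbf{G}'^T \mathbf{G}' + \theta^2 \mathbf{I} \right)^{-1} \mathbf{G}'^T \mathbf{d}'. \tag{22}$$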

The roughness penalizing regularization is now implemented as the minimum norm regularization of the transformed system that preconditions the original G-matrix as G′ = HGD−1. The advantage is that the minimum norm solution has the precise interpretation of rejecting null components that are irrelevant in explaining the data. It manifests the least a posteriori model variance, in the transformed space in this case. Furthermore, invoking the smoothing preconditioner rather than the roughness penalization has significant merit in numerical efficiency [VanDeCar and Snieder, 1994; Claerbout, 1998]. By comparing equation (22) to equation (20), it is clear that the enforced smoothness is the direct manifestation of the smoothing operator, D−1, rather than the indirect result of penalizing the roughness. It is also interesting to mention that although the existence of the inverse operator, D−1, is not guaranteed for all implementations of D, the preconditioner for the popular Laplacian operator has been derived by Claerbout [1998]. In fact, if we replace the popular finite difference implementations by spectral ones, then it is straightforward to construct the corresponding inverse operator. Furthermore, notice that in the actual implementation of equation (22), there is really no need to invoke D. This yields much more flexibility to choose an adequate smoothing operator rather than the inverse operator of the popular derivative-type roughening operator. In fact, Meyerholtz et al. [1989] have discussed the convolutional quelling scheme, in which the effect of smoothing is achieved by directly convolving with a function of narrow spectral band rather than by invoking a roughening regularizer. 
It might be argued that the popular roughness penalizing scheme that invokes the criterion of “minimum curvature” does not require the extra effort of determining an a priori correlation scale length, as convolution with a specific function does, say the one standard deviation length of a Gaussian function. The fact is that convolution with a Gaussian function and roughness penalization through minimum curvature are both fixed-scale regularizers, unless they are implemented with the multigrid technique [Smith and Wessel, 1990]. The only difference is that the convolution smoothing spells out the assumed correlation length explicitly.
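The relation between roughness penalization and the smoothing preconditioner can be checked on a toy problem. The sketch below assumes the roughness-penalized solution takes the standard form (GTG + θ2DTD)−1GTd and the preconditioned solution the form D−1(G′TG′ + θ2I)−1G′Td with G′ = GD−1, sets H = I for simplicity, and uses a hypothetical invertible roughener (identity plus a negative second difference) with random stand-in values.

```python
import numpy as np

rng = np.random.default_rng(2)
N, M = 6, 12                          # hypothetical underdetermined problem
G = rng.normal(size=(N, M))           # stand-in G-matrix
d = rng.normal(size=N)                # stand-in data
theta2 = 0.5                          # damping, chosen arbitrarily

# Hypothetical invertible roughener: identity plus negative second difference
E = np.eye(M, k=1)
D = 3.0 * np.eye(M) - E - E.T
D_inv = np.linalg.inv(D)              # corresponding smoothing operator

# Roughness penalization applied in the original space
m_rough = np.linalg.solve(G.T @ G + theta2 * D.T @ D, G.T @ d)

# Minimum norm damping of the preconditioned system, then smoothing back
G_p = G @ D_inv
m_p = np.linalg.solve(G_p.T @ G_p + theta2 * np.eye(M), G_p.T @ d)
m_smooth = D_inv @ m_p

# The two routes give the same model when D is invertible
max_difference = np.max(np.abs(m_rough - m_smooth))
```

Only `D_inv` enters the second route, which is consistent with the remark that the roughener itself need not be invoked once a smoother has been chosen.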

[16] On the other hand, the multiresolution wavelet transform has been used [Chiao and Kuo, 2001; Chiao and Liang, 2003; Chiao et al., 2006; Gung et al., 2009] such that the regularization acts on the wavelet coefficients rather than the spatial variations directly. This would lead to filtering through the local-scale hierarchy starting from the robustly constrained long-wavelength components until a certain fine-scale level where the local data constraints accumulated within that wavelength become inadequate to have the corresponding model components robustly resolved. More importantly, this approach adapts automatically to the nonuniform distribution of data constraints by virtue of wavelet bases' capability of compromising between the spatial and the spectral localizations. In other words, it adapts to the nonstationary correlation of model constraints, not the real model correlation, rather than performing fixed-scale smoothing.

[17] Notice again that the distinctive regularization strategies posed in the form of equation (22) with different smoothing or transformation operators can still be implemented in the context of distinct parameterizations that invoke equation (18). If we let the strict minimum norm solution, that is, having D = I, be the reference system, $\mathbf{m}^0$, then all the inverse models built using other regularization schemes, manifested through distinct smoothing operators, D−1, can in fact be compared and appraised on a fair ground. That is, all the directly inverted models, $\hat{\mathbf{m}}'_{\mathrm{direct}}$, are brought back to the corresponding converted models, $\hat{\mathbf{m}}^0_{\mathrm{indirect}}$, along with their own resolution and model covariance matrices in the same space, through the conversion of equation (18).

4. An Example

[18] A simple example from a pilot study of ambient noise tomography for the northeast coast and offshore area of Taiwan is shown here to demonstrate the cross assessment of the different regularization schemes described above, in the context of the general conversions in equation (18) among different representations of inverse models. This area (Figure 1) is rich in slab and crustal seismicity. Although there is a dense seismic array on land in Taiwan, there are only two Japanese F-net stations located at the western tip of the Ryukyu arc. The parallel topology of the stations, coincident with the slab seismicity, makes it very difficult to explore the interesting western end of the mantle wedge of the Ryukyu subduction system. With the recent deployment of three broadband ocean bottom seismometers (OBSs in Figure 1), we take advantage of the ambient noise technique [e.g., Weaver and Lobkis, 2001; Campillo and Paul, 2003; Lin et al., 2007], which is unique in being independent of the local seismicity. That is, as long as the deployment of local stations, especially OBSs, is carefully designed to provide adequate coverage and density, a feasible resolution of the subsurface structure can be achieved. This is very important in this area, with its confusing geochemical signatures [e.g., Wang et al., 2002, 2004] and rather complicated tectonics consisting of the intertwining Okinawa trough, the Eurasian continental margin, and the Ryukyu arc.

Figure 1.

Estimates of surface wave group velocity at a period of 20 s, averaged along the paths between receiver pairs. They are obtained by stacking yearlong cross correlations of ambient noise, from which the EGF emerges, among land stations in Taiwan, two Japanese F-net stations, and three broadband OBSs. With fewer than 90 paths, the coverage is clearly far from dense enough for an adequate tomographic study. However, this pilot experiment is used to examine the effects of different regularization schemes by invoking transformation (18). Notice the general pattern already revealed in this data set: faster average speed for paths through the aged Philippine Sea plate (southwest ray bundle), generally slow to medium speed along paths mainly through the rifted continental margin on both sides of the extending Okinawa Trough, and a few faster rays within the trough.

[19] Briefly, travel times between specific station pairs are measured for signals in a specific band, 0.05–1 Hz, that is characteristic of this area [Liang et al., 2008]. Cross correlations of continuous noise records are stacked so that the empirical Green's function (EGF) of the surface wave propagation emerges [Shapiro et al., 2005]. Average group velocities at different periods between station pairs, assuming simple great circle paths, are then extracted via dispersion analyses and prepared for the regional surface wave tomography. Although the reliable path data (Figure 1) are far too few for a serious regional study, we use them to demonstrate the points discussed in this study.
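
The path-averaged group velocity along a great circle path can be illustrated with a short sketch. It only shows the final division of interstation distance by travel time, not the dispersion analysis itself. The station coordinates, the travel time, the mean Earth radius, and the function names are assumptions made for illustration and are not taken from the data set of this study.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great circle distance in km; inputs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def average_group_velocity(sta_a, sta_b, travel_time_s):
    """Path-averaged group velocity (km/s) between two (lat, lon) stations."""
    return great_circle_km(*sta_a, *sta_b) / travel_time_s


# Hypothetical station pair and travel time, for illustration only.
sta_a = (24.0, 121.5)
sta_b = (24.0, 123.5)
u = average_group_velocity(sta_a, sta_b, 70.0)
print(f"average group velocity: {u:.2f} km/s")
```

Each such average then serves as one datum, associated with one ray path, in the tomographic inversion.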

[20] The straightforward base parameterization subdivides the study area, 22°N–26°N and 121°E–125°E, into a 32 × 32 grid with an interval of approximately 14 km in each direction. The model function is thus approximated by parameters on the 33 × 33 nodes, assuming linear variations in between. We then tackle the inversion with three different regularization strategies: simple damping, convolutional quelling, and multiscale inversion. It is worth emphasizing that the forward theory is formulated only once, for the basic node parameterization. The transformation in equation (18) is then used to solve the problem after convolving each row of the G-matrix to configure the convolutional quelling solution discussed in section 3.3. Likewise, a BWT is invoked to transform each row of the G-matrix for the multiscale solution within the context of section 3.1. Simple damping yields the DLS solution, in which the model norm is minimized. In convolutional quelling, we invoke convolution with a Gaussian function as the smoother, that is, the operator D−1 in equation (22). A standard deviation of one grid interval, that is, σ = 14 km, is assumed to define the intrinsic correlation length that characterizes the degree of smoothness. For the multiscale inversion, we adopt the linear bi-orthonormal two-dimensional wavelet basis [Chiao et al., 2006] to transform each row of the G-matrix utilizing the dual basis and carry out the inversion in the wavelet domain. The complete solution, including the resolution and covariance matrices, is then transformed back to the spatial domain to be cross appraised. Namely, all conversions, to and from the reference spatial domain, are performed by taking advantage of equation (18). All the damping factors are determined through careful tradeoff analyses.
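
The workflow of formulating G once and inverting in a transformed domain can be sketched as follows. This is a minimal illustration under stated assumptions, not the code used in this study. It assumes the conversion takes the form m = W m′ with G′ = G W, where W stands in for the operator D−1 (equation (18) itself is not reproduced here). It also assumes a one-dimensional model with 33 nodes, a random matrix as a stand-in for the ray-path G-matrix, unit data variance, and an arbitrary damping value. All function names are hypothetical.

```python
import numpy as np


def gaussian_smoother(n, sigma):
    """Dense n x n Gaussian convolution matrix (role of D^-1), rows normalized."""
    idx = np.arange(n)
    W = np.exp(-0.5 * ((idx[:, None] - idx[None, :]) / sigma) ** 2)
    return W / W.sum(axis=1, keepdims=True)


def damped_inverse(G, damping):
    """Damped least squares inverse (G^T G + damping^2 I)^-1 G^T."""
    n = G.shape[1]
    return np.linalg.solve(G.T @ G + damping**2 * np.eye(n), G.T)


def invert_in_transformed_domain(G, d, W, damping):
    """Solve for m' with G' = G W, then map everything back to the node domain."""
    G_t = G @ W                       # transform each row of G
    G_inv = damped_inverse(G_t, damping)
    back = W @ G_inv                  # maps data to the node-domain model
    m = back @ d                      # node-domain model estimate
    R = back @ G                      # node-domain resolution matrix
    C = back @ back.T                 # node-domain covariance (unit data variance)
    return m, R, C


rng = np.random.default_rng(0)
n_data, n_model = 20, 33
G = rng.random((n_data, n_model))     # synthetic stand-in, not real ray paths
m_true = np.sin(np.linspace(0, np.pi, n_model))
d = G @ m_true

# Simple damping: W = I gives the reference DLS solution.
m_dls, R_dls, C_dls = invert_in_transformed_domain(G, d, np.eye(n_model), 0.1)
# Convolutional quelling: Gaussian W with sigma of one grid interval.
W_cq = gaussian_smoother(n_model, 1.0)
m_cq, R_cq, C_cq = invert_in_transformed_domain(G, d, W_cq, 0.1)
```

Because both calls return the model, resolution matrix, and covariance matrix in the same node domain, the two regularizations can be compared entry by entry. A wavelet transform matrix could be substituted for W in the same way, under the same assumed form of the conversion.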

[21] The comparison (Figure 2) indicates that the simple damping solution (left column in Figure 2) is dominated by the geometry of the ray distribution and is highly fragmented, without a consistent long-wavelength pattern. We also compare the vector representing each PSF, that is, each column of the model resolution matrix after it is transformed back to the spatial domain, with the ideal impulse located at the corresponding site, and take the L2 norm of the difference. This “spread” is taken as a measure of spatial resolution, with a higher spread meaning worse resolution. The best resolution for the northeast coast of Taiwan is located, not surprisingly, where crossing rays are concentrated (Figure 1). The model variance is also high where the resolution is better, indicating the intrinsic tradeoff between resolution and variance. The overall variance is the lowest among the three inversion schemes because the model norm is minimized. As for the convolutional quelling solution (center column of Figure 2), it is clear that the fixed-scale smoothing, σ = 14 km, enforces longer-wavelength components, essentially through fatter rays, as compared to the reference DLS solution on the left. The model resolution matrix is also converted back to the deconvolved space by taking advantage of equation (18) and reveals a different, more diffuse distribution than that of the DLS solution. Notice that the widened rays actually degrade the resolution at places with dense rays and are not quite capable of sustaining long-wavelength interpolation in regions of sparse ray coverage. The multiscale solution (right column in Figure 2) is also transformed back to the spatial domain from the wavelet domain, along with its associated model resolution and covariance matrices.
The inverted model is clearly grouped into a coherent large-scale structure (see also Figure 3) while retaining overall high resolution similar to that of the DLS solution. The spatial pattern of model variance lies somewhere between those of the DLS solution and the convolutionally quelled solution: the better resolved places are still associated with higher model variance, but the variance also spreads out slightly, as in the quelled solution. Overall, the smoothing performed by the multiscale solution clearly differs from that of the convolutional quelling scheme with its fixed-scale smoothness regularization. It preserves better resolution where data constraints are rich enough and, at the same time, performs nonstationary regional smoothing such that coherent structures are retained. In other words, a wavelet-based model interpolates long-wavelength structure from the neighborhood, over different ranges, where in situ constraints are relatively weak. The minimum norm solution simply renders weakly constrained areas as null structure, thus creating gaps and unrealistic, apparent short-wavelength structure. Its advantage, however, is that the minimum norm criterion also yields the least a posteriori model variance. The other schemes, convolutional quelling and the wavelet-based solution, both interpolate into poorly constrained regions by taking advantage of nearby model constraints provided by the data. The difference is that convolutional quelling invokes a fixed correlation length, just like the popular minimum curvature scheme but explicitly rather than implicitly, whereas a wavelet-based solution tends to adapt to the nonstationary nature of the available constraints. The latter two approaches have the least model variance in each of their own domains.
For example, coefficients of the wavelet representation can be obtained through equation (18) for all three solutions, but the inversion based on first dual-transforming the G-matrix into the wavelet domain will have the lowest model variance in that domain, since it has been demonstrated that all these different regularizations are essentially minimum norm solutions in their particular transformed representations. The resulting lateral heterogeneity, although obtained from highly imperfect data coverage, is more or less consistent with the large-scale regional tectonics (Figure 3).
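
The two appraisal quantities used in this comparison, the spread of each PSF and the model variance, can be computed directly from the spatial-domain resolution and covariance matrices. The sketch below uses a small hand-built resolution matrix as a hypothetical example; the function names are illustrative and the values are not results of this study.

```python
import numpy as np


def psf_spread(R):
    """L2 norm of (PSF_j minus a unit impulse at node j) for each column j of R."""
    return np.linalg.norm(R - np.eye(R.shape[0]), axis=0)


def model_variance(C):
    """Diagonal of the model covariance matrix, one value per model node."""
    return np.diag(C).copy()


# Hypothetical 4-node resolution matrix:
# nodes 0 and 1 are smeared into each other, node 2 is perfectly resolved,
# node 3 is not resolved at all.
R_toy = np.array([
    [0.5, 0.5, 0.0, 0.0],
    [0.5, 0.5, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
])
spread = psf_spread(R_toy)

# Hypothetical diagonal covariance, for illustration only.
C_toy = np.diag([0.2, 0.2, 1.0, 0.0])
variance = model_variance(C_toy)
```

A spread of zero marks a perfectly resolved node and larger values mark poorer resolution. For a two-dimensional model, reshaping each vector onto the node grid gives maps of the kind shown in the middle and bottom rows of Figure 2.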

Figure 2.

The same data set (Figure 1) inverted using three different regularization schemes. The left column is the result of simple minimum norm damping, the center column is obtained through convolutional quelling (see the text), and the right column is achieved by multiscale inversion. The top row is the inverted lateral heterogeneity of surface wave group velocity (measured in kilometers per second) at a period of 20 s. The middle row is the model spread, shown by contours of the L2 norm of the difference between each PSF and the ideal impulse (a lower value indicates better in situ spatial resolution). The bottom row shows the model variance by plotting the diagonal of the model covariance matrix (a higher value marks worse contamination that might be propagated from the data error). Note how the result from simple damping tends to paint structure that sticks to the rays, whereas the result from convolutional quelling behaves basically the same, only with fat rays due to the imposed fixed-scale model correlation. The multiresolution result, however, groups the structure differently.

Figure 3.

Enlarged lateral heterogeneity of surface wave group velocity (measured in kilometers per second) from the multiscale inversion model (top right in Figure 2), overlaid with bathymetry contours (1000 m). Since the 20 s period group velocity is examined, the structure is roughly averaged over the upper 20 km from the surface (with a smooth depth kernel for the group velocity). Although the ray coverage is too sparse to constrain a useful tectonic interpretation, the overall large-scale pattern, which mainly reveals the northeast-southwest to east-west interaction of the Ryukyu subduction system with the Taiwan region, is consistent with local tectonics and is encouraging for future studies (with the deployment of more OBSs). The general pattern is consistent with the average group velocity data (Figure 1) in that faster speed is associated with the Philippine Sea plate and the southern tip of the Okinawa Trough, whereas slow speed tends to be attributed to the area around the Yaeyama islands (part of the southern Ryukyu arc) and other rifted continental crust, although the reliable resolution length is probably too wide due to the sparse raypath coverage.

Acknowledgments

[22] This research was supported by the National Science Council of the Republic of China under grant NSC 97-2611-M-002-010-MY2. Comments from two anonymous reviewers and the associate editor have improved this work considerably. English editing assistance provided through the Taiwan Earthquake Research Center (TEC), funded under grant NSC97-2119-M-001-010, is acknowledged. Major program modules, such as the bi-orthogonal wavelet transform and the convolutional quelling invoked for the transformation of the G-matrix, model resolution, and covariance matrices in this study, as well as some spherical modules that appeared in previous works, are available at ftp://ftp.odb.ntu.edu.tw/chiao/multiscale.tar.