In geophysical tomography, a proper model parameterization scheme for forward modeling is not necessarily a suitable one for the inversion stage, and vice versa. To take full advantage of the merits of parameterization in both stages, we propose a two-step model parameterization approach, in which different model bases for forward computation and inversion are adopted and the basis change is achieved by applying a spatial projection directly to the sensitivity matrix. We demonstrate this approach through an experimental study of waveform tomography for the Pacific upper mantle shear wave structure using first-orbit long-period Rayleigh waves. In the forward modeling, a normal-mode-based nonlinear asymptotic coupling theory is used for the computation of the synthetics and sensitivity matrix, and the model is parameterized in terms of spherical harmonics, which provide efficient analytical solutions for path integrals in the forward modeling. Prior to the inversion, the model basis of the sensitivity matrix is transformed to local functions within the study region. After mapping, only local bases around the data sampling path receive effective sensitivities. Accordingly, the computational cost of the inversion is significantly reduced. Furthermore, the two-step model parameterization also adds flexibility to the inversion schemes. In particular, a wavelet-based multiscale inversion is implemented, and its results are compared to simple damping solutions. The general concept and applications of the two-step model parameterization are not restricted to the forward modeling technique or model parameterization schemes employed in this experimental study. This approach benefits any inverse problem in which a transformation of model bases helps to better constrain the results.
 In geophysical tomography, model parameterization is important for its influences on results of forward modeling and inversion. For forward modeling, it could be a critical factor for accuracy and efficiency in the computations of data prediction and sensitivity matrices. For inversion, an ideal model parameterization is expected to match the desired data-adaptive resolution with minimum effective elements in the sensitivity matrix. These arguments become more evident for large-scale seismic tomography, in which massive data are usually involved, and the spatial sampling of data is often highly nonuniform as inherited from the nature of earthquake distribution and mostly on-land seismic observatories.
 Conventional parameterization schemes tend to fall into two extreme categories that emphasize either spatial resolution, such as boxcar functions, or spectral resolution such as spherical harmonics (SpH). Among them, SpH are natural bases for the Earth; accordingly, they are often used as lateral model bases in three-dimensional (3-D) global mantle tomography, particularly for those models developed using normal-mode asymptotic theories and long-period waveform data. Here we discuss models of this kind. Comparisons between them and models developed with primarily traveltime data can be found in a review article by Romanowicz .
 Another likely reason for the popularity of SpH in global waveform tomography is their computational advantage in the forward stage: they can be converted to Fourier series along the source-receiver great circle path, thus providing efficient and accurate analytical solutions for path integrals [e.g., Woodhouse and Dziewonski, 1984].
 At the current stage, major long-wavelength features among various global mantle models are relatively consistent, and interest has now focused on the finer details of Earth structure. Nevertheless, limitations of using SpH as model bases are apparent when high-degree models are desired. For global function bases like SpH, no component can be ignored when constructing a local sensitivity kernel. In other words, the resulting sensitivity matrix will be fully loaded, with a size proportional to (ℓmax + 1)², where ℓmax is the maximum degree used in the model expansion, and the follow-up inversion will be greatly hampered with increasing degree and/or amount of data. Furthermore, it has been pointed out that some features in the SpH-based models may be biased by spectral leakage [Trampert and Snieder, 1996; Chiao and Kuo, 2001], which is similar to the aliasing effect that arises when a truncated Fourier series is used to expand a function with high-degree signals.
 An alternative approach to obtain finer details of the mantle structure is high-resolution regional tomography in areas with dense data coverage. Instead of SpH, local function bases are commonly invoked for such studies.
 Clearly, a proper parameterization scheme for forward modeling is not necessarily suitable for a follow-up inversion, and vice versa. To maximize parameterization merits in both stages, and to add flexibility to inversion schemes, we propose a two-step model parameterization approach in which different model bases for the sensitivity matrix are used in the two stages, linked by a simple matrix transformation.
 We demonstrate this approach through an experimental study. We first introduce theories to be used in forward computation, inversion and basis transformation of sensitivity matrices, and discuss benefits provided by the two-step model parameterization. We then detail its application to a regional tomography, waveform tomography for Vs structure of the Pacific upper mantle using long-period Rayleigh waves. Finally, results derived from a simple damping scheme and wavelet-based inversion are compared and discussed.
2. Theoretical Background
2.1. Forward Modeling
 We apply a normal-mode-based full waveform modeling technique, nonlinear asymptotic coupling theory (NACT), to the forward modeling, i.e., the computation of synthetic waveforms and the sensitivity matrix. Details of this theory are well documented [Li and Tanimoto, 1993; Li and Romanowicz, 1995]; here we briefly summarize it.
 In contrast to the more conventional 1-D path-average approximation (PAVA) [e.g., Woodhouse and Dziewonski, 1984; Tanimoto, 1986], NACT takes into account coupling between modes both along and across dispersion branches and provides 2-D broadband sensitivity kernels which better resemble the sensitivity of body waveforms to structure along and around the ray geometrical path in the vertical plane containing the source and the receiver. It has been demonstrated that the inclusion of across-branch coupling effects is essential to accurately model the overtone waveforms [Romanowicz et al., 2008]. The comparison was done with a series of numerical experiments in various synthetic 3-D global models, and “exact” synthetic waveforms afforded by the coupled spectral element method were used as references [Capdeville et al., 2002; Chaljub et al., 2003].
 For these early applications of NACT, the desired 3-D Earth models are expressed as perturbations from a starting reference model, and are parameterized laterally in fully normalized SpH, Ylm(ϑ, ϕ) [Edmonds, 1960], and radially in cubic b splines q(r),
where δm represents volumetric model perturbations, ℓmax is the maximum SpH degree, kmax is the number of radial cubic b splines qk(r), and clmk are model coefficients.
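For reference, the expansion described here can be written out as follows. This is a reconstruction from the surrounding definitions rather than a quotation of the typeset equation (1); the ordering of the sums and the index ranges (k starting at 1, m running from −ℓ to ℓ) are assumptions.

```latex
\begin{equation}
\delta m(r,\vartheta,\phi) =
  \sum_{k=1}^{k_{\max}} \sum_{\ell=0}^{\ell_{\max}} \sum_{m=-\ell}^{\ell}
  c_{\ell m k}\, q_k(r)\, Y_{\ell m}(\vartheta,\phi)
\tag{1}
\end{equation}
```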
 The explicit expressions for the partial derivatives (i.e., the sensitivity matrix) of the perturbed seismogram δu with respect to the model coefficients clmk are derived by Li and Romanowicz [1995, equations (21)–(27)]. The great circle path integral between each source and receiver pair is required to account for effects due to heterogeneities along the path, and this computation is simplified by rotating the reference coordinate frame such that the source and receiver are located on the equator of the new frame, along which the (ℓmax + 1)² lateral SpH model parameters are expanded in terms of (2ℓmax + 1) Fourier coefficients with a simple recipe [Edmonds, 1960]. This computational benefit, however, does not extend to the inversion stage, in which the Fourier coefficients have to be transformed back to the SpH domain beforehand, and an inversion barrier will soon be encountered when pursuing higher-degree models, as discussed earlier.
 More recently, local functions (spherical splines) have been adopted as the lateral model bases in the NACT algorithm [Panning and Romanowicz, 2006]. Moreover, Marone et al. extended the application to a regional study by overlaying spherical splines of different scales to permit higher resolution in the target region.
 Interestingly, the two parameterization schemes mentioned above, SpH and spherical splines, fall into different categories: the former corresponds to global function bases, and the latter to local function bases.
 Although spatial mapping among different basis functions is generally straightforward, little attention has been paid to its application to sensitivity matrices and the resulting merits, and forward computations are usually reformulated to adapt to alternative basis functions [e.g., Kuo et al., 2000; Gu et al., 2001; Marone et al., 2007]. In the proposed two-step model parameterization approach, the model basis is altered by applying spatial mapping directly to the sensitivity matrix. Thus, there is no need to reformulate the forward computation, and alternative model bases can be adopted in the inversion stage to fully exploit the parameterization advantages. In the following, we show the basis transformation from a global-function-based sensitivity matrix to a local-function-based one, as this will be applied to our experimental study.
2.2. Basis Transformation of Sensitivity Matrices
 In general, we may express the ith point in a perturbed waveform (δui) as an inner product between model perturbations (δm in equation (1)) and the corresponding sensitivity kernel:
where K(r, ϑ, ϕ) is the sensitivity kernel, dV = r² dr dΩ, dΩ = sin ϑ dϑ dϕ, and Qk = ∫ Kr(r) r² qk(r) dr is the integrated radial kernel. Here we have assumed that the sensitivity kernel K(r, ϑ, ϕ) can be separated into the product of a radial component Kr(r) and a lateral component Kh(ϑ, ϕ). Note that summation over normal modes and mode coupling effects is implied in equation (2).
 With equation (2), the global-function-based sensitivity matrix Gilmk is merely
 In contrast to global SpH bases, we may expand model perturbations laterally in terms of local functions,
where bjk are model coefficients, and Hj(ϑ, ϕ) is the jth local basis function.
 With equation (1), equation (4), and the orthogonality of SpH, ∫ Y*lm(ϑ, ϕ)Yl′m′(ϑ, ϕ)dΩ = δll′δmm′, we can relate the local-function-based sensitivity matrix to the SpH-based one.
That is, to compute the sensitivity matrix Gijk based on local bases Hj(ϑ, ϕ), we simply perform a variable transformation on the already calculated Gilmk.
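The chain of relations described in this subsection can be written out as below. This is a reconstruction from the definitions in the text (separable kernel, integrated radial kernel Qk, SpH orthogonality), not a quotation of the typeset equations (2)–(5); in particular, the unnumbered intermediate step for clmk is an added reasoning step, and it assumes that each Hj is adequately represented by SpH up to degree ℓmax.

```latex
\begin{align}
\delta u_i &= \int_V K(r,\vartheta,\phi)\,\delta m(r,\vartheta,\phi)\,dV
  = \sum_{k}\sum_{\ell}\sum_{m} c_{\ell m k}\, Q_k
    \int_\Omega K_h(\vartheta,\phi)\, Y_{\ell m}(\vartheta,\phi)\, d\Omega,
  \tag{2}\\
G_{i\ell m k} &= \frac{\partial\,\delta u_i}{\partial c_{\ell m k}}
  = Q_k \int_\Omega K_h(\vartheta,\phi)\, Y_{\ell m}(\vartheta,\phi)\, d\Omega,
  \tag{3}\\
\delta m(r,\vartheta,\phi) &= \sum_{k}\sum_{j} b_{jk}\, q_k(r)\, H_j(\vartheta,\phi),
  \tag{4}\\
c_{\ell m k} &= \sum_{j} b_{jk}
    \int_\Omega Y^{*}_{\ell m}(\vartheta,\phi)\, H_j(\vartheta,\phi)\, d\Omega,
  \notag\\
G_{ijk} &= \sum_{\ell}\sum_{m} G_{i\ell m k}
    \int_\Omega Y^{*}_{\ell m}(\vartheta,\phi)\, H_j(\vartheta,\phi)\, d\Omega.
  \tag{5}
\end{align}
```

The intermediate line follows from equating expansions (1) and (4), multiplying by the conjugate SpH, and integrating over the sphere; substituting it into δui = Σ Gilmk clmk and collecting the coefficients of bjk gives the last line.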
 The advantages provided by the two-step model parameterization are many: (1) in the forward modeling step, model parameters SpH offer efficient analytical solutions of path integrals for computations of synthetic waveforms and the corresponding sensitivity matrix Gilmk; (2) after the basis transformation, only local functions adjacent to the great circle path of each source-receiver pair are required to effectively model the corresponding data sensitivities. More specifically, the size of Gijk is much smaller than that of Gilmk. Thus, the computation cost of the inversion is greatly reduced, and higher-resolution models might be attempted; and (3) with proper selection criteria to restrict data sensitivities to an area of interest, regional tomography can then be implemented using the local-function-based sensitivity matrix Gijk, even though SpH model bases are used in the forward stage.
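The effect described in point (2) can be illustrated with a one-dimensional analogue. The sketch below is not the authors' code: it replaces the sphere by a circle, SpH by an orthonormal real Fourier basis, and the spherical mesh by periodic hat functions. The kernel shape, the basis sizes, and the 1% threshold used to call a sensitivity "effective" are all hypothetical choices.

```python
import numpy as np

M = 4096                                 # quadrature points on the circle
x = np.arange(M) * 2.0 * np.pi / M
w = 2.0 * np.pi / M                      # uniform quadrature weight

def circ_dist(a, b):
    """Shortest angular distance on the circle."""
    d = np.abs(a - b) % (2.0 * np.pi)
    return np.minimum(d, 2.0 * np.pi - d)

def fourier_basis(n_max):
    """Orthonormal real Fourier functions (global basis), one per row."""
    rows = [np.full(M, 1.0 / np.sqrt(2.0 * np.pi))]
    for n in range(1, n_max + 1):
        rows.append(np.cos(n * x) / np.sqrt(np.pi))
        rows.append(np.sin(n * x) / np.sqrt(np.pi))
    return np.array(rows)

def hat_basis(n_nodes):
    """Periodic piecewise-linear hat functions (local basis), one per row."""
    h = 2.0 * np.pi / n_nodes
    nodes = np.arange(n_nodes) * h
    return np.clip(1.0 - circ_dist(x[None, :], nodes[:, None]) / h, 0.0, None)

F = fourier_basis(32)                    # 65 global functions
H = hat_basis(64)                        # 64 local functions
K = np.exp(-0.5 * (circ_dist(x, 1.0) / 0.2) ** 2)  # hypothetical localized kernel

G_global = (F @ K) * w                   # one sensitivity row in the global basis
P = (F @ H.T) * w                        # P[n, j] = integral of F_n * H_j
G_local = G_global @ P                   # the same row mapped to the local basis

# "Effective" here means above 1% of the largest entry (an arbitrary threshold).
effective = np.abs(G_local) > 0.01 * np.abs(G_local).max()
print(f"{effective.sum()} of {G_local.size} local nodes carry effective sensitivity")
```

The projection matrix P depends only on the two bases, so in this sketch it is computed once and reused for every data row.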
 Additionally, note that the integral in equation (5) takes a very general form for converting the SpH-based sensitivity to an alternative basis, and no restriction has yet been placed on the target model basis. That is, instead of a specific local function basis, we may also choose hierarchical multiscale bases [Chiao and Liang, 2003; Chiao et al., 2006] in place of Hj(ϑ, ϕ), so that a wavelet-based sensitivity matrix can be evaluated.
 To simplify the notation, let G be the transformed sensitivity matrix Gijk, m the vector of the perturbed model (equation (4)) to be solved for, and vector d the data residual. The inverse problem can be expressed as
Conventionally, the model estimate, m̂, can then be obtained by the damped least squares (DLS) algorithm [e.g., Lawson and Hanson, 1974],
The value of the nonnegative damping factor θ2 controls how strongly the preference for a minimum model norm is imposed.
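In matrix notation, the linear system and its damped least squares solution are commonly written as below. This is the standard DLS form, reconstructed to be consistent with the damping factor θ² used in the text; it is not a quotation of the typeset equations (6) and (7).

```latex
\begin{align}
\mathbf{G}\,\mathbf{m} &= \mathbf{d}, \tag{6}\\
\hat{\mathbf{m}} &= \left(\mathbf{G}^{T}\mathbf{G} + \theta^{2}\mathbf{I}\right)^{-1}
                    \mathbf{G}^{T}\mathbf{d}. \tag{7}
\end{align}
```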
 When local functions are used as the model basis, it has been pointed out that the minimum norm solutions obtained from DLS generally lack interpolation capabilities in sparsely sampled areas and tend to yield fragmented and fractured models [e.g., Chiao and Liang, 2003]. Therefore, the preferred model smoothness is often achieved by invoking additional regularization beyond simple damping. The commonly used regularization enforces model smoothness by penalizing roughness, which presumes that the model smoothness [e.g., Meyerholtz et al., 1989], or the intrinsic model correlation length [Tarantola and Nercessian, 1984], is spatially uniform or stationary. Although spatial correlation underlying the Earth structure is expected, a stationary correlation length may not be a well-justified hypothesis.
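The lack of interpolation in a minimum norm solution can be seen in a very small example. The sketch below assumes the normal-equation form of DLS and a hypothetical setup of five nodes on a line, four of which are each sampled directly by one datum while the middle node is not sampled at all.

```python
import numpy as np

def damped_least_squares(G, d, theta2):
    """Damped least squares solution of G m = d in normal-equation form."""
    n = G.shape[1]
    return np.linalg.solve(G.T @ G + theta2 * np.eye(n), G.T @ d)

# Five nodes on a line; node 2 is not sampled by any datum (hypothetical setup).
G = np.eye(5)[[0, 1, 3, 4], :]
m_true = np.ones(5)                      # a smooth (constant) true model
d = G @ m_true

m_hat = damped_least_squares(G, d, theta2=0.1)
print(m_hat)  # the unsampled node stays at zero although its neighbors do not
```

Because the column of G for the unsampled node is zero, the damping term alone determines that coefficient, and the estimate leaves a hole in an otherwise smooth model.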
 Both the spatial localization and data-adaptive nonstationary model smoothing can be achieved by applying a wavelet-based regularization [Chiao and Liang, 2003]. In the following, we illustrate a multiscale representation scheme.
 To discretely describe a function f(x) across the interior of the triangle shown in Figure 1, we can specify the spatial variation of f at uniformly distributed nodes. For example, f1 = f(r1), f2 = f(r2), f3, …, where r are position vectors at the internal nodes in Figure 1. These nodes are vertices of internal triangles obtained through successive levels of refinement of the original triangle by connecting midpoints on the edges. Alternatively, there are ways to build f(x) using hierarchical representations. For example,
where W represents the matrix that transforms the spatial variations from a locally defined function f to wavelet domain h. In contrast to local functions, the spatial interpolation is intrinsically defined in the wavelet domain, such that level 1 parameters describe the largest-scale variations within the base triangular mesh; level 2 parameters (i.e., h42, h52, h62) are specified by residuals between the locally defined f and linear interpolation from level 1 parameters. Finer details can be appended by conducting this procedure to successively higher levels. In such a framework, spatial variations are grouped into local hierarchical scales, with short-scale heterogeneities expressed by their difference from the underlying long-wavelength variations.
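A one-dimensional analogue may help make the hierarchy concrete. In the sketch below, which is an illustration and not the spherical transform used in this study, the nodes lie on a line segment instead of a triangle, the coarsest level holds the two end values, and every finer-level coefficient is the residual between the nodal value and the linear interpolation from its two coarser neighbors. No lifting update step is included, and the function names are hypothetical.

```python
import numpy as np

def to_hierarchical(f):
    """Nodal values at 2**L + 1 equally spaced nodes -> hierarchical coefficients."""
    f = np.asarray(f, dtype=float)
    n = f.size
    levels = int(round(np.log2(n - 1)))
    if 2 ** levels + 1 != n:
        raise ValueError("expected 2**L + 1 nodes")
    h = f.copy()                          # end nodes keep their nodal values
    step = 1
    while step < n - 1:
        idx = np.arange(step, n - 1, 2 * step)
        # residual with respect to linear interpolation from coarser neighbors
        h[idx] = f[idx] - 0.5 * (f[idx - step] + f[idx + step])
        step *= 2
    return h

def from_hierarchical(h):
    """Inverse transform: rebuild nodal values from coarse to fine levels."""
    h = np.asarray(h, dtype=float)
    n = h.size
    f = h.copy()
    step = (n - 1) // 2
    while step >= 1:
        idx = np.arange(step, n - 1, 2 * step)
        f[idx] = h[idx] + 0.5 * (f[idx - step] + f[idx + step])
        step //= 2
    return f

linear = np.linspace(0.0, 4.0, 5)
print(to_hierarchical(linear))            # all finer-level residuals vanish
```

A field that is already explained by the coarsest level produces no finer-level coefficients, which is the property the multiscale inversion relies on.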
 Although the wavelet-based sensitivity matrix can be constructed in the same manner as Gijk through the application of equation (5), we do not follow this direct route for two reasons: (1) we wish to compare the effects of a simple localized parameterization as opposed to the multiscale parameterization, and (2) we may take advantage of the efficiency and flexibility of the lifting scheme to perform the wavelet transform [Sweldens, 1996]. In this study, we transform the representation based on the spherical mesh into one utilizing a spherical wavelet basis [Chiao and Liang, 2003]. More specifically, we recast formula (6) as
That is, we are now solving for wavelet coefficients for the multiresolution representation of the model, or simply, the wavelet transform of the model, Wm. The corresponding modification for the Gram matrix is then simply GW−1. Note that GW−1 = [(W−1)TGT]T. That is, every row vector of the Gram matrix, G, is wavelet transformed by invoking the dual wavelet basis that is biorthogonal to the primary basis. The new solution becomes
where μ̂ is the model estimate in the wavelet domain (μ = Wm). With equal degrees of freedom, sensitivity kernels in the wavelet representation are grouped into local hierarchical scales. As a result, the influence of damping regularization applied to the wavelet-based kernels differs from that in the case of local-function-based kernels. Since longer wavelength components hold more accumulated constraints in the wavelet representation, the damping will act to sort through successive scales depending on the local data constraints. Details will be robustly resolved at sites with dense constraints, whereas long-wavelength features are still available for sparsely constrained areas. There is thus no need to invoke additional smoothness regularization.
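The contrast between damping in the nodal domain and damping in the wavelet domain can be sketched with a small one-dimensional example. The code below assumes the recast system (GW⁻¹)(Wm) = d with the damped solution μ̂ = [(GW⁻¹)ᵀ(GW⁻¹) + θ²I]⁻¹(GW⁻¹)ᵀd, and it uses a simple hierarchical hat basis without lifting update steps as a stand-in for the spherical wavelets. The five-node geometry with an unsampled middle node is hypothetical.

```python
import numpy as np

def synthesis_matrix(n):
    """W^-1: columns are 1-D hierarchical hat functions at n = 2**L + 1 nodes."""
    W_inv = np.eye(n)
    step = (n - 1) // 2
    while step >= 1:
        idx = np.arange(step, n - 1, 2 * step)
        # finer nodes inherit the linear interpolation of their coarser neighbors
        W_inv[idx, :] += 0.5 * (W_inv[idx - step, :] + W_inv[idx + step, :])
        step //= 2
    return W_inv

def damped_solve(A, d, theta2):
    """Damped least squares solution of A x = d in normal-equation form."""
    return np.linalg.solve(A.T @ A + theta2 * np.eye(A.shape[1]), A.T @ d)

# Five nodes on a line; node 2 is not sampled by any datum (hypothetical setup).
G = np.eye(5)[[0, 1, 3, 4], :]
d = G @ np.ones(5)
theta2 = 0.1

m_simple = damped_solve(G, d, theta2)            # damping on nodal values
W_inv = synthesis_matrix(5)
mu_hat = damped_solve(G @ W_inv, d, theta2)      # damping on multiscale coefficients
m_multi = W_inv @ mu_hat                         # back to nodal values

print("simple damping :", np.round(m_simple, 3))
print("multiscale     :", np.round(m_multi, 3))
```

In this toy case the unsampled node is filled in by the coarse-scale coefficients that its neighbors constrain, whereas simple damping leaves it at zero.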
 With the above forward theory, basis transformation, and inversion methods, we apply the two-step model parameterization approach to an experimental study where the advantages of this approach and evidence of the arguments on the wavelet-based multiscale inversion will be clearly demonstrated.
3. Application: Waveform Tomography of the Pacific Upper Mantle
 Owing to the abundant circum-Pacific earthquakes and seismic stations, the coverage density of the trans-Pacific minor arc surface waves is among the highest on the globe, making the Pacific upper mantle an excellent candidate for high-resolution surface wave tomography.
3.1. Model Parameterization
 To implement the aforementioned multiscale inversion in the regional tomography, we first consider the model basis for the local-function-based sensitivity matrix. The maximum degree of the SpH expansion in equation (1) can then be assigned according to the minimum resolution length offered by the adopted local bases.
 As shown in Figure 2, eight root-level spherical meshes centered at (130°W, 0°) are chosen to cover the study region, and the meshes are refined to the fifth level to form 2048 children meshes and 1071 internal nodes (i.e., index i in equation (8) takes 1071 values in total). The resulting resolution length is similar to that offered by SpH up to degree 48 (i.e., ℓmax = 48 in equation (1)).
 In the radial part, seven cubic b splines (i.e., kmax = 7 in equation (1)) with irregular spacing are used for the depth range from the Moho to 670 km (Figure 3). The radial functions are not altered by the basis transformation in this study.
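The sizes implied by these choices can be checked with a few lines of arithmetic. The sketch below takes ℓmax = 48, kmax = 7, eight root meshes, five levels, and 1071 nodes from the text; it assumes that the root meshes count as level 1 and that each refinement splits a triangle into four, which reproduces the quoted 2048 children meshes. The variable names are hypothetical.

```python
l_max = 48        # maximum SpH degree
k_max = 7         # number of radial cubic b splines
n_root = 8        # root-level spherical meshes
level = 5         # refinement level (root counted as level 1, an assumption)
n_nodes = 1071    # internal nodes quoted in the text

n_sph = (l_max + 1) ** 2                  # lateral SpH coefficients
n_fourier = 2 * l_max + 1                 # Fourier coefficients along the rotated equator
n_children = n_root * 4 ** (level - 1)    # each refinement splits a triangle into four

cols_sph = n_sph * k_max                  # columns of the SpH-based sensitivity matrix
cols_local = n_nodes * k_max              # columns after the basis transformation

print(n_sph, n_fourier, n_children, cols_sph, cols_local)
```

These counts describe the full column dimension only; how many of the local columns are actually nonzero for a given path depends on the path geometry.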
 For regional tomography, we consider the vertical component Rayleigh waveform data of the first orbit. The data are low-pass filtered with a cutoff frequency of 1/60 Hz and a corner frequency of 1/80 Hz. They were recorded on the global network (IRIS) for earthquakes with Mw between 5.5 and 7.0 that occurred from 1994 to 2004. Part of the data (1994–2000) was used in the development of an upper mantle Q model [Gung and Romanowicz, 2004]. In addition, we also collected data from a regional network (Full Range Seismograph Network of Japan, F-net) [Okada et al., 2004] for earthquakes occurring in the same time interval.
 We adopt the individual wave packet technique [Li and Romanowicz, 1996]. Only portions with major energy arrivals, including fundamental modes and overtone Rayleigh waves, are selected from the full trace waveform. This allows flexibility in assigning appropriate weights to different phases so as to enhance, for instance, the contribution of higher modes with respect to the naturally dominating fundamental modes. In particular, the weighting scheme of individual wave packets also takes into account waveform amplitude and path redundancy.
 All the data were collected by an automatic picking algorithm, in which various selection criteria are used to guarantee data quality [Gung and Romanowicz, 2004]. To ensure that data sensitivities primarily fall within the study region, only data for which 70% or more of the entire path length lies within the study region are selected. With these criteria, 909 events are used, and the qualified data set consists of 48,104 individual wave packets (1,481,104 data points). The achieved path coverage is shown in Figure 4.
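The 70% path-length criterion can be sketched as follows. This is not the authors' picking algorithm: the study region is approximated here by a hypothetical spherical cap around (130°W, 0°) instead of the actual spherical meshes, the minor arc is sampled at equally spaced midpoints, and all function names and the cap radius are illustrative assumptions.

```python
import math

def unit_vector(lat_deg, lon_deg):
    """Unit vector on the sphere for a point given as (latitude, longitude)."""
    lat, lon = math.radians(lat_deg), math.radians(lon_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))

def angle_between(a, b):
    """Angular distance in radians between two unit vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    return math.acos(max(-1.0, min(1.0, dot)))

def fraction_inside(src, rcv, center, radius_deg, n=200):
    """Fraction of the minor-arc path length lying inside a spherical cap."""
    a, b, c = unit_vector(*src), unit_vector(*rcv), unit_vector(*center)
    omega = angle_between(a, b)
    if omega < 1e-12:                     # coincident source and receiver
        return 1.0 if math.degrees(angle_between(a, c)) <= radius_deg else 0.0
    if math.pi - omega < 1e-9:
        raise ValueError("antipodal points: minor arc is not unique")
    inside = 0
    for i in range(n):
        t = (i + 0.5) / n                 # midpoint of the i-th path segment
        wa = math.sin((1.0 - t) * omega) / math.sin(omega)
        wb = math.sin(t * omega) / math.sin(omega)
        p = tuple(wa * u + wb * v for u, v in zip(a, b))
        if math.degrees(angle_between(p, c)) <= radius_deg:
            inside += 1
    return inside / n

def passes_selection(src, rcv, center, radius_deg, min_fraction=0.7):
    """Apply the path-length criterion with a 70% default threshold."""
    return fraction_inside(src, rcv, center, radius_deg) >= min_fraction

center = (0.0, -130.0)                    # (lat, lon) of the mesh center
print(fraction_inside((0.0, -130.0), (0.0, -30.0), center, 70.0))
```

For a real mesh the cap test would be replaced by a point-in-region test against the actual study region boundary.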
3.3. Starting Model and Crust Correction
 Instead of a spherically symmetric Earth model such as PREM [Dziewonski and Anderson, 1981], we utilize a 3-D mantle model as the starting model to account for effects caused by heterogeneities outside the study region, because data sensitivities are not entirely confined to the study region, as shown in Figure 4. The 3-D model adopted here is the isotropic portion of a recent global anisotropic mantle model, SAW642AN [Panning and Romanowicz, 2006]. The upper mantle of the 3-D starting model, SAW642_iso, is reparameterized in terms of SpH and cubic b splines as described in section 3.1.
 For the crust corrections, linear corrections based on boundary undulations are considered by removing effects due to surface topography and Moho perturbations according to CRUST2.0 [Bassin et al., 2000; Mooney et al., 1998].
3.4. Results and Discussion
 We solve the inverse problem by utilizing the LSQR algorithm [Paige and Saunders, 1982]. Two different groups of inversions based on the simple damping scheme (equation (7)) and the multiscale inversion (equation (10)) are conducted, each with a wide range of damping factors (θ2).
 In NACT, the sensitivity kernels depend on the 3-D model, and several iterations are usually implemented to yield the final model. In each new forward computation, an improved 3-D model from the last iteration is used to generate an updated sensitivity matrix and residual vector. However, as this is an experimental study, only one forward computation is done for the following results.
 The variance reduction versus model variance trade-off curves, derived from a wide range of damping factors (0.05, 5), for the two groups of solutions are presented in Figure 5, where the segment for appropriate models is zoomed in for a better comparison. We first notice that, with comparable variance reduction, the results obtained via multiscale inversions have model variances that are considerably lower than simple damping results. This is due to the fact that model variations are assembled through the scale hierarchy from the longer wavelength components in the wavelet domain, such that longer wavelength variations have more accumulated constraint and are better resolved in the multiscale inversion, while shorter wavelength components in regions less supported by data will be automatically damped out.
 Compared to the previous studies [Chiao and Kuo, 2001; Chiao et al., 2006], the observed differences in model variances between models derived from simple damping and multiscale inversion are less pronounced. This is understandable; it implies that the major larger-scale structures have already been fairly represented by the background 3-D reference model SAW642_iso, and only finer details are resolvable for extra perturbations, as shown in Figure 6.
 On the basis of the trade-off curves, we conservatively choose models with variance reduction around 22.5% for the comparison of extra model perturbations; i.e., models derived using damping factors 0.5 for a simple damping scheme, and 0.7 for multiscale inversion, respectively (indicated by the gray box 2 in Figure 5). Note that the variance reduction indicated here is solely due to the extra model perturbations other than that provided by the starting 3-D model SAW642_iso. In other words, the selected models have improved data fit by 22.5% as compared to SAW642_iso.
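The text does not spell out how variance reduction is computed. A common definition, assumed in the sketch below, is one minus the ratio of the squared misfit after inversion to the squared data residual with respect to the starting model. The function name and the toy numbers are hypothetical.

```python
import numpy as np

def variance_reduction(G, m, d):
    """1 - |d - G m|^2 / |d|^2, with d the residual relative to the starting model."""
    misfit = d - G @ m
    return 1.0 - float(misfit @ misfit) / float(d @ d)

# Toy check with two data that each sample one model parameter directly.
G = np.eye(2)
d = np.array([1.0, 1.0])
m = np.array([0.5, 0.5])
print(f"variance reduction: {100.0 * variance_reduction(G, m, d):.1f}%")
```

Under this definition, a value of 22.5% means that the extra perturbations remove 22.5% of the squared residual left by the starting model.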
 In Figure 6, we compare the extra model perturbations derived from the simple damping scheme (model 1) and multiscale inversion (model 2) at three depths. We first notice that, at shallow depths, both models look similar, though features in model 2 are slightly smoother. This is because (1) most of the study region is well covered by our data set, and sensitivities for the uppermost mantle mainly result from fundamental modes, which are the naturally dominant phases as most earthquakes occur at shallow depths; and (2) the nonuniform data coverage is mitigated by the data weighting scheme, in which data across heavily sampled regions are downweighted. Both factors help to generate more homogeneous data sensitivities at shallow depths. As a result, models are less influenced by the inversion schemes utilized, given the same degrees of freedom. This also helps to explain the smaller differences in model variances shown in the trade-off curves (Figure 5).
 The discrepancies between model 1 and model 2 are evident at greater depths. Model 1 displays much more fragmented features than model 2, as is clearly shown at depths of 250 and 400 km in Figure 6. This is because sensitivities of fundamental modes taper with depth, and overtone phases are usually less excited by most earthquakes. The differences between model 1 and model 2, in particular at larger depths, are consistent with what is shown in the variance reduction versus model variance trade-off curves (Figure 5). Compared to simple damping solutions, models derived by multiscale inversion can achieve similar variance reduction with much lower model variance, that is, with simpler and more robust variations. The same findings were also reported in previous studies [Chiao and Kuo, 2001; Chiao et al., 2006].
 Next, we superimpose the starting model on model 1 and model 2 to form the complete model perturbations, model 1C and model 2C, which are presented in Figures 7 and 8, respectively. Three pairs of models with different damping factors, as indicated by the gray boxes in Figure 5, are shown in Figures 7a and 8a, 7b and 8b, and 7c and 8c, respectively. In Figures 7 and 8, models 1C and 2C with similar variance reductions are compared. The observations made in Figure 6 are confirmed here: given comparable data fit, models derived from multiscale inversion consistently exhibit simpler and smoother variations. Note that no additional constraint on spatial correlation is enforced in either inversion scheme.
 While the resolved larger-scale features are mostly consistent with the 3-D starting model, there are noticeable smaller-scale anomalies in our high-resolution models. For example, on the west flank of the East Pacific ridges, there are north-south trending linear high-velocity anomalies at shallow depths in both model 1C and model 2C (see the boxed features in Figure 8). These narrow linear anomalies are not observed in the starting isotropic 3-D model, SAW642AN_ISO, probably because they are beyond its resolution limit or because stronger smoothing regularization was employed in its inversion; these features are also barely visible in our results with stronger damping (Figure 8, model 2C (3)). Interestingly, these narrow linear fast anomalies were also present in a high-resolution (degree 40 in terms of SpH) global phase velocity model of fundamental mode Rayleigh waves [van Heijst and Woodhouse, 1999], which suggests that they are robust tomographic results.
 Currently, we have no intention of discussing the tectonic meaning of the details resolved in model 2C, mainly because anisotropic effects are not included in the modeling for this experimental study. Anisotropy is a nonnegligible effect in the uppermost mantle, especially for regions underneath the Pacific, as has been pointed out by many studies [e.g., Montagner and Tanimoto, 1991; Ekström and Dziewonski, 1998; Gaboret et al., 2003; Gung et al., 2003; Smith et al., 2004; Panning and Romanowicz, 2006]. Therefore, interpretations of resolved finer details are less meaningful when effects of anisotropy are not modeled. Nevertheless, in addition to its general agreement with the longer wavelength model SAW642_iso, model 2C has resolved many more fine structures accompanied with significant variance reduction, showing the potential toward a high-resolution model in this region using this method.
 We have presented a new approach to fully exploit the advantages of the model parameterization schemes used in forward modeling and inversion. In particular, besides computational benefits, this approach also adds flexibility to the inversion schemes. We have demonstrated its application with a regional tomography, namely waveform tomography for the Vs structure of the Pacific upper mantle using NACT.
 In the experimental study, the model is first parameterized laterally in terms of SpH up to degree 48. Prior to the inversion, the SpH-based sensitivity matrix is mapped onto 1071 nodes of spherical triangle meshes that cover the study region. After the transformation, the size of the local-function-based sensitivity matrix is significantly reduced. For each path, only about 10–15% of the total nodes, those along and around the minor arc path, receive effective sensitivities, which were originally constructed from all SpH parameters. It is clear that the reduction achieved by the basis transformation will be even more significant when converting higher-degree SpH to smaller-scale local functions.
 The multiscale inversion is particularly important in places where data sampling is highly nonhomogeneous, which is exactly the case for large-scale seismic tomography, particularly at transition zone depths. Actually, the transition zone is the depth range where global Vs models derived from different groups [e.g., Mégnin and Romanowicz, 2000; Masters et al., 2000; Gu et al., 2001; Ritsema et al., 1999; Grand, 2002] exhibit most of their discrepancies [cf. Romanowicz, 2003; Gung et al., 2003]. In addition to their differences in the adopted modeling techniques and data types, we suspect that model parameterization scheme might be one of the major factors responsible for the discrepancies.
 Finally, we should emphasize that although only lateral model bases are transformed, and a specific forward modeling technique and specific model parameterization schemes are adopted in this experimental study, the general concept and application of the two-step model parameterization approach are not restricted to any of the above choices.
 We wish to acknowledge the operators of the IRIS, GEOSCOPE, and F-net for providing high-quality waveform data. We thank two anonymous reviewers and the Associate Editor for their constructive criticisms and suggestions. We also thank Barbara Romanowicz and Shu-Huei Hung for their valuable comments. All graphs have been created using the Generic Mapping Tools package [Wessel and Smith, 1991]. This study is supported by the National Science Council of ROC under the contracts NSC 97-2116-M-002-016- and NSC 97-2611-M-002-010-MY2.