### Abstract

- Top of page
- Abstract
- 1. Introduction
- 2. Data
- 3. Stacks
- 4. Waveform Modeling
- 5. Upper Mantle Corrections
- 6. Forward Modeling Approach
- 7. Niching Genetic Algorithm (NGA)
- 8. Results
- 9. Comparison to PREF
- 10. Discussion
- 11. Conclusions
- Acknowledgments
- References

We examine stacks of several seismic phases having different sensitivities to mantle transition zone structure. When analyzed separately, underside *P* and *S* reflections (*PdP* and *SdS*) are suggestive of very different structures despite similar raypaths and data coverage. By stacking the radial component of *PdP* rather than the vertical *PdP*, we show that this difference does not result from interference from other more steeply inclined phases such as *PKP* and *Ppdp*_{diff}. In general, stacks of *P*-to-*S* converted phases (*Pds*) appear to lack evidence of a 520-km discontinuity when examined without other phases. When these phases and stacked topside *P* reflections (*Ppdp*) are analyzed jointly using a nonlinear inversion method, consistent, but nonunique, seismological models emerge. These models show that a discontinuity at ∼653 km depth has smaller contrasts in density and velocity than found in most previous studies. A sub-660 gradient can account for the majority of this difference. A 1.6 ± 0.5% *P*-velocity contrast and a 2.2 ± 0.3% density contrast at ∼518 km depth without an *S*-velocity contrast can explain the lack of a *P*520*s*, together with robust *Pp*520*p* and *S*520*S* phases. For models parameterized with a finite thickness for each discontinuity, the 410-km discontinuity is consistently ∼3 times thicker than the 660-km discontinuity.

### 1. Introduction

The mantle transition zone is an enigmatic region within the Earth where seismic velocity and density increase rapidly with depth. The transition zone, bounded by discontinuities at ∼410 km and ∼660 km depth (often referred to as the 410 and the 660), is thought to be of key importance for controlling mantle convection and the resulting heat transport within the Earth. These discontinuities are likely due to phase changes of olivine and other minerals as pressure increases with depth [*Ringwood*, 1975; *Jackson*, 1983]. Over the past several decades our understanding of this region has vastly improved due to exciting progress in mineral physics and seismic analysis, and massive increases in data quantity and coverage. A variety of reflected and converted seismic waves are sensitive to the depth and impedance contrasts across the transition zone discontinuities [e.g., *Vinnik*, 1977; *Revenaugh and Jordan*, 1989, 1991; *Shearer*, 1991, 1993]. Using a multitude of wave types and a variety of analyses, many general conclusions have been drawn regarding the nature of the transition zone.

The discontinuities are shown to vary in topography on regional scales [e.g., *Vidale and Benz*, 1992; *Wicks and Richards*, 1993; *Shen et al.*, 1996, 1998; *Vinnik et al.*, 1996; *Dueker and Sheehan*, 1997; *Flanagan and Shearer*, 1998b; *Li et al.*, 1998; *Gilbert et al.*, 2003] and globally [e.g., *Shearer and Masters*, 1992; *Shearer*, 1993; *Gossler and Kind*, 1996; *Gu et al.*, 1998, 2003; *Flanagan and Shearer*, 1998a; *Chevrot et al.*, 1999; *Gu and Dziewonski*, 2002; *Lawrence and Shearer*, 2006] from studies using a variety of data types and processing techniques. The topographic variations of the 410- and 660-km discontinuities result in a thickening of the transition zone near subducted slabs and a thinning of the transition zone elsewhere (especially beneath plumes [*Li et al.*, 2003]). This thickening and thinning is likely the result of thermal variation in the mantle causing the respective phase changes to occur at different pressures (or depths) determined by Clapeyron slopes of opposite signs [e.g., *Bina and Helffrich*, 1994].

The various types of data and processing techniques used to analyze topography and impedance contrasts often yield similar results, but frequently differ to some degree. Figure 1 shows a graphical summary of the different phases discussed in this study. While high-frequency *P*- and *S*-wave triplications are known to result from large velocity contrasts at ∼660 km depth [e.g., *Grand and Helmberger*, 1984; *Walck*, 1984; *Kennett*, 1991; *Ryberg et al.*, 1998], the *PP*-precursors (underside reflections off the discontinuities, *PdP*) indicate little to no impedance contrast at this depth range [e.g., *Estabrook and Kind*, 1996; *Shearer and Flanagan*, 1999].

There are various problems with direct comparison among different types of data and analyses. One difficulty in relating results from different seismic phases is that each has different lateral coverage. While *SS*- and *PP*-precursors provide global coverage, triplication data are limited to seismic stations within ∼33° of earthquakes. Figure 2 graphically represents the lateral data coverage for *Pds*, *PdP*, *SdS*, and *Ppdp*. Another difficulty arises from the frequency band used to study each type of wave. Longer-period waves, such as *SS*-precursors (or *SdS*), may view a sharp gradient as a discontinuity, whereas the shorter period *Pds* (*P*-to-*S* converted phases) can often differentiate between discontinuities and gradients.

Additional problems arise due to different sensitivities in the Earth. These can be characterized in two ways. First, the waves are sensitive to varying scales of structure. For example, while *SdS* has a large X-shaped sensitivity kernel spanning more than 40° by 40° [*Shearer*, 1991; *Shearer and Flanagan*, 1999; *Dahlen*, 2005], *Pds* is only sensitive to a small region beneath each seismic station. Consequently, *Pds* can measure topography on the order of 50 km, while *SdS* studies typically average over 1000-km-wide regions. Nevertheless, the two data types yield roughly similar global patterns in transition zone thickness [*Lawrence and Shearer*, 2006]. Second, the different waves are sensitive to different elastic properties of the mantle. For example, *SdS* is caused by reflectivity resulting from variations in shear velocity (*V*_{S}) and density (*ρ*), while *Pds* results from compressional to shear impedance, which is sensitive to compressional velocity (*V*_{P}), *V*_{S}, and *ρ*.

In this study we characterize the transition zone by analyzing multiple data types: *SdS*, *PdP*, *Pds*, *Ppdp* (Figure 1). The underside reflected *SdS* and *PdP* waves have similar coverage, but differing sensitivities to the elastic parameters. *S*410*S* and *S*660*S* are relatively large amplitude discontinuity phases that result in stable global stacks [e.g., *Shearer*, 1991, 1993; *Flanagan and Shearer*, 1998a; *Gu et al.*, 1998]. The *S*520*S* is much weaker, but still robust once the effects of *S*410*S* and *S*660*S* are accounted for [*Shearer*, 1996; *Ryberg et al.*, 1997]. Consequently, many of the constraints on transition zone structure come from *SdS* stacks. While *P*410*P* is a robust feature, *P*660*P* is surprisingly low amplitude [*Estabrook and Kind*, 1996]. The low amplitude of *P*660*P* appears even more remarkable when compared to that of the topside *P*-wave reflection: *Ppdp*, which has large amplitudes for *Pp*410*p*, *Pp*520*p*, and *Pp*660*p* [*Nguyen-Hai*, 1963; *Husebye and Madariaga*, 1970; *Davies et al.*, 1971; *Gutowski and Kanasewich*, 1974; *Ward*, 1978; *Shearer*, 1991]. *Pds* is known to have a similar global average and lateral variation in transition zone thickness to that of *SdS* [*Lawrence and Shearer*, 2006], which suggests that the two data types are compatible. However, the global stack of *Pds* lacks a *P*520*s* [*Lawrence and Shearer*, 2006], which is in contrast to the robust *S*520*S*.

We test the hypothesis that the apparent discrepancies between the phases described here are largely reconcilable given the different sensitivities. To test this hypothesis we employ ray-theory waveform modeling and compare synthetic waveforms to observed stacks of each wave type. In doing so we solve for one model that best fits all of the data. Due to the complexity of this endeavor, linear inversions are impractical. Instead we use a mass-forward modeling technique, called the niching genetic algorithm (NGA), to locate the optimal solution. While computationally expensive relative to linear inversions, the NGA uses an evolutionary paradigm to search the entire model space, and efficiently iterates toward the best solution. One advantage of this technique is that it allows for model comparison and trade-off analysis. We discuss the limitations of our model, data, and technique, and then draw conclusions from the robust features of the best-fitting model.

### 3. Stacks

The stacking process for each wave is automated in a similar manner to *Shearer* [1991] in order to minimize bias. After performing quality control and preprocessing as described above, the waveforms are aligned on the *P*, *PP*, or *SS* reference phase. The time shift for each record is simply the time of maximum absolute amplitude within the “signal” window. The polarity is reversed for negative amplitude peaks. The amplitude of each wave is normalized such that the picked amplitude is set to the SNR, which emphasizes clean data and dampens noisy data. All amplitudes greater than ± SNR (after the normalization) are capped at ± SNR. Waveforms are binned and stacked according to event-to-station distance, with a bin size of 0.5°. A nine-point mean smoothing filter is applied to the 2-D stack of amplitude plotted on time versus epicentral distance (Figure 3). The stacking for *Pds* varies from the others in only two respects. First, the vertical component is spectrally deconvolved from the radial component after alignment and prior to weighting. A water level of 0.02 and a Gaussian filter width of 0.4 are used to stabilize the spectrally deconvolved receiver functions [e.g., *Ammon*, 1991; C. J. Ammon, An overview of receiver-function analysis, http://eqseis.geosc.psu.edu/∼cammon/HTML/RftnDocs/rftn01.html, 2006]. Second, the receiver functions are stacked into 1° bins rather than 0.5° bins.
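The weighting and binning steps described above can be sketched as follows. This is a minimal illustration rather than the processing code used for the stacks: the function names, the `window` index pair standing in for the “signal” window, and the use of a simple mean within each bin are assumptions, and the traces are taken to be already aligned on the reference phase.

```python
def weight_trace(trace, snr, window):
    """Reverse polarity if needed, set the picked peak to the SNR, clip at +/- SNR.

    `window` is a hypothetical (start, stop) index pair for the signal window.
    """
    i0, i1 = window
    peak = max(trace[i0:i1], key=abs)   # sample of maximum absolute amplitude
    if peak == 0:
        return [0.0] * len(trace)
    scale = snr / peak                  # a negative peak flips the polarity
    return [max(-snr, min(snr, x * scale)) for x in trace]


def stack_by_distance(records, window, bin_width=0.5):
    """Average weighted traces in distance bins.

    records: iterable of (distance_deg, snr, aligned_trace).
    Returns {bin_index: mean_trace}; averaging (rather than summing) is an assumption.
    """
    sums, counts = {}, {}
    for dist, snr, trace in records:
        b = int(dist // bin_width)
        w = weight_trace(trace, snr, window)
        if b not in sums:
            sums[b] = [0.0] * len(w)
            counts[b] = 0
        sums[b] = [a + c for a, c in zip(sums[b], w)]
        counts[b] += 1
    return {b: [v / counts[b] for v in s] for b, s in sums.items()}
```

Because the peak is set to the SNR before clipping, a noisy record contributes little to its bin, while any sample outside the signal window that exceeds the picked peak is capped.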

The stacked amplitudes are plotted on time versus distance maps with positive amplitudes in blue and negative amplitudes in red (Figures 3–5). The amplitudes are relative to the reference phase (*P*, *PP*, or *SS*), and saturation levels are noted in each subpanel. As observed by numerous other studies [e.g., *Shearer*, 1991; *Shearer and Flanagan*, 1999; *Gu et al.*, 1998, 2003], the stacked *SdS* phases stand out several minutes prior to *SS* as having signal well above the noise (Figure 3a). The *P*410*P* is visible in the distance ranges 100°–118° and 130°–140° on the vertical *PP* stack (Figure 3c). The *P*520*P* and/or sidelobe of the *P*410*P* is visible between 106° and 123°. However, *P*660*P* is difficult to see in this distance range due to low amplitude, interfering waves, or both. On the radial *PP* stack (Figure 4), the interfering waves are damped more than the *PdP* phases because these waves arrive more steeply and therefore are recorded with lower amplitudes on the horizontal component (Figure 5). While the radial stack is less stable because the radial waveforms have lower SNR, the *P*410*P* is clearly visible without interference between 84° and 140°. If the *P*520*P* and *P*660*P* were of similar amplitude to *P*410*P*, these phases would also be visible on the radial stack, but they are not.

The stacked *Ppdp* are clearly visible on the vertical *P*- and *PP*-wave stacks (Figures 3c and 3d) for the 410-, 520-, and 660-km discontinuities. The *Pp*410*p* is visible from 52° to 120°, while the *Pp*660*p* is limited to between 70° and 120°. We do not analyze *Ppdp* beyond 90° to limit contamination due to core-mantle boundary phases and the heterogeneous lowermost mantle. The *Pp*520*p* and/or the sidelobes of the *Pp*660*p* and *Pp*410*p* are visible from 57° to 90°. The *Ppdp* all have negative amplitudes relative to the direct *P* wave, so they appear red. The *P*660*s* and *P*410*s* appear as robust phases on the receiver function plot (vertical deconvolved from radial; Figure 3b) from 40° to 90°, but the *P*520*s* is absent in this range.

The 2-D stacks are collapsed into 1-D stacks by summing the stacks for each distance and depth along the appropriate set of moveouts associated with the desired phase (Figures 3e–3h). This summation is conducted prior to smoothing the 2-D stacks to avoid unnecessary pulse broadening/damping. This is achieved with minimal waveform distortion by first interpolating from travel time to bounce depth or conversion depth using the correct distance range for each bin, and then interpolating depth back to time using a single distance. In this manner, the 2-D stack is correctly collapsed for all times and distances, and bias is reduced by not collapsing the stacks along a single moveout associated with a particular depth. Times associated with negative depths are simply stacked with the moveout of the reference phase of the initial stack (*P*, *PP*, or *SS*). Some features appear more clearly in these 1-D stacks than in the 2-D stacks. For example, *P*520*P* and *P*660*P* are visible in the 1-D stack, whereas they were below the noise in the 2-D stacks. Error bounds for each stacked waveform are determined using a bootstrap method.
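The bootstrap error bounds mentioned above can be illustrated with a short sketch. The text does not give the details of the resampling, so the choices here (resampling traces with replacement, 200 resamples, a simple mean stack, and the function name) are assumptions for illustration only.

```python
import random
import statistics


def bootstrap_stack_std(traces, n_resamples=200, seed=0):
    """Estimate the standard deviation of a mean stack at each time sample.

    traces: list of equal-length traces that contribute to one stack.
    Each resample draws len(traces) traces with replacement and restacks them.
    """
    rng = random.Random(seed)
    n = len(traces)
    npts = len(traces[0])
    resampled = []
    for _ in range(n_resamples):
        picks = [traces[rng.randrange(n)] for _ in range(n)]
        resampled.append([sum(t[i] for t in picks) / n for i in range(npts)])
    # Spread of the resampled stacks at each time sample.
    return [statistics.pstdev([m[i] for m in resampled]) for i in range(npts)]
```

The resulting per-sample standard deviations play the role of the uncertainties used later to normalize the waveform misfit.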

### 4. Waveform Modeling

In this section we describe the creation of synthetic waveforms and the technique we use to compare transition zone structure with observed waveforms. We employ the following ray-theoretical method because it is fast and computationally efficient. As discussed below, we calculate hundreds of thousands of synthetic waves in a mass-forward modeling technique, so computational efficiency is crucial. As with *Shearer* [1996] we start with the assumption that the source function is given by the reference phase of each stack (*P*, *PP*, or *SS*), and that only the 1-D elastic structure in the transition zone is important. These assumptions are fair considering that the reference phase waveforms are stacked from thousands of traces. Consequently, the reference phases do approximate source time functions [e.g., *Shearer*, 1991], and the laterally varying structure is effectively averaged into 1-D. Therefore the waveform is easily calculated by convolving the transition zone's elastic response function with the reference phase.

We follow the subsequent steps in the calculation of each synthetic waveform. Figure 6 graphically illustrates the steps involved in the computation for a synthetic *SS*-precursor, but the technique is equivalent for each wave type. First, we construct a discrete 1-D velocity and density model as a function of depth, *V*(*z*). Then, given an event-to-station distance, we calculate the ray theoretical travel time of a reflection or *P*-to-*S* conversion at each depth, and map the velocity structure into time, *V*(*t*). The amplitude of a reflected or transmitted phase is calculated from the reflection or transmission coefficient [*Aki and Richards*, 1980] associated with each depth or time, *R*(*t*). The amplitudes of this Earth response function are corrected, *s*(*t*), by scaling according to the change in amplitude, *A*(*t*), due to geometric spreading and anelastic decay: *s*(*t*) = [*A*(*t*)/*A*(*t*_{0})] *R*(*t*), where *t*_{0} is the time of the reference phase. We only illustrate *s*(*t*) here and not *R*(*t*) because the geometric spreading factor is typically near unity for raypaths that are similar to the reference phase. Therefore *R*(*t*) and *s*(*t*) are very similar. For geometric spreading we use [*r*_{0}/*r*(*t*)]^{2}, where *r* is the distance traveled by the phase reflected/transmitted at a depth corresponding to time *t*. For anelastic decay we employ the quality factor model, QL6 [*Durek and Ekstrom*, 1996]. The final step is to convolve the reference pulse, *Ref*(*t*), with the amplitude corrected Earth response function: *S*(*t*) = *Ref*(*t*) ∗ *s*(*t*), where ∗ denotes convolution. The calculation of a synthetic receiver function, *S*(*t*), varies in that the vertical synthetic, *S*_{Z}(*t*), is deconvolved from the horizontal synthetic, *S*_{R}(*t*), and that multiple phases are modeled simultaneously (*Pds* and *Ppdp*). For consistency, the spectral division used in calculation of the synthetic receiver functions is stabilized with a water level of 0.02 and Gaussian filter width of 0.4, in the same manner as the observed receiver functions.
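The last two steps of the synthetic calculation (amplitude correction and convolution with the reference pulse) can be sketched as below. The function names are hypothetical, and the reflection coefficients and amplitude ratios are assumed to be already mapped onto a common, evenly sampled time axis; the ray tracing and coefficient calculations are not shown.

```python
def convolve(a, b):
    """Full discrete convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def synthetic_waveform(ref_pulse, refl_coeffs, amp_ratio):
    """Return S(t) = Ref(t) * s(t), with s(t) = [A(t)/A(t0)] R(t).

    refl_coeffs: R(t), reflection or transmission coefficient at each time sample.
    amp_ratio:   A(t)/A(t0), spreading and attenuation relative to the reference phase.
    """
    s = [a * r for a, r in zip(amp_ratio, refl_coeffs)]
    return convolve(ref_pulse, s)
```

With an amplitude ratio near unity, as noted above for raypaths similar to the reference phase, the synthetic is essentially the reference pulse scaled by the reflection coefficient and delayed to the arrival time of each interface.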

Under the assumption that the transition zone discontinuities have laterally varying topography, and that this topography causes pulse broadening in the observed stacks, we apply pulse broadening to our synthetic waveforms. The distributions of discontinuity topography from *Flanagan and Shearer* [1998a] are assumed to represent the true distribution of topography. Standard deviations of 21.8, 28.0, and 33.8 km are calculated for the 410-, 520-, and 660-km discontinuities respectively. We divide the discontinuity among the depths within 2 standard deviations of the modeled depth giving each depth a velocity increase proportional to modeled discontinuity jump multiplied by the Gaussian operator associated with that depth. This has the effect of smoothing the discontinuities resulting in pulse broadening.
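A minimal sketch of this smoothing is given below. It distributes a modeled jump over all depth nodes within two standard deviations of the modeled depth using Gaussian weights. Normalizing the weights so that the increments sum to the full jump is an assumption, as are the function and argument names.

```python
import math


def spread_jump(depths, center, jump, sigma, n_sigma=2.0):
    """Split a discontinuity `jump` among depth nodes near `center`.

    Nodes farther than n_sigma * sigma from the modeled depth receive nothing.
    Weights are normalized (an assumption) so the total jump is preserved.
    """
    weights = []
    for z in depths:
        if abs(z - center) <= n_sigma * sigma:
            weights.append(math.exp(-0.5 * ((z - center) / sigma) ** 2))
        else:
            weights.append(0.0)
    total = sum(weights)
    return [jump * w / total for w in weights]
```

For example, with `sigma=21.8` and a modeled depth of 410 km on a 1-km grid, the increments are nonzero only between 367 and 453 km and are largest at 410 km.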

### 5. Upper Mantle Corrections

One-dimensional models produce artifacts due to three-dimensional heterogeneities in crustal and mantle structure. One problem with heterogeneity is that the average crust and mantle structure, as sampled by the various phases, is different from the geographically averaged crustal structure predicted by the model, AK135(-F) [*Montagner and Kennett*, 1996; *Kennett et al.*, 1995; *Engdahl et al.*, 1998]. Three-dimensional heterogeneity in the upper mantle has differing effects on the global stacks of the various discontinuity phases due to different data coverage. Therefore, in order to model the 1-D transition zone structure accurately, we must account for the effects of three-dimensional crust and upper-mantle structure. Theoretical travel time residuals between a reference phase and discontinuity phases (e.g., *SS*-*SdS*) are determined by tracing the appropriate rays through 3-D velocity models of the mantle (SB10L18 [*Masters et al.*, 2000]) and the crust (CRUST 2.0 [*Bassin et al.*, 2000]). The model SB10L18 was chosen because it accounts for both *P*- and *S*-velocity anomalies equally. We calculate three (one for each discontinuity) theoretical travel time residuals for each waveform that goes into each stack. These travel time residuals are migrated back to the reference distance (or ray parameter) for each 1-D stack. The average travel time residual for each stacked waveform is nonzero because the data coverage is uneven. In order to facilitate comparison between the observed and synthetic waveforms, we shift the synthetic waveforms by the theoretical travel time residual calculated for each phase. We apply the theoretical travel time residual as a single average time shift with opposite sign for the whole synthetic waveform rather than distorting the waveform by shifting each phase independently. These time shifts are 0.4 s for *Pds*, 0.2 s for *Ppdp*, 0.5 s for *PdP*, and 0.9 s for *SS*. The net result of these corrections is to deepen the transition zone interfaces by ∼3 km.

*Shearer* [1991] and *Flanagan and Shearer* [1998a] tested for a systematic offset in *SS* travel times due to upper-mantle structure by computing the *SS-S* travel time residual, *δt*, relative to the reference model. This was accomplished by cross-correlation of the Hilbert transform of the *S* wave with the *SS* wave for the stacked waves with distances from 65° to 95°. In practice, the correction is ill-constrained, varying as a function of distance, the reference quality factor model, and whether the reference stack is *S* or *SS*. Beyond a distance of ∼90° the *S* wave interacts with the highly heterogeneous lowermost mantle and the core-mantle boundary, which contaminates the *δt* value. The value of *δt* varies between 0.03 s and 1 s for different subsets of stacked waveforms from 65° to 85° using the reference model, QL6 [*Durek and Ekstrom*, 1996] with a reference stack of *S*. The value ranges from −0.2 s to 0.7 s for an equivalent stack referenced to *SS*. The *δt* values change by ±0.6 s when using different quality factor models (PREM [*Dziewonski and Anderson*, 1981], PAR3P [*Okal and Jo*, 1990], QM1 [*Widmer et al.*, 1991], AK135(-F) [*Montagner and Kennett*, 1996] and QLM9 [*Lawrence and Wysession*, 2006]). The best estimate we have of the *SS-S* correction is 0.4 ± 0.7 s. The equivalent estimate for *PP-P* is −0.2 ± 0.5 s. Because of the small value and large error bars of these corrections, we choose to ignore them.

### 6. Forward Modeling Approach

We determine the model that best explains the data by searching the whole model space with an automatic algorithm rather than searching the model space by hand. While trial and error forward modeling is a reasonable approach to fit a single discontinuity phase [e.g., *Shearer*, 1996], it is difficult to account for tradeoffs between the model and multiple waveforms by hand. Linear inversion is not always stable in an environment where tradeoffs exist between parameters (e.g., *V*_{S}, *V*_{P}, and *ρ*, and depth). Consequently, we choose a parameterization that requires as few free variables as possible so that mass forward modeling is made feasible. Given the parameterization described below, with only 13 search variables (3 Δ*V*_{S}, 3 Δ*V*_{P}, 3 Δ*ρ*, 3 discontinuity depths, and 1 peg depth), a simple two-stage grid search would require the creation of more than 2 × 10^{13} models with associated synthetic waveforms to locate a reasonable model without severely limiting the viable multidimensional parameter space. The creation of so many models is not practical. As discussed below, even a more sophisticated search algorithm requires the computation of 10^{4}−10^{6} models and associated synthetic waveforms to ensure that the whole model space is searched, even with a simplified parameterization.

We model three discontinuities with 0–12% contrasts in *V*_{S}, *V*_{P}, and *ρ* at 410 ± 25, 520 ± 25, and 660 ± 25 km depth. Above and below each discontinuity, the velocities and density increase with the same gradients as AK135(-F) [*Montagner and Kennett*, 1996]. AK135(-F) is a modified version of AK135 [*Kennett et al.*, 1995; *Engdahl et al.*, 1998] that includes a 1-D density profile of the Earth that is constrained by normal modes in addition to fitting the ISC travel times of *P* and *S* waves. Hereafter, AK135(-F) is referred to as AK135. Additionally, we model a steep velocity gradient beneath the 660-km discontinuity with a single parameter describing a peg-depth (700–820 km), below which the model is set to AK135. This sub-660 gradient is defined by the modeled bottom-side *V*_{S}, *V*_{P}, and *ρ* at the 660-km discontinuity and the AK135 velocity and density at the peg depth. This parameterization allows us to model *V*_{S}, *V*_{P}, and *ρ* from 300 km to 850 km with only the 13 parameters described in Table 1. More complex parameterizations with 16 and 17 variable parameters are also discussed below.

Table 1. Parameterization 1

| Discontinuity | *V*_{S}, % | *V*_{P}, % | *ρ*, % | Depth, km |
|---|---|---|---|---|
| 410 | 0–12 | 0–12 | 0–6 | 385–435 |
| 520 | 0–12 | 0–12 | 0–6 | 495–545 |
| 660 | 0–12 | 0–12 | 0–6 | 635–685 |
| Sub-660 Peg-depth | - | - | - | 700–820 |
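The 13-parameter search space of Table 1 can be written down compactly. The sketch below is only an illustration: the parameter names and the uniform random sampling are assumptions, and the density bounds follow the table.

```python
import random

# Search bounds taken from Table 1; the dictionary keys are hypothetical names.
BOUNDS = {}
for name, (z_min, z_max) in {"410": (385.0, 435.0),
                             "520": (495.0, 545.0),
                             "660": (635.0, 685.0)}.items():
    BOUNDS["dVs_" + name] = (0.0, 12.0)     # S-velocity contrast, %
    BOUNDS["dVp_" + name] = (0.0, 12.0)     # P-velocity contrast, %
    BOUNDS["drho_" + name] = (0.0, 6.0)     # density contrast, %
    BOUNDS["depth_" + name] = (z_min, z_max)  # discontinuity depth, km
BOUNDS["peg_depth"] = (700.0, 820.0)        # depth below which the model is AK135, km


def random_model(rng):
    """Draw one candidate model uniformly within the search bounds."""
    return {key: rng.uniform(lo, hi) for key, (lo, hi) in BOUNDS.items()}
```

A call such as `random_model(random.Random(1))` returns one candidate with all 13 parameters inside the bounds, which is the form of model a search algorithm would then pass to the forward calculation.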

### 7. Niching Genetic Algorithm (NGA)

Rather than applying a simple grid search we use a niching genetic algorithm (NGA) [*Mahfoud*, 1995; *Koper et al.*, 1999], which uses an evolutionary paradigm, to search the model space efficiently for locally optimal and globally optimal solutions. We refer readers to *Koper et al.* [1999] for a general overview of NGA and its application to geophysical problems, and only provide a cursory description here. A standard genetic algorithm operates by first creating a population of random models and comparing the forward model with the observed data. Models associated with high misfit, or cost, are removed from the population of models. Models with low cost continue on to the next generation, where new models are constructed from random perturbation and cross-breeding between the best models. In this manner, after several generations, only models associated with low cost are retained, and the population converges toward an optimal solution. The NGA is a compound version of the genetic algorithm, where multiple genetic algorithms, each controlling a subpopulation, compete for a portion of the model space. The competition is imposed by applying an artificially high cost to any model in a lower-order subpopulation that is sufficiently similar to the best models of the higher-order subpopulation.
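The scheme described above can be illustrated with a toy implementation. This is a simplified sketch, not the algorithm of *Koper et al.* [1999]: the selection rule (keep the better half), uniform crossover, Gaussian mutation, the Euclidean niche radius, and all default values are assumptions chosen for brevity.

```python
import random


def nga(cost, bounds, n_sub=2, pop_size=20, n_gen=40,
        niche_radius=0.5, penalty=1e6, seed=0):
    """Toy niching genetic algorithm; returns the best model of each subpopulation."""
    rng = random.Random(seed)

    def rand_model():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def clip(m):
        return [min(hi, max(lo, x)) for x, (lo, hi) in zip(m, bounds)]

    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

    subpops = [[rand_model() for _ in range(pop_size)] for _ in range(n_sub)]
    bests = [None] * n_sub
    for _ in range(n_gen):
        for k in range(n_sub):
            def niched_cost(m, k=k):
                # Artificially high cost near the best model of any higher-order subpopulation.
                crowded = any(bests[j] is not None and dist(m, bests[j]) < niche_radius
                              for j in range(k))
                return cost(m) + (penalty if crowded else 0.0)

            ranked = sorted(subpops[k], key=niched_cost)
            survivors = ranked[: pop_size // 2]      # high-cost models are removed
            bests[k] = survivors[0]
            children = []
            while len(survivors) + len(children) < pop_size:
                p1, p2 = rng.sample(survivors, 2)
                child = [rng.choice(pair) for pair in zip(p1, p2)]          # cross-breeding
                child = [x + rng.gauss(0.0, 0.05 * (hi - lo))
                         for x, (lo, hi) in zip(child, bounds)]            # random perturbation
                children.append(clip(child))
            subpops[k] = survivors + children
    return bests
```

On a cost function with two minima, the first subpopulation is free to settle into the lower minimum, while the penalty pushes the second subpopulation toward the other one, which is how alternate solutions are retained.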

The cost is calculated as the normalized sum of squared differences between the observed and synthetic waveforms. For each stacked waveform at each discrete time step, *i*, we have a standard deviation (*σ*) calculated using the bootstrap method. We therefore normalize each squared difference by the variance (*σ*^{2}) before summing. The sum is then divided by the number of points, *M*, so the NGA does not favor one waveform over another. The total misfit cost is described by equation (1):
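One form of the cost that is consistent with this description is written out here as an assumption, not as a reproduction of the authors' equation (1). The symbols *W* (number of stacked waveforms), *u*^{obs} and *u*^{syn} (observed and synthetic amplitudes), and the simple additive treatment of the a priori constraint term *A* are introduced for illustration only.

```latex
C \;=\; \sum_{w=1}^{W} \frac{1}{M_w} \sum_{i=1}^{M_w}
      \frac{\left[ u^{\mathrm{obs}}_{w,i} - u^{\mathrm{syn}}_{w,i} \right]^{2}}{\sigma_{w,i}^{2}}
      \;+\; A
```

Dividing each waveform's sum by its own number of points, *M*_{w}, keeps a long stacked waveform from dominating a short one.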

We place several a priori constraints, *A*, on the forward modeling process to ensure that it converges toward a realistic solution. Under the assumption that the mass of the Earth is well resolved, we impose a cost associated with excess or shortage of mass relative to AK135; the cost is equal to the difference in the sums of density divided by the number of layers (500). We do not directly impose moment of inertia constraints because we only model small density differences from AK135, so the redistribution of mass is minor. Rather than modeling the waveform at one distance, we model the waves at two distances, which imposes the general constraints of AK135 on the model because the stacked waveforms are summed along the moveout prescribed by AK135. These distances are 120° and 140° for *PP* and *SS*, and 70° and 85° for *Pds* and *Ppdp*. For graphical purposes in the following figures (Figures 7–13) we only present waveform fits associated with the greater distance for each waveform. As a secondary precaution we add another cost equal to the difference in vertical travel times from AK135 for *P* and *S* waves. Therefore the resultant model should not violate the data that were used in the creation of AK135.
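The two explicit penalties described above can be sketched as follows. The function names, the use of absolute differences (so that excess and shortage are both penalized), and the unweighted sum of the terms are assumptions; the text does not state how the penalties are scaled relative to the waveform misfit.

```python
def mass_penalty(model_density, reference_density):
    """Difference in the sums of density divided by the number of layers."""
    n_layers = len(model_density)
    return abs(sum(model_density) - sum(reference_density)) / n_layers


def vertical_travel_time(thickness_km, velocity_km_s):
    """Vertical travel time through a stack of layers, in seconds."""
    return sum(h / v for h, v in zip(thickness_km, velocity_km_s))


def travel_time_penalty(thickness_km, model_velocity, reference_velocity):
    """Absolute difference in vertical travel time between model and reference."""
    return abs(vertical_travel_time(thickness_km, model_velocity)
               - vertical_travel_time(thickness_km, reference_velocity))


def a_priori_cost(thickness_km, model, reference):
    """Sum of the mass penalty and the P and S vertical travel time penalties.

    model, reference: dicts with "rho", "vp", and "vs" lists (hypothetical layout).
    """
    return (mass_penalty(model["rho"], reference["rho"])
            + travel_time_penalty(thickness_km, model["vp"], reference["vp"])
            + travel_time_penalty(thickness_km, model["vs"], reference["vs"]))
```

A model identical to the reference incurs zero a priori cost, so these terms only matter when the search drifts away from the mass and travel times implied by AK135.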

There are several advantages of using a niching genetic algorithm [e.g., *Koper et al.*, 1999]. First, the NGA is faster and more computationally efficient than a standard grid search. Second, it does not depend upon a starting model. Third, it allows the user to examine a suite of models, rather than just one. This is important for understanding tradeoffs and placing error bounds on the best estimate. Fourth, the NGA locates local minima with each subpopulation so that if alternate solutions exist they will be located.

### 9. Comparison to PREF

The gradual increase in *V*_{S}, *V*_{P}, and *ρ* through the transition zone is assumed to be largely due to an adiabatic increase in temperature and pressure. The discontinuities likely represent temperature and pressure conditions where phase changes occur. High-pressure mineral physics experiments show that phase changes can produce rapid changes in elastic moduli and density [e.g., *Ringwood*, 1975]. In this section we compare the best-fitting seismic models described above with seismic models constrained by geochemistry and mineral physics.

*Cammarano et al.* [2005] provide 99 seismic models constrained by pyrolitic composition that fit ISC *P*- and *S*-wave travel times and fundamental spheroidal and toroidal modes as satisfactorily as AK135. These 99 physical reference models (PREF) fit the travel times and fundamental modes best from 100,000 models used in a Monte Carlo type inversion that varied 70 mineral physics parameters. In general these models have lower velocities above the transition zone, a larger jump near the 410-km discontinuity, lower gradients in the transition zone, and higher gradients below the 660-km discontinuity than AK135. We compare these 99 seismic models with the stacked waveforms and our lowest misfit model (Figure 11). The pyrolitic models fit the stacked waveforms better than AK135, but worse than our best-fitting models. Because the pyrolitic models lack a 520-km discontinuity, phases associated with this discontinuity are missing in the synthetic waveforms. For all waves, the 410-km discontinuity phases are matched well. While the amplitudes of the 660-km discontinuity phases are better than those of AK135, the times of these phases are off because the depth of the interface is greater (∼665 km).

There are necessary differences between the pyrolitic models and those presented here due to their respective constraints. Our models are not constrained by fundamental modes, and are only constrained to *P*- and *S*-wave travel times through similarity to AK135. Additionally, the gradients above, within, and below the transition zone are set to AK135, so the pyrolitic models are outside the model space explored here. The pyrolitic models are limited by their pyrolitic composition and poorly constrained mineral physics parameters. The suite of models presented by *Cammarano et al.* [2005] has no 520-km discontinuity, which limits its appropriateness.

Despite differences between the PREF models and our models, there are several significant similarities. Both PREF and our models have relatively shallow gradients for the 410 and relatively steep gradients beneath the 660 compared to AK135. Both sets of models indicate that the 410 has greater density and velocity contrasts than the 660, and that the 660 is sharp relative to the 410. The fact that mineral physics predicts these observations on the basis of a pyrolitic composition suggests that these may be robust features. The PREF models lack parameterization of a second interface related to the garnet phase transformation, so it is difficult to compare the less optimal solution with two interfaces with PREF.

### 10. Discussion

The forward modeling and simulated inversion techniques used here make some key assumptions. First, the stacks are assumed to be representative of the spherically averaged global structure. However, as shown in Figure 2, the lateral coverage for each phase is quite different, and is generally uneven. Additionally, the stacks are collapsed to two distances for each wave, but the same data are used for each stack, and the amplitudes change with distance for the synthetic waveform calculation, so the inversion process must compromise between the two distances. It is possible that this causes some bias; however, our results change only slightly if only one distance or different distances are chosen. We take lateral variations in discontinuity topography into account using the probability density function of the long-wavelength model of *Flanagan and Shearer* [1998a], but the transition zone discontinuities are known to vary by different amounts on short-wavelength scales [e.g., *Li et al.*, 2000, 2003]. Consequently, there may be some differences in pulse broadening.

The data used here are all long-period seismograms sampled at 1 Hz. The waveforms were band-limited between 0.2 and 0.02 Hz or 0.1 and 0.01 Hz, which reduces the sensitivity to the sharpness of the discontinuities. At low frequencies the synthetics do not distinguish between steep gradients and sharp interfaces. At higher frequencies, only sharp discontinuities are observed. So future experiments may benefit from similar broadband analyses or multiple band-limited analyses. However, complications due to the frequency dependence of anelasticity, and the lack of signal coherence at higher frequencies may impede such experiments.
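The loss of sensitivity to sharpness at long periods can be illustrated with a rough wavelength estimate. The sketch below is not part of the original analysis: the transition-zone velocities are assumed, approximate values chosen only for illustration, and the quarter-wavelength comparison is a common rule of thumb rather than a criterion used in this study.

```python
# Assumed, approximate transition-zone velocities in km/s (illustrative only,
# not values taken from the models in this study).
ASSUMED_VP_KM_S = 9.5
ASSUMED_VS_KM_S = 5.2


def wavelength_km(velocity_km_s, frequency_hz):
    """Wavelength of a wave with the given velocity and frequency."""
    return velocity_km_s / frequency_hz


def quarter_wavelength_km(velocity_km_s, frequency_hz):
    """Quarter wavelength, a rule-of-thumb scale below which a gradient
    zone is hard to tell apart from a sharp interface."""
    return wavelength_km(velocity_km_s, frequency_hz) / 4.0


# Upper corner frequencies of the two passbands quoted in the text.
for f_max_hz in (0.2, 0.1):
    for phase, velocity in (("P", ASSUMED_VP_KM_S), ("S", ASSUMED_VS_KM_S)):
        lam = wavelength_km(velocity, f_max_hz)
        quarter = quarter_wavelength_km(velocity, f_max_hz)
        print(f"{phase} at {f_max_hz} Hz: wavelength {lam:.1f} km, "
              f"quarter wavelength {quarter:.1f} km")
```

Under these assumed velocities, even the shortest wavelengths in the passbands are tens of kilometers, which is comparable to or larger than plausible discontinuity widths.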

The most optimal models for all three parameterizations of the simulated inversion indicate average discontinuity depths of 413 ± 1 km, 517 ± 4 km, and 654 ± 1 km. If the 660 is actually composed of two distinct discontinuities [e.g., *Deuss et al.*, 2006] and no underlying steep gradient, as found by the locally optimal model described in Figure 8 and Table 4, then the discontinuity depths are likely 414 ± 1 km, 519 ± 3 km, 663 ± 3 km, and 699 ± 11 km. In this parameterization the transition zone is 249 km thick. Although the absolute depths for the 410, 520, and 660 may be biased by the use of AK135 above and below the transition zone, the relative values should not be significantly biased. While we attempt to correct for 3-D heterogeneity, it is never possible to be certain that the 3-D corrections are completely correct. Because the 3-D corrections must account for *P*- and *S*-wave velocity, few 3-D models of mantle velocity are capable of producing both corrections accurately. The *P*- and *S*-wave velocity model used here, SB10L18 [*Masters et al.*, 2000], has lower resolution (10°) compared to individual *P*- [e.g., *Montelli et al.*, 2003] or *S*-wave models (e.g., SB4L18 [*Masters et al.*, 2000]). However, the resolution is equivalent for both *P*- and *S*-wave velocity in SB10L18, so the corrections do not add bias due to uneven resolution. Consequently, the observation of a ∼241 ± 2 km thick transition zone is robust and consistent with previous stacking studies [*Shearer*, 1996; *Flanagan and Shearer*, 1998a; *Gu et al.*, 1998, 2003; *Lawrence and Shearer*, 2006].
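The thickness values quoted above follow from differencing the 410 and 660 depths. The following sketch reproduces that arithmetic. The quadrature combination of uncertainties is an assumption made here (independent errors); it is not necessarily how the uncertainties quoted in the text were derived, and the function name is hypothetical.

```python
import math


def tz_thickness(d410, d660):
    """Return (thickness_km, sigma_km) from two (depth_km, sigma_km) tuples.

    Assumes the two depth errors are independent, so they add in quadrature.
    """
    thickness = d660[0] - d410[0]
    sigma = math.hypot(d410[1], d660[1])
    return thickness, sigma


# Depths quoted in the text for the single-interface 660 parameterizations.
single = tz_thickness((413.0, 1.0), (654.0, 1.0))

# Depths quoted for the double-interface 660 model (upper interface used).
double = tz_thickness((414.0, 1.0), (663.0, 3.0))

print(f"single-interface 660: {single[0]:.0f} km (sigma {single[1]:.1f} km)")
print(f"double-interface 660: {double[0]:.0f} km (sigma {double[1]:.1f} km)")
```

The differences are 241 km and 249 km, matching the thicknesses stated in the text.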

The agreement between the depths of the seismic discontinuities (at approximately 410, 520, and 660 km) and the corresponding pressures of experimentally determined phase changes suggests that the two are directly linked. There is a wide range of reported depths of each discontinuity [*Shearer*, 2000], which is partially due to lateral variation in topography. The best estimates for the average depths of each discontinuity likely come from the stacking of long-period *SS*-precursors. *Flanagan and Shearer* [1998a] found average depths of 418, 515, and 660 km. *Gu and Dziewonski* [2002] observed averages of 411 and 654 km depth. *Gu et al.* [2003] inverted for gradual velocity heterogeneity and transition zone discontinuity topography simultaneously, resulting in average discontinuity depths of 409 and 649 km. The differences between these studies likely stem from differences in travel time corrections due to upper-mantle and crustal structure. While the absolute depths are muddled by different corrections from 1-D and 3-D velocity structures, *Flanagan and Shearer* [1998a], *Gu et al.* [1998], and *Gu et al.* [2003] agree on transition zone thickness, 242 ± 2 km. Recently, *Lawrence and Shearer* [2006] used receiver functions to demonstrate a similar spherically averaged transition zone thickness (242 km).

The amplitudes of the velocity and density contrasts at each discontinuity are much less certain than the topography due to ambiguities in seismic modeling and greater scatter in the amplitude data than in travel time data. Yet constraints on density and velocity contrasts for each discontinuity are important for mineral physics constraints on geodynamic modeling of mass and heat transport between the upper and lower mantle. Therefore these seismic constraints have large implications for global convection. Previous seismic studies provide a range of velocity and density contrasts for the 410 (Δ*V*_{P}(410) = 5.5 ± 2.5%, Δ*V*_{S}(410) = 4.9 ± 0.8%) and the 660 (Δ*V*_{P}(660) = 4.5 ± 2.5%, Δ*V*_{S}(660) = 6.8 ± 0.5%) [*Shearer*, 2000]. The elastic contrasts of the 520-km discontinuity are much less certain. Most studies model the density and velocity contrasts at 410, 520 and 660 km with single first order discontinuities, rather than as a gradient, a discontinuity underlain by a steep gradient, or two distinct interfaces, so these estimates are biased by the assumed geometries.

In this study we obtain velocity and density contrasts for each interface as described by Tables 2, 4, and 6. Because we model multiple seismic phases rather than just one, we have more constraints, which reduces the tradeoff between the velocity and density contrasts. Nevertheless, some tradeoffs still exist. The best-fit model having finite width discontinuities has 410-km discontinuity contrasts of Δ*V*_{P} = 6.1%, Δ*V*_{S} = 7.0%, and Δ*ρ* = 6.9%. These values are more similar to the theoretical contrasts for a pyrolitic composition (Δ*V*_{P} = 5 ± 0.5%, Δ*V*_{S} = 8 ± 1%, and Δ*ρ* = 4.3 ± 0.3%) [*Weidner and Wang*, 2000] than those of previous works [*Shearer*, 2000]. However, the theoretical results can vary significantly for small changes in temperature and composition. Alternatively, the best-fit model having a first-order 410-km discontinuity (Table 2) has lower contrasts for all values (Δ*V*_{P} = 4.8%, Δ*V*_{S} = 5.1%, and Δ*ρ* = 4.8%), which are more in line with previous studies [*Shearer*, 2000]. This study cannot differentiate between a thin and thick 410-km interface due to the long wavelength of the data used here, so nonuniqueness exists. The transition zone likely has 410-km contrasts between those presented in Tables 2 and 6.

The 660-km discontinuity is more difficult to compare with other results due to more complex phase changes associated with both olivine and garnet [*Simmons and Gurrola*, 2000]. There is likely a steep gradient or curvilinear increase in velocity and density beneath the 660 [*Shearer*, 1996; *Weidner and Wang*, 2000], which makes interpretation of contrasts difficult to quantify. Again, theoretical seismic profiles of pyrolitic composition are highly dependent upon temperature and vary even with slight compositional variations [e.g., *Cammarano et al.*, 2005]. Nevertheless, the theoretical seismic profiles of *Weidner and Wang* [2000] indicate that the discontinuity depth is closer to 650 km than 670 km depth (as seen in PREM [*Dziewonski and Anderson*, 1981]), and that the rate of change as a function of depth decays toward adiabatic by 700 ± 20 km depth. The results presented here indicate that the *V*_{P} contrast at the 660 is smaller than predicted by previous experiments, having the majority of the velocity increase accommodated by a sub-660 gradient. This is most consistent with higher temperature (1900–2100 K) pyrolitic composition at 660-km depth [*Weidner and Wang*, 2000]. By decreasing the aluminum content from 5% to 3% and lowering the temperature to 1700 K, the 660 can gain a second interface at ∼660 km depth [*Weidner and Wang*, 2000], which lends credence to the double interface model (Figure 8, Table 4) [*Deuss et al.*, 2006]. While this is possible, the mineral physics calculations show that the deeper (garnet) interface should have a larger density contrast than *P*-wave velocity contrast, compared to the upper interface, which is inconsistent with our observations. Indeed, the lower interface observed here has small contrasts for both *V*_{S} and *ρ*. Additionally, the deeper interface observed here with several less-optimal models is at ∼700 km, which is not consistent with the *Weidner and Wang* [2000] result (∼660 km). 
Of course, the transition zone varies laterally in both temperature and composition, so the spherically averaged models presented here likely represent an average of all plausible theoretical profiles, not the average condition profile.

The double interface for the 660 presented here agrees marginally with the results of *Deuss et al.* [2006], insofar as it shows that it is possible to model one interface at ∼660 km depth and another near ∼700 km depth. However, having two interfaces seems less likely as a global feature because (1) the distinct double interface model provides worse waveform misfit than the single interface model, (2) the double interface model is only plausible if the already smoothed models of *Flanagan and Shearer* [1998a] and *Masters et al.* [2000] are damped even further with a low multiplication factor of *ζ* < 0.25, and (3) the double interface has only been identified in isolated regions [*Deuss et al.*, 2006]. Therefore we suggest that two distinct interfaces are not likely as global features.

While the 410- and 660-km discontinuities are routinely observed with refraction experiments, these experiments often fail to observe a 520-km discontinuity [e.g., *Cummins et al.*, 1992; *Jones et al.*, 1992]. Stronger support for a global 520-km discontinuity comes from observations of long-period reflected phases [e.g., *Shearer*, 1991, 1996; *Revenaugh and Jordan*, 1991; *Gu et al.*, 1998; *Deuss and Woodhouse*, 2001]. Because the 520 is more pronounced in reflected phases than in the refraction of seismic waves, it has been proposed that the bulk of the impedance change occurs in density rather than shear velocity [*Shearer*, 1996]. This is supported by mineral physics experiments studying the elastic properties of the *β*- to *γ*-olivine phase change, where changes on the order of Δ*V*_{P} = 1–2%, Δ*V*_{S} = 0.8–1.5%, and Δ*ρ* = 2.5–3% occur over a depth range of as much as 50 km. In this study we observe a 520-km discontinuity having Δ*V*_{P} = 1.2 ± 0.2%, Δ*V*_{S} = 0.4 ± 0.4%, and Δ*ρ* = 2.1 ± 0.8%, which is roughly consistent with mineral physics results considering that the discontinuity is modeled as a thin interface rather than a 50-km thick nonlinear phase transition.
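The argument that a density-dominated impedance change can still produce clear reflections can be illustrated with normal-incidence reflection coefficients. This is a simplified sketch, not the waveform modeling used in this study: it assumes vertical incidence (the observed phases arrive obliquely), treats the fractional contrasts as relative to the layer above the interface, and reports magnitudes only, since sign conventions differ between topside and underside reflections.

```python
def normal_incidence_reflection(d_velocity, d_density):
    """Normal-incidence reflection coefficient magnitude for an interface with
    fractional velocity and density increases (relative to the upper layer).

    R = (Z2 - Z1) / (Z2 + Z1), where Z = density * velocity is the impedance.
    """
    impedance_ratio = (1.0 + d_velocity) * (1.0 + d_density)
    return (impedance_ratio - 1.0) / (impedance_ratio + 1.0)


# Mean 520-km contrasts quoted in the text, expressed as fractions.
D_VP, D_VS, D_RHO = 0.012, 0.004, 0.021

r_p = normal_incidence_reflection(D_VP, D_RHO)          # P reflection
r_s = normal_incidence_reflection(D_VS, D_RHO)          # S reflection
r_density_only = normal_incidence_reflection(0.0, D_RHO)

print(f"P reflection coefficient:        {r_p:.4f}")
print(f"S reflection coefficient:        {r_s:.4f}")
print(f"Density-only reflection:         {r_density_only:.4f}")
print(f"Density share of S reflection:   {r_density_only / r_s:.2f}")
```

In this simplified picture the density contrast alone supplies most of the *S* reflection amplitude, so a reflected phase can remain visible even when the shear velocity contrast is close to zero.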

The widths of the 410, 520, and 660 presented in Table 6 are likely upper bounds. To reduce these bounds one simply needs to increase the Gaussian filter width used to model transition zone topography and 3-D velocity heterogeneity. Short period (1 Hz) underside reflections from *P*′*P*′ precursors are observed for the 410 and 660, but not for the 520 [e.g., *Benz and Vidale*, 1993; *Xu et al.*, 1998, 2003], which signifies that the 410 and the 660 may be sharp (<5 km) interfaces while the 520 is more gradual. However, the 410 is observed less consistently with 1 Hz *P*′*P*′ precursors than is the 660 [*Xu et al.*, 2003], which suggests that the 410 may be more gradual. While we cannot constrain the actual thickness of either interface due to unknown broadening resulting from 3-D heterogeneity, the 410 appears to be ∼3 times thicker than the 660. Additionally, the larger 410 thickness presented here is similar to that expected for a pyrolitic composition [*Helffrich and Bina*, 1994; *Stixrude*, 1997; *Cammarano et al.*, 2005]. It is possible that lateral variations in temperature, composition, and water concentration change the sharpness and shape of the 410 such that there are both gradients and sharp interfaces with different strengths in different locations [*Xu et al.*, 1998, 2003]. The lack of *P*′*P*′ precursor observations associated with the 520 may simply reflect the low impedance contrast and general difficulty in observing the 520.

Future application of the methods presented here to regional subsets of the data will likely increase our resolution on the shape, sharpness, and contrasts of the mantle transition zone discontinuities. Rather than stacking highly variable structures, such a study would require less severe Gaussian filters to account for 3-D velocity heterogeneity and discontinuity topography. Individual regions may be shown to have different shapes to their respective elastic profiles. If this proves to be true, complex spherically averaged models such as those presented here may not be appropriate.