The Yellowstone hot spot has recently been shown to be a plume that extends into the transition zone. At roughly 60–120 km depth, the plume material rising beneath Yellowstone Park is sheared SW by North America Plate motion, producing a profound low velocity layer emplaced beneath the thin lithosphere. To constrain the absolute seismic velocity of the plate-sheared plume layer, fundamental mode Rayleigh wave observations have been inverted for phase velocity using the two plane wave technique. The resulting phase velocity models are inverted with Moho-converted P to S arrival times to better constrain crustal thickness and absolute S wave velocity structure to ∼120 km depth. A regionalized S wave velocity model has an extremely low velocity minimum of 3.8 ± 0.1 km/s at 80 km depth beneath the hot spot track. Nonregionalized 3-D velocity models find a velocity minimum of 3.9 km/s beneath the hot spot track. Below 120 km depth, our resolution diminishes such that the lateral spreading of the plume track is not resolved. The volume of the low velocity plume layer is small and the estimated buoyancy flux for the Yellowstone plume is <0.1 Mg/s which contrasts with the ∼9 Mg/s value for Hawaii. In addition, a notable region of thick crust and high lower crustal velocities is found around Billings, Montana, consistent with previous refraction and receiver function studies that interpret this as evidence for a massive Precambrian underplating event.
 Over the last ∼17 Ma, the Yellowstone hot spot has produced a series of progressively younger silicic eruptions that extend from the Oregon-Nevada border NE to Yellowstone National Park [Christiansen and Yeats, 1992]. Because the Yellowstone hot spot track (YHT; Figure 1) lies in a region of extensive recent magmatism, including the Columbia River Basalts and the Oregon High Lava Plains, the cause of the hot spot magmatism has remained controversial. This anomalous magmatism has been variously hypothesized to be an extending lithospheric crack [Hamilton, 1989; Christiansen et al., 2002], a self-propagating mantle melt instability [Humphreys and Dueker, 1994], or a thermal mantle plume [Armstrong and Leeman, 1975; Morgan et al., 1984]. Recently, P and S wave tomography studies [Yuan and Dueker, 2005; Waite et al., 2006] have found a low velocity conduit extending from Yellowstone Park and inclined 15° from vertical toward the NW. The base of the conduit extends into a locally thinned transition zone [Fee and Dueker, 2004] but does not extend across the 660 km discontinuity. This structure is consistent with a thermal plume confined to the upper mantle. At ∼60–120 km depth beneath the YHT, this plume material has been sheared to the SW by North America plate motion, producing a profoundly low velocity region beneath the YHT associated with some combination of high temperatures and partial melt [Schutt and Humphreys, 2004]. Although ascending plume material interacting with a moving lithosphere is predicted to follow parabolic-like streamlines [Sleep, 1990], the observed Yellowstone velocity anomaly does not follow parabolic-like streamlines downstream from the present location of the hot spot under Yellowstone Park [Saltzer and Humphreys, 1997; Schutt and Humphreys, 2004; Yuan and Dueker, 2005; Waite et al., 2006]. In addition, a parabolic-like distribution of fast SKS wave polarization axes is not observed [Waite et al., 2005]. The above observations all suggest the YHT is the manifestation of a weak upper mantle plume with little buoyancy induced lateral flow at the base of the lithosphere.
 The previous teleseismic body wave tomographic models all have limitations, however. Only relative velocity variations are constrained, which makes it difficult to compare absolute velocities to other regions or calculate absolute temperature. In addition, the previous body wave tomograms do not resolve crustal velocity and thickness variations, and the tomograms' best-case ∼40 km vertical resolution scale is poor with respect to surface wave studies [e.g. West et al., 2004]. In this study, we measure the velocity of fundamental mode Rayleigh waves in the region. These data are combined with measurements of the relative arrival time difference between receiver function direct P and Pms arrivals to produce models of absolute S wave velocity and crustal thickness variations.
 Data were derived from two broadband PASSCAL experiments that operated in late 1999–2001 and whose recording times overlapped by 75 d (Figure 1): the 46 station Yellowstone array and the 31 station Billings array (Figure 1). These arrays operated from 9 June 2000 to 4 June 2001 and 30 September 1999 to 23 August 2000, respectively. The arrays were instrumented with CMG-3T, CMG-40T, and STS-2 sensors with corner periods at 120, 30, and 120 s. Careful attention was paid to transfer function normalization and channel polarity because the two-plane-wave technique [Forsyth and Li, 2005] fits both the Rayleigh wave's amplitudes and phase with equal weight.
 Events with surface-wave magnitudes >5.7, from distances of 30° ≤ Δ ≤ 120°, were examined for simple fundamental mode Rayleigh wave packets at 17 wave periods between 15 and 150 s. To measure phase velocities at the 17 periods used in this study, seismograms were narrowband (10 mHz width) filtered with a zero-phase four-pole Butterworth filter. The filter center periods were at 15, 18, 20, 22.5, 25, 27.5, 30, 35, 40, 45, 50, 66, 75, 90, 110, 125, and 150 s. Band-pass filtered traces without a relatively simple fundamental mode Rayleigh wave envelope were rejected. This data culling produced 81 events to be analyzed, although for any given event, measurable Rayleigh wave energy was not generally observed at all periods. For most periods, 50–60 events were analyzed (Table 1) which provided data from a reasonable range of back-azimuths (Figure 2). Traces were converted to their Fourier real and imaginary components at each filter center period. Therefore, each station-event seismogram provided up to 34 data associated with 17 periods with one real and one imaginary component at each band-pass period.
Table 1. Variance Reduction by Period for Regionalized Model
Wave Period (s)
Number of Rays
Phase Variance Reduction (%)
3.1. Phase Velocity Measurement With Two Plane Wave Approximation
 2. Using the plane wave parameters derived in step 1 and a starting phase velocity model (discussed below), the data are iteratively inverted for updates to the starting model using
where G is the matrix of partial derivatives, Cmm and Cdd are the a priori model and data covariance matrices, Δd is the difference between the observed and predicted Fourier coefficients, mo is the original starting model, and m is the current model [Tarantola and Valette, 1982]. With each iteration, high misfit events are downweighted to minimize the effects of Rayleigh waves that are too multipathed to be fit by the two-plane wave approximation [Forsyth and Li, 2005].
 Ideally, the data would be sufficient to produce a fully resolved two-dimensional model of phase velocity variations for each wave period. However, similar to most teleseismic tomography, there were insufficient data to produce a unique two dimensional phase velocity map [Menke, 1989]. Similar to previous studies using the two-plane wave technique, we take a conservative approach to this problem and seek to find the simplest model parameterization that fits the data [Forsyth et al., 1998a; Li et al., 2002; Li and Detrick, 2003; Li et al., 2003; Weereratne et al., 2003; Fischer et al., 2005; Forsyth and Li, 2005; Li and Detrick, 2006]. To accomplish this, three sets of phase velocity inversions were conducted, starting with a uniform phase velocity inversion, proceeding to a regionalized phase velocity, and ending with a two-dimensional phase velocity inversion. The regionalization domains are defined based on teleseismic P and S body wave tomograms [Schutt and Humphreys, 2004; Yuan and Dueker, 2005; Waite et al., 2006]. Our most important choice is the width of the hot spot track (Figure 1). The body-wave tomograms constrain the low velocity body beneath the hot spot track above 100 km depth to be 80–120 km wide. But, in the 100–200 km depth range the width of the low velocity anomaly varies between 150 and 200 km. Given that our resolution kernels have little sensitivity below 120 km depth, we have chosen to fix the width of the regionalized hot spot track width to 110 km.
 The uniform model is used as initial starting model (m in equation (1), above) for the next more complicated model. Specifically, the following three inversions are performed.
 1. Invert Rayleigh wave observations for mean phase velocity at each wave period.
 2. Use (1) as a starting model, and invert observations for mean phase velocity in each of the three tectonic regions: Basin and Range (BR), Wyoming Craton (WY), and Yellowstone hot spot track (YHT) (Figure 1).
 3. Use (2) as a starting model and invert data for two-dimensional phase velocity maps. In these inversions, we use “fat ray” Gaussian approximations to the sensitivity kernels. Ray width is chosen such that 95% of the cross-sectional area falls within the first Fresnel zone [Weeraratne et al., 2003; Gudmundsson, 1996].
 Thus, at each of the 17 data periods, three sets of inversions are performed with each inversion requiring two steps. Statistics for the inversion of 30 s waves are listed in Table 2. Note that because poorly fit events are downweighted, the variance reduction between models cannot be strictly compared [Forsyth and Li, 2005].
Table 2. Phase Velocity Results at 30 s
RMS Phase Misfit (s)
Total Variance Reduction (%)
Phase Variance Reduction (%)
Figure 3 shows the results of the regionalized phase velocity inversions. The data fit is excellent with 85–95% of the data variance being explained by the three region model (after poorly fit events are downweighted). This suggests that the regionalized velocity structure is a good approximation to the actual structure. To assess whether the mean phase velocity of the tectonic regions is uniquely resolved, the rank of the resolution matrix
is calculated (Figure 3c; variables are the same as in equation (1)). This value diminishes from a value of three for the shortest periods, implying that the mean phase velocity within each region is uniquely resolved, to less than one for the 150 s period. A unique regionalized phase velocity for the longest period waves cannot be resolved because of the relatively narrow width of the YHT and BR regions.
 Another way to assess the regionalized inversions robustness is to compare the results of the regionalized inversion with the two dimensional inversions (Figure 4). For wave periods >50 s, no significant perturbations to the regionalized phase velocities are required to fit the data. This observation implies that it is sufficient to characterize the velocity structure in terms of our three tectonic regions. One can also see the effect of the regionalized starting model on the 2-D inversion results by comparing inversions performed using the regionalized starting model (model 4 in Table 2) to those that did not use the regionalized starting model (model 3) (Figure 5). The effect of the regionalized starting model is evidenced by the continuity of the low velocities along the hot spot track. Correspondingly, the nonregionalized model does not show this hot spot track continuity. Our preference for the regionalized model is based upon two reasons. First, the body wave tomograms show that this is an accurate regionalization: in particular the 110 km width of the hot spot track domain is well constrained by the body wave tomograms. Second, comparisons of the variance reductions between the three differently parameterized models (three velocity profile fit and the 2-D nonregionalized and regionalized starting model images) shows that all of them provide comparable fits to the data (Table 2).
 It might be expected that the strong velocity variations beneath our study area would cause some events to be poorly modeled by the two plane wave approximation to the incoming wavefield. As previously stated, strongly multipathed events are downweighted during the inversion. We note that the two plane wave technique has been used successfully at locations with strong velocity variations, such as Tanzania and the East Pacific Rise [Forsyth et al., 1998b; Weeraratne et al., 2003]. Examination of the well-fit events with a phase RMS misfit <3 s shows that no systematic misfit exists (Figure 6; Table 3). As expected, the wavefield parameters show increased scattering at shorter periods. For instance, the mean azimuthal anomaly (with respect to the great circle path) of the largest of the two plane waves decreases with increasing period. In addition, the amplitude ratio of the smaller plane wave to the larger plane wave also decreases with increasing period. Furthermore, the scatter of the primary wave azimuthal anomalies also decreases with increasing period. If the large mantle velocity anomaly is causing unusual behavior in the two plane wave fits, this would be expected to occur at the periods most sensitive to the depth of the anomaly, namely the 30–66 s period waves. Given that no distinctive increase in misfits occurs for this period range, we expect the main conclusions of the paper are not sensitive to strongly multipathed waves.
 SKS phase shear-wave splitting measurements find a nearly uniform anisotropy field, having a mean horizontal fast symmetry direction of N43°E, with a mean split time of ∼1.2 s [Waite et al., 2005]. To assess the effect of azimuthal anisotropy on our phase velocity maps, cos(2θ) and sin(2θ) velocity terms, in which θ represents azimuth, are added to the model vector [Montagner et al., 2000]. We then invert for isotropic phase velocity variations and the mean anisotropy magnitude and orientation at each period. The resulting phase velocity curves are very similar to the pure isotropic profiles (Figure 7). This result suggests that the regionalized isotropic phase velocity results are not significantly biased by velocity anisotropy.
3.3. Inversions of Phase Velocity for Regionalized Shear Wave Velocity Structure
 On the basis of the initial VP and VS structure, data kernels ∂c/∂VS (T, z) and ∂c/∂VP (T, z) were calculated where c is phase velocity, T is wave period, and z is depth (Figure 8) [Weeraratne et al., 2003]. Density effects on phase velocity have an insignificant effect and were fixed to nominal values [Weeraratne et al., 2003]. The S-P wave velocity perturbation scaling (dVS/dVP) was assumed to be 1.2, appropriate for the near solidus conditions of the YHT [Cammarano et al., 2003; Schutt and Lesher, 2006].
 The observed phase velocity maps were inverted iteratively for VS structure with the recalculation of the data kernels between iterations. The iteration was repeated until velocity updates changed <0.001 km/s. The least squares inversion was regularized using diagonal damping. The approximate “elbow” value of the damping parameter was 25 s-2, and the best tradeoff between resolution and variance (Figure 9). For consistency, the same damping value was used for the regionalized BR and WY inversions, although this makes these inversions slightly underdamped (Figure 9). The rank of the resolution matrix for the regionalized VS inversions varies from 3.7 for the Wyoming craton region to 2.1 for the Basin and Range region. As expected, the resolution diminishes with depth (Figure 10). Noteworthy is that our Rayleigh wave data set cannot resolve the midcrustal basalt sill beneath the eastern Snake River Plain [Peng and Humphreys, 1998], the bottom of the low velocity plume layer, or the plume conduit.
3.4. Inversions of Phase Velocity for 3-D Shear Wave Velocity Structure
 Because inversions of phase velocity maps for VS are quite sensitive to crustal velocity and thickness variations (Figure 8), it is important to incorporate a priori information on crustal thickness. We have used measurements of the time difference between the direct P arrival and the Moho converted Pms arrival (hereafter called the Pms time) from receiver functions as crustal thickness constraints [Yuan et al., 2006]. These Pms times were spatially averaged using a Gaussian half-width of 40 km to create a smoothed 2-D map (Figure 11). The coherent pattern of Pms time variations varies by 2.4 s due to crustal thickness variations between 38 and 54 km [Yuan et al., 2006]. In addition, receiver function reverberation analysis finds a mean crustal VP/VS value of 1.78 [Yuan et al., 2006].
 The Pms times are incorporated into our velocity inversions as constraints. For each surface grid point, the VS and VP starting profiles (VS and VP as a function of depth for that grid point) used in the regionalized VS inversion were modified by adjusting the thickness of the lower crustal layer to match the observed Pms times. Using these modified VS and VP profiles, data kernels were generated to invert the observed 2-D phase velocity maps (Figure 4) for an updated VS structure. The updated VS structure is then used with the Pms time constraints to estimate a new crustal thickness. This process was iterated three times until the VS and VP profiles matched the observed PmS times within their error bars. Synthetic tests show that the crustal thickness and mean crustal velocity can be resolved within a few percent of their true values. The bootstrap estimated Pms time errors are ±0.2 s and the VP/VS errors are ±0.04. Given these uncertainties, crustal thicknesses are accurate to ±3 km, and the mean crustal VS is accurate to ±0.1 km/s.
 Vertical resolution of the velocity inversions can be estimated from the rank of the resolution matrices (Figure 12). Resolution at the depth of the low velocity channel (∼70 km) in the mantle is ±20–30 km and diminishes with depth [Weereratne et al., 2003]. Horizontal resolution varies with depth: within the crust horizontal resolution is ∼40 km and at 150 km depth is 150 km. The 2-D phase velocity models produced with regionalized and nonregionalized starting model show how the regionalization affects the lateral resolution of the surface waves (Figure 5). In general, the horizontal and vertical 2-D resolution is sufficient to determine crustal velocity variations and the horizontal extent and velocity of the plume layer.
 Crustal thickness ranges from 38 to 54 km with the thinnest crust beneath the Montana Basin and Range province and thickest crust beneath the Wyoming Craton (Figure 13a). Average crustal S-wave velocity varies by 11% with low velocities in the BR and YHT regions (VS = 3.4, VP = 6.0) and highest velocities in the northeast quadrant of the image (VS = 3.8, VP = 6.6; Figure 13b). These mean crustal velocities are consistent with other studies in the region [Gorman et al., 2002; Zhou and Stump, 2004; Zeiler et al., 2005].
 Beneath the YHT, a profound low velocity zone with a velocity minimum of 3.8 km/s at 80 km depth is imaged (Figure 11 and 13) which is 7% lower than the adjacent 4.1 km/s mantle. The 100 km width of the hot spot track anomaly imaged by the body wave tomograms [Saltzer and Humphreys, 1997; Schutt and Humphreys, 2004; Yuan and Dueker, 2005; Waite et al., 2006] is similar to the 80 km width of the eastern Snake River Plain. The halo of average velocities (yellow colors in Figure 13c) around the hot spot track is caused by inversion regularization, and the actual velocity transition is sharper as resolved by body wave tomography. It is notable that our velocity model is consistent with plume material only residing beneath the 100 km wide eastern Snake River Plain, compared to the much broader “tectonic parabola” which is based on current-faulting and paleofaulting rates [e.g., Anders and Sleep, 1992]. Below 100 km depth, the surface wave data cannot uniquely resolve the width of the plume mantle. Indeed, the width of the deep low velocity zone is defined by the regionalization, derived from the body wave tomography [Yuan and Dueker, 2005; Waite et al., 2006].
 The 4.2 km/s velocity at 95 km depth beneath the Wyoming craton to the east of the hot spot (Figures 13d and 13e) is similar to the shear velocity for 4–20 Ma oceanic profiles [Nishimura and Forsyth, 1989]. This suggests that below 90 km the Wyoming province mantle temperature is consistent with a nominal mantle adiabat [Faul and Jackson, 2005]. Yet, xenoliths from the Eocene William Kimberlite in east Montana suggests a 170 km thick lithosphere existed here 50 Ma ago [Carlson et al., 1999]. To reconcile this thick Eocene-age lithosphere with the thinner modern day Wyoming lithosphere, we speculate that the Laramide-age slab hydrated the lithosphere [Humphreys et al., 2003] to enable subsequent post-Laramide convective destabilization and lithospheric thinning. This potential convective thinning may have been initiated or accelerated by the nearby Yellowstone plume. However, we reiterate that no plume material extends significantly beyond the confines of the YHT, albeit the entrainment of ambient mantle certainly does extend outward [Yuan and Dueker, 2005; Waite et al., 2006].
 The 3.8 km/s velocity minimum at 80 km under the YHT compares to minimum shear velocity of 4.0 km/s for the Hawaii to Oahu path [Priestley and Tilmann, 1999] and 3.8 km/s in the western arm of the East Africa Rift [Weeraratne et al., 2003]. Thus, the YHT uppermost mantle is one of the lowest velocity uppermost mantle regions on the Earth.
 Assuming significant velocity effects from melt porosity and hydration can be ruled out, this leaves variations in temperature [Karato, 1993; Godey et al., 2004; Faul and Jackson, 2005] and grain size [Faul and Jackson, 2005] as the primary cause of the low velocities. While a future paper will detail these calculations, a robust statement is that our lowest mantle velocities of 3.8 km/s require a mantle >50° warmer than a 1320°C potential temperature mantle with high confidence [Dueker et al., 2004].
 The narrow width of the plume track (100–200 km width) found by the tomographic body wave velocity images [e.g., Yuan and Dueker, 2005] and, consistent with this paper's findings, in tandem with lack of parabolic asthenospheric flow perturbations found by the SKS analysis [Waite et al., 2005], requires that the Yellowstone plume must have a small buoyancy flux as previous topographic analysis suggests [Sleep, 1990]. Isostatic topography modeling suggests a maximum plume excess temperature of 200°C (with fairly large errors) [Schutt and Humphreys, 2004]. A 200°C thermal anomaly would have plume material that is <31 kg/m3 lighter than surrounding mantle [Schutt and Lesher, 2006]. Assuming this buoyancy anomaly is distributed in a 60 km thick layer (60–120 km depth range) under the 80 km wide YHT, and using the North American absolute plate motion rate of 2.1 cm/a [Gripp and Gordon, 2002], a buoyancy flux of <0.1 Mg/s is calculated. This model assumes the mean upwelling rate of the Yellowstone plume is the same as the plate motion, based on the observation that there is no clear parabolic-like spreading of the plume. This contrasts with a previous estimation of 4.8 Mg/s from modeling dynamic elevation effects [Lowry et al., 2000] and a ∼9 Mg/s buoyancy flux for Hawaii [Sleep, 1990].
 A speculative possibility for the life cycle of the Yellowstone plume is that the plume was nucleated at the CMB due to the influence of the sinking Farallon slab. The slab has primarily subducted to the east of Yellowstone, and in at least one convection model, the slab turns over on itself near the core mantle boundary [Tan et al., 2002]. Downward flow from a slab has been shown to thin the lower thermal boundary layer under the slab, and thicken it outboard of the slab flow direction [Tan et al., 2002]. This thickened thermal boundary layer could serve as a stimulus to nucleate the plume. As the plume head nears the base of the lithosphere, preexisting lithospheric thickness gradients may channel the plume flow to produce the voluminous Columbia River and Steens basalt outpourings [Jordan et al., 2004]. After this initial impact stage about 18–16 ma ago, the plume tail then begins to primarily melt the lithosphere [Doe et al., 1982; Menzies et al., 1983] to create the Yellowstone hot spot track.
 An additional finding is that the thick high velocity lower crust found within the Wyoming Craton is consistent with the 7.1 km/s lower crust found by the Deep Probe experiment [Henstock et al., 1998; Gorman et al., 2002]. The coincident 5–8 km increase in crustal thickness and 5% increase in shear velocity suggest this high-velocity layer thickens to the east. Assuming this high velocity layer is due to basalt emplacement, and the crust above this layer has an average 3.6 km/s shear velocity, the increase in velocity and thickness can be explained as an ancient magmatic underplate, whose thickness increases from 7 to 19 km. This suggests a large input of magma associated with either the 2.6 Ga Stillwater Complex [McCallum, 1996] or the 1.8 Ga magmatic event recorded by Montana mantle xenoliths [Carlson et al., 1999].
 Measurements of fundamental mode Rayleigh waves show very low phase velocities along the Yellowstone hot spot track in the 30–60 s wave period band. Inversion for shear wave velocity structure resolves minimum mantle velocities of 3.8–3.9 km/s at 80 km depth beneath the hot spot track. These uppermost mantle velocities are among the lowest on the planet. Outside the hot spot track, asthenospheric mantle velocities are consistent with adiabatic mantle. We conclude that Yellowstone hot spot track mantle does not have enough water content nor melt porosity to produce the observed low velocities; hence, a significant temperature anomaly is required. This suggests the Yellowstone is hot with respect to adiabatic mantle. Combined with body wave tomograms that show a cylindrical low velocity feature extending into the transition zone, it is clear that Yellowstone is a thermal mantle plume, albeit a very weak one with no lower mantle extension.
 We thank two anonymous reviewers, one of the Associate Editors of JGR, and Editor J.C. Mutter for many helpful comments with the manuscript. We also wish to thank D. Forsyth for unselfishly sharing his surface wave code and for advice on how to use it, as well as D. Weeraratne, who also provided code and answered many questions. Data was provided by the IRIS DMC, whom we thank. We also thank the National Science Foundation for grants 0440432 and 0409538 and for an Individual Research and Development time provided to DS while working at the NSF.