In recent years, a number of relatively high resolution seismic tomography models for the uppermost mantle underneath the western United States have been published, and vigorous debate has ensued about their tectonic interpretation. I present a straightforward, yet quantitative, comparison between models in order to help establish a framework for geodynamic interpretation, and to help judge the role of tomographic theory vs. data selection. Mapped S and P wave anomalies are found to be remarkably consistent between models, which implies that seismologists are beginning to narrow down the structure underneath active continental margins to scales of ∼200 km. Large discrepancies between published anomaly amplitudes exist, however, and the models on the high end of the spectrum raise questions as to how they are to be interpreted in terms of temperature, composition, and melting.
 Seismic tomography provides a key tool to constrain the lateral variations in sub-surface wave speeds, infer composition and temperature, and interpret those models in terms of mantle and continental dynamics. With the advent of EarthScope USArray, numerous seismic tomography models of the upper mantle with high spatial resolution have been presented. Given the geographic focus of PASSCAL-type temporary deployments and the west-to-east sense in which USArray moves across the United States (U.S.), the westernmost regions of the sub-continental U.S. mantle in particular have been mapped with an ever increasing level of detail. Similar advances in resolving deep structure are expected in the near future for the central and eastern U.S., as well as for other regions with dense networks, such as China. It is therefore helpful to take stock and assess the degree to which different tomographic methods interpret the seismic data differently in terms of patterns and amplitudes of velocity anomalies. Particularly for the interpretation of velocity anomalies in terms of geodynamic models (e.g., of mantle flow), it is important to understand the degree of consistency and variation between models.
 Comparative seismological assessments have previously been conducted in a quantitative sense for global mantle tomography [e.g., Masters et al., 2000; Becker and Boschi, 2002], and recently also for the western U.S., though mainly based on visual comparison and geologic interpretation (G. L. Pavlis et al., Unraveling the geometry of the Farallon plate with the USArray: Synthesis of three-dimensional imaging results from the USArray, submitted to Tectonophysics, 2011). Such interpretative efforts are, of course, very important, but an independent, quantitative analysis (e.g., of typical amplitudes and correlations) provides a useful and complementary approach. Given assumptions about composition, tomography model amplitudes can be scaled to temperature anomalies [e.g., Goes and van der Lee, 2002; Cammarano et al., 2003; Stixrude and Lithgow-Bertelloni, 2007], implying that different model representations will lead to different conclusions as to the scaling of velocity to density anomalies and hence plate driving forces [e.g., Forte, 2007].
 It is clearly desirable to move beyond comparing models to models. Indeed, the true evaluation of relative tomographic model performance, as well as the detection of mantle heterogeneity amplitudes and spectral character, can only lie in tests of the explanatory power of a tomographic model compared to actual seismogram waveforms [e.g., Song and Helmberger, 2007]. On global scales, such forward (synthetic vs. observed) seismogram comparisons have been made possible using spectral element methods for long-period surface waves [Qin et al., 2009; Bozdag and Trampert, 2010]. Remarkably, efforts based on a posteriori tomographic model evaluation, such as the construction of the composite, lowest common denominator model SMEAN by Becker and Boschi [2002], have proven very successful in explaining the original data [Qin et al., 2009], while yielding superior performance in geodynamic modeling [e.g., Steinberger and Calderwood, 2006].
 Thus, while regional waveform modeling tests on a western U.S. scale are under way, for now I proceed to formally analyze regional tomographic models, taking them as different interpretations of Earth structure, and discussing the resulting variations in the context of theoretical modeling and data selection choices.
 I consider the most complete set of recent western U.S. tomographic models that were available to me at the time of writing. My selection of models is very similar to that discussed by Pavlis et al. (submitted manuscript, 2011), and I refer to that paper and the original publications for an in-depth mapping and tectonic interpretation of structure and details of the tomographic approaches. In the following, I briefly introduce the different models, using the original acronyms, if available, and concatenations of author last name initials if not.
DNA09-P/S: P and S wave anomaly models by the Berkeley group based on teleseismic body wave arrivals from USArray [Obrebski et al., 2010] with multiple-frequency measurements, inverted independently.
DNA10: S wave model based on DNA09-S with additional constraints from fundamental mode surface waves [Obrebski et al., 2011], improving overall resolution in the uppermost mantle.
SH11-P/S: P and S wave models by the Oregon group, obtained from separate inversions of body waves using approximate finite frequency kernels [Schmandt and Humphreys, 2010, 2011], similar to DNA09, but with a larger dataset including regional, PASSCAL type studies.
SH11-TX: S wave model that was computed similarly to SH11-S [Schmandt and Humphreys, 2010], but using the global, TX2008 S wave model of Simmons et al. [2007] rather than a 1D reference model outside the regional tomography domain (B. Schmandt, personal communication, Aug. 2011).
NWUS-P/S: P and S wave models by the Carnegie/ASU groups [Roth et al., 2008; James et al., 2011], based on first arrivals, ray theory, regional tomography, and arrived at with separate inversions.
MIT-P: A global P wave anomaly model, akin to that of Li et al., with regionally variable resolution making use of the USArray deployment for improved western U.S. resolution [Burdick et al., 2008, 2010], updated as of March 2011.
SFTS11: Finite frequency, P wave tomography by Sigloch [2011] based on multifrequency band measurements from teleseismic arrivals, using a regionally refined, global tetrahedral mesh [Sigloch et al., 2008].
 I first represent all tomographic models, with any mean offset compared to a 1D reference model removed, at the original layers and grid spacing when regularly spaced voxels were provided, or I interpolate using the Generic Mapping Tools surface program [Wessel and Smith, 1998] to ∼0.15° × 0.15° when irregular grids were used (typically using the spline equivalent T = 1 tension of Smith and Wessel, though this choice does not affect results significantly). Regions without data or below 40% hit count from the SH11 models are masked out. All values are given as relative anomalies in % with respect to the reference models.
 While most models are available for a wider region, I focus on the domain of maximum model overlap from 125°W to 107.5°W and 35.5°N to 49°N, mainly determined by the regional extent of NWUS. However, my general analysis of model character (e.g., root mean square (RMS) amplitude vs. depth) is not very sensitive to this geographic restriction. When computing cross-model correlations, I first linearly interpolate all models to the same depth level, refine the gridded representation to a uniform 0.1° × 0.1°, and then sample (using grdtrack [Wessel and Smith, 1998]) at roughly even area spaced locations to generate pairs of data for all sites where both models are defined. From these sets of typically ∼3,500 points, I compute linear (Pearson) or Spearman rank [e.g., Press et al., 1993, p. 640] correlation coefficients, as well as best-fit linear regression slopes for scalings between models (allowing for errors in both “x” and “y” values).
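The correlation step described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the procedure used for the paper: the function names and the toy anomaly values are hypothetical, masked sites are represented by `None`, and the regression "allowing for errors in both x and y" is approximated here by orthogonal (equal-error) regression, which may differ from the fitting routine actually applied.

```python
import math

def paired(a, b):
    """Keep only sites where both models are defined (None marks masked sites)."""
    pairs = [(u, v) for u, v in zip(a, b) if u is not None and v is not None]
    return [p[0] for p in pairs], [p[1] for p in pairs]

def _moments(x, y):
    """Centered sums of squares and cross products."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    syy = sum((v - my) ** 2 for v in y)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    return sxx, syy, sxy

def pearson(x, y):
    """Linear (Pearson) correlation coefficient."""
    sxx, syy, sxy = _moments(x, y)
    return sxy / math.sqrt(sxx * syy)

def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = 0.5 * (i + j) + 1.0
        for k in range(i, j + 1):
            r[order[k]] = mean_rank
        i = j + 1
    return r

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def orthogonal_slope(x, y):
    """Slope of the orthogonal regression line (assumes equal errors in x and y)."""
    sxx, syy, sxy = _moments(x, y)
    return (syy - sxx + math.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)

# Hypothetical toy anomalies (in %) sampled at the same sites in two models.
model_a = [-2.0, -1.0, None, 0.0, 1.0, 2.0]
model_b = [-4.0, -2.0, 1.0, 0.0, 2.0, 4.0]
x, y = paired(model_a, model_b)
r_linear = pearson(x, y)
r_rank = spearman(x, y)
slope = orthogonal_slope(x, y)
```

In this toy case the second model is an exact factor-of-two amplification of the first, so both correlations are one while the slope captures the amplitude difference, which is the distinction between pattern and amplitude drawn throughout the text.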
 The simple metric of correlation suffers from well-known biases, and wavelet methods may be superior for length-scale dependent, regional analysis [e.g., Piromallo et al., 2001]. However, correlation provides a first order estimate of model match. To account for the different spatial frequency content of models (“smooth” vs. “rough”) in an approximate way, I also construct low-pass filtered versions (using grdfft [Wessel and Smith, 1998]), applying a 20% tapering transition such that, e.g., a 250 km low-pass tapers out short wavelengths starting at 300 km smoothly such that none below 250 km remain.
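As an illustration of the tapering described above, the following sketch evaluates a low-pass gain as a function of wavelength. The function name is hypothetical, and the cosine shape of the transition in wavenumber is an assumption; the text only specifies that the taper starts at 300 km and removes everything below 250 km.

```python
import math

def lowpass_gain(wavelength_km, cutoff_km=250.0, taper_fraction=0.2):
    """Gain of a low-pass filter with a smooth taper above the cutoff.

    Wavelengths at or above cutoff * (1 + taper_fraction) pass unchanged,
    wavelengths at or below the cutoff are removed, and the transition in
    between is a cosine taper in wavenumber (an assumed shape).
    """
    start_km = cutoff_km * (1.0 + taper_fraction)
    if wavelength_km >= start_km:
        return 1.0
    if wavelength_km <= cutoff_km:
        return 0.0
    k = 1.0 / wavelength_km
    k_pass = 1.0 / start_km   # wavenumber where tapering begins
    k_stop = 1.0 / cutoff_km  # wavenumber where the gain reaches zero
    return 0.5 * (1.0 + math.cos(math.pi * (k - k_pass) / (k_stop - k_pass)))

# Gains for a few wavelengths around the 250 km low-pass example.
gains = {w: lowpass_gain(w) for w in (100.0, 250.0, 260.0, 275.0, 290.0, 300.0, 500.0)}
```

With the defaults, structure at 300 km and longer is untouched, structure at 250 km and shorter is suppressed entirely, and intermediate wavelengths are damped progressively.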
 All models are based on different data and measurement methods, use different theoretical approaches (e.g., ray theory vs. finite frequency) and crustal corrections (e.g., pre-determined vs. part of the inverse problem), and employ different parameterization and inversion choices. However, the philosophy behind DNA09, SH11, and NWUS is, broadly speaking, similar in their regional, teleseismic, body wave methodology. I therefore also construct two mean models, for S and P anomalies, by averaging the three respective models after ensuring that their depth-averaged RMS anomalies are upscaled to SH11. Without further confirmation as to the theoretical basis for amplitude differences between models, this should be considered an arbitrary choice. It is irrelevant for the cross-model pattern comparisons, but picking SH11 leads to large-amplitude mean models (see below).
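The averaging procedure can be illustrated with a minimal sketch. All names and numbers below are hypothetical toy values, and for brevity a single RMS over all samples stands in for the depth-averaged RMS used in the text; the models are assumed to be already sampled at common locations.

```python
import math

def rms(values):
    """Root mean square of a list of anomalies."""
    return math.sqrt(sum(v * v for v in values) / len(values))

def rescale_to(model, target_rms):
    """Scale a model uniformly so that its RMS matches target_rms."""
    factor = target_rms / rms(model)
    return [factor * v for v in model]

def mean_and_std(models):
    """Point-wise mean and sample standard deviation across models."""
    n = len(models)
    mean = [sum(col) / n for col in zip(*models)]
    std = [
        math.sqrt(sum((v - m) ** 2 for v in col) / (n - 1))
        for col, m in zip(zip(*models), mean)
    ]
    return mean, std

# Hypothetical toy anomalies (in %) at four common sample points.
models = {
    "DNA09": [0.5, -0.5, 1.0, -1.0],
    "SH11": [2.0, -1.5, 3.0, -3.5],
    "NWUS": [0.2, -0.3, 0.4, -0.3],
}
reference_rms = rms(models["SH11"])  # amplitudes are scaled up to SH11
scaled = {name: rescale_to(m, reference_rms) for name, m in models.items()}
mean_model, std_model = mean_and_std(list(scaled.values()))
```

Because each model is multiplied by a single factor, the rescaling changes amplitudes but not patterns, so correlations between models are unaffected by the choice of reference.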
 I call the lowest common denominator models which result from this averaging procedure SMEAN-WUS and PMEAN-WUS [cf. Becker and Boschi, 2002], and also provide plots of the standard deviation of the averaged models [cf. Lee et al., 2011], noting that the number of “independent data points”, three, is not overwhelming. Such “stacked” models may provide, however, an ad hoc “reference” for the most commonly mapped features in regional tomography. MEAN-WUS models are available, along with a simple mapping interface, from http://geodynamics.usc.edu/∼becker; see, e.g., Pavlis et al. (submitted manuscript, 2011) for more advanced visualization and unified data access to the other models.
Figures 1 and 2 show δvP and δvS anomalies for DNA09, SH11, and NWUS for four depths in the upper mantle. It is apparent that, when corrected for amplitude differences, models of the western U.S. upper mantle are generally consistent. This motivated the construction of the regional mean models, which are shown with their standard deviation alongside the originals. I chose to display the maps at the indicated levels because ∼150 and 600 km depth models are relatively speaking the most similar, and the depths of ∼50 and 400 km relatively dissimilar (see below). While the geodynamic interpretation of the mapped features in terms of temperature vs. fractionation or melting anomalies is debated, the general anomaly patterns appear robust.
 As has been discussed extensively before (e.g., Pavlis et al., submitted manuscript, 2011), there are numerous intriguing and consistently mapped features in Figures 1 and 2. For example, the top layers show a clear signature of the Juan de Fuca slab as a coherent structure and a dominant slow signal underneath Yellowstone, which might finger into two linear features toward the southwest, roughly in the direction of absolute plate motion. At greater depths, the fast anomaly structure appears segmented into a northern and a southern, V-shaped part, suggesting an irregular and perhaps torn slab structure, as might be expected, e.g., given changes in plate motions [Bunge and Grand, 2000; Tan et al., 2002; Liu et al., 2008]. Underneath Yellowstone, the slow anomaly at shallow depths is replaced by an isolated fast anomaly at ∼400 km, and a broader slow anomaly at larger depths. This implies that if there is a hot plume conduit from the deep mantle to the hot spot, it is deflected, disrupted, or pulsating. Alternatively, the melting anomaly may be related to upper mantle convection induced by the slab itself [e.g., Xue and Allen, 2007; Schmandt and Humphreys, 2010; Obrebski et al., 2010; Faccenna et al., 2010; James et al., 2011; Tian et al., 2011]. Other interesting, consistent features include the structure underneath the Colorado Plateau, mapped as a ring of low velocity material around a fast or average core at ∼50 km depth, and underlain by relatively slow anomalies at 400 km depth.
 What is also apparent is that mapped anomaly amplitudes vary widely among published tomographic models. Figure 3 shows the depth-dependence of the anomaly strengths for all models considered here. The P wave models fall into a low and a high amplitude group, with NWUS and SFTS11 at the low and the high end in terms of RMS, respectively. For S wave tomography, the range of models is bracketed by NWUS on the low and SH11 on the high end, and the variability between models peaks at a factor of six or more at the depth of the highest RMS levels (∼100 km).
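An RMS-versus-depth comparison of this kind can be sketched as follows. The cosine-of-latitude weighting for a regular longitude/latitude grid is an assumption (the analysis in the text samples at roughly even-area locations instead), and the model names and values are hypothetical toy numbers chosen only to show how an amplitude ratio per depth layer would be formed.

```python
import math

def area_weighted_rms(samples):
    """RMS of (latitude_deg, anomaly) samples on a regular lon/lat grid.

    Each sample is weighted by cos(latitude) to approximate equal-area
    weighting (an assumed scheme for this sketch).
    """
    weight_sum = 0.0
    acc = 0.0
    for lat, anomaly in samples:
        w = math.cos(math.radians(lat))
        weight_sum += w
        acc += w * anomaly * anomaly
    return math.sqrt(acc / weight_sum)

# Hypothetical toy layers: depth in km -> list of (latitude, anomaly in %).
model_low = {
    100: [(36.0, 0.5), (42.0, -0.5), (48.0, 0.5)],
    400: [(36.0, 0.2), (42.0, -0.2), (48.0, 0.2)],
}
model_high = {
    100: [(36.0, 3.0), (42.0, -3.0), (48.0, 3.0)],
    400: [(36.0, 0.4), (42.0, -0.4), (48.0, 0.4)],
}

# Ratio of RMS amplitudes between the two models at each depth.
rms_ratio = {
    depth: area_weighted_rms(model_high[depth]) / area_weighted_rms(model_low[depth])
    for depth in sorted(model_low)
}
```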
 This RMS difference largely reflects choices as to the regularization (“damping”) of the mixed-determined inverse problem that tomography represents, but it is interesting that models which differ in terms of amplitudes by a large amount still show very similar patterns (Figures 1 and 2). As has been discussed elsewhere, models such as SH11-S show large velocity anomaly variations even outside likely high partial melt regions such as underneath Yellowstone, e.g., increases in wave speed of ∼8% over ∼200 km distance at 150 km depth, which corresponds to a temperature, T, difference of ∼500 K, using δvS/dT = −15 × 10^−5 K^−1 [Stixrude and Lithgow-Bertelloni, 2007].
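The temperature estimate quoted above follows from a one-line scaling, sketched here for transparency. The function name is hypothetical; the derivative is the value cited in the text, treated as constant (a simplification that ignores any depth or temperature dependence and any effect of melt or composition).

```python
def temperature_change(dv_fraction, dlnv_dT=-15e-5):
    """Temperature change (K) implied by a fractional wave speed change.

    dv_fraction: relative velocity anomaly, e.g. 0.08 for +8%.
    dlnv_dT: assumed constant thermal sensitivity in 1/K (value from the text).
    """
    return dv_fraction / dlnv_dT

# +8% in S wave speed with the sensitivity quoted in the text.
delta_T = temperature_change(0.08)
```

The result is about −530 K: because the sensitivity is negative, the faster material is the colder one, and the magnitude is consistent with the ∼500 K contrast given in the text.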
 Besides inversion choices (e.g., regularization), which are hard to select rigorously for different theoretical approaches [e.g., Boschi et al., 2006], data selection appears to contribute substantially to the amplitude variations between models. For example, for δvS, addition of surface wave information for DNA10 increased the shallow RMS strongly compared to the body wave only DNA09. This is a desirable effect, as body waves typically have fairly poor vertical resolution in the upper ∼150 km because of predominantly vertical raypath incidence. The RMS difference between DNA09 and DNA10 is then likely due to better resolved uppermost mantle structure for DNA10 thanks to the surface waves [Obrebski et al., 2011], with the caveat that the lateral resolution of both datasets is quite different [cf. Tian et al., 2011]. SH11 has more clustered regional raypath information than most other models because of the addition of temporary deployments to the USArray data. The added data and the SH11 model representation seem to lead to high amplitudes which are, however, also seen in DNA10 and the full finite frequency approach of SFTS11. Indeed, finite frequency inversions may reduce smearing and so lead to higher amplitude and more focused anomalies than ray-theoretical approaches [e.g., Hung et al., 2004].
 Exploring the effect of the reference model, I can compare the RMS of SH11 and SH11-TX. In particular, deep mantle structure below ∼500 km is reduced when using the global TX2008 tomography [Simmons et al., 2007] as a reference, i.e., there is less need for the regional inversion to explain all teleseismic delay times. This serves as a note of caution when considering deep structure of regional tomography, even at the ∼1500 km aperture of the westernmost footprint of USArray. From Figure 3, anomaly amplitudes are in general much reduced below the thermo-chemical boundary layer at depths ≳400 km. This signal is likely real, but it is unclear how well detailed patterns are constrained underneath the western U.S. at present. I will therefore focus on the regions above 500 km subsequently.
 Comparison of Figures 1 and 2 shows that δvP and δvS maps from the regional models show broadly similar patterns. Ratios of the two anomalies can be used to distinguish a thermal vs. compositional or melting origin of velocity variations [e.g., Cammarano et al., 2003], and it is clear that, regionally, particularly low shear wave anomalies are too large compared to compressional wave anomalies to be of thermal origin [e.g., Schmandt and Humphreys, 2010]. However, it is also interesting to compare the overall match of S and P models with depth (Figure 4). Within the well correlated depth levels, above ∼300 km, the δvS/δvP ratio, R, is mainly in a plausible thermal range of R ≲ 1.8 [Karato, 1993; Cammarano et al., 2003], implying that, for the whole domain, compositional or melting anomalies are not dominating any thermal origin of lateral variations in velocity. I also explored the lateral variations in R using best-fit linear regression slopes based on local sampling, and there is some indication of R ≳ 2.5 along the Snake River Plain, and south of it, in NWUS and SH11 at ∼150 km depth. However, such deviations from a simple temperature scaling based on R have to be explored more carefully, taking the resolution of different data types and regional estimates of attenuation into account.
Figure 5 quantifies the degree of model pattern similarity using the linear correlation coefficient computed at different depth layers. For the P models, SH11 is closest overall to the PMEAN-WUS model among the three models that went into its construction. Among the other δvP models, the agreement with both PMEAN-WUS and SH11 is least pronounced for SFTS11, perhaps due to different parameterization and crustal correction choices [cf. Sigloch, 2011]. However, even SFTS11 matches the other δvP models at a ∼0.6 level throughout the upper mantle. In terms of depth-dependence, ∼50 km layers are most different between models, which is expected given that different approaches to crustal corrections may affect shallow structure the most. At larger depths, ∼400 km layers are least well correlated, while models strongly agree (correlation ∼ 0.8) at ∼150 km. Similar depth-dependence is found for the δvS model comparisons in Figure 5. However, now NWUS is the most “common” model, and SH11 deviates more from SMEAN-WUS. DNA09 deviates from DNA10 most above ∼200 km, as expected given the typical resolving power of fundamental mode, surface wave phase velocity measurements. As for the δvP models, correlation is highest between ∼150 and 200 km, then shows a low at ∼400 km, to increase again somewhat at larger depths. The comparison between SH11 and SH11-TX shows that, while the regional representation is very similar irrespective of reference model, there are subtle pattern changes even at the well constrained depths ≲200 km.
Figure 6 shows the total cross-correlation for all tomographic models, sorted by P and S wave models, for the original parameterization (Figure 6, left) and a long-wavelength filtered version (Figure 6, right). The mean models PMEAN-WUS and SMEAN-WUS show higher correlations than other models with the respective tomography models that were not used for averaging (MIT-P, SFTS11 and SH11-TX, DNA10). They also show a higher correlation with models of the different wave type (i.e., PMEAN-WUS with δvS and SMEAN-WUS with δvP models), justifying my attempt to provide lowest common denominator estimates a posteriori. In general, the total correlations for P and S wave models are of order ∼0.7 at the relatively high resolution, ∼200 km scales of regional tomography (Figures 1 and 2), which can be compared to the match of global tomographic models, typically ∼0.6 at the longest, ≳2500 km wavelengths [Becker and Boschi, 2002]. SFTS11 and DNA10 are the least similar among their respective groups, though this is, of course, not to say that they are “worse”. Rather, they may provide a “better” representation of Earth structure, because of different theoretical procedures [Sigloch, 2011] or data selection [Obrebski et al., 2011]. Both NWUS-S/P and SH11-S/P are overall more similar in terms of δvS vs. δvP than DNA09 (cf. Figure 4).
 To evaluate if the differences between models can be explained by their different short wavelength structural content (cf. Figures 1 and 2), I also computed cross-correlations for models that were filtered to suppress structure at scales shorter than ∼250 km (Figure 6, right). This increases the match between models, in general, as expected if larger-scale structure can be more robustly imaged, increasing correlation closer to ∼0.8. In particular, DNA10 is more similar to other models in the long-wavelength representation. However, SFTS11 still provides a significantly different representation of structure than the other models, and correlations for that model with some S wave models are actually worse if short-wavelength structure is filtered out.
 Simple linear correlation and RMS analyses are clearly only the first steps in a comparative study of regional tomography. However, such tests provide an important baseline for evaluating the detailed western U.S. tomography models for quantitative geodynamic interpretation. The standard deviations and visual analysis of Figures 1 and 2 give a good first idea of regions where different data and inversion choices lead to robust representations of mantle structure, and where models still differ enough to make tectonic interpretations reliant on specific approaches.
 Given how different approaches (e.g., surface wave included vs. body wave only models with more regional data, ray theory vs. finite frequency) lead to similar results, the general agreement between models may guide future efforts on improving structural representations. The comparison between the regional SH11 and the SH11-TX version, which uses a global tomography model to correct for structure outside the domain, is a reminder of the trade-offs that arise when performing regional tomography with teleseismic arrivals. While such problems are, of course, well known [e.g., Evans and Achauer, 1993], the analysis highlights the remaining resolution challenges even in high density data regions.
 Nonetheless, patterns in the upper ∼500 km of the western U.S. mantle appear to be now robustly mapped. Conclusions based on direct scaling of tomography to temperature would, however, lead to vastly different answers given the RMS variations between models. Such amplitude uncertainties in tomography are also expected, though perhaps not at the level seen in Figure 3. Besides damping and data choices, non-linear inversion such as re-computation of travel times with raytracing for a 3D model would likely enhance the amplitudes of models such as NWUS-P/S [e.g., Widiyantoro et al., 2000]. Yet, δvS/δvP ratios are similar between models, and roughly consistent with a thermal origin in the bulk of the volume, with the caveat of still having to reevaluate the physical implications of large velocity anomalies such as in models SFTS11 and SH11.
 An important question is that of the typical length-scales of heterogeneity in the upper mantle, how those power spectra change with depth, and if we will see a transition to tectonically simpler structures once USArray has reached the East coast with its older and thicker continental lithosphere. The maps shown in Figures 1 and 2, and spatial Fourier analysis of them, are broadly consistent with an interpretation as showing pronounced, small-scale convection in the upper ∼400 km of the mantle, transitioning toward smoother structures at depth. Such behavior has been associated with interactions between complex, slab-induced currents and lithospheric instabilities in other tectonically active regions [e.g., Faccenna and Becker, 2010]. However, the degree to which the change in spectral character with depth is controlled by the loss of resolution given the regional data sets is not clear at this point, and requires further study.
 Published models of the uppermost mantle shear and compressional wave structure underneath the western United States agree to a remarkable degree. This implies that methodological differences and inversion choices are less important than data selection, and tectonic interpretation of patterns is on a sure footing thanks to the efforts of seismologists and EarthScope instrumentation. However, the amplitudes of heterogeneity in tomographic models are hugely different, which raises important questions as to their interpretation in terms of temperature anomalies and driving mantle flow. It also highlights key issues for further study, including the apparent change in the character of upper mantle, small-scale convection with depth.
 I thank Editor Jim Tyburczy for handling this paper, Matt Fouch and David James for their helpful and encouraging reviews, and all seismologists who make their results available in electronic form. Without their efforts and willingness for open scientific exchange this study would not have been possible. While this in no way constitutes an endorsement of my conclusions, R. Allen, M. Fouch, B. Schmandt, and K. Sigloch were very helpful in addressing my questions and provided comments on a previous version of this manuscript. G. Pavlis kindly shared a pre-print. Most figures were created with the Generic Mapping Tools by Wessel and Smith [1998]. This research was supported by NSF-EAR 0910985.