Understanding the geotectonic evolution of the southeastern Tibetan plateau requires knowledge about the structure of the lithosphere. Using data from 77 broadband stations in SW China, we invert Rayleigh wave phase velocity dispersion curves from ambient noise interferometry (T = 10–40 s) and teleseismic surface waves (T = 20–150 s) for 3-D heterogeneity and azimuthal anisotropy in the lithosphere to ∼150 km depth. Our surface wave array tomography reveals (1) deep crustal zones of anomalously low shear wave speed and (2) substantial variations with depth of the pattern of azimuthal anisotropy. Upper crustal azimuthal anisotropy reveals a curvilinear pattern around the eastern Himalayan syntaxis, with fast directions generally parallel to the main strike slip faults. The mantle pattern of azimuthal anisotropy is different from that in the crust and varies from north to south. The tomographically inferred 3-D variation in azimuthal anisotropy helps constrain the source region of shear wave splitting. South of ∼26°N (off the high plateau) most of the observed splitting can be accounted for by upper mantle anisotropy, but for stations on the plateau proper (with thick crust) crustal anisotropy cannot be ignored. On long wavelengths, the pattern of azimuthal anisotropy in the crust differs from that in the mantle. This is most easily explained if deformation varies with depth. The deep crustal zones of low shear wave speed (and, presumably, mechanical strength) may represent loci of ductile deformation. But their lateral variation suggests that in SE Tibet (localized) crustal channel flow and motion along the major strike slip faults are both important.
 Our study region in SW China encompasses the southeastern margin of the Tibetan plateau and the so-called “indenter corner” between the eastern Himalayan syntaxis (EHS) and Sichuan basin, a mechanically rigid part of the Yangtze or South China craton. This area is of significant societal and academic interest because of the high level of seismicity and because of the opportunities it offers to study the mechanisms for the buildup and eastward expansion of the Tibetan plateau. Tectonic deformation in this southern extremity of the trans-China seismic belt is often accompanied by large earthquakes, mostly along the major strike slip faults (such as the Xianshuihe-Xiaojiang faults) and (less frequently) along (reactivated) thrust belts, such as the devastating 12 May 2008 Wenchuan earthquake along the Longmen Shan fault zone [Burchfiel et al., 2008; Hubbard and Shaw, 2008; Liu-Zeng et al., 2009]. Deformation in this region is influenced by northward subduction of the Indian lithosphere [Yin and Harrison, 2000; Li et al., 2008], eastward subduction of the Burmese microplate along the Burma arc [Ni et al., 1989; Huang and Zhao, 2006; Li et al., 2008], resistance to further eastward expansion of the Tibetan plateau by Sichuan basin [Cook and Royden, 2008], and, probably, by upper mantle processes related to subduction much further to the southeast [Replumaz et al., 2004; Royden et al., 2008; Li and van der Hilst, 2010]. The crust between EHS and Sichuan basin has undergone clockwise deformation around the EHS [Zhang et al., 2004; Shen et al., 2005], but the mechanisms for deformation are not yet well understood.
 Geological studies suggest that in the past 15 Ma uplift of the eastern part of the Tibetan plateau has occurred with relatively little crustal shortening or eastward motion of the deformation front (for a recent review, see Royden et al. ). Along with the regional topographic gradients, this has been explained by influx of crustal material from the central plateau through ductile channel flow in the deep crust [Royden et al., 1997; Clark and Royden, 2000; Shen et al., 2001; Cook and Royden, 2008]. However, the occurrence and importance of such crustal flow has been the subject of heated debate.
 The case for channel flow beneath eastern Tibet is not straightforward, however. On the one hand, uninterrupted flow over large distances is not easily reconciled with the strong lateral heterogeneity of the low shear speed zones revealed by surface wave array tomography [Yao et al., 2008] and receiver functions [Xu et al., 2007; Wang et al., 2010], unless, of course, flow occurs also in parts of the crust outside the zones of extreme wave speed or in the shallow mantle (or both). On the other hand, some studies have argued for vertically coherent deformation between crust and mantle [e.g., Flesch et al., 2005], which, if true, may be difficult to achieve if the viscosity of the middle or lower crust is several orders of magnitude less than that of the upper rigid crust (as proposed in the flow models of Royden et al.  and Clark and Royden ). The level (and lateral extent) of mechanical coupling or decoupling is not yet well known, however.
 Joint analysis of GPS, surface geology, and shear wave splitting measurements has been used to argue for vertically coherent deformation in the crust and upper mantle in Tibet but decoupling beneath Yunnan, SW China [Flesch et al., 2005]. Using shear wave splitting observations, Lev et al.  also argue for crust-mantle decoupling beneath Yunnan, but they cannot constrain the level of coupling beneath the plateau proper. On the basis of shear wave splitting and the surface strain field from GPS data, Sol et al.  argue for mechanical coupling across the crust-mantle interface beneath much of SE Tibet (except Yunnan), and Wang et al.  argue for crust-mantle coupling in Tibet and surrounding regions (including Yunnan) from joint analysis of more shear wave splitting measurements and GPS observations.
 It is important to realize that these interpretations of shear wave splitting and GPS data rely on assumptions about the origin of seismic anisotropy and the relationship between this anisotropy and the GPS data. Under the assumption that observed shear wave splitting is produced by (azimuthal) anisotropy in the upper mantle, the congruity of splitting orientations and crustal structure or strain inferred from GPS has been used to suggest coherent crust-mantle deformation. The depth of the anisotropy that causes shear wave splitting is, however, not well known because the observed splitting represents the integrated effect of anisotropy along steep raypaths [e.g., Savage, 1999]. Furthermore, comparing GPS data with strain induced anisotropy is not straightforward: (1) the GPS surface velocity field depends on the geographical reference frame, whereas strain does not; (2) the strain rate derived from the spatial gradient of the velocity field reflects the present-day (instantaneous) rate of surface deformation, whereas seismic anisotropy is influenced by (finite) strain history; and (3) without geodynamical modeling the GPS observations (velocity or strain rate) provide little insight about deformation in the deeper crust. Thus, inferences about crust and mantle deformation (e.g., coupling or decoupling) from comparison of the instantaneous surface strain rate field from GPS and shear wave splitting data have considerable uncertainty, especially in regions with a complicated deformation history and thick and highly deformed crust, such as the Tibetan plateau.
 Knowing how the azimuthal anisotropy that produces shear wave splitting varies with depth would greatly help us understand the deformation of crust and upper mantle. Surface wave tomography has been used to constrain upper mantle shear velocity heterogeneity and azimuthal anisotropy on regional scales [e.g., Griot et al., 1998b; Simons et al., 2002], but to resolve crustal structure we need measurements at shorter periods than used in those studies. We follow Yao et al. [2006, 2008] and combine short-period phase velocity dispersion measurements from ambient noise interferometry with longer period data from two-station analysis. We present 3-D models of the heterogeneity and azimuthal anisotropy of the lithosphere beneath the southeastern part of the Tibetan plateau (including western Sichuan and northern Yunnan of SW China) and compare the anisotropy thus derived with shear wave splitting measurements. The main questions we aim to address are as follows: (1) Can this surface wave data resolve 3-D heterogeneity and (azimuthal) anisotropy in the crust and uppermost mantle? (2) How does the tomographically inferred 3-D anisotropy compare with results from shear wave splitting? (3) What are the implications for our understanding of the deformation of SE Tibet?
2. Data and Dispersion Analysis
 In 2003 and 2004, MIT, with Chengdu Institute of Geology and Mineral Resources (Sichuan, China), deployed 25 broadband seismograph stations between Sichuan basin and the eastern Himalayan syntaxis (EHS). For the study presented here, we use data from the MIT array along with data from a 50 station (temporary) array deployed in the same period by Lehigh University and permanent stations in Kunming (KMI) and Lhasa (LSA), which are part of the Global Seismographic Network (Figure 1a).
 Following Yao et al. [2006, 2008], we measure for all station pairs the interstation Rayleigh wave phase velocity dispersion from empirical Green's functions (EGFs), which are constructed from ambient noise correlation of about 1 year of data, and from traditional two-station (TS) analysis, which uses teleseismic earthquake data. The EGF approach yields measurements for periods T = 10–50 s, which are most sensitive to crustal structure, and the TS method yields measurements for periods T = 20–150 s, which constrain structure in the deep crust and lithospheric mantle.
 Measurement from EGFs and the joint interpretation of data measured from EGF and TS analysis should be done with care, and two aspects deserve special attention. First, uneven distribution of ambient noise energy can combine with structural heterogeneity and anisotropy to produce an azimuth-dependent bias of interstation phase velocity measurements from EGFs. Such bias could obstruct inversion for azimuthal anisotropy, but in previous studies we have shown that for the arrays in SE Tibet the effect is negligible if we stack causal and anticausal EGFs from 10 monthlong records [Yao et al., 2009; Yao and van der Hilst, 2009].
 Second, for the periods they have in common (T = 20–50 s) the phase speeds measured from EGFs generally agree with those from the TS method, but for some paths they appear slightly slower. This can be explained by the different sensitivities to heterogeneity. Our inversions are based on ray theory, in which EGF and TS dispersion are both assumed to be sensitive only to structure along the path between two stations. For EGF this is reasonable, but for the periods considered the interstation phase velocity measurements from TS analysis also sense structure outside the two-station path (Appendix A, Figure A1). If the structure between two stations is slow compared to the broader region sampled by the surface waves used in the TS approach, then ray theory (and EGFs) will yield lower phase velocities than TS. In our study, the distribution of sources (Figure A2a) and the regional variation in propagation speed (Figure A2b) combine to produce a bias on the order of 0.4%–0.8% for some paths (Figure A3). Although this bias is much smaller than the inferred variation of phase velocity (section 3, below) we correct for it in order to suppress effects of crustal thickness variations outside the array area.
 This approach to accounting for a finite frequency effect is ad hoc, but not unreasonable since the entire framework for inversion (e.g., the notion of constructing phase velocity maps followed by point-wise inversion for 1-D wave speed profiles, which are then combined into a 3-D model) is based on asymptotic ray theory. Indeed, given the fundamentally asymptotic nature of the inversion a more explicit use of finite frequency kernels would be less meaningful than it might appear. Developing a full-wave approach requires addressing nonlinearity (because the kernels depend on the 3-D variations in model parameters [e.g., de Hoop et al., 2006]) and the fact that EGFs are approximations of real Green's functions (with the success of their reconstruction depending on, for instance, the distribution of ambient noise sources). Since finite frequency kernels depend critically on the measurement [de Hoop and van der Hilst, 2005], determining the real sensitivity of EGF derived phase speed to Earth structure is not trivial.
Figure 2 presents histograms of the difference in interstation dispersion curve from EGF and TS analysis (after the finite frequency correction) for T = 20–50 s. At each period, the mean difference between phase velocities measured from TS and EGFs (CTS − CEGF) is almost zero, but the standard error of the difference increases from ∼0.05 km/s at T = 20–40 s to 0.08 km/s at T = 50 s. This reflects the difficulty of recovering EGFs from long-period data. Following Yao et al.  we average the EGF and TS dispersion data for T = 20–40 s (yielding 2232 dispersion curves). For T < 20 s, we take measurements from the EGF analysis, and for T > 40 s, we only use measurements from the TS analysis.
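The period-dependent merging rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `combine_dispersion`, the dictionary layout, and the fallback used when only one method provides a value inside the overlap band are hypothetical and are not taken from the paper.

```python
def combine_dispersion(periods, c_egf, c_ts, t_lo=20.0, t_hi=40.0):
    """Merge EGF and TS phase velocities (km/s) for one station pair.

    c_egf and c_ts map period (s) -> phase velocity (km/s); a missing
    key means that method gave no measurement at that period.
    """
    combined = {}
    for t in periods:
        egf = c_egf.get(t)
        ts = c_ts.get(t)
        if t < t_lo:
            value = egf                      # short periods: EGF only
        elif t <= t_hi:
            if egf is not None and ts is not None:
                value = 0.5 * (egf + ts)     # overlap band: average
            else:
                # Assumption (not stated in the text): keep whichever
                # single measurement is available.
                value = egf if egf is not None else ts
        else:
            value = ts                       # long periods: TS only
        if value is not None:
            combined[t] = value
    return combined
```

With hypothetical inputs, a call such as `combine_dispersion([10, 20, 30, 50, 100], egf, ts)` keeps the EGF value at 10 s, the average at 20 and 30 s, and the TS value at 50 and 100 s.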
 We thus obtain 2413 dispersion curves for T = 10–150 s. The number of measurements at each period is shown in Figure 3a and the average phase velocity dispersion curve (representative of the entire region under study) is shown in Figure 3b. The decrease of the number of measurements with increasing period for the EGF analysis is an effect of the far field approximation (which allows us to use a plane wave representation of surface wave propagation); likewise, the decrease in TS measurements results from the requirement that the two stations are at least half a wavelength apart [Yao et al., 2006]. Ray path coverage (shown in Figure 4 for T = 30 and 100 s) is excellent.
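The half-wavelength requirement for TS measurements explains why the number of usable paths drops at long periods. The sketch below is a hypothetical illustration of that criterion only (function names and example values are assumptions); the far field criterion applied to the EGF data is not specified in this section and is therefore not modeled.

```python
def max_ts_period(distance_km, phase_velocity_km_s):
    """Longest period (s) for which two stations are at least half a
    wavelength apart: distance >= c * T / 2, so T <= 2 * distance / c."""
    return 2.0 * distance_km / phase_velocity_km_s


def usable_periods(distance_km, periods, velocities):
    """Keep the periods whose half wavelength fits between the stations.

    periods and velocities are parallel sequences (s and km/s).
    """
    return [t for t, c in zip(periods, velocities)
            if distance_km >= 0.5 * c * t]
```

For example, a station pair 200 km apart and a phase velocity of 4 km/s would admit TS measurements only up to T = 100 s.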
3. Phase Velocity Maps and Azimuthal Anisotropy
 We use the continuous regionalization due to Montagner  and the generalized inversion scheme of Tarantola and Valette  to invert path averaged phase velocities at each period for 2-D phase velocity variations. Following Smith and Dahlen , and ignoring 4ψ terms, we express the local azimuthally varying Rayleigh wave phase velocity c(ω, M, ψ) at location M for each angular frequency ω and azimuth ψ as
$$c(\omega, M, \psi) = c_0(\omega)\left[1 + a_0(\omega, M) + a_1(\omega, M)\cos 2\psi + a_2(\omega, M)\sin 2\psi\right] \qquad (1)$$
where c0(ω) is the reference phase velocity (usually the average of all observed phase velocities at a certain frequency) and a0 and ai (with i = 1, 2) are the isotropic phase velocity perturbation and the azimuthally anisotropic coefficients, respectively. The inversion for ai (i = 0, 1, 2) is controlled by three parameters: the standard error of phase velocity measurements σd, the a priori parameter error σp (which constrains the anomaly amplitude), and the spatial correlation length Lc (which constrains the smoothness of the model parameters). Following Griot et al. [1998a] and Simons et al. , we perform “checker board” resolution tests to determine how well these inversion parameters can be retrieved. The lateral resolution of isotropic phase speed variations is ∼100 km (roughly, the average interstation distance) in the array area (Figure 5b). For the azimuthally anisotropic parameters, the lateral resolution reaches ∼200 km for T < 100 s (Figure 5d). At T > 100 s, the azimuthal anisotropy is not well resolved due to relatively poor azimuthal path coverage of measurements at these periods with paths dominated by SSE-NWW direction and because ray theory at long periods (wavelengths > 400 km) may not be valid in a relatively small array region (1300 km × 800 km).
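To make the roles of c0, a0, a1, and a2 concrete, the following sketch evaluates a 2ψ phase velocity of the form c = c0 [1 + a0 + a1 cos 2ψ + a2 sin 2ψ] and converts (a1, a2) into an anisotropy amplitude and a fast direction. The function names are hypothetical, and the azimuth convention (the direction from which ψ is measured) is an assumption, since it is not stated in this section.

```python
import math


def phase_velocity(c0, a0, a1, a2, psi_deg):
    """Local phase velocity (same units as c0) at azimuth psi (degrees)."""
    psi = math.radians(psi_deg)
    return c0 * (1.0 + a0 + a1 * math.cos(2.0 * psi) + a2 * math.sin(2.0 * psi))


def anisotropy_summary(a1, a2):
    """Return (amplitude, fast_azimuth_deg) of the 2-psi term.

    amplitude is the fractional deviation from the isotropic velocity
    along the fast axis (half the peak-to-peak variation); the fast
    azimuth is folded into [0, 180) because the pattern has 180-degree
    symmetry.
    """
    amplitude = math.hypot(a1, a2)
    fast = (0.5 * math.degrees(math.atan2(a2, a1))) % 180.0
    return amplitude, fast
```

The factor 0.5 in the fast azimuth follows from writing a1 cos 2ψ + a2 sin 2ψ as a single cosine of 2(ψ − ψ_fast).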
 [NB “Checker board” resolution tests only show how well a given inversion algorithm can retrieve structure from a given data set. Their diagnostic value is limited, however, since they assess neither the effect of realistic data quality (systematic errors, for instance, due to off great circle propagation of surface waves for TS analysis, are difficult to simulate) nor the shortcomings of linearization, inadequacies of the theory used for wave propagation (since often the same theory is used for the forward and inverse problem), or the fact that resolution depends on the shape and scale lengths of heterogeneity [Lévêque et al., 1993; van der Hilst et al., 1993]. With these caveats in mind, checker board tests give qualitative information about spatial resolution.]
 Our analysis of phase velocity measurements suggests that σd is about 1%–2%, and for the inversion we set it to 2% for all measurements. As with regularization, the choice of σp and Lc is somewhat subjective; in our study, they are determined empirically from a series of test inversions [Griot et al., 1998a]. For a given a0, σp is set to be twice the standard deviation (in percent) of all observed phase velocities at each period with a minimum value of 0.15 km/s. For a1 and a2, σp is set to be 1.5% of the average phase velocity at each period. The correlation length Liso for the isotropic term is set to be about 100–150 km, determined by the path coverage at each period. In order to obtain robust patterns of azimuthal anisotropy, the correlation length for the azimuthally anisotropic parameters is set to be 2Liso at the corresponding period. Similar to the study of the Australian lithosphere [Simons et al., 2002], the tests show that the isotropic part of the solution is insensitive to the choice of the azimuthally anisotropic parameters.
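The interplay between σp and the correlation length can be illustrated with a Gaussian a priori covariance, a form commonly used with continuous regionalization. The exact functional form used for the lateral covariance is not given in this section, so the expression below, the function name, and the example numbers are assumptions rather than the authors' implementation.

```python
import math


def prior_covariance(dist_km, sigma_p, corr_len_km):
    """Assumed Gaussian a priori covariance between two map points
    separated by dist_km: sigma_p**2 * exp(-d**2 / (2 * Lc**2))."""
    return sigma_p ** 2 * math.exp(-dist_km ** 2 / (2.0 * corr_len_km ** 2))


# Hypothetical example: isotropic term with Liso = 100 km and the
# anisotropic terms with a correlation length of 2 * Liso.
L_ISO_KM = 100.0
cov_iso = prior_covariance(150.0, 0.15, L_ISO_KM)
cov_aniso = prior_covariance(150.0, 0.15, 2.0 * L_ISO_KM)
```

Under this assumed form, doubling the correlation length keeps points 150 km apart much more strongly correlated, which is how a longer Lc enforces smoother anisotropy patterns.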
 The variations of isotropic phase velocity and azimuthal anisotropy at periods of 10, 30, 60, and 100 s are shown in Figure 6. Figure 7 shows that in the array area the posterior errors in the isotropic phase velocity and in the magnitude of azimuthal anisotropy are small compared to the perturbation of phase velocities and the magnitude of azimuthal anisotropy. This demonstrates the reliability of the results (for the periods of interest).
 Even without inversion for 3-D structure, we can readily see some interesting features from the isotropic phase velocity and azimuthal anisotropy maps (Figure 6). For example, at T = 30 s Rayleigh wave propagation is slow beneath the plateau area and fast beneath the Yangtze block in SW China, which primarily reflects the difference in crustal thickness in this area (see Figure 1b). At T = 60 s, low phase velocities are observed along the western margin of Yangtze block, which may indicate that at mantle depths the shear velocity is relatively low around the block boundary. At T = 10 s, at which the Rayleigh wave is mostly sensitive to structure between ∼5 and 15 km depth, the fast polarization axes of Rayleigh waves reflect a curvilinear pattern around the eastern Himalayan syntaxis that is also conspicuous in the GPS surface velocity field [Zhang et al., 2004]. At T = 60 s and T = 100 s the fast polarization pattern is different from that at 10 s, giving a first indication that anisotropy in the shallow crust differs from that in the upper mantle.
4. Inversion for Shear Wave Speed and Azimuthal Anisotropy
Yao et al.  used the neighborhood algorithm (NA) [Sambridge, 1999a, 1999b] to invert for the isotropic wave speed variations. NA is computationally expensive, however, and for the large number of parameters considered here it may not yield accurate results. Therefore, we combine NA and linearized inversion in a stepwise approach: NA is used to obtain (point-wise) 1-D models of isotropic structure (with uncertainties), which are then used in linearized inversion for 3-D heterogeneity and anisotropy.
 In the first step, for each point on a regular grid we use NA to estimate a 1-D profile of isotropic Vs using the phase speed as a function of period (T = 10–150 s) inferred from the phase velocity maps. We constrain nine model parameters: Moho depth and Vs of three crustal layers (each with similar thickness) and five upper mantle layers (Moho–90 km, 90–130 km, 130–170 km, 170–220 km, 220–300 km). To account for the large variation in Moho depth (40–75 km), for each grid point we use a reference model with crustal thickness obtained from receiver functions [Zurek et al., 2005; Xu et al., 2007] or from the global reference model Crust 2.0 (http://igppweb.ucsd.edu/~gabi/crust2.html). The search range for Moho depth is ±5 km around this reference, and the reference for Vs as well as the permissible search range (for each layer) is taken from Yao et al. .
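The nine-parameter model (one Moho depth plus eight layer velocities) can be laid out explicitly as follows. This is a hypothetical sketch: the function names are invented, and the three crustal layers are assumed to have exactly equal thickness, whereas the text only says "similar".

```python
# Bottom depths (km) of the five upper mantle layers listed in the text.
MANTLE_BOTTOMS_KM = [90.0, 130.0, 170.0, 220.0, 300.0]


def layer_interfaces(moho_km):
    """Return (top, bottom) depths in km for 3 crustal + 5 mantle layers."""
    if not 0.0 < moho_km < MANTLE_BOTTOMS_KM[0]:
        raise ValueError("Moho depth must lie between 0 and 90 km")
    step = moho_km / 3.0  # assumption: equal-thickness crustal layers
    crust = [(i * step, (i + 1) * step) for i in range(3)]
    tops = [moho_km] + MANTLE_BOTTOMS_KM[:-1]
    mantle = list(zip(tops, MANTLE_BOTTOMS_KM))
    return crust + mantle


def moho_search_range(reference_moho_km, half_width_km=5.0):
    """Search interval of +/- 5 km around the reference Moho depth."""
    return (reference_moho_km - half_width_km,
            reference_moho_km + half_width_km)
```

Each of the eight layers carries one Vs value, which together with the Moho depth gives the nine parameters searched at every grid point.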
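 In the second step, we use the point-wise 1-D isotropic Vs model obtained from NA as the starting point for linearized inversion for Vs azimuthal anisotropy in the crust and upper mantle [Montagner and Nataf, 1986]. That is, at each grid point we use a different reference model. At location M, the Rayleigh wave (isotropic and azimuthally anisotropic) phase velocity perturbation δcR (M, ω, ψ) is expressed as
$$\delta c_R(M, \omega, \psi) = \int_0^H \left[\frac{\partial c_R}{\partial A}\left(\delta A + B_c\cos 2\psi + B_s\sin 2\psi\right) + \frac{\partial c_R}{\partial C}\,\delta C + \frac{\partial c_R}{\partial F}\left(\delta F + H_c\cos 2\psi + H_s\sin 2\psi\right) + \frac{\partial c_R}{\partial L}\left(\delta L + G_c\cos 2\psi + G_s\sin 2\psi\right)\right]\frac{dz}{\Delta h} \qquad (2)$$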
The parameters A, C, F, L, N describe the equivalent transverse isotropic medium with a vertical symmetry axis, which implies averaging over all azimuths. The amplitudes of the cosine and sine terms (Bc, Bs, Gc, Gs, Hc, Hs) constrain the 2ψ azimuthal variation of A, L, and F. For each grid point, the kernels ∂cR/∂pi are calculated from the local 1-D profile (from NA) using normal mode theory [Montagner and Nataf, 1986]. In (2) the integration is from the surface (z = 0) to the maximum depth of the inversion (z = H), which we set to 280 km, and Δh is the normalization thickness for the calculation of sensitivity kernels. In this inversion, we consider the posterior error in phase velocity (section 3, Figure 7), and final errors in the model parameters are estimated from the posterior covariance matrix. We use a Gaussian correlation function with a correlation length that increases linearly from 20 km at the surface to 40 km at 280 km depth. Rayleigh phase velocities are mainly sensitive to L, that is, the derivatives with respect to A in the upper mantle, C, and F are small, so that mainly three parameters (L, Gc, Gs) can be resolved [Montagner and Nataf, 1986; Simons et al., 2002] although all elastic parameters in (2) and density are simultaneously inverted for. Finally the azimuthally anisotropic velocity of vertically polarized shear wave is given by
$$V_{SV}(\psi) = \sqrt{\frac{L + G_c\cos 2\psi + G_s\sin 2\psi}{\rho}} \qquad (3)$$
where ρ is the density, and since Gc and Gs are usually much smaller than L we can write,
$$V_{SV}(\psi) \approx \beta_{SV}\left[1 + \frac{G_c}{2L}\cos 2\psi + \frac{G_s}{2L}\sin 2\psi\right] \qquad (4)$$
where $\beta_{SV} = \sqrt{L/\rho}$ is the isotropic part of the vertically polarized shear wave speed. The strength of azimuthal anisotropy is $A_{SV} = \frac{1}{2L}\sqrt{G_c^2 + G_s^2}$ and the azimuth angle of fast polarization axis is $\phi = \frac{1}{2}\tan^{-1}(G_s / G_c)$.
 We note that, according to (2), the sensitivity kernel for (Gc, Gs) is ∂cR/∂L. Indeed, the objective of the stepwise approach is to obtain with NA optimal estimates of the 1-D isotropic shear wave speeds (i.e., L) at each grid point, which allows (for each location) the calculation of sensitivity kernels ∂cR/∂L for the subsequent inversion for the azimuthal anisotropy parameters Gc and Gs.
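The quantities derived from L, Gc, Gs, and ρ can be checked numerically. The sketch below assumes the standard relations for a weakly anisotropic medium with 2ψ dependence: an isotropic SV speed of sqrt(L/ρ), an anisotropy strength of sqrt(Gc² + Gs²)/(2L), and a fast azimuth of half the arctangent of Gs/Gc. Function names and test values are hypothetical. With L in GPa and ρ in g/cm³ the speeds come out in km/s.

```python
import math


def sv_anisotropy(L, Gc, Gs, rho):
    """Return (beta_sv, a_sv, phi_deg) from elastic parameters.

    beta_sv : isotropic SV speed, sqrt(L / rho)
    a_sv    : strength of azimuthal anisotropy, sqrt(Gc^2 + Gs^2) / (2 L)
    phi_deg : fast polarization azimuth in [0, 180); atan2 is used so
              that the quadrant of (Gc, Gs) is respected
    """
    beta_sv = math.sqrt(L / rho)
    a_sv = math.hypot(Gc, Gs) / (2.0 * L)
    phi_deg = (0.5 * math.degrees(math.atan2(Gs, Gc))) % 180.0
    return beta_sv, a_sv, phi_deg


def v_sv(L, Gc, Gs, rho, psi_deg):
    """SV speed at azimuth psi, without the small-anisotropy expansion."""
    psi = math.radians(psi_deg)
    return math.sqrt((L + Gc * math.cos(2.0 * psi)
                      + Gs * math.sin(2.0 * psi)) / rho)
```

Comparing `v_sv` along the fast azimuth with `beta_sv * (1 + a_sv)` gives a quick check that the first-order expansion is adequate for anisotropy of a few percent.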
5. The 3-D Heterogeneity and Azimuthal Anisotropy
Figure 8 shows the lateral variation of shear wave speed and azimuthal anisotropy in the crust and upper mantle (obtained from the 1-D profiles described above; see also the examples in Appendix B and Figure B1), and Figures 9 and 10 show, respectively, the wave speed and the wave speed perturbations (relative to the reference model) for a series of (vertical) crust-mantle sections across SE Tibet. Since for the azimuthal anisotropy inversion we use dispersion data up to 100 s, we only show the results for azimuthal anisotropy up to 150 km (Figure 8).
 For the MIT array area (Figure 1b) the inferred wave speed variations are generally consistent with our previous study [Yao et al., 2008], but the structures are better resolved and the data used here constrain lithosphere structure in a larger region. The Lhasa Block, north of the Himalayan Thrust Belt, is marked by low wave speeds at middle crustal depth (Figures 9 and 10, profiles AA', BB', EE', and FF'). The Songpan-Ganze Fold Belt is also slow at middle/lower crustal depth, but with substantial lateral variation in intensity or depth/thickness of low velocity layer (Figures 9 and 10, profiles AA', BB', CC', DD', and HH'). The crust of the Yangtze Block is generally fast (Figure 8b), but low speeds appear near major fault zones, such as the Red River fault (Figure 8a), the middle crust of Xiaojiang fault zone (Figure 8b, profiles CC' and II' in Figures 9 and 10), and the middle and lower crust around Lijiang fault (Figure 8c, profiles CC' and DD' in Figures 9 and 10). At uppermost mantle depths (that is, 80 and 110 km, Figures 8d and 8e), the Qiangtang Block and the region around the Bangong-Nujiang suture appear fast. Near the margin of the study region, the upper mantle beneath the Lhasa Block is fast but further east wave speeds are average or slightly below average (Figures 8d and 8e). A conspicuous low-velocity zone is imaged in the uppermost mantle (80–150 km depth) around the western margin of the Yangtze Block and the Red River fault zone (Figure 8).
 The 3-D inversions reveal substantial variation of azimuthal anisotropy with depth (Figure 8). Azimuthal anisotropy is relatively weak (that is, ASV is low) in the upper and middle crust (Figures 8a and 8b), but at 10 km depth the fast polarization axes (ϕ) reveal a prominent curvilinear pattern around the eastern Himalayan syntaxis (Figure 8a). This resembles the pattern of surface motion from GPS (Figure 11) but, as explained in the introduction, comparison between seismic anisotropy and GPS velocity fields is not straightforward (see also section 6.2, below). We also observe that the fast polarization axes at 10 km are nearly parallel to the major strike slip fault systems, i.e., Xianshuihe-Xiaojiang fault zone. At 25 km depth (Figure 8b) the fast direction is more complicated than that at 10 km: the fast direction in the Lhasa Block is nearly E-W oriented, while the Songpan-Ganze Fold Belt and Yangtze Block show a predominance of S-N fast direction.
Figure 8 reveals dramatic changes in the pattern of azimuthal anisotropy from middle/lower crust (e.g., 50 km depth; Figure 8c) to the uppermost mantle (80 and 110 km depth; Figures 8d and 8e) in the Lhasa Block and beneath the Songpan-Ganze Fold Belt. Near the Indus-Tsangpo suture, around 93°E, the fast polarization changes from E-W at 50 km depth to N–S at 80 and 110 km depth (Figures 8d and 8e). At 50 km depth, the fast direction in the Songpan-Ganze Fold Belt and Yangtze Block is predominantly N–S (Figure 8c). However, in the uppermost mantle (80 and 110 km; Figure 8d and 8e) the fast axes generally follow the shape of the slow structure along the western margin of Yangtze Block. [We notice a coincidence with the orientations of the Lijiang-Muli and Red River faults, but since such local correlations may not be meaningful we prefer the statistical analysis described below.] The regions north and south of 26°N have nearly orthogonal fast propagation axes in the shallow mantle (Figures 8d and 8e). At 110 km depth (Figure 8e) the fast direction is approximately E-W near 26°N. Both in Yunnan and near the Bangong-Nujiang suture the fast direction at 150 km depth (Figure 8f) differs from that at 80 and 110 km. However, the resolution and reliability of anisotropy obtained at 150 km is not as good as shallower uppermost mantle depth due to the degraded azimuthal path coverage and depth resolution at longer periods. Hereinafter, we restrict the discussion to azimuthal anisotropy to a depth of 120 km.
6.1. Crustal Low-Velocity Zones: Evidence of Crustal Flow?
 Our tomographic images reveal widespread low velocity zones (LVZs) (Figures 8–10) in the crust and locally in the lithospheric mantle beneath SE Tibet and SW China. Before discussing crustal LVZs in more detail, we emphasize that they are anomalous relative even to a crust that is, as a whole, seismically slower than the global average crust.
 The presence of midcrustal LVZs beneath the Lhasa Block in southern Tibet is consistent with the magnetotelluric results that exhibit high conductivity in the middle crust [Unsworth et al., 2005; Wei et al., 2001] and which suggest a weaker and partially molten middle crust. Numerical models [e.g., Beaumont et al., 2004] with a low viscosity and partially molten middle crust explain how midcrustal rocks (e.g., the high-grade metamorphic rocks described by Grujic et al. ) could have been exhumed to the surface [Hodges et al., 2001; King et al., 2007]. The (seismically) normal lower crust of southern Tibet, underlying the midcrustal LVZ observed here, may represent subducted Indian crust [Percival et al., 1992]. Indeed, Priestley et al.  explain the observed seismicity in the shallow and lower crust with a cool, brittle upper “Tibetan” crust and a cold, brittle lower “Indian” crust, separated by a ductile, aseismic middle crust.
 The southern part of the Songpan-Ganze Fold Belt shows prominent velocity anomalies in both middle and lower crust, and LVZs are also present beneath the western Yangtze Block. As we observed before [Yao et al., 2008], in some but not all areas, the (lateral) boundaries of the LVZs seem to coincide roughly with major faults in this area, e.g., Xianshuihe fault, Anninghe fault, and Luzhijiang fault (e.g., profiles AA', BB', CC', HH', and II' in Figures 9–10). However, the spatial resolution is not sufficient to draw more definitive conclusions about this potentially important structural relationship.
 It is likely that LVZs represent zones of ductile deformation, but by itself their presence neither confirms nor refutes the regional importance of crustal channel flow as suggested by Royden et al.  and Clark and Royden . On the one hand, low shear wave speed implies low (elastic) rigidity and the spatial correlation of LVZs, zones of high (electric) conductivity [Unsworth et al., 2005; Bai et al., 2010; Rippe and Unsworth, 2010], high crustal Poisson's ratio [Hu et al., 2005; Xu et al., 2007], and areas of steep geothermal gradients [Hu et al., 2000] are all consistent with the presence of partial melt in the deep crust. Our tomographic images thus suggest the ubiquitous presence of weak zones in the crust below SE Tibet and SW China. On the other hand, it is not yet known if these zones are sufficiently interconnected to enable regional scale channel flow. Our tomographic images suggest substantial lateral heterogeneity, and since we do not know what level of wave speed reduction implies a sufficient reduction in strength to allow flow, we do not know if ductile deformation is confined to isolated LVZs or if flow in other (slow) parts of the crust or lithospheric mantle helps create regional scale conduits.
 If regional scale flow occurs, its pattern will be more complicated than predicted from models with depth dependent viscosity. In particular, LVZs may be truncated by major faults at depth. This observation suggests that major faults may influence (or be themselves influenced by) the pattern of flow and, hence, the style of regional deformation.
6.2. Crust and Mantle Deformation: Coupled or Decoupled?
 Crustal channel flow [e.g., Royden et al., 1997] implies that the weak layer decouples deformation in the shallow crust from that in the lithospheric mantle. In contrast, geodynamical modeling (constrained by GPS observations, Quaternary fault slip data, and shear wave splitting measurements) generally favors vertically coherent deformation in the lithosphere [e.g., Flesch et al., 2005]. Resolving this controversy is key to understanding lithospheric deformation of (eastern) Tibet, but several issues must be considered.
 First, the level of crust-mantle coupling does not need to be the same everywhere. Flesch et al.  and Sol et al.  suggest decoupling in Yunnan province and vertically coherent deformation further north, and Bendick and Flesch  suggest that crustal flow and lithospheric coupling can coexist in northern Tibet if the viscosity contrast between crust and upper mantle is much smaller than implied in canonical crustal models [e.g., Royden et al., 1997; Clark and Royden, 2000]. The results of our tomographic studies suggest that the level of coupling may well vary on smaller scales and that it may be scale dependent. The localized nature of LVZs (see above) suggests strong lateral variation in crustal rheology, which may mean that also the level of crust-mantle coupling varies laterally. Effective decoupling may occur across large (or interconnected smaller) LVZs, whereas deformation may appear vertically coherent elsewhere, even if (small) LVZs are present.
 Second, conclusions about crust-mantle coupling based on comparison between GPS data and seismic anisotropy (for instance from shear wave splitting) involve many assumptions. In contrast to anisotropy, GPS velocities depend on the geographical reference frame, and the vectors would change if a different reference is used. This can be remedied by taking the spatial derivative (to obtain the frame-invariant strain rate). Furthermore, seismic anisotropy relates to structures (or fabric) formed over long periods of geological time, often during complex strain histories, whereas the gradient of GPS velocities yields the present-day (instantaneous) strain rate. Direct comparison between the two is meaningful only if a simple deformation history can be assumed.
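 The reference-frame argument can be illustrated with a minimal sketch. It assumes a locally flat region with a uniform velocity gradient sampled by three hypothetical GPS sites; the function names, coordinates, and velocities are invented for illustration and are not data from this study. Changing the frame (a rigid translation plus a rigid rotation about the vertical) alters every velocity vector, but the symmetric part of the velocity gradient, the strain rate, is unchanged.

```python
def velocity_gradient(sites, vels):
    """Uniform 2-D velocity gradient from three non-collinear sites.

    sites: [(x_km, y_km)] * 3, vels: [(u, v)] * 3 (e.g., mm/yr).
    Returns [[du/dx, du/dy], [dv/dx, dv/dy]].
    """
    (x0, y0), (x1, y1), (x2, y2) = sites
    (u0, v0), (u1, v1), (u2, v2) = vels
    dx1, dy1, dx2, dy2 = x1 - x0, y1 - y0, x2 - x0, y2 - y0
    det = dx1 * dy2 - dy1 * dx2
    if abs(det) < 1e-12:
        raise ValueError("sites are collinear")

    def solve(d1, d2):
        # Cramer's rule for: dx_k * a + dy_k * b = d_k, k = 1, 2
        return (d1 * dy2 - dy1 * d2) / det, (dx1 * d2 - d1 * dx2) / det

    lxx, lxy = solve(u1 - u0, u2 - u0)
    lyx, lyy = solve(v1 - v0, v2 - v0)
    return [[lxx, lxy], [lyx, lyy]]


def strain_rate(grad):
    """Symmetric part of the velocity gradient (frame invariant)."""
    shear = 0.5 * (grad[0][1] + grad[1][0])
    return [[grad[0][0], shear], [shear, grad[1][1]]]


def change_frame(sites, vels, translation, omega):
    """Add a rigid translation and a rigid rotation (rate omega) to all velocities."""
    tx, ty = translation
    return [(u + tx - omega * y, v + ty + omega * x)
            for (x, y), (u, v) in zip(sites, vels)]


# Hypothetical sites (km) and velocities (mm/yr); assumptions for illustration only.
sites = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0)]
vels = [(10.0, 2.0), (12.0, 3.0), (9.0, 6.0)]
other_frame = change_frame(sites, vels, translation=(5.0, -3.0), omega=0.005)

print(strain_rate(velocity_gradient(sites, vels)))
print(strain_rate(velocity_gradient(sites, other_frame)))
```

 The two printed tensors agree to rounding error even though the individual velocity vectors differ between the two frames.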
 Third, GPS observations relate to the near-surface whereas shear wave splitting reflects the cumulative effect of anisotropy over a large (and unknown) depth range. To constrain the style of deformation in the lithosphere one often assumes that splitting observed at the surface has an upper mantle origin [Flesch et al., 2005; Sol et al., 2007; Wang et al., 2008]. But with a thick and highly deformed crust this may not be justified. For typical crustal rocks, shear wave split times are about 0.1–0.2 s per 10 km; mica and amphibole lattice preferred orientations play a major role in crustal anisotropy [Barruol and Mainprice, 1993; Weiss et al., 1999], similar to the role of olivine in upper mantle anisotropy. For the thick Tibetan crust (70–80 km) shear wave splitting from crustal anisotropy can be on the order of 1 s, which is similar to the observed splitting in SE Tibet [Lev et al., 2006; Wang et al., 2008]. To get better insight into crust-mantle coupling we need to determine the relative contributions of crust and mantle to the observed splitting.
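 The order-of-magnitude estimate above follows from simple scaling. The sketch below assumes that the quoted rate (0.1–0.2 s per 10 km) applies uniformly over the full crustal thickness with a constant fast direction, which is an idealization; the function name is hypothetical.

```python
def crustal_split_time(thickness_km, rate_s_per_10km):
    """Split time accumulated in a uniformly anisotropic crust (assumed constant fast axis)."""
    return thickness_km / 10.0 * rate_s_per_10km


# Ranges quoted in the text: 70-80 km crust, 0.1-0.2 s per 10 km.
for thickness in (70.0, 80.0):
    for rate in (0.1, 0.2):
        dt = crustal_split_time(thickness, rate)
        print(f"h = {thickness:.0f} km, rate = {rate:.1f} s/10 km -> dt = {dt:.1f} s")
```

 Under these assumptions the crustal contribution ranges from 0.7 to 1.6 s, that is, on the order of 1 s.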
6.2.1. Differences Between Crust and Mantle Anisotropy
 In order to quantify the relative contributions from crust and mantle anisotropy to the observed shear wave splitting we compare the observations with predictions from our 3-D anisotropy model from surface wave array tomography. We can predict observed split times and polarization axes if we assume a horizontal symmetry axis and vertically incident shear waves [Montagner et al., 2000; Simons et al., 2002]. We first calculate the split time and polarization for the crust (Figure 12a). The predicted split time is ∼1 s in the plateau area, where the crust is 70–80 km thick (Figure 1b), and much smaller off-plateau in Yunnan. The observed split time is also ∼1 s [Lev et al., 2006; Wang et al., 2008], which suggests that the contribution from plateau crust cannot be ignored.
 The estimated split times and directions from the mantle part of our model (that is, Moho–120 km depth), displayed in Figure 12b, are substantially different from the splitting inferred from 3-D crustal anisotropy (Figure 12a). The most obvious feature is the larger split time in Yunnan (which for this depth range is ∼1 s). The polarization predicted from the upper mantle is mostly NW–SE in the southern part of the study region, while north of 26°N the direction is mostly NE–SW. The implied change in upper mantle deformation across ∼26°N is consistent with observations from shear wave splitting [Lev et al., 2006] and has been attributed to a transition of tectonic boundary conditions [Sol et al., 2007]. In southern Tibet, west of 93.5°E, the fast polarization direction estimated from azimuthal anisotropy in the crust (Figure 12a) is quite different from that estimated from anisotropy only in the uppermost mantle (Figure 12b) around the Bangong-Nujiang suture and Indus-Tsangpo suture, implying a different deformation pattern in the crust and upper mantle around these suture zones.
 Splitting calculated from crust and upper mantle combined (Figure 12c) is more similar to that from the upper mantle (Figure 12b) than from the crust (Figure 12a), reflecting, in general, a larger contribution by upper mantle anisotropy, especially in the Yangtze block where crustal thickness is not large. In some regions the fast directions from crust (Figure 12a) and upper mantle (Figure 12b) are similar, for instance, around Sichuan basin and part of the Songpan-Ganze Fold Belt. This alignment enhances the (total) split times. In some other areas, for instance in the Lhasa Block, crust and upper mantle splitting have nearly orthogonal fast directions, which can explain the smaller split times.
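 The enhancement and cancellation described above can be illustrated with a simplified layer-stacking sketch. It is not the calculation used for Figure 12; it assumes vertical incidence, horizontal fast axes, and a long-period limit in which each layer contributes a split time of roughly thickness × fractional anisotropy / shear speed, with contributions summed as vectors in the doubled-angle (2ψ) domain. The function name and all layer values are hypothetical.

```python
import math


def stack_splitting(layers):
    """Combine layer contributions to splitting in the doubled-angle domain.

    layers: iterable of (thickness_km, vs_km_s, strength, fast_azimuth_deg),
    where strength is the assumed fractional difference between fast and
    slow shear speeds. Returns (split_time_s, fast_azimuth_deg in [0, 180)).
    """
    c = s = 0.0
    for thickness, vs, strength, azimuth in layers:
        dt = thickness * strength / vs  # approximate split time of this layer
        c += dt * math.cos(math.radians(2.0 * azimuth))
        s += dt * math.sin(math.radians(2.0 * azimuth))
    split_time = math.hypot(c, s)
    fast = (0.5 * math.degrees(math.atan2(s, c))) % 180.0
    return split_time, fast


# Hypothetical two-layer models: 70 km crust over 50 km of mantle (Moho to 120 km).
crust = (70.0, 3.5, 0.03, 30.0)
mantle_parallel = (50.0, 4.4, 0.03, 30.0)
mantle_orthogonal = (50.0, 4.4, 0.03, 120.0)

print(stack_splitting([crust, mantle_parallel]))    # aligned axes: times add
print(stack_splitting([crust, mantle_orthogonal]))  # orthogonal axes: times subtract
```

 With parallel fast axes the layer split times add, whereas with orthogonal axes the total is the difference of the two and the fast direction follows the stronger layer.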
 In Figure 13 we compare predictions from our 3-D model with observed fast polarizations. For this purpose, we only consider observations with significant splitting (δt > 0.2 s), and in order to compare the fields at comparable spatial resolution we smooth the observed data to suppress variations on scales that cannot be resolved tomographically (see Appendix C). Differences between observed and predicted polarization directions are large for splitting calculated from crustal anisotropy (Figure 13a). For mantle anisotropy the differences are smaller (Figures 13b and 13e), but in over 20% of the study region the angle differences exceed 60°. If splitting is calculated from anisotropy in crust and mantle combined, the angle difference is less than 30° in about 55% of the cases (Figures 13c and 13f).
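 Because fast polarization directions are axial (an azimuth ψ is equivalent to ψ + 180°), angle differences such as those quoted above must be folded into the range 0°–90°. A minimal sketch of such a comparison follows; the function names and the sample azimuths are hypothetical and are not values from Figure 13.

```python
def axial_difference(a_deg, b_deg):
    """Smallest angle (0-90 degrees) between two axial directions."""
    d = abs(a_deg - b_deg) % 180.0
    return min(d, 180.0 - d)


def fraction_within(observed, predicted, threshold_deg):
    """Fraction of paired directions whose axial difference is below the threshold."""
    diffs = [axial_difference(o, p) for o, p in zip(observed, predicted)]
    return sum(d < threshold_deg for d in diffs) / len(diffs)


# Hypothetical fast directions (degrees clockwise from north).
observed = [10.0, 80.0, 170.0]
predicted = [20.0, 20.0, 10.0]
print([axial_difference(o, p) for o, p in zip(observed, predicted)])
print(fraction_within(observed, predicted, 30.0))
```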
 This discrepancy between the predicted and observed splitting fast directions can have several causes. First, mantle anisotropy at depths larger than the base of the tomographic model (at z = 120 km), such as anisotropy due to subduction beneath the Himalaya and Burma ranges, is not included in the predictions. Second, observed splitting directions may have large uncertainty caused by noise and uneven azimuthal data coverage. Third, the directions inferred from surface wave array tomography have uncertainties due to uneven azimuthal coverage and regularization of the inversion (see sections 3 and 4, above). Finally, the calculation of the splitting parameters assumes vertical incidence and a horizontal orientation of the fast axis [Simons et al., 2002], but the data used have a nonzero incidence angle and dipping axes of symmetry cannot be excluded.
 The above observations suggest important differences in crust and mantle anisotropy beneath SE Tibet. The analysis confirms that, overall, upper mantle anisotropy contributes more to observed splitting than the crust. In fact, the contribution from crustal anisotropy is negligible in Yunnan. For stations on the high plateau of SE Tibet, however, the contribution from the crust (up to ∼1 s split time) is comparable to that from the mantle (Figure 12). Combined, the observed splitting and the 3-D model of azimuthal anisotropy suggest substantial depth variations of anisotropy. This cannot be resolved with traditional shear wave splitting analysis [e.g., Lev et al., 2006], but it may become possible to constrain depth variations with finite frequency shear wave splitting tomography [Chevrot, 2006; Long et al., 2008; Sieminski et al., 2008].
6.2.2. Implications for Understanding Crust-Mantle (De)Coupling
 The conclusion by Flesch et al.  and Wang et al.  that crust and mantle deform coherently follows, in part, from the assumption that the observed splitting at the surface originates in the upper mantle. However, the above analysis shows that on the plateau proper, that is, in some regions west of Sichuan basin, the contribution to splitting from the thick crust is comparable to that from the upper mantle. Over large areas there appear to be significant differences in the pattern of crust and upper mantle azimuthal anisotropy (Figures 8 and B1). Indeed, the difference between fast polarization axes calculated from anisotropy in the crust (0–Moho) or mantle (Moho–120 km) exceeds 45° in about 75% of the study area (Figure 14). This is consistent with Yi et al.  who found that in eastern Tibet the pattern of Rayleigh wave azimuthal anisotropy at intermediate periods (T = 30 s) differs from that at long periods (T = 100 s).
 The variation of azimuthal anisotropy with depth (Figures 8, 14, and B1) suggests that in SE Tibet parts of the crust and upper mantle deform (or have deformed) differently from one another. The data do not resolve a correlation between the presence of LVZs and the strength of azimuthal anisotropy, but in many regions (e.g., in Songpan-Ganze Fold Belt) the lower crust has stronger azimuthal anisotropy (∼3%–4%) than the upper and middle crust (∼2%). Moreover, Love-Rayleigh wave analysis [Huang et al., 2010] suggests that LVZs coincide with regions where horizontally polarized shear waves propagate faster than vertically polarized waves (VSH > VSV). This might be indicative of more efficient flow in parts of the deep crust where temperatures are high or where differences in composition (e.g., volatile content, melt) localize ductile deformation.
 These observations appear incompatible with vertically coherent deformation of the crust-mantle system, but without knowing the actual processes that produce the azimuthal anisotropy it is not possible to make more conclusive statements. Anisotropy in the crust and lithospheric mantle can be caused or influenced by several factors, including the style of deformation (e.g., pure or simple shear [Wang et al., 2008]), the presence or absence of cracks [Crampin and Chastin, 2003], crystal orientation (e.g., shape or lattice preferred orientation [Savage, 1999]), deformation fabric (e.g., S-C fabrics for shear deformation of mica [Lloyd et al., 2009]), and in situ conditions such as temperature, stress, water content, and melt [Karato et al., 2008]. Much like the lateral variations in elastic structure inferred here, these factors vary, and local differences between crust and mantle anisotropy may occur regardless of the level of mechanical coupling between crust and mantle. Most of our observations pertain to a regional scale, however.
 High-resolution surface wave tomography from ambient noise interferometry and teleseismic surface wave analysis provides important constraints on the structure and deformation of the lithosphere of SE Tibet and SW China. The main conclusions are as follows:
 1. The lateral resolution of our surface wave array tomography in SE Tibet is ∼100 km for the isotropic part of the model and ∼200 km for azimuthal anisotropy.
 2. Shear wave speed is relatively low in the entire crust, but locally the wave speed in the deep crust is anomalously low even compared to the average crust. Shear speed is also low in the lithospheric mantle beneath some crustal low velocity zones (LVZs).
 3. LVZs occur near areas of low electrical resistivity [Bai et al., 2010], high heat flow [Hu et al., 2000], and high Poisson's ratios [Xu et al., 2007]. It may be possible to attribute the low wave speeds to anomalous composition or petrology, but the preponderance of evidence suggests that LVZs are mechanically weak, probably due to the presence of partial (aqueous) melt.
 4. Around the eastern Himalayan syntaxis, the LVZs form a largely interconnected but complex network with substantial lateral variation in depth, thickness, and strength of the anomalies. Some may be truncated by major faults (e.g., the Xianshuihe fault).
 5. The surface wave data resolve changes of azimuthal anisotropy with depth, and the pattern of fast directions in the crust differs significantly from that in the mantle.
 6. In general the upper mantle contribution to the observed splitting appears larger than the crustal contribution and off the plateau proper (in Yunnan, south of 26°N) upper mantle (azimuthal) anisotropy explains most of the observed shear wave splitting. However, on the high plateau west of Sichuan basin, splitting from the crust can be as large as that from the upper mantle. In regions (like the Tibetan plateau region) with a thick and structurally complex crust, it may thus not be justified to attribute splitting only to mantle anisotropy.
 7. The radial change in anisotropy suggests that beneath much of SE Tibet the upper crust and lithospheric mantle deform (or have deformed) differently. If LVZs represent crustal scale “décollement zones,” then the level of decoupling is likely to vary laterally along with changes in depth and strength of the LVZs.
 8. The presence of mechanically weak zones and the depth variation of seismic anisotropy are qualitatively consistent with expectations from crustal flow models, but strong lateral heterogeneity suggests that the 3-D pattern of any such flow is complicated.
 9. Our research suggests that regional scale deformation in SE Tibet occurs through the interplay between lithospheric units with or without crustal weak zones (that is, blocks with or without vertically coherent deformation) separated from one another by major faults. Understanding how such a system responds to regional tectonic stress may provide important keys to understanding the seismotectonics of this region and is therefore an important target for our future research.
Appendix A: Correction of Phase Velocity Measurements From the TS Analysis
 According to ray theory, used in traditional surface wave tomography, phase velocity or travel time is only sensitive to the structure along the great circle raypath. The average phase velocity at frequency ω between two stations at A(rA) and B(rB) in a perturbed Earth model can be expressed as
where c(ω, r) = c0 (ω) + δc(ω, r) gives the 2-D phase velocity distribution with c0 as the reference phase velocity and δc(ω, r) as the 2-D phase velocity perturbation, ΔAB is the great circle distance between A and B, and the integration is taken along the great circle path between A and B.
 Considering the finite frequency effect of surface wave propagation, we can express the phase travel time between S(rS) and A as
where tSA (ω) is the reference travel time between S and A, Kϕc (ω, r; rs, rA) is the 2-D phase sensitivity kernel to phase velocity, and the integration is computed at the spherical surface Ω of the Earth. ΔSA (or ΔSB) is the great circle distance between S and A (or B). Therefore, the finite frequency travel time of surface waves based on cross-correlation method between the two stations at A and B is given by
where tSB − tSA = (ΔSB − ΔSA)/c0 is the differential reference travel time between A and B, and Kϕc (ω, r; rs)AB = Kϕc (ω, r; rs, rB) − Kϕc (ω, r; rs, rA) is the 2-D differential phase sensitivity kernel for two-station analysis. Figure A1 shows one example of the windowed differential kernel at T = 30 s with a reference phase velocity 3.6 km/s using the phase kernel expression in the work of Zhou et al.  without considering effect of source mechanism. The average interstation phase velocity in the TS method based on finite frequency theory can be approximated as
provided that the source is almost along the great circle path linking the two stations [see Yao et al., 2006].
 In a generic heterogeneous medium, the finite frequency average $\bar{c}_{AB}^{FK}$ may differ from the ray theory average $\bar{c}_{AB}^{RT}$. However, our approach to invert for both isotropic and azimuthally anisotropic phase velocity maps (section 3) is based on ray theory. In order to suppress contributions from structure outside the interstation raypath on the tomographic inversion results, we use the following scheme to calculate approximately ray theory-based interstation phase velocity measurements from the observed finite frequency measurements. First, we use the global crust and upper mantle model from Shapiro and Ritzwoller  to calculate the phase velocity map c(ω, r) at each frequency ω (Figure A2b) and consequently the reference (average) phase velocity c0 and the phase velocity perturbation δc(ω, r) in SE Tibet and surrounding area. From the model, we then calculate the difference of phase velocities between finite frequency approach and ray theory approach as
If the observed average interstation phase velocity from the TS approach [Yao et al., 2006] is $\bar{c}_{AB}^{TS}(\omega)$, the corrected interstation phase velocity after suppressing the finite frequency effect is given by
We repeat this process for every station pair for each earthquake at every period for our TS measurements and finally obtain the corrected interstation phase velocity measurements within the period band 20–150 s for 16,386 interstation paths from about 150 earthquakes (Figure A2a). For each station pair, we average the dispersion curves from different events and finally obtain 2232 interstation average dispersion curves. The average phase velocity dispersion curve in the study area is then calculated by taking the mean of all the interstation average dispersion curves. As shown in Figure A3, the original average phase velocity dispersion, which is subject to the finite frequency effect, is 0.4%–0.8% higher between 20 and 40 s than that after suppressing the finite frequency effect. This is mainly due to the fact that surface waves from the earthquakes used in this study, which are mainly located to the east and south of our array (Figure A2a), sample faster structure in southern China and Sichuan basin (Figure A2b) where the crust is much thinner than in the Tibetan plateau area. We also notice that at T > 50 s the differences are very small (∼0.1% or less). This is primarily due to the smoothness of upper mantle structure in the global model by Shapiro and Ritzwoller , although phase velocity measurements at longer periods are more influenced by the finite frequency effect. By suppressing the finite frequency effect on the dispersion measurements, we aim to mitigate the effect of crustal structure outside the array area on the inversion of phase velocity maps at intermediate periods.
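 The bookkeeping of this correction can be sketched as follows. This is an illustration under stated assumptions, not the code used in this study: the ray theory average is taken as path length divided by travel time along a discretized great circle path, the correction is assumed to subtract the model-predicted difference between the finite frequency and ray theory averages from the observed value (a sign convention chosen so that a positive model bias lowers the corrected velocity, consistent with the 0.4%–0.8% reduction described above), and the finite frequency average itself is supplied as an input rather than computed from sensitivity kernels. All function names and numbers are hypothetical.

```python
def ray_average_velocity(segment_lengths_km, segment_velocities_km_s):
    """Ray theory average phase velocity: path length / travel time along the path."""
    total_length = sum(segment_lengths_km)
    travel_time = sum(l / c for l, c in zip(segment_lengths_km, segment_velocities_km_s))
    return total_length / travel_time


def corrected_velocity(c_observed, c_fk_model, c_rt_model):
    """Subtract the model-predicted finite frequency bias (assumed sign convention)."""
    return c_observed - (c_fk_model - c_rt_model)


def mean_curve(curves):
    """Average dispersion curves sampled at the same periods (e.g., over events)."""
    return [sum(values) / len(values) for values in zip(*curves)]


# Hypothetical path crossing two segments of a reference phase velocity map.
c_rt = ray_average_velocity([100.0, 100.0], [3.0, 4.0])
print(round(c_rt, 4))

# Hypothetical observed and model-predicted averages (km/s) at one period.
print(corrected_velocity(c_observed=3.63, c_fk_model=3.62, c_rt_model=3.60))

# Averaging two hypothetical corrected curves for one station pair.
print(mean_curve([[3.50, 3.70, 3.90], [3.54, 3.72, 3.88]]))
```

 Note that the ray theory average is a harmonic (slowness-weighted) mean of the velocities along the path, not an arithmetic mean.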
Appendix B: Point-Wise Shear Wave Velocity Models in SE Tibet
 After the two-step inversion described in section 4, we obtain a 1-D model of shear wave velocity and azimuthal anisotropy for every grid point in the study region. Figure B1 shows three examples of the final model at the locations (93°, 30°) in the Lhasa Block, (100.5°, 29°) in the Songpan-Ganze Fold Belt, and (103°, 26°) in the Xiaojiang fault zone of the Yangtze Block. The global reference model is shown as the red dashed line, with the crustal part averaged approximately from the Global Crust 2.0 model (http://igppweb.ucsd.edu/~gabi/crust2.html) and the upper mantle part from the global ak135 model [Kennett et al., 1995]. For each location, the reference crustal shear velocity increases linearly from 3.4 km/s at the surface to 3.85 km/s at the Moho depth, which is a modified version of the three layer crustal model inferred from the Crust 2.0 model with crustal thickness varying from 45 to 70 km. We did not use the average shear velocity model in the study region as the reference because the average shear wave velocity in the middle and lower crust in SE Tibet is significantly slower than the global average. We observe an apparent low velocity layer in the middle-lower crust (20–50 km) of SE Tibet with shear wave speed even slower than 3.3 km/s, which is about 10%–15% slower than the global average, typically 3.6–3.9 km/s in this depth range. We also observe that the azimuthal anisotropy, as shown in the three examples, has large radial variation in both amplitude and azimuth of the fast polarization axis. This implies that different depth ranges in the lithosphere of SE Tibet, such as crust and uppermost mantle, may deform differently.
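 The linear reference crust described above is straightforward to evaluate. The sketch below implements only that linear gradient (3.4 km/s at the surface, 3.85 km/s at the Moho) and a percentage perturbation relative to it; the function names, the example depth, and the example Moho depth are hypothetical, and the perturbation is measured against this linear reference rather than against the global averages quoted in the text.

```python
def reference_crust_vs(depth_km, moho_km, vs_surface=3.4, vs_moho=3.85):
    """Linear reference shear speed (km/s) between the surface and the Moho."""
    if not 0.0 <= depth_km <= moho_km:
        raise ValueError("depth must lie between the surface and the Moho")
    return vs_surface + (vs_moho - vs_surface) * depth_km / moho_km


def percent_perturbation(vs_km_s, vs_reference_km_s):
    """Shear speed perturbation relative to a reference value, in percent."""
    return 100.0 * (vs_km_s - vs_reference_km_s) / vs_reference_km_s


# Hypothetical example: 3.3 km/s at 35 km depth beneath a 70 km thick crust.
reference = reference_crust_vs(35.0, 70.0)
print(reference, percent_perturbation(3.3, reference))
```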
Appendix C: Spatial Smoothing of the Observed Shear Wave Splitting Data
 The observed shear wave splitting measurements (red bars in Figure 12 or Figure C1) by Lev et al. , Sol et al. , and Wang et al.  reveal variations in orientation over length scales that are much shorter than the scale of horizontal resolution (about 200 km) from our tomographic inversions (Figures 6 and 8). In order to compare azimuthal anisotropy from both shear wave splitting and from surface wave tomography on a similar length scale, we need to smooth the observed shear wave splitting data to a similar length scale of the tomographic maps for azimuthal anisotropy.
 To smooth the shear wave splitting data we apply a Gaussian spatial smoothing function to each component of azimuthal anisotropy, as we did for the tomographic inversions in section 3. It is expressed as
where Mi and Mj are the two locations in the study region where we have splitting measurements, d(Mi, Mj) is the distance between these two locations, and L is the spatial smoothing length. Let N denote the total number of splitting measurements, δt the vector of splitting times, and ψ the vector of fast direction angles. Since the splitting time and fast direction essentially represent the shear velocity azimuthal anisotropy at some particular location, the splitting intensity function with respect to azimuth for vertically polarized shear waves (e.g., equation (4)) for each measurement is defined as
For each location with splitting measurement, the smoothed splitting time function is defined as
where gik is the normalized Gaussian smoothing coefficient given by
Therefore, the shear wave splitting time and fast direction after spatial smoothing are calculated from the following:
When L → 0, gik → δik (the Kronecker delta function); therefore $\tilde{s}_i(\varphi) \to s_i(\varphi)$, $\widetilde{\delta t}_i \to \delta t_i$, and $\tilde{\psi}_i \to \psi_i$. Notice that we did not directly smooth the observed splitting time and fast direction, because the splitting function depends on them nonlinearly.
 The spatial correlation length used in the surface wave azimuthal anisotropy inversion is about 200–300 km (section 3) and the spatial resolution of our azimuthally anisotropic maps is about 200 km for most periods less than 100 s (Figure 5). In order to smooth the short wavelength (less than 100 km) variations in the shear wave splitting measurements, we choose a spatial smoothing length of 150 km in equation (C1). The resulting shear wave splitting measurements after smoothing are shown as the black bars in Figure C1. We notice that after smoothing the shear wave splitting data in SW China generally have a smaller splitting time due to cancellation of different fast directions, with an overall NNW–SSE fast direction in Sichuan and a nearly E–W fast direction in Yunnan.
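 The smoothing procedure can be sketched in a few lines. This is an illustration under explicit assumptions rather than the code used in this study: the Gaussian is taken as exp(−d²/2L²), the splitting function is taken to vary as δt cos 2(φ − ψ) so that smoothing reduces to a weighted average of the components δt cos 2ψ and δt sin 2ψ, and distances are computed on a plane rather than along great circles. The function names and the sample measurements are hypothetical.

```python
import math


def gaussian_weight(distance_km, length_km):
    """Assumed Gaussian smoothing function exp(-d^2 / (2 L^2))."""
    return math.exp(-(distance_km ** 2) / (2.0 * length_km ** 2))


def smooth_splitting(measurements, length_km):
    """Smooth splitting data via normalized Gaussian averaging of 2-psi components.

    measurements: list of (x_km, y_km, split_time_s, fast_azimuth_deg).
    Returns a list of (smoothed_split_time_s, smoothed_fast_azimuth_deg).
    """
    smoothed = []
    for xi, yi, _, _ in measurements:
        weights = [gaussian_weight(math.hypot(xk - xi, yk - yi), length_km)
                   for xk, yk, _, _ in measurements]
        total = sum(weights)  # >= 1 because the self-weight equals 1
        c = s = 0.0
        for w, (_, _, dt, psi) in zip(weights, measurements):
            g = w / total  # normalized coefficient g_ik
            c += g * dt * math.cos(math.radians(2.0 * psi))
            s += g * dt * math.sin(math.radians(2.0 * psi))
        split_time = math.hypot(c, s)
        fast = (0.5 * math.degrees(math.atan2(s, c))) % 180.0
        smoothed.append((split_time, fast))
    return smoothed


# Hypothetical measurements: two nearby sites with orthogonal fast directions.
observations = [(0.0, 0.0, 1.0, 0.0), (100.0, 0.0, 1.0, 90.0)]
print(smooth_splitting(observations, 150.0))  # smoothing length used in the text
print(smooth_splitting(observations, 1e-3))   # L -> 0 recovers the input
```

 With a 150 km smoothing length the two orthogonal measurements largely cancel, which illustrates why smoothed split times are generally smaller than the raw values; as the smoothing length approaches zero the original measurements are recovered.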
 We thank the editor (Robert Nowack), the associate editor, an anonymous reviewer, and Lucy Flesch for their constructive comments which helped to improve our manuscript. We thank Anne Meltzer for sharing (prior to formal release) the data of 50 array stations in southeastern Tibet (Namche Barwa Seismic Experiment) deployed by Lehigh University. This work benefited from discussions with Leigh Royden, Clark Burchfiel (MIT), Paul Tapponnier (Institut de Physique du Globe, Paris), Qiyuan Liu (Institute of Geology, Chinese Earthquake Administration), and Denghai Bai (Institute of Geology and Geophysics, Chinese Academy of Sciences). This work is funded by NSF Continental Dynamics Program NSFEAR 0003571 and Geophysics Program NSFEAR 0910618.