The Main Himalayan Thrust Beneath Nepal and Southern Tibet Illuminated by Seismic Ambient Noise and Teleseismic P Wave Coda Autocorrelation

Nepal is an actively deforming region due to its tectonic setting that hosts destructive earthquakes including the recent 2015 Mw 7.8 Gorkha earthquake. To better understand the physics of earthquakes and better assess the seismic hazard in the region, a highly resolved 3‐D structure of the crust is essential. This study presents a new 3‐D S‐wave velocity structure of the crust using ambient noise tomography (ANT). In relation to the 2015 earthquake, we show that the lateral variation of S‐wave velocity likely controls the rupture propagation and arrest. This study further constrains crustal discontinuities beneath Nepal including the Main Himalayan Thrust (MHT) using teleseismic P‐wave coda autocorrelation. The MHT geometry correlates well with sizeable low S‐wave velocity zones obtained from the ANT and separated by two ramps. The southernmost first ramp is where the 2015 Gorkha earthquake nucleated that we map up to the source model of the 1934 historical earthquake in Eastern Nepal. The northernmost second ramp is adjacent to the mid‐crustal low velocity zone beneath south Tibet. In between the two ramps, we interpret the horizontal low velocity zone as the interseismic creeping patch of the MHT with down‐dip extent of 50–70 km length and 90–100 km in correspondence with the Mw 7.8 2015 and 8.4 1934 source models, respectively. A systematic mapping of the low‐velocity zones in relation to the MHT integrated with GPS geodesy along with the respective slip from individual past earthquake ruptures can help decipher the paradox of maximum Himalayan earthquake magnitudes.

Although numerous studies focused on the geometry of the MHT in the region, it is not clear why some of the large earthquakes (e.g., 2015 Gorkha earthquake) do not rupture MHT to the surface (e.g., Avouac et al., 2015;Elliott et al., 2016;Grandin et al., 2015;Mencin et al., 2016). It is also unclear why the rupture of the Gorkha earthquake propagated eastward from the hypocenter. Moreover, some of these studies show that the strain transferred to the southern edge of the Gorkha earthquake has not been released by the afterslip process. Further, previous studies showed the high slip potential in the Himalaya region (e.g., Bilham & Wallace, 2005;. This indicates the potential for future major earthquakes in the Himalayas. Few studies have provided the lateral variation of compressional P-wave velocity on the MHT surface (e.g., Bai et al., 2016;Pei et al., 2016), however, the details within the rupture area of the 2015 Gorkha earthquake are lacking. Additionally, the precise location or relocation of earthquakes in this region is challenging due to the lack of highly resolved velocity models. This place a limitation on the use of seismicity to map the geometry of MHT (e.g., Kumar et al., 2017;Wang et al., 2017). Hence, a detailed 3-D crustal velocity structure, including the geometry of the MHT and other discontinuities beneath Himalaya Nepal, is crucial to improve our understanding of earthquake processes and seismic hazards in the region.
The wider Himalayan region has been extensively studied using earthquake-based tomography, ambient noise tomography, and/or receiver functions (Agrawal et al., 2021;Hazarika et al., 2020;Kumar et al., 2019;Wei et al., 2021;Wu et al., 2021). However, only limited studies focused on the lithospheric structure beneath Nepal (e.g., Guo et al., 2009;Monsalve et al., 2008), and their lateral resolution is poor. In addition to velocity structure, seismic discontinuities in the crust and upper mantle beneath Himalaya Nepal have been studied mostly using receiver functions (e.g., Nábělek et al., 2009;Nelson et al., 1996;Priestley et al., 2019;Schulte-Pelkum et al., 2005;Xu et al., 2013). Recently, Ruigrok and Wapenaar (2012) focused on the study of the crustal interfaces beneath Tibet using autocorrelation of global phases. The autocorrelation approach has become a powerful tool for mapping crustal and upper mantle discontinuities (Kennett, 2015) as it provides additional information on the P velocity (Pham & Tkalcic, 2017 in comparison to S velocity obtained from the receiver functions.
Here, we use ambient noise data from six network stations covering most of Nepal (Figure 1b) to estimate the group and phase velocity maps. We then apply a fully non-linear Bayesian inversion of group and phase velocity dispersions (e.g., Manu-Marfo et al., 2019) to image the 3-D S-wave velocity structure beneath Nepal. This approach provides robust estimates of S-wave velocity and its uncertainty as a function of depth. However, it is 3 of 21 challenging to identify the exact depths of crustal discontinuities using group and phase dispersion data. This is because surface wave dispersion data are sensitive to average velocity structure, but they are weakly sensitive to seismic discontinuities (Julià et al., 2000). Moreover, the group and phase sensitivity kernels for period 3-35 s show that dispersion data used in this study are mostly sensitive the structure up to 60 km depth ( Figure S1 in the Supporting Information S1). Therefore, uncertainties for S-wave velocity become large at discontinuities. To identify the depth-interface locations, we further apply the P-wave coda autocorrelation approach (Pham & Tkalcic, 2017. In this paper, we image the geometry of the MHT as it extends from a shallow depth beneath Nepal to the mid-crust under southern Tibet. We interpret our results in terms of structural and geological features as well as earthquake and rupture processes of the 2015 Gorkha earthquake. We integrate our results with recent GPS data (e.g., Lindsey et al., 2018;Mencin et al., 2016;Stevens & Avouac, 2015 and historical (e.g., Bilham, 2019) as well as paleoseismological data (e.g., Kumar et al., 2001Kumar et al., , 2006Sapkota et al., 2013;Wesnousky et al., 1999Wesnousky et al., , 2018 and discuss the mechanics of the MHT and it seismogenic potential.

Data Collection and Pre-Processing
In this study, we used continuous time-series data recorded by five networks in Nepal: Hi-MNT (Sheehan, 2001), Hi-CLIMB (Nabelek, 2002), Namaste (Karplus et al., 2020), HiK-NET (Bollinger et al., 2011), and Cambridge (Priestley et al., 2019). We also added the data from five seismic stations located near the border of India and Nepal that belong to the Indian Institute of Science Education and Research (IISER-K), Kolkata. The Hi-CLIMB stations were operated from 2002 to 2005 while Hi-MNT was installed from 2000 to 2002. The Namaste, Cambridge, and IISER-K networks were operated from 2015 to 2016 just after the 2015 Gorkha earthquake. The HiK-NET was operated from 2014 to 2016, and the data were obtained from RESEIF seismological data portal. , and a normal fault (STD). The colored beach balls indicate earthquakes after 1976 with magnitudes greater than 6 obtained from the Global CMT catalog (Dziewonski et al., 1981;Ekström et al., 2012). Red dashed lines bound an interseismic decoupling zone from 0.9 to 0 coupling (Stevens & Avouac, 2015). (b) Ray-coverage using seismic stations from all the networks (represented by different color triangles) considered in this study. In total, we have more than 185 stations providing unprecedented dense ray coverage ( Figure 1b) that enables us to achieve the best lateral resolution in the region so far.
We followed the procedure of Bensen et al. (2007) to compute the cross-correlations between stations. In the pre-processing stage, we cut continuous data into 1-hr lengths. We then removed the mean, trend, and instrument response and filtered the data between 0.02 Hz (50 s) and 0.33 Hz (3 s) frequency band. The lower frequency band was chosen based on the availability of the longest interstation distance of 500 km, which is equivalent to ∼50 s for 1/12th of interstation distance (because we assume the maximum wavelength that provide reliable result is approximately 1/3rd of the interstation distance and the phase velocity of Rayleigh wave is assumed to be around 4 km/s). Similarly, the upper frequency band was chosen based on the minimum interstation distance of 35 km that can be achieved for good quality dispersions. We note that Hi-CLIMB, HiK-NET, and Namaste networks have station pairs with interstation distances of less than 35 km, but the phase and group velocity dispersion could not be selected based on the multiple filter technique (Herrmann, 2013). Therefore, we rejected cross-correlations for interstation distances below 35 km. To remove the earthquake signals from the data, we applied one-bit normalization to the filtered data. Finally, we applied spectral whitening. Then we cross-correlated whitened time series recorded by stations at the same time to avoid any effect of the different distribution of the noise source in the time scale. We preferred hourly long time series for cross-correlation so that a high signal-to-noise ratio (SNR) is achieved, particularly for stations with recordings for a shorter duration. We linearly stacked all cross-correlations obtained for specific station pairs to improve the SNR by canceling the incoherent noise. Here, we considered cross-correlations with SNR greater than or equal to 10. Examples of linearly stacked cross-correlations for three networks as a function of interstation distance are shown in Figure 2.
For autocorrelation of the P-wave coda, we downloaded the P-wave data for earthquakes of Mw > 5.5 in the epicentral range 30°-95° from IRIS, recorded by Hi-CLIMB stations (Nabelek, 2002). We selected this range because the P phase is well separated from other phases. For epicentral distances greater than 95°, it is difficult to separate the coda of the P wave from the PcP waves. We considered 25 s before and 200 s after the P arrival on the vertical component and rotated the horizontal components into radial and transverse components. We selected the data by visual inspection of both radial and vertical seismograms with high SNR for the P waveform. Vertical component seismograms were selected more than radial components as the P wave is well observed in the vertical component, particularly for teleseismic events. We present earthquakes used for P-coda autocorrelations in Table S1. The distribution of earthquake recorded by some stations are shown in Figures S2-S3. We removed the mean and trend from the data and applied the taper. We applied three different frequency bands

Inversion of Ambient Noise Data
We apply a two-step inversion approach to estimate the S-wave velocity structure from the ambient noise cross-correlations. In the first step, we compute the group and phase tomography maps for periods ranging from 3 to 35 s using the group and phase dispersions between station pairs. The group and phase dispersions across each station pair are measured from the cross-correlograms ( Figure 2) by applying Multiple-Filter-Technique (MFT) (Herrmann, 2013). We manually picked all the dispersions and visually inspected them to select the dispersion curves for tomographic inversions. Figure S4 in the Supporting Information S1 shows an example selection of dispersion curves. Our dispersion curves show variation along different paths between station pairs with low uncertainties (Figure 3). This variation is due to the underlying velocity structures across the ray paths. In Figure 3, average dispersion curves are computed from the stack of all the cross-correlograms while the uncertainties are computed as a standard deviation of dispersions computed for 3 months stack with an overlap of 2 months.
This study applies the surface wave tomographic method of Yanovskaya and Ditmar (1990). This method is based on the ray approximation and is a 2-D generalization of the classical 1-D method (Backus & Gilbert, 1968). In classical 1-D ray theory approximation, the travel time residual along the ray path is defined by the integral of the path length divided by the velocity anomalies in each grid points along that path. This 1-D method developed by Backus and Gilbert (1968) assumes that Earth's structure is varying radially not laterally. To invert for the 2-D velocity from the travel time residuals, Ditmar & Yanovskaya (1987) generalizes this classical 1-D ray theory approximation to 2-D such that the travel time residual becomes an integral of area divided by velocity perturbation in 2-D surface. Here, the goal is to compute the velocity perturbation using travel-time residuals applying 2-D non-linear inversion. This method also allows to compute the resolution lengths for the given path (see Text S2 in the Supporting Information S1).

Bayesian Inversion of Dispersion Curves
To compute the S-wave velocity as a function of depth, we apply a fully nonlinear Bayesian inversion method (e.g., MacKay, 2003; Mosegaard & Tarantola, 1995), which does not require any regularization, and the number 10.1029/2022JB026195 6 of 21 of model parameters remains unknown in the inversion. The same algorithm has been previously applied to study the crust and mantle structure as well as deep Earth structure Manu-Marfo et al., 2019;Mohammadi et al., 2022;Pachhai et al., 2015Pachhai et al., , 2022. In this approach, the answer to the inverse problem is expressed in terms of the posterior probability density (PPD), which is proportional to the prior information (independent from the data) and the likelihood, which incorporates data information.
where ( | ) represents the PPD and indicates the probability of the model parameters ( , which includes number of layers, layer thickness, and S-wave velocity) for given observed data ( ), ( , ) represents the likelihood, which is the probability of data for given model parameters, and ( ) represents the prior probability of the model parameters. As the data is fixed, the likelihood function becomes the function of only model parameters, that is, ( ) .
The data errors are generally not known and are estimated as a difference between the measured and predicted data. Here, we assume that the data noise has the Gaussian form, therefore, the likelihood function is expressed in the form of L2-norm.
where is the noise covariance matrix that includes correlated and uncorrelated noise. Here, our goal is to invert for both group and phase velocity dispersions, therefore, this expression can be modified as: where 2 is the number of dispersion curves (i.e., one for group and the other for phase velocity) used in the inversion.
For group and phase velocity dispersion curves, the data error can be highly correlated due to various data processing steps. Here, we apply an autoregressive process of order 1 (AR1) model to account the correlated errors. More details of this type of noise model can be found in . In this case, the likelihood function can be expressed in the following form.
where npts is the number of data points, obs is the observed data, ( ) is the synthetic prediction, is the noise standard deviation, and ( ) is AR1 predictions for AR1 parameter . In our case, we have two values for the standard deviation of noise and two values for the AR1 parameters (i.e., 2 each for group and phase velocity). These parameters are also unknown in the inversion, and this type of approach is known as hierarchical Bayesian inversion.
Inversion of dispersion curves is a highly nonlinear problem, and analytical computation of PPD is not possible for the nonlinear problem. Additionally, the number of layers is not known in advance, and estimated parameter uncertainties depend on the model complexity. If we add the model complexity, the fit between observed data and synthetic prediction significantly improves but may not be necessarily required by the data. As a result, uncertainties can be large. In contrast, a simple model can predict only some portion of the data, resulting in unrealistically small uncertainties. Here a reversible jump Markov Chain Monte Carlo (rjMCMC) sampling is used to estimate the PPD Pachhai et al., 2015). This approach contrasts to the MCMC approach developed for the joint inversion of surface wave dispersions and receiver functions by Shen et al. (2013). As a result, model uncertainties estimated using the MCMC approach in Shen et al. (2013) do not include the uncertainties from the unknown parameterization. In our rjMCMC approach, the model parameters are updated in three different moves: birth, death, and perturbation. In the birth move, a new interface is introduced in a randomly chosen layer, and model parameters (thickness and S-wave velocity) are perturbed from one of the randomly selected layers. In the death move, a layer is randomly picked and subject to deletion (death) with the perturbation of layer thickness and velocity from a randomly selected layer. In the case of the perturbation move, layers are neither introduced nor deleted, but only model parameters are perturbed from a randomly selected layer. All the moves are accepted or rejected based on the Metropolis-Hasting acceptance or rejection criterion (Hastings, 1970;Metropolis et al., 1953) defined by, where ( ′ ) ( ) is the prior ratio and is the ratio of proposal. The prior and proposal ratios become unity as the prior is fixed in all the iterations and we use a symmetrical distribution for the proposal. The only remaining term in Equation 5 is the likelihood ratio, and likelihood is computed using Equation 4.
For a nonlinear problem, the rjMCMC can be very inefficient if low probability regions separate multiple high probability regions. As a result, the sampling needs a long time to converge to the true model. Thus, interacting Markov chains are used here to achieve faster convergence. In this case the acceptance or rejection probability becomes, where represents the tempering parameter (between 0 and 1) which scales the likelihood. For more details of this approach, we refer to , and Pachhai et al. (2014Pachhai et al. ( , 2015.

Autocorrelation of P-Wave Coda
The autocorrelation, mathematically, measures the similarity between a time-series and its delayed version. The autocorrelation method has been a very effective tool in identifying interfaces with a strong impedance contrast in the crust (Gorbatov et al., 2013;Kennett et al., 2015), particularly for shallow crustal structures using teleseismic coda waves (Pham & Tkalcic, 2017 and ambient noise (Gorbatov et al., 2013). The autocorrelation of a seismic waveform contains many positive and negative peaks. These peaks represent the reverberations of a seismic wave within a layer. Figure S5 in the Supporting Information S1 illustrates a simple concept of how the autocorrelation method works. In this example, a P-wave incident on a layer with the thickness (H = 3 km), P-wave velocity (Vp = 3.9 km/s), and S-wave velocity (vs. = 1.9 km/s). The incident P wave converts to either P or S at the interface and reverberates within the layer before recording on a seismic recording station ( Figure S5a in the Supporting Information S1). The impulsive responses of various conversions and reverberation of the incident P-wave are shown in Figure S5b in the Supporting Information S1, and their autocorrelations are shown in Figure S5c in the Supporting Information S1. The impulses of P-wave reverberation are recorded on the vertical component while both P and S reverberations are recorded on the radial component. Therefore, the vertical and radial seismograms include all the ground motions of the P wave. The autocorrelation produces a symmetrical signal on two sides with the largest amplitude in the center. Therefore, only the positive side is considered after removing the largest amplitude in the center ( Figure S5c in the Supporting Information S1). The prominent negative ampli tude in the autocorrelation of the vertical component is represented by 2p. In this example, it represents the time difference between the reflected phase 3p (this phase is called a reflected phase because it has one reflection on the surface) and transmitted phase 1p. This can be expressed in the following mathematical form.
where β is the ray parameter, H is the layer thickness or depth of the discontinuity in the case of a layer over half-space. For teleseismic earthquakes used in this study, the ray parameter is very small, and square of it is negligible. As a result, Equation 7 becomes simple velocity formula in which the wave travels the distance 2H. Therefore, the expression for 2p delay time becomes, Similarly, the largest negative amplitude in the autocorrelation of the radial component is represented by 2s. In this example, it represents the time difference between the pair of the reverberation 1p2s and transmitted phase 1p. In this case, the expression for the computation of 2s delay time becomes, Using these autocorrelation approach, we first measure the delay times of the prominent peaks from the stacked vertical and radial autocorrelograms. We then estimate the layer thickness using P-wave velocity, S-wave velocity, and delay times in Equations 8 and 9. Here we utilized the 1-D velocity model derived using joint Bayesian inversion of group and phase velocity dispersions to estimate the depth of the discontinuities.

Feasibility Study Using Synthetic Data
Before we apply our methods to the observed data, we perform various synthetic tests to illustrate the feasibility of our methods presented in Section 3. In particular, we experiment for the lateral and vertical resolution of our data.

Lateral Resolution
In the first test, we experiment to address the question of lateral resolution of the ambient noise data recorded by stations in Nepal. To determine the lateral resolution (or grid size), we performed checkerboard tests at various periods, using the station distribution considered in this study, applying FMST method (Rawlinson & Sambridge, 2003). We also computed resolution maps using tomographic inversion (Yanovskaya & Ditmar, 1990) applied in this study. More details of the checkerboard test and model resolution are presented in the supplementary material (Texts S1 and S2 in the Supporting Information S1).
Results for the checkerboard tests and resolution map are shown in Figures S6 and S7 in the Supporting Information S1. These tests show that 0.30° anomaly size (i.e., 0.15° grid size) is well recovered beneath Central Nepal, particularly in the rupture area of the 2015 Gorkha earthquake, at periods 7-20 s ( Figure S7 in the Supporting Information S1). Additionally, the resolution maps ( Figure S8 in the Supporting Information S1) show that our data can resolve as short as 15 km length, particularly in Central Nepal for periods up to 25 s. The resolution length for 35 s is low as we have a limited number of stations for that period. Therefore, we consider 0.30° grid size to compute the group and phase velocity maps for a period range of 3-35 s.

Vertical Resolution Applying Bayesian Inversion
In the second experiment, we test the vertical resolution of S-wave velocity by applying Bayesian inversion of dispersion curves. As the data are weakly sensitive to P-wave velocity and density, we do not invert those parameters. For this test, we prepared synthetic dispersion curves for both phase and group velocity using the Raydisp subroutine (Doornbos, 1988) for various models, which include (a) a weak low velocity in the crust at 10.5 km depth using dispersions for periods 3-35 s, (b) same as (a) but using dispersions for periods 5-35 s, (c) a strong low velocity in the crust around 30 km depth beneath the surface, (d) velocity increasing as a function of depth. The details of these experiments are mentioned in supplementary materials (Text S3 in the Supporting Information S1).
These four synthetic experiments (Figures S9-S12 in the Supporting Information S1) suggest that the S-wave velocity is well recovered with narrower uncertainty at a shallower depth, but the uncertainties become larger for deeper depths and at depths where the velocity jump is significant. Additionally, when the lower period data is missing, the resolving power of the data for the shallow crustal structure decreases, thereby increasing the uncertainties. However, when we combine both phase and group velocity dispersions, uncertainties become lower.

Vertical Resolution Applying Autocorrelation
Here, we present three synthetic experiments for the feasibility of P-wave coda autocorrelation of distant earthquakes to recover the discontinuities in the crust and the Moho depth. In the first experiment, we computed vertical and radial synthetic seismograms (for various epicentral distances) for a 2-layer crustal model with a discontinuity at 8 km depth. Here we computed an impulse response using the respknt code (Randall, 1989), and the impulsive response was convolved with the source (Ricker wavelet). Then the random noise with a 10% standard deviation was added to the synthetic data. Then we computed the autocorrelation of both vertical and radial components. To compute the autocorrelation, we first performed a Fast Fourier transformation (FFT) of the P-wave coda. Then we applied spectral whitening in the frequency domain to enhance the higher-frequency content. The spectral whitened spectrum is obtained by dividing the amplitude spectrum by smoothened average of the spectrum (Pham & Tkalcic, 2017). Finally, we computed the autocorrelation of spectrally whitened data in the frequency domain, and an autocorrelogram in the time domain is obtained by applying the Inverse Fast Fourier Transform (IFFT). Note that the autocorrelation of the whitened signal is band-pass filtered (as described in Section 2) to improve the sharpness of the reflected phases and to avoid the spurious effect caused by the unexpected amplification of very high-frequency noise due to spectral whitening. To enhance the SNR of the signals and suppress the incoherent features such as the effect of the source-time functions, earthquake depth phases, and source side scattering, we stacked the autocorrelation from all the events at various epicentral distances by using phase weighted stacking method (Schimmel & Paulssen, 1997). The results of the synthetic experiments are presented in supplementary materials (Text S4 in the Supporting Information S1).
All the experiments presented in supplementary materials (see Figures S13-S15 in the Supporting Information S1) show that the methods described in Section 3 successfully retrieve both lateral and depth structures, including depth discontinuities for different scenarios that have been observed in our study region. We now apply these approaches to the observed data. Figure 4 shows the results for 2-D tomographic maps obtained applying linearized inversion of dispersion curves computed for all the station pairs. The group velocity maps at shorter periods, 5 and 10 s (Figures 4a and 4b),

Results
show the presence of a low-velocity structure south of MCT. In contrast, high velocities are observed north of the MCT and extend northward to the IYS, which delineates this high-velocity anomaly from a low-velocity anomaly north of the IYS. The low-velocity anomaly observed in the south likely corresponds to sediments believed to have been deposited due to the erosion of Churia Hills and Mahabharat ranges brought by rivers and weathering. We associate the observed high-velocity anomaly between the MCT and IYS with the presence of dense metamorphic rocks of the Higher Himalayan Crystallines (Hodges, 2000;Murphy & Yin, 2003).
At 15 s period (Figure 4c), we observe a discontinuous high-velocity anomaly between MBT and MCT which broadens and becomes somewhat continuous at 20 and 30 s (Figures 4d and 4e). At 35 s period (Figure 4f), this somewhat continuous high-velocity anomaly breaks into 2 patches of high-velocity zones, with the highest velocity observed beneath Kathmandu. This high-velocity feature maybe associated with the Indian lower crust and upper mantle structure. We also observe low-velocity anomaly north of IYS on the 5 s ( Figure 4a) period group map and north of MCT on the 30 and 35 s period maps (Figures 4e and 4f). This strong low-velocity feature at the higher periods likely indicates that we are sampling upper-mid crustal structures of the thick Tibetan crust.
Our phase velocity maps (Figures 4g-4l), in general, show similar anomaly patterns as observed in the group velocity maps (Figures 4a-4f), but here, the anomaly signature can be more easily associated with vertical tectonic structures. For example, the phase velocity maps show a low-velocity anomaly south of our study area at 5 and 10 s period, shutting down at 15 s period (Figures 4g-4i). This is in agreement with the presence of a 10 km thick sedimentary basin (Mitra et al., 2006). Also, from 20 to 35 s period (Figures 4j-4l), we observe that low velocities are well confined to the north of STD, which is possible due to the thicker crust beneath Tibet.
We then estimated the 1-D S-wave velocity and its uncertainty as a function of depth up to 60 km by applying joint inversion of phase and group dispersions extracted at every node of 0.30° × 0.30° grids. As described in Section 4 for synthetic experiments, Vp/Vs ratio is fixed at 1.73. The P-wave velocity was derived using the Vp/ Vs ratio while the density was derived from P-wave velocity. Then we applied our algorithm to estimate the 1-D velocity profiles for S-wave velocity. Few examples of PPD of rjMCMC inversion of real data are shown in Figures S16-S21 in the Supporting Information S1. For these inversions, we used various models as starting models, but the final results did not change. We set a uniform prior ( ±1 km/s from the reference model) to all depth. As for the reference model, we first considered AK135 as our reference model with wider priors, then the maximum likelihood model was obtained from this inversion. Finally, this maximum likelihood model was used as a reference model in the second inversion. After inverting data from all the nodes, we combined all the 1-D velocity profiles to compute the 3-D velocity maps and their uncertainties. The uncertainties are computed in the same way, but the 1-standard deviation of the velocity as a function of depth was used. The horizontal slices of 2-D S-wave velocity models at various depths are shown in Figure 5, and their uncertainties are presented in Figure S22 in the Supporting Information S1.  (Figures 5g and 5h), relatively low S-wave velocity is observed beneath South Tibet and high velocity beneath Nepal. However, it is challenging to interpret results at greater depths due to large uncertainties ( Figure S22 in the Supporting Information S1). We found uncertainties increase as a function of depth due to poor sensitivity of dispersions curves (up to 35s) to deeper structures. We also observe higher uncertainties at shallower depths where velocity jumps are strong because the dispersion curves are not sensitive to the depth discontinuities.
We created seven depth profiles of S-wave velocity perpendicular to the strike and two depth profiles of S-wave velocity parallel to the strike of the 2015 Gorkha earthquake, and among them, four profiles are presented in Figure 6, and the other profiles are presented in Figure S23 in the Supporting Information S1. Figure 6 shows a variation in upper and mid-crustal complexity beneath Nepal and South Tibet. The boundary (black dashed line in Figures 6a and 6b) that separates the low velocity from high velocity in the south of the duplex (marked as R1, Figures 6a and 6b) and the boundary that separates high velocity from low velocity in the north of duplex region is considered as the MHT (Priestley et al., 2019). The MHT we consider here is taken from Mencin et al. (2016) and Mendoza et al. (2019). Here, the most pronounced feature is the presence of low velocity up to 10 km in southern Nepal (feature L1, Figures 6a, 6b, and 6d) and mid-crustal quasi horizontal LVZ observed between ∼15 and 25 km depth along the profiles AA' and BB' (feature L2, Figures 6a and 6b). Traversing along these profiles, the LVZ is seen to be connected to a shallow and much broader low-velocity region to the south and a deeper prominent sub-horizontal LVZ to the north (feature L3, Figures 6a and 6b). The presence of a continuous low-velocity layer extending from shallow depths to mid-crustal depths has been observed in receiver function studies beneath Himalaya-Tibet (e.g., Duputel et al., 2016;Nábělek et al., 2009) and has been interpreted as the signature of the MHT.
We observed a west-dipping low-velocity structure between the mainshock and the biggest aftershocks of the 2015 Gorkha earthquake on the profile CC' (Figure 6c). This might be the lateral ramp on the MHT as discussed by Kumar et al. (2017) (feature LR, Figure 6c). The profile CC' shows relatively low-velocity anomalies in the west of the mainshock and east of the biggest aftershock of the 2015 Gorkha earthquake (feature L4 and L5, Figure 6c). The high-velocity structures were observed at a depth greater than 30 km due to the subducting Indian crust, but the uncertainties are large at these depths due to strong velocity variation ( Figure S24c in the Supporting Information S1).
We also found a high S-wave velocity (∼4.2 km/s) at depth of 35-40 km beneath the High Himalaya along the profile BB' (feature H1, Figure 6b) with higher uncertainties due to sharp velocity jump ( Figure S24b in the Supporting Information S1). This high velocity has been previously observed beneath the same region (Guo et al., 2009) as well as beneath India (e.g., Kumar et al., 2021). Monsalve et al. (2008) also found a high Vp/Vs ratio (∼1.78) at this depth beneath the High Himalaya while the P-wave velocity is ∼7.0 km/s. This gives us an S-wave velocity of ∼4.0 km/s which is within the uncertainty of our velocity model. This fast S-wave velocity can be due to the eclogitization of the lower crust (Christensen & Mooney, 1995;Schulte-Pelkum et al., 2005). This could also be due to the underthrusting of the Indian plate (Monsalve et al., 2008).
In addition to the velocity model, we estimated seismic interfaces from autocorrelation. We used the velocity model obtained from our work to convert the delay time to depth information and overlaid the results on the velocity model along the DD' profile located in the vicinity of the HiCLIMB stations (Figure 6d). Examples of individual P-coda autocorrelation for earthquakes from different back-azimuth at H0170 and H0260 stations are shown in Figures S25 and S26 in the Supporting Information S1. The H0170 station is in the south of MCT where the interfaces are flat while H0260 is located above the dipping interface ( Figure S26 in the Supporting Information S1). Figure S25 in the Supporting Information S1 shows that the autocorrelation signal picked close to 13 s does not vary as a function of back-azimuth for this station while the signal picked near 15 s vary weakly as a function of back-azimuth in the case of H0260 ( Figure S26 in the Supporting Information S1) which may be due to the dipping or anisotropic structure. We note that this study considers only isotropic structure. Therefore, we do not analyze autocorrelations of transverse component seismograms as signals from dipping and anisotropic structure mostly appear on the transverse component seismograms. To further test the effect of dipping layer and anisotropic structures, we performed synthetic experiments, in which 3-component synthetic seismograms (vertical, radial, and transverse) were generated at various epicentral distances and back-azimuths, for a dipping interface (10°) at depth of 40 km, using RAYSUM packages (Frederiksen et al., 2003). Then Gaussian random noise was added to all the seismograms. The individual and phase-weighted stack autocorrelations are shown in Figure S27 in the Supporting Information S1 for flat Moho interface and Figure S28 in the Supporting Information S1 for dipping interface. These experiments show that calculated travel time of 2p and 2s phases are close to the theoretical travel time of 2p and 2s phase in the case of flat interface ( Figure S27 in the Supporting Information S1).
In the case of dipping Moho interface, the arrival time of 2s phase is higher, but it is difficult to see the coherent signal on the transverse component as we use the same level of noise in both radial/vertical and transverse components. The retrieved depth for both flat and dipping interfaces are compared in Figure S28g in the Supporting Information S1, and we found that estimated depth of the discontinuity is overestimated by approximately 1 km.
Results for individual stations stacked autocorrelations are presented in Figure 7. The depth interfaces computed beneath all the stations are overlaid on the S-wave velocity profile in Figure 6d and they are separately presented in Figure S29 in the Supporting Information S1. We identify four interfaces beneath our study region. The depth of the top interface decreases toward the north (Figure 7b and Figure S29 in the Supporting Information S1), which is well correlated with the geology of Nepal (Upreti, 1999). The southern portion may represent the sediments deposited beneath Nepal. The depth of the second interface is 10 km in the southern part and increases to 20 km in the north (Figure 7d and Figure S29 in the Supporting Information S1). This interface follows the transition from a high-velocity layer to a lower velocity and may represent the MHT. The third interface is located at the transition from low velocity to high velocity with a depth of 20 km in the south and 30 km in the north (Figure 7c and Figure S29 in the Supporting Information S1), and this discontinuity may represent an interface between the upper and lower Indian crust. The fourth interface is located at a depth of 40 km beneath the southern and deepens north of MCT (Figure 7e and Figure S29 in the Supporting Information S1). We interpret this as the

Discussion
To provide a high-resolution velocity structure of the lithosphere and geometries of discontinuities beneath Nepal Himalaya, we performed a joint analysis of ambient noise tomography and autocorrelation of coda P waveforms using data from different networks of seismic stations located in Nepal and India.
The determination of the 3-D S-wave velocity model is from the inversion of the Rayleigh wave group and phase velocities derived from ambient noise cross-correlation. The inversion results indicate a pronounced LVZ at 15-25 km depth (feature L2, Figures 6a and 6b). The presence of a LVZ beneath Nepal Himalaya has been reported in previous studies along different receiver function profiles (e.g., Duputel et al., 2016;Nábělek et al., 2009) but our results provide an absolute 3-D S-wave velocity structure. This LVZ is interpreted as the signature of the MHT.
To the south, the LVZ is connected to a broader shallower low-velocity region, where most seismicity is confined. This shallow low-velocity region appears to be connected to the LVZ through a ramp structure (feature R1, Figures 6a and 6b). The location of low velocity at this ramp has been identified as duplex using the aftershocks density of the 2015 Gorkha earthquake (Mendoza et al., 2019) where a system of multiple north-dipping faults is bounded. We cannot identify all the dipping faults in this ramp due to the resolution length being longer than the width of the duplex zone, but all the faults shown by aftershocks are within a ramp of this broad low velocity. The low velocity inside the duplex structure is expected due to multiple faults within a system.
CMT analysis of the Gorkha main shock and Mw > 5.0 aftershocks by Duputel et al. (2016) have shown that the flat portion of MHT is located within the shallow low-velocity region observed in our model and also seen on receiver function analysis. The geometry of the MHT has been outlined as a ramp-flat-ramp with shallow thrust fault flattening at depths between 10 and 15 km followed by a mid-crustal ramp connecting to a deeper low-dipping thrust at depths greater than 25 km (Duputel et al., 2016). Our models indicate that the LVZ (feature L2, Figures 7a and 7b) seen by Duputel et al. (2016) as a deeper LVZ is connected to a much deeper LVZ (feature L3, Figures 7a and 7b) through another ramp (feature R2, Figures 7a and 7b). This suggests that the MHT that was previously described as a continuous low-velocity structure (Nábělek et al., 2009) corresponds most likely to a system of LVZ connected by ramps. The presence of mid-crustal ramps beneath Lesser Himalaya affects the interseismic period as well as coseismic rupture segmentation and rupture characteristics (Dal Zilio et al., 2021). Also, it has a great implication for the Himalayan seismic cycle as well as the long-term construction of the Himalayas (Dal Zilio et al., 2021).
During the interseismic period, MHT can stably creep in the LVZ and locked south of it. This might result from the interseismic stress build-up in the border between the locked and continuously creeping part of the MHT (Bilham et al., 2017;Mencin et al., 2016). High shear stress rate was reported by Ader et al. (2012) in the south of the LVZ. The deeper LVZ may be creeping and/or ductile and may indicate the presence of aqueous fluids at high pore pressure. This low velocity also correlates well with high electrical conductivity (Unsworth et al., 2005) and low Vp/Vs ratio in the region (Monsalve et al., 2008). Additionally, high attenuation for the S and P waves has been observed in this region (Sheehan et al., 2014).
Our results (Figures 6a and 6b) allow us to identify within the overall structure of the MHT a clear LVZ (feature L2, in between two ramps R1 and R2), that starts with the zone of incomplete seismic coupling and continues with the fully creeping northernmost interseismic patch (e.g., Bilham, 2019;Jouanne et al., 2019). The stored elastic energy prior to an earthquake (e.g., Bilham et al., 2017;Kanamori & Brodsky, 2004) along 50-70 km and 90-100 km-wide low-velocity zones satisfies the rupture estimates of the 2015 Mw 7.8, and the 1934 Mw 8.4 events, respectively. Furthermore, our results show that the low velocity zone beneath High Himalaya starts at 80 and 100 km from the MFT in eastern (see Figure 6b) and central Nepal respectively (see Figure 6a), which is comparable with the downdip width of fault coupling estimated by Lindsey et al. (2018) along similar transects using GPS data set.
Specifically in connection with the 2015 earthquake, our paper highlights how the lateral variation of S-wave velocity along the MHT controls the rupture propagation and arrest. The low-velocity zone LVZ (feature L2, Figure 6a) appears to resist the rupture of the 2015 Gorkha earthquake from propagating toward the north, while L4 LVZ (Figure 6c) explains why rupture did not propagate toward the west, and L5 LVZ (Figure 6c) might be the reason for rupture termination in the eastern region of Nepal.

10.1029/2022JB026195
15 of 21 The low velocity (feature L3, Figures 6a and 6b) in the middle crust (∼25-30 km) observed beneath South Tibet is consistent with previous studies (Nelson et al., 1996;Schulte-Pelkum et al., 2005;Unsworth et al., 2005;Zhao et al., 1993). This low velocity may indicate the presence of mechanically weak crust beneath South Tibet, which plays a significant role in Himalaya-Tibetan orogenic processes. Based on the thermal structure and composition of the rocks, partial melting seems to occur within the crust beneath South Tibet (McKenna & Walker, 1990). Shear heating along the plate boundary may warm the upper crust beneath South Tibet and the temperature line to cross to the melt line of the materials which is supported by thermal structure across our study region (Henry et al., 1997;Nelson et al., 1996;Royden, 1993;Wang et al., 2013). Thus, the low mid-crustal velocity observed in our study beneath south Tibet may be an indication of the presence of partial melt. Moreover, the low-velocity reservoir beneath the High Himalaya and South Tibet (Figures 6a and 6b and Figure S23 in the Supporting Information S1) might be compatible with an electrically highly conductive zone revealed by the magnetotelluric study across our study region (Lemonnier et al., 1999).
To understand whether the low S-wave velocity we observe at the MHT depth is real (Figure 6b), we performed a synthetic experiment. In this experiment, synthetic dispersion curves were computed for the BB' profile ( Figure 6b). Then the noise described in Section 4 was added to all the synthetic data. We then considered these synthetic dispersion curves with noise as data in the inversion. Figure 8 shows the comparison of the true velocity model (Figure 8a) and the retrieved velocity model (Figure 8b). This comparison shows that the slow S-wave velocity layers are well retrieved along the BB' profile with narrow uncertainties up to the depth of 40 km beneath the surface (Figure 8c). The uncertainties increase as a function of depth due to the weaker sensitivity of the dispersions data to the deeper structure. We also observe higher uncertainties at layer interface positions because the inversion tries to retrieve discontinuity through velocity smearing by adding layers. This experiment suggests that the observed low velocities are not an artifact of the inversion procedure.
We also extract the lateral variation of S-velocity across the MHT (Figure 9). The MHT depth was picked based on the MHT discontinuity suggested by the autocorrelation of coda waves. The lateral variation of S-wave velocity structure was also extracted for MHT depth proposed by Mendoza  Information S1), Hubbard et al. (2016) ( Figure S31 in the Supporting Information S1), and Nabelek et al. (2009) ( Figure S32 in the Supporting Information S1). The higher velocity patches broaden particularly for Nabelek et al. (2009) (Figure S32 in the Supporting Information S1) as their pick for the MHT is deeper than the other two studies. However, one of the common features among our study and MHT surface image extracted for these studies is that the two high-velocity patches in the east and west are separated by low-velocity structure. Additionally, we see the high velocity anomalies near the hypocenter and largest aftershocks of the 2015 Gorkha earthquake as we observed in our work. Figure 9 shows that the 2015 Gorkha earthquake nucleated in the high S-wave velocity region. Another high-velocity patch is observed around the hypocenter of the biggest aftershocks where the seismic density map before the biggest aftershock shows a higher density of aftershocks (Baillard et al., 2017). The boundary between high and relatively low velocity (North of Kathmandu) is the region, where Pandey et al. (1999) observed significantly different seismicity. The marked difference in velocity between the west and east is observed in the region where the rupture of the mainshock terminates. This low velocity is well observed in our north-south profile as well (feature L3, Figures 6a and 6b).
The lateral variation of S-wave velocity in the MHT is due to a change in the elastic properties of the material. The high S-wave velocity obtained in the west of longitude ∼86° and low velocity in the east of it indicates the presence of different elastic properties of the rock in this area. The lateral variation of S-wave velocity across MHT further provides details across the rupture area. The presence of an updip ramp toward the south and a downdip ramp toward the north of the coseismic rupture of the 2015 Gorkha earthquake might have played a crucial role in controlling the rupture in both directions (e.g., Avouac et al., 2015;Duputel et al., 2016;Wang et al., 2017), while the rupture arrested in the east may be explained by the presence of slow velocity material there (Figure 9). The high velocity observed around the hypocenter of the 2015 Gorkha earthquake indicates the presence of highly rigid material there, which is favorable for the nucleation of an earthquake. The relatively low-velocity region between the two high-velocity patches may also be the lateral ramp on the MHT (Kumar et al., 2017). The presence of a relatively low-velocity region beneath the north of Kathmandu may have slowed down the rupture velocity of the 2015 Gorkha mainshock and increased again in the region having a high-velocity patch (Harris & Day, 1997), observed near the hypocenter of the biggest aftershock, then finally arrested in the low-velocity region at longitude ∼86°.
The observed high and relatively low velocities agree with the P wave structure estimated within the rupture area (Pei et al., 2016). Previous studies show that the slip and energy radiated are relatively higher in the region where we observe higher velocity than in the low-velocity region (Fan & Shearer, 2015;Grandin et al., 2015). Baillard et al. (2017) also found a seismic gap in that area, while Hoste-Colomer et al. (2017) proposed a tear fault.

Conclusions
We presented a highly resolved 3-D structure of the crust and discontinuities, including the MHT beneath Himalaya Nepal, using ambient noise tomography and autocorrelation of the P-wave coda. This can be used to better understand the earthquakes, their locations, the physics of earthquakes, and mitigation of the seismic hazard in this region. Our results show that the velocity structure and interface depths identified by using autocorrelation agree well such that those discontinuities identified by autocorrelation are in the transitions from either low to high or high to low-velocity structure except for the topmost interface. We show that high resolution ambient noise tomography and autocorrelation of the P-wave coda can illuminate a detailed geometry of the MHT. We found two slow-velocity ramps connected by a flat low-velocity layer sandwiched between high velocities. This low velocity could be due to the aqueous fluid brought by the subduction of the Indian crust.
The presence of two ramps on the MHT at different depths has been well supported by both the coseismic and postseismic data of the 2015 Gorkha earthquake. The first ramp beneath the Lesser Himalaya correlates with the duplex structure on the MHT that has been proposed using aftershocks, although we cannot identify all the ramps within a duplex system due to the resolution limit. The second ramp beneath the High Himalayas is connected to a broad low velocity beneath South Tibet which may be due to the partial melt. We also observe a high-velocity structure at ∼35-40 km beneath the High Himalayas due to the partial eclogitization in the lower crust.
The different geometry and extent of the L2 LVZ in Central and Eastern Nepal, along with the respective slip from individual earthquake ruptures suggest an interesting avenue of research to decipher the paradox of variable surface slip at the front of MHT as retrieved from paleoseismic trenching in the Himalaya (e.g., Kumar et al., 2010;Sapkota et al., 2013;Upreti et al., 2007;Wesnousky et al., 2017). A systematic mapping of L2 LVZ beneath the Himalaya integrating seismic tomography and GPS geodesy can therefore provide a physical and a mechanical understanding of maximum Himalayan earthquake magnitudes (Bilham, 2019;. The occurrence of millenary Mw > 9.0 earthquakes  in Central and Eastern Nepal would require either a wider L2 LVZ than the ones identified in this work or the involvement of L3 LVZ identified beneath southern Tibet in storing enough elastic energy (Bilham, 2019;Feldl & Bilham, 2006).

Data Availability Statement
In this work, the authors obtained data from the Federation of Digital Seismograph Network (FDSN) with network identifiers XF (Nabelek, 2002), YL (Sheehan, 2001), XQ (Mendoza et al., 2019), and ZO (Bollinger et al., 2011). The authors only used the vertical components (BHZ and HHZ channels) of continuous time series data recorded by broadband stations. Out of the five networks used in this study, three of them (Hi-CLIMB, Hi-MNT, and Namaste) are available through the IRIS Data Service, and the authors downloaded data for each station in a particular network in MSEED format using the online form (https://ds.iris.edu/ds/nodes/dmc/forms/breqfast-request/) available in data center of IRIS. The continuous time series data from each station of HiK-NET network were obtained from RSEIF (https://seismology.resif.fr/dataselect/#/) in MSEED format. Our research group provided the data from Cambridge network (Priestley et al., 2019) in MSEED format. The vertical component of time series data (after single station data processing) of this network are made available for reviewers and editors through Zenodo (https://zenodo.org/deposit/7771967). For P-wave coda autocorrelation, the authors downloaded earthquake data in SAC format from IRISDMC using obspy (Beyreuther et al., 2010). The trans-dimensional code (Manu-Marfo et al., 2019;Pachhai et al., 2015) used in this study are available on Zenodo (https://zenodo. org/record/7825575). The authors generated all plots included in this study using GMT 5 (Wessel et al., 2013) and Matplotlib of version 3.1.1 (Hunter, 2007). H.R. Thapa acknowledges from Generali Insurance Group and The Abdus Salam International Centre for Theoretical Physics (ICTP) for providing financial support during his PhD studies. The authors thank Thanh Son Pham for the discussion about autocorrelation and comments from the editor and two anonymous reviewers improved the original manuscript. All the computations were performed on the ICTP Argo cluster. The authors would like to acknowledge UK funding source and the SEIS-UK for the seismic instrumentation for the Cambridge network and Indian Institute of Science Education and Research Kolkata (IISER-K) for providing the data of six seismic stations. The authors would also like to thank people from Nepal Seismological Center in Kathmandu and Regional Seismological Center in Surkhet who helped on the deployment, maintenance, and decommissioning of Cambridge network. The facilities of IRIS Data Services, and specifically the IRIS Data Management Center, were used to access waveforms, related metadata, and/or derived products used in this study. IRIS Data Services are funded through the Seismological Facilities for the Advancement of Geoscience and EarthScope (SAGE) Proposal for the National Science Foundation under Cooperative Agreement EAR-1261681.