Journal of Geophysical Research: Solid Earth

Reconciling a geophysical model to data using a Markov chain Monte Carlo algorithm: An application to the Yellow Sea–Korean Peninsula region


Abstract

[1] In an effort to build seismic models that are the most consistent with multiple data sets we have applied a new probabilistic inverse technique. This method uses a Markov chain Monte Carlo (MCMC) algorithm to sample models from a prior distribution and test them against multiple data types to generate a posterior distribution. While computationally expensive, this approach has several advantages over deterministic models, notably the seamless reconciliation of different data types that constrain the model, the proper handling of both data and model uncertainties, and the ability to easily incorporate a variety of prior information, all in a straightforward, natural fashion. A real advantage of the technique is that it provides a more complete picture of the solution space. By mapping out the posterior probability density function, we can avoid simplistic assumptions about the model space and allow alternative solutions to be identified, compared, and ranked. Here we use this method to determine the crust and upper mantle structure of the Yellow Sea and Korean Peninsula region. The model is parameterized as a series of seven layers in a regular latitude-longitude grid, each of which is characterized by thickness and seismic parameters (Vp, Vs, and density). We use surface wave dispersion and body wave traveltime data to drive the model. We find that when properly tuned (i.e., the Markov chains have had adequate time to fully sample the model space and the inversion has converged), the technique behaves as expected. The posterior model reflects the prior information at the edge of the model where there is little or no data to constrain adjustments, but the range of acceptable models is significantly reduced in data-rich regions, producing values of sediment thickness, crustal thickness, and upper mantle velocities consistent with expectations based on knowledge of the regional tectonic setting.

1. Introduction

[2] In the Earth sciences, we often find that Earth models derived using one set of geophysical data can be inconsistent with models derived using another set of observations. A classic example of this (discussed by Maggi and Priestley [2005]) can be found in the crustal thickness of the Zagros Mts. of Iran. Using gravity measurements and seismic results [Giese et al., 1984], Dehghani and Makris [1984] determined a 55 km thick crust. Snyder and Barazangi [1986], using similar data sets, determined a thickness of about 65 km. Using surface wave dispersion, Asudeh [1982] finds a thickness of 43–46 km, while using receiver functions, Hatzfeld et al. [2003] find a crustal thickness of 44–48 km.

[3] Since we are interested in understanding the nature of the true Earth, we seek the set of models that is most consistent with the full set of observations. Probabilistic inverse techniques, like the Markov chain Monte Carlo (MCMC) algorithm, have been successful in combining disparate data types into a consistent model. For example, Mosegaard and Tarantola [1995] used Monte Carlo sampling to jointly invert seismic and gravity data. The stochastic methods that we consider here invert data by probabilistically sampling the model space, comparing the observations predicted by the proposed model to the observed data, and preferentially accepting models that produce a good fit, thereby generating a posterior distribution of models. The model space is mapped through a series of stages that compare proposed models to data, with Bayes' theorem [Bayes, 1763] relating the prior and posterior distributions. This stochastic geophysical model (a probabilistic distribution of geophysical models) is able to reliably predict geophysical observations for a variety of data types, and provide accurate estimates of their uncertainties.

[4] Monte Carlo integration is a method that uses random processes to solve problems that are difficult (or impossible) to solve analytically. It works by drawing samples from a distribution, then forming sample averages to approximate expectations. MCMC draws these samples by running a cleverly constructed Markov chain for a long time [Gilks et al., 1996]. A Markov chain is a sequence of points in the model space in which the probability of each point depends only upon the value of the point immediately preceding it. A random walk, defined as a sequence of random perturbations to a point in a multidimensional space, is a good example. The Markov chain that we construct here follows a set of rules that preferentially moves to more likely states in the model space, but sometimes moves to less likely states. The result is a chain that can sample the complete model space and efficiently inspect high-likelihood regions without getting trapped in local extrema.
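The idea of approximating an expectation by a sample average can be illustrated with a minimal sketch. The function name `monte_carlo_expectation` and the test case (the second moment of a standard normal variable, whose exact value is 1) are illustrative assumptions and are not part of the inversion described in this paper.

```python
import random


def monte_carlo_expectation(f, sampler, n_samples, seed=0):
    """Approximate E[f(X)] by averaging f over n_samples random draws of X."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += f(sampler(rng))
    return total / n_samples


# Illustrative target: E[X^2] for a standard normal variable (exact value 1).
estimate = monte_carlo_expectation(
    lambda x: x * x,
    lambda rng: rng.gauss(0.0, 1.0),
    n_samples=100_000,
)
print(f"Monte Carlo estimate of E[X^2]: {estimate:.3f}")
```

Here the draws are independent; in MCMC the same kind of average is formed over the correlated states visited by a Markov chain.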

[5] Both Bayes' Theorem and sampling methods like Monte Carlo have been extensively utilized in the Earth sciences. Tarantola [1987] employed a Bayesian framework for tomography, which incorporates a prior background model into the tomographic inversion. While these methods succeed at including prior information, they assume the model follows Gaussian statistics, and seek to find a single solution to the problem with uncertainties derived from this assumption. In our formulation of the problem, no assumptions have been made about the distribution of models and we find that in many instances Gaussian distributions are not applicable. Instead of a single model, what we seek is the distribution of models that are most consistent with both our prior information and our observations.

[6] MCMC originated in statistical physics and has been applied more recently in many different fields. According to the statistics lab at Cambridge University, which maintains an MCMC preprint service (http://www.statslab.cam.ac.uk/~mcmc/), MCMC has been applied to fields such as agriculture, biostatistics, econometrics, electronics, epidemiology, genealogy, imaging, isotope radiodating, medicine, neurology, signal processing, and speech. Application in the Earth sciences has been more limited. In at least one recent study, Shapiro and Ritzwoller [2002] employed MCMC, along with a linearized inversion and simulated annealing, to invert for shear velocity structure using surface waves. MCMC has also recently been applied to map electrical resistivity changes [Ramirez et al., 2005]. The methodology proposed here differs from previous Earth science applications by using multiple data types to constrain the model. We seek to employ the technique to estimate regional Earth structure using multiple geophysical data sets.

[7] An excellent review of the various Monte Carlo methods is given by Sambridge and Mosegaard [2002]. As explained in that paper, MCMC is one of a continuum of techniques which balance exploring the parameter space (exploration) with utilizing information (exploitation). In this scheme, classic Monte Carlo would fall along the exploration axis, while most search techniques would fall along the exploitation axis. Probabilistic techniques like the Neighborhood Algorithm [Sambridge, 1999a, 1999b], which makes use of Voronoi cells to drive the search, and MCMC, which utilizes a Markov chain, attempt to optimally balance exploration and exploitation. In this sense, MCMC lies between explorative methods like a uniform random search and exploitative methods such as the method of steepest descent. Probabilistic inverse techniques are also closely related to other well-used techniques such as simulated annealing and genetic algorithms.

[8] Once a model is selected, we test its acceptability by evaluating the fit of each proposed model against our observational data. The process of comparing model predictions and observations for a given data type is referred to as a stage. At each stage the proposed model may be rejected if the fit to the data has not improved relative to the previous model in the chain. Since we are using two data types to drive the model in our example, there are, potentially, two stages for each proposed model. By properly ordering our stages, we can quickly reject models that cannot fit the observations that are easiest to calculate. The data sets that are more computationally intensive to forward model are relegated to later stages, increasing the efficiency of the inversion. Another important distinction of this methodology is that we do not seek a single best model or to simplify a distribution of models into a model with uncertainties. Rather, the final product is a posterior distribution of models that is best able to fit all of the data. As we will show in the paper, this approach has several important advantages, including less restrictive distributions on the models, and the ability to reliably estimate uncertainties on observables.

2. Methodology

[9] Our MCMC approach is a derivative of the Metropolis algorithm [Metropolis et al., 1953] as described by Mosegaard and Tarantola [1995]. A brief description of the methodology follows. A more detailed description of the methodology (applied to an electrical resistivity problem) is given by Ramirez et al. [2005].

[10] The base sampler selects a proposed model (given our parameterization and constraints) as a random (i.e., Monte Carlo) perturbation from the current state in model space. By comparing the data predicted by the model $m$ (in other words, by forward modeling the data) with the observed data $d_0$, we can determine the likelihood $L(m \mid d_0)$ of the proposed model. Bayes' rule relates the prior probability $p$ and posterior probability $P$ distributions as follows:

$$P(m \mid d_0) \propto L(m \mid d_0)\, p(m)$$

We use a likelihood function of the form:

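$$L(m \mid d_0) = k \exp\left[-\frac{1}{n}\sum_{i} \frac{\lvert d_i(m) - d_{0,i}\rvert^{n}}{\sigma_i^{n}}\right]$$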

where $d_i$ is the predicted data for a given model $m$, $d_{0,i}$ are the elements of the vector containing the observed measurements, $\sigma_i$ is the estimated data uncertainty, $k$ is a normalization constant, and $n \geq 1$. The value of $n$ determines how heavily outliers are weighted. We selected $n = 2$ as a reasonable value between the end-member case where misfit weights are minimized and we are insensitive to outliers ($n = 1$) and the other end-member case where misfit weights are maximized and we are overly sensitive to outliers ($n = \infty$).
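The influence of the exponent n on outliers can be seen in a short sketch. It assumes a generalized Gaussian likelihood with a 1/n factor in the exponent, which is an assumption of this example rather than a form confirmed by the text above; the function name `log_likelihood` and all numerical values are hypothetical. The logarithm is used so that the normalization constant k drops out of comparisons between models.

```python
def log_likelihood(predicted, observed, sigma, n=2.0):
    """Return log L(m) - log k = -(1/n) * sum_i (|d_i - d0_i| / sigma_i)**n.

    The 1/n prefactor is an assumption of this sketch (generalized Gaussian).
    """
    if n < 1:
        raise ValueError("n must be >= 1")
    if not (len(predicted) == len(observed) == len(sigma)):
        raise ValueError("inputs must have the same length")
    misfit = sum((abs(d - d0) / s) ** n
                 for d, d0, s in zip(predicted, observed, sigma))
    return -misfit / n


# Hypothetical group velocities (km/s) and uncertainties, for illustration only.
observed = [3.00, 3.20, 3.50]
sigma = [0.05, 0.05, 0.10]
good_fit = [3.02, 3.18, 3.55]
with_outlier = [3.02, 3.18, 4.00]  # third prediction misses by about 5 sigma

# How much a single outlier lowers the log-likelihood for several values of n.
penalties = {}
for n in (1.0, 2.0, 4.0):
    penalties[n] = (log_likelihood(good_fit, observed, sigma, n)
                    - log_likelihood(with_outlier, observed, sigma, n))
    print(f"n = {n:.0f}: outlier penalty = {penalties[n]:.2f}")
```

The penalty attached to the single outlier grows with n, which is the behavior described for the two end-member cases.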

[11] Once a model is proposed, a decision needs to be made whether or not to accept it. If every transition from one model realization to another were accepted, the simulation would simply be sampling from the prior distribution. Instead, the decision is based on the respective likelihoods $L(m_i)$ and $L(m_j)$ of the current state $m_i$ and the proposed state $m_j$. If $L(m_j) \geq L(m_i)$, the proposed transition is accepted and the chain moves to state $m_j$. If $L(m_j) < L(m_i)$, a randomized decision rule is used: the transition is accepted, and the chain moves to state $m_j$, when a random number drawn uniformly between 0 and 1 is less than the ratio $L(m_j)/L(m_i)$; otherwise, the chain remains in state $m_i$. By allowing the chain to move to a less likely state, the process can escape local extrema. This guided search through the model space is the Markov chain.
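The acceptance rule can be sketched in a few lines. This is an illustrative toy and not the implementation used in this study: the names `metropolis_accept` and `run_chain`, the one-parameter model (a crustal thickness in km), the Gaussian likelihood centered on 35 km, and the unbounded symmetric random walk standing in for the base sampler are all assumptions. Log-likelihoods are compared so that very small likelihoods do not underflow.

```python
import math
import random


def metropolis_accept(log_l_current, log_l_proposed, rng):
    """Accept if the likelihood does not decrease; otherwise accept with
    probability L(m_j) / L(m_i)."""
    if log_l_proposed >= log_l_current:
        return True
    return rng.random() < math.exp(log_l_proposed - log_l_current)


def run_chain(log_likelihood, propose, start, n_steps, seed=0):
    """Run one chain and return the state after every step."""
    rng = random.Random(seed)
    current, log_l = start, log_likelihood(start)
    samples = []
    for _ in range(n_steps):
        candidate = propose(current, rng)
        log_l_candidate = log_likelihood(candidate)
        if metropolis_accept(log_l, log_l_candidate, rng):
            current, log_l = candidate, log_l_candidate
        samples.append(current)  # a rejected step repeats the current state
    return samples


def toy_log_likelihood(thickness_km):
    """Hypothetical data misfit favoring a 35 km crust with a 2 km spread."""
    return -0.5 * ((thickness_km - 35.0) / 2.0) ** 2


samples = run_chain(toy_log_likelihood,
                    lambda m, rng: m + rng.uniform(-2.0, 2.0),
                    start=30.0, n_steps=20_000)
kept = samples[5_000:]  # discard an initial segment influenced by the start
mean = sum(kept) / len(kept)
spread = (sum((x - mean) ** 2 for x in kept) / (len(kept) - 1)) ** 0.5
print(f"sample mean = {mean:.2f} km, sample spread = {spread:.2f} km")
```

Because the toy proposal is symmetric and unbounded, the chain settles on the likelihood itself; with a base sampler that draws from an informative prior, the same rule yields samples of the posterior.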

[12] The samples generated through this process will have a limiting distribution that is proportional to the desired posterior distribution $P(m \mid d_0)$. That is, the search tends to hover in regions of the model space containing states that better fit the prior information and seismic measurements.

[13] Stochastic models have several advantages over conventional models. First, while the model is data driven, it is very easy to incorporate prior information. This information can range from less rigid prior constraints, such as our starting model, to very specific constraints such as a fixed crustal thickness where refraction data may have defined it unambiguously. Importantly, the staged approach presents a means of reconciling data from many sources (e.g., traveltimes, dispersion measurements, receiver functions, and waveforms) having widely different sensitivity functions. The posterior distribution represents models that are most consistent with all of the observed data. The relation between the different data types does not need to be explicitly defined since predictions for each data type are simply forward calculated from the proposed models.

[14] Using this technique, we obtain the full posterior model distribution, and can reliably estimate the uncertainties of model parameters; both the posterior model distribution and the uncertainties will be consistent with the uncertainties of the observables. Models and the parameters that describe models (e.g., sediment thickness) are not constrained to be normally distributed but could have non-Gaussian or even multimodal distributions. For example, Ramirez et al. [2005] used clustering analysis to find several distinct solutions in the model space which would be completely smeared out with an assumed Gaussian distribution. Model uncertainty can then be easily mapped into uncertainties for observable parameters. For example, the posterior distribution of models can be used to estimate uncertainties for traveltimes or traveltime corrections. The uncertainties of these traveltimes will reflect the uncertainties of the model parameters along with their correlations.

3. Data and Parameterization

[15] We have selected as our study area the region of Eastern Asia in the vicinity of the Yellow Sea and Korean Peninsula (YSKP) (Figure 1). We use seismic data from a number of sources in the region, including stations from the Global Seismic Network (GSN), China Digital Seismic Network (CDSN), and International Monitoring System (IMS) networks, as well as data from regional broadband networks like those of the Korean Meteorological Administration (KMA) and Korean Institute of Geoscience and Mineral Resources (KIGAM) in South Korea. We also made use of two PASSCAL deployments in China and North Korea.

Figure 1.

Geographic map of the Yellow Sea–Korean Peninsula region. Plate boundaries are shown by the thick black lines. Sediment thickness contours of 2 and 4 km are shown by the thin gray lines. The numbered diamonds indicate the location of profiles examined later in the text.

[16] We use surface wave group velocities and body wave traveltimes to construct the model. Rayleigh wave dispersion measurements have been made for thousands of paths across the region using a multiple filter analysis technique and, as shown in Figure 2a, they provide excellent coverage [Pasyanos et al., 2003; Pasyanos, 2005]. The data are assembled by taking dispersion measurements for periods between 10 and 100 s at 5 s intervals. We have over 6000 paths at 25 s period, with fewer paths (as few as 2000) at longer and shorter periods (Figure 2b). The short-period data are sensitive to the shallowest Earth structure (e.g., sediments), intermediate period data to deeper crustal structure (e.g., crustal velocity and crustal thickness), and long-period data primarily to upper mantle structure. Combined, they do an excellent job of predicting average S wave velocity structure. Surface wave data, however, have the drawback of allowing large tradeoffs between estimated velocities and the depths of discontinuities, allowing multiple models consistent with the same data. The uncertainties of the group velocity measurements are derived from the broadness of the energy arrival; they generally range from 0.05 km/s to 0.10 km/s for typical paths.

Figure 2.

(a) Path map of surface wave measurements for 30 s Rayleigh waves. Yellow circles show epicenters, red triangles represent stations, and blue lines show paths. (b) Histogram indicating the number of Rayleigh wave paths as a function of period.

[17] The second data type that we use is the traveltime of regional body wave phases, namely Pn (P velocity upper mantle head wave), Pg (crustal P wave), Sn (S velocity upper mantle head wave), and Lg (crustal guided S wave), all shown in Figure 3. Coverage is generally not as good as the coverage of surface waves because we limit ourselves to events which meet certain ground truth criteria [e.g., Bondar et al., 2004]. For example, Bondar et al. define several criteria, such as the number of stations and the azimuthal gap, to ensure a location accuracy of 5 km (GT5) for local networks, 20 km (GT20) for near-regional networks, and 25 km (GT25) for both regional and teleseismic networks. In addition, we have assigned GT0 for explosions where the location was known. While coverage is not as good as the surface waves, the body wave data are complementary to the surface wave data because some phases are more sensitive to P wave velocities and because they sample the Earth in a much different manner. For instance, Pn is sensitive specifically to the P wave velocity in the uppermost mantle, just below the Moho. By combining the body wave data with the surface wave data, we can reduce the nonuniqueness arising from the tradeoffs discussed above. We infer uncertainties for the body wave traveltimes based upon the phase type, estimated accuracy of the event location, and estimated quality of the pick. The uncertainties reflect the ambiguity of the timing of the arrival. First arriving P phases generally have smaller uncertainties (ranging from less than 0.5 s up to 5.0 s) than phases picked in the coda of earlier arriving phases, and the more emergent Lg was given the highest uncertainty values, ranging from less than 4 s up to 12 s.

Figure 3.

Path map of body wave traveltimes. (a) Regional P wave phases, Pg (cyan) and Pn (red). (b) Regional S wave phases, Lg (blue) and Sn (green).

[18] The first implementation step for MCMC is to develop a mechanism (the base sampler) for generating model realizations from the prior distribution. In Bayesian analysis, the prior distribution is an assumed distribution which reflects our beliefs about the population being studied. In our case, it is a distribution of seismic models that represents the range of possible parameters. The base sampler is first used to populate the prior distribution for a given starting model and a given set of constraints on the model parameters (Figure 4a). It does this by successive perturbations, starting with an initial model. The base sampler is repeatedly applied until we are satisfied that the run has fully sampled the model space, forming a Markov chain that makes up the prior distribution.

Figure 4.

A schematic diagram of the MCMC inversion algorithm. (a) Flowchart for the prior distribution run (all models proposed by the base sampler are accepted). (b) Flowchart for the posterior distribution run containing two stages.

[19] A separate chain is run in which each model realization is compared to our data at each step in the chain (Figure 4b). Since we are using two data types there are two stages of testing to determine the fit of the proposed model to the observed data. During this process a proposed model is either rejected or accepted based upon the data fit. If a model is accepted, a copy of the model is placed in the posterior distribution. If the model is rejected, the chain returns to the previous model and the process is repeated. An additional copy of the previous model is again added to the posterior distribution, accounting for the increased probability of that model. Thus both prior and posterior distributions are generated.
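The staged testing and the bookkeeping of the posterior distribution can be sketched as follows. This is a toy under stated assumptions, not the code used in this study: the names `staged_accept` and `posterior_chain`, the one-parameter model, and the two hypothetical log-likelihood functions (a cheap "dispersion" stage and a costlier "traveltime" stage) are all illustrative. Applying the Metropolis rule at each stage in turn targets the product of the stage likelihoods.

```python
import math
import random


def staged_accept(stages, current, proposed, rng):
    """Apply the Metropolis rule stage by stage; stop at the first rejection.

    For clarity the likelihood of the current model is recomputed at each
    stage; a practical code would cache it.
    """
    for log_likelihood in stages:
        log_ratio = log_likelihood(proposed) - log_likelihood(current)
        if log_ratio < 0 and rng.random() >= math.exp(log_ratio):
            return False  # later (costlier) stages are never evaluated
    return True


def posterior_chain(stages, propose, start, n_steps, seed=0):
    """Return the posterior samples, one entry per step of the chain."""
    rng = random.Random(seed)
    current = start
    posterior = []
    for _ in range(n_steps):
        proposed = propose(current, rng)
        if staged_accept(stages, current, proposed, rng):
            current = proposed
        # On rejection this appends another copy of the previous model.
        posterior.append(current)
    return posterior


calls = {"dispersion": 0, "traveltime": 0}


def dispersion_stage(thickness_km):
    """Hypothetical cheap stage favoring 35 km (spread 2 km)."""
    calls["dispersion"] += 1
    return -0.5 * ((thickness_km - 35.0) / 2.0) ** 2


def traveltime_stage(thickness_km):
    """Hypothetical expensive stage favoring 36 km (spread 3 km)."""
    calls["traveltime"] += 1
    return -0.5 * ((thickness_km - 36.0) / 3.0) ** 2


posterior = posterior_chain([dispersion_stage, traveltime_stage],
                            lambda m, rng: m + rng.uniform(-2.0, 2.0),
                            start=30.0, n_steps=5_000)
kept = posterior[1_000:]
posterior_mean = sum(kept) / len(kept)
print(f"stage evaluations: {calls}, posterior mean = {posterior_mean:.2f} km")
```

Ordering the stages from cheapest to most expensive means that the expensive forward calculation is evaluated only for models that have already passed the cheaper test.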

[20] We parameterize the model as a set of layers in a regular latitude-longitude grid of 2° × 2° cells. The layers represent geologic structure (water, upper sediments, lower sediments, upper crust, middle crust, lower crust, upper mantle) which are specified by their seismic parameters and thickness. The upper mantle layer extends down to 120 km depth, where it sits over the ak135 model [Kennett et al., 1995]. This parameterization has the advantages of reducing the size of the problem, imposing realistic constraints, and making the resulting models more interpretable. The model covers the region between 23° and 57°N latitude and 109° and 147°E longitude, for a total of 323 cells. Lateral smoothness constraints between the cells have been considered but, given the relatively low (2°) resolution of the current study, they are regarded as unnecessary. These constraints will probably be required as the resolution of the model is increased. For a model having 6 variable layers (all but water) and 4 parameters for each layer (thickness, Vp, Vs, density), 323 cells would result in 7752 free parameters, if all of them were completely independent. As explained below, however, Vp, Vs, and density are not completely independent, so in practice there are considerably fewer free parameters.
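The cell and parameter counts quoted above follow directly from the grid bounds and can be checked with a few lines of arithmetic. The helper name `grid_cell_count` is an illustrative assumption.

```python
def grid_cell_count(lat_min, lat_max, lon_min, lon_max, cell_deg):
    """Number of cells in latitude, in longitude, and in total."""
    n_lat = round((lat_max - lat_min) / cell_deg)
    n_lon = round((lon_max - lon_min) / cell_deg)
    return n_lat, n_lon, n_lat * n_lon


n_lat, n_lon, n_cells = grid_cell_count(23, 57, 109, 147, 2)

variable_layers = 6        # all layers except water
parameters_per_layer = 4   # thickness, Vp, Vs, density
n_free = variable_layers * parameters_per_layer * n_cells

print(f"{n_lat} x {n_lon} = {n_cells} cells, {n_free} nominal free parameters")
```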

[21] The starting model selected for the region was CRUST2.0 [Bassin et al., 2000], which is similarly parameterized. We provided the base sampler with the permitted range for each parameter. We also specified standard deviations and permitted ranges of values as approximations of the regional variations in the starting model. From the starting model and parameter constraints, the base sampler initially selects models consistent with the a priori distribution. The seismic parameters are each selected independently, which produces variations in Vp:Vs and Vp:density ratios. The ratios are restricted, however, to realistic values for the given layer type. The result is that Poisson's ratio varies from about 0.28 to 0.40 (Vp:Vs between 1.8 and 2.4) for sediments, and from 0.25 to 0.30 (Vp:Vs between 1.73 and 1.87) for crystalline crust and upper mantle layers. A similar relation exists for Vp:density.
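The correspondence between the Vp:Vs limits and the quoted Poisson's ratios follows from the standard relation for an isotropic elastic solid, ν = (r² − 2) / (2(r² − 1)) with r = Vp/Vs. The sketch below evaluates it at the limits given in the text; the names `poisson_ratio`, `VP_VS_LIMITS`, and `ratio_is_permitted`, and the example velocities, are illustrative assumptions.

```python
def poisson_ratio(vp_vs):
    """Poisson's ratio of an isotropic elastic solid from its Vp/Vs ratio."""
    if vp_vs <= 1.0:
        raise ValueError("Vp/Vs must exceed 1")
    r2 = vp_vs * vp_vs
    return (r2 - 2.0) / (2.0 * (r2 - 1.0))


# Vp:Vs limits quoted in the text for each layer type.
VP_VS_LIMITS = {
    "sediment": (1.8, 2.4),
    "crystalline": (1.73, 1.87),
}


def ratio_is_permitted(vp, vs, layer_type):
    """True if the Vp:Vs ratio falls inside the limits for the layer type."""
    low, high = VP_VS_LIMITS[layer_type]
    return low <= vp / vs <= high


for layer_type, (low, high) in VP_VS_LIMITS.items():
    print(f"{layer_type}: Poisson's ratio from "
          f"{poisson_ratio(low):.3f} to {poisson_ratio(high):.3f}")
```

A check such as `ratio_is_permitted(6.2, 3.5, "crystalline")` is one simple way that independently drawn Vp and Vs values could be screened, although the text does not specify how the restriction is enforced.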

[22] In order to reduce dependence on the starting model and introduce some variation in the initial models for the Markov chains, we have randomized the starting model for each chain in our search. This is accomplished by randomly swapping profiles for 10% of the columns in our model. We actually use two Markov chains per inversion run. Using multiple chains permits the exploration of the model space beginning at separate starting points, allowing a quicker sampling of the model space, particularly in complex problems. This ensures coverage of the target distribution, and can also be used to monitor convergence. Prior to convergence, the run has not had enough time to properly sample the model space. The initial period, referred to as “burn-in,” is still unduly influenced by the starting model. When convergence is finally reached, each Markov chain will have had a sufficient number of steps to independently sample the model space. The chains should have similar posterior distributions, which should be stationary in the long run. Convergence is monitored by determining when the Markov chains have become independent of their individual starting points. This is done by comparing sequences drawn from the chains that were applied to different starting points; the chains have converged to distributions that are independent of their starting points when their sequences appear indistinguishable from one another [Gelman, 1996].
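One common way to quantify whether sequences from different chains are indistinguishable is the potential scale reduction factor, which compares the variance between chains to the variance within them. The text does not state which diagnostic was used here, so its use in this sketch is an assumption, as are the function name `potential_scale_reduction` and the synthetic sequences, which are independent random draws standing in for post burn-in chain output.

```python
import math
import random
import statistics


def potential_scale_reduction(chains):
    """Potential scale reduction factor for equal-length scalar sequences.

    Values near 1 indicate that the chains are sampling the same distribution.
    """
    m = len(chains)
    n = len(chains[0])
    if m < 2 or n < 2 or any(len(chain) != n for chain in chains):
        raise ValueError("need at least two chains of equal length >= 2")
    chain_means = [statistics.fmean(chain) for chain in chains]
    within = statistics.fmean([statistics.variance(chain) for chain in chains])
    between_over_n = statistics.variance(chain_means)  # B / n
    pooled = (n - 1) / n * within + between_over_n
    return math.sqrt(pooled / within)


# Synthetic stand-ins for a crustal thickness parameter (km) from two chains.
rng = random.Random(1)
chain_a = [rng.gauss(35.0, 2.0) for _ in range(2_000)]
chain_b = [rng.gauss(35.0, 2.0) for _ in range(2_000)]
chain_c = [x + 10.0 for x in chain_b]  # a chain stuck far from the others

r_mixed = potential_scale_reduction([chain_a, chain_b])
r_apart = potential_scale_reduction([chain_a, chain_c])
print(f"well-mixed chains: {r_mixed:.3f}, separated chains: {r_apart:.3f}")
```

With only two chains the between-chain variance is poorly estimated, so a visual comparison of the sequences, as in the profile figures, remains a useful complement.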

[23] Finally, an important parameter for tuning the search is the model step size. This term refers to the allowable distance of proposed models from the current model in solution space. If this parameter is set too high, then the concept of a Markov chain breaks down and the search approaches a straight Monte Carlo sampling. If this parameter is set too low, then the Markov chain does not rapidly move away from the starting model and convergence is slow or never reached. We have selected a step size of 0.10 which allows parameters of the proposed model to move a random amount up to 10% (as characterized by the probability density) from the current state.
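The role of the step size can be illustrated with a simple proposal function. The text does not fully specify how the 10% limit is measured, so this sketch assumes that a parameter may move by a uniformly distributed amount of up to `step_size` times the width of its permitted range, with reflection at the bounds so that proposals stay inside the range. The name `propose_parameter` and the 20 to 50 km range are hypothetical.

```python
import random


def propose_parameter(value, lower, upper, step_size, rng):
    """Perturb one parameter by at most step_size * (upper - lower).

    Assumes 0 < step_size <= 1 so that a single reflection keeps the
    candidate inside [lower, upper].
    """
    width = upper - lower
    candidate = value + rng.uniform(-step_size, step_size) * width
    if candidate < lower:
        candidate = 2.0 * lower - candidate   # reflect off the lower bound
    elif candidate > upper:
        candidate = 2.0 * upper - candidate   # reflect off the upper bound
    return candidate


# A short random walk for a hypothetical crustal thickness between 20 and 50 km.
rng = random.Random(0)
walk = [35.0]
for _ in range(10_000):
    walk.append(propose_parameter(walk[-1], 20.0, 50.0, 0.10, rng))

largest_step = max(abs(b - a) for a, b in zip(walk, walk[1:]))
print(f"largest single step: {largest_step:.2f} km")
```

Raising `step_size` toward 1 makes successive models nearly independent of each other, while very small values keep the walk close to its starting point for many iterations, which mirrors the two failure modes described above.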

4. Results

[24] We have tested our method by making a run consisting of two chains, each running over 8000 iterations, to generate the posterior distribution (Figure 4b). Recall that having multiple chains allows us to efficiently sample the model space, as well as to monitor convergence. Separately, we also performed a two-chain run in which the models proposed by the base sampler were automatically accepted (Figure 4a). This provides us with a prior distribution, against which we can compare and contrast the posterior distribution. In both cases, we have discarded models from the first 3/4 of the iterations to ensure that the results have achieved the independent, post burn-in convergence.

[25] To see how the method is performing, we look at several profiles in our model. The first is from the southeastern corner of our model in the Pacific Ocean (see point 1 in Figure 1). Figure 5 shows the profiles from the prior and posterior distributions at that point. The red and black colors indicate the profiles from the two Markov chains, while the green lines indicate the range of models (dashed lines) and the CRUST2.0 starting model (solid line). The first thing to note is that the chains seem to be well mixed, indicating that convergence has been reached and that the chains have “forgotten” their starting point, i.e., sufficient iterations were made to render them independent of starting conditions. Second, the range of allowable models is very wide, indicating that high uncertainties remain. This geographic location, on the edge of our model, is constrained by little or no data to truly test the proposed models. In fact, if we compare the prior and posterior model distributions, we find little or no difference (Figure 5). In the absence of data, the posterior model closely resembles the prior, as we would expect. The histograms of model parameters (Figure 6) likewise reflect these similarities except, perhaps, for a slight thinning of the crust. Note, as well, that the prior distributions look Gaussian, which is a function of the base sampler. It would also be possible to specify other distributions, such as a uniform distribution.

Figure 5.

Model profile distributions for point 1 (see Figure 1) in the Pacific Ocean. (a) P wave and (b) S wave profiles from the prior distribution. (c) P wave and (d) S wave profiles from the posterior distribution. In Figures 5a–5d, velocity (in km/s) is plotted as a function of depth (in km), where red and black lines indicate profiles from separate chains and green lines indicate the starting model and range of models.

Figure 6.

Histograms of model parameters for (a) prior and (b) posterior models for parameters of (top) crustal thickness, (middle) sediment thickness, and (bottom) P velocity of the uppermost mantle from the model distributions of the Pacific Ocean profile (point 1) shown in Figure 5.

[26] The second profile that we consider is from the Yellow Sea at the center of our model (point 2 in Figure 1). Figure 7 shows the prior and posterior distributions, the red and black colors again indicating profiles from the two chains. Again, we find that the two chains are well mixed, but that the distribution of profiles is quite different. The range of models in the posterior distribution has decreased significantly, indicating that only models which fit the data well were accepted. Notice, in particular, the much narrower range of models for the sediment layers. In the presence of data, then, the model moves away from the prior distribution in cases where the prior distribution does not adequately predict the observed data values.

Figure 7.

Model profile distributions for point 2 (see Figure 1) in the Yellow Sea. (a) P wave and (b) S wave profiles from the prior distribution. (c) P wave and (d) S wave profiles from the posterior distribution. In Figures 7a–7d, velocity (in km/s) is plotted as a function of depth (in km), where red and black lines indicate profiles from separate chains and green lines indicate the starting model and range of models.

[27] Histograms of Moho depth, sediment thickness, and Pn velocity demonstrate the same (Figure 8). Not only have the peaks shifted, but the distributions have also narrowed, indicating lower uncertainties on these parameters. Notice as well that the distribution of sediment thickness is clearly non-Gaussian, appearing to be more of a Poisson distribution. Simply modeling the sediment thickness as a Gaussian distribution with a mean and standard deviation would not effectively characterize the model space. Although not a well-studied region, the models compare favorably to other observations in the Yellow Sea, including crustal thickness estimates of 30–32 km [Guangding, 1994] and sediment thicknesses ranging from 0.5–4.5 km in the vicinity of the profile [Laske and Masters, 1997].

Figure 8.

Histograms of model parameters for (a) prior and (b) posterior models for parameters of (top) crustal thickness, (middle) sediment thickness, and (bottom) P velocity of the uppermost mantle from the model distributions of the Yellow Sea profile (point 2) shown in Figure 7.

[28] The third profile that we consider is near Beijing in China (point 3 in Figure 1). In contrast to the Yellow Sea, this area has the advantage of being a well-studied region and an area that has good seismic coverage due to the location of stations BJI and BJT. Figure 9 shows the prior and posterior distributions. Like the profiles from the Yellow Sea, the range of models is significantly reduced in the posterior distribution. There are much tighter constraints on the velocity of the upper mantle and Moho depth. This results in a significant narrowing in the histograms of model parameters (Figure 10).

Figure 9.

Model profile distributions for point 3 (see Figure 1) near Beijing, China. (a) P wave and (b) S wave profiles from the prior distribution. (c) P wave and (d) S wave profiles from the posterior distribution. In Figures 9a–9d, velocity (in km/s) is plotted as a function of depth (in km), where red and black lines indicate profiles from separate chains and green lines indicate the starting model and range of models.

Figure 10.

Histograms of model parameters for (a) prior and (b) posterior models for parameters of (top) crustal thickness, (middle) sediment thickness, and (bottom) P velocity of the uppermost mantle from the model distributions of the Beijing profile (point 3) shown in Figure 9.

[29] It could be argued that the uncertainties of individual model parameters are still rather large (e.g., crustal thickness is 32.1 ± 4.2 km), given that this is a well-covered region. When we compare our findings to other studies, however, we do find a wide range of results, even in this area. For example, a Moho depth map for China [Zhu et al., 1996] puts the crustal thickness near Beijing between 36 and 38 km, while reflection profiles determined a crustal thickness of 35 km (Rongmao Zhou, personal communication), and a joint receiver function and surface wave dispersion inversion for station BJT puts it at 40 km (Winchelle Sevilla, personal communication). Moreover, if we were to consider more seismically “fundamental” parameters like transit times, rather than more tectonically fundamental parameters such as crustal thickness, we would find more consistency and notably smaller uncertainties. For example, in Figure 11 we show histograms of one-way shear wave crustal transit times (the time it takes a vertically incident S wave to travel from the Moho to the surface). Here, we see a more significant reduction in the range of models in the posterior distribution.

Figure 11.

A comparison of histograms of one-way shear wave traveltimes (in s) for the Beijing profile (point 3). (a) Histogram for the prior distribution. (b) Histogram for the corresponding posterior distribution.

[30] This highlights the tradeoff between model parameters, as well as the correlation in uncertainties between them. In other words, in Figure 10, while a model having a crustal thickness of 40 km might be unlikely but possible, and a model having a sediment thickness of 3 km also unlikely but possible, the combined probability of a model having both of these features (thick sediments and thick crust) would be extremely low, since the combination of features would produce extremely long transit times, a quantity to which the data are sensitive.
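The one-way transit time of a vertically incident S wave is the sum of layer thickness divided by shear velocity over the sediment and crustal layers. The sketch below shows why thick sediments combined with thick crust give a long transit time; the function name `one_way_transit_time` and the two layered profiles are hypothetical and are not values from the model.

```python
def one_way_transit_time(layers):
    """Vertical S wave transit time (s) through a stack of solid layers.

    layers: iterable of (thickness_km, vs_km_per_s) from the surface to the
    Moho. Water layers, which have zero shear velocity, must be excluded.
    """
    total = 0.0
    for thickness_km, vs in layers:
        if vs <= 0.0:
            raise ValueError("shear velocity must be positive")
        total += thickness_km / vs
    return total


# Hypothetical profiles: (thickness in km, Vs in km/s) for sediments and crust.
thin_profile = [(1.0, 2.0), (34.0, 3.6)]    # 1 km sediments, 35 km Moho depth
thick_profile = [(3.0, 2.0), (37.0, 3.6)]   # 3 km sediments, 40 km Moho depth

t_thin = one_way_transit_time(thin_profile)
t_thick = one_way_transit_time(thick_profile)
print(f"thin profile: {t_thin:.2f} s, thick profile: {t_thick:.2f} s")
```

Applying such a function to every model in the prior and posterior distributions is one way that histograms like those in Figure 11 could be assembled.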

[31] We can assemble the results from each point to create parameter maps. Figure 12 shows a map of crustal thickness, along with the associated uncertainties. Values for each individual point were determined using the mean and standard deviation of the posterior model distribution. The maps are generally consistent with our understanding of the regional tectonics. We find thin crust in oceanic areas such as the Sea of Japan and the Pacific Ocean, but not in the continental crust of the Yellow Sea. In general, however, the crust in the oceanic regions is still thicker than the 5–10 km normally observed. This is a long-observed phenomenon of nonuniqueness with surface waves (recently discussed by Pasyanos and Walter [2002]) and is primarily due to the fact that the difference between 5 km thick crust and 15 km thick crust is greatest at the shortest periods (<15 s), where we have the lowest density of oceanic paths.
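The reduction from a posterior distribution to a pair of maps can be sketched as follows. The array layout (samples along the first axis, then latitude and longitude) and the function name are assumptions for illustration, and the three-sample toy grid stands in for the much larger set of sampled models at each point.

```python
import numpy as np

def summarize_posterior(samples):
    """Return (mean map, standard deviation map) over the sample axis.

    samples: array of shape (n_samples, n_lat, n_lon), one parameter
    (e.g., crustal thickness in km) per sampled model and grid point.
    """
    samples = np.asarray(samples, dtype=float)
    mean_map = samples.mean(axis=0)
    sigma_map = samples.std(axis=0, ddof=1)  # sample standard deviation
    return mean_map, sigma_map

# Toy example: 3 sampled models on a 1 x 2 grid (hypothetical values, km).
toy = np.array([[[30.0, 10.0]],
                [[32.0, 14.0]],
                [[34.0, 18.0]]])
mean_map, sigma_map = summarize_posterior(toy)
print(mean_map)   # map corresponding to panel (a)
print(sigma_map)  # map corresponding to panel (b)
```

A mean and standard deviation summarize the distribution well only when it is close to unimodal; for multimodal posteriors the full histograms remain the more informative product.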

Figure 12.

Map of crustal thickness and corresponding uncertainties for the YSKP region. (a) Crustal thickness (in km) of the posterior model and (b) crustal thickness uncertainties of the posterior model.

[32] The thickest crust in our region is found to the west beneath central China and Mongolia. We find that crustal thicknesses exhibit differences from the prior model on the order of 5 km. Compared to the prior model, the posterior model exhibits thicker crust in the Korean Peninsula and thinner crust in the Yellow Sea. Uncertainties are generally higher in the southwestern portion of our study area, where we have no body wave data. Inconsistency between data types is also reflected in higher uncertainties.

[33] Figure 13 shows a map of upper mantle shear velocity, along with corresponding values of uncertainty. The most obvious features are the significantly slower lid velocities under Japan, central China, and a band extending from Mongolia across northern China to Vladivostok, Russia. Some of these features are clearly residual signatures of the prior (starting) model, which have changed little because of the lack of data (mainly regional Sn phases) in these regions. Still, compared to the starting model, the posterior model is up to 0.05 km/s faster in central China and the eastern portion of the slow band, and slower by a similar amount near the Korean Peninsula in the central portion of our model. Presumably, these changes are being driven by the Sn phases, where they exist, and the long-period surface wave data. There does not seem to be a systematic pattern to the uncertainties.

Figure 13.

Map of upper mantle S wave velocities and corresponding uncertainties for the YSKP region. (a) Upper mantle S wave velocity (in km/s) of the posterior model and (b) upper mantle S wave velocity uncertainties of the posterior model.

[34] Figure 14 shows a map of sediment thickness for the region. Regions of thick sediments correspond to the Bohai, Song Liao, Ordos, Sichuan, Huabei, and Yellow Sea Basins, and the area around Sakhalin Island. The thickest sediments (approaching 5 km) are found under the Bohai Basin in the northern Yellow Sea. The results compare favorably to other estimates of sediment thickness such as that from Laske and Masters [1997] but cannot recover the full depth of small basins because of limited resolution. Sediment thickness is essentially zero in large portions of the model. By and large, the maps correspond closely to the prior model. Although there are differences, they are generally increases or decreases (on the order of 1–2 km) in the depth of existing sedimentary basins, rather than the dissolution of basins from the prior model or the creation of new ones.

Figure 14.

Map of sediment thickness and corresponding uncertainties for the YSKP region. (a) Sediment thickness (in km) of the posterior model and (b) sediment thickness uncertainties of the posterior model.

5. Conclusions

[35] Stochastic methods are an innovative approach to producing data-driven models. They provide a number of advantages compared to traditional models, such as the ability to easily reconcile different types of geophysical data. The uncertainties of model parameters are also consistent with the uncertainties of the geophysical data, whereas they might not be if the posterior model distribution does not conform to assumptions about the model space (assumptions often made in least squares inversions). An important benefit for data prediction (e.g., of traveltimes, in order to improve earthquake location capabilities) is that by mapping out the probability density function, we have the ability to predict new observables with proper uncertainties. These uncertainties will reflect the uncertainties of the model parameters along with their correlation.
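How correlated parameter uncertainties carry through to a predicted observable can be sketched by pushing every sampled model through a forward calculation. The example below is a toy under stated assumptions: the forward function is a simple vertical P transit time through a single-layer crust (a stand-in for a real traveltime calculation), and the Gaussian "posterior" with its means, standard deviations, and correlation of 0.8 is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

def vertical_p_time(thickness_km, vp_km_s):
    """Toy forward model: vertical P transit time (s) through one layer."""
    return thickness_km / vp_km_s

# Hypothetical posterior samples of [crustal thickness (km), Vp (km/s)].
mean = np.array([32.0, 6.3])
sigma = np.array([4.0, 0.3])
rho = 0.8  # assumed positive correlation (thicker crust paired with faster Vp)
cov = np.array([[sigma[0] ** 2, rho * sigma[0] * sigma[1]],
                [rho * sigma[0] * sigma[1], sigma[1] ** 2]])
samples = rng.multivariate_normal(mean, cov, size=50_000)

# Prediction from the joint samples: correlation is carried through.
t_joint = vertical_p_time(samples[:, 0], samples[:, 1])

# Same marginals with the correlation destroyed by shuffling Vp.
vp_shuffled = rng.permutation(samples[:, 1])
t_indep = vertical_p_time(samples[:, 0], vp_shuffled)

print(f"joint samples:       {t_joint.mean():.2f} +/- {t_joint.std(ddof=1):.2f} s")
print(f"independent marginals: {t_indep.mean():.2f} +/- {t_indep.std(ddof=1):.2f} s")
```

In this toy case, ignoring the correlation overstates the uncertainty of the predicted time; with a different sign of correlation it could equally understate it, which is why predictions are best drawn from the joint posterior samples rather than from the marginal means and standard deviations.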

[36] Preliminary application of the technique to the YSKP region, using Rayleigh wave group velocities and regional traveltime measurements, shows promise. We plan to include other suitable data sets in continued work. Love wave group velocities may simply be added to the Rayleigh wave group velocities. Receiver function or waveform data will each require an additional inversion stage. Waveform modeling will probably provide the strongest constraints on the model along individual paths, but will also be the most computationally expensive to implement. Together, these additional data sets should better constrain poorly sampled regions and increase the resolution of the model.

Acknowledgments

[37] We thank Flori Ryall for her traveltime picks used in this study. We thank two anonymous reviewers for their helpful suggestions. We also thank Shelly Johnson and Charlotte Rowe for their comments on an earlier version of the manuscript. This work was performed under the auspices of the U.S. Department of Energy by University of California Lawrence Livermore National Laboratory under contract W-7405-ENG-48. This is LLNL contribution UCRL-JRNL-206357.
