## 1. Introduction

[2] In the Earth sciences, we often find that Earth models derived using one set of geophysical data can be inconsistent with models derived using another set of observations. A classic example of this (discussed by *Maggi and Priestley* [2005]) can be found in the crustal thickness of the Zagros Mountains of Iran. Using gravity measurements and seismic results [*Giese et al.*, 1984], *Dehghani and Makris* [1984] determined a 55 km thick crust. *Snyder and Barazangi* [1986], using similar data sets, determined a thickness of about 65 km. Using surface wave dispersion, *Asudeh* [1982] found a thickness of 43–46 km, while using receiver functions, *Hatzfeld et al.* [2003] found a crustal thickness of 44–48 km.

[3] Since we are interested in understanding the nature of the true Earth, we seek the set of models that is most consistent with the full set of observations. Probabilistic inverse techniques, such as the Markov chain Monte Carlo (MCMC) algorithm, have been successful in combining disparate data types into a consistent model. For example, *Mosegaard and Tarantola* [1995] used Monte Carlo sampling to jointly invert seismic and gravity data. The stochastic methods that we consider here invert data by probabilistically sampling the model space, comparing the observations predicted by each proposed model to the observed data, and preferentially accepting models that produce a good fit, thereby generating a posterior distribution of models. The model space is mapped through a series of stages that compare proposed models to data, with Bayes' theorem [*Bayes*, 1763] relating the prior and posterior distributions. The resulting stochastic geophysical model (a probabilistic distribution of geophysical models) can reliably predict geophysical observations for a variety of data types and provide accurate estimates of their uncertainties.
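To make the role of Bayes' theorem concrete, the following sketch samples a posterior distribution for a single hypothetical parameter (crustal thickness) with a Metropolis rule: models are proposed, compared against an observation, and preferentially accepted when they improve the fit. The prior bounds, the Gaussian likelihood, and the identity forward model are illustrative assumptions, not the inversion used in this study.

```python
import math
import random

def log_prior(m):
    # Illustrative uniform prior: crustal thickness between 30 and 80 km.
    return 0.0 if 30.0 <= m <= 80.0 else -math.inf

def log_likelihood(m, obs, sigma):
    # Gaussian misfit; the forward model here is trivially the identity.
    return -0.5 * ((m - obs) / sigma) ** 2

def metropolis(obs, sigma, n_steps=20000, step=2.0, seed=1):
    """Random-walk Metropolis sampler: the posterior is proportional to
    prior times likelihood (Bayes' theorem), and the chain of accepted
    models approximates that posterior distribution."""
    random.seed(seed)
    m = 50.0  # arbitrary starting model within the prior bounds

    def log_post(x):
        return log_prior(x) + log_likelihood(x, obs, sigma)

    samples = []
    for _ in range(n_steps):
        cand = m + random.gauss(0.0, step)       # perturb the current model
        if math.log(random.random()) < log_post(cand) - log_post(m):
            m = cand                             # accept the proposal
        samples.append(m)                        # otherwise keep the old model
    return samples
```

The output is not a single best-fit thickness but a distribution of models; its spread is a direct estimate of the uncertainty, with no Gaussian assumption imposed on the posterior itself.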

[4] Monte Carlo integration is a method that uses random processes to solve problems that are difficult (or impossible) to solve analytically. It works by drawing samples from a distribution, then forming sample averages to approximate expectations. MCMC draws these samples by running a cleverly constructed Markov chain for a long time [*Gilks et al.*, 1996]. A Markov chain is a sequence of points in the model space in which the probability of the next point depends only upon the current point, not on the earlier history of the chain. A random walk, defined as a sequence of random perturbations applied to a point in a multidimensional space, is a simple example. The Markov chain that we construct here follows a set of rules that preferentially moves to more likely states in the model space, but sometimes moves to less likely states. The result is a chain that can sample the complete model space and efficiently inspect high-likelihood regions without becoming trapped in local extrema.
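The two ideas in this paragraph can be sketched in a few lines of Python. The first part approximates an expectation by a sample average (here E[x²] = 1 for a standard normal variable, chosen purely as an example with a known answer); the second builds a random walk, the simplest Markov chain, where each new position depends only on the current one.

```python
import random

random.seed(0)

# Monte Carlo integration: approximate E[x^2] for x ~ N(0, 1),
# whose exact value is 1, by averaging over independent draws.
n = 100_000
estimate = sum(random.gauss(0.0, 1.0) ** 2 for _ in range(n)) / n

# A random walk as a Markov chain: the next position is the current
# position plus a random perturbation; no earlier history is used.
walk = [0.0]
for _ in range(1000):
    walk.append(walk[-1] + random.gauss(0.0, 1.0))
```

The estimate converges toward 1 as n grows, at the usual Monte Carlo rate of 1/√n, independent of the dimensionality of the integral.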

[5] Both Bayes' theorem and sampling methods like Monte Carlo have been extensively utilized in the Earth sciences. *Tarantola* [1987] employed a Bayesian framework for tomography, which incorporates a prior background model into the tomographic inversion. While these methods succeed at including prior information, they assume the model follows Gaussian statistics and seek a single solution to the problem with uncertainties derived from this assumption. In our formulation of the problem, we make no assumptions about the distribution of models, and we find that in many instances Gaussian distributions are not applicable. Instead of a single model, we seek the distribution of models that is most consistent with both our prior information and our observations.

[6] MCMC originated in statistical physics and has been applied more recently in many different fields. According to the statistics lab at Cambridge University, which maintains an MCMC preprint service (http://www.statslab.cam.ac.uk/~mcmc/), MCMC has been applied to fields such as agriculture, biostatistics, econometrics, electronics, epidemiology, genealogy, imaging, isotope radiodating, medicine, neurology, signal processing, and speech. Application in the Earth sciences has been more limited. In at least one recent study, *Shapiro and Ritzwoller* [2002] employed MCMC, along with a linearized inversion and simulated annealing, to invert for shear velocity structure using surface waves. MCMC has also recently been applied to map electrical resistivity changes [*Ramirez et al.*, 2005]. The methodology proposed here differs from previous Earth science applications by using multiple data types to constrain the model. We seek to employ the technique to estimate regional Earth structure using multiple geophysical data sets.

[7] An excellent review of the various Monte Carlo methods is given by *Sambridge and Mosegaard* [2002]. As explained in that paper, MCMC is one of a continuum of techniques that balance exploring the parameter space (exploration) with utilizing information (exploitation). In this scheme, classic Monte Carlo would fall along the exploration axis, while most search techniques would fall along the exploitation axis. Probabilistic techniques like the Neighborhood Algorithm [*Sambridge*, 1999a, 1999b], which makes use of Voronoi cells to drive the search, and MCMC, which utilizes a Markov chain, attempt to optimally balance exploration and exploitation. In this sense, MCMC is closer to explorative methods like a uniform random search than to exploitative methods such as the method of steepest descent. Probabilistic inverse techniques are also closely related to other widely used techniques such as simulated annealing and genetic algorithms.

[8] Once a model is proposed, we test its acceptability by evaluating its fit against our observational data. The process of comparing model predictions and observations for a given data type is referred to as a stage. At each stage the proposed model may be rejected if the fit to the data has not improved relative to the previous model in the chain. Since we are using two data types to drive the model in our example, there are, potentially, two stages for each proposed model. By properly ordering our stages, we can quickly reject models that cannot fit the data whose predictions are easiest to calculate. The data sets that are more computationally intensive to forward model are relegated to later stages, increasing the efficiency of the inversion. Another important distinction of this methodology is that we seek neither a single best model nor a simplification of the distribution into one model with uncertainties. Rather, the final product is a posterior distribution of models that is best able to fit all of the data. As we will show in the paper, this approach has several important advantages, including less restrictive distributions on the models and the ability to reliably estimate uncertainties on observables.
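The staged acceptance described above can be sketched with a cascaded Metropolis rule in the spirit of *Mosegaard and Tarantola* [1995]: each stage tests one data type's likelihood ratio, and rejection at any stage skips all later (costlier) stages. The two toy stages below, their random log ratios, and the call counters are illustrative assumptions standing in for real forward models.

```python
import math
import random

def cascaded_accept(stages, rng):
    """Cascaded Metropolis test: each stage returns the log likelihood
    ratio (proposed vs. current model) for one data type. The proposal
    must pass every stage; rejection at any stage ends the test, so
    later, more expensive stages are never evaluated."""
    for log_ratio in stages:
        # Accept this stage with probability min(1, L_new / L_old).
        if math.log(rng.random()) >= min(0.0, log_ratio()):
            return False
    return True

rng = random.Random(7)
calls = {"cheap": 0, "costly": 0}

def cheap_stage():
    # Stand-in for a fast forward calculation (e.g., a dispersion fit).
    calls["cheap"] += 1
    return rng.gauss(-2.0, 1.0)  # toy log likelihood ratio

def costly_stage():
    # Stand-in for an expensive forward calculation, deliberately last.
    calls["costly"] += 1
    return rng.gauss(-2.0, 1.0)

n_accepted = sum(
    cascaded_accept([cheap_stage, costly_stage], rng) for _ in range(2000)
)
```

Because most proposals fail at the first stage, the costly stage runs far less often than the cheap one, which is precisely the efficiency gain of ordering stages by computational cost.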