Institute of Geophysics and Planetary Physics, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, California, USA
Corresponding Author: C. S. Takeuchi, Institute of Geophysics and Planetary Physics, Scripps Institution of Oceanography, University of California, San Diego, La Jolla, California, USA. (email@example.com)
 We present numerical models of earthquake cycles on a strike-slip fault that incorporate laboratory-derived power law rheologies with Arrhenius temperature dependence, viscous dissipation, conductive heat transfer, and far-field loading due to relative plate motion. We use these models to explore the evolution of stress, strain, and thermal regime on “geologic” timescales (∼10⁶–10⁷ years), as well as on timescales of the order of the earthquake recurrence (∼10² years). Strain localization in the viscoelastic medium results from thermomechanical coupling and power law dependence of strain rate on stress. For conditions corresponding to the San Andreas fault (SAF), the predicted width of the shear zone in the lower crust is ∼3–5 km; this shear zone accommodates more than 50% of the far-field plate motion. Coupled thermomechanical models predict a single-layer lithosphere in the case of a “dry” composition of the lower crust and upper mantle, and a “jelly sandwich” lithosphere in the case of a “wet” composition. Deviatoric stress in the lithosphere in our models is relatively insensitive to the water content, the far-field loading rate, and the fault strength and is of the order of 10² MPa. Thermomechanical coupling gives rise to an inverse correlation between the fault slip rate and the ductile strength of the lithosphere. We show that our models are broadly consistent with geodetic and heat flow constraints from the SAF in Northern California. Models suggest that the regionally elevated heat flow around the SAF may be at least in part due to viscous dissipation in the ductile part of the lithosphere.
 While the elastic half-space and layered viscoelastic earthquake cycle models can produce identical surface deformation, they represent fundamentally different mechanisms of stress transfer from plate motion to a seismogenic fault. The elastic half-space models postulate that faults in the upper brittle crust are loaded by localized shear at depth. Such a shear is usually prescribed as a boundary condition without consideration of the mechanisms of localization and the behavior of the ambient ductile rocks. The layered viscoelastic models stipulate postseismic stress transfer from a relaxing viscoelastic substrate back into the brittle crust. Interseismic localization of surface strain in such models is thus a “memory” of past earthquakes, and the effective rheology of the ductile substrate is usually chosen to match available geodetic data. Fault slip in such models is often imposed (rather than solved for), which, as we argue below, leads to unrealistic stresses in the seismogenic layer. The layered viscoelastic models typically predict fairly broad and diffuse viscous flow below the brittle-ductile transition. Geological and seismic observations, including exposed mylonite zones [Poirier, 1980; White et al., 1980; Rutter, 1999; Norris and Cooper, 2003], offsets of the Moho [e.g., Lemiszki and Brown, 1988; Stern and McBride, 1998; Zhu, 2000; Brocher et al., 2004; Weber et al., 2004], seismic velocity contrasts across faults at depth [Eberhart-Phillips et al., 2006; Thurber et al., 2006; Tape et al., 2009], and deep tremors on the downward extension of major faults [Nadeau and Dolenc, 2005; Shelly, 2010], indicate that localized shear zones do exist below the brittle-ductile transition, although the depth extent, the degree of strain localization (as a function of lithology, temperature regime, and fault slip rate), and the rheology of such “fault roots” are poorly understood [e.g., Bürgmann and Dresen, 2008; Wilson et al., 2004].
 In this paper we consider self-consistent models of the earthquake cycle that use laboratory-derived rheologies of rocks in the lower crust and upper mantle, typical geothermal gradients, and far-field loading (i.e., representing relative plate motion) to investigate the evolution of stress and strain as a function of fault age, plate velocity, composition of the ductile substrate, and thermal regime. We focus on the case of mature continental strike-slip faults such as the San Andreas fault (SAF) in California. We consider two mechanisms of strain localization, thermomechanical coupling and (implicitly) grain size reduction, and demonstrate that viscoelastic models that employ realistic rheologies become kinematically similar to elastic half-space models. Coupled thermomechanical models can be used to infer the magnitude of absolute stress (the effective strength) of the ductile part of the lithosphere, the temperature anomaly at depth, and the heat flow at the Earth's surface associated with long-term fault slip. There is a debate in the literature regarding the magnitude of stress in the lithosphere and shear heating below the brittle-ductile transition. For example, theoretical estimates of a thermal anomaly due to a strike-slip fault range from several kelvins [e.g., Lyzenga et al., 1991; Savage and Lachenbruch, 2003] to several hundred kelvins [e.g., Thatcher and England, 1998; Leloup et al., 1999]. We demonstrate that nonsingular models of transform faults constrained by experimental data require ductile stresses of the order of 10² MPa and temperature perturbations of the order of 10² K.
2. Model Description
 All numerical calculations presented in this study were performed using the finite element software Abaqus/Simulia (http://www.simulia.com/products/abaqus_fea.html). We simulate the earthquake cycle by applying a far-field velocity boundary condition representing tectonic loading, and allowing the fault to instantaneously slip in the upper crust to make up the displacement deficit accrued during the previous interseismic periods.
2.1. Model Geometry
 We consider an infinitely long strike-slip fault, which simplifies the problem to a two-dimensional antiplane-strain formulation. The model domain is composed of three rheological layers: a 12 km thick elastic upper crust underlain by an 18 km thick viscoelastic lower crust (30 km total crustal thickness) and a 45 km thick viscoelastic mantle (Figure 1). A fault cuts entirely through the upper crust and terminates within the lower crust at a depth of 17 km. We make use of the symmetric nature of deformation with respect to the fault plane to reduce the computational burden.
 The finite element mesh consists of two 50 km thick along-strike (z direction) element layers, each composed of 75 elements in depth (y direction) and 17 elements in the fault-perpendicular (x) direction, for a total of 2550 elements. The node spacing varies in the fault-perpendicular direction from 0.5 km on the fault to 93.58 km in the far field. Antiplane strain conditions are enforced by ensuring that each along-strike nodal layer deforms identically. The solution is insensitive to the chosen element sizes, as confirmed by simulations using meshes with finer nodal spacing.
 We use four rheological models of the lower crust and upper mantle. Two of these models assume classical linear Maxwell viscoelastic rheology, and the other two assume temperature-dependent power law viscoelastic rheology. Both crustal and mantle materials incorporate elastic behavior defined by the linear isotropic Hooke's Law, with a Young's modulus and Poisson's ratio of 80 GPa and 0.25, respectively. The upper crust in all models is composed of a purely elastic material with the same parameters. For simplicity, we assume that the elastic moduli do not vary with depth.
 We define the dynamic viscosities of the Maxwell viscoelastic materials by using ratios of the characteristic relaxation time to the earthquake recurrence interval. The characteristic relaxation time is given by tr = μ/G, where G is the elastic shear modulus and μ is the dynamic viscosity. Model M2000 represents a relatively strong Maxwell material with a relaxation time of 2000 years (μ = 2.02 × 10²¹ Pa s), and model M20 represents a weak Maxwell material with a relaxation time of 20 years (μ = 2.02 × 10¹⁹ Pa s). In the case of linear Maxwell models, no distinction is made between the lower crust and upper mantle.
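As a quick consistency check on the quoted viscosities, the relation tr = μ/G can be evaluated directly. The sketch below is not part of the original calculations; it assumes the isotropic relation G = E/[2(1 + ν)] and a Julian year of 365.25 days, and the function names are illustrative.

```python
# Assumption: one year = 365.25 days (Julian year).
SECONDS_PER_YEAR = 365.25 * 24.0 * 3600.0


def shear_modulus(young_pa, poisson):
    """Shear modulus G (Pa) of an isotropic elastic solid."""
    return young_pa / (2.0 * (1.0 + poisson))


def maxwell_viscosity(relaxation_time_yr, shear_modulus_pa):
    """Dynamic viscosity (Pa s) from t_r = mu / G."""
    return relaxation_time_yr * SECONDS_PER_YEAR * shear_modulus_pa


# Elastic parameters quoted in the text: E = 80 GPa, Poisson's ratio = 0.25.
G = shear_modulus(80.0e9, 0.25)

for name, t_r in (("M2000", 2000.0), ("M20", 20.0)):
    print(name, f"{maxwell_viscosity(t_r, G):.2e} Pa s")
```

With these inputs G is 32 GPa, and the computed viscosities round to the values quoted for models M2000 and M20.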
 For the power law viscoelastic materials, the steady state constitutive relation between deviatoric strain rate $\dot{\varepsilon}$ and deviatoric stress σd is

$$\dot{\varepsilon} = A \sigma_d^{n} \exp\left(-\frac{Q}{RT}\right), \tag{1}$$
where Q is the activation energy, R is the universal gas constant, and A and n are rheological parameters [e.g., Kirby and Kronenberg, 1987]. One can define the effective viscosity ηeff, such that
We assume mafic composition of the lower crust [Rudnick and Fountain, 1995] and ultramafic composition of the upper mantle [Anderson and Bass, 1984; Karato and Wu, 1993]. To allow for variations in composition and water content, we consider two end-member models of “wet” and “dry” lower crust and upper mantle. Laboratory-derived parameters of these rheological models are summarized in Table 1.
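To illustrate how a thermally activated power law of the form of equation (1) behaves, the following sketch evaluates the strain rate and an effective viscosity. The parameter values are placeholders chosen for illustration only and are not the Table 1 values; the factor of 2 in the viscosity definition (σd = 2 ηeff × strain rate) is an assumed convention, and the function names are hypothetical.

```python
import math

R_GAS = 8.314  # universal gas constant, J mol^-1 K^-1


def power_law_strain_rate(stress_mpa, temp_k, a_pre, n_exp, q_act):
    """Steady state strain rate (s^-1): A * sigma_d^n * exp(-Q / (R T)).

    a_pre is in MPa^-n s^-1, stress in MPa, q_act in J mol^-1, temp_k in K.
    """
    return a_pre * stress_mpa ** n_exp * math.exp(-q_act / (R_GAS * temp_k))


def effective_viscosity(stress_mpa, temp_k, a_pre, n_exp, q_act):
    """Effective viscosity (Pa s), ASSUMING sigma_d = 2 * eta_eff * strain rate."""
    rate = power_law_strain_rate(stress_mpa, temp_k, a_pre, n_exp, q_act)
    return stress_mpa * 1.0e6 / (2.0 * rate)


# Placeholder parameters (hypothetical, NOT taken from Table 1).
A_PRE, N_EXP, Q_ACT = 1.0e-3, 3.5, 4.0e5

for temp_c in (400.0, 600.0, 800.0):
    eta = effective_viscosity(100.0, temp_c + 273.15, A_PRE, N_EXP, Q_ACT)
    print(f"T = {temp_c:.0f} C: eta_eff = {eta:.2e} Pa s")
```

Under this form, the effective viscosity drops both with increasing temperature and with increasing stress (for n > 1), which is the combination that permits strain localization in the models.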
Table 1. Rheological Properties of Rocks From Laboratory Measurements^a
A (MPa⁻ⁿ s⁻¹) | Q (kJ mol⁻¹) | ρ (kg m⁻³) | k (W m⁻¹ K⁻¹)
^a Elastic moduli for all materials are Young's modulus = 80 GPa and Poisson's ratio = 0.25. All temperature-dependent calculations assume a specific heat cp of 1000 J K⁻¹ kg⁻¹; thermal diffusivities are 7.37 × 10⁻⁷ and 9.04 × 10⁻⁷ m² s⁻¹ for diabase and olivine, respectively. The elastic upper crust has a conductivity of 2.5 W m⁻¹ K⁻¹ in all coupled power law models.
 Because we allow for temperature dependence in our power law viscoelastic materials, the material properties also include thermal parameters (Table 1). Over the duration of each power law simulation, we maintain 10°C at the Earth's surface and 1510°C at 75 km depth (the bottom of the model domain). Zero heat flux boundary conditions are applied on all remaining faces of the domain. We divide our power law models into two classes that aim to explore the efficiency of thermomechanical coupling [e.g., Yuen et al., 1978; Brun and Cobbold, 1980] as a mechanism for long-term strain localization. Therefore in each case we performed two sets of simulations, one excluding the feedback between viscous dissipation and temperature, and another allowing for full coupling.
 For the first (noncoupled) set of power law models, we prescribe a temperature profile that varies linearly between 10°C at the top surface (y = 0) and 1510°C at the base of the model (y = 75 km). This amounts to a conductive geothermal gradient of 20°C/km, typical of the upper continental crust [e.g., Turcotte and Schubert, 2002, p. 133]. The assumed geotherm may be appropriate for a tectonically active crust, but likely overestimates temperature in the lower crust and upper mantle in the stable continental lithosphere.
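The prescribed geotherm is simple enough to tabulate directly. The short sketch below evaluates it at the layer boundaries given in the model geometry; the function name and default arguments are illustrative.

```python
def linear_geotherm_c(depth_km, surface_c=10.0, base_c=1510.0, base_depth_km=75.0):
    """Temperature (deg C) on a linear profile between the surface and the model base."""
    gradient = (base_c - surface_c) / base_depth_km  # deg C per km
    return surface_c + gradient * depth_km


# Layer boundaries from the model geometry: 12 km, 30 km, and 75 km depth.
for label, depth in (
    ("base of elastic upper crust", 12.0),
    ("base of crust", 30.0),
    ("base of model", 75.0),
):
    print(f"{label}: {linear_geotherm_c(depth):.0f} C at {depth:.0f} km")
```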
 For the second (fully coupled) class of power law models, viscous dissipation and heat conduction modify the thermal structure and thus ductile properties of the lower crust and upper mantle. For each finite element in the ductile regions, conservation of energy states that

$$\rho c_p \frac{\partial T}{\partial t} = \nabla \cdot \left( k \nabla T \right) + \rho H, \tag{3}$$
where ρ is density, k is the thermal conductivity, cp is the specific heat, and H is the internal heat production rate per unit mass. For the noncoupled models, we assume a linear temperature distribution with depth and do not solve equation (3). For the coupled models, viscous dissipation contributes internal energy equal to the scalar product of stress and strain rate tensors,

$$\rho H = \sigma_{ij} \dot{\varepsilon}_{ij}, \tag{4}$$
where the repeating indices imply summation. During each model time increment Δt = tf − ti, heat conduction produces a temperature increment in each element

$$T_c = \int_{t_i}^{t_f} \kappa \nabla^2 T \, dt, \tag{5}$$
where κ = k/(ρ cp) is the thermal diffusivity of the material; viscous dissipation contributes a temperature increment

$$T_v = \int_{t_i}^{t_f} \frac{H}{c_p} \, dt. \tag{6}$$
The model evaluates the total temperature increment ΔT = Tc + Tv in each element for each time increment Δt, and adds ΔT to the element temperature at the end of the previous time increment. The updated temperature field then modifies the effective viscosity through equation (2). In the upper crust, H = 0 and temperature evolution is governed by heat conduction alone. Thermomechanical coupling requires that the energy equation (3) and stress equilibrium equations

$$\sigma_{ij,j} = 0 \tag{7}$$
(in the absence of body forces) be solved simultaneously. In equation (7), the comma operator denotes differentiation.
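The incremental update described above (a conductive increment Tc plus a dissipative increment Tv per time step) can be sketched with a one-dimensional explicit finite difference scheme. This is an illustration only and is not the finite element scheme used by Abaqus; it assumes uniform node spacing, a uniform diffusivity, fixed-temperature end nodes, and a time step small enough for explicit stability (κ Δt / Δz² ≤ 0.5). All names and input values are hypothetical.

```python
def temperature_step(temps, dz_m, dt_s, kappa, heating_w_per_kg, cp):
    """Advance a 1-D temperature profile by one explicit time step.

    temps            : list of nodal temperatures (K or deg C)
    dz_m             : node spacing (m)
    dt_s             : time increment (s)
    kappa            : thermal diffusivity (m^2 s^-1)
    heating_w_per_kg : list of heat production rates H per unit mass (W kg^-1)
    cp               : specific heat (J kg^-1 K^-1)
    The two end nodes are held at their prescribed temperatures.
    """
    new_temps = list(temps)
    for i in range(1, len(temps) - 1):
        laplacian = (temps[i - 1] - 2.0 * temps[i] + temps[i + 1]) / dz_m ** 2
        t_c = kappa * laplacian * dt_s          # conductive increment
        t_v = heating_w_per_kg[i] * dt_s / cp   # dissipative increment
        new_temps[i] = temps[i] + t_c + t_v
    return new_temps


# A linear 20 deg C/km profile sampled every 1 km, with heating at one node only.
profile = [10.0 + 20.0 * i for i in range(6)]
heating = [0.0, 0.0, 1.0e-9, 0.0, 0.0, 0.0]  # hypothetical value, W kg^-1
updated = temperature_step(profile, 1000.0, 1.0e9, 7.37e-7, heating, 1000.0)
print([round(t, 6) for t in updated])
```

A linear profile has zero curvature, so without heating it is left unchanged by the conductive term; only the heated node departs from the initial geotherm after one step.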
 Shear heating has been investigated as a potential mechanism of strain localization in the ductile regime [Yuen et al., 1978; Brun and Cobbold, 1980; Fleitout and Froidevaux, 1980; Chery et al., 1991; Leloup et al., 1999; Montési and Zuber, 2002; Sobolev et al., 2005; Kaus and Podladchikov, 2006]. A preexisting weakness is required to initiate a positive feedback between thermal softening and localized shear. Here we focus on well-developed equilibrium shear zones and are not concerned with the onset of localization. Note that stress concentration at the bottom tip of seismic ruptures provides a natural “seed” for strain localization in the ductile substrate. Other processes, such as dynamic recrystallization [e.g., Rutter, 1999; Montési and Hirth, 2003], may also contribute to strain localization. For an equilibrium grain size (reflecting a balance between dynamic recrystallization and static grain growth), and comparable contributions of dislocation and diffusion creep [De Bresser et al., 1998], the constitutive flow law governing viscous deformation with grain size reduction has the same stress exponent as dislocation creep [e.g., Montési and Hirth, 2003]. Therefore we use the thermally activated power law rheology (equation (1)) as a proxy for all strain-weakening mechanisms, assuming that the considered range of rheologic parameters such as the stress exponent n and the premultiplying factor A will account for potential contributions of other mechanisms.
 Coupled thermomechanical models need to be “evolved” to generate a temperature anomaly that reflects a balance between conductive heat loss and viscous dissipation in the lower crust and upper mantle in response to far-field plate motion and fault slip over geologic time. In our models this is achieved by applying the plate velocity both in the far field (x = 300 km) and on the fault in the elastic layer (0–12 km). We cosine taper the slip rate on the fault from the far-field rate at 12 km depth to zero at 17 km depth. The temperature structure is initially linear and one-dimensional (1-D) (20°C/km), and the model is kinematically driven over a given period of time, or until the temperature approaches a two-dimensional steady state. Note that because our models include variations in thermal conductivity with depth (Table 1), the corresponding steady state temperature gradient is no longer constant, and varies between the layers. These variations are calculated as part of the thermal evolution. We assume perturbations in temperature and strain rate due to viscous dissipation can develop and grow spontaneously, and do not consider conditions that lead to their initial development. Unless otherwise noted, all coupled simulations discussed below were evolved using a total plate velocity of 40 mm/yr for a slip duration of 20 Myr, comparable to the SAF slip history [e.g., Lisowski et al., 1991]. We simulate the long-term slip history and thermal evolution of the fault using adaptive time stepping without resolving individual earthquake cycles, which would otherwise be computationally prohibitive. The calculated thermal structure for each rheological end-member is then used as an initial condition for the respective earthquake cycle simulations.
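The depth dependence of the imposed slip rate can be written compactly. The sketch below returns the imposed rate as a fraction of the far-field rate; the text specifies only that the taper is a cosine between 12 and 17 km depth, so the half-cosine form used here is an assumption, as is the function name.

```python
import math


def taper_fraction(depth_km, top_km=12.0, bottom_km=17.0):
    """Fraction of the far-field slip rate imposed on the fault at a given depth.

    Full rate above top_km, zero below bottom_km, and an ASSUMED half-cosine
    taper in between.
    """
    if depth_km <= top_km:
        return 1.0
    if depth_km >= bottom_km:
        return 0.0
    phase = math.pi * (depth_km - top_km) / (bottom_km - top_km)
    return 0.5 * (1.0 + math.cos(phase))


for depth in (0.0, 12.0, 13.0, 14.5, 16.0, 17.0, 30.0):
    print(f"{depth:5.1f} km: {taper_fraction(depth):.3f}")
```

The same function applies to the kinematic coseismic slip distribution if the fraction is multiplied by the slip in the elastic layer rather than by a rate.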
 The approach described above worked well for the fully coupled model with weak (“wet”) end-member rheology. Coupled simulations using strong (“dry”) power law rheology generated extremely high stresses (>10¹⁰ Pa) in the lower crust, resulting in eventual thermal runaway, an instability involving a rapid temperature increase and a complete stress drop [Gruntfest, 1963; Anderson and Perkins, 1974; John et al., 2009], after which the model evolves to a new steady state. To avoid initial instabilities, we applied a perturbation to the 1-D temperature field. We sought the smallest initial temperature perturbation that ensured a quasi-steady solution in the ductile domain. Numerical tests showed that an initial perturbation of 250°C applied within 1.5 km of the fault plane in the depth interval from 10 to 17 km, and linearly decreasing to zero toward both the top (0 km) and bottom (75 km) of the model domain, was sufficient to prevent unstable behavior during the kinematically driven thermal evolution. The initial temperature perturbation does not affect the structure of the “steady state” solution. In particular, in the absence of thermomechanical coupling the initial perturbation diffuses away to negligible values over a 20 Myr period. We subtracted the conductive contribution of the initial perturbations from the model predictions of temperature and heat flow discussed below.
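The statement that the initial perturbation diffuses away over 20 Myr can be checked to order of magnitude with the characteristic diffusion length √(κt). The sketch below is a scaling estimate under that assumption, not a solution of the heat equation; it uses the thermal diffusivities quoted with Table 1 and assumes a Julian year.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24.0 * 3600.0  # assumption: Julian year


def diffusion_length_km(kappa_m2_s, time_yr):
    """Characteristic conductive length scale sqrt(kappa * t), in km."""
    return math.sqrt(kappa_m2_s * time_yr * SECONDS_PER_YEAR) / 1000.0


for rock, kappa in (("diabase", 7.37e-7), ("olivine", 9.04e-7)):
    print(f"{rock}: {diffusion_length_km(kappa, 20.0e6):.1f} km over 20 Myr")
```

Both length scales are of the order of 20 km, much larger than the 1.5 km half-width of the imposed perturbation, which is consistent with the perturbation decaying to small values over the duration of the simulation.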
Figures 2a and 2b show the temperature anomalies generated by viscous dissipation for each end-member rheology. The maximum temperature increase varies from ∼160°C in the case of wet composition, up to ∼375°C in the case of dry composition of the lower crust and upper mantle. As one can see in Figures 2c and 2d, near the fault the temperature field approaches a steady state after ∼10 Myr.
2.4. Simulations of Earthquake Cycles
 Simulations of earthquake cycles using coupled power law models were performed for both kinematic (displacement-controlled) and dynamic (stress-controlled) boundary conditions on the rupture surface. In kinematic models (typical of most previous studies of interseismic deformation), we apply an instantaneous coseismic slip on the fault surface such that the slip is constant (8 m) in the elastic layer (0–12 km depth), and cosine tapered to zero from 12 km to 17 km depth. We then lock the entire fault (0–17 km) for a period of 200 years (the earthquake recurrence interval). A constant velocity of 20 mm/yr, corresponding to the long-term half-slip rate, is applied at the fault-perpendicular far edge of the model (x = 300 km). The near edge (x = 0 km) has zero-displacement boundary conditions below the rupture tip (depths greater than 17 km).
 In dynamic models, the far-field loading is applied until the shear stress on the fault at a depth of 6 km (halfway through the elastic layer) exceeds a critical threshold, or the average fault strength σs. We use σs of 30 MPa [Brune et al., 1969; Zheng and Rice, 1998; Fialko et al., 2005; Fay and Humphreys, 2006] in most calculations described below. Once a critical threshold is reached, the fault is allowed to slip and is locked again once the shear stress at 6 km depth reaches 25 MPa, corresponding to a static stress drop of 5 MPa. In these simulations, the earthquake recurrence interval and coseismic slip are calculated as part of the solution rather than imposed a priori.
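The threshold logic of the stress-controlled ruptures can be illustrated with a toy loop. The sketch below assumes a constant stressing rate at the monitored depth, which is a strong simplification: in the actual models the stressing rate varies through the cycle because of viscoelastic relaxation, and the recurrence interval is an outcome of the full solution. The stressing rate used here is a hypothetical value, and the function name is illustrative.

```python
def event_times_yr(stressing_rate_mpa_per_yr, n_events,
                   strength_mpa=30.0, residual_mpa=25.0, initial_mpa=0.0):
    """Times (yr) of successive slip events for a fault loaded at a constant rate.

    The fault slips when the monitored shear stress reaches strength_mpa and is
    relocked once the stress has dropped to residual_mpa.
    """
    times = []
    time_yr = 0.0
    stress = initial_mpa
    for _ in range(n_events):
        time_yr += (strength_mpa - stress) / stressing_rate_mpa_per_yr
        times.append(time_yr)
        stress = residual_mpa  # static stress drop of strength - residual
    return times


# Hypothetical constant stressing rate of 0.025 MPa/yr.
times = event_times_yr(0.025, 4)
intervals = [later - earlier for earlier, later in zip(times, times[1:])]
print("event times (yr):", [round(t, 3) for t in times])
print("recurrence intervals (yr):", [round(dt, 3) for dt in intervals])
```

After the initial loading from zero stress, the interval between events is set by the stress drop divided by the stressing rate, so a lower fault strength changes the time to the first event but not the later recurrence in this simplified picture.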
 Model configurations used in this study are summarized in Table 2. Each pair of rows from top to bottom adds an additional layer of complexity: temperature-dependent power law rheology, thermomechanical coupling, and stress-controlled ruptures. In figures, we refer to each model configuration by a shorthand term. Models involving linear Maxwell rheology are referred to by their associated Maxwell relaxation time, M20 (tr = 20 years) and M2000 (tr = 2000 years). Power law models are referred to using a three letter acronym, in which the first letter indicates the effective water content (D = dry, W = wet), the second letter indicates whether or not thermomechanical coupling is included (N = noncoupled, C = coupled), and the third letter indicates the type of boundary condition on the fault (K = kinematic, S = stress controlled or dynamic).
Table 2. Eight model configurations used in this study. Each pair of rows down the table represents an additional level of complexity in turn: temperature-dependent power law rheology, thermomechanical coupling, and stress-controlled rupture. For power law rheology, nomenclature is as follows: the first letter denotes water content (D, dry; W, wet); the second letter denotes whether or not thermomechanical coupling is active (C, coupled; N, noncoupled); the third letter denotes the mechanism of coseismic rupture (K, kinematic; S, stress controlled).
μ = 2.02e19 Pa s
μ = 2.02e19 Pa s
μ = 2.02e21 Pa s
μ = 2.02e21 Pa s
DNK (Power law)
WNK (Power law)
DCK (Power law)
WCK (Power law)
DCS (Power law)
WCS (Power law)
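The naming scheme for the model configurations can be expressed as a small lookup, which may be convenient when organizing model output. The sketch below is purely illustrative; the function name and the returned dictionary layout are hypothetical.

```python
WATER = {"D": "dry", "W": "wet"}
COUPLING = {"N": "noncoupled", "C": "coupled"}
RUPTURE = {"K": "kinematic", "S": "stress controlled"}


def decode_model_name(name):
    """Translate a model shorthand (e.g., 'M20' or 'DCS') into its settings."""
    # Maxwell models: 'M' followed by the relaxation time in years.
    if name[:1] == "M" and name[1:].isdigit():
        return {"rheology": "Maxwell", "relaxation_time_yr": int(name[1:])}
    # Power law models: water content, coupling, rupture type.
    if (len(name) == 3 and name[0] in WATER
            and name[1] in COUPLING and name[2] in RUPTURE):
        return {
            "rheology": "power law",
            "water": WATER[name[0]],
            "coupling": COUPLING[name[1]],
            "rupture": RUPTURE[name[2]],
        }
    raise ValueError(f"unrecognized model name: {name!r}")


for model in ("M20", "M2000", "DNK", "WNK", "DCK", "WCK", "DCS", "WCS"):
    print(model, decode_model_name(model))
```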
2.5. Cycle Invariance
 Viscoelastic models of earthquake cycles often need to be “spun up” (i.e., run over multiple cycles) to ensure that predicted surface velocities are cycle invariant, i.e., that they do not depend on the number of cycles since the model initiation [e.g., Hetland and Hager, 2006]. The number of cycles required to accomplish this invariance scales with the ratio of the effective relaxation time to the recurrence interval. As we demonstrate below, power law models require significantly longer spin-ups compared to linear viscoelastic models. Furthermore, we show that models that achieve strain rate invariance (such that the history of surface velocities does not change from cycle to cycle) may not achieve stress invariance, with important implications for the mechanics of loading of seismogenic faults. Models that account for thermomechanical coupling may require spin-up times that are longer still because of the large timescales required to achieve thermal equilibrium. After a sufficient number of cycles, the incremental heat generation and bulk rheological change during a single cycle are negligible and stress and strain rate become effectively cycle invariant.
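One simple way to monitor cycle invariance is to track a quantity sampled at the same phase of each cycle (for example, the stress at a given point at the end of the interseismic period) and to flag the first cycle at which its relative change falls below a tolerance. The sketch below illustrates this idea; the tolerance, the synthetic series, and the function name are assumptions, and the criterion is not necessarily the one used for the results reported here.

```python
def first_invariant_cycle(values, rel_tol=1.0e-3):
    """Index of the first cycle whose value differs from the previous cycle's
    by less than rel_tol (relative change), or None if no such cycle exists.

    values[i] is a quantity sampled at the same phase of cycle i.
    """
    for i in range(1, len(values)):
        previous, current = values[i - 1], values[i]
        scale = max(abs(previous), abs(current), 1.0e-30)
        if abs(current - previous) / scale < rel_tol:
            return i
    return None


# Synthetic end-of-cycle stress history (MPa) relaxing toward a steady value.
history = [100.0 * (1.0 - 0.9 ** k) for k in range(1, 200)]
print("first invariant cycle:", first_invariant_cycle(history))
```

A small cycle-to-cycle change is a necessary but not sufficient indication of convergence for slowly drifting quantities, so in practice the check would be applied separately to stress and to strain rate and over many consecutive cycles.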
3. Results of Numerical Simulations
 In this section we present model predictions for fault-parallel shear stress and fault-parallel shear strain rate at the end of an interseismic period, immediately preceding the next slip event. We also show the predicted time-dependent surface velocities between two slip events (for two interseismic periods separated by 50 cycles, to illustrate cycle invariance).
3.1. Fault-Parallel Shear Stress
 The number of cycles required to achieve stress cycle invariance varies widely depending on the rheology of the ductile substrate. Figure 3 illustrates the evolution of stress at the end of repeated seismic cycles (i.e., immediately preceding the next earthquake) in different locations within the computational domain for the eight tested configurations. Maxwell models (Figures 3a and 3b) achieve cycle invariance in fewer than 100 cycles, with M20 reaching invariance almost immediately. In contrast, power law models that do not account for viscous heating fail to produce converging stresses even after many thousands of cycles (here we show stresses for the first 1000 cycles; we note that surface strain rate invariance is indeed reached in these calculations). Furthermore, predicted stresses are unrealistically high (Figures 3c and 3d). Results shown in Figures 3e–3h were obtained by applying a temperature field from a 20 Myr simulation (described in section 2.3) to a new (undeformed) mesh. The inclusion of thermomechanical coupling mitigates the high stresses predicted by the noncoupled models, though a significant number of earthquake cycles are still required to achieve cycle invariance (Figures 3e–3h). In the case of dynamic ruptures (Figures 3g and 3h) the stresses approach cycle invariance within a few hundred cycles.
Figure 4 shows the distribution of shear stress as a function of depth and distance from the fault after reaching stress cycle invariance for Maxwell and coupled power law models, and during a 1000 cycle spin-up for noncoupled power law models. Stresses are plotted at the end of the interseismic period. Model M20 shows essentially negligible stress in the entire domain, as all of the coseismic stress change is relaxed by the end of the interseismic period (Figure 4a). All kinematically driven models are associated with large negative (i.e., opposite to the sense of far-field loading) stress around the seismogenic fault (Figures 4a–4f). This is clearly unphysical, as the fault is forced to slip in a sense opposite to that of the resolved shear stress. Increasing stress on a fault to a “positive” value in such models requires an additional shear of the entire domain. As a result, shear stress in the elastic layer away from the fault is always higher than the stress acting on the fault, and may in fact be higher than the yield strength of the “intact” upper crust, depending on the magnitude of the developed negative stress anomaly on the fault (Figures 4a–4f). The stress-controlled models (Figures 4g and 4h) are self-consistent in that they produce sustained earthquake cycles driven by far-field plate motion while incorporating laboratory-derived rheologies and realistic geothermal gradients. As expected, stresses are lower for wet (weak) compositions compared to dry (strong) compositions, although the difference is moderate. Stresses are also lower for coupled models compared to noncoupled models. Coupled kinematic and dynamic models produce similar stresses below the brittle-ductile transition.
3.2. Fault-Parallel Shear Strain Rate
 The linear Maxwell rheologies give rise to a broadly distributed viscous flow, as evidenced in the respective shear strain rate fields at the end of the interseismic period (Figures 5a and 5b). High ratios of the relaxation time to the recurrence interval (M2000) are required to maintain strain rate anomalies throughout the interseismic period (Figure 5a). As expected, a “weak” substrate model (M20) fails to produce a localized strain rate anomaly near the fault late in the interseismic phase, as most of the coseismic stress change is completely relaxed (Figure 5b). Noncoupled power law models show the effects of stress-dependent weakening, with a noticeable localization of strain rates in the crust and mantle within ∼25 km of the fault plane. Localization is more robust for the weak rheology (wet composition), with higher near-fault strain rates than in the case of strong rheology (dry composition). Also notable is the lobe of negative strain rate for the wet rheology, with highest magnitude ∼10 km away from the fault at a depth of ∼17 km, decaying away into the upper mantle (Figure 5d). This is also observed in the coupled power law models, and is especially prominent in the case of dry rheology (Figures 5e and 5g). These features are surprising, given that the inferred sense of shear is opposite to the sense of shear stress (Figure 4). They likely represent nonlinear viscoelastic effects. In particular, no backward flow is observed in models that impose a constant slip rate or a stress-free boundary condition on the fault. Coupled power law configurations also illustrate the enhanced strain localization produced by thermomechanical coupling (note the logarithmic color scale in Figure 5). The stress-controlled models produce strain rate fields that are nearly indistinguishable from those predicted by kinematic power law simulations.
3.3. Surface Velocities
 Spatiotemporal evolution of surface velocities is of interest, as it can be used to constrain rheological properties of the Earth's crust and upper mantle [e.g., Thatcher, 1975; Li and Rice, 1987; Pollitz et al., 2000; Kenner and Segall, 2003; Freed and Bürgmann, 2004]. Results shown in Figures 6a and 6b illustrate that simple linear Maxwell models fail to reproduce key geodetic observations, namely postseismic velocity transients and permanently elevated interseismic strain rates near the fault. In particular, the high-viscosity model (M2000) is able to generate an arctangent-like velocity profile throughout the interseismic period, but does not produce postseismic transients (Figure 6a). The low-viscosity model (M20) generates robust postseismic transients, but no strain rate anomaly around the fault late in the interseismic period (Figure 6b). Figures 6c and 6d illustrate that power law models are able to achieve surface velocity invariance while stress invariance is not reached (Figures 3c and 3d). All power law models produce both postseismic transients and arctangent-like profiles at the end of the interseismic period (Figures 6c–6h). “Wet” compositions result in more robust early postseismic transients, as one might expect. We note that the wavelength of the transient velocity peak is nearly the same in all power law models, and is considerably larger than that due to the low-viscosity Maxwell model (Figure 6b). We interpret this result as indicating that the high-stress lid (Figures 4c–4h) does not contribute much to postseismic deformation, and the latter is controlled primarily by viscous relaxation in the weak substrate. Simulations using the same rheologic parameters as in model M20, but assuming the thickness of the elastic layer of 30 km (instead of 12 km) produced a wavelength of the surface velocity profiles comparable to that seen in Figures 6c–6h.
 We also investigated to what extent the predicted surface velocities depend on the size of the computational domain. In particular, we performed simulations in which the domain size in the fault-perpendicular direction was increased by a factor of 3 (from 300 to 900 km, see Figure 1). A common feature of all models driven by a velocity boundary condition applied on the sides is a small but nonvanishing strain rate in the far field. The magnitude of this far-field strain rate does depend on the domain size (larger domains give rise to smaller strain rates at the end of an interseismic period). However, the near-field strain rate (within several locking depths from the fault trace) is relatively insensitive to the assumed size of the computational domain.
4. Comparison With Observations
 In this section we compare predictions of our models to available observations. We do not tailor the models to specific earthquake scenarios, as we are interested in overall qualitative features of the models.
4.1. Geodetic Observations
 Unfortunately, few observations exist of surface deformation spanning the entire cycle of great earthquakes on a mature strike-slip fault. Here we use a data set collected over a period of 87 years following the 1906 San Francisco earthquake [Kenner and Segall, 2003]. This data set contains GPS, trilateration, and triangulation measurements of surface strain rate. While the available data are too sparse and imprecise to discriminate between candidate rheologies, they may be sufficient to test whether the models produce reasonable surface deformation patterns. We compare modeled surface strain rates with the observed rates (Figure 7) at several epochs after the earthquake. The data shown in Figure 7 were corrected for interseismic deformation [Kenner and Segall, 2003]; correspondingly, we subtracted the late interseismic strain rates from each model prediction.
 All models except M2000 and DNK produce postseismic transients that are reasonably consistent with observations. Of the eight models, only model M20 produces an 11.5 year transient that is comparable in magnitude to the data point at the respective time. However, this model lacks any significant strain rate signature later in the interseismic period, as discussed in section 3.3. We note that the “early” strain rate anomaly inferred from triangulation/trilateration data might be affected by shallow afterslip, resulting in a spuriously high near-fault strain rate amplitude. Therefore we do not consider the high apparent strain rates in the early phase of postseismic relaxation as a strong model discriminant. Both noncoupled power law models (Figures 7c and 7d) and models including thermomechanical coupling (Figures 7e–7h) can be deemed to be within the measurement errors. Kenner and Segall [2003] and Johnson and Segall argued that models incorporating viscoelastic shear zones on the downdip extensions of faults provide the best fit to the data. We note that shear zones in the models of Kenner and Segall [2003] and Johnson and Segall were introduced ad hoc, while in our models they are generated as part of the solution. While we cannot discriminate between candidate rheologies due to a considerable scatter in the data, one may conclude that stress-controlled coupled power law models (Figures 7g and 7h) are within the available geodetic constraints.
4.2. Surface Heat Flow Observations
 Here we compare the surface heat flow predicted by our coupled power law models to borehole heat flow observations from several areas around the SAF (Figure 8)—Parkfield, the Elk Hills, and the San Joaquin Valley [Benfield, 1947; Lachenbruch and Sass, 1980; Sass et al., 1971, 1982; Fulton et al., 2004]. These borehole sites have been selected such that there are no other faults between the site and the SAF, so as to avoid potential thermal contributions from other faults. The selected borehole data also encompass a wide range of distances from the SAF and provide information about both the magnitude and the wavelength of the observed SAF heat flow anomaly.
 It is well known that the SAF lacks the near-fault heat flow anomaly that would be expected from frictional heating above the brittle-ductile transition, assuming a coefficient of friction of 0.6–0.8 [e.g., Lachenbruch and Sass, 1980]. However, there is a broader heat flow anomaly in the California Coast Ranges approximately centered on the SAF [Lachenbruch and Sass, 1973]. The proposed explanations for the regionally elevated heat flow include the slab window [Dickinson and Snyder, 1979], advective transport of the frictionally generated heat on the SAF by fluid flow through permeable upper crust [e.g., Scholz et al., 1979], and viscous dissipation in the underlying plastosphere [Lachenbruch and Sass, 1980; Molnar, 1991; Thatcher and England, 1998]. Our results lend support to the suggestion that the observed heat flow anomaly may be at least partially due to shear heating in the ductile substrate. Both end-member rheologies in our coupled power law models satisfy constraints provided by heat flow measurements (Figure 8).
 Differences in the predicted heat flow maxima (∼20 mW m^−2) may be large enough for the heat flow data to provide some discrimination between candidate rheologies of the ductile substrate. Such a discrimination would hinge on the contribution from frictional heating in the brittle crust, which is ignored in our models. If the contribution of frictional heating is significant, the observations are more consistent with a weak “wet” composition, as a strong “dry” composition and high friction would result in a heat flow anomaly greater than that observed (Figure 8). If the heat flow from frictional heating is insignificant [e.g., Brune et al., 1969; Lachenbruch and Sass, 1980; Fulton et al., 2004], then the strong “dry” rheology may be favored. Mature strike-slip faults likely have a transition zone from highly localized frictional slip to ductile shear that includes a transition from velocity-weakening to velocity-strengthening friction [Marone et al., 1991; Dieterich, 1992; Scholz, 1998]. The depth extent of the velocity-strengthening slip, and the associated effective coefficient of friction, are not well understood, and are not explicitly included in our models. We partially account for a possible occurrence of localized creep by extending the slip interface by 5 km into the viscoelastic medium (12–17 km depth). If stable sliding occurs over a greater depth range and under a low effective normal stress (e.g., due to elevated pore pressure), the predicted heat flow anomaly due to viscous dissipation might be lower than that shown in Figure 8.
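 As a rough order-of-magnitude check on the viscous dissipation discussed above, the following Python sketch estimates the rate of heat generation in a ductile shear zone as the product of shear stress and the velocity accommodated across the zone. It is a back-of-the-envelope illustration and not part of the finite element models: the function names, the 100 MPa stress, the 20 mm/yr velocity, and the 4 km width are illustrative values chosen within the ranges quoted in this paper. Note that the first quantity is heat production per unit area of the vertical shear-zone plane, which is not the same as surface heat flow.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # Julian year, in seconds


def dissipation_per_unit_area(shear_stress_mpa, velocity_mm_yr):
    """Heat generated per unit area of the shear-zone plane, in W m^-2.

    Assumes (illustratively) a uniform shear stress across the zone, so the
    dissipation integrated over the zone width is stress times the velocity
    difference accommodated by the zone.
    """
    stress_pa = shear_stress_mpa * 1.0e6
    velocity_m_s = velocity_mm_yr * 1.0e-3 / SECONDS_PER_YEAR
    return stress_pa * velocity_m_s


def volumetric_dissipation(shear_stress_mpa, velocity_mm_yr, width_km):
    """Mean volumetric heating rate inside the shear zone, in W m^-3."""
    per_area = dissipation_per_unit_area(shear_stress_mpa, velocity_mm_yr)
    return per_area / (width_km * 1.0e3)


if __name__ == "__main__":
    # Illustrative inputs only: 100 MPa, 20 mm/yr across a 4 km wide zone.
    q_area = dissipation_per_unit_area(100.0, 20.0)
    q_vol = volumetric_dissipation(100.0, 20.0, 4.0)
    print(f"per unit area of shear plane: {q_area * 1e3:.1f} mW m^-2")
    print(f"mean volumetric heating:      {q_vol * 1e6:.1f} microW m^-3")
```

 Because the estimate is linear in both stress and velocity, halving either input halves the heating rate; the coupled models differ in that stress itself decreases as the shear zone warms.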
5.1. Model Comparisons
 Prior investigations of lower crustal and upper mantle rheology using layered viscoelastic cycle models have shown that surface deformation patterns at a single postearthquake epoch can be used to infer the effective viscosity of the ductile substrate at that particular time; however, the effective viscosity appears to change throughout the cycle [e.g., Kenner and Segall, 2003; Pollitz, 2003; Freed and Bürgmann, 2004; Hearn et al., 2009]. Therefore, a univiscous Maxwell rheology was deemed to be inadequate for the lower crust and/or upper mantle. Our results for univiscous Maxwell models agree with these findings, in that such models are unable to produce both arctangent-like interseismic velocity profiles and transient postseismic deformation (Figures 7a and 7b). Proposed alternatives include biviscous or multiviscous [e.g., Pollitz et al., 2001; Pollitz, 2003, 2005; Kenner and Segall, 2003; Hetland and Hager, 2005; Hearn et al., 2009] or nonlinear (e.g., power law) [e.g., Reches et al., 1994; Freed and Bürgmann, 2004] rheologies. The latter are motivated by laboratory experiments indicating that under high stress and temperature, ductile rocks deform by power law creep [Kirby and Kronenberg, 1987; Karato and Wu, 1993]. Previous studies incorporating power law creep reported that the effective viscosities of the lower crust and upper mantle inferred from fitting the geodetic data must be far lower than those suggested by laboratory experiments [Lyzenga et al., 1991; Reches et al., 1994], or that temperatures below the brittle-ductile transition must be higher than those suggested by surface heat flow data [Freed and Bürgmann, 2004]. Our results show that models assuming laboratory-derived power law parameters and normal geotherms, but neglecting thermomechanical coupling, give rise to unrealistically large stresses in the lithosphere, as viscous dissipation is unable to keep up with the buildup of elastic stress (Figures 3c and 3d).
Models that account for a feedback between viscous dissipation and the effective viscosity remove this problem and predict reasonable (i) stresses in the lithosphere (Figures 3e–3h and 4e–4h), (ii) surface velocities throughout the earthquake cycle (Figures 7e–7h), and (iii) surface heat flow anomalies (Figure 8), at least for mature faults. Note that our models do not require anomalous temperatures or unusual rheologies below the brittle-ductile transition.
5.2. Thickness and Strength of the Mechanical Lithosphere
 Models presented in section 3 satisfy basic conservation laws (in particular, conservation of energy and momentum), and may allow for predictions of the thickness of the mechanical lithosphere, defined as the portion of the model that supports high deviatoric stress. A high-stress lid extends well below the elastic-ductile transition in dynamic coupled power law simulations (Figures 4g and 4h). The lithosphere in these models thus develops self-consistently for the assumed rheology and loading conditions. The stress in the lithosphere away from the fault is nearly constant down to some characteristic depth Hl that can be associated with the effective mechanical thickness of the lithosphere. The inferred magnitude of shear stress is of the order of 50–125 MPa, consistent with petrological estimates of shear stress below the brittle-ductile transition in tectonically active continental crust [e.g., Hirth et al., 2001; Behr and Platt, 2011]. We did not consider brittle failure in the bulk of the upper crust, so that stress is overestimated in the uppermost ∼10 km. The magnitude of stress in the lithosphere only weakly, if at all, depends on the assumed static strength of the fault σs, as well as the assumed composition of the ductile substrate. Our models predict nearly identical magnitudes of stress in the lithosphere away from the active fault for σs between 30 and 90 MPa, and for the “wet” and “dry” end-member rheologies of the lower crust and upper mantle (Figure 9). The main effect of water content is variable thickness of the lithosphere. For the SAF-like loading rates and the assumed initial geothermal gradients of 20°C/km, the effective thickness of the lithosphere decreases from ∼30 km for the “dry” composition to ∼20 km for the “wet” composition (Figure 9). Also, the “wet” composition gives rise to a “jelly sandwich” structure (strong middle crust and upper mantle separated by a weak lower crust, Figures 4h and 9).
Figure 10 shows the shear stress supported by the lithosphere as a function of the far-field loading rate. For the models illustrated in this figure, we evolved the previously described solution for each end-member rheology by changing the loading rate by a factor of 3 (to 13.3 mm/yr and 120 mm/yr) and applying the respective velocity on both the fault plane in the elastic layer (0–12 km depth) and in the far field. The slip rate on the fault plane was again cosine tapered to zero from 12 to 17 km depth. The model was kinematically driven in this manner until a new quasi-steady thermal state was established, which required ∼5 Myr in all cases. We then simulated dynamic earthquake cycles as before, using the new thermal states as initial conditions, until full cycle invariance was achieved.
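The kinematic slip-rate condition described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the simulations: the half-cosine form of the taper and the function name are hypothetical, and the 40 mm/yr reference rate used in the example is inferred from the factor-of-3 statement (13.3 and 120 mm/yr) rather than quoted directly in this section.

```python
import math


def tapered_slip_rate(depth_km, full_rate_mm_yr, z_top_km=12.0, z_bot_km=17.0):
    """Prescribed fault slip rate (mm/yr) as a function of depth.

    Full rate in the elastic layer (0 to z_top_km), zero below z_bot_km,
    and a half-cosine taper in between. The half-cosine shape is an
    assumption made for illustration; the text specifies only that the
    rate is "cosine tapered to zero" over this depth range.
    """
    if depth_km <= z_top_km:
        return full_rate_mm_yr
    if depth_km >= z_bot_km:
        return 0.0
    frac = (depth_km - z_top_km) / (z_bot_km - z_top_km)
    return 0.5 * full_rate_mm_yr * (1.0 + math.cos(math.pi * frac))


if __name__ == "__main__":
    # Loading rates: inferred reference rate and the two factor-of-3 variants.
    for rate in (13.3, 40.0, 120.0):
        profile = [tapered_slip_rate(z, rate) for z in (0.0, 12.0, 14.5, 17.0, 25.0)]
        print(rate, [round(v, 2) for v in profile])
```

A smooth taper of this kind keeps the imposed slip gradient finite at the base of the slip interface, which is the property that bounds the stress there.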
 As Figure 10 shows, the stress supported by the lithosphere decreases with increasing loading rate for both end-member rheologies. This relation is somewhat nonintuitive, as one might expect that absolute stresses scale with rates of relative plate motion. The inferred inverse correlation between stress and loading rate is a consequence of thermomechanical coupling, which allows the localized shear zone to thermally soften and accommodate higher strain rate at lower shear stresses [e.g., Fialko and Khazan, 2005]. At lower loading rates, the dissipative thermal anomaly is relatively small and only weakly promotes localization, resulting in more distributed shear and higher stresses.
5.3. The Magnitude of Temperature Increases Due to Viscous Heating
 Self-heating in the ductile lithosphere due to a long-term motion on a strike-slip fault has been investigated in several studies that reached very different conclusions. For example, Thatcher and England  and Leloup et al.  suggested that the dissipative temperature perturbation should be of the order of hundreds of degrees Celsius, while Lyzenga et al.  and Savage and Lachenbruch  argued for much smaller temperature increases (under similar loading conditions) of the order of 1–10°C. Given that the magnitude of the dissipative temperature anomaly has important implications for the effective strength of the lithosphere and the surface heat flow, tighter constraints on the effects of viscous heating are certainly warranted.
Savage and Lachenbruch  proposed that the high-end (order of 10^2 °C) temperature anomaly deduced by previous studies [e.g., Thatcher and England, 1998] stems from an unphysical stress singularity at the bottom of the elastic layer. Indeed, models of Thatcher and England  and Leloup et al.  assume a constant slip rate in the elastic layer, and an abrupt termination of slip at the brittle-ductile transition. Models of Lyzenga et al.  and Savage and Lachenbruch  avoid the stress singularity by tapering the fault slip below the brittle-ductile transition, or introducing a yield threshold near the fault tip, respectively. Our models assume a tapered slip distribution, similar to that of Lyzenga et al. , so that stresses are bounded everywhere, regardless of the grid size. The predicted temperature anomaly is of the order of a few hundreds of degrees (Figure 2), similar to the values obtained by Thatcher and England  and Leloup et al. , and much larger than the values obtained by Lyzenga et al.  and Savage and Lachenbruch . We note that results presented in Figure 2 are based on the assumption of a mafic composition of the ductile substrate, while the low-end values of Savage and Lachenbruch  were inferred for the case of granitic composition. To test to what extent the predicted temperature anomaly depends on composition, we modified our model to include a granitic middle crust in the depth range of 12–20 km, with creep law parameters n = 3.3, Q = 186.5 kJ mol^−1, and A = 2.11 × 10^−5 MPa^−n s^−1 [Carter and Tsenn, 1987], and a lower crust in the depth range of 20–30 km with the creep law parameters n = 3.1, Q = 243 kJ mol^−1, and A = 8.0 × 10^−3 MPa^−n s^−1 corresponding to those of felsic granulite [Wilks and Carter, 1990]. In these simulations we assumed a wet olivine rheology of the upper mantle (see Table 1). As one might expect, the respective temperature anomaly is lower than those predicted for the mafic composition (Figure 2), but still in excess of 75°C.
The predicted temperature anomaly also strongly depends on the assumed geothermal gradient. For a typical geotherm in the continental crust [e.g., Turcotte and Schubert, 2002, pp. 143–144], the “felsic” end-member model described above predicts a dissipative temperature increase of 270°C, comparable to predictions of the “mafic” end-member models assuming higher temperatures below the brittle-ductile transition (Figure 2). We speculate that the relatively low magnitude of thermal perturbations deduced by Lyzenga et al.  and Savage and Lachenbruch  was due to their neglect of thermomechanical coupling, and underestimation of the background stress.
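The sensitivity of ductile strength to temperature and stress can be illustrated with the creep parameters quoted above for the granitic middle crust and the felsic granulite lower crust. The Python sketch below assumes the standard flow law form, strain rate = A σ^n exp(−Q/RT), with stress in MPa, and defines effective viscosity as σ/(2 × strain rate); both the flow law form and the factor of 2 are conventions assumed here for illustration, and the function names, stress, and temperatures are hypothetical inputs rather than model results.

```python
import math

R_GAS = 8.314  # universal gas constant, J mol^-1 K^-1

# Creep parameters as quoted in the text (A in MPa^-n s^-1, Q in kJ mol^-1).
GRANITE = {"A": 2.11e-5, "n": 3.3, "Q": 186.5}
FELSIC_GRANULITE = {"A": 8.0e-3, "n": 3.1, "Q": 243.0}


def power_law_strain_rate(stress_mpa, temp_c, params):
    """Strain rate (s^-1) from an Arrhenius power law (assumed standard form)."""
    temp_k = temp_c + 273.15
    arrhenius = math.exp(-params["Q"] * 1.0e3 / (R_GAS * temp_k))
    return params["A"] * stress_mpa ** params["n"] * arrhenius


def effective_viscosity(stress_mpa, temp_c, params):
    """Effective viscosity (Pa s), using the convention eta = stress / (2 * rate)."""
    rate = power_law_strain_rate(stress_mpa, temp_c, params)
    return stress_mpa * 1.0e6 / (2.0 * rate)


if __name__ == "__main__":
    stress = 100.0  # MPa, illustrative value of the order quoted in the text
    for name, params in (("granite", GRANITE), ("felsic granulite", FELSIC_GRANULITE)):
        for temp in (400.0, 500.0, 600.0):  # deg C, illustrative
            eta = effective_viscosity(stress, temp, params)
            print(f"{name:17s} T = {temp:5.0f} C  eta = {eta:.2e} Pa s")
```

At fixed stress, the exponential dependence on temperature means that a temperature increase of order 10^2 °C changes the effective viscosity by orders of magnitude, which is why the magnitude of the dissipative anomaly matters for lithospheric strength.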
5.4. Strain Localization
 Our coupled power law simulations demonstrate that strain localization at depth is produced on the downdip extension of the fault due to a positive feedback between shear heating and temperature-dependent rheology (Figures 5e–5h and 11). For the parameters used in this study, the predicted width of the shear zone in the lower crust is several km (Figure 11), in good agreement with some geological observations of exposed lower crustal shear zones [Leloup and Kienast, 1993; Dumond et al., 2008]. Thus much of the relative plate motion is accommodated by a “deep fault root” that extends into the lower crust, and possibly into the upper mantle (Figures 5e–5h and 11). This might provide some physical justification for the use of elastic half-space models of interseismic deformation [Savage and Burford, 1970; Lapusta et al., 2000], although one needs to systematically compare predictions of the elastic half-space models incorporating rate-state friction to predictions of nonlinear viscoelastic models (e.g., Figure 6) to understand similarities and differences between the two classes of models. This will be addressed in future work. The degree of strain localization depends on the host rock composition, water content, ambient temperature, and fault slip rate. Stiffer rheologies, lower ambient temperatures, and higher slip rates all give rise to narrower shear zones. Such dependence may be in part responsible for the ongoing debate on the localized versus distributed nature of deformation in the ductile part of the continental lithosphere. For example, Wilson et al.  argued for broadly distributed deformation below the Wairau and Awatere faults in New Zealand based on the absence of Moho offsets and regional seismic anisotropy.
However, these faults have relatively low slip rates (several mm/yr), young age (several Myr), and are part of a complex system of subparallel, closely spaced faults [Bourne et al., 1998], so that the region of elevated strain rate at the base of the crust may indeed be broad and possibly overlapping between neighboring fault zones. Higher silica content and elevated geotherms may also contribute to broad deformation zones in the lower crust. On the other hand, fast-moving mature faults such as the SAF may be associated with fairly deep and localized shear zones, consistent with available data [Poirier, 1980; White et al., 1980; Rutter, 1999; Stern and McBride, 1998; Zhu, 2000; Thurber et al., 2006; Tape et al., 2009; Nadeau and Dolenc, 2005; Shelly, 2010].
5.5. Implications for Field Observations of Ductile Shear Zones
 To date, few cases of thermally controlled localized ductile shear have been reported for strike-slip fault systems [e.g., Leloup and Kienast, 1993; Camacho et al., 2001; Dumond et al., 2008]. It is usually assumed that the relict thermal indicators of a shear zone generated through shear heating, such as metamorphic grade, overprinted mineral assemblages, isotopic closure temperature, deformation microstructures, etc., should vary over a distance comparable to the width of the shear zone. However, our results show that for mature faults the width of the shear zone (defined by the region of high strain rate) may be considerably (by as much as an order of magnitude) narrower than the associated thermal anomaly (Figure 11). The predicted temperature anomaly shows variations of only 15–25°C within a few km of the fault, which may be too small a difference to produce an obvious signature in thermal indicators within the shear zone compared to ambient rocks. The corollary is that field evidence for thermomechanical coupling may be subtle, as expressed e.g., in a regional (tens of km wide) paleotemperature anomaly centered on a shear zone (assuming that the exposure allows one to identify and track the same paleodepth), or anomalously high temperature of the shear zone with respect to the “normal” geotherm (assuming that the paleodepth of the exposure can be determined independently).
 We have considered models of earthquake cycles on a mature strike-slip fault. Models that incorporate laboratory-derived temperature-dependent power law rheology, viscous heat generation, and conductive heat transfer predict the development of shear zones in the middle and lower crust, gradually widening in the upper mantle. These shear zones localize strain in the interseismic period, resulting in stress transfer from the relative plate motion to seismogenic faults in the upper brittle crust. Shear zones also participate in postseismic transients by relaxing coseismic stress changes. For the SAF-like loading rates, the predicted temperature anomaly below the brittle-ductile transition is of the order of 200–400°C, and the width of the shear zone is of the order of several kilometers, consistent with geological observations in exposed deep shear zones worldwide [Leloup and Kienast, 1993; Dumond et al., 2008]. Our numerical simulations suggest that the water and silica content in the lower crust and upper mantle do not appreciably affect the shear stress in the lithosphere, but do control the thickness of a high-stress lid. The stress in the lithosphere is found to be of the order of 50–125 MPa, in agreement with petrological evidence [Hirth et al., 2001; Behr and Platt, 2011]. The lithospheric stress decreases with increasing rate of relative plate motion due to enhanced thermal weakening in the shear zone. Thermomechanical coupling is thus a viable mechanism by which stress perturbations in the viscoelastic lower crust and upper mantle may spontaneously generate localized ductile shear zones that ultimately control the effective strength of the continental lithosphere. 
Mature (∼10^7 years) shear zones generated in the temperature-dependent power law lower crust and upper mantle may be an order of magnitude narrower than the associated thermal anomalies, implying that field evidence for thermally induced localized shear may be subtle unless paleotemperature indicators are mapped over considerable (kilometers to tens of kilometers) distances away from the zone of enhanced ductile shear. Our modeling results suggest that the broad heat flow anomaly around the San Andreas fault in Northern California may in part reflect viscous heating in the deep fault root extending into the lower crust and possibly the upper mantle.
 We thank two anonymous reviewers and editor Tom Parsons for their helpful comments and suggestions. This work was supported by the National Science Foundation (grant EAR-0944336) and the Southern California Earthquake Center (SCEC). Finite element meshes were created using code APMODEL, courtesy of S. Kenner.