2.1. Definition of the ETAS Model
 To make this discussion precise, let us consider the epidemic-type aftershock sequence (ETAS) model, in which any earthquake may trigger other earthquakes, which in turn may trigger more, and so on. Introduced in slightly different forms by Kagan and Knopoff [1981] and Ogata [1988], the model describes statistically the spatiotemporal clustering of seismicity. We choose the ETAS model because of its increasing popularity for the statistical description of earthquake interaction [Kagan and Knopoff, 1981; Ogata, 1988; Console et al., 2003; Zhuang et al., 2004], its establishment as a powerful null hypothesis for forecasting [Helmstetter and Sornette, 2003c; A. Helmstetter et al., Comparison of short-term and long-term earthquake forecast models for southern California, submitted to Bulletin of the Seismological Society of America, 2005; D. Schorlemmer et al., Earthquake likelihood model testing, SCEC preprint, 2005], its simplicity, and its ability to explain features of seismic catalogs, including apparent Gutenberg-Richter b value variations and Omori law exponent variations [Helmstetter and Sornette, 2002], foreshocks [Helmstetter and Sornette, 2003a], and apparent aftershock diffusion [Helmstetter et al., 2003].
 The triggering process may be caused by various mechanisms that either compete or combine, such as pore pressure changes due to pore fluid flows coupled with stress variations, slow redistribution of stress by aseismic creep, rate- and state-dependent friction within faults, coupling between the viscoelastic lower crust and the brittle upper crust, stress-assisted microcrack corrosion, and more. The ETAS formulation amounts to a two-scale description: in a first step, these physical processes controlling earthquake interactions enter the determination of effective triggering laws; the overall seismicity then results from the cascade of events triggering events, which trigger further events, and so on [Helmstetter and Sornette, 2002].
 The ETAS model consists of three laws about the nature of seismicity viewed as a marked point process. We restrict this study to the temporal domain only, summing over the whole spatial domain of interest. First, the magnitude of any earthquake, regardless of time, location, or magnitude of the mother shock, is drawn randomly from the exponential Gutenberg-Richter (GR) law. Its normalized probability density function (PDF) is expressed as
$$p(m) = \frac{b\,\ln(10)\,10^{-b(m-m_0)}}{1 - 10^{-b(m_{\max}-m_0)}}, \qquad m_0 \le m \le m_{\max}, \qquad (1)$$
where the exponent b is typically close to one, and the cutoffs m0 and mmax serve to normalize the PDF. The upper cutoff mmax is introduced to avoid unphysical, infinitely large earthquakes. Its value was estimated to be in the range 8–9.5 [Kagan, 1999]. As the impact of a finite mmax is quite weak in the calculations below, replacing the abrupt cutoff mmax by a smooth taper would introduce negligible corrections to our results.
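As an illustration, magnitudes obeying the truncated Gutenberg-Richter law above can be drawn by inverse-transform sampling. The sketch below is a minimal Python implementation; the parameter values (b = 1, m0 = 2, mmax = 8.5) are illustrative assumptions, not fits to data.

```python
import numpy as np

def sample_gr_magnitudes(n_events, b=1.0, m0=2.0, m_max=8.5, rng=None):
    """Draw magnitudes from the truncated Gutenberg-Richter PDF
    p(m) ~ 10^(-b (m - m0)) on [m0, m_max] by inverting its CDF
    F(m) = (1 - 10^(-b (m - m0))) / (1 - 10^(-b (m_max - m0)))."""
    rng = np.random.default_rng() if rng is None else rng
    u = rng.uniform(size=n_events)
    norm = 1.0 - 10.0 ** (-b * (m_max - m0))
    # Solve F(m) = u for m: the abrupt cutoff at m_max is built in.
    return m0 - np.log10(1.0 - u * norm) / b
```

Because the taper at mmax removes only a fraction 10^(-b (mmax - m0)) of the mass, the sample mean sits very close to the untruncated value m0 + 1/(b ln 10), consistent with the remark that a finite mmax has a weak impact.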
 Second, the model assumes that direct aftershocks are distributed in time according to the modified “direct” Omori law [see Utsu et al., 1995, and references therein]. Assuming θ > 0, the normalized PDF of the Omori law can be written as
$$\psi(t) = \frac{\theta\, c^{\theta}}{(c+t)^{1+\theta}}, \qquad t \ge 0. \qquad (2)$$
 Third, the number of direct aftershocks of an event of magnitude m is assumed to follow the productivity law:
$$\rho(m) = k\, 10^{\alpha(m-m_0)}, \qquad m \ge m_0. \qquad (3)$$
Note that the productivity law (3) is zero below the cutoff m0, i.e., earthquakes smaller than m0 do not trigger other earthquakes, as is typically assumed in studies using the ETAS model. The existence of the small-magnitude cutoff m0 is necessary to ensure the convergence of these models of triggered seismicity (in the statistical physics of phase transitions and in particle physics, such a cutoff is called an "ultraviolet" cutoff and is often necessary to make the theory convergent). In a closely related paper, Sornette and Werner [2005] showed that the existence of the cutoff m0 has observable consequences which constrain its physical value. They also discuss possible scenarios for this break in self-similarity, such as a transition from fracture- to friction-dominated earthquakes [Richardson and Jordan, 2002] or a minimum earthquake size as predicted by rate-and-state friction [Dieterich, 1992; Ben-Zion, 2003].
 The key parameter of the ETAS model is the branching ratio n, defined as the average number of direct aftershocks per earthquake, averaged over all magnitudes. Here, we must distinguish between the two cases α = b and α ≠ b:

$$n = \int_{m_0}^{m_{\max}} \rho(m)\, p(m)\, dm = \frac{k\, b \left[1 - 10^{-(b-\alpha)(m_{\max}-m_0)}\right]}{(b-\alpha)\left[1 - 10^{-b(m_{\max}-m_0)}\right]} \qquad (4)$$

for the general case α ≠ b. The special case α = b gives

$$n = \frac{k\, b\, \ln(10)\,(m_{\max}-m_0)}{1 - 10^{-b(m_{\max}-m_0)}}. \qquad (5)$$
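Both branches of the branching ratio calculation are easy to check against a direct numerical average of the productivity law over the Gutenberg-Richter PDF; the sketch below does exactly that, with illustrative (assumed) parameter values.

```python
import numpy as np

def branching_ratio(k, alpha, b, m0, m_max):
    """Mean number n of direct aftershocks per event: the productivity
    rho(m) = k 10^(alpha (m - m0)) averaged over the truncated GR PDF,
    with separate closed forms for alpha != b and alpha = b."""
    delta = m_max - m0
    norm = 1.0 - 10.0 ** (-b * delta)
    if np.isclose(alpha, b):
        return k * b * np.log(10.0) * delta / norm
    return k * b * (1.0 - 10.0 ** (-(b - alpha) * delta)) / ((b - alpha) * norm)
```

For α < b the small earthquakes dominate the average collectively, which is why the value of m0 matters for n.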
 Three regimes can be distinguished based on the value of n. The case n < 1 corresponds to the subcritical, stationary regime, where aftershock sequences die out with probability one. The case n > 1 describes unbounded, exponentially growing seismicity [Helmstetter and Sornette, 2002]. In addition, the case b < α leads to explosive seismicity with finite time singularities [Sornette and Helmstetter, 2002]. The critical case n = 1 separates the two regimes n < 1 and n > 1. Helmstetter and Sornette [2003b] showed that the branching ratio n is also equal to the fraction of triggered events in a seismic catalog. We consider the case n < 1 which describes stationary seismicity. The branching ratio n measures the distance to the critical state of the crust (n = 1) which may have important implications for the self-organization of the crust.
 The fact that we use the same value for the productivity cutoff and the Gutenberg-Richter (GR) cutoff is not a restriction as long as the real cutoff for the Gutenberg-Richter law is smaller than or equal to the cutoff for the productivity law. In that case, truncating the GR law at the productivity cutoff just means that all smaller earthquakes, which do not trigger any events, do not participate in the cascade of triggered events. This should not be confused with the standard incorrect procedure in many previous studies of triggered seismicity of simply replacing the GR and productivity cutoff m0 with the detection threshold md in equations (1) and (3) [see, e.g., Ogata, 1988; Kagan, 1991; Ogata, 1998; Console et al., 2003; Zhuang et al., 2004]. The assumption that md = m0 may lead to a bias in the estimated parameters. Helmstetter et al. [2005, Figure 1] show that events of magnitude 2 trigger their own aftershock sequences. We thus expect m0 to be smaller than md.
 Without loss of generality, we consider one independent branch (a cluster or cascade of aftershocks set off by a background event) of the ETAS model; we generalize to a seismic catalog with an arbitrary number of clusters in the appendix. Let an independent background event of magnitude M1 occur at some origin of time. The main shock will trigger direct aftershocks according to the productivity law (3). Each of the direct aftershocks will trigger its own aftershocks, which in turn produce their own, and so on. Averaged over all magnitudes, an aftershock produces n direct offspring according to (4). Thus, integrating over time, we can write the average total number Ntotal of direct and indirect aftershocks of the initial main shock as an infinite sum over terms of (3) multiplied by n raised to the power of the generation number [Helmstetter and Sornette, 2003b], which can be expressed for n < 1 as
$$N_{\mathrm{total}} = \rho(M_1) \sum_{i=0}^{\infty} n^{i} = \frac{k\,10^{\alpha(M_1 - m_0)}}{1-n}. \qquad (6)$$
However, since we can only detect events above the detection threshold md > m0, the total number of observed aftershocks Nobs of the sequence is simply Ntotal multiplied by the fraction of events above the detection threshold, given by
$$P(m \ge m_d) = 10^{-b(m_d - m_0)}, \qquad (7)$$
according to the GR distribution. The observed number of events in the sequence is therefore
$$N_{\mathrm{obs}} = N_{\mathrm{total}}\, 10^{-b(m_d-m_0)} = \frac{k\, 10^{\alpha(M_1-m_0)}\, 10^{-b(m_d-m_0)}}{1-n}. \qquad (8)$$
Equation (8) predicts the average observed number of direct and indirect aftershocks of a main shock of magnitude M1 > md. Sornette and Werner [2005] showed that m0 may be estimated by fitting Nobs given by (8) to observed aftershock sequences and to Båth's law. The essential parameter needed to constrain m0 is the branching ratio n. As we demonstrate below, typical estimates of n in the literature obtained from a catalog neglect undetected seismicity and therefore cannot be used directly to constrain m0.
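The chain from the cascade sum for Ntotal to the observed count Nobs is simple enough to sketch directly; the parameter values in the test are assumptions for illustration only.

```python
def n_total(k, alpha, M1, m0, n):
    """Mean total (direct + indirect) aftershock count of a magnitude-M1
    main shock: the geometric cascade rho(M1) (1 + n + n^2 + ...) summed
    for n < 1."""
    return k * 10.0 ** (alpha * (M1 - m0)) / (1.0 - n)

def n_observed(k, alpha, b, M1, m0, md, n):
    """Observed count above the detection threshold md: the total count
    multiplied by the GR fraction 10^(-b (md - m0)) of detectable events."""
    return n_total(k, alpha, M1, m0, n) * 10.0 ** (-b * (md - m0))
```

Note that at fixed k and n the product of the two factors depends on m0 through 10^((b - α) m0), which is the lever by which fits of the observed aftershock numbers can constrain m0.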
 Naturally, there is no justification for assuming that md should equal m0, as is done routinely in inversions of catalogs for the parameters of the ETAS model [see, e.g., Ogata, 1988; Kagan, 1991; Ogata, 1998; Console et al., 2003; Zhuang et al., 2004]. First, detection thresholds change over time as instruments and network coverage improve, while the physical mechanisms in the Earth presumably remain the same. No significant deviation from the Gutenberg-Richter distribution or the productivity law has been recorded as the detection threshold md decreased over time [see, e.g., Ouillon and Sornette, 2005, Figure 3]. Second, studies of earthquake occurrence at small magnitudes below the regional network cutoffs show that earthquakes follow the same Gutenberg-Richter law (for a recent study of mining-induced seismicity, see, e.g., Sellers et al.), while acoustic emission experiments have shown the relevance of the Omori law at small scales [see, e.g., Nechad et al., 2005, and references therein]. Within the assumption of self-similarity, i.e., a continuation of the GR and productivity laws down to a cutoff, the evidence thus points toward a magnitude of the smallest triggering earthquake and a Gutenberg-Richter cutoff that lie below the detection threshold and are thus not directly observable.
 The effect of undetected seismicity below the detection threshold is fundamentally different from the effect of earthquakes outside the space-time study window that may contribute to the seismicity budget inside the region. The event incompleteness below the magnitude detection threshold md cannot be treated, in analogy with the temporal and spatial limits of the study window, as a finite size boundary effect. While events from outside the study area have an influence on the inside that decreases in time according to the Omori law and in space according to a spatial decay function (e.g., Gaussian or power law), the influence of the many events below the detection threshold inside the study area may be very significant, because each magnitude range collectively triggers a roughly equal number of events of any size. The magnitude detection threshold is thus different in nature from boundary effects and must be addressed.
2.2. Two Interpretations of the ETAS Model
 The ETAS model may be viewed in two mathematically equivalent ways that differ in their interpretation. In this section, we develop both views to underline that our results apply in both cases and to stress their equivalence. The first describes the model as a simple branching model without loops [Kagan, 1991]: the independent background events, due to tectonic loading, may each independently trigger direct aftershocks, each of which may in turn trigger secondary shocks, which in turn may trigger more. Because every triggered event (excluding, of course, the nontriggered background events) has exactly one main shock (mother), while a mother may have many direct aftershocks (children), the triggering relations form a tree without loops. The background events are assumed to form a stationary Poisson process with a constant rate. The rate of aftershocks of a background event is a nonstationary Poisson process that is updated every time another aftershock occurs, until the cascade dies out. The intensity is thus conditioned on the specific history of earthquakes, and the expectation of the conditional intensity is an average over an ensemble of histories. The predicted number of aftershocks of an independent background event of magnitude M1, as in expression (8), is thus averaged over the ensemble of possible realizations of the aftershock sequence, as well as over all possible magnitudes of the aftershocks. The branching ratio n is therefore an average not only over magnitudes but also over an ensemble of realizations of the nonstationary Poisson process. In summary, the model consists of statistically independent Poisson clusters of events; the events within one cluster, however, are dependent.
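This first (branching) view translates directly into a generation-by-generation simulation. The sketch below draws Poisson numbers of direct aftershocks and GR magnitudes for one cluster; the parameter values are assumptions chosen so that n ≈ 0.7 (subcritical), and time and space are omitted for brevity.

```python
import numpy as np

def simulate_cascade(M1, k=0.15, alpha=0.8, b=1.0, m0=2.0, m_max=8.5,
                     max_events=100000, rng=None):
    """One ETAS cluster as a branching process: each event of magnitude m
    spawns Poisson(rho(m)) direct aftershocks, rho(m) = k 10^(alpha (m - m0)),
    and each child's magnitude is drawn from the truncated GR law.
    Returns the magnitudes of all triggered events, over all generations."""
    rng = np.random.default_rng() if rng is None else rng
    gr_norm = 1.0 - 10.0 ** (-b * (m_max - m0))
    queue, aftershocks = [M1], []
    while queue and len(aftershocks) < max_events:
        m = queue.pop()
        n_direct = rng.poisson(k * 10.0 ** (alpha * (m - m0)))
        u = rng.uniform(size=n_direct)
        children = list(m0 - np.log10(1.0 - u * gr_norm) / b)
        aftershocks.extend(children)
        queue.extend(children)  # children trigger their own offspring
    return np.array(aftershocks)
```

The `max_events` cap is a practical guard: even in the subcritical regime the total progeny distribution is heavy tailed, so occasional very large clusters occur.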
 The second view of the ETAS model does not allow a unique identification of the mother or trigger of an earthquake. Rather, each aftershock was triggered collectively by all previous earthquakes, each of which contributes a weight determined by the magnitude-dependent productivity law ρ(m) that decays in time according to the Omori law ψ(t) and in space according to a spatial function R(r), often chosen to be an exponential or a power law centered on the event. The instantaneous conditional intensity rate at some time t at location r is given by
$$\lambda(t, r \mid H_t) = \mu + \sum_{i:\, t_i < t} \rho(m_i)\, \psi(t - t_i)\, R(r - r_i), \qquad (9)$$

with μ the constant rate of background events,
where the sum runs over all previous events i with magnitude mi at time ti at location ri. Thus the triggering contribution of a previous event to a later event at time t is given by its own weight (its specific entry in the sum) divided by the total seismicity rate, including the background rate. A nonzero background rate then contributes evenly to all events and corresponds to an omnipresent loading contribution. In this way, earthquakes are seen to be the result of all previous activity including the background rate. This corresponds to a branching model in which every earthquake links to all subsequent earthquakes weighted according to the contribution to triggering. A branching ratio can then be interpreted as a contribution of a past earthquake to a future earthquake, averaged over an ensemble of realizations and all magnitudes. In contrast to the independent background events considered due solely to tectonic loading that exist in the first interpretation, all earthquakes are due to a combination of the background loading and the effect of previous events. This second view becomes the only possible one for nonlinear models whose triggering functions depend nonlinearly on previous events (see, e.g., the recently introduced multifractal earthquake triggering model [Ouillon and Sornette, 2005; Sornette and Ouillon, 2005] and references therein).
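In this second view the conditional intensity is evaluated by summing the weights of all past events. A temporal-only sketch (the spatial kernel is summed out; the parameter values are illustrative assumptions):

```python
import numpy as np

def conditional_intensity(t, times, mags, mu=0.02, k=0.15, alpha=0.8,
                          m0=2.0, c=0.001, theta=0.2):
    """Temporal ETAS conditional intensity with the spatial kernel summed
    out: lambda(t) = mu + sum over past events of rho(m_i) psi(t - t_i),
    with rho(m) = k 10^(alpha (m - m0)) and the Omori kernel
    psi(t) = theta c^theta / (c + t)^(1 + theta)."""
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    past = times < t
    dt = t - times[past]
    rho = k * 10.0 ** (alpha * (mags[past] - m0))
    psi = theta * c ** theta / (c + dt) ** (1.0 + theta)
    return mu + float(np.sum(rho * psi))
```

With no past events the intensity reduces to the background rate μ, and the contribution of any single event decays toward zero according to the Omori kernel.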
 These two views are equivalent because the linear formulation of the seismic rate of the ETAS model together with the exponential Poisson process ensures that the statistical properties of the resulting earthquake catalogs are the same. The linear sum over the individual contributions and the Poisson process formulation are the key ingredients that allow the model to be viewed as a simple branching model.
 This duality in thinking about the ETAS model is reflected in the existence of two simulation codes in the community, each inspired by one of the two views. A program written by K. Felzer and Y. Gu (personal communication) generates the background events as a stationary Poisson process and then simulates each cascade independently of the other branches as a nonstationary process. The second code, by Ogata, on the other hand, calculates the overall seismicity at each point in time by summing over all previous activity. The latter code is significantly slower because the independence between cascades is not exploited, and the entire catalog is modeled as the sum of a stationary and a nonstationary process. Despite the different approaches, the resulting earthquake catalogs share the same statistical properties and are thus equally acceptable.
 While the simulation or forward problem is straightforward when adopting the view of the ETAS model as a branching model with one assigned trigger per aftershock, the inverse problem of reconstructing the branching structure from a given catalog can at best be solved probabilistically. Because aftershocks of one mother cannot be distinguished from those of another except by their spatiotemporal distances, we have no way of knowing which previous earthquake triggered a particular event, or whether it is a background event. Rather, we must resort to calculating the probability that an event at time t was triggered by any given previous event, according to the contribution of that previous event at time t relative to the overall intensity at time t. This probability is of course equal to the weight or triggering contribution that a previous event has on a subsequent event in the collective-triggering view. However, the interpretation remains different, since the probability specifies a unique mother in a fraction of many realizations.
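The probabilistic inversion described above amounts to normalizing each past event's weight by the total intensity. A self-contained temporal sketch (assumed parameter values; all entries of `times` are assumed to be earlier than `t`):

```python
import numpy as np

def triggering_probabilities(t, times, mags, mu=0.02, k=0.15, alpha=0.8,
                             m0=2.0, c=0.001, theta=0.2):
    """For an event at time t, return (probs, p_background): probs[i] is the
    probability that past event i triggered it, i.e., its weight
    rho(m_i) psi(t - t_i) divided by the total intensity lambda(t);
    p_background = mu / lambda(t) is the chance it is a background event."""
    times = np.asarray(times, dtype=float)
    mags = np.asarray(mags, dtype=float)
    dt = t - times
    weights = (k * 10.0 ** (alpha * (mags - m0))
               * theta * c ** theta / (c + dt) ** (1.0 + theta))
    lam = mu + weights.sum()
    return weights / lam, mu / lam
```

By construction the individual probabilities and the background probability sum to one, and, for equal magnitudes, the more recent event is the likelier trigger because of the Omori decay.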
 Having determined from catalogs a branching structure weighted according to the probability of triggering, one may of course choose to always pick as the source of an event its most probable contributor, be that a previous event or the background rate. Another option is to choose randomly according to the probability distribution and thus reconstruct one possible branching structure among the ensemble of many possible ones. The latter approach has been used by Zhuang et al. [2004] and labeled stochastic reconstruction.
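Stochastic reconstruction then draws one mother per event from these probabilities. A minimal sketch, where the return value -1 standing for "background" is an assumed convention of this illustration:

```python
import numpy as np

def sample_mother(trigger_probs, p_background, rng=None):
    """Draw a single mother for one event: index i with probability
    trigger_probs[i], or -1 ("background") with probability p_background."""
    rng = np.random.default_rng() if rng is None else rng
    p = np.append(np.asarray(trigger_probs, dtype=float), p_background)
    choice = rng.choice(len(p), p=p / p.sum())
    return -1 if choice == len(p) - 1 else int(choice)
```

Repeating this draw over many realizations reproduces, on average, the weighted branching structure, which is exactly the sense in which the probability "specifies a unique mother in a fraction of many realizations."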
 The key point is that equating the detection threshold with the smallest triggering earthquake will most likely bias the parameters recovered by a maximum likelihood analysis, as performed by Zhuang et al. [2004] and in many other studies. The weights or probabilities of previous events triggering subsequent events were therefore calculated from biased parameters.
 In the following, we show that the branching ratio and the background source events are significantly biased when estimated from the apparent branching structure observed above the detection threshold md instead of from the complete tree structure down to m0. We adopt the view of the simple branching model to make the derivations more illuminating, but all results can be reinterpreted as contributions in the collective-triggering view.