Modern earthquake catalogs are often described using spatial-temporal point process models such as the epidemic-type aftershock sequence (ETAS) models of Ogata (1998). Earthquake catalogs often suffer from incompleteness and other inaccuracies for earthquakes of magnitude below a certain threshold, and such earthquakes are typically removed prior to fitting a point process model. This paper investigates the bias in the parameters of ETAS models introduced by the removal of the smallest events. It is shown that for most of the ETAS parameters, the bias increases approximately exponentially as a function of the lower magnitude cutoff.
 The estimation of models in the presence of noisy or missing data is a topic of significant concern in numerous scientific applications. Indeed, much of our knowledge about structures and processes beneath the Earth's surface is tempered by incomplete data and observations with low signal-to-noise ratios [Bolt, 1996]. In the case of seismology, one of the most significant obstacles to current understanding and characterization of earthquake catalogs is the fact that earthquakes of small magnitude are often undetected by seismic monitoring stations, and as a result, the analysis of modern earthquake catalogs is typically limited to earthquakes above a certain lower magnitude detection threshold [Kagan, 2002, 2003]. That is, prior to modeling and analysis of earthquake catalogs, earthquakes of magnitudes below a certain cutoff are excluded from analysis, so that the remaining events are thought to represent a relatively complete list of earthquakes above the cutoff magnitude.
 As the density of monitoring stations increases and technological developments facilitate the recording of increasingly accurate earthquake catalogs down to progressively lower magnitude thresholds, a question of increasing importance is how much improvement such advances allow, in terms of accuracy in estimating parameters in models for earthquake occurrence and consequently in earthquake forecasting. The goal of this paper is to address this question via a simulation study of realistic earthquake models, analyzing each simulation following a variety of lower magnitude cutoffs, and investigating the bias in parameter estimates introduced by the removal of earthquakes below the lower magnitude cutoff. While a variety of models for earthquake occurrences have been proposed, we focus here specifically on the spatial-temporal epidemic-type aftershock sequence (ETAS) models introduced by Ogata [1998], as these branching models are commonly used to describe modern earthquake catalogs. Such models are widely used at present for a variety of purposes, including time-dependent forecasting and seismic hazard estimation [Schorlemmer et al., 2007], earthquake declustering [Zhuang et al., 2002], detection of anomalous seismic behavior [Ogata, 2007], the study of differences between tectonic zones [Kagan et al., 2010], testing hypotheses about static and dynamic earthquake triggering [Hainzl and Ogata, 2005], and as reference models for comparison with alternatives [Ogata and Zhuang, 2006].
 The relationship between the lower magnitude threshold and the branching ratio in the ETAS model was investigated from a theoretical perspective by Sornette and Werner [2005a, 2005b]. Here, the focus is on the statistical properties of maximum likelihood estimates of the parameters in these point process models. In particular, we show how the bias in these parameters changes as the lower magnitude cutoff increases and describe the observed relationship between the cutoff and the bias for each parameter. Increased understanding of the biases in the estimates of parameters in ETAS models may lead to improved (bias-corrected) estimates and hence more accurate forecasts and estimates of seismic hazard, as well as more appropriate reference models and improvements in our understanding of seismic phenomena in ways related to the use of the ETAS model.
 It is difficult to study the bias in parameter estimates without the ability to repeatedly draw simulations from the model in question and then estimate its parameters for each realization. While efficient methods for simulating branching models such as ETAS were developed decades ago [e.g., Ogata, 1981], stable, reliable, automatic methods for their estimation have emerged only recently, and the repeated simulation and estimation of the space-time ETAS model studied here are greatly facilitated by the expectation-maximization (EM)-type estimation procedure developed by Veen and Schoenberg [2008]. Indeed, as described by Veen and Schoenberg [2008], conventional maximum likelihood estimation for multiparameter models such as ETAS is not only computationally inefficient but can be quite problematic because of multimodality, dependence on the choice of starting values, and extreme flatness of the likelihood function near the optimum, which can cause what Veen and Schoenberg [2008] refer to as computational multimodality. As a result, a substantial amount of patience and supervision is typically required in fitting a space-time ETAS model to a given data set, and this has been a major obstacle to the repeated simulation and estimation required for the current study. Viewing branching process models as incomplete data problems, Veen and Schoenberg [2008] introduced a relatively novel approach by applying the expectation-maximization (EM) algorithm to attain maximum likelihood estimates (MLEs) [Dempster et al., 1977]. The fact that the EM-type estimation procedure of Veen and Schoenberg [2008] is comparatively automatic and resistant to these problems makes it especially amenable to simulation studies such as this.
 The remainder of this paper is outlined as follows. Section 2 describes the methods used in our simulation study, including a brief description of the space-time ETAS model and its estimation. The results are presented in section 3, and a discussion follows in section 4.
 The ETAS model is an example of a class of branching point process models known as Hawkes or self-exciting point processes [Hawkes, 1971]. For a temporal Hawkes process, the conditional rate of events at time t, given information ℋt on all events prior to time t, can be written

λ(t | ℋt) = μ + Σ{i: ti < t} g(t − ti),
where μ > 0 is the background rate, g(u) ≥ 0 is the triggering function which describes the aftershock activity induced by a prior event, and ∫0∞ g(u) du < 1 in order to ensure stationarity [Hawkes, 1971]. These models were called epidemic by Ogata [1988], since according to such a model, an earthquake can produce aftershocks which in turn produce their own aftershocks, etc. An example is the time-magnitude ETAS model of Ogata [1988], who suggested the magnitude-dependent triggering function

g(ui; mi) = K0 exp{a(mi − M0)} / (ui + c)^p,
where ui = t − ti is the time elapsed since earthquake i, mi is its magnitude, a > 0 governs the dependence of aftershock productivity on magnitude, K0 > 0 is a normalizing constant governing the expected number of direct aftershocks triggered by earthquake i, and M0 is the lower cutoff magnitude for the earthquake catalog. The term K0/(ui + c)^p describing the temporal distribution of aftershocks is known as the modified Omori-Utsu law. While the seismological literature usually requires only p > 0, the interpretation of the modified Omori-Utsu law as a probability density function requires p to be strictly greater than 1.
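To make the temporal formulation concrete, the conditional rate can be evaluated by direct summation over prior events; the following sketch (function and argument names are ours, not from the paper) implements the rate above with the magnitude-dependent modified Omori-Utsu triggering function.

```python
import numpy as np

def etas_intensity_1d(t, times, mags, mu, K0, a, c, p, M0=2.0):
    """Temporal ETAS conditional rate: background rate mu plus the
    modified Omori-Utsu contribution of every event before time t."""
    past = times < t
    u = t - times[past]                     # elapsed times u_i = t - t_i
    return mu + np.sum(K0 * np.exp(a * (mags[past] - M0)) / (u + c) ** p)
```

For instance, with a single magnitude 3.0 event at time 0, the rate at t = 2 is μ + K0 exp{a(3 − M0)}/(2 + c)^p.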
 The ETAS model has since been extended to describe the space-time-magnitude distribution of earthquake occurrences [Ogata, 1998]. A version suggested by Ogata [1998] uses circular aftershock regions where the squared distance between an aftershock and its triggering event follows a Pareto distribution. The model may be written

λ(x, y, t | ℋt) = μ + Σ{i: ti < t} g(t − ti, x − xi, y − yi; mi),   (1)
with triggering function

g(t − ti, x − xi, y − yi; mi) = K0 exp{a(mi − M0)} / [(t − ti + c)^p ((x − xi)² + (y − yi)² + d)^q],   (2)
where (xi, yi) represents the epicenter of earthquake i, and d > 0 and q > 0 are parameters describing the spatial distribution of triggered seismicity.
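For reference, triggering function (2) can be coded directly from this description; the sketch below uses illustrative names of our own, with (dx, dy) denoting the offset of a candidate aftershock location from the epicenter of earthquake i.

```python
import numpy as np

def g2(u, dx, dy, m, K0, a, c, p, q, d, M0=2.0):
    """Triggering function (2): power-law decay in elapsed time u and in
    squared epicentral distance; the aftershock zone does not depend on
    the magnitude m of the triggering event."""
    r2 = dx ** 2 + dy ** 2
    return K0 * np.exp(a * (m - M0)) / ((u + c) ** p * (r2 + d) ** q)
```

A unit increase in the triggering magnitude multiplies the rate by e^a everywhere, without widening the spatial distribution of aftershocks.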
 One characteristic of model (2) is that the aftershock zone does not scale with the magnitude of the triggering event. However, it has been suggested that the spatial distribution of aftershocks is more diffuse for larger triggering events [Molchan et al., 1997; Kagan, 2002]. As a remedy, Ogata [1998] suggested the slightly different model

g(t − ti, x − xi, y − yi; mi) = K0 exp{a(mi − M0)} / [(t − ti + c)^p (((x − xi)² + (y − yi)²)/exp{a(mi − M0)} + d)^q],   (3)
wherein the spatial distribution of aftershocks interacts more dramatically with main shock magnitude. This interaction is sometimes referred to as magnitude scaling, since a typical feature of the model and of earthquake catalogs is a gradual widening of the spatial-temporal aftershock distribution as the magnitude of the main shock increases. Note that while simulations of the ETAS model involve the generation of main shocks and aftershocks, in fitting ETAS models to actual earthquake catalogs, one typically does not differentiate between the two types of events.
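A sketch of this magnitude-scaled variant, under the common assumption (which may differ in detail from the paper's equation (3)) that the squared distance is divided by exp{a(m − M0)}:

```python
import numpy as np

def g3(u, dx, dy, m, K0, a, c, p, q, d, M0=2.0):
    """Magnitude-scaled triggering: the squared distance is divided by
    s = exp(a*(m - M0)), so the aftershock zone widens with the magnitude
    of the triggering event (an assumed form, for illustration only)."""
    s = np.exp(a * (m - M0))
    return K0 * s / ((u + c) ** p * ((dx ** 2 + dy ** 2) / s + d) ** q)
```

Relative to its value at the epicenter, the triggering rate at a fixed distance falls off less for a larger triggering event, which is exactly the gradual widening described above.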
 Point process models such as (1)–(2) or (1)–(3) are conventionally fit by maximizing the log likelihood function

log L(θ) = Σ{i: 0 ≤ ti ≤ T} log λ(xi, yi, ti | ℋti) − ∫0T ∫x0x1 ∫y0y1 λ(x, y, t | ℋt) dy dx dt,   (4)
where θ = (a, c, d, K0, p, q, μ) is the parameter vector to be estimated and [x0, x1] × [y0, y1] × [0, T] is the space-time window in which the data set (xi, yi, ti, mi) is observed [Daley and Vere-Jones, 2003]. Following convention, we define an error as the difference between the maximum likelihood estimate of a parameter and the true (simulation) parameter value for a particular simulation, and the mean or expected value of these errors is the bias in the estimate of the parameter.
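The log likelihood (4) can be sketched as follows for model (1)–(2), approximating the integral of each triggering contribution by its closed form over all of space and time (i.e., ignoring edge effects) and treating μ as a rate per unit area per unit time; both simplifications, and all names, are ours rather than the paper's.

```python
import numpy as np

def etas_loglik(t, x, y, m, theta, area, T, M0=2.0):
    """Approximate log likelihood (4) for model (1)-(2)."""
    a, c, d, K0, p, q, mu = theta
    ll = 0.0
    for j in range(len(t)):
        past = t < t[j]
        u = t[j] - t[past]
        r2 = (x[j] - x[past]) ** 2 + (y[j] - y[past]) ** 2
        lam = mu + np.sum(K0 * np.exp(a * (m[past] - M0))
                          / ((u + c) ** p * (r2 + d) ** q))
        ll += np.log(lam)
    # Integral of lambda over the window, using the closed forms
    #   int_0^inf (u+c)^-p du = c^(1-p)/(p-1)        for p > 1
    #   int_{R^2} (r^2+d)^-q dA = pi*d^(1-q)/(q-1)   for q > 1
    trig = (np.sum(K0 * np.exp(a * (m - M0)))
            * c ** (1 - p) / (p - 1) * np.pi * d ** (1 - q) / (q - 1))
    return ll - mu * area * T - trig
```

The closed forms are exactly the restrictions p > 1 and q > 1 noted above, which make the temporal and spatial kernels integrable.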
 Maximum likelihood estimates (MLEs), which are values of the parameters optimizing equation (4), can be searched for by conventional gradient-based methods, though Veen and Schoenberg [2008] showed that improved estimates can be obtained by an iterative procedure that incorporates all possible branching structures of the events with their corresponding probabilities. The resulting estimation procedure is substantially more robust than gradient-based methods and produces very similar results, typically with slightly lower bias than gradient-based MLEs [Veen and Schoenberg, 2008]. In this study, we investigate the bias in MLEs obtained using the method of Veen and Schoenberg [2008], for the space-time ETAS models (1)–(2) or (1)–(3).
 The EM algorithm, originally described by Dempster et al. [1977], is a statistical algorithm for constructing maximum likelihood estimates of parameters in the presence of unobserved random variables which may influence the likelihood. The procedure is iterative, with two steps: an expectation (E) step and a maximization (M) step. After initial starting values are assigned to the parameters to be estimated, the E step involves computing the conditional expectation of the complete-data log likelihood, given the observed data and the current parameter values, with the expectation taken over the unobserved variables. In the M step, the parameters are found which maximize the expected log likelihood computed in the E step. These two steps are iterated until a certain tolerance is reached.
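As a toy illustration of these two steps (deliberately unrelated to ETAS), the following sketch uses EM to estimate the mixing weight of a two-component normal mixture whose component means and variances are known; the posterior membership probabilities play the role of the unobserved variables.

```python
import numpy as np

rng = np.random.default_rng(42)
# synthetic data: 70% from N(0, 1), 30% from N(5, 1)
x = np.concatenate([rng.normal(0.0, 1.0, 700), rng.normal(5.0, 1.0, 300)])

def em_mixture_weight(x, n_iter=100):
    w = 0.5                                   # starting value for P(component 2)
    for _ in range(n_iter):
        # E step: posterior probability that each point came from component 2
        p1 = np.exp(-0.5 * x ** 2)
        p2 = np.exp(-0.5 * (x - 5.0) ** 2)
        r = w * p2 / (w * p2 + (1.0 - w) * p1)
        # M step: the weight maximizing the expected complete-data likelihood
        w = r.mean()
    return w

w_hat = em_mixture_weight(x)
```

With the two components well separated, the iteration converges rapidly to a weight near the true value of 0.3.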
 In the EM-type estimation procedure of Veen and Schoenberg [2008], the branching structure of the observed earthquakes comprises the unobserved random variables. That is, the assignment of which earthquakes triggered which aftershocks, and which of those aftershocks triggered which other aftershocks, etc., is performed according to the ETAS model in each E step. This assignment changes in each iteration as the parameters in the ETAS model change following each M step. The idea is somewhat analogous to the stochastic reconstruction idea of Zhuang et al. [2002, 2004], in which the ETAS model is similarly used to assign a random branching structure to an observed earthquake catalog. It should be noted that the ETAS model formulation (and hence the EM algorithm as well) allows assignments in which a triggered event has larger magnitude than its parent event, and that the branching structure estimated in each iteration of the EM algorithm will typically be quite erroneous; the estimation procedure is nevertheless quite effective at producing stable parameter estimates, as shown by Veen and Schoenberg [2008].
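The E step just described amounts to computing, for each event, a probability vector over its possible parents; a sketch (our names, written for model (1)–(2)) follows.

```python
import numpy as np

def branching_probabilities(t, x, y, m, theta, M0=2.0):
    """Returns an (n, n+1) matrix P: P[j, i] is the probability that event j
    was triggered by earlier event i under the current parameters, and
    P[j, n] is the probability that event j is a background event."""
    a, c, d, K0, p, q, mu = theta
    n = len(t)
    P = np.zeros((n, n + 1))
    for j in range(n):
        g = np.zeros(n)
        past = t < t[j]
        u = t[j] - t[past]
        r2 = (x[j] - x[past]) ** 2 + (y[j] - y[past]) ** 2
        g[past] = K0 * np.exp(a * (m[past] - M0)) / ((u + c) ** p * (r2 + d) ** q)
        lam = mu + g.sum()
        P[j, :n] = g / lam                  # triggered-by-event-i probabilities
        P[j, n] = mu / lam                  # background probability
    return P
```

Each row sums to 1, and the M step then reweights each event's contribution to the parameter updates by these probabilities.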
 For the purpose of this simulation study, we consider a region and data set identical to those analyzed by Veen and Schoenberg [2008], namely, a rectangular area around Los Angeles between longitudes −122° and −114° and latitudes 32° and 37° (733 km × 556 km) between 1 January 1984 and 17 June 2004. Following the convention used by Ogata [1998] and Veen and Schoenberg [2008], we consider the epicentral locations of earthquakes within this region and use Euclidean distances, approximating this small portion of the Earth's surface as a subset of the plane; 1500 simulations of the ETAS process (1)–(2) were generated. For each simulation, various values of the lower magnitude cutoff (M0) were selected, and for each choice of cutoff, the parameters in each simulated data set were estimated using the EM-type MLEs described by Veen and Schoenberg [2008]. In order to balance computational burden against the precision of our results, and to cover a range of magnitude cutoffs which might be realistic for future local catalogs, we selected 25 magnitude cutoffs starting from M0 = 2.0, with increments of 0.06 magnitude units. As the true values of the parameters in the process being simulated, we selected the parameters given in Table 1 of Veen and Schoenberg [2008], specifically (a, c, d, K0, p, q, μ) = (2.3026, 0.01, 0.015, 0.0000305, 1.5, 1.8, 0.0008), where each simulation was performed for 7500 days, or approximately 20.5 years, and was limited to earthquakes of magnitude between 2.0 and 8.0. These coefficients were based in part on discussions with UCLA seismologists Y. Y. Kagan and I. Zaliapin and in part on analysis of seismological data compiled by the Southern California Earthquake Center (SCEC) on this same domain.
This data set, which for the purpose of this study was only used to motivate the ETAS parameter values chosen for the simulations, includes the origin times, magnitudes, and epicentral locations of 6796 earthquakes occurring in the given region, based on measurements taken by a network of almost 400 seismographic stations throughout southern California, and is considered relatively complete above M0 = 3 [Kagan, 2002]. The catalog is maintained by the Southern California Seismic Network (SCSN), a cooperative project of the California Institute of Technology and the U.S. Geological Survey, and is publicly available at http://www.data.scec.org.
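The cascade simulation underlying such synthetic catalogs can be sketched as follows for a purely temporal-magnitude version of the model (spatial coordinates, and any details of the paper's actual simulation code, are omitted; b = 1 in the truncated Gutenberg-Richter law corresponds to a = 2.3026 = b ln 10).

```python
import numpy as np

rng = np.random.default_rng(1)

def sample_gr(n, b=1.0, m0=2.0, m_max=8.0):
    # magnitudes from a truncated Gutenberg-Richter distribution
    u = rng.random(n)
    return m0 - np.log10(1.0 - u * (1.0 - 10.0 ** (-b * (m_max - m0)))) / b

def sample_omori(n, c, p):
    # inverse-CDF draws from the density (p-1) c^(p-1) / (u+c)^p, p > 1
    u = rng.random(n)
    return c * ((1.0 - u) ** (-1.0 / (p - 1.0)) - 1.0)

def simulate_etas_times(mu, K0, a, c, p, T, m0=2.0):
    """Branching (cascade) simulation of a temporal-magnitude ETAS catalog:
    lay down Poisson background events, then let every event spawn its own
    Poisson number of direct aftershocks, generation by generation."""
    n_bg = rng.poisson(mu * T)
    times = list(rng.uniform(0.0, T, n_bg))
    mags = list(sample_gr(n_bg, m0=m0))
    stack = list(zip(times, mags))
    while stack:
        t_par, m_par = stack.pop()
        # expected direct aftershocks: K0 e^{a(m-m0)} c^{1-p}/(p-1)
        n_exp = K0 * np.exp(a * (m_par - m0)) * c ** (1.0 - p) / (p - 1.0)
        for dt in sample_omori(rng.poisson(n_exp), c, p):
            if t_par + dt < T:
                m_new = float(sample_gr(1, m0=m0)[0])
                times.append(t_par + dt)
                mags.append(m_new)
                stack.append((t_par + dt, m_new))
    order = np.argsort(times)
    return np.asarray(times)[order], np.asarray(mags)[order]
```

Raising the lower magnitude cutoff applied to such a simulated catalog then mimics the truncation studied here.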
 For the ETAS model described above, the number of events in a simulated catalog can vary widely. For the model with λ defined in equation (1) and g defined in equation (2) (hereafter referred to as model (1)–(2)), the branching ratio is 0.95, and although this branching ratio is less than unity, a small percentage (approximately 3%) of our simulations were explosive, producing tens of thousands of events, and took a great deal of time to converge. Simulations may also contain so few events above magnitude 3.44 (the highest of the lower magnitude cutoffs we chose to consider) that the seven parameters (a, c, d, K0, p, q, μ) cannot be reliably estimated at all for this cutoff value. Since one typically would not estimate an ETAS model for a catalog without at least a few events above the lower magnitude threshold, and so that the simulated catalogs would approximate the range of most realistic earthquake catalogs worldwide to which such a model might be fitted [see, e.g., Kagan, 2003], we limit our analysis in what follows to simulations of model (1)–(2) containing between 500 and 3000 events of magnitude greater than 2.0; 1500 such simulations were produced. Note that this restriction on the size of simulated catalogs is preferable to restricting attention only to those simulations which produced at least some minimal number of large events, which might bias the results toward simulations with an abnormally high ratio of large events to small events. For each such simulation, and for each lower magnitude cutoff selected, the bias in each parameter was estimated via the mean difference between the parameter estimates and the true (simulation) values of the parameters.
 For the model in equations (1)–(3) (hereafter referred to as model (1–3)), simulation and estimation were performed in a similar manner. The parameters (a, c, d, K0, p, q, μ) = (1.25, 0.02, 0.0003, 0.0001, 1.2, 1.5, 0.0008), used in generating the simulations, were selected as a typical set of parameters which would generally result in realistic, nonexplosive catalogs; for this model the branching ratio is 0.86. Approximating the integral of g in (3), summed over all simulated events, which is needed for parameter estimation, requires substantial computational time. Because of this increased computational burden, a smaller number of simulations (500) was produced for model (1–3), and each was estimated at lower magnitude cutoffs between 2.0 and 3.02, in increments of 0.06. A time span of 30 years, or 10,950 days, was used as the time window for model (1–3), and the space window was the same as that used in the simulation of model (1–2). The purpose of using a larger time window for model (1–3) is to obtain enough simulated events per catalog so that the parameters may be stably estimated even for the largest magnitude cutoffs. In the 500 simulations, the number of events per catalog ranged from 978 to 2752 events above magnitude 2.0, and estimation of the 7 parameters in model (1–3) using the EM procedure converged readily for all of the simulations and magnitude cutoffs selected.
Figures 1 and 2 show the times and magnitudes of events in typical realizations of simulated catalogs using models (1–2) and (1–3), respectively. One sees in Figure 1 how sharply the number of events in the catalog above a given magnitude cutoff decreases when the lower magnitude cutoff is increased from 2.0 to 3.0, in accord with the Gutenberg-Richter law; in this example, for model (1–2), the catalog size decreases from 2356 events to just 241. Figure 2 shows similar results for model (1–3); here the number of events decreases from 1916 to 191 as M0 increases from 2.0 to 3.0.
Figures 3 and 4 show the biases in the estimates of the parameters governing the spatial and temporal components of the triggering function (parameters c, d, p and q), as a function of the lower magnitude threshold, M0, for models (1–2) and (1–3), respectively. (For the sake of brevity, we focus on the space-time clustering parameters here; plots corresponding to the other parameters are shown in Figures S1–S16 in the auxiliary material.) For model (1–2), the biases in c, d, p, and q are well approximated by exponential functions of M0, as indicated by the curves in Figure 3, which show exponential curves fitted by nonlinear least squares. One sees in particular that the bias in p and q appears roughly to stabilize for lower magnitude cutoffs of M0 = 2.8 and below, whereas for c and d, the bias decreases monotonically as M0 decreases, though the bias is already minimal and decreases rather slowly for M0 ≤ 2.4. It is curious that for model (1–2), in the case of p and q, the bias appears to decrease slightly as M0 increases from 2.0 to 2.7. These decreases are not substantial, however, and are only borderline statistically significant; as described further below, the standard errors are approximately 0.002 and 0.005 for these bias estimates of p and q, respectively.
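Such exponential curves can be reproduced with a simple log-linear least squares stand-in for full nonlinear least squares (adequate when the biases for a given parameter share a sign); the values below are synthetic, for illustration only.

```python
import numpy as np

def fit_exponential(m0, bias):
    # fit |bias| ~ A * exp(B * m0) by least squares on the log scale
    B, log_a = np.polyfit(m0, np.log(np.abs(bias)), 1)
    return np.sign(bias[0]) * np.exp(log_a), B

m0 = np.linspace(2.0, 3.44, 25)         # the 25 cutoffs used for model (1)-(2)
bias = 0.002 * np.exp(1.7 * m0)         # synthetic bias curve, for illustration
A, B = fit_exponential(m0, bias)
```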
 For model (1–3), exponential functions again fit well to the bias in c, d, p, and q, as shown in Figure 4. Note that the relationships between the biases and magnitude cutoffs for a given parameter are strikingly different from model (1–2) to model (1–3), showing the dramatic effect induced by a seemingly slight change in the model's parametric form. The bias in c, d, and q appears to increase steadily as the lower magnitude cutoff increases, while the bias in p actually decreases as M0 increases from 2.0 to 2.6, and appears to remain constant for M0 between 2.6 and 3.0. For parameters c, d, and q, the sizes of the biases generally tend to increase as the magnitude cutoff increases, for both models (1–2) and (1–3), which is expected since for catalogs with fewer events one would generally expect larger errors in the parameter estimates. However, as the magnitude cutoff increases, the bias in p appears to decrease for model (1–3), which is curious. This may be the result of a small number of simulations with unusual aftershock activity and an unusually high incidence of clustering, especially of events of small magnitudes. The size of the bias in p for model (1–3) is very large for all magnitude cutoffs considered: the bias in p ranges between 0.085 and 0.105, which is very substantial, since a small perturbation in p, the exponent governing the power law decay of aftershock activity in time since the main shock, can result in a dramatic change in aftershock activity. The results in Figure 4 suggest that for model (1–3), estimation of p is in general rather unreliable using local catalogs of the sizes considered here.
Figures 5 and 6 are histograms of the errors in the estimates of the ETAS parameters c, d, p and q, for M0 = 2.24, for models (1–2) and (1–3), respectively. (Results for other magnitude cutoffs are very similar in shape and are shown in the auxiliary material.) One sees that in each case, the parameter estimates are approximately normally distributed; some skew is observed in the estimates of parameter p for model (1–3), which is likely the result of a few simulations with especially strong clustering, but each of the other parameter estimates appears to closely follow the normal curve. Hence one may simply take ±1.96 standard errors to obtain approximate 95% confidence bounds for these estimates of bias. Note that the errors in estimates of p are quite large for both models (1–2) and especially (1–3), but that the errors in the estimation of q are typically substantially smaller for the parameterization in model (1–3), which has magnitude scaling, than for model (1–2). The parameter q must be interpreted slightly differently for the two models, however, because of this scaling. In model (1–2), q is simply the exponent of the power law spatial decay in aftershock activity around a given main shock, whereas in model (1–3), q is the exponent of the power law decay in distance of an aftershock from its corresponding main shock, after scaling according to main shock magnitude.
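For a mean of approximately normal errors, this bound is the usual normal-theory interval; a small helper (ours, not from the paper) makes it explicit.

```python
import numpy as np

def bias_with_ci(errors):
    """Estimated bias (mean error) with an approximate 95% confidence
    interval: mean +/- 1.96 standard errors of the mean."""
    e = np.asarray(errors, dtype=float)
    b = e.mean()
    se = e.std(ddof=1) / np.sqrt(len(e))
    return b, (b - 1.96 * se, b + 1.96 * se)
```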
 In order to verify that the estimates of the bias are sufficiently precise and have stabilized after n = 1500 simulations, Figure 7 shows the estimated bias in each parameter estimate as a function of n, along with 95% confidence bounds, for the case M0 = 3.20 and model (1–2); the results for model (1–3) and for other lower magnitude cutoffs appear to be similar, as depicted in Figure 8. It is evident that for each of the parameter estimates, the bias is statistically significant and that the estimate of bias has essentially stabilized after several hundred simulations.
 It must be borne in mind that although the estimates from distinct simulations are by construction independent, the estimates of different parameters may be highly correlated across simulations and magnitude cutoffs. For instance, Figure 9 shows, for model (1–2), scatterplots of the errors in some of the other parameter estimates (d, p and q) versus the errors in the estimates of parameter c; here each point corresponds to one simulation and one magnitude cutoff. Of all the pairs of parameters, only the pairs (c, p), (d, q), and (a, K0) exhibited correlations higher than 0.35 in absolute value. The correlations between all pairs of parameter estimates for models (1–2) and (1–3) are shown in Tables 1 and 2, along with corresponding p values based on the t statistic. It is not surprising that the pairs (c, p), (d, q), and (a, K0) have the highest correlations, as the first pair governs the temporal distribution of aftershocks, the second pair governs the spatial distribution of aftershocks, and the third governs the branching ratio of the process.
Table 1. Correlations Between Biases in Parameter Estimates, Across All Simulations and All Magnitude Cutoffs Between 2.00 and 3.44, Using the Model With Triggering Function (2)a

        a                      c                      d                       K0                     p                      q
c     0.146 (<2 × 10−16)
d     0.128 (<2 × 10−16)    0.126 (<2 × 10−16)
K0   −0.500 (<2 × 10−16)    0.148 (<2 × 10−16)   −0.0250 (1.35 × 10−06)
p     0.0745 (<2 × 10−16)   0.702 (<2 × 10−16)    0.0377 (2.89 × 10−13)  −0.121 (<2 × 10−16)
q     0.0440 (<2 × 10−16)   0.0499 (<2 × 10−16)   0.811 (<2 × 10−16)    −0.325 (<2 × 10−16)  −0.0474 (<2 × 10−16)
μ    −0.284 (<2 × 10−16)   −0.210 (<2 × 10−16)   −0.340 (<2 × 10−16)    −0.0593 (<2 × 10−16) −0.0555 (<2 × 10−16)    –

aWith p values reported in parentheses.
Table 2. Correlations Between Biases in Parameter Estimates, Across All Simulations and All Magnitude Cutoffs Between 2.00 and 3.02, Using the Model With Triggering Function (3)a

        a                      c                      d                       K0                     p                      q
c     0.197 (<2 × 10−16)
d    −0.483 (<2 × 10−16)   −0.432 (<2 × 10−16)
K0   −0.703 (<2 × 10−16)    0.215 (<2 × 10−16)    0.0609 (1.91 × 10−08)
p     0.150 (<2 × 10−16)    0.342 (<2 × 10−16)    0.0907 (2.89 × 10−13)    –
q    −0.227 (<2 × 10−16)   −0.414 (<2 × 10−16)    0.815 (<2 × 10−16)    −0.332 (<2 × 10−16)   0.0522 (1.45 × 10−06)
μ    −0.243 (<2 × 10−16)   −0.568 (<2 × 10−16)    0.749 (<2 × 10−16)    −0.0684 (2.67 × 10−10) 0.222 (<2 × 10−16)    0.599 (<2 × 10−16)

aWith p values reported in parentheses.
 Similarly, Figure 10 shows scatterplots of the errors in the other parameter estimates versus the errors in the estimates of parameter d for model (1–3), and Table 2 presents the correlations between all pairs of parameter estimates along with corresponding p values based on the t statistic. As one would expect, d has a high correlation with q. Some of the other pairs, for example (a, d), (a, K0), (c, d) and (c, q), also have large correlations (−0.483, −0.678, −0.409 and 0.400, respectively). It is also observed that μ has high correlations with c, d, and q (−0.548, 0.760 and 0.605, respectively). It is not surprising that pairs of estimates of parameters such as (a, K0), (c, p), or (d, q) have high correlations, since the parameters in these pairs govern similar physical quantities: the branching ratio, the temporal distribution of aftershocks, and the spatial distribution of aftershocks, respectively. However, it is curious that estimates of pairs of parameters such as (c, d) are so highly correlated, since these parameters appear to govern such different properties of the earthquake catalog. This phenomenon merits further investigation, since its cause is neither obvious nor intuitively suggested by the formulation of triggering function (3).
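The tabulated p values can be reproduced from the correlations themselves via the t statistic t = r√(n − 2)/√(1 − r²); in the sketch below (our code) the t distribution is approximated by a standard normal, which is adequate for the thousands of error pairs involved here.

```python
import math
import numpy as np

def corr_with_p(u, v):
    """Pearson correlation between two error series, with a two-sided
    large-sample p value based on the t statistic."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    n = len(u)
    r = np.corrcoef(u, v)[0, 1]
    t_stat = r * math.sqrt(n - 2) / math.sqrt(1.0 - r * r)
    p_val = math.erfc(abs(t_stat) / math.sqrt(2.0))   # normal approximation
    return r, p_val
```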
 One might object that the trends observed in Figures 3 and 4 are due in part to the exclusion of the most explosive simulations with very large numbers of events. The branching ratios for models (1–2) and (1–3) are 0.95 and 0.86, respectively, and occasionally simulations of these ETAS models can be so explosive, i.e., can contain so many events, that the simulations fail to complete or encounter memory problems, and the replacement of these simulations by nonexplosive simulations might in principle have a large influence on the estimates of bias shown in Figures 3 and 4. However, this is unlikely to explain the observed trends, given that the largest catalogs do not necessarily correspond to unusually large (or unusually small) errors in the parameter estimates. As seen in Figures 11 and 12 for models (1–2) and (1–3), respectively, there does not appear to be a clear pattern in the relationship between catalog size and the errors in ETAS parameter estimates. For instance, using simulations of model (1–2) and a magnitude cutoff of 3.08, the correlations between the numbers of events in the simulated catalogs and the errors in the estimated parameters are 0.0142, −0.115, −0.0915, −0.0282, −0.0893, −0.109, and −0.0593 for the parameters a, c, d, K0, p, q, and μ, respectively. The results are somewhat similar for model (1–3) and for other magnitude cutoffs: for instance, the corresponding correlations are 0.140, −0.0847, −0.0135, −0.0699, −0.0297, −0.0470, and −0.0596 for the parameters a, c, d, K0, p, q, and μ, respectively, using a magnitude cutoff of 2.54. These empirical findings are consistent with the asymptotic theory for maximum likelihood estimates, which asserts that under suitable regularity conditions, the biases in parameter estimates generally decrease as the sample size increases [Ogata, 1978].
 The results here provide some insights into the estimation of ETAS models as well as into the collection of seismological data. In the case of model (1–2), our results indicate that for the purposes of estimating the parameters p and q, the bias in ETAS parameter estimates will not decrease substantially if the catalog is extended to include earthquakes of magnitude less than 2.8. Similarly, for the parameters c and d, improving the completeness of the catalog by including earthquakes of magnitude less than 2.2 does relatively little to reduce the bias in ETAS parameter estimates. On the other hand, decreasing the lower magnitude cutoff from 3.5 to 3.0 results in very substantial decreases in the bias in the estimation of all of the ETAS parameters.
 For model (1–3), by contrast, the biases in the parameters c, d, and q appear to decrease monotonically as the lower magnitude cutoff decreases, clear evidence of a strong relationship between bias and magnitude cutoff. Since (3) incorporates magnitude scaling in the spatial distribution of aftershocks, this model may be considered more realistic than (2). Unfortunately, however, the parameter p, which governs the exponent of the power law temporal decay in aftershock productivity over time since the main shock, appears to be estimated with considerable bias for all magnitude cutoffs considered. Indeed, errors in the estimation of p are especially large for a small subsample of simulations featuring, purely by chance, especially strong clustering, and in such simulations the intensity of the clustering and the corresponding error in the estimation of p were actually substantially larger when a smaller cutoff M0 was used. The results suggest that great care must be taken in interpreting estimates of p for model (1–3) when using typical local catalogs of only several thousand events, and that the bias in this parameter should not be expected to diminish in the future as seismometers increase in precision and magnitude cutoffs correspondingly decrease. Further, the fact that alternative parameterizations of the triggering in ETAS models result in substantially different estimates, errors, and biases for parameters such as p and q, which have physical interpretations as the exponents governing the power law decay of seismicity in time and space, respectively, is especially troublesome and suggests that much further research is needed into alternative parameterizations and into magnitude scaling in ETAS models, as well as into comparisons of the goodness of fit of various formulations of the ETAS model and other branching models.
 Note that while maximum likelihood estimates are well known to have desirable asymptotic properties such as unbiasedness, normality, consistency and efficiency [Ogata, 1978], for relatively small samples such parameter estimates can be biased, and this bias can furthermore be exacerbated by missing data. In practice, with models such as ETAS where the size, time, and aftershock distance distributions are modeled using heavy-tailed distributions, even catalogs of thousands of events may be considered too small for such asymptotic results to be considered applicable [Zaliapin et al., 2005]. In cases where a lower magnitude truncation is used, as is typical in seismology when fitting point process models such as ETAS, the bias in the resulting parameter estimates can be substantial. In cases where ETAS parameter estimates are investigated and used for purposes of seismic hazard forecasting or for seismological understanding, improved estimates may be available by subtracting the estimates of bias obtained by simulations as in this study, resulting in more accurate estimates of ETAS parameters compared with ordinary maximum likelihood estimates. It should be noted, however, that the estimates of bias presented here will of course depend on the parameters of the underlying ETAS model used in the simulation, as well as the minimum magnitude used in generating the simulations (here, a value of magnitude 2.0 was used). Hence, as the ETAS parameters used here were applicable to the southern California earthquake data described in section 2, one would not expect the estimates of bias shown in Figure 3 or 4 necessarily to be applicable to other data sets or for ETAS models applied to other regions. However, the methods described here may be used to estimate the bias accordingly in such cases. 
Furthermore, we anticipate that studies employing similar methodologies, wherein models are simulated and then estimated repeatedly, may be increasingly used to investigate the bias in parameter estimates for other geophysical models, particularly those where missing data may be a substantial problem.
 We thank the Editor and reviewers for their helpful suggestions. Yan Kagan, David Jackson, Qi Wang, and Peter Bird also provided helpful advice. This research was supported by the Southern California Earthquake Center. SCEC is funded by NSF Cooperative Agreement EAR-0529922 and USGS Cooperative Agreement 07HQAG0008. This is SCEC contribution 1281.