## 1. Introduction

[2] The estimation of models in the presence of noisy or missing data is a topic of significant concern in numerous scientific applications. Indeed, much of our knowledge about structures and processes beneath the Earth's surface is tempered by incomplete data and observations with low signal-to-noise ratios [*Bolt*, 1996]. In the case of seismology, one of the most significant obstacles to the current understanding and characterization of earthquake catalogs is the fact that earthquakes of small magnitude often go undetected by seismic monitoring stations; as a result, the analysis of modern earthquake catalogs is typically limited to earthquakes above a certain lower magnitude detection threshold [*Kagan*, 2002, 2003]. That is, prior to modeling and analysis, earthquakes with magnitudes below a certain cutoff are excluded, so that the remaining events are thought to represent a relatively complete list of earthquakes above the cutoff magnitude.
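As a concrete illustration of this thresholding step (a minimal sketch under hypothetical parameter values, not a procedure taken from this paper), magnitudes in a synthetic catalog can be drawn from a Gutenberg-Richter, i.e., exponential, distribution, and events below a chosen cutoff magnitude are then discarded before analysis:

```python
import random

def gutenberg_richter_magnitudes(n, b=1.0, m_min=0.0, seed=0):
    """Draw n magnitudes from a Gutenberg-Richter (exponential)
    distribution with b-value b, truncated below at m_min."""
    rng = random.Random(seed)
    beta = b * 2.302585092994046  # rate parameter: b * ln(10)
    return [m_min + rng.expovariate(beta) for _ in range(n)]

def apply_cutoff(magnitudes, m_c):
    """Keep only events at or above the lower magnitude cutoff m_c."""
    return [m for m in magnitudes if m >= m_c]

# Hypothetical example: a 10,000-event synthetic catalog, then the
# "complete" subset above a cutoff of magnitude 2.0.
catalog = gutenberg_richter_magnitudes(10000, b=1.0)
complete = apply_cutoff(catalog, m_c=2.0)
```

With a b-value of 1, each unit increase in the cutoff removes roughly 90% of the remaining events, which is why the choice of cutoff so strongly controls how much data survives for estimation.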

[3] As the density of monitoring stations increases and technological developments facilitate the recording of increasingly accurate earthquake catalogs down to progressively lower magnitude thresholds, a question of increasing importance is how much improvement such advances yield in the accuracy of parameter estimates for earthquake occurrence models and, consequently, in earthquake forecasting. The goal of this paper is to address this question via a simulation study of realistic earthquake models, analyzing each simulated catalog after applying a variety of lower magnitude cutoffs and investigating the bias in parameter estimates introduced by the removal of earthquakes below each cutoff. While a variety of models for earthquake occurrence have been proposed, we focus here specifically on the spatial-temporal epidemic-type aftershock sequence (ETAS) models introduced by *Ogata* [1998], as these branching models are commonly used to describe modern earthquake catalogs. Such models are widely used at present for a variety of purposes, including time-dependent forecasting and seismic hazard estimation [*Schorlemmer et al.*, 2007], earthquake declustering [*Zhuang et al.*, 2002], detection of anomalous seismic behavior [*Ogata*, 2007], the study of differences between tectonic zones [*Kagan et al.*, 2010], testing hypotheses about static and dynamic earthquake triggering [*Hainzl and Ogata*, 2005], and serving as reference models for comparison with alternatives [*Ogata and Zhuang*, 2006].

[4] The relationship between the lower magnitude threshold and the branching ratio in the ETAS model was investigated from a theoretical perspective by *Sornette and Werner* [2005a, 2005b]. Here, the focus is on the statistical properties of maximum likelihood estimates of the parameters in these point process models. In particular, we show how the bias in these parameter estimates changes as the lower magnitude cutoff increases and describe the observed relationship between the cutoff and the bias for each parameter. An improved understanding of the biases in ETAS parameter estimates may lead to improved (bias-corrected) estimates and hence more accurate forecasts and estimates of seismic hazard, as well as more appropriate reference models and a better understanding of seismic phenomena in applications that rely on the ETAS model.

[5] It is difficult to study the bias in parameter estimates without the ability to repeatedly draw simulations from the model in question and then estimate its parameters for each realization. While efficient methods for simulating branching models such as ETAS were developed decades ago [e.g., *Ogata*, 1981], stable, reliable, automatic methods for their estimation have emerged only recently, and the repeated simulation and estimation of the space-time ETAS model studied here are greatly facilitated by the expectation-maximization (EM)-type estimation procedure developed by *Veen and Schoenberg* [2008]. Indeed, as described by *Veen and Schoenberg* [2008], conventional maximum likelihood estimation for multiparameter models such as ETAS is not only computationally inefficient but can be quite problematic because of issues such as multimodality, dependence on the choice of starting values, and extreme flatness of the likelihood function near the optimum, which can cause what *Veen and Schoenberg* [2008] refer to as computational multimodality. As a result, a substantial amount of patience and supervision is typically required to fit a space-time ETAS model to a given data set, and this has been a major obstacle to the repeated simulation and estimation required for the current study. Viewing branching process models as incomplete data problems, *Veen and Schoenberg* [2008] introduced a relatively novel approach by applying the expectation-maximization (EM) algorithm [*Dempster et al.*, 1977] to attain maximum likelihood estimates (MLEs). The fact that the EM-type estimation procedure of *Veen and Schoenberg* [2008] is comparatively automatic and resistant to these problems makes it especially amenable to simulation studies such as this one.
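To make the simulation side of this repeated simulate-then-estimate workflow concrete, the following is a heavily simplified, purely temporal branching sketch. All parameter values here are hypothetical, and the offspring waiting times are exponential rather than following the Omori law; the study itself simulates the full space-time ETAS model of *Ogata* [1998].

```python
import math
import random

def poisson(rng, lam):
    """Knuth's Poisson sampler (adequate for small means)."""
    if lam <= 0.0:
        return 0
    L = math.exp(-lam)
    k, p = 0, 1.0
    while p > L:
        k += 1
        p *= rng.random()
    return k - 1

def simulate_branching(mu=1.0, T=100.0, K=0.2, alpha=1.0, c=0.1,
                       b=1.0, m0=0.0, seed=1):
    """Simulate a subcritical temporal branching (ETAS-style) catalog.

    Background events arrive as a Poisson process with rate mu on [0, T];
    an event of magnitude m spawns Poisson-many direct aftershocks with
    expected count K * exp(alpha * (m - m0)).  Offspring waiting times
    are exponential with mean c -- a simplification of the Omori-law
    decay used in the actual ETAS model.  Returns (time, magnitude)
    pairs sorted by time."""
    rng = random.Random(seed)
    beta = b * math.log(10.0)  # Gutenberg-Richter rate parameter

    def draw_magnitude():
        return m0 + rng.expovariate(beta)

    # Generation 0: background seismicity.
    events, t = [], 0.0
    while True:
        t += rng.expovariate(mu)
        if t > T:
            break
        events.append((t, draw_magnitude()))

    # Cascade: each generation triggers the next until extinction.
    parents = list(events)
    while parents:
        children = []
        for t_par, m_par in parents:
            for _ in range(poisson(rng, K * math.exp(alpha * (m_par - m0)))):
                t_child = t_par + rng.expovariate(1.0 / c)
                if t_child <= T:
                    children.append((t_child, draw_magnitude()))
        events.extend(children)
        parents = children
    return sorted(events)
```

A bias study of the kind described above would draw many such catalogs, apply a range of lower magnitude cutoffs to each, refit the model parameters on the truncated catalogs, and compare the estimates to the values used in simulation; the estimation step itself (the EM-type procedure of *Veen and Schoenberg* [2008]) is not sketched here.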

[6] The remainder of this paper is organized as follows. Section 2 describes the methods used in our simulation study, including a brief description of the space-time ETAS model and its estimation. The results are presented in section 3, and a discussion follows in section 4.