## 1. Introduction

[2] There is a well-established systematic relation between an earthquake's magnitude, *M*, and its frequency of occurrence, *N*, in any given region and time period [e.g., *Gutenberg and Richter*, 1954]. The Gutenberg–Richter (GR) relation can be expressed as

$$\log_{10} N = a - bM$$

where *a* and *b* are constants, with *b* typically close to 1. Given the logarithmic relation between an earthquake's magnitude and its energy or moment, the GR relation reflects a power-law relation between frequency and size.
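
The step from a log-linear relation in magnitude to a power law in size can be made explicit. The sketch below assumes the usual form of the GR relation and a generic moment-magnitude scaling of the form $M = \tfrac{2}{3}\log_{10} M_0 + c$, where the seismic moment $M_0$ and the constant $c$ are symbols introduced here for illustration and are not defined in the text above:

$$
\begin{aligned}
\log_{10} N &= a - bM \\
            &= a - b\left(\tfrac{2}{3}\log_{10} M_0 + c\right) \\
            &= (a - bc) - \tfrac{2b}{3}\log_{10} M_0 ,
\end{aligned}
\qquad\Longrightarrow\qquad
N \propto M_0^{-2b/3}.
$$

Under this assumption, *b* close to 1 corresponds to a power-law exponent of roughly 2/3 in moment.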

[3] If the GR relation were an exact description of the frequency of earthquake magnitudes, the underlying power law would imply that a larger event will always be observed if we wait long enough. However, the Earth's finite size imposes an upper limit on the size of earthquakes and hence a roll-off of the GR distribution at high magnitudes, for example, an exponential tail (a Gamma distribution) [*Main and Burton*, 1984]; the global catalogue does not yet demonstrate such a roll-off [*Main et al.*, 2008].

[4] In addition to evidence for GR and Gamma distributions, geological and palaeo-seismological data have been used to argue that “characteristic” earthquakes may result from repeated rupture of the same patch of fault [*Wesnousky*, 1994]. Characteristic-type behaviour also indicates a degree of predictability in the system, particularly with respect to earthquake location and magnitude. *Sieh et al.* [2008] highlight the similarities between rupture locations of palaeo-earthquakes on the Sunda megathrust and events following the 2004 Andaman Islands earthquake, and argue that this is indicative of characteristic behaviour. The characteristic earthquake model imposes strong constraints on the nature of impending events and associated hazards. The expectation of a repeat of the 1797 Mentawai Islands earthquake predicts a tsunami at the city of Padang, on western Sumatra, that could reach 5–6 m locally [*Borrero et al.*, 2006]. A non-characteristic model for the location and size of future earthquakes forecasts a much wider range of possible tsunamis [e.g., *McCloskey et al.*, 2008]. At present the characteristic model informs evacuation planning in western Sumatra; it is clearly desirable to use independent methods to test this hypothesis. Recent large earthquakes on the Sunda megathrust do not discriminate well between these models. Whereas the 1833 earthquake would appear to have been a good model for the spatial extent and moment of the 2005 Nias earthquake [*Briggs et al.*, 2006], the 2007 events could not have been expected [*Konca et al.*, 2008].

[5] A key manifestation of the characteristic earthquake hypothesis is a greater frequency of large events than predicted by extrapolation of the GR relation, adding a ‘bump’ to the tail of the log-linear distribution. Thus frequency-magnitude histograms potentially provide a test for characteristic behaviour. However, in many cases the evidence remains qualitative [e.g., *Schwartz and Coppersmith*, 1984]. Where formal statistical tests are done, it is common to neglect potential biases from the effects of finite sample size and to assume that residuals are Gaussian distributed [e.g., *Speidel and Mattson*, 1997].

[6] We consider random samples drawn from power-law distributions to discern the true properties of residuals in power-law count data. By averaging many synthetic earthquake catalogue realizations, (1) discrete numbers of observations converge towards a continuous probability density function that approximates the parent (underlying) distribution, and (2) significant biases associated with discrete samples (real earthquake magnitude observations) are quantified. Importantly, synthetic samples drawn from power-laws are free from biases that may exist in real data, for example, magnitude saturation in seismometers.
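
The averaging of synthetic catalogues described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes an unbounded GR law above a hypothetical completeness magnitude `m_min`, so that magnitudes in excess of `m_min` are exponentially distributed with rate *b* ln 10, and the function names, bin width, and catalogue sizes are all chosen here purely for illustration.

```python
import math
import random


def synthetic_catalogue(n_events, rng, b=1.0, m_min=4.0):
    """Draw n_events magnitudes from an unbounded GR (exponential) law."""
    beta = b * math.log(10.0)  # GR b-value expressed as an exponential rate
    return [m_min + rng.expovariate(beta) for _ in range(n_events)]


def bin_counts(magnitudes, m_min=4.0, width=0.5, n_bins=8):
    """Incremental counts per magnitude bin; events above the last bin are dropped."""
    counts = [0] * n_bins
    for m in magnitudes:
        k = int((m - m_min) / width)
        if 0 <= k < n_bins:
            counts[k] += 1
    return counts


def expected_counts(n_events, b=1.0, width=0.5, n_bins=8):
    """Expected count per bin under the parent distribution."""
    return [
        n_events * (10.0 ** (-b * k * width) - 10.0 ** (-b * (k + 1) * width))
        for k in range(n_bins)
    ]


def mean_counts(n_catalogues, n_events, seed=0):
    """Average the binned counts over many independent synthetic catalogues."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    totals = [0] * len(bin_counts([]))
    for _ in range(n_catalogues):
        counts = bin_counts(synthetic_catalogue(n_events, rng))
        totals = [t + c for t, c in zip(totals, counts)]
    return [t / n_catalogues for t in totals]


if __name__ == "__main__":
    averaged = mean_counts(n_catalogues=500, n_events=200)
    parent = expected_counts(n_events=200)
    for k, (avg, exp) in enumerate(zip(averaged, parent)):
        print(f"bin {k}: mean count {avg:8.3f}   parent expectation {exp:8.3f}")
```

Any single catalogue contains only integer counts, and its highest-magnitude bins are often empty, whereas the mean over many realizations approaches the smooth expectation of the parent distribution. Comparing individual realizations against that expectation is one way to examine how residuals behave in small samples.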

[7] The techniques described are generic to testing the consistency of any data against the null hypothesis of a power-law frequency-size distribution, with wide applicability elsewhere in geophysics, including natural hazard assessment.