If subduction zone earthquakes conform to a characteristic model, in which persistent segments fail at predictable stress levels due to the steady accumulation of tectonic loading, historical seismicity may constrain the occurrence of future events. We test this model for earthquakes on the Sumatra-Andaman megathrust and other subduction zones using frequency-magnitude distributions. Using simulations, we show that Poisson confidence intervals correctly account for the counting errors of histogram data. These confidence intervals demonstrate that we cannot reject the Gutenberg-Richter distribution in favor of a characteristic model in any of the real catalogues tested. A visual bias in power-law count data at high magnitudes, combined with a sample bias for large earthquakes, is sufficient to explain candidate characteristic events. This result implies that historical earthquakes are likely poor models for future events and that Monte Carlo simulations will provide a better assessment of earthquake and associated hazards.
1. Introduction
 There exists an established systematic relation between an earthquake's magnitude, M, and its frequency of occurrence in any given region and time period, N [e.g., Gutenberg and Richter, 1954]. The Gutenberg–Richter (GR) relation can be expressed as
$$\log_{10} N = a - bM,$$
where a and b are constants, with b typically close to 1. Given the logarithmic relation between an earthquake's magnitude and its energy or moment, the GR relation reflects a power-law relation between frequency and size.
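 To make the link between the GR relation and a power-law explicit, the short derivation below assumes a generic logarithmic magnitude scale M = c log10 E + d. The constants c and d and the size measure E (energy or moment) are introduced here for illustration only; the moment magnitude scale corresponds to c = 2/3 with seismic moment in place of E.

```latex
\begin{align}
\log_{10} N &= a - bM, \qquad M = c\,\log_{10} E + d \\
\log_{10} N &= (a - bd) - bc\,\log_{10} E \\
N &= 10^{\,a - bd}\, E^{-bc}
\end{align}
```

 Under this assumption the frequency falls off as a power of the event size with exponent bc: simply b when magnitude is taken as log10 of energy (c = 1), and 2b/3 for c = 2/3.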
 If the GR relation constituted an exact description of the frequency of earthquake magnitudes, the underlying power-law implies that a larger event will always be observed if we wait sufficiently long. However, the Earth's finite size enforces an upper limit on the size of earthquakes and hence a roll-off to the GR distribution at high magnitudes, for example, an exponential tail (a Gamma distribution) [Main and Burton, 1984]; the global catalogue does not yet demonstrate this [Main et al., 2008].
 In addition to evidence for GR and Gamma distributions, geological and palaeo-seismological data have been used to argue that “characteristic” earthquakes may result from repeated rupture of the same patch of fault [Wesnousky, 1994]. Characteristic-type behaviour also indicates a degree of predictability in the system, particularly with respect to earthquake location and magnitude. Sieh et al. highlight the similarities between rupture locations of palaeo-earthquakes on the Sunda megathrust and events following the 2004 Andaman Islands earthquake, and argue that this is indicative of characteristic behaviour. The characteristic earthquake model imposes strong constraints on the nature of impending events and associated hazards. The expectation of a repeat of the 1797 Mentawai Islands earthquake predicts a tsunami for the city of Padang on western Sumatra which could reach 5–6 m locally [Borrero et al., 2006]. A non-characteristic model for the location and size of future earthquakes forecasts a much wider range of possible tsunamis [e.g., McCloskey et al., 2008]. At present the characteristic model informs evacuation planning in western Sumatra; it is clearly desirable to use independent methods to test this hypothesis. Recent large earthquakes on the Sunda megathrust do not discriminate well between these models. Whereas the 1833 earthquake would appear to have been a good model for the spatial extent and moment of the 2005 Nias earthquake [Briggs et al., 2006], the 2007 events could not have been expected [Konca et al., 2008].
 A key manifestation of the characteristic earthquake hypothesis is a greater frequency of large events than predicted by extrapolation of the GR relation, adding a ‘bump’ to the tail of the log-linear distribution. Thus frequency-magnitude histograms potentially provide a test for characteristic behaviour. However, in many cases the evidence remains qualitative [e.g., Schwartz and Coppersmith, 1984]. Where formal statistical tests are done, it is common to neglect potential biases from the effects of finite sample size and to assume that residuals are Gaussian distributed [e.g., Speidel and Mattson, 1997].
 We consider random samples drawn from power-law distributions to discern the true properties of residuals in power-law count data. By averaging many synthetic earthquake catalogue realizations, (1) discrete numbers of observations converge towards a continuous probability density function that approximates the parent (underlying) distribution, and (2) significant biases associated with discrete samples (real earthquake magnitude observations) are quantified. Importantly, synthetic samples drawn from power-laws are free from biases that may exist in real data, for example, magnitude saturation in seismometers.
 The techniques described are generic to testing the consistency of any data against the null hypothesis of a power-law frequency-size distribution, with wide applicability elsewhere in geophysics, including natural hazard assessment.
2. What to Expect When Sampling From Power-Laws
 Intuition can be misleading when assessing whether samples are, or are not, drawn from power-law distributions. Generally, the mean of an unbounded power-law distribution is not finite, so the sample mean increases systematically with sample size. In contrast, for a quantity that exhibits Gaussian convergence, such as the height of individuals in a population, the sample mean converges rapidly to a stable value as the sample size increases. Counting uncertainties may also markedly affect the appearance of a log-log histogram. To demonstrate this we have generated 1000 synthetic catalogues, each containing 500 events drawn from a power-law energy distribution with exponent b = 1. After taking the log10 of the values drawn from the power-law distribution, we generate a histogram for each synthetic catalogue with a bin width of 0.1 in log space and superimpose the histograms in Figure 1a, using red crosses to highlight the spread in the samples. Throughout the paper, the log10 transformation is used to produce the magnitude scale from the power-law distributed energies, so the magnitude axis is proportional to the log10 of the energy. The blue line is the parent power-law distribution from which the samples are drawn, with its slope set by the exponent b. Figure 1a demonstrates that sampling from a power-law distribution produces a visual bias that increases with magnitude; extreme events appear to have a frequency larger than that predicted by the parent distribution.
 The origin of this observational bias is clear. As histograms show count data, all of the non-empty bins contain 1 or more events (log10 N ≥ 0), and empty bins cannot be plotted on a logarithmic axis. Some of the counts in a random sample fall in bins where the mean frequency is less than 1, so every occupied bin in this region necessarily plots above the parent distribution. By averaging over all replicates, including the empty bins, we can calculate the mean count in each bin (green circles in Figure 1a), which represents a catalogue with 500,000 events. These mean frequencies recover the parent distribution (blue line). Therefore, visual assessment can be misleading as to whether samples are drawn from power-law distributions; a single sample is insufficient to represent the full parent distribution. It is in this region of sparse data that candidate characteristic events lie.
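 The experiment described above can be sketched in a few lines of Python. This is an illustrative reconstruction, not the code used to produce Figure 1a: the function names, the random seed and the choice to measure magnitudes relative to the minimum magnitude are all assumptions made here.

```python
import math
import random

BIN_WIDTH = 0.1     # bin width in magnitude (log10) units
N_BINS = 60         # bins cover relative magnitudes 0 to 6
N_EVENTS = 500      # events per synthetic catalogue
B_VALUE = 1.0       # power-law exponent b

def synthetic_histograms(n_catalogues=1000, seed=0):
    """Return one list of bin counts per synthetic catalogue."""
    rng = random.Random(seed)
    histograms = []
    for _ in range(n_catalogues):
        counts = [0] * N_BINS
        for _ in range(N_EVENTS):
            # Inverse-transform sampling from P(M > m) = 10**(-b*m), m >= 0.
            u = 1.0 - rng.random()              # u in (0, 1], avoids log10(0)
            m = -math.log10(u) / B_VALUE
            k = int(m / BIN_WIDTH)
            if k < N_BINS:                      # rare events beyond the last bin are dropped
                counts[k] += 1
        histograms.append(counts)
    return histograms

def expected_count(k):
    """Mean count in bin k predicted by the parent distribution."""
    lo, hi = k * BIN_WIDTH, (k + 1) * BIN_WIDTH
    return N_EVENTS * (10 ** (-B_VALUE * lo) - 10 ** (-B_VALUE * hi))

histograms = synthetic_histograms()
# Average over all catalogues, counting empty bins as zeros.
mean_counts = [sum(h[k] for h in histograms) / len(histograms) for k in range(N_BINS)]

for k in (0, 10, 20, 30, 35):
    n_occupied = sum(1 for h in histograms if h[k] > 0)
    print(f"bin starting at m = {k * BIN_WIDTH:.1f}: "
          f"parent mean = {expected_count(k):8.3f}, "
          f"average over catalogues = {mean_counts[k]:8.3f}, "
          f"occupied in {n_occupied} of {len(histograms)} catalogues")
```

 In the high-magnitude bins the parent mean is well below 1, yet any catalogue that happens to record an event there shows a count of at least 1. Only the average over all catalogues, zeros included, approaches the parent value.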
 The average residuals for each bin are shown in Figure 1b. The residuals vary with magnitude and can be divided into three regions. At low magnitudes the number of events in each bin ≫1 and counting errors are small, approximately symmetric and approximately Gaussian about the parent distribution. For frequencies of ∼1 at intermediate magnitudes, residuals appear negatively biased because these events are under-sampled. The errors are clearly non-Gaussian and are approximately Poisson.
 For the largest events when the mean frequency is below 1, all occupied bins lie above the mean and hence appear positively biased; this may account for apparent characteristic behaviour.
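 The three regions can be reproduced with a small calculation, sketched here under two assumptions that are ours rather than a statement of how Figure 1b was computed: bin counts are treated as Poisson with mean lam, and the residual of a bin is taken to be log10(n) - log10(lam), averaged over occupied bins only (n >= 1). The function names are hypothetical.

```python
import math

def poisson_pmf(k, lam):
    """Poisson probability of observing k events when the mean is lam."""
    return math.exp(k * math.log(lam) - lam - math.lgamma(k + 1))

def mean_log_residual(lam, tail_sigmas=12):
    """Mean of log10(n) - log10(lam) over occupied bins (n >= 1)."""
    # Sum far enough into the upper tail that the truncation is negligible.
    k_max = int(lam + tail_sigmas * math.sqrt(lam) + 20)
    weighted_sum = sum(poisson_pmf(k, lam) * math.log10(k)
                       for k in range(1, k_max + 1))
    p_occupied = 1.0 - math.exp(-lam)           # probability the bin is not empty
    return weighted_sum / p_occupied - math.log10(lam)

for lam in (100.0, 10.0, 3.0, 1.0, 0.3, 0.1):
    print(f"mean count {lam:6.1f}: "
          f"mean log10 residual of occupied bins = {mean_log_residual(lam):+.3f}")
```

 With these definitions the residual is close to zero for large mean counts, slightly negative for a few events per bin, and strongly positive once the mean count falls below 1, since log10(n) cannot be negative for an occupied bin.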
 Since the region of negative residuals is a consequence of the counting errors, its range is modified by changes in b (Figure 1c). Decreasing b decreases the slope and accentuates the negative residuals; increasing b changes the sign of the residuals in this region. A steeper parent slope results in increasing positive bias.
 The residual analysis is repeated for different sample sizes (Figure 1d). All curves show positive residuals at high magnitudes; the deviation occurs at greater magnitudes as the sample size increases. The gradient of the positive residuals region is the same for all sample sizes. The sample of 10 is dominated by positively biased counting errors. Above ∼500 events, the well-sampled region is resolved for smaller magnitudes, but there are still ∼50% of values in the positively biased tail. Increasing the sample size increases the proportion of counts ≫1 but maintains the same mean number of events in the tail (Figure 1d). Therefore, since single-sample events in the tail cannot be better resolved by waiting for more events, evidence for non-power-law distributions can lie only significantly beyond the counting errors.
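 The statement that the mean number of events in the tail does not depend on sample size can be checked with a short calculation. The sketch below assumes a catalogue of n events following the GR relation above a minimum magnitude m_0, binned with width w, and defines the start of the tail, m*, as the magnitude at which the mean bin count falls to 1. The symbols n, m_0, w and m* are introduced here for illustration.

```latex
\begin{align}
\lambda(m) &= n\,10^{-b(m-m_0)}\left(1 - 10^{-bw}\right)
  && \text{mean count in the bin } [m,\, m+w) \\
\lambda(m^*) = 1 \;&\Rightarrow\;
  m^* = m_0 + \frac{1}{b}\log_{10}\!\left[n\left(1 - 10^{-bw}\right)\right] \\
n\,10^{-b(m^*-m_0)} &= \frac{1}{1 - 10^{-bw}}
  && \text{mean number of events with } M \ge m^*
\end{align}
```

 Under these assumptions the start of the tail moves to higher magnitude as n grows, but the mean number of events beyond it is fixed by b and w alone: about 4.9 events for b = 1 and w = 0.1.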
3. Implications for Fitting Data
 Standard least squares regression, especially in the case of log counts, does not correctly model the counting errors described above [Sandri and Marzocchi, 2007]. However, there remain numerous examples where the regression technique is either not stated or is clearly inappropriate [e.g., Leonard, 2008; Liu et al., 2007; Polat et al., 2008; Sayil and Osmansahim, 2008]. Using three ways to fit a power-law to the histogram, we demonstrate how different Generalised Linear Models (GLMs) account for different parts of the bias. We compare (1) the least-biased (i.e., Maximum Likelihood) method using Poisson residuals (Figure 2a), with the frequently but incorrectly applied (2) Gaussian residuals (least squares in linear space) (Figure 2b), and (3) least squares with Gaussian residuals in log space (Figure 2c). It has been shown formally [Charnes et al., 1976] that a GLM with Poisson residuals is mathematically equivalent to Aki's Maximum Likelihood technique, which is often used for fitting earthquake histograms. We reserve a detailed discussion of each of these techniques for the auxiliary material.
 Confidence intervals show regions within which, in the limit of a large number of bins, a certain fraction of bin frequencies are expected to lie; this constitutes an approximate retrospective test for a correctly specified residual distribution. To each plot (Figures 2a–2c) we have added the corresponding 95% confidence limits (dashed lines) on the 2.5 and 97.5 percentiles.
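 For Poisson residuals, the 2.5 and 97.5 percentiles of a bin count follow directly from the fitted mean of that bin. The Python sketch below shows one way to compute them; the function names are hypothetical, and the simple running sum assumes mean counts small enough (a few hundred at most) that exp(-lam) does not underflow.

```python
import math

def poisson_quantile(lam, p):
    """Smallest integer k with P(N <= k) >= p for N ~ Poisson(lam)."""
    k = 0
    pmf = math.exp(-lam)        # P(N = 0)
    cdf = pmf
    while cdf < p:
        k += 1
        pmf *= lam / k          # recurrence P(N = k) = P(N = k-1) * lam / k
        cdf += pmf
    return k

def poisson_limits(lam, level=0.95):
    """Lower and upper percentile limits on a count with mean lam."""
    alpha = 1.0 - level
    return poisson_quantile(lam, alpha / 2), poisson_quantile(lam, 1 - alpha / 2)

for lam in (100.0, 10.0, 1.0, 0.5, 0.1):
    lower, upper = poisson_limits(lam)
    print(f"mean count {lam:6.1f}: 95% of counts lie between {lower} and {upper}")
```

 Where the lower limit is 0, an empty bin is consistent with the fit. An upper limit of 1 or more in a bin whose mean is below 1 indicates that an isolated large event there is within the expected counting fluctuations.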
 Confidence intervals for the Poisson residuals (Figure 2a) correctly account for the variability in synthetic data (i.e., 95% of the counts lie within the 95% confidence intervals), unlike either of the other fits (Figures 2b–2c), and crucially capture the presence of strong positive residuals at high magnitudes. Furthermore, Figure 2d shows estimates of the power-law exponents b for each sample using each technique. The Poisson method is not systematically biased. The Gaussian estimate (least squares in linear space) is not biased either but suffers from greater variability in individual estimates. The least-squares in log space technique displays both systematic bias and large variability; the known b used to generate the samples barely lies within the 95% confidence intervals.
 Even authors who explicitly state that they use a Maximum Likelihood method often fail to specify the distribution of residuals assumed in the regression. GLM software normally defaults to Gaussian residuals and hence this hidden assumption could lie behind much suboptimal fitting in the literature. Here we stress that the true Maximum Likelihood fit is carried out using Poisson residuals, which usually requires only the setting of a flag in the GLM function. To facilitate its use we explicitly provide the ‘R’ code for fitting histograms in the auxiliary material.
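 The authors' R code is in the auxiliary material and is not reproduced here. As a rough stand-in, the Python sketch below fits a log-linear model with Poisson residuals by iteratively reweighted least squares, the usual GLM algorithm, and compares it with least squares in log space on one synthetic catalogue. The function names, the seed, the use of bin centres and the decision to keep empty bins as data in the Poisson fit are assumptions made for this illustration. In R, the corresponding choice is made through the family argument of glm.

```python
import math
import random

def fit_poisson_loglinear(mags, counts, n_iter=50, tol=1e-10):
    """Maximum-likelihood fit of log(mean count) = alpha + beta * m.

    Returns (a, b) in the GR form log10(mean count) = a - b * m.
    """
    mu = [c + 0.5 for c in counts]              # strictly positive starting means
    alpha = beta = 0.0
    for _ in range(n_iter):
        # Working response z and weights w for the current means.
        z = [math.log(u) + (c - u) / u for u, c in zip(mu, counts)]
        w = mu
        sw = sum(w)
        m_bar = sum(wi * m for wi, m in zip(w, mags)) / sw
        z_bar = sum(wi * zi for wi, zi in zip(w, z)) / sw
        sxx = sum(wi * (m - m_bar) ** 2 for wi, m in zip(w, mags))
        sxz = sum(wi * (m - m_bar) * (zi - z_bar)
                  for wi, m, zi in zip(w, mags, z))
        new_beta = sxz / sxx                    # weighted least-squares slope
        new_alpha = z_bar - new_beta * m_bar
        converged = abs(new_alpha - alpha) + abs(new_beta - beta) < tol
        alpha, beta = new_alpha, new_beta
        mu = [math.exp(alpha + beta * m) for m in mags]
        if converged:
            break
    return alpha / math.log(10), -beta / math.log(10)

def fit_log_least_squares(mags, counts):
    """Ordinary least squares on log10(count); empty bins must be discarded."""
    pts = [(m, math.log10(c)) for m, c in zip(mags, counts) if c > 0]
    n = len(pts)
    m_bar = sum(m for m, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    slope = (sum((m - m_bar) * (y - y_bar) for m, y in pts)
             / sum((m - m_bar) ** 2 for m, _ in pts))
    return y_bar - slope * m_bar, -slope

# One synthetic catalogue: 500 events, b = 1, bin width 0.1.
rng = random.Random(1)
bin_width, n_bins, true_b = 0.1, 50, 1.0
counts = [0] * n_bins
for _ in range(500):
    m = -math.log10(1.0 - rng.random()) / true_b
    k = int(m / bin_width)
    if k < n_bins:
        counts[k] += 1
mags = [(k + 0.5) * bin_width for k in range(n_bins)]   # bin centres

a_pois, b_pois = fit_poisson_loglinear(mags, counts)
a_ls, b_ls = fit_log_least_squares(mags, counts)
print(f"Poisson residuals:       b = {b_pois:.3f}")
print(f"least squares, log space: b = {b_ls:.3f}")
```

 Repeating the last part over many seeds gives the spread of b estimates for each method, in the spirit of Figure 2d.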
4. Analysis of Real Catalogues
 We now examine the histograms of earthquake magnitudes recorded in the Harvard CMT catalogue between 1977 and 2007 for the Sunda megathrust and comparable subduction zone regions (see auxiliary material for selection criteria) in order to test for characteristic earthquakes. More than 600 Mw > 5.4 events are recorded in the Sunda megathrust region. The Poisson fit and confidence intervals (see section 3) best characterise the histogram (Figures 2e–2g). The apparent ‘outliers’ at high magnitudes are entirely consistent with the power-law null hypothesis (see section 2). Least squares (in log space) produces a poor fit to the bulk of the data at lower magnitudes (where the fit should be best) and the confidence intervals for the Gaussian fit (in linear space) are clearly too wide.
 We also analyse a subset of the catalogue for events whose hypocentre locations and focal mechanisms suggest they lie on the megathrust interface. These events might be more likely to display a characteristic-type distribution (Figure 2h). However, the confidence intervals on the counts are wider due to the smaller sample (Figure 1d). One event lies beyond the 95% confidence limit, but within the 99% confidence limit (the 99% limit for a count of 1 is at magnitude 9.5). Since 134 events remain in this subset, this result is not surprising. A single event where none is expected, in a bin at a high magnitude, is not strong evidence for a deviation from a single power-law. We apply the Poisson GLM fit to four further subduction zone regions (Figure 3). All these catalogues also have histograms that lie within 95% confidence intervals of the GR distribution, with no evidence for characteristic events.
 For all subduction zone earthquake catalogues considered in this paper, all histogram frequencies lie within Poisson counting fluctuations around a single power-law distribution. Other effects, for example, magnitude correlations or noise in the magnitude measurements, are too weak to be resolved. We find no evidence for rejecting the GR law in favour of the characteristic-type distribution.
 Inappropriate regression techniques provide biased estimates of histogram frequencies. This bias is amplified by logarithmic transformation, such that the extrapolation of a power-law distribution is misleading, particularly for extreme events.
 Whilst taking larger samples improves the estimate of the distribution parameters, it cannot smooth the fluctuations in the tail of an unbounded power-law. Therefore, rejection of the GR distribution requires evidence of observations beyond counting errors.
 Some authors have noted that palaeo-seismological data record larger magnitude events than would be predicted by extrapolation of the GR model [e.g., Speidel and Mattson, 1997]. These outliers can be explained by a combination of the apparent positive bias for small samples at high magnitudes (even the best palaeo-seismological records rarely contain more than 10 events), and the preferential preservation of such events in the geological record.
 In an era when earthquake science is required to advise civil society, our results have important implications for preparedness education in threatened centres of population. The characteristic model potentially constrains the location and magnitude of future events on a given fault segment, an appealing property that allows targeted preparedness strategies (e.g., retrofitting) to be put in place at lower cost. Here we have shown that it may be more prudent to consider a wider range of possibilities, including using Monte Carlo techniques to forecast hazard scenarios with a much wider range of possible consequences.
 MN was funded by EPSRC(GR/T11753/01), JG was funded by the EU TRIGS project (NEST-2005-PATH-COM-043386), and AB was funded by the EU NERIES project (RII3-CT-2006-026130) and collaboration NERC(NE/F01161X/1).