In common with many complex, near-critical systems, the earthquake frequency-size distribution takes the form of a power law in energy or seismic moment, at least at small to intermediate scales [Turcotte, 1997]. This distribution is known as the Gutenberg-Richter (GR) relation [Gutenberg and Richter, 1949], usually expressed as log(F) = a − bm, where F is the total number of earthquakes of magnitude m or greater, and a and b are model parameters. Magnitude is related to seismic moment M by log(M) = A + Bm, whence F(M) ~ M^(−β), with β = b/B [Turcotte, 1997]. The precise form of this distribution at large scales is far more uncertain. Physically, we expect an upper bound or taper [Kagan and Jackson, 2000], with a maximum earthquake size limited by factors such as the long-term tectonic deformation rate [Main, 1996], stress regime, seismogenic thickness, and fault zone geometry. This bound is usually modeled by modifying the GR relation (MGR), specifically by adding an exponential tail or taper to the cumulative form, F(M) ~ M^(−β) exp(−M/θ), where θ is a characteristic or “corner” moment [Turcotte, 1997]. Until the largest earthquakes are sampled, the MGR distribution will continue to look like the unbounded GR. Since smaller magnitudes are sampled more frequently, the rate will tend to converge from below (i.e., rates measured over short time windows are, on average, likely to be lower than the true rate, even in a stationary system) [Main et al., 2008; Naylor et al., 2009]. As a consequence, apparent “surprises” [Lay, 2012] (local record-breaking events in size or rate) can occur as the sample size increases, with significant implications for assessing uncertainties in probabilistic earthquake hazard assessment [Stein et al., 2012]. This paper is concerned with the statistical evidence that can be used to identify when earthquake magnitude samples start deviating from the GR relation, in order to constrain maximum magnitudes.
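As an illustration of how the two cumulative forms differ only near the corner moment, the following minimal sketch evaluates both, normalized to 1 at a threshold moment M_t. The constants A = 9.05 and B = 1.5 (moment in N m), the value β = 2/3, and the function names are assumptions introduced for this example, not values taken from the analysis.

```python
import math

def moment_from_magnitude(mw, A=9.05, B=1.5):
    """Seismic moment from log10(M) = A + B*m (A and B are assumed values)."""
    return 10.0 ** (A + B * mw)

def gr_survival(M, M_t, beta):
    """Unbounded GR: fraction of events with moment >= M, given M >= M_t."""
    return (M_t / M) ** beta

def mgr_survival(M, M_t, beta, theta):
    """Tapered MGR: the GR power law times an exponential taper at theta."""
    return (M_t / M) ** beta * math.exp((M_t - M) / theta)

M_t = moment_from_magnitude(5.75)    # threshold magnitude used in the text
theta = moment_from_magnitude(8.0)   # illustrative corner magnitude
beta = 2.0 / 3.0                     # assumes b = 1 and B = 1.5

for mw in (6.0, 7.0, 8.0, 9.0):
    M = moment_from_magnitude(mw)
    print(mw, gr_survival(M, M_t, beta), mgr_survival(M, M_t, beta, theta))
```

Well below the corner the two curves are almost identical, which is why a catalog that has not yet sampled events near θ cannot separate the models.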
 The large number of great earthquakes in recent years has given rise to speculation of non-stationary clustering compared to the global average in the last century [Bufe and Perkins, 2005]. Although this clustering is not (yet) thought to be statistically significant [Michael, 2011; Daub et al., 2012; Shearer and Stark, 2012], it is also not clear that the global earthquake catalog for the digital era is sufficiently well sampled for a reference baseline of average properties to be determined [Main et al., 2008] or to address epistemic uncertainties such as the optimal form of the frequency-size relation. The central limit theorem ensures that the standard error on estimates of a mean of a population decreases with sample size, provided the moments of the distribution are finite. However, the unbounded GR fails this criterion as the distribution does not have a finite mean [Malamud and Turcotte, 1999; Naylor et al., 2009]; importantly, the MGR is indistinguishable from GR prior to the taper being resolved. For such distributions, statistical preference for one model at any given snapshot may not be a sufficient criterion to believe that model.
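The failure of the central limit theorem for an unbounded power law can be illustrated numerically. The sketch below is an illustration only: it assumes β = 2/3 (from b ≈ 1 and B = 1.5, values not stated in the text) for the heavy-tailed case, contrasts it with an arbitrary β = 3 for which the mean is finite, and works with the dimensionless ratio x = M/M_t above a hypothetical threshold moment M_t.

```python
import random

def running_means(beta, checkpoints, seed=0):
    """Running mean of x = M/M_t drawn from a pure power law F(x) = x**(-beta)."""
    rng = random.Random(seed)
    total, n, out = 0.0, 0, {}
    for target in sorted(checkpoints):
        while n < target:
            # Inverse-transform sampling; 1 - random() lies in (0, 1].
            total += (1.0 - rng.random()) ** (-1.0 / beta)
            n += 1
        out[target] = total / n
    return out

checkpoints = [100, 1000, 10000, 100000]
heavy = running_means(2.0 / 3.0, checkpoints)  # beta < 1: no finite mean
light = running_means(3.0, checkpoints)        # beta > 1: mean = beta / (beta - 1)

for n in checkpoints:
    print(n, heavy[n], light[n])
```

For β = 3 the running mean settles near β/(β − 1) = 1.5, whereas for β < 1 it is dominated by the largest value drawn so far and jumps with each new record.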
 Here, we examine the temporal evolution of likelihoods and parameters for the unbounded GR and bounded MGR models for the digital era in order to resolve the form of the earthquake frequency-size distribution at large scales. We analyze the Harvard Centroid Moment Tensor (CMT) catalog [Ekstrom et al., 2012] between 1 January 1977 and 31 July 2012. This catalog has a shorter duration and higher completeness magnitude than others, but has more consistent moment estimation and reporting. The moment of the 2004 Sumatra-Andaman Islands Earthquake (SAIE) is underestimated in the standard CMT catalog because the point source approximation is no longer valid [Ekstrom et al., 2012]. Higher estimates would not influence our results, and we prefer to use the catalog value for consistency. We use a conservative minimum moment magnitude threshold of 5.75 (supporting information, Figure S1) for consistency with previous studies [Main et al., 2008] and to ensure completeness and homogeneity of the catalog, especially in its early part [Kagan, 1997]. Parameters of the GR and MGR models are estimated using the maximum-likelihood method [Kagan and Jackson, 2000]. We prefer to apply the taper to the cumulative form (the MGR as defined in the main text) rather than to use the gamma distribution form of the probability density function [Kagan and Jackson, 2000]. Parameter uncertainties are estimated directly from the distribution of likelihoods (supporting information, Figure S2), rather than from the Hessian matrix of the maximum likelihood estimate (MLE) optimization, and are strongly asymmetric.
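To make the estimation step concrete, the following minimal sketch fits the unbounded GR exponent by maximum likelihood and reads an interval directly off the likelihood curve rather than from a Hessian. It is not the procedure used for the CMT catalog: it uses synthetic data, treats only the GR case (where the estimate has the closed form β = n / Σ ln(M_i/M_t)), and assumes a 95% interval defined by a log-likelihood drop of 1.92. All function names are hypothetical.

```python
import math
import random

def sample_gr(n, beta, rng):
    """Draw n values of x = M/M_t from F(x) = x**(-beta) by inverse transform."""
    return [(1.0 - rng.random()) ** (-1.0 / beta) for _ in range(n)]

def gr_loglik(beta, n, sum_log):
    """GR log-likelihood for x >= 1, with pdf beta * x**(-beta - 1)."""
    return n * math.log(beta) - (beta + 1.0) * sum_log

def gr_fit_with_interval(xs, drop=1.92, step=1e-3):
    """Closed-form MLE of beta plus an interval scanned from the likelihood.

    drop = 1.92 is an assumed threshold (half the 95% chi-square value for
    one degree of freedom); the interval need not be symmetric about the MLE.
    """
    n = len(xs)
    sum_log = sum(math.log(x) for x in xs)
    beta_hat = n / sum_log
    ll_max = gr_loglik(beta_hat, n, sum_log)
    lo = hi = beta_hat
    while lo - step > 0.0 and ll_max - gr_loglik(lo - step, n, sum_log) <= drop:
        lo -= step
    while ll_max - gr_loglik(hi + step, n, sum_log) <= drop:
        hi += step
    return lo, beta_hat, hi

rng = random.Random(42)
xs = sample_gr(5000, 2.0 / 3.0, rng)   # synthetic catalog, assumed beta = 2/3
lo, beta_hat, hi = gr_fit_with_interval(xs)
print(lo, beta_hat, hi)
```

Scanning the likelihood in this way extends naturally to the corner moment θ, where the curve can be flat on the upper side and the interval correspondingly open-ended.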
 Prior to the record-breaking December 2004 Sumatra-Andaman Islands Earthquake [Lay et al., 2005] (SAIE), the mean monthly earthquake rate and moment rate had apparently converged to stable values [Main et al., 2008] (confirmed in Figure 1). Since then, more events have occurred, including the record-breaking Tohoku earthquake in March 2011 [Simons et al., 2011] and its aftershocks. In this period, the mean event rate and moment rate have continued to rise, with standard deviations roughly proportional to their respective means. This lack of convergence to a central limit is consistent with under-sampling of the MGR distribution and demonstrates that we must wait longer to reliably constrain these model parameters.
 In contrast, the maximum likelihood estimate (MLE) of the exponent β in the GR and MGR models became stable (within error) by 1990 (Figure 2a). The 95% confidence intervals for β converge with time and have been insensitive to model choice since mid-2004. However, the corner moment θ of the best-fit MGR model (expressed as an equivalent moment magnitude) shows significant steps up from 8.0 to 8.6, and then to 9.0, associated with the SAIE and the Tohoku earthquake, respectively. Before 1985 and after the SAIE, the upper 95% confidence limit for θ cannot be constrained by the data.
 The analysis of Main et al. [2008] is updated in Figure 2b using the Bayesian Information Criterion (BIC) to distinguish between the competing GR and MGR models, introducing an appropriate penalty for the additional parameter θ in the MGR model. The best-fit curves and data are color-coded at three times: (1) 30 June 1999, (2) 31 December 2006, and (3) 31 July 2012. In these snapshots, the GR is currently the best-fit model, although in 1999 the MGR model was preferred.
 We examine this change in preference in more detail by plotting the values of the model selection criterion ∆BIC(MGR-GR) continuously as a function of time (Figure 3b). We use a modified Bayesian Information Criterion [Leonard and Hsu, 1999], BIC = −2ln(L) + k ln(n), where L is the likelihood function, n is the total number of events, and k is the number of model parameters. The difference is then ∆BIC(MGR-GR) = −2(∆ln(L)) + ln(n). In this notation, the preferred model has the lower BIC. The GR model is initially preferred (∆BIC(MGR-GR) > 0). Values of ∆BIC(MGR-GR) then trend systematically towards a preference for the MGR model (∆BIC(MGR-GR) < 0) in the late 1980s, ending in ∆BIC(MGR-GR) ≈ −8 (or an equivalent likelihood ratio of 2981), i.e., clearly in favor of the MGR model just before the record-breaking SAIE (Figure 3a). The preference suddenly reverts to the GR model after the SAIE and is strengthened by the Tohoku earthquake (the next record-breaking event), such that ∆BIC(MGR-GR) ≈ +4, or an equivalent likelihood ratio of 54, in favor of the GR model.
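The model-selection arithmetic can be written out in a few lines. The sketch below uses hypothetical log-likelihood values and a hypothetical event count; the parameter counts (two for GR, three for MGR) follow the text. The quoted equivalent likelihood ratios appear to correspond to exp(|∆BIC|), since exp(8) ≈ 2981 and exp(4) ≈ 54.6, but that reading is an inference from the numbers rather than something the text states.

```python
import math

def bic(log_lik, k, n):
    """BIC = -2 ln(L) + k ln(n) for a model with k parameters and n events."""
    return -2.0 * log_lik + k * math.log(n)

def delta_bic_mgr_gr(loglik_mgr, loglik_gr, n, k_mgr=3, k_gr=2):
    """Delta BIC(MGR-GR); negative values favor MGR, positive values favor GR."""
    return bic(loglik_mgr, k_mgr, n) - bic(loglik_gr, k_gr, n)

n = 6000              # hypothetical number of events above the threshold
loglik_gr = -1000.0   # hypothetical maximized log-likelihoods
loglik_mgr = -991.0

d = delta_bic_mgr_gr(loglik_mgr, loglik_gr, n)
print(d)                             # -2 * 9 + ln(6000)
print(math.exp(8.0), math.exp(4.0))  # compare with the ratios quoted above
```

Because the penalty for the extra parameter is ln(n), the improvement in log-likelihood needed to prefer the MGR model grows slowly as the catalog lengthens.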
 The convergence properties in Figure 3b are clearly volatile, so it may be premature to conclude that the current best-fit model will ultimately be the best, given the current state of incomplete sampling revealed by Figures 1 and 2. To test this, we compare the observed sequential variation in ∆BIC(MGR-GR) with that expected from a randomly sampled GR distribution (Figure 4a). We generate 100 random GR catalogs with the same number of events as the real filtered CMT catalog. Each simulation uses a value of β randomly selected according to the probability density determined from the likelihood function for the CMT catalog up to 31 July 2012. The simulated values show moderate variability and are analytically bounded in the positive direction by ∆BIC(MGR-GR) = ln(n), where n is the total number of events in the catalog up to that point. This bound arises because the two-parameter GR model cannot fit the data any better than the three-parameter MGR model: the likelihood ratio of GR to MGR satisfies L ≤ 1 (equivalently, ∆ln(L) ≥ 0), so the most favorable scenario for the GR model is ∆ln(L) = 0. Notably, ∆BIC(MGR-GR) rarely prefers the MGR and only reaches modestly negative values when it does. We find similar results when using higher magnitude thresholds (supporting information, Figure S3).
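The simulation test and its analytic bound can be sketched as follows. This is a simplified stand-in rather than the procedure behind Figure 4a: it fixes β = 2/3 instead of drawing it from the likelihood-derived density, evaluates ∆BIC only at the end of each synthetic catalog, uses smaller catalogs, and fits the MGR model by a crude grid over η = M_t/θ with a Newton solve for β at each grid point. All names and numerical settings are assumptions.

```python
import math
import random

def sample_gr(n, beta, rng):
    """Draw n values of x = M/M_t from an unbounded GR law F(x) = x**(-beta)."""
    return [(1.0 - rng.random()) ** (-1.0 / beta) for _ in range(n)]

def mgr_loglik(beta, eta, xs, sum_log, sum_excess):
    """Log-likelihood for F(x) = x**(-beta) * exp(eta * (1 - x)); eta = 0 is GR."""
    return (sum(math.log(beta / x + eta) for x in xs)
            - beta * sum_log - eta * sum_excess)

def fit_beta(eta, xs, sum_log, beta0, iters=20):
    """Newton solve of d lnL / d beta = 0 at fixed eta, kept positive."""
    beta = beta0
    for _ in range(iters):
        g = h = 0.0
        for x in xs:
            r = 1.0 / (beta + eta * x)
            g += r
            h -= r * r
        new = beta - (g - sum_log) / h
        if new <= 0.0:
            new = 0.5 * beta
        if abs(new - beta) < 1e-10:
            return new
        beta = new
    return beta

def delta_bic(xs, etas):
    """Delta BIC(MGR-GR) = -2 * (lnL_MGR - lnL_GR) + ln(n)."""
    n = len(xs)
    sum_log = sum(math.log(x) for x in xs)
    sum_excess = sum(x - 1.0 for x in xs)
    beta = n / sum_log                                   # closed-form GR MLE
    ll_gr = n * math.log(beta) - (beta + 1.0) * sum_log
    ll_mgr = ll_gr                                       # MGR contains GR (eta = 0)
    for eta in etas:                                     # ascending, warm-started
        beta = fit_beta(eta, xs, sum_log, beta)
        ll_mgr = max(ll_mgr, mgr_loglik(beta, eta, xs, sum_log, sum_excess))
    return -2.0 * (ll_mgr - ll_gr) + math.log(n)

rng = random.Random(1)
n = 500
etas = [10.0 ** (-6.0 + 0.25 * i) for i in range(21)]   # eta from 1e-6 to 1e-1
values = [delta_bic(sample_gr(n, 2.0 / 3.0, rng), etas) for _ in range(20)]
print(max(values), math.log(n))
print(sum(v < 0.0 for v in values), "of", len(values), "catalogs prefer MGR")
```

Because the MGR family contains GR as the limit η → 0, the fitted MGR log-likelihood can never fall below the GR value, which is what caps ∆BIC(MGR-GR) at ln(n).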
 Using 10,000 simulations at the three comparison times, we only once find a value of ∆BIC(MGR-GR) as low as that observed prior to the SAIE (Figure 4b), and even after the SAIE and Tohoku earthquakes, the values of ∆BIC(MGR-GR) remain well below average. We are only able to observe the large negative values of ∆BIC(MGR-GR) in the real data in a period with at most one small record-breaking change in the maximum recorded earthquake (Figure 3a), with an associated broad minimum in the average seismic moment release rate (Figure 1), and during a period of relative quiescence in great earthquakes [Shearer and Stark, 2012]. Clearly, the order of earthquake occurrence plays a key role in our ability to distinguish between these two models (supporting information, Figure S4). Too few m ≥ 8 earthquakes were observed prior to the SAIE for the data to be consistent with the GR model favored for the catalog today (supporting information, Figure S5). Thus, it is extremely unlikely that the earthquake frequency-size distribution observed before the SAIE arose from an under-sampled GR model.
 In summary, apart from the seismic b-value, no simple statistical metrics or model parameters have converged to their long-term limits for the global CMT catalog. The unbounded GR relation is currently identified by the BIC as the best model for the frequency-magnitude distribution, but the temporal evolution of ∆BIC(MGR-GR) is inconsistent with random sampling from an unbounded GR distribution, indicating a preference for the MGR model that is very unlikely to occur by chance. The earthquakes occurring up to the SAIE represent a well-sampled global MGR distribution, where the corner magnitude, θ, is Mw 8.0. However, the SAIE changed the fundamental statistics of the global earthquake catalog by sampling a record-breaking event from a distinct population of earthquakes where the truncation magnitude is significantly higher than that observed previously. As the mean seismic moment release rate and event rate stabilize with time, the best model will return to an MGR relation with a higher truncation magnitude. The frequency-size distribution of the global earthquake catalog is therefore best explained as a mixture distribution, sampled from a large number of regional MGR distributions with spatially varying model parameters, in particular the event rate and corner moment. These findings are fundamental to constraining insurance risk derived from the largest events; if the catalog has converged, the risk is constrained; if GR remains the preferred model, the exposure to single events will continue to grow.