I study the ability to estimate the tail of the frequency-magnitude distribution of global earthquakes. While power-law scaling for small earthquakes is well supported by data, the tail remains speculative. In a recent study, Bell et al. (2013) claim that the frequency-magnitude distribution of global earthquakes converges to a tapered Pareto distribution. I show that this finding results from data fitting errors, namely from the biased maximum likelihood estimation of the corner magnitude θ in strongly undersampled models. In particular, the estimation of θ depends solely on the few largest events in the catalog. Taking this into account, I compare various state-of-the-art models for the global frequency-magnitude distribution. After discarding undersampled models, the remaining ones, including the unbounded Gutenberg-Richter distribution, all perform equally well and are, therefore, indistinguishable. Convergence to a specific distribution, if it ever takes place, requires at least about 200 years of homogeneous recording of global seismicity.
1 Introduction
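 The tail of the frequency-magnitude distribution (FMD) of earthquakes is the most desirable part to know, because it predicts the frequency of the largest events. It is broadly accepted that the Gutenberg-Richter scaling relation cannot be extrapolated to arbitrarily large magnitudes, because this would conflict with basic physical principles, e.g., conservation of energy. A simple truncated version, the so-called doubly truncated Gutenberg-Richter (GR) law, defined by the probability density function (PDF)
$$f(m) = \frac{b \ln(10)\, 10^{-b(m-m_0)}}{1 - 10^{-b(M-m_0)}} \tag{1}$$
with magnitudes m∈[m0, M] and the Richter b-value, has been criticized, because the sharp cutoff at the maximum magnitude M is considered to be unphysical. As a compromise, the tapered Pareto distribution (Kagan and Jackson), also called the “modified Gutenberg-Richter” (MGR) distribution, has become popular in the recent past. The PDF, which is usually given in terms of the seismic moment ℳ, is
$$f(\mathcal{M}) = \left[\frac{\beta}{\mathcal{M}} + \frac{1}{\mathcal{M}_c}\right] \left(\frac{\mathcal{M}_0}{\mathcal{M}}\right)^{\beta} \exp\left(\frac{\mathcal{M}_0 - \mathcal{M}}{\mathcal{M}_c}\right), \quad \mathcal{M} \ge \mathcal{M}_0, \tag{2}$$
where ℳ0 and ℳc are the seismic moments corresponding to the threshold magnitude m0 and the corner magnitude mc. It can be transformed, by means of the moment-magnitude relation with ℳ measured in newton meters, into a PDF for earthquake magnitudes m:
$$f(m) = 1.5 \ln(10) \left[\beta + 10^{1.5(m-m_c)}\right] 10^{-1.5\beta(m-m_0)}\, T(m); \tag{3}$$
the tapering is described by the function
$$T(m) = \exp\left(10^{1.5(m_0-m_c)} - 10^{1.5(m-m_c)}\right).$$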
 In equations (2) and (3), β is the shape parameter and the sharp cutoff at an ultimate maximum magnitude M in the truncated distribution (equation (1)) is replaced by smooth tapering characterized by the corner magnitude mc. Although probabilities for large earthquakes are reduced significantly, this model still allows for the occurrence of infinitely large events.
 In their recent analysis, Bell et al. (2013) claim that in the periods between the largest global earthquakes, “the preferred model gradually converges to the tapered GR relation” (equations (2) and (3)) and that “the form of the convergence cannot be explained by random sampling of an unbounded GR distribution.” In the present analysis, I demonstrate that such a conclusion cannot be justified solely from an analysis of the global catalog. By performing an objective comparison of various state-of-the-art models for the FMD, I identify the limits of catalog-based studies on the tail of the FMD. This is carried out in the following steps: in section 2, I show that the time dependence of the estimated FMDs is governed by the strongly biased estimation of the tail parameter mc or M, which, in turn, results in a misleading comparison of models with different degrees of undersampling and eventually leads to the conclusion of a converging FMD. Taking this into account, I calculate Bayesian posterior probabilities to compare various state-of-the-art models for the FMD of the global CMT catalog (section 3). For this analysis, I use the same catalog as Bell et al. (2013), namely the global CMT catalog [Ekström et al., 2012] with magnitudes m≥5.75 from 1977 until mid-2012. Finally, conclusions with respect to potential future studies are drawn.
2 Estimation of the Frequency-Magnitude Distribution
 Models for the FMD of earthquakes usually include three parameters. The lower magnitude threshold m0 is subject to catalog completeness and therefore assumed to be known. The shape parameter (b or β) measures the ratio of small to large events; the values b=1 and β=2/3 are relatively robust for global seismicity [Kagan and Schoenberg, 2001]. Finally, the tail parameter θ (θ=M for the GR model and θ=mc for the MGR model) characterizes the distribution at large magnitudes. Because m0 and the shape parameter play a minor role in this study, I will use the following simplified notation: fθ(m) is the PDF of an FMD, where f can be either the GR distribution (equation (1) with θ=M) or the MGR distribution (equation (3) with θ=mc). The corresponding cumulative distribution function (CDF) is denoted by Fθ(m).
 Estimating the shape and the tail parameter by using the maximum likelihood method allows for reasonable fits of empirical FMDs (see, e.g., Figure 3 in Kagan and Jackson). Such a fit is, of course, only a snapshot in time. As time evolves, the estimated values of β and θ change almost independently in the following way. A growing number of small events will stabilize the shape parameter but has no effect on the estimation of the tail parameter, because θ depends only on the very few largest events in a catalog [Kagan and Schoenberg, 2001]. The estimated value of θ will only undergo a clear change if an earthquake close to the true value of θ in the model occurs. This becomes particularly clear in the GR model, where the maximum likelihood estimate of M in equation (1) can be expressed by a simple analytical equation; in particular, the estimate depends only on a single earthquake, the largest observed earthquake,
$$\hat{M} = m_{\max,\mathrm{obs}}, \tag{4}$$
regardless of all the details in the remainder of the catalog [Holschneider et al., 2011; Zöller et al., 2013]. Consequently, convergence of an estimated GR distribution to the true GR distribution requires that the maximum observed magnitude converges to the unknown maximum possible magnitude M. For the corner magnitude mc in the MGR model, this result holds approximately. In other words, as long as the tail of the distribution is not sampled by a large number of events with magnitudes close to the tail parameter θ, the FMD is undersampled and the estimation of θ is biased.
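The dependence of the estimate on a single event can be illustrated by simulation. The sketch below draws magnitudes from a doubly truncated GR law by inverse-CDF sampling and returns the largest sampled value as the maximum likelihood estimate of M. The "true" maximum magnitude of 10, the seed, and the function names are hypothetical choices made only for this illustration.

```python
import math
import random

def sample_truncated_gr(n, m0, m_max, b, rng):
    """Draw n magnitudes from the doubly truncated GR law by inverse-CDF sampling."""
    norm = 1.0 - 10 ** (-b * (m_max - m0))
    return [m0 - math.log10(1.0 - rng.random() * norm) / b for _ in range(n)]

def mle_max_magnitude(magnitudes):
    """ML estimate of M: the likelihood is zero below the largest event and
    decreases above it, so the maximum sits at the largest observed magnitude."""
    return max(magnitudes)

rng = random.Random(42)  # fixed seed, chosen arbitrarily
true_m_max = 10.0        # hypothetical "true" maximum magnitude
for n in (100, 1000, 7585):
    catalog = sample_truncated_gr(n, m0=5.75, m_max=true_m_max, b=1.0, rng=rng)
    print(n, round(mle_max_magnitude(catalog), 2), "true:", true_m_max)
```

Because the estimate can never exceed the true value, it is biased low by construction, and it changes only when a new record event enters the catalog.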
 For a given earthquake catalog, there is no deterministic way to answer precisely the question of how close the maximum observed earthquake is to the maximum possible event. For rough estimates in a specific zone, paleoseismological or geological data [Wells and Coppersmith, 1994] may be helpful. For subduction zones, Kagan and Jackson estimated the corner magnitude mc=9.6 in the MGR model from catalogs and strain rates; this value may also serve for global seismic activity. However, using catalogs alone raises the question of whether an assumed model for the FMD is well sampled by the catalog; this question will be addressed in terms of probability theory in the next section.
 Bell et al. (2013) compare two candidate models for the FMD of the global CMT catalog: first, the unbounded Gutenberg-Richter (GR) model (equation (1) with M=∞) and second, the MGR model (equations (2) or (3)). For each model, they calculate the Bayesian information criterion BIC=−2 log(L)+k log(N) [Schwarz, 1978], where k is the number of free parameters, N is the number of data points, and
$$L = \prod_{i=1}^{N} f_\theta(m_i) \tag{5}$$
is the sample likelihood; here mi denotes the magnitude of the ith earthquake in the catalog. The difference ΔBIC provides information on which model is preferable.
 Bell et al. (2013) claim that “the temporal evolution of ΔBICMGR-GR is inconsistent with random sampling from an unbounded GR distribution, indicating a preference for the MGR model, that is very unlikely to occur by chance”. In the following, I demonstrate that the temporal evolution of the likelihoods in ΔBIC as used by Bell et al. (2013) is governed by the biased estimation of the tail parameter θ and is thus not suitable for comparing models. To this end, I sample earthquake magnitudes from a perfect MGR model and compare the true FMD with the estimated FMD for the synthetic data set. For the latter, a single parameter is fitted, namely the corner magnitude mc, while I use the true value of the shape parameter β; I recall that β and mc are essentially uncorrelated [Kagan and Jackson, 2000]. In detail, I generate 50 earthquake catalogs based on a Poisson process with the same event rate and time coverage as the global CMT catalog from 1977 until mid-2012 and randomly draw magnitudes from the tapered Pareto distribution with shape parameter β=2/3, lower threshold magnitude m0=5.75, and corner magnitude mc=10. With the occurrence of each earthquake, the maximum likelihood estimate of the corner magnitude is calculated, and the estimated distribution is compared with the true distribution by computing Δ log(L)= log(Ltrue)− log(Lestimated) for data from a simulated earthquake sequence of 50 years with the true corner magnitude. If the estimated distribution were an adequate fit to the true distribution, the values of Δ log(L) would be centered closely around Δ log(L)=0. However, the results for the temporal evolution of Δ log(L) for the 50 simulated catalogs shown in Figure 1 are characterized by overall positive values in most cases; thus, the estimated distribution fails to fit the true distribution, even after some thousands of events have been sampled. Moreover, the Δ log(L) values are characterized by considerable scatter, indicating large estimation errors of the corner magnitude.
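A reduced version of this experiment can be sketched in Python. The code is a simplified stand-in for the procedure described above and not the original implementation: it uses a fixed catalog size of 7585 events instead of a Poisson process, a coarse grid search instead of a numerical optimizer, an independent catalog of equal size as evaluation data, and only three repetitions. Magnitudes are sampled with the known representation of a tapered Pareto variate as the minimum of a Pareto and a shifted exponential variate in seismic moment. All function names are hypothetical.

```python
import math
import random

BETA, M0 = 2.0 / 3.0, 5.75  # shape parameter and threshold used in the text

def sample_mgr(n, mc, rng, beta=BETA, m0=M0):
    """Draw n MGR magnitudes as the minimum of a Pareto and a shifted exponential
    variate in seismic moment (both expressed here on the magnitude scale)."""
    out = []
    for _ in range(n):
        m_pareto = m0 - math.log10(1.0 - rng.random()) / (1.5 * beta)
        excess = rng.expovariate(1.0)  # (moment - threshold moment) / corner moment
        m_taper = mc + math.log10(10 ** (1.5 * (m0 - mc)) + excess) / 1.5
        out.append(min(m_pareto, m_taper))
    return out

def mgr_loglik(mags, mc, beta=BETA, m0=M0):
    """Sample log-likelihood of the MGR magnitude PDF for corner magnitude mc."""
    c = 10 ** (-1.5 * mc)
    total = len(mags) * (math.log(1.5 * math.log(10)) + 10 ** (1.5 * m0) * c)
    for m in mags:
        x = 10 ** (1.5 * m) * c
        total += math.log(beta + x) - 1.5 * beta * math.log(10) * (m - m0) - x
    return total

def estimate_mc(mags, grid=None):
    """Grid-search ML estimate of mc with beta held at its true value."""
    grid = grid or [k / 10 for k in range(80, 121)]  # 8.0, 8.1, ..., 12.0
    return max(grid, key=lambda mc: mgr_loglik(mags, mc))

rng = random.Random(1)  # fixed seed, chosen arbitrarily
for _ in range(3):
    fit = sample_mgr(7585, mc=10.0, rng=rng)    # catalog used for fitting
    check = sample_mgr(7585, mc=10.0, rng=rng)  # independent evaluation data
    mc_hat = estimate_mc(fit)
    delta = mgr_loglik(check, 10.0) - mgr_loglik(check, mc_hat)
    print(f"estimated mc = {mc_hat:.1f}, delta log L = {delta:.3f}")
```

Estimates that land on the upper edge of the grid indicate catalogs in which the taper is not constrained at all; a wider grid or a proper optimizer would be needed to handle such cases more carefully.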
 The result of this section can be summarized as follows: the estimated FMD will be strongly biased if the tail parameter θ is estimated by the maximum likelihood method from an undersampled model. At first glance, the 7585 earthquakes in the global catalog seem to be sufficient for statistical estimation of the FMD. However, because the estimation of mc depends solely on a few very large events, the results in Figure 1 clearly indicate that this number is still much too small to allow for an adequate modeling of the distribution tail.
3 The Tail of the Distribution: Model Comparison
 It is a trivial fact that data fitting only works if a reasonable amount of data is available. The rareness of large earthquakes clearly limits the possibility to draw conclusions on the tail of the FMD. However, comparing various model FMDs allows one to give preference to a specific model that fits better than the others, although the overall quality of the fit might be poor. For this purpose, it is essential to take into account that different models usually have a different degree of undersampling by a catalog. If this is ignored, the application of likelihood-based methods will produce misleading results, because each missing earthquake accounts for a missing factor fθ(mi) in equation (5). When the number of missing earthquakes differs for the models under consideration, the comparison will be biased. In particular, the likelihood of undersampled models will be overestimated. This trade-off will be discussed in the remainder of this section. Therefore, two questions have to be addressed properly for all candidate models:
 1. Which models are likely to be well sampled by the global catalog in its magnitude range? This is necessary for the calculation of the correct sample likelihood.
 2. Which model performs best in terms of the highest sample likelihood? Models that fail the criterion in point 1 have to be discarded from further analysis; for the remaining models, likelihood-based methods as in Bell et al. (2013) can be adopted.
 The first point can be addressed by the following consideration. If N random numbers are drawn from a distribution F(x), probability theory “predicts” that the largest number μ has the distribution Pr(μ<x)=F(x)^N with a certain expected value E(μ). Now, if a random sample includes a number Xmax with Xmax≫E(μ), this number is an outlier in terms of a low-probability event, which is sometimes called a “black swan” in popular literature. In this case, the distribution is considered to be not well sampled up to Xmax. On this basis, I define a “well-sampled FMD” in the range [m0, mmax,obs] in the following way: if mmax,obs is clearly higher than the expected value of the distribution Fθ(m)^N, it is considered to be an outlier. Then the distribution is, by definition, not well sampled up to mmax,obs from the catalog. In quantitative terms, the FMD is not well sampled if Pr(μ<mmax,obs)=Fθ(mmax,obs)^N is high (close to one). In the complementary case, if
$$\Pr(\mu > m_{\max,\mathrm{obs}}) = 1 - F_\theta(m_{\max,\mathrm{obs}})^N \tag{6}$$
is close to one, the FMD is considered to be well sampled up to mmax,obs. The shape parameter (b or β) in Fθ(m) is estimated by the maximum likelihood method; I emphasize that this estimation is not affected by data fitting errors as in the case of θ, because the shape parameter is estimated almost independently of θ from the frequently occurring small earthquakes.
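The sampling criterion Pr(μ>mmax,obs)=1−Fθ(mmax,obs)^N is straightforward to evaluate numerically. The following sketch does so for three candidate tails, assuming the fixed shape parameters b=1 and β=2/3 instead of fitted values, so the printed numbers will differ somewhat from those reported for the real catalog. Function names are illustrative.

```python
import math

def gr_survival(m, m0, m_max, b=1.0):
    """Pr(magnitude > m) for the truncated GR law; m_max=math.inf gives unbounded GR."""
    tail = 0.0 if math.isinf(m_max) else 10 ** (-b * (m_max - m0))
    return (10 ** (-b * (m - m0)) - tail) / (1.0 - tail)

def mgr_survival(m, m0, mc, beta=2.0 / 3.0):
    """Pr(magnitude > m) for the tapered Pareto (MGR) model."""
    taper = math.exp(10 ** (1.5 * (m0 - mc)) - 10 ** (1.5 * (m - mc)))
    return 10 ** (-1.5 * beta * (m - m0)) * taper

def prob_max_exceeds(survival, n):
    """Pr(largest of n events > m) = 1 - F(m)^n, with F = 1 - survival."""
    return -math.expm1(n * math.log1p(-survival))

N, M0, M_OBS = 7585, 5.75, 9.1  # catalog size, threshold, largest observed event
candidates = {
    "MGR-8.5": mgr_survival(M_OBS, M0, 8.5),
    "GR-9.2": gr_survival(M_OBS, M0, 9.2),
    "GR-inf": gr_survival(M_OBS, M0, math.inf),
}
for name, s in candidates.items():
    print(f"{name}: Pr(mu > {M_OBS}) = {prob_max_exceeds(s, N):.4f}")
```

Using `log1p` and `expm1` avoids the loss of precision that a direct evaluation of 1−(1−s)^N suffers when the single-event exceedance probability s is very small.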
 For the following analysis, I consider 17 candidate models for the FMD from the GR (equation (1)) and the MGR (equation (3)) family with different tail parameters θ. In particular, I compare nine MGR models (θ=8.5, 9, 9.2, 9.5, 10, 10.5, 11, 11.5, 12) and eight GR models (θ=9.2, 9.5, 10, 10.5, 11, 11.5, 12, ∞). The models are referred to as MGR-θ and GR-θ. Their FMDs are practically identical for small and intermediate earthquakes; only the tail, characterized by θ, differs (see Figure 2). I note that for the MGR models, the maximum observed magnitude can exceed the corner magnitude. For this reason, MGR-8.5 and MGR-9 are included, while for the GR family, θ=8.5 and θ=9 are not possible, because the magnitude mmax,obs=9.1 of the 2011 Tohoku event exceeds θ=9.
 Now the consistency of all models with the global CMT catalog between 1977.0 and 2012.5 and magnitudes m≥5.75 is compared. Figure 2a shows the CDF of the FMD of the global CMT catalog, which is very well fitted by all candidate models. However, a closer look at the distribution tail (Figure 2b) unveils differences: in particular, model MGR-8.5 seems to be favorable. On the other hand, the probability Pr(μ>9.1) in equation (6) is only 0.0007 for MGR-8.5, indicating an enormous degree of undersampling. Because of this clear violation of the criterion in point 1 above, model MGR-8.5 has to be discarded from further analysis; this does not mean that the model is disproved, but rather that it is not testable in the framework of likelihood methods with the given catalog.
 Next, I calculate for each model the Bayesian posterior probability [Irizarry, 2001], assuming that all models have the same prior probability of being favorable for the global catalog,
$$p_i = \frac{L_i}{\sum_j L_j}, \tag{7}$$
where Li is the sample likelihood (equation (5)) of model i and the sum runs over all candidate models.
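With equal priors, the posterior probability of each model is its likelihood divided by the sum of the likelihoods of all candidate models. Since sample likelihoods of several thousand events underflow in floating-point arithmetic, a practical computation works with log-likelihoods. The sketch below shows one common way to do this; the function name and the input values are made up for illustration.

```python
import math

def posterior_probabilities(log_likelihoods):
    """Posterior model probabilities for equal priors: p_i = L_i / sum_j L_j."""
    peak = max(log_likelihoods)  # subtract the maximum to avoid underflow
    weights = [math.exp(ll - peak) for ll in log_likelihoods]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical log-likelihoods of three models; only their differences matter.
print(posterior_probabilities([-9000.0, -9001.0, -9003.5]))
```

Only differences between log-likelihoods enter the result, which is why a model whose likelihood is inflated by undersampling can dominate the posterior probabilities even if all models fit the bulk of the data equally well.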
 Results are shown in Figure 3. While the upper panel shows the values of pi for all models i, the lower panel provides the probabilities from equation (6), indicating the degree of undersampling. The figure illustrates two important features: first, the likelihood values favor models with tail parameters between θ=9 and θ=9.5, with a clear preference for model GR-9.2 (p=0.93); this is not surprising, because θ=9.2 is close to the maximum likelihood estimate of θ, which is, however, biased (section 2). Second, all models that are likely to be well sampled in the magnitude range 5.75≤m≤9.1 of the global catalog have tail values θ≥10.5 and are equally likely in terms of pi, including the unbounded GR model. The probability Pr(μ>9.1) for the preferred model GR-9.2 is only 0.55.
 The objective comparison of the 17 models has the following important consequences: models for the global FMD with θ≤10 are likely to be undersampled; therefore, the high posterior probabilities in Figure 3 are based on overestimated likelihood values. Because of this insufficient support by data, these models are not testable by means of likelihood-based methods and have to be discarded. The remaining models (θ≥10.5) perform equally well overall, and no preference can be given to a specific model. Even the unbounded GR distribution is neither better nor worse than the others. However, from a physical point of view, the relevance of these models is questionable, because the high tail parameters allow for the occurrence of unrealistically large earthquakes.
 One might speculate that these findings are specific to the global CMT catalog or to the chosen families of GR and MGR models. It holds generally, however, that the sample likelihood (equation (5)) depends on the tail of the distribution only if the tail is supported by data. For example, two FMD models that are nearly identical for small and intermediate earthquakes might start to deviate from each other at magnitude m≈10. When the likelihood is calculated from an earthquake catalog with maximum observed magnitude mmax,obs=7, the different tails do not enter into the calculation, and the likelihoods as well as the posterior probabilities pi will be essentially identical. Furthermore, for models with tail parameters around θ=7, the value of θ is close to the maximum likelihood estimate of θ. This estimate will only be useful if the catalog includes numerous earthquakes with magnitudes very close to mmax,obs=7. This is, however, rarely the case, suggesting that the results of this study hold to a good approximation for most situations that one is confronted with when analyzing real-world earthquake catalogs.
 What lesson can be learned from the results in Figure 3 on the FMD of the global CMT catalog? The Gutenberg-Richter model truncated at M=9.2 seems to be preferable. Other models with 9≤θ≤10 have increased posterior probabilities pi as well. However, the origin of the high values of pi is not found in the quality of the data fit; it is rather the different degree of undersampling for each model, leading to different errors in the likelihood. Consequently, these models have to be discarded from the model comparison. Moreover, the models GR-M with M<9.5 can even be disproved by the occurrence of the 1960 M=9.5 earthquake in Chile, which is not included in the catalog of this study. The models with θ≥10.5, including the unbounded GR model, are not distinguishable in terms of pi. In summary, the global earthquake catalog does not allow one to give preference to any particular model family for the FMD (GR or MGR) nor to any particular tail parameter θ. The conclusion to favor any FMD model solely from the analysis of the global CMT catalog is therefore not statistically justifiable. This result may be exacerbated by taking into account that global seismicity arises from a stack of many distributions, each with its own maximum or corner magnitude and different degrees of undersampling.
 Finally, I estimate how many earthquakes are needed to have a sufficient sample for all models with θ≥9, given that mmax,obs=9.1 will not be exceeded during the additional observational period. Equating Pr(μ>9.1) to 99% for these models, this number is calculated from equation (6) as N=45,360 or, alternatively, 212 years of global earthquake history with m≥5.75; including also model MGR-8.5 results in more than 200,000 years. If an earthquake with m>9.1 occurs during the additional period, the required observation interval will grow accordingly. Taking into account the finding of McCaffrey that earthquakes with M>9 have return times between 200 and 1500 years, the number of 212 years has to be considered the most optimistic view.
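The conversion between a target sampling probability, the required number of events, and the corresponding observation time can be sketched as follows. The sketch solves 1−(1−s)^N≥0.99 for N, where s is the probability that a single event exceeds mmax,obs under a given model, and converts event counts to years with the average rate of the catalog (7585 events in 35.5 years). The function names and the example value of s are assumptions made for illustration.

```python
import math

EVENTS_PER_YEAR = 7585 / 35.5  # catalog size over 1977.0-2012.5, as in the text

def required_events(survival, target=0.99):
    """Smallest n with 1 - (1 - survival)^n >= target."""
    return math.ceil(math.log1p(-target) / math.log1p(-survival))

def years_needed(n_events, rate=EVENTS_PER_YEAR):
    """Observation time needed to record n_events at the given yearly rate."""
    return n_events / rate

# Hypothetical single-event exceedance probability of 1e-4.
n = required_events(1e-4)
print(n, "events, i.e. about", round(years_needed(n)), "years")

# The event count quoted in the text corresponds to about 212 years.
print(round(years_needed(45360)))
```

The required number of events scales inversely with s, which is why models with a strongly tapered tail, such as MGR-8.5, need observation times that are orders of magnitude longer.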
 Since the results in this study are almost independent of the specific shape of the assumed tail of the distribution, it is futile to search for additional, potentially better-fitting distributions. In the light of these findings, I suggest refraining from estimating maximum or corner magnitudes from earthquake catalogs alone. Future studies should focus on the assimilation of independent data from geology and tectonics in order to better constrain the maximum possible earthquake magnitude.
 I am grateful to Sebastian Hainzl for a valuable discussion. The manuscript benefitted from constructive comments of Tom Parsons and Euan Smith. This work was supported by the Potsdam Research Cluster for Georisk Analysis, Environmental Change and Sustainability (PROGRESS). For parts of the calculations, I have used the R software package PtProcess (Harte).
 The Editor thanks Tom Parsons and Euan Smith for assistance in evaluating this paper.