## 1. Introduction

[2] The origin of Benford's law [*Benford*, 1938] goes back to the 19th century, when the astronomer Newcomb [*Newcomb*, 1881] first noticed that library books of logarithms were more thumbed in the earlier pages than in the later ones. He explained how this could arise if the frequencies of first digits were not uniform in real world observations but rather followed the rule

$$P_D = \log_{10}\left(1 + \frac{1}{D}\right), \qquad (1)$$

where *P*_{D} is the probability of the first (non-zero) digit *D* occurring (*D* = 1, …, 9). For example, the real numbers 123.0 and 0.016 both have *D* = 1, and the digit law suggests that numbers beginning with a 1 will occur about 30% of the time in nature, while those with a first digit of 2 will occur about 17.6% of the time, and so on down to first digits of 9 occurring about 4.6% of the time (see Table 1). This decreasing trend of probabilities with digit is shown as a histogram in Figure 1. The implications of the digit rule are significant: not only is the distribution non-uniform, implying that digit frequencies are not independent, but for the rule to be true it must also hold irrespective of the units of the data and of their source. Hence a universal property of real world measurements is implied. The result was rediscovered in 1938 by an engineer called Benford [*Benford*, 1938]. Benford also extended the law to an arbitrary base, *B*, and to multiple digits, *N*. In this case equation (1) is unchanged except that the logarithm base becomes *B* and *D* represents the corresponding *N*-digit integer. (With two digits there are 90 possibilities for *D*, i.e., *D* = 10, 11, …, 99. As the number of digits increases, the probability distribution in (1) tends toward uniformity.)
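As a rough illustration of the digit rule described above, the following Python sketch evaluates the probabilities for single digits and for two-digit leading integers, assuming the rule takes its standard form *P*_{D} = log_{B}(1 + 1/*D*). The function names `benford_probability` and `first_digit` are hypothetical and chosen here only for illustration; they do not come from the original work.

```python
import math


def benford_probability(d, base=10):
    """Probability that the leading digit string equals the integer d.

    Assumes the standard form P_D = log_base(1 + 1/d). For N leading
    digits, d runs over the N-digit integers (e.g. 10..99 for N = 2).
    """
    if d < 1:
        raise ValueError("d must be a positive integer")
    return math.log(1 + 1 / d, base)


def first_digit(x):
    """First non-zero decimal digit of a non-zero real number."""
    if x == 0:
        raise ValueError("zero has no first non-zero digit")
    # Scientific notation puts the first significant digit at index 0.
    return int(f"{abs(x):.15e}"[0])


# Single-digit probabilities, in percent, for D = 1..9.
for d in range(1, 10):
    print(d, round(100 * benford_probability(d), 2))

# The probabilities sum to one for one digit and for two digits.
print(sum(benford_probability(d) for d in range(1, 10)))
print(sum(benford_probability(d) for d in range(10, 100)))

# Both example numbers from the text share the first digit 1.
print(first_digit(123.0), first_digit(0.016))
```

The two-digit sum also equals one because the terms telescope to log_{10}(100/10), which is one way to see that the multi-digit form remains a proper probability distribution.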

Table 1. First Digit Frequencies (%)^{a}

| Data Set | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | Number of Values in Each Data Set | Dynamic Range of the Data (max/min) |
|---|---|---|---|---|---|---|---|---|---|---|---|
| *P*_{D} | 30.1 | 17.6 | 12.49 | 9.69 | 7.92 | 6.69 | 5.80 | 5.12 | 4.58 | | |
| Geomagnetic field | 28.9 | 17.7 | 13.3 | 9.4 | 8.1 | 6.9 | 6.1 | 5.1 | 4.5 | 36512 | 10^{10} |
| Geomagnetic reversals | 32.3 | 19.4 | 13.9 | 11.8 | 5.3 | 4.3 | 3.2 | 5.4 | 4.3 | 93 | 10^{3} |
| Seismic wavespeeds below SW Pacific | 30.0 | 17.6 | 13.3 | 9.8 | 7.9 | 6.4 | 5.6 | 4.89 | 4.47 | 423776 | 10^{6} |
| Earth's gravity | 33.0 | 16.6 | 11.2 | 8.5 | 7.5 | 6.7 | 5.94 | 5.57 | 5.03 | 25917 | 10^{9} |
| Exoplanet masses | 33.9 | 15.4 | 10.7 | 9.2 | 6.23 | 9.47 | 5.98 | 4.48 | 4.48 | 401 | 10^{5} |
| Pulsar rotation frequencies | 33.9 | 20.7 | 12.7 | 7.6 | 5.3 | 5.0 | 4.94 | 4.67 | 4.88 | 1861 | 10^{4} |
| Fermi space telescope γ-ray source fluxes | 30.3 | 17.9 | 13.0 | 9.9 | 7.6 | 6.96 | 5.23 | 5.23 | 2.72 | 1451 | 10^{5} |
| Earthquake depths | 31.6 | 16.9 | 14.0 | 8.69 | 6.98 | 7.42 | 5.27 | 4.58 | 4.36 | 248915 | 10^{2} |
| S-A seismogram | 28.4 | 15.7 | 12.5 | 9.6 | 8.97 | 7.37 | 6.52 | 6.04 | 4.93 | 24000 | 10^{5} |
| Greenhouse gas emissions by country | 29.9 | 17.9 | 11.4 | 7.6 | 9.2 | 8.15 | 5.97 | 4.89 | 4.89 | 184 | 10^{4} |
| Global temperature anomalies in period 1880–2008 | 27.7 | 19.4 | 12.7 | 12.1 | 8.9 | 5.4 | 6.61 | 4.32 | 2.81 | 1527 | 10^{2} |
| Fundamental physical constants | 34.0 | 18.4 | 9.2 | 8.28 | 8.58 | 7.36 | 3.37 | 5.21 | 5.52 | 326 | 10^{4} |
| Global infectious disease cases | 33.7 | 16.7 | 13.2 | 10.7 | 7.3 | 5.4 | 4.56 | 5.07 | 3.34 | 987 | 10^{6} |
| Geometric series | 29.8 | 17.4 | 13.0 | 10.0 | 7.8 | 6.6 | 5.8 | 5.0 | 4.6 | 1000 | 10^{21} |
| Fibonacci sequence | 30.0 | 17.7 | 12.5 | 9.6 | 8.0 | 6.7 | 5.7 | 5.3 | 4.5 | 1000 | 10^{14} |
| Combined | 30.9 | 17.4 | 13.2 | 9.0 | 7.6 | 6.4 | 5.7 | 4.8 | 5.0 | 10000 | 10^{33} |

^{a} The first row is the expected percentage according to Benford's law; the second row is Earth's geomagnetic field model *gufm*1 [*Jackson et al.*, 2000]; the third row is the estimated time in years between reversals of Earth's geomagnetic field for the past 84 million years [*Cande and Kent*, 1995]; the fourth row is seismic body P-wavespeeds of Earth's mantle below the SW Pacific estimated from the inversion of seismic travel times [*Gorbatov and Kennett*, 2003]; the fifth row is spherical harmonic coefficients, up to degree 160, of Earth's gravity field (model GGM02S) based on the analysis of 363 days of GRACE in-flight data, spread between April 4, 2002 and Dec 31, 2003 [*Tapley et al.*, 2005]; the sixth row is masses of extrasolar planets taken from the interactive ExtraSolar Planet Catalogue (URL http://www.exoplanet.eu); the seventh row is barycentric rotation frequencies of known pulsars (in Hz) from the ATNF catalogue [*Manchester et al.*, 2005]; the eighth row is photon fluxes, in photons/cm^{2}/s, for 1451 bright objects identified by the Fermi Gamma-ray Space Telescope across the galactic in the first 11 months of operation, August 2008–July 2009, taken from the LAT 1-year point source catalog (URL http://fermi.gsfc.nasa.gov/ssc/data/access/lat/1yr_catalog/); the ninth row is earthquake depths taken from the National Earthquake Information Catalogue (with artificially assigned values at 5, 11, and 33 km removed); the tenth row is displacement counts measured on a seismometer in Peru (station NNA) for the first 20 minutes following the first recording of the 2004 Sumatra-Andaman earthquake; the eleventh row is emissions of greenhouse gases per country in million tons *CO*_{2} equivalent for 2005 [*Baumert et al.*, 2010]; the twelfth row is global monthly averaged temperature anomalies from the *gistemp* database over the period 1880–2008 measured in degrees with base period 1951–1980 [*Hansen et al.*, 1994]; the thirteenth row is CODATA recommended values for fundamental physical constants [*Mohr et al.*, 2008]; the fourteenth row is total numbers of cases of 18 infectious diseases reported to the World Health Organization by 193 countries worldwide in 2007 [*World Health Organization*, 2009]; the fifteenth row is values from a geometric series (*a*_{o}*r*^{n−1}, *n* = 1, …, 10^{4}) with starting point *a*_{o} = *π* and factor *r* = 1.05; and the sixteenth row is terms in the Fibonacci series *F*_{n} = *F*_{n−1} + *F*_{n−2} (*F*_{0} = 0, *F*_{1} = 1). The last row with label “Combined” is the first digit distribution of randomly selected values from all fifteen data sets (each set weighted equally).
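The two synthetic entries in Table 1 (the geometric series with *a*_{o} = *π* and *r* = 1.05, and the Fibonacci series) are straightforward to regenerate. The following Python sketch tallies first digit percentages for both. It is only an illustration: the helper names are hypothetical, the choice of 1000 terms is an assumption taken from the number of values listed in the table, and the Fibonacci terms are started at *F*_{1} because *F*_{0} = 0 has no first non-zero digit. The original computation may differ in detail, so the output need not match the tabulated values exactly.

```python
import math
from collections import Counter


def leading_digit(x):
    """First non-zero decimal digit of a positive number."""
    if isinstance(x, int):
        # Exact for arbitrarily large integers such as Fibonacci terms.
        return int(str(x)[0])
    # Scientific notation puts the first significant digit at index 0.
    return int(f"{x:.15e}"[0])


def digit_percentages(values):
    """Percentage of values whose first digit is 1, 2, ..., 9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: 100.0 * counts[d] / total for d in range(1, 10)}


def geometric_series(n_terms, a0=math.pi, r=1.05):
    """Terms a0 * r**(n - 1) for n = 1..n_terms."""
    return [a0 * r ** (n - 1) for n in range(1, n_terms + 1)]


def fibonacci_terms(n_terms):
    """Terms F_1, F_2, ..., F_n with F_1 = F_2 = 1 (F_0 = 0 is skipped)."""
    terms, a, b = [], 1, 1
    for _ in range(n_terms):
        terms.append(a)
        a, b = b, a + b
    return terms


N_TERMS = 1000  # assumption: matches the count column in the table

geo = digit_percentages(geometric_series(N_TERMS))
fib = digit_percentages(fibonacci_terms(N_TERMS))

print("digit  Benford  geometric  Fibonacci")
for d in range(1, 10):
    expected = 100.0 * math.log10(1 + 1 / d)
    print(f"{d:5d}  {expected:7.2f}  {geo[d]:9.2f}  {fib[d]:9.2f}")
```

Integers are handled through their decimal string so that large Fibonacci terms keep their exact first digit, while the floating-point geometric terms are read off from scientific notation.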

[3] Benford showed that 20,229 real numbers drawn from 20 sources all approximately followed the same first digit rule. These included populations of cities, financial data and American baseball league averages. Benford's results were well known in mathematical circles and, despite a waning of interest, his name became associated with the law. Thirty years later the same first digit distribution was noticed in numbers encountered by computers [*Knuth*, 1968]. This led to the suggestion that advance knowledge of the digit frequencies encountered by computers might be used to optimize their design, although this appears never to have been implemented. It has also been suggested that Benford's law (hereafter BL) may provide a novel way of testing realism in mathematical models of physical processes [*Hill*, 1998]. If quantities associated with those processes are known to satisfy BL, then computer simulations of them should satisfy it as well. More recently BL has been shown to hold in stock prices [*Ley*, 1996] and some election results (B. F. Roukema, Benford's Law anomalies in the 2009 Iranian presidential election, ArXiv:0906.2789v3, 2009).