The Theoretical Foundation of Zipf's Law and Its Application to the Bibliographic Database Environment

Authors

  • Jane Fedorowicz

    1. Accounting and Information Systems Department, J. L. Kellogg Graduate School of Management, Northwestern University, Evanston, IL 60201

Abstract

What does the frequency of occurrence of different words in an article have to do with the number of times an article is cited? Or, for that matter, with the number of publications an author has? All of these (word frequency, citation frequency, and publication frequency) obey a ubiquitous distribution called Zipf's law. Zipf's law applies as well to such diverse subjects as income distribution, firm size, and biological genera and species. In 1949, Zipf described a hyperbolic rank-frequency word distribution, which he fitted to a number of texts. He stated that if all unique words in a text are arranged (or ranked) in order of decreasing frequency of occurrence, the product of frequency times rank yields a constant which is approximately equal for all words in the text. The law has been shown to encompass many natural phenomena, and is equivalent to the distributions of Yule, Lotka, Pareto, Bradford, and Price. A ubiquitous empirical regularity suggests some universal principle. This article examines a number of theoretical derivations of the law in order to show the relationship between the many attempts to establish a theoretical justification for the phenomenon. We then briefly examine some of the ramifications of applying the law to the bibliographic database environment.
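
The rank-frequency statement in the abstract can be illustrated with a minimal sketch. The function name `rank_frequency_products`, the tokenizing pattern, and the toy sample are illustrative assumptions and do not come from the article. The sample is constructed so that frequency is exactly proportional to 1/rank, whereas real texts follow the relation only approximately.

```python
from collections import Counter
import re


def rank_frequency_products(text):
    """Return (rank, word, frequency, rank * frequency) tuples,
    ordered from the most frequent word to the least frequent."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    # most_common() sorts by decreasing frequency, which gives the ranking.
    ranked = counts.most_common()
    return [
        (rank, word, freq, rank * freq)
        for rank, (word, freq) in enumerate(ranked, start=1)
    ]


# Hypothetical toy text: frequencies 12, 6, 4, 3 for ranks 1 to 4.
sample = "the " * 12 + "of " * 6 + "and " * 4 + "to " * 3

for rank, word, freq, product in rank_frequency_products(sample):
    print(rank, word, freq, product)
```

For this constructed sample every product of rank and frequency equals 12. Applied to a real text, the products would be expected to cluster around a constant rather than match it exactly.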
