## Introduction

Commonly observed patterns follow a few families of probability distributions. For example, Gaussian patterns often arise from measures of height or weight, and gamma patterns often arise from measures of waiting times. These common patterns lead to two questions. How are the different families of distributions related? Why are there so few families, when the possible patterns are essentially infinite?

These questions are important, because one can hardly begin to study nature without some sense of the fundamental contours of pattern and why those contours arise. For example, no one observing a Gaussian distribution of weights in a population would feel a need to give a special explanation for that pattern. The central limit theorem tells us that a Gaussian distribution is a natural and widely expected pattern that arises from measuring aggregates in a certain way.

With other common patterns, such as neutral distributions in biology or power laws in physical phenomena, the current standard of interpretation is much more variable. That variability arises because we do not have a comprehensive theory of how measurement and information shape the commonly observed patterns. Without a clear notion of what is expected in different situations, common and relatively uninformative patterns frequently motivate unnecessarily complex explanations, and surprising and informative patterns may be overlooked (Frank, 2009).

Currently, the differences between families of common probability distributions often seem arbitrary. Thus, little understanding exists of how changes in process or in methods of observation may cause an observed pattern to shift from one common form into another.

We argue that measurement, described by the relation between magnitude and information, unifies the distinct families of common probability distributions. Variations in measurement scale may, for example, arise from varying precision in observations at different magnitudes or from the way that information is lost when measurements are made on aggregates. Our unified explanation of the different commonly observed distributions in terms of measurement points the way to a deeper understanding of the relations between pattern and process.

We develop the role of measurement through maximum entropy expressions for probability distributions. We first note that all probability distributions can be expressed by maximization of entropy subject to constraint. Maximization of entropy is equivalent to minimizing total information while retaining all the particular information known to constrain underlying pattern (Jaynes, 1957a,b, 2003). To obtain a probability distribution of a given form, one simply chooses the informational constraints such that maximization of entropy yields the desired distribution. However, constraints chosen to match a particular distribution only describe the sufficient information for that distribution. To obtain deeper insight into the causes of particular distributions and each distribution's position among related families of distributions, we derive the related forms of constraints through variations in measurement scale.
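To make the maximum-entropy construction concrete, the following is a minimal numerical sketch in plain Python (the finite support and target mean are our illustrative choices, not from the text). Maximizing entropy subject only to a fixed mean yields weights of Boltzmann form p_i ∝ exp(−λx_i); solving for λ by bisection recovers a geometric (discrete exponential) distribution, the discrete analogue of the exponential that maximum entropy gives for a continuous mean constraint.

```python
import math

def maxent_with_mean(support, target_mean, tol=1e-10):
    """Maximum-entropy distribution on `support` with the given mean.

    The constrained solution has the form p_i ∝ exp(-lam * x_i);
    we find lam by bisection, using the fact that the implied mean
    decreases as lam increases.
    """
    def mean_for(lam):
        w = [math.exp(-lam * x) for x in support]
        z = sum(w)
        return sum(x * wi for x, wi in zip(support, w)) / z

    # Assumes target_mean is below the uniform mean, so lam > 0.
    lo, hi = 0.0, 50.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if mean_for(mid) > target_mean:
            lo = mid  # mean too large -> increase lam
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    w = [math.exp(-lam * x) for x in support]
    z = sum(w)
    return [wi / z for wi in w], lam

support = list(range(50))
p, lam = maxent_with_mean(support, target_mean=5.0)

# The result is geometric: successive ratios p[i+1]/p[i] = exp(-lam)
# are constant, the discrete signature of the exponential form.
ratios = [p[i + 1] / p[i] for i in range(10)]
```

The choice of constraint (here, the mean) is exactly the "informational constraint" of the text: a different constraint, such as a fixed variance, would instead yield a Gaussian form.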

Measurement scale expresses information through the invariant transformations of measurements that leave the form of the associated probability distribution unchanged (Frank & Smith, 2010). Each problem has a characteristic form of information invariance and symmetry that sets the measurement scale (Hand, 2004; Luce & Narens, 2008; Narens & Luce, 2008) and the most likely probability distribution associated with that particular scale (Frank & Smith, 2010). We show that measurement scales and the symmetries of information invariances form a natural hierarchy that generates the common families of probability distributions. We use *invariance* and *symmetry* interchangeably, in the sense that symmetry arises when an invariant transformation leaves an object unchanged (Weyl, 1952).
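A small illustrative check of such an invariance, in plain Python (the functions and constants are ours, not the paper's): a power law keeps its form under rescaling of the measurement, whereas an exponential does not, but is instead invariant under a shift. Each distributional form is thus tied to its own characteristic transformation.

```python
import math

# A power law p(x) ∝ x**-a keeps its form under the scale change
# x -> c*x, because p(c*x)/p(x) = c**-a is the same for every x.
a, c, d = 2.5, 3.0, 1.5
power = lambda x: x ** -a
scale_ratios = [power(c * x) / power(x) for x in (1.0, 2.0, 5.0, 10.0)]

# By contrast, an exponential q(x) ∝ exp(-x) is not scale invariant,
# but it is invariant under the shift x -> x + d, because
# q(x + d)/q(x) = exp(-d) is constant in x.
expo = lambda x: math.exp(-x)
shift_ratios = [expo(x + d) / expo(x) for x in (1.0, 2.0, 5.0, 10.0)]
```

In each case the transformation changes the distribution only by a constant factor, which renormalization absorbs, so the form of the distribution is unchanged: the symmetry in the sense of Weyl quoted above.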

The measurement hierarchy arises from two processes. First, we express the forms of information invariance and measurement scale through a continuous group of transformations, showing the relations between different types of information invariance. Second, the types of aggregation and measurement that minimize information and maximize entropy fall into two classes, each class setting a different basis for information invariance and measurement scale.

The two types of aggregation correspond to the two major families of stable distributions that generalize the process leading to the central limit theorem: the Lévy family that includes the Gaussian distribution as a special case and the Fisher-Tippett family of extreme value distributions. By expressing measurement scale in a general way, we obtain a wider interpretation of the families of stable distributions and a broader classification of the common distributions.
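The two routes of aggregation can be seen numerically in a short sketch (plain Python; sample sizes and the exponential base distribution are illustrative choices). Summing blocks of random values drives the aggregate toward the symmetric Gaussian shape, while taking block maxima drives it toward the skewed Gumbel form of the Fisher-Tippett family.

```python
import random
import statistics

random.seed(42)

def skewness(xs):
    """Sample skewness: third central moment over cubed std. dev."""
    m = statistics.fmean(xs)
    s = statistics.pstdev(xs)
    return sum((x - m) ** 3 for x in xs) / (len(xs) * s ** 3)

N_BLOCKS, BLOCK = 20000, 100
sums, maxima = [], []
for _ in range(N_BLOCKS):
    block = [random.expovariate(1.0) for _ in range(BLOCK)]
    sums.append(sum(block))    # aggregation by addition -> CLT route
    maxima.append(max(block))  # aggregation by extremes -> Gumbel route

# Sums of BLOCK exponentials are gamma distributed with skewness
# 2/sqrt(BLOCK) = 0.2, shrinking toward the Gaussian value of 0 as
# BLOCK grows; maxima approach the Gumbel skewness of about 1.14.
skew_of_sums = skewness(sums)
skew_of_maxima = skewness(maxima)
```

The same underlying values thus produce qualitatively different limiting patterns depending on how aggregation discards information, which is the sense in which each class of aggregation sets its own basis for measurement scale.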

Our derivation of probability distributions and their familial relations supersedes the Pearson and similar classifications of continuous distributions (Johnson *et al.*, 1994). Our system derives from a natural description of varying information in measurements under different conditions (Frank & Smith, 2010), whereas the Pearson and related systems derive from phenomenological descriptions that generate distributions without clear grounding in fundamental principles such as measurement and information.

Some recent systems of probability distributions, such as the unification by Morris (1982; Morris & Lock, 2009), provide great insight into the relations between families of distributions. However, Morris's system and other common classifications do not derive from what we regard as fundamental principles, instead arising from descriptions of structural similarities among distributions. We provide a detailed analysis of Morris's system in relation to ours in Appendix C.

We favour our system because it derives the relations between distributions from fundamental principles, such as maximum entropy and the invariances that define measurement scale. Although the notion of what is fundamental will certainly attract controversy, our favoured principles of entropy, symmetries defined by invariances, and measurement scale nonetheless deserve consideration. Our purpose is to show what one can accomplish by starting solely with these principles.