Power-law frequency distributions characterize a wide array of natural phenomena. In ecology, biology, and many physical and social sciences, the exponents of these power laws are estimated to draw inference about the processes underlying the phenomenon, to test theoretical models, and to scale up from local observations to global patterns. Therefore, it is essential that these exponents be estimated accurately. Unfortunately, the binning-based methods traditionally used in ecology and other disciplines perform quite poorly. Here we discuss more sophisticated methods for fitting these exponents based on cumulative distribution functions and maximum likelihood estimation. We illustrate their superior performance at estimating known exponents and provide details on how and when ecologists should use them. Our results confirm that maximum likelihood estimation outperforms other methods in both accuracy and precision. Because of the use of biased statistical methods for estimating the exponent, the conclusions of several recently published papers should be revisited.
Power laws have a long history in ecology and other disciplines (Bak 1996, Brown et al. 2002, Newman 2005). Power-law relationships appear in a wide variety of physical, social, and biological systems and are often cited as evidence for fundamental processes that underlie the dynamics structuring these systems (Bak 1996, Brown et al. 2002, Newman 2005). There are two major classes of power laws commonly reported in the ecological literature. The first are bivariate relationships between two variables. Examples of this type of relationship include the species–area relationship and body-size allometries. Standard approaches to analyzing this type of data are generally reasonable and discussions of statistical issues related to this kind of data are presented elsewhere (e.g., Warton et al. 2006). The second type of power law, and the focus of this paper, is the frequency distribution, where the frequency of some event (e.g., the number of individuals) is related to the size, or magnitude, of that event (e.g., the size of the individual).
Power-law frequency distributions have the basic form

$$f(x) = c x^{\lambda}$$

where c and λ are constants, and λ is called the exponent and is typically negative (i.e., λ < 0). Because f(x) is a probability density function (PDF) the value of c is a simple function of λ and the minimum and maximum values of x (Table 1). The specific form of the PDF depends on whether the data are continuous or discrete, on the presence of minimum and maximum values, and on whether λ is <−1 or >−1. The different forms are often given distinct names for clarity (see Table 1).
Table 1. Descriptions of different power-law frequency distributions, including the name of the distribution, the range of data and parameter values over which it applies, its probability density function (PDF; or probability mass function) f(x), its cumulative distribution function F(x), and the maximum likelihood estimate (MLE) for λ based on the PDF.
There is substantial interest in using the parameters of these power-law distributions to make inferences about the processes underlying the distributions, to test mechanistic models, and to estimate and predict patterns and processes operating beyond the scope of the observed data. For example, power-law species abundance distributions with λ ≈ −1 are considered to represent evidence for the primary role of stochastic birth–death processes, combined with species input, in community assembly (Pueyo 2006, Zillio and Condit 2007); quantitative models of tree size distributions make specific predictions (e.g., λ = −2; Enquist and Niklas 2001) that can be used to test these models (Coomes et al. 2003, Muller-Landau et al. 2006); and power-law frequency distributions of individual size have been used to scale up from individual observations to estimate ecosystem level processes (Enquist et al. 2003, Kerkhoff and Enquist 2006).
Here, we: (1) describe the different approaches used to quantify the exponents of power-law frequency distributions; (2) show that some of these approaches give biased estimates; (3) illustrate the superior performance of some approaches using Monte Carlo methods; (4) make recommendations for best estimating parameters of power-law distributed data; and (5) show that some of the conclusions of recent studies are affected by the use of biased statistical techniques.
Methods for Estimating the Exponent
Linear binning
Perhaps the most intuitive way to quantify an empirical frequency distribution is to bin the observed data using bins of constant linear width. This generates the familiar histogram. Specifically, linear binning entails choosing bins of constant width ($w = x_{i+1} - x_i$ for bin $i$), counting the number of observations in each bin (i.e., with values of x between $x_i$ and $x_i + w$), and plotting this count against the value of x at the center of the bin ($x_i/2 + x_{i+1}/2$). If the counts are divided by the sum of all the counts, this plot is an estimate of the probability density function, f(x). The traditional approach to estimating the power-law exponent is to fit a linear regression to log-transformed values of f(x) and x, with the slope of the line giving an estimate of the exponent, λ. Bins with zero observations are excluded, because log(0) is undefined, and sometimes bins with low counts are also excluded (e.g., Enquist and Niklas 2001). While in practice the choice of bin width is normally arbitrary, this choice represents a trade-off between the number of bins analyzed (i.e., the resolution of the frequency distribution) and the accuracy with which each value of f(x) is estimated (fewer observations per bin provide a poorer density estimate; Pickering et al. 1995).
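The linear-binning procedure described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used for the analyses reported here: the function names `ols_slope` and `linear_binning_exponent`, the `min_count` argument, and the choice to start the first bin at the smallest observation are all hypothetical.

```python
import math

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def linear_binning_exponent(data, width, min_count=1):
    """Estimate lambda by regressing log f(x) on log x over constant-width bins."""
    lo = min(data)                        # assumption: first bin starts at the minimum
    counts = {}
    for x in data:
        k = int((x - lo) // width)        # index of the bin containing x
        counts[k] = counts.get(k, 0) + 1
    n = len(data)
    log_x, log_f = [], []
    for k, c in counts.items():
        if c < min_count:                 # optionally drop low-count bins
            continue                      # (empty bins never enter the dict)
        center = lo + (k + 0.5) * width   # value of x at the centre of the bin
        log_x.append(math.log(center))
        log_f.append(math.log(c / n))     # count divided by the sum of all counts
    return ols_slope(log_x, log_f)

# Hypothetical usage: eight observations, bins of width 1.
sizes = [1.0, 1.2, 1.4, 1.9, 2.1, 2.6, 3.3, 5.8]
estimate = linear_binning_exponent(sizes, width=1.0)
```

Changing `width` or `min_count` in this sketch changes the returned slope, which is the sensitivity to binning decisions discussed in the text.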
Simple logarithmic binning
This approach is similar to linear binning, except that instead of the bins having constant linear width, they have constant logarithmic width, $b = \log(x_{i+1}) - \log(x_i)$. The estimate of λ is obtained by log-transforming the values of x and following the procedure described in the previous section. Since the x data are transformed to begin with, it is not necessary to transform the bin centers again prior to fitting the regression. For power-law-like distributions, an advantage of logarithmic binning is the reduction of the number of zero and low-count bins at larger values of x because the linear width of a bin increases linearly with x; i.e., $w_i = x_i(e^b - 1)$. However, this means that the number of observations within each bin is determined not only by x, but also by the linear width of the bin. Therefore, the slope of the regression will give an estimate of λ + 1, not λ (Appendix A; Han and Straskraba 1998, Bonnet et al. 2001, Sims et al. 2007).
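The claim that the regression slope estimates λ + 1 can be checked with a small sketch. The function name `simple_log_binning_slope`, the `min_count` filter, and the stand-in data (evenly spaced quantiles of a Pareto distribution with λ = −2 and minimum 1, used instead of a random sample so that the result is reproducible) are assumptions made for illustration.

```python
import math

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def simple_log_binning_slope(data, b, min_count=1):
    """Slope of log(count) on log(x) for bins of constant logarithmic width b."""
    a = min(data)                           # assumption: first bin starts at the minimum
    counts = {}
    for x in data:
        k = int(math.log(x / a) // b)       # index of the logarithmic bin
        counts[k] = counts.get(k, 0) + 1
    log_x, log_count = [], []
    for k, c in counts.items():
        if c >= min_count:
            log_x.append(math.log(a) + (k + 0.5) * b)   # bin centre on the log scale
            log_count.append(math.log(c))               # counts are NOT normalized
    return ols_slope(log_x, log_count)

# Stand-in data: deterministic quantiles of a Pareto with lambda = -2, minimum 1.
n, lam = 10000, -2.0
data = [(1.0 - (i + 0.5) / n) ** (1.0 / (lam + 1.0)) for i in range(n)]

slope = simple_log_binning_slope(data, b=0.5, min_count=10)
corrected = slope - 1.0   # the slope estimates lambda + 1, so subtract 1
```

For these stand-in data the uncorrected `slope` falls near −1 rather than −2, and only `corrected` is close to the exponent used to build the data.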
Normalized logarithmic binning
The problem of increasing linear width of logarithmic bins can be dealt with by normalizing the number of observations in each bin by the linear width of the bin, w. This converts the counts into densities (number of observations per unit of x; Bonnet et al. 2001, Christensen and Moloney 2005). The linear width of a logarithmic bin can be calculated as $x_i(e^b - 1)$ (Appendix A). This normalization approach is typically used in the characterization of aquatic size-spectra and power-law distributions in physics (Kerr and Dickie 2001, Christensen and Moloney 2005). It removes the artifact from traditional logarithmic binning while maintaining the advantage of using larger bins where there are fewer values of x. An alternative approach is to use simple logarithmic binning and subtract one from the estimated exponent (Han and Straskraba 1998, Bonnet et al. 2001).
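A minimal sketch of the normalization step follows. It divides each bin count by the bin's linear width, $x_i(e^b - 1)$, before regressing. The function name `normalized_log_binning_exponent`, the `min_count` filter, and the deterministic stand-in data (quantiles of a Pareto distribution with λ = −2 and minimum 1) are assumptions made for the example.

```python
import math

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def normalized_log_binning_exponent(data, b, min_count=1):
    """Estimate lambda from logarithmic bins whose counts are divided by bin width."""
    a = min(data)                                   # assumption: bins start at the minimum
    n = len(data)
    counts = {}
    for x in data:
        k = int(math.log(x / a) // b)               # index of the logarithmic bin
        counts[k] = counts.get(k, 0) + 1
    log_x, log_density = [], []
    for k, c in counts.items():
        if c < min_count:
            continue
        left_edge = a * math.exp(k * b)             # x_i, lower edge of bin i
        width = left_edge * (math.exp(b) - 1.0)     # linear width w_i = x_i (e^b - 1)
        log_x.append(math.log(a) + (k + 0.5) * b)   # bin centre on the log scale
        log_density.append(math.log(c / (n * width)))
    return ols_slope(log_x, log_density)

# Stand-in data: deterministic quantiles of a Pareto with lambda = -2, minimum 1.
n, lam = 10000, -2.0
data = [(1.0 - (i + 0.5) / n) ** (1.0 / (lam + 1.0)) for i in range(n)]
estimate = normalized_log_binning_exponent(data, b=0.5, min_count=10)
```

Because the log of the bin width grows by exactly b from one bin to the next, this estimate equals the simple logarithmic binning slope minus one, which is why the two corrections mentioned in the text are interchangeable.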
Fitting the cumulative distribution function
An alternative to binning methods is to work with the cumulative distribution function (CDF). The CDF describes the probability that a random variable, X, drawn from f(x) is ≤ x. The CDF is straightforward to construct for a set of observed data, and no binning is required. To construct the CDF, first rank the n observed values ($x_i$) from smallest to largest (i = 1 … n). The probability that an observation is less than or equal to $x_i$ (the CDF) is then estimated as i/n (this is the Kaplan-Meier estimate; Evans et al. 2000). Analyzing the CDF avoids the subjective influence of the choice of bin width and the problem of empty bins. Having determined the CDF for a power-law distribution, the exponent, λ, of the probability density function (PDF) can be estimated using regression. The traditional approach is to transform the equation for the CDF such that the slope of a linear equation is a function of λ. The linearized equation differs among distributions (Appendix A). The slope of the regression will be equal to λ + 1, making it necessary to subtract 1 to obtain λ (Bonnet et al. 2001, Rinaldo et al. 2002).
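As a sketch of the CDF approach for the simplest case, assume a Pareto distribution with minimum a and λ < −1, for which F(x) = 1 − (x/a)^(λ+1) and therefore log(1 − F(x)) is linear in log(x) with slope λ + 1. This linearization is an assumption of the example; the forms for other distributions are given in Appendix A. The function name `cdf_exponent` and the decision to drop the largest observation (where 1 − i/n = 0 and the logarithm is undefined) are also assumptions.

```python
import math

def ols_slope(xs, ys):
    """Ordinary least-squares slope of ys regressed on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

def cdf_exponent(data):
    """Estimate lambda by regressing log(1 - F) on log(x), with F estimated as i/n."""
    xs = sorted(data)                               # rank from smallest to largest
    n = len(xs)
    log_x, log_survival = [], []
    for i, x in enumerate(xs[:-1], start=1):        # skip i = n, where 1 - i/n = 0
        log_x.append(math.log(x))
        log_survival.append(math.log(1.0 - i / n))
    return ols_slope(log_x, log_survival) - 1.0     # slope is lambda + 1

# Stand-in data: deterministic quantiles of a Pareto with lambda = -2, minimum 1.
n, lam = 10000, -2.0
data = [(1.0 - (i + 0.5) / n) ** (1.0 / (lam + 1.0)) for i in range(n)]
estimate = cdf_exponent(data)
```

No bin width appears anywhere in this sketch, which is the practical advantage of the CDF approach noted above.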
Maximum likelihood estimation
Maximum likelihood estimation (MLE) is one of the preferred approaches for estimating frequency distribution parameters (e.g., Rice 1994). MLE determines the parameter values that maximize the likelihood of the model (in this case, a power law with an unknown exponent) given the observed data. Specifically, MLE finds the value of λ that maximizes the product of the probabilities of each observed value of x (i.e., the product of f(x) evaluated at each data point; see Rice [1994] for a good introduction to maximum likelihood methods). The specific solution for the maximum likelihood estimate of λ, and whether that solution is closed form or requires numerical methods, depends on the minimum and maximum values of x and on the value of λ (Table 1). Alternatively, the likelihood can be maximized directly using numerical methods (Clauset et al. 2007, Zillio and Condit 2007). While MLE does not provide an opportunity for visual inspection of the distribution to determine if the assumption of the power-law functional form is reasonable, the validity of this assumption can be assessed using simple goodness-of-fit tests such as the chi-square on binned data (Clark et al. 1999, Clauset et al. 2007, Edwards et al. 2007), or by visually assessing the linearity of binned data, or the CDF (Benhamou 2007), under the appropriate transformation.
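For the continuous Pareto case (x ≥ a, λ < −1) the maximum likelihood estimate has a closed form, which the sketch below implements. The formula is derived here from f(x) = c x^λ by setting the derivative of the log-likelihood to zero, giving λ = −1 − n / Σ ln(x_i / a); Table 1 should be consulted for the other distributions. The function name `pareto_mle_exponent` and the default of using the smallest observation as a are assumptions.

```python
import math

def pareto_mle_exponent(data, a=None):
    """Closed-form MLE of lambda for f(x) = c * x**lambda on x >= a, lambda < -1.

    Maximizing sum(log f(x_i)) over lambda gives
        lambda_hat = -1 - n / sum(log(x_i / a)).
    """
    if a is None:
        a = min(data)            # assumption: minimum taken from the data
    n = len(data)
    total = sum(math.log(x / a) for x in data)
    return -1.0 - n / total      # undefined if every observation equals a

# Stand-in data: deterministic quantiles of a Pareto with lambda = -2, minimum 1.
n, lam = 10000, -2.0
data = [(1.0 - (i + 0.5) / n) ** (1.0 / (lam + 1.0)) for i in range(n)]
estimate = pareto_mle_exponent(data, a=1.0)
```

Every observation enters the estimate directly through its logarithm, so no binning or regression step is involved.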
Comparing the Methods
While uncorrected simple logarithmic binning clearly provides incorrect estimates of λ, the alternative approaches discussed above all seem reasonable and intuitive. However, the different approaches do not perform equally well, and some produce biased estimates of the exponent (e.g., Pickering et al. 1995, Clark et al. 1999, Sims et al. 2007). We applied Monte Carlo methods to illustrate the advantages and disadvantages of the various approaches and to explore cases relevant to ecology that have not been previously addressed. Monte Carlo methods generate data that are, by definition, power-law distributed with known exponents, making it possible to compare the performance of the different techniques in estimating the value of λ.
We generated power-law distributed random numbers using the inverse transformation method for the Pareto distribution (Ross 2006), and using the rejection method for the discrete Pareto distribution (Devroye 1986). Each analysis consisted of the following: (1) generating 10 000 Monte Carlo data sets for each point in the analysis (e.g., for each sample size), (2) estimating the exponent for each data set using the methods described previously, and (3) comparing the performance of the methods based on bias (i.e., accuracy) and on the variance in the estimate (i.e., precision). We report on simulated distributions generated using λ = −2 and a = 1. The results for other combinations of parameters are qualitatively similar. We also evaluated the influence of sample size on the various estimation techniques, and for binning-based approaches we evaluated the effect of bin width on the analysis.
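The Monte Carlo design described above can be sketched as follows for the continuous Pareto case. Inverse transformation solves F(x) = u for x, where F(x) = 1 − (x/a)^(λ+1) and u is uniform on [0, 1). The function names, the seed, and the small number of replicates (200 rather than the 10 000 used in the analyses) are assumptions chosen to keep the example fast; the estimator shown is the closed-form Pareto MLE derived from f(x) = c x^λ.

```python
import math
import random

def pareto_sample(n, lam, a, rng):
    """Inverse-transform draws from f(x) = c * x**lam on x >= a (lam < -1)."""
    # rng.random() lies in [0, 1), so 1 - u lies in (0, 1] and x >= a.
    return [a * (1.0 - rng.random()) ** (1.0 / (lam + 1.0)) for _ in range(n)]

def pareto_mle_exponent(data, a):
    """Closed-form Pareto MLE: lambda_hat = -1 - n / sum(log(x_i / a))."""
    return -1.0 - len(data) / sum(math.log(x / a) for x in data)

def monte_carlo_summary(estimator, n, lam=-2.0, a=1.0, reps=200, seed=1):
    """Return (bias, variance) of an estimator over simulated data sets."""
    rng = random.Random(seed)
    estimates = [estimator(pareto_sample(n, lam, a, rng), a) for _ in range(reps)]
    mean = sum(estimates) / reps
    bias = mean - lam                                   # accuracy
    variance = sum((e - mean) ** 2 for e in estimates) / (reps - 1)   # precision
    return bias, variance

bias, variance = monte_carlo_summary(pareto_mle_exponent, n=1000)
```

Any other estimator with the same `(data, a)` signature can be passed to `monte_carlo_summary` to compare methods on identical simulated data.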
Uncorrected simple logarithmic binning gives the wrong exponent
Non-normalized logarithmic binning does not estimate λ; it estimates λ + 1 (Han and Straskraba 1998, Bonnet et al. 2001, Sims et al. 2007). Therefore if simple logarithmic binning is used, and an estimate of λ is the desired result, then it is necessary to subtract 1 from the slope of the logarithmically binned data. Not doing so will give the wrong value for the exponent.
Binning-based approaches perform poorly
Linear binning performs poorly by practically any measure. In most cases it produces biased estimates of the exponent and its estimates are highly variable (Figs. 1 and 2). In addition, the estimated exponent is highly dependent on the choice of bin width, and this dependency varies as a function of sample size (Fig. 3). While normalized logarithmic binning performs better than linear binning, its estimates are also dependent on the choice of bin width and are more variable than alternate approaches. Our results are based on recommended practices in binning analyses (following Pickering et al. 1995). Many alternative approaches to constructing bins and performing regressions on binned data are conceivable, and it is possible that some of these may improve the performance of the estimates. However, this highlights the fact that binning-based methods are sensitive to a variety of decisions, and it appears that no amount of tweaking will be able to produce a consistent binning-based method for estimating the exponent. In general, binning results in a loss of information about the distributions of points within a bin and is thus expected to perform poorly (Clauset et al. 2007, Edwards et al. 2007). Therefore, while binning is useful for visualizing the frequency distribution, and normalized logarithmic binning performs well at this task, binning-based approaches should be avoided for parameter estimation (Clauset et al. 2007).
Maximum likelihood estimation performs best
While fitting the cumulative distribution function (CDF) generally produces good results, estimates of λ using the CDF approach are often biased at small sample sizes and are consistently more variable than those using maximum likelihood estimation (MLE; Fig. 2; Clark et al. 1999, Newman 2005). This is probably because the logarithmic transformation used in fitting the CDF weights a small number of points more heavily, and because the points in the CDF are not independent, thus violating regression assumptions (see Clauset et al. [2007] for other issues with regression-based approaches). While alternative approaches to fitting the CDF (e.g., nonlinear regression) could improve the performance of this estimator, MLE has been shown mathematically to be the single best approach for estimating power-law exponents (i.e., it is the minimum variance unbiased estimator; Johnson et al. 1994, Clark et al. 1999, Newman 2005). In addition, MLE produces valid confidence intervals for the estimated exponent (Appendix A), which the other methods do not (Clark et al. 1999, Newman 2005, Clauset et al. 2007).
Minimum and maximum values
Minimum and maximum attainable values of ecological quantities can result either from natural limits on the quantity being measured (e.g., trees cannot grow above some maximum size), or from methodological limits on the values that can be observed (e.g., fires <1 ha are not recorded). In addition, the power-law form of the distribution may not hold over the entire range of x, making it necessary to select a restricted range of x on which to estimate the exponent. While binning-based approaches do not assume particular limits on x (but see Pickering et al. 1995), CDF and MLE approaches assume the minimum and maximum attainable values of x given in Table 1. In some cases these limits may be known, but if not, it may be necessary to estimate them (e.g., Kijko 2004, Clauset et al. 2007). Because maximum likelihood estimation for the truncated Pareto requires numerical methods, it has been suggested that, in some cases with both a minimum and a maximum value, the error introduced by assuming that there is no maximum is small enough that it is reasonable to estimate the exponent using the maximum likelihood estimate for the Pareto distribution. Clark et al. (1999) suggest this approximation in cases where the maximum value is at least two orders of magnitude greater than the minimum; that is, max (x) > 100 × min (x).
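One way to carry out the numerical maximization for the truncated case is sketched below. It assumes the truncated Pareto density f(x) = (λ + 1) x^λ / (b^(λ+1) − a^(λ+1)) on a ≤ x ≤ b, which is an assumption of this example (Table 1 gives the form used in the analyses). The golden-section search, the search bracket, and all function names are illustrative choices rather than the method used here; the search relies on the log-likelihood being concave in λ, which holds for this density.

```python
import math

def golden_section_max(f, lo, hi, tol=1e-8):
    """Locate the maximum of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:                       # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:                             # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0

def truncated_pareto_mle(data, a, b, bracket=(-10.0, 5.0)):
    """Numerical MLE of lambda for a power law truncated to [a, b]."""
    n = len(data)
    sum_log_x = sum(math.log(x) for x in data)
    log_ratio = math.log(b / a)

    def loglik(lam):
        mu = lam + 1.0
        if mu == 0.0:                     # limiting case lambda = -1
            log_c = -math.log(log_ratio)
        else:                             # log of mu / (b**mu - a**mu), computed stably
            log_c = math.log(mu / math.expm1(mu * log_ratio)) - mu * math.log(a)
        return n * log_c + lam * sum_log_x

    return golden_section_max(loglik, bracket[0], bracket[1])

def pareto_mle_exponent(data, a):
    """Closed-form MLE that ignores the upper limit (the approximation in the text)."""
    return -1.0 - len(data) / sum(math.log(x / a) for x in data)

# Stand-in data: deterministic quantiles of a truncated Pareto,
# lambda = -2 on [1, 100], i.e. a maximum exactly 100 times the minimum.
n, lam, a, b = 5000, -2.0, 1.0, 100.0
mu = lam + 1.0
data = [(a ** mu + (i + 0.5) / n * (b ** mu - a ** mu)) ** (1.0 / mu) for i in range(n)]

exact = truncated_pareto_mle(data, a, b)
approx = pareto_mle_exponent(data, a)     # assumes there is no maximum
```

Comparing `exact` with `approx` for different ratios b/a gives a direct check of how much error the no-maximum approximation introduces for a particular data set.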
Deviations from the power law
Empirical data are rarely perfectly power-law distributed over the entire range of x (Brown et al. 2002, Newman 2005). MLE and CDF approaches respond to deviations differently because the traditional MLE analysis implicitly weights data on a linear scale while the traditional CDF approach weights it on a logarithmic scale (McGill 2003). The CDF approach will therefore respond more strongly to deviations from the power law at large values of x (such as those observed in individual size distributions; e.g., Coomes et al. 2003) than the MLE approach, whereas MLE will respond more strongly to deviations at small values of x (commonly observed in many power-law distributions; e.g., Newman 2005). It is common to truncate data in the tails that exhibit deviations from the power law before fitting the exponent (e.g., Newman 2005). However, these deviations also should not be ignored, as they may help identify important biological processes (e.g., Coomes et al. 2003). In some cases deviations may suggest that the power law is in fact not the appropriate model for the data. This can be evaluated using goodness-of-fit tests on binned data (Clark et al. 1999, Clauset et al. 2007, Edwards et al. 2007) or by using model selection techniques to compare the power-law to alternative distributions (Muller-Landau et al. 2006, Clauset et al. 2007, Edwards et al. 2007).
Discrete data
Most of the MLE and CDF methods presented here assume that the data are continuously distributed, as is often the case (e.g., body size). However, some ecological patterns (e.g., species-abundance distributions) are composed of discrete observations (e.g., it is impossible to census 4.3 individuals). It is therefore necessary to use analogous discrete distributions. In the case of the Pareto distribution a discrete analog exists in the form of the aptly named discrete Pareto distribution (Johnson et al. 2005, Newman 2005; Table 1; also called the Zipf or Riemann-zeta distribution). In some cases continuous distributions can reasonably approximate discrete data; but in the case of the Pareto, using the continuous maximum likelihood estimate instead of that derived from the discrete distribution produces strongly biased results and should be avoided (Appendix C; Clauset et al. 2007).
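The difference between the discrete and continuous estimators can be illustrated with a short sketch. It assumes the discrete Pareto probability mass function f(x) = x^λ / ζ(−λ, a) for integers x ≥ a, where ζ is the Hurwitz zeta function; this form, the Euler-Maclaurin approximation used for ζ, the golden-section search, the search bracket, and all function names are assumptions of the example. The stand-in data are integer counts proportional to x^(−2), cut off where the expected count rounds to zero.

```python
import math

def hurwitz_zeta(s, a, terms=1000):
    """Approximate sum over k >= 0 of (a + k)**(-s) for s > 1.

    Direct sum of the first `terms` values plus an Euler-Maclaurin tail.
    """
    head = sum((a + k) ** (-s) for k in range(terms))
    t = a + terms
    tail = t ** (1.0 - s) / (s - 1.0) + 0.5 * t ** (-s) + s * t ** (-s - 1.0) / 12.0
    return head + tail

def golden_section_max(f, lo, hi, tol=1e-8):
    """Locate the maximum of a unimodal function f on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    c = hi - inv_phi * (hi - lo)
    d = lo + inv_phi * (hi - lo)
    fc, fd = f(c), f(d)
    while hi - lo > tol:
        if fc > fd:                       # maximum lies in [lo, d]
            hi, d, fd = d, c, fc
            c = hi - inv_phi * (hi - lo)
            fc = f(c)
        else:                             # maximum lies in [c, hi]
            lo, c, fc = c, d, fd
            d = lo + inv_phi * (hi - lo)
            fd = f(d)
    return (lo + hi) / 2.0

def discrete_pareto_mle(data, a=1, bracket=(-6.0, -1.01)):
    """Numerical MLE of lambda for the discrete Pareto (zeta) distribution."""
    n = len(data)
    sum_log_x = sum(math.log(x) for x in data)

    def loglik(lam):
        return lam * sum_log_x - n * math.log(hurwitz_zeta(-lam, a))

    return golden_section_max(loglik, bracket[0], bracket[1])

def continuous_pareto_mle(data, a=1):
    """Closed-form continuous Pareto MLE, shown only for comparison."""
    return -1.0 - len(data) / sum(math.log(x / a) for x in data)

# Stand-in data: counts proportional to x**-2 for x = 1 ... 141.
data = []
for x in range(1, 142):
    data.extend([x] * round(10000 / x ** 2))

discrete = discrete_pareto_mle(data)
continuous = continuous_pareto_mle(data)
```

On these stand-in data the discrete estimate stays close to −2 while the continuous formula returns a markedly steeper exponent, consistent with the bias described above.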
Implications for Published Results
One of the most important implications for published results is that studies that have estimated exponents using uncorrected simple logarithmic binning (e.g., Morse et al. 1988, Meehan 2006) have reported the wrong exponent. This is particularly important in cases where the exponent is used to test quantitative predictions. For example, an analysis in Meehan (2006) evaluates whether observed individual size distribution exponents were consistent with those predicted, using simple logarithmic binning. Meehan concluded that the empirical data matched the predictions (Fig. 4a). However, since the reported exponents are equal to λ + 1, the analysis suggests that the size distribution is substantially steeper than expected, thus refuting, rather than supporting, the hypothesized mechanism (Fig. 4a; Appendix B).
Analyses based on linearly binned data should also be revisited due to the potential for biased estimates and the strong influence of bin width on the estimated exponent. In particular, studies that have used linear binning to test the predictions of theoretical models or compare exponents from different data sets (e.g., Enquist and Niklas 2001, Coomes et al. 2003, Niklas et al. 2003, Kefi et al. 2007) may have reached incorrect conclusions. We reanalyzed the original data from Enquist and Niklas (2001) and found that while the original linear binning analyses suggested that observed diameter distribution exponents were near the theoretical prediction of −2, MLE suggests that the observed exponents are actually closer, on average, to −2.5 (Fig. 4b; Appendix B). Our reanalysis indicates that the size–frequency distributions in Gentry's plots are not, in general, adequately represented by a power law with an exponent of −2, as originally claimed by Enquist and Niklas (2001; see Appendix B for an important caveat).
While normalized logarithmic binning performs better than linear binning, it can still introduce biases of ∼10% depending on the bin width. While many analyses based on normalized logarithmic binning are probably reasonable, the recent suggestion that normalized logarithmic binning is the best approach for fitting exponents (Sims et al. 2007) is unwarranted, and MLE should be used whenever possible (Clark et al. 1999, Clauset et al. 2007).
Compared to binning-based approaches, results from fitting the CDF are probably reasonable. In cases with low sample sizes, where small errors in the estimated exponent could influence the conclusions of the study, or where minimum or maximum attainable values of x have been ignored (see Pickering et al. 1995), it may be worth checking the results using MLE. Regardless, MLE is the single best method for estimating exponents and should be used in future studies.
We have focused on power laws because they, at least approximately, characterize a number of distributions of interest to ecologists. The issues raised here, and the conclusions discussed, should apply broadly to frequency distributions in general, and in particular to other distributions with heavy tails. Careful attention to fitting methodologies and consultation of statistical references (e.g., Johnson et al. 1994) should help improve the estimation of distributional parameters.
We particularly thank Tim Meehan for generously sharing his data with us and for comments on the manuscript. For the Gentry data we thank Alwyn Gentry, the Missouri Botanical Garden, and collectors who assisted Gentry or contributed data for specific sites. We also thank David Coomes, Susan Durham, Jim Haefner, Brian McGill, and Tommaso Zillio for comments on the manuscript. This work was funded by the National Science Foundation through a Postdoctoral Fellowship in Biological Informatics to E. P. White (DBI-0532847). We used the first–last author emphasis approach (sensu Tscharntke et al. 2007) for the sequence of authors.