Parameter landscapes unveil the bias in allometric prediction


  • Cang Hui,

    Corresponding author
    1. Centre for Invasion Biology, Department of Botany and Zoology, Stellenbosch University, Private Bag X1, Matieland, 7602, South Africa
  • John S. Terblanche,

    1. Department of Conservation Ecology and Entomology, Stellenbosch University, Private Bag X1, Matieland, 7602, South Africa
  • Steven L. Chown,

    1. Centre for Invasion Biology, Department of Botany and Zoology, Stellenbosch University, Private Bag X1, Matieland, 7602, South Africa
  • Melodie A. McGeoch

    1. Centre for Invasion Biology, Cape Research Centre, South African National Parks, P.O. Box 216, Steenberg 7947, South Africa



1. The criteria for choosing the appropriate line-fitting method (LFM) and correction estimator for determining the functional allometric relationship, and for predicting the Y-variable accurately are controversial. A widely accepted criterion for reducing bias in allometric prediction is to minimize the mean squared residual (MSR) on the antilog scale, and a series of correction estimators have been designed precisely to achieve this.

2. Here, using parameter landscapes, we examine the performance of the correction estimators and several LFMs under different data residual shapes, sample sizes and coefficients of determination.

3. Predictions from the nonlinear LFM were found to have minimum MSR values (minimum bias), but with obviously skewed frequency distributions of the predicted Y-variable compared with observed data. This implies that using MSR as a bias measure for allometric prediction could be misleading.

4. We introduce a new bias measure, the discrepancy of the frequency distributions of the Y-variable between predicted and observed data, and suggest that the reduced major axis method is the least biased method in most cases, both on the logarithmic and antilog scales.

5. Parameter landscapes clearly illustrate the performance of each LFM and correction estimator, as well as the best solution given specified criteria. We therefore suggest a shift in emphasis from designing more sophisticated LFMs or correction estimators (equal to finding the peaks in the parameter landscape) to justifying the measure of bias and performance criterion in allometric prediction.


Defined as the influence of body size on form and function (LaBarbera 1989), allometry has a rich and venerable history and continues to draw much attention as a consequence of its widespread use in behaviour, physiology, ecology and evolution (e.g. Peters 1983; Calder 1984; Schmidt-Nielsen 1984; Brown et al. 2004). The main reason for adopting an allometric approach is that log transformation linearizes a wide array of nonlinear biological relationships, thereby enabling simpler linear statistical analyses to be used. This, in turn, enables the calculation of confidence limits and, for example, statistical testing for homogeneity of slopes across groups. Allometric relationships are used not only to defend or refute particular theories by comparison of slope estimates (Farrell-Gray & Gotelli 2005; Glazier 2005; Reich et al. 2006; Chown et al. 2007; Duncan, Forsyth, & Hone 2007; White, Cassey, & Blackburn 2007), but also to predict, with a degree of accuracy far greater than random, a variable of interest based solely on body size (Wainwright & Richard 1995; Lindstedt & Schaeffer 2002). The predictive capacity is also one of the greatest strengths of allometry and has been used, for instance, in the pursuit of unifying theories of animal locomotion (Bejan & Marden 2006) and metabolic ecology (Allen, Brown, & Gillooly 2002; Brown et al. 2004; Gillooly et al. 2005; Brown & Sibly 2006).

A widely recognized problem when working with allometric equations is one associated with Jensen’s inequality, i.e. the discrepancy between the arithmetic and the geometric mean (e.g. Sprugel 1983; Blackburn & Gaston 1998). Allometric equations derived from linear regression using log-transformed data estimate the geometric and not the arithmetic mean, and predictions using the geometric mean on the original antilog scale can have a substantial effect on the outcomes of biological studies (Smith 1993; Hayes & Shonkwiler 2006). As a result, a series of correction estimators has been designed to mitigate this problem (Finney 1941; Heien 1968; Bradu & Mundlak 1970; Zellner 1971; Beauchamp & Olson 1973; Duan 1983). By intentionally transforming the Y-prediction from the geometric to the arithmetic mean, these estimators can largely reduce prediction bias (i.e. a systematic difference between the estimator’s expectation and the true value of the parameter being tested), measured by the mean squared residual (MSR).

Although the relative performance of the correction estimators has been tested (e.g. Smith 1993; Hayes & Shonkwiler 2006), their performance relative to different line-fitting methods (LFMs), as well as the potential effects of the error distribution, sample size and variation in the data on their performance has not been explicitly assessed. Using systematic simulations, we demonstrate the effect of residual shape, sample size and the magnitude of residuals (or scatter, defined below) on the allometric regression. A general relationship among the LFMs and correction estimators, as well as a guideline for the selection of the LFM for allometric regression, are provided. Of particular importance, by using parameter landscapes we are able to demonstrate the relative performance of the LFM and correction estimators under different measures of bias [specifically, the MSR and the mean of the squared discrepancy of the frequency distribution of the Y-variable between predictions and observations, MSD]. Using this approach, we identify a serious bias when using correction estimators and the nonlinear LFM for prediction, and suggest a shift of focus in allometric regression from designing new LFMs to defining bias with appropriate measures.

Materials and methods

Allometric relationships are described by the power function $Y = a \cdot X^{b}$, where Y is the biological characteristic of interest, X is the body mass and a and b are empirically derived constants (Peters 1983). Data following an underlying allometric form can be generated using a routine process (see e.g. Heien 1968; White 2003), $Y_i = a \cdot X_i^{b} \cdot V_i$, where $i = 1, 2, \ldots, n$ and $V_i$ is a multiplicative random error. The logarithms of the observations thus give $y_i = c + b \cdot x_i + v_i$, where $c = \ln a$ and $v_i = \ln V_i$ is the residual of $y_i$. The independent variable X (typically, body mass) generally obeys a log-normal distribution, or at least the log-normal distribution provides a neutral starting point for ‘broad allometry’ (sensu Smith 1984; LaBarbera 1989). Therefore, the log-transformed independent variable x obeys a normal distribution, $x \sim N(\mu_x, \sigma_x^2)$, with $\mu_x$ and $\sigma_x$ as the mean and SD.
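As an illustration only, the data-generating process described above can be sketched in Python. The sketch assumes the multiplicative form Y = a · X^b · V implied by the log-scale equation y = c + b · x + v. The function name and all default parameter values (a, b, mu_x, sigma_x, sigma_v) are hypothetical and are not the values used in Appendix S1; only Form I (normal) residuals are drawn.

```python
import math
import random


def simulate_allometric_data(n, a=1.0, b=0.75, mu_x=0.0, sigma_x=1.0,
                             sigma_v=0.5, seed=None):
    """Draw n pairs (X, Y) from Y = a * X**b * V with log-normal X and V.

    All default values are illustrative assumptions, not the paper's settings.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(mu_x, sigma_x)   # x = ln X, normally distributed
        v = rng.gauss(0.0, sigma_v)    # v = ln V, Form I (normal) residual
        y = math.log(a) + b * x + v    # y = c + b*x + v, with c = ln a
        data.append((math.exp(x), math.exp(y)))  # back to the original scale
    return data


data = simulate_allometric_data(n=200, seed=1)
```

Setting sigma_v to zero removes the scatter, so that every simulated point lies exactly on the power function; this is a convenient check that the log-scale and original-scale forms agree.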

Twelve simulated data sets, comprising four forms of residuals (Form I ‘cigar’ shaped, II ‘comet’ shaped, III ‘missile’ shaped and IV ‘hour-glass’ shaped) and three levels of scatter (low, moderate and high, measured as the coefficient of determination, $R^2$), were generated (see Appendix S1). Here, a normal distribution of residuals with symmetrically distributed points (Form I residual) is used to demonstrate the principal argument of this research. Nonetheless, results from data sets with other forms of residuals agree with the main results presented here, and are therefore presented only in Appendices S1–S5.

A correction estimator corrects the result (to account for Jensen’s inequality) from the LFM that is chosen a priori, and therefore the choice of an appropriate LFM becomes an essential prerequisite for allometric prediction. For linear regression, ordinary least squares (OLS) has been recommended for prediction with the highest accuracy because by definition it minimizes the MSR (Legendre 2001)

$$s^2 = \frac{1}{n}\sum_{i=1}^{n}\left(y_i - c - b \cdot x_i\right)^2.$$

For allometric regression, OLS fails because it minimizes the log-transformed MSR (with the mean of predictions being the geometric mean of Y), but not the MSR on the original scale (e.g. Blackburn & Gaston 1998)

$$S^2 = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - e^{c} \cdot X_i^{b}\right)^2,$$

which corresponds to the arithmetic mean as the mean of predictions. If we then estimate Y directly from an OLS regression ($\hat{Y} = e^{c} \cdot X^{b}$, with c and b the OLS estimates), the prediction will systematically underestimate the observation. For instance, for data sets with Form I residuals, although the expected value of prediction is equal to the observation, $E(\hat{y}) = E(y)$, on the logarithmic scale, the expected value of prediction on the original antilog scale is less than the observation, $E(\hat{Y}) < E(Y)$ (Finney 1941; Heien 1968). It is the violation of this equality of two values after antilog transformation that causes the problem of allometric prediction. As a result, the prediction of Y should become (Consistent I)

image(eqn 1)

Due to the violation of normality in the residuals vi and the adjustment required for the variance of y, different estimators have been developed to correct predicted values of Y from X. These further include the Consistent II estimator (Maddala 1988)

image(eqn 2)

where inline image the first-order approximation of a minimum variance unbiased estimator (approximate MVUE) (Beauchamp & Olson 1973)

image(eqn 3)

and a nonparametric estimator (the smearing estimator) (Duan 1983)

image(eqn 4)

By multiplying the back-transformed prediction by an approximation of the expected residual E(V), the correction estimators estimate the arithmetic mean from the geometric mean, reduce the MSR without resorting to the nonlinear regression method, and therefore provide a less biased prediction of the true Y-variable (Smith 1993; Hayes & Shonkwiler 2006).
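The effect of back-transformation, and the way a multiplicative correction offsets it, can be illustrated with a short Python sketch. It uses textbook forms of two corrections: exp(s²/2) for normally distributed residuals, and the mean of the exponentiated residuals as a smearing-type factor. These forms are assumptions made for illustration and may differ in detail from eqns 1 and 4; all function names and parameter values are hypothetical.

```python
import math
import random


def ols_fit(xs, ys):
    """Ordinary least squares of ys on xs; returns (intercept c, slope b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def correction_factors(xs, ys):
    """Return OLS (c, b) and two multiplicative corrections for exp(c + b*x)."""
    n = len(xs)
    c, b = ols_fit(xs, ys)
    resid = [y - (c + b * x) for x, y in zip(xs, ys)]
    s2 = sum(r * r for r in resid) / (n - 2)        # residual variance, log scale
    consistent = math.exp(s2 / 2.0)                 # assumed log-normal correction
    smearing = sum(math.exp(r) for r in resid) / n  # assumed smearing-type factor
    return c, b, consistent, smearing


# Illustrative log-scale data with normal residuals (hypothetical parameters).
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(2000)]
ys = [0.5 + 0.75 * x + rng.gauss(0.0, 0.6) for x in xs]
c, b, consistent, smearing = correction_factors(xs, ys)

mean_obs = sum(math.exp(y) for y in ys) / len(ys)            # arithmetic mean of Y
mean_naive = sum(math.exp(c + b * x) for x in xs) / len(xs)  # uncorrected prediction
mean_smeared = smearing * mean_naive                         # corrected prediction
```

With data of this kind the uncorrected mean prediction falls below the observed arithmetic mean, and both factors exceed one, which is the behaviour the correction estimators are designed to exploit.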

To demonstrate the performance of these correction estimators and several relevant LFMs [including OLS, reduced major axis (RMA), adjusted method (ADJ) and nonlinear method (NON); see Appendix S2], we present the parameter landscape of the MSR on logarithmic and antilog scales; that is, a three-dimensional MSR landscape as a function of the parameters c (intercept) and b (exponent) in the allometric relationship. Each LFM and correction estimator, as well as any other approach not included in this study, thus corresponds to a point in the parameter landscape, and the best method (i.e. least bias) will be located at the peak of the parameter landscape (Fig. 1; see Mathematica code in Appendix S3). For instance, the generalized linear model for Form I residual produces the same results as OLS for log-transformed data, and generates the same result as NON for (x, Y) (Cox et al. 2008). Furthermore, although the MSR has classically been considered the measure of bias for allometric prediction, it only depicts an overall bias in prediction on the original scale. For instance, whether the prediction for small body size is more accurate (small bias) or not is unclear from the MSR. As a result, here we present an alternative measure of bias on the antilog scale in the form of the MSD of the frequency distributions, Φ, the mean of $(F_Y - \hat{F}_Y)^2$ over the frequency classes of Y, where $F_Y$ is the observed frequency of Y and $\hat{F}_Y$ the frequency of predicted Y using parameters c and b. For comparison, we also present the MSD on the logarithmic scale, ϕ, the mean of $(f_y - \hat{f}_y)^2$, where $f_y$ and $\hat{f}_y$ are the frequencies of the observed and predicted y. An accurate LFM and correction estimator should predict a frequency distribution of Y similar to that of the observations, i.e. a lower value of MSD. Moreover, we also test the influence of the shapes of residuals ($v_i$), scatter (measured as the coefficient of determination, $R^2$) and sampling effort (sample size, n) on the bias in the prediction of the Y-variable.
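The idea of a parameter landscape can be sketched in Python as a grid evaluation of each bias measure over candidate values of c and b. This is not the Mathematica code of Appendix S3. The equal-width frequency classes, the use of relative frequencies, the grid limits and all function names are assumptions made for illustration. The sketch locates the minimum of each untransformed measure, which corresponds to the best estimate (shown as a peak in Fig. 1).

```python
import math
import random


def msr_log(c, b, xs, ys):
    """Mean squared residual on the logarithmic scale."""
    return sum((y - (c + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def msr_antilog(c, b, xs, ys):
    """Mean squared residual on the original (antilog) scale."""
    return sum((math.exp(y) - math.exp(c + b * x)) ** 2
               for x, y in zip(xs, ys)) / len(xs)


def frequencies(values, edges):
    """Relative frequency of values in each class [edges[k], edges[k+1])."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(counts)):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
    return [cnt / len(values) for cnt in counts]


def msd_log(c, b, xs, ys, edges):
    """Mean squared discrepancy of observed vs predicted frequencies of y."""
    f_obs = frequencies(ys, edges)
    f_pred = frequencies([c + b * x for x in xs], edges)
    return sum((fo - fp) ** 2 for fo, fp in zip(f_obs, f_pred)) / len(f_obs)


def landscape(measure, c_grid, b_grid):
    """Evaluate measure(c, b) on a grid; returns {(c, b): value}."""
    return {(c, b): measure(c, b) for c in c_grid for b in b_grid}


# Illustrative log-scale data (hypothetical parameters, normal residuals).
rng = random.Random(7)
xs = [rng.gauss(0.0, 1.0) for _ in range(300)]
ys = [0.5 + 0.75 * x + rng.gauss(0.0, 0.6) for x in xs]

c_grid = [i / 20 for i in range(0, 21)]       # candidate intercepts 0.00 .. 1.00
b_grid = [i / 20 for i in range(5, 26)]       # candidate exponents 0.25 .. 1.25
edges = [-6.0 + 0.5 * k for k in range(27)]   # assumed classes covering -6 .. 7

surface_log = landscape(lambda c, b: msr_log(c, b, xs, ys), c_grid, b_grid)
surface_antilog = landscape(lambda c, b: msr_antilog(c, b, xs, ys), c_grid, b_grid)
surface_msd = landscape(lambda c, b: msd_log(c, b, xs, ys, edges), c_grid, b_grid)

best_log = min(surface_log, key=surface_log.get)              # lowest log-scale MSR
best_antilog = min(surface_antilog, key=surface_antilog.get)  # lowest antilog MSR
best_msd = min(surface_msd, key=surface_msd.get)              # lowest log-scale MSD
```

Comparing best_log, best_antilog and best_msd on the same data shows how the apparent best estimate of (c, b) depends on the bias measure chosen, which is the point the landscapes are intended to make.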

Figure 1.

Parameter landscapes of the mean of the squared residuals (MSR, $s^2$ in a and $S^2$ in b) and the mean of the squared discrepancy of the frequency distributions (MSD, ϕ in c and Φ in d) at the logarithmic scale (a and c) and the original antilog scale (b and d). The values are log-transformed for demonstration. Red arrows indicate the best estimates (peaks). Data used have Form I residuals and moderate scatter (see Appendix S1). Parameter landscapes for other data sets are presented in Appendix S3. OLS, ordinary least squares; RMA, reduced major axis; ADJ, adjusted method; NON, nonlinear method; MVUE, the approximation to the minimum variance unbiased estimator. Consistent I, Consistent II and Smearing estimator are omitted from the figure because their locations are indistinguishable from that of the MVUE.


Results

The performance of the correction estimators and the chosen LFMs was clearly illustrated by the parameter landscape (Fig. 1 and Appendix S3). Although the parameters (c and b) that generate the least bias were easy to identify on the logarithmic scale (peaks in Fig. 1a and c), they were difficult to identify on the antilog scale (peaks along the ridge in Fig. 1b and d). On the logarithmic scale, OLS yielded the lowest MSR (Fig. 1a), whereas the NON had the smallest MSR on the original antilog scale (Fig. 1b). In terms of the MSD, RMA was least biased on both the logarithmic and antilog scales (Fig. 1c and d). All correction estimators were indistinguishable from one another but clearly not the best under both criteria and bias measures (Fig. 1). When using parameter landscapes to explore data with different degrees of scatter and different residual shapes, we found these results were only partially sustained (see Appendix S3). Except for data with low scatter and Form III residuals, the NON produced the lowest MSR. However, the frequency distribution of Y predicted from the NON was completely different from that of the observed data. The peak (lowest value) in the MSD parameter landscape corresponded to none of the LFMs or correction estimators (with RMA closest to the peaks; Appendix S3).

Further detailed examination revealed that the correction estimators can generally be considered to produce regression lines approximately parallel to the OLS, but with higher intercept values (Fig. 2). Moreover, Consistent I and Smearing had the same slope as OLS, and Consistent II and Approximate MVUE had a slightly shallower slope than OLS (Appendix S4 and its figure), although not significantly so (table 1 in Appendix S4). Except for the nonlinear LFM, all the other LFMs and correction estimators predicted the lowest slope for Form III residuals (a triangular shape with decreasing residuals as x increases; Appendix S1) and highest for Form I. The order of MSR from low to high is generally Forms II, IV, I and III on the logarithmic scale, with a change to Forms III, I, II and IV on the antilog scale, an almost complete reversal (see e.g. OLS in table 2 in Appendix S4). There was no sign that the correction estimators performed consistently better than the OLS when measured by the MSR. For inline image residuals of Forms I and II, as well as all three cases in Form IV, correction estimators indeed had a lower MSR than OLS (5 out of 12 cases). For inline image and inline image residuals of Forms I and II, as well as all three cases in Form III, OLS in fact performed better (lower MSR) than the correction estimators (7 out of 12 cases; table 2 in Appendix S4).
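A short derivation may help to show why a multiplicative correction leaves the slope unchanged. It assumes that the correction can be written as a constant factor k > 1 applied to the back-transformed OLS prediction, as is the case for Consistent I and Smearing in their textbook forms; the symbol k is introduced here for illustration only.

```latex
\hat{Y}_{\mathrm{corr}} = k \cdot e^{c + b \cdot x}, \qquad k > 1
\quad\Longrightarrow\quad
\ln \hat{Y}_{\mathrm{corr}} = (c + \ln k) + b \cdot x .
```

On the logarithmic scale the corrected line therefore keeps the OLS slope b and is shifted upward by ln k. A factor that varies with x would instead alter the slope, which would be compatible with the slightly shallower slopes reported for Consistent II and the approximate MVUE.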

Figure 2.

Outcome of different line-fitting methods on simulated allometric data, i.e. ordinary least squares (OLS), reduced major axis (RMA), adjusted and nonlinear methods, as well as the correction estimator for OLS (using inline image and Form I residuals). Only Consistent I is presented since other correction estimators are indistinguishable from Consistent I (a more detailed comparison of these estimators is shown in Appendix S4).

The frequency distributions of data values predicted ($\hat{Y}$) from different LFMs and from the different correction estimators, even though the latter are designed specifically for prediction, showed obvious deviations from the frequency distribution of the original data (Y) (Fig. 3; see also figure 5 in Appendix S4), especially for scatter levels inline image and inline image. On the original scale (see Appendix S4), OLS tends to predict a lower frequency of small values of Y than observed, but this underestimation of the number of low values becomes less obvious with increasing bin classes of body size (figure 5 in Appendix S4). The frequency distribution of data values from RMA most closely matches the distribution of the original data (Fig. 3 and figure 5 in Appendix S4). OLS shows a higher modal frequency but a narrower range of Y-predictions; RMA has a distribution similar to the observed data; ADJ has a lower mode but wider range of Y-predictions, whereas the correction estimators, as well as the NON, have a right-shifted frequency distribution. This overall pattern becomes clearer when the level of scatter changes from low to high (Fig. 3a–c). Moreover, sensitivity tests indicate that the above results were representative and were not affected by the parameters in the data-generating process (Appendix S5).
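The narrower range of OLS predictions and the close match of RMA on the logarithmic scale can be illustrated with a Python sketch. It uses the standard textbook slopes, b = r · s_y / s_x for OLS and b = sign(r) · s_y / s_x for RMA, which are assumed here and may differ in detail from the definitions in Appendix S2; all names and parameter values are hypothetical.

```python
import math
import random


def sd(values):
    """Population standard deviation."""
    m = sum(values) / len(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def fit_ols_and_rma(xs, ys):
    """Return ((c_ols, b_ols), (c_rma, b_rma), r) for log-scale data."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sx, sy = sd(xs), sd(ys)
    r = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (n * sx * sy)
    b_ols = r * sy / sx                 # OLS slope
    b_rma = math.copysign(sy / sx, r)   # RMA slope: ratio of SDs, sign of r
    return (my - b_ols * mx, b_ols), (my - b_rma * mx, b_rma), r


# Illustrative log-scale data (hypothetical parameters, normal residuals).
rng = random.Random(11)
xs = [rng.gauss(0.0, 1.0) for _ in range(500)]
ys = [0.5 + 0.75 * x + rng.gauss(0.0, 0.6) for x in xs]
(c_ols, b_ols), (c_rma, b_rma), r = fit_ols_and_rma(xs, ys)

sd_obs = sd(ys)                                # spread of observed y
sd_ols = sd([c_ols + b_ols * x for x in xs])   # equals |r| * sd_obs
sd_rma = sd([c_rma + b_rma * x for x in xs])   # equals sd_obs
```

Under these definitions the spread of OLS predictions is the observed spread multiplied by |r|, so it narrows as scatter increases, whereas RMA predictions retain the observed spread by construction.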

Figure 3.

The frequency distribution of log-transformed Y-variable values with Form I residuals for (a) low, (b) middle and (c) high levels of scatter (see Appendices S1 and S4 for more detail).


Discussion

Surprisingly, our results suggest that using the MSR as a measure of bias in allometry could be misleading. The rationale behind these correction estimators is to provide a straightforward analytical solution for minimizing the MSR and thus to improve the accuracy of predictions based on the result from the OLS linear regression. Moreover, a correction estimator was adopted as an alternative to the complexity of directly using a nonlinear algorithm, especially given the limited computational capacity of computers in the 1940s–1980s when these correction estimators were designed (Finney 1941; Heien 1968; Bradu & Mundlak 1970; Zellner 1971; Beauchamp & Olson 1973; Duan 1983). With the speed and capacity of modern computers having increased substantially, the NON has once again gained momentum and is increasingly employed in the biological sciences for line-fitting and for comparison with other LFMs (Packard & Boardman 2009). Our results confirm that for allometric prediction the NON indeed produces the lowest MSR, although it systematically underestimates the slope value (b) and thus yields a completely right-skewed frequency distribution of predicted values on the original scale. This, however, does not necessarily mean that the NON has any calculation or conceptual flaws, but, on the contrary, implies that the flaw lies with using MSR as the measure of bias for the allometric prediction.

Although earlier computational capacity may have constrained a straightforward way of seeking peaks on the parameter landscape, modern desktop computers can easily handle this task. Clearly, designing new LFMs and correction estimators will have limited value and has essentially become an outdated pursuit. In this regard, it is not the LFMs and correction estimators that should form the focus of further attention in allometry, but the criteria for assessing their performance and the measure of bias, such as the MSR and MSD. MSR is not a robust criterion for allometric prediction for two reasons. First, the peak in the MSR parameter landscape is difficult to distinguish (Fig. 1). Second, the frequency distribution of Y predicted by the parameters at the peak strongly deviates from the observed frequency distribution (Fig. 3). The MSD bias measure we introduce shows a clear ridge in the parameter landscape, suggesting a local optimum, yet the exact peak is still not obvious. Therefore, the focus of allometric regression methodology should be shifted from designing new LFMs and correction estimators to presenting criteria for assessing bias that not only generate a parameter landscape with clear peaks, but also allow a plausible interpretation.

In conclusion, as most previous work has shown (Legendre & Legendre 1998; Quinn & Keough 2002; Warton et al. 2006), different criteria of bias measures and research objectives (e.g. prediction on the original scale vs. accurate slope estimation) will lead to different recommendations. For minimizing the MSR on the antilog scale (i.e. for accurate prediction on the original scale), the NON is probably best. For predicting Y with the same frequency distribution as the observations (i.e. for accurate slope estimate), RMA performs well in most cases. OLS tends to underestimate the Y-value for species (or individuals) with extreme body size (either small or large), but overestimates the Y-value for those with moderate body size. Even though the correction estimators might not perform too poorly at prediction on the original scale, the frequency distributions of predicted compared with observed data are obviously right-skewed. Furthermore, it is also clear now that the performance of different LFMs and correction estimators is influenced by the residual shapes, sample size and the coefficient of determination (the scatter) of the data (Appendix S4). Data with a low level of scatter for small body size generate low MSR on a logarithmic scale, but generate high MSR on the antilog scale. Form III data tend to have the highest slope estimate, whereas Form I data have the lowest. Allometric slope comparisons or predictions are not reliable when either the number of observations (< 50) or the coefficient of determination is too low ($R^2$ < 0·67). As LaBarbera (1989) pointed out, ‘scaling studies paint nature with a very broad brush; they are more akin to the gas laws of physics than to Newton’s laws.’ A shift in the methodological focus of allometric regression away from designing sophisticated LFMs or correction estimators towards seeking a robust and sensitive bias criterion should help to find the best ‘brush’ for use in scaling studies.