1. The criteria for choosing the appropriate line-fitting method (LFM) and correction estimator for determining the functional allometric relationship, and for predicting the Y-variable accurately are controversial. A widely accepted criterion for reducing bias in allometric prediction is to minimize the mean squared residual (MSR) on the antilog scale, and a series of correction estimators have been designed precisely to achieve this.
2. Here, using parameter landscapes, we examine the performance of the correction estimators and several LFMs under different residual shapes, sample sizes and coefficients of determination.
3. Predictions from the nonlinear LFM were found to have minimum MSR values (minimum bias), but with obviously skewed frequency distributions of the predicted Y-variable compared with observed data. This implies that using MSR as a bias measure for allometric prediction could be misleading.
4. We introduce a new bias measure, the discrepancy of the frequency distributions of the Y-variable between predicted and observed data, and suggest that the reduced major axis method is the least biased method in most cases, both on the logarithmic and antilog scales.
5. Parameter landscapes clearly illustrate the performance of each LFM and correction estimator, as well as the best solution under specified criteria. We therefore suggest a shift in emphasis from designing ever more sophisticated LFMs or correction estimators (equivalent to finding the peaks in the parameter landscape) to justifying the measure of bias and the performance criterion in allometric prediction.
A widely recognized problem when working with allometric equations is that associated with Jensen's inequality, i.e. the discrepancy between the arithmetic and the geometric mean (e.g. Sprugel 1983; Blackburn & Gaston 1998). Allometric equations derived from linear regression of log-transformed data estimate the geometric, not the arithmetic, mean, and predictions of the geometric mean on the original antilog scale can have a substantial effect on the outcomes of biological studies (Smith 1993; Hayes & Shonkwiler 2006). As a result, a series of correction estimators has been designed to mitigate this problem (Finney 1941; Heien 1968; Bradu & Mundlak 1970; Zellner 1971; Beauchamp & Olson 1973; Duan 1983). By intentionally transforming the Y-prediction from the geometric to the arithmetic mean, these estimators can largely reduce prediction bias (i.e. a systematic difference between the estimator's expectation and the true value of the parameter being estimated), as measured by the mean squared residual (MSR).
Although the relative performance of the correction estimators has been tested (e.g. Smith 1993; Hayes & Shonkwiler 2006), their performance relative to different line-fitting methods (LFMs), as well as the potential effects of the error distribution, sample size and variation in the data on their performance, have not been explicitly assessed. Using systematic simulations, we demonstrate the effects of residual shape, sample size and the magnitude of the residuals (or scatter, defined below) on allometric regression. A general relationship among the LFMs and correction estimators, as well as a guideline for selecting an LFM for allometric regression, is provided. Most importantly, by using parameter landscapes we are able to demonstrate the relative performance of the LFMs and correction estimators under different measures of bias [specifically, the MSR and the mean squared discrepancy of the frequency distribution of the Y-variable between predictions and observations, MSD]. Using this approach, we identify a serious bias when using correction estimators and the nonlinear LFM for prediction, and suggest a shift of focus in allometric regression from designing new LFMs to defining bias with appropriate measures.
Materials and methods
Allometric relationships are described by the power function Y = a · X^b, where Y is the biological characteristic of interest, X is the body mass and a and b are empirically derived constants (Peters 1983). Data following an underlying allometric form can be generated using a routine process (see e.g. Heien 1968; White 2003), Y_i = a · X_i^b · V_i, where i = 1, 2, …, n and V_i is a multiplicative random error. The logarithms of the observations thus give y_i = c + b · x_i + v_i, where c = ln a and v_i = ln V_i is the residual of y_i. The independent variable X (typically, body mass) generally obeys a log-normal distribution, or at least the log-normal distribution provides a neutral starting point for 'broad allometry' (sensu Smith 1984; LaBarbera 1989). Therefore, the log-transformed independent variable x obeys a normal distribution N(μ_x, σ_x²), with μ_x and σ_x as its mean and SD.
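This data-generating routine can be sketched in a few lines of pure Python (the function name and parameter values are illustrative only, not those used in the simulations):

```python
import math
import random

def simulate_allometric(n, a, b, mu_x=0.0, sd_x=1.0, sd_v=0.3, seed=1):
    """Generate (X, Y) pairs following Y = a * X**b * V, where ln X is
    normally distributed and V is a multiplicative log-normal error
    (Form I, symmetric residuals on the log scale)."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = rng.gauss(mu_x, sd_x)      # x = ln X ~ N(mu_x, sd_x^2)
        v = rng.gauss(0.0, sd_v)       # v = ln V, the log-scale residual
        y = math.log(a) + b * x + v    # y = c + b*x + v, with c = ln a
        data.append((math.exp(x), math.exp(y)))
    return data

pairs = simulate_allometric(n=200, a=2.0, b=0.75)
```

Other residual forms follow the same scheme with the SD of v made to vary with x.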
Twelve simulated data sets, comprising four forms of residuals (Form I 'cigar' shaped, II 'comet' shaped, III 'missile' shaped and IV 'hour-glass' shaped) and three levels of scatter (low, moderate and high, measured as the coefficient of determination, R²), were generated (see Appendix S1). Here, a normal distribution of residuals with symmetrically distributed points (Form I residuals) is used to demonstrate the principal argument of this research. Nonetheless, the results from data sets with the other forms of residuals are consistent with the main results presented here, and are therefore presented only in Appendices S1–S5.
A correction estimator corrects the result from the a priori chosen LFM (to account for Jensen's inequality), and the choice of an appropriate LFM is therefore an essential prerequisite for allometric prediction. For linear regression, ordinary least squares (OLS) has been recommended as giving the most accurate predictions because, by definition, it minimizes the MSR (Legendre 2001). For allometric regression, however, OLS fails because it minimizes the MSR on the logarithmic scale, (1/n) Σ_i (y_i − ŷ_i)², with the mean of its predictions being the geometric mean of Y, but not the MSR on the original antilog scale, (1/n) Σ_i (Y_i − Ŷ_i)² (e.g. Blackburn & Gaston 1998), which corresponds to the arithmetic mean as the mean of predictions. If we then estimate Y directly from an OLS regression (Ŷ = exp(ĉ + b̂ · x)), the prediction will systematically underestimate the observation. For instance, for data sets with Form I residuals, although the expected value of the prediction equals the observation on the logarithmic scale, the expected value of the prediction on the original antilog scale is less than the observation (Finney 1941; Heien 1968). It is the violation of this equality after antilog transformation that causes the problem of allometric prediction. As a result, the prediction of Y should become Ŷ = exp(ĉ + b̂ · x) · exp(s²/2), where s² is the estimated variance of the log-scale residuals (Consistent I).
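This retransformation bias is easy to illustrate numerically, assuming Form I (normal) residuals; the parameter values below are arbitrary and the helper name is ours:

```python
import math
import random

def ols(xs, ys):
    """Ordinary least squares fit y = c + b*x on log-transformed data."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

# Illustrative parameters: c = 0.7, b = 0.75, SD of log-scale residuals = 0.5
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(5000)]
ys = [0.7 + 0.75 * x + rng.gauss(0.0, 0.5) for x in xs]
c, b = ols(xs, ys)

Y_obs = [math.exp(y) for y in ys]
Y_naive = [math.exp(c + b * x) for x in xs]   # geometric-mean prediction
bias_ratio = sum(Y_naive) / sum(Y_obs)        # < 1: systematic underestimate
```

With a residual SD of 0.5, the naive back-transformed prediction recovers roughly exp(−0.5²/2) ≈ 88% of the observed arithmetic mean.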
Due to the violation of normality in the residuals v_i and the adjustment required for the variance of y, further estimators have been developed to correct predicted values of Y from X. These include the Consistent II estimator (Maddala 1988), the first-order approximation of a minimum variance unbiased estimator (approximate MVUE; Beauchamp & Olson 1973) and a nonparametric estimator, the smearing estimator (Duan 1983), which rescales the naive antilog prediction by the mean of the exponentiated log-scale residuals.
By multiplying the back-transformed prediction by an approximation of the expected residual E(V), the correction estimators estimate the arithmetic mean from the geometric mean, reduce the MSR without recourse to the nonlinear regression method, and therefore provide a less biased prediction of the true Y-variable (Smith 1993; Hayes & Shonkwiler 2006).
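As a sketch, two of these corrections are straightforward to state in code: Duan's smearing factor is the mean of the exponentiated log-scale residuals, while the parametric exp(s²/2) correction assumes normal residuals (function names are ours; the exact algebraic forms of Consistent II and the approximate MVUE are given in the cited sources):

```python
import math

def smearing_prediction(c, b, x_new, residuals):
    """Duan's (1983) smearing estimator: rescale the naive antilog
    prediction by the mean of the exponentiated log-scale residuals."""
    smear = sum(math.exp(v) for v in residuals) / len(residuals)
    return math.exp(c + b * x_new) * smear

def parametric_prediction(c, b, x_new, residuals):
    """Parametric exp(s^2/2) correction (the Consistent I form),
    assuming normally distributed log-scale residuals."""
    s2 = sum(v * v for v in residuals) / len(residuals)
    return math.exp(c + b * x_new + s2 / 2.0)
```

Both return values larger than the naive exp(ĉ + b̂·x), shifting the prediction from the geometric towards the arithmetic mean.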
To demonstrate the performance of these correction estimators and several relevant LFMs [including OLS, reduced major axis (RMA), the adjusted method (ADJ) and the nonlinear method (NON); see Appendix S2], we present the parameter landscape of the MSR on the logarithmic and antilog scales; that is, a three-dimensional MSR landscape as a function of the parameters c (intercept) and b (exponent) in the allometric relationship. Each LFM and correction estimator, as well as other approaches not included in this study, thus corresponds to a point in the parameter landscape, and the best method (i.e. the least biased) will be located at the peak of the parameter landscape (Fig. 1; see Mathematica code in Appendix S3). For instance, the generalized linear model for Form I residuals produces the same results as OLS for log-transformed data, and generates the same result as NON for (x, Y) (Cox et al. 2008). Furthermore, although the MSR has classically been considered the measure of bias for allometric prediction, it only depicts an overall bias in prediction on the original scale. For instance, whether the prediction for small body sizes is more accurate (small bias) or not is unclear from the MSR. As a result, here we present an alternative measure of bias on the antilog scale, the MSD of the frequency distributions, MSD = (1/K) Σ_k (F_Y(k) − F_Ŷ(k))², where K is the number of bin classes, F_Y(k) is the observed frequency of Y in bin k and F_Ŷ(k) the frequency of Y predicted using parameters c and b. For comparison, we also present the MSD on the logarithmic scale, (1/K) Σ_k (f_y(k) − f_ŷ(k))², where f_y(k) and f_ŷ(k) are the frequencies of the observed and predicted y. An accurate LFM and correction estimator should predict a frequency distribution of Y similar to that of the observations, i.e. a lower value of the MSD. Moreover, we also test the influence of the shape of the residuals (v_i), the scatter (measured as the coefficient of determination, R²) and the sampling effort (sample size, n) on the bias in the prediction of the Y-variable.
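The MSD and the associated parameter-landscape search can be sketched as follows (pure Python; the equal-width binning used here is an assumption for illustration, and the function names are ours):

```python
import math

def frequencies(values, edges):
    """Histogram counts of values over the bins defined by edges."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for k in range(len(edges) - 1):
            if edges[k] <= v < edges[k + 1]:
                counts[k] += 1
                break
        else:
            if v >= edges[-2]:          # top edge is inclusive
                counts[-1] += 1
    return counts

def msd(Y_obs, Y_pred, n_bins=10):
    """Mean squared discrepancy of the frequency distributions of
    observed and predicted Y over shared bin classes."""
    lo = min(min(Y_obs), min(Y_pred))
    hi = max(max(Y_obs), max(Y_pred))
    edges = [lo + (hi - lo) * k / n_bins for k in range(n_bins + 1)]
    F_obs = frequencies(Y_obs, edges)
    F_pred = frequencies(Y_pred, edges)
    return sum((fo - fp) ** 2 for fo, fp in zip(F_obs, F_pred)) / n_bins

def msd_landscape(xs, Y_obs, c_grid, b_grid):
    """Evaluate the MSD over a grid of (c, b); the least biased
    parameter pair sits at the minimum of this landscape."""
    best = None
    for c in c_grid:
        for b in b_grid:
            Y_pred = [math.exp(c + b * x) for x in xs]
            score = msd(Y_obs, Y_pred)
            if best is None or score < best[0]:
                best = (score, c, b)
    return best

# A toy landscape: noise-free data generated with c = 0.5, b = 1.0 are recovered
xs = [0.0, 0.5, 1.0, 1.5, 2.0]
Y = [math.exp(0.5 + 1.0 * x) for x in xs]
best_msd, best_c, best_b = msd_landscape(xs, Y, [0.0, 0.5, 1.0], [0.5, 1.0, 1.5])
```

Any LFM or correction estimator corresponds to one (c, b) point on this grid, which is what allows their biases to be compared on a common surface.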
Results
The performance of the correction estimators and the chosen LFMs was clearly illustrated by the parameter landscapes (Fig. 1 and Appendix S3). Although the parameters (c and b) that generate the least bias were easy to identify on the logarithmic scale (peaks in Fig. 1a and c), this became difficult on the antilog scale (peaks along the ridge in Fig. 1b and d). On the logarithmic scale, OLS yielded the lowest MSR (Fig. 1a), whereas NON had the smallest MSR on the original antilog scale (Fig. 1b). In terms of the MSD, RMA was the least biased on both the logarithmic and antilog scales (Fig. 1c and d). All correction estimators were indistinguishable from one another but clearly not the best under either criterion or bias measure (Fig. 1). When using parameter landscapes to explore data with different degrees of scatter and different residual shapes, we found these results were only partially sustained (see Appendix S3). Except for data with low scatter and Form III residuals, NON produced the lowest MSR. However, the frequency distribution of Y predicted from NON was completely different from that of the observed data. The peak (lowest point) in the MSD parameter landscape corresponded to none of the LFMs and correction estimators (with RMA closest to the peak; Appendix S3).
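Since RMA emerges as the least biased method under the MSD criterion, it is worth recalling how it differs from OLS on log-transformed data: only the slope formula changes. A minimal sketch (helper names are ours):

```python
import math

def ols_fit(xs, ys):
    """OLS on log-transformed data: slope b = Sxy / Sxx."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    b = sxy / sxx
    return my - b * mx, b

def rma_fit(xs, ys):
    """Reduced major axis: |slope| = SD(y)/SD(x), signed by the covariance."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = math.copysign(math.sqrt(syy / sxx), sxy)
    return my - b * mx, b
```

Because |r| ≤ 1, the RMA slope is always at least as steep as the OLS slope on the same data, which is why its predictions spread over a range of Y closer to that of the observations.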
Further detailed examination revealed that the correction estimators can generally be considered to produce regression lines approximately parallel to the OLS line, but with higher intercepts (Fig. 2). Specifically, Consistent I and Smearing had the same slope as OLS, whereas Consistent II and Approximate MVUE had slightly shallower slopes than OLS (Appendix S4 and its figure), although not significantly so (table 1 in Appendix S4). Except for the nonlinear LFM, all the other LFMs and correction estimators predicted the lowest slope for Form III residuals (a triangular shape with decreasing residuals as x increases; Appendix S1) and the highest for Form I. The order of the MSR from low to high is generally Forms II, IV, I and III on the logarithmic scale, changing to Forms III, I, II and IV on the antilog scale, an almost complete reversal (see e.g. OLS in table 2 in Appendix S4). There was no consistent evidence that the correction estimators performed better than OLS when measured by the MSR. For some scatter levels with residuals of Forms I and II, as well as all three cases of Form IV, the correction estimators indeed had a lower MSR than OLS (5 out of 12 cases). For the remaining scatter levels with residuals of Forms I and II, as well as all three cases of Form III, OLS in fact performed better (lower MSR) than the correction estimators (7 out of 12 cases; table 2 in Appendix S4).
The frequency distributions of the data values predicted (Ŷ) from the different LFMs, and from the different correction estimators even though these are designed specifically for prediction, showed obvious deviations from the frequency distribution of the original data (Y) (Fig. 3; see also figure 5 in Appendix S4), especially for certain scatter levels. On the original scale (see Appendix S4), OLS tends to have a lower frequency of observations for small values of Y, but this underestimation of the number of low values becomes less obvious with increasing bin classes of body size (figure 5 in Appendix S4). The frequency distribution of data values from RMA most closely matches the distribution of the original data (Fig. 3 and figure 5 in Appendix S4). OLS shows a higher modal frequency but a narrower range of Y-predictions; RMA has a distribution similar to the observed data; ADJ has a lower mode but a wider range of Y-predictions, whereas the correction estimators, as well as NON, have a right-shifted frequency distribution. This overall pattern becomes clearer as the level of scatter changes from low to high (Fig. 3a–c). Moreover, sensitivity tests confirmed that the above results are representative and not affected by the parameters of the data-generating process (Appendix S5).
Discussion
Surprisingly, our results suggest that using the MSR as a measure of bias in allometry could be misleading. The rationale behind the correction estimators is to provide a straightforward analytical solution for minimizing the MSR and thus to improve the accuracy of predictions based on the result of an OLS linear regression. Moreover, a correction estimator was adopted as an alternative to the complexity of directly using a nonlinear algorithm, especially given the limited computing capacity available in the 1940s–1980s when these correction estimators were designed (Finney 1941; Heien 1968; Bradu & Mundlak 1970; Zellner 1971; Beauchamp & Olson 1973; Duan 1983). With the speed and capacity of modern computers having increased substantially, the NON has once again gained momentum and is increasingly employed in the biological sciences for line-fitting and for comparison with other LFMs (Packard & Boardman 2009). Our results confirm that for allometric prediction the NON indeed produces the lowest MSR, although it systematically underestimates the slope (b) and thus yields a strongly right-skewed frequency distribution of predicted values on the original scale. This, however, does not necessarily mean that the NON has any computational or conceptual flaws; on the contrary, it implies that the flaw lies in using the MSR as the measure of bias for allometric prediction.
Although earlier computational capacity may have precluded a direct search for peaks on the parameter landscape, modern desktop computers can easily handle this task. Clearly, designing new LFMs and correction estimators will have limited value and has essentially become an outdated pursuit. In this regard, it is not the LFMs and correction estimators that should form the focus of further attention in allometry, but the criteria for assessing their performance and the measures of bias, such as the MSR and MSD. The MSR is not a robust criterion for allometric prediction for two reasons. First, the peak in the MSR parameter landscape is difficult to distinguish (Fig. 1). Second, the frequency distribution of Y predicted by the parameters at the peak deviates strongly from the observed frequency distribution (Fig. 3). The MSD bias measure we introduce shows a clear ridge in the parameter landscape, suggesting a local optimum, yet the exact peak is still not obvious. Therefore, the focus of allometric regression methodology should shift from designing new LFMs and correction estimators to presenting criteria for assessing bias that not only generate a parameter landscape with clear peaks but also permit a plausible interpretation.
In conclusion, as most previous work has shown (Legendre & Legendre 1998; Quinn & Keough 2002; Warton et al. 2006), different bias measures and research objectives (e.g. prediction on the original scale vs. accurate slope estimation) will lead to different recommendations. For minimizing the MSR on the antilog scale (i.e. for accurate prediction on the original scale), the NON is probably best. For predicting Y with the same frequency distribution as the observations (i.e. for an accurate slope estimate), RMA performs well in most cases. OLS tends to underestimate the Y-value for species (or individuals) with extreme body sizes (either small or large), but overestimates the Y-value for those with moderate body sizes. Even though the correction estimators might not perform too poorly at prediction on the original scale, the frequency distributions of the predicted data are obviously right-skewed compared with the observed data. Furthermore, it is also now clear that the performance of the different LFMs and correction estimators is influenced by the residual shape, the sample size and the coefficient of determination (the scatter) of the data (Appendix S4). Data with a low level of scatter for small body sizes generate a low MSR on the logarithmic scale, but a high MSR on the antilog scale. Form III data tend to have the highest slope estimate, whereas Form I data have the lowest. Allometric slope comparisons or predictions are not reliable when either the number of observations (n < 50) or the coefficient of determination (R² < 0·67) is too low. As LaBarbera (1989) pointed out, 'scaling studies paint nature with a very broad brush; they are more akin to the gas laws of physics than to Newton's laws.' A shift in the methodological focus of allometric regression away from designing sophisticated LFMs or correction estimators towards seeking a robust and sensitive bias criterion should help to find the best 'brush' for use in scaling studies.