Only the consequences of a theory can produce good evidence in its favour. Here, the theory is that a mathematical distribution, the lognormal, describes the abundance of species. The nature of the distribution and its consequences determine whether it is a satisfactory description and may indicate the limits of its scope. In this section we look at six such features of the lognormal. The conclusions vary between condemning the theory and neutrality about it; none offer support for it.
(i) The lognormal distribution borders unsatisfactory distributions
The mathematics of the lognormal distribution are, of course, well known (Aitchison & Brown 1957; Shimizu & Crow 1988), and we will not revisit them here. Ulanowicz (2003) points out that the lognormal is at the interface between well-behaved negative exponentials and a power-law family of distributions that have infinite theoretical variances. He says ‘it is sometimes not easy to decide whether data represent a lognormal distribution or whether they might better portray a closely related, but more poorly behaved, power-law formula’. Similarly, Schmoyer et al. (1996) note that the log-t distributions (t as in Student's t-test) have infinite variances and that the lognormal is the log-t with an infinite number of degrees of freedom. In both cases, of course, real finite samples must have finite variances, but could be drawn from theoretical distributions with infinite variances. The Cauchy distribution is the standard textbook example of such a distribution.
The distributions with infinite variance all have thicker tails than the lognormal. As we will show below, the lognormal already has too thick a right-hand tail to be plausible. The same argument would show that these infinite-variance distributions are even less plausible.
(ii) The lognormal is not additive and is taxonomically restricted
The addition of two lognormal curves with different means or different variances or both leads to a distribution that is not lognormal. So if the lognormal applies, for instance, to passeriform birds and to charadriiform birds, it cannot apply to all birds and, conversely, if it applies to all birds it cannot apply to individual orders of birds. In practice, the lognormal has been applied to taxonomically limited sets without examining subsets. Indeed, that is what we did with our three examples. It is far from clear at what level it should apply. This difficulty is common to all other SAD functions that have been proposed, which suggests that we are far from a satisfactory theory of SADs, irrespective of the validity of the lognormal.
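The non-additivity can be checked numerically. The sketch below works on the log-abundance scale, where a lognormal becomes a normal curve, and computes the skewness and excess kurtosis of a two-component mixture from the standard moment formulae. The function name `mixture_shape` and all parameter values are illustrative assumptions, not figures from any data set. A normal curve has zero skewness and zero excess kurtosis, so any non-zero value shows that the pooled curve is not normal.

```python
def mixture_shape(w, mu1, sd1, mu2, sd2):
    """Skewness and excess kurtosis of w*N(mu1, sd1^2) + (1-w)*N(mu2, sd2^2)."""
    mean = w * mu1 + (1 - w) * mu2
    m2 = m3 = m4 = 0.0
    for weight, mu, sd in ((w, mu1, sd1), (1 - w, mu2, sd2)):
        d = mu - mean  # offset of this component from the pooled mean
        # Central moments of a normal component taken about the pooled mean.
        m2 += weight * (sd**2 + d**2)
        m3 += weight * (d**3 + 3 * d * sd**2)
        m4 += weight * (d**4 + 6 * d**2 * sd**2 + 3 * sd**4)
    return m3 / m2**1.5, m4 / m2**2 - 3.0

# Hypothetical log-abundance parameters for two taxonomic groups.
same = mixture_shape(0.5, 4.0, 1.0, 4.0, 1.0)        # identical components
diff_means = mixture_shape(0.5, 2.0, 1.0, 6.0, 1.0)  # different means
diff_sds = mixture_shape(0.5, 4.0, 1.0, 4.0, 2.0)    # different variances

print(same, diff_means, diff_sds)
```

Only the pooling of identical components leaves both shape measures at zero; with these values, differing means give a negative excess kurtosis and differing variances a positive one.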
(iii) The lognormal is continuous
It has been noted in the literature from time to time that SADs are intrinsically discrete but the lognormal is continuous (Kempton & Taylor 1974; Pielou 1975; Williamson 1981; Magurran 1988). Equally, it has been said that the discrete SAD might be the realization of an underlying continuous process (Pielou 1977). Most other SAD functions are discrete. However, as the argument in (v) below will bring out, in practice SADs are only evidently discrete at low numbers and having a discrete function for abundant species is not helpful. For instance, most British bird populations are estimated to only two significant figures. A logarithmic transformation for population size has long been recognized as desirable (Williams 1964; Williamson 1972; Gaston & McArdle 1994) and logarithms are in general not integers. The Poisson lognormal is one attempt to bridge the transition but has not been found satisfactory (see (vi) below).
One consequence of the continuous nature of the lognormal is that it goes to infinity in both directions or, converted back to an arithmetic scale, has a tail ending at zero on the left and a long thin tail going to infinity on the right. The former is regarded by Dewdney (1998, 2000, 2003) as showing the lognormal to be unrealistic as, in his survey, all empirical SADs had a high point at the left-hand side; the singleton class is the most common. The latter is the cause of an unsatisfactory individuals abundance curve (see below). A discrete distribution could perhaps circumvent both these difficulties, but that they are difficulties is a valid objection to the lognormal.
(iv) The right-hand tail is too thick
This objection is lethal to the theory that the lognormal describes the abundance of species yet, oddly, has led to a large literature about how interesting it is. The problem was discussed first in Preston (1948) and developed into the theory of canonical lognormals in Preston (1962), with exceptions noted in Preston (1980). By simple manipulation (Preston 1948; Aitchison & Brown 1957), the species–abundance distribution can produce an individuals–abundance distribution, the number of individuals of species with particular abundances. The latter is also a normal (Gaussian) curve, right-shifted. Preston developed this into the canonical lognormal hypothesis, whereby the most abundant species had individuals at the maximum of the individuals abundance curve. This is shown in many books (e.g. Magurran 1988, 2004).
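The 'simple manipulation' can be sketched as follows. The notation is ours and is an assumption for illustration: x is the natural logarithm of abundance, S(x) is the species curve with mean μ and standard deviation σ, and I(x) is the individuals curve obtained by weighting each species by its abundance eˣ.

```latex
% x = \ln N is log abundance; S(x) is the species curve, I(x) the individuals curve.
\begin{align*}
S(x) &\propto \exp\!\left[-\frac{(x-\mu)^2}{2\sigma^2}\right], \\
I(x) &= e^{x}\,S(x) \propto \exp\!\left[x-\frac{(x-\mu)^2}{2\sigma^2}\right]
      = \exp\!\left[\mu+\frac{\sigma^2}{2}\right]
        \exp\!\left[-\frac{\bigl(x-(\mu+\sigma^2)\bigr)^2}{2\sigma^2}\right].
\end{align*}
```

Completing the square thus gives a normal curve with the same σ and its mode moved from μ to μ + σ². If abundance is measured in octaves (logarithms to base 2) the same argument gives a shift of σ² ln 2.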
In Preston's canonical view, the whole of the right-hand half of the individuals abundance curve is missing. Taking a less rigid link between the two curves, as in Preston (1948) or May (1975), the missing part is still approximately a half. The lognormal SAD predicts the existence of many extremely abundant species that do not exist. It is standard that a theory that predicts an absurdity must be wrong. Surprisingly, this obvious argument against the feasibility of a lognormal SAD seems not to have been made before. The canonical lognormal is, then, not something interesting but something absurd. We agree with Dennis & Patil (1988) that ‘The enthusiasm ecologists have for this [Preston's Canonical] hypothesis must be judged from a statistical standpoint as premature.’
It is the right-hand tail of the lognormal that generates the non-existent part of the individuals abundance curve. An appropriate theory should generate an individuals abundance curve that matches what is observed. It follows that such a theory must have a thinner right-hand tail than the lognormal. This is indeed what we noted is observed in fitting lognormals; the right-hand tail is empirically thinner than the lognormal.
(v) The log-binomial and the individuals abundance curve
As noted above, on one side the lognormal borders unsatisfactory distributions. On the other side it can be said to border the log-binomial. This is not a realistic distribution and has not appeared in the SAD literature before. Nevertheless, it makes a useful point about the relationship of the SAD and the individuals abundance curve. We develop the argument numerically. It is easily put into algebraic form.
Consider a small binomial distribution: 1, 6, 15, 20, 15, 6, 1. Suppose, in the spirit of Preston, that these are the numbers of species that have exactly 1, 2, 4, 8, 16, 32 and 64 individuals. Then, the individuals abundance distribution, from cross-multiplying, is 1, 12, 60, 160, 240, 192, 64. The binomial distribution, the SAD, is given by the coefficients in the expansion of (x + y)⁶, while the individuals abundance distribution is given by the coefficients of (x + 2y)⁶. The 2 in 2y comes from using a doubling between classes; it would be 3y had we used triplings as, for instance, do DeVries et al. (1997). The individuals abundance curve is significantly left-skewed (= 10·27, P < 0·01). As the normal is the limit of the binomial, it could be said that the symmetrical lognormal individuals curve borders asymmetrical, left-skewed, log-binomial individuals curves.
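The arithmetic of this example is easily reproduced. The sketch below builds both distributions from the binomial coefficients and computes a simple moment skewness over the class index; the helper `skewness` is our own illustrative choice and is not the test statistic quoted in the text.

```python
from math import comb

n = 6
species = [comb(n, k) for k in range(n + 1)]   # coefficients of (x + y)^6
abundance = [2**k for k in range(n + 1)]       # 1, 2, 4, ..., 64 individuals
individuals = [s * a for s, a in zip(species, abundance)]

# Cross-multiplying gives comb(6, k) * 2^k, the coefficients of (x + 2y)^6.

def skewness(weights):
    """Moment skewness of the class index 0..n, weighted by `weights`."""
    total = sum(weights)
    mean = sum(k * w for k, w in enumerate(weights)) / total
    m2 = sum(w * (k - mean) ** 2 for k, w in enumerate(weights)) / total
    m3 = sum(w * (k - mean) ** 3 for k, w in enumerate(weights)) / total
    return m3 / m2**1.5

print(species)      # [1, 6, 15, 20, 15, 6, 1]
print(individuals)  # [1, 12, 60, 160, 240, 192, 64]
print(skewness(species), skewness(individuals))
```

The species curve is symmetrical (skewness zero) while the individuals curve has negative skewness, that is, a left skew. Replacing `2**k` by `3**k` gives the corresponding result for triplings.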
In the sense that it produces a realistic individuals abundance curve, echoing what is seen empirically, the log-binomial is an improvement on the lognormal. By allowing species to have only exactly 2ⁿ (or 3ⁿ or any other such series) individuals it is clearly most unrealistic. No doubt the realism could be improved by regarding the 2ⁿ classes as bin markers and allowing the species to have numbers included in the bin. The argument for the individuals abundance curve would then become approximate, but the result would still be a strong left skew. We do not consider this to be a line worth pursuing. The log-binomial has been introduced solely to show the exceptional nature of the symmetrical, normal, shape of the lognormal individuals abundance curve.
(vi) Preston's veil line is a misunderstanding
Preston (1948) developed the theory of the lognormal SAD using histograms in ‘octaves’, i.e. a doubling between bins. He thought that in samples, what we have called incomplete enumerations, the left-hand part of the histogram would be truncated. Some of the bins would not be observed. He also thought that doubling the sample would reveal exactly one more bin at the left-hand end and similarly for other increases in sample size.
This view has remained popular despite its neglect of proper sampling. Grundy (1951) was sceptical: ‘Heuristically, the sample may be supposed to include most of the species to the right of the veil line, and few of those to the left’, but suggested no improvement. He also noted the unfortunate nature of Preston's binning, which we have already discussed, and added ‘Preston does not, however, seem to use this [equation 5, a formula for using probits], since he adopts a method of grouping into octaves by which half the species appearing as singletons in the sample are assigned to the left of the veil line.’
From the literature, two points emerge. The first is that sampling does often appear to give an appearance of truncation (e.g. Taylor 1978, using different samples; Gaston & Blackburn 2000, using proper subsamples). The second is that increasing the sample does not just add new bins on the left, it changes the shape of the curve. Complete enumerations are left-skewed and incomplete enumerations are often right-skewed (Fig. 3). This can be seen, although it was not discussed, in Hutchinson's (1967) tabulation of Patrick's diatom data, and seen and discussed in Hubbell & Foster (1983) for trees at BCI.
Dewdney (1998) developed a new theory for sampling from SADs. This uses a Poisson approximation to the hypergeometric distribution involved in taking finite samples without replacement. For some SADs, such as the log-series, sampling does not change the shape. He did not produce an expression for the sampled lognormal but plotted such a curve, showing the increased right skew. All that can be said is that the shape will change and the change will not involve a veil line but a diminution growing stronger going leftwards. This diminution looks like a truncation when the data are presented as histograms. In rank abundance plots it normally produces a great number of singletons (Fig. 1c) and log right skew (Fig. 3c). The problem of sampling from the lognormal is made more difficult by the samples being necessarily discrete. Nevertheless, there will always (contra Wilson 1991) be a sampling effect in incomplete enumerations.
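Why the effect is a graded diminution rather than a sharp veil line can be seen in a minimal sketch. It is not Dewdney's formulation. It assumes only that the number of individuals of a species appearing in a sample is Poisson with mean equal to the sampling fraction times the species' abundance. The function name `detection_probability` and the sampling fraction of 0·01 are hypothetical.

```python
from math import exp

def detection_probability(abundance, fraction):
    """P(species appears at least once), assuming a Poisson count
    with mean fraction * abundance."""
    return 1.0 - exp(-fraction * abundance)

fraction = 0.01                       # hypothetical sampling fraction
octaves = [2**k for k in range(13)]   # abundances 1, 2, 4, ..., 4096
probs = [detection_probability(n, fraction) for n in octaves]

for n, p in zip(octaves, probs):
    print(f"{n:5d}  {p:.3f}")
```

Under these assumptions the probability of a species being observed rises smoothly over several octaves instead of jumping from zero to one at a single abundance class, so the rarer classes are progressively thinned rather than cut off.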
One mathematical attempt to deal with the problems of sampling and discreteness was the development of the Poisson lognormal (Grundy 1951; Cassie 1962; Pielou 1969, 1975, 1977; Bulmer 1974; Slocomb, Stauffer & Dickinson 1977). The argument necessarily has several assumptions about sampling. It is not clear if it is these assumptions, or the more general assumption that it is a lognormal SAD that is being sampled, that leads to an unfortunate and clearly incorrect result. Taking the simpler truncation view, the Poisson lognormal gives an estimate of the number of species that have been truncated which, added to the species that have been observed, leads to an estimate of the total number of species. This, of course, only makes sense for incomplete enumerations. For our example of Ecuadorian butterflies, there certainly must be a figure for the total number of fruit-eating nymphalid species that occurred in Ecuador, or some part of it, during the sampling period and it would be interesting to estimate it. In other cases, for instance for Lepidoptera caught in a light trap, it is less clear what set of species would be estimated. The problem is one of extrapolation, always a risky process.
Slocomb et al. (1977) say ‘Pielou (1975) points out that currently available estimates of N [the total number of species] are not satisfactory; this observation is certainly supported by the results presented here and in Bulmer (1974). This lack of confidence in estimates of N is unfortunate’. Pielou (1975) says ‘estimates of s* [the total number of species] obtained by fitting the Poisson lognormal and the continuous lognormal … are discrepant. … estimates of s* rarely inspire confidence. The whole problem is ripe for further investigation.’ Hughes (1986) and Magurran (1988, 2004) also note that such estimates are unsatisfactory. O’Hara & Oksanen (2003) note that the estimated number of unobserved species differs between fits to histograms and fits to rank abundance curves, but they are uncertain why. It seems to us that the assumption of truncation is the central issue. Agreeing with Dewdney (1998) that sampling does not lead to truncation means that all calculations based on that assumption, including fitting Poisson lognormals, should be set aside.
Pielou (1969, 1977) notes that ‘In many collections it is found that singleton species (those represented by one individual) are numerous, often the most numerous’ and Dewdney (1998, 2000, 2003) confirms this. That is not true if the collections are large enough. Collections of many thousands of individuals are often still small relative to the actual size of the assemblage, particularly at broader spatial scales. From our examples and two data sets (copepods and phytoplankton) in McGowan & Walker (1993) we suggest, tentatively, that the dominance of singletons will be lost when between 100 000 and 1 000 000 individuals have been studied. Range-size distributions for various groups in Europe do go down to a size of one hectad (100 km²), the sampling unit, but such singletons are not the most common observation, as is seen easily in area dominance (or rank range distribution) plots (Gaston et al. 1998; Williamson & Gaston 1999; Williamson 2002; Gaston 2003), and even so each hectad normally contains more than one individual, even of rare species. The BCI trees have numerous singletons but the BCI plot is only 50 ha (0·5 km²). Scaling up implies 200 individuals per hectad as a median, many thousands in Panama. The abundance of singletons in published data shows the effort required to obtain an adequate SAD rather than showing the shape of such distributions. A satisfactory theoretical SAD should have few singletons in large enumerations and be slightly log left-skewed at that size while smaller enumerations should have a dominance of singletons viewed arithmetically and be log right-skewed. The lognormal does not have these properties (Dewdney 1998).