A taxonomic distinctness index and its statistical properties


  • M. W. Pienkowski,

  • A. R. Watkinson,

  • Gillian Kerby,

  • K. R. CLARKE,

    1. Centre for Coastal and Marine Sciences, Plymouth Marine Laboratory, Prospect Place, West Hoe, Plymouth PL1 3DH, UK


Correspondence: Dr K. R. Clarke (fax 01752 633101; e-mail: b.clarke@pml.ac.uk).


  • For biological community data (species-by-sample abundance matrices), Warwick & Clarke (1995) defined two biodiversity indices that capture not only the distribution of abundances amongst species but also the taxonomic relatedness of the species in each sample. The first index, taxonomic diversity (δ), can be thought of as the average taxonomic ‘distance’ between any two organisms chosen at random from the sample: this distance can be visualized simply as the length of the path connecting these two organisms, traced through (say) a Linnean or phylogenetic classification of the full set of species involved. The second index, taxonomic distinctness (δ*), is the average path length between any two randomly chosen individuals, conditional on them being from different species. This is equivalent to dividing taxonomic diversity, δ, by the value it would take were there to be no taxonomic hierarchy (all species belonging to the same genus). δ* can therefore be seen as a measure of pure taxonomic relatedness, whereas δ mixes taxonomic relatedness with the evenness properties of the abundance distribution.

  • This paper explores the statistical sampling properties of δ and δ*. Taxonomic diversity is seen to be a natural extension of a form of Simpson's index, incorporating taxonomic (or phylogenetic) information. Importantly for practical comparisons, the mean values of both δ and δ* are shown not to depend on the degree of sampling effort involved in the data collection; this is in sharp contrast with those diversity measures that are strongly influenced by the number of observed species.

  • The special case where the data consist only of presence/absence information is dealt with in detail: δ and δ* converge to the same statistic (δ+), which is now defined as the average taxonomic path length between any two randomly chosen species. Its lack of dependence, in mean value, on sampling effort implies that δ+ can be compared across studies with differing and uncontrolled degrees of sampling effort (subject to assumptions concerning comparable taxonomic accuracy). This may be of particular significance for historic (diffusely collected) species lists from different localities or regions, which at first sight may seem unamenable to valid diversity comparison of any sort.

  • Furthermore, a randomization test is possible to detect a departure of the taxonomic distinctness of any observed set of species from the ‘expected’ δ+ value derived from a master species list for the relevant group of organisms. The exact randomization procedure requires heavy computation, and an approximation is developed by deriving an appropriate variance formula. This leads to a ‘confidence funnel’ against which distinctness values for any specific area, pollution condition, habitat type, etc., can be checked, and formally addresses the question of whether a putatively impacted locality has a ‘lower than expected’ taxonomic spread. The procedure is illustrated for the UK species list of free-living marine nematodes and sets of samples from intertidal sites in two localities, the Exe estuary and the Firth of Clyde.
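
The verbal definitions of the three indices can be written compactly as follows. The notation is an assumption made here for illustration and does not appear in the summary above: x_i is the abundance of species i, n is the total number of individuals in the sample, s is the number of species, and ω_ij is the path length between species i and j through the classification.

```latex
\delta = \frac{\sum\sum_{i<j} \omega_{ij}\, x_i x_j}{n(n-1)/2},
\qquad
\delta^{*} = \frac{\sum\sum_{i<j} \omega_{ij}\, x_i x_j}{\sum\sum_{i<j} x_i x_j},
\qquad
\delta^{+} = \frac{\sum\sum_{i<j} \omega_{ij}}{s(s-1)/2}

% With no taxonomic hierarchy (every \omega_{ij} = 1), \delta reduces to a
% form of Simpson's index:
\delta\big|_{\omega_{ij}=1}
  = \frac{\sum\sum_{i<j} x_i x_j}{n(n-1)/2}
  = 1 - \frac{\sum_i x_i (x_i - 1)}{n(n-1)}
```

Under this notation, δ* is δ divided by its no-hierarchy value, and with presence/absence data (every x_i = 1) both δ and δ* reduce to δ+.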
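
As a rough illustration of how the indices and the randomization test might be computed, the sketch below uses a toy classification. Everything in it is hypothetical: the species and lineage names, the function names, and the path-length weighting of one step per taxonomic rank are all assumptions. The test draws random subsets from the master list (a Monte Carlo approach), which is neither the exact randomization procedure nor the variance-formula approximation that the paper develops.

```python
import random
from itertools import combinations

# Hypothetical master list: species -> lineage (family, genus, species).
LINEAGE = {
    "A1": ("F1", "G1", "A1"),
    "A2": ("F1", "G1", "A2"),
    "B1": ("F1", "G2", "B1"),
    "C1": ("F2", "G3", "C1"),
}

def path_length(a, b):
    """Assumed weighting: one step for each rank below the lowest shared rank."""
    shared = 0
    for x, y in zip(a, b):
        if x != y:
            break
        shared += 1
    return len(a) - shared

def _cross_sums(abund, lineage):
    """Return (sum of w_ij*x_i*x_j, sum of x_i*x_j) over species pairs i < j."""
    weighted = plain = 0
    for i, j in combinations(list(abund), 2):
        prod = abund[i] * abund[j]
        weighted += path_length(lineage[i], lineage[j]) * prod
        plain += prod
    return weighted, plain

def taxonomic_diversity(abund, lineage):
    """delta: mean path length between two randomly chosen individuals."""
    n = sum(abund.values())
    weighted, _ = _cross_sums(abund, lineage)
    return weighted / (n * (n - 1) / 2)

def taxonomic_distinctness(abund, lineage):
    """delta*: mean path length, given the individuals differ in species."""
    weighted, plain = _cross_sums(abund, lineage)
    return weighted / plain

def avg_taxonomic_distinctness(species, lineage):
    """delta+: mean path length between two randomly chosen species."""
    pairs = list(combinations(species, 2))
    return sum(path_length(lineage[i], lineage[j]) for i, j in pairs) / len(pairs)

def randomization_test(observed, lineage, n_draws=999, seed=0):
    """Monte Carlo one-sided test for 'lower than expected' delta+.

    Draws random species subsets of the observed size from the master list
    and reports the share of values at or below the observed one.
    """
    rng = random.Random(seed)
    master = sorted(lineage)
    obs = avg_taxonomic_distinctness(observed, lineage)
    sims = [
        avg_taxonomic_distinctness(rng.sample(master, len(observed)), lineage)
        for _ in range(n_draws)
    ]
    p_value = (1 + sum(s <= obs for s in sims)) / (n_draws + 1)
    return obs, p_value

# Toy sample: abundances for three of the four master-list species.
SAMPLE = {"A1": 2, "A2": 2, "C1": 1}
```

Calling `randomization_test(["A1", "A2"], LINEAGE)` returns the observed δ+ together with an approximate p-value. Repeating the random draws over a range of subset sizes and plotting the resulting percentiles would give a simulated counterpart to the ‘confidence funnel’ described above.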