A mathematical approach to defining clade names, with potential applications to computer storage and processing

Authors

  • T. Michael Keesey


DOI: 10.1111/j.1463-6409.2007.00302.x

Corresponding author: T. Michael Keesey, PO Box 292304, Los Angeles, CA 90027, USA. E-mail: keesey@gmail.com

Abstract

Clade names may be objectively defined based on conditions of phylogeny. Definitions usually take one of three forms — node-, branch- or apomorphy-based — but other forms and complex permutations of these forms are also possible. Some database projects have attempted to store definitions of clade names in a manner accessible to computer applications, but, so far, they have only provided ways of storing the most common types of definition. To create a more extensible system, I have taken a mathematical approach to defining clade names. To render definitions accessible to computer storage and analysis, I propose using Mathematical Markup Language (MathML) with extensions. Since the mathematical approach is granular to the level of the organism, not to fuzzy higher levels such as population or species, it sheds light on some theoretical difficulties with defining clade names. For example, some definitions do not resolve to a single organism as the ancestor, but to sets of organisms which are not ancestral to each other and share common descendants. I term such sets ‘cladogenetic sets’.
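To make the organism-level idea concrete, the following is a minimal Python sketch, not the formulation or the MathML encoding used in the paper. It assumes a toy genealogy stored as a mapping from each organism to its set of parents, and it assumes that the "last common ancestors" of a set of specifiers are those common ancestors that are not ancestral to any other common ancestor. The names `PARENTS`, `ancestors`, `descendants`, `last_common_ancestors`, `node_based_clade` and `branch_based_clade` are hypothetical and chosen only for illustration. Apomorphy-based definitions are omitted because they would require character data.

```python
# Hypothetical toy genealogy: each organism maps to the set of its parents.
# Organisms "a" and "b" are founders; "c" and "d" each have both as parents.
PARENTS = {
    "a": set(), "b": set(),
    "c": {"a", "b"}, "d": {"a", "b"},
    "e": {"c"}, "f": {"d"}, "g": {"d"},
}


def ancestors(x, parents):
    """Return x together with every organism that x descends from."""
    seen, stack = set(), [x]
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return seen


def descendants(xs, parents):
    """Return the members of xs together with all of their descendants."""
    xs = set(xs)
    return {o for o in parents if ancestors(o, parents) & xs}


def last_common_ancestors(specifiers, parents):
    """Common ancestors that are not ancestral to another common ancestor.

    Assumption for this sketch: the result may contain more than one
    organism, in which case it plays the role of a 'cladogenetic set'.
    """
    common = set.intersection(*(ancestors(s, parents) for s in specifiers))
    return {
        x for x in common
        if not any(x != y and x in ancestors(y, parents) for y in common)
    }


def node_based_clade(specifiers, parents):
    """Last common ancestors of the specifiers plus all their descendants."""
    return descendants(last_common_ancestors(specifiers, parents), parents)


def branch_based_clade(internal, external, parents):
    """Ancestors of `internal` not shared with `external`, plus descendants."""
    stem = ancestors(internal, parents) - ancestors(external, parents)
    return descendants(stem, parents)
```

In this toy data, `last_common_ancestors(["f", "g"], PARENTS)` resolves to the single organism `{"d"}`, whereas `last_common_ancestors(["e", "f"], PARENTS)` resolves to `{"a", "b"}`: two organisms that are not ancestral to each other but share descendants, which is the kind of set the abstract calls a cladogenetic set. Working with plain sets keeps the sketch granular to the organism, at the cost of recomputing ancestries on every call.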
