Volume 62, Issue 5
Original Article

Biases with the Generalized Euclidean Distance measure in disparity analyses with high levels of missing data

Oscar E. R. Lehmann

Corresponding Author

E-mail address: lehmanncxii@gmail.com

Sección Paleontología de Vertebrados, CONICET–Museo Argentino de Ciencias Naturales ‘Bernardino Rivadavia’, C1405DJR Buenos Aires, Argentina

Martín D. Ezcurra

Corresponding Author

E-mail address: martindezcurra@yahoo.com.ar

Sección Paleontología de Vertebrados, CONICET–Museo Argentino de Ciencias Naturales ‘Bernardino Rivadavia’, C1405DJR Buenos Aires, Argentina

School of Geography, Earth & Environmental Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT UK

Richard J. Butler

School of Geography, Earth & Environmental Sciences, University of Birmingham, Edgbaston, Birmingham, B15 2TT UK

Graeme T. Lloyd

School of Earth & Environment, University of Leeds, Leeds, LS2 9JY UK

First published: 17 May 2019
Citations: 4

Data archiving statement:

Data for this study, including data sets, scripts, and complete graphical results are available in the Dryad Digital Repository: https://doi.org/10.5061/dryad.4cv1421

Abstract

The Generalized Euclidean Distance (GED) measure has been extensively used to conduct morphological disparity analyses based on palaeontological matrices of discrete characters. This is in part because some implementations allow the use of morphological matrices with high percentages of missing data without needing to prune taxa for a subsequent ordination of the data set. Previous studies have suggested that this way of using the GED may generate a bias in the resulting morphospace, but a detailed study of this possible effect has been lacking. Here, we test whether the percentage of missing data for a taxon artificially influences its position in the morphospace, and if missing data affects pre‐ and post‐ordination disparity measures. We find that this use of the GED creates a systematic bias, whereby taxa with higher percentages of missing data are placed closer to the centre of the morphospace than those with more complete scorings. This bias extends into pre‐ and post‐ordination calculations of disparity measures and can lead to erroneous interpretations of disparity patterns, especially if specimens present in a particular time interval or clade have distinct proportions of missing information. We suggest that this implementation of the GED should be used with caution, especially in cases with high percentages of missing data. Results recovered using an alternative distance measure, Maximum Observed Rescaled Distance (MORD), are more robust to missing data. As a consequence, we suggest that MORD is a more appropriate distance measure than GED when analysing data sets with high amounts of missing data.
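To make the two distance measures concrete, the following is a minimal sketch, not the authors' scripts (those are archived in the Dryad repository cited above) and not the API of any published package. It assumes binary, unordered, equally weighted characters, with `None` marking a missing score. The function names `mord` and `ged` are hypothetical. The GED variant shown is one simplified reading, in which each non-comparable character contributes the pair's own mean fractional difference; other implementations fill missing distances differently.

```python
import math


def _comparable(a, b):
    """Return the character pairs scored in both taxa (None = missing)."""
    if len(a) != len(b):
        raise ValueError("taxa must be scored for the same characters")
    return [(x, y) for x, y in zip(a, b) if x is not None and y is not None]


def mord(a, b):
    """Maximum Observed Rescaled Distance for binary, unit-weight characters.

    Observed differences are divided by the maximum possible difference
    over the comparable characters, so the result lies in [0, 1].
    Returns None when the two taxa share no scored characters.
    """
    pairs = _comparable(a, b)
    if not pairs:
        return None
    observed = sum(1 for x, y in pairs if x != y)
    return observed / len(pairs)


def ged(a, b):
    """Simplified Generalized Euclidean Distance (assumed variant).

    Comparable characters contribute their squared difference. Each
    non-comparable character is filled with the pair's mean fractional
    difference, which is the assumption made in this sketch.
    Returns None when the two taxa share no scored characters.
    """
    pairs = _comparable(a, b)
    if not pairs:
        return None
    diffs = [1.0 if x != y else 0.0 for x, y in pairs]
    mean_fractional = sum(diffs) / len(diffs)
    n_missing = len(a) - len(pairs)
    total = sum(d * d for d in diffs) + n_missing * mean_fractional ** 2
    return math.sqrt(total)


complete = [0, 0, 1, 1]
partial = [0, 1, None, None]  # half of the characters are unscored

print("MORD:", mord(complete, partial))
print("GED: ", ged(complete, partial))
```

In this sketch MORD uses only the characters scored in both taxa, whereas GED substitutes an averaged value for every missing comparison. That substitution is the step the abstract identifies as the source of bias when missing data are abundant.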

Number of times cited according to CrossRef: 4

  • Is Cyclocardia (Conrad) a wastebasket taxon? Exploring the phylogeny of the most diverse genus of the Carditidae (Archiheterodonta, Bivalvia), Palaeontology, 10.1111/pala.12467, 63, 3, (477-495), (2020).
  • Early high rates and disparity in the evolution of ichthyosaurs, Communications Biology, 10.1038/s42003-020-0779-6, 3, 1, (2020).
  • Disparities in the analysis of morphological disparity, Biology Letters, 10.1098/rsbl.2020.0199, 16, 7, (20200199), (2020).
  • Ten more years of discovery: revisiting the quality of the sauropodomorph dinosaur fossil record, Palaeontology, 10.1111/pala.12496.
