Incomplete specimens in geometric morphometric analyses



  1. The analysis of morphological diversity frequently relies on multivariate methods for characterizing biological shape. However, many of these methods are intolerant of missing data, which can limit the use of rare taxa and hinder the study of broad patterns of ecological diversity and morphological evolution. This study applied a multi-data set approach to compare variation in missing data estimation and its effect on geometric morphometric analyses across taxonomically variable groups, landmark positions and sample sizes.
  2. Missing morphometric landmark data were simulated from five real, complete data sets, including modern fish, primates and extinct theropod dinosaurs. Missing landmarks were then estimated using several standard approaches and a geometric-morphometric-specific method. The accuracy of missing data estimation was determined for each estimation method, landmark position and morphological data set. Procrustes superimposition was used to compare the eigenvectors and principal component scores of a geometric morphometric analysis of the original landmark data with those of data sets in which (A) missing values were estimated or (B) simulated incomplete specimens were excluded, for varying levels of specimen incompleteness and sample sizes.
  3. Standard estimation techniques were more reliable estimators and had lower impacts on morphometric analysis than a geometric-morphometric-specific estimator. For most data sets and estimation techniques, estimating missing data produced a better fit to the structure of the original data than excluding incomplete specimens, and this was maintained even at considerably reduced sample sizes. The most fragmentary specimens contributed disproportionately to the impact of missing data on geometric morphometric analysis.
  4. Missing data estimation was influenced by the variability of specific anatomical features and may be improved by a better understanding of the shape variation present in a data set. Our results suggest that the inclusion of incomplete specimens through the use of effective missing data estimators better reflects the patterns of shape variation within a data set than using only complete specimens; however, the effectiveness of missing data estimation can be maximized by excluding only the most incomplete specimens. We advise that missing data estimators be evaluated for each data set and landmark independently, as the effectiveness of estimators can vary strongly and unpredictably between different taxa and structures.
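
The workflow summarized in points 2 and 3 can be illustrated with a minimal sketch. It is not the study's implementation, and it rests on several assumptions: the landmark configurations are synthetic and treated as already superimposed; mean substitution stands in for one "standard" estimator (the summary does not name the estimators used); and the helper names (simulate_missing, mean_substitute, pc_scores, procrustes_disparity) are hypothetical. The sketch removes landmarks from a subset of specimens, then measures how far the principal component scores drift from those of the complete data under estimation and under exclusion.

```python
import numpy as np


def simulate_missing(data, frac_specimens, n_missing, rng):
    """Set n_missing whole landmarks to NaN in a random fraction of specimens.

    data has shape (specimens, landmarks, dimensions).
    Returns the incomplete copy and the indices of the affected specimens.
    """
    out = data.copy()
    n_spec, n_lm, _ = data.shape
    n_incomplete = int(round(frac_specimens * n_spec))
    chosen = rng.choice(n_spec, size=n_incomplete, replace=False)
    for i in chosen:
        lost = rng.choice(n_lm, size=n_missing, replace=False)
        out[i, lost, :] = np.nan
    return out, chosen


def mean_substitute(data):
    """Fill each missing coordinate with its mean over specimens where it is observed.

    Assumed stand-in for a 'standard' estimator; the source does not name one.
    """
    means = np.nanmean(data, axis=0)
    return np.where(np.isnan(data), means, data)


def pc_scores(data, n_pcs=2):
    """Principal component scores of the flattened landmark coordinates."""
    flat = data.reshape(len(data), -1)
    centered = flat - flat.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_pcs] * s[:n_pcs]


def procrustes_disparity(a, b):
    """Residual sum of squares after centering, unit scaling, and optimal
    rotation/reflection and rescaling of b onto a (0 means identical structure)."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    a = a / np.linalg.norm(a)
    b = b / np.linalg.norm(b)
    s = np.linalg.svd(a.T @ b, compute_uv=False)
    return 1.0 - s.sum() ** 2


# Synthetic, already-aligned shapes: 40 specimens, 10 two-dimensional landmarks.
rng = np.random.default_rng(0)
mean_shape = rng.normal(size=(10, 2))
shapes = mean_shape + rng.normal(scale=0.05, size=(40, 10, 2))

# A quarter of the specimens each lose two landmarks.
incomplete, chosen = simulate_missing(shapes, 0.25, 2, rng)
filled = mean_substitute(incomplete)

reference = pc_scores(shapes)

# (A) Estimation: all specimens kept, missing landmarks filled in.
d_estimated = procrustes_disparity(reference, pc_scores(filled))

# (B) Exclusion: incomplete specimens dropped, compared on the retained ones.
keep = np.setdiff1d(np.arange(len(shapes)), chosen)
d_excluded = procrustes_disparity(reference[keep], pc_scores(shapes[keep]))

print(f"disparity with estimation: {d_estimated:.4f}")
print(f"disparity with exclusion:  {d_excluded:.4f}")
```

Which of the two disparities is smaller will depend on the data set, the estimator and the amount of missing data, which is why the summary recommends evaluating estimators separately for each data set and landmark. Repeating the simulation over many random draws, and over different numbers of missing landmarks per specimen, would give a distribution of disparities rather than a single value.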