Gauging the effects of sampling failure in biogeographical analysis


  • Alan H. Turner,

    Corresponding author
    1. Division of Paleontology, American Museum of Natural History, New York, NY
      *Alan H. Turner, Department of Anatomical Sciences, Stony Brook University, Health Sciences Center T-8 (040), Stony Brook, NY 11794, USA. E-mail:
  • Nathan D. Smith,

    1. Committee on Evolutionary Biology, University of Chicago, Chicago, IL
    2. The Field Museum of Natural History, Chicago, IL
  • John A. Callery

    1. Applied Mathematics, The Ohio State University, Columbus, OH, USA

  • Present address: John A. Callery, 39-72 65th Street, Woodside, NY 11377, USA.


Aim  Various methods are employed to recover patterns of area relationships in extinct and extant clades. The fidelity of these patterns can be adversely affected by sampling error in the form of missing data. Here we use simulation studies to evaluate the sensitivity of an analytical biogeographical method, namely tree reconciliation analysis (TRA), to this form of sampling failure.

Location  Simulation study.

Methods  To approximate varying degrees of taxonomic sampling failure within phylogenies varying in size and in redundancy of biogeographical signal, we applied sequential pruning protocols to artificial taxon–area cladograms displaying congruent patterns of area relationships. Initial trials assumed equal probability of sampling failure among all areas. Additional trials assigned weighted probabilities to each of the areas in order to explore the effects of uneven geographical sampling. Pruned taxon–area cladograms were then analysed with TRA to determine if the optimal area cladograms recovered match the original biogeographical signal, or if they represent false, ambiguous or uninformative signals.
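The sequential pruning protocol can be illustrated with a minimal sketch. It is not the authors' implementation: the representation of a taxon–area cladogram as a flat list of (taxon, area) pairs, the function name `sequential_pruning`, and the `area_weights` mapping are all assumptions made for illustration, and tree topology is ignored. One taxon is removed per step, either with equal probability or with probability proportional to a weight assigned to its area.

```python
import random


def sequential_pruning(taxa, area_weights=None, seed=0):
    """Return the successive taxon lists left after removing one taxon per step.

    taxa         : list of (taxon_name, area) pairs (hypothetical representation
                   of the terminals of a taxon-area cladogram).
    area_weights : hypothetical mapping area -> relative probability of sampling
                   failure; None means every taxon is equally likely to be lost.
    """
    rng = random.Random(seed)
    remaining = list(taxa)
    steps = []
    while len(remaining) > 1:
        if area_weights is None:
            weights = [1.0] * len(remaining)
        else:
            weights = [area_weights[area] for _, area in remaining]
        if sum(weights) == 0:
            break  # no remaining taxon can fail under these weights
        # Pick the index of the taxon to prune, weighted by its area.
        victim = rng.choices(range(len(remaining)), weights=weights, k=1)[0]
        remaining.pop(victim)
        steps.append(list(remaining))
    return steps


if __name__ == "__main__":
    example = [("t1", "A"), ("t2", "B"), ("t3", "C"), ("t4", "A")]
    for pruned in sequential_pruning(example, {"A": 3.0, "B": 1.0, "C": 1.0}):
        print(pruned)
```

Each pruned list would then be passed to TRA; that step is outside the scope of this sketch.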

Results  The results indicate a period of consistently accurate recovery of the true biogeographical signal, followed by a nonlinear decrease in signal recovery as more taxa are pruned. At high levels of sampling failure, false biogeographical signals are more likely to be recovered than the true signal. However, randomization testing for statistical significance greatly decreases the chance of accepting false signals. The primary inflection of the signal recovery curve, and its steepness and slope, depend upon taxon–area cladogram size and area redundancy, as well as on the evenness of sampling. Uneven sampling across geographical areas is found to have serious deleterious effects on TRA, with the accuracy of recovery of biogeographical signal varying by an order of magnitude or more across different sampling regimes.

Main conclusions  These simulations reiterate the importance of taxon sampling in biogeographical analysis, and attest to the importance of considering geographical, as well as overall, sampling failure when interpreting the robustness of biogeographical signals. In addition to randomization testing for significance, we suggest the use of randomized sequential taxon deletions and the construction of signal decay curves as a means to assess the robustness of biogeographical signals for empirical data sets.
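A signal decay curve of the kind suggested here might be assembled as in the following sketch. The names `signal_decay_curve` and `toy_criterion` are hypothetical, and the recovery test is a deliberately simple stand-in rather than TRA: in practice the `recovers_signal` callable would wrap a tree reconciliation analysis and compare its optimal area cladogram with the reference signal. Deletion order is random with equal probability per taxon.

```python
import random


def signal_decay_curve(taxa, recovers_signal, n_replicates=100, seed=0):
    """Proportion of replicates recovering the signal after k random deletions.

    taxa            : list of (taxon_name, area) pairs (hypothetical representation).
    recovers_signal : callable taking the retained taxa and returning True or
                      False; a placeholder for a real biogeographical analysis.
    Returns a list whose k-th entry is the recovery rate after k deletions.
    """
    rng = random.Random(seed)
    n = len(taxa)
    hits = [0] * n
    for _ in range(n_replicates):
        order = list(taxa)
        rng.shuffle(order)  # random deletion order for this replicate
        for k in range(n):
            # The first k taxa in the shuffled order are treated as deleted.
            if recovers_signal(order[k:]):
                hits[k] += 1
    return [h / n_replicates for h in hits]


def toy_criterion(retained):
    # Stand-in only, NOT TRA: the "signal" counts as recovered while
    # every one of the areas A, B and C is still represented.
    return {area for _, area in retained} >= {"A", "B", "C"}


if __name__ == "__main__":
    example = [("t1", "A"), ("t2", "A"), ("t3", "B"),
               ("t4", "B"), ("t5", "C"), ("t6", "C")]
    for k, rate in enumerate(signal_decay_curve(example, toy_criterion)):
        print(k, rate)
```

Plotting the recovery rate against the number of deletions gives the decay curve; the position at which it begins to fall indicates how much sampling failure the data set tolerates under the chosen criterion.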