Accuracy, efficiency and robustness of four algorithms allowing full sibship reconstruction from DNA marker data

Authors

  • K. Butler,

    1. Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, B3H 3J5, Canada
  • C. Field,

    1. Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, B3H 3J5, Canada
  • C. M. Herbinger,

    Corresponding author
    1. Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, B3H 3J5, Canada
      Christophe Herbinger, Department of Biology, Dalhousie University, Halifax, Nova Scotia B3H 4J1, Canada. Fax: 1902 4941397; E-mail: Christophe.herbinger@dal.ca
  • B. R. Smith

    1. Department of Mathematics and Statistics, Dalhousie University, Halifax, Nova Scotia, B3H 3J5, Canada
    * Authors in alphabetic order.



Abstract

In the problem of reconstructing full sib pedigrees from DNA marker data, three existing algorithms and one new algorithm are compared in terms of accuracy, efficiency and robustness using real and simulated data sets. An algorithm based on the exclusion principle and another based on a maximization of the Simpson index were very accurate at reconstructing data sets comprising a few large families but had problems with data sets with limited family structure, while a Markov Chain Monte Carlo (MCMC) algorithm based on the maximization of a partition score had the opposite behaviour. An MCMC algorithm based on maximizing the full joint likelihood performed best in small data sets comprising several medium-sized families but did not work well under most other conditions. It appears that the likelihood surface may be rough and presents challenges for the MCMC algorithm to find the global maximum. This likelihood algorithm also exhibited problems in reconstructing large family groups, due possibly to limits in computational precision. The accuracy of each algorithm improved with an increasing amount of information in the data set, and was very high with eight loci with eight alleles each. All four algorithms were quite robust to deviation from an idealized uniform allelic distribution, to departures from idealized Mendelian inheritance in simulated data sets and to the presence of null alleles. In contrast, none of the algorithms were very robust to the probable presence of error/mutation in the data. Depending upon the type of mutation or errors and the algorithm used, between 70 and 98% of the affected individuals were classified improperly on average.
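As a rough illustration of two ideas named in the abstract, the sketch below computes a Simpson index for a candidate partition of individuals into families and checks whether a group of diploid genotypes is compatible with a single pair of parents (the exclusion principle). It is not the authors' implementation. The function names, the genotype representation (a tuple of two alleles per locus) and the use of the standard without-replacement form of the Simpson index are assumptions made for this example, and genotyping errors, mutations and null alleles are ignored.

```python
from collections import Counter
from itertools import combinations_with_replacement, product


def simpson_index(labels):
    """Probability that two distinct individuals drawn at random
    carry the same family label (assumed form of the index)."""
    n = len(labels)
    if n < 2:
        return 0.0
    counts = Counter(labels).values()
    return sum(c * (c - 1) for c in counts) / (n * (n - 1))


def locus_compatible(genotypes):
    """True if some pair of parental genotypes could produce every
    offspring genotype at one locus under Mendelian inheritance.

    Only alleles observed in the offspring are tried for the parents;
    an unobserved parental allele never helps explain an offspring.
    """
    alleles = sorted({a for g in genotypes for a in g})
    parents = list(combinations_with_replacement(alleles, 2))
    for mother, father in product(parents, repeat=2):
        if all(
            any(sorted((m, f)) == sorted(g) for m in mother for f in father)
            for g in genotypes
        ):
            return True
    return False


def full_sib_compatible(individuals):
    """individuals: list of multilocus genotypes, each a list of
    (allele, allele) tuples, one tuple per locus."""
    n_loci = len(individuals[0])
    return all(
        locus_compatible([ind[k] for ind in individuals])
        for k in range(n_loci)
    )
```

A partition into a few large families scores a higher Simpson index than one made of many small groups, which is consistent with the abstract's remark that the Simpson-based approach suits data sets with a few large families. The brute-force parent search is only practical for small allele counts per locus.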
