Automated binning of microsatellite alleles: problems and solutions

Authors


  • Present address: Angela Frodsham, Cambridge Genetics Knowledge Park, Public Health Genetics Unit, Strangeways Research Laboratory, Worts Causeway, Cambridge CB1 8RN, UK

Correspondence: William Amos, Fax: +44 1223 336676; E-mail: w.amos@zoo.cam.ac.uk.

Abstract

As genotyping methods move ever closer to full automation, care must be taken to ensure that there is no equivalent rise in allele-calling error rates. One clear source of error lies with how raw allele lengths are converted into allele classes, a process referred to as binning. Standard automated approaches usually assume collinearity between expected and measured fragment length. Unfortunately, such collinearity is often only approximate, with the consequence that alleles do not conform to a perfect 2-, 3- or 4-base-pair periodicity. To account for these problems, we introduce a method that allows repeat units to be fractionally shorter or longer than their theoretical value. Tested on a large human data set, our algorithm performs well over a wide range of dinucleotide repeat loci. The size of the problem caused by sticking to whole numbers of bases is indicated by the fact that the effective repeat length was within 5% of the assumed length only 68.3% of the time.
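The idea of letting the repeat unit be fractionally shorter or longer than its theoretical value can be illustrated with a minimal sketch. This is not the authors' algorithm, whose details are not given in the abstract; it is a simple grid search written under stated assumptions. The function name `bin_alleles`, its parameters (`nominal_repeat`, `tolerance`, `steps`, `offset_steps`), the least-squares cost and the synthetic fragment lengths are all hypothetical and chosen only for illustration.

```python
def bin_alleles(lengths, nominal_repeat=2.0, tolerance=0.10,
                steps=41, offset_steps=50):
    """Illustrative grid search (an assumption, not the published method).

    Tries effective repeat lengths within +/- tolerance of the nominal
    value, and offsets within one repeat, keeping the pair that minimises
    the summed squared distance of each length from its nearest bin centre.
    """
    best = None
    for i in range(steps):
        # Candidate effective repeat length, e.g. 1.80 to 2.20 bp for a dinucleotide
        repeat = nominal_repeat * (1 - tolerance + 2 * tolerance * i / (steps - 1))
        for j in range(offset_steps):
            offset = repeat * j / offset_steps
            cost = 0.0
            for x in lengths:
                k = round((x - offset) / repeat)       # nearest bin index
                cost += (x - (offset + k * repeat)) ** 2
            if best is None or cost < best[0]:
                best = (cost, repeat, offset)
    _, repeat, offset = best
    bins = [round((x - offset) / repeat) for x in lengths]
    return repeat, offset, bins


# Synthetic measured lengths (made up): true spacing is 1.9 bp, not 2.0 bp
true_repeats = [0, 1, 2, 5, 10, 15]
lengths = [100.3 + 1.9 * k for k in true_repeats]

repeat, offset, bins = bin_alleles(lengths)
relative_bins = [b - bins[0] for b in bins]

# Naive binning that insists on exactly 2.0 bp per repeat, for comparison
naive_bins = [round((x - lengths[0]) / 2.0) for x in lengths]

print(f"fitted repeat length: {repeat:.2f} bp")
print("fitted bins:", relative_bins)
print("naive bins: ", naive_bins)
```

In this made-up data set the longest allele lies 15 repeats above the shortest, yet its measured length is only 28.5 bp greater, so a fixed 2-bp periodicity places it in the wrong class, whereas a fitted fractional repeat length keeps the spacing consistent across the size range.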
