Haplotyping and estimation of haplotype frequencies for closely linked biallelic multilocus genetic phenotypes including nuclear family information
Article first published online: 2 APR 2001
Copyright © 2001 Wiley-Liss, Inc.
Special Issue: SNP 2000: Third International Meeting on Single Nucleotide Polymorphism and Complex Genome Analysis
Volume 17, Issue 4, pages 289–295, April 2001
How to Cite
Rohde, K. and Fuerst, R. (2001), Haplotyping and estimation of haplotype frequencies for closely linked biallelic multilocus genetic phenotypes including nuclear family information. Hum. Mutat., 17: 289–295. doi: 10.1002/humu.26
- Issue published online: 2 APR 2001
- Manuscript Accepted: 11 JAN 2001
- Manuscript Received: 16 OCT 2000
Keywords
- haplotype frequency estimation
- expectation maximization algorithm
- nuclear family information
With the discovery of single nucleotide polymorphisms (SNPs) along the genome, genotyping large samples of biallelic multilocus genetic phenotypes for (fine) mapping of disease genes or for population studies has become standard practice. A genetic trait, however, is mainly caused by an underlying defective haplotype, and populations are best characterized by their haplotype frequencies. To find disease-predisposing genes with an association or haplotype-sharing algorithm, it is therefore essential to infer, from the phase-unknown genetic phenotypes in a sample drawn from a population, both the haplotype frequencies in the population and the underlying haplotype pairs in the sample. We estimate haplotype frequencies and haplotype pairs by maximum likelihood using a well-known expectation maximization (EM) algorithm, which we adapt to a large number (up to 30) of biallelic loci (SNPs) and extend to include nuclear family information where available. Parents are treated as an independent sample from the population. Their genotyped offspring reduce the number of potential haplotype pairs for both parents, which increases the accuracy of the estimation and may also reduce computation time. In a series of simulations, our approach of including nuclear family information was tested against both the EM algorithm without nuclear family information and an alternative approach that uses GENEHUNTER to haplotype the families, using the locus-by-locus allele counts of the sample. Our new approach haplotypes more precisely when the number of heterozygous loci is high, whereas for a moderate number of heterozygous positions in the sample all three approaches gave the same perfect results. Hum Mutat 17:289–295, 2001. © 2001 Wiley-Liss, Inc.
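As an illustration of the kind of EM estimation the abstract refers to, the following is a minimal sketch for unrelated individuals only. It is not the authors' implementation: the function names (`compatible_pairs`, `em_haplotype_frequencies`), the genotype coding (0, 1, or 2 copies of allele 1 per locus), the uniform starting frequencies, and the Hardy-Weinberg assumption are all choices made here for illustration. It enumerates every compatible haplotype pair, which grows exponentially with the number of heterozygous loci, and it does not use nuclear family information.

```python
from collections import defaultdict
from itertools import product


def compatible_pairs(genotype):
    """List the unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple with one entry per biallelic locus, giving the number of
    copies of allele 1 (0, 1 or 2). This coding is an assumption of the sketch.
    """
    het = [i for i, g in enumerate(genotype) if g == 1]
    base = [g // 2 for g in genotype]  # homozygous loci are fixed: 0 -> 0, 2 -> 1
    if not het:
        h = tuple(base)
        return [(h, h)]
    pairs = []
    # Fix the allele at the first heterozygous locus so each pair is listed once.
    for bits in product((0, 1), repeat=len(het) - 1):
        h1, h2 = base[:], base[:]
        for locus, b in zip(het, (0,) + bits):
            h1[locus], h2[locus] = b, 1 - b
        pairs.append((tuple(h1), tuple(h2)))
    return pairs


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg proportions."""
    candidates = [compatible_pairs(g) for g in genotypes]
    haplotypes = {h for pairs in candidates for pair in pairs for h in pair}
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}  # uniform start
    total = 2 * len(genotypes)  # two haplotypes per individual
    for _ in range(n_iter):
        counts = defaultdict(float)
        for pairs in candidates:
            # E-step: posterior probability of each compatible pair.
            weights = [(1 if h1 == h2 else 2) * freq[h1] * freq[h2]
                       for h1, h2 in pairs]
            norm = sum(weights)
            for (h1, h2), w in zip(pairs, weights):
                counts[h1] += w / norm
                counts[h2] += w / norm
        # M-step: expected haplotype counts become the new frequencies.
        new_freq = {h: counts[h] / total for h in haplotypes}
        delta = max(abs(new_freq[h] - freq[h]) for h in haplotypes)
        freq = new_freq
        if delta < tol:
            break
    return freq


# Hypothetical two-locus sample: two double homozygotes and one double heterozygote.
sample = [(0, 0), (2, 2), (1, 1)]
for hap, f in sorted(em_haplotype_frequencies(sample).items()):
    print(hap, round(f, 4))
```

In this toy sample the double heterozygote is ambiguous between the pairs 00/11 and 01/10, and the estimate concentrates on the first because the homozygous individuals support haplotypes 00 and 11. Family information, as described in the abstract, would enter by discarding parental haplotype pairs that cannot produce the observed offspring genotypes before the EM iterations begin; that filtering step is not shown here.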