The genetic basis of many human diseases, especially those with substantial genetic determinants, has been identified. Notable examples include cystic fibrosis, Huntington's disease and some forms of cancer. However, detecting genetic factors with more modest effects, as in bipolar disorder and the majority of cancers, has proved more difficult. Standard linkage analysis procedures may not only have little power to detect such genes but also, at best, narrow the location of the disease susceptibility gene only to a rather large region. Association studies are therefore necessary to clarify the aetiological relevance of these factors to disease. However, the number of tests that association procedures would require in extended genome-wide screens is prohibitive, and association studies have consequently seen limited application except in the investigation of candidate genes. In this paper, we discuss a logistic regression approach that generalizes the association procedure so that it can accommodate clusters of linked markers or candidate genes. Furthermore, we introduce an expectation-maximization (E–M) algorithm for estimating haplotype frequencies in multiple-locus systems with incomplete information on phase.
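To make the phase problem concrete, the following is a minimal sketch of an E–M iteration for the simplest case: two biallelic loci with unphased genotypes. It is an illustration only, not the implementation described in this paper. It assumes Hardy-Weinberg equilibrium, complete genotype data, and a uniform starting point; the function name `em_haplotype_frequencies` and the genotype coding (the count of allele 1 at each locus) are hypothetical choices made for this example. With two loci, only the double heterozygote is ambiguous: it may carry haplotypes 00/11 or 01/10.

```python
from collections import Counter

# The four possible haplotypes for two biallelic loci (alleles coded 0 and 1).
HAPLOTYPES = [(0, 0), (0, 1), (1, 0), (1, 1)]


def _alleles(g):
    """Map an allele-1 count (0, 1 or 2) at one locus to its pair of alleles."""
    return {0: (0, 0), 1: (0, 1), 2: (1, 1)}[g]


def em_haplotype_frequencies(genotypes, n_iter=200, tol=1e-10):
    """Estimate two-locus haplotype frequencies from unphased genotypes.

    genotypes: list of (g1, g2), where g1 and g2 are the counts of
    allele 1 at locus 1 and locus 2 (illustrative coding, assumed here).
    Returns a dict mapping each haplotype to its estimated frequency.
    """
    counts = Counter(genotypes)
    n_hap = 2 * len(genotypes)  # each individual carries two haplotypes
    freqs = {h: 0.25 for h in HAPLOTYPES}  # uniform start (an assumption)

    for _ in range(n_iter):
        expected = {h: 0.0 for h in HAPLOTYPES}
        for (g1, g2), n in counts.items():
            if g1 == 1 and g2 == 1:
                # E-step: the double heterozygote is either 00/11 or 01/10.
                # Weight each phase by its probability under current freqs.
                cis = freqs[(0, 0)] * freqs[(1, 1)]
                trans = freqs[(0, 1)] * freqs[(1, 0)]
                total = cis + trans
                w = cis / total if total > 0 else 0.5
                expected[(0, 0)] += n * w
                expected[(1, 1)] += n * w
                expected[(0, 1)] += n * (1 - w)
                expected[(1, 0)] += n * (1 - w)
            else:
                # At least one locus is homozygous, so phase is known.
                a1, a2 = _alleles(g1), _alleles(g2)
                expected[(a1[0], a2[0])] += n
                expected[(a1[1], a2[1])] += n

        # M-step: new frequencies are the expected haplotype proportions.
        new_freqs = {h: expected[h] / n_hap for h in HAPLOTYPES}
        delta = max(abs(new_freqs[h] - freqs[h]) for h in HAPLOTYPES)
        freqs = new_freqs
        if delta < tol:
            break
    return freqs


if __name__ == "__main__":
    sample = [(0, 0), (2, 2), (1, 1)]
    for hap, f in em_haplotype_frequencies(sample).items():
        print(hap, round(f, 4))
```

In the small sample above, the two phase-known individuals carry only haplotypes 00 and 11, so the iteration resolves the double heterozygote almost entirely to the 00/11 phase. For more than two loci the number of compatible haplotype pairs per genotype grows quickly, and the E-step must enumerate them rather than treat a single ambiguous class.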