Estimating unbiased haplotype frequencies from stem cell donor samples typed at heterogeneous resolutions: a practical study based on over 1 million German donors


The human leukocyte antigen (HLA) distribution in donor registry data is typically nonrandom because, mostly for economic reasons, typing of additional loci or resolution of ambiguities is performed selectively, based on the previously known HLA type. Analyzing a sample of over 1 million German stem cell donors, we show in practice the extent of the bias that arises when the input data for HLA haplotype frequency (HF) estimation are restricted to subsets selected for their higher HLA typing resolution and, conversely, the correctness of estimates based on unselected data with a methodology suitable for heterogeneous resolution. We discuss algorithmic aspects of this approach and, owing in part to the sample size, provide some new insights into the distribution of HLA-DRB1 alleles in the German population and into the application of HFs in unrelated donor search.