To study how the distribution of human leucocyte antigen (HLA) polymorphisms affects the identification of a matched unrelated haematopoietic stem cell donor (UD), we performed a multicentre retrospective analysis comparing the HLA-A, HLA-B, HLA-C, HLA-DRB1 and HLA-DQB1 phenotypes of 2126 patients: 772 patients for whom a donor search failed to identify a matched UD, and 1354 patients who received a 10/10 allele-level matched UD. Our results showed that rare HLA-C types are often responsible for the difficulty in identifying a donor. This locus can add complexity to a supposedly ‘frequent’ HLA-A, HLA-B and HLA-DRB1 phenotype, turning it into a less frequent one. For example, 32.5% of the phenotypes of non-transplanted patients could not be explained by any pair of known HLA-A, HLA-B, HLA-C and HLA-DRB1 haplotypes, whereas this percentage dropped to less than 2% when only HLA-A, HLA-B and HLA-DRB1 haplotypes were considered. Such situations can be anticipated by computing an index based on HLA haplotype frequencies, the average registry sample size (ARS). ARS is defined as the inverse of the phenotype frequency, computed from the frequencies of all haplotype pairs compatible with that phenotype. ARS confirmed that the most significant difference between transplanted and non-transplanted patients appeared when the HLA-C locus was introduced into the analysis (median: 8.3 × 10⁴ vs 3.1 × 10⁶, P < 0.0001). The higher the ARS, the lower the likelihood of finding a 10/10 matched UD, reflecting the rarity of the patient's HLA phenotype. The area under the receiver operating characteristic curve (AUROC) of the ARS computed for HLA-A, HLA-B and HLA-DRB1 at low resolution (two-digit level) was 0.82 (0.80; 0.84). Overall, our study supports the use of haplotype frequency-based computations to develop computer-assisted donor search.
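The ARS definition can be illustrated with a minimal Python sketch. It assumes that the phenotype frequency is obtained from haplotype frequencies under Hardy-Weinberg equilibrium (2·f1·f2 for a pair of distinct haplotypes, f² for a homozygous pair), summed over every phase resolution of the unphased genotype, and that haplotypes absent from the table have frequency zero. The haplotype table, its frequencies and the function names `phase_resolutions`, `phenotype_frequency` and `ars` are hypothetical illustrations, not the study's data or software.

```python
from itertools import product
from math import inf

# Hypothetical haplotype frequency table (illustrative numbers, not registry data).
# Keys are haplotypes ordered as (HLA-A, HLA-B, HLA-DRB1) at two-digit resolution.
HAPLO_FREQ = {
    ("A*01", "B*08", "DRB1*03"): 0.06,
    ("A*02", "B*44", "DRB1*04"): 0.03,
    ("A*02", "B*08", "DRB1*03"): 0.01,
    ("A*01", "B*44", "DRB1*04"): 0.005,
}


def phase_resolutions(genotype):
    """Yield each unordered haplotype pair compatible with an unphased genotype.

    genotype: list of (allele1, allele2) tuples, one per locus, in haplotype order.
    """
    seen = set()
    # Try every assignment of the two alleles at each locus to haplotype 1 or 2.
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = tuple(pair[c] for pair, c in zip(genotype, choice))
        h2 = tuple(pair[1 - c] for pair, c in zip(genotype, choice))
        key = tuple(sorted((h1, h2)))  # unordered pair: (h1, h2) == (h2, h1)
        if key not in seen:
            seen.add(key)
            yield key


def phenotype_frequency(genotype, haplo_freq):
    """Sum the Hardy-Weinberg frequencies of all compatible haplotype pairs."""
    total = 0.0
    for h1, h2 in phase_resolutions(genotype):
        f1 = haplo_freq.get(h1, 0.0)  # unknown haplotype -> frequency 0 (assumption)
        f2 = haplo_freq.get(h2, 0.0)
        total += f1 * f1 if h1 == h2 else 2.0 * f1 * f2
    return total


def ars(genotype, haplo_freq):
    """Average registry sample size: the inverse of the phenotype frequency."""
    p = phenotype_frequency(genotype, haplo_freq)
    # A phenotype explained by no known haplotype pair gets an infinite ARS.
    return inf if p == 0.0 else 1.0 / p


patient = [("A*01", "A*02"), ("B*08", "B*44"), ("DRB1*03", "DRB1*04")]
print(phenotype_frequency(patient, HAPLO_FREQ), ars(patient, HAPLO_FREQ))
```

For this heterozygous patient, two of the four phase resolutions are found in the table (2 × 0.06 × 0.03 + 2 × 0.01 × 0.005 = 0.0037), giving an ARS of roughly 270. Adding a locus such as HLA-C lengthens every haplotype key, so fewer pairs match known haplotypes and the ARS grows; a phenotype that no known pair explains yields an infinite ARS in this sketch.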