Probability of success in the search for a related bone marrow donor in Cologne, Germany using HLA‐A, ‐B and ‐DRB1 haplotype frequencies

Between 2004 and 2013, 603 patients and their relatives (n = 1297) were typed as part of the search for a suitable HLA‐matched donor in their nuclear and extended families at the central service provider for transfusion medicine at the University Hospital of Cologne. The high success rate in finding donors over the years at our center (38.1%) led us to examine our database retrospectively in order to evaluate the donor search and haplotype frequencies (HFs) in the sample. Our goal was to identify the factors contributing to this high success rate and also to compare the HFs we observed with other reported haplotype frequency estimations (HFE) for the Cologne area. Probability estimations for a successful donor search were constructed based on the HFEs for the sample.


| INTRODUCTION
Within the scope of allogeneic hematopoietic stem cell transplantation (HSCT), the search for suitable donors within the families of patients has many advantages over an unrelated donor search. Usually, family donors share complete haplotypes on a genetic level, which benefits the overall outcome of the bone marrow transplantation. [1][2][3][4] We observed that in most instances, family donors are highly motivated to help their relatives, and the timespan between the occurrence of the disease and the HLA typing is in general much shorter than in the search for an unrelated donor. Average costs are also lower in many cases. 5 When studying a collective of patients and their relatives who were typed for allogeneic HSCT by transfusion medicine at the University Hospital of Cologne (North Rhine-Westphalia, Germany), we observed few discrepancies regarding dubious paternity as well as high success rates over the years in finding HLA-matched donors in the nuclear family. In light of the high social and cultural diversity in Cologne and the fact that 31% of residents have immigrant backgrounds, 6 we sought to determine how this diversity would impact search results. An above-average success rate has been reported in the Middle East, including developing Arabic countries, with success rates reaching 80%. 5,7,8 The gap between reported success rates in the Middle East and the success rate experienced by our center motivated us to examine our database retrospectively. Our goal was to precisely estimate actual success rates in the search for donors within nuclear and extended families as well as to estimate the haplotype frequencies (HFs) for the sample in Cologne. This effort seemed particularly worthwhile because reliable data about the distribution of HFs in the population are important for donor databases in order to ensure that the recruiting of unrelated stem cell donors can be sensibly planned. 9 Our overarching aim was to identify the causes of these high success rates in the family donor search and also to compare the HFs we observed with other reported haplotype frequency estimations (HFE) within the Central European population.
The copyright line for this article was changed on 8 July 2019 after original online publication.

| Properties of the sample
The sample we studied consisted of 1900 individuals, including 603 patients and their relatives (n = 1297) from consecutive family donor searches for patients requiring allogeneic HSCT. The sample was generated from January 2004 to April 2013 by the Transfusion Medicine Unit of the University Hospital of Cologne and analyzed retrospectively. Among the patients, 57.9% (n = 349) were male and 42.1% (n = 254) were female. The average age of the patients was 43.5 years (range: 0-73 years). There were 59 (9.8%) pediatric cases (0-18 years). Table 1 shows recruited donors and their degree of kinship and HLA compatibility to the patient. Within the scope of the analysis, compatibility was categorized as HLA-identical, HLA-haploidentical, 9/10-match, 8/10-match or no compatibility. In 18 of the family donor searches (3%), there were one or more haplotypes between the parents and their children that could not be explained by normal inheritance or a recombination of the alleles. These relatives were thus not representative of a family collective and were excluded from further analysis. For the remaining families with surnames not of German origin, we attempted to determine the ethnic origin of the patients by assessing their surnames. Determinations could not be made in all cases. Another approach was to determine the ethnicity by using ancestry-informative markers. 10 This was not possible in all cases because we analyzed empirical data with ambiguous typing resolution. In light of this and the small sample size compared with other publications, 11-18 these families were kept in the analysis. It can be assumed, however, that depending on the location, the population being studied represents a cross-section of the Central European population or is Caucasian.

| Statistical methods
All patient and donor data were initially entered into an Excel spreadsheet (Microsoft Corp., Redmond, Washington) and underwent statistical analysis with SPSS Statistics 22 (IBM Corp., Armonk, New York). The data entered into the spreadsheet included the HLA phenotypes at the HLA-A, -B, -C, -DRB1 and -DQB1 loci of the patients and their family members; HLA compatibility and degree of kinship to the patients; as well as the age, gender and name initials of the sampled individuals. To derive the corresponding HFs, an expectation-maximization (EM) algorithm (FAMHAP) was used, which iteratively calculates the maximum likelihood (ML) of the phenotypic HLA data and thus reconstructs, where possible, the four haplotypes of a nuclear family based on the data under an assumed Hardy-Weinberg equilibrium (HWE). 19,20 To calculate the HFs, first-degree family donors were consistently included for further investigation. To estimate the haplotypes of the nuclear families, some additional data had to be excluded from the analysis. This included datasets in which the familial relationships had not been clearly described or for which phenotypic HLA data for the loci examined (A-B-DRB1) were missing. The HFE was therefore based on 404 individual nuclear families including 1616 analyzed haplotypes. The HFs that could be determined in this way were compared with the HFs from other donor databases.
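To illustrate the general idea behind EM-based haplotype frequency estimation, the following is a minimal sketch for unrelated individuals with unphased genotypes. It is not the FAMHAP implementation and does not use family information; the function names, the two-locus toy data and the fixed iteration count are assumptions made purely for illustration.

```python
from collections import defaultdict
from itertools import product


def compatible_diplotypes(genotype):
    """List all unordered haplotype pairs consistent with an unphased genotype.

    genotype: tuple of per-locus allele pairs,
    e.g. (("A1", "A2"), ("B7", "B8")).
    """
    pairs = set()
    # Try every way of assigning the two alleles of each locus to the two haplotypes.
    for flips in product((False, True), repeat=len(genotype)):
        h1 = tuple(b if f else a for (a, b), f in zip(genotype, flips))
        h2 = tuple(a if f else b for (a, b), f in zip(genotype, flips))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def em_haplotype_frequencies(genotypes, iterations=50):
    """Estimate haplotype frequencies by EM, assuming Hardy-Weinberg equilibrium."""
    options_per_person = [compatible_diplotypes(g) for g in genotypes]
    haplotypes = sorted({h for opts in options_per_person for pair in opts for h in pair})
    # Start from uniform frequencies over all haplotypes that could occur.
    freq = {h: 1.0 / len(haplotypes) for h in haplotypes}

    for _ in range(iterations):
        counts = defaultdict(float)
        for options in options_per_person:
            # E-step: probability of each phase resolution under HWE
            # (factor 2 for heterozygous haplotype pairs).
            weights = [freq[h1] * freq[h2] * (1 if h1 == h2 else 2) for h1, h2 in options]
            total = sum(weights)
            for (h1, h2), w in zip(options, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: expected haplotype counts divided by the number of chromosomes.
        n_chromosomes = 2 * len(genotypes)
        freq = {h: counts[h] / n_chromosomes for h in haplotypes}
    return freq


# Toy data (hypothetical): two homozygous individuals and one double heterozygote.
genotypes = [
    (("A1", "A1"), ("B8", "B8")),
    (("A2", "A2"), ("B7", "B7")),
    (("A1", "A2"), ("B7", "B8")),
]
estimated = em_haplotype_frequencies(genotypes)
```

In this toy example the homozygous individuals pull the ambiguous third individual toward the A1-B8 / A2-B7 phase. Family data, as used in the study, resolve many such ambiguities directly by segregation.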

| Quantitative and qualitative aspects of the data
In order to check the compatibility in our institute, we initially performed low-resolution typing at the HLA-A, -B and -DRB1 loci in accordance with European Federation for Immunogenetics (EFI) standards. 21 The HLA-A and -B screening was performed on peripheral blood lymphocytes using the complement-dependent cytotoxicity (CDC) crossmatch technique (BAG Histo Tray AB 144, BAG Health Care GmbH, Lich, Germany). The HLA-DRB1 typing and in some cases (old blood samples, unclear serological typing results) the HLA-A and -B typing were performed using the sequence-specific-primers (SSP) method (Olerup SSP HLA-A, -B, -DRB1 low resolution kit, Olerup SSP AB, Saltsjöbaden, Sweden). HLA low-resolution typing was performed by polymerase chain reaction SSP (PCR-SSP) using the ABDR and DR-DQ Typing Tray (Olerup SSP AB). The high-resolution testing was done with sequence-based typing (SBT) (Celera Co., Alameda, California) and with SSP trays (Olerup SSP AB). For the DNA extraction from peripheral EDTA blood samples, the QIAamp DNA Blood Mini Kit (Qiagen GmbH, Hilden, Germany) was used. The confirmatory testing was provided by low-resolution typing for the five loci HLA-A, -B, -C, -DRB1 and -DQB1 if the haplotypes were ascertained by descent. HLA-A, -B, -C, -DRB1 and -DQB1 high-resolution typing was routinely performed when the identity could not be established by segregation.
Table 1 note: The table shows the familial relations between the patients in the sample and their HLA compatibility, both in the nuclear and extended family.
Due to the ambiguity of the phenotypic HLA data in terms of a heterogeneous typing resolution and the varying completeness of the phenotypic dataset available for the sample studied (HLA-A: 99.67%, -B: 99.67%, -C: 53.16%, -DRB1: 98.21% and -DQB1: 54.51%), we encountered several problems associated with the use of the EM algorithm. Several other authors have encountered similar problems. [11][12][13][14][15] These could be dealt with by tracing the HLA data back to a uniform serological or low-resolution HLA nomenclature based on Version 3.2.0 of the IPD-IMGT/HLA allele list. 22 Serologically defined split antigens were traced back to their corresponding broad antigens. Furthermore, all DRB1 alleles were translated to a low-resolution molecular genetic notation (two-digit) in order to analyze the HFs. In light of the high number of absent antigens and alleles at the -C and -DQB1 HLA gene loci, only the HLA-A, -B and -DRB1 loci were considered for the HFE.
From the existing data from the nuclear families, 658 haplotypes could be derived. The 239 haplotypes with a frequency of more than 1 in 1000 are listed in lexicographic order and have a cumulative frequency of 71.92% (Table 2). The 20 most common HFs in our sample are detailed in Table 3.

| DISCUSSION
What differentiates this study from most other HF estimates published to date is the fact that we only examined nuclear families to determine the HFs. With a frequency of 6.18%, the haplotype A1-B8-DR03 is the most common in our cohort. This corresponds to the results from Schmidt et al 13 (5.83%) and Eberhard et al 12 (5.97%) for the German population as well as to descriptions of the distribution of HFs in the Caucasian population 14 (6.25%) within the National Marrow Donor Program (NMDP), in all of which it is the most common haplotype overall. It is also the most common haplotype in the publication from Gourraud et al 23 of the French Bone Marrow Donor Registry. Other common haplotypes in our sample are A3-B35-DR01 (2.29%), A2-B7-DR15 (2.27%), A3-B7-DR15 (1.69%) and A2-B44-DR04 (1.68%), which are also described as common in HFE reports for the Central European population. 24 The individual position shifts in the frequencies, for example compared with the reference data from Eberhard et al, 12 can be explained by their larger sample size in relation to our cohort.
The empirical hit rate in the search for HLA-identical donors among family members in our sample can be explained primarily by the frequent typing of siblings (with a ratio of 1.8 per patient in the overall sample in Cologne) and can be derived as follows. Because the relevant alleles in the HLA system are closely linked on the short arm of chromosome 6, 15 the relevant haplotypes are generally inherited as a genetic unit. Theoretically, the likelihood that a given sibling is HLA-identical is 25%. 7 To describe this hit rate more precisely, the rate at which a parent of the patient is homozygous for a haplotype, or at which both parents share a haplotype with each other, must also be taken into account. This could hypothetically be caused by accumulation of certain haplotypes in the gene pool of a population and a person inheriting the same haplotype twice by chance. In the sample, this is quantified by the rate of homozygosity, m = 0.0079. A less common reason for the phenotypes of two siblings matching completely is chance matching of the phenotypes, even though different haplotypes were inherited from both parents. This probability corresponds to the chance that two unrelated persons have the same HLA phenotype and, based on the HFs estimated by us, amounts to k = 0.000118. A formula for the probability F of finding an HLA-identical sibling can be derived as follows:
$$F = 0.25 + 0.5\,m + 0.25\,k$$
The probability F that two siblings are HLA-identical with one another is therefore 0.25 from the inheritance, plus 0.5 multiplied by the homozygosity rate m (the rate at which one parent carries the same haplotype twice, that is, he or she is homozygous), plus 0.25 multiplied by the rate k at which the phenotypes of the siblings match by chance. This produces a probability of 25.4% for one sibling. The probability of finding at least one HLA-identical sibling therefore increases with each additional sibling according to the binomial distribution, that is, to 1 - (1 - F)^n for n siblings.
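The arithmetic described in this paragraph can be checked with a short sketch. The values of m and k are taken from the text; the function names are hypothetical, and the step from one sibling to n siblings assumes that siblings match independently of one another, which is our reading of "according to the binomial distribution".

```python
M_HOMOZYGOSITY = 0.0079    # homozygosity rate m reported for the sample
K_CHANCE_MATCH = 0.000118  # chance phenotype match k between unrelated persons


def sibling_match_probability(m=M_HOMOZYGOSITY, k=K_CHANCE_MATCH):
    """Probability F that one sibling is HLA-identical to the patient."""
    # 0.25: both haplotypes shared by descent
    # 0.5 * m: one haplotype shared, the other matches through homozygosity
    # 0.25 * k: no haplotype shared, phenotypes match by chance
    return 0.25 + 0.5 * m + 0.25 * k


def at_least_one_match(n_siblings, f):
    """Probability that at least one of n siblings is HLA-identical.

    Assumes independent matching between siblings (an assumption of this sketch).
    """
    return 1.0 - (1.0 - f) ** n_siblings


F = sibling_match_probability()
print(f"one sibling: {F:.1%}")
for n in range(2, 5):
    print(f"{n} siblings: {at_least_one_match(n, F):.1%}")
```

For a single sibling this reproduces the 25.4% stated in the text.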
The number of siblings in the sample varied from 1 to 7 siblings per patient (1 sibling: n = 307; 2: n = 165; 3: n = 59; 4: n = 31; 5: n = 11; 6: n = 8; 7: n = 3). If the arithmetic mean is calculated taking the binomial distribution into account, according to the distribution of the number n of siblings per patient, this produces a theoretical hit rate F of 38.5%. If we now analyze the rate of successful stem cell donor searches in patients with at least one sibling (n = 584) in the family structure, we find an empirical hit rate of 39%, which closely approximates the theoretical expectation, taking into account statistical variations and possible recombinations and mutations of the HLA alleles. This formula can also be applied to extended family constellations (Table 4). For this purpose, the probability that two related persons with a degree of relatedness V have 0, 1 or 2 haplotypes in common due to inheritance is defined as P(I = 0|V), P(I = 1|V) and P(I = 2|V). As shown above, the following applies for siblings (V = G): P(I = 2|G) = 0.25, P(I = 1|G) = 0.5 and P(I = 0|G) = 0.25. The chance of a complete match between two random relatives is therefore calculated as:
$$F(V) = P(I = 2 \mid V) + m \cdot P(I = 1 \mid V) + k \cdot P(I = 0 \mid V)$$
This formula shows that the probability of finding an HLA-identical donor in the extended family donor search is extremely low. For cousins, the theoretical hit rate is 0.2%, and for aunts and uncles it is 0.4%. Between parents and their children, the chance of finding an HLA-identical donor is 0.8%, according to the formula and the HFs in Cologne. Whether an extended family donor search is worthwhile in light of this must be critically weighed. In our sample, no extended family donor search yielded an HLA-identical donor. Schipper et al 25 have also described a formula for calculating the likelihood of finding a suitable donor in the extended family.
They developed a software program (EXTFAM) to calculate the probability based on HFs, which is similar to our approach. The formula presented here is different, however, in that we draw on the total number of all HFs in order to calculate probabilities for discovering an HLA-matched donor. Furthermore, using this formula we can simultaneously calculate the probability of finding a suitable donor in the nuclear family. Accordingly, the approach introduced in our study is more general, offering a broader overview of the likelihood of an HLA match within the nuclear and extended family, provided the HFs in the sample are known.
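The cohort-level hit rate and the extended-family probabilities reported above can be reproduced with the following sketch. It assumes the general form F(V) = P(I = 2|V) + m · P(I = 1|V) + k · P(I = 0|V), which is our reading of the description in the text. The sibling counts, m and k are taken from the text; the haplotype-sharing probabilities for parents, aunts/uncles and cousins are the standard Mendelian values for non-inbred families and are an assumption here, as are all names in the code.

```python
M = 0.0079    # homozygosity rate m from the text
K = 0.000118  # chance phenotype match k from the text

# (P(I=2|V), P(I=1|V), P(I=0|V)) per degree of relatedness V.
# The sibling values are given in the text; the others are standard
# Mendelian sharing probabilities (assumed, recombination ignored).
SHARING = {
    "sibling": (0.25, 0.5, 0.25),
    "parent_child": (0.0, 1.0, 0.0),
    "aunt_uncle": (0.0, 0.5, 0.5),
    "cousin": (0.0, 0.25, 0.75),
}

# Number of patients with n siblings, as listed in the text.
SIBLING_COUNTS = {1: 307, 2: 165, 3: 59, 4: 31, 5: 11, 6: 8, 7: 3}


def match_probability(relation, m=M, k=K):
    """Probability that one relative of the given type is HLA-identical."""
    p2, p1, p0 = SHARING[relation]
    return p2 + m * p1 + k * p0


def cohort_hit_rate(sibling_counts, f):
    """Mean probability of at least one HLA-identical sibling per patient.

    Assumes siblings match independently of one another.
    """
    patients = sum(sibling_counts.values())
    expected_hits = sum(
        count * (1.0 - (1.0 - f) ** n) for n, count in sibling_counts.items()
    )
    return expected_hits / patients


f_sibling = match_probability("sibling")
theoretical_rate = cohort_hit_rate(SIBLING_COUNTS, f_sibling)
print(f"theoretical sibling hit rate: {theoretical_rate:.1%}")
for relation in ("parent_child", "aunt_uncle", "cousin"):
    print(f"{relation}: {match_probability(relation):.1%}")
```

Under these assumptions the sketch yields roughly 38.5% for the sibling cohort and about 0.8%, 0.4% and 0.2% for parents, aunts/uncles and cousins, in line with the figures quoted in the text.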
In sum, the results of our examination of data on HLA-matched donor searches in Cologne show that higher success rates in our cohorts are strongly correlated with an increasing number of siblings, in accordance with the binomial distribution of HLA frequencies and the computational approach used in this study. Jawdat et al also come to the conclusion that it is not the age or sex of a patient, nor the quantity of consanguineous marriages in the population, but rather the number of siblings who are possible donors that plays a significant role in higher success rates. 7 Furthermore, we could show that our HFEs largely correspond to the results obtained in other publications for the Caucasian population. This provides insight into the distribution of HFs in Cologne while also helping to optimize the quality of searches for related and unrelated bone marrow donors in the region.