Sequence-based typing was used to identify human leukocyte antigen (HLA)-A, -B, -C, and -DRB1 alleles from 564 consecutively recruited African American volunteers for an unrelated hematopoietic stem cell registry. The number of known alleles identified at each locus was 42 for HLA-A, 67 for HLA-B, 33 for HLA-C, and 44 for HLA-DRB1. Six novel alleles (A*260104, A*7411, Cw*0813, Cw*1608, Cw*1704, and DRB1*130502) not observed in the initial sequence-specific oligonucleotide probe testing were characterized. The action of balancing selection, shaping more ‘even’ than expected allele frequency distributions, was inferred for all four loci and significantly so for the HLA-A and DRB1 loci. Two-, three-, and four-locus haplotypes were estimated using the expectation maximization algorithm. Comparisons with other populations from Africa and Europe suggest that the degree of European admixture in the African American population described here is lower than that in other African American populations previously reported, although HLA-A:B haplotype frequencies similar to those in previous studies of African American individuals were also noted.
Many African American individuals trace their ancestry to African slaves imported into the Americas between the 1600s and the 1800s. These slaves originated from the west coast of Africa, in a region extending from Senegal south to Angola (1). In the Census 2000, 12.9% of the US population, 36.4 million individuals, identified themselves as African Americans (2). Genetic marker studies of admixture in this population have estimated 15–20% European ancestry, although levels of European admixture vary among regions of the Americas and of the United States in particular (e.g. 7% in Afro-Caribbeans from Jamaica vs 12.6% in Charleston, South Carolina, vs 22.5% in New Orleans, Louisiana) (3–5). The degree of Native American admixture in the African American populations is controversial but may be as low as 1%–3%. From a human leukocyte antigen (HLA) standpoint, African American populations carry a more diverse set of alleles and haplotypes than other US populations (6–8).
HLA molecules bind both self- and foreign peptides found within the endoplasmic reticulum or the endocytic pathway of cells and transport them to the cell surface for potential recognition by T-lymphocyte antigen receptors. The characteristics of the binding grooves of individual HLA molecules control the repertoire of bound peptides and affect the immune response profiles of individuals (9). The multiple HLA genes carried within the human major histocompatibility complex are highly polymorphic, with over 700 alleles identified at the HLA-B locus alone (10). The HLA alleles are observed at different frequencies in various human populations and have been used to measure population diversity and make inferences about population history (11–14). It is believed that this variability is the result of selective pressure for immune response diversity in human populations (15). This study uses DNA sequencing to unambiguously identify HLA alleles carried by an African American population and algorithms developed in the International Histocompatibility Workshops to analyze diversity and predict haplotypes.
Materials and methods
The study population included 564 African American individuals consecutively recruited as volunteer donors for a bone marrow registry from January 2004 through March 2004. Because of the recruitment setting, individuals are unlikely to be related and are likely to originate from different areas of the United States. All are self-identified as African Americans.
Identification of known HLA alleles
Genomic DNA was prepared using the QIAamp 96 DNA blood kit (Qiagen, Valencia, CA). Each individual was initially typed at intermediate resolution for HLA-A, -B, -C, and -DRB1 by sequence-specific probe-based hybridization using the One Lambda LABType® SSO Kit (One Lambda, Canoga Park, CA) following the manufacturer’s protocols. To identify the HLA-A, -B, and -C alleles carried by each individual, polymerase chain reaction (PCR) primers (Table 1) were used to amplify each locus as previously described (16–20). Applied Biosystems Big Dye terminator chemistry and sequencing primers listed in Table 1 were used to obtain the sequences of both strands of exons 2 and 3. DRB1 alleles were amplified and sequenced using the HLA-DRB High Resolution Typing System (Applied Biosystems, Foster City, CA). This kit includes primers to amplify specific DRB1 allele families; additional in-house PCR and sequencing primers were added when needed to obtain resolution. Reaction products were identified with Applied Biosystems Models 3700 or 3730xl DNA analyzers (PE Applied Biosystems, Foster City, CA) and sequence interpretation was done with Assign software (Conexio Genomics, Applecross, Western Australia). Alleles identical in exons 2 and 3 (class I) or exon 2 (DRB1) (10) were not resolved except for B*180101 vs B*1817N alleles (16). For those class I samples yielding alternative allele combinations (10), either allele-specific sequencing primers or allele-specific PCR amplification was used to link polymorphisms and to identify the specific allele combination (10). (In-house primer sequences used for all loci are available at www.dodmarrow.org.)
Table 1. DNA amplification and sequencing reagents used for HLA-A, -B and -C identification
|HLA-A||5A2: CCC AGA CGC CGA GGA TGG CCG||3A2: GCA GGG CGG AAC CTC AGA GTC ACT CTC T|
|HLA-B||5B3: GGG TCC CAG TTC TAA AGT CCC CAC G||3B1: CCA TCC CCG GCG ACC TAT AGG AGA TG|
|5B1: GCA CCC ACC CGG ACT CAG AAT CTC CT||3B1-AC: AGG CCA TCC CGG GCG ATC TAT|
|HLA-C||5Cin1-61: AGC GAG GKG CCC GCC CGG CGAb||3BCin3-12: GGA GAT GGG GAA GGC TCC CCA CT|
|HLA-A||5AIn1-46: GAA ACS GCC TCT GYG GGG AGA AGC AA||5In2-148: GTT TCA TTT TCA GTT TAG GCC A|
|3In2-65: TCG GAC CCG GAG ACT GTG||3AIn3-66: TGT TGG TCC CAA TTG TCT CCC CTC|
|AINT1F: GCG CCK GGA GGA GGG T||INT2F: TTA CCC GGT TTC ATT TTC AG|
|INT2R: GGA TCT CGG ACC CGG AG||AINT3R: TCC TTG TGG GAG GCC AG|
|HLA-B||5Bin1-57: GGG AGG AGC GAG GGG ACC SCA G||Bex3F: GGK CCA GGG TCT CAC A|
|Bex2R: CAC TCA CCG GCC TCG CTC TGG||Bin3-37: GGA GGC CAT CCC CGG CGA CCT AT|
|HLA-C||Cex2F: GGG TCG GGC GGG TCT CAG CC||Cex3F: TGA CCR CGG GGG CGG GGC C|
|Cex2R: GGA GGG GTC GTG ACC TGC GC||3BCIn3-12: GGA GAT GGG GAA GGC TCC CCA CT|
Characterization of new HLA alleles
The variant HLA-A allele was isolated by group-specific amplification using primers (5AIN1-46, AIN1-A, AIN1-G, AIN1-T, and 3AIN3-62) as previously described (21, 22). Sequencing of exons 2 and 3 used primers AINT1F, INT2R, INT2F, and AINT3R (Table 1). The HLA-C loci from cells carrying variant alleles were amplified with HLA-C locus-specific primers described in Table 1, and individual alleles were isolated by cloning using the TopTA vector (Invitrogen, Carlsbad, CA). Two to three individual clones carrying each new HLA-C allele were obtained and sequenced using primers described in Table 1. Exon 2 from the new HLA-DRB1 allele was amplified by PCR using intron primers (I1RB9/I2RB28) as previously described (23). Sequencing of the second exon was performed using sense primers (I1-RBSeq1 and 3) (23) and an antisense primer I1-RBSeq4 (24). DNA sequence analysis of all PCR products was carried out in both the 5′ and 3′ directions for at least two independent PCR reactions on an ABI 3730 Automated DNA Sequencer (Applied Biosystems). Allele designations were assigned by the WHO Nomenclature Committee for Factors of the HLA System (25).
PyPop (Python for Population genetics, http://www.pypop.org) was used to carry out all the following analyses (26, 27). Allele frequencies were obtained by direct counting. Allele frequencies at each HLA locus were evaluated for deviations from Hardy-Weinberg equilibrium proportions using the exact test of Guo and Thompson (28) and by chi-square testing when expected values were ≥5. Chi-square tests were investigated for overall common genotypes (those expected to be seen in at least five instances), ‘lumped’ genotypes (the set of all genotypes individually expected to be seen in fewer than five instances each), all heterozygotes, all homozygotes, and individual common and heterozygote genotypes. These Hardy-Weinberg tests measure the degree to which observed genotype frequencies differ from those expected based on the allele frequencies for that population, assuming that the population is suitably large and experiences random mating (29).
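The chi-square comparison described above can be illustrated with a minimal Python sketch. It is not PyPop's implementation, and it does not perform the Guo and Thompson exact test; the function names and the toy genotype data are hypothetical. It obtains allele frequencies by direct counting, derives the genotype counts expected under Hardy-Weinberg proportions, and sums the chi-square contributions of the common genotypes (expected count of at least five) while pooling the rest into a single 'lumped' class.

```python
from collections import Counter
from itertools import combinations_with_replacement


def hardy_weinberg_expected(genotypes):
    """Return allele frequencies (direct counting) and expected genotype counts."""
    n = len(genotypes)
    allele_counts = Counter(allele for g in genotypes for allele in g)
    freqs = {a: c / (2 * n) for a, c in allele_counts.items()}
    expected = {}
    for a, b in combinations_with_replacement(sorted(freqs), 2):
        p = freqs[a] * freqs[b]
        # Homozygotes: n * p^2; heterozygotes: n * 2pq
        expected[(a, b)] = n * (p if a == b else 2 * p)
    return freqs, expected


def chi_square_common(genotypes, min_expected=5.0):
    """Chi-square over common genotypes; rarer genotypes are pooled ('lumped')."""
    _, expected = hardy_weinberg_expected(genotypes)
    observed = Counter(tuple(sorted(g)) for g in genotypes)
    chi2 = 0.0
    lumped_obs = 0.0
    lumped_exp = 0.0
    for genotype, e in expected.items():
        o = observed.get(genotype, 0)
        if e >= min_expected:
            chi2 += (o - e) ** 2 / e
        else:
            lumped_obs += o
            lumped_exp += e
    return chi2, lumped_obs, lumped_exp


# Toy data (hypothetical): 25 A/A, 50 A/B, 25 B/B is exactly in HW proportions.
toy = [("A", "A")] * 25 + [("A", "B")] * 50 + [("B", "B")] * 25
chi2, lumped_obs, lumped_exp = chi_square_common(toy)
```

Converting the statistic to a P value requires the chi-square distribution with the appropriate degrees of freedom, which is omitted here to keep the sketch dependency-free.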
The Ewens–Watterson test of homozygosity was applied to each locus (30, 31), using Slatkin’s Monte-Carlo implementation of the exact test (32, 33). In this test, the observed homozygosity (F, the sum of the squares of the allele frequencies) is compared with the mean value of F expected for a population of the same size with the same number of alleles, undergoing neutral evolution. The normalized deviate of F (Fnd, the difference between the observed and the expected values of F, divided by the square root of the variance of the expected F) was also calculated for each locus (34).
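The two quantities defined above reduce to short formulas, sketched below in Python. The function names are hypothetical, and the expected mean and variance of F under neutrality are treated as inputs: in practice they depend on the sample size and number of alleles and are obtained by simulation or from published tables, which this sketch does not attempt.

```python
import math


def homozygosity_f(allele_counts):
    """Observed homozygosity F: the sum of squared allele frequencies."""
    total = sum(allele_counts)
    return sum((c / total) ** 2 for c in allele_counts)


def normalized_deviate(f_observed, f_expected_mean, f_expected_variance):
    """Fnd: (observed F - expected F) / sqrt(variance of expected F)."""
    return (f_observed - f_expected_mean) / math.sqrt(f_expected_variance)


# Toy example (hypothetical counts): four alleles at perfectly even frequencies
# give the minimum possible F for four alleles, 4 * (1/4)^2 = 0.25.
f_even = homozygosity_f([10, 10, 10, 10])
```

A negative Fnd means the observed allele frequencies are more even than the neutral expectation; a positive value means they are more skewed.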
Two-, three-, and four-locus haplotype frequencies were estimated using the iterative expectation maximization (EM) algorithm (35, 36). Linkage disequilibrium (LD) between alleles at each pair of loci, and two overall (locus-pair-level) measures of LD, normalized to values between 0 and 1, were calculated. The normalized allele-pair-level LD measure, D′ij, is the disequilibrium coefficient (D), divided by the upper and lower bounds of D for the particular alleles at each locus [as described in (37–39)] and ranges from +1 to −1. A D′ij value of 0 indicates linkage equilibrium, while a value of +1 indicates the complete association of a given pair of alleles in a single haplotype, and for the data reported here, a value of −1 indicates the complete absence of a haplotype composed of those alleles. (Note: The complete absence of a particular haplotype can only be inferred from a D′ij value of −1 when none of the reported alleles has a frequency greater than 0.5.) The first of the locus-pair-level measures, D′ (37), uses the products of the allele frequencies at each locus to weight the LD contribution of specific allele pairs, while the second, Wn (40), calculates a normalization of the chi-square statistic for deviations between observed and expected haplotype frequencies. The significance of the overall LD between any two loci was tested using the permutation distribution of the likelihood ratio test (36).
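The allele-pair-level measure D′ij can be sketched in a few lines of Python. The function name and the example frequencies are hypothetical; the normalization follows the standard definition in which D is divided by its maximum attainable magnitude given the two allele frequencies.

```python
def d_prime(hap_freq, p_i, q_j):
    """Normalized LD (D'ij) for allele i at one locus and allele j at another.

    hap_freq: estimated frequency of the i:j haplotype
    p_i, q_j: frequencies of allele i and allele j at their respective loci
    """
    d = hap_freq - p_i * q_j  # raw disequilibrium coefficient D
    if d == 0:
        return 0.0
    if d > 0:
        # Upper bound of D for these allele frequencies
        d_max = min(p_i * (1 - q_j), (1 - p_i) * q_j)
    else:
        # Magnitude of the lower bound of D
        d_max = min(p_i * q_j, (1 - p_i) * (1 - q_j))
    if d_max == 0:
        return 0.0  # an allele is fixed or absent; D' is undefined, report 0
    return d / d_max


# Hypothetical values: allele i (0.1) is found only with allele j (0.2).
complete_association = d_prime(0.1, 0.1, 0.2)
# Hypothetical values: the i:j haplotype is never observed.
complete_absence = d_prime(0.0, 0.1, 0.2)
```

The second branch also shows why a value of −1 implies an absent haplotype only when allele frequencies are modest: with `hap_freq = 0`, D′ij equals −1 only if `p_i * q_j` is the smaller of the two bounds, which holds whenever neither allele exceeds a frequency of 0.5.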
Arlequin v3.0 (41) was used to compare the HLA-A:B haplotypes and HLA-C and DRB1 genotypes in this population with those in the sub-Saharan African populations from Kenya (29, 42), Mali (42), Rwanda (29), Senegal (29), South Africa (29), Uganda (42), Zambia (42), and Zimbabwe (29); the European populations from Croatia, the Czech Republic, Finland, Georgia, Northern Ireland, and Slovenia (29); an African American population (43); a European American population (43); an Afro-Cuban population; and a Euro-Cuban population (29) by calculating pairwise Fst values (and associated P values) for this entire set of populations. Fst is a measure of the genetic differentiation among subpopulations. Because not all populations had been genotyped at the same loci or at the same level of resolution, three comparisons were performed and the analysis focused on the amino acid sequences encoding the polymorphic antigen-binding groove. A given pair of population datasets was determined to differ significantly if the associated P value was less than 0.05.
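As a rough illustration of what a pairwise Fst value expresses, the Python sketch below computes a simple heterozygosity-based estimate, (Ht − Hs)/Ht, from the allele frequencies of two populations. This is an assumption made for illustration only: it is not Arlequin's estimator, it ignores sample-size corrections, and it does not produce the permutation-based P values used in this study. The function names and frequencies are hypothetical.

```python
def expected_heterozygosity(freqs):
    """Expected heterozygosity: 1 minus the sum of squared allele frequencies."""
    return 1.0 - sum(p * p for p in freqs.values())


def pairwise_fst(freqs_a, freqs_b):
    """Simplified Fst = (Ht - Hs) / Ht for two equally weighted populations."""
    alleles = set(freqs_a) | set(freqs_b)
    pooled = {a: (freqs_a.get(a, 0.0) + freqs_b.get(a, 0.0)) / 2 for a in alleles}
    h_total = expected_heterozygosity(pooled)
    h_within = (expected_heterozygosity(freqs_a)
                + expected_heterozygosity(freqs_b)) / 2
    if h_total == 0:
        return 0.0  # both populations fixed for the same allele
    return (h_total - h_within) / h_total


# Hypothetical populations with identical frequencies: no differentiation.
same = pairwise_fst({"x": 0.5, "y": 0.5}, {"x": 0.5, "y": 0.5})
# Hypothetical populations fixed for different alleles: complete differentiation.
fixed = pairwise_fst({"x": 1.0}, {"y": 1.0})
```

Values near 0 indicate that the two populations share nearly the same allele frequencies, while values approaching 1 indicate that most of the variation lies between them.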
Sequence-based typing strategy
Class I locus-specific amplification was followed by sequencing of both forward and reverse strands of exons 2 and 3. Of the 564 individuals tested, many required the use of additional reagents to resolve alternative combinations of alleles at the HLA-A, -B, and -C loci. Approximately 14% of HLA-A, 20% of HLA-B, and 16% of HLA-C typing results required either additional sequencing primers or a group-specific amplification and sequencing to resolve the typing result to a single combination of two alleles. Separate amplification of allele groups was used to obtain partial DRB1 exon 2 sequences, and 12% of DRB1 typing results required additional PCR or sequencing reagents to resolve alternative combinations. Additional information from the intermediate resolution probe-based testing was used to resolve a small percentage of the ambiguities (HLA-A 3%, HLA-B 2%, HLA-C 2%, and HLA-DRB1 1%).
Allele and genotype frequencies
The genotype frequencies of all four HLA loci were in Hardy-Weinberg equilibrium [Guo and Thompson P values = 0.0670 (HLA-A), 0.4032 (HLA-B), 0.6587 (HLA-C), 0.2818 (HLA-DRB1)]. In total, there were 563 unique HLA-A, -B, -C and -DRB1 phenotypes among the 564 individuals. The number of observed heterozygotes did not differ significantly from that expected under Hardy-Weinberg equilibrium (Table 2). The majority of alleles observed in the population had been reported in the 1996 HLA Nomenclature Report (e.g. 83% of the HLA-A alleles observed in this study were in the report; 91% HLA-B, 73% HLA-C, and 93% HLA-DRB1) (44). More recently reported alleles included A*0260, A*7409, B*0812, B*3528, Cw*0210, DRB1*01010102, and DRB1*030502. Only a small number of the known alleles (based on the April 2005 ImMunoGeneTics/HLA database release) were identified at each locus (Table 3). For HLA-A, only 42 (11%) of 368 known alleles with unique exons 2–3 sequences were observed in the African American population. Twelve alleles, present at ≥3%, contributed 76% of the total allele frequency. Two alleles exhibited allele frequencies over 10%, A*0201G1 (12.1%) and A*2301G (10.8%) (Table 3 describes the ‘G’ nomenclature). For HLA-B, 67 (10%) of 665 known alleles were identified. Twelve alleles present at ≥3% contributed 65%. Only B*530101 had a frequency greater than 10% (11.8%). For HLA-C, 36 (20%) of 178 were observed. Ten alleles, present at ≥3%, contributed the majority of the total frequency (81%). Two alleles were very frequent: Cw*0401G1 at 19.6% and Cw*0701G1 at 11.9%. For HLA-DRB1, 45 (11%) of 412 unique exon 2 sequences were observed. Twelve HLA-DRB1 alleles present at >3% contributed 77% of the allele frequency. DRB1*1503 was present at 12.2%.
Table 2. Heterozygosity at the HLA-A, -B, -C and -DRB1 loci in an African American population
Table 3. HLA allele frequencies in 564 random African American individualsa
| ||440301||0.05053|| ||150201||0.00177|
| ||440302||0.00798|| ||1503||0.12234|
| ||4405||0.00089|| ||160201||0.01596|
| ||4410||0.00177|| |
| ||4501G||0.04699|| |
| ||470101||0.00089|| |
| ||4802||0.00089|| |
| ||4901||0.02039|| |
| ||5001||0.00621|| |
| ||5101G1||0.01064|| |
| ||5109||0.00089|| |
| ||520101||0.00177|| |
| ||520102||0.01773|| |
| ||530101||0.11791|| |
| ||550101||0.00621|| |
| ||5601||0.00266|| |
| ||570101||0.00266|| |
| ||5702||0.00089|| |
| ||570301||0.04078|| |
| ||5704||0.00709|| |
| ||5801||0.03191|| |
| ||5802||0.04167|| |
| ||7801||0.01241|| |
| ||8101G||0.02128|| |
| ||8201||0.00177|| |
Ewens–Watterson homozygosity test
The Ewens–Watterson homozygosity statistic (F) and the normalized deviate of F (Fnd) were used to infer selective pressures acting on individual loci in the population (30, 31). The Ewens–Watterson model predicts a value of F for a population of a given size and showing a given number of alleles at a locus that is evolving in a neutral fashion. The value of the normalized deviate of F for such a population is 0. Observed Fnd values significantly greater than 0 are consistent with the operation of directional selection at that locus, while observed Fnd values significantly lower than 0 are consistent with the operation of balancing selection. Thus, significantly more ‘even’ than expected allele frequencies are consistent with balancing selection, while significantly more ‘skewed’ frequencies are consistent with directional selection or extreme demographic events (e.g. a population bottleneck) (29).
Results of the Ewens–Watterson homozygosity test are shown in Table 4. Negative Fnd values were observed at all four loci, and the Fnd values for the HLA-A and -DRB1 loci were significantly low (P = 0.0227 for HLA-A and 0.0408 for DRB1), consistent with the action of balancing selection at these loci. In addition, the application of a sign test to these Fnd values reveals an overall deviation (P value = 0.0455) from the expectation of neutral evolution (Fnd = 0), suggesting the action of balancing selection in shaping allelic diversity at all four of these loci.
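The arithmetic of a sign test on four negative Fnd values can be sketched as follows. The text does not state which form of the sign test was applied, so the choice below is an assumption: a two-sided normal approximation yields a value that rounds to the reported 0.0455, whereas the exact two-sided binomial form gives 0.125 for the same four-out-of-four outcome. The function names are hypothetical.

```python
import math


def sign_test_normal(n_negative, n_total):
    """Two-sided sign test P value using the normal approximation."""
    mean = n_total / 2            # expected number of negative signs under H0
    sd = math.sqrt(n_total) / 2   # sqrt(n * 0.5 * 0.5)
    z = (n_negative - mean) / sd
    return math.erfc(abs(z) / math.sqrt(2))


def sign_test_exact(n_negative, n_total):
    """Two-sided sign test P value from the exact binomial distribution."""
    k = max(n_negative, n_total - n_negative)
    tail = sum(math.comb(n_total, i) for i in range(k, n_total + 1)) / 2 ** n_total
    return min(1.0, 2 * tail)


# Four loci, all with negative Fnd values.
p_normal = sign_test_normal(4, 4)
p_exact = sign_test_exact(4, 4)
```

With only four loci, the two forms differ substantially, so the locus-level P values in Table 4 carry more weight than the pooled sign test.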
Table 4. Ewens–Watterson homozygosity test of neutrality
Six individuals carried novel alleles at the HLA-A, HLA-C, or HLA-DRB1 loci (Table 5). Two new alleles, A*260104 and DRB1*130502, exhibited synonymous substitutions. The former exhibited a unique substitution at a conserved position (codon 89), and the latter is a common alternative at codon 90 (ACA/ACG). Four alleles, A*7411, Cw*0813, Cw*1608, and Cw*1704, carried single, non-synonymous substitutions. The substitutions found in three of the alleles, A*7411, Cw*1608, and Cw*1704, were in conserved positions. Cw*0804 and Cw*0813 differ in that the more common arginine is found at codon 35 in Cw*0813. While none of these six variants showed novel sequence-specific probe hybridization patterns when typed with the LABType SSO kit, all were identified later by DNA sequencing.
Table 5. New alleles identified during study
Of the 378 A:B haplotypes predicted by the EM algorithm, 105 were found in three or more copies (Table 6). The most common haplotype was A*3001:B*4201, with a frequency of 0.03388. Nineteen other A:B haplotypes were found at frequencies greater than 0.01. Many of the 20 haplotypes share alleles. For example, B*5301 is found associated with four HLA-A alleles and A*2301G is associated with four HLA-B alleles. Of the 162 C:B haplotypes, 69 were found in three or more copies. The most common haplotype was Cw*0401G1:B*530101, with a frequency of 0.10178. Twenty-four other C:B haplotypes were observed at frequencies greater than 0.01, and two of these at frequencies greater than 0.05. Of the 373 B:DRB1 haplotypes, 105 were found in three or more copies. The most common haplotype was B*4201:DRB1*030201, with a frequency of 0.03861. Thirteen other B:DRB1 haplotypes had frequencies greater than 0.01.
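Because the haplotypes above are estimated rather than observed directly, a minimal two-locus EM sketch in Python may help show how unphased genotypes lead to haplotype frequencies. It is not the PyPop implementation; the function names, the fixed iteration count, and the toy allele labels are hypothetical. Each individual's genotype is expanded into its possible phase resolutions, the E-step weights each resolution by the product of the current haplotype frequencies, and the M-step re-estimates frequencies from the weighted counts.

```python
from collections import defaultdict


def phase_resolutions(geno_a, geno_b):
    """List the distinct haplotype pairs compatible with a two-locus genotype."""
    (a1, a2), (b1, b2) = geno_a, geno_b
    pairs = {
        tuple(sorted([(a1, b1), (a2, b2)])),
        tuple(sorted([(a1, b2), (a2, b1)])),
    }
    return sorted(pairs)


def em_haplotypes(samples, n_iter=100):
    """Estimate two-locus haplotype frequencies from unphased genotypes."""
    resolutions = [phase_resolutions(ga, gb) for ga, gb in samples]
    haps = {h for res in resolutions for pair in res for h in pair}
    freqs = {h: 1.0 / len(haps) for h in haps}  # uniform starting point
    n_chromosomes = 2 * len(samples)
    for _ in range(n_iter):
        counts = defaultdict(float)
        for res in resolutions:
            # E-step: probability of each phase resolution under current freqs
            weights = [freqs[h1] * freqs[h2] for h1, h2 in res]
            total = sum(weights)
            if total == 0:
                weights, total = [1.0] * len(res), float(len(res))
            for (h1, h2), w in zip(res, weights):
                counts[h1] += w / total
                counts[h2] += w / total
        # M-step: new frequencies are the expected haplotype counts / 2N
        freqs = {h: counts[h] / n_chromosomes for h in haps}
    return freqs


# Toy data (hypothetical alleles): two double homozygotes and one double
# heterozygote whose phase is ambiguous.
toy_samples = [
    (("A1", "A1"), ("B1", "B1")),
    (("A2", "A2"), ("B2", "B2")),
    (("A1", "A2"), ("B1", "B2")),
]
estimated = em_haplotypes(toy_samples)
```

In this toy example the double heterozygote is resolved almost entirely as A1:B1 plus A2:B2, because the two homozygous individuals make those haplotypes far more probable than the alternative phase.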
Table 6. HLA two-locus haplotypesa in 564 African American individuals identified in three or more individuals
LD measures of the strength of association between pairs of HLA loci showed, as previously reported (45), that B:C associations are stronger than the associations between other loci, but all pairwise associations are statistically significant (P < 0.0001) (Table 7). Table 6 shows the relative LD (D′ij) value for each two-locus haplotype. Values can range from −1.0 to 1.0. For the data reported here, values over 0.4 indicate strong positive associations between alleles in a haplotype and values below −0.75 indicate strong negative associations. Values close to zero reflect random associations of common alleles (linkage equilibrium). As expected, several C:B haplotypes have D′ij values of 1 (e.g. Cw*160101:B*520102 and Cw*1701G:B*4102); 46 haplotypes received scores over 0.4. A:B values begin at 0.88728 (A*3001:B*4202) with 13 haplotypes with scores over 0.4; and B:DRB1 values begin at 0.63285 (B*4201:DRB1*030201) with 9 haplotypes with scores over 0.4.
Table 7. Pairwise global linkage disequilibrium (LD) estimates
Of the 729 three-locus A:B:DRB1 haplotypes, 73 were found in three or more copies (data not shown). Of the 792 four-locus A:C:B:DRB1 haplotypes, 65 occurred in three or more copies with a combined frequency of 0.27914 (Table 8) and 634 occurred only once summing to a frequency of 0.5633 (data available on www.dodmarrow.org). Ten or more copies of the following four haplotypes were observed: A*3001:Cw*1701G:B*4201:DRB1*030201 (0.02174), A*3601:Cw*0401G1:B*530101:DRB1*110102 (0.01324), A*010101G:Cw*0701G1:B*0801G:DRB1*030101 (0.01321), and A*330301:Cw*0401G1:B*530101:DRB1*080401 (0.00969).
Table 8. HLA four-locus haplotypes in 564 African American individuals identified in three or more individuals
To estimate the degree of difference between this population, other African American populations, and African and European populations, Fst (genetic distance) values were calculated between all pairs of populations in a set of sub-Saharan African, European, African American, European American, Afro-Cuban and Euro-Cuban populations for HLA-A:B haplotypes and HLA-C and DRB1 genotypes, considering only variation at the amino acid level for exons 2 and 3 at the class I loci and for exon 2 at the DRB1 locus. Then, the mean pairwise Fst value was calculated within and between each group (sub-Saharan African, European, or African American) and between each African American/Afro-Cuban population and the set of sub-Saharan or European populations. We tested the null hypothesis of population identity (any two population datasets being drawn from the same population) between the African American and the Afro-Cuban populations by considering the P values for each pairwise Fst value, with a significant P value indicating a significant difference between population datasets. These data are summarized in Tables 9 and 10. For each comparison, the mean Fst value for the sub-Saharan African populations is greater than that for the European populations, consistent with previous observations of greater HLA diversity in the sub-Saharan African populations relative to the rest of the world. The mean Fst within the African American populations (for A:B haplotypes and HLA-C genotypes; DRB1 genotypes were not available for another African American population) was an order of magnitude lower than for either of the larger groups, indicating (as might be expected) that these African American populations are not nearly as highly differentiated from one another as the various populations of sub-Saharan Africa and Europe.
In addition, the pairwise Fst P values for the African American populations were non-significant for A:B haplotypes, a finding consistent with these datasets being drawn from the same population. However, the African American population reported here and the population reported by Cao et al. (43) differed significantly at the HLA-C locus, and these differences could be due to the identification of alleles not known at the time of the previous study (e.g. the recently described Cw*0210 may have been assigned as Cw*0202 in the previous study).
Table 9. Genetic distance as measured by mean pairwise Fst values between population groups
Table 10. Test for population identity
|AfAm1-AfAm2a||0.07207 ± 0.0326||0.01802 ± 0.0121|
|AfAm1-AfAm3||0.18919 ± 0.0394|| |
|AfAm2-AfAm3||0.30631 ± 0.0388|| |
In terms of individual genetic distances between the African American populations and the populations of sub-Saharan Africa and Europe, A:B haplotype comparisons indicate that the African American population described here is closer to sub-Saharan African populations (Fst = 0.00974) than either the African American population described by Cao et al. (0.00980) or the Afro-Cuban population described by Meyer et al. (0.01362) and is more distant from the European populations (0.03303) than the African American population described by Cao et al. (0.03027) or the Afro-Cuban population (0.01889). Although these differences are marginal and these populations are not significantly different from one another, this pattern still suggests that the degree of European admixture in the African American population described here is lower than that in the other two. While the pattern for the HLA-C comparisons, where the African American population described here is more distant from both the sub-Saharan African and the European populations (Fst = 0.01332 and 0.02026, respectively) than the African American population described by Cao et al. (0.01120 and 0.016524), is not identical to that for A:B haplotypes, a lower degree of European admixture, relative to the African American population described by Cao et al., can still be inferred for the population described here.
Some alleles in the African American populations (e.g. A*2407, B*1525, B*2706) were not observed in African populations (42). Some of these alleles were initially identified in American Indian and/or Asian populations or found uniquely in these populations (43) and may reflect admixture with these groups (3–5). These individuals also may carry other alleles common to American Indian and/or Asian populations; for example, one A*2407-positive individual also carries B*3505, which was first described in an American Indian (46), and DRB1*1202, which is more common in Asian populations (10, 12).
This study provides allele and haplotype frequency data from 564 African American individuals. The homozygosity values for the HLA-A, -B, -C, and -DRB1 loci in this African American population are consistent with those seen in other African American and sub-Saharan populations (as well as populations from the rest of the world) (11, 12, 22, 29, 47). For example, significantly low Fnd values were observed at the HLA-A and -C loci in an African American population (29, 43) and at subsets of the class I and DRB1 loci in a variety of the sub-Saharan African populations from Cameroon, Mali, Kenya, Zimbabwe, Uganda, Zambia, and South Africa (22, 29, 42). In all these populations, strong evidence of balancing selection is seen at the HLA-A and -DRB1 loci, and weak balancing selection (perhaps modulated by other selective forces) can be inferred from negative Fnd values at the HLA-B and -C loci.
Balancing selection may result when evolving pathogens confer selective advantage to low-frequency alleles (frequency-dependent selection), when heterozygosity confers a selective advantage over homozygosity (overdominance), or when changing environmental conditions favor distinct phenotypes (environmental heterozygosity). Alternatively, balancing selection may be inferred erroneously when low-frequency alleles are not detected, skewing the allele frequency distribution in favor of more common alleles. Given that a number of novel alleles have been detected in this population at the HLA-A, -C, and -DRB1 loci, the genotyping method used appears to be sufficiently sensitive to rule out an erroneous inference of balancing selection.
The lack of any overall deviations from expected Hardy-Weinberg equilibrium proportions and the detection of selective forces similar to those seen in other African American and sub-Saharan African populations operating in this population suggest that this collection of subjects represents a valid subset of the African American population. This inference is further supported by the low degree of differentiation observed at the class I loci for this population relative to other African American and Afro-Caribbean population samples and the lack of significant differences between A:B haplotypes in these populations. Thus, this population may be useful for additional diversity studies and can serve as a basis for predicting allele and haplotype frequencies in the search for unrelated hematopoietic stem cell donors from this population.
The data presented in this study can be compared with a previous study of HLA-A, -B, and -C alleles in 252 African American individuals using probe-based testing (43). In that study, fewer alleles were detected at each locus. For example, at the HLA-A locus, Cao et al. identified 32 four-digit alleles. Of these alleles, one was observed once in that study but was not observed in this study. In the current study, 40 four-digit HLA-A alleles were identified, and nine that had not been observed previously were observed once (six alleles) or twice (three alleles). Cao et al. did not detect HLA-B*5704 and Cw*0102, while these alleles were identified 8 and 12 times, respectively, in the current study. These differences are expected because of the increased sample size (564 vs 252) and the use of a higher resolution testing method (sequence-based typing vs sequence-specific oligonucleotide probes), detecting a greater number of known alleles. With the exception of B*5703 (allele frequency 0.04078 in this study vs 0.0040 in the study of Cao et al.) and Cw*0501 (0.0328 vs 0.0198), the same set of frequent (≥0.03) alleles was observed in both studies. A comparison of the 45 B:C haplotypes observed three or more times in the Cao et al. study with the 69 haplotypes observed in this study noted 36 haplotypes identified in both studies, although the relative frequencies differed. Nine haplotypes observed by Cao et al. were not observed in this study. Most of these haplotypes were infrequent in the previous study.
An early study by Just et al. (7), using probe-based and restriction fragment testing, defined 31 four-digit DRB1 alleles in African American individuals from New York, while 45 were identified in this study. Just et al. also identified DRB1*0803, *0805, and *1103 alleles. Common alleles had similar frequencies with the exception of DRB1*1501 and *1503, which were present at frequencies of 0.160 and 0.006, respectively, in the Just et al. study, compared with 0.02482 and 0.12234 in this study.
The HLA-A, -B, -C, -DR, and -DQ assignments of African American families and unrelated individuals studied in an American Society for Histocompatibility and Immunogenetics (ASHI) minority antigens workshop were obtained at low resolution with higher resolution used to clarify haplotypes (8). About half of the 29 most common HLA-A, -B, and -DR haplotypes from the ASHI study were observed as common in this study. The previous study noted a significantly different geographic distribution of DR antigens in the population drawn from ten geographic regions of the United States. Such differences may be the basis for differences in the frequency of common haplotypes in the two studies.
Finally, Mori et al. (48) identified low-resolution antigen-level HLA-A, -B, and -DRB1 haplotypes from African American individuals. Nine of the ten most frequent haplotypes identified by Mori et al. were also frequent in this study (although not all these nine ranked in the top ten most frequent haplotypes identified in this study). The A2, B7, and DR2 (A*0201G1,B*070201,DRB1*150101) haplotype, also common in the Irish population (49), was not observed among the haplotypes found three or more times in this study but was observed in two different individuals. Comparison with a European population from Northern Ireland of 1000 individuals tested by a probe-based method noted that four of the eight common three-locus haplotypes were also found in African American individuals, suggesting the admixture noted in other studies (3–5). Interestingly, one haplotype was identical at low resolution but differed in the alleles carried: A*020101,B*440201,DRB1*150101 in Irish individuals compared with A*0201G1,B*440301,DRB1*1503 in African American individuals.
A comparison can also be made to HLA alleles and haplotypes present in the African populations (42). As expected, many but not all of the common alleles including many but not all of the ‘African’ alleles were found in African American populations. Shared common haplotypes were also observed.
Six new alleles identified in this study were not detected during intermediate resolution probe-based typing and represented 0.13% of the total alleles at the four loci. All of the amino acid substitutions fall at positions that do not appear to affect peptide binding (50). Four of the six carry differences in conserved regions, which explains the failure to detect these differences during probe-based typing. It is likely that changes in the usually conserved positions will be observed in random individuals.
Knowledge of allele and haplotype frequencies in human populations guides the search for an HLA-matched unrelated hematopoietic stem cell donor. A new algorithm developed by the National Marrow Donor Program uses this information to predict the likelihood that a volunteer of a particular ethnicity will carry specific HLA alleles when typed at higher resolution (51). The results of this and similar studies will enhance the information used by the algorithm to improve the predictions, facilitating donor selection and conserving patient resources.
This research is supported by funding from the Office of Naval Research N00014-04-1-0398 (CKH and JN) and the National Institutes of Health grant GM35326 (GT and AL). The views expressed in this article are those of the authors and do not reflect the official policy or position of the Department of the Navy, the Department of Defense, or the US government.