Immunogenetics as a tool in anthropological studies

Authors


Professor A. Sanchez-Mazas, Laboratory of Anthropology, Genetics and peopling history (AGP), Anthropology Unit, Department of Genetics and Evolution, University of Geneva, 12 rue Gustave-Revilliod, CH-1227 Geneva, Switzerland. Email: alicia.sanchez-mazas@unige.chSenior author: Professor Erik Thorsby,email: erik.thorsby@medisin.uio.no

Summary

The genes coding for the main molecules involved in the human immune system – immunoglobulins, human leucocyte antigen (HLA) molecules and killer-cell immunoglobulin-like receptors (KIR) – exhibit a very high level of polymorphism that reveals remarkable frequency variation in human populations. ‘Genetic marker’ (GM) allotypes located in the constant domains of IgG antibodies have been studied for over 40 years through serological typing, leading to the identification of a variety of GM haplotypes whose frequencies vary sharply from one geographic region to another. An impressive diversity of HLA alleles, which results in amino acid substitutions located in the antigen-binding region of HLA molecules, also varies greatly among populations. The KIR differ between individuals according to both gene content and allelic variation, and also display considerable population diversity. Whereas the molecular evolution of these polymorphisms has most likely been subject to natural selection, principally driven by host–pathogen interactions, their patterns of genetic variation worldwide show significant signals of human geographic expansion, demographic history and cultural diversification. As current developments in population genetic analysis and computer simulation improve our ability to discriminate among different – either stochastic or deterministic – forces acting on the genetic evolution of human populations, the study of these systems shows great promise for investigating both the peopling history of modern humans in the time since their common origin and human adaptation to past environmental (e.g. pathogenic) changes. Therefore, in addition to mitochondrial DNA, Y-chromosome, microsatellites, single nucleotide polymorphisms and other markers, immunogenetic polymorphisms represent essential and complementary tools for anthropological studies.

Introduction: from immunology to anthropology

More than a century has elapsed since the discovery of the ABO blood groups in 1900 by Karl Landsteiner through haemagglutination assays, (see ref. 1 for a review) an event that marked the starting point of immunogenetic studies applied to the analysis of genetic variation in humans. Other antigens of the red blood cells (together with allozymes, through electrophoretic techniques) were successively found and studied in human populations during the first half of the 20th century.2 Molecules that are instrumental in the immune responses of human beings also revealed inter-individual differences such as immunoglobulins, with the discovery of allotypic variation,3,4 and human leucocyte antigen (HLA) molecules,5 with the finding of an unexpectedly high degree of polymorphism at the level of their peptide-binding region (see http://www.ebi.ac.uk/imgt/hla/). Killer-cell immunoglobulin-like receptors (KIR) were also shown to exhibit a complex polymorphism where both the number of alleles and the number of genes may vary among individuals.6

Today, almost 350 severe pathogens are registered on a worldwide scale (Gideon online. Retrieved from http://www.gideononline.com on 20 December 2010) and many others have existed and are now extinct. Each year, seasonal epidemics of influenza remind us that the turnover of most viruses is very rapid. A high level of polymorphism in the genes coding for molecules involved in immune responses is therefore not surprising in light of our exposure to such a diversity of infectious agents, because we know that evolution may easily adapt the genetic pool of populations to specific environmental pressures through natural selection. For example, red blood cell antigens were found to act as receptors for a number of pathogens, (e.g. Plasmodium vivax, for FY, Plasmodium falciparum, for GPA, Toxoplasma gondii, for RH), and hence to play an important role in the susceptibility or resistance of our organism against specific diseases. In the case of FY, the null allele was positively selected in some geographic regions, but not in others, allowing red blood cells to escape P. vivax infection.7 Also, HLA allelic variation may have been maintained through heterozygote advantage, because we know that some HLA alleles are associated with resistance to several fatal diseases, one recent example being the association of HLA-B*27, HLA-B*51 and HLA-B*57 with improved prognosis of AIDS.8

On the other hand, since the very beginning of population genetic applications of such polymorphisms it was observed that their worldwide patterns of genetic diversity tended to exhibit a geographic structure. Population trees generally discriminated populations from different continents, the main controversy being the position of Africans, either segregating with Europeans within an ‘occidental group’ separated from an ‘oriental group’ of Asian, Amerindian and Oceanian populations,9 or segregating separately from the others.10 This observation indicates that natural selection was probably not the only mechanism at work in the evolution of these polymorphisms, but that their patterns of genetic diversity were also shaped by the history of human migrations; hence the increasing interest in using these immunogenetic systems as informative tools to reconstruct human peopling history. Now, after several decades during which researchers have accumulated population data for these polymorphisms and have analysed their variation at different geographic scales, we may ask whether such studies are indeed useful for anthropological research.

The present review summarizes our current knowledge of three major immunogenetic systems, GM, HLA and KIR, in relation to human population diversity studies. These three polymorphisms symbolize the past (GM), present (HLA) and future (KIR) of immunogenetic studies applied to anthropology, both because different typing technologies have been used to analyse their variability (serology for GM; both serology and molecular typing for HLA; and molecular typing for KIR), and because for each system, our understanding of its diversity in human populations is at a different stage (comprehensive for GM; still increasing for HLA; and just starting for KIR). On the other hand, because the three polymorphisms are encoded by independent regions of our genome, are expressed by different kinds of molecules, and are studied in different sets of populations, they provide complementary information for anthropological studies.

Human genetic diversity as revealed by studies of the GM polymorphism

GM polymorphism at the protein and DNA levels

GM allotypes

The GM immunogenetic system was first discovered by Grubb through human serum agglutination studies.3 This system is defined serologically by allotypic variation (allotypes) of the constant domains of the heavy chains of IgG1, IgG2 and IgG3 immunoglobulins. In the 1970s, a total of about 15 GM allotypes were known: G1M 1, G1M 2, G1M 3 and G1M 17 on IgG1; G2M 23 on IgG2; and G3M 5, G3M 6, G3M 10, G3M 11, G3M 13, G3M 14, G3M 15, G3M 16, G3M 21, G3M 24 on IgG3; as well as G1/3M 28, on either IgG1 or IgG3. Although a number of these allotypes were associated with precise substitutions at the DNA level, (see ref. 11 for a review) others were found to be (partly) conformational (i.e. defined by the tertiary structure of the IgG molecule). Therefore, DNA typing could not replace serology. Also, although monoclonal antibodies were used to type some allotypes, most of them could only be tested using specific reagents typically produced by multiple transfused blood donors. Today, as the stock of available reagents is almost depleted, the ‘GM story’ is coming to an end unless precise GM DNA typing becomes possible.

GM haplotypes

The GM haplotypes are combinations of GM allotypes of different IgG sub-classes. As a result of the close linkage, on the long arm of chromosome 14 (14q32.33), of the genes coding for the constant domains of the heavy chains of immunoglobulins (the IGCH genes), the genetic transmission of IgG allotypes occurs through GM haplotype blocks (recombinations occur but are rare). Table 1 lists the most frequent haplotypes found in human populations.4,12 Note, however, that the GM polymorphism was primarily analysed in the 1970s, and GM haplotypes have generally been deduced ‘by hand’ from GM phenotypes because of the absence, at that time, of accurate genotyping techniques and the lack of available frequency estimation programs accommodating ambiguities. Therefore, there has certainly been some bias towards an over-estimation of the frequency of the most frequent haplotypes found in human populations.

Table 1.   Most frequent GM haplotypes in 10 world regions based on the data compiled by Dugoujon et al. (2004)12
GM haplotypesRegion(s)
  1. 5* stands for 5,10,11,13,14.

  2. Bold type indicates a frequency higher than 10% in at least 90% of populations of the corresponding region.

  3. Regular type indicates a frequency higher than 10% in at least 50% of populations of the corresponding region.

  4. Brackets indicate a frequency higher than 10% in at least 20% of populations of the corresponding region.

  5. SSA, Sub-Saharan Africa; NAF, North Africa; EUR, Europe; SWA, Southwest Asia; SEA, Southeast Asia; NEA, Northeast Asia; OCE, Oceania; NCA, North and Central America; SAM, South America; CCA. Circum-Arctic.

1,17 21(SSA), NAF, EUR, SWA, NEA, CCA, NCA, SAM, SEA, OCE
1,2,17 21(EUR), SWA, NEA, (NCA), SAM, (SEA), (OCE)
3 (± 23) 5*(SSA), NAF, EUR, SWA
1,17 10,11,13,15,16NEA, CCA
1,3 (± 23) 5*(SWA), SEA, OCE
1,17 5*SSA, NAF, (SWA)
1,17 23 5*(OCE)
1,17 10,11,13,15(SSA)
1,17 5,6,11,24SSA
1,17 5,6,10,11,14(SSA)

GM diversity in modern human populations

GM haplotype frequencies worldwide

The GM haplotypes have proven to be very useful for anthropology. There are striking differences in GM haplotype frequencies among populations from different geographic areas.4,12 If the highest resolution level is omitted (i.e. haplotypic subdivisions on the basis of the presence/absence of allotypes G2M 23 and G1/3M 28, which have seldom been tested at the global level), very high frequencies are found for GM 3 5* (where 5* stands for 5,10,11,13,14) in Europeans, North Africans and Southwest Asians, GM 1,17 5* in sub-Saharan Africans and some North Africans, GM 1,3 5* in Southeast Asians and some Oceanian populations, GM 1,17 21 in Europeans, Northeast Asians, Amerindians and some Oceanian populations (and sometimes in other regions), and GM 1,2,17 21 in Northeast Asians and South Amerindians (Table 1). G2M 23 further subdivides haplotype GM 3 5* in two sub-haplotypes, GM 3 (–23) 5* and GM 3 23 5*, with variable frequencies in Europe. In sub-Saharan Africa, GM 1,17 5* (without G2M 23) seems to be predominant (as far as G2M 23 has been tested). In Asia GM 1,3 23 5* is the most frequent form. Some Papuan populations from New Guinea and Australian Aborigines exhibit haplotype GM 1,17 23 5*, thereby differing from GM 1,17 (-23) 5*, which is frequently found in Africa. Other haplotypes are principally found at regional levels, like GM 1,17 5,6,11,24, GM 1,17 5,6,10,11,14 and GM 1,17 10,11,13,15 in sub-Saharan Africa (the latter being frequent in the Khoisan), and GM 1,17 10,11,13,15,16 in Northeast Asian and Circum-Arctic populations. However, most haplotypes are found at low frequencies in different geographic regions. For example, GM 1,17 21 and GM 1,2,17 21 are universal (although with variable frequencies), and GM 1,17 5* is commonly observed in populations with different origins.

Patterns of GM genetic variation worldwide

As suggested above, some GM haplotypes exhibit their highest frequencies in the most outlying regions of each continent (e.g. western and southern sub-Saharan Africa, northwestern Europe, southeastern Asia), whereas genetic profiles with intermediate frequencies for several haplotypes are observed in central or connecting regions like East Africa and the Near-East. This suggests that human peopling history occurred in a centrifugal manner, i.e. from central to peripheral regions, with a loss of diversity through isolation by distance.12 This scenario is suggested by Fig. 1 (a multidimensional scaling analysis of 82 populations, data compiled in ref. 12) where a continuous pattern of genetic variation is clearly visible, and is fully compatible with the spread of modern humans towards different continents from a central region including East Africa and the Near East.

Figure 1.

 Multidimensional scaling analysis (MDS) of 82 populations from all continents tested for the GM polymorphism. Each point represents a population and its symbol its geographic region. Populations are arranged in a three-dimensional space that best reproduces the observed genetic distances. The orientations of the axes are arbitrary, and their scales are relative to each other. The fit to the original genetic distance matrix is measured by a Stress value (from 0.4 = poor to 0.0 = perfect). Data are from Dugoujon et al. (2004). Stress value = 0.158 (good).

GM genetic differentiations in Africa

Besides this general finding at the global level, the study of the GM polymorphism has brought significant results at regional levels. In Africa, linguistics is a better predictor of the GM genetic structure of populations than geography: variation of GM haplotypes is clearly observed among populations whose languages belong to different linguistic phyla of this continent; i.e. Afro-Asiatic (AA), Nilo-Saharan (NS), Niger-Congo (NC) and Khoisan (KH).13–15 It is therefore likely that the spread of populations speaking languages from each of these families had a significant impact on the patterns of GM genetic variation in Africa. In particular, the demographic and geographic expansion of the NC-speaking Bantu started in a region located between present Nigeria and Cameroon and expanded southward during the last 3000 years. Bantu people may have ‘pushed’ KH populations further south compared with the large area previously occupied by the KH populations, which extended from northeast to southern Africa. Despite documented gene flow between Bantu and KH populations, the genetic profiles (here, for the GM polymorphism) observed in KH show that they retained an ancient genetic diversification. Interestingly, KH populations exhibit moderate frequencies for one haplotype, GM 1,17 21, which is frequent in East Africa but rarely found elsewhere in sub-Saharan Africa, indicating that KH and East African populations share ancient relationships. The other African linguistic groups also exhibit a genetic profile compatible with linguistic classification: West Africans, whose languages belong, like Bantu, to the NC family, are genetically similar to Bantu, with very high GM 1,17 5* frequencies; also, AA populations from East Africa exhibit higher frequencies of GM 1,17 21 and GM 3 5* than other sub-Saharan African populations, which makes them closer than the other groups to populations from AA-speaking populations from North Africa and the Near-East. It is also relevant that AA are the most diversified populations from Africa (in terms of internal diversity estimated by heterozygosity), indicating that they may have undergone an ancient demographic expansion in the past, as suggested on the basis of mitochondrial DNA (mtDNA) analyses.16 Finally, the few NS populations that have been tested for GM are almost monomorphic for haplotype GM 1,17 5*. This represents an extreme differentiation compared with NC, which is explainable by rapid genetic drift through isolation. Actually, NS populations are spread discontinuously over a vast geographic area extending from East (Ethiopia) to West (Mali) Africa throughout the Sahara Desert, and may have been submitted to repeated episodes of demographic contraction and gene flow with local neighbours, depending on climatic variation, which extensively modified the environments.

GM genetic differentiations in East Asia

Variation of GM has also been highly informative for anthropological studies in East Asia. A north–south genetic cline is clearly observed, with high frequencies of GM 1,17 21 and GM 1,2,17 21 and low frequencies of GM 1,3 5* in the north, the reverse situation being observed in the south. Here again, the linguistic information is relevant: we observe continuous genetic differentiations between (from one end of the cline to the other) Altaic, Japanese and Korean; North Tibeto-Burman; Northern Chinese (all Mandarin but Southeastern); Wu and Southeastern Mandarin; Southern Chinese and Southern Tibeto-Burman; and Austro-Asiatic, Tai-Kadai and Austronesian populations.17,18 However, contrary to the situation found in Africa, in East Asia the linguistic families are found in specific geographic areas and it is hard to establish whether the observed genetic patterns have mostly been shaped by linguistic or by geographic differentiations in the past. As discussed in more detail below for the HLA polymorphism, GM genetic variation is compatible with the ‘pincer’ model of migrations from West Asia, suggesting that some populations followed a southern (maybe coastal) route through India to Southeast Asia, and others a route north to the Himalaya Mountains to Northeast Asia (although at a different period), both groups of populations later intermixing through north–south migrations in East Asia. As for HLA, a higher level of internal diversity (higher heterozygosity) is observed in Northeast Asia compared with Southeast Asia, indicating higher levels of gene flow, whereas Southeast Asian populations may have undergone rapid differentiation through genetic drift.19

Another crucial example pertains to the peopling history of Taiwan. In a previous study, we investigated the GM polymorphism of several Aboriginal populations from this island (Siraya, Pazeh, Taroko, Atayal, Tsou, Bunun and Puyuma, as well as Yami located in Lan-Yu island off the southeastern coast of Taiwan).20 We found a decrease in heterozygosity from (north)western to southern and southeastern regions (with a higher frequency of GM 1,3 (–23) 5* in the west, whereas GM 1,3 23 5* is (almost) fixed in the south and/or southeast). This is congruent with the linguistic hypothesis of Taiwanese peopling history proposed by Sagart (including archaeological evidence),21 where Neolithic farmers cultivating both rice and millet settled in Taiwan some 5500 years ago and spread to the south and southeast along the coast.

Evolutionary interpretation: human migrations history versus selection

As described above, one remarkable result of the analysis of the GM polymorphism is the observation of abrupt frequency changes between different continental areas worldwide. By subdividing the world into 10 continental or sub-continental regions (sub-Saharan Africa, North Africa, Europe, West Asia, Northeast Asia, Southeast Asia, Oceania, Circum-Arctic, North and Central America, and South America), we found a proportion of genetic diversity due to differences among regions of about 39%.12 This is much higher than generally found (albeit based on a different subdivision of the world and different numbers of groups) for allozymes and DNA markers, of the order of 10–15%,22–24 and 3–7% for most HLA loci.25 Extreme values (up to 88%) of human genetic diversity among the main geographic regions have only been found for strongly selected biological traits like skin pigmentation, whereas craniometric traits also fall within the range of neutrally evolving genetic markers.26,27 We may ask ourselves whether, because of the immunological function of IgG molecules expressing GM allotypes, the GM polymorphism is subject to some kind of (directional) selection. Indeed, some studies have suggested that GM haplotypes were involved in susceptibilities to autoimmune diseases (see ref. 28,29 for a review) and infectious diseases like malaria30–32 or filariasis.33 However, conclusive evidence for disease associations has not been found. Moreover, we did not detect any departure from selective neutrality by using Ewens–Watterson’s tests (with Bonferroni’s correction) on 82 populations tested for GM worldwide.12 Therefore, our explanation of the unusual apportionment of genetic diversity observed for the GM polymorphism is, first, that this system has been tested by serological typing, thereby providing only a broad description of its molecular variation, and, second, as explained above, that the frequencies of the most frequent haplotypes in each geographic region are over-estimated because most GM frequencies were estimated by following a parsimonious approach considering a minimum number of haplotypes deduced ‘by hand’ from the phenotypic distributions. As a consequence, the proportion of genetic variation observed among regions has probably also been over-estimated. On the other hand, the most frequent GM haplotypes defined by serology may be seen as broad GM haplogroups including phylogenetically related haplotypes, an interpretation that is sustained by previous analyses performed at the DNA sequence level34 and that recalls the definition of Y-chromosome (non-recombining region, or NRY) haplogroups.35 The Y-chromosome markers deviate from other DNA markers in being, like GM, highly structured at the global scale: according to Hammer et al.,36 36% of the total variance of NRY haplotypes is attributable to differences among populations, although the results vary considerably depending on the markers studied.

Overall, despite the limitations as the result of serology and the possibility of natural selection acting on this system, the analysis of the GM polymorphism has been very useful in revealing the effects of both geographic and cultural differentiations on the genetic structure of modern human populations, and has provided noteworthy examples of the usefulness of this immunogenetic complex for anthropology.

Human genetic diversity as revealed by studies of the HLA polymorphism

Structure and function of HLA molecules

The HLA molecules are peptide-binding molecules encoded by genes in the HLA complex on chromosome 6 (see ref. 37 for a review). They are divided into two classes, class I and class II, which both present peptide fragments of antigens to T cells. Some class I molecules also interact with natural killer (NK) cells. The HLA class I molecules consist of a polymorphic α heavy chain that is non-covalently bound to a small non-polymorphic β chain (β2m, encoded by a gene on chromosome 15). The α chain includes three extracellular domains, two of which (α1 and α2) form a peptide-binding cleft. The classical HLA class I molecules encompass the A, B and C series of molecules, encoded by three different corresponding α chain loci. They are extremely polymorphic (see next section) and expressed in almost all nucleated cells. They bind short peptide fragments (8–10 amino acids long) derived primarily from endogenous proteins and present them at the cell membrane. Here CD8+ T cells with appropriate T-cell receptors will interact with the peptide–HLA complex. Some class I molecules also interact with NK cells. The non-classical HLA class I molecules encompass the E, F and G molecules, which are much less polymorphic and which primarily function as ligands for NK cells. Two HLA class 1 α-related chains, MICA and MICB, are polymorphic but do not have a peptide-binding cleft nor do they bind β2m. They are stress molecules that are up-regulated under certain conditions and function as ligands for the NKG2D activating receptor on NK cells.

The HLA class II molecules consist of two heavy chains, α and β, which both include two extracellular domains. Their peptide-binding cleft is formed by their α1 and β1 domains. The class II molecules encompass the DR, DQ and DP series of molecules, encoded by corresponding α and β chain loci in the HLA complex. The DRβ, DQα, DQβ, DPα and DPβ chains are extremely polymorphic (see next section), whereas the DRα chain is essentially monomorphic. Four different DRβ chains are expressed; DRβ1, DRβ3, DRβ4 and DRβ5. The class II molecules are expressed in specialized antigen-presenting cells such as dendritic cells, where they pick up longer peptide fragments (8–15 amino acids long) primarily from endocytosed exogenous proteins and present them at the cell membrane. Here CD4+ T cells with appropriate T-cell receptors will interact with the peptide–HLA complex.

HLA polymorphism at the protein and DNA levels

The 4-Mb DNA region of the short arm of chromosome 6 (6p21.3) coding for the HLA class I and class II molecules is one of the most polymorphic regions of our genome.38 Regular updates of the numbers of alleles observed at each HLA locus (current numbers are given in Table 2) are recorded in the IMGT/HLA database (http://www.ebi.ac.uk/imgt/hla/), which also provides DNA and amino acid sequences and alignments of HLA alleles and molecules, and nomenclature information.39 This nomenclature has recently been modified substantially according to the allele naming system shown in Fig. 2.

Table 2.   Relevant characteristics of the most polymorphic human leucocyte antigen (HLA) loci
GeneABCDRB1DQA1DQB1DPA1DPB1
  1. 1Last updated 6 October 2010.

  2. 2Estimated on exons 2 and 3 for Class I and exon 2 for Class II, IMGT database release 2·13 (Buhler 2007).40

  3. n.a., not available.

Number of alleles1138119279608313512728142
Mean number of pairwise nucleotidic differences222·23
± 9·81
26·34
± 11·56
15·83
± 7·09
17·77
± 7·9
17·59
± 8·01
12·11
± 5·53
n.a.9·69
± 4·47
Figure 2.

 New HLA nomenclature system adopted in 2010 (Figure kindly provided by S.G.E. Marsh, HLA Informatics Group, http://hla.alleles.org/nomenclature/naming.html). Note that the gene symbols are written in italics, in contrast to the gene products which are not italicized.

The high level of diversity found at the HLA loci is principally located in exons 2 and 3 for class I genes, and in exon 2 for class II genes. Such exons correspond, at the protein level, to the peptide-binding region (PBR) of the HLA molecules. The mean pairwise DNA sequence differences between HLA alleles are between ∼ 10 and 26 nucleotides, depending on the locus (Table 2 and ref. 40), suggesting a functional relevance. Analysis of the amino acid sequence of HLA molecules shows that allelic variants differ from each other mainly by substitutions in residues contributing to the PBR, in particular in some pockets in the PBR that accommodate side chains of the bound peptides. Hence, peptides eluted from different HLA class I molecules show distinctive amino acid patterns at certain positions, in particular corresponding to pockets 2 and 9 of the HLA molecules.41 It is therefore assumed that the polymorphism of HLA alleles is to a large extent functional because different HLA molecules bind different sets of peptides. A high sequence diversity is therefore required in the PBR of the HLA molecules to bind a high variety of pathogen-derived peptides that are subsequently presented to T-cell receptors.

The distribution of HLA alleles in different populations may be a consequence of this functional polymorphism. In many instances the immune response to a particular peptide epitope of a pathogen may depend on the HLA alleles carried by the individual. Individuals heterozygous for HLA alleles may have a wider peptide binding repertoire and therefore a capability to respond to more pathogen variants, causing selection of heterozygotes. On the other hand, the existence of several different loci both within the class I (A, B and C) and II series (DR, DQ and DP) of molecules may to some extent compensate for the deficits of homozygosity.

It should also be noted that there exists a very strong linkage disequilibrium (LD), or non-random association, between HLA alleles at different loci; i.e. some HLA alleles are found together in populations more frequently than expected based on their gene frequencies. For example some alleles of the DRB1 locus demonstrate strong LD with specific alleles at the DQA1 and DQB1 loci. Furthermore, in many populations HLA alleles at one locus with high sequence homology, i.e. DRB1, are in LD with the same alleles at other loci, i.e. DQA1 and DQB1, which may indicate an evolutionary relationship between some alleles, i.e. DRB1.42,43 One explanation for differences in haplotype combinations of HLA alleles between different populations is the complementary/compensatory abilities of allelic products encoded by the haplotype to bind peptide epitopes from different pathogens, another being past population differentiations or recent admixture of populations.

HLA diversity in modern human populations

Allelic diversity

Although the number of known HLA alleles increases from year to year, now reaching almost 2000 alleles at HLA-B (Table 2), only part of this polymorphism is detected in individual populations because of typing and statistical limitations (i.e. variable levels of typing resolution, and generally low sample sizes). However, most human populations exhibit a high level of HLA diversity. Table 3 summarizes data on the variation in the number of classical HLA alleles according to two independent studies. For most loci (except genes coding for the α chains of class II molecules, which are less polymorphic), between 10 and 30 alleles are observed per population, the largest number being observed at HLA-B (mean ∼ 30–32). With the exception of the DPB1 locus, and populations that underwent rapid genetic drift (see below), HLA alleles generally exhibit low to medium frequencies, and many of them are very rare (and hence, rarely detected). Actually, 60–70% of known classical HLA alleles have only been reported up to three times,44,45 suggesting that new allele variants are being generated on a regular and ongoing basis.

Table 3.   Variation in the number of detected alleles (k) and observed heterozygosity (H) at each classical human leucocyte antigen (HLA) locus (ordered by decreasing k) in two independent studies
LocusBuhler (2007)1Solberg et al. (2008)2
nMean kSDMean HSDnMin kMax kMean kMean HSD
  1. 1Data compiled from the 12th and 13th International HLA workshops (IHW) and the immunogenetic literature.40

  2. 2Data compiled from the 12th and 13th International HLA workshops (IHW), the allelefrequencies.net database, and the immunogenetic literature.49

  3. n, number of populations; k, number of alleles reported per population; H, observed heterozygosity; SD, standard deviation.

HLA-B9031·618·90·9080·065146514229·80·8940·075
HLA-DRB110622·613·40·8730·082258511721·90·870·075
HLA-A9719·711·60·8170·14615448218·10·8020·139
HLA-C6917·17·50·8620·07111453716·30·8530·079
HLA-DPB15613·770·7240·16714723312·40·7070·159
HLA-DQB18912·24·30·7960·11220231911·10·7930·103
HLA-DQA1587·42·20·750·111191386·70·7470·088
HLA-DPA145273·60·4420·183

Heterozygosity levels and selective neutrality

For most HLA loci, allele frequency distributions are usually even (except in some cases), and populations achieve very high heterozygosity levels. This is reflected by the elevated mean heterozygosity values found at each locus (Table 3), the highest value being observed for HLA-B (∼ 91%). Actually, with the exception of HLA-DPB1, heterozygosity levels are often higher than expected for populations undergoing neutral evolution (i.e. only submitted to stochastic factors linked to the history of human populations, like genetic drift and migration),46–50 which is consistent with the action of natural selection favouring heterozygosis. This hypothesis is also confirmed at the molecular level: at all classical HLA loci except DPB1 (and, to a lesser extent, DQB1), most alleles observed within populations are distantly related from a molecular point of view, with often more than 20 diverging nucleotides among their DNA sequences at exon 2 (and exon 3, for HLA class I).51 These HLA loci may therefore be experiencing asymmetric balancing selection where heterozygous genotypes having molecularly distant alleles would have a higher fitness than heterozygous genotypes exhibiting closely related alleles.51 By contrast, classical selective neutrality tests (e.g. Ewens–Watterson tests) performed at the DPB1 locus generally indicate a neutral model of evolution. However, Tajima’s tests for selective neutrality using the information of HLA nucleotide sequences are significant in more than 60% of populations,51 which is less than for the other loci (81–97%), but high enough to indicate selection. Also, the ratio of silent to replacement substitutions in DPB1 sequences is consistent with selection for heterozygosis.52,53 A possible explanation of these results is that HLA-DPB1 would have retained ancient traces of balancing selection at the DNA level,51 although it presently evolves under neutrality.

Global apportionments of genetic variation

As for most genetic polymorphisms tested, the highest level of HLA genetic diversity is found within populations rather than between populations: on average, over several HLA loci, estimated genetic variation within populations, between populations within broad continental regions, and between broad continental regions are 89·9%, 4·4% and 5·7%, respectively, when seven regions and five loci (HLA-A, -B, -C, -DRB1, and -DQB1) are considered46 and are 89·4%, 5·1% and 5·5%, respectively, when five regions and seven loci (HLA-A, -B, -C, -DRB1, -DQA1, -DQB1 and -DPB1) are considered.25 Overall, the average diversity within populations of the classical HLA loci is higher than the value of ∼ 85% often cited for neutral genetic markers22,24 except for HLA-DPB1 (84%),25 which matches other evidence of neutrality (mentioned above) for this locus.

Trends in HLA frequency variation worldwide and at different loci

Solberg et al. (2008)49 have collected detailed data on the HLA diversity in different populations worldwide (but see also http://www.allelefrequencies.net/). Table 4 lists the four most frequent (FMF) alleles at each of the classical HLA loci in 10 regions of the world, along with the cumulative frequency for those alleles (CAF) in each region. This table also includes an ‘other’ region (OTH) with admixed populations derived from more than one region.

Table 4.   Four most frequent classical human leucocyte antigen (HLA) alleles and their cumulative frequencies in 10 world regions
RegionHLA-AHLA-BHLA-CHLA-DRB1HLA-DQA1HLA-DQB1HLA-DPA1HLA-DPB1
AllelesCAFAllelesCAFAllelesCAFAllelesCAFAllelesCAFAllelesCAFAllelesCAFAllelesCAF
  1. SSA, Sub-Saharan Africa; NAF, North Africa; EUR, Europe; SEA, Southeast Asia; SWA, Southwest Asia; OCE, Oceania; AUS, Australia; NEA, Northeast Asia; NAM, North America; SAM, South America; OTH, Other (admixed, or derived from more than one region). The populations included in each region are indigenous to that region or migrated recently (within the last ∼ 500 years) from that region. For example, EUR includes European Americans and European Australians, NAM and SAM are specific for Native American populations, and AUS is specific for Aboriginal Australian populations. CAF, Cumulative Allele Frequency.

  2. For each locus and region, the alleles are listed in decreasing order of frequency. For example at the HLA-A locus in SSA, A*23:01 is the most frequent allele, followed by A*02:01:01G. Empty cells indicate that no typing data were available for that locus–region combination.

  3. Because of differences in the resolution of typing between studies, some alleles were combined to a common category for comparison. The following four most frequent (FMF) alleles present in specific regions of the world represent multiple alleles that are identical in their exon 2 nucleotide sequences (for class II alleles) or exon 2 and 3 nucleotide sequences (for class I alleles) and are indicated as G groups, or that were combined to a common category because of typing ambiguity and are denoted with an asterisk (*).

  4. A*01:01:01G represents A*01:01 and A*01:04N in EUR.

  5. A*02:01:01G represents A*02:01 and A*02:09, in SSA, EUR and SWA.

  6. A*24:02:01G represents A*24:02, A*24:09 and A*24:11N in EUR, SWA and OCE

  7. B*15:02+ represents B*15:02 and B*15:13 in SEA.

  8. DRB1*15:01+ represents DRB1*15:01 and DRB1*15:06 in EUR.

  9. DQA1*01:01:01G represents DQA1*01:01, DQA1*01:04 and DQA1*01:05 in SSA, EUR, SEA, OCE, NAM, SAM and OTH.

  10. DQA1*03:01:01G represents DQA1*03:01, DQA1*03:02 and DQA1*03:03 in SSA, NAF, EUR, SWA, SEA, OCE, NEA, NAM, SAM and OTH.

  11. DQA1*05:01:01G represents DQA1*05:01, DQA1*05:02, DQA1*05:03 and DQA1*05:05 SSA, NAF, EUR, NEA, SEA, and OTH.

  12. DQB1*02:01:01G represents DQB1*02:01 and DQB1*02:02 in SSA, NAF, EUR, SWA, SEA, and OTH.

  13. DQB1*03:01:01G represents DQB1*03:01 and DQB1*03:09 in SEA, NEA and OTH.

  14. DQB1*06:02+ represents DQB1*06:02 and DQB1*06:11 in SSA and OTH.

  15. DPB1*02:01+ represents DPB1*02:01 and DPB1*48:01 in EUR, SEA and OCE.

  16. DPB1*04:01+ represents DPB1*04:01, DPB1*23:01 and DPB1*39:01 in NAF, EUR,SWA, OCE, NEA, NAM, SAM, and OTH.

  17. DPB1*04:02+ represents DPB1*04:02, DPB1*49:01 and DPB1*51:01 in SSA, EUR, SWA, and SEA.

SSA23:01, 02:01:01G, 30:02, 68:020·3753:01, 15:03, 42:01, 58:020·3006:02, 04:01, 07:01, 02:020·5115:03, 13:02, 11:01, 07:010·4001:02, 05:01:01G, 01:01:01G, 03:01:01G0·8006:02+, 05:01, 02:01:01G, 03:010·7502:01, 01:03, 03:01, 02:020·9801:01, 04:02+, 02:01, 17:010·69
NAF02:01, 01:01, 23:01, 24:020·4350:01, 44:03, 08:01, 51:010·2706:02, 07:01, 04:01, 02:020·5307:01, 03:01, 15:01, 13:020·4505:01:01G, 02:01, 01:02, 03:01:01G0·8002:01:01G, 03:01, 05:01, 06:020·70  04:01+, 03:01, 02:01, 04:020·68
EUR02:01:01G, 01:01:01G, 03:01, 24:02:01G0·6307:02, 08:01, 44:02, 51:010·3607:01, 07:02, 04:01, 05:010·5207:01, 03:01, 15:01+, 01:010·4605:01:01G, 01:02, 03:01:01G, 01:01:01G0·7902:01:01G, 03:01, 05:01, 06:020·6501:03, 02:01, 02:02, 01:051·0004:01+, 02:01+, 04:02+, 03:010·79
SWA11:01, 24:02:01G, 02:01:01G, 01:010·4340:06, 51:01, 35:01, 58:010·2804:01, 06:02, 07:01, 15:020·4407:01, 11:01, 03:01, 15:010·3905:01, 03:01:01G, 02:01, 01:030·7102:01:01G, 03:01, 06:01, 03:020·6201:03, 02:01, 02:02, 01:040·9904:01+, 02:01, 04:02+, 01:010·81
SEA24:02, 11:01, 02:01, 02:070·6640:01, 46:01, 13:01, 15:02+0·3807:02, 03:04, 01:02, 08:010·6012:02, 14:01, 09:01, 08:030·4103:01:01G, 01:02, 01:01:01G, 05:01:01G0·7403:01:01G, 05:02, 03:03, 02:01:01G0·5902:01, 01:03, 02:02, 03:011·0005:01, 02:01+, 13:01, 04:02+0·66
OCE24:02:01G, 11:01, 34:01, 24:070·8656:01, 40:02, 40:01, 13:010·4504:03, 07:02, 01:02, 04:010·5815:02, 11:01, 08:03, 15:010·4901:02, 05:01, 01:01:01G, 03:01:01G0·8703:01, 06:01, 05:02, 05:030·6502:02, 01:03, 02:01, 04:011·0005:01, 04:01+, 02:01+, 01:010·82
AUS34:01, 24:02, 11:01, 02:010·9213:01, 40:02, 56:01, 15:210·6704:01, 01:02, 15:02, 04:030·8108:03, 14:08, 04:05, 15:020·5701:03, 03:01, 05:01, 01:010·8606:01, 04:02, 05:03, 03:010·81  05:01, 22:01, 02:01, 04:010·93
NEA24:02, 02:01, 11:01, 33:030·6051:01, 15:01, 35:01, 52:010·2901:02, 03:03, 07:02, 03:040·5009:01, 04:05, 15:01, 08:030·3603:01:01G, 05:01:01G, 01:02, 01:030·7703:01:01G, 03:03, 06:01, 03:020·5401:03, 02:02, 02:01, 01:041·0005:01, 02:01, 04:01+, 04:020·78
NAM24:02, 02:01, 02:06, 31:010·7735:01, 40:02, 51:01, 27:050·4803:04, 07:02, 04:01, 02:020·5614:02, 08:02, 04:07, 04:110·4803:01:01G, 05:01, 04:01, 01:01:01G0·9703:01, 03:02, 04:02, 03:030·9101:03, 02:01, 02:02, 04:011·0004:02, 04:01+, 02:01, 03:010·94
SAM31:01, 02:01, 24:02, 68:010·7439:05, 40:02, 48:03, 39:090·3107:02, 03:04, 04:01, 15:030·8408:02, 16:02, 14:02, 04:110·6205:01, 03:01:01G, 04:01, 01:01:01G0·9603:01, 03:02, 04:02, 05:010·9201:03, 02:01, 02:021·0004:02, 14:01, 04:01+, 13:010·90
OTH02:01, 03:01, 23:01, 24:020·3953:01, 35:01, 07:02, 44:030·2604:01, 07:01, 07:02, 06:020·4707:01, 03:01, 11:01, 08:020·2705:01:01G, 01:02, 03:01:01G, 01:01:01G0·7303:01:01G, 02:01:02G, 03:02, 06:02+0·6402:01, 02:02, 01:03, 03:010·9901:01, 04:02, 02:01, 04:01+0·67

Only a few of the FMF HLA-B alleles (e.g. B*40:02, or *51:01G) are shared across regions. The low CAF of these alleles, which represent 50% or less of the allelic diversity in each region [with the exception of Australia (AUS)], reflects the high level of polymorphism at this locus, and this pattern suggests that HLA-B is extremely responsive to local variation in immune challenges. This is consistent with the highest proportion (96·7%), compared with the other loci, of statistical deviations from neutrality as assessed by Tajima’s tests51 of HLA-B, and also with other types of studies suggesting that this locus is under the strongest selection for heterozygous advantage.54,55 This extreme diversity may explain why, as the result of statistical limitations (e.g. mean sample size of only 127·1 ± 138·4 individuals in 90 populations analysed by Buhler and Sanchez-Mazas,51 compared with the large number of existing HLA-B alleles), the occurrence of rare HLA-B alleles is very heterogeneous among geographic regions and may give the impression that large numbers of regionally restricted alleles exist in all regions. South Amerindians however, carry some HLA-B alleles that are not detected (i.e. are rare or absent) elsewhere and that constitute a rather high proportion (∼ 15%) of the total allelic frequency, which probably results from a combination of genetic drift and natural selection in the colonization of South America.56–60

In contrast to HLA-B, some HLA-A, -C, and -DRB1 alleles are common over very large areas of the world, whereas others enjoy high frequencies only in specific regions. For example, the HLA-A*23:01 allele is one of the FMF alleles in African [Sub-Saharan Africa (SSA) and North Africa (NAF)] populations, but not in other populations, while A*02:01/*02:01:01G is one of the FMF alleles in all regions but Oceania (OCE), where it is ranked fifth (data not shown). Similarly, HLA-C*07:01G is one of the FMF alleles in Africa, Europe (EUR), and Southwest Asia (SWA), while *07:02G is one of the FMF alleles in EUR, Southeast Asia (SEA), OCE, Northeast Asia (NEA), and the Americas [North America (NAM) and South America (SAM)]. At the DRB1 locus, DRB1*11:01 is one of the FMF alleles in SSA, SWA and OCE, and *15:01 is one of the FMF alleles in NAF, EUR, SWA, OCE and NEA. Based on their CAFs, the FMF alleles at these loci represent 40–70% of the allelic diversity in each region.

Patterns of allelic diversity at the class I and DRB1 loci differ considerably from those at DQA1, DQB1, DPA1 and DPB1. At the latter loci, a small number of alleles are observed at high frequencies all over the world (resulting in most cases, at least for DPB1, in ‘L-shaped’ rather than even frequency distributions). The DQA1*03:01/*03:01:01G and *05:01/*05:01:01G alleles are two of the FMF alleles in all regions; the DQB1*0301/*03:01:01G allele is one of the FMF alleles in all regions; DPA1*01:03, *02:01, and *02:02 are three of the FMF alleles in all surveyed regions (and are the only DPA1 alleles observed in SAM); and the DPB1*04:01 and *04:02G alleles are one or two of the FMF alleles in all regions. Moreover, based on their CAFs, the FMF alleles at these loci represent 60–90% of the allelic diversity in each region.

The trends observed for the DQ and DP loci contrast markedly with those for the DRB1 locus, and the differences may reflect divergent strategies of class II allelic diversification. Although there is low diversity in the genes that encode the α and β subunits of the DQ and DP proteins, a population may display greater diversity of heterodimeric DQ and DP proteins than DR proteins because the DQ and DP heterodimers may be encoded both in the cis and the trans positions of their genes (although for DQA1 and DQB1, particular combinations form unstable dimers61,62). As there is much less variation of the DRA gene, this may be driving DRB1 to diversify in a manner more similar to the class I loci.

Patterns of HLA genetic variation worldwide

Despite evidence of natural selection acting on the evolution of the HLA polymorphism, as discussed above, this immunogenetic system is highly informative for anthropological studies, as the patterns of HLA genetic variation reveal spatial and demographic human populations expansions that occurred in the past. At the global level, genetic distances between populations estimated from frequency data for all HLA loci are highly significantly correlated with geographic distances, these correlations reaching intermediate to high values (r ≅ 0·2–0·5).40 This is also true when populations from Taiwan (TW), OCE, and NAM and SAM, which exhibit a very high degree of diversification probably because of rapid genetic drift, are excluded. Significant correlations with geography are also obtained at the global scale when genetic distances are estimated by weighting them by the molecular distances (i.e. the nucleotide differences) among the alleles.51 This result is therefore robust and leads to the conclusion that human migrations were a primary force in the evolution of HLA variation worldwide, in addition to demographic expansions (contributing to allelic diversification) and contractions (contributing to population diversification).

HLA genetic differentiations in Africa

Genetic signatures of the history of modern humans are even more detectable when one focuses on the HLA genetic patterns within specific continental areas. The following examples are illustrative. In Africa, linguistic differentiations among populations speaking languages of each of the four main African linguistic phyla – Niger-Congo (NC), Nilo-Saharan (NS), Afro-Asiatic (AA) and Khoisan (KH) – are excellent predictors of HLA genetic differentiations: according to a recent analysis of HLA-DRB1 variation in Africa,63 AA populations from Ethiopia (i.e. Amhara and Oromo, which exhibit a very high frequency of DRB1*13:02, but also elevated *07:01 and *03:01 frequencies) cluster with AA populations from North Africa, whereas the Nyangatom, a NS population, also from Ethiopia, show a peculiar genetic profile and share some similarities (high frequencies of *11:01) with NC, the latter being further differentiated into West Africans (high frequencies of *13:04) and Central-South Africans (high frequencies of *15:03). Therefore, although the HLA genetic patterns of African populations appear to be geographically structured according to South, West, East and North differentiations,64 a close relationship is also found for the DRB1 locus between genetic and linguistic variation in Africa. This confirms the conclusions drawn from the study of other genetic markers like GM (as described in an earlier section), RH and the Y chromosome:13,14,65 at least for these polymorphisms, present African genetic patterns are mostly explained by recent migrations (i.e. within the last ∼ 15 000 years) corresponding to the expansion of the main linguistic families in this continent.

HLA genetic differentiations in Europe

At loci HLA-C and -DRB1 (and this is also the case for GM, as stated above), the HLA genetic structure of Europeans reveals marked variation between West-Central and North populations, on one hand, and Southeast populations, on the other (with elevated frequencies of DRB1*11:04, DRB1*11:01 and C*04:01 compared with the other regions), a sharp genetic boundary being detected approximately at the level of the Alps.66,67 This may be the consequence of reduced levels of gene flow across these mountains during the peopling history of this continent, most probably during the Neolithic where demic diffusion occurred and probably followed a punctuated or geographically non-homogeneous rather than continuous mode of population migration from the Near East.68–71 The HLA genetic map of Europe is also characterized by an extreme differentiation of some populations, like the Norwegian Sami (high cumulated frequencies of A*03:01G, B*27:05G, C*01:02, DRB1*08:01 and DQB1*04:02), which are more closely related, genetically, to the Finnish population speaking a language of the same Uralic family (non Indo-European) than to other Norwegians.72 On the other hand, Basques, a cultural and linguistic isolate in Northwest Spain, only exhibit slightly different HLA frequencies compared with Indo-European populations,73,74 which is consistent with genome-wide scale analyses.75

HLA genetic differentiations in East Asia and Oceania

In East Asia, latitudinal genetic clines are observed at all classical HLA loci, with higher levels of internal genetic diversity in Northeastern than in Southeastern populations.19 Uneven distributions of some HLA alleles and allelic lineages are also found between Northeast and Southeast Asian populations, with a restricted geographic distribution of some alleles detected in the south (HLA-A*02:03, *02:07, *11:02, B*13:01, *15:02, *38:02, *46:01, C*04:03, DPB1**21:01, DRB1*12:02, *13:12, *14:04), whereas many alleles observed in the north are more globally distributed.19 These results challenge current views sustaining a unique origin of East Asian populations in Southeast Asia (e.g. ref. 76), as they are more compatible with an overlapping model (comparable to the ‘pincer model’ proposed by Ding et al.77) suggesting that modern humans arrived in East Asia from the west through both a northern and a southern route, and after that underwent substantial gene flow by migrating both northward from the south and southward from the north, but at different periods, in East Asia.19

Some results are also relevant for Oceania. For example, HLA-DRB1 data confirm some genetic relationship between Papua New Guinea Highlands populations and Australian Aborigines (with several DRB1*04 and DRB1*14 alleles shared among them), indicating that they may be common descendants of an ancient colonization of this area,78 which was a unique landmass (‘Sahul’) during Palaeolithic glacial periods. On the other hand, Australian and Papuan populations differ genetically from Austronesian-speaking populations, which are highly diversified among them, and more particularly Taiwan aborigines,79,80 whose geographic expansion colonized the entire Pacific area during the last 4500 years.

As a relevant illustration, Fig. 3 shows a summarized view (average genetic distances on loci HLA-A, -B and -DRB1) of HLA genetic relationships in East Asia (including Taiwan aborigines). The major East Asian linguistic groups are well discriminated and Taiwanese populations (all Austronesians represented in the figure except Yami, located in the small Lan-Yu island off the southeast coast of Taiwan, and who speak a Malayo-Polynesian language) are scattered on the right side. The Pazeh and the Siraya, located on the western coast of Taiwan, are close to continental East Asians (Chinese Han), whereas the Ami living in the east coast lie in an outer position; these results may sustain the linguistic theory proposed by Sagart.21

Figure 3.

 Multidimensional scaling analysis (MDS) of 39 populations from East Asia tested for HLA (average distances were computed on HLA-A, HLA-B and HLA-DRB1 data). Each point represents a population and its symbol the linguistic family to which its language belongs. Stress value = 0.10 (good).

HLA genetic differentiations in America

Amerindian populations are also distant genetically from each other for HLA, and even more discriminated when genetic distances are weighted with the molecular distances among alleles.51 Their allelic diversity is limited, with some alleles exhibiting very high frequencies (e.g. DRB1*04:07, *04:11, *0802, *14:02 and/or *1602, depending on the population). Amerindian alleles belong to a subset of lineages observed in East Asia, in accordance with the peopling of the Americas through the Bering Strait. In both Oceania and the Americas, rapid genetic drift as the result of small population sizes and reduced migration levels led to a drop of genetic diversity, but the large molecular differentiation among most HLA alleles might have helped to ensure immunological protection.

Study of American Indian populations from Mexico and South America shows intriguing observations. In spite of the finding of a restricted number of alleles, all HLA loci with the exception of DPB1 present high levels of heterozygosity.49,51 In Amerindian populations, very few allelic lineages (four HLA-A, seven HLA-B, seven HLA-C, five HLA-DRB1, two HLA-DQA1, two HLA-DQB1 and five HLA-DPB1) are detected, but several alleles of the same lineage are present in each population. Many of the alleles found in these populations are not observed in other outbred populations.56–60,81–84 It can be postulated that these alleles were generated in America and are novel alleles. Gene conversion events could be invoked as the mechanism for their generation. In fact, all putative novel alleles may derive from a few founder alleles (those alleles of each lineage found in other populations) and all the nucleotide sequences donated in the gene conversion events may have come from other founder alleles. Almost all novel alleles identified differ from other alleles in the same lineages by amino acid substitutions in residues contributing to the peptide-binding groove, and may potentially have new peptide-binding capabilities.56–60 Most of the postulated gene conversion events may involve donor and recipient alleles of the same locus. The HLA-B locus presents the highest degree of diversity, and the majority of the putative novel alleles found in these populations comes from this locus. Therefore, it has been postulated that HLA-B has diversified more rapidly in the South American populations.56,60 Interestingly, novel HLA-B alleles are present at the highest gene frequencies in many populations, suggesting that novel alleles generated in America were positively selected in these populations probably because they provided selective advantages.57–59 It is conceivable that, with a limited founder polymorphism, any novel allele that arose in these environments would enlarge the peptide-binding repertoire of these populations. Perhaps the HLA-B locus diverged more than the HLA-A or -DRB1 loci in the South American populations, as a result of a higher probability for intra-locus gene conversions because this locus presented a larger number of founder alleles.

The example of Easter Island

The informative value of HLA typing in anthropological investigations may be illustrated by our studies on Easter Island (for more details, see refs 85–87). Although the available data suggest that Easter Island was first colonized by eastern-migrating Polynesians around 1000 years ago, there is also evidence of an early South American contact. As a matter of fact, the Norwegian explorer Thor Heyerdahl proposed that Easter Island was first populated by American Indians (Amerindians). Previous studies of mtDNA or other chromosomal markers have, however, not been able to demonstrate an early Amerindian contribution to the gene pool on Easter Island or other Polynesian islands, before the Peruvian slave raids in Polynesia in the early 1860s, which resulted in an admixture of Amerindian and European genes in the area.

To address this issue we carried out studies of DNA from blood samples collected in 1971 and 2008 from reputedly non-admixed native Easter Islanders. All typed individuals carried mtDNA of Polynesian origin, and most males carried Y-chromosome markers of Polynesian origin while the rest carried Y chromosome markers of European origin. Genomic typing of HLA demonstrated, however, that some individuals carried HLA alleles that are typically detected in Amerindians. For example, some individuals had an HLA haplotype carrying A*02:12, B*39:05 and other alleles, which are not detected or are very rare in non-Amerindian populations (ref. 49; see also Table 4). We could trace the introduction of this haplotype on Easter Island to a time before the Peruvian slave raids. Our studies cannot establish exactly when these Amerindian alleles were introduced to Easter Island, but they indicate that it may have occurred in ‘prehistoric’ times; i.e. before the island was discovered by Europeans in 1722, but after the island had been inhabited by Polynesians.

There are at least two explanations why an early Amerindian contribution to the gene pool on Easter Island was not detected by studies of mtDNA and Y-chromosome markers. One is that the Amerindian HLA alleles may have been subject to different selective forces than Amerindian mtDNA and Y-chromosome markers, because the HLA genes encode molecules of great importance in immune responses. Another explanation is genetic drift. At the end of the 1800s approximately 100 individuals were left on the island as the result of the Peruvian slave raids and epidemics. Uniparental markers such as mtDNA and Y-chromosome markers are more sensitive to genetic drift than genes that are inherited from both parents. In any case, our results illustrate the usefulness of HLA typing to complement studies of mtDNA and other chromosomal markers in anthropological investigations.

Evolutionary interpretation: human migrations history versus selection

Almost all classical HLA loci are under the influence of some form of natural selection in addition to the stochastic forces – random genetic drift, demographic evolution and migration – associated with human peopling history. This has been shown through different approaches: (i) selective neutrality tests, which often reveal deviations from neutral expectations toward an excess of heterozygotes,46,48,49,51,88 although homozygous excess has also been observed; (ii) comparisons of synonymous versus non-synonymous substitution rates, indicating an excess of amino acid replacements in the PBR of the HLA molecules;54,89 (iii) deep coalescent times of most HLA lineages, explainable by balancing selection;90 and (iv) computer simulation studies,55 more recently improved by ABC approaches to infer selection coefficients in specific situations.91 These results may be explained by our knowledge of the immune function of both class I and class II HLA molecules, the main hypothesis being that allelic diversity would have been favoured to better protect individuals in pathogen-rich environments, among other theories.92 Indirect support for this hypothesis has been provided by Prugnolle et al.93 who found a significant correlation between HLA class I heterozygosity levels in populations and pathogen richness at the global level. However, this correlation tends to drop when Amerindian populations are not taken into account (J.-F. Lemaître, M. Currat and A. Sanchez-Mazas, in preparation). As mentioned above, Amerindians may behave as isolated populations in which significant founder effects restrict the level of polymorphism. These populations show high levels of lineage differentiation that may have been selected to cope with environmental factors. Therefore, a better investigation of the relationship between the molecular diversity of HLA alleles and the function of HLA molecules should be undertaken to confirm the hypothesis of pathogen-driven selection.

On the other hand, most studies aimed at estimating selective coefficients (s) at the HLA loci showed that amino acid sites at the PBR region of HLA molecules are under weak selective constraint, as s values do not exceed a few per cent, (e.g. refs 54, 91) whereas other selected polymorphisms may reach much higher values (e.g. 10–20% for G6PD/A- relative to malaria94). Also, because it may depend on the pathogenic environment, the intensity of selection operating on the HLA loci may not be uniform across different geographic regions and may even be absent in specific geographic areas, as shown for Southwest Europe compared with Northwest Africa for HLA-DRB1.91 Therefore, it would be imprudent to dismiss the use of HLA polymorphism to study the history of human migrations because of selection. Rather, in addition to comparisons of HLA to neutral markers by using classical population genetics analyses,46 new approaches using computer simulation, such as those used by Currat et al.,91 can now be applied to disentangle the effects of stochastic and deterministic factors on the evolution of HLA polymorphism. This will certainly help to improve the interpretation of HLA diversity patterns worldwide in the near future.

Human genetic diversity as revealed by studies of the KIR polymorphism

Nature and function of KIR molecules

Constituting 5–15% of the peripheral blood mononuclear cells,95 the NK cells are an integral component of innate immunity, which depends upon their ability to rapidly secrete cytokines and chemokines, as well as to directly kill unhealthy cells.96 When HLA class I expression is generally down-regulated in virally infected or malignantly transformed cells, rendering the cells resistant to cytolysis by cytotoxic T lymphocytes, these aberrant levels of class I expression can result in spontaneous destruction by NK cells, a concept originally termed the ‘missing-self’ hypothesis.97 Normal healthy cells are protected from spontaneous NK cell killing when they express an appropriate ligand for an inhibitory receptor carried by NK cells. In contrast to the cytotoxic T lymphocytes, the NK cells use a vast array of germline-encoded non-arranging receptors that can trigger either inhibitory or activating signals.98 The net signal integrated from the inhibitory and activating receptors determines the effector function of NK cells.98

The human NK cell function is largely controlled by a family of polymorphic killer cell immunoglobulin-like receptors (or KIR) located in the leucocyte receptor complex that maps to chromosome 19q13.4.99 Fourteen KIR receptors triggering either inhibition (3DL1–3, 2DL1–3, 2DL5) or activation (3DS1, 2DS1–5), or both (2DL4) have been identified. HLA-C is the primary ligand for inhibitory KIR.100 KIR3DL1 binds to the Bw4 serological epitope present on 40% of the HLA-B allotypes and certain HLA-A allotypes (HLA-A23, -A24, -A25 and -A32).101 KIR3DL2 has been shown to recognize certain HLA-A allotypes (HLA-A3 and -A11); however, the precise specificity of this receptor has not been defined.102 The KIR2DL4 receptor binds to the trophoblast-specific non-classical class I molecule HLA-G and induces rapid interferon-γ production that promotes vascularization of the maternal decidua during early pregnancy.103

Although the specificity of the inhibitory KIRs has been extensively characterized, very little is known about the ligands for the activating KIRs. Certain activating KIRs are predicted to bind to the same HLA class I ligands as their structurally related inhibitory KIRs.

KIR polymorphism at the protein and DNA levels

Variable gene content KIR haplotypes

The number and type of KIR genes differ substantially between haplotypes (Fig. 4). Nearly 30 distinct KIR haplotypes with distinct gene content have been characterized to date.104 They are broadly classified into two groups, A and B. Only four KIR genes (KIR2DL4, 3DL2 and 3DL3, 3DP1) are invariably present on all KIR haplotypes so they are referred to as ‘framework’ genes.105 Group A haplotypes have a fixed gene content comprising KIR3DL3-2DL3-2DP1-2DL1-3DP1-2DL4-3DL1-2DS4-3DL2 (Fig. 4, haplotype 1), but are diversified through allelic polymorphism of the individual genes. In contrast, group B haplotypes have a variable gene content comprising several genes and alleles, some of which are not on the A haplotype (Fig. 4, haplotypes 2–6). Hence, B haplotypes generally encode more activating KIR than the A haplotype that encodes a single activating receptor, KIR2DS4. Homozygotes for group A haplotypes (Fig. 4, haplotype 1) have only seven functional KIR genes, whereas heterozygotes for group A and group B haplotypes (Fig. 4, haplotypes 1 + 2) may have all 14 functional KIR genes.

Figure 4.

 KIR haplotypes have variable gene content. Map of selected KIR haplotypes is shown. Haplotype 1 represents group-A KIR haplotype and the remainder are the representative of over 30 known group-B haplotypes. The framework genes, present in all haplotypes are shown in black boxes; genes encoding activating KIR are in gray boxes; and those for inhibitory receptors are in white boxes. KIR2DP1 and KIR3DP1 are pseudogenes that do not encode any functional receptors.

The function of the inhibitory KIR depends on the availability of their specific cognate HLA class I ligands. Given that both KIR genes at chromosome 19q13.4 and HLA genes at chromosome 6p21.3 are polymorphic and display significant variations, the independent segregation of these unlinked gene families produce a great diversity in the number and type of KIR–HLA pairs in individuals.

Nature of KIR gene polymorphism

In addition to haplotypic diversity, each KIR gene exhibits considerable sequence polymorphism. As of May 2010 a total of 347 KIR sequences have been deposited into the GenBank (http://www.ncbi.nlm.nih.gov/Genbank/) and IPD-KIR databases (http://www.ebi.ac.uk/ipd/kir/index.html). The inhibitory KIR genes are relatively more polymorphic, whereas the activating KIR genes are generally conserved. Because of the similarity in sequence of the genes there have been many reports of unequal recombinations. This has led to duplication of the genes on the same haplotype106 or to the converse of haplotypes missing genes, including framework genes.107

KIR diversity in human populations

Studies in a limited number of KIR loci and populations to date support the notion that variation within and between populations in the activating KIR is maintained primarily through gene-content variation, rather than allelic diversity. In contrast, although most individuals bear the majority of the inhibitory KIRs, significant allelic polymorphism is often present at these loci. The extensive polymorphism of KIR genes and their alleles has been reviewed previously.6 The synergistic combination of allelic polymorphism and variable gene content individualizes KIR genotypes to an extent where unrelated individuals almost always have different KIR types. Furthermore, the KIR receptors are clonally expressed on NK cells, so that each NK cell clone expresses only a portion of the genes carried by the gene profile of the individual.108 Stochastic expression of different combinations of receptors by NK cells results in this repertoire of NK clones with a variety of ligand specificities. This level of diversity probably reflects a strong pressure from pathogens on the human NK cell response.

KIR gene content diversity in human populations

All human populations have both group A and group B haplotypes, but their frequencies vary considerably.109–111 Worldwide, approximately 30% of individuals are homozygous for the canonical A haplotypes, which are found in all populations examined to date; however, a wide range in the A haplotype frequency is observed between populations, from 8 to 80%. These patterns of haplotypic variation result in differential gene content profiles in world populations; over 300 distinct KIR genotypes have been identified in a collection of worldwide human populations (http://www.allelefrequencies.net). Nevertheless, diversity in KIR gene content between populations can be attributed in large part to frequency variation in common haplotypes, which may reflect both population history and local adaptation. Haplotype estimation in world populations112 across the entire KIR region suggests that the six gene-content haplotypes illustrated in Fig. 4 can account for ∼ 85% of the total observed variation in most world regions; some exceptions are found within Africa and Oceania,113,114 where extensive diversity in the B haplotype is observed, with numerous other, low-frequency haplotypes in addition to those represented in Fig. 4.

By comparative analysis of world populations, a link was found between the prehistoric human migrations and the evolution of two groups of KIR haplotypes distinguished by their content of activating KIR genes.111 The natives of America,115,116 Australia117 and India,118–120 who had extensive prehistoric migrations, carried high frequencies of B haplotypes. Presumably the aboriginal populations of India, Australia and America acquired activating KIR genes to survive the environmental challenges during their distant migrations from Africa.119 In contrast, most Northeast Asians (> 55%), including Chinese, Japanese and Koreans, who settled in the lands of more temperate latitudes where the environmental changes between summer and winter are subtle, carry only group A haplotypes, which express no or only one activating KIR receptor.121–123 In Africans and Europeans, the A and B haplotypes are distributed equally, which suggests a balancing selection.

Patterns of worldwide KIR diversity

In nearly all human populations studied to date, within each of the centromeric and telomeric portions of the KIR cluster (with KIR3DP1 and KIR2DL4 delineating the dividing point for these) there exists extensive linkage disequilibrium (LD).124 For example, across all populations examined for the KIR anthropology component of the 15th International Histocompatibility and Immunogenetics Workshop (IHIW),125 the average overall LD between the centromeric B haplotype loci KIR2DL2 and KIR2DS2 was shown to be nearly complete (Wn = 0·99). Likewise, the telomeric B loci KIR3DS1 and KIR2DS1 are also in very strong LD (Wn = 0·92). In contrast, much less LD is observed between loci of the centromeric and telomeric portions of the cluster in all populations in this study; the overall LD between KIR2DL2 and KIR3DS1 is very low (Wn = 0·10).

The centromeric and telomeric portions of the KIR cluster each have distinct patterns of worldwide variation. Balanced frequencies are often observed for the centromeric KIR A and B haplotypes; the frequency of Cen-A (i.e. characterized by the presence of KIR2DL3, KIR2DP1 and KIR2DL1) in most populations worldwide is ∼ 50–60%, roughly that which is observed for the extended A haplotype. The notable exception is within East Asian populations,110,111,126 where the frequency of the centromeric B haplotype loci KIR2DL2 and KIR2DS2 is generally very low and Cen-A is observed at frequencies greater than 80%; the frequency of the extended (centromeric and telomeric) A haplotype also tends to be highest in these populations. It is interesting to note, however, that these exceptionally high Cen-A frequencies are generally not observed in Amerindian populations,112 suggesting that this shift occurred within East Asia subsequent to the differentiation of the Amerindians from these populations. Although the loci associated with the full-length motif Cen-B1 (i.e. characterized by the presence of KIR2DL2, KIR2DS2, KIR2DL5, KIR2DS3/S5, KIR2DP1 and KIR2DL1) are very common within Africa,110,112,127 the much shorter Cen-B2 (i.e. characterized by the presence of the framework genes in addition to KIR2DL2 and KIR2DS2) is observed primarily outside Africa, and this motif largely replaces Cen-B1 in some populations outside Africa (J. Hollenbach, unpublished results). However, there is no clear pattern or gradient associated with this motif, which appears to be distributed somewhat sporadically across several world regions. African populations in general exhibit substantially greater haplotypic diversity within the centromeric KIR region relative to other world regions.

The telomeric portion of the KIR is characterized by a pattern of variation more closely related to population demographics. As previously noted, the stimulatory KIR3DS1 is found at much lower frequencies in African populations relative to most other world populations;112,128 its frequency increases with geographic distance from Africa.110 Although the African populations exhibit some of the highest frequencies for the centromeric B haplotype loci, the frequency of all telomeric B haplotype activating loci is in general very low in African populations. However, whereas the overall frequency of the extended B haplotype is comparatively low among the Asian populations, extended B haplotype frequencies are relatively high in the African populations. Other world populations generally range between these extremes. It is striking, however, that despite this variation, the worldwide frequency of A and B haplotype heterozygosity is generally stable, with an average frequency of 44%, suggesting that there is a population-level advantage in maintaining these balanced frequencies.125

Finally, although it is likely that KIR2DS3 and KIR2DS5 are alleles of a duplicated locus, which may appear on either the centromeric, telomeric, or both portions of the KIR haplotype,129 current typing methodologies are generally unable to distinguish between the two loci, and population level data are a reflection of the presence or absence of the two alleles regardless of position in the haplotype. In African populations, the frequency of KIR2DS5, in parallel with the remaining telomeric B haplotype genes (KIR3DS1 and KIR2DS1) with which it is generally associated, is extremely low. In contrast, KIR2DS5 is almost always observed in Amerindian populations regardless of whether the locus is centromeric or telomeric in the KIR region and KIR2DS3 is largely absent in these populations.112,115,130 Notably, whereas KIR2DS3 is rarely seen in Amerindian populations, it is observed at moderate frequencies in East Asian populations, suggesting that the fixation of the KIR2DS5 allele at these loci occurred in conjunction with or subsequent to the New World migration and divergence of Amerindian populations.

Evolutionary interpretation: human migrations history versus selection

Meta-analyses of populations gathered worldwide from publications and the http://www.allelefrequencies.net database131 have shown that KIR polymorphism is correlated to geography,6,119,132 despite some limitations in the anthropological characterization of these data. For instance, gene presence/absence frequencies at activating loci (i.e. DS genes) and inhibitory loci (i.e. DL genes) linked to KIR haplotype B clearly reflect a geographical gradient among populations.133 However, the same study on inhibitory loci linked to KIR haplotype A did not show such a correlation. It is important to note that these meta-analyses are based on KIR gene content only, and do not take allelic variation into account, which may explain the different patterns observed between A and B haplotypes. Indeed, because of the polymorphism peculiarities of both haplotype groups (see above), gene content polymorphism for B-related loci appear to be sufficiently discriminative for population genetic comparisons, whereas similar analyses on A-related loci may rely on allelic typing. The study of a limited number of populations where KIR variations were examined at the allele level appears to corroborate this hypothesis.132 In light of these studies, KIR genes appear to be good markers for anthropological studies, similar to the polymorphisms of GM and HLA genes, and mtDNA and Y chromosome markers. However, more in-depth analyses, notably at the allelic level and including more populations with more thorough anthropological characterization, must be achieved to confirm this.

In addition to demographic factors and stochastic forces such as gene flow and genetic drift, KIR diversity is thought to have been shaped by various selective forces. The KIR inhibitory and activating receptors, among others, regulate NK cell functions134 and KIR gene content has been associated with infection, cancer, autoimmunity, pregnancy syndromes, and transplant outcome. These features are likely to make KIR a good candidate for ongoing adaptive evolution. In particular, balancing selection is proposed as the main force shaping the evolution of genes involved in innate135 and acquired immunity.136,137 These last few years, several lines of evidence for KIR selection, both at the haplotypic and the gene levels, have been discussed. For instance, it is proposed that some form of selection is acting to maintain a balance of both haplotype groups in humans. This reflects their biological relevance and complementary roles for the survival of human populations (i.e. the hypothesis implies that A haplotypes are more specialized towards immune defence, whereas B haplotypes are more specialized towards reproduction).138 Two studies using high-resolution allelic typing in Japanese139 and Irish,140 respectively, have shown that higher levels of polymorphism than expected under neutrality are observed both at the haplotypic and allelic level for several telomeric KIR genes (i.e. KIR2DL4, KIR3DL1 and KIR2DS4). This is consistent with an effect of balancing selection maintaining diversity and several haplotypic/allelic variants with intermediate frequencies in both populations. Furthermore, LD analysis suggests that these three loci form ‘core’ haplotypes with distinguishable functions depending on the alleles present at each locus (e.g. KIR3DL1 alleles have been subdivided into three main complementary lineages from a functional point of view128 and all three lineages are strongly represented in the Irish population). Conversely, centromeric genes specifying HLA-C receptors (i.e. KIR2DL1 and KIR2DL3) exhibit less diversity than expected under neutrality, suggesting that their alleles have been subject to positive directional selection. The model proposed here is that balancing selection is maintaining a pool of functionally divergent haplotypes and alleles upon which positive selection can operate.139

It is now widely accepted that KIR genes are co-evolving with their HLA ligands.110,112,139–141 Interestingly, many associations reported between KIR and HLA do differ between populations, which argues against universal selective pressures in diverse human populations for specific KIR–HLA combinations.140 Because of their functional interactions with KIR, as well as the fact that HLA genes are subject to balancing selection49 and have been studied more thoroughly for anthropological purposes, the latter genes may provide an outline with which to draw a clearer picture of the respective roles of human migrations history and selection for shaping KIR gene polymorphism. By maintaining high levels of diversity within populations, balancing selection of HLA genes is likely to lessen their genetic differentiation, as observed for the HLA-DRB1 locus in Africa, Europe and East Asia.48,91 However, although significant deviations from neutrality were reported by this study, this selective effect did not disrupt the high and significant correlation found between genetic and geographic distances at the world scale. This was recently confirmed by meta-analyses performed on HLA-A, -B and -DRB1 loci, where population comparisons showed that HLA genetic patterns were mostly shaped by geographic and linguistic differentiations,40,50 despite the excess of heterozygotes at these loci. As already mentioned, the preliminary results obtained for KIR genes at the worldwide scale suggest similar patterns to those found for HLA, but await confirmation through more thorough analyses.

Conclusion: contributions of immunogenetics to anthropological studies

In this article, we have summarized our current knowledge of the polymorphism of three immunogenetic complexes, GM, HLA and KIR, in relation to their diversity in human populations and the interpretation of that knowledge. Actually, these three genetic complexes represent a small fraction of our genome restricted to three different chromosomes. Likewise, studies of mtDNA and Y-chromosome markers, which have proved to be highly informative to reconstruct gender-specific molecular phylogenies of the human species (refs 142, 143, among many others) also correspond to minor DNA fractions (∼ 0·0005% of the total haploid genome, for mtDNA, and ∼ 2%, for the Y chromosome). By contrast, analyses of microsatellites and single nucleotide polymorphisms have provided relevant information on the entire genome (e.g. refs 144, 145). Impressive technical improvements now also allow high throughput DNA sequencing with promising genome-wide application to the study of human genetic variation worldwide, although this is still in the early stages. Therefore, the study of immunogenetic complexes may be seen as a limited contribution to our knowledge of human genome diversity. Another possible drawback of the analysis of immunogenetic markers in the field of anthropology is the fact that they are prone to natural selection, as discussed in the present review. As IgG, HLA and KIR molecules are instrumental in immune responses, their evolution is clearly influenced by environmental factors, which may be a disadvantage when one tries to reconstruct the peopling history of modern humans. Indeed, when selection is at work, the observed genetic relationships among human populations may not reflect their degree of historical relatedness, as can also be concluded for some highly variable phenotypic traits like human pigmentation.26,27 This would speak for neutral markers corresponding to non-coding regions of the genome, like microsatellites and single nucleotide polymorphisms, being preferred for genetic studies in anthropology.

On the other hand, general conclusions drawn by analysing the patterns of genetic diversity of widely studied immunogenetic markers, like GM and HLA, are shown to be congruent with those found for other genetic markers. This is the case for at least five major results.

  • 1 Of the total genetic diversity of the human species, the highest level of variation is found within populations, whereas inter-population variation represents only a minor proportion of the total genetic variance. This reinforces the hypothesis, also held by a majority of palaeontologists, that all modern human populations share a recent common origin (at about 100 000–200 000 years before present, according to the fossil record), as opposed to multi-regionalist models claiming phylogenetic continuities between local Homo erectus and Homo sapiens in several regions of the world. We note, however, that the proportion of inter-population variation differs depending on the genetic system: it is around 15% for allozymes,24 most DNA markers,22,23 and HLA-DPB1,25,49 and is slightly lower for the other HLA loci (∼ 10% on average), but is notably higher for GM (∼ 46%, including ∼ 39% among geographic groups and ∼ 7% among populations within geographic groups).12 This may be the result, in the case of GM, of a bias in frequency estimation because of serological typing (as discussed above), although the effect of positive selection cannot be totally ruled out. In the case of HLA, we can conclude that balancing selection lowers inter-population variation although this effect is not very pronounced.
  • 2 The highest level of genetic diversity is observed in East and Central East African populations; i.e. Ethiopian (and others like Kenyan and Tanzanian) populations speaking languages of the Afro-Asiatic family are highly diverse for both immunogenetic and other markers like GM, HLA and mtDNA.14,146 This suggests that East African populations underwent a huge demographic expansion leading to a significant diversification of their genetic pool. Actually, such an expansion appears to be very ancient compared with other regions of the world,16 which is compatible with the hypothesis of an origin of the modern humans’ diaspora in East Africa;
  • 3 In contrast, Amerindians exhibit a low level of diversity, together with a high amount of differentiation among populations, for widely studied polymorphisms.147 This is also often the case for Oceanian populations, such as Taiwanese and Polynesians. This result is robust across all genetic polymorphisms, despite the maintenance of genetic variation through balancing selection for some immunogenetic systems like HLA. This means that in the case of the peopling of America and Oceania, genetic drift has been a main mechanism by which populations differentiated, because of small effective population sizes, although HLA traces of natural selection are suggested at the DNA level by the presence of a high diversity of rare HLA-B alleles and the abundance of molecularly distant alleles at all loci.
  • 4 The global genetic variation is mainly structured by geography: recently, a highly comprehensive characterization of human genetic variation based on the study of 650 000 single nucleotide polymorphisms in 51 populations showed that the genetic pattern is continuous from one continent to another and may be explained by shared ancestry with continuous gene flow, except in a few cases where recent admixture may have occurred.148 The same conclusions have been drawn from the study of immunogenetic polymorphisms.10,12,88
  • 5 Most polymorphisms exhibit genetic signatures of past cultural differentiations. Cases where genetic variation parallels linguistic diversity are common, e.g. at the continental level of sub-Saharan Africa, the genetic structure correlates with the linguistic structure for GM, HLA and Y-chromosome markers.15,149

Immunogenetics is therefore an informative tool in anthropology, despite the effect of natural selection, which is clearly demonstrated for HLA but appears to be weak. Moreover, the study of immunogenetic markers may provide important novel information for anthropological studies. Indeed, what is often considered to be a disadvantage in anthropological studies – a non-neutral mode of evolution of the studied polymorphisms – may be highly relevant to understanding complementary aspects of human evolution, like environmental changes. Relevant results obtained through computer simulation have recently been obtained by Currat et al.,91 who estimated an unequal coefficient of selection for HLA-DRB1 in Southwest European (0·7%) and Northwest African (1·9%) populations separated by the Strait of Gibraltar. This difference can be seen as a genetic signature of heterogeneous environments in the past, i.e. different pathogen richness or prevalence of specific infectious diseases in the two regions. Also, the case of Amerindians would deserve deeper investigation to understand the evolution of their peculiar HLA genetic profiles. This could also be carried out by simulating different scenarios taking into account both the initial settlement of America and its recent history marked by European colonization, which brought many new pathogens to this continent. The study of polymorphisms of important molecules for immune responses opens crucial areas of research in the field of human evolution, such as gene–pathogen co-evolution.

Acknowledgements

This work received financial support from the Swiss National Science Foundation (SNF, Switzerland) grants no. 3100A0—112651 and 31003A—127465 (A.S.M.), the ESF (Europe) COST grant of Action BM0803 ‘HLA-NET’ (A.S.M.), the Oslo University Hospital Rikshospitalet, and Medinova (E.T.), and the US National Institute of Health Grant no. AI067068 (J.A.H. and S.J.M.).

Disclosures

The authors declare no conflicts of interests.

Ancillary