Keywords:

  • local adaptation;
  • allele surfing;
  • human evolution

Summary


Several studies have found strikingly different allele frequencies between continents. This has been mainly interpreted as being due to local adaptation. However, demographic factors can generate similar patterns. Namely, allelic surfing during a population range expansion may increase the frequency of alleles in newly colonised areas. In this study, we examined 772 STRs, 210 diallelic indels, and 2834 SNPs typed in 53 human populations worldwide under the HGDP-CEPH Diversity Panel to determine to what extent allele frequencies differ among four regions (Africa, Eurasia, East Asia, and America). We find that large allele frequency differences between continents are surprisingly common, and that Africa and America show the largest number of loci with extreme frequency differences. Moreover, more STR alleles have increased rather than decreased in frequency outside Africa, as expected under allelic surfing. Finally, there is no relationship between the extent of allele frequency differences and proximity to genes, as would be expected under selection. We therefore conclude that most of the observed large allele frequency differences between continents result from demography rather than from positive selection.


Introduction


On a worldwide scale, human populations show a large phenotypic variability, particularly for skin colour, face and body shapes, susceptibility to pathogens, as well as for the prevalence of genetic diseases (Lewontin, 1995). However, most of the genetic variation in humans is found within populations rather than among populations or geographic regions (Lewontin, 1972, Barbujani et al. 1997, Rosenberg et al. 2002b). Still, many studies have focused on traits or loci showing geographically restricted distribution, or on loci showing drastic allele frequency differences between two regions. These particular cases can indeed reveal important information about local selective pressures or about the demographic histories of different populations (Balaresque et al. 2007). It is, however, difficult to disentangle the effects of positive selection from those of demography, since past demographic events such as population bottlenecks or range expansions can mimic the genetic signatures of a selective sweep, such as long-range linkage disequilibrium and reduced allelic diversity.

The colonisation of the world by modern humans was probably accompanied by a series of founder effects with subsequent local population expansions (Handley et al. 2007). Strong bottlenecks have also certainly occurred during the exit out of Africa and at the onset of the colonisation of the Americas by people from Asia (Fagundes et al. 2007, Goebel et al. 2008). These bottlenecks, followed by a spatial expansion, can lead to the geographic spread of an allele that rides on the wave of advance of the spatial expansion, a phenomenon called allelic surfing (Edmonds et al. 2004, Klopfstein et al. 2006, Travis et al. 2007). New mutations arising on the wave front and extant alleles may surf successfully (Excoffier & Ray, 2008), spreading geographically and increasing in frequency in the newly colonised areas (Klopfstein et al. 2006). A combination of simulation, analytical and experimental studies has shown that the probability for an allele to successfully surf is increased in the presence of spatial bottlenecks, when local deme size is small, and when populations at the wave front grow rapidly and exchange few genes with their neighbours (Klopfstein et al. 2006, Hallatschek et al. 2007, Travis et al. 2007, Excoffier & Ray, 2008, Hallatschek & Nelson, 2008). This neutral process has received much attention recently because of its consequences on allele frequencies that mimic selective processes (Nielsen et al. 2007).

However, it is clear that human populations colonising novel habitats have been confronted by new selective pressures due to their exposure to different climate, food sources, and pathogens (Balaresque et al. 2007). Some of these selective pressures certainly triggered local adaptation that impacted on allele frequencies at several loci. However, neutral allele surfing, like selection, will also occur at only a few loci, and will therefore not affect all loci uniformly, unlike other demographic factors such as demographic expansions, inbreeding or bottlenecks.

Until recently, most human genes showing strong geographic structures were considered to be under positive selection (see Table 1, where 44 such genes are listed). Most of these genes show a marked difference in allele frequencies (typically larger than 20%) between African and non-African populations. In many of these studies, local selection outside Africa was thought to have promoted these large allele frequency differences. Prominent examples are two genes that are involved in the control of brain size, MCPH1 and ASPM (Evans et al. 2005, Mekel-Bobrov et al. 2005). Both genes showed an increased frequency of a derived allele outside Africa and high levels of linkage disequilibrium. The authors therefore hypothesised that the derived haplotypes were under local positive selection in non-African populations. However, Currat et al. (2006) showed by spatially-explicit simulations that similar geographic distributions of allele frequencies could be generated by neutral allelic surfing during the range expansion outside Africa.

Table 1.  Genes reported as showing a high degree of population differentiation in the literature. We use here the official gene symbols as defined by the HGNC, and we provide in brackets the symbols used in the references if these are not the official symbols. For only 11 genes out of 44 (25%), past demography was proposed to be more likely to have shaped geographic structure than selection.
Genes | Demography proposed as an explanation | References
ABCB1 (MDR1) |  | Tang et al. (2004), Wang et al. (2007a)
ABCG2 | × | de Jong et al. (2004)
ADH1B |  | Han et al. (2007), Osier et al. (2002)
AGT |  | Nakajima et al. (2004)
ALDH2 |  | Oota et al. (2004)
APOE | × | Singh et al. (2006)
ASIP | × | Norton et al. (2007)
ASPM |  | Mekel-Bobrov et al. (2005)
ATXN2 (SCA2) |  | Yu et al. (2005)
CAPN10 |  | Fullerton et al. (2002)
CASP12 |  | Xue et al. (2006)
CCR5 | × | Sabeti et al. (2005)
CD28 | × | Butty et al. (2007)
CTLA4 | × | Butty et al. (2007)
CYP3A | × | Schirmer et al. (2006), Thompson et al. (2004)
DARC (FY) |  | Hamblin et al. (2002)
DMD |  | Nachman & Crowell (2000)
EDAR |  | Bryk et al. (2008)
F7 |  | Hahn et al. (2004)
G6PD |  | Saunders et al. (2002)
GNB3 |  | Young et al. (2005)
GRK4 |  | Lohmueller et al. (2006)
ICOS | × | Butty et al. (2007)
IL13 |  | Zhou et al. (2004)
IL4 |  | Rockman et al. (2003)
LCT |  | Bersaglieri et al. (2004)
MAOA |  | Gilad et al. (2002)
MAPT |  | Stefansson et al. (2005)
MC1R |  | Gerstenblith et al. (2007)
MCPH1 |  | Evans et al. (2005)
MMP3 |  | Rockman et al. (2004)
MSTN (GDF8) |  | Saunders et al. (2006)
MTHFR | × | Hughes et al. (2006), Rosenberg et al. (2002a)
NAT2 |  | Sabbagh et al. (2008)
OCA2 | × | Norton et al. (2007)
PDYN |  | Rockman et al. (2005)
PTPRC |  | Stanton et al. (2003)
SLC22A4 |  | Mori et al. (2005)
SLC24A5 |  | Lamason et al. (2005), Norton et al. (2007)
SLC45A2 (MATP/AIM1) |  | Norton et al. (2007), Soejima et al. (2006)
SLCO1B1 | × | Pasanen et al. (2008)
TAS2R16 |  | Soranzo et al. (2005)
TRPV6 |  | Akey et al. (2006)
TYR |  | Norton et al. (2007)

In this study, we explore data from the HGDP-CEPH Diversity Panel consisting of 772 STRs, 210 insertion-deletion polymorphisms and 2834 SNPs typed in 53 populations worldwide to determine the prevalence of large allele frequency differences between regions. We find that large allele frequency differences between continental regions are extremely common, as they occur at almost one third of all loci. We discuss the respective role of selection and demographic factors for shaping these patterns in the light of geographic and genomic information.

Material and Methods


Data

We analysed three multilocus data sets containing short tandem repeats (STR), insertion-deletion polymorphisms (indel) and single nucleotide polymorphisms (SNP), respectively, typed in 53 worldwide populations belonging to the CEPH Human Genome Diversity Panel (Cann et al. 2002, Rosenberg et al. 2002b, Ramachandran et al. 2005, Conrad et al. 2006). The individuals analysed correspond to the H1048 subset defined by Rosenberg (2006), which excludes atypical and duplicated samples. The datasets were downloaded from the web site http://rosenberglab.bioinformatics.med.umich.edu/diversity.html.

Initially, the STR data set contained 783 loci typed in 1048 individuals, but we have removed eleven loci showing overall more than 10% missing data (GATA43C11, GGAA22E01, GATA193D02, GATA135F02P, AAC023, ATT015, ATT077P, GATA63C02, ATA109H09, GATA7F09, and TTTA033), and we thus analysed a total of 9210 alleles at 772 STR loci. We also examined 210 diallelic indels that were typed in the same 1048 individuals, as well as 2834 SNP loci that were typed in a subset of 927 individuals.

The populations were grouped in five main geographic regions, following Rosenberg et al. (2002b): Africa, Eurasia, East Asia, America, and Oceania (Excoffier, 2003, see also Bastos-Rodrigues et al. 2006, Li et al. 2008). A complete list of the populations is found in Table S1.

Analyses

The STR, indel and SNP data sets were analysed separately. We used ARLEQUIN ver 3.11 (Excoffier et al. 2005) to calculate the average frequency of each allele in the populations. The R statistical package (R Development Core Team, 2008) was used to develop scripts for the analyses listed below.

For each allele i, we computed the average allele frequency $\bar{f}_{ij}$ within each geographic region j, as well as the difference with the average frequency computed over all other populations as $\Delta F_{ij} = \bar{f}_{i\bar{j}} - \bar{f}_{ij}$, where $\bar{f}_{i\bar{j}}$ is the average frequency of allele i in all populations not belonging to the geographic region j. A positive ΔF therefore means that the allele is less frequent within region j than in the rest of the world. This was done for all regions except Oceania, because there are only two populations in this region, and therefore the average frequency is subject to large fluctuations and the power to detect significant differences is low. For STR data, we also computed for each locus the index ΔFmax as the largest absolute value of ΔF found among all alleles present at that locus. This index ΔFmax allows us to characterize allele frequency differences at each locus with a single statistic, as in the case of diallelic loci. For diallelic loci, ΔFmax = |ΔF|.
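The two statistics defined above can be sketched in a few lines of Python. This is only an illustration, not the R scripts used in the study: the function names, the dictionary layout of the inputs and the use of an unweighted mean over populations are assumptions, and the sign convention (outside minus inside) follows the captions of Table 2 and Figure 3.

```python
from statistics import mean


def delta_f(freqs, region_of, region):
    """Delta F of one allele for one region.

    freqs:     dict population -> frequency of the allele (hypothetical layout)
    region_of: dict population -> region label
    Returns the mean frequency outside `region` minus the mean inside it,
    so a positive value means the allele is rarer inside the region.
    """
    inside = [f for pop, f in freqs.items() if region_of[pop] == region]
    outside = [f for pop, f in freqs.items() if region_of[pop] != region]
    return mean(outside) - mean(inside)


def delta_f_max(locus_freqs, region_of, region):
    """Delta F max of one locus: largest |Delta F| over all its alleles.

    locus_freqs: dict allele -> (dict population -> frequency)
    """
    return max(abs(delta_f(f, region_of, region)) for f in locus_freqs.values())


# Toy example with made-up frequencies (not data from the panel).
region_of = {"a1": "Africa", "a2": "Africa", "e1": "Eurasia", "x1": "East Asia"}
locus = {
    "allele_1": {"a1": 0.1, "a2": 0.3, "e1": 0.7, "x1": 0.9},
    "allele_2": {"a1": 0.9, "a2": 0.7, "e1": 0.3, "x1": 0.1},
}
print(delta_f(locus["allele_1"], region_of, "Africa"))
print(delta_f_max(locus, region_of, "Africa"))
```

For a diallelic locus the two alleles have ΔF values of equal size and opposite sign, which is why ΔFmax reduces to |ΔF| in that case.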

We randomly permuted populations between regions and recomputed each time ΔF, to obtain its null distribution and test for the significance of ΔF for each allele. The same permutation procedure was used to test if the number of alleles with a given frequency difference (kΔF) between a region and the rest of the world was significantly larger than expected by chance.
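A minimal sketch of such a permutation test for a single allele is given below. The details are assumptions rather than the published procedure: the test is two-sided on |ΔF|, region labels are shuffled among populations so that region sizes are preserved, and the p-value uses the common (hits + 1) / (permutations + 1) correction. The function name and input layout are hypothetical.

```python
import random


def permutation_p_value(freqs, region_of, region, n_perm=1000, seed=0):
    """Permutation p-value for |Delta F| of one allele.

    freqs:     dict population -> allele frequency
    region_of: dict population -> region label
    Labels are shuffled among populations, which keeps the number of
    populations per region fixed while breaking any real structure.
    """
    rng = random.Random(seed)
    pops = list(freqs)
    labels = [region_of[p] for p in pops]

    def stat(lbls):
        inside = [freqs[p] for p, l in zip(pops, lbls) if l == region]
        outside = [freqs[p] for p, l in zip(pops, lbls) if l != region]
        return sum(outside) / len(outside) - sum(inside) / len(inside)

    observed = abs(stat(labels))
    hits = 0
    for _ in range(n_perm):
        shuffled = labels[:]
        rng.shuffle(shuffled)
        # Small tolerance so that ties with the observed value are counted.
        if abs(stat(shuffled)) >= observed - 1e-12:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

The same shuffling loop can be reused for the count statistic kΔF by recording, in each permutation, how many alleles fall in a given ΔF interval instead of a single ΔF value.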

We also introduced a procedure to test if a random set of populations that are geographically close to each other also present sharp allele frequency differences with the rest of the world. Taking geography into account is actually a more stringent test of allele frequency differences than a procedure based on free permutation of random populations, because populations closer to each other tend to be more similar than populations at greater distance, due to isolation by distance and shared history. However, when regions consist of only a small number of populations, such as America, the number of possible random groups is reduced. kΔF was tested by taking geographical constraints into account as follows: a random population is assigned to the group representing the tested region, and the other populations allocated to this group are drawn at random from the 2Pj − 1 geographically closest populations, where Pj is the number of populations in the tested region. The geographic distance between populations was computed as the shortest distance on land (i.e. least-cost path avoiding seas) using the software PATHMATRIX (Ray, 2005).
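The constrained draw described above could look roughly as follows. This sketch assumes that the size of the candidate pool is 2Pj − 1 (the expression is partly garbled in the source text) and that the pool is measured from the first, randomly chosen population. The function name and the nested-dictionary distance matrix are hypothetical; in the study the distances came from PATHMATRIX.

```python
import random


def constrained_group(dist, p_j, rng):
    """Draw one random, geographically compact group of p_j populations.

    dist: dict population -> dict population -> distance
    p_j:  number of populations in the tested region
    rng:  random.Random instance
    """
    pops = sorted(dist)
    first = rng.choice(pops)
    # Rank every other population by its distance to the first one.
    neighbours = sorted((q for q in pops if q != first),
                        key=lambda q: dist[first][q])
    pool = neighbours[: 2 * p_j - 1]  # assumed pool size, see lead-in
    return {first, *rng.sample(pool, p_j - 1)}


# Toy example: ten populations placed along a line.
dist = {i: {j: abs(i - j) for j in range(10)} for i in range(10)}
print(constrained_group(dist, 3, random.Random(1)))
```

With few populations in a region the pool is small, so only a limited number of distinct groups can be drawn, which is the loss of power noted in the text for America.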

If allelic surfing was a major driving force behind allele frequency differences, we would expect to find more STR alleles with a higher frequency in newly colonised areas, because surfing promotes the increase in frequency of low frequency alleles. However, we would not necessarily expect to find any asymmetry in the direction of frequency change of derived SNP and indel alleles, since surfing should affect ancestral and derived alleles equally. We tested these predictions by performing a sign test on the number of alleles having increased or decreased in frequency outside a region of interest. The ancestral allele for each human SNP was inferred by comparisons with orthologous alleles in the chimpanzee and rhesus macaque genome assemblies, available in the Table Browser at the UCSC Genome Bioinformatics Site (http://genome.ucsc.edu/, table snp128OrthoPanTro2RheMac2; Karolchik et al. 2008). The ancestral allele was assumed to be identified if both the chimpanzee and macaque alleles were described and identical, or if an allele was only known in one of these two species. If orthologous alleles were known in both species but were different from each other, the ancestral allele was assumed to be the chimpanzee allele if the human variants contained the chimpanzee allele but not the macaque allele. In all other cases the ancestral allele was assumed to be unknown. Likewise the ancestral state of the indels was inferred by comparing human allelic diversity to orthologous alleles in the chimpanzee and in the gorilla (Weber et al. 2002). In this way, we were able to determine the ancestral allelic states of 176 indel and 1530 SNP loci. We then used the R function ‘sign.test’ (Package BSDA; Arnholt, 2007) to perform a sign test allowing us to determine if there is any asymmetry in the frequency change of derived alleles.

The genomic positions of a subset of 476 STRs, 162 indels and 2784 SNPs could be determined in the NCBI Build 35 reference system. The distance to the nearest gene was computed for each of the mapped loci, and varied from 0 (when the marker is found within the transcript of a gene) to 73.9 Mb. We computed the Pearson correlation coefficient between ΔFmax and marker distance to the closest gene to assess whether there was any relationship between these two variables.
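The rule used to infer the ancestral SNP allele, and the sign test applied to the counts of increased and decreased alleles, can be written out as the short sketch below. It follows the rule as worded in the text and uses an exact two-sided binomial test as a stand-in for the BSDA ‘sign.test’ function; it is not that function's implementation, and both function names are hypothetical.

```python
from math import comb


def infer_ancestral(human_alleles, chimp=None, macaque=None):
    """Return the inferred ancestral allele, or None if it is unknown.

    human_alleles: set of alleles observed in humans
    chimp/macaque: orthologous allele in each outgroup, None if not described
    """
    if chimp is not None and macaque is not None:
        if chimp == macaque:
            return chimp  # both outgroups agree
        if chimp in human_alleles and macaque not in human_alleles:
            return chimp  # outgroups disagree, chimpanzee allele preferred
        return None       # all other conflicting cases are left unknown
    # At most one outgroup allele is known: use it (None if neither is known).
    return chimp if chimp is not None else macaque


def sign_test_p(n_up, n_down):
    """Exact two-sided sign test: are increases and decreases equally likely?"""
    n = n_up + n_down
    k = min(n_up, n_down)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


print(infer_ancestral({"A", "G"}, chimp="A", macaque="C"))
print(sign_test_p(10, 0))
```

Loci for which `infer_ancestral` returns None would simply be left out of the derived-allele comparison, as in the text.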

Results


We tested whether populations belonging to the same region have more similar allele frequencies than expected by chance due to shared demographic history or shared selective events. Indeed, they show more similar allele frequencies than random populations, as the number of alleles showing ΔF > 0.2 for a given comparison is always significantly larger than expected by chance when tested with the random population permutation procedure (Tables 2–4 and Tables S2-S4). However, this is not always the case when tested with the geographically explicit permutation test, when randomized regions are made up of spatially neighbouring populations. In the STR dataset all positive frequency differences between America and the rest of the world that are larger than 0.2 are non-significant (Table 2 and Table S2). Additionally some of the larger frequency differences between America and the rest of the world in the indel dataset are also non-significant (Table 3 and Table S3). The geographically explicit permutation test is expected to be more stringent, as geographically close populations are genetically often more similar than random populations. However, if there are only few populations in a region, as is the case for the Americas, the geographically explicit permutation test is too stringent because the number of different random groups is reduced. Allele-specific ΔF was therefore tested with the random permutation procedure only and it is found significant in all cases as soon as ΔF > 0.25. We therefore chose an arbitrary threshold for ΔF of 0.3 to define a set of alleles with significant ΔF to summarise the results.

Table 2.  STR allele frequency differences (ΔF) for the comparisons of Africa vs. the rest of the world and America vs. the rest of the world. Positive ΔF values (in the upper part of the table) indicate that the alleles have a lower frequency within African (or American) populations than in the non-African (or non-American) populations (because ΔF is the average frequency outside the region minus the average frequency within it). Columns 2 to 7 refer to Africa vs. non-Africa, columns 8 to 13 to America vs. non-America.
ΔF | Alleles(a) | significant(b) | p-value 1(c) | p-value 2(d) | Loci(e) | significant(f) | Alleles(a) | significant(b) | p-value 1(c) | p-value 2(d) | Loci(e) | significant(f)
0.65–0.7 | 0 |  |  |  |  |  | 0 |  |  |  |  |
0.6–0.65 | 1 | 1 | ** | ** | 1 | 1 | 0 |  |  |  |  |
0.55–0.6 | 0 |  |  |  |  |  | 0 |  |  |  |  |
0.5–0.55 | 5 | 5 | ** | ** | 5 | 5 | 0 |  |  |  |  |
0.45–0.5 | 9 | 9 | ** | ** | 9 | 9 | 1 | 1 | * |  | 0 | 0
0.4–0.45 | 9 | 9 | ** | ** | 9 | 9 | 1 | 1 | * |  | 0 | 0
0.35–0.4 | 19 | 19 | ** | ** | 17 | 17 | 6 | 6 | ** |  | 6 | 6
0.3–0.35 | 24 | 24 | ** | ** | 22 | 22 | 13 | 13 | * |  | 6 | 6
(−0.3)–0.3 | 9122 | 4604 |  |  | 693 | 609 | 9049 | 3916 |  |  | 625 | 568
(−0.3)–(−0.35) | 9 | 9 | ** | * | 6 | 6 | 53 | 53 | ** | ** | 49 | 49
(−0.35)–(−0.4) | 8 | 8 | ** | * | 7 | 7 | 34 | 34 | ** | ** | 33 | 33
(−0.4)–(−0.45) | 2 | 2 | ** | * | 1 | 1 | 24 | 24 | ** | ** | 24 | 24
(−0.45)–(−0.5) | 1 | 1 | ** | * | 1 | 1 | 15 | 15 | ** | ** | 15 | 15
(−0.5)–(−0.55) | 1 | 1 | ** | * | 1 | 1 | 5 | 5 | ** | ** | 5 | 5
(−0.55)–(−0.6) | 0 |  |  |  |  |  | 7 | 7 | ** | ** | 7 | 7
(−0.6)–(−0.65) | 0 |  |  |  |  |  | 2 | 2 | ** | ** | 2 | 2

  (a) Total number of alleles with a given ΔF. Note that we have used semi-open ΔF intervals (]x-y]) to assign alleles to particular intervals, such that for instance a ΔF value of 0.4 was put in the interval 0.35–0.4.
  (b) Number of alleles with a significant ΔF.
  (c) p-value for the number of alleles with a given ΔF using random population permutations (* ≤ 0.05, ** ≤ 0.001).
  (d) Same as (c), but constraining permutations by geography (see Methods; * ≤ 0.05, ** ≤ 0.001).
  (e) Number of loci with a given ΔFmax value.
  (f) Number of loci with a significant allele frequency difference.

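Table 3.  Indel absolute allele frequency differences for the comparisons of Africa vs. the rest of the world and America vs. the rest of the world. Since indels can be considered as diallelic loci, we directly report the number of loci with a given ΔFmax value. Columns 2 to 5 refer to Africa vs. non-Africa, columns 6 to 9 to America vs. non-America. Table header is defined in Table 2.
ΔFmax | Loci(a) | significant(b) | p-value 1(c) | p-value 2(d) | Loci(a) | significant(b) | p-value 1(c) | p-value 2(d)
0.75–0.8 | 0 |  |  |  | 0 |  |  |
0.7–0.75 | 1 | 1 | ** | ** | 1 | 1 | ** | *
0.65–0.7 | 2 | 2 | ** | ** | 0 |  |  |
0.6–0.65 | 1 | 1 | ** | ** | 0 |  |  |
0.55–0.6 | 4 | 4 | ** | ** | 0 |  |  |
0.5–0.55 | 3 | 3 | ** | * | 2 | 2 | ** | *
0.45–0.5 | 10 | 10 | ** | ** | 1 | 1 | * |
0.4–0.45 | 14 | 14 | ** | ** | 6 | 6 | ** | *
0.35–0.4 | 14 | 14 | ** | * | 8 | 8 | ** |
0.3–0.35 | 12 | 12 | ** | * | 11 | 11 | ** |
0–0.3 | 149 | 104 |  |  | 181 | 93 |  |
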
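Table 4.  SNP allele frequency differences for the comparisons of Africa vs. the rest of the world and America vs. the rest of the world. Since SNPs can be considered as diallelic loci, we directly report the number of loci with a given ΔFmax value. Columns 2 to 5 refer to Africa vs. non-Africa, columns 6 to 9 to America vs. non-America. Table header is defined in Table 2.
ΔFmax | Loci(a) | significant(b) | p-value 1(c) | p-value 2(d) | Loci(a) | significant(b) | p-value 1(c) | p-value 2(d)
0.75–0.8 | 3 | 3 | ** | ** | 0 |  |  |
0.7–0.75 | 10 | 10 | ** | ** | 0 |  |  |
0.65–0.7 | 1 | 1 | ** | * | 1 | 1 | ** | *
0.6–0.65 | 14 | 14 | ** | ** | 5 | 5 | ** | *
0.55–0.6 | 31 | 31 | ** | ** | 13 | 13 | ** | *
0.5–0.55 | 38 | 38 | ** | ** | 19 | 19 | ** | *
0.45–0.5 | 62 | 62 | ** | ** | 22 | 22 | ** | *
0.4–0.45 | 89 | 89 | ** | ** | 60 | 60 | ** | *
0.35–0.4 | 136 | 136 | ** | ** | 72 | 72 | ** | *
0.3–0.35 | 129 | 129 | ** | * | 143 | 143 | ** | *
0–0.3 | 2321 | 1484 |  |  | 2499 | 1303 |  |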

Overall we find that large allele frequency differences between geographic regions are extremely frequent (Tables 2–4 and Tables S2–S4). Indeed, 215 of the 772 STR loci (27.9%), 90 out of 210 indel loci (42.9%) and 913 of the 2834 SNP loci (32.2%) have ΔFmax > 0.3 for at least one comparison. Among these, 18.1% of the STR loci with ΔFmax > 0.3 show such a large ΔFmax for more than one comparison, while for the indels and SNPs this fraction is 28.9% and 18.1%, respectively. Note that the total number of loci with ΔFmax > 0.3 is smaller than the sum of the number of loci with ΔFmax > 0.3 involved in the different comparisons that can be computed from Tables S2–S4, because a given locus can show large allele frequency differences in more than one continental comparison. The largest observed ΔF (0.79) was found between African and non-African populations for the SNP locus ‘rs5972561’ (see below in Figure 4I).

Figure 4. Examples of spatial distribution of alleles with large ΔFs. Black pies represent the frequency of a given allele, and its average frequency within (WR) and out of (OR) the region of interest is shown on the bar plot. Whiskers in the bar plots represent standard deviations. A: allele 298 at the ATA1F08 STR locus (ΔF = 0.45), 18.7 Kb away from closest gene UTRN. B: allele 176 at the GATA84B12 STR locus (ΔF = 0.56), 106.3 Kb away from closest gene CCDC54. C: allele 111 at the GGAA20G10 STR locus (ΔF = 0.51), 628 bp away from closest gene E2F6. D: allele 190 at the GATA11C08 STR locus (ΔF = 0.41), 149.0 Kb away from closest gene STARD13. E: indel locus rs2307832 (ΔF = 0.74), 14.9 Kb away from closest gene USP24. F: indel locus rs133052 (ΔF = 0.72), 9.7 Kb away from closest gene MKL1. G: SNP locus rs6431253 (ΔF = 0.54), 169.2 Kb away from closest gene ARL4C. H: SNP locus rs2252199 (ΔF = 0.53), 30 Kb away from closest gene HSPA13. I: SNP locus rs5972561 (ΔF = 0.79), located in the gene DMD. J: SNP locus rs5959428 (ΔF = 0.52), 323.6 Kb away from closest gene ITM2A.


In the comparisons of Africa and America to the rest of the World, the allele frequency differences are strikingly large (Tables S2-S4), as expected under the surfing out-of-Africa hypothesis. When Africa is contrasted to the rest of the world the fraction of loci with ΔFmax > 0.3 is 10.2%, 29.0%, and 18.1%, for STRs, indels, and SNPs, respectively, and these fractions are 19.0%, 13.8%, and 11.8%, respectively, for the Americas. For the Eurasian and East Asian regions, these numbers are much lower, and vary between 1.2% and 8.6%. In keeping with these results, ΔF's are actually never as large in the comparisons of Eurasia and East Asia as in other comparisons. For instance, STRs do not show any allele with ΔF > 0.45 in Eurasia or in East Asia, whereas ΔF reaches 0.6 in Africa and 0.65 in America.
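As a quick consistency check, the percentages quoted above for Africa and America can be recomputed from the locus counts in Tables 2–4. The sketch below takes the number of loci falling in the lowest |ΔFmax| interval (at most 0.3) from each table, as read from the flattened table rows, and subtracts it from the total number of loci. The variable and function names are illustrative only.

```python
# Total number of loci per marker type (from the Data section).
TOTAL_LOCI = {"STR": 772, "indel": 210, "SNP": 2834}

# Loci whose Delta F max does not exceed 0.3, as read from Tables 2-4.
LOCI_AT_MOST_0_3 = {
    "Africa": {"STR": 693, "indel": 149, "SNP": 2321},
    "America": {"STR": 625, "indel": 181, "SNP": 2499},
}


def pct_above_threshold(region, marker):
    """Percentage of loci with Delta F max > 0.3 for one region and marker type."""
    total = TOTAL_LOCI[marker]
    above = total - LOCI_AT_MOST_0_3[region][marker]
    return round(100 * above / total, 1)


for region in LOCI_AT_MOST_0_3:
    print(region, [pct_above_threshold(region, m) for m in TOTAL_LOCI])
```

The recomputed values agree with the 10.2%, 29.0% and 18.1% given for Africa and the 19.0%, 13.8% and 11.8% given for the Americas.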

Given their large mutation rate, it may seem surprising that STR alleles show ΔF as large as those observed for SNPs and for indels if these differences had been created during the expansion out-of-Africa some 50 to 60 thousand years ago. Over time, mutations are indeed expected to erode large initial frequency differences at neutral loci, and thus large ΔF (50% or more) could be better explained by their maintenance due to selection. In order to check how quickly mutations would lower the frequency of an allele initially fixed in a population, we have carried out simple simulations at STR loci of an unsubdivided population under a pure stepwise mutation model. We have reported this decrease over 2000 generations in Figure S1 for different mutation rates and different effective population sizes. As expected the rate of decrease is positively correlated with mutation rate, and its variance is negatively correlated with population size. However, for a mutation rate of 5 × 10⁻⁴, the allele frequency is still about 65% after 1,000 generations and 46% after 2,000 generations. For a lower mutation rate of 10⁻⁴, the mean expected frequencies are 91–92% and 83–85% after 1,000 and 2,000 generations, respectively, depending on the effective population size. Given the relatively large variance of mutation rates for human STR loci (Xu et al. 2005), it appears therefore likely that STR allele frequencies of more than 80% could still be observed after 2,000 generations if they were initially fixed by surfing or a strong bottleneck, without the need to invoke selection for their maintenance. Still, one would expect that loci with high mutation rates would show lower allele frequency differences today. Since heterozygosity is positively correlated with mutation rate for STRs (Kimmel & Chakraborty, 1996), we would expect loci with a low heterozygosity to have larger allele frequency differences than loci with a high heterozygosity, and this is exactly what we observe in Figure 1.
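The order of magnitude of these figures can be checked with a deterministic calculation that ignores genetic drift. The sketch below is not the simulation reported in Figure S1: it assumes an effectively infinite population and a symmetric one-step mutation model, in which each generation an allele keeps its repeat number with probability 1 − μ and gains or loses one repeat with probability μ/2 each. The function name and the `width` parameter are hypothetical.

```python
def expected_founder_frequency(mu, generations, width=60):
    """Expected frequency of an initially fixed STR allele, without drift.

    Repeat numbers are tracked as offsets -width..+width from the founder
    allele; the tiny probability mass that would step beyond this window
    is dropped.
    """
    size = 2 * width + 1
    p = [0.0] * size
    p[width] = 1.0  # the founder allele starts fixed
    for _ in range(generations):
        nxt = [0.0] * size
        for k, pk in enumerate(p):
            if pk == 0.0:
                continue
            nxt[k] += (1 - mu) * pk          # no mutation
            if k > 0:
                nxt[k - 1] += mu / 2 * pk    # loss of one repeat
            if k < size - 1:
                nxt[k + 1] += mu / 2 * pk    # gain of one repeat
        p = nxt
    return p[width]


for mu in (5e-4, 1e-4):
    print(mu, [round(expected_founder_frequency(mu, t), 3) for t in (1000, 2000)])
```

Under these assumptions the expected frequency falls to roughly 0.65 and 0.47 after 1,000 and 2,000 generations for μ = 5 × 10⁻⁴, and stays above 0.8 for μ = 10⁻⁴, close to the simulated means quoted in the text. Drift adds variance around these expectations but does not change them, since the mutation step is linear in the allele frequencies.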

Figure 1. Relationship between average heterozygosity over all populations (He) and largest absolute allele frequency difference (ΔFmax) for STR loci.


Surfing promotes the increase of allele frequencies in the direction of a spatial expansion. Therefore we expect to find more STR alleles with increased frequency in newly colonised areas than alleles with decreased frequency, since the decrease compensating the increase of a single allele will affect several other alleles at a given locus. This excess should be especially pronounced for Africa and America, because they are separated by spatial bottlenecks from the Eurasian continent. As shown in Figures 2 and 3, there is indeed a clear asymmetry in the distribution of STR allele frequency differences between regions. For instance, by considering only alleles with ΔF > 0.3, there are clearly more alleles that increased in frequency outside Africa than there are alleles that decreased in frequency. On the contrary, for East Asia and the Americas, there are more alleles at a higher frequency within these regions (Table S5). Since it is not possible to describe this pattern for diallelic loci like SNPs and indels, we tested for these markers whether the derived alleles show an asymmetry in frequency differences. We actually did not expect to find any asymmetry, as surfing does not discriminate between ancestral and derived alleles. For the indels the derived allele is about equally likely to increase in frequency as it is to decrease in frequency (Table S6). For SNPs however, we find that derived alleles have more often increased than decreased outside Africa for 0.15 < ΔF < 0.5, while we see the reverse situation in America for 0.3 < ΔF < 0.4 (Table S7). No clear pattern occurs for the other two regions (Table S7). This pattern is compatible with surfing, since most derived SNP alleles have low frequencies in Africa and could thus have had more room to increase in frequency by surfing than already frequent alleles.

Figure 2. Comparison of the distribution of allele frequencies between regions. A: Africa vs. rest of the World; B: Eurasia vs. rest of the World; C: East Asia vs. rest of the World; D: America vs. rest of the World. The grey scale in each square is proportional to the fraction of alleles (on a log-scale) with a given average frequency. The size of the circles within squares is proportional to the number of loci with a given average frequency. Note that each locus is represented here by the allele with the largest frequency difference. Frequencies below 0 indicate that the alleles are not present in the respective group of populations. Note that alleles on the diagonal have equal frequencies in the two groups of populations.


Figure 3. Lod ratio of the number of alleles with a positive frequency difference (#ΔF+) and the number of alleles with a negative frequency difference (#ΔF-), where positive means a lower frequency in the region of interest and a higher frequency in the rest of the world, as a function of ΔF. A positive lod ratio indicates that more alleles increased than decreased by a given ΔF out of the region of interest. Filled symbols indicate significant lod ratios (p-value < 0.05, as assessed by a sign test). We only report ΔF categories with more than 10 alleles.


Eberle et al. (2006) found that genic regions are enriched for signals of positive selection compared to non-genic regions (see also Hinds et al. 2005, Voight et al. 2006, Barreiro et al. 2008). If large ΔF were mainly created by the action of positive selection, it should be especially common close to genes. However, we find the correlation of ΔFmax and distance to the closest gene is only significant (at the 5% level) in three instances: for STR alleles in Eurasia, as well as for SNP alleles in Eurasia and America (Figures S2 and S4). In all three cases the explained variance (R²) is small and the p-values are above the 1% level. For indels there is no significant correlation between ΔFmax and distance to genes (Figure S3). However, the power to detect selection close to genic regions may be limited here by the lower density of markers than that available in previous genomic studies, which were however based on a much smaller number of populations.

Discussion


We have found an unexpectedly large fraction of loci showing strong differences in allele frequencies between continents in all three datasets. 43% of the indels, 32% of the SNPs and 28% of the STR loci show large frequency differences (ΔFmax > 0.3) between a given geographic region and the rest of the world. A visual inspection of the spatial distribution of some of these allele frequencies indeed reveals striking features (Figure 4), with strong differences between continents, either with very narrow or broader clines, which at first sight is difficult to attribute to pure neutral processes. However, the sheer number of loci showing such striking patterns makes it difficult to believe that these patterns have all been shaped by positive selection, as previously advocated (Evans et al. 2005, Mekel-Bobrov et al. 2005, Akey et al. 2006, Myles et al. 2008).

There is a clear excess of large ΔF between sub-Saharan Africa or the Americas and other regions as compared to ΔF between Eurasia or East Asia and other regions (Tables S2-S4). This is in line with previous genome scan studies, which detected more evidence of recent positive selection in Eurasian and East Asian populations as compared to African populations (Kayser et al. 2003, Akey et al. 2004, Storz et al. 2004, Carlson et al. 2005, Williamson et al. 2007). African populations seem therefore to have a deficit of recent positive selection (but see Hawks et al. 2007), which may be interpreted as evidence that selective pressures in recent times were more prevalent outside of Africa (Akey et al. 2004, Storz et al. 2004). In agreement with this hypothesis, Tang et al. (2007) found more genomic regions potentially influenced by selection when Africa was compared to Eurasian or to Asian populations than in the comparison of Eurasia to Asia. Under a selectionist view, this could be explained by the fact that the Eurasian continent has been colonized only recently and traces of selection would be easier to recognize. However, the populations remaining in Africa have also experienced drastic changes in their environment during the past 50,000 years (deMenocal, 2004), and prominent examples of recent genetic adaptations have been found in this continent as well (e.g. beta-globin (Hanchard et al. 2007), G6PD (Saunders et al. 2002), or lactose tolerance (Tishkoff et al. 2007)). Like Africa, the Americas are also strongly differentiated from the rest of the World, and here selection would have had little time to operate, especially given the overall small sizes of the populations, leading to large levels of differentiation among Amerindian populations (Wang et al. 2007b).

We believe that demographic factors can better explain the particular differentiation of both Africa and the Americas. These two continents are indeed geographically very isolated from the others, such that some spatial and demographic bottlenecks have certainly occurred during the exit out-of-Africa to colonize Eurasia and during the colonization of the Americas from North-East Asia (see e.g. Fagundes et al. 2007). Moreover, these spatial bottlenecks could have also enhanced the possibility of allelic surfing during subsequent spatial expansions (Travis et al. 2007). Allele surfing could also explain the asymmetry of the STR allele frequency distributions (Figures 2 and 3), since this phenomenon originally described the increase in frequency of rare alleles over large and recently colonized areas (Edmonds et al. 2004, Klopfstein et al. 2006). Therefore, the asymmetries shown in Figures 2 and 3 are expected after a range expansion out-of-Africa, as well as into Eurasia, East-Asia and the Americas.

If large allele frequency differences were mainly driven by positive selection acting on coding regions, one would expect to see a negative relationship between ΔF and the distance between gene and markers. Voight et al. (2006) indeed discovered more signals of selection in genic regions than in non-genic regions of the genome, and Hinds et al. (2005) and Eberle et al. (2006) found that regions of extended linkage disequilibrium are enriched for genic SNPs. When testing for a correlation between allele frequency differences and distance to genes, however, we find only marginally significant results in three cases. We note, however, that the relatively low number of loci examined here in a large number of populations is in contrast with previous genome scan studies, where hundreds of thousands of loci were studied in very few populations. This low marker density may indeed prevent us from obtaining significant results, and it would be interesting to extend our analysis to new databases containing hundreds of thousands of markers (see e.g. Jakobsson et al. 2008, Li et al. 2008). In any case, the fact that markers showing high levels of differentiation between continents appear randomly scattered over the whole genome is more in line with surfing than with positive selection as a cause. It is, however, very likely that we observe the effects of diverse selective and neutral forces and their interaction. Positive selection, genetic drift and allelic surfing mainly lead to increased genetic differences between populations, while balancing selection and migration decrease differentiation. Our results suggest that local adaptation is certainly not the main acting force in promoting these large allele frequency changes between continental regions, but selection could certainly be involved at various loci.
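The kind of test described above can be sketched in Python as a rank correlation between ΔF and the distance to the nearest gene, with a one-sided permutation test for a negative association. This is only an illustration under assumptions: the statistic actually used in this study is the one described in Material and Methods, and the function names and the permutation scheme below are hypothetical.

```python
import random


def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; undefined (division by zero) if x or y is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation computed as Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def permutation_p_value(delta_f, distance, n_perm=2000, seed=0):
    """Observed rho and one-sided p-value for rho being this negative or more,
    obtained by shuffling the distances among loci."""
    rng = random.Random(seed)
    observed = spearman(delta_f, distance)
    shuffled = list(distance)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if spearman(delta_f, shuffled) <= observed:
            hits += 1
    return observed, (hits + 1) / (n_perm + 1)
```

Here `delta_f` and `distance` would be two equally long lists, one value per locus. Under selection on genic regions one would expect a clearly negative `observed` value with a small p-value, whereas a value near zero is what neutral processes such as surfing predict.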

Among the genes that are close to markers with high allele frequency differences between African and non-African populations, we could identify some that were already signalled as candidates for positive selection in previous studies using criteria other than mere allele frequency differences between continents. These are TCF15 (Storz et al. 2004), KRTAP23-1 (Williamson et al. 2007), PHACTR1 (Williamson et al. 2007), C20orf26 (Williamson et al. 2007), ANTXR2 (Kimura et al. 2007), UTRN (Tang et al. 2007), TYRP1 (Izagirre et al. 2006, Lao et al. 2007), LYST (Izagirre et al. 2006), DMD (Nachman & Crowell, 2000), SEMA4F (Nielsen et al. 2005), and E2F6 (Kayser et al. 2003). This suggests either that markers with geographic differentiation may indeed point to linked selected genes or that previous studies using allele frequency difference as a criterion to identify outlier loci have mistaken surfing for selection.

Since allele surfing looks very much like a selective sweep (Nielsen et al. 2007, Excoffier & Ray, 2008), it would affect aspects of genetic diversity other than the allele frequency spectrum, such as linkage disequilibrium and extended homozygosity (Biswas & Akey, 2006). Previous studies aiming at detecting positively selected loci have attempted to control for past demography, either 1) by explicitly modelling some complex demography (Sabeti et al. 2007, Stajich & Hahn, 2005, Tang et al. 2007, Williamson et al. 2007), 2) by comparing diversity linked to derived or ancestral alleles (Voight et al. 2006), or 3) by contrasting coding to non-coding regions (Akey et al. 2002, Barreiro et al. 2008). To our knowledge, range expansions have never been used as a null model against which observed patterns were examined, and it is thus unclear (and would be worth examining) how the sensitivity of the first type of approach would change under such a new null model. As mentioned above, derived and ancestral alleles show different frequencies in Africa (The International HapMap Consortium, 2007, Li et al. 2008) and the result of positive selection differs between new and standing variation (Przeworski et al. 2005, Teshima et al. 2006, Barrett & Schluter, 2008), so that tests based on the comparison of diversity associated with derived and ancestral alleles may indeed be sensitive to allele surfing, simply because these two allele categories have different initial frequencies. The comparison of genic to non-genic regions may indeed be the approach most robust against past demography. For instance, Barreiro et al. (2008) compared the proportion of loci with a high FST between genic and non-genic SNPs. They found that the proportion of genic SNPs with an FST > 0.65 was about 2.8-fold larger than the proportion of non-genic SNPs with equally large FST, and they could identify several candidate genes based on this high level of differentiation between populations. 
However, since this class of high FST SNPs represents only about 0.35% of all genic SNPs, it suggests that most genic regions have not been influenced much by selection. We find that positive selection is unlikely to have shaped the allele frequency spectrum at most loci and that it may have acted on fewer genes than previously believed, but our current results do not allow us to discriminate between the effects of demography and selection at an individual locus. Loci which are candidates for being under positive selection should therefore be more carefully scrutinized to find links between potentially selected alleles and a phenotypic effect (see e.g. Sabeti et al. 2007).
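The arithmetic behind this argument can be made explicit. The short Python sketch below takes the two figures quoted above (about 0.35% of genic SNPs with a high FST, and a 2.8-fold enrichment relative to non-genic SNPs) and derives the implied non-genic proportion and the genic excess. Treating non-genic SNPs as a neutral baseline is an assumption of this illustration, and the function name is hypothetical.

```python
def genic_excess(genic_fraction, fold_enrichment):
    """Return (non-genic fraction, excess genic fraction) of high-FST SNPs.

    Assumes, for illustration only, that the non-genic fraction reflects
    what demography alone would produce in genic regions as well.
    """
    nongenic_fraction = genic_fraction / fold_enrichment
    excess = genic_fraction - nongenic_fraction
    return nongenic_fraction, excess


# Figures quoted in the text from Barreiro et al. (2008).
nongenic, excess = genic_excess(0.0035, 2.8)
print(f"non-genic high-FST fraction: {nongenic:.3%}")
print(f"genic excess over baseline:  {excess:.3%}")
```

Under these assumptions the non-genic proportion is about 0.125%, so the excess attributable to something other than the baseline is only about 0.225% of genic SNPs.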

Conclusions

The survey of the HGDP database on human polymorphisms reveals that large allele frequency differences between continental regions are extremely common. Indeed, as many as 30% of loci show very large allele frequency differences between continents. These differences are unlikely to have been created by positive selection, but are more likely the result of neutral demographic processes such as the surfing phenomenon. Because the erosion of large allele frequency differences by mutation is slow, even for large mutation rates, the surprisingly large number of strongly differentiated STR alleles does not need to be explained by the action of positive selection either. Africa and the Americas show a much larger extent of differentiation than Eurasia or East Asia, which is certainly due to changes in allele frequencies during the colonisation of the Eurasian and the American continents. Disentangling the effects of selection and neutral demographic processes on genome diversity remains an important challenge for future human evolution studies.

Acknowledgements

Thanks to Montgomery Slatkin for his comments on a previous version of the manuscript, and to Gerald Heckel and Matthieu Foll for stimulating discussions on the subject. We are grateful to Mourad Sahbatou and Sijia Wang for providing information about the genomic location of some of the markers, and to Isabelle Dupanloup for providing help on database issues. This work was supported by a Swiss NSF grant No 3100A0-112072 to L.E.

References

  • Akey, J. M., Eberle, M. A., Rieder, M. J., Carlson, C. S., Shriver, M. D., Nickerson, D. A. & Kruglyak, L. (2004) Population History and Natural Selection Shape Patterns of Genetic Variation in 132 Genes. PLoS Biology 2, e286.
  • Akey, J. M., Swanson, W. J., Madeoy, J., Eberle, M. & Shriver, M. D. (2006) TRPV6 exhibits unusual patterns of polymorphism and divergence in worldwide populations. Hum Mol Genet 15, 2106–2113.
  • Akey, J. M., Zhang, G., Zhang, K., Jin, L. & Shriver, M. D. (2002) Interrogating a high-density SNP map for signatures of natural selection. Genome Res 12, 1805–1814.
  • Arnholt, A. T. (2007) BSDA: Basic Statistics and Data Analysis. 0.1 ed.
  • Balaresque, P. L., Ballereau, S. J. & Jobling, M. A. (2007) Challenges in human genetic diversity: demographic history and adaptation. Hum Mol Genet 16, R134–139.
  • Barbujani, G., Magagni, A., Minch, E. & Cavalli-Sforza, L. L. (1997) An apportionment of human DNA diversity. Proc Natl Acad Sci U S A 94, 4516–4519.
  • Barreiro, L. B., Laval, G., Quach, H., Patin, E. & Quintana-Murci, L. (2008) Natural selection has driven population differentiation in modern humans. Nat Genet 40, 340–345.
  • Barrett, R. D. H. & Schluter, D. (2008) Adaptation from standing genetic variation. Trends in Ecology & Evolution 23, 38–44.
  • Bastos-Rodrigues, L., Pimenta, J. R. & Pena, S. D. J. (2006) The Genetic Structure of Human Populations Studied Through Short Insertion-Deletion Polymorphisms. Annals of Human Genetics 70, 658–665.
  • Bersaglieri, T., Sabeti, P., Patterson, N., Vanderploeg, T., Schaffner, S., Drake, J., Rhodes, M., Reich, D. & Hirschhorn, J. (2004) Genetic Signatures of Strong Recent Positive Selection at the Lactase Gene. The American Journal of Human Genetics 74, 1111–1120.
  • Biswas, S. & Akey, J. M. (2006) Genomic insights into positive selection. Trends Genet 22, 437–446.
  • Bryk, J. A., Hardouin, E., Pugach, I., Hughes, D., Strotmann, R., Stoneking, M. & Myles, S. (2008) Positive Selection in East Asians for an EDAR Allele that Enhances NF-κB Activation. PLoS ONE 3, e2209.
  • Butty, V., Roy, M., Sabeti, P., Besse, W., Benoist, C. & Mathis, D. (2007) Signatures of strong population differentiation shape extended haplotypes across the human CD28, CTLA4, and ICOS costimulatory genes. Proc Natl Acad Sci U S A 104, 570–575.
  • Cann, H. M., De Toma, C., Cazes, L., Legrand, M.-F., Morel, V., Piouffre, L., Bodmer, J., Bodmer, W. F., Bonne-Tamir, B., Cambon-Thomsen, A., Chen, Z., Chu, J., Carcassi, C., Contu, L., Du, R., Excoffier, L., Friedlaender, J. S., Groot, H., Gurwitz, D., Herrera, R. J., Huang, X., Kidd, J., Kidd, K. K., Langaney, A., Lin, A. A., Mehdi, S. Q., Parham, P., Piazza, A., Pistillo, M. P., Qian, Y., Shu, Q., Xu, J., Zhu, S., Weber, J. L., Greely, H. T., Feldman, M. W., Thomas, G., Dausset, J. & Cavalli-Sforza, L. L. (2002) A Human Genome Diversity Cell Line Panel. Science 296, 261b–262.
  • Carlson, C. S., Thomas, D. J., Eberle, M. A., Swanson, J. E., Livingston, R. J., Rieder, M. J. & Nickerson, D. A. (2005) Genomic regions exhibiting positive selection identified from dense genotype data. Genome Res 15, 1553–1565.
  • Conrad, D. F., Jakobsson, M., Coop, G., Wen, X. Q., Wall, J. D., Rosenberg, N. A. & Pritchard, J. K. (2006) A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat Genet 38, 1251–1260.
  • Currat, M., Excoffier, L., Maddison, W., Otto, S. P., Ray, N., Whitlock, M. C. & Yeaman, S. (2006) Comment on “Ongoing Adaptive Evolution of ASPM, a Brain Size Determinant in Homo sapiens” and “Microcephalin, a Gene Regulating Brain Size, Continues to Evolve Adaptively in Humans”. Science 313, 172.
  • De Jong, F. A., Marsh, S., Mathijssen, R. H. J., King, C., Verweij, J., Sparreboom, A. & McLeod, H. L. (2004) ABCG2 Pharmacogenetics: Ethnic Differences in Allele Frequency and Assessment of Influence on Irinotecan Disposition. Clin Cancer Res 10, 5889–5894.
  • DeMenocal, P. B. (2004) African climate change and faunal evolution during the Pliocene-Pleistocene. Earth and Planetary Science Letters 220, 3–24.
  • Eberle, M. A., Rieder, M. J., Kruglyak, L. & Nickerson, D. A. (2006) Allele Frequency Matching Between SNPs Reveals an Excess of Linkage Disequilibrium in Genic Regions of the Human Genome. PLoS Genetics 2, e142.
  • Edmonds, C. A., Lillie, A. S. & Cavalli-Sforza, L. L. (2004) Mutations arising in the wave front of an expanding population. Proc Natl Acad Sci U S A 101, 975–979.
  • Evans, P., Gilbert, S., Mekel-Bobrov, N. et al. (2005) Microcephalin, a gene regulating brain size, continues to evolve adaptively in humans. Science 309, 1717–1720.
  • Excoffier, L. (2003) Human diversity: Our genes tell where we live. Current Biology 13, R134–R136.
  • Excoffier, L., Laval, G. & Schneider, S. (2005) Arlequin (version 3.0): An integrated software for population genetics data analysis. Evolutionary Bioinformatics Online 1, 47–50.
  • Excoffier, L. & Ray, N. (2008) Surfing during population expansions promotes genetic revolutions and structuration. Trends in Ecology and Evolution 23, 347–351.
  • Fagundes, N. J. R., Ray, N., Beaumont, M., Neuenschwander, S., Salzano, F. M., Bonatto, S. L. & Excoffier, L. (2007) Statistical evaluation of alternative models of human evolution. Proc Natl Acad Sci U S A 104, 17614–17619.
  • Fullerton, S. M., Bartoszewicz, A., Ybazeta, G., Horikawa, Y., Bell, G. I., Kidd, K. K., Cox, N. J., Hudson, R. R. & Di Rienzo, A. (2002) Geographic and haplotype structure of candidate type 2 diabetes-susceptibility variants at the calpain-10 locus. The American Journal of Human Genetics 70, 1096–1106.
  • Gerstenblith, M. R., Goldstein, A. M., Fargnoli, M. C., Peris, K. & Landi, M. T. (2007) Comprehensive evaluation of allele frequency differences of MC1R variants across populations. Human Mutation 28, 495–505.
  • Gilad, Y., Rosenberg, S., Przeworski, M., Lancet, D. & Skorecki, K. (2002) Evidence for positive selection and population structure at the human MAO-A gene. Proc Natl Acad Sci U S A 99, 862–867.
  • Goebel, T., Waters, M. R. & O’Rourke, D. H. (2008) The Late Pleistocene Dispersal of Modern Humans in the Americas. Science 319, 1497–1502.
  • Hahn, M. W., Rockman, M. V., Soranzo, N., Goldstein, D. B. & Wray, G. A. (2004) Population Genetic and Phylogenetic Evidence for Positive Selection on Regulatory Mutations at the Factor VII Locus in Humans. Genetics 167, 867–877.
  • Hallatschek, O., Hersen, P., Ramanathan, S. & Nelson, D. R. (2007) Genetic drift at expanding frontiers promotes gene segregation. Proc Natl Acad Sci U S A 104, 19926–19930.
  • Hallatschek, O. & Nelson, D. R. (2008) Gene surfing in expanding populations. Theoretical Population Biology 73, 158–170.
  • Hamblin, M. T., Thompson, E. E. & Di Rienzo, A. (2002) Complex signatures of natural selection at the Duffy blood group locus. The American Journal of Human Genetics 70, 369–383.
  • Han, Y., Gu, S., Oota, H., Osier, M. V., Pakstis, A. J., Speed, W. C., Kidd, J. R. & Kidd, K. K. (2007) Evidence of Positive Selection on a Class I ADH Locus. American Journal of Human Genetics 80, 441–456.
  • Hanchard, N., Elzein, A., Trafford, C., Rockett, K., Pinder, M., Jallow, M., Harding, R., Kwiatkowski, D. & McKenzie, C. (2007) Classical sickle beta-globin haplotypes exhibit a high degree of long-range haplotype similarity in African and Afro-Caribbean populations. BMC Genetics 8, 52.
  • Handley, L. J. L., Manica, A., Goudet, J. & Balloux, F. (2007) Going the distance: human population genetics in a clinal world. Trends Genet 23, 432.
  • Hawks, J., Wang, E. T., Cochran, G. M., Harpending, H. C. & Moyzis, R. K. (2007) Recent acceleration of human adaptive evolution. Proc Natl Acad Sci U S A 104, 20753–20758.
  • Hinds, D. A., Stuve, L. L., Nilsen, G. B., Halperin, E., Eskin, E., Ballinger, D. G., Frazer, K. A. & Cox, D. R. (2005) Whole-Genome Patterns of Common DNA Variation in Three Human Populations. Science 307, 1072–1079.
  • Hughes, L. B., Beasley, T. M., Patel, H., Tiwari, H. K., Morgan, S. L., Baggott, J. E., Saag, K. G., McNicholl, J., Moreland, L. W., Alarcon, G. S. & Bridges, S. L., Jr. (2006) Racial or ethnic differences in allele frequencies of single-nucleotide polymorphisms in the methylenetetrahydrofolate reductase gene and their influence on response to methotrexate in rheumatoid arthritis. Ann Rheum Dis 65, 1213–1218.
  • Izagirre, N., Garcia, I., Junquera, C., De La Rua, C. & Alonso, S. (2006) A Scan for Signatures of Positive Selection in Candidate Loci for Skin Pigmentation in Humans. Molecular Biology and Evolution 23, 1697–1706.
  • Jakobsson, M., Scholz, S. W., Scheet, P., Gibbs, J. R., Vanliere, J. M., Fung, H.-C., Szpiech, Z. A., Degnan, J. H., Wang, K., Guerreiro, R., Bras, J. M., Schymick, J. C., Hernandez, D. G., Traynor, B. J., Simon-Sanchez, J., Matarin, M., Britton, A., Van De Leemput, J., Rafferty, I., Bucan, M., Cann, H. M., Hardy, J. A., Rosenberg, N. A. & Singleton, A. B. (2008) Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451, 998–1003.
  • Karolchik, D., Kuhn, R. M., Baertsch, R., Barber, G. P., Clawson, H., Diekhans, M., Giardine, B., Harte, R. A., Hinrichs, A. S., Hsu, F., Kober, K. M., Miller, W., Pedersen, J. S., Pohl, A., Raney, B. J., Rhead, B., Rosenbloom, K. R., Smith, K. E., Stanke, M., Thakkapallayil, A., Trumbower, H., Wang, T., Zweig, A. S., Haussler, D. & Kent, W. J. (2008) The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res 36, D773–779.
  • Kayser, M., Brauer, S. & Stoneking, M. (2003) A Genome Scan to Detect Candidate Regions Influenced by Local Natural Selection in Human Populations. Molecular Biology and Evolution 20, 893–900.
  • Kimmel, M. & Chakraborty, R. (1996) Measures of Variation at DNA Repeat Loci under a General Stepwise Mutation Model. Theoretical Population Biology 50, 345.
  • Kimura, R., Fujimoto, A., Tokunaga, K. & Ohashi, J. (2007) A Practical Genome Scan for Population-Specific Strong Selective Sweeps That Have Reached Fixation. PLoS ONE 2.
  • Klopfstein, S., Currat, M. & Excoffier, L. (2006) The Fate of Mutations Surfing on the Wave of a Range Expansion. Molecular Biology and Evolution 23, 482–490.
  • Lamason, R. L., Mohideen, M.-A.P.K., Mest, J. R., Wong, A. C., Norton, H. L., Aros, M. C., Jurynec, M. J., Mao, X., Humphreville, V. R., Humbert, J. E., Sinha, S., Moore, J. L., Jagadeeswaran, P., Zhao, W., Ning, G., Makalowska, I., McKeigue, P. M., O’Donnell, D., Kittles, R., Parra, E. J., Mangini, N. J., Grunwald, D. J., Shriver, M. D., Canfield, V. A. & Cheng, K. C. (2005) SLC24A5, a Putative Cation Exchanger, Affects Pigmentation in Zebrafish and Humans. Science 310, 1782–1786.
  • Lao, O., De Gruijter, J. M., Van Duijn, K., Navarro, A. & Kayser, M. (2007) Signatures of Positive Selection in Genes Associated with Human Skin Pigmentation as Revealed from Analyses of Single Nucleotide Polymorphisms. Annals of Human Genetics 71, 354–369.
  • Lewontin, R. C. (1972) The apportionment of human diversity. Evolutionary Biology 6, 381–398.
  • Lewontin, R. C. (1995) Human diversity. New York: Scientific American Library.
  • Li, J. Z., Absher, D. M., Tang, H., Southwick, A. M., Casto, A. M., Ramachandran, S., Cann, H. M., Barsh, G. S., Feldman, M., Cavalli-Sforza, L. L. & Myers, R. M. (2008) Worldwide Human Relationships Inferred from Genome-Wide Patterns of Variation. Science 319, 1100–1104.
  • Lohmueller, K. E., Wong, L. J. C., Mauney, M. M., Jiang, L., Felder, R. A., Jose, P. A. & Williams, S. M. (2006) Patterns of Genetic Variation in the Hypertension Candidate Gene GRK4: Ethnic Variation and Haplotype Structure. Annals of Human Genetics 70, 27–41.
  • Mekel-Bobrov, N., Gilbert, S. L., Evans, P. D., Vallender, E. J., Anderson, J. R., Hudson, R. R., Tishkoff, S. A. & Lahn, B. T. (2005) Ongoing Adaptive Evolution of ASPM, a Brain Size Determinant in Homo sapiens. Science 309, 1720–1722.
  • Mori, M., Yamada, R., Kobayashi, K., Kawaida, R. & Yamamoto, K. (2005) Ethnic differences in allele frequency of autoimmune-disease-associated SNPs. J Hum Genet 50, 264–266.
  • Myles, S., Tang, K., Somel, M., Green, R. E., Kelso, J. & Stoneking, M. (2008) Identification and Analysis of Genomic Regions with Large Between-Population Differentiation in Humans. Ann Hum Genet 72, 99–110.
  • Nachman, M. W. & Crowell, S. L. (2000) Contrasting Evolutionary Histories of Two Introns of the Duchenne Muscular Dystrophy Gene, Dmd, in Humans. Genetics 155, 1855–1864.
  • Nakajima, T., Wooding, S., Sakagami, T., Emi, M., Tokunaga, K., Tamiya, G., Ishigami, T., Umemura, S., Munkhbat, B., Jin, F., Guan-Jun, J., Hayasaka, I., Ishida, T., Saitou, N., Pavelka, K., Lalouel, J. M., Jorde, L. B. & Inoue, I. (2004) Natural selection and population history in the human angiotensinogen gene (AGT): 736 complete AGT sequences in chromosomes from around the world. The American Journal of Human Genetics 74, 898–916.
  • Nielsen, R., Hellmann, I., Hubisz, M., Bustamante, C. & Clark, A. G. (2007) Recent and ongoing selection in the human genome. Nat Rev Genet 8, 857.
  • Nielsen, R., Williamson, S., Kim, Y., Hubisz, M. J., Clark, A. G. & Bustamante, C. (2005) Genomic scans for selective sweeps using SNP data. Genome Research 15, 1566–1575.
  • Norton, H. L., Kittles, R. A., Parra, E., McKeigue, P., Mao, X., Cheng, K., Canfield, V. A., Bradley, D. G., McEvoy, B. & Shriver, M. D. (2007) Genetic Evidence for the Convergent Evolution of Light Skin in Europeans and East Asians. Molecular Biology and Evolution 24, 710–722.
  • Oota, H., Pakstis, A. J., Bonne-Tamir, B., Goldman, D., Grigorenko, E., Kajuna, S. L. B., Karoma, N. J., Kungulilo, S., Lu, R. B., Odunsi, K., Okonofua, F., Zhukova, O. V., Kidd, J. R. & Kidd, K. K. (2004) The evolution and population genetics of the ALDH2 locus: random genetic drift, selection, and low levels of recombination. Annals of Human Genetics 68, 93–109.
  • Osier, M., Pakstis, A., Soodyall, H., Comas, D., Goldman, D., Odunsi, A., Okonofua, F., Parnas, J., Schulz, L., Bertranpetit, J., Bonne-Tamir, B., Lu, R. B., Kidd, J. & Kidd, K. (2002) A Global Perspective on Genetic Variation at the ADH Genes Reveals Unusual Patterns of Linkage Disequilibrium and Diversity. American Journal of Human Genetics 71, 84–99.
  • Pasanen, M. K., Neuvonen, P. J. & Niemi, M. (2008) Global analysis of genetic variation in SLCO1B1. Pharmacogenomics 9, 19–33.
  • Przeworski, M., Coop, G. & Wall, J. D. (2005) The signature of positive selection on standing genetic variation. Evolution 59, 2312–2323.
  • R Development Core Team (2008) R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Ramachandran, S., Deshpande, O., Roseman, C. C., Rosenberg, N. A., Feldman, M. W. & Cavalli-Sforza, L. L. (2005) Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci U S A 102, 15942–15947.
  • Ray, N. (2005) pathmatrix: a geographical information system tool to compute effective distances among samples. Molecular Ecology Notes 5, 177–180.
  • Rockman, M. V., Hahn, M. W., Soranzo, N., Goldstein, D. B. & Wray, G. A. (2003) Positive Selection on a Human-Specific Transcription Factor Binding Site Regulating IL4 Expression. Current Biology 13, 2118–2123.
  • Rockman, M. V., Hahn, M. W., Soranzo, N., Loisel, D. A., Goldstein, D. B. & Wray, G. A. (2004) Positive Selection on MMP3 Regulation Has Shaped Heart Disease Risk. Current Biology 14, 1531–1539.
  • Rockman, M. V., Hahn, M. W., Soranzo, N., Zimprich, F., Goldstein, D. B. & Wray, G. A. (2005) Ancient and Recent Positive Selection Transformed Opioid cis-Regulation in Humans. PLoS Biology 3, e387.
  • Rosenberg, N., Murata, M., Ikeda, Y., Opare-Sem, O., Zivelin, A., Geffen, E. & Seligsohn, U. (2002a) The frequent 5,10-methylenetetrahydrofolate reductase C677T polymorphism is associated with a common haplotype in whites, Japanese, and Africans. The American Journal of Human Genetics 70, 758–762.
  • Rosenberg, N. A. (2006) Standardized Subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, Accounting for Atypical and Duplicated Samples and Pairs of Close Relatives. Annals of Human Genetics 70, 841–847.
  • Rosenberg, N. A., Pritchard, J. K., Weber, J. L., Cann, H. M., Kidd, K. K., Zhivotovsky, L. A. & Feldman, M. W. (2002b) Genetic structure of human populations. Science 298, 2381–2385.
  • Sabbagh, A., Langaney, A., Darlu, P., Gerard, N., Krishnamoorthy, R. & Poloni, E. (2008) Worldwide distribution of NAT2 diversity: Implications for NAT2 evolutionary history. BMC Genetics 9, 21.
  • Sabeti, P. C., Varilly, P., Fry, B., Lohmueller, J., Hostetter, E., Cotsapas, C., Xie, X., Byrne, E. H., McCarroll, S. A., Gaudet, R., Schaffner, S. F. & Lander, E. S. (2007) Genome-wide detection and characterization of positive selection in human populations. Nature 449, 913.
  • Sabeti, P. C., Walsh, E., Schaffner, S. F., Varilly, P., Fry, B., Hutcheson, H. B., Cullen, M., Mikkelsen, T. S., Roy, J., Patterson, N., Cooper, R., Reich, D., Altshuler, D., O’Brien, S. & Lander, E. S. (2005) The case for selection at CCR5-Delta 32. PLoS Biology 3, 1963–1969.
  • Saunders, M. A., Good, J. M., Lawrence, E. C., Ferrell, R. E., Li, W.-H. & Nachman, M. W. (2006) Human Adaptive Evolution at Myostatin (GDF8), a Regulator of Muscle Growth. The American Journal of Human Genetics 79, 1089–1097.
  • Saunders, M. A., Hammer, M. F. & Nachman, M. W. (2002) Nucleotide Variability at G6pd and the Signature of Malarial Selection in Humans. Genetics 162, 1849–1861.
  • Schirmer, M. A. B., Toliat, M. R. C., Haberl, M. D., Suk, A. E., Kamdem, L. K. B., Klein, K. F., Brockmoller, J. B., Nurnberg, P. C., Zanger, U. M. F. & Wojnowski, L. A. (2006) Genetic signature consistent with selection against the CYP3A4*1B allele in non-African populations. Pharmacogenetics & Genomics 16, 59–71.
  • Singh, P. P., Singh, M. & Mastana, S. S. (2006) APO E distribution in world populations with new data from India and the UK. Annals of Human Biology 33, 279–308.
  • Soejima, M., Tachida, H., Ishida, T., Sano, A. & Koda, Y. (2006) Evidence for Recent Positive Selection at the Human AIM1 Locus in a European Population. Molecular Biology and Evolution 23, 179–188.
  • Soranzo, N., Bufe, B., Sabeti, P. C., Wilson, J. F., Weale, M. E., Marguerie, R., Meyerhof, W. & Goldstein, D. B. (2005) Positive Selection on a High-Sensitivity Allele of the Human Bitter-Taste Receptor TAS2R16. Current Biology 15, 1257–1265.
  • Stajich, J. E. & Hahn, M. W. (2005) Disentangling the Effects of Demography and Selection in Human History. Molecular Biology and Evolution 22, 63–73.
  • Stanton, T., Boxall, S., Hirai, K., Dawes, R., Tonks, S., Yasui, T., Kanaoka, Y., Yuldasheva, N., Ishiko, O., Bodmer, W., Beverley, P. C. L. & Tchilian, E. Z. (2003) A high-frequency polymorphism in exon 6 of the CD45 tyrosine phosphatase gene (PTPRC) resulting in altered isoform expression. Proc Natl Acad Sci U S A 100, 5997–6002.
  • Stefansson, H., Helgason, A., Thorleifsson, G., Steinthorsdottir, V., Masson, G., Barnard, J., Baker, A., Jonasdottir, A., Ingason, A., Gudnadottir, V. G., Desnica, N., Hicks, A., Gylfason, A., Gudbjartsson, D. F., Jonsdottir, G. M., Sainz, J., Agnarsson, K., Birgisdottir, B., Ghosh, S., Olafsdottir, A., Cazier, J. B., Kristjansson, K., Frigge, M. L., Thorgeirsson, T. E., Gulcher, J. R., Kong, A. & Stefansson, K. (2005) A common inversion under selection in Europeans. Nat Genet 37, 129–137.
  • Storz, J. F., Payseur, B. A. & Nachman, M. W. (2004) Genome Scans of DNA Variability in Humans Reveal Evidence for Selective Sweeps Outside of Africa. Molecular Biology and Evolution 21, 1800–1811.
  • Tang, K., Thornton, K. R. & Stoneking, M. (2007) A New Approach for Using Genome Scans to Detect Recent Positive Selection in the Human Genome. PLoS Biology 5, 1587–1602.
  • Tang, K., Wong, L. P., Lee, E. J. D., Chong, S. S. & Lee, C. G. L. (2004) Genomic evidence for recent positive selection at the human MDR1 gene locus. Hum Mol Genet 13, e171.
  • Teshima, K. M., Coop, G. & Przeworski, M. (2006) How reliable are empirical genomic scans for selective sweeps? Genome Res 16, 702–712.
  • The International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861.
  • Thompson, E. E., Kuttab-Boulos, H., Witonsky, D., Yang, L., Roe, B. A. & Di Rienzo, A. (2004) CYP3A variation and the evolution of salt-sensitivity variants. American Journal of Human Genetics 75, 1059–1069.
  • Tishkoff, S. A., Reed, F. A., Ranciaro, A., Voight, B. F., Babbitt, C. C., Silverman, J. S., Powell, K., Mortensen, H. M., Hirbo, J. B., Osman, M., Ibrahim, M., Omar, S. A., Lema, G., Nyambo, T. B., Ghori, J., Bumpstead, S., Pritchard, J. K., Wray, G. A. & Deloukas, P. (2007) Convergent adaptation of human lactase persistence in Africa and Europe. Nat Genet 39, 31–40.
  • Travis, J. M. J., Munkemuller, T., Burton, O. J., Best, A., Dytham, C. & Johst, K. (2007) Deleterious Mutations Can Surf to High Densities on the Wave Front of an Expanding Population. Molecular Biology and Evolution 24, 2334–2343.
  • Voight, B. F., Kudaravalli, S., Wen, X. & Pritchard, J. K. (2006) A Map of Recent Positive Selection in the Human Genome. PLoS Biology 4, e72.
  • Wang, H. A., Ding, K. D., Zhang, Y. A., Jin, L. A. B., Kullo, I. J. D. & He, F. A. C. (2007a) Comparative and evolutionary pharmacogenetics of ABCB1: complex signatures of positive selection on coding and regulatory regions. Pharmacogenetics & Genomics 17, 667–678.
  • Wang, S., Lewis, C. M., Jakobsson, M., Ramachandran, S., Ray, N., Bedoya, G., Rojas, W., Parra, M. V., Molina, J. A., Gallo, C., Mazzotti, G., Poletti, G., Hill, K., Hurtado, A. M., Labuda, D., Klitz, W., Barrantes, R., Bortolini, M. C., Tira,  , Salzano, F. M., Petzl-Erler, M. L., Tsuneto, L. T., Llop, E., Rothhammer, F., Excoffier, L., Feldman, M. W., Rosenberg, N. A. & Ruiz-Linares, A. (2007b) Genetic Variation and Population Structure in Native Americans. PLoS Genetics 3, e185.
  • Weber, J. L., David, D., Heil, J., Fan, Y., Zhao, C. & Marth, G. (2002) Human Diallelic Insertion/Deletion Polymorphisms. The American Journal of Human Genetics 71, 854–862.
  • Williamson, S. H., Hubisz, M. J., Clark, A. G., Payseur, B. A., Bustamante, C. D. & Nielsen, R. (2007) Localizing Recent Adaptive Evolution in the Human Genome. PLoS Genetics 3, e90.
  • Xu, H. Y., Chakraborty, R. & Fu, Y. X. (2005) Mutation rate variation at human dinucleotide microsatellites. Genetics 170, 305–312.
  • Xue, Y., Daly, A., Yngvadottir, B., Liu, M., Coop, G., Kim, Y., Sabeti, P., Chen, Y., Stalker, J., Huckle, E., Burton, J., Leonard, S., Rogers, J. & Tyler-Smith, C. (2006) Spread of an Inactive Form of Caspase-12 in Humans Is Due to Recent Positive Selection. The American Journal of Human Genetics 78, 659–670.
  • Young, J. H., Chang, Y.-P. C., Kim, J. D.-O., Chretien, J.-P., Klag, M. J., Levine, M. A., Ruff, C. B., Wang, N.-Y. & Chakravarti, A. (2005) Differential Susceptibility to Hypertension Is Due to Selection during the Out-of-Africa Expansion. PLoS Genetics 1, e82.
  • Yu, F., Sabeti, P. C., Hardenbol, P., Fu, Q., Fry, B., Lu, X., Ghose, S., Vega, R., Perez, A., Pasternak, S., Leal, S. M., Willis, T. D., Nelson, D. L., Belmont, J. & Gibbs, R. A. (2005) Positive Selection of a Pre-Expansion CAG Repeat of the Human SCA2 Gene. PLoS Genetics 1, e41.
  • Zhou, G. Q., Zhai, Y., Dong, X. J., Zhang, X. M., He, F. Y., Zhou, K. X., Zhu, Y. P., Wei, H. D., Yao, Z. J., Zhong, S. F., Shen, Y., Qiang, B. Q. & He, F. C. (2004) Haplotype structure and evidence for positive selection at the human IL13 locus. Molecular Biology and Evolution 21, 29–35.

Supporting Information

Table S1. Populations sampled in the HGDP-CEPH Diversity Panel.

Table S2. STR allele frequency differences (ΔF) for all comparisons between major geographic regions.

Table S3. Indel absolute allele frequency differences (ΔFmax) for all comparisons between major geographic regions.

Table S4. SNP absolute allele frequency differences (ΔFmax) for all comparisons between major geographic regions.

Table S5. Asymmetric distribution of STR allele frequency differences between regions.

Table S6. Asymmetric distribution of indel derived allele frequency differences between regions.

Table S7. Asymmetric distribution of SNP derived allele frequency differences between regions.

Figure S1. Expected decrease of STR allele frequency.

Figure S2. Relationship between ΔFmax and distance to the closest genes for the STR loci.

Figure S3. Relationship between ΔFmax and distance to the closest genes for indel loci.

Figure S4. Relationship between ΔFmax and distance to the closest genes for SNP loci.

Please note: Wiley-Blackwell Publishing are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

Filename: AHG_489_sm_SuppMat.doc (546K). Description: Supporting info item.