Keywords:

  • SNP;
  • haplotype sharing;
  • recombination;
  • drift;
  • human phylogeny

Summary

  1. Top of page
  2. Summary
  3. Introduction
  4. Methods
  5. Tree Reconstruction
  6. Recombination Parameter Estimation
  7. Results
  8. Discussion
  9. Conclusions
  10. Authors’ Contributions
  11. Acknowledgements
  12. References
  13. Supporting Information

The vast amount of recombination information in the human genome has long been ignored or deliberately avoided in studies on human population genetic relationships. One reason is that estimation of the recombination parameter from genotyping data is computationally challenging and practically difficult. Here we propose chromosome-wide haplotype sharing (CHS) as a measure of genetic similarity between human populations, which is an indirect approach to integrate recombination information. We showed in both empirical and simulated data that recombination differences and genetic differences between human populations are strongly correlated, indicating that recombination events in different human populations are evolutionarily related. We further demonstrated that CHS can be used to reconstruct reliable phylogenies of human populations and the majority of the variation in CHS matrix can be attributed to recombination. However, for distantly related populations, the utility of CHS to reconstruct correct phylogeny is limited, suggesting that the linear correlation of CHS and population divergence could have been disturbed by recurrent recombination events over a large time scale. The CHS we proposed in this study is a practical approach without involving computationally challenging and time-consuming estimation of recombination parameter. The advantage of CHS is rooted in its integration of both drift and recombination information, therefore providing additional resolution especially for populations separated recently.


Introduction


The reconstruction of human phylogeny from contemporary genetic information was first attempted by the use of allele frequencies from five major blood-group systems (Cavalli-Sforza & Edwards, 1967; Cavalli-Sforza et al., 1988). Over the last two decades, data from mitochondrial DNA (mtDNA) (Vigilant et al., 1991; Wallace et al., 1999) and the Y chromosome (Underhill et al., 2000; Jobling & Tyler-Smith, 2003) have been used almost exclusively to infer relationships among human populations. However, Y-DNA and mtDNA are both inherited effectively as a single “linkage block” owing to the absence of recombination, while other genomic regions could be expected to have different lineages or histories (Hey & Machado, 2003). Recently available data on genome-wide high-density single nucleotide polymorphisms (SNPs), and the advent of whole-genome sequencing data for human populations, have marked a transition from single-locus studies to genomic analysis of human population structure and relationships (Rosenberg et al., 2002, 2005; The International HapMap Consortium, 2005, 2007; Friedlaender et al., 2008; Jakobsson et al., 2008; Kayser et al., 2008; Li et al., 2008; The HUGO Pan-Asian SNP Consortium, 2009). Apart from the significant increase in the number of loci or markers, the accumulated recombination events in the genome are expected to provide additional information for studies of human genetic relationships. In practice, however, estimation of the population recombination parameter 4Ner from genotyping data is computationally challenging, and the theory of optimal estimation has not yet been fully worked out. Furthermore, estimators rely on assumptions about demography and selective neutrality (Ardlie et al., 2002). As a result, the vast recombination information in the human genome has long been ignored or deliberately avoided in studies on human population genetic relationships.
Most recent studies considering haplotype information focused on either the genomic structure of recombination rates (McVean et al., 2004), or haplotype diversity (Conrad et al., 2006; Jakobsson et al., 2008; Auton et al., 2009), or demographic parameters of single populations (Lohmueller et al., 2009). A recent study introduced a copying model to infer population relationships (Hellenthal et al., 2008), but it was based on SNP data of sparse density (about 4.1 kb per SNP) and small sample size.

In this study, we analyzed 20,177 SNPs on chromosome 21 with a density of 1.6 kb per SNP in 11 human populations representing Africa, Europe, and East Asia. The high-density markers of this data set allow us to investigate the fine-scale recombination pattern across the chromosome and to explore multiple chromosomal regions of various sizes. We first investigated whether recombination information in the genome could be used to study human population genetic relationships, that is, whether the correlation of recombination events and their frequencies among populations could reflect population divergence history. Since recombination events cannot be precisely estimated, we propose chromosome-wide haplotype sharing (CHS) in sliding windows to capture chromosome-wide recombination information in human populations. This approach integrates information of both recombination and drift without relying on precise estimation of the recombination parameter per se. We demonstrated that CHS can be used to reconstruct reliable relationships among human populations, and showed that it could be partitioned into the contributions of recombination and genetic drift. We further conducted simulation studies to investigate properties of CHS and compared it with statistics based on single SNPs; the results showed that HS provides much higher resolution than statistics based on single SNPs, especially when the relationship of closely related human populations is concerned. We also explored the appropriate size of genomic regions for CHS analysis to correctly reconstruct phylogenies and efficiently reveal the population genetic relationship.

Methods


Populations and Samples

Overall 11 human population samples representing Africa, Europe, and East Asia were studied. DNA samples from 48 African-Americans (AfA) (Xu et al., 2007) and 40 Europeans (CAU) were obtained from Coriell Cell Repositories. The five Chinese population samples, 46 Han Chinese (HAN), 42 Chuangs (CHU), 42 Hmongs (HMO), 43 Was (AVA), and 40 Uyghurs (UIG), represent five major linguistic families in East Asia and have been described elsewhere (Huang et al., 2006; Xu & Jin, 2008; Xu et al., 2008). Four HapMap population samples (60 YRI, Yoruba from Ibadan, Nigeria; 60 CEU, Utah residents with ancestry from northern and western Europe; 45 CHB, Han Chinese in Beijing; and 44 JPT, Japanese in Tokyo) (The International HapMap Consortium, 2003, 2005, 2007) were also included in this study. Only unrelated individuals were analyzed in this study. Please refer to Table 1 for more information about population samples.

Table 1.  Information on population samples.

Sample ID   Ethnicity           Geographical location   Sample size
AVA         Wa                  Yunnan, China           43
HMO         Hmong               Guizhou, China          42
CHU         Chuang              Guangxi, China          42
HAN         Han Chinese         Shanghai, China         46
CHB         Han Chinese         Beijing, China          45
JPT         Japanese            Tokyo, Japan            44
UIG         Uyghur              XinJiang, China         40
CEU         European American   USA                     60
CAU         Caucasian           European                40
AfA         African American    USA                     48
YRI         Yoruba              Nigeria                 60

Markers and Their Positions

A set of 29,177 SNPs on chromosome 21 was genotyped in 48 AfA, 40 CAU, 46 HAN, 42 CHU, 42 HMO, 43 AVA, and 40 UIG. Illumina Beadlab™ technology (Illumina, Inc., San Diego, CA, USA) was used for genotyping, and the genotyping method has been described elsewhere (Huang et al., 2006). Genotyped SNPs on chromosome 21 of 60 CEU, 60 YRI, 45 CHB, and 44 JPT were downloaded from the web site of The International HapMap Project (HapMap public release #23a, 2008-04-01). After the necessary data filtration (for example, deleting markers with missing data in >5% of samples and excluding SNPs showing deviation from Hardy-Weinberg equilibrium within a population; Fisher's exact test, P < 0.05, where P was estimated using Arlequin 3.0 with 100,000 permutations (Schneider et al., 2000)), we obtained 20,177 SNPs that were genotyped successfully in all 11 population samples. The physical positions of SNPs were based on the Homo sapiens Genome Build 36. The total chromosome region studied is 33.4 Mb. The average spacing between adjacent markers was 1.6 kb, with a minimum of 69 bp and a maximum of 189 kb.
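The missing-data criterion described above can be sketched as follows. This is only an illustration under stated assumptions, not the filtering pipeline used in the study: the genotype encoding (None for a missing call), the function name, and the toy data are hypothetical, and the Hardy-Weinberg test is omitted.

```python
def filter_by_missingness(genotypes, max_missing=0.05):
    """Return indices of SNPs whose missing-call rate is at most max_missing.

    genotypes: list of per-SNP lists, one entry per sample; None marks a
    missing call (an assumed encoding, not taken from the paper).
    """
    kept = []
    for snp_index, calls in enumerate(genotypes):
        missing_rate = sum(call is None for call in calls) / len(calls)
        if missing_rate <= max_missing:
            kept.append(snp_index)
    return kept


# Toy data: 3 SNPs typed in 20 samples.
toy = [
    ["AA"] * 20,                 # no missing calls: kept
    ["AG"] * 19 + [None],        # exactly 5% missing: kept (not >5%)
    ["GG"] * 18 + [None, None],  # 10% missing: removed
]
kept_snps = filter_by_missingness(toy)
```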

Statistical Analysis

Haplotype estimation

Haplotypes were estimated for each individual from its genotypes with fastPHASE (Scheet & Stephens, 2006) version 1.2. “Population labels” were applied during the model fitting procedure to enhance accuracy. The number of random starts of the EM algorithm (-T) was set to 20, and the number of iterations of the EM algorithm (-C) was set to 50. This analysis was used to generate a “best guess” estimate of the true underlying patterns of haplotype structure (Scheet & Stephens, 2006).

Genetic Distance for Populations

Three genetic distance measurements, FST (Weir & Hill, 2002), Nei's standard distance (Nei, 1972), and Nei's DA (Nei et al., 1983) were used to estimate genetic divergence among populations.
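For biallelic SNPs, two of these distances reduce to short expressions in the allele frequencies. The sketch below uses the textbook forms of Nei's standard distance, D = −ln(J_XY / sqrt(J_X J_Y)), and Nei's DA, 1 − mean over loci of Σ sqrt(x_i y_i). It assumes population allele frequencies are given directly (no small-sample bias correction), and the function names are illustrative; FST following Weir & Hill (2002) is not shown.

```python
import math


def nei_standard_distance(p, q):
    """Nei's (1972) standard distance for biallelic loci.

    p, q: per-locus frequencies of one allele in populations X and Y.
    No sample-size bias correction is applied (a simplification).
    """
    n_loci = len(p)
    j_x = sum(a * a + (1 - a) ** 2 for a in p) / n_loci
    j_y = sum(b * b + (1 - b) ** 2 for b in q) / n_loci
    j_xy = sum(a * b + (1 - a) * (1 - b) for a, b in zip(p, q)) / n_loci
    return -math.log(j_xy / math.sqrt(j_x * j_y))


def nei_da_distance(p, q):
    """Nei's DA (Nei et al., 1983) for biallelic loci."""
    n_loci = len(p)
    total = sum(
        math.sqrt(a * b) + math.sqrt((1 - a) * (1 - b)) for a, b in zip(p, q)
    )
    return 1 - total / n_loci
```

Both functions return 0 for identical frequency vectors and grow as the frequencies diverge.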

Estimates of Haplotype Sharing Between Populations

Basically, HS between populations was estimated as the proportion of shared haplotypes in the populations compared (The HUGO Pan-Asian SNP Consortium, 2009; Xu et al., 2009). Suppose we have two populations, A and B, and let n(HA) and n(HB) be the total number of haplotypes observed in populations A and B, respectively; n(HA) and n(HB) equal twice the number of persons studied in populations A and B, respectively. We denote the ith distinct haplotype in population A by HAi, whose frequency is denoted by fAi. Similarly, the jth distinct haplotype in population B and its frequency are denoted by HBj and fBj, respectively.

HS between population A and B (HSAB) is defined as:

  • image

In the HS calculation, HAi and HBj are both replaced by a {0, 1} indicator matrix, where 0 indicates that the ith distinct haplotype in population A is private to population A, and 1 indicates the ith distinct haplotype in population A is also common in population B. The same rules are applied to HBj, that is, 0 indicates that the jth distinct haplotype is private to population B, while 1 indicates that the jth distinct haplotype is also common in population A.

Haplotype sharing distance between population A and B (HSDAB) is estimated as:

  • image

In some special cases, private haplotypes in a population also provide important information for population genetic history (Xu et al., 2009). Using the earlier notation, the proportion of private haplotypes, for example, in population A (HSAp) can be defined as:

  • image

In practice, the proportion of private haplotypes has special values in distinguishing recent admixture from shared ancestry. It has been applied in two recent studies (The HUGO Pan-Asian SNP Consortium, 2009; Xu et al., 2009) but will not be further discussed here.
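The equations themselves are not reproduced in this text, so the sketch below should be read as one interpretation of the verbal definitions rather than the exact formulas. It assumes that HS is the fraction of haplotype copies, pooled over both populations, whose haplotype is also observed in the other population (that is, each population's indicator-weighted frequency sum is weighted by n(HA) or n(HB)), and that the private proportion is the fraction of copies in one population whose haplotype is absent from the other. Function names and toy haplotypes are hypothetical.

```python
from collections import Counter


def haplotype_sharing(haps_a, haps_b):
    """Fraction of haplotype copies (pooled over A and B) that are shared.

    haps_a, haps_b: lists of haplotype strings for one window, one entry
    per chromosome. A haplotype counts as shared (indicator 1) if it is
    observed in both populations.
    """
    count_a, count_b = Counter(haps_a), Counter(haps_b)
    shared = set(count_a) & set(count_b)
    shared_copies = sum(count_a[h] + count_b[h] for h in shared)
    return shared_copies / (len(haps_a) + len(haps_b))


def private_proportion(haps, haps_other):
    """Fraction of copies in one population whose haplotype is private."""
    other = set(haps_other)
    return sum(1 for h in haps if h not in other) / len(haps)


pop_a = ["0101", "0101", "1100", "0011"]
pop_b = ["0101", "1111", "1111", "0011"]
hs_ab = haplotype_sharing(pop_a, pop_b)   # 5 of 8 copies are shared
```

Under this reading, HS equals one minus the copy-weighted average of the two private proportions.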

As mentioned earlier, considering the substantial variation of recombination across the human genome (McVean et al., 2004; Myers et al., 2005), we adopted a sliding window strategy and HS was calculated in each window (5 kb ∼ 500 kb bin) for population pairs. The adjacent sliding windows were overlapped by half of the window, that is, the sliding window moves forward half of the given distance bin each time. The HS calculation has been implemented in a computer program package (PEAS v1.0) (Xu et al., 2010).
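The half-overlapping window scheme can be sketched as below. This is not the PEAS implementation: the function names are illustrative, windows are treated as half-open physical intervals, and a trailing partial window is simply dropped, which is an assumption the text does not specify.

```python
from bisect import bisect_left


def half_overlap_windows(start, end, size):
    """Windows of `size` bp that advance by half a window each step."""
    step = size // 2
    windows = []
    left = start
    while left + size <= end:
        windows.append((left, left + size))
        left += step
    return windows


def snps_in_window(positions, window):
    """Indices of SNPs with window[0] <= position < window[1].

    positions must be sorted in ascending order.
    """
    lo = bisect_left(positions, window[0])
    hi = bisect_left(positions, window[1])
    return list(range(lo, hi))


windows = half_overlap_windows(0, 20_000, 10_000)
```

HS would then be computed from the haplotypes formed by the SNPs falling in each window and averaged over windows of the same size.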

Since the results could be affected by differences in sample size among populations, we sampled 80 chromosomes (equal to the number of chromosomes in 40 individuals) with replacement in each population and counted the number of haplotypes in each genomic interval. The sampling procedure was repeated 100 times and the results were averaged for each genomic interval.
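A minimal sketch of this resampling step for a single genomic interval is given below. The function name, the fixed seed, and the use of the count of distinct haplotypes as the summarized quantity are illustrative assumptions.

```python
import random


def resampled_mean_distinct(haps, n_chrom=80, n_rep=100, seed=0):
    """Average number of distinct haplotypes over resampled sets.

    haps: haplotypes (one per chromosome) of one population in one
    interval. Each replicate draws n_chrom chromosomes with replacement.
    """
    rng = random.Random(seed)
    total = 0
    for _ in range(n_rep):
        sample = rng.choices(haps, k=n_chrom)
        total += len(set(sample))
    return total / n_rep
```

Drawing the same number of chromosomes from every population puts the haplotype counts on a common footing regardless of the original sample size.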

Tree Reconstruction


Distance based population trees were reconstructed using the Neighbor-Joining (NJ) algorithm (Saitou & Nei, 1987) with the Molecular Evolutionary Genetics Analysis software package (MEGA version 4.0) (Tamura et al., 2007). A maximum-likelihood tree of populations was reconstructed using the maximum-likelihood method (Felsenstein, 1973) with the CONTML program in the PHYLIP package (Felsenstein, 1989).

Recombination Parameter Estimation


The PHASE software implements a Bayesian statistical method for reconstructing haplotypes and estimating recombination parameters from population genotype data (Stephens et al., 2001; Li & Stephens, 2003; Stephens & Donnelly, 2003). A “background” recombination rate was given as a prior; any increase over the background rate (“hotspots”) was assumed to occur as a Poisson process, and the width of the hotspot was assumed to have a truncated normal distribution. We divided the 20,177 SNPs into sections (windows) of 40 SNPs with 20 overlapping SNPs between each two consecutive sections. PHASE was run for each section of chromosomal data with recombination model (-MR), 10,000 iterations, 100 thinning interval, and 10,000 burn-ins. X1000 option was invoked to obtain more accurate estimates, which increased the number of iterations of the final runs to be 1000 times longer than other runs. The other parameters were set as the default.
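The partition of the marker set into overlapping sections can be sketched as follows. The function name is illustrative, and the handling of SNPs left over after the last full section is an assumption (they are simply not covered here); the text does not say how they were treated.

```python
def snp_sections(n_snps, size=40, overlap=20):
    """Half-open index ranges of `size` SNPs sharing `overlap` SNPs."""
    step = size - overlap
    return [(s, s + size) for s in range(0, n_snps - size + 1, step)]


sections = snp_sections(20_177)
```

With 20,177 SNPs this scheme yields 1007 full sections, the last one ending at SNP index 20,160.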

Linkage Disequilibrium Measures and Calculation

In this study, linkage disequilibrium (LD) was also used to measure the recombination magnitude between adjacent SNPs. Several statistics have been used to measure the LD between a pair of loci (Jorde, 1995). The two most common measures are the absolute value of D′ (denoted by |D′| hereafter), and r2, both derived from Lewontin's D (Lewontin, 1964). In this study, both |D′| and r2 were used to measure LD between adjacent SNPs, and were calculated from haplotype data inferred by fastPHASE. Estimates of |D′| were calculated following Devlin and Risch (1995), while estimates of r2 were calculated following Hill and Weir (1994).
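For a pair of biallelic SNPs, both measures follow from Lewontin's D = p_AB − p_A p_B. The sketch below uses the standard textbook definitions (|D′| = |D| / D_max and r2 = D² / [p_A(1 − p_A) p_B(1 − p_B)]) computed from phased haplotypes coded 0/1; it does not reproduce the specific estimators of Devlin and Risch (1995) or Hill and Weir (1994), and the function name is illustrative.

```python
def ld_stats(haplotypes):
    """Return (D, |D'|, r2) for two biallelic loci.

    haplotypes: list of (allele_at_locus1, allele_at_locus2) pairs,
    alleles coded 0/1. Returns NaN for |D'| and r2 if either locus is
    monomorphic.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n
    p_b = sum(h[1] for h in haplotypes) / n
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return d, float("nan"), float("nan")
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, abs(d) / d_max, d * d / denom
```

Two loci in complete association, for example [(0, 0), (0, 0), (1, 1), (1, 1)], give |D′| = 1 and r2 = 1, while four equally frequent haplotypes give 0 for both.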

Forward-Time Simulation Studies

To investigate the decay of HS between populations as the divergence time increases, and to evaluate the ability of HS to reconstruct the correct phylogeny, we conducted forward-time simulations. We explored the appropriate window size in which HS can reconstruct the correct phylogeny of populations with various divergence times. A most recent common ancestor (MRCA) population with effective population size (Ne) 10,000 was created from 120 YRI chromosomes based on 20,177 SNPs on chromosome 21. A recombination rate of 1 cM/Mb/generation was used to break the chromosomes. Populations split and diverged hierarchically every five generations. All populations were simulated with a constant Ne of 10,000 and without a bottleneck so that the effect of drift was reduced to a minimum level. HS analysis was performed using the same procedure as that used for the empirical data. Phylogenetic trees based on the haplotype sharing measure were reconstructed using haplotypes in various window sizes (e.g., 5 kb–500 kb). For each given divergence time, the minimum window size was determined as the size at which the topology of the phylogenetic tree was consistent with the presimulated one and was stable, with 100% bootstrap values in all clades.
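The general idea of such a forward-time simulation can be sketched with a toy Wright-Fisher model. This is a strongly simplified illustration, not the simulator used in the study: haplotypes are strings, each offspring chromosome has at most one crossover with a fixed probability (for example 0.334, since 1 cM/Mb over 33.4 Mb corresponds to about 0.334 expected crossovers), breakpoints are uniform over marker indices rather than physical distance, and all names are hypothetical.

```python
import random


def next_generation(population, n_offspring, crossover_prob, rng):
    """One Wright-Fisher generation with at most one crossover per child."""
    length = len(population[0])
    offspring = []
    for _ in range(n_offspring):
        first = rng.choice(population)
        second = rng.choice(population)
        if rng.random() < crossover_prob:
            cut = rng.randrange(1, length)  # breakpoint between markers
            child = first[:cut] + second[cut:]
        else:
            child = first
        offspring.append(child)
    return offspring


def simulate_split(ancestral, n_chrom, generations, crossover_prob, seed=0):
    """Split an ancestral pool into two populations that evolve apart."""
    rng = random.Random(seed)
    pop_a = [rng.choice(ancestral) for _ in range(n_chrom)]
    pop_b = [rng.choice(ancestral) for _ in range(n_chrom)]
    for _ in range(generations):
        pop_a = next_generation(pop_a, n_chrom, crossover_prob, rng)
        pop_b = next_generation(pop_b, n_chrom, crossover_prob, rng)
    return pop_a, pop_b
```

Haplotype sharing between the two returned populations can then be tracked as the number of generations grows.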

Partial and Multiple Mantel Tests for the Effect of Recombination and Drift on Haplotype Sharing

We used Mantel tests under a multiple correlation and regression design to simultaneously evaluate the contribution of genetic drift and recombination to HS among populations. There are three different matrices to be analyzed which are obtained from pairwise population comparisons: (1) HS matrix; (2) pairwise FST matrix, and (3) pairwise recombination correlation coefficient matrix. In this case, it would be possible to establish which part of the total explained variance of HS matrix could be attributed to drift or recombination. These relative values could be obtained simply by performing Mantel tests, using each effect separately and combined into a single model.
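A partial Mantel test of this kind can be sketched as below. The sketch assumes Pearson correlation on the upper-triangle entries, the first-order partial correlation r_xy.z = (r_xy − r_xz r_yz) / sqrt((1 − r_xz²)(1 − r_yz²)), and significance from joint row and column permutations of the HS matrix; the exact software and permutation scheme used in the study are not stated here, and all names are illustrative.

```python
import math
import random


def upper(matrix):
    """Flatten the upper triangle (excluding the diagonal)."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_from_r(rxy, rxz, ryz):
    """First-order partial correlation of x and y controlling for z."""
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))


def partial_r(x, y, z):
    return partial_from_r(pearson(x, y), pearson(x, z), pearson(y, z))


def partial_mantel(hs, rec, fst, n_perm=999, seed=0):
    """Partial correlation of hs and rec given fst, with permutation P."""
    rng = random.Random(seed)
    y, z = upper(rec), upper(fst)
    observed = partial_r(upper(hs), y, z)
    n = len(hs)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(n_perm):
        order = list(range(n))
        rng.shuffle(order)
        permuted = [[hs[i][j] for j in order] for i in order]
        if abs(partial_r(upper(permuted), y, z)) >= abs(observed):
            hits += 1
    return observed, hits / (n_perm + 1)
```

Swapping the roles of the recombination and FST matrices gives the partial correlation of HS with FST controlling for recombination.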

Results


Correlation between Recombination and Genetic Divergence

We first estimated population recombination rates (ρ) in 11 populations (Table 1) using the program PHASE 2.1 (see Methods section for details). Figure 1 displays ρ values along the 33.4 Mb region of chromosome 21 across 11 populations. The average ρ varies substantially among populations (Fig. 2). For example, when ρ was estimated from markers with minor allele frequency (MAF) > 0, AVA showed the smallest average ρ (0.19 per kb), while AfA showed the largest average ρ (1.76 per kb).

Figure 1. Estimated ρ values along a 33.4-Mb region of chromosome 21. Each series of points shows the estimates of log10(ρ) per bp. To separate the curves on the Y-axis, a multiple of an arbitrary constant (c = 4) was added to the individual estimates for samples other than YRI, that is, 4 was added to AfA, etc. Note both the large variation along the chromosome and the similar positions of the troughs and spikes shared among the 11 population samples.

Figure 2. Estimated population recombination rate (ρ) in 11 populations. Y-axis shows the ρ values per kb which were averaged from 40-SNPs windows for each population. Population IDs are displayed on the X-axis.

Although the average ρ varies substantially among populations (Fig. 2), the chromosome-wide recombination pattern (cREC) is highly correlated among populations as shown by high Spearman rank correlation coefficients (Table 2, Table S1 and S2), indicating that the genomic positions of low and high values of ρ are consistent across populations. Similar results were obtained using the LDhat program (McVean et al., 2004). High correlation among populations could also be observed for regional LD (e.g., r2) since the decay of LD is largely dictated by recombination (Table 3, Table S3 and S4, Fig. S1).

Table 2.  Spearman rank correlations of recombination rate (ρ, above diagonal) and genetic divergence (FST, below diagonal) between populations. FST was averaged over all windows of the same distance bin for a given pair of populations.

       JPT     CHB     HAN     CHU     HMO     AVA     UIG     CAU     CEU     AfA     YRI
JPT    -       0.893   0.898   0.885   0.886   0.874   0.876   0.833   0.849   0.792   0.772
CHB    0.007   -       0.909   0.892   0.892   0.881   0.890   0.847   0.862   0.807   0.791
HAN    0.008   0.000   -       0.901   0.895   0.887   0.888   0.849   0.860   0.806   0.788
CHU    0.019   0.010   0.010   -       0.895   0.887   0.880   0.847   0.855   0.802   0.787
HMO    0.022   0.013   0.012   0.011   -       0.878   0.867   0.837   0.846   0.783   0.768
AVA    0.029   0.018   0.021   0.022   0.026   -       0.871   0.827   0.844   0.782   0.776
UIG    0.036   0.037   0.037   0.039   0.043   0.048   -       0.885   0.888   0.826   0.801
CAU    0.101   0.102   0.102   0.098   0.104   0.108   0.024   -       0.904   0.812   0.780
CEU    0.105   0.107   0.107   0.103   0.109   0.113   0.028   0.001   -       0.813   0.780
AfA    0.137   0.139   0.138   0.134   0.141   0.139   0.092   0.104   0.109   -       0.891
YRI    0.182   0.185   0.183   0.180   0.188   0.183   0.146   0.162   0.166   0.008   -

Note: All Spearman's ρ are significantly different from zero by one-tailed test (P < 10−6).

Table 3.  Spearman rank correlations of LD (|D′|, above diagonal; r2, below diagonal) between populations.

       JPT     CHB     HAN     CHU     HMO     AVA     UIG     CAU     CEU     AfA     YRI
JPT    -       0.566   0.534   0.516   0.530   0.532   0.451   0.369   0.373   0.313   0.279
CHB    0.932   -       0.560   0.549   0.557   0.530   0.483   0.398   0.405   0.323   0.289
HAN    0.931   0.945   -       0.565   0.523   0.508   0.477   0.389   0.378   0.324   0.286
CHU    0.919   0.927   0.935   -       0.571   0.545   0.462   0.398   0.387   0.314   0.291
HMO    0.910   0.927   0.926   0.929   -       0.536   0.448   0.370   0.380   0.287   0.270
AVA    0.897   0.915   0.910   0.916   0.909   -       0.454   0.392   0.399   0.328   0.284
UIG    0.849   0.866   0.859   0.858   0.847   0.861   -       0.515   0.538   0.399   0.342
CAU    0.773   0.784   0.779   0.785   0.771   0.786   0.908   -       0.601   0.389   0.319
CEU    0.769   0.787   0.778   0.786   0.773   0.788   0.916   0.950   -       0.388   0.331
AfA    0.634   0.643   0.638   0.648   0.635   0.653   0.725   0.721   0.721   -       0.645
YRI    0.556   0.560   0.553   0.568   0.554   0.570   0.620   0.613   0.613   0.889   -

Note: All Spearman's ρ are significantly different from zero by one-tailed test (P < 10−6).

Interestingly, the between-population correlation of cREC is strongly correlated with the differentiation between populations as measured by FST (Fig. 3). It is obvious that the cREC and LD patterns between populations are both strongly correlated with the genetic differences (FST) between populations as indicated by large R2 (0.92 for ρ, 0.96 for r2, and 0.94 for |D′|). Furthermore, population trees reconstructed from Spearman rank correlation coefficients of both recombination and LD between populations (Fig. S2 and S3) were very similar to the maximum-likelihood tree (Fig. S4A) and the NJ tree using Nei's DA distance (Fig. S4C) reconstructed from single SNPs, suggesting that recombination information reflects the genetic relationship among human populations or population divergence history.

Figure 3. The relationship between FST and the Spearman rank correlation of recombination rate (ρ) and of LD (|D′| and r2) between populations. Spearman correlation coefficients are −0.955, −0.961, and −0.956 for FST versus ρ, FST versus r2, and FST versus |D′|, respectively; P < 10−6 (Mantel test). The regression formulas of the three lines are: y = −0.68x + 0.90 (R2 = 0.92), y = −2.07x + 0.96 (R2 = 0.96), and y = −1.61x + 0.56 (R2 = 0.94), for ρ, r2, and |D′|, respectively.

Correlation between Haplotype Sharing and Genetic Divergence

HS statistics were calculated among 11 human populations in sliding windows of given size (see Methods section for details). The results from the same window size were averaged and led to the CHS statistic. The HS calculated in each distance bin (5 kb–500 kb) for 55 population pairs were shown in Tables S5–S13, respectively. As expected, populations share more haplotypes within short distance bins and closely related populations share more haplotypes. Overall, as shown in Figure 4A, CHS and FST values in all distance bins are all strongly correlated (Mantel test, r < −0.925, P < 10−4). The magnitude of correlation increases with bin size from 5 kb (r=−0.950) to 50 kb (r=−0.954), but starts to decrease beyond 100 kb (r=−0.949).

Figure 4. Relationship of haplotype sharing proportions (HSP), correlation of recombination rates and FST. HSP, recombination rates and pairwise FST were calculated from 20,177 SNPs for 55 population pairs. Both recombination rates and FST were averaged over all windows of the same distance bin. (A): Relationship of HSP and FST. (B) Relationship of HSP and recombination. Correlation coefficients are shown in Table 4.

Correlation between Haplotype Sharing and Recombination

Considering the strong correlations between recombination and FST, and the correlation between CHS and FST as we observed earlier, it is expected that cREC and HS are also highly correlated. We further investigated the correlation between cREC and CHS in different distance bins (5–500 kb). Overall, as shown in Figure 4B, cREC and CHS in all distance bins are strongly correlated (Mantel test, r > 0.905, P < 10−4). Correlation magnitudes increase from the 5 kb bin (r= 0.965) to the 50 kb bin (r= 0.967), and also decrease from the 100 kb bin (r= 0.956).

Partition Variation of Haplotype Sharing into Drift and Recombination

Since the CHS values in this study were calculated from haplotypes in sliding windows, they are expected to contain information on both drift and recombination. It is helpful to know the respective contributions of recombination and drift to the HS measurement. We thus simultaneously evaluated the contribution of genetic drift and recombination to HS among populations using Mantel tests under a multiple correlation and regression design (see Methods section for details); the results are shown in Table 4. In distance bins smaller than 100 kb, the partial correlations of CHS and cREC are all significant (P ≤ 0.01), and about 40% of CHS variation can be attributed to cREC, while the partial correlations of CHS and FST are not significant (P > 0.05), and only about 10% of CHS variation can be attributed to FST. In contrast, in distance bins greater than 100 kb, the partial correlations of CHS and cREC are not significant (P > 0.07), and only about 10% of CHS variation can be attributed to cREC, while the partial correlations of CHS and FST are all significant (P < 0.05), and about 20% of CHS variation can be attributed to FST. Therefore, both drift and recombination contribute information to HS, but for genomic regions within 100 kb, the contribution of recombination is predominant and that of genetic drift is relatively small.

Table 4.  Correlation and partial correlation of haplotype sharing, genetic distance (FST) and recombination between populations.

Window    Correlation                   Partial correlation
          FST         Recombination     FST         Recombination
5 kb      −0.950      0.965             −0.318      0.599
          (<0.0001)   (<0.0001)         (0.066)     (0.010)
10 kb     −0.952      0.968             −0.310      0.629
          (<0.0001)   (<0.0001)         (0.071)     (0.008)
20 kb     −0.952      0.969             −0.312      0.638
          (<0.0001)   (<0.0001)         (0.075)     (0.008)
30 kb     −0.954      0.970             −0.322      0.642
          (<0.0001)   (<0.0001)         (0.072)     (0.006)
40 kb     −0.954      0.969             −0.332      0.633
          (<0.0001)   (<0.0001)         (0.069)     (0.008)
50 kb     −0.954      0.967             −0.355      0.608
          (<0.0001)   (<0.0001)         (0.055)     (0.008)
100 kb    −0.949      0.956             −0.375      0.508
          (<0.0001)   (<0.0001)         (0.047)     (0.024)
200 kb    −0.944      0.940             −0.432      0.363
          (<0.0001)   (<0.0001)         (0.020)     (0.074)
500 kb    −0.928      0.906             −0.492      0.137
          (<0.0001)   (<0.0001)         (0.004)     (0.271)

Note: Numbers in parentheses are P-values.

Decay of HS with Increasing Divergence Time

Since CHS between populations is contributed by both recombination and drift, it is expected to decay as divergence time increases between populations. We investigated the decay of CHS by forward-time simulation (see Methods section). The results showed that CHS in larger windows decayed more than that in smaller windows (Fig. 5). For example, 71.1% of CHS remained in the 5 kb windows, while only 17.3% of CHS remained in the 50 kb windows and less than 1% of CHS remained in the 500 kb windows for populations diverged 5000 generations ago. To compare the results with those of single SNPs, we calculated average FST from 500 kb windows and used (1 −FST) as the measure for similarity between populations. CHS is significantly decayed compared with that of (1 −FST) (Fig. 5). This result suggested the higher resolution of CHS than FST calculated from the same number of SNPs in distinguishing populations, especially for those populations with very short divergence time.

Figure 5. Decay of haplotype sharing as a function of the increase of population divergence time. Results obtained from computer simulation assuming constant population size of 10,000 for all populations so that the effect of genetic drift is minimal. Haplotype sharing was calculated in sliding windows (5 kb–500 kb bin) for population pairs. Simulations were repeated 1000 times and results were averaged. The decay of haplotype sharing percentage is compared with the decay of (1 −FST) which can be taken as allele sharing of single SNPs. FST values were calculated from single SNPs within 500 kb windows and averaged for each time scale.

Window Size and Power of HS to Distinguish Populations

In haplotype-based analysis, it may be difficult to determine the ideal window size. We thus investigated different bin sizes and determined, by simulation, the power of CHS in each bin size to reconstruct reliable population phylogenies (see Methods section). The results showed that the ability to reconstruct a correct phylogeny of populations is very low for individual windows of small size (Fig. 6). For example, less than 70% of windows smaller than 40 kb showed the correct phylogeny when they were individually used to reconstruct population trees, that is, when population trees were reconstructed from a single window of a given size. This percentage remained below 70% regardless of how long ago the populations diverged. Generally speaking, larger windows showed higher percentages of correct population phylogenies. For example, more than 75% of 100 kb windows showed the correct population phylogeny for populations that diverged between 20 and 500 generations ago. However, even for large windows (>100 kb), the ability to reconstruct the correct phylogeny remained high only if the populations had not diverged for a very long time. For example, more than 77% of 200 kb windows led to the correct population phylogeny for populations that diverged 10 to 300 generations ago, and more than 79% of 500 kb windows led to the correct population phylogeny for populations that diverged 5 to 100 generations ago. However, this does not mean that small windows cannot be used to reconstruct correct phylogenies. As shown in Figure 7, the tree topologies within 100 kb bins were supported by 100% bootstrap values when CHS was used to reconstruct population relationships. Considering the tradeoff between achieving high resolution and robustness of the topology, we recommend using CHS within 100 kb windows for reconstructing genetic relationships of human populations.

Figure 6. The ability to recover correct phylogeny of haplotype sharing in different window sizes. Phylogenies were reconstructed from simulated haplotypes within each window of given size; the percentage of windows showing correct phylogeny were calculated for each window size (5 kb–500 kb bin). Simulations were repeated 1000 times and results were averaged.

Figure 7. Population trees reconstructed based on haplotype sharing distance (HSD). (A) HSD calculated from 5 kb bins; (B) HSD calculated from 10 kb bins; (C) HSD calculated from 20 kb bins; (D) HSD calculated from 30 kb bins; (E) HSD calculated from 40 kb bins; (F) HSD calculated from 50 kb bins; (G) HSD calculated from 100 kb bins; (H) HSD calculated from 200 kb bins; (I) HSD calculated from 500 kb bins. Italic numbers on trees are bootstrap values obtained by sampling data 1000 times with replacement.

Discussion


In this study, we have proposed HS as a measure of genetic relationship among human populations, and have shown in a set of high-density SNP data that HS can be used to reconstruct reliable genetic relationships among human populations. The primary advantage of the HS measurement is the increased resolution when closely related populations are compared, especially when data are from common variants such as SNPs with MAF of 5% or more. For the majority of common SNPs in the human genome, the differences between populations lie not in the absence or presence of a certain allele, but in the allele frequencies. Furthermore, closely related populations often share very similar allele frequencies at most of the common loci. Therefore, single SNPs are expected to provide very limited resolution when closely related populations are compared. The HS measurement provides additional resolution as it contains recombination information, which can reveal further genetic differences between populations.

Both drift and recombination contribute to the genetic divergence between populations; while the former removes polymorphism within a population, the latter increases diversity within and between populations. Previous studies reported a positive correlation between recombination rates and levels of genetic variation within human populations (Nachman et al., 1998; Nachman, 2001; Hellmann et al., 2003; Hellmann et al., 2005). Others reported positive correlations between recombination rates and levels of genetic divergence between human and mouse (Sachidanandam et al., 2001; Lercher & Hurst, 2002; Hardison et al., 2003), and between other species (Migone et al., 1983; Roselius et al., 2005). However, there are also studies showing only a weak correlation of recombination rates among rat, mouse and human (Jensen-Seaman et al., 2004), and showing that recombination rate varies greatly between bird species with highly conserved genome structures (Dawson et al., 2007). In this study, we showed that chromosome-wide recombination patterns among populations are strongly correlated and that this correlation is in accordance with the genetic differences between human populations, indicating that recombination events in different human populations are evolutionarily related. We further demonstrated by multiple Mantel testing that the increased resolution of HS is due to the additional recombination information contained in haplotypes, that is, the majority (about 40%, for genomic regions <100 kb) of variation in HS can be attributed to recombination. Therefore, it is biologically reasonable to use HS, which integrates both drift and recombination information, to study human population relationships. Our current study is based on data from a single chromosome, but our results and method can be extended to the other chromosomes and even the entire genome.
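For readers unfamiliar with Mantel testing, the sketch below shows a simple (two-matrix) permutation Mantel test. It is an illustration under stated assumptions: the analysis described above used multiple Mantel testing to partition variation among several matrices, which this sketch does not reproduce. The function name `mantel_test`, the toy coordinates, and the two toy distance matrices are all hypothetical.

```python
import numpy as np


def mantel_test(dist_x, dist_y, n_perm=999, seed=0):
    """Simple Mantel test: Pearson correlation between the upper triangles
    of two square distance matrices, with a one-sided permutation p-value."""
    x = np.asarray(dist_x, dtype=float)
    y = np.asarray(dist_y, dtype=float)
    n = x.shape[0]
    upper = np.triu_indices(n, k=1)          # off-diagonal upper triangle
    r_obs = np.corrcoef(x[upper], y[upper])[0, 1]

    rng = np.random.default_rng(seed)
    n_as_extreme = 0
    for _ in range(n_perm):
        perm = rng.permutation(n)
        # Permute rows and columns of y together (relabel populations).
        y_perm = y[np.ix_(perm, perm)]
        r_perm = np.corrcoef(x[upper], y_perm[upper])[0, 1]
        if r_perm >= r_obs:
            n_as_extreme += 1
    p_value = (n_as_extreme + 1) / (n_perm + 1)
    return r_obs, p_value


# Toy data (hypothetical): six populations placed on a line; the second
# matrix is a rescaled copy of the first, so the correlation is perfect.
coords = np.array([0.0, 1.0, 3.0, 7.0, 15.0, 31.0])
dist_a = np.abs(coords[:, None] - coords[None, :])
dist_b = 2.0 * dist_a

r, p = mantel_test(dist_a, dist_b)
print(f"Mantel r = {r:.3f}, permutation p = {p:.4f}")
```

Permuting rows and columns jointly, rather than shuffling matrix entries independently, preserves the dependence among distances that involve the same population, which is why a Mantel test is preferred over an ordinary correlation test for distance matrices.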

The main limitation of the HS measurement is that its magnitude depends on the size of the genomic region; that is, the HS magnitude in a 5 kb region will differ from that in a 500 kb region. We suggest that authors report the genomic region size as well as the number of markers when reporting HS analysis results; for example, the HS between population A and population B is 80% per 100 kb with an average number of markers. To obtain a reliable tree topology and effectively distinguish closely related populations, researchers can choose a window size appropriate to the estimated divergence time of the populations (refer to Fig. 6). However, as we have shown in this study, the relative relationships between populations are not substantially affected by the size of the genomic region when population trees are used to display them (Fig. 7). We estimated the expected magnitudes of HS for populations with different divergence times, but only under restricted scenarios with strong assumptions, such as a large and constant effective population size. Therefore, divergence time cannot be deduced from the HS observed in empirical data, where the demographic history of the studied populations could be more complicated than that simulated in this study.
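The dependence of HS on window size can be made concrete with a small sketch. Everything here is an assumption for illustration: the sharing definition (summed overlap of haplotype frequencies), the function names, the SNP positions, and the toy haplotypes, in which population B carries recombinants of the two haplotypes found in population A. The function also returns the mean number of markers per window, in line with the reporting suggestion above.

```python
from collections import Counter, defaultdict


def sharing_proportion(segs_a, segs_b):
    """Hypothetical sharing measure: summed overlap of haplotype frequencies."""
    freq_a, freq_b = Counter(segs_a), Counter(segs_b)
    return sum(
        min(freq_a[s] / len(segs_a), freq_b[s] / len(segs_b)) for s in freq_a
    )


def windowed_sharing(pop_a, pop_b, positions, window_bp):
    """Mean sharing and mean marker count over non-overlapping windows."""
    windows = defaultdict(list)
    for idx, pos in enumerate(positions):
        windows[pos // window_bp].append(idx)   # assign each SNP to a window

    shares, marker_counts = [], []
    for snp_idx in windows.values():
        segs_a = ["".join(h[i] for i in snp_idx) for h in pop_a]
        segs_b = ["".join(h[i] for i in snp_idx) for h in pop_b]
        shares.append(sharing_proportion(segs_a, segs_b))
        marker_counts.append(len(snp_idx))
    return sum(shares) / len(shares), sum(marker_counts) / len(marker_counts)


# Toy data: eight SNPs across 20 kb (positions in bp).
positions = [1000, 3000, 6000, 9000, 12000, 14000, 17000, 19000]
pop_a = ["00000000", "11111111"]
pop_b = ["00001111", "11110000"]   # recombinants of the haplotypes in pop_a

for window_bp in (5000, 10000, 20000):
    hs, markers = windowed_sharing(pop_a, pop_b, positions, window_bp)
    print(f"{window_bp // 1000} kb windows: HS = {hs:.2f}, "
          f"mean markers per window = {markers:.1f}")
```

In this toy case the two populations share every short segment, but none of the full-length haplotypes, so the same data yield very different HS values depending on the window size chosen.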

Theoretically, recombination could be avoided when HS analysis is applied to studies of human population relationships; for instance, one solution is to choose “haplotype block” regions of the human genome (Jeffreys et al., 2001; Ardlie et al., 2002; Gabriel et al., 2002), where recombination has occurred very rarely or not at all, for “perfect” HS analysis. Nevertheless, we found that this approach is infeasible in practice. One reason is that most haplotype blocks shared by multiple populations are likely to be very short, with few variants, and therefore provide very limited information and resolution. The other reason is that even if a long block shared by some populations is occasionally found, it can only be treated as a single locus, like Y-DNA, which would not reveal the genome-wide scenario. Moreover, such a long block is likely to be shared by only a small number of populations and is almost useless in other studies, which also makes it impractical to compare the results of different studies.

Conclusions

In summary, when used to reconstruct human phylogenies or study human genetic relationships, single-site based approaches ignore the vast recombination information in human genomes, and “haplotype block” based approaches cannot be generalized to either the entire genome or most human populations. The CHS we proposed in this study is a practical way of integrating recombination information without the computationally challenging and time-consuming estimation of the recombination parameter. We demonstrated in both empirical and simulated data that recombination differences and genetic differences among human populations are strongly correlated, and that our approach can be used to reconstruct reliable genetic relationships of human populations. The advantage of CHS is rooted in its integration of both drift and recombination information, which provides additional resolution, especially for recently separated populations.

Authors’ Contributions

SX conceived and designed the study, collected the data and performed the analysis. SX wrote the paper, with contributions from LJ. All authors read and approved the final manuscript. All authors declare that no competing financial interests exist.

Acknowledgements

SX was supported by the National Science Foundation of China (30971577), Shanghai Rising-Star Program (11QA1407600), the Science and Technology Commission of Shanghai Municipality (09ZR1436400), and the Science Foundation of The Chinese Academy of Sciences (KSCX2-EW-Q-1-11; KSCX2-EW-R-01-05; KSCX2-EW-J-15-05). LJ was supported by the National Science Foundation of China (30890034). SX also gratefully acknowledges the support of the K. C. Wong Education Foundation, Hong Kong. This work was also supported by the MoST International Cooperation Base of China.

References

  • Ardlie, K. G., Kruglyak, L. & Seielstad, M. (2002) Patterns of linkage disequilibrium in the human genome. Nat Rev Genet 3, 299–309.
  • Auton, A., Bryc, K., Boyko, A. R., Lohmueller, K. E., Novembre, J., Reynolds, A., Indap, A., Wright, M. H., Degenhardt, J. D., Gutenkunst, R. N., King, K. S., Nelson, M. R. & Bustamante, C. D. (2009) Global distribution of genomic diversity underscores rich complex history of continental human populations. Genome Res 19, 795–803.
  • Cavalli-Sforza, L. L. & Edwards, A. W. (1967) Phylogenetic analysis. Models and estimation procedures. Am J Hum Genet 19 (Suppl 19), 233–257.
  • Cavalli-Sforza, L. L., Piazza, A., Menozzi, P. & Mountain, J. (1988) Reconstruction of human evolution: Bringing together genetic, archaeological, and linguistic data. Proc Natl Acad Sci USA 85, 6002–6006.
  • Conrad, D. F., Jakobsson, M., Coop, G., Wen, X., Wall, J. D., Rosenberg, N. A. & Pritchard, J. K. (2006) A worldwide survey of haplotype variation and linkage disequilibrium in the human genome. Nat Genet 38, 1251–1260.
  • Dawson, D. A., Akesson, M., Burke, T., Pemberton, J. M., Slate, J. & Hansson, B. (2007) Gene order and recombination rate in homologous chromosome regions of the chicken and a passerine bird. Mol Biol Evol 24, 1537–1552.
  • Devlin, B. & Risch, N. (1995) A comparison of linkage disequilibrium measures for fine-scale mapping. Genomics 29, 311–322.
  • Felsenstein, J. (1973) Maximum-likelihood estimation of evolutionary trees from continuous characters. Am J Hum Genet 25, 471–492.
  • Felsenstein, J. (1989) PHYLIP - Phylogeny Inference Package (Version 3.2). Cladistics 5, 164–166.
  • Friedlaender, J. S., Friedlaender, F. R., Reed, F. A., Kidd, K. K., Kidd, J. R., Chambers, G. K., Lea, R. A., Loo, J. H., Koki, G., Hodgson, J. A., Merriwether, D. A. & Weber, J. L. (2008) The genetic structure of Pacific Islanders. PLoS Genet 4, 173–190.
  • Gabriel, S. B., Schaffner, S. F., Nguyen, H., Moore, J. M., Roy, J., Blumenstiel, B., Higgins, J., DeFelice, M., Lochner, A., Faggart, M., Liu-Cordero, S. N., Rotimi, C., Adeyemo, A., Cooper, R., Ward, R., Lander, E. S., Daly, M. J. & Altshuler, D. (2002) The structure of haplotype blocks in the human genome. Science 296, 2225–2229.
  • Hardison, R. C., Roskin, K. M., Yang, S., Diekhans, M., Kent, W. J., Weber, R., Elnitski, L., Li, J., O’Connor, M., Kolbe, D., Schwartz, S., Furey, T. S., Whelan, S., Goldman, N., Smit, A., Miller, W., Chiaromonte, F. & Haussler, D. (2003) Covariation in frequencies of substitution, deletion, transposition, and recombination during eutherian evolution. Genome Res 13, 13–26.
  • Hellenthal, G., Auton, A. & Falush, D. (2008) Inferring human colonization history using a copying model. PLoS Genet 4, 1–11.
  • Hellmann, I., Ebersberger, I., Ptak, S. E., Paabo, S. & Przeworski, M. (2003) A neutral explanation for the correlation of diversity with recombination rates in humans. Am J Hum Genet 72, 1527–1535.
  • Hellmann, I., Prufer, K., Ji, H., Zody, M. C., Paabo, S. & Ptak, S. E. (2005) Why do human diversity levels vary at a megabase scale? Genome Res 15, 1222–1231.
  • Hey, J. & Machado, C. A. (2003) The study of structured populations–new hope for a difficult and divided science. Nat Rev Genet 4, 535–543.
  • Hill, W. G. & Weir, B. S. (1994) Maximum-likelihood estimation of gene location by linkage disequilibrium. Am J Hum Genet 54, 705–714.
  • Huang, W., He, Y., Wang, H., Wang, Y., Liu, Y., Wang, Y., Chu, X., Wang, Y., Xu, L., Shen, Y., Xiong, X., Li, H., Wen, B., Qian, J., Yuan, W., Zhang, C., Wang, Y., Jiang, H., Zhao, G., Chen, Z. & Jin, L. (2006) Linkage disequilibrium sharing and haplotype-tagged SNP portability between populations. Proc Natl Acad Sci USA 103, 1418–1421.
  • Jakobsson, M., Scholz, S. W., Scheet, P., Gibbs, J. R., Vanliere, J. M., Fung, H. C., Szpiech, Z. A., Degnan, J. H., Wang, K., Guerreiro, R., Bras, J. M., Schymick, J. C., Hernandez, D. G., Traynor, B. J., Simon-Sanchez, J., Matarin, M., Britton, A., Van De Leemput, J., Rafferty, I., Bucan, M., Cann, H. M., Hardy, J. A., Rosenberg, N. A. & Singleton, A. B. (2008) Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451, 998–1003.
  • Jeffreys, A. J., Kauppi, L. & Neumann, R. (2001) Intensely punctate meiotic recombination in the class II region of the major histocompatibility complex. Nat Genet 29, 217–222.
  • Jensen-Seaman, M. I., Furey, T. S., Payseur, B. A., Lu, Y., Roskin, K. M., Chen, C. F., Thomas, M. A., Haussler, D. & Jacob, H. J. (2004) Comparative recombination rates in the rat, mouse, and human genomes. Genome Res 14, 528–538.
  • Jobling, M. A. & Tyler-Smith, C. (2003) The human Y chromosome: An evolutionary marker comes of age. Nat Rev Genet 4, 598–612.
  • Jorde, L. B. (1995) Linkage disequilibrium as a gene-mapping tool. Am J Hum Genet 56, 11–14.
  • Kayser, M., Lao, O., Saar, K., Brauer, S., Wang, X., Nurnberg, P., Trent, R. J. & Stoneking, M. (2008) Genome-wide analysis indicates more Asian than Melanesian ancestry of Polynesians. Am J Hum Genet 82, 194–198.
  • Lercher, M. J. & Hurst, L. D. (2002) Human SNP variability and mutation rate are higher in regions of high recombination. Trends Genet 18, 337–340.
  • Lewontin, R. C. (1964) The interaction of selection and linkage. Part II. Optimum Models. Genetics 50, 757–782.
  • Li, J. Z., Absher, D. M., Tang, H., Southwick, A. M., Casto, A. M., Ramachandran, S., Cann, H. M., Barsh, G. S., Feldman, M., Cavalli-Sforza, L. L. & Myers, R. M. (2008) Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100–1104.
  • Li, N. & Stephens, M. (2003) Modeling linkage disequilibrium and identifying recombination hotspots using single-nucleotide polymorphism data. Genetics 165, 2213–2233.
  • Lohmueller, K. E., Bustamante, C. D. & Clark, A. G. (2009) Methods for human demographic inference using haplotype patterns from genomewide single-nucleotide polymorphism data. Genetics 182, 217–231.
  • McVean, G. A., Myers, S. R., Hunt, S., Deloukas, P., Bentley, D. R. & Donnelly, P. (2004) The fine-scale structure of recombination rate variation in the human genome. Science 304, 581–584.
  • Migone, N., Feder, J., Cann, H., Van West, B., Hwang, J., Takahashi, N., Honjo, T., Piazza, A. & Cavalli-Sforza, L. L. (1983) Multiple DNA fragment polymorphisms associated with immunoglobulin mu chain switch-like regions in man. Proc Natl Acad Sci USA 80, 467–471.
  • Myers, S., Bottolo, L., Freeman, C., McVean, G. & Donnelly, P. (2005) A fine-scale map of recombination rates and hotspots across the human genome. Science 310, 321–324.
  • Nachman, M. W. (2001) Single nucleotide polymorphisms and recombination rate in humans. Trends Genet 17, 481–485.
  • Nachman, M. W., Bauer, V. L., Crowell, S. L. & Aquadro, C. F. (1998) DNA variability and recombination rates at X-linked loci in humans. Genetics 150, 1133–1141.
  • Nei, M. (1972) Genetic distance between populations. Am Nat 106, 283–292.
  • Nei, M., Tajima, F. & Tateno, Y. (1983) Accuracy of estimated phylogenetic trees from molecular data. II. Gene frequency data. J Mol Evol 19, 153–170.
  • Roselius, K., Stephan, W. & Stadler, T. (2005) The relationship of nucleotide polymorphism, recombination rate and selection in wild tomato species. Genetics 171, 753–763.
  • Rosenberg, N. A., Mahajan, S., Ramachandran, S., Zhao, C., Pritchard, J. K. & Feldman, M. W. (2005) Clines, clusters, and the effect of study design on the inference of human population structure. PLoS Genet 1, 660–671.
  • Rosenberg, N. A., Pritchard, J. K., Weber, J. L., Cann, H. M., Kidd, K. K., Zhivotovsky, L. A. & Feldman, M. W. (2002) Genetic structure of human populations. Science 298, 2381–2385.
  • Sachidanandam, R., Weissman, D., Schmidt, S. C., Kakol, J. M., Stein, L. D., Marth, G., Sherry, S., Mullikin, J. C., Mortimore, B. J., Willey, D. L., Hunt, S. E., Cole, C. G., Coggill, P. C., Rice, C. M., Ning, Z., Rogers, J., Bentley, D. R., Kwok, P. Y., Mardis, E. R., Yeh, R. T., Schultz, B., Cook, L., Davenport, R., Dante, M., Fulton, L., Hillier, L., Waterston, R. H., McPherson, J. D., Gilman, B., Schaffner, S., Van Etten, W. J., Reich, D., Higgins, J., Daly, M. J., Blumenstiel, B., Baldwin, J., Stange-Thomann, N., Zody, M. C., Linton, L., Lander, E. S. & Altshuler, D. (2001) A map of human genome sequence variation containing 1.42 million single nucleotide polymorphisms. Nature 409, 928–933.
  • Saitou, N. & Nei, M. (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4, 406–425.
  • Scheet, P. & Stephens, M. (2006) A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet 78, 629–644.
  • Schneider, S., Roessli, D. & Excoffier, L. (2000) Arlequin: A software for population genetics data analysis. Ver 2.000. Genetics and Biometry Lab, Department of Anthropology, University of Geneva.
  • Stephens, M. & Donnelly, P. (2003) A comparison of Bayesian methods for haplotype reconstruction from population genotype data. Am J Hum Genet 73, 1162–1169.
  • Stephens, M., Smith, N. J. & Donnelly, P. (2001) A new statistical method for haplotype reconstruction from population data. Am J Hum Genet 68, 978–989.
  • Tamura, K., Dudley, J., Nei, M. & Kumar, S. (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24, 1596–1599.
  • The Hugo Pan-Asian SNP Consortium (2009) Mapping Human Genetic Diversity in Asia. Science 326, 1541–1545.
  • The International HapMap Consortium (2003) The International HapMap Project. Nature 426, 789–796.
  • The International HapMap Consortium (2005) A haplotype map of the human genome. Nature 437, 1299–1320.
  • The International HapMap Consortium (2007) A second generation human haplotype map of over 3.1 million SNPs. Nature 449, 851–861.
  • Underhill, P. A., Shen, P., Lin, A. A., Jin, L., Passarino, G., Yang, W. H., Kauffman, E., Bonne-Tamir, B., Bertranpetit, J., Francalacci, P., Ibrahim, M., Jenkins, T., Kidd, J. R., Mehdi, S. Q., Seielstad, M. T., Wells, R. S., Piazza, A., Davis, R. W., Feldman, M. W., Cavalli-Sforza, L. L. & Oefner, P. J. (2000) Y chromosome sequence variation and the history of human populations. Nat Genet 26, 358–361.
  • Vigilant, L., Stoneking, M., Harpending, H., Hawkes, K. & Wilson, A. C. (1991) African populations and the evolution of human mitochondrial DNA. Science 253, 1503–1507.
  • Wallace, D. C., Brown, M. D. & Lott, M. T. (1999) Mitochondrial DNA variation in human evolution and disease. Gene 238, 211–230.
  • Weir, B. S. & Hill, W. G. (2002) Estimating F-statistics. Annu Rev Genet 36, 721–750.
  • Xu, S., Huang, W., Qian, J. & Jin, L. (2008) Analysis of genomic admixture in Uyghur and its implication in mapping strategy. Am J Hum Genet 82, 883–894.
  • Xu, S., Huang, W., Wang, H., He, Y., Wang, Y., Wang, Y., Qian, J., Xiong, M. & Jin, L. (2007) Dissecting linkage disequilibrium in African-American genomes: Roles of markers and individuals. Mol Biol Evol 24, 2049–2058.
  • Xu, S. & Jin, L. (2008) A genome-wide analysis of admixture in Uyghurs and a high-density admixture map for disease-gene discovery. Am J Hum Genet 83, 322–336.
  • Xu, S., Jin, W. & Jin, L. (2009) Haplotype-sharing analysis showing Uyghurs are unlikely genetic donors. Mol Biol Evol 26, 2197–2206.
  • Xu, S., Gupta, S. & Jin, L. (2010) PEAS V1.0: A package for elementary analysis of SNP data. Mol Ecol Resour 10, 1085–1088.

Supporting Information

Table S1 Spearman rank correlation of recombination rate (ρ, above diagonal) and genetic divergence (FST, below diagonal) between populations (11834 SNPs with MAF > 0.05)

Table S2 Spearman rank correlation of recombination rate (ρ, above diagonal) and genetic divergence (FST, below diagonal) between populations (8473 SNPs with MAF > 0.10)

Table S3 Spearman rank correlation of LD (|D′|, above diagonal; r², below diagonal) between populations (11834 SNPs with MAF > 0.05)

Table S4 Spearman rank correlation of LD (|D′|, above diagonal; r², below diagonal) between populations (8473 SNPs with MAF > 0.10)

Table S5 Haplotype sharing proportion between populations (5 kb bin)

Table S6 Haplotype sharing proportion between populations (10 kb bin)

Table S7 Haplotype sharing proportion between populations (20 kb bin)

Table S8 Haplotype sharing proportion between populations (30 kb bin)

Table S9 Haplotype sharing proportion between populations (40 kb bin)

Table S10 Haplotype sharing proportion between populations (50 kb bin)

Table S11 Haplotype sharing proportion between populations (100 kb bin)

Table S12 Haplotype sharing proportion between populations (200 kb bin)

Table S13 Haplotype sharing proportion between populations (500 kb bin)

Table S14 SNP and Haplotype diversity in 11 populations.

Table S15 Heterozygosity of SNPs within populations (20177 SNPs)

Table S16 Heterozygosity of haplotypes within populations (11834 SNPs with MAF > 0.05)

Table S17 Heterozygosity of SNPs within populations (11834 SNPs with MAF > 0.05)

Table S18 Heterozygosity of haplotypes within populations (8473 SNPs with MAF > 0.10)

Table S19 Heterozygosity of SNPs within populations (8473 SNPs with MAF > 0.10)

Figure S1 Comparisons of LD between populations.

Figure S2 Phylogenetic trees reconstructed from the matrix of correlation of adjacent LD between populations.

Figure S3 Phylogenetic trees reconstructed from the matrix of correlation of recombination rates and FST between populations.

Figure S4 Population trees reconstructed using single SNPs.

Figure S5 Heterozygosity in 11 populations.

Figure S6 Heterozygosity of haplotypes and SNPs.

Figure S7 Heterozygosity of haplotypes and SNPs.

Supporting file: AHG_678_sm_SuppMat.pdf (1608K), Supporting info item

Please note: Wiley Blackwell is not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing content) should be directed to the corresponding author for the article.