Keywords:

  • Turks;
  • population structure;
  • ancestry;
  • admixture;
  • population genetics;
  • Kyrgyz;
  • Human Genome Diversity Panel

Summary

  1. Top of page
  2. Summary
  3. Introduction
  4. Materials and Methods
  5. Results
  6. Discussion
  7. Acknowledgements
  8. Conflict of Interest
  9. References
  10. Supporting Information

Turkey has experienced major population movements. Population structure and genetic relatedness of samples from three regions of Turkey, using over 500,000 SNP genotypes, were compared together with Human Genome Diversity Panel (HGDP) data. To obtain a more representative sampling from Central Asia, Kyrgyz samples (Bishkek, Kyrgyzstan) were genotyped and analysed. Principal component (PC) analysis reveals a significant overlap between Turks and Middle Easterners and a relationship with Europeans and South and Central Asians; however, the Turkish genetic structure is unique. FRAPPE, STRUCTURE, and phylogenetic analyses support the PC analysis depending upon the number of parental ancestry components chosen. For example, supervised STRUCTURE (K= 3) illustrates a genetic ancestry for the Turks of 45% Middle Eastern (95% CI, 42–49), 40% European (95% CI, 36–44) and 15% Central Asian (95% CI, 13–16), whereas at K= 4 the genetic ancestry of the Turks was 38% European (95% CI, 35–42), 35% Middle Eastern (95% CI, 33–38), 18% South Asian (95% CI, 16–19) and 9% Central Asian (95% CI, 7–11). PC analysis and FRAPPE/STRUCTURE results from three regions in Turkey (Aydin, Istanbul and Kayseri) were superimposed, without clear subpopulation structure, suggesting sample homogeneity. Thus, this study demonstrates admixture of Turkish people reflecting the population migration patterns.


Introduction

Analysis of population genetic substructure has been improved by using high-density single nucleotide polymorphism (SNP) arrays. Knowledge of the patterns of variation within continental populations is useful for several reasons, such as understanding the origin and migration of population groups and providing information on allele frequency for genetic association studies. Recent genome-wide association studies have shown that discovering and accounting for differences (e.g. controlling for population structure even at a fine level within a seemingly homogeneous population) in substructure can reduce error rates in association studies (Tian et al., 2008; McClellan & King, 2010; Price et al., 2010; Rosenberg et al., 2010).

The Human Genome Diversity Panel (HGDP) (Cavalli-Sforza, 2005) has facilitated the discovery of the origin of human genetic diversity, genetic relatedness, and population structure among world populations by providing samples of genomic DNA and genotype data (Cann et al., 2002; Rosenberg et al., 2002; Li et al., 2008). In addition, several non-HGDP populations have been analysed (Xu & Jin, 2008; Teo et al., 2009; Hunter-Zinck et al., 2010; Xing et al., 2010) together with HGDP samples. However, the structure of the Turkish population has not been analysed using high-density SNP genotypes. The Anatolian peninsula (present-day Turkey) connects the Middle East, Europe and Asia, and thus has been subject to major population movements (Grousset, 1970; Güvenç, 1993; Findley, 2005b). Previous studies of genetic variations in the Turkish population examined mitochondrial DNA sequence variation (Calafell et al., 1996; Mergen et al., 2004; Quintana-Murci et al., 2004), polymorphic markers on the Y chromosome (Cinnioğlu et al., 2004; Regueiro et al., 2006) and some polymorphic loci on autosomal chromosomes (Di Benedetto et al., 2001; Berkman et al., 2008) with relatively few genetic markers.

Previously, we have studied the risk factors for coronary artery disease in the Turkish population (Mahley et al., 1995; Bersot et al., 1999; Mahley et al., 2000; Mahley et al., 2001), a population known to have a high prevalence of heart disease (Onat, 2001; Onat et al., 2003). One of the major risk factors is low levels of high density lipoprotein-cholesterol (Bersot et al., 2003). Association studies of candidate genes of lipid metabolism (Hodoğlugil et al., 2005a; Hodoğlugil et al., 2005b; Hodoğlugil et al., 2006; Hodoğlugil et al., 2010) and a genome-wide scan (Yu et al., 2005; Ling et al., 2009) have identified multiple genes that contribute to the Turkish lipid phenotype. Recently, a unique gene, glucuronic acid epimerase, was shown to be associated with both high density lipoprotein-cholesterol and triglyceride levels in the Turkish population (Hodoğlugil et al., 2011). Interestingly, the SNP frequency pattern across the locus for this gene more closely resembled an Asian pattern, whereas the SNP frequency surrounding this locus on chromosome 15q21–15q23 was more similar to a European pattern, suggesting the importance of recombination events in explaining unique population-specific phenotypes. Thus, in the present study, we sought to analyse the genetic ancestry of the Turkish population with respect to publicly available HGDP samples (http://hagsc.org/hgdp/files.html) (Li et al., 2008).

To achieve a more representative sampling from Central Asia relevant to Turkish history (Grousset, 1970; Güvenç, 1993; Findley, 2005b), we also genotyped samples from another Central Asian population, Kyrgyz from Bishkek, Kyrgyzstan. The Central Asian populations in the HGDP are represented by the Uygur and Hazara populations. In addition, to determine whether subpopulations exist among our study subjects, we analysed Turkish samples from three regions in Turkey (Istanbul, Aydin and Kayseri) (Fig. 1). Thus, we genotyped 64 Turkish and 16 Kyrgyz samples, and then combined the data sets with the HGDP data set to examine genetic relatedness and population substructure among Eurasian populations.

Figure 1. Geographical locations of samples used in this study. Turkish (Istanbul, Aydin and Kayseri) and Kyrgyz samples are shown in red; populations from the HGDP are shown in black.

Materials and Methods

Study Population, Genotyping and SNP Quality Control

Sixty-four unrelated Turkish samples (including one duplicate pair) from three locations in Turkey (Istanbul, Aydin and Kayseri) were selected from participants in the Turkish Heart Study (Mahley et al., 1995). Istanbul is a cosmopolitan city of over 12 million and a major hub for other parts of Turkey. All Istanbul samples were selected from the city itself. Aydin is a mid-size city (population: 188,000) near the Aegean coast, and Kayseri is a relatively large city (population: 1,200,000) in central Turkey. Samples from the Aydin and Kayseri regions were selected from city centres and from several nearby towns and villages. All samples were obtained from individuals who were born and lived in these regions at the time the samples were collected (Fig. 1). In addition, 16 Kyrgyz samples were randomly selected from a Kyrgyz cohort obtained at the Kyrgyz National Center of Cardiology and Internal Medicine in Bishkek, Kyrgyzstan. All participants were queried about their ethnicity, and only participants indicating Turkish or Kyrgyz ethnicity were included in the study. Equal numbers of males and females were included, and all samples were obtained from healthy individuals under controlled conditions as described (Mahley et al., 1995). The protocols were approved by the Committee on Human Research of the University of California, San Francisco, and were in accordance with the Helsinki Declaration.

DNA was extracted from blood with a Qiagen blood kit. DNA samples with an A260/A280 ratio >1.8 quantified with a Nanodrop spectrophotometer were utilised for genotyping with Infinium Human 610-quad BeadChip assays (Illumina, San Diego, CA), according to the manufacturer's specifications. All samples had call rates >98%. The rate of concordance between a pair of duplicate samples was >99.99%. SNPs were filtered out if they differed between duplicate samples or if their call rates across the 80 samples were <95%. Hardy–Weinberg equilibrium was tested separately in Turkish and Kyrgyz populations. SNPs that deviated from Hardy–Weinberg equilibrium (p < 0.001, n= 590 for Turks and n= 1781 for Kyrgyz) were also excluded. Only autosomal chromosomes were utilised. These filtering and exclusion criteria resulted in 571,852 high-quality SNPs. The genotype data set is available upon request from the authors.
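The per-SNP filters described above can be sketched as follows. This is a minimal illustration, not the pipeline used in the study: the function names are hypothetical, genotypes are assumed to be coded as 0/1/2 copies of one allele with None for a missing call, and a 1-d.f. chi-square test stands in for whichever Hardy–Weinberg test was actually applied (the text does not specify it).

```python
import math

def call_rate(genotypes):
    """Fraction of non-missing calls for one SNP (missing = None)."""
    called = [g for g in genotypes if g is not None]
    return len(called) / len(genotypes)

def hwe_chi2_pvalue(genotypes):
    """P-value of a 1-d.f. chi-square test of Hardy-Weinberg proportions.

    Assumption: a chi-square test is used here for simplicity; the source
    does not state which test was applied.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    n_aa, n_ab, n_bb = (called.count(k) for k in (0, 1, 2))
    p = (2 * n_bb + n_ab) / (2 * n)  # frequency of the counted allele
    q = 1 - p
    if p in (0.0, 1.0):
        return 1.0  # monomorphic SNP: nothing to test
    expected = (n * q * q, 2 * n * p * q, n * p * p)
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of a chi-square variable with 1 d.f.
    return math.erfc(math.sqrt(chi2 / 2))

def passes_qc(genotypes, min_call_rate=0.95, hwe_alpha=0.001):
    """Apply the two thresholds quoted in the text (95% call rate, p < 0.001)."""
    return (call_rate(genotypes) >= min_call_rate
            and hwe_chi2_pvalue(genotypes) >= hwe_alpha)
```

In the study the Hardy–Weinberg test was run separately in the Turkish and Kyrgyz samples, so a function like passes_qc would be called once per population for each SNP.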

Recently, HGDP samples were genotyped (n= 1043) with Illumina HumanHap650K BeadChips (Illumina), and the genotype data were made publicly available (Li et al., 2008). HGDP genotype data from unrelated HGDP subjects (n= 938) (Rosenberg, 2006) were combined with our filtered 79-sample set (excluding one individual from a duplicate pair) and resulted in high-quality genotypes for 533,261 SNPs.

Three different SNP sets were used in the analysis: all SNPs (533,261), linkage disequilibrium (LD)-pruned SNPs (105,382), and a further trimmed smaller set of SNPs (6,408). To prune SNPs at a pairwise LD threshold of r2 > 0.2, we used PLINK (Purcell et al., 2007) and the --indep-pairwise command, which removes one of a pair of SNPs if r2 > 0.2 within a 50-SNP window, repeats this process for every pair in the window, and then shifts the window 5 SNPs forward and repeats the entire procedure. This resulted in 105,382 SNPs in the Turkish samples. The LD-pruned (r2 < 0.2) SNP set was further trimmed by using high Fst SNPs between HapMap (phase II) European and East Asian samples. First, high Fst SNPs (CEU vs. CHB + JPT > 0.25) were selected and thinned if adjacent SNPs were <0.1 cM apart; where selected SNPs were >1 cM apart, the gaps were filled with SNPs with 0.25 > Fst > 0.20. This resulted in 6,408 SNPs. Pairwise HapMap Fst and mapping (cM) data for individual SNPs were provided by Stephen Schaffner (Broad Institute of MIT and Harvard) and Tara Matise (Rutgers University), respectively.
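The sliding-window pruning described above can be illustrated with a short sketch. It is a simplified greedy version written for this note, not PLINK's implementation: ld_prune and r_squared are hypothetical names, r2 is taken as the squared correlation of 0/1/2 genotype codes, and the choice of which SNP of a correlated pair to drop (here, always the later one) is an assumption.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a monomorphic SNP carries no LD information
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy * sxy / (sxx * syy)

def ld_prune(snps, window=50, step=5, threshold=0.2):
    """Return indices of SNPs kept after windowed pairwise pruning.

    snps: list of per-SNP genotype lists, in chromosomal order.
    Within each window, the later SNP of any pair with r2 > threshold
    is dropped (an arbitrary choice made for this sketch).
    """
    removed = set()
    start = 0
    while start < len(snps):
        stop = min(start + window, len(snps))
        idx = [i for i in range(start, stop) if i not in removed]
        for pos, i in enumerate(idx):
            if i in removed:
                continue
            for j in idx[pos + 1:]:
                if j not in removed and r_squared(snps[i], snps[j]) > threshold:
                    removed.add(j)
        start += step  # shift the window forward and repeat
    return [i for i in range(len(snps)) if i not in removed]
```

With the defaults shown (window of 50 SNPs, step of 5, threshold of 0.2) the parameters match those quoted in the text, although the set of SNPs retained by PLINK itself may differ from what this sketch keeps.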

Principal Component Analysis for Inference of Population Affinities

Autosomal SNP genotypes were used to examine the relationship between individuals by principal component (PC) analysis with the smartpca program distributed with EIGENSTRAT (Patterson et al., 2006). The LD-pruned (r2 < 0.2, n= 105,382) SNP set was used, and no genetic outliers were removed. PC analysis was conducted on all samples and on selected samples from Eurasia separately without using population labels. The pairwise combinations of up to four components were plotted to illustrate the genetic relatedness among individuals/populations. Turkish and Kyrgyz samples were combined with the HGDP samples and analysed with smartpca.
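To make the PC step concrete, the sketch below normalises a genotype matrix in the manner described by Patterson et al. (2006), each SNP being centred on its mean and scaled by sqrt(p(1 - p)), and extracts the leading component by power iteration. It is only an illustration under stated assumptions: smartpca computes a full eigendecomposition with further refinements, and the names normalise and leading_pc are hypothetical.

```python
import math
import random

def normalise(genos):
    """Centre and scale each SNP column of an individuals x SNPs matrix."""
    n, m = len(genos), len(genos[0])
    cols = []
    for j in range(m):
        col = [genos[i][j] for i in range(n)]
        mu = sum(col) / n
        p = mu / 2  # estimated allele frequency
        if p <= 0 or p >= 1:
            continue  # skip monomorphic SNPs
        sd = math.sqrt(p * (1 - p))
        cols.append([(g - mu) / sd for g in col])
    return [[c[i] for c in cols] for i in range(n)]

def leading_pc(genos, iterations=200, seed=0):
    """Leading eigenvector of the individual-by-individual covariance matrix."""
    rows = normalise(genos)
    n, k = len(rows), len(rows[0])
    cov = [[sum(a * b for a, b in zip(rows[i], rows[j])) / k
            for j in range(n)] for i in range(n)]
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n)]
    for _ in range(iterations):  # power iteration
        w = [sum(cov[i][j] * v[j] for j in range(n)) for i in range(n)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v
```

Each entry of the returned vector is one individual's coordinate on the first component; population labels play no part in the calculation, which mirrors the unlabelled analysis described above.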

To confirm the validity of the results, we computed the identity-by-state (IBS) matrix among the 1017 individuals (Turkish, Kyrgyz and HGDP samples) with PLINK, producing a 1017-by-1017 matrix utilising all SNPs. We then performed multidimensional scaling on this IBS matrix and plotted the top two dimensions to illustrate the genetic relatedness among individuals.
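As an illustration of what an IBS matrix contains, the sketch below computes the average proportion of alleles shared identical by state between every pair of individuals. The function names are hypothetical, genotypes are assumed to be coded 0/1/2 with no missing calls, and the multidimensional scaling step itself is not shown.

```python
def ibs_similarity(g1, g2):
    """Mean proportion of alleles shared IBS between two individuals.

    At each SNP the pair shares 2, 1 or 0 alleles when the genotype
    codes differ by 0, 1 or 2, respectively.
    """
    shared = sum(2 - abs(a - b) for a, b in zip(g1, g2))
    return shared / (2 * len(g1))

def ibs_matrix(genos):
    """Symmetric individuals x individuals matrix of IBS similarities."""
    n = len(genos)
    return [[ibs_similarity(genos[i], genos[j]) for j in range(n)]
            for i in range(n)]
```

One minus each entry gives a pairwise distance that could be passed to a multidimensional scaling routine.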

Inference of Population Clustering with FRAPPE, STRUCTURE and CLUMPP

To assess population substructure from the high-density genetic marker data, we used FRAPPE 1.1 (EM algorithm) (Tang et al., 2005) and STRUCTURE version 2.2 (Bayesian clustering algorithm) (Falush et al., 2003). For FRAPPE analysis, owing to computer time constraints, the LD-pruned (r2 < 0.2, n= 105,382) SNP set was utilised with 20 populations selected from all continental/geographical regions representing 339 individuals. This FRAPPE analysis considers each person's genome as having originated from K parental populations (K= 2–7), whose contributions are described by coefficients that add up to 100% for each individual. For STRUCTURE analysis, default parameter settings of 30,000 replicates and 30,000 burn-in cycles were used. Because STRUCTURE has a large memory demand, the set of 6,408 SNPs was used. Ten independent STRUCTURE runs were conducted for each number of parental populations (K= 2–7), utilising all samples or a subset of samples, on a lab computer or on computer clusters at the Computational Biology Service Unit, Cornell University (http://cbsuapps.tc.cornell.edu/index.aspx). The estimated ln probability of data [ln Pr(X|K)] was consistent across independent runs, and the appropriate number of clusters is six or seven for this data set (Falush et al., 2003). STRUCTURE results were analysed with CLUMPP (Jakobsson & Rosenberg, 2007), which permutes the cluster labels output by independent runs of clustering programs such as STRUCTURE so that they match up as closely as possible. Supervised STRUCTURE analysis was performed using selected parental populations as described in the text.
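The label-matching problem that arises when independent clustering runs are compared can be sketched as below. This is an exhaustive search over column permutations written for illustration, not CLUMPP's own code: align_run is a hypothetical name, and the dot-product score is chosen because maximising it is equivalent to minimising the Frobenius distance between the two membership matrices.

```python
from itertools import permutations

def align_run(reference, run):
    """Permute the cluster columns of `run` to best match `reference`.

    reference, run: lists of per-individual membership vectors, each of
    length K and summing to 1. Exhaustive search over K! permutations,
    so only practical for small K (K <= 7 gives at most 5040).
    """
    k = len(reference[0])

    def score(perm):
        return sum(ref[c] * ind[perm[c]]
                   for ref, ind in zip(reference, run)
                   for c in range(k))

    best = max(permutations(range(k)), key=score)
    return [[ind[best[c]] for c in range(k)] for ind in run]
```

After alignment, membership coefficients from the separate runs can be averaged column by column for each individual.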

Fst Calculations and Phylogenetic Tree Building

By including population labels in the parameter file while running the program, we calculated an Fst matrix with the smartpca function of EIGENSTRAT simultaneously with the PC analysis. The phylogenetic tree was built with the Fst matrix in MEGA4 using the neighbour-joining method (Saitou & Nei, 1987; Tamura et al., 2007).
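For readers unfamiliar with Fst, the sketch below shows one common way of estimating it from allele frequencies in two populations: Hudson's estimator, computed as a ratio of sums over SNPs. It is offered as an assumption-laden illustration and is not claimed to reproduce the smartpca calculation; hudson_fst is a hypothetical name, and n1 and n2 denote the number of sampled chromosomes (twice the number of individuals).

```python
def hudson_fst(freqs1, freqs2, n1, n2):
    """Hudson-type Fst estimate between two populations.

    freqs1, freqs2: per-SNP sample allele frequencies in each population.
    n1, n2: number of sampled chromosomes in each population.
    The estimate is the ratio of the summed numerators to the summed
    denominators across SNPs.
    """
    num = 0.0
    den = 0.0
    for p1, p2 in zip(freqs1, freqs2):
        # Squared frequency difference, corrected for sampling variance
        num += ((p1 - p2) ** 2
                - p1 * (1 - p1) / (n1 - 1)
                - p2 * (1 - p2) / (n2 - 1))
        # Probability that two alleles, one from each population, differ
        den += p1 * (1 - p2) + p2 * (1 - p1)
    return num / den
```

Applying such a function to every pair of populations yields a symmetric distance matrix of the kind that a neighbour-joining program takes as input.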

Allele Frequency Spectrum Comparison

Genome-wide allele frequency comparisons between population pairs were completed utilising all SNPs, and heat maps were used to visualise the allele frequency distributions across pairs of populations (R, hexbin package, http://cran.r-project.org/). Frequencies of reference forward allele are reported. All allele frequency values were used, and no cut-off values were applied. Pearson's correlation was calculated for population pairs.
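The comparison described above reduces to two small calculations, sketched here with hypothetical function names: the allele frequency of each SNP within a population, and Pearson's correlation between the frequency vectors of two populations. Genotypes are assumed to be coded as 0/1/2 copies of the reference forward allele, with no missing calls.

```python
import math

def allele_frequency(genotypes):
    """Frequency of the counted allele from 0/1/2 genotype codes."""
    return sum(genotypes) / (2 * len(genotypes))

def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)
```

Here x and y would hold one frequency per SNP for the two populations being compared.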

Patterns of Decay of LD and Haplotype Diversity

For each chromosome, we randomly selected a 1-Mb region, avoiding centromeres, genomic regions with low SNP density, and known segmental duplications. Genotype data were phased with fastPhase (Scheet & Stephens, 2006) software separately for each population, using default parameters. LD (r2 and D′) was measured by pairwise comparison between SNP markers that had a minor allele frequency ≥15% using phased genotype data in Haploview (Barrett et al., 2005). The LD between a focal SNP and any SNP within a 250-kb upstream or downstream region of the focal SNP was calculated. Haplotype blocks were calculated with the Gabriel method (Barrett et al., 2005) in Haploview using phased genotype data, and haplotypes (frequency >5%) were counted in each selected genomic region for each population separately.
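The two LD measures named above can be computed from phased haplotypes as in the following sketch. It uses the standard definitions of D, D′ and r2 for a pair of biallelic SNPs; the function name and the 0/1 allele coding are assumptions made for illustration, and Haploview's own handling of edge cases may differ.

```python
def ld_stats(hap_a, hap_b):
    """Return (D, D', r2) for two biallelic SNPs on phased haplotypes.

    hap_a, hap_b: lists of 0/1 alleles, one entry per haplotype, in the
    same haplotype order for both SNPs.
    """
    n = len(hap_a)
    p_a = sum(hap_a) / n
    p_b = sum(hap_b) / n
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0, 0.0, 0.0  # at least one SNP is monomorphic
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    return d, abs(d) / d_max, d * d / denom
```

The third test case in the accompanying checks illustrates why both measures are reported: a pair of SNPs can have D′ = 1 while r2 is far below 1 when their allele frequencies differ.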

The number of subjects in the HGDP populations varies greatly. To avoid the effects of different sample sizes on comparisons of LD decay and haplotype diversity, populations from similar geographic regions were combined, and 48 subjects were selected for each group: Turkish (48), European (14 French, 12 Italian, 8 Tuscan and 14 Sardinian), Middle Eastern (24 Druze and 24 Palestinian), Central Asian (22 Hazara, 10 Uygur and 16 Kyrgyz), South Asian (8 Balochi, 8 Brahui, 8 Burusho, 8 Makrani, 8 Pathan and 8 Sindhi), Northeast Asian (8 Mongola, 8 Tu, 8 Oroqen, 8 Xibo, 8 Daur and 8 Hezhen), native American (7 Colombian, 8 Surui, 11 Karitiana, 11 Maya and 11 Pima) and African (11 Bantu, 8 Biaka Pygmy, 8 Mbuti Pygmy, 8 Mandenka, 8 Yoruba and 5 San).

Relatedness, Identity-by-Descent, IBS, and Runs of Homozygosity

Whole-genome genotype data were used to calculate identity-by-descent (IBD) and IBS values in PLINK (Purcell et al., 2007) utilising all individuals. PI_HAT values (proportion of IBD) were evaluated for cryptic relatedness for Turkish and Kyrgyz samples. Pairwise IBS sharing within a subpopulation was used to evaluate genetic similarity in a given population.
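PI_HAT summarises the estimated probabilities that a pair of individuals shares zero, one or two alleles identical by descent (Z0, Z1 and Z2) as Z2 + 0.5 × Z1. The sketch below shows that arithmetic and a simple screen for related pairs; the function names and the 0.125 cut-off in the example are illustrative assumptions, not values taken from the study.

```python
def pi_hat(z0, z1, z2):
    """Proportion of the genome shared IBD: P(IBD=2) + 0.5 * P(IBD=1)."""
    return z2 + 0.5 * z1

def flag_related(pairs, threshold):
    """Return the sample-ID pairs whose PI_HAT exceeds `threshold`.

    pairs: dict mapping (id1, id2) -> (z0, z1, z2).
    """
    return [ids for ids, z in pairs.items() if pi_hat(*z) > threshold]

# Hypothetical example: one unrelated pair and one sibling-like pair
example = {
    ("S1", "S2"): (0.98, 0.02, 0.0),
    ("S3", "S4"): (0.25, 0.50, 0.25),
}
related = flag_related(example, threshold=0.125)
```

Under this definition an unrelated pair is expected to give a value near 0, and a parent–offspring or full-sibling pair a value near 0.5.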

To calculate runs of homozygosity (ROH) in our samples, we used the default parameters in PLINK. To avoid the effects of different sample sizes on calculations, we used the same 48-subject groups (Turkish, European, Middle Eastern, Central Asian, South Asian and Northeast Asian). To eliminate the effect of LD on detection of ROHs, SNPs were LD pruned (r2 < 0.2, n= 105,382) for each population group separately.
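A much-simplified picture of ROH detection is given below: the genotypes of one individual are scanned in map order, and stretches of consecutive homozygous calls that are long enough are reported. This is not PLINK's algorithm, which uses a sliding window and tolerates a limited number of heterozygous or missing calls; find_roh is a hypothetical name, and the threshold values shown are illustrative assumptions (the text states only that PLINK defaults were used).

```python
def find_roh(positions, genotypes, min_snps=100, min_kb=1000):
    """Report runs of consecutive homozygous SNPs for one individual.

    positions: SNP coordinates in bp, sorted, on a single chromosome.
    genotypes: 0/1/2 codes (1 = heterozygous), same order as positions.
    Returns a list of (start_bp, end_bp, n_snps) tuples.
    """
    runs = []
    start = None
    # A trailing heterozygous sentinel closes a run that reaches the end.
    for i, g in enumerate(list(genotypes) + [1]):
        if g != 1:
            if start is None:
                start = i
        elif start is not None:
            end = i - 1
            n_snps = end - start + 1
            length_kb = (positions[end] - positions[start]) / 1000
            if n_snps >= min_snps and length_kb >= min_kb:
                runs.append((positions[start], positions[end], n_snps))
            start = None
    return runs
```

Counting the runs and summing their lengths per individual gives the kind of summary that can be compared across equally sized population groups.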

Results

Population Structure, Relatedness and Admixture

PC analysis is useful for revealing relationships among individuals and exploring the extent of differentiation among populations. We used data from the unrelated subjects in the HGDP, a collection of 52 populations across the globe, and included data from our Turkish and Kyrgyz samples utilising the LD-pruned SNP set (r2 < 0.2, n= 105,382). Figure 2A shows the first two components of this analysis by smartpca. Population groupings (major geographical regions) were assigned only after the analysis. Subjects from the same geographical region clustered among themselves. Turkish samples clustered tightly among themselves and together with Europeans, Middle Easterners and South Asians (Pakistani). Kyrgyz samples also clustered tightly among themselves and between Central Asians (Uygur and Hazara) and East Asians.

Figure 2. PC analysis demonstrating genetic relatedness across major geographic regions, including HGDP, Turkish and Kyrgyz samples. Each symbol represents one individual. (A) PC analysis of 52 populations from the HGDP (n= 938), Turkish (n= 63) and Kyrgyz (n= 16) samples. (B) PC analysis focusing on selected Eurasian populations (including Turkish and Kyrgyz populations) (n= 451).

To examine fine-scale population structure and relatedness, we removed African, Oceanian and native American populations. Representative populations from Eurasia were selected, and the analysis was repeated (Fig. 2B). Turkish samples clustered with Middle Eastern and European populations, particularly with the Adygei population from the Caucasus. South Asian populations clustered separately and did not overlap with Turkish samples. Kyrgyz samples clustered with other Central Asian populations, but they were relatively closer to East Asian populations (Fig. 2B). These results demonstrate that the PC analysis for the Eurasian region clearly delineates fine-scale population structure.

To examine finer-scale population clustering among populations and to identify any subpopulation structure among our subjects from different regions of Turkey, we analysed Turkish samples together with European and Middle Eastern populations (Fig. 3A) or with South Asian and Central Asian populations (including Kyrgyz) (Fig. 3B) after examining the pattern of clustering of populations in Figures 2A and B. The Turkish samples were easily separated from the Middle Eastern and European populations, and to some extent from the Adygei population. Importantly, samples from the three regions in Turkey (Aydin, Istanbul and Kayseri) overlapped, suggesting no clear subpopulation structure in our samples (Fig. 3A). Additional pairwise PCs were plotted using Turkish, European and Middle Eastern populations. The third PC clearly distinguished Middle Eastern populations of Palestinians and Druze (Figs. S1A and C), while the Turkish samples from different regions overlapped (Figs. S1A–D). Similarly, Turkish samples were clearly separated from South Asian and Central Asian populations as shown in the first two PCs (Fig. 3B). In addition, adding the third and fourth PCs showed that the Turkish samples from the different regions overlapped (Figs. S2A–D) as we observed with the first two PCs (Fig. 3B).

Figure 3. PC analysis demonstrating genetic relatedness in selected HGDP, Turkish and Kyrgyz samples. Each symbol represents one individual. (A) PC analysis of Turks vs. European and Middle Eastern populations. Turkish samples were from three regions of Turkey (Aydin, Istanbul and Kayseri). (B) PC analysis of Turks vs. Central Asian (including Kyrgyz) and South Asian (Pakistani) populations.

We repeated the PC analysis using only the 63 Turkish samples and observed that the samples from the different regions overlapped (Figs. S3A–D). In addition, PC analysis of the Turkish and Adygei populations together clearly separated the Adygei population from the Turkish population at the first PC, and again our samples from the different regions overlapped (data not shown). These results demonstrate that our Turkish samples are rather homogeneous and clustered away from other Eurasian populations (Figs. 3 and S1–3). We analysed up to six PCs and the results did not suggest that there were any differences between our samples from the three regions. Including additional population(s) or different groupings (e.g. eliminating a few) did not change the overall interpretation of the PC analysis results for Turkish or Kyrgyz populations (data not shown). To check the validity of the PC analysis results, the IBS matrix was used to create a multidimensional scaling plot for all samples (HGDP, Turkish and Kyrgyz) including all SNPs (Fig. S4). First and second dimensions were plotted with similar labelling of major geographical regions (Fig. 2A) to illustrate the genetic relatedness among individuals or populations. The results were very similar to those obtained with smartpca. Using multidimensional scaling analysis with the LD-pruned SNP set (r2 < 0.2, n= 105,382), we also obtained similar results (data not shown). Furthermore, removing some populations as we did previously (Figs. 2B and 3A and B) also gave very similar results (data not shown). This demonstrates the validity of population clustering results obtained by two different statistical methods.

The population structure of the Turkish and Kyrgyz samples was further examined with FRAPPE (K= 2–7, Fig. S5) and STRUCTURE (K= 2–7, Fig. S6) along with the HGDP samples. Previous analysis of HGDP samples with FRAPPE and STRUCTURE revealed that individuals from the same geographic region or predefined population nearly always shared similar parental ancestry components (Rosenberg et al., 2002; Li et al., 2008). In FRAPPE, the genetic structure of the Turkish samples revealed four parental ancestries (>1%) at K= 7 (Fig. 4). The largest portion (light blue), about 53% averaged across the Turkish samples, was present as the major ancestry in European populations, and this ancestry was also present in the Middle Eastern and Central Asian populations. About 26% of ancestry (dark blue) in the Turkish population represented the major ancestry in South Asians and was present to a lesser extent in Central Asian and Middle Eastern populations but was not present in European populations. About 14% of ancestry (green) in Turks was present in Middle Eastern populations, and about 6% of Turkish ancestry (red) was present to a significant extent in Central Asian and to a major extent in East Asian populations. Samples from different regions of Turkey had similar mean parental ancestry estimates (Table S1). Results from the Caucasus region (Adygei population) were similar to the Turks.

Figure 4. Estimated individual ancestry and population structure in 339 individuals by FRAPPE analysis. Representative HGDP populations selected from all continental/geographical regions and combined with Turkish and Kyrgyz samples (n= 339). Populations are labelled above the figure, with their geographic affiliations below. Each individual is represented by a thin vertical line, which is partitioned into K coloured segments (K= 7). Colours represent the inferred ancestry from parental populations. White lines separate individuals of different populations.

Parental ancestry estimates for our Kyrgyz samples were similar to other Central Asian samples (Uygur and Hazara) except that the “red” ancestry coefficient (major ancestry in East Asian populations) was slightly higher in Kyrgyz than other Central Asians (Fig. 4). This finding is consistent with the PC analysis results (Figs. 2A and B).

The population structure of the Turkish and Kyrgyz samples was also examined with STRUCTURE (Fig. S6). At K= 7, relative to the FRAPPE estimates, Turkish subjects had a higher coefficient for the ancestry component that is predominant in European populations (77%, “light blue”), lower coefficients for the components predominant in South Asian (12%, “dark blue”) and Middle Eastern (4%, “light green”) populations, and a similar coefficient for the component predominant in Central Asian populations (6%, “red”). FRAPPE distinguished South Asian populations from Middle Eastern and European populations and Middle Eastern populations from European populations as seen in the original HGDP analysis (Li et al., 2008). To determine whether SNP selection affects the results, random SNPs were selected (1st, 84th, 167th, etc. up to 6407 SNPs) and run on STRUCTURE at K= 7. Random selection gave results similar to those of the selection process described previously (data not shown).

Supervised clustering with STRUCTURE (Falush et al., 2003) was also used to analyse the Turkish genetic ancestry by forcing separate clustering of HGDP populations. Supervised analysis was performed using individuals from the Middle East (Druze and Palestinian), Europe (French, Italian, Tuscan and Sardinian) and Central Asia (Uygur, Hazara and Kyrgyz) at K= 3 (Fig. 5A). The contributions were 45%, 40% and 15% for the Middle Eastern, European and Central Asian populations, respectively. Supervised analysis was also performed using Middle Eastern, European, Central Asian and South Asian (Pakistani) populations (K= 4) (Fig. 5B). Parental ancestry coefficients for our Turkish samples were found to be 38% European, 35% Middle Eastern, 18% South Asian and 9% Central Asian.

Figure 5. Supervised population structure analysis. Parental ancestry contributions were calculated for Turkish samples using supervised STRUCTURE analysis. Each individual is represented by a thin vertical line. White lines separate individuals of different populations. (A) Three clusters were forced to correspond to Middle Eastern (Druze and Palestinian), European (French, Italian, Tuscan and Sardinian), and Central Asian (Uygur, Hazara and Kyrgyz) populations at K= 3. (B) Four clusters were forced to correspond to Middle Eastern, European, South Asian (Balochi, Brahui, Burusho, Makrani, Pathan and Sindhi) and Central Asian populations at K= 4.

Fst Calculations and Phylogenetic Tree Building

To measure genetic distances between HGDP, Turkish and Kyrgyz populations, we calculated pairwise Fst values between populations. Results for selected Eurasian populations (Table 1) and all populations in this study (Table S2) are shown. Turks had the lowest pairwise Fst with Adygei, Middle Eastern and European populations, followed by South Asian and Central Asian populations. Kyrgyz had the lowest pairwise Fst with Uygur and Hazara populations followed by East Asian populations. These pairwise Fst distances are in concordance with the results from the PC and STRUCTURE analyses. The phylogenetic tree for selected Eurasian populations (Fig. 6) supported the aforementioned relationship that Turks are closer to Adygei and Middle Eastern populations and to some degree to European and South Asian populations.

Table 1. Fst matrix among selected Eurasian populations.
             Turkish  Druze  Palestinian  French  Italian  Adygei  Balochi  Burusho  Uygur  Hazara  Kyrgyz  Han    Mongola
Turkish      -        0.008  0.007        0.006   0.005    0.004   0.010    0.016    0.023  0.025   0.041   0.094  0.077
Druze        0.008    -      0.009        0.014   0.012    0.012   0.019    0.029    0.039  0.040   0.058   0.114  0.097
Palestinian  0.007    0.009  -            0.014   0.011    0.012   0.017    0.026    0.036  0.037   0.055   0.108  0.092
French       0.006    0.014  0.014        -       0.002    0.009   0.020    0.026    0.035  0.036   0.054   0.111  0.094
Italian      0.005    0.012  0.011        0.002   -        0.009   0.020    0.028    0.037  0.038   0.056   0.113  0.096
Adygei       0.004    0.012  0.012        0.009   0.009    -       0.012    0.018    0.028  0.028   0.046   0.100  0.083
Balochi      0.010    0.019  0.017        0.020   0.020    0.012   -        0.011    0.023  0.023   0.040   0.090  0.074
Burusho      0.016    0.029  0.026        0.026   0.028    0.018   0.011    -        0.016  0.017   0.031   0.073  0.058
Uygur        0.023    0.039  0.036        0.035   0.037    0.028   0.023    0.016    -      0.003   0.009   0.032  0.019
Hazara       0.025    0.040  0.037        0.036   0.038    0.028   0.023    0.017    0.003  -       0.012   0.037  0.023
Kyrgyz       0.041    0.058  0.055        0.054   0.056    0.046   0.040    0.031    0.009  0.012   -       0.032  0.018
Han          0.094    0.114  0.108        0.111   0.113    0.100   0.090    0.073    0.032  0.037   0.032   -      0.007
Mongola      0.077    0.097  0.092        0.094   0.096    0.083   0.074    0.058    0.019  0.023   0.018   0.007  -

  1. Notes: Fst matrix was calculated using the smartpca function of the EIGENSTRAT program utilising the LD-pruned (r2 < 0.2) SNP data set.

  2. Populations from major geographic regions are grouped and separated by thin black lines.

Figure 6. Phylogenetic tree of Eurasian populations. Neighbour-joining tree of 33 Eurasian populations (selected from HGDP, Turkish and Kyrgyz populations) based on pairwise Fst matrix calculated with smartpca. The phylogenetic tree was constructed with MEGA4 software.

Allele Frequency Comparison among Populations

Forward reference allele frequencies in Turkish versus other HGDP populations were compared and visualised (Fig. S7). The highest correlations were between Turks and Middle Easterners (r= 0.923, Druze and Palestinian), Europeans (r= 0.914, French, Italian, Tuscan and Sardinian) and South Asian populations (r= 0.894, Pakistani). There was some degree of correlation with Central Asian populations (r= 0.747, Hazara and Uygur) (Fig. S7). These results are in line with results of the PC, FRAPPE and STRUCTURE analyses. Allele frequency correlations between Kyrgyz and HGDP populations were also calculated. The highest correlations were with other Central Asian (r= 0.834), Northeast Asian (r= 0.854) and Chinese populations (r= 0.808).

Patterns of Decay of LD and Haplotype Diversity

To investigate haplotype diversity in our Turkish samples and other population groups, randomly selected 1-Mb regions from each chromosome were analysed. Population groups contained equal numbers of subjects to avoid the effects of different sample sizes. The number and average size of haplotype blocks and the number of common haplotypes (>5%) were rather similar among Turkish, European, Middle Eastern, Central Asian and South Asian groups (Table 2). Haplotype block counts were lower, the average size was shorter and the number of common haplotypes was lower in Africans, whereas the haplotype blocks were much larger in native Americans than in other populations (including Turks).

Table 2.  Haplotype diversity in different population groups.
Population groups                 Block^a (n)  Mean ± SEM^b (bp)  Median (bp)  Range (bp)  Distinct haplotype^c (n)
Turkish                           266          14,871 ± 2,086     4,869        352,664     801
Middle Eastern                    258          15,227 ± 2,092     5,765        419,903     786
Central Asian (including Kyrgyz)  256          15,565 ± 2,143     5,276        350,723     748
European                          267          15,623 ± 1,979     5,261        337,316     819
Northeast Asian                   247          17,092 ± 2,361     5,902        417,962     724
South Asian (Pakistani)           251          17,171 ± 2,808     5,285        465,263     746
African                           144          7,301 ± 1,044      3,427        89,641      417
Native American                   248          24,269 ± 2,800     7,856        350,723     718

  1. Notes: Haplotype blocks were calculated with phased genotype data using Haploview (Gabriel method). Population group size, n= 48 individuals.

  2. ^a Total number of haplotype blocks in 22 different 1-Mb regions.

  3. ^b Average size (bp) of haplotype blocks.

  4. ^c Total number of distinct, common haplotypes (>5%).

Turkish, European, Middle Eastern, Central Asian and South Asian groups exhibited similar rates of LD decay with increasing distance (Fig. 7). The half-life of LD decay with genomic distances was substantially shorter in African samples and longer in native American samples. The difference among populations started to disappear over 100-kb distances.

Figure 7. Decay of LD over distance. SNP pairs were partitioned into bins at 5-kb intervals; for each bin number, SNP pairs with r2 > 0.8 (A) and D′ > 0.8 (B) were plotted. Each group has 48 individuals to eliminate possible effects of sample size. The populations shown are European (French, Italian, Tuscan and Sardinian), Middle Eastern (Druze and Palestinian), Central Asian (Hazara, Uygur and Kyrgyz), South Asian (Balochi, Brahui, Burusho, Makrani, Pathan and Sindhi), Northeast Asian (Mongola, Tu, Oroqen, Xibo, Daur and Hezhen), native American (Colombian, Surui, Karitiana, Maya and Pima) and African (Bantu, Biaka Pygmy, Mbuti Pygmy, Mandenka, Yoruba and San).

Relatedness, IBD, IBS and ROH

Cryptic relatedness was assessed by estimating IBD across the genome for all pairwise sample combinations, separately for the Turkish and Kyrgyz samples. All pairwise PI_HAT values (proportion of IBD) were <0.05 for Turkish samples and <0.07 for Kyrgyz samples, suggesting that relatedness was not an issue in our samples. Pairwise IBS sharing values were used to evaluate genetic similarity within a given population and are shown for selected populations (Table S3). Average IBS sharing values within populations were quite similar, except in the Papuan and Pima populations, where they were slightly elevated. Samples from different regions of Turkey also had similar IBS sharing values (Table S3).
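
For readers unfamiliar with IBS sharing, a minimal sketch follows. It assumes genotypes are coded as the count (0, 1 or 2) of one allele, with `None` for a missing call; the function name `ibs_sharing` is hypothetical. The sketch covers IBS only: the IBD proportion (PI_HAT) additionally requires allele-frequency-based estimation, which is not reproduced here.

```python
def ibs_sharing(geno_1, geno_2):
    """Mean proportion of alleles shared identical-by-state between two individuals.

    At each SNP the pair shares 2 - |g1 - g2| alleles (0, 1 or 2).
    SNPs with a missing call in either individual are skipped.
    """
    shared = 0
    compared = 0
    for g1, g2 in zip(geno_1, geno_2):
        if g1 is None or g2 is None:
            continue
        shared += 2 - abs(g1 - g2)
        compared += 1
    if compared == 0:
        raise ValueError("no SNPs with both genotypes called")
    return shared / (2 * compared)


# Toy genotypes (hypothetical): three comparable SNPs, one missing call.
pair_ibs = ibs_sharing([0, 1, 2, 2], [0, 1, 0, None])
```

Averaging this value over all pairs within a population gives a within-population IBS figure of the kind summarised in Table S3.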

ROH (runs of homozygosity: extended genomic segments in which both alleles are identical at every SNP) were examined in several Eurasian population groups containing equal numbers of subjects, after SNPs had been LD pruned separately in each group. Middle Eastern and South Asian populations showed significantly more ROH, evident as higher counts and longer segments in the histograms (Fig. S8). Turkish, Central Asian (including Kyrgyz), European and Northeast Asian populations showed similar degrees of ROH.
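
The idea behind ROH detection can be sketched with a simple scan for consecutive homozygous SNPs. This is a deliberately simplified stand-in for the sliding-window methods in standard software, and the thresholds (`min_snps`, `min_length_bp`) and the function name `find_roh` are hypothetical, chosen only so that the toy data produce output.

```python
def find_roh(positions, genotypes, min_snps=3, min_length_bp=1000):
    """Return runs of homozygosity as (start_bp, end_bp, n_snps) tuples.

    genotypes: allele counts per SNP (0 or 2 = homozygous, 1 = heterozygous).
    Any other value, including None for a missing call, ends the current run.
    """
    runs = []
    run_positions = []

    def close_run():
        # Keep the run only if it has enough SNPs and spans enough base pairs.
        if len(run_positions) >= min_snps:
            length = run_positions[-1] - run_positions[0]
            if length >= min_length_bp:
                runs.append((run_positions[0], run_positions[-1], len(run_positions)))
        run_positions.clear()

    for pos, g in zip(positions, genotypes):
        if g in (0, 2):
            run_positions.append(pos)
        else:
            close_run()
    close_run()
    return runs


# Toy data (hypothetical): one heterozygous SNP at 1800 bp splits two runs.
positions = [100, 600, 1200, 1800, 2500, 3000, 3600, 4200]
genotypes = [0, 2, 2, 1, 0, 0, 2, 0]
roh = find_roh(positions, genotypes)
total_roh_bp = sum(end - start for start, end, _ in roh)
```

Counting runs and summing their lengths per individual gives the two quantities (number and size of ROH) on which the populations are compared.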

Discussion

The Anatolian peninsula (present-day Turkey), located on the Silk Road, served as a bridge between West and East and has received migrants from different regions throughout history. The most recent migration was by Turkic-speaking nomadic groups, mainly Oghuz groups. Starting in the 10th century, they spread from their homeland in Central Asia (Grousset, 1970; Güvenç, 1993; Findley, 2005b), began to admix with local inhabitants, and established the Anatolian Seljuk Empire (10–13th centuries). After this empire collapsed under the Mongol invasion, another Turkish empire, the Ottomans, ruled (13–20th centuries) the Anatolian peninsula and the Middle East, with territory stretching into South Eastern Europe and South Western Asia (Findley, 2005a; Faroqhi, 2007). These major historical events are reflected in the genetic structure of present-day Turkish people, as described in this study.

We analysed the population structure and genetic relatedness of Turkish and Kyrgyz populations and compared them with other Eurasian populations using HGDP data. PC and FRAPPE/STRUCTURE analyses indicated that the Turkish population is genetically close to Middle Eastern and European populations and shows some similarity to South Asian and Central Asian populations. Kyrgyz samples clustered with the other Central Asian populations (Uygur and Hazara) in the HGDP set. The PC and FRAPPE results are generally consistent with the phylogenetic tree and with the relative pairwise Fst values in terms of the separation among population groups. Results from our samples, collected from three regions in Turkey (Aydin, Istanbul and Kayseri), overlapped without clear subpopulation structure, suggesting a rather homogeneous and distinct genetic ancestry. A potential weakness of our sampling strategy is that we do not know the parental or grandparental ancestry of our subjects, which may complicate the interpretation of genetic ancestry inference. Complex origins, unrecorded or unknown immigrations, and recent intermarriage with other population or ancestry groups preclude unambiguous identification of the ancestry of our samples. However, two observations give us confidence in our interpretation, at least for the regions and samples included in this study: samples from the three regions clearly overlapped, including those from a cosmopolitan city such as Istanbul (which may reflect the more general picture of present-day Turkey), and the samples came from individuals who were born and lived in their designated regions.

Genetic distance also depends on the markers used; the panel of more than 500,000 SNPs we used is biased towards common polymorphisms discovered in European and East Asian (mainly Japanese) populations. Nevertheless, fine population structure has been documented in several studies even when subsets of these high-density markers were selected (Bonnen et al., 2006; Xu et al., 2008; Auton et al., 2009; Silva-Zolezzi et al., 2009; Bryc et al., 2010; Hunter-Zinck et al., 2010; Xing et al., 2010). Importantly, the ancestry proportions inferred from this analysis are affected by the populations used in the study. The HGDP has extensive coverage of the world's major geographic regions, although some are not well represented (e.g. Central Asia). However, extensive and rigorous analyses have demonstrated that the estimated genetic clusters are not artefacts of noncontinuous sampling of people (Rosenberg et al., 2002; Li et al., 2008).

To obtain better estimates for some calculations in this study, geographically close populations were grouped together. The Mongola, Tu, Xibo, Oroqen, Hezhen and Daur populations were grouped as Northeast Asians, since these groups reside at high latitudes and speak languages of the Altaic family (Cavalli-Sforza, 2005; Li et al., 2008), of which Turkic is a subdivision (Georg et al., 1998). Uygur and Kyrgyz populations also speak Turkic languages (Georg et al., 1998). Although the Hazara samples were collected in Pakistan (Cann et al., 2002), they are genetically more similar to Central Asian populations than to Pakistani populations, as seen in this and other studies (Rosenberg et al., 2002; Quintana-Murci et al., 2004; Li et al., 2008; Xing et al., 2010); therefore, we grouped the Hazara with the Uygur and Kyrgyz populations as Central Asians. The Middle Eastern group consists of the Druze and Palestinian populations only, since Mozabites have a large African component and Bedouins are an admixed population (Li et al., 2008). European populations on the Mediterranean Sea (French, Italian, Tuscan and Sardinian) were grouped as Europeans for supervised STRUCTURE, allele frequency spectrum comparison, patterns of decay of LD, and haplotype diversity analyses, whereas all or representative European populations were used for PC, FRAPPE and STRUCTURE analyses as described.

Our population substructure analyses are consistent with historic admixture events (Figs. 4, 5, S5 and S6). In Turks, the largest parental ancestry estimates (light blue) were also present as a major ancestry component in European and, to a lesser extent, in Middle Eastern and Central Asian populations. However, the second largest parental ancestry estimates (dark blue) were present in South Asian, Central Asian and Middle Eastern populations but not in European populations. The third largest ancestry estimates (light green) in Turks have a major component in Middle Easterners. The fourth largest ancestry estimates (red) in Turks were major ancestry estimates in East Asian and Central Asian populations, possibly demonstrating admixture events in Central Asian (Frye, 1996; Comas et al., 1998; Wells et al., 2001; Zerjal et al., 2002; Nasidze et al., 2004) and Turkish populations, but these estimates (red) were absent in European and Middle Eastern populations. PC and phylogenetic tree analyses also supported these conclusions.

The Adygei population from the Caucasus showed a close genetic affinity to our Turkish samples; however, when the Turkish and Adygei populations were analysed together, the first PC clearly separated the two. Although the Adygei sample set was small (n = 17), it clustered tightly with other populations from the Caucasus (Nasidze et al., 2004; Xing et al., 2010), suggesting that this is a valid finding and not an artefact of small sample size. The Caucasus region, close to present-day Turkey, was also subject to major population movements, and the Caucasus Mountains do not seem to have acted as a barrier to gene flow (Nasidze et al., 2004). Studies of Y chromosome (Wells et al., 2001) and mitochondrial markers (Quintana-Murci et al., 2004) also showed close affinities between Turkish and Caucasus samples.

Many contemporary Central Asian populations speak a Turkic language (Georg et al., 1998), as do the majority of people in Turkey. Several studies have attempted to quantify the Central Asian contribution to the Turkish gene pool using mitochondrial DNA, Y chromosome and autosomal markers (Alu insertion polymorphisms). Mean estimates varied widely: for mitochondrial markers, the estimated Central Asian admixture proportion ranged from 22% (Berkman, 2006) to 30% (Di Benedetto et al., 2001); for Y chromosome markers, it was <9% (Cinnioğlu et al., 2004), 13% (Berkman, 2006) and 30% (Di Benedetto et al., 2001); and for Alu insertion polymorphisms, it was 13% (Berkman et al., 2008) and 15% (Berkman, 2006). Although these markers provide some insight into the relative contributions of the two sexes, the haploid nature of mitochondrial and Y chromosome markers makes them more vulnerable to genetic drift than autosomal markers. In the present study, we used high-density autosomal SNP genotypes from across the genome to reflect the Central Asian admixture in Turks more accurately. To compare our samples with published reports (Di Benedetto et al., 2001; Cinnioğlu et al., 2004; Berkman, 2006; Berkman et al., 2008), we used supervised clustering with STRUCTURE (Falush et al., 2003). Individuals from the Middle East (Druze and Palestinian), Europe (French, Italian, Tuscan and Sardinian) and Central Asia (Uygur, Hazara and Kyrgyz) were forced into separate clusters, and supervised analysis of Turkish samples was performed at K = 3. The Central Asian contribution was found to be about 15% (with 45% Middle Eastern and 40% European) (Fig. 5A). We inferred parental populations from contemporary populations living in these locations, although these populations may themselves have experienced population movement (e.g. migration, admixture) or genetic drift. Using reference populations other than those available for this analysis (e.g. populations closer to Turkey or more populations from Central Asia) might also change the calculated contributions. Nevertheless, our results compare favourably with published estimates of the Central Asian contribution to today's Turkish genome (Di Benedetto et al., 2001; Cinnioğlu et al., 2004; Berkman, 2006; Berkman et al., 2008).
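
The logic of a supervised ancestry estimate, in which parental groups are fixed in advance and only the proportions in the test individual are inferred, can be sketched as follows. This is not STRUCTURE, which uses a Bayesian MCMC approach; the sketch swaps in a simple maximum-likelihood EM estimate with parental allele frequencies treated as known. The function name `estimate_ancestry`, the two toy populations and their allele frequencies are all hypothetical.

```python
def estimate_ancestry(genotype, parental_freqs, n_iter=200):
    """EM estimate of one individual's ancestry proportions.

    genotype: per-SNP count (0, 1 or 2) of allele 1.
    parental_freqs: parental_freqs[k][l] is the frequency of allele 1
    at SNP l in parental population k (treated as fixed and known).
    Returns a list q with one proportion per parental population.
    """
    if not genotype:
        raise ValueError("genotype must contain at least one SNP")
    k = len(parental_freqs)
    q = [1.0 / k] * k                       # start from equal proportions
    for _ in range(n_iter):
        expected = [0.0] * k                # expected allele copies per population
        n_alleles = 0
        for locus, g in enumerate(genotype):
            # The individual carries g copies of allele 1 and 2 - g copies of allele 0.
            for allele, copies in ((1, g), (0, 2 - g)):
                if copies == 0:
                    continue
                like = []
                for j in range(k):
                    f = parental_freqs[j][locus]
                    like.append(q[j] * (f if allele == 1 else 1.0 - f))
                total = sum(like)
                if total == 0:
                    continue                # allele impossible under the current q
                for j in range(k):
                    expected[j] += copies * like[j] / total
                n_alleles += copies
        if n_alleles == 0:
            break
        q = [e / n_alleles for e in expected]
    return q


# Toy example (hypothetical frequencies for two parental populations, four SNPs).
toy_freqs = [[0.9, 0.8, 0.1, 0.2],
             [0.1, 0.2, 0.9, 0.8]]
q_toy = estimate_ancestry([2, 2, 0, 0], toy_freqs)
```

As the text notes, the estimates depend on which parental populations supply the allele frequencies: changing `parental_freqs` changes `q` even when the genotype is unchanged.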

Although separated by large geographic distances, Europe and South Asia (e.g. Pakistan) have some genetic relatedness (Fig. 2A) (Rosenberg et al., 2002; Li et al., 2008) that may reflect the documented gene flow from Central Asia, the Middle East and Iran to Pakistan (Wells et al., 2001; Quintana-Murci et al., 2004; Regueiro et al., 2006) and the common ancestry of these population groups (Quintana-Murci et al., 2004; Auton et al., 2009). Similarities between our samples and South Asian (Pakistani) samples may reflect those earlier migratory and admixture events. To account for this, we performed a similar supervised clustering analysis in which individuals from South Asia, the Middle East, Europe and Central Asia were forced into separate clusters. At K = 4, the parental ancestry coefficients for our Turkish samples were 38% European, 35% Middle Eastern, 18% South Asian and 9% Central Asian (Fig. 5B).

ROH may arise from consanguinity, reduced population size, or prolonged isolation of a population. Middle Eastern and South Asian populations, where consanguinity is relatively common (Hussain, 1999; Hunter-Zinck et al., 2010), clearly have more ROH (in both number and size) than Turkish, Central Asian, European and Northeast Asian populations (Fig. S8). ROH might also result from hemizygosity (copy number variations, such as deletions). Copy number changes were not taken into account in our study. However, an earlier study observed no significant differences in the mean total length of ROH when deletions were considered (McQuillan et al., 2008).

The approaches used in our study allowed us to investigate the genetic ancestry of our Turkish samples with respect to HGDP samples and to assess the extent of admixture in our samples. Although complex origins, historical immigrations, and intermarriage among populations make precision difficult, we found that individual parental ancestries can be estimated from high-density SNP genotype data. A more thorough knowledge of between-population genetic variation is important for improving the design and interpretation of genetic studies of complex diseases. Furthermore, since genetic studies now aim to identify smaller and smaller effects, recognizing and controlling for population structure, even at a fine level within a seemingly homogeneous population, is important to avoid confounding and spurious associations.

Acknowledgements

Turkish samples were provided through the Turkish Heart Study. Kyrgyz samples were obtained at the Kyrgyz National Center of Cardiology and Internal Medicine, Bishkek, Kyrgyzstan with the support of Drs. M. M. Mirrakhimov and E. M. Mirrakhimov. The authors thank Dr. Vivian G. Cheung (University of Pennsylvania) and Dr. Katherine Pollard (Gladstone Institute of Cardiovascular Disease) for valuable input and critical reading of the manuscript, Sylvia Richmond for manuscript preparation and Gary Howard and Stephen Ordway for editorial assistance. The authors also thank Dr. Stephen Schaffner (BROAD Institute of MIT and Harvard) and Dr. Tara Matise (Rutgers University) for providing pairwise HapMap Fst and mapping (cM) data for individual SNPs, respectively. In addition, the authors are indebted to their associates at the American Hospital, Istanbul, especially Drs. K. Erhan Palaoğlu, Oryal Gökdemir, Sinan Özbayrakcı, Kerem Özer, Guy Pépin, Sibel Tanir, Judy Dawson-Pépin and Linda L. Mahley. The authors acknowledge the generous support of the American Hospital, especially Mr. George Rountree, and the J. David Gladstone Institutes. This work was supported in part by Grant no. R01 HL71027 from the National Institutes of Health.

References

  • Auton, A., Bryc, K., Boyko, A. R., Lohmueller, K. E., Novembre, J., Reynolds, A., Indap, A., Wright, M. H., Degenhardt, J. D., Gutenkunst, R. N., King, K. S., Nelson, M. R., & Bustamante, C. D. (2009) Global distribution of genomic diversity underscores rich complex history of continental human populations. Genome Res 19, 795–803.
  • Barrett, J. C., Fry, B., Maller, J., & Daly, M. J. (2005) Haploview: Analysis and visualization of LD and haplotype maps. Bioinformatics 21, 263–265.
  • Berkman, C. C. (2006) Comparative analyses from the Central Asian contribution to Anatolian gene pool with reference to Balkans. Ph.D. Thesis. Middle East Technical University, Ankara, Turkey (http://etd.lib.metu.edu.tr/upload/12607764/index.pdf).
  • Berkman, C. C., Dinc, H., Sekeryapan, C., & Togan, I. (2008) Alu insertion polymorphisms and an assessment of the genetic contribution of Central Asia to Anatolia with respect to the Balkans. Am J Phys Anthropol 136, 11–18.
  • Bersot, T. P., Pépin, G. M., & Mahley, R. W. (2003) Risk determination of dyslipidemia in populations characterized by low levels of high-density lipoprotein cholesterol. Am Heart J 146, 1052–1060.
  • Bersot, T. P., Vega, G. L., Grundy, S. M., Palaoğlu, K. E., Atagündüz, P., Özbayrakçi, S., Gökdemir, O., & Mahley, R. W. (1999) Elevated hepatic lipase activity and low levels of high density lipoprotein in a normotriglyceridemic, nonobese Turkish population. J Lipid Res 40, 432–438.
  • Bonnen, P. E., Pe’er, I., Plenge, R. M., Salit, J., Lowe, J. K., Shapero, M. H., Lifton, R. P., Breslow, J. L., Daly, M. J., Reich, D. E., Jones, K. W., Stoffel, M., Altshuler, D., & Friedman, J. M. (2006) Evaluating potential for whole-genome studies in Kosrae, an isolated population in Micronesia. Nat Genet 38, 214–217.
  • Bryc, K., Auton, A., Nelson, M. R., Oksenberg, J. R., Hauser, S. L., Williams, S., Froment, A., Bodo, J.-M., Wambebe, C., Tishkoff, S. A., & Bustamante, C. D. (2010) Genome-wide patterns of population structure and admixture in West Africans and African Americans. Proc Natl Acad Sci USA 107, 786–791.
  • Calafell, F., Underhill, P., Tolun, A., Angelicheva, D., & Kalaydjieva, L. (1996) From Asia to Europe: Mitochondrial DNA sequence variability in Bulgarians and Turks. Ann Hum Genet 60, 35–49.
  • Cann, H. M., de Toma, C., Cazes, L., Legrand, M.-F., Morel, V., Piouffre, L., Bodmer, J., Bodmer, W. F., Bonne-Tamir, B., Cambon-Thomsen, A., Chen, Z., Chu, J., Carcassi, C., Contu, L., Du, R., Excoffier, L., Ferrara, G. B., Friedlaender, J. S., Groot, H., Gurwitz, D., Jenkins, T., Herrera, R. J., Huang, X., Kidd, J., Kidd, K. K., Langaney, A., Lin, A. A., Mehdi, S. Q., Parham, P., Piazza, A., Pistillo, M. P., Qian, Y., Shu, Q., Xu, J., Zhu, S., Weber, J. L., Greely, H. T., Feldman, M. W., Thomas, G., Dausset, J., & Cavalli-Sforza, L. L. (2002) A human genome diversity cell line panel. Science 296, 261–262.
  • Cavalli-Sforza, L. L. (2005) The Human Genome Diversity Project: Past, present and future. Nat Rev Genet 6, 333–340.
  • Cinnioğlu, C., King, R., Kivisild, T., Kalfoğlu, E., Atasoy, S., Cavalleri, G. L., Lillie, A. S., Roseman, C. C., Lin, A. A., Prince, K., Oefner, P. J., Shen, P., Semino, O., Cavalli-Sforza, L. L., & Underhill, P. A. (2004) Excavating Y-chromosome haplotype strata in Anatolia. Hum Genet 114, 127–148.
  • Comas, D., Calafell, F., Mateu, E., Pérez-Lezaun, A., Bosch, E., Martínez-Arias, R., Clarimon, J., Facchini, F., Fiori, G., Luiselli, D., Pettener, D., & Bertranpetit, J. (1998) Trading genes along the Silk Road: mtDNA sequences and the origin of Central Asian populations. Am J Hum Genet 63, 1824–1838.
  • Di Benedetto, G., Ergüven, A., Stenico, M., Castrì, L., Bertorelle, G., Togan, I., & Barbujani, G. (2001) DNA diversity and population admixture in Anatolia. Am J Phys Anthropol 115, 144–156.
  • Falush, D., Stephens, M., & Pritchard, J. K. (2003) Inference of population structure using multilocus genotype data: Linked loci and correlated allele frequencies. Genetics 164, 1567–1587.
  • Faroqhi, S. (2007) On the margins of empire: Clients and dependants. pp. 75–97. London: I.B.Tauris & Co.
  • Findley, C. V. (2005a) Islamic empires from Temur to the “gunpowder era”. pp. 93–132. New York: Oxford University Press.
  • Findley, C. V. (2005b) Islam and empire from the Seljuks through the Mongols. pp. 56–92. New York: Oxford University Press.
  • Frye, R. N. (1996) The present is born. pp. 233–239. Princeton: Marcus Wiener Publishers.
  • Georg, S., Michalove, P. A., Ramer, A. M., & Sidwell, P. J. (1998) Telling general linguists about Altaic. J Linguistics 35, 65–98.
  • Grousset, R. (1970) The Turks and Islam to the thirteenth century. pp. 141–170. New Brunswick: Rutgers University Press.
  • Güvenç, B. (1993) Türklerin Kimliği: Kim Bu Türkler? (Identity of Turks: Who are the Turks?). pp. 19–52. Ankara: Kültür Bakanlığı.
  • Hodoğlugil, U., Williamson, D. W., Huang, Y., & Mahley, R. W. (2005a) An interaction between the TaqIB polymorphism of cholesterol ester transfer protein and smoking is associated with changes in plasma high-density lipoprotein cholesterol levels in Turks. Clin Genet 68, 118–127.
  • Hodoğlugil, U., Williamson, D. W., Huang, Y., & Mahley, R. W. (2005b) Common polymorphisms of ATP binding cassette transporter A1, including a functional promoter polymorphism, associated with plasma high density lipoprotein cholesterol levels in Turks. Atherosclerosis 183, 199–212.
  • Hodoğlugil, U., Tanyolaç, S., Williamson, D. W., Huang, Y., & Mahley, R. W. (2006) Apolipoprotein A-V: A potential modulator of plasma triglyceride levels in Turks. J Lipid Res 47, 144–153.
  • Hodoğlugil, U., Williamson, D. W., & Mahley, R. W. (2010) Polymorphisms in the hepatic lipase gene affect plasma HDL-cholesterol levels in a Turkish population. J Lipid Res 51, 422–430.
  • Hodoğlugil, U., Williamson, D. W., Yu, Y., Farrer, L. A., & Mahley, R. W. (2011) Glucuronic acid epimerase is associated with plasma triglyceride and high-density lipoprotein cholesterol levels in Turks. Ann Hum Genet 75, 398–417.
  • Hunter-Zinck, H., Musharoff, S., Salit, J., Al-Ali, K. A., Chouchane, L., Gohar, A., Matthews, R., Butler, M. W., Fuller, J., Hackett, N. R., Crystal, R. G., & Clark, A. G. (2010) Population genetic structure of the people of Qatar. Am J Hum Genet 87, 17–25.
  • Hussain, R. (1999) Community perceptions of reasons for preference for consanguineous marriages in Pakistan. J Biosoc Sci 31, 449–461.
  • Jakobsson, M. & Rosenberg, N. A. (2007) CLUMPP: A cluster matching and permutation program for dealing with label switching and multimodality in analysis of population structure. Bioinformatics 23, 1801–1806.
  • Li, J. Z., Absher, D. M., Tang, H., Southwick, A. M., Casto, A. M., Ramachandran, S., Cann, H. M., Barsh, G. S., Feldman, M., Cavalli-Sforza, L. L., & Myers, R. M. (2008) Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100–1104.
  • Ling, H., Waterworth, D. M., Stirnadel, H. A., Pollin, T. I., Barter, P. J., Kesäniemi, Y. A., Mahley, R. W., McPherson, R., Waeber, G., Bersot, T. P., Cohen, J. C., Grundy, S. M., Mooser, V. E., & Mitchell, B. D. (2009) Genome-wide linkage and association analyses to identify genes influencing adiponectin levels: The GEMS study. Obesity 17, 737–744.
  • Mahley, R. W., Arslan, P., Pekcan, G., Pépin, G. M., Ağaçdiken, A., Karaağaoğlu, N., Rakıcıoğlu, N., Nursal, B., Dayanıklı, P., Palaoğlu, K. E., & Bersot, T. P. (2001) Plasma lipids in Turkish children: Impact of puberty, socioeconomic status, and nutrition on plasma cholesterol and HDL. J Lipid Res 42, 1996–2006.
  • Mahley, R. W., Palaoğlu, K. E., Atak, Z., Dawson-Pepin, J., Langlois, A.-M., Cheung, V., Onat, H., Fulks, P., Mahley, L. L., Vakar, F., Özbayrakçı, S., Gökdemir, O., & Winkler, W. (1995) Turkish Heart Study: Lipids, lipoproteins, and apolipoproteins. J Lipid Res 36, 839–859.
  • Mahley, R. W., Pépin, J., Palaoğlu, K. E., Malloy, M. J., Kane, J. P., & Bersot, T. P. (2000) Low levels of high density lipoproteins in Turks, a population with elevated hepatic lipase: High density lipoprotein characterization and gender-specific effects of apolipoprotein E genotype. J Lipid Res 41, 1290–1301.
  • McClellan, J. & King, M.-C. (2010) Genetic heterogeneity in human disease. Cell 141, 210–217.
  • McQuillan, R., Leutenegger, A.-L., Abdel-Rahman, R., Franklin, C. S., Pericic, M., Barac-Lauc, L., Smolej-Narancic, N., Janicijevic, B., Polasek, O., Tenesa, A., MacLeod, A. K., Farrington, S. M., Rudan, P., Hayward, C., Vitart, V., Rudan, I., Wild, S. H., Dunlop, M. G., Wright, A. F., Campbell, H., & Wilson, J. F. (2008) Runs of homozygosity in European populations. Am J Hum Genet 83, 359–372.
  • Mergen, H., Öner, R., & Öner, C. (2004) Mitochondrial DNA sequence variation in the Anatolian peninsula (Turkey). J Genet 83, 39–47.
  • Nasidze, I., Ling, E. Y. S., Quinque, D., Dupanloup, I., Cordaux, R., Rychkov, S., Naumova, O., Zhukova, O., Sarraf-Zadegan, N., Naderi, G. A., Asgary, S., Sardas, S., Farhud, D. D., Sarkisian, T., Asadov, C., Kerimov, A., & Stoneking, M. (2004) Mitochondrial DNA and Y-chromosome variation in the Caucasus. Ann Hum Genet 68, 205–221.
  • Onat, A. (2001) Risk factors and cardiovascular disease in Turkey. Atherosclerosis 156, 1–10.
  • Onat, A., Hergenç, G., Uzunlar, B., Ceyhan, K., Uyarel, H., Yazıcı, M., Dogan, Y., Özmay, M., Toprak, S., & Sansoy, V. (2003) Determinants of HDL-cholesterol and its prediction of coronary disease among Turks (in Turkish). Arch Turk Soc Cardiol 31, 5–13.
  • Patterson, N., Price, A. L., & Reich, D. (2006) Population structure and eigenanalysis. PLoS Genet 2, e190.
  • Price, A. L., Zaitlen, N. A., Reich, D., & Patterson, N. (2010) New approaches to population stratification in genome-wide association studies. Nat Rev Genet 11, 459–463.
  • Purcell, S., Neale, B., Todd-Brown, K., Thomas, L., Ferreira, M. A. R., Bender, D., Maller, J., Sklar, P., de Bakker, P. I. W., Daly, M. J., & Sham, P. C. (2007) PLINK: A tool set for whole-genome association and population-based linkage analyses. Am J Hum Genet 81, 559–575.
  • Quintana-Murci, L., Chaix, R., Wells, R. S., Behar, D. M., Sayar, H., Scozzari, R., Rengo, C., Al-Zahery, N., Semino, O., Santachiara-Benerecetti, A. S., Coppa, A., Ayub, Q., Mohyuddin, A., Tyler-Smith, C., Qasim Mehdi, S., Torroni, A., & McElreavey, K. (2004) Where West meets East: The complex mtDNA landscape of the southwest and Central Asian corridor. Am J Hum Genet 74, 827–845.
  • Regueiro, M., Cadenas, A. M., Gayden, T., Underhill, P. A., & Herrera, R. J. (2006) Iran: Tricontinental nexus for Y-chromosome driven migration. Hum Hered 61, 132–143.
  • Rosenberg, N. A. (2006) Standardized subsets of the HGDP-CEPH Human Genome Diversity Cell Line Panel, accounting for atypical and duplicated samples and pairs of close relatives. Ann Hum Genet 70, 841–847.
  • Rosenberg, N. A., Huang, L., Jewett, E. M., Szpiech, Z. A., Jankovic, I., & Boehnke, M. (2010) Genome-wide association studies in diverse populations. Nat Rev Genet 11, 356–366.
  • Rosenberg, N. A., Pritchard, J. K., Weber, J. L., Cann, H. M., Kidd, K. K., Zhivotovsky, L. A., & Feldman, M. W. (2002) Genetic structure of human populations. Science 298, 2381–2385.
  • Saitou, N. & Nei, M. (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4, 406–425.
  • Scheet, P. & Stephens, M. (2006) A fast and flexible statistical model for large-scale population genotype data: Applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet 78, 629–644.
  • Silva-Zolezzi, I., Hidalgo-Miranda, A., Estrada-Gil, J., Fernandez-Lopez, J. C., Uribe-Figueroa, L., Contreras, A., Balam-Ortiz, E., del Bosque-Plata, L., Velazquez-Fernandez, D., Lara, C., Goya, R., Hernandez-Lemus, E., Davila, C., Barrientos, E., March, S., & Jimenez-Sanchez, G. (2009) Analysis of genomic diversity in Mexican Mestizo populations to develop genomic medicine in Mexico. Proc Natl Acad Sci USA 106, 8611–8616.
  • Tamura, K., Dudley, J., Nei, M., & Kumar, S. (2007) MEGA4: Molecular Evolutionary Genetics Analysis (MEGA) software version 4.0. Mol Biol Evol 24, 1596–1599.
  • Tang, H., Peng, J., Wang, P., & Risch, N. J. (2005) Estimation of individual admixture: Analytical and study design considerations. Genet Epidemiol 28, 289–301.
  • Teo, Y.-Y., Sim, X., Ong, R. T. H., Tan, A. K. S., Chen, J., Tantoso, E., Small, K. S., Ku, C.-S., Lee, E. J. D., Seielstad, M., & Chia, K.-S. (2009) Singapore Genome Variation Project: A haplotype map of three Southeast Asian populations. Genome Res 19, 2154–2162.
  • Tian, C., Kosoy, R., Lee, A., Ransom, M., Belmont, J. W., Gregersen, P. K., & Seldin, M. F. (2008) Analysis of East Asia genetic substructure using genome-wide SNP arrays. PLoS One 3, e3862.
  • Wells, R. S., Yuldasheva, N., Ruzibakiev, R., Underhill, P. A., Evseeva, I., Blue-Smith, J., Jin, L., Su, B., Pitchappan, R., Shanmugalakshmi, S., Balakrishnan, K., Read, M., Pearson, N. M., Zerjal, T., Webster, M. T., Zholoshvili, I., Jamarjashvili, E., Gambarov, S., Nikbin, B., Dostiev, A., Aknazarov, O., Zalloua, P., Tsoy, I., Kitaev, M., Mirrakhimov, M., Chariev, A., & Bodmer, W. F. (2001) The Eurasian heartland: A continental perspective on Y-chromosome diversity. Proc Natl Acad Sci USA 98, 10244–10249.
  • Xing, J., Watkins, W. S., Shlien, A., Walker, E., Huff, C. D., Witherspoon, D. J., Zhang, Y., Simonson, T. S., Weiss, R. B., Schiffman, J. D., Malkin, D., Woodward, S. R., & Jorde, L. B. (2010) Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping. Genomics 96, 199–210.
  • Xu, S., Huang, W., Qian, J., & Jin, L. (2008) Analysis of genomic admixture in Uyghur and its implication in mapping strategy. Am J Hum Genet 82, 883–894.
  • Xu, S. & Jin, L. (2008) A genome-wide analysis of admixture in Uyghurs and a high-density admixture map for disease-gene discovery. Am J Hum Genet 83, 322–336.
  • Yu, Y., Wyszynski, D. F., Waterworth, D. M., Wilton, S. D., Barter, P. J., Kesäniemi, Y. A., Mahley, R. W., McPherson, R., Waeber, G., Bersot, T. P., Ma, Q., Sharma, S. S., Montgomery, D. S., Middleton, L. T., Sundseth, S. S., Mooser, V., Grundy, S. M., & Farrer, L. A. (2005) Multiple QTLs influencing triglyceride and HDL and total cholesterol levels identified in families with atherogenic dyslipidemia. J Lipid Res 46, 2202–2213.
  • Zerjal, T., Wells, R. S., Yuldasheva, N., Ruzibakiev, R., & Tyler-Smith, C. (2002) A genetic landscape reshaped by recent events: Y-chromosomal insights into Central Asia. Am J Hum Genet 71, 466–482.

Supporting Information

Table S1 Average estimated individual parental ancestry (K = 7) for selected populations by FRAPPE analysis.

Table S2 Fst matrix among HGDP, Turkish and Kyrgyz populations.

Table S3 Pairwise IBS sharing within selected HGDP, Turkish and Kyrgyz populations.

Figure S1 PC analysis of Turkish (Aydin, Istanbul and Kayseri), European and Middle Eastern populations.

Figure S2 PC analysis of Turkish (Aydin, Istanbul and Kayseri), Central Asian and Pakistani populations.

Figure S3 PC analysis of Turkish samples from three different regions (Aydin, Istanbul and Kayseri).

Figure S4 Multi-scale dimensional plot demonstrating genetic relatedness of HGDP, Turkish and Kyrgyz samples.

Figure S5 Estimated individual ancestry and population structure in 339 individuals by FRAPPE analysis with 105,382 SNP markers.

Figure S6 Estimated individual ancestry and population structure with 6408 SNP markers genotyped.

Figure S7 Allele frequency comparison of pairs of populations (Turkish, Kyrgyz and selected HGDP populations).

Figure S8 Histograms of ROH in populations.

Filename                 Format   Size     Description
AHG_701_sm_Figure.pdf    PDF      3814K    Supporting info item
AHG_701_sm_Table.pdf     PDF      167K     Supporting info item
