*Corresponding author: Dr. Pierre Zalloua, The Lebanese American University, Chouran, Beirut 1102 2801, Lebanon. Tel: +961-1-784408 Ext. 2855; Fax: +961-9-546090; E-mail: email@example.com
We have examined the male-specific phylogeography of the Levant and its surroundings by analyzing Y-chromosomal haplogroup distributions using 5874 samples (885 new) from 23 countries. The diversity within some of these haplogroups was also examined. The Levantine populations showed clustering in SNP and STR analyses when considered against a broad Middle-East and North African background. However, we also found a coastal-inland, east-west pattern of diversity and frequency distribution in several haplogroups within the small region of the Levant. Since estimates of effective population size are similar in the two regions, this strong pattern is likely to have arisen mainly from differential migrations, with different lineages introduced from the east and west.
The Levant lies in the eastern Mediterranean region, south of the mountains of Cilicia (South Turkey) and north of the Sinai Peninsula. Throughout human prehistory and history, this territory has been a key area, due to its geographical location linking three continents: Europe, Asia and Africa. It was on one of the early out-of-Africa migration routes (Stringer et al., 1989; Bar-Yosef, 1992; Tchernov, 1994; Lahr & Foley, 1998; Luis et al., 2004), is believed to be the first recipient of migration waves from East Africa seeking milder climatic conditions after the Last Glacial Maximum (LGM), and was the corridor for Neolithic migrations from the Fertile Crescent to Europe and North Africa (Cavalli-Sforza, 1997).
Within recent millennia, the Levant has been a frequent focus of conquest by a variety of states and imperial powers (Issawi, 1988). It was home to some of the oldest cities (Jericho, Byblos and Damascus), a path of some of the oldest and later most important trade routes, and a cradle to three of the World's Great Religions (Hitti, 1957). Signatures of the genetic legacy of the Phoenician expansion and the Diaspora (Aubet, 1993; Zalloua et al., 2008a), and traces of the Muslim expansion and the Crusaders (Lamb, 1930; Zalloua et al., 2008b) have been identified. However, there remain a number of known historical events, ranging from the Byzantine expansion, the Ottoman expansion, periods of Egyptian, Babylonian, Persian, Hittite (Hitti, 1957) and other subjugations, as well as the likelihood of many unknown prehistoric and historic events, whose genetic impact is yet to be revealed. During the Bronze Age, the land south-west of the Fertile Crescent was called the land of Canaan, part of which became Phoenician territory after 1200 BCE (Issawi, 1988). This land included most of the coastal territory of the eastern Mediterranean countries. These countries shared a common culture and history (Hourani, 1946) but some population expansions and migrations have affected Lebanon and Syria in different ways (Harris, 2003). Such movements include the Aramean rule in Syria (Harris, 2003), the Sea Peoples disruption to some coastal cities north and south of modern day Lebanon (in the Phoenician era) (Harden, 1971) and the Ottoman occupation (Akarli, 1993).
Modern Jewish populations have a special phylogeographical status within the Levant for several reasons. Even by the Roman era, there was a significant Diaspora, as attested to by Strabo, Philo, Josephus, and Cicero (Stern, 2007). The Diaspora fragmented into branches with distinct and well-studied genetic signatures, such as those of Sephardim (Adams et al., 2008) and Ashkenazi (Behar et al., 2003; Nebel et al., 2005) and incorporated significant levels of male admixture (Adams et al., 2008). Current Jewish populations in the Levant derive largely from a complex pattern of resettlement from multiple sources within the last ∼50 years and may not represent the pre-Diaspora distribution (Baron, 2007). Jewish genetics have been studied more than those of many other groups (Carmeli, 2004). The current study therefore seeks to focus on a geographical genetic profile through the Levant and surrounding regions among the relatively less well characterized populations, highlighting the product of other migratory and population processes that influenced the Levantine region.
Previous studies of Levantine Y-chromosomal diversity have either focused on a specific population (Zalloua et al., 2008b) or have been limited in their sampling or genotyping (Cadenas et al., 2008). A general survey of the Levantine genetic landscape could provide significant guidance to construct and test hypotheses concerning known and unknown expansion events in the region. We have therefore assembled a comprehensive set of 5874 samples representing 23 countries and 35 populations, to present a survey of the Y-lineage distributions throughout the Levant defined by 58 binary markers and, for some, 19 Y-STRs, and place these in the context of the surrounding Y-chromosomal landscape.
In this study we analyzed the geographical distribution of Y-chromosomal haplogroups and STR haplotypes to identify recent and old migration and expansion patterns involving the Levant. This work should shed light on the geographic gradient or structuring resulting from migrations, geographical expansions, and size fluctuations affecting the populations over time. The haplogroup variation and the geographic clustering of some of the Y-STR haplotypes observed in the Levant are suggestive of several successive waves of migration and/or expansion into the region that may have taken place after the LGM.
Materials and Methods
Subjects and Comparative Datasets
A total of 885 new samples from five populations (Syria, Jordan, Iran, Egypt and Kuwait) were collected and analyzed for this study. In addition, samples and genotyping data from 951 Lebanese, 200 Syrian and 101 Palestinian men (Akka) were already available (Zalloua et al., 2008a, 2008b). All the participants had three generations of paternal ancestry in their country of birth. Each provided detailed information on their geographical origin (Table S1) and gave informed consent for this study, which was approved by the IRB committee of the Lebanese American University. The 1879 Levantine samples (Lebanese, Syrians, Jordanians and Palestinians) were classified into eight regions. These regions mainly reflect geographical subdivisions within modern day Lebanon and Syria (Table S3a and Fig. 1). The Lebanese samples, collected from 20 different cities (Table S3b), were classified into the Lebanese coast (LC), which contains coastal Lebanese cities including Beirut, Tyre, Sidon, Jounieh and Tripoli, and Lebanese inland (LI), which contains samples from Bekaa and Nabatiyeh. The Palestinian samples (PS) were collected from men currently residing in Lebanon but who originated from Akka. Finally, Syrian samples were from 17 different Syrian cities (Table S3b) and were divided into coast (SC) and four inland regions: inland north (SIN), centre (SIC), south (SIS) and east (SIE) (Table S3a and Fig. 1).
DNA samples were extracted from blood or buccal swabs by standard methods (Wells et al., 2001; Behar et al., 2007). Samples were genotyped with a set of 58 Y-chromosomal binary markers on the non-recombining portion of the Y chromosome (Table S1 and Fig. S1). The markers were genotyped by TaqMan RealTime PCR assays (Applied Biosystems, Foster City, CA). Previously typed samples with a derived allele for each biallelic polymorphism were used as positive controls. These markers define 53 haplogroups (including paragroups), 21 of which were present in the typed samples. The phylogenetic relationships of the relevant Y-chromosomal haplogroups are illustrated in Figure S1 and follow the 2008 YCC convention (Table S1) (Karafet et al., 2008). Published data used in this study were converted to the 2008 YCC nomenclature (Table S2).
All samples were additionally amplified at 19 Y-chromosomal STR loci in two multiplexes. Multiplex I contained the standard 17 loci of the Applied Biosystems Y-filer™ PCR Amplification kit (www.appliedbiosystems.com). The remaining two loci, DYS388 and DYS426, were genotyped in a separate multiplex (multiplex II), for which we developed an allelic ladder by amplifying and mixing previously typed samples with different numbers of repeats at the desired locus. The Y-STR data generated in this study are shown in Table S1. STR alleles were named according to current recommendations (Gusmao et al., 2006).
The geographical coordinates of the sample sites are shown in Table S3b (http://itouchmap.com/latlong.html). These include coordinates for the different cities in Lebanon and Syria that were used to make the high resolution maps shown in Figure 2C, as well as the coordinates for the different countries (including Lebanon and Syria as a whole) used to make the maps in Figure 2A-B and D-F. Haplogroup-frequency surfaces were inferred using the Surfer System version 8.09 (Golden Software, Golden, CO) as previously described (Semino et al., 2004). The haplogroup data used to construct Figure 2A-F are shown in Table 1 and the J1 and J2 frequency distribution across the cities of the Levant is shown in Table S3c.
Population pairwise comparisons and group comparisons were performed using the χ2 test of independence for qualitative variables. Spatial analysis of molecular variance (SAMOVA) (Dupanloup et al., 2002) was performed with the package SAMOVA 1.0 (http://web.unife.it/progetti/genetica/Isabelle/samova.html). SAMOVA is based on AMOVA (Excoffier et al., 1992), which provides a measure of variation between groups of populations, accounting for variations due to drift within populations by means of a nested two-way analysis of variance. Autocorrelation indices for DNA analysis (AIDA) were calculated as described at http://web.unife.it/progetti/genetica/Giorgio/giorgio_soft.html. AIDA is a spatial autocorrelation analysis that tests the dependence of the values of a variable on the values of the same variable at another geographical location, in order to reveal geographical patterns of genetic variation (Bertorelle & Barbujani, 1995). RST values based on Y-STR data were calculated using Arlequin 3.11 (Excoffier et al., 2005) and displayed as a multidimensional scaling (MDS) plot with SPSS 14.0. The plot shown in Figure S2 was a good fit to the data, with a stress value of 0.08 and RSQ of 0.98.
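The χ2 test of independence used for pairwise frequency comparisons can be illustrated with a minimal sketch. The function name `chi2_2x2` and the counts below are hypothetical and are not taken from the study; the sketch assumes a 2×2 table (carriers vs. non-carriers of one haplogroup in two samples) and applies no continuity correction, which may differ from the settings the authors used.

```python
import math

def chi2_2x2(a, b, c, d):
    """Pearson chi-square test of independence for the 2x2 table [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom, without
    continuity correction.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    observed = [a, b, c, d]
    expected = [row1 * col1 / n, row1 * col2 / n,
                row2 * col1 / n, row2 * col2 / n]
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # With 1 degree of freedom the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value

# Hypothetical counts: 30 of 100 carriers in one sample, 10 of 100 in another.
stat, p = chi2_2x2(30, 70, 10, 90)
print(f"chi2 = {stat:.2f}, p = {p:.5f}")
```

For tables larger than 2×2 the same statistic applies, but the p-value requires the general chi-square distribution rather than the one-degree-of-freedom shortcut used here.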
The phylogenetic relationships between the microsatellite haplotypes were elucidated through reduced-median networks (Bandelt et al., 1995) using the program Network (2008 version) (Fluxus Engineering, Clare, U.K.). Based on their consistent representation in the dataset, the following 10 loci were retained for analysis: DYS19, DYS388, DYS389I, DYS389b, DYS390, DYS391, DYS392, DYS393, DYS437 and DYS439. Weights applied to each locus were selected to be inversely proportional to the variance of that STR locus (specifically, the weight was 10 times the average variance divided by the locus variance). A SNP weighted at 99 marking the distinction between J1 and J2 was introduced to join those networks. The reduction coefficient was set to 1.0.
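The locus weighting rule stated above (10 times the average variance divided by the locus variance) can be sketched as follows. The repeat counts are hypothetical, the helper name `locus_weights` is ours, and the use of the population variance (rather than the sample variance) and the absence of rounding are assumptions, since the text does not specify either.

```python
from statistics import pvariance

def locus_weights(repeats_by_locus):
    """Weight each STR locus by 10 * (mean variance across loci) / (locus variance).

    A locus with zero variance would cause a division by zero and must be
    handled separately.
    """
    variances = {locus: pvariance(values)
                 for locus, values in repeats_by_locus.items()}
    mean_variance = sum(variances.values()) / len(variances)
    return {locus: 10.0 * mean_variance / variance
            for locus, variance in variances.items()}

# Hypothetical repeat counts at two loci (not data from this study).
example = {
    "DYS19": [14, 14, 15, 15],    # low variance, so a high weight
    "DYS390": [22, 24, 22, 24],   # higher variance, so a lower weight
}
weights = locus_weights(example)
for locus, weight in weights.items():
    print(locus, round(weight, 2))
```

The effect is that slowly varying loci count more heavily when distances between haplotypes are computed, while fast-mutating loci contribute less.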
Principal Component Analysis
Principal Component Analysis (PCA) (Jolliffe, 1986) was performed on haplogroup frequencies of the Levantine samples (Table S1, Table S2) as well as Middle-Eastern and North African samples (Table S1, Table S2 and Table 1). The data were centred on their means but were not normalized by standard deviation (Novembre & Stephens, 2008), so that the analysis amounts to a diagonalisation of the covariance matrix. Principal Component selection followed the method of Cattell (Cattell, 1966).
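A covariance-based PCA of this kind can be sketched in a few lines. The frequency matrix below is hypothetical (four populations, three haplogroups), all function names are ours, and power iteration is used only to keep the example dependency-free; it recovers the leading component alone and is not the procedure the authors describe.

```python
import math

def centred(rows):
    """Subtract each column mean; columns are not divided by their standard deviation."""
    n, m = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(m)]
    return [[r[j] - means[j] for j in range(m)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of the columns (haplogroups)."""
    x = centred(rows)
    n, m = len(x), len(x[0])
    return [[sum(x[i][a] * x[i][b] for i in range(n)) / (n - 1)
             for b in range(m)] for a in range(m)]

def leading_component(cov, iterations=1000):
    """Power iteration: largest eigenvalue of cov and its unit eigenvector."""
    m = len(cov)
    # Start from a non-uniform vector: for frequencies that sum to 1 in each
    # population, the all-ones vector lies in the null space of the covariance.
    v = [float(j + 1) for j in range(m)]
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(m)) for a in range(m)]
        norm = math.sqrt(sum(c * c for c in w))
        if norm == 0.0:
            raise ValueError("start vector carries no variance")
        v = [c / norm for c in w]
    eigenvalue = sum(v[a] * sum(cov[a][b] * v[b] for b in range(m))
                     for a in range(m))
    return eigenvalue, v

# Hypothetical haplogroup frequencies (rows: populations, columns: haplogroups).
freqs = [
    [0.6, 0.3, 0.1],
    [0.4, 0.5, 0.1],
    [0.2, 0.6, 0.2],
    [0.5, 0.3, 0.2],
]
cov = covariance(freqs)
eigenvalue, loadings = leading_component(cov)
total_variance = sum(cov[a][a] for a in range(len(cov)))
fraction = eigenvalue / total_variance
print(f"PC1 explains {100 * fraction:.1f}% of the variance")
```

Because the columns are left unscaled, common haplogroups with large frequency differences between populations dominate the leading components, which matches the role of E1b1b and J1 in PC1 reported in the results.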
Nei's genetic diversity measures the probability that any two chromosomes drawn from the population will not share the same type (Nei, 1973; Nei, 1978). Here, we use “types” defined in two ways. First, we computed diversity by haplogroup (displayed in Table 2A). Second, we considered STRs. Since a number of STR haplotypes are shared by multiple haplogroups (Table S6), we used a combination of STRs and SNPs to define haplotypes. Since diversities are so close to 1 with such large numbers of types (Xue et al., 2006), we report diversities in terms of deviations from 1, measuring the probability that two chromosomes share the same type. Further, we also report the expected standard error for the diversity estimator to ensure the deviations between such narrowly deviating measures of diversity were meaningful. The reciprocal of the probability that two chromosomes share the same type can be interpreted as a characteristic number of dominating types in the region.
Table 2A. Percentage of haplogroups in the North/South and coast/inland axes in the Levant.
The derivation of this estimator is provided in the supplementary materials, and is consistent with Nei's estimate of the variance (Nei & Roychoudhury, 1974). The results of the estimator for haplogroup data are displayed in Table 2A, while the estimator applied to coastal vs. inland groups identified by SNP+STR loci are presented in Table 2B. Further detailed description of the method used is included in supplemental methods.
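As an illustration of the quantities discussed here, the following sketch computes the standard unbiased Nei diversity and the corresponding "characteristic number of dominating types". It assumes the textbook estimator n/(n-1) · (1 - Σp²); the exact estimator and its standard error are derived in the supplementary materials and are not reproduced here. The sample and the function names are hypothetical.

```python
from collections import Counter

def nei_diversity(types):
    """Unbiased Nei diversity: n / (n - 1) * (1 - sum of squared type frequencies).

    Equals the probability that two chromosomes drawn without replacement
    carry different types.
    """
    n = len(types)
    if n < 2:
        raise ValueError("need at least two chromosomes")
    counts = Counter(types)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)

def dominating_types(diversity):
    """Reciprocal of the probability (1 - diversity) that two chromosomes share a type."""
    return 1.0 / (1.0 - diversity)

# Hypothetical sample of ten chromosomes typed to haplogroup level.
sample = ["J1"] * 5 + ["J2"] * 3 + ["E1b1b1"] * 2
h = nei_diversity(sample)
print(f"diversity = {h:.4f}, dominating types = {dominating_types(h):.2f}")
```

The same functions apply unchanged when each "type" is a combined SNP+STR haplotype, for example a tuple of haplogroup and repeat counts.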
Table 2B. Nei diversity by groups determined by SNP and STR.
A. Haplogroups percentages and Nei diversities of the North/South and Coast/Inland axes in the Levant, used for Figure 3B. The p-values were obtained using the χ2 test. B. Coastal and inland Levantine Nei diversities by groups determined by SNPs and STRs
Columns: Number of STR Loci; Number of SNP/STR Types; Coastal Nei diversity; Inland Nei diversity
Coastal: 1 − 0.04216 ± 0.003440; Inland: 1 − 0.09139 ± 0.009652
Coastal: 1 − 0.02437 ± 0.001955; Inland: 1 − 0.04383 ± 0.004802
Coastal: 1 − 0.00641 ± 0.00063; Inland: 1 − 0.01452 ± 0.00226
Coastal: 1 − 0.00189 ± 0.000313; Inland: 1 − 0.004598 ± 0.000979
Effective Population Sizes
Effective population sizes for coastal and inland populations were estimated using BATWING (Wilson et al., 2003), applied to the 564 coastal Levantine samples and 255 inland Levantine samples for which all of the 17 SNPs (M96, M35, M78, M123, M89, M201, M170, 12f2.1, M172, M12, M9, M70, M20, M45, M173, M269, M17) and 11 STR loci (DYS19, DYS388, DYS389I, DYS389b, DYS390, DYS391, DYS392, DYS393, DYS437, DYS438, and DYS439) were typed. We used the settings and priors described earlier (Xue et al., 2006), with a constant-sized then expanding population, and ran the program for 250,000 Monte-Carlo cycles for the inland set and 300,000 for the coastal set. Satisfactory convergence was demonstrated by plotting the trajectory of Na (Fig. S3). Cycles 100,000–300,000 were used to determine posterior estimates of parameters for the coastal sample, and cycles 100,000–250,000 for the inland sample.
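The step of discarding early cycles and summarising the remainder can be sketched generically. The trace below is simulated, the function name `posterior_summary` is ours, and treating one list element as one retained sample is an assumption; this does not read BATWING output or reproduce its file format.

```python
import random
from statistics import median

def posterior_summary(trace, burn_in):
    """Median and central 95% interval of a sampled parameter after burn-in."""
    kept = sorted(trace[burn_in:])
    if not kept:
        raise ValueError("burn-in removes every sample")
    lower = kept[int(0.025 * (len(kept) - 1))]
    upper = kept[int(0.975 * (len(kept) - 1))]
    return median(kept), (lower, upper)

# Simulated trace: 1000 drifting burn-in samples, then 2000 samples scattered
# around a hypothetical stationary value of 1200.
random.seed(0)
trace = [400 + 0.8 * i + random.gauss(0, 50) for i in range(1000)]
trace += [random.gauss(1200, 180) for _ in range(2000)]

med, (lower, upper) = posterior_summary(trace, burn_in=1000)
print(f"median = {med:.0f}, 95% interval = {lower:.0f} to {upper:.0f}")
```

Plotting the full trace before choosing the burn-in point, as done for Na in Fig. S3, is what justifies the cut-off; the summary itself is only meaningful once the trajectory has stabilised.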
The region under study herein is shown in Figure 1. The map shows the Levantine countries and regions (modern borders), including Lebanon, Syria, Akka and Jordan.
The Y-chromosomal haplogroup distribution in the Levantine population was compared to the surrounding Middle Eastern and North African countries (Fig. 1) using 884 newly collected samples in addition to our existing Middle Eastern population database (Table S1 and Table S2). As previously reported (Zalloua et al., 2008b), the most frequent haplogroups present in the Levant are J1, J2, R1b, E1b1b1 and I (Fig. S1), and these are shown in Figures 2, 4 and 5, although the statistical analyses, such as Nei diversity and AIDA, were performed on all the haplogroups found in each sample, as shown in Table 2A. We first consider these distributions individually.
The haplogroup frequency for J1 peaked in the Arabian Peninsula (Yemen, UAE, and Kuwait) and decreased beyond the Middle-East and North Africa (Fig. 2A). J1 frequencies in Syria, Akka and Jordan were more comparable to those in Lebanon than to those in the remaining Arabic countries (58.3% in Qatar and 72.5% in Yemen; Fig. 2G). Haplogroup J2, in contrast, was present at its highest frequency in the Lebanese population (29.4%) and was significantly more frequent there than in the remaining Levantine regions (p < 0.05) (Table 1). As previously reported (Zalloua et al., 2008a), it decreases towards the west in North African countries and towards the east in the Arabian Peninsula (29.4% in Lebanon compared to 7.6% in Egypt and 8.3% in Kuwait; Fig. 2G and Table 1).
The frequencies of the R1b and I haplogroups (Rootsi et al., 2004) peaked in Europe and the gradient faded beyond the Levant (Fig. 2D and 2F). R1b showed some variability in the Levant (4.5% - 9%), had minimal presence in Qatar (1.4%) and was absent from the Yemen sample. Iraq and Kuwait showed significantly higher frequencies of R1b (10.8% and 9.5% respectively; Fig. 2G) which may be explained by the strong historical Ottoman influence (Al-Zahery et al., 2003).
Finally, E1b1b1 (previously E3b) showed the highest concentrations in North African and Berber-speaking populations (Egypt, Morocco and Tunisia; Fig. 2E) (Bosch et al., 2001; Cruciani et al., 2002). It showed significant variability in frequency among the Levantine regions (16.2% in Lebanon, 12% in Syria, 26.4% in Akka and 23% in Jordan) (pairwise comparisons: p-value Lebanon vs. Syria = 0.015, Lebanon vs. Akka = 0.009 and Lebanon vs. Jordan = 0.007); however, E1b1b1 frequencies in the Levant as a whole were significantly lower than those in North African countries (42.7% in Egypt, 51.3% in Tunisia and 52.5% in Morocco; Figure 2G) (p-values for pairwise comparisons with Lebanon all < 0.001).
We next investigated the overall Y-chromosomal genetic structure of the region. In an autocorrelation (AIDA) analysis, the autocorrelation index II decreased from positive to negative values with increasing geographical distance (Fig. 3A), demonstrating an underlying clinal pattern: nearby populations tend to be similar (positively correlated), while distant populations tend to be dissimilar (negatively correlated). SAMOVA, however, invariably distinguished additional single samples as the number of groups specified was increased, revealing a lack of distinct clusters of geographically contiguous samples, perhaps reflecting the sampling strategy which provided multiple Levantine samples and diverse sets of more distant ones (Table S4).
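The behaviour of an autocorrelation index across distance classes can be illustrated with Moran's I, a simpler relative of the AIDA index II. This sketch is not AIDA itself: it works on a single frequency per site rather than on DNA haplotypes, uses planar coordinates instead of geographical distances, and all names and values are hypothetical.

```python
import math

def morans_i(values, coords, d_min, d_max):
    """Moran's I using only pairs of sites whose distance lies in [d_min, d_max)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denominator = sum(d * d for d in dev)
    numerator = 0.0
    pair_count = 0
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            if d_min <= math.dist(coords[i], coords[j]) < d_max:
                numerator += dev[i] * dev[j]
                pair_count += 1
    if pair_count == 0 or denominator == 0.0:
        raise ValueError("no pairs in this distance class, or no variation")
    return (n / pair_count) * (numerator / denominator)

# Hypothetical cline: a haplogroup frequency rising steadily along a transect.
frequencies = [0.1, 0.2, 0.3, 0.4, 0.5]
sites = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]

near = morans_i(frequencies, sites, 0.5, 1.5)   # adjacent sites only
far = morans_i(frequencies, sites, 2.5, 4.5)    # widely separated sites only
print(f"near class: {near:.2f}, far class: {far:.2f}")
```

A cline produces exactly the signature described above: a positive index for the shortest distance class and a negative index for the longest.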
In order to examine the haplogroup distribution further, we performed a PCA on the frequencies of the nine haplogroups listed in Table 2, with any additional rare haplogroups combined into a single “others” category. We included the Levantine regions plus samples from Egypt, Morocco, Tunisia, Cyprus, Turkey, Iran, Iraq, Jordan, Qatar, UAE, and Yemen. The percentages of variance associated with each principal component are shown in Figure 3B (lower right panel). PC1 captures 61.3% of the variation, followed by a substantially smaller 27.1% for PC2, with PC3 and PC4 explaining 6.4% and 2.2% respectively. The remaining PCs together carry 3.0% of the variation (Table S5).
PC1 increases with decreasing E1b1b and increasing J1 (Table S5). It showed the largest variation across North Africa, reflecting the high frequency of E1b1b across this region, together with the more localized distribution of J1 in the East. The PC1 scores place the Levantine sites close to each other and close to Cyprus and Egypt in the span from Morocco in the west to Qatar and Yemen in the east. Within this group, SC, LC, and SIS show a reduced J1 score relative to inland Levantine regions. PC2 increases with increasing J2, and with decreasing J1 and E1b1b (Table S5). Its distribution identified a gradient in the south to north direction through the Levant, placing Africa in the southern portion of the Levant along with UAE, SIE, PS and Qatar. However, the localized J1 and J2 gradients show increasing values for more coastal sites LC, SC, SIN, LI, and SIC compared to PS and SIE. PC3 increases with decreasing J2 and with increasing L (Table S5). Almost all of the regions appeared similar to each other, including all the North African samples, except for Iran and SIE. PC4 increases with increasing R1b and G, and with decreasing R1a and J2 (Table S5). This principal component shows the largest spread among Levantine sites. In this case, SC, LC, LI, and SIE show larger values, while SIC, SIS, PS, and SIN show smaller ones.
The principal components capturing the largest variations primarily establish the Levant within the context of the larger-scale Neolithic signal across North Africa, as well as variations between Iran and Iraq, and the Arabian Peninsula. PC4 shows the strongest signal differentiating among Levantine sites.
An MDS analysis of the Y-STR-based genetic distances (RST) showed similar general features to the two leading principal components. Levantine populations were mostly clustered, while North African populations were progressively more distinct as the distance west increased. Two exceptions, however, were SIE, which is seen to be divergent in PC3, and the similarity between Jordan and Cyprus, not observed among any of the PCs.
A higher resolution contour map of haplogroup frequency distribution among Levantine cities (Table S3b and S3c) revealed coast/inland opposing gradients for J1 and J2 (Fig. 2C). We then regrouped the Levant into northern and southern regions (as shown in Fig. 4) and into three regions going from west to east (coast, inland and further inland regions). J1 frequencies, but not those of J2, were significantly different along the South to North axis of the Levant. More strikingly, going from coastal to inland regions in the Levant, there was a significant increase of J1 frequencies (19.8 to 48.2%, p < 0.001) compared to a significant decrease of J2 frequencies (26.8% to 3.4%, p < 0.001) (Table 2A). This steep difference between the coast and inland regions was particularly remarkable considering the small geographical area under consideration.
Application of the Nei diversity estimator to haplogroup frequency data from the individual Levantine populations showed a minimum value 0.669 for SIE. This would suggest that that population is dominated by roughly 3 haplogroups. Haplogroups J1 and L are the most frequent, with a number of other rarer haplogroups. The rest of the entries in Table 2A show diversities in the 0.7 to 0.8 range, suggesting a rough average number of dominating haplogroups in the range of 3.3 to 5. Table 2A shows that these populations are dominated by two or three haplogroups, but with slightly lower relative frequencies than in the SIE population, and with higher frequencies among the remaining haplogroups. The coastal and near coastal groups SC+LC+PS and SIN+SIC+LI+SIS show similar Nei diversity values, while the far-inland SIE shows the greatest reduction in Nei diversity.
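The "number of dominating haplogroups" quoted above is the reciprocal of the probability that two chromosomes share a haplogroup, 1/(1 − H), where H denotes the Nei diversity (the symbol H is our notation). The figures in the text follow directly:

```latex
\[
\frac{1}{1-H}:\qquad
\frac{1}{1-0.669}\approx 3.0,\qquad
\frac{1}{1-0.7}\approx 3.3,\qquad
\frac{1}{1-0.8}=5 .
\]
```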
Diversity values were further investigated by including both haplogroup and STR haplotype, and varying the number of STR loci used. The number of SNP+STR types ranged from 257 for 2 STR loci through to 1286 for 10 STR loci (Table 2B). For such large numbers of STR+SNP types, the probability that any two chromosomes drawn from the population share the same type is expected to be small. For all samples, the estimated probability that two chromosomes would share the same STR+SNP type was roughly twice as high for inland samples as for coastal samples, indicating consistently lower inland diversity. Further, the differences between inland and coastal diversities were much larger than the corresponding estimated standard errors (Table 2B). Estimation of the effective population size using BATWING, however, led to similar numbers for the two regions: ∼1200 (95% CI 840–1550) near the coast and ∼1230 (95% CI 930–1530) inland.
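The two statements about Table 2B (a roughly twofold inland excess, and differences well above the standard errors) can be checked with a short calculation. The sketch assumes that the Table 2B values pair up as coastal then inland in each row, which is our reading of the flattened table, and that the two estimates in a row are independent, so their standard errors combine in quadrature; neither assumption is stated in the text.

```python
import math

# (probability of sharing a type, standard error), transcribed from Table 2B.
# Assumption: the first entry of each pair is coastal, the second inland.
table_2b = [
    ((0.04216, 0.003440), (0.09139, 0.009652)),
    ((0.02437, 0.001955), (0.04383, 0.004802)),
    ((0.00641, 0.00063), (0.01452, 0.00226)),
    ((0.00189, 0.000313), (0.004598, 0.000979)),
]

def compare(coastal, inland):
    """Inland/coastal ratio and the difference in units of its combined standard error."""
    (p_coast, se_coast), (p_inland, se_inland) = coastal, inland
    ratio = p_inland / p_coast
    z = (p_inland - p_coast) / math.sqrt(se_coast ** 2 + se_inland ** 2)
    return ratio, z

results = [compare(coastal, inland) for coastal, inland in table_2b]
for ratio, z in results:
    print(f"inland/coastal ratio = {ratio:.2f}, difference = {z:.1f} combined SE")
```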
We sought, through STR network analysis, to assess whether or not the observed geographic distribution of each haplogroup was reflected in geographic variations of STR haplotype distributions. The J1 and J2 sister clades (Fig. 5A) showed a clear non-uniform geographic distribution of STR haplotypes and few instances of haplotype sharing across geographic regions. Consistent with previous analyses, coastal Levantine regions were well represented in the J2 network. Some evidence of sharing with Jordan was also apparent. The J1 network was dominated by inland Levantine samples (mainly Jordan and inland Lebanon and Syria). The R1b network showed much less geographic correlation, possibly because most of the R1b chromosomes have entered the region recently (Fig. 5B). In fact, without extra-Levantine representation of R1b to establish context, it is difficult to identify where these R1b chromosomes originated. Finally, E1b1b showed a clear demarcation between the Levantine STR haplotypes and North African STR haplotypes, with a lower diversity among North African STR haplotypes than among Levantine STR haplotypes (Fig. 5C). Within the Levantine population, 91 STR haplotypes belonged to the E1b1b1 haplogroup, compared to 60 STR haplotypes within the North African population (Table S6).
In this study, we have used a combination of novel and published data to explore the Y-chromosomal landscape of the Levant and its surrounding regions. On a large geographical scale, the Levantine samples clustered together and were readily distinguished from North African or Arabian Peninsula samples (Figs. 2 and 3). This pattern of correlation between genetics and geography is expected from many previous studies of human variation (Cavalli-Sforza, 1997) and is particularly marked for the Y chromosome (Jobling & Tyler-Smith, 2003). However, within the Levant, there was significant heterogeneity, with a predominantly east-west, coastal-inland structure (Fig. 4). Such a high level of differentiation within a small geographical region is striking, and we consider here the historical and other factors that might have contributed to it.
The Levantine genetic structure consists largely of decreasing frequencies of the major haplogroups J2 and E1b1b1 inland and a corresponding increasing frequency of J1, as well as similar patterns in several lower-frequency haplogroups such as L, R1b and I. Diversity is higher on the coast. Such a pattern could be interpreted in two ways: as arising from genetic drift within a geographically stable population that differs in effective population size between the coast and interior (Kimura & Crow, 1964), or alternatively as arising from differential migration from distinct sources into populations of similar size. These interpretations are not mutually exclusive, and we next explore the possible contributions of each.
The coastal region lies within the western section of the Fertile Crescent and has been densely inhabited for all of recorded history and much of prehistory, as illustrated by the location of several of the earliest known cities here. In contrast, most of the interior is dry and now experiences desert or semi-desert conditions, and consequently supports a lower population density. These conditions are likely to have prevailed for much of the Holocene. A lower level of haplogroup diversity inland may therefore be expected from the demographic history, although the specific haplogroups that increased or decreased in frequency would be a matter of chance.
Recent migrations into the region under consideration included those of the Hittites, Babylonians, Persians, Phoenicians (Zalloua et al., 2008a) and subsequent trade, the Romans, the Sassanids, the Muslim expansion (Zalloua et al., 2008b), the movement of the Umayyad capital from Baghdad to Damascus, the Crusades (Zalloua et al., 2008b), the Ottoman Empire, and European colonialism, as well as significant caravan trade through the region throughout this entire time period. Against this background, estimation of an effective population size is problematic. BATWING reports very similar population sizes for inland and coastal populations, arguing against a detectable gradient of effective population size driving differential drift, although the BATWING results must be interpreted with caution because of the complex demography. Nevertheless, the differences in diversity suggest that migration has most likely been the determining factor in distinguishing coastal diversity from nearby inland regions.
Haplogroup J is believed to have split into J1 and J2 about 18 Kya (Semino et al., 2004). These two sister clades showed distinct histories and geographical localizations, with a coastal range that is predominantly J2 and an inland range that is predominantly J1. Chiaroni, King and Underhill report the same inland vs. coastal divergence pattern of J1 and J2, and correlate the expansion during the rise of agriculture in the Fertile Crescent with the patterns of rainfall distribution (Chiaroni et al., 2008). They suggest that the J2 haplogroup marked agricultural populations that followed the coasts, whereas the J1 haplogroup appears to have become fixed in herdsmen populations that remained inland (Chiaroni et al., 2008).
The diversified J2 reduced-median network and high coastal frequency suggest a sustained and uninterrupted presence of this haplogroup along the eastern coast of the Mediterranean. While our network analysis excluded the Arabian Peninsula and mainly focused on the Levantine regions, some haplotypes appear to have originated in the Peninsula, and to have been recently carried into the Levant with the Islamic expansion. Tofanelli et al. (2009) argue that the initial spread of J1 into the Arabian Peninsula from North Africa is evident when correlated with the haplotypes of Arabic nations. Further, they date the expansion to before the Islamic expansion. While we observe the geographical gradients consistent with the expansions from the earlier refugia, it is also clear that the more recent Muslim expansion has had a significant impact on the elevated J1 frequency distribution in the Levant.
Migration and back-migration between the Levant and North Africa were evident from the haplogeography of haplogroup E1b1b1 and its haplotype network structure. E1b1b1 shows the highest frequency in North Africa (Egypt, Morocco and Tunisia) and drops gradually as it spills out of North Africa through the Horn of Africa and the Levantine corridor into the Arabian Peninsula and central Asia, and through the Strait of Gibraltar into Spain and South Europe. A historical event that allowed the spread of E1b1b1 to Iberia most likely occurred through the Strait of Gibraltar, culminating with the Islamic conquests of Iberia. Their armies bore large numbers of Berber recruits, whose presence started with the Umayyad Caliphate (661–750 CE) and continued uninterrupted for several hundred years. This distinction between Arabic and Berber pools of E1b1b1 in North Africa was highlighted by a study conducted on populations from Morocco, Tunisia and Algeria (Gerard et al., 2006).
The E1b1b1 frequency gradient, considered in the light of the haplotype diversity, suggests an early migration (Neolithic) from the Levant into North Africa that is consistent with a limited gene flow into Africa followed by a rapid expansion and later punctuated by some back migrations as a result of migratory events in the Mediterranean (Arredi et al., 2004). According to this model, the E1b1b1 frequency gradient between coastal Levant and inland Levant would reflect an origin and expansion near the coast, and limited migration inland. The STR network for the E1b1b1 haplotype lineages (Fig. 5C) dissects these migration events. The network shows geographical separation of the lineages marked by STRs between North African countries (Morocco and Tunisia) and the Levantine countries. Another independent measure is the number of mutations along a lineage. In Arredi et al. (2004), the authors estimated the time to the most recent common ancestor for E1b1b1 lineages in North Africa (E1b1b1b-M81) and this was found to coincide with a possible Neolithic origin.
In conclusion, the current Levantine Y-chromosomal landscape is dominated by the coastal-inland contrast in haplogroup frequencies and diversity that reflects largely the influence of successive migrations, which have tended to reinforce one another by introducing different Y lineages from the east and west.
We thank the sample donors for taking part in this study, Ms. Janet Ziegel and Mr. Pandikumar Swamikrishnan for their help with genotyping and data organization. YX and CTS are supported by The Wellcome Trust. The Genographic Project is supported by funding from the National Geographic Society, IBM and the Waitt Family Foundation.
The Genographic Consortium
Theodore G. Schurr, University of Pennsylvania, Philadelphia, Pennsylvania, United States of America; Fabrício R. Santos, Universidade Federal de Minas Gerais, Belo Horizonte, Minas Gerais, Brazil; Lluis Quintana-Murci, Institut Pasteur, Paris, France; Jaume Bertranpetit, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain; David Comas, Universitat Pompeu Fabra, Barcelona, Catalonia, Spain; Chris Tyler-Smith, The Wellcome Trust Sanger Institute, Hinxton, United Kingdom; Pierre A. Zalloua, Lebanese American University, Chouran, Beirut, Lebanon; Elena Balanovska, Russian Academy of Medical Sciences, Moscow, Russia; Oleg Balanovsky, Russian Academy of Medical Sciences, Moscow, Russia; R. John Mitchell, La Trobe University, Melbourne, Victoria, Australia; Li Jin, Fudan University, Shanghai, China; Himla Soodyall, National Health Laboratory Service, Johannesburg, South Africa; Ramasamy Pitchappan, Madurai Kamaraj University, Madurai, Tamil Nadu, India; Alan Cooper, University of Adelaide, South Australia, Australia; Lisa Matisoo-Smith, University of Auckland, Auckland, New Zealand; Ajay K. Royyuru, IBM, Yorktown Heights, New York, United States of America; Daniel E. Platt, IBM, Yorktown Heights, New York, United States of America; Laxmi Parida, IBM, Yorktown Heights, New York, United States of America; Jason Blue-Smith, National Geographic Society, Washington, District of Columbia, United States of America; David F. Soria Hernanz, National Geographic Society, Washington, District of Columbia, United States of America and R. Spencer Wells, National Geographic Society, Washington, District of Columbia, United States of America.