The Genetic Impact of the Lake Chad Basin Population in North Africa as Documented by Mitochondrial Diversity and Internal Variation of the L3e5 Haplogroup


  • Eliška Podgorná,

    1. Department of Anthropology and Human Genetics, Faculty of Science, Charles University, Prague, Czech Republic
  • Pedro Soares,

    1. IPATIMUP (Instituto de Patologia e Imunologia Molecular da Universidade do Porto), Porto, Portugal
  • Luísa Pereira,

    1. IPATIMUP (Instituto de Patologia e Imunologia Molecular da Universidade do Porto), Porto, Portugal
    2. Faculdade de Medicina, da Universidade do Porto, Portugal
  • Viktor Černý

    Corresponding author
    1. Institute for Advanced Study, Paris, France
    2. Archaeogenetics Laboratory, Institute of Archaeology of the Academy of Sciences of the Czech Republic, Prague, Czech Republic

Corresponding author: VIKTOR CERNY, Archaeogenetic laboratory, Institute of Archaeology, Prague, Letenská 4, 11801 Prague. Tel: +00420257014304; Fax: +00420257532288; E-mail:

Abstract

The presence of sub-Saharan L-type mtDNA sequences in North Africa has traditionally been explained by the recent slave trade. However, gene flow between sub-Saharan and northern African populations could also have occurred earlier, when Early Holocene climatic improvement turned the Sahara green. In this article, we examine human dispersals across the Sahara through the analysis of the sub-Saharan mtDNA haplogroup L3e5, which is not only common in the Lake Chad Basin (∼17%) but also reaches nonnegligible frequencies (∼10%) in some Northwestern African populations. Age estimates point to its origin ∼10 ka, probably in the Lake Chad Basin itself, where the clade occurs across linguistic boundaries. The virtual absence of this haplogroup in the Daza from Northern Chad and in all West African populations suggests that its migration followed another route, perhaps through Northern Niger. Independent confirmation of Early Holocene contacts between North Africa and the Lake Chad Basin has been provided by craniofacial data from Central Niger, supporting our suggestion that the Early Holocene offered a suitable climatic window for genetic exchanges between North and sub-Saharan Africa. Given its younger founder age in North Africa, the discontinuous distribution of L3e5 was probably caused by the Middle Holocene re-expansion of the Sahara desert, which disrupted the clade's originally continuous spread.

Introduction

The first reliable reports on the ethnic composition of the Lake Chad Basin population (henceforth the LCB) come from Arabic sources written in the 11th century (Levtzion & Hopkins, 1981; Insoll, 2003). Europeans began to explore the area much later, when the first contacts with the Borno Empire were established in the 19th century. Most of the information about this area south of the Sahara and its people was gathered through the accounts of travellers such as Dixon Denham, Heinrich Barth, and Gustav Nachtigal (Denham et al., 1826; Barth, 1857–1858; Nachtigal, 1879). Both Arabic and European sources reported various commercial activities that could have maintained contact between regions now divided by the barren Sahara desert, including a slave trade that brought sub-Saharan individuals into North Africa.

It is widely accepted that the population history of the LCB, which lies in the Sahara–Sahel belt, is closely linked with the wet climatic phases of the late Quaternary (Maley, 1981). During more favorable climatic periods, Lake Chad was an extensive freshwater body that attracted both animals and humans to the area and encouraged settlement. Several zoogeographic analyses have reported the spread of various animal species across the Sahara during humid periods (Drake et al., 2011). The most important climatic changes have been documented for the Early Holocene, starting ∼12 ka, when the giant Lake Mega-Chad (350,000 km², the largest late Pleistocene water body in Africa) was formed (Schuster et al., 2005; Leblanc et al., 2006; Bouchette et al., 2010).

Changing climatic conditions farther north, in the Central Sahara, resulted in episodic settlements that were unlikely to leave many archaeological traces. A major exception is the prehistoric site of Gobero (Sereno et al., 2008), discovered recently in Central Niger near a paleolake at the western tip of the Ténéré Desert. Many skeletal remains have been unearthed here, and the oldest occupation of the site is dated to the Early Holocene, ∼9.7–8.2 ka. Craniometric data gathered at the site reveal a close similarity between this original population and the so-called “Mechtoids” from Mali and Mauritania, as well as the Iberomaurusians and Capsians from the Maghreb, providing specific evidence of Early Holocene trans-Saharan connections. The occupation then experienced a hiatus until a different human population, represented by more gracile skeletons, arrived in the Middle Holocene, between 7.2 and 4.5 ka (Sereno et al., 2008).

Other archaeological evidence has shown that Late Pleistocene and Early Holocene hunter–fisher–gatherers in the Sahara were well adapted to lacustrine and interlacustrine environments. This “Aquatic” or “Aqualithic” civilization (Sutton, 1974, 1977) also produced distinctive ceramic vessels that share stylistic similarities across the Sahara, suggesting close contact and/or high mobility over a large area. Although dry climatic conditions returned ∼6 ka (Foley et al., 2003), aridity stress in some northerly places may have been substantially relieved by local hydrological and/or hydrogeological conditions: several smaller paleolakes fed by local aquifers could have supported inhabitants within an otherwise desert environment (Grenier et al., 2009).

Our previous analyses of LCB genetic diversity revealed some autochthonous mtDNA clades such as L3f3 and L3e5 (Černý et al., 2007; Cerezo et al., 2011). Phylogeographic analysis and complete mtDNA genome sequencing of L3f3 (Černý et al., 2009) showed not only that this clade is restricted to the LCB, where it expanded ∼9 ka, but also that it predominates among the region's Chadic-speaking peoples. Because L3f3 has deep roots in East Africa, where its mother clade L3f had already emerged ∼50 ka (Soares et al., 2012), we suggested a possible eastern link for this clade, likely related to the linguistically inferred westward migration of Cushitic pastoralists (Blench, 1999; Černý et al., 2009).

The second most common mtDNA clade in the LCB is L3e5, which has yet to be analyzed in detail. The first age estimates of its mother haplogroup L3e pointed to ∼40–50 ka (Bandelt et al., 2001; Salas et al., 2002; Rosa et al., 2004; Behar et al., 2008), but newer estimates, based on more sequences and on a mutation rate that accounts for purifying selection, suggest a slightly younger age of ∼35 ka (Soares et al., 2012). Since it was first inferred (Bandelt et al., 2001), a Central African origin of L3e has been broadly accepted (Rosa & Brehm, 2011). L3e daughter clades, each distinguishable from the mother haplogroup by one HVS-1 mutation, subsequently expanded from Central Africa in several directions. While L3e1 spread to Mozambique with the eastern stream of the Bantu expansion (Pereira et al., 2001b; Salas et al., 2002), L3e2, L3e3, and L3e4 moved in the opposite direction, mostly to West Africa, where their further dispersals were tied to climatic amelioration after the Last Glacial Maximum (LGM) (Salas et al., 2002; Rosa & Brehm, 2011).

L3e5 is the most recently discovered L3e clade. It was first described, though not named, in Tunisian Berbers (Fadhlaoui-Zid et al., 2004), and was later detected in the LCB, where its root type (16041–16223), along with several derived types with few matches elsewhere, appears to be very frequent (Černý et al., 2007). It was further suggested that L3e5 evolved in the LCB region and that its most recent common ancestor lived there ∼11.5 ka (Černý et al., 2007). However, some authors have interpreted the presence of this clade in both North Africa and the LCB as favoring the hypothesis of a “trans-Saharan” migration route proposed on linguistic grounds (Cruciani et al., 2010b).

The occurrence of L3e5 in both the LCB and North Africa led us to look for it in the area between these two regions, in the Saharan parts of Northern Chad. Human settlement there today is concentrated in the oases around the Lakes of Ounianga, where the seminomadic Daza people live in small hamlets. In this article, we present a thorough analysis of the population structure of the LCB and its neighboring regions, based on both geographic and linguistic criteria and including 41 new Daza sequences. We then present an improved L3e5 mtDNA genome tree based on 19 new and 10 published complete sequences, revealing the internal subdivision of this clade, which occurs in both sub-Saharan and North African populations. Finally, an L3e5 founder analysis gives deeper insight into the pattern of gene flow between the regions south and north of the Sahara and shows that the most probable migration route of this clade was from the LCB to North Africa and not vice versa.

Material and Methods

Population Samples

DNA samples from 41 unrelated individuals were newly collected in the northernmost regions of the LCB, in several scattered hamlets south of the Lakes of Ounianga in Northern Chad. The seminomadic Daza people living here speak Dazaga, which is classified in the Saharan branch of the Nilo-Saharan language family. Informed consent was obtained from all donors.

We also retrieved a large number of published mtDNA sequences from the LCB and neighboring populations. Together with the Daza, the dataset comprises 7584 hypervariable segment 1 (HVS-1) mtDNA sequences, representing 154 African populations and three language families (39.3% Afro-Asiatic; 53.4% Niger-Congo; 7.3% Nilo-Saharan) divided into 13 language branches. We further divided the dataset geographically with respect to the LCB: 1058 sequences came from the LCB itself, 857 from the area to its northwest, 1011 from the northeast, 1951 from the west, 749 from the east, 1223 from the south, and 735 from the southeast. For the spatial distribution analyses, it was particularly useful that the geographical coordinates (latitude and longitude) of most population samples (n = 141) could be identified from the original publications (see Fig. S1 and Table S1 for further details and references). Hunter–gatherer samples were not considered because of their very different population histories (Behar et al., 2008; Černý & Pereira, in press; Quintana-Murci et al., 2008).

Generating Data

A total of 41 control-region mtDNA sequences encompassing both HVS-1 and HVS-2 were generated for the newly collected Daza samples using primers P23 and P24 (Gonder et al., 2007); in total, we retrieved the sequence from nucleotide position 15882 to 701. PCR products were sequenced with forward primers, and when a poly-C stretch was present, a reverse complementary chromatogram was generated as well. Variants were identified with the help of mtDNA-GeneSyn (Pereira et al., 2009) and scored against the revised Cambridge Reference Sequence (rCRS) using the HaploGrep algorithm (Kloss-Brandstätter et al., 2011), which is linked to the PhyloTree phylogeny of complete mtDNA sequences (van Oven & Kayser, 2009). The Daza sequences presented in this study, together with their haplogroup affiliations, are listed in Table S2.
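Because the mitochondrial genome is circular, the sequenced stretch from np 15882 to np 701 crosses the rCRS origin (position 16569 is followed by position 1). A minimal sketch of that arithmetic, assuming the 16,569-bp rCRS length and inclusive endpoints; the helper name `circular_span` is hypothetical:

```python
RCRS_LENGTH = 16569  # length of the revised Cambridge Reference Sequence (bp)


def circular_span(start: int, end: int, genome_length: int = RCRS_LENGTH) -> int:
    """Number of positions from start to end (inclusive) on a circular genome."""
    if not (1 <= start <= genome_length and 1 <= end <= genome_length):
        raise ValueError("positions must lie within the reference sequence")
    if start <= end:
        return end - start + 1
    # The span wraps past the last reference position back to position 1.
    return (genome_length - start + 1) + end


print(circular_span(15882, 701))  # → 1389
```

Positions 15882–16569 contribute 688 bp and positions 1–701 another 701 bp.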

In this study, we focused mainly on haplotypes belonging to L3e5. Before this study, only 10 complete L3e5 mtDNA sequences were known: 7 from Africa, 1 from the United States, 1 from Europe, and 1 of unknown origin (Behar et al., 2008; Costa et al., 2009; Soares et al., 2012; Cerezo et al., 2012). Unlike L3f3 (Černý et al., 2007, 2009), L3e5 in the LCB is evenly distributed among all three language families. From our entire current HVS-1 dataset of LCB populations, we selected L3e5 samples representing the various ethnolinguistic groups of the region. In total, we sequenced the complete mtDNA genomes of 19 L3e5 samples: 8 from Afro-Asiatic speakers (1 Bulahay, 2 Hide, 1 Kotoko, 1 Mafa, and 3 Masa), 5 from Niger-Congo speakers (3 Fali and 2 Fulani), and 6 from Nilo-Saharan speakers (3 Kanembou and 3 Kanuri). We followed the same complete mtDNA sequencing strategy as reported previously (Černý et al., 2011a). The complete L3e5 mtDNA sequences were submitted to GenBank (accession numbers KF358472–KF358490).

Statistical Analyses

The Daza mtDNA sequences were compared with neighboring population samples using the 340-bp HVS-I segment (np 16030–16370). PhiST pairwise genetic distances (Reynolds et al., 1983) among the 154 African populations were calculated, with significance assessed by 1000 permutations, using Arlequin version software (Excoffier et al., 2005). These distances were then projected into two dimensions by Principal Coordinates Analysis (PCoA) implemented in GenAlEx version 6.501 (Peakall & Smouse, 2012). An analysis of molecular variance (AMOVA) (Excoffier et al., 1992) was also performed, separately for all populations and for geographical and linguistic groupings, with 10,000 permutations for each calculation.
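For readers unfamiliar with PhiST, the sketch below shows the single-level AMOVA calculation (Excoffier et al., 1992) that underlies a PhiST value, assuming haploid, aligned sequences and the number of pairwise differences as the squared distance. It is a simplified illustration, not Arlequin's implementation: it omits the permutation test and the three-level (groups, populations, individuals) design used for Table 1, and the function names are hypothetical.

```python
from itertools import combinations


def n_differences(a: str, b: str) -> int:
    """Squared distance: number of sites at which two aligned sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def ssd(seqs):
    """Sum of squared deviations: (1/N) * sum of distances over unordered pairs."""
    if len(seqs) < 2:
        return 0.0
    return sum(n_differences(a, b) for a, b in combinations(seqs, 2)) / len(seqs)


def phi_st(populations):
    """Single-level AMOVA; returns (sigma2_among, sigma2_within, PhiST)."""
    all_seqs = [s for pop in populations for s in pop]
    n_total, n_pops = len(all_seqs), len(populations)
    ssd_within = sum(ssd(pop) for pop in populations)
    ssd_among = ssd(all_seqs) - ssd_within
    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # The coefficient n corrects for unequal sample sizes across populations.
    n_coef = (n_total - sum(len(p) ** 2 for p in populations) / n_total) / (n_pops - 1)
    sigma2_among = (msd_among - msd_within) / n_coef
    sigma2_within = msd_within
    return sigma2_among, sigma2_within, sigma2_among / (sigma2_among + sigma2_within)


# Toy example: two populations of two aligned sequences each.
print(phi_st([["AA", "AT"], ["TT", "TA"]]))
```

The percentages in an AMOVA table are each variance component divided by their sum; a negative among-population component can occur and is conventionally reported as such.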

The spatial distribution of L3e5 frequencies was visualized with interpolation maps constructed using the “Spatial Analyst Extension” of ArcView version 3.2, as in our previous studies (Pereira et al., 2010a). Whether the frequency distribution follows a clinal pattern was tested by correlogram analysis (Moran, 1950) implemented in PASSaGE software (Rosenberg, 2001).
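The logic of a Moran's I correlogram can be sketched as follows: Moran's I is computed separately for each distance class, using binary weights that link pairs of populations whose separation falls within that class. This is a minimal illustration under those assumptions (binary weights, no significance test), not the PASSaGE implementation; the function names, toy positions, and distance classes are hypothetical.

```python
import numpy as np


def morans_i(values, weights):
    """Moran's I for values x and a spatial weight matrix W with zero diagonal."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = x - x.mean()
    return (x.size / w.sum()) * (z @ w @ z) / (z @ z)


def correlogram(values, dist, breaks):
    """Moran's I for each distance class [breaks[k], breaks[k + 1])."""
    d = np.asarray(dist, dtype=float)
    result = []
    for lo, hi in zip(breaks[:-1], breaks[1:]):
        w = ((d >= lo) & (d < hi)).astype(float)
        np.fill_diagonal(w, 0.0)  # a population is not its own neighbour
        result.append(morans_i(values, w) if w.sum() > 0 else float("nan"))
    return result


# Toy example: four populations on a transect, haplogroup present only at one end.
positions = np.array([0.0, 1.0, 2.0, 3.0])
freqs = [0.2, 0.2, 0.0, 0.0]
dist = np.abs(positions[:, None] - positions[None, :])
print(correlogram(freqs, dist, [0.5, 1.5, 3.5]))
```

Positive I in the short-distance class and negative I at long distances is the signature of a cline; a pattern confined to part of the study area yields a less regular profile.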

Network software, version (Bandelt et al., 1995, 1999), was used to visualize the L3e5 HVS-1 haplotypes shared among regions and to calculate the approximate age of their most recent common ancestor. We performed basic quality control following the principles previously described for detecting phantom mutations (Bandelt et al., 2001; Brandstätter et al., 2005). The traces of all private mutations were rechecked for clarity and resequenced when necessary. The age was estimated from the average number of substitutions between the ancestral haplotype (16041–16223) and all haplotypes in the cluster (the ρ statistic), converted into years using a mutation rate of one substitution every 16,677 years (Soares et al., 2009), with the standard error calculated as in Saillard et al. (2000).
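The ρ statistic and its standard error can be illustrated with a short sketch. It assumes each sampled sequence is given as the list of mutations separating it from the root haplotype, with identical labels denoting the same branch of the tree (recurrent mutations on different branches would need distinct labels). The variance follows the Saillard et al. (2000) form σ² = Σ n_k²/n², where n_k is the number of sampled sequences descending from mutation k. The function name and the toy cluster are hypothetical, and a single constant clock (the 16,677-year HVS-I rate) is applied.

```python
import math
from collections import Counter


def rho_age(sequences, years_per_mutation=16677):
    """Return (rho, sigma, age, se) for a cluster of sequences.

    sequences: one list of mutation labels per sampled sequence, giving the
    mutations that separate it from the root haplotype (root type = []).
    """
    n = len(sequences)
    # rho: mean number of mutations from the root to each sampled sequence
    rho = sum(len(muts) for muts in sequences) / n
    # n_k: number of sampled sequences that carry (descend from) mutation k
    descendants = Counter(m for muts in sequences for m in set(muts))
    sigma = math.sqrt(sum(c * c for c in descendants.values()) / n ** 2)
    return rho, sigma, rho * years_per_mutation, sigma * years_per_mutation


# Toy cluster: three root types, one derived type, and one doubly derived type.
cluster = [[], [], [], ["m1"], ["m1", "m2"]]
rho, sigma, age, se = rho_age(cluster)
print(f"rho = {rho:.2f}, age = {age:.0f} +/- {se:.0f} years")
```

Root-type sequences contribute zero to ρ but still count in n, which is why a cluster dominated by its root haplotype yields a young age estimate.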

The reduced median network of the complete L3e5 mtDNA sequences suggested a branching order for the tree, which was then constructed by hand according to the most parsimonious solution. Ages were estimated using the ρ statistic and maximum likelihood (ML). For ρ, we used a complete mtDNA sequence mutation rate of one substitution every 3624 years, corrected for purifying selection (Soares et al., 2009), and a synonymous mutation rate of one substitution every 7884 years. Standard errors were estimated as in Saillard et al. (2000). We also obtained ML estimates of branch lengths using PAML 3.13, assuming the HKY85 mutation model with gamma-distributed rates, and converted ML mutational distances to time using the same complete mtDNA genome clock corrected for purifying selection.

For the L3e5 founder analysis, we built a network using HVS-1 variation. The founder age was estimated under both the f1 and f2 criteria (Richards et al., 2000): a haplotype that was merely shared between the source and sink populations, or inferred as an ancestral node, was accepted as a founder only if it had at least one (f1) or at least two (f2) derived branches in the source population. An effective number of samples for each founder was calculated as previously described (Soares et al., 2012), and the probability distribution of founder ages was then scanned across 200-year intervals. The mutation rate employed was one mutation every 16,677 years (Soares et al., 2009). Haplotype and nucleotide diversities and the mean number of pairwise differences of the L3e5 samples from Northwest Africa and the LCB were calculated in Arlequin.
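The three diversity indices reported for the L3e5 samples can be sketched with their standard definitions: Nei's unbiased haplotype diversity, the mean number of pairwise differences, and nucleotide diversity as that mean divided by the number of sites. This simplified version assumes equal-length aligned sequences without gaps or missing data and omits the standard errors Arlequin reports; the function name is hypothetical. As a rough consistency check, 1.300 pairwise differences over the ∼340-bp segment corresponds to a nucleotide diversity of about 0.0038, the value reported for the LCB.

```python
from collections import Counter
from itertools import combinations


def diversity_indices(seqs):
    """Return (haplotype diversity, mean pairwise differences, nucleotide diversity)."""
    n = len(seqs)
    length = len(seqs[0])
    if n < 2 or any(len(s) != length for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    # Nei's unbiased haplotype (gene) diversity: n/(n-1) * (1 - sum of p_i^2)
    freqs = [c / n for c in Counter(seqs).values()]
    h = n / (n - 1) * (1.0 - sum(p * p for p in freqs))
    # Mean number of pairwise differences over all unordered pairs
    diffs = [sum(x != y for x, y in zip(a, b)) for a, b in combinations(seqs, 2)]
    k = sum(diffs) / len(diffs)
    # Nucleotide diversity: mean pairwise differences per site
    return h, k, k / length


h, k, pi = diversity_indices(["AAAA", "AAAT", "AAAT", "TAAA"])
print(round(h, 4), round(k, 4), round(pi, 4))
```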

We obtained Bayesian skyline plots (BSPs) (Drummond et al., 2005) from BEAST 1.4.6 (Drummond & Rambaut, 2007) for the complete L3e5 mtDNA sequences, using a relaxed molecular clock (Drummond et al., 2006) (lognormally distributed across branches and uncorrelated between them) and the HKY model of nucleotide substitution with gamma-distributed rates. BSPs estimate effective population size through time from a random sample of sequences from a given population. We used the mutation rate obtained in our previous general L3 study (2.6186 × 10⁻⁸ substitutions/site/year; Soares et al., 2012). BEAST uses a Markov chain Monte Carlo (MCMC) approach to sample from the posterior distributions of model parameters (branching times in the tree and substitution rates). Specifically, we ran 50,000,000 iterations, drawing samples every 5000 MCMC steps after a discarded burn-in of 5000 steps. We checked for convergence to the stationary distribution and for sufficient sampling by inspecting the posterior samples, and visualized BSPs with Tracer v1.3. A generation time of 25 years was used, as in Fagundes et al. (2008), and because we aimed at a tree structure directly comparable to the other analyses, we forced the larger subhaplogroups to be monophyletic for this calculation. To compare and describe periods of growth in the BSP systematically, we calculated the rate of increase as the number of individuals added per individual of effective population size per year. We defined an increase of at least one individual per 100 individuals in 100 years as “steep”, a threshold that matched the visual impression of the BSP well.
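The BSP bookkeeping described above can be sketched as follows. Two assumptions are made explicit: the skyline values are taken to be the product of effective population size and generation time (Ne·τ, in years), so dividing by the 25-year generation time gives Ne; and the per-capita increase over an interval is measured relative to the size at the older end of that interval, which is one reading of the “one individual per 100 individuals in 100 years” definition. The function names are hypothetical.

```python
GENERATION_TIME = 25.0            # years per generation, as in Fagundes et al. (2008)
STEEP_THRESHOLD = 1 / 100 / 100   # one extra individual per 100 individuals per 100 years


def ne_from_skyline(ne_tau, generation_time=GENERATION_TIME):
    """Convert a skyline value (Ne x generation time, in years) to Ne."""
    return ne_tau / generation_time


def per_capita_increase(ne_old, ne_young, t_old, t_young):
    """Increase in Ne per individual per year between two time points.

    Times are in years before present, so t_old > t_young.
    """
    if t_old <= t_young:
        raise ValueError("t_old must be earlier (larger) than t_young")
    return (ne_young - ne_old) / ne_old / (t_old - t_young)


def is_steep(rate, threshold=STEEP_THRESHOLD):
    """Apply the 'steep' criterion to a per-capita yearly increase."""
    return rate >= threshold


# A 42-fold increase between 10.6 and 3.5 ka, as reported for L3e5:
rate = per_capita_increase(1.0, 42.0, 10_600, 3_500)
print(f"{rate:.2e} per individual per year; steep: {is_steep(rate)}")
```

Under a relative-to-start definition the threshold corresponds to 10⁻⁴ per individual per year; an exponential-rate definition would give different numbers for long intervals.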

Results

PhiST pairwise genetic distances based on HVS-I diversity in and around the LCB were plotted in two dimensions by means of PCoA (Fig. 1). The first coordinate, explaining 34.7% of the variation, sorts the populations by latitude: northern LCB neighbors lie on the right side of the plot and southern LCB neighbors on the left. The second coordinate, explaining 21.4% of the variation, clearly sorts them by longitude. Interestingly, the LCB populations living in the middle of the Sahelian belt occupy a space on the plot similar to that of the West African samples, and their distances from their northern and southern neighbors are practically the same. The heterogeneous East African populations occupy the largest space and are the closest to the southern LCB neighbors. Perhaps surprisingly, although the Daza occupy a northern LCB position geographically, they are genetically close not to the northern but to the southern neighbors of the LCB.
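How a PhiST matrix is projected onto two axes, and how a percentage of variation is attached to each coordinate, can be sketched with classical (Gower) principal coordinates analysis: the squared distances are double-centred, and the eigenvalues of the resulting matrix give the share of variation carried by each axis. GenAlEx's exact procedure (for example, its handling of negative PhiST values or any standardization it applies) may differ; the function below is an illustrative sketch with a hypothetical name, not that software's code.

```python
import numpy as np


def pcoa(dist, n_axes=2):
    """Classical principal coordinates analysis of a symmetric distance matrix.

    Returns the coordinates on the first n_axes axes and the proportion of
    variation each axis explains (relative to the sum of positive eigenvalues).
    """
    d = np.asarray(dist, dtype=float)
    n = d.shape[0]
    # Gower double-centring: B = -1/2 * J D^2 J, with J = I - (1/n) 11'
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    vals, vecs = np.linalg.eigh(b)           # eigenvalues in ascending order
    order = np.argsort(vals)[::-1]           # sort axes by decreasing eigenvalue
    vals, vecs = vals[order], vecs[:, order]
    explained = vals[:n_axes] / vals[vals > 0].sum()
    coords = vecs[:, :n_axes] * np.sqrt(np.clip(vals[:n_axes], 0.0, None))
    return coords, explained


# Toy matrix for three populations lying on a single gradient.
toy = [[0.0, 1.0, 3.0],
       [1.0, 0.0, 2.0],
       [3.0, 2.0, 0.0]]
coords, explained = pcoa(toy)
print(np.round(coords[:, 0], 3), np.round(explained, 3))
```

Because the toy populations lie on one gradient, the first axis recovers their spacing and carries essentially all of the variation; real PhiST matrices spread variation over several axes, as with the 34.7% and 21.4% reported here.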

Figure 1.

Principal coordinates analysis of the Lake Chad Basin and neighboring populations.

AMOVA shows that geographical groupings account for a larger proportion of among-group variance than linguistic groupings do, leaving less variance among populations within groups (Table 1). When the entire dataset was analyzed as a single group, the variation among populations was 8.81%. When the dataset was divided into seven geographical groups (the LCB, including the Daza, and the areas to its Northwest, Northeast, West, East, South, and Southeast), 6.93% of the variation was found among groups and 2.88% among populations within groups. A similar structure was revealed when the Daza were assigned to other sub-Saharan groupings, but a higher proportion of variance among populations within groups was observed when the Daza were included in one of the North African groups (see Table 1). When the dataset was instead divided into 13 linguistic groupings (Berber, Cushitic, Chadic, and Semitic as branches of the Afro-Asiatic family; Adamawa-Ubangi, Atlantic, Benue-Congo, Gur, Kwa, and Mande as branches of the Niger-Congo family; and East Sudanic, Saharan, and Songhai as branches of the Nilo-Saharan family), the among-group variation dropped to only 5.93%, while that among populations within groups rose to 3.56%.

Table 1. Population Structure: AMOVA Results (% of Variation) for Geographical Versus Linguistic Groupings

| Groupings | Among groups | Among populations within groups | Within populations |
|---|---|---|---|
| All populations | – | 8.81 | 91.19 |
| Geographic 1 | 6.93 | 2.88 | 90.19 |
| Geographic 2 | 6.92 | 2.88 | 90.19 |
| Geographic 3 | 6.83 | 2.96 | 90.21 |
| Geographic 4 | 6.81 | 2.98 | 90.21 |
| Geographic 5 | 6.93 | 2.89 | 90.18 |
| Geographic 6 | 6.91 | 2.90 | 90.19 |
| Geographic 7 | 6.92 | 2.88 | 90.20 |

Note: Different geographical groupings of the Daza population sample were used (Geographic 1, Daza included in LCB; 2, in East; 3, in Northeast; 4, in Northwest; 5, in West; 6, in South; and 7, in Southeast).

L3e as a whole is the most frequent mtDNA haplogroup in the LCB (Fig. S2a). Interestingly, no L3e was found in the Daza. This Northern Chad population carries mostly sub-Saharan haplogroups, but also some of Eurasian origin (Table S2). The most frequent sub-Saharan clade in the Daza is the ubiquitous L2a1, followed by several L0a and L3f1b lineages. Eurasian mtDNA variation is represented mainly by several M1 haplotypes and one U6.

When L3e is further dissected into its clades (which is possible because they carry distinguishing HVS-1 variants), L3e5 emerges as the second most frequent clade after L3e2b (Fig. S2b). Of the 88 L3e5 sequences, 46 are from the LCB and 42 from North Africa. All genetic diversity parameters are higher in the LCB than in the North African fraction (mean number of pairwise differences: 1.300 ± 0.828 vs. 0.860 ± 0.624; haplotype diversity: 0.819 ± 0.054 vs. 0.640 ± 0.083; nucleotide diversity: 0.0038 ± 0.0027 vs. 0.0026 ± 0.0020, respectively). By population, the highest frequency of L3e5 (20.3%) was detected in the Kotoko of the LCB, but a high frequency (17.0%) was also observed in the Moroccan Berbers from Figuig. Apart from these, four populations in our dataset show L3e5 frequencies above 10%: three from Cameroon (the Masa, Hide, and Fali) and one from Tunisia (the Matmata Berbers). Frequencies below 10% were detected in 18 populations, all of which, with two exceptions (the East African Turkana and the West African Serer), belong to either the LCB or Northwestern Africa (see Fig. 2). As for linguistic affiliation, L3e5 was found in all groups except Cushitic, Mande, and Songhai. Moran's I correlograms show that the L3e5 frequency distribution can be interpreted only partially as a cline, because the cline covers only part of the studied area; in fact, the Sahara desert divides L3e5 into two separate pockets.

Figure 2.

Frequency map based on HVS-I data of the L3e5 haplogroup.

The network of HVS-1 L3e5 sequences (Fig. S3) shows that the ancestral haplotype, marked by the motif 16041–16223, is by far the most frequent (n = 45). Among derived haplotypes, we identified only two cases of haplotype sharing between the LCB and North Africa, both between ethnically close peoples: the 16041–16223–16278 haplotype occurs in the Shuwa Arabs from Nigeria and the Arabs from North Tunisia, and the 16037–16041–16223 haplotype is carried by three sequences from the Chadic branch and one from the Berber branch of the Afro-Asiatic family. The HVS-I-based age estimate of the most recent common ancestor of L3e5 gives an Early Holocene date of 10,262 ± 2493 years ago.

L3e emerged somewhere in Central Africa (Soares et al., 2012), and L3e5 probably arose in its northern vicinity, most likely in the LCB. Its time of migration into North Africa was estimated by founder analysis (Richards et al., 2000). We used both the f1 and f2 criteria to date its North African arrival, and the results (see Fig. 3) show that whereas the f1 criterion peaks at 6200 years, the more conservative f2 criterion dates the migration to 8000 years ago. This suggests that L3e5 reached North Africa more than 6000 years ago.

Figure 3.

Probabilistic distribution of founder clusters in North Africa across migration times scanned at 200-year intervals from 0 to 100 ka, using the f1 and f2 criteria.

Age estimates based on complete mtDNA genomes allowed a more reliable dating of L3e5, with all three estimates pointing to approximately 11 ka (11,842 [8722–15,016] for total ρ; 10,044 [4289–15,798] for synonymous ρ; and 11,090 [8346–13,876] for ML). This date is similar to that obtained from HVS-I diversity alone (see above). The tree reconstruction (Fig. 4) confirmed one internal branch (L3e5a) already identified in PhyloTree and revealed four new ones, all defined mainly by coding-region mutations. All these subhaplogroups emerged between 11 and 5 ka (see the exact dates in Fig. 4), consistent with the star-like pattern already revealed by the HVS-1 network. L3e5a is defined by a mutation at position 13317 and, in all its African sequences (that is, in all except the single sequence from Hungary), by a further mutation at 2833; in addition, all its sequences from Chad and Cameroon share a transversion at position 431. L3e5b is characterized by one mutation (5483); L3e5c by one coding-region (11548) and one control-region mutation (195); L3e5d by two mutations (8869 and 12591); and L3e5e by two mutations (15367 and 16037). Because L3e5e can be identified at HVS-1 resolution, the larger HVS-1 dataset shows that it occurs in both the LCB and North Africa (Fig. S3). Population expansion of L3e5 is also evident in the Bayesian Skyline Plot (Fig. 5), which shows a 42-fold increase between 10.6 and 3.5 ka.

Figure 4.

A phylogenetic reconstruction of mtDNA haplogroup L3e5. Integers alone represent transitions, and integers followed by a suffix of A, G, C, or T indicate transversions. Underlined integers indicate synonymous substitutions. Age estimates for L3e5 and its clades are based on the diversity of the complete molecule (first lines), synonymous polymorphisms alone (second lines), and ML estimates (third lines); rCRS, revised Cambridge Reference Sequence. Provenance of the samples: (1) KF358479, Kanembou, Chad; (2) KF358485, Kotoko, Cameroon; (3) KF358478, Hide, Cameroon; (4) KF358480, Kanembou, Chad; (5) KF358481, Kanembou, Chad; (6) EU092776, Egypt; (7) JN214452, Hungary; (8) KF358475, Fali, Cameroon; (9) KF358477, Fulani Tcheboua, Cameroon; (10) JN655798, Chad; (11) EU092821, Libya; (12) EU092959, USA and JQ705596, unknown; (13) KF358483, Kanuri, Nigeria; (14) KF358489, Masa, Cameroon; (15) KF358476, Fulani Balatungur, Niger; (16) KF358488, Masa, Cameroon; (17) KF358486, Mafa, Cameroon; (18) JN655822, Sudan; (19) KF358474, Fali, Cameroon; (20) KF358473, Fali, Cameroon; (21) KF358482, Kanuri, Nigeria; (22) JN655828, Sudan; (23) KF358487, Masa, Cameroon; (24) KF358484, Kanuri, Nigeria; (25) FJ460533, Tunisia; (26) DQ341070, Nigeria; (27) KF358472, Bulahay, Cameroon; (28) KF358490, Hide, Cameroon.

Figure 5.

Bayesian Skyline Plot (BSP) based on L3e5 haplogroup data, indicating the median of the hypothetical effective population size through time (assuming 25 years per generation). Maximum time (x-axis) corresponds to the median posterior estimate of the genealogy root height.

Discussion

The genetic structure of the African Sahel suggests that the region has witnessed important past demographic events (Černý et al., 2004, 2007, 2009; Soares et al., 2012). While some areas have already been well covered by previous sampling, the genetic diversity of the vast area between the Ethiopian highlands/White Nile in the east and Lake Chad in the west remains poorly documented because of its low accessibility. We have tried to fill part of this gap by sampling the Daza from Northern Chad, a population whose maternal gene pool shows not only the expected links to LCB groups but also links to other sub-Saharan populations, highlighting how little we know about the genetic diversity of African Sahel populations east of the LCB.

Gene flow across the Sahara, from the Mediterranean to sub-Saharan Africa, is well documented by the presence in the south of mtDNA haplogroups such as U6 and M1, which originally reached North Africa from the Near East in pre-LGM times (Olivieri et al., 2006; Pereira et al., 2010a,b). Later, other mtDNA clades such as U5, H1, H3, and V, which likely arrived in North Africa from Iberia in post-LGM times (Achilli et al., 2005; Ennafaa et al., 2009), penetrated the Sahara and even reached the Sahel belt (Cerezo et al., 2011; Černý et al., 2011b). These non-L, but by then already African, clades have been shown to occur at higher frequencies mainly in nomadic pastoralists such as the Fulani (U5) and the Tuareg (H1, H3, and V) (Ottoni et al., 2009; Pereira et al., 2010a; Černý et al., 2011b), and the most probable time frame for their arrival south of the Sahara is 3–9 ka (Pereira et al., 2010a). Following the subsequent Neolithic developments in the eastern Mediterranean, other mtDNA lineages originally from the Near East, such as J and T, enriched the North African maternal gene pool, especially in its eastern parts (Coudray et al., 2009; Kujanová et al., 2009), but did not reach sub-Saharan locations. The only non-L clades we identified in the Daza were M1 and U6 (no H1, H3, V, J, or T), suggesting a pre-Neolithic formation of their maternal gene pool.

Ancient gene flow in the opposite direction, from sub-Saharan Africa to North Africa and even further into Europe, has been documented only recently (Cerezo et al., 2012). At least 35% of the sub-Saharan maternal gene pool in Europe cannot be linked with recent episodes such as the slave trade, as it arrived much earlier, perhaps as early as ∼11 ka. This dating would place this wave of sub-Saharan lineages into Europe just after the Younger Dryas, when several European lineages were also re-expanding (Soares et al., 2010). Most European L-haplotypes are found in Southern Europe, especially in Iberia, where their frequencies can reach 6%; their frequency drops sharply toward Central and Northern Europe. The absence of the European L-haplotypes in Africa suggests that they diverged from their African progenitors after arriving in Europe. The European L-haplotypes derive mostly from West African clades such as L3d (L3d1b1a), L1b (L1b1a8, L1b1a6a, L1b1a9a1, L1b1a11), and L2a (L2a1k), which have not yet been characterized in North Africa by complete mtDNA sequencing. Interestingly, only a single L3e5 sample was found in Europe, and not in the south of the continent, as might be expected, but in Central Europe (Cerezo et al., 2012).

We have shown that, besides the LCB, L3e5 occurs today especially in the Berber populations of Northwest Africa. It has been suggested that the contemporary North African gene pool diverged from the Near Eastern one and expanded in North Africa before the Holocene, a view supported by both mtDNA and nuclear genomic data (Pereira et al., 2010b; Henn et al., 2012). As a whole, however, the current North African gene pool is geographically subdivided: its western part contains more European input, whereas its eastern part is more closely linked to the Near East (Coudray et al., 2009; Henn et al., 2012). While the sub-Saharan component appears to be relatively small in all North African groups, we show here that it is unevenly distributed: L3e5 is found predominantly in the western rather than the eastern part of North Africa. Interestingly, the present-day location of L3e5 in North Africa (Fig. 2) roughly matches the locations of the Paleolithic and Mesolithic Iberomaurusian and Capsian cultures, which are currently linked with the origin of the Berbers (Ilahiane, 2006). Moreover, the skeletons unearthed recently in the western part of the LCB bear morphological traits common in Northwestern Africa (Sereno et al., 2008). Together, this evidence supports the idea of Early Holocene communication between relatively distant regions now separated by the inhospitable Sahara desert.

Turning to the LCB gene pool, two hypotheses on its Holocene peopling have been proposed. Both were developed on linguistic grounds and concern the Chadic branch of the Afro-Asiatic phylum. The “inter-Saharan” thesis (Blench, 1999) holds that Chadic ancestors arrived from East Africa, where Cushitic is their closest linguistic branch, whereas the “trans-Saharan” thesis (Ehret, 2002) argues that they reached the area by crossing the Sahara from North Africa, where Berber is considered the closest linguistic branch. Uniparental genetic markers provide contradictory evidence for these proposals: the mtDNA L3f3 clade supports Blench's view (Černý et al., 2009), whereas the Y chromosome R1b1a-V88 clade favors Ehret's (Cruciani et al., 2010a). However, a recent analysis of male genetic contributions in LCB populations showed a significant North African R1b contribution only in the Fulani pastoralists, who belong to the Niger-Congo linguistic family, and not in the Chadic groups (Bučková et al., 2013). LCB Y-chromosome variation thus remains difficult to interpret, likely because of its greater susceptibility to genetic drift compared with mtDNA (Pereira et al., 2001a) or because of insufficient sampling in a region as large as the LCB.

In interpreting additional analyses of Y-chromosome variation (Cruciani et al., 2010b), the L3e5 distribution was mentioned as possibly lending support to Ehret's “trans-Saharan” migration hypothesis. We show here, however, that the L3e5 data are hardly compatible with this view: not only is the clade older (∼11 ka) and, unlike L3f3, present in practically all LCB linguistic groups, but, more importantly, its gene flow took place in the opposite direction to that envisaged by proponents of Ehret's “trans-Saharan” route. The presence of L3e5 in both the LCB and North Africa can thus be considered evidence of an ancient, pre-Afro-Asiatic link between the two sides of the Sahara desert, which led to gradual introgression of LCB genetic material into the ancestral Berber population.

Acknowledgments

The project was supported by the Grant Agency of the Czech Republic (Grant no. 13–37998S-P505). FCT, the Portuguese Foundation for Science and Technology, supported this work through a personal grant (SFRH/BPD/64233/2009) to P.S. and partial funding of IPATIMUP as an Associate Laboratory of the Portuguese Ministry of Science, Technology, and Higher Education. The authors would also like to thank two anonymous reviewers for their comments on an earlier version of this manuscript.