Unidad de Genética, Instituto de Medicina Legal, Facultad de Medicina, Universidad de Santiago de Compostela, 15782, and Centro Nacional de Genotipado (CeGen), Hospital Clínico Universitario, 15706, Galicia, Spain
Corresponding author: Černý Viktor, Department of Anthropology & Environment, Institute of Archaeology, Czech Academy of Sciences, 118 01 Prague 1, Czech Republic, Fax: +420.257532288, Tel: +420.257014304, E-mail: firstname.lastname@example.org
The Chad Basin was sparsely inhabited during the Stone Age, and its continual settlement began with the Holocene. The role played by Lake Chad in the history and migration patterns of Africa is still unclear. We studied the mitochondrial DNA (mtDNA) variability in 448 individuals from 12 ethnically and/or economically (agricultural/pastoral) different populations from Cameroon, Chad, Niger and Nigeria. The data indicate the importance of this region as a corridor connecting East and West Africa; however, this bidirectional flow of people in the Sahel-Sudan Belt did not erase features peculiar to the original Chad Basin populations. A new sub-clade, L3f2, is described, which together with L3e5 is most probably autochthonous in the Chad Basin. The phylogeography of these two sub-haplogroups seems to indicate prehistoric expansion events in the Chad Basin around 28,950 and 11,400 Y.B.P., respectively. The distribution of L3f2 is virtually restricted to the Chad Basin alone, and in particular to Chadic speaking populations, while L3e5 shows evidence for diffusion into North Africa at about 7,100 Y.B.P. The absence of L3f2 and L3e5 in African-Americans, and the limited number of L-haplotypes shared between the Chad Basin populations and African-Americans, indicate the low contribution of the Chad region to the Atlantic slave trade.
Since ancient times Lake Chad has been somewhat isolated geographically and, while some researchers have considered this region a crossroads for human migrations, others regard it as a final destination where population movements from Western and Eastern Africa terminated (Lange, 1992; Cyffer, 2002). From an ethno-linguistic point of view the Chad Basin is the homeland of highly diversified groups: three of the four African linguistic families (Afro-Asiatic, Niger-Congo and Nilo-Saharan) overlap here. The middle part of the Sahel-Sudan belt (sometimes referred to as Central Sudan) has its own history of great Islamic empires, including the Kanem, Bornu, Bagirmi, Waddai and others (Insoll, 2003), but its prehistory is still not well understood (Newman, 1995). The natural conditions around Lake Chad have at all times been dictated by oscillating wet and dry periods, which alternated not only at an annual level but also over much longer intervals (Maley, 1981). It seems that during the kanémien period (20000 – 12000 B.C.) only habitable desert existed around Lake Chad. Around 9,000 years ago, however, a large part of the Chad Basin was already underwater. Lake Megachad rose at this time to a height of 325 m above sea level (a.s.l.), and flowed through the Bahr el-Ghazal into the Bodélé plains of northern Chad. Its southern shore is still visible in the dune belt running along the Maiduguri-Bama-Limani-Borgor line, almost as far as latitude 10° North. It is estimated that Lake Megachad covered an expanse of some 330,000 km2; its current extent, at an altitude of 282 m a.s.l. and covering a mere 20,000 km2, may thus be regarded as a relatively insignificant relict (Brunk & Gronenborn, 2004). With the gradual drying of the climate, however, vegetation patterns stabilised and the present ethnic composition formed. 
The oscillating withdrawal of Lake Megachad around 5,000 Y.B.P., and with it the growth of the Sahara, led to a certain isolation of the Chad Basin populations – traces of ancient Egyptian campaigns end in the Gilf Kebir region in the south-western tip of what is now Egypt. The Sahel-Sudan belt, between the Sahara to the North and the tropical rain forests to the South, was for thousands of years a broad corridor along which cultural influences, as well as human migrations from East and West Africa, moved.
Beef cattle, prehistoric depictions of which can be found in the mid-Saharan rocky massifs neighbouring the northern parts of the Chad Basin, were of prime importance among domesticated animals. The skeletal remains of domesticated cattle from the 4th millennium B.C. have been found at several sites in northern Niger, of which the most important was probably Adrar Bous (Haour, 2003). In the southern part of the Chad Basin the first evidence, in the form of domesticated cattle bones, comes however from just 3,000 Y.B.P. Plant cultivation, too, began relatively late in the Chad Basin. Pearl millet (Pennisetum glaucum) was the first agricultural plant to appear in the archaeological profiles from north-eastern Nigeria; this was brought to Lake Chad at the close of the second millennium B.C., probably from northern Niger which may have been one of the two West African centres of domestication. Only in the second phase, and in conjunction with the broad spread of previously established iron metallurgy, did the cultivation of sorghum (Sorghum bicolor) begin, most likely having been domesticated in the Nile Valley; the earliest finds from the Chad Basin, however, date to the first half of the first millennium A.D. The marked delay in the advance of agricultural technologies in comparison with the outside world may be explained by the unusually favourable natural conditions of the Early and Middle Holocene, which did not compel the local inhabitants to adopt physically demanding and tedious crop cultivation (Neuman, 2003). The relatively late establishment of cattle may be associated with the threat of sleeping sickness, against which the pastoralists of the southern reaches of the Chad Basin must still protect their herds today (Gifford-González, 2000). 
Fishing and the diverse food sources linked to areas of water – across which the local population was able to move very effectively – were very probably of great importance; this is attested by what is thus far the earliest wooden boat to be found in Africa, from the site of Dufuna in north-eastern Nigeria, which radiocarbon dating shows to be 8,000 years old (Breunig et al. 1996).
In recent years genetics has made a substantial contribution to our understanding of human migration patterns. The mitochondrial genome (mtDNA) in particular has played a central role in unravelling the past and present history of African populations. Sampling in Africa is, however, still very insufficient, and many regions and ethnic groups remain uncharacterised; this is especially true for less accessible areas such as eastern Chad or the Congo Basin.
We analysed 448 individuals from 12 different populations, sampled around Lake Chad in the southern part of the Chad Basin in northern Cameroon, western Chad, south-eastern Niger and north-eastern Nigeria (Table 1). This selection was made to embrace as broadly as possible the ethnic composition, economic orientations and geographic position of that area (Figure 1). Buccal swabs were collected from maternally unrelated volunteers, all of whom gave informed consent.
Table 1. Diversity indices of HVS-I mtDNA in the population samples from the Chad Basin
Language branch/Language Family
NOTE: *The Hide and the Mafa correspond to those individuals analysed in Černý et al. (2004); the Masa sample includes all the individuals from Černý et al. (2004; N = 31) plus one, while the Kotoko sample also incorporates 38 additional DNAs with respect to the dataset reported by Černý et al. (2004; N = 18). **The Borgor Fulani and Tcheboua Fulani were previously reported in Černý et al. (2006).
N = sample size; k = number of different sequences; S = number of segregating sites; h = haplotype diversity; π = nucleotide diversity; M = observed average number of pairwise differences. AA = Afro-Asiatic; NC = Niger-Congo; NS = Nilo-Saharan.
h                  π                    M
0.996 ± 0.014      0.0256 ± 0.0137      8.7 ± 4.2
0.955 ± 0.017      0.0211 ± 0.0111      7.2 ± 3.4
0.980 ± 0.012      0.0222 ± 0.0118      7.6 ± 3.6
0.988 ± 0.012      0.0216 ± 0.0115      7.4 ± 3.5
0.968 ± 0.021      0.0219 ± 0.0117      7.5 ± 3.6
0.963 ± 0.023      0.0197 ± 0.0107      6.7 ± 3.3
0.977 ± 0.012      0.0177 ± 0.0096      6.0 ± 2.9
0.947 ± 0.019      0.0216 ± 0.0115      7.4 ± 3.5
0.931 ± 0.024      0.0197 ± 0.0105      6.7 ± 3.3
0.953 ± 0.016      0.0207 ± 0.0110      7.1 ± 3.4
0.988 ± 0.006      0.0258 ± 0.0134      8.8 ± 4.1
0.989 ± 0.012      0.0224 ± 0.0120      7.6 ± 3.7
One group of five Chadic-speaking populations and one group of two Semitic-speaking populations were selected from the Afro-Asiatic language family. The Chadic-speaking group comprises peasant populations from northern Cameroon and south-eastern Niger – the Hide and Mafa of the Mandara Mountains (the same individuals analysed in Černý et al. 2004; N= 23 and N= 32, respectively), the Kotoko of the Shari basin (the individuals analysed in Černý et al. 2004 [N= 18] plus 38 new subjects), the Masa of the Logon basin (the individuals analysed in Černý et al. 2004 [N= 31] plus one additional subject) and the Buduma of the north-western shore and islands of Lake Chad in Niger (N= 30). The Semitic group comprises two Arabic-speaking populations – the first made up of nomadic tribes migrating in Kanem and Bagirmi in Chad (N= 27), and the second composed of semi-nomadic Shuwa Arabs from the Borno state in Nigeria (N= 38).
From the Niger-Congo phylum one peasant Fali population from the Tinguelin rocky massif approximately 30 km North of Garoua in Cameroon (N= 40), and two Fulani populations were selected – the first nomadic Fulani sample was taken from the middle Logon South of Borgor in Chad (N= 49), and the second from the Tcheboua region, around 30 km South of the Benue River in Cameroon (N= 40). The latter sample is made up of nomads that have settled recently (approximately one or two generations ago, as reported by their leaders), but whose dependence on cattle rearing is still high.
The Kanembu from Kanem, northeast of Lake Chad in Chad (N= 50), and the Kanuri from the Borno state in Nigeria, southwest of the lake, (N= 31) were sampled from the Nilo-Saharan phylum.
DNA extraction was performed using the method presented in Černý et al. (2004). HVS-I was amplified by means of primers F–15971 (5′–TTA ACT CCA CCA TTA GCA CC–3′) and R–16410 (5′–GAG GAT GGT GGT CAA GGG AC–3′), with an annealing temperature of 51°C. Purification was undertaken using the QIAquick PCR purification kit (QIAGEN). Sequencing reactions were carried out using the BigDye Terminator v.3.1 Cycle sequencing kit (Applied Biosystems). The sequence range of 16030–16370 bases was considered.
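Because the sequenced window (16030–16370) is wider than the 16090–16365 window used for cross-comparisons with other populations, haplotypes have to be trimmed before they are matched. The following is a minimal sketch of that step, assuming haplotypes are stored as sets of variant labels such as "C16223T"; the function and constant names are hypothetical and are not taken from the software used in the study.

```python
import re

# Windows stated in the text; the names are illustrative assumptions.
SEQUENCED_RANGE = (16030, 16370)    # window read in this study
COMPARISON_RANGE = (16090, 16365)   # window shared with the reference database
IGNORED_POSITIONS = set(range(16182, 16186))  # 16182-16185 were not considered


def position(variant: str) -> int:
    """Extract the numeric site from a label such as 'C16223T'."""
    return int(re.search(r"\d+", variant).group())


def trim(variants, window=COMPARISON_RANGE, ignored=IGNORED_POSITIONS):
    """Keep only variants inside the window and outside the ignored sites."""
    lo, hi = window
    return frozenset(
        v for v in variants
        if lo <= position(v) <= hi and position(v) not in ignored
    )


# HVS-I motif of the branch formerly called L3g; T16086C lies outside
# the comparison window and is therefore dropped.
example = {"T16086C", "C16223T", "A16293T", "T16311C", "C16355T", "T16362C"}
print(sorted(trim(example)))
```

Trimming in this way means that two haplotypes differing only outside the shared window are counted as a match, which is the intended behaviour for database comparisons.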
MtDNA variability in sub-Saharan Africa is characterised primarily by L-type haplogroups. Because of their high internal diversities, most of them, and even their sub-clades, can be recognised from the first hypervariable segment (HVS-I); some, however, require more information concerning coding region variants. RFLP analyses were thus undertaken for all the samples for HpaI 3592 (targeting position C3594T that determines L0′1′2′5′6 sequences, sensu Torroni et al. 2006) and MboI 2349 (transition T2352C determines the L3e branch). Variation at position 16390 (diagnostic of haplogroup L2) was recorded from chromatograms in most of the samples; in some cases, however, it was necessary to carry out RFLP detection using AvaII. The primer sequences and temperature profiles of the RFLP analyses are available on request from the corresponding author. Finally, some non-L-type mtDNAs were further RFLP genotyped using AluI 7025 (targeting site 7028) and MseI 14766 (for site 14766).
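The way the two main RFLP sites narrow down a haplogroup can be sketched as a simple decision rule. This is an illustration only. It assumes that presence of the HpaI 3592 site marks L0′1′2′5′6 and that presence of the MboI 2349 site marks L3e, which agrees with the "−3592 HpaI and +2349 MboI" status given for L3e sequences in the Results. The function name and return labels are hypothetical.

```python
def classify_by_rflp(hpa1_3592: bool, mbo1_2349: bool) -> str:
    """Coarse haplogroup assignment from two RFLP states (True = site present).

    Assumption: +3592 HpaI -> L0'1'2'5'6; -3592 HpaI with +2349 MboI -> L3e.
    Anything else is left unresolved and needs the HVS-I motif.
    """
    if hpa1_3592:
        return "L0'1'2'5'6"
    if mbo1_2349:
        return "L3e"
    return "unresolved"


print(classify_by_rflp(hpa1_3592=False, mbo1_2349=True))
```

A rule of this kind is only a first filter: recurrent or back mutations at a restriction site can give an unexpected status, so the result should always be checked against the HVS-I motif.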
The 16090 to 16365 sequence range was used for cross-comparisons of haplotypes in populations. It should be noted that the African nomenclature is in need of revision due to the new data available (especially concerning complete genomes). We here show the updated criteria for nomenclature that concerns the lineages observed and discussed in the present article and is based on complete genome data (available in the public literature and/or GenBank [http://www.ncbi.nlm.nih.gov/entrez/query.fcgi?db=Nucleotide]).
L3h was first identified by Rosa et al. (2004); according to these authors L3h is characterized by a −1715 DdeI site (which identifies the transition G1719A) and by the HVS-I variants G16129A C16256A T16362C. We now know that this variation probably identifies a sub-lineage of L3h characterized by coding region sites G1719A A4388G C5300T T9509C A11590G and by (at least) the control region mutations C16256A A16284G. We here rename this sub-cluster as L3h1 based on the data provided in Kivisild et al. (2004) and Torroni et al. (2006). Further, we observe that the basal motif of L3h should also include variant T195C. L3f and L3f1 are named as in Salas et al. (2002); there is however some evidence that the transition C16292T, which identifies L3f1, marks just a sub-branch of an L3f sub-lineage defined by coding region sites C5601T T9950C (further subdivided by A14769G; Kivisild et al. 2004, and other coding region variants, see also Torroni et al. 2006). L3f2 is defined here as the branch carrying C16176T and C16234T on top of the L3f motif; there is one complete genome in Torroni et al. (2006) representative of this sub-cluster (see below). There is no control region variant that identifies L3e (Bandelt et al. 2001; Salas et al. 2002); nevertheless L3e1, L3e1b, L3e2 and L3e2b are defined by their HVS-I diagnostic sites (as in Salas et al. 2002); at least for L3e2 and L3e2b the nomenclature is fully consistent with the complete genome data (e.g. Torroni et al. 2006). There is only one complete genome representative of L3e3; therefore, the nomenclature remains as in Salas et al. (2002). L3e5 has recently been defined by Torroni et al. (2006), based on only one complete genome (their #13 sample); here we observe that the HVS-I motif of L3e5 is constituted by transitions A16041G C16223T. Also according to our data, the extra HVS-I variant A16037G seems to define a small sub-branch, named here as L3e5a (also characterizing sample #13 of Torroni et al. 2006).
L3d and its main sub-branches (L3d1 and L3d2), as well as L3b and L3b1, are named as in Salas et al. (2002); the nomenclature is still consistent with the scarce complete genome data available. L3g is the branch defined by T16086C C16223T A16293T T16311C C16355T T16362C in HVS-I (Salas et al. 2002, 2004a,b); here we move it to L4, since it constitutes a branch between L3 and L7 (Torroni et al. 2006; note that this branch was also renamed L4g by Kivisild et al. 2004); L4 sub-branches follow the Salas et al. (2004b) phylogenetic scheme. Finally, the nomenclature for U6 and M1 follows the recent article by Olivieri et al. (2006; see also references therein), while U5 follows Achilli et al. (2005). For the rest of the European lineages we use the most recently updated nomenclature from Palanichamy et al. (2004).
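The motif-based naming described above amounts to checking whether a haplotype contains all the diagnostic variants of a clade and, where several clades fit, preferring the most specific one. The sketch below illustrates this with three HVS-I motifs quoted in this section. The dictionary and function names are hypothetical, and the rule deliberately ignores coding region information and back mutations, both of which matter in practice.

```python
# HVS-I motifs as quoted in the nomenclature section (illustrative subset).
MOTIFS = {
    "L3e5": {"A16041G", "C16223T"},
    "L3e5a": {"A16037G", "A16041G", "C16223T"},
    "L4g": {"T16086C", "C16223T", "A16293T", "T16311C", "C16355T", "T16362C"},
}


def assign_clade(haplotype):
    """Return the clade whose full motif is contained in the haplotype.

    When several motifs fit, the longest (most specific) one is chosen.
    Returns None when no motif is fully present.
    """
    observed = set(haplotype)
    hits = [(len(motif), name) for name, motif in MOTIFS.items()
            if motif <= observed]
    return max(hits)[1] if hits else None


print(assign_clade({"A16037G", "A16041G", "C16223T", "T16311C"}))
```

Here the haplotype satisfies both the L3e5 and the L3e5a motifs, and the longer L3e5a motif is preferred; a haplotype carrying only C16223T would remain unassigned.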
Statistical and Phylogenetic Analyses
Median networks of HVS-I sequences were drawn by hand using the principles of the median-joining algorithm (Bandelt et al. 1999). Subsequently, the most parsimonious tree of haplogroups was inferred. Coalescent times were calculated using the ρ (rho) statistic, and an HVS-I mutation rate of one transition per 20,180 years was applied for the sequence range 16090–16365 using Network 220.127.116.11 software (Bandelt et al. 1995; Forster et al. 1996; Saillard et al. 2000). The diversity indices of the HVS-I sequences (haplotype diversity, nucleotide diversity, and average number of pairwise differences) were calculated using Arlequin 3.0 software (Excoffier et al. 2005). AMOVA (Excoffier et al. 1992) was performed using Arlequin 3.0, and the significance of the covariance components associated with the different levels of genetic structure was tested using a non-parametric permutation procedure (Excoffier et al. 1992). Principal Component Analysis was performed based on haplogroup frequencies as in Salas et al. (2005b). Comparisons between populations were assessed by FST distances, which were subsequently plotted by multidimensional scaling analysis (MDS) using the PROXSCAL technique, implemented in the SPSS 10.0 statistical package.
An HVS-I mtDNA database of African populations and African-Americans (>6,600 mtDNAs) was employed for population comparisons; more details concerning these data can be found in Salas et al. (2005b). Note that the Bamileke and Ewondo in Destro-Bisol et al. (2004a), as well as the Bakara, Basa and Fulbe in Destro-Bisol et al. (2004b), are included in the dataset of Coia et al. (2005). The classification of samples in the main African regions is taken from previous works (Salas et al. 2002, 2004a, 2005b); the allocation of some population samples to, for example, the western Central or western African pool is based on pragmatic reasons; the analysis carried out in the present project and the conclusions drawn are not substantially dependent upon this classification. Variation at positions 16182–16185 and length polymorphism at the polyC were not considered. A posteriori (post-sequencing) phylogeographic checking of the mtDNA sequence data was carried out, in order to avoid data errors as far as possible (e.g. Bandelt et al. 2004a,b, 2005a,b; Salas et al. 2005a,e, 2006; Yao et al. 2006).
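The ρ-based dating mentioned in this section can be illustrated with a short calculation. ρ is the average number of mutational steps separating the sampled sequences from the assumed founder (root) haplotype, and the age estimate is ρ multiplied by the mutation rate of one transition per 20,180 years. The sketch below is not the Network implementation; the input format, function names and toy counts are assumptions made for illustration, and no standard error is computed.

```python
YEARS_PER_TRANSITION = 20_180  # HVS-I rate used in the text (16090-16365)


def rho(haplotypes):
    """Average number of steps from the root.

    `haplotypes` is a list of (number_of_individuals, steps_from_root)
    pairs, one per distinct haplotype in the clade.
    """
    n = sum(count for count, _ in haplotypes)
    return sum(count * steps for count, steps in haplotypes) / n


def rho_age(haplotypes, years_per_transition=YEARS_PER_TRANSITION):
    """Convert rho into an age estimate in years."""
    return rho(haplotypes) * years_per_transition


# Toy clade (invented numbers): 10 individuals carry the root type,
# 6 are one step away and 4 are two steps away.
toy_clade = [(10, 0), (6, 1), (4, 2)]
print(round(rho(toy_clade), 2), round(rho_age(toy_clade)))
```

For the toy clade ρ is 14/20 = 0.7, giving an age of about 14,126 years. The result depends directly on which haplotype is taken as the founder, which is why the choice of root is stated explicitly for each clade dated in the Results.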
Descriptive Parameters of the HVS-I Sequences
Table 1 shows the descriptive parameters of the Chad Basin populations analysed. It is interesting to note that, in a broad sense, the values for the different diversity parameters (haplotype and nucleotide diversities, average number of pairwise differences) were lower for the nomadic groups than for their agricultural counterparts (independently of geographic location); these differences are not, however, statistically significant.
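For readers unfamiliar with the three indices in Table 1, the following sketch computes them for a toy alignment using the textbook estimators: haplotype diversity h = n/(n−1) × (1 − Σp²), the mean number of pairwise differences M, and nucleotide diversity π = M divided by the number of sites. It is assumed, not verified here, that these correspond to the quantities reported by Arlequin; the function names and sequences are invented for illustration.

```python
from collections import Counter
from itertools import combinations


def haplotype_diversity(seqs):
    """h = n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    freqs = [count / n for count in Counter(seqs).values()]
    return n / (n - 1) * (1 - sum(p * p for p in freqs))


def mean_pairwise_differences(seqs):
    """M = average number of differing sites over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def nucleotide_diversity(seqs):
    """pi = M divided by the number of aligned sites."""
    return mean_pairwise_differences(seqs) / len(seqs[0])


# Toy alignment of four sequences, four sites each (invented data).
toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
print(round(haplotype_diversity(toy), 3),
      round(mean_pairwise_differences(toy), 3),
      round(nucleotide_diversity(toy), 3))
```

Under these definitions M/π equals the number of sites compared. In Table 1 this ratio is close to 341 for every population (for example 7.2/0.0211), which matches the length of the 16030–16370 window.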
Patterns of Matching Sequences Between Chad Basin Populations and the Main African Regions
A large database of African types was used for cross-comparison with the Chad Basin mtDNAs. As shown in Table 2, the number of shared individual mtDNAs and haplotypes is higher with western Central Africa than with any other African region, followed by West Africa. The difference between western Central and West Africa is more evident when looking at the matched haplotypes: Chad populations share 59 haplotypes (∼29%) with West African populations, accounting for ∼13% of the total haplotypes in West Africa. The percentage of shared haplotypes between Chad and western Central Africa is significantly higher (N= 101; ∼50%) accounting for ∼26% of the total haplotypes in western Central Africa.
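The percentages quoted above follow directly from the haplotype counts: each number of shared haplotypes is divided once by the Chad Basin total (NH = 203) and once by the total of the region compared (NH = 452 for West Africa and 379 for western Central Africa, as listed in Table 2). A minimal sketch of this arithmetic, with a hypothetical function name:

```python
CHAD_BASIN_NH = 203  # distinct haplotypes in the Chad Basin sample


def shared_percentages(n_shared, nh_region, nh_chad=CHAD_BASIN_NH):
    """Return (% of Chad Basin haplotypes, % of the region's haplotypes)."""
    return 100 * n_shared / nh_chad, 100 * n_shared / nh_region


west = shared_percentages(59, 452)
western_central = shared_percentages(101, 379)
print([round(x, 1) for x in west])             # [29.1, 13.1]
print([round(x, 1) for x in western_central])  # [49.8, 26.6]
```

The two percentages answer different questions: the first expresses how much of the Chad Basin haplotype pool is also seen in the region, the second how much of the region's pool is also seen in the Chad Basin, and the second depends strongly on how densely the region has been sampled.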
Table 2. Shared mtDNAs and haplotypes between Chad populations and different regions in Africa.
CHAD BASIN POPULATIONS
INDIVIDUALS(b) (N= 448)(a)
HAPLOTYPES(c) (NH= 203)(a)
Notes: (a) N = sample size; NH = number of different haplotypes.
(b) The first number before the hyphen is the number of individuals in the given African region sharing mtDNA with Chad Basin individuals; after the hyphen is the opposite, i.e. the number of individuals in the Chad Basin that share mtDNA with individuals from the given African region. Numbers in parentheses are the corresponding percentages with respect to the totals (in Africa before the hyphen and in the Chad Basin after the hyphen). All these numbers are computed for the shared mtDNA sequence segment between all the samples, i.e. from position 16090 to 16365.
(c) For the number of haplotypes shared between the Chad Basin and different African regions, the numbers in parentheses are firstly the percentage of these shared haplotypes from the total number of haplotypes (NH) in the given African region, and then the percentage from the total number of haplotypes in the Chad Basin.
EAST (N= 717; NH= 401)
NORTH (N= 1341; NH= 516)
SOUTH (N= 266; NH= 138)
SOUTH-EAST (N= 416; NH= 143)
SOUTH-WEST (N= 200; NH= 111)
WEST (N= 1228; NH= 452)
WESTERN CENTRAL (N= 999; NH= 379)
The percentages of shared mtDNA and haplotypes with North, East, and especially South Africa, are low. As inferred from phylogeographic information (see next section) the Chad Basin haplotypes matching those from South and south-eastern Africa coincide mainly with those probably ‘moved’ from West Africa during the Bantu expansion. The presence of East African mtDNAs in the Chad Basin seems to mirror the existence of a historical, bi-directional flow between East and West Africa. As a whole, the Chad Basin manifests a clear predominance of a western Central African component.
The Phylogeography of the Chad Basin
Analysis of the HVS-I region, as well as of three additional diagnostic sites (HpaI 3592, MboI 2349, AvaII 16390), in all the samples made possible a reasonable phylogenetic classification of the L-type sub-Saharan haplotypes into the already defined haplogroups (Table S1). Figure 2 shows the patterns of haplogroup frequencies in the main African regions. As expected, most of the Chad Basin mtDNAs could be attributed to L-haplogroups; a non-negligible 5–6%, however, are of West Eurasian origin. This West Eurasian component (e.g. pre-HV, members of haplogroup U, etc.) is more prevalent in the Semitic nomadic group of the Afro-Asiatic phylum, represented mainly by the Arabic tribes from Chad, which account for the five pre-HV sequences detected, and to a lesser extent by the semi-nomadic Shuwa Arabs of Nigeria. Some of these haplogroups, “intrusive” in sub-Saharan Africa, were also detected within the nomadic Niger-Congo Fulani; all the U5 sequences were found in the Borgor Fulani. Two representatives of the North African autochthonous haplogroup U6 were detected, in the Kanuri and in the Mafa.
Table S1. HVS-I sequences and coding region RFLPs in 448 individuals from 12 different populations of the Chad Basin
3592 Hpa I
2349 Mbo I
7025 Alu I
14766 Mse I
*An asterisk indicates the individuals analyzed in: Černý V, Hájek M, Čmejla R, Brůžek J, Brdička R (2004) MtDNA sequences of Chadic-speaking populations from northern Cameroon suggest their affinities with eastern Africa. Ann Hum Biol 31:554–569.
**Two asterisks indicate the individuals analyzed in: Černý V, Hájek M, Bromová M, Čmejla R, Diallo I, Brdička R (2006) The mtDNA of Fulani nomads and their genetic relationships to neighbouring sedentary populations. Hum Biol 78:9–27.
Note the unexpected status of 2349 MboI in sample #83 (haplogroup M1) and sample #445 (haplogroup H1). We have, however, found some back mutations at this site in our database of complete or semi-complete (coding region) genomes, namely in Herrnstadt et al. (2002) within haplogroup H1 (sample #85), in Palanichamy et al. (2004) within haplogroup K1a1b (sample #C40), and in Achilli et al. (2005) within haplogroup U6b1 (sample #39).
Haplogroup L0a is mainly of East African origin, diversifying there around 40,000 years ago (Salas et al. 2002). Its major derived sub-clades (L0a1, L0a2 and L0a1a) spread into Central and South-Eastern African regions. It is believed that these were brought into the latter regions mainly by the eastern stream of the Bantu expansion; the role of the western Bantu stream is still uncertain (Salas et al. 2002), although some details are beginning to emerge (Plaza et al. 2004; Beleza et al. 2005). L0a1 and its main sub-clade L0a1a are represented by 10 haplotypes (∼6% of the mtDNAs) in the Chad Basin. Seven of these 10 haplotypes (13/28 of the L0a mtDNAs observed) are found in the three Chadic-speaking populations of the Kotoko, Mafa and Masa at relatively high frequencies (11%). Eight of these haplotypes match other neighbouring western Central African samples. The most frequent type (matching the basal motif of L0a1; sensu Salas et al. 2002) is common in, for example, Nubians, but is also found in other East African populations (e.g. in the Sudan and Turkana). L0a is absent in Shuwa Arabs and both Fulani sample sets. Several representatives of the sub-clade L0a1a (which is identified in the HVS-I region by mutations C16168T and C16278T on top of L0a [N= 15]) were detected. The Chad Basin L0a1 types (with the exception of the basal L0a1 type) show indications of some differentiation in situ, as they are close derivatives of the pre-existing L0a1 types found in East (and/or South-East) Africa. L0a2, probably of Central African origin (Soodyall & Jenkins, 1993; Salas et al. 2002), was not observed in the Chad Basin samples used in this study.
Haplogroup L1b, which probably spread into Central and North Africa along the Atlantic coastline, seems to be of West African origin. L1b is represented by eight different haplotypes (36 mtDNAs). It is highly prevalent in both the Fulani groups (29% in the sample from Chad and 20% in the sample from Cameroon), but also in the Kanembu (12%) and Fali (10%); it is found sporadically in some other ethnic groups. L1b is completely absent in three Chadic-speaking groups (the Hide, Masa and Buduma) and in the Arabs from Chad. The L1b haplogroup occurs only in the form of L1b1, defined by the (mutationally unstable) variant A16293G. Most of the Chad Basin L1b types match, or are close derivatives of, West African types.
The history of haplogroup L1c still remains enigmatic (Beleza et al. 2005; Plaza et al. 2004; Richards et al. 1993; Salas et al. 2002, 2004a, 2005b). The present data seem to point to (somewhere in) Central Africa as the ‘cradle’ of L1c, with very restricted overlap into south–eastern areas (Salas et al. 2002; Destro-Bisol et al. 2004). L1c in the Chad Basin is represented by eight haplotypes (10 sequences); these are found in all three linguistic families except for the Arab groups. This Central African haplogroup occurs in the Chad Basin in the form of different haplotypes containing mutation A16293G (L1c1; note that this definition is provisional [Salas et al. 2002] since there is evidence indicating that A16293G is not an appropriate diagnostic site due to its high mutation rate), but also in those bearing transition A16215G (L1c3). It is important to note that the L1c2 haplotypes, occurring predominantly in Americans of African origin, were not observed in the samples from the Chad Basin. This seems to indicate, in accordance with previous studies (Salas et al. 2002, 2004a), that the contribution of the Chad Basin (in contrast to the western and south-western Atlantic façade) to the African-American mtDNA pool was probably very limited.
The L5 and L1f haplogroups are geographically restricted to the East African region, where their origins are also expected to lie. Only one L5 haplotype was found in the Chad Basin samples (a single Kanembu individual); it occurs in the form of L5a (L1e1 in Figure 5 of Salas et al. 2002) with mutations C16111T, A16254G, C16355T and C16360T.
Haplogroup L2 is commonly divided into four main branches, termed L2a, L2b, L2c and L2d (Bandelt et al. 2001; Torroni et al. 2001b), of which L2a is the most numerous and most widespread within Africa. Where L2a diversified is still an open question; it could have been in West, Central or East Africa. The origin of the remaining known L2 clades (L2b, L2c and L2d) is unambiguously in West Africa (Salas et al. 2002). L2a is represented in the Chad Basin, as it is everywhere in sub-Saharan Africa, by a large number of sequences (N= 78). Of the previously known clades of this haplogroup, only L2a1 (identified by the mutationally unstable variant A16309G; N= 35, 16 haplotypes) was detected, and only one sequence, containing mutation C16286T, could further be classified as L2a1a. L2a is particularly abundant in the Kanembu (38% of the sample), but is also relatively frequent in nomadic Arabs (33%). It is absent from the Fulani sample from Cameroon, and is found at a relatively low frequency (6%) in the Fulani population from Chad. It is important to note that most of the Chad Basin L2a types match neither the West nor the East African L2a types. This again suggests some diversification of this clade in situ. Transitions T16209C C16301T C16354T on top of L2a1 define a small sub-clade, dubbed L2a1c by Kivisild et al. (2004, Figure 3) (see also Figure 6 in Salas et al. 2002), which mainly appears in East Africa (e.g. Sudan, Nubia, Ethiopia) and West Africa (e.g. Turkana, Kanuri). In the Chad Basin four different L2a1c types, one or two mutational steps from the East and West African types, were identified.
There is another small branch deriving from the basal L2a type that could tentatively (and consistently with Kivisild et al. 2004) be termed L2a1d. This small clade would be defined by transitions T16189C C16291T T16311C T16229C on top of L2a (see Figure 6 in Salas et al. 2002), and is found in East Africa and also Central Africa. In the Chad Basin we found five L2a1d derivatives lacking mutation T16189C. Both L2a1c and L2a1d abound in all of the linguistic branches of the Chad Basin population samples analyzed.
L2b is represented by 11 haplotypes (20 mtDNAs). This West African haplogroup is absent in both Arab groups and in the Fali; its highest frequency was detected in the Mafa (19%). L2b is also frequent in both Fulani sample sets where it occurs mainly as clade L2b1, differentiated by mutations C16355T and T16362C. Chad Basin L2b types do not, for the most part, match West African types. This is also consistent with the apparent absence of the L2b Chad Basin types in America (in contrast to the West African L2b types).
A total of 17 sequences were classified as L2c or L2d. These West African haplogroups occur at relatively high frequencies not only in both of the Fulani sample sets, but also in the Buduma sample set (mainly matching the basal type, which is highly prevalent in West Africa). They are virtually absent in both Arabic groups, both Nilo-Saharan groups, the Kotoko and the Mafa.
Haplogroup L3A (L3 without M and N) is most frequent in East Africa (∼50%), but can also be found in other parts of the continent. It is divided into several highly diversified sub–haplogroups, of which L3f and L4 are characteristic of East Africa, while L3b and L3d are specific to West Africa. The most diversified, the most extended, the most numerous and probably the oldest of the L3A types is haplogroup L3e, which dates back 46,000 years (Bandelt et al. 2001). We find that in the Chad Basin the highest number of sequences (N= 88) fall into the West African haplogroups L3b and L3d. They are not missing from any population, but are abundant in both of the Fulani sample sets, where they occur in more than 30% of samples. The lowest frequency of these West African haplogroups was identified in the Buduma (3%). The most prevalent L3b types in the Chad Basin match the basal type characterised by T16124C C16223T C16278T T16362C.
L3e is represented by 35 haplotypes (N= 75). It was detected in all of the Chad Basin population sample sets, and mainly in the Fali where it is found at a rate of 40%. Relatively high frequencies of L3e were found in the Kotoko and Shuwa Arabs (25% and 24%, respectively). Within this haplogroup representatives of L3e1 (determined in HVS-I by mutations C16223T C16327T) and its subclade L3e1b (deletion of T16325C on top), the clade L3e2 (C16223T C16320T) and its subclade L3e2b (16172–16189 on top), and the clade L3e3 (16223–16265T), can be detected in the Chad Basin samples. On the other hand we did not find L3e1a (16185–16311) and L3e4 (16223–16264) in this region. Consistent with other L-types in the Chad Basin, most of the American L3e1 and L3e2 types do not match those found in the Chad Basin, and mainly match West African types. It is also interesting to note that some L3e2 (and perhaps also L3e1) types are probably of Central African origin.
We also found a large set of L3e sequences in our samples (−3592 HpaI and +2349 MboI) carrying mutations A16041G C16223T (N= 39). These mtDNAs appear in all the Chad Basin populations, with the exception of the Arabs from Chad and the Buduma from Niger; their occurrence in Fulani samples is also very low. A search through our database of more than 6500 African mtDNA HVS-I profiles revealed 52 other sequences carrying mutations A16041G C16223T. We also note that these sequences are probably related to the complete Nigerian genome (#18) reported by Torroni et al. (2006), which carries the HVS-I variants A16037G A16041G C16223T T16311C T16519C and A73G C150T A263G 315+C T398C 523–524del in HVS-II. We name this branch L3e5. Note that this clade was detected earlier (but left unnamed) by Fadhlaoui-Zid et al. (2004), and was suggested to be of North African origin since “no match was found with sub-Saharan populations” (p. 230; Fadhlaoui-Zid et al. 2004). We therefore define L3e5 by the basal HVS-I motif A16041G C16223T, and the sub-clade L3e5a by A16037G on top. The L3e5 network of Figure 3a reflects a clear star-like phylogeny. Most of the L3e5 types are found in western Central Africa, although there seems to be important diffusion into North Africa (interpreted by Fadhlaoui-Zid et al. (2004) as evidence for the autochthonous character of L3e5 in North Africa); the root type is relatively prevalent in the Chad Basin populations, and there is a plethora of derived haplotypes (with nearly no matches in West Africa), indicating that L3e5 evolved in situ in this region. Taking the root type of L3e5 as the founder in western Central Africa (and more specifically in the Chad Basin), we estimate an expansion for this clade at about 11,450 ± 3,650 Y.B.P. in this region. In North Africa this clade seems to be more recent (with a larger standard deviation), dating to 7,100 ± 3,800 Y.B.P.
There are only three West African L3e5 types, two of which match the root type while the other is found in the Serer of Senegal (Rando et al. 1998). There is absolutely no African-American mtDNA belonging to L3e5, which contrasts with the high prevalence of other L3e types in African-Americans, e.g. see Figure 9 in Salas et al. 2002; this again seems to indicate that the Chad Basin did not contribute significantly to the Atlantic slave trade.
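The motif-based assignment used for the L3e clades can be illustrated with a minimal Python sketch. It assumes that a haplotype is represented simply as a set of HVS-I variant labels and that the most specific clade is the one with the largest motif fully contained in the haplotype. The dictionary holds only motifs quoted in the text above; the function name `classify` is hypothetical, and the sketch ignores the coding-region RFLP sites, back mutations and parallel mutations that a real classification would have to consider.

```python
# HVS-I motifs as quoted in the text; everything else here is illustrative.
MOTIFS = {
    "L3e1": {"C16223T", "C16327T"},
    "L3e2": {"C16223T", "C16320T"},
    "L3e5": {"A16041G", "C16223T"},
    "L3e5a": {"A16037G", "A16041G", "C16223T"},
}


def classify(variants):
    """Return the clade whose motif is the largest subset of `variants`.

    Returns None when no listed motif is fully present.
    """
    variants = set(variants)
    matches = [(len(motif), name)
               for name, motif in MOTIFS.items()
               if motif <= variants]
    return max(matches)[1] if matches else None


if __name__ == "__main__":
    # Hypothetical haplotype carrying the L3e5a motif plus one private variant.
    print(classify({"A16037G", "A16041G", "C16223T", "T16311C"}))
```

Choosing the largest matching motif is a simplification: it makes a sub-clade such as L3e5a take precedence over its parent L3e5 without encoding the tree explicitly.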
Another well-represented haplogroup in the Chad Basin sample sets is L3f (defined in HVS-I by T16209C C16223T T16311C). Interestingly, an important number of sequences carry the extra variant C16176T (and most of them also carry C16234T), while lacking the L3f diagnostic site T16311C (Figure 3b). This could constitute a new sub-clade of L3f, tentatively dubbed L3f2 here. The presence of C16188T on top of the L3f2 diagnostic motif would define L3f2a. L3f2a is probably related to the L3f complete genome of Nigerian mtDNA (#18) reported in Torroni et al. (2006; their Figure 1). Curiously, the HVS-II region of this Nigerian sample nearly matches a sample from Switzerland reported in Dimo-Simonin et al. 2000 (#142), sharing HVS-II positions 073-143-189-318 (some of which probably constitute part of the HVS-II diagnostic motif of L3f2). These sequences also mutate back at site T16311C, which probably constitutes a parallel mutation within L3f (note that this position is relatively unstable, e.g. Bandelt et al. 2002; Malyarchuk & Rogozin, 2004).
The phylogeography of L3f is particularly interesting. This haplogroup is probably of East African origin (Salas et al. 2002), while L3f1 appears to have spread at an early date into West Africa (and is also well represented in African-Americans; Salas et al. 2002) and probably into the Arabian Peninsula (Kivisild et al. 2004). The L3f2 root type is found in two western Central African sequences with two derived mtDNAs from East Africa. L3f2 is however found exclusively in western Central Africa (N= 17), and L3f2a is mostly found in this region as well. It is interesting that L3f2 diversification occurs almost exclusively in Chadic-speaking groups (the Chadic branch of the Afro-Asiatic linguistic family); note that these groups together constitute only 38% of our Chad Basin sample. It should also be noted that other published sub-Saharan sequences from the populations of North Cameroon (the Ouldeme, Podokwo and Mandara) belong to the Chadic group as well. If it is tentatively assumed that the root of L3f2 (16176-16209-16223-16234) constitutes a founder type, the TMRCA of this clade in western Central Africa would be 28,950 ± 11,600 Y.B.P., contemporary with L3f1 (Salas et al. 2002). The ‘double’ star-like shape of the L3f2 phylogeny suggests, however, the existence of at least two different expansion events, one of them affecting L3f2*. This assumption leads us to tentatively estimate an expansion event in the Chad Basin around 15,856 ± 5,943 Y.B.P. Consistent with the phylogeography of other clades there are no African-American L3f2 types. The introduction of L3f2 into North Africa is limited (Figure 3b), as is also true for other L3f lineages. The only East African L3f2 detected is within L3f2a, and corresponds to a Sudanese individual.
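How a founder type translates into an age estimate can be sketched with the ρ statistic. This section does not state the dating method, so the following is an assumption-laden illustration: it uses ρ (the mean number of mutations separating sampled sequences from the founder), the standard error of Saillard et al. (2000), and the widely used HVS-I calibration of one transition per 20,180 years (Forster et al. 1996). The function name `rho_age` and the example tree are hypothetical and are not taken from Figure 3.

```python
import math

# Assumed calibration: one HVS-I transition per 20,180 years
# (Forster et al. 1996). This section does not state the rate used.
YEARS_PER_MUTATION = 20180


def rho_age(branches, sample_size, years_per_mutation=YEARS_PER_MUTATION):
    """Estimate clade age and its standard error from a rooted tree.

    `branches` is a list of (descendants, mutations) pairs, one per branch:
    `descendants` is the number of sampled individuals below the branch and
    `mutations` the number of mutations on it.
    """
    rho = sum(n * l for n, l in branches) / sample_size
    sigma = math.sqrt(sum(n * n * l for n, l in branches)) / sample_size
    return rho * years_per_mutation, sigma * years_per_mutation


if __name__ == "__main__":
    # Hypothetical tree: 12 sampled mtDNAs, four of them on the root type,
    # six singletons one step away, and one one-step type seen twice.
    example = [(1, 1)] * 6 + [(2, 1)]
    age, sd = rho_age(example, sample_size=12)
    print(f"{age:.0f} ± {sd:.0f} years")
```

Under this scheme a star-like phylogeny with many individuals still on the root type yields a small ρ and therefore a young age, which is why the choice of founder matters for the estimates quoted above.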
Haplogroup L4 occurs only in very small numbers, as predicted by the hypothesis formulated in Salas et al. (2004b). One Buduma matches the common L4g2 type in western Central Africa (matching individuals from many different western Central African populations, such as the Daba, Fali, Mandara, Podokwo, etc.). There are three other Chad Basin mtDNAs belonging to L4g2.
Finally, haplogroups L0d and L0k have been detected almost exclusively in the Khoisan people of southern Africa and in neighbouring Bantu populations, e.g. in Mozambique (Pereira et al. 2001; Salas et al. 2002); it is very likely that these are the last remnants of formerly more numerous and more diversified haplogroups that did not survive the period of Bantu expansion (Vigilant et al. 1991). As expected, these lineages were completely absent in the Chad Basin samples.
Principal Component Analysis
PCA is the usual method for summarising population relationships; here the intention is to observe the mtDNA patterns of the Chad Basin populations within the general African landscape. The pattern of PCA1 (Figure 4a) reflects the close relationship between the Chad populations and those of western Central and West Africa, and shows a clear separation from East Africa and an even greater distance to North and South Africans. The PCA2 plot reflects the proximity of some Chadic-speaking populations (e.g. the Kotoko and Buduma) to East Africa (reflecting the higher frequency of the L0a and, for example, L3f lineages in the Kotoko and L3f and M1 mtDNAs in the Buduma), while others are more closely related to West Africa (e.g. the Borgor Fulani and the Tcheboua Fulani), reflecting the prevalence of some typical West African lineages, e.g. L1b, L3b/d. PCA3 accentuates the distances of all the populations from North and South Africa as well; these regions behave as outliers, reflecting on the one hand chiefly the Khoisan component in South Africa, and on the other the European character of North Africa. PCA1, PCA2 and PCA3 account for 22.7%, 18.4% and 15.7% of the total variation, respectively.
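The share of variation attributed to each component can be illustrated with a small, dependency-free Python sketch. It assumes that the PCA input is a populations × haplogroups table of frequencies, which this section implies but does not state, and it extracts eigenvalues of the covariance matrix by power iteration with deflation rather than with a statistics package. All function names and the frequencies in the example are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Sample covariance between columns (haplogroups) across rows (populations)."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    centred = [[row[j] - means[j] for j in range(p)] for row in rows]
    return [[sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenpair(matrix, iterations=1000):
    """Power iteration; returns (eigenvalue, unit eigenvector)."""
    p = len(matrix)
    # Non-uniform start: frequency rows sum to 1, so a constant vector
    # lies in the null space of the covariance matrix.
    vec = [float(j + 1) for j in range(p)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        nxt = [sum(matrix[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm < 1e-15:  # no variance left to extract
            return 0.0, vec
        vec = [x / norm for x in nxt]
    mv = [sum(matrix[a][b] * vec[b] for b in range(p)) for a in range(p)]
    return sum(vec[a] * mv[a] for a in range(p)), vec


def explained_variance(rows, components=3):
    """Fraction of total variance captured by each of the first components."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total = sum(cov[j][j] for j in range(p))
    fractions = []
    for _ in range(components):
        value, vec = leading_eigenpair(cov)
        fractions.append(value / total if total else 0.0)
        # Deflation: remove the component just found.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(p)]
               for a in range(p)]
    return fractions


if __name__ == "__main__":
    # Hypothetical haplogroup frequencies for four populations.
    freqs = [[0.50, 0.30, 0.20],
             [0.40, 0.35, 0.25],
             [0.20, 0.30, 0.50],
             [0.25, 0.45, 0.30]]
    print(explained_variance(freqs, components=2))
```

The fractions returned correspond to the percentages of total variation reported for each component; power iteration is used here only to keep the sketch self-contained.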
MDS plotting of FST genetic distances obtained from 93 pairwise population comparisons revealed the outlying positions of North African and some (mainly Khoisan) sub-equatorial and Pygmy populations, without providing any clear visualisation of the relationships between the Chad Basin populations and their neighbours (data not shown). To better understand the genetic pattern of the area under investigation, we further analysed 60 selected populations. In addition to the aforementioned outliers, island populations (e.g. Bioko) were also excluded, as their contribution to the continental groups was minimal and they are likely to be susceptible to founder effects and drift after the initial settlement of their territories. Figure 4b clearly shows the homogeneity of West African populations (at the upper left hand side of the plot) on the one hand, and the dispersion of the East and South African groups on the other. The Chad Basin populations are situated somewhere in the middle. The two Arab samples, the Kotoko, the Mafa, the Masa, the Kanuri and the Buduma are closer to the East African populations living in or near the Ethiopian highlands; the others (both Fulani sample sets, the Hide, the Fali and the Kanembou) are linked rather to the West African group. In respect of the second dimension (vertical axis), however, the Chad Basin populations display no intelligible geographic or linguistic orientation. Figure 4c shows the MDS plot of the Chad Basin populations in the context of the main African regions; it mirrors in its first and second dimensions the pattern displayed by the PCA.
Apportioning of Genetic Variance
We carried out analyses of molecular variance (AMOVA) on the 12 Chad Basin populations analysed in the present work. It was observed that most of the genetic variation (∼96%) occurs within the populations, and that the variation between populations accounts for a non-negligible ∼4% (P < 0.000; 20000 permutations). Taking into account that the FST value for the whole African continent is 0.12, it can be said that the Chad Basin populations show a relatively high level of genetic homogeneity.
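The partition of variance reported here can be illustrated with a minimal single-level AMOVA sketch in Python, following the sums-of-squared-deviations formulation of Excoffier et al. (1992). It assumes that the distance between two aligned sequences is their number of pairwise differences, which may differ from the distance measure actually used in the study, and it omits the permutation test. The function names and the toy sequences are hypothetical.

```python
def pairwise_differences(a, b):
    """Number of positions at which two aligned sequences differ."""
    return sum(1 for x, y in zip(a, b) if x != y)


def ssd(seqs):
    """Sum of squared deviations: sum of pairwise distances / group size."""
    n = len(seqs)
    total = sum(pairwise_differences(seqs[i], seqs[j])
                for i in range(n) for j in range(i + 1, n))
    return total / n


def phi_st(populations):
    """Among-population share of the total variance (single-level AMOVA).

    Fails with ZeroDivisionError if every sequence is identical.
    """
    pooled = [s for pop in populations for s in pop]
    n_total, n_pops = len(pooled), len(populations)
    ssd_within = sum(ssd(pop) for pop in populations)
    ssd_among = ssd(pooled) - ssd_within
    msd_among = ssd_among / (n_pops - 1)
    msd_within = ssd_within / (n_total - n_pops)
    # Average effective sample size per population.
    n0 = (n_total - sum(len(pop) ** 2 for pop in populations) / n_total) / (n_pops - 1)
    var_among = (msd_among - msd_within) / n0
    var_within = msd_within
    return var_among / (var_among + var_within)


if __name__ == "__main__":
    # Hypothetical aligned fragments from two populations.
    pops = [["ACGT", "ACGA", "ACGT"], ["ACGT", "TCGA", "TCGT"]]
    print(phi_st(pops))
```

On this definition, the ∼4% among-population component corresponds to a fixation index of roughly 0.04, which is the quantity compared with the continental value of 0.12.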
Africa is typically divided into broad areas, namely North, West, East, western Central (or Central), South-East, South-West, and Southern (e.g. Salas et al. 2002, 2004a). The Chad Basin is represented in today's mtDNA database by populations of Kanuri, Fulbe and Hausa sampled mainly from northern Nigeria and southern Niger (Watson et al. 1997); analysis of these samples seems to indicate a closer mtDNA affinity to the West African mtDNA gene pool than to that of East Africa (Salas et al. 2002).
We have aimed here to shed light on the role played by Lake Chad in the population history of the Sahel-Sudan belt of Africa, by analysing the mtDNA heritage of 12 different ethnic groups from this region. The genetic differentiation, measured by FST distances, of the Chad Basin groups from neighbouring populations in East, western Central and West Africa is relatively small. This continual and geographically determined Sahel-Sudanic mtDNA landscape supports the idea of a weak linguistic contribution to mtDNA history in this part of the world. More pronounced differentiation is reported only for the continuously migrating Fulani nomads (who are more closely related to the West African pool), as they are not statistically differentiated from Guinea-Bissau populations now living more than 3,000 km away (Černý et al. 2006).
All of the analyses carried out in the present study point to the close relationship between the Chad Basin populations and western Central Africa, with a close affinity to West as well as to some East African features. The PCA (Figure 4a) and the MDS (Figure 4b) plots summarise this general pattern. AMOVA also indicates a close relationship between the Chad Basin populations, with only 4% accounting for variation between populations. The percentages of shared haplotypes between the Chad Basin and the main African regions (Table 2) also accord with this general scenario.
The haplogroup profiles of the Chad Basin populations revealed a somewhat unexpected absence of some clades, such as the Bantu haplogroup marker L0a2 which is highly prevalent in South-East Africa as well as in western Central Africa (and, unsurprisingly, in Americans of recent African ancestry; Salas et al. 2002, 2004a, 2005b,d). The absence of the L0a1 clade in both of the Fulani sample sets, and their close resemblance to the West Africans from Guinea-Bissau, adds support to the theory of an East African origin for the L0a haplogroup. The presence of some L0a* and L0a1 mtDNAs in the Chad Basin, however, indicates some ancient connection with East Africa. The absence of mtDNA belonging to L1b* in the Chad Basin is more enigmatic. L1b1 is its most important ‘daughter’ haplogroup, and has been detected mainly in West Africa (and consequently in African-Americans). The occurrence of L1b1 in both of the Fulani Chad Basin sample sets at a high frequency (over 20%) accords with their West African origin. L1c is represented by the L1c1 and L1c3 clades, but L1c2 is absent from the Chad Basin database. The most likely homeland of the L1c haplogroup is western Central Africa (e.g. the Congo delta), from where some of its two main clades reached West and South-East Africa. The youngest clade, L1c2, was carried to South-East Africa only by the Bantu migration. The low frequency of L1c in the Chad Basin populations indicates that this region was partially isolated from more equatorial western Central African populations. The very rare L5a haplogroup has been reported to be virtually restricted to East Africa alone; we have found a (Kanembu) representative living east of Lake Chad, and Kanem might thus be considered the most westerly extent of this haplogroup. It is interesting that the second clade, L5a (L1e2 in Salas et al. 2002), which is otherwise more widespread, was not found in our samples.
Some other haplogroups appear in the Chad Basin at relatively high frequencies. Thus L2a1, which is widely distributed all around sub-Saharan Africa, is also very well represented in the Chad Basin (but only one L2a1a sequence was identified). The lack of L2a1b is not surprising at all, as this clade has a more or less south-eastern distribution (Richards et al. 2004; Salas et al. 2002). Both L2b* and L2b1 are represented in the Chad Basin. The West African origin of L2b is supported here by its presence in Fulani samples, mainly in the form of L2b1, which is otherwise much more confined to West Africa than L2b*. Representatives of L2c and L2d were also found - L2d, in the form of L2d2 (defined by a 111A transversion and four additional mutations G16145A C16239T C16292T C16355T), is present at a high frequency in the Buduma. Phylogenetic reconstruction and the phylogeographic patterns of the sub-clades L3f2 and L3e5 indicate their (probable) autochthonous origin in the Chad Basin, with some sporadic representatives in other parts of Africa in the case of L3f2, and a significant frequency in North Africa for some L3e5 types. They both seem to reflect the existence of prehistoric population expansions, as indicated by the star-like appearance of their phylogenies. One of these appeared before the last glacial maximum at about 28,950 ± 11,600 Y.B.P. (L3f2); there is however evidence for a population expansion 15,856 ± 5,943 Y.B.P. for L3f2*. L3e5 shows diversification at about 11,450 ± 3,650 Y.B.P. During the last glacial maximum the Chad Basin experienced a dry period known as the kanémien, but with the beginning of the Holocene Lake Megachad formed and the Chad Basin became a suitable place to find new foraging (fishing and hunting) opportunities. Some connections from North Africa can also be seen in the archaeological record at this time (Haour, 2003); such evidence also fits well with the expansion event detected for L3e5 from this region around 7,100 ± 3,800 Y.B.P.
Later, some pastoral groups also entered the Chad Basin. The archaeological data from western parts of the Republic of Sudan, especially from the shores of the Wadi Howar (a tributary of the Nile), suggest an important human migration to the Chad Basin from the Upper Nile valley, somewhere between the third and fourth cataracts (Keding 1993; Blench 1999). This new colonisation of eastern parts of the Chad Basin traversed large river valleys with some expanses of water. It is, then, possible that one of these demographic expansions, detectable in the still sparse archaeological record of the Chad Basin, was responsible for the star-like phylogenetic shape of L3f2* and L3e5.
It is interesting that the analysis of our population samples from the Chad Basin has enabled the identification of the most likely geographical origin of L3e5 on the sub-Saharan side, in contrast to the North African geographic origin suggested by Fadhlaoui-Zid et al. (2004). The fact that these clades (L3f2 and L3e5) have not yet been found in East Africa may indicate that the human population(s) in which these diversifications occurred remained - with the ongoing drying of the climate - isolated from related East African groups. The phylogeographic characteristics of most Chad Basin L-lineages, typically those of East and West African type, indicate diversification in situ: the most prevalent (preferably basal) types are shared with other regions, whereas the derived mtDNAs (one or more mutational steps away) are not.
In brief, the history told by the mtDNA seems to indicate that the Chad Basin has a mainly western Central African background; this is indicated by several analyses carried out in this study (shared haplotypes between regions, PCA, etc.), but especially by the phylogeographic patterns observed. The Chad Basin was also the epicenter of a bidirectional ‘genetic’ corridor between West and East Africa, favouring the input of West African types into the Chad Basin - probably due to its geographical proximity. The low frequency of the autochthonous North African U6 haplogroup in the Chad Basin populations testifies to the limited influence of North Africa in the region. Some lineages (L3f2* and L3e5) mirror the existence of demographic expansion events in the region, dated to about 15,856 ± 5,943 to 11,400 Y.B.P. In addition, all the evidence points to the Chad Basin having been set apart from the African scenario of the Atlantic slave trade.
It is interesting that the Arab-speaking populations fit well (e.g. PCA and MDS) with the Chad Basin sub-Saharan variability, despite their distinctive phenotypical appearance. On the other hand, other non-sedentary peoples of the Chad Basin – the Fulani nomads – are clearly differentiated (mainly because of high frequencies of the L1b and L3b/d haplogroups) from the most general Chad Basin mtDNA gene pool.
It is to be expected that the genetic exploration of still unknown areas of Eastern Chad and Western Sudan will in the near future enable the drawing aside of the veil on other fascinating stories in this remote part of Africa.
The authors wish to express their gratitude to the Chad Basin volunteers for their helpful participation in the study. We would also like to thank the two anonymous referees of this article for their most useful comments. This project was supported by the Grant Agency of the Czech Republic (under grant no. 404/03/0318), the Andrew W. Mellon Foundation through the Council of American Overseas Research Centers, Washington, DC, and the Fondation Maison des Sciences de l'Homme in Paris. This work was partially supported by grants from the Ministerio de Sanidad y Consumo (PI030893; SCO/3425/2002), Fundación Investigación Médica Mutua Madrileña Automovilística, and Genoma España (CeGen; Centro Nacional de Genotipado) given to AS.