The matrilineal genetic composition of 372 samples from the Republic of Guiné-Bissau (West African coast) was studied using RFLPs and partial sequencing of the mtDNA control and coding regions. The majority of the mtDNA lineages of Guineans (94%) belong to West African-specific sub-clusters of haplogroups L0-L3. A new L3 sub-cluster (L3h), found in both eastern and western Africa, is present at moderately low frequencies in Guinean populations. The non-random distribution of haplogroup U5 in the Fula group, U6 in the “Brame” linguistic family and M1 in the Balanta-Djola group suggests a correlation between the genetic and linguistic affiliations of Guinean populations. The presence of M1 in Balanta populations supports the earlier suggestion of their Sudanese origin. Haplogroups U5 and U6, on the other hand, were restricted to populations thought to descend from a southward expansion of Berbers. Particular haplotypes, otherwise found almost exclusively in East African populations, were observed in some ethnic groups with an oral tradition claiming Sudanese origin.
Unveiling the history of human settlement on the West Coast of Africa is a complex task, since that history results from a continuous and intricate network of migrations, invasions and admixture of peoples of different origins. Fossil evidence suggests a modern human presence in NW Africa around 40000 years before present (YBP) (Alimen, 1987). A pre-Neolithic Capsian culture later evolved, either locally or through diffusion from the Near East (Camps-Faber, 1989). Around 9000 YBP, when the Sahara went through a period of maximum humidity (Aumassip et al. 1988), several Neolithic cultures flourished in the area, bringing together people of sub-Saharan and North African origin (Dutour et al. 1988). The domestication and spread of several African-specific plants probably started in the western Sahel after 4000 YBP. The first phase of the largely eastward- and southward-oriented Bantu migrations, originating from the central Gulf of Guinea region, is a likely outcome of these cultural developments (Fage, 1995).
The Ghana Empire, between the Niger and Senegal rivers, is the oldest known West African kingdom (Fage, 1995); it was followed in the 14th–16th centuries by other empires (Mali, Songhai). The admixture of Berbers with native populations of this area dates back at least to the 9th century A.D., after the arrival of the pastoral Peuls or Fulbe (here designated as Fula). In 1086 the Ommíades conquered North-Western Africa and pushed the populations of southern Morocco and Mauritania towards the Senegal region (Moreira, 1964). When the Europeans arrived in Senegambia in the 15th century, they found most of the presently known ethnic groups already settled in the region (Teixeira da Mota, 1954). The Fula arrived again two centuries later from the Futa Toro and Sahel regions and came to dominate the whole area. The Mandinga (Mandenka) were the last to arrive in this region (Carreira & Quintino, 1964).
Present-day Guinean ethnic groups are distributed throughout the territory. The Balanta are the largest group and, in the first quarter of the 20th century, spread over territories previously occupied by other ethnic groups. The origin of the Balanta is uncertain. Some authors see linguistic affinities with the Sudanese, from whom they could have separated 2000 years ago with the first spread of Kushitic migrations (Quintino, 1964). According to Stuhlmann (1910), the group derives from a Bantu branch that separated near the Nile in the Pleistocene, following Hamitic invasions. The Bijagós inhabit the archipelago of the same name; some scholars see strong cultural resemblances to the Egyptians (Quintino, 1964), whereas others relate them to the Senegalese Djola. The latter are a rather heterogeneous group and include the Beafada, who have an oral tradition of coming from Mali (Lopes, 1999). A mass arrival of Fula took place at the beginning of the 19th century. The origin of this ethnic group is unknown, but tradition relates them to the Hyksos and Nubians. They show the typical phonetic “glottal catch” that characterizes the whole group.
A total of 372 blood samples were collected from unrelated Guinean males whose maternal ancestors were known to belong exclusively to a single ethnic group. The samples were collected either in military camps, with the permission of the Guiné-Bissau Chairman of the Joint Chiefs of Staff, or in villages around the country with the help of the Ministry of Health. Every participant gave his consent in an individual interview after a detailed explanation of the project. Sample sizes and origins (along with additional information) are given in Tables 1 and 2. Owing to the complex history of the major ethnic groups in Guiné-Bissau, not all of them follow a clear present-day settlement pattern (see Figure 1).
Table 1. Population data and ethnic distribution of the Guinean samples
(a) Includes the so-called Balanta-Mané (Balanta Islamized by Mandinga); (b) includes Felupes.
Bainouk, Banyuk, Elomay
Biafada, Bidyola, Biafar
Fulup, Floup, Ejamat, Ediamat
Fulbe, Futa Jallon
Fulbe, Futa Jallon
Jahanque, Jahanka, Diakanke
Susu, Sose, Soso
Table 2. Haplogroup relative frequencies and diversity index (H) in Guiné-Bissau ethnic groups and several other African populations. Superscripts (a-g) on the Guinean ethnic groups refer to the codes used in Figure 3 (PCA)
DNA was extracted from the leukocyte fraction of whole blood by standard methods, and the mtDNA hypervariable segment I (HVS-I) of the control region was amplified and sequenced. Sequencing products were separated on a MegaBACE 1000 automatic sequencer following the manufacturer's specifications and aligned using the Wisconsin Package GCG Version 10.0. All sequences were read between nucleotide positions (nps) 16024 and 16400. Information on polymorphic sites 185, 186, 189, 195, 236, 297 and 322 in HVS-II was obtained by directly sequencing all samples that could not be unambiguously classified on the basis of HVS-I alone.
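To make the read window concrete, the sketch below lists the differences between an aligned HVS-I read and a reference over nps 16024-16400, writing transitions as a bare position and transversions with the derived base appended (the convention used in the haplogroup definitions, e.g. 16114A). It assumes, hypothetically, that query and reference are already aligned without gaps and both begin at np 16024; the function names and the toy sequences are illustrative only, and real data would also need indel and ambiguity handling.

```python
# Minimal sketch (hypothetical helpers): compare an aligned HVS-I read with a
# reference over the read window nps 16024-16400. Assumes both strings are
# gap-free, of equal length and start at np 16024; indels are not handled.

READ_START = 16024
READ_END = 16400

# Purine-purine (A<->G) and pyrimidine-pyrimidine (C<->T) changes are transitions.
TRANSITIONS = {frozenset("AG"), frozenset("CT")}


def label(position: int, ref_base: str, alt_base: str) -> str:
    """Bare position for a transition, position plus derived base for a transversion."""
    if frozenset((ref_base, alt_base)) in TRANSITIONS:
        return str(position)
    return f"{position}{alt_base}"


def call_variants(query: str, reference: str, start: int = READ_START) -> list[str]:
    if len(query) != len(reference):
        raise ValueError("query and reference must be aligned to the same length")
    variants = []
    for offset, (q, r) in enumerate(zip(query.upper(), reference.upper())):
        position = start + offset
        if position > READ_END:
            break  # ignore anything read beyond np 16400
        if q == r or q == "N":
            continue  # identical base, or an unreadable call
        variants.append(label(position, r, q))
    return variants


# Toy example with made-up sequences (not real HVS-I data).
print(call_variants("GCGAAC", "ACGTAC"))  # ['16024', '16027A']
```

Reporting transitions as bare positions keeps the output directly comparable with motif lists such as 16129, 16256A and 16362.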
Where the HVS-I haplotype left the mtDNA haplogroup ambiguous, additional data were gathered from restriction fragment length polymorphisms (RFLPs) at diagnostic sites. All restriction digests were performed according to the manufacturers' instructions (Fermentas and New England BioLabs). The following polymorphic restriction sites were screened: 322HaeIII, 1715DdeI, 2349MboI, 2758RsaI, 3592HpaI, 3693MboI, 4157AluI, 4685AluI, 5584AluI, 5656NheI, 7055AluI, 8616MboI, 10084TaqI, 10321AluI, 10394DdeI, 10397AluI, 10806HinfI, 11439MboI, 11641HaeIII, 12308HinfI, 13803HaeIII, 13957HaeIII, 14766MseI and 14868MboI. The following coding-region sites were ascertained by sequencing: 2758, 4218, 12618, 13105 and 14182. Primers and PCR conditions used in all analyses are available as Complementary Material at http://www.ahg.com.
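The screened sites are written as a position followed by an enzyme name (e.g. 1715DdeI). A small parser such as the hypothetical one below can turn that list into per-enzyme position lists, for instance when planning which digests to run together. It only parses the notation; it says nothing about whether a site is gained or lost in a given sample.

```python
import re
from collections import defaultdict

# The 24 restriction sites screened in this study, copied from the text.
SITES = (
    "322HaeIII, 1715DdeI, 2349MboI, 2758RsaI, 3592HpaI, 3693MboI, "
    "4157AluI, 4685AluI, 5584AluI, 5656NheI, 7055AluI, 8616MboI, "
    "10084TaqI, 10321AluI, 10394DdeI, 10397AluI, 10806HinfI, 11439MboI, "
    "11641HaeIII, 12308HinfI, 13803HaeIII, 13957HaeIII, 14766MseI, 14868MboI"
)

# Position digits followed by an enzyme name starting with a capital letter.
SITE_PATTERN = re.compile(r"^(\d+)([A-Z][A-Za-z]*)$")


def parse_site(token: str) -> tuple[int, str]:
    """Split a token such as '1715DdeI' into (1715, 'DdeI')."""
    match = SITE_PATTERN.match(token.strip())
    if match is None:
        raise ValueError(f"unrecognised site: {token!r}")
    return int(match.group(1)), match.group(2)


def sites_by_enzyme(text: str) -> dict[str, list[int]]:
    """Group the comma-separated site list by enzyme, positions sorted."""
    groups: dict[str, list[int]] = defaultdict(list)
    for token in text.split(","):
        position, enzyme = parse_site(token)
        groups[enzyme].append(position)
    return {enzyme: sorted(positions) for enzyme, positions in groups.items()}


print(sites_by_enzyme(SITES)["AluI"])  # [4157, 4685, 5584, 7055, 10321, 10397]
```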
Haplogroup L2 is divided into the sub-clades L2a (characterized by 16294 and 13803), L2b (16114A, 16129, 16213 and 4158), L2c (322 and 13958) and L2d (16399 and 3693). Mutations 16278, 16362 and 10086 characterize haplogroup L3b; haplogroup L3d is defined by 8618 and shares with L3b a transition at np 13105. According to Bandelt et al. (2001), L3e (defined by 2352) is subdivided into the clades L3e1 (16327), L3e2 (16320), L3e3 (16265T) and L3e4 (16264 and 5584). L3e2 is further subdivided into L3e2* (14869) and L3e2b (16172 and 16189). As in Salas et al. (2002), L3f captures all L3* lineages with a mutation at 16209; L3f1 is further defined by a T at np 16292 (and 14766). Here we define a new sub-cluster, L3h, characterized by the loss of the DdeI site at np 1715 (mutation at np 1719) and the HVS-I motif 16129, 16256A and 16362. Following Finnilä et al. (2000), U5b is characterized by 5656 and 12618 over 14182. Haplogroup U6 (Rando et al. 1998) is defined by 16172 and 16219. Haplogroup M1 is characterized by the mutations 16129, 16189, 16249 and 10400 (Quintana-Murci et al. 1999).
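The motif definitions above lend themselves to a simple rule-based lookup. The sketch below is hypothetical and deliberately naive: it encodes a subset of the listed motifs (U5b is omitted), writes transitions as bare positions and transversions with a base suffix (the T at np 16292 is assumed here to be a transition, so it appears as 16292), treats coding-region markers such as 1719 or 13803 as plain variant labels, and reports the motif with the most markers that is fully present in a sample. It ignores back mutations, recurrent mutations and the nesting of the real phylogeny, so it illustrates the logic rather than replacing network-based or RFLP-confirmed classification.

```python
from typing import Optional

# Hypothetical, simplified motif table built from the definitions in the text.
MOTIFS = {
    "L2a": {"16294", "13803"},
    "L2b": {"16114A", "16129", "16213", "4158"},
    "L2c": {"322", "13958"},
    "L2d": {"16399", "3693"},
    "L3b": {"16278", "16362", "10086", "13105"},
    "L3d": {"8618", "13105"},
    "L3e1": {"2352", "16327"},
    "L3e2": {"2352", "16320"},
    "L3e2*": {"2352", "16320", "14869"},
    "L3e2b": {"2352", "16320", "16172", "16189"},
    "L3e3": {"2352", "16265T"},
    "L3e4": {"2352", "16264", "5584"},
    "L3f": {"16209"},
    "L3f1": {"16209", "16292", "14766"},
    "L3h": {"1719", "16129", "16256A", "16362"},
    "U6": {"16172", "16219"},
    "M1": {"16129", "16189", "16249", "10400"},
}


def classify(variants: set[str]) -> Optional[str]:
    """Return the most specific motif fully contained in the sample, if any."""
    matches = [name for name, motif in MOTIFS.items() if motif <= variants]
    if not matches:
        return None
    # Most specific = largest motif; ties go to the first entry in MOTIFS.
    return max(matches, key=lambda name: len(MOTIFS[name]))


print(classify({"16129", "16256A", "16362", "1719", "16223"}))  # L3h
print(classify({"16209", "16292", "14766"}))  # L3f1
```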
Genetic Analysis and Population Comparisons
Median networks of HVS-I haplotypes (Bandelt et al. 1995, 2000) were drawn separately for each haplogroup using the Network 3.1 program (Arne Röhl, http://www.fluxus-engineering.com/sharenet.htm). Haplogroup frequencies, molecular diversity indices, fixation indices (FST), genetic diversity (H; Nei, 1987) of haplotypes and haplogroups, and analysis of molecular variance (AMOVA) were calculated using Arlequin v2.0 (Schneider et al. 2000). Populations were compared by subjecting the vectors of relative haplogroup frequencies to a principal component analysis (PCA).
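For reference, the unbiased gene diversity of Nei (1987) is H = n/(n − 1) · (1 − Σ pᵢ²), where n is the number of sequences and pᵢ the relative frequency of the i-th haplotype (or haplogroup). The sketch below computes it from a list of per-individual labels; it is a minimal illustration of the formula, not Arlequin's code, and it omits the standard deviation that Arlequin also reports. The labels in the example are made up.

```python
from collections import Counter
from collections.abc import Iterable


def nei_gene_diversity(labels: Iterable[str]) -> float:
    """Unbiased gene diversity H = n/(n-1) * (1 - sum(p_i^2)) (Nei, 1987)."""
    labels = list(labels)
    n = len(labels)
    if n < 2:
        raise ValueError("need at least two sequences")
    counts = Counter(labels)
    sum_sq = sum((count / n) ** 2 for count in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


# Toy example: four sequences carrying two haplotypes at equal frequency.
print(nei_gene_diversity(["A", "A", "B", "B"]))  # 0.6666666666666666
```

The n/(n − 1) factor corrects for small samples; without it, H would be biased downwards.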
A local database of more than 19000 individuals from worldwide populations, compiled from the literature and our unpublished data, was searched for exact matches to the Guiné-Bissau haplotypes, ignoring length variation in the HVS-I C stretch.
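Ignoring length variation in the C stretch amounts to normalizing both haplotypes before comparing them. The sketch below assumes a hypothetical notation in which C-stretch length variants are written as insertions such as 16193.1C, and takes the stretch to span nps 16184-16193; both the notation and the window are assumptions for illustration. Substitutions inside the stretch, such as the 16189 transition that is part of the M1 motif, are kept.

```python
import re

# Assumed extent of the HVS-I C stretch (hypothetical choice for this sketch).
C_STRETCH = range(16184, 16194)
# Position, optionally followed by an insertion suffix such as '.1'.
VARIANT = re.compile(r"^(\d+)(\.\d+)?")


def normalize(haplotype: set[str]) -> frozenset[str]:
    """Drop insertion-style length variants (e.g. '16193.1C') inside the C stretch."""
    kept = set()
    for variant in haplotype:
        match = VARIANT.match(variant)
        if match is None:
            raise ValueError(f"unrecognised variant: {variant!r}")
        position = int(match.group(1))
        is_insertion = match.group(2) is not None
        if is_insertion and position in C_STRETCH:
            continue  # length variation in the C stretch is ignored
        kept.add(variant)
    return frozenset(kept)


def exact_matches(query: set[str], database: dict[str, set[str]]) -> list[str]:
    """Return IDs of database haplotypes identical to the query after normalization."""
    target = normalize(query)
    return sorted(
        sample_id for sample_id, hap in database.items() if normalize(hap) == target
    )


# Toy database with made-up entries.
db = {
    "X1": {"16129", "16189", "16193.1C", "16249"},
    "X2": {"16129", "16249"},
}
print(exact_matches({"16129", "16189", "16249"}, db))  # ['X1']
```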
Coalescence times were estimated with the ρ statistic, assuming that one transition within nps 16090-16365 corresponds to 20180 years (Forster et al. 1996).
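The ρ statistic is the mean number of mutations separating the sampled sequences from the founder (root) haplotype of a cluster; multiplying it by the calibration above gives an age in years. The sketch below is a simplified, hypothetical version: haplotypes are sets of variants relative to a reference, transitions are written as bare positions and transversions with a base suffix (so only bare positions inside nps 16090-16365 are counted), the distance to the founder is taken as the symmetric difference of the two variant sets (which ignores recurrent mutations), and the standard error is not computed.

```python
YEARS_PER_TRANSITION = 20180  # Forster et al. (1996) calibration, as stated in the text
WINDOW = (16090, 16365)


def transitions_from_founder(haplotype: set[str], founder: set[str]) -> int:
    """Count transitions inside the window that separate a haplotype from the founder."""
    differences = haplotype ^ founder  # variants present in one set but not the other
    return sum(
        1
        for variant in differences
        if variant.isdigit() and WINDOW[0] <= int(variant) <= WINDOW[1]
    )


def rho_age(haplotypes: list[set[str]], founder: set[str]) -> tuple[float, float]:
    """Return (rho, age in years) for sampled haplotypes descending from one founder."""
    if not haplotypes:
        raise ValueError("need at least one sampled haplotype")
    distances = [transitions_from_founder(h, founder) for h in haplotypes]
    rho = sum(distances) / len(distances)
    return rho, rho * YEARS_PER_TRANSITION


# Toy cluster (made-up variants): two founder copies and two one-step derivatives.
founder = {"16129", "16223"}
sample = [
    founder,
    founder,
    founder | {"16311"},
    founder | {"16311", "16114A"},  # the transversion is not counted
]
print(rho_age(sample, founder))  # (0.5, 10090.0)
```

Like the published estimates, this assumes a single founder per cluster.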
Results and Discussion
The 372 Guinean samples fell into 192 different haplotypes belonging to all major West African mtDNA haplogroups (for the complete list see Complementary Material). Three predominant haplotypes (GB4, GB85 and GB117), each at a frequency above 3%, together accounted for 13% of the Guinean mtDNA variation. Most sequences (94%) could be classified into the sub-Saharan African haplogroups and subhaplogroups L0a1, L1b, L1c1, L2a, L2b, L2c, L2d1, L3b, L3d, L3e, L3f1 and L3h. Unexpectedly for a West African population, 22 samples (5.9%) belonged to haplogroups M1 (1.1%), U5 (2.7%) and U6 (2.2%) (Table 2; Graven et al. 1995; Watson et al. 1997; Rando et al. 1998; Salas et al. 2002). M1 and U6 are found in North and East Africa, Arabia and the Middle East, whereas U5 has been sampled at appreciable frequencies only in Europe (Passarino et al. 1998; Quintana-Murci et al. 1999; Richards et al. 2000). The haplogroup profile of each ethnic group is given in the Complementary Material.
Haplogroup L0 was represented in Guineans only by its daughter group L0a1, at marginal frequencies of 1% to 5% in most Guinean groups (Table 2), in contrast to its frequency in East African populations (e.g. 25% in Mozambique: Watson et al. 1997; Pereira et al. 2001; Salas et al. 2002). Interestingly, only the Balanta, a group claiming Sudanese origin, showed an increased frequency of this clade (11%). Haplogroup L0a has a Paleolithic time depth in East African populations (about 33,000 years; Salas et al. 2002). The relatively young coalescence date of L0a1 in Guineans (6400±2600 years, assuming a single founder) suggests that only a small subset of L0a reached Guinea during the Holocene. The founder haplotype of L0a in Guineans, GB4 (see Table 4 in Complementary Material), has exact matches in East Africa, the Middle East, Cape Verde and the Senegalese Mandenka, indicating that it is not restricted to Guineans. The absence of the L0a2 clade, which is associated with the 9-bp deletion in the COII/tRNALys intergenic region and is widespread in Bantu-speaking populations throughout Africa (Soodyall et al. 1996), suggests that L0a has at least two distinct phylogeographic patterns in Central and West Africa. We cannot rule out a Bantu migration to West Africa, however, as its founder group could have differed in composition from the groups that took part in the southward migration(s).
Haplogroup L1b is largely restricted to West African populations (Graven et al. 1995; Watson et al. 1997; Salas et al. 2002) and is represented by two different branches in Guineans. Its major cluster, L1b1 (Figure 2), is associated with a transition at np 16293 and includes a frequent sub-clade, defined by the combination of a transversion to A at np 16114 and a transition at np 16274, that has also been observed in the Senegalese Mandenka (Graven et al. 1995) and Wolof (Rando et al. 1998). L1b1 has a TMRCA of about 36000 years (Figure 2), predating the diversity of L0a1 in Guineans. Matches to this cluster have a West African distribution, well represented in the Mandenka (haplotypes GB8 and GB20), and the cluster's frequency is highest in the Fulani-western and Senegal-eastern language groups (Table 2). GB23 and GB24 are widespread in Africa and are found in nearly all West African populations considered here (Salas et al. 2002). Another West African-specific clade, L1c, is present at a relatively low frequency (0-8%) but with high haplotype diversity in the Guiné-Bissau sample.
Haplogroups L2a-L2c are frequent in Senegambia (Table 2) and show signatures of a recent expansion from a limited number of founder haplotypes shared between populations of different linguistic affiliation. In contrast, haplotypes belonging to haplogroup L2d are represented by single individuals and do not share a common founder sequence (Figure 2). Fifteen of the 42 L2a haplotypes sampled in Guiné-Bissau had matches elsewhere, in West Africa (Cabo Verde, Brehm et al. 2002; Wolof and Senegalese, Rando et al. 1998; Mandenka, Graven et al. 1995) as well as in East, South and North Africa. The geographic distribution of L2b and L2c haplotypes is largely restricted to West Africa and, not surprisingly, most haplotype matches are with Cabo Verdeans, Wolof and Senegalese. L2c shows the highest proportion of shared lineages: Cape Verde, Senegalese Mandenka, mixed Senegalese and São Tomé, the last most likely owing to recent gene flow from the Cape Verde Islands (Brehm et al. 2002). However, several L2 haplotypes observed in Guineans were not found in other West African populations but matched East and North African sequences. This was the case for the Balanta (BLE) haplotype GB44, which matched only Sudanese sequences (Watson et al. 1999), and for GB59, which matched Moroccan sequences. Interestingly, haplotype GB83 (L2b), found in the Mansonca (MSW) group, had an exact match only with Ethiopians (our unpublished data). Likewise, the Fula haplotype GB39 has not been reported in West Africa but appears in East Africa: Lake Turkana (Watson et al. 1997), Nubia, Southern Sudan, Ethiopia and Saudi Arabia (our unpublished data).
Haplogroups L3b, L3d and L3e are rare or absent in indigenous populations of North and South Africa but are well represented in our sample. GB127 and GB134 specifically link Guinean groups to Northwest African Mozabites, Moroccans and Senegalese. Notably, GB136, from Fula-related people, has so far been found in the Hausa and also in Nubians and Sudanese. Apart from Mozambique (6%), the majority of L3d lineages are West African (from 7% in mixed Senegalese to 12% in Niger/Nigeria), with an estimated age of 42100 ± 10600 years (Salas et al. 2002). L3f is more frequent in Southeast Africa, ranging from 8% in Kenya/Sudan to 2% in Mozambique. The coalescence time of this haplogroup in West Africa was calculated as 39400 ± 10400 years (Salas et al. 2002), within the error range of the estimate based on Guinean samples (49350 ± 16200 years). Haplotype GB178 in the Fula had exact matches with sequences from a wide range of East African populations (Somalia, Egypt) and even Saudi Arabia. Haplogroup L3h is found in Ethiopia, Cape Verde and Niger/Nigeria at marginal frequencies (∼1%) but reaches its highest known frequency in the Ejamat of Guiné-Bissau (8%). Its coalescence time estimate in Guineans (14000 ± 8400 years) is consistent with a late Pleistocene/early Holocene spread across Africa.
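The statement that the West African L3f age lies within the error range of the Guinean estimate can be checked with simple arithmetic. The sketch below assumes, since the text does not say, that the ± values are standard errors and that the two estimates are independent; under those assumptions the difference of 9950 years is roughly half of its combined standard error.

```python
import math


def compare_estimates(
    est_a: float, se_a: float, est_b: float, se_b: float
) -> tuple[float, float, float]:
    """Return (difference, combined SE, z) for two independent estimates."""
    difference = abs(est_a - est_b)
    combined_se = math.sqrt(se_a ** 2 + se_b ** 2)
    return difference, combined_se, difference / combined_se


# L3f coalescence ages quoted in the text (years); ± values assumed to be SEs.
west_africa = (39400, 10400)  # Salas et al. (2002)
guinea = (49350, 16200)       # Guinean samples

diff, se, z = compare_estimates(*west_africa, *guinea)
print(f"difference={diff:.0f}, combined SE={se:.0f}, z={z:.2f}")
# Is the West African point estimate inside the Guinean ±1 SE interval?
print(guinea[0] - guinea[1] <= west_africa[0] <= guinea[0] + guinea[1])
```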
No significant differences in haplogroup frequencies were observed between Guinean ethnic groups pooled by linguistic affiliation. As in their geographic neighbours (Table 2), haplogroups L1b, L1c, L2b, L2c, L2d, L3b, L3d and L3e account for most of the mtDNA variation (64-85%). The Guiné-Bissau sample shows an overall genetic diversity of 0.901 (SD 0.005), significantly higher than that of other West African samples (Table 2).
M1 and U6 Lineages
Haplogroup M1 has been characterized as an East African remnant of the major Asian haplogroup M (Quintana-Murci et al. 1999). It has been found mostly in Ethiopian populations (17%), and its characteristic HVS-I motif is also well represented in Egyptian and Sudanese populations along the Nile Valley (7-8%, Krings et al. 1999). HVS-I haplotypes matching the East African M1 clade have also been identified in Northwest Africans (Plaza et al. 2003, unpublished data), where their frequency reaches 12.8% in Algerians and 4% among Moroccan and Algerian Arabs and Berbers. M1 is generally absent from autochthonous West African populations but was found among the Balanta, Baiote and Djola groups, which speak Niger-Congo Atlantic Bak languages. The Guinean M1 haplotypes exactly matched one West Saharan (Rando et al. 1998), two Mozabite (Côrte-Real et al. 1996), two Iranian and one Saudi Arabian sequence (unpublished data). This lineage derives from a particular cluster defined by a mutation at position 16185, which is also found in Ethiopia, Morocco and other North African populations (Plaza et al. 2003, our unpublished results).
Haplogroup U6 is rather frequent in NW Africa, among Algerian Berbers, Moroccans and Mauritanians (Côrte-Real et al. 1996; Rando et al. 1998; Plaza et al. 2003), but is rare or absent in western sub-Saharan Africans. Three different U6 haplotypes were observed in the Fula, Mandenka and Manjaco groups. These haplotypes match sequences from a wide geographic range: North and West Africa (Cabo Verde, Tuareg, Mozabites, Moroccan Arabs and Berbers), East Africa (Nile Valley, Egypt and Ethiopia), the Middle East (Iran) and Mediterranean Europe (Sicily and Portugal, http://www.ahg.com/), suggesting that their spread might be related to the southern expansions of the Berber groups to whom the Fulani languages relate.
European Lineages: U5
Ten of the 372 individuals, all related to Fulbe groups, carried mtDNA variants typical of western Eurasia, particularly Europe. These ten mtDNAs belong to haplogroup U5 and carry two distinct HVS-I haplotypes, one of which is shared by nine Fulani. Both haplotypes are only one mutational step away from a common node widespread in Europe. Although U5 is one of the most frequent mtDNA haplogroups among western Eurasians (about 460 sequences in our mtDNA HVS-I database), no exact matches to the two Guinean haplotypes were found, whereas such matches would be expected under recent admixture. On the other hand, the Fulani U5 haplotype appears in a data set of West Africans (Wolof and Serer, Rando et al. 1998) and in Moroccans (unpublished data), pointing to a common African founder lineage of haplogroup U5. As with haplogroup U6, the linguistic correlation suggests that the spread of this haplotype in Senegambia might be related to the movement of Berber populations. More data from North and West African populations are needed to better characterize the source and timing of the spread of this founder lineage.
AMOVA and Principal Component Analysis
An analysis of molecular variance (AMOVA) of the African populations attributed 15.6% of the variation to differences between groups, 3% to differences between populations within groups and 81.6% to differences within populations (overall FST = 0.184, P < 0.0001). Hierarchical groupings of the populations based on religious beliefs (Muslims vs. Animists) and on geography (interior vs. littoral) gave similar values (data not shown).
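In a three-level AMOVA the reported percentages correspond to variance components among groups (σa²), among populations within groups (σb²) and within populations (σc²), and the standard fixation indices are FCT = σa²/σT², FSC = σb²/(σb² + σc²) and FST = (σa² + σb²)/σT² = 1 − σc²/σT². The sketch below applies these definitions to the rounded percentages quoted above. Because those percentages sum to 100.2 rather than 100, it gives FST ≈ 0.186 instead of the reported 0.184; the gap is presumably due to rounding, since 1 − 0.816 = 0.184.

```python
def fixation_indices(sigma_a: float, sigma_b: float, sigma_c: float) -> dict[str, float]:
    """Hierarchical fixation indices from AMOVA variance components.

    sigma_a: among groups; sigma_b: among populations within groups;
    sigma_c: within populations (any common scale, e.g. percentages).
    """
    total = sigma_a + sigma_b + sigma_c
    return {
        "FCT": sigma_a / total,
        "FSC": sigma_b / (sigma_b + sigma_c),
        "FST": (sigma_a + sigma_b) / total,
    }


# Rounded percentages reported in the text.
indices = fixation_indices(15.6, 3.0, 81.6)
for name, value in indices.items():
    print(f"{name} = {value:.3f}")
```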
A principal component (PC) analysis distinguished North Africans from sub-Saharans (Figure 3). The difference revealed by the first component is likely due to the presence of Eurasian mtDNA lineages among North Africans and to relatively higher frequencies of haplogroups L2a, L2c, L2d and U6 in Northwest Africa. The second component reflects L2/L0 frequencies. Moroccan Berbers and Arabs and Algerian Berbers plot close to Egyptians, supporting a common origin, whereas Algerian Arabs are placed apart. The Nile Valley sample occupies an intermediate position between Ethiopia and North Africa. The populations from Mozambique appear isolated and well differentiated from Kenya and Sudan. All West Africans form a distinct and more compact cluster, although the isolation of the Senegalese Mandenka (Sm) and of the Fula from Guiné (e) should be noted. As a whole, Guinean groups are closest to West Africa and next to East Africa (see Axis 1, Figure 3).
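The PCA on vectors of relative haplogroup frequencies described in the Methods can be sketched with NumPy: center the population-by-haplogroup matrix and take its singular value decomposition. The frequencies below are made-up toy values, not those of Table 2, and the published analysis may have differed in details such as scaling; the sign of each component is arbitrary.

```python
import numpy as np


def pca_scores(freqs, n_components: int = 2):
    """Project populations (rows) onto the leading principal components.

    freqs: populations x haplogroups matrix of relative frequencies.
    Returns (scores, fraction of variance explained by each returned component).
    """
    X = np.asarray(freqs, dtype=float)
    centered = X - X.mean(axis=0)  # center each haplogroup column
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = S ** 2 / np.sum(S ** 2)
    return scores, explained[:n_components]


# Toy frequencies for three hypothetical populations and four haplogroups.
toy = [
    [0.5, 0.3, 0.2, 0.0],
    [0.4, 0.4, 0.2, 0.0],
    [0.1, 0.2, 0.3, 0.4],
]
scores, explained = pca_scores(toy)
print(np.round(scores, 3))
print(np.round(explained, 3))
```

Working on the centered frequencies without standardizing keeps frequent haplogroups more influential, which matches the idea of comparing raw frequency vectors.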
Roughly 87% of the mtDNA lineages found in the Guinean populations are shared with other West African populations. Not surprisingly, the highest number of matches was with Cape Verde, followed by other populations from the area (Mandenka, Wolof, Fulbe), but also with Morocco. The notable sharing of L haplotypes with North Africans indicates the absence of a real barrier between this region and typical sub-Saharan populations. On the other hand, some Guinean groups (Fula and Balanta, for instance) carry haplotypes that have so far been observed elsewhere only in East African and Middle Eastern populations.
It is interesting to note that the Bantu-associated markers, namely the L0a 9-bp deletion in the COII/tRNALys intergenic region (Soodyall et al. 1996), the L3b motif 16124-16223-16278 (Watson et al. 1997), L3e1 and particularly L3e1a, characterized by mutation 16185 (Bandelt et al. 2001), and the 16192 L2a1 subclade (Pereira et al. 2001), were not found in our sample. This suggests either that Bantu migrations contributed very little to the gene pool of Guineans, despite evidence of a Bantu migration starting from Cameroon and spreading towards Ghana, Nigeria, Burkina Faso and Mauritania, or that those migrants had a gene pool distinct from that of the southward migrants. The absence of Bantu branches of the Niger-Congo linguistic family among the plethora of languages spoken in Guiné-Bissau agrees better with the first hypothesis.
The finding of haplogroup M1 lineages of East African origin, albeit at low frequencies (3-5%), in Guinean groups with linguistic affinities to the Bak superfamily, including the Balanta, Baiote and Ejamat languages, supports the earlier suggestion of a Sudanese origin of the Balanta population and their spread to western Africa with Kushitic migrants approximately 2000 years ago. Presumably, these migrants were subsequently assimilated into the local population and acquired its language. In particular, the 16185 mutation might suggest a route through North Africa. The presence of U6 in the Guinean pool, although at a low frequency, is not surprising, as these particular lineages have already been reported for this region. It seems plausible that the U5 lineages observed in the Fula arrived in Guiné via the Sahel from North Africa before the slave trade. None of the typical European haplogroups (H, J and T) were found in the present-day population of Guinea, whereas in North Africa these haplogroups occur at fairly high frequencies, in contrast to U5 (only 4.5%). This makes it less likely that the presence of U5 in Guiné in particular, and in Northwest Africa in general, is due to recent admixture with Europeans. A possible ancient migration from Asia to Africa was proposed by Cruciani et al. (2002) to explain some unusual Y-chromosome lineages identified in West Africa: haplogroup R1 (defined by the M173 mutation), without the further branch-defining mutations specific to Europeans (M269 and M17), accounted for ∼40% of the Y chromosomes in North Cameroon, while not having been sampled elsewhere in Africa. Our U5 sequences from the Guinean Fulbe corroborate Cruciani's hypothesis of a prehistoric migration from Eurasia to western sub-Saharan Africa, as testified by their restricted and localised present-day distribution. More data from Central and Western Africa are needed to shed light on the origin of such idiosyncratic mtDNA and Y-chromosome lineages.
The authors are grateful for the invaluable help of the Chairman of the Joint Chiefs of Staff and the Ministry of Health of Guiné-Bissau. ICCTI (Lisbon, Portugal) and the Regional Government of Madeira provided financial support to AB. AR is a recipient of a Ph.D. scholarship from Fundação para a Ciência e Tecnologia (FCT, Lisbon), reference SFRH/BD/12173/2003. TK was supported by the Estonian basic research grant 4769. We are also grateful for the contributions of two anonymous reviewers on an earlier version of the manuscript.
Electronically Available Data
HVS-I and HVS-II haplotypes and their distribution among ethnic groups from Guiné-Bissau are available as Complementary Material at the web site http://www.ahg.com/. A list of the PCR primers and conditions used to amplify all pertinent mtDNA regions is also included in the Complementary Material on the web site.