MtDNA Profile of West Africa Guineans: Towards a Better Understanding of the Senegambia Region

Authors

  • Alexandra Rosa,

    1. Department of Evolutionary Biology, Estonian Biocenter, Tartu University, Riia 23, 51010 Tartu, Estonia
    2. Human Genetics Laboratory, Center of Macaronesian Studies, University of Madeira, Campus of Penteada, 9000-390 Funchal, Portugal
  • António Brehm,

    Corresponding author
    1. Human Genetics Laboratory, Center of Macaronesian Studies, University of Madeira, Campus of Penteada, 9000-390 Funchal, Portugal
  • Toomas Kivisild,

    1. Department of Evolutionary Biology, Estonian Biocenter, Tartu University, Riia 23, 51010 Tartu, Estonia
  • Ene Metspalu,

    1. Department of Evolutionary Biology, Estonian Biocenter, Tartu University, Riia 23, 51010 Tartu, Estonia
  • Richard Villems

    1. Department of Evolutionary Biology, Estonian Biocenter, Tartu University, Riia 23, 51010 Tartu, Estonia

*Corresponding author: António Brehm, phone +351291705383, fax +351291705393, e-mail: brehm@uma.pt

Summary

The matrilineal genetic composition of 372 samples from the Republic of Guiné-Bissau (West African coast) was studied using RFLPs and partial sequencing of the mtDNA control and coding region. The majority of the mtDNA lineages of Guineans (94%) belong to West African specific sub-clusters of L0-L3 haplogroups. A new L3 sub-cluster (L3h) that is found in both eastern and western Africa is present at moderately low frequencies in Guinean populations. A non-random distribution of haplogroups U5 in the Fula group, the U6 among the “Brame” linguistic family and M1 in the Balanta-Djola group, suggests a correlation between the genetic and linguistic affiliation of Guinean populations. The presence of M1 in Balanta populations supports the earlier suggestion of their Sudanese origin. Haplogroups U5 and U6, on the other hand, were found to be restricted to populations that are thought to represent the descendants of a southern expansion of Berbers. Particular haplotypes, found almost exclusively in East-African populations, were found in some ethnic groups with an oral tradition claiming Sudanese origin.

Introduction

Unveiling the history of human settlement on the West Coast of Africa is a complex task: the region's settlement is the result of a continuous, complex network of migrations, invasions and admixture of peoples of different origins. Fossil evidence suggests a modern human presence in NW Africa around 40000 years before present (YBP) (Alimen, 1987). A pre-Neolithic Capsian culture evolved later locally or through a diffusion from the Near East (Camps-Faber, 1989). Around 9000 YBP, when the Sahara went through a period of maximum humidity (Aumassip et al. 1988), several Neolithic cultures flourished in the area, bringing together people of sub-Saharan and North African origin (Dutour et al. 1988). The domestication and spread of several African-specific plants probably started in western Sahel after 4000 YBP. The first phase of largely east and southward oriented Bantu migrations, originating from the central Gulf of Guinea region, is a likely outcome of these cultural developments (Fage, 1995).

The Ghana Empire, between Niger and Senegal, is the oldest known West African kingdom (Fage, 1995); it was followed in the 14th–16th centuries by other empires (Mali, Songhai). The admixture of Berbers with native populations of this area dates back at least to the 9th century A.D., after the arrival of pastoral Peuls or Fulbe (here designated as Fula). In 1086 Ommíades conquered North-Western Africa and pushed the populations from South Morocco and Mauritania to the Senegal region (Moreira, 1964). When the Europeans arrived in Senegambia in the 15th century they met most of the presently known ethnic groups settled in the region (Teixeira da Mota, 1954). The Fula arrived again two centuries later, coming from the Futa Toro and Sahel regions, dominating the whole area. The Mandinga (Mandenka) were the last to arrive in this region (Carreira & Quintino, 1964).

Present-day Guinean ethnic groups are spread all over the territory. The Balanta are the largest group, and in the first quarter of the 20th century spread over territories occupied earlier by other ethnic groups. The origin of the Balanta is uncertain. Some see language affinities with the Sudanese, from whom they could have separated 2000 years ago with the first spread of Kushite migrations (Quintino, 1964). According to Stuhlmann (1910), the group derives from a Bantu branch, which separated in the Pleistocene near the Nile, following Hamite invasions. The Bijagós inhabit the archipelago of the same name and some scholars see strong cultural resemblances to Egyptians (Quintino, 1964), but others relate them to the Senegalese Djola. The latter are a rather heterogeneous group, and include the Beafada, who have an oral tradition of coming from Mali (Lopes, 1999). A mass arrival of Fula took place in the beginning of the 19th century. The origin of this ethnic group is unknown, but tradition relates them to the Hyksos and Nubians. They show the typical phonetic “glottal catch” which characterizes the whole group.

Here we analyze the mtDNA lineages present in the major ethnic groups of Senegambia, covering a broad number of recognized groups underrepresented in previous studies (Graven et al. 1995; Watson et al. 1997; Rando et al. 1998), and compare them within the broader context of African mtDNA variability (Graven et al. 1995; Watson et al. 1997; Rando et al. 1998, 1999; Krings et al. 1999; Chen et al. 2000; Pereira et al. 2001; Brehm et al. 2002; Salas et al. 2002). Because mtDNA haplogroups show distinct geographic patterns in Africa, their frequency and diversity patterns in West Africa can be informative with respect to the origin of the different ethnic groups from Guiné-Bissau. The presence of Y-chromosomes of Eurasian affiliation among populations from Cameroon at a high frequency, as reported recently (Cruciani et al. 2002), raises the intriguing question of back migrations from Eurasia to Africa, here supported by the presence of particular Eurasian mtDNA lineages among Guineans.

Material and Methods

Sampling

A total of 372 blood samples were collected from unrelated Guinean males whose maternal ancestors were known to belong exclusively to a specific ethnic group. The samples were collected either in military camps with the permission of the Guiné-Bissau Chairman of the Joint Chiefs of Staff, or in the villages around the country with the help of the Ministry of Health. Every participant gave his consent in an individual interview after a detailed explanation of the project. Sample sizes and origins (along with additional information) are specified in Table 1 and 2. Due to the complex history involving the major ethnic groups in Guiné-Bissau, they do not all follow a clear present-day settlement pattern (see Figure 1).

Table 1. Population data and ethnic distribution of the Guinean samples

Code | Ethnic group | Language group WA | Religion | Closest language group | Synonyms
BLE | Balanta (a) | Bak-Balanta-Ganja | A,M,C | Tenda | Ballante, Balant
BDA | Baiote | Bak-Diola-Bayot | A | Diola | Bayotte
BAB | Banhu | Eastern Senegal-Banyun | A,M | Tenda | Bainouk, Banyuk, Elomay
BIF | Beafada | Eastern Senegal-Tenda | M | Badyara | Biafada, Bidyola, Biafar
BJG | Bijagó | Bijagó | A | | Bidyogo, Bijougot
BRA | Brame | | | |
CCJ | Cassanga | Eastern Senegal-Nun | A | Banhu-Felupe | Kasanga, I-Hadja
EJA | Djola (b) | Bak-Diola-HerEjamat | A,C | Diola-Wolof | Fulup, Floup, Ejamat, Ediamat
FUL | Fula | Fulani-West Central | M | Fula-Wolof | Fulbe, Futa Jallon
FUF | Futa-Fula | Fulani-West Central | M | Fula-Wolof | Fulbe, Futa Jallon
FUC | Fula-Preto | Fulani-Western | M | Fula-Wolof | Peul, Peulh
FUC | Fula-Forro | Fulani-Western | M | Fula-Wolof | Peul, Peulh
FUT | Fula-Toranca | Fulani | M | Fula-Wolof | Peul, Peulh
JAD | Jancanca | Mandenkan | M | Mandinka | Jahanque, Jahanka, Diakanke
LAN | Landoma | | | |
MAN | Mancanha | Bak-Manjaku-Papel | A,C | Manjaku-Papel | Mankanya, Mankanha
MNK | Mandinga | Mandekan | M | Kalenke, Jahanka | Mandingue, Mandenka
MFV | Manjaco | Bak, Manjaku-Papel | A,C,M | Mancanha, Papel | Mandyak, Manjiak
MSW | Mansonca | Sua | M | | Kunante, Mansoanka
NAJ | Nalú | Mbulungish-Nalu | A,M | Susu | Nalou
SUD | Sussu | Susu-Yalunka | M,A,C | Yalunka | Susu, Sose, Soso
PBO | Papel | Bak, Manjaku-Papel | A,C | Mankanya, Mandyak | Pepel, Oium

A, Animist; M, Muslim; C, Christian. Population codes and language groups follow the terminology of http://www.sil.org/ethnologue/.
(a) Includes the so-called Balanta-Mané (Balanta islamized by Mandinga). (b) Includes Felupes.

Table 2. Haplogroup relative frequencies and diversity index (H) in Guiné-Bissau ethnic groups and several other African populations. Superscripts (a-g) in Guinean ethnic groups refer to codes used in Figure 3 (PCA).

Columns, left to right: Haplogroup | Mozambique (Mo) 1,2 | Ethiopia (Et) 3 | Kenya/Sudan 4,i | Egypt (Eg) 5 | Nile Valley (Nv) 6 | !Kung/Khwe 7,8 | Cabo Verde (CV) 9 | Senegal Mand (Sm) 10 | Sen. mixed (Sx) 11,ii | Niger/Nigeria (NN) 4,8,iii | Guiné-Bissau: BDA/EJA/BAB/CCJ/BIF (a) | BJG (b) | BLE (c) | PBO/MFV/MAN (d) | FUC/FUF/FUL/FUT (e) | MNK/MSW/LAN/JAD/SUS (f) | NAJ (g) | Guiné Total | Morocco Arabs (Ma) 11,12 | Mor. Berbers (Mb) 11,12 | Algeria Berbers (Ab) 12,13 | Alg. Arabs (Aa) 11,12

Sources: 1, Pereira et al. (2001); 2, Salas et al. (2002); 3, unpublished; 4, Watson et al. (1997); 5, unpublished; 6, Krings et al. (1999); 7, Chen et al. (2000); 8, Vigilant et al. (1991); 9, Brehm et al. (2002); 10, Graven et al. (1995); 11, Rando et al. (1998); 12, 13, unpublished. Pooled samples: i, Turkana, Kikuyu and Somalia; ii, Senegalese, Wolof and Serer; iii, Songhai, Tuareg, Yoruba, Hausa, Fulbe and Kanuri. The so-called Eurasian haplogroup refers to all non-Ls, M1 or U6 sequences. N, total sample number; H, Nei's (1987) diversity index; sd, H standard deviation.

L0a2571338412014911415450012
L1b12014182017156181361794115412
L1c5001007233207465851100
L1*641614690001000000000000
L2a331316515320122020121315182214151755314
L2b12020443161105613071981102
L2c10000016397418131117162315161104
L2*1000201001003132421105
L3b3001171081113141358105484229
L3d630100797122913125121592015
L3e150331121547151451143712752414
L3f2583401036650313022000
L3h0100001001852415040000
L3*1153201103030000000001104
M101757800000453000014340
U603221030230005330277242
Eurasian028670410314400011200361726037
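H | .795 | .844 | .828 | .509 | .781 | .518 | .884 | .778 | .881 | .876 | .905 | .926 | .909 | .899 | .876 | .901 | .902 | .901 | .610 | .476 | .573 | .822
sd | .011 | .011 | .022 | .043 | .020 | .058 | .007 | .027 | .012 | .011 | .016 | .028 | .011 | .014 | .016 | .019 | .027 | .005 | .029 | .037 | .037 | .037
N | 416 | 270 | 88 | 192 | 255 | 93 | 292 | 110 | 121 | 137 | 50 | 22 | 62 | 77 | 77 | 58 | 26 | 372 | 350 | 268 | 149 | 55
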
Figure 1.

Geographic distribution of ethnic groups in Guiné-Bissau. The boundaries may not correspond entirely to the precise distribution of the groups involved since overlapping areas do exist.

Populations of low sample size were pooled according to their linguistic affinities. The linguistic clustering presented in Table 1 is based on anthropological or linguistic classifications following Almeida (1939), Barros (1947), Carreira (1962, 1983), Almada (1964), Carreira & Quintino (1964), Hair (1967), Quintino (1967, 1969), Diallo (1972) and Lopes (1999). Some groups were left unpooled: the Balanta, for whom a Sudanese origin has been suggested, and the Bijagós because of their particular geographical location.

HVS-I and HVS-II Sequencing

The leukocyte fraction of whole blood was used for DNA extraction by standard methods and the mtDNA hypervariable segment I (HVS-I) of the control region was amplified and sequenced. Sequencing products were separated on a MegaBACE 1000 automatic sequencer, following the manufacturer's specifications and aligned using Wisconsin Package GCG Version 10.0. All sequences were read between nucleotide positions (nps) 16024 and 16400. Additional information regarding polymorphic sites 185, 186, 189, 195, 236, 297 and 322 in HVS-II was obtained by directly sequencing all samples that could not be unambiguously classified on the basis of HVS-I information alone.

RFLP Testing

In case of ambiguity in defining mtDNA haplogroups on the basis of the HVS-I haplotype, additional data was gathered from restriction fragment length polymorphisms (RFLPs) of diagnostic sites. All restriction digests were made according to the manufacturer's instructions (Fermentas and New England BioLabs). The following polymorphic restriction sites were screened: 322HaeIII, 1715DdeI, 2349MboI, 2758RsaI, 3592HpaI, 3693MboI, 4157AluI, 4685AluI, 5584AluI, 5656NheI, 7055AluI, 8616MboI, 10084TaqI, 10321AluI, 10394DdeI, 10397AluI, 10806HinfI, 11439MboI, 11641HaeIII, 12308HinfI, 13803HaeIII, 13957HaeIII, 14766MseI and 14868MboI. The following coding region sites were ascertained by sequencing: 2758, 4218, 12618, 13105 and 14182. Primers and PCR conditions used in all analyses are available as Complementary Material at http://www.ahg.com.

Haplogroup characterization

The HVS-I sequence types were classified following the nomenclature of African and European mtDNA haplogroups (Quintana-Murci et al. 1999; Macaulay et al. 1999; Rando et al. 1999; Alves-Silva et al. 2000; Chen et al. 2000; Richards et al. 2000; Richards & Macaulay 2001; Bandelt et al. 2001; Torroni et al. 1997, 2001; Mishmar et al. 2003; Salas et al. 2002). Here, and in what follows, the nucleotide position (np) number relative to the revised CRS (Anderson et al. 1981; Andrews et al. 1999) is used to designate haplotype-defining mutations. Character state change is specified only for transversions and insertions/deletions. Based on previous knowledge of African complete sequences, the paraphyletic clade L1 is split into two monophyletic units: L0, capturing the previously defined L1a and L1d lineages, and an L1 clade that includes the L1b and L1c clades (Mishmar et al. 2003). The sub-clades of L0a (pro L1a) and L1b are defined as in Salas et al. (2002).

Haplogroup L2 is divided into L2a (characterized by 16294 and 13803), L2b (16114A, 16129, 16213 and 4158), L2c (322 and 13958), and L2d (16399 and 3693) sub-clades. Mutations 16278, 16362 and 10086 characterize haplogroup L3b; haplogroup L3d is defined by 8618 and shares with L3b a transition at np 13105. According to Bandelt et al. (2001), L3e (defined by 2352) is subdivided into L3e1 (16327), L3e2 (16320), L3e3 (16265T) and L3e4 (16264 and 5584) clades. L3e2 is further subdivided into L3e2* (14869) and L3e2b (16172 and 16189). As in Salas et al. (2002), L3f captures all L3* lineages with a mutation at 16209. L3f1 is further defined by a T at np 16292 (and 14766). Here we further define a new sub-cluster, L3h, characterized by a loss of the DdeI site at np 1715 (mutation at np 1719) and the HVS-I motif 16129, 16256A and 16362. Following Finnilä et al. (2000), U5b is characterized by 5656 and 12618 over 14182. Haplogroup U6 (Rando et al. 1998) is defined by 16172 and 16219. Haplogroup M1 is characterized by 16129, 16189, 16249 and 10400 mutations (Quintana-Murci et al. 1999).
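The motif definitions above can be read as a lookup problem. The following is a minimal sketch, not the authors' procedure: it assumes that a haplotype is a set of variant labels and that a haplogroup is assigned whenever every mutation of its motif is present. Only a few motifs from the paragraph are transcribed, and the names `MOTIFS` and `candidate_haplogroups` are hypothetical.

```python
# Diagnostic motifs transcribed from the paragraph above (a small subset).
# A bare position denotes a transition; a trailing base denotes a
# transversion (e.g. "16256A"). Treating a motif as a plain subset test is
# an assumption of this sketch; real classification also uses the phylogeny.
MOTIFS = {
    "L2a": {"16294", "13803"},
    "L3b": {"16278", "16362", "10086"},
    "L3h": {"1719", "16129", "16256A", "16362"},
    "U6": {"16172", "16219"},
    "M1": {"16129", "16189", "16249", "10400"},
}


def candidate_haplogroups(variants):
    """Return, sorted by name, every haplogroup whose full motif is present."""
    observed = set(variants)
    return sorted(name for name, motif in MOTIFS.items() if motif <= observed)


# Hypothetical sample: the U6 motif plus one private mutation.
print(candidate_haplogroups(["16172", "16219", "16311"]))
```

A sample that matches no motif returns an empty list, which corresponds to the cases that the text resolves with additional HVS-II or RFLP data.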

Genetic Analysis and Population Comparisons

Median networks of HVS-I haplotypes (Bandelt et al. 1995, 2000) were drawn for each haplogroup separately, using the Network 3.1 program (Arne Röhl, http://www.fluxus-engineering.com/sharenet.htm). Haplogroup frequencies, molecular diversity indexes (FST) and genetic diversity (H - Nei, 1987) for haplotypes and haplogroups and analysis of molecular variance (AMOVA) were calculated using Arlequin v2.0 (Schneider et al. 2000). Comparisons between populations were assessed by subjecting the (relative) frequency vectors of the haplogroups to a principal component analysis (PCA).
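As an illustration of the diversity index mentioned above, the sketch below computes Nei's (1987) unbiased gene diversity, H = n/(n-1) × (1 − Σ p_i²), from haplotype or haplogroup counts. It is a stand-alone approximation of what a package such as Arlequin reports, not a reproduction of it; the function name `nei_diversity` and the example counts are hypothetical.

```python
def nei_diversity(counts):
    """Nei's unbiased diversity H from a list of per-type sample counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two sampled individuals are required")
    # Sum of squared relative frequencies of each haplotype or haplogroup.
    sum_sq = sum((c / n) ** 2 for c in counts)
    # The n / (n - 1) factor corrects for finite sample size.
    return n / (n - 1) * (1.0 - sum_sq)


# Hypothetical population of 10 individuals carrying 4 haplogroups.
print(round(nei_diversity([4, 3, 2, 1]), 3))
```

H approaches 1 when every sampled individual carries a different type and equals 0 when all individuals share one type.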

A local database with more than 19000 individuals taken from literature and our unpublished data from worldwide populations was employed to search for exact matches of Guiné-Bissau haplotypes, ignoring length variation in the C stretch of the HVS-I.
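The exact-match search described above can be sketched as a set comparison after discarding C-stretch length variants. Which labels count as length variation is an assumption here (16182C, 16183C and insertions coded like 16193.1C); the text does not specify the encoding, and the database layout and function names are hypothetical.

```python
import re

# Assumption: C-stretch length variation is coded as 16182C, 16183C or as
# insertions such as 16193.1C. Other data sets may code it differently.
_LENGTH_VARIANT = re.compile(r"^(16182C|16183C|1619[0-9]\.\d+[ACGT])$")


def normalise(variants):
    """Drop length variants and return an order-independent haplotype key."""
    return frozenset(v for v in variants if not _LENGTH_VARIANT.match(v))


def exact_matches(query, database):
    """Return ids of database haplotypes identical to `query` after normalising."""
    key = normalise(query)
    return [sid for sid, variants in database.items() if normalise(variants) == key]


# Hypothetical three-entry database keyed by sample id.
database = {
    "s1": ["16189", "16223", "16183C"],
    "s2": ["16189", "16223"],
    "s3": ["16223"],
}
print(exact_matches(["16189", "16223", "16193.1C"], database))
```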

Coalescence times were estimated by means of the ρ statistic, assuming that a transition within 16090-16365 corresponds to 20180 years (Forster et al. 1996).
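For illustration, ρ is the average number of mutational steps separating the sampled sequences from the assumed founder haplotype, and the age estimate is ρ multiplied by the calibration quoted above. The sketch below assumes that the caller has already counted only transitions within 16090-16365 and that a single founder is specified; the function names and the example data are hypothetical, and the standard deviation is not computed.

```python
YEARS_PER_TRANSITION = 20180  # calibration quoted in the text (Forster et al. 1996)


def rho(steps_from_founder, counts):
    """Mean number of transitions between sampled sequences and the founder.

    steps_from_founder[i] is the distance of haplotype i from the founder,
    counts[i] is the number of individuals carrying haplotype i.
    """
    n = sum(counts)
    if n == 0:
        raise ValueError("no individuals sampled")
    return sum(s * c for s, c in zip(steps_from_founder, counts)) / n


def coalescence_age(steps_from_founder, counts):
    """Age estimate in years under the single-founder assumption."""
    return rho(steps_from_founder, counts) * YEARS_PER_TRANSITION


# Hypothetical clade: 5 founder sequences, 3 one step away, 2 two steps away.
print(round(coalescence_age([0, 1, 2], [5, 3, 2])))
```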

Results and Discussion

Haplogroup Profiles

The 372 Guinean samples clustered to 192 different haplotypes of all major West African mtDNA haplogroups (for the complete list see Complementary Material). Three predominant haplotypes (GB4, GB85 and GB117) captured 13% of the Guinean mtDNA variation, occurring at a frequency higher than 3% each. Most sequences (94%) could be classified as belonging to sub-Saharan African L0a1, L1b, L1c1, L2a, L2b, L2c, L2d1, L3b, L3d, L3e, L3f1 and L3h haplogroups and subhaplogroups. Unexpectedly for a West African population, 22 (5.9%) of the samples clustered to haplogroups M1 (1.1%), U5 (2.7%) and U6 (2.2%, Table 2; Graven et al. 1995; Watson et al. 1997; Rando et al. 1998; Salas et al. 2002). M1 and U6 are found in North and East Africa, Arabia, and the Middle East, whereas U5 has been sampled at appreciable frequencies only in Europe (Passarino et al. 1998; Quintana-Murci et al. 1999; Richards et al. 2000). The haplogroup profile for each ethnic group separately can be found in the Complementary Material.

L Lineages

Haplogroup L0 was represented in Guineans only by its daughter group L0a1 showing marginal frequencies ranging from 1% to 5% (Table 2), in contrast to its frequency in East African populations (e.g. 25% in Mozambique: Watson et al. 1997; Pereira et al. 2001; Salas et al. 2002). Interestingly, only the Balanta, a group claiming Sudanese origin, showed an increased frequency of this clade (11%). Haplogroup L0a has a Paleolithic time depth in East African populations (33,000 years old, Salas et al. 2002). The relatively young coalescent date of L0a1 in Guineans (6400±2600 years, assuming a single founder) suggests that only a small subset of L0a reached Guinea during the Holocene. The founder haplotype of L0a in Guineans, GB4 (see Table 4 in Complementary Material), has an exact match in East Africa, the Middle East and in Cape Verde and Senegal Mandenka populations, indicating that its spread is not strictly restricted to Guineans. The lack of the L0a2 clade, associated with the 9bp deletion in CoII/tRNALys intergenic region, and widespread in Bantu speaking populations all over Africa (Soodyall et al. 1996), suggests that L0a has at least two distinct phylogeographic patterns in Central and West Africa. We cannot discard the possibility of a Bantu migration to West Africa, as the founder group could have a distinct composition from those who participated in the southwards migration(s).

Haplogroup L1b is restricted mostly to West African populations (Graven et al. 1995; Watson et al. 1997; Salas et al. 2002) and is represented by two different branches in Guineans. Its major cluster (Figure 2) L1b1 is associated with a transition at np 16293 and includes a frequent sub-clade defined by the combined presence of a transversion to A at np 16114 and a transition at np 16274 that has also been observed in Senegalese Mandenka (Graven et al. 1995) and Wolof (Rando et al. 1998). L1b1 presents a TMRCA of about 36000 years (Figure 2), predating the diversity of L0a1 in Guineans. The matches in this cluster have a West African distribution well represented in Mandenka (haplotypes GB8 and GB20) and their frequency is highest in the Fulani-western and Senegal-eastern language groups (Table 2). GB23 and GB24 are widespread in Africa and are found in nearly all West African populations considered here (Salas et al. 2002). Another West African specific clade, L1c, is present at a relatively low frequency (0-8%) yet with high haplotype diversity in the Guiné-Bissau sample.

Figure 2.

MtDNA phylogeny of all Guinean haplogroups and skeletons of various L0, L1, L2 and L3 sub-haplogroups based on HVS-I sequences and coding-region RFLPs. The number of individuals assigned to the haplotypes is shown within the circles. The numbers over the lines represent the HVS-I (-16000 bp) and coding region mutations, with respective restriction sites. Transversions are represented with suffixes (length variation in the C-stretch is disregarded). Recurrent mutations are underlined and a refers to the mutation loss relative to root. The star indicates the putative root of the haplogroup. Coalescence estimates ± sd (in ya) are shown for haplogroups or sub-haplogroups.

Haplogroups L2a-L2c are frequent in Senegambia (Table 2) and reveal signatures of a recent expansion from a limited number of founder haplotypes that are shared between populations of different linguistic affiliation. In contrast, haplotypes belonging to haplogroup L2d are represented by single individuals and do not show a common founder sequence (Figure 2). Fifteen of the 42 L2a haplotypes sampled in Guiné-Bissau had matches elsewhere: in West Africa (Cabo Verde, Brehm et al. 2002; Wolof and Senegalese, Rando et al. 1998; Mandenka, Graven et al. 1995), but also in East, South and North Africa. The geographic distribution of L2b and L2c haplotypes is largely restricted to West Africa. Not surprisingly, most of the haplotype matches are with Cabo Verdeans, Wolof and Senegalese. L2c is the haplogroup with the greatest extent of shared lineages: with Cape Verde, Senegal Mandenka, mixed Senegalese and São Tomé. The last case is likely due to a recent gene flow from the Cape Verde Islands (Brehm et al. 2002). However, several L2 haplotypes observed in Guineans had no matches in other West African populations but matched East and North African sequences. This was the case for the Balanta (BLE) haplotype GB44, matching only with Sudanese (Watson et al. 1999), and GB59, matching with Moroccan sequences. Interestingly, haplotype GB83 (L2b), found in the Mansonca (MSW) group, had an exact match only with Ethiopians (our unpublished data). Also, the Fula haplotype GB39 has not been reported in West Africa but appears in East Africa: Lake Turkana (Watson et al. 1997), Nubia, Southern Sudan, Ethiopia and Saudi Arabia (our unpublished data).

Haplogroups L3b, L3d, and L3e are rare or absent in indigenous populations of North and South Africa but well represented in our sample. GB127 and GB134 are particular links of Guinean groups to Northwest African Mozabites, Moroccans and Senegalese. Particularly, GB136 from Fula-related people has been found so far in Hausa and again in Nubians and Sudanese. Apart from Mozambique (6%) the majority of L3d lineages are West African (7% in mixed Senegalese to 12% in Niger/Nigeria) with an estimated age of 42100 (±10600, Salas et al. 2002). L3f is more frequent in Southeast Africa, ranging from 8% in Kenya/Sudan to 2% in Mozambique. The coalescence time of this haplogroup in West Africa was calculated as 39400 ya (±10400, Salas et al. 2002), within the error range of the estimate based on Guinean samples (49350±16200 ya). Haplotype GB178 in Fula shared an exact match with sequences from a wide range of East-African populations (Somalia, Egypt) and even Saudi Arabia. Haplogroup L3h is found in Ethiopia, Cape Verde and Niger/Nigeria at marginal frequencies (∼1%) but reaches its highest known frequency in the Ejamat from Guinea (8%). Its coalescent time estimate (14000±8400 ya) in Guineans is consistent with its late Pleistocene/early Holocene spread around Africa.

No significant differences between Guinean ethnic groups pooled by their linguistic affiliation were observed in haplogroup frequencies. As for their geographic neighbours (Table 2), haplogroups L1b, L1c, L2b, L2c, L2d, L3b, L3d, and L3e cover most of the mtDNA variation (64-85%). The Guiné-Bissau sample shows an overall genetic diversity of 0.901 (sd.005) that is significantly higher than among other samples from West Africa (Table 2).

M1 and U6 Lineages

Haplogroup M1 has been characterized as an East African remnant of the major Asian haplogroup M (Quintana-Murci et al. 1999). It has been found mostly in Ethiopian populations (17%), its characteristic HVS-I motif being also well represented in Egyptian and Sudanese populations along the Nile Valley (7-8%, Krings et al. 1999). HVS-I haplotypes matching the East African M1 clade have also been identified in Northwest Africans (Plaza et al. 2003, unpublished data) where their frequency can reach 12.8% in Algerians and 4% among Moroccan and Algerian Arabs and Berbers. M1 is generally absent from autochthonous West African populations but was found among Balanta, Baiote, and Djola groups speaking Niger Congo Atlantic Bak languages. The Guinean M1 haplotypes matched exactly one West Saharan (Rando et al. 1998), 2 Mozabites (Côrte-Real et al. 1996), 2 Iranian and one Saudi Arabian sequence (unpublished data). This lineage derives from a particular cluster defined by a mutation at position 16185, which is also found in Ethiopia, Morocco and North African populations (Plaza et al. 2003, our unpublished results).

Haplogroup U6 is rather frequent in NW Africa, among Algerian Berbers, Moroccans and Mauritanians (Côrte-Real et al. 1996; Rando et al. 1998; Plaza et al. 2003), but is rare or absent in western sub-Saharan Africans. Three different U6 haplotypes were observed in Fula, Mandenka and Manjaco groups. These haplotypes match with sequences of a wide geographic range: North and West Africa (Cabo Verde, Tuareg, Mozabites, Moroccan Arabs and Berbers), East Africa (Nile Valley, Egypt and Ethiopia), the Middle East (Iran) and Mediterranean Europe (Sicily and Portugal, http://www.ahg.com/), suggesting that their spread might be related to the southern expansions of the Berber groups to whom the Fulani languages relate.

European Lineages: U5

Ten individuals out of 372 samples, all related to Fulbe groups, carried mtDNA variants typical of western Eurasia, particularly Europe. Within these mtDNAs belonging to haplogroup U5 nine Fulanis share one particular HVS-I haplotype. Both haplotypes are only one mutational step away from a common node widespread in Europe. Although U5 is one of the most frequent mtDNA variants among western Eurasians (about 460 sequences in our mtDNA HVS-I database) no exact matches to the two Guinean haplotypes were found, as would be expected in the case of recent admixture. On the other hand, the Fulani U5 haplotype appears in a data set of West Africans (Wolof and Serer, Rando et al. 1998) and in Moroccans (unpublished data), pointing to the existence of a common African founder lineage of haplogroup U5. Again, as in haplogroup U6 the linguistic correlation suggests that the spread of the haplotype in Senegambia might be related to the movement of Berber populations. More data from North and West African populations is needed to better characterize the source and the time of the spread of this founder lineage.

AMOVA and Principal Component Analysis

Analysis of molecular variance (AMOVA) in African populations attributed 15.6% of the variance to differences between groups, 3% to variation between populations within groups, and 81.6% to differences within populations (overall FST= 0.184, P < 0.0001). A hierarchical structuring of populations into groups based on religious beliefs (Muslims vs. Animists) and geography (interior vs. littoral) gave similar values (data not shown).
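The relation between the variance shares and the fixation index can be made explicit: FST is the fraction of total variance not found within populations, so the reported within-population share of 81.6% corresponds to 1 − 0.816 = 0.184. The sketch below uses the standard hierarchical definitions of the fixation indices; the function name and the example variance components are hypothetical, and it does not perform the permutation test behind the P value.

```python
def fixation_indices(var_among_groups, var_among_pops, var_within):
    """Return (F_CT, F_SC, F_ST) from hierarchical AMOVA variance components."""
    total = var_among_groups + var_among_pops + var_within
    if total <= 0:
        raise ValueError("total variance must be positive")
    f_ct = var_among_groups / total                      # among groups
    f_sc = var_among_pops / (var_among_pops + var_within)  # among populations within groups
    f_st = (var_among_groups + var_among_pops) / total   # among populations overall
    return f_ct, f_sc, f_st


# Hypothetical variance components (arbitrary units).
f_ct, f_sc, f_st = fixation_indices(2.0, 1.0, 7.0)
print(round(f_ct, 3), round(f_sc, 3), round(f_st, 3))
```

The three indices satisfy (1 − F_ST) = (1 − F_SC)(1 − F_CT), which is a quick consistency check on any reported AMOVA table.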

A principal component (PC) analysis distinguished North Africans from sub-Saharans (Figure 3). The difference revealed by the first component is likely due to the presence of Eurasian mtDNA lineages among the North Africans and a relatively higher frequency of haplogroups L2a, L2c, L2d and U6 in Northwest Africa. The second component reflects L2/L0 frequencies. Moroccan Berbers and Arabs and Algerian Berbers are plotted close to Egyptians, supporting a common origin, while Algerian Arabs are placed apart. The Nile Valley sample occupies an intermediate position between Ethiopia and North Africans. The populations from Mozambique appear isolated and well differentiated from Kenya and Sudan. All the West Africans form a distinct and more compact cluster. Nevertheless the isolation of Senegalese Mandenka (Sm) and the Fula from Guiné (e) should be noted. As a whole, Guinean groups are closer to West and then East Africa (see Axis 1, Figure 3).

Figure 3.

PCA of African populations based on data from Table 2 but excluding the !Kung/Khwe. Population codes are as follows: Mb (Morocco Berbers), Ab (Algerian Berbers), Ma (Morocco Arabs), Aa (Algerian Arabs), Eg (Egypt), Nv (Nile Valley), Et (Ethiopia), KS (Kenya/Sudan), Mo (Mozambique), Sm (Senegal Mandenka), Sx (Senegal mixed), NN (Niger/Nigeria), CV (Cabo Verde). Guinean ethnic groups were grouped (from a to g) as in Table 2. Axis 1 extracted 70.2% and axis 2, 12.5% of the total variation.
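A PCA of haplogroup frequency vectors, as used for Figure 3, can be sketched in a few lines. This is an illustrative pure-Python version that extracts only the first axis by power iteration on the covariance matrix; the software the authors used for the PCA is not stated, the frequency vectors below are invented toy data rather than values from Table 2, and all function names are hypothetical.

```python
def centre(rows):
    """Subtract the column mean from every column."""
    p = len(rows[0])
    means = [sum(r[j] for r in rows) / len(rows) for j in range(p)]
    return [[r[j] - means[j] for j in range(p)] for r in rows]


def covariance(x):
    """Sample covariance matrix of already centred data."""
    n, p = len(x), len(x[0])
    return [[sum(x[k][i] * x[k][j] for k in range(n)) / (n - 1)
             for j in range(p)] for i in range(p)]


def first_axis(cov, iterations=500):
    """Leading eigenvector and eigenvalue of `cov` by power iteration."""
    p = len(cov)
    v = [1.0 / (i + 1) for i in range(p)]  # arbitrary starting vector
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = sum(a * a for a in w) ** 0.5
        v = [a / norm for a in w]
    value = sum(v[i] * sum(cov[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, value


# Toy frequency vectors for four hypothetical populations, three haplogroups.
freqs = [[0.60, 0.30, 0.10], [0.55, 0.35, 0.10],
         [0.10, 0.20, 0.70], [0.15, 0.15, 0.70]]
x = centre(freqs)
cov = covariance(x)
axis, eigenvalue = first_axis(cov)
scores = [sum(a * b for a, b in zip(row, axis)) for row in x]
explained = eigenvalue / sum(cov[i][i] for i in range(len(cov)))
print([round(s, 3) for s in scores], round(explained, 3))
```

The share of variation extracted by an axis is its eigenvalue divided by the trace of the covariance matrix, which is how figures such as the 70.2% quoted for axis 1 are obtained.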

Final Remarks

Roughly 87% of the mtDNA lineages found in the Guinean populations are common in other West African populations. Not surprisingly, the highest number of matches was with Cape Verde followed by other populations from the area (Mandenka, Wolof, Fulbe), but also with Morocco. The notable L haplotype sharing with North Africans testifies to the absence of a real barrier between this region and typical sub-Saharan populations. On the other hand, some Guinean groups (Fula and Balanta for instance) present haplotypes otherwise observed to date in East-African and Middle East populations.

It is interesting to note that the Bantu-associated markers L0a 9bp del CoII/tRNALys (Soodyall et al. 1996), L3b motif 16124-16223-16278 (Watson et al. 1997), L3e1 particularly L3e1a characterized by mutation 16185 (Bandelt et al. 2001) or the 16192 L2a1 subclade (Pereira et al. 2001), were not found in our sample. This suggests that either Bantu migrations contributed very little to the gene pool of Guineans, despite the evidence of a Bantu migration starting from Cameroon and spreading towards Ghana, Nigeria, Burkina Faso and Mauritania, or that they had a distinct gene pool from that associated with the southwards migrants. The lack of Bantu branches of the Niger-Congo linguistic family, among a plethora of languages spoken in Guiné-Bissau, is more in agreement with the first hypothesis.

The finding of haplogroup M1 lineages of East African origin, albeit at low frequencies (3-5%) in Guinean groups with linguistic affinities to the Bak superfamily including Balanta, Baiote and Ejamat languages, supports the earlier suggestion of a Sudanese origin of the Balanta population and their spread to western Africa with Kushitic migrants approximately 2000 years ago. Obviously, thereafter they were assimilated within the local population, acquiring their language. In particular the 16185 mutation might suggest a route through North Africa. The U6 presence in the Guinean pool, although at a low frequency, is not surprising, as these particular lineages have already been reported for this region. It seems plausible that the U5 lineages observed in the Fula arrived in Guiné via Sahel from North Africa before the slave trade. None of the typical European haplogroups (H, J, and T) were found in the present-day population of Guinea, whereas they exist at a fairly high frequency in North Africa in contrast to the U5 frequency (only 4.5%). This makes it less likely that the presence of U5 in Guiné, in particular, and in Northwest Africa in general, is due to recent admixture with the European population. A possible ancient migration from Asia to Africa was proposed by Cruciani et al. (2002) to explain the presence of some unusual Y-chromosome lineages identified in West Africa. Haplogroup R1 (defined by M173 mutation), without further branch defining mutations (M269 and M17) specific to Europeans, accounted for ∼40% of the Y-chromosomes in North-Cameroon, while not yet having been sampled elsewhere in Africa. More data from Central and Western Africa are needed to cast light on the origin of such idiosyncratic mtDNA and Y chromosome lineages. Thus, our U5 sequences from the Guinean Fulbe people corroborate Cruciani's hypothesis of a prehistoric migration from Eurasia to West Sub-Saharan Africa, testified by their present day restricted and localised distribution.

Acknowledgments

The authors are grateful for the precious help of the Chairman of the Joint Chiefs of Staff and the Ministry of Health from Guiné-Bissau. ICCTI (Lisbon, Portugal) and the Regional Government of Madeira provided financial support to AB. AR is a recipient of a Ph.D. scholarship from Fundação para a Ciência e Tecnologia (FCT, Lisbon) reference SFRH/BD/12173/2003. TK was supported by the Estonian basic research grant 4769. We are also grateful for the contributions of two anonymous reviewers to an early version of the manuscript.

Electronically Available Data

HVS-I and HVS-II haplotypes and their distribution among ethnic groups from Guiné-Bissau are available as Complementary Material at the web site http://www.ahg.com/. A list of the PCR primers and conditions used to amplify all pertinent mtDNA regions is also included in the Complementary Material web site.
