In search of the Pre- and Post-Neolithic Genetic Substrates in Iberia: Evidence from Y-Chromosome in Pyrenean Populations


*Corresponding author: Dr. Eduardo Arroyo-Pardo, Laboratorio de Genética Forense y Genética de Poblaciones, Departamento de Toxicología y Legislación Sanitaria, Facultad de Medicina, Universidad Complutense de Madrid, E-28040, Madrid, Spain. E-mail:


The male-mediated genetic legacy of the Pyrenean population was assessed through the analysis of 12 Y-STR and 27 Y-SNP loci in a sample of 169 males from 5 main geographical areas in the Spanish Pyrenees: Cinco Villas (Western Pyrenees), Jacetania and Valle de Arán (Central Pyrenees) and Alto Urgel and Cerdaña (Eastern Pyrenees). In the Iberian context, the Pyrenean samples present some specificities, being characterized by a high proportion of R1b1b2-M269 chromosomes (including the usually uncommon R1b1b2d-SRY2627 and R1b1b2c-M153 types) or I2a2-M26 and low proportions of other haplogroups. Our results indicate that an old pre-Neolithic substrate is preponderant in populations of the whole Pyrenean fringe. However, AMOVA revealed a high level of substructure within Pyrenean populations, partially explained by drift effects as well as by the signature of an ancient genetic differentiation between Western and Eastern Pyrenees.


The Pyrenean mountain chain covers 430 km between the Atlantic and the Mediterranean shore, separating the Iberian Peninsula from the rest of the European mainland (Fig. 1). Hundreds of deep valleys can be found scattered among peaks of 3000 meters and more. In the west, the Pyrenees descend gradually to the Cantabrian Sea, whereas in the east they drop abruptly into the Mediterranean. For the most part the main crest forms the Franco-Spanish frontier. The Spanish valleys run from the top of the mountains down to the south; however, some of them, such as the Valle de Cerdaña, lie east-west. The Pyrenees are typically divided into three sections: the Atlantic or Western, the Central, and the Eastern. The Central Pyrenees spread westwards from the Valle de Arán to the Port of Canfranc. On the west side, the Atlantic Pyrenees have an elevation which gradually decreases from east to west. In the Eastern Pyrenees the average elevation is relatively uniform, with some exceptions, until the eastern extreme, where a sudden decline occurs. Four languages are now spoken on the Spanish slope of the Pyrenees: Basque, Castilian, Occitan and Catalan; the first is a non-Indo-European relict language and the rest are derived from Latin. The orographical and geographical characteristics of the Pyrenees create numerous valleys, favouring the isolation of small populations in which microevolutionary processes determine their genetic structure. Such circumstances likely played a major role in shaping the genetic structure of present-day Pyrenean populations.

Figure 1.

Map of the Iberian Peninsula, with zoom in the Pyrenean region, and Y-SNP profiles in the 5 Pyrenean samples studied (Cinco Villas, Jacetania, Valle de Arán, Alto Urgel and Cerdaña). The approximate locations of the three Basque capitals, San Sebastian (Gipuzkoa), Bilbao (Biscay) and Vitoria (Alava), are included as a reference for the Basque Country. Y-SNP profiles from the Basque and Iberian samples of Alonso et al. (2005) are also presented. In the pie charts, dark blue refers to R1b1b2d, dark light to R1b1b2c, white to other R, orange to I2a2, salmon to other I and pink to lineages that entered Iberia from the Neolithic onwards. The light gray slice refers to R plus I lineages.

In the past decades, many biodemographical and genetic studies on Pyrenean populations have addressed important questions such as which linguistic or orographical barriers were related to population structure (Moral, 1988; Aluja et al. 1993; Malgosa et al. 1988; Nogues et al. 1992; Vergnes et al. 1980). However, the emerging picture of Pyrenean genetic diversity is based on quite fragmentary studies often focusing on the Basques. Molecular studies of the northern Navarre province have been carried out (Calderón et al. 2000, 2003; Peña et al. 2002), concerning GM and KM immunoglobulin allotypes, three STR loci from the HLA region, the DYS19 microsatellite and the YAP markers from the Y chromosome. The studied sample was composed mainly of individuals from the Baztan Valley (Western Pyrenees, Navarre province). It is also worth citing a study of a general population from Navarre province in relation to autosomal STRs (Barbero et al. 2000). These studies had the general aim of investigating the particular link between people of the Basque Country and Navarre province, and their results show a genetic relationship between them. It can be questioned whether a few widespread ideas about Pyrenean, or even Iberian, diversity might be supported by more extensive studies encompassing populations from the entire Pyrenean range. For instance, it is widely accepted that Iberian populations preserve a high degree of a pre-Neolithic genetic pool that has persisted despite the complexity of demographic events that took place in the Peninsula (Semino et al. 2000; Flores et al. 2004). While the Basques are often seen as the best representatives of this ancient Iberian genetic component (Flores et al. 2004), two questions can be raised. One is about which Basques those are, since recent studies showed the lack of homogeneity of what was usually considered a coherent Basque population core (Manzano et al. 1996; Iriondo et al. 2003; Perez-Miranda et al. 2005). The second is whether there are other Pyrenean populations, as yet unstudied, that have maintained the signals of the pre-Neolithic gene pool in the Peninsula at least as well as the Basques.

Another issue that has produced conflicting evidence is whether genetic diversity within the Pyrenean range can be envisioned as gradient-like or not. A few studies based upon classical markers such as blood groups, proteins and HLA antigens suggested the existence of an east–west gradient, positioning the Basques (Calafell & Bertranpetit, 1994a, 1994b), again, at the western genetic extreme of a pattern marked by strong geographical continuity (Dugoujon et al. 1989; Hazout et al. 1991). However, immunoglobulin (Ig) data reported for the northern and southern sides of the Pyrenees (Esteban et al. 1998; Giraldo et al. 2001) failed to demonstrate such a pattern, and the Ig variation in the region was more likely explained by microdifferentiation processes which occurred locally by chance, rather than by any ancient settlement history shared by the Pyrenean inhabitants. Recently, the analysis of Y-chromosome markers also led to the conclusion that genetic drift played a significant role in modelling diversity in the Basques (Alonso et al. 2005), but similar data have not yet been produced for other nearby Pyrenean populations.

In order to address some of the above-mentioned questions on Pyrenean genetic diversity, we have performed a high-resolution analysis of Y-STRs and Y-SNPs in 169 males belonging to five previously uncharacterised Pyrenean populations: Valle de Arán, Alto Urgel, Cerdaña, Jacetania and Cinco Villas. The major aims of this study were 1) to obtain extensive Y-chromosome data from populations scattered throughout the west to east Spanish area of the Pyrenees; 2) to compare the results on the studied populations with those previously published for the neighbouring Basques and for Iberia as a whole; and finally, taking advantage of phylogeographic analyses, 3) to obtain additional hints on the history of Y-chromosome lineages and indirectly on how their carriers have contributed to build the present-day Pyrenean genetic landscape.

Material and Methods

Sample Collection

Blood samples were obtained from unrelated healthy volunteer donors at 5 central hospitals in the Pyrenees. Sample sizes and geographical locations (Fig. 1) were: 25 males from Valle de Arán (Lérida province), 34 from Alto Urgel (Lérida province), 37 from Cerdaña (Gerona province), 31 from Jacetania (Huesca province) and 42 from Cinco Villas (Navarre province). All individuals had all four grandparents born in the region and lived in small hamlets in the vicinity of the central hospitals where sampling was carried out. These samples partially overlap those previously studied by López-Parra et al. (2004).

Y-Chromosome Typing

Y-STRs from the minimal haplotype plus DYS437, DYS438, DYS439 and GATA H4 were typed as previously described in López-Parra et al. (2004). Y-STR nomenclature follows Gusmão et al. (2006). Analysis of 27 Y-chromosome SNPs was carried out using different methods (see Supporting Information Table S1). Haplogroup designation follows Karafet et al. (2008).

Data Analysis

Haplotype diversities, mean number of pairwise differences (MPD) and mismatch distributions (MD) over Y-STRs were computed with Arlequin software version 3.01 (Excoffier et al. 2005). Due to the absence of published data for R1b1b2c in most Iberian populations, this haplogroup was included in R1b1b2*(xR1b1b2d) in order to perform statistical analyses. Arlequin 3.01 was also used to perform analysis of molecular variance (AMOVA). DYS385 was not considered, and the number of repeats in DYS389I was subtracted from DYS389II, since the DYS389II amplicon includes the DYS389I repeats.
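The diversity indexes named above can be illustrated with a minimal Python sketch. It is not Arlequin's implementation: it assumes Nei's unbiased estimator for haplotype diversity and counts the number of differing loci as the pairwise distance, which is only one of the distance options available for microsatellite data. The four toy haplotypes and the helper names (`to_haplotype`, `haplotype_diversity`, `mean_pairwise_differences`) are hypothetical and are not data or code from this study.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy haplotypes (repeat counts), NOT data from the study.
LOCI = ("DYS19", "DYS389I", "DYS389II", "DYS390")
raw = [
    {"DYS19": 14, "DYS389I": 13, "DYS389II": 29, "DYS390": 24},
    {"DYS19": 14, "DYS389I": 13, "DYS389II": 29, "DYS390": 24},
    {"DYS19": 14, "DYS389I": 14, "DYS389II": 30, "DYS390": 24},
    {"DYS19": 15, "DYS389I": 13, "DYS389II": 29, "DYS390": 23},
]


def to_haplotype(record):
    """Return a tuple of repeat counts with DYS389I subtracted from DYS389II."""
    adjusted = dict(record)
    adjusted["DYS389II"] = record["DYS389II"] - record["DYS389I"]
    return tuple(adjusted[locus] for locus in LOCI)


def haplotype_diversity(haplotypes):
    """Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of p_i^2)."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1 - sum_sq)


def mean_pairwise_differences(haplotypes):
    """Average number of loci at which two haplotypes differ, over all pairs."""
    pairs = list(combinations(haplotypes, 2))
    diffs = [sum(a != b for a, b in zip(h1, h2)) for h1, h2 in pairs]
    return sum(diffs) / len(pairs)


haplotypes = [to_haplotype(r) for r in raw]
print("K (different haplotypes):", len(set(haplotypes)))
print("GD:", round(haplotype_diversity(haplotypes), 4))
print("MPD:", round(mean_pairwise_differences(haplotypes), 4))
```

In the toy data, the third haplotype differs from the first by a single repeat gain in DYS389I. Without the subtraction it would be counted as differing at both DYS389I and DYS389II, which is why the correction matters for pairwise differences.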

Median Joining (MJ) networks were constructed with Network version 4.1.1 (Bandelt et al. 1999) using an STR variance-based weighting as described in Qamar et al. (2002). The R1b1b2d network with the entire population contained high dimensional cubes, which were resolved by using the reduced median algorithm to generate a file, and then applying the median joining network method to this file.

Time-to-most-recent-common-ancestor (TMRCA) was estimated within Network from the ρ-statistic, using an average mutation rate per Y-STR locus of 6.9 × 10⁻⁴ per generation of 25 years (Zhivotovsky et al. 2004). TMRCAs for I2a2 and R1b1b2d were exclusively based on the Pyrenean lineages detected in this study. As for R1b1b2c, due to the scarce number of Pyrenean male carriers, we additionally considered Basque chromosomes falling in R1b1b2c (García, pers. comm.) under the assumption that the Atlantic fringe neighboring the Pyrenees might have been the region of origin of the haplogroup.
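As a rough illustration of how a ρ-based age is obtained, the sketch below averages the number of mutational steps separating each sampled chromosome from the assumed founder haplotype and converts that average into years with the per-locus rate and generation time quoted above. It is a simplification under stated assumptions: Network counts steps along the branches of the network and also reports a standard error, neither of which is reproduced here. The function name `rho_tmrca`, the step counts and the number of loci are hypothetical.

```python
def rho_tmrca(steps_from_founder, n_loci, mu_per_locus=6.9e-4, years_per_generation=25):
    """Return (rho, age in years) from per-chromosome step counts to the founder.

    rho is the mean number of mutational steps from the founder haplotype.
    The haplotype-wide mutation rate is assumed to be n_loci * mu_per_locus.
    """
    rho = sum(steps_from_founder) / len(steps_from_founder)
    generations = rho / (mu_per_locus * n_loci)
    return rho, generations * years_per_generation


# Hypothetical example: six chromosomes typed at 10 Y-STRs, NOT data from the study.
toy_steps = [0, 0, 0, 1, 1, 2]
rho, age_years = rho_tmrca(toy_steps, n_loci=10)
print(f"rho = {rho:.3f}, TMRCA = {age_years:.0f} years")
```

Because the age scales inversely with the assumed mutation rate, the choice between a pedigree-based and an evolutionary rate changes the estimate by a constant factor.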

Results and Discussion

Y-STR Haplotype Diversity

Taking into account the 12 Y-STRs here examined, the 169 Pyrenean males harboured a total of 141 different haplotypes, out of which 122 were observed only once (Supporting Information Table S3).

Levels of haplotype diversity or mean pairwise differences between haplotypes were similar in Jacetania, Cerdaña and Alto Urgel, with the three being characterized by high values relative to those found in Cinco Villas and Valle de Arán. In order to compare the values across different Iberian populations, diversity indexes were recalculated for a subset of 7 Y-STRs (DYS19, DYS389I, DYS389b, DYS390, DYS391, DYS392, DYS393) and are presented in Table 1. The Y-STR diversity of Jacetania, Cerdaña and Alto Urgel does not present any distinctiveness relative to most other Iberian populations. On the contrary, Cinco Villas and especially Valle de Arán show considerably lower indexes that, although rather uncommon in Iberia, are nonetheless of the same magnitude as those reported for Basque populations such as those from Alava, Biscay and Gipuzkoa (Table 1).

Table 1. Y-STR diversity indexes. N: sample size, K: different haplotypes, GD: Haplotype Diversity, MPD: Mean Pairwise Differences

| Population | N | K | GD | MPD | Reference |
| Spain-General | 148 | 99 | 0.9834 ± 0.0049 | 3.9173 ± 1.9752 | Martín et al. 2004 |
| Cantabria | 107 | 67 | 0.9741 ± 0.0074 | 3.7708 ± 1.9156 | Zarrabeitia et al. 2001 |
| Barcelona | 224 | 131 | 0.9778 ± 0.0053 | 3.7270 ± 1.8889 | Gene et al. 1999 |
| Galicia | 53 | 38 | 0.9695 ± 0.0144 | 3.9252 ± 1.9988 | González-Neira et al. 2000 |
| Jacetania | 31 | 24 | 0.9720 ± 0.0197 | 3.8624 ± 1.9937 | Present study |
| Valle de Arán | 25 | 13 | 0.8633 ± 0.0604 | 2.4833 ± 1.3868 | Present study |
| Cerdaña | 37 | 26 | 0.9730 ± 0.0147 | 3.4459 ± 1.8005 | Present study |
| Alto Urgel | 34 | 27 | 0.9875 ± 0.0097 | 3.3636 ± 1.7676 | Present study |
| Cinco Villas | 42 | 22 | 0.9640 ± 0.0115 | 2.8618 ± 1.5366 | Present study |
| Andalucía | 56 | 45 | 0.9890 ± 0.0069 | 4.0078 ± 2.0333 | González-Neira et al. 2000 |
| Asturias | 120 | 86 | 0.9856 ± 0.0049 | 3.9157 ± 1.9771 | Martínez-Jarreta et al. 2003 |
| Álava | 11 | 9 | 0.9636 ± 0.0510 | 2.8909 ± 1.6429 | García (pers. comm.) |
| Biscay | 75 | 36 | 0.8995 ± 0.0304 | 2.8930 ± 1.3369 | García (pers. comm.) |
| Gipuzkoa | 72 | 25 | 0.9174 ± 0.0165 | 2.5156 ± 1.3709 | García (pers. comm.) |
| Navarra | 10 | 5 | 0.8222 ± 0.0969 | 1.1556 ± 0.8120 | García (pers. comm.) |
| SW Spain | 111 | 92 | 0.9941 ± 0.0028 | 3.9526 ± 1.9943 | Gamero et al. 2002 |
| Valencia | 140 | 91 | 0.9830 ± 0.0045 | 3.8513 ± 1.9472 | Aler et al. 2001 |
| North Portugal | 290 | 165 | 0.9861 ± 0.0026 | 3.9599 ± 1.9883 | Beleza et al. 2006 |
| Center Portugal | 186 | 107 | 0.9840 ± 0.0036 | 3.8402 ± 1.9395 | Beleza et al. 2006 |
| South Portugal | 138 | 85 | 0.9799 ± 0.0060 | 3.8173 ± 1.9326 | Beleza et al. 2006 |

Of the 5 Pyrenean regions considered here, those with the lowest Y-STR diversities, Cinco Villas and Valle de Arán, are also the most difficult to reach geographically. Cinco Villas is located in a remote area at the north-western edge of the Pyrenees and is still today a Basque-speaking region in which Basque surnames remain. Thus, the variation in the magnitude of observed diversity is explained mainly by geographical conditions.

Y-SNP Variation

The haplogroups detected in each of the 5 Pyrenean populations are shown in Figure 2. A common feature of the Y-haplogroup profiles in the 5 populations is the reduced frequency of male lineages that putatively entered Iberia from the Neolithic era onward (Fig. 1). Representatives of this male component carried haplogroups C, E1b1b1, G and J, which in the whole Pyrenean sample accounted for 8.9% of lineages, although the proportion was rather unevenly distributed across the 5 Pyrenean populations: 4% in Valle de Arán, 8.1% in Cerdaña, 12.9% in Jacetania, 20.5% in Alto Urgel and absent in Cinco Villas. As a comparison we can take the non-Basque Iberians (N = 692) studied by Alonso et al. (2005), among whom nearly 30% of Y chromosomes could be associated with the above-mentioned component, in clear contrast with the proportion of 8.9% in the Basque sample (N = 168) from the same study (which coincidentally matches our estimate for Pyreneans exactly).

Figure 2.

Phylogenetic tree of Y-SNP haplogroups. Biallelic markers are displayed in each branch. Y-chromosome haplogroup frequencies in each of the Pyrenean populations.

Therefore, concerning the relative amount of the post-Neolithic genetic substrate, most Pyrenean populations resemble Basques more than they resemble other Iberians.

Another link between all Pyreneans now studied and Basques is the extremely high frequency of haplogroup R1b1b2, defined by M269, a mutation widely assumed to have arisen in Palaeolithic times. R1b1b2 is by far the most common haplogroup in Western Europe, accounting for more than 50% of Iberian Y chromosomes (Rosser et al. 2000; Bosch et al. 2001; Cruciani et al. 2002; Maca-Meyer et al. 2003; Flores et al. 2004; Brión et al. 2004; Alonso et al. 2005; Beleza et al. 2006). However, in Basque populations, lineages containing the pre-Neolithic M269 can reach frequencies up to 80% (see Supporting Information Table S2).

Genetic Structure of the Pyrenees

Despite some common traits that characterize the 5 Pyrenean populations, there are other aspects of their male pools that contribute to marked genetic differentiation between some of them. When AMOVA was conducted over biallelic marker data considering one group containing the 5 Pyrenean populations, a significant proportion of variance was attributed to differences among populations (Fst = 6.92%; p = 0). This Fst was 2.5 times higher than the value obtained when AMOVA was applied to all of the Iberian populations from Supporting Information Table S2 (Fst = 2.73%; p = 0), and 3.5 times higher than that produced when all Iberians except Pyreneans were analysed through one-group AMOVA (Fst = 1.98%; p = 0). This means that a substantial degree of substructure does exist in Pyrenean populations.
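To make the idea of a "proportion of variance attributed to differences among populations" concrete, the sketch below computes Nei's G_ST from haplogroup counts. This is a deliberately simpler, frequency-based analogue and not the AMOVA procedure run in Arlequin, so its values are not expected to reproduce the Fst figures above. The function names, the haplogroup labels "X" and "Y" and the counts are hypothetical.

```python
def gene_diversity(freqs):
    """Gene diversity 1 - sum(p_i^2) for one set of haplogroup frequencies."""
    return 1.0 - sum(p * p for p in freqs)


def g_st(pop_counts):
    """Nei's G_ST = (H_T - H_S) / H_T, with populations weighted equally.

    pop_counts: list of dicts mapping haplogroup label -> observed count.
    """
    haplogroups = sorted({h for pop in pop_counts for h in pop})
    freqs = []
    for pop in pop_counts:
        n = sum(pop.values())
        freqs.append([pop.get(h, 0) / n for h in haplogroups])
    # H_S: mean within-population diversity.
    h_s = sum(gene_diversity(f) for f in freqs) / len(freqs)
    # H_T: diversity of the pooled (mean) frequencies.
    mean_freqs = [sum(f[i] for f in freqs) / len(freqs) for i in range(len(haplogroups))]
    h_t = gene_diversity(mean_freqs)
    return (h_t - h_s) / h_t


# Hypothetical counts, NOT data from the study.
pop_a = {"X": 8, "Y": 2}
pop_b = {"X": 2, "Y": 8}
print("differentiated pair:", round(g_st([pop_a, pop_b]), 3))
print("identical pair:", round(g_st([pop_a, pop_a]), 3))
```

The statistic is zero when all populations share the same haplogroup frequencies and grows as the profiles diverge, which is the intuition behind comparing the Pyrenean value with the Iberian ones.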

Phylogeographical Analyses

Haplogroup R1b1b2d

One of the most interesting findings of this study was the very high frequency of R1b1b2d in all Pyrenean samples except Cinco Villas. In this last population, R1b1b2d was carried by just one individual, which in fact agrees fairly well with the general distribution of the haplogroup in Iberians. In Europe, the mutation defining the subclade R1b1b2d, SRY2627 (synonymous with M167), is a relatively rare variant within R1b1b2, being commonly assumed to be typical of Iberian populations. Nearly everywhere in the Peninsula it occurs at 2–7% frequency, and up to now the highest values have been found in the North East corner, namely among Basques (4–11%; Maca-Meyer et al. 2003; Hurles et al. 1999) and Catalans (22%; Hurles et al. 1999). Frequencies of 13% in Jacetania, 18% in Alto Urgel, 19% in Cerdaña and especially 48% in Valle de Arán, all in nearby regions, had not been observed before. Furthermore, all Pyrenean R1b1b2d chromosomes had molecularly close STR backgrounds, as can be seen in Figure 3a.

Figure 3.

Networks of R1b1b2d (a and b), R1b1b2c (c) and I2a2 (d and e). Networks a and e contain only samples from this study. The remaining networks contain, in addition to our samples, samples from Basques (Hurles et al. 1999; García, pers. comm.), French, Catalans (Hurles et al. 1999) and other Iberians (Hurles et al. 1999; Beleza et al. 2006). TMRCA: Time-to-most-recent-common-ancestor. The nodes marked with * were used as ancestral for the TMRCA estimates. The panel of STRs used to construct each network is indicated below it.

The origin of R1b1b2d was previously addressed by Hurles et al. (1999) who, based on a worldwide survey of SRY2627, placed its likely origin in Iberia and further estimated that it occurred only a few thousand years ago (1650–3450 BP). In that study the highest frequencies of R1b1b2d were observed in Basques (11%) and in Catalans (22%) from Girona, which is located very near to the southern limit of the Pyrenees mountains. Since then, new data have been published for other Basque samples (Alonso et al. 2005), in which R1b1b2d was found to be less well represented than previously reported: 4.17% in Basques from Biscay, 4.55% from Alava and Navarre, and absent in Basques from Gipuzkoa.

Summarizing the available data, it appears that R1b1b2d is much more frequent in the Central/Eastern Pyrenees than in other Iberian regions. Such a distribution pattern makes it unlikely that the Basque area could have been the region of origin of R1b1b2d. Figure 3b displays a network constructed with the existing microsatellite backgrounds in European R1b1b2d chromosomes. A star-like tree was obtained, with a well-defined central and frequent haplotype shared by the majority of Pyrenean males but also by Basques and other Iberians. The central haplotype harbors the modal STR motif (14-13-29-24-11-13-13 for DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392 and DYS393, respectively) of the European R1b1b2-M269 lineages, indicating that the derived SRY2627 mutation must have arisen in that ancestral background. The tree also shows that most of the Pyrenean R1b1b2d lineages are only one or two steps distant from the central haplotype, which suggests a focal origin and subsequent expansion of R1b1b2d. In the case of the Catalans from Girona, there are no chromosomes carrying the central STR motif and a large proportion of haplotypes occupy peripheral positions in the tree. The presence of very divergent haplotypes can explain the high average variance across Y-STR loci observed in the Catalan sample: 0.286, compared to 0.265, 0.211, 0.143 and 0.085 in Cerdaña, Gipuzkoa, Alto Urgel and Valle de Arán, respectively. However, and without disregarding the possibility that R1b1b2d might have originally appeared in a region close to the Mediterranean edge of the Pyrenees, the network analysis indicates that the incidence of R1b1b2d in Girona, at least, seems more compatible with repeated influx of lineages than with mainly local molecular diversification.
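The "average variance across Y-STR loci" used in this comparison can be sketched as the variance of repeat counts computed locus by locus and then averaged. The snippet below is an illustration only: it assumes the sample (n − 1) variance, which may differ from the estimator actually used, and the function name and toy haplotypes are hypothetical.

```python
from statistics import variance


def average_str_variance(haplotypes):
    """Mean over loci of the sample variance of repeat counts.

    haplotypes: list of equal-length tuples, one repeat count per locus.
    """
    per_locus = list(zip(*haplotypes))  # transpose: one tuple of alleles per locus
    variances = [variance(alleles) for alleles in per_locus]
    return sum(variances) / len(variances)


# Hypothetical repeat counts at four loci, NOT data from the study.
toy_haplotypes = [
    (14, 13, 24, 11),
    (14, 13, 24, 11),
    (15, 13, 23, 11),
    (14, 14, 24, 11),
]
print("average variance:", average_str_variance(toy_haplotypes))
```

A sample dominated by the founder haplotype and its one-step neighbours yields a low value, whereas a few very divergent chromosomes inflate it quickly, which is the pattern invoked for the Girona sample.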

At the moment, the best explanation for the known distribution of R1b1b2d is that the subclade originated somewhere in the Central/Eastern Pyrenees, but to confirm this hypothesis it would be essential to obtain data from the French side of the mountain range.

The scarceness of R1b1b2d outside Iberia (France excepted, where frequencies of 3–7% were reported; Hurles et al. 1999) points to an unbalanced lineage flux that predominantly occurred towards the Iberian side of the Pyrenees. If so, the dispersion of R1b1b2d could have been driven by the Neolithic farmers entering Iberia, but the young age reported by Hurles et al. (1999) for R1b1b2d apparently contradicts this scenario. However, the time to the most-recent common ancestor (TMRCA) of the Pyrenean R1b1b2d lineages was here estimated at 7383 ± 1477 years ago, which is consistent with an early dispersion of R1b1b2d all over the Pyrenees and subsequent dissemination outside the mountain range from the Neolithic era onwards. The much younger age estimated by Hurles et al. (1999) for the SRY2627 mutation can, nevertheless, be explained by the mutation rate used (2.1 × 10⁻³, for microsatellites), which does not take into account evolutionary considerations (see Zhivotovsky et al. 2006).
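The effect of the mutation rate on the two age estimates can be checked with a short calculation. The sketch assumes that, all else being equal, a microsatellite-based age is inversely proportional to the assumed mutation rate; this is a simplification, since the two studies also differ in loci, samples and dating method. The function name `rescale_age` is hypothetical.

```python
def rescale_age(age_years, rate_used, rate_new):
    """Rescale an age estimate from one assumed mutation rate to another.

    Assumes age is inversely proportional to the mutation rate.
    """
    return age_years * rate_used / rate_new


EVOLUTIONARY_RATE = 6.9e-4  # per locus per generation, as used in this study
PEDIGREE_RATE = 2.1e-3      # per locus per generation, as used by Hurles et al. (1999)

# Present estimate (7383 years) re-expressed under the faster rate.
young = rescale_age(7383, EVOLUTIONARY_RATE, PEDIGREE_RATE)

# Hurles et al. interval (1650-3450 BP) re-expressed under the slower rate.
old_low = rescale_age(1650, PEDIGREE_RATE, EVOLUTIONARY_RATE)
old_high = rescale_age(3450, PEDIGREE_RATE, EVOLUTIONARY_RATE)

print(f"7383 years under the faster rate: about {young:.0f} years")
print(f"1650-3450 BP under the slower rate: about {old_low:.0f}-{old_high:.0f} years")
```

Under this assumption the roughly threefold difference between the two rates is enough to move the present estimate into the interval reported by Hurles et al. (1999), and vice versa.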

Haplogroup R1b1b2c

Another subclade of R1b1b2 found in the Pyrenees was R1b1b2c, defined by M153. It was only detected in Cerdaña (2.7%) and Cinco Villas (14.3%), which are populations located at the eastern and western limits, respectively, of the examined Pyrenean area. R1b1b2c was first reported in Basques, at 15.6% frequency, by Underhill et al. (2000). Since then, R1b1b2c has been exclusively detected in Iberians or Iberian descendants, and always at very low frequencies in non-Basque populations (Alonso et al. 2005; Flores et al. 2004). Within the Basques studied by Alonso et al. (2005), R1b1b2c did not reach such high frequencies as in the original report: 11.1% in Basques from Biscay, 5.4% from Gipuzkoa and absent in Basques from Alava and Navarre. The authors further postulated that within Iberia the best candidate region for the origin of R1b1b2c was somewhere within or near to the Basque Country.

The results now obtained for Cinco Villas (as previously described, also a Basque-speaking region) give additional support to the presumed geographical origin of R1b1b2c. The frequency of this haplogroup in Cinco Villas is the second highest ever found in a population. The highest was in the Basque sample studied by Underhill et al. (2000), but unfortunately no details about the specific region of origin of the individuals were provided.

Figure 3c shows a median-joining network relating the haplotypes defined by 10 common STRs of R1b1b2c lineages found in this study and in Gipuzkoa and Biscay (García, pers. comm.). The TMRCA of the depicted lineages was estimated at 8453 ± 2711 years ago, therefore pointing to an earlier origin of R1b1b2c compared to R1b1b2d. Note that the age now obtained for R1b1b2c is far more recent than that previously reported by Alonso et al. (2005) (17900–21300 years), which can be explained by the fact that the calculations rely on different methods (ours were free of any demographic model) and different assumed mutation rates. However, comparing the average STR variances of the R1b1b2c (0.243), R1b1b2d (0.207) and I2a2 (0.278) lineages considered in this study, and given the replicated estimates pointing to a Mesolithic time frame for the origin, diversification and diffusion of the I2a2 clade (Rootsi et al. 2004), the temporal interpretation provided here for R1b1b2c seems reliable.

Two R1b1b2c haplotypes from Biscay were found to be molecularly very differentiated from any other lineage, which surely accounts for the large average variance across Y-STRs detected in Biscay (0.388) compared to Cinco Villas (0.185). The detection of such divergent haplotypes may indicate that many intermediate R1b1b2c haplotypes were either lost or remain unsampled.

Paragroup I

Individually, the second most common haplogroup among our Pyrenean samples was haplogroup I, which accounted for 12.4% (21/169) of paternal lineages. This haplogroup has a remarkable continental specificity, and within Europe it represents around 18% of Y chromosomes, showing a decreasing East-West cline (Rootsi et al. 2004). In the sample of Iberians studied by Alonso et al. (2005) (comprising 168 Basques and 692 non-Basques) only 7.4% of men carried I chromosomes.

The only subclade of I assessed in this study was I2a2, defined by M26. It was present in 7.7% of the Pyreneans, which encompassed almost 2/3 of their I lineages, whereas in the Iberians referred to above it was found at 2.6% frequency, totalling about 1/3 of haplogroup I representatives.

Of the 5 Pyrenean populations studied, only Alto Urgel fitted well the overall picture of I2a2 discerned in the whole of Iberia, whereas a series of frequencies such as 9.5%, 9.7%, 8% and 8.1% in Cinco Villas, Jacetania, Valle de Arán and Cerdaña, respectively, is again quite uncommon for Iberian populations. Exceptionally within Spain, two higher values were reported by Flores et al. (2004): 10.7% for Cadiz (South Spain) and 19% for Castilla (Central Spain).

It appears that haplogroup I arose in Europe before the Last Glacial Maximum (LGM) and many of its subclades seem to have played a central role in the process of the human recolonization of Europe from isolated refuge areas after the LGM (see Rootsi et al. 2004 for a review). Among the I subclades, I2a2 is virtually absent east of the Italian Apennines and shows the highest incidences in north-eastern Iberia/southern France, with the exception of the isolated and dramatic peak of frequency (40.9%) in Sardinia (Rootsi et al. 2004). This was well illustrated in the phylogram of I2a2 in Figure 1 from Rootsi et al. (2004) based upon the population data available at the moment, within which Basques (Spanish and French mixed) and Bearnais (in the French Atlantic Pyrenees) showed the highest continental frequencies (6% and 7.7% respectively). New data from Spanish Basques (Alonso et al. 2005) did not reproduce such elevated values; in Biscay I2a2 was not detected and its frequency was 1.3% in Gipuzkoa and 4.5% in Alava plus Navarra. From the new data presented here it seems that the Pyrenees might indeed have been the region where I2a2 arose and from which it initiated the spreading process after the LGM. The network in Figure 3d of I2a2 haplotypes from mainland Europe does not contradict this scenario. Within a central haplotype lay chromosomes from the Pyrenean samples of Cerdaña, Cinco Villas and Alto Urgel, as well as from Spanish and French Basques and from Bearnais. There are several haplotypes, even from Pyrenean populations like Cerdaña, that are quite divergent from the inferred founder haplotype. Although we cannot discard the possibility that they may represent sporadic reintroductions in the region which occurred in more recent times, the age of I2a2 – estimated at 8000 ± 4000 years ago by Rootsi et al. (2004) and here reassessed, through TMRCA analysis restricted to Pyrenean chromosomes (Fig. 3e), at around 12000 ± 4000 years ago – is ancient enough to explain the accumulation of strong differentiation in a number of STR backgrounds.

Final Remarks and Conclusions

Our investigation of Pyrenean Y chromosome diversity provides an extended data framework to understand the importance of historical events or demographic movements on the genetic landscape of current human populations from the Pyrenees mountain range and also of the whole of Iberia.

Within the Peninsula, the Pyrenees constitute an extraordinary region which can be used to infer how different evolutionary forces might have interacted by modelling patterns of diversity in present-day populations. Actually, the complex orography of the mountain chain prevented easy interaction between people, and in certain regions it has created severe conditions of population isolation that likely fostered extensive drift. Simultaneously, due to the geographical location, the Pyrenees were always a natural bidirectional gateway for people arriving in and leaving Iberia.

However, the genetic component in the Pyreneans that presumably entered Iberia from the Neolithic era onwards was estimated to be among the lowest for any population in Europe, only comparable with that observed among Basques. The absence of relevant North African inputs during recent historical times is understandable, since the Pyrenees were never under Islamic rule, which lasted for many centuries in other parts of Iberia. Yet, even in relation to other post-Neolithic contributions, our data indicate that the rate of assimilation of non-autochthonous Y-lineages was considerably lower in Basques and Pyreneans than in the rest of the Peninsula. In a certain way, the weight of a peculiar and adverse geography remains today remarkably imprinted in the male gene pool of the Pyreneans.

The reciprocal side of this pattern is the high proportion of an ancient substrate retained both in Basques and in Pyreneans. The Y lineages representative of what might have been a pre-Neolithic male genetic composition in Iberia were those bearing the Palaeolithic mutation M269, including its Mesolithic derived branches R1b1b2c-M153 and R1b1b2d-SRY2627, plus those falling in the I clade defined by the Mesolithic M170. This set of lineages was encountered in 91.1% of Pyrenean men, and such a high value in Iberia is typically found only among Basque populations (it also represented exactly 91.1% of the Basques studied by Alonso et al. (2005)). This result suggests that the Pyreneans, as well as the Basques, retained the legacy of the Iberian pre-Neolithic genetic composition.

Importantly, however, the degree of the pre- and post-Neolithic contributions was found to be highly variable across different Pyrenean populations. In Cinco Villas the post-Neolithic influence was nil (0.0%), whereas in Valle de Arán it was residual (4%). These two populations, the most isolated of the Pyrenean populations studied, show accordingly reduced levels of Y-STR diversity. In Valle de Arán, isolation and particularly strong drift effects must have acted to produce a peculiar Y-SNP profile that has no parallel in any Iberian population.

Male introgression in post-Neolithic times was comparatively much more influential in Jacetania, Alto Urgel and Cerdaña, which can be explained by the location of these three populations at strategic passage points, especially Cerdaña, which has long been known to afford a relatively uncomplicated North-South way of crossing the Pyrenees.

Another question that our results can address is whether, within the Pyrenees, the diversity pattern observed is evidence of some degree of paternal continuity or whether, on the contrary, microdifferentiation processes prevailed, establishing marked discontinuities. Undoubtedly Valle de Arán testifies to the important role of drift in the genetic differentiation of isolated populations. Hence, similar to the appearance of a certain heterogeneity among the Basques, some differentiation also exists between Pyrenean populations, which was mainly determined by isolation and drift.

Notwithstanding, we found convincing indications that the mountain chain did not act as a substantial barrier to gene flow between populations. One of the signs comes from the concentration of high frequencies of I2a2 among populations from the entire Pyrenean range. Our data strongly reinforce previous evidence that I2a2 arose during Mesolithic times in a region close to or within the Pyrenees. The dispersal of I2a2 from its place of origin throughout the Pyrenees and beyond, implied not only gene exchange but also considerable movement of people. Very likely, the demographic event associated with the expansion of I2a2 was the Ice-age repopulation of Europe from the Franco-Cantabrian refuge. A number of studies on human mtDNA diversity have already indicated that the Franco-Cantabrian glacial refuge was a major source for the European gene pool (Achilli et al. 2004), and our data on I2a2 seemingly lend support to the role of the region as a Mesolithic diffusion center of male lineages.

Evidence that a few thousand years later the mountain chain also permitted gene flow between populations emerges from the distribution of R1b1b2d. This haplogroup originated approximately seven thousand years ago and is widely spread and usually well represented in Pyrenean populations. This means that, despite the intricacy of the Pyrenean orography, the movement of people was maintained over time along the mountain chain, promoting substantial gene flow even across a linguistic barrier.

Although some continuity is discernible within the paternal diversity of Pyrenean populations, it was not possible to identify any clear-cut gradient of variation, contrary to previous reports of a strong East-West cline using ‘classical’ markers (Calafell, 1994a). Several factors might have contributed to this; for example, different migrations at different times may have overwritten previous genetic signs, or the different levels of isolation of the valleys may have contributed to erasing that gradient of variation. Moreover, the gradient observed when using autosomal markers could possibly have an undetectable counterpart on the Y-chromosome, which might be due to sex-biased demographic factors, an issue we will address in our ongoing investigations with mtDNA.

R1b1b2c deserves a final comment. Although one isolated occurrence was registered in Cerdaña, the easternmost Pyrenean population studied herein, its distribution is practically confined to the western Pyrenean side, leading us to suppose that the Atlantic fringe, inhabited by Basque-speaking populations, was the place where it first arose. From the associated STR diversity values, R1b1b2c appears to be nearly a thousand years older than R1b1b2d, a time span long enough for the haplogroup to have spread over a geographical range as broad as that of R1b1b2d. Why then is R1b1b2c essentially restricted to Basque-speaking populations? This distribution may in fact be a simple result of chance and thus meaningless for tracing population histories. However, we cannot exclude that the spread of R1b1b2c might have been somewhat limited by an interaction, not necessarily at random, between geographic and cultural factors. The Pyrenean archaeological record from the final Bronze Age (3100 BP – 2700 BP) reveals some East-West differentiation: the Eastern culture related to the Urnfield invaders (3100–2900 BP) was almost absent in the Western Pyrenees, where the influence of cromlech builders was clearly more visible (Ruiz Zapatero, 1995). An interesting parallel is observed in the distribution of R1b1b2c, a haplogroup that arose long before the Bronze Age, which seems to indicate that gene flow among inhabitants of the Atlantic vicinity of the Western Pyrenees was higher than between them and people from other Pyrenean locations.
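
As an illustration of how relative haplogroup ages can be read from associated STR diversity, the following is a minimal sketch of one common estimator: the average squared difference (ASD) in repeat counts between observed haplotypes and an assumed founder haplotype. Under a symmetric single-step mutation model, its expected value grows as the per-locus mutation rate multiplied by the number of generations elapsed. This is not necessarily the estimator used in this study. The haplotypes, the founder, the mutation rate and the generation time below are hypothetical values chosen only for the example.

```python
from statistics import mean


def asd_to_founder(haplotypes, founder):
    """Average squared difference in repeat counts from the founder.

    The squared difference is averaged over haplotypes at each locus,
    then averaged over loci.
    """
    per_locus = []
    for j in range(len(founder)):
        per_locus.append(mean((h[j] - founder[j]) ** 2 for h in haplotypes))
    return mean(per_locus)


def age_in_years(haplotypes, founder, mu_per_generation, years_per_generation):
    """Convert ASD to an age, assuming E[ASD] = mu * t (single-step model)."""
    generations = asd_to_founder(haplotypes, founder) / mu_per_generation
    return generations * years_per_generation


# Hypothetical toy data: three Y-STR loci, four chromosomes.
founder = (14, 12, 24)
haplotypes = [
    (14, 12, 24),
    (15, 12, 24),
    (14, 12, 23),
    (14, 13, 24),
]

# Assumed values for illustration only: 0.002 mutations per locus per
# generation and 25 years per generation.
estimate = age_in_years(haplotypes, founder, 0.002, 25)
print(round(estimate))
```

Because the estimate scales inversely with the assumed mutation rate, age differences between two haplogroups obtained in this way are more robust when both are computed with the same loci and the same rate.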

The perspective provided by the Y-chromosome indicates that haplogroups such as I2a2 and R1b1b2d, which probably appeared for the first time somewhere within the mountain chain, spread throughout the entire mountain range. In contrast, the influx of lineages originating in other places, even quite close to the mountains' inner core, as appears to be the case for R1b1b2c, was extremely limited. In other words, the Pyrenees acted more as a donor than a receptor of male lineages.

To fully explore the meaning of our findings it is now essential to obtain data on the female counterparts of the Pyrenean gene pool.


The authors thank Lourdes Montsant (Hospital de Puigcerda), Carles Borrull and Ma Angeles Bordás (Hospital Vall D'Arán), Ferrán Gómez (Fundación Santo Hospital de La Seu d'Urgell), Isabel Maroto (Consorcio Hospitalario de Jaca), Rafael Eneterreaga and Javier González (Centros de Salud “Cinco Villas”). This work was financed by projects from Complutense University PR48/01-9837 and the Spanish Ministry of Science and Education CGL2006-07828/BOS and was partially supported by Programa Operacional Ciência e Inovação 2010 (POCI 2010), VI Programa Quadro (2002–2006). Some DNA controls were kindly provided by M. A. Jobling.