Migration Waves to the Baltic Sea Region


  • T. Lappalainen,

    1. Finnish Genome Center, Institute for Molecular Medicine Finland, University of Helsinki, Haartmaninkatu 8, P.O. Box 63,00014 University of Helsinki, Finland
  • V. Laitinen,

    1. Department of Medical Genetics, University of Turku, Kiinamyllynkatu 10, 20520 Turku, Finland
  • E. Salmela,

    1. Finnish Genome Center, Institute for Molecular Medicine Finland, University of Helsinki, Haartmaninkatu 8, P.O. Box 63,00014 University of Helsinki, Finland
    2. Department of Medical Genetics, University of Helsinki, Haartmaninkatu 8, P.O. Box 63, 00014 University of Helsinki, Finland
  • P. Andersen,

    1. Department of Neurology, Umeå University Hospital, University of Umeå, 901 85 Umeå, Sweden
  • K. Huoponen,

    1. Department of Medical Genetics, University of Turku, Kiinamyllynkatu 10, 20520 Turku, Finland
  • M.-L. Savontaus,

    1. Department of Medical Genetics, University of Turku, Kiinamyllynkatu 10, 20520 Turku, Finland
  • P. Lahermo

    Corresponding author
    1. Finnish Genome Center, Institute for Molecular Medicine Finland, University of Helsinki, Haartmaninkatu 8, P.O. Box 63,00014 University of Helsinki, Finland

*Corresponding author: Päivi Lahermo, Finnish Genome Center, Institute for Molecular Medicine Finland, University of Helsinki, Haartmaninkatu 8, P.O. Box 63, 00014 University of Helsinki. Tel. +358-9-191 25476, Fax. +358-9-191 25478, E-mail: paivi.lahermo@helsinki.fi


In this study, the population history of the Baltic Sea region, known to be affected by a variety of migrations and genetic barriers, was analyzed using both mitochondrial DNA and Y-chromosomal data. Over 1200 samples from Finland, Sweden, Karelia, Estonia, Setoland, Latvia and Lithuania were genotyped for 18 Y-chromosomal biallelic polymorphisms and 9 STRs, in addition to analyzing 17 coding region polymorphisms and the HVS1 region from the mtDNA. It was shown that the populations surrounding the Baltic Sea are genetically similar, which suggests that it has been an important route not only for cultural transmission but also for population migration. However, many of the migrations affecting the area from Central Europe, the Volga-Ural region and from Slavic populations have had a quantitatively different impact on the populations, and, furthermore, the effects of genetic drift have increased the differences between populations especially in the north. The possible explanations for the high frequencies of several haplogroups with an origin in the Iberian refugia (H1, U5b, I1a) are also discussed.


Analysing and separating a variety of migratory and demographic events from the current genetic variation of a region is a challenging task. In human population genetic studies that aim to dissect population history, analysis has traditionally consisted of pooling genetic data for each population and comparing the populations using various statistical methods. However, it has been shown that this approach is not entirely unproblematic since differences between haplogroups usually explain a larger proportion of the total variance than differences between populations (Bosch et al. 1999). Thus, having solely populations as the main unit of study without a combination of lineage-based information may lead to a loss of information and a failure to identify more subtle trends. Indeed, phylogeographic analyses of haplogroup variation have greatly increased our understanding of population history on a wide geographical and temporal scale, and interpretation of haplogroup frequencies among populations relies heavily on the knowledge of the historical strata that different haplogroups represent.

Throughout their history, the populations of the Baltic Sea region have been affected by migrations from both Western and Central Europe and from the east. The region was first settled both from the south-east and from the south soon after the retreat of the continental ice sheet some 12 000 years ago. The first ceramic culture in Scandinavia was of southern origin, whereas the eastern and northern shores of the Baltic Sea were affected by the Comb Ceramic culture that may have a Finno-Ugric association. These early Neolithic cultures were followed by the Corded Ware and Bronze Age cultures that affected Northern Germany, Scandinavia, the Baltic countries and coastal Finland. The Northeastern region, however, had close ties to the Russian area, especially in the Bronze and Early Iron Age (Huurre 1990, Siiriäinen 2003). Later political and population histories of the countries are intertwined with Swedish and Russian influence over large areas. Furthermore, the Germans have had a prominent role in urban life, especially in the Baltic states (Alenius 2000). At present, the linguistic variation in the Baltic Sea region is substantial, with four major language groups: Finno-Ugric, and the Indo-European branches Baltic, Germanic, and Slavic (Fig. 1, Table 1).

Figure 1.

Map of the studied populations and their linguistic groups. For the abbreviations, see Figure 2.

Table 1.  The Y-chromosomal haplogroup frequencies (%) within the populations, and their linguistic affiliations
Language        Estonian  Latvian  Lithuanian  Karelian  Eastern Finnish  Western Finnish  Swedish
Y* (xC,DE,F)    0         0        0           0         0                0.4              0
F* (xI,J,K)
K* (xN,P)       0.8       0        0           0         0                0                0.6
N* (xN2,3)      0         0        0           0         0                0.4              0
P* (xQ,R)       0.8       0        0.6         0         0                0                4.4
R* (xR1a,R1b)   0         0        0           0         0.3              0                0

1 IE: Indo-European

In this study, we analyzed the population history of the Baltic Sea region. To achieve this, we studied the variation of both mitochondrial DNA and the Y chromosome with a special emphasis upon characterising the regional phylogeography of the most common haplogroups. The analysis covered both the internal variation of the region as well as the immigrations to the Baltic Sea area.

Material and Methods

Samples and Genotyping

Blood samples were collected from healthy unrelated individuals from populations of the Baltic Sea region (Fig. 1, Table 1). All samples were collected according to the principles of the Declaration of Helsinki (1964), and the project was approved by the local ethical committees. The Swedish samples are mainly from eastern Sweden. To acquire adequate sample sizes, the Aunus, Tver and Viena Karelians, Vepsians, Ingrians and 44 additional Karelian samples were pooled together as Karelians for the Y-chromosomal analysis and some mtDNA calculations. Similarly, the Estonians, Latvians, Lithuanians and Seto were pooled as Balts in some analyses. The Seto are not included in the Y-chromosomal analysis due to their small number of male samples. Even though these arrangements may cause a loss of resolution or even a small bias, pooling closely related populations is still preferable to excluding a large amount of data. A subset of the Finnish and Baltic Y-chromosomal data has been presented before (Lappalainen et al. 2006, Laitinen et al. 2002), and the Finnish (Finnilä et al. 2001) and Russian (Malyarchuk & Derenko 2001, Loogväli et al. 2004) mitochondrial DNA data were collected from the literature. Because the total mtDNA sequencing by Finnilä et al. (2001) was selective in favour of some haplogroups, some of the sequenced samples were randomly excluded until the mtDNA data matched the true haplogroup frequencies given in their Table 1.
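The random exclusion step described above can be illustrated with a short sketch. This is not the authors' procedure: the function name, the sample records and the target frequencies are hypothetical, and the rule of keeping the largest subset compatible with the target proportions is an assumption.

```python
import random
from collections import defaultdict

def subsample_to_frequencies(samples, target_freq, seed=0):
    """Randomly drop samples until haplogroup proportions match target_freq.

    samples: list of (sample_id, haplogroup) tuples (hypothetical records).
    target_freq: dict mapping haplogroup -> target proportion (summing to 1).
    """
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for sample_id, hg in samples:
        by_group[hg].append(sample_id)
    # Largest total size such that every haplogroup can fill its quota.
    total = min(len(by_group[hg]) / f for hg, f in target_freq.items() if f > 0)
    kept = []
    for hg, f in target_freq.items():
        quota = min(int(round(total * f)), len(by_group[hg]))
        kept.extend((sid, hg) for sid in rng.sample(by_group[hg], quota))
    return kept

# Hypothetical example: haplogroup U is over-represented among sequenced samples.
sequenced = ([(f"s{i}", "H") for i in range(40)]
             + [(f"s{i}", "U") for i in range(40, 100)])
kept = subsample_to_frequencies(sequenced, {"H": 0.5, "U": 0.5})
```

In this toy case all 40 H samples are retained and 20 of the 60 U samples are discarded at random, so the retained set reproduces the assumed 50:50 target.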

A total of 1223 male samples were genotyped for 18 Y-chromosomal biallelic polymorphisms (SRY-1532, M216, M203, P14, M170, M253, P37, M223, 12f2, M9, LLY22g, P43, Tat, M45, P36, M207, P25, M17) mostly on the Sequenom® platform, and for nine STR loci (DYS19, DYS385a/b, DYS388, DYS389I, DYS389II, DYS390, DYS391, DYS392, DYS393). From the mtDNA, a total of 1269 samples were analyzed for 17 SNPs in the coding region (663, 3010, 4529, 4580, 4769, 4833, 6776, 7028, 7309, 8994, 9055, 12308, 13263, 13368, 13708, 14470, 5178). The Karelian and Seto samples and SNP A13263G from the whole sample set were genotyped by RFLP analysis, and the Swedes and Balts were analyzed on the Sequenom® platform. In addition, hypervariable segment 1 (HVS1) was sequenced in all the samples with standard methods. Details of the genotyping and haplogroup assignment are given in Supplementary Table 1.

Statistical Analysis

Y-chromosomal haplogroups were constructed from the biallelic marker data according to common guidelines (Y Chromosome Consortium 2002). MtDNA haplogroups were assigned based on both coding region and HVS1 polymorphisms (Finnilä et al. 2001, Loogväli et al. 2004, Richards et al. 1998, Macaulay et al. 1999, Kivisild et al. 2002, Achilli et al. 2004, 2005). The frequencies of haplogroups were calculated in each of the studied populations.

We performed phylogeographic analysis for Y-chromosomal haplogroups N3, I1a and R1a1, and mtDNA haplogroups H and U, since these haplogroups had a sufficient number of samples for the analysis of patterns of intrahaplogroup variation. In these mtDNA analyses all the sequenced Finnish samples belonging to haplogroups H and U were used (Finnilä et al. 2001). The Network 4.112 software was used to construct median-joining networks of the intrahaplogroup variation (Bandelt et al. 1995, 1999). The Y-chromosomal STR markers were assigned weights inversely related to their variation within the haplogroup. The networks were simplified by partially removing circular structures according to the weighting scheme, and by excluding the singleton haplotypes in the case of the very complex N3 network; the singletons are, however, included in the calculation of the coalescence age. In the mitochondrial DNA analysis, the coding region polymorphisms were given a very high weight, most HVS1 polymorphisms were allocated an intermediate one, and those HVS1 sites with a mutation rate over 3 times the average of HVS1 were downweighted (Meyer et al. 1999). The mtDNA position 16519 was omitted from network and coalescence age calculations due to its hypervariable status. The relative sample sizes of different populations were calculated to ease the interpretation of the network figures drawn from population samples of varying size. Coalescence ages for the haplogroups were also calculated with the Network 4.112 software. The mtDNA coalescence analysis was based on HVS1 (16090–16365) variation with a mutation rate of 1 transition per 20180 years (Forster et al. 1996). For Y-chromosomal coalescence estimates a rate of 1 mutation per 3623 years was used (Zhivotovsky et al. 2004). Bayesian calculations of coalescence age were also performed (Wilson et al. 2003), but they were so sensitive to small changes in the data that the network approach was preferred for the final analysis.
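The coalescence ages rest on the average number of mutations separating the sampled haplotypes from the root haplotype (the rho statistic), multiplied by the mutation rate. The sketch below illustrates that arithmetic with the HVS1 rate quoted above. It is a simplification and not the Network implementation: it assumes a star-like genealogy, counts every difference from the root as one mutation, does not restrict the count to transitions, and uses hypothetical haplotypes.

```python
import math

YEARS_PER_TRANSITION = 20180  # HVS1 rate quoted in the text (Forster et al. 1996)

def rho_age(root, haplotypes, years_per_mutation=YEARS_PER_TRANSITION):
    """Return (rho, age in years, standard error of the age).

    Haplotypes are sets of mutated sites. Assumption: a star-like genealogy,
    so lineages are independent and var(rho) = rho / n.
    """
    n = len(haplotypes)
    # Symmetric difference = sites gained or lost relative to the root.
    steps = [len(set(h) ^ set(root)) for h in haplotypes]
    rho = sum(steps) / n
    se_rho = math.sqrt(rho / n)
    return rho, rho * years_per_mutation, se_rho * years_per_mutation

# Hypothetical HVS1 motifs (positions minus 16 000); not data from this study.
root = {311}
sample = [{311}, {311}, {311, 189}, {311, 93},
          {311, 189, 270}, {311}, {129, 311}, {311}]
rho, age, se = rho_age(root, sample)
```

For these eight made-up haplotypes, five mutations in total give rho = 0.625 and an age of about 12 600 years. On a real network the mutations are counted along the branches of the tree, which matters whenever haplotypes share derived mutations.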

Additional analyses were performed for the Y-chromosomal and mtDNA data of each population. We used the Arlequin 2.001 software (Schneider et al. 2000) for calculations of haplotype diversity (Nei 1987), mean number of pairwise differences, and genetic distances measured as FST and RST (Slatkin 1995). Multidimensional scaling (MDS) plots for genetic distances were constructed with R software (R Development Core Team 2005). SAMOVA 1.0 software (Dupanloup et al. 2002) was used to assess genetic grouping of populations. Admixture proportions were calculated by Admix 2.0 (Dupanloup & Bertorelle 2001) from the Y-chromosomal data of selected populations by calculating molecular distance as squared difference in allele size and bootstrapping with 50 000 replicates. The diversity and SAMOVA analyses were performed also for each major haplogroup. Furthermore, matches for the most common Y-chromosomal haplotypes in our dataset were found in the Y-chromosomal haplotype reference database (Willuweit & Roewer 2007, http://www.yhrd.org).
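The two diversity measures reported in Tables 2 and 4 can be written out in a few lines. The sketch below gives Nei's (1987) unbiased haplotype diversity and the mean number of pairwise differences. It is an illustration only: the function names and the example haplotypes are hypothetical, and the standard errors reported by Arlequin are not reproduced.

```python
from collections import Counter
from itertools import combinations

def haplotype_diversity(haplotypes):
    """Nei's (1987) unbiased diversity: n / (n - 1) * (1 - sum(p_i ** 2))."""
    n = len(haplotypes)
    counts = Counter(haplotypes)
    return n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))

def mean_pairwise_differences(haplotypes):
    """Average number of differing positions over all pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    diffs = [sum(a != b for a, b in zip(x, y)) for x, y in pairs]
    return sum(diffs) / len(pairs)

# Hypothetical Y-STR haplotypes (repeat counts at four loci); not study data.
haps = [(14, 11, 23, 10), (14, 11, 23, 10), (14, 12, 23, 10), (15, 11, 24, 11)]
h = haplotype_diversity(haps)
pi = mean_pairwise_differences(haps)
```

With one haplotype seen twice and two seen once, the diversity is 5/6 (about 0.83), and the six pairwise comparisons differ at two positions on average.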


Population-based Y-chromosomal Analyses

The Y-chromosomal data is given in Supplementary Table 2, and the haplogroup frequencies in Table 1. In the MDS plot of the genetic distances calculated from the total Y-chromosomal data (Fig. 2a), there were clusters consisting of the Balts, Karelians/Eastern Finns, and Swedes/Western Finns, which accounts for 8.0% of the variation. The populations in the Baltic states had a very high haplotype diversity up to 0.998, while the diversity was lower in the northeastern populations (Table 2). In the admixture analysis of the Estonians, the parental populations of Latvians+Lithuanians and Eastern+Western Finns had respective admixture coefficients of 0.81±0.20 and 0.19±0.20. For the other populations the definition of parental populations was too complex and the results too weak to yield good estimates of admixture coefficients.

Figure 2.

MDS plot of genetic distances calculated from the total Y-chromosomal data (stress 0.27) (A), and of total mtDNA data (stress 0.19) (B). The solid lines denote a division into two groups and the dotted line a division into three groups in SAMOVA analysis, and the numbers denote the respective percentages of variance. Abbreviations: Estonian (EST), Latvian (LAT), Lithuanian (LIT), Seto (SET), Karelian (KAR), Aunus Karelian (AUK), Viena Karelian (VIK), Tver Karelian (TVK), Vepsian (VEP), Ingrian (ING), Finnish (FIN), Eastern Finnish (FIE), Western Finnish (FIW), Swedish (SWE), and Russian (RUS).
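As an illustration of how a two-dimensional MDS configuration and its stress value arise from a matrix of genetic distances, a minimal sketch of classical (Torgerson) scaling follows. Which R routine and which stress definition were used for Figure 2 is not stated, so both the metric variant and the normalised stress computed here are assumptions, and the distance matrix is hypothetical.

```python
import math
import random

def classical_mds(dist, dims=2, iterations=500, seed=0):
    """Classical (Torgerson) MDS of a symmetric distance matrix.

    Eigenvectors are found by shifted power iteration with deflation,
    which is adequate for the small matrices of a population study.
    """
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    grand = sum(row) / n
    # Double-centred matrix B = -1/2 * J * D^2 * J.
    b = [[-0.5 * (sq[i][j] - row[i] - row[j] + grand) for j in range(n)]
         for i in range(n)]
    # Gershgorin bound: adding it to the diagonal makes every eigenvalue
    # non-negative, so power iteration finds the largest algebraic one.
    shift = max(sum(abs(x) for x in r) for r in b)
    rng = random.Random(seed)
    coords = [[0.0] * dims for _ in range(n)]
    for axis in range(dims):
        v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        for _ in range(iterations):
            w = [sum(b[i][j] * v[j] for j in range(n)) + shift * v[i]
                 for i in range(n)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient on the unshifted matrix gives the eigenvalue.
        lam = sum(v[i] * b[i][j] * v[j] for i in range(n) for j in range(n))
        scale = math.sqrt(max(lam, 0.0))
        for i in range(n):
            coords[i][axis] = scale * v[i]
        # Deflate so the next pass converges to the next eigenvector.
        for i in range(n):
            for j in range(n):
                b[i][j] -= lam * v[i] * v[j]
    return coords

def stress(dist, coords):
    """Normalised stress: sqrt(sum((fitted - observed)^2) / sum(observed^2))."""
    num = den = 0.0
    n = len(dist)
    for i in range(n):
        for j in range(i + 1, n):
            fitted = math.dist(coords[i], coords[j])
            num += (fitted - dist[i][j]) ** 2
            den += dist[i][j] ** 2
    return math.sqrt(num / den)

# Hypothetical distances between four populations, chosen to be exactly
# Euclidean (corners of a 3 x 4 rectangle) so that two axes fit perfectly.
dist = [[0, 3, 5, 4],
        [3, 0, 4, 5],
        [5, 4, 0, 3],
        [4, 5, 3, 0]]
coords = classical_mds(dist)
fit = stress(dist, coords)
```

Because the toy distances are exactly two-dimensional, the stress is essentially zero. Real genetic distance matrices are not, and stress values such as those quoted in the caption measure how much of the distance structure is lost in the two-dimensional plot.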

Table 2.  The haplotype diversities and coalescence ages of the total Y-chromosomal data and the major haplogroups
                 Total                       I1a                         N3                          R1a1
                 n     Haplotype diversity   n     Haplotype diversity   n     Haplotype diversity   n     Haplotype diversity
Estonian         118   0.998 ± 0.002         14    0.956 ± 0.045         40    0.994 ± 0.008         44    0.992 ± 0.007
Latvian          113   0.992 ± 0.003         4     NA                    47    0.962 ± 0.013         44    0.994 ± 0.006
Lithuanian       164   0.990 ± 0.004         8     1.000 ± 0.063         72    0.936 ± 0.020         56    0.990 ± 0.006
Karelian         132   0.966 ± 0.009         20    0.895 ± 0.043         70    0.899 ± 0.030         33    0.936 ± 0.027
Eastern Finnish  306   0.966 ± 0.006         58    0.968 ± 0.011         217   0.934 ± 0.013         18    0.994 ± 0.021
Western Finnish  230   0.974 ± 0.006         91    0.880 ± 0.030         95    0.960 ± 0.009         20    0.990 ± 0.019
Swedish          160   0.989 ± 0.003         57    0.939 ± 0.021         23    0.917 ± 0.040         39    0.991 ± 0.008
TMRCA (ky)       1223                        252   7.7 ± 1.3             564   8.8 ± 1.5             254   10.7 ± 1.4

Y-chromosomal Haplogroups

Haplogroup N3 was much more common on the eastern side of the Baltic Sea than in Sweden (χ² test, p < 0.001) (Table 1). In the STR network (Fig. 3a) many haplotypes are highly specific for either Finno-Ugric or Baltic-speaking populations, which is supported by the SAMOVA analysis, in which the grouping into Latvians and Lithuanians vs. the others explains as much as 20.3% of the variation. The Estonians harbor both Baltic and Finnic haplotypes, and they had the highest haplotype diversity (Table 2). The age of the haplogroup was 8,800 years.
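The frequency contrast can be checked with a simple 2 × 2 chi-square test on the N3 sample counts given in Table 2. The sketch below is an illustration under stated assumptions: how the populations were grouped for the reported test is not specified, so pooling all non-Swedish populations as the "eastern side" is an assumption, and no continuity correction is applied.

```python
import math

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) and p-value for [[a, b], [c, d]].

    With one degree of freedom, the upper tail of the chi-square
    distribution equals erfc(sqrt(chi2 / 2)).
    """
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    return chi2, math.erfc(math.sqrt(chi2 / 2))

# (N3 carriers, total males) per population, read from Table 2.
n3 = {"Estonian": (40, 118), "Latvian": (47, 113), "Lithuanian": (72, 164),
      "Karelian": (70, 132), "Eastern Finnish": (217, 306),
      "Western Finnish": (95, 230), "Swedish": (23, 160)}

# Assumption: every population except the Swedes counts as the eastern side.
east_n3 = sum(k for pop, (k, n) in n3.items() if pop != "Swedish")
east_other = sum(n - k for pop, (k, n) in n3.items() if pop != "Swedish")
swe_n3, swe_n = n3["Swedish"]
chi2, p = chi_square_2x2(east_n3, east_other, swe_n3, swe_n - swe_n3)
```

Under this pooling, N3 is carried by 541 of 1063 males east of the Baltic Sea against 23 of 160 Swedes, and the resulting p-value falls far below the 0.001 threshold quoted in the text.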

Figure 3.

Median-joining networks of the haplotype variation of Y-chromosomal haplogroups N3 (A), I1a (B), and R1a1 (C). The lengths of the branches correspond to the number of STR mutations, and the sizes of the nodes denote the number of samples of the corresponding haplotype. Note that the singleton haplotypes have been excluded from the N3 network. The pie chart denotes the proportions of the different populations in the total sample set of the Y-chromosome. For the abbreviations, see Figure 2.

Haplogroup I1a had a high frequency of up to 40% in Sweden and Western Finland and intermediate frequencies in the other Finno-Ugric populations, while it was almost absent among Latvians and Lithuanians (Table 1). In the network the Finnish and Swedish haplotypes appear to be separate (Fig. 3b), which is supported by the SAMOVA analysis that separated the Swedes/Balts from the others with a moderate 6.9% of variation among these groups. The highest diversities were in Eastern Finland, the Baltic states, and Sweden (Table 2). The age of the haplogroup was 7,700 years (Table 2).

Haplogroup R1a1 had high frequencies up to 39% among all the populations with the exception of the Finns (Table 1). In the very starlike network (Fig. 3c), the Karelians exhibited a limited diversity of haplotypes, which is consistent with the haplotype diversity calculations (Table 2) and SAMOVA analysis, where the grouping of the Karelians versus the others explains 9.0% of the variation. The age of the haplogroup was as high as 10,700 years (Table 2).

The frequencies of the other haplogroups were low and mostly lacked strong geographical patterns. R1b was common in Sweden and in Western Finland, whereas I1b was more abundant in the Baltic states and Karelia.

Population-based Mitochondrial DNA Analyses

The full mitochondrial DNA data is given in Supplementary Table 3, and the mtDNA haplogroup frequencies are given in Table 3. The MDS plot of the genetic distances calculated from the total mtDNA data had only one cluster, with the Karelian populations lying exclusively on one side of it, a pattern mainly supported by the SAMOVA analysis (Fig. 2b). The mean pairwise differences were relatively similar across the populations, with slightly lower numbers among the Karelians and Russians (Table 4).

Table 3.  The mitochondrial DNA haplogroup frequencies (%) within the populations
     AunusVienaTver  Total   
1 From Finnilä et al. (2001)
2 From Malyarchuk & Derenko (2001) and Loogväli et al. (2004)

Table 4.  The mean pairwise differences (π) and coalescence ages of the total mtDNA data and the major haplogroups.
                 Total                  H                      U
                 n      π               n     π                n     π
Estonian         117    7.92 ± 3.71     48    4.03 ± 2.05      26    7.65 ± 3.69
Latvian          114    7.77 ± 3.64     40    4.24 ± 2.15      42    7.01 ± 3.36
Lithuanian       163    7.48 ± 3.51     76    3.98 ± 2.01      29    5.97 ± 2.93
Seto             56     7.52 ± 3.56     24    4.74 ± 2.40      20    6.28 ± 3.11
Aunus Karelian   218    7.01 ± 3.30     100   3.69 ± 1.88      70    5.68 ± 2.76
Viena Karelian   87     7.14 ± 3.38     34    2.45 ± 1.36      31    4.55 ± 2.30
Tver Karelian    61     6.37 ± 3.06     30    2.70 ± 1.48      18    5.42 ± 2.74
Vepsian          64     5.61 ± 2.73     37    2.88 ± 1.55      13    3.82 ± 2.05
Ingrian          38     6.70 ± 3.23     19    3.64 ± 1.93      7     NA
Total Karelian   512    6.72 ± 3.18     240   3.27 ± 1.69      146   5.32 ± 2.58
Finnish¹         79     7.45 ± 3.52     31    4.38 ± 2.22      43    6.44 ± 3.11
Swedish          307    7.03 ± 3.31     140   4.07 ± 2.04      87    6.11 ± 2.93
Russian²         50     6.12 ± 2.96     19    2.90 ± 1.59      7     NA
TMRCA (ky)       1399                   618   36.7 ± 5.6       400   68.4 ± 13.4

¹ From Finnilä et al. (2001)
² From Malyarchuk & Derenko (2001) and Loogväli et al. (2004)

Mitochondrial DNA Haplogroups

Haplogroup H was very common among all the populations, but with considerable variation in the subhaplogroup frequencies (Table 3). H1* was common among the Karelians, Swedes, and some Baltic populations with frequencies up to 18%, and rare especially in Finland and Estonia, while H1f was very specific to the Finns and Karelians. H3 was relatively rare, with frequencies of a few percent. The mean pairwise differences were again lower among the Karelians and Russians and, surprisingly, the highest among the Seto (Table 4). The low diversity of haplogroup H among the Karelians can also be observed in the haplotype network (Fig. 4a): compared to e.g. the Balts, a much larger proportion of the Karelian haplotypes are in high-frequency modal haplotypes. The median-joining network of haplogroup H consists of highly starlike clusters, and the haplotypes tend to have either very high or very low frequencies. The differences in subhaplogroup frequencies are clearly visible in the network, e.g. in the abundance of H2 among the Swedes. The coalescence age of haplogroup H in our dataset was 36,700 years (Table 4).

Figure 4.

Median-joining networks of the haplotype variation of mitochondrial DNA haplogroups H (A) and U (B). The lengths of the branches correspond to the number of mutations, and the sizes of the nodes denote the number of samples of the corresponding haplotype. The HVS1 mutation sites are given minus 16 000, and the coding region polymorphisms are underlined. The bases are given only for the sites where there are more than two alleles; for the others, see Supplementary Table 3. Note that position 16519 is not included in the network. The pie chart denotes the proportions of the different populations in the total sample set of the mtDNA. For the abbreviations, see Figure 2.

Of the U subhaplogroups, U4 was the most frequent among the Latvians, Seto, and Tver Karelians (7.1–8.8%) (Table 3). U5b and U5b1b1 were common in Karelia and especially among the Viena Karelians, where U5b1b1 reached a high frequency of 16.1%. The Estonians had the highest number of mean pairwise differences (Table 4). Haplogroup U also had a network with some starlike clusters, but to a lesser degree than haplogroup H (Fig. 4b). Subhaplogroups U5b*, U5b1b1 and K in particular had very common modal haplotypes, whereas the variation of U5a was scattered into haplotypes of moderate frequency. The SAMOVA analysis separated the Seto and Vepsians, both with small sample sizes, from the main group. The coalescence age was as high as 68,400 years (Table 4).

Haplogroup Z was observed among the Finns, some Karelian populations, Russians and Swedes with a low frequency. Asian haplogroups A, C, G and D were rare in the Baltic Sea region with the exception of D5 that reached a high frequency of 11.5% in Viena Karelia (Table 3).


The aim of this study was to unravel the paternal and maternal population history of the Baltic Sea region, both the migrations affecting the region from outside and internal population events. To yield the maximum resolution of our extensive dataset, we chose to combine population-based analyses with regional analysis of phylogeography of the major haplogroups. Haplotype diversity, age and network calculations are conventional tools of phylogeographic analysis of lineages, but the problem of network analysis is that the method itself does not perform quantification and statistical testing of the trends one may intuitively observe. For this purpose we applied the SAMOVA also for analysis of haplogroups. Jointly, these analyses yielded an understanding of the history of the main haplogroups of the region, which allowed a better interpretation of haplogroup frequency patterns across the studied region.

The Y-chromosomal Haplogroups

The frequency distribution and age of haplogroup N3 in our study sample was consistent with the earlier studies (Lahermo et al. 1999, Zerjal et al. 2001, Tambets et al. 2004, Karlsson et al. 2006, Rootsi et al. 2007). According to the YHRD database, the haplotypes most common in Finland and Karelia were relatively unique, which is not unexpected, since data from most Eurasian populations where N3 is common is not publicly available. It seems evident that the Finns and Karelians share a history regarding haplogroup N3. In the database comparisons, we also observed that N3 may mark a westward diffusion in the north from Finland to Sweden and in the south from the Baltic countries to Poland and Germany.

It has been suggested (Zerjal et al. 2001, Roewer et al. 2005) that the differences in the haplotype structure of Baltic-speaking Latvians and Lithuanians and Finno-Ugric populations, also observed in our N3 data, imply that the migrations introducing N3 to the region followed a bifurcating pattern, which discourages the idea of a language switch among Latvians and Lithuanians (Laitinen et al. 2002). Another factor contributing to the pattern of genetic variation could be the divergence by linguistic isolation between Baltic and Finnic speakers. However, such a process would affect all haplogroups, which is not supported by genetic distances calculated from the total Y-chromosomal data set – the strong divergence of the Baltic speakers is at least in part specific for N3. A founder effect among the Baltic populations is not any better an explanation, since the diversity among the Baltic speakers is not decreased. Thus, our data supports the idea of two migrations that introduced N3 to the Baltic Sea region (Fig. 5). The haplotype variation in Estonia suggests an admixture of Baltic and Finno-Ugric haplotypes. Furthermore, a bifurcating migration pattern can contribute to the relatively high age of the haplogroup in the Baltic region, since the coalescence age represents the common root of the total variation in the region.

Figure 5.

Suggested arrival routes of the most important Y-chromosomal haplogroups in the Baltic area, with the dotted arrows denoting less certain routes.

Haplogroup I1a is suggested to have its origins in the Iberian refugium, from where it spread northward and now has its highest frequencies in Northern Europe (Rootsi et al. 2004). The haplotype matches to Germany and Poland imply that I1a arrived in the Nordic countries from the Southern Baltic Sea region, which is historically plausible. The coalescence age of the haplogroup is about 5000 years lower than the age of the earliest archaeological findings from the Northern Baltic Sea region, which suggests a Neolithic arrival. The haplogroup could have reached Eastern Finland and Karelia from Central Europe by an exclusively western route via Sweden, by an eastern route via the Baltic states, or by both (Fig. 5). The surprisingly high diversities of I1a among the eastern Finnish and Baltic populations, and the lack of association between the Western Finns and the Swedes in SAMOVA analysis suggest that I1a has been involved in bifurcating migrations both via Sweden and the Baltic states, and that the presence of the haplogroup in Finland and Karelia is not merely due to Swedish influence. The low frequency of I1a among the Baltic populations may be due to later effects of genetic drift or replacement.

Haplogroup R1a1 is known to be most prevalent in Eastern Europe, and has possibly expanded alongside the Kurgan culture and/or the Indo-European language (Semino et al. 2000). The Baltic and Swedish haplotypes had affinities mainly with Germany and Poland in database comparisons, which suggests gene flow from that region to the Western and Eastern coasts of the Baltic Sea. It is plausible that both R1a1 and I1a were carried to the Baltic Sea region via the same Neolithic migrations from Germany/Poland. The higher coalescence age and the starlike network structure of R1a1 are consistent with the probable higher diversity and frequency of R1a1 in the original source population(s), a consequence of the wider geographical distribution of the haplogroup. It is an important observation that in the Baltic Sea region R1a1 is mainly associated with Central European rather than eastern or Russian influence. However, haplotype frequency comparisons (Derenko et al. 2006, Willuweit & Roewer 2007) give some indication of Russian gene flow as a partial source of R1a1 in Karelia, which would be plausible given the long period of admixture with Slavs (Fig. 5). It should be noted, though, that the Y-chromosomal diversity in Karelia has been heavily affected by drift and founder effects. Another haplogroup with eastern affinity is I1b (Rootsi et al. 2004), whose presence in Karelia and the Baltic states is probably a sign of Russian gene flow.

Haplogroup R1b, the most common haplogroup of Western Europe (Rosser et al. 2000, Semino et al. 2000), was relatively rare in the northern Baltic Sea region, which, with the exception of Sweden, shows a lack of recent West European influences. R1b has been suggested as having its origins in the Iberian refugia (Semino et al. 2000) in a similar manner to I1a (Rootsi et al. 2004) but their frequency distribution points to a very different history for R1b and I1a, at least in Northern Europe.

Mitochondrial DNA Haplogroups

The specificity of H1f for the Finnish population has been associated with drift within Finland (Loogväli et al. 2004), which is supported by the low number of high frequency haplotypes in the network. Its abundance also in Aunus and Viena Karelia and Ingria provides strong support for the close historical ties between the Finns and Karelians. The high frequency of haplogroup H2 among the Swedes may be due to sampling bias or local genetic drift, since most of the Swedish H2 samples belong to a single haplotype.

Haplogroups H1, H3, U5b and V have been associated with the expansion from the Iberian refugia after the Ice Age (Torroni et al. 1998, 2006, Achilli et al. 2004, 2005). Interestingly, H1 and U5b have frequency peaks in Northern Europe in addition to the Iberian peninsula, and our Karelian sample even had a higher H1 frequency than that of the Basques (Achilli et al. 2004, 2005, Torroni et al. 2006). We suggest three possible scenarios to explain this pattern. First, it is possible but unlikely that genetic drift has consistently increased the frequencies of these haplogroups in the entirety of Northern Europe, though it probably has contributed to the frequency peaks among the Karelians. Another explanation for the high frequencies would be migration from Southern to Northern Europe, possibly via the Atlantic and Baltic coasts, but the archaeological evidence for this is limited. The third alternative scenario would be initially high H1 and U5b frequencies in the entirety of Europe that were partly replaced by other haplogroups in Central Europe due to subsequent migrations that did not affect the north and the southwest. We consider this to be historically and genetically plausible, but further research is needed to analyze this in detail. Haplogroups H3 and V, despite their similar origin in the Iberian refugia, do not follow the same frequency pattern as H1 and U5b, as their frequencies in the Baltic Sea region are barely any higher than in Central Europe. However, the low frequencies of these haplogroups across Europe make reliable comparisons difficult, and even though there is evidence for a similar historical scheme, they are still distinct haplogroups with possibly different histories.

Haplogroup U is an ancient European haplogroup with an age as great as 55,000 years (Richards et al. 2000), and it also had a very old coalescence age in our analysis. U5b1b1, the so-called “Saami motif”, was very common among the Karelian populations, especially in Viena, which, together with the high frequency of D5 and the presence of Z, is a clear sign of shared population history for the Saami and Karelians (Ingman & Gyllensten 2006). This is also supported by archaeological evidence, since many prehistoric cultural features of the Saami people have arrived from the east via Karelia.

The eastern elements in the mtDNA variation of the Baltic Sea region are intertwined with the Saami influence. Recent studies of the mtDNA variation among the Saami show a link to the Volga-Ural region (Tambets et al. 2004, Ingman & Gyllensten 2006), which is now shown to exist also among the Karelians and, to a lesser degree, among the other populations from the Baltic Sea region as well. Additionally, the presence of U4 in the Eastern Baltic Sea populations may represent eastern influence, since it is typical for the Volga-Ural region (Bermisheva et al. 2002). The high diversity of this haplogroup in the Baltic region, observable in the haplotype network, suggests a complex history, and rules genetic drift out as a cause of the high frequency. All in all, these mtDNA haplogroups may be maternal reflections of the eastern influence that can be most clearly observed in the Y-chromosomal haplogroup N3.

Population-based Y-chromosomal and mtDNA Analyses

Genetic distances between the populations are in accordance with linguistic and geographic boundaries in our study and also previous analyses (Zerjal et al. 2001) regarding the close proximity of the Baltic countries, and the close association of Finland and Karelia. Admixture analysis of the Estonians supports the importance of geographical location compared to language in determining genetic variation. The Finnish Y-chromosomal variation has been affected by drift, and the grouping of Finns in the paternal MDS plot highlights their bimodal population structure (Kittles et al. 1998, 1999, Lahermo et al. 1999, Lappalainen et al. 2006). The Karelians share the low diversity of the Finns also on the maternal side, especially the Viena Karelians, isolated by their northern geographical location, and the Tver Karelians among whom a founder effect has probably taken place. The diversities of the Balts and the Swedes were in accordance with common European values, and reflect their less isolated geographical position and a probably more stable population size.


The populations of the Baltic Sea region have their strongest roots in Central Europe, which is compatible with archaeological information regarding the arrival of the first inhabitants and several later prehistoric cultures. Additionally, the populations from the eastern side of the Baltic Sea region carry signs of migrations rooted in the east that may be associated with the Finno-Ugric language and/or the Comb Ceramic culture. Less pronounced local admixtures with other populations include Saami influence especially among the Karelians, Russian admixture among them and the Balts, and Central European influence in Sweden. An interesting phenomenon our data has confirmed is the common features between the Iberian peninsula and Northern Europe, observed especially in mtDNA variation, and we hope that future research will shed light on its cause. Furthermore, several Y-chromosomal and mtDNA haplogroups in the Baltic Sea region are of Palaeolithic origin in Europe. This is consistent with earlier observations (Richards et al. 2000) suggesting preservation or even enrichment of the traces of the most ancient European settlement in the northern periphery of the continent. In conclusion, this study also provided strong evidence that the Baltic Sea has been a route not only for trade and cultural exchange but also for population migration.


The study was financially supported by the Emil Aaltonen Foundation, the Finnish Cultural Foundation, and the Research Foundation of the University of Helsinki. The Finnish Genome Center provided the genotyping platforms for the STRs and the SNPs by Sequenom®. We thank Pertti Sistonen, Tuula Koski and Richard Villems for providing samples, and Ella Granö for technical assistance. Finally, we would like to express our gratitude to the sample donors for their contribution.