Mitochondrial DNA Portrait of Latvians: Towards the Understanding of the Genetic Structure of Baltic-Speaking Populations

Authors

  • L. Pliss,

    Corresponding author
    1. Biomedical Research and Study Centre, University of Latvia, Riga, Latvia
    2. Department of Evolutionary Biology, Institute of Molecular and Cell Biology, University of Tartu and Estonian Biocentre, Tartu, Estonia
      *Corresponding authors: Liana Pliss, Ratsupites Street 1, LV-1068, Riga, Latvia, Phone: +3717808218, Fax: +3717442407. E-mail: liana_@navigator.lv; Kristiina Tambets, 23 Riia Street, 51010 Tartu, Estonia, Phone: +3727375053, Fax: +3727420286. E-mail: ktambets@ebc.ee
    • #

      these authors contributed equally to this work

  • K. Tambets,

    Corresponding author
    1. Department of Evolutionary Biology, Institute of Molecular and Cell Biology, University of Tartu and Estonian Biocentre, Tartu, Estonia
    • #

      these authors contributed equally to this work

  • E.-L. Loogväli,

    1. Department of Evolutionary Biology, Institute of Molecular and Cell Biology, University of Tartu and Estonian Biocentre, Tartu, Estonia
  • N. Pronina,

    1. Latvian State Medical Genetics Centre, Riga, Latvia
  • M. Lazdins,

    1. Department of Biochemistry and Molecular Biology, University of Latvia, Riga, Latvia
  • A. Krumina,

    1. Riga Stradins University, Riga, Latvia
  • V. Baumanis,

    1. Biomedical Research and Study Centre, University of Latvia, Riga, Latvia
  • R. Villems

    1. Department of Evolutionary Biology, Institute of Molecular and Cell Biology, University of Tartu and Estonian Biocentre, Tartu, Estonia


Summary

Mitochondrial DNA (mtDNA) variation was investigated in a sample of 299 Latvians, a Baltic-speaking population from Eastern Europe. Sequencing of the first hypervariable segment (HVS-I) in combination with analysis of informative coding region markers revealed that the vast majority of observed mtDNAs belong to haplogroups (hgs) common to most European populations. Analysis of the spatial distribution of mtDNA haplotypes found in Latvians, as well as in Baltic-speaking populations in general, revealed that they share haplotypes with all neighbouring populations irrespective of their linguistic affiliation. Hence, the results of our mtDNA analysis show that the previously described sharp difference between the Y-chromosomal hg N3 distribution in the paternally inherited gene pool of Baltic-speaking populations and of other European Indo-European speakers does not have a corresponding maternal counterpart.

Introduction

Since the first studies that showed considerable mtDNA variation in different geographical subsets of humans (e.g. Brown, 1980; Cann et al. 1987; Vigilant et al. 1991), the analysis of mtDNA polymorphisms has been a favoured tool in population genetics. In uniparentally inherited non-recombining human mtDNA, polymorphisms have accumulated sequentially along radiating maternal lineages from sets of often continent-specific mtDNA founders, providing a detailed record of the ancient migration patterns of women (e.g. Wallace et al. 1999). Large-scale studies of European mtDNA diversity have mostly concentrated on Western and Central European populations (e.g. Richards et al. 2000, 1998; Torroni et al. 1994). In the last few years, data on maternal lineages from Eastern European and Slavonic-speaking populations have started to emerge (e.g. Belyaeva et al. 2003; Bermisheva et al. 2002; Malyarchuk & Derenko 2001; Malyarchuk et al. 2003, 2002; Tolk et al. 2000).

So far, population genetic studies of the two extant Baltic-speaking populations – Latvians and Lithuanians, who form a separate branch of the Indo-European language family tree – have mostly touched on the variation of classical genetic markers and the pattern of spread of polymorphisms associated with diseases, as well as variation of the Y chromosome (Beckman et al. 1999, 1998; Kasnauskiene et al. 2003; Krumina et al. 2001; Kučinskas 1994, 2001; Lahermo et al. 1999; Laitinen et al. 2002; Pronina et al. 2003; Rosser et al. 2000; Zerjal et al. 2001). MtDNA variation in Baltic-speaking populations has been less studied to date. Recently, Kasperavičiūtė et al. (2004) showed that mtDNA variation among different Lithuanian regions is limited and that Lithuanian mtDNA variation is closely related to that of both Slavonic- and Finno-Ugric-speaking populations of Northern and Eastern Europe. Practically no data are available regarding mtDNA variation among Latvians.

Genetic investigations of classical genetic markers (e.g. TF*DCH1, PI, LWb) among Latvians and Lithuanians have revealed genetic stratification of Baltic-speakers at the intrapopulation level (Beckman et al. 1999; Krumina et al. 2001), as well as differences between the Baltic-speakers and other Indo-European and Finno-Ugric-speaking populations of the Baltic Sea region (Beckman et al. 1998; Sistonen et al. 1999). The analysis of one of the most common genetic diseases in Europeans – phenylketonuria (PKU) – has revealed the predominance of a single mutation in the phenylalanine hydroxylase (PAH) gene, R408W of haplotype 2, in the Baltic states. Although widespread throughout Eastern Europe, R408W has its frequency peak in Latvia, Lithuania and Estonia, where it comprises about four fifths of PKU haplotypes (Kasnauskiene et al. 2003; Lilleväli et al. 1996; Pronina et al. 2003). It has been suggested that this mutation originated in an ancient eastern European population, from where it spread westward (Eisensmith et al. 1995).

Meanwhile, the analysis of five Y-chromosomal markers showed the highest genetic similarities between Estonian, Latvian and Lithuanian males, followed by Finno-Ugric-speaking Mari (Laitinen et al. 2002). However, Zerjal et al. (2001) concluded, based on the analysis of both hg frequency distribution and microsatellite variation, that the Y chromosomes of Finno-Ugric- and Baltic-speaking populations have distinct genetic histories. One of the main players in Y chromosomal variation in Northern Eurasian populations is hg N3 (formerly hg 16 or TatC allele). This hg is frequent among Finno-Ugric-speaking populations (Laitinen et al. 2002; Rootsi et al. 2000; Zerjal et al. 2001; Tambets et al. 2004) and many Siberian populations (Karafet et al. 2002; Zerjal et al. 2001; Tambets et al. 2004), but present only at very low frequencies in Southern and Western Europe (Rootsi et al. 2000; Zerjal et al. 2001, 1997). Hg N3 is also frequent in Baltic-speaking populations (Lahermo et al. 1999; Laitinen et al. 2002), but less so in the Slavonic-speaking neighbours of Latvians and Lithuanians (Rootsi et al. 2000; Rosser et al. 2000; Zerjal et al. 1997), even though the two linguistic families – Slavonic and Baltic – are sister groups in the Indo-European tree of languages. According to one of the latest estimates, the split between the two linguistic families occurred only approximately 3400 years ago (Gray & Atkinson, 2003).

Summing up, several problems arise in inferring the genetic history of Latvians and Baltic-speakers in general. How does the genetic variability of the Latvian mtDNA pool (and that of Lithuanians) reflect their linguistic background compared to neighbouring populations, in particular Finno-Ugric and Slavonic speakers? How does the pattern of mtDNA variation in Latvians correspond to that observed for their Y chromosomes, where the Latvians and Lithuanians, contrary to their linguistic affinity, are very close to their Finnic-speaking neighbours? And, from a microevolutionary point of view, do the two extant Baltic-speaking populations – Latvians and Lithuanians – display distinct maternal lineages that can be considered region-specific for the two populations? To address these questions, we have analyzed 299 Latvian mtDNA genomes and present the results in a comprehensive phylogeographic context.

Subjects and Methods

Population Samples

The Latvian sample consisted of 299 unrelated healthy volunteers representing four anthropologically, archaeologically and ethno-linguistically different regions of Latvia: 88 from the North–Western region (Northern Curonia), 67 from the Central region (Semigalia), 68 from the South–Western region (Southern Curonia), and 76 from the Eastern (Lettigalia) region (Figure 1). The informed consent of the volunteers was obtained and their ethnicity, as well as maternal ancestry over the last three generations, was established from interviews. All DNA samples were taken anonymously. The Ethics Committee of Riga Stradins University approved the research protocol.

Figure 1.

Map of Latvia. The four regions where the samples for the present study were collected are shown with a dashed line. The sample size (N) for each region is given in brackets.

For comparison, an mtDNA database of 11,236 individuals, consisting of published mtDNA data (Baasner et al. 1998; Bermisheva et al. 2002; Cali et al. 2001; Comas et al. 1998; Crespillo et al. 2000; Derbeneva et al. 2002a, 2002b; Derenko et al. 2003; Dimo-Simonin et al. 2000; Dubut et al. 2004; Helgason et al. 2001, 2000; Kasperavičiūtė et al. 2004; Kittles et al. 1999; Lahermo et al. 1996; Larruga et al. 2001; Loogväli et al. 2004; Lutz et al. 1998; Malyarchuk & Derenko, 2001; Malyarchuk et al. 2003, 2002; Meinilä et al. 2001; Mogentale-Profizi et al. 2001; Passarino et al. 2002; Pereira et al. 2001; Pfeiffer et al. 1999; Richards et al. 2000; Tagliabracci et al. 2001) as well as our unpublished mtDNA data from 4732 individuals from different Eurasian populations, was used as background information for the analysis.

mtDNA Sequencing and Genotyping

DNA was extracted from venous blood using the standard phenol–chloroform method as described in Sambrook (1989). A DNA fragment, encompassing the mtDNA HVS-I between nucleotide positions (nps) 16024–16383, was amplified and sequenced in all samples; HVS-II sequences between nps 70–350 were determined only for selected samples from hg U4 and H.

Purified PCR products (see Werle et al. 1994) were sequenced directly from both strands by use of the DYEnamic™ ET terminator cycle sequencing kit (Amersham Pharmacia Biotech, Sweden) according to the manufacturer's protocol on the MegaBace 1000 DNA automated sequencer (Amersham Pharmacia Biotech). The sequences were compared with the revised Cambridge Reference Sequence (rCRS; Andrews et al. 1999) by use of the Genetics Computer Group Wisconsin Package or Contig Express software (ABI, USA). Length polymorphisms of the A and C stretches between nps 310–315 and 16180–16188 were disregarded in the analysis. Similarly, transversions adjacent to the poly-C tract in positions 16184–16193 were ignored as probable sequencing artefacts.
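The filtering rules in the preceding paragraph can be sketched in a few lines of Python. This is only an illustration, not the authors' pipeline: the representation of a difference from the rCRS as a `(position, reference, observed)` tuple with `"-"` standing for a gap, and the function names, are assumptions made for the example. Only the position ranges come from the text.

```python
# Position ranges (rCRS numbering) quoted in the text.
LENGTH_POLYMORPHISM_RANGES = [(310, 315), (16180, 16188)]
POLY_C_TRANSVERSION_RANGE = (16184, 16193)

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_transversion(ref, alt):
    """True for a purine <-> pyrimidine substitution."""
    return (ref in PURINES and alt in PYRIMIDINES) or (
        ref in PYRIMIDINES and alt in PURINES
    )


def keep_variant(pos, ref, alt):
    """Return False for differences that the analysis disregards."""
    is_indel = ref == "-" or alt == "-"  # assumed gap symbol
    # Length variation inside the A and C stretches is dropped.
    if is_indel and any(lo <= pos <= hi for lo, hi in LENGTH_POLYMORPHISM_RANGES):
        return False
    # Transversions next to the poly-C tract are treated as probable artefacts.
    lo, hi = POLY_C_TRANSVERSION_RANGE
    if not is_indel and lo <= pos <= hi and is_transversion(ref, alt):
        return False
    return True


def filter_variants(variants):
    """Keep only the differences that enter the haplotype."""
    return [v for v in variants if keep_variant(*v)]
```

For instance, a C to A change at np 16186 and an inserted C at np 315 would both be dropped, while a T to C transition at np 16189 would be kept.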

To confirm the hg affiliations of mtDNA sequences, hierarchical RFLP analysis was performed using 17 restriction endonucleases (see Table 1). Nucleotide variants at coding region sites 5656, 6776, 7385, 10927 and 11812 were ascertained by sequencing. For detection of the mutation at np 456 (from C to T) allele-specific PCR was used.

Table 1.  Mitochondrial DNA HVS-I haplotypes of Latvians
Columns: HVS-I haplotype (−16 000); Haplogroup; Diagnostic sites¹; Exact HVS-I matches² (LAT, LIT, RUS, POL, GER, FRA, NOR, FIN)

Diagnostic sites, in column order: 73v, T239C, T310C, C456T, 3007d, 4332x, 4449j, 4577q, 4643k, 4769a, 4793z, 4831f, 5003c, A5656G, T6776C, 7025a, A7385G, 7474d, 8249b, 8446w, 8994e, 10032a, 10397a, T10927C, A11812G, 12308g, 12629b, 13704t, 14465s, 14766u, 15606a, 15904u, 16478c

¹ The restriction enzymes used in the analysis are designated by the following single-letter codes: a – AluI; b – AvaII; c – DdeI; d – Bsh1236I; e – HaeIII; f – HhaI; g – HinfI; j – MboI; k – RsaI; q – NlaIII; s – AccI; t – BstOI; u – MseI; v – Alw44I; w – SspI; x – Eco47I; z – BsuRI

² The presence of the HVS-I haplotype in a population is indicated with x. Population data are from: LAT – Latvians, this study; LIT – Lithuanians (Kasperavičiūtė et al. 2004); RUS – Russians (Malyarchuk & Derenko, 2001; Malyarchuk et al. 2002; Orekhov et al. 1999); POL – Poles (Malyarchuk et al. 2002; Richards et al. 2000); GER – Germans (Baasner et al. 1998; Hofmann et al. 1997; Lutz et al. 1998; Pfeiffer et al. 1999; Richards et al. 1996); FRA – French (Cali et al. 2001; Danan et al. 1999; Dubut et al. 2004); NOR – Norwegians (Helgason et al. 2001; Opdal et al. 1998; Passarino et al. 2002); FIN – Finns (Kittles et al. 1999; Lahermo et al. 1996; Meinilä et al. 2001; Pult et al. 1994; Sajantila et al. 1995)

093H*-T C+- -- + T- + -4xxxxxxx
093 129 316H*-T C+- -- + T- + -1 x x
093 293H*-T C+- -- + T- + -1 
093 311 362H*-T C+- -- + T- + -1 x 
129H*-T C+- -- + T- + -6xxxxxxx
168H*-T C+- -- + T- + -1 xxxx 
181H*-T C+- -- + T- + -1 
270H*-T C+- -- + T- + -1xx 
311H*-T C+- -- + T- + -1xxxxxxx
154 354H*-T C+- -- + T- + -1 
193 294H*-T C+- -- + T- + -1 x 
CRSH*-T C+- -- + T- + -23xxxxxxx
177H1- - - - 1 
299H1- - - - 1 xx 
320H1- - - - 2 xx 
355H1- - - - 1 x x 
093H1- - - - 1 
189H1- - - - 2xxxxxxx
CRSH1- - - - 9 
162 344H1a+ - - 1 
162H1a+ - - 2xxxxxxx
080 129 142 189 356H1b- - - 3 
080 189 261 356H1b- - - 1 
080 189 356H1b- - - 3 xx 
189 209 356 362H1b- - - 1 
189 269 356H1b- - - 1 
189 311 356H1b- - - 1 
189 356H1b- - - 9xxxx xx
189 356 362H1b- - 1 xxx x 
193 354H2- + - 1 
354H2- + - 5xxxxxxx
CRSH2- + - 2 
311H3- C- 1 
214 278H3- C- 1 
CRSH3- C- 1 
CRSH4- - - 2 
304H5- T - - 1 
167 192 304 311H5- T - - 1 x 
192 304 311H5- T - - 1 
304H5a- T + - 14 xxxxxx
192 304 311H5a- T + - 2 x 
304 311H5a- T + - 1 x xx 
362H6+C - +1 
362H6-C - +2xxxxxxx
CRSH6-C - +1 
217 311H7- + - 4 
278H11a- - - 1 xxxx 
311H11a- - - 1 
224 278 293 311H11a- - - 1x 
261 278 293 311H11a- - - 2 
278 293 311H11a- - - 5 xxx x
311HV*- + - - 7xxx x 
129 172 223 311 319I1+ + 1x x 
129 172 223 278 311 319I1+ + + - 1 
129 172 223 311I1+ + + - 8 xxxx x
129 172 223 311 355I1+ + 2x 
129 223 311I+ + 1 x 
069 126J1+ - + - 9xxxxxxx
069 126 153J1+ - + - 1 
069 126 311J1+ - + - 1 xx xx 
069 126 145 172 186 222 261J1+ - + - 1 
069 126 145 172 222 261J1+ - + - 1xxxx x 
069 126 189 291A 311J1+ - +- - - - + 1 
069 126 145 172 222 260 261J1+ - + - 1 
069 126 145 189 231 261J2+ + - - 2 x x 
069 126 145 231 239 261 355J2+ + - - 1 
069 126 193 278J2+ + - - 1 xxx x
224 311K+ + + 7xxxxxxx
051 129C 189 311 362U2+ + 9 
343U3+ + + + 5xxxxxxx
CRSU4+ C + + + 3 xxx 
214U4+ C + + + +- 1 
104 294U4+ C + + + 1 
134 356U4+ + + + 1xxxxxx 
134 356 362U4+ + + 1 
134 172 356 362U4+ + + 1 
145 356U4+ + + + + 2 
189 356U4+ + + 3xx x 
189 311 356U4+ + + 1 
223 356U4+ + + + 1 xxx x
261 316 356U4+ + + 1 
335 356U4+ + + 1 
356U4+ + + 11xxxxxxx
093 192 256 270 291U5a+ + 3xx x x
093 256 270 291U5a+ + 1 x
114A 192 256 270 286CG 292 294U5a+ + + 1 
114A 192 256 270 292 294U5a+ + 1 
114A 192 256 270 294U5a+ + + 1xxxxxxx
114A 192 256 270 294 311U5a+ + + 1 
192 256 270 320U5a+ + + 3x x 
192 256 270 292 294U5a+ + 1 
192 248 256 270 291 294 311U5a+ + + 1 
174 256 270U5a+ + + 1 
256 270U5a+ + 4 xxxxx
256 270 294U5a+ - + 2 x
189 256 270U5a+ + 1 
192 311U5b+ A + 2 x x
189 270U5b1+ G A T + 2xx xxxx
192 270U5b1+ G A T + 1 xx 
093 129 189 270U5b1b+ G +G C + 1 
126 163 186 189 294T1+ - + 4xxxxxxx
126 163 186 192 294T1+ - + 1 
126 294T2+ G + + 3 xxxxx 
126 294 296T2+ G + + 2 xxxxx 
126 294 296 304T2+ G + + 13xxxxxxx
126 294 296 304 311T2+ G + + 1x x 
126 294 296 304 362T2+ G + + 1 xx 
126 248 292 294T2+ G + + 1 
126 261 294 296 304T2+ G + + 1 x 
126 178G 294 296 304T2+ + G-+ + 1 
298V- - 4xxxxxxx
153 298V- - 2xxxx xx
153 234 298V- - 1 
153 298 311V- - 1 
153 298 319V- - 1 
051 223 292W+ + - - 1 
223 230 292W+ + 1 
223 292W+ + 1 xxxxxx
223 291 292W+ + - - 2 
223 292 295W+ + - - 1 xx x
179 292W+ + 1 
192 223 292 325W+ + + - - 5 x x
093 223 227 278 362G2a+ + + - + - - 1 
189 223 266 278X2+ - - + 1 
Totals 2991804733521282363612580
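Table 1 gives each HVS-I haplotype as positions minus 16 000, with a letter suffix on some entries (for example `114A`) and `CRS` for sequences identical to the reference. A minimal Python sketch of how such an entry could be expanded to full rCRS positions follows. The function name and the returned tuple layout are assumptions made for illustration. The suffix is kept verbatim, since the Figure 3 legend states only that suffixes specify the nucleotide change for transversions.

```python
import re


def expand_motif(motif):
    """Expand a Table 1 style motif such as '114A 192 256'.

    Returns a list of (position, suffix) tuples, where suffix is None
    when the table gives a bare position.
    """
    if motif.strip().upper() == "CRS":
        return []  # no differences from the reference in the sequenced range
    sites = []
    for token in motif.split():
        match = re.fullmatch(r"(\d{3})([ACGT]*)", token)
        if match is None:
            raise ValueError(f"unrecognised token: {token!r}")
        position = 16000 + int(match.group(1))
        sites.append((position, match.group(2) or None))
    return sites
```

Applied to the hg U2 entry `051 129C 189 311 362`, this yields the motif written in the text as 16051, 16129C, 16189, 16311, 16362.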

Classification of hgs and subhaplogroups (sub-hgs) was based on Torroni et al. (1996; 1994; 1993), Macaulay et al. (1999), Finnilä et al. (2001), Kivisild et al. (2002), Loogväli et al. (2004), Richards et al. (1998) and Tambets et al. (2004).

The phylogenetic network of mtDNA haplotypes was constructed manually and checked by use of the program NETWORK 3.1.1.1 (http://www.fluxus-engeneering.com). Relative mutation rates used to construct the network were inferred from the number of independent occurrences of changes at a particular site, derived from the dataset of nearly 16,000 HVS-I and 1000 coding sequences, including both published and unpublished material.

Statistical Analysis

Statistical analysis was based on mtDNA haplotypes that were classified into hgs (see Tables 1 and 2). The distribution of mtDNA diversity was measured using the analysis of molecular variance (AMOVA, Excoffier et al. 1992) as variation within and between population groups, which were composed either on the basis of linguistic affiliations, according to language subfamilies (Baltic-, Slavonic-, Germanic-, Finno-Ugric-speaking populations), or by geographical location of the studied populations. In the latter case the populations of North-Eastern and Eastern Europe (Latvians, Lithuanians, Estonians, Finns, Russians and Poles) were grouped and compared with populations of Western and North-Western Europe (Germans, French and Norwegians). In order to investigate the population structure of Latvians, all four Latvian ethnolinguistic groups were first treated separately and then grouped into two main subgroups – Curonians (Northern and Southern Curonians) and South-Eastern Latvians (Semigalians and Lettigalians) – according to their geographic and linguistic affiliations.

Table 2.  MtDNA haplogroup frequencies (%) among Latvians and other European populations
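Hg      Latvians¹     Lithuanians²  Russians³     Poles³        Finns⁴        Estonians¹    Mari⁵         Norwegians⁶   Germans⁷      French⁸
        IE-B          IE-B          IE-S          IE-S          U-FU          U-FU          U-FU          IE-G          IE-G          IE-R
        n (%)         n (%)         n (%)         n (%)         n (%)         n (%)         n (%)         n (%)         n (%)         n (%)
H       133 (44.5)    83 (46.1)     85 (42.3)     197 (45.2)    235 (40.5)    178 (43.5)    55 (40.4)     179 (45.1)    159 (47.7)    152 (47.5)
HV      7 (2.3)       3 (1.7)       4 (2.0)       4 (0.9)       –             6 (1.5)       2 (1.5)       1 (0.3)       2 (0.6)       4 (1.3)
preHV   –             –             1 (0.5)       –             –             –             –             –             1 (0.3)       2 (0.6)
V       9 (3.0)       9 (5.0)       9 (4.5)       21 (4.8)      38 (6.6)      16 (3.9)      15 (11.0)     16 (4.0)      15 (4.5)      6 (1.9)
preV    –             1 (0.6)       2 (1.0)       –             1 (0.2)       –             –             –             –             6 (1.9)
J       19 (6.4)      14 (7.8)      16 (8.0)      34 (7.8)      28 (4.8)      42 (10.3)     10 (7.4)      50 (12.6)     28 (8.4)      19 (5.9)
T       28 (9.4)      18 (10.0)     22 (10.9)     50 (11.5)     22 (3.8)      32 (7.8)      7 (5.1)       39 (9.8)      30 (9.0)      36 (11.3)
N1a     –             –             –             –             –             5 (1.2)       –             1 (0.3)       1 (0.3)       –
N1b     –             2 (1.1)       –             1 (0.2)       –             1 (0.2)       –             1 (0.3)       1 (0.3)       1 (0.3)
N1c     –             –             –             1 (0.2)       –             –             –             –             –             –
A       –             1 (0.6)       –             –             –             –             2 (1.5)       –             –             –
R       –             –             1 (0.5)       2 (0.5)       1 (0.2)       –             –             –             1 (0.3)       –
U       76 (25.4)     38 (21.1)     42 (20.9)     85 (19.5)     162 (27.9)    110 (26.9)    36 (26.5)     87 (21.9)     70 (21.0)     74 (23.1)
  U*    –             1 (0.6)       –             1 (0.2)       –             1 (0.2)       –             1 (0.3)       2 (0.6)       5 (1.6)
  U1    –             –             2 (1.0)       –             –             1 (0.2)       –             1 (0.3)       1 (0.3)       –
  U2    9 (3.0)       –             3 (1.5)       4 (0.9)       4 (0.7)       6 (1.5)       –             –             3 (0.9)       6 (1.9)
  U3    5 (1.7)       3 (1.7)       2 (1.0)       2 (0.5)       –             4 (1.0)       –             7 (1.8)       4 (1.2)       –
  U4    28 (9.4)      10 (5.6)      7 (3.5)       22 (5.0)      6 (1.0)       22 (5.4)      14 (10.3)     11 (2.8)      6 (1.8)       2 (0.6)
  U5    27 (9.0)      17 (9.4)      21 (10.4)     38 (8.7)      130 (22.4)    58 (14.2)     19 (14.0)     42 (10.6)     28 (8.4)      26 (8.1)
  U6    –             –             –             –             –             –             –             –             –             2 (0.6)
  U7    –             –             1 (0.5)       1 (0.2)       2 (0.3)       –             –             –             1 (0.3)       1 (0.3)
  U8    –             2 (1.1)       –             2 (0.5)       3 (0.5)       7 (1.7)       –             5 (1.3)       –             3 (0.9)
  K     7 (2.3)       5 (2.8)       6 (3.0)       15 (3.4)      17 (2.9)      11 (2.7)      3 (2.2)       20 (5.0)      25 (7.5)      29 (9.1)
W       12 (4.0)      2 (1.1)       4 (2.0)       16 (3.7)      51 (8.8)      10 (2.4)      –             8 (2.0)       9 (2.7)       7 (2.2)
X       1 (0.3)       2 (1.1)       7 (3.5)       8 (1.8)       5 (0.9)       4 (1.0)       –             2 (0.5)       4 (1.2)       3 (0.9)
I       13 (4.3)      7 (3.9)       5 (2.5)       8 (1.8)       24 (4.1)      4 (1.0)       1 (0.7)       9 (2.3)       6 (1.8)       8 (2.5)
M       1 (0.3)       –             3 (1.5)       8 (1.8)       10 (1.7)      1 (0.2)       8 (5.9)       2 (0.5)       2 (0.6)       –
  M*    –             –             1 (0.5)       1 (0.2)       –             –             1 (0.7)       –             –             –
  C     –             –             –             4 (0.9)       –             –             1 (0.7)       –             –             –
  D     –             –             1 (0.5)       1 (0.2)       1 (0.2)       1 (0.2)       2 (1.5)       –             2 (0.6)       –
  G     1 (0.3)       –             1 (0.5)       2 (0.5)       –             –             –             –             –             –
  Z     –             –             –             –             9 (1.6)       –             4 (2.9)       2 (0.5)       –             –
L       –             –             –             1 (0.2)       3 (0.5)       –             0 (0.0)       2 (0.5)       4 (1.2)       2 (0.6)
Total n 299           180           201           436           580           409           136           397           333           320

Note: Linguistic affiliations of populations are abbreviated as follows: IE-B – Indo-European/Baltic; IE-G – Indo-European/Germanic; IE-R – Indo-European/Romanic; IE-S – Indo-European/Slavic; U-FU – Uralic/Finno-Ugric. Data sources: 1 – present study; 2 – Kasperavičiūtė & Kučinskas (2004); 3 – Malyarchuk et al. (2002); 4 – Meinilä et al. (2001); Sajantila (1995); Kittles et al. (1999); Pult et al. (1994); Lahermo et al. (1996); 5 – Bermisheva et al. (2002); 6 – Passarino et al. (2002); Helgason et al. (2001); 7 – Hofmann et al. (1997); Baasner et al. (1998); Richards et al. (1996); Pfeiffer et al. (1999); 8 – Dubut et al. (2004); Cali et al. (2001)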

The significance of the results was tested by 10,000 permutations. Standard errors were estimated from 1000 bootstrap iterations. For AMOVA the ARLEQUIN 2.0 package (Schneider et al. 2000) was used. Principal component (PC) analysis was carried out using the program POPSTR, kindly provided by H. Harpending. The statistical significance of population differences with respect to the frequencies of mtDNA hgs was evaluated using the chi-square test (uncorrected for multiple comparisons).
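As an illustration of the frequency comparison described above, the following Python sketch computes a Pearson chi-square statistic for a 2 × 2 table of carriers and non-carriers of a haplogroup in two populations. It is an assumption that the test took this form (one degree of freedom, no continuity correction); the text states only that a chi-square test, uncorrected for multiple comparisons, was used. The function name is hypothetical.

```python
import math


def chi_square_2x2(carriers_a, total_a, carriers_b, total_b):
    """Pearson chi-square (1 d.f., no continuity correction) for two samples."""
    a, b = carriers_a, total_a - carriers_a
    c, d = carriers_b, total_b - carriers_b
    n = a + b + c + d
    denominator = (a + b) * (c + d) * (a + c) * (b + d)
    if denominator == 0:
        raise ValueError("a row or column total is zero")
    chi2 = n * (a * d - b * c) ** 2 / denominator
    # With one degree of freedom the upper-tail probability equals
    # erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# hg U4 counts read from Table 2: 28 of 299 Latvians, 2 of 320 French.
chi2, p = chi_square_2x2(28, 299, 2, 320)
```

With the Table 2 counts for hg U4 in Latvians and French, the resulting p value falls below 0.001, in line with the significance level reported in the Results.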

Results

mtDNA Haplogroup Profile of Latvians

The observed mtDNA hg and sub-hg frequencies found among Latvians are summarised in Table 2. Hg H, which is the most frequent hg in all European populations except the Saami, accounted for almost half (45%) of the mtDNA variants in Latvians. Forty-five per cent of the classified hg H genomes belonged to sub-hgs H1 and H5 (Table 3 and Figure 3). Sub-hg H1b, which occurs more frequently in Eastern and North-Central Europe (ca. 7% and 5% of the total of hg H, respectively; Loogväli et al. 2004), was the most abundant type of sub-hg H1 among Latvians (25% of H1, 8% of H), also being frequent among Estonians (Table 3). The HVS-I sequence motifs characteristic of this sub-hg are also often observed in the Lithuanian population (9.5% of hg H; Kasperavičiūtė et al. 2004). In the Latvian gene pool, H1b was significantly more frequent than among Eastern Slavs (p ≤ 0.025).
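The percentages in Table 2 follow directly from the haplogroup counts. A short Python sketch of that arithmetic for the Latvian sample is given below; the counts are the major-haplogroup values read from the Latvian column of Table 2, and the dictionary and function names are illustrative only.

```python
# Major-haplogroup counts for Latvians, read from Table 2.
latvian_hg_counts = {
    "H": 133, "HV": 7, "V": 9, "J": 19, "T": 28,
    "U": 76, "W": 12, "X": 1, "I": 13, "M": 1,
}


def frequencies(counts):
    """Return haplogroup frequencies in per cent, rounded to one decimal."""
    total = sum(counts.values())
    return {hg: round(100.0 * n / total, 1) for hg, n in counts.items()}


latvian_hg_freqs = frequencies(latvian_hg_counts)
```

The counts sum to the 299 individuals sampled, and hg H comes out at 44.5%, the value quoted as "almost half (45%)".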

Table 3.  Frequencies (%) of the subhaplogroups of haplogroup H
Sub-hg  Latvians¹         Estonians¹        Volga-Ural        Finns²            Eastern Slavs²    Slovaks²          French²
                                            Finnic-speakers²
        n (%)             n (%)             n (%)             n (%)             n (%)             n (%)             n (%)
H*      42 (31.6)         18 (36.0)         21 (42.0)         7 (22.6)          51 (30.9)         20 (40.0)         17 (34.0)
H1      40 (30.1)         19 (38.0)         17 (34.0)         14 (45.2)         53 (32.1)         9 (18.0)          13 (26.0)
  H1*   17 (12.8)         10 (20.0)         14 (28.0)         2 (6.5)           36 (21.8)         3 (6.0)           11 (22.0)
  H1a   3 (2.3)           2 (4.0)           3 (6.0)           4 (12.9)          7 (4.2)           4 (8.0)           2 (4.0)
  H1b   20 (15.0)         6 (12.0)          0 (0.0)           0 (0.0)           9 (5.5)           2 (4.0)           0 (0.0)
  H1f   0 (0.0)           1 (2.0)           0 (0.0)           8 (25.8)          1 (0.6)           0 (0.0)           0 (0.0)
H2      8 (6.0)           4 (8.0)           0 (0.0)           4 (12.9)          19 (11.5)         1 (2.0)           2 (4.0)
  H2*   2 (1.5)           1 (2.0)           0 (0.0)           2 (6.5)           5 (3.0)           0 (0.0)           2 (4.0)
  H2a1  6 (4.5)           3 (6.0)           0 (0.0)           2 (6.5)           14 (8.5)          1 (2.0)           0 (0.0)
H3      3 (2.2)           3 (6.0)           0 (0.0)           2 (6.5)           7 (4.2)           2 (4.0)           6 (12.0)
H4      2 (1.5)           0 (0.0)           2 (4.0)           0 (0.0)           3 (1.8)           2 (4.0)           0 (0.0)
H5      20 (15.0)         2 (4.0)           0 (0.0)           4 (12.9)          10 (6.1)          5 (10.0)          5 (10.0)
  H5*   3 (2.3)           1 (2.0)           0 (0.0)           1 (3.2)           3 (1.8)           4 (8.0)           4 (8.0)
  H5a1  17 (12.8)         1 (2.0)           0 (0.0)           3 (9.7)           7 (4.2)           1 (2.0)           1 (2.0)
H6      4 (3.0)           2 (4.0)           2 (4.0)           0 (0.0)           10 (6.1)          3 (6.0)           3 (6.0)
H7      4 (3.0)           1 (2.0)           4 (8.0)           0 (0.0)           2 (1.2)           1 (2.0)           4 (8.0)
H8      0 (0.0)           0 (0.0)           0 (0.0)           0 (0.0)           0 (0.0)           1 (2.0)           0 (0.0)
H11     10 (7.5)          1 (2.0)           4 (8.0)           0 (0.0)           10 (6.1)          6 (12.0)          0 (0.0)
Total   133 (100)         50 (100)          50 (100)          31 (100)          165 (100)         50 (100)          50 (100)

¹ This study
² Data from Loogväli et al. (2004)

Note: The fraction of unclassified hg H haplotypes involving haplogroups H8, H9, H10, H12, H13, H14 and H15 (Achilli et al. 2004; Loogväli et al. 2004), based on data taken from the literature (32 out of 458; Achilli et al. 2004; Coble et al. 2004; Finnilä et al. 2001; Herrnstadt et al. 2002; Howell et al. 2003; Ingman et al. 2000; Levin et al. 1999; Maca-Meyer et al. 2001; Mishmar et al. 2003; Palanichamy et al. 2004; Reid et al. 1994; Rieder et al. 1998), makes up only 5–10% of hg H, thus ca. 9 out of 133 in our sample.
Figure 3.

Phylogenetic network of mtDNA haplogroup H haplotypes found among Latvians and their geographic neighbours. The legend shows the colour code for the studied populations; sample sizes (n) are shown in brackets. Numbers on the links indicate observed mutations and are numbered according to the revised Cambridge Reference Sequence (Andrews et al. 1999); the nucleotide change is specified by suffixes only for transversions. The gain or the loss of a restriction site is marked by a “+” or a “−”, respectively.

The frequency of H1a in the Latvian population was similar to that of other North-Eastern and Eastern European populations (Table 3). Significant differences of H1a frequencies were observed between Latvians and Finns (p ≤ 0.025). Among the latter, sub-hg H1b was not observed. On the other hand, sub-hg H1f, frequent in the Finnish population, was not found in the Latvian mtDNA pool (p ≤ 0.001).

Sub-hg H5, the second largest in the Latvian mtDNA pool (15% of H), was found to be significantly more frequent among Latvians than in the gene pool of hg H in Eastern Slavs (6%; p ≤ 0.025).

We found four representatives of sub-hg H7 in our sample of Latvian mtDNAs (Table 3 and Figure 3). All of them shared the HVS-I motif 16217–16311 and were collected from three different Latvian regions.

A quarter of mtDNA variants in the Latvian gene pool belonged to hg U. Five sub-hgs – U2, U3, U4, U5 and K – were observed among hg U haplotypes. The sub-hgs U4 and U5 were most abundant, covering about 40% of the hg U gene pool. The analysis of the mtDNA hg profiles in different European populations (Table 2) showed that the frequency of hg U4 is significantly higher in Eastern than in Western European populations (p ≤ 0.001, see details about population groups in the Methods section). In particular, significant differences in U4 frequencies were observed between Latvians and French, as well as between Latvians and Finns (p ≤ 0.001). Interestingly, the frequency of sub-hg U4 in Latvians is among the highest in Europe. In the central part of Latvia – Semigalia – U4 was found in 14.9% of all mtDNA variants, which is close to the frequency of U4 observed in the Volga-Uralic region (Bermisheva et al. 2002). Perhaps more importantly, the heterogeneity of hg U4 is also relatively high in Semigalia: out of thirteen U4 haplotypes found among Latvians, eight were observed there.

The other hg that has a different frequency pattern in Eastern and Western European populations (Table 2) is K; statistically significant differences in hg K frequencies were observed between Latvians and Western European populations, with hg K being more common among the latter (p ≤ 0.01). All hg K mtDNAs from Latvians shared one HVS-I haplotype with a 16224–16311 motif. However, according to Finnilä et al. (2001), this motif may be characteristic of many hg K branches, which differ at several coding region nps and have therefore been phylogenetically separated for a long time.

An example of region-specific differences in the Latvian mtDNA pool is provided by sub-hg U2, which had a particularly high frequency in Eastern Latvia: seven out of nine Latvian sub-hg U2 mtDNAs were found in Lettigalia. The frequency of U2 was significantly higher in Lettigalia than in the Western parts of Latvia – Northern Curonia (p ≤ 0.01) and Southern Curonia (p ≤ 0.025).

We found only a single member of Asian-specific hg M – G2a – in Lettigalia, the Eastern part of Latvia.

To illustrate the genetic relationships of the studied populations, PC analysis based on the frequencies of mtDNA hgs was performed (Figure 2). To see whether the close genetic relationships of paternal lineages of the Baltic-speakers and Finno-Ugric-speaking Mari, observed by Laitinen et al. (2002), can also be seen when comparing the maternal lineages of these populations, the mtDNA data of Mari were included, among others, in the PC analysis. The analysis showed that all European populations, except Finns, French and Mari, fall into one cluster. Latvians and Lithuanians formed a tight cluster with Estonians, Russians, Poles, Germans and Norwegians. The first PC, which accounts for 34.3% of the total mtDNA variation, was determined mostly by a different frequency distribution of hgs K, U4, V and M. Hg K, which is at a higher proportion in Western European populations (especially the French) than in Eastern European populations, is one of the most influential components separating populations along the first PC. The opposite gradient, further strengthening the differences along the East-West axis, is formed by a combined frequency distribution of hgs U4, V and M that were more frequent among the Mari (see Table 2) than in the other populations. Hg U4 also contributes to the distribution of populations on the plot along the second PC, which retains 19.6% of the total mtDNA variation and separates Finns from other populations. This hg is rare among Finns, who instead have a high proportion of hgs U5 and W in their mtDNA pool; however, it has to be noted that the Finnish sample consists mostly of mtDNAs from the North-Central part of Finland, and could thus be somewhat biased in representing the genetic variation of the total Finnish population.

To minimize the possible effect of rare and infrequent sub-hgs on the PC plot for the studied populations (Table 2), those sub-hgs that were found in fewer than five populations and that did not contribute more than 0.5% to the total mtDNA pool (A, N1a-c, preHV, preV, R, U*, U1, U3, U6–8) were pooled, first with their phylogenetically closest group and then with the group showing a similar geographic spread. The results (data not shown) did not differ to any significant extent from those shown in the PC plot in Figure 2.

Figure 2.

Principal component (PC) analysis based on mtDNA haplogroup frequencies of some European populations. Population codes are given in alphabetical order as follows: est – Estonians; fin – Finns; fra – French; ger – Germans; lat – Latvians; lit – Lithuanians; mar – Mari; nor – Norwegians; pol – Poles; rus – Russians. The genetic variation retained by different components is shown in brackets.

Lineage Sharing Analysis

One hundred and twenty-four different haplotypes belonging to 10 major hgs were observed in a sample of 299 mtDNAs from the Latvian population (Table 1).

In order to compare the mtDNA haplotype distribution in Latvians with those observed in neighbouring populations, we examined the HVS-I sequence variation in the context of published sequence data, also taking into account the information from coding region mutations where available. In addition to the populations listed in Table 1, the comprehensive mtDNA database (for references, see Subjects and Methods) was used for background information.

One of our aims was to see whether the Baltic-speaking populations possess mtDNA variants that could reflect the influence of their distinct linguistic background compared to that of their geographical neighbours. The comparison of mtDNA haplotypes from Baltic-speakers revealed that most of the lineages shared between Latvians and Lithuanians are also present in neighbouring populations, in many cases among both Finno-Ugric- and Slavonic speakers, as well as among Western European Indo-European-speakers (Table 1). Only one lineage, from sub-hg I1 with the HVS-I motif 16129–16172–16223–16311–16319, was not found in our database but was present in both the Latvian (Semigalian) and the Lithuanian (North Žemaičiai) mtDNA pools. The other haplotype shared only by Baltic-speakers in Table 1, from sub-hg H11 of hg H with the HVS-I motif 16224–16278–16293–16311, has been found previously, for example, in different Eastern European populations (Loogväli et al. 2004) and also appears in Central Asian populations (Comas et al. 1998). These two haplotypes are derived from frequent founder haplotypes with a wide geographical spread: 16129–16172–16223–16311 of hg I and 16278–16293–16311 of sub-hg H11, respectively.

While the proportion of shared lineages among different populations was found to be quite similar for all populations studied, there are also examples where some derived haplotypes appear to be associated with a specific region or linguistic group. For example, sub-hg U2, which is characterised by only one haplotype with the HVS-I motif 16051–16129C–16189–16311–16362 in the Latvian population, has so far been found only among Finno-Ugric-speaking Estonians (authors' unpublished data). Similarly, two haplotypes from hg U4 are shared only with Finno-Ugric-speaking populations: haplotype 16134–16356–16362 with Mari (Bermisheva et al. 2002) and Hungarians (authors' unpublished data), and haplotype 16189–16311–16356 with Estonians (authors' unpublished data) and Mansi (Derbeneva et al. 2002b). However, since both haplotype motifs involve “fast” mutations – 16189, 16362 and 16311 – their monophyletic origin in different locations needs to be confirmed before firmer long-distance phylogeographic implications can be drawn.
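The logic of the lineage sharing comparison can be sketched with plain set operations. The Python example below is a toy illustration, not the authors' database query: it encodes only three haplotypes and four population labels taken from statements in this section, and the grouping "Other Eastern Europeans" as well as all function names are assumptions made for the example.

```python
from itertools import combinations

# HVS-I motifs quoted in the text, written as strings.
I1_MOTIF = "16129-16172-16223-16311-16319"
H11_MOTIF = "16224-16278-16293-16311"
U2_MOTIF = "16051-16129C-16189-16311-16362"

# Toy presence data restricted to what the text states.
populations = {
    "Latvians": {I1_MOTIF, H11_MOTIF, U2_MOTIF},
    "Lithuanians": {I1_MOTIF, H11_MOTIF},
    "Estonians": {U2_MOTIF},
    "Other Eastern Europeans": {H11_MOTIF},  # Loogväli et al. (2004)
}


def shared_haplotypes(pops):
    """Map each pair of populations to the haplotypes found in both."""
    return {
        (a, b): pops[a] & pops[b]
        for a, b in combinations(sorted(pops), 2)
    }


def private_to(group, pops):
    """Haplotypes present in every population of `group` and in no other."""
    inside = set.intersection(*(pops[p] for p in group))
    outside = set().union(*(pops[p] for p in pops if p not in group))
    return inside - outside


baltic_only = private_to(["Latvians", "Lithuanians"], populations)
```

In this reduced data set only the sub-hg I1 motif is restricted to the two Baltic-speaking populations, which mirrors the observation made above.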

Analogous examples can also be found if one compares the mtDNA pools of Latvians and Slavonic- as well as Germanic-speaking populations (Table 1). For example, a haplotype from sub-hg H5 with the HVS-I motif 16167–16192–16304–16311 observed among Latvians is a derivative of the sequence variants that have been found at moderate frequencies in Germanic-speaking populations, and also among Eastern Slavs, interpreted as a possible marker of Slavonic migrations from central to eastern Europe (Malyarchuk & Derenko, 2001). The exact HVS-I sequence match with that present in Latvians has been observed so far only among Poles (see Table 1). In addition, two haplotypes of hg U4 with HVS-I sequence motifs 16214 and 16104–16294, and a third one, which has an HVS-I sequence identical to CRS, have been identified in the Latvian mtDNA pool (Table 1). All of them are characterized by a mutation at np 310 in HVS-II and have most probably undergone a back-mutation at np 16356. These sequence types are relatively frequent among Eastern Slavs, although present also in other Eastern and Central European populations (Malyarchuk, 2004). Most interestingly, the specific HVS-I haplotype 16104–16294 from this hg U4 branch is, again, a one-step derivative of the U4 HVS-I haplotype 16294 found among Poles (Malyarchuk et al. 2002). Studies of the spatial distribution of major sequence types of U4, determined by HVS-I motifs 16134–16356, 16179–16356 and those with back-mutation at np 16356, have shown that they have rather different geographical spreads. The first and third that were also found among our Latvian sample appear to be much more frequent in Central and Eastern European populations, whereas the sequence type 16179–16356 has been found mostly among Western Europeans (Malyarchuk 2004; Tambets et al. 2003) and was not observed in the mtDNA pool of Latvians studied.

We also found some unique haplotypes in our Latvian sample that were not observed in other populations in our database. For example, a haplotype with HVS-I motif 16217–16311 from sub-hg H7, found in Eastern, Central and Western Latvian samples, has not been described previously. The majority of unique sequences found in Latvians nevertheless derive from the HVS-I sequence variants that are common in most European populations. For example, haplotype 16080–16129–16142–16189–16356 from sub-hg H1 was found in Central and Western regions of Latvia, but has not been reported in published datasets. However, it is a two-step derivative of the HVS-I motif 16080–16189–16356 that is widely spread in Europe (see also Figure 3).
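The notion of a one-step or two-step derivative used in this section can be made concrete with a short Python sketch. It treats each HVS-I motif as a set of variants relative to the CRS and counts the variants present in only one of the two motifs. This is a simplified illustration under stated assumptions: the helper names are hypothetical, and a plain set difference ignores back-mutations and recurrent mutations, which a proper phylogenetic network would take into account.

```python
import re


def parse_motif(motif):
    """Turn a motif string into a frozenset of variant tokens.

    Hyphens and en dashes are accepted as separators; an empty string
    stands for a sequence identical to the CRS.
    """
    return frozenset(tok for tok in re.split(r"[-\u2013]", motif.strip()) if tok)


def mutational_steps(a, b):
    """Count the variants found in exactly one of the two motifs."""
    return len(parse_motif(a) ^ parse_motif(b))


def is_derivative(derived, founder):
    """True if `derived` carries all founder variants plus at least one more."""
    return parse_motif(founder) < parse_motif(derived)


# Motif pairs quoted in the text (derived haplotype, founder haplotype).
pairs = [
    ("16080-16129-16142-16189-16356", "16080-16189-16356"),
    ("16104-16294", "16294"),
    ("16224-16278-16293-16311", "16278-16293-16311"),
]
for derived, founder in pairs:
    print(derived, "<-", founder,
          mutational_steps(derived, founder), is_derivative(derived, founder))
```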

The Results of AMOVA

The results of the analysis of the genetic structure of the investigated populations are presented in Table 4. Firstly, samples were grouped according to their linguistic background (groupings A, B, C, and D). The second grouping (E) was performed on the basis of the geographical location of populations. Although by far the largest fraction of genetic variation was found within populations, in all of these groupings the proportion of variation among populations within groups was clearly higher than that among groups. The largest among-group effect was observed for the Baltic-Finno-Ugric comparison and the smallest for the Baltic-Slavic comparison. The proportions of the variation between groups and within groups did not show any consistent difference when the linguistic groupings (A to D) and the geographic grouping (E) were compared, demonstrating the complexity of mtDNA diversity patterns in the part of Europe under study. The absence of phylogeographic structure of mtDNA variation in Latvians was further confirmed by AMOVA (Table 4, groupings F and G), with almost all (∼99.5%) of the variation falling within the individual ethnolinguistic groups.

Table 4. The results of the analysis of molecular variance (AMOVA) for studied European populations and Latvian population

| Grouping | Group 1 | Group 2 | Group 3 | Group 4 | Group 5 | Among groups (%)¹ | Among populations within groups (%)¹ | Within populations (%)¹ |
|---|---|---|---|---|---|---|---|---|
| A | Lat+Lit | Pol+Rus | Fin+Est | Ger+Nor | Fra | 0.12 | 0.54 | 99.34 |
| | | | | | | 0.2874±0.0143 | 0.0108±0.0033 | 0.000±0.0000 |
| B | Lat+Lit | Fin+Est | | | | 0.23 | 0.82 | 98.96 |
| | | | | | | 0.3304±0.0182 | 0.000±0.000 | 0.000±0.000 |
| C | Lat+Lit | Pol+Rus | | | | 0 | 0.15 | 99.86 |
| | | | | | | 0.6598±0.015 | 0.0078±0.03 | 0.0019±0.0014 |
| D | Lat+Lit | Ger+Nor | | | | 0.04 | 0.61 | 99.35 |
| | | | | | | 0.341±0.0155 | 0.000±0.000 | 0.000±0.000 |
| E | Nor+Ger+Fra | Pol+Rus+Lat+Lit+Fin+Est | | | | 0.06 | 0.62 | 99.32 |
| | | | | | | 0.2228±0.0124 | 0.000±0.000 | 0.000±0.0000 |
| F | Scu+Ncu+Sem+Let | | | | | | 0.37 | 99.63 |
| | | | | | | | 0.029±0.0018 | 0.000±0.0000 |
| G | Curonian region | South-Eastern region | | | | 0.49 | 0.05 | 99.46 |
| | | | | | | 0.3404±0.0047 | 0.0291±0.0016 | 0.0268±0.0016 |

¹ Distribution of genetic variance (%). The statistical significance of the results is calculated as described in Subjects and Methods.

Note. The population codes are given in alphabetical order as follows: Est – Estonians; Fin – Finns; Fra – French; Ger – Germans; Lat – Latvians (Let – Lettigalians; Ncu – northern Curonians; Scu – southern Curonians; Sem – Semigalians); Lit – Lithuanians; Nor – Norwegians; Pol – Poles; Rus – Russians.
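As a reading aid for Table 4, the percentage partition of variance can be converted into the standard AMOVA fixation indices (Φ_CT, Φ_SC, Φ_ST). The Python sketch below applies the usual definitions to the rounded percentages reported for the three-level groupings. It is an illustrative recomputation, not the authors' own calculation: the function name `phi_statistics` is an assumption, and values derived from rounded percentages will differ slightly from indices computed on the raw variance components.

```python
def phi_statistics(among_groups, among_pops_within_groups, within_pops):
    """Derive AMOVA fixation indices from the three variance components.

    Inputs may be raw variance components or percentages, since only
    their ratios matter.
    """
    total = among_groups + among_pops_within_groups + within_pops
    phi_ct = among_groups / total  # groups relative to total
    phi_sc = among_pops_within_groups / (among_pops_within_groups + within_pops)
    phi_st = (among_groups + among_pops_within_groups) / total
    return phi_ct, phi_sc, phi_st


# Percentages of variance from Table 4 (three-level groupings only).
table4 = {
    "A": (0.12, 0.54, 99.34),
    "B": (0.23, 0.82, 98.96),
    "C": (0.00, 0.15, 99.86),
    "D": (0.04, 0.61, 99.35),
    "E": (0.06, 0.62, 99.32),
    "G": (0.49, 0.05, 99.46),
}

for grouping, components in table4.items():
    phi_ct, phi_sc, phi_st = phi_statistics(*components)
    print(f"{grouping}: Phi_CT={phi_ct:.4f} Phi_SC={phi_sc:.4f} Phi_ST={phi_st:.4f}")
```

The three indices are linked by (1 − Φ_ST) = (1 − Φ_SC)(1 − Φ_CT), which offers a quick consistency check on any row.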

Discussion

The territory of the eastern coast of the Baltic Sea was permanently settled relatively late, as the land became habitable only approximately 12 000–10 000 years ago, after the end of the last glaciation. The first inhabitants of the region were hunter-gatherers, probably carriers of the Swiderian and Magdalenian cultures (Gimbutas, 1994). These pioneer hunter-gatherers likely arrived predominantly from the major European glacial refugia, situated in the present territory of Ukraine and Francocantabria (Dolukhanov et al. 2000). The formation of Baltic tribes in the territory was a complex process, associated with the interaction of different population groups, with the dispersal of Indo-European languages as well as with the process of neolithization in the region.

Mitochondrial DNA Variation of the Latvian Population

Our analysis of mtDNA variation revealed that the gene pool of Latvians is characterized by the same package of Western Eurasian mtDNA hgs that encompass about 95% of mtDNA variation in Europe (Richards et al. 1996, 1998; Torroni et al. 1996), with 70% of lineages belonging to hgs H and U (Table 2). Eastern Eurasian hgs form only a minute fraction (0.3%) of the Latvian mtDNA pool, represented by a single hg G2a lineage (Table 1). In Northern Europe, the two main hg M lineage groups are D5b and Z1, which are present in many populations at low frequencies (Tambets et al. 2004). Interestingly, these hgs were observed neither in our sample of 299 Latvians nor among 225 Lithuanians (Kasperavičiūtė et al. 2004; Tambets et al. 2004). In the latter the derivatives of hg M were not sampled, and the Eastern Eurasian package of mtDNA lineages was represented by a single hg A mtDNA. One D5b individual has been described among 545 Estonians (Tambets et al. 2004). This observation is of some interest because the more Northern Finnic-speaking populations, Finns, Saami and Karelians, possess East Asian maternal lineages at frequencies that are also low, but still higher (2%–5%; see Table 2 and Sajantila et al. 1995; Tambets et al. 2004) than in populations living on the south-eastern Baltic coast. This suggests that East Asian maternal gene flow had, in Eastern Europe, only a negligible impact on populations living in the south-eastern Baltic Sea region, though it reached populations living close to the sub-Arctic fringe of Europe.

The HVS-I haplotype-sharing analysis among Baltic-, Germanic-, Slavonic-, and Finno-Ugric-speaking populations (Table 1) showed that the vast majority of mtDNA haplotypes found among Latvians are identical to, or close derivatives of, those observed in other Eastern and Western European populations, irrespective of the linguistic affiliations of the latter. These results most likely reflect a deep common origin for the European mtDNA pool (Richards et al. 1996).

Inside Latvia we detected only a few region-specific differences in haplogroup frequencies. The high frequency of hg U2 in Lettigalia could be best explained by a recent founder effect (or sampling bias), because all U2 genomes found belong to the same HVS-I haplotype (see Table 1). Note that similar intra-population homogeneity was also observed among Lithuanians (Kasperavičiūtė et al. 2004).

Comparison with Data of Nuclear Genetic Markers

Studies of Y-chromosome markers have revealed that approximately one third to a half of all Y-chromosomes found in Latvians and Lithuanians belong to hg N3, defined by the mutation TatC (Lahermo et al. 1999; Laitinen et al. 2002; Zerjal et al. 1997, 2001; Tambets et al. 2004). This high frequency of hg N3, combined with its considerable diversity of microsatellite haplotypes, has also been found among Estonians (30–35%) and Volga-Finnic-speaking populations (20–50%) (Rosser et al. 2000; Zerjal et al. 2001; Tambets et al. 2004). The proportion of this variant of Y chromosomes drops drastically in the geographical neighbours of Latvians and Lithuanians – among Poles (2%), and is much less frequent also among Slovaks (3%), Ukrainians (6%) and Russians (8–14%) (Rosser et al. 2000; Tambets et al. 2004). The sharp decline of the frequency of hg N3 from 48% among the Finno-Ugric-speaking Saami to less than 8% among the Norwegians and Swedes can also be observed in Northern Scandinavia (Rosser et al. 2000; Zerjal et al. 2001; Tambets et al. 2004). Thus, based on N3 frequency distribution, Latvians and Lithuanians are closer to the adjacent Finno-Ugric-speakers than to their Slavonic-speaking neighbours, with whom they share Indo-European linguistic proximity. This finding has been interpreted as evidence of a common origin for Baltic- and Finnic-speakers (Laitinen et al. 2002). Recently, Kasperavičiūtė et al. (2004) found, similarly to Zerjal et al. (2001), that the microsatellite haplotype patterns within hg N3 are different among Lithuanians and Estonians. They noted, however, that it is unclear whether the observed differences suggest different source populations, as proposed by Zerjal et al. (2001), or rather more recent random genetic drift. Meanwhile, it is interesting to note that the calculations of Kasperavičiūtė et al. (2004) suggested very similar expansion times, around 7000–8000 BP, for the Lithuanian and Estonian hg N3 Y chromosomes, which are probably also applicable to Latvians. Therefore, it is indeed possible that the spread of hg N3 among the ancestral populations of Estonians, on the one hand, and the Baltic-speaking populations on the other, pre-dates the advent of the Neolithic age in the East Baltic and may be part of the post-LGM re-colonisation of the region. An ancient language shift from Uralic to Indo-European among the Baltic-speakers has been suggested, linked to an earlier arrival of agriculture to the ancestors of the present-day Latvians and Lithuanians (Wiik, 2002).

The genetic proximity of the Baltic-speaking populations with the Volga-Finnic Mari, proposed based on Y-chromosomal diversity of those populations by Laitinen et al. (2002), most likely reflects simply an ancient, largely common heritage of Eastern European populations, rather than a specific link between the two populations – our analysis of the mtDNA hg frequency profiles (see Fig. 2) does not cluster Mari and Latvians closely together, and the comparison of mtDNA HVS-I haplotypes in these populations shows that they predominantly share only founder lineages, while differing for more derived haplotypes (Table 1).

It is also interesting to note that the populations in the East Baltic form the northern boundary of the spread of the “Adriatic” NRY hg I1b*-P37, present at very low frequencies both in Latvians and Estonians, while considerably more frequent in Byelorussians, Ukrainians, southern Russians and, in particular, in the Balkans (Rootsi et al. 2004). On the other hand, though there is no geographic border between Estonia and Latvia, there is a three-fold southwards frequency drop (14.8% vs 4.7%) of hg I1a*-M253, which is particularly frequent all over Scandinavia, including among the Saami (Rootsi et al. 2004). Its moderate presence in Estonia is probably due to the long-term presence of Swedish settlements in Estonian islands and in the northwestern coastal areas.

At the same time, none of the synthetic maps of Europe obtained by analysis of classical markers appears to reflect any sharp boundaries corresponding to those seen in the spread of Y chromosome variation, e.g. between Poland and Lithuania (Barbujani & Sokal, 1990; Cavalli-Sforza et al. 1994). Meanwhile, although the Latvian, Estonian and Lithuanian populations share the highest frequencies in Europe of the mutation R408W of haplotype 2 of PKU, the analogy of its wider spread with Y chromosomal hg N3 cannot be observed – unlike hg N3 Y chromosomes, this particular disease-associated mutation is present at high concentrations all over the southern coast of the Baltic Sea, expanding into central and southeast Europe, yet is virtually unknown in Finland (Zschocke, 2003).

Overall, the present study fills an existing gap and shows that the maternal lineages of Baltic-speaking populations, Latvians and Lithuanians, and of their Slavonic- and Finno-Ugric-speaking neighbours, in particular Estonians, form a close cluster. This cluster also includes Germans, as well as Germanic-speaking Scandinavians, suggesting a recent shared maternal ancestry as well as pre-historic and historic time gene flows across linguistic borders. However, this maternal gene flow (migration of females) contrasts with an apparent lack of westward flow of hg N3 Y chromosomes (migration of males). The Baltic-speaking populations largely share a mosaic of frequencies that, in Europe, brings them together specifically with the Finnic-speaking populations, both those living in the Baltic area and those living in the Volga Basin. It remains to be seen whether rapidly advancing technologies for complete genome variation studies may in future reveal autosomal genes with a spread closely matching that of either of the contrasting haploid genetic systems of the Y chromosome and mtDNA.

Acknowledgements

The authors would like to thank Ille Hilpus and Jaan Lind, for technical assistance; Laurent Excoffier and Ildus Kutuev, for their help with statistics; Henry Harpending, for the program POPSTR. We wish to express our gratitude to two anonymous reviewers for their helpful suggestions. The research of L.P. was supported by a grant from the European Social Fund (ESF) program no. ESS2004/3; K. T. received support from Estonian Science Foundation research grant no. 6040 and A. K. was supported by a grant from Latvian Council of Science National program no. 01.0023. The research of R.V. was supported by Estonian basic research grant no. 0182474 and European Commission Directorate General Research grants ICA1-CT-2000–70006 GENEMILL and QLG2-CT-2002–90455 GENERA.

Electronic-Database Information

The URL for data presented herein is as follows: Fluxus Engineering, http://www.fluxus-engineering.com/
