Mitochondrial DNA Variation in Karkar Islanders

Authors

  • F. X. Ricaut,

    Corresponding author
    1. Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, The Henry Wellcome Building, Fitzwilliam Street, CB2 1QH, United Kingdom
    2. Center for Archaeological Sciences, Catholic University of Leuven, Celestijnenlaan 200E B-3001 Leuven, Belgium
    3. Centre d'Anthropologie, FRE 2960 CNRS, Université Paul Sabatier, Toulouse III, 37, Allées Jules Guesde, 31073 Toulouse Cedex, France
  • T. Thomas,

    1. Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, The Henry Wellcome Building, Fitzwilliam Street, CB2 1QH, United Kingdom
    2. Anthropology Department, University of Otago, PO Box 56, Dunedin, New Zealand
  • C. Arganini,

    1. Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, The Henry Wellcome Building, Fitzwilliam Street, CB2 1QH, United Kingdom
  • J. Staughton,

    1. Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, The Henry Wellcome Building, Fitzwilliam Street, CB2 1QH, United Kingdom
  • M. Leavesley,

    1. Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, The Henry Wellcome Building, Fitzwilliam Street, CB2 1QH, United Kingdom
    2. School of Humanities and Social Sciences, University of Papua New Guinea
  • M. Bellatti,

    1. Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, The Henry Wellcome Building, Fitzwilliam Street, CB2 1QH, United Kingdom
  • R. Foley,

    1. Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, The Henry Wellcome Building, Fitzwilliam Street, CB2 1QH, United Kingdom
  • M. Mirazon Lahr

    1. Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, The Henry Wellcome Building, Fitzwilliam Street, CB2 1QH, United Kingdom

*Corresponding author: Dr François-Xavier Ricaut, Center for Archaeological Sciences, Catholic University of Leuven, Celestijnenlaan 200E B-3001 Leuven, Belgium, Tel: +32 (0)16326429, Fax: +32 (0)16322980, E-mail: fx.ricaut@infonie.fr

Summary

We analyzed 375 base pairs (bp) of the first hypervariable region (HVS-I) of the mitochondrial DNA (mtDNA) control region and the intergenic COII/tRNALys 9-bp deletion in 47 Karkar Islanders (north coast of Papua New Guinea) belonging to the Waskia Papuan language group. To address questions concerning the origin and evolution of this population we compared the Karkar mtDNA haplotypes and haplogroups to those of neighbouring East Asian and Oceanic populations. The phylogeographic analysis groups the Karkar Islander mtDNA lineages into three clusters: one group of lineages derives from the first Pleistocene settlers of New Guinea-Island Melanesia, a second set derives from more recent arrivals of Austronesian-speaking populations, and the third contains lineages specific to the Karkar Islanders, but still rooted in Austronesian and New Guinea-Island Melanesia populations. Our results (i) suggest the absence of a strong association between language and mtDNA variation and (ii) reveal that the mtDNA haplogroups F1a1, M7b1 and E1a, which probably originated in Island Southeast Asia and may be considered signatures of Austronesian population movements, are preserved in the Karkar Islanders but absent in other New Guinea-Island Melanesian populations. These findings indicate that the Karkar Papuan speakers retained a certain degree of their own genetic uniqueness and a high genetic diversity. We present a hypothesis based on archaeological, linguistic and environmental datasets to argue for a succession of (partial) depopulation, repopulation and expansion events, under conditions of structured interaction, which may explain the variability expressed in the Karkar mtDNA.

Introduction

Karkar Island is an oval-shaped volcanic island in the Bismarck Sea, about 15 kilometres off the north coast of mainland Papua New Guinea (PNG) (Fig. 1). The forested island is about 25 km long and 19 km wide and is dominated by a 1,839 metre volcano. The volcano consists of an outer caldera formed around 9000 years ago, and a more recent inner caldera formed between 1500 and 800 years ago. The volcano is still active, and eruptions have been recorded periodically since 1643.

Figure 1.

Location of Karkar Island.

Karkar Island is within the Near Oceania region which encompasses New Guinea, the Bismarck Archipelago and Solomon Islands. Archaeologists, linguists and geneticists generally agree that Near Oceania was subject to at least two major pulses of human dispersal: an initial Pleistocene colonisation of Sahul by 40,000 BP (Groube et al. 1986; Leavesley et al. 2002; O'Connell and Allen 2004), and a recent Holocene expansion at approximately 3,300 BP which resulted in the settlement of Remote Oceania. Subsequent to this second phase, the archaeological record indicates minor movements of people continuing within Near Oceania, particularly in the Bismarck Sea (Spriggs 1997; Lilley 2000, 2004).

The archaeological record suggests that the earliest Pleistocene settlers were able to colonize offshore islands, crossing open ocean gaps to do so. By 35,000 BP people had colonized the Bismarck Archipelago, and had reached the northern Solomons by 29,000 BP (Wickler & Spriggs 1988). Recent genetic studies of the mitochondrial DNA of living populations have shown that some mtDNA lineages (esp. P and Q haplogroups, see below) are specific to this part of New Guinea-Island Melanesia and have coalescence dates roughly in accord with the archaeological dates for the first settlement of Sahul (Redd & Stoneking 1999; Forster et al. 2001; Friedlaender et al. 2005; Merriwether et al. 2005). Linguistic analyses suggest that the first settlers of Near Oceania spoke languages related to the diverse ‘Papuan’ languages of today. The historical relationships of these languages are difficult to classify, but there may be at least 12 major family groupings occurring widely on mainland New Guinea and strewn intermittently across the islands of Near Oceania (Grimes 2005).

The second major pulse of human movement, at about 3,300 BP, is associated with the archaeologically defined ‘Lapita cultural complex’ (Green 1991) characterised by a distinctive dentate stamped pottery, stone tool kit, settlement pattern and economy. The earliest Lapita sites are found in the Bismarck archipelago, but within 500 years they are found on every major island group between there and Samoa – representing the first human settlement of Remote Oceanic islands. According to orthodox models Lapita is indicative of incursive populations ultimately deriving from Asia (Bellwood 1997, 2005), but modified and enlarged during contact with pre-existing populations throughout Wallacea and Near Oceania (Green 1991). This south-eastward population movement has been argued to have been fuelled by population expansion following the development of agriculture in Continental Asia (Diamond & Bellwood 2003).

Linguists tell a similar story (Green 1999), arguing that the Lapita expansion could be associated with the spread of the Austronesian language family, based on the fact that this is the sole family recorded in Remote Oceania. The Austronesian languages are thought to have originated in Taiwan around 5,000 to 6,000 years ago, from where they expanded across a vast swathe of the Indo-Pacific world (Blust 1995; Bellwood 1995). Linguists estimate that Austronesian languages were probably introduced to Island Melanesia from the west approximately 3,500 years ago at the tail end of this expansion (Ross 1988). From here they were taken by Lapita colonists into the remote Pacific.

In turn, geneticists have correlated this late-Holocene migration with mtDNA polymorphisms (9-bp deletion, positions 16189, 16217, 16261, 16247) defining haplogroup B4a1a1, which is argued to provide a link between Oceanic and Taiwanese populations. Widely referred to as the “Polynesian motif” (Redd et al. 1995) due to its dominance in that part of Oceania, this haplogroup is also very frequent in Micronesia and parts of Island Melanesia, but is absent in the PNG highlands. The presence of the Polynesian motif has also been confirmed in low frequencies in central and eastern Indonesian populations, and it probably originated either there or in Near Oceania during the Holocene (Pierson et al. 2006; Oppenheimer 2004). The analysis of whole mtDNA sequences has allowed the identification of its immediate precursor in Taiwanese Aboriginal groups (Trejaut et al. 2005) seemingly corroborating the linguistic model of Holocene movement of people out of Taiwan in a south-easterly direction (but see discussions in Hill et al. 2007; Friedlaender et al. 2007).

These correlations have led some to suggest that modern-day gene frequencies and language distributions are predictive of a major genetic/linguistic division in Near Oceania – between older Papuan-speaking groups ultimately derived from Pleistocene settlers of Near Oceania, and a more recent Austronesian-speaking group which has a history embedded in a late-Holocene expansion from Continental Asia. The genetic pattern has, however, become more complicated recently, as sampling coverage has increased, and as awareness of the effects of sampling error, as well as of drift and interaction amongst small populations, has developed. According to archaeological evidence, populations have always interacted in New Guinea-Island Melanesia, and even if the movements were intermittent and small scale, the isolation of small island populations would have been incomplete during the Pleistocene as well as the Holocene (Allen 2003; Terrell et al. 1997). In the recent past Austronesian and Papuan speaking groups were equally mobile, and consequently there may have been much population intermixture and replacement (Spriggs 1997; Friedlaender et al. 2002).

Rather than language, the primary structuring factor in population movement and contact appears to have been a coastal versus inland division in the primary residence of groups. Coastal dwelling populations were much more mobile than their inland counterparts, and this is reflected genetically in that they are more intermixed (Friedlaender et al. 2007: p9). Island size is an important factor here because larger islands are more able to support permanent inland populations, which consequently can be expected to be more isolated. In Island Melanesia this pattern is reflected in the general frequency distribution of ancient and more recent haplogroups – those haplogroups deriving from Pleistocene era dispersals are most commonly found amongst inland dwelling Papuan speaking populations on the largest islands of New Britain and Bougainville, whilst coastal populations have predominantly recent haplogroups, such as the ‘Polynesian Motif,’ at very high frequencies, irrespective of language affiliation (Friedlaender et al. 2007: fig. 8; Cox & Lahr 2006).

Karkar Island is an interesting case study in this context. The Karkar population (currently around 45,000 inhabitants) is divided into two different linguistic groups, the Takia and the Waskia, each having a very restricted geographic distribution on the coastal margins of the island. Takia is part of the Bel language family within the large Oceanic subgroup of Austronesian, and is spoken by about half of the Karkar population. Takia speakers occupy the south-eastern part of the island, as well as nearby Bagabag Island and two villages on the mainland coast. The inhabitants of the north-western half of Karkar (and one village on the mainland coast), speak Waskia, a Papuan language belonging to the Madang family of the Trans New Guinea phylum (Ross 1988, 1999), which is thought to be about 6,000-10,000 years old on the basis of its comparative diversity and archaeological evidence (Pawley 2005).

Takia has undergone a process of “metatypy” (Ross 1999) – that is, its bound morphology and its lexicon are still Oceanic, but its syntax has been modelled on Waskia due to prolonged contact. Consequently, Takia can be described as a ‘Papuanised’ Oceanic Austronesian language. Waskia, on the other hand, appears not to have been historically influenced by Takia in any significant way.

Ross (1999) argues that syntactic changes in Takia were promoted by males in trading partnerships with men from Waskia. Ross hypothesises that the language commonly used in these partnerships was Waskia, and it was Takia men who were bilingual. Native speakers were thus the instigators of change (Ross 1999: p2).

Little archaeological research has been undertaken on Karkar Island (Egloff 1975), but significant data have been produced for the Bismarck Archipelago and north coast of PNG (Allen 2003). These data suggest that trade/exchange has been a feature of social life in the region for at least 20,000 years (Summerhayes & Allen 1993). Linguistically, the early presence of Austronesian speakers on the north coast of PNG is evidenced by very limited borrowing in Waskia from a pre-Proto Oceanic Austronesian language (i.e. probably before the Lapita period proper). However, the main movement of Austronesian speakers to the north coast of PNG may have been relatively late, and, contrary to the general flow of people during the Lapita era, probably came from the east not the west. There is evidence of an east to west settlement of the north coast of PNG by Oceanic Austronesian speakers originating from New Britain (Ross 1988) and this is correlated with the development of new trade networks across the Vitiaz Strait evidenced by a westward distribution of distinct pottery types, obsidian and chert at 1,600 BP (Lilley 2000: pp185-8). The geographically central position of Karkar Island within this region suggests that it was probably entangled in these later flows of people and things, and that the presence of Takia on the island today has its origins in this later period.

The linguistic and limited archaeological data then, suggest a series of multiple population arrivals on Karkar, at different periods of prehistory, with a degree of structured population interaction throughout the sequence. The potential for biological and genetic research to contribute to our understanding of the population history of the island has been limited. Early work was conducted during the 1970s as part of the International Biological Programme (Harrison & Walsh 1974; Boyce et al. 1978). These studies were highly detailed surveys of the demography, genetics, epidemiology, physiology, biochemistry and nutrition of the population.

Through analysis of demographic patterns of residence and marriage, coupled with blood group, serum protein and red cell enzyme sampling in villages throughout the island, these studies provided useful baseline data on population structure. Initial cluster analyses showed a distinct correlation between genetic and linguistic information, with Waskia speakers grouping with mainland Papuan speaking populations, and Takia speakers grouping with nearby Austronesian speakers (Booth 1974). Relationships on the island itself were largely structured by geographic propinquity, and took on a circular coastal pattern limited by the large uninhabitable central volcanic cone. Nevertheless the linguistic boundary was thought to be of greatest isolating effect (Harrison et al. 1974). Later analyses, however, concluded that this was not as marked as first thought, ending on a somewhat pessimistic note:

“… the linguistic division is not marked and it is very doubtful if it would even be detectable on genetic evidence alone … It would seem that the approach and models which have been used here and which are being used widely elsewhere have very low powers of resolution of real situations despite their conceptual value.” (Boyce et al. 1978: p293).

Later research on multi-locus allele frequency data from the same populations concluded that no clear genetic distinction exists between the Austronesian and Papuan speakers residing on the islands and north coast of PNG, and that population affinities are based more on geographical proximity than on linguistic similarity (Serjeantson et al. 1983; Bhatia et al. 1995).

In the present study, we investigate the origins and structure of the Karkar population by analyzing the Hypervariable region 1 of the mitochondrial DNA control region and intergenic COII/tRNALys 9-bp deletion (Cann & Wilson 1983) of 47 Karkar Islanders belonging to the Waskia Papuan language group. We compare the Karkar mtDNA haplotypes and haplogroups with those from East Asians, Southeast Asians, Australian Aborigines, Melanesians, Micronesians and Polynesians. Through this phylogeographic analysis of the Karkar lineages we intend to elucidate the genetic background of the Karkar Islanders in the context of Pleistocene and late-Holocene population movements occurring regionally and the island-level patterns of structured population interaction referred to above.

Materials and Methods

Samples

DNA extractions were made from blood samples of 47 unrelated individuals from the Waskia language group of Karkar Island (north-eastern coast of Papua New Guinea, Fig. 1). The samples were collected by A. E. Mourant in the 1970s, and have been part of the Mourant Collection (Leverhulme Centre for Human Evolutionary Studies, University of Cambridge, United Kingdom) since that time. These samples, identified as ‘PKK’ in the records, were stored in glass tubes at –20°C. At the time of collection the census population of the island was 16,000 persons, and consequently our sample represents about 0.3% of the total – quite high for this region.

DNA Extraction

DNA was extracted from 200 μl of blood using the NucleoSpin Blood QuickPure kit (Macherey-Nagel, Epsom, UK) according to the manufacturer's conditions. DNA extracts were eluted from the filter-column with 50 μl of nuclease-free double-distilled water and were stored at −20°C until further use. The DNA extractions were performed in the Laboratory of Molecular Virology (Division of Transfusion Medicine, Department of Haematology of the University of Cambridge).

PCR Amplification

PCR amplifications were performed on selected regions of the mitochondrial DNA to examine polymorphisms in Hypervariable region 1 of the mtDNA control region. The 9-base pair (bp) intergenic region V deletion (Cann & Wilson 1983) was also amplified to confirm the affiliation of the mtDNA sequences, based on the HVS-I polymorphic sites, to the haplogroup B4a. To determine the presence or absence of the Region V 9-bp deletion we amplified a fragment of approximately 120 bp including the mtDNA region V using primers L8196/H8297 (Handt et al. 1996). Hypervariable region 1 (HVS-I) of the mtDNA control region was amplified using two sets of overlapping primer pairs: L15989 (Gabriel et al. 2001)/H16239 (Ivanov et al. 1996), and L16190/H16410 (Gabriel et al. 2001). We also used the primer H16167 (5′-GGGTTTGATGTGGATTGGG-3′) (Ricaut et al. 2004a) to resolve amplification problems linked to the polycytosine region located between nucleotide positions (np) 16184–16193 (Szibor and Michael 1999).

PCR conditions for these reactions were: predenaturation at 94°C for 10 min, followed by 40 cycles at 94°C for 30 sec, 30 sec at 48°C (L15989/H16239, L15989/H16167 and L8196/H8297) or 51°C (L16190/H16410), and 72°C for 45 sec; and final extension at 72°C for 5 min.

PCR amplifications were carried out in 50 μl of reaction mixture containing 2–6 μl of the ancient DNA extracts, 10 mM Tris-HCl pH 8.3, 50 mM KCl, 1.5 mM MgCl2, 1 mg/ml BSA, 200 μM each dNTP, 0.25 μM each primer and 2 U of Taq Gold Star (Eurogentec, Cambridge, UK).

Intergenic region V and HVS-I region amplification products were visualized on a 1% agarose gel, purified with Microcon-PCR filters (Millipore, Watford, UK) and systematically sequenced. Sequence reactions were performed on each strand, with the same primers as those employed for PCR amplification, by means of the ABI Prism BigDye Terminator Cycle Sequencing Kit (PE Applied Biosystems, Oxford, UK) according to the manufacturer's conditions. The sequence reaction products were analyzed on an ABI Prism 3100 (PE Applied Biosystems) automated DNA sequencer in the Sequencing Service of the Department of Zoology (Oxford University).

Data Analysis

In this article we follow the gene mutation nomenclature suggested by den Dunnen and Antonarakis (2001) for the description of changes in DNA sequences. For quality control we follow the recommendations of Bandelt et al. (2001) to detect and avoid the five major types of error (base shift, reference bias, phantom mutations, base mis-scoring, artefactual recombination). We rechecked all the sequence variations in the electropherograms using the BioEdit 5.0.9 program (http://www.mbio.ncsu.edu/Bioedit/bioedit.html; Hall 1999), and also checked for incongruences between the PCR results. Moreover, we checked whether all the sequence variations obtained had previously been reported (http://www.mitomap.org/) and whether the haplogroup and sub-haplogroup motif was fully represented; where it was not, we rechecked the relevant positions in the sequence.

The 373 bp long HVS-I sequences (np 16018-16390) from the Karkar samples were edited and aligned against the revised Cambridge reference sequence (rCRS, Andrews et al. 1999) using the BioEdit 5.0.9 program (Hall 1999).

Haplotype diversity (h) and nucleotide diversity (π) (Nei 1987) were calculated using the DNASP software (http://www.ub.es/dnasp/; Rozas et al. 2003).
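As an illustration of the haplotype diversity statistic, the following minimal Python sketch applies the usual unbiased form of Nei's (1987) estimator, h = n/(n − 1) × (1 − Σ p_i²), to the number of individuals per haplotype listed in Table 2. It is not the DnaSP implementation; the function name `haplotype_diversity` and the variable `karkar_counts` are introduced here for illustration only.

```python
def haplotype_diversity(counts):
    """Unbiased haplotype diversity: h = n/(n-1) * (1 - sum(p_i^2)).

    counts: number of sampled individuals carrying each distinct haplotype.
    """
    n = sum(counts)
    if n < 2:
        raise ValueError("need at least two sampled individuals")
    sum_sq_freq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1 - sum_sq_freq)


# Individuals per haplotype PKK1..PKK21, as read from Table 2.
karkar_counts = [5, 1, 3, 2, 1, 4, 3, 2, 2, 3, 2, 2, 1, 4, 4, 1, 1, 1, 1, 3, 1]

h = haplotype_diversity(karkar_counts)
print(f"n = {sum(karkar_counts)}, h = {h:.3f}")  # n = 47, h = 0.958
```

The value obtained from the Table 2 counts rounds to the h = 0.96 reported in the Results.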

Haplogroup affiliation of each mtDNA sequence was inferred based on the HVS-I haplogroup motif reported by Kivisild et al. (2002), Kong et al. (2003), Friedlaender et al. (2005) and Trejaut et al. (2005). When Karkar HVS-I sequences could not be affiliated to any haplogroup because (i) they were lacking some HVS-I haplogroup motif, (ii) they harboured mutations characteristic of several haplogroups, or (iii) non-coding region information was not available (in our study the limited amount of material allows only the analysis of the 9 bp deletion for a subset of samples), a (near-)matching method (Yao et al. 2002b) was used. Through this strategy the potential haplogroup status can then be inferred through a motif search and (near-)matching with the 6,227 sequences used for comparative analysis (cf. below) for which (i) haplogroup status has been confirmed with coding-region information in most cases and/or (ii) the full HVS-I haplogroup motif is represented. This allows us to link combinations of HVS-I mutations with certain mutations in the coding region or with sequences harbouring a full HVS-I haplogroup motif and anticipate the haplogroup status of our samples. The potential and utility of the near-matching method has previously been described (e.g. Yao et al. 2002b; Ricaut et al. 2004b). To perform (near-) matching we used the Blast 2.0 program (at http://www.ncbi.nlm.nih.gov) and searched for similar or neighbouring sequences.

To investigate the phylogenetic structure of the Karkar haplotypes we constructed a Median-Joining (MJ) Network (Bandelt et al. 1999) using the program Network 4.1 (http://www.fluxus-engineering.com/sharenet.htm).

The HVS-I sequences obtained from the Karkar samples were compared with the mtDNA sequences of 6,227 individuals from East and Southeast Asia compiled from the DDBJ/EMBL/GENBANK international nucleotide sequence database and additional literature (Table 1). Data were included from: 340 Polynesians, 538 Micronesians, 797 Melanesians, 254 Australian aborigines, 837 individuals from Island Southeast Asia, 278 individuals from the Indian Ocean, 952 individuals from Island East Asia, 1834 individuals from Continental East Asia, and 412 from Central Asia (Table 1). All of these sequences were aligned using the BioEdit 5.0.9 program (Hall 1999). The Arlequin package (http://www.lgb.unige.ch/arlequin) was used to determine the frequency distribution of the Karkar mtDNA haplotypes in the populations used for comparative analysis. The Karkar haplogroup frequencies were also determined in each of the reference populations.

Table 1.  East and Southeast Asian populations compared in this study

Region / Population (size)          References¹
Polynesia (340)
  Cook islands (81)                 1, 2*
  Samoa (101)                       3*, 2*, 4*, 5*, 6, 7*
  Tonga (11)                        3*, 2*, 1
  Hawai'i (28)                      3*, 5*, 7*
  Tahiti (4)                        4*
  Australes (3)                     2*
  New Zealand (63)                  2*, 8*
  Marquesas (19)                    2*
  Rapanui (30)                      5*, 7*, 42
Micronesia (538)
  Marshall islands (39)             2*, 3*, 5*, 7*
  Kiribati (26)                     5*, 7*
  Yap islands (162)                 5*, 7*
  Nauru (29)                        5*, 7*
  Kosrae (29)                       5*, 7*
  Palau (139)                       5*, 7*
  Pohnpei (29)                      5*, 7*
  Kapingamarangi (33)               2*, 7*
  Mariana islands (52)              5*, 7*
Melanesia (797)
  Island Melanesia (76)             43
  Vanuatu (97)                      2*, 7*, 9, 43
  Fiji (17)                         7*, 43
  North-Bougainville (43)           6, 43
  Santa Cruz islands (89)           10, 43
  Papua New Guinea (475)            1, 3*, 4*, 5*, 7*, 6, 12*, 18, 19*, 43
Australia (Aboriginal) (254)        3*, 5*, 6, 7*, 12*, 13, 14, 15
South East Asia Islands (837)
  Indonesia (126)                   4*, 15, 16, 18
  Java (21)                         5*, 7*
  Borneo (124)                      2*, 3*, 5*, 7*, 20
  Malaysia (320)                    1, 3*, 16, 26
  Sumatra (99)                      26
  Philippines (147)                 1, 2*, 5*, 7*, 16, 22, 20
Indian Ocean (278)
  Nicobar islands (84)              23, 24, 25, 30, 31
  Urak Lawoi/Moken (16)             7*
  Andaman islands (141)             24, 25*
  Madagascar (37)                   20
East Asia Islands (952)
  Taiwan (Aboriginal) (668)         27, 34
  Japan (284)                       1, 32*, 33, 37
East Asia (1834)
  Korea (421)                       35, 36
  China (across China) (75)         5*, 28
  South China Coast² (170)          39, 41
  South Central China³ (112)        41
  Central China⁴ (42)               41, 40
  North China⁵ (151)                40
  East China⁶ (125)                 41, 40
  Not Han Chinese (340)             41
  Vietnam (91)                      5*, 16, 29
  Thailand (307)                    1, 7*, 21, 40, 11
Central Asia (412)                  17, 28, 38

* Sequence lengths too short to include all the sequence variations present in the Karkar sequences obtained in this study.

¹ References cited: 1. Ingman & Gyllensten 2003; 2. Sykes et al. 1995; 3. Lum et al. 1994; 4. Redd et al. 1995; 5. Lum et al. 2000; 6. Ingman et al. 2000; 7. Lum et al. 1998a; 8. Murray-McIntosh et al. 1998; 9. Hagelberg et al. 2000; 10. Friedlaender et al. 2002; 11. Oota et al. 2005; 12. Vigilant et al. 1991; 13. Van Holst Pellekaan et al. 1998; 14. Huoponen et al. 2001; 15. Redd and Stoneking 1999; 16. Tajima et al. 2004; 17. Comas et al. 1998; 18. Stoneking et al. 1992; 19. Tommaseo-Ponzetta et al. 2002; 20. Hurles et al. 2005; 21. Fucharoen et al. 2001; 22. Maca-Meyer et al. 2001; 23. Prasad et al. 2001; 24. Thangaraj et al. 2003; 25. Endicott et al. 2003; 26. Macaulay et al. 2005; 27. Melton et al. 1998; 28. Yao et al. 2004; 29. Oota et al. 2002; 30. Thangaraj et al. 2005b; 31. Thangaraj et al. 2005a; 32. Horai et al. 1996; 33. Nishimaki et al. 1999; 34. Trejaut et al. 2005; 35. Snall et al. 2002; 36. Lee et al. 1997; 37. Seo et al. 1998; 38. Comas et al. 2004; 39. Betty et al. 1996; 40. Yao et al. 2002b; 41. Yao et al. 2002a; 42. Hurles et al. 2003a; 43. Friedlaender et al. 2005.

² Provinces of Guangxi, Guangdong, Fujian, Zhejiang.

³ Provinces of Hunan, Yunnan.

⁴ Provinces of Hubei, Anhui, Sichuan.

⁵ Provinces of Liaoning, Shandong, Changsha.

⁶ Provinces of Qinghai, Xinjiang, Gansu.

⁷ Including HVS-I sequences from New Caledonia, New Britain, New Ireland and Mussau available from Friedlaender et al. (2005).

In order to infer the phylogenetic relationship of the Karkar population with other populations from Asia and the Pacific area, Fst values were calculated (Arlequin package) based on the number of pairwise differences between sequences. The statistical significance of Fst values was estimated by permutation analysis, using 10,000 permutations. A multidimensional scaling (MDS; Kruskal 1964) analysis of the matrix of Fst distances was performed using the Xlstat software (http://www.xlstat.com/en/home/). The complete data set was composed of 325 bp (np 16050-16374) of the HVS-I sequences from 33 modern populations representing 2135 individuals. The populations used were as follows: 38 Polynesians (Redd and Stoneking 1999; Ingman et al. 2000; Ingman & Gyllensten 2003; Hurles et al. 2003b); 640 aboriginal Taiwanese from 9 tribes (Trejaut et al. 2005); 59 individuals from the Philippines (Tajima et al. 2004); 231 Japanese (Tajima et al. 2004); 141 Chinese (Yao et al. 2002b); 52 Malaysians (Tajima et al. 2004); 215 individuals from Thailand (Fucharoen et al. 2001); 35 Vietnamese (Oota et al. 2002); 54 Indonesians (Tajima et al. 2004); 59 East Indonesians from Molucca Islands and Nusa Tenggaras (Redd & Stoneking 1999); 33 individuals from Nicobar Islands (Prasad et al. 2001); 41 individuals from Vanuatu (Hagelberg et al. 2000); 64 Santa Cruz Islanders (Solomon Islands, Friedlaender et al. 2002); 39 individuals from the coast of PNG (Ingman & Gyllensten 2003; Redd et al. 1995; Redd & Stoneking 1999) and 11 from the PNG highlands (Ingman & Gyllensten 2003); 202 individuals from West New Guinea (Tommaseo-Ponzetta et al. 2002) and 221 Aboriginal Australians (van Holst Pellekaan et al. 1998; Huoponen et al. 2001; Redd & Stoneking 1999).
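The logic of a pairwise-difference Fst with a permutation test can be sketched in a few lines of Python. This is a simplified illustration, not the Arlequin procedure: it uses a Hudson-style estimator (1 minus the ratio of mean within-population to mean between-population pairwise differences), which may differ numerically from Arlequin's output. The function names and the two toy populations are hypothetical, invented only to make the example run.

```python
import random
from itertools import combinations


def pairwise_diff(a, b):
    """Number of differing sites between two aligned, equal-length sequences."""
    return sum(x != y for x, y in zip(a, b))


def mean_within(pop):
    """Mean pairwise difference among sequences of one population (size >= 2)."""
    pairs = list(combinations(pop, 2))
    return sum(pairwise_diff(a, b) for a, b in pairs) / len(pairs)


def mean_between(pop1, pop2):
    """Mean pairwise difference over all cross-population sequence pairs."""
    total = sum(pairwise_diff(a, b) for a in pop1 for b in pop2)
    return total / (len(pop1) * len(pop2))


def fst(pop1, pop2):
    """Simplified estimator: 1 - (mean within) / (mean between)."""
    within = (mean_within(pop1) + mean_within(pop2)) / 2
    between = mean_between(pop1, pop2)
    return 0.0 if between == 0 else 1 - within / between


def permutation_p(pop1, pop2, n_perm=10000, seed=0):
    """Share of random relabellings whose Fst is >= the observed value."""
    rng = random.Random(seed)
    observed = fst(pop1, pop2)
    pooled = list(pop1) + list(pop2)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if fst(pooled[:len(pop1)], pooled[len(pop1):]) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return observed, (hits + 1) / (n_perm + 1)


# Toy aligned sequences (invented for illustration, not study data).
pop_a = ["AAAA", "AAAT", "AAAA"]
pop_b = ["TTTT", "TTTA", "TTTT"]

observed, p_value = permutation_p(pop_a, pop_b, n_perm=1000)
print(f"Fst = {observed:.4f}, p = {p_value:.3f}")
```

A matrix of such pairwise values, one per population pair, is the kind of input that an MDS analysis then projects into two dimensions.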

Results

HVS-I Sequences in Karkar Islanders

A 373 bp segment of the mtDNA HVS-I region was sequenced (np 16018 to 16390 of the rCRS; Andrews et al. 1999) for 47 individuals and confirmed on both strands (Table 2). The nucleotide sequence data reported in this paper will appear in the DDBJ/EMBL/Genbank nucleotide sequence databases with the accession numbers DQ309847 to DQ309867.

Table 2.  Mitochondrial HVS-I sequences (between positions 16018 and 16390) of 47 individuals from Karkar island and their haplogroup attributions

Polymorphic positions¹, in the column order used for the sequence strings below:
16066 16075 16092 16093 16126 16129 16144 16148 16162 16172 16176 16181 16182 16183 16189 16192 16217 16221 16223 16241 16247 16261 16265 16288 16291 16294 16297 16304 16311 16319 16342 16343 16355 16357 16362 16390

Haplotype  Indiv.  Polymorphic positions¹                9-del²  Haplogroup (%)
rCRS               ATTTTGTCATCAAATCTCCAACATCCTTTGTACTTG
PKK1       5       ............CCC.C....T..............  X.I     B4a (10.6)
PKK2       1       .............CC.C...GT..............  X.I     B4a1a1 (23.4)
PKK3       3       ............CCC.C...GT..............  X.I     B4a1a1
PKK4       2       ...........GCCC.C...GT...........C..  X.I     B4a1a1
PKK5       1       ....C......GCCC.C...GT...........C..  X.I     B4a1a1
PKK6       4       ....C.......CCC.C...GT..............  X.I     B4a1a1
PKK7       3       .....ACT..........T..TC.....C..G....  ND³     Q1f (14.9)
PKK8       2       .C...ACT.........TTG.TC.....CA.G....  ND      Q1f
PKK9       2       .....ACT..........TG.TC.....C..G....  ND      Q1f
PKK10      3       .....ACT..........TG..C.T...C..G....  ND      Q1 (6.4)
PKK11      2       ...C.ACT..........T...C.....C..G..C.  ND      Q1a2 (4.3)
PKK12      2       G....A............TG............T...  ND      Q2d (4.3)
PKK13      1       G....A............TG................  ND      Q2 (2.1)
PKK14      4       ..... ............T.....T.........CA  ND      E1a (8.5)
PKK15      4       ....CA.........T..T.......C.........  ND      M7b1 (10.6)
PKK16      1       ....CA.........T..T.................  ND      M7b1
PKK17      1       .....A...C...............T.C......C.  ND      F1a1 (2.1)
PKK18      1       ........G.T......................C..  ND      P1 (2.1)
PKK19      1       ..................T...............C.  ND      (D/G) (10.6)
PKK20      3       ..C. A.................C...G........  ND      ND
PKK21      1       .............................AC.....  ND      ND
Total      47

¹ Numbered according to the revised Cambridge Reference Sequence (Andrews et al. 1999).

² Region V length polymorphisms variant according to Lum and Cann (1998b) classification.

³ ND, not determined.
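To show how a row of Table 2 can be read, the following Python sketch expands one sequence string into explicit substitutions and labels each as a transition or transversion. It assumes the usual convention that a dot means identity with the rCRS base at that position; the function names `decode_row` and `substitution_type` are introduced here for illustration. The PKK1 string is built from run lengths so that its 36 columns are easy to check.

```python
# Column order of the polymorphic positions and the rCRS base at each one,
# transcribed from the header rows of Table 2.
POSITIONS = [
    16066, 16075, 16092, 16093, 16126, 16129, 16144, 16148, 16162,
    16172, 16176, 16181, 16182, 16183, 16189, 16192, 16217, 16221,
    16223, 16241, 16247, 16261, 16265, 16288, 16291, 16294, 16297,
    16304, 16311, 16319, 16342, 16343, 16355, 16357, 16362, 16390,
]
RCRS = "ATTTTGTCATCAAATCTCCAACATCCTTTGTACTTG"
PURINES = {"A", "G"}


def decode_row(row):
    """Expand a 36-character row into strings such as '16189 T>C'.

    Assumption: any character other than A, C, G or T (for example a dot)
    means 'same as rCRS' or 'not readable' and is skipped.
    """
    if len(row) != len(POSITIONS):
        raise ValueError("row must have one character per polymorphic position")
    return [
        f"{pos} {ref}>{obs}"
        for pos, ref, obs in zip(POSITIONS, RCRS, row)
        if obs in ("A", "C", "G", "T")
    ]


def substitution_type(ref, obs):
    """Transition if both bases are purines or both pyrimidines, else transversion."""
    same_class = (ref in PURINES) == (obs in PURINES)
    return "transition" if same_class else "transversion"


# Haplotype PKK1 from Table 2, written as run lengths (12 + 3 + 1 + 1 + 4 + 1 + 14 = 36).
PKK1 = "." * 12 + "CCC" + "." + "C" + "." * 4 + "T" + "." * 14

for change in decode_row(PKK1):
    position, bases = change.split()
    ref, obs = bases.split(">")
    print(change, substitution_type(ref, obs))
```

For PKK1 this yields 16182 A>C and 16183 A>C (transversions) and 16189 T>C, 16217 T>C and 16261 C>T (transitions).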

In the 47 Karkar samples, 21 sequence types were found, differing at 36 positions at which 37 substitutions had taken place (32 were transitions and 5 were transversions). A total of 8 sequence types (17%) occurred in single individuals, and the most frequent types were present in 5 individuals.

The haplotype diversity (h = 0.96 ± 0.010), the nucleotide diversity (π = 0.022 ± 0.000) and the mean number of pairwise differences (PD = 8.5) of the Karkar population were all relatively high. These values are higher than those found in European, Central Asian or East Asian populations (Comas et al. 1998; Prasad et al. 2001; Yao et al. 2002b), but comparable to those of the coastal dwelling populations of the Bismarck Archipelago (Friedlaender et al. 2007: table s4).
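As an illustration, the haplotype diversity can be recomputed from the number of individuals per sequence type in Table 2. The sketch below assumes the standard unbiased estimator h = n/(n − 1) × (1 − Σ p_i²), where p_i is the frequency of sequence type i; the text does not state which estimator or software setting was used, so this is a plausibility check rather than a reproduction of the original computation.

```python
def haplotype_diversity(counts: list[int]) -> float:
    """Unbiased haplotype diversity from per-haplotype individual counts."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two individuals are required")
    sum_sq_freq = sum((c / n) ** 2 for c in counts)
    return n / (n - 1) * (1.0 - sum_sq_freq)


# Individuals per sequence type PKK1 ... PKK21, read from Table 2.
KARKAR_COUNTS = [5, 1, 3, 2, 1, 4, 3, 2, 2, 3, 2, 2, 1, 4, 4, 1, 1, 1, 1, 3, 1]

h = haplotype_diversity(KARKAR_COUNTS)
print(round(h, 2))  # 0.96
```

Under this assumption the counts in Table 2 give h ≈ 0.958, which rounds to the reported 0.96.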

MtDNA Haplogroup Affiliation

Sixteen Karkar samples contain substitutions at np 16189 T>C, 16217 T>C and 16261 C>T which, associated with the 9-bp deletion, are characteristic of the Asian haplogroup B4a (Kong et al. 2003). Eleven of these individuals harbour a mutation at np 16247 A>G characteristic of sub-haplogroup B4a1a1, the “Polynesian Motif” (Trejaut et al. 2005).

Fifteen individuals were classified as belonging to haplogroup Q. Twelve Karkar samples harbour substitutions at np 16129 G>A and 16241 A>G corresponding to the two specific mutations of haplogroup Q, which is especially common in New Guinea-Island Melanesia (Forster et al. 2001; Friedlaender et al. 2005). Twelve of these 15 individuals have additional sequence polymorphisms at sites 16144 T>C, 16148 C>T, 16265 A>C, 16311 T>C, and 16343 A>G which are associated with haplogroup Q1. In addition, 4 of them harbour a substitution at np 16261 C>T characteristic of sub-haplogroup Q1f. Three other individuals lacking the substitution at np 16241 A>G could also be affiliated by the near-matching method to sub-haplogroup Q1f. Two individuals affiliated to haplogroup Q1 also harbour a substitution at np 16362 T>C and can be affiliated to sub-haplogroup Q1a2. The HVS-I sequences of the last 3 individuals affiliated to haplogroup Q contain, in addition to np 16129 G>A and 16241 A>G, a substitution at np 16066 A>G which is characteristic of haplogroup Q2; two of them also harbour a mutation at np 16355 C>T and are affiliated to haplogroup Q2d.

Four Karkar samples contain substitutions at np 16362 T>C and 16390 G>A characteristic of haplogroup E (Kivisild et al. 2002). With the presence of an additional mutation at np 16291 C>T these 4 individuals can be assigned to haplogroup E1a (Trejaut et al. 2005).

Four Karkar individuals have HVS-I sequences harbouring substitutions at np 16129 G>A, 16192 C>T and 16297 T>C characteristic of East Asian haplogroup M7b1 (Kivisild et al. 2002; Trejaut et al. 2005). The HVS-I sequence of another individual differed from that of the 4 previous individuals by the absence of a single mutation at np 16297, which is part of the M7b1 motif. In spite of this absence, this HVS-I sequence can be assigned by the near-matching method to haplogroup M7b1.

One individual harbouring substitutions at np 16129 G>A, 16172 T>C and 16304 T>C is affiliated to the East Asian haplogroup F1a1 (Kivisild et al. 2002; Trejaut et al. 2005).

Another individual contains substitutions at np 16176 C>T and 16357 T>C. The mutations at np 16176 C>T, 16266 C>T and 16357 T>C are characteristic of the haplogroup P1 (Forster et al. 2001; Friedlaender et al. 2005). Nevertheless, in spite of the absence of the mutation at np 16266 this Karkar individual could be assigned by means of the near-matching method to haplogroup P1, which is distributed in the Southwest Pacific (PNG, Melanesia and Australia) (Forster et al. 2001; Friedlaender et al. 2005).

The HVS-I sequence of one individual contains substitutions at np 16223 C>T and 16362 T>C that are characteristic of Asian haplogroup D (Torroni et al. 1993; Richards et al. 2000). However, these two mutations could also assign this sequence to haplogroup G or another type in M or N (Yao et al. 2004). Unfortunately, a near-matching search against published mtDNAs with haplogroup status confirmed by coding region did not allow a more accurate confirmation of the haplogroup status of this sequence. Similarly, the HVS-I sequences of the 4 remaining individuals (PKK 20 and 21) could not be affiliated with certainty to any haplogroup, even by using the near-matching method, because they lacked the characteristic sets of D-loop mutations required for such assignments. It is likely that additional information obtained from the non-coding region might have helped to resolve the haplogroup status of some of the Karkar lineages, but unfortunately the quantity of material available did not allow us to perform such analyses.
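The logic of motif-based assignment with near-matching can be sketched as follows. This is a simplified illustration, not the procedure used in this study, which compared sequences against published mtDNAs: the motif table contains only the HVS-I motifs quoted above, variants are written as position plus derived base (read from Table 2), and the rules "at most one motif site missing" and "at least two motif sites matched" are assumptions made for the example.

```python
# HVS-I motifs quoted in the text, written as position + derived base.
MOTIFS = {
    "B4a": {"16189C", "16217C", "16261T"},
    "E": {"16362C", "16390A"},
    "M7b1": {"16129A", "16192T", "16297C"},
    "F1a1": {"16129A", "16172C", "16304C"},
    "P1": {"16176T", "16266T", "16357C"},
}


def assign_haplogroup(variants: set[str], max_missing: int = 1):
    """Return (haplogroup, number of missing motif sites), or None.

    Assumed rule: tolerate up to `max_missing` absent motif sites, but
    require at least two matched sites so that a single shared mutation
    is never enough for an assignment.
    """
    best = None
    for name, motif in MOTIFS.items():
        missing = len(motif - variants)
        matched = len(motif) - missing
        if missing > max_missing or matched < 2:
            continue
        key = (missing, -matched)  # prefer fewer missing, then more matched
        if best is None or key < best[0]:
            best = (key, name)
    if best is None:
        return None
    (missing, _), name = best
    return name, missing


# Variants of three Karkar haplotypes, read from Table 2.
pkk16 = {"16126C", "16129A", "16192T", "16223T"}  # lacks 16297C
pkk18 = {"16162G", "16176T", "16357C"}            # lacks 16266T
pkk19 = {"16223T", "16362C"}                      # no motif in this table

print(assign_haplogroup(pkk16))  # ('M7b1', 1)
print(assign_haplogroup(pkk18))  # ('P1', 1)
print(assign_haplogroup(pkk19))  # None
```

With these rules PKK16 and PKK18 are recovered as near-matches to M7b1 and P1, each lacking one motif site, whereas PKK19 remains unassigned because none of the motifs listed here is matched at two sites.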

The haplogroups of the Karkar population are summarized in Table 2. All the Karkar lineages for which a haplogroup has been determined belong to East Asian, Southeast Asian or ancient Near Oceanic haplogroups. However, 10.6% of the Karkar lineages (5 of 47) could not be affiliated to a specific haplogroup.
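The percentages in Table 2 follow directly from the number of individuals assigned to each haplogroup. The short sketch below recomputes them; the counts are obtained by summing the "Indiv." column of Table 2 per haplogroup, and the grouping label "undetermined" (PKK19 to PKK21) is an illustrative choice.

```python
N_TOTAL = 47

# Individuals per haplogroup, summed from the "Indiv." column of Table 2.
COUNTS = {
    "B4a": 5, "B4a1a1": 11,
    "Q1f": 7, "Q1": 3, "Q1a2": 2, "Q2d": 2, "Q2": 1,
    "E1a": 4, "M7b1": 5, "F1a1": 1, "P1": 1,
    "undetermined": 5,  # PKK19 (D/G), PKK20 and PKK21
}


def percent(count: int, total: int = N_TOTAL) -> float:
    """Frequency in percent, rounded to one decimal place."""
    return round(100 * count / total, 1)


freqs = {hg: percent(n) for hg, n in COUNTS.items()}
q_total = percent(sum(n for hg, n in COUNTS.items() if hg.startswith("Q")))
b4a_total = percent(COUNTS["B4a"] + COUNTS["B4a1a1"])

print(freqs)
print("Q total:", q_total)             # 31.9
print("B4a incl. B4a1a1:", b4a_total)  # 34.0
```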

Phylogenetic Structure of the Karkar Islanders

The genetic relationships among the 47 HVS-I sequences constituting our sample were investigated using a MJ network (Fig. 2). The network shows a reticulated structure that reflects the relatively high rate of homoplasy in the HVS-I of the mtDNA control region. The topological structure of the MJ network shows two major haplotype clusters which include the Karkar haplotypes affiliated (i) to haplogroup B4a (haplotypes 1 to 6, Table 2), which includes 34% of the Karkar lineages, and (ii) to haplogroup Q (haplotypes 7 to 13, Table 2), which includes 31.9% of the Karkar lineages. The remaining lineages (34.1%) are only represented by one or two HVS-I sequence types and are affiliated to haplogroups that have a frequency ranging from 10.6% (haplogroup M7b1) to 2.1% (haplogroups P1 and F1a1).

Figure 2.

Median Joining network of 47 mtDNA sequence haplotypes belonging to the Karkar population. Karkar haplotype names are listed in Table 2. Data encompass hypervariable region I of the mtDNA control region (16018–16390). The size of the circles is proportional to the frequency of that haplotype in the sample. Links represent mutations (position number minus 16,000) from the revised Cambridge Reference Sequence (Andrews et al. 1999). ND, not determined.

Geographic Distributions of the Karkar Haplogroups

The geographic distribution of the Karkar haplogroups among the reference populations (6,227 individuals) used in this study suggests that the 6 major haplogroups (Q, P, B4a, E1a, F1a1, and M7b1) affiliated with the Karkar HVS-I sequences can be grouped into 4 different geographic areas, some of which overlap (Table 3).

Table 3.  Estimated frequencies (%) of Karkar haplogroups in the populations used for comparative analysis presenting these haplogroups.
Region/Population (size) (1) | Karkar haplogroups
Column order of the values in each row: Qtotal, Q1, Q1f, Q1a2, Q2, Q2d, Q3, E1a, M7b1, B4a, B4a1a1, F1a1, P1, P2
(1) References listed in Table 1.
(2) Sequence length from these populations is too short to present the full haplogroup motifs. The frequencies obtained should be treated with caution.
(3) Present study.

Polynesia (340)2
 Cook islands (81)17.317.3-------14.861.7---
 Samoa (101)3--------20.872.3---
 Tonga (11)18.218.2-------27.318.2---
 Hawai'i (28) 4.5 3.6-------10.785.7---
 Tahiti (4)5050------------
 Australs (3)100100------------
 New Zealand (63)-1.6------- 6.373.1---
 Marquesas (19) 5.35.3------- 5.363.2---
 Rapanui (30)1010------- 8.391.6---
Micronesia (538)2
 Marshall islands (39)---------35.941---
 Kiribati (26)---------34.630.7- 3.8-
 (outer)Yap islands (162) 5.55.5-------17.948.8- 1.9-
 Nauru (29)--------- 6.919.3---
 Kosrae (29)---------10.331.1---
 Palau (139)---------10.125.2-29.5-
 Pohnpei (29) 3.43.4------- 6.934.5- 3.4-
 Kapingamarangi (33)---------15.2 9.1---
 Mariana islands (52)--------- 1.9 9.6---
Melanesia (797)
 Island Melanesia (76)19.715.7--4--------10.5
 Fiji (17)23.211.8-5.85.8-------- 5.8
 North-Bougainville (43) 2.32.3------------
 Vanuatu (97) 4.13.1-1----- 5.1 6.2-10.3-
 Santa Cruz islands (89) 8.810--7.8---- 1.134.7- 1.1-
 Papua New Guinea (475)45.816.613.412.6-1.31.9-- 2.510.6-10.610.3
 PNG Highlands (66)25.612.11.59-1.51.5-----25.7 4.5
 PNG Coastal (99)18.712.1-1.1-3.32.2--14.650- 6.118.2
 PNG Highlands/Coastal (55)27.210.910.91.8-3.6--- 2.827.8- 5.414.5
 Karkar Island (47)331.9 6.414.94.32.14.3-8.510.610.623.42.1 2.1-
Australia (Aborigines) (254)--------------
South East Asia Islands (837)
 Indonesia (126)53-----6-126241-
 Java (21)--------2-1---
 Borneo (124)-------2-2-7--
 Malaysia (320)-------529-25--
 Sumatra (99)-------339-11--
 Philippines (147)-------1128-12--
Indian Ocean (278)
 Nicobar islands (84)-----------9.5--
 Andaman islanders (141)--------------
 Madagascar (37)---------- 8.1---
East Asia Islands (952)
 Taiwan (Aboriginal) (668) 
 Atayal (116) Northern Mountain--------- 0.9----
 Saisiat (63) Northern Mountain-------14.2 1.6-----
 Tsou (60) Central Mountain-------5---9--
 Bunun (96) Central Mountain-------19.8---2--
 Paiwan (62) Southern Mountain-------3.2-2-1--
 Rukai (50) Southern Mountain-------4---3--
 Puyuma (52) Southern Mountain-----------4--
 Amis (105) East coast--------4.8 1.9-3--
 Tao (64) East coast-------1.6---12--
 Japan (284)--------82-3--
East Asia (1834)
 Thailand (307)--------2.3 1.3-13.4--
 North China (151)--------1.34-2--
 Not Han Chinese (340)--------1.2 1.2- 3.8--
 China (across China) (75)--------5.34- 1.3--
 South China Coast (170)--------2.4 3.5- 6.5--
 South Central China (112)--------7.111.6-15.2--
 Central China (42)--------- 2.3- 7.1--
 West China (125)--------4 4.8-4--
 Korea (421)--------2.1 1.4- 0.7--
 Vietnam (91)--------12 6.6- 8.81-
Central Asia (412)--------0.2 0.4- 1.2--

The haplogroups P and Q, which encompass 34% of the Karkar lineages, represent the first group. Twelve lineages (25.5%) belong to haplogroup Q1 and one lineage to haplogroup P1. Both of these haplogroups have a coalescence date of approximately 50,000 years BP, reflecting an ancient population expansion in New Guinea after first settlement (Friedlaender et al. 2005). Three other lineages belong to haplogroup Q2, for which the expansion date is approximately 10,000 years later than those for haplogroups Q1 and P1 (Friedlaender et al. 2005). The phylogeographic distribution of these haplogroups (Table 3 of this study; Table 2 of Friedlaender et al. 2005) suggests that they are restricted to an area centred on New Guinea-Island Melanesia. Indeed, haplogroups Q1, P1 and Q2 are absent in East Asia, Indian Ocean, Australia and Island Southeast Asia, with the exception of the presence of Q1 and P1 in Indonesia (Nusa Tenggaras Islands, Redd & Stoneking 1999). Interestingly, the distributions of P and Q do not completely overlap. Haplogroup P is absent in Polynesia, and rare in the western half of New Guinea (Tommaseo-Ponzetta et al. 2002) and in Island Melanesia (Table 3). Haplogroup Q1 is present in Polynesia, Micronesia (Yap and Pohnpei Islands; Lum et al. 1998a; Lum & Cann 2000), Island Melanesia and PNG. Haplogroup Q2 is absent in Polynesia and Micronesia and mainly detected in Island Melanesia (Table 3 of this study; Table 2 of Friedlaender et al. 2005).

Haplogroup B4a and its sub-haplogroup B4a1a1 (the “Polynesian motif”) constitute the second group, comprising 10.6% and 23.4% of the Karkar lineages respectively (Table 2). Haplogroup B4a is widely spread throughout the geographic areas considered in this study (Table 4), but the distribution of the sub-haplogroup B4a1a1 is restricted to Indonesia and Java, Island Melanesia, Micronesia, and Polynesia. This is consistent with the relative coalescence ages of haplogroups within this lineage corresponding to the movement of peoples. Haplogroup B4a probably arose approximately 29,100 ± 7100 years BP (Forster et al. 2001) somewhere in Southeast Asia before the last glacial maximum. Haplogroup B4a1a, the precursor to the Polynesian motif, has been estimated to coalesce at 13,200 ± 3800 years BP (Trejaut et al. 2005), while the TMRCA (time to most recent common ancestor) for the B4a1a1 motif itself, in Melanesians and Polynesians, has been recently estimated using two different methods, to 7,900±1700 BP, and 6,200±1800 BP (Pierson et al. 2006). These dates are considerably earlier than those associated with the Lapita expansion, or the spread of the Austronesian languages, but, accepting molecular dating limitations, they also confirm a general west to east movement of people during the Holocene.

Table 4.  General geographic distribution of the Karkar haplotypes
Haplotypes specific to Karkar Islanders | Haplotypes shared with New Guinea – Island Melanesia | Haplotypes shared with East Asia and Island/Peninsula Southeast Asia
Haplotype | Haplogroup | Haplotype | Haplogroup | Location (indiv.) | Haplotype | Haplogroup | Location (indiv.)
4 | B4a1a1 | 3 (2) | B4a1a1 | Vanuatu (3) | 1 (2) | B4a | China (1), Taiwan (25), Philippines (3), Indonesia (2), Sumatra (8), Malaysia (2)
5 | B4a1a1 | 7 | Q1f | PNG Highlands (2) | 14 | E1a | Taiwan (24), Philippines (3), Indonesia (5), Malaysia (5)
8 | Q1f | 9 | Q1f | West Papua (4) | 15 | M7b1 | Taiwan (1), Philippines (1), Malaysia (6), Vietnam (6)
10 | Q1 | 11 | Q1a2 | PNG (1) | 17 | F1a1 | Taiwan (6), Philippines (1), Sumatra (1)
16 | M7b1 | 12 | Q2d | Coastal PNG (5) | 19 | ND (1) | Japan (7), China (2), Central Asia (11)
18 | P1 | 13 | Q2 | Vanuatu (1) | | |
20 | ND | | | | | |
21 | ND | | | | | |
(1) ND, not determined.
(2) Karkar haplotypes shared with Polynesian and Micronesian populations (for the short fragment available).

The third group represents the geographic distribution of haplogroups F1a1 (2.1% of all the Karkar lineages) and M7b1 (10.6% of all the Karkar lineages). These haplogroups are restricted to Continental East Asia (China, Korea, Thailand, Vietnam, etc.), Island East Asia (Taiwan and Japan) and Island South East Asia (Table 3). Interestingly, despite being present in our Karkar data, these haplogroups are not present in the wider dataset of populations from New Guinea, PNG and Island Melanesia. The coalescence time of haplogroup M7b1 has been estimated at 20,200 ± 13,400 years BP (Trejaut et al. 2005) and that of haplogroup F1a1 at 7,300 ± 2,700 years BP (Hill et al. 2007). The two Karkar lineages belonging to haplogroup M7b1 harbour a further substitution at np 16126 which is absent among M7b1 haplotypes from Continental Asia, but is shared by Aboriginal Taiwanese (Trejaut et al. 2005). The origins of the Karkar members of haplogroup M7b1 are consequently most likely to be in the Island Southeast Asian region.

The fourth group is the haplogroup E1a, which has been found in Thailand, in Sabah Aborigines, in Taiwanese Aborigines, and across Indonesia. It is widespread, though relatively uncommon, in western Island Melanesia (esp. New Britain) but on current evidence does not occur east of the Bismarck Archipelago (Merriwether et al. 2005; Friedlaender et al. 2007). Additionally, haplogroup E1a has not been detected in the population of PNG (Table 3). The coalescence time of haplogroup E1a has been estimated at 11,910 ± 5200 years BP and its current distribution is probably part of the generalised west to east movement of populations during the Holocene (Trejaut et al. 2005).

Sequence Sharing and Haplotype Distribution

Table 4 shows the general location of the reference populations (6,227 individuals) used in this study which share Karkar haplotypes. Of the 21 Karkar sequence types, 8 (38.1%) are unique (PKK 4, 5, 8, 10, 16, 18, 20, 21), representing 14 individuals, and were not shared by any other population studied. On the other hand, 11 (52.4%) Karkar lineages (PKK 1, 3, 7, 9, 11, 12, 13, 14, 15, 17, 19), representing 28 Karkar individuals, are shared without ambiguity with other populations: 6 Karkar HVS-I sequences (28.6%) are shared within a restricted geographic area (New Guinea-Island Melanesia) and 5 lineages (23.8%) are shared throughout a wider geographic range (Southeast Asian Peninsula, East Asia and Island Southeast Asia). Finally, we cannot exclude the possibility that 2 Karkar lineages (PKK 2 and 6) could be shared with Polynesian and Micronesian populations. Nevertheless, the relatively short DNA fragments available from these populations (150 to 180 bp) allow only a partial comparison with the Karkar HVS-I sequences, and this sharing should be considered with caution.

Genetic Distances

To infer phylogenetic relationships between the Karkar population and other Asian and Oceanic populations we performed a MDS analysis (Fig. 3) based on the Fst distance table (not shown because of its size). The majority of the Fst values between pairs of populations were significant at the 5% level (the exceptions being two pairs of populations). The MDS indicates that the Karkar population clusters with its geographically neighbouring populations: coastal and highland PNG groups and the Solomon Islands population.

Figure 3.

MDS plot based on pairwise Fst values showing relationships among the 33 populations from Asia and Pacific area. Australia Arnhem Land (Aust ARH), Australia Northwest (Aust NW), Australia Riverine (Aust RIV), Australia Yuendumu (Aust YU), Australia Walbiri (Aust WL), Thailand (Thai), Vietnam (Viet), Taiwan Tsou (Tai TS), Taiwan Amis (Tai AM), Taiwan Yami (Tai YA), Taiwan Bunun (Tai BU), Taiwan Puyuma (Tai PU), Taiwan Paiwan (Tai PA), Taiwan Rukai (Tai RU), Taiwan Atayal (Tai AT), Taiwan Saisiat (Tai SA), Malaysia (Malay), Indonesia (Indo), East Indonesia (East Indo), Philippines (Phili), Nicobar Islands (Nicobar), Solomon Islands (Solomon), Papua New Guinea-Coast (PNG Coast), Papua New Guinea-Highland (PNG Highlands), West New Guinea-Fringe Highlands (WNG FH), West New Guinea-Sea Coast (WNG SC), West New Guinea-Central Highlands (WNG CH), West New Guinea-Lowland Riverine (WNG LR), and Karkar Island (Karkar). The stress value for the MDS plot is 0.171.
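The stress value quoted for an MDS plot measures how faithfully the two-dimensional configuration reproduces the input distances, with 0 indicating a perfect fit. The sketch below computes a Kruskal-type stress-1 for a given configuration. It rests on several assumptions: the text does not state which stress formula was used, the input dissimilarities are compared directly with the plotted distances (non-metric MDS would first apply a monotone transformation), and the three-population matrix and coordinates are arbitrary illustrative numbers, not Fst values from this study.

```python
import math
from itertools import combinations


def stress1(dissimilarities, coords):
    """Kruskal-type stress-1 between input dissimilarities and plot distances."""
    num = 0.0
    den = 0.0
    for i, j in combinations(range(len(coords)), 2):
        d = math.dist(coords[i], coords[j])  # distance in the MDS plot
        num += (d - dissimilarities[i][j]) ** 2
        den += d ** 2
    return math.sqrt(num / den)


# Hypothetical 2-D configuration of three populations (arbitrary units).
coords = [(0.0, 0.0), (3.0, 0.0), (0.0, 4.0)]

# Dissimilarities reproduced exactly by the configuration: stress is 0.
exact = [[0, 3, 4], [3, 0, 5], [4, 5, 0]]
# One dissimilarity (between the second and third population) is not
# reproduced by the plot, so the stress becomes positive.
distorted = [[0, 3, 4], [3, 0, 6], [4, 6, 0]]

print(stress1(exact, coords))
print(round(stress1(distorted, coords), 3))  # 0.141
```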

Discussion

The mtDNA gene pool of the Waskia-speaking portion of the Karkar population indicates a high degree of genetic variability in terms of haplotype diversity, nucleotide diversity, and the number of pairwise differences. This diversity is also shown in the affiliation of the HVS-I sequences to a wide range of haplogroups (Q1, Q2, P1, M7b1, F1a1, E1a, B4a and B4a1a1), which have different geographic and temporal origins (Table 3). When we look more specifically at the geographic distribution of the Karkar haplotypes and haplogroups (Tables 3 and 4), it is apparent that the Waskia mtDNA gene pool can be grouped into three different clusters, each allowing a specific part of Karkar population history to be understood.

The haplogroups Q1, Q2 and P1, which constitute the first cluster, find their origin in New Guinea-Island Melanesia (Table 3; Friedlaender et al. 2005). Their phylogeographic distribution suggests that 34% of the Waskia lineages originated from the population(s) which first settled New Guinea-Island Melanesia around 40,000 BP based on archaeological data (Groube et al. 1986; Leavesley et al. 2002). This result is also shown by the MDS (Fig. 3) and by 6 Karkar HVS-I types (28.6%) which are, according to the reference populations used in this study, only shared with New Guinea-Island Melanesia populations (Table 4).

A second cluster of Karkar lineages includes those affiliated to haplogroup B4a (34% of the population), which is widespread in Southeast Asia, Oceania and the Pacific (Table 3; Schurr and Wallace 2002). Among these lineages 69% (23.4% of all the Karkar lineages) are affiliated to haplogroup B4a1a1 (the so-called “Polynesian motif”). This haplotype is generally thought to be indicative of recent Holocene population movements, possibly affiliated with the spread of the Austronesian language family. It has its origins in Indonesia or Near Oceania, and is linked to ancestral forms occurring as far west as Taiwan (Trejaut et al. 2005).

The phylogeographic distribution of haplogroups F1a1, M7b1 and E1a (representing 21.2% of the Karkar lineages) and their affiliated Karkar haplotypes (Tables 3 and 4) shows a similar pattern to the B4a lineages and can be placed in the same cluster. Again, the current distribution of these haplogroups and haplotypes suggests a geographic origin in the region encompassing Taiwan, the Philippines and Indonesia during the Holocene. This is particularly clear through the distribution of (i) haplogroup E1a, which is only found amongst Aboriginal Taiwanese and Island Southeast Asian populations (Table 3), and (ii) the Karkar haplotype (PKK14) affiliated to haplogroup E1a, which is found with the highest frequency in some Aboriginal Taiwanese groups (16.7% in the Bunun population) (Melton et al. 1998; Trejaut et al. 2005). Similarly, although haplogroup M7b1 is widely distributed amongst Southeast Asian and Continental Asian populations (Table 3), the derived Karkar lineages affiliated to haplogroup M7b1 contain substitutions that link this lineage to the Aboriginal Taiwanese (see Results section, and Trejaut et al. 2005). In fact all of the Karkar haplotypes also found in populations from Asia are systematically shared in high frequencies by populations in the Wallacea region.

We can thus postulate that around 55.2% of the Waskia lineages (34% affiliated to haplogroup B4a and B4a1a1, and 21.2% affiliated to haplogroup F1a1, M7b1 and E1a) could have been introduced to Karkar Island by recent Austronesian-speaking populations. If so, the transmission of these mtDNA lineages was not associated with a simultaneous transmission of Austronesian language. In fact the flow of linguistic transmission appears to have been in the opposite direction, with Austronesian Takia modelling on Papuan Waskia (Ross 1988). This absence of a strong association between language and mtDNA variation could have different explanations, such as sampling bias or genetic drift, as previously noted for New Guinea-Island Melanesian populations (Lum et al. 2002; Hurles et al. 2003a; Cox & Lahr 2006). However, sociolinguistic relationships among Karkar groups (e.g. trading partnerships crossing the Takia/Waskia boundary (McSwain 1977), and patterns of marital residence (Boyce et al. 1978)) probably represent the main mechanisms leading to this genetic admixture (see below).

The third cluster in our sample represents mtDNA lineages found only in the Karkar population (Tables 3 and 4). These 6 unique sequence types (PKK 4, 5, 8, 10, 20, 21, representing 28.6% of the Karkar sequence types) are affiliated with the haplogroups of both the earliest inhabitants of New Guinea-Island Melanesia and the later Holocene arrivals. Similarly, Karkar Islanders have mtDNA haplogroups (M7b1, F1a1) which are absent in other New Guinea-Island Melanesian populations (Table 3). These findings indicate that the Karkar population has a unique mtDNA genetic structure, possibly reflecting isolation effects and a different settlement history.

In summary, the settlement history of Karkar Island as reflected in the Waskia genetic data suggests (i) high genetic diversity similar to other coastal dwelling populations in Island Melanesia, (ii) some unusual mtDNA lineages compared to other studies of New Guinea-Island Melanesia, and (iii) an admixture of autochthonous and recent immigrant mtDNA lineages. With respect to this latter point, we cannot determine the chronological sequence of arrival of Karkar mtDNA lineages from the genetic data alone. Since autochthonous and recent mtDNA lineages co-occur in much of Island Melanesia it would be theoretically possible for all of the observed variants to have arrived on Karkar at the same time, from an already admixed source population in that region. Such scenarios have seldom been considered in studies of Oceanic population genetics, with distinct haplogroups too frequently assumed to be representative of discrete migrations. In the absence of other lines of evidence more weight should perhaps be attached to such scenarios. In the instance of Karkar however, the presence of linguistic information decreases the likelihood of a single colonisation. Our sample is from a population speaking a Papuan language that shows only tenuous historical links with other nearby languages, but is closest to languages from mainland PNG not Island Melanesia (Ross 2005). Furthermore the presence of Austronesian speakers on the island indicates at least some stratigraphy to settlement. Thus, we believe that our genetic data can be best interpreted in light of linguistic, archaeological, social and environmental data relating to the context of human mobility and interaction in the region. This allows us to consider in more detail likely population history scenarios for Karkar.

Although Karkar was well within the voyaging range of Pleistocene settlers and may have been settled by 35,000 BP, the island has probably been subject to periods of depopulation during prehistory due to its status as an active volcano. Any pre-Holocene populations would almost certainly have been wiped out by a cataclysmic eruption ∼9000 years ago. Other violent eruptions between 1500 and 800 BP would have also had an impact on populations residing on the island [see: http://www.volcano.si.edu/world/volcano.cfm?vnum=0501-03=&volpage=erupt]. Archaeological evidence from nearby islands with a history of volcanic activity indicates that affected areas were repopulated relatively quickly (Torrence et al. 2000) and that social networks were commonly reconfigured in response (Lilley 2004). A series of complete and partial depopulations followed by repeated re-colonisation events by distinct groups from neighbouring regions can consequently be seen as part of the context for increased genetic diversity on Karkar.

Within this context, archaeological and linguistic evidence indicates that there were at least three phases of influential population movement in the regional vicinity of Karkar. The first is tied to the expansion of the Trans New Guinea language phylum of which Waskia is a member. It is possible that this expansion was fuelled by the domestication of taro in the central highlands of PNG, from where Trans New Guinea speakers spread in both directions down into the lowlands. The archaeological evidence suggests that some domestication of taro may have started as early as 9000 BP (Denham 2004), but a date of perhaps 6000 BP for the beginnings of the TNG dispersal is more plausible (Ross 2005). This latter date also corresponds to a time when the north coast of PNG was becoming more attractive to human settlement, with sea level rise leading to the development of rich floodplains, river deltas and lagoons (Terrell 2004). On this basis the arrival of proto-Waskia on Karkar was a mid-Holocene phenomenon, consisting of the movement of a mainland population offshore to an island recovering from a massive volcanic eruption post-9000 BP. It is likely that these people carried the ancient mtDNA haplogroups evidenced in our sample (Cluster 1 above).

The second population movement is tied to the arrival of Austronesian languages in Near Oceania. Early contact with Austronesian on Karkar is evidenced by limited word borrowings in Waskia from an Austronesian language that had not yet lost proto-Oceanic final consonants (Ross 1988). The Oceanic branch of the Austronesian family is believed to have developed in the Bismarck Archipelago, probably at the time of Lapita, and includes Takia amongst its many daughter clades. The Austronesian borrowings in Waskia then, occurred before the arrival of the Takia speakers on Karkar, perhaps sometime immediately preceding or at the beginning of the Lapita period around 3300 BP. Genetically this contact with pre-Lapita Austronesians may be reflected in our sample by the presence of recent mtDNA haplogroups that occur in Island Southeast Asia but have not yet been found in the Bismarck Archipelago, PNG or wider Island Melanesia (i.e. M7b1, F1a1). Alternatively the presence of these haplogroups might be an effect of more recent westward contact, and indeed their absence in the Bismarcks and elsewhere may be sampling error.

The third phase of influential movement relates to developments in southern New Britain and the Vitiaz Strait some 1,600 years ago. Lilley (2000, 2004) has documented trade networks expanding from southern New Britain at this time, stretching westwards along the northern PNG coastline and following the chain of islands that leads to Karkar. A distinctive series of pottery forms and stone tools was distributed throughout the region during this expansion and this has been linked to the westward movement of Oceanic Austronesian speakers (Lilley 2004: pp. 93-4). Oceanic languages on this part of the coast belong to the innovation-defined North New Guinea Cluster, with Takia residing within the Vitiaz subgroup (Ross 1988). Takia speakers occupy the southeast coast of Karkar, a location to be expected if migration proceeded in a westerly direction from New Britain. It is likely that the arrival of this language group on Karkar included individuals affiliated to the recent B4a lineages that make up a third of our sample population, and which are so dominant in other Oceanic populations. On the basis of genetic profiles in the Bismarck Archipelago it is likely that haplogroups P and Q were also represented in this later arrival of people.

If this basic outline of population movement is correct, gene flow must have occurred between the two linguistic groups on Karkar in order to account for the high proportion of recent mtDNA motifs of Southeast Asian origin in our Waskia sample. For interpretive purposes the census data recorded by the IBP studies of the early 1970s is extremely useful here. A virtually complete census of the island was produced by this research, with detailed information on marital residence and migration patterns for the 16,000 people living on Karkar at the time (Harrison & Walsh 1974). The population was found to be markedly patrilocal, with married females being the only members of the community residing in any significant numbers away from their villages of birth. Interestingly there was a difference between the language groups in this respect. Initial analyses showed that 71% of Waskia married women lived away from their home village, while only 37% of Takia women did so. Marriage across the linguistic boundary was rare, but again structured slightly differently. Of those married women living in a Waskia village outside their birth village, 89% were from another Waskia village, 7% were from a Takia village, and 4% were from outside Karkar. In Takia 82% were from another Takia village, 5% from a Waskia village, and 13% from outside Karkar (Boyce et al. 1978: pp. 279-80). It is important to note that these trends were continuing with increased involvement in the cash economy, and were accompanied by extreme population expansion (the earliest census in 1925–27 recorded 7286 people) (Hornabrook 1974).

Under these conditions we would predict the genetic structure of the population of Karkar to slowly converge, with Waskia becoming more like Takia at a slightly faster rate, and Takia becoming more like regional populations due to a greater flow of genes from outside Karkar. We currently have no mtDNA data for Takia, but the Waskia data presented here clearly indicates this process of convergence. For the Waskia speakers the result has clearly been an increased proportion of recent ‘Asian’ mtDNA haplogroups within the population. Here then, we have the flip side of Hage and Marck's (2003) argument that the predominantly ‘Melanesian’ origins of Y chromosome lineages in Oceanic populations is an effect of matrilocal practices: The predominantly ‘Asian’ origins of mtDNA lineages in our sample of Papuan speakers is an effect of patrilocal practices.

It would be fascinating to explore this pattern further through investigation of Y chromosome markers on Karkar and further sample analysis from both sides of the linguistic boundary. However, it is important to note that because both mtDNA and NRY have a fraction of the reproductive sample sizes of autosomal markers, extreme genetic drift is likely to be a significant factor in very small populations. In situations such as Karkar, where there has probably been significant population fluctuation throughout prehistory, we would expect this effect to be even greater. The population expansion witnessed on Karkar over the last hundred years – from 7286 in the 1920s to 16,000 in the 1970s and 45,000 today – is an extreme index of what is possible.

In conclusion, we hypothesise that the genetic structure and diversity of our Karkar sample is best understood within the context of environmental, archaeological and linguistic data. The best explanation of the Karkar Island genetic results is a succession of (partial) depopulation, repopulation and expansion events, under conditions of structured and increasing interaction. Nevertheless, these preliminary results are inevitably constrained by the sample size (especially through the few single Karkar mtDNA occurrences) and other factors – only an increased number of archaeological surveys and genetic studies from New Guinea-Island Melanesia, will enable a better understanding of Karkar settlement history and validate/invalidate the hypotheses put forward.

Acknowledgements

This research was supported by NERC as part of the EFCHED Thematic Programme ‘Searching for traces of the Southern Dispersal’ Project, and by the AHRC through the European Science Foundation EUROCORES Origins of Man, Language, and Languages (OMLL) programme “Pioneers of Island Melanesia”. Further support was also provided by the Isaac Newton Trust of Trinity College, Cambridge, and the Leverhulme Trust. The authors express their sincere thanks to Daniel Candotti and Jean-Pierre Alain, Laboratory of Molecular Virology (Division of Transfusion Medicine, Department of Haematology of the University of Cambridge).
