The Utsat people do not belong to one of the recognized ethnic groups in Hainan, China. Some historical literature and linguistic classification confirm a close cultural relationship between the Utsat and Cham people; however, the genetic relationship between these two populations is not known. In the present study, we typed paternal Y chromosome and maternal mitochondrial (mt) DNA markers in 102 Utsat people to gain a better understanding of the genetic history of this population. High frequencies of the Y chromosome haplogroup O1a*-M119 and mtDNA lineages D4, F2a, F1b, F1a1, B5a, M8a, M*, D5, and B4a exhibit a pattern similar to that seen in neighboring indigenous populations. Cluster analyses (principal component analyses and networks) of the Utsat, Cham, and other ethnic groups in East Asia indicate that the Utsat are much closer to the Hainan indigenous ethnic groups than to the Cham and other mainland southeast Asian populations. These findings suggest that the origins of the Utsat likely involved massive assimilation of indigenous ethnic groups. During the assimilation process, the language of the Utsat has been structurally changed to a tonal language; however, their Islamic beliefs may have helped to keep their culture and self-identification.
The Utsat people form a relatively small population that lives mostly in Sanya Prefecture on the southern tip of Hainan Island off the southern coast of mainland China. This population is not considered a separate ethnic group, and is officially recognized as being of the Hui nationality. This classification is contentious because the origin of the Utsat is unclear. The folktales of the Utsat name their ancestors as Muslims who originated in Central Asia, as did other Hui Chinese. However, the Utsat are also thought to be descendants of the Champa Kingdom (7th to 18th centuries) who fled their homeland to escape Vietnamese invasion (Olson, 1998). According to oral historical material, a Cham Prince and approximately 1000 Cham moved to Hainan after the Vietnamese completed their conquest of Champa, and the Ming dynasty allowed them to set up a kingdom in exile in Hainan (Tran & Reid, 2006). However, Chinese historical literature records the arrival of the Cham refugees on Hainan even earlier, in the Song dynasty, after the capital of Cham fell in 982 AD (Andaya, 2008).
The close relationship between the Utsat and Cham is supported by affiliations in their language. Tsat, the language of the Utsat, belongs to the Malayo-Polynesian group within the Austronesian linguistic family, as do the Cham languages (Pang & Maddieson, 1993; Graham, 2006). The Tsat language is very similar to that of the Northern Roglai and is classified as a Northern Cham branch (Lewis, 2009). However, Tsat exists as a “linguistic enclave” in Hainan Island because it is surrounded by non-Austronesian languages (i.e. Tai-Kadai and Chinese). Over the long course of contact with sinitic dialects and the directionality of internal drift, Tsat has changed structurally to resemble Chinese and Hlai. For example, Tsat has developed into a solidly tonal language, which is seldom observed in Malayo-Polynesian languages (Graham, 1996).
In previous studies, we have found that East Asian languages show a strong association with paternal lineages of Y chromosomes (Wen et al., 2004a, 2004b; Li et al., 2008; Cai et al., 2011). Thus, the structural changes to the Tsat language may also be reflected genetically. Li et al. (2008) reported the Y chromosome data for 31 Utsat. The high frequency of haplogroup O1a-M119 (58.1%) and the connection to Daic populations in the Y-short tandem repeat (STR) networks suggest a probable Daic genetic background for the Utsat (Li et al., 2008). These findings suggest that the origin of the Utsat most likely involves assimilation or recent gene flow from neighboring indigenous populations. However, Y chromosome data only offer a paternal perspective and the small sample size in the study of Li et al. (2008) may have contributed to bias. Furthermore, without comparisons to the Cham population, the situation remains contentious. In the present study, we addressed this question by typing the maternally inherited mitochondrial (mt) DNA and relevant Y chromosome markers in 102 samples from the Utsat population (72 men and 30 women) to gain a better understanding of the origin of the Utsat.
1 Material and methods
1.1 Population samples
The present study was approved by the Ethics Committee of the Fudan School of Life Sciences. Peripheral blood samples were collected from 102 Utsat individuals from Sanya Prefecture, Hainan, China, after adequate information had been provided regarding the study and the subjects had provided written consent. All subjects were healthy and not related within five generations.
1.2 Y chromosome markers
Fourteen single nucleotide polymorphisms (SNPs) in the Y chromosome non-recombining portion as listed in the latest Y chromosome phylogenetic tree (Karafet et al., 2008; Yan et al., 2011; M130, M89, M9, M45, M119, M110, M101, P31, M95, M88, M122, M164, M159, and M7) were typed in the samples collected using polymerase chain reaction (PCR)–restriction fragment length polymorphism (RFLP). Four SNPs (M48, M8, M217, and M356) were typed using Taqman (Applied Biosystems, Foster City, CA, USA), whereas seven SNPs (YAP, M15, M175, M111, M134, M117, and M121) and seven STR polymorphisms (DYS19, DYS389I, DYS389II, DYS390, DYS391, DYS392, and DYS393) were typed using fluorescently labeled primers for PCR amplification. Denatured products were separated by acrylamide gel electrophoresis on a 3730xl Genetic Analyzer (Applied Biosystems, Carlsbad, CA, USA) to distinguish the alleles.
1.3 Mitochondrial DNA markers
The hypervariable segment I (HVS-I) region of mtDNA was amplified using the primers L15974 and R16488 (Yao et al., 2002). The PCR products were purified with shrimp alkaline phosphatase and exonuclease I (Roche Diagnostics, Shanghai, China). The purified PCR product was sequenced using the Big-Dye Terminator Cycle Sequencing Kit (Applied Biosystems) and an ABI 3730xl Genetic Analyzer (Applied Biosystems). Sequence Analysis 3.3 software (Applied Biosystems) was used to extract the sequences. The HVS-I sequences were edited and aligned against the revised Cambridge reference sequence (Andrews et al., 1999) using DNASTAR software (DNASTAR, Madison, WI, USA). Another 22 polymorphisms in the coding regions of the mtDNA (3010, 7598, 663, 10 400, 10 310, 4216, 4491, 12 308, 10 646, 11 719, 4715, 4833, 8271, 5301, 70 287, 13 263, 14 569, 5417, 5178, 12 705, 15 607, and 9824) were also genotyped hierarchically using SNaPshot (ABI SNaPshot Multiplex Kit; Applied Biosystems). The PCR products were also electrophoresed on the 3730xl Genetic Analyzer (Applied Biosystems). The haplogroup affiliation of each mtDNA sequence was inferred by combined use of the HVS-I motif and diagnostic polymorphisms in the coding regions (Kivisild et al., 2002; Kong et al., 2003).
1.4 Statistical analyses
Networks of Y chromosome STRs and mtDNA HVS-I motifs were constructed by the median joining method (Bandelt et al., 1999) using Network version 220.127.116.11 (http://www.Fluxus-engineering.com, accessed 10 October 2012). Genotype data for the Utsat were generated in the present study, whereas data pertaining to neighboring populations were obtained from the existing literature (Yao et al., 2002; Yao & Zhang, 2002; Wen et al., 2004a, 2004b; Trejaut et al., 2005; Li et al., 2007, 2008, 2010; Gan et al., 2008; Qin et al., 2010; He et al., 2012; Lu et al., 2012). Arlequin 3.11 was used to calculate Rst genetic distances (Slatkin's linearized fixation index, Fst, computed for Y-STR data) (Excoffier et al., 2005). Principal components analysis (PCA) and multidimensional scaling (MDS) were performed using SPSS 18.0 software (SPSS, Chicago, IL, USA).
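As an illustration of the PCA step only, the following minimal Python sketch computes principal component scores from a table of haplogroup frequencies by singular value decomposition. It is not the SPSS procedure used in the study. The function name `pca_scores`, the population labels, and all frequencies are hypothetical and are not data from this paper.

```python
import numpy as np

def pca_scores(freqs, n_components=2):
    """PCA of a populations x haplogroups frequency matrix via SVD.

    Returns the scores on the first n_components and the fraction of
    total variance that each of those components explains.
    """
    X = np.asarray(freqs, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each haplogroup column
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]
    explained = (s ** 2) / np.sum(s ** 2)
    return scores, explained[:n_components]

# Hypothetical haplogroup frequencies (each row sums to 1); NOT study data.
populations = ["PopA", "PopB", "PopC", "PopD"]
haplogroups = ["O1a", "O2a1", "O3", "other"]
freqs = [
    [0.60, 0.05, 0.15, 0.20],
    [0.55, 0.10, 0.15, 0.20],
    [0.10, 0.60, 0.10, 0.20],
    [0.15, 0.55, 0.10, 0.20],
]

scores, explained = pca_scores(freqs)
for name, (pc1, pc2) in zip(populations, scores):
    print(f"{name}: PC1={pc1:+.3f} PC2={pc2:+.3f}")
```

In this toy table the two O1a-rich populations fall on one side of the first component and the two O2a1-rich populations on the other. The sign of a principal component is arbitrary, so only the relative positions are meaningful.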
2 Results and Discussion
2.1 Y chromosome
According to the nomenclature of the Y Chromosome Consortium (YCC; Karafet et al., 2008; Yan et al., 2011), eight SNP haplogroups were determined from the 72 individual Utsat samples (see Table S1 available as Supplementary Material to this paper). Although the language of the Utsat people is now classified as Northern Cham, their paternal genetic structure, with high frequencies of haplogroup O1a*-M119, is not similar to that of the Cham populations. O2a1* and its sub-haplogroup O2a1a, which are dominant haplogroups of the Cham, only account for 4.17% of the Utsat (Fig. 1). The moderate frequencies of ancient southeast Asian lineages C-M130 and F*-M89 among the Utsat may result from the genetic drift of certain ancestral contributors to the population. One of the Sino-Tibetan characteristic lineages, namely O3a2c1a-M117, has also been identified in the Utsat at a low frequency of 4.17% (Su et al., 2000; Kang et al., 2012), and probably resulted from recent gene flow from the neighboring Han migrants.
Detailed paternal genetic patterns for the Utsat, Cham, and other East Asian populations were discerned using additional published Y chromosome datasets. We performed a PCA using the Y chromosome haplogroup frequencies of the Utsat and another 43 East Asian populations (Fig. 2). Populations from mainland southeast Asia (MSEA) and from Hainan formed two clusters in the second principal component (PC). The Utsat was clustered into the Hainan group, together with the Hainan aborigines and populations of southern China, such as the Dong and Sui. However, Cham was very closely related to the MSEA group. The MDS plot of 52 populations with Rst genetic distances based on six common Y-STRs (i.e. DYS19, DYS389I, DYS390, DYS391, DYS392, and DYS393) also associated the Utsat with populations from Hainan (Fig. 3). This pattern was due mainly to the high frequency of the O1a*-M119 haplogroup and the low frequency of O2a1*-M95 in the Utsat. Detailed distances among the Utsat, Hainan aborigines, and MSEA populations within the O1a*-M119 haplogroup could reveal a clear paternal genetic origin of the Utsat. Therefore, we constructed a median joining network based on six STR haplotypes (i.e. DYS19–DYS389I–DYS390–DYS391–DYS392–DYS393) of O1a*-M119 individuals in the relevant ethnic groups. In the network (Fig. 4), Hainan aborigines formed several almost exclusive clades with few individuals from other populations, suggesting that Hainan aborigines had been isolated from other Daic populations of southern China and Taiwan aborigines. Almost all the Utsat samples were clustered in those exclusive clades with Hainan aborigines. However, samples from MSEA tended to cluster with southern Chinese populations. These findings suggest that the main paternal haplogroup of the Utsat was introduced from indigenous ethnic groups in Hainan rather than from the Cham or other MSEA populations.
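To make the MDS step concrete, the sketch below embeds a small matrix of pairwise distances in two dimensions using classical (Torgerson) MDS. This is a simple stand-in: the source does not state which MDS algorithm SPSS applied, and the distance values are hypothetical, chosen only so that the result is easy to check. They are not Rst values from the study.

```python
import numpy as np

def classical_mds(dist, n_components=2):
    """Classical (Torgerson) MDS of a symmetric distance matrix."""
    D = np.asarray(dist, dtype=float)
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (D ** 2) @ J                  # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)         # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_components]
    top = np.clip(eigvals[order], 0.0, None)     # guard against tiny negative values
    return eigvecs[:, order] * np.sqrt(top)

# Hypothetical pairwise distances among four populations; NOT study data.
labels = ["PopA", "PopB", "PopC", "PopD"]
dist = np.array([
    [0.00, 0.03, 0.04, 0.05],
    [0.03, 0.00, 0.05, 0.04],
    [0.04, 0.05, 0.00, 0.03],
    [0.05, 0.04, 0.03, 0.00],
])

coords = classical_mds(dist)
for name, (x, y) in zip(labels, coords):
    print(f"{name}: ({x:+.4f}, {y:+.4f})")
```

Because these toy distances correspond to the corners of a rectangle, the two-dimensional embedding reproduces them exactly. Real genetic distances are generally not Euclidean, so an MDS plot of such data is an approximation.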
2.2 Mitochondrial DNA
Nineteen mtDNA haplogroups were found in the 102 Utsat samples (Table S2). The mtDNA haplogroups present at high frequencies in the Utsat are (in decreasing order of frequency) D4, F2a, F1b, F1a1, B5a, M8a, M*, D5, and B4a. The D4 and F2a haplogroups are the two main haplogroups in the Utsat population, accounting for 16.67% and 15.69%, respectively. However, these two haplogroups are absent or occur at very low frequencies in other Hainan aborigines and MSEA populations. We then compared Utsat samples with those two haplogroups at the haplotype level with other relevant populations. Most of the D4 samples from the Utsat have the same HVS-I motif type, namely 16223–16316–16362. However, this haplotype is seldom seen in the populations of East Asia and MSEA. The F2a haplotype in the Utsat is shared exclusively with some Han Chinese and other small ethnic groups (Lahu, Yi, and Mosuo) in Yunnan (Yao & Zhang, 2002; Wen et al., 2004a, 2004b; Lu et al., 2012; Wang et al., 2012). The pattern of high frequencies of B and F haplogroups in the Utsat is very similar to the neighboring populations and other southern populations. Furthermore, we used PCA based on the distribution of mtDNA haplogroup frequencies to show the overall clustering pattern of the Utsat and another 30 populations (Fig. 5). Taiwan aborigines, MSEA populations, and Hainan aborigines formed three clusters in the first PC. Haplogroups E, F5, and B4 contributed mostly to the Taiwan pole. In contrast, haplogroups G, A, C, M9, and M8 were found to contribute most to the Hainan and Sino-Tibetan pole. The Utsat tended to cluster with Hainan aborigines, whereas the Cham clustered with the MSEA populations. This frequency distribution pattern revealed the genetic affinity between the Utsat and Hainan aborigines.
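The percentages quoted above for D4 and F2a can be checked with simple arithmetic. The sketch below assumes counts of 17 and 16 individuals out of 102, which are inferred here from the reported percentages rather than taken from Table S2. The remaining 69 individuals are lumped into a placeholder category.

```python
def haplogroup_frequencies(counts):
    """Convert haplogroup counts to percentage frequencies."""
    total = sum(counts.values())
    return {hg: 100.0 * n / total for hg, n in counts.items()}

# Assumed counts (inferred from the reported percentages, not from Table S2).
counts = {"D4": 17, "F2a": 16, "other": 69}

freqs = haplogroup_frequencies(counts)
for hg, pct in freqs.items():
    print(f"{hg}: {pct:.2f}%")
```

Under these assumed counts, 17/102 and 16/102 round to 16.67% and 15.69%, matching the values given in the text.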
However, the results based on haplogroup frequency comparisons could be misleading because of the quickly changing frequencies of the mtDNA lineages caused by positive selection or genetic drift (Yang et al., 2011; Lu et al., 2012). A network analysis of individual lineages will most likely offer a better investigation of maternal relationships among the Utsat, Cham, and other populations (Li et al., 2007; Qin et al., 2010). Based on the mtDNA HVS-I motif and SNP-determined haplogroups, the networks of mtDNA haplogroups D4, F2a, F1b, F1a1, M8a, D5, and B4a were analyzed (Fig. 6). Each of these haplogroups has a high or moderate frequency in both the Utsat and MSEA populations, with these haplogroups together accounting for 72.55% of the Utsat. In the networks of the D4 haplogroup, only one haplotype was shared between the Utsat and Thai people, with other Utsat samples forming a large, exclusive clade. Exclusive clades were also seen for F2a and F1b. In the networks of B4a and M8a, the Utsat were clustered only with samples of Hainan aborigines. In the F1a1 network, the Utsat were clustered only with populations from southern China. In the B5a and D5 networks, the Utsat, Hainan aborigines, and populations from southern China, Taiwan, and MSEA clustered together in a large clade and in one moderate-sized clade. However, no Utsat samples clustered directly with the Cham. Overall, the maternal lineages of the Utsat are closer to those of the Hainan or southern China ethnic groups than to the MSEA populations.
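The idea of shared versus exclusive haplotypes that underlies the network comparison can be sketched with simple set operations. This is not the median joining algorithm. The motif 16223-16316-16362 is the D4 motif named in the text, but every other motif and every sample assignment below is hypothetical, and the helper names are illustrative only.

```python
from collections import defaultdict

def carriers_by_haplotype(samples):
    """Map each HVS-I motif to the set of populations that carry it."""
    carriers = defaultdict(set)
    for population, motif in samples:
        carriers[motif].add(population)
    return dict(carriers)

def exclusive_to(carriers, population):
    """Motifs found in the given population and in no other."""
    return sorted(m for m, pops in carriers.items() if pops == {population})

def shared_between(carriers, pop_a, pop_b):
    """Motifs carried by both populations."""
    return sorted(m for m, pops in carriers.items() if {pop_a, pop_b} <= pops)

# Hypothetical (population, motif) pairs; NOT study data.
samples = [
    ("Utsat", "16223-16316-16362"),
    ("Utsat", "16223-16316-16362"),
    ("Utsat", "16223-16362"),
    ("Thai", "16223-16362"),
    ("Cham", "16223-16290-16362"),
]

carriers = carriers_by_haplotype(samples)
print("Exclusive to Utsat:", exclusive_to(carriers, "Utsat"))
print("Shared Utsat/Thai:", shared_between(carriers, "Utsat", "Thai"))
print("Shared Utsat/Cham:", shared_between(carriers, "Utsat", "Cham"))
```

A median joining network adds to this the mutational steps that connect related haplotypes, which is what allows near-identical lineages to be grouped into clades.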
In the present study, the patterns of paternal Y chromosome and maternal mtDNA diversities indicate that the Utsat are much closer to the Hainan indigenous ethnic groups than to the Cham and other MSEA populations. This suggests that the formation of the Utsat likely involved massive assimilation of indigenous ethnic groups. During the assimilation process, the language of the Utsat has been changed structurally so that it resembles the Sinitic or Tai-Kadai dialects. However, it is most interesting that the culture and self-identification of the Utsat remain consistent with the Cham. The Utsat are Muslims and the religious law of Islam deals with many personal matters, such as hygiene, diet, fasting, and even the exact time for prayer. These Islamic beliefs may have a significant role in maintaining the lifestyle and self-identification of the Utsat. Islam for the Utsat is more about community and probably not about biological descent. We may call it a “religion dominance” mechanism of genetic replacement: a small invading group is adopted by the larger indigenous resident population, allowing for genetic substitution from the indigenous population to occur. However, the religion of the small invading population may help maintain their religion-based culture and self-identification.
This work was supported by the National Natural Science Foundation of China (Grant Nos. 31071098, 30860124, 30890034, 91131002), National Excellent Youth Science Foundation of China (Grant No. 31222030), Shanghai Rising-Star Program (Grant No. 12QA1400300), Shanghai Commission of Education Research Innovation Key Project (Grant No. 11zz04), and Shanghai Professional Development Funding (Grant No. 2010001).
Appendix I Members of the Genographic Consortium
Janet S. ZIEGLE (Applied Biosystems, Foster City, CA, USA)
Li JIN (Fudan University, Shanghai, China)
Hui LI (Fudan University, Shanghai, China)
Shilin LI (Fudan University, Shanghai, China)
Pandikumar SWAMIKRISHNAN (IBM, Somers, NY, USA)
Laxmi PARIDA (IBM, Yorktown Heights, NY, USA)
Daniel E. PLATT (IBM, Yorktown Heights, NY, USA)
Ajay K. ROYYURU (IBM, Yorktown Heights, NY, USA)
Lluis QUINTANA-MURCI (Institut Pasteur, Paris, France)
R. John MITCHELL (La Trobe University, Melbourne, Australia)
Marc HABER (Lebanese American University, Beirut, Lebanon)
Pierre A. ZALLOUA (Lebanese American University, Beirut, Lebanon)
Syama ADHIKARLA (Madurai Kamaraj University, Madurai, India)
ArunKumar GANESHPRASAD (Madurai Kamaraj University, Madurai, India)
Ramasamy PITCHAPPAN (Madurai Kamaraj University, Madurai, India)
Varatharajan Santhakumari ARUN (Madurai Kamaraj University, Madurai, India)
R. Spencer WELLS (National Geographic Society, Washington DC, USA)
Himla SOODYALL (National Health Laboratory Service, Johannesburg, South Africa)
Elena BALANOVSKA (Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow, Russia)
Oleg BALANOVSKY (Research Centre for Medical Genetics, Russian Academy of Medical Sciences, Moscow, Russia)
Chris TYLER-SMITH (The Wellcome Trust Sanger Institute, Hinxton, UK)
Daniela R. LACERDA (Universidade Federal de Minas Gerais, Belo Horizonte, Brazil)
Fabrício R. SANTOS (Universidade Federal de Minas Gerais, Belo Horizonte, Brazil)
Pedro Paulo VIEIRA (Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil)
Jaume BERTRANPETIT (Universitat Pompeu Fabra, Barcelona, Spain)
David COMAS (Universitat Pompeu Fabra, Barcelona, Spain)
Begoña MARTÍNEZ-CRUZ (Universitat Pompeu Fabra, Barcelona, Spain)
David F. SORIA-HERNANZ (Universitat Pompeu Fabra, Barcelona, Spain)
Christina J. ADLER (University of Adelaide, Adelaide, Australia)
Alan COOPER (University of Adelaide, Adelaide, Australia)
Clio S. I. Der SARKISSIAN (University of Adelaide, Adelaide, Australia)
Wolfgang HAAK (University of Adelaide, Adelaide, Australia)
Matthew E. KAPLAN (University of Arizona, Tucson, AZ, USA)
Nirav C. MERCHANT (University of Arizona, Tucson, AZ, USA)
Colin RENFREW (University of Cambridge, Cambridge, UK)
Andrew C. CLARKE (University of Otago, Dunedin, New Zealand)
Elizabeth A. MATISOO-SMITH (University of Otago, Dunedin, New Zealand)
Matthew C. DULIK (University of Pennsylvania, Philadelphia, PA, USA)
Jill B. GAIESKI (University of Pennsylvania, Philadelphia, PA, USA)
Amanda C. OWINGS (University of Pennsylvania, Philadelphia, PA, USA)
Theodore G. SCHURR (University of Pennsylvania, Philadelphia, PA, USA)
Miguel G. VILAR (University of Pennsylvania, Philadelphia, PA, USA)