Genetic structure and affinities among tribal populations of southern India: a study of 24 autosomal DNA markers


*Corresponding author: Dr. H. Vishwanathan, Division of Genetics, Department of Environmental Sciences, Bharathiar University, Coimbatore 641 046, TN, India. Phone: +91-422-2422222 ext.395. Fax: +91-422-2422387. E-mail:


We describe the genetic structure and affinities of five Dravidian-speaking tribal populations inhabiting the Nilgiri hills of Tamil Nadu, in south India, using 24 autosomal DNA markers. Our goals were: (i) to examine what evolutionary forces have most significantly impacted south Indian tribal genetic variation, and (ii) to test whether the phenotypic similarities of some south Indian tribal groups to Africans represent a signature of close relationship to Africans or are due to convergence. All loci were polymorphic and average heterozygosities were substantial (range: 0.347–0.423). Genetic differentiation was high (Gst = 6.7%) and genetic distances were not significantly correlated with geographic distances. Genetic drift therefore probably played a significant role in shaping the patterns of genetic variation observed in southern Indian tribal populations. Moreover, analyses of population relationships showed that Indian populations are closely related to one another, regardless of phenotypic characteristics, and do not show particular affinities to Africans. We conclude that the phenotypic similarities of some Indian groups to Africans do not reflect a close relationship between these groups, but are better explained by convergence.


Contemporary ethnic populations of India are highly variable, both biologically and culturally (Majumder, 1998). The broadest division of Indian populations distinguishes tribal from non-tribal groups. The definition of tribe is somewhat ambiguous, but generally refers to the endogamous populations that are considered aboriginal, inhabiting the Indian subcontinent before the immigration of pastoral nomads from central Asia some 3,500 years ago (Cavalli-Sforza et al. 1994). The tribal groups constitute about 8% of the total Indian population and they “may represent relic populations of unknown origin but potentially of great genetic interest” (Cavalli-Sforza et al. 1994). The origins and migrational histories of the tribal populations of the Indian subcontinent are not clearly understood. It has been argued that Africa may have made some direct genetic contribution to India, since some tribal populations in southern India possess phenotypic similarities with Africans, the so-called “Negrito” physical characteristics (Maloney, 1974; Saha et al. 1974; Roychoudhury, 1982; Chandler, 1988; Majumder, 1998). It has also been suggested that at one time a “Negrito element” was widespread throughout India and was eventually forced into a more restricted location in south India (Majumder & Mukherjee, 1993).

Analyses of mitochondrial DNA (mtDNA) and Y-chromosomal variation in south India suggest that contemporary genetic variation may derive from the original Indian gene pool (Roychoudhury et al. 2001; Cordaux et al. 2003; Kivisild et al. 2003), although the time depth for the arrival of the first settlers of the Indian subcontinent remains a matter of debate (e.g. Cordaux & Stoneking, 2003; Endicott et al. 2003b). The presence of mtDNA haplogroup M in both east Africa and India has been interpreted as supporting an ancient east Africa-to-India migration (Quintana-Murci et al. 1999), although this conclusion has been questioned (Roychoudhury et al. 2001). Moreover, comparisons of south Indian tribal groups with other world populations, based on mtDNA data, do not reveal close relationships to Africans (Cordaux et al. 2003; Kivisild et al. 2003). This conclusion is also supported by Y-chromosome genetic markers (Kivisild et al. 2003; Cordaux et al. 2004). Therefore, it is possible that the African-like morphological features of some south Indian tribal groups result from convergence rather than shared ancestry with Africans.

However, mtDNA and the Y-chromosome each represent single haploid loci and are more prone to stochastic processes than are autosomal bi-parentally inherited markers. Here, we report the analysis of 24 autosomal markers (including 7 insertion-deletion polymorphisms and 17 restriction site polymorphisms or RSPs) in five indigenous tribal populations from the Nilgiri Hills of Tamil Nadu, in southern India. The aims of this study were: (i) with regard to the relationships of these populations, to test whether the “Negrito” features of some contemporary south Indian tribal groups represent a signature of close relationship to Africans or are due to convergence, and (ii) with regard to the genetic structure of these populations, to examine what evolutionary forces have most significantly shaped genetic variation in present-day south Indian tribes.

Materials and Methods

Population Samples and Autosomal Markers

Blood samples (5–10 ml by venipuncture) were drawn, with prior informed consent, from 250 unrelated adult volunteers from five endogamous Dravidian-speaking tribal populations of southern India. The tribal groups are confined to hilly tracts and valleys of the Nilgiri region (Figure 1), located 20–100 km from each other (Table 1), and consist of Irula (n = 50), Kurumba (n = 54), Badaga (n = 51), Kota (n = 45) and Toda (n = 50). Of these, the Irula and Kurumba possess “Negrito” morphological features (for a description, see Majumder, 1998). These groups are characterized by small census sizes, in particular Toda (census size: ∼1,300 individuals), Kota (∼2,000), Kurumba (∼5,000) and Irula (∼9,000), whereas Badaga constitute a larger group (∼150,000). Additional linguistic, historical, demographic and genetic information about these populations has been reported elsewhere (Thurston, 1909; Saha et al. 1976; Breeks, 1983; Singh, 1994; Roychoudhury et al. 2000).

Figure 1.

Map of India pinpointing the Nilgiri Hills of Tamil Nadu, in southern India.

Table 1.  Matrices of pairwise distances among five southern Indian tribal populations
  1. Above the diagonal are approximate geographic distances (km) separating populations. Below the diagonal are Nei's (1972) standard genetic distances based on 24 autosomal markers.

            Badaga   Irula    Kota     Kurumba  Toda
Badaga      -        90       20       90       45
Kota        0.043    0.041    -        100      70

High molecular weight DNA was isolated from the blood samples by the salting out procedure (Miller et al. 1988) and was suspended in 10 mM Tris and 0.1 mM EDTA for genotyping. All the polymorphic loci studied were genotyped by amplifying DNA in a standard 30-cycle three-step PCR. Appropriate annealing temperatures and additives were optimized for each system. PCR products of the RSPs were digested with 5 U of the appropriate restriction enzymes in the respective buffers for 2–4 hours at the appropriate temperatures. The names and Genome Database (GDB) accession numbers of the 9 RSP loci are: ESR1 (GDB: 185229), NAT (GDB: 187676), PSCR (GDB: 182305), T2 (GDB: 196856), LPL (GDB: 285016), ALB (GDB: 178648), CYP1A1 (GDB: 120604), HoxB4 (GDB: 120663) Msp1 and ADH2 (GDB: 119651) Rsa1. The chromosomal locations, primer sequences and PCR conditions for each RSP locus studied are provided at GDB and are listed on the Eccles Institute of Human Genetics website. The methodologies for the haplotype loci DRD2 (based on sites Taq1 ‘A’, Taq1 ‘B’ and Taq1 ‘D’), β-globin (based on sites HB7, HB8 and HB9) and ALAD (based on sites RSA1 and MSP1), and for the RSPs, are also described elsewhere (Kidd et al. 1998; Majumder et al. 1999a; Jorde et al. 1995; Mukherjee et al. 2000; Wetmur et al. 1991). The insertion-deletion polymorphisms include 5 Alu elements (AluAPO, AluACE, AluPLAT, AluPV92, AluFXIIIB), one deletion within the Alu element CD4 (CD4del) and one nuclear insertion of a mitochondrial DNA segment (mtNUC; Zischler et al. 1995). The protocols for these markers have been described elsewhere (Stoneking et al. 1997; Majumder et al. 1999b; Watkins et al. 2001). After PCR, the samples were subjected to electrophoresis at 125 V for 1 hour. Ethidium bromide-stained gels were visualized under UV light and documented.

Data Analysis

Allele frequencies were calculated by direct counting at each locus separately for each population. Heterozygosities at individual loci and the overall average heterozygosity were calculated using the estimated allele frequencies for each population. Hardy-Weinberg equilibrium was tested using a χ² goodness of fit test, with Bonferroni's correction for multiple comparisons. To assess the extent of gene differentiation among the studied groups, Nei's (1973) measure of gene diversity was calculated separately for each locus and for all loci considered jointly. A contingency χ² analysis (Workman & Niswander, 1970) was used to test for heterogeneity in the populations, which reflects differences in allele frequencies. Maximum likelihood estimates of the haplotype frequencies were calculated for the multisite marker typing data, using the program HAPLOFREQ (Majumder & Majumder, 2000). To assess genetic relationships among the populations, multidimensional scaling (MDS) analyses were performed by means of STATISTICA (Statsoft Inc., Tulsa, OK, USA), based on Nei's (1972) standard genetic distance calculated with PHYLIP version 3.5c (Felsenstein, 1993). In addition, dendrograms were constructed using the neighbor-joining (NJ) method, based on Nei's (1972) standard genetic distance.
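
The first three steps described above (gene counting, heterozygosity and the Hardy-Weinberg goodness of fit test) can be sketched for a single biallelic locus as follows. This is a minimal illustration, not the authors' software: the genotype counts are hypothetical, heterozygosity is taken as the simple 2pq (the study may have used a sample-size-corrected estimator), and the 0.05 family-wise level used for the Bonferroni threshold is an assumption.

```python
def allele_frequency(n_pp, n_pa, n_aa):
    """Frequency of the 'present' allele by direct gene counting."""
    n = n_pp + n_pa + n_aa
    return (2 * n_pp + n_pa) / (2 * n)


def expected_heterozygosity(p):
    """Expected heterozygosity at a biallelic locus: 1 - p^2 - q^2 = 2pq."""
    return 2 * p * (1 - p)


def hwe_chi_square(n_pp, n_pa, n_aa):
    """Chi-square goodness-of-fit statistic (1 d.f.) against Hardy-Weinberg proportions."""
    n = n_pp + n_pa + n_aa
    p = allele_frequency(n_pp, n_pa, n_aa)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_pp, n_pa, n_aa)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)


# Hypothetical genotype counts (present/present, present/absent, absent/absent)
counts = (20, 20, 10)
p = allele_frequency(*counts)
h = expected_heterozygosity(p)
chi2 = hwe_chi_square(*counts)

# Bonferroni-corrected per-test level for 117 tests (0.05 family-wise level assumed)
alpha_per_test = 0.05 / 117
print(f"p = {p:.3f}, h = {h:.3f}, chi-square = {chi2:.3f}, alpha = {alpha_per_test:.5f}")
```

The statistic would be compared with the χ² distribution on one degree of freedom at the corrected level; averaging h over loci gives the per-population average heterozygosity.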


Results

Genetic Diversity and Population Differentiation

All of the loci were polymorphic in all of the populations (Table 2), with the exception of AluAPO in the Toda and CD4del in the Kota and Kurumba. All the RSPs and haplotype loci studied were also polymorphic in all populations. None of 117 χ² tests for goodness of fit to Hardy-Weinberg equilibrium showed significant departure after Bonferroni correction for multiple tests was applied. The average heterozygosity for each locus was substantial, with several values approaching the theoretical maximum heterozygosity of 0.5 for a biallelic locus, with the notable exception of the CD4del locus, which consistently showed the lowest heterozygosity in all the populations (results not shown). Although high heterozygosity was expected because these loci were ascertained on the basis that they were known to be polymorphic, an analysis of the distribution of allele frequencies at loci ascertained in a similar manner indicates that ascertainment bias alone does not completely account for the observed frequency spectrum; the distributions also contain information on the demographic history of human populations (Sherry et al. 1997). Overall, the average heterozygosity for each population ranged from 0.347 (in Toda) to 0.423 (in Irula), the lowest values being found in the Toda and Kota (Table 2).

Table 2.  Allele frequencies for 24 autosomal markers and average heterozygosity (h) estimates in five southern Indian tribal populations
  1. The frequency indicated for each bi-allelic marker is that of the presence of the insert for insertion-deletion markers except CD4del; presence of the deletion for CD4del; presence of the restriction site for RSPs.

Locus            Badaga   Irula    Kota     Kurumba  Toda
CYP 1A           0.322    0.612    0.422    0.417    0.184
HOX B4           0.470    0.510    0.556    0.404    0.528
DRD2/Taq1 ‘A’    0.590    0.633    0.489    0.537    0.798
DRD2/Taq1 ‘B’    0.657    0.612    0.767    0.667    0.909
DRD2/Taq1 ‘D’    0.652    0.570    0.733    0.500    0.439
Average h        0.422    0.423    0.372    0.415    0.347

To determine the amount of genetic differentiation among populations, Gst values (a measure of the interpopulation variability) for each polymorphic locus were determined. The results are presented in Table 3, separately for each locus and also for all loci taken together. Except for CD4del, the total genomic diversity (HT) among the subpopulations was quite high. However, most of the genetic diversity is attributable to diversity between individuals within populations (HS). The percentage of the total genetic diversity attributable to differences between populations ranged from 1.1% for HOXB4 to 12.9% for AluAPO. When all loci are jointly considered, 6.7% of the total genetic diversity is attributable to variation between populations. Tests of significance for heterogeneity of populations also showed significant values for 85 out of 160 comparisons involving insertion/deletion loci and RSPs (data not shown). The three haplotype loci showed significant values in all the comparisons, with the exception of the Badaga/Toda and Irula/Kota for the ALAD locus. Thus, there are substantial differences among populations with respect to their allele frequencies.

Table 3.  Gene diversity analysis for individual loci and for all loci considered jointly

Locus       HT       HS       Gst
CYP 1A      0.476    0.437    0.082
HOX B4      0.499    0.494    0.011
All loci    0.466    0.435    0.067
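
As a quick check on how the differentiation values relate to the diversity components, Nei's (1973) coefficient can be recomputed as Gst = (HT − HS) / HT. The sketch below assumes that the three values in each Table 3 row are HT, HS and Gst, in that order; the function name is illustrative only.

```python
def gst(h_total, h_within):
    """Nei's (1973) coefficient of gene differentiation: (HT - HS) / HT."""
    if h_total <= 0:
        raise ValueError("HT must be positive")
    return (h_total - h_within) / h_total


# (HT, HS) pairs read from Table 3, assuming the column order HT, HS, Gst
table3 = {
    "CYP 1A": (0.476, 0.437),
    "HOX B4": (0.499, 0.494),
    "All loci": (0.466, 0.435),
}
for locus, (ht, hs) in table3.items():
    print(f"{locus}: Gst = {gst(ht, hs):.3f}")
```

Values recomputed from the rounded HT and HS entries can differ from the tabulated Gst in the third decimal place, because the published figures were presumably derived from unrounded diversities.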

Population Relationships

The high levels of differentiation observed among the groups studied could be explained by: (i) common ancestry followed by genetic drift that altered ancestral gene frequencies, or (ii) independent origins. To distinguish between these two hypotheses, we first analyzed the distribution of haplotypes at three haplotype loci in the five studied tribal groups. With respect to the DRD2 locus (Table 4a; Vishwanathan et al. 2003), four out of eight possible haplotypes (B2D2A2, B2D1A2, B2D2A1 and B1D2A1) were shared by all five populations and accounted for 78–96% of the haplotypes in each group. Except for B1D2A2 (which was restricted to the Irula), none of the haplotypes was specific to “Negrito” or “non-Negrito” groups. In the β-globin gene (Table 4b), the (−−+) haplotype was modal in all populations. In addition, except for the (+−+) haplotype (which was restricted to the Toda), none of the haplotypes was specific to “Negrito” or “non-Negrito” groups. Furthermore, the Hb-S allele occurs primarily on the (−++) haplotype background (Majumder et al. 1999a), which was present in all the study populations with a frequency ranging from 0.12 to 0.29. Most Hb-A alleles occur on the (+−−) haplotype background, which was not seen in any of the studied populations. Out of the four possible haplotypes at the ALAD locus (Table 4c), three haplotypes (+−), (−+), (−−) were shared by all the populations, whilst the (++) haplotype was absent or virtually absent in all the study groups. The haplotype (−−) was most prevalent in all groups except the Kurumba. Altogether, the extensive sharing of haplotypes by all five groups is more consistent with a scenario of common ancestry than one of independent origins.

Table 4.  Haplotype frequency estimates in five southern Indian tribal populations
  1. Haplotypes are listed as B1, D1 and A1 alleles for the site-absent state for the RFLP sites while B2, D2 and A2 are the site-present alleles for the DRD2 loci. A polymorphic restriction site is denoted as being present in a particular haplotype by a (+) symbol and absent by a (−) symbol for the β-globin and ALAD loci.

a. DRD2
b. β-globin 
c. ALAD 

To further distinguish between these two hypotheses, the relationships among the study populations were assessed by calculating Nei's (1972) standard genetic distances between populations, based on the 24 markers (Table 1). The distances ranged from 0.030 (between Irula and Kurumba) to 0.106 (between Kurumba and Toda). The Toda showed large genetic distances from all other groups (above 0.055), whereas all but one of the pairwise distances not involving the Toda were less than 0.045. A Mantel test indicated that geographic distances (Table 1) are not significantly correlated with the genetic distances among the five study groups (r² = 0.435; p = 0.104). This lack of significant correlation supports the idea of enhanced genetic drift in these small and isolated populations.
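
The two calculations in this paragraph can be sketched as follows, under stated assumptions. Nei's (1972) standard distance is computed here for biallelic loci from the frequency of one allele per locus, and the Mantel test is run as an exact one-sided permutation test that enumerates every relabeling of one matrix. The text does not specify how the Mantel test was implemented, so the exact enumeration is a stand-in; all frequencies and distances below are hypothetical and the function names are illustrative.

```python
from itertools import permutations
from math import log, sqrt


def nei_standard_distance(px, py):
    """Nei's (1972) standard distance from per-locus frequencies at biallelic loci."""
    jx = sum(p * p + (1 - p) ** 2 for p in px) / len(px)
    jy = sum(q * q + (1 - q) ** 2 for q in py) / len(py)
    jxy = sum(p * q + (1 - p) * (1 - q) for p, q in zip(px, py)) / len(px)
    return -log(jxy / sqrt(jx * jy))


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


def mantel_exact(a, b):
    """Exact one-sided Mantel test: correlate a with every relabeling of b."""
    n = len(a)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    xs = [a[i][j] for i, j in pairs]
    observed = pearson(xs, [b[i][j] for i, j in pairs])
    hits = total = 0
    for perm in permutations(range(n)):
        r = pearson(xs, [b[perm[i]][perm[j]] for i, j in pairs])
        if r >= observed - 1e-12:
            hits += 1
        total += 1
    return observed, hits / total


# Hypothetical allele frequencies (4 populations x 3 biallelic loci)
freqs = {
    "A": [0.30, 0.50, 0.60],
    "B": [0.35, 0.45, 0.65],
    "C": [0.60, 0.40, 0.50],
    "D": [0.20, 0.55, 0.80],
}
names = list(freqs)
genetic = [[nei_standard_distance(freqs[a], freqs[b]) for b in names] for a in names]

# Hypothetical geographic distances (km) between the same four populations
geographic = [
    [0, 20, 90, 45],
    [20, 0, 100, 70],
    [90, 100, 0, 60],
    [45, 70, 60, 0],
]
r, p_value = mantel_exact(genetic, geographic)
print(f"r = {r:.3f}, r^2 = {r * r:.3f}, p = {p_value:.3f}")
```

With five populations there are only 120 relabelings, so full enumeration is feasible; for larger matrices a random sample of permutations is the usual substitute.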

The Nilgiri Hills' tribal Indians were also compared to 11 other tribal and 10 non-tribal populations of India, using the data from seven insertion/deletion loci presented by Majumder et al. (1999b), Mukherjee et al. (2000) and Veerraju et al. (2001) that are also analyzed in the present study (i.e. mtNUC, AluPV92, AluFXIIIB, AluAPO, AluACE, AluPLAT and CD4del). The MDS plot of the 26 Indian populations is presented in Figure 2. Overall, the tribal groups tend to fall on the left-hand side of the plot, whereas the non-tribal groups tend to fall on the right-hand side. Regarding the study groups, the “Negrito” groups (i.e. Irula and Kurumba) cluster together with other Indian tribal groups, as do the Kota. By contrast, the Toda and Badaga are well separated from the other tribal populations (Figure 2): the Toda are isolated from all other groups whereas the Badaga show closer affinities to non-tribal groups. The latter observation could indicate common ancestry of the Badaga with non-tribal groups, or could represent an artifact generated by extensive genetic drift occurring in the Badaga. Notably, in an NJ tree, both the Toda and Badaga were characterized by long branches (not shown), consistent with genetic drift occurring in these populations.

Figure 2.

Multidimensional scaling plot depicting the genetic relationships of 26 Indian populations based on seven insertion/deletion loci (mtNUC, AluPV92, AluFXIIIB, AluAPO, AluACE, AluPLAT and CD4del). Filled circles represent Nilgiri Hills tribal populations; open circles represent other tribal groups; open squares represent non-tribal groups.
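
To illustrate how a distance matrix is turned into a two-dimensional plot of this kind, the sketch below implements classical (Torgerson) metric MDS using only the standard library. This is a swapped-in technique: the algorithm and settings used by STATISTICA are not stated in the text and may differ (for example, a non-metric variant). The power-iteration eigen-solver is a simple stand-in for a linear algebra library, it assumes the leading eigenvalues are positive and distinct, and the input points are hypothetical.

```python
from math import hypot, sqrt


def double_center(dist):
    """Return B = -1/2 * J D^2 J, the double-centered matrix of squared distances."""
    n = len(dist)
    sq = [[dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row = [sum(r) / n for r in sq]
    total = sum(row) / n
    return [[-0.5 * (sq[i][j] - row[i] - row[j] + total) for j in range(n)]
            for i in range(n)]


def matvec(m, v):
    """Matrix-vector product for a list-of-lists matrix."""
    return [sum(a * x for a, x in zip(row, v)) for row in m]


def classical_mds(dist, dims=2, iters=500):
    """Coordinates from the leading eigenpairs of B, found by power iteration."""
    n = len(dist)
    b = double_center(dist)
    coords = [[0.0] * dims for _ in range(n)]
    for d in range(dims):
        # Arbitrary non-constant start vector (constant vectors lie in B's null space)
        v = [1.0 + 0.1 * (d + 1) * i for i in range(n)]
        norm = 0.0
        for _ in range(iters):
            w = matvec(b, v)
            norm = sqrt(sum(x * x for x in w))
            if norm < 1e-12:
                break
            v = [x / norm for x in w]
        if norm < 1e-12:
            break  # no variation left to explain
        lam = sum(x * y for x, y in zip(v, matvec(b, v)))  # Rayleigh quotient
        if lam <= 0:
            break  # remaining structure is not Euclidean-embeddable
        for i in range(n):
            coords[i][d] = v[i] * sqrt(lam)
        # Deflate so the next pass finds the following eigenpair
        b = [[b[i][j] - lam * v[i] * v[j] for j in range(n)] for i in range(n)]
    return coords


# Hypothetical points; their Euclidean distances stand in for genetic distances
points = [(0, 0), (3, 0), (0, 4), (3, 4), (1, 2)]
dist = [[hypot(x1 - x2, y1 - y2) for (x2, y2) in points] for (x1, y1) in points]
coords = classical_mds(dist)
for name, (x, y) in zip("ABCDE", coords):
    print(f"{name}: ({x:.3f}, {y:.3f})")
```

The recovered configuration matches the input only up to rotation and reflection, which is why MDS axes carry no intrinsic meaning and only relative positions in the plot are interpreted.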

In order to compare the global relationships among populations, we used the available Alu insertion data from Stoneking et al. (1997) and Majumder et al. (1999b) that are common to our dataset (i.e. AluACE, AluFXIIIB, AluAPO, AluPV92, AluPLAT). The overall pattern of the MDS analysis (Figure 3) suggests that Indian populations in general occupy an intermediate position between west and east Eurasians, as found previously (Majumder et al. 1999b). The five Nilgiri Hills' populations do not cluster together, but fall close to other Indian groups. In particular, the Indian “Negrito” populations do not show any particular ties to African populations and are clearly more closely related to other Indian groups than to African groups.

Figure 3.

Multidimensional scaling plot depicting the genetic relationships of 54 world populations, using five Alu insertion loci (AluACE, AluFXIIIB, AluAPO, AluPV92, AluPLAT). A hypothetical ancestral population is shown, in which the frequency of the Alu element at each locus is set to 0.

Discussion


It is generally thought that Indian tribal populations are descendants of the original inhabitants of India. Their morphological, social and genetic features, maintained in highly endogamous groups, provide a unique opportunity for examining human evolution and population histories. However, most studies of Indian populations using DNA markers have included at most a few tribal groups. Likewise, analyses of Indian populations using a large number of highly varied and selectively neutral markers distributed throughout the genome have primarily focused on non-tribal groups (Bamshad et al. 2001, 2003; Watkins et al. 2001). Thus, the present investigation was conducted with the goal of analyzing the extent of genetic variation at a number of polymorphic autosomal loci from samples of diverse tribal populations from southern India, with a particular focus on the origins of particular groups that show phenotypic similarities to Africans (i.e. “Negrito” characteristics).

Populations with “Negrito” features have been reported in southern Asia, southeast Asia and island southeast Asia, leading to the suggestion that they might represent the signature of an ancient migration from Africa (Cavalli-Sforza et al. 1994). An alternative explanation for these characteristics is phenotypic convergence. If the former hypothesis is correct, one would expect the Indian “Negrito” groups to show closer genetic affinities to African groups than to other “non-Negrito” Indian groups. By contrast, the reverse would be expected under a scenario of convergence. The comparison of autosomal markers in “Negrito” and “non-Negrito” populations from southern India supports the latter hypothesis. Indeed, our results comparing Indian populations to worldwide populations using Alu insertion markers convincingly show that all Nilgiri Hills' tribal groups (regardless of their phenotypic characteristics) are more closely related to other Indian groups than they are to African groups (Figure 3). Interestingly, Endicott et al. (2003a) also suggested that the “Negrito” features of the Andaman Islanders (in southeast Asia) are due to convergence rather than to common ancestry with Africans. These results emphasize the limits of morphological classifications for inferring population relationships.

Therefore, our results suggest that southern Indian tribal populations share a common ancestry, although they are morphologically diverse. What are the genetic characteristics of these populations? And what evolutionary forces generated the observed patterns? Polymorphic Alu insertions showed high levels of polymorphism in the tribal populations of south India, as found previously for other groups from India (Majumder et al. 1999b; Mukherjee et al. 2000; Veerraju et al. 2001) and elsewhere (e.g. Batzer et al. 1994, 1996; Stoneking et al. 1997). Similarly, the RSPs and haplotype loci also showed high levels of polymorphism in the studied groups, in good agreement with earlier reports on global populations (Jorde et al. 1995; Kidd et al. 1998). In the present study, the estimated levels of average heterozygosities were consistently high in all the populations. The heterozygosity levels were similar to those of other Indian populations using nuclear DNA markers (Majumder et al. 1999b; Mukherjee et al. 2000; Veerraju et al. 2001). Interestingly, the average heterozygosity levels were higher than in other global populations studied (Europe and America), with the exception of African populations (Stoneking et al. 1997; Novick et al. 1998). Thus, the autosomal DNA markers attest that the study groups exhibit high levels of genetic diversity.

The extent of genetic differentiation based on 24 polymorphic markers for the five southern Indian tribal populations (Gst = 6.7%) is higher than that observed in other parts of India (Mukherjee et al. 2000; Veerraju et al. 2001), but smaller than continental-level estimates based on autosomal RSPs and microsatellites (Bowcock et al. 1991; Deka et al. 1995; Barbujani et al. 1997; Stoneking et al. 1997; Novick et al. 1998; Jorde et al. 2000). Watkins et al. (2001) reported a Gst value of 2.4% for 12 Indian populations using Alu insertion polymorphisms, which is about one third of the Gst value estimated in the present study. Thus, substantial interpopulation variability was observed in the study populations. Various analyses based on genetic distances among populations corroborated this finding. Furthermore, the latter analyses also suggested that the process of genetic differentiation of the south Indian tribal groups has been accentuated by genetic drift. The likely key role of genetic drift in shaping genetic variation in south India is further supported by dramatic allele frequency differences observed in the populations. In addition, some groups such as the Toda showed large genetic distances to their other tribal neighbors, as well as smaller average heterozygosity levels. Through the passage of time, drift must have played an important role in the genetic differentiation of these small and isolated populations (i.e. four out of the five study groups have census sizes less than 10,000).

Under drift conditions, one would expect not only high differentiation among populations, but also reduced diversity. However, we have presented evidence that, with respect to autosomal DNA polymorphisms, the south Indian populations show high levels of heterozygosity. This observation may be explained by population history, for example by a high inflow of genes into the study populations (resulting in high heterozygosities), but different populations having received alleles from different sources (resulting in high levels of genetic differentiation). However, there is no evidence from the anthropological literature to support the hypothesis that the different study populations have had an inflow of genes from different external sources, and our genetic analyses suggest a common ancestry for all Indian groups. Another possible scenario involves an early inflow of genes into a population, followed by a rapid expansion of this population (resulting in high heterozygosities), and subsequent splits of this population into largely isolated (endogamous) populations (resulting in high levels of genetic differentiation). A third possibility is that the high diversity suggested by autosomal markers may reflect a lack of sensitivity and/or ascertainment bias of these bi-allelic markers, since they were selected because they are known to be polymorphic in the Indian subcontinent. Mitochondrial DNA analyses in other south Indian tribal groups support this explanation, as both high differentiation between groups and low diversity within groups are observed (Cordaux et al. 2003).
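
The expectation of reduced diversity under drift can be made concrete with the standard idealized result that, in the absence of mutation and migration, expected heterozygosity decays as H_t = H_0 (1 − 1/(2N_e))^t. The sketch below is illustrative only: the effective sizes, the starting heterozygosity and the number of generations are hypothetical values, not estimates for the study populations.

```python
def heterozygosity_after_drift(h0, ne, generations):
    """Expected heterozygosity after t generations of pure drift (no mutation, no migration)."""
    if ne <= 0:
        raise ValueError("ne must be positive")
    return h0 * (1 - 1 / (2 * ne)) ** generations


h0 = 0.5           # hypothetical starting heterozygosity (maximum for a biallelic locus)
generations = 140  # hypothetical: about 3,500 years at an assumed 25 years per generation
for ne in (250, 1000, 5000):  # hypothetical effective population sizes
    h_t = heterozygosity_after_drift(h0, ne, generations)
    print(f"Ne = {ne}: expected h = {h_t:.3f}")
```

Under this model the expected loss depends on both the effective size and the time elapsed, which is the quantity the alternative scenarios discussed above would need to account for.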

Thus, both mtDNA and autosomal evidence point to a high genetic differentiation of south Indian tribal groups, consistent with the occurrence of drift in these populations. However, it remains to be determined whether the effects of genetic drift result from a recent and strong bottleneck (possibly as a consequence of the Indo-European migration to India some 3,500 years ago; Cordaux et al. 2003) or from the long-lasting maintenance of a small and constant population size. It is conceivable that both hypotheses are correct, since mtDNA sequence data are more likely to capture recent demographic events, whereas autosomal markers, because of their fourfold larger effective population size, are less sensitive to recent events and more likely to reflect older or long-term demographic events.

In conclusion, the present study suggests that the tribal groups of southern India share a common ancestry, regardless of phenotypic characteristics, and are more closely related to other Indian groups than to African groups. Based on 24 autosomal loci, they appear to show high levels of genetic diversity and genetic differentiation. Several lines of evidence suggest that genetic drift has been the major evolutionary force shaping genetic variation in these populations. This is an important feature of the tribal populations of south India and has to be taken into account in any attempt to reconstruct the history of these populations.


We are grateful to the original donors of samples. We thank the members of the laboratory of the Anthropology and Human Genetics Unit of the Indian Statistical Institute, Kolkata, for their help at various stages of this work and two anonymous reviewers for their fruitful comments on an earlier version of the manuscript. We are also grateful to the Department of Biotechnology, Government of India for financial support. DE was supported by CSIR, India. RC and MS were supported by funds from the Max Planck Society, Germany.