Mitochondrial DNA Diversity in Tribal and Caste Groups of Maharashtra (India) and its Implication on Their Genetic Origins


Address correspondence to: Mumtaz Baig, Department of Zoology, Government Vidarbha Institute of Science and Humanities, Amravati, MS, India Tel: +91 721 2574137 E-mail:


Genetic relationships among caste groups are not uniform across the geographical regions of India. Many anthropologists have speculated on the tribal origin of some caste groups in Maharashtra and other states of India. To test this hypothesis, we used neutral mtDNA markers to study genetic relatedness among tribal and caste groups from Maharashtra. Descriptive statistics such as nucleotide diversity, gene diversity and average mismatches were found to be of the same magnitude. Phylogenetic network analysis exhibited a star-like expansion that may date back to the peopling of Eurasia, ∼50,000 years ago. The reconstruction of mtDNA haplogroups showed that both the caste and tribal populations share similar branches of the tree. Also, the coalescence age estimation of caste and tribal populations suggests the persistence of maternal lineages with their root in the early late Pleistocene. Our mtDNA analyses show some preliminary and significant evidence for the origin of prehistoric tribal and hierarchical caste societies of Maharashtra.


India is known for the enormous cultural and genetic diversity of its people (Majumdar, 1998). Such diversity is sometimes attributed to the positioning of the Indian peninsula at the tri-junction of three continents, viz. Africa, Europe and Asia. The contemporary Indian population is stratified as tribal and non-tribal, i.e. the caste populations. The origin of the castes in India is an enigma (Majumdar, 2001), though many are known to have tribal origins (Karve, 1961; Kosambi, 1964).

The Maharashtra state of India forms a huge irregular triangle with its base on the west coast of India, overlooking the Arabian Sea (Figure 1). Historically, the state comprises three sub-regions: Western Maharashtra, Vidarbha and Marathwada. Vidarbha lies on the eastern side and thus forms part of the region broadly referred to as central India. Apart from tribal populations, many other ethnic communities, mainly Hindus, Muslims, Christians, Buddhists and Sikhs, inhabit the region. Vidarbha, or Eastern Maharashtra, has a hoary past and has been under the domination of many Hindu, Muslim and tribal Gond kingdoms. The Eastern Maharashtrian strip served as a bridge between Northern and Southern India.

Figure 1.

Map of India showing location of Maharashtra state (darkened).

It is assumed that the relationships between these various populations may define the present genetic landscape of India. Taking this assumption and geographical and ethnic diversity into account, we used neutral mitochondrial DNA (mtDNA) markers as a pilot study to elucidate the distribution of mtDNA haplogroups shared by the diverse ethnic groups (Mountain et al. 1995; Bamshad et al. 1996; Sirajuddin et al. 1994; Das et al. 1996).

Using this approach, we found genetic evidence consistent with the hypothesis of a tribal origin for some caste groups in the Maharashtra State of India.

Material and Methods


Five ml of blood was collected (by venipuncture) from 74 healthy male individuals belonging to 7 ethnic communities residing in Eastern Maharashtra. Before collection of samples, informed consent was taken and at the same time, un-relatedness of all the individuals was ascertained by interview. The details of sample sizes, and ethnological information regarding population sampled were:

Nav-Baudh (caste) (NB): The Nav Baudhs are widespread throughout Maharashtra. They adopted Buddhism some 50 years ago and therefore are known as Nav Baudh (New Buddhist). They speak the Marathi language, a branch of the Indo-European linguistic family. Many were traditionally menial labourers, but are now engaged in white-collar occupations too. We sampled 10 unrelated individuals from this group.

Maratha (caste) (MT): The Maratha belong to a Hindu caste and fall close to the upper caste Brahmins from the same geographical location (Cavalli-Sforza et al. 1994). They claim to have been warriors and landlords, and primarily practiced agriculture. They too speak Marathi and many are now engaged in white-collar occupations. We sampled 10 Maratha from the region.

Bohra (caste) (BR): The Bohras belong to the Shia sect of Islam and are traditionally businessmen. They primarily speak Gujarati, a language belonging to the Indo-European linguistic family. They are mainly confined to urban areas. We took 10 unrelated Bohra samples from Eastern Maharashtra.

Irani (Caste) (IR): The Irani are confined to a few cities in Maharashtra as a migrant population. They also belong to the Shia sect of Islam and speak an Irani (Persian) dialect. They are mainly engaged in small businesses. Eight samples were collected from this caste.

Korku (Tribe) (KR): The Korkus speak a language belonging to the Austro-Asiatic linguistic group and are believed to be similar to the Santhal tribe of North India. British rulers brought this tribe to the region as bonded labourers. They are now mainly menial labourers and some are settled agriculturists. The sample size of this group was 12.

Madia Gond (Tribe) (MG): The Gonds predominantly inhabit the extreme eastern region of Maharashtra and much of Central India. They speak a Gondi dialect belonging to the Dravidian linguistic family, which is the second most commonly spoken linguistic family in India after Indo-European. This tribe is classified among the primitive tribes and is mainly engaged in hunting and food gathering; a few have settled as agriculturists. We sampled 15 members of this tribe.

Kolam (Tribe) (KL): Besides inhabiting the adjoining states, a substantial number of these people inhabit a few districts of Vidarbha. They speak the Gondi dialect which belongs to the Dravidian linguistic group. We sampled 8 male individuals from this group.

Sequencing and Restriction Site Polymorphisms (RSPs) Study

DNA from each blood sample was isolated from polymorphonuclear leucocytes by proteinase K incubation followed by phenol-chloroform extraction and ethanol precipitation. HVS-I of the control region was amplified as described elsewhere (Richards et al. 1996) and was sequenced using the ABI Prism Big-Dye Terminator cycle-sequencing protocols (Applied Biosystems, USA). The fluorescent-labeled products were analyzed on an Applied Biosystems Model 377 DNA sequencer (Perkin-Elmer). The sequences were submitted to GenBank (accession numbers AY208753-AY208782, AY426267-AY426294 and AY524669-AY524684). In parallel, we screened some representative samples for the two most common mtDNA restriction site polymorphisms so as to assign haplogroup status to the samples. These sites define the two most common haplogroups found in India: M, defined by the +Alu I site gain at np 10397, followed by U, defined by the Hinf I site gain at np 12308. The mtDNA RSP analyses were performed using standard primers and protocols as described by Torroni et al. (1996).
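The two-site screen described above can be pictured as a simple decision rule. The following Python sketch is only an illustration under stated assumptions: the function name `assign_major_haplogroup`, the boolean encoding of the restriction results and the fallback label "unassigned" are all hypothetical and are not taken from the study's protocol.

```python
from typing import Optional


def assign_major_haplogroup(alu_10397_gain: Optional[bool],
                            hinf_12308_gain: Optional[bool]) -> str:
    """Hypothetical helper: map two restriction-site results to 'M', 'U' or 'unassigned'.

    True means the site gain was observed, False means it was absent,
    and None means the sample was not screened for that site.
    """
    # Haplogroup M is defined by the Alu I site gain at np 10397.
    if alu_10397_gain:
        return "M"
    # Haplogroup U is defined by the Hinf I site gain at np 12308.
    if hinf_12308_gain:
        return "U"
    # Neither diagnostic site gain was observed (or the sample was not typed);
    # in that case the HVS-I motif would be needed to place the sample.
    return "unassigned"


if __name__ == "__main__":
    print(assign_major_haplogroup(True, None))    # M
    print(assign_major_haplogroup(False, True))   # U
    print(assign_major_haplogroup(None, None))    # unassigned
```

Samples that fall through to the last branch correspond to those whose haplogroup has to be inferred from the HVS-I motif alone.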

Data Analysis

Using Clustal X (Jeanmougin et al. 1998) software, DNA sequences (HVS-I positions 16024-16383) were aligned with the revised Cambridge Reference Sequence (rCRS) (Andrews et al. 1999). Descriptive diversity indices such as gene diversity (Nei, 1987) and nucleotide diversity (Nei & Tajima, 1981) were calculated using the DnaSP version 3.5 package (Rozas & Rozas, 1999). Using the same package, Fu's FS (Fu, 1997) test for selective neutrality was also conducted.
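For readers unfamiliar with the two diversity indices, the sketch below shows how they are commonly computed from a set of aligned sequences. It is not the DnaSP implementation: it assumes equal-length sequences without gaps or ambiguous bases, and the four toy sequences are invented for illustration only.

```python
from collections import Counter
from itertools import combinations


def gene_diversity(seqs):
    """Gene (haplotype) diversity: n/(n-1) * (1 - sum of squared haplotype frequencies)."""
    n = len(seqs)
    counts = Counter(seqs)
    sum_sq = sum((c / n) ** 2 for c in counts.values())
    return n / (n - 1) * (1.0 - sum_sq)


def nucleotide_diversity(seqs):
    """Nucleotide diversity: mean pairwise differences per site.

    Assumes all sequences are aligned and of equal length (no gap handling).
    """
    length = len(seqs[0])
    pairs = list(combinations(seqs, 2))
    total_diffs = sum(
        sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs
    )
    return total_diffs / len(pairs) / length


if __name__ == "__main__":
    # Toy alignment (hypothetical data, not from the study).
    toy = ["ACGT", "ACGT", "ACGA", "TCGA"]
    print(round(gene_diversity(toy), 4))
    print(round(nucleotide_diversity(toy), 4))
```

Both functions operate on the whole sample, so caste, tribal and pooled values would be obtained by calling them on the corresponding subsets of sequences.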

To illustrate phylogenetic relationships, a reduced median network was constructed using the program NETWORK (Bandelt et al. 1995). The expansion times of the populations were estimated as τ = 2μt (Rogers et al. 1992), where t is the time in years. For the HVS-I region, the divergence rate was taken as μ = 5.4 × 10⁻⁵ per year (Ward et al. 1991). The haplogroup ages were estimated as described earlier (Forster et al. 1996) using the estimator ρ, which is the average transitional distance from the founder haplotype.
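As a worked illustration of the expansion-time relation, the sketch below solves τ = 2μt for t with the divergence rate quoted above. The function name `expansion_time` is an assumption made for this example; the τ values are the three reported in Table 1, and the results are truncated to whole years for display.

```python
MU = 5.4e-5  # divergence rate per year for HVS-I, as quoted in the text


def expansion_time(tau, mu=MU):
    """Solve tau = 2 * mu * t for t, the expansion time in years."""
    return tau / (2.0 * mu)


if __name__ == "__main__":
    # Tau values as listed in Table 1.
    for tau in (4.979, 5.916, 5.498):
        print(tau, int(expansion_time(tau)))
```

With these inputs the truncated results are 46101, 54777 and 50907 years, matching the expansion times listed in Table 1.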

Results and Discussion

We divided our 74 sequences into caste and tribal populations comprising 38 caste and 36 tribal sequences. The gene diversity calculated for the tribal and caste populations was nearly the same, whereas the tribal group showed higher nucleotide diversity than the castes (Table 1). We also computed Fu's FS statistic, which is particularly sensitive to population growth. Significantly large negative values indicate population expansion (Fu, 1997), and this signal was stronger in the tribal group in our collection (Table 1). The signature motif linked to this diversity and the earlier expansion was explored in detail in the subsequent coalescence age estimations.

Table 1.  Diversity indices and demographic parameters estimated for tribal and caste populations belonging to Maharashtra
Parameter                                       Caste      Tribe      Total
Nucleotide diversity                            0.015      0.018      0.017
Gene diversity                                  0.987      0.995      0.993
Fu's Fs                                         −24.748    −32.953    −74.013
Tau (τ)                                         4.979      5.916      5.498
Expansion time in years before present (ybp)    46101      54777      50907

Mitochondrial DNA Haplogroups in the Study Population

After combining the HVS-I motif and RSP information, haplogroup status was assigned to the samples and is reported in Table 2. Haplogroups M and U formed the major portion (63.5% and 14.8%, respectively) of the mtDNA variation found in Maharashtra (Table 3). About 4% of Maharashtra mtDNAs belonged to other haplogroups such as T1 and W, which are characteristic of Eurasia, while ∼18% did not belong to any of the continental clusters described previously. Frequency comparison of the M and U haplogroups with previously studied North Indians (Kivisild et al. 1999a) and South Indians (Edwin et al. 2002; Roychoudhury et al. 2000) suggests a distinct gradient in their distribution from North to South (Table 3). In addition, the persistence of high frequencies of M and U in the present-day caste and tribal groups suggests that both trace their maternal origin to the early expansion of very few female founders.

Table 2.  Description of polymorphisms observed in the mtDNA HVS-I region and RFLP status of the 74 Maharashtrian caste and tribal populations along with their common haplogroup designation. Those in italics are samples inferred by HVS-I motif and not by RFLP.
Sample Name  HVS-I Sequence polymorphism       RFLP Status  Haplogroup
KR1          166del-223                        +Alu I       M
KR3          223-292                           Alu I        W
KR5          189-223-300                                    M
KR6          183C-189-223-320                               M
KR7          223-231-266-291-319-362           +Alu I       M6
KR8          71                                             R
KR9          129-223-311-344                                M4
KR10         183C-189-223                                   M
KR11         111-154-178-298-311               Hinf I       R
KR12         129-223-274-291-362               +Alu I       M4
KR13         129-223-261-274-291-362           +Alu I       M4
MG1          126-163-186-189-294               Hinf I       T1
MG2          223-284-311-366del                +Alu I       M3
MG3          93-145-189-223-290-312                         M
MG4          223-318T                          +Alu I       M
MG5          93-223-243-270-319-352                         M2a
MG6          129-266-318-320-362               Hinf I       R
MG7          189-223-240-368                                M
MG8          223-274-301-319                                M2
MG9          129-223-264-265C-319-365                       M4
MG10         223-270-274-319-352                            M2a
MG11         51-206C-230-311                   +Hinf I      U2
MG12         278                               Hinf I       R
MG13         169-172-185-223-278               +Alu I       M
MG14         172-223-311                       +Alu I       M
KL1          223-304                                        M
KL2          51-126-223-278                                 M
KL3          51-206C-230-311                   +Hinf I      U2
KL4          183C-189-223-311G                              M
KL5          93-179-227-245-266-278-362        −Hinf I      R
KL6          86-108-129-223-278                             M
KL7          183C-189-327-330                               R
KL8          51-206C-230-246T-311                           U2
MT1          183C-189-223-301                               M
MT2          145-176-209-223-311                            M
MT3          145-176-223-261-311               +Alu I       M
MT4          188-223-311                       +Alu I       M
MT5          179-260-261-319-362               Hinf I       R
MT6          179-223-294                       +Alu I       M
MT7          188-223-231                                    M5
MT8          188A-223-270-274-291-319-352                   M2
MT9          223-256                                        M
MT10         223-256-293-327A                               M
NB1          129-223-304-362-366del                         M4
NB2          223-270-274-319-352                            M2a
NB3          51-154-206C-230-311               +Hinf I      U2
NB4          166-223-300-311                   +Alu I       M
NB5          129-144A-223-309-362                           M4
NB6          169+C-189-223-256-274-319-320                  M2b
NB7          223-304                                        M
NB8          129-230-343                       +Hinf I      U3
NB9          69-274-318T                       +Hinf I      U7
NB10         223-231-311-356-362                            M5
DB1          223-270-319-352                                M2a
DB2          179-223-294                                    M
DB4          69-274-318T                       +Hinf I      U7
DB5          129-230-343                                    U3
DB6          223-256                                        M
DB7          309-318T                          +Hinf I      U7
DB8          309-318T                                       U7
DB9          126                                            R
DB10         223-256                                        M
IR1          223-318T                                       M18
IR2          192-362                           Hinf I       R
IR3          286                               Hinf I       R
IR5          93-126-163-186-189-292-294-302    Hinf I       T1
IR6          223-318T                                       M
IR7          147G-172-223-248-295-355                       N1a
IR8          93-166del-183C-189-223                         Ma
  1. Nucleotide positions (minus 16,000) at which mutations were noted relative to the revised Cambridge Reference Sequence. Letters are given only for transversions and specify the nucleotide change.
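The shorthand used for the HVS-I motifs in Table 2 (positions given minus 16,000, with a letter suffix only for transversions) can be expanded mechanically. The sketch below is one possible reading of that notation, not a tool used in the study; the function name `parse_motif` and the labels it returns are assumptions made for illustration.

```python
import re

# One token: digits, optionally followed by 'del', '+<base>' or a single base letter.
_TOKEN = re.compile(r"(\d+)\s*(del|\+[ACGT]|[ACGT])?")


def parse_motif(motif, offset=16000):
    """Expand a motif such as '183C-189-223' into (full position, change) pairs.

    A bare number is read as a transition, a trailing base letter as a
    transversion to that base, 'del' as a deletion and '+X' as an insertion.
    """
    variants = []
    for token in motif.split("-"):
        match = _TOKEN.fullmatch(token.strip())
        if match is None:
            raise ValueError(f"unrecognised motif token: {token!r}")
        position = offset + int(match.group(1))
        suffix = match.group(2)
        if suffix is None:
            change = "transition"
        elif suffix == "del":
            change = "deletion"
        elif suffix.startswith("+"):
            change = "insertion of " + suffix[1]
        else:
            change = "transversion to " + suffix
        variants.append((position, change))
    return variants


if __name__ == "__main__":
    print(parse_motif("183C-189-223"))
    print(parse_motif("169+C-189-223"))
```

For example, the motif 183C-189-223 expands to a transversion to C at position 16183 and transitions at positions 16189 and 16223.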
Table 3.  Frequencies (%) of haplogroups calculated from 74 caste and tribal populations
Haplogroup frequency (%)
Population       M       U
Maharashtra      63.5    14.8
North India a    44      13.3
South India b    68      10.8
  a Deduced from sequence data of populations from Kashmir, Punjab and Uttar Pradesh in North India studied by Kivisild et al. (1999a).
  b Deduced from data of tribal and caste groups from the south of India studied by Roychoudhury et al. (2000).

Phylogenetic Analysis Using Reduced Median Network

The reduced median network obtained from all 74 samples (HVS-I sequences and coding information) exhibited a star-like expansion that is clearly evident from the majority of the branches radiating out from the centre (see Figure 2). Furthermore, this expansion signal shows two older nodes that may date back to the beginning of the settlement of Eurasia ∼50,000 years ago.

Figure 2.

Reduced median joining network constructed from 74 pooled Maharashtrian sequences. The size of each node is proportional to the haplotype frequency.

Structure of Haplogroup M in Maharashtra

Apart from phylogenetic approaches, careful dissection of Asian mtDNA lineages in recent years indicates that Asia's phylogeny is different from that of Europe and Africa (Wallace, 1995). Haplogroup M is a dominant mtDNA cluster among the populations of mainland Asia, as well as among Native Americans (Ballinger et al. 1992; Torroni et al. 1994). Haplogroup M has been sub-divided into discrete sub-clusters due to the accumulation of further synapomorphic mutations. These sub-clusters are defined by certain RFLP and HVS-I sequence polymorphisms (Torroni et al. 1993). In our assay of 74 samples, we found sub-clusters of M, viz. M2, M2a, M2b, M4, M5 and M6, previously defined by Kivisild et al. (1999a) and Bamshad et al. (2001) in other Indian populations. The structure of haplogroup M in Maharashtra, and its comparison with other Indian populations, confirms the autochthonous development of this haplogroup.

Structure of Haplogroup U in Maharashtra

After haplogroup H, U is the second most common haplogroup in Europe. Only haplogroup U in the Maharashtrian population exhibited a frequency comparable to European populations. Interestingly, in our assayed samples, caste populations exhibit more occurrences of U than tribal populations. We found the U2, U3 and U7 sub-clusters of U, as reported earlier by Kivisild et al. (1999b) in Indian populations. The haplogroup U phylogeny in Maharashtra is dominated by U2 and U7, which apparently show autochthonous development away from the European-specific haplogroup U sub-clusters. Thus, the possibility that the Indian haplogroup U variations could be derived from a non-European gene pool needs to be considered.

Coalescence Age Estimation of Haplogroup M and U in Maharashtra

We estimated the time to the most recent common ancestor of a particular haplogroup using the estimator ρ (Forster et al. 1996). Only transitions between nucleotide positions 16,090–16,365 in the HVS-I of mtDNA were considered; ρ, the average distance from a specified founder, was converted to time at one transition per 20,180 years. Based on this convention, the time depths for the M and U haplogroups were calculated and are reported in Table 4. The mean coalescence age of the Maharashtrian M cluster was found to be ∼45,000 ± 641 years. This age is younger than the date of 65,000 years proposed earlier by Mountain et al. (1995) for an expansion starting from South Asia. This is also evident from our τ = 2μt approach. Similarly, the mean coalescence age of U was ∼25,600 ± 1,624 years (Table 4). This reconfirms that India has witnessed two major expansion phases that have influenced the wide assortment of the Maharashtrian and other Indian lineages. The more recent phase, which according to our estimation started ∼25,600 years ago, is well reflected in the coalescence age of U. This period seems to correspond to the transition from the Middle to the Upper Paleolithic. The first expansion phase may reflect a demographic burst immediately after the initial peopling of India ∼45,000 years ago.
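The conversion from ρ to a coalescence age is a single multiplication, which the sketch below makes explicit. The function names `rho` and `coalescence_age` are assumptions made for this example, and the per-lineage transition counts in the usage section are invented; the ρ values 2.234 and 1.27 are those listed for M and U in Table 4.

```python
YEARS_PER_TRANSITION = 20180  # calibration quoted in the text


def rho(transition_counts):
    """Average number of transitions separating each sampled lineage from the founder."""
    return sum(transition_counts) / len(transition_counts)


def coalescence_age(rho_value, years_per_transition=YEARS_PER_TRANSITION):
    """Convert rho into an age estimate in years."""
    return rho_value * years_per_transition


if __name__ == "__main__":
    # Hypothetical distances from a founder for five lineages.
    print(rho([1, 2, 3, 2, 2]))
    # Rho values as listed in Table 4 for haplogroups M and U.
    print(int(coalescence_age(2.234)))
    print(int(coalescence_age(1.27)))
```

Truncated to whole years, the two Table 4 inputs give 45082 and 25628 years, in agreement with the ages reported for M and U.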

Table 4.  The coalescence age estimation of haplogroups M and U in the Maharashtrian population
Haplogroup    n     ρ        Time (t) in years    ±
M total       47    2.234    45082                641
U total       14    1.27     25628                1624
R total       29    2.23     45001                1039


In summary, the M and U clusters in both tribal and caste populations of Maharashtra show frequencies comparable to those of other Indian populations. The phylogenetic network analysis revealed a distinct star-like expansion. Moreover, this expansion is also supported by the ∼50,000 year coalescence age estimation for tribal and caste groups. Thus it is likely that both these socio-culturally different groups have their maternal roots in the early late Pleistocene. Taking this into account, the possibility that some prehistoric tribal groups crystallized into present-day caste societies cannot be ruled out (Karve, 1961). Interestingly, recent studies by Roy et al. (2003) on Western Maharashtrian populations have reported a high occurrence of haplogroups M and C in upper caste Brahmins, as compared to the Nav-Baudh and Maratha. In the future, probing with Y-chromosome linked and autosomal markers may allow vital insights into the genetic origin and relationship of caste and tribal populations in India.


We thank Toomas Kivisild and Peter Forster for critical advice on the manuscript contents and the network analysis respectively.