A comprehensive review of HVS‐I mitochondrial DNA variation of 19 Iranian populations

Iran is located along the Central Asian corridor, a natural artery that has served as a cross‐continental route since the first anatomically modern human populations migrated out of Africa. We compiled and reanalyzed the HVS‐I (hypervariable segment‐I) of 3840 mitochondrial DNA (mtDNA) sequences from 19 Iranian populations and from 26 groups from adjacent countries to give a comprehensive review of the maternal genetic variation and investigate the impact of historical events and cultural factors on the maternal genetic structure of modern Iranians. We conclude that Iranians have a high level of genetic diversity. Thirty‐six haplogroups were observed in Iran's populations, and most of them belong to widespread West‐Eurasian haplogroups, such as H, HV, J, N, T, and U. In contrast, the predominant haplogroups observed in most of the adjacent countries studied here are H, M, D, R, U, and C haplogroups. Using principal component analysis, clustering, and genetic distance‐based calculations, we estimated moderate genetic relationships between Iranian and other Eurasian groups. Further, analyses of molecular variance and comparing geographic and genetic structures indicate that mtDNA HVS‐I sequence diversity does not exhibit any sharp geographic structure in the country. Barring a few from some culturally distinct and naturally separated minorities, most Iranian populations have a homogenous maternal genetic structure.

Within the mtDNA, the D-loop region houses the hypervariable segment (HVS), which exhibits a mutation rate significantly higher than other segments of the mitogenome. Notably, the HVS-I within this region, containing critical haplogroup-defining loci, has been extensively researched (Van Oven & Kayser, 2009). Our study aims to provide an exhaustive review of the extant data concerning mtDNA variation among the populations residing in the Iranian Plateau, a region encompassed within Central Asia. Geographically, Central Asia runs from the Caspian Sea and extends eastward to China and Mongolia, demarcated by Afghanistan and Iran in the south and bordered by Russia in the north. For the purpose of this study, nations such as Kazakhstan, Kyrgyzstan, Tajikistan, Turkmenistan, and Uzbekistan are also integrated into our definition of Central Asia.
From the early dispersals of anatomically modern humans from Africa, the Iranian territory has occupied a strategic position within the Central Asian corridor. This corridor, serving as a tri-continental juncture, has historically facilitated human migrations (Vallini et al., 2022). The Iranian Plateau (Figure 1) is a prominent geographical and geological expanse within Central Eurasia, and its internal features are distinctly demarcated by surrounding mountain arcs (Bell, 1832).
The investigation of genetic variations in Iranian populations, particularly from a matrilineal perspective, has hitherto remained somewhat under-explored. Although the HVS-I region of the mtDNA has been examined (Farjadian et al., 2011), there exist two foundational studies that focus on the entire mtDNA genomes of the Iranian population (Bahmanimehr et al., 2015; Derenko et al., 2013). However, these seminal works did not encompass all the Iranian samples previously scrutinized. The present research endeavors to discern the relationships between HVS-I mtDNA gene pools within the modern Iranian genetic framework and those of neighboring Eurasian nations. This is achieved by an integrative reanalysis of extant data on contemporary Iranian cohorts. To facilitate interpopulation comparisons, HVS-I nucleotide sequences from regional clusters of proximate territories were incorporated. The study seeks to address the following questions: (1) To what extent do genetic similarities pervade among Iranian populations across diverse geographical regions? (2) How have the mtDNA gene pools of neighboring populations influenced the Iranian genetic spectrum in the context of transboundary migration within the Iranian Plateau? (3) Are Iranian minor ethnic groups genetically distinct from the rest of the country's population due to specific cultural practices such as endogamy?

Study populations
HVS-I nucleotide sequences from a collective sample of 3840 individuals were amassed, representing 19 Iranian populations (n = 1498) and 26 populations from neighboring countries (n = 2342) (Figure 3A,B). These sequences were either retrieved from the GenBank database or procured directly from the authors of pertinent studies (Table 1). In this analysis, the populations of three major cities, namely, Tehran, Isfahan, and Shiraz, were treated as distinct entities. This delineation was necessitated by the absence of detailed ethnic categorizations in available datasets for these urban centers and by their status as some of Iran's most populous cities, encompassing a diverse spectrum of ethnic affiliations. Although this exhaustive review primarily centered on the extensively studied mtDNA HVS-I, certain Eurasian populations currently lacking published HVS-I mtDNA data could not be integrated into this evaluation.
The sequences were aligned to the revised Cambridge Reference Sequence (rCRS, Anderson et al., 1981). Subsequent to this, a consistent nucleotide range was established for all datasets, designating the HVS-I region (16,046-16,381 bp). This region was then trimmed using a script written in Python, version 3.10. Notably, all variations in the polycytosine tract within the range 16,184-16,193 bp were excised. Mitochondrial haplogroup determinations were performed by HaploGrep 2 (Weissensteiner, Pacher et al., 2016), which uses Phylotree mtDNA tree Build 17 (Van Oven, 2015; Weissensteiner, Forer et al., 2016).
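The trimming step described above can be illustrated with a minimal sketch. It is not the script used in this study: the function name `trim_hvs1` and the parameter `first_pos` are hypothetical, and the sketch assumes each sequence is gap-free relative to the rCRS, so that a string index maps directly to an rCRS position. Both endpoints are treated as inclusive, which is also an assumption (the inclusive range spans 336 positions, whereas the text reports 335 bp).

```python
# Illustrative sketch only; function and parameter names are hypothetical.
# Assumption: `seq` has no indels relative to the rCRS, and its first base
# sits at rCRS position `first_pos`.

HVS1_START = 16046   # first rCRS position kept
HVS1_END = 16381     # last rCRS position kept (treated as inclusive)
POLYC_START = 16184  # first position of the excluded polycytosine tract
POLYC_END = 16193    # last position of the excluded tract (inclusive)


def trim_hvs1(seq: str, first_pos: int, drop_polyc: bool = True) -> str:
    """Return the bases of `seq` that fall inside the HVS-I range."""
    last_pos = first_pos + len(seq) - 1
    if first_pos > HVS1_START or last_pos < HVS1_END:
        raise ValueError("sequence does not cover the full HVS-I range")
    kept = []
    for pos in range(HVS1_START, HVS1_END + 1):
        if drop_polyc and POLYC_START <= pos <= POLYC_END:
            continue  # skip the polycytosine tract
        kept.append(seq[pos - first_pos])
    return "".join(kept).upper()
```

Sequences that do not span the whole range raise an error rather than being padded, so that every retained sequence covers the same positions.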
To categorize aligned FASTA files by individual populations, facilitating the generation of the pertinent Arlequin file (.arp), the FaBo online toolbox was employed, specifically leveraging the "DNA to haplotype collapse and converter" function (Villesen, 2007).

Statistical analysis
The number of polymorphic (segregating) sites (S), haplotype (gene) diversity (Hd), nucleotide diversity (pi), number of haplotypes (H), and the mean number of differences between all pairs of haplotypes (π), as fundamental indices for the molecular diversity comparisons, were calculated in Arlequin v3.5.2.2 from the prepared .arp file for each population. Based on HVS-I sequences (np 16,046-16,381) from the 45 populations, FST values were calculated in Arlequin (Excoffier & Lischer, 2010). The Tamura and Nei substitution model (Tamura & Nei, 1993) was used, with a gamma value of 0.325 inferred from the software FindModel based on PAML likelihoods. Ten thousand permutations were used to test the significance of FST values (Posada & Crandall, 1998).
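For orientation, the diversity indices listed above can be sketched in a few lines of Python. This is an illustrative stand-in for the Arlequin calculations, not a reimplementation of them: it counts raw pairwise differences without the Tamura and Nei correction, uses Nei's unbiased estimator for haplotype diversity, and the function name `diversity_indices` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def diversity_indices(seqs):
    """Return (S, H, Hd, mean pairwise differences, per-site nucleotide diversity).

    Assumption: `seqs` is a list of aligned, equal-length sequences,
    one per sampled individual.
    """
    n = len(seqs)
    if n < 2 or any(len(s) != len(seqs[0]) for s in seqs):
        raise ValueError("need at least two aligned sequences of equal length")
    length = len(seqs[0])
    # Segregating sites: alignment columns showing more than one state.
    s_sites = sum(1 for column in zip(*seqs) if len(set(column)) > 1)
    # Haplotypes: distinct sequences and their counts.
    counts = Counter(seqs)
    h = len(counts)
    # Nei's unbiased haplotype diversity: n/(n-1) * (1 - sum of squared frequencies).
    hd = n / (n - 1) * (1 - sum((c / n) ** 2 for c in counts.values()))
    # Mean number of raw differences over all pairs of sampled sequences.
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(x, y)) for x, y in pairs)
    mean_diff = total / len(pairs)
    return s_sites, h, hd, mean_diff, mean_diff / length
```

Under this estimator, a sample in which every individual carries a different haplotype yields Hd = 1.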
First, to conduct a principal component analysis (PCA), the mtDNA haplogroups of all aligned HVS-I FASTA files were checked with HaploGrep 2 v2.4.0 against PhyloTree Build 17 (Van Oven, 2015). These haplogroups were assigned to 36 primary groups: A, B, C, D, F, G, H, HV, I, J, J1, J2, JT, K, L, M, N, P, R*, R0, R2, T, T1, T2, U*, U1, U2, U3, U4, U5, U6, U7, W, X, Y, and Z (haplogroup frequency tables are available in Supplementary Table, sheets 1 and 2). Then, the relative frequencies of all haplogroups for all 45 populations were calculated. The PCA scores were calculated from the relative frequencies via the prcomp function, and for categorical PCA, in R v4.1.3.
FIGURE 2 Iranian ethnic population structure with language information. NA, not applicable; Pop, proportion of the population compared to the whole Iranian population. The language family/language family sublevel/language (dialect) attributions are in green.
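The relative-frequency table that feeds the PCA can be built as in the following sketch. The input format (one population/haplogroup pair per individual) and the function name `frequency_matrix` are assumptions made for illustration; the PCA itself was run in R and is not reproduced here.

```python
from collections import Counter


def frequency_matrix(calls):
    """Build a population-by-haplogroup table of relative frequencies.

    `calls` is an iterable of (population, haplogroup) pairs, one per
    individual. Returns (populations, haplogroups, rows), where
    rows[i][j] is the relative frequency of haplogroups[j] in populations[i].
    """
    per_pop = {}
    for pop, hg in calls:
        per_pop.setdefault(pop, Counter())[hg] += 1
    populations = sorted(per_pop)
    haplogroups = sorted({hg for counts in per_pop.values() for hg in counts})
    rows = []
    for pop in populations:
        n = sum(per_pop[pop].values())
        # Haplogroups absent from a population get a frequency of 0.
        rows.append([per_pop[pop][hg] / n for hg in haplogroups])
    return populations, haplogroups, rows
```

Each row sums to 1, so populations with very different sample sizes enter the PCA on the same scale.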
To complement the PCA, we performed hierarchical clustering analyses employing the Ward algorithm (Ward, 1963) with Euclidean distances. The scores of the first six PCs were used for the clustering. The Ward clustering results were visualized in R as a dendrogram using the hclust function with Euclidean distances. The significance of each cluster was evaluated by 10,000 bootstrap replicates using the pvclust function in R. Pairwise FST values and Slatkin's distance matrices were obtained in Arlequin for the multidimensional scaling (MDS) (Excoffier & Lischer, 2010). Based on PAML likelihoods, the evolution model of Tamura and Nei (1993) with a gamma value of 0.325 was picked by the FindModel program (Posada & Crandall, 1998). Finally, using the metaMDS function based on Euclidean distances implemented in the vegan library of R, MDS was applied to a matrix of linearized Slatkin FST values (Slatkin, 1995) and visualized in a two-dimensional space. The standard analysis of molecular variance (AMOVA) function implemented in Arlequin was used to calculate variances, fixation indices, and p values (p). Ten thousand permutations were used to test the significance of FST values. The distance matrix was used to infer the haplotype definition. AMOVA was run for each assembly after dividing the groups into two or three clusters. The groupings with the highest among-groups and the lowest within-groups variances (FCT and FSC values) were selected as the best models for our AMOVA. A Mantel test was conducted to determine any potential correlations between the genetic and geographic distances of the samples under study using an FST genetic distance matrix (Mantel, 1967).
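The logic of the Mantel test can be illustrated with a minimal permutation sketch. The software and number of permutations used for the Mantel test are not stated in the text, so the default of 10,000 permutations, the one-sided p value, and the function name `mantel` are all assumptions of this example.

```python
import math
import random


def _upper(matrix, order):
    """Upper-triangle entries of `matrix` after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(genetic, geographic, permutations=10000, seed=0):
    """Return (r, one-sided p) for two symmetric distance matrices."""
    identity = list(range(len(genetic)))
    geo = _upper(geographic, identity)
    r_obs = _pearson(_upper(genetic, identity), geo)
    rng = random.Random(seed)
    hits = 0
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute rows and columns of one matrix together
        if _pearson(_upper(genetic, order), geo) >= r_obs:
            hits += 1
    # The observed arrangement is counted as one of the permutations.
    return r_obs, (hits + 1) / (permutations + 1)
```

Rows and columns are permuted jointly because the entries of a distance matrix are not independent; shuffling individual cells would overstate significance.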

Genetic diversity of the studied populations
In this review, we have for the first time systematically collated and presented all extant mtDNA (full mitogenome and HVS-I) data from contemporary Iran. Utilizing HVS-I sequences that span a length of 335 bp, specifically covering mtDNA nucleotide positions 16,046-16,381, we identified a total of 2505 distinct haplotypes from 3840 Eurasian individuals. Notably, 1119 of these haplotypes were discerned within the Iranian cohort (n = 1498 in total).
FIGURE 3 (a and b): Geographic distribution of 19 and 26 investigated populations in Iran and Eurasia. The "Persian" and "Ashkenazi" ethnic groups were not specified on the map because they originate from several provinces of Iran and areas of Eurasia. Abbreviations are listed in Table 1. Source: The maps were created with the MapChart Tools: https://www.mapchart.net/world.html.
For the encompassing Eurasian dataset, the mean haplotype diversity was calculated at 0.9974, with the observed values spanning from a minimum of 0.2622 in the Dravidian population to a maximum of 1. This maximum value was observed in multiple groups: Northern Talyeshes from Azerbaijan, Bakhtiaris, Southern Talyshes, and the population from Isfahan, Iran (as detailed in Table 2).

mtDNA haplogroup distribution
In the present study on the Iranian population, 36 distinct haplogroups were identified. A significant proportion of these haplogroups correspond to the more ubiquitous West-Eurasian lineages, including H, HV, J, N, T, and U. Farjadian et al. (2011) elucidated that a vast majority (90.8%) of HVS-I lineages predominantly traced their origins to Western Eurasian haplogroups, namely, H, HV, I, J, K, N, T, U, V, and W. Conversely, haplogroups typically associated with East-Eurasian (A, C, D, F, G, and Z), South-Asian (M), and Sub-Saharan African origins (L1, L2, and L3) were confined to specific, smaller ethnic subsets. Complementing these findings, Bahmanimehr et al. (2015) undertook an extensive exploration of the entire mtDNA mitogenome within the Iranian populace, revealing the predominant distribution frequencies of West Eurasian lineages, such as H, J, and U. However, this dominance was not observed within the Baloch and Zoroastrian communities. Our findings resonate with the aforementioned studies, particularly highlighting the prominence of mtDNA haplogroups H (19.69%), N (9.07%), HV and J1 (7.07%), T (5.88%), U7 (5.07%), and M (5.03%) across the Iranian communities, with the Zoroastrian ethnicity as a notable exception.
The summary table of our review shows that four haplogroups, B (from the Lurs group), Y (Turkmen), JT (Mazanderanis), and P (Tehran), were each represented only once. In contrast, the remaining 32 identified haplogroups each occurred at least twice within the broader Iranian demographics (Supplementary Table, sheet 1). Bahmanimehr et al. (2015) have explained that the South Asian haplogroup M is a common ancestral haplogroup found in all of Iran's neighbors to the east. According to our data, the M haplogroup is most prevalent in the Tehran and Baloch groupings among Iranian communities (Supplementary Table, sheet 1). The two East-Eurasian mtDNA haplogroups G and D, both descendants of haplogroup M, can be traced in the Turkmens of this study, which distinctively sets them apart from other Iranian populations.
Shared maternal haplogroups between the Iranian population and populations of other countries, reflecting geographical trends, extend beyond just Iran's terrestrial regions. Sea routes also appear to have facilitated maternal gene flow. For instance, the African haplogroup component (L) is most prevalent in Qeshm and the Arab populations residing along the Persian Gulf in southern Iran. This prevalence suggests an African influence, potentially channeled through the Strait of Hormuz, a key conduit between the two continents (Figure 4). Derenko et al. (2013) reported that haplogroup R2 likely dates back to the pre-LGM/Late Glacial era, with an estimated overall coalescence time of 21-31 kya, and possibly originated in southern Iran. Significantly, this haplogroup is most prominent in the Qashqai and Persian populations, accounting for 3.3% and 1.8%, respectively, among the surveyed Iranian datasets. The investigations by Derenko et al. substantiated that communities from Iran, Anatolia, the Caucasus, and the Arabian Peninsula shared a set of maternal lineages. Even with the aggregated data from both the Derenko et al. and Farjadian et al. studies on the Qashqai ethnic group, the observed frequency of the R2 maternal haplogroup remains consistent with Derenko et al.'s report from 2013 (2%), resting at approximately 1.86%.
In the majority of the studied neighboring countries of Iran, the haplogroups H (14.17%), M (10.16%), D (9%), R (6.78%), U (4.56%), and C (4%) are predominant. The East-Eurasian haplogroups C and D are notably prevalent among all Turco-Mongol-speaking groups (except for haplogroup C in Kyrgyz, Turkmen, and Karakalpak). Conversely, the South Asian haplogroup M stands out as the [...]
The ethnic groups "Persian," "Armenian," and "Azeri" are not delineated on the map as their data are sourced from multiple provinces in Iran, as indicated in Figure 3A.
Supplementary Tables, sheets 1 and 2 provide a summary of mtDNA haplogroups distribution for both Iranian and other Eurasian communities studied.

Haplogroup frequency-based comparisons of the studied populations
A PCA was conducted on the frequencies of derived haplogroups from a dataset comprising 3840 HVS-I mitochondrial sequences. This was executed to discern the genetic affiliations between Iranian groups and other Eurasian populations (see Figure 5A,B). Drawing from 36 haplogroups, the first two principal components (PCs) encapsulated 25.504% of the total variance.
There is a distinct demarcation along PC1 between populations speaking the Turco-Mongol languages (viz., Mongolia, Turkmen, Shors, Karakalpak, Kyrgyzstan, Kazakhstan, and Uzbekistan) and other groups. Notably, haplogroups Y, G, C, A, and D manifest with higher frequencies in Mongolia, Kyrgyzstan, and Uzbekistan compared to other areas. Similarly, Yunnan/Altaian-Kizhi and Kyrgyzstan/Uzbekistan exhibit congruent frequencies for the D and C haplogroups.
Counterposing this on the PC1 axis, and with a distinction on the PC2-PC3, the Kalash ethnic group is characterized by pronounced frequencies of haplogroups R0, J2, and specific sub-lineages of the U haplogroup, such as U4 and U2. In a similar PC1 orientation with the Kalash, the Eurasian outlier Dravidian population exhibits the highest frequency of haplogroup R0. The Yaghnobi, marked by the highest HV frequency in the dataset, clusters with Iranian populations, including Persians, Mazanderanis, Lurs, Armenians, Qashqai, Kurds, Shiraz, Azeris, and Jews in Iran.
The ancient U7 lineage was strikingly high at a frequency of 18.07% in the Lur ethnic group within the Iranian [...]
We used the first six PCs' scores for the Ward-type hierarchical clustering with the Euclidean distance calculation method.
The dendrogram in Figure 6 helps verify the PCA results. Of the two major branches, the first (left) one contains more than half of the Iranian dataset (63.15%). Interestingly, the majority of the populations living in the southern parts of the Zagros mountains (Shiraz, Lurs, Kurds, and Qashqai) cluster together with the strongest probability. Moreover, the high proportion of the autochthonous haplogroup U7 among the populations of the southern Zagros mountains may reflect the persistence of this ancient haplogroup from the distant past into the modern population. It is worth mentioning that Iranian Zoroastrians and Pathans from Afghanistan are joined with a high probability value (97%). This may link back to the historical roots and cultural practices of the Pashtun ethnic groups in Zoroastrianism (Wynbrandt, 2009).
Another subbranch system is composed of the majority of Turco-Mongol-speaking groups, such as Mongolians, Uzbeks, Kyrgyz, Kazakhs, and Karakalpaks, with a high probability value (96%), showing the common roots of their genetic structure. More than half of the populations from countries adjacent to Iran fall into the second major cluster (97%). The dendrogram shows that, within the second major cluster, Turks, Georgians, and Armenians; Brahuis, Balochs, and Zoroastrians; and Turkmens and Tajiks, which are nearby non-Iranian populations, each stand within the same branch system. Interestingly, the Talyeshes from Azerbaijan and Iran cluster with each other at 99% probability. The predominance of haplogroups H, N, J, and U in both Talyesh populations connects them genetically across the border.

Sequence-based genetic differentiation discussion
Based on the HVS-I mtDNA sequence data, pairwise genetic distances (FST) were calculated. The FST range of eight populations (Dravidian, Yunnan, Kalash, Ashkenazi, Iranian Jews and Zoroastrians, Shors, and Yaghnobi) was notably higher compared to other groups, designating these populations as outliers in the MDS plot (Supplementary Table, sheet 3). Additionally, all FST p values were significant except for the following pairs: Yaghnobi and Iranian Azeris, Shors and Iranian Azeris, Shors and Turks, Iranian Zoroastrians and Iranian Azeris, Ashkenazi and Iranian Azeris, and Iranian Jews and Iranian Azeris (Supplementary Table, sheet 3).
The pairwise FST values were close to zero between groups sharing the same ethnonym within Iran and beyond its borders, such as Armenians, Turkmens, Azeris, Balochs, and Talyshes, showing high genetic affinity between them. An exception to this was the observed significant FST of 0.12 (p < 0.05) between Iranian Zoroastrians and their counterparts in India.
The pairwise FST measured between the Persian-speaking urban populations (Tehran, Isfahan, and Shiraz) was low and nonsignificant, correlating with the high maternal gene flow and the linguistic evidence of the modern Persian language's origin in these populations (Renfrew, 1990). MDS and heatmap plots were created from pairwise FST values to depict the relationships between the investigated Iranians and other populations from Eurasia (Figures 7 and 8).
Generally, the pairwise FST values displayed considerable similarity among Kazakhs, Karakalpaks, Mongolians, Kyrgyz, Hui, Altai-Kizhi, Sindhi, and Zoroastrians from India. These populations are located next to each other on the MDS plot. However, their FST values differed considerably from those of the rest of the populations, leading to a separate cluster in the MDS and heatmap plots.
The Gilaki and Mazandarani ethnic groups, inhabitants of the South Caspian region in Iran, predominantly speak the North-Western branch of Iranian languages. Linguistic evidence, highlighted by the shared typological characteristics among Gilaki, Mazandarani, and Caucasian languages, supports the hypothesis that these groups may have originated in the Caucasus, potentially displacing a prior South Caspian population (Sana'ati et al., 2017). Our genetic findings, derived from sequence-level analysis, indicate that the Iranian Gilakis have genetic ties closely resembling the Azeris and Northern Talysh people from Azerbaijan. Meanwhile, the Iranian Mazanderanis appear genetically similar to Armenians from Armenia. Furthermore, both groups display genetic proximity to the broader Iranian gene pool. These results are corroborated by PCA and Ward's method of hierarchical clustering, reinforcing the genetic affinities observed among these populations.
In 1813, the Golestan agreement between Russia and Iran separated the North Talysh area from Iran, dividing Astara city into two parts. The Astrachai River was used as the border, and the area to the north was ceded to the Russian government (which became the Republic of Azerbaijan in 1991 CE), and the southern area remained with Iran (Fard et al., 2019). The MDS and heatmap plots of the current study show that the Talyeshes from Azerbaijan are the most similar in their maternal lineages to the Talyeshes from Iran (Figures 7 and 8).
Historically, the Qashqai migrated to Iran at different periods, eventually settling in the Fars province, where they intermingled with local Lurs, Kurds, and Turks. A significant event during the Qajar dynasty around 1870 CE saw this tribe grappling with famine, leading to its eventual disintegration and the subsequent absorption of approximately 5000 Qashqai families into the Bakhtiari tribe (Oberling, 1974). Various distributions of haplogroups and sub-haplogroups can be found in this ethnic group. In fact, the Qashqai hold the most variable position (77.8%) for haplogroups within the Iranian gene pool. Based on our FST results, Qashqai people from Iran have a stronger affinity to Kurd, Lur, and Bakhtiari populations than to other Iranian ones.
The genetic distance between the Lur and Bakhtiari ethnic groups was negligible, as reflected by their proximity on the MDS plot. Notably, the Lurs exhibit a more diverse array of mtDNA haplogroups compared to the Bakhtiaris. Out of the 19 haplogroups identified in Lurs, they share only four (H, T, N, and J) with the Bakhtiaris. Such genetic patterns might be attributed to the Bakhtiaris' pronounced endogamous cultural practices and their nomadic way of life, compared to the Lurs (Baigi & Sadeghi, 2022). The historical distinction between Lur-i Buzurg (Greater Lur) and Lur-i Kuchak (Lesser Lur) can be traced back to 913 CE within the confines of Lurestan (Fragner, 1987). The Bakhtiaris, a subset of the Greater Lur tribe, predominantly inhabit hilly terrains in Khuzestan and Isfahan, located in southwestern Iran. A significant portion of the Bakhtiari population maintains a nomadic lifestyle, alternating between the northern and southern territories of Iran during summer and winter (in Farsi, called sardsir and garmsir camps). The close location of these two ethnic groups in the MDS and the current FST results indicate that Bakhtiaris and Lurs do not differ significantly from each other in their maternal gene pools.
The Zoroastrians from Iran and Iranian Jews exhibit a marked divergence from other Iranian populations on the MDS plot, potentially stemming from their distinct endogamous traditions and cultures (Baigi & Sadeghi, 2022).
The Balochs had the second lowest mtDNA genetic diversity after Zoroastrians in the Iranian populations of the current research. This can be explained by long-term geographical isolation (Bittles & Black, 2010). Separated geographically by the southern part of the Zagros mountainous area and the Lut and Kavir deserts, their gene flow with northern and western Iranian districts was restricted for a long time.
The position of the Iranian Azeris on the PCA and MDS plots, at the border between Iranian and non-Iranian populations, can be explained by the highest number of exogamous marriages with Iranian and non-Iranian populations (Arakelova, 2015; Baigi & Sadeghi, 2022). The Iranian Azeris have low genetic distances to Brahui, Pakistan, and Iranians in Isfahan and Qeshm (Figures 7 and 8).
The MDS plot generally distinguishes an explicitly separate cluster of Iranians from other Eastern adjacent countries' populations. Iranian Azeri, Turkmen, and Baloch groups are placed between the two independent clusters of Iranians and Eastern Eurasian people.
The Kalash are a group of Indo-Iranian people who live in the Chitral area of Pakistan's Khyber Pakhtunkhwa Province (Ayub et al., 2015). Kalash culture and belief systems differ from those of the various ethnic groups surrounding them but are close to Vedic and pre-Zoroastrian practices (Mela-Athanasopoulou, 2011). They form a divergent group compared to the other Central Eurasian populations on the MDS plot of this research, which shows the lack of South Asian mtDNA lineages. Interestingly, the Kalash were located far from the Eastern Eurasian populations on the MDS plot and stood in line with Zoroastrians from Iran. A similar pattern was observed on PC1-2 (Figure 5A), and the haplogroup-based dendrogram also places them further from other Pakistani ethnic groups and nearer to Iranian populations, clustering with some of them, such as Tehran, Southern Talyshes, Gilakis, Bakhtiaris, Isfahan, and Arab groups (Figure 6).
The Parsis are Zoroastrian Iranian people who live primarily in South Asia, particularly along the western coast of India. Various historical documents support their Iranian descent. They can be regarded as the first emigrant community of Iranians, as their ancestors fled from Iran to India (Gujarat Province) during the Arab regimes of Iran in the 7th century CE (Fox, 1967). In the Iranian Zoroastrian population, most of the Western-Eurasian haplogroups (H, I, J, and K) are missing entirely, which implies that considerable genetic drift and founder effects occurred in this population following the Muslim expansion in the 7th-8th centuries (Moulton, 1917). In this article, we compared the Zoroastrian-Parsi mtDNA pool from India with the Iranian Zoroastrians. About 35% of the Zoroastrian-Parsi maternal gene pool comprises two Asian haplogroups (M, D) and U, which is recognized as a typical European macro-haplogroup. These haplogroups were predominant among the ethnic groups in India, and the genetic structure can also show signs of intermarriages with local women in India. As historical evidence supports, their migration and admixture could have led to their shifted genetic structure and the intermediate location of the Zoroastrian-Parsi between the Eastern Eurasian populations and other Iranian populations on the MDS plot. Meanwhile, the high number of endogamous marriages can be considered a potential reason for the low diversity and the distinctiveness of the Iranian Zoroastrian population (Baigi & Sadeghi, 2022).

Analyses of the population structure
In order to reveal possible HVS-I sequence-based differences within the dataset compared to geography, we tested a locus-by-locus AMOVA at the country (only the Iranian population) and Eurasia (the whole dataset) scale. Furthermore, a series of AMOVAs was conducted, in which we tested different numbers of population groups, building geographically distinct clusters (Supplementary Table, sheet 4). In addition, identifying clusters that were geographically homogeneous and genetically maximally distinctive from one another enabled us to detect the possible genetic barriers between the Iranian groups. The lack of a sharp geographical structure in mtDNA HVS-I sequence diversity was statistically supported by the performed tests. The majority of haplotype variance for the Iranian population was explained at the within-population level (93.23%, p < 0.001). It should be noted that the geographic subdivision of the Iranian samples (North-West and South-Central Iran) consistently produced low and nonsignificant among-groups variance (FCT) (Supplementary Table, sheet 4). Under both grouping criteria, the FCT, FST, and FSC values decreased when the outlier groups (Zoroastrians, Jews) were removed from the analyses. In the sequential AMOVA analyses for the Iranian population, the highest genetic differentiation between groups occurred when three distinct clusters were identified: the first two clusters represented the Zoroastrian and Jewish communities, and the third cluster represented the remaining Iranian ethnic groups. This clustering configuration displayed the highest among-groups percentage of variance (4.16%) with a significant but low among-groups fixation index (FCT = 0.041, p = 0.009), thus indicating genetic discontinuity between Zoroastrians, Jews, and the majority of Iranian samples (Supplementary Table, sheet 4).
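The relationship between AMOVA variance components and the fixation indices reported above can be made explicit with a short sketch based on the standard hierarchical AMOVA definitions. The function name `fixation_indices` is hypothetical, and in the usage shown only the among-groups share (4.16%) is taken from the text; the other two components are placeholder values chosen so that the three sum to 100.

```python
def fixation_indices(var_among_groups, var_among_pops_within_groups, var_within_pops):
    """Return (F_CT, F_SC, F_ST) from the three AMOVA variance components.

    The components may be given as raw variances or as percentages of the
    total; only their ratios matter.
    """
    va, vb, vc = var_among_groups, var_among_pops_within_groups, var_within_pops
    total = va + vb + vc
    f_ct = va / total         # among groups
    f_sc = vb / (vb + vc)     # among populations within groups
    f_st = (va + vb) / total  # among populations relative to the total
    return f_ct, f_sc, f_st


# Only 4.16 comes from the text; 2.00 and 93.84 are hypothetical placeholders.
f_ct, f_sc, f_st = fixation_indices(4.16, 2.00, 93.84)
```

With an among-groups share of 4.16%, FCT evaluates to about 0.042, close to the reported 0.041. The three indices are linked by (1 - FST) = (1 - FSC)(1 - FCT), which is why removing outlier groups lowers them together.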
We performed the AMOVA on the whole Eurasian dataset, in which Dravidians were not included, as their genetic difference from the other groups is so large that they would unbalance the results. Corresponding to the MDS, our dataset fit into two large groups rather than three, as two-group clustering gave a higher range for among-group genetic variance (Table 3). Most of the haplotype variation in this constellation is accounted for by significant within-population differences (95.43%, p < 0.05).
We also performed AMOVA by linguistic groups (Afro-Asiatic, Altaic, and Indo-European) to assess what contribution linguistic differences may have made to genetic differentiation in Iran. It seems that a random mtDNA differentiation pattern with respect to linguistics can be invoked, suggesting that linguistic factors did not actually represent insurmountable barriers to matrilineal gene flow among the Iranian population.

CONCLUSIONS
Comprehensive studies delving into the maternal genetic variations within Iranian communities have been limited. In this study, by merging and reanalyzing previously published mtDNA data from the gene pool of the modern Iranian population, we sought to shed light on the HVS-I mtDNA genetic makeup of modern Iranians and neighboring populations across Eurasia. Furthermore, we endeavored to identify potential external influences on the Iranian mtDNA pool and to map out the genetic interactions among various Iranian ethnic groups spanning the Iranian Plateau.
The observed high levels of haplotype and nucleotide diversity underscore the vast genetic diversity within the Iranian gene pool. The matrilineal genetic structures revealed a pronounced genetic affinity among the majority of Iranian populations. Notably, low pairwise FST values were evident among the Persian-speaking urban populations of Tehran, Isfahan, and Shiraz, suggesting extensive maternal gene flow between these groups. This genetic cohesion is in line with linguistic evidence pointing to the origins of the modern Persian language in Iran's core regions. Further, Iranian Lurs and Bakhtiaris exhibited a particular pattern of high genetic congruence, underscored by both FST and MDS analyses, which is consistent with the Bakhtiaris' historical subdivision from the greater Lur tribes. In addition, the close maternal genetic similarity of Qashqai people with Lurs, Bakhtiaris, and Iranian Kurds corresponds to their admixture with Qashqai after the latter population's collapse during the late Qajar dynasty.
Despite this similarity among Iranians, genetic trends can be observed in the country's periphery. The high frequency of South Asian components in the Balochi ethnic group corresponds to the geographical position of the Balochistan province in the southeast of Iran. The Qeshm minority living in the south accumulated some African components of the L haplogroup, which can imply African inflow through the Strait of Hormuz, which also serves as a choke point and a maritime passage between the two continents. Besides geography, cultural and political factors seem to have acted as obstacles to maternal gene flow between Iranian populations, such as Balochi, Jews, Zoroastrians, and Turkmens, corresponding to long centuries of reproductive isolation caused by language, religion, and other cultural barriers.
In this study, we also considered particular populations with identical ethnonyms within and beyond the Iranian borders and investigated the degree of genetic affinity between them. The relatively recent separation of the Talysh population into two groups in Iran and Azerbaijan did not lead to significant maternal genetic differences between them. Iranian and Indian Zoroastrians, on the contrary, for whom there is also evidence of long isolation for cultural and religious reasons after the Muslim expansion in the 7th-8th centuries, showed significant haplogroup- and sequence-based maternal genetic structure.
Our findings suggest that, barring a few culturally unique and geographically isolated groups, the broader Iranian populace exhibits a cohesive maternal genetic structure. This pattern can be extrapolated to ethnic groups residing in regions historically under Iranian governance prior to 1828 CE (encompassing Turkmenistan, Azerbaijan, Georgia, and Armenia).
As the field progresses, the use of mitochondrial and whole genome sequencing is becoming increasingly pivotal, especially with advances in molecular biology. Deploying complete mitogenomes in phylogenetic analyses of Iranian populations will enhance genetic resolution for both inter- and intrapopulation assessments.
One limitation of our study is the inherent imbalance in the mtDNA dataset, due to either smaller sample sizes for certain Iranian groups or asymmetrical sampling of specific minority groups (e.g., Assyrian, Tat, Sistani, Bandari, Iranian Georgians, Mandaean, Tabari, Yarsan, and Chaldean). These samples may not fully represent the diverse Iranian society. By consolidating extant mtDNA data, we aimed to offer a clearer depiction of the Iranian maternal genetic landscape. Future research concentrating on complete mitochondrial genomes and previously overlooked Iranian populations, especially with balanced urban and rural sampling, will further enrich this domain.

AUTHOR CONTRIBUTIONS
Motahareh Amjadi and Mahmood Tavallaei collected data from reviewed articles and their authors for this study. Motahareh Amjadi and Zahra Hayatmehr performed data analysis. Motahareh Amjadi performed the literature review and drafted the manuscript with input from all authors. Anna Szécsényi-Nagy, Mahmood Tavallaei, and Balázs Egyed supervised and edited the article. All authors revised and approved the final manuscript.

ACKNOWLEDGMENTS
None.

CONFLICT OF INTEREST STATEMENT
The authors declare no conflicts of interest.

DATA AVAILABILITY STATEMENT
Data sharing is not applicable to this article as no new data were created or analyzed in this study.

ORCID
Motahareh Amjadi https://orcid.org/0009-0002-1613-3548

Geography of the Iranian Plateau: The Elburz Mountains, including Mount Demavend, the Turkmen-Khorasan Mountains, like the Kopeh Dagh, and the Hindu Kush, are located in the northern arc. In contrast, the Zagros Mountains form the southern and southwestern arcs. Several mountain ranges alternate with plateaus and large closed intermountain depressions inside the region, like the Dasht-e Kavir and the Dasht-e Lut. Source: The map has been created with the Generic Mapping Tools http://generic.mapping.tools.org.

Relative mitochondrial DNA (mtDNA) haplogroup frequencies of the Iranian populations. Circle sizes correspond to sample sizes analyzed in this study. This figure was designed in Adobe Photoshop 2022 v.23.3 (operating system: Windows 11, 64-bit). Source: The raw map was extracted from https://www.cleanpng.com/free/iran-map.html.

most frequent among most of the surveyed individuals outside of Iran, with exceptions, including the Yaghnobi, Turk, Dravidian, Georgian, and Armenian populations (see Supplementary

FIGURE 5 (a) Principal component analysis (PCA) performed on haplogroup frequencies. Brown and blue dots represent Iranian and other non-Iranian Eurasian populations, respectively. Principal component (PC)1-2 together accounted for 25.50% of the total variance. (b) PCA performed on haplogroup frequencies. Brown and blue dots represent Iranian and other non-Iranian Eurasian populations, respectively. PC1-3 together accounted for 23.12% of the total variance.

gene pool of this study. Moreover, uncommon haplogroups such as K, M, and the smaller percentage of H and N were distinguished. All these observations could characterize the reduced mtDNA variability of Lurs among other Iranian populations.

FIGURE 6 Dendrogram based on Ward clustering with bootstrap probability values (p).

FIGURE 7 Multidimensional scaling plot based on hypervariable segment I (HVS-I) mitochondrial DNA (mtDNA) sequences. Brown and blue dots represent Iranian and other non-Iranian Eurasian populations, respectively.

Heatmap plot and clustering based on Slatkin's FST distance matrix.
Description of the studied populations in Iran and in the comparative regions.
TABLE 1 Measurements of molecular diversity in the current study. The order of the populations in this table is based on haplotype diversity.
Analysis of molecular variance (AMOVA) based on the Eurasian mitochondrial DNA (mtDNA) hypervariable segment I (HVS-I) sequence dataset. We considered the whole dataset for geographical criteria and linguistic groups within the Iranian population. Note:

Table, sheet 5). On the other hand, when the entire dataset of Eurasian populations was considered, a positive spatial autocorrelation between geographic distances and pairwise FST was observed (Supplementary Table, sheet 6).
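One common way to test an association between a geographic distance matrix and a pairwise FST matrix is a Mantel permutation test. The passage above does not state which procedure was used, so the following is only a sketch under that assumption; the function names, the number of permutations, and the toy matrices are hypothetical.

```python
import random
from math import sqrt


def _upper(matrix, order):
    """Upper-triangle entries of a square matrix after relabeling by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def _pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def mantel(geo, gen, permutations=999, seed=0):
    """One-sided Mantel test: returns (observed r, permutation p-value)."""
    n = len(geo)
    identity = list(range(n))
    x = _upper(geo, identity)
    observed = _pearson(x, _upper(gen, identity))
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # relabel populations in the genetic matrix only
        if _pearson(x, _upper(gen, order)) >= observed:
            hits += 1
    return observed, hits / (permutations + 1)


# Hypothetical toy data: four populations on a line, FST proportional to distance.
positions = [0, 1, 3, 7]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[0.001 * d for d in row] for row in geo]
r, p = mantel(geo, gen)
print(r, p)
```

Permuting population labels rather than individual matrix cells preserves the dependence among distances that share a population, which is why the test shuffles rows and columns together.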