Phylogenetic relationships of freshwater fishes of the genus Capoeta (Actinopterygii, Cyprinidae) in Iran

Abstract The Middle East contains a great diversity of Capoeta species, but their taxonomy remains poorly described. We used mitochondrial history to examine diversity of the algae‐scraping cyprinid Capoeta in Iran, applying the species‐delimiting approaches General Mixed Yule‐Coalescent (GMYC) and Poisson Tree Process (PTP) as well as haplotype network analyses. Using the BEAST program, we also examined temporal divergence patterns of Capoeta. The monophyly of the genus and the existence of three previously described main clades (Mesopotamian, Anatolian‐Iranian, and Aralo‐Caspian) were confirmed. However, the phylogeny proposed novel taxonomic findings within Capoeta. Results of GMYC, bPTP, and phylogenetic analyses were similar and suggested that species diversity in Iran is currently underestimated. At least four candidate species, Capoeta sp4, Capoeta sp5, Capoeta sp6, and Capoeta sp7, are awaiting description. Capoeta capoeta comprises a species complex with distinct genetic lineages. The divergence times of the three main Capoeta clades are estimated to have occurred around 15.6–12.4 Mya, consistent with a Mio‐Pleistocene origin of the diversity of Capoeta in Iran. The changes in Caspian Sea levels associated with climate fluctuations and geomorphological events such as the uplift of the Zagros and Alborz Mountains may account for the complex speciation patterns in Capoeta in Iran.

suggests that the freshwater fauna with low salinity affinity dispersed from the Middle East northward to the Paratethys and then westward into Europe and eastward into western Asia (Durand et al., 2002; Heller, 2007; Por & Dimentman, 1985, 1989). A second hypothesis proposes that, prior to Pliocene orogenesis, the proto-Euphrates collected freshwater from the Middle East and maintained contact with the Black and Caspian Seas (Durand et al., 2002). More recent studies suggest that the colonization of Europe by Leuciscinae most likely occurred from southwestern Asia via the Balkanian/Anatolian/Iranian landmass in the Early Oligocene (Perea et al., 2010).
The genus Capoeta may be an ideal model for studying the biogeographical and evolutionary history of the freshwater fauna of Iran, given its countrywide distribution and the extensive variation in habitats occupied, from high-mountain crystalline streams to deep lowland/coastal muddy rivers (Bănărescu, 1999). Being mostly algae scrapers, these species depend on clear, relatively shallow rivers where light does not limit the growth of algae. In Turkey, Fricke, Bilecenoğlu, and Sarı (2007) listed some species as critically endangered (C. pestai and C. angorae), many as endangered (C. antalyensis, C. barroisi, C. bergamae, C. damascina, C. kosswigi, C. sieboldi and C. tinca), and many others as data deficient. In Iran, C. capoeta is considered of least concern in the Caspian basin by Kiabi, Abdoli, and Naderi (1999), but in general there are no robust assessments of the conservation status of these species, and more studies are needed. The main threats to these species seem to be habitat loss, water abstraction, construction measures, and pollution, and probably, to a lesser degree, invasive species. In addition, many species once considered very widely distributed have had their ranges narrowed by recent taxonomic studies, which have described new, more locally restricted species. This suggests a higher conservation concern for them and shows the urgent need for studies on the conservation status of freshwater fishes in Iran generally.
This study investigated the phylogeny of the main freshwater populations of Capoeta, including the most complete dataset available, and provided a hypothesis on the evolutionary history and diversification of the genus in the region. The primary aims of this study were (1) to assess species boundaries within Capoeta and evaluate cryptic diversity and species endemism by sequencing the cytochrome b gene; (2) to investigate the phylogeny within Capoeta species based on geographic sampling; and (3) to propose a hypothesis for the origin of freshwater fauna of Iran considering possible vicariant events that may have shaped the diversity of the genus.

| Sample collection
Three hundred and five specimens of the genus Capoeta and one specimen of Barbus lacerta were collected by electrofishing at 47 sites in 13 river basins, covering most of its distribution in the country (Table 1;

| DNA extraction, amplification, and sequencing
DNA was extracted using the DNeasy® Blood & Tissue Kit (QIAGEN, Hilden, Germany). The entire cytochrome b (cytb) gene (1140 bp) was amplified by polymerase chain reaction (PCR) according to the protocol described in Perdices and Doadrio (2001)  from GenBank, were used as outgroup (Levin et al., 2012; Machordom & Doadrio, 2001a). All sequences and GenBank accession numbers are listed in Tables S1-S3.

| Data analysis
The sequences were collapsed to haplotypes using the program Alter  (Table S4) were calculated with MEGA 6 (Tamura, Stecher, Peterson, Filipski, & Kumar, 2013). A bootstrapping process was implemented with 1000 repetitions. Because multiple tests were performed, p-values were further adjusted using Bonferroni's correction (Rice, 1989).
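As an illustration of the multiple-testing adjustment mentioned above, the following is a minimal sketch of a plain Bonferroni correction, in which each p-value is multiplied by the number of tests and capped at 1. The function name and the example p-values are hypothetical, and the text does not state whether the plain or a sequential variant of the correction was applied, so this should not be read as the exact procedure used in the study.

```python
def bonferroni_adjust(p_values):
    """Return plain Bonferroni-adjusted p-values.

    Each raw p-value is multiplied by the number of tests performed
    and capped at 1.0. (Assumption: plain, non-sequential correction.)
    """
    n_tests = len(p_values)
    return [min(1.0, p * n_tests) for p in p_values]


# Hypothetical raw p-values from three pairwise comparisons.
raw_p_values = [0.001, 0.02, 0.04]
adjusted = bonferroni_adjust(raw_p_values)

for raw, adj in zip(raw_p_values, adjusted):
    print(f"raw p = {raw:.3f}  ->  adjusted p = {adj:.3f}")
```

Equivalently, one can leave the p-values unchanged and compare each against a significance threshold divided by the number of tests.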
The Akaike information criterion implemented in PartitionFinder v.  (Rambaut & Drummond, 2013). After discarding the first 10% of generations as burn-in, we obtained the 50% majority rule consensus tree and the posterior probabilities.
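The burn-in and consensus step can be pictured with a small sketch. It assumes, purely for illustration, that each sampled tree is represented as a set of clades (each clade a frozenset of taxon labels). The function name, the toy taxa, and the tree sample are hypothetical; real Bayesian phylogenetics software performs this summary internally and with more care (for example, handling branch lengths and rooting).

```python
from collections import Counter


def majority_rule_clades(sampled_trees, burnin_fraction=0.10, threshold=0.5):
    """Summarize a posterior sample of trees.

    sampled_trees: list of trees, each a set of clades (frozensets of taxa).
    The first `burnin_fraction` of the sample is discarded; the posterior
    probability of a clade is its frequency among the remaining trees.
    Only clades with frequency strictly above `threshold` are returned.
    """
    n_burnin = int(len(sampled_trees) * burnin_fraction)
    kept = sampled_trees[n_burnin:]
    counts = Counter(clade for tree in kept for clade in tree)
    return {
        clade: count / len(kept)
        for clade, count in counts.items()
        if count / len(kept) > threshold
    }


# Hypothetical sample of 10 trees over taxa A, B, C (plus an implicit outgroup).
tree_1 = {frozenset("AB"), frozenset("ABC")}
tree_2 = {frozenset("AC"), frozenset("ABC")}
sample = [{frozenset("AB")}] + [tree_1] * 6 + [tree_2] * 3

consensus = majority_rule_clades(sample)
for clade, prob in sorted(consensus.items(), key=lambda item: -item[1]):
    print(sorted(clade), round(prob, 3))
```

In this toy sample the first tree is dropped as burn-in, and the clade {A, C} is excluded from the consensus because it appears in only 3 of the 9 retained trees.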
We reconstructed haplotype networks to resolve relationships among closely related haplotypes (Crandall, 1994). Haplotype genealogies in these clades were obtained by HaploView v.
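To make the notion of a haplotype concrete, the sketch below collapses aligned sequences into unique haplotypes and counts the mutational steps between two haplotypes, which is the quantity that links nodes in a haplotype network. The sample identifiers and sequences are hypothetical, and this simplified version treats gaps and ambiguity codes as ordinary characters; it is not a reimplementation of the programs named in the text.

```python
def collapse_to_haplotypes(sequences):
    """Group samples that share an identical aligned sequence.

    sequences: dict mapping sample id -> aligned sequence string.
    Returns a dict mapping each unique sequence (haplotype) to the
    list of sample ids carrying it.
    """
    haplotypes = {}
    for sample_id, seq in sequences.items():
        haplotypes.setdefault(seq.upper(), []).append(sample_id)
    return haplotypes


def mutational_steps(seq_a, seq_b):
    """Count the sites at which two equal-length aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(a != b for a, b in zip(seq_a, seq_b))


# Hypothetical aligned fragments for three specimens.
samples = {"s1": "ACGTAC", "s2": "acgtac", "s3": "ACGTAT"}
haps = collapse_to_haplotypes(samples)

print(len(haps), "haplotypes")
for seq, ids in haps.items():
    print(seq, ids)
print("steps:", mutational_steps("ACGTAC", "ACGTAT"))
```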
Divergence times within Capoeta were estimated using a Bayesian relaxed molecular clock approach, implemented in BEAST v. 1.8 (Drummond, Suchard, Xie, & Rambaut, 2012). Dating analyses were performed using the evolutionary models described above. To calibrate the tree, an expanded sequence matrix was considered that included sequences of Barbus and Luciobarbus species (Table S3).

| RESULTS
Of the 1040 bp of partial mtDNA cytb, 744 were constant and 254 were parsimony informative. The Bayesian and ML analyses yielded essentially the same topologies with similar support (Fig. 2). The reconstructed topology was also in agreement with previously published higher level phylogenies that included Capoeta (Levin et al., 2012; Turan, 2008). The phylogeny results supported the monophyly of Capoeta and identified it as sister to Luciobarbus. Results revealed the presence of three well-supported clades: A-Mesopotamian; B-Aralo-Caspian; and C-Anatolian-Iranian, as has been proposed by other authors (Levin et al., 2012) (Fig. 2).
Uncorrected-p genetic distances for the cytb gene were 8.7% for the pairwise comparison of clades A and B, 8.2% for clades A and C, and 6.2% between clades B and C. Genetic distances between and within species are listed in Table S4. was not well supported by our data. Genetic distances among these subclades ranged from 1.1% to 1.7% (Table S4). Species delimitation methods recognized C. mandica, C. barroisi, C. trutta, and C. turani as valid, even given the relatively low genetic distances separating them.
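For readers unfamiliar with the measure, an uncorrected p-distance is simply the proportion of compared sites at which two aligned sequences differ, and a between-clade value is the mean over all pairs drawn from the two groups. The sketch below illustrates this under stated assumptions: the function names and toy sequences are hypothetical, and sites where either sequence has a gap or ambiguity code are skipped (a pairwise-deletion choice that may differ from the settings used in the study).

```python
def p_distance(seq_a, seq_b):
    """Uncorrected p-distance between two aligned sequences.

    Only sites where both sequences have an unambiguous base
    (A, C, G, or T) are compared.
    """
    valid = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            differing += a != b
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def mean_between_group_distance(group_a, group_b):
    """Mean p-distance over all sequence pairs between two groups."""
    distances = [p_distance(a, b) for a in group_a for b in group_b]
    return sum(distances) / len(distances)


# Hypothetical toy alignments standing in for two clades.
clade_x = ["AAAA"]
clade_y = ["AAAT", "AATT"]
print(f"{100 * mean_between_group_distance(clade_x, clade_y):.1f}%")
```

A value reported as 8.7% thus corresponds to roughly 87 differing sites per 1000 compared positions, averaged across sequence pairs.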

| Aralo-Caspian clade
The Aralo-Caspian clade comprised populations of rivers that flow to the Aral, Orumieh, and Caspian seas, and several rivers in central Iran. This was a well-supported clade separated into three subclades (Fig. 4) with genetic distances among species ranging from 1.1% to 3.7% (Table S4). and Armenia (Fig. 5). Phylogenetic relationships within this subclade were unresolved, and genetic distances between species were low (0.7%). Results of the GMYC and bPTP differed for this subclade, suggesting the need for a genetic marker more sensitive to population differences. Sequences of C. ekmekciae clustered together, and both species delimitation methods recognized it as a distinct evolutionary unit, which, given the low genetic differences within this subclade (0.7%), could possibly be attributed to population differences within the same species. Interestingly, a specimen of this subclade was captured in the Caspian basin, which is geographically distant from the locations in which the other specimens were obtained, that is, chiefly the Orumieh basin and other areas in Turkey. This sample was checked twice to ensure that the result was not due to contamination (we repeated the DNA extraction, amplification, and sequencing). The lack of additional specimens showing the same pattern prevented us from reaching any conclusion, so we prefer to interpret the data literally until more specimens are found and the question is studied in greater depth.
The third subclade included populations from north and northeastern river basins of Iran and river basins of Turkmenistan. We found two groups separated by genetic distances varying from 1.1% to 2.9%.
The first consisted of two well-supported subgroups, Capoeta heratensis is suggested as the valid name, as the species was described from specimens caught in a river near Herat in Afghanistan. The other two subgroups, belonging to Iranian populations with southern Caspian distribution, could not be assigned to any described species. One was recognized by Levin et al. (2012) and designated Capoeta sp1.
We retain this nomenclature for populations of the southern Caspian basin. The third subgroup occurred in Namak endorheic basin from two geographically close but separated rivers, the Jajrud and Namrud Rivers, and is herein referred to as Capoeta sp6. Both GMYC and bPTP recognized C. sp6, C. sp1, and C. heratensis as distinct species.
The haplotype network of the populations from the northern and northeastern river basins demonstrates clear structuring among basins (Fig. 6). No haplotypes were shared among populations of different basins. The group designated Capoeta sp1 was the most diverse taxon in this network, showing 19 haplotypes.
Capoeta aculeata presented the lowest diversity, with two detected haplotypes. The larger number of samples for C. sp1 may bias our results to some degree. However, C. aculeata lives in a very arid region with strong fluctuations in water level (especially the population in the Kor basin), so we suppose that it has undergone many bottleneck events during its history. This is not the case for C. sp1, which occurs in a very humid region with high levels of precipitation. In addition, the rivers of the Caspian basin, where C. sp1 occurs, are all independent, without freshwater connections, which may help rare alleles to persist in smaller isolated habitats and thus result in higher diversity.
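The statement that no haplotypes were shared among basins amounts to a simple check on the sampling records: group the basins in which each haplotype was found and report any haplotype occurring in more than one. The following sketch shows that check; the haplotype labels and basin assignments are hypothetical placeholders, not data from this study.

```python
from collections import defaultdict


def haplotypes_shared_among_basins(records):
    """Return haplotypes found in more than one basin.

    records: iterable of (haplotype_id, basin_name) pairs, one per specimen.
    Returns a dict mapping each shared haplotype to its sorted basin list;
    an empty dict means that no haplotype is shared among basins.
    """
    basins_by_haplotype = defaultdict(set)
    for haplotype_id, basin in records:
        basins_by_haplotype[haplotype_id].add(basin)
    return {
        hap: sorted(basins)
        for hap, basins in basins_by_haplotype.items()
        if len(basins) > 1
    }


# Hypothetical records: every haplotype is private to a single basin.
structured = [("H1", "basin_1"), ("H1", "basin_1"), ("H2", "basin_2"), ("H3", "basin_3")]
# Hypothetical records in which H1 occurs in two basins.
mixed = structured + [("H1", "basin_2")]

print(haplotypes_shared_among_basins(structured))
print(haplotypes_shared_among_basins(mixed))
```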

| Anatolian-Iranian clade
The Anatolian-Iranian clade, the sister clade of the Aralo-Caspian clade, includes species widespread throughout the Anatolian peninsula and river basins of western and central Iran. This well-supported clade was the most diverse among the Capoeta, comprising six subclades, with genetic distances ranging from 1.5% to 5.4% (Figs 2 and 7).
The first subclade consisted of Capoeta sieboldii (Steindachner 1864) from the Kelkit River in Turkey, which drains into the Black Sea.
A second subclade split into two well-supported groups separated by a genetic distance of 2.7%. The first included populations from Turkey and was attributed to Capoeta bergamae Karaman 1969, and the second, also from Turkey, was described by Levin et al. (2012) and designated Capoeta sp2. Both groups were recovered as valid species by both methods used.

The third subclade included populations of Capoeta mauricii Küçük, Turan, Sahin, and Gülle 2009, inhabiting the Sarioz Stream and Eflatum Spring in the Beysehir Lake basin in southwestern Turkey.
The fourth subclade included Capoeta antalyensis (Battalgil 1943) from the Boga Cayi River in Turkey, near the type locality of the species. Again, both methods of species delimitation supported subclades three and four as valid species.
A fifth subclade was divided into two groups. The first group was formed by two well-supported subgroups. The first subgroup included samples identified as C. antalyensis. The genetic distance between sequence pairs within this subgroup was 0.2%, and both species delimitation methods considered the subgroup as a valid species.
Hence, we tentatively interpret this as misidentification of these specimens, as samples from the type locality of C. antalyensis were present in the fourth subclade. A second subgroup consisted of Capoeta baliki Turan, Kottelat, Ekmekçi, and Imamoglu 2006, and Capoeta tinca (Heckel 1843). However, the species delimitation methods used did not recover them as separate species. The second group of this subclade consisted of samples identified as Capoeta banarescui Turan, Kottelat, Ekmekçi, and Imamoglu 2006 and as C. cf banarescui by Levin et al. (2012). Both GMYC analysis and bPTP recognized two evolutionary units, one of which corresponded to C. banarescui.
The final subclade comprised two well-supported groups. One consisted of specimens from the Karkheh basin in western Iran. In a previous phylogenetic study, these fish were assigned to Capoeta sp3 (Levin et al., 2012). Both GMYC analysis and bPTP recognized it as a valid species. The second group of this subclade separated into two subgroups, the first found mainly in Turkey and the other primarily in be corroborated with other information sources. Here, for the four different "species" recognized by bPTP, all the information suggests that they belong to the same species: GMYC does not recognize different species; the genetic distances between them are very low, lower than some within-species distances elsewhere in the genus; all samples come from the same sampling point; and together they form a well-supported clade very close to all other samples of this species. Rather than different species, we therefore interpret the results for this group as highly structured populations within the same species.
The network analyses showed high haplotype diversity and strong geographic structuring in C. saadii in comparison with the remaining haplogroups, with 14 haplotypes present in three basins and no haplotype shared among basins (Fig. 8).

| DISCUSSION
This study provides the most comprehensive molecular phylogenetic framework of the Capoeta species in Iran to date. Capoeta was found to be monophyletic, consisting of three highly divergent lineages, as previously reported (Levin et al., 2012; Zareian, Esmaeili, Heidari, Khoshkholgh, & Mousavi-Sabet, 2016). Within Iran, these lineages are represented by the Mesopotamian clade along with the Aralo-Caspian clade and its sister group, the Anatolian-Iranian clade (Levin et al., 2012; Zareian, Esmaeili, Heidari, et al., 2016). We observed a complex phylogenetic pattern for Capoeta, with the presence of new mitochondrial lineages that, in some cases, indicated the need for rearrangement of the current systematics of the genus Capoeta in the region (Table 2). This supports the premise that the biodiversity of the area has been underestimated (Coad, 2006) and highlights Iran as critical for diversification studies, as it represents an important area of faunistic interchange among biogeographical regions (Kapli et al., 2015). Our molecular clock dates the separation between Capoeta

| Mesopotamian clade
The taxonomic validity of C. mandica in the Mesopotamian clade has been questioned. Some authors consider it a subspecies of C. barroisi (Coad, 2015), whereas others consider it a different species (Esmaeili, Coad, Gholamifard, & Teimory, 2010; Jouladeh-Roudbar, Vatandoust, Eagderi, Jafari-Kenari, & Mousavi-Sabet, 2015; Özulug & Freyhof, 2008). Bayesian reconstruction provided evidence for the presence of a mitochondrial lineage distinct from C. barroisi. The Capoeta mandica lineage appeared to be closely related to C. trutta and distantly related to C. barroisi, which led us to propose it as a valid species. The results of GMYC and bPTP were also congruent with the recognition of C. mandica as a separate species.
However, species of the Mesopotamian clade show low genetic distance and wide distribution. Hence, further investigation, including morphological characters and examination of more specimens throughout its population distribution, is needed to establish a robust taxonomy for this clade.

| Aralo-Caspian clade
Within the Aralo-Caspian clade, a population from the Tejan River in the Caspian slope, sampled for the first time, was highly divergent from the remaining lineages. As the haplotype network analyses and the GMYC and bPTP methods for species delimitation also supported the differentiation of this lineage, we putatively consider it here as a new species (Capoeta sp7). However, given that the C. sp7 lineage occurs in sympatry with C. sp1, further morphological analyses and additional samples from the region are needed to determine the origin of speciation, possibly introgressive hybridization, as reported in other cyprinids (Durand, Unlü, Doadrio, Pipoyan, & Templeton, 2000; Machordom, Berrebi, & Doadrio, 1990).
The populations identified as C. capoeta from the western Caspian were not monophyletic and showed a complex phylogeny including two previously described species, C. sevangi and C. ekmekciae. This, together with the low phylogenetic resolution of this subclade, suggests that these three species form a C. capoeta complex, calling for further study. Three species of the genus Capoeta were found in the Caspian basin: C. capoeta, C. sp1 previously reported by Levin et al. (2012), and the newly identified C. sp7. The Caspian basin has a long history of fluctuation in sea level caused by climatic changes during the Plio-Pleistocene (Mamedov, 1997). This complex history is probably one of the causes of the structure observed in our phylogeographic analyses of the populations of C. sp1 belonging to the Aralo-Caspian clade, which likely reflects the connection and isolation of rivers as the level of the Caspian Sea changed during the last glaciations.

| Anatolian-Iranian clade
The Anatolian-Iranian clade was found to be the most widespread and diversified clade of Capoeta, as was previously reported (Levin et al., 2012). This was a similar pattern to that reported in other genera from Anatolia and western Iran, such as Mesalina (Kapli et al., 2015), Mauremys (Vamberger et al., 2013), and Trachylepis (Fattahi et al., 2014). These wide distributions are probably due to recent dispersion events and/or lower barriers for some species groups.
Although C. damascina shows a broad distribution in Turkey, we found only a few specimens of this species in the Sirwan basin (Tigris tributary) in Iran. Some populations previously described as C. angorae and C. kosswigi are here considered synonymous with C. damascina, as our species delimitation methods did not indicate differences.
Nevertheless, our analysis separated populations of C. buhsei and recently described C. coadi  into two groups: the first clustered populations from the endorheic Namak basin and the second clustered those from the Karun and Zayandeh Rud basins. Our species delimitation methods suggested two species with interruption of gene flow during the middle Pliocene. This isolation of the Namak basin during the middle Pliocene is also reflected in species C. sp6 of the Aralo-Caspian clade. While the fauna in the Namak basin may have been affected by recent influences (Berg, 1940), a Pliocene origin of the freshwater fish fauna in the Namak basin has also been suggested for Salmo trutta (Derzhavin, 1934). Early Miocene deposits of Foraminifera indicate a hypersaline lagoon or inner shelf marine environment, and a humid climate during the Pliocene possibly changed the hydric balance and the salinity of Namak Lake (Daneshian & Dana, 2007; Zhu et al., 2007).
To decipher evolutionary and biogeographical patterns in Capoeta, previous authors have calibrated a molecular clock for the cytochrome b gene using fossil records (Levin et al., 2012). According to the molecular evolutionary rate based on a relaxed molecular clock, Capoeta arose in the middle Miocene 17.5 Mya when the Gomphotherium Landbridge was an important route of terrestrial fauna exchange between Africa and Asia (Harzhauser et al., 2007; Rögl, 1999). The middle-to-late Miocene was marked by alternating periods of closure of the Tethys Sea, which probably explains the split of the three main Capoeta clades. This split is concurrent with the early development of the Zagros Mountains, which influenced the separation of basins and populations, especially in Iran. The uplift of the Zagros Mountains began in the mid-Miocene as a result of tectonic activity primarily resulting from contact of the Iranian and Arabian plates (Molnar, 2006). During the formation of the Zagros Mountains, new freshwater bodies, along with new barriers within existing basins, shaped the generation of the main lineages, as supported by our results. This suggests major tectonic processes as the main speciation force within Capoeta, which may also be applicable to other freshwater groups.

| Conservation
As mentioned before, there is an urgent need to assess the conservation status of these species and of the freshwater fauna of Iran in general. The main threats affecting the genus Capoeta seem to be water abstraction for irrigation projects and other human needs, and habitat loss, especially in a mainly arid region that is becoming drier and warmer every year, with less precipitation. In addition, this already fragile environment is affected by industrialization and the construction and pollution related to it, which will certainly be a major threat to all freshwater fauna in a developing country and shows the need for more conservation controls and policies. Finally, invasive species, mainly species of commercial interest as human food, also have a very important impact on the local fauna. More regulation and better environmental management are necessary in the country to preserve the rich and unique fauna living in the region.