Mitochondrial divergence suggests unexpected high species diversity in the opsariichthine fishes (Teleostei: Cyprinidae) and the revalidation of Opsariichthys macrolepis

Abstract Opsariichthine (sensu Oceanologi Et Limnologia Sinica, 1982, 13, 293–298) is a cyprinid group consisting of five genera and endemic to East Asia. Previous studies suggested that there may be many possible cryptic species in this group, but this has not been confirmed. In this study, using mitochondrial cyt b sequences on 1,388 samples and 739 haplotypes, we showed very high species diversity within this group. The results showed that phylogenetic relationships of the opsariichthine group were as ([Nipponocypris‐Parazacco‐Candidia] + [Zacco + Opsariichthys]), and there were multiple deep lineages within several species, flagging putative cryptic species. When a 3% genetic distance was used as a threshold for species delimitation, 35 haplogroups were found, nine haplogroups in Candidia‐Parazacco‐Nipponocypris group, six haplogroups in the Zacco group, and 20 haplogroups in the Opsariichthys group. We consider all of them to be putative until determination of distinct species based on the tree topology, geographic distributions, or a combination of both. In addition, two kinds of species delimitation tools, ABGD and PTP, were applied to construct molecular operational taxonomic units (MOTUs). The ABGD method revealed nine MOTUs in Candidia‐Parazacco‐Nipponocypris group, two MOTUs in the Zacco group, and 17 MOTUs in the Opsariichthys group. And the PTP method revealed 10 MOTUs in Candidia‐Parazacco‐Nipponocypris group, 10 MOTUs in the Zacco group, and 29 MOTUs in the Opsariichthys group. Therefore, there should be more species in the opsariichthine group than presently described. Based on the molecular data and morphological characteristics, we proposed Opsariichthys macrolepis as a valid species and described its morphological diagnostic characters.


| INTRODUC TI ON
The identification and delimitation of species, one of the main objectives of taxonomy, are important in evolutionary biology because species remain the fundamental unit and operational entity in most disciplines (Durand & Borsa, 2015). Taxonomic descriptions of species are typically accomplished through morphological criteria that were established by earlier typological studies. However, misidentification often occurred because of features such as phenotypic plasticity, cryptic species, genotypic variation, or different life history stages (species that exhibits polymorphism, e.g., sexual dimorphism and mimicry; Chen, Ma, Shen, Mao, & He, 2015). They could lead to erroneous estimate of genetic diversity, genetic differentiation between populations, the risks of local extinction, or producing meaningless estimates of demographic parameters and in turn may misguide management actions (Durand & Borsa, 2015).
While deciphering hidden diversity in species remains a taxonomic challenge, it is important to study the species differentiation and to understand patterns and processes in biodiversity (Butlin, Bridle, & Schluter, 2009).
DNA barcoding has been proposed as a quick and inexpensive approach to species identification, species discovery, and species delimitation. The mtDNA such as cytochrome oxidase 1 (CO1) or cytochrome b (cyt b) is usually employed as the universal locus (e.g., Hanelt, Schmidt-Rhaesa, & Bolek, 2015;Hundt, Berendzen, & Simons, 2017;Kakioka et al., 2018), and it is easier to amplify from highly processed and degraded tissues than nuclear DNA (Yang & Rannala, 2017). Molecular studies have been crucial to improve our knowledge on the ichthyofauna, and DNA barcoding has been successfully used in fish species identification and in detecting species of taxonomic concerns or cryptic diversity (Gomes, Pessali, Sales, Pompeu, & Carvalho, 2015;Pereira, Hanner, Foresti, & Oliveira, 2013). For example, Ramirez et al. (2017) using a DNA barcoding approach detected hidden biodiversity within a recently described freshwater fish genus Megaleporinus and identified 16-18 different molecular operational taxonomic units (MOTUs) within each of the 10 studied nominal species. For the genus Salminus, which was once migratory fish and top predator distributed throughout South America major hydrographic basins, Machado, Ishizuka, Freitas, Valiati, and Galetti (2016)  The Cyprinidae (cyprinids) represent one of the most diverse freshwater fish groups and are the major components of the primary freshwater fish fauna in Africa, Eurasia, and North America, comprising more than 367 genera and 3,006 species (Nelson, Grande, & Wilson, 2016). The opsariichthine fishes are one of the East Asian endemic minnow group of cyprinids and comprise five genera, Candidia (Jordan & Richardson, 1990), Nipponocypris (Chen, Wu, & Hsu, 2008), Opsariichthys (Bleeker, 1863), Parazacco (Chen, 1982), and Zacco (Jordan & Evermann, 1902). Opsariichthine have been taxonomically defined by Chen (1982) as a group of minnows in Cyprinidae sensu lato occurring widely in East Asia, with the large and elongate anal fin and a series of nuptial tubercles on the jaws as common features in adults (Chen, 1982). This group included small-sized fish that prefer to live in rivers or streams and swim actively in riffles with swiftly running waters (Chen & Chang, 2005;Chen, 1982;Shen & Tzeng, 1993). Among these, Nipponocypris is distributed in the Korean Peninsula and Japan mainly. Candidia and Parazacco are confined to Chinese Taiwan and southeastern continental Asia, respectively. The remaining two genera, Zacco and Opsariichthys, are widely distributed in East Asia.
For more than one hundred years, taxonomic descriptions of opsariichthine species were largely accomplished through morphological characteristics alone. However, recent molecular biological approaches revealed a complicated scenario. A series of reports on the taxonomy of opsariichthine based on morphological and genetic analyses was published, and the reports proposed the new classification of these Asian minnows (Chen & Chang, 2005;Chen, Huang, Jang-Liaw, Shen, & Wu, 2008;Chen, Wu, & Huang, 2009;Huynh & Chen, 2013 Recent studies also indicated that there is a hidden diversity within O. bidens and Z. platypus (Berrebi, Boissin, Fang, & Cattaneo-Berrebi, 2005;Perdices & Coelho, 2006;Perdices, Cunha, & Coelho, 2004;Perdices, Sayanda, & Coelho, 2005). O. bidens lives in sympatry with Z. platypus in many localities, and they are considered widespread species in the Chinese Mainland. However, the population genetic studies found higher genetic structure and long-term interruption of gene flow than previously expected (Berrebi et al., 2005;Perdices & Coelho, 2006;Perdices et al., 2004Perdices et al., , 2005. In these studies, four Z. platypus and five O. bidens mtDNA lineages were resolved and suggested to correspond to four and five species, respectively. Supporting the idea, Johansson (2006) examined body shape differences among the mtDNA lineages, which were based on results of Perdices et al. (2004), Perdices et al. (2005) and found that the different lineages were reflected in body shape differences.
Similar results were also found by Berrebi et al. (2005) and Li, Wang, Zhao, Zhang, and Zhang (2009). They suggested the possibility of more species existing in the opsariichthine group.
Although these authors realized that many cryptic species exist in the opsariichthine group, they have not made any progress in taxonomy except the works by Chen, Huang et al. (2008), , Chen et al. (2009) and Huynh and Chen (2013). In this paper, we used molecular approach to conduct a comprehensive investigation of the opsariichthine group, assessing all nominal species and lineages previously described. Our objective is to provide a clear phylogenetic structure to support identification and designation of species in opsariichthine through the use of several approaches in DNA taxonomy and to revise the current nomenclature of species by proposing new, provisional names to these lineages. Based on molecular analysis, we found opsariichthine from the upper Yangtze River should be a separate species. We compared the morphological characters of 30 specimens from the upper Yangtze River and found their morphological characters are in congruent with the Opsariichthys macrolepis. Therefore, we discussed the validation of Opsariichthys macrolepis. In doing so, we hope to make a little progress on the taxonomy of this speciose group.

| Sampling
From 2010 to 2017, 371 individuals from the opsariichthine group were collected from various localities in the Yangtze River, Huang River, Pearl River, and the freshwaters of the southeast coastal areas of China by our research group (Supporting Information Table S1; Figure 1). Specimens were identified following the diagnostic characters described by Bǎnǎrescu (1968), Chen (1982), Chen and Chu (1998), and FISHBASE (Froese & Pauly, 2018); the latter database was used whenever additional morphological diagnostic characters were described online. The samples for DNA extraction were preserved in 95% ethanol, and for morphological analysis, specimens were fixed in 10% formalin. A set of reference individuals was deposited in the Institute of Hydrobiology, Chinese Academy of Sciences.

| Laboratory methods
Total DNA was extracted from muscle tissue with a standard salt extraction protocol of Aljanabi and Martinez (1997). The cytochrome b gene was amplified using a polymerase chain reaction (PCR) with primers CGlu-2 (AACCACCGTTGTAATTCAACTA) and Pro-R1 (TAGTTTAGTTTAGAATTCTGGCTTTGG) adopted from Hardman and Page (2003) for the samples from Qingyi River in Huangshan City, Anhui Province, and for the rest of the samples, we used the primers L14724 (5ʹ-GACTTGAAAAACCACCGTTG-3ʹ) and H15915 (5ʹ-CTCCGATCTCCGGATTACAAGAC-3ʹ) by Xiao, Zhang, and Liu (2001). Each 50 μl PCR reaction contained 100 ng of template DNA, F I G U R E 1 Collection sites for the newly generated sequences of the present study. Details of the 21 sites and collected specimens are provided in Supporting Information Table S1. The numbers provided are the sample site numbers. This map was created in the ArcGIS version 10.1 (http://www.esri.com/arcgis/about-arcgis) 1 μl of each primer (each 10 μM), 5 μl of 10× reaction buffer, 2 μl dNTPs (each 2.5 mM), and 2.0 U Taq DNA polymerase. The reactions were performed as following: an initial 94°C denaturation for 4 min, and then, 35 cycles of 94°C denaturation for 45 s, 56°C annealing for 45 s, 72°C extension for 1 min, and a final 72°C extension for 10 min. Amplified DNA was fractionated by electrophoresis through 0.8% agarose gels, recovered from the gels, and purified using the BioStar glassmilk DNA purification kit following the manufacturers' instructions. The same primers CGlu-2/Pro-R1 and L14724/H15915 were used for sequencing. Bidirectional sequencing was employed to decrease the occurrence of sequencing error by commercial companies. The sequences have been deposited in GenBank (accession numbers are listed in Supporting Information Table S1).
Based on the cyt b gene, phylogenetic relationships among the opsariichthine haplotypes were reconstructed using three methods, Neighbor-joining (NJ), Bayesian inference (BI), and maximum likelihood (ML). The best-fit model of nucleotide evolution for the data was identified by Modeltest 3.7.0 (Posada & Crandall, 1998). NJ analysis was performed with MEGA 5.0 (Tamura et al., 2011) using the Kimura's 2-parameter (K2P) model. Bootstrapping with 1,000 pseudo replicates was used to examine the robustness of clades in the resulting tree (Felsenstein, 1985). BI was performed with MrBayes 3.2.2 (Huelsenbeck & Ronquist, 2001). The TrN+I+G substitution model (I = 0.5245, G = 1.3349) was selected based on Bayesian information criterion (BIC). In BI, two independent analyses with four simultaneous Makov chains Monte Carlo (MCMC) runs of 6,000,000 generations were made sampling every 1,000 generations, three heated chains and one cold chain, with a total 6,001 trees each. After testing for convergence of the MCMC algorithm, the first 2,000 trees were discarded as burnin. A 50% majority rule consensus tree was obtained from the remaining 4,001 trees. Posterior probabilities (PP) of phylogenetic inferences were determined from remaining trees. The ML method was performed with RAxML v.8.1.21 (Stamatakis, 2014).
The best scoring ML tree was identified using a nucleotide substitution model of GTR+I+G. The support of each node was estimated using a rapid bootstrap analysis with 1,000 replicates.
Genetic divergence is generally measured by the estimated pairwise distance between sequences, such as the p-distance, or more commonly, the distance calculated by the Kimura's 2-parameter (K2P) model (Kimura, 1980;Kondo, Ueno, Ohbayashi, Golygina, & Takamura, 2016). In this study, genetic distance was calculated with MEGA 5.0 (Tamura et al., 2011) with the default parameters and 1,000 bootstrap replicates under K2P model. The early studies proposed that levels of neutral mitochondrial DNA sequence diversity play a prominent role in alpha taxonomy, with divergence thresholds of approximately 3% widely being accepted as indicative of species differences (Chappell, Trewick, & Morganrichards, 2011;Hebert, Cywinska, Ball, & DeWaard, 2003;Johnston, Morikawa, Ntie, & Anthony, 2011). In this study, we reviewed all case in which haplogroups separated from the closest neighboring haplogroup by a nucleotide distance larger than the threshold (3%) and considered them to potentially represent additional cryptic species. For these cryptic species, we used Eschmeyer's (2018) fish database as the reference for the current nomenclature (Durand & Borsa, 2015). The current nomenclature was maintained for a haplogroup when its geographic distribution was compatible with the type locality of the species. We maintained the current nomenclature to designate those haplogroups that unambiguously correspond to the type material, based on the type locality, and we arbitrarily assigned capital letters or Arabic numbers to the other haplogroups. The other haplogroups were thus provisionally denominated "sp. A," "sp. B," etc.

| DNA taxonomy
In this study, to group samples into MOTUs and further delimit species, two popular approaches, the Automatic Barcode Gap Discovery (ABGD) and The Poisson Tree Process (PTP), were applied.
In most situations, genetic distances between individuals from different species are supposed to be greater than the intraspecific variation, revealing a noncontinuous distribution (Hebert et al., 2003).
This feature is called a barcode gap, which can be used as a threshold offering primary species delimitation under the assumption that individuals within a species are more similar than between species (Mallet, 1995). ABGD is an automatic procedure that can directly sort sequences into putative species based on barcode gaps (Puillandre, Lambert, Brouillet, & Achaz, 2012). ABGD was performed on a web interface (wwwabi.snv.jussieu.fr/public/abgd) using the default values for the relative gap width (X = 1.5) and two distance metrics (JC69 and K2P) as well as the p-distance (Puillandre et al., 2012). PTP is a model for delimiting species on a phylogenetic tree by searching for sequence clusters, thus distinguishing within and between species branching (Zhang, Kapli, Pavlidis, & Stamatakis, 2013).
A ML tree generated with RAxML v.8.1.21 (Stamatakis, 2014) based on 1,000 rapid bootstrap replicates under the GTR+I+G model was used as an input data. The PTP analysis was conducted on the PTP web server (http://species.h-its.org/ptp/), using default options and 500,000 MCMC generations. PTP reports were generated using the maximum likelihood (ML) tree.
In DNA taxonomy analysis, in order to avoid ambiguous alignments, three sequence datasets (the Candidia-Parazacco-Nipponocypris complex group, the genus Zacco group, and the genus Opsariichthys group) representing species complexes or genera based on the phylogeny and morphological characterization were selected and the analysis was run on each of the datasets. In case of discordance in the amount of splitting, we chose to keep the smallest number of entities, in order to avoid over splitting the species.  Information Table S2). (2005)

| Mitochondrial haplogroups and the cryptic species in the opsariichthine fishes
A total of 371 sequences of the complete cytochrome b gene (1,140 bp) consisted of five opsariichthine species (O. acutipinnis, O. bidens, O. evolans, Z. platypus, and Z. acanthogenys) were new sequences in this study, and 1,017 sequences were supplemented from GenBank consisted of 18 opsariichthine species (C. barbatus, C. pingtungensis, N. sitboldii, N. temminckii, P. fasciatus, P. spilurus, O. acutipinnis, O. bidens, O. chengtui, O. duchuunguyeni, O. evolans, O. hainanensis, O. kaopingensis, O. minutes, O. pachycephalus, O. uncirostris, Z. acanthogenys, and Z. platypus). Due to length heterogeneities of the cyt b sequences from NCBI, the aligned data matrix used for the analyses consisted of 913 bp sequence for cyt b in the present study. In total, 739 unique haplotypes were identified from 1,388 sequences. The number of mutations was 423 and the number of parsimony informative sites was 378. The base frequencies were A = 24.7%, C = 28.2%, G = 16.4%, and T = 30.7%. The sequence character analysis indicated that the G-C frequency (44.6%) was apparently lower than the A-T frequency (55.4%), which is consistent with the features of the mitochondrial genome (Zhang & Hewitt, 1996).
ML, NJ, and BI analyses generated similar tree topologies, and the NJ tree was shown in Figure 2. The molecular phylogenetic relationships show that the opsariichthine comprises three main groups: the Candidia-Parazacco-Nipponocypris group with one longitudinal stripe on the flanks, the Zacco group with an indistinct vertical stripe or band on the side of the body, and the Opsariichthys group with several distinct vertical stripes or bands along its body. The result showed that, when a 3% genetic distance was used as a threshold for species delimitation, there were 35 haplogroups in the mitochondrial phylogeny of the opsariichthine fishes, nine haplogroups in Candidia-Parazacco-Nipponocypris group, six haplogroups in the Zacco group, and 20 haplogroups in the Opsariichthys group. We consider all of them to be putative, distinct species. These cases are examined genus by genus in the following analysis, where each haplogroup was either assigned a capital letter, Arabic numbers, or its current name. Pairwise distributions of nucleotide distance among and within each of the 35 haplogroups were calculated with the Kimura 2-parameter model. Genetic distances between haplogroups ranged from 3.3% to 20.9% (Table 1

| Diagnose of species by DNA taxonomy
We designed three datasets of sequences based on the phylogenetic analysis and morphological characterization, which were the  suggested that Z. macrolepis should be a valid species (Figure 3) and based on the result of the phylogenetic analysis, it belong to the genus Opsariichthys.

| Revalidation of Opsariichthys macrolepis
Opsariichthys macrolepis is distinguished from its congeners by the following unique combination of features: absence of anterior notch on upper jaw; lateral line sales 46-49; predorsal scales 17-19; scales above lateral line 8; scales below lateral line 3; scales surrounding caudal peduncle 17-18; gill rakers 8; pharyngeal teeth arranged into two rows; end of pectoral fin not reaching or slightly extending to origin of ventral fin; maxillary not reaching to or slightly extending the vertical of anterior margin of orbit; rounded tubercles on lower jaw rather small and arranging in 2-3 rows in male. The morphological descriptions are shown below.

| Description
Meristic and morphometric data are listed in Table S2. Body is elongated and compressed laterally with standard length: 3.50-4.61 of depth, 3.63-4.24 of head length, and 4.83-5.91 of caudal peduncle length; head length: 3.81-4.34 of eye length, 2.67-3.34 of snout, 2.68-3.43 of interorbital width; body depth is slightly longer than head length. Dorsal III 7 is a little closer to the caudal base than to the snout; pectoral I 14 is nearly as long as head; ventral I is 7-8; anal III is 9. Anal fin extends beyond the caudal base, and the third branched ray is the longest. During the spawning season, the male develops a series of well separated and rounded horny tubercles on the snout, cheeks, and anal fin; the area below lower jaw is with two or three rows of 15-21 tubercles in total, cheek is with four longitudinal rows of tubercles, the lower two rows are separated and located at the lower cheek while the upper two rows are positioned just below the eye and the top row may or may not be interrupted; opercle is only with five small tubercles. In nuptial males, the body has 11 to 13 greenish blue stripes on the flanks, a greenish yellow caudal peduncle, and a blackish purple snout and cheek. However, the color faded when soaked in formalin.
We made comparison between O. macrolepis and its congeners in the Yangtze River and nearby geographic regions (Table 3)

| High species diversity in the opsariichthine group
Cryptic species refers to a group of species that have been classified with the same name or a group of morphologically indistinguishable species (Struck et al., 2018). Many cryptic species with the same morphological characters may be recognized using molecular tools and integrative taxonomy, which enrich species diversity but challenge traditional morphological taxonomy (Bickford et al., 2007).
Therefore, it is necessary to increase the number of tools and characters to accurately diagnose.
At present, 23 species are suggested in the opsariichthine group, but approximately 50 species names are available and have been used (Froese & Pauly, 2018). Species of this group continues to be discovered and described. The morphological features that delineate species in opsariichthine are insufficient to describe its actual species diversity because many morphological traits can be subtle or ambiguous, which often makes it difficult to recognize and describe new species. In the present study, with more samples, based on mitochondrial cyt b sequences, and with the DNA delimitation approaches such as ABGD and PTP, we revised the opsariichthine group.
Our phylogenetic relationships within the opsariichthine group were mostly concordant with previous studies (Huang, Wang, & Wang, 2017), and three main groups were found: the with the latter suggestion. Nipponocypris was described by , with both maxillary barbels and ventral keel absence, and this genus includes three valid species: two described from Japan "N. temminckii" and "N. sieboldii," and one from Korea "N. koreanus." Two species were found in this study, N. temminckii and N. sieboldii, and they were found not to be monophyletic.
Recently, Chen and his team conducted several morphological and molecular studies to propose new taxonomy of Opsariichthys (Chen & Chang, 2005;Chen, Huang et al., 2008;Chen et al., 2009;Huynh & Chen, 2013). In their study, (a) two new species were described, Opsariichthys sp. C-E, suggested that the species diversity in Yangtze River has been underestimated. Species delimitation requires additional sampling in other geographic areas and an examination of diagnostic trait. The evidence presented above suggests that species diversity within opsariichthine has been underestimated and warrants comprehensive revisions, our results call for further taxonomic studies to aid the identification of morphological or other traits useful in diagnosing opsariichthine haplogroups. Furthermore, we also conclude that the use of single mtDNA gene as molecular maker is limited in its utility. In order to confirm the results reliability, more loci, especially nuclear, should be used in the future.

| Revalidation of the O. macrolepis
O. macrolepis was first recognized by having two rows of pharyngeal teeth by Yang and Huang (1964), and assigned in the genus Zacco. Later, in the revision of the genera Zacco and Opsariichthys, Bǎnǎrescu accepted the revision (1968). Nevertheless, this species was treated as a junior synonym of Z. platypus by Chen (1982) because he collected more samples from different locations and the results showed that the number of pharyngeal tooth rows is not stable in opsariichthine group. In our study, we found a unique lineage from the upper Yangtze River, and their morphological characters were consistent with the description of Z. macrolepis. And pharyngeal teeth of 150 specimens were counted by our team, they all had two rows. In our opinion, the samples collected by Chen (1982) possibly belonged to different species. Therefore, Z. macrolepis is considered to be a valid species, and based on the result of the phylogenetic analysis, it was suggested as a member of Opsariichthys.

ACK N OWLED G M ENTS
We thank Siqing Liu, Zhi Zhang, Qiang Qin, Fubin Zhang, and other colleagues for their assistance during field work and DNA extraction. We also thank Dr.

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R CO NTR I B UTI O N S
Xue Wang contributed to sampling, molecular experiment, data analyses, and writing the manuscript. Fei Liu and Dan Yu contributed to sampling and data analyses. Huanzhang Liu contributed to research design and writing the manuscript.

DATA ACCE SS I B I LIT Y
DNA sequences have been deposited in GenBank under the accession nos. MH350437 -MH350807. Details regarding individual samples are available in Supporting Information Table S1.