The complete mitochondrial genome of Rhynchocypris oxycephalus (Teleostei: Cyprinidae) and its phylogenetic implications

Abstract Rhynchocypris oxycephalus (Teleostei: Cyprinidae) is a typical small cold-water fish that is widely distributed, mainly in East Asia. Here, we sequenced the complete mitochondrial genome of R. oxycephalus and studied its phylogenetic implications. The R. oxycephalus mitogenome is 16,609 bp in length (GenBank accession no.: MH885043) and contains 13 protein-coding genes (PCGs), two rRNA genes, 22 tRNA genes, and two noncoding regions (the control region and the putative origin of light-strand replication). Twelve PCGs start with ATG, while COI uses GTG as the start codon. The secondary structure of tRNA-Ser (AGN) lacks the dihydrouracil (DHU) arm. The control region is 943 bp in length, with a termination-associated sequence, six conserved sequence blocks (CSB-1, CSB-2, CSB-3, CSB-D, CSB-E, CSB-F), and a repetitive sequence. Phylogenetic analysis was performed with maximum likelihood and Bayesian methods based on the concatenated nucleotide sequences of the 13 PCGs and on the complete sequence without the control region. The results revealed that R. oxycephalus is most closely related to R. percnurus and most distantly related to R. kumgangensis. The genus Rhynchocypris is revealed as a polyphyletic group, and R. kumgangensis is distantly related to the other Rhynchocypris species. In addition, the COI and ND2 genes are considered the most suitable DNA barcoding genes in the genus Rhynchocypris. This work provides additional molecular information for studying the conservation genetics and evolutionary relationships of R. oxycephalus.

maintaining the balance of stream ecosystems (Park, Im, Ryu, Nam, & Dong, 2010). Due to its poor dispersal ability, R. oxycephalus is an ideal material for the study of freshwater fish biogeography.
The phylogenetic relationships of the genus Rhynchocypris are very complicated and are one of the long-standing controversies in the classification of the subfamily Leuciscinae. Formerly, the genus Rhynchocypris was considered a synonym of the genus Phoxinus (Nelson & Joseph, 1976). Based on isozymes, Ito, Sakai, Shedko, and Jeon (2002) found that Phoxinus and Rhynchocypris were two natural taxa with a close relationship. Based on 16S rRNA and Cytb genes from the mitochondrial genome, Sasaki et al. (2007) found that Phoxinus and Rhynchocypris were more distantly related and that Rhynchocypris was the sister group of the genera Tribolodon and Pseudaspius. Given these conflicting results, the phylogenetic position of the genus Rhynchocypris remains controversial and further research is needed.
The typical vertebrate mitochondrial genome is circular, ranging in size from ~15 to 18 kb and generally containing 37 genes (13 protein-coding genes, 22 tRNAs, and two rRNAs) and two noncoding regions (control region and putative origin of light-strand replication; Sasaki et al., 2007). Because of its maternal inheritance, high mutation rate, and small molecular weight, mitochondrial DNA has been used as a good molecular marker in phylogenetic analysis.
In addition, mitochondrial gene fragments evolve at different rates, so different fragments can be applied to studies at different taxonomic levels. For example, RNA genes evolve more slowly and are relatively conserved, which makes them suitable for studies at higher taxonomic levels. ND, COXI, and other genes evolve faster than RNA genes and are suitable for phylogenetic analysis between species or genera.
Due to the limitations of morphological classification methods, molecular methods have increasingly been applied to fish species identification in recent years. DNA barcoding is the most widely used among them (Hogg & Hebert, 2004). It is a technique for rapidly identifying species by analyzing the DNA sequences of standard target genes. It can not only identify known species but also discover new and cryptic species that cannot be identified by traditional taxonomic methods. Compared with traditional species identification methods, this technology has the advantages of high accuracy and high efficiency, and it is not affected by the environment or developmental stage of the specimen or by the expertise of the identifier (Hebert, Ratnasingham, & Dewaard, 2003). In mitochondrial genomes, the COI gene is commonly used for species identification of birds (Yoo et al., 2013), insects (Hajibabaei, Janzen, Burns, Hallwachs, & Hebert, 2006), and fishes (Ward, Zemlak, Innes, Last, & Hebert, 2005), with good results. However, as a DNA barcode, the COI gene is not suitable for all animal species. For example, Li, Liu, Li, Du, and Zhuang (2015) analyzed Clupeiformes with the COI gene and found that although all species could be distinguished, the efficiency was only moderate. In such cases, additional mitochondrial genes should be used as animal DNA barcodes. For example, Nishida (2000a, 2000b) and Chen, Chi, Mu, Liu, and Zhou (2008) considered the COI, COIII, ND2, ND4, ND5, and Cytb genes to be the best molecular markers for phylogenetic analysis in vertebrates and teleosts. These genes therefore have the potential to be good DNA barcodes.

| PCR amplification and sequencing
PCR primers were designed by Primer Premier 5.0 software (Lalitha, 2000) and were based on universal primers of fish mtDNA (Simon et al., 1994). In addition, we used 16 sets of specific primers to am-

| Sequencing assembling and annotation
The complete mitochondrial genome sequences were assembled and annotated with the software Geneious (Drummond et al., 2010) ... conserved regions of the sequence (Castresana, 2000). Before the phylogenetic trees were constructed, base substitution saturation was tested with the DAMBE software using GTR distances (Xia, 2013).

| Genome annotation and base composition
We obtained the mitochondrial genome sequence of R. oxycephalus and deposited it in NCBI with GenBank accession no. MH885043.
The mitogenome of R. oxycephalus was a circular DNA molecule 16,609 bp in length. As shown in Figure 2, the mitogenome organization of R. oxycephalus was similar to that of typical vertebrates ... (Perna & Kocher, 1995). The overall A + T content of the mitochondrial genome of R. oxycephalus was 56.0%; such an A + T-rich pattern reflects the typical sequence feature of the vertebrate mitochondrial genome (Mayfield & Mckenna, 1978).
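As an illustration of how such composition statistics can be obtained, the following minimal Python sketch computes the A + T content together with the strand-skew measures commonly defined as AT-skew = (A - T)/(A + T) and GC-skew = (G - C)/(G + C). The function name and the toy sequence are hypothetical and are not taken from the analysis pipeline used in this study.

```python
from collections import Counter


def base_composition(seq):
    """Return A+T percentage, AT-skew and GC-skew of a nucleotide sequence."""
    counts = Counter(seq.upper())
    a, t, g, c = (counts[base] for base in "ATGC")
    total = a + t + g + c  # ambiguous symbols such as N are ignored
    if total == 0:
        raise ValueError("sequence contains no A, T, G or C")
    return {
        "at_percent": 100.0 * (a + t) / total,
        "at_skew": (a - t) / (a + t) if (a + t) else 0.0,
        "gc_skew": (g - c) / (g + c) if (g + c) else 0.0,
    }


# Hypothetical toy fragment, NOT the MH885043 sequence.
toy_fragment = "ATGCATTACCAATG"
stats = base_composition(toy_fragment)
print(stats)
```

Applied to a complete mitogenome sequence read from a FASTA file, the same function would give the genome-wide values; a negative GC-skew corresponds to the low G content discussed in this paper.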
The R. oxycephalus mitochondrial genome contained 25 overlapping nucleotides, located in seven pairs of neighboring genes and varying in length from 1 to 7 bp; the two longest overlaps (7 bp each) were located between ND4L and ND4 and between ATP8 and ATP6. A total of 30 intergenic nucleotides were dispersed over 12 locations and ranged in size from 1 to 13 bp; the longest intergenic spacer (13 bp) was located between tRNA-Asp and COII.
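Overlaps and intergenic spacers of this kind follow directly from the annotated gene coordinates. The sketch below shows one way to derive them, assuming 1-based inclusive coordinates sorted by start position; the coordinates in the example are hypothetical placeholders chosen only to reproduce a 13 bp spacer and a 7 bp overlap, and they are not the annotated positions of MH885043.

```python
def neighbor_gaps(genes):
    """Compute the gap between each pair of neighboring genes.

    genes: list of (name, start, end) tuples with 1-based inclusive
    coordinates, sorted by start. A positive gap is an intergenic
    spacer, a negative gap is an overlap, and 0 means the genes abut.
    The circular junction (last gene to first gene) is not considered.
    """
    gaps = []
    for (name1, _, end1), (name2, start2, _) in zip(genes, genes[1:]):
        gaps.append(((name1, name2), start2 - end1 - 1))
    return gaps


# Hypothetical coordinates for illustration only.
toy_annotation = [
    ("tRNA-Asp", 1, 72),
    ("COII", 86, 776),
    ("tRNA-Lys", 777, 852),
    ("ATP8", 854, 1018),
    ("ATP6", 1012, 1694),
]

gaps = neighbor_gaps(toy_annotation)
overlap_total = sum(-g for _, g in gaps if g < 0)
spacer_total = sum(g for _, g in gaps if g > 0)
print(gaps, overlap_total, spacer_total)
```

Summing the negative and positive gaps separately gives totals comparable to the overlapping and intergenic nucleotide counts reported above.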

| Protein-coding genes
Among the 13 PCGs of R. oxycephalus, 12 used ATG as the initiation codon; the exception was the COI gene, which used GTG. COI genes in the fishes reported so far used GTG as the initiation codon. Thus, the use of GTG as the COI initiation codon seemed to be prevalent among nontetrapod vertebrates (Saitoh et al., 2000). Stop codons, however, varied among the 13 PCGs. Seven PCGs ended with complete stop codons, either TAA (ND1, COI, ATP6, ND4L, ND5, and ND6) or TAG (ATP8); the remaining six genes ended with incomplete stop codons, either TA (ND4) or T (ND2, COII, ND3, COIII, and Cytb), which were presumably completed to TAA after transcription (Anderson et al., 1981). The codon usage and relative synonymous codon usage (RSCU) of the R. oxycephalus mitochondrial genome are given in Table 3. Codons with A or T in the third position were abundant, whereas codons with relatively high G and C content tended to be avoided. The codon distribution in R. oxycephalus is given in Figure 3. Codons per thousand codons (CDspT) showed a preference for leucine and alanine.
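For readers unfamiliar with RSCU, the value for a codon is its observed count divided by the count expected if all synonymous codons of that amino acid were used equally, that is, RSCU = x × n / Σx, where x is the count of the codon, n is the number of synonymous codons, and Σx is the total count over the synonymous family. The following Python sketch illustrates the calculation under the vertebrate mitochondrial genetic code (NCBI translation table 2). It is an assumption-laden illustration rather than the procedure used to produce Table 3: all codons of an amino acid are pooled into one family (including the two Leu and two Ser families), stop codons are skipped, and the input sequence is a hypothetical toy example.

```python
from collections import Counter, defaultdict
from itertools import product

# Vertebrate mitochondrial code (NCBI table 2), codons ordered T, C, A, G.
AMINO_ACIDS = "FFLLSSSSYY**CCWWLLLLPPPPHHQQRRRRIIMMTTTTNNKKSS**VVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product("TCAG", repeat=3), AMINO_ACIDS)}


def rscu(cds):
    """Relative synonymous codon usage for one in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # drops an incomplete stop codon (T or TA)
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODE)

    families = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":  # stop codons are not scored
            families[aa].append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


# Hypothetical toy sequence: four alanine codons followed by an incomplete stop.
toy_cds = "GCAGCAGCAGCCT"
print(rscu(toy_cds))
```

An RSCU above 1 indicates a codon used more often than expected under equal usage, which is how the A/T preference at third codon positions becomes visible.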

| Ribosomal and transfer RNA genes
The 12S and 16S rRNA genes of the R. oxycephalus mitochondrion were 957 and 1,693 bp in length, respectively. As in other vertebrates, they were located between the tRNA-Phe and tRNA-Leu (UUR) genes and separated by the tRNA-Val gene. The base composition of the two rRNA gene sequences was A: 28.6%, T: 26.6%, C: 21.1%, and G: 23.7%. The A + T and G + C contents of the two rRNAs were found to be 53.4% and 46.6%, respectively.
FIGURE The secondary structures of the tRNA-Ser(AGY) genes in Rhynchocypris oxycephalus
The secondary structures of animal tRNAs are very similar: a typical cloverleaf stem-loop structure with four arms and four loops, one of which is a variable loop.
According to their functions, the four arms and the loops were named, respectively: amino acid accepting arm, dihydrouracil arm

| Noncoding regions
Like other vertebrates, R. oxycephalus had two noncoding regions in its mitochondrial genome: the control region (D-loop) and the putative origin of light-strand replication (OL).
FIGURE 5 Schematic map of the control region of Rhynchocypris oxycephalus. ETAS, extended termination-associated sequence; CSB, conserved sequence blocks
The control region of the R. oxycephalus mitochondrial genome was 943 bp in length and was located between the tRNA-Pro and tRNA-Phe genes. It is also called the A + T-rich region; its A + T content accounted for 65% of all bases, much higher than its G + C content. Similar results were observed in other Cyprinidae species.
The control region consisted of a termination-associated sequence (TAS), a central conserved domain (CCD), and a conserved sequence block (CSB) domain.
The TAS had an obvious hairpin structure (TACAT and ATGTA; Guo, Liu, & Liu, 2003). Liu (2002) identified three conserved sequence blocks (CSB-D, CSB-E, and CSB-F) in the CCD. Previous studies on mammalian conserved sequence regions found that the CSB domain generally contains three conserved sequences, named CSB1, CSB2, and CSB3, and speculated that this region is involved in generating the heavy-strand RNA primer (Walberg & Clayton, 1981). In addition, one repetitive sequence (AT) was found with the software Tandem Repeat Finder. This repetitive sequence was also found in other Cyprinidae species (Liu, 2002).
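The TAS hairpin rests on the fact that ATGTA is the reverse complement of TACAT, so the two motifs can pair as a stem. A minimal Python sketch of how candidate stems might be located in a control-region sequence is given below. The function names, the loop-length limits, and the toy sequence are assumptions made for illustration; this is not the procedure used in this study, and a match is only a candidate, not evidence of a stable secondary structure.

```python
import re


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_hairpin_candidates(seq, motif="TACAT", min_loop=3, max_loop=30):
    """List (motif_start, partner_start) pairs, 0-based, where the motif is
    followed by its reverse complement within the allowed loop length."""
    seq = seq.upper()
    partner = reverse_complement(motif)
    # Lookahead patterns so that overlapping occurrences are also reported.
    motif_starts = [m.start() for m in re.finditer(f"(?={motif})", seq)]
    partner_starts = [m.start() for m in re.finditer(f"(?={partner})", seq)]
    hits = []
    for start in motif_starts:
        stem_end = start + len(motif)
        for p in partner_starts:
            if min_loop <= p - stem_end <= max_loop:
                hits.append((start, p))
    return hits


# Hypothetical toy sequence, not taken from the R. oxycephalus control region.
toy_region = "GGTACATAAAAATGTACC"
print(reverse_complement("TACAT"))
print(find_hairpin_candidates(toy_region))
```

The minimum loop length of 3 is a conventional assumption for hairpins and can be adjusted.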

| Sequence alignment
To compare the differences among Rhynchocypris species, mitogenome sequences of seven other Rhynchocypris species were downloaded from GenBank and included in this study (Table 4).
The complete mitochondrial genomes, the 13 PCGs, the tRNA genes and their combined sequence, and the rRNA genes and their combined sequence were all aligned with Clustal X 1.83 (Jeanmougin et al., 1998), and the results are shown in Table 5.
FIGURE 6 The secondary structures of the putative origin of light-strand replication gene in Rhynchocypris oxycephalus
According to Brown, George, and Wilson (1979) and Knight and Mindell (1993), when the transition/transversion (Ts/Tv) ratio of a gene sequence is lower than 2.0, substitutions are generally considered to have reached saturation and the sequence is likely to be affected by evolutionary noise, so special weighting must be applied to ensure that phylogenetic relationships are reconstructed from correct information. All of the Ts/Tv ratios were higher than 2.0, which indicated that transitions and transversions were not saturated and that the sequences were suitable for phylogenetic analysis. In addition, the G content of most segments was very low, which indicated an obvious bias against guanine.
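The Ts/Tv criterion can be made concrete with a short example. Transitions are substitutions within purines (A and G) or within pyrimidines (C and T); all other substitutions are transversions. The Python sketch below counts both classes for a pair of aligned sequences; the function name and the toy sequences are hypothetical, and sites with gaps or ambiguous bases are simply skipped, which is an assumption rather than the setting used in this study.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def count_ts_tv(seq1, seq2):
    """Count transitions and transversions between two aligned sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT" or x == y:
            continue  # skip gaps, ambiguous bases and identical sites
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1  # purine <-> pyrimidine
    return ts, tv


# Hypothetical toy alignment for illustration only.
ts, tv = count_ts_tv("AACCGGTTAACC", "GATCGGTTAATA")
ratio = ts / tv if tv else float("inf")
print(ts, tv, ratio)
```

A ratio above the 2.0 threshold cited above would be read as a sign that the sequences are not saturated.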

According to the variable sites and the Kimura 2-parameter distances (Table 5), ND2 had the highest mutation rate (34.4%) and genetic distance (0.163) among the 13 PCGs, in accordance with the conclusion of Qiao (2014). COII had a low mutation rate and genetic distance, which indicated that its sequence was highly conserved. ... Figure 9a, b.
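For reference, the Kimura 2-parameter distance between two aligned sequences is d = -(1/2) ln(1 - 2P - Q) - (1/4) ln(1 - 2Q), where P and Q are the proportions of sites showing transitions and transversions, respectively. The Python sketch below implements this formula for a single pair of sequences. It is a minimal illustration under stated assumptions (pairwise deletion of gapped or ambiguous sites, hypothetical function name and toy data) and does not reproduce the software settings behind Table 5.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura 2-parameter distance between two aligned DNA sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    ts = tv = sites = 0
    for x, y in zip(seq1.upper(), seq2.upper()):
        if x not in "ACGT" or y not in "ACGT":
            continue  # pairwise deletion of gaps and ambiguous bases
        sites += 1
        if x == y:
            continue
        if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
            ts += 1
        else:
            tv += 1
    if sites == 0:
        raise ValueError("no comparable sites")
    p, q = ts / sites, tv / sites
    # math.log raises ValueError when the sequences are too divergent
    # (saturated) for the correction to be defined.
    return -0.5 * math.log(1 - 2 * p - q) - 0.25 * math.log(1 - 2 * q)


# Hypothetical toy pair: 20 sites, two transitions and one transversion.
d = k2p_distance("A" * 20, "GGC" + "A" * 17)
print(round(d, 4))
```

Averaging such pairwise values over all sequence pairs of a gene gives a mean genetic distance of the kind compared among PCGs here.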

| The analysis of the DNA barcoding
We used the software MEGA 5.0 (Tamura et al., 2011)

The mean interspecies and intraspecies distances of the six PCGs, calculated with the Kimura 2-parameter model, are shown in Figure 10.
According to Figure 10, ND2 had the largest interspecies distance among the six PCGs, while COI had the smallest.

... ND2. The result of the Wilcoxon test on intraspecies distances in Rhynchocypris species was COI = COIII < ND5 < ND4 = Cytb ≤ ND2. The results were basically consistent with those of the sequence alignment.
According to the theory of the ideal DNA barcode by Meyer and Paulay (2005), the interspecies variation of an ideal DNA barcode should be significantly larger than the intraspecies variation, and there should be a gap between the two, which is called the DNA barcoding gap. The distribution of interspecific and intraspecific variation of Rhynchocypris species in the six PCGs is shown in Figure 11.
We found that the average interspecies distance of each of the six PCGs was larger than the intraspecies distance, and that the intraspecies and interspecies distributions of each PCG overlapped to different degrees. None of the six PCGs had an obvious DNA barcoding gap.
However, the COI, Cytb, and ND2 genes had less overlap between the intraspecies and interspecies distributions, which is beneficial for species differentiation.
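The barcoding-gap criterion can be expressed in a few lines. Given a matrix of pairwise distances and the species label of each sequence, distances are split into intraspecies and interspecies sets; a gap exists when the smallest interspecies distance exceeds the largest intraspecies distance. The Python sketch below illustrates this under the assumption of a symmetric distance matrix; the function name, labels, and distance values are hypothetical and are not data from this study.

```python
from itertools import combinations
from statistics import mean


def barcoding_gap(labels, dist):
    """Summarize intra- and interspecies distances.

    labels: species name of each sequence.
    dist:   symmetric matrix (list of lists) of pairwise distances.
    Returns mean distances and the gap (min inter - max intra);
    a positive gap means the two distributions do not overlap.
    """
    intra, inter = [], []
    for i, j in combinations(range(len(labels)), 2):
        target = intra if labels[i] == labels[j] else inter
        target.append(dist[i][j])
    if not intra or not inter:
        raise ValueError("need both intra- and interspecies pairs")
    return {
        "mean_intra": mean(intra),
        "mean_inter": mean(inter),
        "gap": min(inter) - max(intra),
    }


# Hypothetical toy data: two species with two sequences each.
toy_labels = ["sp1", "sp1", "sp2", "sp2"]
toy_dist = [
    [0.00, 0.01, 0.10, 0.12],
    [0.01, 0.00, 0.11, 0.09],
    [0.10, 0.11, 0.00, 0.02],
    [0.12, 0.09, 0.02, 0.00],
]
summary = barcoding_gap(toy_labels, toy_dist)
print(summary)
```

A negative gap corresponds to the overlapping intraspecies and interspecies distributions described above.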

| Structural features of the mitochondrial genome of R. oxycephalus
In this study, the complete sequence of the mitochondrial genome of R. oxycephalus was obtained. R. oxycephalus had the same mitochondrial genome structure as other Cyprinidae species, with a total length of 16,609 bp and an A + T content of 56.0%, which was consistent with the A + T preference of vertebrates. This indicated that the gene order of mitochondrial genomes rarely changes and that it is suitable for resolving the phylogenetic relationships of higher taxa such as families and orders (Boore, 1999). Base G had the lowest content in the mitochondrial genome of R. oxycephalus.
This phenomenon might be related to the way the mitochondrial genome is replicated. Specifically, the H strand replicates first, and when H-strand replication reaches the origin of light-strand replication, the L strand begins to replicate. As a result, the L strand stays in a single-stranded state for a relatively long time and is prone to base mutations, so that the more stable G base is gradually replaced by other bases (Clayton, 1982). The mitochondrial genome contained 12 intergenic regions and seven overlapping regions. This phenomenon was also common in other Cyprinidae species (Wu et al., 2009; Zhang et al., 2009).
Among the 13 PCGs of R. oxycephalus, as in other vertebrates, all genes except ND6 showed a strong A + T bias and a preference for C. ND6 was the PCG encoded by the L strand, which indicated large differences in base composition between genes encoded by the H strand and the L strand. The start codons of the R. oxycephalus PCGs were relatively constant and had the general characteristics of bony fishes (Chang, Huang, & Lo, 1994), while the stop codons varied greatly. Besides complete stop codons, there were two types of incomplete stop codons (T and TA).
This phenomenon is widespread in mitochondrial genomes. The transcripts of these genes end in U or UA at the 3' end, and a complete stop codon can be formed by polyadenylation of the mRNA 3' end during processing (Ojala, Montoya, & Attardi, 1981). Among the 22 tRNA genes of R. oxycephalus, all except tRNA-Ser (AGY) could fold into a typical cloverleaf structure.
The tRNA-Ser (AGY) lacked the DHU arm and formed a single-loop structure at the position of the DHU arm. This structure is very common in fish mitochondria (Lee & Kocher, 1995; Noack, Zardoya, & Meyer, 1996). Cheng et al. (2015) showed that a tRNA lacking the DHU arm can adjust its structural conformation, so that its ability to enter the ribosome and to carry and transport amino acids is not affected. In addition, the putative origin of light-strand replication was a region with a fast rate of evolution and a high degree of variation, which could fold into a stable stem-loop secondary structure. Similar structures were found in fishes, amphibians, and mammals, but not in reptiles and birds (Ojala et al., 1981; Wolstenholme, 1992). Generally speaking, the control region is involved in the termination of DNA replication (Hai, Yang, Wei, Ming, & Hu, 2003). In the termination-associated sequence, there was an obvious hairpin structure (TACAT and ATGTA). Several TACAT sequences could also be found in the downstream sequence (Lin et al., 2006). The central conserved domain was the most conserved part of the control region, and it is highly conserved in almost all fishes. Three conserved regions, CSB-D, CSB-E, and CSB-F, could be identified by comparison with other Cyprinidae species.
In the conserved sequence block domain, three conserved regions, CSB1, CSB2, and CSB3, could be identified. It was presumed that this region is involved in the generation of H-strand RNA primers (Walberg & Clayton, 1981). ... Cyprinidae species (Liu, Tzeng, & Teng, 2002). Different numbers of AT repeats resulted in different lengths of the conserved sequence region among fishes.

| The phylogenetic relationships of Rhynchocypris species
In recent years, a growing number of studies on the genus Rhynchocypris have been published. Imoto et al. (2013)

| DNA barcoding of Rhynchocypris species
Nowadays, a growing number of studies use different mitochondrial genes as DNA barcodes to identify animal species. By constructing phylogenetic trees for 13 PCGs, Tang, Zheng, Ma, Cheng, and Li (2017) concluded that the ND5 gene had the potential to be a DNA barcode for ... (Chen et al., 2012).
In theory, the ideal DNA barcoding sequence should have large variation between species, small intraspecific variation, and a DNA barcoding gap. In this study, the interspecies distances of the six PCGs we selected were all larger than the intraspecies distances.
Relatively speaking, the COI and ND2 genes had larger interspecies distances and smaller intraspecies distances, so these two PCGs performed better than the other four for analyzing genetic distance. In addition, we could not find an obvious DNA barcoding gap in any of the six PCGs. Moritz and Cicero (2004) suggested that if there are many closely related species in the collected samples, the overlap between the interspecies variation and the intraspecies ... Another reason may be hybridization or genetic introgression between neighboring species, which would increase the overlap between interspecific and intraspecific variation. This phenomenon is also present in other Rhynchocypris species (Xu, 2013). Relatively speaking, the COI, Cytb, and ND2 genes had less overlap between the intraspecies and interspecies distributions. So, we concluded that COI and ND2 genes are suitable DNA

CONFLICT OF INTEREST
None declared.

AUTHORS CONTRIBUTION
QC conceived the ideas and designed the study; QC, ZZ, and YG performed the experiments and collected the data; ZZ and QC analyzed the data; QC, ZZ, and YG interpreted the results; ZZ and QC wrote the manuscript. All authors contributed critically to the drafts and gave final approval for publication.

DATA AVAILABILITY
All data used in this study are publicly available in NCBI databases