Numerous mitochondrial DNA haplotypes reveal multiple independent polyploidy origins of hexaploids in Carassius species complex

Abstract Evolutionary trajectory and occurrence history of polyploidy have been extensively studied in plants, but they remain quite elusive in vertebrates. Here, we sampled and gathered 4,159 specimens of polyploid Carassius species complex including 1,336 tetraploids and 2,823 hexaploids from a large geographic scale (49 localities) across East Asia, and identified a huge number of 427 diverse haplotypes of mitochondrial control region, in which 74 haplotypes with total occurrence frequency up to 75.498% were shared by hexaploids and tetraploids. Significantly, these diverse haplotypes were clustered into four major lineages, and many haplotypes of hexaploids and tetraploids were intermixed in every lineage. Moreover, the evolutionary trajectory and occurrence history of four different lineages were revealed by a simplified time‐calibrated phylogenetic tree, and their geographic distribution frequencies and haplotype diversity were also analyzed. Furthermore, lineage C and D were revealed to undergo population expansion throughout mainland China. Therefore, our current data indicate that hexaploids should undergo multiple independent polyploidy origins from sympatric tetraploids in the polyploid Carassius species complex across East Asia.

To further reveal evolutionary trajectory and ecological adaption of tetraploids and hexaploids in the Carassius species complex, a huge number of specimens were sampled throughout mainland China, and numerous mtDNA sequences including the newly obtained and previously reported sequences were used to perform a comprehensive investigation from a large geographic scale across East Asia.

| Sampling and specimen collection
A total of 3,105 individuals of the Carassius species complex were currently sampled from 34 locations through mainland China.
And the other 1,054 sample data of Carassius species complex from 15 localities in East Asia were collected from previous reports (Luo et al., 2014;Takada et al., 2010). Details about all the samples, sampling sites, and references are given in Table 1. For currently sampled specimens, caudal fin was preserved in 100% ethanol for subsequent DNA extraction and sequencing, and the blood cells were fixed in 70% ethanol for ploidy determination as described (Jiang et al., 2013). For the previously reported samples, the ploidy forms were collected from previous reports (Luo et al., 2014;Takada et al., 2010), and the mitochondrial control region (CR) sequences were obtained from GenBank. All experiments in this research were performed according to the permit guidelines established by the Institute of Hydrobiology, Chinese Academy of Sciences, and the experimental protocols were approved by the animal care and use committee of Institute of Hydrobiology, Chinese Academy of Sciences.

| Ploidy determination
High speed sorting flow cytometer FACSAriaTMIII (BD) was used to estimate ploidy levels of currently sampled specimens by measuring the relative DNA content of their fixed blood cells as described previously (Wei, Zhang, Zhang, Zhou, & Gui, 2003). Chicken blood cells with known DNA content of 2.5 pg/nucleus were used as an internally quantitative standard for each sample flow cytometry profile. The sampled blood cells of each sample were mixed with chicken blood cells and fixed in 70% precooled ethanol overnight at 4°C. The mixed cells were washed 2-3 times in 1× phosphate-buffered saline and then resuspended in the solution including 0.5% pepsin and 0.1 M HCl. DNA was stained with propidium iodide solution (40 g/ml) for 1-3 hr at room temperature in the dark. Each sample contained three repeats, and each repeat was measured at least 10,000 cells. DNA contents for each sample were measured by a formula in which the PI fluorescence intensity ratio of the mean of blood cells from each individual to the mean of blood cells from chicken was multiplied by the known mean DNA content (2.5 pg) of chicken blood cells as described (Wei et al., 2003) and was generated automatically by the flow cytometer. As reported previously (Jiang et al., 2013), the individuals with average DNA contents of about 3.64 pg/N ± 0.12 were generally detected as tetraploids, whereas the individuals with near 5.42 pg/N ± 0.198 were identified as hexaploids.
At last, the common sequences ranging from 298 to 353 bp were used for subsequent analyses.

| Sequence and data analyses
Sequence alignments and information of haplotypes were identified using MEGA7.0 (Kumar et al., 2016). Haplotypes were generated by DnaSP 5.10 (Librado & Rozas, 2009) software and then arranged according to their frequency for easy reading. Haplotype diversity was evaluated using Arlequin version 3.5 (Excoffier & Lischer, 2010).
For BI tree, four independent Markov chain Monte Carlo (MCMC) chains were simultaneously run for 20,000,000 generations with sample frequency of 1,000 generations. The first 25% of the trees were discarded as burn-in, and the remaining tree samples were used to generate a consensus tree. For ML tree, nodal support value was assessed from 100 nonparametric bootstrap replicates. In addition, to investigate the relationship between haplotypes, a network was built by TCS 1.21 (Clement, Posada, & Crandall, 2000).
To estimate divergence times in Carassius species complex, an uncorrelated relaxed molecular clock approach was implemented in Beast 1.7.5 (Drummond & Rambaut, 2007). HKY + I + G was selected as the best-fit model of evolution by MODELTEST software of version 3.7 (Posada & Crandall, 1998). The divergence time (11.11-9.14 million years ago [Mya]) (Gao et al., 2012)  Historical demographic/spatial expansions of Carassius species complex were investigated by two approaches. First, Tajima's D (Tajima, 1989) and Fu's Fs (Fu, 1997) statistics were calculated. Second, pairwise mismatch distributions (Schneider & Excoffier, 1999) neutrality test values were negative and significant (p < .05), and the mismatch distribution curve was unimodal, the lineage was thought to fit the sudden expansion model. Both the two approaches were performed by Arlequin 3.5 (Excoffier & Lischer, 2010). Moreover, the expansion time was estimated by the equation τ = 2ut (Nei & Tajima, 1981;Rogers & Harpending, 1992), in which u is the mutation rate per sequence and per generation. The value of u was calculated from u = 2μk, where μ is the mutation rate per nucleotide, and k is the number of nucleotides in the analyzed fragment. The approximate time of expansion was calcu- The average substitution rate of 2% site −1 Myr −1 (Meyer, 1993) was used for mitochondrial CR haplotypes of the Carassius species complex.

| Ploidy distribution of Carassius species complex across East Asia
A total of 4,159 specimens of Carassius species complex were gathered from 49 localities across East Asia. As shown in Table 1

| Matrilineal genealogy of mtDNA CR haplotypes
Based on the 427 diverse mtDNA CR haplotypes, a network was constructed by TCS 1.21 (Clement et al., 2000) to investigate their matrilineal genealogy. As shown in Figure 2, four major lineages are clustered, and haplotypes of hexaploids and tetraploids are intermixed in all of the four lineages. Lineage A includes 39 haplotypes, in which four haplotypes are shared by hexaploids and tetraploids, and eight and 27 haplotypes exist only in hexaploids or tetraploids respectively. The highest frequency haplotype in lineage A is CR42 (4.4%; Figure 2 and Table S2) that distributes only in Ryukyus (Table   S1), and h41 is the most widely distributed haplotype in lineage A, which occurs in seven local populations of China including GY, YN, XK, LJ, GD, SMX, JY (Table S1). Lineage B includes 54 haplotypes that distribute in Japan main islands and Ryukyus, in which 19 haplotypes are shared by hexaploids and tetraploids, and 21 and 14 haplotypes exist only in hexaploids or tetraploids, respectively. CR2 is the highest frequency haplotype with only 0.745% in lineage B (Figure 2 and Table S2). A total of 45 haplotypes are grouped into lineage C, where four are shared by hexaploids and tetraploids, and 34 and seven haplotypes exist only in hexaploids or tetraploids, respectively. In lineage C, the highest frequency haplotype is GH12 (9.930%; Figure 2) that distributes in most newly sampled populations of China (22/34; Table S1), and other haplotypes in this lineage are derived from GH12

| Phylogenetic relationship and evolutionary history of mtDNA CR haplotypes
To pursue the phylogenetic relationship of polyploidy occurrence and divergence evolution in Carassius species complex, phylogenetic BI tree and ML tree were also constructed according to these diverse CR haplotypes. And only BI tree was given in Figure 3, as both ML and BI tree were well supported each other and showed the same topology.
Similar to haplotype network, the diverse 427 mtDNA CR haplotypes were also clustered into four major lineages with obvious geographical distribution differences, and haplotypes of hexaploids and tetraploids were intermixed in each lineage as well.
Moreover, a simplified time-calibrated phylogenetic tree of the diverse mtDNA CR haplotypes was constructed (Figure 4a

| Population expansion possibility of different mtDNA haplotype lineages
Based on mtDNA CR haplotype sequences for each lineage, we calculated the statistical data about neutrality test and mismatch analysis by the reported approaches (Excoffier & Lischer, 2010;Fu, 1997;Schneider & Excoffier, 1999;Tajima, 1989), and thereby evaluated the historical demography and population expansion possibility in Carassius species complex across East Asia. As shown in Table 2 and  (Table 2), and mismatch distribution analysis shows bimodal distribution of pairwise differences (Figure 5a), indicating that lineage A rejects the hypothesis of sudden expansion (Table 2)

| DISCUSSION
Our current studies had identified and gathered numerously diverse mtDNA CR haplotypes (427 haplotypes) of polyploid Carassius species complex from a huge number of specimens (4,159 specimens) and a large geographic scale (49 localities) across East Asia ( Figure 1 and Table 1), and analyzed each haplotype variation and occurrence frequency in hexaploids or in tetraploids, respectively ( Fig. S1, Table S2). Subsequently, these diverse haplotypes were clustered into four major lineages, and many haplotypes of hexaploids and tetraploids were intermixed in every lineage (Figures 2   and 3). Significantly, the first 10 haplotypes with high occurrence frequency were found to share by both hexaploids and tetraploids,  and their occurrence frequency was high up to 53.017% (Table S2).
Moreover, a simplified time-calibrated phylogenetic tree revealed evolutionary history of four different lineages (Figure 4a), and their geographic distribution frequencies (Figure 4b). And the lineage C and D that distribute throughout mainland China were revealed to undergo population expansion by neutrality test, mismatch analysis ( Figure 5 and Table 2). Therefore, these data suggest that multiple independent polyploidy origins should have occurred in hexaploids of the polyploid Carassius species complex across East Asia, as numerous CR haplotypes of hexaploids and tetraploids are intermixed in each lineage, and many CR haplotypes especially high occurrence frequency haplotypes are shared by both hexaploids and tetraploids (Figures 2 and 3).
Previous studies have proposed that an early allopolyploidy event might lead to tetraploids, and an autopolyploidy event might result in hexaploids in the polyploid Carassius species complex Luo et al., 2014). The current studies further help us understand their evolutionary history and trajectory. As shown in Figures 2 and 3 (Peng, Xue, Guo, & Xue, 2008). In Ryukyus island, the examined population was mainly consisted of lineage A haplotypes, and a few of lineage B and lineage D haplotypes were also detected, indicating that the Ryukyus population of Carassius species complex might be artificially introduced from China and Japan, respectively, as suggested previously (Takada & Tachihara, 2009;Takada et al., 2010). In addition, a few haplotypes of lineage A observed in XK of northeast China and only two haplotypes of lineage D detected in KA of Japan might be resulted from human activities (Gao et al., 2012).
Diploidization process after polyploidy has been suggested to be the driving force of recurrent polyploidy (Soltis et al., 2015;Wendel, 2015). After autopolyploidy from sympatric tetraploids, the extant hexaploids in Carassius species complex may be entering an evolutionary trajectory of diploidization or have been on the diploidization process, as many phenomena similar to normal sexual diploid species have been revealed in some strains of hexaploid Carassius gibelio, such as normal meiosis completion, multiple modes of unisexual gynogenesis and sexual reproduction, and extra microchromosomes for male determination (Li et al., 2016;Zhang et al., 2015). However, recurrent polyploidy is not ceased, as octaploids have been also detected in some natural habitats occasionally (Table 1) (Liasko et al., 2010;Xiao et al., 2011). Therefore, the wide geographic distribution, coexistence of different ploidy forms and occurrence of repeated polyploidy events make the polyploid Carassius species complex an intriguing system to investigate evolutionary genetics and ecological genetics of vertebrates.

CONFLICT OF INTEREST
None declared.

AUTHORS CONTRIBUTION
JFG, LZ, ZWW, FFJ, and XLL collected samples, XJZ, ZL, and XLL performed the experiments, XLL and XYL analyzed the data, JFG and XLL prepared the data and wrote the manuscript. All authors read and approved the final manuscript.