Comprehensive analysis of population genetics of Phoxinus phoxinus ujmonensis in the Irtysh River: Abiotic and biotic factors

Abstract As a widely distributed species along the Irtysh River, Phoxinus phoxinus ujmonensis (Kaschtschenko, 1899) was used as a model to investigate genetic diversity and population structure as well as the influence of environmental factors on population genetics. In this study, we specifically developed 12 polymorphic microsatellite loci. The analysis of microsatellite and mtDNA markers revealed a high and a moderate genetic diversity across seven populations, respectively. Moderate differentiation was also detected among several populations, indicating the impact of habitat fragmentation and divergence. The absence of isolation by distance implied an extensive gene flow, while the presence of isolation by adaptation implied that these populations might be in the process of adapting to divergent habitats. Correlation analysis showed that abiotic factors like dissolved oxygen, pH, total dissolved solids, and conductivity in water as well as biotic factors like plankton diversity and fish species diversity had impact on genetic diversity and divergence in P. phoxinus ujmonensis populations. The results of this study will provide an insight into the effect of environmental factors on genetic diversity and contribute to the study of population genetics of sympatric species.

The Irtysh River is one of the biggest cross-border rivers in the northwest of China, flowing 546 km within China through Kazakhstan and Russia, and eventually into the Arctic Ocean. In China, there are about 23 native species in this river basin. However, fish populations have declined sharply in recent decades (Guo, Zhang, & Li, 2003;Huo et al., 2010). Hence, the situation of fish populations and structure has attracted wide interest and concerns.
Notably, Phoxinus phoxinus ujmonensis, a small cold freshwater fish ( Figure 1), is the only one species that is widely distributed throughout almost the whole Irtysh River basin (especially in China). In this watershed, both abiotic and biotic factors differ greatly along upper stream, lower stream, and tributaries, such as water temperature, conductivity, and total dissolved solids. Moreover, there exist several water conservancy facilities throughout this river basin. These factors might have induced the fragmentation and divergence of the habitats of fish，which may also become barriers to gene flow. Our previous investigations of feeding habits of predatory fish like Lota lota in Irtysh River indicated that P. phoxinus ujmonensis was prey to almost all these fish in this river. Obviously, P. phoxinus ujmonensis possesses a significant niche in the food web within this river. Thus, it could act as a good bio-model for the study of the population diversity and structure, as well as the effect of environmental factors on diversity within the Irtysh River. However, little research on population genetics of fish in the cold northwest areas of China has been available.
This study conducted an analysis of the population genetics of P. phoxinus ujmonensis with specifically developed markers and correlation analysis of genetic indexes and abiotic or biotic factors so as to deepen our understanding of the relationship between environmental factors and genetic diversity and structure.

| Ethics statement of sampling and DNA extraction
We confirmed that the sampling areas were not privately owned or preserved ones. We only sampled the individuals related to this study, involving no endangered or protected species. It was ensured that all the experiments and fish individuals involved were in accordance with the "Guidelines for Experimental Animals" of the Ministry of Science and Technology (Beijing, China;No. [2006]398, 30 September 2006. We kept the damage to the fish population to the minimum, especially in the breeding season.  Table ). Fins were cut off the fish and immediately stored into the ethanol. Total genomic DNA samples were extracted from individuals according to the standard Proteinase-K/ phenol-chloroform protocol (Russell & Sambrook, 2001). The concentration of all the DNA samples was detected by the Micro-plate Spectrophotometer (Epoch; BioTek) and diluted to 100 ng/µl.

| Isolation and characterization of microsatellite loci
Genomic libraries of 300-900 bp fragments were constructed according to the method described as Bei et al. (2012). The microsatellite library was subsequently constructed by magnetic bead enrichment method (Cui-Yun, 2005;Li, Gruber, Jessee, & Lin, 1998;Zane, Bargelloni, & Patarnello, 2010). Positive colonies in the plates of Escherichia coli DH5α cells were screened by polymerase chain reaction (PCR) using universal M13 primers. Plasmid DNAs extracted from positive colonies were sequenced, and sequencing results were analyzed with the SSRHunter v 1.3.0 (Li & Wan, 2005) to locate the microsatellite loci. Afterward, the primer pairs of microsatellites were designed using Primer premier 5.0 software.
In order to determine the characteristics of primer pairs, we tested these primer pairs on 36 P. phoxinus ujmonensis individuals collected from Haba River (HBH population). PCR amplifications were performed in a 10 µl of solution containing 50-100 ng genomic DNA, 2 × Es Taq MasterMix (Es Taq DNA Polymerase, 2 × Es Taq PCR Buffer, 3 mM MgCl 2 , 400 µM dNTP mix, CWBIO), and 5 µM of each primer on a Applied Biosystems (ABI). Thermal cycling parameters were as follows: initial denaturation at 94℃ for 3 min, followed by 35 cycles of denaturation at 94℃ for 30 s, specific annealing at different temperatures (Table 1) for 35 s, extension at 72℃ for 40 s, and final extension at 72℃ for 8 min. All PCR products were F I G U R E 1 Photograph of an individual of Phoxinus phoxinus ujmonensis captured from KYET population in 2015 resolved on 1.5% agarose gels to screen out primer pairs that could exhibit consistent amplification of polymorphic loci, and then, these primer pairs were further screened by polyacrylamide gel electrophoresis (PAGE). Subsequently, the images of PAGE were analyzed by Quantity One v 4.6.2 (Yáñez et al., 2008). Allele size ranges, the number of alleles per locus (N a ), heterozygosity, and polymorphic information content (PIC) were computed using Popgene v 1.32 and PIC Calculator v 0.6 (Kemp, 2002).

| Microsatellite loci genotyping and statistical analysis
In order to ensure the stability and reliability of the results, we selected microsatellite loci of good quality to analyze the genetic diversity of the seven P. phoxinus ujmonensis populations.
Forward primers were 5′ modified with a fluorescent dye (FAM, HEX, or TAMRA, Table 1). PCR amplifications were carried out as described above. All PCR products were genotyped on an ABI PRISM 3730 genetic analyzer (Applied Biosystems) with a ROX GS400 size standard. The genotype results were analyzed by GeneMarker v 1.51 (Holland & Parson, 2011). Afterward, the data file obtained from GeneMarker was converted into specific formats by CONVERT (Glaubitz, 2004) to evaluate the population genetics indexes with several softwares. The number of alleles (N a ), number of effective alleles (N e ), expected heterozygosity (H e ), and observed heterozygosity (H o ) were calculated for each locus and each population with PopGene v 1.32 (Yeh, Yang, & Boyle, 1999). Nei's unbiased genetic distance (N d ), Shannon's information index (S i ), and deviations from Hardy-Weinberg equilibrium (HWE) were computed through both Popgene and GenAlEx 6.5 (Peakall & Smouse, 2006). The analysis of population specific fixation index (F st ), pairwise genetic differentiation coefficient (F st_p ), and the hierarchical analysis of molecular variance (AMOVA) were performed using Arlequin v 3.5.1.2 (Excoffier & Lischer, 2010). Based on pairwise F st_p values, the unweighted pair-group method with arithmetic means (UPGMA) dendrogram was constructed with Mega 6.0 (Tamura, Stecher, Peterson, Filipski, & Kumar, 2013).
We performed the principal coordinate analysis (PCoA) with GenAlEx 6.5 to visualize the patterns of genetic relationships between individuals and populations, basing on pairwise Euclidean distances. Admixture analysis was conducted in STRUCTURE 2.3.4 (Evanno, Regnaut, & Goudet, 2005)

| Mitochondrial DNA sequence amplification and statistical analysis
In our previous study, we had sequenced and characterized the complete mitochondrial genome of P. phoxinus ujmonensis (Xie et al., 2016). Thus, we located the Cox I gene and Cox II gene and designed primer pairs with Primer Premier 5.0 (Table 1). PCR amplifications were conducted in the same reaction system described above. Thermal cycling parameters were as follows: initial dena-  & Rozas, 2009). Based on these haplotypes, the median-joining networks were constructed in Popart v 1.7 (Bandelt, Forster, & Röhl, 1999;Leigh & Bryant, 2015). Genetic distances (G d ) of these seven groups were calculated with Mega 6.0. To better understand the genetic distance and structure of these populations, we constructed the phylogenetic tree of all the 239 individuals based on the Cox I + Cox II dataset in Mega 6.0. We determined the best-fitting nucleotide substitution model which produced the lowest Bayesian information criterion (BIC) score using Mega 6.0. (Tamura et al., 2013).

| Correlation analysis of genetic data and environmental data
To investigate the relationship between population genetic characteristics and environmental factors, we collected some abiotic data such as water temperature (T), dissolved oxygen (DO), and conductivity ( Table ). We conducted the principal component analysis (PCA) with the "princomp" function in R 3.5.3. We conducted Mantel tests via the "vegan" package in R 3.5.3 to determine whether there exist IBD and IBA patterns (Hélène & Luca, 2011;Oksanen et al., 2013).
We subsequently analyzed and visualized the correlations between genetic diversity indexes and environmental factors in R 3.5.3.

| Characteristics of microsatellite loci
We designed a total of 54 primer pairs, based on 226 clones. Among these 54 microsatellite loci, 19 loci were perfect type, seven were interrupted type, and the others were compound type (Mathithumilan TA B L E 1 Main information of 12 microsatellite loci and two mitochondrial genes for Phoxinus phoxinus ujmonensis   (Holmen, Vøllestad, Jakobsen, & Primmer, 2009). Core information of these loci was submitted to GenBank (Table 1).

| Microsatellite dataset
All the 239 individuals from seven populations were genotyped in terms of the 12 microsatellite loci. However, the ppu23 did not consistently exhibit the expected amplifications across some populations, and significant deviations from Hardy-Weinberg equilibrium (HWE) were observed in ppu17, ppu47, ppu49, and ppu51 among several populations. Thus, we abandoned the data of the above five loci to ensure the reliability of genetic results (Planes & Fauvelot, 2002

| Mitochondrial dataset
We analyzed the 1,143 bp mtDNA fragments of Cox I gene with full length being 1,551 bp, the 691 bp mtDNA fragments of Cox II gene at full length (Table 1), and the 1,834 bp concatenated fragments of Cox I + Cox II. In total, 59, 29, and 82 haplotypes were detected from Cox I, Cox II, and Cox I + Cox II dataset in all the 239 individuals, respectively (Table 3). These results presented moderate haplotype diversity and low nucleotide diversity.

| Population structure
The N d values of the seven populations ranged from 0.0928 (KYET and XSK) to 0.7957 (TRKT and CHE), while the F st_p values varied from 0.01319 (BEJ and CHE) to 0.11049 (TRKT and CHE; Table 4).
Notably, moderate differentiation was detected between TRKT population and the rest five populations except HBH population, which might be attributed to the fact that both HBH and TRKT population were located in the same tributary.
The AMOVA results illustrated that 94.34% of the total variance of nuclear genetics was attributed to the differences within populations (among individuals), and only 5.66% of the total variance was attributed to the differences among populations (Table ), suggesting weak population genetic differentiation. The UPGMA dendrogram based on F st_p values consisted of three main clusters. KYET, XSK, and KLH fell into the first cluster, which was converged with the second cluster of BEJ and CHE, and they were finally merged with the third cluster of HBH and TRKT ( Figure 2b). Table 5, the low G d values revealed by mtDNA haplotypes implied neither significant nor strong genetic structure among these populations. The shared haplotypes in the three networks suggested that some key cytoplasmic genetic information was communicated frequently across these populations, supporting the genetic structure to some extent (Figure 3). TA B L E 3 Number of haplotypes (N), haplotype diversity (H d ), and nucleotide diversity (P i ) revealed by mtDNA markers

| PCA and correlation analysis of environmental factors
The PCA results showed that the first three components accounted for 96.04% of the variance with their eigenvalues greater than 1  negatively to V (Figure 7a). The average plankton diversity indexes of Mar, Sha, Sim, and Pie at each sampling site (data at TRKT site absent) were computed as Multi since these four indexes were highly positively correlated to each other. The correlation between S i and Multi was also positively high, while correlation between F st and Multi was negatively high.
As both the UPGMA and admixture analysis suggested that these seven populations could be regarded as three clusters, we combined the allele data of individuals belonging to each cluster to calculate specific Shannon information index (Si 3 ). We also computed the abiotic and biotic indexes in three regions corresponding to three clusters. In addition, we calculated the Shannon-Wiener's diversity indexes of fish species (S f ) in the upper, middle, and lower regions (matched the three clusters). Visualized correlation analysis suggested that Si 3 of the three regions was positively correlated to DO 3 and pH 3 , as well as to Multi 3 and S f (Figure 7b).
The results of Mantel test indicated that there were no IBD patterns on either geographic distance (r = 0.185, p = 0.170) or water distance (r = 0.339, p = 0.120; Figure 8). However, the slight but significant correlation (r = −0.310, p = 0.020) of the T matrix between F st_p matrix might imply the potential IBA pattern, though correlations of F st_p between other five factors were not significant (Figure 9).

| Genetic diversity and structure
In this study, we found a high nDNA diversity in contrast with a moderate mtDNA diversity. Namely, (a) the mean H o of these seven populations was as high as 0.791, and the mean S i was 1.976.
Especially, the S i value, as the major index of the genetic diversity in this study, was found to be obviously higher than that in previous studies (Ji et al., 2014;Pandian et al., 2018;Wang et al., 2007 which could be attributed to the different inheritance patterns of these two types of markers. Nuclear genomes can be more stably inherited by the next generation, while cytoplasmic genomes could only be partially inherited from the previous generation (Galloway & Fenster, 1999;Jaj & Werren, 1995;Nigro, 1994). This suggests that the frequency of nuclear genotype declined much more slowly than that of cytoplasmic haplotype, resulting in high nDNA diversity but low mtDNA diversity in some species.
The pairwise genetic differentiation coefficients (F st_p ) and genetic distances (N d ) are main indexes of genetic structure. A large number of previous studies have provided the reference for the use of these two indexes (Hennink & Zeven, 1990;Nanninga & Manica, 2018;Rousset, 1997;Waldmann, Garcíagil, & Sillanpää, 2005). This study found that most F st_p values were below the threshold value of 0.05, indicating no differentiation among these populations, except that two values of HBH and five values of TRKT were higher than 0.05 but lower than 0.15. This exception implied moderate F I G U R E 4 Principal coordinate analysis of pairwise distances between individuals and populations of Phoxinus phoxinus ujmonensis, visualized in GenAlEx 6.5 differentiation among these populations (Wright, 1965). Like the Such an environment was favorable for the two populations' adaptation which allowed high genetic diversity and resulted in slight differentiation between these two populations and other five populations. Furthermore, the UPGMA dendrogram exhibited a slight but meaningful convergence; that is, these seven groups showed apparently regional aggregation in upper, middle, and lower reaches, respectively. This might reflect a trend that the genetic relationships among these seven populations developed along the river from upper stream to lower stream, since gene flow between nearby populations could occur more frequently. A similar trend was found in the population genetic structure of California Bocaccio and Atlantic Salmon (Matala, Gray, Gharrett, & Love, 2004;Primmer et al., 2010). Moreover, the NJ dendrograms of mitochondrial haplotypes presented a random pattern and no classification of phylogenetic haplogroups, as the haplotypes shared between geographically close populations were internally unrelated.
This weak but significant genetic structure could be explained by two hypotheses. First, the high nDNA and moderate mtDNA diversity together with moderate genetic differentiation among several populations might indicate that there existed extensive gene flow among these seven populations since these four tributaries were in a continuous water system, which could ensure high genotypic association of mainstream and tributaries. This might also explain the absence of IBD to some extent. The root clade in the ML tree which F I G U R E 7 Correlation analysis between environmental factors and genetic diversity indexes, computed on both population level (a) and region level (b, the three regions were equal to the three clusters above, variables like "T" were marked with "3" to distinguish). Alt, altitude of sampling location; Con, conductivity of water; DO, concentration of dissolved oxygen; F st , population specific fixation index; HdC/PiC, haplotype/nucleotide diversity from Cox Ⅰ + Cox Ⅱ; Multi, mean values of plankton diversity indexes; pH, water pH; S f , Shannon's diversity index of fish species; S i , Shannon's information index from microsatellite loci; T, water temperature, TDS, total dissolved solids in water; V, flow velocity of water. Tests were conducted and visualized via "corrplot" package in R 3.5.3 F I G U R E 8 Mantel tests for detecting IBD patterns, matrixes of geographic (D g ), and water (D w ) distances against matrixes of F st_p . D g and D w were measured through Google Earth 7.1.2.2041. Tests were conducted via "vegan" package in

| Influence of environment
It is well known that environment can shape both genetic and trait variation (Magalhaes, D'Agostino, Hohenlohe, & Maccoll, 2016;Robinson & Wilson, 1996). P. phoxinus ujmonensis' habit of preferring to inhabit in the cold and clean tributaries containing high dissolved oxygen may explain why the Shannon genetic information index (S i ) was positively related to dissolved oxygen (DO). The previous study reported that dissolved oxygen availability might have great effect on the energy for fish locomotion, growth, and reproduction (1987).
Generally speaking, the active locomotion could enhance frequent gene flow among fish populations, and efficient reproduction could ensure better inheritance of genetic information. The evidence supporting above findings could be found in the study of the population differentiation of the African cyprinid Barbus neumayeri. The aquatic dissolved oxygen was reported as a selective force limiting gene flow among populations and therefore played a key role in determining the genetic structure of populations (Robert, Merritt, Chapman, David, & Martinez, 2013). Genetic study in Simulium tani also showed that nucleotide diversity was strongly correlated to the levels of dissolved oxygen in the streams (Low, Adler, & Takaoka, 2014).
The cold water fish Cottus poecilopus was also reported to be abundant in streams with high oxygen saturation and high abundance of macroinvertebrates (Baran et al., 2015). Another study of aquatic organisms found that among-pond variation in community structure of freshwater zooplankton was partially explained by pH and dissolved oxygen (HolmesSingh, 2016). Similarly, the pH of water largely influenced the survival, growth, development, and reproduction of fish (Abbink et al., 2012;Ahmed, Donkor, Street, & Vasiljevic, 2013;Alavi & Cosson, 2005;Dziewulska & Domagała, 2013), which were also F I G U R E 9 Mantel tests for detecting IBA patterns, matrixes of environmental variables against matrixes of F st_p . Con, conductivity of water; DO, concentration of dissolved oxygen; T = temperature of water; TDS, total dissolved solids in water; V, flow velocity of water. Tests were conducted via "vegan" package in R 3.5.3 and visualized in Origin 8.5 important for maintaining genetic information of species. For examples, the diversity of trout was previously reported to be low in lakes at low pH (Reimchen, 1992), and nine-spined stickleback was abundant under alkaline conditions (MacColl, Nagar, & Roij, 2013). Thus, these findings should support the conclusion that genetic diversity is positively related to pH in this study, since the mean pH values among these seven populations ranged from 7.52 to 7.98.
However, genetic diversity and structure estimated by mitochondrial markers seemed to be influenced by abiotic factors like V, Con, and TDS. To be noted, P. phoxinus ujmonensis is a small-body fish with weak swimming ability. And large females in fish usually represent sexually mature individuals who allocated more energy to reproduction rather than resistance against high flow velocity of water.
Thus, we might speculate that the proportion of female P. phoxinus ujmonensis should be higher in the area with low velocity, especially in the breeding season. The HdC/PiC values in XSK, BEJ, and HBH populations were higher than those in the upper stream of the same tributary; that is, XSK was higher than KYET, BEJ was higher than CHE, and HBH was higher than TRKT. Considering the characteristics of maternal inheritance of mitochondrial genes, the negative correlation between V and HdC/PiC should be acceptable. The PCA of the dispersal tendency of pumpkinseed (Lepomis gibbosus) also showed the importance of water flow velocity (Ashenden, Rooke, & Fox, 2017). As a component of the cytoplasm and the energy factory of the cell, mitochondria directly and frequently respond to environmental stimuli. For example, it produces more energy to speed up swimming in response to some stimuli. Another variable phenotype directly related to cytoplasm in fish is the electroreception (Hopkins, 1976;Zakon & Meyer, 1983). Under certain conductivity in water, fish could rely on specific electroreceptors like jaw and lateral line to sense prey and predator (Bleckmann, 1986;Carr, Maler, & Sas, 2010;Janssen, 2010;King, Hu, & Long, 2018;Maciver, Sharabash, & Nelson, 2001 (Table ).
As for the biotic factors, the Shannon genetic information index (S i ) was also highly positively related to both plankton diversity indexes (Mar, Sha, Sim, and Pie) and Shannon-Wiener's diversity indexes of fish species (S f ). Food availability was reported to be an important factor influencing species richness, in turn, affecting genetic diversity (Gaston, 2000;Pope, Riginos, Ovenden, Keyse, & Blomberg, 2015). High diversity of plankton could guarantee enough food for fish like P. phoxinus ujmonensis, so that they might survive, grow, locomote, and reproduce well (Papiol, Cartes, & Fanelli, 2014).
Nonetheless, one might note the discrepancy that S i (nuclear diversity) was positively related to DO but HdC/PiC (cytoplasmic diversity) were positively related to TDS since DO and TDS were always negatively related. We thought that this should also be due to the different inheritance patterns of these two types of markers mentioned above and the unique environment in this watershed. The four tributaries wind down the hillside on the north side of the mainstream. The TDS, to which HdC/PiC were positively related, were undoubtedly higher in the XSK, BEJ, and HBH populations since they were located in the lower reaches of the three tributaries, while the DO seemed to be higher in upper reaches except for the HBH population. Nevertheless, the Pearson correlation coefficient for S i and DO (0.69) was lower than that for S i and Multi (0.72), and they were both lower than that for DO and Multi (0.87). As the Multi represented the abundance of food for P. phoxinus ujmonensis, we might hypothesize that food availability has more influence on the genetic diversity of this species.

| CON CLUS ION
In this study, we developed and characterized 12 highly polymorphic microsatellite loci. Population analysis revealed high level of genetic diversity in local populations of P. phoxinus ujmonensis along the Irtysh River, which might be attributed to high genotypic association ensured by the continuous water system. However, there was moderate genetic differentiation among several populations, which could partially be due to the habitat fragmentation. The slight but meaningful differentiation of TRKT population against other populations might imply that human activities are unneglectable factors that influence genetic structure of fish. There was no obvious evidence supporting isolation by distance. However, we found the potential evidence of isolation by adaptation as well as the highly positive correlation between genetic diversity and abiotic or biotic factors. This might also suggest that there exist gene flow along distances, but different populations might be in the process of adapting to divergent environments due to habitat fragmentation. Finally, as P. phoxinus ujmonensis occupies an important niche in the food web, our results will provide reference for the study of population genetics of economic or endangered fish in the same catchment. The relationship of genetic information of various fish species and the detailed interaction between environmental factors and genetic characteristics remain to be further investigated.

ACK N OWLED G M ENTS
This work was financially supported by the Special Funds for the Foundation Work of Science and Technology of China (No. 2012FY112700). Authors wish to thank Zhi-Ming Zhang and Yuanyuan Zheng for collecting fish and plankton samples.

CO N FLI C T O F I NTE R E S T
The authors declare no conflicts of interest. Guo revised the manuscript. All authors reviewed the manuscript.

DATA AVA I L A B I L I T Y
Microsatellite genotypes for seven populations were in Table . (Tables ) were in the Supporting Information.