Population genetic structure of Bellamya aeruginosa (Mollusca: Gastropoda: Viviparidae) in China: weak divergence across large geographic distances

Abstract Bellamya aeruginosa is a widely distributed Chinese freshwater snail that is heavily harvested, and its natural habitats are under severe threat due to fragmentation and loss. We were interested whether the large geographic distances between populations and habitat fragmentation have led to population differentiation and reduced genetic diversity in the species. To estimate the genetic diversity and population structure of B. aeruginosa, 277 individuals from 12 populations throughout its distribution range across China were sampled: two populations were sampled from the Yellow River system, eight populations from the Yangtze River system, and two populations from isolated plateau lakes. We used seven microsatellite loci and mitochondrial cytochrome oxidase I sequences to estimate population genetic parameters and test for demographic fluctuations. Our results showed that (1) the genetic diversity of B. aeruginosa was high for both markers in most of the studied populations and effective population sizes appear to be large, (2) only very low and mostly nonsignificant levels of genetic differentiation existed among the 12 populations, gene flow was generally high, and (3) relatively weak geographic structure was detected despite large geographic distances between populations. Further, no isolation by linear or stream distance was found among populations within the Yangtze River system and no signs of population bottlenecks were detected. Gene flow occurred even between far distant populations, possibly as a result of passive dispersal during flooding events, zoochoric dispersal, and/or anthropogenic translocations explaining the lack of stronger differentiation across large geographic distances. The high genetic diversity of B. aeruginosa and the weak population differentiation are likely the results of strong gene flow facilitated by passive dispersal and large population sizes suggesting that the species currently is not of conservation concern.


Introduction
The level of genetic diversity in natural populations is determined by the interplay of mutation, migration, hybridization, drift, and selection (Harrison 1991;Vellend and Geber 2005). The relative role that each force plays depends on life-history traits, the mating system, and the dispersal ability of a species. Extrinsic factors such as the landscape matrix, the geographic history, and anthropogenic actions can further affect genetic diversity of natural populations (Husemann et al. 2012;Fern andez-Garc ıa et al. 2014;Eberhart-Phillips et al. 2015). The landscape configuration can influence dispersal, which in turn affects the population genetic structure. For example, mountainous landscapes and anthropogenically fragmented systems may increase the genetic divergence among natural populations by limiting seasonal movements (Epps et al. 2005;Roffler et al. 2014). The geographic history of a landscape, in turn, has influenced the demographic history of the local populations, which is an important factor shaping the current genetic composition of a species. Demographic bottlenecks and founder effects in populations at the leading edge, for example, can lead to reduced genetic diversity (Ray et al. 2015). Hence, the population structure that can be observed today is the result of a complex interplay of current and past processes which are difficult to disentangle. In order to understand the contributions of both historical and contemporary factors, the use of markers with different evolutionary speeds can be an important resource.
Freshwater gastropods represent interesting models to study the effects of extrinsic factors on the population genetic structure due to their primarily sessile lifestyle (Nekola 2012). Many snail species show strong genetic differentiation between populations, even across small geographic distances (Hurtrez-Bouss es et al. 2010;Tian-Bi et al. 2013). However, freshwater snails often occur in high abundances (Chaine et al. 2012) potentially reducing the amount of genetic drift that populations experience counteracting genetic differentiation between populations (Tibbets and Dowling 1996). Further, passive dispersal via drifting of larvae or adults may reduce genetic differentiation, at least for riverine species within river catchments. Long-distance dispersal, however, is generally only possible if snails or their larvae are transported passively, either by animals (zoochory) or accidentally or on purpose by humans (anthropogenic translocation) (Gittenberger et al. 2006;Gittenberger 2012).
Despite multiple modes of passive dispersal, strong population divergence is often observed between local snail populations (Sinclair-Winters 2014), either as a result of natural geographic isolation or anthropogenic fragmentation. The genetic divergence is often accompanied by elevated effects of drift (Gonz alez-Astorga and N uñez-Farf an 2001) and a reduction in genetic diversity and effective population size (N e ) (Zuberogoitia et al. 2013). Such reductions may have negative effects on the adaptive potential of populations (Reed and Frankham 2003;Willi et al. 2006). Therefore, estimating genetic diversity and differentiation can provide important insights into the threat status of a species (Toro and Caballero 2005); further, analyses of genetic structure and demographic analyses using multiple genetic markers with different evolutionary speeds may help to understand the factors that are and have been most relevant in shaping a population's genetic structure (Waples and Gaggiotti 2006;Roberts et al. 2013).
Here, we investigate the population structure and distribution of genetic diversity in the Chinese freshwater snail Bellamya aeruginosa. The species belongs to a species-rich genus of primary sessile freshwater snails, which is widely distributed across South-East Asia, India and Africa (Schultheiß et al. 2011;Van Bocxlaer and Hunt 2013). Bellamya aeruginosa is a relatively common species in China and has a wide distribution across the Yangtze and Yellow river systems. In particular, the large number of great lakes in the Yangtze River system, specifically in the Jianghan plain and Dongting Lake Plain, provides an out-standing environment for B. aeruginosa. However, due to intensive exploitation (Ozawa et al. 2003;Song et al. 2007), the species may be of conservation concern; yet, formal evaluations of the status of the species have not been performed in this area. Further, the species may be affected by the strong fragmentation of Chinese river systems by damming in the last century (Nilsson et al. 2005), especially in Yangtze River system, potentially further reducing its population size and genetic diversity (Ricciardi and Rasmussen 1999;Roberts et al. 2013).
In this study, we sampled 10 populations of B. aeruginosa across a network of large lakes in China that are connected via the Yangtze (N = 8) and Yellow (N = 2) river systems and two populations from isolated plateau lakes (N = 2). For all populations, we genotyped two genetic marker systems with different evolutionary speeds: the mitochondrial COI (cytochrome oxidase I) gene and seven microsatellite loci. We used the data to estimate genetic diversity and differentiation and effective population sizes and to explore the demographic history of local populations. Specifically, we aimed to differentiate between two main competing scenarios and their potential underlying evolutionary mechanisms: (1) limited genetic divergence and high intrapopulation diversity of local populations would suggest large population sizes and good historic connectivity, (2) In contrast, low genetic diversity and strong differentiation of local populations could be the result of a sessile life style, the occurrence in lacustrine systems, large geographic distances isolating populations, and the proposed strong anthropogenic pressure (overharvesting and fragmentation).

Materials and Methods
Sampling Bellamya aeruginosa populations were sampled from 10 interconnected lakes in the Yellow (N = 2) and Yangtze (N = 8) river systems and two isolated Plateau lakes (N = 2) in China ( Fig. 1 and Table 1). A total of 277 individuals were obtained between October 2011 and July 2012. Species identification followed the keys provided by Zhang and Liu (1960). Adductor muscles were preserved in 95 % ethanol and stored at À20°C until DNA extraction.

Mitochondrial COI sequencing
Genomic DNA was extracted with the DNeasy Blood and Tissue Kit (QIAGEN, Hilden, Germany) following the manufacturer's protocol and conserved at À20°C until further use. We amplified partial COI sequences using the universal primers for invertebrates described by Folmer et al. (1994). PCR products were purified following a standard sodium acetate precipitation (B€ urgmann et al. 2001). A total of 116 specimens were sequenced using the BigDye Terminator kit v. 3.1 (Applied Biosystems, Foster City, CA) and were processed by Sangon Biotech Co., Ltd. (Shanghai, China). All individuals were sequenced in both directions. Forward and reverse sequences were inspected and aligned using the SEQMAN software in Lasergene v. 7.1 (DNAstar, Madison, WI). Consensus sequences were blasted to confirm that the sequence amplified was the correct target DNA fragment. All new sequences were deposited in GenBank (accession numbers: KF535413-KF535429, KF535431-KF535468, KP150575-KP150635).

COI analysis
The resulting 116 sequences were aligned with Clustal X (Thompson et al. 1997). Variable sites, parsimony informative sites, number of haplotypes, and haplotype (h) and nucleotide diversity (p) were calculated with DNASP v. 5 (Librado and Rozas 2009). ARLEQUIN v. 3.5 (Excoffier and Lischer 2010) was used to test for neutrality employing Tajima's D (Tajima 1989) and Fu's F statistics (Fu 1997). AMOVA (Analysis of molecular variance) was conducted in ARLEQUIN to partition the genetic variance within and among populations. Furthermore, three groups were defined according to river system (NS and BY as Yellow River system, EH and DC as Plateau Lake; all other populations as Yangtze River system) to estimate variance components. Pairwise Φ ST values were calculated to assess population differentiation. We generated a statistical parsimony haplotype network with a 95 % connection limit with TCS v. 1.2.1 (Clement et al. 2000). Further, we used the COI data to generate Bayesian skyline plots to explore the demographic history of each population alone and the whole combined data set. This method permits the reconstruction of past population demography and generates plots of female effective population sizes (N e ) over time. The appropriate substitution model was determined as HKY+I+G with jModeltest v. 2 (Darriba et al. 2012). We created individual input files for each population, and all populations combined with BEAUti v.1.7.4 (implemented in the BEAST package). Analyses were run in BEAST v.1.7.4 (Drummond et al. 2012). We used a strict clock with a published substitution rate of 1.32 % per million years for invertebrates estimated for COI under the HKY model (Wilke et al. 2009). The program was run for 10 million generations sampling every 10,000 generations. The Bayesian skyline plots were subsequently generated with Tracer v. 1.5 (Rambaut and Drummond 2009). We tested for isolation by linear geographic distance across all data. As sampling at different geographic scales may lead to artificial IBD patterns, we also tested for isolation by distance and stream distance within the Yangtze River only using a Mantel test with 10,000 randomizations as implemented in IBD v. 1.52 (Bohonak 2002). Stream distances were measured along the streams between sites using ArcMap v. 10 (ESRI 2011).

Microsatellite markers analysis
All samples (N = 277) were genotyped at seven microsatellite loci (Table S1). The PCR amplifications were performed in a total volume of 10 lL: 19 PCR buf-  (Table 1) displaying the study region in China; redsampling locations in the Yellow River system, graysampling locations at the Yangtze River system, and blue sampling locations at plateau lakes. The box shows the position of sampling locations within the Yangtze River, numbers along the stream represent distances.
The resulting data were first inspected with MICRO-CHECKER v. 2.2.3 (Van Oosterhout et al. 2004) to test for unexpected mutation steps, large gaps, unusually sized alleles and the presence of null alleles. The number of alleles (N A ), the expected (H E ) and observed heterozygosities (H O ), F ST , and the inbreeding coefficients F IS (Weir and Cockerham 1984) were estimated with MSA (MICROSATELLITE ANALYSER) v. 3.15 (Dieringer and Schl€ otterer 2003). GENEPOP v. 4.0.10 (Rousset 2008) was used to measure heterozygote deficiency or excess and to assess deviations from HWE (Hardy-Weinberg equilibrium). P-values were corrected for multiple comparisons by applying a sequential Bonferroni correction (Rice 1989). We tested for recent bottlenecks events using BOTTLENECK v. 1.2.02 (Piry et al. 1999) under the TPM (two-phased model), with the proportion of stepwise mutations set to 95 % and the variance set at 15. Significance of deviations was tested using the Wilcoxon sign-rank test with 1000 iterations. Further, we used the microsatellite data to calculate effective population sizes (N e ). We used LDN e as implemented in NeEstimator v. 2 (Do et al. 2014) to calculate N e considering results from 0.02 and 0.01 as lowest allowed allele frequencies.
Pairwise migration rates between the 12 populations were estimated using a maximum likelihood coalescent approach implemented in MIGRATE v 3.0 (Beerli and Felsenstein 2001); we estimated Ө and M (immigration rate/mutation rate) based on F ST values. We ran 10 short chains with a total of 10,000 genealogy samples and three long chains with 1,000,000 samples, following a burn-in of 10,000 samples; three independent runs were performed. As MIGRATE has been suggested to be vulnerable to violations of the assumption of stable pop- ulation sizes, we also used the Bayesian approach developed by Wilson and Rannala implemented in BAYES-SASS 3.0 (Wilson and Rannala 2003) to infer migration rates. The program uses genotypic data and MCMCs (Markov Chain Monte Carlo) to infer recent patterns of gene flow. We performed five independent analyses. Each run contained 35,000,000 MCMC iterations, with a burn-in of 3,500,000 iterations and a sampling frequency of 3500 generations with a random seed. To ensure sufficient mixing of the MCMCs and to improve the coverage of the probability space, we adjusted the acceptance rate for estimated allele frequencies and inbreeding coefficients. We increased the mixing parameters for the allele frequencies (DA) to 0.50 and for the inbreeding coefficient (DF) to 0.80. Mixing and convergence of MCMCs were visually assessed using TRACER. From the five independent runs, we chose the run with the lowest Bayesian deviance in the logProb calculated using the R-function (R Development Core Team 2009) provided by Faubet et al. (2007) and as suggested in Meirmans (2014). An analysis of molecular variance was performed with ARLEQUIN to partition the genetic variance among and within populations, similar to the mtDNA data. We also tested for isolation by linear geographic distance and stream distance (see mtDNA data) using Mantel tests with 10,000 randomizations as implemented in IBD v. 1.52 (Bohonak 2002). Population genetic structure was further analyzed using the Bayesian algorithm implemented in STRUCTURE v. 2.3 (Pritchard et al. 2000). The software was used to cluster individuals into K populations. The number of genetic clusters (K) was assessed assuming an admixture ancestry model with correlated allele frequencies; 20 independent runs were performed with 500,000 MCMC repetitions at each run discarding a burn-in of 150,000 iterations. We tested K from 1 to 13 and used the ad-hoc statistic DK (Evanno et al. 2005) to determine the most likely value of K. Data were sorted with CLUMPP v. 1.1.2 (Jakobsson and Rosenberg 2007), and aligned data were visualized in DISTRUCT (Rosenberg 2004).

Genetic variation -COI
We generated 116 COI sequences with a length of 709 bp, 81 of which represented unique haplotypes. The alignment contained 167 variable sites (23.6 %), 136 of which were parsimony informative. Nucleotide (p) and haplotype diversities (h) across all populations were 0.038 and 0.985, respectively. Tajima's D and Fu's Fs were not significant in any population.

Genetic variation -microsatellite markers
No evidence of scoring errors was revealed by MICRO-CHECKER. However, the presence of null alleles was suggested for several loci (Table S2). However, no systematic distribution of null alleles across populations was found and only few populations were affected; therefore, all loci were maintained for further analyses. GENEPOP showed deviations from HWE for all populations at multiple loci even after Bonferroni correction (P < 0.007). Probability values for H E excess were > 0.813 for all populations except QJ (Qingjiang River) (P = 0.679) ( Table 1). The number of alleles (N A ) ranged from 5 to 25 across the seven microsatellite loci. The mean N A across all loci ranged from 11.0 for Dianchi Lake (DC) to 14.7 for Chaohu Lake (CH) ( Table 1). Observed heterozygosity (H O ) and expected heterozygosity (H E ) were overall high across all populations with mean values ranging from 0.657 to 0.782 and 0.785 to 0.864, respectively (Table 1). Inbreeding coefficients ranged from 0.113 to 0.233 (Table 1). Most populations showed no signs of bottlenecks, except for Honghu Lake (HH) (P = 0.008) and QJ (P = 0.016), suggesting that most studied populations were under mutation-drift equilibrium.
Population structure F ST estimates from the microsatellite data ranged from 0.018 to 0.107; Φ ST estimates from the mitochondrial sequences ranged from 0.002 to 0.133 (data not shown). Most Φ ST and all F ST values were low and not statistically significant (P > 0.05). AMOVA of the microsatellite data revealed that 93.31 % of genetic variation was explained by intrapopulation variation, while the remaining 6.69 % (P < 0.05) explained variation among populations. For mitochondrial sequences, the amount of variance explained by differences between populations was similar (6.03 %) ( Table 2). Another AMOVA was conducted using river systems as groups (NS and BY as Yellow River system, EH and DC as Plateau Lakes; all other populations as Yangtze River system). However, very low differentiation at the group level was found for both microsatellite and COI data (Table 3).
Mitochondrial haplotypes from the same sampling locations did not cluster together with the exception of CH (Chaohu Lake, Yangtze River system), BY (Baiyangdian Lake, Yellow River system) and HH populations (Honghu Lake, Yangtze River system) (Fig. 2). STRUC-TURE analysis of the mitochondrial data indicated a grouping into three genetic clusters (DK = 67.3 for K = 3, Figs 3). Most of the individuals from NS, BY, and DC were assigned to the first genetic cluster, EH and CH represented the second cluster, and the populations from DT, HH, and QJ represented a third cluster, while other populations tended to be more admixed (HZ, TH, PY, LZ).
We found a significant isolation by linear distance pattern for the microsatellite data (Mantel test, r = 0.4888; P = 0.011, Fig. 4A) and COI data (r = 0.5101; P = 0.011, Fig. 4B), when all populations were included. When we only analyzed the populations from the Yangtze catchment, no isolation by linear or stream distance was detected for either data set ( Fig. 4C-F).

Gene flow
Gene flow estimates based on the microsatellite data calculated with MIGRATE and BAYESASS indicated moderate to high levels of gene flow between populations (Tables 4 and 5). MIGRATE estimated the effective number of migrants entering and leaving each population per generation (M) between 1.29 (QJ?DT) and 74.76 (LZ? QJ) ( Table 4). Most of populations had asymmetrical gene flow except for HZ, NS, and DC populations. The highest unidirectional estimates of gene flow estimated by MIGRATE were found toward the HH population (M = 188.47) and the lowest was for CH (M = 90.03), while the highest gene flow out of a population was from LZ (M = 251.94). Similarly, many migration rate estimates from BAYESASS were moderate to high, ranging from 0.01 to 0.237 (76 of 144 values); the remainder varied from 0.008 to 0.009 (Table 5).

Current and historical demography
The effective population size estimates derived from LDN e analyses were not very precise, appeared to vary between populations, yet were mostly large (estimated as infinite for five of the 12 populations, range of other populations from 55 for CH to 1006 for YX).
BOTTLENECK analyses of the microsatellite data revealed no evidence of a recent bottleneck for most of populations except for the HH and QJ populations. No significant heterozygosity excess was detected in any locality under the TPM.
The Bayesian skyline plots detected relatively stable population sizes throughout the last 2 mya for most populations (Fig. 5). The overall population size of the species in the study region appears to have increased constantly from 3 mya to 300 kya, when the population sizes strongly dropped. Subsequent to this drop, population size recovered quickly even extending predecline population size.

Discussion
Our population genetic analysis of B. aeruginosa supports our first scenario: we found high levels of genetic diversity and strong population admixture across the 12 studied populations of B. aeruginosa for both mitochondrial and nuclear markers despite a sessile life style and large separating geographic distances. Current population sizes appear to be mostly large, which was also supported by the high genetic diversity and past fluctuations of population sizes appear to be limited. However, the species as a whole seems to have rapidly declined during the mid-to Late Pleistocene and postglacially expanded. Yet, no recent bottlenecks were detected with the microsatellite data. Our findings suggest that gene flow is moderate to high among the sampling locations, even between river systems despite vast geographic distances and low active dispersal ability. The results further indicate that the species at the moment can maintain high genetic diversity despite strong harvesting pressure and increasing fragmentation. These findings may be explained by a combination of large population sizes with high standing genetic variation and long-distance gene flow, potentially facilitated by zoochoric and anthropogenic dispersal. In the following, we discuss each of these findings in detail.

Genetic diversity and current and past population demography
Both marker systems revealed high genetic diversity and low levels of genetic differentiation between local populations (Tables 1-4, Figs. 2, 3). Current population sizes were generally large and confidence intervals in all, but one population included infinity. Population sizes of single populations appeared to be relatively stable with increases or declines occurring within the last 500 ka, independent of the stream system (Fig. 5). Overall, the population size of all combined populations constantly increased starting 3 mya with a sudden drop around 300 kya. Within the last 100 ka, the population sizes increased again even exceeding the predecline size (Fig. 5).
Our findings suggest that the demographic history of the species has been influenced by past climatic cycles with a drop in population size corresponding to the Late Pleistocene and postglacial range expansion (Yang et al. 2009;Bagley et al. 2013). For ten populations, we  detected a drop in population size from mid-or Late Pleistocene (Fig. 5). The Tali glacial stage of the Late Pleistocene (10-110 ka) likely had a great influence on the distribution of Bellamya in plateau lakes, while the Taku glaciation in the early Mid-Pleistocene and the Lushan glaciation in the Late Mid-Pleistocene (200-230 ka) (Wissmann 1937;Tang and Tao 1997;Xiang et al. 2006) may have led to a fast decline of Bellamya populations across the Yangtze River system. At more recent time scales, microsatellites could not detect bottlenecks for most populations suggesting no recent drastic population declines. The current population sizes are likely large and together with high migration rates allow for the maintenance of high genetic diversity. These find-    ings of high genetic diversity and little differentiation is comparatively uncommon in freshwater snails in lakes due to their low migration ability, high self-fertilization rates, and historical demographic instability (Campbell et al. 2010;Nalugwa et al. 2011;Standley et al. 2014). However, Bellamya is somewhat special as the genus has a relatively short generation time of only 4 months (Ma et al. 2010), which in combination with their specific mating strategy (gonochorism and viviparity), may make them less vulnerable to rapid loss of genetic diversity resulting from population fluctuations and geographic isolation (Li et al. 2011;Tian-Bi et al. 2013). Further, the lacustrine habitats may represent relatively stable habitats, which allow for large stable population sizes, further helping to maintain high genetic diversity. Hence, the combination of the environmental setting and species specific life-history traits together promotes the maintenance of large stable populations in B. aeruginosa.

Low population divergence and weak geographic structure
Another aspect potentially contributing to the high genetic diversity are high migration rates, even between far distant populations. We found that the study populations are only slightly differentiated from each other. F ST and Φ ST values were low and mostly insignificant. We found a significant signal of isolation by distance for both genetic markers across the whole study region ( Fig. 4A and B). Isolation by distance and isolation by stream distance in the Yangtze River, however, were much weaker ( Fig. 4C-F). AMOVA showed limited divergence between populations and regions for both data sets (Tables 2 and 3). Bayesian inference of recent migration rates suggested consistent, intermediate to high levels level of gene flow between populations (Table 5). The haplotype network for the mitochondrial data showed little sorting of locations (Fig. 2) and STRUCTURE suggested three lineages; yet, these only partially correspond to the stream systems and geographic regions. All these findings together suggest that gene flow is moderate to high and not only occurs within streams, but also between different drainages (Yangtze River and Yellow River); even isolated plateau lakes receive migrants. Within the Yangtze River for which we have the largest number of samples, no strong isolation by stream distance was detected, suggesting that populations along the stream frequently exchange individuals. The absence of significant isolation by distance and weak geographic signal among populations is not uncommon in snail species (Wada et al. 2012). This may at least in riverine systems partially be due to frequent flooding events during which specimens may be swept away and transported even to far distant locations within the stream (Van Leeuwen et al. 2013). Accidental transport during flooding may change population genetic patterns within rivers and even completely remove isolation by distance patterns (Crispo and Chapman 2009). Such flash floods are common in the Yangtze River catchment, and all of the lakes sampled in this catchment are connected to the river.
Flooding, however, cannot explain the strong admixture between far distant sampling locations across different river systems and isolated plateau lakes (as seen e.g., in Figs. 2, 3). Here, anthropogenic translocations and animal-mediated dispersal via waterfowl represent a possible explanation, a commonly observed phenomenon in freshwater snails (Green and Figuerola 2005;Gittenberger et al. 2006;Gittenberger 2012;Kopp et al. 2012). Human-and animalmediated dispersal, for example, explains the present day distribution of Isabellaria buresi pharsalica in Thessaly (Uit de Weerd et al. 2005). As Bellamya are commonly harvested by humans, anthropogenic translocations appear likely for this genus (Prabhakar and Roy 2009;Li et al. 2010). Such translocations, independent of the vector, appear to have positive effects on the genetic diversity of the species.

Conclusion
Our genetic analysis revealed high levels of intrapopulation genetic variation, large effective population sizes, moderate to high levels of gene flow, and low interpopulation differentiation. The high rates of gene flow within the Yangtze catchment may be supported by drifting during frequent flooding events, while gene flow between different catchments likely is facilitated by animal-or human-mediated dispersal. All together, the conservation status of the species seems currently uncritical despite strong harvesting pressure and anthropogenic habitat fragmentation.

Supporting Information
Additional Supporting Information may be found in the online version of this article: Figure S1. Distributions of pairwise nucleotide differences (mismatch distributions) of COI gene in B. aeruginosa samples and the corresponding Fu's Fs value. Figure S2. The most likely value of K (inferred cluster) estimated using Evanno's DK-method. Table S1. Primer sequence for microsatellite markers and optional PCR conditions.