Genetic evaluation of migratory fish: Implications for conservation and stocking programs

Abstract Fish stocking programs have been implemented to mitigate the blockage of original riverbeds by the construction of hydropower dams, which affects the natural migration of fish populations. However, this method raises concerns regarding the genetic rescue of the original populations of migratory fish species. We investigated the spatial distribution of genetic properties, such as genetic diversity, population structure, and gene flow (migration), of the Neotropical migratory fish Prochilodus costatus in the Três Marias dam in the São Francisco River basin, Brazil, and examined the possible effects of fish stocking programs on P. costatus populations in this region. In total, 1,017 specimens were sampled from 12 natural sites and a fish stocking program, and genotyped for high‐throughput sequencing at 8 microsatellite loci. The populations presented low genetic variability, with evidence of inbreeding and the presence of only four genetic pools; three pools were observed throughout the study region, and the fourth was exclusive to one area in the Paraopeba River. Additionally, we identified high unidirectional gene flow between regions, and a preferred migratory route between the Pará River and the upper portion of the São Francisco River. The fish stocking program succeeded in transposing the genetic pools from downstream to upstream of the Três Marias dam, but, regrettably, promoted genetic homogenization in the upper São Francisco River basin. Moreover, the data show the fragility of this species at the genetic level. This monitoring strategy could be a model for the development of conservation and management measures for migratory fish populations that are consumed by humans.

Studies in which population genetics tools are used to investigate gene flow and to identify migrants among the fish population are needed (e.g., Almada et al., 2017). Moreover, assessment of the genetic diversity status of fish species downstream and upstream of hydropower dams to detect possible effects of the dams and extensive fish stocking is an important step for conservation (Cooper et al., 2017;Savary et al., 2017;Vera-Escalona, Senthivasan, Habit, & Ruzzante, 2018) The São Francisco River, the fourth largest river in South America, is the longest river running entirely in Brazilian territory and an important source of hydropower, with six successive hydropower dams placed across its 2,900-km extent. The upper portion of the São Francisco River basin is an important fishing site where multiple migratory species find favorable conditions for gonadal maturation and spawning (Sato, Bazzoli, Rizzo, Boschi, & Miranda, 2005). Since its construction in 1961, the TM hydropower dam has greatly affected natural migration and therefore the populations of migratory fish species, including Prochilodus costatus (Valenciennes 1850) (Agostinho et al., 2008;Lopes, Alves, Peressin, & Pompeu, 2018). This endemic fish species is a long-distance migrator (Sato & Godinho, 2003), traveling up to 232 km (linear home range) during the reproductive season (Alves, 2007). P. costatus is an interesting study model because its downstream and upstream populations are separated by the TM dam, which impedes connection between the downstream and upstream river fragments, as it is not equipped with a translocation mechanism.
In this study, we evaluated the spatial distribution of genetic properties such as genetic diversity, population structure, and estimated gene flow (migration) of the migratory fish P. costatus from the upper São Francisco River basin. In addition, we investigated the possible effects of fish stocking programs on P. costatus populations in this region. This study is the first of its kind to examine a Neotropical migratory fish, with 1,017 individuals sampled in the main rivers of the region: Abaeté, São Francisco, Pará, and Paraopeba. Here, the term "fish stocking program" refers to the artificial translocation of downstream fish upstream of a dam (Sugunan, 1997). "Migration" refers to the natural migration of fish during the spawning season (Lucas & Baras, 2008). We used an innovative next-generation sequencing (NGS) technique to genotype microsatellite sequences (simple sequence repeats), and genetic population approaches to characterize genetic homogenization and gene flow and to detect migration.  Figure 1), at 12 sites distributed along four main rivers that were grouped by region: São Francisco River upstream (SFU; sites 1 and 2), Pará River section 1 (PA1; sites 3-5), Pará River section 2 (PA2; site 6), Paraopeba River (PAO; sites 7-9), São Francisco River downstream (SFD; site 10), Abaeté River (ABA; site 11), and the Codevasf fishing stocking facility (COD; site 12). The samples were grouped by region due to the lack of significant intraregional differences (ϴst < 0.05). All individuals were captured using cast nets and gillnets, and small pieces of their caudal fins were cut and stored in 70% ethanol. All individuals were taxonomically identified based on Britski, Sato, and Rosa (1984) and then returned to the river. In the laboratory, fragments (12.5 mm 2 ) of all caudal fin samples were placed into 96-well plates to start the genomic DNA extraction protocol (detailed in Pimentel et al., 2018). Sampling was conducted as permitted by permanent collecting licenses issued by the Brazilian Instituto Chico Mendes de Conservação da Biodiversidade (protocol no. 57204-1) and the Instituto Estadual de Florestas (protocol no. 014.007 2017).

| Genotyping and treatment of data
The NGS approach was chosen in this study due to its high performance with large numbers of samples, low cost-benefit ratio, and rapid data acquisition. NGS genotyping and treatment of data were performed following Pimentel et al. (2018). Briefly, samples were genotyped using a pseudomultiplex strategy for eight microsatellite loci with tri-or tetra-nucleotide repeats: ProC10, ProC18, ProC22, ProC36, ProC37, ProC44, ProC48, and ProC49 (Dryad Repository: https://doi.org/10.5061/dryad.tqjq2 bvtq). All primers used contain overhang adapters compatible with the Illumina sequencing system Raw reads with Phred < 30 and length < 75 pb were removed using Prinseq-Lite (Schmieder & Edwards, 2011). To obtain correspondents to specific microsatellites, the filtered reads were mapped against reference sequences of the eight microsatellite loci (GenBank accession numbers Proc10 MG456705, Proc18 MG456707, Proc22 MG456708, Proc36 MG456709, Proc37 MG456710, Proc44 MG456712, Proc48 MG456715, and Proc49 MG456716) using the Bowtie2 software (Langmead & Steven, 2012), which generated mapping files in BAM format. After quality filtering, each obtained read was analyzed directly without the need to create contigs.
The alignments obtained with Bowtie2 were analyzed using GATK (McKenna et al., 2010) to identify those most likely to be true alleles.
This package requires the mapping files obtained with Bowtie2 and a variant call format (VCF) file that contains all possible variants (simple sequence repeats, single nucleotide polymorphisms, and indels) identified in a given region for the species under study. This VCF file was created using the SAMtools package (Li et al., 2009), which requires the reference sequences and Bowtie2 outputs to identify all possible variants of each individual at the eight loci. The resulting VCF file for each individual was concatenated in a single archive and provided as an input file for GATK (McKenna et al., 2010). The RealignTargetCreator and IndelRealigner tools of the GATK package were used to confirm the alignments obtained with Bowtie2 based on the variants obtained by SAMtools, with the elimination of low-confidence reads from each Bowtie2 alignment.
The alleles at each locus were identified using the RepeatSeq software (Highnam et al., 2012) with the M2 parameter (the minimum mapping quality required, determined with Bowtie2 and confirmed with GATK). RepeatSeq inputs are the reference sequences, the validated alignment file obtained from GATK, and the.regions file that contains the location and motif repeat sequence out of eight loci. Thus, we created a.regions text file containing for each microsatellite the start and end positions of the repeat motif relative to the reference sequence, as well as the repeat motif sequence. Based on this information, RepeatSeq identified the motif repeat and defined the genotypes for each individual and locus according to the motif repeat length. For example, for an individual with the sequence 5′-ATTATTATTATTATTATTATT-3′ at locus Pcos10, the genotype in all mapped reads in this region is defined as 21/21, that is, homozygotic.
The GenotypeMicrosat.pl Perl script was developed to filter the genotyping results from RepeatSeq. We need to establish a minimum number of reads to infer the genotype. As individuals are diploids, only two alleles are possible and we establish whether they are homozygous or heterozygous according to the number of reads.
Among the amplicons of a locus, if a given allele has more than 80% of reads, we consider that the individual is homozygous for that allele. If the allele with the largest number of reads does not exceed 80%, we consider the second allele, and if it encompasses ≥20% of the total reads, we consider the individual to be heterozygous. Rosa, Salvador, Bialetzki, & Santos, 2017; https://doi.org/10.1007/s0033 5-016-9670-7) established a minimum of two reads to confirm the less frequent allele. In our work, we set the minimum depth at 10 reads, 20% of which would give at least two reads confirming the less frequent allele. This strategy was developed to avoid false positives. Loci that did not fulfill these criteria were identified as NA. The output of the script was a worksheet (in.xls format) containing the distribution of alleles per locus per individual.

| Genetic diversity and population structure analyses
To evaluate the spatial genetic population structure for P. costatus using the GenALEx 6.5 software (Peakall & Smouse, 2006). The pairwise genetic population structure was evaluated using the F index through ϴst (Weir & Cockerham, 1984), using GenALEx. The GenALEx software was also used to estimate whether loci were in Hardy-Weinberg equilibrium, and to identify linkage disequilibrium and determine the proportion of null alleles at each locus through the EM algorithm (Dempster, Laird, & Rubin, 1977), estimated using FreeNA (Chapuis & Estoup, 2006). (1 and 2) São Francisco River upstream, (3) Itapecirica River, (4 and 6) Pará River, (5) Lambari River, (7-9) Paraopeba River, (10) São Francisco River downstream, (11) Abaeté River, and (12) Codevasf fish stocking site. More details on the sampling sites are provided in Table 1. This sampling area contains 16 hydroelectric undertakings and 1 thermoelectric undertaking. Only three undertakings have fish passways: a trap-and-truck fish translocation system at the Retiro Baixo hydroelectric power plant, and ladders at the Pitangui central hydraulic generator and Igarapé thermoelectric power plant

| Migration and gene flow analysis and mixed genotypes
We used the diveRsity R package (Keenan, McGinnity, Cross, Crozier, & Prodöhl, 2013) with the divMigrate function and 0.35 filter threshold based on genetic indices to obtain estimates of migration and gene flow among the P. costatus populations sampled in the ABA, SFD, SFU, PA1, PA2, and PAO regions. We also estimated possible migration and gene flow for COD samples compared with those from the other regions to determine whether fish stocking was influencing the estimates. This method is known to detect asymmetric migration, that is, migration occurring at a significantly higher rate in one direction, according to the allele frequency data. The D (Crawford, 2010), Nm (Wright, 1943), and G (Nei, 1973) genetic indices were tested against our dataset.
The specimen with the highest probability of assignment from DAPC was considered the most likely source of the assigned genotype. Furthermore, individuals with mixed genotypes across the study populations were counted; all specimens identified at any site presenting more than one genetic pool were considered to have mixed genotypes. The genetic indices showed similar low genetic variability for the 12 sampled sites (p < .05). Overall, the Ho was low and significantly lower than the He, for all analyzed samples and loci. The lowest Ho was found for ABA and the highest was found for PA2 (Table 2). In addition, 77.5% of the analyzed loci indicated that the population was not in Hardy-Weinberg equilibrium, and the proportion of null alleles was not significant (Appendix S1.4). Furthermore, the Fis ranged from 0.16 to 0.36 at the 12 sites, indicating inbreeding (Table 2).

| Genetic population structure
Pairwise θ ST analysis showed significant differences between some regions. Notably, the PAO values stood out in comparison with the other regions (Table 3). In the upstream regions, PA2 and PA1 pre- and SFD (49%) regions, and genetic pool 4 was found predominantly in the PAO region (86%; Figure 2).

| Migration, gene flow, and mixed genotypes
DAPC analysis detected 256 individuals with mixed genotypes from distinct sites/regions, which were categorized as such because they had >50% probability of grouping with a different population ( Figure 3). These individuals were probably early (September)/late (November) migrants (Appendix S1) or derived from a recent (same sampling year) previous generation, as juveniles take up to 2 years to reach reproductive maturity.
The migration and gene flow estimates detected connectivity among all regions (α = .05). However, PAO showed low connectivity with the other study regions, despite the detection of few migrants. Furthermore, we identified unidirectional gene flow among the sampled regions, independently of the genetic index inferred (Figure 4).
Based on the results, the detection of migrants between sampled sites corroborates the findings of the estimated gene flow analysis; however, these migrants did not likely contribute significantly to bidirectional gene flow, as the effective population sizes were low. We detected low gene flow (p < .5) between the PA1 and SFU regions, and moderate gene flow between the SFD and the SFU regions, as  (Wright, 1943).

| D ISCUSS I ON
In the present study, we investigated the spatial distribution of ge- which was present at high frequency (>90%). Environmental factors can trigger spawning seasons that affect recruitment area selection (Rosa et al., 2017), and likely have changed allele frequencies at the population level (see more details in Appendix S2).
The P. costatus population of the Paraopeba River also differs from the populations at other sampling sites in the low frequencies of gene pools 1 and 3, and the presence of only four SFU and PA1 migrants, suggesting that gene flow occurred only in the past or that current gene flow is very limited. One possible explanation is the existence of the TM dam, which may represent the greatest physical barrier to the natural migration of fish (Lopes et al., 2018). Another possibility is that the genetic profile of the population sampled in PAO was also influenced by the lack of Codevasf fish stocking ac-  (Carvalho-Costa et al., 2008;Melo, Sato, Foresti, & Oliveira, 2013) and to those of other Neotropical migratory fishes (Sanchez et al., 2012. Several studies of repopulation with native salmonids in various parts of the world have shown that stocking has little impact on the genetic makeups of populations (e.g., Allendorf & Seeb, 2000;Eldridge & Killebrew, 2008;Gow, Tamkee, Heggenes, Wilson, & Taylor, 2011;Heggenes, Beere, Tamkee, & Taylor, 2006;Small, Currens, Johnson, Frye, & VonBargen, 2009;Stelkens, Schmid, Selz, & Seehausen, 2009). In contrast, other works have suggested that intense fish stocking can lead to increased genetic diversity (e.g., Ferreira et al., 2017;Marie, Bernatchez, & Garant, 2010;Valiquette, Perrier, Thibault, & Bernatchez, 2014) and the introduction of genes, thereby effecting the genetic rescue of endangered populations by alleviating inbreeding depression and boosting fitness (Ingvarsson, 2001;Tallmon, Luikart, & Waples, 2004).
In addition to the intrinsic characteristics of migratory species that favor high genetic diversity, other factors, such as connectivity between areas and the storage process, may directly influence genetic diversity (Ferreira et al., 2017). For example, Prochilodus sp.
can migrate up to 237 km (Alves, 2007) and may recruit individuals from different regions to complete the reproductive process. Thus, we suggest that the matrices used to repopulate the region under the influence of the TM dam be further investigated in the search for the one most likely to guarantee greater genetic diversity of this important migratory fish.
Genetic studies of migratory fish species, especially those consumed by humans, are extremely important in identifying the fragility of these species and helping to develop conservation measures to reduce the human impacts on these populations (Eldridge & Killebrew, 2008;Heggenes et al., 2006;Prado et al., 2017). The identification of priority sites and proper means of their environmental protection may help to maintain the genetic diversity of migratory fish populations. Our results suggest that conservation sites should include the SFU region, which has three of the four sets of gene pools detected for P. costatus, and PAO, which has a unique genetic profile that probably represents an adaptation of P. costatus to this region.

CO N FLI C T O F I NTE R E S T
The authors declare that the present research was conducted in the absence of any commercial relationship and that they have no conflict of interest.