Estimating the inbreeding level and genetic relatedness in an isolated population of critically endangered Sichuan taimen (Hucho Bleekeri) using genome‐wide SNP markers

Abstract Sichuan taimen (Hucho bleekeri) is critically endangered fish listed in The Red List of Threatened Species compiled by the International Union for Conservation of Nature (IUCN). Specific locus amplified fragment sequencing (SLAF‐seq)‐based genotyping was performed for Sichuan taimen with 43 yearling individuals from three locations in Taibai River (a tributary of Yangtze River) that has been sequestered from its access to the ocean for more than 30 years since late 1980s. Applying the inbreeding level and genetic relatedness estimation using 15,396 genome‐wide SNP markers, we found that the inbreeding level of this whole isolated population was at a low level (2.6 × 10−3 ± 0.079), and the means of coancestry coefficients within and between the three sampling locations were all very low (close to 0), too. Genomic differentiation was negatively correlated with the geographical distances between the sampling locations (p < .001), and the 43 individuals could be considered as genetically independent two groups. The low levels of genomic inbreeding and relatedness indicated a relatively large number of sexually mature individuals were involved in reproduction in Taibai River. This study suggested a genomic‐relatedness‐guided breeding and conservation strategy for wild fish species without pedigree information records.


| INTRODUC TI ON
Sichuan taimen (Hucho bleekeri) is a large and fierce carnivorous fish species, with a maximum length of 2 m (Hu, Wang, Cao, & Xiong, 2008). It lives at the top of the food chain and has important value for maintaining the ecological balance of the habitat waters. Some biogeographical scholars have speculated that the H. bleekeri is a remnant fish that invaded from the north to the south during the Quaternary glacial period and has important scientific value in many aspects such as animal geography, paleoecology, fish system, and climate change (Song, 2012). At present, the cold-water fish farming industry has realized commercial farming of Hucho taimen in China (Xu et al., 2007). Hucho bleekeri, as an important native germplasm resource of Hucho, has huge market development potential in aquaculture in China.
Hucho bleekeri, a Chinese national secondary protected animal, is currently listed as a critically endangered species in The Red List of Threatened Species compiled by the International Union for Conservation of Nature (IUCN; Hu et al., 2008). Hucho bleekeri was originally distributed only in the upper tributaries of Yangtze River cess to the ocean for more than 30 years since late 1980s. Applying the inbreeding level and genetic relatedness estimation using 15,396 genome-wide SNP markers, we found that the inbreeding level of this whole isolated population was at a low level (2.6 × 10 −3 ± 0.079), and the means of coancestry coefficients within and between the three sampling locations were all very low (close to 0), too. Genomic differentiation was negatively correlated with the geographical distances between the sampling locations (p < .001), and the 43 individuals could be considered as genetically independent two groups. The low levels of genomic inbreeding and relatedness indicated a relatively large number of sexually mature individuals were involved in reproduction in Taibai River. This study suggested a genomic-relatedness-guided breeding and conservation strategy for wild fish species without pedigree information records.

K E Y W O R D S
conservation genetics, genomic inbreeding, genomic relatedness, Hucho bleekeri with a distribution of about 5,000 km 2 recorded in 1960s, including the upper reaches of the Min and Qingyi rivers in Sichuan Province, the upper and middle reaches of the Dadu River in Sichuan and Qinghai provinces, and the Xushui and Taibai rivers located in the upper reaches of the Han River in Shaanxi Province, China. The habitats of this fish have been drastically reduced to <100 km 2 due to the fragmentation of habitats caused by hydropower development, industrial pollution, and illegal fishing, and there are only a few isolated populations in the wild in recent years (Song, 2012). The number of mature individuals of the species is thought to be ranged from 2,000 to 2,500 based on expert judgement, and the wild populations are estimated to be continuously rapidly decreasing and are expected to lose more than 20% adult individuals over the next two generation (Song, 2012). If humans do not interfere with the process of population decline, this species is very likely to fall into the extinction vortex and eventually becomes extinct.
Hucho bleekeri is one of the most threatened fish species, and understanding genetic diversity including inbreeding levels and genetic relatedness in the fragmented populations has important practical significance for the genetic management of ex situ conservation, proliferation, and reintroduction to original habitats of this species (Frankham et al., 2017). However, for many years, it has been difficult for researchers to collect H. bleekeri sample in the wild. Moreover, these wild individuals do not have pedigree records and cannot be traced more deeply. As a result, we have little knowledge of the genetic structure of this species in the wild (Qi, Chao, Yang, Shen, Tang, 2009;Shen, Tang, & Li, 2006;Wang et al., 2011). Exhilaratingly, in 2012, a population with dozens of adult fish was discovered in Taibai River, Shaanxi Province, giving researchers the opportunity to analyze the genetic structure (Du et al., 2014). In previous studies using microsatellites and mitochondria markers, no evidence of inbreeding was detected in this population and the genetic structure estimation suggested that the wild H. bleekeri population in Taibai River expanded a long time ago, but had suffered great losses in recent years (Wang et al., , 2015Zhang, Wei, Du, & Li, 2016).
Microsatellite-based research helps us learn the genetic information of this population, but the density is as low as a dozen microsatellite markers, and this small number markers cannot be representative of the genome structural features, making it impossible for researchers to accurately estimate the inbreeding level and genetic relatedness parameters. Traditional methods for accurate estimates of the above parameters require pedigree information which are unavailable for wild fish populations such as H. bleekeri (Falconer & Mackay, 1996). In recent decades, with the widely popularization of high-throughput sequencing technologies, many statistical methods have been proposed to estimate inbreeding and genetic parameters using genome-wide SNP markers (Da, Wang, Wang, & Hu, 2014;Goddard & Hayes, 2011;Meuwissen, Hayes, & Goddard, 2001;VanRaden, 2008). However, the estimates of the inbreeding level and genetic relatedness using genome-wide SNP markers have been unavailable in wild H. bleekeri populations.
In the present study, specific locus amplified fragment sequencing (SLAF-seq)-based genotyping (Sun et al., 2013) was performed for 43 individuals captured from three locations in the Taibai River, then, the inbreeding level and genetic relatedness were estimated, and phylogenetic tree drawing and population structure analysis were also performed. The results could aid in understanding the genomic inbreeding and relatedness of H. bleekeri individuals in Taibai River and potentially provide an opportunity to the genetic management of ex situ conservation and the proliferation and releasing to original habitats.

| Genetic samples of Hucho bleekeri
Taibai River is a tributary of the Bao River and then flows into the Han River (the largest Yangtze River tributary). The river is 63 km long, with a drainage area of 1,349 km 2 and an average annual flow of 17 m 3 /s, and the natural drop is 1,588 m (Wang, Wang, Luo, 1997). The sample with 43 yearling H. bleekeri (Figure 1) was collected from the Taibai River on July 2018. Taibai River has been sequestered from its access to the Bao River for more than 30 years by a hydropower dam without fish passage since late 1980s. The Taibaihe Town is affiliated to Taibai County, Baoji City, Shaanxi Province, and it is located in the southwest of Taibai County and is named after the Taibai River. It is between 107°03′ and 107°46′40″ east longitude, 33°38′13″ and 34°09′55″ north latitude, 107 km away from Taibai County. Due to inconvenient transportation, samples can only be collected along the river on foot. Therefore, three easy-to-reach sampling locations were selected in the upper reaches of

| Genotyping of the Hucho bleekeri individuals
The total genomic DNA of H. bleekeri was extracted using a standard phenol-chloroform protocol, and DNA quality was checked by NanoDrop 2000C spectrophotometer (Thermo Scientific) and used for The development of SNP markers was performed with the H. hucho genome as a reference genome, too. The SLAF-seq reads were aligned to the reference genome using BWA software (Li & Durbin, 2010), and the SNPs were developed via both GATK and SAMtools software (Li et al., 2009;McKenna et al., 2010), and SNPs discovered by both software were used for genetic analysis. The SNP marker intersection was used as the final reliable SNP marker dataset. Vcftools (Danecek et al., 2011) was used to filter SNPs meeting all four criteria: QUAL value ≥30, minor allele frequency (MAF) ≥5%, the significance level of Hardy-Weinberg equilibrium (HWE) test ≥0.01, and there were no more than 4 individuals with missing genotypes at each locus among all the 43 individuals.

F I G U R E 2
Three sampling locations for the 43 individuals of Hucho bleekeri collected in Taibai River. Blue circle represents Taibaihe town in Baoji City, Shaanxi Province, China. Each red circle represents a sampling location. The black rectangle represents a small hydropower station on Taibai River. Green block of the thumbnail map at lower right corner represents Qinling Mountains. Location A is about 5 km upstream from Taibaihe town. Location B is about 17 km upstream from Taibaihe town. Location C is about 25 km upstream from Taibaihe town. The small hydropower station is about 10 km downstream from Taibaihe town

| Genetic model for estimating genomic relatedness
The genomic estimates of additive and dominance relatedness were obtained using a mixed model based on the quantitative genetics model (Falconer & Mackay, 1996) that partitions a genotypic value into its breeding value and dominance deviation .
Under the assumption of Hardy-Weinberg equilibrium (HWE), the mixed model to implement the traditional quantitative genetics model be expressed as: where μ was fixed effect, a = M α α was genomic breeding value and d = M δ δ genomic dominance deviations, α was gene additive effects of SNPs, and δ was dominance effects of SNPs. The variance matrices were as follows: where 2 was additive variance, and 2 was dominance variance. A g was genomic additive relationship matrix, and D g was genomic dominance relationship matrix.
Genomic relatedness between individuals can be described by genomic pairwise coancestry coefficients (F ij = θ) and genomic pairwise dominance coefficient (D ij ). The genomic coancestry coefficient (θ) between individuals i and j was estimated based on the half off-diagonal value in A g and dominance coefficient (D ij ) between individuals i and j based on the off-diagonal value in D g . Additional methods of genomic relatedness and similarity were selected as follows: genomic identity by descent (IBD ij ) to indicate two homologous alleles from a common ancestor and genomic identity by state (IBS ij ) to the server as a measure of common alleles shared by two individuals.
Multidimensional scaling (MDS) was analyzed to obtain genomic relationships between individuals. MDS is a method similar to principal component analysis (PCA), and the MDS and PCA dimensions are highly correlated and should generate very similar 3D result views. The reason for using the MDS method instead of the PCA method in this study is that these two methods have different strategies for data compression processing (Garbe, Prakapenka, Tan, & Da, 2016). The MDS method uses the largest number of individuals and the SNPs shared among individuals. In contrast, the PCA method uses the largest number of SNPs and filters some individuals with large number of missing SNPs. PCA method is used more widely when the sample size is large. In the present study, there are only 43 individuals, and all the individuals should be used in the genetic analyses. Therefore, MDS method is more suitable than PCA for this study.
A g and D g were calculated using a definition of genomic correlation , which was a within-SNP standardization of each SNP using its own SNP variance, being the Definition VI implemented by GVCBLUP program . The genomic inbreeding coefficients of individuals (f i = F), IBD, IBS, and MDS were calculated by using PLINK software (Purcell et al., 2007).

| Phylogenetic and genetic structure analysis
A maximum likelihood (ML) tree was constructed using the genome-wide SNP markers with the MEGA X software (Kumar, Stecher, Li, Knyaz, & Tamura, 2018) with the Tamura-Nei model (Tamura & Nei, 1993), and the node support for ML trees was assessed with 1,000 bootstrap pseudoreplicates (Felsenstein, 1985). The genetic structure of the 43 individuals was analyzed with the genome-wide SNP markers using ADMIXTURE software version 1.3 (Alexander, Novembre, & Lange, 2009). Prior population information was analyzed with genotype frequencies, using a number of clusters (K) ranging from 1 to 10, and ADMIXTURE's crossvalidation was used to choose the correct value for K.

| The differences of genomic relatedness between individuals from the three sampling locations
The differences between H. bleekeri populations in genomic inbreeding coefficients, additive and dominance relationships, IBD, and IBS were tested using the R software. The statistical model was y = μ + population + e, where y = the observation, μ = common mean, population = effect of a population from the three sampling locations, and e = random residual. Three populations of A, B, and C were included in the statistical model for genomic inbreeding coefficients.

| Statistics and evaluation of sequencing data
A total of 260.78 M paired-end reads were generated for 43 fish by SLAF sequencing, and the average Q30 bases ratio was 94.85%, and the average GC content was 43.13%. After low-quality reads, adaptors, and barcode sequences were removed, a total of 1,216,579 SLAF tags were obtained. Of the total SLAF tags, 220,703 were polymorphic and the average sequencing depth was 11.88-fold. A total of 468,425 SNPs markers were obtained, among which there were 15,396 SNPs passed the SNP filtration criteria in this study.

| Genomic inbreeding level estimates
The average genomic inbreeding coefficients (F) showed that the inbreeding level of this whole isolated population is at a low level (F = 2.6 × 10 −3 ± 0.079), and population A had the highest genomic inbreeding coefficients, with average inbreeding coefficient of F = 0.058 ± 0.061 (Table 1), followed by C and B populations, with negative value close to zero (Table 1, Figure 3). Only the A population had significantly higher F than the B population (p < .05). No significant differences were observed between other two pairs, neither A and C, nor B and C (Table 1).
(1) y = X + Za + Zd + e = X + ZM + ZM + e In addition, due to the fact that the inbreeding in one individual is equal to the coancestry between parents, some individuals have inbreeding close to the expectation of mating among first cousins   Table 2).

| Genomic pairwise relatedness and similarity estimates within and between populations
The means of genomic coancestry coefficient between different populations had negative values, with −0.019 for between populations B and C, −0.046 for between populations A and C, and −0.026 for between populations A and B (Table 3). Means of θ within each population were higher than the means between any two populations from the three sampling locations (Figure 7). A three-dimensional view of the first three MDS components for the 43 individuals was presented in Figure 8, population A was separate from populations B and C, and the individuals from B and C populations cannot be distinguished independently. The result of regression analysis showed that θ was negatively correlated with the geographical distances between the sampling locations (p < .001; Figure 9).

| Phylogenetic tree and genetic structure mapping
All the 15,396 SNPs were used to construct maximum likeli-  Note: Genomic inbreeding coefficients were analyzed using 15,396 SNP set by PLINK. A is the population of location A. B is the population of location B. C is the population of location C.
Within a row with no common lowercase superscript (a, b) differ significantly in multiple comparisons (p < .05). Within a row with no common lowercase superscript (a, b, c) differ significantly in multiple comparisons (p < .05).

TA B L E 1 Average genomic inbreeding coefficients by population
Abbreviations: D, genomic pairwise dominance coefficient; IBD, probability of identify by descent; IBS, probability of identify by state; θ, genomic pairwise coancestry coefficients.
The low value of genomic coancestry coefficient and IBD for interpopulations, along with MDS and ML phylogenetic tree, results visually displayed population A was relatively genetic independent from the populations B and C in Taibai River. Meanwhile, negative correlation between coancestry coefficient and geographical distances was observed for the three sampling locations connected to each other in the same river suggested that the swimming ability of juvenile H. bleekeri living in Taibai River did not seem to be able to TA B L E 3 Average genomic pairwise relatedness and similarity coefficient between different populations

Between different populations N of pairs θ (Mean ± SD) IBD (Mean ± SD) D (Mean ± SD) IBS (Mean ± SD)
A-B 220 −0.026 a ± 0.019 0.015 a ± 0.032 0.020 a ± 0.021 0.728 a ± 0.013 A-C 132 −0.046 b ± 0.016 0.004 b ± 0.018 0.016 a ± 0.022 0.715 b ± 0.012 B-C 240 −0.019 c ± 0.018 0.020 a ± 0.041 0.019 a ± 0.020 0.726 a ± 0.011 Note: A-B represents all pairs between individuals of population A and individuals of population B. A-C represents all pairs between individuals of population A and individuals of population C. B-C represents all pairs between individuals of population B and individuals of population C.
Within a row with no common lowercase superscript (a, b, c) differ significantly in multiple comparisons (p < .05).
Abbreviations: D, genomic pairwise dominance coefficient; IBD, probability of identify by descent; IBS, probability of identify by state; θ, genomic pairwise coancestry coefficients. Slightly negative genomic inbreeding coefficient and pairwise relatedness estimates within and between populations were common in this study. In particular, average genomic pairwise coancestry coefficients between different populations were negative. In general, these small negative estimates were considered as "unrelated" (Ackerman et al., 2017;Garbe et al., 2016). However, it also suggested that there may be some bias in the parameter estimation (Zhou, Vales, Wang, & Zhang, 2016).

F I G U R E 7
The reasons for this phenomenon were that the molecular markers obtained by the SLAF sequencing method were not uniformly covered on the genome, and higher-density and coverage genomic molecular markers can be obtained using higher-throughput sequencing methods such as resequencing to increase the statistical power (Ackerman et al., 2017;Da et al., 2014). Furthermore, including the focal individual in the estimation of the allele frequencies may be the cause of this bias (Ackerman et al., 2017), too. Nevertheless, the robustness of these estimates needs to be re-evaluated when a better reference genome of H. bleekeri but not H. hucho becomes available.
In summary, the approach of genomic inbreeding and similarity measures allows the estimation of inbreeding and coancestry of the H. bleekeri without pedigree records. The low levels of genomic inbreeding and relatedness of the H. bleekeri individuals in Taibai River suggested a relatively large number of sexually mature individuals were involved in reproduction. Genomic differentiation between the three sampling locations was negatively correlated with the geographical distances, and the fish in location A could be considered as genetically independent from the other two locations. This study suggested a genomic-relatedness-guided breeding and conservation strategy for wild fish species without pedigree information records.

This study was supported by the Central Public-interest Scientific
Institution Basal Research Fund (HSY201808M, 2019XT0605) and National Natural Science Foundation of China (31502157).

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R CO NTR I B UTI O N S
GH and JY: conceived and designed the experiments. YZ collected samples. PL, YZ, GR, and GH performed the experiments. GH and PL analyzed the data. GH and PL wrote the paper.

DATA AVA I L A B I L I T Y S TAT E M E N T
All the raw sequence data were submitted to BIG database (The BIG Data Center at Beijing Institute of Genomics) and are publicly available (Accession numbers: PRJCA001798). All descriptive statistics for the sequencing dataset, as well as genetic parameter estimates of all individuals, genomic relatedness, and similarity measures for each individual pair within and between three populations, were stored in Dryad (https ://doi.org/10.5061/dryad.bg79c np79).