Last exit before the brink: Conservation genomics of the Cambodian population of the critically endangered southern river terrapin

Abstract The southern river terrapin, Batagur affinis is one of the world's 25 most endangered freshwater turtle species. The major portion of the global population is currently found in peninsular Malaysia, with the only remnant Indochinese population in southern Cambodia. For more than a decade, wild nests in this remnant Cambodian population have been fenced and hatchlings reared in captivity. Here we amplified 10 microsatellite markers from all 136 captive individuals, obtained 2,658 presumably unlinked and neutral single nucleotide polymorphisms from 72 samples with ddRAD‐seq, and amplified 784 bp of mtDNA from 50 samples. Our results reveal that the last Indochinese population comprised only four kinship groups as of 2012, with all offspring sired from <10 individuals in the wild. We demonstrate an obvious decrease in genetic contributions of breeders in the wild from 2006–2012 and identify high‐value breeders instrumental for ex‐situ management of the contemporary genetic stock of the species.


| INTRODUC TI ON
The impact of genetic factors on a species' extinction risk can be severe. Decreased genetic diversity can be translated into a loss of long-term evolutionary potential, and inbreeding depression is one of the most pervasive extinction mechanisms reducing reproductive success and survival in populations (Frankham, 2005;Reed & Frankham, 2003). As most endangered species or populations undergo steep declines, the loss of genetic diversity in a severely depleted population has the potential to expedite the extinction process.
As wild populations decrease in size to critical levels, ex-situ conservation programmes and reintroductions of captive animals have become the most widespread measures to protect endangered species (Primack, 1993). The major goal of assurance colony management (compilation of captive founders) and captive breeding programmes is to produce self-sustaining captive stock that resembles its bottlenecked wild counterparts as closely as possible both in behavior and genetic profile (Frankham, 2008;Frankham, Briscoe, & Ballou, 2010;Robert, 2009). Consequently, genetic management of captive stock aims for minimizing kinship between potential founders to preserve maximum genetic variability and minimize inbreeding in captive bred 2015), making the Cambodian population the only remaining genetic stock across all of Indochina.
The small numbers today reflect a combination of overharvesting and excessive egg collection for consumption, pollution, and the loss of habitat from sand mining (Iskandar & Erdelen, 2006;Moll, 1990;Moll & Moll, 2000;Platt et al., 2008;Sharma & Tisen, 2000). Since 2006, hatchlings have been collected from nesting sites along the river ( Figure 2a) and kept at a sanctuary that is managed by the Wildlife Conservation Society and the Cambodian Fisheries Administration.
The turtle conservation program in Cambodia includes in-situ and ex-situ conservation measures such as fencing wild nests to protect hatchlings from potential predators, protecting remaining nesting sites along the Sre Ambel River, transferring new hatchlings to secure sites, and rearing of hatchlings in captivity (Moll et al., 2015).
One of the goals of the conservation program of the Sre Ambel population is to establish assurance colonies given the extremely low number of nests per year and the ongoing threats to their survival.
We sampled 136 individuals in captivity, which equals ~30% of the remaining Indochinese population of B. a edwardmolli, given critically low numbers of individuals left in the wild. We utilized genome-wide single nucleotide polymorphisms (SNPs) obtained via double digest restriction associated DNA sequencing (ddRAD-seq) (Peterson, Weber, Kay, Fisher, & Hoekstra, 2012). Additionally, we amplified 10 microsatellite markers from 136 samples, and a fragment of 784 bp of mtDNA (including the partial ND3 gene and partial control region) from 50 samples. Our aims were (a) to estimate the number of breeders producing nests in the wild population from 2006-2012; (b) to infer an optimal set of potential breeders in captivity sired by wild breeders for maximizing genetic diversity and minimizing inbreeding in the assurance colony so that the deleterious and potentially rapid effects of inbreeding depression will be prevented in the future genetic stock of this relict population.
For subsequent analysis, normalized allele sizes were used.
The presence of null alleles was checked with MICRO-CHECKER (Van Oosterhout, Hutchinson, Wills, & Shipley, 2004). Tests for deviation from Hardy-Weinberg equilibrium and linkage disequilibrium for all pairs of loci were performed using GENEPOP'007 (Rousset, 2008). Lastly a test of neutrality was performed with BayeScan v.2.1 (Foll & Gaggiotti, 2008) with default settings.

| ddRAD-seq library preparation and sequencing
In addition to microsatellite genotyping, genome-wide SNPs were mined by following a ddRAD-seq approach (Peterson et al., 2012). In

| Genome-wide SNP generation and analysis
Data quality of the sequences was assessed using FastQC v0.11.3 (Andrews, 2010). Raw data were demultiplexed in STACKS v1.44 (Catchen, Hohenlohe, Bassham, Amores, & Cresko, 2013) using process_radtags. Sequences with a raw phred score below 20 were discarded. Reads with uncalled bases or low-quality scores were removed, and reads were truncated to 140 bases to eliminate potential sequencing error occurring at the ends of reads.
De novo and reference-based locus assemblies were conducted separately by running denovo_map.pl and ref_map.pl scripts within STACKS (Catchen et al., 2013), respectively. The former script constructs loci for each sample, whereas the latter one uses previously constructed loci from reference genome alignments. Both programs construct catalog loci and match individual loci to the catalog.
For de novo assembly only the forward read was analyzed and no more than one SNP per locus was admitted to preclude linkage bias. Minimum depth of coverage (m), minimum mismatches in creating individual stacks (M), and mismatches in secondary reads (n) were set to 7, 2, and 1, respectively. Highly repetitive RadTags were removed to avoid inclusion of paralogs. The resultant SNP set was used in subsequent analyses as SNP set A.
For reference-based assembly, forward and reverse sequence reads from individuals were aligned to the western painted turtle (C. picta bellii) genome using the BWA-MEM alignment algorithm (Li, 2013). Reads that did not properly pair in mapping or that had a phred score <20 were discarded. All alignments were sorted by leftmost coordinates using the SAMTOOLS software (Li et al., 2009).
The ref_map.pl program of STACKS (Catchen et al., 2013) was run with minimum depth of coverage set to 7, and the resultant SNP set was used in subsequent analyses as SNP set B.
Once catalog loci and matches were identified with both de-novo_map and ref_map approaches, the "populations" program in STACKS (Catchen et al., 2013) was run allowing for 20% missing data (r). PLINK v1.07 (Purcell et al., 2007) was used to remove individuals with >10% missing loci. For both sets A and B, three options for pruning SNPs with minor allele frequencies (MAF) <0.05, <0.01, 0 were applied, resulting in six final datasets (A1-3, B1-3). Neutrality tests were conducted for SNPs across all datasets using BayeScan v2.1 (Foll & Gaggiotti, 2008) under default settings. Also mean polymorphic information content (PIC) of the SNPs across all datasets was calculated with CERVUS 3.03 (Kalinowski, Taper, & Marshall, 2007). The dataset that provided the highest number of individuals with the highest mean PIC (SNP set A2, Table 2), including 72 individuals (40 males, 19 females, 13 unknown sex, see Table S1) with 2,658 SNPs (mean PIC = 0.28), was used for all subsequent analyses.

| Population structure
Our dataset was expected to violate one of the major assumptions of model-based population structuring: unrelatedness (see Anderson and Dunham (2008) and Rodriguez-Ramilo and Wang (2012)), because a small number of individuals remains in the wild, many of which are potentially closely related. Therefore, population structure TA B L E 1 Number of nests found between 2006-2012, their locations, number of hatchlings collected, their inferred parent sets, and network cluster assignments was analyzed via Principal Coordinate Analysis (PCoA) (Gower, 2015) as a multivariate method designed for clustering related individuals.
Pairwise PhiPT distance matrices were generated for both microsatellite and SNP datasets separately, and the patterns of genetic relationship within these matrices were visualized with PCoA as implemented in GenAlEx v.6.5 (Peakall & Smouse, 2006

| Sibship assignment and parental genotype reconstruction with SNP data
Sibling analysis and pedigree genotype reconstruction were performed using a nearly exhaustive run of the Full-Likelihood method in COLONY (Jones & Wang, 2010) with female and male polygamy allowed. A range of genotyping error rates was tested with 10%, 1%, and 0.1% across loci. The non-inbreeding model was selected and a sibship prior was not included. Parentage was accepted when the three error rate runs provided the same parent dyads with posterior probability values >90%. Inferred loci with three genotyping error rate runs were included in the parental genotypes when assigned likelihoods >0.90 (three distinct datasets for each parent). Loci with missing data and MAF >0.01 were removed by PLINK (Purcell et al., 2007).
To understand if the relatedness of the parent sets affects the number of genetic clusters obtained by SNP-and microsatellite-based data, average relatedness (Lynch & Ritland, 1999) between parent sets of each genetic cluster was calculated with COANCESTRY (Wang, 2011).
Specifically, we tested if PR between parent set combinations is significantly different from overall average relatedness of parent sets by performing 10,000 bootstraps in COANCESTRY (Wang, 2011).
To test if assigned full-sib families could have resulted from chance alone, the relatedness among all samples in a family group was calculated using the Wang estimator (Wang, 2002) in the R package Related (Pew et al., 2015). Samples within each group were then randomly shuffled 1,000 times using the grouprel function in Related and average expected relatedness values were compared with observed ones. The package was also utilized for the calculation of PR between inferred parental genotypes.

| MtDNA analyses
Fifty samples were randomly selected from the genetic clusters obtained from the above-mentioned population structure analyses. Given TA B L E 2 Number of single nucleotide polymorphisms (SNPs) remaining after minor allele frequency (MAF) pruning, mean polymorphic information content (PIC) and number of remaining individuals across all six datasets produced

| Microsatellite analyses
In total, 12 loci from 136 individuals were genotyped with overall 0.14% None of the loci significantly deviated from neutrality. Therefore, all 10 microsatellite loci were used for the population structure analysis.

| Genome-wide SNP generation and analyses
Samples from 129 individual turtles yielded a total of 598,211,204 bp of raw data. Six of these samples were immediately removed (5% of the total number of individuals), corresponding to <0.03% of all reads, due to poor data recovery (<250 kb

| Population structure
The PCoA performed with 10 microsatellite loci provided two loose genetic clusters ( Figure S2), whereas the one performed with 2,658 genome-wide SNPs yielded three distinct clusters and three individual samples that did not group into any of the clusters ( Figure S3).
The network analysis resulted in groupings of four distinct clusters and three ungrouped individuals instead (Figure 2b, see Table S1).

| Sibship assignment and parental genotype reconstruction with SNP data
Parent sets were successfully assigned for 95.9% of individuals via sibship assignment analysis with COLONY (Jones & Wang, 2010).
The three ungrouped samples in the genetic network had <0.9 posterior probability assignments, and were therefore excluded from further family assignment (ungrouped individuals in Figure 2b). A total of four full-sib families were identified, two of which shared one parent (half-sib). Each network cluster provided one full-sib family (Figure 2b), while SNP-based PCoA ( Figure S3) grouped the two half-sib families together.
By permuting the sample dataset, we found that the average relatedness within each full-sib group is significantly higher (p < 0.001) than a random distribution of relatedness values ( Figure S4).
Given that four full-sib families were identified, two of which shared one parent, a total of seven parental genotypes were reconstructed with COLONY (Jones & Wang, 2010). Despite having no nest location data, because every year except 2006 there was one nest collected from the nesting beaches (one mother for each cohort), we estimated that the current captive individuals were sired from four male and three female breeders. For these seven breeders, allelic states for 1,809 out of 2,658 loci were inferred with assigned likelihoods of >0.9. Genotyping error rates of 10%, 1%, and 0.1% were tested across loci followed by missing data and MAF filtering, resulting in 88, 126, and 146 SNPs among parent sets, respectively. For all three datasets, the average relatedness between parent sets of clusters 1 and 2 (n = 2 + 2 = 4) was insignificantly higher than the average relatedness across all parent sets, whereas the average relatedness between parent sets of clusters 3 and 4 (n = 3, because they share one parent) was shown to be significantly lower than across all sets at a 95% confidence level (Table 3).

| MtDNA sequencing
A total of 784 bp of mtDNA sequence yielded only one haplotype across all four groups and one ungrouped individual (white circle in Figure 2b). The other two ungrouped individuals (white triangles in Figure 2b) shared an alternative mtDNA haplotype that differed in two base pairs from the major haplotype, equal to 0.02% sequence divergence (GenBank accession numbers: MN069309 and MN069310).

| Performance of the different marker sets used in this study
We found four genetic clusters in our dataset spanning 72 indi-   (Table 1). Our pedigree analysis showed that between 2009

| Catastrophic drop in population size and diversity of Cambodian B. affinis in the wild
and 2012, a female breeder from 2006 started breeding with a male breeder, which is potentially newly started contributing to the gene TA B L E 3 The assigned parent sets of the network analysis clusters, their average relatedness values within each set and among the other parent sets that were calculated with three different genotyping error rates are shown we demonstrated that, only within 7 years, the Cambodian B. affinis population in the wild has undergone a massive decrease in population structure (Figure 2c) that highlights the extreme urgency for exsitu measures to help conserve this demographically and genetically impoverished population.

| Valuable breeders for the assurance colony
Our sample set mostly includes half-and full-sibs that were sired by <10 breeders in the wild. Consequently, any unrelated individuals to these families will be of highest importance for representing the remnant wild genetic variation within the captive population.
Therefore, three of the ungrouped individuals (two males, one female) that were merged with the main clusters of microsatellitebased analyses but represented in SNP-based clustering analyses were assigned as valuable breeders for the assurance colony. One of these males belongs to the earliest (2006)

| Applied conservation genomics of B. affinis
Southern river terrapins were reportedly common in the Mekong drainage including the Tonle Sap lake system until the late 19th century (Moll et al., 2015). They have been widely extirpated since, and only one known B. affinis population currently remains in Cambodia (Platt et al., 2008(Platt et al., , 2003. Given extensive efforts to collect all wild hatchlings from 2006-2012, we assume that the genetic diversity and structure uncovered in this study directly reflects actual genetic diversity of the entire remaining Indochinese population of this species. As previously observed in Batagur trivittata (Çilingir et al., 2017), the sister species of B. affinis, global G IT is smaller than zero (global G IT = −0.025 ± 0.005; individual inbreeding coefficients −0.225 to 0.27; Table S1). This could be interpreted as a lack of severe inbreeding, assuming our dataset represents only the first generation of a potentially long-lived turtle's population (estimated age of maturity is 25 years; Moll et al., 2015) whose bottleneck is relatively recent. However, marker-based inbreeding estimators rely on multiple assumptions and vary with the demography of populations, and our sample set is likely to violate some of these assumptions.
For example, when parental allele frequencies are unknown, population allele frequencies are calculated based on the current sample (in our case the first generation offspring of wild breeders), and when the current sample contains highly related individuals (our sample set comprises many half-and full-sibs) it has been shown that the marker-based inbreeding estimators may return misleading results smaller than zero (Wang, 2014). Moreover, the population in Cambodia has an extremely small effective population size, another potential source of bias that may lead to F IS -related inbreeding coefficients below zero (Waples, 2014 (Frankham, 2005;Reed & Frankham, 2003).
Retaining contemporary genetic diversity of the Cambodian population should be the major management focus, given the population is highly genetically and demographically impoverished.
Accordingly, a fine-scale assessment of population structure and pedigree analysis plays a key role in selecting ideal individuals for assurance colony management and/or reintroduction efforts. To this end, we generated a list of 25 individuals for a captive assurance colony-directly selected based on our population-genomic data to maximize genetic diversity-which will be a suitable basis for future genetic stock for the species in Cambodia once the individuals have started breeding. Additionally, this methodology can be applied for the management of future reintroduction events, and our findings on population genetic inferences can be used as a baseline for future monitoring of the assurance and reintroduction colonies.
The need for genomic methods in the conservation of non-model and non-commercial species is growing (Shafer et al., 2016) despite an overall increase of conservation genomic studies (see examples in Shafer et al. (2015) and Garner et al. (2016)). We believe that our work is one of the case studies filling this scientific gap by shedding light on genomic variation, population structure, and demographic parameters regarding a relict population of one of the most critically endangered turtle species.
The majority of the world's turtles are threatened with extinction and among these B. affinis is in a particularly critical state. We showed that the small remnant wild population in Cambodia has decreased even more across the short span of this study. From 2006 to 2008, the number of active breeders dropped from eight to zero and from 2009 to 2012 only two active breeders were revealed.
Accordingly, although our dataset comprised genetically highly related individuals (full-sibs and half-sibs), the genome-wide analysis provided a fine resolution in determining population structure, whereas traditional genetic markers failed to do so.
We applied genomic methods to inform choices for assurance colony management, and to help preserve the genetic diversity of B. a. edwardmolli while bridging the gap between in-situ and ex-situ conservation. Given that ~70% of the world's most threatened turtles inhabit Asia, with comparable demographic histories to B. affinis, our genomic strategy has the potential to be applied to many other species in the mid-term future.

ACK N OWLED G M ENTS
We thank Samuel Tay

CO N FLI C T O F I NTE R E S T
We do not have any conflict of interests.