Implications of past and present genetic connectivity for management of the saltwater crocodile (Crocodylus porosus)

Abstract Effective management of protected species requires information on appropriate evolutionary and geographic population boundaries and knowledge of how the physical environment and life‐history traits combine to shape the population structure and connectivity. Saltwater crocodiles (Crocodylus porosus) are the largest and most widely distributed of living crocodilians, extending from Sri Lanka to Southeast Asia and down to northern Australia. Given the long‐distance movement capabilities reported for C. porosus, management units are hypothesised to be highly connected by migration. However, the magnitude, scale, and consistency of connection across managed populations are not fully understood. Here we used an efficient genotyping method that combines DArTseq and sequence capture to survey ≈3000 high‐quality genome‐wide single nucleotide polymorphisms from 1176 C. porosus sampled across nearly the entire range of the species in Queensland, Australia. We investigated historical and present‐day connectivity patterns using fixation and diversity indices coupled with clustering methods and the spatial distribution of kin pairs. We inferred kinship using forward simulation coupled with a kinship estimation method that is robust to unspecified population structure. The results demonstrated that the C. porosus population has substantial genetic structure with six broad populations correlated with geographical location. The rate of gene flow was highly correlated with spatial distance, with greater differentiation along the east coast compared to the west. Kinship analyses revealed evidence of reproductive philopatry and limited dispersal, with approximately 90% of reported first and second‐degree relatives showing a pairwise distance of <50 km between sampling locations. 
Given the limited dispersal, lack of suitable habitat, low densities of crocodiles and the high proportion of immature animals in the population, future management and conservation interventions should be considered at regional and state‐wide scales.


| INTRODUCTION
Understanding the evolutionary history and present distribution of genetic variation in wild species provides fundamental knowledge on which to base informed decisions for sustainable management (Waples et al., 2008). Modern genetic data and analytical advances can help to determine appropriate evolutionary and geographic boundaries for management and to identify how the biophysical environment and life-history traits, such as habitat specificity or dispersal, combine to shape population genetic structure and connectivity (Hendricks et al., 2018;Hohenlohe et al., 2021). Genetic data can help identify dispersal events, estimate rates of effective migration, model the effects of environmental conditions on dispersal within and among populations, and assess abundance (Al-Asadi et al., 2019;Bravington et al., 2017;Feutry et al., 2017;Manel & Holderegger, 2013). The concepts and technology of modern genetics and genomics offer a great opportunity to efficiently study management-relevant genetic factors of threatened, protected, or exploited species in the wild.
When managing wildlife, understanding population connectivity is particularly important: it helps to identify key habitats and the corridors that connect populations, determine the potential impacts of human activities on these connections, inform decisions about reintroduction and translocation, and develop strategies for maintaining population genetic diversity (Frankham, 2015;Jangjoo et al., 2016;Tewksbury et al., 2002). Management decisions require knowledge of the extent to which discrete populations depend on local reproduction versus immigration, the locations of potential sources and sinks across the species' managed range, and predicted population growth rates and sizes (Lowe & Allendorf, 2010;Mills & Allendorf, 1996;Perkins et al., 2003). Measuring dispersal directly at scales relevant to basic and applied questions is challenging, which has motivated the use of genetic methods to provide information on genetic connectivity, typically defined as the degree to which gene flow affects evolutionary processes within populations (Kool et al., 2013;Lowe & Allendorf, 2010). Divergence at individual genetic markers among subpopulations is commonly used to measure gene flow between spatially or temporally separated samples and is summarised using metrics such as FST and its analogues (Weir, 2012;Wright, 1949). These summaries require knowledge of population boundaries, which, if not available, can be estimated using unsupervised clustering algorithms for population assignment (Alexander et al., 2009;Pritchard et al., 2000). Multilocus genotype methods can also provide direct estimates of contemporary migration rates (Mussmann et al., 2019;Wilson & Rannala, 2003). An alternative approach to investigating dispersal and connectivity uses the spatial distribution of close relatives (Christie et al., 2010;Feutry et al., 2017). 
For example, the spatial distribution of half-sibling pairs provides insight into breeding-adult movements (Feutry et al., 2017).
This approach provides complementary information to other population genetics methods and is at a time scale relevant to management planning and intervention. Kin pairs also form the basis of close-kin mark-recapture, which is an emerging method for robust population size estimation (Bravington, Grewe, & Davies, 2016;Bravington, Skaug, & Anderson, 2016;Hillary et al., 2018).
In this study, we apply population genetics techniques to inform population connectivity and management of the saltwater crocodile (Crocodylus porosus). The saltwater crocodile is an iconic element of the natural landscape of northern Australia, a remarkable example of resilience and recovery after being hunted to commercial extinction in the 1960s, and a dangerous predator that shares its habitat with humans, not always harmoniously. In Queensland, saltwater crocodiles have recovered slowly since they were protected in 1974 and now number 20,000-30,000 (Taplin et al., 2020). That recovery, combined with a rapidly increasing human population in crocodile habitat, has led to increasing human-crocodile conflict (Brien et al., 2017) and persistent calls since the 1980s for increased control measures ranging from egg harvesting to 'culling'. Public discourse around crocodile conservation and management has always assumed that there is a single widespread Queensland crocodile population. However, this assumption is open to challenge. Taplin (1987) drew attention to the great diversity of Queensland's crocodile habitat in terms of climate, physiography and human influence, and identified 12 distinct 'bioregions' and subregions expected to pose quite different conservation and management issues. Those bioregions remain useful today in characterising the distribution and abundance of crocodiles State-wide (Taplin et al., 2020;Taplin et al., 2021).
What has been lacking, however, is any understanding of the interconnectedness of the crocodile sub-populations occupying Queensland's bioregions. Saltwater crocodiles are capable of long-distance movements extending over hundreds, even thousands, of kilometres, as evidenced by an individual appearing on a remote Pacific island (Allen, 1974) and by satellite tracking of individuals moving from eastern to north-western Cape York Peninsula following translocation (Campbell et al., 2013;Fukuda et al., 2019;Read et al., 2007). Together with long-standing observations that juvenile and sub-adult crocodiles disappear in large numbers from their natal areas (Messel et al., 1981, 1984), such observations have supported the idea that large-scale movement, dispersal and consequent interbreeding are common in saltwater crocodiles. This lack of knowledge has important conservation and management consequences. A practical example arises in the far north of Queensland's populated east coast between Ingham and Cooktown, where a few thousand saltwater crocodiles occupy a narrow coastal strip with a high human population (Taplin et al., 2020). The area is bounded to the north and south by long stretches of inhospitable coastline with low crocodile densities. However, some 200 km north of Cooktown lies Lakefield-Rinyirru National Park, which has a substantial crocodile population that has increased rapidly and shows early signs of approaching saturation density (Taplin et al., 2020, 2021). Meanwhile, the northern populated east coast subpopulation has been subject since the 1980s to a strict management regime under which some hundreds of crocodiles have been removed for public safety (Brien et al., 2017). Nonetheless, this subpopulation has increased since the 1980s at 2%-3% per annum (Taplin et al., 2020). 
An important question is whether that increase is driven by local recruitment, by immigration from a source population experiencing emigration pressure in Lakefield-Rinyirru or both. If the northern east coast population is essentially closed with minimal immigration/emigration at its boundaries, then it needs to be managed accordingly. If, however, it experiences a high rate of immigration from Lakefield-Rinyirru, then the management issues are quite different. The two subpopulations and those in the intermediate zones would have to be managed as a whole. Some targeting of migrating animals moving through habitat bottlenecks in inhospitable coastal areas might make sense, and the design and expected outcomes from local control efforts in the northern populated east coast would need to be modified.
Population genetics methods can assist management by characterising how this movement influences population boundaries, mapping population sources and sinks, informing the timing and degree of intervention required, and assigning the provenance of problem crocodiles (Lowe & Allendorf, 2010;Manier & Arnold, 2005;Runge et al., 2006;Waples & Gaggiotti, 2006). The identification of management units is another central management goal, with population analyses of genetic markers providing an indirect means of inferring how subpopulations aggregate (Palsbøll et al., 2007). Fukuda et al. (2022) found that the dispersal range of saltwater crocodiles between source and destination in the Northern Territory was typically 150-200 km and up to 700 km, and concluded that regions combining two or three adjacent catchments are an appropriate scale for population management. The appropriate scales for crocodile management in Queensland are currently unknown, as are the relationships between the identified bioregions and population structure. Taplin et al. (2021) found that some bioregion boundaries warranted revision following analysis of distribution and abundance data in relation to climate, and genetic analyses can contribute to that revision. In addition, detailed contrasts can be drawn between the major management units of Queensland and those of the Northern Territory, whose genetic variation has been well characterised (Fukuda et al., 2019, 2022). These comparisons will help inform crocodile management across State boundaries. Genetic data further allow an assessment of saltwater crocodile genetic diversity in Queensland following the 1960s demographic bottleneck, particularly at the southern extremes of the management range, where crocodiles are few, have increased in number very slowly since protection, and are subject to adverse climatic conditions as the world's southernmost saltwater crocodile population.
In this study, we used genome-wide genetic data to perform a high-resolution population genetics analysis of the Queensland C. porosus population that included individuals sampled from nearly the species' entire 4500 km Queensland range. To generate the genetic data on 1176 individuals, we used an efficient genotyping method that combines DArTseq on a pilot data set and subsequent sequence capture. Based on previous observations from the Northern Territory C. porosus population, we explored the hypothesis that the Queensland C. porosus population exhibits similar substantial genetic structure consistent with isolation by distance. Historical connectivity patterns were established using fixation and diversity indices coupled with clustering methods. We explored the implications of translocation, a historical component of the C. porosus management program, on genetic variation in the Queensland population. We investigated the spatial distribution of kin pairs using a kinship estimation method that is robust to unspecified population structure and validated it using forward simulation. Complementary lines of evidence from the coupled population genetics and kinship analyses were used to investigate hypotheses of dispersal distance, physical barriers, and mating system. The results provide new insights into saltwater crocodile connectivity across Queensland, provide a baseline for future genetic studies and methods, and contribute to the discussion of public safety and sustainable management of this iconic species.

| Study area
In Queensland, the saltwater crocodile is found throughout coastal areas, from the Fitzroy River near Rockhampton on the east coast, through Cape York Peninsula and the Torres Strait, and around the Gulf of Carpentaria to the Northern Territory border (Read et al., 2004;Taplin, 1987). There are 12 recognised crocodile bioregions and subregions, reflecting the highly variable biogeography and climatic conditions across the state (Taplin, 1987;Taplin et al., 2020). Although the species occupies a range of habitats, including tidal and non-tidal creeks, rivers, swamps and wetlands, as well as beaches and offshore islands, it is predominantly riverine, with over 90% of the population below 20 m elevation (Taplin et al., 2020). The population has been recovering since protection in 1974 and is currently estimated at 20,000-30,000 non-hatchlings, at an average density of 1.65 individuals per km of river (Taplin et al., 2020). However, recovery has been highly variable across the bioregions, reflecting in part the availability of high-quality breeding habitat. North-western Cape York Peninsula has the most important breeding habitat and contains ≈40% of the population, with densities declining southward into the Gulf of Carpentaria and along Queensland's east coast (Taplin et al., 2020).
Princess Charlotte Bay also contains important breeding habitat and may be a source of recruitment for nearby river systems, while the Fitzroy River contains the southernmost breeding population (Taplin et al., 2020). Overall, most of the crocodile habitat in Queensland is considered sub-optimal. Table S1 details the partitioning of samples across these bioregions.

| Sampling and DNA extraction
As has been done historically in Australia, sampled crocodiles were assigned to 16 size classes based on total length (TL) (Fukuda et al., 2013) (see Table S2 for the mapping between total length and size class). Samples spanned all size classes, with 861 of the 997 individuals with size-class information measuring <1.8 m (Table S3). The sex of 431 captured crocodiles (157 females and 274 males) was determined by cloacal examination. Sex was generally not determined for animals sampled by harpoon biopsy, but seven large individuals were assumed to be male based on size, given that females rarely exceed 3.5 m (Fukuda et al., 2013).
Tissue was sampled from free-swimming crocodiles >600 mm TL using a non-lethal biopsy punch method (Barrow & Halford, 2019).
This involved a needle head (3 mm × 25 mm) modified to fit the end of a 3 m long Rangoon cane pole, which was plunged into the base of the crocodile's tail from a moving boat at night. For smaller crocodiles (<600 mm TL), a single piece of tail scute was removed. Genetic material (obtained by both methods) was also collected from problem crocodiles … (Table S1). DNA extraction and quality control were performed by Diversity Arrays Technology (DArT Pty Ltd), and samples were genotyped using DArTseq™ as described by Grewe et al. (2015). Briefly, DArTseq combines organism-tailored complexity reduction methods with next-generation sequencing (NGS) platforms. For crocodiles, SNP calling was performed using DArT P/L's proprietary SNP and SilicoDArT calling algorithms (DArTsoft14).
The process used the PstI-SphI restriction enzyme library protocol, as in Fukuda et al. (2019). The DArTseq analysis reported 15,514 pre-scored markers (0, 1 or 2 copies of the reference allele, or null) from 186 individuals (two failed internal quality control).
To shortlist DArTseq markers from the pilot study to take forward to DArTcap genotyping, we further analysed DArT's reported clusters of sequences, of which there were 35,344. Using custom software written in the R programming language (R Core Team, 2021), we applied a quality control process to the cluster sequences.
Clusters with median total read counts lower than 15 for likely heterozygotes were eliminated, removing 24,968 clusters. We assessed excessive apparent polyploidy in both crocodiles (indicative of contamination) and clusters (indicative of potential paralogous loci), which eliminated 288 clusters and no individuals. For each remaining cluster, the most numerous allele (in terms of total read counts across individuals) was designated the reference, and counts for clusters with more than two alleles were stored for downstream summary. For the two most numerous alleles at a locus, the ratio of their median read counts in clear heterozygotes (individuals for which at least 5% of the read counts at the locus came from the alternative allele) was checked for substantial deviation from 0.5. Loci with a ratio <0.33 or >0.67 were removed (5004 clusters eliminated). These thresholds were determined by visual inspection of the distribution of ratios across all loci and were chosen to balance retaining high-quality loci against removing likely unreliable ones. We then assessed the distribution, across the remaining individuals, of a composite score equal to the total number of heterozygous loci minus the total number of null loci.
A total of 9 crocodiles were removed due to outlier scores on the lower tail of this composite statistic suggesting excessive null rates for these individuals relative to the distribution across individuals.
No outliers were removed on the upper tail of this distribution. Each locus for each crocodile was then scored according to whether one or both of the main alleles (labelled A and B) were present. Every locus was assigned to one of four categories: AB; AA/A0; BB/B0; or 00 (double null), where A0 and B0 denote individuals carrying a copy of the A or B allele on one chromosome and a null on the other (see Hillary et al. (2018) for further details). This four-way scoring was applied to 5084 loci across 177 individuals. After scoring, allele frequencies were computed and a minor allele frequency (MAF) threshold of 0.02 was applied, leaving 4799 loci.
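The allele-balance filter and the four-way scoring described above can be sketched as follows. This is a minimal Python illustration, not the authors' R software, and it rests on several assumptions: each locus is a list of `(a_reads, b_reads)` tuples (one per individual) for the two most numerous alleles; a "clear heterozygote" is taken to be an individual in which each allele contributes at least 5% of the reads; an allele counts as present if it has at least `min_reads` reads (a hypothetical parameter); and the MAF calculation treats AA/A0 and BB/B0 as homozygotes and ignores double nulls, a simplification of the null-aware treatment in Hillary et al. (2018).

```python
from statistics import median


def allele_balance_ratio(counts, min_frac=0.05):
    """Median A reads / (median A + median B reads) among clear heterozygotes.

    counts: list of (a_reads, b_reads) tuples, one per individual (assumed layout).
    Returns None when no individual qualifies as a clear heterozygote.
    """
    hets = [(a, b) for a, b in counts
            if a + b > 0 and min(a, b) / (a + b) >= min_frac]
    if not hets:
        return None
    med_a = median([a for a, _ in hets])
    med_b = median([b for _, b in hets])
    return med_a / (med_a + med_b)


def passes_balance_filter(counts, lower=0.33, upper=0.67):
    """Keep a locus only if its allele-balance ratio lies within [lower, upper]."""
    ratio = allele_balance_ratio(counts)
    return ratio is not None and lower <= ratio <= upper


def score_genotype(a_reads, b_reads, min_reads=1):
    """Four-way score from allele presence; min_reads is a hypothetical threshold."""
    has_a, has_b = a_reads >= min_reads, b_reads >= min_reads
    if has_a and has_b:
        return "AB"
    if has_a:
        return "AA/A0"
    if has_b:
        return "BB/B0"
    return "00"


def minor_allele_frequency(scores):
    """Simplified MAF: AA/A0 and BB/B0 counted as homozygotes, double nulls ignored."""
    copies = {"AB": (1, 1), "AA/A0": (2, 0), "BB/B0": (0, 2)}
    n_a = sum(copies[s][0] for s in scores if s in copies)
    n_b = sum(copies[s][1] for s in scores if s in copies)
    total = n_a + n_b
    return min(n_a, n_b) / total if total else 0.0


# Toy locus: three clear heterozygotes and one homozygote (excluded from the ratio).
locus = [(10, 9), (12, 11), (8, 10), (30, 0)]
print(allele_balance_ratio(locus))      # 0.5
print(passes_balance_filter(locus))     # True
scores = [score_genotype(a, b) for a, b in locus]
print(scores, minor_allele_frequency(scores) >= 0.02)
```

Computing the ratio from medians rather than means keeps a few individuals with very deep coverage from dominating the balance estimate.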
To investigate the potential for population structure within these pilot data, we performed a principal components analysis using the dudi.pca function from the ade4 R package (Bougeard & Dray, 2018) with no further quality control. A scatterplot of the first two principal components showed substantial sub-population differentiation across the five pilot regions sampled (Figure S1).
Given this substantial structure, SNP loci were not filtered for deviation from Hardy-Weinberg equilibrium (HWE) (via a chi-square test), because loci differentiated across sub-populations were of interest in the large-scale analysis. A PCA on a data set with a mild HWE filter showed a similar pattern of differentiation (results not shown). As a final check, we separated the original cluster count data into bioregion-specific data sets, applied the same quality control procedure as above, and intersected the final quality-controlled marker sets with the 4799 selected loci to assess whether there was enough variation

| Quality control filtering of individuals and SNPs
Initial quality control of the SNP data was carried out before the population genetic and kinship analyses. From the DArTcap procedure, we used DArT's two-row counts format, which reports the number of sequence tags for the reference and alternate alleles at each SNP and is more informative than the pre-scored SNP values. We performed quality control using the filter_rad pipeline in the radiator R package (Gosselin, 2020). The process filtered out both poor-quality SNPs and poor-quality individual samples that could have low DNA quality or quantity or be contaminated. See Table S4 … (Figure S3). For these individuals, there was no correlation between heterozygosity score and missing proportion, and further checks of sample and laboratory metadata indicated that their DNA was of adequate quality. Despite this, we chose to exclude these individuals from the population genetic analyses because of their low heterozygosity values and their sharp divergence from other individuals in the Coastal Plains -CA bioregion. Understanding why these Coastal Plains -CA samples had very low heterozygosity is outside the scope of this study and is difficult to resolve owing to uncertainty in the reported sample locations of the museum samples. Individuals with low heterozygosity from the Fitzroy were not excluded from the analysis, as they were at the edge of the species' range and could show low heterozygosity for biological reasons. Individuals that showed evidence of duplication were also removed; for each duplicate pair, the individual with the lower rate of missingness was kept.
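The rule used to resolve duplicates, keeping the member of each pair with the lower missingness, can be sketched as below. The list of flagged pairs and the per-sample missingness dictionary are assumed inputs; how duplicates were flagged within the radiator pipeline is not reproduced here. Sample IDs are hypothetical, and ties are broken in favour of the first member of the pair, which is an arbitrary choice.

```python
def samples_to_drop(duplicate_pairs, missingness):
    """Return IDs to remove: the member of each duplicate pair with higher missingness.

    duplicate_pairs: iterable of (id1, id2) tuples flagged as duplicates.
    missingness: dict mapping sample ID -> proportion of missing genotypes.
    On a tie, the first member of the pair is kept.
    """
    drop = set()
    for id1, id2 in duplicate_pairs:
        drop.add(id1 if missingness[id1] > missingness[id2] else id2)
    return drop


missing = {"croc_01": 0.04, "croc_02": 0.01, "croc_03": 0.02}   # hypothetical values
print(samples_to_drop([("croc_01", "croc_02"), ("croc_02", "croc_03")], missing))
```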
Following these filters, 948 individuals and 2958 variants were available for analysis. Of these 2958 variants, approximately 49% were present in the set of loci selected during SNP discovery. Only two individuals were sampled from the Coastal Plains: Cape Melville -Cooktown bioregion. For analyses summarised at the bioregion scale, these individuals were amalgamated with the closest spatial group, Princess Charlotte Bay; initial pilot principal component analyses showed that these two individuals grouped genetically with the Princess Charlotte Bay bioregion.

| Population diversity and structure analyses
Initial genetic diversity statistics were computed for each bioregion using the R package diveRsity (Keenan et al., 2013) and included allelic richness (Ar), observed heterozygosity (HO), unbiased expected heterozygosity (HE) and inbreeding coefficients (FIS). We used the fixation index (FST) as a measure of population genetic differentiation, calculated using the R package StAMPP (Pembleton et al., 2013) for all pairwise combinations of bioregions. The StAMPP bootstrap (1000 replicates) p-values for the pairwise FST values were adjusted for multiple comparisons using the Benjamini-Hochberg false discovery rate (Benjamini & Hochberg, 1995) at 5%.
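For readers unfamiliar with the Benjamini-Hochberg procedure, the adjustment applied to the bootstrap p-values can be sketched as follows. This is a plain Python illustration of the standard step-up method (to our understanding it gives the same values as R's `p.adjust(method = "BH")`), not the code used in the study, and the p-values are invented.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Step up from the largest p-value, keeping adjusted values monotone and <= 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pairwise_p = [0.01, 0.04, 0.03, 0.20]   # invented bootstrap p-values
q = benjamini_hochberg(pairwise_p)
significant = [qi <= 0.05 for qi in q]
print(q, significant)
```

Here only the smallest p-value survives the 5% false discovery rate, even though three raw p-values are below 0.05.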
We investigated the relationship between genetic and geographic distances by plotting Slatkin's linearised FST, computed from the StAMPP estimates as FST/(1−FST) (Rousset, 1997), against geographic distance. Geographic distance was measured as the coastal distance between bioregion centroids.
The coastal distance was computed using a spatial layer of the Queensland coastline designed for crocodile reporting elsewhere (Taplin et al., 2020). The bioregion centroids were matched to their closest coastal line position and the distance computed along the coastline. We tested for a correlation between genetic and geographic distances using the standard Mantel test implemented in the R package vegan (Oksanen et al., 2020).
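The linearisation and the Mantel test can be sketched in plain Python as below. This is an illustration only, since the study used StAMPP and vegan. The sketch follows the usual permutation scheme: rows and columns of one matrix are permuted together, the Pearson correlation of the upper triangles is recomputed, and the one-sided p-value counts the observed arrangement as one permutation (our understanding is that vegan's `mantel` defaults behave the same way). The toy FST values are invented to mimic isolation by distance along a straight coastline.

```python
import random


def linearise(fst):
    """Linearised FST: FST / (1 - FST)."""
    return fst / (1.0 - fst)


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def mantel(gen, geo, permutations=999, seed=1):
    """One-sided Mantel test on two square, symmetric matrices (lists of lists)."""
    n = len(gen)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    y = [geo[i][j] for i, j in pairs]
    r_obs = pearson([gen[i][j] for i, j in pairs], y)
    rng = random.Random(seed)
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        perm = list(range(n))
        rng.shuffle(perm)
        # Permute rows and columns of the genetic matrix together.
        x_perm = [gen[perm[i]][perm[j]] for i, j in pairs]
        if pearson(x_perm, y) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Invented example: five bioregion centroids 100 km apart along a coastline.
positions = [0, 100, 200, 300, 400]
geo = [[abs(a - b) for b in positions] for a in positions]
gen = [[linearise(0.0002 * d) for d in row] for row in geo]
r, p = mantel(gen, geo)
print(round(r, 3), p)
```

With only a handful of bioregions, the number of distinct permutations is small, which limits how small the attainable p-value can be.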
To assess the partitioning of broad-scale genetic variation, we conducted principal component analyses (PCA) using the R package adegenet (Jombart & Ahmed, 2011) and data from 948 individuals and 2958 variants. We further filtered these data on linkage disequilibrium at r² = 0.1 (using the snpgdsLDpruning function in the SNPRelate R package (Zheng et al., 2012)) with a sliding window of 3000 variants, which spans all variants, leaving 1708 variants for PCA.
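A greedy LD-pruning pass of the kind described above can be sketched as follows. This is a simplification and not SNPRelate's `snpgdsLDpruning`. Here LD is measured as the squared Pearson correlation (r²) between 0/1/2 genotype dosages with no missing data, SNPs are visited in input order, and a SNP is retained only if its r² with every previously retained SNP within the window is at or below the threshold. Because a 3000-variant window exceeds the number of variants, every retained pair is compared.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype dosage vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic SNP: no measurable LD
    return sxy * sxy / (sxx * syy)


def ld_prune(dosages, threshold=0.1, window=3000):
    """Greedy pruning; dosages is a list of SNPs, each a list of 0/1/2 counts.

    Returns the indices of retained SNPs.
    """
    kept = []
    for idx, snp in enumerate(dosages):
        recent = [k for k in kept if idx - k <= window]
        if all(r_squared(snp, dosages[k]) <= threshold for k in recent):
            kept.append(idx)
    return kept


a = [0, 1, 2, 0, 1, 2]
b = list(a)                 # perfectly linked with a
c = [1, 2, 1, 1, 0, 1]      # uncorrelated with a
print(ld_prune([a, b, c]))  # [0, 2]
```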
We investigated the influence of unequal sample sizes on PCA inference, which could introduce bias in clustering algorithms (Foster et al., 2018), by down-sampling bioregions with large numbers of individuals to 30 individuals.
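The down-sampling step can be sketched as below, assuming each individual carries a bioregion label (the labels here are hypothetical). Repeating the draw with different values of the illustrative `seed` argument shows whether the PCA pattern is stable across random subsets.

```python
import random
from collections import defaultdict


def downsample(sample_ids, bioregions, max_n=30, seed=0):
    """Randomly keep at most max_n individuals per bioregion."""
    by_region = defaultdict(list)
    for sid, region in zip(sample_ids, bioregions):
        by_region[region].append(sid)
    rng = random.Random(seed)
    kept = []
    for ids in by_region.values():
        kept.extend(rng.sample(ids, max_n) if len(ids) > max_n else ids)
    return kept


ids = [f"ind{i}" for i in range(110)]
regions = ["large_region"] * 100 + ["small_region"] * 10   # hypothetical labels
subset = downsample(ids, regions)
print(len(subset))   # 40
```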
We examined the correspondence between the primary axes of genetic variation from the principal components analyses and geography using the Procrustes transformation of the first two principal components using the MCMCpack package in R (Martin et al., 2011).
Procrustes transformations scale, stretch, and rotate the PCA axes to minimize the differences between the principal components and the geographic coordinates of each of the saltwater crocodile samples.
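The Procrustes fit can be sketched with NumPy as follows. It is a generic least-squares similarity transform that maps PC1-PC2 scores onto sample coordinates using translation, a rotation (possibly including a reflection) and a single isotropic scale factor. It is not MCMCpack's `procrustes` function. In practice, geographic coordinates would usually be projected (for example, to kilometres) before fitting.

```python
import numpy as np


def procrustes_fit(pcs, coords):
    """Least-squares fit of s * (pcs - mean) @ R + mean(coords) to coords.

    pcs, coords: (n, 2) arrays. Returns fitted coordinates, rotation R,
    scale s and the residual sum of squares.
    """
    x = np.asarray(pcs, dtype=float)
    y = np.asarray(coords, dtype=float)
    xc, yc = x - x.mean(axis=0), y - y.mean(axis=0)
    u, sv, vt = np.linalg.svd(xc.T @ yc)
    rotation = u @ vt                      # orthogonal; may include a reflection
    scale = sv.sum() / (xc ** 2).sum()     # optimal isotropic scale
    fitted = scale * xc @ rotation + y.mean(axis=0)
    rss = float(((fitted - y) ** 2).sum())
    return fitted, rotation, scale, rss


# Toy check: PCs that are a rotated, shrunk and shifted copy of the coordinates.
coords = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 1.0], [2.0, 3.0]])
theta = np.deg2rad(30)
rot = np.array([[np.cos(theta), -np.sin(theta)], [np.sin(theta), np.cos(theta)]])
pcs = 0.01 * coords @ rot + 5.0
fitted, R, s, rss = procrustes_fit(pcs, coords)
print(np.allclose(fitted, coords), round(s, 6))
```

The residual sum of squares indicates how closely genetic PC space mirrors geography. A small residual relative to the spread of the coordinates means the two leading axes of genetic variation largely reproduce the sampling map.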
To investigate individual genetic ancestry partitioning, we used a model-based clustering approach implemented in STRUCTURE (Pritchard et al., 2000). We ran STRUCTURE 10 times independently for each number of clusters from K = 2 to K = 10, and assessed model fit across aligned runs with the Delta K method (Evanno et al., 2005) implemented in the pophelper R package (Francis, 2017). Each run used 20,000 burn-in Markov chain Monte Carlo (MCMC) iterations followed by 80,000 further iterations for inference. We checked the adequacy of the run length by confirming that the likelihood and the alpha parameter were stable at longer MCMC chain lengths. We assumed that allele frequencies were correlated among sampled sites and allowed for admixture in all runs. All runs were completed with and without prior location information; the location prior was set to the sampled bioregion of each individual.
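The Delta K statistic of Evanno et al. (2005) can be sketched as below. It is ΔK = |L(K+1) − 2L(K) + L(K−1)| / sd[L(K)], where L(K) is the mean estimated ln P(D) across replicate runs at K and sd is the standard deviation of those runs. This sketch takes the second-order difference from run means, which is one common implementation choice; it is not pophelper's code. The ln P(D) values are invented.

```python
from statistics import mean, stdev


def evanno_delta_k(lnp):
    """lnp: dict K -> list of ln P(D) values from replicate STRUCTURE runs.

    Returns dict K -> Delta K for each K whose neighbours K-1 and K+1 are present.
    """
    means = {k: mean(v) for k, v in lnp.items()}
    delta = {}
    for k in sorted(lnp):
        if k - 1 in means and k + 1 in means:
            second_diff = abs(means[k + 1] - 2 * means[k] + means[k - 1])
            sd = stdev(lnp[k])
            delta[k] = second_diff / sd if sd > 0 else float("inf")
    return delta


# Invented ln P(D) values for three replicate runs at each K.
runs = {
    2: [-100.0, -102.0, -101.0],
    3: [-80.0, -81.0, -79.0],
    4: [-78.0, -79.0, -77.0],
    5: [-77.0, -79.0, -75.0],
}
delta = evanno_delta_k(runs)
print(delta, max(delta, key=delta.get))
```

Because ΔK needs both neighbours, it is undefined at the smallest and largest K tested. With K = 2 to 10, it can only be evaluated for K = 3 to 9.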
Finally, to complement the principal component analysis, we performed a Discriminant Analysis of Principal Components (DAPC), implemented in the adegenet R package. For DAPC, we again filtered on linkage disequilibrium as for the PCA. In the first DAPC analysis, we used k-means clustering (percentage-of-variance method, 95%) to identify clusters in the data, and selected the number of clusters that best describes the variance in the data set by inspecting the Bayesian Information Criterion (BIC) as a function of the number of clusters. In the second DAPC analysis, a priori groupings based on sampled bioregions were investigated. Cross-validation with 30 replicates and a 90/10 training/validation split was used to select the number of principal components to retain in the discriminant analysis.

| Kin identification
Identification of close relatives can provide a direct estimate of connectivity over short timescales (one or two generations), whereas population genetics methods reflect long timescales (hundreds or thousands of generations). The timescale of inference depends on the degree of relatedness of the kin observed. Here, we focus on inferring recent connectivity from close kin, which in our study include parent-offspring pairs (POPs), full-sibling pairs (FSPs) and half-sibling pairs (HSPs). Integrating classical population genetics with information from close kin provides a way to study both the contemporary and the historical connectivity of a species (Feutry et al., 2017, 2020).

| PC-relate
PC-Relate (Conomos et al., 2016) is a model-free approach for estimating commonly used measures of recent genetic relatedness, such as kinship coefficients and identity-by-descent (IBD) sharing probabilities, in the presence of unspecified population structure. The method is well suited to inferring kin in structured populations because it distinguishes familial relatedness from population structure, both of which manifest as genetic similarity through allele sharing. It was developed for human data and applied to data sets with large numbers of SNPs. We explored PC-Relate's utility for distinguishing parent-offspring pairs (POPs), full-sibling pairs (FSPs) and 2nd-degree relatives, including half-sibling pairs, uncle/aunt-niece/nephew pairs (referred to as full-thiatic pairs (FTPs) for brevity) and grandparent-grandchild pairs (GGPs), from unrelated or undetermined pairs (U).
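The standard relation φ = k1/4 + k2/2 shows why both the kinship coefficient and the IBD-sharing probabilities are needed. Here φ is the kinship coefficient, k1 and k2 are the probabilities of sharing one or two alleles identical by descent, and k0 = 1 − k1 − k2. The sketch below tabulates textbook expectations for outbred relatives. POPs and FSPs have the same expected kinship (0.25) and are separated only by k0 (0 versus 0.25). The three 2nd-degree relationships share identical expectations, so these statistics alone cannot tell them apart.

```python
# Expected (k0, k1, k2) IBD-sharing probabilities for outbred relationships.
EXPECTED_IBD = {
    "POP": (0.0, 1.0, 0.0),
    "FSP": (0.25, 0.5, 0.25),
    "2nd-degree (HSP, GGP, FTP)": (0.5, 0.5, 0.0),
    "Unrelated": (1.0, 0.0, 0.0),
}


def kinship_from_ibd(k1, k2):
    """Kinship coefficient from the probabilities of sharing one or two alleles IBD."""
    return k1 / 4 + k2 / 2


for relationship, (k0, k1, k2) in EXPECTED_IBD.items():
    print(f"{relationship:28s} k0={k0:.2f} phi={kinship_from_ibd(k1, k2):.4f}")
```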

| SLiM forward simulation
To calibrate our expectations for inferring kin with PC-Relate from data sets of a few thousand SNPs, which is typical of ecological applications and far fewer than in human studies, we implemented forward-time non-Wright-Fisher simulations using the SLiM 3 software package (Haller & Messer, 2019). The simulations incorporated sub-populations, adult migration and clutches, which are expected and observed properties of the C. porosus population. We simulated two sub-populations linked by migration rates of 1%, 0.1% and 0.01% per generation. Each sub-population had a carrying capacity of 1500 individuals. The mutation rate was set at 3.5 × 10^-9 per base pair per generation and the crossover rate at 10^-8 per base pair per generation, equivalent to 1 cM per megabase. Mortality was density-dependent: survival was scaled by the ratio of the carrying capacity (1500) to the number of individuals in the sub-population at the current time step. Under this structure, the oldest individual in the population was generally between 50 and 80 years old, with most of the population comprising juveniles <10 years old. For reproduction, each female aged 15 years or older mated at random with a male from the same sub-population aged 16 years or older (age at sexual maturity). There was no sex-ratio bias in offspring, and migrants were picked at random from the individuals in each sub-population. The simulations were run for 6000 generations, with pedigree tracking for the last 100 generations.
We repeated these three simulation scenarios with each female producing a clutch whose size was drawn from a Poisson distribution with expectation λ = 15 offspring. Brien et al. (2014) estimated that saltwater crocodile clutches average ≈55 eggs, with more than half of the individuals surviving to 1 year of age. Alternatively, it has been estimated that only 30% of C. porosus eggs yield hatchlings in the wild (Webb & Manolis, 1993). As juveniles in the simulation are not subjected to any additional mortality relative to the rest of the population, we chose an expectation of 15 offspring, slightly fewer than 30% of an average clutch. Custom scripts were written in the R programming language to parse the true pedigree information from the SLiM output and identify kin pairs that were parent-offspring, full-siblings, half-siblings, grandparent-grandchild and full-thiatic pairs. Samples were drawn from the living individuals in the last generation of each of the six simulation scenarios. PCA of the generated populations, implemented using the adegenet R package, was performed to gauge the level of structure in these simulated populations. For each sample, we applied PC-Relate-specific quality control, filtering variants at a MAF of 5% and on linkage disequilibrium at r² = 0.1, as in Conomos et al. (2016).
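A short sketch can check the arithmetic behind these simulation settings, using only the stated parameters. With a carrying capacity of 1500, the three migration rates correspond to about 15, 1.5 and 0.15 expected migrants per sub-population per generation. A crossover rate of 10^-8 per base pair is 0.01 Morgans (1 cM) per megabase. Thirty per cent of a 55-egg clutch is 16.5, just above the chosen λ = 15. The Poisson draw uses Knuth's multiplication method purely for illustration; it is adequate for small λ, and the simulations themselves would rely on SLiM's own random-number functions.

```python
import math
import random

CARRYING_CAPACITY = 1500
MIGRATION_RATES = (0.01, 0.001, 0.0001)   # 1%, 0.1% and 0.01% per generation


def expected_migrants(capacity, rate):
    """Expected number of migrants leaving a sub-population at carrying capacity."""
    return capacity * rate


def cm_per_megabase(crossover_rate_per_bp):
    """Convert a per-bp crossover rate to centimorgans per megabase (1 cM = 0.01 crossovers)."""
    return crossover_rate_per_bp * 1e6 / 0.01


def poisson_draw(lam, rng):
    """Knuth's method: multiply uniforms until the product falls to exp(-lam) or below."""
    threshold = math.exp(-lam)
    k, product = 0, 1.0
    while True:
        product *= rng.random()
        if product <= threshold:
            return k
        k += 1


for rate in MIGRATION_RATES:
    print(rate, expected_migrants(CARRYING_CAPACITY, rate))
print(cm_per_megabase(1e-8), 0.30 * 55)

rng = random.Random(2024)
clutches = [poisson_draw(15, rng) for _ in range(20000)]
print(sum(clutches) / len(clutches))   # close to 15
```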
To cluster individual pairs into kinship categories, we explored three methods, including a Euclidean distance clustering method and a support vector machine (SVM). Conomos et al. (2016) … We used multiple clustering performance measures to compare the methods and gauge expectations for the saltwater crocodile data set. We computed precision TP/(TP + FP), recall TP/(TP + FN), specificity TN/(TN + FP) and F1_macro = 2(P_macro × R_macro)/(P_macro + R_macro), where TP, FP, TN and FN are the numbers of true positives, false positives, true negatives and false negatives, and P_macro and R_macro are the macro-averaged precision and recall. F1_macro averages performance over the individual kin classes; a score close to 1 indicates that the classifier performs well for every class. These measures were computed and summarised for the three methods across the 45 test replicates from confusion matrices computed using the caret R package (Kuhn, 2015).
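The performance measures can be computed from a confusion matrix as sketched below. This sketch assumes that rows are true kin classes and columns are predicted classes, so a table oriented the other way should be transposed first. F1_macro is computed exactly as defined above, as the harmonic mean of macro-averaged precision and recall; another common definition instead averages the per-class F1 scores. The matrices are invented.

```python
def classification_metrics(cm):
    """Per-class (precision, recall, specificity) and F1_macro from a square confusion matrix.

    cm[i][j] = number of pairs whose true class is i and predicted class is j.
    """
    k = len(cm)
    total = sum(sum(row) for row in cm)
    per_class = []
    for c in range(k):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(k)) - tp
        tn = total - tp - fn - fp
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        specificity = tn / (tn + fp) if tn + fp else 0.0
        per_class.append((precision, recall, specificity))
    p_macro = sum(p for p, _, _ in per_class) / k
    r_macro = sum(r for _, r, _ in per_class) / k
    f1_macro = 2 * p_macro * r_macro / (p_macro + r_macro) if p_macro + r_macro else 0.0
    return per_class, f1_macro


# Invented example with two classes (e.g. POP vs unrelated).
per_class, f1 = classification_metrics([[8, 2], [1, 9]])
print(per_class, round(f1, 4))
```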

| Saltwater crocodiles
To perform model-free estimation of genetic relatedness for the Queensland saltwater crocodile population, we used the PC-Relate method implemented in the GENESIS R package. We used the low-heterozygosity quality-controlled data, which comprised 948 individuals and 2958 SNPs. We filtered these data on linkage disequilibrium at R² = 0.1 with a sliding window of 3000, so that all variants were included. A further MAF filter at 5% was then applied, which left 1628 variants for analysis. We then performed PC-AiR analysis, which estimates principal components from SNP data sets without being confounded by recent genetic relatedness. PC-AiR requires kinship coefficients for every pair of individuals in the sample; these were estimated using the KING-robust estimator implemented in the SNPRelate R package (Zheng et al., 2012). An adequate set of principal components, estimated from the set of unrelated individuals identified by PC-AiR, was taken forward for PC-Relate analysis.
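The KING-robust statistic used to seed PC-AiR can be illustrated with a short sketch of its simplest (within-family) form for one pair of individuals. Genotypes are assumed to be coded as 0/1/2 copies of an allele. The estimate is the number of loci at which both individuals are heterozygous, minus twice the number at which they are opposite homozygotes, divided by the sum of the two individuals' heterozygous counts. This is an illustrative Python re-implementation under that assumed coding, not the SNPRelate code used in the study. `king_robust` is a hypothetical name, and the between-family variant of the estimator is not shown.

```python
def king_robust(g_i, g_j):
    """Simple (within-family) KING-robust kinship estimate for one pair.

    Genotypes are allele counts 0/1/2; None marks a missing call, and that
    locus is skipped for this pair.
    """
    het_both = opp_hom = het_i = het_j = 0
    for a, b in zip(g_i, g_j):
        if a is None or b is None:
            continue
        if a == 1:
            het_i += 1
        if b == 1:
            het_j += 1
        if a == 1 and b == 1:
            het_both += 1
        if abs(a - b) == 2:  # opposite homozygotes (AA vs aa)
            opp_hom += 1
    if het_i + het_j == 0:
        raise ValueError("no heterozygous calls for this pair")
    return (het_both - 2 * opp_hom) / (het_i + het_j)


if __name__ == "__main__":
    g = [0, 1, 2, 1, None]
    print(king_robust(g, g))  # an individual compared with itself gives 0.5
```

Expected values are about 0.5 for duplicates, 0.25 for first-degree relatives and 0 for unrelated pairs. Negative estimates tend to arise for pairs drawn from differentiated populations, which is one way structure can distort the distribution of these statistics.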
SNPs were filtered in the PC-Relate function if an individual's estimated individual-specific minor allele frequency was <0.01. As recommended by Conomos et al. (2016), the set of unrelated individuals determined from the PC-AiR analysis was used as the training set in the PC-Relate analysis. Kinship was summarised using three statistics returned from PC-Relate: the distribution of the kinship coefficients, the estimated probabilities of sharing zero alleles IBD, k(0), and the probabilities of sharing two alleles IBD, k(2). The distribution of kinship coefficients was also plotted against coastal distance. Each crocodile's sampling coordinates were matched to the closest coastal position (from the Queensland coastline spatial layer used by Taplin et al., 2020), and the distance was computed between each pair of individuals.
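The coastal-distance calculation can be sketched in Python under several simplifying assumptions:

- The coastline layer has been reduced to an ordered list of (latitude, longitude) vertices.
- Each sample is snapped to its nearest vertex rather than projected onto the nearest segment.
- The Earth is treated as a sphere of radius 6371 km.

The function names and these choices are illustrative; the study's own GIS workflow is not described here. For pairs within a bioregion, the Figure 6 caption indicates that straight-line distance was used, which `haversine_km` gives directly.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed spherical-Earth radius


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def cumulative_km(coast):
    """Distance along the coastline from the first vertex to each vertex."""
    cum = [0.0]
    for u, v in zip(coast, coast[1:]):
        cum.append(cum[-1] + haversine_km(u, v))
    return cum


def snap(coast, point):
    """Index of the coastline vertex nearest to a sampling location."""
    return min(range(len(coast)), key=lambda i: haversine_km(coast[i], point))


def coastal_distance_km(coast, a, b):
    """Along-coast distance between two samples after snapping each to the coastline."""
    cum = cumulative_km(coast)  # recomputed per call for simplicity; cache it for many pairs
    return abs(cum[snap(coast, a)] - cum[snap(coast, b)])


if __name__ == "__main__":
    coast = [(0.0, 0.0), (0.0, 1.0), (0.0, 2.0)]  # toy coastline along the equator
    print(round(coastal_distance_km(coast, (0.1, 0.05), (0.0, 1.9)), 1))
```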
Pairs were assigned to POPs, FSPs and 2nd-degree relatives from the PC-Relate kinship statistics using the Manichaikul criteria and four SVM classifiers. Three of these classifiers were trained on the simulated data with clutch breeding, one at each of the three migration rates. The fourth was an all-scenario model trained on five replicates of data from each of the three clutch scenarios (672,750 total points). The output from these four classifiers was then compared for concordance. Kinship status for pairs of high interest was further verified using available demographic information, including sex, body size, sampling date and location.
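The Manichaikul criteria assign degrees of relationship from the kinship coefficient using cut-offs at successive powers of two: 2^(-3/2) ≈ 0.354, 2^(-5/2) ≈ 0.177 and 2^(-7/2) ≈ 0.088. First-degree pairs can then be split into POPs and FSPs using k(0). The Python sketch below applies those kinship cut-offs. The k(0) cut-off of 0.125, midway between the expected 0 for POPs and 0.25 for FSPs, is a hypothetical choice, because the exact value used in the study is not given in this section.

```python
# Kinship-coefficient cut-offs at successive powers of two (Manichaikul criteria).
DUP_MIN = 2 ** -1.5     # ~0.354: duplicates / identical individuals
FIRST_MIN = 2 ** -2.5   # ~0.177: first-degree relatives
SECOND_MIN = 2 ** -3.5  # ~0.088: second-degree relatives
K0_POP_MAX = 0.125      # hypothetical k(0) cut-off separating POPs from FSPs


def classify_pair(phi, k0):
    """Assign a pair to DUP, POP, FSP, 2nd or U (unclassified) from PC-Relate output."""
    if phi >= DUP_MIN:
        return "DUP"
    if phi >= FIRST_MIN:
        # Parent-offspring pairs share at least one allele IBD at every locus, so k(0) ~ 0.
        return "POP" if k0 < K0_POP_MAX else "FSP"
    if phi >= SECOND_MIN:
        return "2nd"
    return "U"


if __name__ == "__main__":
    for phi, k0 in [(0.25, 0.01), (0.24, 0.27), (0.12, 0.5), (0.01, 0.98)]:
        print(phi, k0, classify_pair(phi, k0))
```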

| RESULTS
To examine genetic variation in the C. porosus population, we sampled 1,176 individuals distributed across 10 bioregions in Queensland.
We obtained DArT SNP data for these individuals; after stringent filtering, ≈3,000 SNPs and 948 individuals were analysed.

| Genetic diversity and fixation index
Genetic diversity indices across nine bioregions were summarised and showed broad evidence of lower diversity in the bioregions at the edge of the species' range (Table 1). Monomorphic loci featured prominently in several bioregions. The Fitzroy population stood out, with close to one-third of loci monomorphic, followed by the Gulf Plains-ALD (19%) and Coastal Plains-APrR (16%) bioregions (Table 1). These three regions also showed the lowest allelic richness scores.
Mean heterozygosity in the Fitzroy was approximately two-thirds that of other populations, as expected from earlier quality control investigations (Table 1 and Figure S2). The Coastal Plains-APrR bioregion showed the largest difference between observed and expected heterozygosity, suggesting an isolation-by-distance (IBD) effect for this population.
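The observed and expected heterozygosity and the proportion of monomorphic loci reported in Table 1 can be illustrated with a minimal Python sketch. It assumes biallelic SNP genotypes for one bioregion coded as 0/1/2 (with None for missing calls). It uses the simple expected heterozygosity 2pq without small-sample correction, and it assumes each locus has at least one called genotype. The study's exact estimators, and its allelic richness calculation (which requires rarefaction), are not reproduced here. `diversity_summary` is a hypothetical name.

```python
def diversity_summary(loci):
    """Mean observed (Ho) and expected (He = 2pq) heterozygosity, and the
    proportion of monomorphic loci, for one bioregion.

    loci: list of per-locus genotype lists (0/1/2 allele counts, None = missing).
    """
    ho_vals, he_vals, n_mono = [], [], 0
    for genotypes in loci:
        g = [x for x in genotypes if x is not None]
        n = len(g)
        p = sum(g) / (2 * n)                       # frequency of the counted allele
        ho_vals.append(sum(1 for x in g if x == 1) / n)
        he_vals.append(2 * p * (1 - p))
        if p in (0.0, 1.0):                        # only one allele observed here
            n_mono += 1
    k = len(loci)
    return sum(ho_vals) / k, sum(he_vals) / k, n_mono / k


if __name__ == "__main__":
    print(diversity_summary([[0, 1, 2, 1], [0, 0, None, 0]]))  # (0.25, 0.25, 0.5)
```

A mean Ho noticeably below He within a sampled unit is the kind of discrepancy described above for the Coastal Plains-APrR bioregion.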
We investigated the sensitivity of these analyses to the quality control filtering choices by generating a separate data set. This data set had no MAC filter and an HWE filter that removed a SNP only if it was out of HWE in all populations (chi-squared test, p < 0.01; 1 SNP removed). It comprised 1052 individuals and 4313 SNPs (Table S5).
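The "out of HWE in all populations" rule can be sketched in Python as a one-degree-of-freedom chi-squared goodness-of-fit test per population. A SNP is dropped only if every population rejects HWE. The sketch assumes genotype counts per population are available as (AA, Aa, aa) tuples and treats p < 0.01 as the rejection threshold. It is a hypothetical illustration rather than the study's filtering code. The p-value uses the identity that the upper tail of a chi-squared variable with 1 df equals erfc(sqrt(x/2)).

```python
import math


def hwe_chisq_p(n_aa, n_ab, n_bb):
    """p-value of a 1-df chi-squared test of Hardy-Weinberg proportions."""
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic in this population: nothing to test
    chi2 = sum((o - e) ** 2 / e for o, e in zip((n_aa, n_ab, n_bb), expected))
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-squared with 1 df


def keep_snp(counts_by_pop, alpha=0.01):
    """Keep the SNP unless it departs from HWE (p < alpha) in every population."""
    return not all(hwe_chisq_p(*c) < alpha for c in counts_by_pop)


if __name__ == "__main__":
    print(keep_snp([(50, 0, 50), (25, 50, 25)]))  # True: only one population deviates
```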
Patterns in the genetic diversity indices and FST values were consistent with those reported above (Table S6 and Figure S4). […] (Figure S15 and Table S7). For the three eastern clusters, […] (Table S8).

TABLE 1 Genetic diversity indices by bioregion generated from 948 crocodiles.

| SLiM forward simulation
For each of the six forward simulation scenarios, PCA showed that the chosen per-generation migration rates (1%, 0.1%, 0.01%) spanned the range from no sub-population separation to near-complete sub-population separation (Figures S20 and S21). As expected, the means and variances of the PC-Relate kinship statistics were stable across the sub-population differentiation scenarios (Tables S9 and S10). The kinship coefficient showed some consistent inflation over replicates for 2nd-degree relatives, a category that included all HSPs, FTPs and GGPs. The largest qualitative difference between the no-clutch and clutch simulation scenarios was the expected sharp increase in FSPs and FTPs under clutch breeding. GGPs were more prevalent in the no-clutch simulations, in which FSPs appeared rarely; we therefore did not summarise FSP clustering for the no-clutch simulations. Across scenarios, the F1 macro score was 0.861 for the SVM, 0.698 for the Euclidean clustering, and 0.720 for the Manichaikul criteria (Figure S22). The F1 macro score was higher in the clutch simulations than in the no-clutch scenarios owing to improved performance on FSP clustering (Figure S22).
Although the Euclidean clustering performed reasonably well, it had poor precision for POP classification: it assigned pairs with k(0) well away from 0 to the POP class (see Figure S23 for a representative example).
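The exact Euclidean clustering procedure is not described in this section. One simple version, sketched in Python below as an assumption, assigns each pair to the kin class whose theoretical (kinship coefficient, k(0), k(2)) values are nearest in Euclidean distance. The expected values are 0.25, 0, 0 for POPs; 0.25, 0.25, 0.25 for FSPs; 0.125, 0.5, 0 for 2nd-degree relatives; and 0, 1, 0 for unrelated pairs. Under this rule, a pair with kinship 0.25, k(0) = 0.12 and k(2) = 0 is still labelled a POP. This illustrates how pairs with k(0) away from 0 can be absorbed into the POP class. `nearest_class` is a hypothetical name.

```python
import math

# Theoretical (kinship, k0, k2) values for each kin class.
EXPECTED = {
    "POP": (0.25, 0.0, 0.0),
    "FSP": (0.25, 0.25, 0.25),
    "2nd": (0.125, 0.5, 0.0),
    "U": (0.0, 1.0, 0.0),
}


def nearest_class(phi, k0, k2):
    """Assign a pair to the kin class with the closest expected statistics."""
    point = (phi, k0, k2)
    return min(EXPECTED, key=lambda c: math.dist(point, EXPECTED[c]))


if __name__ == "__main__":
    print(nearest_class(0.25, 0.12, 0.0))  # POP, despite k(0) well above 0
```

Because the classes are separated only by fixed Euclidean distances in this sketch, noisy estimates near class boundaries are easily misassigned. Learning the decision boundaries from simulated data, as the SVM does, can avoid this.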
If we assume a clutch scenario, and that the SVM clustering performance translates to real data, then we can expect the false-discovery rate (1 − precision) to be <5% for POPs, ≈10% for FSPs and ≈20% for 2nd-degree relatives (Figure S22).

| Saltwater crocodiles
We estimated recent genetic relatedness in the saltwater crocodile population using the PC-Relate method. KING-robust kinship estimates for all sampled pairs were used in the PC-AiR analysis. These estimates showed a distortion of their expected distribution due to structure in the crocodile population (Figure S24). The first two PCs from PC-AiR showed the differentiation of the population and partitioned the sample into 213 unrelated and 736 related individuals (Figure S25). Overall, most kin pairs (all-scenario SVM classification) were observed within bioregions, with sparse across-bioregion kin (Table 2). The pattern of kin across bioregions agreed strongly between the SVM results and the Manichaikul criteria (Table 2 and Table S11). The all-scenario SVM was more conservative than the Manichaikul criteria, reporting many fewer FSPs and 2nd-degree relatives. This is expected given the lower false-positive rates observed in the simulation study. The kinship statistics for across-bioregion kin pairs were well distributed throughout the kinship statistic range, suggesting that these pairs are not simply false positives at the threshold boundary between kin-pair types (Figure S27). We therefore expect the qualitative conclusions for across-bioregion kin pairs to be the same regardless of which kin clustering method is used.

FIGURE 4 Admixture plots generated from STRUCTURE clustering analyses. Each horizontal coloured line represents an individual sample from the bioregion indicated on the right panel. Results are based on five replicate runs for each K, checked for convergence. For each K, individuals are ordered within bioregion by their major ancestry proportion. Bioregion abbreviations are ALD, Albert-Leichhardt drainage; APrR, Ayr–Proserpine–Rockhampton; CA, Cooktown–Ayr; MGD, Mitchell-Gilbert drainage; NFD, Norman-Flinders drainage.

FIGURE 5 Map of samples differentiated into five clusters from the DAPC with k-means clustering analysis of saltwater crocodile genetic data. Points correspond to individual sampled crocodiles and are coloured by the five clusters determined from the DAPC analysis (see Figure S14). Each bioregion is coloured according to the DAPC cluster with the highest number of individuals in that bioregion. The Gulf Plains-MI and Burdekin bioregions are shown and coloured for completeness but were not included in the analysis. Bioregion abbreviations are ALD, Albert-Leichhardt drainage; APrR, Ayr–Proserpine–Rockhampton; CA, Cooktown–Ayr; CMC, Cape Melville–Cooktown; MI, Massacre Inlet; MGD, Mitchell-Gilbert drainage; NFD, Norman-Flinders drainage.
Reported FSPs and 2nd-degree relative pairs were observed at a maximum coastal distance of 1900 km. These pairs were between several individuals from the North West Cape York and Coastal Plains-CA bioregions, which correspond to the cross-bioregion set in Table 2. Parent-offspring pairs were predominantly observed within 10 km of each other, with four long-range pairs at a coastal distance >400 km (Table S12). All POPs showed length and birth-class differences consistent with this kin type. All long-distance across-bioregion kin pairs were investigated as a potential result of translocation (see below).

FIGURE 6 Scatterplots of the kinship coefficients and statistics from PC-Relate analysis of the saltwater crocodile population. Panels (a, b) show the distributions of the estimated kinship coefficient, k(0) and k(2) for all pairs from the PC-Relate analysis. They also show the classification into parent-offspring (POP), full-sibling (FSP) and 2nd-degree relatives, including potential half-sibling, grandparent/grandchild and full-thiatic pairs. The category U encompasses all pairs that were not classified as first- or second-degree relatives. Dashed lines show the expected values for each of the statistics. Panel (c) displays the scatterplot of the kinship coefficient versus the geographic distance between each pair, along with the classification for each pair. For pairs in different bioregions, the geographic distance is the distance along the coastline; for pairs within a bioregion, it is the straight-line distance between their sampling locations. The horizontal lines show the expected kinship values; the POP and FSP lines (both with expectation 0.25) are slightly jittered so that both can be seen.
Of all the kin pairs reported, 91% were between individuals sampled less than 50 km from each other (Figure 6). Kin pairs were detected in all sampled bioregions, and some across-bioregion kin pairs were found. Substantial numbers of cross-bioregion kin pairs were reported between the Coastal Plains-APrR, the Coastal Plains-CA and the Princess Charlotte Bay bioregions (Table 2). Kin pairs were also reported between the North West Cape York and Coastal Plains-CA bioregions, across which a large number of comparisons (58,638) were performed. Again, most reported kin pairs were found within bioregions. The Coastal Plains-APrR and Fitzroy bioregions showed the most extreme numbers of kin pairs (≈70%) relative to the number of comparisons performed (Table 2). To provide insight into the range of kin in the Proserpine River region, we mapped the spatial distribution of kin pairs there. The region appears densely populated by highly related crocodiles within 10 km of the river's entrance (Figure S28).
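The two summaries used above can be sketched in Python: the share of kin pairs sampled less than 50 km apart, and the number of kin pairs relative to the number of comparisons. The comparison count is n(n − 1)/2 within a bioregion of n individuals and n1 × n2 between two bioregions. The input layout is an assumption for illustration: kin pairs as (bioregion, bioregion, distance in km) tuples and sample sizes per bioregion. The toy values are not study data.

```python
from collections import Counter


def n_comparisons(sizes, a, b):
    """Number of pairwise comparisons within (a == b) or between two bioregions."""
    if a == b:
        n = sizes[a]
        return n * (n - 1) // 2
    return sizes[a] * sizes[b]


def summarise_kin(kin_pairs, sizes, threshold_km=50.0):
    """Fraction of kin pairs closer than threshold_km, and kin pairs per comparison.

    kin_pairs: list of (bioregion_a, bioregion_b, distance_km) tuples.
    sizes: dict mapping bioregion -> number of genotyped individuals.
    """
    close = sum(1 for _, _, d in kin_pairs if d < threshold_km)
    frac_close = close / len(kin_pairs) if kin_pairs else 0.0
    counts = Counter(tuple(sorted((a, b))) for a, b, _ in kin_pairs)
    rates = {key: c / n_comparisons(sizes, *key) for key, c in counts.items()}
    return frac_close, rates


if __name__ == "__main__":
    sizes = {"A": 4, "B": 3}
    pairs = [("A", "A", 2.0), ("A", "A", 12.0), ("B", "A", 400.0), ("B", "B", 5.0)]
    print(summarise_kin(pairs, sizes))
```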

| Removal of possible translocated individuals
We explored the effect of potentially translocated individuals on genetic variation, because translocation was a historical component of the C. porosus management program in Queensland between 1995 and 1999 (Brien et al., 2017). We removed individuals that showed unexpected across-bioregion DAPC assignments and were likely translocations based on cross-checked translocation records. In total, there were 38 possibly translocated individuals (Table 3). […] (Table S13).
To assess the impact that these individuals could have on the results, we removed them and then recomputed the genetic diversity statistics and fixation indices, repeated the correlation analysis between FST and coastal distance, and retabulated the kin observations. These analyses were expected to be the most influenced by the removal of these individuals.
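This section does not specify how the correlation between FST and coastal distance was tested. A common choice for pairwise distance matrices is a Mantel test, sketched below in plain Python under that assumption. The Pearson correlation of the upper triangles is compared against the correlations obtained after jointly permuting the rows and columns of the FST matrix. The function names, the one-sided alternative and the permutation count are illustrative.

```python
import math
import random


def _upper(m, order):
    """Upper-triangle entries of matrix m with rows/columns reindexed by order."""
    n = len(order)
    return [m[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def mantel(fst, dist, n_perm=999, seed=1):
    """One-sided Mantel test for a positive FST-distance correlation.

    Returns the observed correlation and a permutation p-value in which the
    observed statistic is counted as one of the permutations.
    """
    n = len(fst)
    identity = list(range(n))
    d = _upper(dist, identity)
    r_obs = pearson(_upper(fst, identity), d)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        perm = identity[:]
        rng.shuffle(perm)  # permute population labels of the FST matrix only
        if pearson(_upper(fst, perm), d) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    n = 5
    dist = [[abs(i - j) * 100.0 for j in range(n)] for i in range(n)]
    fst = [[abs(i - j) * 0.01 for j in range(n)] for i in range(n)]
    print(mantel(fst, dist, n_perm=199))
```

With only a handful of bioregions the number of distinct permutations is small, so the attainable p-values are coarse.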
Overall, the diversity statistics were similar to those from the analysis with the full set of individuals (Table S14). Small changes were observed for the Coastal Plains-CA bioregion, which is expected given that 33 individuals were removed from this bioregion.

| DISCUSSION
This study is a comprehensive investigation of the genetic variation of Queensland's saltwater crocodile population. Importantly, it includes samples from nearly the entire 4500 km coastal range of the species in Queensland and gives insights into the similarities and differences across climatically and physiographically diverse regions.

| Substantial structure
The study shows unequivocally that it would be misleading to characterise saltwater crocodiles in Queensland as a single population.

Note: The table shows a geographically reordered Table S8, with cells that contain the dominant number of individuals coloured grey to highlight the trend. Cells coloured in darker greys are further marked as possible/probable (P) or definite (T) translocations. This marking is applied because (a) these cells encompass adjacent bioregions between which migration is possible, and has clearly occurred across comparable distances in Northern Cape York and the southern Gulf Plains, but (b) they also encompass bioregions in which deliberate translocations are known to have occurred in the 1990s and early 2000s (Queensland Department of Environment and Science internal records).

| Interpretation is more difficult because animals have been translocated
The PCA, DAPC, and STRUCTURE analyses showed evidence of a small number of migrant or translocated individuals (or their close descendants). These individuals were sampled in the Coastal Plains-CA but clustered genetically with the Gulf Plains-NFD, North West Cape York, Princess Charlotte Bay, and Coastal Plains-APrR bioregions.
Furthermore, a smaller number of crocodiles sampled in North West Cape York, Princess Charlotte Bay, and Fitzroy also clustered genetically with other bioregions. The most plausible explanation is that these individuals arose from deliberate translocations as part of the Queensland Government's management program (Brien et al., 2017). Evidence in support of this conclusion comes from several observations.
The Queensland Government's management program has involved transporting problem crocodiles from some remote areas to farms and zoos on the east coast between Cairns and Rockhampton (Brien et al., 2017). Known transfers have included animals from the southern Gulf Plains, Weipa, and Princess Charlotte Bay. Escape of wild-captured crocodiles from farms and zoos is known to have been quite common. Most of the individuals found on the Coastal Plains that appear to derive from remote areas were found in waterways not far from crocodile farms and zoos. Also, between 1995 and 1999, some problem crocodiles were relocated from the east coast south of Cooktown to remote areas, including Princess Charlotte Bay and North West Cape York (Brien et al., 2017).
The DAPC data and kinship analyses provide additional support for translocation over natural migration as the likeliest explanation.
It was therefore prudent to investigate the results with these animals treated as the product of recent translocations rather than historical […] (Weeks et al., 2011). Assessment of conservation relocations is rarely carried out; outcomes are often unknown, and the causes of failure are rarely understood (Fischer & Lindenmayer, 2000). In our study, the translocation assessment was carried out alongside the broader analyses. It highlighted that translocation can mask migration and loss of genetic diversity (e.g., in the Fitzroy population). For future management, focussed translocation could be a viable conservation action for restoring genetic variability in the southern Queensland populations, with ongoing genetic monitoring required to assess its impact.

| Limited dispersal and site fidelity: comparisons with the Northern Territory, Australia
[…] (Cao et al., 2020) showed greater differentiation over smaller geographic distances than observed for C. porosus in our study. This might be expected from the very limited capacity of C. johnstoni to exploit coastal waters to move between catchments and its greater tendency towards site fidelity (Tucker et al., 1997). C. porosus, in contrast, has well-documented long-distance movement capabilities, including in coastal waters (Campbell et al., 2013; Fukuda et al., 2019; Read et al., 2007).
In the Northern Territory, Fukuda et al. (2022) concluded that dispersal of C. porosus was greatest from high-density populations at or near carrying capacity with access to extensive high-quality breeding habitat. In Queensland, the bioregion most comparable with the prime habitat in the Northern Territory is North West Cape York. It contains 40% of the total C. porosus population in Queensland and the largest area of high-quality nesting habitat (Taplin, 1987; Taplin et al., 2020). Nests in Port Musgrave produce some hundreds of hatchlings annually, yet population counts have been essentially stable numerically since the late 1980s (Taplin et al., 2020). We might therefore expect this bioregion to be an important source of dispersing juveniles and subadults, populating adjacent bioregions and low-density river systems further afield in the southern Gulf. DAPC data suggest that this has historically been the case to a modest extent. Kinship data, however, show little evidence of it in contemporary timeframes: just one 2nd-degree kin pair was identified in some 42,000 pairwise comparisons within the Gulf Plains.
Future work to compare landscape genetics results from the Queensland population with those reported in Fukuda et al. (2022) could further help to disentangle how the environmental differences between the two States influence connectivity and dispersal.
This would require characterisation of the Queensland landscape features relevant to crocodile dispersal, which will be complex.
Queensland has large regions similar in physiography to the Northern Territory, with extensive plains subject to periodic flooding and seasonally diffuse boundaries between catchments. It also has large areas with very narrow coastal plains, steeper topography, and river […]

| Limited dispersal and site fidelity: environmental and density-dependent factors
The environmental factors leading to the marked differences between the Northern Territory and Queensland are obscure. There are no obvious physical barriers to coastal dispersal of crocodiles from North West Cape York to the southern Gulf, and one large individual has been satellite-tracked making excursions from Port Musgrave to the Norman River and back (Campbell et al., 2010). Read et al. (2007) and Campbell et al. (2010) […]

Some part of the difference between the Northern Territory and Queensland results may best be explained by the different trajectories of the two populations after they were protected in the early 1970s. In the Northern Territory, very extensive freshwater swamplands were difficult to hunt. These swamplands consequently sheltered a significant remnant population of breeding-size adults while the broader population was driven to commercial extinction (Webb et al., 1984). The Queensland population, in contrast, is largely confined to riverine habitats, where crocodiles were more vulnerable to hunting pressures and often driven to extremely low densities. Despite 11 years of protection, spotlight surveys of 424 km of five southern Gulf rivers in 1985 revealed counts of only 48 non-hatchlings (NH), an average of just one animal per 9 km of waterway (Taplin, 1988). The five rivers surveyed were the Albert, Leichhardt, Bynoe and Smithburne Rivers and Duck Creek.
Since 1971, the Northern Territory population has recovered rapidly from hunting and has grown to a high average density (>5.26 non-hatchlings per km) and high absolute numbers (100,000 or more) (Fukuda et al., 2020). Queensland's population, on the other hand, has shown a much slower overall rate of increase. It has reached only 1.65 non-hatchlings per km, compared with 5.26 across surveyed rivers in the Northern Territory (Taplin et al., 2020). There is abundant evidence of density-dependent processes in C. porosus populations (summarised in Fukuda et al., 2020). It is therefore plausible that the Queensland population has not reached sufficiently high densities across a broad enough area to trigger the levels of long-range dispersal that would be detectable in the kinship analyses. If that is the case, it will be important to consider both the long- and short-term components of genetic relationships in management plans. Continuing pressure on local populations through management interventions may prevent them from ever reaching levels at which 'natural' exchanges with adjoining populations occur. Such populations will then need to be considered as essentially isolated or localised, rather than as dynamic components of a broader Queensland-wide population.

| Parent-offspring pairs
Individuals in parent-offspring pairs were predominantly found within 10 km of each other (six exceptions likely arising from translocations). However, POPs made up only a very small proportion of all kin pairs (≈0.3%). One factor contributing to the small number of POPs is that only 10% of the sampled population was over 2.4 m total length (TL), the approximate initial breeding size for females. Smaller individuals can contribute to POPs, but only if the birth-cohort gap between the individuals in the pair is greater than the age at sexual maturity, which did occur in our study. The proportion of larger C. porosus capable of breeding (>2 m) has increased over time in Queensland, from 17% in 1984–1989 to 27% in 2016–2019. Even so, the population is still dominated by immature animals (Read et al., 2004; Taplin et al., 2020).

| Full sibling pairs -mate fidelity
We observed a substantial number of full-sibling pairs, which could arise for a myriad of potentially interacting reasons including sam-

| Proserpine and Fitzroy -small, isolated, and low genetic diversity
The Proserpine and Fitzroy Rivers were each found to be populated by highly related individuals, with the most extreme numbers of kin relative to the comparisons performed (68% and 69% respectively).
One of the highest counts of FSPs was found in the Proserpine River, which has a very high density of very large adult crocodiles (1.3 per km) in just 22 km of waterway (Taplin et al., 2020;Taplin et al., 2021).
Furthermore, despite their geographical proximity these systems showed no evidence of being connected across any of the analyses.
The historical, environmental, and biological reasons for the low genetic diversity and seeming isolation of these populations are not clear, but some possibly relevant factors can be identified. The Proserpine River was historically bounded by extensive swamplands on the Goorganga Plains. These would have provided much favourable habitat and many nesting sites but have largely been replaced by intensive pastoral development (Taplin et al., 2021). […] (Read et al., 2004; Taplin, 1987; Taplin et al., 2020; Taplin et al., 2021).

[…] movements between breeding events. They have the potential to reveal very recent barriers to gene flow and to quantify contemporary migration rates between populations (Feutry et al., 2017, 2020). Furthermore, recent research has used the spatial distribution of close kin to assess whether a population is currently totally philopatric, panmictic or exhibits sex-linked reproductive connectivity and dispersal (Patterson et al., 2022). Epigenetic age could potentially be estimated from the biopsies used for this study, but this would require an initial validation data set of individuals of known age (Husby, 2022; Mayne et al., 2021). Future analysis of mitochondrial DNA, combined with epigenetic aging and subsequent kinship analyses, has the potential to reveal sex-specific structure at a finer scale. This would be a substantial advance, because aging techniques for crocodiles have proved very problematic: the remobilisation of calcium in bones obscures inference from growth-ring aging methods.

| Future sampling
Genetic sampling of the Burdekin River catchment would likely give additional insights into the effects of agricultural development on the Queensland population. This river extends over 200 km inland. A small breeding population of saltwater crocodiles has persisted there at low density for some decades, concentrated between the Burdekin Dam Wall at 167 km inland and Clare Weir at 58 km (Taplin et al., 2021; Taplin & Pople, 1987). The population appears to have become largely isolated from the coast by a series of weirs and dams constructed between the 1950s and the late 1980s. It likely persisted in this inhospitable habitat despite hunting because the river is very wide, strewn with extensive sand and rock bars, and barely navigable even in a small boat over long reaches. A modest population increase in recent decades (Taplin et al., 2021) has likely been driven by very occasional successful nesting in a small population of mature animals. Genetic testing seems likely to reveal a highly inbred population with large proportions of close-kin pairs, similar to the Fitzroy River. Insights into the genetic history of this population would add to our understanding of long- and short-term influences on these rather isolated aggregations. The population is of particular biological and evolutionary interest, as it offers opportunities to study salt gland function in individuals that have never been exposed to salt water.

| Management implications
The aim of the Queensland saltwater crocodile management program is to balance the need to minimise human-crocodile conflict while not negatively impacting the conservation of the species in the state (Brien et al., 2017). The current management program attempts to reduce the number of crocodiles in and around areas where there are large human populations, while at the same time preserving crocodile numbers in more sparsely populated remote areas and protected habitat along the coastline south of Cooktown (Taplin et al., 2020).
Public discourse on crocodiles and crocodile management in Queensland tends to characterise the population as a single entity. It also tends to view the population's post-protection recovery and growth as more or less uniform across the State and as having much in common with the somewhat spectacular recovery seen in the Northern Territory. That view has been challenged by the extensive analysis of survey results across 40 years by Taplin et al. (2020), which showed great differences in population density, growth trajectories and rate of population increase across Queensland's diverse geography. This study adds unprecedented detail to that state-wide picture by highlighting the limited interconnectedness of major regional concentrations of crocodiles, many of which are separated by large expanses of marginal or poor-quality habitat.
For wildlife management purposes, Queensland's saltwater crocodile population is best considered to comprise six sub-populations. These are centred in the Gulf Plains, North West Cape York, Princess Charlotte Bay, the eastern Coastal Plains between Cooktown and Ayr, the Proserpine River, and the Fitzroy River. This regionalisation recognises the five sub-populations that dominated the genetic analysis but also separates out the geographically disjunct Fitzroy River population. Each of these sub-populations has some connectedness with adjacent populations. That connectedness, however, is much greater between the Gulf Plains and North West Cape York bioregions than between the eastern coastal plains sub-populations. Indeed, the Proserpine River presents as surprisingly disjunct from the Coastal Plains-CA and Fitzroy River sub-populations. This is perhaps a result of inhospitable habitat and unfavourable tidal regimes peculiar to that part of Queensland (Taplin et al., 2020; Taplin et al., 2021). These are important findings that need to be incorporated into future management regimes. Not least, they suggest that the effects of management actions will be far more localised than previously thought. They also suggest there will be less potential for adverse effects on local populations to be ameliorated by dispersal from distant sources of recruitment. This is brought into particular focus by the ambiguous findings of this study in relation to Princess Charlotte Bay and its important sub-population centred on Rinyirru-Lakefield National Park.
The crocodile population there has increased considerably since the first formal surveys of the late 1980s (Taplin et al., 2020). It is a potentially important source of dispersing juveniles and sub-adults available to populate adjacent bioregions. The DAPC cluster analysis summarised in Table 3 shows some interchange with North West Cape York and, to the south, with the Coastal Plains-CA bioregion. However, we know that for some years in the late 1990s considerable numbers of crocodiles were moved between these three bioregions as part of Queensland's problem crocodile management effort. Princess Charlotte Bay is separated from the main centres of crocodile population in North West Cape York (at Albatross Bay and Port Musgrave) by some 900 km of mostly low-quality habitat (Taplin & Pople, 1987).

DATA AVAILABILITY STATEMENT
Results of the present study were generated using R version 4.1.2.