On the use of genome‐wide data to model and date the time of anthropogenic hybridisation: An example from the Scottish wildcat

While hybridisation has long been recognised as an important natural phenomenon in evolution, the conservation of taxa subject to introgressive hybridisation from domesticated forms is a subject of intense debate. Hybridisation of Scottish wildcats and domestic cats is a good example in this regard. Here, we developed a modelling framework to determine the timescale of introgression using approximate Bayesian computation (ABC). Applying the model to ddRAD‐seq data from 129 individuals, genotyped at 6546 loci, we show that a population of wildcats genetically distant from domestic cats is still present in Scotland. These individuals were found almost exclusively within the captive breeding programme. Most wild‐living cats sampled were introgressed to some extent. The demographic model predicts high levels of gene‐flow between domestic cats and Scottish wildcats (13% migrants per generation) over a short timeframe, the posterior mean for the onset of hybridisation (T1) was 3.3 generations (~10 years) before present. Although the model had limited power to detect signals of ancient admixture, we found evidence that significant recent hybridisation may have occurred subsequent to the founding of the captive breeding population (T2). The model consistently predicts T1 after T2, estimated here to be 19.3 generations (~60 years) ago, highlighting the importance of this population as a resource for conservation management. Additionally, we evaluate the effectiveness of current methods to classify hybrids. We show that an optimised 35 SNP panel is a better predictor of the ddRAD‐based hybrid score in comparison with a morphological method.

The remaining Scottish wildcat population is believed to be small, whereas hybrid cats are prevalent in certain areas; in a 2017/18 survey of wildcat conservation "Priority Areas" (Littlewood et al., 2014) the ratio of un-neutered hybrids to wildcats was estimated at 6:1 (Breitenmoser et al., 2019). The wild-living population in Scotland now resembles a "hybrid swarm"-a continuum of genetic backgrounds as a result of repeated back-crossing and mating between hybrids (Beaumont et al., 2001;Senn et al., 2019). A recent review of wildcat conservation in Britain by the IUCN concluded the population was "too small, with hybridisation too far advanced and the population too fragmented" to be considered viable (Breitenmoser et al., 2019).
Uncertainty surrounds the temporal patterns of hybridisation in Scotland. Domestic cats are thought to have become widespread during the Roman occupation of Britain ~2000 years ago (Serpell, 2014), although cat remains have been found at Iron Age sites, including on the Orkney islands off the north coast of Scotland (Macdonald et al., 2010;Smith, 1994). The wildcat population dramatically declined during the 18th and 19th centuries due to hunting and habitat loss, and by the start of the 20th century wildcat range in the Britain was limited to north-west Scotland. Significant introgression is believed to have occurred within the last 100 years, following wildcat population expansion, increasing contact between the small remaining population of wildcats and domestic cats (Breitenmoser et al., 2019).
Historic samples, collected over the last c. 100 years, support an acceleration of hybridisation in Scotland over this period (Senn et al., 2019).
Without a comprehensive understanding of hybridisation history or dynamics, or the impact of introgressive hybridisation on fitness, conservation of this species in Britain is not straightforward. Accurate population estimates are difficult to obtain due to the elusive nature of the species and limited ability to distinguish hybrids in the field based on morphology (Breitenmoser et al., 2019). This problem is compounded by the lack of a baseline reference for Scottish wildcats. The difficulties inherent in distinguishing wildcat and hybrid phenotypes results in haphazard protection, impedes accurate monitoring, and undermines the Scottish wildcat's legal status as a protected species.
The Scottish wildcat has served as a canonical example of domestic-wild hybridisation more generally. The aim of this study was, first, to clarify the population structure of wildcats in Scotland using a two-fold increase in the number of genetic markers compared to the most recent study (Senn et al., 2019). For this, we used ddRAD-seq data; ddRAD-seq is an efficient way to sample thousands of markers for genome-wide estimates of hybridisation (Peterson et al., 2012).
Increasing the number of markers increases power to accurately identify complex hybrids and backcrosses (McFarlane & Pemberton, 2019), giving the greatest resolution to date of the hybrid swarm in Scotland.
We also used an expanded set of markers to evaluate the effectiveness of current tests to identify hybrid individuals. Importantly, we aimed to estimate the timescale of hybridisation using a model that predicts the observed pattern of population structure. A demographic model for Scottish wildcats has been developed using an approximate Bayesian computational (ABC) framework (Beaumont et al., 2002), a model-based approach to parameter inference rooted in Bayesian statistics. For this we have developed an efficient two-stage modelling approach, and a novel method for choosing summary statistics to improve model fit.
Finally, we show how this neutral demographic model is a useful tool to evaluate the accuracy of methods to identify genes that are subject to natural selection in structured populations, and hence calibrate them.

| Data processing
ddRAD-seq data were generated for 129 individuals sampled between 1996 and 2017 (Senn et al., 2019). This included 71 individuals from the UK captive wildcat population, 53 individuals from the wild in Scotland (22 Scottish Wildcat Action www.scott ishwi ldcat action.org trapped cats, 31 roadkill samples) and five Scottish domestic cats (for full sample details see Table S1).
This study represents a new bioinformatic analysis of the sequence reads produced by Senn et al. (2019), incorporating an additional 51 captive and two wild individuals, as well as the original 76 samples. Historical wildcat samples derived from museum specimens reported in Senn et al. (2019) could not be used in this study due to poor DNA quality. Sequence reads were generated using the Illumina MiSeq Platform, as described in Senn et al. (2019). As per Senn et al. (2019), reads were demultiplexed by barcode and quality filtered using the STACKS module, process_radtags (Catchen et al., 2013).
Demultiplexed reads were trimmed to 135 bp and concatenated into a single read file per individual. Analysis of raw sequence reads diverged from that of Senn et al. (2019) from this point forward (described below), significant differences included the alignment of reads to the domestic cat reference genome, a lower read depth threshold to identify loci using STACKs and stringent filtering of missing data.
Sequence reads were aligned using BWA (Li & Durbin, 2009) to the Felis catus reference genome v9.0 (GCF_000181335.3) (Pontius et al., 2007). The proportion of mapped reads appeared to be high, even for putative wildcat samples (Table S2). We visually checked read alignment and linkage disequilibrium (LD) using Haploview (Barrett et al., 2005). A proportion of pairwise comparisons were affected by LD, but this was judged to be small and unlikely to affect downstream analysis. Mapped reads were processed using stacks v2.1 (Catchen et al., 2013). In STACKs a minimum of three reads were required to form a "stack". We allowed multiple SNPs per read, and the mean number of SNPs per read across the final data set was 1.6. Variants were filtered using a minimum allele frequency of 0.05 and maximum proportion of heterozygous individuals of 0.7, and the three sample sources (domestic, wild-living, and captive) were treated as separate populations. plink v1.9 (Chang et al., 2015) and vcftools v1.15 (Danecek et al., 2011) were used to filter data from STACKs. Specifically, this led to the removal of individuals with >30% missing data and stringent subsequent filtering of loci to remove all sites with missing data. Closely related individuals were identified using IBD estimates calculated by PLINK, corrected to account for admixture using the method described by Morrison (2013). Corrected IBD estimates were used as input for PRIMUS (Staples et al., 2014) which uses genetic data to reconstruct pedigrees up to third degree relatives. Individuals were then removed from the data set to limit relatedness (for the full list of excluded individuals see Table S1). Population genetic summary statistics (observed and expected heterozygosity, inbreeding coefficient and pairwise F ST ; Weir & Cockerham, 1984) were generated using PLINK and VCFtools.

| Population structure
Principal component analysis (PCA) and ADMIXTURE (Alexander et al., 2009) were used to examine population structure. PCA was completed in R using prcomp. ADMIXTURE analyses were performed for seven values of K, ranging from 2-8, and included a calculation of cross-validation error to estimate the optimal value of K. All SNPs were included, and the data were not considered dense enough to require thinning (to minimise background linkage disequilibrium) prior to the analysis (Alexander et al., 2009).

| Existing hybrid tests
Hybrid individuals are currently identified using a combination of genetic and morphological diagnostic tests: a seven-point pelage scoring system (Kitchener et al., 2005) and a 35 SNP genetic test (Senn & Ogden, 2015). The pelage test (7PS) scores seven key morphological characteristics on an ordinal scale of 1,2,3 for domestic, hybrid or wildcat features, respectively. Putative wildcats score 19 or higher on this test (maximum score 21), although a lower threshold of 17 can be used to overcome possible recorder error, for example, from poor quality camera-trap photos. The genetic test uses 35 SNPs that differentiate between wildcats and domestic cats (Nussberger et al., 2013;Senn & Ogden, 2015). A "hybrid score" is generated using STRUCTURE Q values between 0 and 1 (Pritchard et al., 2000); higher values correspond to individuals with more wildcat ancestry. An LBQ score (i.e., the lower boundary of the Q value 90% CI) of 0.75 is proposed as the threshold to class individuals as putative wildcats, as distinct from hybrids (Senn & Ogden, 2015). Individuals with an LBQ ≥0.75 are currently considered wildcats from a conservation management perspective.
We compared the performance of these hybrid tests using admixture Q values (Q6546) from the ddRAD-seq data to determine hybrid status. None of the 35 SNPs from the genetic test were present in the ddRAD-seq data. Data were only included from individuals where both 35 SNP and pelage scores were available (n = 59). The aim of this analysis was to compare the performance of these tests with diagnoses from a relatively dense marker set. Given the continuum of Q values observed in wild-living cats, a strict threshold (Q6546 ≥ 0.9) was used to select reference wildcat samples, but we recognise this threshold is somewhat arbitrary and does not necessarily denote "true wildcat" status. Individuals with Q6546 ≥ 0.9 were classified as wildcat reference samples, and those below 0.9 as hybrids (note the threshold for the genetic test used by the conservation program, LBQ35 ≥ 0.75, is a management decision, and a higher threshold was used to select reference samples for ROC analysis). Receiver operating characteristic (ROC) curves were then constructed to assess performance (Robin et al., 2011). Given the reference diagnosis, the true positive and false positive rates were calculated for both diagnostic tests at all possible threshold values.
Plotting false positive rate against true positive rate (specificity vs. sensitivity) for each classification threshold generated an ROC curve for each test. The area under the curve (AUC) is equivalent to the probability a test will rank a random positive instance higher than a random negative instance and is a useful metric to compare diagnostic tests. An AUC of 0.5 is essentially a random guess and an AUC of less than 0.5 is worse than random.

| Modelling wildcat demography
A demographic model for wildcats was developed within an ABC framework (Beaumont et al., 2002). ABC was developed as a rejection algorithm (Pritchard et al., 1999), in which simulated data are generated under a hypothesised model of evolution, with model parameters sampled from a known prior distribution.
Summary statistics are taken from both the simulated data and observed data. An accepted sample of simulations (those with summaries closest to the observed data) are then used to estimate posterior distributions of the model parameters. Posterior estimates from the basic rejection algorithm can be improved with local linear (Beaumont et al., 2002) or nonlinear regression (Blum & François, 2010).
The model developed for wildcat demography is outlined below. Wildcat and domestic cat populations diverge, under a neutral model of evolution, for 500 generations. Generation time for a wildcat was estimated to be three years (Beaumont et al., 2001;Nussberger et al., 2018), 500 generations (~1500 years) therefore approximately spans the time domestic cats and wildcats are thought to have been sympatric in Britain (Serpell, 2014). Given the focus on recent demography, and in view of the low mutation rate of SNPs, a two-stage "mutation free" approach (Beaumont, 2004) was used. We first model the divergence of the two populations from a common ancestor using a computationally efficient method in which the starting SNP frequencies for each population were simulated from a beta-binomial distribution, parameterised by F ST (Balding & Nichols, 1995). We achieve this by simulating from three beta distributions, the parameters for which we treat as nuisance parameters in the statistical model ( Figure S1A), and the priors for which are given in Table 1. The metapopulation SNP frequency, X, is simulated from beta(1, ancvar), which assumes that the nonreference allele is typically rarer (empirically confirmed).
Parameters F 1 and F 2 , model drift from the ancestral baseline for domestic and wildcat, giving frequencies beta(X(1−F 1 )/F 1 , (1−X)(1− F 1 )/F 1 ) and beta(X(1−F 2 )/F 2 , (1−X)(1−F 2 )/F 2 ), respectively. The finite population frequency is then a binomial sample of size 2pop 1 and The lower bound of this distribution was limited to 60 to avoid simulating a population of captive individuals smaller than the target data. c (gamma distribution with shape parameter α = 1 and scale parameter θ = 1)/size of captive population.
TA B L E 1 Full set of model parameters, the prior distribution and posterior mean are given for each parameter Previous analysis indicates a complex and patchy pattern of hybridisation, difficult to model on a large scale (Kilshaw et al., 2016;Senn et al., 2019).
Prior distributions for these demographic parameters were chosen based on existing knowledge of the model system. The prior for ancvar followed an exponential distribution. The F ST between captive wildcats and domestic cats reported by this study is 0.446, therefore priors for F 1 and F 2 followed a beta distribution sampling values around 0.2. Priors for effective population sizes followed a log normal distribution, with a fixed lower bound for the captive population of 60 individuals (preventing simulations with a fewer number of individuals then the target data). Fairly wide priors were used for wild-living and domestic cat population sizes; accurate estimates of census population size, both historic and current, are difficult to obtain, especially considering difficulties distinguishing wildcats from hybrids (Macdonald et al., 2010). Mig 1 was a parameter of particular interest as it corresponds to the rate of introgression from domestic cats. This prior followed a beta distribution allowing a migration rate of up to 0.6 per generation. The UK studbook for wildcats informed priors relating to the captive population (Barclay, 2019). Gene-flow between the wild-living and captive populations (mig 2 ) was constrained to be relatively small (around 0.01); we know from studbook records that only a small number of additional wild founders (between one and six) have been incorporated at any one time over the populations history. A more informative prior was given to T 2 as we know the captive population was established in 1960. Importantly, a wide prior was chosen for T 1 , allowing hybridisation to begin at any point in the simulation, before or after T 2 . The priors for T 1 and T 2 were completely independent. For a summary of prior distributions see Table 1. The target and simulated summary statistics were scaled to have unit variance prior to PCA rotation. The nearest neighbour distance (nnd) is an estimate of a quantity proportional to density (Silverman, 1998), in this case the prior predictive density. The idea was to compare the nnd of the target to the nnd of all the simulated points.
We can then define a highest prior predictive density (HPPD) band, for example, HPPD 0.95 , such that 95% of all simulated points have nnd > HPPD 0.95 . The nearest neighbour distances were computed, each time leaving out one summary statistic, allowing summaries resulting in the largest distance between the target and simulated data to be identified. The process was repeated (permanently dropping the worst performing summary statistic from the previous round, PCA rotated and scaled, and Mahalanobis nnd recomputed) until nnd of the target > HPPD 0.95 .
Parameter inference was carried out in R using the package abc (Csilléry et al., 2012). The closest 5091 points (1%) were used to generate the posterior distributions, correcting for an imperfect match between the summary statistics and observed data using nonlinear regression (neural network) (Blum et al., 2013;Raynal et al., 2019).

| Using the wildcat model to calibrate tests for selection
A model for wildcats, such as the one described above, is impor- The data were screened for selection using two methods, the R program pcadapt (Luu et al., 2017) and bgc (Gompert & Buerkle, 2011). Pcadapt uses a PCA-based method to identify candidate loci that are outliers with respect to population structure, bgc implements a Bayesian genomic cline model to quantify locus-specific patterns of introgression as a function of genome-wide admixture.
These methods were also applied to 10 simulated data sets selected at random from the posterior distribution of the wildcat model.
For pcadapt the first three principal components were used in the analysis, following Cattell's Rule that eigenvalues relating to random variation lie on a straight line, and those relating to population structure depart from the line (Cattell, 1966). We focused on outliers correlated with PC1, these relate to SNPs with large variation in allele frequency between the parent populations (included in the analysis) and may therefore represent "wildcat" or "domestic" loci under selection in the hybrid population. pvalues <1 × 10 −6 were investigated as outliers (equivalent to 0.01

Bonferroni corrected).
Unlike pcadapt, bgc requires parental populations to be defined Population summary statistics are given in Table 2.
An ADMIXTURE model with two ancestral populations (Figure 1b, K = 2) also supported distinct clustering of domestic cats and captive wildcats. The majority of wild individuals sampled had probable ancestry assigned to both groups, with varying amounts of "domestic" ancestry. PC1 position was strongly correlated with admixture Q values at K = 2 (Spearman's r = .998, p < .001; Figure S7). K for the whole data set is 5 ( Figure S8).

| Demographic modelling
Our demographic model (Figure 3a) is capable of simulating data within the range of the observed data and the model fits these data well ( Figures S9 and S10). The first two axes of the posterior pre-

| Evidence for natural selection
In the observed data pcadapt reported three outlying SNPs most correlated with PC1 ( Figure S11). bgc found the majority of SNPs TA B L E 2 Summary statistics for the three source populations: captive wildcats, wild individuals and domestic cats and β (five with negative values of α and β, one with positive α and negative β). Two SNPs were found to be outliers by both pcadapt and bgc. These results are summarised in Table 3 and Table S4.
Simulated data produced a comparable number of outliers using both methods. With pcadapt, nine out of the 10 simulated data sets contained at least one SNP correlated with PC1 found to be outlying with respect to population structure (Table S5). The mean proportion of outlying SNPs across the 10 simulations was 0.1%. Similarly, using bgc, the mean proportion of SNPs with α and β estimates in excess was 85.4%. A total of 4.3% and 3.9% of SNPs had outlying values of α and β, respectively and 0.2% were outlying for both α and β estimates (Tables 3 and S6).
Outlier SNPs are candidates for loci under selection although extreme outliers can also be generated via neutral processes, a result of pre-existing population structure, emphasised by genetic drift.
Using a neutral model of wildcat demography to calibrate these tests, we show that the number of outlying SNPs detected in the observed data does not deviate from expectations under neutrality.

| Current status of the wildcat in Scotland
PCA and ADMIXTURE analysis (Figure 1) demonstrated that a group of individuals genetically distinct from domestic cats (putative wildcats) persists in Scotland. Genetic differentiation between these groups was supported by a high F ST , as would be anticipated between two species (Hartl & Clark, 2007), and comparable to that between dogs and wolves (Cronin et al., 2015) or red and sika deer (McFarlane et al., 2020). This supports the findings of previous microsatellite (Beaumont et al., 2001) and SNP studies (Senn et al., 2019) that were able to differentiate between domestic cats and a group of putative wildcats in Scotland. Here, we reanalysed the 76 samples used by Senn et al. (2019) and an additional 53 individuals.
We increased the resolution of the previous analysis by 3449 SNPs, and the data show the same broad patterns. Putative wildcats reported in this study were sampled almost exclusively from the UK captive population. Hybridisation in the wild appeared extensive.
A continuum of genetic backgrounds was observed, the result of repeated hybridisation, backcrossing and mating between hybrids referred to as a "hybrid swarm" (Mayr, 1963); almost all wild-living individuals sampled showed some evidence of introgression from domestic cats (Figure 1). This supports the conclusion of Breitenmoser of increasing conservation concern since the 1980s (Easterbee et al., 1991;Hubbard et al., 1992;Kitchener, 1992) and is generally thought to be a consequence of wildcat range expansion in The demographic model for Scottish wildcats has limited power to detect ancient or complex patterns of admixture. Results presented here suggest our model is unable to detect signals of admixture beyond 30 generations or c. 100 years ( Figure S10). Haplotype

F I G U R E 3 Modelling wildcat demography. (a)
The model under which data were simulated; two parent populations (F. catus and F. silvestris) diverge under a neutral model of evolution. Gene-flow (introgression) from domestic cats begins at time T 1 , at a rate of mig 1 for every subsequent generation. At time T 2 the captive population is formed from a random sample of wild-living cats. Limited gene-flow from the wild population into the captive population occurs at a rate of mig 2 . Two possible scenarios are shown, one where gene-flow from domestics precedes the founding of the captive population, that is, T 1 precedes T 2 , (top), and an alternative scenario where T 1 occurs after T 2 (bottom). On both models an important subset of parameters is shown, for the full set of parameters see Figure S1. (b) PCA plots for the real data (left) and for a random sample of simulated data from the posterior distribution (right). The model was broadly able to simulate the same patterns we observe in the real data. (c) Prior and posterior distributions following ABC, dashed lines indicate the prior. Curves were fitted in R using locfit (Loader, 2013). The model supports recent introgression in the Scottish wildcat population following high gene-flow from domestics and linkage disequilibrium information (from sequence data) are needed for accurate dating of admixture events, especially to separate historical admixture from the very recent (Hellenthal et al., 2014;Loh et al., 2013); this work in whole genome sequenced individuals is now underway.
A recent hybridisation time for Scottish wildcats only seems likely in the face of high gene-flow from domestic cats. Our model estimates gene flow to be 13% (95% HPD: 7%-19%). This estimate implies 13% of gene copies in wild-living cats come from the domestic population per generation. This appears to contrast with the estimate of one wildcat to six unneutered hybrids reported by Breitenmoser et al. (2019); however, it should be noted that these hybrids will not be exclusively F 1 s and will have lines of descent from the domestic population extending over many generations. Quilodrán et al. (2020), using a forward simulating approach to model introgression in the Swiss Jura wildcat population, estimated the rate of introgression to be 6%. At this lower rate of introgression, it took 26 generations for the wildcat population to become 50% introgressed.
Tentative evidence is presented here that the "hybrid swarm" effect can develop rapidly following the breakdown of isolating mechanisms between two species, as has been observed in other hybridising species such as deer (Smith et al., 2014), loaches (Kwan et al., 2014) and honey-bees (Pinto et al., 2005). Our results may also support a recent acceleration of hybridisation in Britain. Although it is difficult to conclude using the current model whether historical admixture has occurred (and to what extent), it is clear there has been significant recent introgression within the last few decades.
An important feature of the model is the captive wildcat population. There is significant interest surrounding this population, which comprises individuals that are among the last putative wildcats in Britain, and especially regarding its value to continuing conservation efforts. It is therefore important to understand the extent to which hybridisation has impacted this population. It is clear from Figure 1  to replicate in the model (Figure 3b). It is hard to disentangle the impacts of maintaining a (historically small) captive breeding population, for example, inbreeding, genetic drift, or adaption to captivity (Frankham, 2018;Woodworth et al., 2002), from genuine population structure. The presence of family groups was limited following the identification of close relatives using PRIMUS. However, estimates of relatedness are complicated by potential admixture (Morrison, 2013). Our results ( Figure S12) imply the distribution of individuals across PC2 or PC3 was not a gradient of inbreeding across the population.
Patterns relating to geographical origin in the wild samples were Note: Results are given for the observed data, as well as the mean across ten simulated data sets sampled at random from the posterior distribution of the demographic model. The proportion of total SNPs (%) is given in brackets. Using both methods, a similar proportion of outlying SNPs is reported in the observed data and data simulated under a neutral model of evolution.
of hybridisation levels into discrete units. It is interesting to note that captive individuals with probable domestic ancestry at K = 2 all belong to the same cluster at K = 3.

| A demographic model for wildcats
The demographic model developed for wildcats appeared to fit well ( Figures S3-S5, S9 and S10), and the method to improve fit by dropping poorly performing summary statistics was effective ( Figure S5).
The two-step approach was computationally efficient, and overall, the model is a useful tool to understand the recent demographic history of the wildcat population in Britain.
The modelling approach we have taken has been to assume that our data does not have sufficient information from mutations occurring over the period of hybridisation to warrant a detailed evolutionary model (Beaumont, 2004). Although linkage has been assumed absent, our model allows for linkage disequilibrium due to finite population size and migration (Waples & England, 2011), which is why we have favoured an individual-based simulation using SLiM, rather than using the coalescent to simulate independent SNPs. We have shown that the model fits the key features of the data reasonably well. We have tried to focus on a simple model to achieve this.
Further improvements may be possible, at the cost of increased parameterisation, by considering, for example, variable population size, which is possible even with unlinked SNPs (Hollenbeck et al., 2016); however, this was not an aim of the current study.  (Table 3), even using a conservative approach to controlling false discovery rate. For these analyses the wildcat model was useful for deriving a null distribution specific to Scottish wildcats.
Even at neutral loci the demographic history of a population can cause allele frequency to vary hugely in space due to genetic drift and/or migration (Hoban et al., 2016;Lotterhos & Whitlock, 2014), as demonstrated by the variability in outcomes from the simulated data (each using a different set of demographic parameters sampled from the posterior distribution). Differences in allele frequencies between domestic cats and wildcats are therefore not surprising considering the genetic differentiation between the two populations, and do not necessarily correspond to deviations from neutrality. This analysis demonstrates that a wildcat-specific model of admixture is a useful tool to evaluate specific statistical approaches in genomic analysis and provides a useful baseline with which to develop scenarios of increasing complexity, for example, incorporating selection, fluctuations in populations size or spatial models.
In this regard, our study support the conclusions of recent studies of hybridisation (McFarlane et al., 2021;Quilodrán et al., 2020).
Furthermore, it will be straightforward to extend the approach to incorporate whole-genome data in the future.

| Existing tests for hybrids
Accurately identifying hybrids in the field is crucial to effective conservation of the wildcat in Scotland. In the absence of uncontroversial reference samples, we have used a score based on 6546 ddRAD We found the 35 SNP test to be a highly accurate predictor of the ddRAD SNP score; hybrids could be identified almost as well using 35 SNPs as with a dense marker set of over 6000 SNPs. Four false positives and two false negatives were identified, although similar Q values were recovered using both marker sets for these individuals, so this may partly reflect the stringent threshold used to select reference wildcats from the ddRAD data.
Without accurate information on the history of hybridisation in Britain there is no uncontroversial baseline for Scottish wildcats with which to calibrate either diagnostic test. Therefore, we recommend the continued use of the pelage score and 35 SNP test in conjunction to identify hybrids, especially when considering individuals to be incorporated into the captive breeding programme.

| Conclusion
We found a population of putative wildcats persists in Scotland.
These individuals were almost exclusively found in the UK captive population, which appears to have been established prior to significant recent admixture and is supported by accurate tests for hybrids. The captive population is an important resource for wildcat conservation in Britain. We found the wild-living population to be a hybrid swarm and almost all wild individuals sampled showed evidence of introgression from domestic cats.

ACK N OWLED G EM ENTS
This study represents further analysis of ddRAD data first published in Senn et al. (2019); the authors of that study are thanked. The authors would like to thank again the many people who have provided samples to this data set over the years. Thanks to those who have handed wildcat samples to the National Museum of Scotland; without these repeated individual efforts, these types of study are not possible. We are also extremely grateful for the assistance of the wildcat captive holding community in the UK. We thank Danielle Gunn-Moore at the University of Edinburgh for access to domestic cat reference samples. We also thank staff, volunteers and collaborators of Scottish Wildcat Action for their participation in sampling during this project and David Barclay for discussions relating to the studbook. Storage and archiving of the study samples at https:// www.cryoa rks.org/ is supported via BBSRC grant BB/R015260/1.

Generation of ddRAD data was funded by RZSS and the Heritage
Lottery Fund via grant to Scottish Natural Heritage. We would like to thank Greger Larson for vauable feedback on the manuscript.

J. Howard-McCombe is supported by the NERC Doctoral Training
Partnership, with additional funding from the People's Trust for Endangered Species.