Genomic patterns in the widespread Eurasian lynx shaped by Late Quaternary climatic fluctuations and anthropogenic impacts

Abstract Disentangling the contribution of long‐term evolutionary processes and recent anthropogenic impacts to current genetic patterns of wildlife species is key to assessing genetic risks and designing conservation strategies. Here, we used 80 whole nuclear genomes and 96 mitogenomes from populations of the Eurasian lynx covering a range of conservation statuses, climatic zones and subspecies across Eurasia to infer the demographic history, reconstruct genetic patterns, and discuss the influence of long‐term isolation and/or more recent human‐driven changes. Our results show that Eurasian lynx populations shared a common history until 100,000 years ago, when Asian and European populations started to diverge and both entered a period of continuous and widespread decline, with western populations, except Kirov, maintaining lower effective sizes than eastern populations. Population declines and increased isolation in more recent times probably drove the genetic differentiation between geographically and ecologically close westernmost European populations. By contrast, and despite the wide range of habitats covered, populations are quite homogeneous genetically across the Asian range, showing a pattern of isolation by distance and providing little genetic support for the several proposed subspecies. Mitogenomic and nuclear divergences and population declines starting during the Late Pleistocene can be mostly attributed to climatic fluctuations and early human influence, but the widespread and sustained decline since the Holocene is more probably the consequence of anthropogenic impacts which intensified in recent centuries, especially in western Europe. Genetic erosion in isolated European populations and lack of evidence for long‐term isolation argue for the restoration of lost population connectivity.


History and status of the sampled populations
Information on the recent and current demographic status of Eurasian lynx populations is large lacking and mostly unreliable, being mostly based on unvalidated expert opinion and hunting statistics. Nevertheless, the populations sampled do cover a range of recent demographic histories and exposure to anthropogenic impacts (Table   S2). Westernmost populations in Europe are the remnants of an ancient range that started its contraction as early as the 7 th century when the species disappeared from Great Britain and followed by a progressive eastward withdrawal in continental Europe starting probably in the 16 th century and peaking in the 19 th and 20 th century with the implementation of extermination policies in many countries. Such intense persecution led to the extermination of the species in most of western Europe, with the few westernmost populations surviving going through intense and extended bottlenecks which were followed by a slow recovery once the species became protected in the 1930s-1940s. Bottlenecks and genetic isolation have been particularly intense in the Balkans, Norway (together with neighbouring Sweden) and NE-Poland, which were at the brink of extirpation during the first half of 20 th century and with little or no gene flow from other lynx populations. The Carpathians Mountains (mainly Romanian and Slovakian parts) went also through important contractions leaving 100-200 individuals by the 1930s, but recovered to more than 1000 individuals by the second half of the 20 th century. The Carpathian population has been since considered relatively large, but largely isolated from other lynx populations, and has been used as the source of animals for reintroductions in western Europe (Von Arx, Breitenmoser-Würsten, Zimmermann, & Breitenmoser, 2004). However, the first model-based estimates obtained recently suggest lynx abundance in the region has been overestimated in the past (Krojerová-Prokešová et al., 2018;Kubala et al., 2017).
Latvia is part of the northwestern population, together with Finland, western Russia, and Estonia. While it also went through declines starting in the 17 th century and intensifying in the early 20 th century, the population recovered and now occupies its full historical range with 600-800 individuals estimated at present. Importantly, the population remained well connected to the large and continuous core population farther east in European Russia, which is represented in our sampling by Kirov and the Urals.
Information on the lynx populations in Asia is even more scarce, scattered and mostly consists of fur harvest data, which not only reflect population numbers but also socioeconomic circumstances. The Asian distribution is generally considered continuous and with high densities, although these typically fluctuate due to natural (prey availability) and anthropic causes (harvest pressure), with amplitudes ranging from 7 to 13 years, being larger at northern latitudes (Matyushkin & Vaisfeld, 2003). Asian lynxes underwent also a heavy exploitation during last centuries, dating back to the XVII century, with 10 000 lynx pelts were recorder yearly in Siberia only (Brass, 1911). This, however, did not lead to the extirpation of entire populations typical for the bottlenecked populations in the western peripheries of the species range.

Samples
We sampled 80 L. lynx across the distribution range of the species, including individuals for five out of the six subspecies proposed by the IUCN Cat Specialist Group (L. l. lynx, L. l. balcanicus, L. l. carpathicus, L. l. isabellinus, and L. l. wrangeli ) (Kitchener et al., 2018)). Majority of samples were tissues collected from legally hunted individuals (Norway, Latvia, Russia, and Romania; n = 68) or animals found dead (Poland,Carpathian Mountains,and Mongolia;n = 19). Three museum specimens

Genome re-sequencing
Samples were sequenced at Centro Nacional de Análisis Genómico (CNAG-CRG) or Macrogen using different sequencing strategies (Table S1). These strategies depend on the quality of the gDNA extracted or the goal depth.

Library preparation
Library preparation 1. Good quality samples aimed to be sequenced at low to medium depth. gDNA samples were used for preparing Illumina sequencing compatible pairedend libraries using NO-PCR protocol, TruSeq™DNA Sample Preparation Kit v2 (Illumina Inc.) and the KAPA Library Preparation kit (Kapa Biosystems). In short, 2.0 micrograms of genomic DNA sheared on a Covaris™ E210 was end-repaired, adenylated and ligated to Illumina specific indexed paired-end adaptors. The DNA was size selected with AMPure XP beads (Agencourt, Beckman Coulter) in order to reach the fragment insert size of 220-550bp. The final libraries were quantified using Library Quantification Kit (Kapa Biosystems).
Library preparation 2. Low input gDNA samples aim to be sequenced at low to medium depth. Low input gDNA was used for short-insert paired-end libraries for whole genome sequencing were prepared with KAPA HyperPrep kit (Roche-Kapa Biosystems) with some modifications. In short, 0.8-1.0 microgram of genomic DNA was sheared on a Covaris™ LE220 (Covaris) in order to reach the fragment size of ~500bp.
The fragmented DNA was further size-selected for the fragment size of 220-550bp with AMPure XP beads (Agencourt, Beckman Coulter). The size selected genomic DNA fragments were end-repaired, adenylated and ligated to Illumina sequencing compatible indexed paired-end adaptors (NEXTflex® DNA Barcodes). The adaptormodified end library was size selected and purified with AMPure XP beads to eliminate any not ligated adapters. The final libraries were quantified using Library Quantification Kit (Kapa Biosystems).
Library preparation 3. Good quality samples aim to be sequenced at medium to high depth. Library preparations were performed according to TruSeq DNA PCR-Free library prep guide (Illumina). Briefly, one microgram of quantified genomic DNA was sheared using an LE220Focused-ultrasonicator (Covaris, Inc.) with a Duty factor of 15%, Peak incident power 450 W, 200 cycles per burst for 50 seconds. Sheared DNA fragments were end-repaired, size-selected to obtain DNA fragments around 350 bp, and adenylated according to the manufacturer's instructions. After ligating indexing adapters to the ends of the DNA fragments, quality and band size of libraries were assessed using D1000 Screen Tapes (Agilent) on a TapeStation 2200 (Agilent). Library concentration was measured by qPCR using KAPA library Quantification Kit (KAPA Biosystems).  in paired-end mode with a read length of 2x126bp using TruSeq SBS Kit v4 following the manufacturer's protocol. Image analysis, base calling and quality scoring of the run were processed following standard Illumina procedures.
HiSeq X-10, 2x150bp. Libraries aim to be sequenced at medium to high depth were sequenced on HiSeq X-10 sequencer (Illumina, Inc) in paired-end mode with a read length of 2x150bp following the manufacturer's protocol. Image analysis, base calling and quality scoring of the run were processed following standard Illumina procedures. *Library preparation 1 and 2 and Sequencing using HiSeq2000 was carried out in CNAG facilities. Library preparation 3 and 4 and Sequencing using HiSeqX-10 was carried out in MACROGEN facilities.
In all cases, samples were sequenced using Illumina protocols, and primary data analysis was carried out with the standard Illumina pipeline.

Neutral regions definition and exclusion
To determine neutral regions of the nuclear genome we excluded all the regions annotated as functional categories in the Iberian lynx reference genome, i.e. genes, lncRNAs, mRNAs, and ncRNAs (Abascal et al., 2016). Besides, we annotated and excluded promoters of protein-coding genes and lncRNA defined as 1000-bp upstream of the gene or lncRNA. We annotated and excluded ultra conserved non-coding elements (UCNEs) using human coordinates (https://ccg.vital-it.ch/UCNEbase/). This information was translated into cat coordinates using LiftOver (http://rohsdb.cmb.usc.edu/GBshape/cgi-bin/hgLiftOver), and latter, we translated these cat coordinates into lynx coordinates by using lynx to cat sinteny (Abascal, 2016).
We also added a security buffer of +/-1000bp to any functional region to avoid regions under the influence of functional areas. To exclude these functional regions and our security buffer from the bam files we used bedtools subtract and bedtools intersect (Quinlan & Hall, 2010).

Mitogenome reconstruction
To reconstruct the mitogenome, we mapped only 10M random reads in the case of WG samples while all reads coming from the capture experiment were used. The average depth was calculated using Samtools depth piped to a custom bash script.
We called SNPs using the following settings on Freebayes (Garrison & Marth, 2012): haploid genome, minimum read mapping quality 30, minimum base quality 20, a minimum of two bases supporting an alternative allele call, and a maximum read mismatch fraction of 0.2. Repetitive regions RS2 and RS3 (positions 16096-16382 and 16908-174, respectively) (Sindičić et al., 2012), where no SNPs could be called reliably, were excluded from the analysis. The number of reference positions not covered by any read was calculated for each individual using the command genomecov in BEDTools (Quinlan & Hall, 2010). A consensus for each mitochondrial genome was constructed using the FastaAlternateReferenceMaker command in GATK (McKenna et al., 2010). Consensus sequences were rearranged to present them with standard coordinates but lacking the excluded repetitive regions.

Treatment of suboptimal samples
Balkans samples were museum material with unknown collection date. Some evidence indicates that at least two of them were more than 70 years old. All three samples were maintained in suboptimal conditions for DNA studies. Accordingly, we processed the samples in a sterile lab especially used for museum or suboptimal material. DNA extraction yielded relatively low amount of DNA and the DNA was highly degraded.
Still, given the interest of the samples, we proceeded with library preparation.
Libraries failed standard quality control because of degradation and RNA contamination. One of them also failed in DNA quantity. Standard procedures were implemented to prevent contamination in the lab throughout the whole process. As they were suboptimal material they underwent two sequencing runs to reach our goal depth, in contrast with the rest of the samples that got this depth with only one run. FASTQC analysis indicates that the reads show a distorted GC, which could be an indication of bacterial DNA. We mapped the reads to the nuclear genome, and calculated the endogenous content as: reads mapped vs. total reads. Balkans samples showed 67%, 73%, and 85%, way below the average for the rest of the samples (~97%). We also calculate Ts/Tv ratio. High Ts/Tv ratios are indication of damaged DNA, and are typical of museum and ancient materials. While one of the samples (labeled as historical) showed signs of damage when compared to the rest of contemporary samples, the other two showed a very low value. We performed diversity measures using ANGSD and SNP calling using GATK and we found unexpectedly high diversity values (above twice the diversity of Tuva (the population with the highest diversity). Therefore, we decide to exclude these samples from this analysis. Still, we think that analysis such as structure or PCA, although taken with caution, could give some useful information on the relationship of Balkan samples to the rest of the samples. Contamination, errors, and damage are unlikely to be shared among samples, so the variants shared by all three samples and separating Balkan samples from the rest are likely to be authentic.
For the mitogenome analysis, Balkans samples underwent a BLAST+ (Camacho et al., 2009) based filtering step between mapping and SNP calling. BLAST+ searches were performed in our own server against NCBI database using as queries all mapping reads showing one or more mismatches to the reference genome. Reads were excluded if the list of hits with the lowest E-values did not include at least one felid species.
Besides, we manually revised and confirmed all SNPs called from these samples. Therefore, the extra divergence of the Balkan mitogenomes, obtained from low quality samples, is not likely caused by contamination or post-mortem damage, as many divergent reads potentially coming from exogenous DNA were efficiently removed through this BLAST filtering step (sample:percentage of removed reads, mapping reads retained post-filtering: h_ll_ba_0214: 34%, 18370; h_ll_ba_0215: 17%, 26622; c_ll_ba_0216: 8%, 30749). Moreover, the exact same haplotype was independently reconstructed from the three samples; suggesting that they are not affected by contamination or damage, which are expected to be random and sample-specific.

Identification of the ancestral state
We inferred the ancestral state using L. rufus for our analysis with ANGSD. Raw pairedend Illumina reads were mapped to the Iberian lynx reference genome and the resulting bam file was used to call variants using SAMtools mpileup (-q30) (Li et al., 2009). The output was piped into pu2fa (-C45) program from Chrom-Compare-master (https://github.com/Paleogenomics/Chrom-Compare) to do a pseudo-haploidisation.
We inferred the ancestral state for 97% of bases in our reference genome.

Chromosome X and Y regions definition and molecular sexing
The annotation of the reference genome do not include sexual chromosomes so we characterized such regions as they should be treated apart of the autosomes because of their different ploidy. For each sexual chromosome we defined two different sets of regions depending on their posterior use:

i)
A lax definition was used to define any region having signals of belonging to a sex chromosome. We used these regions to sex molecularly our individuals and to exclude them from any analysis intended on the autosomes (i.e. PSMC, measures of autosomal diversity).
ii) An strict definition was used to define regions which showed clear signs of belonging to the sexual chromosomes and therefore were used in the analysis aiming to study haploid and sex-linked genetic signals.
i) For our lax definition, we defined non-recombining parts of sexual chromosomes regions using differential depth expected in autosomal vs Y and non-recombinant X chromosome regions. Expecting that female/male ratio would be: ~1 in autosomal regions ~0 in Y non-recombinant regions ~0.5 in X non-recombinant regions. Similar approaches were previously used by Hall et al. (2013), and later modified by Smeds et al. (2015). We used Lynx pardinus individuals (eleven males and 15 females) sequenced to 5x depth to do this analysis. First because the sex of those individuals was known, and also, because our reference genome belongs to Lynx pardinus and only perfectly matching reads are taken into account to calculate read depth -species divergence could cause lack of depth in some areas of the genome. Then, we extrapolated our results to our L. lynx individuals, assuming that the divergence of the two species is so recent that chromosomal reorganizations, if any, should be a minority. We computed the per base average normalized depth for both females and males and then calculated the female/male ratio.
First, we used samtools view to select only primary alignments with a minimum mapping quality of 30 and no mismatches followed by samtools depth to get each base depth. Then, for each individual we used its mode to normalize each base depth to later compute the average depth per base and sex which was used to get the per base female normalized depth / male normalized depth ratio. The distribution of such ratios was explored in R using a random subset of 5M positions.
We decided to make the definition (X, Y or autosomic) of each region using contigs as the meaningful units, as single bases belonging to a chromosome when the adjacent ones do not does not make biological sense and scaffolds might be chimerical due to errors in the scaffolding process. For each contig we obtained its length as well as the number of bases with no data available.
To further inform the definition of the X chromosome, we took advantage of the definition of the regions annotated as syntenic to cat's X or autosomal chromosomes in Abascal (2016). First, we compared the distributions of the ratios found in those syntenic regions in contrast with regions syntenic to autosomes to help us to define the thresholds used. A base was assigned to the X chromosome if it showed a female depth/male depth ratio above 1.5 and/or if it had been described as syntenic to the cat X chromosome. We defined a contig as belonging to the X chromosome in our lax definition if more than 30% of its bases met our requirements.
The lax definition of the Y chromosome contigs required them to have more than 30% of their bases with a female/male depth ratio below 0.3 and at least a normalized depth of 0.2 in males (to avoid non-covered regions by sequencing or mapping biases).
ii) As the strict X chromosome definition we simply used the bases that had already been described as X chromosome using the synteny to the cat genome.
As the strict definition, we required at least 90% of the total bases in a contig to have a ratio below 0.3 and an average normalized depth for males between 0.2 and 0.8 (to also exclude multicopy Y chromosome regions) for the Y chromosome. showed consistent depth in all regions.

Y chromosome variation
We called SNPs on males exclusively using their pooled BAM files, selecting our strict definition of Y chromosome bases as the target region, and the same setting used for the mitogenome calling.

SNP calling and genotyping
When required for the analysis we called variants following the genome VCF (

Phylogenetic analyses of mitogenomes
In order to estimate a substitution rate for the mitochondrial genome in felids, we We performed a Bayesian phylogenetic analysis with the 15 feline species using BEAST v2.4.8 (Bouckaert et al., 2014) and the optimal partitions selected with PartitionFinder among the models implemented in BEAST. The tree was calibrated using a set of fossil constraints (Table below). We assumed a strict clock as justified before (Paijmans et al., 2017)