An empirical test of the estimation of historical effective population size using Drosophila melanogaster

The availability of a large number of high-density markers (SNPs) allows the estimation of historical effective population size ( N e ) from linkage disequilibrium between loci. A recent refinement of methods to estimate historical N e from the recent past has been shown to be rather accurate with simulation data. The method has also been applied to real data for numerous species. However, the simulation data cannot encompass all the complexities of real genomes, and the performance of any estimation method with real data is always uncertain, as the true demography of the populations is not known. Here, we carried out an experimental design with Drosophila melanogaster to test the method with real data following a known demographic history. We used a population maintained in the laboratory with a constant census size of about 2800 individuals and subjected the population to a drastic decline to a size of 100 individuals. After a few generations, the population was expanded back to the previous size and after a few further generations again expanded to twice the initial size. Estimates of historical N e were obtained with the software GONE both for autosomal and X chromosomes from samples of 17 individuals sequenced for the whole genome. Estimates of the historical effective size were able to infer the patterns of changes that occurred in the populations showing generally good performance of the method. We discuss the limitations of the method and the application of the software carried out so far.

size. A number of methods have been developed for this inference.
While some of them are thought to provide estimates of effective size in the long run, usually for thousands of generations in the past (Kamm et al., 2020;Liu & Fu, 2020;Schiffels & Durbin, 2014;Speidel et al., 2019), others are intended to infer changes in N e in the recent past, based on identity-by-descent of genomic fragments (Browning & Browning, 2015), allele-frequency spectrum and linkage disequilibrium (Boitard et al., 2016) or only linkage disequilibrium between markers (Barbato et al., 2015;Hayes et al., 2003;Hollenbeck et al., 2016;Santiago et al., 2020). The latest refinement of the latter method (Santiago et al., 2020) has been shown to accurately infer drastic changes that have occurred in the populations over the past 100-200 generations. This method has been shown to be robust to the interaction between positive or negative selection and recombination (Novo et al., 2022) so that it provides estimates of N e only dependent on the variance of family size in the population.
The estimation of N e from linkage disequilibrium (LD) between markers assumes that LD occurs by chance, that is, by pure genetic drift given the particular breeding system and variation in offspring numbers from parents (Hill, 1981). The relationship between the LD between markers, measured as the correlation (r) between alleles of pairs of loci, and the effective size, is given by the approximate relation r 2 ≈ 1 ∕ 1 + 4N e c (Sved, 1971), where c is the rate of recombination between loci. If independent markers (c = 0.5) are considered in the analysis, estimates of the contemporary effective size can be obtained (Waples, 2006). However, linked markers also allow for the estimation of historical effective population sizes. This arises from the principle that LD between markers at a genetic distance of c Morgans provides information on the N e at 1/(2c) generations in the past (Hayes et al., 2003).
The method developed by Santiago et al. (2020), which is implemented in the software GONE (available at https://github.com/esrud/ GONE), has been evaluated with simulations under different scenarios (Novo et al., 2022;Reid & Pinsky, 2022;Santiago et al., 2020;Saura et al., 2021), generally showing a good ability to detect drastic changes in historical N e . In fact, Reid and Pinsky (2022) showed that the method exhibits ≥90% accuracy for detecting fast severe declines in population size, outperforming other methods. The method has also been applied to different sets of data from different species (Atmore et al., 2022;Bird et al., 2023;Ding et al., 2022;Ferrette et al., 2023;Pacheco et al., 2022;von Seth et al., 2022;Wersebe & Weider, 2023). However, the simulation tests have always a restricted scope, as the models simulated cannot encompass all the complexities of real genomes. In addition, the empirical estimates are generally done without knowing the real demography of the populations, even though relevant changes in population size can be assumed from different sources of data or contrasted with historical data on some occasions (Waldman et al., 2022). Thus, in this study, we aimed to evaluate the performance of the Santiago et al. (2020) method with experimental data from a Drosophila melanogaster population subjected to laboratory controlled drastic changes in size over generations. Our objective was to assess GONE's ability to detect these changes and infer the patterns in population demography, as well as to discuss its limitations and the current application of the method.
The population was maintained with constant temperature (25°C) and continuous lighting with circular mixing of 32 bottles each with 40-50 individuals of each sex and the feeding medium used in our laboratory (Pérez-Pereira et al., 2021). In each generation, male and female adults were placed into the corresponding bottle and were removed after a week, before the next generation emerged, so that there were no overlapping generations and the longevity of the flies in adult stage was of about 11 days. The expected effective size of the population was inferred to be about N e ≈ 1400 (López-Cortegano et al., 2016) (see Supplementary Material for more details on this inference and simulation results supporting it). At generation 208, this base population was subjected to different demographic scenarios ( Figure 1). First, the population size was drastically reduced by sampling 100 individuals (half of each sex) and establishing them in a single bottle. This population was maintained for eight generations with this size. Then, over a period of two generations, the small F I G U R E 1 Scheme showing the experimental scenarios. The circles represent time points where a sample of n = 17 individuals was taken for whole genome sequencing and they refer to: (1) a large base population with a constant size; (2) a severe population decline; (3) a severe bottleneck with recovering of the previous size and (4) a severe bottleneck and subsequent population expansion. population was expanded back to the previous size (32 bottles with about 100 individuals each) and, after another two generations, to a size twice as large (64 bottles). The expanded population, first with 32 bottles, and later with 64 bottles, was maintained with its size for four and five generations respectively. The maintenance of the bottles in all cases was carried out with circular mixing between the bottles, in the same way as the original base population. We sampled n = 17 individual males for the four points marked with numbered circles in Figure 1 in order to carry out genome sequencing of the individuals. Females were not considered to avoid offspring contamination due to the retention of fertilized eggs in their reproductive tracts. Thus, point 1 was used to estimate the historical effective size of the large initial base population; point 2 was used to estimate a sudden decline in census size in the population; and points 3 and 4 to estimate expansions in population size after the previous drop.

| DNA extraction and sequencing
The sampled individuals were frozen in liquid nitrogen and stored at −80°C. We performed an optimized protocol for DNA extraction from individual flies with the Gentra Puregene Cell Kit (Qiagen). All samples were sent to Macrogen (South Korea) for 2 × 150 bp pairedend whole genome sequencing on an Illumina Novaseq 6000 instrument using Nextera XT DNA libraries.

| SNP calling
The FASTQ sequencing files underwent a first quality control with FastQC (Andrews, 2010). Adapters were removed with Trimmomatic (Bolger et al., 2014) with Nextera adapter list. Quality and size trimming (minimum sequence length after trimming = 36) were performed with ERNE-FILTER v2 (Del Fabbro et al., 2013). After a new quality control with FastQC, we mapped the resulting reads against the 6.14 D. melanogaster reference genome with BWA-MEM (Li, 2013). Using SAMtools v1 (Danecek et al., 2021), the resulting SAM files were converted to indexed BAM files, and chromosomes were isolated and indexed. SAMtools (Danecek et al., 2021) was used for PCR duplicate removal and filtering (minimum mapping quality = 20). The quality control of the alignment was carried out with Qualimap v2 (Okonechnikov et al., 2016). 92.9% of the reference genome was covered, with an average coverage for autosomes of 49× (25× for the X chromosome), and an average mapping quality of 58.6. The HaplotypeCaller tool from GATK v4 (Van der Auwera & O'Connor, 2020) was used for raw variant calling (minimum phred-scaled confidence threshold = 10). The resulting gVCF files were combined using the GATK CombineGVCFs tool. We kept only biallelic SNPs. Highly repeated sites, low information regions and SNPs within 10 bp of an indel were removed with BCFtools (Danecek et al., 2021). Finally, SNPs were also filtered with the GATK VariantFiltration tool applying the recommended presets. Then, PLINK (Purcell et al., 2007) was used for file conversion from VCF files to PLINK map/ped files. The final average number of SNPs in each sample was 1,135,122.

| Genetic map usage
For accurate historical N e estimation, GONE requires a precise genetic map. The genetic distances in the D. melanogaster genetic map from Comeron et al. (2012) were originally calculated for females, since males do not recombine. However, the genomic consequences of recombination emerge from populations of both males and females, so for N e estimation, we halved the genetic distances from Comeron's map in the case of autosomes and multiplied them by 2/3 for chromosome X. Flybase Coordinates Converter tool (Gramates et al., 2022) was used for conversion of the genetic map's coordinates to dm6. In order to obtain a similar density of SNPs for the four autosomal samples, these were matched to those appearing in the second sample (point 2 in Figure 1). The total number of SNPs available for analysis were 891,737, 942,323, 882,861 and 894,727 for the autosomal chromosomes and 143,742, 98,450, 108,026 and 85,767 for the X-chromosome, in the four samples, respectively.

| Historical N e estimation
We estimated historical linkage-disequilibrium N e using GONE (Santiago et al., 2020). We used all the default options of the software for the analysis of the autosomal chromosomes: unknown phase, Haldane's correction for genetic distances, no minor allele frequency filtering and use of all SNPs including those with missing data. We considered the four autosomal arms (2L, 2R, 3L and 3R) as chromosomes in the analysis, and used a randomly chosen set of 50,000 SNPs per arm in each of 20 replicates. We used a maximum value of recombination frequency of c = 0.05, the default value recommended by Santiago et al. (2020). However, in order to test the relevance of this cut-off point, we also considered maximum values of c = 0.1 and 0.01. We also made estimates without knowledge of the genetic map, assuming that the average recombination rate of the species was evenly distributed along the genome. The geometric mean of the estimated N e values from the 20 replicates was obtained for each scenario.
Drosophila males are haploids for the X chromosome, so we analysed it separately. Since GONE requires diploid genomes, we com- The estimates of N e for autosomal and X-chromosomes were combined as N e = (4/5) N e, AUTO + (1/5) (4/3) N e, X , taking into account that the latter constitutes approximately 1/5 of the Drosophila genome length and the N e for X-linked genes is expected to be 3/4 that of the autosomal N e (Wright, 1933). Figure 2 shows the estimates of historical N e combining autosomal and X-chromosome estimates. These are shown separately in Figure S1 and for each of the 20 replicates run in Figure S2. The average combined estimates of N e for the most recent (1-4), intermediate (16)(17)(18)(19)(20) and most distant (56-60) generations are shown, along with their standard errors, in Table 1. For the base population scenario, there was a good estimation of the global N e (Figure 2a), although this arises from a compensation between the underestimation from autosomal data and the overestimation from X-chromosome data ( Figure S1a). There was a 31% underestimation of the true N e in the most recent and most distant generations and a 35% overestimation in the intermediate generations (Table 1). The severe decline and subsequent expansions were generally well detected, although the timing of the events was not perfectly inferred and the ancestral N e (before the bottleneck and expansion) were substantial overestimations. The estimated N e in the most recent generations of the drop scenario (54.0 ± 1.1) was significantly smaller than the estimate in the ancestral generations (1005.1 ± 69.8) (Figure 2b; Table 1) Table 1). However, the expansion of population size to twice the initial size (Figure 2d) was not captured in full, although it was well predicted by the X-chromosome estimation ( Figure S1d).

| RE SULTS
The estimations shown in Figure 2  ( Figure S3a). However, for some scenarios, such as the bottleneck and further expansion ( Figure S3d), it better predicts the most recent N e . As expected, using a too low cut-off value (c < 0.01) can miss the population size changes that occurred in the very recent past. If the genetic map remains unknown and the average recombination rate for the species is assumed to be uniform across the genome, the estimated N e values are generally less precise than when the genetic map is known ( Figure S3).

| DISCUSS ION
Our results show that the method developed by Santiago et al. (2020) is useful to detect drastic changes in the demography of populations.
Drosophila is, in some respects, a model species with some limitations for this purpose because of its short genome (about 180 Mb of about 1.25 Morgans) and because it only has two autosomal chromosomes. In addition, the sample size used in our study was relatively low (17 individuals). Even so, the method was able to infer the main changes that occurred in the population. The time for the bottleneck that occurred in the scenarios of Figure 2c,d was estimated to be a few generations before the real one, but the general pattern was well predicted. However, the fast expansion after the bottleneck was only partially captured, perhaps because the number of generations that elapsed after the expansion was relatively low to capture rapid changes in linkage disequilibrium. Furthermore, the method does not have enough power to detect the signal of a small difference in drift (the inverse of N e ) following a large perturbation in population size. In fact, the method predicts squared correlations between sites that are roughly proportional to 1/N e c. Given the small sample size, a difference between very small numbers, e.g., 1/ (1400) vs 1/(2800), could not be detected.
The use of different cut-off values for the maximum recombination rate (c) considered in the analysis has also relevance for a proper estimation of recent changes in N e ( Figure S3). For example, the large expansion after the bottleneck can be better predicted considering a maximum value of c = 0.1 ( Figure S3d) but, in contrast, using this cutoff value may produce substantial biases in the estimation of recent N e for other scenarios ( Figure S3a). In order to investigate this issue, we carried out computer simulations following the experimental design ( Figure S4). The methods for these simulations and the cor-  Table S1 of Hoban et al., 2020), and it is rather common when there is male competition for female mating (Nunney, 1993). A value of N e /N ≈ 0.5 would reflect a variance of family size (V k ) about three times larger than that for an ideal population with a Poisson distribution of progeny number per individual (Wright, 1938). This variation must be the reflection on N e of differences in fecundity and mating ability of the parents, as well as on the viability of the offspring. This N eVk is the value estimated by the software GONE, which is not affected by further reductions in N e due to the interaction between selection and recombination in genomic regions of tight linkage (Charlesworth et al., 1993;Hudson & Kaplan, 1995;Santiago & Caballero, 1998. Thus, as shown by Novo et al. (2022), GONE provides virtually unbiased estimates of the expected effective population size from the variance of progeny numbers (N eVk ).
The estimations of historical N e presented in Figure 2 and Figure S1 are obtained from the average of 20 replicates of each analysis ( Figure S2). In the case of autosomal estimations, each replicate is obtained by sampling (with replacement) a different random set of 50,000 SNPs per chromosomal arm, which is the procedure followed when the software is run at different times and there is a TA B L E 1 Mean and standard errors of the combined estimates of N e at recent (1-4), intermediate (16)(17)(18)(19)(20) and late (56-60) generations back in time. sufficiently large number of SNPs. As noted in Figure S2, the variation observed among replicates was rather small in the case of autosomal chromosomes. For the X-chromosomes the observed variation across replicates was higher, as the variation depended not only on the set of 50,000 SNPs analysed but also on the random combination of chromosomes considered. In addition, the number of sampled pseudo-diploid genomes was only eight in this case.
Other authors applying the method have suggested to perform a block-jackknife approach by setting non-overlapping blocks of 10,000 consecutive SNPs and removing one of these sets in each replicated analysis (Magnier et al., 2022). Yet, others have opted for making replicates from subsamples of half of the chromosomes available (Pacheco et al., 2022). These procedures seem appropriate, but we do not believe that they will offer a large difference with that followed here. In general, the limiting factor is uncertainty associated with sampling individuals rather than the number of SNPs (Waples et al., 2022), provided these are sufficiently numerous and are uniformly distributed across the genome, and the inferences obtained refer to the particular pedigree of the individuals sampled (Ralph, 2019).
The method of Santiago et al. (2020) has now been applied to different species, particularly in the last year, including insects, such as honeybees (Sang et al., 2022); birds, such as Black Robin  (Bird et al., 2023); domestic species, such as pigs (Krupa et al., 2022), cattle (Jin et al., 2022;Magnier et al., 2022), sheep (Djokic et al., 2023;Drzaic et al., 2022), horse (Criscione et al., 2022) and chicken (Gao et al., 2023;Liu et al., 2023); plants, such as walnut (Ding et al., 2022); crustaceans, such as Daphnia (Wersebe & Weider, 2023) and fungi (Singh et al., 2021). As suggested by Santiago et al. (2020) (Atmore et al., 2022;von Seth et al., 2022) and up to 700-800 in others (Gao et al., 2023;Singh et al., 2021). It is important to note that the linkage disequilibrium signal is diminished over time so that very far events cannot generally be inferred with precision. Furthermore, the method is expected to be more reliable when a genetic map is available for the SNPs considered. However, it can also be used assuming a constant average recombination rate for the species, although with somewhat less precision ( Figure S3). In fact, three quarter of the studies mentioned above assumed this latter approach, as on most occasions a genetic map for the species is not available or it is not too reliable.
As mentioned above, Santiago et al. (2020) recommended to ignore the information arising from pairs of SNPs showing a very large recombination frequency because the information from these pairs of SNPs is very sensitive to possible complications in the sampling of individuals, such as non-random sampling or recent admixture in the population. For example, if there is a recent admixture, a typical artefact observed is a recent huge increase in the estimated N e followed by a sudden drop in the last four generations (see Figure 2f of Santiago et al., 2020). Our experimental design implies some structuring of the large base population. This structuring involves the above-mentioned artefact and it is enhanced when the analyses are made with a high cut-off c value (see Figure S3 and results from simulations in the Supplementary Material). Note that, because the estimation process is made considering blocks of generations, the last four have always an identical value and some authors have opted for disregarding these initial generations (Atmore et al., 2022) or even the first 25 (Sang et al., 2022). However, the artefact produced by admixture can be ameliorated at least partially by ignoring pairs of SNPs with a recombination frequency higher than 0.05 (Figure 2f of Santiago et al., 2020, and Figures S3 and S5-S10 of the Supplemental Material). This is the recommended and default option of the software GONE. Previous studies have followed this recommendation and some of them assumed an even lower threshold (c < 0.01 ;Ding et al., 2022;Alvarez-Estape et al., 2023;or c < .02, Kardos et al., 2023).
Even so, this artefact may have occurred in some of the estimations made so far, for example, in two or three Croatian sheep breeds analysed by Drzaic et al. (2022) or one of the horse breeds analysed by Criscione et al. (2022). Note, however, that it is possible that a great decline in size can be expected to occur in the last few generations, for example, when a selection program has started within that period (Saura et al., 2021). Therefore, it is sometimes difficult to disentangle an artefact from a bona fide population change. The use of a threshold for c will depend on the number of SNPs available at short genetic distances, and it has to be borne in mind that disregarding SNPs at long genetic distances will also decrease the power to detect very recent changes in N e , as shown in Figure S3.
For the analyses carried out so far, most of them have suggested declines in N e at different times and extents. This is something ex-  Figure S2. Thus, caution should be taken about the inferences made for the ancestral N e even though those of the recent N e are generally reliable. In addition, it is sometimes easier to detect abrupt changes in N e than constant population sizes when these are large. In fact, our estimates for the base population N e showed a nearly constant size over generations, as expected, but the actual N e value was probably underestimated by the autosomal data and clearly overestimated by the X-chromosome data.
Although the method experimentally examined in this work has been shown to be reliable, several issues must be further studied.
The most important is the effect of admixture or non-random sampling in the analyses, which must be studied in more detail. This has implications particularly in the estimates of the most recent generations, which can show substantial artefacts depending on the cvalue cut-off considered. Another issue is that we have applied the current method for haploid data with a heuristic approach, but the method should be more formally extended to sex chromosome or haploid data, and current work is in progress in this regard.

CO N FLI C T O F I NTE R E S T S TATE M E NT
The authors declare no conflicts of interest.

DATA AVA I L A B I L I T Y S TAT E M E N T
Computer scripts used in the analysis are available at Github address https://github.com/irene -novo-g/Histo rical -Ne-Droso phila.

FASTQ files have been deposited in the NCBI Sequence Read
Archive SRA database under accession numbers SAMN35298180-SAMN35298247 (BioProject accession: PRJNA974804).

B E N E FIT S H A R I N G S TATE M E NT
Our group is committed to international scientific partnerships, as well as institutional capacity building.