Effects of cyclic changes in population size on neutral genetic diversity

Abstract Recurrent changes in population size are often observed in nature, influencing the efficiency of selection and consequently affecting organismal evolution. Thus, it is important to know whether such changes occurred in the past history of a focal population of evolutionary interests. Here, we focused on cyclic changes in population size and investigated the distributional properties of Tajima's D and its power to distinguish a cyclic change model compared with the standard neutral model, changing the frequency and magnitude of the cyclic change. With very low or very high frequencies of the cycle, the distribution of Tajima's D was similar to that in a constant size population, as demonstrated by previous theoretical works. Otherwise, its mean was negative or positive, and its variance was smaller or larger depending on the time of sampling. The detection rate of the cyclic change against the constancy in size by Tajima's D depended on the sample size, the number of loci, and the time of sampling in addition to the frequency and amplitude of the cycle. Using sequence data of several tens of loci, the detection rate was fairly high if the frequency was intermediate and the sampling was made when population size was large; otherwise, the detection rate was not high. We also found that cyclic change could be discriminated from simple expansion or shrinkage of a population by Tajima's D only if the frequency was in a limited range and the sampling was made when the population was large.

changes in population size affect the evolution of genes and to infer the history of size change in natural populations.
By tradition, population genetics have addressed changes in population size using a notion of effective population size (see a review by Charlesworth, 2009). In particular, if population size changes through time, neutral genetic variation in the population can be computed by considering a surrogate population with its size being the harmonic mean of the past population sizes.
Although we can compute a certain quantity, such as the average heterozygosity, using the effective size thus defined, the validity of the assumption of using a surrogate population with a constant size to understand the dynamics of genetic variation in a population with varying size is dependent on the timescale of the changes (Nordborg & Krone, 2002).
Indeed, Sjödin, Kaj, Krone, Lascoux, and Nordborg (2005) demonstrated that the coalescent process of neutral alleles in a model with stochastic changes in population size converges to that in a model with constant size as the rate of the size change approaches zero or infinity; otherwise, there is no corresponding model with constant size that results in the same genealogical process. In addition, using simulation, they computed the expectation of one of the neutrality statistics, Fu & Li's F* (Fu & Li, 1993), as a benchmark to measure deviation from the constant size model and investigated effects of the change in population size on the statistic. From the results, they suggested that the population behaves as a constant size population if the rate of size change with the time measured in units of the present population size is very low or high, that is, in general, smaller than 1/10 or larger than 10. The genealogy under the model with recurrent size changes was further analyzed by Erikkson, Mehling, Rafajlovic, and Sagitov (2010). They derived a general formula to compute the moments of the total branch length in the genealogy.
Assuming a sinusoidally varying population size as a concrete example, they computed the mean and other moments up to the fourth of the total branch length and demonstrated how these moments in a population with varying size deviate from those in the constant size population as the frequency of size change varies.
Although these studies clarified the genealogical structure in populations with size change, there are still some gaps between their results and their application in the interpretation of real polymorphism data, especially in detecting past recurrent changes in population size. First, we can only observe sequence differences among sampled alleles and cannot directly estimate their genealogy. Therefore, we want to know properties of the statistics calculated from sequence data. Second, to detect past changes in population size from polymorphism data, we need to know the distributional property (the variance at a minimum) of the statistics. Sjödin et al. (2005) computed the mean of Fu & Li's F* but did not calculate the variance. At last, simple expansion or shrinkage of a population results in deviations of statistics from those in the population with a constant size. Therefore, we would like to know whether such simple changes in population size could be discriminated from recurrent changes in population size using some of the statistics of polymorphism data.
To narrow the gaps mentioned above, we consider a population with cyclic change in size and investigate the distributional properties of Tajima's D (Tajima, 1989a) and the possibilities of detecting recurrent changes in population size using polymorphism data at a few dozen loci. There are a few reasons why we choose a simple cyclic change model from a variety of possible models with recurrent size changes. First, a significant proportion of animal populations exhibit cyclic change in population size (Kendall et al., 1998). Second, some abiotic and biotic environmental changes are cyclic; for example, temperature and humidity change with a periodicity of approximately one hundred thousand years in glacial cycles (Bennett, 1997). In addition, host-parasite interaction often results in cyclic changes in population size (e.g., Stahl, Dwyer, Mauricio, Kreitman, & Bergelson, 1999). Depending on the generation time of the organism, the frequencies of those environmental changes measured in units of one generation vary among organisms. Third, although changes in population size are typically not exactly cyclic but involve some elements of stochasticity, we cannot employ the stochastic model used by Sjödin et al. (2005) because we want to consider cases where samples of alleles at multiple loci are taken from a population whose size at each past time point takes a specific value. If we employ their model, the variance of a statistic involves not only that from genetic drift but also that from the stochastic change in population size. At last, we can observe the effects of the rate of size change on genetic diversity most easily by employing cyclic change.
In this study, we investigate the following questions assuming a model with cyclic change in population size using simulation, changing the frequency and amplitude of the cycle. First, we investigate the effects of the frequencies and amplitude of the cycle and the timing of sampling on the distribution of Tajima's D. Second, we study the effects of the number of loci, the mutation rate, and the parameters of the cyclic change on the power of a test based on the mean of Tajima's D to discriminate the cyclic change model from the constant size model, which is hereafter referred to as the standard neutral model. We also investigate the power of a likelihood ratio test using a model-based inference method for the demographic history developed by Excoffier, Dupanloup, Huerta-Sánchez, Sousa, and Foll (2013) and compare the powers of the two tests. Third, because deviation of Tajima's D is also observed in models with simple population expansion or shrinkage (Tajima, 1989b), we compare the mean and variance of Tajima's D in the cyclic change model with those simple size change models. We find that the distribution of Tajima's D approaches that under the standard neutral model when the frequency of the cycle is very low or very high as noted by previous theoretical works (Erikkson et al., 2010;Sjödin et al., 2005), but the approach depends on the timing of sampling in addition to the frequencies and amplitude of the cycle. If the sampling is made when the population size is large, deviation of the cyclic change model with intermediate frequencies from the standard neutral model can be detected using data of several tens of loci, and discrimination from simple expansion may be possible under some condition. Otherwise, detection becomes more difficult, and the cyclic change is indistinguishable from simple shrinkage. Considering the importance of past population structure on the consequences of natural selection and ease of obtaining data at more than dozens of loci in nonmodel organisms (Lascoux & Petit, 2010), examining neutrality statistics, such as Tajima's D, to obtain knowledge of past population size in addition to levels of genetic diversity and differentiation will be helpful as a first step for understanding the evolution of a target species.

| ME THODS
We chose Tajima's D (Tajima, 1989a) as a statistic to detect recurrent changes in population size. We also examined distributional properties of various other statistics, such as Fu & Li's F* (Fu & Li, 1993) and Fay & Wu's H (Fay & Wu, 2000), for a small number of parameter sets that characterize recurrent changes, but Tajima's D generally exhibited the smallest variance when the population size was altered (data not shown). Tajima's D was originally developed to detect signals of selection, but it has been often used for inferring the demographic history (Ramos-Onsins & Rozas, 2002). Tajima's D measures the difference between two unbiased estimators of the population mutation rate (θ), k and ̂W (Tajima, 1989a), where k is the average number of pairwise nucleotide differences, and ̂W is an unbiased estimator of θ (Watterson, 1975) calculated from the number of segregating sites, S. Tajima's D is defined by where e 1 and e 2 are calculated from the sample size n. Tajima's D tends to be negative in expanding populations and positive in shrinking populations (Tajima, 1989b).
We assumed the infinite site neutral Wright-Fisher model. We generated sample sequences under the standard neutral model, cyclic change model, population expansion model, and population shrinkage model for a panmictic population using the ms program (Hudson, 2002). The cyclic change model used here is presented in Figure 1.
Population size is N 0 for t 0 generations, suddenly changes to N 1 (N 0 > N 1 ), and remains at that size for t 1 generations. This is one cycle, and this process is repeated indefinitely. At a certain point in one of the cycles, n alleles are sampled. We set the mutation rate so that the expectations of k at the sampling point were the same when different models were compared.
In the simulation, we used N 0 /N 1 = 5, 10, and 20 by varying the value of N 1 but N 0 = 100,000. The length (t 0 ) of the period during which the size was N 0 in a cycle was the same as that (t 1 ) of the period during which the size was N 1 . We defined the frequency of cycle i as the number of cycles per N 0 generations such that t 0 = t 1 = N 0 / (2i). The value of i examined ranged from 0.01 to 1,000. For example, when N 0 is 100,000, i = 1 indicates that the length of one cycle is 100,000 generations. We examined four sampling points, point 1 (at the end of the large phase), point 2 (at the midpoint in the large phase), point 3 (at the end of the small phase), and point 4 (at the midpoint in the small phase) as shown in Figure 1.
First, to assess effects of the frequency of the cycle (i) and the amplitude (N 0 /N 1 ) of changes in population size and the timing (point x) of sampling on the distribution of Tajima's D, we generated 100,000 datasets of n sequences under the cyclic change model using ms. We first set E[k], the expectation of k, to 5.0 per locus at the sampling point. We computed the mutation rate for a specified E[k] at the sampling time using the formula (10) in Chakraborty (1977). We used a sample size of n = 50. We computed the distribution, mean, and variance of Tajima's D.
Next, we investigated the power to discriminate the cyclic change model from the standard neutral model using the mean of Tajima's D when data from multiple loci were available. We followed the fixed S procedure of Hudson (1993) to test the generated datasets against the standard neutral model. The fixed S procedure simulates samples by fixing the number of segregating sites S instead of using the unknown parameter θ. This procedure seems reliable in cases without recombination (Wall & Hudson, 2001). We calculated the power in the following manner. because Tajima's D is not sensitive to deviations if n is less than 30 (Tajima, 1989a). As sample size increases, the probability of detecting deviations from the standard neutral model increases (Simonsen, Churchill, & Aquadro, 1995;Sjödin et al., 2005). Therefore, we also expect that the power of Tajima's D to detect cyclic change increases with sample size.
To compare this simple method of using Tajima = 0 = 1 = 5 = 10 = 1000 the size was N 1 (N 0 ) in the past but suddenly changed to N 0 (N 1 ) at t generations ago. In the simulation, we used N 0 /N 1 from 1 to 1,000.
The time until the change t/N 0 (t/N 1 ) was set to values from 0.00005 to 50 (from 0.005 to 50) in the population expansion (shrinkage) model.
In evaluating the power of the tests, we included monomorphic data obtained from the simulation; otherwise we excluded monomorphic data. In all cases, if we decreased the amplitude, the oval became smaller ( Figure 3).

| Comparison between cyclic change and simple demographic change models
We compared the mean and variance of Tajima In our analyses of cyclic change in population size, we made several simplifying assumptions. Here, we discuss a few of them briefly.
First, we investigated effects of the cyclic change model with only equal lengths of the periods of large and small population sizes (t 0 = t 1 ), but this is not always the case. For example, one glacial cycle consists of a long period of a cool and dry climate and a short period of a warm and humid climate (Bennett, 1997). If the focal organism is adapted to the warmer period, the time when the population size is large becomes shorter (t 0 < t 1 ). Therefore, we need to consider the effects of unequal lengths of the periods. We can predict consequences of the inequality when the frequency of the cycle is low or high. If the size change is slow, Tajima Therefore, the convergence occurs at a lower frequency when t 0 > t 1 compared with the equal case and at a higher frequency when t 0 < t 1 .
Second, we assumed that sudden changes in population size although natural populations would rarely grow 5 or 10 times larger in one generation. We made this assumption given that gradual growth in the transitional stage, which seems more realistic, increases the number of parameters. If the growth rate is high such that the transitional period is considerably shorter than the length of one cycle, differences between the two models would be negligible. Otherwise, the result will change. In a study investigating the power of various  Demographic changes in populations affect the consequences of natural selection, often significantly (Brandvain & Wright, 2016).
For example, nearly neutral mutations (Ohta, 1992) behave like neutral alleles in small populations but as selected alleles in large populations, and recurrent changes in population size result in irregularities in the molecular evolution of such mutations (Cutler, 2000). In addition, the total number of advantageous mutations appearing in one generation is small in small populations but large in large populations, thus facilitating rapid adaptation in large populations. These considerations lead us to conclude that the evolutionary paths of populations with recurrent changes in size would significantly differ from those with constant size or with one-time size change. Therefore, it is important to know whether recurrent size changes have occurred in the populations of the target organism by examining means and variances of neutrality statistics at neutrally evolving loci.
However, it is generally difficult to determine whether the loci at which the data are collected have been affected by selection or not.
Of late, Ewing and Jensen (2016) pointed out that inferences on the demographic history would be strongly biased by intermediate levels of background selection. Background selection poses a serious problem in inferring the demographic history especially when population size changes because a wide range of fitness effects become intermediate sometime in the past. One way to avoid this problem may be to use RAD (restriction site-associated DNA) sequencing (Baird et al., 2008) or related methods, whose target sites are mostly in noncoding regions and possibly neutrally evolving. The simple method using the mean of Tajima's D investigated here may be especially useful for data obtained by methods such as RAD sequencing because number of samples at each locus is variable and the frequency spectrum required, for example, for fastsimcoal2, is sometimes difficult to obtain.
Given the importance of knowing the past demographic history but recognizing the difficulty in its inference, we recommend that a preliminary analysis for data obtained by a method such as RAD sequencing using the mean of Tajima's D to detect deviation from the standard neutral model is carried out. This knowledge will help planning further studies on the target species using more data in terms of number of individuals and read depths of the sequencing, which may aim to estimate population parameters using model-based methods or evaluate effects of selection on some or all part of the genome.

ACK N OWLED G M ENT
This work was supported by JSPS KAKENHI Grant Numbers, JP26291082, JP16H02553. We thank anonymous reviewers for their helpful comments on earlier drafts of the manuscript.

CO N FLI C T O F I NTE R E S T
None declared.

DATA ACCE SS I B I LIT Y
All programs used in this study are available on the Dryad repository, DOI: https://doi.org/10.5061/dryad.364cf36.