The population genetics of partial diapause, with applications to the aestivating malaria mosquito Anopheles coluzzii

Diapause, a form of dormancy to delay or halt the reproductive development during unfavourable seasons, has evolved in many insect species. One example is aestivation, an adult-stage diapause enhancing malaria vectors' survival during the dry season (DS) and their re-establishment in the next rainy season (RS). This work develops a novel genetic approach to estimate the number or proportion of individuals undergoing diapause, as well as the breeding sizes of the two seasons, using signals from temporal allele frequency dynamics. Our modelling shows the magnitude of drift is dampened at early RS when previously aestivating individuals reappear. Aestivation severely bi - ases the temporal effective population size ( N e ), leading to overestimation of the DS breeding size by 1 ∕( 1 − 𝛼 ) 2 across 1 year, where 𝛼 is the aestivating proportion. We find sampling breeding individuals in three consecutive seasons starting from an RS is sufficient for parameter estimation, and perform extensive simulations to verify our derivations. This method does not require sampling individuals in the dormant state, the biggest challenge in most studies. We illustrate the method by applying it to a published data set for Anopheles coluzzii mosquitoes from Thierola, Mali. Our method and the expected evolutionary implications are applicable to any species in which a fraction of the population diapauses for more than one generation, and are difficult or impossible to sample during that stage.

. While the definition of what being "unfavourable" varies across species, as does the life stage (embryonic, larval or adult) in which diapause occurs, the common goal is to increase organismal survival, and ultimately enable them to re-establish when the next favourable season arrives (Denlinger & Armbruster, 2014).If the duration of the unfavourable season lasts for multiple generations, then diapause has a direct impact on the rate of evolution and the genetic constitution of the population.
In many locations, malaria vector populations exhibit strong seasonal fluctuations in abundance, being present in large numbers during the rainy season (RS) and then drop to low levels or even become undetectable when the breeding sites disappear in the dry season (DS) (Charlwood et al., 1995;Hidalgo et al., 2016;Kabbale et al., 2013).Various hypotheses have been suggested to account for their persistence in the DS and the source of their reestablishment soon after the first rains.For some vector species, like Anopheles coluzzii, aestivation, a DS adult-state diapause, is thought to be the primary route (Charlwood et al., 2000;Lehmann et al., 2010;Wang et al., 2011).Aestivation is in itself an interesting biological process which predominantly activated by the absence of water (Depinay et al., 2004).However, using conventional mosquito trapping methods, it is almost impossible to detect aestivators; thus, they are thought to occupy distinct hidden habitats during this state (Charlwood et al., 2000;Lehmann et al., 2017).
There are other hypotheses, such as long-distance migration, which assumes (near) extinction of the local DS population then recolonisation by migrants from further afield where breeding sites exist year-round (Adamou et al., 2011;Mwima et al., 2023).Another is local refugia, where populations survive locally but hidden from sampling, and maintain themselves by low-level breeding during the DS, but this implies the constant availability of larval sites (Omer & Cloudsley-Thompson, 1970).
Many studies have focused on finding direct evidence of aestivating mosquitoes, notably via the mark-release-recapture (MRR) experiment.For example, Lehmann et al. (2010) recaptured one marked female An.coluzzii after it had endured a 7-month-long DS in the Sahelian region.More recently, a modified MRR using hydrogen isotopes as markers estimated that aestivation contributed at least 20% in census size to the persistence of An. coluzzii in the Sahelian villages of Mali until the following RS (Faiman et al., 2022).One obvious challenge of the direct methods is the sampling of aestivators, that these laborious MRR provided a disproportionate amount of evidence with low recapture rates (Epopa et al., 2017).There have been fewer indirect (i.e.genetic) studies.Lehmann et al. (2017) explored the population genetics of several Malian An. coluzzii populations through successive seasons, and attempted to interpret the results in the context of possible aestivation, noting that drift should be negligible in aestivating populations due to reproductive arrest, but the qualitative nature of the work did not present a formal model or statistical testing.There have been other studies to test whether populations undergo annual bottlenecks (Lehmann et al., 1998;Luikart et al., 1998), but these drift-based analyses are limited to the DS only.They did not directly infer aestivation nor provide any means for parameter estimation.
In this work, we first study from a theoretical perspective the magnitude of genetic drift under diapause.If we define temporal effective population size (N e ) as that estimated from the change in allele frequencies between two time points under the idealised Wright-Fisher (WF) model (Waples, 1989) Under the WF model, the standardised variance of allele frequency V t (Waples, 1989) increases by approximately 1 ∕ 2N e per generation: (full derivations of this and other equations in this section can be found in the Appendix S1).A relevant statistic that can be derived from V t is the temporal F, which is the standardised variance of allele frequency change (Waples, 1989).If two samples are taken Δ t generations apart, then F is approximately: from which N e can be solved or estimated (Waples, 1989).If N e is not a constant, then this method estimates the harmonic mean N e of all the generations between the two samples (Waples, 2005).
To extend the model to incorporate aestivation, we assume two seasons per cycle (or year, for most organisms), the RS and DS, lasting Δ R and Δ D generations, and with breeding population sizes N R and N D , respectively.While the population still reproduces under the WF manner between most generations, the transitions between seasons are of particular interest.These include the generations of ERS (E for early, the first generation of RS), LRS (L for late, the last generation of RS) and similarly EDS and LDS.Upon entering EDS, N D offspring (1) enter the breeding compartment, and N A to another compartment to aestivate (N A is the number that survives aestivation to re-emerge the next ERS; individuals that die during aestivation do not affect allele frequencies and are ignored).The breeding compartment continues to reproduce during DS but with a smaller size, while reproduction halts for the aestivators.The two compartments remain isolated throughout the DS.At the next ERS, those N A aestivators reappear and unite with the descendants of the breeding individuals from the LDS.This completes the population dynamics of one cycle.
The model can be visualised in Figure 1, and the notations used in the paper are found in Table 1.
With this model in mind, we then quantify the effect of aestivation on the temporal allele frequency dynamics, such as how V t Note that the variance at the beginning of an RS (V ERS ′) can be lower than that from the previous generation (V LDS ): The difference is negative when With V t no longer monotonically increasing we need to examine its impact on temporal N e estimates.Here, our derivations focus on the following three-time points: ERS, EDS and ERS' (Figure 1), from which three temporal N e estimates can be calculated.
Without much ambiguity, the expected temporal F between ERS

and EDS is
As there are Δ R generations between them, the corresponding temporal N e estimate refers to the harmonic mean of Δ R − 1 generations of size N R and one generation of size N D .A more complex scenario is when the two samples are from exactly one cycle apart from successive ERS's (Appendix S1): The temporal N e estimate between two consecutive ERS is equivalent to that of the WF process with Δ R generations of size N R , and Finally, between EDS and ERS' (Appendix S1): (3) Then, we aimed to validate the temporal F's we derived, focusing only on ERS, EDS and ERS'.

| RE SULTS
Simulation results can be found in Figures 2 and 3, and Appendix S1.In short, the results all agreed with our theoretical expectations (Equations 6-8).

| PAR AME TER E S TIMATI ON ON A RE AL DATA S E T
To illustrate how the theory can be used to estimate parameters, we searched for data sets in which the same population had been geno- DS, they were collected in October 09, April 10 and June 10.As the generation time was assumed to be 1 month (Ag1000G, 2017), the sampling horizons for the 2 years were 14 and 8 generations respectively.According to the original authors, the DS runs from December to May (Lehmann et al., 2017).For summary statistics, we used the three temporal F's, which were computed via F c to account for finite sampling and missing data (Peel et al., 2013;Pollak, 1983;Waples, 1989, see also Appendix S1).
In each ABC simulation, a burn-in period of 24 generations (2 cycles) was run.The initial allele frequency was sampled from U(0.2, 0.8) , where U(a, b) denotes a uniform distribution with bounds a and b.
For the priors, N D was drawn from U(10, 1000) based on preliminary studies on it being not too large; N R | N D ∼ U N D + 2, 100,000 ; and Note that the posterior N A was highly skewed.The result for N R was inconclusive because the posterior was flat (see discussion below).
The posterior of had a sharp peak with a median of 0.39.CIs are reported in the figure legends.

| DISCUSS ION
Diapause allows many insect species to survive through their unfavourable seasons.Its interesting mechanisms have been studied for model organisms such as Drosophlia melanogaster (Ojima et al., 2018;Piper & Partridge, 2016).Many genera of mosquito are also known to undergo diapause and at different life stages: egg and larva diapause occurs in Aedes, while adult diapause is usually found in Culex and Anopheles (Denlinger & Armbruster, 2014).The mechanisms by which reproductive arrest is achieved can be different: some take blood meals but do not develop eggs, some do neither (Yaro et al., 2012).Even within the same species, diapause may or may not be activated depending on the climate the populations live in (Lee & Duvall, 2022).For most species, mosquitoes and beyond, the unfavourable season is usually the harsh winter, but for An.coluzzii, this becomes the tropical DS in the Sahel region.Aestivation, an adult-stage diapause, provides a route for the populations to persist then rebound when the rain returns.Aestivation, by its very nature, is difficult to study because the individuals in this state are difficult or impossible to collect (Lehmann et al., 2017).In this work, we have explored how an indirect approach, using allele frequency changes across seasons, can be used to estimate the proportion of aestivation.Our model extends the classical WF model by incorporating two separate compartments in the DS for partial aestivation.
Here, the meaning of partial is twofold: First, our model still allows breeding during the DS under a smaller size, as opposed to the more restrictive complete reproductive arrest (Lehmann et al., 2017).
Second, N A + N D can be smaller than N R such that the total population size can be smaller in the DS.The formulae we derived for V t and temporal F assumed that only the breeding compartment is observed.One advantage of this method is it does not require sampling of aestivating individuals, which thus far have been difficult or almost impossible to find, while sampling breeding DS individuals is more feasible (Lehmann et al., 2017;Ag1000G, 2020. Cross-seasonal dynamics are crucial to understanding the population genetic consequences of aestivation.One characteristic is the drop in V t at every ERS'.Previously aestivating individuals, who have experienced no drift through the DS, reappear and admix with the rest of the population, lowering the combined V t .If the proportion of the RS population that aestivates ( ) is sufficiently large, then V t is severely dampened or even reset to the level similar to the LRS.Populations with huge seasonal variation in population density, such as An.coluzzii, whose N D ∕ N R can be as low as 0.01 (Bomblies & Eltahir, 2009;Khatri & Burt, 2019;Mabaso et al., 2007;Minakawa et al., 2002), require only a tiny to give such an effect (Equation 5).7).
It may therefore give the false impression of having a very large DS breeding size, contradictory to within-season estimates or the ecology of the species of interest.Note that temporal N e estimate between EDS and ERS' (Equation 8) has more terms than the number of generations between them, presumably because there are two routes linking them, one via the DS breeding compartment, and another via LRS (backward in time) then through the aestivating compartment.
Equations (6-8) show that sampling from three consecutive seasons starting with an RS is sufficient to estimate the three different population sizes (N R , N D , N A ), which serves as the basis for our ABC.
With aestivation, the sum of the two shorter term F's is greater than the overall F, and the difference increases with (Equation 9).As a result, the overall N e estimate will be larger than either of the two shorter term estimates.While the theory was developed assuming samples were taken at the first generation of each season, it could be modified to consider samples taken in most other generations from their respective seasons, with slightly different expectations.
The theory could also be extended to consider populations with overlapping generations (Waples & Yokota, 2007).
The inequalities between the shorter and longer term temporal F's from the Thierola samples (Equation 9, and the legends of smaller than that of the RS.For the second year, N D was also small compared to its corresponding RS, but was larger than the previous year.This result concurred with the original publication that the first year was genetically less stable (Lehmann et al., 2017).In both years, the C.I. of N D and excluded 0, suggesting the coexistence of aestivation and reproduction in the DS (partial aestivation).The N R estimate for the second year was inconclusive as the posterior was flat and almost identical to the uniform prior.We believe this was caused by two factors: First, the sample size might be inadequate, that sampling noise overwhelmed the relatively weak drift signals (Waples, 1989).This was further supported by the small temporal F estimates (Figure 5).The second possibility was the relatively short sampling interval.It began rather late in RS (October) and ended at the next ERS (June); hence, it may not have covered enough RS generations.The original authors also found very little differentiation among the samples collected from October '09 and beyond, which also explained the challenges we faced when estimating the parameters for the second year.While the N R estimate was inconclusive, we could still estimate with precision.This is not uncommon in population genetics, that only composite or ratio parameters can be resolved when information is limited.
The proportion of the population that aestivates is unlikely to be a fixed characteristic of the species, but rather to vary The estimates for N R were not reported as its posterior was flat.The corresponding temporal F estimates were 0.0071, 0.0024 (shorter term) and 0.0036 (overall).
geographically, depending on the severity of the DS.Indeed, there is likely to be an adaptive cline in aestivation phenotypes, given their heterogeneity between climate regions (Yaro et al., 2012).In areas such as the Equatorial where surface waters are found yearround, is expected to be 0 (see Appendix S1).In contrast, Sahelian An. coluzzii requires specific mechanism like aestivation to persist through the DS (Mwima et al., 2023).We believe partial aestivation occurs along this gradient, but this hypothesis requires further investigation.There are other potential factors that may affect the estimate, such as the switching of individuals between the two compartments in the DS, or if aestivators begin their dormancy at different DS gnerations.
As in other temporal N e studies, perfect knowledge of generation time is usually assumed.Additionally, we require information on seasonality and its transition.The beginning of RS is usually associated with the first rain and thus does not always fall on the exact same date or week every year.Sampling too close to the beginning of each season may risk obtaining individuals that are from the previous one.
The proposed sampling regime of RS-DS-RS is only an example, and further investigations are encouraged.As mentioned, our model and derivations are applicable to any diapause scenario that lasts for more than one generation, which is exactly the case for most shortlived insect species.
Key assumptions for our model and method of analysis are (1) that individuals collected during the DS are reproductive, and ( 2) that the population under study is closed, without significant genetic impacts due to immigration over the course of the study.It is not clear how well the population of An. coluzzii from Theriola fits these assumptions.For example, no breeding sites were found during the DS, and it is possible that individuals collected then were aestivators that briefly became active in order to feed before returning to aestivate.It will be worthwhile to expand the model to allow for this possibility, and for possible large-scale immigration at the beginning of the RS, which has also been proposed for these populations (Dao et al., 2014;Huestis et al., 2019).Each of these processes should leave a distinguishable genetic signature which will allow appropriate statistical inference.We hope our work will stimulate and accelerate relevant studies, to shed light on questions such as why aestivating proportions vary between populations and years, and how they interact with vector control programmes.All else being equal, aestivation is expected to slow down the rate of evolution in calendar time, including insecticide resistance.Considering genetic control strategies, a large aestivating population will slow down the spread of a gene drive (North et al., 2020), but can also serve as a reservoir for the gene drive mosquitoes so that they do not locally go extinct after a DS, potentially having an effect similar to repeated releases (Eckhoff et al., 2017).The consequences hence vary from case to case, requiring further analyses.More generally, understanding the persistence mechanisms is essential for all vector control programmes to identify and target the susceptible phases of the vector populations (Lehmann et al., 2017).It is foreseeable that long-distance migration and aestivation, the two main hypotheses of their persistence and re-establishment in highly seasonal environments, require different vector control strategies.Future genetic investigations will be necessary to help differentiate these processes with quantitative testing, but they will almost certainly require intensive sampling and a combination of genetic parameters, such as linkage disequilibrium, clustering and kinship.
, we can address several interesting questions: How does diapause affect temporal N e estimates, if samples are collected across seasons?What are these N e estimates referring to?What happens if only a fraction of individuals undergo diapause, while the others continue to breed?Can we estimate the number (or proportion) of individuals entering the diapause stage during the unfavourable season from allele frequency dynamics?Below we give answers by mathematical derivations and computer simulations.We use aestivation to represent diapause (and the DS as the unfavourable season) in the following paragraphs.It is because the worked example of the genotypic data set primarily concerns An. coluzzii.Note that the theory is also applicable to other diapausing species (see Discussion, below).1.1 | Theory 1.1.1| Standardised variance of allele frequency Without other forces (migration, selection, mutation), allele frequency fluctuates randomly due to genetic drift.While the mean allele frequency remains unchanged over time, the variance increases.
changes over time, the expected temporal F and the interpretations of N e estimates.Given that the model begins with the ERS, V t follows the WF for all generations up to the LDS, but under two different breeding sizes.For instance, we have where Δ R is the duration of the RS (similarly for Δ D ).The next ERS (denoted as ERS') can be viewed as a mixture of the two compartments, one experiencing no drift (other than that associated with the establishment of the aestivating class) and another with Δ D generations of drift.Then, V ERS′ is calculated from their standardised variances and covariance (see SI):We further introduce = N A N R as the fraction of aestivators contributing to ERS'.One can quickly examine the limiting cases: For complete reproductive arrest, = 1, then ERS' is in effect the next generation of LRS hence with only one generation of drift with breeding size N R .If = 0, then it reduces to the pure WF process with Δ D generations of drift in the DS, and one generation when the population enters the next RS.
E 1 Diagram for the aestivation model.The model begins at ERS with breeding size N R .Within the same season, the population reproduces (horizontal arrows) according to the WF model.Upon entering EDS, the population branches into two compartments: one continues to breed but with a smaller size N D , another aestivates with N A individuals.At the next ERS (denoted as ERS'), it is formed by previously aestivating individuals, and the descendants of the breeding compartment from LDS.With Δ D generations between them, we are unable to express it analytically as any kind of harmonic mean within the sampling horizon.Note that Equations 6-8 are functions of the three parameters, hinting at the potential use of the three temporal F's for parameter estimation.Another interesting observation is that when there is aestivation (N A > 0), the sum of the two shorter F's exceeds the overall F, with the difference determined by the three sizes:2 | ME THODSWe built a forward WF simulator with breeding and aestivating compartments to verify the above derivations.Biallelic loci were simulated and the allele frequencies of the breeding compartment over time were recorded.The model further assumed non-overlapping, discrete generations, with random mating.The three sizes are nonnegative with N R ≥ N A + N D , and Δ R and Δ D are known.The first simulation aimed to visualize the V t dynamics under different aestivation proportions and compare them to our theoretical expectations.

Figure 2
Figure 2 plots V t over time for several values of .Within a cycle starting from the ERS, V t increases with two different slopes because of the two different breeding sizes.The shape of V t within one cycle was the same for all , as the breeding sizes were the same.The only difference was between LDS and the next ERS when aestivating individuals reappear.Larger gave a bigger drop in V t between the two generations, concurring with our theory (Equation 4).In the second simulation, we focused on the three pairs of temporal F ′ s among ERS, EDS and ERS'.Each box plot and error bar in Figure 3 came from 1000 repeated simulations.Between the first two time points, different values of produced the same mean F as no aestivation was involved.The widths of the error bars appeared to be similar as well.Both F EDS,ERS ′ and the overall F ERS,ERS ′ decreased with .Because of their inversely proportional relation, larger gave larger N e estimates, despite having the same breeding sizes.

F
typed at multiple loci in three successive seasons (RS, DS, RS').The most appropriate we found was the study ofLehmann et al. (2017), though it is not clear it meets all the assumptions of the model (see Discussion, below).We particularly focused on the temporal An. coluzzii samples collected between 2008 and 2010 from the Sahelian village of Thierola in Mali.Sample sizes per time point varied from about 30 to 60, and were scored for 738 SNPs.To estimate parameters, we used Approximate Bayesian Computation (ABC; Beaumont et al., 2002).As the data set spanned 2 years, we analysed the two DS separately, with non-overlapping intervals.For the first DS, the three temporal samples were collected in August 08, May 09 and October 09, and for the second (9) F ERS,EDS + F EDS,ERS� = F ERS,ERS� + Temporal F, standardised variance of allele frequency change (between two time points) F I G U R E 2 Simulated V t over time under different aestivating proportion .Four values of were examined.Note that = 0 is equivalent to WF. Simulation parameters are as follows: N R = 10000 , N D = 1000, N A = N R , Δ R = Δ D = 6.V t were calculated from 10,000 independent loci, with an initial frequency of 0.5.Four complete cycles (48 generations) were simulated, starting from an RS.
Figures4 and 5.For the first year, the posterior medians for N R , N D

F
Posterior distributions for the first DS (2008-2009).The 90% C.I. were estimated as follows: N R : [163, 486], N D : [39, 53], N A : [113, 420], : [0.68, 0.88].The corresponding temporal F estimates were 0.0714, 0.0669 (shorter term) and 0.0290 (overall).Signals from allele frequency over time have been utilised for N e estimation, often via the well-studied temporal F statistic (Hui et al., 2021; Waples, 1989).Before this work, little was known about the interpretation of these estimates for aestivating populations, especially when the two samples are taken across different seasons or sandwich an ERS.The temporal F across two consecutive ERS overestimates N D by a factor of 1 ∕ (1 − ) 2 (Equation

Figures 4
Figures 4 and 5) are indicators of aestivation.The first-year DS effective breeding size was estimated to be about 45, several times