Understanding evolutionary and ecological dynamics using a continuum limit

Abstract Continuum limits in the form of stochastic differential equations are typically used in theoretical population genetics to account for genetic drift or more generally, inherent randomness of the model. In evolutionary game theory and theoretical ecology, however, this method is used less frequently to study demographic stochasticity. Here, we review the use of continuum limits in ecology and evolution. Starting with an individual‐based model, we derive a large population size limit, a (stochastic) differential equation which is called continuum limit. By example of the Wright–Fisher diffusion, we outline how to compute the stationary distribution, the fixation probability of a certain type, and the mean extinction time using the continuum limit. In the context of the logistic growth equation, we approximate the quasi‐stationary distribution in a finite population.


Introduction
The huge computational power available today allows more and more theoreticians to develop individual based models of high complexity in order to explore dynamical behavior in ecology and evolution.In this manuscript, we aim to make a link between these individual based descriptions and continuous models like (stochastic) differential equations that remain amenable to analysis.We review these techniques and apply them to some frequently used models in ecology and evolution.
In ecology the probably most common description of population dynamics is the logistic growth equation (Verhulst, 1838).Its attractiveness draws from its simplicity.It has a globally attractive fixed point (when started with any non-zero population size), the carrying capacity of the population.However, this simplicity comes at the cost that biological observations as population size fluctuations or even extinction events are not captured by this deterministic model.To account for these stochastic effects one needs to change to a stochastic differential equation which can be derived from the individual based reaction (also called first principles) (Champagnat et al., 2006).We will outline this procedure along similar lines as reviewed in Black and McKane (2012) and additionally provide some insights on the population distribution in the probabilistic framework.
In evolutionary game theory, the Moran process has become a popular model for stochastic dynamics in finite populations (Nowak et al., 2004).It is a model describing the dynamics of different alleles in a population of fixed size and overlapping generations.As this is a birthdeath process, quantities such as fixation probabilities, times, and the stationary distribution can be calculated based on recursions (Goel and Richter-Dyn, 1974;Karlin and Taylor, 1975;Traulsen and Hauert, 2009;Allen, 2011).A continuum approximation for quantities that are known exactly thus makes limited sense.Another important process in population genetics is the Wright-Fisher process -a model for allele evolution in a population of fixed size and non-overlapping generations (Wright, 1931).It is more popular in population genetics but is also used in evolutionary game theory (e.g.Imhof and Nowak, 2006;Traulsen et al., 2006;Taylor and Maciejewski, 2012;Wakano and Lehmann, 2014).However, the Wright-Fisher process is mathematically much more challenging to analyze exactly.Therefore, often continuum approximations resulting in stochastic differential equations are used to compute typical quantities of interest such as the probability of fixation of a certain genotype or the mean time until this fixation event occurs (Crow and Kimura, 1970;Ewens, 2004).
Even though similar in the questions they try to answer, evolutionary game theory and population genetics are developing in parallel, sometimes with little interaction between them.As this is partly arising from the different methods applied, here we aim to provide an introduction to the continuum limit for those less comfortable with these methods and hesitant to go into the extensive, more mathematical, literature.
Since our goal is to illustrate how to apply a continuum limit to individual-based descriptions of an ecological or an evolutionary process, the calculations and derivations below may remain a bit vague where more mathematical theory and knowledge is necessary.For a mathematically rigorous presentation of this topic we refer to the excellent lecture notes by Etheridge (2012) or the book by Ewens (2004).A more application-oriented treatment of stochastic processes in biology can be found in the books by Gardiner (2004), van Kampen (2007), Otto and Day (2007), and Allen (2011).

Evolutionary and ecological proto-type processes
2.1.Wright-Fisher and Moran process We start by introducing the two most popular processes to model (stochastic) evolutionary dynamics, the Wright-Fisher and the Moran process.While in the Wright-Fisher process generations are non-overlapping and time is measured in discrete steps, the generations in the Moran model are overlapping and can be measured in discrete or continuous time.Both processes describe the stochastic variation of allele frequencies due to finite population size effects, also called genetic drift.

Wright-Fisher model
One of the oldest population genetics model is the finite size Wright-Fisher process (Fisher, 1930;Wright, 1931).Given a population of constant size, it describes the change in frequencies of alleles in non-overlapping generations over time, measured in (discrete) generations.
Classically, one considers a population of N individuals where each individual is of type A or B .The population is considered to be in its ecological equilibrium and its population size N is therefore constant over time.One possible interpretation is that every generation each individual chooses, independently of all other individuals, an ancestor from the previous generation and inherits its type.Under selection, the likelihood of drawing type A individuals increases (or decreases) which introduces a sampling bias.The probability for an offspring to have a parent of type A, conditional on k individuals being of type A in the parental generation, is then given by where s ∈ R ≥0 is the selective advantage of type A. The number of type A individuals in the next generation is then given by a binomial distribution with sample size N and success probability p k .Denoting the number of type A individuals in generation n by X n we have Unfortunately the Wright-Fisher model, even though very illustrative, is difficult to study analytically.Through the developments in stochastic modeling in the last century, a lot of this new theory could be adopted to overcome this problem (e.g.Kimura, 1983;Ewens, 2004).

Moran model
Another way to resolve the difficulties associated with the Wright-Fisher process is offered by the Moran process (Moran, 1958).As already mentioned, the setup is the same as for the Wright-Fisher process (constant population size N with two types or -in population genetics -alleles A and B ) with one exception: time is not measured in generations but each change in the population configuration affects only one individual, the one that dies and gets replaced by an offspring of another randomly selected individual.This results in overlapping generations and allows for time being measured either in a discrete or a continuous way.

Discrete time
The Moran process in discrete time can be formulated as follows.Every time step, one individual is randomly chosen to reproduce and the offspring replaces a randomly chosen individual among the remaining N − 1 individuals (sometimes the replacement mechanism is not restricted to the remaining individuals but also includes the parent).Therefore, in a population with k type-A individuals, the probability that one of these replaces a type B individual is given by with p k as defined in Eq. (2.1).Analogously, the probability for the number of type A individuals to decrease from k to k − 1 is given by We have implemented selection on the reproduction step, however the Moran model also allows for selection on death.In a non-spatial setting, as considered here, this leads to the same transition probabilities.However, the Moran model can also be considered on a graph which aims to model spatial structure.Here, the order and also the precise implementation of selection matters and can potentially give rise to different evolutionary dynamics (Lieberman et al., 2005;Kaveh et al., 2015).We note further that without selection (s = 0) we have p k = k/N , i.e. the increase and decrease probabilities are equal for any choice of k.Usually this kind of dynamics is called neutral.

Continuous time
The same dynamics (albeit on a different time-scale) can be obtained by assuming that each pair of individuals is associated to a random exponentially distributed time (also described as exponential clocks) and the next pair to update their types is determined by the smallest random time (or the clock that rings first).At these times, one of the two individuals is chosen to reproduce, the offspring replacing the other individual of the pair.There is no standard choice when it comes to choosing the rate of these exponential times.However, in the neutral model (s = 0) the rate N 2 is mathematically convenient since then the time-scale of the Moran dynamics corresponds to the time-scale of Kingman's coalescent (Kingman, 1982) (see Wakeley (2008) for an introduction to coalescent theory).
Both formulations of the Moran process are Markov chains, either in discrete or continuous time with the special property of only having jumps of ±1.These processes are called birth-death processes.The theory of these is well developed, see for example the books of Karlin and Taylor (1975), Gardiner (2004), or Allen (2011), so that the dynamics of Moran processes are often amenable to analysis (typically by solutions of recursion equations).

Conclusion 1
The difference between the Wright-Fisher model and the Moran model is the progression of populations in time.In the Wright-Fisher process, generations are nonoverlapping, i.e. all individuals update their type at the same time.Therefore the distribution of types in the offspring generation is binomial.On the other hand, generations are overlapping in the Moran model and the type dynamics follow a birth-death process since only one individual is updated at a time, resulting in less complicated transition probabilities.

Logistic growth
In ecology one is typically interested in population sizes rather than allele frequencies.The simplest population growth model is that of exponential growth.However, a population that grows exponentially forever contradicts the physical boundaries of our plane.Obviously, a population's growth will be limited at some point, for example due to spatial constraints or resource depletion.This form of density regulation is enough to stabilize a population around its carrying capacity, the smallest positive population size at which in the deterministic process the growth rate equals zero.
Here, we give the mechanistic basis that could potentially describe such a process.We denote a single individual of the population by Y .The birth-and death-reaction can then be written as (2.5) The parameters β and δ correspond to the rates at which the two reactions happen, i.e. each reaction corresponds to an exponential clock with rate either β or δ.For β > δ, the population tends to grow to infinity, whereas for β < δ, it goes extinct.Population regulation is achieved through a non-linear term that is typically interpreted as an interaction between two individuals, e.g.competition for space.The corresponding reaction is given by (2.6) The parameter γ is referred to as the intra-specific competition coefficient and K is a measure of the number of individuals at carrying capacity.For example, K = 10, 000 would result in a carrying capacity of this order of magnitude.The division by K in the competition rate is accounting for the probability of interaction of two individuals in a well-mixed population where space is measured by the parameter K so that Y /K becomes a density (or rate of encountering an individual when randomly moving around).For a more detailed derivation of these type of interaction rates we refer to Anderson and Kurtz (2015).The logistic process is, like the Moran process, a birth-death process.Therefore, it is amenable to the same type of finite population analysis.We will see in the next section, that in the infinite population size limit (we let K tend to infinity) the mechanistic description above results in the logistic equation of the form where r = β − δ is the per-capita growth-rate, c = (β − δ)/γ is the rescaled carrying capacity, and y = Y /K is the density of the population.

Infinite population size limit
The microscopic descriptions above are enough to implement a stochastic simulation algorithm.However, the theoretical analysis of finite size populations can be challenging.A common technique is therefore to consider a continuum approximation, i.e. studying the limiting model for N , K → ∞ (and usually s → 0 in evolutionary processes).Typically, the diffusion approximation is used to derive (stochastic) differential equations of the form where (W t ) t ≥0 is a standard Brownian motion (see Appendix A.1 for more details on the Brownian motion).This equation describes the population dynamics, i.e. the macroscopic evolution of a certain model.A solution of a stochastic differential equation of this particular form is also called a diffusion.We note that Eq. (3.1) is the compact writing of the following integral equation We call µ(x t ) the infinitesimal mean, i.e. the expected change of the stochastic process (x t ) t ≥0 in a very short time interval, and σ 2 (x t ) the infinitesimal variance, i.e. the corresponding expected variation of the diffusion in very small time steps (see also Appendix A.2).In the case where σ is zero, the limiting process is deterministic and Eq.(3.1) reduces to an ordinary differential equation.We now present how to derive Eq.Dependent on the field, this type of approximation is known as Gaussian or diffusion approximation (Norman, 1975;Ewens, 2004;Etheridge, 2012) or as Kramers-Moyal expansion (van Kampen, 2007).The general idea is to derive the dynamics of the probability density in the microscopic system and to perform a Taylor expansion, i.e. linearize this equation with respect to the parameter 1/N , the frequency change of the population dynamics in the finite population size description of the Moran model.
The change in very small (infinitesimal) time of any continuous-time Markov process (x t ) t ≥0 can be described by the infinitesimal generator, denoted G .For a process that is homogeneous in time, i.e.where the transition rates are constant in time and only depend on the state of the stochastic process, the infinitesimal generator is independent of time t .Intuitively, one can think of it as the derivative of the expectation (of an arbitrary function) of a stochastic process.Formally, it is defined by where E[ f (x ∆t )|x 0 = x] denotes the conditional expectation of the stochastic process f (x t ) at time ∆t given the initial value x 0 = x with f an arbitrary function so that the limit is well-defined.For example, applying G to f (x) = x describes the dynamics of the mean of x t .
The infinitesimal generator is useful in our context since it can be related to a diffusion process.To be more precise, the infinitesimal generator associated to the stochastic differential equation More details on this relationship are provided in Appendix A.2.Our goal is to find the limit of the finite-population size generator of the form given in Eq. (3.5) that directly translates to a diffusion.As an example we consider the continuoustime Moran process with transition rates T k+ and T k− for 0 ≤ k ≤ N and where each individual has a type-independent birth-rate of 1 (an exponential clock with rate 1) at which it replaces another individual with an offspring.Setting x = X /N , the frequency of individuals of type A, we find the infinitesimal generator for the model with fixed population size N , G N , to be of the form probability for an update until time ∆t (3.6) We have used the Landau notation O(∆t ) to summarize processes that scale with order ∆t or higher.Doing a Taylor expansion for large N and neglecting the terms of order higher than 1/N 2 we obtain (3.7) Lastly, we rescale time by the factor 1/2, i.e. we change the process onto the time scale τ = t /2.
Recall that G N is the limit of the differential quotient for ∆t → 0 and as such incorporates the time-change.Now the infinitesimal generator of the rescaled process reads Translating this equation to a stochastic differential equation we can identify the single components as (3.9) Note that we have made no assumption on the dependence of the transition probabilities on the frequency x, such that this approach is applicable for constant selection, linear frequency dependence arising in two player games (Traulsen et al., 2005) or multiplayer games with polynomial frequency dependence (Gokhale and Traulsen, 2010;Peña et al., 2014).

Conclusion 2
For time-continuous finite population size models with jumps of ±1, i.e. a birth-death process, the limiting diffusion process can be computed by Eq. (3.9).

Example: Moran process with selection and mutation
As an example we explicitly derive the stochastic differential equation corresponding to the Moran model with selection and mutation.Here, we decouple the reproduction and mutation processes but similar derivations can be made if we assume a coupling of mutations to reproduction events.The selection coefficient is denoted by s and the mutation rates from type A to B and type B to A are given by u A→B and u B →A , respectively.This time we consider transition rates instead of transition probabilities.This means that the transitions are now also defining the speed of these updates to happen.The transition rates can be written as and Instead of individuals giving birth to an offspring independent of their type, these rates translate to each individual having its own exponential clocks: One clock with rate (1 + s) for type A individuals and rate 1 for type B individuals corresponding to reproduction and one clock with rates u A→B and u B →A for mutation, respectively.Thus, in total there are k individuals that share the same rates which gives the first term for T k+ (reproduction) and the second term for T k− (mutation).Note that this is just a simple example.In general, all different types of updates could be envisioned, see for example Czuppon and Rogers (2019) for a sexually reproducing population under self-incompatibility.
Inserting the just defined transition rates into Eq.(3.9) yields (3.12) Dependent on the choice of selection and mutation rates, these equations result in different limits.Typically one is interested in non-trivial limits for these equations, i.e. a limit so that not both components equal zero.Often this can be achieved by rescaling time (typically the inverse of the population size) and/or defining the strength of selection and mutation in terms of the population size N .As an example, we will focus on two specific limits: (i) strong selection and strong mutation and (ii) weak selection and weak mutation.

Strong selection and mutation
At first, we consider large selection and mutation effects.We assume that s and u i do not depend on N but are constant.In order to obtain a limit equation for the frequency of individuals of type A, x = X /N , we rescale time by N , which transforms Eq. (3.12) to The first equation is independent of N and the vanishing variance in the second equation implies that the limit process is deterministic.We obtain the ordinary differential equation (3.14) which describes the mean dynamics of the allele frequency in the population.

Weak selection and mutation
In contrast to the previous scenario, we now assume that both selection and mutation are weak.More precisely, we choose s and u i to scale inversely with N and define the constants α = sN and ν i = u i N .Inserting these into Eq.(3.12) and here without rescaling time (speeding up or slowing down the original process in order to obtain a reasonable continuous-time limit), yields which gives the diffusion limit (Eq.(3.1)) This stochastic differential equation in the context of population genetics (and typically without the factor two in the stochastic component) is also called Wright-Fisher diffusion (with selection and mutation).This difference in the variance term shows that the Moran model has twice as much variance as the Wright-Fisher process when compared in the diffusion limit (for a derivation of the diffusion limit when starting from the Wright-Fisher process see Appendix B.1).More details are provided in Appendix B.2 where the Moran and the Wright-Fisher process are compared in more detail.In short, the difference arises through the different sampling schemes in the two models, Binomial-vs birth-death-updating.We also note, that for this choice of selection and mutation strength (both are assumed to be weak and to scale with 1/N ), coupling mutation to reproduction and therefore changing the rates in Eqs.(3.10) and (3.11), would result in the same limit equation.
In the following we will call the process solving Eq. (3.16) without the factor 2 in the variance (σ 2 (x) = x(1 − x)) a Wright-Fisher diffusion with selection and mutation.

Conclusion 3
Different assumptions on the model dynamics, here selection and mutation, can lead to different continuum limits on the population level.In order to identify parameter combinations that result in reasonable diffusion approximations it is key to study Eq.(3.12) in detail.

Conclusion 4
The Moran process, by nature, has the same mean behavior as the Wright-Fisher model.However, its variance in the diffusion limit is twice the variance of the corresponding Wright-Fisher diffusion derived from the classical Wright-Fisher model as defined in Section 2.1.

Example: Logistic growth
We apply the same technique to the logistic growth model.From the rates given in Eqs.(2.5) and (2.6) we first derive the transition rates T j + and T j − .They are given by We apply the same technique as before which resulted in Eq. (3.9) with one difference though: for the logistic process, as we have introduced it here, we do not need to rescale time by 1/2.The only rescaling we do is going from the number of individuals j to the corresponding density y = j /K .Then the calculation in Eq. (3.6) transforms to (3.18) Applying the limits in Eq. (3.9) with K as the parameter going to infinity we find This means that in the limit of infinite population sizes the variance vanishes and we end up with a deterministic process, the logistic growth equation: Indeed, as we see in Figure 1, the population size measure K does not need to be very large for the individual based model to approach the deterministic limit (K ≈ 1000 is enough in this example).

Finite population size approximation
We have seen that if we let the population size tend to infinity, we are able to recover a (stochastic) differential equation describing the studied evolutionary or ecological process.A natural question that arises is how these results relate to the finite population size models.One way of approaching this question is to study the variation around the limiting process in more detail.Without going too much into the formal details, we can, as a first approximation, identify this deviation as the infinitesimal variance (σ 2 in Eq. (3.9)) without taking the limit of N or K to infinity.A typical equation in this scenario would look as follows: where N is the parameter (typically the population size or a measure of population size) that in the infinite population size approximation would tend to infinity.We see that the variation around the deterministic part of the process scales with N −1/2 , reminiscent of the central limit theorem.It is worth pointing out that all the subsequent quantities and computations can also be done with this finite population size approximation.This type of approximation is particularly useful when we want to infer results where selection and/or mutation do not belong to a regime that is reasonably covered by the infinite population size limit.In these cases often the limiting process becomes deterministic and the description of the stochastic variation is lost (cf. the examples in the previous section).

Stationary distributions
For the Moran model we have derived two different limits dependent on the strength of selection and mutation.If both, selection and mutation, are strong we arrived at a deterministic limit equation.For weak selection and weak mutation we obtained a stochastic differential equation.One qualitative difference between these two limits is that trajectories of the deterministic equation will always converge to a fixed point while the stochastic differential equation fluctuates indefinitely for positive mutation rates.The deterministic fixed point is given by the solution of Eq. (3.14) equal to zero.In our simple example, a single fixed point x * lies within the interval between 0 and 1 and is stable.Therefore all trajectories will approach this value, e.g.see Figure 2(a).
On the other hand, Eq. (3.16) is a stochastic equation.Hence, even if the trajectories approach or even hit the deterministic fixed point they will not stay there due to the introduced randomness by the Brownian motion, cf. Figure 2(b).Still, we can make predictions about the time a trajectory spends in certain allele configurations.These are summarized in the stationary distribution, a quantity that is the stochastic equivalent of a deterministic fixed point.More precisely, if the initial state of the population is described by the stationary distribution, then all future time points have the same distribution.For birth-death processes, the stationary distribution can be calculated based on detailed balance, i.e. the incoming and outgoing rates need to be equal for every possible state of the process (Gardiner, 2004;Claussen and Traulsen, 2005;Antal et al., 2009).In general, the stationary distribution ψ can be defined as the solution of where x 0 ∼ ψ denotes that x 0 is distributed according to ψ.The above equation can be expressed in terms of the infinitesimal generator This equation allows us to solve the stationary distribution numerically and in some cases even an analytic solution is possible.In Box 1 we introduce the scale function S(x) and the speed measure M (x) which are related to a particular stochastic diffusion.These functions are useful to compute not only the stationary distribution but, as we will see later, other

Box 1: Scale function and speed measure
The scale function is defined by where the lower boundaries of the integrals can be chosen arbitrarily.The name of this function derives from the fact that for a one-dimensional diffusion x t satisfying the scaled process S(x t ) becomes a (time-changed) Brownian motion on the interval [S(0), S(1)], i.e. there is no deterministic contribution in the scaled process.The time change is given by the speed measure M which defines how much faster (or slower) the original process is evolving compared to a standard Brownian motion.It is given by the density of the speed measure.
The scale function and the speed measure describe diffusion processes analytically and can be used to compute the stationary distribution, fixation probabilities and mean extinction times, as we will see in the following sections.A more detailed assessment of the scale function and the speed measure can be found in Etheridge (2012).
quantities like the hitting probability or the mean time to fixation.For example, in the case of a one dimensional diffusion, the stationary distribution, i.e. the solution of Eq. (5.2), can be expressed by the scale function S(x) (Eq.(B1)) and the speed measure density m(x) (Eq.(B3)).Using these quantities, the solution of Eq. (5.2) can be compactly written as Thus, all that is needed for the computation of the stationary distribution is the speed measure density m.For a detailed derivation see also Etheridge (2012, Chapter 3.6).

Stationary distribution of the Wright-Fisher diffusion
Let us consider the Wright-Fisher diffusion with selection and mutation, i.e. (5.4) The infinitesimal generator is given by (see also Eq. (3.5)) (5.5) The derivative of the scale function (Eq.(B1) in Box 1) reads where C is a constant dependent on the lower bound of the integral.The speed density (Eq.(B3) in Box 1) takes the form (5.7) Using a symbolic programming language, e.g.Mathematica, we find that integrating m(x) over the whole frequency space, the interval [0, 1], we obtain where Γ(x) is the Gamma function and 1 F 1 (a, b, z) is the generalized hypergeometric function.Then the stationary distribution (Eq.( 5.3)) can be written as .
(5.9)Note that for α = 0 the generalized hypergeometric function reduces to unity and the stationary distribution can be expressed in terms of the Gamma function.
As an example, some stationary distributions are plotted in Figure 3.For symmetric mutation rates ν A→B = ν B →A , the stationary distribution is either centered around 0.5 or peaks at the boundaries of the frequency space.Intuitively this can be explained as follows: if mutation rates are too small, genetic drift drives the allele population to fixation (either boundary is equally likely due to the symmetric choice of mutation rates).If mutation rates are increased, coexistence of the two alleles becomes possible due to the drift-mutation balance.For non-equal mutation rates, the stationary distribution is skewed towards the allele that is favored by the mutation mechanism.Similarly, for selection the distribution is skewed towards the favored allele.

Conclusion 6
The stationary distribution of a one-dimensional diffusion can be expressed in terms of the density of the speed measure m(x) by Eq. (5.3).The density of the speed measure can be computed by the scale function corresponding to the stochastic diffusion process, Eqs.(B1) and (B3) in Box 1.

Quasi-stationary distribution of the logistic process
Next, we consider the finite population size approximation of the logistic growth model, i.e.
(5.10) 0.00 0.25 0.50 0.75 1.00 0 1 2 3 4 5 allele frequency x prob.density eq.(5.9) Figure 3: Stationary distribution of the neutral Wright-Fisher diffusion.We plot the density of the stationary distribution, ψ in Eq. (5.9), for different sets of mutation rates (and selection).For small symmetric mutation rates, ν A→B = ν B →A < 1 2 , most of the density is at the boundaries of the frequency space (dashed line).For large symmetric mutation rates, ν A→B = ν B →A > 1 2 , most of the density is centered around 1 2 (solid line).Asymmetric mutation rates result in skewed stationary distributions, e.g. the dotted curve is skewed towards x = 1, since the mutation rate towards the considered allele is larger than the mutation rate away from it, ν B →A > ν A→B .A similar pattern emerges when selection is included which generates a bias towards the favored allele (dash-dotted line).
This example has a (unique) absorbing state, y = 0.It is accessible from all positive population densities and therefore the population will go extinct almost surely.The only stationary distribution is the point-measure on 0, i.e. ψ(y) = 1 {y=0} .
In contrast, the positive deterministic population equilibrium, y * = (β − δ)/γ, is a stable fixed point of the deterministic system.Considering large values of the deterministic equilibrium (K 1), we expect the finite population size process from Eq. (5.10) to remain close to this value for long times.In fact, the expected extinction time of the logistic growth process when started in the positive population equilibrium is of order exp(K ) (Champagnat, 2006).This suggests that the process will be in a quasi-stationary state, i.e. before its extinction, the population can be described by the stationary distribution of the corresponding logistic process conditioned on non-extinction.
Formally, the quasi-stationary distribution can be computed by conditioning the original process on its survival.This means that the transition rates change and the novel process can be analyzed by the techniques described above.However, this method goes beyond the scope of this manuscript.For a theoretical treatment of this topic in the context of the logistic equation see for example Cattiaux et al. (2009); Assaf et al. (2010); Méléard and Villemonais (2012).For a general review on methods related to quasi-stationary distributions see Ovaskainen and Meerson (2010).
Another way to approximate quasi-stationary distribution when extinction is very unlikely, is provided by the central limit theorem (sometimes also called linear noise approximation).
Here, the distribution of the process is derived from its local dynamics around the deterministic fixed point y * (Ethier and Kurtz, 1986;Gardiner, 2004).The underlying assumption is that the population stays close to its positive steady state and just slightly fluctuates around this value.This is only a valid assumption when the probability of extinction within the studied time-frame is essentially zero.These small fluctuations can then be described by a Gaussian distribution.In terms of a formula this translates to the following: (5.11)where U is a Gaussian random variable.Writing µ(y) = (β−δ−γy)y and σ 2 (y) = (β+δ+γy)y, the dynamics of U can be rewritten as (5.12) The process U is therefore an Ornstein-Uhlenbeck process (see also Appendix A.3). Evaluating the process U at y K t = y * we thus obtain a description of the variance in the deterministic fixed point.By the properties of the Ornstein-Uhlenbeck process (cf.Eq. (A.9) in Appendix A.3) we find its stationary distribution as ψ U (y) ∼ N 0, − σ 2 (y) 2µ (y) . (5.13) This distribution describes the fluctuations of the process y K t around the deterministic steady state y * .Therefore, when plugging the distribution ψ U into the original process from Eq. (5.11), we find the quasi-stationary distribution of y K t around the deterministic equilibrium y * which is given by (5.14) We can see that for increasing population sizes K , the variance is decreasing and vanishes in the limit K → ∞.In Figure 4 we have plotted this quasi-stationary distribution for different parameter sets of the system.Two general rules arise: Firstly, the larger the overall population size (large K ), the smaller the variance due to the scaling of the variance by 1/K (Eq.(5.14)).This also becomes clear when considering the jump sizes, 1/K , of the corresponding birth-death process when viewed in the density-space; these jumps become smaller the larger K is and this directly translates to the distance the process gets pushed Figure 4: Linear noise approximation around the positive steady state of the logistic equation.We plot the density of the stationary distribution, ψ from Eq. (5.14), for different sets of parameters.The width of the stationary distribution decreases with increasing population size as can be seen by comparing the curves for K = 100, K = 1000 and K = 2000.Decreasing the competition parameter (and therefore increasing the per-capita death rate) results in a broader distribution (red line) when compared to the benchmark scenario (orange curve).The analogous conclusion, lowering the per-capita birth-and death-rate results in a less broad distribution, is visualized by the purple line.This shows that the per-capita birth-and deathrates have a stronger impact on the width of the stationary distribution than the competition parameter.away from y * .Secondly, the larger the birth-, death-and competition rates, β, δ and γ, the broader the distribution.This is explained by the variance σ 2 being the sum of these three values.Then, increasing these values, also increases the effect of the noise around the positive population equilibrium y * as can be seen in Eq. (5.14).

Conclusion 7
If the studied process allows for a deterministically stable steady state but is almost surely going extinct for finite population sizes, a quasi-stationary distribution can be computed to describe the behavior of the process conditioned on survival.If the extinction probability is very low, an approximation of this distribution is given by the linear noise approximation where the variance around the deterministic steady state is modeled by an Ornstein-Uhlenbeck process given by Eq. (5.12).

Fixation and first hitting probabilities
We have seen that stochastic descriptions of processes can lead to outcomes that are different from their deterministic counterparts.Here, we want to investigate one of these phenomena, namely the probability for a certain type to become fixed in a population.One-dimensional stochastic differential equations allow for an explicit computation of these fixation probabilities.As before we denote by x t the frequency of type A individuals at time t ≥ 0 in the population.If this process is described by a one-dimensional diffusion, the fixation probability can be computed by where S(x) is the scale function from Eq. (B1) corresponding to the process (x t ) t ≥0 , and x 0 denotes the initial frequency of type A individuals.For the derivation of this formula, we refer to Otto and Day (2007, Chapter 15.3.3),Etheridge (2012, Lemma 3.14) or Kallenberg (2002, Theorem 23.10).
As an example, let us consider the Wright-Fisher diffusion with selection which is given in Eq. (5.4) without mutations (ν A→B = ν B →A = 0).We have µ(x) = αx(1−x) and σ 2 (x) = x(1−x) such that the scale function simplifies to Recalling the definition of α = sN for a finite population of size N and plugging this into Eq.( 6.1) yields which for x 0 = 1/N becomes P 1/N (x ∞ = 1) ≈ 2s, the result of Haldane for the fixation of a single mutant copy in a population of size N (Haldane, 1927).
This method is also applicable to more complicated stochastic differential equations where the sign of the deterministic dynamics µ(x) is dependent on the population configuration.Most classically, these frequency-dependent problems were studied in deterministic evolutionary game theory introduced by Maynard Smith and Price (Maynard Smith and Price, 1973) (see also Hofbauer and Sigmund (1998) for an introduction to evolutionary game dynamics).
Our aim here is to derive the fixation probability of a process under frequency-dependent selection using the scale function.We consider a diffusion with linear frequency dependence (see Traulsen et al. (2006) for a physical formulation of this and Pfaffelhuber and Wakolbinger (2018) for a more general mathematical analysis).We denote the strength of selection by α and let u, v be arbitrary real numbers.We write αx(1 − x)(ux + v) for the linear frequencydependent dynamics of selection.Then, the allele frequency evolves according to the following equation For α 1, we can linearize the exponential and write the scale function as Plugging this into Eq.( 6.1), we obtain (6.6) In the context of evolutionary game theory, this result is a re-derivation of the 1/3−law (Nowak et al., 2004) (generalized by Lessard and Ladret (2007)).It states that for an allele starting with one individual, it is more likely to become fixed in the population than under neutral dynamics if the deterministic fixed point is smaller than 1/3.This can be seen

Conclusion 8
The scale function can be used to analytically derive (or to compute numerically) the fixation probability of any one-dimensional diffusion representing trait frequencies by solving Eq. (6.1).

Mean time to fixation
A related quantity that can be derived from a one dimensional diffusion is the expected time to fixation (or extinction from the other species' point of view), i.e. the time the two types coexist in the population.Again, the calculation relies on a special function, this time Green's function G(x, y) which can be interpreted as the average time that a diffusion started in x spends in the interval [y, y + d y) before reaching one of the boundaries (Chapter 3.5 Etheridge, 2012).It is therefore sometimes also called sojourn time density (Ewens, 2004).Green's function is defined as where S(x) is the previously defined scale function and m(x) denotes the speed measure density (see Box 1).The expected time to fixation for a process started at frequency x, denoted E x [τ], is then given by (see also Kallenberg (2002, Lemma 23.10) or Ewens (2004, Section 4.4)) This corresponds to the summation of the sojourn times in the discrete case, see e.g.Ohtsuki et al. (2007) for an application in finite populations.It is possible to simplify this formula considerably when applying it to concrete examples.For a particularly simple stochastic process, the neutral Wright-Fisher diffusion (7.3) the scale function and speed measure density are given by The expected time to fixation of one of the two alleles can then be expressed as ).
(7.5)As a more complex example, let us again consider the frequency-dependent selection process described by the stochastic differential equation (7.6) Similar to the computation of the fixation probability, we will consider the case of small initial frequencies and weak selection, i.e. α, x 1.More precisely, we neglect terms of order α 2 and αx 2 .We recall the approximation of the scale function in this case that we derived in Eq. (6.5)

S(x)
Employing these approximations, the first integral in Eq. (7.2) yields x (7.8) Approximating the second integral in a similar way we find ≈ 2x ln(x −1 ) + 2xαv . (7.9) Taking these two expressions together we re-derived the results already known in the literature (Altrock and Traulsen, 2009;Pfaffelhuber and Wakolbinger, 2018), i.e. (7.10) for α and x sufficiently small.

Conclusion 9
Expected unconditional fixation times, i.e. the expected time of coexistence of two alleles in a population described by a stochastic diffusion process, can be calculated by integrating over Green's function (the mean occupation time of a certain frequency until extinction), as shown in Eq. (7.2).

Summary and Outlook
We have outlined how to derive a stochastic differential equation from an individual based description of two classical models in evolutionary theory and theoretical ecology, the Wright-Fisher diffusion and the logistic growth equation.The resulting stochastic differential equation in one dimension describes the evolution of the allele frequency or population density under study, respectively.Using probabilistic properties of this equation, i.e. transforming it to a rescaled Brownian motion (Box 1), it is possible to analytically derive the (quasi-) stationary distribution, fixation probability and the mean time to fixation.By way of example we derived these quantities for the Wright-Fisher diffusion without and with (frequency-dependent) selection.
The diffusion process emerges as the infinite population size limit.However, as we have seen in Section 4, one can also derive a finite population size approximation of the dynamics, i.e. we do not take the limit N , K → ∞.As mentioned, the fixation probability or mean time to fixation can be analogously derived as in the previous sections and thus carry over to these finite population equations.Applications of these finite population size approximations are abundant and cover diverse topics (e.g.Traulsen et al., 2005;Reichenbach et al., 2007;Assaf and Mobilia, 2011;Houchmandzadeh, 2015;Constable et al., 2016;Débarre and Otto, 2016;Kang and Park, 2017;Koopmann et al., 2017;Serrao and Täuber, 2017;Czuppon and Gokhale, 2018;Czuppon and Traulsen, 2018;Parsons et al., 2018;McLeod and Day, 2019;Schenk et al., 2019).
Apart from the fixation probability and the mean time to fixation, the (quasi-)stationary distribution is a commonly used measure to describe stochastic systems.Its calculation in an infinite population described by stochastic differential equations is straightforward.In finite population approximations though, populations can go extinct due to the inherent stochasticity of the microscopic reactions.Here, the quasi-stationary distribution describes the stationary distribution conditioned on the survival of the population.For negligible extinction probabilities, i.e. very large survival probabilities of the population, the functional central limit theorem (or linear noise approximation) can be used to approximate this quasistationary distribution.In the theoretical biology literature, this method is frequently used in models of gene regulatory networks (see Anderson and Kurtz, 2015, for a mathematical introduction) and, interestingly, less so in the context of ecology or evolution (e.g.Boettiger et al. (2010); Kopp et al. (2018); Wienand et al. (2018); Czuppon and Constable (2019); and Assaf and Meerson (2017) for a review of the physics literature related to this topic).Another interesting extension of this approach appears when dealing with a model where processes evolve on different time scales.Using the central limit theorem it is possible to capture the variance (noise) coming from different processes evolving on different time scales (see Kang et al. (2014) for the formal derivation and Czuppon and Pfaffelhuber (2018) for an application in the context of gene regulatory pathways).Ultimately, this can help to disentangle the single contributions from different process onto the quasi-stationary distribution.
Lastly, we did not cover multi-dimensional stochastic differential equations in this methods review.These high-dimensional processes offer a way to explore multi-trait or multi-type evolutionary dynamics, or evolution in spatially structured populations.We decided not to cover this topic because the methods to analyze these differential equations are not yet well developed or go beyond the (mathematical) scope of this manuscript.Still, under quite restrictive and simplifying assumptions, analytical results can be derived by using similar ideas as those presented here (e.g.Constable and McKane, 2014;Lombardo et al., 2015;Manceau et al., 2016;Simons et al., 2018;Czuppon and Rogers, 2019;Czuppon and Constable, 2019).Even though these multi-dimensional processes might not be tractable analytically in general, they are amenable to numerical analysis.As such they provide a tool to explore the potential qualitative outcomes of an ecological, evolutionary or eco-evolutionary model without the need to run computationally heavy individual-based simulations.
We hope that with our basic comparisons between different approaches used in different subfields of theoretical and mathematical biology, we help newcomers in the field to get more familiar with these methods.We simulate it by the Euler-Maruyama method (also stochastic Euler-method) with d t = 0.001.
(1986, Chapter 5.3).Briefly, one needs to apply Itô's formula (Kallenberg, 2002, Theorem 17.18) to the process f (x t ) to see that the process (x t ) t ≥0 that solves the stochastic differential equation indeed has an expected change in infinitesimal time steps described by the infinitesimal generator in Eq. (A.3).

Neutral diffusion
We need to compute the expectations in Eqs.(A.4) and (A.5) using the probability distribution given in Eq. (2.2) (with s = 0).Setting the time between two generations to 1/N such that for large N it becomes approximately continuous, we find for the expected change in the number of alleles The infinitesimal variance computes to Dividing by ∆t = N −1 and replacing k/N by x, we find the neutral Wright-Fisher diffusion for the allele frequency dynamics

Diffusion with selection and mutation
Next, we include selection and mutation.We say that type A alleles are beneficial (deleterious) if s > 0 (s < 0).Given that there are k type A individuals in the population, the probability for an offspring to choose a type A individual as a parent is given by We can also add mutations to the Wright-Fisher model, i.e. type A individuals can mutate to type B and vice versa.We set u A→B as the probability to mutate from type A to B and u B →A as the mutation probability from B to A. Then the probability for an individual to be of type A given k type A individuals in the parental generation reads In this model mutation is intimately connected with the reproduction mechanism.For the Moran model, compare Section 3, these processes do not necessarily need to be coupled (even though this would, biologically speaking, make the most sense).Following the same methodology as for the neutral Wright-Fisher model, we can derive a diffusion process by computing the infinitesimal mean and variance.For the allele frequency

B.2. Comparing the variances of the Wright-Fisher and the Moran process
As we have seen in Eq. (3.1) in the main text the variance in the diffusion limit is given by σ 2 (x) = 2x(1 − x).The underlying model was assumed to be the Moran model.If we do the same derivation of the diffusion limit using the discrete Wright-Fisher model (binomial updating) as done in the previous section, we obtain the variance σ 2 (x) = x(1 − x).Here, we hope to give an intuition and an analytical explanation for this difference.
For simplicity, we will only consider the neutral dynamics of the corresponding processes, i.e. we set the selection and mutation rates to zero, s = u A→B = u B →A = 0.In the Wright-Fisher process during one time-step all N individuals are updated (simultaneously).If we switch to the scale on which the diffusion process is evolving we need to rescale time by 1/N (see Eq. (B.9)).Under this scaling we obtain σ 2 (x) = x(1 − x).
For the Moran model, we have seen that we can define the process either in discrete or in continuous time.To obtain the diffusion limit from the discrete time process, we apply the same reasoning as in the previous section, i.e. we compute the infinitesimal variance.In this case we get For this term not to vanish in the infinite population size limit (N → ∞) we need to set ∆t = N −2 .This corresponds to N jumps in a time interval 1/N which corresponds to the number of individuals updated in the Wright-Fisher process during the same time.In addition there is a factor 2 appearing which is due to the different update mechanisms.While for a Binomial-distribution the variance takes a multiplicative form (x(1 − x)), for a birth-death process the birth-and the death-rate are summed up resulting in the factor 2.
For the continuous-time Moran process we refer to Eqs. (3.7) and (3.9) in the main text.In order to make sure that the variance does not vanish in Eq. (3.9) we need to rescale the time of the original process by N 2 resulting in the same reasoning as for the discrete-time Moran derivation.
To conclude, we have seen that the Moran diffusion limit evolves twice as fast as the Wright-Fisher diffusion limit.This is the reason for the larger variance since a "faster" Brownian motion will also spread further when compared to a "standard" Brownian motion.In terms of the original discrete processes, one can summarize that when the same number of individuals gets updated, a lot of individual jumps, like in the Moran process, accumulate more variance than updates with large effects, like in the Wright-Fisher process where the whole population gets updated at once.The sampling therefore determines the speed (and the variance) of the process even in the limit of large population sizes.
(3.1) from the time-continuous Moran model.This procedure can be applied to any time-continuous individual based model.The derivation of Eq. (3.1) from a discrete time individual based model, like the Wright-Fisher process, is outlined in Appendix B.1.
the expectation of the change in frequency, σ 2 (x) = lim N →∞ T xN + + T xN − N , the variance of the change in frequency.

Figure 1 :Conclusion 5
Figure 1: Individual based simulations of the logistic growth model.(a) For low population sizes, the individual based simulation (solid lines) fluctuates strongly around the deterministic evolution of the population (dashed lines) given by equation (3.20).(b) Increasing the stationary population size, the stochastic fluctuations around the deterministic prediction decrease.Further increasing the population size measure K would result in less and less variance, until eventually the individual based simulation is indistinguishable from the deterministic curve.The parameter values are chosen as follows: β = 2, δ = 1, γ = 1.The initial population sizes are given as stated in subfigure (b).

Figure 2 :
Figure 2: Wright-Fisher dynamics with selection and mutation.(a) The deterministic system given by Eq. (3.14) converges to the fixed point (dashed line) and remains there.(b) The stochastic process given by Eq. (3.16) fluctuates strongly in frequency space and spends most time close to the monotypic states x = 0 and x = 1.
by plugging in u = a − b − c + d and v = b − d , where a, b, c, d represent the payoffs of an evolutionary game.

Figure 5 :
Figure 5: Random walk and Brownian motion.(a) The random walk is defined on the discrete state space Z and changes at discrete times N. (b) The standard Brownian motion, starting in 0, takes values in R and is defined for positive times t in R ≥0 .We simulate it by the Euler-Maruyama method (also stochastic Euler-method) with d t = 0.001.