gesp: A computer program for modelling genetic effective population size, inbreeding and divergence in substructured populations

Abstract The genetically effective population size (N e) is of key importance for quantifying rates of inbreeding and genetic drift and is often used in conservation management to set targets for genetic viability. The concept was developed for single, isolated populations and the mathematical means for analysing the expected N e in complex, subdivided populations have previously not been available. We recently developed such analytical theory and central parts of that work have now been incorporated into a freely available software tool presented here. gesp (Genetic Effective population size, inbreeding and divergence in Substructured Populations) is R‐based and designed to model short‐ and long‐term patterns of genetic differentiation and effective population size of subdivided populations. The algorithms performed by gesp allow exact computation of global and local inbreeding and eigenvalue effective population size, predictions of genetic divergence among populations (G ST) as well as departures from random mating (F IS, F IT) while varying (i) subpopulation census and effective size, separately or including trend of the global population size, (ii) rate and direction of migration between all pairs of subpopulations, (iii) degree of relatedness and divergence among subpopulations, (iv) ploidy (haploid or diploid) and (v) degree of selfing. Here, we describe gesp and exemplify its use in conservation genetics modelling.

N e ) as compared to isolated homogeneous ones (Wang & Caballero, 1999;Waples, 2010). This is in spite of a long history of theoretical developments for N e of substructured populations that includes effects of symmetrical migration between subpopulations of the island model (Wright, 1951) and stepping stone migration models (Kimura, 1953;Weiss & Kimura, 1965), impact of strong migration (Nagylaki, 1980) and relationships between different types of N e (Whitlock & Barton, 1997).
The coefficient of gene differentiation G ST of Nei (1973) is another essential parameter that quantifies the proportion of genetic variation due to genetic differences between subpopulations. It extends the fixation index F ST (Wright, 1943(Wright, , 1951 to multiallelic and multilocus situations. In some applications, it is also important to quantify the degree of nonrandom mating in terms of departures from Hardy-Weinberg proportions, either within subpopulations (F IS ) or within the total population (F IT ).
We have recently developed a unified mathematical framework for haploid and diploid structured populations that can be used to compute expected inbreeding and kinship coefficients, effective population size, genetic divergence and departures from random mating in populations that consist of various numbers of more or less interconnected subpopulations whose size may vary over space and time H€ ossjer, Olsson, Laikre, & Ryman, 2014, 2015. This newly developed theory allows computation and modelling of parameters of complex metapopulations that has previously not been possible. It is applicable to selectively neutral inheritance at Y-chromosomes, mitochondrial DNA (haploid populations) and autosomes (diploid populations).
Here, we present a computer program GESP (Genetic Effective population size, inbreeding and divergence in Substructured Populations) that performs several of the analytical computations outlined in H€ ossjer et al. (2014,2015). GESP can be used to model exact local and global rates of inbreeding, haploid and diploid inbreeding as well as eigenvalue effective size, and population divergence in a substructured population. GESP focuses on geographic subdivision and ignores other types of structure such as overlapping generations.
Several software exist for addressing various issues relating to genetically effective population size, and they can be classified into the following three categories: computer programs that (i) simulate drift or other processes and produce data that can be used for estimating N e , (ii) estimate N e from empirical data and (iii) compute predictions of N e from demographic parameters by an exact algorithm.
Examples of category 1 software include simulation programs POPSIM (Hampe, Wienker, Schreiber, & N€ urnberg, 1998), VORTEX (Lacey, 2000) and EASYPOP (Balloux, 2001). TEMPOFS (Jorde & Ryman, 2007), LDNE (Waples & Do, 2008), ONESAMP (Tallmon, Koyuk, Luikart, & Beaumont, 2008), GONE (Coombs, Letcher, & Nislow, 2012) and NEESTIMA-TOR (Do et al., 2014) represent the second category of programs that estimate N e from empirical genotype or allele frequency data. Category 3 includes GESP and other programs, such as AGENE (Waples, Do, & Chopelet, 2011), that iteratively compute a forward prediction of N e . However, in contrast to GESP, AGENE focuses on a single, isolated population with age structure. GESP complements AGENE and other available software for N e -modelling by performing exact calculations for spatially substructured populations using theory that has not previously been available. No other program currently performs such computations. GESP can be used to model, for example metapopulation-N e and such modelling can use simulated, empirical or hypothetical estimates of local effective size, inbreeding levels and rates of migration as input. This can aid in avoiding incorrect conservation management recommendations for N e of substructured populations as have been reported (Holley et al., 2014). For instance, we have recently applied GESP to the case study of the Fennoscandian wolf metapopulation to address questions of how large local effective sizes and what rates of gene flow that are needed to reach conservation genetic goals (Laikre, Olsson, Jansson, H€ ossjer, & Ryman, 2016).
Below we describe GESP, its parameters, and exemplify how the program can be used to aid researchers of molecular ecology and conservation genetics to explore the impact that various future demographic scenarios may have on effective size, inbreeding and subpopulation differentiation.

| WHAT G E S P Does
In this section, we briefly introduce notation and describe the mathematics behind GESP. The metapopulation is assumed to have s subpopulations. The theory allows s to change over time (H€ ossjer et al., 2014, 2015), but in the current implementation of GESP, it is constant. On the other hand, the sizes of all subpopulations or the migration rates between them can differ and vary over time. We consider a selectively neutral and polymorphic locus and study how the genetic composition of the population at this locus is expected to evolve over discrete time steps t = 0, 1, . . ., t max , typically generations. Here, t = 0 represents the present, t > 0 the future and t max + 1 is the number of time points.

| Identity-by-descent parameters and their recursions
Let A be the set of all types of gene pairs, that is pairs of alleles.
The most crucial building block of GESP is a number of probabilities that a gene pair of type a 2 A is identical by descent IBD (or identical by state, IBS) if drawn randomly from the population at time t.
For a haploid population, where each individual has a single gene copy, there are d = s 2 different types a = ij that specify the ordered pair i and j of subpopulations to which the two genes belong. We then refer to f tij as an average inbreeding coefficient between subpopulations i and j at time t. For a diploid population, each individual carries two homologous genes. Whenever the two genes are drawn from the same subpopulation i, we must distinguish whether they belong to the same (a = i) or to different (a = ii) individuals. This gives a total of d = s 2 + s diploid gene pair types. The corresponding OLSSON ET AL.
| 1379 quantities in Equation (1) are referred to as average inbreeding coefficients (a = i), average kinship coefficients within subpopulations (a = ii) or average kinship coefficients between subpopulations (a = ij, i 6 ¼ j).
An important aspect of GESP is to use matrix analytic methods to describe time progression of the inbreeding and kinship coefficients, with genetic drift, migration and mutation as the three forces of genetic change. In the haploid as well as the diploid case, this is achieved by gathering all non-IBD probabilities at time t into a column vector h t = (h ta , a 2 A) 0 of length d. If the two genes are drawn without replacement, the vector of non-IBD probabilities obeys a linear recursion between time points t À 1 and t, where D t is a square matrix of order d, 1 is a column vector of d ones and l is the probability that a gamete mutates under an infinite alleles model (Kimura, 1971). A recursion similar to Equation (2) holds if the two genes are drawn with replacement.
For haploid and diploid models, the elements of D t are functions of the local census and effective sizes N ti and N eti of all subpopulations i at time t, as well as the migration rates m tji from one subpopulation j to another subpopulation i between time points t À 1 and t.
As migration is specified forward in time, from one time point to the next, we refer to m tji as a forward migration rate. Some additional parameters are needed for diploid models, because reproduction may occur either by selfing or crossing. This requires specifying the rate of selfing as well as whether mating occurs before or after migration.

| Subpopulation weights
When computing, for example the effective population size for the metapopulation as a whole, the contribution from the different subpopulations must be weighted, and this can be done in several ways regarding the weights of separate subpopulations. The predefined weighting schemes in GESP include that the subpopulation weights are either uniform (i.e. the same for all subpopulations), proportional to size, proportional to reproductive value (i.e., populations that contribute more to the system as a whole because of migration rates and patterns are given more weight; cf. Fisher, 1958;Felsenstein, 1971) or allocated to particular subpopulations so that other subpopulations are ignored by receiving zero weights. Which of these weights to use depends on the goals of the investigator. Size proportional weights treat all individuals of the metapopulation equally, reproductive weights corresponds to the long-term behaviour of the system, and local weights focus on one particular subpopulation.
Further, it is possible to define user-specified subpopulation weights as any non-negative numbers w 1 , . . ., w s that sum to one ( P s i¼1 w i ¼ 1). A weighting scheme is global if at least two w i are positive, whereas it is local if one subpopulation i receives full weight (w i = 1). It is convenient to interpret all w i as probabilities of sampling genes from the various subpopulations, because this naturally defines weights W a for all a in terms of probabilities of sampling gene pairs of type a. This can be done in different ways, and we distinguish between a number of different sampling schemes for gene pairs. In the diploid case, the three most important schemes are T, S and I. They differ as to whether the two genes are chosen independently from the total (T) population (weights W a = W Ta ), from the same randomly chosen subpopulation (S; weights W a = W Sa ) or from the same randomly chosen individual (I; weights W a = W Ia ). Given that subpopulation weights have been specified, the probability is

| Notions of effective population size in GESP
Some definitions of effective size incorporate mutations (Ewens, 1989;Maruyama & Kimura, 1980  GESP provides inbreeding effective size for the global metapopulation and for separate subpopulations over time intervals, either from the start t = 0 to another specified point t, or from one time point t to the next t + 1. The latter rate of inbreeding from one generation to the next corresponds to an instantaneous effective size. We have previously suggested the term realized effective size (N eR ) for an instantaneous effective size that is determined both by genetic drift and migration . If subpopulation i receives full weight (w i = 1), then N eRti = N eI ([t, t + 1]) is the realized effective size of i at time t. This is a local inbreeding effective size that equals N eti if i is isolated. But in general, the two quantities differ, as the local effective size N eti is an input parameter of the model that is only affected by genetic drift within i between time points t and t + 1, whereas N eRti is an output parameter that is also influenced by immigration into i from the other subpopulations between t and t + 1. More generally, N eI ([t, t + s]) quantifies the average combined impact of genetic drift and migration for time intervals of any length s, for those subpopulations that are part of the weighting scheme. For management applications, we argue that N eI is a more relevant concept than those notions of N e that only include genetic drift, as the effects of migration and drift are hard to separate (and estimate), in particular when the subpopulation structure is cryptic and partly unknown.
The eigenvalue effective population size N eE gives the long-term equilibrium rate at which inbreeding increases (s ? ∞). This requires some additional assumptions, such as time-invariant migration rates and subpopulation sizes, and that no group of subpopulations is isolated, to make D t = D in Equation (2)

| Fixation indices
In order to quantify subpopulation differentiation and departures from random mating, we use to predict the fixation indices G STt , F ISt and F ITt at time t. The quantities on the right-hand sides of Equation (

| PARAMETERS IN G E S P
In GESP, all input parameters are specified using the graphical user interface. Table 1 contains a summary of some of the most important quantities used by the program. The output of GESP is shown in the T A B L E 1 Population genetic parameters used by GESP. They all apply to a diploid model. Some quantities are slightly different for haploid models, see the reference manual (Olsson, 2017)   be exported to a csv-file. All input and output parameters are described in detail in the manual (Olsson, 2017).

| G E S P in Co nservation Genetic Modelling
One of the main purposes of GESP is to analyse how inbreeding dynamics and effective population sizes are affected by various migration scenarios, including populations with varying subpopulation sizes and local bottlenecks. Even though the number of subpopulations is kept fixed, it is still possible to put some local census sizes to zero and thereby incorporate subpopulation extinction and recolonization. Here, we describe an example population in which one of the subpopulations exhibits a local bottleneck, although not a complete extinction. The example is further described in the manual (Olsson, 2017), where the model is specified with a step-by-step instruction. See also Laikre et al. (2016) for a case study of the Fennoscandian wolves where the theory that has been incorporated in GESP is used for practical conservation genetic modelling, including suggestions of general conservation genetic targets for metapopulations (the publication is available for download at the GESP website).
Consider a diploid population with no selfing divided into five subpopulations with migration scheme and subpopulation sizes described in Figure 1. Let the initial inbreeding and kinship coeffi- F I G U R E 2 Inbreeding coefficients for subpopulations 1-3 of Figure 1. In the left subplot (a) the subpopulation sizes are constant, whereas in the right subplot (b), the size of subpopulation 2 has been reduced from 100 to 30 between generations 10 and 20 in order to model a local bottleneck see Figure 1 there is potential for further extensions of GESP, that is, to combine geographic structure with overlapping generations, as outlined in H€ ossjer et al. (2015). Further, we believe it is possible to extend the theory to X-chromosomes, by generalizing results in Nagylaki (1995) for isolated populations to those with geographic subdivision.

| DOWNLOAD AND USAGE
The program, together with its manual, can be downloaded from the website www.zoologi.su.se/research/GESP. The manual (Olsson, 2017) covers information about the installation process, a detailed overview of the interface and a number of examples.