## Introduction

Mathematical modelling in biology, which seeks to describe, explain, or predict phenomena that we see in the natural world, often addresses questions that exceed the reach of the analytical techniques available to us. For instance, even a question as seemingly simple as female preference for mating based on male age can lead to models which are analytically intractable (Beck *et al*. 2002). This is particularly troublesome when solving problems in evolutionary game theory, where the optimal strategy is a function not only of an individual's choice but also of the choices of the other players in the game. Even simple games such as the famous Prisoner's Dilemma hold complexities that have required non-traditional techniques to model (Axelrod 1984, 1997).

To address this gap, researchers have begun to follow the advice of Sumida *et al*. (1990), who called for the use of *genetic algorithms* in the study of evolution. First proposed by Holland (1975), genetic algorithms are part of a class of population-based metaheuristics (Blum & Roli 2003) known collectively as *evolutionary computation*. These algorithms evolve populations of candidate solutions to discrete-valued problems using the tools of natural selection: mutation, crossover, selection, and replacement. Evolutionary computation includes other techniques such as genetic programming (evolving computer programs using natural selection; Koza 1992), but this article focuses on genetic algorithms as they are currently the most popular in biology.

I present a simple review of genetic algorithm methodology here and refer the reader to the *Further Reading* below for more detail. The use of terms in the genetic algorithm literature can vary from author to author, but I follow Luke (2009); it is important to remember that though the terms are borrowed from genetics and evolution, this does not mean that genetic algorithms should be seen as a faithful replica of either genetics or evolution unless this is explicitly modelled (e.g. Sumida *et al*. 1990, and the section ‘Pitfalls'). The main loop of a genetic algorithm is shown in Fig. 1. A population of candidate solutions, usually referred to as *individuals*, is created. Each individual has a structure referred to as its *genotype* or *genome* (or, more generally, its *representation*), which encodes the solution to some optimisation problem. A trivial example is a string of binary numbers of arbitrary length which solves the problem of maximising the number of ‘1's in the string (i.e. maximum fitness is achieved when every position holds a 1). A genotype in the form of a vector of fixed length *l* is a *chromosome*, though the terms genotype and chromosome are often used interchangeably. Each position in the vector (i.e. each 1 or 0 in Fig. 1) is referred to as a *gene* or *locus* (I use locus hereafter). Binary and real-valued vectors are common, as are integer vectors (or vectors of discrete sets of elements, e.g. Hamblin & Hurd 2007).
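For concreteness, the binary representation described above can be sketched in a few lines of Python (the function names are my own, for illustration only, and do not come from any particular library):

```python
import random

def random_chromosome(l):
    """A chromosome: a fixed-length vector of l binary loci."""
    return [random.randint(0, 1) for _ in range(l)]

def random_population(n, l):
    """An initial population: n individuals, each of length l."""
    return [random_chromosome(l) for _ in range(n)]

pop = random_population(n=20, l=10)
```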

Once the initial population is created, each individual is assigned a fitness score depending on how well it solves the optimisation problem, such as how many ‘1's are in the string for our trivial example. After fitness has been assessed the genetic operators of selection, crossover, mutation, and replacement are applied to the population, usually but not necessarily in that order. The selection operator is a scheme for translating differential fitness into differential reproduction, either deterministically or stochastically. Some common examples are:
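Continuing the trivial max-ones example, the fitness function is simply the count of 1s in the chromosome; a minimal sketch:

```python
def fitness(chromosome):
    """Fitness for the 'max-ones' problem: the number of 1s in the string."""
    return sum(chromosome)

fitness([1, 0, 1, 1, 0])  # 3; maximum fitness equals the chromosome length
```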

- Truncation selection. The fittest *X*% of the population is allowed to reproduce. The individuals that actually reproduce after truncation are typically chosen uniformly from within the set of allowed individuals, or each individual reproduces an equal number of times (e.g. if *p* individuals are chosen, each reproduces 1/*p* times).
- Roulette-wheel selection (also known as fitness-proportional selection). Individuals are assigned a probability of reproducing based on their fitness values, with larger fitness values receiving higher probabilities, and individuals reproduce when a draw from this probability distribution selects them. Note that it is possible for the fittest individuals not to be chosen at all; a variation of roulette-wheel, Stochastic Universal Sampling, ensures that the fittest individuals are chosen at least once.
- Ranking selection. Where roulette-wheel selection assigns probability based on the raw fitness values, ranking selection first transforms the fitnesses into a rank ordering and assigns selection probabilities based on rank. Linear ranking and exponential ranking are the two most common forms of ranking selection.
- Tournament selection. This is a non-parametric selection method that selects *k* individuals from the population at a time and compares their fitnesses; the fittest of the set of *k* individuals reproduces, all *k* individuals are returned to the population, and the process begins again. A binary tournament, where *k* = 2, is a popular choice in the genetic algorithm literature.

The rate of selection controls the percentage of the population that is allowed to reproduce in each generation. For roulette-wheel, ranking, and tournament selection this is often 1, so that all individuals have some chance of reproducing, however small; smaller values are possible as well, so that only the top *X*% are eligible to reproduce. If selection is *elitist*, some percentage of the fittest individuals is guaranteed inclusion in the next generation.
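Two of the selection schemes listed above are easy to sketch in Python (an illustration only; the roulette wheel assumes fitness values that are non-negative and not all zero):

```python
import random

def tournament_select(population, fitness, k=2):
    """Tournament selection: draw k individuals and return the fittest.
    k = 2 is a binary tournament."""
    return max(random.sample(population, k), key=fitness)

def roulette_select(population, fitness):
    """Fitness-proportional (roulette-wheel) selection: individuals are
    drawn with probability proportional to their fitness."""
    weights = [fitness(ind) for ind in population]
    return random.choices(population, weights=weights, k=1)[0]
```

Note that neither function removes the selected individual from the population, matching the description above in which tournament contestants are returned to the pool after each draw.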

If crossover is used, it is performed on the selected parents in order to form new children by swapping parts of each parent's chromosome. Again, the language used in the genetic algorithm literature can be confusing; here, crossover does not necessarily imply chromosomal crossover as it might in biology, and though I use it here for consistency with previous work, *recombination* would be a better and more general term. Crossover operators include uniform (swapping loci between parents with some probability), one-point (a swap point is selected randomly and everything after that point is switched between the two chromosomes), and two-point (two swap points are chosen and everything between them is switched). The crossover in Fig. 1 is an example of two-point crossover, with the swap points in front of and after the shaded boxes. Mutation is then applied to the new children, either by selecting a random locus to change from each chromosome chosen for mutation (per chromosome) or by mutating each locus in every chromosome with some specified probability (per locus). Mutation of binary strings is bit-flipping, where 1s become 0s and vice versa, while mutation of integers or reals is usually done by adding or subtracting a small step drawn from a uniform or Gaussian distribution. Mutation of discrete sets is commonly done by selecting a new value from the set of possible alleles.
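The crossover and mutation operators just described might be sketched as follows for binary chromosomes (again, an illustration rather than a definitive implementation):

```python
import random

def one_point_crossover(p1, p2):
    """One-point crossover: pick a random cut point and swap the tails."""
    point = random.randint(1, len(p1) - 1)
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

def two_point_crossover(p1, p2):
    """Two-point crossover: swap the segment between two cut points."""
    a, b = sorted(random.sample(range(1, len(p1)), 2))
    return (p1[:a] + p2[a:b] + p1[b:],
            p2[:a] + p1[a:b] + p2[b:])

def mutate(chromosome, rate=0.01):
    """Per-locus bit-flip mutation: flip each bit with probability `rate`."""
    return [1 - bit if random.random() < rate else bit
            for bit in chromosome]
```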

Replacement is the merging of the new population of children with the parents. In *generational* genetic algorithms, all parents die at the end of each generation, and selection continues to generate children until the entire population has been replaced. If the genetic algorithm is *overlapping*, also called *steady-state*, some percentage (<100%) of the population is killed each generation and replaced with children bred during selection. The amount of the population killed each generation is controlled by the replacement rate. Following replacement, the generational loop is complete and the process begins again, using the population of offspring as the new parental population. When the population composition has become fixed on a single genotype, or on a mixture of genotypes that does not change over time, we may consider this to be the solution that the algorithm has *converged* to. Since solutions are rarely known beforehand, genetic algorithms can be run for a specified number of generations or until some other criterion (e.g. stability in fitness for some period of generations) is met.
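Putting the pieces together, one possible generational genetic algorithm for the max-ones problem, combining binary tournament selection, one-point crossover, per-locus mutation, and full replacement of the parents each generation, is sketched below. The parameter values are arbitrary choices for illustration, not recommendations:

```python
import random

def evolve(pop_size=30, length=20, generations=100,
           mutation_rate=0.02, tournament_k=2, seed=None):
    """A minimal generational GA for the 'max-ones' problem."""
    rng = random.Random(seed)
    fitness = sum  # fitness is the count of 1s in the chromosome
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(pop_size)]

    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            # binary tournament selection of two parents
            p1 = max(rng.sample(pop, tournament_k), key=fitness)
            p2 = max(rng.sample(pop, tournament_k), key=fitness)
            # one-point crossover
            cut = rng.randint(1, length - 1)
            child = p1[:cut] + p2[cut:]
            # per-locus bit-flip mutation
            child = [1 - b if rng.random() < mutation_rate else b
                     for b in child]
            children.append(child)
        pop = children  # generational replacement: all parents die

    return max(pop, key=fitness)

best = evolve(seed=1)
```

Running `evolve` for more generations, or with a larger population, trades computation time for a better chance of converging on the all-ones optimum.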

An important aspect of the functioning of a genetic algorithm (and any other metaheuristic) is the trade-off between *exploration* and *exploitation* (Luke 2009; Blum & Roli (2003) call them diversification and intensification). Exploration refers to (often random) movement around the search space, while exploitation refers to movement along a fitness gradient. Genetic algorithms combine exploration (mutation and crossover) with exploitation (selection/replacement) to test new solutions while increasing the representation of good solutions within the population. I will return to this trade-off when providing advice for model implementation in 'Advice for model implementation'.

*Further reading*: Luke (2009) is an accessible and practical introduction to using metaheuristics, especially genetic algorithms. Goldberg (1989) and Mitchell (1998) are classic and still-relevant texts on genetic algorithms, and are good starting points to learn more about their theoretical foundations, while Correia (2010) is a more recent review of evolutionary computation specific to biology. Ruxton & Beauchamp (2008) is a nice test of the utility of genetic algorithms on a model from behavioural ecology, with discussions of the consequences of algorithm implementation choices for the results; it makes a good complement to this review. Readers looking for more literature on genetic algorithms will find much overlap with another evolutionary computation technique called Evolution Strategies (Bäck 1996; Luke 2009), and much of what is written for them applies to genetic algorithms as well.