The PD embodies the problem of cooperation: although individuals can benefit from mutual cooperation, they can do even better by exploiting cooperation of others. Therefore, the PD provides an interesting basis for exploring mechanisms that can either prevent exploitation or make it unprofitable, thus enabling cooperation to persist.
Iterated interactions
In the Iterated Prisoner's Dilemma (IPD), a single game consists of a number of rounds of the simple PD, which allows individuals to react to an opponent's past behaviour. If players interact repeatedly before the final tally is made, low expected payoffs in future interactions because of retaliation against current defection could render cooperation beneficial. This is the basic idea of reciprocal altruism (Trivers 1971). Repeated interactions open up a whole new world of possible strategies determining whether to cooperate or defect in the next round based on the outcome of earlier rounds. Exploring this world has been the subject of intense scrutiny by researchers in various fields, including economics, political science, biology, computer science and artificial intelligence.
Perhaps the best-studied class of strategies in the IPD are strategies that base their behaviour in round n + 1 of an interaction on what happened in round n. The most famous example of this type of strategy is ‘Tit-for-Tat’ (TFT), which consists of cooperating in the first round of the iteration, and then doing whatever the opponent did in the previous round. In the seminal computer tournaments of Axelrod (1984), the simple TFT strategy emerged as the clear winner against a range of other strategies (including very sophisticated ones), and the success of TFT was attributed to the fact that it never defects first, retaliates when the opponent defects, but forgives when the opponent reverts to cooperation. These properties generate iterated interactions that consist either mostly of CC rounds or mostly of DD rounds, and hence can be interpreted as giving rise to positive assortment between cooperative behaviours (J. Fletcher, pers. comm.). In particular, in a population of TFT players, individuals end up always cooperating, hence the success of TFT corresponds to maintenance of cooperation. However, the precise meaning of success in the IPD is somewhat ambiguous. For example, Boyd & Lorberbaum (1987) showed that no deterministic strategy is evolutionarily stable in the IPD. Moreover, TFT performs poorly in a noisy world, in which players are prone to make erroneous moves that can cause long series of low paying retaliatory behaviour.
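The TFT rule and its sensitivity to noise can be illustrated with a minimal sketch. The payoff values T = 5, R = 3, P = 1, S = 0 are a conventional choice and are an assumption here, as are all function names and the way noise is modelled (each intended move is flipped independently with a fixed probability).

```python
import random

# Assumed payoff values with T > R > P > S; a conventional choice, not from the text.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): (R, R), ("C", "D"): (S, T),
          ("D", "C"): (T, S), ("D", "D"): (P, P)}

def tft(my_history, opp_history):
    """Tit-for-Tat: cooperate first, then copy the opponent's last move."""
    return "C" if not opp_history else opp_history[-1]

def all_d(my_history, opp_history):
    """Unconditional defector."""
    return "D"

def play_ipd(strategy_a, strategy_b, rounds, noise=0.0, rng=None):
    """Play an IPD; with probability `noise` an intended move is flipped."""
    rng = rng or random.Random(0)
    hist_a, hist_b, score_a, score_b = [], [], 0, 0
    for _ in range(rounds):
        move_a = strategy_a(hist_a, hist_b)
        move_b = strategy_b(hist_b, hist_a)
        # Erroneous moves: the executed move differs from the intended one.
        if rng.random() < noise:
            move_a = "D" if move_a == "C" else "C"
        if rng.random() < noise:
            move_b = "D" if move_b == "C" else "C"
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b, hist_a, hist_b
```

Without noise, `play_ipd(tft, tft, 10)` consists of CC rounds only, whereas TFT against `all_d` is exploited once and then locks into DD rounds. Setting `noise` to a small positive value lets one explore how a single erroneous defection triggers a series of retaliatory moves.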
The problem of noise can be addressed by considering probabilistic strategies. For strategies that condition the propensity to cooperate on the opponent's move in the previous round, evolutionary dynamics reveals that the probabilistic strategy Generous TFT (which retaliates only with probability 2/3) prevails in the long run, albeit only after its rise is catalysed by TFT (Nowak & Sigmund 1992). Extending the strategy space to include strategies that condition the probability to cooperate on the payoff received in the previous round, i.e. on the previous moves of the opponent as well as of the player itself, a new type of strategy, termed Pavlov, evolves (Nowak & Sigmund 1993). Pavlov implements a simple and intuitive behavioural rule: win-stay, lose-shift. It consists of repeating the previous move if that move resulted in the high PD payoffs T or R, and of switching to the opposite behaviour if the previous round resulted in the low PD payoffs P or S. Interestingly, Pavlov again relies on TFT as a catalyst of cooperation. To date, Pavlov appears to be the most consistently successful strategy in the IPD (Kraines & Kraines 2000).
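The behavioural rules described above can be sketched in a few lines. This is an illustrative sketch only: the opening move of Pavlov and the helper `after_single_error` are assumptions introduced here, and the retaliation probability of Generous TFT is simply passed in as a parameter.

```python
import random

def tft(my_last, opp_last):
    """Tit-for-Tat: copy the opponent's previous move (cooperate at the start)."""
    return "C" if opp_last is None else opp_last

def pavlov(my_last, opp_last):
    """Win-stay, lose-shift: repeat the last move after T or R, switch after P or S."""
    if my_last is None:
        return "C"  # assumed opening move; the text does not specify one
    # The high payoffs T and R occur exactly when the opponent cooperated.
    if opp_last == "C":
        return my_last
    return "D" if my_last == "C" else "C"

def generous_tft(my_last, opp_last, rng, retaliate_prob=2 / 3):
    """Generous TFT: cooperate after C; after D, retaliate only with some probability."""
    if opp_last is None or opp_last == "C":
        return "C"
    return "D" if rng.random() < retaliate_prob else "C"

def after_single_error(strategy, rounds):
    """Both players use `strategy`; player A has just defected by mistake."""
    a, b = "D", "C"
    trace = [(a, b)]
    for _ in range(rounds):
        a, b = strategy(a, b), strategy(b, a)
        trace.append((a, b))
    return trace
```

Calling `after_single_error(pavlov, 3)` shows two Pavlov players returning to mutual cooperation after one DD round, whereas `after_single_error(tft, 3)` shows two TFT players alternating between DC and CD indefinitely.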
Many other IPD strategies have been studied in the literature (see e.g. Sugden (1986); Brembs (1996) and Dugatkin (1997), especially Table 2.1 in the latter). In addition, there are many interesting variants and extensions of the IPD, which we can only mention briefly here. A biologically relevant alternative is obtained by considering the alternating IPD, in which players take turns in updating their behaviour (Frean 1994; Nowak & Sigmund 1994; Neill 2001). The alternating IPD tends to favour more forgiving strategies than the simultaneous IPD (Frean 1994; Nowak & Sigmund 1994), and it has been argued that the best strategies for the alternating game have a large memory, i.e. are strategies that are based on a number of previous moves (Frean 1994; Neill 2001). This seems to contrast with results for the synchronous IPD, in which increasing the memory size does not seem to significantly change the characteristics of successful strategies (Axelrod 1984; Lindgren 1991; Hauert & Schuster 1997). Other results show that cooperation is favoured if engaging in IPD interactions is optional (Batali & Kitcher 1995; see also Box 3), if there are extrinsic factors that maintain variation in behaviour (McNamara et al. 2004), or if more sophisticated strategies are considered. For example, successful strategies can exhibit an internal state that implements the idea of good and bad standing and enables strategies to deal with certain types of errors (Boerlijst et al. 1997). Internal states can also serve to implement basic forms of information processing that can lead to superior performance (Hauert & Stenull 2002).
By studying stochastic game dynamics in finite populations, Nowak et al. (2004) have recently argued that cooperation in the IPD may be enhanced by small population sizes. In another recent development, a new round of IPD tournaments was organized (see http://www.prisoners-dilemma.com) to commemorate the 20th anniversary of Axelrod's seminal work on the IPD (Axelrod 1984). This time so-called ‘colluding strategies’ emerged as the winning type. These strategies cooperate with their own type and play TFT against everyone else. In order to discriminate between self and non-self, colluding strategies exchange a secret handshake in the form of a sequence of identification moves at the beginning of each IPD encounter. However, it is not clear how such identification mechanisms would evolve in the first place, and how colluding strategies can increase in frequency when they are rare (i.e. when they do not meet their own type), but the concept of collusion may provide interesting new perspectives for use of the IPD in behavioural ecology and psychology.
Spatial PD games
Investigating the effects of spatial structure on population biological processes has been a major theme in theoretical ecology and evolution in the past two decades. In particular, it has been realized that spatial structure may be a potent promoter of cooperation. Axelrod (1984) already pointed out the potential role of spatial structure, but it was really the seminal paper by Nowak & May (1992) that spawned a large number of investigations of ‘games on grids’ (Nowak & Sigmund 2000), i.e. evolutionary games that are played in populations whose individuals occupy sites on a spatial lattice. Payoffs obtained from local interactions with neighbouring individuals are then used to update the lattice, i.e. to create subsequent generations in the evolutionary process. The propagation of successful strategies to neighbouring sites may be interpreted either in terms of reproduction, or in terms of imitation and learning (Nowak & Sigmund 2004). There are a number of different ways in which such updating procedures can be implemented with respect to individual sites (e.g. deterministic or probabilistic) and to the entire lattice (e.g. synchronous or asynchronous). Nevertheless, an unambiguous conclusion that has been reached from studies of the spatial PD is that spatial structure promotes cooperation (Nowak & May 1992, 1993; Hubermann & Glance 1993; Nowak et al. 1994a; Killingback et al. 1999). Cooperators can survive by forming clusters within which they reap the benefits from mutual cooperation and which allow them to persist despite exploitation by defectors along the cluster boundaries (Fig. 1). Thus, maintenance of cooperation in the spatially structured PD is a robust phenomenon, even though the dynamics of the spatial games can be very complicated, and even though the exact range of PD payoff parameters b and c (cf. Fig. 1a) for which cooperation can persist does depend on the update rules (Hubermann & Glance 1993; Nowak et al. 1994a; Nowak & Sigmund 2000).
For an interactive on-line tutorial exploring these issues we refer to Hauert (2005).
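One of the many possible update procedures can be sketched as follows. The choices made here are assumptions for illustration: a square lattice with periodic boundaries, four nearest neighbours, synchronous updating, and a deterministic rule in which every site adopts the strategy of the best-scoring site in its neighbourhood. Other rules mentioned above (probabilistic, asynchronous) would require different code.

```python
def neighbours(i, j, size):
    """Four nearest neighbours on a square lattice with periodic boundaries."""
    return [((i - 1) % size, j), ((i + 1) % size, j),
            (i, (j - 1) % size), (i, (j + 1) % size)]

def payoffs(grid, b, c):
    """Total payoff of each site from pairwise PDs with its four neighbours.
    grid[i][j] is 1 for a cooperator and 0 for a defector."""
    size = len(grid)
    pay = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            for (k, l) in neighbours(i, j, size):
                # Receive b from a cooperating neighbour, pay c if cooperating.
                pay[i][j] += b * grid[k][l] - c * grid[i][j]
    return pay

def step(grid, b, c):
    """Synchronous update: each site adopts the strategy of the best-scoring
    site among itself and its neighbours. A neighbour replaces the current
    best only if its payoff is strictly higher, starting from the focal site."""
    size = len(grid)
    pay = payoffs(grid, b, c)
    new = [row[:] for row in grid]
    for i in range(size):
        for j in range(size):
            best, best_pay = grid[i][j], pay[i][j]
            for (k, l) in neighbours(i, j, size):
                if pay[k][l] > best_pay:
                    best, best_pay = grid[k][l], pay[k][l]
            new[i][j] = best
    return new
```

Iterating `step` from a random initial grid for different values of b and c gives a rough impression of how clusters of cooperators form and of how the outcome depends on the payoff parameters.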
Because spatial clustering implies that cooperators interact more often with their own type than expected by chance based on mean population frequencies, it is possible to interpret the effects of spatial structure on the evolution of cooperation in the context of the theory of kin selection (Hamilton 1963). Box 2 discusses the connections between the spatial PD and kin selection in more detail.
The conclusion that spatial structure is beneficial for cooperation has also been reached for spatial versions of the IPD. For example, Lindgren & Nordahl (1994) showed that compared with non-spatial games, the unconditional cooperator AllC does much better in spatial IPDs and often outperforms TFT. Among deterministic strategies that condition their moves on the previous round, Pavlov is the most successful strategy in spatial IPDs (Lindgren & Nordahl 1994), just as in the non-spatial IPD. However, if probabilistic reactive strategies are considered, spatial structure favours more forgiving versions of the strategies that are successful in unstructured games (Grim 1995; Brauchli et al. 1999).
Perhaps the most promising approach for understanding the dynamics of lattice models analytically involves the technique of pair approximation (Matsuda et al. 1992; Ellner 2001). This deterministic approximation yields a set of differential equations describing the dynamics of spatial games based on pair correlations between nearest neighbours, while neglecting higher order terms. Pair approximation has led to fairly good agreement with results from numerical simulations in a number of different models (Dieckmann et al. 2000). Examples are shown in Fig. 1 for the spatial PD and in Fig. 2 for the spatial SD.
Other analytical results have been obtained for spatial models under simplifying assumptions, e.g. for one-dimensional lattices (Eshel et al. 1999). Finally, analytical results have also been obtained by using reaction-diffusion models based on partial differential equations (Hutson & Vickers 1995; Ferrière & Michod 1996) to describe spatially structured populations. The results confirm the overall conclusion that spatial structure is beneficial for cooperation in the PD.
So far, this conclusion has been reached mainly for models in which spatial structure was incorporated by using regular square lattices, in which interactions and reproduction/imitation were limited to either the four or the eight nearest neighbours. Some recent results indicate that the lattice topology does affect the dynamics of cooperation and that, interestingly, relaxing the rigid purely local neighbourhood structure of lattices seems to benefit cooperation (Abramson & Kuperman 2001; Masuda & Aihara 2003; Ifti et al. 2004; Hauert & Szabó 2005). For example, in PD games on random regular graphs (in which all individuals have the same fixed number of neighbours, but neighbours are drawn randomly from the population), the parameter range over which cooperators persist is larger than for regular lattices (Hauert & Szabó 2005). This is surprising because the formation of compact clusters is more difficult on random regular graphs. Also, Koella (2000) has shown that cooperation can persist in spatial PD interactions even if dispersal and interaction distances are allowed to evolve, leading to long-range dispersal and interactions in defectors, but not in cooperators. On the other hand, cooperation can be impeded if for any given individual there is a substantial difference between its interaction and its reproduction neighbourhood (Ifti et al. 2004). For future research it will be an interesting topic to address these questions in greater detail, and in particular to study the evolution of lattice topologies and neighbourhood sizes.
Box 2: Kin-selection and population structure
The theory of kin selection (Hamilton 1964a,b) is often invoked to explain the origin of cooperation and the resolution of conflicts. The basic idea is that if a ‘helper gene’ causes its carrier to provide a benefit b to others at a cost c to itself, then the frequency of the helper gene only increases if the benefits fall sufficiently often to other carriers of the gene, e.g. because of relatedness between actor and recipient. Specifically, if r is the degree to which benefits accrue to other altruists compared with average population members, then Hamilton's rule specifies that the helper gene will increase from low frequencies if its inclusive fitness r b − c is greater than zero.
Kin selection is rarely considered in models of reciprocal altruism (for exceptions see e.g. Marshall & Rowe 2003), but it is possible to establish a connection between kin selection and the dynamics of cooperation in the spatial PD. It is generally thought that kin selection should operate in ‘viscous’ populations (Hamilton 1964a), in which limited dispersal promotes interactions among relatives. In the lattice models discussed here, population viscosity is obtained by assuming that individuals only interact with and disperse to neighbouring sites. The following simple argument illustrates that kin selection can benefit cooperation under these conditions. Imagine a homogeneous lattice population consisting of defectors into which cooperators try to invade. An analytical argument based on the technique of pair approximation (van Baalen & Rand 1998; Le Gaillard et al. 2003, see also main text) shows that as long as cooperators are rare, every cooperator has on average approximately one other cooperator in its neighbourhood. Therefore, from playing a PD against each of its n neighbours the cooperator gets a total benefit of b and pays a total cost of n c. On the other hand, defectors get nothing, having on average only defectors as neighbours because cooperators are rare. As a result, cooperators can invade if b − n c > 0, or equivalently, if r b − c > 0, where r = 1/n is the average degree of relatedness of a cooperator to its neighbours. This could be considered as Hamilton's rule for the spatial PD, and inspection of Fig. 1a shows that the rule is quite accurate: for n = 4 cooperators should be able to invade if b > 4c, i.e. if r = c/(b − c) < 1/3 (since b > 4c is equivalent to b − c > 3c; note that r here denotes this cost-to-benefit ratio rather than relatedness), which is roughly confirmed by the numerical simulations.
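The step from the payoff comparison to Hamilton's rule can be written out explicitly. The symbols W_C and W_D for the total payoffs of a rare cooperator and of a defector are introduced here for illustration only and do not appear in the original argument.

```latex
\begin{align*}
W_C &= b - n c, \qquad W_D = 0,\\
W_C > W_D \;&\Longleftrightarrow\; b - n c > 0
\;\Longleftrightarrow\; \tfrac{1}{n}\, b - c > 0
\;\Longleftrightarrow\; r b - c > 0, \qquad r = \tfrac{1}{n}.
\end{align*}
```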
It is worth pointing out that although spatial structure clearly favours cooperation in the PD (without spatial structure, cooperators would never thrive), the region of parameter space in which cooperators can persist is rather small. In terms of the spatial Hamilton's rule above, this is because the average relatedness of an invading cooperator to its neighbours is rather small. Thus, even though population viscosity is supposedly very high in lattice models with nearest–neighbour interactions, cooperators tend to have few cooperating neighbours during an invasion attempt. This can in turn be attributed to the fact that cooperators not only help each other, but also compete for lattice sites, thus limiting each other's proliferation.
In fact, Wilson et al. (1992) and Taylor (1992), and more recently West et al. (2002), have pointed out that population viscosity not only increases relatedness among cooperatively interacting individuals, but also increases competition for resources among relatives. West et al. (2002) show how these opposing effects can be incorporated into a modified version of Hamilton's rule that takes into account the relatedness of a cooperator to individuals who suffer increased competition from recipients of the cooperative act. The earlier results of Wilson et al. (1992) and Taylor (1992) indicated that the conditions for cooperation to thrive are exactly the same in well-mixed and in spatially structured populations, and hence that spatial structure may actually have no effect on the evolution of cooperation. However, these results may be too pessimistic, as spatial structure can favour cooperation not only in the spatial PD, but also in the corresponding lattice models for the Public Goods game (Mitteldorf & Wilson 2000; Hauert et al. 2002b; Szabó & Hauert 2002; see Box 3 for an explanation of the Public Goods game). In fact, the effect that competition between relatives counteracts kin selection is likely to be most pronounced in such lattice models, in which game interactions and competition occur among nearest neighbours. Moreover, Le Gaillard et al. (2003) have argued that through the minor change of allowing for empty lattice sites, the effect of competition between relatives becomes much weaker. In situations in which reproduction is local, but competition is global, e.g. because of high dispersal, a scenario that Wilson et al. (1992) called ‘alternating viscosity’, competition between relatives will not be effective in impeding the evolution of cooperation through kin selection. West et al. (2001) described an empirical example in fig wasps where intense local competition can indeed prevent cooperation despite potentially strong kin selection, and they supported this idea with recent microbial experiments (Griffin et al. 2004). In general, the extent to which local competition can counteract the beneficial effects of population viscosity in natural systems will critically depend on the particular form of population structure, and on the stages in the life cycle that are affected by cooperative acts (Wilson et al. 1992; van Baalen & Rand 1998; Le Gaillard et al. 2003). These questions deserve further theoretical as well as empirical investigations.
Continuous PD games
In the classical PD, cooperation is all or nothing, since this game has only two strategies. However, it is natural to assume that in real systems, cooperation can vary continuously. This idea has been present in other models of cooperation (e.g. Frank 1998), but continuous cooperative investments have only rather recently been incorporated into the PD (Mar & St Denis 1994; Killingback et al. 1999). In fact, it is straightforward to define a continuous version of the PD by assuming that cooperative strategies are defined by a real number x that lies in some interval [0, xmax], where xmax is the maximal possible investment. One then assumes that the benefit that an individual with trait value x provides to the opposing player is given by a benefit function B(x), whereas the cost that strategy x incurs to its carrier is given by a cost function C(x). Thus, if two individuals with trait values x and y play the continuous PD, player x gets the benefits from the cooperative investments y and incurs the costs from its own investment x, hence the payoff to player x is B(y) − C(x). Similarly, the payoff to y is B(x) − C(y). Typically, one assumes that the functions B(x) and C(x) are monotonically increasing and satisfy B(0) = C(0) = 0, as well as B(x) > C(x) at least for small x (otherwise mutual cooperation would be bad). For example, these functions could be linear: B(x) = bx and C(x) = cx, with b > c > 0. In such continuous games one would like to know the evolutionary dynamics of the cooperative trait x. In the section on the continuous Snowdrift game, we briefly describe how the theory of adaptive dynamics can be used as a general approach to investigate continuous games. For the continuous PD it is easy to see, and intuitively clear, that the trait x always evolves to 0, essentially because the cooperative trait only affects costs, but not the benefits of its carrier. 
Thus, defection prevails in the continuous PD, and one once again turns to investigating supporting mechanisms that can cause the trait x to evolve to non-zero levels.
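The reason why investments decline can be made concrete with the linear example B(x) = bx and C(x) = cx. The following is a minimal sketch; the default values b = 2 and c = 1 and the function names are assumptions chosen for illustration.

```python
def payoff(x, y, b=2.0, c=1.0):
    """Payoff B(y) - C(x) to a player investing x against an opponent
    investing y, with the linear example B(x) = b*x, C(x) = c*x, b > c > 0."""
    return b * y - c * x

def selection_gradient(x, b=2.0, c=1.0, eps=1e-6):
    """Numerical derivative of a rare mutant's payoff with respect to its own
    investment, evaluated in a resident population investing x."""
    return (payoff(x + eps, x, b, c) - payoff(x - eps, x, b, c)) / (2 * eps)
```

For any resident investment the gradient equals −c in this linear case, because a player's own investment enters its payoff only through the cost term. Mutual investment is nevertheless better than mutual defection, since `payoff(x, x)` equals (b − c)x.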
The extensions considered to date are iteration and spatial structure. In the continuous IPD, players make continuous cooperative investments over a number of rounds. For example, the investment in round n + 1 can be based on the opponent's investment in round n: x_{n+1} = f(y_n). Wahl & Nowak (1999a,b) have investigated the case where the function f is linear: x_{n+1} = k y_n + d. Cooperative strategies are characterized by high values of k and d, because when such strategies play against themselves, iteration quickly leads to large cooperative investments, and hence to large payoffs [in each round, payoffs are calculated as for the continuous PD, i.e. based on the benefit and cost functions B(x) and C(x)]. The analysis of Wahl & Nowak (1999a,b) is rather complex, but the general picture that emerges is nicely summarized in Figure 7 of their second paper (Wahl & Nowak 1999b): more cooperative strategies can gradually evolve, but once cooperation has reached a certain level, it becomes vulnerable to invasion by defecting strategies. This results in everlasting cycles between cooperation and defection. In particular, cooperation cannot be stably maintained in this type of model.
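A single iterated interaction between two linear reactive strategies can be sketched as follows. Several ingredients are assumptions made for this illustration and are not taken from the cited analysis: the first-round investment x0 is treated as part of the strategy, investments are truncated to the interval [0, x_max], and the linear benefit and cost functions B(x) = bx and C(x) = cx are used.

```python
def clamp(x, x_max):
    """Keep an investment inside the assumed interval [0, x_max]."""
    return max(0.0, min(x_max, x))

def continuous_ipd(strat_a, strat_b, rounds, b=2.0, c=1.0, x_max=1.0):
    """Each strategy is a tuple (k, d, x0): invest x0 in the first round,
    then k * (opponent's previous investment) + d in every later round.
    Returns the total payoffs of both players."""
    (ka, da, xa), (kb, db, xb) = strat_a, strat_b
    total_a = total_b = 0.0
    for _ in range(rounds):
        # Payoffs as in the continuous PD: B(opponent's x) - C(own x).
        total_a += b * xb - c * xa
        total_b += b * xa - c * xb
        # Both players react simultaneously to the other's last investment.
        xa, xb = clamp(ka * xb + da, x_max), clamp(kb * xa + db, x_max)
    return total_a, total_b
```

With the hypothetical strategies `coop = (1.0, 0.1, 0.1)` and `defector = (0.0, 0.0, 0.0)`, self-play of `coop` ramps investments up to x_max, while `continuous_ipd(coop, defector, 20)` shows the defector collecting the benefit of the constant term d in every round.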
In a similar vein, Roberts & Sherratt (1998) have devised a class of ‘raise-the-stakes’ strategies for iterated interactions that consist of increasing cooperative investments in response to an opponent's cooperation in the previous round. They have argued that these strategies do well against a number of traditional strategies in the IPD, such as TFT. However, in a continuous strategy space, evolutionary dynamics would gradually decrease cooperative investments in raise-the-stakes strategies (Killingback & Doebeli 1999). Thus, cooperation seems to be generally difficult to maintain in the continuous IPD if future investments are solely based on the current investment of the opposing player.
Things turn out to be different if investments in round n + 1 are based not just on the investment of the opponent, but on the net payoff received in the previous round. Here, x_{n+1} = f(p_n), where p_n = B(y_n) − C(x_n) is the payoff that an individual playing x_n received when playing against y_n. Killingback & Doebeli (2002) have analysed the case where the function f is linear, so that x_{n+1} = k p_n + d. Cooperative strategies are again characterized by high values of k and d, but it should be noted that determining the dynamics of the investment levels during a single iteration is already a non-trivial problem. Nevertheless, Killingback & Doebeli (2002) have shown that cooperative strategies evolve if the benefits B(x) increase fast enough for small investments [i.e. whenever the slope B′(0) is sufficiently large]. Thus, when continuous investments are based on previous payoffs, cooperation can evolve and persist in the continuous IPD. This echoes the findings from the classical IPD, where Pavlov-like strategies are generally more successful than TFT-like strategies that base their behaviour on the opponent's previous move, rather than on previous payoffs.
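The payoff-based rule can be sketched in the same spirit. As assumptions for this illustration only, the first-round investment x0 is again part of the strategy, investments are truncated to [0, x_max], and linear benefit and cost functions are used in place of whatever functional forms the cited analysis employs.

```python
def payoff_based_ipd(strat_a, strat_b, rounds, b=2.0, c=1.0, x_max=1.0):
    """Each strategy is a tuple (k, d, x0): invest x0 first, then
    k * (own payoff in the previous round) + d.
    Returns a list of (x_a, x_b, p_a, p_b) for every round."""
    (ka, da, xa), (kb, db, xb) = strat_a, strat_b
    history = []
    for _ in range(rounds):
        # Net payoff of the round: benefit from the opponent minus own cost.
        pa = b * xb - c * xa
        pb = b * xa - c * xb
        history.append((xa, xb, pa, pb))
        # Next investment depends on the player's own payoff, not on y alone.
        xa = max(0.0, min(x_max, ka * pa + da))
        xb = max(0.0, min(x_max, kb * pb + db))
    return history
```

In this sketch a strategy such as `(1.0, 0.1, 0.1)` raises its investment step by step when paired with itself, but cuts its investment back whenever it has been exploited, because exploitation lowers the payoff on which the next investment is based.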
Interestingly, cooperation does not evolve if the continuous IPD is used as a model for mutualism between two different species. In this case, payoffs are obtained from continuous IPD interactions between members of different species, but competition for reproduction based on these payoffs occurs within species (Doebeli & Knowlton 1998). Scheuring (2005) has recently shown analytically that in this setting, cooperative strategies do not evolve in unstructured populations. However, Doebeli & Knowlton (1998) have shown that the evolution and maintenance of cooperation, and hence mutualism, are possible if the two interacting populations are spatially structured. Moreover, spatial structure can promote cooperation even in the continuous PD without iteration (Killingback et al. 1999), and can lead to coexistence of two distinct phenotypic clusters of high and low investors (Koella 2000). For lattice games with variable population sizes, the results of van Baalen & Rand (1998) and particularly those of Le Gaillard et al. (2003) imply that evolution of cooperation in the continuous PD should be the default expectation in spatially structured populations. Overall, it appears that the same mechanisms that support cooperation in the classical PD can promote cooperation in the continuous PD.
Other extensions of the PD
Iteration, spatial structure, and continuous investments, as described in the preceding paragraphs, are but three general types of extensions of the basic PD. Another important generalization consists of extending the PD to interactions among more than two players. The resulting N-player games are called Public Goods games and have a long tradition in the economics literature (Kagel & Roth 1995). Box 3 explains some basic aspects of Public Goods games and highlights interesting consequences of optional participation in such games. A different line of extensions of the PD is based on the idea that individuals may carry a reputation, and that players can condition their behaviour on the opponents’ reputation. This leads to the notion of indirect reciprocity (Alexander 1987; Nowak & Sigmund 1998; Panchanathan & Boyd 2004), which is the basis for the mechanisms of reward and punishment favouring cooperation in PD interactions. These concepts are explained in Box 4.
A related idea consists of considering tag-based games, in which cooperative interactions occur between individuals that are similar with respect to some neutral characteristic such as colour (Riolo et al. 2001; Hochberg et al. 2003; Axelrod et al. 2004). Tag-based cooperation appears to be prone to exploitation by unconditional cheaters (Roberts & Sherratt 2002), but further investigations of this interesting idea are called for. Overall, judging from the number of recent publications in high profile journals dedicated to the study of cooperation based on PD interactions, it is clear that this is a thriving line of research that attracts a lot of interest from a diverse array of scientists.
Box 3: Public Goods games and volunteering
The generalization of PD type interactions to groups of arbitrary size N is known as Public Goods games (Kagel & Roth 1995). In a typical Public Goods experiment a group of, e.g. six players gets an endowment of $10 each. Every player then has the option to invest part or all of their money into a common pool knowing that the experimenter is going to triple the amount in the pool and divide it equally among all players regardless of their contribution. If everybody invests their money, each player ends up with $30. However, each invested dollar only yields a return of 50 cents to the investor. Therefore, if everybody plays rationally, no one will invest, and hence the group of players will forego the benefits of the public good. In formal terms and assuming that players either defect or fully cooperate, the payoff for defectors becomes P_d = α n_c γ/N, while the payoff for cooperators is P_c = P_d − γ, where α is the multiplication factor of the common pool, n_c the number of cooperators in the group, and γ is the cost of the cooperative investment. As in the PD, defection dominates and cooperators are doomed. In fact, a Public Goods game in a group of size N is equivalent to (N − 1) pairwise PDs under the transformation b = α γ/N, c = (N − α) γ/[N(N − 1)] (Hauert & Szabó 2003). Under this equivalence, larger Public Goods groups correspond to larger numbers of single PD interactions. This implies that defectors can exploit cooperators more efficiently in larger groups, and hence that cooperation becomes increasingly difficult to achieve, which remains true even if interactions are iterated (Boyd & Richerson 1988; Hauert & Schuster 1998; Matsushima & Ikegami 1998). Interestingly, in experimental Public Goods games human subjects do not follow rational reasoning and often exhibit cooperative behaviour, thereby not only faring much better, but also undermining basic rationality assumptions in economics (Fehr & Gächter 2002).
From a theoretical viewpoint, the reasons for this outcome are not fully understood but likely involve issues related to reward, punishment and reputation (Milinski et al. 2002), some of whose basic features are explained in Box 4.
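The payoff formulas and the stated equivalence with pairwise PDs can be checked numerically. The sketch below uses the parameters of the six-player example (α = 3, γ = 10); the function names are hypothetical.

```python
def public_goods_payoffs(n_c, N, alpha, gamma):
    """Return (P_d, P_c) for a group of size N containing n_c cooperators."""
    p_d = alpha * n_c * gamma / N
    return p_d, p_d - gamma

def pairwise_pd_payoffs(n_c, N, alpha, gamma):
    """The same payoffs obtained from (N - 1) pairwise PDs with the
    transformed parameters b and c. As above, n_c counts all cooperators
    in the group, including the focal player if it cooperates."""
    b = alpha * gamma / N
    c = (N - alpha) * gamma / (N * (N - 1))
    p_d = n_c * b                        # a defector meets n_c cooperators
    p_c = (n_c - 1) * b - (N - 1) * c    # a cooperator pays c in every pairing
    return p_d, p_c
```

For example, `public_goods_payoffs(6, 6, 3, 10)` gives a net gain of 20 for each cooperator when everybody invests, which together with the endowment of 10 yields the $30 mentioned above; the marginal return per invested dollar is α/N = 0.5.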
Another approach to overcome the Public Goods dilemma is to allow for voluntary participation, which can be modelled by considering a third strategic type, called the loners (Hauert et al. 2002b). Loners are risk averse and instead of engaging in the Public Goods game rely on a small but fixed income σ [(α − 1)γ > σ > 0, where (α − 1)γ is the payoff for mutual cooperation and 0 the payoff for mutual defection]. This results in a rock-paper-scissors type dominance hierarchy of the three strategies: if everybody cooperates it pays to switch to defection, if defection dominates it is better to abstain and choose the loners option, and if loners abound, cooperation becomes attractive again, because it is likely that the effective group size in the Public Goods interaction is small and produces high returns. As a result, cooperators and defectors co-exist with oscillating frequencies. Thus, voluntary participation provides an escape hatch out of states of mutual defection and economic stalemate. Interestingly, the average payoff of all three strategic types, and hence the average population payoff, converges to the loner's payoff σ (Hauert et al. 2002a), which is better than a population payoff of 0 that would evolve in the absence of loners. The above dynamics of voluntary Public Goods interactions has recently been observed in experiments with humans (Semmann et al. 2003). We also note that in spatial voluntary Public Goods games, in which individuals interact only with a limited local neighbourhood (see section Spatial PD games), the average population payoff is usually greater than σ, i.e. the population draws a net profit from voluntary Public Goods interactions.
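The cyclic dominance can be illustrated by computing payoffs within a single group. This is a sketch under stated assumptions: loners simply receive σ and do not count towards the group size, and a single remaining participant is assumed to be unable to play and to receive σ as well (this last rule is an assumption of the sketch, not a statement from the text). Function names are hypothetical.

```python
def group_payoffs(n_c, n_d, alpha, gamma, sigma):
    """Payoffs (P_c, P_d, P_l) in a group with n_c cooperators and n_d
    defectors taking part; loners always obtain sigma. P_c and P_d are the
    payoffs a cooperator or a defector receives in a group with n_c
    contributing cooperators."""
    s = n_c + n_d                     # effective group size
    if s < 2:
        return sigma, sigma, sigma    # assumed: no game with one participant
    p_d = alpha * n_c * gamma / s
    return p_d - gamma, p_d, sigma

def gain_from_cooperating(s, alpha, gamma):
    """Change in a participant's own payoff when it switches from defection
    to cooperation in a group of s participants."""
    return alpha * gamma / s - gamma
```

With α = 3, γ = 1 and σ = 1, a defector among four cooperators does better than the cooperators, a group of defectors earns less than σ, and `gain_from_cooperating` is positive only when the effective group size s is smaller than α, which is the situation that arises when loners abound.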
(A)
[ Replicator dynamics of the voluntary Public Goods game. The three homogeneous states of the population e_loners, e_cooperators and e_defectors are unstable, reflecting the rock-scissors-paper type dominance hierarchy between cooperators, defectors and loners. There is an interior neutrally stable fixed point Q that is surrounded by neutrally stable closed orbits. True stability of Q or interior limit cycles can be obtained through various extensions of the model, e.g. by introducing spatial structure. Parameters: n = 5; α = 3; c = 1; σ = 1. ]
Box 4: Cooperation through reputation
Direct reciprocity can establish cooperation in repeated interactions following the simple rule ‘I help you and you help me’. However, in higher organisms, and humans in particular, cooperation may also be established through indirect reciprocity: ‘I help you and someone else helps me’ (Alexander 1987). The basic idea is that an individual can improve its reputation, or image score, by helping fellows in need. It thereby produces a costly signal, which in turn will be assessed by other members of the population and may trigger assistance in case the individual itself is in need. Indirect reciprocity requires some consensus about how behaviour affects reputation. How such a consensus is reached is an interesting question in itself that deals with the establishment of social norms (Henrich et al. 2001). If higher image scores increase the chance of receiving help in the future, then discriminating strategies that condition their help on an acceptable image score of the recipient can promote cooperation (Nowak & Sigmund 1998). However, such scoring strategies have one weakness: whenever they refuse to help a cheater with a low score, their own score drops and reduces the chances of future help. To avoid this, the concept of standing was introduced, whereby the individual remains in good standing if it refuses to help an ‘unworthy’ recipient (Leimar & Hammerstein 2001). This concept could be taken even further by demanding that an individual attains bad standing for helping an unworthy cheater. Investigating these questions is an active field (Brandt & Sigmund 2004; Ohtsuki & Iwasa 2004) and includes interesting experimental studies indicating that humans tend to favour the simpler scoring strategies (Wedekind & Milinski 2000; Milinski et al. 2002).
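The difference between scoring and standing can be sketched with two small update rules. The details are assumptions made for illustration: image scores change in steps of one, a discriminator helps recipients whose score is at least a threshold of zero, and helping always (re)establishes good standing.

```python
def update_image_score(donor_score, helped):
    """Image scoring: helping raises the donor's score, refusing lowers it.
    The step size of 1 is an assumption of this sketch."""
    return donor_score + 1 if helped else donor_score - 1

def discriminator_helps(recipient_score, threshold=0):
    """A discriminating donor helps only recipients with an acceptable score."""
    return recipient_score >= threshold

def update_standing(donor_good, recipient_good, helped):
    """Standing: a refusal costs good standing only if the recipient was
    itself in good standing."""
    if helped:
        return True          # assumed: helping (re)establishes good standing
    if recipient_good:
        return False         # unjustified refusal
    return donor_good        # justified refusal leaves standing unchanged
```

Under scoring, a discriminator with score 0 that refuses a cheater with score −1 drops to −1 and would itself be refused by another discriminator, which is the weakness noted above; under standing, the same refusal leaves the donor in good standing.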
The concept of reputation also lends itself to studying the role of punishment and reward for cooperation. Punishment is common in nature (Clutton-Brock & Parker 1995), ranging from simple forms of spiteful toxin production in bacteria (Kerr et al. 2002) to institutionalized civil and criminal law in humans. The success of cooperators hinges on the ability to condition cooperation on information about the opponent's reputation, i.e. about whether the opponent punishes defection, and to adjust the behaviour accordingly (Sigmund et al. 2001; Brandt et al. 2003; Hauert et al. 2004). Such interactions with second thoughts occur in two stages: first individuals decide whether to cooperate or to defect; second, individuals may punish the opponent conditioned on the outcome of the first stage. This results in four basic behavioural types: the social strategy G1 that cooperates and punishes defection, the paradoxical strategy G2 that defects but punishes, the asocial G3 strategy that neither cooperates nor punishes, and finally the mild G4 strategy that cooperates but does not punish. G2 is paradoxical because it does poorly when facing other G2 players. In evolving populations, the asocial G3 eventually reaches fixation. The reason for this is that the social G1 cannot discriminate between other G1 and G4 players. Hence G4 players can increase in numbers through random drift and thereby facilitate successful invasions by the asocial G3. This outcome changes dramatically if reputation is introduced, i.e. if individuals may learn about the punishing behaviour of their opponent and adjust their cooperative behaviour accordingly. This is illustrated in Fig. B. In spatially structured populations in which interactions are limited to the nearest neighbours (cf. section Spatial PD games), punishment also promotes cooperation, and quite intriguingly can even enforce cooperation if the costs of cooperation exceed the benefits (Brandt et al. 2003).
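The payoffs of the two-stage game without reputation can be sketched as follows. The parameter names and default values (benefit b, cost c, a fine imposed on a punished defector, and a cost paid by the punisher) are hypothetical and chosen only to illustrate the four behavioural types.

```python
# Each strategy is (cooperates, punishes defection); labels follow the text.
G1 = (True, True)     # social: cooperates and punishes
G2 = (False, True)    # paradoxical: defects but punishes
G3 = (False, False)   # asocial: neither cooperates nor punishes
G4 = (True, False)    # mild: cooperates but does not punish

def payoff(focal, other, b=3.0, c=1.0, fine=2.0, punish_cost=0.5):
    """Payoff to `focal` from one two-stage interaction with `other`."""
    coop_f, pun_f = focal
    coop_o, pun_o = other
    total = 0.0
    if coop_o:
        total += b            # stage 1: benefit from the opponent's cooperation
    if coop_f:
        total -= c            # stage 1: cost of own cooperation
    if pun_o and not coop_f:
        total -= fine         # stage 2: fined for having defected
    if pun_f and not coop_o:
        total -= punish_cost  # stage 2: cost of punishing a defector
    return total
```

In this sketch G1 and G4 obtain identical payoffs against each other and against themselves, so nothing prevents G4 from spreading by drift, while G3 exploits G4 without ever being fined. Two G2 players both pay the fine and the cost of punishing.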
In contrast to punishment scenarios, rewarding mechanisms seem to be limited to higher organisms, and perhaps even to humans. Interestingly, already the simplest models indicate that such mechanisms lead to complicated dynamics that make it much more difficult to establish and maintain cooperation (Sigmund et al. 2001). This is essentially because rewarding individuals are easily exploited, while it is impossible to exploit punishers. Consequently, rewarding mechanisms do not allow for similarly clear-cut conclusions as are possible for the case of punishment.
(B)
[ Dynamics of the PD with punishment and reputation. The four-dimensional strategy space foliates into invariant manifolds because (x_1 x_3)/(x_2 x_4) is an invariant of the dynamics, where x_i denotes the frequency of strategy G_i. The dynamics is illustrated on the manifold given by (x_1 x_3)/(x_2 x_4) = 1. The figure illustrates that reputation leads to bi-stable dynamics. Depending on the initial configuration, the population evolves either towards a pure social G1 state or a purely asocial G3 state. The basin of attraction of the two states is determined by the cost/benefit ratio of cooperation, as well as by the cost/fine ratio of punishment. It can be shown that under rather general conditions the social strategy G1 has the larger basin of attraction (Sigmund et al. 2001). ]