Game theory provides the tools for scientists interested in the strategies/decision rules that underlie helping behaviour and that may explain its evolution and stability. Variation of cost or benefits from helping and number of interactions between partners are factors that determine the control mechanisms a player may use to prevent a partner from cheating. The key contributions of game theory include (1) an emphasis on the fact that an individual’s best behavioural option may depend on how the partner(s) behave, (2) a framework that may explain variable behaviour both within individuals and between individuals, and (3) the possibility to explore the evolutionary dynamics of cooperative strategies, i.e. the specification of conditions that allow cooperative strategies both to evolve and be maintained. We think this aspect has been neglected in the recent conceptual papers by Lehmann & Keller (2006) and by West et al. (2007). Therefore, this approach will be described in much more detail than the other approaches.
Scientists interested in the evolutionary pathways that allow helping to be selected for investigate the average consequences of a behaviour on lifetime direct and indirect fitness, keeping the behaviour of the recipient constant (Lehmann & Keller, 2006; West et al., 2007). However, behaviour is flexible. Therefore, the behaviour of one individual should be dependent on the behaviour of other individuals rather than being fixed. Evolutionary game theory (Maynard Smith & Price, 1973; Maynard Smith, 1982) is the tool to explore the strategies that must necessarily underlie such flexible behaviour and which cause the adjustment of behaviour in response to variable outcome of interactions. Trivers (1971) provided the starting point for all research that seeks to understand how cooperative behaviour can be enforced with his concept of reciprocal investment (‘reciprocal altruism’). The basic observation is that we can often observe behaviours with the immediate effects of a benefit for the recipient and a cost to the actor. For example, playing ‘C’ in any round of an iterated prisoner’s dilemma is an investment: by definition the payoff within the interaction is lower than if the player had defected (Luce & Raiffa, 1957). Only the partner’s future behaviour may more than compensate for this investment. Therefore, the question emerges how the investor ensures that it will gain future benefits from the investment that will more than compensate the current costs as otherwise this form of helping would be counter-selected.
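The sense in which playing ‘C’ is an investment can be made concrete with a small numerical sketch. The payoff values below are hypothetical; only their ordering (temptation > reward > punishment > sucker’s payoff) reflects the definition of the prisoner’s dilemma, and the function name is introduced here purely for illustration.

```python
# Hypothetical payoff values; only the ordering T > R > P > S matters.
T, R, P, S = 5, 3, 1, 0

# PAYOFF[(my_move, partner_move)] is my payoff in a single round.
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def immediate_cost_of_cooperating(partner_move):
    """Payoff forgone in the current round by playing C instead of D."""
    return PAYOFF[("D", partner_move)] - PAYOFF[("C", partner_move)]


for partner_move in ("C", "D"):
    print(partner_move, immediate_cost_of_cooperating(partner_move))
```

Whatever the partner does, the cost is positive within the round, so cooperation can only pay through the partner’s future behaviour.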
Here, we focus on individual strategies that promote cooperation as there is hardly any research on altruistic strategies. Cooperation often occurs between related individuals (Clutton-Brock, 2002; West et al., 2002) but the relevant models usually assume that partners are unrelated to each other or that indirect benefits are not high enough to promote helping behaviour. We will not distinguish between intraspecific cooperation and interspecific mutualism, neither with respect to the models nor with respect to examples that we will provide as illustrations, as the general problem remains the same: we want to know how investments may provide more than compensatory benefits to the actor. More specifically, we want to know how the strategies of investors ensure return benefits with cooperative partners or reduce both own losses and the gains of a cheating partner (Bshary & Bronstein, 2004; Sachs et al., 2004; Noë, 2006). We therefore have to ask (1) how the investment affects the behaviour of the recipient or of bystanders and (2) how investors behave in a similar situation in the future, depending on what their original investment has yielded.
Searching evolutionarily stable cooperative strategies
The classic study for this kind of approach, by Axelrod & Hamilton (1981), made it possible to analyse the evolutionary stability of competing strategies in the iterated prisoner’s dilemma game. They first specified the game structure: players are paired randomly, each player has two behavioural options, a payoff matrix specifies the payoff for each player in each possible combination of behaviours, and there is a fixed probability of playing another round with the same partner. In the second step, they asked colleagues to submit strategies that they thought would be competitive in the specified game structure. Finally, they ran a computer tournament in which the average payoff of a strategy in one round of interactions translated into the strategy’s abundance (increasing or decreasing in relative frequency) in the next round. Axelrod & Hamilton (1981) found two ‘winners’: either ‘always defect’ or ‘tit-for-tat’, a simple conditional cooperative strategy that causes the individual to start cooperatively in the first round and then to copy the partner’s behaviour from the previous round. Thus, a tit-for-tat player cooperates as long as the partner cooperates but switches to cheating if the partner cheats. Since then, new conditional cooperative strategies have been tested in the iterated prisoner’s dilemma and have emerged as superior to tit-for-tat (references in Dugatkin, 1997).
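The logic of such a tournament can be illustrated with a minimal sketch. It is not a reconstruction of the original tournament: it pits only tit-for-tat against always defect, and the payoff values, the continuation probability `w`, the truncation at 500 rounds and the discrete replicator update are all assumptions made for illustration.

```python
# Hypothetical payoff values with T > R > P > S.
T, R, P, S = 5, 3, 1, 0
PAYOFF = {("C", "C"): R, ("C", "D"): S, ("D", "C"): T, ("D", "D"): P}


def tit_for_tat(partner_history):
    """Cooperate first, then copy the partner's previous move."""
    return "C" if not partner_history else partner_history[-1]


def always_defect(partner_history):
    """Defect regardless of what the partner did."""
    return "D"


def expected_payoff(strategy, opponent, w, max_rounds=500):
    """Expected total payoff to `strategy`; round t is reached with probability w**t."""
    own_moves, opp_moves, total = [], [], 0.0
    for t in range(max_rounds):
        a = strategy(opp_moves)
        b = opponent(own_moves)
        total += (w ** t) * PAYOFF[(a, b)]
        own_moves.append(a)
        opp_moves.append(b)
    return total


def evolve(x_tft, w, generations=200):
    """Discrete replicator dynamics for the frequency of tit-for-tat."""
    strategies = [tit_for_tat, always_defect]
    pay = [[expected_payoff(s, o, w) for o in strategies] for s in strategies]
    for _ in range(generations):
        f_tft = x_tft * pay[0][0] + (1 - x_tft) * pay[0][1]
        f_alld = x_tft * pay[1][0] + (1 - x_tft) * pay[1][1]
        mean = x_tft * f_tft + (1 - x_tft) * f_alld
        x_tft = x_tft * f_tft / mean  # abundance tracks relative payoff
    return x_tft


print(evolve(0.5, w=0.9))   # tit-for-tat initially common
print(evolve(0.01, w=0.9))  # tit-for-tat initially rare
```

With these assumed values, tit-for-tat spreads when it starts at a frequency of 0.5 but is eliminated when it starts at 0.01, which mirrors the two possible ‘winners’ described above.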
Maynard Smith (1982) noted that a weakness of the research using game theory at that time was that too much emphasis was placed on finding stable equilibria rather than on defining the phenotype set, i.e. the strategies that could be used by players. The variation in potential strategies may be limited by constraints on physiology, lack of information or lack of cognitive abilities, among others. To give some examples, a noncompetitive individual cannot reasonably threaten to inflict harm on a noncooperative dominant; selectively helping cooperative individuals requires close spatial association so that individuals could in principle acquire the necessary information, and also requires strong memory capacities. It is therefore a key challenge for both theoreticians and empiricists to identify all potential strategies that could be played in specific case studies, applying their knowledge about the system to identify constraints and elucidate the game structure itself. Unfortunately, this effort has rarely been made. Only for the iterated prisoner’s dilemma game has a large variety of strategies been developed and tested against each other (Axelrod & Hamilton, 1981; Boyd, 1989; Nowak & Sigmund, 1992, 1993). However, constraints on playing any one of these strategies have rarely been addressed (but see Milinski & Wedekind, 1998) and the game structure seems to apply to very few known examples of cooperation (Dugatkin, 1997), while it is apparently irrelevant for the many known cases of mutualism (Bergstrom et al., 2003). For other games, few strategies have been tested and potential constraints have been ignored. For example, we are not aware of any study in which various strategies based on punishment as a control mechanism compete against each other to see which one prevails.
Just to name a few possibilities, a punishment strategy could be ‘always cooperate and punish your partner after each round in which it failed to cooperate as well’, or ‘play tit-for-tat and in addition punish the partner for each defection’ or ‘start cooperatively, punish your partner the first time it fails to cooperate and switch to defection if the punishment does not alter the partner’s behaviour’. To identify feasible strategies and their respective potential constraints remains a key challenge for games other than the prisoner’s dilemma.
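To show what defining such a phenotype set might involve, the three verbal strategies above can be written as explicit decision rules. This is a sketch under stated assumptions: each rule sees only the partner’s past moves and returns a move plus a decision whether to punish, the function names are hypothetical, and the third rule is only one possible reading of the verbal description (it does not punish again after a punishment that worked).

```python
def punished_last_defection(partner_history):
    """True if the partner defected in the most recent round."""
    return bool(partner_history) and partner_history[-1] == "D"


def cooperate_and_punish(partner_history):
    """Always cooperate; punish after each round in which the partner defected."""
    return "C", punished_last_defection(partner_history)


def tit_for_tat_and_punish(partner_history):
    """Play tit-for-tat and, in addition, punish each defection."""
    move = "C" if not partner_history else partner_history[-1]
    return move, punished_last_defection(partner_history)


def punish_once_then_defect(partner_history):
    """Punish the first defection; defect for good if the punishment fails."""
    defections = [t for t, m in enumerate(partner_history) if m == "D"]
    if not defections:
        return "C", False
    first = defections[0]
    if first == len(partner_history) - 1:
        return "C", True  # first defection just happened: punish it
    if partner_history[first + 1] == "D":
        return "D", False  # punishment did not alter the partner's behaviour
    return "C", False  # assumption: no further punishment once it has worked


for rule in (cooperate_and_punish, tit_for_tat_and_punish, punish_once_then_defect):
    print(rule.__name__, rule(["C", "D"]))
```

Written this way, the rules differ in the memory they require, which is the kind of constraint the text argues should be made explicit before strategies are set against each other.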
The controlling components of cooperative strategies
A crucial component of a strategy is how individuals foster cooperative behaviour on the part of the interaction partners or how they prevent cheating of the partner. Individuals may for example match the partner’s current behaviour in the next interaction (like tit-for-tat does) or they could respond to cheating with aggression or with the termination of the relationship. Such reactions are called control mechanisms because they will have negative effects on the total payoff of cheaters. Analytical models can be used to specify conditions under which a particular control mechanism may yield stable cooperation. Over the last 20 years or so, a large variety of concepts that may explain stable cooperation have been developed. In the literature, one can find by-product mutualism, pseudo-reciprocity, group augmentation, pay-to-stay, reciprocity, threat of reciprocity, parcelling, punishment, sanctions, power, partner switching, generalized reciprocity, strong reciprocity, policing, indirect reciprocity and social prestige. This diversity results partly because some terms are synonymous. However, it has also become clear from empirical advances that we need a large variety of concepts to grasp all the known examples of cooperation and mutualism (Bergmüller et al., 2007a). In an attempt to point out similarities and differences between each concept, Bergmüller et al. (2007a) found that most of the concepts can be classified with a combination of four basic parameters where each can be in one of two different states. For a detailed discussion of this classification we refer to the original paper as well as to 22 commentaries and the authors’ reply (Bergmüller et al., 2007b) in a special edition of Behavioural Processes (2007). Below, we restrict ourselves to a brief overview.
The following four parameters can be seen as building blocks to define the controlling aspect of a strategy that ensures that (within a certain parameter space) helping yields on average a net fitness benefit for the helper. (1) The act of helping: an investment or a self-serving mutually beneficial behaviour? (2) The return benefits: an investment (i.e. a costly response) or a self-serving mutually beneficial behaviour? Reciprocity is defined by mutual investment, whereas in pseudo-reciprocity the behaviour of one player is self-serving and mutually beneficial. (3) Identity of the individual that provides the return benefits: the recipient or a bystander in a communication network (McGregor, 1993)? We call the former a ‘direct response’ and the latter an ‘indirect response’ (following Nowak & Sigmund, 1998). From the perspective of the responding individual, it might be more useful to describe direct benefits as experience based and indirect benefits as information based (Roberts & Sherratt, 2007). Note that this use of ‘direct response’ and ‘indirect response’ should not be confused with the ‘direct benefits’ and ‘indirect benefits’ through which the inclusive fitness of the actor is increased. (4) The nature of the return benefits: due to receiving a reward or due to avoiding a cost? Following Clutton-Brock (2002) we use the adjective ‘positive’ for the former and ‘negative’ if failure to help causes the infliction of a cost (‘punishment’). A combination of the states of the four parameters yields nine different basic concepts (Fig. 1, adapted from Bergmüller et al., 2007a).
Figure 1. Hierarchical classification of mechanisms that can maintain cooperative behaviour. By-product mutualism does not involve (1) investments that are directed towards others. An investment may be performed to obtain benefits resulting from the self-serving behaviour of the receiver (i.e. pseudo-reciprocity), without eliciting return investment. Alternatively, an investment may be (2) made in expectation of an investment in return (costly response), resulting in reciprocity. The investor may obtain benefits (3) either directly or indirectly (i.e. via third parties). (4) Cooperative behaviour may be stabilized by costly acts or by-products resulting from self-serving responses by the receiver (or third parties) that have either positive (+) or negative (−) effects on the partner.
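Four parameters with two states each would give sixteen combinations, so it may help to spell out why the hierarchy yields nine. In the sketch below, parameters (2) to (4) are only applied when the act of helping is an investment; the function name and the generated labels are illustrative, and the order of adjectives in the labels does not always match the headings used in the text.

```python
from itertools import product


def basic_concepts():
    """Enumerate the basic concepts implied by the hierarchical classification."""
    # (1) Helping is self-serving rather than an investment: a single concept,
    # to which parameters (2)-(4) do not apply.
    concepts = ["by-product mutualism"]
    # (1) Helping is an investment: parameters (2)-(4) each take two states.
    for return_type, responder, sign in product(
        ("pseudo-reciprocity", "reciprocity"),  # (2) self-serving vs. costly return
        ("direct", "indirect"),                 # (3) recipient vs. bystander responds
        ("positive", "negative"),               # (4) reward gained vs. cost avoided
    ):
        concepts.append(f"{responder} {sign} {return_type}")
    return concepts


for number, concept in enumerate(basic_concepts(), start=1):
    print(number, concept)
```

The enumeration gives one concept without investment plus 2 × 2 × 2 = 8 concepts with investment, i.e. nine in total.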
The nine basic concepts that may explain why helping leads to direct fitness benefits for the actor
1. By-product beneficial behaviour. In this simplest form of cooperation, the mere existence of other individuals and their self-serving actions provide benefits to others, without involving investments. Its evolution and stability is therefore straightforward (Dugatkin, 1997; Leimar & Connor, 2003). Examples include cooperative hunting in jackals (Lamprecht, 1978) and more generally apply to cases of coordination (Clutton-Brock, 2002). Coordination is the basis for group living (selfish herd, Hamilton, 1971), mixed species associations and interspecific coordinated hunting (Bshary et al., 2006). Also some cases of group augmentation (Kokko et al., 2001) such as self-serving contributions to public goods (West et al., 2007, in their re-evaluation of ‘weak altruism’) fulfil the criteria of cooperation.
2. Direct positive pseudo-reciprocity (=‘pseudo-reciprocity’, ‘group augmentation’). In pseudo-reciprocity the recipient will use an investment for its own benefits. The donor benefits because the self-serving behaviour of the receiver benefits the investor as a by-product (Connor, 1986). The concept of group augmentation includes the very same logic. Most ant-mutualisms appear to be cases of pseudo-reciprocity (Leimar & Connor, 2003): the partner species typically invests in providing food rewards, which causes the ants to self-servingly defend their food sources against their predators.
3. Direct negative pseudo-reciprocity. This control mechanism relies on the potential victim’s ability to terminate the interaction (self-servingly), which has negative effects for the potential exploiter. The two basic concepts are ‘power’ (Johnstone & Bshary, 2002; Bowles & Hammerstein, 2003) and ‘sanctions’ (Herre et al., 1999; Kiers et al., 2003). Sanctions work because of a sequential game structure. One class of players makes an initial investment that is available to members of another class of interaction partners, which then have to make their offer. The initial investor can then selectively stop the interaction if a partner did not offer net benefits, so that the partner loses everything. Experimental evidence for sanctions has been provided in legume–rhizobia interactions, where plants selectively stop the maintenance of nodules in which the bacteria fail to fix a minimum amount of nitrogen (Kiers et al., 2003). Power differs from sanctions in that actions are not sequential but parallel. Many real-life interactions last some time, allowing a potential exploitee to end an interaction prematurely. This selects for potential cheaters to cooperate as long as the payoff of a prolonged cooperative interaction is higher than the payoff of a shorter exploitative interaction. Both power and sanctions may yield cooperative outcomes in one-off interactions.
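The condition under which power favours cooperation can be written as a simple comparison. The sketch below assumes, purely for illustration, that payoffs accrue at a constant rate per unit of interaction time; the function name, the rates and the durations are hypothetical and are not taken from the cited models.

```python
def cooperation_favoured(coop_rate, full_duration, cheat_rate, exit_time):
    """True if cooperating for the full interaction pays more than cheating
    until the partner terminates the interaction at exit_time."""
    return coop_rate * full_duration > cheat_rate * exit_time


# Hypothetical numbers: cheating yields three times the payoff per unit time.
print(cooperation_favoured(1, 10, 3, 2))  # partner leaves quickly after being cheated
print(cooperation_favoured(1, 10, 3, 5))  # partner is slow to leave
```

In this example the potential cheater does better by cooperating only when the exploited partner ends the interaction quickly; no repeated interaction is needed for the comparison to apply.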
A third form of negative pseudo-reciprocity is partner switching. If a player cheats, the victim’s best option may be to switch to another partner for the next interaction (Ferrière et al., 2002; Bshary & Grutter, 2002a). Partner switching requires a repeated game structure and an asymmetry between cheater and victim: the cheater must belong to the abundant class of players from which individuals are chosen by potential victims, which are the members of the rare class of players. Under these circumstances, leaving a cheater is self serving while the cheater incurs a cost because it will spend some time without any interaction partner. Client reef fish with access to several cleaning stations appear to use switching as a mechanism to control the behaviour of cleaner wrasses (Bshary & Schäffer, 2002).
4. Indirect positive pseudo-reciprocity. This concept is based on ‘social prestige’ (Zahavi, 1995; Roberts, 1998; Lotem et al., 2003). In social prestige individuals signal their quality (cooperative behaviour is a handicap) to bystanders through helping. Bystanders choosing to interact with individuals with high prestige make a self-serving decision; they can expect personal benefits from this choice, like females choosing a high quality male to sire her offspring. In cleaning mutualism involving the cleaner wrasse Labroides dimidiatus, clients pay attention to how cleaners treat their current client and cleaners are therefore more cooperative towards their current client in the presence of bystanders (Bshary & Grutter, 2006). Clients are self-serving in choosing to interact with a cleaner that treated another client well and avoiding interactions with a cleaner that cheated another client as they make their choice in order to increase the average service quality they receive.
5. Indirect negative pseudo-reciprocity. The concept applies to situations where an actor helps a recipient because otherwise a third-party individual would do best by evicting the actor from the area. The concept could be applied to helpers that ‘pay-to-stay’ (Gaston, 1978) in cooperative breeding: helpers invest in offspring because otherwise it would be in the self-interest of the breeder to evict the helper. However, it is a matter of perspective whether the helper actually helps the offspring or the breeder to avoid eviction from the territory. In the latter case, the helper would provide food to avoid direct negative pseudo-reciprocity (as it has been classified by Bergmüller et al., 2007a; but see Gilchrist, 2007).
6. Positive direct reciprocity (=‘reciprocity’, ‘reciprocal altruism’, ‘reciprocal investment’, ‘parcelling’). The controlling aspect of positive reciprocity is based on rewarding cooperative partners: as long as the partner invests, the focal individual invests in return. If the partner cheats, however, the focal individual switches to cheating as well in the next round. Tit-for-tat and its cousins (Dugatkin, 1997) are the key strategies for repeated game structures. A special case is ‘parcelling’ (Connor, 1986, 1995), where partners cut the total investment into pieces and thereby transform a one-shot prisoner’s dilemma into an iterated game. The classic example is egg trading in hamlet fish, a simultaneous hermaphrodite (Fischer, 1988).
7. Negative direct reciprocity (=‘punishment’). This control mechanism is based on an individual inflicting costs on a noncooperating partner at its own expense. Punishment therefore reduces the immediate payoff of the punisher (Clutton-Brock & Parker, 1995). In contrast to sanctions, punishment can therefore only evolve in a repeated game structure (unless it provides indirect fitness benefits, see Gardner & West, 2004). The function of the act is to alter the future behaviour of the victim towards cooperative behaviour, which will then benefit the punisher. An empirical example based on experimental evidence is provided by client reef fish that respond to cheating by cleaners with aggression, which causes cleaners to behave more cooperatively towards the same client in their next interaction (Bshary & Grutter, 2002b, 2005). The pay-to-stay concept may also be a form of negative direct reciprocity if the breeder punishes a noncontributing helper rather than evicting it. Only future empirical studies can reveal the relative importance of punishment and eviction for stable contributions of helpers in cooperatively breeding species where helping is based on pay-to-stay.
8. Indirect positive reciprocity (=‘image scoring’, ‘generalized reciprocity’). In indirect reciprocity based on image scoring, individuals invest only in partners that have sufficiently helped others in the past (Alexander, 1987). Helping raises the ‘image score’ while failure to help reduces the score. An image score above a critical threshold is necessary to receive help from third parties (Nowak & Sigmund, 1998; Leimar & Hammerstein, 2001). Empirical evidence for indirect positive reciprocity based on image scoring is currently restricted to humans (Wedekind & Milinski, 2000).
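The bookkeeping behind image scoring can be sketched in a few lines. The sketch follows only the verbal description above: the individual names, the strategy labels, the step size of one and the threshold of -1 are assumptions for illustration, not the parameter values of the cited models.

```python
def interact(scores, strategies, donor, recipient, threshold=-1):
    """One donor-recipient encounter; returns True if the donor helps."""
    # A discriminator helps only recipients whose image score exceeds the threshold.
    helps = strategies[donor] == "discriminator" and scores[recipient] > threshold
    # Helping raises the donor's image score, failure to help lowers it.
    scores[donor] += 1 if helps else -1
    return helps


scores = {"a": 0, "b": 0, "c": 0}
strategies = {"a": "discriminator", "b": "discriminator", "c": "defector"}

print(interact(scores, strategies, "a", "b"))  # a meets b
print(interact(scores, strategies, "c", "a"))  # c never helps
print(interact(scores, strategies, "b", "c"))  # b meets c, whose score has dropped
print(scores)
```

In this run the defector’s refusal lowers its own score, after which a discriminating third party withholds help from it; note that under this simple rule the discriminator’s own score also drops when it refuses.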
Another form of indirect positive reciprocity is generalized reciprocity. In this game, the logical order of reasoning is reversed: rather than investing in order to receive benefits in the future, individuals that received help are willing to invest into third parties. The identity of the third party or the third party’s past behaviour do not influence decisions; players only need to know what happened to themselves rather than how potential recipients behaved in the past (Pfeiffer et al., 2004; Hamilton & Taborsky, 2005). First evidence for this concept has been provided in rats (Rutte & Taborsky, 2007).
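The reversed order of reasoning can be seen in a short sketch in which the decision to help depends only on the donor’s own last experience as a recipient. The function name, the individuals and the sequence of encounters are hypothetical, and the rule is a deliberately simplified reading of the verbal description.

```python
def run_encounters(pairs, initially_helped):
    """Each donor helps if and only if it was helped the last time it was a recipient."""
    helped_last = set(initially_helped)
    log = []
    for donor, recipient in pairs:
        # The recipient's identity and past behaviour play no role in the decision.
        helps = donor in helped_last
        log.append((donor, recipient, helps))
        if helps:
            helped_last.add(recipient)
        else:
            helped_last.discard(recipient)
    return log


pairs = [("a", "b"), ("b", "c"), ("c", "a")]
print(run_encounters(pairs, initially_helped={"a"}))
print(run_encounters(pairs, initially_helped=set()))
```

In the first run help is passed along the chain of encounters, whereas in the second run nobody has received help and so nobody gives it.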
9. Indirect negative reciprocity (=‘policing’, ‘strong reciprocity’). Indirect negative reciprocity is also called policing. Policing occurs in Hymenoptera, where workers eat the eggs laid by other workers and attack these ‘cheaters’ (Ratnieks & Wenseleers, 2005). However, it is unclear how relatedness between individuals influences policing, so kin selection might be involved. Indirect negative reciprocity has also attracted much attention in studies on human behaviour (Fehr & Gächter, 2002), as humans are willing to pay money in order to punish individuals who behaved uncooperatively towards others in one-shot games under anonymous laboratory conditions (‘strong reciprocity’).