The cancellation effect at the group level

Abstract

Group selection models combine selection pressure at the individual level with selection pressure at the group level. Cooperation can be costly for individuals but beneficial for the group. Therefore, if individuals are sufficiently assorted, so that cooperators find themselves in groups with disproportionately many other cooperators, cooperation can evolve. The existing literature on group selection generally assumes that competition between groups takes place in a well-mixed population of groups, where any group competes with any other group equally intensely. Competition between groups, however, might very well occur locally; groups may compete more intensely with nearby than with far-away groups. We show that if competition between groups is indeed local, then the evolution of cooperation can be hindered significantly by the fact that groups with many cooperators will mostly compete against neighbouring groups that are also highly cooperative, and therefore harder to outcompete. The existing empirical method for determining how conducive a group structured population is to the evolution of cooperation also implicitly assumes global between-group competition, and therefore gives (possibly very) biased estimates.


The cancellation effect at the individual level
The cancellation effect at the individual level was discovered by Wilson et al. (1992) and Taylor (1992a; b). Before then, it was more or less generally thought that as soon as interacting individuals are related, there is scope for cooperation to evolve (see Hamilton, 1971; Boyd, 1982; Grafen, 1983; and other references in Wilson et al., 1992; Taylor, 1992a; and Taylor, 1992b, for exceptions). The idea was that positive relatedness means that cooperators are more likely to interact with cooperators than defectors are, which implies that, while cooperators pay the cost of cooperating, they will also be on the receiving end of cooperation more often. If they are indeed sufficiently more often the recipients of cooperation than defectors are, and if the benefits are sufficiently large, then the cost of cooperation can be offset by the increase in benefits received.
What Wilson et al. (1992) and Taylor (1992a; b) discovered is that being around cooperators is not unambiguously good news. While being around more cooperators means receiving more cooperation, which is good, it can also mean being around individuals that cooperate more with one another, and that therefore constitute fiercer competition, which is not good. For cases in which relatedness is caused by local dispersal, receiving more cooperation and facing more intense competition can go hand in hand, and therefore, if individuals have opportunities for cooperating that are as local as their competition is, no benefit is large enough to get costly cooperation to evolve. The main insight provided by Wilson et al. (1992) and Taylor (1992a; b) is therefore that it is not enough to be related; what is needed is a discrepancy between the relatednesses to the individuals with whom one has the opportunity to cooperate, and those with whom one has to compete. While the idea that positive relatedness alone would be enough for the evolution of cooperation was inspired by Hamilton's rule (Hamilton, 1964a; b), it was pointed out that, by defining the fitness effects of cooperation versus defection appropriately, the cancellation effect can actually be identified within the framework of Hamilton's rule; see Taylor (1992b); Grafen (2007); , and Section 7 in van Veelen et al. (2017), all in settings where the game between individuals satisfies equal gains from switching.¹

The cancellation effect at the group level
Group selection models aim at capturing the opposing effects of selection at the individual level, where defectors do better than cooperators within groups, and selection at the group level, where groups with more cooperators do better than groups with fewer cooperators (Sober and Wilson, 1998; Wilson and Wilson, 2007; Richerson et al., 2016). The existing models within the group selection literature share the property that competition between individuals happens within groups, and that competition between groups happens in a setting where all groups compete with all other groups equally intensely (Traulsen and Nowak, 2006; Simon, 2010; Simon et al., 2013; Luo, 2014; Luo and Mattingly, 2017; van Veelen et al., 2014). This last property is a useful simplification if the aim is to illustrate the possibility of a tug of war between the different levels of selection. It is, however, not particularly realistic. Groups themselves typically live in a structured population of groups, where they compete with their neighbouring groups more often than they do with groups that are farther away.

¹ Ohtsuki (2012) analyzes a model that does not satisfy equal gains from switching. Other papers on the scale of cooperation versus the scale of competition are Queller (1992; 1994) and West et al. (2002), but they take a different approach. While Taylor (1992b); Grafen (2007);  and van Veelen et al. (2017) include the cancellation effect by making sure to account for all fitness effects, and keeping the definition of relatedness the same, Queller (1992; 1994) and West et al. (2002) get Hamilton's rule to hold by changing the relatedness into effective relatedness. When we do computations to include the cancellation effect at the group level, we will follow the first approach.
Local dispersal would then imply that groups with many cooperators are typically surrounded by groups that also contain many cooperators, compared to the groups that surround groups with many defectors, and therefore are also subject to more intense competition. This can significantly dampen the benefits of being a cooperative group, which in turn affects the balance between selection at the individual and at the group level.
In order to study the cancellation effect at the group level, and how it affects the balance between selection at different levels, we consider a stylized model, in which groups live on a cycle. We look at two replacement rules for groups. The first replacement rule is Birth-Death, where groups replace their direct neighbours.
The second replacement rule is Shift, where a group at one position can reproduce, and a group anywhere else can die, and all groups in between the two just move over. With Birth-Death, groups compete with their direct neighbours, and the cancellation effect at the group level is the largest it can be. With Shift, there is no cancellation at the group level at all.
The rest of the Supporting Information is organized as follows. In Section 2, we describe the model, including both replacement rules. Analytical thresholds for the benefit-to-cost ratios in the limit of weak selection, both for Birth-Death and Shift, are derived in Section 3. These thresholds are expressed in terms of the parameters of the model and relatednesses. The relatednesses are endogenous themselves and also depend on the parameters of the model, so we compute these in Section 4. In Section 5, the benefit-to-cost ratios and the relatednesses are combined and it is shown that the critical benefit-to-cost ratio for the Birth-Death process is always higher than the critical benefit-to-cost ratio for the Shift process, making group selection models that assume away the cancellation effect more optimistic about the conditions for the evolution of cooperation. Section 6 describes the simulations. The simulation results, which are not in the limit of weak selection, are compared to the thresholds in the limit of weak selection in Section 7. In Section 8, we derive analytical results for some limits other than the limit of weak selection, and in Section 9 we discuss the empirical implications.

The model

Two update rules on the cycle
We consider a simple model, in which groups are situated on a cycle. Both the group size and the number of groups are fixed; at each point in time, there are m groups consisting of n individuals. Each individual can be either a cooperator (C) or a defector (D). In every time period, one of three types of events will happen: an individual can replace another individual within a group, a group can replace another group, or two individuals from two neighbouring groups can change places. These individual, group, and migration events happen with probabilities p, q and r, respectively, and, without loss of generality, we assume that p + q + r = 1.
In order to demonstrate the difference between global and local between-group competition, we compare two different replacement rules for group reproduction: Birth-Death (BD) and Shift. Competition between groups is local in BD, and global in Shift. Individual and migration events happen in the same way in both processes.

Individual events
If an individual event occurs, which happens with probability $p$, then a random group is selected, and within that group, an individual is chosen to produce an identical offspring. All groups have equal probability of being chosen to host an individual level event. Within the group, defectors get an individual payoff of $1$ and cooperators get an individual payoff of $1 - c$. The intensity of selection $w$ is then used to transform these payoffs to values $f_C$ and $f_D$:

$$f_C = 1 - w + w(1 - c) = 1 - wc, \qquad f_D = 1 - w + w = 1.$$

The probabilities with which individuals are chosen for reproduction within the group are proportional to these values.² The probability $p_C(k_i)$ that a cooperator is chosen, and the probability $p_D(k_i)$ that a defector is chosen, in a group with $k_i$ cooperators, then become:

$$p_C(k_i) = \frac{k_i f_C}{k_i f_C + (n - k_i) f_D} = \frac{k_i (1 - wc)}{n - w k_i c}, \qquad p_D(k_i) = \frac{(n - k_i) f_D}{k_i f_C + (n - k_i) f_D} = \frac{n - k_i}{n - w k_i c}.$$

Whenever an individual reproduces, someone from the same group is chosen to die, where each individual, including the parent, but excluding the offspring, is chosen with probability $\frac{1}{n}$.

Figure 2: An example of the individual reproduction events. A defector is chosen for reproduction and produces an identical offspring. Another defector is chosen to die. In this case, the overall group composition has not changed.
If an individual event happens, the number of cooperators in that group can go up or down by one, or remain constant, depending on who reproduces and who dies within the group. Once a group reaches a state where all individuals are cooperators or all individuals are defectors, individual selection can not change the state of that group. Individual selection on its own, within a given group, therefore constitutes a Markov process with two absorbing states.
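As a minimal sketch of the individual-level update described above, the following Python code computes the probability that the reproducing individual is a cooperator or a defector, and the resulting probabilities that the number of cooperators in the group goes up, goes down, or stays the same. It assumes that values are obtained as 1 − w + w · payoff, by analogy with the group values, so that f_C = 1 − wc and f_D = 1. The function names are illustrative and do not come from the original.

```python
from fractions import Fraction


def individual_values(w, c):
    """Return (f_C, f_D), ASSUMING values are 1 - w + w * payoff."""
    f_c = 1 - w + w * (1 - c)  # simplifies to 1 - w*c
    f_d = 1                    # 1 - w + w*1
    return f_c, f_d


def reproduction_probs(k, n, w, c):
    """Probability that the reproducing individual is a C, or a D,
    in a group with k cooperators out of n members."""
    f_c, f_d = individual_values(w, c)
    total = k * f_c + (n - k) * f_d  # equals n - w*k*c
    return k * f_c / total, (n - k) * f_d / total


def cooperator_count_transitions(k, n, w, c):
    """Probabilities (up, down, stay) for the number of cooperators
    after one individual event in this group. The individual that dies
    is uniform over the n members present before reproduction."""
    p_c, p_d = reproduction_probs(k, n, w, c)
    up = p_c * Fraction(n - k, n)  # a C reproduces and a D dies
    down = p_d * Fraction(k, n)    # a D reproduces and a C dies
    return up, down, 1 - up - down
```

With k = 0 or k = n both the up and the down probability are zero, which reflects the two absorbing states of within-group selection.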

Group events
If a group event occurs, which happens with probability $q$, then one group is chosen to reproduce, and one group is chosen to die. Which group reproduces depends on the distribution of cooperators among groups.
If the groups are numbered from 1 to $m$, then a population state is a vector $k$, in which $k_i \in \{0, 1, ..., n\}$ is the number of cooperators in group $i$, for $i = 1, ..., m$. Groups $i$ and $i + 1$ are neighbouring groups, for $i = 1, ..., m - 1$, as well as groups 1 and $m$, which makes this a cycle. The group payoff of group $i$ is $1 + \frac{k_i}{n} b$. The intensity of selection $w$ is then used to transform these payoffs to values

$$g(k_i) = 1 - w + w\left(1 + \frac{k_i}{n} b\right) = 1 + w \frac{k_i}{n} b.$$

Both in BD and in Shift, first a group is chosen for reproduction, where each group's probability of being chosen is proportional to its value $g(k_i)$, as given below:

$$\frac{g(k_i)}{\sum_{j=1}^{m} g(k_j)} = \frac{1 + w \frac{k_i}{n} b}{m + w \frac{K}{n} b},$$

where $K = \sum_{j=1}^{m} k_j$ is the total number of cooperators in the population. If a group is chosen for reproduction, it produces an identical offspring group. Which group is being replaced depends on which replacement rule is used at the group level.
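The choice of the reproducing group can be sketched as follows. This is a small illustration under the stated definitions, with the group value taken to be g(k) = 1 + w (k/n) b; the function names are hypothetical.

```python
from fractions import Fraction


def group_value(k, n, w, b):
    """Group value g(k) = 1 + w * (k / n) * b for a group with k cooperators."""
    return 1 + w * Fraction(k, n) * b


def group_reproduction_probs(state, n, w, b):
    """Probability, for each group in `state` (a list of cooperator counts),
    of being chosen for reproduction: proportional to its group value."""
    values = [group_value(k, n, w, b) for k in state]
    total = sum(values)  # equals m + w * (K / n) * b, with K = sum(state)
    return [v / total for v in values]
```

For example, with three groups of size 4 holding 0, 2 and 4 cooperators, w = 1/2 and b = 3, the group values are 1, 7/4 and 5/2, and the normalizing constant is 21/4.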

Birth-Death (BD)
The offspring group replaces its own parent group with probability $\frac{1}{m}$, and it replaces either the left or the right neighbour of the parent group, each with probability $\frac{m-1}{2m}$. If the offspring group replaces the parent group, the population state does not change. The possibility to replace the parent group is included in order to make the analytical comparison between the two processes more straightforward. This does not have any profound consequences; any process with that possibility is equivalent to a process without that possibility, and with a lower probability of having a group reproduction event at all.
Figure 3: The BD process. One of the groups is chosen to reproduce, with probability proportional to group values, and produces an identical offspring group. The offspring group replaces one of the neighbouring groups, or, with a small probability, it replaces the parent group itself. (Panels: one group is chosen for reproduction; the offspring group replaces the parent group or one of its immediate neighbours.)
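A minimal sketch of the BD replacement step, assuming groups are stored in a Python list indexed from 0 and that the cycle is represented by index arithmetic modulo m. The helper names are illustrative, and the reproducing group is passed in as an argument instead of being sampled here.

```python
import random
from fractions import Fraction


def bd_target_distribution(parent, m):
    """Distribution over the position that the offspring group takes:
    the parent itself with probability 1/m, each neighbour with (m-1)/(2m)."""
    dist = {}
    for target, prob in (
        (parent, Fraction(1, m)),
        ((parent - 1) % m, Fraction(m - 1, 2 * m)),
        ((parent + 1) % m, Fraction(m - 1, 2 * m)),
    ):
        # Accumulate, so that very small cycles (coinciding neighbours) still work.
        dist[target] = dist.get(target, 0) + prob
    return dist


def bd_group_event(state, parent, rng=random):
    """Return the new state after the group at index `parent` reproduces."""
    dist = bd_target_distribution(parent, len(state))
    targets = list(dist)
    weights = [float(p) for p in dist.values()]
    target = rng.choices(targets, weights=weights)[0]
    new_state = list(state)
    new_state[target] = state[parent]  # offspring is a copy of the parent group
    return new_state
```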

Shift
Each group, including the parent group, but excluding the offspring group, is chosen to die with probability $\frac{1}{m}$. Hence, with Shift, every group is equally likely to die and competition between groups is therefore global, as it is in the standard group selection models with a well-mixed population of groups. Once a group dies, the offspring group is either placed to the right or to the left of the parent group, with equal probability, and every other group between the parent group and the dying group moves over one spot.
Figure 4: The Shift process. One of the groups is chosen to reproduce, with probability proportional to group values, and produces an identical offspring group. The offspring replaces one of the groups chosen randomly from the whole pool of groups. (Panels: one group is chosen for reproduction; one group is chosen to die; the reproducing group puts its offspring to its right or left, and every group in between the reproducing and the dying group moves over one spot.)
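The Shift replacement step can be sketched in the same list-based representation. This is an illustration only: groups are indexed from 0, `side` is +1 for placing the offspring to the right of the parent and -1 for the left, and the case in which the parent itself is chosen to die is assumed to leave the state unchanged, since the offspring is an identical copy. The function names are hypothetical.

```python
import random


def shift_group_event(state, parent, dying, side):
    """Return the new state after `parent` reproduces and `dying` is removed.
    Groups between the parent and the dying group, on the side where the
    offspring is placed, each move over one spot."""
    m = len(state)
    d = (dying - parent) % m
    if d == 0:
        return list(state)  # ASSUMPTION: offspring takes the parent's place
    ring = state[parent:] + state[:parent]  # rotate so the parent is first
    rest = ring[1:]
    del rest[d - 1]                         # remove the dying group
    if side == +1:
        new_ring = [ring[0], ring[0]] + rest    # offspring right of the parent
    else:
        new_ring = [ring[0]] + rest + [ring[0]]  # offspring left of the parent
    new_state = [None] * m
    for idx, k in enumerate(new_ring):      # rotate back to original labels
        new_state[(parent + idx) % m] = k
    return new_state


def sample_shift_event(state, parent, rng=random):
    """Sample the dying group uniformly and the side by a fair coin toss."""
    dying = rng.randrange(len(state))
    side = rng.choice((-1, +1))
    return shift_group_event(state, parent, dying, side)
```

For instance, with state [0, 1, 2, 3, 4], parent index 1 and dying index 3, placing the offspring to the right gives [0, 1, 1, 2, 4], and placing it to the left gives [1, 1, 2, 4, 0].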
Group selection by itself also constitutes a Markov process with multiple absorbing states. Once all groups have the same fraction of cooperators, the population state can no longer change by group level events. Therefore, the states in which all groups have the same composition would be the absorbing states of a Markov process with group level events only.

Migration events
If a migration event occurs, which happens with probability $r$, then a random group is selected, and within that group, a random individual is chosen to migrate. All groups have equal probability of being chosen to host a migration event, and within the group, all individuals are equally likely to be chosen to become the migrant. Then a coin toss determines whether the individual moves to the group on the left or to the group on the right. Within the receiving group, a randomly chosen individual trades places with the migrant.
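A migration event, as described above, amounts to swapping two individuals between neighbouring groups. The sketch below assumes that each group is stored as a list of 'C' and 'D' labels; the representation and the function name are illustrative choices, not part of the original model description.

```python
import random


def migration_event(groups, rng=random):
    """Swap a random individual from a random group with a random individual
    from its left or right neighbour on the cycle. Returns a new list of
    groups and leaves the input untouched."""
    m = len(groups)
    new_groups = [list(g) for g in groups]
    src = rng.randrange(m)                     # group hosting the migration
    dst = (src + rng.choice((-1, 1))) % m      # coin toss: left or right
    i = rng.randrange(len(new_groups[src]))    # the migrant
    j = rng.randrange(len(new_groups[dst]))    # the individual it trades with
    new_groups[src][i], new_groups[dst][j] = new_groups[dst][j], new_groups[src][i]
    return new_groups
```

Group sizes and the total number of cooperators are unchanged by construction; only the distribution of cooperators over groups can change.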
Migration makes a difference for which sets of states are absorbing. Without migration, the set that consists of all population states in which some groups consist of cooperators only, other groups consist of defectors only, but no group is mixed, is absorbing. Group events can still make a population transition from one population state within this set to another population state within this set, but no group or individual level event can make the population transition from a state within this set to one outside it. With migration, on the other hand, this is not an absorbing set of population states. When there is migration, the only sets of states that are absorbing are the set that only contains the state where everyone is a cooperator, and the set that only contains the state where everyone is a defector. Besides these two singleton sets, no sets of states are absorbing if we include migration.
This observation is important for understanding why, as we will see later, very low migration rates will reduce the gap between the critical b/c ratios for the two processes. Once absorbed within the set where all groups are homogeneous, the dynamics for the different replacement rules are only different in speed, while the differences in fixation probabilities disappear.

Alternative modeling choices
There are places in the model where, with the literature on group selection in mind, one can easily think of other ways to determine who reproduces and who is replaced. The reason for the choices we made is typically that they allow us to derive relatively tractable analytical solutions, which help illustrate the cancellation effect. Alternative choices would produce the same gap between full cancellation at the group level (BD) and no cancellation at the group level (Shift), but some would come with (much) more complicated analytical solutions.

Luo (2014)
In Luo (2014) and van Veelen et al. (2014), the reproduction rate of an individual is an individual characteristic, where defectors have a higher individual reproduction rate than cooperators do. That implies that a group with more defectors is more likely to host an individual reproduction event than a group with fewer defectors. In our model, every group is equally likely to host an individual reproduction event. Another implication of the choices in Luo (2014) and van Veelen et al. (2014) is that there, the ratio of group level reproduction events to individual level reproduction events depends on how many cooperators there are in the population as a whole; with many cooperators, group level reproduction events happen more frequently, relative to individual level reproduction events, compared to a population with fewer cooperators. In our model, this ratio is $\frac{p}{q}$, which is constant.
The modeling choices in Luo (2014), and similar ones in Simon (2010) and Simon et al. (2013), produce a model that is in some respects more elegant than ours. When combined with a structured population of groups like the cycle, and with different replacement rules for group reproduction, the cancellation effect would be present and absent in their model in the same way as it is in our version. While our model may be a bit less elegant, it does allow for a more straightforward derivation of the formulas for the critical b/c ratios. (Just to be sure: groups in Luo (2014)
A final difference, not in the model itself, but in the approach to deriving analytical solutions, is that Simon of groups m going to infinity, while we derive analytical solutions in the limit of weak selection.

Traulsen & Nowak (2006)
In Traulsen and Nowak (2006), individuals within groups affect their own and each other's individual reproduction rates. Cooperators lower their own individual reproduction rate, and increase the individual reproduction rates of their fellow group members. When a group is at maximum capacity, there is a small probability that at an individual reproduction event, the offspring does not replace another group member, but makes the group split. Because individuals in all-cooperator groups reproduce more frequently than individuals in all-defector groups, all-cooperator groups also split more often, and therefore produce more offspring groups. This is a bit further removed from our model, but it would of course be possible to make versions of this model where groups are situated on a cycle, and, in case a group splits, it either replaces a neighbouring group (BD), or a randomly chosen group (Shift).
There are also differences in the methods used to derive analytical solutions. One reason not to use an extended version of their model is that their method for finding analytical solutions makes assumptions that make it hard to identify the cancellation effect at the group level by comparing Birth-Death and Shift. Traulsen and Nowak (2006) assume a separation of timescales, by considering the case where the probability that a group splits as a result of an individual reproduction event is vanishingly small. This implies that almost all of the time, groups will be at maximum capacity, and will consist of one type of individual only.
The separation of timescales results in a nested Moran process, for which they compute fixation probabilities in the limit of weak selection. In Section 8.2 we take a similar approach by considering the limit of p → 1, but without assuming selection to be weak. There, we find that the difference between Birth-Death and Shift disappears with the separation of timescales. The reason why it does is similar to the reason why it does without migration, when the dynamics also make groups be at within-group fixation almost all of the time.
This is described in Section 7. Traulsen and Nowak (2006) also have a version with (global) migration. In the limit they consider, migration events happen with a frequency that is of the same order of magnitude as splitting events. This leaves the separation of timescales intact, which implies that groups will still either be all-cooperator or all-defector groups most of the time. For identifying a difference between Birth-Death and Shift, we would need migration to occur sufficiently frequently to keep at least some groups away from within group fixation. Although their model does not allow for a straightforward extension to a structured population of groups, in which we can easily identify the cancellation effect at the group level, we do find a few consistent patterns. The number of groups m, group size n, and migration rate r all affect the critical b/c ratio in our model in ways that are similar to how they affect that ratio in their model.

Public goods game
Although the game in our model does have the key properties of a public goods game, where paying individual costs come with collective benefits, one can also define payoffs so that the game looks like a public goods game already at the individual level. Such a version of the model would leave all model assumptions unchanged, except for the individual and group payoff functions. The values for a cooperator and a defector in group i, in which there are k i cooperators, then become: where w is the intensity of selection. The value for the group is defined as: where w is the intensity of selection. We assume that b > c.
In this formulation, the payoffs seem to make this a public goods game, already at the individual level, as all individuals get a higher payoff when all individuals play C (when they all get $b - c$), compared to what all individuals get when all play D (when they all get 0). In our setting, this is however not really an improvement for all at the individual level. The probability of a group being chosen to host an individual event is $\frac{1}{m}$, and this probability is therefore independent of the population state. When all individuals have the same payoff, whether it is all large or all small, their probability of reproducing, conditional on their group being chosen to host the individual reproduction event, is $\frac{1}{n}$. These payoffs therefore only seemingly introduce the public goods nature at the individual level. In alternative settings, where groups with many cooperators also host more individual reproduction events than groups with many defectors, the reproduction rates of everyone will be higher if everyone is a cooperator rather than everyone being a defector, but now this is compensated by an also elevated death rate, again neutralizing all "gains from cooperation" at the individual reproduction level. The only difference between the formulations is how the change in individual reproduction rate depends on the total number of cooperators in the group. In the formulation we chose, being a cooperator and not a defector decreases one's individual reproduction rate by $\frac{1}{n - w k_i c} - \frac{1 - wc}{n - w(k_i + 1)c}$, if $k_i$ is the number of cooperators among the other members of the group. In this "PGG" formulation, that difference is . In other words, in our formulation, the reduction in individual reproduction rate gets a bit larger when more others cooperate, while in this alternative PGG formulation, the reduction in individual reproduction rate gets a bit smaller.
This alternative formulation does furthermore link individual and group values, by making the latter the average of the former. The gains in group reproduction rate relate in a more straightforward way; the b − c in the PGG version replaces the b in our version.
Given that the effective differences between these two versions of the model are only minor details, it is not surprising that simulations also give very similar results. For obtaining analytic results, however, our version is easier to work with.

Analytical results in the limit of weak selection
We would like to derive critical b/c ratios, above which cooperation is selected for, in the limit of weak selection. In order to do that, we will go over the effects of being a cooperator instead of a defector on reproduction rates and death rates. The b and the c are just model parameters, so they are not the fitness benefits and fitness costs of cooperation. The effects that we compute below do amount to those fitness benefits and costs. Because these effects satisfy equal gains from switching locally (they are additive in the limit of weak selection), we can follow an inclusive fitness approach, where these effects are weighted with the relatednesses to the individual that the effects are on, in order to determine the direction of selection (van Veelen et al., 2017;van Veelen, 2018). We will use this approach to derive critical b/c ratios, which are therefore formulated in terms of model parameters. In the following subsections, we consider a case where one individual switches from being a defector to being a cooperator, and we calculate the effects of this change by the focal individual on every individual in the population, weighted by the corresponding relatednesses.
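Since the probabilities concerning reproduction events, both at the individual level and at the group level, are the same in BD and in Shift, the effects of being a cooperator instead of a defector on reproduction rates will be the same for both. Moreover, in both processes, the individual death rate is the same. The difference between the two processes, therefore, will only start showing up when we compute the effects on group death rates.

Changes in reproduction rates
Changes in my own reproduction rate. If I am a C instead of a D, with $i$ other C players in my group, and my group is chosen for an individual update, I change my own probability of being chosen for reproduction from $\frac{1}{n - iwc}$ to $\frac{1 - wc}{n - (i+1)wc}$, which is a difference of $\frac{1 - wc}{n - (i+1)wc} - \frac{1}{n - iwc}$. If we take the derivative with respect to $w$ for both terms, we get:

$$\frac{-(n - i - 1)c}{(n - (i+1)wc)^2} \quad \text{and} \quad \frac{ic}{(n - iwc)^2}.$$

Evaluated at $w = 0$, we get:

$$-\frac{(n - i - 1)c}{n^2} \quad \text{and} \quad \frac{ic}{n^2},$$

which means that the change in probability of being chosen for reproduction, for $w$ close to 0 (weak selection), can be approximated by

$$-\frac{n-1}{n^2} c \cdot w.$$

Changes in the reproduction rate of a C in my group. If I am a C instead of a D, with $i$ other C players in my group, and my group is chosen for an individual update, I change other C's individual probability of being chosen for reproduction from $\frac{1 - wc}{n - iwc}$ to $\frac{1 - wc}{n - (i+1)wc}$, which is a difference of $\frac{1 - wc}{n - (i+1)wc} - \frac{1 - wc}{n - iwc}$. If we take the derivative with respect to $w$, we get:

$$\frac{-(n - i - 1)c}{(n - (i+1)wc)^2} + \frac{(n - i)c}{(n - iwc)^2}.$$

Evaluated at $w = 0$, we get:

$$\frac{c}{n^2},$$

which means that the change in their probability of being chosen for reproduction, for $w$ close to 0 (weak selection), can be approximated by

$$\frac{1}{n^2} c \cdot w.$$

Changes in the reproduction rate of a D in my group. If I am a C instead of a D, with $i$ other C players in my group, and my group is chosen for an individual update, I change other D's individual probability of being chosen for reproduction from $\frac{1}{n - iwc}$ to $\frac{1}{n - (i+1)wc}$, which is a difference of $\frac{1}{n - (i+1)wc} - \frac{1}{n - iwc}$. If we take the derivative with respect to $w$, we get:

$$\frac{(i+1)c}{(n - (i+1)wc)^2} - \frac{ic}{(n - iwc)^2}.$$

Evaluated at $w = 0$, we get:

$$\frac{c}{n^2},$$

which means that the change in their probability of being chosen for reproduction, for $w$ close to 0 (weak selection), can be approximated by

$$\frac{1}{n^2} c \cdot w.$$

The effects on individual reproduction rates should be multiplied by $p \frac{1}{m}$ in order to account for the probability with which an individual event happens, and that my group is chosen to host it. The changes in individual reproduction rates within the group add up to 0, as they should with a fixed group size. There are no effects on individual reproduction rates in other groups.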
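The weak-selection approximations for the individual reproduction rates can be checked numerically. The sketch below is an illustrative check with hypothetical function names: it evaluates the exact differences with rational arithmetic, and compares them, for a small w, against the slopes −(n−1)c/n² for the focal individual and c/n² for every other group member.

```python
from fractions import Fraction


def delta_self(n, i, w, c):
    """Exact change in my own reproduction probability when I switch D -> C."""
    return (1 - w * c) / (n - (i + 1) * w * c) - 1 / (n - i * w * c)


def delta_other_c(n, i, w, c):
    """Exact change in the reproduction probability of another C in my group."""
    return (1 - w * c) / (n - (i + 1) * w * c) - (1 - w * c) / (n - i * w * c)


def delta_other_d(n, i, w, c):
    """Exact change in the reproduction probability of another D in my group."""
    return 1 / (n - (i + 1) * w * c) - 1 / (n - i * w * c)


def weak_selection_slopes(n, c):
    """First-order coefficients of w: (self, other C, other D)."""
    return (-Fraction(n - 1, n * n) * c,
            Fraction(1, n * n) * c,
            Fraction(1, n * n) * c)
```

Because the reproduction probabilities within a group sum to one both before and after the switch, the exact changes over the focal individual, the i other cooperators and the n − 1 − i other defectors add up to zero for any w, not only in the limit.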

Changes in group reproduction rates
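Changes in my group's reproduction rate. If I am a C instead of a D, with $i$ other C players in my group, $i = 0, 1, ..., n - 1$, and $j$ other C's in the population as a whole, $j = i, i + 1, ..., i + n(m - 1)$, then if a group level event happens, I change my group's probability of being chosen for reproduction from

$$\frac{1 + w \frac{i}{n} b}{m + w \frac{j}{n} b} \quad \text{to} \quad \frac{1 + w \frac{i+1}{n} b}{m + w \frac{j+1}{n} b}.$$

If we take the derivative of both terms with respect to $w$, we get:

$$\frac{(im - j) \frac{b}{n}}{\left(m + w \frac{j}{n} b\right)^2} \quad \text{and} \quad \frac{\left((i+1)m - (j+1)\right) \frac{b}{n}}{\left(m + w \frac{j+1}{n} b\right)^2}.$$

Evaluated at $w = 0$, we get:

$$\frac{(im - j) b}{nm^2} \quad \text{and} \quad \frac{\left((i+1)m - (j+1)\right) b}{nm^2},$$

which means that the change in my group's probability of being chosen for reproduction, for $w$ close to 0 (weak selection), can be approximated by

$$\frac{m-1}{nm^2} b \cdot w.$$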

Changes in other groups' reproduction rates
If I am a C instead of a D, with $i$ other C players in my group, $i = 0, 1, ..., n - 1$, and $j$ other C's in the population as a whole, $j = i, i + 1, ..., i + n(m - 1)$, then if a group level event happens, I change the probability of being chosen for reproduction of a random other group with $k$ cooperators from

$$\frac{1 + w \frac{k}{n} b}{m + w \frac{j}{n} b} \quad \text{to} \quad \frac{1 + w \frac{k}{n} b}{m + w \frac{j+1}{n} b}.$$

If we take the derivative with respect to $w$, we get:

$$\frac{\left(km - (j+1)\right) \frac{b}{n}}{\left(m + w \frac{j+1}{n} b\right)^2} - \frac{(km - j) \frac{b}{n}}{\left(m + w \frac{j}{n} b\right)^2}.$$

Evaluated at $w = 0$, we get:

$$-\frac{b}{nm^2},$$

which means that the change in that other group's probability of being chosen for reproduction, for $w$ close to 0 (weak selection), can be approximated by

$$-\frac{1}{nm^2} b \cdot w.$$

The effects on group reproduction rates should be multiplied by $q$ in order to account for the probability with which a group event happens. The changes in group reproduction rates add up to 0, as they should with a fixed number of groups.
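The same kind of numerical check can be made for the group reproduction rates. The sketch below assumes the group value g(k) = 1 + w (k/n) b and uses hypothetical function names; it compares the exact changes against the slopes (m−1)b/(nm²) for my own group and −b/(nm²) for any other group.

```python
from fractions import Fraction


def group_prob(k_own, total_c, n, m, w, b):
    """Probability that a group with k_own cooperators is chosen to reproduce
    when the population as a whole contains total_c cooperators."""
    return (1 + w * Fraction(k_own, n) * b) / (m + w * Fraction(total_c, n) * b)


def delta_own_group(i, j, n, m, w, b):
    """Exact change for my group when I switch D -> C
    (i other C's in my group, j other C's in the population)."""
    return group_prob(i + 1, j + 1, n, m, w, b) - group_prob(i, j, n, m, w, b)


def delta_other_group(k, j, n, m, w, b):
    """Exact change for another group that contains k cooperators."""
    return group_prob(k, j + 1, n, m, w, b) - group_prob(k, j, n, m, w, b)


def group_slopes(n, m, b):
    """First-order coefficients of w: (my group, any other group)."""
    return Fraction(m - 1, n * m * m) * b, -Fraction(1, n * m * m) * b
```

As with the individual rates, the exact changes over all m groups sum to zero for any w, since the probabilities of being chosen sum to one before and after the switch.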

Overall effect through changes in reproduction rates
The combined effects, close to neutrality, all weighted with the corresponding relatednesses, are: where r s is the relatedness between an individual and a randomly chosen other individual from the same group, and r o is the relatedness between an individual and a randomly chosen individual from a randomly chosen other group.
In Section 4, we derive identities concerning different relatednesses. We can use one of those to rewrite the combined effects. Equation (39) states that $r_o = -\frac{1}{n(m-1)} - \frac{n-1}{n(m-1)} r_s$, and using that, we can rewrite the combined effects as

Equation (8) moreover states that $r_0 = \frac{1}{n} + \frac{n-1}{n} r_s$, and therefore that $\frac{n-1}{n}(1 - r_s) = 1 - r_0$ and $1 + (n-1) r_s = n r_0$, where $r_0$ is the relatedness between an individual and a randomly chosen individual from the same group, which is labeled group 0, including the individual itself. The only difference with $r_s$ is that there the individual itself is excluded, hence the simple relation between $r_0$ and $r_s$. We can use this to rewrite the combined effects as

Changes in individual death rates
Individual death rates do not change when a player switches from being D to C.

Changes in group death rates
Changes in group death rates differ between the two update processes at the group level.

Birth-Death
Changes in my group's death rate If I am a C instead of a D, with i other C players in my group, i = 0, 1, ..., n − 1, and j other C's in the population as a whole, j = i, i + 1, ..., i + n(m − 1), then, if a group level event happens, we have seen that I change the reproduction rate of my own group by approximately m−1 nm 2 b · w. Since with probability 1 m the offspring group replaces the parent group, that comes with an increase in death rate equal to We have also seen that I change the reproduction rate of all other groups by approximately − 1 nm 2 b · w, including my two neighbouring groups. If one of those is chosen for reproduction, it replaces my group with probability m−1 2m , reducing my death rate by These two cancel out exactly, so the overall effect on the death rate of my group is 0, as my neighbours are now less likely to be chosen for reproduction and replace my group, but my group is more likely to reproduce and replace itself.
Changes in the death rate of the two neighbouring groups Following a similar argument, we find that the probability that one of my next-door neighbour groups is replaced changes by: The first term is the product of the probability that, if my group is chosen for reproduction, it replaces a given neighbour, and the increase in my groups probability to reproduce. The second term is the product of the probability of the neighbouring group of the neighbouring group to replace the neighbouring group, if chosen for reproduction, and the decrease in their probability of being chosen for reproduction. The third term is the product of the probability of the neighbouring group to replace itself, if chosen for reproduction, and the decrease in their probability of being chosen for reproduction. This sum can be rewritten as: This effect is the same for both right and left next-door neighbours.
Changes in the death rates of other groups. The probability of being replaced changes by:
$$-2\cdot\frac{m-1}{2m}\cdot\frac{1}{nm^2}\, b \cdot w \;-\; \frac{1}{m}\cdot\frac{1}{nm^2}\, b \cdot w \;=\; -\frac{1}{nm^2}\, b \cdot w$$
for any of the $m-3$ other groups. The first term on the left hand side is twice the product of the probability that a given neighbouring group replaces a given group, times the change in reproduction probability of those neighbouring groups. The second term is the product of the probability that a group's offspring replaces its parent group, when chosen for reproduction, and the change in reproduction probability of such a group.
The effects on group reproduction rates should be multiplied by q in order to account for the probability with which a group event happens.
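The bookkeeping for the Birth-Death death rates can be checked with exact fractions. The sketch below is only an illustration of the verbal argument above; the function name `bd_death_rate_changes` is hypothetical, the returned numbers are coefficients of $b \cdot w$ (before multiplying by the group-event probability), and it assumes $m \geq 4$ so that the focal group, its neighbours, and the "other" groups are all distinct.

```python
from fractions import Fraction

def bd_death_rate_changes(n, m):
    """Coefficients of b*w for the change in group death rates under Birth-Death.

    Assumes the replacement rule described in the text: a reproducing group
    replaces itself with probability 1/m and each given neighbour with
    probability (m-1)/(2m). Assumes m >= 4.
    """
    up = Fraction(m - 1, n * m**2)   # change in reproduction rate of the focal group
    down = Fraction(-1, n * m**2)    # change in reproduction rate of every other group
    stay = Fraction(1, m)            # offspring group replaces its parent group
    hop = Fraction(m - 1, 2 * m)     # offspring group replaces one given neighbour

    # Focal group: replaced by its own offspring, or by either neighbour.
    own = stay * up + 2 * hop * down
    # A next-door neighbour: replaced by the focal group, by its other
    # neighbour, or by its own offspring.
    neighbour = hop * up + hop * down + stay * down
    # Any other group: replaced by one of its two neighbours or by itself.
    other = 2 * hop * down + stay * down
    return own, neighbour, other

own, neighbour, other = bd_death_rate_changes(n=5, m=10)
print(own, neighbour, other)
```

The focal group's coefficient comes out as zero, which is the exact cancellation described above, while the neighbours' coefficient equals $\frac{m-3}{2nm^2}$ and that of the other groups equals $-\frac{1}{nm^2}$.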

Changes in group death rates
For the Shift process, group death rates do not change when a player switches from being D to C.

Birth-Death
The overall effect through death rates again combines the effects on individual death rates (which are zero) and the effects on group death rates affecting self, group members, and members of other groups, each weighted with the corresponding relatedness measure: where we used the identity $r_1 = r_{m-1}$, and Equations (37) and (8) from Section 4.

Shift
For Shift, the overall effect through changes in death rates is zero, since the effects on death rates of both individuals and groups are zero.

Overall effect of switching from D to C
The overall effect of a player switching from playing D to C would be the effect on reproduction rates minus the effect on death rates.

Birth-Death
The overall effect for the BD process is This gathers the effects as summarized in Fig. 6. The probability that an individual event happens is p.
The probability that if it does, it happens in the group of the focal individual is $\frac{1}{m}$. If we write the effect on the individual reproduction rate of the focal individual as $-\frac{1}{n} c + \frac{1}{n^2} c$, then we can also see the individual effects as a combination of a reduction in individual reproduction rate of the focal individual by $\frac{1}{n} c$, and an increase in individual reproduction rate of $\frac{1}{n^2} c$ for everyone in the group, including the focal individual. With $n$ individuals per group, the latter is equivalent to an effect of $\frac{1}{n} c$ on a randomly chosen individual from the same group, including the focal individual. This randomly chosen individual is related $r_0$ to the focal individual.

[Figure panel labels: Individual reproduction rate; Individual death rate; Group death rate BD; Group death rate Shift; Group reproduction rate.]
The probability that a group event happens is $q$. For all groups other than the focal group and its direct neighbours, the group reproduction rate and the group death rate go down by the same amount. The difference between the change in group reproduction rate and the change in the group death rate is $\frac{m-1}{nm^2}\, b$ for the focal group, and $-\frac{1}{2}\frac{m-1}{nm^2}\, b$ for each of the two neighbouring groups, which adds up to $-\frac{m-1}{nm^2}\, b$. In the group of the focal individual, there are $n$ individuals, who on average are related $r_0$ to the focal individual, and in each neighbouring group there are also $n$ individuals, who are related $r_1$ to the focal individual.
One way to rewrite this inequality would be The term $q n r_0 \frac{1}{nm}\, b$ now reflects the effects through changes in group reproduction rates, while the term $-qn\left(\frac{1}{m} r_0 + \frac{m-1}{m} r_1\right)\frac{1}{nm}\, b$ reflects the effects through changes in group death rates. This matches the group replacement rule for Birth-Death, where a reproducing group replaces itself with probability $\frac{1}{m}$, and one of its neighbouring groups with probability $\frac{m-1}{m}$.
Another way to rewrite this condition would be

Shift
Since the effect on death rates is zero, the overall effect would be equal to the effect on reproduction rates: This gathers the effects for Shift, which are also summarized in Fig. 6. The effects on individual reproduction rates are the same as for Birth-Death. On the group level, there are now only changes in reproduction rates and no changes in death rates. The changes in group reproduction rates can be seen as a combination of an increase in group reproduction rate of the group of the focal individual by $\frac{1}{nm}\, b$, and a decrease in group reproduction rate of $\frac{1}{nm^2}\, b$ for every group, including the group of the focal individual. The latter is equivalent to an effect of $\frac{1}{nm}\, b$ on a randomly chosen group, including the group of the focal individual. A randomly chosen individual from the same group, including the focal individual itself, is related $r_0$ to the focal individual, and a randomly chosen individual from a randomly chosen group, including the group of the focal individual, is related $\sum_{i=0}^{m-1} \frac{1}{m} r_i = 0$ to the focal individual.
If we simplify this condition, we get

Comparing thresholds
There are two differences between these two thresholds. To pinpoint the first, we can write Condition (2), which gives the threshold for Birth-Death, as This condition has a $-\left(\frac{1}{m} r_0 + \frac{m-1}{m} r_1\right)$ term that is absent in Condition (4). This term reflects the cancellation effect, and it pushes the threshold up. The second difference is that $r_0$ will not be the same across the two processes, even if everything else (that is: $p$, $q$, $r$, $n$ and $m$) is equal. In Section 4, we calculate how $r_s$, and thereby also $r_0$, depends on those five parameters for both processes, and it turns out that $r_0$ is higher for BD than for Shift. Therefore, were it not for the first difference, the critical b/c ratio would actually be lower for BD than for Shift.
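To make the comparison concrete, the net effects can be assembled from the verbal bookkeeping in this section. The sketch below is a reconstruction under assumptions, not the paper's Conditions (1) to (4): it takes the individual-level term to be $p \frac{1}{m}\frac{1}{n} c\,(r_0 - 1)$, the group reproduction term to be $q n r_0 \frac{1}{nm} b$, and the Birth-Death group death term to be $q n \left(\frac{1}{m} r_0 + \frac{m-1}{m} r_1\right)\frac{1}{nm} b$. All function names are hypothetical, and the relatedness values in the example are made up for illustration.

```python
from fractions import Fraction

def net_effect_bd(p, q, n, m, r0, r1, b, c):
    """Net inclusive fitness effect for Birth-Death (reconstructed, see lead-in)."""
    individual = p * Fraction(1, m) * Fraction(1, n) * c * (r0 - 1)
    group_repro = q * n * r0 * Fraction(1, n * m) * b
    group_death = (q * n * (Fraction(1, m) * r0 + Fraction(m - 1, m) * r1)
                   * Fraction(1, n * m) * b)
    return individual + group_repro - group_death

def net_effect_shift(p, q, n, m, r0, b, c):
    """Net effect for Shift: same terms, but no change in group death rates."""
    individual = p * Fraction(1, m) * Fraction(1, n) * c * (r0 - 1)
    group_repro = q * n * r0 * Fraction(1, n * m) * b
    return individual + group_repro

def critical_ratio_bd(p, q, n, m, r0, r1):
    """b/c at which net_effect_bd is zero (requires r0 > r1)."""
    return p * (1 - r0) * m / (q * n * (m - 1) * (r0 - r1))

def critical_ratio_shift(p, q, n, r0):
    """b/c at which net_effect_shift is zero (requires r0 > 0)."""
    return p * (1 - r0) / (q * n * r0)

# Hypothetical parameter and relatedness values, for illustration only.
p, q = Fraction(6, 10), Fraction(3, 10)
n, m = 4, 10
r0, r1 = Fraction(1, 2), Fraction(1, 4)
print(critical_ratio_bd(p, q, n, m, r0, r1), critical_ratio_shift(p, q, n, r0))
```

The example plugs the same $r_0$ into both thresholds only to isolate the first difference; as noted above, $r_0$ itself differs between the two processes.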

Hamilton's rule
We derived these thresholds using inclusive fitness. Therefore, it is worth pointing out that the $b$ and the $c$ in those thresholds are model parameters, and that they are not the fitness benefits and fitness costs of cooperation. It would therefore not be correct to read these formulas themselves as versions of Hamilton's rule, where the $b$ and the $c$ would represent the fitness benefits and costs, and the right hand side would replace $\frac{1}{r}$ (Nowak et al., 2010). These parameters do however determine the size of the fitness effects, as we have seen in the derivations that preceded Conditions (2) and (4).

More general version and interpretation
The processes we consider are Birth-Death, with completely local between-group competition, and Shift, with completely global between-group competition. One can however also consider a more general class of processes that are the same as Birth-Death and Shift with respect to their individual reproduction, but that vary in how local between-group competition is. To keep them comparable, we can assume for all processes that if a group is chosen to reproduce, then the parent group itself is chosen to die with probability $\varphi_0 = \frac{1}{m}$. The remainder of the probabilities can be chosen freely, and are given by $\varphi_i$, $i = 1, \ldots, m-1$, where we do assume symmetry ($\varphi_j = \varphi_{m-j}$) and, since they are probabilities, $\sum_{i=0}^{m-1} \varphi_i = 1$. Groups between the reproducing group and the dying group then move over in the same way as they do in Shift.

Changes in group death rates
For such a process, we can write the effect through the change in death rate of group i as a sum of the change in its death rate as a result of the focal group reproducing, and the changes as a result of all other groups reproducing: This can be rewritten as Here, we use

Overall effect of switching from D to C
The overall effect then becomes Here we use Birth-Death is a special case of this larger collection with In the descriptions of how Conditions (1) and (3) summarize the fitness effects, we have seen that the first term in both of them, which is also the first term here, summarizes the effects of changes in individual reproduction rates. We have also seen how the first half of the second term, $q n r_0 \frac{1}{nm}\, b$, summarizes the effects of changes in group reproduction rates. The second half of the second term, −qn summarizes the effect of changes in the group death rates. This term therefore captures the cancellation effect at the group level for these models.
We can also write the condition as It should be noted though that the relatednesses are endogenous; they also depend on the process. Since we focus on Birth-Death and Shift, we only compute relatednesses for those.
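The family of processes indexed by $\varphi$ can be explored with a short sketch. It assumes, as a reading of the text rather than a quotation of the missing formula, that the death-rate term weights relatedness by $\sum_{i=0}^{m-1} \varphi_i r_i$; that Birth-Death corresponds to $\varphi_1 = \varphi_{m-1} = \frac{m-1}{2m}$ with all other $\varphi_i$ for $i \geq 2$ equal to zero; and that completely global competition corresponds to $\varphi_i = \frac{1}{m}$ for all $i$. The function names and the relatedness values are hypothetical.

```python
from fractions import Fraction

def phi_birth_death(m):
    """Completely local competition: the parent dies w.p. 1/m, else a neighbour."""
    phi = [Fraction(0)] * m
    phi[0] = Fraction(1, m)
    phi[1] += Fraction(m - 1, 2 * m)
    phi[m - 1] += Fraction(m - 1, 2 * m)
    return phi

def phi_global(m):
    """Completely global competition: every group is equally likely to die."""
    return [Fraction(1, m)] * m

def check_phi(phi):
    """Check the constraints stated in the text: phi_0 = 1/m, symmetry, sum to 1."""
    m = len(phi)
    assert phi[0] == Fraction(1, m)
    assert sum(phi) == 1
    assert all(phi[j] == phi[m - j] for j in range(1, m))

def death_weight(phi, r):
    """Relatedness-weighted death term, sum_i phi_i * r_i (an assumed form)."""
    return sum(f * ri for f, ri in zip(phi, r))

# Hypothetical symmetric relatednesses for m = 6 that add up to zero.
r = [Fraction(1, 2), Fraction(1, 4), Fraction(-1, 4),
     Fraction(-1, 2), Fraction(-1, 4), Fraction(1, 4)]
for phi in (phi_birth_death(6), phi_global(6)):
    check_phi(phi)
    print(death_weight(phi, r))
```

Under these assumptions the Birth-Death weights reproduce the $\frac{1}{m} r_0 + \frac{m-1}{m} r_1$ term, while uniform weights give zero because relatednesses add up to zero, so there is no cancellation term.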
The whole of Section 3.6 follows a very helpful suggestion by one of the reviewers. Also the use of Equation (8) earlier on, and the focus on and interpretation of Conditions (1) and (3) follow suggestions of this reviewer.

Relatedness
In this section, we will calculate relatednesses $r_i$ between individuals that are $i$ groups apart, both for Birth-Death and for Shift, in the limit of weak selection, using identical-by-descent (IBD) probabilities. Two individuals are considered IBD if they descend from a common ancestor, and no mutations have occurred along their lineages. We will first derive the relatedness measures by assuming a mutation probability $u$ per individual reproduction (and, at a group reproduction event, all individuals in the group reproduce with the same mutation probability), and then take the limit of $u \downarrow 0$ to find the no-mutation limit of relatedness measures. To do so, we will first derive the recurrence relations for IBD probabilities by assuming a stationary distribution $\{q_i\}$ and then derive the no-mutation limit for relatedness using the following identity (Malécot, 1948; Rousset, 2004; Taylor et al., 2007a;b; Grafen, 2007; Durrett, 2008)
$$r_i = \frac{q_i - \bar{q}}{1 - \bar{q}}, \qquad (7)$$
where $q_i$ denotes the stationary IBD probability for two individuals whose groups are $i$ steps apart and $\bar{q}$ denotes the average IBD probability of a focal individual to all the individuals in the population, including self.
Two observations will be useful in our later derivations. The first is that, by symmetry, $q_i = q_{m-i}$. The second is that, for $i = 0$, we can relate $q_0$, the IBD probability for two members from the same group, drawn with replacement, and $q_s$, the same probability without replacement, in a straightforward way: $q_0 = \frac{1}{n} + \frac{n-1}{n} q_s$, since the individual's relatedness to herself is 1.
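As a small illustration of how relatedness follows from IBD probabilities, the sketch below computes $r_i = (q_i - \bar{q})/(1 - \bar{q})$ with $\bar{q} = \frac{1}{m}\sum_{i=0}^{m-1} q_i$, and the with-replacement probability $q_0$ from $q_s$. The function names are hypothetical and the IBD values are made up; they are only chosen to respect the symmetry $q_i = q_{m-i}$.

```python
from fractions import Fraction

def q0_from_qs(qs, n):
    """Within-group IBD probability with replacement, from the one without."""
    return Fraction(1, n) + Fraction(n - 1, n) * qs

def relatedness(q):
    """Relatedness r_i for each distance i, given IBD probabilities q_0..q_{m-1}."""
    m = len(q)
    q_bar = sum(q) / m  # average IBD probability, including self
    return [(qi - q_bar) / (1 - q_bar) for qi in q]

# Made-up, symmetric IBD probabilities for m = 6 groups of n = 4 individuals.
q0 = q0_from_qs(Fraction(3, 5), n=4)
q = [q0, Fraction(1, 2), Fraction(3, 10), Fraction(1, 5),
     Fraction(3, 10), Fraction(1, 2)]
r = relatedness(q)
print(r)
```

By construction the resulting relatednesses add up to zero, which is the property used repeatedly in the derivations that follow.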

Birth-Death
In the BD process, an individual can be replaced if:
• An individual event happens, the group the individual is in is chosen to host it, and within the group, she is chosen to be replaced.
• A group event happens, one of the neighbouring groups of the group the individual is in is chosen to reproduce, and that group replaces her group.
• A group event happens, the group that the individual belongs to is chosen, and replaces itself.
• A migration event occurs, the individual's group is chosen to be one of the groups in which the migration takes place, and the individual is chosen to be swapped.
Combining these events, we can derive the recurrence relations for the stationary IBD probabilities.

i = 1
For i = 1, we have the following recurrence relation: Here we use the probabilities with which different replacement events happen. We will sometimes refer to the two individuals in two neighbouring groups as the two focal individuals, and to their groups as the two focal groups.
If an individual event happens, the probability that any given group is chosen to host it is $\frac{1}{m}$. The probability that any individual within that group is chosen to be replaced is $\frac{1}{n}$. Therefore, for two given individuals in neighbouring groups, that adds up to $\frac{2}{m}\frac{1}{n}$. There are a few ways in which they can both not be affected. One of the other $m-2$ groups can be chosen, which happens with probability $\frac{m-2}{m}$; or one of the two neighbouring groups can be chosen, while some other individual is replaced, which happens with probability $\frac{2}{m}\frac{n-1}{n}$.
If a group event happens, then one of the two neighbouring groups can replace the other, which happens with probability $\frac{1}{m}\frac{m-1}{m}\frac{1}{2}$, or the other can replace the one, which happens with the same probability. In both cases, $q_0$ is the relevant IBD probability. Also, a neighbour outside the focal pair of groups can replace one of the two in the focal pair, which again happens with a probability that is twice $\frac{1}{m}\frac{m-1}{m}\frac{1}{2}$. In this case, $q_2$ is the relevant IBD probability. Finally, with a probability of twice $\frac{1}{m^2}$, one of the focal groups replaces itself, in which case $q_1$ is the relevant IBD probability. Nothing happens to the pair if a group is chosen to reproduce that is neither of the two groups within the focal pair, nor one of their direct neighbours, which happens with probability $\frac{m-4}{m}$. Also nothing happens if one of the neighbours of the focal pair is chosen, and they replace themselves.
With a migration event, every neighbouring pair does an exchange with probability $\frac{1}{m}$. This pair consists of both focal groups with probability $\frac{1}{m}$, and it is a pair that consists of one of the two focal groups and its neighbour on the other side with probability $\frac{2}{m}$. In the last case, the focal individual is chosen with probability $\frac{1}{n}$, and $q_2$ is the relevant IBD probability. In case the exchange is between the focal pair of groups, the two focal individuals themselves are chosen to switch with probability $\frac{1}{n^2}$, one of them is swapped with an individual that is not the other with probability $\frac{n-1}{n^2}$, and the other is swapped with an individual that is not the one with the same probability. In the first case, nothing changes with regard to the IBD probabilities, and in the latter two cases, $q_s$ is the relevant IBD probability.
Nothing happens at migration if one of the other $m-3$ pairs is chosen ($\frac{m-3}{m}$), a focal group and a neighbouring group outside the focal pair is chosen, but the focal individual is not chosen ($\frac{2}{m}\frac{n-1}{n}$), or the focal pair itself is chosen, but two individuals other than the focal ones trade places ($\frac{1}{m}\frac{(n-1)^2}{n^2}$).
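A quick sanity check on the case distinctions for $i = 1$ is that the probabilities listed for individual events and for migration events each add up to one. The sketch below encodes only the cases spelled out in the text; the function names and dictionary labels are hypothetical, and the group-event cases are left out because their listing is incomplete here.

```python
from fractions import Fraction

def individual_event_cases(n, m):
    """Cases for two focal individuals in neighbouring groups, individual event."""
    return {
        "one of the two focal individuals is replaced": Fraction(2, m) * Fraction(1, n),
        "another group hosts the event": Fraction(m - 2, m),
        "a focal group hosts, other member replaced": Fraction(2, m) * Fraction(n - 1, n),
    }

def migration_event_cases(n, m):
    """Cases for the same two focal individuals, migration event."""
    return {
        "outer pair, focal individual moved (q_2)": Fraction(2, m) * Fraction(1, n),
        "focal pair, the two focal individuals swap": Fraction(1, m) * Fraction(1, n**2),
        "focal pair, one focal individual moved (q_s)": Fraction(1, m) * 2 * Fraction(n - 1, n**2),
        "one of the other m-3 pairs": Fraction(m - 3, m),
        "outer pair, focal individual not moved": Fraction(2, m) * Fraction(n - 1, n),
        "focal pair, two non-focal individuals swap": Fraction(1, m) * Fraction((n - 1)**2, n**2),
    }

for cases in (individual_event_cases(4, 10), migration_event_cases(4, 10)):
    print(sum(cases.values()))
```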
If we gather the terms with $q_1$ on the left-hand side, we get If we use the identity $q_0 = \frac{1}{n} + \frac{n-1}{n} q_s$, and multiply left and right by $m^2$, this can be rewritten as If we furthermore use that $p + q + r = 1$, this can be further simplified to The recurrence relations for $1 < i < m-1$ are derived in a similar way. The differences with the recurrence relation for $i = 1$ arise because the two groups are no longer each other's neighbours, which means that the groups can no longer replace each other, nor can they exchange individuals.
If we gather the terms with $q_i$ on the left-hand side, we get: If we multiply left and right by $m^2$, and use $p + q + r = 1$, this can be rewritten as To derive $q_s$, and with it $q_0$, we will use the recurrence relation concerning the IBD probabilities for two individuals within the same group.
In case of a group event, the relevant probabilities are $\frac{1}{m^2}$ for the focal group replacing itself; $\frac{m-1}{m^2}$ for the focal group being replaced by a neighbouring group; $\frac{m-3}{m}$ for a group being chosen to reproduce that is not the focal group nor a neighbour; $\frac{m+1}{m^2}$ for a neighbouring group being chosen to reproduce, and replacing itself or its other neighbour; and $\frac{m-1}{m^2}$ for the focal group being chosen to reproduce, and replacing a neighbouring group and not itself.
In case of a migration event, the relevant probabilities are $\frac{4}{mn}$ for the focal group to exchange a member with the neighbouring group on the right or on the left, and one of the two focal individuals being chosen; $\frac{2(n-2)}{mn}$ for the exchange happening between the focal group and any of the two neighbouring groups, and neither of the two focal individuals being replaced; and $\frac{m-2}{m}$ for the exchange happening in any of the other $m-2$ pairs.
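The same kind of check applies to the within-group case: the group-event and migration-event probabilities listed above should each sum to one. The following sketch uses hypothetical function names and simply transcribes the listed probabilities.

```python
from fractions import Fraction

def bd_same_group_group_event(m):
    """Group-event probabilities for two focal individuals in one group (BD)."""
    return [
        Fraction(1, m**2),       # focal group replaces itself
        Fraction(m - 1, m**2),   # focal group is replaced by a neighbouring group
        Fraction(m - 3, m),      # reproducing group is neither focal nor a neighbour
        Fraction(m + 1, m**2),   # a neighbour reproduces, replaces itself or its other neighbour
        Fraction(m - 1, m**2),   # focal group reproduces and replaces a neighbour
    ]

def bd_same_group_migration_event(n, m):
    """Migration-event probabilities for two focal individuals in one group (BD)."""
    return [
        Fraction(4, m * n),            # exchange with a neighbour, a focal individual chosen
        Fraction(2 * (n - 2), m * n),  # exchange with a neighbour, no focal individual chosen
        Fraction(m - 2, m),            # exchange happens in one of the other m-2 pairs
    ]

print(sum(bd_same_group_group_event(10)), sum(bd_same_group_migration_event(4, 10)))
```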
The equation can be rewritten as Expressing q 1 as a function of q s , this becomes Finally, we use p + q + r = 1 to write

Solving the system
Suppose, for $1 \leq i \leq m-1$, $q_i$ has the following form: $q_i = s_i q_1$, where $s_1 = s_{m-1} = 1$, and $\lim_{u \to 0} s_i = 1$ for $1 < i < m-1$, since all IBD probabilities approach 1 in the limit of no mutation. If we rewrite Equation (10) with the assumption in Equation (12), we get the following equality: Assuming that $q_1 \neq 0$, this is also: Since $q_i \to 1$ for all $i$ as $u \downarrow 0$, we cannot directly use Equation (7) to calculate relatedness, as both the numerator and the denominator approach zero. Therefore, we will apply L'Hôpital's rule and calculate relatednesses as: $r_i = \frac{q_i' - \bar{q}'}{-\bar{q}'}$. Here the derivatives are taken with respect to $u$, and evaluated in the limit of $u \downarrow 0$. In order to determine $q_i'$ and $\bar{q}'$, we want to find $s_i'$ by taking derivatives on both sides of Equation (13).
If we evaluate this in the limit of u ↓ 0, then we can also use that s i = 1 for all i in that limit.
This can be reorganized as follows: Summing both sides of the above equation where we used the identity $s_1' = s_{m-1}' = 0$, since $s_1$ and $s_{m-1}$ are constant, and $s_2' = s_{m-2}'$, in the third and the fourth lines, respectively. Using the final equation above and the identity $s_1' = 0$, we can derive the limit values for all $s_i'$ as given below: for $2 \leq i \leq m-2$.
Before we can plug these into the relatedness formula, it will be helpful to write $\bar{q}'$ differently. Since $\bar{q} = \frac{1}{m}\sum_{i=0}^{m-1} q_i$, we can take the derivatives of all the terms separately.
Now we can use that $q_1 = 1$ and $s_i = 1$ in the limit of $u \downarrow 0$ for all $i$, that $q_1 = q_{m-1}$, and that $s_1' = s_{m-1}' = 0$ (because $s_1 = s_{m-1} = 1$ regardless of $u$) when rewriting $\bar{q}'$.
We can now plug this in Equation (7) to get In order to have a formula for relatedness that depends only on the parameters of the model (p, q, r, m and n), we still need to express q 0 and q 1 in terms of these parameters. In order to be able to do that, we will first express q s in terms of those parameters.
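The L'Hôpital step can be illustrated numerically. The sketch below assumes $q_i = s_i q_1$ for $1 \leq i \leq m-1$, so that in the limit $u \downarrow 0$ (where $q_1 = s_i = 1$) the derivative is $q_i' = s_i' + q_1'$, and then evaluates $r_i = (\bar{q}' - q_i')/\bar{q}'$. The function name and the derivative values are hypothetical; the values are chosen only to be negative and symmetric.

```python
from fractions import Fraction

def relatedness_from_derivatives(dq0, dq1, ds):
    """Relatedness in the no-mutation limit from derivatives with respect to u.

    dq0, dq1: derivatives of q_0 and q_1 at u = 0.
    ds: derivatives s_1', ..., s_{m-1}' at u = 0 (with s_1' = s_{m-1}' = 0).
    """
    m = len(ds) + 1
    # q_i' = s_i' * q_1 + s_i * q_1' with q_1 = s_i = 1 in the limit.
    dq = [dq0] + [dsi + dq1 for dsi in ds]
    dq_bar = Fraction(sum(dq)) / m  # derivative of the average IBD probability
    return [(dq_bar - dqi) / dq_bar for dqi in dq]

# Made-up derivative values for m = 6.
r = relatedness_from_derivatives(dq0=-10, dq1=-14, ds=[0, -3, -4, -3, 0])
print(r)
```

As with Equation (7) itself, the relatednesses obtained this way add up to zero, whatever values are supplied.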
Step 1 is to take the first derivative with respect to $u$ on both sides of Equation (9), and evaluate them at $u = 0$. At $u = 0$, $q_0 = q_1 = q_2 = 1$, and hence Also $q_0' = \frac{n-1}{n} q_s'$ and $q_2' = s_2' q_1 + s_2 q_1'$, and hence, with $q_2 = q_1 = 1$ and $s_2 = 1$ at $u = 0$, the expression above becomes Step 2 is to take the first derivative with respect to $u$ in Equation (11): $q_1' = \frac{mn}{4r} \cdots$ Evaluated at $u = 0$, where also $q_s = 1$, this is $q_1' = \left(\frac{p}{r}\frac{1}{2n} + 1\right) q_s' + \frac{p + nq}{2r}$. If we combine these two steps, we get We can simplify this using the formula for $s_2'$ from Equation (16), repeated below.
If we look at the first term in the numerator in Equation (22), we see that the coefficient of $s_2'$ is equal to the denominator of $s_2'$, so we can rewrite the formula for $q_s'$ as follows: where we used $p + q + r = 1$ in the second line.
Then, we can rewrite the denominator, which links $q_s'$ to $q_0'$, and Step 2 above gave us which links $q_s'$ to $q_1'$. This implies that we have everything we need to complete Equations (18), (19) and (20) for the BD process.

Shift
In the Shift process, an individual can be replaced if:
• An individual event happens, the group the individual is in is chosen to host it, and then within the group, she is chosen to be replaced.
• A group event happens, and a neighbouring group, or its offspring group, pushes the individual's group one position away from where it was, or replaces it. Probabilities for those events are also derived in  for the Shift process, where all positions are occupied by individuals instead of groups.
• A migration event occurs, the individual's group is chosen to be one of the groups in which the migration takes place, and the individual is chosen to be swapped.
Combining these events, we can derive the recurrence relations for the stationary IBD probabilities.

i = 1
For $i = 1$, we have the following recurrence relation:
$$\cdots \underbrace{\cdots}_{\text{replaced in a group event}} + \underbrace{r\,\frac{1}{m}\frac{1}{n}\left(2q_2 + \frac{2(n-1)}{n}\, q_s + \frac{1}{n}\, q_1\right)}_{\text{replaced in a migration event}}$$
The probabilities for individual and migration events are the same as in BD. The probabilities for group events are different. To get the probabilities for group events right, it is important, in the face of equivalent ways to define this update rule, to have an unambiguous rule for who ends up at which location after a group reproduction event. The reproducing group always stays put. If it is also chosen to die, its offspring group takes its place and no group moves. If not, then with probability one half, the offspring group occupies the position to the left of the parent group, and every group in between the reproducing and the dying group moves in the same direction. Also with probability one half, the offspring group occupies the position on the right, and every group in between the reproducing and the dying group moves in that direction.
The left one replaces itself with probability $\frac{1}{m^2}$. The right one too. After this, two randomly chosen members of the neighbouring groups are IBD with probability $(1-u)q_1$.
The group to the left of the left one reproduces to the right and pushes the left group to the right position if the group to the left of the left one is chosen to reproduce, the left group and the group to the left of the left one are both not chosen to die, and the reproducing group reproduces to the right. This happens with probability $\frac{1}{2}\frac{m-2}{m^2}$. The mirror image of that happens with the same probability. After this, two randomly chosen members of the neighbouring groups are IBD with probability $(1-u)q_1$.
These probabilities and the probabilities with which they replace themselves add up to $\frac{1}{m}$, and we have seen that, for both, the IBD probability is $(1-u)q_1$.
The left one is replaced by the group to the left of it, if the left one is chosen to die, any group other than the left one and its left neighbour is chosen to reproduce, and the reproducing group reproduces to the right.
The right one is replaced by the group to the right of it, if the right one is chosen to die, any group other than the right one and its right neighbour is chosen to reproduce, and the reproducing group reproduces to the left. Both events happen with probability $\frac{1}{2}\frac{m-2}{m^2}$, and after this, two randomly chosen members of the neighbouring groups are IBD with probability $q_2$.
The left one is replaced by the offspring of the group to the left of it, if the left one is chosen to die, its left neighbour is chosen to reproduce, and it reproduces to the right. The right one is replaced by the offspring of the group to the right of it, if the right one is chosen to die, its right neighbour is chosen to reproduce, and it reproduces to the left. Both events happen with probability $\frac{1}{2}\frac{1}{m^2}$, and after this, two randomly chosen members of the neighbouring groups are IBD with probability $(1-u)q_2$.
After all other events, the groups at the two given neighbouring locations are both not the offspring group, and were neighbouring groups in the period before the group event, too.
All of those group event probabilities can also be found in . The only differences are that we derive them in a forward looking way, while they do it in a backward looking way, and, since we may have more than one individual at any site, we do not have q 0 = 1.
If we gather all terms with $q_1$ on the left hand side, we get If we use the identity $q_0 = \frac{1 + (n-1) q_s}{n}$, and multiply left and right by $m$, this can be rewritten as If we furthermore use that $p + q + r = 1$, this can be further simplified to The recurrence relations for $1 < i < m-1$ are derived in a similar way. The differences with the recurrence relation for $i = 1$ arise because the two groups are no longer each other's neighbours, which means that the groups can no longer replace each other, nor can they swap individuals.
If we gather the terms with $q_i$ on the left-hand side, we get If we multiply both sides by $m$, and use $p + q + r = 1$ again, we can rewrite this as

i = 0
To derive $q_s$, and with it $q_0$, we use the recurrence relation concerning the IBD probabilities for two individuals from one and the same group.
Again, the probabilities for individual and migration events are the same as in BD. If a group event happens, then any given group reproduces and replaces itself with probability $\frac{1}{m^2}$, is replaced by its left neighbour with probability $\frac{1}{2}\frac{m-1}{m^2}$, and by its right neighbour with the same probability. That adds up to $\frac{1}{m}$, and the IBD probability dilutes to $(1-u)^2 q_s$ in all of these cases. In all other cases, it remains $q_s$.
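The group-event probabilities stated for Shift can be tallied in the same way as for Birth-Death. The sketch below, with hypothetical function names, transcribes the probabilities given in the text for a neighbouring pair ($i = 1$) and for a single group ($i = 0$), and confirms the two sums of $\frac{1}{m}$ mentioned there.

```python
from fractions import Fraction

def shift_pair_group_events(m):
    """Stated group-event probabilities for a neighbouring pair under Shift."""
    half = Fraction(1, 2)
    return {
        # Outcomes after which the pair is IBD with probability (1-u) q_1.
        "self-replacement (left or right)": 2 * Fraction(1, m**2),
        "pushed by outer neighbour (either side)": 2 * half * Fraction(m - 2, m**2),
        # Outcomes after which q_2 is the relevant probability.
        "replaced by shifted outer neighbour": 2 * half * Fraction(m - 2, m**2),
        "replaced by offspring of outer neighbour": 2 * half * Fraction(1, m**2),
    }

def shift_same_group_events(m):
    """Stated group-event probabilities that renew a given group under Shift."""
    half = Fraction(1, 2)
    return {
        "replaces itself": Fraction(1, m**2),
        "replaced by left neighbour": half * Fraction(m - 1, m**2),
        "replaced by right neighbour": half * Fraction(m - 1, m**2),
    }

pair = shift_pair_group_events(10)
keeps_q1 = pair["self-replacement (left or right)"] + pair["pushed by outer neighbour (either side)"]
print(keeps_q1, sum(shift_same_group_events(10).values()))
```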
Expressing q 1 as a function of q s , this becomes Finally, we use p + q + r = 1 to write This turns out to be the same equation for Shift as for BD.

Solving the system
Suppose again that $q_i$ has the following form for $1 \leq i \leq m-1$: $q_i = s_i q_1$, where $s_1 = s_{m-1} = 1$, and $\lim_{u \to 0} s_i = 1$ for $1 < i < m-1$. Then, we can rewrite Equation (26) as follows: Assuming that $q_1 \neq 0$, this is also: Since $q_i \to 1$ for all $i$ as $u \downarrow 0$, we cannot directly use Equation (7) to calculate relatedness, as both the numerator and the denominator approach zero. Therefore, we will apply L'Hôpital's rule and calculate relatednesses as: $r_i = \frac{q_i' - \bar{q}'}{-\bar{q}'}$. Here the derivatives are taken with respect to $u$, and evaluated in the limit of $u \downarrow 0$. In order to determine $q_i'$ and $\bar{q}'$, we want to find $s_i'$ by taking derivatives with respect to $u$ on both sides of Equation (28).
If we evaluate this in the limit of u ↓ 0, then we can also use that s i = 1 for all i in that limit.
This can be reorganized as follows If we divide everything by 2 and by the first term, we get the following equation.
We will call the right hand side of this equation $-\eta_i$. If we do so, and we sum both sides of the equation over which, together with Equation (29), implies that The derivations of Equations (18), (19) and (20) in the previous subsection do not depend on the update process; they only depend on the assumption that $q_i = s_i q_1$ for $1 \leq i \leq m-1$. Here we make the same assumption for Shift. Hence, we can use the same equations for Shift as we did for BD. Because Equations (18), (19) and (20) also feature the sum of $s_j'$ values, it will help to compute that sum as well. Since $s_1' = s_{m-1}' = 0$, summing from $j = 2$ to $m-2$ gives the same answer as summing from $j = 1$ to $m-1$.
The first part of the above summation can be calculated as follows The second part of the summation can be calculated as follows When we extend the summation in this way, we see that for every $\eta_i$, the first time it appears, it is multiplied by 1; the next time it is multiplied by 2; and so on; and the last time the same term appears, it is multiplied by $(m - 2 - i)$. Therefore, we can rewrite the above summation as follows Now, if we combine the two parts found in (31) and (32), we see that We can write the terms with $\eta_2$ and $\eta_{m-2}$ separately, and use that $\eta_2 = \eta_{m-2}$: Rearranging gives the formula below.
Unfortunately, there is no closed-form solution for the values $s_i'$ and their sum. However, it is possible to find numerical solutions once we fix the population size.
In order to have a formula for relatedness that only depends on the parameters of the model (p, q, r, m and n), we still need to express q 0 and q 1 in terms of these parameters. In order to be able to do that, we will first express q s in terms of those parameters.
Step 1 is to take the first derivative with respect to $u$ on both sides of Equation (25), and evaluate them at $u = 0$. At $u = 0$, also $q_0 = q_1 = q_2 = 1$, and hence
$$\left(q\,\frac{2(m-1)}{m} + r\,\frac{2(2n-1)}{n^2}\right) q_1' + p\,\frac{2}{n} + 2q = \left(q\,\frac{m-1}{m} + r\,\frac{2}{n}\right)\left(q_0' + q_2'\right).$$
Also $q_0' = \frac{n-1}{n} q_s'$ and $q_2' = s_2' q_1 + s_2 q_1'$, and hence, with $q_2 = q_1 = 1$ and $s_2 = 1$ at $u = 0$, this is also Step 2 is to take the first derivative with respect to $u$ in Equation (27). Evaluated at $u = 0$, where also $q_s = 1$, this is $q_1' = \left(\frac{p}{r}\frac{1}{2n} + 1\right) q_s' + \frac{p + nq}{2r}$. If we combine these two steps, we get which gives q s = q m−1 m + r 2 n s 2 − p n + q 3 − 1 n + q As in BD, Equation (8) moreover implies which links $q_s'$ to $q_0'$, and Step 2 above gave us which links $q_s'$ to $q_1'$. This implies that we have everything we need to complete Equations (18), (19) and (20) for the Shift process.

Three useful identities
The definition of relatedness (Equation 7) implies that relatednesses have to add up to 0.
Equation (8) relates the relatedness within the group including self and excluding self in an obvious way, which we repeat here: Together, these imply that If we define r o as the relatedness found through IBD probabilities, as in the previous subsection, to a randomly drawn individual from another group, where all other groups are equally likely to be drawn, then this is Combining Equations (37) and (38), we get None of the three identities depends on the update process, so they apply to BD as well as Shift.

Alternative derivation of the three identities
In the limit of $u \downarrow 0$, if two individuals are identical, they are identical by descent. Therefore, if we derive these identities more generally, using conditional probabilities, then they will coincide in this limit. Consider the dynamical system at hand, which is a Markov chain, in which every state is a vector $k$, where $k_i \in \{0, 1, \ldots, n\}$ is the number of cooperators in group $i$, for $i = 1, \ldots, m$. For every such population state, one can imagine hypothetical chance experiments, and define differences in conditional probabilities as we will below. These can then be aggregated, with weights attached to the population states. The weights could represent how often these states are visited relative to each other (this would be the rare-mutation dimorphic distribution from Allen and Tarnita, 2014, or the rare-mutation conditional distribution from Allen and McAvoy, 2018), but for now, all that matters is that $p_k$ is the weight of population state $k$, and that $\sum_k p_k = 1$.

Within-group relatedness
Consider the following hypothetical chance experiment for a given population state $k$. Draw a random individual from the population, with every individual equally likely to be drawn. After this, go back to the same group, and randomly draw another individual from it. Then, for this state, one could define the proto-relatedness as $r_{s,k} = P_{s,k}(C \mid C) - P_{s,k}(C \mid D)$. The subscript $s$ for same group indicates that this measure is about the relatedness between two different individuals within the same group. The subscript $k$ indicates which population state it pertains to.
Within group relatedness can now be defined as $r_s = \sum_k p_k\, r_{s,k}$.

Relatedness with an individual from a random other group
Now think of another chance experiment for a given population state $k$. Draw a random individual from the population, again with every individual equally likely to be drawn. After this, go to a different group, with all other groups equally likely to be chosen, and randomly draw another individual from that group. Then, for this state, proto-relatedness is $r_{o,k} = P_{o,k}(C \mid C) - P_{o,k}(C \mid D)$. The subscript $o$ for other group indicates that this measure is about the relatedness between two individuals in different groups.
Relatedness between two individuals from randomly chosen different groups can now be defined as $r_o = \sum_k p_k\, r_{o,k}$.

Relatedness between individuals that are i groups apart
With groups situated on the cycle, we can also define other chance experiments for a given population state $k$. First draw a random individual from the population, as before. After this, with probability $\frac{1}{2}$ go to the group that is $i$ steps to the left of the first group, and with probability $\frac{1}{2}$ go to the group that is $i$ steps to the right of the first group, $i = 1, \ldots, m-1$. Randomly draw another individual from that group. Then, for this state, the $i$-step proto-relatedness can be defined as $r_{i,k} = P_{i,k}(C \mid C) - P_{i,k}(C \mid D)$. The subscript $i$ indicates that this measure is about the relatedness between two individuals in groups that are $i$ steps away from each other.
Relatedness between two individuals that are $i$ groups apart can now be defined as $r_i = \sum_k p_k\, r_{i,k}$.

Identity I
In order to relate these relatednesses to each other, assume that we consider a state $k$, for which $K = \sum_{j=1}^{m} k_j$ is the total number of cooperators in the population as a whole. Now imagine a chance experiment where we first draw a random individual from the population, with all individuals equally likely to be chosen, and then, without replacement, another individual from the population as a whole, with all remaining individuals equally likely to be drawn. Conditional on the first being a cooperator, the chance that the second is a cooperator can be written in two different ways, which must be equal to each other:
$$\frac{K-1}{nm-1} = \frac{n-1}{nm-1}\, P_{s,k}(C \mid C) + \frac{n(m-1)}{nm-1}\, P_{o,k}(C \mid C).$$
Conditional on the first being a defector, one can also express the chance that the second is a cooperator in two equivalent ways:
$$\frac{K}{nm-1} = \frac{n-1}{nm-1}\, P_{s,k}(C \mid D) + \frac{n(m-1)}{nm-1}\, P_{o,k}(C \mid D).$$
These two identities together imply that
$$(n-1)\left(P_{s,k}(C \mid C) - P_{s,k}(C \mid D)\right) + n(m-1)\left(P_{o,k}(C \mid C) - P_{o,k}(C \mid D)\right) = -1,$$
which implies that Because the $p_k$ add up to 1, this state-wise identity implies that if we aggregate over states accordingly, the following holds: This is Equation (39).
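Identity I can be verified for any particular population state by computing the four conditional probabilities directly. The sketch below is an illustration with a hypothetical function name; it assumes $n \geq 2$, $m \geq 2$, and a state that contains at least one cooperator and one defector, so that all conditional probabilities are defined.

```python
from fractions import Fraction

def identity_one_lhs(k, n):
    """Left-hand side of the state-wise Identity I for state k (cooperators per group)."""
    m = len(k)
    K = sum(k)   # total number of cooperators
    N = n * m    # population size
    if not 0 < K < N:
        raise ValueError("state must contain both cooperators and defectors")
    # Second draw from the same group (a different individual).
    ps_cc = sum(Fraction(kg, K) * Fraction(kg - 1, n - 1) for kg in k)
    ps_cd = sum(Fraction(n - kg, N - K) * Fraction(kg, n - 1) for kg in k)
    # Second draw from a uniformly chosen other group.
    po_cc = sum(Fraction(kg, K) * Fraction(K - kg, n * (m - 1)) for kg in k)
    po_cd = sum(Fraction(n - kg, N - K) * Fraction(K - kg, n * (m - 1)) for kg in k)
    return (n - 1) * (ps_cc - ps_cd) + n * (m - 1) * (po_cc - po_cd)

print(identity_one_lhs([3, 0, 2, 4], n=4))
```

The result is $-1$ for every admissible state, which is why aggregating with weights $p_k$ that sum to one preserves the identity.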

Identity II
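One can also go over the groups, according to their distance to the group from which the first individual was drawn. Then the equalities become:

$$\frac{K-1}{nm-1} = \frac{n-1}{nm-1} P_{s,k}(C|C) + \frac{n}{nm-1} \sum_{i=1}^{m-1} P_{i,k}(C|C), \qquad \frac{K}{nm-1} = \frac{n-1}{nm-1} P_{s,k}(C|D) + \frac{n}{nm-1} \sum_{i=1}^{m-1} P_{i,k}(C|D).$$

These two identities together imply that

$$(n-1)\, r_{s,k} + n \sum_{i=1}^{m-1} r_{i,k} = -1.$$

Because the $p_k$ add up to 1, this state-wise identity implies that if we aggregate over states accordingly, the following holds:

$$(n-1)\, r_s + n \sum_{i=1}^{m-1} r_i = -1.$$

This is Equation (37).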

Identity III
Given Identity I, Identity II is also equivalent to $(m-1)\, r_o = \sum_{i=1}^{m-1} r_i$. This is Equation (38).
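The three identities can be checked numerically for any mixed population state. The following is a minimal sketch, not part of the original analysis: it assumes that proto-relatedness is the difference $P(C|C) - P(C|D)$ for the relevant chance experiment, that a state is a list of cooperator counts per group on the cycle, and that $n \geq 2$ and $m \geq 2$. The function name `proto_relatedness` is hypothetical.

```python
from fractions import Fraction

def proto_relatedness(state, n):
    """Return (r_s, r_o, [r_1, ..., r_{m-1}]) for one population state.

    state[j] is the number of cooperators in group j on the cycle and n is
    the group size. The state must contain both cooperators and defectors.
    """
    m = len(state)
    K = sum(state)        # total number of cooperators
    D = n * m - K         # total number of defectors

    def p_second_is_coop(first_is_coop, partner):
        # Average over the group of the first draw, weighted by how likely
        # a first individual of the given type comes from that group.
        total = Fraction(0)
        for j, k in enumerate(state):
            weight = Fraction(k, K) if first_is_coop else Fraction(n - k, D)
            total += weight * partner(j, first_is_coop)
        return total

    def same_group(j, first_is_coop):
        # Second draw from the same group, without replacement.
        return Fraction(state[j] - (1 if first_is_coop else 0), n - 1)

    def other_group(j, first_is_coop):
        # Second draw from a uniformly chosen different group.
        return Fraction(K - state[j], n * (m - 1))

    def i_steps_away(i):
        def partner(j, first_is_coop):
            # Left or right with probability 1/2 each.
            return Fraction(state[(j - i) % m] + state[(j + i) % m], 2 * n)
        return partner

    def r(partner):
        return p_second_is_coop(True, partner) - p_second_is_coop(False, partner)

    return r(same_group), r(other_group), [r(i_steps_away(i)) for i in range(1, m)]
```

For instance, with `proto_relatedness([3, 0, 1, 2], 3)` the state-wise versions of Identities I, II and III can be confirmed with exact rational arithmetic.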

Birth-Death versus Shift
In Section 3, we have found the critical b/c ratios for the Birth-Death and the Shift process. For the BD process, the critical ratio was given in Condition (2). It is repeated below.
For the Shift process, the critical ratio was given in Condition (4). This is also repeated below.
These thresholds are expressed as functions of $p$, $q$, $r$, $m$, and $n$, as well as relatednesses $r_0$ and $r_1$ or, equivalently, $r_s$ and $r_1$. These relatednesses will typically be different for different update processes.
In Section 4, we have computed $r_0$ and $r_1$, both for the Birth-Death process and for the Shift process, expressing them as functions of $p$, $q$, $r$, $m$ and $n$ as well. For both processes, Equations (18) and (19), reproduced below, apply.
In these formulas, we still need to fill in $q_1$, $q_0$ and $\sum_{i=1}^{m-1} s_i$, and these will differ between the two processes. For BD, we have Equations (21), (22), (23) and (24), reproduced below.
For both processes, we have $q_0 = \frac{n-1}{n} q_s$ and $q_1 = \left(\frac{p}{r}\frac{1}{2n} + 1\right) q_s + \frac{p+nq}{2r}$, even though the value of $q_s$ will differ between the two processes. Therefore, we can use these formulas to express the difference $q_1 - q_0$ in the relatedness formulas in terms of model parameters and $q_s$ only.
Now, if we plug this back into the relatedness formulas, we get the following expressions: Using the relatedness formulas and the relationships between IBD probabilities $q_s$, $q_0$ and $q_1$, we can rewrite $1 - r_0$ and $r_0 - r_1$ as follows If we plug these into the formulas for the critical ratios, we get, for BD, And for Shift, we get What we will do in this section is compare these two critical b/c ratios, and show that the one for BD is always higher than the one for Shift.
We start by comparing the IBD probabilities $q_s$, which now get a superscript, depending on the update process.
Therefore, every term within the summation in the numerator of Equation (42) is less than one. Hence, the sum is less than $m - 3$, and the numerator is positive, which implies that $A$ is positive. Therefore, $(q_s)^{\mathrm{BD}}$ is less than $(q_s)^{\mathrm{Shift}}$, and since we know that both values of $q_s$ are negative, this implies that $(q_s)^{\mathrm{BD}}$ is "more negative" than $(q_s)^{\mathrm{Shift}}$.
To compare the critical ratios for the two processes -given below -we need to compare their last terms, as their first three terms are identical.
for BD, and for Shift.

When m is odd
In the calculations in this and the next subsection, we assume that m > 3 and n > 1 since these are the interesting cases. If m ≤ 3, the two update processes become identical, and so do their thresholds. And if n = 1, in every individual event, the offspring replaces the parent and nothing changes in the population state.
First assume that $m$ is odd; the case where $m$ is even is treated below. If $m$ is odd, we can rewrite $\sum_{i=2}^{m-2} (s_i)^{\mathrm{Shift}}$ as follows:

Going from the first to the second line, we gather the coefficients of

We need to find out whether the critical ratio for the BD process is always larger than that of the Shift process; hence, we need to compare the right-hand sides of the above two inequalities that give the critical b/c ratios, given in Conditions (43) and (44). To do so, we start with comparing a few terms to zero; and step-by-step, we will make our way to the equations above. We start by showing that the four terms below are positive:
• Term 1:
• Term 3: As seen before, For $i$ ranging from 2 to $m-2$, we have $i - 1 > 0$ and $m - i - 1 > 0$. Therefore, Hence,

Since each of the terms above is individually positive for $1 < i < m-1$, their products and sums will be positive as well. If we multiply the first three terms for a given $i$, the resulting product will be positive. If we sum these products over $i$, where $3 \le i \le \frac{m-1}{2}$, the resulting sum will be positive as well, since each term in the summation is positive. And finally, we add the fourth term above to the summation to reach the expression below, which is again positive:

Now, we split this expression into two parts, depending on the sign of the terms, and put the terms with a negative coefficient on the left hand side: This leads to

Going from the third to the fourth line above, we use that the terms for $i$ and $m - i$ in the latter summation are always the same. Also we rewrite $\sum_{i=2}^{m-2} 1$ as $m - 3$ in the last line. Now, multiply every term by $\frac{p}{n} + q$ to get; $\frac{p+nq}{2r}$ Now, if we multiply both sides by $-1$, the sign also changes: Another step of re-arranging the terms above gives us

Notice that the term in the numerator within the parentheses on the left hand side is equal to $A = (q_s)^{\mathrm{Shift}} - (q_s)^{\mathrm{BD}}$ from Equation (42), and that the denominator is equal to $-(q_s)^{\mathrm{BD}}$ from Equation (22).
If we divide both sides by $(q_s)^{\mathrm{Shift}}$, where the sign of the inequality changes since we multiply both sides with a negative term. Using $\frac{(q_s)^{\mathrm{Shift}} - (q_s)^{\mathrm{BD}}}{(q_s)^{\mathrm{Shift}} (q_s)^{\mathrm{BD}}} = \frac{1}{(q_s)^{\mathrm{BD}}} - \frac{1}{(q_s)^{\mathrm{Shift}}}$, we can rewrite the above inequality as follows, Reverse the numerator and the denominator on both sides of the inequality, where the sign of the inequality changes again on the first line above since we are reversing the fractions that we are comparing. Now, if we multiply both sides by $\frac{p}{q}\frac{n-1}{n^2}\frac{m}{m-1}$, we arrive at the inequality: In this last inequality, the left hand side is the critical b/c ratio for the BD process, and the right hand side is the critical b/c ratio for the Shift process, given in Conditions (43) and (44), respectively.

When m is even
If $m$ is even, we can rewrite $\sum_{i=2}^{m-2} (s_i)^{\mathrm{Shift}}$ as follows:

To compare the critical ratios for the two update processes from Equations (43) and (44), we are going to follow a very similar path to the case above in Section 5.1. We start with adding a fifth term to the terms given in the previous subsection; and step-by-step, we are going to reach the equations for the critical ratios of the two update processes. We have shown previously that Terms 1 through 4 are positive. Here, we add another term and show that it is also positive:
• Term 5: Hence,

Now, if we multiply Terms 1, 2 and 3 and sum these products over $3 \le i \le \frac{m-2}{2} = \frac{m}{2} - 1$, and add Term 4 to the summation from the previous subsection, we know that the resulting term will be positive. Now, we add Term 5 above, which is also positive, to get Using similar steps as in the previous subsection and the fact that Multiplying both sides by $-\left(\frac{p}{n} + q\right)$, which changes the sign of the inequality, and multiplying and dividing the left hand side by $\left(\frac{p}{n} + q\right)/pq$ which is the same inequality as Inequality (46). Following the same steps after Inequality (46) in the previous subsection, we arrive at the result that In this last inequality, the left hand side is the critical b/c ratio for the BD process, and the right hand side is the critical b/c ratio for the Shift process, given in Conditions (43) and (44), respectively.

Limit results for the number of groups approaching infinity
In this section, we explore the results in the limit where the number of groups m approaches infinity.

BD
For the BD process, we first repeat and rewrite Condition ( The limit result of the critical ratio we found in Condition (48) is in line with numerical solutions for large m.

Shift
For the Shift process, we first repeat and rewrite Condition ( Now consider the sum in the numerator in q s . Since each term in this summation is less than 1, we have The limit result of the critical ratio we found in Condition (49) is in line with numerical solutions for large m.

Birth-Death versus Shift in the limit where the number of groups approaches infinity
In the previous subsection, we solved for the critical b/c ratios in the limit where the number of groups m approaches infinity. The limit results are repeated below for convenience. for Shift. From these formulas, it is immediately clear that the critical b/c ratio for Shift is lower than the one for BD in the limit m → ∞.

Simulations
Since our model quickly becomes intractable once we move away from the limit of weak selection, we also ran numerical simulations with different intensities of selection. In this section, we describe the details of those simulations.
We programmed the simulation version of the model, presented in Section 2, in Matlab, where the population state is represented by a vector at each time step. In the remainder of the text, we will call this "the population vector". Each entry in the population vector represents the number of cooperators in a group at a given location on the circle. The group size is fixed in our model, and therefore the number of defectors is implied by the number of cooperators. This vector represents a circle, and therefore the first and the last entry in the vector are treated like neighbours in any relevant group competition or migration event. At the end of each time step, we update the population vector depending on the changes that happened in that time step. Each simulation run starts with a population vector of zeros, except for the first entry, which is a 1. This represents a mutant cooperator in a population of defectors. Since the cycle we use in our model is a transitive graph (Taylor et al., 2007a), the location of the initial mutant does not matter. As mentioned in Section 2, there are only two absorbing states in our model: one in which there are only cooperators and one with only defectors. We made sure that each individual simulation run presented in our results was long enough such that it reached either one of the two absorbing states.
We used random draws to decide what type of event (individual, group, or migration) happens in a given time step, and to determine the details of the corresponding event. At each time step, we draw a random number from a uniform distribution between 0 and 1, and,
• if the random draw is lower than $p$, an individual event happens. In this case, we draw a random integer between 1 and $m$ to decide in which group the individual event will occur. Individual payoffs within the group are defined as presented in Section 2; cooperators within the group get a payoff of $1 - wc$, and defectors get a payoff of 1. Using these payoffs and the number of cooperators in the group that is chosen for the individual event, we define the probabilities with which the population state changes in this particular event as follows. Suppose that there are $k_i$ cooperators in the group, then with probability $p^+$, the number of cooperators in the group increases by one (a cooperator is chosen to reproduce and a defector is chosen to die) and, with probability $p^-$, the number of cooperators decreases by one (a defector is chosen to reproduce and a cooperator is chosen to die), where $p^+ = \frac{k_i (1 - wc)}{n - k_i wc} \cdot \frac{n - k_i}{n}$ and $p^- = \frac{n - k_i}{n - k_i wc} \cdot \frac{k_i}{n}$. With the remaining probability $1 - p^+ - p^-$, the number of cooperators does not change, which represents the case in which the reproducing and the dying individuals are both cooperators or both defectors. After defining the probabilities $p^+$ and $p^-$ within the group, we draw another uniform random variable between 0 and 1 to decide which of the above events happens within the group. If the random number is lower than $p^+$, the number of cooperators in the group increases by one; if it is higher than or equal to $p^+$ but lower than $p^+ + p^-$, the number of cooperators in the group decreases by one; and, if it is higher than or equal to $p^+ + p^-$, the number of cooperators in the group does not change.
• if the random draw is higher than or equal to $p$ but lower than $p + q$, a group event happens. In this case, we define the group payoff for group $i$ as $g(k_i) = 1 + w \frac{k_i}{n} b$, where $k_i$ is the number of cooperators in group $i$ at the beginning of the current time step. Then, based on our model, the probability that group $i$ is chosen to reproduce is $q(k_i, K) = \frac{g(k_i)}{\sum_{j=1}^{m} g(k_j)} = \frac{1 + w \frac{k_i}{n} b}{m + w \frac{K}{n} b}$, and $K = \sum_{j=1}^{m} k_j$ is the total number of cooperators in the population. Together, these give $m$ thresholds; threshold $j$ is $\sum_{i=1}^{j} q(k_i, K)$. Then we draw a uniform random number between 0 and 1, and if it is between threshold $j - 1$ and $j$, group $j$ is chosen to reproduce. Then, two additional uniform random numbers are drawn to choose the dying group and the location of the offspring group.
In the BD process, one of the random draws is compared to 1/m to determine if the reproducing group is chosen to die or not. The other random draw is compared to 1/2 to decide whether the group above or below in the population vector is chosen to die, in case the reproducing group is not chosen to die.
The population vector is updated such that the number of cooperators in the place of the dying group becomes equal to the number of cooperators in its reproducing neighbour. In the Shift process, one of the random draws is used to determine which of the groups from the whole population of groups is chosen to die, and the other random draw is compared to 1/2 to determine the direction in which the offspring group is placed in the population vector. Every entry between the reproducing group and the dying group in the vector is shifted by one spot in the chosen direction.
• if the random draw is higher than or equal to p + q, a migration event happens. By drawing a random integer between 1 and m, we choose the first group to take part in the migration event. In our model, migration events occur between immediate neighbours. With the use of another random draw, we decide which neighbour of the first group is going to take part in the migration event as well.
Within each group, a random individual is picked to migrate in our model, and, only in the case that the individuals migrating are of different types, i.e. one is a cooperator and the other is a defector, the population state changes. Making use of the number of cooperators given by the population vector, we calculate the probabilities that the first group sends a cooperator or a defector, and the probabilities that the second group sends a cooperator or a defector. Then, we draw two random numbers to decide what type of player is sent by each group (this is done in a similar fashion to the other types of choices mentioned above).
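The probabilities used in the individual and group events above can be sketched as follows. This is a minimal Python illustration, not the Matlab program used for the results. It assumes that the reproducing individual is chosen proportionally to payoff and the dying individual uniformly at random, and that groups reproduce proportionally to $g(k_i)$; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def individual_event_probs(k, n, w, c):
    """Return (p_plus, p_minus) for a group with k cooperators out of n."""
    coop_payoff = 1 - w * c                # cooperators get 1 - wc
    total = k * coop_payoff + (n - k)      # defectors get payoff 1
    # Cooperator reproduces (payoff-proportional), defector dies (uniform).
    p_plus = k * coop_payoff / total * Fraction(n - k, n)
    # Defector reproduces, cooperator dies.
    p_minus = (n - k) / total * Fraction(k, n)
    return p_plus, p_minus

def group_thresholds(state, n, w, b):
    """Cumulative reproduction probabilities of the groups in `state`."""
    payoffs = [1 + w * Fraction(k, n) * b for k in state]
    total = sum(payoffs)
    return list(accumulate(g / total for g in payoffs))

def pick(thresholds, u):
    """Index of the first threshold that exceeds the uniform draw u."""
    for j, t in enumerate(thresholds):
        if u < t:
            return j
    return len(thresholds) - 1             # guard against rounding near 1
```

In a simulation loop, `u` would come from `random.random()`. Passing `Fraction` values for `w`, `b` and `c` keeps the arithmetic exact, which is convenient for checking the formulas by hand.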
Depending on the type of the event, and what happens in a given event, the population vector at the end of each time step is updated as described. This procedure is repeated until we reach one of the absorbing states, or until the preset maximum number of time steps is reached (in the latter case, we extend the maximum number of time steps and re-run the same setup until the run ends due to reaching fixation). At the end of each individual simulation run, we record whether cooperators or defectors fixated in that run.
This simulation program is used for finding critical b/c ratios for different choices of $n$, $m$, $p$, $q$, and $r$. For any given combination of $n$, $m$, $p$, $q$, and $r$, we do this by choosing $c = 0.1$, choosing a $b$, and running the simulation program 1,000,000 times. This gives an estimate of the fixation probability, which we compare to $\frac{1}{nm}$, which is the fixation probability under neutral selection. The $b$ is then adjusted accordingly, until we find a fixation probability that is indistinguishable from $\frac{1}{nm}$.

Figure 7: An example of the last few steps in the iterative process of finding the critical b/c ratio. Here, $m = 50$ and $n = 20$, which makes the fixation probability in the neutral process $\frac{1}{1000}$. Furthermore, this is the Birth-Death process, $c = 0.1$, $w = 0.5$, $p = \frac{18}{21}$, $q = \frac{9}{210}$, and $r = \frac{1}{10}$. In this particular instance, the number of fixations for $b = 0.45$ was almost spot on (999 out of 1,000,000), so we settled on a b/c ratio of 4.5. In other instances, we rounded to the nearest value for $b$ with 2 digits precision. This procedure results in one critical b/c ratio, and therefore one point in Fig. 9a.
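The two ways in which a group event changes the population vector can be sketched as pure functions. This is our reading of the verbal description above, not the original Matlab code: the function names, the boolean arguments, and the convention that "right" means increasing index are assumptions made for illustration.

```python
def bd_update(state, parent, dies_itself, go_right):
    """Birth-Death group event on the cycle; returns a new population vector.

    dies_itself is True with probability 1/m (the offspring group replaces
    its parent, so nothing changes); otherwise a neighbour is replaced.
    """
    new = list(state)
    if not dies_itself:
        m = len(state)
        victim = (parent + (1 if go_right else -1)) % m
        new[victim] = state[parent]        # neighbour becomes a copy of the parent
    return new

def shift_update(state, parent, victim, go_right):
    """Shift group event: the victim group disappears, the offspring is placed
    next to the parent in the chosen direction, and every group in between
    moves one spot in that direction."""
    new = list(state)
    m = len(state)
    step = 1 if go_right else -1
    idx = victim
    while idx != parent:
        # Pull the entry one spot closer to the parent into this position.
        new[idx] = new[(idx - step) % m]
        idx = (idx - step) % m
    return new
```

For example, with the vector `[10, 11, 12, 13, 14]`, parent index 1 and victim index 3, shifting to the right gives `[10, 11, 11, 12, 14]`. The entries are labels here, only to make the movement visible; in the simulation they would be cooperator counts.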
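One way to make "indistinguishable from $\frac{1}{nm}$" concrete is to compare the observed number of fixations with its binomial standard deviation under neutrality. The text does not state which criterion was actually used, so the sketch below is only an illustrative assumption, and `neutral_z_score` is a hypothetical helper.

```python
import math

def neutral_z_score(fixations, runs, n, m):
    """Standardised distance between the observed number of fixations and
    the number expected under neutral selection, 1/(n*m) per run."""
    p0 = 1 / (n * m)
    expected = runs * p0
    sd = math.sqrt(runs * p0 * (1 - p0))   # binomial standard deviation
    return (fixations - expected) / sd
```

For the example in Figure 7 (999 fixations in 1,000,000 runs with $n = 20$ and $m = 50$), the observed count is one fixation below the neutral expectation of 1000, which is a small fraction of one standard deviation.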

Theoretical results and simulations
In this section, we take the critical b/c ratios in the limit of weak selection, calculated in Section 3, combine them, as we did in Section 5, with the relatednesses in the limit of weak selection, calculated in Section 4, and plot them for a variety of parameter combinations. We combine those analytical results in the limit of weak selection with simulation results not in the limit of weak selection. As we vary group size $n$, the number of groups $m$, and migration rate $r$, we want to choose the probabilities of group versus individual events such that the ratio of probabilities for an individual to die in an individual event and in a group event remains constant under neutral selection. In this case, we choose them so that these probabilities are always equally large. At neutrality, the probability that an individual dies in an individual event is $p \frac{1}{m} \frac{1}{n}$ and the probability that an individual dies in a group event is $q \frac{1}{m}$. Keeping them equal therefore requires $p = nq$. We also have the condition $p + q + r = 1$. Together with the equality above, this implies that $p = (1 - r)\frac{n}{n+1}$ and $q = (1 - r)\frac{1}{n+1}$. In Fig. 8 and Fig. 9, we choose $r = 0.1$, and in Fig. 10, $r$ varies.

Figure 8: Results for the critical b/c ratios, combined with simulations with 1,000,000 independent runs, where $w = 0.1$, $r = 0.1$, $p = 0.9 \frac{n}{n+1}$ and $q = 0.9 \frac{1}{n+1}$. Panel (a): $m = 50$; panel (b): $n = 10$. In (a), the effect of increasing the group size is shown for a fixed number of groups. In (b), the effect of increasing the number of groups is shown for a fixed group size.

In Fig. 8a and Fig. 9a, we see that the thresholds in both processes increase with group size, and that the gap between the critical b/c ratios for the two processes is there for a range of group sizes. In Fig. 8b and Fig. 9b, we see that the thresholds decrease with the number of groups, and, again, that the gap between the critical b/c ratios for the two processes is there for a range of numbers of groups.
Both the increase of the thresholds with group size and the decrease with the number of groups are in line with the results of Traulsen and Nowak (2006). The gap between the thresholds for the two processes in relative terms becomes (very) large for (very) large numbers of groups. In Fig. 10, we see that both thresholds increase with the migration rate. This is understandable, since a higher migration rate will result in lower relatedness. We also see that the gap disappears for migration rates close to 0 and migration rates close to 1. For migration rates close to 0, and not too few groups, there is a fair chance that the process will spend a lot of time in population states where all groups are at within-group fixation, or, in other words, where all groups consist of only cooperators or only defectors. Once this has happened, there is a fair chance that the population will get to a state where these all-cooperator groups form one sequence of groups on the circle. Without mutation or migration, this will remain true from then onwards. In Section 8, we will see that if we imagine a mutant group, where all individuals are cooperators, the fixation probability of such a mutant group is the same in either process. If cooperators on average gain ground on (or lose ground to) defectors, they do so faster with Shift than with BD, but the condition under which they gain or lose is the same for both processes. The reason for the difference in speed is that with Shift, in all intermediate population states with a string of all-cooperator groups and a string of all-defector groups, any all-cooperator group has a real chance of replacing any all-defector group and vice versa. On the other hand, in BD, all the action is at the two boundaries between the strings of all-cooperator groups and all-defector groups, while group reproduction events not on the boundary are inconsequential for the population state.
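The choice of event probabilities described above follows from the two conditions $p\frac{1}{m}\frac{1}{n} = q\frac{1}{m}$ and $p + q + r = 1$. A minimal sketch in Python, with the hypothetical helper name `event_probabilities`:

```python
from fractions import Fraction

def event_probabilities(n, r):
    """Return (p, q) such that an individual is equally likely to die in an
    individual event and in a group event at neutrality, with p + q + r = 1."""
    q = (1 - r) / (n + 1)     # from p = n*q and p + q = 1 - r
    p = n * q
    return p, q
```

With $n = 20$ and $r = \frac{1}{10}$, this reproduces the values $p = \frac{18}{21}$ and $q = \frac{9}{210}$ reported for Figure 7.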
This implies that the boundary moves much more with Shift than it does with BD, and if the boundary moves in expectation in favour of the all-cooperator groups, it will move faster with Shift than with BD. However, the speed turns out not to matter for the fixation probability at the group level of a mutant all-cooperator group. The speed does matter if there are mixed groups, because if there are mixed groups, the ground gained at the group level will balance against the ground lost at the individual level.
With cooperative groups winning faster in Shift, they can therefore overcome a larger within-group decay of cooperators. Absent mixed groups, however, speed does not matter anymore, and the difference in fixation probabilities disappears.
At migration rates r close to 1, it is not hard to understand why the difference between BD and Shift disappears as well. When almost all events are migration events, the population is shaken and stirred between any two reproduction events, be it at the individual or at the group level, and all that matters is what being a cooperator does to an individual's own birth rate and what it does to its group's birth rate.
Those effects are the same in both processes, and therefore the difference between the two processes should disappear at very high migration rates. This is also reflected in the formulas for the thresholds. In the limit of $r \uparrow 1$, only finite population effects remain, and relatednesses become $r_s = r_1 = -\frac{1}{nm-1}$ and $r_0 = \frac{1}{n} + \frac{n-1}{n} r_s = \frac{1}{n} - \frac{n-1}{n} \frac{1}{nm-1} = \frac{m-1}{nm-1}$. This implies that $1 - r_0 = \frac{(n-1)m}{nm-1}$ and $r_0 - r_1 = \frac{m}{nm-1}$.
Condition (2)

Analytical solutions, without migration, and for limit cases that are not the limit of weak selection
There are some limit cases, other than the limit of weak selection, for which it also becomes feasible to derive analytical solutions for fixation probabilities. These results will help understand why the difference between the critical b/c ratios for BD and for Shift disappears when migration rates get close to 0. In both limits we consider here, we assume that there is no migration ($r = 0$), we assume strong selection ($w = 1$), and we assume that there is a separation of timescales. In the first limit, where $p \to 1$, group events are rare, and in almost all events, an individual reproduces. This limit is similar to the limit for which Traulsen and Nowak (2006) derive their formula [1]. They do consider the limit of weak selection, while we consider strong selection, which means that by scaling the $b$ and $c$, we can really consider any intensity of selection. In the second limit, where $p \to 0$, individual events are rare, and in almost all events, a group reproduces.
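The algebra for the high-migration limit can be verified with exact rational arithmetic. The sketch below simply recomputes $r_0$ from $r_s$ using the relation $r_0 = \frac{1}{n} + \frac{n-1}{n} r_s$ quoted in the text; the function name is hypothetical.

```python
from fractions import Fraction

def high_migration_relatedness(n, m):
    """Relatednesses (r_s, r_0, r_1) in the limit where r approaches 1."""
    r_s = Fraction(-1, n * m - 1)          # only finite-population effects remain
    r_1 = r_s
    r_0 = Fraction(1, n) + Fraction(n - 1, n) * r_s
    return r_s, r_0, r_1
```

For any $n \geq 2$ and $m \geq 2$, the returned values satisfy $1 - r_0 = \frac{(n-1)m}{nm-1}$ and $r_0 - r_1 = \frac{m}{nm-1}$.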

Fixation probabilities of mutant groups
To compute fixation probabilities in both limits, it will be useful first to compute the fixation probability of a "mutant group", with l cooperators in it, in a population of groups, all of which contain k cooperators.
This is the process at the group selection time scale. It will only visit states in which there is one string of consecutive groups with $k$ cooperators, and one string of consecutive groups with $l$ cooperators, besides of course the absorbing states, where all groups have $k$ cooperators or all groups have $l$ cooperators. Therefore, the population state can be denoted by a single number $i$, $0 \le i \le m$, representing the number of groups with $l$ cooperators.

Birth-Death
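For the Birth-Death process, the only way in which $i$ can change is if one of the groups on the two boundaries between the strings of different group types is chosen for reproduction; one of the two $k$-groups on the boundary if $i$ is to go down, and one of the two $l$-groups on the boundary if $i$ is to go up. Moreover, it needs to replace the group on the other side of the boundary, and not its other neighbour or itself, which happens with probability $\frac{1}{2} \frac{m-1}{m}$. With group payoffs $g(k) = 1 + \frac{k}{n} b$, the probability of a transition from $i$ to $i - 1$ in the Birth-Death process then becomes $\frac{2 g(k)}{(m-i) g(k) + i g(l)} \cdot \frac{1}{2} \frac{m-1}{m}$, while the probability of a transition from $i$ to $i + 1$ is $\frac{2 g(l)}{(m-i) g(k) + i g(l)} \cdot \frac{1}{2} \frac{m-1}{m}$. That makes the down-up ratio for the Birth-Death process equal to $\frac{g(k)}{g(l)} = \frac{n + kb}{n + lb}$.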

Shift
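For the Shift process, change is much more likely. State $i$ changes any time the group that is chosen for reproduction is of a different type than the group that is chosen for death. With group payoffs $g(k) = 1 + \frac{k}{n} b$, the probability of a transition from $i$ to $i - 1$ is $\frac{(m-i) g(k)}{(m-i) g(k) + i g(l)} \cdot \frac{i}{m-1}$, while the probability of a transition from $i$ to $i + 1$ is $\frac{i g(l)}{(m-i) g(k) + i g(l)} \cdot \frac{m-i}{m-1}$. That makes the down-up ratio for the Shift process also equal to $\frac{g(k)}{g(l)} = \frac{n + kb}{n + lb}$.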

Birth-Death and Shift
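Since the down-up ratios are the same for both, the fixation probability of a group with $l$ cooperators, while all other groups have $k$ of them, is also the same for Birth-Death and Shift: $\tau_{k \to l} = \frac{1}{1 + \sum_{j=1}^{m-1} \gamma^j}$, with down-up ratio $\gamma = \frac{n + kb}{n + lb}$. This can be rewritten as $\tau_{k \to l} = \frac{1 - \gamma}{1 - \gamma^m}$. In the limit of $p \to 1$, we will use $\tau_{0 \to n}$ and $\tau_{n \to 0}$, and in the limit of $p \to 0$, we will use $\tau_{k \to k+1}$ and $\tau_{k \to k-1}$ for $1 \le k \le n - 1$.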
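The equality of the down-up ratios, and the resulting fixation probability of a mutant group, can be illustrated numerically. The sketch below is an assumption-laden reading of the verbal description: groups reproduce proportionally to $g(k) = 1 + \frac{k}{n} b$ (strong selection, $w = 1$), in BD the offspring replaces a given neighbour with probability $\frac{1}{2}\frac{m-1}{m}$, and in Shift the dying group is drawn uniformly from the other $m - 1$ groups. All function names are hypothetical.

```python
from fractions import Fraction

def group_payoff(k, n, b):
    return 1 + Fraction(k, n) * b

def transition_probs(i, m, k, l, n, b, process):
    """(down, up) probabilities when i of the m groups are l-groups, 0 < i < m."""
    gk, gl = group_payoff(k, n, b), group_payoff(l, n, b)
    total = (m - i) * gk + i * gl
    if process == "BD":
        # A boundary group reproduces and replaces the group across the boundary.
        place = Fraction(1, 2) * Fraction(m - 1, m)
        return 2 * gk / total * place, 2 * gl / total * place
    # Shift: any group of one type reproduces, any group of the other type dies.
    down = (m - i) * gk / total * Fraction(i, m - 1)
    up = i * gl / total * Fraction(m - i, m - 1)
    return down, up

def tau(k, l, m, n, b):
    """Fixation probability of one l-group in a population of k-groups."""
    gamma = group_payoff(k, n, b) / group_payoff(l, n, b)   # down-up ratio
    return 1 / sum(gamma ** j for j in range(m))
```

Writing the fixation probability as one over a geometric sum avoids a division by zero in the neutral case $\gamma = 1$, where it reduces to $\frac{1}{m}$.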

The limit p → 1
In this limit, group events are rare, while there are individual events almost all of the time. This implies that within her group, an initial mutant will either have gone extinct, or gone to fixation, before a group event happens. With the separation of timescales, we can first concentrate on the probabilities of fixation within the group happening. With $j$ being the number of cooperators in the group, a defector replaces a cooperator within the group, and a cooperator replaces a defector, with probabilities $\frac{n-j}{n-jc} \cdot \frac{j}{n}$ and $\frac{j(1-c)}{n-jc} \cdot \frac{n-j}{n}$, respectively. The down-up ratio therefore is $\frac{1}{1-c}$, making the fixation probabilities within the group $\sigma_C = \frac{1 - \frac{1}{1-c}}{1 - \left(\frac{1}{1-c}\right)^n}$ and $\sigma_D = \frac{1 - (1-c)}{1 - (1-c)^n}$. After fixation within the group, all groups will be homogeneous. As soon as the population consists of homogeneous groups, the population state can only change with a group event. Therefore, the fixation probability of a mutant will be the product of the fixation probability of the mutant within the group (which is the same for both processes, because this does not involve group events) and the fixation probability of the all-mutant group in the population (which is also the same for both processes, as we have seen above).
If we do consider the limit of weak selection, then we can approximate $\sigma_C$ with $\frac{1}{n}\left(1 - \frac{n-1}{2} c\right)$ for small $c$, and $\tau_{0 \to n}$ with $\frac{1}{m}\left(1 + \frac{m-1}{2} b\right)$ for small $b$. That implies that in the limit of weak selection, $\rho_C > \frac{1}{nm}$ if $\frac{b}{c} > \frac{n-1}{m-1}$. Condition [1] in Traulsen and Nowak (2006) is $\frac{b}{c} > 1 + \frac{n}{m}$.
This condition applies to their model without migration, and also with almost all events being individual replacements. In their model, the individual reproduction rate in all-cooperator groups (and therefore also their group reproduction rate) is $b - c$ higher than the individual reproduction rate in all-defector groups. In our model, the group reproduction rate is increased by $b$, which explains why there is a 1 on the right hand side in their inequality, which one can also write as $\frac{b-c}{c} > \frac{n}{m}$, and not in ours.
Similarly, we can approximate $\sigma_D$ with $\frac{1}{n}\left(1 + \frac{n-1}{2} c\right)$ for small $c$, and $\tau_{n \to 0}$ with $\frac{1}{m}\left(1 - \frac{m-1}{2} b\right)$ for small $b$. In the limit of weak selection, that gives the same threshold for when $\rho_D < \frac{1}{nm}$. Traulsen and Nowak (2006) also consider a version with migration, but because the migration rate is taken to be proportional to the probability that an individual reproduction event induces the group to split (which is vanishingly small), this implies that these events are also only happening occasionally, and if they do, the migrants almost always either take over their new group or go extinct there before anything else happens that is not an individual event.
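The weak-selection threshold $\frac{b}{c} > \frac{n-1}{m-1}$ for the limit $p \to 1$ can be checked against the exact product of the two fixation probabilities. The sketch below assumes the standard fixation probability for a process with a constant down-up ratio, with ratio $\frac{1}{1-c}$ within the group and $\frac{1}{1+b}$ between all-cooperator and all-defector groups; the function names are hypothetical.

```python
from fractions import Fraction

def fixation(gamma, size):
    """Fixation probability of a single mutant when the down-up ratio is gamma."""
    return 1 / sum(gamma ** j for j in range(size))

def rho_C(b, c, n, m):
    """Fixation probability of a cooperator when group events are rare."""
    sigma_C = fixation(1 / (1 - c), n)     # cooperator takes over its group
    tau_0n = fixation(1 / (1 + b), m)      # all-cooperator group takes over
    return sigma_C * tau_0n
```

For small $c$, setting $b$ slightly above $\frac{n-1}{m-1} c$ gives `rho_C` above $\frac{1}{nm}$, and setting it slightly below gives a value under $\frac{1}{nm}$.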

The limit p → 0
In this limit, individual events are rare, while there are group events almost all of the time. The population will therefore regularly be in a state in which all groups have the same composition. When each group has the same number of cooperators, the population state can only change as the result of an individual event. If it does, then that event can make the number of cooperators within one group go up by one, or go down by one. After that, a sequence of group events either makes the new "mutant group" go extinct, or go to fixation.
Fixation of the mutant group happens with probability $\tau_{k \to k+1}$ if the individual event had a cooperator replace a defector, and with probability $\tau_{k \to k-1}$ if a defector replaced a cooperator. With rare individual events, the population, on the larger time scale, therefore moves between states where all groups have the same number of cooperators, and this number goes up or down by at most 1. The probability that it goes down by one is the probability with which a defector reproduces, which is $\frac{n-k}{n-kc}$, times the probability that a cooperator dies, which is $\frac{k}{n}$, times the fixation probability of the "mutant group", which is $\tau_{k \to k-1}$. The probability that it goes up by one is the probability with which a cooperator reproduces, which is $\frac{k(1-c)}{n-kc}$, times the probability that a defector dies, which is $\frac{n-k}{n}$, times the fixation probability of the "mutant group", which is $\tau_{k \to k+1}$. The down-up ratio at the larger timescale therefore is $\frac{\tau_{k \to k-1}}{(1-c)\, \tau_{k \to k+1}}$. These ratios can be smaller or larger than 1, depending on the parameters. With these down-up ratios at the larger timescale, we find the overall fixation probabilities of cooperators and defectors. The effect of the different separations of time scales therefore is the same in the limit of weak selection, and in both of them, the cancellation effect dissipates; in the first, because there are almost no mixed groups, in the second because the population almost never consists of differently composed groups. Note also that this formula makes perfect sense for $n = 1$, in which case any positive group benefit will make cooperators do better than defectors, and for $m = 1$, in which case no benefit is ever high enough to offset positive costs of cooperation.
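The two-timescale argument for the limit $p \to 0$ can be turned into a small calculation. The sketch below is only one possible reading of it: it assumes that the group with the initial mutant first has to take over the population of all-defector groups (probability $\tau_{0 \to 1}$), after which the common number of cooperators per group performs a walk with down-up ratios $\frac{\tau_{k \to k-1}}{(1-c)\tau_{k \to k+1}}$. The function names are hypothetical.

```python
from fractions import Fraction

def tau(k, l, m, n, b):
    """Fixation probability of one l-group among k-groups (w = 1)."""
    gamma = (1 + Fraction(k, n) * b) / (1 + Fraction(l, n) * b)
    return 1 / sum(gamma ** j for j in range(m))

def rho_C_rare_individual_events(b, c, n, m):
    """Fixation probability of a single cooperator when individual events are rare."""
    # Down-up ratios on the slow timescale, for k = 1, ..., n - 1.
    ratios = [tau(k, k - 1, m, n, b) / ((1 - c) * tau(k, k + 1, m, n, b))
              for k in range(1, n)]
    total, product = 1, 1
    for d in ratios:
        product *= d
        total += product
    # First the one-cooperator group must spread to all groups, then the
    # slow walk must reach k = n before it reaches k = 0.
    return tau(0, 1, m, n, b) / total
```

Under these assumptions, the two boundary cases mentioned above come out as expected: for $n = 1$ any positive $b$ gives a fixation probability above $\frac{1}{m}$, and for $m = 1$ any positive $c$ gives one below $\frac{1}{n}$.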

Empirical implications
between them is $E\left[(X_1 - X_2)^2\right] = E\left[X_1^2 + X_2^2 - 2 X_1 X_2\right] = 2\left(E\left[X_1^2\right] - E^2[X_1]\right) = 2\,\mathrm{Var}(X_1)$. In our case, $X_1$ and $X_2$ represent drawing an individual from one and the same group, where drawing a cooperator makes the value of $X$ equal to 1, and drawing a defector makes it 0. If we draw two individuals without replacement, instead of with replacement, then they are no longer independent. We can however still compute the expected squared difference between the draws. This is

$$\frac{i}{n} \frac{i-1}{n-1} \cdot 0^2 + \frac{i}{n} \frac{n-i}{n-1} \cdot 1^2 + \frac{n-i}{n} \frac{i}{n-1} \cdot 1^2 + \frac{n-i}{n} \frac{n-i-1}{n-1} \cdot 0^2 = 2\, \frac{i(n-i)}{n(n-1)}.$$

Half of this is what is used instead of the within-group variance. For every state $k$, the "without-replacement" version of the $F_{ST}$ is therefore equivalent to the definition as a difference in conditional probabilities $r_{s,k} = P_{s,k}(C|C) - P_{s,k}(C|D)$. For large group sizes $n$, the difference between the with and without replacement version of the within-group variance disappears. There is yet another definition of $F_{ST}$, which expresses it in terms of within-group and between-group variance. We can take the following steps to get to that expression.

$$F_{ST} = \frac{\text{between-group variance}}{\text{average within-group variance} + \text{between-group variance}} \qquad (51)$$

Again, for large group sizes $n$, the with or without replacement versions are close.
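Both versions of the expected squared difference can be confirmed by brute-force enumeration over all ordered pairs of draws from a group with $i$ cooperators. This is a small illustrative check; `mean_sq_diff` is a hypothetical helper.

```python
from fractions import Fraction
from itertools import permutations, product

def mean_sq_diff(i, n, replacement):
    """Expected squared difference between two draws from a group with
    i cooperators (coded 1) and n - i defectors (coded 0)."""
    group = [1] * i + [0] * (n - i)
    if replacement:
        pairs = list(product(group, repeat=2))    # n * n ordered pairs
    else:
        pairs = list(permutations(group, 2))      # n * (n - 1) ordered pairs
    return Fraction(sum((x - y) ** 2 for x, y in pairs), len(pairs))
```

With replacement, the result equals twice the variance $\frac{i}{n}\left(1 - \frac{i}{n}\right)$; without replacement, it equals $2\frac{i(n-i)}{n(n-1)}$.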

Empirical estimates
The condition for cooperation to be selected for by group selection used in, for instance, Bell et al. (2009) (see also Weir and Cockerham 1984; Crow and Aoki 1984; Aoki and Nozawa 1984; Bowles 2006; 2009; Langergraber et al. 2011; Walker 2014; Rusch 2018) is Here $\beta(w_g, p_g)$ is the increase in the mean fitness of the group as a result of an increase in the frequency of cooperators, or altruists, and $\beta(w_{ig}, p_{ig})$ is the decrease in fitness of an individual as a result of switching from defection to cooperation. The idea is that this criterion separates the fitness effects, on the left hand side of the inequality, from a measure that characterizes the population structure, on the right hand side of the inequality. In a setting with a linear public goods game, played within groups that compete with each other in a well mixed population of groups, such a separation can indeed be made in this way.
Suppose the fitness of a cooperator and a defector in a group with $i$ cooperators (for cooperators: including the individual itself) are $w_{C,i} = 1 + \frac{i-1}{n-1} b - c$ and $w_{D,i} = 1 + \frac{i}{n-1} b$. Being a cooperator instead of a defector would then give $n - 1$ others in the group a fitness benefit of $\frac{1}{n-1} b$, adding up to an aggregate fitness benefit of $b$, at a fitness cost to the individual of $c$. The average fitness within the group would go from $\frac{i\, w_{C,i} + (n-i)\, w_{D,i}}{n} = 1 + \frac{i}{n}(b - c)$ to $\frac{(i+1)\, w_{C,i+1} + (n-i-1)\, w_{D,i+1}}{n} = 1 + \frac{i+1}{n}(b - c)$, which amounts to an increase of $\frac{1}{n}(b - c)$ as a result of an increase in the frequency of cooperators within that group of $\frac{1}{n}$. This makes $\beta(w_g, p_g) = b - c$. The fitness effect measured by $\beta(w_{ig}, p_{ig})$ is similarly interpreted as $c$. Rewriting

If we were to measure $\beta(w_g, p_g)$ in a setting in which competition between groups is not actually global, but to some degree local, then the resulting value for $\beta(w_g, p_g)$ would not only reflect the effect of cooperators on the average fitness within the group, but a mixture of these fitness effects and the cancellation effect. A moderate value for $\beta(w_g, p_g)$ can both be the result of a moderate group benefit and the absence of the cancellation effect, and a high group benefit combined with the cancellation effect at the group level. In the latter case, the negative effect of having neighbouring groups with many cooperators, combined with the positive correlation between being a cooperative group and having neighbouring groups with many cooperators, would bias the estimated effect of, all else equal, the number of cooperators on average fitness within the group downwards.
In other words, this term would end up absorbing the cancellation effect. In order to disentangle all fitness effects and the cancellation effect, one would have to estimate a more complex statistical model, which would not only use the composition of the own group as explanatory variable of the average fitness within the group, but also include the composition of neighbouring groups as an explanatory variable. This would be hard to estimate because it will require sufficiently high independent variation to overcome multicollinearity, but, if successful, it would separate the positive effect of the cooperation within the group from the negative effect of having a cooperative neighbouring group.
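The step from the individual fitness functions of the linear public goods game to $\beta(w_g, p_g) = b - c$ can be verified directly. The sketch below uses the fitness functions $w_{C,i}$ and $w_{D,i}$ as given in the text; the function names are hypothetical.

```python
from fractions import Fraction

def w_C(i, n, b, c):
    """Fitness of a cooperator in a group with i cooperators (itself included)."""
    return 1 + Fraction(i - 1, n - 1) * b - c

def w_D(i, n, b, c):
    """Fitness of a defector in a group with i cooperators."""
    return 1 + Fraction(i, n - 1) * b

def mean_fitness(i, n, b, c):
    """Average fitness in a group of size n with i cooperators."""
    return (i * w_C(i, n, b, c) + (n - i) * w_D(i, n, b, c)) / n
```

For every $i$, the mean fitness equals $1 + \frac{i}{n}(b - c)$, so adding one cooperator, which raises the cooperator frequency by $\frac{1}{n}$, raises the mean fitness by $\frac{1}{n}(b - c)$.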
What most empirical papers do, however, is only estimate the $F_{ST}$, which is then taken as an indication of how conducive the population structure is to cooperation. We have seen that the absence or presence of the cancellation effect, which is part of the population structure, can however make a huge difference for how much the group needs to benefit from cooperators in it, relative to the individual costs, in order for cooperation to spread in the population.