Extending the range of additivity in using inclusive fitness

Abstract Inclusive fitness is a concept widely utilized by social biologists as the quantity organisms appear designed to maximize. However, inclusive fitness theory has long been criticized on the (uncontested) grounds that other quantities, such as offspring number, predict gene frequency changes accurately in a wider range of mathematical models. Here, we articulate a set of modeling assumptions that extend the range of scenarios in which inclusive fitness can be applied. We reanalyze recent formal analyses that searched for, but did not find, inclusive fitness maximization. We show (a) that previous models have not used Hamilton's definition of inclusive fitness, (b) a reinterpretation of Hamilton's definition that makes it usable in this context, and (c) that under the assumption of probabilistic mixing of phenotypes, inclusive fitness is indeed maximized in these models. We also show how to understand mathematically, and at an individual level, the definition of inclusive fitness, in an explicit population genetic model in which exact additivity is not assumed. We hope that in articulating these modeling assumptions and providing formal support for inclusive fitness maximization, we help bridge the gap between empiricists and theoreticians, which in some ways has been widening, demonstrating to mathematicians why biologists are content to use inclusive fitness, and offering one way to utilize inclusive fitness in general models of social behavior.


| INTRODUCTION
Inclusive fitness is an individual-level quantity identified by Hamilton (1964), which he showed, under some assumptions, to increase due to the action of natural selection. Hamilton pointed out that adult offspring number is affected not just by the actions of an individual but by those of the individuals it interacts with. He observed that measuring those effects involves averaging over possible distributions of genotypes, which in turn involves knowing gene frequencies in the population, neither of which is a simple or readily available calculation (Hamilton, 1964). Accordingly, he turned to an alternative metric, "inclusive fitness," which involves taking the perspective of the focal individual and its effects on others (as opposed to others' effects on it). Hamilton (1964) provided a verbal definition for inclusive fitness as follows: the sum of an individual's adult number of offspring after it has been "stripped of all components which can be considered as due to the individual's social environment," and a weighted sum of the "quantities of harm and benefit which the individual himself causes" to the offspring numbers of others. The weightings are degrees of relatedness. Relatedness is a measure of genetic similarity between two individuals (r = 1 for identical twins, r = 0 for a random population member, including the possibility of self in finite populations). The exact definitions of the fitness effects and of relatedness differ in different formal treatments. For nearly 40 years, at least within behavioral and evolutionary ecology, most field and laboratory workers have treated inclusive fitness as the quantity that organisms appear designed to maximize, and tailored their studies and experiments accordingly (summarized in, e.g., Davies et al., 2012; Westneat & Fox, 2010). However, Hamilton's verbal definition lacks mathematical precision, and in Section 4.2 below we provide such precision for a particular model.
This is important for reconciling relatively informal inclusive fitness arguments with full population genetic models.
Further, since at least 1978 (Cavalli-Sforza & Feldman, 1978), the concept of inclusive fitness has been controversial, criticized most notably for assuming additivity of fitness effects. The type of additivity we discuss here refers to how the effects of different social actions combine to affect one individual's offspring number. The well-known challenge for inclusive fitness is that under nonadditivity, changes in gene frequency are no longer wholly attributable to a focal genotype. Since at least 1979 authors have pointed out that, in such scenarios, mean offspring number does a better job at predicting gene frequency change (Grafen, 1979). Unfortunately for biologists, mean offspring number is not a useful maximand in practice, either in terms of empirical applicability or explanatory power (discussed in detail by Levin & Grafen, 2019). Therefore, the problem of nonadditivity remains a relevant challenge for empirical biology.
A potential solution to the problem of nonadditivity is weak selection. Weak selection can arise either because the contributions to fitness of a mutation are relatively small ("w-weak selection") or because the mutant is not far from the wild-type in phenotype space (on average, "δ-weak selection"). There is a wide consensus that under weak selection organisms at equilibrium act as if maximizing their inclusive fitness (Gardner et al., 2011; Grafen, 2006; Lehmann et al., 2015, 2016; Lehmann & Rousset, 2014; Okasha & Martens, 2016b; Taylor, 2017). While this goes some way toward satisfying biologists hoping to use inclusive fitness, two significant challenges remain. First, the difficulty of capturing Hamilton's inclusive fitness in models has forced many mathematical biologists to use replacements for the correct inclusive fitness, such as "simple-weighted sum" inclusive fitness, or neighbor-modulated fitness, in their tests for maximization (Lehmann et al., 2015; Okasha & Martens, 2016b). This makes the results for inclusive fitness maximization, positive or negative, difficult to interpret biologically. Second, it is not immediately clear how widely we expect weak selection to hold. If the conditions needed for inclusive fitness maximization are rare in practice, this would be unfortunate news for empiricists, as no practically useful alternative maximand has been offered (Levin & Grafen, 2019).
Our aim here, then, is to resolve both problems, providing formal support for extending the range of inclusive fitness's applicability.
We do this through two steps. First, we invoke an assumption that we expect to be reasonable and usually close to holding across a wide range of biological scenarios, and which recovers a form of weak selection. This is the assumption that the strategy set (set of possible phenotypes) contains all probabilistic mixtures of all pairs of strategies. Second, we illustrate a new method for capturing Hamilton's (1964) verbal definition of inclusive fitness in a mathematical model testing for fitness maximization, which is to replace Hamilton's comparison with the nonsocial situation with a comparison with the resident phenotype, thus taking an ESS-like approach. Using these two steps, we show that inclusive fitness is indeed maximized under probabilistic mixing in two particular models. This provides formal support for biologically meaningful extension of the range of applicability of inclusive fitness, and a mathematical method for utilizing Hamilton's inclusive fitness in maximization modeling.
We proceed as follows. First, we illustrate the two steps in greater detail, providing verbal arguments for the importance of probabilistic mixing and correctly capturing inclusive fitness. Second, we turn to two recent models by Okasha and Martens (2016b) and Lehmann et al. (2015), which developed sophisticated techniques for studying inclusive fitness maximization. These two models are of particular interest because they study fitness maximization at the individual level in an encouraging way, and yet do not find inclusive fitness maximization where biologists would hope they might. We reanalyze these models, showing that our suggested new assumption of probabilistic mixtures and suggested new expression for inclusive fitness recover inclusive fitness maximization in both settings. Finally, we discuss the relevance of these findings to recent work on fitness maximization more generally, further implications of our analysis for calculating inclusive fitness, and how empirical biologists might utilize the results.

| PROBABILISTIC MIXING AND INCLUSIVE FITNESS
Here, we verbally articulate two steps which recover inclusive fitness maximization in a wide range of scenarios. These points are illustrated mathematically in the subsequent sections, in which we instantiate the points in specific models.

| Probabilistic mixing and weak selection
Our first point is that when the strategy set (set of possible phenotypes) contains all probabilistic mixtures of all pairs of strategies, weak selection arises near an equilibrium, and therefore, inclusive fitness should be maximized. To understand this point, it is useful to make a distinction between phenotypic and genetic additivity, in the following senses. Phenotypic additivity is determined by the game matrix that would be constructed by biologists studying interactions, which they interpret as a game, who can observe the actions performed in the game and the payoffs from the game but not the genotypes. Genetic additivity among a set of genotypes is determined by whether the fitnesses of individuals can be written as an additive function of the genotypes of the interactants. Weak selection arises when an analysis restricts itself to a genetically additive subset of genotypes.
We argue that under probabilistic mixing, phenotypic nonadditivity is compatible with genetic additivity. This arises because, while deviant behaviors can have large effects, they are expressed rarely.
The verbal argument has been made elsewhere (Queller, 1996; Grafen, 2019 and Grafen, 1979, final paragraph on p. 906), but we repeat it here for clarity. For example, consider the case where strategies are not discrete but continuous, where a player can choose to cooperate on a fraction $q$ of occasions. Now, a variant strategy plays Cooperate on a fraction $q + \delta$ of occasions, where $\delta \ll 1$. In other words, it plays Cooperate instead of Defect on one occasion out of many, and the probability that this is the same occasion on which its related partner also plays Cooperate instead of Defect is very low (Grafen, 1979).
Thus, under biologically relevant scenarios, phenotypic nonadditivity is compatible with genetic additivity (although a formal treatment has thus far been lacking).
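The verbal argument can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the cooperation game of Appendix 1 (benefit b, cost c, synergy d) with a baseline payoff of zero, and the function names and parameter values are hypothetical. It splits the payoff change caused by two partners both deviating by a small amount delta into an additive part and a leftover interaction part. The additive part scales with delta, whereas the interaction part scales with delta squared, so the nonadditive contribution becomes negligible for small deviations.

```python
def mixed_payoff(p, q, b, c, d):
    """Expected payoff to a player who cooperates with probability p
    against a partner who cooperates with probability q.
    Assumes payoffs V(C,C)=b-c+d, V(C,D)=-c, V(D,C)=b, V(D,D)=0."""
    return -p * c + q * b + p * q * d


def deviation_effects(q, delta, b, c, d):
    """Return (additive, interaction) parts of the focal player's payoff
    change when both partners shift from q to q + delta."""
    base = mixed_payoff(q, q, b, c, d)
    own = mixed_payoff(q + delta, q, b, c, d) - base      # only the focal player deviates
    partner = mixed_payoff(q, q + delta, b, c, d) - base  # only the partner deviates
    both = mixed_payoff(q + delta, q + delta, b, c, d) - base
    interaction = both - (own + partner)                  # the part additivity misses
    return own + partner, interaction


if __name__ == "__main__":
    b, c, d, q = 3.0, 1.0, 2.0, 0.4  # hypothetical parameter values
    for delta in (0.1, 0.01, 0.001):
        additive, interaction = deviation_effects(q, delta, b, c, d)
        print(delta, additive, interaction, interaction / additive)
```

Running the loop shows the ratio of the interaction part to the additive part shrinking in proportion to delta, which is the sense in which phenotypic nonadditivity (d not equal to zero) can coexist with approximate genetic additivity.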

| Measuring inclusive fitness
We also make two significant points about the definition of inclusive fitness. One is simply to emphasize the distinction between inclusive fitness and neighbor-modulated fitness. Inclusive fitness requires a careful isolation of the effects of the actor, whereas neighbor-modulated fitness is simply a measure of mean offspring number. These differences are important in practice for biologists, and the two quantities cannot simply be used interchangeably (Levin & Grafen, 2019). The other concerns the comparison with the nonsocial situation in Hamilton's verbal definition. We suggest that the verbal definition is more useful, and more in keeping with Hamilton's mathematical definition, if the comparison is instead made with the incumbent behavior (in an ESS-like analysis that tests for rare mutants against an incumbent) or, more generally, is made with the average behavior (which will therefore vary with gene frequencies). This suggestion makes the verbal definition easier to apply and, as we shall see, does support inclusive fitness maximization.
We now turn to two recent models (Lehmann et al., 2015; Okasha & Martens, 2016b), in an attempt to formalize these points and provide support for the use of inclusive fitness as a biological maximand. These two papers continue the encouraging trend toward an explicit mathematical treatment of inclusive fitness maximization. Although they fail to find it, and instead show that mean offspring number (in the guise of neighbor-modulated fitness, the calculation Hamilton termed "unwieldy") is maximized, we are able to utilize their mathematical advances to bolster the use of inclusive fitness by biologists.
Okasha and Martens (2016b) analyze a version of the Hawk-Dove game played between relatives (they focus on the simpler cooperation game, but we keep the discussion general here as the conclusions hold for both). Their goal was to look with mathematical precision at the question of whether inclusive fitness appears to be maximized by individuals at equilibrium. Our first point is that neither of the two fitness functions they define corresponds to Hamilton's inclusive fitness, and we show below what the third function, which does, looks like. Our second point is that, when we allow all probabilistic mixtures of Okasha and Martens' strategies also to be strategies, this third function is indeed maximized.

| Do they consider inclusive fitness?
Okasha and Martens' (2016b) first utility function, which they refer to as inclusive fitness, is
$$V(i,j) + r\,V(j,i), \tag{1}$$
where $r$ is relatedness, and $V(i,j)$ is an individual's payoff when playing strategy $i$ against a partner who plays $j$. It is immediately apparent that this is not inclusive fitness, but something more akin to simple-weighted sum fitness (Grafen, 1982). It measures the actor's whole payoff plus $r$ times its partner's whole payoff and, therefore, does not partition offspring number by causation. They find that this utility function is not maximized, which is not surprising, as it is not inclusive fitness.
Their second utility function, which they call "Grafen 1979," is expressed as follows:
$$r\,V(i,i) + (1-r)\,V(i,j). \tag{2}$$
This payoff function, identified by Grafen (1979), is simply mean number of offspring, and, as expected, Okasha and Martens (2016b) find that the strategy with the highest value increases in frequency, and establish links between evolutionary dynamics and as-if maximization. Clearly, neither of these utility functions is inclusive fitness as defined by Hamilton (1964).

| What is the correct expression for inclusive fitness?
In order to ask whether inclusive fitness is maximized, we must write a third utility function, which sums the effect on personal payoff of expressing the strategy and the relatedness-weighted difference to the partner's payoff as a result of the actor expressing the strategy, according to Hamilton's (1964) definition. To do this, we write $k$ as a default, "nonsocial" strategy and, therefore, can express inclusive fitness of an individual playing $i$ against a partner playing $j$ in a population playing $k$ (the "nonsocial strategy"), as the sum of the nonsocial payoff against itself, the deviation from the play of $i$ rather than $k$ against $j$, and relatedness times the effect on the partner of the play of $i$ rather than $k$:
$$V_{IF}(i,j;k) = V(k,k) + \left[V(i,j) - V(k,j)\right] + r\left[V(j,i) - V(j,k)\right]. \tag{3}$$
This formula would be useful if required for connecting to gene frequencies using the Price equation at any frequency (Grafen, 2006), but with the probabilistic mixing assumption for invasion of an incumbent, we regard the partner as also playing the incumbent strategy, so $j = k$ and we obtain
$$V_{IF}(i;k) = V(i,k) + r\left[V(k,i) - V(k,k)\right]. \tag{4}$$

| Is inclusive fitness maximized under probabilistic mixing?
The problem of nonadditivity remains. Consider the simple two-player cooperation game with discrete strategies, analyzed above, where each player can choose to play either Cooperate or Defect.
Relatedness, r, is the measure of genetic similarity between players discussed above. In a simple two-player game like this, r also measures assortation between strategies. If we imagine a mutant in the population that played Cooperate instead of Defect, increasing r increases the likelihood that its partner's strategy will also be Cooperate, and inclusive fitness fails to take this alteration in the partner's behavior into account. When fitness effects depend on the partner's genotype, as in the case of nonadditivity, this oversight matters.
However, when we assume probabilistic mixing, for reasons outlined above and elsewhere (Grafen, 1979; Levin & Grafen, 2019; Queller, 1996), we can recover inclusive fitness maximization. Grafen (1979, p. 907) has already shown that, when we allow for probabilistic mixing of strategies, inclusive fitness correctly predicts the direction of gene frequency change in the simple game above, and this resolves the problem identified by Okasha and Martens (2016b). In Appendix 1, we provide a proof for this simple cooperation game, recovering the links between as-if inclusive fitness maximization and gene frequency change.
In summary, Okasha and Martens' (2016b) "inclusive fitness" function is not inclusive fitness. The natural expression for inclusive fitness arising from Hamilton's (1964) definition and our suggested amendments is our Equation (4). Under probabilistic mixing, this correct inclusive fitness is indeed maximized at equilibrium by each individual, regarding the incumbent strategy as fixed.
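The difference between the utility functions discussed in this section can be made concrete with a small sketch. It is an illustration only: the payoff table assumes the cooperation game of Appendix 1 with a baseline payoff of zero, and all function names and parameter values are hypothetical. The code encodes the simple-weighted sum (actor's whole payoff plus r times partner's whole payoff) and the three-part inclusive fitness built from a baseline, the actor's effect on itself, and r times the actor's effect on its partner, both for an arbitrary partner and for a partner playing the incumbent.

```python
def payoff_table(b, c, d):
    """Assumed payoffs V[(own, partner)] for the cooperation game."""
    return {("C", "C"): b - c + d, ("C", "D"): -c,
            ("D", "C"): b, ("D", "D"): 0.0}


def simple_weighted_sum(V, i, j, r):
    """Actor's whole payoff plus r times the partner's whole payoff."""
    return V[(i, j)] + r * V[(j, i)]


def inclusive_fitness(V, i, j, k, r):
    """Baseline V(k,k), plus the actor's own gain from playing i rather
    than k against j, plus r times the effect of that switch on the partner."""
    baseline = V[(k, k)]
    own_effect = V[(i, j)] - V[(k, j)]
    partner_effect = V[(j, i)] - V[(j, k)]
    return baseline + own_effect + r * partner_effect


def inclusive_fitness_vs_incumbent(V, i, k, r):
    """Special case in which the partner plays the incumbent k."""
    return V[(i, k)] + r * (V[(k, i)] - V[(k, k)])


if __name__ == "__main__":
    V = payoff_table(b=3.0, c=1.0, d=2.0)  # hypothetical values
    r = 0.5
    # Gain from playing C rather than D in a population of defectors: r*b - c
    print(inclusive_fitness_vs_incumbent(V, "C", "D", r)
          - inclusive_fitness_vs_incumbent(V, "D", "D", r))
```

With defectors as the incumbent, the inclusive fitness gain from cooperating reduces to rb − c, the familiar additive result; with cooperators as the incumbent, the synergy d enters through both the actor's and the partner's payoffs.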
principle, a fourth function that we exhibit below. Our second point is that, once we allow probabilistic mixtures of Lehmann et al.'s strategies also to be strategies, this fourth function is indeed maximized at equilibrium. Some of Lehmann et al.'s arguments apply to general strategy sets, and these already include probabilistic mixtures.
When we come to the parts that focus on simple real number strategies, we will need to extend the domain of fitness and other functions accordingly.
First, Lehmann et al. (2015) identify the utility function $u_A$, which they refer to as inclusive fitness. It is immediately apparent that $u_A$ is not inclusive fitness, but instead a version of "simple-weighted sum" fitness (Grafen, 1982).
It measures an individual's personal offspring number plus a weighted sum of the offspring of all its social interactants and therefore fails to isolate the actor's effects, as Hamilton (1964) intended. Lehmann et al. (2015) then turn to a second utility function, $u_B$, which they refer to as "average personal fitness," where $P_k$ is the subset of hypothetical neighbor strategy profiles such that $k - 1$ neighbors have a strategy identical to the focal individual, and $q_k$ is the probability of that profile (Lehmann et al., 2015). than $u_A$ (Lehmann et al., 2015). This is not surprising, as we expect this to hold for mean offspring number, and it parallels Okasha and Martens' finding about the "Grafen 1979" function. However, none of these functions is inclusive fitness as Hamilton (1964) outlined, and therefore, their analysis cannot satisfactorily interrogate inclusive fitness maximization.

| What is the correct expression for individual-level inclusive fitness?
Instead, we require a fourth function, which we will call $u_{IF}$. In line with Hamilton (1964), to obtain $u_{IF}$, we must sum three components: baseline asocial fitness, the difference to personal fitness as a result of the strategy, and the relatedness-weighted difference to social partners' fitnesses as a result of the strategy. We define the inclusive fitness of a player with the focal strategy, in a group with an arbitrary distribution of other strategies, but in a population in which almost all individuals play an incumbent strategy $x$. This follows the individual-level philosophy as outlined by Lehmann et al. (2016). We will go on to convert that expression to investigate the invader-incumbent case, following Lehmann et al. (2015). Recalling our principle, from the previous example, of adopting the incumbent as the nonsocial strategy for inclusive fitness purposes, inclusive fitness is made up of the following parts:
• Baseline asocial fitness in the population as a whole, that is, the average for an $x$-player, so
$$w(x, x^{N-1}, 1_x),$$
where $x^{N-1}$ indicates that all other group members play $x$,
• The difference to own personal fitness as a result of being a $y$-strategist rather than an $x$-strategist, in which others play an arbitrary $(N-1)$-tuple of strategies $x_{-i}$:
$$w(y, x_{-i}, 1_x) - w(x, x_{-i}, 1_x),$$
• The difference to others' personal fitnesses as a result of the focal individual being a $y$-strategist rather than an $x$-strategist, weighted by relatedness:
$$r(y,x) \sum_{j \neq i} \left[\hat{w}(x_j, (x_{-i}, y), 1_x) - \hat{w}(x_j, (x_{-i}, x), 1_x)\right],$$
where $\hat{w}$ differs from $w$ in that the second argument of $\hat{w}$ describes the strategies of the whole group, and not of the group apart from $i$. Formally, $w(x, z, 1_x) = \hat{w}(x, (z, x), 1_x)$, and we regard $\hat{w}$ as being undefined if the first argument is not also an element in the group strategies. $r(y, x)$ is relatedness from the perspective of a $y$ player in a population of resident $x$ players. The $x_j$ for $j \neq i$ are the elements of $x_{-i}$.
Putting all this together, we can write the inclusive fitness of an individual playing $y$, in a group $x_{-i}$, with population incumbent $x$ as follows:
$$u_{IF}(y, x_{-i}, x) = w(x, x^{N-1}, 1_x) + \left[w(y, x_{-i}, 1_x) - w(x, x_{-i}, 1_x)\right] + r(y,x) \sum_{j \neq i} \left[\hat{w}(x_j, (x_{-i}, y), 1_x) - \hat{w}(x_j, (x_{-i}, x), 1_x)\right]. \tag{7}$$
If using this expression to understand gene frequencies in general, we would average this expression over the distribution of $x_{-i}$ that the population structure implies. If instead we are testing for invasion of a population playing $x$ by a rare mutant playing $y$, all individuals would be playing $x$ or $y$, and this would allow us to write $u_{IF}(y, (y^{k-1}, x^{N-k}), x)$ for a group with $k$ mutants altogether, and average over the different values of $k$ with their probabilities $q_k(y, x)$ in Lehmann et al.'s notation. Going further, under the probabilistic mixing assumption, as already discussed in relation to the Okasha and Martens model, we would evaluate inclusive fitness substituting $x^{N-1}$ for $x_{-i}$, that is, we would assume that the neighbors were all playing the incumbent strategy whether they were genetically mutant or genetically incumbent. This simplifies Equation (7) and allows us to define inclusive fitness under probabilistic mixing for invasion-incumbent purposes as
$$u_{IF}(y, x) = w(y, x^{N-1}, 1_x) + (N-1)\, r(y,x) \left[\hat{w}(x, (x^{N-1}, y), 1_x) - w(x, x^{N-1}, 1_x)\right]. \tag{8}$$
This simple form is a way of applying additive ideas to phenotypically nonadditive situations and recovers much of the simplicity of the additive case.
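The relationship between the general three-part sum of Equation (7) and its simplified form under probabilistic mixing can be sketched in code. This is a minimal illustration, not the model of Lehmann et al. (2015): the fitness function `personal_fitness` is hypothetical (a linear cost and benefit plus a synergy term), relatedness is treated as a fixed number r rather than a function r(y, x), and the group is assumed symmetric so that all neighbors are interchangeable.

```python
def personal_fitness(own, others, b=2.0, c=1.0, d=0.5):
    """Hypothetical fitness of an individual investing `own` when its
    neighbours invest `others` (a list); d adds a nonadditive synergy."""
    mean_others = sum(others) / len(others)
    return 1.0 - c * own + b * mean_others + d * own * mean_others


def u_if_general(y, neighbours, x, r):
    """Three-part sum: baseline fitness of an incumbent among incumbents,
    plus the focal individual's own gain from playing y rather than x,
    plus r times the summed effect of that switch on each neighbour."""
    n_minus_1 = len(neighbours)
    baseline = personal_fitness(x, [x] * n_minus_1)
    own_diff = personal_fitness(y, neighbours) - personal_fitness(x, neighbours)
    others_diff = 0.0
    for j, x_j in enumerate(neighbours):
        rest = neighbours[:j] + neighbours[j + 1:]  # group members other than j and the focal
        others_diff += (personal_fitness(x_j, rest + [y])
                        - personal_fitness(x_j, rest + [x]))
    return baseline + own_diff + r * others_diff


def u_if_mixing(y, x, n, r):
    """Simplified form when all n - 1 neighbours play the incumbent x."""
    incumbent = personal_fitness(x, [x] * (n - 1))
    neighbour_with_y = personal_fitness(x, [x] * (n - 2) + [y])
    return (personal_fitness(y, [x] * (n - 1))
            + (n - 1) * r * (neighbour_with_y - incumbent))


if __name__ == "__main__":
    print(u_if_general(0.6, [0.3, 0.3, 0.3], 0.3, 0.25))
    print(u_if_mixing(0.6, 0.3, 4, 0.25))
```

When every neighbor plays the incumbent, the two functions agree, and a focal individual that itself plays the incumbent simply receives the incumbent's personal fitness.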

| Is inclusive fitness maximized under probabilistic mixing?
In Section 2.3, we offered a verbal argument for the biological importance of probabilistic mixing. We proceed to ask whether evolutionary uninvadability coincides with utility maximization, by checking whether the first and second order conditions for uninvadability and utility maximization are the same.
Following Lehmann et al. (2015, equation (3), which applies to arbitrary strategy sets), we write the lineage fitness of the mutant as:
$$W(y, x) = \sum_{k=1}^{N} w(y, (y^{k-1}, x^{N-k}), 1_x)\, q_k(y, x),$$
where $W$ is the lineage fitness of the mutant, $w$ is the personal fitness of the mutant expressing $y$ in a patch with $k - 1$ other mutants and $N - k$ residents displaying $x$, in a population otherwise monomorphic for $x$. $q_k$ is the probability that the neighbor profile of the focal mutant will consist of $k - 1$ other mutants. For $x$ to be uninvadable, it must be that $x \in \arg\max_y W(y, x)$, that is, $x$ must be the best invader against itself and so must achieve a local maximum of $W(y, x)$. In the appendix, we find the first order condition for uninvadability under our probabilistic mixing condition (based on the first partial derivative of $W$) to equal the first order condition for utility maximization (based on the first partial derivative of $u_{IF}$), and the same for the second order conditions. Therefore, as a result of gene frequency dynamics, at equilibrium, organisms appear as if trying to maximize inclusive fitness. Due to the wide latitude afforded by the approach, this result holds some generality for inclusive fitness maximization.
In summary, Lehmann et al. (2015) did not analyze inclusive fitness as defined by Hamilton (1964). We derive the natural expression above in Equation (8). We show in the appendix that under probabilistic mixing, the correct inclusive fitness is indeed maximized.
We expect our result to hold for other recent analyses which have identified mean offspring number as a successful maximand (e.g., Allen & Nowak, 2015), if we adopt our newly articulated modeling approach of regarding the incumbent as Hamilton's "nonsocial" case, and of allowing all probabilistic mixtures of elements in the original strategy set. An interesting future step would be to try to extend our result to more general population structures. Our articulation of these additional conditions will, we hope, help mathematicians and biologists understand each other better in future.

| DISCUSSION
Inclusive fitness has formed the bedrock of a vast body of empirical work (... Tables 1 and 2). However, it has long been criticized for its assumptions, most notably additivity of fitness effects, and its failure in such scenarios to predict gene frequency change as well as mean offspring number (sometimes referred to as "neighbor-modulated fitness"). Recent papers have apparently lent support to such claims (though this may not have been their goal) with general mathematical models (Lehmann et al., 2015; Okasha & Martens, 2016b). However, we have shown that such models fail to correctly capture inclusive fitness, and that when the correct expression is used, under the assumption of probabilistic mixtures of phenotypes, inclusive fitness maximization is recovered.

| Inclusive fitness maximization
The precise mathematical definition of inclusive fitness depends on the specific settings. However, in defining it precisely in specific cases, here we have aimed to help mathematical biologists find the precise definition in their own setting. Rousset (2004, pages 194-195) has a useful discussion of how the idea of fitness maximization can be understood mathematically, concluding that it should be understood in an ESS-like way, considering the success of a rare mutant against an incumbent. This is in line with the approach advocated by Dawkins (1976, 1980). This implies that the fitness function must depend not only on the individual's strategy, but also on the incumbent strategy. The stable incumbent is one that is the best-spreading mutant against itself, and the calculation of best-spreading may rely on reproductive values in structured populations. It is useful if the definition of inclusive fitness also connects to gene frequency change at nonrare frequencies, as in Grafen (2006).
There are a number of papers that have considered inclusive fitness maximization whose work we have not addressed explicitly here, but which readers may be interested in referring to for broader considerations of biological maximization. The first paper to do this was Grafen (2006), but his highly technical conditions, although perhaps required in a model without dynamic sufficiency, have not so far met with approval or been further developed. Second, Lehmann and Rousset (2014) showed in a simple model that inclusive fitness was maximized under additivity of phenotypic effects on offspring number but not otherwise.
Thus, we have focused on Lehmann et al. (2015) and Okasha and Martens (2016b), as these are the only papers we know of to explicitly analyze inclusive fitness maximization at the individual level. We note that in the case of Lehmann et al., the failure to find inclusive fitness maximization was not their main conclusion. Thus, our aim is not to say that these analyses are wrong or not useful; quite the opposite. Instead, we simply note that both papers appear to offer disappointing conclusions for users of inclusive fitness, and hitchhike on their very useful technical developments to offer further useful biological results.
In doing so, we are following a recent resolution offered by Birch (2017a,b), who argues that the critics (e.g., Allen & Nowak, 2016; Nowak et al., 2010; van Veelen et al., 2017) of biology often seems to prove difficult. In extending these models and formalizing our verbal arguments, we hope to make it easier for future modelers to make links to the general and verbally expressed conceptual theory when they build precise mathematical population genetic models.

| Probabilistic mixing
The biological significance of the "probabilistic mixtures" assumptions is important to understand. Some of what follows is at the moment our own intuition, and obtaining mathematical proofs of precise versions would be extremely useful. First, uncontroversially, it will usually be conceivable that the assumption is true in any particular example and cannot be ruled out. Second, we conjecture that the possible deviations from the assumption will not tilt the biology in any particular direction, and thus, we can consider the equilibrium under the probabilistic mixtures assumption as a central case. The fact that this central case applies without knowledge of the genetics across such a wide range of possibilities is very important in regarding social biology as possible without detailed genetic knowledge.
The probabilistic mixing approach also provides a particular answer to a little-discussed extra problem raised by nonlinearity. When we ask for the effect of an actor on recipients, should we ask for that effect on the basis that the recipients are (a) incumbents, (b) mutants, or (c) some probabilistic mixture depending on population structure and relatedness in particular? Under linearity, these all give the same answer. Eshel (2018), for example, assumes that the recipients are mutants, presumably on the grounds that it is the mutant recipients that will further spread the mutant allele, but his model and our rationale for it depend on haploidy.
However, one consequence of probabilistic mixing is that inclusive fitness should be calculated on the assumption that the recipients are incumbents, and we regard this as biologically appropriate.
If mutations really were unconditional, then some mixture would be preferable. But most behavior is conditional, and chance events lead individuals into expressions of different parts of a complex phenotype, so we should not expect to see a correlation of behaviors between related interactants. This chimes with our aim in the Okasha and Martens (2016b) example to separate the two effects of relatedness, retaining the part that an actor "cares about" the recipient's offspring number, but ignoring the possibility that the recipient's behavior will tend to be more like the actor's than the population average. This suppression of the second effect provides a unique inclusive fitness, while the alternative is to have a complicated expression that depends on genetic details such as ploidy, penetrance, and dominance, as well as how often the genetic potential for deviant behavior is actually expressed because the appropriate environmental conditions happen to arise. This simplification may reduce the difference between the gene-centered and individual-centered approaches discussed by Lehmann et al. (2015) and Lehmann et al. (2016). Thus, an important question that can be asked of any inclusive fitness formulation under nonlinearity is "what is assumed about the phenotype of the recipients?" We do recognize that in this paper we have focussed only on haploidy and that further challenges are likely to arise in applying our general philosophy to diploid or mixed-ploidy models.
Finally, when the assumption is not true, and the phenotype that would be the equilibrium under that assumption is not available as true-breeding under the actual phenotype set, the possible outcomes are as follows. The simplest possibility is that the population evolves to the phenotypically closest population to the one that would evolve if the assumption were true: that will often be an internal equilibrium with genetic variation. Maynard Smith (1981, 1982) made the general point about ESSs and population genetics, and we expect it to be true of inclusive fitness too. Uyenoyama and Feldman (1982) is just one example of population genetic models finding close results with internal equilibria. Sometimes, if the requisite average behavior cannot be achieved under the available genetic variation, there may be scope for intransitivity and for continual flux in gene frequencies. These rough guesses represent food for future theoretical thought and indicate how the equilibrium behavior under the probabilistic mixing assumption may turn out to be useful in understanding a system when that assumption is false.
We expect that the importance of probabilistic mixtures of phenotypes may extend to more general scenarios in which the genetic component of the variability in how individuals act on any given occasion is proportionally low (which implies the δ-weak selection of Wild & Traulsen, 2007), because it removes the assortation effect of r. We expect this scenario to be the norm for populations near equilibria (or, more precisely, near a point at which a monomorphic population is uninvadable by any one of a set of mutations that code for all nearby phenotypes), where it is usually reasonable to suppose we study organisms (Birch, 2017a,b; Fisher, 1930; Grafen, 1985).

| CONCLUSION
Empirical successes provide some assurance that the working hypothesis of inclusive fitness is by and large satisfactory. Here, we hope to have lent some formal support for such assurance. Further, we hope that our paper will present future modelers with a mathematical articulation of biologists' intuitions about inclusive fitness under additivity and show how it can be extended on mild assumptions to provide useful guidance in more general situations.

ACKNOWLEDGMENTS
The authors thank SA West and several anonymous referees for helpful comments.

CONFLICT OF INTERESTS
The authors declare no competing interests. Writing-review and editing (equal).

DATA AVAILABILITY STATEMENT
There are no data to be archived.

APPENDIX 1 OKASHA AND MARTENS
Okasha and Martens (2016b) analyze the simple cooperation game described in the text, in which an altruist donates b to its partner at a cost c, and when two cooperators are paired, they each receive an additional benefit, d. With initial strategy set {C, D}, there are four payoffs that we write formally as

$$V(C, C) = b - c + d, \quad V(C, D) = -c, \quad V(D, C) = b, \quad V(D, D) = 0.$$

We now invoke our probabilistic mixing assumption to allow strategies of the form $s(\alpha)$ to represent playing D with probability $1 - \alpha$ and C with probability $\alpha$. Then, we extend the payoff function for mixes in terms of the original values as follows,

$$V(s(\alpha), s(\beta)) = \alpha\beta\,V(C, C) + \alpha(1 - \beta)\,V(C, D) + (1 - \alpha)\beta\,V(D, C) + (1 - \alpha)(1 - \beta)\,V(D, D) = -c\alpha + b\beta + d\alpha\beta.$$

To find an equilibrium strategy in the extended set, we seek a probability, $0 < \beta < 1$, such that $\alpha = \beta$ is a stationary point of

$$r\,V(s(\alpha), s(\alpha)) + (1 - r)\,V(s(\alpha), s(\beta)).$$

Differentiating with respect to $\alpha$ and solving for when $\alpha = \beta$ produces a well-known result for an internal extremum (Grafen, 1979; Hines and Maynard Smith, 1979),

$$\beta^{*} = \frac{c - rb}{(1 + r)\,d}.$$

This example has shown how payoff functions for scalar strategies need to be extended to probabilistic mixtures to implement the probabilistic mixing assumption. Inclusive fitness in this case is defined for a strategy $s(\alpha)$ in a population playing $\beta$ by taking the payoff to $s(\alpha)$ against a $\beta$ incumbent, and adding r times the difference it makes to a $\beta$ incumbent that the actor is playing $\alpha$ not $\beta$, as follows,

$$\mathrm{IF}(\alpha; \beta) = V(s(\alpha), s(\beta)) + r\left[V(s(\beta), s(\alpha)) - V(s(\beta), s(\beta))\right]$$
$$= \left[(b - c)\beta + d\beta^{2}\right] + (\alpha - \beta)\left[rb - c + (1 + r)\,d\,\beta\right],$$

where in the second line the first main bracket shows the payoff to $s(\beta)$ against itself, and the cofactor of $(\alpha - \beta)$ shows the inclusive fitness effect of one player deviating from $\beta$ toward Cooperate against a population playing $\beta$. The rb − c term is the familiar effect from additive models. Nonadditivity appears by regarding the effect of d as contributing d to self (with relatedness of 1) and d to the partner (with relatedness r), hence the factor 1 + r. The factor $\beta$ appears because this is the chance that the nonadditive gain will be made when the population plays $\beta$.
Thus, the inclusive fitness makes complete sense. It has a turning point at an internal value of $\beta$ only if the cofactor of $\alpha - \beta$ equals zero, which is a maximum only if d is negative, because then increasing $\alpha$ above $\beta^{*}$ results in a decrease in the player's fitness. The cofactor of $\alpha - \beta$ equaling zero immediately yields the solution for $\beta^{*}$ given above on dynamic grounds.
Thus, an inclusive fitness analysis, using our probabilistic mixing assumption and using the incumbent in place of Hamilton's "nonsocial" phenotype, yields a very satisfying analysis and interpretation of this two-player game played between relatives. The probabilistic mixing assumption has the consequence that for inclusive fitness purposes we regard the partner as playing the incumbent strategy. Thus, this model supports the idea that we can consider organisms, at equilibrium, to appear as though maximizing their inclusive fitness (Okasha, 2018;Okasha & Martens, 2016b). The calculations effectively repeat those of Grafen (1979), but we articulate the arguments more fully.
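The inclusive fitness calculation for this game can be checked with a short numerical sketch. The names `alpha` (the actor's probability of playing Cooperate) and `beta` (the incumbent's probability), the function names, and the parameter values are assumptions made for illustration; the payoff expression follows from the game as described (cost c to the donor, benefit b to the partner, extra benefit d to each of two cooperators).

```python
from fractions import Fraction

def payoff(alpha, beta, b, c, d):
    """Expected payoff to an actor cooperating with probability alpha
    against a partner cooperating with probability beta."""
    return -c * alpha + b * beta + d * alpha * beta

def inclusive_fitness(alpha, beta, b, c, d, r):
    """Actor's payoff against a beta incumbent, plus r times the difference
    the actor makes to that incumbent by playing alpha rather than beta."""
    own = payoff(alpha, beta, b, c, d)
    effect_on_partner = payoff(beta, alpha, b, c, d) - payoff(beta, beta, b, c, d)
    return own + r * effect_on_partner

def cofactor(beta, b, c, d, r):
    """Inclusive fitness effect of a unit deviation toward Cooperate."""
    return r * b - c + (1 + r) * d * beta

def equilibrium(b, c, d, r):
    """Internal value of beta at which the cofactor vanishes."""
    return (c - r * b) / ((1 + r) * d)

# Hypothetical parameter values, chosen so that the equilibrium is internal
# and d is negative.
b, c, d, r = Fraction(4), Fraction(1), Fraction(-2), Fraction(1, 2)
beta_star = equilibrium(b, c, d, r)
print(beta_star)  # 1/3
```

With these values the cofactor is zero at `beta_star`, so every `alpha` earns the same inclusive fitness against an incumbent playing `beta_star`, while against an incumbent above `beta_star` a deviation further toward Cooperate lowers inclusive fitness.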

APPENDIX 2 LEHMANN ET AL.
Following Lehmann et al. (2015), and the assumptions outlined in our main text, we consider an infinite island model of haploid individuals on patches of size N. We extend Lehmann et al.'s (2015) analysis, and consider a mutant strategy ỹ, in which a mutant displays the deviant behavior with some small probability, $\epsilon$, and otherwise, with probability $(1 - \epsilon)$, behaves like a resident (x): $\tilde{y} = (1 - \epsilon)\,x + \epsilon\,y$

First order conditions
We can rewrite Equation (10), by the definition of q and p from Lehmann et al. (2015), as, where $p_k$ is the probability that a randomly drawn mutant has $k - 1$ other lineage members in its patch. A strategy x is uninvadable if, given x, In a slight abuse of notation, we will write partial derivatives of functions of ỹ with respect to y and suppress the functional dependence $\tilde{y}(y)$. When we come to unpack this expression for W, $p_k$, and w, which we do when we require x and y to be single real numbers for computational purposes, we will face and resolve the question of how these functions are extended when we allow their first argument to be a probabilistic combination of real numbers rather than a single real number. For the first example W, the definition of W for a general space from Lehmann et al. (their Equation 3) suffices for one stage of unpacking: The first term is equal to 0 because at y = x we can factor out the fitness term, and $\sum_{k=1}^{N} \frac{\partial}{\partial y} p_k(\tilde{y}, x) = \frac{\partial}{\partial y}(1) = 0$. Turning to the second term, we unpack w by regarding it as an average over the different realizations of ỹ, and so now allow for a total of k mutants and an independent chance, $\epsilon$, of each of them displaying the deviant behavior. This gives: where h is the number of mutants displaying the deviant behavior. We can express the binomial as, Eliminating higher order terms of $\epsilon$ gives: We now adopt the notational convention of Lehmann et al. (2015, page 1862) that $w(y, x_{-1}, 1_x)$ should be regarded as having N + 1 arguments for the purpose of differentiation. Thus, we can take a partial derivative of up to the (N + 1)th argument (though only actually use up to N).
This allows us to take the derivative of one individual's offspring number with respect to the behavior of other single members of the group.
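The step of eliminating higher order terms in the binomial can be illustrated numerically. The sketch below assumes, as our reading of the argument, that each of k mutants independently displays the deviant behavior with a small probability (called `eps` here, a name chosen for illustration), and that to first order only the cases of zero or one displaying mutant are retained. The function names and the values of `k` and `eps` are hypothetical.

```python
from math import comb

def display_count_probs(k, eps):
    """Exact binomial probabilities that h = 0, ..., k of the k mutants
    display the deviant behavior, each independently with chance eps."""
    return [comb(k, h) * eps**h * (1 - eps)**(k - h) for h in range(k + 1)]

def first_order_probs(k, eps):
    """First-order approximation in eps: no displayer with weight 1 - k*eps,
    exactly one displayer with weight k*eps, two or more dropped."""
    return [1 - k * eps, k * eps] + [0.0] * (k - 1)

# Hypothetical values: five mutants in the patch, display chance 0.001.
k, eps = 5, 1e-3
exact = display_count_probs(k, eps)
approx = first_order_probs(k, eps)

# The largest discrepancy is of order eps squared.
max_error = max(abs(e - a) for e, a in zip(exact, approx))
```

The discrepancy shrinks quadratically as `eps` falls, which is why only the single-displayer term contributes to the first derivative at y = x.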
By permutation invariance, and following Lehmann et al. (2015) in denoting $w_j$ as the derivative of w with respect to its jth argument, we get: And from the definition of relatedness following Lehmann et al. (2015), $r(\tilde{y}, x) = \sum_{k=1}^{N} p_k(\tilde{y}, x)\,\frac{k - 1}{N - 1}$, we obtain a first order condition of:

Second order condition
The second order condition is given by the second derivative: The first term of the RHS of the equation equals 0 because at y = x we can factor out the fitness term, and $\sum_{k=1}^{N} \frac{\partial^{2}}{\partial y^{2}} p_k(\tilde{y}, x) = \frac{\partial^{2}}{\partial y^{2}}(1) = 0$. Turning to the second term, we already have the partial derivative of w (above) as: $k\,w_j(y, x^{(N-1)}, 1_x)$. We unpack by using the definition of $p_k$ from Lehmann et al. (2015, box 2) and write: where $t_k(\tilde{y}, x)$ is the number of demographic periods ("sojourn time") for which the lineage consists of k individuals. To get an expression for $\frac{\partial}{\partial y} t_k(\tilde{y}, x)$, we use the matrix, Q, from which $t_k$ is derived, as defined in the Supplementary Material of Lehmann et al. (2015, equation A1). Q is a matrix whose (i, j)th entry is the probability a patch with j mutants becomes a patch with i mutants in the next demographic period. To obtain the formula, we need a symbol $R_{i-f,\,j-h,\,h}(y, x)$ for the probability that the j − h nondeviant mutants contribute i − f individuals to the next time step. The probability of going from j to i mutants is then where h represents the number out of a total j mutants that display the behavior y, and jh is the probability that a group with j mutants will have h individuals displaying the behavior. The Q matrices capture individuals contributed by the deviant displaying mutants, and the R matrices capture mutant individuals contributed by mutants acting as residents. Now, we apply our assumptions that only a small fraction of mutants display and that the chances of displaying are all independent. This gives us (A11) $\left.\frac{\partial W(\tilde{y}, x)}{\partial y}\right|_{y=x} = w_1(x, x^{(N-1)}, 1_x) + (N - 1)$ . and, as equations (A14-A20) show that we can replace $r(\tilde{y}, x)$ with $r(x, x)$, as part of eliminating terms of higher order of $\epsilon$, this leads to first order condition at y = x of and second order condition of which match the uninvadability conditions in Equations (A12) and (A22).
Thus, the first and second order conditions for uninvadability and utility maximization are identical when our inclusive fitness is used as utility.
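As a small illustration of the relatedness definition used in the first order conditions above, the following sketch computes r as the sum over k of $p_k (k - 1)/(N - 1)$ for a given distribution of lineage sizes. The function name and the example distribution are hypothetical and are not taken from Lehmann et al. (2015).

```python
from fractions import Fraction

def relatedness(p, N):
    """Relatedness r = sum over k of p_k * (k - 1) / (N - 1), where p[k - 1]
    is the probability that a randomly drawn mutant shares its patch of
    size N with k - 1 other members of its lineage."""
    if N < 2:
        raise ValueError("patch size N must be at least 2")
    if len(p) != N:
        raise ValueError("p must list p_1, ..., p_N")
    total = sum(p_k * (k - 1) for k, p_k in enumerate(p, start=1))
    return total / (N - 1)

# Hypothetical distribution for patches of size N = 3.
p_example = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
r_example = relatedness(p_example, 3)
print(r_example)  # 3/8
```

The two extremes behave as expected: a mutant that is always alone in its patch gives r = 0, and a patch always filled with lineage members gives r = 1.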