Multilevel selection theory and evidence: a critique of Gardner, 2015
Abstract
Gardner (2015) recently developed a model of a ‘Genetical Theory of Multilevel Selection’, which is thoughtfully developed but flawed. The model's flaws appear to be symptomatic of common misunderstandings of the multilevel selection (MLS) literature and the recent quantitative genetic literature. I use Gardner's model as a guide for highlighting how the MLS literature can address the misconceptions found in his model and in the kin selection literature in general. I discuss research on the efficacy of group selection, the role of indirect genetic effects in affecting the response to selection and the heritability of group‐level traits. I also discuss why the Price multilevel partition should not be used to partition MLS, and why contextual analysis and, by association, direct fitness are appropriate for partitioning MLS. Finally, I discuss conceptual issues concerning the level at which fitness is measured and the units of selection, and I present a brief outline of a model of selection in class‐structured populations. I argue that the results derived from the MLS research tradition can inform kin selection research and models, and provide insights that will allow researchers to avoid conceptual flaws such as those seen in the Gardner model.
Introduction
In a recent paper, Gardner (2015) developed a ‘Genetical Theory of Multilevel Selection’ that made it apparent that there are deep misunderstandings about multilevel selection (MLS) and the research tradition on which it is based. The problem appears to lie not with Gardner per se, but with a general feature of the kin selection tradition, which is the framework within which Gardner works. Gardner's paper is thoughtfully developed and shows that his views have been refined since his earlier work on the topic (e.g. West et al., 2007, 2008). Nevertheless, it retains flaws that are common within the kin selection tradition, and it thus provides an excellent framework for reviewing issues associated with MLS. The goal of this paper is not to criticize Gardner directly, although I will discuss issues with his paper. Rather, I seek to use his paper as a guide to identify and discuss literature and concepts that are neglected or misunderstood by those outside the MLS tradition.
Throughout this paper, I will use the terms ‘group selection’, ‘cell‐level selection’ and other descriptive terms to refer specifically to selection acting at a level other than that of the individual. I will use ‘MLS’ to refer to situations in which selection acts at more than one level. Typically these will be the individual and group levels, but they can also be other levels, such as the cell and individual levels.
Group selection
In reading Gardner's (2015) paper, it quickly becomes apparent that the literature review is lacking, particularly with respect to papers in the MLS tradition. This is unfortunate because the MLS literature, and particularly the experimental literature, provides essential information regarding the efficacy of group selection.
There are really three questions that need to be addressed regarding MLS in general, and group selection in particular. The first is whether group selection is effective, the second is whether it occurs in nature, and the third is whether MLS provides insights that cannot be attained using other methods, such as kin selection. The first question is settled science and has been settled for over 20 years.
The way to test whether group selection works is to perform group selection experiments in which there is differential survival and reproduction of groups. The first experiment of this type was performed by Wade (1977), who examined group selection for population size in Tribolium flour beetles; his results have been confirmed in numerous experiments varying population size, migration rate and culture conditions (Wade, 2016). Subsequently, a number of group selection experiments following similar protocols were performed (e.g. Craig, 1982; Goodnight, 1985); these are reviewed in Goodnight & Stevens (1997). More recently, there has been work on group selection in agricultural settings (e.g. Muir, 1996; Muir et al., 2013), and indeed, MLS has become an important tool for animal breeders (Wade et al., 2010). The important general conclusion from these studies is that there is always a rapid and highly significant response to group selection. This has been shown to hold for group selection by differential migration (Wade & Goodnight, 1991), for low levels of population differentiation (McCauley & Wade, 1980), and even for multispecies communities (Goodnight, 1990a,b; Swenson et al., 2000a,b). This is the insight that these studies provide. Yet many authors, such as West et al. (2007), conclude that what they call ‘old’ group selection is not effective. The experimental data tell us exactly the opposite: group selection is highly effective and often overwhelms selection at the individual level.
In cases where theory and experiment disagree, a reasonable starting position is that the theory is incorrect. In this case, we know the flaw in the models that leads theory and experiment to disagree, and we have known it for a very long time. Within the tradition of evolutionary biology, Wade (1978) (see also Goodnight & Stevens, 1997) suggested that classic models of MLS were missing something: they ignored deviations from additivity due to gene interactions and what are now called indirect genetic effects (IGEs) (Wolf et al., 1998). McCauley & Wade (1980) showed empirically that this was the underlying reason for the laboratory response to group selection. A year before Wade (1978), in the agricultural quantitative genetics literature, Griffing (1977), modelling selection in crop plants, reached a similar conclusion. Finally, Goodnight (1990a,b) experimentally confirmed the role of interactions among individuals in the response to community selection using two‐species communities of Tribolium castaneum and Tribolium confusum. Had Gardner been aware of the MLS literature, he would have known not only that his model was inconsistent with the results of the experimental literature, but also that his model would fail if he did not include interactions among individuals.
The second question, whether group selection occurs in nature, has not been so thoroughly explored; nevertheless, it too is quickly becoming settled science. With the introduction of contextual analysis (Heisler & Damuth, 1987), there has been an increasing number of studies of MLS in natural and near‐natural settings. I will address the validity of contextual analysis later; for the moment it is sufficient to say that contextual analysis uses the same equations as the direct fitness approach of kin selection, albeit in a somewhat different manner (Goodnight, 2013a). Thus, if inclusive fitness is an acceptable approach to studying social evolution, then so is contextual analysis. Using contextual analysis, multiple studies have examined MLS in plants (Stevens et al., 1995; Kelly, 1996; Aspi et al., 2003; Donohue, 2003, 2004; Weinig et al., 2007), arthropods (Breden & Wade, 1989; Tsuji, 1995; Herbers & Banschbach, 1999; Eldakar et al., 2010; Pruitt & Goodnight, 2014), birds (Laiolo & Obeso, 2012) and humans (Moorad, 2013). One study, by Pruitt & Goodnight (2014), identified colony‐level adaptations that were consistent with the ongoing group‐level natural selection measured using contextual analysis. Although at this point we do not have a good idea of how common MLS is in natural populations, we do know that it occurs. There is considerable opportunity for observational bias, but every study that has examined MLS in the field has found it acting, suggesting that MLS may be common.
For the third question, whether an MLS approach provides insights not available using other methods such as inclusive fitness, the answer is yes. Kin selection and inclusive fitness identify the point at which evolution will come to equilibrium (e.g. Gardner et al., 2011), whereas MLS examines the strength of selection acting in a population (e.g. contextual analysis), the response to an applied selection pressure in an experimental population (e.g. group selection experiments) or patterns of variation in an existing population (e.g. Linksvayer & Wade, 2009; Van Dyken & Wade, 2012a,b). Thus, kin selection identifies where a population should eventually stabilize at a fitness optimum, whereas the MLS approach identifies how a population will change from its current configuration. Far from being antagonistic, these two approaches have the potential to be highly complementary (Goodnight, 2013a).
The Price multilevel partitioning
In recent years, there has been considerable discussion about whether the Price multilevel partitioning (Price, 1970, 1972) or contextual analysis is the more appropriate way to model MLS (Frank, 2012; Okasha & Paternotte, 2012; Earnshaw, 2015). Most of this discussion traces back to Okasha's book (2006) (see also Okasha, 2004). Although Okasha comes out overall in favour of contextual analysis (Okasha, 2004), he suggests that there are cases, such as soft selection, in which the Price multilevel partitioning is more appropriate.
Although Okasha's (2006) book triggered considerable debate, notably absent from this debate are researchers who actually study MLS in natural populations. Researchers who study MLS in experimental (Eldakar et al., 2010) or natural (Stevens et al., 1995; Tsuji, 1995; Herbers & Banschbach, 1999; Aspi et al., 2003; Donohue, 2003, 2004; Weinig et al., 2007; Laiolo & Obeso, 2012; Moorad, 2013; Pruitt & Goodnight, 2014) populations have found that contextual analysis is the appropriate approach to measuring MLS, and in recent years contextual analysis has come to dominate the field.
The reason for this is clearly expressed by Frank (2012), who notes that the Price multilevel partitioning does not imply a single causal explanation and that it does not make sense to talk about the Price multilevel partitioning and contextual analysis as causal alternatives. This means that the Price multilevel partitioning is a mathematical identity that allows the change due to selection in a structured population to be partitioned into within‐ and among‐group components. Such a partitioning does not imply causality, and the claim that among‐group differences in fitness are solely the result of group selection is simply incorrect. In contrast, contextual analysis, although most easily performed as a multiple regression, is actually a form of path analysis (Stevens et al., 1995; Frank, 2012), and path analysis is a causal analysis (Wright, 1968; Li, 1975). Although there are well‐understood issues with the implied causality when using regression in selection analysis (Wade & Kalisz, 1990), these are issues for selection analysis in general and are not unique to contextual analysis.
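To see why the Price multilevel partitioning is inappropriate, consider Gardner's (2015) Price multilevel partitioning for a structured population, which I have modified to simplify the notation and make it more appropriate for the present purposes:
$$\Delta\bar{z} = \mathrm{Cov}(w_{ij}, z_{ij}) = \mathrm{Cov}(\bar{w}_j, \bar{z}_j) + \mathrm{E}_j\left[\mathrm{Cov}_j(w_{ij}, z_{ij})\right]$$
where $w_{ij}$ and $z_{ij}$ are the relative fitness and phenotype of the ith individual in the jth group, $\bar{w}_j$ and $\bar{z}_j$ are the corresponding group means, $\mathrm{Cov}(\bar{w}_j, \bar{z}_j)$ is the covariance between groups, and $\mathrm{E}_j[\mathrm{Cov}_j(w_{ij}, z_{ij})]$ is the expected covariance within groups. The Price multilevel partitioning is a simple transformation, and, assuming no transmission bias (Frank, 1995), this partitioning cannot be disputed.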
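The identity itself is easy to check numerically. The following is a minimal sketch, assuming NumPy, population (divide‐by‐N) covariances and size‐weighted groups; the function name `price_multilevel` and the simulated data are hypothetical. It shows only that the two components always sum to the total covariance, which is why the partitioning cannot be disputed; it says nothing about what causes either component.

```python
import numpy as np

def price_multilevel(w, z, group):
    """Split Cov(w, z) into among-group and expected within-group components.

    w: relative fitness, z: phenotype, group: group label of each individual.
    Population covariances (divide by N) are used and groups are weighted by
    their size, so total == among + within holds exactly.
    """
    w, z = np.asarray(w, dtype=float), np.asarray(z, dtype=float)
    _, idx, sizes = np.unique(group, return_inverse=True, return_counts=True)
    w_bar = (np.bincount(idx, weights=w) / sizes)[idx]  # group mean fitness, per individual
    z_bar = (np.bincount(idx, weights=z) / sizes)[idx]  # group mean phenotype, per individual
    total = np.mean((w - w.mean()) * (z - z.mean()))
    among = np.mean((w_bar - w.mean()) * (z_bar - z.mean()))
    within = np.mean((w - w_bar) * (z - z_bar))
    return total, among, within

# Hypothetical data: 50 groups of 10, fitness rising linearly with phenotype.
rng = np.random.default_rng(1)
group = np.repeat(np.arange(50), 10)
z = rng.normal(size=group.size)
fitness = 1 + 0.1 * z
w = fitness / fitness.mean()  # relative fitness
total, among, within = price_multilevel(w, z, group)
print(f"total {total:.4f} = among {among:.4f} + within {within:.4f}")
```

Note that in this hypothetical example fitness depends only on individual phenotype, yet the among‐group component is nonzero.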
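Because the issue with the Price multilevel partitioning that I want to discuss relates to the group‐level covariance (although the result is equally true for the individual‐level covariance), I will focus only on the among‐group component of the Price multilevel partitioning. What we want to do is to divide the overall group‐level covariance into a component that is solely due to selection acting at the group level and a component that is due to selection acting at the individual level. Contextual analysis achieves this partitioning using partial regression; however, for our purposes it is more convenient to use partial covariances. First, consider the partial covariance between relative fitness, $w_{ij}$, and group mean phenotype, $\bar{z}_j$, written $\mathrm{Cov}(w_{ij}, \bar{z}_j \cdot z_{ij})$. In words, this is the covariance between relative fitness and group mean phenotype with the effects of individual phenotype, $z_{ij}$, removed. From Goodnight et al. (1992) (see also Bijma & Wade, 2008):
$$\mathrm{Cov}(w_{ij}, \bar{z}_j) = \mathrm{Cov}(w_{ij}, \bar{z}_j \cdot z_{ij}) + \beta_{wz\cdot\bar{z}}\,\mathrm{Cov}(z_{ij}, \bar{z}_j)$$
where $\mathrm{Cov}(w_{ij}, \bar{z}_j \cdot z_{ij}) = \beta_{w\bar{z}\cdot z}\,\mathrm{Var}(\bar{z}_j)$, which is the effect of group selection, and $\beta_{wz\cdot\bar{z}}\,\mathrm{Cov}(z_{ij}, \bar{z}_j)$, which is the effect of individual selection on the group mean phenotype. Here $\beta_{w\bar{z}\cdot z}$ and $\beta_{wz\cdot\bar{z}}$ are the partial regressions of relative fitness on group mean phenotype and on individual phenotype, as estimated in contextual analysis. Because $\mathrm{Cov}(w_{ij}, \bar{z}_j)$, taken over individuals, equals the among‐group covariance $\mathrm{Cov}(\bar{w}_j, \bar{z}_j)$ of the Price multilevel partitioning, this demonstrates that the among‐group covariance contains both effects due to individual selection and effects due to group selection.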
To see the importance of this, consider a structured population in which selection occurs only at the individual level. By chance, some groups will have a larger proportion of high‐fitness individuals. These groups will therefore have a higher mean fitness, and as a result there will be a covariance between group mean fitness and group mean phenotype. This means that it is never appropriate to equate the among‐group component of the Price multilevel partitioning with group selection.
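The scenario just described can be simulated. The sketch below assumes NumPy; the function names, effect sizes and random seed are hypothetical. Fitness depends only on an individual's own phenotype, with no group‐level term. The Price among‐group covariance is clearly positive, yet the contextual‐analysis estimate of the group‐level partial regression is close to zero, and the among‐group covariance is recovered exactly as the sum of a group‐selection term and an individual‐selection term.

```python
import numpy as np

def pop_cov(a, b):
    """Population covariance (divides by N), as in the Price equation."""
    return np.mean((a - a.mean()) * (b - b.mean()))

def contextual_analysis(w, z, group):
    """Regress relative fitness on individual phenotype and its group mean.

    Returns the partial regressions and the two terms of
    Cov(w_ij, zbar_j) = beta_group * Var(zbar_j) + beta_ind * Cov(z_ij, zbar_j).
    """
    _, idx, sizes = np.unique(group, return_inverse=True, return_counts=True)
    zbar = (np.bincount(idx, weights=z) / sizes)[idx]  # each individual's group mean
    X = np.column_stack([np.ones(z.size), z, zbar])
    _, beta_ind, beta_group = np.linalg.lstsq(X, w, rcond=None)[0]
    among = pop_cov(w, zbar)  # equals the Price among-group covariance
    group_term = beta_group * pop_cov(zbar, zbar)
    ind_term = beta_ind * pop_cov(z, zbar)
    return beta_ind, beta_group, among, group_term, ind_term

# Hypothetical individual-only selection: fitness depends on own phenotype only.
rng = np.random.default_rng(42)
n_groups, group_size = 2000, 5
group = np.repeat(np.arange(n_groups), group_size)
z = rng.normal(size=n_groups * group_size)
fitness = 1 + 0.2 * z + rng.normal(scale=0.05, size=z.size)  # no group-level term
w = fitness / fitness.mean()                                   # relative fitness

beta_ind, beta_group, among, group_term, ind_term = contextual_analysis(w, z, group)
print(f"Price among-group covariance: {among:.4f}")
print(f"group-selection term:         {group_term:.4f} (beta_group = {beta_group:.4f})")
print(f"individual-selection term:    {ind_term:.4f}")
```

Nearly all of the among‐group covariance lands in the individual‐selection term, which is the point of the argument above.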
This has not always been recognized. Thirty years ago, Wade (1985) published a model that, except for a few minor details, is identical to the model presented by Gardner (2015). However, since its publication, the problems with the Price multilevel partitioning have become apparent (e.g. Goodnight et al., 1992), and within the MLS community the Price multilevel partitioning is used neither to analyse data nor to draw conclusions about the strength and direction of group selection.
Contextual analysis
Contextual analysis has supplanted the Price multilevel partitioning for analysing MLS. Contextual analysis is an extension of standard methods for selection analysis (Lande & Arnold, 1983; Arnold & Wade, 1984), which are well accepted and have been used extensively (Kingsolver et al., 2001). Theoretical work (Heisler & Damuth, 1987; Goodnight et al., 1992) has demonstrated that contextual analysis accurately partitions levels of selection in classic models, and it is based on the same equations that are used in the direct fitness approach (Taylor & Frank, 1996; Goodnight, 2013a). Importantly, the direct fitness approach, and thus potentially contextual analysis, has been shown to be equivalent to the inclusive fitness approach (Taylor et al., 2007), although, as mentioned above, the equations are used somewhat differently in contextual analysis (Goodnight, 2013a). It is therefore surprising that there is such apparent resistance to using contextual analysis to analyse MLS (e.g. Gardner & Grafen, 2009).
Some of this resistance may come from one apparently paradoxical result of applying contextual analysis to soft selection. In soft selection, there is no variation in fitness among groups; nevertheless, contextual analysis identifies a group‐level effect in this model (Goodnight et al., 1992). This is actually an example that demonstrates the power of contextual analysis. As pointed out above, if only individual selection were acting in the system, there would be variation in fitness among groups, and the among‐group covariance between relative fitness and phenotype would equal $\beta_{wz\cdot\bar{z}}\,\mathrm{Cov}(z_{ij}, \bar{z}_j)$, the effect of individual selection on the group mean phenotype. The fact that there is no variation among groups indicates that there must be group selection countering the effects of individual selection at the group level, and that it must be equal and opposite to that effect.
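A minimal sketch of this result, assuming NumPy and a hypothetical linear form of soft selection, $w_{ij} = 1 + b(z_{ij} - \bar{z}_j)$, in which every group has mean fitness exactly 1. The value of `b`, the group sizes and the seed are illustrative only. The Price among‐group covariance is zero, yet contextual analysis recovers a group‐level partial regression equal and opposite to the individual‐level one.

```python
import numpy as np

# Hypothetical linear soft selection: fitness depends on an individual's phenotype
# relative to its own group mean, so every group has mean fitness exactly 1.
rng = np.random.default_rng(7)
n_groups, group_size, b = 500, 8, 0.1
group = np.repeat(np.arange(n_groups), group_size)
z = rng.normal(size=n_groups * group_size)
zbar = (np.bincount(group, weights=z) / group_size)[group]  # group mean phenotype
w = 1 + b * (z - zbar)                                      # relative fitness (mean 1)

# Price among-group covariance: zero, because group mean fitness does not vary.
wbar = (np.bincount(group, weights=w) / group_size)[group]
among = np.mean((wbar - w.mean()) * (zbar - z.mean()))

# Contextual analysis: regress fitness on individual phenotype and group mean.
X = np.column_stack([np.ones(z.size), z, zbar])
_, beta_ind, beta_group = np.linalg.lstsq(X, w, rcond=None)[0]
print(f"among-group covariance: {among:.2e}")
print(f"beta_ind = {beta_ind:.3f}, beta_group = {beta_group:.3f}")
```

Because fitness here is an exact linear function of $z_{ij}$ and $\bar{z}_j$, the regression recovers $b$ and $-b$ up to rounding error.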
An analogy may be useful here. Consider an animal, say a lizard, that varies in length and weight, and suppose that these two measures are positively correlated. If we were to select for increased length in these lizards, we would select the longest animals but ignore their weight. We would then expect the weight of the lizards to increase as a correlated response to selection on length. If we wanted to increase lizard length without increasing weight, we would need to select for long, skinny lizards: that is, we would select the longest animals but, to keep their weight from changing, we would also have to select for the lightest animals. It is exactly the same with group and individual selection. A trait and the group mean of the trait are almost certainly correlated. If we select at the individual level, the group mean value of the trait will inevitably increase. The only way to prevent that is to exert counter‐selection at the level of the group. That is exactly what is happening in soft selection: group selection is required to counteract the inevitable change in the group mean that results from selection acting at the individual level.
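The lizard analogy can be made concrete with the multivariate breeder's equation, $\Delta\bar{\mathbf{z}} = \mathbf{G}\boldsymbol{\beta}$, where $\mathbf{G}$ is the additive genetic (co)variance matrix and $\boldsymbol{\beta}$ the vector of selection gradients. The sketch below assumes NumPy and a hypothetical $\mathbf{G}$ for length and weight. Selecting on length alone raises weight as a correlated response; holding weight constant requires a negative gradient on weight, the analogue of group‐level counter‐selection.

```python
import numpy as np

# Hypothetical additive genetic (co)variance matrix for (length, weight).
G = np.array([[1.0, 0.6],
              [0.6, 0.8]])

def response(beta):
    """Multivariate breeder's equation: change in trait means = G @ beta."""
    return G @ beta

beta_len = 0.5
r_length_only = response(np.array([beta_len, 0.0]))  # select on length alone

# Gradient on weight chosen so the correlated response in weight cancels:
# G[1, 0] * beta_len + G[1, 1] * beta_wt = 0.
beta_wt = -G[1, 0] * beta_len / G[1, 1]
r_long_skinny = response(np.array([beta_len, beta_wt]))

print("length only:  ", r_length_only)   # weight increases as well
print("long, skinny: ", r_long_skinny)   # weight unchanged, length still rises
```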
Breeding values
One assumption that is rarely questioned is that genetic effects are constant. This is illustrated by Gardner's (2015) model, in which he uses the Price multilevel partitioning to partition breeding values. He specifically cites Falconer (1981) and thus indicates that he is partitioning breeding values as defined by Fisher (1930) (Falconer & Mackay, 1996); however, Fisher defined breeding values for a single well‐mixed population in which interactions could be ignored. To partition Fisherian breeding values, Gardner must assume that the genetic effects underlying them are not affected by IGEs or other genetic interactions (for the effects of epistasis on breeding values, see Goodnight, 1995). In the real world, individuals do interact, and these interactions affect the expression of phenotypes and breeding values (e.g. Griffing, 1977, 1989; Goodnight, 1985, 1990a,b; Muir, 1996; Wolf et al., 1998; Bijma, 2010, 2011). IGEs are effects on the phenotype of one individual that are due to the expression of genes in a different individual (Wolf et al., 1998). Not all genetically based interactions among individuals are IGEs; interactions are only considered IGEs if they directly modify the expression of a trait. IGEs were first explored by Griffing (1977, 1989), but more recently have received a substantial amount of attention (e.g. Wolf et al., 1998; Bijma et al., 2007; Bijma & Wade, 2008; Bijma, 2014). All of these studies have shown that IGEs have dramatically different effects at the group level than at the individual level. Indeed, in one study, Goodnight (1985) found a negative response to individual selection in interacting populations, but a positive response to selection at the group level, a result that Griffing (1977) had predicted for interacting plant populations.
Wolf & Wade (2001) showed that the practice of assigning offspring fitness to parents can lead to this same type of confusion over the direction of selection.
Partitioning Fisherian breeding values is impossible in interacting systems because, as originally defined by Fisher, breeding values are calculated without regard to population structure, and as a consequence they lose much of their usefulness in structured populations. However, Bijma & Wade (2008) proposed the concept of ‘total breeding value’ (TBV). The TBV of an individual is twice the average phenotype of an individual's offspring measured in the social environment in which they are actually found. In the situation where we have individuals within demes and demes within a metapopulation, we can use the Price multilevel partition to legitimately partition the TBV, although the previously discussed issues with the Price multilevel partitioning remain:
$$\mathrm{Cov}(w_{ij}, \mathrm{TBV}_{ij}) = \mathrm{Cov}(\bar{w}_j, \overline{\mathrm{TBV}}_j) + \mathrm{E}_j\left[\mathrm{Cov}_j(w_{ij}, \mathrm{TBV}_{ij})\right]$$
where $w_{ij}$ and $\mathrm{TBV}_{ij}$ are the relative fitness and TBV of the ith individual in the jth deme, and bars denote deme means.
The TBV consists of the direct genetic effects of an individual on itself (DGEs) and the IGEs of deme members on the focal individual. As in standard quantitative genetics, and without loss of generality, assume that all the DGEs, IGEs and TBVs are measured as deviations from the metapopulation mean, so that the global (but not the deme) mean of each of these values equals zero. Finally, assume that all members of the same deme experience the same IGEs. This assumption is not necessary, but it simplifies the algebra for this example. Thus, $\mathrm{DGE}_{ij}$ is the direct genetic effect on the ith individual in the jth deme, $\mathrm{IGE}_j$ is the indirect genetic effect on all individuals in the jth deme, and $\mathrm{TBV}_{ij} = \mathrm{DGE}_{ij} + \mathrm{IGE}_j$. The classic Fisherian breeding value assumes that offspring are randomly dispersed; as a result, IGEs are experienced equally by all individuals and do not enter into the breeding value equation. For TBVs, we can show that the within‐deme component of the Price multilevel partitioning becomes:
$$\mathrm{E}_j\left[\mathrm{Cov}_j(w_{ij}, \mathrm{TBV}_{ij})\right] = \mathrm{E}_j\left[\mathrm{Cov}_j(w_{ij}, \mathrm{DGE}_{ij} + \mathrm{IGE}_j)\right] = \mathrm{E}_j\left[\mathrm{Cov}_j(w_{ij}, \mathrm{DGE}_{ij})\right]$$
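Remembering that the metapopulation mean TBV = 0, the between‐deme component becomes:
$$\mathrm{Cov}(\bar{w}_j, \overline{\mathrm{TBV}}_j) = \mathrm{E}_j\left[\bar{w}_j\,\overline{\mathrm{TBV}}_j\right] = \mathrm{Cov}(\bar{w}_j, \overline{\mathrm{DGE}}_j) + \mathrm{Cov}(\bar{w}_j, \mathrm{IGE}_j)$$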
In words, because all individuals in the same deme experience the same IGEs, the IGEs do not contribute to the within‐deme component of the covariance between breeding value and relative fitness. On the other hand, different demes experience different IGEs, and the IGEs do contribute to the deme mean breeding value and thus to the among‐deme component of the covariance between TBV and relative fitness (Fig. 1). In effect, Fisherian (Fisher, 1930) breeding values treat the population as if it were unstructured and cannot be partitioned when structure is imposed on a population, whereas the TBV of Bijma & Wade (2008) explicitly incorporates population structure and so does allow such a partitioning.
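A numerical check of this argument, under the simplifying assumptions above (one IGE value per deme, all values centred on the metapopulation mean), is sketched below. It assumes NumPy; the fitness function, effect sizes, deme sizes and seed are hypothetical. The IGEs drop out of the within‐deme component and appear only in the between‐deme component.

```python
import numpy as np

def price_parts(w, x, deme):
    """Between-deme and expected within-deme covariance of w with x (size-weighted)."""
    sizes = np.bincount(deme)
    w_bar = (np.bincount(deme, weights=w) / sizes)[deme]
    x_bar = (np.bincount(deme, weights=x) / sizes)[deme]
    between = np.mean((w_bar - w.mean()) * (x_bar - x.mean()))
    within = np.mean((w - w_bar) * (x - x_bar))
    return between, within

rng = np.random.default_rng(3)
n_demes, deme_size = 300, 6
deme = np.repeat(np.arange(n_demes), deme_size)
dge = rng.normal(size=deme.size)           # DGE_ij
ige = rng.normal(scale=1.0, size=n_demes)  # IGE_j, shared by all members of deme j
dge -= dge.mean()                          # centre on the metapopulation mean
ige -= ige.mean()
tbv = dge + ige[deme]                      # TBV_ij = DGE_ij + IGE_j

# Hypothetical fitness function that increases with TBV.
fitness = 1 + 0.1 * tbv + rng.normal(scale=0.05, size=tbv.size)
w = fitness / fitness.mean()

between_tbv, within_tbv = price_parts(w, tbv, deme)
between_dge, within_dge = price_parts(w, dge, deme)
between_ige, within_ige = price_parts(w, ige[deme], deme)
print(f"within:  TBV {within_tbv:.5f}  DGE {within_dge:.5f}  IGE {within_ige:.1e}")
print(f"between: TBV {between_tbv:.5f} = DGE {between_dge:.5f} + IGE {between_ige:.5f}")
```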

To see why Griffing (1977) would predict a negative response to individual selection and a positive response to group selection, consider a population of a crop plant, such as corn. Most plants are fairly well adapted, and in the absence of interactions, individual selection for increased yield would be expected to produce at most a moderate increase in yield. In an interacting population, however, there is another way for a plant to increase its yield: plants that compete more aggressively for resources with their neighbours will have a higher yield at their neighbours' expense. This creates a strong negative genetic correlation between DGEs and IGEs: what is good for the individual is bad for its neighbours. Griffing (1989) demonstrated that Arabidopsis plants do have direct genetic effects on themselves and negatively genetically correlated IGEs on their neighbours. Thus, a plant with high yield will, through shading or root competition, suppress the yield of its neighbours. Griffing (1977) pointed out that, in agreement with the TBV partitioning shown above, such negative IGEs cannot respond directly to individual selection, because the phenotypic effect is on the neighbour, but they can respond as a correlated response to selection on individual yield. Individual selection is thus likely to increase competition and, as a result, to produce a negative response, as was seen in the experiment of Goodnight (1985). Group selection, on the other hand, can act directly not only on the direct genetic effects but also on the IGEs, which explains why, in Goodnight's experiment, group selection was highly effective in spite of the negative response to individual selection.
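The logic of this prediction can be illustrated with a toy caricature, not Griffing's model or Goodnight's experiment. The sketch assumes NumPy and several hypothetical simplifications: each individual's IGE on every neighbour is exactly $-k$ times its DGE, there is no environmental variance, offspring inherit their parent's DGE, and selected parents are reassorted into new random groups of the same size. With $k(n-1) > 1$, individual selection lowers mean phenotype while group selection raises it.

```python
import numpy as np

def mean_phenotype(dge, k, n):
    """Expected mean phenotype when genotypes are reassorted into random groups of
    size n, assuming each individual's IGE on every neighbour is -k * DGE."""
    return np.mean(dge) * (1 - k * (n - 1))

rng = np.random.default_rng(11)
n, k, n_groups = 5, 0.5, 1000
dge = rng.normal(size=(n_groups, n))                  # one row per group
neighbour_sum = dge.sum(axis=1, keepdims=True) - dge
z = dge - k * neighbour_sum                           # phenotype = DGE + neighbours' IGEs

# Individual selection: keep the top 20% of individuals by their own phenotype.
ind_parents = dge[z >= np.quantile(z, 0.8)]
# Group selection: keep the top 20% of groups by group mean phenotype.
zbar = z.mean(axis=1)
grp_parents = dge[zbar >= np.quantile(zbar, 0.8)]

base = mean_phenotype(dge, k, n)
resp_ind = mean_phenotype(ind_parents, k, n) - base
resp_grp = mean_phenotype(grp_parents, k, n) - base
print(f"response to individual selection: {resp_ind:+.3f}")  # negative
print(f"response to group selection:      {resp_grp:+.3f}")  # positive
```

In this caricature, the most aggressive competitors have the highest individual phenotypes, while the best groups are those with the least aggressive members.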
The upshot is that it would have been possible for Gardner (2015) to use the Price equation to divide the phenotype or the TBV into within‐ and among‐group components, but not to divide Fisherian breeding values into these components. It must be acknowledged that, because of IGEs, a trait and the group mean of that trait have different genetic bases.
The way to handle this is to treat a trait and its group mean as different but correlated traits, which is the approach used in contextual analysis (Heisler & Damuth, 1987), and, by extension, implicitly the way it could be handled in the direct fitness approach (Taylor & Frank, 1996). In contextual analysis, a trait and its group mean are treated as separate traits not for convenience but because that is the reality: They are different traits with a different genetic basis.
This is actually a serious issue for kin selection models in general. For simplicity, Hamilton (1964) used a single‐locus model. Since then, kin selection models have typically either assumed an additive genetic basis for the traits (e.g. Hamilton, 1975; Charnov, 1977; Charlesworth, 1980; Michod & Hamilton, 1980; Seger, 1981; Grafen, 1985; Taylor, 1989, 1990; Gardner, 2015), or avoided specifying a genetic basis, which allows IGEs and gene interactions to be implicitly incorporated (e.g. Queller, 1992, 2011; Taylor & Frank, 1996). One of the strengths that Gardner (2015) identifies for his model is that it ‘describes the action of group selection in terms of change in a genetical character’. The results of studies on IGEs indicate that great caution is needed in doing this. Models that assume a simple genetic basis are inadequate for modelling evolution in structured populations. In general, group selection, or indeed individual selection in structured populations (Goodnight, 2000), cannot be meaningfully reduced to selection on individual alleles. This deserves further explanation. At any given moment, it is in principle possible to assign fitness to an individual allele; however, in interacting systems, whether through gene interactions or IGEs, such assignments depend on the context in which they are measured. As gene frequencies and population structure change, the assignment of fitness will also change. Such fitness assignments can therefore change rapidly over short periods of time and have little value for predicting the course of evolution. Goodnight (2015) gives an example involving epistasis in which, early in the selection process, the effect of an allele on fitness is positive; later, it becomes nearly neutral; and finally, it has a negative effect on fitness and is eliminated by selection.
Kin selection researchers would do well to consider the effect of polygenic inheritance in interacting systems on their models, and the interpretation of experimental results.
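The context dependence of allelic fitness described above can be seen in a very small hypothetical model (not the example in Goodnight, 2015): a haploid two‐locus system at linkage equilibrium, with arbitrary genotype fitnesses in which allele A is favoured only alongside allele B. The fitness effect assigned to A then depends entirely on the frequency of B, and its sign reverses as B changes in frequency.

```python
def average_effect_A(freq_B, w_AB=1.1, w_Ab=0.9, w_aB=1.0, w_ab=1.0):
    """Marginal fitness of allele A minus that of allele a in a haploid
    two-locus model, assuming linkage equilibrium.

    The default genotype fitnesses are hypothetical: A is favoured only
    alongside B (A x B epistasis).
    """
    w_A = freq_B * w_AB + (1 - freq_B) * w_Ab  # mean fitness of A carriers
    w_a = freq_B * w_aB + (1 - freq_B) * w_ab  # mean fitness of a carriers
    return w_A - w_a

for q in (0.1, 0.5, 0.9):
    print(f"freq(B) = {q:.1f}: fitness effect of A = {average_effect_A(q):+.3f}")
```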
Aggregate and emergent traits
Gardner (2015) and others (e.g. Okasha, 2006) distinguish between ‘aggregate’ traits, which are summary statistics based on characteristics measured on individual organisms, and ‘emergent’ traits, which are measured on the population as a whole but are not expressed by any one individual. An example of an aggregate trait would be the mean yield of a field of corn, which is simply the average yield of the individual plants, whereas an emergent trait would be one such as population density, which can only be measured on the population as a whole.
As should be apparent from the discussion of breeding values, this distinction, while often useful, does not greatly affect MLS research. The reason is that aggregate traits can, and typically do, have a heritable genetic basis that is a product of both direct genetic effects and IGEs. Thus, from a genetic perspective, mean yield is effectively an ‘emergent’ trait. Conversely, one interpretation of contextual analysis is that traits such as ‘population density’ can be viewed as ‘the population density an individual experiences’, which has much in common with a trait measured on individuals. An interesting example is the study of Stevens et al. (1995), who used contextual analysis to examine multilevel (individual and neighbourhood) selection in a continuous population. In this case, every individual is at the centre of its own neighbourhood and experiences a unique population density. As a result, in such continuous populations the population density an individual experiences is unique to that individual and is conceptually very similar to an individual trait. This is one of the ways in which contextual analysis has allowed MLS thinking to expand into more realistic situations. Of course, the heritability of such traits in continuous populations depends both on the underlying genetic basis and on aspects of population structure such as dispersal distance and pollination distance. The point is that although some traits can most usefully be thought of as aggregate traits and others as ‘emergent traits’, it is difficult to draw a clear distinction between the two. It is for this reason that terms such as ‘emergent properties’ are not common in the MLS literature.
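One way to compute ‘the population density an individual experiences’ in a continuous population is sketched below. It assumes NumPy and a fixed circular neighbourhood of hypothetical radius; the function name and coordinates are illustrative, and Stevens et al. (1995) may have defined neighbourhoods differently. Each individual gets its own value, which could then enter a contextual analysis as a column alongside individual‐level traits.

```python
import numpy as np

def neighbourhood_density(xy, radius):
    """Number of other individuals within `radius` of each individual.

    xy: array of shape (n, 2) of spatial coordinates. Uses all pairwise
    distances, so memory grows as n**2; fine for a sketch, not for huge n.
    """
    xy = np.asarray(xy, dtype=float)
    diff = xy[:, None, :] - xy[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return (dist <= radius).sum(axis=1) - 1  # subtract the individual itself

positions = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0]])
density = neighbourhood_density(positions, radius=1.5)
print(density)  # [1 1 0]
```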
Fitness assignment and interpretation of selection
One of the important philosophical and conceptual issues with MLS is the level at which fitness is assigned. In the Price multilevel partitioning, fitness is effectively assigned at two different levels: individual fitness and group mean fitness. This is an additional problem with the Price multilevel partitioning, because with fitness assigned at two levels it is impossible to compare the strengths of selection at those levels. Wolf & Wade (2001) identified a similar problem when comparing selection on parents and offspring in systems with IGEs. The correct solution is to assign fitness at only one level, as is done in contextual analysis. In most situations, we are inclined to assign fitness at the level of the organism, although this need not be the case. Thus, we might measure a fitness component such as the reproductive output of each animal and use contextual analysis to determine what proportion of fitness is explained by individual‐level traits and what proportion is explained by contextual traits.
For levels at and above the level at which we assign fitness, we can study change in the population in terms of evolution and selection. That is because at these levels, there is variation in fitness, which is required for selection to act (Lewontin, 1970). Below the level at which we assign fitness, by definition we cannot describe changes in terms of fitness and thus selection. As a result, we must call them something else. In the example of assigning fitness at the level of the organism, we can describe changes in the metapopulation of organisms as selection, possibly at multiple levels. However, changes below the level of the organism must be described in other terms (Damuth & Heisler, 1988; Goodnight, 2013b). ‘Development’ is an example of one such term.
Importantly, we can in principle define fitness at any level that we choose, although some levels may be more appropriate than others (Goodnight, 2013b). If we change the level at which we assign fitness, then what we call adaptation by natural selection and what we call development, or an analogous term, changes as well. For example, we could assign fitness at the level of the cell. Then, we could interpret much of what we had called ‘development’ as selection, and selection on organismal traits, such as foraging behaviour, would appear as contextual traits at the ‘group’ (organismal) level. Below the level of the cell, we would have to call changes something else, such as cell maturation. Similarly, we could assign fitness at the level of the group. In this case, changes at the group level or above could be analysed as selection, but not changes below the level of the group. Such changes below the level of the group would have to be interpreted as ‘within‐group ecological changes’, or a similar term.
Note that the level at which we assign fitness cannot affect the changes that are actually occurring; what changes is how we interpret them. This has important implications. One is quite positive: because the level at which fitness is assigned is the decision of the investigator, the investigator can fit the choice of level to the study. Thus, for a study of whole‐organism behaviour or morphology, it makes sense to assign fitness at the level of the organism. On the other hand, if we are interested in studying cancer as an MLS system, it makes sense to assign fitness at the level of the cell. Finally, a palaeontologist may have only fossils and no way of knowing the fitness of individual ancient organisms; for this palaeontologist, it makes sense to assign fitness at the level of the species (Goodnight, 2013b).
One of the less fortunate implications of this relativity of the level at which we assign fitness is that it can lead to confusion. It is confusion such as this that leads Gardner (2015) to argue that cancer cannot be studied in terms of selection. He argues that cancer cells do not have breeding values and thus cannot be subject to selection; however, it is basically the same argument as saying that cancer cells do not have variation in fitness and thus cannot be subject to selection. This is a consequence of him implicitly asserting that fitness is only correctly assigned at a single level, that of the organism. Demanding such a fixity on the level at which fitness is assigned would make much of evolutionary research difficult or impossible. Certainly, if we insist on assigning fitness at the organismic level, then we cannot take an evolutionary perspective on cell‐level processes, such as cancer. Similarly, our palaeontologist would be out of a job, as they would not have access to the fitnesses of the individual organisms. Finally, from a practical perspective, much of micro‐organismal studies of evolution would be hampered by the difficulty in identifying exactly what the organism is. Thus, although it would be aesthetically pleasing to demand a fixed concept of the level at which fitness should be assigned, this does not reflect reality and we must accept that fitness can be assigned at different levels for different purposes.
Another implication of this relativity of the level at which we assign fitness is that different investigators may assign fitness at different levels. This is the heart of the multilevel selection 1 (MLS‐1) and multilevel selection 2 (MLS‐2) debate (Damuth & Heisler, 1988; Okasha, 2006). When the level at which fitness is assigned is moved from the organism to the group, the interpretation of selection can change. For example, hard selection is a model of selection in which the fitness of an individual is solely a function of its own phenotype. When Wade (1985) used the Price multilevel partitioning to partition hard selection, he was effectively assigning fitness at the level of the group and was thus using an MLS‐2 framework. Because of the problems with the Price multilevel partitioning discussed above, this partitioning indicated that group selection was acting. On the other hand, when Goodnight et al. (1992) used contextual analysis, they were assigning fitness at the level of the organism and were thus using an MLS‐1 framework. In this case, contextual analysis revealed that only individual selection was acting. This raises the question of whether hard selection is group selection or individual selection. The answer is that it depends. If you are using an MLS‐1 perspective, then hard selection is strictly individual selection. If you are using an MLS‐2 perspective, then hard selection must be interpreted as group selection, and within‐group changes would be identified as within‐group ecological changes. Goodnight (2013b) provides criteria for choosing the level at which to assign fitness, and these criteria argue that, in the case of hard selection, the MLS‐1 perspective is more appropriate. It should be mentioned that the MLS‐1/MLS‐2 terminology is somewhat archaic; an alternative that will often be more appropriate is to identify clearly the level at which fitness is being assigned. Such an approach avoids the difficulties encountered when there are more than two levels at which fitness could potentially be assigned.
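The contrast between the two analyses of hard selection can be illustrated with simulated data. The sketch below is a hypothetical example, not a re‐analysis of Wade (1985) or Goodnight et al. (1992): groups differ in mean phenotype, each individual's fitness is a linear function of its own phenotype only, and we compare the between‐group covariance term of the Price partition with the partial regression on the contextual trait (group mean phenotype) from contextual analysis. The number and size of groups, the fitness function and the random seed are all assumptions.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Population covariance of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def partial_regressions(y, x1, x2):
    """OLS partial regression coefficients of y on x1 and x2 (intercept implied)."""
    c11, c22, c12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    c1y, c2y = cov(x1, y), cov(x2, y)
    det = c11 * c22 - c12 ** 2
    return (c22 * c1y - c12 * c2y) / det, (c11 * c2y - c12 * c1y) / det


def hard_selection(n_groups=50, group_size=10, seed=1):
    rng = random.Random(seed)
    z, group_of = [], []
    for g in range(n_groups):
        group_effect = rng.gauss(0.0, 1.0)  # groups differ in mean phenotype
        for _ in range(group_size):
            z.append(group_effect + rng.gauss(0.0, 1.0))
            group_of.append(g)
    # Hard selection (hypothetical linear form): fitness depends only on own phenotype.
    w = [1.0 + 0.1 * zi for zi in z]
    w_bar = mean(w)
    rel_w = [wi / w_bar for wi in w]

    members = [[i for i, g in enumerate(group_of) if g == k] for k in range(n_groups)]
    group_z = [mean([z[i] for i in idx]) for idx in members]
    group_w = [mean([rel_w[i] for i in idx]) for idx in members]

    # Contextual analysis: regress relative fitness on own phenotype and group mean.
    context = [group_z[g] for g in group_of]
    beta_individual, beta_group = partial_regressions(rel_w, z, context)
    # Price multilevel partition: covariance of group mean fitness with group mean phenotype.
    price_between = cov(group_w, group_z)
    return beta_individual, beta_group, price_between


b_ind, b_grp, price_between = hard_selection()
print(f"contextual analysis: individual {b_ind:.3f}, group {b_grp:.1e}")
print(f"Price between-group term: {price_between:.3f}")
```

Because fitness here is exactly linear in the individual's own phenotype, the contextual (group) gradient is zero up to rounding error, while the Price between‐group term is positive simply because groups differ in mean phenotype.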
The end result of this discussion is that the level at which fitness is assigned is indeed an important issue, and one that can lead to confusion. Nevertheless, it is not a controversial issue, and it is well understood. It is simply a consequence of the complexity of working with structured populations of interacting individuals. Many, if not all, of the misunderstandings are resolved when people clearly identify the level at which they assign fitness.
The units of selection
Gardner (2015) raises the question of whether social groups are units of selection, an example of the much larger debate over what is a unit of selection. This question was actually answered long ago by Lewontin (1970), who identified three criteria that are necessary and sufficient for evolution by natural selection. These are as follows: (i) there must be phenotypic variation; (ii) there must be a correlation between phenotype and fitness; and (iii) the phenotype must be heritable. In one example, Gardner claims:
‘Cancer is often conceptualized as involving a tension between different levels of selection… [S]omatic tissues – including cancerous ones – do not generally contribute genes to distant future generations… so their proliferation within the organism cannot correspond to selection in the strict sense of the genetical theory.’
However, if we apply Lewontin's criteria, we can clearly see that (i) there are phenotypic differences between cancer cells and normal cells; (ii) these phenotypic differences include dysregulation of the cell cycle, resulting in more rapid division of cancerous cells relative to normal somatic cells; and (iii) these differences are genetic, as usually a minimum of five mutations are needed for cancer to develop. Thus, cancer cells do satisfy Lewontin's criteria in that there is (i) variation, (ii) fitness differences and (iii) heritability. As these are necessary and sufficient conditions for evolution by natural selection, Gardner is clearly wrong: provided we assign fitness at the level of the cell, cancer can indeed be viewed from an MLS perspective.
The point is that whether or not a level is a ‘unit of selection’ is an empirical question. This is why, as with so many other issues addressed in this review, the levels of selection controversy is a controversy for philosophers and for those who do not understand MLS methods, but not an issue for those who actually do experiments in an MLS framework. The way to determine whether or not something is a unit of selection is to go out and measure selection. As outlined in this review, we have the tools to measure MLS. Thus, it is not only an empirical question, it is one that we know how to answer. If we do measure selection in a structured population and find that selection is acting at a particular level, then that level is indeed a unit of selection.
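A toy sketch of this argument, with fitness assigned at the level of the cell. The division rates, starting numbers and the `fidelity` parameter (the hypothetical probability that a daughter of a cancerous cell is itself cancerous) are illustrative assumptions, not estimates for any real tumour. Removing either the fitness difference or the heritability stops the change, as Lewontin's criteria imply.

```python
def cancer_fraction(n_normal, n_cancer, w_normal, w_cancer, fidelity, generations):
    """Toy deterministic model of two cell lineages within one organism.

    Each generation every normal cell leaves w_normal daughters and every
    cancerous cell leaves w_cancer daughters (cell-level fitness). A fraction
    `fidelity` of the daughters of cancerous cells inherit the cancerous
    phenotype; the rest revert to normal. Returns the cancerous fraction
    at the start and after each generation.
    """
    history = [n_cancer / (n_normal + n_cancer)]
    for _ in range(generations):
        cancer_daughters = n_cancer * w_cancer
        n_normal = n_normal * w_normal + cancer_daughters * (1.0 - fidelity)
        n_cancer = cancer_daughters * fidelity
        history.append(n_cancer / (n_normal + n_cancer))
    return history


# Variation, fitness differences and heritability all present.
all_three = cancer_fraction(999, 1, w_normal=1.0, w_cancer=1.5, fidelity=1.0, generations=20)
# No fitness difference: the cancerous fraction stays where it started.
no_fitness_difference = cancer_fraction(999, 1, 1.0, 1.0, 1.0, 20)
# No heritability: all daughters of cancerous cells revert, so nothing accumulates.
no_heritability = cancer_fraction(999, 1, 1.0, 1.5, 0.0, 20)

print(round(all_three[-1], 3), no_fitness_difference[-1], no_heritability[-1])
```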
Class‐structured populations
In a second example, Gardner (2015) raises the question of whether a set of groups consisting of two individuals of different classes can be studied as MLS. The example he gives is a parasitoid wasp that oviposits two eggs, one male and one female, in a host. Because the male and female are different classes, they cannot be pooled. This situation, which is a good example of all his concerns about class‐structured models, is easily handled by contextual analysis. The groundwork for this analysis was laid by Lande and colleagues (Lande, 1980; Via & Lande, 1985).
Assume that we have a measure of fitness for the male and female offspring, such as the probability of surviving to reproduction. Also, we measure one trait on each sex, which we can label ZF for the female and ZM for the male. Finally, we measure a contextual trait on the pair of individuals, C. This trait might be the mean weight of the two sibs, or perhaps the total quantity of toxins they produce in the host.
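The regressions implied by this set‐up can be sketched with simulated pairs. In this hypothetical example the female and male traits are independent standard normal variables, C is taken to be the mean of the two sibs' trait values (analogous to the mean weight of the two sibs), and fitness is a linear function with chosen "true" gradients plus noise. A separate contextual analysis for each sex, regressing that sex's relative fitness on its own trait and on C, then recovers each sex's individual and contextual gradients. All numbers are assumptions for illustration.

```python
import random


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def partial_regressions(y, x1, x2):
    """OLS partial regression coefficients of y on x1 and x2 (intercept implied)."""
    c11, c22, c12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    c1y, c2y = cov(x1, y), cov(x2, y)
    det = c11 * c22 - c12 ** 2
    return (c22 * c1y - c12 * c2y) / det, (c11 * c2y - c12 * c1y) / det


rng = random.Random(2)
z_f, z_m, c, w_f, w_m = [], [], [], [], []
for _ in range(5000):  # one female and one male per host
    zf, zm = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    ci = (zf + zm) / 2.0  # contextual trait measured on the pair
    z_f.append(zf)
    z_m.append(zm)
    c.append(ci)
    # Hypothetical true gradients: females 0.10 on Z_F, 0.15 on C; males 0.05 on Z_M, -0.10 on C.
    w_f.append(1.0 + 0.10 * zf + 0.15 * ci + rng.gauss(0.0, 0.05))
    w_m.append(1.0 + 0.05 * zm - 0.10 * ci + rng.gauss(0.0, 0.05))

# Relative fitness within each sex, then one contextual analysis per sex.
wbar_f, wbar_m = mean(w_f), mean(w_m)
rel_f = [w / wbar_f for w in w_f]
rel_m = [w / wbar_m for w in w_m]
beta_zf, beta_cf = partial_regressions(rel_f, z_f, c)
beta_zm, beta_cm = partial_regressions(rel_m, z_m, c)
print(f"females: beta_ZF={beta_zf:.3f}, beta_(C)F={beta_cf:.3f}")
print(f"males:   beta_ZM={beta_zm:.3f}, beta_(C)M={beta_cm:.3f}")
```

The two sexes are never pooled in a single regression; what links them is the shared contextual trait and, across generations, the shared genetic covariance matrix.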
For this system, there will be a genetic covariance matrix, but no full phenotypic covariance matrix. There is a genetic covariance matrix because they share a common gene pool. Thus, the female carries genes for the trait in the male and vice versa:
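$$\mathbf{G} = \begin{pmatrix} G_{Z_F} & G_{Z_F Z_M} & G_{Z_F C} \\ G_{Z_M Z_F} & G_{Z_M} & G_{Z_M C} \\ G_{C Z_F} & G_{C Z_M} & G_{C} \end{pmatrix}$$
Because each individual expresses only the trait of its own sex, the phenotypic covariances form sex‐specific submatrices:
$$\mathbf{P}_F = \begin{pmatrix} P_{Z_F} & 0 & P_{Z_F C} \\ 0 & 0 & 0 \\ P_{C Z_F} & 0 & P_{C} \end{pmatrix}$$
for females and
$$\mathbf{P}_M = \begin{pmatrix} 0 & 0 & 0 \\ 0 & P_{Z_M} & P_{Z_M C} \\ 0 & P_{C Z_M} & P_{C} \end{pmatrix}$$
for males.
Finally, we need selection vectors, which are the change in phenotype resulting from selection. Again, because of sex‐specific expression, they are subvectors:
$$\mathbf{S}_F = \begin{pmatrix} S_{Z_F} \\ 0 \\ S_{(C)F} \end{pmatrix}$$
for females and
$$\mathbf{S}_M = \begin{pmatrix} 0 \\ S_{Z_M} \\ S_{(C)M} \end{pmatrix}$$
for males, where $S_{Z_F}$ etc. are the selection differentials, or the mean before selection, $\bar{Z}_F$ etc., subtracted from the mean after selection but before reproduction, $\bar{Z}_F^{*}$ etc. (i.e. $S_{Z_F} = \bar{Z}_F^{*} - \bar{Z}_F$).
The overall response to selection will then be:
$$\mathbf{R} = \begin{pmatrix} \Delta\bar{Z}_F \\ \Delta\bar{Z}_M \\ \Delta\bar{C} \end{pmatrix} = \tfrac{1}{2}\,\mathbf{G}\,\mathbf{P}_F^{-1}\mathbf{S}_F + \tfrac{1}{2}\,\mathbf{G}\,\mathbf{P}_M^{-1}\mathbf{S}_M$$
where $\Delta\bar{Z}_F = \bar{Z}_F' - \bar{Z}_F$ etc. are the response to selection, and $\bar{Z}_F'$ etc. are the mean of the trait in the offspring.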
Because the sex ratio is 1:1, exactly half the selection is on males and half of the selection is on females. In the more general case of class structure, selection on each class would be weighted by its frequency (Via & Lande, 1985). In the case of social insects, selection on the workers would be zero as they do not reproduce; however, they would evolve as a result of correlated responses to selection on individual and contextual traits affecting the reproductive class.
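The class weighting described above can be sketched numerically. This is a minimal illustration under stated assumptions, not Via & Lande's (1985) model itself: it assumes a Lande‐style response in which the change in mean phenotype is G multiplied by the sum of each class's selection gradient weighted by that class's weight, and it uses a hypothetical two‐trait G matrix for a queen‐expressed trait and a worker‐expressed trait, with hypothetical class weights. Workers receive a zero gradient because they do not reproduce, yet the worker trait still responds through its genetic covariance with the queen trait.

```python
def matvec(m, v):
    """Multiply matrix m (list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def class_weighted_response(g, classes):
    """Response = G times the weighted sum of class-specific selection gradients.

    `classes` maps a class name to (weight, gradient vector); both are hypothetical.
    """
    n = len(g)
    weighted = [sum(f * beta[i] for f, beta in classes.values()) for i in range(n)]
    return matvec(g, weighted)


# Hypothetical additive genetic (co)variances for [queen trait, worker trait].
G = [[0.5, 0.2],
     [0.2, 0.4]]
classes = {
    "queens": (0.1, [0.3, 0.0]),   # reproductive class: selection on the queen trait only
    "workers": (0.9, [0.0, 0.0]),  # non-reproductive class: no selection
}
response = class_weighted_response(G, classes)
print(response)
```

The worker‐trait entry of the response is nonzero only because of the off‐diagonal genetic covariance; setting that covariance to zero would leave the worker trait unchanged.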
The details of how the additive genetic variance for a contextual trait is measured remain to be fully explored; however, these issues are well explored in the IGE literature (e.g. Bijma, 2011, 2014). A theoretical integration of contextual analysis and IGEs would presumably solve most, if not all, of these issues. Other than adding contextual traits, the brief model presented here is identical to Lande's model of sexual dimorphism (Lande, 1980) and Via & Lande's (1985) model of phenotypic plasticity.
This model addresses two issues that Gardner (2015) raises. First, by considering selection on the classes separately while recognizing that they share a common G matrix, it addresses the problem of class structure. I think it is fair to say that this has not been addressed in the MLS literature because the only work using contextual analysis in social insects was performed shortly after the method was introduced (e.g. Herbers & Banschbach, 1999) and did not address the issue of class structure. The second issue this model addresses is that it shows how contextual traits can be incorporated even in very small populations. Gardner (2015) accepts that an inclusive fitness approach would have no problem with the situation he addresses. Given that contextual analysis uses the same equations as the direct fitness approach, it should come as no surprise that contextual analysis would be equally capable of handling the situation.
Conclusions
Goodnight & Stevens (1997) identified two approaches to the study of MLS, which can be called the ‘adaptation’ approach and the ‘evolutionary change’ approach. In the adaptation approach, a trait of interest is identified, and plausible pathways by which it may have evolved are investigated. Kin selection is a part of this tradition (e.g. Maynard Smith, 1964; Williams, 1966), which is reflected by the fact that kin selection is an optimization approach in that the phenotype or phenotypes with the highest overall fitness are identified. In contrast, the evolutionary change approach is closely related to quantitative genetics and to a lesser extent population genetics. In this approach, the process of selection is observed, and the rate of change due to an experimentally applied selection pressure (e.g. group selection experiments) or the strength of selection in nature (e.g. field experiments) is measured. The MLS approach grew out of this tradition and indeed can be viewed as an extension of quantitative genetics.
The differences in these two approaches result in large differences in the language and literature of the two fields. In kin selection research, the focus is almost exclusively on identifying the adaptive forces that lead to the evolution of social traits such as altruism that cannot be explained by individual selection; in MLS research, the focus is on ongoing processes, and situations where group and individual selection are acting in opposition are rarely given special attention. Perhaps due to their different histories, the two approaches also tend to differ in how genetic effects are modelled. Kin selection models typically use an explicit genetic framework, with fitness being maximized relative to some measure of genes or genotypes. In contrast, MLS approaches typically take a quantitative genetic framework, emphasizing multivariate selection measured as changes in phenotype within a generation. In the MLS approach, genetics are encompassed in the genetic covariance matrix, which is important in the response to selection but not in selection itself.
Unfortunately, these differences in approach and language have tended to isolate the kin selection and MLS traditions. In this review, I have highlighted some of the areas where MLS oriented research can inform research based on the kin selection tradition.
The main result from the MLS laboratory research is that group selection works, a point that is beyond question, and has been so for nearly 40 years (Wade, 1977). As we know that group selection works, instead of developing models to test whether group selection is effective, we should be developing models that explore why it is so effective. A second point is that we actually do know why group selection works (Goodnight, 1990a,b): there is unequivocal evidence that group selection acts on TBV, which includes IGEs (Bijma & Wade, 2008). Any model that fails to include IGEs will miss the essence of why MLS is interesting and important. Third, contextual analysis, which is very similar to the direct fitness approach of Taylor & Frank (1996) (Hamilton, 1964; Goodnight, 2013a), provides us with a valid statistical methodology for measuring selection in natural populations. To the extent that contextual analysis has been used, it suggests that MLS may be much more common in nature than we previously suspected.
This failure of communication between these two approaches has had the consequence that important results from the MLS literature do not inform research following the kin selection tradition. The opposite is probably also the case, although I am not the one to make that judgment. The result is that researchers are unaware of how important the results of the other field are to their own research. This is seen clearly in Gardner's (2015) paper, but indeed it can be seen in many of the papers coming out of the kin selection tradition. I have attempted to show a few places where the results coming out of the MLS tradition can directly inform kin selection models, and in the process, these results can hopefully promote an intellectual fusion of these fields that will benefit our overall understanding of evolution in structured populations.
Acknowledgments
I thank M. Ritchie for suggesting that I write this paper, and L. Higgins and R. de Brito for helpful discussions; M. J. Wade for useful comments on the manuscript; and P. Bijma and an anonymous reviewer for their careful review of earlier versions of this manuscript. This work was performed while in residence at the Universidade Federal de São Carlos, Brazil, in the laboratory of R. de Brito, and supported by FAPESP grant number 2014/04455‐5 to R. de Brito.
Appendix
Note added in proof
In the response to my comment concerning his paper, Gardner identified an error in my model for separate selection on males and females. I am grateful to Dr. Gardner for pointing out that the separate male and female phenotypic covariance matrices cannot be inverted. In addition, my original formulation indicated that selection on the contextual trait in one sex did not affect the individual trait in the other sex, which is not true.
To correct this, we need a single phenotypic covariance matrix that combines males and females. However, because any one individual can only express the trait for one sex, there can be no phenotypic covariance between these two traits. Thus, the phenotypic covariance matrix becomes:
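$$\mathbf{P} = \begin{pmatrix} P_{Z_F} & 0 & P_{Z_F C} \\ 0 & P_{Z_M} & P_{Z_M C} \\ P_{C Z_F} & P_{C Z_M} & P_{C} \end{pmatrix}$$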
This matrix will be invertible under normal circumstances.
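What I had listed as the S vectors, or selection differentials, in this paper are actually selection gradients, that is, vectors of partial regressions of relative fitness on the phenotypic traits for females and males (βF and βM, respectively). Thus, the vector of partial regressions for females is:
$$\boldsymbol{\beta}_F = \begin{pmatrix} \beta_{Z_F} \\ 0 \\ \beta_{(C)F} \end{pmatrix}$$
and for males it is:
$$\boldsymbol{\beta}_M = \begin{pmatrix} 0 \\ \beta_{Z_M} \\ \beta_{(C)M} \end{pmatrix}$$
where $\beta_{(C)F}$ and $\beta_{(C)M}$ are the partial regressions on the contextual trait in the female and male, respectively. The selection vector for females becomes: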
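$$\mathbf{S}_F = \mathbf{P}\boldsymbol{\beta}_F = \begin{pmatrix} P_{Z_F}\beta_{Z_F} + P_{Z_F C}\beta_{(C)F} \\ P_{Z_M C}\beta_{(C)F} \\ P_{C Z_F}\beta_{Z_F} + P_{C}\beta_{(C)F} \end{pmatrix}$$
and the selection vector for males becomes:
$$\mathbf{S}_M = \mathbf{P}\boldsymbol{\beta}_M = \begin{pmatrix} P_{Z_F C}\beta_{(C)M} \\ P_{Z_M}\beta_{Z_M} + P_{Z_M C}\beta_{(C)M} \\ P_{C Z_M}\beta_{Z_M} + P_{C}\beta_{(C)M} \end{pmatrix}$$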
There are two things to note here. First, the sex‐specific traits can change due to selection on the shared contextual trait. Second, even though the opposite sex trait is changing, it, of course, will not be expressed in the sex under selection. Thus, we need the combined selection vector for both sexes:
$$\mathbf{S} = \tfrac{1}{2}\left(\mathbf{S}_F + \mathbf{S}_M\right)$$
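A small numerical check of the corrected formulation, using hypothetical values for the phenotypic and genetic covariance matrices and for the gradients (none are estimates from any real system). It assumes a 1:1 sex ratio, so that the combined selection vector is the average of the female and male vectors, and assumes a Lande‐style response, G times the inverse of P times the combined selection vector. It confirms that the combined P is invertible, that female selection on the contextual trait puts a nonzero entry for the male trait into the female selection vector, and that solving P against the combined vector recovers the average of the two sexes' gradients. It also shows that a female‐only matrix padded with a zero row and column for the unexpressed male trait is singular.

```python
def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x


# Trait order: [Z_F, Z_M, C]. Hypothetical values; P has no Z_F-Z_M covariance
# because no individual expresses both sex-specific traits.
P = [[1.0, 0.0, 0.3],
     [0.0, 1.0, 0.4],
     [0.3, 0.4, 0.8]]
G = [[0.4, 0.1, 0.2],
     [0.1, 0.3, 0.15],
     [0.2, 0.15, 0.5]]
beta_F = [0.2, 0.0, 0.1]     # [beta_ZF, 0, beta_(C)F]
beta_M = [0.0, 0.15, -0.05]  # [0, beta_ZM, beta_(C)M]

S_F = matvec(P, beta_F)  # middle entry is P_ZM,C * beta_(C)F, which is nonzero
S_M = matvec(P, beta_M)
S = [0.5 * (sf + sm) for sf, sm in zip(S_F, S_M)]  # assumed 1:1 sex ratio

beta_bar = solve(P, S)   # recovers (beta_F + beta_M) / 2
R = matvec(G, beta_bar)  # response, assuming R = G P^-1 S

# A female-only matrix padded with zeros for the unexpressed male trait is singular.
P_F_padded = [[1.0, 0.0, 0.3],
              [0.0, 0.0, 0.0],
              [0.3, 0.0, 0.8]]
try:
    solve(P_F_padded, S_F)
    padded_invertible = True
except ValueError:
    padded_invertible = False

print("S_F =", S_F)
print("R =", R)
print("padded female matrix invertible:", padded_invertible)
```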