The path method … is not so much concerned with prediction as [it is with] the proposal of a plausible interpretation of the relationships between the variables. In other words, path analysis is concerned with erecting a causal structure compatible with the observed data (Li, 1975, p. 3).

**Journal of Evolutionary Biology**

# Natural selection. VI. Partitioning the information in fitness and characters by path analysis^{†}

^{†}Part of the Topics in Natural Selection series. See Box 1.

Correspondence: Steven A. Frank, Department of Ecology and Evolutionary Biology, University of California, Irvine, CA 92697–2525, USA. Tel.: +1 949 824 2244; fax: +1 949 824 2181; e-mail: safrank@uci.edu

## Abstract

Three steps aid in the analysis of selection. First, describe phenotypes by their component causes. Components include genes, maternal effects, symbionts and any other predictors of phenotype that are of interest. Second, describe fitness by its component causes, such as an individual's phenotype, its neighbours’ phenotypes, resource availability and so on. Third, put the predictors of phenotype and fitness into an exact equation for evolutionary change, providing a complete expression of selection and other evolutionary processes. The complete expression separates the distinct causal roles of the various hypothesized components of phenotypes and fitness. Traditionally, those components are given by the covariance, variance and regression terms of evolutionary models. I show how to interpret those statistical expressions with respect to information theory. The resulting interpretation allows one to read the fundamental equations of selection and evolution as sentences that express how various causes lead to the accumulation of information by selection and the decay of information by other evolutionary processes. The interpretation in terms of information leads to a deeper understanding of selection and heritability, and a clearer sense of how to formulate causal hypotheses about evolutionary process. Kin selection appears as a particular type of causal analysis that partitions social effects into meaningful components.

## Introduction

Populations accumulate information by natural selection. The amount of information may be expressed by classical information theory (Frank, 2012b). That purely informational expression describes phenotypes and fitness abstractly, without consideration of the explicit causes that determine phenotypic traits and their association with fitness. Here, I partition phenotypes and fitness into their component causes.

For phenotypes, we must track the influence of genes, symbionts, maternal effects and other potential causes. The components of phenotype lead to explicit models of character expression and heritability. For fitness, we must track how different characters and external forces combine to determine success. An individual's fitness may, for example, depend on a combination of its own phenotype and the phenotypes of its neighbours.

I put those explicit causal components of phenotype and fitness into the fundamental expressions of selection and evolutionary change. I recover an expanded concept of heritability, a precise understanding of Fisher's fundamental theorem and a general form of the equations of selection for multiple characters. With those tools, the following article clarifies kin selection and other social processes (Frank, 2013).

### Box 1. Topics in the theory of natural selection

This article is part of a series on natural selection. Although the theory of natural selection is simple, it remains endlessly contentious and difficult to apply. My goal is to make more accessible the concepts that are so important, yet either mostly unknown or widely misunderstood. I write in a nontechnical style, showing the key equations and results rather than providing full derivations or discussions of mathematical problems. Boxes list technical issues and brief summaries of the literature.

I presented much of this material in Frank (1997b, 1998). Here, I pursue four goals. First, I express the key partitions of phenotypes and fitness with respect to my new information theory interpretation of selection (Frank, 2012b). Second, the information expressions translate the traditional regression and variance terms of selection into more meaningful descriptions of cause and consequence. Third, the partitions of phenotype and fitness provide the basis for replacing outdated concepts of kin selection with a solid conceptual foundation (in Frank, 2013). Fourth, I emphasize simplicity, presenting the mathematical material at the most basic level consistent with the concepts. The original publications contain more detail (Frank, 1997b, 1998).

Mathematically, little is required beyond simple forms of statistical regression and the location of points in coordinate systems. Although I use only basic mathematics, the article is nonetheless challenging. I cover a wide array of problems at a very general level, with emphasis on the connections between seemingly different topics. That sustained abstraction and synthesis provide both significant rewards and demanding challenges.

It may seem that the basic problems of selection and kin interactions were solved long ago. Why do we need to revisit those topics? In fact, our understanding of natural selection and kin selection has continued to advance over the past few decades. Those advances have developed while the old formulations have remained. The core of the subject has become cluttered with incompatible expressions from different eras, derived in different contexts. One can no longer go forward without first resetting the foundations.

## Selection

I briefly review the general equations for selection and evolution. Recent articles in this series provide full details (Frank, 2012a, b).

### The Price equation

Consider an initial population. Let $\bar{z}$ be the average in the population of some value (phenotype). A second population has average value $\bar{z}'$. Total change between the populations is $\Delta\bar{z} = \bar{z}' - \bar{z}$. Split the total change into two components

$$\Delta\bar{z} = \Delta_s\bar{z} + \Delta_c\bar{z}. \tag{1}$$

The first term, $\Delta_s\bar{z}$, is the part of the total change caused by selection. The second term, $\Delta_c\bar{z}$, is the remaining part of total change by all other causes.

To evaluate these terms, we write the average value as $\bar{z} = \sum_i q_i z_i$. The index *i* divides the population in any way that we choose. We may use *i* to label by different individuals, by different groups, by genotype or by any other partition of the population. The frequency of a type *i* in the population is $q_i$. The phenotype associated with *i* is $z_i$. The average value in the second population is $\bar{z}' = \sum_i q_i' z_i'$.

We define selection as changes in frequency, holding constant phenotype

$$\Delta_s\bar{z} = \sum_i q_i' z_i - \sum_i q_i z_i = \sum_i (q_i' - q_i) z_i.$$

Here, the populations differ in their frequencies, $q_i'$ vs. $q_i$, but we have held the phenotype values constant at $z_i$ in both populations. Using $\Delta q_i = q_i' - q_i$ for frequency change, we write

$$\Delta_s\bar{z} = \sum_i \Delta q_i z_i. \tag{2}$$

To obtain the total change, we need the changes in phenotype holding constant the frequencies

$$\Delta_c\bar{z} = \sum_i q_i' z_i' - \sum_i q_i' z_i = \sum_i q_i' (z_i' - z_i).$$

Here, the populations differ in their phenotype, $z_i'$ vs. $z_i$, but we have fixed the frequency at $q_i'$. We use the final frequencies in the second population, $q_i'$, because they provide the proper reference for final phenotype after change (Box 2). Using $\Delta z_i = z_i' - z_i$ for phenotypic changes, we write

$$\Delta_c\bar{z} = \sum_i q_i' \Delta z_i.$$

The total change from eqn (1) can now be written (Box 2) as a form of the Price equation

$$\Delta\bar{z} = \sum_i \Delta q_i z_i + \sum_i q_i' \Delta z_i. \tag{3}$$
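The partition of total change into a selection term and a term for all other causes can be checked numerically. The following is a minimal sketch, not part of the original article: the function name `price_partition` and all numbers are hypothetical, chosen only to illustrate that the two terms sum exactly to the total change in the average.

```python
def price_partition(q, q_new, z, z_new):
    """Return (total, selection, other) for the Price equation partition."""
    mean_old = sum(qi * zi for qi, zi in zip(q, z))
    mean_new = sum(qi * zi for qi, zi in zip(q_new, z_new))
    total = mean_new - mean_old
    # Selection: change in frequency, holding phenotype at its initial value.
    selection = sum((qn - qo) * zo for qo, qn, zo in zip(q, q_new, z))
    # Other causes: change in phenotype, weighted by the final frequencies.
    other = sum(qn * (zn - zo) for qn, zo, zn in zip(q_new, z, z_new))
    return total, selection, other


# Made-up values for three types (assumptions for illustration only).
q = [0.5, 0.3, 0.2]         # initial frequencies
q_new = [0.4, 0.35, 0.25]   # frequencies in the second population
z = [1.0, 2.0, 4.0]         # initial phenotypes
z_new = [1.1, 1.9, 4.2]     # phenotypes in the second population

total, selection, other = price_partition(q, q_new, z, z_new)
print(total, selection + other)  # the two numbers agree up to rounding
```

The identity holds for any frequencies and phenotypes, because it is only the expansion of a difference of a product.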

### Box 2. Price equation: difference of a product

The Price equation simply expands a difference into multiple terms. Consider, for example, the difference of the product of *x* and *y*, which we write as $\Delta(xy) = x'y' - xy$. We can expand the difference of the product as

$$\Delta(xy) = (x + \Delta x)(y + \Delta y) - xy,$$

which yields

$$\Delta(xy) = (\Delta x)y + x(\Delta y) + \Delta x\,\Delta y.$$

This expression shows that the difference of a product is the difference of the first term holding the second term constant, plus the difference of the second term holding the first term constant, plus the product of the two differences.

We can simplify the difference expansion by combining a pair of terms on the right-hand side. Noting that $x + \Delta x = x'$, we can combine the last two terms into one, yielding

$$\Delta(xy) = (\Delta x)y + x'\Delta y.$$

The derivation of the Price equation follows the rule for the difference of a product

$$\Delta\bar{z} = \Delta\Big(\sum_i q_i z_i\Big) = \sum_i \Delta q_i z_i + \sum_i q_i' \Delta z_i.$$

The value of the Price equation arises from identifying $\Delta_s\bar{z} = \sum_i \Delta q_i z_i$ as the part of total change caused by selection. Selection acts on phenotype at a fixed point in time, so it makes sense to consider selection as the partial difference in frequency holding phenotype constant. When we use log fitness for the phenotype, *m* ≡ *z*, we get an exact correspondence between the selection term and the increase in information expressed by classical information theory (eqn (8)). That correspondence supports interpreting $\Delta_s\bar{z}$ as selection.

### Classical expressions of covariance, regression and variance

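The definition of fitness is

$$q_i' = q_i \frac{w_i}{\bar{w}}, \tag{4}$$

where $w_i$ is the fitness of type *i*, and $\bar{w} = \sum_i q_i w_i$ is average fitness. The change in frequency is

$$\Delta q_i = q_i' - q_i = q_i\left(\frac{w_i}{\bar{w}} - 1\right).$$

Thus, the change caused by selection can be written as a covariance between fitness and phenotype

$$\Delta_s\bar{z} = \sum_i \Delta q_i z_i = \sum_i q_i\left(\frac{w_i}{\bar{w}} - 1\right) z_i = \frac{\mathrm{Cov}(w,z)}{\bar{w}}.$$

We can rewrite a covariance as a product of a regression coefficient and a variance term

$$\Delta_s\bar{z} = \frac{\mathrm{Cov}(w,z)}{\bar{w}} = \frac{\beta_{zw}V_w}{\bar{w}}, \tag{6}$$

where $\beta_{zw}$ is the regression of phenotype, *z*, on fitness, *w*, and $V_w$ is the variance in fitness. Selection equations are often expressed with these covariance, regression and variance terms. Classical population genetics expressions for change in gene frequency also have this form, in which we let $\bar{z}$ be the frequency of a gene in a population.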
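The equivalence of the frequency-change, covariance and regression forms of the selection term can be verified directly. This sketch assumes that fitness is defined so that each new frequency equals the old frequency multiplied by fitness divided by average fitness; the function name `selection_forms` and the numerical values are hypothetical.

```python
def selection_forms(q, w, z):
    """Return the selection term computed three ways (all should agree)."""
    wbar = sum(qi * wi for qi, wi in zip(q, w))
    zbar = sum(qi * zi for qi, zi in zip(q, z))
    # New frequencies implied by the assumed definition of fitness.
    q_new = [qi * wi / wbar for qi, wi in zip(q, w)]
    # Form 1: sum of frequency changes multiplied by phenotype.
    direct = sum((qn - qo) * zi for qo, qn, zi in zip(q, q_new, z))
    # Form 2: frequency-weighted covariance divided by average fitness.
    cov_wz = sum(qi * (wi - wbar) * (zi - zbar) for qi, wi, zi in zip(q, w, z))
    # Form 3: regression of z on w, multiplied by the variance in fitness.
    var_w = sum(qi * (wi - wbar) ** 2 for qi, wi in zip(q, w))
    beta_zw = cov_wz / var_w
    return direct, cov_wz / wbar, beta_zw * var_w / wbar


# Made-up values for three types (assumptions for illustration only).
q = [0.5, 0.3, 0.2]
w = [2.0, 3.0, 1.0]
z = [1.0, 2.0, 4.0]
print(selection_forms(q, w, z))  # three equal numbers, up to rounding
```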

## Information

Frank (2012b) showed that selection can be expressed in terms of information theory. I briefly review the key points in this section.

### Fitness and the gain in encoded information

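Fitness, *w*, describes relative changes in frequency. Logarithms provide the natural scaling for relative changes. Using the expression for fitness in eqn (4), we write log fitness as

$$m_i = \log\left(\frac{w_i}{\bar{w}}\right) = \log\left(\frac{q_i'}{q_i}\right).$$

Using *z* ≡ *m* in the expression for selection (eqn (2)), we have

$$\Delta_s\bar{m} = \sum_i \Delta q_i m_i = \sum_i \Delta q_i \log\left(\frac{q_i'}{q_i}\right).$$

The classic information theory expression for the change in encoded information between two populations with frequencies $q'$ and *q* is

$$\mathcal{D}(q' \,\|\, q) = \sum_i q_i' \log\left(\frac{q_i'}{q_i}\right).$$

With that definition, we have

$$\Delta_s\bar{m} = \mathcal{D}(q' \,\|\, q) + \mathcal{D}(q \,\|\, q'),$$

in which the right-hand side is known as the Jeffreys information divergence, *J*. Thus, we can write the fundamental expression for the accumulation of information by natural selection as

$$\Delta_s\bar{m} = J. \tag{8}$$

Because *z* in eqn (6) is just a placeholder for any character, we can use *m* in place of *z* in that equation, yielding

$$\Delta_s\bar{m} = \frac{\beta_{mw}V_w}{\bar{w}}.$$

Thus, the information accumulated by natural selection, *J*, is equivalently expressed in terms of the regression coefficient and the variance,

$$J = \frac{\beta_{mw}V_w}{\bar{w}}. \tag{9}$$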
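A short numerical check may help to connect the three descriptions of information gain: the change in average log fitness caused by selection, the Jeffreys divergence between the two frequency distributions, and the regression-variance form. The sketch below assumes that log fitness is the logarithm of the ratio of new to old frequency and that all frequencies and fitnesses are positive; the function name `information_gain` and the data are hypothetical.

```python
import math


def information_gain(q, w):
    """Return (selection change in mean log fitness, Jeffreys divergence,
    regression-variance form); the three values should agree."""
    wbar = sum(qi * wi for qi, wi in zip(q, w))
    q_new = [qi * wi / wbar for qi, wi in zip(q, w)]
    # Log fitness on the scale of relative frequency change.
    m = [math.log(qn / qo) for qo, qn in zip(q, q_new)]
    delta_m = sum((qn - qo) * mi for qo, qn, mi in zip(q, q_new, m))
    # Kullback-Leibler divergences in both directions; their sum is J.
    kl_new_old = sum(qn * math.log(qn / qo) for qo, qn in zip(q, q_new))
    kl_old_new = sum(qo * math.log(qo / qn) for qo, qn in zip(q, q_new))
    jeffreys = kl_new_old + kl_old_new
    # Regression of log fitness on fitness, times variance in fitness.
    mbar = sum(qi * mi for qi, mi in zip(q, m))
    cov_wm = sum(qi * (wi - wbar) * (mi - mbar) for qi, wi, mi in zip(q, w, m))
    var_w = sum(qi * (wi - wbar) ** 2 for qi, wi in zip(q, w))
    beta_mw = cov_wm / var_w
    return delta_m, jeffreys, beta_mw * var_w / wbar


# Made-up values for three types (assumptions for illustration only).
print(information_gain([0.5, 0.3, 0.2], [2.0, 3.0, 1.0]))
```

The Jeffreys divergence is symmetric in the two populations and is positive whenever fitness varies among types.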

### Variance, regression and information

The variance in fitness, $V_w$, is proportional to the information gain by natural selection, *J* (eqn (9)). It is easy to understand why selection may be expressed in terms of information. Selection is, in essence, a process by which populations gain information about the environment. But, why should the variance arise as an alternative description of selection?

The usual view is that selection acts on differences within the population. The greater the differences, the larger the variance and the greater the opportunity for selection. But, why exactly is the variance the correct measure of differences within the population, rather than some other measure of variation?

Consider the definition of fitness (eqn (4)) given earlier

$$\frac{w_i}{\bar{w}} = \frac{q_i'}{q_i},$$

in which the relative fitness is the ratio of frequencies between the new and old population. Relative fitness is, in essence, a measure of the separation between the new population and the old population, a comparison of $q'$ vs. *q*. Because the frequencies in each population must add to one, each separation between a pair $q_i'$ and $q_i$ must be balanced by opposite separations in other pairs.

Thus, the variation in the ratios $q_i'/q_i$ measures the total separation of the new population from the old population. In particular, the variance in those ratios – the variance in fitness – is like a distance between the new population and the old population. That distance-like measure has units in terms of the information gain (Frank, 2012b). The variance in fitness expresses an informational distance, the amount of information gained by selection.

Information gain is measured on the logarithmic scale of frequency changes (eqn (7)). The regression coefficient, $\beta_{mw}$, transforms fitness from the linear scale, *w*, to the log scale, *m*, yielding the key expression given earlier for the change in log fitness (information) caused by selection

$$\Delta_s\bar{m} = J = \frac{\beta_{mw}V_w}{\bar{w}}.$$

It is common to think of a regression coefficient as a linear prediction estimated from data. That interpretation misleads with regard to understanding the fundamental equations of selection. Instead, the regression coefficient describes the consequence for the change in average value when transforming from one scale to another scale (Boxes 3 and 4). The proper way to read $\beta_{mw}$ is a change in scale from *w* to *m* when evaluating the averages $\bar{w}$ and $\bar{m}$.

### Box 3. Regression

Simple regression is based on the equation for a line

$$z = a + \beta y,$$

in which *z* is the outcome of interest, *y* is a variable that is used to predict *z*, the term *β* is the slope of the line relating *z* to *y* and *a* is the intercept, which is the value of *z* when *y* = 0. The simple regression model is usually written as

$$z_i = a + \beta y_i + \delta_i,$$

in which the *i* subscripts denote values associated with different observations, and $\delta_i$ is the residual as described below. In some applications, it is convenient to make the intercept *a* disappear, which we achieve by $x_i = y_i + a/\beta$, which gives

$$z_i = \beta x_i + \delta_i.$$

This expression is equivalent to the previous one. The only change is that *x* differs from *y* by a constant value. The second expression uses $\beta x_i$ in place of $a + \beta y_i$. Those terms have the same value, but I use the term with *x* to emphasize that the relation is now between *z* and *x*. In any regression model, we can make a similar substitution in which we shift *y* by a constant value to get an *x* value that makes the intercept disappear.

From the perspective of regression analysis, $\beta x_i$ provides a prediction of *z* given *x*. The difference between the actual value and the predicted value is the residual (error), $\delta_i = z_i - \beta x_i$. Two changes in notation provide a cleaner expression. Write the regression coefficient as $b \equiv \beta$, and drop the *i* subscript, yielding

$$z = bx + \delta,$$

where the variables implicitly range over *i*.

Regression has a natural asymmetry. In prediction, the value of *z* is the predicted value given the predictor, *x*. In a causal interpretation, in the sense of path analysis (Box 5), the effect *z* depends on the cause, *x*. One must keep this asymmetry in mind to interpret regression equations correctly. Proper notation helps. We may write

$$z \mid x = bx + \delta,$$

which emphasizes that the outcome, *z*, depends on the given fixed value of *x*. We read *z* | *x* as ‘*z* given *x*’. If we take the average of both sides

$$\mathrm{E}(z \mid x) = bx,$$

where E(*z* | *x*) is the expectation of *z* given *x*, in which ‘expectation’ means the average value. On the right side, *δ* disappears because the regression coefficient, *b*, is chosen so that the average value of the residual is zero, $\mathrm{E}(\delta) = 0$.
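The removal of the intercept by shifting the predictor can be illustrated with a small least-squares fit. This is a sketch under stated assumptions: the helper `fit_line`, the unweighted observations and the data values are all hypothetical, and the shift requires a nonzero slope.

```python
def fit_line(y, z):
    """Ordinary least squares for z = a + beta * y; returns (a, beta)."""
    n = len(y)
    y_mean = sum(y) / n
    z_mean = sum(z) / n
    beta = (sum((yi - y_mean) * (zi - z_mean) for yi, zi in zip(y, z))
            / sum((yi - y_mean) ** 2 for yi in y))
    a = z_mean - beta * y_mean
    return a, beta


# Made-up observations (assumptions for illustration only).
y = [1.0, 2.0, 3.0, 4.0, 5.0]
z = [2.1, 3.9, 6.2, 7.8, 10.1]

a, beta = fit_line(y, z)
# Shift the predictor by a constant so that the intercept disappears.
x = [yi + a / beta for yi in y]
# Residuals from the intercept-free form z = beta * x + delta.
delta = [zi - beta * xi for zi, xi in zip(z, x)]

mean_delta = sum(delta) / len(delta)
x_mean = sum(x) / len(x)
cov_x_delta = sum((xi - x_mean) * (di - mean_delta)
                  for xi, di in zip(x, delta)) / len(x)
print(mean_delta, cov_x_delta)  # both are zero up to rounding
```

The residuals average to zero and are uncorrelated with the predictor, the two least-squares properties that the later partitions of phenotype rely on.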

### Box 4. Change in scale

In the regression model (Box 3) with subscripts used explicitly for labelling types

$$\mathrm{E}(z_i \mid x_i) = b x_i.$$

If we consider subscripts for two different types, *k* and *i*, we can write $\mathrm{E}(z_k \mid x_k) = b x_k$ and $\mathrm{E}(z_i \mid x_i) = b x_i$. Subtracting these two equations from each other gives

$$\mathrm{E}(z_k \mid x_k) - \mathrm{E}(z_i \mid x_i) = b(x_k - x_i).$$

Using Δ to denote a change between the *k* and *i* values

$$\mathrm{E}(\Delta z \mid \Delta x) = b\,\Delta x,$$

which we can write equivalently as

$$b = \frac{\mathrm{E}(\Delta z \mid \Delta x)}{\Delta x},$$

which we read as: ‘the regression of *z* on *x* is the expected change in *z* for a given change in *x* divided by the change in *x*’. From this expression, we see that a regression coefficient is the expected change in scale for one variable in relation to another variable. One can also think of the regression coefficient as a sort of generalization of differentiation. For situations in which we can consider *z* and *x* as continuous variables with an underlying functional relationship, *z*(*x*), it will often be the case that as the changes become small, Δ*z* → 0 and Δ*x* → 0 with *x* confined to a small range of values, then the regression coefficient approaches the derivative, $b \to \mathrm{d}z/\mathrm{d}x$.
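The claim that a regression coefficient approaches a derivative over a small range can be illustrated with a simple function. The sketch below is an assumption-laden example rather than the author's own: it uses a hypothetical cubic relation between *z* and *x*, three equally weighted points around a chosen value, and ordinary least squares.

```python
def regression_slope(x, z):
    """Least-squares slope of z on x for equally weighted observations."""
    n = len(x)
    x_mean = sum(x) / n
    z_mean = sum(z) / n
    cov = sum((xi - x_mean) * (zi - z_mean) for xi, zi in zip(x, z)) / n
    var = sum((xi - x_mean) ** 2 for xi in x) / n
    return cov / var


def cube(x):
    """Hypothetical nonlinear relation z(x) = x**3, with derivative 3*x**2."""
    return x ** 3


def slope_near(x0, h, f):
    """Regression slope of f over three points spaced h apart around x0."""
    xs = [x0 - h, x0, x0 + h]
    return regression_slope(xs, [f(x) for x in xs])


for h in (1.0, 0.1, 0.01):
    # As the range shrinks, the slope approaches the derivative at x0 = 2,
    # which is 3 * 2**2 = 12.
    print(h, slope_near(2.0, h, cube))
```

Over a wide range the slope reflects the average change in scale across that range; only as the range narrows does it converge on the local derivative.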

Finally, the variables *x* and *δ* are uncorrelated, so that Cov(*x*,*δ*) = 0. Regression uses all of the available information in *x* about *z*. Thus, any left over deviations, *δ*, cannot contain information about *z*, which is reflected in the lack of correlation between those variables.

When we have multiple predictors, or causes, $x_1, x_2, \ldots, x_n$, then the regression equation is

$$z = b_1 x_1 + b_2 x_2 + \cdots + b_n x_n + \delta,$$

where each $b_j$ is the partial regression of *z* on $x_j$, holding constant the other predictor values. Suppose, for example, that we have two predictors, $x_1$ and $x_2$. For notational convenience, let $x \equiv x_1$ and $y \equiv x_2$, so that the regression equation is

$$z = b_{zx\cdot y}\,x + b_{zy\cdot x}\,y + \delta.$$

If, as above, we take the difference between two *x* values, holding *y* constant, we obtain

$$b_{zx\cdot y} = \frac{\mathrm{E}(\Delta z \mid \Delta x, y)}{\Delta x},$$

which we read as: ‘the regression of *z* on *x*, holding *y* constant, is the expected change in *z* for a given change in *x* and a fixed value of *y*, divided by the change in *x*’. This expression gives the expected change in scale between *z* and *x* for a given value of *y*. If *z*, *x* and *y* are continuous variables with an underlying functional relationship, *z*(*x*,*y*), then for small changes confined to a small range of predictor values for *x* and *y*, it will often be the case that the regression approaches the partial derivative $\partial z/\partial x$.

These properties of regression follow from least squares. The squared distance between predicted and observed values is the sum of squares, $\sum_i \delta_i^2$. Minimizing that distance gives the least value for the sum of squares – the least squares. All properties here follow from that minimization. Further aspects of regression depend on other assumptions. For example, many tests of statistical significance assume that the residuals have a normal distribution. Certain interpretations require that the observations be linearly related to the predictors. I do not use those further aspects and therefore do not require any assumptions about linearity or the distribution of observations and residuals.

### Phenotype as a change in the scaling of information

Selection causes populations to accumulate information. The measure of information is related to log fitness. In the analysis of selection, we often focus on phenotypes rather than fitness. Here, I show that, with respect to selection, one can think of the phenotypic scale simply as an alternative scale on which to measure information.

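Begin with the expression given earlier for the change in log fitness

$$\Delta_s\bar{m} = \frac{\beta_{mw}V_w}{\bar{w}}.$$

The regression coefficient, $\beta_{mw}$, changes scale from fitness, *w*, to log fitness, *m*. If we divide by $\beta_{mw}$, we obtain

$$\frac{\Delta_s\bar{m}}{\beta_{mw}} = \frac{V_w}{\bar{w}}.$$

The factor $1/\beta_{mw}$ reverses the scale change, transforming from the logarithmic scale, *m*, to the linear scale, *w*.

The change in phenotype from eqn (6) can be written as

$$\Delta_s\bar{z} = \frac{\beta_{zw}V_w}{\bar{w}}.$$

The regression $\beta_{zw}$ changes scale from fitness, *w*, to phenotype, *z*, and $1/\beta_{mw}$ reverses the direction of the change in scale. Thus

$$\Delta_s\bar{z} = \frac{\beta_{zw}}{\beta_{mw}}\,\Delta_s\bar{m}.$$

Because the information accumulated by natural selection is $\Delta_s\bar{m} = J$, we have

$$\Delta_s\bar{z} = \frac{\beta_{zw}}{\beta_{mw}}\,J.$$

This expression describes the change in phenotype by selection in relation to the information gain, *J*, rescaled by the transformation from the scale of information, *m*, to the scale of phenotype, *z*. We may describe the scaling between the gain in information, *J*, and change in phenotype caused by selection, $\Delta_s\bar{z}$, as

$$\gamma_{zm} = \frac{\beta_{zw}}{\beta_{mw}}.$$

Thus we can write the relation between the change in phenotype and the gain in information as

$$\Delta_s\bar{z} = \gamma_{zm}J.$$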
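The rescaling of information gain into phenotypic change can be checked with a few lines of arithmetic. In this sketch the scaling factor is taken to be the ratio of two regression coefficients, that of phenotype on fitness and that of log fitness on fitness; the function names and data are hypothetical, and fitness is assumed to vary among types so that both regressions are defined.

```python
import math


def wmean(q, v):
    """Frequency-weighted average."""
    return sum(qi * vi for qi, vi in zip(q, v))


def slope(q, x, y):
    """Frequency-weighted regression slope of y on x."""
    xbar, ybar = wmean(q, x), wmean(q, y)
    cov = sum(qi * (xi - xbar) * (yi - ybar) for qi, xi, yi in zip(q, x, y))
    var = sum(qi * (xi - xbar) ** 2 for qi, xi in zip(q, x))
    return cov / var


def scaling_check(q, w, z):
    """Return (direct selection change in z, rescaled information gain)."""
    wbar = wmean(q, w)
    q_new = [qi * wi / wbar for qi, wi in zip(q, w)]
    m = [math.log(qn / qo) for qo, qn in zip(q, q_new)]
    delta_z = sum((qn - qo) * zi for qo, qn, zi in zip(q, q_new, z))
    jeffreys = sum((qn - qo) * mi for qo, qn, mi in zip(q, q_new, m))
    # Scaling from the information scale, m, to the phenotype scale, z.
    gamma = slope(q, w, z) / slope(q, w, m)
    return delta_z, gamma * jeffreys


# Made-up values for three types (assumptions for illustration only).
print(scaling_check([0.5, 0.3, 0.2], [2.0, 3.0, 1.0], [1.0, 2.0, 4.0]))
```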

## Causes of phenotype

This section partitions the causes of phenotype into components. The next section connects the causes of phenotype to the capture and transmission of information. The following section partitions fitness into components, dividing the gain in information by selection into different causes. Boxes 3–6 provide background on regression. Box 7 provides citations to the literature.

### Overview

Heritability describes the expected similarity in phenotype between different individuals (Falconer & Mackay, 1996). For example, we may define the predictors of phenotype as the set of alleles in an individual, and the heritability as the part of similarity between ancestors and descendants ascribed to those alleles. Because sex and recombination break up particular combinations of alleles, adding up the effects of each individual allelic predictor often provides a good estimate of the similarity between different relatives caused by genetics.

Alternatively, we may expand the set of predictors to include certain nonlinear combinations of alleles. For example, we may have a predictor for the presence of allele A, another for the presence of allele B, and a third for the presence of both alleles. Certain expanded predictor sets may give a more accurate description of similarity between closely related ancestor–descendant pairs that are likely to share the allelic combinations, but may give a less accurate description when the allelic pairs tend to be broken up during transmission.

Here, I am primarily interested in the information that a population accumulates by selection, and how different processes may reduce or alter the transmission of accumulated information. My expressions include the classic genetic measures as special cases. But, I do not emphasize the connection to traditional genetics – the genetic interpretations are discussed in every basic textbook of genetics (Falconer & Mackay, 1996). Instead, I focus on general equations for selection and the transmission of information. In my expressions, any predictors can be used including, but not limited to, all of the traditional genetic forms.

Why bother with such abstractions? Because many extensions to basic genetic theory have been developed to cope with nongenetic effects or to analyse selection independently of genetics (Lynch & Walsh, 1998). The literature tends to deal with each particular problem as a novel challenge that requires special theory. For example, maternal effects, kin selection, cultural evolution and institutional evolution in economics all have their distinct literatures and ways of framing problems. Yet all of those problems are just examples of a general theory of selection and transmission. In any particular application, the key is to express the causes of phenotypes (characteristics) and the causes of fitness (success) by a model, or hypothesis, of how various predictors combine to determine outcome. A general theory expressed in terms of any choice of predictors defines the unifying conceptual framework (Frank, 1997b, 1998).

### Fisher's average effect

We can separate phenotype into components by

$$z_i = \sum_j b_j x_{ij} + \delta_i.$$

Each type, *i*, has *n* different associated values, $x_{ij}$ for $j = 1, \ldots, n$. From the perspective of multiple regression, the *x*'s are predictors, or independent variables, with respect to the phenotype, *z*. Each $b_j$ is a partial regression coefficient of *z* on $x_j$. Roughly speaking, a partial regression coefficient, $b_j$, describes the average change in phenotype, *z*, for a change in the associated predictor variable, $x_j$.

We often focus on the general relation of a phenotype, *z*, to its components, $x_j$, rather than on the particular phenotype, $z_i$, of a particular type, *i*, in relation to its particular components, $x_{ij}$. Thus, we may express the general relation between a phenotype and its components as

$$z = \sum_j b_j x_j + \delta,$$

in which one understands that the particular values of *z*, $x_j$ and *δ* vary for the different types, *i*, whereas the average effect of a predictor, $b_j$, is a property of the population.

The regression expression applies to any predictors, $x_j$. We could use temperature, neighbours’ behaviour, another phenotype, epistatic interactions given as the product of allelic values, symbiont characters or an individual's own genes. Fisher first presented this regression for phenotype in terms of alleles. Suppose each $x_j$ is the presence or absence of an allelic type. Then each $b_j$ describes the average contribution to phenotype for adding or subtracting the associated allelic type, and is called the average effect (Fisher, 1930; Crow & Kimura, 1970; Falconer & Mackay, 1996).

Predicted phenotype is

$$g = \sum_j b_j x_j. \tag{12}$$

In genetic contexts, *g* is often called the breeding value (Falconer & Mackay, 1996). Using *g*, we can partition phenotype into a predicted component and a residual component

$$z = g + \delta, \tag{13}$$

where *δ* = *z* − *g* is the difference between the actual value and the predicted value. If we take the average of both sides, we get $\bar{z} = \bar{g}$, because $\bar{\delta} = 0$.
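The partition of phenotype into a predicted value and a residual can be sketched for two presence-or-absence predictors. The example below is hypothetical throughout: four types, made-up frequencies and phenotypes with some epistasis, and a frequency-weighted least-squares fit solved from the two normal equations. As an assumption of this sketch, the intercept is absorbed into the predicted value by measuring predictors as deviations from their averages, which is one way to apply the shift described in Box 3.

```python
def wmean(q, v):
    """Frequency-weighted average."""
    return sum(qi * vi for qi, vi in zip(q, v))


def wcov(q, u, v):
    """Frequency-weighted covariance."""
    ubar, vbar = wmean(q, u), wmean(q, v)
    return sum(qi * (ui - ubar) * (vi - vbar) for qi, ui, vi in zip(q, u, v))


def average_effects(q, x1, x2, z):
    """Partial regression coefficients of z on x1 and x2 (normal equations)."""
    s11, s22, s12 = wcov(q, x1, x1), wcov(q, x2, x2), wcov(q, x1, x2)
    s1z, s2z = wcov(q, x1, z), wcov(q, x2, z)
    det = s11 * s22 - s12 ** 2
    b1 = (s1z * s22 - s2z * s12) / det
    b2 = (s2z * s11 - s1z * s12) / det
    return b1, b2


# Four hypothetical types: neither allele, A only, B only, both A and B.
q = [0.4, 0.2, 0.3, 0.1]
x1 = [0, 1, 0, 1]          # presence of allele A
x2 = [0, 0, 1, 1]          # presence of allele B
z = [1.0, 2.0, 1.5, 3.5]   # phenotypes; the last value includes epistasis

b1, b2 = average_effects(q, x1, x2, z)
x1bar, x2bar, zbar = wmean(q, x1), wmean(q, x2), wmean(q, z)
# Predicted phenotype, g, and residual, delta, for each type.
g = [zbar + b1 * (a - x1bar) + b2 * (b - x2bar) for a, b in zip(x1, x2)]
delta = [zi - gi for zi, gi in zip(z, g)]
print(b1, b2, wmean(q, delta))  # the average residual is zero up to rounding
```

The residual carries the epistatic part of phenotype that the two additive predictors do not capture; adding a third predictor for the joint presence of both alleles would move that part from the residual into the predicted value.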

### The components of heritability

#### The part of phenotype not transmitted

Typically, we only follow the transmission of the predictors. For example, we may follow transmission of genes plus any other variables we choose. Those effects that we include explicitly end up as part of the predicted phenotype, *g*, and as candidates for the transmitted phenotype. All effects on phenotype not explicitly included as predictors end up in the residual, *δ*. The split between the predicted phenotype and the residual is arbitrary. If we add a new predictor, any additional effect of that predictor moves from the residual, *δ*, to the predicted phenotype, *g*. Usually, we wish to give the best description of the causes of phenotype that we can. Thus, our choice of predictors defines our hypothesis about the causes of phenotype, in the sense of path analysis discussed in Box 5.

The part of phenotype associated with the particular set of predictors, *g*, defines one component of heritability. Aspects of phenotype not associated with the particular predictors in our model appear as a nontransmitted component of phenotype, *δ*, reducing the similarity of phenotype between ancestors and descendants associated with the predictors.

#### Change in transmitted components of phenotype

A second component of heritability arises from the stability of the effects associated with the predictors. If a predictor has effect *bx* in the original population and effect $b'x'$ in the second population, then the transmission of that predictor is associated with a change in phenotype $\Delta(bx) = b'x' - bx$. Box 2 shows that we can express this change as

$$\Delta(bx) = (\Delta b)x + b'\Delta x.$$

Summing over the *j* different predictors and using the definition of *g* from eqn (12) yields

$$\Delta g = \sum_j (\Delta b_j) x_j + \sum_j b_j' \Delta x_j.$$

On the right side, the first term describes the change in the predicted value of a type that arises from the changes in the average effects of the predictors, $\Delta b_j$, holding constant the predictor values, $x_j$. For example, the average effect of an allele on phenotype may be frequency dependent. Thus, the average effect will change over time as the frequency of the allele changes in the population. The second term describes the change in the transmitted predictor values, $\Delta x_j$, evaluated in the context of the average effects from the second population, $b_j'$. For example, an allele may mutate into another form, thus weighting the average effect by a different amount.
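The split of the change in predicted phenotype into a change in average effects and a change in predictor values follows the difference-of-a-product rule, and can be checked for a single type. The function name `delta_g_parts` and the numbers are hypothetical.

```python
def delta_g_parts(b, b_new, x, x_new):
    """Return (change from average effects, change from predictor values)."""
    # Change in average effects, holding predictor values at their originals.
    from_effects = sum((bn - bo) * xo for bo, bn, xo in zip(b, b_new, x))
    # Change in predictor values, weighted by the new average effects.
    from_predictors = sum(bn * (xn - xo) for bn, xo, xn in zip(b_new, x, x_new))
    return from_effects, from_predictors


# Made-up values for one type with two predictors (assumptions only).
b = [0.5, 1.2]       # average effects in the original population
b_new = [0.6, 1.0]   # average effects in the second population
x = [1.0, 0.0]       # predictor values in the ancestor
x_new = [1.0, 1.0]   # predictor values in the descendant

from_effects, from_predictors = delta_g_parts(b, b_new, x, x_new)
g_old = sum(bo * xo for bo, xo in zip(b, x))
g_new = sum(bn * xn for bn, xn in zip(b_new, x_new))
print(g_new - g_old, from_effects + from_predictors)  # equal up to rounding
```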

The smaller the Δ*b* and Δ*x* values, the less the phenotype changes with respect to the transmitted predictors, and the higher the heritability associated with those predictors. Equivalently, the more stable the predictors and their average effects, the greater the fidelity at which those particular predictors transmit the information accumulated by selection to the new population.

The change in the predictors, Δ*x*, includes mutation as well as any other process that alters predictor values (Frank, 1995a, 1997b, 1998; Price, 1995). For example, predictors in a descendant may derive from multiple ancestors. We can think of the mixing of predictors by considering the change in predictor values when derived from different sources. In some cases, we may wish to alter the assignment of descendants to ancestors. For example, a behaviour may influence the frequency of nondescendant types. To associate the behavioural phenotype with the change in frequency, we could assign those nondescendants to the ancestral behaviour responsible for their presence (Hamilton, 1970). In general, we can make such assignments in any way that we choose. The key is that assigning different descendants to an ancestor may alter the change in predictor values between a descendant and its assigned ancestor. Such changes may alter the fidelity at which information is transmitted (Frank, 1998). I will take up that topic in the next article (Frank, 2013).

#### The part transmitted and the change during transmission

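The full, exact expression from eqn (3) for the total evolutionary change is

$$\Delta\bar{z} = \sum_i \Delta q_i z_i + \sum_i q_i' \Delta z_i.$$

We can partition phenotype as *z* = *g*+*δ*, the split between the part explained by the predictors of phenotype, *g*, and the part that is not explained by the set of predictors in our model for phenotype, *δ*. From eqn (13), $\bar{z} = \bar{g}$ because $\bar{\delta} = 0$, thus

$$\Delta\bar{z} = \Delta\bar{g} = \sum_i \Delta q_i g_i + \sum_i q_i' \Delta g_i.$$

With $g = z - \delta$, we get

$$\Delta\bar{z} = \sum_i \Delta q_i z_i - \sum_i \Delta q_i \delta_i + \sum_i q_i' \Delta g_i.$$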

We can express each of these terms with a particular notation that emphasizes its interpretation

On the right side, the terms are the change caused by selection, the change caused by the part of phenotype that is not associated with a transmitted predictor, and the change in the effects of the predictors during transmission.
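The three-part description of total change can be sketched numerically. The example below rests on several assumptions that are not in the original text: a single hypothetical predictor, a frequency-weighted regression fitted separately in each population so that the residuals average to zero in both, and made-up values for three types. Under those assumptions, total change equals the selection term, minus the frequency change weighted by the residuals, plus the change in predicted values during transmission.

```python
def wmean(q, v):
    """Frequency-weighted average."""
    return sum(qi * vi for qi, vi in zip(q, v))


def predicted(q, x, z):
    """Predicted values from a frequency-weighted regression of z on x."""
    xbar, zbar = wmean(q, x), wmean(q, z)
    b = (sum(qi * (xi - xbar) * (zi - zbar) for qi, xi, zi in zip(q, x, z))
         / sum(qi * (xi - xbar) ** 2 for qi, xi in zip(q, x)))
    return [zbar + b * (xi - xbar) for xi in x]


# Made-up values for three types (assumptions for illustration only).
q, q_new = [0.5, 0.3, 0.2], [0.4, 0.35, 0.25]
x, x_new = [0.0, 1.0, 2.0], [0.1, 1.0, 1.8]
z, z_new = [1.0, 2.0, 4.0], [1.1, 1.9, 4.2]

g = predicted(q, x, z)                   # fitted in the first population
g_new = predicted(q_new, x_new, z_new)   # refitted in the second population
delta = [zi - gi for zi, gi in zip(z, g)]
dq = [qn - qo for qo, qn in zip(q, q_new)]

selection = sum(d * zi for d, zi in zip(dq, z))
not_transmitted = sum(d * di for d, di in zip(dq, delta))
transmission = sum(qn * (gn - go) for qn, go, gn in zip(q_new, g, g_new))
total = wmean(q_new, z_new) - wmean(q, z)
print(total, selection - not_transmitted + transmission)  # equal up to rounding
```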

### Box 5. Causes and predictors

Since path analysis depends on structure, and structure in turn depends on the cause-and-effect relationship among the variables, we shall first say a few words about the way these terms will be used … There are a number of formal definitions as to what constitutes a cause and what an effect. For instance, one may think that a cause must be doing something to lead to something else (effect). While this is clearly one type of cause-and-effect relationship, we shall not limit ourselves to that type only. Nor shall we enter into philosophical discussions about the nature of cause-and-effect. We shall simply use the words ‘cause’ and ‘effect’ as statistical terms similar to independent and dependent variables, or [predictor variables and response variables] (Li, 1975, p. 3).

I analyse causes of phenotypes and causes of fitness. Here, I briefly comment on the word ‘cause’. The above quote and the epigraph come from Li's book on *Path Analysis*. Li's point concerns the distinction between three levels of analysis. First, true causality describes the relations between actual forces and actual effects. Whether such things can ever be studied or known directly remains a philosophical problem beyond our scope.

Second, at the other extreme, multiple regression analysis from classical statistical theory concerns only correlations and variances. The standard theory explicitly disavows causal interpretation – correlation is not causation. Regression arises by minimizing the distance between predicted outcomes and actual outcomes – an attempt at optimal prediction. One thinks of the variables used to predict outcome simply as predictors that, in the past, would have helped one to make a better guess about what actually happened. The predictors may have direct effects themselves or be correlated with some other unseen causal factor. However, those notions of direct and unseen cause are irrelevant to the method.

Third, path analysis takes an intermediate approach. One chooses the predictors for a model as a hypothesis about cause. Rather than aim for optimal prediction, one aims for a set of variables that consistently describe the observed patterns of variation. The quality of the causal interpretation is primarily evaluated by the consistency of the hypothesized pathways in capturing the observed variance in outcome. Consistency roughly means relative stability in the magnitude of a pathway's effect under different circumstances. Although that interpretation potentially offers some insight into cause and effect, the analytical method remains multiple regression. One simply emphasizes the quality of a model as a potential causal interpretation rather than as an attempt at optimal prediction.

Consider a model in which we use genes as predictors of phenotype. In a breeding programme to improve yield, we want to predict offspring phenotype to make the best choice of breeding design. Causality is irrelevant; we aim only for a good outcome. By contrast, in a theoretical analysis of adaptation by natural selection, we want to understand the causal processes. How do the genes that affect phenotype combine to determine morphology or behaviour? How does selection influence the underlying genes and the resulting phenotypic design in relation to performance? We are after an understanding of the process. The quality of prediction will, of course, be the primary way to interpret the causal model. But a good prediction arising from the wrong underlying causal model is what we most want to avoid. Prediction becomes a method for evaluation rather than the goal.

This article analyses natural selection in relation to causal interpretations. For that reason, I think of my models of multiple regression as models of path analysis. In a different context, the same models could be thought of strictly as analyses of regression and prediction.

### Box 6. Nonlinearity

Regression and path analysis are sometimes thought to be limited to linear and additive effects. However, that is misleading. Consider *z* = *bx* + *δ*. Here, *b* is the linear relation between *x* and *z*. However, it may be that , in which the true underlying cause is *y*. Thus, we are actually regressing on a nonlinear function of a causal variable, *y*. Or, it may be that we start with . This appears to be an additive model. However, the underlying cause may be , and and . Thus, our model expresses nonlinearity and nonadditivity in the causes, *y*.

In general, any nonlinear relation can be expressed by an additive sum of terms, in which the individual terms may be nonlinear. Thus, regression can fully account for any nonlinearity by an additive sum of terms. In practice, limitations arise because we may not know the correct nonlinear relation, and so cannot express the proper sum of nonlinear terms. However, that is not a limitation of regression, but rather a limitation that arises from our ignorance. Another method of analysis does not solve the problem of our ignorance. The point is that one must distinguish limitations arising from method from limitations arising from ignorance. Confusing those different limitations is a common mistake.
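
The point about regressing on a nonlinear term can be illustrated with a small numerical sketch. The quadratic relation, the variable names and the data below are hypothetical choices made for illustration; they are not taken from the article.

```python
def least_squares_line(x, z):
    """Return (intercept, slope) of the least-squares regression of z on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_z = sum(z) / n
    cov_xz = sum((xi - mean_x) * (zi - mean_z) for xi, zi in zip(x, z)) / n
    var_x = sum((xi - mean_x) ** 2 for xi in x) / n
    slope = cov_xz / var_x
    return mean_z - slope * mean_x, slope

# Hypothetical causal variable y and a purely quadratic phenotype z = 3*y**2 + 1.
y = [-2.0, -1.0, 0.0, 1.0, 2.0]
z = [3.0 * yi ** 2 + 1.0 for yi in y]

# Regressing z directly on y finds no linear relation (the data are symmetric).
_, slope_on_y = least_squares_line(y, z)

# Regressing z on the nonlinear term x = y**2 recovers the relation.
x = [yi ** 2 for yi in y]
intercept_on_x, slope_on_x = least_squares_line(x, z)

print(slope_on_y, intercept_on_x, slope_on_x)
```

The regression method is the same in both fits. Only the choice of predictor differs, which is the sense in which the limitation comes from ignorance of the correct nonlinear term rather than from regression itself.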

### Box 7. Brief history of evolutionary partitions

Fisher (1918, 1930) partitioned phenotype into its various genetic causes. Quantitative genetics extended the partitioning of phenotype by genetic and nongenetic causes (Falconer & Mackay, 1996; Lynch & Walsh, 1998). Models of cultural evolution use culturally transmissible attributes as predictors of phenotype (Dawkins, 1976; Cavalli-Sforza & Feldman, 1981; Boyd & Richerson, 1985).

Quantitative genetic models may also consider partitions of fitness into component causes. Recent work on partitions of fitness was stimulated by Lande & Arnold (1983). Many subsequent studies expanded that approach, including various explicit descriptions based on path analysis (Heisler & Damuth, 1987; Crespi & Bookstein, 1989; Crespi, 1990; Kingsolver & Schemske, 1991; Scheiner *et al*., 2000). I unified the different lines of study on partitions of phenotype and partitions of fitness (Frank, 1997b, 1998), motivated initially by Queller's quantitative genetic models of kin selection (Queller, 1992a, b).

In the text, I mentioned that *rB* − *C* > 0 can sometimes be interpreted in terms of group selection. For example, if neighbours’ phenotype, *y*, is an average character value in a local group, then *r* can be defined as the regression of individual character value on group character value. That group regression can be considered in a path analysis model, which is roughly the way in which Heisler & Damuth (1987) analysed group selection. In their article, they emphasized ‘contextual analysis’ similarly to the way in which I have emphasized ‘path analysis’. Frank (1995b) and Taylor & Frank (1996) also calculated *r* by regressing group value on individual value in several models, following a long tradition that blurred the mathematical distinction between kin and group selection (Hamilton, 1975; Frank, 1986).

Some of the multivariate analyses of fitness attempt to predict evolutionary dynamics, and therefore must make explicit assumptions about the distribution of phenotypes and the nature of heritability. I do not discuss dynamics; my models do not require any of those extra assumptions.

## Heritability and information

This section focuses on the amount of information that populations accumulate by selection, and the various processes that degrade or alter the transmission of that information. Some of the forms given here include the classic genetic measures of heritability as special cases. However, I do not emphasize those connections. Rather, I focus on general expressions given in terms of the full Price equation for total evolutionary change and based on predictors that may be chosen in any way. Different problems and goals will lead one to choose different sets of predictors or underlying causal schemes for phenotypes. The results here apply to any choice of predictors and causal scheme.

We start with eqn (15), the partition of phenotypic change into components

The first term on the right side is the selection component, . From eqn (11), , where changes scale between phenotype, *z*, and the gain in information by selection, *J*. Thus,

Here, selection happens in the initial (parental) population, causing a gain in information, *J*. On the phenotypic scale, that gain in information is . The remaining terms include processes that cause loss of information during transmission or cause other changes to phenotype.

### The part of phenotype not transmitted

Start by assuming that the predictors and their effects do not change during transmission, . That assumption reduces total change to

where denotes the heritable component of selection, which is the total selection, , minus the part of selective change that is not associated with predictors, . The part not associated with predictors is not explicitly transmitted within the given model of phenotype.

The second term, , has the general form (eqn (11)) of the change in information

which holds for any choice of *z*. Thus, letting *z* ≡ *δ*, we obtain . Putting this into the original expression yields

The scale change terms, *α*, have the important additivity property that, in general, . Thus,

because *g* = *z* − *δ*. The expression for the change in phenotype, ignoring the change during transmission in the predictors and their effects, is

This expression is the information gain by selection, *J*, scaled by , which relates the predicted phenotype, *g*, to the information accumulated by selection. Because *g* = *z* − *δ*, we see that the amount of information transmitted is degraded by *δ*, the fraction of the phenotype, *z*, that is not explained by the predictors.
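
The scaling relations in this section can be checked numerically. The following is a minimal sketch rather than the article's own calculation. It assumes that fitness is measured relative to the population mean, that *J* is the Jeffreys divergence between the frequencies before and after selection, and that the scale change for a character is the ratio of its regression on fitness to the regression of log fitness on fitness. The function name `alpha`, the four types and all numerical values are hypothetical.

```python
import math

def mean(x, q):
    """Frequency-weighted average of x."""
    return sum(qi * xi for qi, xi in zip(q, x))

def cov(x, y, q):
    """Frequency-weighted covariance of x and y."""
    mx, my = mean(x, q), mean(y, q)
    return sum(qi * (xi - mx) * (yi - my) for qi, xi, yi in zip(q, x, y))

# Hypothetical population: four types with frequencies q and absolute fitness.
q = [0.1, 0.2, 0.3, 0.4]
absolute_fitness = [1.0, 2.0, 0.5, 1.5]
w_bar = mean(absolute_fitness, q)
w = [wi / w_bar for wi in absolute_fitness]   # relative fitness, mean 1
m = [math.log(wi) for wi in w]                # log fitness
q_after = [qi * wi for qi, wi in zip(q, w)]   # frequencies after selection

# Assumed definition of J: Jeffreys divergence between q and q_after.
J = sum((qa - qi) * math.log(qa / qi) for qi, qa in zip(q, q_after))

# Phenotype z = g + delta; the values are arbitrary illustrations.
g = [1.0, 2.0, 3.0, 4.0]
delta = [0.3, -0.2, 0.1, -0.05]
z = [gi + di for gi, di in zip(g, delta)]

def alpha(x):
    """Assumed scale change: (regression of x on w) / (regression of m on w)."""
    return cov(x, w, q) / cov(m, w, q)   # the variance of w cancels

selection_change_z = mean(z, q_after) - mean(z, q)
transmitted_change = alpha(g) * J

print(selection_change_z, alpha(z) * J)       # total selection on z, two ways
print(alpha(z), alpha(g) + alpha(delta))      # additivity of the scale change
```

Under these assumptions the total selective change equals `alpha(z) * J`, and the part carried by the predictors, `transmitted_change`, falls short of that total by `alpha(delta) * J`.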

### Change in transmitted components of phenotype

When we add back the remaining term to eqn (17), we obtain the full expression for phenotypic change as

The last term is the change in the transmitted components of phenotype. From eqn (14), those components include changes in the predictors and changes in the effects of the predictors. A predictor's effect is its associated multiple regression coefficient. Multiple regression coefficients often change with context. On the one hand, the true underlying causal effect may change. On the other hand, our model of causality may not be exactly right, in which case shifting context will cause the assigned role of different predictors to change, even though the underlying causal effects of those predictors may not have changed.

Various approaches may be taken to evaluate the accuracy of the causal model, such as the stability of the predictor effects under changing context (Li, 1975). Typically, a better causal model has predictors with greater stability, shifting the components of total change more strongly to the information term. That increase in the information term is usually advantageous with respect to interpretation, because it is often hard to evaluate the meaning of changes in predictors and their effects in the second term.

Suppose, for example, that a significant component of phenotype is not explained by a stable set of predictors. Is the information accumulated by selection in the initial population lost during transmission because it is not associated with any transmissible component? Or, is that information transmitted by other predictors that are not included in our model? If the information does transmit by predictors not in our model, that information contributes to the second term with changing values of the predictors and their effects. Such changes are hard to interpret, because many different processes can potentially alter the predictors and their effects.

These fundamental equations of selection and evolution are, in a way, rather arbitrary, because they depend so strongly on the particular set of predictors that one chooses. What can we conclude? First, the equations are always true, and so give us a clear sense of the essential nature of selection, information and evolution. Second, a key part of understanding any problem concerns choosing the right set of predictors. Third, simple genetic models provide a good starting point in many cases, but rarely define a complete set of predictors and an accurate expression of causality. If one is able to model the causal scheme well, the analysis will often be simple and natural. I have emphasized a path analysis interpretation for the regression expressions, because path analysis emphasizes the choice of a good causal model.

### Fisher's fundamental theorem

If we hold the predictors and their effects constant, then using eqn (17), the change in mean log fitness is

for *m* = *g* + *δ*. This expression for change in fitness, holding constant the predictors and their average effects, provides a generalization of Fisher's fundamental theorem of natural selection. Fisher used the presence or absence of allelic types as predictors, and the associated value of predicted fitness, *g*, as the genic value of fitness. With those definitions, the expression here is equivalent to Fisher's theorem. To translate back to the particular notation that Fisher used, one would translate the definitions for and *J* into Fisher's forms. Frank (1997b) provides the tools for the translation, following Price (1972) and Ewens (1989). The point here is that Fisher's theorem holds for any choice of predictors, as emphasized in Frank (1997b).

## Causes of fitness

The expression associates the accumulation of information by selection, *J*, with the selective component of phenotypic change. But that expression does not tell us why the association occurs. The phenotype may directly influence fitness. Alternatively, the phenotype may have no direct effect on fitness, but instead may be associated with some other process that influences fitness. A significant part of evolutionary analysis concerns evaluating the causes of fitness (Box 7).

We may analyse the causes of fitness in the same way that we analysed the causes of phenotype. We write our model, or hypothesis, for the causes of fitness as the regression equation

Here, *ϕ* is the baseline fitness when all other terms are zero; *π* is the average direct effect of the phenotype *z* on fitness, holding constant the other predictors of fitness; and is the average effect of the other predictors of fitness, . We may use any number of other predictors, and those predictors may be defined in any way, including factors in the model for phenotype. For example, predictors can be alleles, nonlinear interactions between combinations of alleles, symbionts, maternal effects, cultural or environmental attributes, other phenotypes, phenotypes of neighbours and so on. The residual, , is the difference between the predicted value of fitness for a given set of predictors and the actual fitness.

### A simple example

To study the role of different predictors of fitness, it is useful to reduce the model to just the direct effect, *z*, and one indirect effect, *y*, yielding

$$w = \phi + \pi z + \beta y + \epsilon.$$

In this partial regression equation, it is helpful to write out the regression coefficients in full notation to emphasize their interpretation. The partial regression coefficient $\beta_{wz \cdot y}$ is the average effect of *z* on *w* holding *y* constant, and $\beta_{wy \cdot z}$ is the average effect of *y* on *w* holding *z* constant, thus

$$w = \phi + \beta_{wz \cdot y}\, z + \beta_{wy \cdot z}\, y + \epsilon.$$
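
As an illustration of how a partial regression coefficient differs from a simple regression slope, the sketch below fits fitness to two correlated predictors by solving the two-predictor normal equations. The data, the coefficient values and the function names are hypothetical.

```python
def cov(a, b):
    """Population covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

def partial_regression(w, z, y):
    """Return (effect of z holding y constant, effect of y holding z constant)."""
    v_z, v_y, c_zy = cov(z, z), cov(y, y), cov(z, y)
    c_wz, c_wy = cov(w, z), cov(w, y)
    det = v_z * v_y - c_zy ** 2           # zero only if z and y are collinear
    b_wz_y = (c_wz * v_y - c_wy * c_zy) / det
    b_wy_z = (c_wy * v_z - c_wz * c_zy) / det
    return b_wz_y, b_wy_z

# Hypothetical data: fitness falls with z directly but rises with y,
# and y tends to be larger where z is larger.
z = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 0.0, 3.0, 2.0, 5.0]
w = [1.0 - 0.2 * zi + 0.5 * yi for zi, yi in zip(z, y)]

b_wz_y, b_wy_z = partial_regression(w, z, y)
simple_slope = cov(w, z) / cov(z, z)      # regression of w on z alone

print(b_wz_y, b_wy_z, simple_slope)
```

In this example the direct effect of *z* is negative, yet the simple slope of fitness on *z* is positive because *z* is associated with *y*.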

### Condition for the increase of a phenotype by selection

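Using the standard covariance form for selection based on eqn (6), the partial change in *z* caused by selection is

$$\Delta_s \bar{z} = \mathrm{Cov}(w, z),$$

which simply states that *z* increases by selection when it is positively associated with fitness. However, we now have the complication shown in eqn (19) that fitness also depends on another predictor, *y*. If we expand the covariance using the full expression for fitness in eqn (19), we obtain

$$\Delta_s \bar{z} = \beta_{wz \cdot y}\, V_z + \beta_{wy \cdot z}\, \mathrm{Cov}(y, z),$$

where $V_z$ is the variance of *z*. The residual does not appear because least squares makes it uncorrelated with the predictors. If we replace the covariance term by the product of a regression coefficient and a variance, $\mathrm{Cov}(y, z) = \beta_{yz} V_z$, we have

$$\Delta_s \bar{z} = \left(\beta_{yz}\, \beta_{wy \cdot z} + \beta_{wz \cdot y}\right) V_z.$$

The condition for the increase of *z* by selection is $\Delta_s \bar{z} > 0$. Because the variance is positive, the same condition using the terms on the right side is

$$\beta_{yz}\, \beta_{wy \cdot z} + \beta_{wz \cdot y} > 0.$$

Let us use an abbreviated notation for the three terms

$$r = \beta_{yz}, \qquad B = \beta_{wy \cdot z}, \qquad -C = \beta_{wz \cdot y}.$$

The first term, $r = \beta_{yz}$, describes the association between the phenotype, *z*, and the other predictor, *y*. An increase in *z* by the amount Δ*z* corresponds to an average increase of *y* by the amount (see Box 4)

$$\Delta y = \beta_{yz}\, \Delta z = r\, \Delta z.$$

The second term, $B = \beta_{wy \cdot z}$, describes the direct effect of the other predictor, *y*, on fitness, holding constant the focal phenotype, *z*. The third term, $-C = \beta_{wz \cdot y}$, describes the direct effect of the phenotype, *z*, on fitness, *w*, holding constant the effect of the other predictor, *y*.

Using the abbreviated notation, the condition for the increase in *z* by selection is

$$rB - C > 0.$$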
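
The partition behind this condition can be checked numerically. The sketch below is a hedged illustration with hypothetical data in which fitness is not an exact linear function of the predictors, so the regression leaves a residual. It assumes least-squares partial regression coefficients; the function name `fitness_partition` is invented for this example.

```python
def cov(a, b):
    """Population covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

def fitness_partition(w, z, y):
    """Return r, B, C and the variance of z for the regression of w on z and y."""
    v_z, v_y, c_zy = cov(z, z), cov(y, y), cov(z, y)
    c_wz, c_wy = cov(w, z), cov(w, y)
    det = v_z * v_y - c_zy ** 2
    direct_z = (c_wz * v_y - c_wy * c_zy) / det   # partial regression of w on z
    direct_y = (c_wy * v_z - c_wz * c_zy) / det   # partial regression of w on y
    r = c_zy / v_z                                # regression of y on z
    return r, direct_y, -direct_z, v_z

# Hypothetical data; w is not an exact linear function of z and y.
z = [0.0, 1.0, 2.0, 3.0, 4.0]
y = [1.0, 0.0, 3.0, 2.0, 5.0]
w = [1.4, 0.9, 2.0, 1.5, 2.8]

r, B, C, v_z = fitness_partition(w, z, y)
total = cov(w, z)                   # covariance of fitness and phenotype
partitioned = (r * B - C) * v_z     # the same quantity from the partition
z_increases = r * B - C > 0

print(total, partitioned, z_increases)
```

The two quantities agree up to rounding because the least-squares residual is uncorrelated with *z*, so the sign of *rB* − *C* matches the sign of the covariance between fitness and phenotype.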

The following sections interpret this condition in terms of three different biological scenarios.

### Interactions between two species

I trace the effects of phenotype *z* in species A and phenotype *y* in species B on the fitness of types from species A (Frank, 1994, 1995c, 1997a). One may think of species B as an ecological partner that can influence the fitness of types from species A. Here, fitness always refers to effects on species A.

#### Unknown cause of association

I follow the path diagram in Fig. 1a. Increases in the phenotype, *z*, by an amount Δ*z*, reduce fitness by −*C*Δ*z*. Increases in the phenotype *y* directly benefit fitness by *B*Δ*y*. The *z* and *y* phenotypes are associated by *r*, although no specific cause is known. It may be that similar phenotypes tend to settle in the same area, or that a common environment of temperature and nutrients causes a phenotypic association. In any case, as *z* increases, the associated value of *y* changes on average by Δ*y* = *r*Δ*z* and, equivalently, *B*Δ*y* = *rB*Δ*z*.

Tracing the pathways in Fig. 1a, an increase in the direct phenotype by Δ*z* causes a change in fitness proportional to (*rB* − *C*)Δ*z*, which is greater than zero when *rB* − *C* > 0. Thus, selection may favour an increase in *z* even though *z* directly decreases fitness, because the benefit from species B's phenotype, *y*, in proportion to *rB*, may outweigh the direct cost, −*C*.

#### Direct cause of association

Alternatively, suppose that the phenotype *z* directly enhances the vigour of its partners from species B. That direct effect of *z* on species B causes an increase in the benefit, *y*, that species B provides back to those with phenotype *z*. Fig. 1b shows this direct cause of *y* by *z*. The condition for *z* to be positively associated with fitness and to increase by selection remains *rB* − *C* > 0. However, the interpretation differs. In this case, *z* directly influences its neighbours’ phenotype, *y*, rather than being associated with *y* by some unknown cause.

### Body temperature

Suppose *z* is body temperature, which imposes a direct effect −*Cz* on fitness. That direct cost may arise because body temperature raises the rate at which energy is used. Let *y* be speed of response to a challenge, such as a predator attack. Faster response provides a direct benefit, *By*. An unknown cause may associate temperature, *z*, and response rate, *y*, by an amount *r* (Fig. 1a). For example, sunshine may directly raise temperature and simultaneously increase response to attack by providing better visual opportunities. Alternatively, temperature, *z*, may directly raise response rate, *y*, by increasing the responsiveness of muscles (Fig. 1b). In either case, selection favours an increase in body temperature if *rB* − *C* > 0.

### Social evolution and group selection

The phenotype *z* may be a costly altruistic behaviour that helps neighbouring individuals (Hamilton, 1970; Queller, 1992a, b; Frank, 1998). The direct effect on fitness is −*Cz*. Neighbours have phenotype *y* that provides a benefit, *By*, back to the original individual. An association, *r*, between *z* and *y* may arise in a variety of ways.

Some unknown cause may associate *z* and *y* (Fig. 1a). For example, shared cultural, environmental or genetic variation may cause related behaviour. Or a shared symbiont may cause an association. In general, any association in the predictors of phenotype will cause an association of phenotypic values.

In other cases, the altruistic phenotype, *z*, may directly enhance neighbours’ beneficial behaviour, *y*, in proportion to *r* (Fig. 1b). For example, the level of *y* in the neighbours may depend on the probability of the neighbours’ survival. If an increase in *z* raises neighbours’ survival in proportion to *r*, that increase in survival enhances the expression of the neighbours’ behaviour, *y*, which has a beneficial effect on fitness of *By*.

Whether *r* arises from unknown causes (Fig. 1a) or from the direct effect of *z* on *y* (Fig. 1b), we can trace the effect of an increase in *z* on fitness. The condition for an increase in *z* to raise fitness is *rB* − *C* > 0.

In some cases, we may interpret the condition *rB* − *C* > 0 in terms of group selection (Hamilton, 1975). For example, *z* may measure individual restraint in the harvesting of nonrenewable resources (Frank, 1995b). Greater restraint reduces the direct benefit to the individual, because it means less resource harvested, with an effect on fitness of −*Cz*. Neighbours’ phenotype, *y*, may be the average restraint among individuals in a local group with regard to harvesting nonrenewable resources.

Greater group restraint provides a benefit to all members of the group, including our focal individual, by providing greater local productivity through maintenance of nonrenewable resources. The benefit of group restraint on individual fitness is *By*. The association between an individual's phenotype, *z*, and the group phenotype, *y*, is *r*. Thus, when *rB* − *C* > 0, individual restraint evolves and provides a joint benefit to all group members. Here, the two predictors of fitness are individual behaviour, *z*, and average group behaviour, *y*. This type of group selection is just a special case of partitioning the causes of fitness, in which one of the predictors is a group attribute (Box 7).
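
The group example can be sketched numerically. The code below is an illustration under stated assumptions, not a model from the article: fitness is taken to be exactly 1 + *By* − *Cz*, the group phenotype *y* is the group average including the focal individual, and *r* is the regression of group value on individual value. The group compositions, parameter values and function names are hypothetical.

```python
def cov(a, b):
    """Population covariance of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n

def restraint_analysis(groups, benefit, cost):
    """Return (r, r*B - C, Cov(w, z)) for the assumed fitness w = 1 + B*y - C*z."""
    z, y = [], []
    for group in groups:
        group_mean = sum(group) / len(group)
        for value in group:
            z.append(value)        # individual restraint
            y.append(group_mean)   # average restraint in the individual's group
    w = [1.0 + benefit * yi - cost * zi for zi, yi in zip(z, y)]
    r = cov(y, z) / cov(z, z)      # regression of group value on individual value
    return r, r * benefit - cost, cov(w, z)

# The same six individuals, grouped with and without assortment.
assorted = [[0.1, 0.2, 0.3], [0.6, 0.7, 0.8]]
mixed = [[0.1, 0.7, 0.6], [0.2, 0.3, 0.8]]

r_assorted, net_assorted, cov_assorted = restraint_analysis(assorted, 1.0, 0.4)
r_mixed, net_mixed, cov_mixed = restraint_analysis(mixed, 1.0, 0.4)

print(r_assorted, net_assorted, cov_assorted)
print(r_mixed, net_mixed, cov_mixed)
```

With assortment, *r* is large and restraint is positively associated with fitness. With mixed groups, *r* is near zero and the direct cost dominates, even though the individuals and the parameters *B* and *C* are unchanged.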

## Causal structure

All of these examples share a common causal structure. We are interested in the change in a phenotype, *z*, caused by selection. Fitness depends on two predictors: the phenotype of interest, *z*, and another predictor, *y*. In all cases, the condition for the increase in *z* by selection is *rB* − *C* > 0. This condition is just the partition of the causes of fitness into two components. The direct effect on fitness of *z* is −*C*, and the direct effect of *y* is *B*. We multiply *y* by *r* to change the scale of the effect from *y* to *z*, because the net effect must be the relation between *z* and fitness, *w*.

We can see the logical relations and the units for the various scales by writing out the full notation

$$\beta_{wz} = \beta_{yz}\, \beta_{wy \cdot z} + \beta_{wz \cdot y}.$$

Box 3 shows that a regression coefficient, $\beta_{xy}$, has units Δ*x*/Δ*y*. Taking the terms of the above equation in order from left to right, the units are

$$\frac{\Delta w}{\Delta z} = \frac{\Delta y}{\Delta z}\, \frac{\Delta w}{\Delta y} + \frac{\Delta w}{\Delta z}.$$

The ratio Δ*w*/Δ*z* is the change in fitness, *w*, per unit change in the phenotype, *z*. That ratio is the slope of fitness on phenotype. When the slope is positive, selection favours the increase of the phenotype. In any analysis of this sort, the term

$$r = \beta_{yz} \sim \frac{\Delta y}{\Delta z}$$

rescales changes of the secondary predictor, Δ*y*, with respect to changes in the primary scale, Δ*z*.

The key point is that *rB* − *C* > 0 simply partitions fitness into the direct effect of a phenotype plus the indirect effect through a secondary predictor. The true causal structure will, of course, frequently depend on multiple secondary causes, as in eqn (18). Multiple causes lead to an expanded expression for the increase of *z* caused by selection, $\Delta_s \bar{z} > 0$, as

$$\sum_j r_j B_j - C > 0,$$

in which each $r_j$ is the regression of $y_j$ on *z*, and each $B_j$ is the partial regression of *w* on $y_j$ holding constant the other factors. One may also need to consider cascading causes or hidden factors in the sense of path analysis (Li, 1975). The simple expression *rB* − *C* > 0 should be thought of as a convenient example to illustrate the logic of partitioning the causes of fitness, or as the expression of simplified models that isolate two opposing processes.

In this section, I have analysed the partitioning of fitness. I have not discussed the partition of phenotype into components, *z* = *g* + *δ*, where *g* is the sum of the predictors of phenotype. The amount of information accumulated by selection that can be transmitted depends on the slope of fitness, *w*, relative to the transmissible predictors of phenotype, *g*. If we think of *g* in terms of the genetic predictors of phenotype, then *r* can be interpreted as a genetic relatedness coefficient, and *rB* − *C* > 0 calls to mind Hamilton's rule from the theory of kin selection (Hamilton, 1970). The next article takes up the relations between kin selection and the general analysis of the causes of fitness and the causes of phenotype (Frank, 2013). A full evolutionary analysis also requires attention to other causes of change, , in eqn (16) (Frank, 1997b, 1998).

It is important to relate the causes of fitness to information, which is the ultimate scale for selection. Box 8 connects the partitions of fitness in this section to the expressions of information given earlier in this article.

### Box 8. Information and the causes of fitness

Changes caused by selection can always be related to the change in information accumulated by the population. For example, the change in phenotype caused by selection from eqn (11) is

where *J* is the change in information by selection, and relates the scale of information to the scale of phenotype. We can examine the units of the scaling term

which is the ratio of two regression coefficients (eqn (10)). A regression coefficient, , has units Δ*z*/Δ*y*, when used as a scaling relation for changes in average values (Box 3). Thus, the units for the scaling relation, , are

The term Δ*m* has units of change in log fitness. Changes in log fitness are equivalent to changes in information, *J* (eqn (8)). To emphasize that *J* is a change in information, write the units on *J* as Δ*I*. Thus, the scaling factor

is the change in phenotype relative to the change in information.

One must learn to read the regression coefficients as scaling factors that change units. Once one learns to recognize the scale changes, and the key units such as information and phenotype, the fundamental equations can be read like a sentence. When analysing selection, I prefer information as the ultimate scale, because selection is the process by which populations accumulate information.

With that background, I present a long sentence to translate the causes of fitness into an expression for the change in information. Start with eqn (20) and divide both sides by , yielding

The units are

the change in information by selection. All of the regression coefficients in the prior equation change scales for the various terms, and we also have , which has units . The net units of the long right side are Δ*I*, the change in information. The right side appears complex. But each term has a simple, readable meaning with respect to the effect of a predictor on fitness, and the scale changes required to transform those effects into the common units of information. To understand selection, we often need to decompose fitness and phenotypes into their component causes. Such decomposition requires that we combine all the components properly to recover the correct scale of analysis.

## Discussion

I first partitioned phenotype with respect to a set of hypothesized causes. I then partitioned fitness with respect to a different set of hypothesized causes. Finally, I placed those partitions of phenotype and fitness into a general expression for selection and evolutionary change. Those steps allowed me to express heritability, selection and evolutionary change in terms of causal components.

I also translated the standard expressions of selection and evolution, given in terms of regressions, covariances and variances, into expressions for the change in information. In my view, selection is best interpreted as the accumulation of information by populations (Frank, 2012a). Other evolutionary processes often cause a decay in the transmission of information. The information expressions allow one to read the equations of selection and evolution as if they were sentences. Those sentences express the fundamental relations between the causes of phenotypes and fitness and the consequences for the change in information by evolutionary processes.

I showed that the commonly used regression coefficients in models of selection and evolution can be understood as coefficients for the change in scale with respect to the ultimate scale of information (Box 4). For example, the change in a phenotype caused by selection can be understood as a rescaling of the change in information accumulated by selection. Certain measures of heritability, often expressed as regression coefficients, are the change in the scaling of information from one phenotype to another. For example, a parent–offspring regression may describe the change in scale between parent and offspring phenotype with respect to the underlying information content in those phenotypes.

My extended development in terms of causal components and information may, at first, seem like a lot of technical complication. We are, after all, simply modelling selection, heritability and other widely studied evolutionary processes. Many models of those processes seem more direct and concise. My goal is to go beyond common calculations or common applications. The more abstract and exact models here provide a conceptual guide for understanding how selection actually works, how populations accumulate information, and how that information is transmitted or lost.

I have also traded the certainty of the standard models of genetics for the uncertainty that arises when we freely choose our predictors as causal hypotheses. In my view, the apparent certainty of genetics is often misleading. We know that many factors influence phenotypes in addition to the narrowly defined allelic types of genes. Traditionally, a specific extended model deals with each additional factor: cytoplasmic inheritance, nonlinear genetic interactions, maternal effects, social interactions and so on. By describing each of those aspects as a special situation, one ends up with a catalogue of special models.

The models here show how to think in general about a variety of causal structures. Those models are only as good as the particular hypothesized system of causality that we choose. But that is also true for genetic models and for every other model, whether or not we admit it openly. Here, I have traded the false sense that there are a few standard models for the more realistic view that one has to bring a good hypothesis to an analysis to get a good understanding of phenotypes and selection.

Hamilton (1970) made clear the central role of causal analysis in kin selection theory:

Considerations of genetical kinship can give a statistical reassociation of the [fitness] effects with the individuals that cause them.

The seemingly endless debates about kin selection arise from failure to recognize that the theory is ultimately a way of framing causal hypotheses (Frank, 1997b, 1998). The following article develops kin selection as a method of causal modelling.

## Acknowledgments

National Science Foundation grant EF-0822399 supports my research.