Our aim here is to derive the response of the mean vector of breeding values of a multivariate trait to general selection, without making assumptions about the distribution of breeding values. We then apply this equation to derive the response to selection and the lag for the moving optimum model used in the main text.

We consider *I* traits, and the trait values of individuals are given by *z*_{1} = *x*_{1} + *e*_{1}, *z*_{2} = *x*_{2} + *e*_{2}, …, *z*_{I} = *x*_{I} + *e*_{I}. Thus, each individual is characterized by a column vector of trait values (**z**), a column vector of genotypic (i.e. breeding) values (**x**) and a column vector of environmental effects (**e**). Let *W*(**z**) denote the fitness of individuals of phenotype **z**, and *w*(**x**) the (average) fitness of individuals of genotypic values **x**. Then, clearly, the mean fitness satisfies $\bar{w} = \int w(\mathbf{x})P(\mathbf{x})\,d\mathbf{x}$, where *P*(**x**) denotes the distribution of breeding values. The exact change in the distribution of breeding values caused by selection is

$$\Delta P(\mathbf{x}) = P(\mathbf{x})\,\frac{w(\mathbf{x}) - \bar{w}}{\bar{w}} \tag{A.1}$$
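Selection re-weights each breeding value by its relative fitness. The following is a minimal sketch of that re-weighting for a single trait on a discrete grid of breeding values. The grid, the frequencies and the Gaussian-shaped fitness function are hypothetical choices made only for illustration. The sketch also checks the standard identity that the change in the mean equals the covariance between breeding value and fitness divided by mean fitness.

```python
import math

def select(values, freqs, fitness):
    """Re-weight a discrete distribution by relative fitness.

    Returns the post-selection frequencies and the mean fitness.
    """
    w = [fitness(x) for x in values]
    mean_w = sum(p * wi for p, wi in zip(freqs, w))
    new_freqs = [p * wi / mean_w for p, wi in zip(freqs, w)]
    return new_freqs, mean_w

def mean(values, freqs):
    """Mean of a discrete distribution."""
    return sum(p * v for p, v in zip(freqs, values))

def example_fitness(x):
    # Hypothetical fitness function: optimum at 1.0, width parameter 4.0.
    return math.exp(-0.5 * (x - 1.0) ** 2 / 4.0)

# Hypothetical distribution of breeding values before selection.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ps = [0.1, 0.2, 0.4, 0.2, 0.1]

ps_after, w_bar = select(xs, ps, example_fitness)
delta_mean = mean(xs, ps_after) - mean(xs, ps)

# Covariance between breeding value and fitness before selection.
x_bar = mean(xs, ps)
cov_xw = sum(p * (x - x_bar) * example_fitness(x) for p, x in zip(ps, xs))
```

Because the starting distribution is symmetric around zero and the assumed optimum lies at 1.0, the mean shifts upward after selection.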

We follow the approach devised by Bürger (1991) and Turelli & Barton (1994) and use cumulants and higher-order selection differentials to derive the multivariate selection response of the mean. Denoting the moment-generating function of *P*(**x**) by Ψ(**ξ**) and the cumulant-generating function by Φ(**ξ**) = ln(Ψ(**ξ**)), the response to selection in terms of generating functions (Turelli & Barton, 1994; Bürger, 2000, p. 174) is

- (A.2)

where Γ(**ξ**, **η**) = exp [Φ(**ξ** + **η**) − Φ(**η**)] − exp [Φ(**ξ**)].

We denote the multivariate cumulants of order 1, 2, 3, 4, … of the (multivariate) distribution of breeding values by *κ*_{i}, *κ*_{ij}, *κ*_{ijk}, *κ*_{ijkl}, …, respectively. The first-order cumulants are just single-trait means, the second-order cumulants are the genetic variances and covariances, and the third-order cumulants are the third-order central moments, that is,

$$\kappa_{ijk} = \mathrm{E}\left[(x_i - \bar{x}_i)(x_j - \bar{x}_j)(x_k - \bar{x}_k)\right] \tag{A.3}$$

Of course, *i*, *j* and *k* need not be different and the order is irrelevant; thus, *κ*_{iij} = *κ*_{iji} = *κ*_{jii}.
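The third-order cumulants and their symmetry under index permutation can be checked numerically. The sketch below computes a third-order central moment from a small, made-up sample of two-trait breeding values. The data and the function name `third_cumulant` are hypothetical. The estimator divides by the sample size and is therefore the plain sample moment, not a bias-corrected estimate.

```python
from itertools import permutations

def third_cumulant(samples, i, j, k):
    """Third-order central moment of traits i, j, k (plain 1/n sample moment)."""
    n = len(samples)
    n_traits = len(samples[0])
    means = [sum(s[t] for s in samples) / n for t in range(n_traits)]
    return sum(
        (s[i] - means[i]) * (s[j] - means[j]) * (s[k] - means[k])
        for s in samples
    ) / n

# Hypothetical breeding values (trait 0, trait 1) for four individuals.
samples = [(0.0, 1.0), (1.0, 0.5), (2.0, 3.0), (5.0, 2.5)]

k001 = third_cumulant(samples, 0, 0, 1)
# Every ordering of the index multiset {0, 0, 1} gives the same value.
orderings = [third_cumulant(samples, *perm) for perm in permutations((0, 0, 1))]
```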

The response of the mean of trait *i* is obtained by differentiating eqn (A.2) once with respect to *ξ*_{i}. Paraphrasing the derivation of eqn (17a) in Turelli & Barton (1994) and specializing their formula to *U* = {*i*} (and *V* = {*i*, *j*, *k*, …}) produces the fundamental equation for the selection response of the mean:

- (A.4)

Because recombination, symmetric mutation, random mating and random genetic drift do not alter the means (e.g. Bürger, 2000), the response of the means across generations is also given by eqn (A.4), that is .

In principle, the above approach could be generalized to derive the response to selection of the variances, covariances and higher cumulants of the distribution of breeding values. However, as already noted for the univariate case (Turelli & Barton, 1994; Bürger, 2000), this leads to enormous complications: not only must the effects of recombination and genetic drift be taken into account, but genetic details, such as the number of loci and the distribution of allelic effects at each locus, also influence the evolutionary dynamics of the cumulants of order > 1.

Now we apply eqn (A.4) to our model. Assuming weak selection, we can approximate the fitness of individuals with vector **x** of genotypic values by

- (A.5)

Here, **Ω** is a matrix that describes the curvature of the fitness landscape for breeding values, and **θ** is the position of the optimum. We write *a*_{ij} for the entries of the *I* × *I* matrix **Ω**^{−1}. In the notation of the main text, **Ω** = **ω** + **E**. Then

- (A.6)

From eqn (A.4) and because all partial derivatives of mean fitness with respect to cumulants of order higher than two vanish, the response of the genotypic means to selection is given by

- (A.7)

This formula is approximate only because we assumed weak selection.
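The role of the third-order cumulants can be illustrated with a univariate sketch. It assumes a hypothetical quadratic fitness function *w*(*x*) = 1 − (*s*/2)(*x* − *θ*)², chosen only for this example and not necessarily the form used in eqn (A.5). For such a fitness function the selection response of the mean depends on the distribution only through its mean, variance and third central moment, which the code verifies on a made-up skewed distribution.

```python
def central_moments(values, freqs):
    """Return the mean, variance and third central moment of a discrete distribution."""
    m = sum(p * x for p, x in zip(freqs, values))
    k2 = sum(p * (x - m) ** 2 for p, x in zip(freqs, values))
    k3 = sum(p * (x - m) ** 3 for p, x in zip(freqs, values))
    return m, k2, k3

# Hypothetical right-skewed distribution of breeding values.
xs = [-1.0, 0.0, 1.0, 3.0]
ps = [0.3, 0.4, 0.2, 0.1]

# Assumed selection strength and optimum (illustrative values only).
s, theta = 0.05, 0.5

m, k2, k3 = central_moments(xs, ps)

# Exact response: re-weight by relative fitness and recompute the mean.
w = [1.0 - 0.5 * s * (x - theta) ** 2 for x in xs]
w_bar = sum(p * wi for p, wi in zip(ps, w))
exact_response = sum(p * wi * x for p, wi, x in zip(ps, w, xs)) / w_bar - m

# Response written in terms of the variance and the third central moment.
directional_term = -s * k2 * (m - theta) / w_bar
skew_term = -0.5 * s * k3 / w_bar
predicted_response = directional_term + skew_term
```

With positive skew the skew term is negative under this stabilizing fitness function, so it reduces the response relative to the term that depends on the variance alone.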

To present eqn (A.7) in instructive vector form, we write

- (A.8)

which is the classical selection gradient, and

- (A.9)

which is the second-order selection gradient (see above) and a vector of length *I*^{2}. Finally, we define the *I* × *I*^{2} matrix

- (A.10)

Now we can cast (A.7) in matrix form:

- (A.11)

Simple calculations show that

- (A.12)

- (A.13)

From now on, we assume a constantly moving optimum, such that **θ**(*t*) = *t*Δ**θ**, where *t* is time in generations and Δ**θ** represents the amount the optimum moves per generation. If the expected lag is defined as then (A.12) yields

- (A.14)

Because for a constantly moving optimum, we must have , a simple calculation invoking (A.11) produces

- (A.15)
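The steady-state argument can be illustrated with a deliberately simplified sketch. It is univariate, deterministic and free of skew, and it assumes that the mean responds each generation by (*G*/*Ω*)(*θ* − mean). These simplifications and the parameter values are assumptions made for illustration and do not reproduce eqn (A.15). Under them, the lag settles at the value for which the mean moves exactly as far per generation as the optimum does.

```python
def track_optimum(G, Omega, delta_theta, generations):
    """Iterate a skew-free, univariate mean that chases a steadily moving optimum.

    Returns the lag (optimum minus mean) recorded at the start of each generation.
    """
    x_bar, theta = 0.0, 0.0
    lags = []
    for _ in range(generations):
        lag = theta - x_bar
        lags.append(lag)
        x_bar += (G / Omega) * lag   # assumed linear response to the current lag
        theta += delta_theta         # the optimum moves by a constant amount
    return lags

# Hypothetical parameter values.
G, Omega, delta_theta = 0.5, 10.0, 0.1
lags = track_optimum(G, Omega, delta_theta, generations=500)

# At steady state the response (G / Omega) * lag must equal delta_theta.
steady_lag = delta_theta * Omega / G
```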

We use this equation in the main text, but use slightly different notation for **C** for the sake of clarity. In the two-trait case,

- (A.16)

where .

The attentive reader may note that in the absence of skew, eqns (A.14) and (4) differ slightly from the Gaussian prediction (3) because **ω** + **P** ≠ **Ω**. The reason is that the derivation of eqn (A.14) assumes weak selection, which is not assumed in deriving (3). Because **Ω** = **ω** + **E**, and taking **P** = **G** + **E**, we have **ω** + **P** = **Ω** + **G**, which is approximately **Ω** under weak selection.