### Abstract

The ability to predict individual breeding values in natural populations with known pedigrees has provided a powerful tool to separate phenotypic values into their genetic and environmental components in a nonexperimental setting. This has allowed sophisticated analyses of selection, as well as powerful tests of evolutionary change and differentiation. To date, there has, however, been no evaluation of the reliability or potential limitations of the approach. In this article, I address these gaps. In particular, I emphasize the differences between true and predicted breeding values (PBVs), which as yet have largely been ignored. These differences do, however, have important implications for the interpretation of, firstly, the relationship between PBVs and fitness, and secondly, patterns in PBVs over time. I subsequently present guidelines I believe to be essential in the formulation of the questions addressed in studies using PBVs, and I discuss possibilities for future research.

### Introduction

Evolutionary biology tries to understand how variation amongst individuals, amongst groups of individuals, and across time has arisen, and how it is maintained (Endler, 1977; Roff, 1997; Schluter, 2000). This requires the separation of phenotypic variation into its underlying environmental and genetic components. Recently, the application of mixed model methodology in combination with a so-called ‘animal model’ has been used to estimate quantitative genetic parameters in natural populations with known pedigrees (see Kruuk, 2004 for an extensive review). In contrast to the more traditional quantitative genetics methods (see Falconer & Mackay, 1996; Lynch & Walsh, 1998), these methods use all the information that is available in a pedigree, and do not require a specific pedigree structure. Furthermore, they are able to accommodate selection and inbreeding, which are common phenomena in the great majority of natural populations (Kingsolver *et al.*, 2001; Keller & Waller, 2002). Finally, it is possible to include additional environmental effects in the model, which are estimated simultaneously. For these reasons, the animal model is in theory highly suitable for the estimation of quantitative genetic parameters in natural populations (Kruuk, 2004).

The animal model may help to answer two types of questions, both of which are central to the study of evolution. First, on a population level it provides a method to separate the phenotypic variance we observe in the field into the underlying genetic and environmental variance components, and thus for estimating heritabilities (Knott *et al.*, 1995; Lynch & Walsh, 1998; Kruuk, 2004). Secondly, and more excitingly, it makes it possible to separate environmental and genetic effects on both an individual and a population level. More specifically, it allows for the quantification of the sum of the additive effects of an individual's genes for a given trait, and thus the expected effect of the genes that it passes on to its offspring, also referred to as its ‘breeding value’ (Lynch & Walsh, 1998; Kruuk, 2004).

The ability to predict breeding values for free-living animals can be considered one of the more important recent advances in the evolutionary genetics of natural populations. However, whereas it is nowadays the most widely accepted method for the genetic evaluation of livestock (Mrode, 1996), its use in studies of natural populations is still in its infancy. Although an extensive literature on the prediction of breeding values exists, most of it belongs to the animal breeding literature and is often rather technical (see e.g. Mrode, 1996; Cameron, 1997). Although Lynch & Walsh (1998) and Kruuk (2004) do discuss the theory specific to the animal model, they assume that the reader already knows the basics of the prediction of breeding values. A basic understanding of the underlying theory is, however, essential for a correct application of the methods and interpretation of the results.

The subject of this article is the prediction of breeding values in natural populations using animal model methodology, and especially their interpretation. In the first section of the article, I will provide a brief theoretical introduction to the rationale behind the prediction of breeding values. I will then address the accuracy of predicted breeding values (PBVs), and show how an individual's PBV differs from its true breeding value. I will show that, as an animal model also uses the observations on the individual for whom we want to predict the breeding value, PBVs will partly reflect the environmental component of an individual's phenotype. As a consequence, patterns in PBVs may, in the absence of sufficient pedigree information, simply reflect phenotypic patterns and provide little additional information. Furthermore, the variance of PBVs is lower than that of the true breeding values, which has major implications for the quantification of selection pressures acting upon them, and especially for their comparison with phenotypic measures of selection. In the second part, I take a more applied approach. I will identify the main questions that PBVs have been used to address, and critically evaluate their suitability for answering these questions. I will, however, not only point out the limitations of PBVs, but also present solutions and make suggestions for future research.

### Predicting breeding values

Every phenotypic observation on an individual (*y*_{i}) can be written as the sum of the population mean (or the mean for a specific subset of the population) (*μ*), its breeding value (*a*_{i}), and one or more environmental effects (*e*_{i}). So in its simplest form,

$$y_i = \mu + a_i + e_i \qquad (1)$$

We cannot directly measure an individual's breeding value for a given trait (here referred to as its true breeding value, or *a*). There are, however, two main sources of information available to obtain a prediction of *a* (here referred to as *â*), namely phenotypic observations on the individual itself and on its relatives. The prediction of breeding values from either of these sources of information is based on a similar principle and requires the formulation of the relationship between the observation we want to use and the focal individual's true breeding value.

The slope of a regression of breeding values on individual phenotypes is equal to the heritability (Falconer & Mackay, 1996). In the situation where all we have is a single observation on individual *i* (referred to as *y*_{i}) and no information on its relatives, our best prediction of its breeding value is therefore

$$\hat{a}_i = h^2 (y_i - \mu) \qquad (2)$$

where *h*^{2} is the heritability of the trait (Mrode, 1996). In the absence of any additional information, the best prediction of an individual's breeding value is thus the phenotypic deviation from the mean, multiplied by the heritability of the trait. Although for some traits we may be able to obtain a more informative and biologically more interesting prediction by using additional observations on an individual made at different moments in time, observations on relatives provide better and more informative predictors of an individual's breeding value (Falconer & Mackay, 1996; Lynch & Walsh, 1998). To predict the breeding value of individual *i* (*a*_{i}) from the phenotype of relative *j* (*y*_{j}), we formulate the slope of the regression of *a*_{i} on *y*_{j}, and use this in eqn 2 instead of the heritability. This slope will, in addition to the heritability, depend on the proportion of genes shared between *i* and *j*, and thus their relatedness, where the slope will decrease with decreasing relatedness (Mrode, 1996).
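As a minimal sketch of eqn 2 and of its extension to a single relative, the snippet below assumes that the slope of the regression of *a*_{i} on *y*_{j} is the relatedness between *i* and *j* multiplied by the heritability, which holds only in the absence of shared environmental effects and inbreeding. The function names and the numerical values are hypothetical and serve only as an illustration.

```python
def pbv_from_own_record(y_i, mu, h2):
    """Eqn 2: phenotypic deviation from the mean, scaled by the heritability."""
    return h2 * (y_i - mu)


def pbv_from_relative(y_j, mu, h2, relatedness):
    """Predict a_i from the phenotype of relative j.

    Assumption: the regression slope of a_i on y_j equals relatedness * h2
    (no shared environment, no inbreeding).
    """
    return relatedness * h2 * (y_j - mu)


# Hypothetical values: population mean 10, observed phenotype 12, h2 = 0.25.
own = pbv_from_own_record(12.0, 10.0, 0.25)
via_parent = pbv_from_relative(12.0, 10.0, 0.25, relatedness=0.5)
via_grandparent = pbv_from_relative(12.0, 10.0, 0.25, relatedness=0.25)
print(own, via_parent, via_grandparent)
```

The same phenotypic deviation thus carries half as much weight when it is observed on a parent rather than on the focal individual, and a quarter as much when it is observed on a grandparent.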

Often, we have different sources of information, both on the individual for which we want to predict a breeding value and on a range of its relatives, and both on the trait of interest and on genetically correlated traits. After appropriate weighting, these sources of information can be combined into a single prediction, which is referred to as a selection index in the animal breeding literature. The procedure by which the weights for the individual sources of information are obtained is similar to that used to obtain the partial regression coefficients in multiple linear regression, and is based on the phenotypic and additive genetic variances and covariances amongst observations (Falconer & Mackay, 1996; Mrode, 1996).
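The weighting step can be sketched as follows. The example assumes a focal individual with one record of its own and one record on each of its two unrelated, non-inbred parents, known variance components, and no shared environmental effects. The pedigree, the variance components and the phenotypic deviations are hypothetical.

```python
import numpy as np


def selection_index_weights(P, g):
    """Solve P b = g for the index weights b.

    P: phenotypic (co)variance matrix of the observations.
    g: covariances between each observation and the focal breeding value.
    """
    return np.linalg.solve(P, g)


var_a, var_e = 1.0, 3.0                    # assumed components, h2 = 0.25

# Additive relationships, order: focal individual, sire, dam.
A = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.0],
              [0.5, 0.0, 1.0]])

P = A * var_a + np.eye(3) * var_e          # phenotypic (co)variances
g = A[0] * var_a                           # cov(a_focal, each observation)
b = selection_index_weights(P, g)

deviations = np.array([2.0, 1.0, -1.0])    # hypothetical deviations from the mean
pbv = float(b @ deviations)
print(b, pbv)
```

Under these assumptions the focal individual's own record receives a larger weight than the record of either parent, and both parents receive the same weight.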

Using selection index methodology, we can ask how the different sources of information contribute to an individual's PBV or, in other words, what the relative sizes of the partial regression coefficients are. Fig. 1a illustrates that the contribution of relatives to an individual's PBV declines rapidly with decreasing relatedness. Although the pedigree in Fig. 1a goes back many generations, the average relatedness is low. As a consequence, the PBV of the focal individual is based mainly on its own phenotype. In Fig. 1b, on the other hand, the pedigree contains the same number of individuals but spans only two generations, resulting in a higher average relatedness and in a PBV for the focal individual that is based mainly on the observations on its offspring. Also note that the contribution of observations on relatives to the prediction of an individual's breeding value declines with increasing heritability. Consequently, although every individual has two parents and the number of relatives thus doubles with each generation we go back in time, the increase in the number of relatives cannot fully compensate for the decrease in the contribution of each observation to an individual's PBV.

Where the pedigree of a population falls on the continuum between the two extremes depicted in Fig. 1 will depend on the type of trait under investigation, the life history of the species and the characteristics of the population. For example, the pedigree for a sex-limited trait that can only be measured in adults (e.g. clutch size) in a mainland population of a passerine bird will be more similar to Fig. 1a, whereas a pedigree for a juvenile trait like birth weight in an island population of red deer will be more similar to Fig. 1b. On the whole, however, Fig. 1 emphasizes that even if observations on a range of relatives are available, the observations on the focal individual itself make a substantial contribution to the prediction of its breeding value, and that, especially for traits with a relatively high heritability, the remaining information will come mainly from close relatives.

Selection index methodology as employed above assumes that all individuals come from a population with the same mean, and that this mean (and thus the environment) remains constant across both time and space (Falconer & Mackay, 1996; Mrode, 1996). A way around this assumption was provided by the development of mixed model methodology, and more specifically a method called Best Linear Unbiased Prediction, or BLUP (Henderson, 1949, 1950). BLUP allows for the prediction of random additive genetic effects (the breeding values) as well as any additional random environmental effects, while simultaneously estimating several fixed effects (different population means). Note, however, that in the absence of additional fixed and random effects a BLUP is equivalent to a selection index (Mrode, 1996).

To obtain (best linear unbiased) PBVs, we rewrite and generalize eqn 1 using mixed model and matrix notation into

$$\mathbf{y} = \mathbf{X}\boldsymbol{\beta} + \mathbf{Z}\mathbf{a} + \mathbf{e} \qquad (3)$$

where **y** is a vector containing all phenotypic observations, *β* is a vector of fixed effects, and **a** is a vector containing the breeding values. **X** and **Z** are incidence matrices that link phenotypic observations to fixed and random effects, respectively. Furthermore, the BLUP of the vector of breeding values **a** is

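$$\hat{\mathbf{a}} = \mathbf{G}\mathbf{Z}'\mathbf{V}^{-1}(\mathbf{y} - \mathbf{X}\hat{\boldsymbol{\beta}}) \qquad (4)$$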

Note that if we only had a single record on one individual, and no information on relatives, this reduces to eqn 2. Using **X**β̂ instead of *μ* allows for the incorporation of multiple fixed effects; using **GV**^{−1}, where **G** is the additive genetic (co)variance matrix of the breeding values and **V** is the phenotypic (co)variance matrix of the observations, instead of *h*^{2} enables us to use information on all related individuals and to account for inbreeding; and using **Z** allows for repeated or missing records on individuals (Mrode, 1996; Lynch & Walsh, 1998; Kruuk, 2004).
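To make the matrix expression for the BLUP of the breeding values concrete, the sketch below evaluates it directly for a small, hypothetical pedigree of a sire, a dam and their offspring. It assumes that the variance components are known, that the residuals are independent, and that the fixed effects are estimated by generalized least squares. In practice the variance components are themselves estimated (typically by REML), and dedicated software solves the mixed model equations rather than inverting **V**. The function name and all data are illustrative.

```python
import numpy as np


def blup_breeding_values(y, X, Z, A, var_a, var_e):
    """Evaluate a_hat = G Z' V^-1 (y - X beta_hat) for known variances.

    A is the additive relationship matrix; residuals are assumed independent.
    """
    G = A * var_a                          # (co)variances of breeding values
    R = np.eye(len(y)) * var_e             # residual (co)variances
    V = Z @ G @ Z.T + R                    # (co)variances of the observations
    V_inv = np.linalg.inv(V)
    # Generalized least squares estimate of the fixed effects.
    beta_hat = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
    a_hat = G @ Z.T @ V_inv @ (y - X @ beta_hat)
    return beta_hat, a_hat


# Hypothetical data: one record each on a sire, a dam and their offspring.
A = np.array([[1.0, 0.0, 0.5],
              [0.0, 1.0, 0.5],
              [0.5, 0.5, 1.0]])
y = np.array([10.0, 12.0, 13.0])
X = np.ones((3, 1))                        # a single overall mean
Z = np.eye(3)                              # one record per individual
beta_hat, a_hat = blup_breeding_values(y, X, Z, A, var_a=1.0, var_e=3.0)
print(beta_hat, a_hat)
```

When all individuals are unrelated (**A** is an identity matrix) and each has a single record, the function returns the heritability multiplied by each individual's deviation from the estimated mean, which is eqn 2 with the mean estimated from the data.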

In summary, combining an animal model with mixed model methodology such as best linear unbiased prediction makes it possible to predict an individual's breeding value from a range of sources of information, while simultaneously taking into account potentially confounding environmental effects. However, because of the way in which the different sources of information are weighted, breeding values are predicted for a large part from observations on the individual itself and on close relatives.

### The accuracy of PBVs

Above we have seen how different sources of information can be used to predict an individual's breeding value. The quality of this prediction will depend on the amount and the type of information used. A measure of this quality is provided by the correlation between true and predicted breeding values (Falconer & Mackay, 1996; Mrode, 1996; Cameron, 1997). This correlation is referred to as the accuracy *r* of the PBVs. As the covariance between true breeding values and PBVs is equal to the variance in PBVs [so *σ*(*a*, *â*) = *σ*^{2}(*â*)], the accuracy is given by

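$$r = \frac{\sigma(a, \hat{a})}{\sigma(a)\,\sigma(\hat{a})} = \frac{\sigma(\hat{a})}{\sigma(a)} \qquad (5)$$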

where the variance in true breeding values (*σ*^{2}(*a*)) is equal to the additive genetic variance. Usually, however, the accuracy of PBVs is expressed in terms of the reliability, which is the squared correlation between true and predicted breeding values (*r*^{2}). The reliability is equal to the proportion of the additive genetic variance (or the variance in true breeding values) that is accounted for by the PBVs. If we only have a single observation on an individual, and no observations on relatives, the reliability of its PBV is equal to the heritability, and the reliability increases as the number of observations and the average relatedness increase (Falconer & Mackay, 1996; Mrode, 1996). For example, the reliability of the PBV of individual one in Fig. 1a is only 0.29, given a heritability of 0.25. In contrast, the reliability of its PBV in Fig. 1b is 0.49, while if we reduce the number of offspring to four, the reliability goes down to 0.38. It is thus the difference between the heritability and the reliability that provides insight into how much additional information is provided by the PBVs, on top of the phenotypic values.
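How the reliability grows with the amount of information can be sketched with selection index theory. The function below is an illustration under stated assumptions, not a reconstruction of the pedigrees in Fig. 1: it assumes one record on the focal individual plus one record on each of its offspring, that the offspring are half-sibs of one another (each has a different, unrelated second parent), and that there are no fixed effects, shared environmental effects or inbreeding. The function name is hypothetical.

```python
import numpy as np


def reliability_own_plus_offspring(n_offspring, h2):
    """Reliability (r^2) of a PBV based on one own record and one record
    on each of n_offspring offspring that are half-sibs of one another."""
    var_a, var_p = h2, 1.0                 # phenotypic variance scaled to 1
    k = n_offspring + 1                    # the focal individual has index 0
    A = np.full((k, k), 0.25)              # half-sib offspring share 1/4
    A[0, :] = 0.5                          # focal individual and offspring share 1/2
    A[:, 0] = 0.5
    np.fill_diagonal(A, 1.0)
    P = A * var_a + np.eye(k) * (var_p - var_a)
    g = A[0] * var_a                       # cov(a_focal, each observation)
    b = np.linalg.solve(P, g)              # selection index weights
    # var(PBV) = b'Pb = b'g, so r^2 = b'g / var_a.
    return float(b @ g) / var_a


for n in (0, 1, 4, 16):
    print(n, round(reliability_own_plus_offspring(n, h2=0.25), 3))
```

Without offspring the function returns the heritability, as stated above. With four offspring and a heritability of 0.25 it returns 0.375, which is close to the 0.38 quoted for the reduced pedigree, although the exact structure of Fig. 1b is an assumption here.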

The part of the variance in true breeding values that is not accounted for by the PBVs is referred to as the prediction error variance (PEV), so

$$\mathrm{PEV} = \sigma^2(a) - \sigma^2(\hat{a}) = (1 - r^2)\,\sigma^2(a) \qquad (6)$$

From Appendix 1, it follows that the PEV is also equal to the covariance between the PBV and the residual environmental effect (*e*), so

$$\mathrm{PEV} = \sigma(\hat{a}, e) \qquad (7)$$

This relationship implies that an individual's PBV does not only depend on its true breeding value, but also at least partly reflects the environmental component of its phenotype.

As the covariance between true breeding values and phenotypes is equal to the covariance between PBVs and phenotypes [so *σ*(*a*, *y*) = *σ*(*â*, *y*); see Appendix 1], the regression of PBVs on individual phenotypes is always equal to the heritability, irrespective of the reliability of the PBVs. The variance of the PBVs, however, goes down with decreasing reliability. As a consequence, the correlation between PBVs and phenotypes is stronger than the correlation between true breeding values and phenotypes.
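These relationships can be checked numerically. The simulation below is a rough sketch under simplifying assumptions: many independent sire, dam and offspring trios, a known population mean of zero, known variance components, normally distributed effects, and a PBV for the offspring obtained from a selection index on all three phenotypes. All names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, var_a, var_e = 200_000, 1.0, 3.0        # assumed values, h2 = 0.25
h2 = var_a / (var_a + var_e)

# True breeding values: unrelated parents, offspring = mid-parent value
# plus a Mendelian sampling deviation with variance var_a / 2.
a_sire = rng.normal(0.0, np.sqrt(var_a), n)
a_dam = rng.normal(0.0, np.sqrt(var_a), n)
a_off = 0.5 * (a_sire + a_dam) + rng.normal(0.0, np.sqrt(0.5 * var_a), n)

e = rng.normal(0.0, np.sqrt(var_e), (3, n))   # environmental deviations
y = np.vstack([a_off, a_sire, a_dam]) + e     # phenotypes (mean zero)

# Selection index for the offspring, order: offspring, sire, dam.
A = np.array([[1.0, 0.5, 0.5],
              [0.5, 1.0, 0.0],
              [0.5, 0.0, 1.0]])
P = A * var_a + np.eye(3) * var_e
b = np.linalg.solve(P, A[0] * var_a)
pbv = b @ y


def cov(u, v):
    return np.cov(u, v)[0, 1]


var_pbv = pbv.var(ddof=1)
cov_true_pbv = cov(a_off, pbv)             # should be close to var_pbv
pev = var_a - var_pbv                      # prediction error variance
cov_pbv_env = cov(pbv, e[0])               # should be close to pev
slope = cov(pbv, y[0]) / y[0].var(ddof=1)  # should be close to h2
corr_pbv_y = np.corrcoef(pbv, y[0])[0, 1]
corr_true_y = np.corrcoef(a_off, y[0])[0, 1]
print(var_pbv, cov_true_pbv, pev, cov_pbv_env, slope, corr_pbv_y, corr_true_y)
```

Under these assumptions the simulated quantities should agree, up to sampling error, with the statements above: the covariance between true breeding values and PBVs matches the variance in PBVs, the covariance between PBVs and the environmental deviations matches the PEV, the regression of PBVs on phenotypes matches the heritability, and PBVs are more strongly correlated with phenotypes than true breeding values are.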

In this section, I have shown that PBVs differ from true breeding values in two interrelated ways. Firstly, the variance in PBVs is lower than the variance in true breeding values (or the additive genetic variance). Secondly, PBVs partly reflect the environmental component of an individual's phenotype, and patterns in PBVs will thus always resemble the patterns in the phenotypes more closely than patterns in the underlying true breeding values do. These two characteristics of PBVs have important implications for their interpretation when applied to evolutionary questions, which I address in the following section.