Fluctuating asymmetry and developmental instability in evolutionary biology: past, present and future

Authors


Stefan Van Dongen, Group of Evolutionary Biology, University of Antwerp, Groenenborgerlaan 171, B-2020 Antwerp, Belgium.
Tel.: +32 (0)3 265 33 36; fax: +32 (0)3 265 34 74;
e-mail: stefan.vandongen@ua.ac.be

Abstract

The role of developmental instability (DI), as measured by fluctuating asymmetry (FA), in evolutionary biology has been the focus of a wealth of research for more than half a century. In spite of this long period and the many published papers, the current state of knowledge reviewed here only allows us to conclude that patterns are heterogeneous and that very little is known about the underlying causes of this heterogeneity. In addition, the statistical properties of FA as a measure of DI are only poorly grasped because of a general lack of understanding of the underlying mechanisms that drive DI. To prevent this area of research from being abandoned, more effort should be made to understand the observed heterogeneity, and attempts should be made to develop a unifying statistical protocol. More specifically, and perhaps most importantly, it is argued here that more attention should be paid to the usefulness of FA as a measure of DI, since many factors might blur this relationship. Furthermore, the genetic architecture, associations with fitness and the importance of compensatory growth should be investigated under a variety of stress situations. In addition, more focus should be directed to the underlying mechanisms of DI as well as how these processes map to the observable phenotype. These insights could yield more efficient statistical models and a unified approach to the analysis of patterns in FA and DI. The study of both DI and canalization is indispensable to obtain better insights into their possible common origin, especially because both have been suggested to play a role in micro- and macro-evolutionary processes.

Introduction

Why are developmental instability (DI, i.e. an individual's inability to buffer its development against random noise) and fluctuating asymmetry (FA, i.e. the phenotypic outcome of DI) of interest to evolutionary biologists? First of all, FA has been suggested to reliably reflect stress experienced during development and to predict individual fitness. FA is (seemingly) easy to measure and analyse, whereas fitness is hard to estimate directly, especially in natural populations. Secondly, the study of the observable FA allows tracking evolutionary changes in a complex and poorly understood set of processes, namely the unobservable DI. Thirdly, FA and DI have been suggested to play an important role in sexual selection and the evolution of mate choice, both of which are recognized as important evolutionary forces. Finally, it has been suggested that developmental homeostasis – the combined effect of DI and canalization (Debat et al., 2000) – could mask genetic variation, such that new mutations can escape from natural selection. Under stress, developmental homeostasis may decrease and this cryptic genetic variation may become expressed (Rutherford & Lindquist, 1998; Sangster et al., 2004). These points of interest obviously have important implications for other fields. The application of FA as a monitoring tool critically depends on the evolutionary dynamics of both stress resistance and DI.

In spite of half a century of research, however, the study of FA and DI has developed into a controversial area where the same topics are being studied as were studied by the pioneers in the 1950s and 1960s (Polak, 2003). The vast heterogeneity in observations and oscillations of ideas over time (see e.g. the many reviews in Polak, 2003) can easily yield distrust such that even the most exciting biological phenomena may become ignored (Palmer, 2000). Indeed, while there has been a boost of studies of FA in evolutionary biology over the past 10–15 years, there appears to have been a steady decline in studies of FA and DI since 1999 (Fig. 1). Although we should not exclude the importance of year-to-year fluctuations and the increased professionalization of the field (Tomkins & Simmons, 2003), this apparent decrease should not be interpreted as an indication that we fully understand the evolutionary properties of FA and DI. Several reasons may account for this apparent abandonment of the field. The apparent high heterogeneity has led to a general acknowledgement that FA is neither a ubiquitous and simple measure of stress and fitness nor a straightforward measure of DI, hampering its general application. The measurement and statistical analysis of FA are quite cumbersome and complex, and very little is known about the underlying mechanisms of DI and what causes individuals to differ in their degree of FA and presumably DI as well (Klingenberg, 2003). However, these observations should rather stimulate further research to obtain more in-depth insights into the evolutionary properties of DI. The biological reasons for the observed heterogeneity are largely unknown (Polak, 2003).
Besides the general lack of understanding of the underlying mechanisms of developmental homeostasis, there is a challenging task for evolutionary biologists to gain better insights into the evolutionary properties of DI as well as canalization, because both have been argued to have a common background (Waddington, 1957, but see e.g. Debat et al., 2000; Dworkin, 2005; Willmore et al., 2005). The aim of this paper is to stimulate a renewed interest in evolutionary studies of FA and DI (preferably in conjunction with canalization) by pinpointing specific lacunae in our current understanding and by specifying particular pitfalls when applying FA as a measure of DI. Because there has recently been a wealth of good review papers on different aspects of FA and DI, this is not another exhaustive review. Rather, an overview of the current state of knowledge will be presented, referring to recent reviews and selected empirical studies.

Figure 1.

Evolution of the number of studies on developmental instability in evolutionary journals over the past 30 years. For comparison, the increase in the total number of publications is added (dashed line). A decreasing tendency in the interest in developmental instability studies is indicated by the smoothed curve (bold line). The number of studies was obtained from a Web of Science search performed on 15 November 2005. Papers were selected on the basis of journal (journals with the word ‘evolution’ or ‘evolutionary’ in their title) and the term ‘fluctuating asymmetry’, ‘developmental instability’ or ‘developmental stability’ in the title, abstract or keywords.

Developmental noise, (in)stability and asymmetry

Getting the definitions straight

The terms developmental homeostasis, canalization, developmental noise, developmental stability and instability have been used interchangeably in the literature. To avoid confusion, a short overview of current ideas is given and the definitions used in this paper are identified. The core idea about FA as a measure of DI is that both sides of an organism can be viewed as independent replicas of the same developmental event. Both sides share the same genotype and in a homogeneous environment (i.e. identical on both sides), they are under the influence of the same external factors. During development, small random perturbations cause the developmental pathway to deviate from its expected trajectory under the given environmental conditions. As these processes act locally, thereby affecting only one body part, their effects will accumulate on left and right side separately, leading to asymmetric phenotypes. The sensitivity to random perturbations – or developmental noise – can be viewed as the tendency of a developmental system to produce a morphological change in response to these perturbations and is often called developmental instability (e.g. Klingenberg, 2003; Nijhout & Davidowitz, 2003). In this definition, developmental noise and instability are treated as synonyms (as on p. 4 in Nijhout & Davidowitz, 2003; see also Palmer, 1994), or DI is assumed to reflect the outcome of developmental noise and a developing system's sensitivity to the perturbations (as in Klingenberg, 2003). On the other hand, however, developing systems are also characterized by buffering capacities, a property termed developmental stability. In Klingenberg's (2003) view, developmental stability is part of a system's sensitivity to random noise. In spite of minute differences in interpretation, the currently accepted view is that the degree of FA is the outcome of developmental noise and a suite of properties of a developing system that affect its sensitivity.
Extrinsic factors such as stress can thus influence FA through changes in both developmental noise and DI (Klingenberg, 2003). Two other types of asymmetry are commonly encountered, neither of which reflects DI. Directional asymmetry (DA) refers to a general or average tendency of asymmetry in one direction, whereas antisymmetry (AS) is the pattern in which about half of the population has a larger right-hand side and the other half a larger left-hand side. DA is characterized by a unimodal distribution with a mean differing from zero, whereas AS results in a bimodal distribution (Palmer & Strobeck, 1986, 1992).

Whereas DI refers to causes of within-individual variation, canalization refers to causes of between-individual variation. Canalization is the process that ensures phenotypic constancy under changing conditions, or the ability to produce the same phenotype in different situations. A distinction between genetic and environmental canalization is often made, where the former represents phenotypic constancy in the face of genetic perturbations (mutations) whereas the latter reflects this ability in a range of environments (Nijhout & Davidowitz, 2003).

Developmental instability is expressed phenotypically by within-individual variation, which is traditionally measured by FA in bilaterally symmetric organisms, or by deviations from radial symmetry in flowering plants. Canalization, on the other hand, is expressed by between-individual variation. Although between-individual variation has also been used as a measure of DI, it is important to maintain the distinction between DI and canalization until further research has resolved their origins.

Statistical properties of FA and models of developmental noise and instability

Statistical analyses have predominantly assumed that developmental errors occurring during development are independent of each other (see Klingenberg, 2003 for a recent overview), leading to an additive model and a predicted normal distribution of FA (Whitlock, 1996). The left and right trait values are each viewed as a draw from a normal distribution with mean equal to the expected trait value in the absence of developmental noise, and variance reflecting DI. The difference between left and right trait value (the signed asymmetry or signed FA) will then also be normally distributed, with zero mean and variance equal to twice the level of DI (Whitlock, 1996). Although normality of the signed FA has been used as the gold standard for trait selection (Palmer & Strobeck, 1986), leptokurtic distributions may reflect real FA as well (see also Box 1). However, the additivity of developmental errors has been challenged recently, which has important consequences for the distribution of FA and the interpretation of patterns in FA (Graham et al., 2003; Klingenberg, 2003).
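
The additive model described above can be illustrated with a small simulation. This is a minimal sketch, not an analysis from any of the cited papers: the trait mean, the per-side noise standard deviation `sigma` and the sample size are hypothetical values chosen for illustration, and the function name `simulate_signed_fa` is an assumption of this example.

```python
import random
import statistics

def simulate_signed_fa(n, mu, sigma, rng):
    """Return n signed asymmetries L - R under the additive normal model."""
    signed = []
    for _ in range(n):
        left = rng.gauss(mu, sigma)    # left side: expected value plus noise
        right = rng.gauss(mu, sigma)   # right side: independent replicate
        signed.append(left - right)
    return signed

rng = random.Random(1)                 # fixed seed for reproducibility
sigma = 0.5                            # hypothetical per-side noise SD
signed_fa = simulate_signed_fa(20000, mu=10.0, sigma=sigma, rng=rng)

mean_signed = statistics.fmean(signed_fa)
var_signed = statistics.variance(signed_fa)
expected_var = 2 * sigma ** 2          # twice the per-side variance

print(f"mean={mean_signed:.3f} var={var_signed:.3f} expected={expected_var:.3f}")
```

With a large sample the mean of the signed asymmetry should be close to zero and its variance close to twice the per-side variance, as the model predicts.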

It has long been recognized that developmental systems have a high buffering capacity such that levels of FA are usually small, on average in the order of 1–2% of mean trait size. Therefore, FA can easily become confounded with measurement error (ME) (Palmer & Strobeck, 1986). It appears to have become common practice to investigate the magnitude of ME relative to FA through repeated measurements of at least part of the dataset and the use of mixed models (Palmer & Strobeck, 1986; Van Dongen et al., 1999a).
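
How repeated measurements separate ME from FA can be sketched with a simple method-of-moments decomposition. This is a deliberately simplified stand-in for the mixed-model analyses of the cited papers, not a reimplementation of them. It assumes a balanced design with the same number of repeated measurements per side, normally distributed noise, and ME that is independent of side and individual; all parameter values and function names are hypothetical.

```python
import random
import statistics

def simulate(n, reps, sigma_di, sigma_me, rng):
    """Return data[i][side][rep] for n individuals, 2 sides, reps measurements."""
    data = []
    for _ in range(n):
        size = rng.gauss(10.0, 1.0)                  # individual trait size
        true_sides = [size + rng.gauss(0.0, sigma_di) for _ in range(2)]
        data.append([[t + rng.gauss(0.0, sigma_me) for _ in range(reps)]
                     for t in true_sides])
    return data

def fa_and_me_variance(data):
    """Estimate the variance of the true signed asymmetry and of ME."""
    reps = len(data[0][0])
    # Pooled within-cell variance of repeated measurements estimates ME.
    me_var = statistics.fmean(
        [statistics.variance(cell) for ind in data for cell in ind])
    # Signed asymmetry computed from the per-side means.
    d = [statistics.fmean(ind[0]) - statistics.fmean(ind[1]) for ind in data]
    # Var(observed d) = Var(true d) + 2 * me_var / reps, so subtract the ME part.
    fa_var = statistics.variance(d) - 2.0 * me_var / reps
    return fa_var, me_var

rng = random.Random(2)
data = simulate(2000, reps=3, sigma_di=0.2, sigma_me=0.1, rng=rng)
fa_var, me_var = fa_and_me_variance(data)
print(f"FA variance={fa_var:.4f} ME variance={me_var:.4f}")
```

In this sketch the true variance of the signed asymmetry is twice the squared per-side noise SD, so the estimate can be compared directly with the simulated value.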

The second step in the analysis of FA is to check whether the distribution of the signed FA has zero mean and is normally (or leptokurtically) distributed. Average deviations from perfect symmetry indicate the presence of DA, whereas deviations from normality could also indicate the presence of other forms of asymmetry. Platykurtic distributions are generally interpreted as evidence for the presence of AS (Palmer & Strobeck, 1992). Combinations of different forms of asymmetry in single populations have been observed and detected using mixture analyses (Van Dongen et al., 1999b; Lens & Van Dongen, 2000). If indications of the presence of other forms of asymmetry are found, analyses should be done with great care.
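
The two checks described above (zero mean, and the shape of the distribution) can be sketched with a one-sample t statistic and a moment-based excess kurtosis. This is an illustrative sketch only: the three simulated samples, their parameters and the function names are assumptions of this example, and a real analysis would add formal tests and, where needed, the mixture analyses cited above.

```python
import math
import random
import statistics

def t_statistic(signed):
    """One-sample t statistic for H0: mean signed asymmetry equals zero."""
    n = len(signed)
    return statistics.fmean(signed) / (statistics.stdev(signed) / math.sqrt(n))

def excess_kurtosis(values):
    """Moment-based excess kurtosis: > 0 leptokurtic, < 0 platykurtic."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m4 = statistics.fmean([(v - m) ** 4 for v in values])
    return m4 / m2 ** 2 - 3.0

rng = random.Random(3)
n = 5000
ideal_fa = [rng.gauss(0.0, 0.5) for _ in range(n)]     # zero mean, normal
with_da = [rng.gauss(0.1, 0.5) for _ in range(n)]      # mean shifted from zero
# Antisymmetry: each individual is strongly left- or right-biased.
antisym = [rng.gauss(rng.choice([-1.0, 1.0]), 0.3) for _ in range(n)]

for label, sample in [("FA", ideal_fa), ("DA", with_da), ("AS", antisym)]:
    print(label, round(t_statistic(sample), 2), round(excess_kurtosis(sample), 2))
```

A large absolute t statistic points to DA, and a clearly negative excess kurtosis to the bimodal pattern expected under AS.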

Other forms of asymmetry

Directional asymmetry and AS may confound estimates of real FA and thus complicate analyses. Both DA and AS may arise from different nonexclusive origins. In any case, they do not comply with the above model of DI and great caution is needed when interpreting patterns in traits that show either DA or AS (Klingenberg, 2003). They may have an adaptive basis and/or be genetically determined, or may be environmentally induced when one side is used differently from the other or experiences different environmental conditions. For example, it is well known that tennis players develop stronger and larger bones in the arm they use when playing tennis (Ducher et al., 2005). A systematic bias in measurements may also be due to handedness of the measurer, and not the result of morphological asymmetries (e.g. Helm & Albrecht, 2000).

It is, in theory, possible to obtain real FA values by statistically correcting for mean DA (and AS, although this has to my knowledge never been done in practice), by subtracting the average amount of asymmetry from individual signed asymmetry values. While applying such corrections, however, two important assumptions may be violated. First, one implicitly assumes that the average state of asymmetry used as correction factor reflects the optimal state for each individual, which is not necessarily the case. Indeed, Stige et al. (2006) recently showed between-individual variation in levels of DA, invalidating the use of mean DA as correction factor, since after correction FA values are still confounded with DA. Pither & Taylor (2000) found evidence that levels of DA may differ among populations. Secondly, if the left- and right-hand sides develop differently, the developmental processes and their reaction to perturbations may not be identical (Klingenberg, 2003). Therefore, correcting for DA should be done with great caution, and if possible such traits should be avoided. Even when the levels of DA are likely to be due to biased measurements on the left and right, subtracting the average degree of asymmetry is only suitable as a correction when the amount of bias is identical at each measurement. The problem of detecting DA and correcting for it could be even more troublesome. Pither & Taylor (2000) have found indications that different component parts of the wings of the black-winged damselfly (Calopteryx maculata) showed DA that compensated each other, forming wings whose full lengths were consistent with real FA.
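
The first of these problems can be made concrete with a short simulation. The sketch below subtracts the mean signed asymmetry in two hypothetical populations: one in which every individual has the same DA, and one in which DA varies among individuals. All parameter values and names are assumptions chosen for illustration and are not taken from the cited studies.

```python
import random
import statistics

def correct_for_mean_da(signed):
    """Subtract the population mean from each signed asymmetry value."""
    m = statistics.fmean(signed)
    return [d - m for d in signed]

rng = random.Random(4)
n = 5000
noise_sd = 0.3   # hypothetical SD of the signed asymmetry caused by DI alone

# Case 1: every individual has the same DA of 0.3.
constant_da = [0.3 + rng.gauss(0.0, noise_sd) for _ in range(n)]
# Case 2: individual DA varies around 0.3 (SD 0.3) in addition to the noise.
variable_da = [rng.gauss(0.3, 0.3) + rng.gauss(0.0, noise_sd) for _ in range(n)]

var_constant = statistics.variance(correct_for_mean_da(constant_da))
var_variable = statistics.variance(correct_for_mean_da(variable_da))
print(f"constant DA: {var_constant:.3f}  variable DA: {var_variable:.3f}")
```

In the first case the corrected variance recovers the variance caused by DI; in the second it remains inflated by the between-individual variation in DA, so the corrected values are still confounded.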

It has been argued that DA and AS could reflect or measure DI themselves (e.g. Graham et al., 1993), but this view is not commonly shared (e.g. Klingenberg, 2003). Nevertheless, it is worth noting that some studies have indeed found an increase in DA and/or AS with stress (e.g. Lens & Van Dongen, 2000; Kark et al., 2004), suggesting that these forms of asymmetry also deserve further study as biomarkers; yet, because the link with DI is unclear, that is not the focus of this paper.

Even if ME is relatively small, mean asymmetry does not differ from zero, and the distribution of the signed FA is normal, it is impossible to rule out that very subtle degrees of either DA or AS are present and are confounding asymmetries due to random developmental noise. This is to some extent due to the relatively low statistical power of techniques to separate different forms of asymmetry. In addition, subtle environmental differences between left and right side of an organism could introduce a systematic, though directionally random in most cases, asymmetry that is not a reflection of DI (called environmentally induced asymmetry by Nijhout & Davidowitz, 2003). Such differences could arise for various reasons. In humans and primates handedness is prominently present. As bone and muscular tissues become modulated when used differentially (Lazenby, 2002), it is not surprising that for example bone volume can differ a lot between the dominant and other side in tennis players (e.g. Ducher et al., 2005). Directional asymmetries have been demonstrated in human limb bones and the degree of asymmetry appears to be influenced by a mix of genetic and behavioural factors (Auerbach & Ruff, 2004). Nevertheless, asymmetries in arms and legs are routinely used in human studies on DI, largely ignoring the fact that at least to some extent, these asymmetry values may not solely reflect DI (e.g. Brown et al., 2005). Furthermore, handedness is not limited to humans and primates, or to the higher vertebrates. Limb DA appears to be the rule in tetrapods and may have a genetic basis, at least in mice (Garland & Freeman, 2005). This implies that the expected degree of DA differs among genotypes, and thus that subtracting average population level asymmetries to correct for DA is incorrect. The asymmetrical use of left and right body parts appears to be very common throughout the world of animals. 
Snakes appear to show lateralization of coiling behaviour (Roth, 2003); octopus eye use appears to show an antisymmetric pattern (Byrne et al., 2004); toads have lateralized use of hind- and forelimb (Robins et al., 1998); humpback whales show behavioural asymmetries (Clapham et al., 1995) and fish may show a preferred direction by which they pass a barrier to reach a target (Bisazza et al., 2000). These behavioural differences might in some cases result in asymmetries that are not the result of DI, complicating the interpretation of patterns. Moreover, in animals that go through an immobile phase during development (e.g. pupae, eggs or a foetus in a womb) the orientation may imply slight environmental differences between left and right causing differences that do not reflect DI. Such environmentally induced asymmetry could also have caused correlations in the signed asymmetry of the tibia lengths in two different moth species (Van Dongen et al., 1999b). Studies on the establishment of the three-dimensional axes of the vertebrate body during embryological development show that these embryos have information on what will develop as left and right hand side. This information appears crucial for the correct and asymmetrical development of many organs. The signal of left and right appears to be acquired via maternal mRNA. It has therefore been argued that slight DA can be expected in nearly any structure that develops in the vertebrate body (Kraak, 1997), thereby confounding estimates of FA. In conclusion, the basic implicit assumption of using FA as a measure of DI may be violated quite often for a variety of reasons. To my knowledge, this aspect has not been thoroughly examined, yet has been identified by some as one of the most fundamental problems of FA studies (Palmer & Strobeck, 2003) and might, if explored profoundly, explain some of the observed heterogeneity. 
This would, however, require additional behavioural data or experimental manipulation besides the measurement of both sides. In humans, handedness has been linked to asymmetric development (above), and such associations could be obtained for many animals as well. Windig & Nylin (1999), for example, have found indications that very subtle degrees of DA in the speckled wood butterfly (Pararge aegeria) might be related to the typical spiral flight of the males. Experimentally orienting pupae in climate rooms along a common direction would allow the importance of slight environmental differences between left and right to be explored.

The leptokurtic distribution of the signed asymmetry

If all individuals exhibit the same underlying level of DI, signed FA values will follow a normal distribution. However, when the levels of DI vary across individuals, the sample of signed FA values reflects a mix of different distributions, all with zero mean but different variances. Such a mix of distributions, termed a mixture in statistical language, is typically leptokurtic or peaked (Whitlock, 1996). This deviation from normality is not related to any deviations from ‘real’ FA but simply reflects heterogeneity in the underlying levels of the unobservable DI. One point of caution is required, as Palmer & Strobeck (1992) showed that a mix of real FA and AS could result in leptokurtic distributions as well. In order to differentiate between this possibility and the former, where heterogeneity in DI underlies the leptokurtosis, we refer to the mixture model approach where the distribution of the signed asymmetry is broken up into different components with possibly different means and variances (Van Dongen et al., 1999c). It is the degree of leptokurtosis which has been suggested to serve as a measure of between-individual variation in DI. This aspect is reviewed in the next section.
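
The effect of heterogeneity in DI on the shape of the distribution can be sketched by comparing a homogeneous sample with a zero-mean mixture of two normal distributions. The two standard deviations used for the mixture (0.5 and 1.5) are hypothetical, as are the variable and function names.

```python
import random
import statistics

def excess_kurtosis(values):
    """Moment-based excess kurtosis: 0 for a normal distribution."""
    m = statistics.fmean(values)
    m2 = statistics.fmean([(v - m) ** 2 for v in values])
    m4 = statistics.fmean([(v - m) ** 4 for v in values])
    return m4 / m2 ** 2 - 3.0

rng = random.Random(5)
n = 20000
# All individuals share the same level of DI.
homogeneous = [rng.gauss(0.0, 1.0) for _ in range(n)]
# Each individual has one of two levels of DI; the mean stays at zero.
heterogeneous = [rng.gauss(0.0, rng.choice([0.5, 1.5])) for _ in range(n)]

kurt_homogeneous = excess_kurtosis(homogeneous)
kurt_heterogeneous = excess_kurtosis(heterogeneous)
print(f"homogeneous={kurt_homogeneous:.2f} heterogeneous={kurt_heterogeneous:.2f}")
```

The homogeneous sample should show an excess kurtosis near zero, whereas the mixture is clearly leptokurtic although every component has zero mean.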

More recently, the assumption of additive developmental errors, and the normal distribution of FA that it predicts, has been challenged (Graham et al., 2003; Klingenberg, 2003). Graham et al. (2003) argued that models of development would predict a log-normal or gamma distribution instead of a Gaussian distribution. Therefore, the phenotype on the left- and right-hand side should not be modelled as a sample from a normal distribution with particular mean and variance, but rather as a sample from a log-normal or gamma distribution. As a consequence, the distribution of the signed asymmetry would become leptokurtic even in the absence of any heterogeneity in the underlying DI (i.e. the variances of the distributions). Recently, statistical models have been developed to incorporate non-normal distributions of the outcome of developmental noise (Van Dongen et al., 2005), and these suggest that the normal distribution is probably a convenient approximation of the real phenomena (S. Van Dongen & A.P. Møller, unpublished data). Nevertheless, the routine use of a normal approximation should be done with great care until this aspect is investigated more thoroughly (Klingenberg, 2003).
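
Why a skewed distribution on each side yields a leptokurtic signed asymmetry can be seen from a short derivation. It is a sketch under the simplifying assumption that left and right values, L and R, are independent draws from the same distribution with variance \sigma^2, fourth cumulant \kappa_4 and excess kurtosis \gamma_2 = \kappa_4 / \sigma^4; the notation is introduced here and does not come from the cited papers.

```latex
% Cumulants of independent variables add, and even-order cumulants
% are unchanged when the sign of a variable is reversed.
\[
  \operatorname{Var}(L - R) = 2\sigma^2, \qquad
  \kappa_4(L - R) = \kappa_4(L) + \kappa_4(R) = 2\kappa_4 ,
\]
\[
  \gamma_2(L - R) = \frac{2\kappa_4}{(2\sigma^2)^2}
                  = \frac{\gamma_2}{2} .
\]
% For a log-normal distribution with log-scale standard deviation s > 0:
\[
  \gamma_2 = e^{4s^2} + 2e^{3s^2} + 3e^{2s^2} - 6 > 0 .
\]
```

Under these assumptions the signed asymmetry inherits half of the excess kurtosis of the single-side distribution, so it is leptokurtic whenever that distribution is, even when all individuals share the same level of DI.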

Individual FA estimates DI only poorly

Developmental instability is estimated statistically as a variance, at both the population level (e.g. Palmer & Strobeck, 1986) and the individual level (Whitlock, 1996). A higher variance reflects lower precision of development. It is commonly accepted that for a given sample size, a variance is estimated less accurately than a mean value. Consequently, at the population level, the power of statistical tests comparing levels of FA is relatively low, and different test procedures may show different behaviours and properties (Palmer & Strobeck, 1992; Palmer, 1994). However, in spite of power problems, the effect sizes of differences in average levels of FA between groups of individuals are unbiased.

The problem of sampling error is more striking when analysing patterns in FA at the individual level. Using single trait FA as an estimate of individual DI is an attempt to estimate a variance with only one data point (i.e. the difference between left and right trait value). As a consequence, FA only poorly reflects the underlying and unobservable levels of DI. The implications of this have only recently been acknowledged (Whitlock, 1996), and are still only rarely incorporated in routine statistical analyses. This is undoubtedly due to the fact that this topic is still under development and has been characterized by different opinions. Although the discussion involves detailed statistical arguments, it is too important for the study of FA in evolutionary biology to neglect in this review. The importance comes from the fact that if we use FA as a predictor in regression analyses, estimate the heritability of DI through FA, or investigate between-trait correlations in FA, the estimates are biased downward (Whitlock, 1996). Whitlock (1996, 1998) developed a framework, based on the hypothetical repeatability of FA, to correct for this downward bias. Van Dongen, in a series of papers, has explored Bayesian models to achieve this (e.g. Van Dongen, 2001; Van Dongen et al., 2005; S. Van Dongen & A.P. Møller, unpublished data). The importance of the downward bias is illustrated in Box 1 using simulated datasets. Bias corrections are reviewed in Box 2. Nevertheless, it is important to keep in mind that these techniques are still under development and that rigorous testing of them is generally lacking because only very little is known about the underlying mechanisms of DI. Therefore, these techniques are based on theoretical and statistical arguments only. Their application is illustrated in Box 3.
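
How weakly a single asymmetry value tracks the underlying DI can be sketched by simulation. The example below is not one of the boxed simulations referred to above. It assumes the normal model, represents each individual's DI by a standard deviation drawn uniformly between 0.5 and 1.5 (a hypothetical choice), and draws two independent realizations of the unsigned FA per individual, the second being a purely hypothetical replicate of development.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(6)
n = 20000
di = [rng.uniform(0.5, 1.5) for _ in range(n)]      # unobservable DI (as an SD)
fa_first = [abs(rng.gauss(0.0, s)) for s in di]     # observed unsigned FA
fa_second = [abs(rng.gauss(0.0, s)) for s in di]    # hypothetical replicate

r_fa_di = pearson(fa_first, di)             # how well FA tracks DI
r_repeat = pearson(fa_first, fa_second)     # repeatability of unsigned FA
print(f"corr(FA, DI)={r_fa_di:.2f} repeatability={r_repeat:.2f}")
```

Even with substantial variation in DI among individuals, both correlations remain far below one, which is the source of the downward bias discussed above.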

Which index to use?

There is a wealth of indices available to estimate DI at the individual or population level. It is beyond the scope of this overview to provide an extensive review. We refer to Palmer & Strobeck (1986, 2003) for an overview of population-level estimates and to Hoffmann & Woods (2001) for some guidelines about their usage. In evolutionary biology one is mostly concerned with individual levels of asymmetry and DI. Usually the unsigned asymmetry is used. It can be calculated directly from the measurements or be obtained from mixed regression models (Van Dongen et al., 1999a). The latter estimates are corrected for measurement error and DA. The correction for ME, however, will only affect the outcome if heterogeneity in the accuracy of measurements is incorporated in the statistical model (Van Dongen et al., 2003). As indicated above, corrections for DA should be used with great caution and traits exhibiting DA should be avoided if possible.

There has been some debate about the use of size-corrected asymmetries. The idea is that, in absolute terms, a particular asymmetry value becomes biologically more meaningful and reflects higher DI the smaller the trait size is. Most commonly, a size correction is performed by dividing the unsigned asymmetry values by the average individual trait size, but other transformations may be more appropriate (Leung, 1998; Palmer & Strobeck, 2003). Particular attention should be paid to size corrections when ME is large. If ME is large, correcting for size will artificially create negative associations between asymmetry and trait size, as the absolute degree of ME can be expected to be independent of trait size (Palmer & Strobeck, 2003).
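
The artefact described above can be reproduced in a small simulation. In this sketch all individuals are perfectly symmetric, so every observed asymmetry is ME with a standard deviation that does not depend on trait size. The size range, the ME standard deviation and the function names are hypothetical.

```python
import math
import random
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def relative_asymmetry(left, right):
    """Unsigned asymmetry divided by the mean individual trait size."""
    return abs(left - right) / ((left + right) / 2.0)

rng = random.Random(7)
me_sd = 0.2                                   # ME SD, independent of size
sizes, unsigned, relative = [], [], []
for _ in range(5000):
    true_size = rng.uniform(5.0, 15.0)
    left = true_size + rng.gauss(0.0, me_sd)  # symmetric trait plus ME
    right = true_size + rng.gauss(0.0, me_sd)
    sizes.append((left + right) / 2.0)
    unsigned.append(abs(left - right))
    relative.append(relative_asymmetry(left, right))

r_unsigned = pearson(unsigned, sizes)
r_relative = pearson(relative, sizes)
print(f"unsigned vs size={r_unsigned:.2f} size-corrected vs size={r_relative:.2f}")
```

The uncorrected asymmetry is unrelated to size, whereas the size-corrected values show a negative association that is produced entirely by the correction.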

As reviewed above and in Boxes 1, 2 and 3, single trait FA only weakly reflects the underlying DI. It has therefore often been suggested to combine information from different traits in a single analysis. Leung et al. (2000) have performed a series of simulations and suggest taking the sum (or mean) of either the mean-standardized or ranked unsigned FA values. Under very stringent assumptions, Leung et al. (2000) were able to show that measuring k traits on N individuals is equivalent to measuring a single trait on k × N individuals. However, this is only true when there is no variation in DI among individuals, all traits can be measured with the same accuracy and all traits show the same association with, for example, fitness. These assumptions are likely violated for several reasons. Levels of DI appear to vary among individuals (e.g. Van Dongen et al., 2005) and it has been argued (although there is little empirical evidence) that some traits may be more vulnerable to the effects of stress than others (see below). An alternative to using a composite FA index is to treat data from different traits within single individuals as repeated measures and to use a mixed model type of analysis. In such models, the data from different traits are not combined into a single index. Treatment is added as a fixed effect, whereas trait and the trait × treatment interaction are added as random effects to the model. In addition, between-trait correlations in the unsigned asymmetry should be incorporated in the model's R-matrix (which will also incorporate correlations in the signed FA). We refer to Verbeke & Molenberghs (2000) for more details about mixed modelling and to Karvonen et al. (2003) for an application of this approach. The main difference between this approach and averaging FA values into a composite measure is the interpretation of the fixed treatment effect.
When comparing average FA values, the treatment effect can only be interpreted if one makes the (perhaps unrealistic) assumption of no variation in effect sizes between traits. With the mixed model analysis, one assumes that the set of traits that has been measured reflects a random sample of all measurable traits and the fixed treatment effect is corrected for between-trait variability in effect sizes. In addition, the trait × treatment interaction allows interpreting and testing the amount of variation in effect sizes among traits. If the set of traits is small, or if one has good reasons to assume it does not reflect a random sample of all measurable traits, one could decide to treat trait as a fixed effect. This way of analysing the data and combining information from different traits is comparable to a meta-analysis, and has the advantage that it allows testing a broad range of hypotheses.
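
The composite index mentioned above, the sum of mean-standardized unsigned FA values across traits, can be sketched as follows. The data layout (one row per individual, one column per trait), the function name and the toy values are assumptions of this example, and the mixed model alternative is not shown.

```python
import statistics

def composite_fa(unsigned_fa):
    """Sum of mean-standardized unsigned FA across traits.

    unsigned_fa[i][t] is the unsigned FA of individual i for trait t.
    """
    n_traits = len(unsigned_fa[0])
    # Mean unsigned FA per trait, used to put traits on a common scale.
    trait_means = [statistics.fmean([row[t] for row in unsigned_fa])
                   for t in range(n_traits)]
    return [sum(row[t] / trait_means[t] for t in range(n_traits))
            for row in unsigned_fa]

# Two individuals, two traits measured on very different scales.
example = [[1.0, 10.0],
           [3.0, 30.0]]
index = composite_fa(example)
print(index)
```

Standardizing by the trait mean prevents the trait measured on the larger scale from dominating the index; using the mean instead of the sum only rescales the result.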

The evolutionary potential/properties of FA and DI: what do we know?

In this section an overview is given of the current state of knowledge regarding the evolutionary potential of DI. It is not presented as a formal meta-analysis, but rather as a ‘review of the reviews’ with special emphasis on the genetic architecture, fitness associations and to what extent DI is a property of an individual/genome or rather is trait specific.

The genetic basis of FA and DI

The question of whether or not DI is heritable and the study of its genetic architecture are of fundamental importance when implementing DI in addressing evolutionary questions (Leamy & Klingenberg, 2005). Insights into the genetic architecture of DI would allow evaluating whether it can respond to selection or play a role in sexual selection, where the model of ‘good genes’ is usually implicitly assumed. A better understanding of the genetic background of DI is also of primary interest to the question of whether an individual asymmetry parameter (IAP) exists and to what extent patterns at one or a few traits reflect genome/individual-wide effects (see also below). The use of FA and DI as a measure of stress in more ecological or conservation oriented studies implicitly assumes that FA levels are entirely attributable to the environmental conditions (Palmer, 1994), which is not necessarily true (e.g. McKenzie, 2003).

Developmental instability has been considered to be, at least to some extent, under genetic control since the pioneering work by Mather (1953), Thoday (1958) and Reeve (1960). Because heritabilities were usually low, Møller & Thornhill (1997a) performed a meta-analysis to combine data from available studies. This analysis led to an estimated average heritability of 0.27, yet drew a lot of criticism. Subsequent analyses indicated that the average heritability of FA was likely to be lower (e.g. 0.08, Leamy, 1997; 0.043, Van Dongen, 2000; 0.026, Fuller & Houle, 2003). The value of an average heritability had been doubted before, as there are indications that heritabilities may be trait specific (Woods et al., 1998), a pattern that can be expected from the developmental model of Klingenberg & Nijhout (1999). It seems to be generally accepted now that the heritability of FA appears very small (Leamy & Klingenberg, 2005). However, since FA only roughly estimates DI, heritabilities in FA underestimate heritabilities in the underlying and unobservable process of interest, namely DI (Whitlock, 1996, 1998; Van Dongen, 1998, Boxes 1, 2 and 3). Heritabilities of DI, therefore, have been suggested to be much higher (Gangestad & Thornhill, 1999, but see below). Since the controversial 1997 meta-analysis (and perhaps in part as a result of the controversy and attention it drew), research on the genetic architecture of DI has also moved towards other areas, namely quantitative trait loci (QTL) mapping, effects of single genes and transformations of heritabilities in FA into those in DI.

A transformation of heritabilities in FA into heritabilities in DI can be carried out by dividing the heritability in FA by the so-called hypothetical repeatability R (Whitlock, 1996, 1998, Box 2). Thus, the smaller the value of R, the more severe the underestimation. Estimates of R from empirical studies, however, vary considerably. Whitlock (1996), Van Dongen (1998) and Van Dongen & Lens (2000) reported relatively high values, whereas Gangestad & Thornhill (1999) reported very low values. Obviously this has important implications for estimates of the heritability of DI. Even very small values for the heritability of FA can yield relatively high heritabilities of DI if the value of R is below 0.1, as suggested by Gangestad & Thornhill (1999). However, when variation in DI is large and values of R are above 0.4 (Van Dongen & Lens, 2000), the interpretation will hardly change. In order to investigate the importance of the low repeatability of FA and its downward bias on the estimates of the heritability of DI, Van Dongen & Lens (2000) predicted that estimates of the heritability of FA would be positively correlated with the values of R. This hypothesis, however, was not supported, in spite of the fact that such a positive association was observed between R and between-trait correlations in FA (Van Dongen & Lens, 2000; and below). This further suggests a commonly low heritability of both FA and DI. Nevertheless, it is important to reiterate at this point that the transformation of patterns in FA into patterns in DI is based on theoretical and statistical arguments only (Klingenberg, 2003) and the high values of R have been challenged for various reasons (Graham et al., 2003; Van Dongen et al., 2005). There is an urgent need for studies that shed light on the underlying processes that determine DI such that the implicit assumptions of these statistical models can be evaluated explicitly.
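The arithmetic of this transformation can be illustrated with a minimal sketch. The function name `heritability_di` is hypothetical, the FA heritability of 0.026 is one of the estimates cited above, and the two values of R are the thresholds mentioned in the text. Capping the result at 1 is an assumption added here, because a heritability cannot exceed one.

```python
def heritability_di(h2_fa: float, r: float) -> float:
    """Transform a heritability of FA into a heritability of DI by
    dividing by the hypothetical repeatability R (Whitlock, 1996, 1998)."""
    if not 0.0 < r <= 1.0:
        raise ValueError("R must lie in (0, 1]")
    # Capping at 1 is an assumption of this sketch, not part of the formula.
    return min(h2_fa / r, 1.0)


h2_fa = 0.026  # illustrative FA heritability taken from the range cited above
for r in (0.1, 0.4):
    print(f"R = {r}: h2(DI) = {heritability_di(h2_fa, r):.3f}")
```

With these inputs the heritability of DI is about 0.26 for R = 0.1 but only about 0.065 for R = 0.4, which shows how strongly the conclusion depends on the assumed value of R.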

Heritabilities are known to be difficult to estimate reliably and this is particularly the case for DI (Fuller & Houle, 2003). This is even more so for nonadditive genetic variation, which has been suggested to be more important in the genetic architecture of DI. Studies using QTL mapping and the effects of single genes on DI have been very useful in unravelling the presence of dominance and/or epistatic interactions (Leamy & Klingenberg, 2005). The best known and most extensively studied single gene system that appeared to affect levels of DI in one particular trait is that of the sheep blowfly (Lucilia cuprina). This study system clearly emphasized the importance of epistatic interactions (McKenzie, 2003). An example where one locus appeared to affect DI more generally across different traits can be found in Mitton (1993), and other examples were reviewed by Møller & Thornhill (1997b) and Leamy & Klingenberg (2005). QTL studies have also revealed the occurrence of dominance effects, but their general importance is not supported by the overall weak association between FA/DI and inbreeding (Leamy & Klingenberg, 2005). Nevertheless, it is important to realize that the FA-inbreeding association also underestimates patterns in DI and that several studies had an inadequate design to robustly estimate DI-inbreeding associations. It is also well known that many (>20) variable genetic markers are required in order to estimate genome-wide heterozygosity. Studies using a relatively low number of loci are therefore more likely to estimate effects of specific genes or linked regions. One of the strongest associations between microsatellite heterozygosity and single trait asymmetry was observed in the Taita thrush (Turdus helleri) in the Taita Hills, Kenya.
Interestingly, an interaction between heterozygosity and habitat quality was observed, where the strongest FA-heterozygosity association was observed in the area of lowest quality, whereas such an association was absent in the undisturbed control area (Lens et al., 2000). This pattern was consistent among three traits. Although the importance of local effects of areas linked to the six microsatellite markers cannot be ruled out, this study suggested the importance of environmental stress on the expression of FA-inbreeding/heterozygosity associations (see also Mitton, 1997).

At present, there seems to be a consensus that the additive genetic variation for FA and DI is generally low and there are strong indications that nonadditive effects, epistasis in particular, may play an important role in the genetic architecture. Further indirect evidence for the importance of epistatic interactions comes from hybrid studies where crosses of different species or subspecies often show outbreeding depression. A convenient – yet to date untested – explanation for this phenomenon is that selection has produced co-adapted gene complexes, i.e. adaptive epistatic interactions, which are broken down by the hybridization (Leamy & Klingenberg, 2005). However, as pointed out by Alibert & Auffray (2003), the study of the role of epistatic adaptation and genomic co-adaptation for DI is still in an exploratory phase, but the study of hybrids can help elucidate the role of the genome in determining individual DI.

DI-fitness associations and the role of FA in sexual selection

As for the genetic basis of DI, there is much work to be done to increase our understanding of the association between DI and fitness and the role it may play in sexual selection. The literature has been plagued by controversy and heterogeneous results (e.g. Møller, 1997a; Clarke, 1998; Møller & Thornhill, 1998; Møller, 1999a,b; Palmer, 1999) and the most accepted current view appears to be that DI may in some cases be related to fitness and/or play a role in sexual selection, but this is not a universal pattern (Bjorksten et al., 2000; Clarke, 2003). Unfortunately, it is at present impossible to predict when such an association can be expected. Review studies have been performed but failed to lead to a broader understanding of the forces behind DI-fitness associations, in part because of publication bias and a lack of general patterns. In addition, as for the genetic basis of FA and DI, DI-fitness associations are underestimated when single trait asymmetries are used as estimates of DI. The problems associated with transforming patterns in FA into patterns in DI are discussed in the previous section and Box 2.

Many suggestions have been made with respect to the origins of the heterogeneity in the observed FA-fitness associations.

First, there are indications that FA-fitness associations may become apparent more strongly under relatively high stress levels where individuals of lower (genetic) quality are challenged and unmasked (Lens et al., 2002b; Woods et al., 2002; Hendrickx et al., 2003). This aspect is closely related to the idea that levels of FA will increase with stress before appreciable fitness consequences are observed, the so-called ‘early warning paradigm’ (Clarke, 1995). More recently W. Talloen, L. Lens, E. Matthysen and S. Van Dongen (unpublished data) showed that isolation of young great tit (Parus major) pulli from parental care for a very short period during development resulted in increased asymmetries in tarsus lengths, whereas no association with any fitness component was found at the individual level. In addition, the treatment itself did not affect fitness, indicating that very short disruptions during early development can increase FA without any clear fitness effect, which supports the early warning paradigm.

Secondly, DI-fitness associations are likely to be character specific and this could be the case for different reasons (Clarke, 2003). The development of some traits may be buffered more strongly because of its direct importance for fitness, like flying ability. One could envision that even under relatively high stress conditions, still many resources will be diverted to the stable development of those functionally important traits and that no association with stress or fitness will emerge at those traits. Unfortunately, comparisons between traits of different functional importance have only rarely been made (but see Karvonen et al., 2003). Exaggerated traits under directional selection, like ornaments, often show higher levels of DI and have been suggested to be more sensitive to the effects of stress (but see Bjorksten et al., 2000). Finally, different traits develop in part at different time points such that variation in stress over time may have different effects on the degree of asymmetry as well as on fitness.

Thirdly, trade-offs between different fitness components and life history processes may obscure patterns. Very often, studies of FA-fitness associations have to rely on a few fitness components. However, it is well established that a relatively high value for a particular fitness component need not reflect high overall fitness, as trade-offs between different life history characteristics make it difficult to estimate fitness accurately. Radwan et al. (2003), for example, found evidence that in the bulb mite Rhizoglyphus robini, sexually selected male fighting ability is traded off against DI and lifespan.

Fourthly, it has been argued that developmentally unstable individuals might disappear from the population before measurements are made. Møller (1997b) has indeed argued that, as in most organisms many more individuals are produced than actually reach adulthood, patterns could become obscured if developmentally unstable individuals die early. This developmental selection hypothesis was supported in an experiment by Polak et al. (2002) studying the effects of arsenic on FA and fitness in Drosophila melanogaster. If this hypothesis were to hold more generally, it might explain why in some cases, under very extreme stress, no differences in FA are found (e.g. Van Dongen et al., 2001; Stige et al., 2004), yet its generality remains unknown.

Fifthly, one could wonder why a negative association between DI and fitness should exist ubiquitously anyway. Increases in levels of FA and DI are not stress-specific, as in most natural populations individuals are subjected to various forms and strengths of stress at different stages in their life and development. As not each of these stresses should have an equal and uniform impact on both fitness and FA/DI, many patterns may become obscured. Such interactions have only rarely been investigated. In a recent study, Polak et al. (2004) showed an interaction between the effects of lead contamination and temperature as stress factors on the levels of FA. Surprisingly, increased lead levels apparently attenuated the effects of suboptimal temperature. This result emphasizes that different forms of stress may affect DI in an unpredictable way and should not necessarily act in an additive fashion. This obviously complicates the interpretation of FA-stress and FA-fitness associations in field situations where different forms of stress usually co-occur.

Asymmetries in sexual ornaments have taken up a central place in the study of FA and its role in evolutionary biology. This, no doubt, is in part the result of the fact that sexual ornaments have been studied intensively in evolutionary biology in general as they are among the fastest evolving traits of animals. Sexual selection can be expected to favour larger and/or more exaggerated signals, but this evolution is likely to be slowed or halted by physiological costs or reduced survival. There is growing evidence that sexual traits are very plastic and sensitive to environmental stress, which may reduce their expression. The observation that the degree of asymmetry appears to decrease with size in sexual ornaments has led to the suggestion that FA in sexually selected traits may honestly signal individual quality. This hypothesis predicts that there exists a negative correlation between FA and size which is condition dependent. When only high quality individuals can bear the cost of producing large and symmetrical ornaments, inter-male differences in genetic quality are reflected by their degree of asymmetry. However, this hypothesis assumes that all individuals experienced the same levels of stress, which is unlikely to be the case under natural conditions. Polak & Starmer (2005) indeed found indications in a secondary sexual character in Drosophila bipectinata that a negative size-FA association was driven by environmental heterogeneity rather than genetic heterogeneity. Thus, FA in sexual ornaments might equally well reflect the environmental stress an individual experienced earlier in life instead of genetic quality (Polak & Starmer, 2005). Under both scenarios it is predicted that sexually selected traits should be more sensitive to environmental stress compared to others, a prediction that has never been tested thoroughly.
Although the role of FA in sexual selection is mostly investigated correlatively, relatively few studies have considered the proximate mechanisms of signalling and assessment and the role of sensory structures and the central decision-making process (Uetz & Taylor, 2003). Furthermore, the importance of the ‘good genes’ hypothesis remains largely unresolved, yet the low heritability generally observed appears to rule it out as an important evolutionary force.

In conclusion, and as Clarke (2003) pointed out recently, despite the high number of studies testing an association between DI and fitness, few were scientifically and statistically robust enough. Ideally, DI-fitness associations should be tested at the individual level, in different populations and under different environmental conditions. Given a suspected trait-specificity, asymmetry in a sufficient number of traits should be measured. Patterns in traits that are and are not under the influence of sexual selection should be compared. Correlative phenotypic associations between FA and trait size or fitness components should be interpreted with caution (Polak & Starmer, 2005). When carried out in several populations, such studies have the potential to include other components such as levels of heterozygosity and environmental stress factors. Although the observed heterogeneity in the literature might be frustrating, it should urge us to try to unravel the basis of this heterogeneity and elucidate the underlying mechanisms.

Is developmental instability a property of the individual?

There are reasons to suspect that heritabilities might be trait specific, that some traits may be more vulnerable to stress than others or that their asymmetry relates to fitness in different ways (see previous sections). Crucial in these discussions is to what extent the asymmetry of any single trait reflects DI of the individual (i.e. a genome wide property) or whether it reflects more localized effects. In other words, if DI is a property of the individual, relatively strong correlations in the unsigned FA among different traits at the individual level are expected. This would be indicative of a so-called IAP. As for the other aspects reviewed so far, the observations have been contradictory and equivocal (Polak et al., 2003). And, in agreement with estimates of heritabilities of DI and DI-fitness associations, between trait correlations in FA underestimate patterns in DI (Box 2). Only if there is a lot of variation in the underlying DI, relatively high correlations should emerge. Van Dongen & Lens (2000), therefore, examined the association between the amount of variation in DI in a population (i.e. the hypothetical repeatability R, Box 2) and the strength of between trait correlations in FA. As predicted, and contrary to the association for heritabilities (see above), this analysis showed a positive association, supporting the presence of an IAP. In a recent review, Polak et al. (2003) emphasized the importance of localized buffering mechanisms in developmentally integrated traits. In addition, they found evidence that a similarity in function (locomotory or involved in sexual selection) increased between-trait correlations. For example, two traits involved in locomotion, or in sexual selection correlated more strongly compared with an association between a locomotory and a sexually selected trait. 
These results suggest a ‘dual buffering system’, where trade-offs exist in resource allocation between organism-wide and trait-specific processes (Polak et al., 2003). These results caution against the more general conclusions formulated earlier (Van Dongen & Lens, 2000). Nevertheless, in general associations are very weak and it remains largely unclear to what extent DI can be regarded as an organism-wide property or if patterns are trait-specific. Resolving this issue obviously requires large sample experiments where several traits are studied.
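The expectation that between-trait correlations in unsigned FA remain weak unless variation in DI is large can be illustrated with a small simulation. The sketch below assumes, purely for illustration, that two traits share exactly the same individual DI (a perfect IAP), that DI follows a gamma distribution with mean 0.5, and that the signed asymmetry of each trait is drawn independently from a normal distribution with variance equal to DI, as in Box 1. The function names, the gamma distribution, the CV values and the seed are assumptions of this sketch and are not taken from any of the cited studies.

```python
import math
import random


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def between_trait_r(n, mean_di, cv_di, rng):
    """Correlation in unsigned FA between two traits that share the same DI."""
    shape = 1.0 / cv_di ** 2      # gamma shape giving the requested CV
    scale = mean_di * cv_di ** 2  # gamma scale giving the requested mean
    di = [rng.gammavariate(shape, scale) for _ in range(n)]
    # Each trait gets an independent draw with variance equal to the shared DI.
    fa1 = [abs(rng.gauss(0.0, math.sqrt(v))) for v in di]
    fa2 = [abs(rng.gauss(0.0, math.sqrt(v))) for v in di]
    return pearson_r(fa1, fa2)


rng = random.Random(1)  # arbitrary seed, for reproducibility only
for cv in (0.5, 1.0, 1.5):
    r = between_trait_r(1000, 0.5, cv, rng)
    print(f"CV(DI) = {cv:.1f}: r(|FA1|, |FA2|) = {r:.2f}")
```

Even though both traits have identical DI in this sketch, the correlation in unsigned FA stays well below one and becomes weaker as the variation in DI decreases.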

Crucial to our understanding of the extent to which FA and DI of one or a few traits contain information about an organism's genome-wide DI are more insights into the physiological and biochemical processes that control or determine the stability of developing traits. Yet very little is known about the mechanisms that govern DI. In addition, it is important to understand how different parts of a developing organism, either different sides or different parts of a complex trait within a side, interact during development. It has been suggested that during development, asymmetries can be compensated for through some interaction between the two developing sides (i.e. the compensatory growth hypothesis). Alternatively, both sides might be considered as independent units that do not interact (which is implicitly assumed in current statistical models of DI, Boxes 1 and 2). Again, observations in the literature have been equivocal. Chippendale & Palmer (1993) followed the development of limbs in a Brachyuran crab and concluded that asymmetries largely persisted through development. In a later study, Swaddle & Witter (1997) found indications of the presence of compensatory growth in the tails of starlings, a pattern that was criticized by Aparicio (1998). The occurrence of compensatory growth was further supported in both insects (Tomkins, 1999; Servia et al., 2002) and domestic fowl (Kellner & Alford, 2003). In a recent study by W. Talloen, L. Lens, E. Matthysen and S. Van Dongen (unpublished data) no evidence of compensatory growth was detected in tail feathers of great tits that were artificially regrown asynchronously. However, the same authors noted that artificially induced regrowth may not reflect normal development but rather damage repair, and may not be representative. Thus, there are at least indications that the left and right side of developing traits interact to ensure stable development.
Whether stress can affect the degree of compensatory growth is, however, only rarely examined. To our knowledge, there is only one study that has found indications that the degree of correlation between asymmetry at a certain time point during development and asymmetry in growth in the following period may decrease under stress (Møller & Van Dongen, 2003). The importance of compensatory mechanisms and their role in DI apparently remain underinvestigated.

Although the compensatory growth hypothesis refers to interactions between sides, different parts of a developing trait may also share developmental events. FA has recently emerged as a research tool to study the degree of developmental integration of different traits. This application was introduced by Klingenberg and co-workers in a series of papers and applied to the mandibles of rodents and insect wings (e.g. Klingenberg & McIntyre, 1998; Klingenberg et al., 2002). Correlations in the signed asymmetries between different parts of a developing trait are indicative of developmental integration as they may share developmental errors in processes that occurred earlier in development. It is important to note, however, that patterns in correlations in the signed FA may also originate from common environment effects that differ between sides. Nevertheless, recent work has indicated that the study of patterns in correlations in the signed asymmetry of different parts of a trait may prove an important tool in understanding the evolution of complex traits and the effects of stress on developmental integration (Badyaev & Foresman, 2005). Recent developments have suggested that developmental integration may aid the stable development of complex traits and that functionally integrated parts may be more buffered against stress. A more general application of these approaches is likely to provide more insights in how developmental integration will affect the response of trait development to environmental stress and how individuals differ in their response.

Important caveats and future directions

Perhaps most importantly, very little is known about the underlying mechanisms that determine DI of an individual or particular trait. Consequently, the link between the observed asymmetries and the underlying mechanisms of DI is based on theoretical and statistical arguments only. Furthermore, there are many sources of variation in asymmetry that do not reflect DI and thus may obscure patterns. A prerequisite for understanding the role of DI in evolutionary biology, or any other area of research, is that it can be estimated reliably. Given the numerous potentially confounding factors reviewed in the section ‘The evolutionary potential/properties of FA and DI: what do we know?’, the fact that asymmetries may serve as a simple estimate of the unobservable DI has perhaps been taken for granted too often. Even in cases where DI is the sole cause of asymmetric development, it is unclear how different levels of DI map onto the observable FA. The link between the observed asymmetry and the presumed underlying process (DI) commonly used in statistical models still remains hypothetical. As a result, statistical analyses have to rely on rather arbitrary choices of distributions. Incorrect choices, though, can lead to misleading inferences (Van Dongen, 2006). In addition, knowledge of the precise nature of the FA-DI link is required to be able to estimate the amount of between-individual variation in DI and how closely single trait FA reflects DI (Boxes 2 and 3). In a recent study, S. Van Dongen and A.P. Møller (unpublished data) found indications that the normal distribution fitted the observed asymmetries in barn swallow (Hirundo rustica) tail feathers and petal radial asymmetry in flowers best. Clearly, the generality of this result requires further testing in order to inform the discussions about the amount of variation in DI present in laboratory and natural populations.

A second important caveat is the lack of studies that investigate how interactions between different types of stress affect levels of DI. In addition, few studies have investigated the genetic basis of DI under stress. In order to be of evolutionary importance in itself, DI should have a genetic basis and be linked with fitness. Furthermore, it is crucial to understand to what extent levels of asymmetry of one or a few traits reveal information on the (genetic) quality of the individual and/or the environmental conditions in which it developed. Results in each of these areas are controversial, although the common consensus for the genetic basis of DI seems to be that the amount of additive genetic variation is limited and that nonadditive variation, epistatic interactions in particular, would be more important (Leamy & Klingenberg, 2005). Nevertheless, the apparent near absence of additive genetic variation for DI seems surprising given the fact that stress tolerance and parasite resistance appear to be heritable in some cases and that stress and parasitism often increase FA and DI at the population level. However, so far few studies have estimated the heritability of FA and DI under stressful conditions. If we assume that DI increases with stress, the highest heritabilities of DI can be expected to occur under such stress conditions and when stress tolerance has a heritable basis. In addition, stress has often been shown to increase heritabilities in other traits (Badyaev, 2005). Woods et al. (1998) studied heritabilities of FA under control and stress situations but found no markedly higher heritabilities under stress. However, there were no clear increases in FA under stress, nor was the heritability in stress resistance studied. Because FA has often been observed to increase with stress, and the effects of different forms of stress need not be additive (Polak et al., 2004), the interacting effects of different forms of stress seem to deserve more attention as well.
In addition, inbreeding may affect levels of stress tolerance (e.g. Dahlgaard & Hoffmann, 2000). One study that has examined these stress-by-stress and genotype-by-stress interactions on levels of DI did not reveal different responses of the different genotypes to stress. However, the interpretation was complicated by the fact that the effects of temperature and lead appeared to interact in a complex way (Polak et al., 2004). In another study, Polak & Starmer (2005) only investigated one type of stress. Clearly there appears to be a general lack of studies that compare patterns among traits and across different environments and under different types of stress. This argument applies for all aspects of FA studies, including associations between FA and fitness, between FA and heterozygosity and inbreeding and between-trait associations in FA. Indeed, there are indications that stress may affect FA-fitness and FA-heterozygosity associations (Lens et al., 2000, 2002b; Hendrickx et al., 2003).

It has often been argued that many studies of FA are underpowered and have sample sizes that are too low. This is especially true when analysing patterns at the individual level as single trait FA only weakly reflects DI (Boxes 1 and 2). However, combining information from different traits could complicate the interpretation of patterns when trait-specificity occurs. It may therefore be advisable to include traits that more strongly reflect DI compared with metric linear measurements. This can be expected to be the case for shape asymmetry and positional FA (PFA, e.g. Polak & Starmer, 2005). Shape analyses usually rely on landmark data and therefore combine information from different parts of a trait, but have to my knowledge not been used to obtain heritabilities nor have they been frequently used to examine the effects of stress. PFA has been shown to exhibit a relatively high heritability compared with estimates for other traits (i.e. broad sense heritability of 0.1 in Polak & Stillabower, 2004) and to be important in sexual selection (Polak & Starmer, 2005). Future studies should thus also focus on comparing patterns among traits and include some types of traits that more reliably reflect DI. Perhaps surprisingly, Lens et al. (2002a) found indications that meristic traits may be more sensitive to the effects of environmental stress.

On the one hand there is a clear need for a direct evaluation of the evolutionary potential of DI under stress and to establish trait-specificity of patterns. On the other hand the explanation of the heterogeneity in patterns across traits, populations and species will require more thorough insights into the underlying mechanisms of DI itself. At present, it largely remains a black box (Klingenberg, 2003 and above). Nevertheless, some progress has been made in this area, and a good candidate protein which may play an important direct role in organism-wide DI is Hsp90, a capacitor of morphological variation (Rutherford & Lindquist, 1998). However, results have been equivocal at best. Although several studies have demonstrated a role of Hsp in canalization, few have studied its importance for DI. In Arabidopsis DI increases when Hsp90 is impaired (Queitsch et al., 2002), whereas Milton et al. (2003) did not find a link between Hsp90 and DI in Drosophila. Further studies should look for more candidate genes and elucidate their role in determining DI (at the level of the whole organism or specific traits) and stress tolerance using QTL mapping and microarray analyses. As for the heritability estimates, these studies should be performed under a variety of stress conditions. This should lead to better insights into the role of developmental homeostasis in masking genetic variation emerging from mutations and when this ‘silent’ variability may become expressed and subjected to natural selection.

Concluding remarks

There are many reasons not to study FA. The estimation of DI by means of FA is difficult and cumbersome (section Developmental noise, (in)stability and asymmetry). Not only is FA a poor estimate of individual DI, measurements must also be done carefully and repeatedly in many different traits. Even after careful measurement, it is impossible to guarantee that the observed asymmetries are not confounded by other factors. In addition, statistical analyses are relatively complex. Generally, FA is not a ubiquitous measure of individual fitness, and there are no shortcuts with respect to fitness determination. In addition, there seems to be a general agreement that there is little additive genetic variation (section The evolutionary potential/properties of FA and DI: what do we know?). To make things even worse, meta-analyses are difficult to interpret because of publication bias (e.g. Palmer, 1999). Nevertheless, there exist carefully controlled studies that do show the existence of FA-fitness associations as well as the importance of the genome, albeit usually in a nonadditive way. Furthermore, there is no other morphological or physiological trait that reflects fitness reliably. At this point, FA should be viewed as one of the many potential fitness markers for which it is not yet possible to predict when they can be applied reliably. It is crucial to get a better understanding of the underlying sources that cause the observed heterogeneity to be able to evaluate the role of DI in evolutionary biology. Besides the increase in the overall quality of FA research and the more general understanding of the many pitfalls, the most important breakthrough has been the use of asymmetry as a tool to study developmental integration, which can provide important insights into the evolution of complex traits. Furthermore, the use of latent variable models might contribute to a better understanding of the relationship between the unobservable DI and other covariates.
Keeping in mind that there are still several caveats in our current understanding of patterns in FA and DI, further carefully controlled studies are clearly needed. These studies should attempt to carefully measure different traits and inspect their suitability as measure of DI. In spite of half a century of research, our understanding has progressed relatively little and future studies should focus on specific hypotheses that could explain the observed heterogeneity in the literature and allow predicting if and when FA and DI could play a role in evolutionary biology. Most importantly, associations between FA/DI and fitness and inbreeding, as well as its genetic architecture should be investigated under different types of stress. With this review I hope to have stimulated renewed interest in this challenging area of research by pinpointing particular pitfalls when using FA as a measure of DI and important caveats in our current knowledge.

Acknowledgments

I am very grateful to Luc Lens for stimulating discussions and comments on earlier drafts and to two anonymous reviewers for their many suggestions, which improved this paper substantially.

Box 1: FA poorly estimates DI and this causes biases

Whitlock (1996) drew major attention to the fact that single-trait asymmetry only weakly reflects the underlying instability of development. The reason is that one attempts to estimate a variance (i.e. DI) from a single data point (i.e. the observed asymmetry), which involves a high degree of sampling variation. This has important consequences for the estimation of associations between DI and other covariates, as well as for estimates of heritabilities of DI and between-trait correlations in DI. When single-trait FA is used as an estimate of individual DI, these estimates are biased downward.

To illustrate this, we simulated three datasets of 1000 individuals, each with the same average degree of developmental instability (= 0.5) but with different levels of between-individual variation [high (CV = 150%), intermediate (CV = 100%) and no variation (CV = 0%)]. For each individual in each dataset, an asymmetry value was sampled from a normal distribution with mean zero and variance equal to the individual-specific level of DI. The distributions of DI and the signed asymmetry values are given in Fig. 2. With increasing variation in DI, the distribution of the signed asymmetry becomes more peaked (i.e. kurtosis > 0) (see also e.g. Gangestad & Thornhill, 1999; Fig. 2). When individuals do not differ in their degree of underlying DI, there is nevertheless substantial variation in the observable asymmetry (Fig. 2 top). It can be shown analytically that the coefficient of variation of the unsigned FA (the absolute value of the signed FA) equals 76% when there is no variation in the unobservable DI, since the unsigned FA then follows a half-normal distribution with CV = √(π/2 − 1). All this variation is due to the above-mentioned sampling error, i.e. the fact that we estimate a variance (DI) from a single data point (unsigned FA) (Whitlock, 1996). Even in the presence of very high variation in DI (Fig. 2 bottom), the association between the unobservable individual levels of DI and the observable individual levels of unsigned FA is relatively weak (r² = 36%).
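The simulation described above can be sketched in a few lines of Python. This is a minimal illustration and not the script used for Fig. 2: the gamma distribution for individual DI, the sample size and the random seed are assumptions made here, and the helper names are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def coefficient_of_variation(x):
    """Sample standard deviation divided by the sample mean."""
    n = len(x)
    m = sum(x) / n
    var = sum((a - m) ** 2 for a in x) / (n - 1)
    return math.sqrt(var) / m


def simulate(n, mean_di, cv_di, rng):
    """Return individual DI (a variance) and the unsigned FA drawn from it."""
    if cv_di == 0:
        di = [mean_di] * n
    else:
        shape = 1.0 / cv_di ** 2   # assumption: gamma DI, CV = 1 / sqrt(shape)
        scale = mean_di / shape    # gamma mean = shape * scale
        di = [rng.gammavariate(shape, scale) for _ in range(n)]
    # One asymmetry value per individual: normal, mean 0, variance DI_i.
    fa = [abs(rng.gauss(0.0, math.sqrt(v))) for v in di]
    return di, fa


rng = random.Random(1)

# No between-individual variation in DI: all variation in FA is sampling error.
_, fa_none = simulate(100_000, 0.5, 0.0, rng)
cv_no_variation = coefficient_of_variation(fa_none)

# Very high variation in DI (CV = 150%).
di_high, fa_high = simulate(100_000, 0.5, 1.5, rng)
r2_high = pearson(di_high, fa_high) ** 2

print("CV of unsigned FA without variation in DI:", round(cv_no_variation, 3))
print("Analytical value sqrt(pi/2 - 1):", round(math.sqrt(math.pi / 2 - 1), 3))
print("r^2 between DI and unsigned FA at CV = 150%:", round(r2_high, 3))
```

A larger sample than the 1000 individuals of the text is used only to make the printed values stable across seeds; the exact r² depends on the distribution assumed for DI.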

Figure 2. Overview of simulated data. Columns 1 and 2 show simulated distributions of developmental instability (DI) and the signed asymmetry for three different levels of variation in DI (top: CV = 150%, middle: CV = 100%, bottom: CV = 0%). Associations between DI and FA (column 3) and between DI and fitness (column 4) are not observable in empirical datasets but only in simulations like this one. The observable associations between FA and fitness are given in the right-most column.

Now, if we assume there is an association between the underlying individual levels of DI and some fitness component according to the following linear model:

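[Equation not recovered from the source: linear model relating fitness to individual DI]

where

[Equation not recovered from the source]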

we can explore the observable associations between unsigned FA and fitness when the amount of between-individual variation in DI is high, intermediate or completely absent. The correlation between FA and fitness (and thus statistical power) is weaker when variation in DI is smaller, and in every case it clearly underestimates the true correlation between DI and fitness (Fig. 2). Methods to correct for this bias are reviewed in Box 2 and illustrated in Box 3.
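The attenuation can be reproduced with a small simulation. The sketch below assumes a linear model of the form fitness = slope × DI + normal noise; the slope, the noise standard deviation, the gamma distribution for DI and the function names are hypothetical choices made for illustration and are not the values of the original model.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def fitness_correlations(n, mean_di, cv_di, slope, noise_sd, rng):
    """Return (r between DI and fitness, r between unsigned FA and fitness)."""
    shape = 1.0 / cv_di ** 2   # assumption: gamma-distributed DI
    scale = mean_di / shape
    di = [rng.gammavariate(shape, scale) for _ in range(n)]
    fa = [abs(rng.gauss(0.0, math.sqrt(v))) for v in di]
    # Hypothetical linear model: fitness depends on DI, never directly on FA.
    fitness = [slope * v + rng.gauss(0.0, noise_sd) for v in di]
    return pearson(di, fitness), pearson(fa, fitness)


rng = random.Random(2)
results = {}
for cv in (1.5, 1.0, 0.5):
    results[cv] = fitness_correlations(50_000, 0.5, cv, -1.0, 0.5, rng)
    r_di, r_fa = results[cv]
    print(f"CV_DI = {cv}: r(DI, fitness) = {r_di:.2f}, r(FA, fitness) = {r_fa:.2f}")
```

Because fitness depends on FA only through DI, the observable FA-fitness correlation is the DI-fitness correlation multiplied by the (weak) FA-DI correlation, which is why it shrinks further as variation in DI decreases.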

Box 2: How to correct for the downward bias?

As illustrated with simulated datasets in Box 1, patterns in single-trait FA underestimate those in DI. It is possible to correct for this downward bias, and an important parameter in this context is the so-called hypothetical repeatability R, as developed by Whitlock (1996, 1998). This parameter estimates the proportion of the total variation in the unsigned asymmetry (i.e. the absolute value of the signed asymmetry) that results from between-individual heterogeneity in DI (Whitlock, 1996). It can be seen as a repeatability because it estimates how repeatable the unsigned asymmetry would be if a trait could develop more than once under exactly the same set of conditions. Since that is only possible in theory, it is often called a hypothetical repeatability. Traditionally, estimation is based on distributional characteristics of the signed and/or unsigned asymmetry (Van Dongen, 1998; Whitlock, 1998; Gangestad & Thornhill, 1999). The most commonly used formula was developed by Whitlock (1998) and uses the coefficient of variation of the unsigned asymmetry (CV_FA):

$$R = \frac{2}{\pi} - \frac{\pi - 2}{\pi}\,\frac{1}{CV_{FA}^{2}}$$
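As a minimal sketch, the estimator can be computed directly from a sample of unsigned asymmetries. The code assumes the form R = 2/π − ((π − 2)/π)/CV_FA², which is consistent with the maximum of 2/π and with the CV of 76% at which R reaches zero; the function name and the simulated inputs are hypothetical.

```python
import math
import random


def hypothetical_repeatability(unsigned_fa):
    """Estimate R from the squared coefficient of variation of unsigned FA."""
    n = len(unsigned_fa)
    mean = sum(unsigned_fa) / n
    var = sum((x - mean) ** 2 for x in unsigned_fa) / (n - 1)
    cv_squared = var / mean ** 2
    return 2.0 / math.pi - (math.pi - 2.0) / (math.pi * cv_squared)


rng = random.Random(3)
n = 100_000

# Every individual has the same DI (variance 0.5): R is expected near zero.
fa_constant = [abs(rng.gauss(0.0, math.sqrt(0.5))) for _ in range(n)]

# DI varies between individuals (assumption: exponential, mean 0.5, CV = 100%).
fa_variable = [abs(rng.gauss(0.0, math.sqrt(rng.expovariate(2.0))))
               for _ in range(n)]

r_constant = hypothetical_repeatability(fa_constant)
r_variable = hypothetical_repeatability(fa_variable)
print("R without variation in DI:", round(r_constant, 3))
print("R with CV_DI = 100%:", round(r_variable, 3))
```

Note that the estimate can be slightly negative in finite samples when variation in DI is small, because the sample CV of the unsigned FA may fall below 76%.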

Other estimators have been proposed, of which those of Van Dongen (1998) and Gangestad & Thornhill (1999) give very comparable results. The hypothetical repeatability can be easily calculated from the data and can then be applied to transform heritabilities, between-trait correlations and correlations between FA and other covariates into patterns in DI as follows:

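$$h^{2}_{DI} = \frac{h^{2}_{FA}}{R}$$

$$r_{DI_1,DI_2} = \frac{r_{FA_1,FA_2}}{\sqrt{R_1 R_2}}$$

$$r_{DI,x} = \frac{r_{FA,x}}{\sqrt{R}}$$

where R_1 and R_2 are the hypothetical repeatabilities of the two traits and x is any covariate.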

Because any repeatability is by definition smaller than 1, and in the case of FA its maximal value is 2/π = 0.637, patterns in FA always underestimate patterns in DI. This transformation will not increase statistical power, however, because the upper and lower limits of the 95% confidence intervals need to be transformed as well. Moreover, because R is itself estimated with some degree of error, these transformed 95% confidence intervals do not take the uncertainty in the estimate of R into account and are likely to be too narrow. At present there is no procedure available to estimate the correct 95% confidence intervals, but one could easily be implemented with a bootstrap (see Box 3).
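A short worked example may help here. It assumes the covariate transformation r_DI = r_FA/√R and uses the estimates reported in Table 1 for Simulation 1 with a true correlation of −0.25 (r_FA = −0.14 with 95% CI −0.26 to −0.02, R = 0.30); the function name is hypothetical.

```python
import math


def fa_to_di_correlation(r_fa, repeatability):
    """Transform an FA-covariate correlation into a DI-covariate correlation."""
    if repeatability <= 0:
        raise ValueError("hypothetical repeatability must be positive")
    return r_fa / math.sqrt(repeatability)


r_fa, ci_fa, r_hat = -0.14, (-0.26, -0.02), 0.30

r_di = fa_to_di_correlation(r_fa, r_hat)
# Naive interval: both limits are rescaled with the same fixed R,
# so the uncertainty in R itself is ignored.
naive_ci = tuple(fa_to_di_correlation(limit, r_hat) for limit in ci_fa)

print("r_DI:", round(r_di, 2))
print("naive 95% CI:", tuple(round(limit, 2) for limit in naive_ci))
```

The point estimate rounds to −0.26, as in Table 1. The naive interval is roughly −0.47 to −0.04, marginally narrower than the bootstrap interval of −0.49 to −0.05 reported there, which is in line with the caveat that ignoring the uncertainty in R yields intervals that are too narrow.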

A different approach to estimating patterns in DI is through a latent variable model, in which the link between the observable FA and the unobservable DI is modelled statistically. In a series of papers, Van Dongen has introduced and explored the use of Bayesian techniques to estimate the association between DI and a covariate using such a latent variable model. Formulating the problem in this way appears to have several advantages. First, any complex model can be handled in this way. Secondly, heterogeneity in ME can be incorporated (Van Dongen et al., 2003). Thirdly, although the derivation of R is based on the assumption that developmental errors are additive, leading to a normal distribution, the Bayesian models can incorporate alternative distributions (Van Dongen et al., 2005). Fourthly, the approach can be easily extended to radially symmetric traits, making it possible to compare different distributional alternatives statistically (S. Van Dongen and A.P. Møller, unpublished data). And finally, there are indications that the Bayesian method is more powerful (Van Dongen, 2006). In general, Bayesian methods have the additional advantages that they are very flexible, can incorporate prior knowledge, take all sources of uncertainty into account and are very well suited to making predictions. Major drawbacks are that Bayesian methods seem more complex (while they are not), that setting up priors requires a subjective judgement (Van Dongen, 2006), and that the methods are not readily available in most standard software packages and therefore require a more thorough understanding of the statistical model. It is beyond the scope of this paper to treat the Bayesian techniques in detail; instead, we apply them to simulated datasets in Box 3 and refer to the relevant literature.
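One way to write down a latent variable model of this kind is sketched below. It is an illustration under stated assumptions and not the specification used in the cited papers: the symbols k, θ, α, β and σ are introduced here, measurement error is ignored, and the gamma distribution for individual DI follows the description given in Box 3.

$$
\begin{aligned}
FA_i \mid DI_i &\sim \mathcal{N}(0,\ DI_i) \\
DI_i &\sim \mathrm{Gamma}(k,\ \theta), \qquad CV_{DI} = 1/\sqrt{k} \\
y_i \mid DI_i &\sim \mathcal{N}(\alpha + \beta\, DI_i,\ \sigma^{2})
\end{aligned}
$$

Here FA_i is the signed asymmetry of individual i, DI_i its unobservable developmental instability (a variance) and y_i the covariate of interest, for example fitness. In a Bayesian analysis, priors would be placed on k, θ, α, β and σ, and the association between DI and the covariate is read from the posterior distribution, so that the uncertainty about each DI_i is carried through to the final estimate.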

Box 3: Transforming an FA-fitness into a DI-fitness association

To illustrate the two methods for transforming patterns in FA into patterns in DI reviewed in Box 2, we simulated four datasets of 250 individuals each. The coefficient of variation in DI was set to 1, ME equalled 0.1, and two independent measurements on each side were simulated. Each dataset was simulated with one of four slopes for the association between fitness and DI (0, −1, −2.5 and −5). This led to datasets with no (true correlation = 0), a weak (true correlation = −0.25), an intermediate (true correlation = −0.5) or a strong (true correlation = −0.8) association between fitness and DI. In each dataset, the Pearson correlation between the unsigned asymmetry and fitness was estimated, together with its 95% confidence interval. We then estimated the hypothetical repeatability and used it to transform the FA-fitness association into a DI-fitness association (see Box 2); 95% confidence intervals were obtained by bootstrapping individuals 10 000 times. The same datasets were also analysed in a Bayesian framework following Van Dongen (2001). This methodology does not estimate the hypothetical repeatability, but uses individual DI as a latent variable (see also Box 2). The between-individual variation in DI is modelled by a gamma distribution, of which the coefficient of variation is reported here (see Van Dongen et al., 2005 for more details). This small simulation was run three times. An R script that was used for these analyses will be made available electronically at http://www.ua.ac.be/stefan.vandongen/.

Both methods appear to transform patterns in FA into patterns in DI adequately, in the sense that they yield estimates close to the expected correlations. In addition, the 95% confidence intervals are moderately to much narrower with the Bayesian methodology (Table 1). Furthermore, the use of R in combination with the bootstrap failed when some of the resamples yielded negative values for the hypothetical repeatability. This problem will occur more often as between-individual variation in DI gets smaller. Obviously, these techniques require further testing to establish when they can be used reliably to make inferences at the level of the unobservable DI. Nevertheless, the results suggest that the Bayesian methodology might have some power advantage over the use of the hypothetical repeatability (see also Van Dongen, 2006).
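The bootstrap for the R-based transformation, including the failure mode just described, can be sketched as follows. This is not the R script referred to above: the simulated data ignore measurement error, DI is assumed to be exponentially distributed (a gamma distribution with CV = 1), the slope and noise level are hypothetical, fewer resamples are drawn than the 10 000 of the text, and resamples with a non-positive R are counted and dropped, which is only one of several possible ways to handle them.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def repeatability(unsigned_fa):
    """Hypothetical repeatability from the squared CV of unsigned FA."""
    n = len(unsigned_fa)
    mean = sum(unsigned_fa) / n
    var = sum((v - mean) ** 2 for v in unsigned_fa) / (n - 1)
    return 2.0 / math.pi - (math.pi - 2.0) / math.pi * mean ** 2 / var


def bootstrap_r_di(fa, fitness, n_boot, rng):
    """Percentile interval for r_FA / sqrt(R), resampling individuals."""
    n = len(fa)
    estimates, failures = [], 0
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        fa_b = [fa[i] for i in idx]
        fit_b = [fitness[i] for i in idx]
        r_hat = repeatability(fa_b)
        if r_hat <= 0.0:
            failures += 1   # the square root cannot be taken
            continue
        estimates.append(pearson(fa_b, fit_b) / math.sqrt(r_hat))
    if not estimates:
        return None, failures
    estimates.sort()
    m = len(estimates)
    interval = (estimates[int(0.025 * m)], estimates[min(m - 1, int(0.975 * m))])
    return interval, failures


rng = random.Random(4)
n = 250
di = [rng.expovariate(2.0) for _ in range(n)]            # mean 0.5, CV = 1
fa = [abs(rng.gauss(0.0, math.sqrt(v))) for v in di]     # unsigned asymmetry
fitness = [-1.0 * v + rng.gauss(0.0, math.sqrt(0.75)) for v in di]

r_fa = pearson(fa, fitness)
r_hat = repeatability(fa)
r_di = r_fa / math.sqrt(r_hat) if r_hat > 0 else float("nan")
interval, failures = bootstrap_r_di(fa, fitness, 1000, rng)

print("r(FA, fitness):", round(r_fa, 2), " R:", round(r_hat, 2))
print("transformed r(DI, fitness):", round(r_di, 2))
print("bootstrap 95% interval:", interval, " failed resamples:", failures)
```

Dropping failed resamples keeps the procedure running but may bias the interval, so the number of failures should be reported alongside the result, as is done with the asterisks in Table 1.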

Table 1. Estimates of Pearson correlation coefficients, with 95% confidence intervals in parentheses, for the associations of FA and DI with fitness. FA-fitness associations were transformed into DI-fitness associations using either the hypothetical repeatability (R) or a latent variable model in which variation in individual DI is estimated explicitly. The estimated amount of variation in DI (CV_DI) is reported as well. Three separate runs of these simulations were performed.

| Degree of association | r_true = 0 | r_true = −0.25 | r_true = −0.5 | r_true = −0.8 |
|---|---|---|---|---|
| Simulation 1 | | | | |
| r_fitness-FA | 0.01 (−0.11, 0.14) | −0.14 (−0.26, −0.02) | −0.28 (−0.39, −0.17) | −0.38 (−0.48, −0.26) |
| R | 0.22 (0.13, 0.28) | 0.30 (0.18, 0.36) | 0.27 (0.18, 0.33) | 0.30 (0.17, 0.38) |
| r_fitness-DI (via R) | 0.03 (−0.28, 0.30) | −0.26 (−0.49, −0.05) | −0.55 (−0.82, −0.33) | −0.69 (−1.01, −0.50) |
| CV_DI | 0.85 (0.53, 1.14) | 1.03 (0.76, 1.29) | 1.05 (0.78, 1.31) | 1.00 (0.76, 1.25) |
| r_fitness-DI (latent variable) | 0.02 (−0.28, 0.29) | −0.25 (−0.47, −0.03) | −0.54 (−0.71, −0.36) | −0.77 (−0.90, −0.63) |
| Simulation 2 | | | | |
| r_fitness-FA | 0.08 (−0.04, 0.20) | −0.17 (−0.29, −0.05) | −0.34 (−0.44, −0.22) | −0.37 (−0.47, −0.25) |
| R | 0.30 (0.20, 0.36) | 0.28 (0.13, 0.39) | 0.24 (0.14, 0.31) | 0.27 (0.18, 0.34) |
| r_fitness-DI (via R) | 0.15 (−0.07, 0.41) | −0.33 (−0.61, −0.13) | −0.69 (−0.99, −0.44) | −0.70 (−0.96, −0.48) |
| CV_DI | 1.10 (0.83, 1.37) | 0.98 (0.72, 1.23) | 1.02 (0.77, 1.28) | 1.01 (0.79, 1.24) |
| r_fitness-DI (latent variable) | 0.17 (−0.08, 0.42) | −0.30 (−0.49, −0.08) | −0.57 (−0.72, −0.41) | −0.73 (−0.85, −0.59) |
| Simulation 3 | | | | |
| r_fitness-FA | 0.00 (−0.12, 0.13) | −0.02 (−0.15, 0.10) | −0.31 (−0.42, −0.19) | −0.38 (−0.48, −0.27) |
| R | 0.21 (0.08, 0.30) | 0.26 (0.15, 0.33) | 0.35 (0.26, 0.41) | 0.21 (0.12, 0.28) |
| r_fitness-DI (via R) | * | −0.04 (−0.27, 0.19) | −0.52 (−0.72, −0.36) | * |
| CV_DI | 0.79 (0.51, 1.09) | 1.04 (0.76, 1.30) | 1.18 (0.95, 1.42) | 0.77 (0.50, 1.05) |
| r_fitness-DI (latent variable) | 0.02 (−0.28, 0.30) | −0.05 (−0.29, 0.19) | −0.54 (−0.71, −0.36) | −0.88 (−0.98, −0.76) |

*The bootstrap method failed here because in some resamples the hypothetical repeatability was negative, so that the square root could not be taken.
