The two-step hypothesis of Müllerian mimicry evolution states that mimicry starts with a major mutational leap between adaptive peaks, followed by gradual fine-tuning. The hypothesis was suggested to solve the problem of apostatic selection producing a valley between adaptive peaks, and appears reasonable for a one-dimensional phenotype. Extending the hypothesis to the realistic scenario of multidimensional phenotypes controlled by multiple genetic loci can be problematic, because it is unlikely that major mutational leaps occur simultaneously in several traits. Here we consider the implications of predator psychology for the evolutionary process. According to feature theory, single prey traits may be used by predators as features to classify prey into discrete categories. A mutational leap in such a trait could initiate mimicry evolution. We conducted individual-based evolutionary simulations in which virtual predators both categorize prey according to features and generalize over total appearances. We found that an initial mutational leap toward feature similarity in one dimension facilitates mimicry evolution of multidimensional traits. We suggest that feature-based predator categorization together with predator generalization over total appearances solves the problem of applying the two-step hypothesis to complex phenotypes, and provides a basis for a theory of the evolution of mimicry rings.

The mechanisms behind the evolution of Müllerian mimicry have been debated since the 19th century (Bates 1862; Müller 1879). An important issue is whether mimicry evolves through gradual change or by larger mutational leaps toward greater similarity (Turner 1984a,b, 1985, 1987). One of the challenges concerning the size of the changes in mimic appearance is purifying (apostatic) selection acting on mutants that are intermediate between the original appearance and the model. Predators might recognize such mutants neither as members of their own population nor as members of the model's. For this reason, Punnett (1915) argued that mimicry, either Müllerian or Batesian, must come about through a single, large mutational leap that instantly establishes high similarity to the model. The currently accepted view (Turner 1984a; Joron 2003) of how mimicry evolves is the so-called two-step hypothesis, which was originally proposed for Batesian mimicry by Nicholson (1927) and extended to Müllerian mimicry by Turner (1984a). The hypothesis states that a large mutation first establishes an approximate resemblance to the model, sufficient for predators to generalize between the mutant and the model, after which a gradual evolutionary process follows, fine-tuning the mimic's appearance toward more accurate mimicry. In this way, the problem of purifying selection might be solved.

The two-step process of Müllerian mimicry evolution is often portrayed as a jump between adaptive peaks in a one-dimensional space of appearances (e.g., Turner 1984a, figs. 14.4, 14.6; Turner 1987). It is, however, less evident that this process can also work for multidimensional appearances. The aposematic signals used by prey to deter predators are often complex and consist of multiple components, such as coloration and pattern shape, and include multimodal signals that address senses other than vision, such as olfaction and audition (Rowe and Guilford 2001). Considering a complex prey phenotype with components that are controlled by separate genetic loci, it is unlikely that large mutations would occur in two or more of these loci simultaneously and establish sufficient similarity for the mutant to escape purifying selection. The problem becomes all the more important when the original appearance of the mimic is distinct enough from the model's that there is no generalization overlap between these appearances, because it is in such a situation that the two-step process is required for mimicry to evolve. When there already is noticeable generalization overlap from the beginning, the process of gradual change described by Fisher (1927, 1930) can readily explain the evolution of Müllerian mimicry (Balogh and Leimar 2005; Ruxton et al. 2008), without the need for an initial large mutation. Thus, for the two-step process to work in a multidimensional space, in situations where the initial appearances are quite distinct, a mutational change in just one of the dimensions must cause predators to classify the mutant as rather similar to the model, even though, taking all components of the appearance into account, the mutant would be intermediate between the original appearance and the model, and perhaps even closer to the original appearance.

Our aim here is to combine ideas from psychology about categorization of complex stimuli with evolutionary modeling to suggest how the two-step process can work. We develop a model in which predators use a feature-based categorization mechanism together with generalization over all components of appearance. The main idea is that, for a complex phenotype, it could be sufficient for mimicry evolution that the initial mutational leap of the two-step process occurs in a single trait, provided that this trait is used by predators as a qualitative feature to categorize prey. When a mutant shares the feature with the model species, we assume that predators generalize broadly over other traits. An initial feature similarity could then give rise to generalization overlap and relaxed purifying selection, and thus facilitate gradual evolution of other traits, fine-tuning the total appearance toward more accurate mimicry.

Category formation as a possible mechanism of discrimination between complex stimuli is a traditional field of study in animal psychology (Pearce 2008), but up to now there are only a few instances where it is taken into account in the study of mimicry (e.g., Beatty et al. 2004; Chittka and Osorio 2007). Categorization could be particularly useful for animals that need to discriminate between large numbers of complex stimuli. When investigating categorization in psychology, natural classes of stimuli, such as photographs of organisms and natural scenery, are often used. In some cases, it seems as if animals are able to form an abstract concept such as pictures of trees or bodies of water (e.g., Herrnstein et al. 1976). In other cases, it is more probable that the categorizations are based on generalization of perceived similarity. Studying the errors that experimental subjects make in their judgments often reveals that they have not formed an abstract concept, but rather attended to one or a few individual features of the stimuli (e.g., D’Amato and Van Sant 1988; Jitsumori and Yoshihara 1997; Martin-Malivel and Fagot 2001; Ghosh et al. 2004; Marsh and MacDonald 2008). Such features could for instance be the presence of a reddish patch in a categorization of pictures that either do or do not contain a human (D’Amato and Van Sant 1988), or the presence of eyes and/or orange color in a categorization of pictures containing orangutans versus other primates (Marsh and MacDonald 2008). Taken together, these studies suggest that animals often use one or a few relevant features to solve categorization tasks.

Many feature-based theories of categorization and concept formation suggest that objects/stimuli are represented as collections of features and that object similarity is recognized by comparing individual features. The common and distinctive features of objects/stimuli thus allow formation of concepts or categories (e.g., Tversky 1977; Treisman and Gelade 1980; Elio and Anderson 1981). Another suggested mechanism of effective similarity judgment is that stimuli are encoded hierarchically at two levels of detail, category and fine-grain levels (Huttenlocher et al. 2000; Crawford et al. 2006). Category level information is first used to sort stimuli into crude categories to reduce variability, whereupon fine-grain information completes the judgment. It is this latter kind of mechanism that we assume in our model.

Thus, when applying feature theory to a mimicry situation, we assume that predators primarily attend to individual features of the appearance of aposematic prey, such as its color, or the presence of a stripe in a certain position, to crudely categorize it. We also assume that predators are more likely to treat stimuli that are categorized together as similar, by generalizing more broadly between such stimuli. We use our model to investigate the consequences of a feature-based decision-making mechanism for the two-step process of Müllerian mimicry evolution. Our ambition is to delineate the relatively stringent conditions that need to be satisfied for saltational mimicry evolution to be a realistic possibility. Provided that initially rather dissimilar species join a mimicry ring through a saltational change that achieves feature similarity to a central model, our analysis provides the basis for a theory of the evolution of mimicry rings. A mimicry ring formed in this way is characterized by the feature that acts as a starting point of mimicry evolution for each mimetic species in the ring. Finally, we discuss possible examples of features in Müllerian mimicry and in mimicry rings.

Model Description

The model is a development of models used in Balogh and Leimar (2005), Franks and Sherratt (2007) and Ruxton et al. (2008). In Balogh and Leimar (2005) the gradual evolution of a one-dimensional prey trait was driven by predator generalization, and in Franks and Sherratt (2007) this process was extended to multiple traits. Our model here treats prey appearances as multidimensional stimuli, but differs from previous approaches in that we use a hierarchical decision mechanism. Some of the stimulus dimensions are first used by predators for discrete categorization, from which prey are determined to have or not to have a certain feature, such as a particular coloration. Next, predators generalize between stimuli over all stimulus dimensions, using a generalization function as in previous models, but the width of the generalization function depends on whether the stimuli share the feature. The idea of feature-based generalization is that generalization should be broader between stimuli that share the feature.


One or more of the prey traits are used by predators for classification of prey. A prey item is likely to be classified as having the feature if these traits lie in a certain region of the trait space. Figure 1A shows an example with two feature traits, x1 and x2, in which the predator is likely to classify a prey item as possessing the feature if the feature traits lie in a circular feature region. Similarly, for a single feature trait x1, the feature region would be an interval, such as the one shown on the x1-axis in Figure 1A. We assume that the probability that a predator classifies a prey as possessing the feature is a sigmoid function of the distance R of the prey item's feature traits from the centre of the feature region. Thus, the probability of feature classification is

$$P(F = 1) = \frac{1}{1 + \exp\left[\rho\,(R - R_f)\right]} \qquad (1)$$

where Rf is the radius of the region and ρ describes the steepness of the function at the boundary of the region (Fig. 1B).
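A minimal Python sketch of this classification step, assuming a logistic sigmoid in R with its midpoint at R = Rf and steepness ρ, can make the geometry concrete. The function names are hypothetical and the default values Rf = 2.5 and ρ = 10 are the ones listed later in the Model Description. Because R is computed over the feature traits only, the same code covers the one-dimensional interval and the circular region of Figure 1A.

```python
import math
import random


def feature_probability(feature_traits, centre, r_f=2.5, rho=10.0):
    """Probability that a predator classifies a prey as having the feature.

    Assumed form: 1 / (1 + exp(rho * (R - r_f))), where R is the Euclidean
    distance of the feature traits from the centre of the feature region.
    """
    r = math.dist(feature_traits, centre)
    z = rho * (r - r_f)
    # Evaluate the logistic in a form that cannot overflow for large |z|.
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def classify(feature_traits, centre, rng, r_f=2.5, rho=10.0):
    """Draw the 0/1 feature indicator F for a single encounter."""
    p = feature_probability(feature_traits, centre, r_f, rho)
    return 1 if rng.random() < p else 0


rng = random.Random(1)
print(feature_probability((7.5,), (7.5,)))    # model at the centre: close to 1
print(feature_probability((2.5,), (7.5,)))    # original mimic: close to 0
print(classify((9.0, 7.0), (7.5, 7.5), rng))  # two feature traits, inside the disc
```

The probability is exactly 0.5 on the boundary R = Rf, and ρ controls how quickly it falls from 1 to 0 across that boundary, corresponding to the sharper or softer curves of Figure 1B.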

Figure 1.

Illustration of feature-based classification and generalization functions for multidimensional stimuli. A complex stimulus has traits x1, x2, … xn, either in stimulus dimensions defining a feature (“feature traits”), or in other, nonfeature stimulus dimensions. (A) A feature region defined by two traits, x1 and x2. Stimuli with (x1, x2) within the circle with radius Rf have a high probability to be classified as having the feature (with a single feature dimension this would hold for stimuli x1 within the interval marked on the x-axis). (B) Probability that a stimulus is classified as having the feature as a function of the distance R from the centre of the circle (or from the centre of the one-dimensional interval) in (A). The red and blue lines illustrate more or less sharp classification. (C) Generalization as a function of dissimilarity between two trait values. For stimuli that have been classified as sharing the feature, the broader (dashed) generalization function is used. For stimuli that both lack the feature, the narrower function is used. Predators are assumed not to generalize between pairs of stimuli when one but not the other is classified as having the feature. (D) Contour plot of a generalization function in two dimensions (here with equal generalization widths in both dimensions).


Consider a predator that has learned to avoid the appearance of the individuals of the model population, all of which have a certain feature. When the predator encounters a new prey individual, it first classifies it as either having or lacking the feature. This classification affects the next level of the decision-making hierarchy. If the prey item is determined to have the feature, a broad generalization width is used for all trait dimensions (see Fig. 1C). An interpretation is that when the feature is shared, the predator becomes less discriminating with respect to other differences in prey appearance. If the feature is present only in the model, we assume that there is no generalization between the model and the new prey. Finally, if the feature is lacking in both the model and the new prey (or if predators do not use features for classification), a narrow generalization width (Fig. 1C) is used for all traits, including any feature traits.


A predator can observe the multidimensional prey phenotype and, on attack, also the prey unpalatability y. We describe the predator's experience as a list of the xi and yi for the prey it has attacked, together with the feature classification. The probability of the predator attacking a discovered prey is written as

$$q(h) = \frac{1}{1 + \exp\left[s\,(h - h_0)\right]} \qquad (2)$$

where h depends on the appearance x= (x1, … , xn) of the prey and on the predator's experience. The function q(h) has a step-like shape, h0 is the inflexion point of the curve, and s is a measure of the steepness of the curve at that point. We can think of h as the degree of attack inhibition toward prey with appearance x and it is given by

$$h = \sum_{i=1}^{m} y_i \, g(\mathbf{x}, \mathbf{x}_i, F, F_i) \qquad (3)$$

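where m is the number of previously attacked prey and g is a function that describes feature-based generalization. F is a 0/1 variable that indicates whether the feature was detected for the current prey item, and Fi is a similar indicator for prey i in the list of experiences. The probability that F= 1 is given by eq. 1 and the function g is given by

$$g(\mathbf{x}, \mathbf{x}_i, F, F_i) = \begin{cases} \exp\left(-\dfrac{d^2}{2\omega^2}\right) & \text{if } F = F_i = 1 \\ 0 & \text{if } F \neq F_i \\ \exp\left(-\dfrac{d^2}{2\sigma^2}\right) & \text{if } F = F_i = 0 \end{cases} \qquad (4)$$

where

$$d^2 = \sum_{j=1}^{n} \left(x_j - x_{ij}\right)^2 \qquad (5)$$
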
is the squared Euclidean distance between stimuli, and σ and ω are the narrow and wide widths illustrated in Figure 1C. Just as in Balogh and Leimar (2005), the learning process can be viewed as the accumulation of inhibition, with the difference being that generalization is feature-based.
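The decision hierarchy in eqs. 2 to 5 can be sketched in a few lines of Python. This assumes a decreasing logistic q(h) with inflexion point h0 and steepness s, and Gaussian generalization functions, exp(-d²/2ω²) for prey that share the feature and exp(-d²/2σ²) for prey that both lack it. The experience list holds (x_i, y_i, F_i) tuples for previously attacked prey; all function names are hypothetical, and the unpalatability y = 1.0 in the usage lines is a placeholder, since the text does not state the value of y.

```python
import math


def sq_distance(x, x_i):
    """Squared Euclidean distance over all trait dimensions (eq. 5)."""
    return sum((a - b) ** 2 for a, b in zip(x, x_i))


def generalization(x, x_i, f, f_i, sigma=1.0, omega=10.0):
    """Feature-based generalization g (eq. 4), assuming Gaussian shapes."""
    if f != f_i:
        return 0.0  # no generalization across the feature boundary
    width = omega if f == 1 else sigma  # broad only when the feature is shared
    return math.exp(-sq_distance(x, x_i) / (2.0 * width ** 2))


def inhibition(x, f, experience, sigma=1.0, omega=10.0):
    """Attack inhibition h (eq. 3); experience is a list of (x_i, y_i, F_i)."""
    return sum(y_i * generalization(x, x_i, f, f_i, sigma, omega)
               for x_i, y_i, f_i in experience)


def attack_probability(h, s=2.0, h0=2.5):
    """q(h) (eq. 2): decreasing logistic with inflexion at h0 and steepness s."""
    z = s * (h - h0)
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


# A predator that has attacked one model individual (placeholder y = 1.0, F = 1).
experience = [((7.5, 7.5, 7.5), 1.0, 1)]
mutant = (9.0, 2.5, 2.5)  # feature trait inside the region, other traits unchanged
h = inhibition(mutant, 1, experience)
print(round(h, 3), round(attack_probability(h), 3))
```

With a single unit of experience the inhibition stays below h0, so a naive-ish predator still attacks the mutant most of the time; protection builds up only as experiences with the model accumulate in the sum.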


The predator–prey community consists of Np predators and two prey types a and b with fixed unpalatabilities y (the prey types are equally unpalatable in our simulations) and population sizes Na and Nb at the start of each season. A predator independently discovers prey at a rate u per unit time and prey individual. The population sizes Na(t) and Nb(t) decrease each time a predator attacks prey of the corresponding type (attacks are always fatal). The duration T of a season is divided into small intervals Δt of time. The probabilities Pa and Pb of an individual predator discovering prey types a and b in a time interval are

$$P_a = u \, N_a(t) \, \Delta t, \qquad P_b = u \, N_b(t) \, \Delta t$$

The time interval Δt is chosen small enough that both Pa and Pb are small. In this way we can ignore the possibility of several discoveries during Δt. Thus, the probability of no prey being discovered is

$$P_0 = 1 - P_a - P_b$$

On discovery of a prey individual with appearance x, the probability of attack is computed by first determining the state of the predator according to eqs. 3 and 4 and then using eq. 2. If there is an attack, the event is added to the predator's experience. This is repeated for each predator, after which the next time interval is handled in the same way, until the end of the season.
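One discovery step for a single predator can be sketched as below, drawing one uniform number and comparing it with Pa and Pa + Pb. The interval dt = 1e-4 is a hypothetical choice (the text does not give Δt); with u = 0.04, Na = 1000 and Nb = 5000 it gives Pa = 0.004 and Pb = 0.02, small enough that multiple discoveries per interval can be neglected. On a discovery, the attack decision would then use the feature classification and eqs. 2 to 4.

```python
import random


def discovery_step(n_a, n_b, u=0.04, dt=1e-4, rng=random):
    """Simulate one interval dt for one predator.

    Returns "a" (mimic discovered), "b" (model discovered) or None.
    """
    p_a = u * n_a * dt  # P_a = u N_a(t) dt
    p_b = u * n_b * dt  # P_b = u N_b(t) dt
    if p_a + p_b >= 1.0:
        raise ValueError("dt too large: P_a + P_b must be well below 1")
    r = rng.random()
    if r < p_a:
        return "a"
    if r < p_a + p_b:
        return "b"
    return None  # no discovery, probability P_0 = 1 - P_a - P_b


rng = random.Random(42)
draws = [discovery_step(1000, 5000, rng=rng) for _ in range(200_000)]
print(draws.count("a") / len(draws), draws.count("b") / len(draws))
```

Because Na(t) and Nb(t) are passed in at every step, the discovery probabilities automatically fall as prey are killed during the season.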

At the start of a simulation, the prey populations are monomorphic. The prey individuals reproduce sexually and have a diploid genotype that additively determines their appearance x, with one locus for every trait and free recombination between loci. The Na and Nb individuals of the next generation are formed by randomly selecting (with replacement) parents among the survivors from the respective population. Mutations occur with a probability of 0.0005 per allele and mutational increments are drawn from a reflected exponential distribution (cf. Orr 1998) with the standard deviation σm. Prey type a represents the mimic and b is the model. Unless specified otherwise, we used the following parameter values in the simulations: u= 0.04, Np= 100, Na= 1000, Nb= 5000, σ= 1.0, ω= 10.0, s= 2.0, h0= 2.5, σm= 1.0, ρ= 10, centre of feature region x1= 7.5 (or (x1, x2) = (7.5, 7.5)), Rf= 2.5, starting values (x1, … , xn) = (7.5, … , 7.5) for the models and (x1, … , xn) = (2.5, … , 2.5) for the mimics.
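The reproduction step can be sketched as follows. A reflected exponential distribution is a Laplace distribution; since a Laplace distribution with scale b has standard deviation b√2, a standard deviation of σm corresponds to b = σm/√2. Two assumptions are not fixed by the text and are made here only for illustration: the additive genotype-phenotype map sums the two allelic values at a locus (so a trait value of 2.5 corresponds to alleles of 1.25), and the two parents of an offspring are drawn independently, so they may coincide. All function names are hypothetical.

```python
import math
import random

MU = 0.0005      # mutation probability per allele (from the text)
SIGMA_M = 1.0    # standard deviation of mutational increments (from the text)


def mutational_increment(rng, sigma_m=SIGMA_M):
    """Reflected exponential (Laplace) increment with standard deviation sigma_m."""
    scale = sigma_m / math.sqrt(2.0)  # Laplace SD = sqrt(2) * scale
    size = rng.expovariate(1.0 / scale)
    return size if rng.random() < 0.5 else -size


def gamete(parent, rng, mu=MU):
    """One gamete from a diploid parent: free recombination, one allele per locus."""
    alleles = []
    for a1, a2 in parent:  # parent: list of (allele, allele) pairs, one per trait
        allele = a1 if rng.random() < 0.5 else a2
        if rng.random() < mu:
            allele += mutational_increment(rng)
        alleles.append(allele)
    return alleles


def phenotype(genotype):
    """Additive map, assumed here to be the sum of the two alleles per trait."""
    return [a1 + a2 for a1, a2 in genotype]


def next_generation(survivors, n_offspring, rng):
    """Parents drawn at random, with replacement, among the survivors."""
    offspring = []
    for _ in range(n_offspring):
        parent1, parent2 = rng.choice(survivors), rng.choice(survivors)
        offspring.append(list(zip(gamete(parent1, rng), gamete(parent2, rng))))
    return offspring


rng = random.Random(0)
survivors = [[(1.25, 1.25)] * 3 for _ in range(10)]  # monomorphic, x = (2.5, 2.5, 2.5)
offspring = next_generation(survivors, 1000, rng)
print(sum(phenotype(g) == [2.5, 2.5, 2.5] for g in offspring))  # unmutated offspring
```

With six allele copies per offspring and a per-allele rate of 0.0005, only about three in a thousand offspring carry a new mutation, which is why a large-effect feature mutant is a rare event.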


We first present a simulation showing that the two-step process of mimicry evolution is feasible for multidimensional phenotypes when there is feature-based generalization. We then explore the effects of the dimensionality of the feature traits and the overall phenotype on the two-step process. Finally, we investigate how large a proportion of predators must make use of the feature for the two-step process to work.


Figure 2 shows an example of feature-based mimicry evolution for a three-dimensional phenotype. One of the traits (x1) is a feature trait and all predators use the interval depicted on the x-axis of Figure 1A as the feature region (5 ≤ x1 ≤ 10). At the outset, the feature trait of the model population was located at the centre of the feature region (x1= 7.5) while the mimic population started at a trait value well outside the feature range (x1= 2.5). After approximately 1500 generations, a saltational leap of the mimic toward the model occurred in the feature trait (Fig. 2A), and thus feature similarity was established. A detailed depiction of the mimic's saltation is shown in Figure 3. At the start of the simulation, the mimic population was monomorphic, and before the saltation the mimics were distributed around the initial trait value (Fig. 3A). During the saltation, some mutant heterozygotes became established within the feature range (Fig. 3B) and increased in frequency (Fig. 3C), followed by more and more mutant homozygotes until the mutant allele was fixed in the population (Fig. 3D). After the mutational leap, predators had a high probability of classifying the models and mimics as sharing the feature, and thus generalized more broadly between overall model and mimic appearances, including the traits x2 and x3. This led to gradual evolution of the nonfeature traits toward mimicry (Figs. 2B,C), through the so-called Fisher process (Balogh and Leimar 2005; Ruxton et al. 2008). We also conducted simulations with parameters identical to the ones in Figure 2, but with none of the three traits being a feature trait. In this case no mimicry evolution occurred in any of the traits, as the predators did not generalize between the prey appearances in the two populations.

Figure 2.

Feature-based mimicry evolution. Trajectories of the mean traits for a model population and a potentially mimetic population are shown. There are three trait dimensions, one feature trait x1 (with the feature region 5 ≤ x1 ≤ 10), and two other traits x2 and x3. (A) The mimic feature trait x1 evolves through a saltational leap (arrow) from x1= 2.5 to x1≈ 9.0 (which leads to a high probability of the stimulus being classified as having the feature). The mimic then shares the feature with the model (x1= 7.5 for the model). (B,C) After the feature similarity has been established, there is gradual evolution of the mimic traits x2 and x3 toward more accurate mimicry.

Figure 3.

Detailed illustration of the mimic's feature saltation in Figure 2A. The four panels show histograms of the feature trait x1 in the mimic population at different points in time around the saltational event. (A) Before the saltation, the trait is narrowly distributed near its initial value x1= 2.5. (B) Approximately 100 generations after this time, a number of mutant heterozygotes within the feature range (5 ≤x1≤ 10) appear (arrow). (C) After an increase in frequency of the mutant allele, more and more mutant homozygotes appear in the population (arrow). (D) 200 generations later, the mutant is fixed and the mimic shares the feature with the model.

It seems appropriate to regard the evolutionary process depicted in Figures 2A and 3 as a saltation, because even the first mutant heterozygote of population a that appears in the feature region loses essentially all the protection it previously gained from its similarity to the other members of its population. According to our assumptions, predators do not generalize between prey that possess and prey that lack the feature. The mutant can gain protection only through predator generalization from the model population, which means that it fully plays the role of a mimic. To gain this protection, the feature-based generalization must be sufficiently broad and the model population must be better protected than is originally the case for the potential mimics (in Fig. 2, the models have a larger population size than the mimics). Even when the mutant heterozygote has a survival advantage, chance events of predator learning and attack might still prevent it from establishing a mutant lineage. In our simulations there were typically several unsuccessful feature mutants before a saltation like the one in Figure 3 occurred.


To examine whether the two-step process is also possible with multiple feature traits, we conducted a simulation similar to the previous three-dimensional case, except that we used two feature traits and the circular feature region in Figure 1A. The outcome can be seen in Figure 4A, which shows the Euclidean distance from the model's and the mimic's average feature traits (x1, x2) to the centre of the feature region. The model started at zero distance from the feature centre, while the mimic started outside the feature range (at x1=x2= 2.5, as in the previous simulation). Over the time span of the simulation (100,000 generations) no saltation occurred and no feature similarity was established for the two-dimensional feature. Thus, a feature saltation is clearly less likely to occur in two traits simultaneously than in a single feature trait. Lacking feature similarity, predators did not generalize between the appearances of the two populations and, as a consequence, there was no evolution toward mimicry in the third trait x3 (Fig. 4B). By adjusting the parameters of the simulation, we found a situation where feature similarity evolved also for the two-dimensional feature (Fig. 4C), but in this case the feature similarity was not attained through a simultaneous mutational leap in the two feature traits. We started the mimic closer to the border of the feature region (x1=x2= 4.5) and increased the narrow generalization width (cf. Fig. 1C) to σ= 2.0. The wider generalization relaxed purifying selection and allowed more random drift of the mimic average phenotype. From its starting position nearer to the feature region, this drift brought one of the two feature traits of the mimic to a position where a saltational change in the other trait could produce feature similarity, and this is the saltation seen in Figure 4C. Thus, feature similarity was attained by one of the feature traits randomly drifting toward the feature region, after which there was a saltation in the other feature trait, moving the mimic appearance into the feature region. After feature similarity was attained in this manner, the third trait x3 also evolved to establish mimicry (Fig. 4D). We conclude that the initial saltation in a two-step process can occur for a single trait, but is very unlikely for a combination of several genetically independent traits.
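A back-of-the-envelope calculation illustrates why a simultaneous leap in two traits is so much rarer than a leap in one. It uses only the per-allele mutation probability and the mimic population size from the Model Description, and counts how often a newborn carries new mutations at one given locus versus at two given, freely recombining loci. It deliberately ignores the further requirement that each increment be large and in the right direction, which would multiply in at each locus and widen the gap further; it is a sketch of the scaling, not a reproduction of the simulations.

```python
mu = 0.0005   # mutation probability per allele (from the text)
n_a = 1000    # offspring produced by the mimic population per generation

# Probability that a newborn carries at least one new mutation at a given locus
# (two allele copies per diploid locus).
p_one_locus = 1 - (1 - mu) ** 2
# Freely recombining loci mutate independently, so for two given loci:
p_two_loci = p_one_locus ** 2

per_gen_one = n_a * p_one_locus  # expected newborns mutated at the locus
per_gen_two = n_a * p_two_loci   # expected newborns mutated at both loci
print(per_gen_one, per_gen_two, per_gen_one / per_gen_two)
```

Roughly one newborn per generation carries a new mutation at a given locus, whereas a newborn with new mutations at both of two given loci appears only about once per thousand generations, before any condition on mutation size is applied.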

Figure 4.

Illustration of limitations of the two-step process for multidimensional features. There are two feature traits, (x1, x2), and one nonfeature trait x3. (A) R(x1, x2) is the Euclidean distance from (x1, x2) to the centre of the feature region (see Figs. 1A,B). When the feature trait is two-dimensional, no saltational feature evolution occurs, even though the parameters are the same as in Figures 2 and 3. (B) Without feature evolution, there is no gradual mimicry evolution in the nonfeature trait x3. (C) When changing the parameters so that the mimic has initial trait values closer to the border of the feature region (x1=x2= 4.5 for the mimic, instead of 2.5 as in (A) and (B)), and doubling the narrow generalization width of the predators (σ= 2.0), a feature saltation is possible. (D) After feature similarity has been established, there is gradual mimicry evolution in the nonfeature trait.


We also examined cases with higher overall dimensionality but with a single feature trait. One might expect that, when the feature is one-dimensional and there is sufficient mutational variability in this trait, even a high-dimensional phenotype could evolve gradually toward mimicry, provided that the Fisher process of gradual mimicry evolution can operate in a multidimensional trait space. The result of a simulation for a six-dimensional prey appearance is presented in Figure 5, showing a feature saltation (Fig. 5A) followed by a Fisher process of gradual mimicry evolution (Fig. 5B). We used the standard parameter values for this simulation, except that we increased the width of generalization between stimuli that share the feature from ω= 10 to ω= 15. A greater generalization width per trait is required in a higher-dimensional space to compensate for the effect of dimensionality on the generalization function: in eqs. 4 and 5, for a given difference per trait, the Euclidean distance between stimuli is greater when the number of dimensions n is higher (for equal differences in every trait it grows as √n). Figure 5B shows the Euclidean distance between the model and mimic average appearances; after feature saltation, this distance decreases during mimicry evolution, but it does not reach zero. The reason is that, because of the wide generalization width (ω= 15), purifying selection is rather weak and allows some random drift in both model and mimic appearances. Thus, mimicry evolves, but remains somewhat sloppy and does not reach perfection. Finally, it is worth noting that immediately after feature saltation is complete, the average Euclidean distance between mimic and model appearances (≈11.0; see Fig. 5B) is larger than the average Euclidean distance between this postsaltation mimic appearance and the original mimic appearance (≈5.5). This implies that the greater part of the overall mimicry evolution is gradual.
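A small worked example shows the dimensionality effect. For two stimuli that share the feature and differ by δ in each of n traits, the squared distance is d² = nδ², so, assuming the Gaussian generalization exp(-d²/2ω²), the same per-trait difference yields weaker generalization in six dimensions than in three. The per-trait difference δ = 5 is a hypothetical illustration (the gap between the initial values 2.5 and 7.5), and the function name is made up for this sketch.

```python
import math


def shared_feature_generalization(delta, n, omega):
    """g for two stimuli that share the feature and differ by delta in each of n traits."""
    d2 = n * delta ** 2  # squared Euclidean distance, as in eq. 5
    return math.exp(-d2 / (2.0 * omega ** 2))


delta = 5.0  # hypothetical per-trait difference
for n, omega in [(3, 10.0), (6, 10.0), (6, 15.0)]:
    print(n, omega, round(shared_feature_generalization(delta, n, omega), 3))
# 3 10.0 0.687
# 6 10.0 0.472
# 6 15.0 0.717

# Width that would restore the three-dimensional value at n = 6:
print(round(10.0 * math.sqrt(6 / 3), 1))  # 14.1
```

Keeping g unchanged when going from three to six dimensions would require scaling ω by √2, from 10 to about 14.1, which is of the same order as the value ω = 15 used for Figure 5.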

Figure 5.

Feature-based mimicry evolution for a six-dimensional stimulus. Only one of the six traits is a feature trait (x1). (A) The mimic feature trait evolves through a saltational leap (arrow) from x1= 2.5 to x1≈ 8.5. (B) After the feature similarity has been established, there is gradual evolution of all mimic traits toward mimicry. The y-axis in (B) shows the Euclidean distance between population means of the model and mimic traits (x1, x2, … , x6). The parameters in this simulation were the same as in the case in Figure 2, except that ω= 15 instead of 10.


To further illustrate the implications of an initial feature saltation for the evolutionary process toward mimicry, we conducted simulations of the three-dimensional case in Figure 2, but varied the proportion of predators that use the feature to classify prey. As can be seen in Figure 6, feature saltation becomes more and more difficult as the proportion of predators that actually use the feature decreases. When the proportion was less than around 90%, feature saltation did not occur within the simulated time span (200,000 generations), and it took progressively longer to happen for proportions approaching this limit. The reason is that, for sufficiently large feature-trait mutations, the predators that do not use the feature for classification will generalize neither between the feature mutant and the other members of its population nor between the mutant and the model population. These predators then represent an extra cost for the feature mutant. If the cost becomes too high, feature saltation is prevented. The requirement that a high enough proportion of predators use feature-based classification is thus an important constraint on the two-step process.
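The cost imposed by predators that ignore the feature can be illustrated with a predator-averaged generalization onto a feature mutant. This is not the simulation behind Figure 6; it only shows, for hypothetical trait values close to those of Figure 2 (mutant at (9.0, 2.5, 2.5)), how protection borrowed from the model scales with the proportion of feature-using predators. It assumes Gaussian generalization with σ = 1 and ω = 10, that feature users classify the mutant as having the feature with certainty, and that non-users apply the narrow width to every comparison; the function names are made up for this sketch.

```python
import math


def gauss(d2, width):
    """Gaussian generalization for squared distance d2 and the given width."""
    return math.exp(-d2 / (2.0 * width ** 2))


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean_generalization_to_mutant(p_users, mutant, model, wildtype, sigma=1.0, omega=10.0):
    """Predator-averaged generalization onto a feature mutant.

    Returns (from one model individual, from one wild-type mimic).
    Feature users: broad width omega toward the model (feature shared), none from
    the wild type (feature differs). Non-users: narrow width sigma for both.
    """
    users_from_model = gauss(sq_dist(mutant, model), omega)
    users_from_wild = 0.0
    non_users_from_model = gauss(sq_dist(mutant, model), sigma)
    non_users_from_wild = gauss(sq_dist(mutant, wildtype), sigma)
    from_model = p_users * users_from_model + (1 - p_users) * non_users_from_model
    from_wild = p_users * users_from_wild + (1 - p_users) * non_users_from_wild
    return from_model, from_wild


mutant, model, wildtype = (9.0, 2.5, 2.5), (7.5, 7.5, 7.5), (2.5, 2.5, 2.5)
for p in (1.0, 0.9, 0.5):
    from_model, from_wild = mean_generalization_to_mutant(p, mutant, model, wildtype)
    print(p, round(from_model, 3), from_wild)
```

Protection from the model falls in proportion to the share of feature users, while non-users give the mutant essentially no protection from either population, which is the extra cost described above.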

Figure 6.

The rate of appearance of feature similarity (which is followed by mimicry evolution) as a function of the proportion of predators using the feature. The x-axis shows the proportion of the predator population using the feature for prey classification in the same way as in the simulations in Figure 2. The remaining predators generalize over prey appearances according to the narrower generalization function that would apply to stimuli that both lack the feature (cf. Fig. 1C). The y-axis shows the inverse of the time until feature saltation and the points illustrate the median and upper and lower quartiles of 20 simulations for each proportion (the inverse time is displayed because the times were very large for smaller proportions of predators using the feature). Each simulation was run for 200,000 generations.


We found that when there is feature-based generalization, the two-step process is a possible path to Müllerian mimicry for multidimensional prey phenotypes, also in situations where unlinked loci influence the different trait dimensions. If a mutational leap occurs in a feature trait that is used by predators for prey categorization, predators may place the mutant individual into the same category as the model. This first step toward mimicry can facilitate the evolution of other traits and thus the fine-tuning of the model-mimic resemblance. Our results nevertheless imply that the evolutionary path to mimicry is constrained and becomes likely only under certain conditions. First, even if predators place prey appearances into categories based on the presence or absence of qualitative features, it is only when a single feature dominates classification, and when prey can acquire this feature through a single mutation, that the first step of the two-step process becomes likely. Second, a large majority of the predators that influence the mimetic relationship must use the same or a similar feature-based classification; otherwise the first step of the process becomes unlikely. To the extent that several species join a mimicry ring through the process, it then follows that they should enter the mimetic relationship by acquiring more or less the same feature, provided the predator community stays similar in its categorization of prey. A mimicry ring formed in this way can thus be characterized by a feature that acts as a starting point of mimicry evolution.

In our analysis, we made the assumption that the model species possessed a feature, whereas the original appearance of the mimic was classified by predators as not having the feature. There are of course other possibilities, for instance that a mutation produces a transition from one feature-defined category to another, as could be the case for a change from one particular color to another, and this would work in much the same way as the situation we studied. If a mutation instead adds a feature without simultaneously removing another preexisting feature, one needs to take into account how features interact. It is possible that one of the features could dominate such an interaction, for instance the feature with greater salience or the one associated with a more important (e.g., more unpalatable) prey category. The range of possibilities becomes even greater if one considers that the predator and potential model communities might vary in space and over evolutionary time. In such cases sequences of different feature saltations, interspersed with periods of gradual change, could occur.

The distribution of gene effects that are fixed during adaptive evolution is an important topic in evolutionary genetics (Orr 1998, 2005). As emphasized by Baxter et al. (2009), the two-step process of mimicry evolution, involving a jump between adaptive peaks, differs from an adaptive walk when there is only a single peak. The current knowledge of the genetics of mimicry in Heliconius butterflies is in accordance with the two-step hypothesis (Baxter et al. 2009), in that major wing-pattern gene effects are found together with smaller effects. At present, however, it is not known whether at least some of the major effects arose as single mutations or instead evolved through several substitutions at the same locus. Detailed comparison of the DNA sequences of the genes is needed to resolve this issue.

In our simulations, the degree of fine-tuning of the mimetic appearance that followed a successful first step varied. Because feature-based generalization entails broad generalization over nonfeature traits, it can be consistent with instances of imperfect mimicry (cf. Chittka and Osorio 2007). In particular, we found that mimicry becomes less perfect when generalization occurs over many nonfeature traits, because there is then less purifying selection per trait. Cases of highly perfected mimicry in very many traits are not fully explained by our analysis and may require the action of more narrowly generalizing predators. Such predators would make the first step of the two-step process difficult, but they might enter the predator community at a later stage (cf. the discussion of the predator spectrum in Balogh and Leimar 2005), or be present in only part of the region where the model and mimic occur. There are several other reasons why mimicry might remain imperfect (Lindström et al. 1997, 2004; Holen and Johnstone 2004; Gilbert 2005), for instance because there are several somewhat dissimilar models (Sherratt 2002) that are present either simultaneously or in succession over a season.
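The dilution of purifying selection across many nonfeature traits can be illustrated with a small calculation. Assume, purely for illustration, that generalization is Gaussian with width sigma and that predator attention is shared equally, so that each of n nonfeature traits enters the squared generalization distance with weight 1/n. If one trait deviates from the model by delta while all others match, the loss of protection is 1 - exp(-delta^2 / (2 n sigma^2)), which shrinks as n grows. The equal-weighting rule and the function name `protection_loss` are hypothetical and are not taken from the simulation model.

```python
import math


def protection_loss(delta, n_traits, sigma=1.0):
    """Drop in avoidance when one of n nonfeature traits deviates by delta.

    Hypothetical assumption: attention is shared equally, so each trait
    enters the squared generalization distance with weight 1/n_traits,
    and all other traits match the model exactly.
    """
    if n_traits < 1:
        raise ValueError("n_traits must be at least 1")
    weighted_d2 = delta ** 2 / n_traits
    return 1.0 - math.exp(-weighted_d2 / (2 * sigma ** 2))


# Same deviation in one trait, spread over increasingly many attended traits.
for n in (1, 2, 5, 10):
    print(n, round(protection_loss(0.5, n), 4))
```

Under this assumption, the same deviation costs less protection the more traits predators generalize over, so deviations in any single trait are removed less effectively by selection.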

Empirical support for the kind of predator decision-making mechanism used in our model includes experimental studies investigating the relative importance of stimulus components for birds’ attention in avoidance learning. Some studies report a bias in attention toward one or a few features (Terhune 1977; Gamberale-Stille and Guilford 2003; Bain et al. 2007; Aronsson and Gamberale-Stille 2008). Further, studies of mimicry and aposematism that investigate predator reactions and generalization among color pattern stimuli suggest that possible features within a stimulus are attended to in different ways. For instance, Schmidt (1960) trained birds to discriminate food presented together with a model stimulus looking like half a butterfly, with predominantly black wings and with one red and one white area. Control (palatable) stimuli were grey, brown or green. In a generalization test, the birds generally avoided imperfect mimics that had a large area of black in their coloration, whereas the red and white patterns were not effective without the black. Also, studies of generalization by birds of naturally occurring aposematic prey suggest that they generalize quite broadly between different red and black patterned prey species (Evans et al. 1987; Exnerová et al. 2008), and generalize color but ignore pattern in comparisons between different color morphs of a heteropteran bug (Exnerová et al. 2006). Overall, these studies suggest that it is reasonable to assume that bird predators often attend to and learn one or a few salient components within a stimulus, rather than giving equal attention to every component.


In the following we give examples from the literature of traits that are candidates to function as features for predator categorization of prey. In the experiments mentioned above, coloration was found to be important. This suggests that the overall color of an aposematic prey, or the color of specific pattern elements, may constitute a feature, for example red or yellow in insect species of about the same shape and size. Color has been found to be of great importance, more so than pattern, in predator recognition of distasteful prey (Osorio et al. 1999; Exnerová et al. 2006; Aronsson and Gamberale-Stille 2008). Moreover, a drastic change in color can be effected by single mutations (e.g., Lal and Bhatia 1962; Turner 1971; Socha and Nemed 1996). Thus, it is quite plausible that, for instance, a red mutant in an otherwise yellow population can be selectively favored if predators use the color red as a feature based on their experience with other red distasteful species in the area (or vice versa).

To show that a mimetic relationship has evolved, several types of data are needed: behavioral (predator–prey interactions), biogeographical, and phylogenetic. Most empirical data on Müllerian mimicry do not support coevolved mutual convergence, which might be expected if mimicry came about through purely gradual evolution. The evidence instead favors unilateral advergence (Mallet 1999), in which one species evolves to mimic another while the appearance of the other changes little or not at all. Advergence can be consistent with either saltational or gradual mimicry evolution (Balogh and Leimar 2005).

In the literature we have found several cases that involve the evolution of color change in Müllerian mimics, and here we discuss three of them: the switch from red to white/yellow in some populations of the moth Zygaena ephialtes (Zygaenidae), the switch between red and yellow in the bug genus Phonoctonus (Reduviidae), and the evolution of the orange “tiger” coloration in silvaniform Heliconius butterflies.

Zygaena ephialtes has a wide distribution in Europe and exhibits geographic variation in coloration. North of the Alps it resembles other zygaenids, including the abundant Z. filipendulae, which has red forewing spots and red hindwing patches on a black background. In addition, the northern variant of Z. ephialtes, referred to as the red peucedanoid form, has a red abdominal band (Turner 1971). In some parts of southern Europe, the species instead occurs as the yellow ephialtoid form. This southern variant resembles the co-occurring Syntomis (Amata) phegea (Arctiidae), which is black with white wing spots, lacks hindwing patches, and has a yellow abdominal band. That Z. ephialtes mimics S. phegea in this region, and not the other way around, is supported by the fact that S. phegea and other Syntomis species have white and yellow in their coloration also where they occur alone (without Z. ephialtes), whereas Z. ephialtes is white and yellow only where it co-occurs with S. phegea. In addition, population densities in the areas of sympatry are much higher for S. phegea (Sbordoni et al. 1979), although it may be the less toxic of the two species. In some regions of Europe, an intermediate variant of Z. ephialtes is found: the red ephialtoid form, which lacks hindwing patches, has white wing spots apart from two red basal forewing spots, and has a red abdominal band. The intermediate form occurs either as a local monomorphism or in a polymorphism with one or both of the other two variants (Turner 1971). The difference in appearance between the red peucedanoid form and the intermediate form is determined by a single locus, and the difference between the intermediate and the yellow ephialtoid form by another, unlinked locus. This implies that the evolutionary transition between the red peucedanoid and yellow ephialtoid forms must have involved at least two (and probably more than two) steps. One of these steps might represent a feature saltation, for instance the shift from red to white wing spots or the change from red to yellow coloration.

The switch from the red peucedanoid form to the yellow ephialtoid form of Z. ephialtes has been used in evolutionary genetics to exemplify transitions between two adaptive peaks (Coyne et al. 1997; Gavrilets 1997; Barton et al. 2007). Our theory of feature-based mimicry evolution can throw further light on such transitions. It has been suggested that a transition from the original red peucedanoid form to the intermediate form occurred first, favored in situations where both models, Z. filipendulae and S. phegea, are present, and that it was followed by a change from red to yellow in the abdominal band and the basal wing spots, leading to the yellow ephialtoid form. This scenario agrees with the argument by Sbordoni et al. (1979) that the intermediate form is favored when S. phegea is abundant early in the season but only Z. filipendulae is present later in the season. The transition from the original to the intermediate form could in principle be a single feature saltation, but reality is most likely more complex. The difference in appearance between the red peucedanoid and the intermediate form involves changes in two different traits: the color of the wing spots and the degree of melanism in the hind wings. The locus controlling these traits seems to be a supergene with two closely linked components (Sbordoni et al. 1979). For the degree of melanism, a number of intermediate forms occur in certain Mediterranean regions (Hofmann 2003; Hofmann et al. 2009), and similar types of melanism are found in other Zygaena species, perhaps as an adaptation for thermoregulation. This suggests that a change in hindwing melanism could have preceded a possible feature saltation from red to white wing spots in Z. ephialtes. It is also possible that a saltational change from red to yellow coloration occurred before the appearance of white wing spots. The preservation of the two yellow basal wing spots on the yellow ephialtoid form of Z. ephialtes might then be the result of adaptive fine-tuning to resemble an anterior abdominal patch in S. phegea. In this alternative scenario, the red ephialtoid form would have arisen later, perhaps through hybridization between red peucedanoid and yellow ephialtoid populations (Hofmann 2003).

Cotton stainers, Dysdercus spp. (Pyrrhocoridae), have a wide tropical and subtropical distribution (Schaefer and Ahmad 2000). Pyrrhocorids are generally distasteful to predators and often aposematic (e.g., Exnerová et al. 2006). A series of studies on the evolution of coloration in neotropical Dysdercus (Zrzavy 1994; Zrzavy and Nedved 1997, 1999) has identified both yellow and red mimicry rings. In Africa, too, there are both red and yellow species of Dysdercus, sometimes co-occurring in the same geographical area. For instance, D. voelkeri and D. fasciatus are both red with a black transverse wing band, whereas D. melanoderes is usually yellowish and lacks wing bands. The species prefer different habitats (the first two mainly savanna, the third mainly forest; Fuseini and Kumar 1975) and are preyed upon by different Phonoctonus species (Reduviidae). Not only are Phonoctonus specialized predators of Dysdercus, but they also closely mimic the particular Dysdercus species that constitute their main prey (e.g., P. fasciatus mimics D. voelkeri and D. fasciatus, and P. subimpictus mimics D. melanoderes; Stride 1956a,b; see also Edmunds 1974, p. 74, fig. 3.5). We have found no phylogenetic investigations of Phonoctonus, but assuming that the genus is monophyletic, we can infer at least one transition between yellow and red within it. This transition may have been a feature saltation. That Dysdercus are the models and Phonoctonus the mimics in this system is clearly supported by the fact that the prey far outnumber the predators. The presence or absence of a wing band may be part of the feature or a trait subjected to adaptive fine-tuning. It is likely that both genera, Dysdercus and Phonoctonus, belong to wider mimicry rings that include at least some other heteropteran bugs in the region.

Heliconius butterflies have been extensively studied with respect to geographic variation and Müllerian mimicry (Turner 1971; Mallet and Gilbert 1995). Common colorations that may contain features are the “rayed” (rayed hind wings) and “post-man” (red and yellow stripes; Joron et al. 2006) appearances, one of which may be ancestral, although the rapid diversification of Heliconius has made ancestral wing patterns difficult to reconstruct (Joron et al. 2006). A less common Heliconius appearance is the “tiger” coloration (orange and yellow stripes and blotches on a black ground; Turner 1971), which involves Müllerian mimicry with Ithomiinae butterflies. This coloration occurs in several species within one, probably monophyletic, group of Heliconius species (Beltrán et al. 2007), the so-called silvaniform group, and is probably a derived character. We hypothesize that the central orange and black of the “tiger” pattern is a candidate for a feature that predators use in prey categorization. There is considerable variation in the “tiger” coloration of different silvaniform butterflies, possibly a result of evolutionary fine-tuning toward more accurate mimicry of different Ithomiinae species. In general, mimicry evolution in Heliconius appears to be a complex process, possibly strongly influenced by introgression of wing-pattern genes through hybridization between Heliconius species (Gilbert 2003). Even so, the kind of two-step scenario we have analyzed might still apply to Heliconius if spatial and temporal variation in predator and model communities is taken into account.

In conclusion, we find several examples of evolutionary transitions between major color components, such as red and yellow, in insects involved in Müllerian mimicry. Because such transitions can be effected by single mutations and because predators generally attend to the color of aposematic prey, we suggest that overall color, or the color of specific pattern elements, may be an important feature that predators use to categorize prey as distasteful. More generally, we suggest that mimicry rings, in particular those containing phylogenetically distantly related organisms, can be characterized by the feature used by members of the ring to gain entry to the mimetic relationship.

Associate Editor: C. Jiggins


We thank Øistein Haugsten Holen and two anonymous referees for helpful comments. This work was supported by grants from the Swedish Research Council to G.G.-S., B.S.T. and O.L.