A recent molecular phylogenetic study of the cosmopolitan nymphalid butterfly genus Vanessa (Wahlberg & Rubinoff, 2011) exemplifies some potential pitfalls of algorithm-based inferences drawn from temporal and biogeographical patterns of taxonomic divergence. There has lately been a trend towards inclusion of model-based molecular clock and ancestral area estimates in phylogenetic analyses of lepidopteran taxa (e.g. Müller & Beheregaray, 2010; Price et al., 2011), and we are concerned about the potential for naive acceptance of results from such computer-based approaches. We offer the following comments in the hope of encouraging our esteemed colleagues to consider their data as well as their models.
The program DIVA (Ronquist, 1997) has gained popularity as a means to infer hypothetical ancestral geographical distributions of clades. This software appears to generate ‘conservative’ optimizations of internal nodes, in the sense that no area represented in a subtending clade is excluded as a potential ancestral area, at least in the ‘unconstrained’ mode. This is not necessarily the most enlightening interpretation of the available data (although it does endow the user with plenty of leeway for spinning favoured narrative scenarios).
Wahlberg & Rubinoff (2011) offer an example. The genus Vanessa comprises 22 species, one of which, V. tameamea, is a Hawaiian endemic (one of two such butterfly species; the other is a lycaenid). In Wahlberg & Rubinoff (2011: fig. 2), Hawaii is indicated as a potential ancestral area for the entire genus Vanessa. If V. tameamea, the only Vanessa known from Hawaii, were the sister taxon to all remaining Vanessa species, or if the sister genus of Vanessa also occurred on Hawaii, this might be correct. However, given that neither of these propositions is true and the distribution of V. tameamea is autapomorphic, the parsimonious interpretation is that V. tameamea occurs on this remote oceanic archipelago due to a colonization event. Based on morphological data, Leestmans (1978) suggested that V. tameamea is the sister group of V. atalanta, and this was also the conclusion of Vane-Wright & Hughes (2007) based on a review of available morphological and molecular evidence – a position corroborated by Wahlberg & Rubinoff (2011: fig. 2) from molecular data alone. As V. atalanta now occurs throughout the western Palearctic, North Africa and North America south to Guatemala, the naive or ‘common sense’ suggestion would be colonization from North or Central America. Wahlberg & Rubinoff (2011) almost concur in their discussion: ‘North America or the Palaearctic may have been the source for V. tameamea’ (p. 366). If so, why do they accept an algorithm that allows ‘parsimonious’ but illogical optimizations, right down to the root?
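The parsimony argument above can be illustrated with a minimal sketch. It uses a plain Fitch down-pass on a single 'area' character, which is not the algorithm implemented by Ronquist (1997), and it runs on a toy tree. The tip names `other_A` and `other_B`, the single-state area codes (including 'Holarctic' as a simplification of the range of V. atalanta) and the reduced topology are assumptions made for illustration only; they are not the tree or the coding of Wahlberg & Rubinoff (2011).

```python
# Toy Fitch parsimony (down-pass only) for one "area" character.
# Topology and area codes are simplified, hypothetical stand-ins.

def fitch(node, areas):
    """Return (state_set, changes) for the subtree rooted at `node`.

    A node is either a tip name (str) or a 2-tuple of child nodes.
    """
    if isinstance(node, str):
        return {areas[node]}, 0
    left, right = node
    left_states, left_changes = fitch(left, areas)
    right_states, right_changes = fitch(right, areas)
    shared = left_states & right_states
    if shared:
        # Children agree on at least one area: no extra change needed.
        return shared, left_changes + right_changes
    # Children disagree: keep both options and count one change.
    return left_states | right_states, left_changes + right_changes + 1

# Hypothetical area coding, one area per tip.
areas = {
    "tameamea": "Hawaii",
    "atalanta": "Holarctic",
    "other_A": "Holarctic",
    "other_B": "Holarctic",
    "sister_genus": "Neotropics",
}

# ((tameamea, atalanta), (other_A, other_B)) is the toy ingroup.
ingroup = (("tameamea", "atalanta"), ("other_A", "other_B"))
tree = (ingroup, "sister_genus")

ingroup_states, ingroup_changes = fitch(ingroup, areas)
root_states, root_changes = fitch(tree, areas)

print("ingroup root:", sorted(ingroup_states), "changes:", ingroup_changes)
print("tree root:", sorted(root_states), "changes:", root_changes)
```

In this toy case the Hawaiian state is confined to a single nested tip, so it drops out of the candidate set at the ingroup root: one change on the branch leading to `tameamea`, that is, a colonization, accounts for it.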
Those familiar with ‘standard’ cladistic biogeography (cf. Humphries & Parenti, 1999) will be aware that the major difficulties of inferring biogeographical patterns arise not from inference of ancestral distributions of individual clades, but from the inference of general vicariant patterns from partially incongruent area cladograms of multiple taxa. Under those circumstances, dispersal is viewed as an undesirable ad hoc hypothesis because it may be invoked to save a preferred vicariant scenario. However, in Wahlberg & Rubinoff (2011) only the biogeographical history of Vanessa is at stake, and given the species' distributions and biology, dispersal is a necessary part of a parsimonious historical narrative.
Any biogeographical inference is only as good as the character coding of its terminals. In the case of a relatively young clade of butterflies with apparently widely varying dispersal ability, it seems particularly important to be precise about the distributions of individual species: V. vulcania does not occur in ‘the Palearctic’ [as coded by Wahlberg & Rubinoff (2011)], but is endemic to the Azores and Canary Islands. Vanessa terpsichore, V. braziliensis and V. altissima may all be ‘neotropical’, but they are also allopatric with respect to one another. Thus, lumping discrete areas may ignore relevant biogeographical history. At the other end of the spectrum, Mesozoic vicariant events, such as the separation of the Afrotropical and Neotropical regions, are probably not causal factors in the distribution of a taxon that diverged in the Oligocene. Although dispersal may be ‘less parsimonious’ than vicariance according to the algorithm, sometimes it is still the only sensible explanation of a given pattern of distribution. For inhabitants of oceanic islands, this would seem to be the case regardless of the age of the taxon.
The Bayesian statistical program BEAST (Drummond & Rambaut, 2007) is frequently used to infer clade ages across a tree from sequence data using a rate smoothing algorithm calibrated by some number of empirical date estimates. The ages of diversification of various butterfly lineages have been in question for a long time due to a paucity of fossils (Scott & Wright, 1990; Vane-Wright, 2004; de Jong, 2007), but have lately been addressed with model-based molecular clocks (Wahlberg, 2006; Wahlberg et al., 2009), using a combination of butterfly fossils and ages of larval food plant taxa as empirical constraints. Happily, the genus Vanessa appears to be represented by at least one fossil species, Vanessa †amerindica (from the Florissant Formation, late Eocene/early Oligocene, ∼34 mya; Miller & Brown, 1989; Grimaldi & Engel, 2005). Based on an imprint of its forewing pattern, V. †amerindica was hypothesized by Miller & Brown (1989) to be more closely related to V. indica than to other members of the genus, such as V. cardui. Assuming, as Wahlberg & Rubinoff (2011) have done, that the age and identity of this fossil have been inferred correctly [see de Jong (2007), who has cast strong doubt upon the precision of the identity of V. †amerindica], we can state that the split between the V. cardui (‘painted lady’) group and the V. atalanta (‘red admiral’) group within the genus Vanessa, and therefore the origin of the genus as a whole, took place at least 34 mya. Wahlberg (2006) used the age of this fossil to calibrate the split between Vanessa and its sister taxon, the Neotropical genus Hypanartia, resulting in an age estimate for the divergence among members of Vanessa at 27.0–31.3 mya, depending on the model employed. This date (28 ± 3 my) was used as the basis for the estimates of divergence ages within the genus in Wahlberg & Rubinoff (2011).
Logically, a fossil cannot be older than the genus it is placed in. To skirt this conundrum, Wahlberg & Rubinoff (2011) qualified the 6 my discrepancy between the dates by postulating that the younger one represents the age of divergence of extant Vanessa, implying that V. †amerindica is a sort of Archaeopteryx of its genus. However, this stem versus crown characterization is at odds with the original description of the fossil, as indicated above. In fact, as a hypothetical sister to V. indica, V. †amerindica would be more closely related to V. virginiensis than it is to V. annabella [the two Vanessa species sampled in Wahlberg (2006)], and unequivocally a member of the crown group. Wahlberg & Rubinoff (2011) presented no evidence to contradict that hypothesis, but instead chose to ignore data that were incompatible with the model-based molecular clock estimate from Wahlberg (2006). Thus, although it may originally have seemed a plausible, conservative assumption to use the age of the fossil as a minimum age for the split between Vanessa and Hypanartia, it appears that the calibration date was affixed to the wrong node in the original publication, and this error has been compounded in Wahlberg & Rubinoff (2011) with manifestly illogical results. Vane-Wright & Hughes (2007) did not challenge the inclusion of V. †amerindica within the ‘red admiral’ group of Vanessa but, if de Jong (2007) is correct, it is questionable if amerindica even belongs to the genus, rendering all of this discussion – and ANY hypothesized age of Vanessa depending upon the age of V. †amerindica – a moot point. Either way, the Wahlberg & Rubinoff (2011) age estimate for the origin of Vanessa of 28 my, and any other ages of divergence estimated based upon that datum as a calibration, are incorrect.
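The arithmetic behind this objection is simple enough to state as a consistency check. The sketch below uses only the figures quoted above (a fossil age of about 34 mya, the estimate range of 27.0–31.3 mya, and the 28 my figure), and it assumes, as the argument above does, that the fossil belongs to the crown group. The function name `violates_minimum` and the variable names are our own, chosen for illustration.

```python
# Ages in millions of years ago (mya), taken from the text above.
FOSSIL_MIN_AGE = 34.0            # approximate age of the Florissant fossil
CROWN_ESTIMATES = (27.0, 31.3)   # range of crown age estimates, Wahlberg (2006)
CROWN_AGE_USED = 28.0            # figure used in Wahlberg & Rubinoff (2011)

def violates_minimum(node_age, fossil_age):
    """A clade that contains a fossil cannot be younger than that fossil."""
    return node_age < fossil_age

# Every estimate in the reported range is younger than the fossil.
conflicts = [age for age in CROWN_ESTIMATES
             if violates_minimum(age, FOSSIL_MIN_AGE)]

# Size of the discrepancy discussed above.
gap = FOSSIL_MIN_AGE - CROWN_AGE_USED

print("estimates younger than the fossil:", conflicts)
print("discrepancy (my):", gap)
```

If the fossil were instead a stem-group member, the minimum-age constraint would apply to a deeper node and the check would need to be run against that node's age.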
The only direct link to an absolute time scale in molecular dating is the empirical ages of fossils (or other geological events) attached to particular nodes on the tree. As de Jong (2007) has documented in depressing detail, even these ‘facts’ often suffer from over-optimistic precision of taxonomic affinity. Despite these evidentiary limitations, the fossils still represent the ‘empirical base’. When those anchors are tossed overboard to accommodate contradictory results from an algorithm that uses them as its calibration points (we note that BEAST puts 95% confidence intervals even on its externally calibrated nodes), we see a problem – the model has become more ‘real’ than the data, and in the process ‘history’ becomes fiction (cf. Rieppel, 2007; Williams & Ebach, 2010).
We are grateful to Niklas Wahlberg and Dan Rubinoff for sharing editorial correspondence related to their publication with us, and look forward to fruitful collaborations with them in the future. We also thank Peter Cranston for providing an opportunity for us to express our Luddite opinions.