Quantifying the extent of morphological homoplasy: A phylogenetic analysis of 490 characters in Drosophila

Abstract Homoplasy is a fundamental phenomenon in evolutionary biology but an appraisal of its extent at the morphological level is still lacking. Here, we analyzed the evolution of 490 morphological characters conceptualized among 56 drosophilid species. We found that two thirds of morphological changes were homoplastic and that the level of homoplasy depended on the stage of development and the type of the organ, with the adult terminalia being the least homoplastic. In spite of its predominance at the character change level, homoplasy accounts for only ∼13% of between species similarities in pairwise comparisons. These results provide empirical insights on the limits of morphological changes and the frequency of recurrent evolution.


Impact Summary
Is morphological evolution limited, with life being constrained to a small area in the space of all possible forms? Or is it endless, as has been postulated by Darwin 150 years ago? The repeated origin of similar traits in phylogenetically distant lineages, known as homoplasy, challenges Darwin's view but we still lack empirical appreciation of the extent of this phenomenon. Analysis of the evolution of a large set of morphological traits in different developmental stages among 56 fly species revealed that two thirds of morphological changes were homoplastic. Homoplasy was also more frequent in juvenile stages than in adults who showed the highest morphological diversity. Although these findings support the prevalence of homoplasy, they also show that opportunities for the origin of new forms are still higher than it has recently been suggested.
"Although new and important modifications may not arise from reversion and analogous variation, such modifications will add to the beautiful and harmonious diversity of nature." (Darwin 1859) Homoplasy, that is, the independent origin of similar character states between distant taxa, is widespread in the living world (Wake et al. 2011), but an appraisal of its extent and underlying mechanisms remains lacking. Some authors argue that homoplasy is so ubiquitous that it is evidence for the limitation and predictability of evolution (Conway Morris 2008;McGhee 2011;Blount et al. 2018). Others, however, caution against this view due to possible experimental and character selection biases (Powell and Mariscal 2015;Stayton 2015). Indeed, whereas long lists of homoplastic examples have been compiled (e.g., Martin and Orgogozo 2013), equivalent lists of nonhomoplastic, that is, apomorphic, states have rarely been made. An account of the diversity of as many character states as possible in clades with well-defined phylogenies is therefore strongly needed.
Characters are qualities attributed to delimited structures. For molecular data, the structures are nucleotides or amino acid residues at a well-defined spatial position in a sequence, whereas the qualities are the biochemical compositions of these nucleotides and amino acids. Categorical values with no intermediates could be attributed to these qualities, that is, 4 for nucleotides and 20 for amino acids. For morphological traits, on the other hand, such character conceptualization and categorical coding are difficult (Vogt et al. 2010). In addition, morphological traits are usually stored in the context of phylogenetic analyses wherein autapomorphic states, that is, unique, novel states existing in only a single taxon or species, are often intentionally omitted when parsimony, the predominant approach, is used for tree reconstruction (Bryant 1995;Lewis 2001). This omission could lead to strong biases in estimating homoplasy since homoplasy is timeindependent, whereas synapomorphies, that is, shared, commonly inherited derived states, reflect a particular case of evolutionary stasis wherein a state is maintained with minimal changes for long periods of time.
Here, we address the question of morphological homoplasy in Drosophila. One hundred years of genetics and developmental biology research have made this fly one of the best-understood animals at the morphological level. Annotated genome sequences for multiple species are available and links between genetic mutations and morphological aberrations are curated in accessible online databases such as Flybase (www.flybase.org). Besides the availability of unique genetic toolkits, these resources have made the genus ideal for studies aiming to trace morphological homoplasy between species to their molecular underpinnings (e.g., Wittkopp et al. 2003b;Prud'homme et al. 2006;Kagesawa et al. 2008;Tanaka et al. 2009;Frankel et al. 2012;Signor et al. 2016;Yassin et al. 2016a, b). From two major taxonomic references, we conceptualized 490 morphological characters among 56 drosophilid species. By analyzing the evolution of these traits on a molecularly inferred phylogeny, we were able to quantify the extent of morphological homoplasy in this important clade.

TAXON SAMPLING
We selected 56 drosophilid species for which molecular sequences were available in GenBank and full morphological descriptions could be obtained from two major taxonomic books, Okada's (1968) Systematic Study of the Early Stages of Drosophilidae and Bächli et al.'s (2004) The Drosophilidae (Diptera) of Fennoscandia and Denmark (Table S1). These books represent two of the most explicit standardized and illustrated descriptions for drosophilids juveniles and adults. We selected species from the main drosophilid clades, that is, the subfamily Steganinae with its two tribes the Steganini and the Gitonini and the subfamily Drosophilinae with its two tribes the Colocasiomyini and the Drosophilini, following Yassin's (2013) classification scheme (see O'Grady and DeSalle 2018 for the current status of drosophilids phylogenetics). The 56 species belonged to 15 genera, including four Drosophila subgenera (namely Sophophora, Dorsilopha, Drosophila, and Siphlodora). More than one member represented some species groups, e.g., the melanogaster, obscura, and quinaria groups, hence encompassing shallow and profound phylogenetic depths. Some species were treated as "name holder," having composite DNA sequences or grouping juvenile data from Okada (1968) and adult data from Bächli et al. (2004), such as Leucophenga sp, from knowledge or presumption of monophyletic relationships (Table S1; Fig. 1). For morphological data, only 21 species had data from both taxonomic monographs (Table S1; Fig. 1).

MOLECULAR PHYLOGENETIC TREE
In order to define the phylogenetic relationships of species that were not included in Yassin's (2013) family-wide phylogenetic revision, we obtained molecular sequences from GenBank for the 56 taxa for one mitochondrial (COII) and four nuclear genes (28S rRNA, Adh, Amyrel, and Gpdh; Table S1). For each gene, DNA sequences were aligned using the Muscle program (Edgar 2004) with default parameters as implemented by the MEGA7 software package (Kumar et al. 2016). The alignments were then concatenated in a single nexus file (Supporting Information S1). MEGA7 was also used to infer the best DNA substitution model for each gene. For the five genes, the GTR+G+I model had the lowest Akaike Information Criterion (AIC) value and thus was chosen for phylogenetic inference.
Phylogenetic analysis was conducted using MrBayes ver. 3.2. (Ronquist et al. 2012). We used Yassin's (2013) phylogenetic classification as a topological constraint prior given its larger taxonomic and gene sampling. Because the GTR+G+I model was suggested for all genes, we tested unpartitioned model versus partitioned model strategies using stepping-stone sampling (Xie et al. 2011) implemented in MrBayes. For each strategy, we conducted two simultaneous runs of 1,000,000 generations with sampling every 100 generations, and considered a value of average SD of ࣘ0.01 an appropriate indicator for convergence between the two runs. Unpartitioned model had a higher marginal likelihood value than partitioned model (-42,712 vs. -42,804). We also used stepping-stone sampling to test for various clock models using the same run conditions of the unpartitioned data. Relaxed clock models had better marginal likelihood values (TK02 = -42,657 and IGR = -42,658) than the strict (-42,684) and no clock (-42,712). Consequently, we chose the unpartitioned, relaxed clock under the TK02 model. For the trees to be used in subsequent homoplasy analyses in this paper, we performed two simultaneous runs of 2,000,000 generations with sampling every 100 generations for the unpartitioned matrix under the TK02 relaxed clock model (Supporting Information S1). However, we stopped the runs after 1,004,000 generations since an average SD of 0.006 was attained. We then estimated the consensus tree after a burning period of 25% of the 20,086 sampled trees (Fig. 1). In subsequent analyses, we rerooted the consensus and sampled trees by considering the subfamily Steganinae a sister outgroup to the remaining species following the general consensus in drosophilid systematics (Yassin 2013;O'Grady and DeSalle 2018).

CONCEPTUALIZATION
The taxonomic description of specimens consists of a laconic style wherein the name of an anatomical structure is followed by a quality. The same quality (e.g., color) could be attributed to multiple structures (e.g., head, legs, etc.). The same structure (e.g., the aedeagus) could hold multiple qualities (e.g., size, shape, texture, etc.). The same structure bearing the same quality but at different developmental stage (e.g., the number of teeth of the mouth hook of the cephalopharyngeal skeleton in the three larval instars) is conceptualized as a separate character for each stage. Subtle differences in the same quality could lead to the conceptualization of multiple characters. For example, "pleura yellowish" or "pleura with three dark stripes" stem from the same quality, that is, color, but each describes a different character, that is, pigmentation and color pattern, respectively.

MORPHOLOGICAL CHARACTER STATE CODING
Coding refers to how different values of the same quality of a character in each species could be categorized so that diversity between species and state transformation during evolution could be inferred. We opted for discrete coding, because of its long tradition in the phylogenetic literature and its ability to summarize different types of descriptions such as binary, verbal, and numerical (Supporting Information S2 and S3). Below is how we proceeded for coding the different types of traits: (1) Numerical descriptions: Numerical values such as lengths, widths, counts (e.g., bristles), and indices were directly obtained for each taxon. When the range over multiple specimens was recorded, we used the average between the extremes as the summarizing value. We then used the NBClust package in R (Charrad et al. 2014) to estimate the optimal number of clusters for each character using Euclidean distance and Ward's method. Values belonging to the same cluster were then attributed the same code in the data matrix. Figure 2A-D shows an example for the coding of a single numerical character (wing length).
(2) Verbal descriptions: Qualities such as "large," "fusiform," "concave," "divergent," and so on were directly coded when only few distinct values were given (e.g., "concave" vs. "convex"). When illustrations were present, we used ImageJ package (Abramoff et al. 2004) to estimate lengths, angles, and areas for characters for which qualities were slightly ambiguous (e.g., "elliptical" vs. "ovoid," "subequal" vs. "slightly equal," etc.). The numerical values obtained from ImageJ were then analyzed using NBClust in R to define the number of optimal states. (3) Color descriptions: We found 60 verbal descriptions of colors in the two books such as "yellow," "slightly grayish," "brownish black," "reddish brown," and so on. We translated these descriptions into corresponding RGB values (Table S2, Fig.  2E). Since color perception is highly subjective, each of us produced an independent RGB table for the same descriptions. We then asked a colleague with an experience in Drosophila pigmentation to select for the most likely values in cases of disagreement. We conducted a Principal Component Analysis (PCA) on the RGB values, and projection of the two first axes identified a triangular distribution ( Fig. 2F) corresponding to the three main Drosophila pigments, that is, black and brown melanin, yellow NADA sclerotin, and white NBAD sclerotin (Wittkopp et al. 2003a). We analyzed each of these axes using NBClust and identified six states for colors that were used for all color characters. (4) Pattern descriptions: Patterns describe the spatial distribution of a quality within a structure. We encountered several pattern descriptions such as trichomes on the larval abdomen, tracheae of the anterior spiracles of the pupa, tubercules on the posterior spiracles of the puparia, pilosity on the genitalia, pigmentation of adult tergites, and so on. For larval trichomes, we followed Okada's (1968) categories. For pupal tracheae and tubercules, we followed a reductive coding, wherein each structure was reduced to its components and qualities of each component coded separately (e.g., tracheae were separated into basals, pseudobasals, centrals, pseudocentrals, and marginal). For pigmentation patterns on adult tergites, a trait that has intensively been studied in Drosophila evo-devo research (Massey and Wittkopp 2016), we divided each tergite into distinct spatial regions (Figs. S1-S7) and coded the colors as described above.
Missing data corresponded to either values that were not given for all taxa or those that are attributed to structures not present in all taxa. For example, steganine eggs lack filaments (Character 7). They were then attributed a missing data sign ('?') in all characters describing filaments number and shape in other species whose eggs have filaments (Characters 8-10).
The biological significance of our statistical delimitation remains to be clarified since the genetic basis of only a few morphological changes in drosophilids have yet been determined in evo-devo research (Table 1). From this literature, we found that our approach has led to "state lumping", that is, coding genetically different states into similar categories, for the number of hypandrial bristles (Character 394). This character was statistically coded as having three states with state 0 bearing 0 to 3 pairs. However, the transition from 1 pair to 0 bristles between a pair of Drosophila species has a strong genetic component (Nagy et al. 2018). Consequently, we used the biological information to recode 0 bristles as a fourth state for this character.

QUANTIFYING ENSEMBLE HOMOPLASY
Several comparative phylogenetic measurements and tests for homoplasy have recently been developed (Speed and Arbuckle 2017). However, for discrete data, traditional cladistics measurements such as the ensemble consistency (CI) and homoplasy (HI) indices (Kluge and Farris 1969) and the retention indices (RI, Farris 1989) remain the only developed statistics (Speed and Arbuckle 2017). We thus used PAUP * version 4.0a (Swofford 2002) to estimate these indices for the whole data set, using the Describe Tree command under both the ACCTRAN and DEL-TRAN models and all characters were unordered. To account for topological uncertainties, the indices averaged over the 20,086 sampled trees from the two simultaneous MrBayes runs (Supporting Information S3). Given the strong topological constraints we applied (see above), we averaged indices over all sampled trees and not only on those retained after burnin.

COUNTING HOMOPLASTIC STATES
We used the PAUP * to produce the list of character changes (DescribeTrees/chgList) as well as to infer ancestral states (De-scribeTrees/internal) on the consensus Bayesian tree under the ACCTRAN and DELTRAN models. We then parsed the output file to count for each character state the number of times it has been derived and shared between taxa. This led to five possible categories: (1) nonhomoplastic root states, (2) homoplastic root states (i.e., root states that have been secondarily derived), (3)  (5) autapomorphic states (i.e., states that have been derived once and are specific to a single terminal taxon).

AND BODY ORGANIZATION
A character consistency index (c i ) of 1 indicates that all states are apomorphic, that is, they have been derived once. While a c i < 1 is a sign of homoplasy in the character, c i < 1 values cannot be compared between characters because the lower boundary of c i depends on the number of character states (e.g., being 0.5 for a binary character). The r i was proposed to correct for this limitation. A character retention index (r i ) value of 1 also indicates apomorphy, but unlike the c i , r i cannot be calculated for autapomorphic traits, that is, characters with derived states appearing in only one terminal taxon. Consequently, an r i of 1 indicates only synapomorphies, that is, commonly inherited derived states among a group of taxa. For each character, we estimated both c i and r i using the PAUP * DescribeTrees/diagnose command on the 20,086 sampled Bayesian trees. Averaged values for each character were then used to test for differences in the amount of homoplasy among developmental stages and body parts. For this, characters were classified according to six developmental stages: egg, larva 1, larva 2, larva 3, pupa, and adult, with the adult characters being further classified into five body parts: head, thorax, abdomen, male terminalia, and female terminalia. We also tested the effect of autapomorphies by conducting statistical tests on r i without autapomorphies or after attributing an r i value of 1 to autapomorphic characters (hereafter denoted r ic ). We conducted all statistical analyses and model testing using R (www.R-project.org).

MECHANISMS
Traditionally, homoplasy is thought to result from three major mechanisms: parallelism, convergence, and reversal. There are confusions on the distinction between parallelism and convergence, with the former meaning similar initial conditions of the homoplastic state between the compared taxa and the latter referring to distinct origins (reviewed in Pearce 2012 andStayton 2015). Some authors consider the "origin" in developmental terms, for example, changes in similar or different genes for parallelism or convergence, respectively. Others consider the "origin" as the ancestral placement on the morphospace regardless to the underlying developmental mechanism as proposed by Jablonski (cited in Pearce 2012). We follow here Jablonski's definition for its operationality in phylogenetic analyses.

Figure 3. Different categories of trait evolution on the morphospace between pairs of species according to the degree of resemblance between states of the ancestors and the descendants.
We distinguished six possible changes for character spaces in a pair of species depending on the degree of sameness, similarity, and difference of the ancestral and present states in the two species (Fig. 3). We then used PAUP * to infer under the ACCTRAN and DELTRAN models the ancestral states at each internal node. Using a customized perl script we obtained for each taxon for each trait its current state, its original state, and the internal node at which the state changed (Supporting Information S4). For each trait, we conducted all possible pairwise comparisons between species with no missing data and estimated the frequency of each change category (Supporting Information S4).
Unlike parallelism and convergence, reversal does not imply comparisons between taxa, but a comparison between the state present in a taxon with the states present in its ancestors, and consequently this may lead to any of the possible between-species mechanisms shown in Figure 3. Using a perl script (Supporting Information S5), we compared for each trait the current state of each taxon with that of its ancestors and counted incidences were a reversal (e.g., 0 → 1 → 0) was found. Because the path of each taxon to the root had to be manually entered in the script from PAUP * node labeled cladogram, we conducted these analyses on a single tree, that is, the consensus Bayesian phylogeny.

DIFFER BETWEEN DROSOPHILID SPECIES
Thorough reading of the descriptions of the 56 species led to the conceptualization of 490 characters from drosophilids eggs (N = 12), larvae1 (N = 13), larvae 2 (N = 23), larvae 3 (N = 26), pupae (N = 46), and adults (N = 370). Adult characters were also classified into traits on the head (N = 78), thorax (N = 90), abdomen (N = 40), male terminalia (N = 150), and female terminalia (N = 12). The total number of characters states was 1479, ranging from 1 to 8 with an average of three states per character. Three characters were invariable having a single state, that is, egg color (always white or greyish white), the number of ventral caudal tubercles in the pupa (always 2), and the presence of an apical bristle on the mesotibia (always present).

HOMOPLASTIC
Under both the ACCTRAN and DELTRAN models, consistency index (CI) for the whole data set averaged over all 20,086 trees sampled from the two runs of MrBayes was 0.33 (±4.37 × 10 −6 ), with its complement, the data homoplasy index (HI), being 0.67 (±4.37 × 10 −6 ; Fig. 4A). This means that for a character change in Drosophila there is nearly as much as twice a chance that the character will take a recurrent or a preexistent state than a new one. Interestingly, average HI value after 10,000,000 random permutations of taxa over the tree using PAUP * was 0.75, with the empirical value of 0.67 falling outside the random distribution (Fig. 4A). Although this result supports the predominance of morphological homoplasy, it still shows that the potential space for novelties in drosophilids morphology is considerably large. model, 220 of the 490 root states have been secondarily derived. Among the 989 derived states, 429 were derived once. Of these, 137 states (ß32%) are shared between at least two taxa, that is, synapomorphic states (Fig. 4B). Under the DELTRAN model, 122 root states were secondarily derived, and 113 of 405 apomorphic states were synapomorphic (ß28%). That means that only one third of states occupying novel positions on the morphospace would remain unchanged for longer evolutionary times. This result cautions against homoplasy quantification from only phylogenetically informative characters, that is, after excluding characters with only autapomorphic derived states, as it is customary in cladistic analyses (Bryant 1995).

DEVELOPMENTAL CHANGES IN HOMOPLASY SUPPORT A MORPHOLOGICAL 'HOURGLASS' MODEL
Homoplasy, as measured by the character retention index with characters with only autopamorphic states attributed an r i value of 1 (here r ic ), significantly differed among developmental stages (Kruskall-Wallis chi-squared = 34.013, df = 5, P = 2.37 × 10 −6 ; Table S3; Fig. 4C). The difference formed a concave pattern (polynomial regression: r ic = 0.755 -0.245 D + 0.037 D 2 , P = 0.003, with D referring to the developmental stage as egg, larva 1, larva 2, larva 3, pupa, and adult with a unit increment). This quadratic regression had a lower Akaike Information Criterion (AIC) than a linear regression (352.45 vs. 363.20, respectively). The concave pattern, which supports the "developmental hourglass model" (see Discussion below), did not differ when autapomorphic characters were excluded (r i = 0.588 -0.201 D + 0.030 D 2 , P = 0.006). No differences were found between the ACCTRAN and DELTRAN models.

HOMOPLASTIC CHARACTERS
Homoplasy also differed between adult organs (Kruskall-Wallis chi-squared = 41.935, df = 4, P = 1.72 × 10 −8 ), with male and female terminalia characters being the least homoplastic compared to the three somatic organs, that is, the head, the thorax, and the abdomen (Wilcox test: W = 10986, P = 7.39 × 10 −9 ; Table S3; Fig. 4D). This difference did not persist when autapomorphic characters were excluded (Wilcox test: W = 7119, P = 0.18). Although on average, male terminalia had higher r ic than female organs (0.73 vs. 0.61, respectively), the difference was not significant (Wilcox test: W = 1107.5, P = 0.158). Exclusion of autapomorphic characters did not lead to significant difference either (Wilcox test: W = 349, P = 0.32). This lack of strong divergence between the sexes in terminalia traits supports the hypothesis of genital coevolution for which several lines of evidence have recently been demonstrated in Drosophila (Kamimura 2007;Yassin and Orgogozo 2013) and other animals (Yassin 2016). No differences were found between the ACCTRAN and DELTRAN models.

13% OF PAIRWISE SPECIFIC SIMILARITIES
Proportions of instances belonging to each of the six possible cases of character changes significantly differed (Table S3, Fig. 5A and B). For similarity between pairs of species, the most frequent case was that of phylogenetic inertia (ACCTRAN = 53.77%, DELTRAN = 54.20%), that is, the two species have a state that derived from the state present in their last common ancestor. Under the ACCTRAN model, this figure was followed by both convergence (4.22%) and parallelism (3.61%), which together indicates that homoplasy does not exceed 13% of cases where two species have similar character states. Under DEL-TRAN, parallelism (4.22%) exceeded convergence (1.99%), with sum homoplasy counting for 10.3% of species comparisons with similar states. For differences between species, the most frequent case was that of Disparity 2 (ACCTRAN = 34.86%, DELTRAN = 34.72%), wherein different states arose from different ancestral states. This was followed by Disparity 1 wherein the ancestral states were similar but of different origins (ACCTRAN = 3.54%, DELTRAN = 4.56%). No (0.00%) and 94 cases (0.30%) of divergence were found in pairwise comparisons where different states diverge from the same different ancestral state under the ACCTRAN and DELTRAN models, respectively.

DESCENDANT-ANCESTRAL COMPARISONS
Two hundred sixty-one (53%) and 363 (74%) characters showed no single occurrence of reversals under the ACCTRAN and DEL-TRAN reconstructions, respectively. On average, reversals were found in 6.65% and 2.63% of all comparisons between terminal taxa and their ancestors according to the two models. The frequency of reversals correlated with both parallelism and convergence, although its effect on convergence was higher under the ACCTRAN model (R 2 = 0.62 and 0.29 for convergence and parallelism, respectively). Similar correlations were also found under the DELTRAN model (R 2 = 0.65 and 0.11 for convergence and parallelism, respectively). These results suggest that high incidences of convergence and parallelism could be mostly due to recurrent reversals (Table S3, Fig. 5C and D).

HOMOPLASY?
Our analyses on 490 morphological traits indicate that nearly two thirds of morphological changes are homoplastic. Our ensemble consistency index (CI) estimate of 0.33 approaches that obtained in other cladistics analyses with >300 morphological characters, for example, in Diptera (CI = 0.40, Lambkin et al. 2013), Birds (0.24, Livezey andZusi 2007), Birds and related Dinosaurs (0.27, Godefroit et al. 2013), Squamates (0.16, Conrad 20080.21, Gauthier et al. 2012), andPlacental Mammals (0.20, O'Leary et al. 2013). Our study differs in its narrower taxonomic scale (i.e., all species belonging to the same family) and its quantification of morphological homoplasy on an independently inferred molecular tree. Zou and Zhang (2016) have recently compared O'Leary et al.'s (2013) ß3,500 morphological traits with a molecular phylogeny and estimated a morphological ensemble CI value of ß0. 25. Testing whether the potential of morphological innovations is larger in Dipterans (or Insects) than in Vertebrates would require further developments and refinements of the so-called "giant taxon-character matrices" (Simões et al. 2017;Laing et al. 2018).

THERE?
In order to draw general conclusions about the extent of homoplasy, one should be aware about how representative is the studied sample of characters. Information on the total number of morphological traits is lacking, not only because their study often requires profound learning and expertise but also because of the lack of a standard conceptualization approach. To the best of our knowledge, our study represents the largest morphological dataset to be analyzed for the prevalence of homoplasy in an insect. Various Drosophila evolutionary developmental studies have investigated some of the traits analyzed here (Table 1). Morphological phylogenetic studies have analyzed 18 (Throckmorton 1962 as reanalyzed by Grimaldi 1990), 217 (Grimaldi 1990), 68 (Hu and Toda 2001), and 37 traits (Yassin 2013). However, it remains difficult to recognize the degree of overlap between these studies because different authors often used different concepts for the same characters and/or different coding approaches. Vogt et al. (2010) highlighted the need for standardizing morphological terms among disciplines with a morphological character being defined as a quality attributed to a delimited anatomical structure The 490 characters analyzed here corresponded to 140 anatomical structures and nearly seven qualities (size, shape, count including presence/absence, color, pattern, texture, and hardness). Our reliance on the taxonomic literature (with its emphasis on variable traits) and our taxonomic sampling and scope have definitively overlooked counting multiple invariable characters (e.g., the presence/absence of the eyes and the legs, the number of abdominal tergites, etc.). Invariable characters are also omitted in cladistics analyses using parsimony in spite of their relevance in rate estimate in probabilistic phylogenetics (Lewis 2001). On Flybase (as of July 2018), ß10,000 anatomical terms have been described in D. melanogaster, half of which have no descendant term and hence may form the basis for a standard anatomical description. Assuming that those seven qualities are measured for each of the 5,000 structures, the minimal number of morphological characters may be 35,000. This figure largely exceeds (nearly 70 times) our current analysis and strongly reflects the paucity of our understanding of the evolution of morphological structures.

MORPHOLOGICAL HOMOPLASY?
Our morphological analyses uncovered two interesting genomic evolutionary trends, namely the developmental "hourglass" model and the rapid evolution of sex-specific traits. The "hourglass model" postulates that intermediate stages of development are evolutionarily less variable than early and later stages (Irie and Kuratani 2014). In the context of our analysis, low variability correlates with high homoplasy, that is, there are higher chances to find similar states among taxa at larval stages than at the egg, pupal or adult stages. The hourglass model was initially proposed based on morphological data in Vertebrates, but its support in insects mainly came from genomic and transcriptomic analyses in Drosophila (Cruickshank and Wade 2008;Kalinka et al. 2010;Yassin et al. 2010) and Diptera (Jiménez-Guri et al. 2013;Schep and Adryan 2013). Our results support the prevalence of the model at the morphological level in an insect but it is important to note that we did not study embryos, since the egg analyzed here is the female gamete with most features such as size, shape, and chorionic specializations (e.g., dorsal appendages, ridges, and sculpture) being determined during oogenesis. Moreover, characters were unevenly distribution on the phylogeny (Fig. 1). Investigating these missing data in future comparative studies should improve our tests of different developmental patterns. Song and Bucheli (2010) found no significant difference in the phylogenetic signal between genital and non-genital traits in a meta-analysis of 41 studies in insects wherein autapomorphic characters were excluded. Similarly, we did not find a difference in the amount of homoplasy between adult genital and non-genital characters when autapomorphies were also excluded. Nonetheless, including autaopomorphies revealed a sharp difference. Rapidly evolving characters should be richer in autapomorphic states, because such states have either recently appeared or rapidly changed in other lineages. Our finding hence is consistent with previous studies using fewer morphological characters in Drosophila (Civetta and Singh 1998) and with recent genomewide analyses revealing faster evolution of traits and genes with sex-limited expression (Haerty et al. 2007;Parsch and Ellegren 2013), most likely due to divergent sexual selection that could ultimately lead to fewer homoplasy.
Ever since the early days of Drosophila genetics, a major question has arisen on the relationship between the genetic mutations discovered in D. melanogaster and the morphological differences between wild species (Sturtevant 1921). Patterson and Stone (1952) argued that resemblance between morphological differences and D. melanogaster mutants might involve homologous or quite different loci, which seems to be the case as unraveled by recent evo-devo studies (e.g., Frankel et al. 2012Frankel et al. , 2016bYassin et al. 2016a;Yassin et al. 2016b). On the online database Flybase, the effect of genetic mutations on different anatomical structures is curated, and for each gene in D. melanogaster links to its orthologous genes (when present) in other Drosophila species are also provided. However, data on the evolutionary differences in these structures are still scattered in the taxonomic literature for the nearly 4,500 drosophilid species. The matrix provided here represents a first attempt to curate these disperse data and future research should complement missing data, adding more taxa and enriching Flybase resources with structures evolution. This endeavor would significantly promote the identification of the genetic underpinnings of morphological changes leading to better character state delimitation. Given the prevalence of the "deep homology" of developmental pathways in animals, such knowledge would ultimately lead to a better understanding of the molecular basis of morphological homoplasy in Drosophila and beyond. AUTHOR CONTRIBUTIONS S. A. S. conceptualized and coded the morphological characters and performed the phylogenetic analyses. A. Y. conceived the project, designed the study, supervised the analyses and wrote the paper with input from first author.

ACKNOWLEDGMENTS
The authors thank two anonymous reviewers for their insightful comments on a previous draft of this manuscript and Héloïse Bastide (Univ. Paris Saclay) for help in RGB color coding. This work was funded by an ATM and a Richard Lounsbery Foundation grants to A.Y. The authors declare no conflict of interests.

DATA ARCHIVING
Data associated with this article are all provided as supporting files.

LITERATURE CITED
Associate Editor: A. Goswami

Supporting Information
Additional supporting information may be found online in the Supporting Information section at the end of the article. Table S1. Phylogenetic classification, data origin and GENBANK accession numbers for taxa sampled in the present study. Table S2. RGB coding and NBClust analysis for the 60 colors described in the taxonomic literature. Table S3. Character evolution statistics (ci=consistency index, ri=retention index, rc=rescaled consistency index, hi = homoplasy index) and proportions of different evolutionary mechanisms for the 490 morphological characters analyzed in the present study. Supporting Information Supporting Information Supporting Information Supporting Information Supporting Information