Analysis of the genetic loci of pigment pattern evolution in vertebrates

Vertebrate pigmentation patterns are amongst the best characterised model systems for studying the genetic basis of adaptive evolution. The wealth of available data on the genetic basis for pigmentation evolution allows for analysis of trends and quantitative testing of evolutionary hypotheses. We employed Gephebase, a database of genetic variants associated with natural and domesticated trait variation, to examine trends in how cis‐regulatory and coding mutations contribute to vertebrate pigmentation phenotypes, as well as factors that favour one mutation type over the other. We found that studies with lower ascertainment bias identified higher proportions of cis‐regulatory mutations, and that cis‐regulatory mutations were more common amongst animals harbouring a higher number of pigment cell classes. We classified pigmentation traits firstly according to their physiological basis and secondly according to whether they affect colour or pattern, and identified that carotenoid‐based pigmentation and variation in pattern boundaries are preferentially associated with cis‐regulatory change. We also classified genes according to their developmental, cellular, and molecular functions. We found a greater proportion of cis‐regulatory mutations in genes implicated in upstream developmental processes compared to those involved in downstream cellular functions, and that ligands were associated with a higher proportion of cis‐regulatory mutations than their respective receptors. Based on these trends, we discuss future directions for research in vertebrate pigmentation evolution.


I. INTRODUCTION
One of the central goals of evolutionary biology is to understand the genetic basis of organismal diversity. Determining which genes and mutations underlie adaptive traits is essential for understanding how evolutionary forces shape organismal variation (Barrett & Hoekstra, 2011). In this regard, vertebrate pigmentation is a powerful system for integrating historically disparate fields of evolutionary research, and ultimately identifying the genetic mechanisms underlying evolutionary change (Cuthill et al., 2017;Hubbard et al., 2010;Kronforst et al., 2012;Orteu & Jiggins, 2020).
Vertebrate pigmentation offers a diverse array of phenotypes, with intra-and interspecific variation ranging from whole-body colouration to highly localised pattern alterations (Fig. 1). The adaptive significance of pigment patterns can be attributed to multiple distinct selection pressures, including thermoregulation, camouflage, aposematism, sexual display, and ultraviolet protection (Protas & Patel, 2008). The rapid evolution of vertebrate pigmentation patterns, and their evolutionary significance, allows for both mapping of individual mutations to their resultant phenotypes, and inference of the evolutionary pressures driving the selection of those phenotypes. There have been many studies identifying loci associated with a wide range of pigmentation phenotypes in different vertebrate taxa. Analysis of these individual case studies may reveal broad trends underlying pigment pattern evolution and help to inform the future direction of vertebrate pigmentation research.
Here, we first outline the basic biology of vertebrate pigmentation, as well as the key differences between different vertebrate clades. We then utilise a data set of vertebrate pigmentation literature to analyse trends in the underlying genetics. For this purpose, we used Gephebase (https://www. gephebase.org/)a knowledge base dedicated to the compilation of literature on genes and mutations underlying natural and domesticated organismal variation in eukaryotes . Gephebase gathers published data on evolutionarily relevant mutations, with each entry representing a causal association between an allelic change at a given locus and trait variation between individuals or species. Each entry includes information relating to the species/population, the type of trait, the gene, the nature of the mutation(s), whether they represent null alleles, and the study that identified it. Using the vertebrate pigmentation entries in this database, we focused on the relative abundance of cis-regulatory and coding mutations contributing to vertebrate pigmentation evolution. Finally, we discuss the direction of future vertebrate pigmentation research and the role to be played by recent model systems and innovative approaches.
(1) Vertebrate pigmentation In vertebrates, most colour patterns derive from specialised pigment cells which produce either pigments or reflective structures. These cells arise from the migratory neural crest cell population, which emerges from the dorsal neural tube during early vertebrate development (Lapedriza, Petratou & Kelsh, 2014). Neural crest cells then delaminate and undergo some of the longest migrations of any embryonic cell type to showing differences in horizontal stripes and vertical bars (Kratochwil et al., 2018) Bedford). Differences in melanic coat colouration correlate with the colour of the background and evolved as a result of strong predation. Photographs adapted from Bedford & Hoekstra (2015). (E) Red and black head-colour polymorphism in the Gouldian finch (Erythrura gouldiae), morphs display differences in aggression and reproductive success (Toomey et al., 2018). Photograph courtesy of Ricardo Jorge Lopes and Miguel Carneiro. give rise to multiple derivatives such as neurons and glia of the peripheral nervous system, smooth muscle, craniofacial cartilage and bone, and pigment cells (Simões-Costa & Bronner, 2015).
Across vertebrate clades there is considerable diversity in the cell types producing colours. In fishes, amphibians, and non-avian reptiles there are multiple distinct classes of pigmented and structurally coloured cells, called chromatophores. These classes contain different combinations of pigments and/or reflective structures, and therefore exhibit different ranges of colours. Across these clades up to at least nine chromatophore classes are recognised. The most common chromatophore classes that utilise pigments are melanophores (brown/black melanin pigments) and xanthophores (yellow/ orange carotenoid and pteridine pigments). These cells produce pigment-based colour via the deposition of their respective pigment molecules, which selectively absorb specific wavelengths of light. By contrast, structural colouration results from the presence of light-interfering structures, such that structural colour is variable depending on the angle from which it is viewed. Iridophores are structural chromatophores, which appear silvery or blue due to the arrangement of layered purine platelets of variable size, shape, and arrangement (Parichy, 2021;Singh & Nüsslein-Volhard, 2015).
Rarer chromatophore classes include red erythrophores and blue cyanophores, as well as two distinct classes of white leucophores with different regulatory profiles, developmental origins, and chemical compositions (Goda & Fujii, 1995;Huang et al., 2021;Lewis et al., 2019). Each chromatophore class present in an organism goes through extensive cell movements and cell-cell interactions to form a final colour pattern. Variation in the abundance, combinations, and arrangements of chromatophores generates the intricate and diverse colour patterns seen in fishes, amphibians, and reptiles.
By contrast, mammals and birds have independently lost most pigment cell diversity and mainly retain only one pigmentary cell type, the melanocyte (equivalent to the melanophore) (Kelsh et al., 2009). Melanocyte differentiation and development remains highly conserved among vertebrates, but differences exist. Unlike melanophores, melanocytes produce two melanin pigment types in different shadesbrownish-black eumelanin, and reddish pheomelanin. The ability to switch melanogenesis between eumelanin and pheomelanin production is specific to birds and mammals (McNamara et al., 2021). Mammals and birds develop complex pigmentation patterns by temporally and spatially regulating melanogenesis switching, as opposed to using different chromatophore classes for different colours. Furthermore, while other vertebrates retain pigments within the respective chromatophore, in birds and mammals melanin is secreted from melanocytes into the skin or the integumentary appendages, such as feathers and hairs (McNamara et al., 2021). Birds additionally exhibit an array of yellow, orange and red colours due to the processing and accumulation of dietary carotenoid pigments (Toews, Hofmeister & Taylor, 2017). Finally, birds also display structural colouration arising from reflective nano-structures in both the skin and plumage (Saranathan & Finet, 2021).
(2) The role of cis-regulatory and coding mutations in evolution The extent to which evolution is predictable on the level of genetic variation is a long-standing topic in evolutionary biology. One important question concerns the relative prevalence of cis-regulatory and coding sequence mutations (Carroll, 2008;Hoekstra & Coyne, 2007;Martin & Orgogozo, 2013;Stern & Orgogozo, 2009, 2008. It has been hypothesised that cis-regulatory mutations are more likely to contribute to evolutionary changes in morphology (Carroll, 2008). One reason for this is that cis-regulatory mutations are expected to have fewer pleiotropic effects when compared to coding mutations, mostly due to their highly modular nature conferring tissue specificity, thus affecting gene expression patterns without changing protein function. Conversely, protein coding sequences are less modular, and non-synonymous mutations are expected to affect protein function across every cell and tissue in which it is expressed. However, for the past three decades, case studies have shown that both types of mutations can contribute to evolutionary change. Thus, trying to argue for the existence of a dichotomy in the relative frequency of these types of genetic change is too simplistic. Instead, we should strive to identify the contexts in which one type of mutation is favoured over the other (Stern & Orgogozo, 2008). For example, are different mutations associated with particular aspects of trait variation (e.g. colour versus pattern)? Do the cellular and developmental processes underlying a trait influence the nature of the mutations that can affect it? Do we see a shift in the role played by different types of mutation over evolutionary time?
Herein, we tackle these questions by examining the relative distribution of cis-regulatory and coding mutations across vertebrate pigmentation evolution, and how the relative frequencies of these mutation types associate with other factors, such as loci mapping strategy, type of trait variation or types of genes involved. Moreover, by analysing the types of mutation associated with the evolution of vertebrate pigmentation, we identify trends in the field and discuss possible directions for future research.

II. METHODS
(1) Literature curation in Gephebase Gephebase compiles pairs of alleles in association with pairs of phenotypic states (a genetic variation causing a phenotypic variation is called a 'gephe' for brevity). A full description of the database is provided in . Data currently cover all eukaryotes with relevant publications, with a focus on traits of evolutionary rather than Vertebrate pigmentation analysis clinical relevance. This includes variations that have been artificially selected by breeders ('Domesticated' data set), or subject to experimental evolution under laboratory-controlled selective pressures. Data are manually curated by a small team of researchers, using key words as well as Pubmed/Google Scholar citations to identify newly published studies. The current triage system gives priority to gephes identified by forward geneticsquantitative trait locus (QTL) mapping, and genome-wide association studies (GWAS) with single-gene resolution and reasonable supporting evidence for causality (if not for the variant, at least for the gene). Gephes identified by candidate gene approaches without mapping are also included when there is additional functional evidence for the causality of the mutation. Gephebase is up to date and gathers all relevant papers published until 2017. Since 2017, data curation efforts have mainly focused on colour variation in vertebrates (trait category = 'Coloration'). The download of Gephebase data on 15 August 2022 (see online supporting information, File S1, tab: All Vertebrates), which was used for the present study, represents a compilation of the genes and mutations contributing to vertebrate colouration up to August 2022.
(2) The vertebrate pigmentation data set To examine trends in colour pattern evolution, we utilised Gephebase as a resource for exploring genotype-phenotype relationships. We formed a working data set by selecting every Gephebase entry pertaining to the trait category 'Coloration', and further filtered those entries by removing all entries pertaining to non-vertebrate species (accessed 15.08.2022). In total we retrieved 389 entries, with each entry representing a mutation at a single locus. Each mutation was present in only one gene for the organism in which it was identified, and there were a total of 68 unique genes identified across 103 vertebrate species. For each entry, we examined five parameters defined by Gephebase: Gene ID, Taxonomic status, Study methodology, Aberration type, and Molecular type . We further defined five additional parameters detailed below: Clade, Pigment type, Phenotype category, Cellular function category, and Molecular function category (supporting information, File S1, tabs: All Vertebrates, GO Terms by Gene).
(3) Assignment of new parameters

(a) Clade
Each vertebrate represented in the data set belonged unambiguously to one of five clades with distinct pigmentation biology: amphibians, teleosts, squamates, birds and mammals (Table 1). We divided entries into these clades to compare trends in their different pigmentation systems.

(b) Pigment type
In total, five pigment types were represented in the data set: biliverdin, carotenoids, pteridines, melanins, and psittacofulvins. Additionally, four entries referred to structural colouration. This is a short list of pigments, but few studies identified the specific pigment molecules associated with a phenotype. For example, a phenotype may clearly implicate a change in the synthesis or distribution of melanin pigments, without distinguishing between pheomelanin and eumelanin. Hence, these two types of pigment are grouped in the melanin category. Finally, there were 13 entries in which a single genetic variant was associated with changes to multiple pigments. These entries were assigned to each relevant pigment type category.

(c) Phenotype category
We empirically assigned to each entry one of seven phenotype categories (Table 2) in order to group similar organismal colour loss/tuning phenotypes, and analysed trends in how they arise. Each entry was considered independently, and assigned the category that best described the organismal phenotype based on the original study. We accounted for both the visible phenotype in terms of pigment distribution and quantity, as well as the molecular mechanisms underlying the phenotype. For example, amelanism was distinguished from white spotting on the basis of whether the phenotype resulted from a loss of melanin synthesis, or impairment of melanocyte migration/differentiation.
A total of 54 entries could not be assigned a category unambiguously. These included phenotypes involving global developmental changes that indirectly affect pigment pattern, for instance, a cyp19a1 mutation that results in femalelike plumage patterns in male chickens due to ectopic Table 1. Number of entries in the vertebrate pigmentation Gephebase data set in 2022 (see supporting information, File S1, tab: All Vertebrates) according to vertebrate clade and taxonomic status. One entry in the Gephebase database corresponds to genetic variation at a given gene that has been found to contribute to pigmentation evolution in a given taxon. oestrogen synthesis in the skin (Li et al., 2019b). Likewise phenotypes involving pigment changes without a clear causative mechanism were not categorised. For instance, a blue eggshell phenotype in chickens is associated with a slco1b3 allele that is overexpressed in the shell gland, but the mechanism by which slco1b3 affects shell pigmentation remains unclear . Removing ambiguous cases left a total of 335 assignments, of which 306 were cis-regulatory or coding. Additionally, categories were assigned relative to the ancestral state, so that, for instance, eumelanism refers to a mutation leading to an increase in eumelanin pigmentation. Colour loss versus colour tuning Pheomelanism A phenotype exhibiting increased abundance of pheomelanin relative to eumelanin, through changes to melanogenic switching or production of either melanin type. This includes phenotypes in which eumelanin synthesis is absent or reduced such that pheomelanin is relatively more prevalent.

87
A missense mutation in mc1r is associated with the red phenotype in domesticated donkeys (Equus asinus; Abitbol et al., 2014).

Eumelanism
A phenotype exhibiting increased abundance of eumelanin relative to pheomelanin, through changes to melanogenic switching or production of either melanin type. This includes phenotypes in which pheomelanin synthesis is absent or reduced such that eumelanin is relatively more prevalent.

83
A loss-of-function SNP in asip is associated with 'black panther' phenotype melanistic leopards (Panthera pardus; .

Dilution
A phenotype in which both eumelanin and pheomelanin synthesis is reduced (but not lost) across one or several tissues.

22
A melanophilin splice-site SNP leads to exon skipping and coat colour dilution in domesticated rabbits (Oryctolagus cuniculus; Lehner et al., 2013). Amelanism A phenotype characterised by a loss of melanin production across one or several tissues. Colour shift versus pattern alteration Colour shift Any phenotype in which organismal colour shifts either locally or globally, without impacting the spatial boundaries of its existing patterning. Includes phenotypes in which an organism's existing pattern is overridden, e.g. whole-body eumelanism.

222
In Japanese quail (Coturnix japonica), a frameshift deletion in asip is associated with the recessive black phenotype, in which feather colour is darkened across the whole body without altering pattern boundaries (Hiragaki et al., 2008).

Pattern alteration
Any phenotype in which the spatial boundaries of an organism's colour pattern change, including the formation of new boundaries e.g. establishment of a pattern in a previously uniform region. Does not include phenotypes in which an existing pattern changes colour without change to the pattern boundary.

129
The convergent evolution of horizontal melanic stripes in different cichlid species is associated with regulatory changes in agouti-related peptide 2, and expression levels are predictive of stripe presence across multiple cichlid radiations (Kratochwil et al., 2018).

Vertebrate pigmentation analysis
We additionally assigned each category to either colour shift or pattern alteration (Table 2), on the basis of whether the phenotype affected colour or spatial pattern boundaries. Each entry was considered independently, and these assignments were considered separately from the previous phenotype categories. Phenotypes involving a loss of patterning (for example whole-body albinism resulting in no visible pattern boundaries) were considered colour shifts.
A total of 38 entries could not be assigned a category unambiguously. These included phenotypes for which the distinction between colour and pattern changes was unclear, particularly in the case of domesticated breeds where the 'ancestral' pigmentation pattern is poorly defined. For instance, an asip mutation associated with a uniform coat colour in dogs is found in breeds that otherwise exhibit various coat phenotypes both patterned and uniform, making comparison to an ancestral state unclear. Removing ambiguous cases left a total of 351 assignments, of which 317 were cis-regulatory or coding.

(d) Cellular function category
In order to analyse genes associated with different developmental and cellular roles, we assigned each gene in the data set a functional category, reflecting its role in the development of the phenotype (Table 3). Unlike phenotype categories, these assignments were based on the gene rather than on individual entries. Entries were assigned empirically, based on a literature review of the genes identified by each study, and the phenotypic function(s) they have been implicated in. As with phenotype categories, each gene was assigned with high certainty and little ambiguity. A total of four genes, comprising seven entries, were removed from this analysis, as their cellular functions specifically with regard to pigmentation biology were unclear. These were cyp19a1, spint1, eif2s2, and goldentouch, an undescribed gene. These genes had one entry each.

(e) Molecular function category
We assigned protein molecular function categories nonempirically by using the gene ontology (GO) terms associated with each gene in the data set. We employed the EBI QuickGO Mouse Slimmer in order to identify parent GO terms associated with the sets of child terms associated with each gene. This slimmer was selected as being the most representative of the distribution of GO terms within vertebrate clades. We then narrowed the set of GO terms to include only those associated with the 'Molecular function' GO tree (see supporting information, File S1, tab: GO Assignments). We additionally manually combined a number of closely related GO terms in order to reduce the number of potential Table 3. Summary of cellular functional categories: assignment criteria, and the total number of entries in the data set to which that category was assigned. Each gene was assigned to only one functional category. categories (see Table 4; supporting information, File S1, tab: Molecular Function Summary). In total there were 89 unique assignments of parent GO categories across the data set. All bar two of the 68 unique genes in the data set generated at least one assignment. The only two exceptions were goldentouch, a previously undescribed gene identified in Midas cichlids with no known molecular function, and mlana, a gene implicated in melanosome biogenesis. Only one molecular function assignment (GO:0005515 protein binding) is assigned to mlana, which is not included in the QuickGO Mouse Slimmer. After combining closely related terms, there were 10 parent GO terms assigned. Then, we removed all categories with fewer than five unique gene assignments, of which there were three: RNA binding, enzyme regulator activity, and lipid binding. Thus we ended up with seven molecular function categories for comparison. The DNA binding GO category was the result of combining two GO terms: DNA binding and transcription factor activity. Although there are distinct biological differences between a gene displaying DNA binding activity and one acting as a transcription factor, in the Gephebase data set there was nearly complete overlap between these terms. Only one gene was tagged as DNA binding without also being tagged for transcription factor activity: epidermal growth factor receptor egfr. All other genes were tagged either with both GO terms, or with neither.

(4) Genetic hotspots
We investigated the overlap of genes between pigmentation systems by examining the number of entries corresponding to shared genes. For each clade (excluding amphibians due to lack of data), we listed all the genes identified in that clade, and then calculated the total number of entries corresponding to those same genes in each of the other three clades. We normalised each of these figures by dividing by the total number of entries in that clade, in order to account for the disparity in entries between clades.

(5) Statistical analysis
For each analysis the cis-regulatory proportion was fitted as a binomial logistic regression model and post hoc pairwise comparisons were performed with Tukey correction using the glht function in the multcomp package for R. Direct comparisons between categories or groups of categories were performed using Fisher's exact test, again at P < 0.05.

(6) Correction for overrepresented genes
For each analysis we compared the total numbers of cisregulatory and coding mutations in each category. However, some particularly well-characterised genes contribute a Table 4. The 10 molecular function (parent gene ontology, GO) categories assigned to data set entries, the total number of genes to which each GO category was assigned and corresponding total number of entries, the number of cis-regulatory entries to which each was assigned and corresponding cis-regulatory proportion, and the child GO terms that were combined into the category where relevant. For analysis we removed categories with fewer than five unique assignments (RNA binding, enzyme regulator activity, and lipid binding; see Section II.3). Vertebrate pigmentation analysis disproportionate number of entries to the data set. To assess whether this affected our results, we repeated all analyses by treating each unique gene as a single data point, with a proportion of cis-regulatory or coding mutations calculated for that gene's entries in a particular category. For instance, for asip in birds, there were three entries of cis-regulatory mutations out of five entries in total, so asip was treated as a single data point with a value of 0.6 for cis-regulatory mutations in birds; in mammals, cis-regulatory mutations in asip instead had a value of 0.32. Each plot shows the proportions of cisregulatory and coding mutations by category, manually scaled to sum to 1 to allow comparison with other figures. Post hoc statistical tests were not performed on these plots.

III. RESULTS
(1) Information on genetic variants associated with pigmentation is asymmetrically distributed across vertebrate clades To examine the representation of different pigmentation systems across vertebrate evolutionary studies, we looked at the number of entries in Gephebase across five clades: teleost fishes, amphibians, squamates, birds and mammals. All 389 entries belonged to one of these clades. The significant majority (68.3%, N = 266) of entries were for mammals, followed by birds (21.1%, N = 82), teleosts (7.5%, N = 29), squamates (2.6%, N = 10) and amphibians (0.5%, N = 2). Clades in which multiple distinct chromatophore classes are found thus represent only 10% of our data set, with the majority of data from clades with only melanocytes and no other pigment cell classes. This taxonomic bias is attributable in part to how this field originated and evolved. The disproportionate representation of mammalian studies is likely a reflection of the extensive characterisation of mouse coat colour genetics, which has been studied for more than a century (Hoekstra, 2006). As a result, many of the key components of the melanin biosynthesis pathway were first identified in mice, and these components are often selected for investigation in other mammalian and bird systems.
(2) Variation in vertebrate pigmentation is associated most often with coding mutations We examined the relative prevalence of protein coding and cis-regulatory mutations across the data set. Cis-regulatory mutations, and coding sequence mutations together account for 348 entries (89.5%) in the full data set, with coding mutations representing the majority of entries (272, 69.9%) ( Fig. 2A). The remaining classes of genetic variation were gene amplification, gene loss, intronic mutations, and unknown (when the gene was identified but the exact mutation(s) involved were not specified) ( Fig. 2A). This dominance of coding mutations is in contrast to the prediction that cis-regulatory mutations are more likely to generate phenotypic change (Carroll, 2008). However, the observed pattern may reflect the relative ease of identifying coding mutations compared with cis-regulatory changes (Stern & Orgogozo, 2008).
(3) Study methodologies with less ascertainment bias exhibit a slightly higher proportion of cis-regulatory mutations Three methodology categories are represented in the data set: candidate gene, linkage mapping, and association mapping . Of these, candidate gene studies are likely to involve the highest ascertainment bias and may be expected to identify a higher proportion of coding mutations. Many candidate gene studies historically focused on identifying amino acid changes within coding regions in new species based on previous findings in other species. However, they can facilitate comparisons across broad taxonomic distances compared with other mapping approaches. By contrast, both linkage and association mapping studies start with a phenotypic difference and attempt to link it to a sequence change, or at least to a locus interval. Linkage mapping involves crossing populations to generate recombinant hybrids, and thus can only map variation between closely related organisms that are interfertile. Such studies are less likely to suffer from ascertainment bias than candidate gene approaches in their detection of causative loci, depending on their resolution and mapping coverage. Finally, association mapping approaches involve identification of significant associations within large, heavily intermixed populations, typically with minimal ascertainment bias as a result. To account for potential differences between methodologies we plotted the proportions of cis-regulatory and protein coding mutations for each method (Fig. 2B).
As expected, candidate gene approaches identified the lowest proportion of cis-regulatory mutations (12.5%). For linkage mapping studies, this value was 25.3%, and for association mapping it was higher still (46.2%). Thus all three of these methodology categories result in more reported coding than cis-regulatory mutations (Fig. 2B), and regardless of the method used, protein coding mutations appear to be involved in the majority of investigated vertebrate pigmentation variation. An effect of study methodology may be seen in that the two methodologies with a lower ascertainment bias appear to report more causal cis-regulatory mutations.

Vertebrate pigmentation analysis
We also did not observe an appreciable change in the relative prevalence of linkage mapping or association mapping studies (Fig. 2C, D). Taken together, while there is an overall trend for increased reporting of cis-regulatory mutations with time, differences in the relative prevalence of different study methodologies are insufficient to explain the preponderance of coding mutations in our data set. It remains possible that there is a discovery bias towards the identification of coding mutations, or alternatively, that evolutionary variations in vertebrate pigmentation do indeed involve coding mutations more often.
(5) The dominance of coding mutations in the data set may be partly explained by studies on three genes involved in melanic pigmentation Historically, most evolutionary studies have focused on mammalian and bird melanic pigmentation systems, presumably due to the large number of studies on the genetics of mouse coat colour. Therefore, we explored if the observed preponderance of protein coding mutations could result from the overrepresentation of melanic pigmentation case studies. Mammal and avian melanic pigment patterns are the result of dynamic switching between the synthesis of two types of melanin. The switch between black eumelanin and reddish-brown pheomelanin is controlled by the melanocortin 1 receptor (MC1R) encoded by mc1r (García-Borr on, S anchez-Laorden & Jiménez-Cervantes, 2005). With 88 entries (22.6% of the data set) mc1r is the most common gene in our data set. The next most represented gene is kit, which encodes a receptor tyrosine kinase essential for the survival and migration of melanocyte/melanophore precursors (Hou, Panthier & Arnheiter, 2000) and represents 12.6% (49 entries) of the data set. The third most represented gene is asip, which codes for agouti signalling protein (ASIP). This protein acts as an antagonist to MC1R; in the absence of ASIP, MC1R initiates a signalling cascade that results in eumelanin production, while when ASIP is present the melanocyte reverts to pheomelanin synthesis (García-Borr on et al., 2005). There are 43 asip entries in our data set (11.1%).
The overrepresentation of these three genes in Gephebase reflects their relevance in mammalian and other vertebrate pigmentation systems, and particularly their frequent selection as candidate genes for further study. To remove any potential bias resulting from a focus on these genes, we reanalysed the data with mc1r, kit, and asip removed. With these entries removed from the data set, the overall proportion of cis-regulatory mutations increased from 21.8% to 28.6%, due to the prevalence of coding mutations in mc1r entries in particular (Fig. 2E). Similarly, the proportions of cisregulatory mutations in all three methodology categories increased (Fig. 2E). Coding mutations remain the most reported for candidate gene and linkage mapping methodologies, but for the association mapping category there is now an approximately equal number of cis-regulatory and coding mutations (N = 21 and 20, respectively). However, the difference between the proportion of cis-regulatory mutations before and after removing the overrepresented genes was not significant (Fisher's exact test, P = 0.089).
We performed an additional analysis in which we normalised the data set to weight each individual gene equally regardless of its number of entries in the database (see Section II.6; Fig. S1). The results were again similar, with a slightly increased proportion represented by cis-regulatory mutations relative to the unnormalised data set, and association mapping now showing more cis-regulatory than coding mutations (59.6%). We conclude that overrepresentation of these melanic case studies may partially explain the higher prevalence of protein coding mutations in the vertebrate pigmentation data set.
(6) The proportion of coding mutations is higher for studies of domesticated and intraspecific variation than for studies of interspecific variation It has been hypothesised that different mutation types may occur at different frequencies during short-term and long-term evolution, with coding mutations being more common in the short term, and thus across shorter taxonomic distances (Stern & Orgogozo, 2009, 2008. Over shorter periods of time, mutations with deleterious pleiotropic effects might become established if alternative, less-pleiotropic mutations do not appear. By contrast, over longer periods of time, non-optimal mutations will be tested across a variety of environments and there will be more opportunity for adaptive mutations without pleiotropic effects to appear and be selected for. In addition, contexts of artificial selection by humans could facilitate the presence of mutations that would normally be counterselected in the wild (Cieslak et al., 2011;Hanly et al., 2021). To test this in vertebrate pigmentation systems, we plotted the proportion of cis-regulatory mutations according to the taxonomic status categories: domesticated, intraspecific, and interspecific. The 'domesticated' category included cases of artificial selection by breeders, with pigment trait variations being directly selected in most cases. The 'intraspecific' category contained studies that investigated natural phenotypic differences between morphs of the same species. 'Interspecific' entries were all those that reported pairs of taxa above the species level.
We found a similarly low proportion of cis-regulatory mutations in the domesticated and intraspecific variation categories (17.6 and 24.8%, respectively; Figs 3A and S2). By contrast, studies on interspecific variation reported a significantly higher proportion of cis-regulatory mutations (64.3%), suggesting that this mutation type indeed might be more common with increasing taxonomic distance (Fig. 3A). This pattern broadly remained in a re-analysis of the data normalised for number of entries (Fig. S3A), wherein slightly higher proportions of cis-regulatory mutations were calculated than in the nonnormalised analysis but increasing taxonomic distance was still associated with a higher proportion of cis-regulatory mutations.
These results must be interpreted with caution as the number of interspecific studies addressing vertebrate pigmentation was smallonly a total of 14 entries were interspecific, corresponding to 11 unique genes. 14 'interspecific' entries were from teleost fishes, which, unlike mammals, have multiple pigment cell types (Fig. S4). The low number of interspecific studies together with their low clade diversity limits the examination of trends across large evolutionary timescales and may reflect an underexplored aspect of vertebrate pigmentation evolution.

(7) The proportion of coding mutations is higher in mammal and bird studies than in studies of teleosts, amphibians, and squamates
We also investigated whether the prevalence of coding mutations could be associated with a disparity in representation of different vertebrate clades. Mammals and birds constitute the vast majority of entries (89.5%, N = 348 combined), with entries for teleosts, amphibians, and squamates being much less common (10.5%, N = 41 combined). The latter three clades also have much more varied pigmentation systems than mammals or birds, with multiple developmentally distinct classes of pigment cells. This difference could potentially lead to differences in evolutionary trends. For example, the development of distinct pigment cell classes from a shared neural crest origin increases the likelihood that coding sequence mutations in genes contributing to pigment cell development will have pleiotropic effects. Mutations, and particularly coding mutations, affecting genes with shared roles between chromatophore classes could thus impact several aspects of pigmentation simultaneously. Thus, cis-regulatory mutations may perhaps be selected for their specificity of expression as they are more likely to affect only one individual chromatophore class. Similarly, genes implicated in direct cellular interactions might be expected to suffer more negative consequences from coding sequence mutations, given the recognition specificity that is typical of signalling molecules and their receptors. We therefore examined whether clades with distinct pigmentation systems showed different patterns in the proportion of cis-regulatory and coding mutations (Fig. 3B). The proportion of cis-regulatory changes was lowest for mammals (16.7%) and birds (23.3%). By contrast, studies on squamates (55.6%), teleosts (52.0%) and amphibians (50%; but note that N = 2, Fig. 3B) reported approximately equal proportions of the two mutation types.
Although the low number of entries in our data set for taxa with multiple chromatophore pigment systems precludes definitive conclusions, the observed trend does fit with the hypothesis that variation in pigmentation systems utilising multiple pigment cell classes could lead to a higher proportion of cis-regulatory mutations. The proportion of cisregulatory mutations for teleost, amphibian, and squamate entries combined was 52.8%, compared with 18.3% for birds and mammals combined. This difference was statistically significant (Fisher's exact test, P < 0.0001). Notably, half of the interspecific data came from teleosts (Fig. S4), making it difficult to draw conclusions regarding patterns in the proportion of cis-regulatory mutations across higher taxonomic levels. However, considering only the intraspecific cases, there is a higher proportion of cis-regulatory mutations for teleosts, amphibians, and squamates (45.0%, 9/20 entries) than in birds and mammals (20.4%, 19/93) (Fig. S4). This difference was statistically significant (Fisher's exact test, P = 0.0419). Re-analysis of the data using equal weighting for each gene (see Section II.6) for each clade also implies that clades with multiple pigment cell classes exhibit higher proportions of cis-regulatory mutations (Fig. S3B). A larger sample of studies conducted in model systems with multiple pigment cell classes, together with more interspecific studies in other clades, would provide valuable insights.
(8) Types of phenotypic variation associated with coding and cis-regulatory mechanisms Differences in the nature of phenotypic variations may be associated with distinct types of genotypic change. For instance, loss of pigmentation may be more likely to result from loss-of-function coding sequence changes, whereas patterning changes could preferentially be driven by cisregulatory mutations due to their high spatial modularity reducing pleiotropic effects (Orteu & Jiggins, 2020).

Vertebrate pigmentation analysis
To determine whether there are differences between colour loss, colour tuning, and pattern variation, we assigned different phenotype categories to each entry (Table 2; see Section II.3).
In every category except changes in carotenoid colouration, coding sequence changes were the most common (Fig. 4A). For the melanic pigmentation categories (pheomelanism, eumelanism, dilution, amelanism, white spotting)  cis-regulatory mutations represented a low proportion of the data set (13-29%). These results do not support the suggestion that phenotypes involving loss of pigmentation are associated with a higher proportion of coding mutations than changes to pigmentation quantity or distribution. Both phenotype categories pertaining to carotenoid colouration had too few entries for reliable conclusions to be drawn. However, both carotenoid categories combined appear to involve a significantly higher proportion of cisregulatory mutations (72.7%) compared with melanic pigmentation (17.3%; Fisher's exact test, P = 0.0001), suggesting that the evolution of carotenoid pigmentation may be driven by cis-regulatory changes to a greater extent than for melanic pigmentation. The physiological relevance of carotenoids outside of their signalling roles, their distribution across multiple tissue types other than pigment cells, and the plasticity associated with carotenoid pigmentation may mean that carotenoid pigmentation has evolved under different selection pressures compared with melanic pigmentation (Svensson & Wong, 2011). Further studies investigating carotenoid phenotypes would be invaluable to determine whether carotenoid colouration does evolve through different mechanisms compared to melanic pigmentation.
For melanic pigmentation, there were no significant differences between categories. The category with the highest cisregulatory proportion was white spotting (29.1%), followed by amelanism (18.0%), pheomelanism (13.9%), dilution (14.3%), and eumelanism (12.7%). Notably, two phenotype categories were strongly associated with 'domesticated' entries: 18 out of 20 dilution entries, and all 55 entries for white spotting, were identified in domesticated taxa (Fig. S5). White spotting phenotypes are associated with strong pleiotropic effects, with many of these entries in Gephebase being deleterious in the homozygous state. It is therefore unsurprising that white spotting mutations were only identified in domesticated taxa, where they can be subject to heterozygous advantage (Hanly et al., 2021;Hedrick, 2015). This may reflect the nature of white spotting phenotypes, which result from impairment of pigment cell migration or differentiation. It is possible that coding mutations that cause white spotting are more likely to be non-viable due to their pleiotropic effects on upstream cellular development.
Phenotypes relating to pigment pattern alteration were associated with a significantly higher proportion of cisregulatory mutations (31.4%) than phenotypes relating to colour shifts (13.7%) (Fisher's exact test, P = 0.0003; Fig. 4A). This general pattern was seen for every study methodology and taxonomic status (Figs S5 and S6), although many comparisons did not reach statistical significance. This difference between colour and pattern phenotypes appears to be most pronounced in non-domesticated animals: for intraspecific and interspecific entries combined, pattern phenotypes had a cis-regulatory proportion of 63.6% compared with 16.8% for colour (Fisher's exact test, P < 0.0001; Fig. S5). This is despite white spotting phenotypes being almost exclusively found in domesticated animals as well as exclusively categorised as pattern alterations. Together, these results imply that there may be a disparity between colour and pattern phenotypes in terms of the mutation type involved, particularly in the evolution of non-domesticated animals.
Repeating this analysis after correction for the number of entries for each gene (Fig. S7A), again increased the overall proportions represented by cis-regulatory mutations for each category. Both eumelanism (46.9%) and amelanism (47.9%) appeared to involve a higher proportion of cis-regulatory mutations than other melanic pigmentation categories (pheomelanism 27.5%, dilution 25.3%, white spotting 34.3%), mainly due to the large numbers of tyr (amelanism) and mc1r (eumelanism) entries in these categories in our data set. Therefore, this comparison also does not support the hypothesis that loss-of-pigmentation phenotypes are more likely to be due to coding mutations. Additionally, this analysis provides further support for a disproportionate role of cis-regulatory changes in the generation of pattern variation (53.0%) compared with colour shifts (36.9%).
(9) Upstream developmental processes are associated with a higher proportion of cis-regulatory mutations Differences in mutation types may also be associated with the developmental or cellular function fulfilled by the gene in question. We hypothesised that genes associated with upstream developmental processes would exhibit different proportions of mutation types compared with those associated with downstream processes. For example, genes that play a role in pigment cell fate specification (an upstream process) may be less tolerant of coding sequence mutations than genes contributing only to pigment deposition (a downstream process). To test this, we assigned each gene to one of six categories reflecting different cellular and/or developmental processes (Table 3). Three of these categories are cell type-specific (downstream) functions: melanosome formation, pigment biosynthesis, and pigment trafficking/ localisation. The remaining categories involved upstream processes: pigment cell fate specification, pigment cell precursor survival/migration, and cellular interactions ( Table 3).
The three categories representing the more upstream cellular/developmental processes appeared to be associated with a higher proportion of cis-regulatory mutations than the categories representing downstream cellular functions, although many of these differences were not statistically significant (Fig. 4B). However, grouping the upstream (37.5%, N = 88) and downstream (15.9%, N = 258) categories, showed a statistically significant difference between them (Fisher's exact test, P < 0.0001). Note that there were only four entries for the cellular interactions category, all of which involved cis-regulatory mutations (Fig. 4B); this small number makes it difficult to draw conclusions regarding this category. All categories representing downstream processes, i.e. melanosome formation (17.0%), pigment biosynthesis (15.8%) and pigment trafficking/localisation (13.3%), were Biological Reviews 98 (2023)  Vertebrate pigmentation analysis associated with a low proportion of cis-regulatory mutations. The overall pattern supports the hypothesis that for upstream processes, such as pigment cell fate specification, mutations are more likely to be cis-regulatory when compared with downstream cellular processes which do not necessarily affect cell viability, such as variable rates of pigment deposition. This may reflect a lower likelihood of pleiotropic effects associated with downstream processes.
The differences between upstream and downstream processes persist after correction for the number of entries associated with each gene (Fig. S7B), with the exception that a greater proportion of the pigment biosynthesis category involves cis-regulatory mutations (50.4%), reflecting a preponderance of some overrepresented genes, such as mc1r, tyr, and tyrp, within this category. Note, however, that with respect to pleiotropy, correcting for the number of entries per gene may lead to misinterpretation. A large number of entries in our data set for a small number of biosynthesis genes could be explained if such genes are less constrained by pleiotropy, rather than simply reflecting a literature bias. Such considerations mean that these results must be interpreted with caution. Nevertheless, our findings may suggest that pigment biosynthesis genes are more functionally constrained than other downstream genes.
(10) Transcription factors are associated with a higher proportion of cis-regulatory mutations than other molecular functions We were also interested in whether the specific molecular activity of a gene would influence the distribution of mutation types. Coding mutations are constrained by the degree of functional conservation required by the protein's molecular function. Transcription factors implicated in multiple regulatory functions may be more constrained than enzymes that catalyse pigment-specific pathways, as even loss-of-function mutations may be tolerated in enzymes that are functionally specialised and non-essential (P al, Papp & Lercher, 2006). For example, differential expression of the transcription factor sox10 resulting from cis-regulatory mutations underlies changes to melanogenesis in both pigeons and chickens (Domyan et al., 2014;Gunnarsson et al., 2011). The sox family of transcription factors, including sox10, are highly pleiotropic and essential for neural crest specification, migration and differentiation (Sarkar & Hochedlinger, 2013). No sox10 coding mutations were identified in this data set. Conversely, multiple presumptive null coding mutations have been identified in mc1r, including one example of parallel evolution in the cavefish Astyanax mexicanus (Gross, Borowsky & Tabin, 2009). Thus, in contrast to sox10 and other transcription factors, highly specialised proteins such as mc1r may be less susceptible to pleiotropy arising from null mutations. Thus, we predicted that molecular functions that are less pleiotropic would exhibit higher proportions of cis-regulatory mutations, and vice versa.
Using EBI's QuickGO mouse slimmer and manual combination of highly related molecular GO terms, we assigned each entry to one of seven categories of molecular function (see Section II.3, Table 4) and examined the proportions of cis-regulatory and coding mutations for each category (Fig. 4C). DNA binding had the highest proportion of cisregulatory mutations of any category (63.0%) and was the only category in which they exceeded coding mutations. Signalling receptor binding had the second highest proportion of cis-regulatory mutations (46.3%), followed by protein binding (25.7%), and the lowest proportion was identified for signalling receptor activity (6.7%). Taken together, these results suggest that the ability of transcription factors to evolve coding variants that lead to phenotypic changes may indeed be limited by their pleiotropic roles in development, while their non-coding regions offer a more viable path for phenotypic changes.
One potentially interesting result is that we found a significantly higher proportion of cis-regulatory mutations for ligands (signalling receptor binding) than for receptors (signalling receptor activity) (Fisher's exact test, P < 0.0001). For most signalling receptor genes, the corresponding ligand was also present in the data set in the signalling receptor binding category. For all ligand/receptor pairs, we found a higher proportion of cis-regulatory mutations in the ligand than in its receptor, including the receptor/ligand pairing with the most entries in the data set (mc1r receptor 1.1%, its ligand asip 41.6%). A similar example is provided by kit/kitlg. For the kit receptor, the proportion of cis-regulatory mutations was 17.5%, with multiple putative null mutations reported. Although kitlg has only four entries in our data set, three are cis-regulatory. Both the low number of kitlg entries and its relatively high proportion of cis-regulatory mutations may indicate lower evolutionary tolerance of coding mutations when compared with its receptor. Overall, our results suggest that the coding sequence of genes encoding signalling ligands is more constrained than that of their signalling receptors, and support previous findings that cisregulatory mutations in ligand genes drive morphological evolution .
It has been hypothesised that ligands may play a role in altering spatially localised phenotypes due to the relatively high modularity of their expression patterns . For example, in deer mice (Peromyscus spp.) alternative isoforms of asip are differentially expressed in different body regions, so that cis-regulatory mutations can drive spatially localised pattern changes by altering relative isoform expression (Mallarino et al., 2017). Additionally, it has been shown that functional residues within regions of protein interaction are more highly conserved than residues outside of interaction regions, thus it may be the case that only the ligand-binding domains of receptor proteins are highly constrained by specificity, whereas other domains are more tolerant of coding mutations (Worth, Gong & Blundell, 2009). Alternatively, ligand genes, which typically encode a smaller protein, may be expected to have a greater proportion of their coding sequence functionally constrained when compared with the corresponding receptor genes. Taken together, both the modularity associated with ligand expression patterns as well as the greater number of Biological Reviews 98 (2023)  mutation-tolerant domains in receptors may explain the higher proportion of cis-regulatory mutations in ligandencoding genes compared to receptor genes.
The same general patterns were observed when repeating this analysis after correction for the number of entries associated with each gene (Fig. S7C), although absolute proportions of cis-regulatory mutations were higher for most categories. The category with the largest increase compared with the uncorrected data (Fig. 4C) was catalytic activity, primarily because of the large number of coding entries in this category for tyr and tyrp.
(11) Divergent pigmentation systems have fewer shared genetic hotspots Studies of the genetic basis of evolutionary change have shown that repeated co-option of the same genes underlies the evolution of convergent phenotypes. Genes that are repeatedly found as causal drivers of similar phenotypic changes in diverse taxa or populations have been referred to as 'genetic hotspots' (Martin & Orgogozo, 2013). We hypothesised that similar pigmentation systems would be more likely to evolve through the same sets of genes, whereas more divergent pigmentation systems would have fewer shared genetic hotspots.
In our comparison of normalised gene overlap between clades (see Section II.4), the greatest overlap was found for mammalian entries for genes shared with birds, and in bird entries for genes shared with mammals (Fig. 5). For squamates, the greatest overlap was for genes shared with teleosts. Teleosts had comparatively few shared hotspot genes with any other clade, with the greatest overlap being with mammals, possibly due to the large number of teleost entries related to melanic phenotypes (65.5%, 19 out of 29 entries). Additionally, there was considerable overlap between birds and squamates which may reflect their phylogenetic closeness. As the number of shared hotspot genes would be expected to reduce over evolutionary time, phylogenetic distance is likely to contribute to the observed overlap between clades (Conte et al., 2012).
Although mc1r was highly represented in the data set for all clades, and thus contributed disproportionately to our hotspot calculations for all clade pairings, many other genes that were highly represented in the data set were confined to mammals and birds only. The second, third, and fourth most represented genes, kit, asip, and tyrp1, which all function in development of melanic pigments, had no teleost or squamate entries. However, other melanogenic genes exhibited overlap between teleost and squamate entries, such as oca2, a melanosomal transport protein (Table S1). In total, oca2 comprised 5 out of 29 (17.2%) teleost entries, and 1 out of 10 (10%) squamate entries, two in mammals (0.82%) and was absent entirely in birds. Only one non-melanic pigmentation gene was shared between birds and squamates. Aside from mc1r, no single gene was identified in all four clades.
Overall, the observed trend of higher hotspot overlap between similar pigmentation systems may indicate shared genetic mechanisms underlying pigment evolution in systems with or without multiple pigment cell classes. The low number of non-melanic pigmentation entries (only 31 in total) makes it difficult to draw conclusions about the relevance of hotspot genes outside of melanogenic pathways for clades with additional pigment types.

IV. DISCUSSION
Gephebase is a powerful tool that provides an accessible compilation of data on phenotype-genotype associations to allow investigations of general patterns regarding the mechanisms underlying evolutionary change. Importantly, this data set can be used to identify gaps in the literature, and to identify novel directions and study systems for future research. Here, we analysed a large data set of 389 Gephebase entries, each representing a pair of alleles linked to phenotypic variation in vertebrate pigmentation. Of the three forms of empirical evidence accepted by Gephebase, it was clear that the study methodology with the highest ascertainment biasthe candidate gene approachidentified more coding sequence mutations than the other methods. Irrespective of this bias, protein coding changes were still the majority of mutations identified in all analyses, representing 75% of entries for linkage mapping studies and 50% for association mapping studies. In addition, we identified several factors that may affect the proportions of coding and cis-regulatory mutations. Below we discuss these factors and highlight new avenues of research in the vertebrate pigmentation field.
(1) Strong selection pressure might explain the higher proportion of coding mutations associated with vertebrate pigmentation diversity A majority of studies on vertebrate pigmentation, particularly in domesticated species, identified coding mutations.

Vertebrate pigmentation analysis
In the case of domesticated vertebrates, this may be explained by the strong selective pressures inherent in the domestication process. During any form of artificial selection, desirable traits will be under strong directional selection, potentially outweighing any negative pleiotropic effects of coding mutations. One example is that of 'frame overo' horses, where changes in patterning are a result of heterozygosity for a coding mutation in the endothelin B receptor (ednrb), which leads to a white spotted pattern (Metallinos, Bowling & Rine, 1998). Homozygosity for this mutation leads to lethal white foal syndromeaffected foals are completely white and die shortly after birth. Selection for white spotted patterns in domestic horses has thus led to a lethal deleterious coding mutation becoming established, whereas in wild populations such a mutation would likely not have conferred sufficient advantage to overcome its negative pleiotropy.
A sufficiently strong selection pressure may likewise allow the emergence of pleiotropic coding mutations in wild populations.
Gephebase contains examples that demonstrate how rapidly changes in pigmentation patterns may evolve when under such pressures, particularly in the context of cryptic camouflage in a new or rapidly changing environment. Populations of North American deer mice exhibit high gene flow, and thus low population genetic structure, with the exception of the agouti (asip) locus. Strong selection for crypsis against variable soil colouration has led to high variance in the asip coding sequence, and rapid divergence driven by colonisation over the last 4000 years (Pfeifer et al., 2018). The numerous intraspecific coding mutations in our data set may therefore be indicative of the strength of selection typical for vertebrate pigmentation evolution. In the future it would be interesting to investigate the relative strength of selection on each mutation type, and to perform the same analyses for multiple morphological traits to determine if the observed trends are pigmentation specific.
(2) Evolutionary timescales and taxonomic distance We found that shorter taxonomic distances (i.e. domesticated and intraspecific entries) were associated with a higher proportion of protein coding changes than in interspecific entries. These results should be taken with caution as the smaller number of interspecific studies indicates comparatively little investigation of pigmentation traits across larger evolutionary timescales. Interestingly, the majority (8 out of 14) of interspecific entries in our data set were for teleosts, despite their overall underrepresentation. Four of these were from East African cichlids, a group of teleost fishes characterised by rapid diversification and speciation. The observed higher frequency of cis-regulatory mutations found in interspecific studies thus could be biased by a focus on these particular cases of rapid evolution.
The current overrepresentation in the data set of shorter evolutionary timescales may itself introduce bias in the types of mutation that can be identified. Shorter evolutionary timescales may allow for pleiotropic mutations, as well as mutations that confer an advantage only in a specific environmental context. Thus, the higher number of intraspecific and domesticated case studies is likely to result in overrepresentation of coding mutations. Expanding the timescales investigated in pigmentation evolution will facilitate testing of this hypothesis, and a greater understanding of how pigmentation changes over evolutionary time.
Identifying bird and mammal model systems for interspecific studies is challenging owing to their often-limited potential for hybridisation, restricting the types of mapping techniques available. For instance, the only two interspecific mammalian entries examined differences within the black rat (Rattus spp.) species complex, and between the house mouse (Mus musculus) and tobacco mouse (Mus poschiavinus), respectively (Kambe et al., 2011;Robbins et al., 1993). Both of these studies used a candidate gene approach. In birds, Toews et al. (2016) used an association mapping approach to examine pigmentation evolution in two naturally hybridising species of warbler with highly divergent pigment patterns. Diversification of model systems in the future may facilitate a greater understanding of long-term pigment evolution in birds and mammals. (

3) Potential biases introduced by study systems
Mammalian and avian systems dominate our data set. These taxa possess only melanocytes as pigment cells, and pigmentation differences primarily result from spatial and temporal differences in pigment production and deposition. In other vertebrates, pigmentation patterns result from the presence and differential arrangement of different classes of pigment cells that share a developmental origin. The few studies in the Gephebase database that focus on clades with distinct chromatophore classes often found novel pigmentation phenotypes associated with a wider range of genes. This is corroborated by the greater numbers of shared hotspot genes between clades with similar pigmentation systemsthere was considerable overlap between mammal and bird hotspot genes. Studies in clades with distinct chromatophore classes also implicated a greater role for cis-regulatory mutations. As current research continues to contribute new data, it will be interesting to investigate whether this trend is confirmed, or whether the observed higher prevalence of cis-regulatory mutations in telosts, squamates and amphibians is an artefact of the small number of studies available to date. A greater emphasis on a broader set of model systems will enhance our understanding of vertebrate pigmentation evolution.
It would be of interest to expand the number of studies on teleosts and reptiles to validate the hypothesis that pigmentation systems with multiple chromatophore classes evolve preferentially through cis-regulatory modifications. Similarly, amphibian pigmentation evolution was only represented by two studies in our data set. Amphibians' highly permeable skin and lack of epidermal protection, as well as their commonly biphasic aquatic and terrestrial life cycles, may necessitate distinct adaptive functions for colouration compared with other vertebrate clades (Rudh &  Qvarnström, 2013). Additional information for these underrepresented clades will be highly informative, particularly with respect to interactions between chromatophore classes and pattern formation.
(4) The importance of understanding the cellular and developmental basis of variation We found a higher proportion of cis-regulatory mutations for changes in patterning than for changes in colour. This is consistent with previous hypotheses that cis-regulatory mutations largely govern spatial changes due to altering expression patterns (Carroll, 2008). For example, in sticklebacks (Gasterosteus aculeatus) a cis-regulatory allele of the kit ligand gene reduces its specific expression in gill tissue, leading to divergent gill pigmentation phenotypes . Gephebase provides several such examples, where mutations in widely expressed genes are cis-regulatory, so that expression only in specific tissues is affected.
Nonetheless, a majority of pattern alteration case studies still involve coding mutations. When looking more closely at these, we found that many involve loss-of-function mutations in genes that are expressed only in specific regions of the body. For instance, the urucum breed of canary (Serinus canaria) exhibits a red bill and legs, which are normally yellow in wild-type canaries (Gazda et al., 2020). This is the result of a mutation that compromises the enzymatic activity of betacarotene oxidase 2 (bco2), an enzyme responsible for breakdown of full-length red carotenoids into shorter apocarotenoids. The specificity of this allele in affecting the bill and the legs is the result of loss-of-function mutations in a gene whose expression is restricted to specific tissues. Without knowing the expression patterns and the level of pleiotropy of a given gene, we may be wrongly categorising phenotypes as pattern alterations when in fact they are colour shifts. Unfortunately, we have no way of testing this, since most case studies focus solely on the genotype-phenotype association without conducting gene expression or developmental studies. Knowledge of a gene's specific function, together with its placement in a cellular and developmental context, is essential for an understanding of when and how certain genes and mutations are favoured over others.
Characterising trait development at the gene-expression and cellular level is also important since we found potential differences in mutation proportions associated with the upstream or downstream position of cellular and developmental processes. The genes contributing to upstream processes, i.e. those related to fate specification and cellular interactions, were more likely to be cis-regulatory compared with downstream processes. A previous study (Stern & Orgogozo, 2008) distinguished two categories of genes: those that act at or near the terminal points of regulatory networks, named 'differentiation gene batteries', and those that are upstream in the gene network. They found that the proportion of cis-regulatory mutations was not significantly different for the two categories. This analysis was based on a compilation of 331 mutations for all types of traits across animals and plants. It is possible that this previous categorization was affected by the very diverse gene networks considered. Here, by focusing only on vertebrate pigmentation, we uncovered a potential effect of upstream versus downstream position on the proportion of cis-regulatory mutations. While these results contradict those of Stern & Orgogozo (2008), this may reflect an effect specific to vertebrate pigmentation. Additional studies, particularly focusing on upstream genes, may provide useful clarification.
One study identified a cis-regulatory mutation that altered the expression of colony stimulating factor 1 (csf1), a signalling factor expressed in xanthophores that is required for their differentiation and survival (Patterson, Bain & Parichy, 2014). In Danio albolineatus, a close relative of the zebrafish D. rerio, increased expression via a cis-regulatory mutation of csf1a was associated with early xanthophore recruitment (Patterson et al., 2014). This precocious xanthophore appearance led to changes in abundance and distribution of all three principal chromatophore classes (melanophores, xanthophores, and iridophores) which ultimately inhibited formation of the stripe pattern found in other Danio species. This study illustrates the potential complexity of phenotypes arising from mutations that affect cellular interactions. The identification of such mutations represents an exciting avenue for future pigmentation evolution research.
By contrast, coding mutations in downstream processes would typically be expected to have less impact on nonpigmentation phenotypes. One study found that a loss-offunction mutation in scavenger receptor B1 (scarb1) in canaries produces a completely white phenotype due to scarb1 being necessary for carotenoid uptake. As a result of the specialised role of scarb1, this null coding mutation produces a spectacular pigmentation phenotype (Toomey et al., 2017). Together, these results support the hypothesis that developmentally upstream genes may be more constrained by pleiotropy. Given the overrepresentation of pigment synthesis case studies in the vertebrate pigmentation data set, the collection of additional data regarding other types of trait variation is imperative.
(5) What are the main knowledge gaps?
Most vertebrate pigmentation entries (363 out of 389) in our data set result from the study of variation in melanic traits. By contrast, variation in other types of colour traits, such as carotenoids and pteridines, structural colouration, and colour plasticity remain largely unexplored. Further, most case studies address naturally selected traits with a relatively simple genetic basisonly 21 out of 389 entries assessed variable sexually dimorphic traits (see supporting information, File S1, tab: All Vertebrates). It would therefore be interesting to investigate the proportions of mutation types for sexually selected traits or highly polygenic traits where each mutation potentially has a small effect.
A focus on previously unexplored traits may allow identification of uncharacterised genes underlying these traits. For example, the study of variation in yellow pigmentation in budgerigars (Melopsittacus undulatus) led to the identification of a coding mutation in a previously uncharacterised

Vertebrate pigmentation analysis
polyketide synthase that is involved in the synthesis of yellow polyene pigments called psittacofulvins (Cooke et al., 2017).
We found only four entries in the data set relating to structural colouration, all of which involved iridophore distribution. Three of these entries involved phenotypes affecting multiple chromatophore classes, of which two were classified as belonging to the 'cellular interactions' category (Table 3). For example, in an interspecific study, Spiewak et al. (2018) investigated pigment pattern differences between two Danio fish species. D. nigrofasciatus has a disrupted stripe pattern compared to D. rerio, which is associated with a cis-regulatory mutation in the endothelin-3 (edn3) signalling peptide (Spiewak et al., 2018). Because edn3 promotes iridophore proliferation, this cis-mutation leads to decreased edn3 expression and correspondingly lower iridophore complement in D. nigrofasciatus. A lower number of iridophores secondarily affects melanophore proliferation, leading to a reduction of both cell types and disruption of the defined striped pattern in D. nigrofasciatus. The fourth structural colouration entry involved changes only to iridophores. This study investigated ectopic iridophore proliferation in a domesticated leopard gecko morph, implicating a serine peptidase inhibitor, spint1, in iridophore abundance and the development of skin tumours. This entry was the only study for spint1, and its specific role in iridophore biology remains unclear. The low number of Gephebase entries relating to structural variation highlights that structural colouration is relatively unexplored, and represents an emerging field of research where new genes and developmental processes leading to variation in cells and physical structures will be identified.
The current entries in Gephebase are heavily focused on colouration traits that are genetically determined independently of environmental variables. Colour variation can also arise as a result of phenotypic plasticity, where the same genotype can generate different colour states under different conditions. Six entries pertained to plastic or seasonal colour changesfor example, cis-regulatory alleles of asip determine whether the winter coat of snowshoe hares is brown or white (Jones et al., 2018). Determining how plasticity emerges and evolves remains a challenge. Genes involved in plasticity evolution can only be identified when mapping differences between closely related species or populations that show variation in the presence and absence of plastic colouration (Gibert, 2017). Likewise, variation in plastic colour responses can only be mapped by measuring variation in population reaction norms, which are costly and time-consuming experiments. Nonetheless, such studies will be key to differentiate between the two main hypotheses regarding the evolution of plasticity: whether evolutionary change occurs through changes in genes that sense and regulate a downstream response to external factors, or instead through changes in the colouration genes themselves that become responsive to environmental cues.
Sexually selected traits are also underrepresented in the database, making up only 5.4% of the case studies. Understanding the interplay between natural and sexual selection on colour traits is important since they may often act in opposite directions. For example, in several species of Lake Malawi cichlids a well-camouflaged colour morph is associated with a cis-regulatory mutation in pax7. However, this mutation also has a deleterious effect in that it disrupts male nuptial colouration. This conflict has been resolved by the invasion of a novel sex determination locus in tight linkage with the pax7 allele (Roberts, Ser & Kocher, 2009). The importance of pigmentation patterns in mate choice can lead to such conflicts, and the ways in which they are resolved could provide fascinating case studies. Sexually selected traits also present a conundrum in that it is unclear why and how trait variation is maintained in natural populations despite apparent directional selection due to female choice or male-male competition (Chenoweth & McGuigan, 2010). Identifying genes and genomic regions contributing to trait variation and studying its adaptive significance in wild populations creates the opportunity to understand this paradox (Johnston et al., 2013).
Finally, the Gephebase data set is currently biased towards large-effect loci, as linkage mapping is limited in its statistical power to detect small-effect mutations and polygenic architectures (Rockman, 2012;Slate, 2013). Association studies conducted on thousands of samples are now reaching a point where small-effect loci are detectable, under the conditions that these variants are common: for instance, several GWAS studies of human skin pigmentation levels have uncovered an amino acid polymorphism of mfsd12 as a determinant of colour variation across several continents (Adhikari et al., 2019;Crawford et al., 2017;Feng, McQuillan & Tishkoff, 2021;Lona-Durazo et al., 2019). This gene is now a candidate that is showing association signals in domestic animal studies (Hédan et al., 2019;Tanaka et al., 2019), suggesting it is effectively a hotspot of pigment variation. While human GWAS studies are systematically curated and will undoubtedly lead to powerful meta-analyses (Buniello et al., 2019), and while infrastructure is being created to integrate these data with laboratory model systems (Shefchek et al., 2020), there is a lack of resources to compile data from evolutionary and artificial breeding gene-to-trait relationships beyond these organisms. Gephebase, OMIA, and Animal QTLdb Hu, Park & Reecy, 2021;Lenffer et al., 2006) represent current attempts to curate these data, but long-term resources are needed to keep pace with increased rates of discovery in the genomic age.

V. CONCLUSIONS
(1) As more genetic variants underlying trait variation are identified, it becomes possible to test more rigorously predictions relating to the mechanisms of genetic evolution. Here, we highlight some of the trends in studies of vertebrate pigmentation evolution, and specifically test some of the predictions made about the relative frequencies of cis-regulatory and coding mutations. In contrast to expectations, we found that the majority of the documented variation in pigmentation is driven by coding sequence mutations. (2) However, we also identified multiple factors potentially associated with the observed proportions that partly explain this disparity. We make suggestions for the future direction of vertebrate pigmentation research with respect to both systems and study design. As the number and variety of case studies continues to increase, we expect our understanding of the genetic, cellular and developmental mechanisms underlying the evolution of vertebrate pigmentation to expand.

VI. ACKNOWLEDGEMENTS
We thank members of the Morphological Evolution Lab for discussion. A. M. and V. C.-O. were supported by a John Templeton Foundation grant from 2014 to 2017 (JTF 43903). M. E. S. is supported by NERC IRF NE/R01504X/1.

VIII. SUPPORTING INFORMATION
Additional supporting information may be found online in the Supporting Information section at the end of the article.
File S1. Data set used in this analysis downloaded from Gephebase on 15 August 2022 (tab: All Vertebrates), the gene ontology (GO) assignments for each gene (tab: GO Assignments), and the final molecular function categories including child GO terms (tab: Molecular Function Summary). Fig. S1. Relative proportions of cis-regulatory and coding mutations associated with each study methodology, and in the total data set, after correcting for the number of entries associated with each gene.   S2. Relative proportions of cis-regulatory and coding mutations associated with different taxonomic status (domesticated, intraspecific and interspecific variation) and study methodology, and the total proportion for each taxonomic status. Fig. S3. (A) Relative proportions of cis-regulatory and coding mutations associated with each taxonomic status and clade, and in the total data set, after correcting for the number of entries associated with each gene. Fig. S4. Relative proportions of cis-regulatory and coding mutations associated with different taxonomic status (domesticated, intraspecific variation, and interspecific variation) and clade, and the total proportion for each taxonomic status, after removing ambiguous cases. Fig. S5. Relative proportions of cis-regulatory and coding mutations associated with different phenotype categories according to taxonomic status: domesticated, intraspecific, and interspecific, and the total proportion for each taxonomic status, after removing ambiguous cases. Fig. S6. Relative proportions of cis-regulatory and coding mutations associated with different phenotype categories according to study methodologies: candidate gene, linkage mapping, and association mapping, and the total proportion for each study methodology, after removing ambiguous cases. Fig. S7. (A) Relative proportions of cis-regulatory and coding mutations associated with each phenotype category, functional category, and molecular function category, and the total for the data set, after correcting for the number of entries associated with each unique gene. Table S1. The 10 genes for each clade with the most total entries across other clades, excluding amphibians. Vertebrate pigmentation analysis