Phenotypic and genetic diversity in aposematic Malagasy poison frogs (genus Mantella)

Abstract Intraspecific color variation has long fascinated evolutionary biologists. In species with bright warning coloration, phenotypic diversity is particularly compelling because many factors, including natural and sexual selection, contribute to intraspecific variation. To better understand the causes of dramatic phenotypic variation in Malagasy poison frogs, we quantified genetic structure and color and pattern variation across three closely related species, Mantella aurantiaca, Mantella crocea, and Mantella milotympanum. Although our restriction site‐associated DNA (RAD) sequencing approach identified clear genetic clusters, they do not align with current species designations, which has important conservation implications for these imperiled frogs. Moreover, our results suggest that levels of intraspecific color variation within this group have been overestimated, while species diversity has been underestimated. Within major genetic clusters, we observed distinct patterns of variation including: populations that are phenotypically similar yet genetically distinct, populations where phenotypic and genetic breaks coincide, and populations that are genetically similar but have high levels of within‐population phenotypic variation. We also detected admixture between two of the major genetic clusters. Our study suggests that several mechanisms—including hybridization, selection, and drift—are contributing to phenotypic diversity. Ultimately, our work underscores the need for a reevaluation of how polymorphic and polytypic populations and species are classified, especially in aposematic organisms.

Examples of color variation in aposematic organisms-where conspicuous warning signals advertise toxicity or unpalatability to predators (Poulton, 1890)-are particularly compelling. Aposematic colors are often highly contrasting, variable, and potentially exhibit tradeoffs between natural and sexual selection Estrada & Jiggins, 2008;Jiggins, Naisbit, Coe, & Mallet, 2001;Nokelainen, Hegna, Reudler, Lindstedt, & Mappes, 2012;Reynolds & Fitzpatrick, 2007;Stevens & Ruxton, 2012;Summers, Symula, Clough, & Cronin, 1999). Historically, variation in aposematic signals has been considered perplexing from a theoretical perspective because phenotypic diversity is expected to be highly constrained in such systems due to positive frequency-dependent selection via predation (Briolat et al., 2018;Endler & Greenwood, 1988;Mallet & Joron, 1999). Once a predator has learned to associate toxicity with a particular phenotype, protection should be conferred to those organisms displaying a similar phenotype, encouraging uniformity in warning coloration. According to theoretical predictions of predator avoidance learning, novel phenotypes should be unrecognizable to predators as toxic and thus quickly removed from populations (Guilford & Dawkins, 1993;Mallet & Barton, 1989;Müller, 1879). In recent years, however, studies have identified many biotic and abiotic factors-including intraspecific communication, parasite load, temperature and variability in predator learning and sensory abilities-that contribute to variation in aposematic signals (reviewed in Briolat et al., 2018). Growing awareness of the variety of selective factors influencing aposematic coloration has led scientists to encourage a more holistic approach when investigating diversity in warning coloration and to consider the range of factors that may be at play in maintaining phenotypic variation (Briolat et al., 2018).
Studies of color variation within aposematic species have traditionally focused on systems demonstrating either multiple color morphs within a population (i.e., polymorphism) or geographic color variation among populations (i.e., polytypism) (reviewed in Briolat et al., 2018). In systems with high color variability, however, determining whether color variants represent different species, different populations, or different morphs within populations is difficult particularly when genetic structure is not well resolved. Distinguishing between species, populations, and morphs can be especially challenging in phenotypically diverse poison frog groups, where high rates of phenotypic variation can confound our understanding of species delimitations (Posso-Terranova & Andrés, 2018;Roland et al., 2017;Tarvin, Powell, Santos, Ron, & Cannatella, 2017).
In many instances, the inability to distinguish whether color variation occurs within populations, between populations, or between species is further compounded by nonexistent or limited genetic datasets, which lack the resolution needed to clarify relationships among populations. Tarvin et al. (2017) recently demonstrated that the level of interspecific mitochondrial divergence among four distinct poison frog species was comparable to the divergence levels observed between populations considered to be a single polymorphic species (Hauswaldt, Ludewig, Vences, & Pröhl, 2011). Noting the limitations of mitochondrial data in resolving species boundaries, the authors explicitly called for genome-level studies, in combination with information on phenotypic diversity and natural history, to understand relationships in these complicated systems (Tarvin et al., 2017). The power of more comprehensive genetic datasets to resolve genetic structure in poison frogs has been demonstrated in recent studies where Neotropical poison frogs considered to be a single species in fact contained multiple genetic lineages that potentially represent new species (Posso-Terranova & Andrés, 2018;Roland et al., 2017).
While the relationship between phenotypic and genetic diversity has been extensively studied in the Neotropical poison frogs (e.g., Wang & Summers, 2010;Twomey et al., 2013;Roland et al., 2017;Tarvin et al., 2017), there is an entirely separate radiation of poison frogs in which color diversity has never been examined. Endemic to Madagascar, the Mantella genus describes sixteen species of toxic, diurnal frogs exhibiting variable coloration and pattern both within and among species (Glaw & Vences, 2007). Commonly called Malagasy poison frogs, the bright coloration displayed by many species within this group is presumed to be aposematic. The toxic skin alkaloids found in Mantella species are believed to be derived from arthropod prey (Daly, Andriamaharavo, Andriantsiferana, & Myers, 1996;Daly, Garraffo, Hall, & Cover, 1997), similar to the Neotropical poison frogs, and variation in alkaloid composition has been observed among species, populations, and habitats (Andriamaharavo et al., 2015;Daly et al., 2008). The mechanism of chemical defense is hypothesized to have evolved convergently in Neotropical and Malagasy poison frogs (Clark, Raxworthy, Rakotomalala, Sierwald, & Fisher, 2005). Despite their high degree of phenotypic variation and apparent similarity to the Neotropical poison frogs (Heying, 2001b;Rojas, 2016), little is known about the natural history, ecology, and genetic background of Malagasy poison frogs.
Within the Mantella genus, one complex of three closely related species, Mantella aurantiaca, Mantella crocea, and Mantella milotympanum, demonstrates a particularly high degree of variability in conspicuous coloration and pattern. Found in the rainforests of central-eastern Madagascar, the geographic range of all three species is highly restricted and patchy in its distribution (Bora et al., 2008).
Among and within populations, there is exceptional phenotypic variation both in dorsal coloration, which ranges from red to green at the extremes, and in patterning elements, which are variable in the degree of ventral spotting and black banding present on the side. Yet, any attempt to understand the phenotypic diversity in this group is hindered by the unresolved taxonomy of these species. Previous genetic work is limited to a handful of mitochondrial DNA and allozyme studies, which have yielded somewhat confusing results Schaefer, Vences, & Veith, 2002;Vences, Chiari, Raharivololoniaina, & Meyer, 2004;Vences, Hille, & Glaw, 1998).
Mantella aurantiaca, M. milotympanum, and M. crocea are thought to fall within the Mantella madagascariensis group, one of five clades within the Mantella genus, though their position in this group is controversial (Schaefer et al., 2002;Vences et al., 2004). Population genetic studies have detected high degrees of haplotype sharing between M. milotympanum and M. crocea, resulting in the hypothesis that these two species are conspecific . Additionally, frog populations displaying patterning that is intermediate between that of M. crocea and M. milotympanum exist in the wild and are referred to as M. cf. milotympanum in the literature . Evidence of haplotype sharing between M. aurantiaca and M. crocea Vences et al., 2004) has also prevented taxonomic resolution within this group, and species designations remain controversial.
Given the lack of resolution in previous molecular studies, it is apparent that a high-resolution genetic dataset is needed to both clarify relationships within this group and to determine whether observed color variants represent distinct species or morphs. In this study, we used a restriction site-associated (RAD) sequencing approach, in combination with multiple matrix regression analysis, to compare variation in dorsal coloration and side and ventral patterning with genetic and geographic distance across the entire known range of three species of Malagasy poison frog. Specifically, our objectives were to (a) clarify genetic structure among populations of M. crocea, M. milotympanum, and M. aurantiaca, (b) quantify variation in dorsal coloration and side and ventral patterning both among and within populations, and (c) describe the relationship between genetic diversity, phenotypic diversity, and geographic distance for all major genetic clusters within this three-species complex. This study-the first quantitative and objective exploration of color diversity in the Mantella genus-not only provides a foundation for future studies of color evolution in Malagasy poison frogs but also identifies several critical issues that should be more thoroughly considered in any investigation of aposematic organisms presumed to be polymorphic or polytypic.

| Field sampling
We sampled three closely related species, currently named M.

| Quantification of phenotypic variation
To characterize frog coloration and pattern, we photographed the dorsal, ventral, and side surfaces of all frogs in a standardized manner following a protocol modified from Stevens, Párraga, Cuthill, Partridge, and Troscianko (2007). Photographs of all frogs were taken after transportation to the field laboratory, but before any other handling occurred. Frogs were always photographed in natural light during the hours of 1:00-5:00 p.m. using a Pentax K-30 digital single-lens reflex camera fitted with a Pentax 18-135 mm lens. All frogs were photographed on a white paper background with a scale bar and a white-gray-black standard present (QPcard 101; gray standard reflectance value of 18%). Two individuals (ABNK09 and RAN11) were excluded from our phenotypic analysis due to unsuitable photographs.
Briefly, we calculated three axes from our RGB values corresponding to a red-green channel, (R − G)/(R + G), a blue-green channel, , and a luminance channel, which we defined as untransformed R values. Because luminance is processed separately in vertebrates (Endler, 2012;Endler & Mielke, 2005), we used our other  Krohn and Rosenblum (2016). Chroma was calculated as the hypotenuse of the triangle formed by the x and y axes, and hue was calculated as the angle between the hypotenuse and the x-axis. We used these values of chroma, hue, and luminance to characterize dorsal coloration.
To quantify side and ventral pattern, we again used the Image Calibration and Analysis Toolbox (Troscianko & Stevens, 2015) to generate aligned and normalized images from RAW photographs.
Next, we manually outlined the ventral and side surfaces of frogs on the standardized images using the polygon and brush selection tools.
We selected the entire ventral and side surfaces to obtain a comprehensive measure of overall pattern on each surface. After manually selecting the regions of interest, we used the scale bars present in each photograph to scale all images to the same number of pixels per millimeter (side surfaces = 19 px/mm; ventral surfaces = 18.6 px/ mm), which is necessary for pattern analysis. We performed a granularity analysis, which is based on Fast Fourier bandpass filtering, using the Image Calibration and Analysis Toolbox implemented in ImageJ. For our side pattern analysis, we specified Fast Fourier Transform Bandpass filters at 16 levels, starting at two pixels and increasing by a multiple of the square root of two until 430 pixels.
For our ventral pattern analysis, we specified Fast Fourier Transform Bandpass filters at 14 levels, starting at two pixels and increasing by a multiple of the square root of two until 193 pixels. From our granularity analysis, we generated descriptive summary statistics to estimate pattern contrast, pattern diversity, and luminance contrast for both side and ventral pattern. Granularity analysis has been increasingly used to measure pattern in a variety of organisms (e.g., Barbosa et al., 2008;Stoddard & Stevens, 2010) and draws on basic characteristics of early-stage visual processing present across diverse animals (Campbell & Rodson, 1968;Godfrey, Lythgoe, & Rumball, 1987;Pérez-Rodríguez, Jovani, & Stevens, 2017;Stoddard & Stevens, 2010). Conversely, color adjacency analysis, a method which has previously been used to quantify pattern in poison frogs, does not represent visual processing of pattern (Pérez-Rodríguez et al., 2017).
Our dataset included both males and females (58 males, 26 females, two juveniles). Preliminary analyses did not reveal an effect of sex on frog coloration or pattern, so sexes were lumped for the analyses presented here. However, our study design did not explicitly aim to quantify sexual dimorphism, and future studies can investigate this question with targeted sampling.

| Quantification of genetic variation
We extracted DNA from toe clips using Qiagen DNeasy extraction kits (Qiagen, Valencia, CA, USA) generally following the manufacturer's protocol with two modifications: 4 µl of 1 mg/ml RNase A was added to each sample after lysis, and DNA was eluted in 45 µl of 1× LTE buffer to maximize concentration. Prior to library preparation, we checked the quality of extracted DNA using agarose gels and quantified DNA using a Qubit 2.0 Fluorometer (ThermoFisher, Waltham, MA, USA).
We constructed a restriction site-associated DNA (RAD) library

| RADseq data processing
To process RADseq data, we used pipelines implemented in a customized PERL workflow that also utilized various external programs (pipelines can be accessed through https://github.com/CGRL-QB3-UCBerkeley/RAD). We first demultiplexed raw fastq reads using internal barcode sequences and allowing for one mismatch in barcode sequence. We removed demultiplexed reads that did not include the expected restriction enzyme cut site at the beginning of the read, again allowing for one mismatch in cut site sequence, and also removed exact duplicates using Super Deduper (https://github. com/dstreett/Super-Deduper). To filter reads, we used Cutadapt (Martin, 2011) and Trimmomatic (Bolger, Lohse, & Usadel, 2014) to trim adapter contamination and low quality reads. We removed filtered reads that were shorter than 50 bp. After cleaning and filtering reads, we used cd-hit (Fu, Niu, Zhu, Wu, & Li, 2012;Li & Godzik, 2006) to cluster forward reads of each individual at 95% similarity, retaining only those clusters with at least two supported reads.
For each cluster, we used the sequence identified as representative by cd-hit as our marker. Retained markers were next masked for repetitive elements, low complexities, and short repeats with Ns using RepeatMasker (Smit, Hubley, & Green, 2014) Bi et al., 2013), that was slightly modified to remove sites around indels and implemented in our pipelines. Additionally, we filtered out markers where more than two alleles were called at any site, and also masked sites that were within 10 bp of any indel. Only individual sites falling within the first to ninety-ninth percentile of overall coverage among all samples were retained. We also removed SNPs failing to pass a one-tailed HWE exact test (1e−4) or showing strong base quality bias (1e−100). To avoid bias resulting from excessive missing data, we only used sites present in at least 60% of individuals with at least 3× coverage in our downstream analyses.

| Population genomic analyses
We used genotype likelihoods instead of genotype calls whenever possible to account for uncertainty in our data. Because our data showed low to medium coverage (1.8-13.9×, with an average of 5.4×), SNP and genotype calls based on allele counts could potentially cause bias or introduce noise in downstream analyses (Johnson & Slatkin, 2008;Lynch, 2008 (Huson, 1998;Huson & Bryant, 2006) to visualize population structure. We adhered to the stringent thresholds mentioned above to call a set of high-quality variants, which were used to compute a genetic distance matrix for Splitstree using the Adegenet package in R (Jombart, 2008). We also quantified the population structure of all samples using NGSadmix (Skotte, Korneliussen, & Albrechtsen, 2013), which relies on genotype likelihoods. Because there were sixteen populations present in our study, we estimated individual admixture proportions with the number of clusters ranging from one to seventeen (K = 1-17), with ten replicates per K value.
We then used the Evanno method (Evanno, Regnaut, & Goudet, 2005) to identify the most likely K value.
To characterize fine-scale population structure, we performed an NGSadmix analysis within each major genetic cluster. We again estimated individual admixture proportions with the number of clusters ranging from one to one more than the total number of populations (K = 1-3 for Cluster A; K = 1-7 for Cluster B; K = 1-9 for Cluster C; K = 1-15 for candidate hybrid populations). We ran NGSadmix with ten replicates for each K value and used the Evanno method to determine the most likely K value. We assigned admixed populations to the NGSadmix group from which more than 50% of its admixture was drawn. Finally, we calculated F ST values for all possible pairwise population comparisons using the realSFS function of ANGSD.

| Integration of genomic and phenotypic datasets
We used Mantel and partial Mantel tests to investigate the relationship between genetic, geographic, and phenotypic distance for each major cluster in our study. Some concerns have been raised over the use of Mantel tests in population genetics, especially in regards to inflated type I error rates for partial Mantel tests when spatial autocorrelation is present (Diniz-Filho et al., 2013;Guillot & Rousset, 2013;Legendre & Fortin, 2010). Therefore, we rely most heavily on pairwise comparisons, and we are cautious in our interpretation of partial Mantel results. In addition, we focus on comparisons across major genetic groupings, where any potential biases should be comparable. Although we are conservative throughout about linking pattern to process, our results highlight complexes with interesting spatial patterns of genetic and phenotypic variation.
Our regression analysis was performed on 86 individuals and excluded the two samples indicated above. To generate our geographic distance matrix, we used the Geographic Distance Matrix Generator (Ersts, 2018). To generate our genetic distance matrix, we used ngs-Dist (Vieira, Lassalle, Korneliussen, & Fumagalli, 2016) to estimate pairwise genetic distances using genotype posterior probabilities.
To quantify phenotypic distances among individuals, we generated three separate distance matrices: a dorsal coloration distance matrix, a side pattern distance matrix, and a ventral pattern distance

| RE SULTS
Our sequencing efforts yielded a high-quality dataset. After filtering reads for intact barcode sequences and restriction enzyme sites,  Within Cluster A, both our PCA and NGSadmix analysis indicated that each population was a genetically distinct entity. Our NGSadmix analysis identified two groups (K = 2 based on Evanno et al., 2005 method) corresponding to each population ( Figure 3). These populations were also clearly differentiated in our splitsTree analysis, with no admixture occurring between them (Figure 2b). In fact, the pairwise F ST value between these two populations (0.43) was among the highest in our dataset (Figure 4). Within this cluster, Mantel and partial Mantel tests confirmed that genetic distance was most highly correlated with geographic distance (r = 0.9444; p-value = 0.001; Table 1). Although dorsal coloration and side pattern were also correlated with genetic variation, neither phenotypic trait remained significantly associated with genetic distance after accounting for the effect of geographic distance in partial Mantel tests (side pattern: r = −0.1835; p-value = 0.994; dorsal coloration: r = 0.1033, pvalue = 0.129). Additionally, the degree of phenotypic variability was much lower in this cluster than in the others, as side pattern was the only phenotypic trait that was significantly correlated with geographic distance when accounting for genetic distance (r = 0.3006; p-value = 0.008). Phenotypically, populations within this cluster were relatively uniform, and there was little variation in either dorsal coloration or in side and ventral pattern ( Figure 5).
Within Cluster B, we also found evidence for two distinct groups with any of the phenotypic traits that we quantified (Table 1).
Genetic distance was only significantly correlated with geographic distance (r = 0.1038; p-value = 0.018; Table 1). All three phenotypic traits, however, remained significantly correlated with geographic distance when accounting for genetic distance (  higher within-population variation in patterning characteristics, and also displayed novel pattern phenotypes not present in either "parental" species ( Figures 5 and 6). Within each candidate hybrid population, we observed a spectrum of pattern variants rather than discrete morphs.

| D ISCUSS I ON
Our investigation of color diversity and genetic structure in a complex of three closely related species of Malagasy poison frog revealed that the level of intraspecific color variation in this species complex has likely been overestimated, while the occurrence of distinct species has been underestimated. Although not in alignment with current species designations, we found evidence for several clear genetic clusters, each demonstrating a distinctive pattern of genetic and phenotypic variation, suggesting that a number of mechanisms are contributing to color evolution in this complex. Below, we discuss the relevance of our findings for taxonomy, conservation, and the evolutionary processes contributing to phenotypic variation in aposematic organisms. Additionally, we consider the implications of our findings for characterizing morphs in aposematic systems.

| Taxonomic resolution and implications for conservation
Populations within this complex were highly structured at relatively small spatial scales, with distinct genetic groups emerging.
Unexpectedly, genetic clusters identified in this study do not corre- Although previous work has found evidence of haplotype sharing between M. crocea and M. aurantiaca , it is surprising that populations of these two species are identified as a cohesive cluster in our analysis, as they differ greatly in color, pattern and body shape. Our work also demonstrates the  Vences et al., 2004). However, we demonstrate that M. milotympanum is its own distinct genetic entity, Overall, while it is premature to delineate new species on the basis of our analyses-especially considering the significant conservation implications in this system-our study confirms that there are at least three genetically distinct groups that do not correspond to current species descriptions. At the least, it seems clear that Cluster A (composed of green "M. crocea") should be considered a distinct entity and prioritized given its extremely limited distribution (i.e., known from only two isolated locations). Due to the likely hybridization that we detected between Clusters B and C, and the fine-scale population structure that we observed within each cluster, the status of Clusters B and C is less clear. Rather than revising taxonomy prematurely, we recommend additional studies on gene flow and migration, characteristics of frog calls, and mating behavior before species boundaries can be clarified with any certainty.

| Divergent patterns of genetic and phenotypic diversity
Our findings suggest that a variety of processes at different spatial and genetic scales are likely contributing to differentiation in this Malagasy poison frog complex. Within major genetic clusters, we found evidence of highly regionalized patterns of phenotypic and genetic diversity (Figure 3). In our regression analysis, while genetic distance was most highly correlated with geographic distance for Clusters A and C, dorsal coloration was most highly correlated with genetic distance within Cluster B (Table 1). We also identified likely M. crocea-M. milotympanum hybrid populations (populations genetically admixed between Clusters B and C) that displayed intermediate genotypes and novel pattern phenotypes (with especially high within-population pattern variation; Figures 3, 5 and 6).
Hybridization has been hypothesized to be an important mechanism of generating novel phenotypes in other poison frog systems (Medina, Wang, Salazar, & Amézquita, 2013) and may play a similar role here. Based on our findings, patterns of color evolution are likely influenced by variable patterns of drift, selection, and hybridization across major genetic groups in this Malagasy poison frog complex.
Regional diversity in patterns of genetic and phenotypic variation has been found in other frog systems, including the red-eyed treefrog Agalychnis callidryas, and lends support to the hypothesis that modes of diversification can vary substantially at relatively small spatial scales (Robertson & Zamudio, 2009).
Previous work in Oophaga pumilio has demonstrated that male dorsal coloration is an important component of female mating preferences (Maan & Cummings, 2008) and that dorsal brightness is an important signal in male-male competition (Crothers, Gering, & Cummings, 2011), while also confirming that conspicuous coloration is an honest signal to predators (Maan & Cummings, 2011). In fact, studies within this system have even suggested that natural and sexual selection may operate on different aspects of frog coloration, as variation in male brightness, an important component of assortative mating and male-male interactions, is not visible to avian predators . Additionally, there is growing evidence that pattern traits in poison frogs may also be under selection (Barnett, Michalis, Scott-Samuel, & Cuthill, 2018;Rojas & Endler, 2013;Wollenberg, Lötters, Mora-Ferrer, & Veith, 2008). Studies of intrapopulation pattern variation in the poison frog Dendrobates tinctorius have demonstrated that certain pattern traits are correlated with movement behavior, suggesting that patterning elements may also be important in determining how aposematic organisms are perceived by predators (Rojas, Devillechabrolle, & Endler, 2014). In fact, recent work in D. tinctorius indicates that pattern and color function as an aposematic signal to predators at close range, but are perceived as cryptic when viewed from longer distances (Barnett et al., 2018 (Heying, 2001a;Hutter, Andriampenomanana, Razafindraibe, Rakotoarison, & Scherz, 2018;Jovanovic, Vences, Safarek, Rabemananjara, & Dolch, 2009). Although the extent to which snakes and lizards utilize visual signals-especially in relation to olfactory cues-in predation is not fully understood (Maan & Cummings, 2011;Willink, García-Rodríguez, Bolaños, & Pröhl, 2014), lizards have been found to select for lower conspicuousness in the poison frog Oophaga granulifera (Willink et al., 2014). Consequently, there is reason to believe that selective pressures resulting from predation-presumably a major driving force in shaping phenotypic variation in aposematic organisms-may be substantially different for Malagasy poison frogs.
Thus, the Malagasy poison frogs represent an important unique and comparative system for developing generality about the selective factors influencing color evolution. To understand the processes generating the interesting and variable patterns of phenotypic diversity, there is a pronounced need for research on the ecology and life history of these frogs. Fundamental research on Mantella predation, migration, mating behavior, diet, toxicity, and habitat quantification will be essential in formulating explicit hypotheses regarding color evolution.

| Broader implications for defining species and morphs in phenotypically diverse aposematic organisms
Our In this study, although our high-resolution genomic dataset clarified species boundaries at the highest level, the way in which finescale population structure is interpreted has significant implications for how hypotheses of color evolution are framed in this system.
Within Cluster B, for example, if M. aurantiaca and M. crocea are considered to be color morphs of the same species, then one interpretation of our results is that differences in dorsal coloration are driving reproductive isolation and may be a potential mechanism for incipient speciation in this group, as hypothesized in another polymorphic poison frog complex (Wang & Summers, 2010). If considered to be separate species, however, then the dramatic phenotypic differences that coincide with genetic variation may serve to reinforce species recognition and to prevent hybridization. Our study demonstrates that even when genetic structure is well understood, distinguishing between species, subspecies and morphs is not always a straightforward process and may require additional information on behavior, acoustics and mating preferences, among other traits.
For aposematic organisms presumed to be polymorphic or polytypic, our results emphasize that more thorough deliberation is necessary not only in the delineation of species but also in the characterization of morphs. Refining our understanding of what constitutes a morph, and when species or populations are polymorphic and/or polytypic, may be a necessary first step in this regard. Below, we highlight three conceptual areas identified on the basis of our findings that merit more careful consideration in the identification and description of phenotypic morphs.
Although phenotypes are multifaceted, morphs are often defined on the basis of one charismatic trait. Consequently, designating variants of one phenotypic axis as morphs is premature and fails to account for variation across multiple traits. In our study, patterns of variation in coloration and pattern traits were not always concordant; interpopulation differences in dorsal coloration were much more discrete than differences in pattern, although both varied among populations ( Figure 5). Within most candidate hybrid populations, dorsal coloration was relatively uniform, but patterning elements varied almost continuously along a spectrum (Figures 5 and   6). These findings highlight the difficulties of characterizing distinct morphs when there is variation along multiple, potentially correlated phenotypic axes. Similar findings have been reported in other poison frog species, where continuous pattern variation was observed within populations (Rojas & Endler, 2013). In such instances, explicitly describing and quantifying variation-whether it is continuous or discrete-in multiple phenotypic traits is essential. We recommend caution in designating discrete morphs when there may be continuous phenotypic variation, especially along multiple phenotypic axes.
In aposematic organisms, designation of morphs is often based on human-observed phenotypic variation. This raises the question: is there really that much diversity in aposematic signals, when considered from the relevant predator and conspecific visual perspectives? In our study, although populations varied in terms of dorsal chroma and hue, dorsal luminance was largely similar across most populations ( Figure 5). Interpreted in light of evidence suggesting that high luminance contrast can serve as an effective warning signal to predators (Prudic, Skemp, & Papaj, 2007), and that dorsal brightness is an important component of conspecific signaling in poison frogs (Crothers et al., 2011;Maan & Cummings, 2009), biologically meaningful color diversity in this system may be much less than expected. Although we observed high degrees of pattern variation on the side and ventral surfaces of frogs in our study ( Figures 5 and 6), the role of this variation is unknown. Recent work in poison frogs linking patterning elements to movement behavior and detectability by predators (Barnett et al., 2018;Rojas, Devillechabrolle et al., 2014) indicates that pattern variation may be equally, or even more, So, what is a morph? Evolutionary biologists often use the term "morph" without a formal conceptual or operational definition. The research community has thought carefully over decades about how to define and delineate species (e.g., Mayr, 1942;Wiley, 1978;Mallet, 1995;Sites & Marshall, 2004;de Queiroz, 2007). The same attention has not been paid to the morph concept (but see Teasdale, Stevens, & Stuart-Fox, 2013). Not only do we need a more standardized definition of morph but also operational criteria for delineating morphs when there is phenotypic variation along multiple axes and/ or when species boundaries are unclear or dynamic through time.
Our purpose here is not to provide a revised definition of "morph" but rather to highlight the importance of contextualizing phenotypic variation appropriately. When a novel phenotypic variant is discovered, rather than prematurely being described as a new species or morph, it should serve as a launching point for comprehensive studies that integrate phenotypic information with genomic datasets (and, ideally, information on acoustics and behavior). Describing novel phenotypic variants within a species should require-at the least-a high-resolution genetic dataset to confirm intraspecific relationships. In addition, rather than characterizing human-observed variants as morphs, we recommend that researchers specify the facet of phenotype measured, the sensory perspective used to quantify phenotype, and the level of biological organization where variation is observed. The way we define and delineate morphs is relevant in many systems, but carries particular significance in aposematic organisms, where novel color and pattern variants are regularly found. If we are overestimating intraspecific color variation and underestimating species diversity-as was found in this study-the classification of populations and species as polymorphic or polytypic may need to be reevaluated. Ultimately, it is time to give as much consideration to the conceptual and operational morph delineation as has been given to species.

ACK N OWLED G M ENTS
We thank S. Ndriantsoa for his invaluable assistance in the field. We also thank M. Vences

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R CO NTR I B UTI O N S
KK designed the research with input from EBR. KK performed the field work and lab work. KK and KB analyzed the data. KK and EBR wrote the paper with contributions from KB.

DATA ACCE SS I B I LIT Y
RADseq data will be deposited in the NCBI Sequence Read Archive (SRA).