Occurrence data uncover patterns of allopatric divergence and interspecies interactions in the evolutionary history of Sceloporus lizards

Abstract As shown from several long‐term and time‐intensive studies, closely related, sympatric species can impose strong selection on one another, leading to dramatic examples of phenotypic evolution. Here, we use occurrence data to identify clusters of sympatric Sceloporus lizard species and to test whether Sceloporus species tend to coexist with other species that differ in body size, as we would expect when there is competition between sympatric congeners. We found that Sceloporus species can be grouped into 16 unique bioregions. Bioregions that are located at higher latitudes tend to be larger and have fewer species, following Rapoport's rule and the latitudinal diversity gradient. Species richness was positively correlated with the number of biomes and elevation heterogeneity of each bioregion. Additionally, most bioregions show signs of phylogenetic underdispersion, meaning closely related species tend to occur in close geographic proximity. Finally, we found that although Sceloporus species that are similar in body size tend to cluster geographically, small‐bodied Sceloporus species are more often in sympatry with larger‐bodied Sceloporus species than expected by chance alone, whereas large‐bodied species cluster with each other geographically and phylogenetically. These results suggest that community composition in extant Sceloporus species is the result of allopatric evolution, as closely related species move into different biomes, and interspecies interactions, with sympatry between species of different body sizes. Our phyloinformatic approach offers unique and detailed insights into how a clade composed of ecologically and morphologically disparate species are distributed over large geographic space and evolutionary time.


| INTRODUC TI ON
The current availability of geographically informed, organismal data offers opportunities for new insights into evolutionary processes such as trait divergence and sympatric diversification that have been traditionally studied through time-intensive fieldwork.
Often, selective forces lead to phenotypes that allow species to occupy distinct ecological niches while living in sympatry with other closely related species (Lack, 1947;López Juri et al., 2015;Pacala & Roughgarden, 1982). This is particularly apparent in adaptive radiations in which species evolve specialized morphologies that allow them to exploit distinct microhabitats. For example, the Caribbean anoles have evolved disparate limb morphologies that allow them to occupy different ecological niches, thereby reducing competition (Losos, 1992;Losos et al., 1998). Similar examples of adaptive radiation and convergent evolution (ecomorphs) have been found in other island species (Darwin's finches: Grant & Grant, 2006, Hawaiian silverswords: Blonder et al., 2016, Hawaiian spiders: Gillespie, 2004 and on larger scales, for example, in Europe and North America fishes (Lamouroux et al., 2002). Coexistence with closely related species and niche-partitioning are clearly also important drivers for mainland communities (Kartzinel et al., 2015;Mujic et al., 2016), but their effects can be difficult to detect, for example, as in the composition of plants in the Amazon rain forest (Kraft & Ackerly, 2010) and the assemblage of bat species on a latitudinal gradient across the Americas (Stevens et al., 2003). In part, these mainland community patterns are obscured by power limitations of the existing methods (Kraft & Ackerly, 2010) and the relatively small sample sizes imposed by the time-intensive nature of collecting the necessary data. Here, we use the power of massive data available in online databases and recent advances in data manipulation and statistical modeling to study patterns of sympatry and body size across a diverse genus of lizards.
Gause's competitive exclusion principle dictates that two species occupying the same niche should not be able to coexist (Gause, 1934;Mayfield & Levine, 2010). However, it has been long known that closely related taxa occur together more often than expected by chance alone (Elton, 1946;Moreau, 1948;Williams, 1947) and may also expand the range of physical conditions in which other species can live (e.g., by providing refuges: Bulleri et al., 2016) or shape phenotypes through character displacement (Dayan & Simberloff, 2005;Losos, 2000;Moritz et al., 2018;Schluter & McPhail, 1993). For example, the plethodontid salamanders, Plethodon hoffmani and P. cinereus, are identical when living in allopatry (Adams & Rohlf, 2000). However, when the two species are in sympatry, they exhibit differences in jaw morphology that allow them to partition their prey resources so that P. hoffmani eats larger, faster prey while P. cinereus eats smaller, slower prey (Adams, 2004). Sympatric European lacertid lizards similarly exhibit small changes in head morphology that allow the partitioning of food resources by species, sexes, and age category (Herrel et al., 2001). Other studies have emphasized evolution via selective migration (Gillespie, 2004;Li et al., 2015), phenotypic convergence in older lineages (Tobias et al., 2014), and interactions with phylogenetically unrelated competitors (Wilcox et al., 2018). Theoretical discussions of these forces (e.g., Edwards et al., 2018;Mittelbach and Schemske 2015) suggest that there is still much to learn about the selective impact of phenotypically similar sympatric taxa.
Body size is a convenient and effective trait for analyzing the relationship between abiotic factors, like climate, and morphology (Bergmann, 1847), and often serves as a starting point to understand trait divergence in sympatric species (Echeverría-Londoño et al., 2018;Thomas et al., 2009). The relationship between body size and sympatry has been studied in many groups of amphibians and reptiles (Dunham et al., 1978;Kozak et al., 2009;Moen & Wiens, 2009;Okuzaki et al., 2010;Soule, 1966). For example, in the Australian Gehyra lizards, differences in body size are greater in sympatric lineages than in allopatric lineages (Moritz et al., 2018) suggesting that body size divergence is a key element of the process by which species coexist.
The genus Sceloporus serves as an ideal system to study the forces involved in the evolution of species in sympatry and how body size is distributed over geographic space and evolutionary time. Sceloporus lizards, also known as spiny or fence lizards, have been well studied in terms of behavior (Carpenter et al., 1977;Hews et al., 2013), physiology (Angilletta et al., 2004;Beal et al., 2014), and phylogenetic relationships (Leaché et al., 2016;Sites et al., 1992;Wiens et al., 2010). Moreover, the genus Sceloporus has a large geographic distribution that encompasses much of North America and Central America (Sites et al., 1992), is speciose with nearly 100 species (Leaché et al., 2016), and possesses a tremendous amount of ecological diversity inhabiting lowland deserts, high-elevation alpine forests, and grasslands (IUCN, 2018). Sceloporus species also can be found in sympatry with other congeners at varying degrees, but this has only been studied at a small, local scale (Grummer et al., 2015;Serrano-Cardozo et al., 2008). Although the genus Sceloporus does not possess a tremendous amount of morphological disparity, it offers an intriguing opportunity for discovery analyses using publicly available data over a large geographic area.
Here, we use information from large, open-source databases to study the biogeographic and phenotypic patterns of congeneric sympatry among Sceloporus species. First, we use spatial distribution data and phylogenies to ask how species are spatially grouped, and how sympatry relates to phylogenetic divergence. Specifically, do closely related species cluster geographically or are they evenly spread across the landscape? In particular, we apply a bioregions approach (Vilhena & Antonelli, 2015). Bioregions are a powerful way to describe species distributions and can be used in evolutionary biology to infer historical processes that led to current day distributional patterns (Matzke, 2014;Ree & Smith, 2008). We also test whether characteristics of each geographic area correlate with species richness to determine if particular factors are contributing to the biogeographic patterns of the clade.
Then, we infer how body size is distributed over different spatial regions and has evolved through time. Finally, we combine occurrence data, phylogenies, and body size to ask whether species of different body sizes co-occur more often than by chance to understand how phenotypic diversity varies across space.

| The biogeography of congeneric sympatry
We obtained point-occurrence data for 69 species of Sceloporus lizards from the Global Biodiversity Information Facility (Tgbif, 2018), using the "rgbif" v0.9.9 package (Chamberlain,) in R (R Core Team, 2017). The most recent phylogenetic analysis of the genus by Leaché, Banbury (Leaché et al., 2016) identifies 97 Sceloporus species and sampled 86 species to construct the phylogeny. Our study includes 69 of these species that are also represented in GBIF. The remaining species are known from very few records. The 13 species that are identified by the Leaché, Banbury (Leaché et al., 2016) phylogeny but not in our current study are roughly evenly distributed geographically, phylogenetically, and by size, such that excluding them from our analyses is unlikely to bias our results. For each species in our study, we removed all points that were likely errors, clearly falling outside of the geographic ranges provided by International Union for Conservation of Nature (IUCN, 2018). Our final data set consisted of 133,665 occurrence points for the 69 species (Table 1).
We used the Infomap Bioregions application to determine how species are spatially grouped (Edler et al., 2017). We used this approach because the program uses a novel adaptive resolution method that is less sensitive to biases from incomplete species distribution data (Edler et al., 2017). Infomap Bioregions is a network approach (Vilhena & Antonelli, 2015) that uses adaptive spatial resolution to cluster pointoccurrence data from multiple species into geographic grid cells. The software creates a bipartite network of species and grid cells, connecting each species to geographic cells in which it occurs. The program then aggregates geographic cells for which there are occurrence points into larger geographic regions, creating larger regions in areas with similar species composition. The result is a detailed map with designated bioregions that are composed of unique species assemblages.
The Infomap Bioregions application has several input parameters including maximum and minimum cell size and maximum and minimum number of occurrence points per cell. We tested various combinations of parameters and found that a minimum cell size below 2° produced patchy, discontinuous bioregions. Setting the maximum cell size above 4° led to large, uninformative regions that included areas where we know that species do not co-occur. As a result, we set the maximum cell size to 4° and the minimum cell size to 2°. Changing the maximum and minimum cell capacities (number of incident points/cell) had little impact on our findings except in slightly changing the shapes of a few bioregions. Therefore, we used the default minimum and maximum cell capacities of 100 and 10, respectively. We also provided the Leaché, Banbury (Leaché et al., 2016) phylogenetic tree as an input parameter for the analysis. Once the software produced the bioregions, we summarized the total number of biomes as described by Olson, Dinerstein (Olson et al., 2001) within each bioregion. Lastly, we used a linear model to explore the relationship between biome heterogeneity and species richness across bioregions.

onto a map of North
America. Then, we linked each taxon to the map at the mean latitude and longitude for all occurrence records for that species. We omitted species that were not included in the Leaché, Banbury (Leaché et al., 2016) phylogeny. Additionally, we used the package "picante" (Kembel et al., 2010) to calculate the mean pairwise distance (MPD) between species assemblages to test whether bioregions show a pattern of phylogenetic overdispersion or underdispersion. We used the standardized effect size of the MPD and p-value quantiles to evaluate the results.

| Body size evolution
We gathered Snout-to-Vent Length (SVL) measurements for adult males from each of the Sceloporus and outgroup species from the literature ( Table 1) to test how body sizes have evolved through evolutionary time and are distributed across the phylogeny. First, we generated a histogram with the mean SVLs of each Sceloporus species and used the natural breaks in the data to visually generate three body size categories: large, medium, and small. We then used "phytools" v0.6-99 (Revell, 2012) in R (R Core Team, 2017) to reconstruct ancestral body sizes of the continuous SVL values using three models of evolution: Brownian motion (BM), Ornstein-Uhlenbeck (OU), and Early-burst (EB). We used log-likelihoods to compare the model fit and checked for convergence. We then estimated the degree of phylogenetic signal (Pagel's λ) for the SVL size data to determine the degree to which phylogenetic history constrains body size evolution. We performed this analysis using phylogenies from both Leaché, Banbury (Leaché et al., 2016) and Wiens, Kuczynski (Wiens et al., 2010), but since our results were similar both in terms of phylogenetic signal and ancestral state reconstruction, we present only the results using the Leaché, Banbury (Leaché et al., 2016) phylogeny here.

| Sympatry and body size evolution
For each species, we estimated the degree to which the focal taxon occurs with species of the same or different body size. To do this, TA B L E 1 Number of occurrence points used to characterize the range of each species, the mean Snout-to-Vent Length (SVL, with number of individuals used to obtain that estimate in parentheses), and the published reference from which we obtained body size data. Also included are size data for the outgroup used in the ancestral reconstruction analysis Species N occurrence points

| Patterns of sympatry across bioregions
Using the Infomap Bioregions application and the GBIF occurrence data from 69 species of Sceloporus, we identified 16 distinct regions ( Figure 1; Table 2)

| Body size evolution
We found three discrete size categories for the Sceloporus clade   Figure 6). In addition, body size categories tended to cluster phylogenetically ( Figure 6). An estimate of phylogenetic signal confirmed that body size is evolving with high phylogenetic constraint (Pagel's λ = 0.99; Log likelihood = 270.34). Thus, closely related species tend to be similar in body size. Increases in body size occurred abruptly in a few lineages and were then retained over long periods of evolutionary time ( Figure 6).

| Sympatry and body size evolution
Our chi-squared analysis found evidence that small Sceloporus species are spatially distributed so that they often co-occur with different sized species (Appendix S2). Conversely, large species were most often found to co-occur with other large species than by chance alone. More than % of small-bodied Sceloporus species   where medium and large species were overrepresented. Species also seem to sort geographically so that small-bodied species co-occur F I G U R E 2 A map showing the recovered bioregions from the clustering analysis. Also shown is the phylogeny used in the analyses modified from Leaché, Banbury (Leaché et al., 2016). Next to the species is a pie chart indicating which bioregion each species belongs to and colors of the pie slices correspond to colors of the bioregions in the map  more often with larger-bodied species, perhaps to reduce ecological competition or interference like in other lizard species (Losos, 1990;Moritz et al., 2018). In contrast, large-bodied Sceloporus species most often co-occur with other large-bodied species, perhaps reflecting phylogenetic constraints on body size evolution and limited dispersal. Future studies may want to investigate the relationship between species distributions, body size, and habitat at smaller spatial scales and to explore functional diversity through more detailed analysis of phenotypes.

| Bioregions and sympatry
Sceloporus species are unevenly distributed across North America, with some geographically large regions, like the southern and western United States, having few species, whereas geographically smaller regions, like central Mexico and south Central America, have many species. This pattern seems to follow the latitudinal diversity gradient where biodiversity increases from poles to the tropics (Hillebrand, 2004) and Rapoport's rule where species at higher latitudes have lower rates of spatial turnover and larger ranges (Stevens, 1989). The undulatus clade exemplifies this as these nine northern species span the entire United States. Conversely, species assemblages at lower latitudes tend to have higher richness and turnover rates (Stevens, 1989). This also seems to be the case in Sceloporus as the small area from central Mexico to Panama harbors 65 species, over half of the total diversity of the genus. Other factors may also contribute to the high species richness in relatively small bioregions. For example, the Trans-Mexican volcanic belt (Region 11), which formed during the Eocene and Miocene periods (Ferrari et al., 2012), is a relatively small geographic region that includes several volcanic peaks that rise above 3,000 m, and the region is a biodiversity hotspot with a string of sky islands (Halffter & Morrone, 2017;Mastretta-Yanes et al., 2015). Likewise, Region 14 harbors 16 species, is close to the equator, and has a large mountain range named Cordillera de Talamanca that traverses Costa Rica and Panama. This high habitat heterogeneity likely contributes to increased allopatry and speciation within these two regions. This was also supported by our correlation analysis showing that regions with more elevation heterogeneity have a higher species richness compared to regions with low elevation heterogeneity. This pattern may also be explained by the elevation diversity gradient where biodiversity is highest at middle elevation (McCain, 2005), but this has yet to be explicitly tested. Additionally, the lower turnover rates in areas close to the equator may because Central American is vastly smaller in area compared to North America so species tend have smaller ranges.
Early research suggested that the genus Sceloporus originated in Mexico given the diversity seen today (Hall, 1973). More recent studies, however, found that Sceloporus most likely originated much further north in what is now the northern United States and F I G U R E 4 Maps depicting the geographic center for a species' range separated by clade. Dashed lines are used to link the species with their respective range center. Panel (a) represents the (angustus + siniferus + variabilis) clades, panel (b) represents the (merriami + pyrocephalus + gadoviae + jalapae) clades, panel (c) represents the (graciosus + magister + scalaris) clades, panel (

TA B L E 3
Results from the mean pairwise distance (MPD) community structure analysis. We used the standardized effect size and p-value quantiles to evaluate the results Canada (Lawing et al., 2016). Based on fossil evidence and paleoclimate reconstructions, Mexico was too hot and dry and, therefore, unsuitable for the genus until about 6 million years ago (Lawing et al., 2016). We note that Lawing, Polly (Lawing et al., 2016)  In addition to Rapoport's rule, the lack of diversity at higher latitudes may reflect historical events like glacial maxima and shifts in climate regime. If taxa were unable to find refuges from increasing glacier event, species would fail to recolonize and be extirpated (Grubb et al., 1987) leaving only lineages farther south.
Second, the emergence of new unsuitable habitat may also contribute to this pattern. Sceloporus lizards may have been extirpated in Canada and parts of the United States due to the rise of the Rocky Mountains, which blocked moisture from the midcontinent (Dimmitt et al., 2015). This allowed the Great Plains of North America to emerge but also created more xeric conditions (Dimmitt et al., 2015) (Búrquez et al., 1986). Sceloporus species tend to increase their activity, including foraging, during the peak mating season, and, at least in males, reducing activity during nonbreeding seasons (Rose, 1981).
By employing strategies like off-setting breeding seasons or modification of diet and activity through the year, species may be able to live in sympatry with reduced antagonistic interactions.
A limitation in our study was the lack of intraspecific variation in the body size data. Phenotypic variation has long been known to be an important measure that can affect the response to selective pressures, how species interacts with abiotic and biotic factors, and even community dynamics (Bolnick et al., 2003). Thus, ignoring intraspecific variation may overlook fundamental processes that dictate species assemblages and the ecological breadth of a species (Violle et al., 2012). The lack of intraspecific size data can become an issue for species with large distributions, like S. grammicus and S. variablis, because populations are bound to differ in body size and sampling effort may not be even across the range. This may have led to incorrect classification of the size, small versus medium, for example, which would affect our chi-squared analysis, ancestral reconstruction, and understanding of how species' phenotypes are distributed. This can be overcome by measuring a large number of individuals and equal sampling efforts across the range of the species. In our case, S. grammicus and S. variabilis both had large sample sizes, 412 and 457 individuals, respectively, but some species, like S. zosteromus may be more vulnerable to bias.

| A different approach to studying diversification
As large data sets become more accessible through open-source databases, we can begin to address questions at a larger scale. For example, GBIF is thought to be a good estimate for species richness completeness in both the United States (~75%-80%) and Mexico (~60%) (Meyer et al., 2015) allowing researchers to address questions about all of North America, rather than a small area. Additionally, much of the data that populates large databases come from reputable sources like museums. For example, between 70% and 80% of occurrence data for the clade Reptilia comes from museums vouchers and are accurate in terms of species identification and locality.
Although little work has been done describing bioregions in terms of ectotherms in North America, a study by Burbrink and Gehara (Burbrink & Gehara, 2018) found that the clade Lampropeltis, New World kingsnakes, showed a similar composition of bioregions as found in our study. For example, we found distinct bioregions in the west coast, east coast, central, and southwest United States and this same pattern was found by Burbrink and Gehara (Burbrink & Gehara, 2018); however, the size of the bioregions slightly dif- Many insular examples exist of closely related species occupying a single geographic region and exhibiting the phenotypic consequences that arise from such sympatry (Lack, 1947;Losos et al., 1998;Seehausen, 2006). These examples have led to groundbreaking work in evolutionary biology and pushed the field forward, yet they may represent the exception rather than the rule. Here, we emphasize and promote the use of open-source databases to perform macroscale analyses across large geographic areas that were previously difficult to achieve. Additionally, large-scale analyses provide a novel view into continental systems to gain new insights on macroevolutionary processes that create large-scale patterns of diversity.

ACK N OWLED G M ENTS
We would like to thank Jesualdo Fuentes-G., Cristina Romero-Diaz, Melissa Lopez, Tamal Roy, and Piyumika Suriyampola for their feedback. We also would like to thank two anonymous reviewers for their insightful comments that strengthened this study.

CO N FLI C T O F I NTE R E S T
None declared.

DATA AVA I L A B I L I T Y S TAT E M E N T
All data in this study were downloaded from open-source material online. See Methods section of the manuscript for details. Generally, occurrences were downloaded from GBIF, and size data were collected from published and unpublished literature.