EcoPhyloMapper: An r package for integrating geographical ranges, phylogeny and morphology

Spatial patterns of species richness, phylogenetic and morphological diversity are key to answering many questions in ecology and evolution. Across spatial scales, geographical and environmental features, as well as evolutionary history and phenotypic traits, are thought to play roles in shaping both local species communities and regional assemblages. By examining these geographical patterns, it is possible to infer how different axes of biodiversity influence one another, and how their interaction with abiotic factors has led to the spatial distribution of species assemblages—and their attributes—that we observe in the present. Although there has been interest in this area of research for some time, it has recently become more tractable to include multivariate shape data in such analyses. Shape information has the potential to provide a more direct measure of the functional morphology of species as compared to individual trait measurements and might be more relevant to understanding community composition. However, few tools currently exist to explore geographical patterns of both phylogenetic and shape diversity. We present the EcoPhyloMapper r package (epm) that aims to streamline the handling of geographical range polygons or point occurrences and integration of resulting species metacommunities with phylogenetic trees and morphological shape. Geographical maps can be generated that demonstrate spatial patterns in diversity metrics pertaining to phylogenetic similarity, multivariate shape similarity and disparity, and combinations of the two. Patterns of taxonomic, phylogenetic and shape disparity turnover can also be visualized. Biodiversity indices summarized across grid cells can easily be exported to GIS software as well as to other r packages that specialize in community assembly or geospatial statistics. This r package will facilitate the geographical exploration of multivariate shape data in concert with phylogenetic diversity, which will, in turn, support macroecological research exploring how species assemblages are structured. Furthermore, this r package should prove useful across a wide range of macroecological applications that extend beyond the study of morphology.


| INTRODUC TI ON
The morphological and phylogenetic affinities of species in a community can reveal a great deal about the processes that have shaped and maintain assemblages (Bowers & Brown, 1982;Brown, 2012;Carlucci et al., 2016;Montaño-Centellas et al., 2019;Rabosky et al., 2011). At small spatial scales, we refer to 'communities' as groups of coexisting species, where morphological or phylogenetic attributes may point to the mechanisms that promote such coexistence (Bowers & Brown, 1982;Webb et al., 2002). At broader spatial scales, we refer to 'assemblages' or 'metacommunities' as the regional species pools of a particular taxonomic group that local communities are drawn from, and morphological or phylogenetic attributes may be used to infer how abiotic factors and evolutionary history have contributed to the geographical distributions of these large-scale assemblages (Maestri et al., 2018;Mishler et al., 2014;Ricklefs, 2004). Some important traits, like body size, can be analysed as a single trait, but others are inherently multivariate, such as shape. Many complex traits are related to species' functional ecology, and these are increasingly of interest in ecology and evolutionary biology (Cooney et al., 2017;Maestri et al., 2018;Rüber & Adams, 2001).
The manner in which species in communities are arranged in morphological space (hereafter morphospace), and the degree to which they overlap with each other in this morphospace can provide useful information regarding how likely species are to be ecologically similar, and therefore how likely they are to be competing with each other for the same resources (Cooney et al., 2017;Pellissier et al., 2018;Pigot et al., 2016;Ricklefs et al., 1981;Ronco et al., 2021).
The overall size and shape of the morphospace occupied by species in a community can also be informative regarding evolutionary or physiological constraints that may prevent species from evolving in particular ways to acquire certain morphologies (Bright et al., 2016;Pigot et al., 2020).
There is also a rapidly growing body of literature detailing the ways in which phylogenetic relationships between species in a community can inform us about past processes that have shaped the assemblages we see today across spatial scales (Cavender-Bares et al., 2009;Fine & Kembel, 2011;Graham et al., 2009;Kraft et al., 2007;Webb, 2000). Hypotheses have been proposed that attempt to link characteristics of community-level phylogenetic distances (such as the degree of over-dispersion and clustering) to specific assembly mechanisms (Helmus et al., 2007;Webb et al., 2002), although how exactly those null models should be generated remains an active area of research (Miller et al., 2017).
At its core, community structure is shaped by species' traits (which will dictate which resources they can utilize and how, as well as interactions between species), their evolutionary history and their environment. By exploring how the relationships between these dimensions change with species richness, we can learn a great deal about the factors that promote or inhibit species accumulation, and the abilities of particular clades of species in expanding or partitioning their morphological space (Pigot et al., 2016;Swenson & Weiser, 2014). At broader spatial scales, environmental factors, evolutionary history and trait evolution will also leave their imprints on the geographical distribution of regional species assemblages, potentially revealing eco-evolutionary dynamics that have contributed to the assembly of continental biota (Holt et al., 2017;Huang et al., 2012;Maestri et al., 2018).
With a growing appreciation that evolutionary history, morphology and species richness are interconnected and all play a role in shaping the geographical distributions of species pools at differing spatial scales (Kraft et al., 2007;Ricklefs, 2004), there has been an increasing interest in quantifying and exploring biodiversity across multiple axes (Huang et al., 2012;Mazel et al., 2014;Nakamura et al., 2020;Talluto et al., 2018). Morphological data, both size and shape, lend themselves well to the analysis of multi-faceted biodiversity, as they are metric and the complexity of shape data often captures information about multiple dimensions of ecological adaptations. Refinements in photogrammetry and μCT scanning techniques are helping to facilitate the digitization of museum specimens (Medina et al., 2020), leading to the expanding collection of images that could be used for geometric morphometrics. In parallel, molecular phylogenies are being generated for large swaths of the tree of life Rabosky et al., 2018;Smith & Brown, 2018;Tonini et al., 2016;Upham et al., 2019). Meanwhile, species occurrence records and geographical range data have become increasingly accessible (IUCN, 2021;Robertson et al., 2014;Roll et al., 2017), alongside rich environmental and climatic datasets that cover the globe (Karger et al., 2017;Tuanmu & Jetz, 2014). There is now enormous potential to explore how axes of biodiversity intersect and interact across a broad range of spatial and phylogenetic scales.

| Aims of EcoPhyloMapper
The primary aim of the EcoPhyloMaPPEr (EPM) r package is to provide a unified framework within which to integrate spatial, phylogenetic and morphological shape data. Some tools already exist, for example to generate gridded datasets for macroecological analysis (Vilela & Villalobos, 2015), to integrate morphometric and phylogenetic data [r packages gEoMorPh (Adams & Otárola-Castillo, 2013) and MvMorPh (Clavel et al., 2015)] or to integrate phylogenetic and geographical range data [the Biodiverse software (Laffan et al., 2010), sPacodir (Eastman et al., 2011), PcPs r package (Debastiani & Duarte, 2014), nodiv r package (Borregaard et al., 2014), PhylorEgions package (Daru et al., 2020), among others], but few generate geographical patterns of phylogenetic and morphological diversity for subsequent analysis and visualization. Otherwise, such analyses are limited to those with programming expertise that encompasses geospatial, phylogenetic and morphometric methods. By providing this package in r, we allow for a more seamless integration in the programming environments within which researchers manipulate datasets and perform other analyses while taking advantage of the rich universe of existing r packages and graphics capabilities. More generally, the aim of the EPM r package is to provide a useful set of tools for anyone working with species-specific geographical data and flexibility allows for the mapping of a broad range of species attributes, phenotypic or otherwise.

| Novelty
The EcoPhyloMaPPEr package is unique in its ability to calculate diversity and disparity metrics for multivariate shape data alongside phylogenetic and taxonomic diversity metrics, all in a geographical framework ( Figure 1). Underlying C++ code is leveraged via Rcpp (Eddelbuettel & François, 2011), and organizational shortcuts are undertaken such that geographical grids of diversity metrics can be efficiently generated for datasets that contain many species, or at high spatial resolution. Additional functionality is provided to enable subsequent statistical analysis of the resultant diversity maps. A number of commonly employed diversity metrics are implemented; we additionally present novel approaches for examining spatial patterns in morphological shape disparity.

| Geographical input
The EPM r package requires, at a minimum, geographical data at the species level. This can be provided in a range of formats, including polygons, raster grids, point occurrences or as a presence/absence table of species by sites. Once geographical range data have been processed, a phylogeny may be added, as well as univariate and/ or multivariate trait data. Although it would be ideal to have identical species coverage for all data types, this is not a requirement.
Phylogenetic or trait data are subset as appropriate, depending on the diversity metrics requested.
All calculations of diversity metrics are performed across grid cells of species. Therefore, if geographical range data are provided in the form of polygons, then they must be converted to a common grid system. The input data define the coordinate system of the output grid. For instance, if the species range data are defined with an equal area projection (which may be the logical choice for calculating diversity metrics), then the output grid system will have an equal area projection as well. Two approaches are implemented for assigning range polygons to grid cells ( Figure 2). For the first approach (method 'centroid'), a range polygon is recorded as present for a grid cell if that polygon intersects the centroid coordinates of the grid cell. For the second approach (method 'percentOverlap'), a range polygon is recorded as present for a grid cell if it overlaps anywhere with that grid cell by a minimum percent threshold (by default, 25%). These two approaches will converge as spatial resolution increases. The choice of method will depend on the application at hand. In general, the 'centroid' method will tend to under-estimate the area occupied, whereas the 'percentOverlap' method will tend to over-estimate ( Figure 2).
The EPM package can employ either hexagonal or square grid cells ( Figure 2). Hexagonal grids offer a number of benefits over square grids, including improved conversion of nonlinear features to grids (hexagonal grids can more naturally follow contours), and allow for a more intuitive definition of what cells would be the direct neighbours of a focal cell (Birch et al., 2007). With hexagonal cells, all immediate neighbours are equidistant and share a border with the F I G U R E 1 Conceptual diagram illustrating the framework and purpose of the EPM r package. Individual species' geographical ranges are converted to a standard grid system and metacommunity composition is determined. Incorporation of phylogenetic and morphometric information then allows one to calculate alpha and beta diversity metrics that relate to metacommunity composition and maintenance F I G U R E 2 The choice of approach for converting polygons to grid cells and the type of grid cell shape lead to different outcomes. Grid cell neighbours are also different, depending on the cell shape focal cell, whereas with square cells, cells connecting to the focal cell via contact at the corners are more distant than cells that share a border ( Figure 2, Birch et al., 2007). In the EPM package, square grid cells are much more memory efficient (as currently implemented), and will therefore be most appropriate for datasets where a very large number of cells will be generated (i.e. large spatial extent, and/or high spatial resolution). Hexagonal grids are handled by the sf package (Pebesma, 2018), and square grids by the tErra package (Hijmans, 2022).
An additional option is also included for how to handle species with small ranges. Highly endemic and small-ranged species can potentially have ranges that are so small that they will not register in any grid cell. The chances of this will increase as the resolution of the grid decreases. Depending on the objectives of the researcher, it may be important to include endemic, rare or potentially sensitive taxa in subsequent analyses (Laffan et al., 2016). Such taxa may, for instance, draw attention to regions of particularly fast diversification (such as in topographically complex regions; Quintero & Jetz, 2018), or high ecological divergence. Therefore, if the 'retainSmallRanges' option is enabled, species that would otherwise be dropped due to their small range size are forced to remain present in the single cell that contains the majority of their range (or multiple cells if the range is non-contiguous).
What the resulting sets of species per grid cell represent will depend on the underlying geographical data. If the input data represent highly detailed range polygons from survey work, and/or from species distribution modelling, then the resulting sets of species per grid cell may potentially be thought of as representing the local communities or metacommunities at those sites (Ellis-Soto et al., 2021).
If the input data are extent-of-occurrence range polygons, then the resulting sets of species may more appropriately be thought of as regional species pools that contribute to the local communities found in those grid cells (Belmaker & Jetz, 2012;Harrison & Cornell, 2008).
These distinctions will depend on the spatial resolution of both the input data and the geographical grid used for analysis, and are left to the discretion of the researcher.

| Morphological
Although the EPM r package was specifically designed to examine geographical patterns in phylogenetic and morphometric shape data, it was also designed with flexibility in mind; therefore, several types of trait data can be explored, be they univariate, multivariate shape or pairwise data (not all metrics can be calculated for all types of data).
Multivariate analyses should be restricted to variables measured on a common measurement scale so that covariances among traits are meaningful and distances between multivariate species are Euclidean; for example, coordinates of anatomical landmarks from Procrustes superimposition, projected onto the tangent plane, performed by the gpagen function in the gEoMorPh r package (Adams & Otárola-Castillo, 2013). Other types of morphological data may be analysed as well, so long as they satisfy these criteria. Shape data may be analysed as either two-or three-dimensional landmark coordinate data, superimposed and converted to a 2D array format prior to being imported into the EPM r package.

| Phylogenetic
An ultrametric phylogeny can also be integrated, allowing for the calculation of diversity metrics that include phylogenetic relationships between species in grid cells ( Table 1)

| THE epm G rid OBJEC T
The core data object in the EPM package is of custom class epmGrid.
This object contains a list of all unique sets of co-occurring species that are found in grid cells, as well as a reference mapping unique species sets to specific grid cells. This structure allows for a more condensed and efficient way to store community information, and also lends itself to faster calculations of diversity metrics, as calculations are only executed for unique communities. We refer to these species combinations as sets of species rather than as communities or species pools, to avoid any ecological implications, as these sets of species can be defined for any spatial scale. The epmGrid object also stores the trait and phylogenetic data, if those have been supplied.

| By-cell grid metrics
The function gridMetrics() allows for the calculation of a number of diversity metrics (Table 1), where the calculation is done for the species sets in each grid cell. Of the diversity metrics made available in the EPM package, some can be applied to univariate, multivariate or pairwise trait data, and some are specifically intended for multivariate shape data. Other metrics can only be calculated if a phylogeny has been provided, and some metrics are range-weighted, such that taxa with smaller geographical distributions (or in the case of phylogenetic range-weighted metrics, phylogenetic branches representing smaller geographical distributions) are emphasized (Rosauer et al., 2009). To not limit researchers to those metrics currently implemented, the function customGridMetric() additionally allows one to define their own grid metric.
Although a set of diversity metrics has been implemented in the EPM package (Table 1), several studies have demonstrated that the large proliferation of such metrics has led to a certain degree of redundancy such that multiple metrics exist that effectively measure the same thing (Tucker et al., 2016;Miller et al., 2017). We therefore urge caution and point readers to these studies to better understand what metrics may be most appropriate.

| Turnover metrics
In addition to diversity metrics calculated for each species community independently, we include functionality to generate maps of taxonomic and phylogenetic beta diversity, as well as turnover in morphological disparity. The beta diversity functions in the EPM package utilize the framework proposed by Baselga (2010Baselga ( , 2012 where beta diversity is decomposed into two additive components: dissimilarity resulting from turnover and dissimilarity resulting from differences in species richness (nestedness). These measures can be calculated based on the species communities exclusively (function betadiv_taxonomic()) as well as based on species communities and phylogenetic relatedness (function betadiv_phylogenetic()).
Custom metrics can also be implemented via the customBetaDiv() function. Beta diversity can be quantified across the geographical extent in two ways: (a) in terms of a given reference location such that the resulting map represents community dissimilarity of all grid cells relative to the focal location and (b) via a moving window such that the beta diversity values in the resulting map represent multisite local dissimilarity within a certain radius of any given grid cell (Baselga, 2013;Leprieur et al., 2012).
A novel turnover metric for morphological shape disparity is made available in the EPM package (function betadiv_disparity()).
This function generates a map that reflects the degree to which local-scale variation in species composition reflects change in morphological disparity. To do so, we first calculate total morphological disparity for all species, D total as well as species-specific partial disparities SpDisp (function getSpPartialDisparities()), following Foote (1993) such that the sum of SpDisp equals D total . We then define a focal cell and its neighbouring cells containing species sp NB_1 , sp NB_2 , …, sp NB_n , where n is the number of cells within a moving window defined by some radius.
We then calculate morphological disparity turnover as: If all species communities across a set of neighbouring grid cells contribute equally to overall morphological disparity, then this value will be zero, but the greater the variation in relative disparity across grid cells, the larger β disparity will be. This measure will allow one to identify regions that disproportionately influence morphological disparity, and the abiotic or biotic features that may be driving this variation.

| Additional functions
Various functions have also been included in this r package to facilitate the export of data to other r packages, statistical or GIS software. The function generateOccurrenceMatrix() converts ep-mGrid objects into a presence/absence matrix with sites as rows and taxa as columns, and the function coordsFromEpmGrid() returns the grid cell coordinates. This is a common format for other r packages that focus on species community structure. The function calcMeanShape() is intended for multivariate shape and will return the mean shape per grid cell (see Maestri et al., 2018). The function tableFromEpmGrid() extracts the data attributes from a set of epmGrid objects, as well as other spatial datasets, at either random or predefined coordinates. The primary purpose of this function is to provide the required inputs for geostatistical analyses. Finally, the function writeEpmSpatial() writes the spatial component of the epmGrid object to disk, allowing it to then be loaded into other GIS software.

| AN EMPIRI C AL E X AMPLE: NORTH AMERIC AN SQUIRREL S
To demonstrate some of the capabilities of the EPM package, we applied this framework to 68 species of North American squirrels (Sciuridae). We downloaded geographical range polygons for mammals from IUCN (2021), and paired this spatial dataset with multivariate mandibular shape data (Zelditch et al., 2017) and a phylogeny for mammals (Upham et al., 2019). After projecting the range polygons to an equal area projection, we opted for a hexagonal grid system with a resolution of 50 km. By then adding the shape and phylogenetic data to the epmGrid object, we are able to easily visualize species richness, morphological shape and phylogenetic metrics across geographical space. We can see that species richness is highest in the mountainous west of North America, with up to 13 potentially co-occurring species, but that there is also substantial species richness in regions of the eastern half of the US (Figure 3a).
We investigated whether increases in species richness are mediated by increases in the size of the occupied morphospace, increases in the degree to which species are more tightly packed in that morphospace, or a combination of the two (Macarthur, 1965). Using the gridMetrics() function, we generated maps for two morphological diversity metrics: morphological range and the mean nearest neighbour distance. The range is a simple richness-independent measure of the overall size of the morphospace and the mean nearest neighbour distance is the morphological distance between a species and the other species that is most morphologically similar, and this is averaged across all species in each grid cell. We also generated a map for phylogenetic mean nearest neighbour distance, which is equivalent to the morphological metric, but for phylogenetic patris- Using the function tableFromEpmGrid(), we can extract these data via a random sampling of grid cells and summarize these morphological and phylogenetic metrics as a function of species richness . This function also provides the elements that could be used in spatial statistical approaches. For the three diversity metrics, we fitted spatial simultaneous autoregressive (SAR) models with the sPatialrEg r package (Bivand et al., 2013), and selected the best neighbourhood structure via AIC model selection. In Figure 3f-h, we show the fitted values from the SAR models to examine the relationships between these diversity metrics and species richness, with reduced artefacts from spatial autocorrelation. We find that morphological range on average increases for small values of species richness but reaches a plateau at a richness of around six species (Figure 3f).
Nearest neighbour distances, in contrast, decline and do not level out at higher species richness (Figure 3g-h), with phylogenetic nearest neighbour distances declining more precipitously than predicted under the SAR model. These relationships appear to be nonlinear, potentially hinting at saturation at high species richness. In general, it would appear that larger squirrel assemblages are enabled by both an expansion of morphospace and a tighter packing within it. More species-rich regions are also more likely to contain closely related species. The Great Plains region, which figures prominently across all examined metrics, appears to be a region with few squirrel species that are fairly morphologically and phylogenetically divergent. These maps and graphs provide preliminary observations-further analyses would be warranted to understand what leads to the spatial patterns observed for these metrics, and to further explore how these broadscale species assemblages are structured. A detailed walk-through of the code written for this empirical example has been deposited on Dryad (Title et al., 2022a). We have also included a simpler 'beginner's guide' that focuses on the main functionality of the EPM r package and uses included datasets.

| CON CLUS IONS
We introduce the EcoPhyloMaPPEr r package, which facilitates the integration of geographic, morphological and phylogenetic information across spatial scales, while providing flexibility to the user throughout. Although tools now exist in the rich R environment for investigating such axes of biodiversity, few include all three axes and generate maps of spatial patterns, and none that we are aware of allow for the geographical mapping of diversity metrics relating to morphological shape. With the increasing availability of geographical, phylogenetic and shape datasets, the availability of tools to analyse all three within the EPM package will lead to an improved F I G U R E 3 A demonstration of spatial mapping and analysis with the EPM r package, for north American squirrels. Species richness (a) can be mapped from range polygons. Morphological range and mean nearest neighbour distances can be mapped (b, c) and analysed in conjunction with species richness (f, g) thanks to the inclusion of geometric morphometric data, an example of which is shown (e) for Tamiasciurus douglassi. With the inclusion of a phylogeny, phylogenetic mean nearest neighbour patristic distances can also be mapped (d) and analysed (h). The values in f-h are fitted values from the models rather than the raw data. Boxplots in f-h indicate the 5%-95% and 25%-75% quantiles. Asterisks indicate mean values. Blue lines show the modelled relationship from a simultaneous autoregressive model that accounts for spatial structure understanding of how abiotic factors and evolutionary history have shaped community structure at local spatial scales and continue to influence the dynamics of interspecific competition and coexistence.
Tools within the package can also be used to investigate dimensions of biodiversity at the regional to continental scales, such as patterns of diversity and turnover in relation to broad-scale climate gradients, landscape properties and biome distributions. Even in the absence of morphological shape data, the EPM r package is a useful tool for mapping species richness patterns, as well as phylogenetic and univariate trait data of any kind. The ability to implement custom functions which are then applied across grid cells provides users with a great degree of flexibility. Many opportunities for further development are possible, including further incorporation of uncertainty (intraspecific trait variation, uncertainty associated with geographical range polygons, etc.), and the addition of abundance-weighted metrics. Sensitivity analyses could also be developed to assess the potential influence of missing geographical or trait data on diversity metrics. Another potential direction for improvement would be to build in the generation of null models via randomization, so as to identify geographical regions that exhibit phylogenetic and/or morphological patterns that depart from some null expectation (Helmus et al., 2007;Kembel & Hubbell, 2006;Mishler et al., 2014;Miller et al., 2017).

ACK N OWLED G EM ENTS
We thank the participants of the online Transmitting Science workshop, who tested out an early version of this r package. We also thank M. Grundler, T. Smiley and two anonymous reviewers for constructive comments and suggestions that improved this manuscript.

CO N FLI C T O F I NTE R E S T
None declared.

PEER R E V I E W
The peer review history for this article is available at https://publo ns.com/publo n/10.1111/2041-210X.13914.

DATA AVA I L A B I L I T Y S TAT E M E N T
The EPM r package source code is hosted at www.github.com/ptitl e/ epm, and has been archived with Zenodo at https://doi.org/10.5281/ zenodo.6577160 (Title et al., 2022b). The package is available on CRAN at https://cran.r-proje ct.org/packa ge=epm. All squirrel data from the empirical demonstration have been deposited in the Dryad Digital Repository https://doi.org/10.5061/dryad.qjq2b vqjg (Title et al., 2022a), along with EPM package tutorials.