Genogeographic clustering to identify cross‐species concordance of spatial genetic patterns

While in recent years, there have been considerable advances in discerning spatial genetic patterns within species, the task of identifying common patterns across species is still challenging. Approaches using new data from co‐sampled species permit rigorous statistical analysis but are often limited to a small number of species. Meta‐analyses of published data can encompass a much broader range of species, but are usually restricted by uneven data properties. There is a need for new approaches that bring greater statistical rigour to meta‐analyses and are also able to discern more than a single spatial pattern among species. We propose a new approach for comparative multi‐species meta‐analyses of published population genetic data that address many existing limitations.


| INTRODUC TI ON
It is a considerable challenge in conservation biology to understand the relationships among genetic, ecological and environmental variables Taberlet et al., 2012). One relevant genetic measure is genetic diversity, which is a fundamental component of biodiversity and has crucial implications for fitness, productivity and resilience of species, and ultimately for ecosystem functionality (Aguirre & Marshall, 2012;Bernhardt & Leslie, 2013;Hughes et al., 2008). Patterns of genetic diversity within populations can provide insights about historical processes such as demographic change (Garrick et al., 2010;Nason et al., 2002). Information about genetic diversity plays an important role in conservation management , and there is an increasing focus on spatial patterns of diversity in large-scale conservation initiatives Coates et al., 2018).
For insight into genetic processes at the whole ecosystem level, a multi-species approach is required (Dawson, 2014;Hickerson & Meyer, 2008). Comparative phylogeography describes the study of multiple co-distributed species, often to identify common spatial genetic patterns (Dawson, 2014;Liggins et al., 2015;Moritz & Faith, 1998;Rocha et al., 2007). Although many such studies are focused on genetic divergence (Marko & Hart, 2011), there is also an increasing drive to understand patterns in genetic diversity (Lamy et al., 2017;Vellend et al., 2014). Within each species, genetic diversity is influenced by numerous mechanisms including mutation, migration, genetic drift, population demography and natural selection (Hewitt, 2004;Hughes et al., 2008;Vucetich & Waite, 2003). Therefore, different species will naturally exhibit different spatial genetic patterns due to their undoubtedly different histories. However, each species serves as a sample replicate of a system in which genetic diversity is shaped by the action of common mechanisms. Multi-species studies may provide insight into these common mechanisms by illuminating spatial patterns of genetic diversity that are shared across taxa (Dawson, 2014;Liggins et al., 2016;Toonen et al., 2011).
Although there are various methods available for analysing spatial genetic patterns within species, identifying a suite of key patterns underpinning a community of species remains challenging (Liggins et al., 2016). Riginos et al. (2011) outlined two main types of quantitative approaches to finding concordant patterns. The first is to collect congruent data by co-sampling multiple species at shared locations Hickerson et al., 2007;Liggins et al., 2016;Villamor et al., 2014), while the second is to perform a meta-analysis of previously published studies (Pelc et al., 2009;Riginos et al., 2011). Studies using the first approach are limited by the considerable logistics and time required to undertake extensive co-sampling and usually based on relatively few species. However, because they share uniform data properties, they are readily amenable to statistical analysis such as hypothesis testing (Liggins et al., 2016). Additionally, where studies of co-sampled species can incorporate temporal information about divergence events and identify pseudocongruence, they present an opportunity to understand factors driving patterns of genetic differentiation (Dawson, 2014).
By contrast, studies based on previously published analyses tend to be severely restricted by different sampling locations for each species and often by not having access to the raw genetic data.
These aspects greatly hinder investigations of the factors underlying geographic patterns. However, studies based on pre-existing data have the advantage of being able to include a much larger number of species (Riginos et al., 2011;Selkoe et al., 2014).
Recently, some studies have merged the two approaches identified by Riginos et al. (2011), for instance by using a meta-analysis of synchronously diverging codistributed pairs of taxa to discern factors driving patterns of population differentiation (Schiebelhut & Dawson, 2018). Despite this, there are still clear benefits in taking advantage of the wealth of previously published studies in order to maximize the breadth of species analysed. It seems clear that in complex ecosystems such as marine assemblages, assessing a broad spectrum of species offers us the greatest chance of understanding community assembly and functioning. By examining the patterns of population genetic variation over a broad range of species, we can build a true overview of the most important factors responsible for community structure.
Here, we present and explore a new approach for meta-analysis of spatially explicit genetic patterns across multiple species that addresses some of the limitations of existing approaches. Difficulties encountered in previous meta-analyses include the following: (1) differences in sampling location and intensity, (2) differences in genetic markers used, (3) inaccessibility of the raw genetic data, (4) a limited number of methods to objectively compare spatial genetic patterns among species, and (5) no existing method to quantitatively group species by their spatially explicit genetic patterns. To address these problems, we present a method of capturing the spatial pattern for each species that is indifferent to precise sampling locations, enabling us to identify species whose spatial genetic patterns are similar regardless of differences in sampling location and intensity.
The analysis follows a three-stage approach: (1) compile location-specific measures of a genetic quantity of interest, such as diversity, for each species across the geographic range of the study, (2) fit a smooth spatial trend curve to the genetic data for each species, to enable comparisons between the spatial patterns of all species regardless of their individual sampling regimes, and (3) quantitatively cluster species according to their similarity in spatial pattern. As it is the shape of the genetic geographic patterns that are being compared, and not the measured diversity values themselves, this approach can integrate results from contributory studies using different genetic metrics to measure diversity. Because this encompasses possibilities beyond the phylogenetic relationships of DNA haplotypes, we term these "genogeographic" rather than "phylogeographic" patterns.
Our ultimate aim is to create a quantitative method to capture spatial genetic patterns using data from previously published studies, notwithstanding differences in sampling location and intensity, and to cluster species according to similarities in their corresponding genogeographic profiles. In this way, it becomes possible to evaluate the variables that unite species with similar patterns, gaining insight into both the intrinsic (biological) and extrinsic (environmental) forces potentially governing the entire community. Our goal here is to find and compare genetic patterns and not investigate whether genetic indices give a good description of evolutionary processes.
As our approach cannot tease apart the complex genetic and demographic forces driving the observed patterns, we view it as an exploratory method for investigating multi-species communities, suitable for generating hypotheses about the processes shaping biodiversity, which can then be addressed by more targeted analyses.
We illustrate the approach using an example of within-population genetic data on allelic diversity, in a one-dimensional scenario corresponding to the intertidal community around the coastline of New Zealand. Such one-dimensional scenarios have been common in previous studies of marine comparative phylogeography (Cahill et al., 2017;Crandall et al., 2019;Matias & Riginos, 2018;Mertens et al., 2018;Nielsen et al., 2017;Stanley et al., 2018). Further developments, such as exploring genetic divergence rather than diversity, and extension to two-dimensional habitats, are considered in the Section 4.

| Focal species dataset
For an adequate test of our approach, we sought a dataset from species of one community type, with as broad taxonomic range as possible and with each species relatively well-sampled throughout the geographic range considered. We assembled a dataset of benthic intertidal marine species around the New Zealand coast. The study area comprised the North, South and Stewart Islands. The species of interest were endemic invertebrates inhabiting the intertidal and high subtidal zones, which have been relatively well examined previously. Measures of genetic diversity at each sampling location were obtained for each species from previously published studies and studies underway in our laboratory. Studies were included if they met a set of inclusion criteria based on Pelc et al. (2009) (see Appendix S1). Twenty-one species were included comprising all the main invertebrate phyla found along the New Zealand coast: Cnidaria, Arthropoda, Echinodermata and Mollusca (Table 1).

| Genetic measures
For each species, a genetic diversity measure was computed within each geographic location. The measure of genetic diversity used depended on the genetic marker available for each species. Haplotype diversity "H" was used for mitochondrial DNA, and the analogous allelic expected heterozygosity "H e " for nuclear DNA microsatellites.
Collectively, these values are henceforth referred to simply as allelic diversity, "H". Values were either reported directly in the publications used, or alternatively calculated using arlequin v 3.5 (Excoffier & Lischer, 2010) and Genealex (Peakall & Smouse, 2012) from the genetic data associated with the publication. Adjustments for small sample sizes were included in the calculation of all parameters, using unbiased estimators (Nei, 1972). The assembled dataset is available from the Dryad Digital Repository (https://doi.org/10.5061/dryad. b2rbn zsbr).

| Linearized coastline
The New Zealand coastline was linearized using coastal distances between sampling points. Sampling points within 50 km were synonymized to the same location. Linearization enabled us to plot genetic patterns along one spatial dimension, namely distance along coast. The starting point at 0 km was Cape Reinga, the northernmost point of the country, and the distance coordinate followed the coastline in a clockwise direction around New Zealand, through to Stewart Island in the south, before returning to Cape Reinga ( Figure 1). The linearized coastline crossed Cook Strait between the North and South Islands at its narrowest point. The north-south route at this point best captured the known predominant spatial genetic patterns in the region of Cook Strait, which are north-south divergences, rather than east-west divergences.

| Genogeographic species curves
For each species, the data points corresponding to the measured H at each sampled location were plotted against the appropriate coordinate on the linearized coastline, and a curve was fitted to these points to create the spatial pattern for that species. The fitted curves aimed to capture all major trends in the data points while being smooth enough to allow the curves for different species to be compared. To achieve this, we used cubic spline curves, which are piecewise cubic polynomials that can fit very flexible shapes while remaining smooth (Hastie & Tibshirani, 1990).
In the case of our example data set, which formed a closed loop, each curve was constrained to have the same height at the start and end, because both ends corresponded to Cape Reinga. This was achieved by cycling the data three times over the entire coastline length, and using only the curve fitted to the middle cycle, so that the curve height at each endpoint was correctly influenced by data points to each side of the Cape Reinga divide and then imposing a penalty to force the curve heights to match at the start and end of the middle cycle. Spline knots were equally spaced, and the middle cycle was estimated with n/3 knots, where n is the number of data points for the corresponding species. This number of knots was selected to give the curve sufficient flexibility to expose key patterns in the spatial genetic profile, while maintaining sufficient smoothness to respect the spatial autocorrelation in each profile. The impact of choosing Cape Reinga as the start point was investigated using three alternative start points spaced at 1000 km intervals.
If the coastline followed a linear geometry rather than being a closed loop, the triple-cycling of data and the penalty steps would be unnecessary, and n/3 would be the number of knots used for the whole curve. When dealing with a linear coastline, ideally curves should not be extrapolated beyond the sampled range for any species. This might restrict the spatial scope of the study to the sampled extent common to all species of interest. The smooth fitted curves define the spatial trend in the chosen genetic measure for each species. These curves are here termed "genogeographic" curves, as they capture similar patterns to those detected in standard phylogeographic studies (Avise, 2000), but are not dependent on phylogenetic data. Here, the curves summarize the broad spatial pattern of genetic diversity within each species.
However, in general, they are applicable to a wide range of genetic and geographic scenarios, as long as there are location-specific genetic measures available and the topography can be adequately represented by a linear spatial coordinate.

| Clustering the spatial genetic curves: genogeographic clustering
The fitted curves were scaled and centred before clustering across species, such that the mean of values along each scaled-and-centred curve was 0 and the standard deviation was 1. The scaling forced the same overall variability in all curves before clustering, to allow the clustering procedures to identify species that showed similarities in their geographic patterns rather than in their absolute genetic measures. This step was appropriate for our study objective of comparing spatial genetic patterns among species, irrespective of the degree of genetic diversity within each species, but could be omitted if comparisons of absolute genetic diversity were of primary interest. With the scale-and-centre step, the methodology will identify species that have similar locations for higher and lower genetic diversity relative to their own mean diversity. A bootstrap method described below is used to take account of the level of statistical support for each curve, to ensure the scaling step does not create spurious patterns. Without the scaling step, species will cluster according to the absolute level of their genetic diversity and the extent and locations of geographical differences. Without scaling, comparisons should therefore be restricted to species for which a common genetic marker is available. TA B L E 1 New Zealand species and studies included in the meta-analysis. Studies were selected according to the inclusion criteria in Appendix S1  Keeney et al. (2013) Following the scale-and-centre step, we used a cluster dendrogram to cluster species with similar genogeographic curves. We took the pairwise distance between two curves to be the Euclidean distance between the heights of the curves evaluated at 1000 equally spaced points along the linearized coastline. We chose 1000 points as a number that was well over ten times the maximum number of knots per curve and checked that results were consistent between alternative choices ranging from 500 to 2000.
In the dendrogram chart, species with the most similar genogeographic curves, corresponding to the smallest pairwise distances, are assigned to the first clusters at the bottom of the chart. Those species that are least similar only link together in the final, largest clusters that connect at the top of the chart. For our dataset of 21 species, the dendrogram contains a total of 20 clusters. The dendrogram was obtained using R function "hclust" with method "ward.
D2" to decide the order of clusters. This method chooses the link at each step that minimizes the total within-cluster variance and can be thought of as the clustering equivalent of principal components analysis (Murtagh & Legendre, 2014).
A colour map was also generated for each species, to display the species-specific genogeographic pattern for the diversity measure. Low values of the scaled-and-centred curves were depicted in yellow, and high values in red. Colour maps were also generated for each of the clusters obtained, using the average of the scaled-andcentred curves within the cluster.

| Measuring significance of genogeographic species clusters: parametric bootstrap
The data matrix we used to generate the dendrogram was not of the traditional species-by-site configuration, so to assess the significance of the clusters we used a parametric bootstrap. Each bootstrap replicate consisted of new data drawn from the fitted curves at the same linearized coastline coordinates as the real data, created by generating random draws from the fitted beta distributions underlying the trend curves. The percentile intervals shown on the trend curves mark the precision with which the trend was estimated for each species, based on the estimated dispersion parameter for that species, and they indicate the interval in which 95% of the randomly drawn bootstrap points will fall at any location. Having drawn a set of new data points for every species, the genogeographic curves were refitted to the new random data. For species with high scatter of the data points about the curve in the real data, the shape of the fitted curve changed substantially across bootstrap replicates, whereas for species with low dispersion, the curve tended to repeat the same shape for each random dataset. This procedure enabled us to allow for uneven sampling intensity or precision among species in our data.
From each set of bootstrap replicate data across the 21 species, a new clustering of the replicate trend curves was undertaken, and a new dendrogram constructed. The whole process was repeated for 1000 bootstrap replicates to obtain 1000 different dendrograms.
To determine significant joins or splits, each pair of species was examined. For each of the 1000 dendrograms, if two species joined together before the final link, they were considered to belong to the same major cluster, or the same "half" of the dendrogram for that bootstrap replicate, whereas if they did not join until the final link, they were considered to have appeared in opposite halves of the dendrogram for that replicate. The two species were considered significantly joined at the 5% level if they appeared in the same half of the dendrogram for at least 95% of replicates, and they were considered significantly split at the 5% level if they appeared in opposite halves of the dendrogram for at least 95% of replicates. The significance of pairwise joins or splits therefore provided a measure of the robustness of these joins or splits to the variability encountered in the bootstrap procedure, which reflected the different levels of uncertainty for different species due to uneven sampling intensities and differences in the intrinsic spatial variability of the genetic measure.

| Principal coordinates analysis (PCO) for genogeographic curves
We used a principal coordinates analysis (PCO) to display species according to similarity of their scaled-and-centred genogeographic curves and visualize species with significant joins or splits. PCO takes the matrix of distances between the genogeographic curves for each pair of species, as described above, and depicts these distances on a two-dimensional projection plot. We plotted lines between pairs of species that were significantly joined or split according to the bootstrap analysis.

| Spatial genetic patterns from genogeographic curves and maps
The fitted genogeographic curves for the 21 species listed in Table 1 are plotted in Figure 2 and Figure S1.

| Similarity among species curves:
genogeographic species clustering Figure 3a shows the cluster dendrogram based on the 21 genogeographic H curves. The scaled-and-centred curves contributing to each cluster are shown in Figure 3b and Figure S3, showcasing the similarity of curves within clusters. For example, Cluster 11 contains five genogeographic curves with similar trends, in which the analysis detected a group of species with a common pattern of lower genetic diversity in southern populations compared to northern populations. This spatial pattern is more readily observed in the average heat map for Cluster 11 (Figure 3c). Species patterns within each cluster remained relatively similar for Clusters 1 to 17, while clusters beyond this combined some relatively different genogeographic curves. Clusters in which most links were identified as significant using the bootstrap approach are indicated by asterisks on the dendrograms, using a 10% significance level to allow for the stringency of the test where clusters contain species with few sampling points.

| Cluster membership and significance
Based on the cluster dendrogram in Figure

| Evaluating the statistical approach
We have presented an innovative approach to comparative multispecies analysis of spatial genetic patterns that accommodates species sampled at different locations and analysed using different genetic markers. In general, the fitted genogeographic curves appeared realistic, reflecting known major patterns of spatial genetic variation without noisy detail. The curves illuminated several distinctive patterns of spatial variation, enabling species to be clustered according to their similarity in pattern.

| Use of original genetic data
Analyses that are solely based on a compilation of conclusions from individual studies are likely to be impacted by the questions that were being investigated in the contributory studies. In New Zealand, well-documented biogeographic boundaries are likely to have predisposed authors to test for specific concordance with the resulting bioregions (Shears et al., 2008), potentially overlooking other interesting patterns in the spatial profile of genetic diversity. By using the raw within-population genetic diversities extracted from the original studies, our analysis retained all the geographic information available, so the patterns exposed were not confined to those reported by the original authors.

| No limitations on the genetic measures used
Our methodology can accommodate any measure of genetic diversity or divergence, depending on the questions to be addressed. In this initial study, a within-population genetic diversity measure (H) was chosen for illustration. However, other metrics of genetic properties could equally well be employed. For the purposes of identifying priority regions for conservation, a measure of phylogeographic endemism (Rosauer et al., 2009) may be appropriate. Where the goal is to identify regions of common past demographic expansion, metrics of departure from neutrality may be useful, such as Tajima's D (Tajima, 1989) or Fu's F (Fu, 1997).
A direction of particular interest would be to incorporate measures of genetic divergence among sampled locations, such as F ST , Φ ST or D Jost . These metrics compare two or more sampled locations, whereas diversity measures such as H are associated with a single sample location, so attention would need to be given to associating a population divergence metric with specific locations. Illuminating regions of high and low genetic divergence that are common across species could bring substantial benefits for defining discrete geographic units for conservation and management (Pikitch et al., 2004;Villamor et al., 2014). However, although our method can detect common hotspots of genetic divergence at a regional level, it will not reveal the precise locations of sudden discontinuities due to the smooth nature of the trend curves. If discontinuities were hypothesized at particular locations, we could fit curves that were piecewise smooth, which could then be clustered as usual.

F I G U R E 4 PCO plot based on similarity between scaled-and-centred genogeographic curves fitted to the genetic diversity measure H.
Grey lines on the left figure show significant pairwise joins at the 5% level, and those on the right figure denote significant pairwise splits at the 5% level. Major clusters are marked on each panel 4.1.3 | Comparisons across a wide range of species A principal advantage of this methodology was to allow comparison across a wide range of species, permitting the inclusion of many large datasets from previous publications with disparate sampling localities. It is usually recommended that a common sampling design be used to compare spatial genetic patterns among species, especially to uncover the underlying processes driving common patterns Liggins et al., 2016;Villamor et al., 2014).
However, by creating a statistical framework that does not rely upon a common sampling design, we can fully exploit the wealth of existing genetic data. Given the success of this trial, a much larger data set from New Zealand waters could be analysed to provide a more comprehensive assessment of marine biodiversity patterns around the country.

| Limitations and extensions
The framework we have presented is an exploratory one, with various options that can be adjusted according to the context and goals of the study. One such option concerns the scaling of genogeographic curves to a common mean and standard deviation before clustering. This is a pragmatic strategy that enabled us to illuminate commonalities in spatial genetic patterns across species, regardless of the patterns manifesting with different magnitudes in different species. It is expected that genetic measures will manifest with different absolute value and variability in different species due to different population sizes and mutation rates. By applying the scaling step, we were able to focus on spatial pattern irrespective of absolute value.
The bootstrap method is an important part of this strategy, because spatial patterns for some species may be falsely exaggerated in the scaling step so it is important to identify species for which the shape of the trend curve is not robust to sampling variability. These species will fail to return significant cluster memberships in the bootstrap analysis. Omission of the scaling step would be appropriate if a comparison of absolute diversity patterns were of primary interest and can readily be achieved by toggling a switch in the code provided.
This option would require comparisons to be restricted to species that were analysed with a common genetic marker.
The methodology as presented is most readily applied to closedloop coastline geometries that can be readily linearized. Application to linear coastline geometries is analytically straightforward, but there might be practical difficulties if the species of interest are sampled over different spatial ranges. Additionally, there will be a loss of precision in the estimated trend curves towards the extremes of the range, so it might be useful to apply buffering to ensure the clustering of trends across species is not influenced by edge effects.
This can be accomplished by omitting a minor section of each trend curve at the start and end of the range when clustering the curves.
Although many comparative studies focus on one-dimensional coastlines and river systems that can be naturally linearized (Cahill et al., 2017;Crandall et al., 2019;Matias & Riginos, 2018;Mertens et al., 2018;Nielsen et al., 2017;Stanley et al., 2018), more complex geometries may be of interest. The linearization step might be hard to implement over complex coastlines such as archipelagos. This difficulty could potentially be addressed by undertaking the analysis at both broad and fine scales. Expansion of the methodology to two-dimensional systems would involve the calculation of "genogeographic surfaces," which could then be clustered across species.
Reduction of edge effects through buffering would likely become more important for such higher dimensional analyses.
Our approach appears to be successful at discerning the similarity of broad regional patterns, but might miss fine-scale geographic differences due to the smoothing procedures. Areas where finescale patterns are of interest could be examined over a smaller geographic range, permitting more detailed curve-fitting. Preliminary work suggests our framework is promising for detecting major spatial patterns of genetic divergence, as opposed to diversity, but there may be greater difficulty in using it to discern patterns of chaotic patchiness (Johnson & Black, 1982) or isolation-by-distance (Wright, 1943). However, with respect to fitting the genogeographic curves used here, these processes would tend only to add noise rather than to create bias.
Finally, as with any meta-analysis, the conclusions are only as robust as the data supplied. Our methodology allows multiple studies with disparate sampling regimes to be included in the same analysis, and it estimates the uncertainty separately around each genogeographic curve to ensure that only those curves with strong statistical support will generate significant outcomes. However, if a predominance of contributory studies missed some important feature of a spatial genetic profile, perhaps due to low sampling coverage or a common gap arising from shared sampling designs, the same limitation will apply to the meta-analysis.

| Common genetic diversity patterns for New Zealand coastal species
Spatial genetic diversity patterns have been observed across the geographic distribution of many species and have been used in the past to infer range expansion and other historical processes affecting the population demography of a species (Templeton et al., 1995). The results here suggested a high variation in patterns of genetic diversity among the species analysed, but some strong commonalities in spatial patterns were also highlighted by the species clusters detected.
Importantly, if genetic diversity patterns were simply averaged across all species, as is commonly undertaken in comparative phylogeography studies (Miraldo et al., 2016;Selkoe et al., 2016), no geographic patterns would have been evident. This is best exemplified by comparing the maps of genetic diversity within each major cluster ( Figure 3c) with those for all species combined (cluster 20: Figure S4). It was only when groups of similar species were discriminated through clustering that patterns became evident. Our analysis revealed at least three major geographic patterns, each shared by a number of species; but because the regions of high and low diversity differed between clusters, these counteracted each other and caused the spatial trend to disappear altogether when all species were combined ( Figure S4). Notably, the clusters of species identified were not obvious groupings by any biological criteria such as taxonomy or life history and would have been unlikely to have been discerned through a priori clustering within any traditional analysis of diversity.
The principal divide between spatial diversity patterns in our data occurred between Cluster 17, corresponding to species with highest genetic diversity in the North Island, and Cluster 19, which grouped species with the overall highest genetic diversity in Cook Strait (Cluster 15) and the South Island (Cluster 18) (Figure 3). The spatial genetic pattern displayed by the species of Cluster 17 corresponds to an inverse correlation of genetic diversity with latitude.
This pattern has already been documented in several of the species included in this cluster, such as P. novaezelandiae (Silva & Gardner, 2015), H. iris , P. regularis (Ayers & Waters, 2005) and L. smaragda (Arranz et al., 2021). This spatial diversity distribution agrees with global patterns of genetic diversity, where high levels of genetic diversity are often expected at lower latitudes (Adams & Hadly, 2013), together with evidence that speciation rates are also higher in the tropics (Jablonski et al., 2006).
The species grouped in Cluster 18, with higher diversity in the south, presumably had a different ancestral origin or were affected by alternative evolutionary processes. It is possible that this genetic pattern was driven by higher population sizes in the south for these species. Human actions such as harvesting or protection may also have impacted population sizes and thus genetic diversity for some species (Allendorf et al., 2014;Silva & Gardner, 2015).

Species in Cluster 15 had highest diversity in the central Cook
Strait region. A possible explanation for this is the mid-domain effect, commonly referred to as the "abundant centre" model, whereby populations at the edges of their distribution are expected to have lower abundance (Colwell & Lees, 2000). This pattern is also one component of the "core-periphery" or "central-marginal" hypothesis (Eckert et al., 2008). Most of the species examined in this study were endemic, with range confined to the North, South and Stewart Islands, such that the Cook Strait region corresponds to the central part of their distribution. However, other species have a wider distribution range, including areas such as East Australia, Chatham Islands and Kermadec Islands, among others, that were not covered in this analysis. Studies that encompass the entire distribution range of a species can be particularly informative about the specific factors that are instrumental in shaping the spatial genetic diversity (Hasselman et al., 2013;Silva & Gardner, 2015).
Overall, genogeographic analysis of our study data illuminated several groups of species within the New Zealand intertidal marine community with considerable similarity in their spatial genetic patterns. These findings will facilitate study of the variables that unite species with similar patterns, leading to a better understanding of the biological and environmental factors governing the community assemblage.

| CON CLUS IONS
A major limitation of previous comparative phylogeographic approaches has been the need to group either populations or species prior to conducting quantitative analysis (Kelly and Palumbi, 2010;Riginos et al., 2011). The methodology used here provides an objective assessment of the genetic patterns present, therefore avoiding the possible biases introduced by a priori groupings (Pelc et al., 2009).
The analytical approach developed here provides a powerful exploratory method that takes full advantage of existing published data to better understand the predominant spatial patterns of genetic diversity across a community. These findings can be used to generate further investigations of the processes shaping biodiversity, by finding characteristics in common among the species clustered together. Such investigations may then drive more specific tests of hypotheses and population models that can foster deeper understanding of the historical, ecological and biological processes underlying the spatial patterns revealed.

ACK N OWLED G M ENTS
We thank the many authors who previously published valuable genetic data that contributed to this study. Special thanks to Martin the University of Auckland.

CO N FLI C T O F I NTE R E S T
The authors have no conflict of interest to declare.

DATA AVA I L A B I L I T Y S TAT E M E N T
The R code, instructions and example data to use "Genogeographic species clustering" approach are available from the Dryad Digital Repository: https://doi.org/10.5061/dryad.b2rbn zsbr.