What drives study‐dependent differences in distance–decay relationships of microbial communities?

Aim Ecological communities that exist closer together in space are generally more compositionally similar than those far apart, as defined by the distance-decay of similarity relationship. However, recent research has revealed substantial variability in the distance-decay relationships of microbial communities between studies of different taxonomic groups, ecosystems and spatial scales and between those using different molecular methodologies (e.g., high-throughput sequencing versus molecular fingerprinting). Here, we test how these factors influence the strength of microbial distance-decay relationships, in order to draw generalizations about how microbial β-diversity scales with space.
Location Global.
Time period Studies published between 2005 and 2019 (inclusive).
Major taxa studied Bacteria, Archaea and microbial Eukarya.
Methods We conducted a meta-analysis of microbial distance-decay relationships, using the Mantel correlation coefficient as a measure of the strength of distance-decay relationships. Our final dataset consisted of 452 data points, varying in environmental/ecological context or methodological approaches, and we used linear models to test the effects of each variable.
Results Both ecological and methodological factors had significant impacts on the strength of microbial distance-decay relationships. Specifically, the strength of these relationships varied between environments and habitats, with soils showing significantly weaker distance-decay relationships than other habitats, whereas increasing spatial extents had no effect. Methodological factors, such as sequencing depth, were positively related to the strength of distance-decay relationships, and choice of dissimilarity metric was also important, with phylogenetic metrics generally giving weaker distance-decay relationships than binary or abundance-based indices.
Main conclusions We conclude that widely studied microbial biogeographical patterns, such as the distance-decay relationship, vary by ecological context but are primarily distorted by methodological choices. Consequently, we suggest that by linking methodological approaches appropriately to the ecological context of a study, we can progress towards generalizable biogeographical relationships in microbial ecology.


| INTRODUCTION
The distance-decay of community similarity is one of the most widely studied relationships in macroecology (Nekola & White, 1999; Soininen et al., 2007). This relationship quantifies the decrease in compositional similarity (β-diversity) between communities with increasing geographical distance separating them and demonstrates that nearby communities are more similar to each other than distant communities. Distance-decay relationships arise through several different, but often interacting, ecological and evolutionary processes; consequently, ecologists have debated extensively the underlying mechanisms that generate such patterns (Hanson et al., 2012; Nekola & White, 1999; Soininen et al., 2007). Spatial structuring of the environment can lead to distance-decay relationships, because communities close together in space are likely to experience more similar environmental conditions, and hence to be more similar in composition, than those situated in different environmental conditions. Dispersal limitation can also lead to distance-decay relationships by limiting the connectivity between communities, meaning that communities closer together in space will share more species through localized dispersal than those further apart.
Distance-decay relationships are well documented in a multitude of plant and animal communities (e.g., multiple aquatic taxa, Astorga et al., 2012; tropical amphibians, Basham et al., 2019; multiple taxa, Soininen et al., 2007; urban plants, Sorte et al., 2008). Nonetheless, these relationships are of particular interest to microbial ecologists, because microorganisms were assumed to have ubiquitous distributions for several reasons. First, their small size facilitates passive dispersal over large geographical distances by vectors such as wind, bio-aerosolization, ocean currents or migrating animals (Bisson et al., 2007; Favet et al., 2013; Joung et al., 2017; Vašutová et al., 2019), thus potentially overcoming dispersal limitation as a contributory factor to microbial community composition. Second, microorganisms often maintain high population densities in the environment, leading to dispersal by "mass effects", whereby high dispersal rates from areas of increased population density maintain populations in less optimal environments (Shmida & Wilson, 1985), helping them to overcome the constraints of spatially structured environmental gradients. Third, some microorganisms are able to enter dormant states, whether as vegetative cells or as cysts or spores (Locey et al., 2020), allowing them to survive and disperse through suboptimal environments, simultaneously enhancing their dispersive abilities and reducing the influence of spatially structured environmental gradients (Low-Décarie et al., 2016). Combined, these traits theoretically lower microbial β-diversity by increasing the proportion of shared species between distant communities, in turn leading to weaker distance-decay relationships in comparison to macroorganisms. However, empirical studies have yielded mixed results on the strength of microbial distance-decay relationships, where strength is defined as the degree to which geographical distance and community dissimilarity are correlated.
Many studies have detected little or no evidence of distance-decay relationships in microbial communities (Hazard et al., 2013; Kivlin et al., 2014), whereas others have reported relationships of varying strengths, across a range of spatial extents, study systems and taxa (Clark et al., 2017; Dumbrell et al., 2010; Martiny et al., 2011). Thus, despite hundreds of empirical studies, the generality of spatial patterns in microbial communities remains unclear, and we are no closer to understanding whether variability in the spatial scaling relationships of microbial β-diversity originates from ecological or methodological sources.
Variation in microbial distance-decay relationships could be attributable to different environmental or ecological contexts in studies. Here, we consider environmental context as the variability in the physicochemical environment (e.g., temperature, pH, topology) and ecological context as the total suite of species present and their interactions. The study systems commonly of interest to microbial ecologists vary in terms of connectivity, which may facilitate or hinder dispersal between communities, thereby leading to weaker or stronger distance-decay relationships, respectively. In well-connected systems where dispersal is more feasible, such as oceanic waters, distance-decay relationships should be weaker than in systems in which dispersal is limited, such as host-associated systems or soil systems, where distance-decay relationships are weaker in deeper soil horizons (Li et al., 2020). Moreover, study systems differ in the spatially structured environmental gradients and heterogeneity they support. Sediments and soils, for example, can support strong environmental gradients over distances of a few metres (Dumbrell et al., 2010) and can be highly heterogeneous at the millimetre scale (Vos et al., 2013), strengthening the correlation between distance and community dissimilarity. Additionally, different study taxa are likely to yield variable distance-decay relationships because they differ in traits that are linked to dispersal efficacy. For example, small cells disperse more efficiently over long distances (Norros et al., 2014; Wilkinson, 2001; Wilkinson et al., 2012), meaning that organisms with larger cell sizes, such as microbial Eukarya, should be more strongly dispersal limited than those with small cell sizes, such as Bacteria (although this might not be true for all taxa, e.g., see Kivlin, 2020).
Finally, it is known that spatial extent can influence our perception of ecological relationships, which might contribute to variable distance-decay relationships (Steinbauer et al., 2012).

KEYWORDS
Archaea, Bacteria, biogeography, community dissimilarity, dispersal limitation, Eukarya, macroecology, Mantel test

Studies incorporating larger spatial extents would be expected to show exponential decay of similarity, because communities are more likely to originate from distinct species pools, with high dispersal limitation. In contrast, studies with smaller spatial extents are generally expected to follow power-law decay, although the spatial scales at which the distance-decay relationship follows either of these forms might also depend on the size of the study organisms (Luan et al., 2020; Martiny et al., 2011; Nekola & McGill, 2014).
Although the context in which a study was undertaken might contribute to variability in microbial distance-decay relationships, so too could different methodologies. Technological advances have yielded new insight into the structure and functioning of environmental microbial communities (Clark et al., 2018).
However, rapid turnover in molecular methodologies means that our perception of microbial β-diversity patterns integrates methods that vary substantially in both coverage (ability to detect a greater proportion of the community in a given sample) and resolution (ability to resolve closely related taxa) (Glenn, 2011; Muyzer, 1999). Early methods, such as clone library sequencing and community fingerprinting methods [e.g., denaturing gradient gel electrophoresis (DGGE), terminal restriction fragment length polymorphism (TRFLP) or phospholipid fatty acid (PLFA) analysis], are limited in their ability to detect rare taxa (Bartram et al., 2011) and often miss them completely (Low-Décarie et al., 2016). In turn, this could reduce the detected β-diversity, inflating estimated community similarity and weakening distance-decay relationships (Hanson et al., 2012).
In contrast, high-throughput sequencing (HTS) platforms [also frequently referred to as next-generation sequencing (NGS)] can deliver sequencing depths of tens or even hundreds of thousands of sequences per sample (Caporaso et al., 2012), thereby both improving community coverage (the detected proportion of a given community) and allowing more samples to be examined in a single study (improving sample coverage). Consequently, variation in the ability of molecular methods to resolve closely related taxa and to detect rare taxa can be an additional source of variability in microbial β-diversity, which, by extension, can either weaken or strengthen microbial distance-decay relationships.
In addition to the molecular methods, the choice of analytical methods, such as similarity metric, can influence distance-decay relationships. The similarity of communities varies according to the identity and abundance of the species present, their phylogenetic relationships and external factors, such as varying sample sizes. Thus, similarity metrics that vary by one or more of these characteristics would be likely to result in contrasting distance-decay relationships (Barwell et al., 2015; Chao et al., 2005). For example, phylogenetic indices would be expected to yield weaker distance-decay relationships than other metrics, because communities that have no species in common can still exhibit high phylogenetic similarity if the species share many branches of a phylogenetic tree, thereby reducing the decay of similarity over geographical distance (Bryant et al., 2008).
In contrast, quantitative indices compare not only the composition of species present, but also their abundance in each community, reflecting finer-scale changes in community structure, and should therefore result in stronger distance-decay relationships by providing an additional axis (species abundances) by which communities can differ.
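The contrast between binary and quantitative indices can be illustrated with a minimal sketch. The two abundance vectors below are hypothetical and are not taken from any study in the dataset; the function names are ours. The Sørensen index is computed here on presence/absence only, and the Bray-Curtis index on abundances.

```python
def sorensen_dissimilarity(a, b):
    """Binary (presence/absence) Sorensen dissimilarity between two samples."""
    shared = sum(1 for x, y in zip(a, b) if x > 0 and y > 0)
    richness_a = sum(1 for x in a if x > 0)
    richness_b = sum(1 for y in b if y > 0)
    return 1.0 - 2.0 * shared / (richness_a + richness_b)


def bray_curtis_dissimilarity(a, b):
    """Abundance-based Bray-Curtis dissimilarity between two samples."""
    difference = sum(abs(x - y) for x, y in zip(a, b))
    return difference / (sum(a) + sum(b))


# Hypothetical counts for four taxa in two communities (illustration only).
site_1 = [90, 5, 5, 0]
site_2 = [5, 5, 90, 0]

binary = sorensen_dissimilarity(site_1, site_2)        # same taxa present in both
quantitative = bray_curtis_dissimilarity(site_1, site_2)  # dominance differs
print(binary, quantitative)
```

In this example, the two communities contain the same taxa, so the binary index reports no dissimilarity, whereas the abundance-based index registers the shift in dominance. This is the additional axis of variation referred to in the text.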
Here, to disentangle the effects of both contextual (e.g., spatial extent, taxon or ecosystem) and methodological (e.g., means of identifying/differentiating taxa or similarity metric) variables on microbial distance-decay relationships, we undertook a meta-analysis to test the following specific hypotheses:
1. Bacteria and Archaea will show weaker (lower correlation between geographical distance and community dissimilarity) distance-decay relationships than micro-eukaryotic taxa owing to their smaller size and higher population densities in most environments.
2. Environments that are able to maintain steep physicochemical gradients, such as sediments and soils, will have stronger (higher correlation between geographical distance and community dissimilarity) distance-decay relationships than those such as seawater or air, where environmental gradients are more diffuse.
3. Spatial extent will be related positively to the strength of the distance-decay relationship because, at large spatial scales, increased dispersal limitation and environmental heterogeneity will decrease the variance in community similarity at a given spatial distance, resulting in stronger distance-decay relationships.
4. High-throughput sequencing methods will yield stronger distance-decay relationships owing to: (a) their ability to resolve closely related taxa; (b) their greater community coverage (e.g., number of sequences per sample or number of individuals counted per sample); and/or (c) their greater sample coverage.
5. Phylogenetic similarity metrics (e.g., Unifrac, beta nearest taxon index) will result in weaker distance-decay relationships than other metrics, because communities can be similar phylogenetically, yet different at fine taxonomic resolutions, and quantitative metrics (e.g., Bray-Curtis, Hellinger and Euclidean) will yield the strongest relationships because they reflect changes in both species composition and abundance.

| Meta-analysis
In order to test our hypotheses, we first gathered available data on microbial distance-decay relationships via a systematic literature search.
To do this, five search terms were selected to detect relevant studies (Table 1). All literature searches were conducted using the Web of Science search portal on 18 April 2020, and all results published between 1900 and 2019 (inclusive) were retained. To filter the dataset to studies suitable for testing our hypotheses, search results were downloaded and screened manually using the "metagear" (Lajeunesse, 2016) package in R (v.3.4.1; R Core Team, 2019). Here, suitable studies were those that tested the relationship between community similarity and geographical distance in microbial communities, and not studies of "macroorganisms" or studies of strain-level genetic distance (e.g., using multi-locus sequence typing). Furthermore, studies that did not test distance-decay relationships using Mantel correlation, or that used only partial Mantel tests, were also discarded. We did not identify any potentially suitable studies that were published before 1967, the year the Mantel test was described (Mantel, 1967), and the earliest suitable study was published in 2005.
From these studies, we extracted Mantel correlation coefficients (r) as an effect-size measure for each distance-decay relationship, which we refer to throughout as distance-decay strength. The Mantel test is a permutation-based method used to test for correlation between two distance matrices or, in the context of this study, community (dis)similarity and geographical distance. The Mantel test statistic is an ideal measure of effect size for use in meta-analytical frameworks for several reasons. First, the Mantel correlation test is the most frequently used method for testing distance-decay relationships in microbial ecology (Franklin & Mills, 2007; Ramette, 2007).
Second, given that the Mantel coefficient is a standardized correlation coefficient (i.e., it is bound by minus one and plus one), it provides an easily interpretable and comparable measure of effect size (Harrison, 2011).
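For readers unfamiliar with the procedure, a Mantel test can be sketched as follows. This is a minimal illustration, not the code used in any of the studies analysed: it assumes Pearson's correlation, a one-sided permutation p-value and hypothetical data (five sites along a transect), and all function names are ours.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Upper-triangle entries after reordering rows and columns by `order`."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel(geo, com, permutations=999, seed=0):
    """Return (Mantel r, one-sided permutation p-value)."""
    rng = random.Random(seed)
    identity = list(range(len(geo)))
    geo_vec = upper_triangle(geo, identity)
    r_obs = pearson(geo_vec, upper_triangle(com, identity))
    hits = 1  # the observed arrangement counts as one permutation
    for _ in range(permutations):
        order = identity[:]
        rng.shuffle(order)  # permute sites (rows and columns together)
        if pearson(geo_vec, upper_triangle(com, order)) >= r_obs:
            hits += 1
    return r_obs, hits / (permutations + 1)


# Hypothetical example: site positions (km) and a dissimilarity that
# saturates with distance.
positions = [0, 1, 2, 4, 8]
geo = [[abs(p - q) for q in positions] for p in positions]
com = [[d / (d + 2) for d in row] for row in geo]
r_obs, p_value = mantel(geo, com)
print(round(r_obs, 3), p_value)
```

Permuting rows and columns of one matrix together, rather than shuffling individual pairwise values, preserves the non-independence of distances that share a site, which is the reason a permutation test is used in place of a standard correlation test.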
We ensured that all Mantel correlation coefficients reflected correlations between geographical distance and community dissimilarity, rather than similarity, by multiplying correlation coefficients by minus one where necessary (meaning that positive values indicate a typical distance-decay relationship). Partial Mantel statistics (which test for correlation between two matrices whilst controlling for a third) were excluded because they are influenced by other variables included in the test and are, therefore, not easily comparable between studies. All Mantel correlation coefficients were transformed to z-scores using Fisher's z transformation, as recommended by Rosenberg et al. (2013). All subsequent statistical analyses were conducted on the transformed z-scores, whereas the original Mantel correlation coefficients were used to make figures, for ease of interpretation.
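The sign convention and Fisher's z transformation described above can be sketched as follows. The function names are ours, and the input values are used for illustration only.

```python
import math


def as_dissimilarity_correlation(r, based_on_similarity):
    """Flip the sign of r if it was computed from similarity, not dissimilarity."""
    return -r if based_on_similarity else r


def fisher_z(r):
    """Fisher's z transformation: z = 0.5 * ln((1 + r) / (1 - r)) = atanh(r)."""
    return math.atanh(r)


def inverse_fisher_z(z):
    """Back-transform a z-score to a correlation coefficient."""
    return math.tanh(z)


# A correlation of -0.27 between distance and *similarity* corresponds to
# +0.27 between distance and *dissimilarity*.
r = as_dissimilarity_correlation(-0.27, based_on_similarity=True)
z = fisher_z(r)
print(r, z, inverse_fisher_z(z))
```

The transformation stretches values near minus one and plus one, so that the sampling distribution of the transformed coefficient is closer to normal than that of the raw coefficient.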
In order to test our hypotheses, several variables relating to the context and methodology of each distance-decay relationship were recorded. Details of these variables are described in Box 1.

| Statistical analyses
In order to determine whether distance-decay relationships varied between categorical variables (as in hypotheses 1, 2, 4 and 5), we used ANOVAs. In tests where significant differences between groups were found, Tukey's honestly significant difference (HSD) tests were used to determine which groups were different from each other. Linear mixed-effect models were used to test separately for relationships between the strength (correlation between geographical distance and community dissimilarity, expressed as the Mantel correlation coefficient) of distance-decay relationships and single continuous variables, such as spatial extent and community coverage, using a random intercept to account for heteroscedasticity owing to some studies contributing multiple relationships in each model. The p-values and R² values were calculated for each term in these models using the approach described by Nakagawa and Schielzeth (2013). The variables spatial extent and community coverage were initially log10-transformed to aid model fitting, because they spanned several orders of magnitude. To compare the overall influence of ecological versus methodological factors on microbial distance-decay relationships, we compared two full models (including all relevant variables), using Akaike information criterion (AIC) scores, on a subset of the data for which all variables were recorded successfully. We report the results of all null hypothesis tests in terms of statistical "clarity" rather than "significance", in line with recommendations from Dushoff et al. (2019).
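As an illustration of the categorical comparisons, the F statistic of a one-way ANOVA can be computed by hand as in the sketch below. The z-scores and group labels are hypothetical, the function name is ours, and the p-value is omitted because it requires the F distribution, which is not available in the Python standard library. The mixed-effects models and Tukey's HSD tests are not reproduced here.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean.
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical Fisher z-scores for three habitats (illustration only).
soil = [0.1, 0.2, 0.3]
water = [0.4, 0.5, 0.6]
sediment = [0.3, 0.4, 0.5]
f_stat, df1, df2 = one_way_anova_f([soil, water, sediment])
print(f_stat, df1, df2)
```

The two degrees of freedom returned correspond to the two subscripts reported alongside each F statistic in the Results.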

| RESULTS
Our Web of Science searches resulted in 2,982 unique search results. Manual screening of the abstracts yielded 951 studies that were deemed potentially to be suitable for use in this analysis. A total of 452 Mantel correlation coefficients were obtained successfully from 187 studies represented in 61 journals (Supporting Information Figure S1). Reported Mantel correlation coefficients ranged from −.33 to .95, with a mean of .27 (SE = 0.011), and a summary of the variables collected is shown in Table 2.

| Influence of context on the distance-decay relationship
In order to determine whether contextual factors can influence the strength of distance-decay relationships, the influence of ecological factors, including study taxa, study system and spatial scale, was tested. Within the dataset, the most commonly studied taxa were Bacteria (n = 238), followed by Fungi (n = 93), other microbial Eukaryotes (n = 67) and Archaea (n = 26). We found no clear differences in the strength of distance-decay relationships between these taxa (Table S2, F(5,441) = 0.99, p = .43), although distance-decay relationships incorporating bacterial and fungal communities showed the weakest relationships, albeit only from six studies (Figure 1).
The distance-decay relationships in our dataset originated from 16 different environments. Of these, five were represented by three or fewer distance-decay relationships and were therefore excluded from further analyses (marsh, n = 3; snow, n = 3; dune, mine and aquifer, n = 1). The most frequently studied environments were

BOX 1 Details of the explanatory variables extracted from each study

Resolution
Each distance-decay relationship was categorized into high resolution (high-throughput or Sanger sequencing), low resolution (molecular, e.g., ARISA, TRFLP, DGGE, PhyloChip or PLFA) or low resolution (morphological), based on the ability of the method to distinguish between closely related organisms.

Community coverage
This refers to the depth of sequencing in sequencing-based studies, or the number of individuals counted in morphology-based studies, per sample. For sequencing studies, we recorded the number of sequences after rarefaction or, if this was not given, the average number of sequences per sample. Given that there is no comparable measure of coverage for fingerprinting studies, we excluded them from analyses of community coverage.

Sample coverage
Sample coverage refers to the sample size (e.g., number of communities/samples) of each distance-decay relationship.

Correlation type
Studies were categorized according to the type of correlation coefficient used in the analysis of the distance-decay relationship (e.g., Spearman's or Pearson's correlation coefficient). The type of correlation was recorded only if the type of correlation coefficient was mentioned explicitly.

Study taxon
Each distance-decay relationship was binned into the following broad taxonomic categories based on the taxonomy of the focal organisms: Archaea, Bacteria, Fungi or other microbial Eukarya, or a combination of these categories if a relationship was based on multiple taxa (for example, owing to the use of sequencing primers that detect both Archaea and Bacteria). Fungi were grouped separately from other micro-Eukaryotes owing to their distinct reproductive strategy (e.g., spore production) and the fact that they are frequently targeted using distinct molecular approaches (e.g., via taxon-specific primer sets), in contrast to most other studies of micro-Eukarya.

Spatial extent
This is the maximal distance separating communities (in kilometres). If this was not stated in the text or provided in the supplementary material (e.g., in a geographical distance matrix), it was calculated from the geographical coordinates given, estimated from a plot of the distance-decay relationship or estimated from scaled maps.
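Where spatial extent has to be derived from coordinates, one possible calculation is sketched below. It assumes great-circle (haversine) distances on a sphere with a mean Earth radius of 6,371 km; the source does not state which distance formula was used, and the function names and coordinates are hypothetical.

```python
import math
from itertools import combinations

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius


def haversine_km(p, q):
    """Great-circle distance (km) between two (latitude, longitude) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def spatial_extent_km(coords):
    """Maximal pairwise distance (km) among sampled communities."""
    return max(haversine_km(p, q) for p, q in combinations(coords, 2))


# Hypothetical sampling sites on the equator (illustration only).
sites = [(0.0, 0.0), (0.0, 1.0), (0.0, 90.0)]
print(spatial_extent_km(sites))
```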

Environment
We categorized distance-decay relationships broadly, based on the type of environment (agriculture, air, aquifer, coastal wetlands/ intertidal, desert, dune, forest, glacier, grassland, lake, marine, coastal marshes, mine, river, snow or urban) within which they were sampled. Although these categories are not mutually exclusive, we categorized each study based on which environment best represented the environmental context in which each study was undertaken. For studies on lakes, we also recorded whether relationships originated from a single lake or across multiple lakes.

Habitat
Habitat was the type of environmental material that the sampled communities occupied. We categorized distance-decay relationships as follows: air, host-associated, sediment, snow, soil or water.

Note: For categorical variables, the number of individual distance-decay relationships in each category is shown, whereas minima, maxima, median and mean values are shown for continuous variables. Detailed descriptions of each variable are found in Box 1, and raw data can be found in the Supporting Information (Table S1).
a The "All" category consists of studies that incorporated all microbial taxonomic groups, whereas combined categories (e.g., Archaea + Bacteria) incorporate communities from multiple taxonomic groups (e.g., archaeal and bacterial communities).
b β mean nearest taxon distance.
c β mean pairwise distance.
grasslands (n = 96), marine (n = 88), and lakes and forests (n = 76 for both). We found clear differences in the strength of distance-decay relationships between environments (Figure 2a, Table S2; F(10,432) = 3.187, p < .001). Specifically, and perhaps counterintuitively, grassland-based studies had weaker distance-decay relationships than those from aquatic environments, such as lakes, rivers or the marine environment (magnitude of coefficient |coef| > 0.17, p < .05 for all comparisons). Urban environments, which included built environments, such as sewers and indoor air, also produced weak distance-decay relationships, although with only four data points this difference was not statistically clear (p > .43 for all comparisons). We also found no difference in the strength of distance-decay relationships between studies conducted in single lakes compared with those incorporating multiple lakes (F(1,74) = 0.11, p = .74), despite the average spatial extent of multiple-lake studies being c. 32-fold greater than that of single-lake studies (Supporting Information Figure S2).
A more detailed analysis of the interaction between environment type and habitat revealed that although environments (F(9,420) = 3.29, p < .001) and habitat (F(3,420) = 6.65, p < .001) differed from each other, their interaction was not statistically significant (F(4,420) = 1.93, p = .10). In fact, within environments, only marine host-associated and marine water-based distance-decay relationships were clearly different from each other (Figure 2b), with host-associated communities showing significantly stronger distance-decay relationships (coef = 0.35, p < .001).
The spatial extents of recorded distance-decay relationships ranged from 10 cm to > 18,000 km, and minimal spatial extents varied notably across environments and habitats, with terrestrial- and soil-based studies often conducted over smaller spatial scales (Supporting Information Figure S3). After accounting for differences between studies, we found no evidence of a statistically clear relationship between the spatial extent of a study and the strength of the observed distance-decay relationship (Table S2, coef = 0.02, marginal R² = .020, t = 1.58, p = .11). Finally, given that studies at a larger spatial scale might also incorporate greater sampling coverage, we tested for collinearity between the spatial scale of a study and the sampling coverage, but found no correlation between these variables (ρ = .06, p = .19).

| Influence of methodological factors on the distance-decay relationship
We grouped community characterization methods according to their ability to distinguish between closely related taxa. There were no clear differences in the strength of distance-decay relationships between different resolution methods (Table S2, F(2,449) = 0.562, p = .57), nor were there clear differences between different molecular methods (Supporting Information Figure S4; F(7,437) = 1.97, p = .06), considering only those methods that had more than four distance-decay relationships across the entire dataset (excluding Ion Torrent, n = 4; PhyloChip, n = 2; and Pac-Bio, n = 1; Figure 3).
Although we observed no differences in distance-decay relationships between different resolution methods, after accounting for study-dependent differences we found a positive relationship between (log10) community coverage and the strength of microbial distance-decay relationships (Figure 4a, Table S2; n = 337, conditional R² = .57, coef = 0.06, t = 2.73, p < .01), although the marginal effect of community coverage was weak (marginal R² = .04).
The logistics of multiplexing samples on high-throughput sequencing runs means that there is often a trade-off between the community coverage and sampling coverage of a study. However, we found no evidence of negative correlation between these two factors (Pearson's ρ = −.03, p = .54), nor did we detect any clear relationship between the number of samples (log10 sample coverage) and the strength of distance-decay relationships, even after accounting for study-specific differences with a mixed effects model (Figure 4b, Table S2; n = 451, coef = −0.06, marginal R² = .01, t = −1.40, p = .16).
Choice of similarity index also had a clear impact on the strength of microbial distance-decay relationships. In addition to recording the specific similarity index used, we categorized indices into types (binary, abundance or phylogenetic) to test for broad differences in distance-decay relationships. We analysed the nested interaction between similarity index and index type and found no clear differences between different index types (Figure 5a; F(2,424) = 1.48, p = .23). However, the interaction between index type and similarity index was significant (F(7,424) = 7.20, p < .001). Post hoc analysis revealed differences between similarity indices within and between index types (Figure 5b). Distance-decay relationships based on the Raup-Crick index were weaker than those based on either Sørensen [...] but of those that did, Spearman's correlation coefficient was more frequently used (n = 86) than Pearson's (n = 62). We found no clear difference in the strength of microbial distance-decay relationships using these two methods (Table S2, F(1,146) = 2.47, p = .12).

FIGURE 1 The strength (Mantel r) of distance-decay relationships based on different study taxa. A larger Mantel r value indicates a stronger distance-decay relationship. The "All" category consists of studies that incorporated all microbial taxonomic groups, whereas combined categories (e.g., Bacteria/Archaea) incorporate communities from multiple taxonomic groups (e.g., bacterial and archaeal communities).

| Comparison of contextual and methodological variables
In order to determine whether eco-environmental context or methodological factors better explain the strength of microbial distance-decay relationships, we specified two models, with variables from these two categories, using a subset of the original data for which [...] (Table 3). Notably, neither model explained a high proportion of the variance, although both AIC and likelihood ratio tests supported both models over a null (intercept-only) model.

| DISCUSSION
Previous research into the spatial ecology of microbial communities has not yielded a consistent distance-decay relationship. Our meta-analysis of 452 microbial distance-decay relationships suggests that the reasons for this lack of consistency are twofold. First, the differing contexts within which studies are conducted contribute variability to reported distance-decay relationships. In particular, we found that differing study systems were associated with variation in microbial distance-decay relationships. Second, methodological

FIGURE 2 Variation in Mantel correlation coefficients of distance-decay relationships (a) between different environments, and (b) between types of habitats. Environment categories are arranged from strongest to weakest mean distance-decay relationship.

FIGURE 3 The relationship between spatial extent and the Mantel correlation coefficient of microbial distance-decay relationships. The dashed line represents the fit of a mixed-effects model between the log10 of spatial extent and Mantel correlation coefficient, with a study-dependent random intercept.

[...] Warmink et al., 2011), migratory species such as birds (Bisson et al., 2007), wind-blown soil particles (Favet et al., 2013) or bio-aerosols (Joung et al., 2017). The depth profile over which soil samples integrate might also play a role in obscuring distance-decay relationships, because surface soils show stronger distance-decay relationships than deeper ones, probably owing to the greater intensity of dispersing propagules entering and leaving the surface (Li et al., 2020). Furthermore, soils harbour extensive microbial "seed banks" of dormant organisms and/or relic DNA that could weaken the distance-decay relationship (Carini et al., 2016; Lennon & Jones, 2011; Lennon et al., 2018). Dormant cells and relic DNA are not subject to environmental selection, yet they are routinely detected in molecular community assays, which is likely to diminish the perceived effects of spatially structured environmental selection on microbial communities (Locey et al., 2020). Thus, in habitats such as soils, distinguishing dormant from active cells could result in stronger distance-decay relationships than those recorded previously, although evidence of the same effect on distance-decay slopes is mixed (Locey et al., 2020; Meyer et al., 2018). The extent to which this phenomenon plays a role in other environments is also unclear.
Originally, we expected the weakest distance-decay relationships to occur in connected aquatic environments, such as rivers and oceans, or within single lakes, because the movement of water might provide an effective dispersal mechanism, homogenizing microbial communities over larger spatial and environmental distances. In contrast, we found that aquatic communities showed stronger distance-decay relationships than terrestrial systems. Soininen et al. (2007) recorded similar distance-decay rates between terrestrial, marine and aquatic ecosystems, showing that context-dependent distance-decay relationships might be a feature of microbial communities. We also found that the strength of distance-decay relationships was not different in studies based on single or multiple lakes, despite the difference in spatial extents of these studies. Lakes act as habitat islands within a terrestrial matrix; therefore, dispersal limitation and environmental heterogeneity should be greater across multiple lakes than within a single lake, resulting in stronger distance-decay relationships in multi-lake studies. One explanation for the absence of such a difference is that catchment-scale environmental parameters, such as geology, might homogenize environmental conditions across multiple lakes, meaning that environmental distances are similar within and between lakes. Alternatively, other biogeographical processes, such as mass effects, might homogenize communities between hydrologically connected lakes (Lindström & Bergström, 2004), especially where lakes are of different sizes (Reche et al., 2005). Host-associated communities showed relatively strong but variable distance-decay relationships. We suggest that this is caused jointly by the ecol-

The scale dependence of various biogeographical relationships is well studied (Bissett et al., 2010; Hillebrand, 2004; Martiny et al., 2011; Soininen et al., 2011), albeit with contrasting results. Soininen et al. (2011) reported that distance-decay relationships of various microbial communities were generally steeper over greater spatial extents, whereas our results suggest that increasing spatial extent does not significantly increase the strength of distance-decay relationships. Given that we analysed distance-decay strength rather than steepness, our results are not necessarily contradictory. A strong distance-decay relationship occurs when, at a given spatial distance, all pairs of communities are equally dissimilar to one another, whereas a steep distance-decay relationship occurs when communities separated by different distances are highly dissimilar to each other. We expected initially that spatial extent might alter the strength of distance-decay relationships because, at greater distances, decreased dispersal and increased environmental heterogeneity should reduce the variance in compositional similarity between pairs of communities (at a given distance). Instead, the spatial configuration or connectivity of the communities could be more important than spatial extent per se. For example, at a given spatial distance, some pairs of communities could be linked by dispersal and others not, increasing the variation in community similarity at each distance and weakening the distance-decay relationship. In practice, this could occur in lake systems where, at a certain geographical distance, some pairs of communities fall within the same lake and some in different lakes, or when long-distance dispersal vectors link some pairs of communities separated by large distances, but not others, as has been proposed for halophilic microbial communities dispersing on migratory birds, for example (Clark et al., 2017; Kemp et al., 2018). Furthermore, we observed that the minimum spatial extents of studies differed according to the environment in which they were conducted.
Studies from terrestrial environments (e.g., grasslands and forests) or those based on soils generally incorporated smaller spatial extents than those based on aquatic systems (with the exception of some host-associated marine studies) or on habitats such as water or air. This could be attributable to the logistics of sampling at small scales. For example, sampling planktonic microbial communities at small (centimetres to metres) scales could be confounded by mixing caused by the sampling process or by tidal movements of water. Additionally, given that many studies analysing microbial distance-decay relationships aimed to discern between environmental and spatial effects on microbial communities, it might be widely assumed that aquatic environments are more homogeneous and/or that microorganisms are not dispersal limited at these scales compared with more physically stable environments, such as soils or sediments.

TABLE 3 Comparison of models specified using either contextual or methodological variables
Note: The Akaike information criterion (AIC) and adjusted R² (Adj-R²) quantify the likelihood and fit of a model relative to the number of predictor variables, respectively.
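The distinction drawn in this section between the strength and the steepness of a distance-decay relationship can be illustrated with a minimal sketch. It is not the analysis pipeline used in this study: the site coordinates, the linear decay function, the noise levels and all function names are hypothetical, and the simulated similarities are left unclipped so that the fitted slope stays unbiased. Two simulated datasets share the same decay slope but differ in the scatter of similarity at a given distance, so they should differ mainly in their Mantel correlation.

```python
import numpy as np

def upper_triangle(matrix):
    """Return the strictly upper-triangular entries of a square matrix."""
    rows, cols = np.triu_indices(matrix.shape[0], k=1)
    return matrix[rows, cols]

def mantel_r(dist_a, dist_b):
    """Pearson correlation between the pairwise entries of two matrices."""
    return np.corrcoef(upper_triangle(dist_a), upper_triangle(dist_b))[0, 1]

def decay_slope(geo_dist, similarity):
    """Least-squares slope of similarity regressed on geographical distance."""
    slope, _intercept = np.polyfit(
        upper_triangle(geo_dist), upper_triangle(similarity), 1
    )
    return slope

def simulate_similarity(geo_dist, slope, noise_sd, rng):
    """Hypothetical similarity: linear decay with distance plus symmetric noise."""
    n = geo_dist.shape[0]
    noise = np.triu(rng.normal(0.0, noise_sd, size=(n, n)), k=1)
    noise = noise + noise.T  # keep the matrix symmetric
    sim = 0.9 + slope * geo_dist + noise
    np.fill_diagonal(sim, 1.0)
    return sim

rng = np.random.default_rng(42)
coords = rng.uniform(0, 100, size=(40, 2))  # hypothetical site coordinates
geo = np.sqrt(((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1))

# Same steepness (slope), different scatter at a given distance.
tight = simulate_similarity(geo, slope=-0.004, noise_sd=0.02, rng=rng)
noisy = simulate_similarity(geo, slope=-0.004, noise_sd=0.20, rng=rng)

for label, sim in [("tight", tight), ("noisy", noisy)]:
    print(label, "slope:", decay_slope(geo, sim), "Mantel r:", mantel_r(geo, sim))
```

Because similarity declines with distance, the Mantel correlation here is negative, and strength corresponds to its absolute value. A permutation test for significance is omitted for brevity.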
Distance-decay relationships are frequently interpreted as evidence for neutral community assembly processes, such as dispersal limitation, in the microbial literature. Across microbial taxa, cell size is a trait thought to influence dispersal efficacy (Wilkinson, 2001; Wilkinson et al., 2012; Zinger et al., 2019); therefore, larger microorganisms, such as micro-Eukarya, should show stronger distance-decay relationships than smaller microorganisms, such as Bacteria or Archaea. However, we found no evidence for this, suggesting that phylogenetically structured traits, such as cell size, might be less important than other contextual and methodological factors or that the broad domain-level classification used here does not capture different microbial cell sizes sufficiently. As discussed previously, distance-decay relationships can arise from spatially autocorrelated environmental gradients and from dispersal limitation (Nekola & White, 1999). Therefore, the lack of differences in biogeographical patterns observed at the domain level might be the result of a trade-off between dispersal-related processes and environmental filtering. For instance, bacterial distance-decay relationships might be less strongly influenced by dispersal than environmental filtering, and vice versa for Eukarya. Consequently, these influences might balance out at broad taxonomic levels, resulting in similar biogeographical patterns at the domain level.
In comparison to contextual factors, methodological factors were found to have a greater influence on microbial distance-decay relationships. The development of molecular methods, including high-throughput sequencing platforms, has vastly improved our ability to characterize microbial communities (Caporaso et al., 2012; Roesch et al., 2007). However, these methods differ in their resolution, community coverage, and ability to multiplex large numbers of samples, all of which we hypothesized could strengthen or weaken distance-decay relationships by altering our estimation of microbial β-diversity. Contrary to this hypothesis, we observed only a weak relationship between the strength of distance-decay relationships and community coverage, and no clear effects of different resolution methods or the number of samples, suggesting that molecular methodology might not play as large a role in determining microbial biogeographical patterns as previously thought.
The ability to resolve closely related taxa has previously been found to be an important determinant of our ability to detect biogeographical patterns, because such patterns may emerge only when taxa are defined at sufficiently high resolution (Hanson et al., 2012).
Yet, other studies show that bioinformatically altering taxonomic resolution frequently has little effect on microbial biogeographical patterns. For example, increasing the similarity threshold at which operational taxonomic units are defined is thought to be equivalent to increasing the taxonomic resolution (Callahan et al., 2017).
Nevertheless, empirical biogeographical relationships often appear robust to such manipulation, in a variety of taxa and ecosystems (Clark et al., 2017; Glassman & Martiny, 2018; Meyer et al., 2018), supporting our finding that resolution might not be important.
Perhaps most molecular methodologies operate above resolutions at which biogeographical patterns begin to change or, more worryingly, perhaps we are still studying microbial biogeography at too low a resolution.
Aside from resolution, another important variable related to molecular methodology is community coverage. One of the few universal patterns that appears to hold true for most microbial communities is the "long-tailed" species abundance distribution (Dumbrell et al., 2010; Maček et al., 2019; Shoemaker et al., 2017), which is caused by the majority of microorganisms in a community being rare. The rarer taxa in microbial communities also tend to be the least widespread (Clark et al., 2017; Lindh et al., 2017; Meyer et al., 2018; Shade & Stopnisek, 2019); therefore, detecting only the more abundant, widespread organisms would overestimate compositional similarity across communities and, consequently, weaken distance-decay relationships owing to the lower rate of turnover (Meyer et al., 2018). Perhaps of more concern is that even with existing sequencing platforms, our surveys of environmental microbial communities still miss taxa that are vanishingly rare in the environment, such as extremophiles that persist in non-extreme habitats (Low-Décarie et al., 2016). The ability of common species to reflect ecological patterns of the wider community is debated (van Dorst et al., 2014; Galand et al., 2009; Heino & Soininen, 2010) and is linked to a wider debate on the ecological importance of rare species that is far beyond the scope of this work (e.g., Gaston, 2012). However, rare microorganisms are well known to be of crucial importance in the context of environmental perturbations (Low-Décarie et al., 2016; Shade et al., 2014) and in providing ecosystem processes (e.g., sulfate reduction in peat soils, Hausmann et al., 2016; and anaerobic ammonia oxidation in river sediments, Lansdown et al., 2016), and as a result, ignoring them might further disconnect biogeographical patterns from ecosystem-level processes.
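The argument that low community coverage inflates compositional similarity can be made concrete with a toy example. The taxon names, counts and detection thresholds below are invented for illustration and are not drawn from the dataset analysed here. The two hypothetical sites share their abundant taxa but differ in their rare taxa, so a shallow survey that detects only abundant taxa sees identical communities.

```python
def jaccard_similarity(taxa_a, taxa_b):
    """Shared taxa divided by all taxa observed across both communities."""
    union = taxa_a | taxa_b
    return len(taxa_a & taxa_b) / len(union) if union else 1.0

def detected(abundances, min_count):
    """Taxa whose count reaches a hypothetical detection threshold."""
    return {taxon for taxon, count in abundances.items() if count >= min_count}

# Invented counts: abundant taxa A-C are shared, rare taxa r1-r6 are site-specific.
site_1 = {"A": 500, "B": 300, "C": 120, "r1": 3, "r2": 2, "r3": 1}
site_2 = {"A": 450, "B": 350, "C": 90, "r4": 4, "r5": 2, "r6": 1}

# Shallow coverage detects only taxa with at least 50 counts.
shallow = jaccard_similarity(detected(site_1, 50), detected(site_2, 50))
# Deep coverage detects every taxon with at least one count.
deep = jaccard_similarity(detected(site_1, 1), detected(site_2, 1))

print("shallow coverage:", shallow)
print("deep coverage:", deep)
```

In this sketch the shallow survey reports complete similarity, whereas the deeper survey recovers the turnover among rare taxa.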
Against expectation, we observed no clear differences in distance-decay relationships using different types of similarity metrics, and differences between specific metrics were minimal.
Distance-decay relationships based on the weighted Unifrac distance and the Raup-Crick index were weaker than those based on other metrics. The Raup-Crick index is less influenced by concurrent changes in species richness between communities, and as such, is a purer reflection of shifts in β-diversity (Chase et al., 2011).
Consequently, by removing the potentially confounding effects of differences in richness, the Raup-Crick index is likely to result in more variable estimates of similarity between communities, which would lead to weaker distance-decay relationships.
Phylogenetic metrics, such as Unifrac, cluster communities at a lower resolution, because two communities can be closely related genetically, yet distinct at fine taxonomic resolutions (e.g., species or strain level). For example, Bryant et al. (2008) found that Unifrac similarity was approximately three times higher than the compositional similarity of the same set of bacterial communities. Furthermore, phylogenetic metrics might be inappropriate in less phylogenetically diverse environments (e.g., extreme systems), where phylogenetic diversity can be constrained largely to one taxon (e.g., the Haloarchaea in hypersaline environments), leaving few "phylogenetic degrees of freedom" to separate communities (Fukuyama, 2019). However, this does not account for the observed difference between weighted and unweighted versions of the Unifrac index, the former of which accounts for relative abundance data of species, whereas the latter is binary (presence/absence based). A criticism of the weighted Unifrac index is that too much weight is placed on abundant taxa (Chen et al., 2012). Given that abundant species are generally more widespread, placing too much weight on them would have the effect of making communities appear artificially similar, exacerbating the effects of using a phylogenetic metric. Given that we observed no difference between binary and abundance-based compositional indices, the differences observed with weighted Unifrac appear to be the result of combining phylogenetic and weighted indices. We suggest, therefore, that weighted phylogenetic metrics might underestimate microbial biogeographical patterns, unless appropriate weight is given to rare and abundant taxa (Chen et al., 2012).
Our analysis of 452 microbial distance-decay relationships also revealed the overwhelming preference of microbial ecologists to use classic dissimilarity indices, such as the Bray-Curtis (n = 218), Jaccard (n = 49) and Sørensen (n = 42) indices. These choices undoubtedly reflect a wider trend in ecology as a whole; however, it is pertinent to draw attention to more recently developed metrics that might be more appropriate given the properties of microbial datasets and the hypotheses being tested. Biotic interactions are drivers of microbial β-diversity (Hanson et al., 2012), yet classic dissimilarity metrics do not account for co-occurrence information in communities. To this end, a new family of metrics described by Schmidt et al. (2017) includes information on the average interactions of the taxa present, thereby providing a new approach to integrating co-occurrence data into distance-decay relationships. Microbiome sequencing data also have several characteristics that can be problematic in the analysis of community (dis)similarities. For example, the non-biological variance of sample sizes in sequence datasets can result in statistical artefacts that confound biogeographical relationships (Baselga, 2007). Here, modifications made to some classic indices by Chao et al. (2005) (Gloor et al., 2017), or recently developed metrics, such as the rank bias overlap index, show promise for analysing similarity between communities based on species abundance ranks (Webber et al., 2010). Finally, many similarity metrics have been shown to merge compositional turnover (replacement of species) and nestedness (whereby communities are subsets of one another), thereby blurring the contribution of distinct ecological processes to total community (dis)similarity.
To combat this, modified versions of classic indices, such as Jaccard, Sørensen and Bray-Curtis, have been developed, allowing the partitioning of community similarity metrics into their turnover and nestedness components (Baselga, 2010; Podani & Schmera, 2011). We echo the call of Green and Bohannan (2006) for microbial ecologists to exercise more care in their choice of dissimilarity metrics, especially given that many of these new metrics are implemented in popular and freely accessible software, such as R (e.g., Baselga & Orme, 2012).
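As an illustration of such partitioning, the following minimal sketch splits pairwise Sørensen dissimilarity into a turnover component (Simpson dissimilarity) and a nestedness-resultant component, following the pairwise formulation of Baselga (2010). The function name and the example communities are hypothetical, and the sketch is not a substitute for established implementations such as the R package cited above.

```python
def sorensen_partition(taxa_a, taxa_b):
    """Split pairwise Sørensen dissimilarity into turnover and nestedness parts.

    a = taxa shared by both communities, b and c = taxa unique to each one.
    Returns (beta_sor, beta_sim, beta_sne), where beta_sor = beta_sim + beta_sne.
    """
    if not taxa_a or not taxa_b:
        raise ValueError("both communities must contain at least one taxon")
    a = len(taxa_a & taxa_b)
    b = len(taxa_a - taxa_b)
    c = len(taxa_b - taxa_a)
    beta_sor = (b + c) / (2 * a + b + c)        # total Sørensen dissimilarity
    beta_sim = min(b, c) / (a + min(b, c))      # turnover (Simpson dissimilarity)
    beta_sne = beta_sor - beta_sim              # nestedness-resultant component
    return beta_sor, beta_sim, beta_sne

# Hypothetical nested pair: the second community is a subset of the first.
nested = sorensen_partition({1, 2, 3, 4, 5, 6}, {1, 2, 3})
# Hypothetical turnover pair: equal richness, half of the taxa replaced.
replaced = sorensen_partition({1, 2, 3, 4}, {3, 4, 5, 6})

print("nested pair:", nested)
print("turnover pair:", replaced)
```

In the nested pair all dissimilarity is assigned to the nestedness-resultant component, whereas in the turnover pair it is assigned entirely to turnover, although both pairs would simply register as dissimilar under the unpartitioned index.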
Overall, our analyses revealed that methodological factors explain more variation in microbial distance-decay relationships than ecological context, but that both sets of factors alter our perception of this biogeographical pattern. Given the importance of methodological factors in determining the strength of microbial biogeographical patterns, it is intuitive to recommend standardization of approaches across studies in order to minimize the statistical signals associated with methodological variance. However, our results show that variance attributable to differing ecological contexts would still hinder the drawing of generalizable relationships across studies.
Instead, we suggest that tailoring methodological choices towards specific ecological contexts might clarify generalizable relationships in microbial ecology. For instance, in searching for consistent relationships between ocean waters and terrestrial soils, it would be unrealistic to sample both at the same spatial grain and extent, because the heterogeneity in the physicochemical environment and the dispersal processes of their microbial communities are fundamentally different. Likewise, we should not necessarily expect the relationships between soils and river sediments to be comparable, because microorganisms in soils can disperse feasibly in any direction, whereas in rivers or streams dispersal would be constrained largely by the direction of flow. Consequently, tailoring methodological approaches, such as the sampling design and/or (geographical) distance measure, to reflect the environmental heterogeneity and dispersal dynamics better between contrasting ecological contexts might enable us to negotiate the hierarchy of interacting factors that obscure macroecological patterns in microbial communities.

Conclusions
Our meta-analysis of >450 microbial distance-decay relationships revealed that factors related to the eco-environmental context within which a study was conducted, in addition to the methodology of the study, jointly influence quantification of this classic biogeographical pattern. Against expectation, factors related to molecular methodology had relatively little effect on distance-decay relationships, whereas the choice of dissimilarity metric was more important, highlighting that even after using robust, modern molecular methods, analytical choices have the power to obscure or enhance biogeographical patterns. We detected clear relationships between microbial distance-decay relationships and various contextual and methodological variables, yet combining these variables explained only a modest amount of variation in our dataset. This lack of explanatory power indicates that microbial biogeographical patterns depend on a number of contextual variables beyond those analysed here. In future, we suggest that microbial ecologists should place greater emphasis on quantifying habitat connectivity to gain a better understanding of the dispersal processes that lead to spatial patterns, such as the distance-decay relationship. Additionally, we recommend that experimental designs and data-collection strategies should be replicated spatially, taxonomically, temporally or any combination thereof where possible (e.g., Alzarhani et al., 2019; Meyer et al., 2018; Zinger et al., 2019), facilitating a more generalized understanding of the variation in spatial microbial community patterns. The question of whether microbial communities show spatial patterns such as distance-decay relationships should be laid to rest; disentangling the web of ecological and environmental drivers that shape these patterns is the next challenge in microbial biogeography.

This work was funded by a UK Natural Environment Research Council (NERC) quota award DRC studentship (471757).

AUTHOR CONTRIBUTIONS
All authors planned the study and contributed to manuscript preparation. D.R.C. carried out all data collection and data analyses.

DATA AVAILABILITY STATEMENT
Full raw data analysed in this manuscript (provided in Supporting Information