Niche‐neutral theoretic approach to mechanisms underlying the biodiversity and biogeography of human microbiomes

Abstract The human microbiome consists of five major regional biomes distributed in or on our five body sites including skin, oral, lung, gut, and reproductive tract. Its biogeography (the spatial and temporal distribution of its biodiversity) has far‐reaching implications to our health and diseases. Nevertheless, we currently have very limited understanding on the mechanisms shaping the biogeography, since it is often rather difficult to determine the relative importance of drift, dispersal, speciation, and selection, the four processes (mechanisms) determining the patterns of microbial biogeography and community dynamics according to a recent synthesis in community ecology and biogeography. To disentangle these mechanisms, I utilize multisite neutral (MSN) model and niche‐neutral hybrid (NNH) model to analyze large number of truly multisite microbiome samples covering all five major human microbiome habitats, including 699 metacommunities and 5,420 local communities. Approximately 89% of metacommunities and 92% local communities exhibit patterns indistinguishable from neutral, and 20% indistinguishable from niche‐neutral hybrid model, indicating the relative significance of stochastic neutral forces versus deterministic niche selection in shaping the biogeography of human microbiome. These findings cast supporting evidence to van der Gast's revision to classic Bass‐Becking doctrine of microbial biogeography: “Some things are everywhere and some things are not. Sometimes the environment selects and sometimes it doesn't,” offering the first educated guess for “some” and “sometimes” in the revised doctrine. Furthermore, the logistic/Cox regression models describing the relationships among community neutrality, niche differentiation, and key community/species characteristics (including community diversity, community/species dominance, speciation, and migration rates) were constructed to quantitatively describe the niche‐neutral continuum and the influences of community/species properties on the continuum.

their hybrid integrations have been the premier ecological theories for investigating the mechanisms underlying community assembly and dynamics in macro-ecology (Fisher & Mehta, 2014;Haegeman & Etienne, 2017;Hubbell, 2001;Jeraldo et al., 2012;Kalyuzhny et al., 2014;Matthews & Whittaker, 2014;McGill, 2003;McGill et al., 2006;Noble & Fagan, 2015;Ofiteru et al., 2010;Pigolotti & Cencini, 2013;Rosindell et al., 2011Rosindell et al., , 2012Stokes & Archer, 2010;Tang & Zhou, 2013;Tilman, 2004). Compared with the studies of neutral theory in macrobial ecology, there have been relatively few applications of neutral theory in microbial communities (e.g., Curtis & Sloan, 2004, Sloan et al., 2006Sloan et al., 2007, Zhang et al., 2010, Costello et al., 2012, Venkataraman et al., 2015, Chen et al., 2017, Ma, 2020. I subscribe to the Vellend (2010) identification of the four key processes (mechanisms) that shape the community structure and dynamics, including drift, dispersal, speciation, and selection. Hanson et al. (2012) proposed that drift, dispersal, mutation, and selection govern the formation and maintenance of the microbial biogeographic patterns on ecological and evolutionary scales that are hardly separable. Hubbell's (2001) unified neutral theory of biodiversity and biogeography (UNTB) was built on the combination of the former three in the Vellend (2010) and Hanson et al. (2012) lists, excluding selection (Rosindell et al., 2011(Rosindell et al., , 2012. The first objective ( Figure 1), also the primary objective, of this article was to evaluate the importance of dispersal, drift, speciation, and selection in driving the community assembly and diversity maintenance, in which lots of advances have been made in the last decade, but significant issues are still unsettled (Adams et al., 2013;Clark, 2009;Foissner, 2006;Gilbert & Levine, 2017;Grossart et al., 2010;He et al., 2012;Hu et al., 2006;Liu & Zhou, 2011;Lowe & McPeek, 2014;May et al., 2011;Svensson et al., 2018). For example, it is still controversial whether the neutral and niche theories can produce similar patterns via different mechanisms; alternatively, both the theories may interact to jointly produce the observed diversity patterns (Tang & Zhou, 2013). We realized the extensive challenges in resolving these and similar issues, and resorted to two recent additions of hybrid and neutral models (Harris et al. 2017, Tang & Zhou, 2013 supported with extensive truly multisite HMP datasets to get the best-educated "guess" for the proposed objective. The multisite neutral (MSN) model proposed by Harris et al. (2017) was used for implementing the UNTB requiring truly multisite samples; that is, the required samples are taken from spatially connected local communities (sites) on ecological timescales.
Theoretically, Harris et al. (2017) represent a major advance in dealing with intractable computational algorithm for the parameter estimations of the multisite UNTB model with different immigration rates. The advance allows for more accurate parameter estimation and reliable testing of neutral theory. Obviously, its advantage can only be leveraged with truly multisite datasets. A minor issue in Harris et al. (2017) was that the gut microbiome datasets utilized to demonstrate their MSN model are not truly multisite samples.
Instead, the gut microbiome samples they used were taken from independent individuals, and at ecological timescales, the occurrence of microbial migrations among those individuals may be unrealistic, Natural communities are typically structured by stabilizing niche differences and competitive asymmetries among species, which are expected to modulate the effects of drift and generate communities that are distinctly non-neutral (Gilbert & Levine, 2017).
Consequently, the level of neutrality exhibited by a community can act as a measure of drift effect in the community, and we take this advantage of the UNTB in this study. Therefore, my estimation of the drift effect with neutral theory is expected to be conservative (reliable). Furthermore, it is virtually a consensus that there is a continuum between strict neutrality and deterministic niche-based models, in which ecological drift can operate at different levels (Gotelli & McGill, 2006;He et al., 2012;Hu et al., 2006;Hubbell, 2001Hubbell, , 2005Svensson et al., 2018;Vellend, 2010). With this consideration, the Tang and Zhou (2013) niche-neutral hybrid model offers an ideal augment to the UNTB in determining the relative importance or balance of stochastic neutral drift and dispersal versus deterministic selection in the neutral-niche continuum. Moreover, I argue that, in microbial communities, the stochastic neutral drifts (including stochasticities in demography and dispersal) should certainly exist, at least, for the following microbial characteristics: The simple cell division of bacterial reproduction makes bacterial species more likely equivalent with each other than macrobial species (e.g., different insect species may have very different egg loads); and perhaps to a less extent, similarly small sizes and arguably similar dispersal abilities also support the assumption of species equivalence-the core of neutral theory.
The second objective of this study was to look into the implications of the first objective (i.e., neutrality vs. niche balance) to microbial biogeography (Figure 1). A central theme still hotly debated in microbial biogeography research is whether microbes are cosmopolitanism or endemism. The traditional view in microbiology, first proposed by Beijerinck (1913) (cited in Foissner, 2006 and further refined by Baas-Becking (1934), that "everything is everywhere, but the environment selects who stays locally" assumes high dispersal rates of microorganisms, leading to their ubiquity. The endemism (biogeographically restricted natives) is essentially about selection.
Therefore, if microbial biogeography is totally determined by selection, then dispersal and drift are unlikely to be key processes in microbial communities. The present study with the neutral theory tool will help us evaluate the importance of dispersal and drift in driving microbial community assembly and dynamics. Furthermore, I also harness the power of niche-neutral hybrid analysis to determine the importance of local selection via niche differentiations. This study is expected to offer the first educated guess for "some things" and "sometimes" in van der Gast (2015) recent revision to classic Bass-Becking doctrine of microbial biogeography: "Some things are everywhere and some things are not. Sometimes the environment selects and sometimes it doesn't." The debate on the cosmopolitanism versus endemism also touches an even more fundamental issue in current growing acceptance and incorporation of traditional ecological principles and theories into microbial ecological research during the last decade (Carbonero et al., 2014;van der Gast, 2015). Much of the interests in translating principles and theories from macro-ecology to microbial ecology has been focused on the question of microbial biogeography (van der Gast, 2015;Hanson et al., 2012;Martiny et al., 2006). Therefore, the issue surrounding the microbial biogeography becomes a test bed for developing an inclusive ecology. The core of the issue is whether or not the uniqueness of microbial biology and physiology is so strong that an inclusive ecology is valid. Testing F I G U R E 1 A diagram illustrating the study design and the relationships among its various aspects including theoretic background and models, priori, test datasets and the research objectives Three inter-connected objectives: (1) Assessing and interpreting the relative importance of drift, dispersal, speciation and selection in driving community assembly and diversity maintenance; (2) Inferring the biogeographic implications of the first objective; (3) Analyze the relationships between community neutrality, niche differentiation and indexes of key community and species characteristics.
Test datasets: Three datasets of the human microbiomes covering all five major human microbiome habitats), including 699 meta-communities and 5420 local communities.
Background: Microbial biogeography (spatial and temporal distribution of biodiversity) is shaped by four processes: Drift, Dispersal, Speciation and Selection (Vellend 2011, Hanson et al. 2012. for assessing the effects of the first three processes. Niche-neutral hybrid model can be harnessed to assess the effects of deterministic selection forces. A priori statement: Human microbiomes should form a continuum between strict neutrality and deterministic niche-based models, in which stochastic drifts can operate to different levels. It is further argued that (i) the simple cell division of bacteria makes bacterial species more likely equivalent with each other than macrobial species (e.g., different insects may have very different egg loads); (ii) perhaps to a less extent, similarly small sizes and arguably similar dispersal the neutral theory and niche-neutral balance, in particular the biogeography inferences from the test, should also be meaningful for developing the inclusive ecology theoretically. The third and less ambitious objective (Figure 1) of this article was to analyze the relationships between community neutrality, niche differentiation, and key community/species characteristic parameters including community diversity, community and species dominance, speciation, and migration rates. (mechanism) synthesis of community dynamics and biogeography, and the human microbiome datasets used to implement the study design.

| The study design and human microbiome datasets
As shown in Table 1, I use three datasets of the human microbiome, including one from the HMP (human microbiome project).
The datasets are essentially the species abundance distribution data in the form of marker gene abundance or 16S rRNA reads. A total of 699 metacommunities consisting of 5,420 local communities are included in the datasets. In this study, a metacommunity is referred to all microbiome samples (sites) belonging to a single individual subject, or to all samples belonging to one of the three major microbiome habitats (oral, skin, and vaginal) of the same individual subject; a local community is a "component" (corresponding to a sample or site) of metacommunity.

| Harris et al.'s (2017) multisite neutral model (MSN)
The UNTB conceptually distinguishes local community dynamics from metacommunity dynamics, but both are driven by similar neutral processes (Hubbell, 2001(Hubbell, , 2006. Specifically, many local communities are connected to a single metacommunity with different immigration rates. Harris et al. (2017) developed an efficient Bayesian fitting framework by approximating the neutral models with the hierarchical Dirichlet process (HDP) (Teh et al., 2006). Harris et al.'s (2017)  Hence, the neutrality tests are performed at both metacommunity level and local community level.

| Tang & Zhou's (2013) niche-neutral hybrid model (NNH)
Tang and Zhou (2013)    per capita birth to death rates (x) and immigration parameter (γ) vary among species from different niches. In the case of the multisite microbiome datasets used in this study, I treat each site as a niche occupied by a local microbial community and fit the neutral model for each local community. The p-value from the chi-squared test is then utilized to determine whether or not Tang and Zhou's (2013) hybrid model is suitable for a set of microbial communities sampled from the multiple microbiome sites of a human individual. Specifically, at the metacommunity level, if p-value > .05, then the metacommunity satisfies the NNH and the metacommunity assembly is co-driven by both niche and neutral processes, which also implies that the metacommunity itself does not satisfy the neutral theory, but within each niche, the local community is neutral. If p-value < .05, the metacommunity does not satisfy the NNH, which also implies that within each niche, the local community is not neutral either, and the metacommunity assembly is solely influenced by the niche process.
It should be noted that the choice of threshold p-value (=.05) in testing the goodness of fitting to the NNH model here and also in the MSN previously is somewhat arbitrary, which was based on conventions (usually the recommendations by the model inventors) in testing the neutral theory or niche-neutral hybrid models (e.g., Hubbell, 2001, Harris et al., 2017, Tang & Zhou, 2013

| Logistic regression and Cox regression modeling for the relationships among neutrality, niche differentiations, and key community/species characteristics
The previously described MSN/NNH modeling approaches are based on the theoretically derived MSN (Harris et al., 2017) and NNH (Tang & Zhou, 2013) models, which allow for testing their goodness of fitting to the human microbiome datasets and assessing the relative importance of stochastic neutral forces (drift and dispersal) and deterministic selection forces (niche differentiations). To deal with the possibility that neither MSN nor NNH can fit the datasets satisfactorily, a contrastingly different modeling strategy from the previous sections, that is, a data-driven approach, is adopted to evaluate the factors influencing the performance of MSN/NNH models.
The potential insights from this alternative approach can help to understand how key community/species characteristics may influence the neutrality and/or niche differentiations.
The standard logistic regression (LR) (Kleinbaum & Klein, 2010) or Cox regression (Cox proportional hazard model) (Kleinbaum & Klein, 2012), if the former is not applicable, is used to conduct the above-designed analysis. The reason why both the regression approaches (rather than regular regression approaches) were selected is because of the fact that the prediction from the LR/Cox modeling is probability, which is advantageous in dealing with the good-   Tables S1-S2, Tables S3-S14). Therefore, Tables S1 and S2 offer windows to inspect the parameters and con- Similarly, to test the neutrality at the local community level, a pseudo-p-value P L is computed. If P L > 0.05, the local community satisfies the neutral model (see Tables S3-S8 (full results) and Table S1 (selected samples for better explanation) for fitting the MSN model).

| MSN (multisite neutral) and NNH (nicheneutral hybrid) modeling
With Tang and Zhou's (2013) NNH model (see Tables S9-S14 for the detailed results and  Figure 3a) below shows the passing rates for testing the MSN and NNH models, on the left and right sections, respectively. For each model, the passing rate at metacommunity and local community level is listed separately.
As expected, the passing rates for neutrality tests at local community level are higher than those at the metacommunity level. The difference is approximately 4% for the MSN and 23% for the NNH, respectively. This should be expected given that local communities are more homogenous in their habitats. In the case of MSN, it indicates that neutrality is more likely to maintain at the local community scale than at the larger metacommunity scale.
The results of testing the NNH model (see Figure 2b for an ex-  Table 3 (also see Figure 3b).  Table 3 and Figure 3b displayed the cross-verifications of both the modeling approaches.

| Logistic/Cox regression analysis
With the selected community/species characteristics (indexes) and performance metrics ( are strongly related to the balance between stochastic neutral versus deterministic selective niche forces. Note that some of the factors are the exhibitions of neutral or niche effects, rather than the causes, and hence, the term "related" is used. I first identified the microbiome characteristics (factors) that are strongly related to the neutrality at the meta-community level by using the LR with subset selection option (Table S15). Four significant factors, community dominance, species richness (Hill numbers at q=0), the fundamental biodiversity number (q) and the neutrality of local community were judged to be significant (p<0.0001) in influencing the niche-neutral balance. The first two factors were positively related to the non-neutrality at the meta-community level.
That is, high dominance or high species richness are negatively correlated with meta-community neutrality. The concept and metric of community and species dominance were proposed by Ma & Ellison (2018, 2019, and high community dominance indicates lower community evenness (or heterogeneity). Obviously, this positive relationship between community dominance and non-neutrality should be expected because high dominance may signal strong asymmetric interactions, i.e., a property of non-neutrality or the opposite of neutrality. Similarly, the other two negative relationships (fundamental biodiversity number q and species dominance) displayed in Table S15 should be expected. Table S17 showed the performance of the logistic model described in Table S15 in predicting the neutrality at the F I G U R E 3 (a): Bar charts showing the passing percentages of testing the MSN or NNH for each human microbiome dataset, respectively: For each group (dataset), the passing percentages for both metacommunity and local community of each model (MSN or NNH) were exhibited (see Table 3A for detailed numerical information). (b) Bar charts showing the passing percentages of samples that passed MSN only, NNH only, both MSN and NNH, and none, for each human microbiome group (dataset), respectively (see Table 3B for detailed numerical information). Note that the gut and lung datasets were from two projects different from the other HMP datasets, and caution should be taken in interpreting their differences. In other words, the differences among the three groups of datasets might be complicated by possibly different experimental procedures such as sequencing platforms meta-community level. The overall precision of predicting the meta-community neutrality with the four characteristics (factors) was 91.6%. Table S17 also included the prediction of the LR modeling at the local community level (see Table S16) as explained below.
For the MSN at the local community level, community dominance, species dominance, immigration rate (M-value), and the fundamental biodiversity number (θ) were identified to be significant in influencing the neutrality at local community level (Table S16).
The precision level of predicting local neutrality was 96.4%, which is higher than that at the metacommunity level (Table S17).
For the NNH model, Table S18 shows that community dominance, birth-to-death ratio, migration of each niche, and neutrality at local community level significantly influence the probability of passing NNH model at metacommunity level (i.e., the niche differentiations at the metacommunity level). Table S19 shows that the prediction precision of the LR modeling for NNH model at the local community level was 91.9%.
As to the testing of the local community neutrality with the NNH model, I adopted Cox regression model (with variable subset selection), rather than the LR, given that the response variable is not 0/1 at the local community level (LR is only applicable for 0/1 variable) (Table S20). Table S20 shows that community dominance, birth/death ratio, M-value (migration rate from MSN model), species richness, Shannon index, and the success of NNH model at metacommunity level are significant factors in influencing the neutrality at the local community level with the NNH modeling. Although I successfully built the Cox regression model, given the high standard error associated with the regression coefficient of M, I caution that its interpretation needs further investigation, and therefore, no local community-level prediction was made in this case.

| D ISCUSS I ON
As shown in Figure 1, the two primary objectives and a third minor one were as follows: (a) to evaluate the niche-neutral continuum; (b) to infer the biogeographic implications of (a); and (c) to discuss the influence of major community/species characteristics on the continuum. Here, we continue to discuss the first two objectives by putting them in the context of the existing literature. In addition, we also discuss some miscellaneous but relevant topics including the microbiome habitat-specific issues and the comparisons with what I termed "single-site" neutral model-fitting the Hubbell (2001) neutral theory model to the species abundance distribution data obtained from a single microbiome sample, which has been used in virtually all existing studies on the microbiome.

| The relative importance (balance) between niche and neutral forces in shaping nicheneutral continuum
The truly multisite microbiome datasets, in which dispersal (migration) can occur in ecological times, may even be in vivo (e.g., from one skin site to another in distance of centimeters), and the truly % is the percentage that passed the test. See Figure 3b for the graphic display of the percentage results in this table. a TA B L E 3 Comparative summary of the performances of MSN and NNH models fitted to the human microbiome datasets of 699 metacommunities, summarized from Tables S3-S14 included in the OSI-2 (Supplementary Information S2) a failed to fit. This simply indicates the complexity of the problem, and further analysis with more powerful and flexible models is needed.
Alternative data-driven modeling approaches with the LR and Cox model (Tables S15-S10 in the OSI-1) may offer insights into designing more comprehensive and powerful models for interpreting the human microbiome assembly. Those data-driven models may identify the community/species-level characteristics that significantly influence the model performance such as MSN and NNH, which can be particularly useful when more datasets are accumulated since LR is a primitive method of powerful deep learning. and shaping the biogeography of the human microbiome. Since dispersal may not be wholly neutral and may even be treated as an adaptive mechanism to niche differentiations, this study cannot assess the exact level of dispersal in shaping the community assembly. Alternative modeling approaches such as those developed by Janzen et al. (2015) can be used to distinguish multiple syndromes of dispersal.
According to the Gilbert and Levine (2017) experimental study, the importance of ecological drift in structuring diversity in fragmented ecosystems is far greater than predicted by neutral models. Svensson et al. (2018) emphasized that ecological drift can operate even if species are not completely equivalent, and consequently, species are not strictly neutral. Neutral theory of biodiversity assumes that dispersal is stochastic and equivalent among species. Lowe and McPeek (2014), and also Clark (2009) argued that the neutral dispersal assumption should be rejected on principle because dispersal can be a species-specific process (property), and it evolves by natural selection. However, there is little doubt that dispersal can be "partially neutral" in the sense that stochastic and extrinsic forces influence dispersal in many species and systems, and dispersal affects biodiversity independent of adaptive mechanisms of coexistence. Liu and Zhou (2011), through simulation, showed that asymmetric dispersal could lead to deviations from neutrality. These existing arguments suggest that, if I use the neutrality levels revealed by MSN and NNH as educated guess of the drift and dispersal, my estimation should be on the safe side (conservative).

| Cosmopolitanism versus endemism debate and microbial biogeography
The traditional view of Bass-Becking (1934) doctrine-"Everything is everywhere, but the environment selects who stays locally"-was largely based on some general traits microbes posses including tiny individual sizes, large population sizes, fast reproduction, consequently their easy long-distance dispersal, and low chance of local extinction (Barreto et al., 2014;van der Gast, 2015 "some" should be a half of time or occasions, at the minimum. I further hypothesize that there is room for both evolutionary and ecological forces to shape the distribution of biodiversity (Hanson et al., 2012), and perhaps the real challenge is to determine at what scale the cosmopolitan or endemism (native) dominates. It is very likely that the scale may be a continuum given that the evolutionary time and ecological time of microbes may overlap with each other partially because microbes reproduce much faster (20 min could be enough for bacterial to complete one generation) and their evolution is on fast tracks (e.g., Baym et al., 2016). In a follow-up study, I will further explore this topic by analyzing time-series data of microbial community dynamics.
Quantifying dispersal is not easy. As mentioned previously, dispersal is often treated as purely stochastic and extrinsically controlled, including neutral theory of biodiversity. There is a need to investigate the relative importance of neutral and adaptive forces in shaping individual dispersal propensities and distances, population-level dispersal distributions, and resulting effects on populations and communities (such as population density or community diversity) (Lowe & McPeek, 2014). Again, Janzen et al.'s (2015) approach is worthy of trying in quantifying dispersal from a multidimensional perspective.

| Habitat-specific differences in the human microbiomes
Here, I briefly discuss the differences among five major microbiome habitats in their passing rates with both MSN and NNH testing, and consequently, the relative balance between stochastic neutral forces (drift and dispersal) and deterministic niche forces. Regarding the neutrality test with the MSN model, the neutrality passing rates among different microbiome habitats ranged between 81.4% and 100% at the metacommunity level, and 72.7 and 100% at the local community level. In consideration of their differences in sample sizes (N = 11-172; Table 2) and possible differences in metagenomic sequencing operations, I do not draw any further conclusion about the apparent differences.
Regarding testing the niche differentiations among local communities with the NNH model, the oral and skin exhibited the passing rates of NNH lower than 10% (5.2%-8.9%), while vaginal, gut, and lung exhibited much higher passing rates of NNH (36.4%-53.3%).
The face value of this contrasting difference is that the latter three microbiome habitats should have bigger niche differentiations internally. This face value should be true, at least, for the gut, which is obviously the most highly differentiated microbiome habitat, and also host the most complex part of the human microbiome.  Li and Ma (2016), which used the singlesite sampling formula (Etienne, 2005(Etienne, , 2007(Etienne, , 2009Hankin, 2007), is much larger than that from MSN or NNH models obtained in this study. This result confirmed the validity of Harris et al.'s (2017) recommendation for using their HDP (hierarchical Dirichlet process)based MSN model. Tang and Zhou (2013) study further verified Harris et al.'s (2017) recommendation. I therefore conclude that the neutrality of the human microbiome is rather strong given that approximately 89% metacommunities passed the MSN test, and even more (approximately 92%) local communities were neutral ( Table 2).

| Comparisons with existing studies on the neutrality of human microbiomes
As to the comparisons with other existing studies on the human microbiome neutrality, majority of the previous studies obtained similar conclusions as Li and Ma (2016), and were discussed there.
A general suggestion is that Harris et al.'s (2017) HDP-MSN fitting framework for the UNTB should be utilized whenever multisite data are available.

ACK N OWLED G EM ENTS
This study received funding from the following sources: a National

CO N FLI C T O F I NTE R E S T S
The author declares no competing interests.

DATA AVA I L A B I L I T Y S TAT E M E N T
All datasets analyzed in this study are available in public domain and see Table 1 for the access information. No new datasets are generated from this study.