Drift dynamics in microbial communities and the effective community size

The structure and diversity of all open microbial communities are shaped by individual births, deaths, speciation and immigration events; the precise timings of these events are unknowable and unpredictable. This randomness is manifest as ecological drift in the population dynamics, the importance of which has been a source of debate for decades. There are theoretical reasons to suppose that drift would be imperceptible in large microbial communities, but this is at odds with circumstantial evidence that effects can be seen even in huge, complex communities. To resolve this dichotomy we need to observe dynamics in simple systems where key parameters, like migration, birth and death rates can be directly measured. We monitored the dynamics in the abundance of two genetically modified strains of Escherichia coli, with tuneable growth characteristics, that were mixed and continually fed into 10 identical chemostats. We demonstrated that the effects of demographic (non-environmental) stochasticity are very apparent in the dynamics. However, they do not conform to the most parsimonious and commonly applied mathematical models, where each stochastic event is independent. For these simple models to reproduce the observed dynamics we need to invoke an "effective community size", which is smaller than the census community size. This article is protected by copyright. All rights reserved.


Introduction
It is widely accepted that ecological drift plays a role in shaping all biological communities (Vellend and Agrawal, 2010). However, its relative importance remains a source of debate with potential important consequences, especially for quantifying the uncertainty in strategies to manipulate microbial communities for engineering, medical of agricultural practices (Zhou and Ning, 2017). The doubt about the importance of demographic randomness, which causes drift, is particularly pertinent to microbial communities because, theoretically (Nee, 2005;Ricklefs, 2006), the importance diminishes with increasing population size. Thus, for example, Nee (2005) and Ricklefs (2006) considered the dynamics resulting from a stochastic birth-death model, where each event is independent and randomly distributed in space and time. They showed that in a large population the drift associated with randomness would occur over unfeasibly long timescales and, therefore, would be unobservable and entirely swamped by the environmental effects on the population. Most microbial communities of consequence to us, such as those in soil, the gut, wastewater or infections, comprise huge numbers of individual organisms.
Yet this theoretical argument is at odds with many results from researchers who have just gone ahead and deployed stochastic birth-death processes to describe community assembly (Sloan et al., 2006;Woodcock et al., 2007;Morris et al., 2013;Adair et al., 2018;Ling et al., 2018;Mo et al., 2018). The interest in random effects was initially brought into focus, and simultaneously polarized (McGill, 2003), by neutral theory (Hubbell, 2001), which is an extreme case where only drift is at play in local communities. So, for example, most theoretical manifestations of the purely neutral assumptions yield the results that the abundance of an individual taxa in an isolated community is beta distributed (Sloan et al., 2006) and that the taxa-abundance distribution of all taxa in that community adheres to a Dirichlet distribution . It has been shown, using a suite of different methods, that these distributions fit data from a single community or multiple, similar, but geographically distinct communities for single snap-shots in time (Volkov et al., 2003;Sloan et al., 2006;Woodcock et al., 2007). However, fitting stationary taxa abundance distribution to observations will only ever provide circumstantial evidence of the importance of ecological drift. The models may fit but for the wrong reasons; a range of different processes could yield very similar stationary distributions. There are at least as many studies where the neutral models do not fit the data (Clark and McLachlan, 2003;Dornelas et al., 2006). In addition, the assumption of neutrality, where all taxa have the same specific birth and death rate, is at odds with biologists' experience in the laboratory. The fact that neutral models deliver an extreme manifestation of the effects of drift is sometimes overlooked and where neutral models fail to describe patterns in ecology the whole concept of demographic stochasticity and thus drift is dismissed in favour of alternative models built on entirely different axioms (Tilman, 2004). Yet to discard the building block of a random birth-death process as a collateral effect of dismissing the neutral assumption would seem hasty. The drift effects of demographic stochasticity may still be manifest in the dynamics of communities where growth rates are differentiated by species.
The dichotomy of theoretical slow drift in large communities alongside circumstantial experimental evidence of its importance in shaping community composition can only be resolved with richer datasets. In particular, time series of the population dynamics in systems where some of the key parameters that determine the magnitude and speed of the random effects can be directly measured. Time series of abundance of bacteria in communities are quite rare. Ofiţeru et al. (2010) were able to calibrate a stochastic birth-death process for microbes in a wastewater treatment plant and provide stronger evidence that somehow the demographic stochasticity was indeed important, even in large communities. However, the complexity of that environment made it difficult to discover why the phenomena might be observed in such a large population; the key parameters could only be inferred. Cira et al. (2018) created time series using relatively complex synthetic communities of bacteria and again inferred parameter values. In a forest, where a comprehensive time series of tree species exist, again using model inference alone it appearred that the role of demographic stochasticity was difficult to disentangle from environmental variance (Chisholm et al., 2014). Here, we observe the dynamics in a set of very simple experiments where the environment, as far as is possible, stays constant and the factors that are known to affect the rate of drift are controlled and measured.
Birth, death and immigration rates can be controlled in chemostats, so we describe experiments where we operate 10 identical chemostats in parallel and record the dynamics of two strains of E. coli. The strains only differ in having a single copy of a different antibiotic resistance gene added to exactly the same position in their genomes. They behave neutrally when mixed without antibiotics or can be disadvantaged by adding one or other of the antibiotics. We then fit a modest adaptation of the birth-death-immigration model in Ofiţeru et al. (2010) to the experimentally observed population dynamics to ascertain whether drift explains the dynamics in the chemostats. There are two random effects in our experimental data: the drift, which is a fundamental component of the real dynamics, and experimental errors. Drift is manifest as a random walk where steps are a function of the strain's abundance and successive data in a time series are correlated. Experimental error is manifest as the addition of independent realizations of a normally distributed random variable to the true population numbers at each time point. Our model and fitting method distinguishes between these. We can independently verify the estimate of measurement error from repeat measurements and thus it is possible to reliably quantify the drift. We believe our approach is a uniquely rigorous exploration of drift that is particularly suited to the tractable microbial community dynamics in our experimental setup.

Two bacterial strains
Truly neutral species would have identical growth rates in all environmental conditions. Unfortunately, it would be extremely difficult to guarantee such equivalence in two naturally occurring species. Thus, we created two genetically modified strains of E. coli from a common laboratory parent strain (K-12) by replacing the single-copy nonessential pepA gene in its genome with two different antibiotic resistance genes: one conveying chloramphenicol resistance and the other kanamycin resistance. This was achieved through recombination-mediated genetic engineering using bacteriophage lambda-encoded recombination enzymes (Supplementary Note 1), which is reported to be precise and efficient (Datsenko and Wanner, 2000). Placing the antibiotic resistance gene in the chromosome, rather than on a plasmid, offers more stability. We tested that the single copy pepA gene was replaced by only one copy of the antibiotic resistance genes (Supplementary Note 2) and, hence, that we could quantify the abundance of the two modified strains, when they were mixed in known ratios, using qPCR (Supplementary Note 3). This meant we could be confident in qPCR estimates of the abundance of the two populations when mixed and fluctuating in the subsequent chemostat experiments.
In benign conditions, the growth characteristics of the two modified strains should be identical. We showed that this was indeed the case by estimating the growth in triplicate batches for different substrate (tryptone broth) concentrations at 37 C (Supplementary Note 4). The relationship between growth rate and substrate concentration was then characterized using a Monod kinetic model; and the maximum growth rate, μ max ffi 0.85 h −1 , and the half-saturation constant, K sat ffi 2.5 g L −1 , for both strains. We similarly fitted Monod curves to the relationship between substrate concentration and growth rate when the strains were inhibited by the addition of antibiotics. In the presence of 0.5 μg ml −1 chloramphenicol the growth rate of the chloramphenicol resistant, μ c , strain is two times that of the kanamycin-resistant strain, μ k , across a wide range of substrate concentrations (Supplementary Note 4). The kinetic curves also allowed us to design the chemostat setup using a conventional deterministic mathematical model of chemostats to ensure that the biomass was turned over and retained in the system.
A schematic of the experimental system is given in Fig. 1. It was designed so that different experimental setups could be implemented by simply switching on or off pumps. Three separate experiments were conducted: 1. The first experiment aimed to maximize the chance of observing neutral dynamics, if they occur. In this first experiment, neither of the strains were disadvantaged by the addition of antibiotic, so pump B was not active. The A pumps were active so that the two large chemostats (2 L) were fed with 10 g L −1 tryptone-broth at a rate of 0.6 ml min −1 which maintained the pure cell cultures at similar densities. These pure cultures were pumped at a rate of 0.55 ml min −1 into two separate manifolds and divided evenly between 10 smaller chemostats where they were mixed and continuously stirred. No additional substrate was added and so the mixed cultures in smaller chemostats grew in the residual substrate entering with the bacteria from the large chemostats. The small chemostats overflowed when the volume reached 250 ml and so the volume was held constant at that value and the volumetric flow rate of influent and effluent were similar, not accounting for evaporation. The mass flux of residual substrate ensured that there was growth in the small chemostats so the cell density in the effluent always exceeded that of the influent. The bacteria entering In experiment 1, which potentially creates neutral dynamics in the 10 small co-culture chemostats, substrate and migrants entered via the pumps labelled A; the one labelled B was off. In experiment 2, the initially inoculated chemostats were fed substrate with 0.5 μg L −1 chloramphenicol via pump B, but no new migrants were allowed to enter, so the A pumps were off. In experiment 3, substrate and migrants entered via the A pumps and 0.5 μg L −1 chloramphenicol in substrate was fed in via pump B. [Color figure can be viewed at wileyonlinelibrary.com] the small chemostats will be referred to as immigrants and those in the effluent as emigrants. The entire experimental setup was housed in a temperaturecontrolled chamber that was maintained at 37 C. The chemostats were operated identically for 7 days. After allowing 2 days for the chemostats to achieve steady cell densities, the large pure-culture chemostats and the effluent from the 10 smaller chemostats were sampled every 2 h during the 5 remaining days (08.00-20.00 h) and the cell density (cells ml −1 ) was enumerated by QPCR. 2. In the second experiment, the 10 smaller communities were inoculated with a 50:50 mix of the two bacterial strains; the stabilizing force of immigration was removed (pumps A switched off); the small chemostats were fed bacteria-free Tryptone-broth substrate (pumps B switched on) with 0.5 μg ml −1 chloramphenicol via a manifold at a rate of 1.1 ml min −1 and so the kanamycin-resistant strain was disadvantaged. Thus, we would expect the chloramphenicolresistant strain to become monodominant with the dynamics following variable trajectories if demographic stochasticity plays a role and with identical trajectories if it does not. In this experiment, the chemostats were maintained at a volume of 200 ml. 3. In the third, and final experiment, the setup was the same experiment as 1, but with the addition of 0.5 μg ml −1 chloramphenicol in tryptone media via pump B at a rate of 0.33 ml min −1 directly into the 10 small chemostats. Also, the small chemostats were kept at a volume of 200 ml. Here, the chloramphenicol-resistant strain should dominate, but the kanamycin-resistant strain ought to be maintained at low abundance by new migrants. It should be possible to ascertain whether any dynamics can be attributed to those induced by the randomness of birth, death immigration events.
The mathematical model is a modest adaptation of that in Ofiţeru et al. (2010), which derives from Sloan et al. (2006) and ultimately Hubbell (2001). It describes a birth-death-immigration process for the co-culture of the two species in small chemostats that are maintained with a constant number of individuals, N T . For the assemblage to change an individual is selected at random to die or leave the system. The dead individual is replaced by an immigrant from a source community with the probability m or by reproduction by a member of the local community with probability (1 − m).If we take either one of our two strains comprising initially of N individuals, then we can write the probabilities that after one replacement event, there has been an increase by one, decrease by one, or no change (Sloan et al., 2006), Pr Pr respectively, where and K and K other are the number of individuals in the large monoculture of the strain of interest and the other strain respectively. Thus p is the relative abundance of the strain in the influent to the smaller chemostats. α is a parameter that conveys an advantage on the strain by increasing the probability of its birth relative to other strain within the chemostat; when this is set to zero the dynamics are neutral. Note Pr(N/N) is independent of α.
These transition probabilities describe a discrete Markov chain process that can simulate the dynamics of the abundance of the species. In the discrete Markov transition probabilities, each transition represents one replacement event. Simulating the dynamics using this discrete model rapidly becomes cumbersome as N T gets large. For microbial communities, where N T is very large indeed, it is completely impractical. So, it is convenient to recast the model as a set of continuous stochastic differential equations (SDEs). The model presented here is very similar to that of Ofiţeru et al. (2010) but differs slightly. For the complex wastewater treatment plant, where they applied their model, the average time between birth/death random replacements was experimentally intractable and therefore was estimated by calibration. With the careful experiments conducted here, we do know how many replacements occur within a given time period so we can estimate the average time between replacement events. We can also estimate the total population size, N T , the source community relative abundance, p, and the immigration probability, m, directly from the experimental data. This allows us to give a richer interpretation of the parameters that are delivered by calibrating our SDE. Hence, we relate discrete events in time to continuous realtime in a marginally more intuitive way. We let η be the average time between individual replacement events in the populations and define a scaled time τ = t/η. Then in a time period Δτ = 1 we can expect one replacement in the community as a whole so that Δτ is, on average, the transition time in the discrete Markov model. In general, N T is large we can use the continuous variable X = N N T represent the Markov chain process by the SDE (2), where W τ is a standard Wiener process and for small Δτ, and For the discrete time model, the smallest change that can occur is the replacement of one individual, which we expect to occur in a time Δτ = 1. So, during this time period the possible changes in which gives us M and V. In Sloan et al. (2006), it is argued that the second term in V is an order of magnitude smaller the first and it is reasonable to ignore it. So that Eq. 5 becomes Now we can convert to real time using dτ = dt = η so that, or dX = f X, p : where f X,p : p . This then constitutes our SDE that we hope will describe the dynamics in relative abundance of either one of our bacterial strains.

The fitting method
To calibrate the model we have n observations of the relative abundance {X i , i = 1, n} that were made every 2 h, dt = 2, in all of the small chemostats, from 08:00 to 20:00 h daily. We also have observations of the relative abundance {p i , i = 1, n} in the large pure cultures and we can calculate dX i = X i + 1 − X i for all observations except the couplets that span the night, where the time gap was 12 h, which is too-long. The relative abundances were quantified using qPCR with the same standard curve which should minimize the variability in measurements; nonetheless, there will be some measurement error. So, for our measurements, where dW t, i is the ith realization of the standard Weiner process dW t,i $ ffiffiffiffi ffi dt p N 0, 1 ð Þ at ε i is a realization of the measurement error in X. We make the assumption that measurement noise, ε i , has the effect of adding a constant, σ ε , to standard deviation in the process. To fit the model we use maximum likelihood estimation (MLE), where the log-likelihood function for the parameters given the data is This allows us to make an estimate of all the parameters including the measurement noise σ ε based solely on the time-series data, which would not be possible using, for example, least squares fitting. Then, merely to gain more information on the quality of the model fit, the MLE estimate of σ ε is fixed in Eq. 13, which allows us to use ordinary least squares, as in Ofiţeru et al. (2010), to deliver goodness-of-fit statistics.

Results
The time series of relative abundance for both bacterial strains in all experiments and reactors are given in supplementary materials. We have highlighted the generic behaviour of the dynamics in relative abundance in the experiments by plotted graphs for an arbitrarily selected subset of the reactors. Plotting all reactors for all time points would lead to too many, or overly complicated, figures and plotting the averages across all reactors would defeat the purpose of highlighting the variability between reactors. For experiment 1, the time series of bacterial abundance in the pure culture of the chloramphenicol-resistant strain in the large monoculture chemostat and for two of the smaller co-culture chemostats are presented in Fig. 1A. We have not displayed the relative abundance of the kanamycin strain because it is merely one minus the relative abundance of the chloramphenicol strain. The graph shows that for both of these small chemostats the relative abundance fluctuates around the relative abundance of the large monoculture from which immigrants are drawn. The dynamics in the fluctuation about the source community abundance seems to be independent in the two reactors. This observation is similar for all 10 of the small co-culture chemostats; Fig. 1B demonstrates this by plotting the dynamics in abundance for 1 day for all 10 chemostats. So, whilst the abundance of the source of immigrants strongly influences the abundance in the local co-culture chemostats, there are serially correlated dynamics occurring independently in each of the vessels, despite the environmental conditions being the same in each. This is redolent of ecological drift.
In experiment 2 we inoculate the chemostats and cutoff immigration, so that the probability of an immigration event m = 0. The addition of 0.5 μg L −1 of chloramphenicol to each of the co-culture chemostats should convey an advantage to the chloramphenicol-resistant strain. In Fig. 1C we see that the experimental results, presented here for just three arbitrarily selected chemostats, bear this out. The chloramphenicol strain ultimately becomes dominant in all chemostats, but the route towards dominance differs in the reactors. Indeed, in two of the reactors displayed in Fig. 1C the chloramphenicol-resistant strain initially fairs less well than the kanamycin-resistant strain. Again, the dynamics vary from reactor to reactor despite the environmental conditions being identical.
In experiment 3, we reintroduce migration from the large monoculture chemostats and see, for two arbitrarily selected reactors (Fig. 2D), that whilst the chloramphenicol strain dominates it does not become monodominant; the kanamycin-resistant strain is retained in the chemostat by the steady stream of migrants, despite being disadvantaged by the antibiotic.
Our use of qPCR allowed us to estimate the total population size, N T . Organisms enter the reactor and grow, so the density in the reactor is higher than in the influent, however, at a steady state that density stays constant in time. Therefore, the number of replacements either by births or by immigration in a unit of time must be equal to the number leaving in the effluent. So, if the density of bacteria in the effluent is N eff and the volumetric flow rate is Q then the total number of replacements in the community per unit time is simply QN eff and thus the time between replacement is simply η = 1 QN eff . If N in is the density of immigrants in the influent, then the number of immigrants per unit time is QN in and so the probability that when a replacement event occurs it is by and immi- In supplementary materials, we see that in the presence of 0.5 μg L −1 chloramphenicol the growth rate of the chloramphenicol resistant, μ c , strain is two times that of the kanamycin-resistant strain, μ k , across a wide range of substrate concentrations. Thus, μ c μ k = 1 + α 1−α = 2 which means that α = 0.33. So, estimates of m, η, N T and α can be calculated directly from the experiments and are given in Table 1.
We can also approximate the standard deviation of the measurement error, σ ε . The abundances of each of the antibiotic-resistant genes were enumerated at all time points in the small reactors using qPCR with three technical replicates. All qPCR estimates for each gene used the same standard curves. With three estimates of the kanamycin-resistant gene and three of the chloramphenicol then we have nine possible combinations to estimate the total abundance of bacteria in the samples at each time point. And then if we consider all possible ratios of, for example, the three measurements of the abundance of chloramphenicol strain with the nine estimates of total abundance then there 27 possible values. At each time point, we calculated the mean and standard deviation of the possible values and then we calculated the mean of the standard deviation across all time points to obtain a crude estimate of σ ε . We obtained a value of σ ε ' 0.07 by averaging across all the data (Table 1).
One set of parameters was calibrated for each experiment using the time series from all the chemostats ( Table 2). The p-values for the least squares regression fit demonstrate that the model parameters are significantly different from the null hypothesis of them being zero (no trend). Bearing in mind that we are fitting a SDE with the aim of explaining the variability between reactors we would not necessarily expect a high R-square value for the fit, which is the case, especially for experiment 2 where there is no stabilizing influence of immigration. Overall, though, the significances of the over-all model fits are high. It is more important to test the null hypothesis that the weighted residuals are normally distributed. A B C D Fig 2. A. Experiment 1, where we might expect neutral dynamics. A graph of the dynamics in relative abundance of the chloramphenicolresistant strain in the large monoculture that acts as a source of immigrants and in two arbitrarily selected smaller co-culture chemostats (2 & 5). B. The relative abundance of the chloramphenicol-resistant strain in all 10 of the co-culture chemostats for a 10 h period during experiment 1. C. Experiment 2, where immigration to the small chemostats is cut off and chloramphenicol is added, which gives the chloramphenicolresistant strain an advantage. The plot shows the chloramphenicolresistant strain taking a variety of trajectories towards becoming monodominant in an arbitrary selection of four reactors (2, 5, 6 & 10). D. Experiment 3, where immigrants are equally likely to be of either strain, but the addition of chloramphenicol conveys an advantage on the chloramphenicolresistant strain. A graph of both the relative abundance of chloramphenicol and kanamycin-resistant strains in two of the co-culture chemostats. The chloramphenicolresistant strain is consistently more abundant but the dynamics vary between reactors. [Color figure can be viewed at wileyonlinelibrary.com] For the case with no immigration, where the chloramphenicol-resistant strain becomes monodominant, then 1 and 0 are absorbing boundaries where the residuals become very close to zero; which erroneously skews the distribution of residuals. So we test the fit for the region just before the process first hits the boundary, which is analogous to first passage time problem in survival analysis (Lee and Whitmore, 2006), and thus we only consider residuals where the abundance lies between 0.05 and 0.95. Using the Anderson-Darling test, we fail to reject the null hypothesis of normality at the 95% level. So, it would appear that the SDEs are doing a good job of capturing the noisy dynamics across all the reactors.

Discussion
There was a high degree of variability in the dynamics of the abundance of the two strains across identically operated chemostats. We were able to fit the parameters of our SDEs using the time series data from all three experiments with a high degree of significance. This would suggest that we are observing neutral dynamics with immigration in experiment 1, selection and drift in experiment 2 and selection and drift mediated by immigration in experiment 3. Precisely what we anticipated. So, it would appear, on this evidence, that demographic stochasticity does play an important role in the dynamics, whether or not the species have identical growth rates.
The beauty of the simple experimental system we have used here is that we can actually measure the key variables in the experiment and compare them to our calibrated model parameters. First, the calibrated estimates of the standard deviation in measurement noise are in close agreement with that estimated directly from the triplicate qPCR measurements. For the other parameters, it is important to note that in our mathematical model, and indeed, in every other manifestation of random birthdeath population models, the parameters we calibrate are compound combinations of the fundamental parameters that describe the process. So, for example, here θ 1 = mdt ηN T . If we estimate the compound parameters for our model using the variables, m, N T , η and α, measured directly in our experiments (Table 1) do they match the calibrated model parameters (Table 2)? Table 3 makes that comparison.
For parameters θ 1 and θ 2 there is a reasonably good match between the calibrated and experimentally derived estimates. However, values of θ 3 estimated by calibration are several orders of magnitude larger than that are calculated from the direct measured variables. θ 3 determines the size of the random step in the stochastic process and hence the timescales over which stochastic demography becomes apparent.
So, this takes us straight back to the heart of the dilemma that we outlined in the introduction. As Nee (2005) and Ricklefs (2006) suggested, if we use the actual measured population size, immigration probability and replacement rate the model predicts the stochastic contribution to dynamics that is small and would only become apparent over long timescales. Yet, the observed dynamics are highly variable and are well represented by the SDEs albeit with the calibrated parameter that determines the speed of dynamics, θ 3 , much larger than we would expect. So, in our tightly controlled experiments, we see exactly the dichotomy that we described earlier, based on birth-death processes calibrated for more complex environments. This is frustrating. There is a tantalizing alignment between the models and the experimental results that is only undermined by the failure to match one important parameter. Thus, it appears that our underlying model (Eqs. 1-3), which is  -4.58× 10 −11 6.61× 10 10 0.33 0.07 3. 0.31 6.29× 10 −12 3.7× 10 11 0.33 0.07 the basis of all birth-death processes in ecology is fundamentally flawed.
Before dismissing birth-death processes for our simple microbial communities it is worth considering their application in the related field of population genetics. There the central premise is that mutations in the genomes of a population are inevitable, but the precise timing and location of the mutations defy characterization. So, the mathematics of population genetics, whilst continually advancing, has adhered to the idea that inheritance is an intrinsically stochastic process. The simplest models encapsulate a few fundamental truths for all organisms; they are born, they die, they inherit mutations and they accumulate new ones at random. Selection is manifest as a tweak on the probability that one mutant preferentially reproduces over others. Thus population genetics has simple random birth-death processes at its heart and these are deployed to great effect in range of practical applications including human medicine, animal husbandry, epidemiology and agriculture (Pirchner, 1969). This is the case even although the model predictions, in their rawest form, always deviate from observations. Thus, for example, the models always significantly underestimate the variance in the frequency of alleles in successive populations (Wright, 1931) when applied using the census population number. This does not deter geneticists; confident that the fundamental stochastic processes are at play, they retain the birth-death-mutation model and introduce the idea of an effective population size. That is, a fictitious smaller population size would see the model produce the correct results. This is not as much of a 'fix' as it might first seem because there is a wide variety of biological and demographic reasons why it is reasonable to assume that not all of the census population will contribute to the genetic diversity of the next generation. Thus effective population sizes in clonal populations of bacteria estimated using allele frequencies have small effective population sizes (Fraser et al., 2007).
Can we deploy a similar approach here and, based on the assumption that the random birth-death process must be occurring, make sense of the rapid drift by describing an effective parameter that makes biological sense? The model assumes that every organism that is replaced is selected independently at random from the population. Suppose instead that there was some spatial or temporal correlation in the replacements. This is not unreasonable since, despite our best efforts, it is impossible to perfectly mix a reactor and so when an individual leaves the system it departs at the same time as at least a few of its neighbours. Furthermore, when a bacterium divides the offspring are collocated, they are not instantaneously transported to randomly selected locations in the chemostat. In addition, with any bacterial community some degree of flocculation is common.
Our model does not explicitly represent space. However, we can explore what size of spatial or temporal grouping, or 'clumping', might yield the stochastic dynamics we observe. If the cells in the chemostat reside in clumps, then they would exit the chemostats in those clumps. So if N ec is the total number of clumps and η ec is the average time between clumps leaving then, N ec η ec = N T η; the turnover in the population is the same it merely occurs in clumps rather than individuals. Note then that the first two parameters in our model remain unchanged if we replace N T with N ec and η with η e no matter what their value. If we assume that N ec η ec is given by the experimentally defined N T η then we can now calculate the value of N ec that would yield the calibrated value of θ 3 , which for experiment 1, 2 and 3 would be 598, 126 and 255 respectively; which is tiny in comparison to the number of individuals. Now consider the number of organisms that would constitute a clump, N T /N ec , which is 5.56 × 10 8 , 5.21 × 10 8 and 14.4 × 10 8 , similar across all three experiments. These seem large, but the density of organisms is very high in the chemostats. Indeed, if the organisms were evenly distributed in the liquid then one of these clumps would only occupy approximately 0.5 ml. If they were correlated in time then it is the number of organisms that would leave the reactors in approximately 30 s. It, therefore, does not seem unreasonable to us that spatial or temporal correlation in the replacement events might be the reason that we seem to observe stochastic dynamics of a magnitude far higher than we would expect if events were truly randomly distributed as they are in our model. We suspect that the spatial correlation of events will be equally important to stochastic dynamics in other environments. Thus, predators will plough through a population, consuming bacteria and their neighbours rather than picking up individuals at random from across the population. Infiltration of rainfall in soils, attachment to substratum in a reactor or indeed in structured biofilms will all promote correlated random events. Also, it may explain the parameter values that emerge from studies where neutral models are calibrated using stationary taxa abundance distributions. In Etienne and Alonso (2005) description of the stationary taxa-abundance distribution for the neutral case, a 'fundamental dispersal number' is calibrated, and in Sloan et al.'s (2006) adaption for microbial communities, a similar compound parameter N T m is calibrated. Typically the calibrated values of N T m are in the range of 10s to 10 000s depending on the clade of bacteria (Sloan et al., 2006;Woodcock et al., 2007;Morris et al., 2013;Adair et al., 2018), which in large populations would suggest very low immigration probabilities. But if N T were replaced by a much smaller effective community size, N ec , then immigration could be orders of magnitude higher and perhaps more realistic for the systems. The composition and dynamics of any open microbial community are ultimately shaped by four things (Vellend and Agrawal, 2010): diversification, dispersal, selection and drift. If we had a perfect knowledge of every individual organism in a community, from its position to metabolic function and state, if we could anticipate precisely where and when migrants entered or departed and if we had an exact map of the environment now and in the future then we might, in theory, be able to predict the effects of these processes on the functioning and the fate of species and their diversity in microbial communities. However, we do not have this information even for simple communities, let alone the complex communities that we routinely rely on for important biotechnologies in, for example, agriculture and water treatment. Furthermore, it is not clear if we will ever be able to routinely collect such information or choose to deploy the computing power to process it. So if we are to use models to predict critical elements of the behaviour of important communities then must we accept models that are gross-simplifications of the real world but, importantly, capture stable aspects of the system (Cox, 1995). In population genetics, this perspective is deeply ingrained. Thus, tractable models that describe the drift and selection of alleles are built on often overly simplified but convenient assumptions like, random mating, simultaneous birth of each new generation, constant population size and equal numbers of children per parent. When applied to real-world populations, in order for the 'ideal' model predictions to match data it is usually necessary to use an effective population size that is smaller than the census population size. The effective population size has emerged as one of the most important characteristics in predicting within-species allele dynamics and diversification.
In microbial ecology, we have no such convention. We are less inclined to accept such abstractions and more inclined to limit ourselves to first-order descriptors that take no account of any underlying conceptual model, even though we have some consensus (Vellend and Agrawal, 2010).
Here we have demonstrated that by adopting similar effective community sizes, N ec , which are orders of magnitude lower than the census population the dynamics in all three experimental scenarios, neutral, with selection, and totally insular, the dynamics are consistent with an idealized community where all births, replacements and immigration events are independent. Thus, using the effective parameter allows us to make vital predictions on the dynamics of species in the community and the challenge then becomes determining this fundamental characteristic for different consortia and environments. N ec ,we believe, can be deployed as a fundamental measure to help characterize the dynamics of mixed bacterial consortia in all sorts of important environments from wastewater reactors to the human gut or soils and ultimately be as useful as effective population size has been in genetics.