## Introduction

Power law rank–abundance distributions have been known since Pareto first described the relationship between rank and wealth. Many mechanisms give rise to power laws, but they are all essentially of the rich-grow-richer type. The mechanism described in this study is simple, new, and fundamentally different. It is motivated by the ‘kill the winner’ hypothesis for marine phage advanced by Thingstad (2000). In contrast to other power law scenarios, which show a type of stability where the rich stay rich, in this model the dominant phage keeps changing: the mechanism produces a power law rank–abundance distribution in which the population size of any one phage cycles through the ranks. The authors begin by citing some evidence that the marine phage community structure is indeed well approximated by a power law.

### Experimental basis for the model

Phage are viruses that prey on bacteria. Phage are also the most abundant biological entities in the biosphere, with an estimated 10^{31} particles on the planet^{1}. In this study, six distributions (broken stick, exponential, logarithmic, lognormal, niche pre-emption, and power law) were tested for their goodness of fit to marine phage metagenomic data obtained from shotgun sequencing as described in Breitbart *et al.* (2002). The distributions were chosen based on prior use in ecological or mathematical contexts (Magurran, 2003). In each of the two experiments considered, 200 L of water were filtered to yield about 2 × 10^{12} viral particles. The first sample was collected from Mission Bay, the second from near-shore ocean off Scripps Pier, both in San Diego. The viral particles were lysed and a linker-amplified shotgun library was created. A number of clones (1061 and 873, respectively) were sequenced and analyzed for contiguous fragments of sequenced DNA using a criterion of 98% or better identity over at least 20 bp. The analysis consisted of assembling matching pieces into groups called contigs, where a *q*-contig is a collection of *q* fragments that can be assembled by virtue of matching overlaps. The numbers of observed 1-contigs, 2-contigs, … were then used as the basis for community structure inferences. The two communities discussed had [1-contigs, 2-contigs, 3-contigs] given by [1021, 17, 2] and [841, 13, 2], respectively. No higher order contigs were observed.

To determine which distribution best describes the metagenomic data, the numbers of contigs predicted from a modified Lander–Waterman equation (Lander & Waterman, 1988; Breitbart *et al.*, 2002) were compared with the observed numbers of contigs. The error between the predicted and the observed numbers of 1-contigs, 2-contigs, …, 13-contigs was minimized; that is, the 10 observed zeros for 4-contigs through 13-contigs were also counted. The results are shown in Table 1. Such analyses are now publicly available through the PHACCS website (Angly *et al.*, 2005).

Table 1. Differences in the values of the error represent odds ratios of the observations coming from the respective models. The models are listed in order of increasing error for each sample. The exponential and logarithmic models were named according to their analytic form in the rank–abundance relationship, viz. frequency proportional to exp(−*k*·rank) or frequency proportional to 1/log(rank + 1), respectively.

Model | Error, Scripps Pier (SP) | Error, Mission Bay (MB)
---|---|---
Power law | 1.8 | 2.1
Lognormal | 1.9 | 2.3
Logarithmic | 2.5 | 2.8
Broken stick | 11 | 15
Exponential | 12 | 16
Niche pre-emption | 12 | 16

The values of the error can be interpreted as logarithms of odds ratios of the observed contigs occurring in community distributions of the specified forms. Thus, a value of 0.1 for the difference in error corresponds to an odds ratio of exp(0.1) ≈ 11/10 between the two models, meaning that the model with the smaller error is about 10% more likely. In both marine phage communities, the data are best described by a power law distribution [error for Scripps Pier (SP) = 1.8 and Mission Bay (MB) = 2.1], followed by the lognormal distribution (error for SP = 1.9 and MB = 2.3). In contrast, classical niche-based ecological models like broken stick (error for SP = 11 and MB = 15) and niche pre-emption (error for SP = 12 and MB = 16) are very poor fits to the data.
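The conversion from error differences to odds ratios can be checked with a few lines of Python. This is a minimal sketch that takes the Table 1 values at face value and treats a difference in error as a natural-log odds ratio, as described above; the names `ERRORS` and `odds_ratio` are introduced here for illustration only.

```python
import math

# Errors from Table 1 as (Scripps Pier, Mission Bay).
ERRORS = {
    "power law": (1.8, 2.1),
    "lognormal": (1.9, 2.3),
    "logarithmic": (2.5, 2.8),
    "broken stick": (11.0, 15.0),
    "exponential": (12.0, 16.0),
    "niche pre-emption": (12.0, 16.0),
}


def odds_ratio(better_error, worse_error):
    """Odds in favour of the lower-error model, reading the error
    difference as a natural-log odds ratio."""
    return math.exp(worse_error - better_error)


# Compare the power law against every model at each site.
for site, column in (("SP", 0), ("MB", 1)):
    best = ERRORS["power law"][column]
    for model, errs in ERRORS.items():
        ratio = odds_ratio(best, errs[column])
        print(f"{site}: power law vs {model}: {ratio:.3g}")
```

Read this way, the power law is favoured over the lognormal by a factor of only about 1.1 (SP) and 1.2 (MB), whereas the ratios against the niche-based models are several orders of magnitude.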

### The model

The rank–abundance distribution of marine phage communities suggests ecological models of how phage and their microbial hosts interact. The aim of this study is to present the simplest model of this interaction that captures the important qualitative features of the system. The model makes four assumptions that lead to the rank–abundance relation. Three of the assumptions have a general character, while the fourth takes the form of a system of dynamical equations. The first assumption is that different predator–host pairs do not interact. While there is certainly some interaction, it is assumed that such interaction is weak. The second assumption is that there is a strong and specific interaction between a microbial host and its phage predators, which results in the most abundant microorganisms (bacteria and archaea) being killed by their phage predators. This relationship has been termed ‘kill the winner’ and it predicts that specific predator–prey pairs oscillate in time as blooms of a particular microorganism are followed closely by blooms of its phage predator (Thingstad, 2000). The third assumption is that all phage–host pairs follow identical dynamics, but bloom at independent times (i.e. they are randomly distributed *in time* along a common cycle). This hypothesis bears some resemblance to the hypothesis of neutral evolution; it is expected that the qualitative features are captured by replacing the full complexity of the problem by one ‘average’ type. In fact, comparable calculations using a distribution of parameter values can also give a close fit to the power law (Rodriguez-Brito, 2005).

Once the dynamical equations are specified, the three assumptions above imply a definite rank–abundance relationship. Just how this works is illustrated in Fig. 1. Whatever the dynamics, the slope of the least squares fit to the power law line is given by

where *N* is the richness of the community, i.e. the number of populations, and Φ is the phage population size. This slope is the exponent in the power law rank–abundance relationship.
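For orientation, the ordinary least-squares slope of log abundance against log rank can be written in the textbook regression form below. The rank index *r* and the sorted abundances Φ_r (largest first) are notation introduced here as an assumption, and the expression is not necessarily the form used by the authors.

$$
s \;=\; \frac{N\sum_{r=1}^{N} \ln r \,\ln \Phi_r \;-\; \left(\sum_{r=1}^{N} \ln r\right)\left(\sum_{r=1}^{N} \ln \Phi_r\right)}{N\sum_{r=1}^{N} (\ln r)^2 \;-\; \left(\sum_{r=1}^{N} \ln r\right)^2}
$$

Under the third assumption, the *N* populations sit at random phases of one common cycle, so Φ_r is approximately the abundance level that the cycle exceeds for a fraction *r*/*N* of its period. This is why the dynamics alone fix the slope.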

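The simplest dynamical equations to try are the classic Lotka–Volterra (LV) predator–prey dynamics. The equations are:

$$
\frac{dB}{dt} = \alpha B - \beta B \Phi, \qquad \frac{d\Phi}{dt} = -\gamma \Phi + \delta B \Phi \tag{2}
$$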

where *B* denotes the size of the microbial population for a single genotype, Φ denotes the population size of the corresponding phage predator, and where α, β, γ and δ are positive constants that describe microbial growth, microbial death, phage decay and phage production, respectively.

In order to match the observed community structure, blooms must alternate with comparatively long periods of ‘hiding out’ at concentrations several orders of magnitude lower than the bloom concentrations (Wommack *et al.*, 1999; Breitbart *et al.*, 2004; Casas *et al.*, 2006). For LV dynamics, such cycles can be robustly found for a broad range of parameter values using initial states near the origin, i.e. states for which both the phage and the microbial concentrations are very close to zero. Figures 2a–d show the results of adopting LV dynamics for 1000 noninteracting phage–host pairs. Transformation of these data to a log–log plot, shown in Fig. 2d, gives a decent fit to the power law (*R*^{2}=0.83).

The fit can be improved by noting that the high abundance phage are overrepresented on the log–log plot, i.e. too many ranks correspond to high abundance values and destroy the linearity. The dynamics would resemble a power law more closely if the blooms were shorter and more intense. Mathematically this can be accomplished using a generalized version of LV (GLV), in which the predator population is raised to an exponent larger than one (Costello, 1999; Dancso *et al.*, 1991). A GLV model with an exponent of 2 produced an almost perfect fit to the power law distribution; panels (e–h) in Fig. 2 show a fit with *R*^{2}=0.99.
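The comparison between LV and GLV can be explored numerically with a short script. The sketch below rests on several assumptions that are not taken from the paper: the GLV form is assumed to be dB/dt = αB − βBΦ^p, dΦ/dt = Φ^p(δB − γ) with p = 2, all four rate constants are set to 1, the orbit starts at (B, Φ) = (0.1, 0.1), and integration uses a hand-written fixed-step RK4 in log coordinates. The function names are hypothetical, and the script is not expected to reproduce the *R*^{2} values quoted in the text.

```python
import math
import random


def simulate(p, steps=40_000, dt=0.001, start=(0.1, 0.1)):
    """RK4 integration of the assumed model in log coordinates.

    dB/dt = B - B*Phi**p,  dPhi/dt = Phi**p * (B - 1)
    (all rate constants set to 1). p=1 is classic LV, p=2 the assumed GLV.
    """
    def rhs(x, y):  # x = ln B, y = ln Phi
        return (1.0 - math.exp(p * y),
                math.exp((p - 1) * y) * (math.exp(x) - 1.0))

    x, y = math.log(start[0]), math.log(start[1])
    bs, phis = [start[0]], [start[1]]
    for _ in range(steps):
        k1 = rhs(x, y)
        k2 = rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = rhs(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        bs.append(math.exp(x))
        phis.append(math.exp(y))
    return bs, phis


def invariant(b, phi, p):
    """Quantity conserved along exact orbits (p = 1 or p = 2, unit rates)."""
    phage_part = -math.log(phi) if p == 1 else 1.0 / phi
    return b - math.log(b) + phi + phage_part


def one_cycle(bs, phis):
    """Phage values between two successive phage maxima.

    A phage maximum occurs where B falls through its equilibrium value 1.
    """
    peaks = [i for i in range(1, len(bs)) if bs[i - 1] > 1.0 >= bs[i]]
    return phis[peaks[0]:peaks[1]]


def loglog_fit(abundances):
    """Least-squares slope and R^2 of log10(abundance) against log10(rank)."""
    ranked = sorted(abundances, reverse=True)
    xs = [math.log10(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log10(a) for a in ranked]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sxx, sxy ** 2 / (sxx * syy)


def run(p, n_pairs=1000, seed=0):
    """Slope, R^2 and relative invariant drift for exponent p."""
    bs, phis = simulate(p)
    h = [invariant(b, f, p) for b, f in zip(bs, phis)]
    drift = max(abs(v - h[0]) for v in h) / abs(h[0])
    # Equal time steps, so a uniform draw from the cycle is a random phase.
    rng = random.Random(seed)
    cycle = one_cycle(bs, phis)
    sample = [rng.choice(cycle) for _ in range(n_pairs)]
    slope, r2 = loglog_fit(sample)
    return slope, r2, drift


results = {p: run(p) for p in (1, 2)}
for p, (slope, r2, drift) in results.items():
    print(f"p={p}: slope={slope:.2f}, R^2={r2:.3f}, invariant drift={drift:.1e}")
```

Each of the 1000 noninteracting pairs is represented by one random phase on the same cycle, which is the third assumption of the model. The invariant drift is reported as a check on the integrator, since a fixed step can lose accuracy on orbits that are further from equilibrium than the one used here.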

An excellent fit to the power law is observed robustly for a range of parameter values, provided that the orbit used is sufficiently far from the equilibrium point. The parameters of the model as well as the orbit can be determined up to a time scale by the four relatively easily measured parameters *B*_{ave}, Φ_{ave}, *B*_{max} and Φ_{max}. If the time scale for a cycle is assumed to be a year, a reasonable choice in light of seasonal blooms of various microorganisms, the predicted duration of dominance of any one microorganism is of the order of a few hours.

The orbits in (*B*, Φ) space (see Fig. 2f) are of the ‘canard’ type (Gavin *et al.*, 2006; Rotstein *et al.*, 2003) spending most of their time very close to the *B* and Φ axes. The crossing is very rapid and nearly linear. The buildup of the microorganisms is relatively slow and approximately exponential. The subsequent buildup of phage and dropoff of the microorganism population are very rapid. In general, the phage bloom is much shorter than the microorganism bloom. The final phage dropoff is slow with an algebraic relaxation to the minimum population.

The better fit of the GLV over LV might be interpreted biologically as representing the cooperative nature of phage predation in a local spatiotemporal region. The ocean is a gel made up of particles ranging from colloids to marine snow (Alldredge *et al.*, 1986; Azam, 1998; Chin *et al.*, 1998). These particles represent high local concentrations of nutrients, and microorganisms are known to chemotax to them (Blackburn *et al.*, 1998). Phage lysis of microorganisms on particles would create locally high concentrations of both predators and prey (i.e. a local deviation from spatially averaged mass action). The simultaneous release of many phage in the middle of such a microbial colony represents a cooperative effect. The local nature of blooms on a substrate leads to pockets of chain-reaction-type interactions. This is one possible way of achieving an effective cooperativity. Such local chain reactions have historically been modelled with higher order kinetics as a device to mimic the spatial inhomogeneity.

One consequence of adding the cooperativity function to the lysis event (δ*B*Φ^{2}) is that the decay exponent on Φ must match the exponent in the mass action terms to keep the populations oscillating (i.e. to maintain neutral stability of the *orbits*). This feature of the proposed model matches data on phage decay, which show that phage particles display rapid initial decay rates, which then decrease over time (Heldal & Bratbak, 1991; Mathias *et al.*, 1994) and result in very long-term viability of samples.
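The role of the matching exponents can be made explicit with a short calculation. It assumes that the GLV system with exponent 2 takes the form below, which is inferred from the lysis term δ*B*Φ^{2} and the matching decay exponent rather than quoted from the authors; the function *H* is introduced here for illustration.

$$
\frac{dB}{dt} = \alpha B - \beta B \Phi^{2}, \qquad \frac{d\Phi}{dt} = -\gamma \Phi^{2} + \delta B \Phi^{2} = \Phi^{2}\,(\delta B - \gamma)
$$

$$
H(B,\Phi) = \delta B - \gamma \ln B + \beta \Phi + \frac{\alpha}{\Phi}
$$

$$
\frac{dH}{dt} = \left(\delta - \frac{\gamma}{B}\right)\frac{dB}{dt} + \left(\beta - \frac{\alpha}{\Phi^{2}}\right)\frac{d\Phi}{dt} = (\delta B - \gamma)(\alpha - \beta\Phi^{2}) + (\beta\Phi^{2} - \alpha)(\delta B - \gamma) = 0
$$

Under this assumption *H* is constant along every orbit, so the orbits are closed level curves and neutrally stable. The cancellation relies on the common factor Φ^{2} in dΦ/dt; with a linear decay term that factorization is lost and this construction no longer applies. The same form also gives dΦ/dt ≈ −γΦ^{2} when *B* is small, which is consistent with the slow algebraic relaxation of the phage population noted earlier.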

In the foregoing analysis, the goal was to reproduce a power law rank–abundance distribution using a simple model. However, the data in Table 1 might allow a lognormal fit instead. After all, the odds ratio represented by the differences of the errors makes a power law only about 10% more likely. The model with the first three assumptions above can also reproduce lognormal distributions, albeit with different dynamical equations in place of equation (2) or (3) in assumption four. With any dynamical equations, the model connects community structure and community dynamics and provides a tool for building more elaborate models. The neutral stability of the dynamics in equation (2) or (3) is not a necessary feature. Limit cycles traversing approximately the same orbit would make similar predictions and represent viable alternatives to the proposed model, which gives up structural stability in exchange for simplicity. Fitting the model parameters α, β, γ and δ to a particular biome can be carried out with minimal information: total microbial and phage concentrations, most abundant microbial and phage concentrations, and the time between blooms of the same phage–host pair.