Power law rank–abundance models for marine phage communities


  • Editor: Wolfgang Schumann

Correspondence: Peter Salamon, Department of Mathematics and Statistics, San Diego State University, San Diego, CA 92182-7720, USA. Tel.: +1 619 594 7204; fax: +1 619 594 6746; e-mail: salamon@sdsu.edu


Metagenomic analyses suggest that the rank–abundance curve for marine phage communities follows a power law distribution. A new type of power law dependence based on a simple model in which a modified version of Lotka–Volterra predator–prey dynamics is sampled uniformly in time is presented. Biologically, the model embodies a kill the winner hypothesis and a neutral evolution hypothesis. The model can match observed power law distributions and uses very few parameters that are readily identifiable and characterize phage ecosystems. The model makes new untested predictions: (1) it is unlikely that the most abundant phage genotype will be the same at different time points and (2) the long-term decay of isolated phage populations follows a power law.


Power law rank–abundance distributions have been known since Pareto first described the relationship between rank and wealth. Many mechanisms give rise to power laws, but essentially all of them are of the rich-grow-richer type. The mechanism described in this study is simple, new, and fundamentally different. It is motivated by the kill the winner hypothesis for marine phage advanced by Thingstad (2000). In contrast to other power law scenarios, which show a type of stability where the rich stay rich, in this model the dominant phage keeps changing. This paper advances a mechanism that can produce a power law rank–abundance distribution in which the population size of any one phage cycles through the ranks. It begins with evidence that the marine phage community structure is indeed well approximated by a power law.

Experimental basis for the model

Phage are viruses that prey on bacteria. They are also the most abundant biological entities in the biosphere, with an estimated 10³¹ particles on the planet¹. In this study, six distributions (broken stick, exponential, logarithmic, lognormal, niche pre-emption, and power law) were tested for their goodness of fit to marine phage metagenomic data obtained from shotgun sequencing as described in Breitbart et al. (2002). The distributions were chosen based on prior use in ecological or mathematical contexts (Magurran, 2003). In each of the two experiments considered, 200 L of water was filtered to yield about 2 × 10¹² viral particles. The first sample was collected from Mission Bay, the second from near-shore ocean off Scripps Pier, both in San Diego. The viral particles were lysed and a linker-amplified shotgun library was created. A number of clones (1061 and 873, respectively) were sequenced and analyzed for contiguous fragments of sequenced DNA using a criterion of 98% or better identity over at least 20 bp. The analysis consisted of assembling matching pieces into groups called contigs, where a q-contig is a collection of q fragments that can be assembled by virtue of matching overlaps. The numbers of observed 1-contigs, 2-contigs, … were then used as the basis for community structure inferences. The two communities discussed had [1-contigs, 2-contigs, 3-contigs] given by [1021, 17, 2] and [841, 13, 2], respectively. No higher order contigs were observed.
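As a quick consistency check on the counts above, each q-contig accounts for q sequenced fragments, so the contig spectrum should add up to the number of clones sequenced. The sketch below assumes that every sequenced clone ends up in exactly one contig; the function name is illustrative only.

```python
def clones_in_spectrum(spectrum):
    """Total fragments implied by a contig spectrum.

    spectrum[0] is the number of 1-contigs, spectrum[1] the number of
    2-contigs, and so on; a q-contig contains q fragments.
    """
    return sum(q * count for q, count in enumerate(spectrum, start=1))


# Contig spectra and clone counts quoted in the text.
samples = {
    "first sample": ([1021, 17, 2], 1061),
    "second sample": ([841, 13, 2], 873),
}

for name, (spectrum, clones) in samples.items():
    total = clones_in_spectrum(spectrum)
    print(name, total, "matches" if total == clones else "does not match")
```

Both spectra reproduce the reported clone counts, which supports reading the bracketed triples as complete contig spectra.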

To determine which distribution best describes the metagenomic data, the numbers of contigs predicted from a modified Lander–Waterman equation (Lander & Waterman, 1988; Breitbart et al., 2002) were compared with the observed numbers of contigs. The error between the predicted and the observed numbers of 1-contigs, 2-contigs, …, 13-contigs was minimized, i.e. 10 observed zeros for 4-contigs and higher were also counted. The results are shown in Table 1. Such analyses are now publicly available using the PHACCS website (Angly et al., 2005).

Table 1.   Goodness of fit (error) for six different models to the observed number of contigs from two marine phage metagenomes

Model                Scripps Pier (SP)    Mission Bay (MB)
Power law            1.8                  2.1
Broken stick         11                   15
Niche pre-emption    12                   16

Note: Differences in the values of the error represent logarithms of odds ratios of the observations coming from the respective models. The models are listed in order of increasing error for each sample. The exponential and logarithmic models were named according to their analytic form in the rank–abundance relationship, viz. frequency proportional to exp(−k × rank) or frequency proportional to 1/log(rank + 1), respectively.

The values of the error can be interpreted as logarithms of odds ratios of the observed contigs occurring in community distributions of the specified forms. Thus, a value of 0.1 for the difference in error corresponds to an odds ratio of exp(0.1)≈11/10 between the two models meaning that the model with the smaller error is about 10% more likely. In both marine phage communities, the data are best described by a power law distribution [error for Scripps Pier (SP)=1.8 and Mission Bay (MB)=2.1], followed by the lognormal distribution (error for SP=1.9 and MB=2.3). In contrast, classical niche-based ecological models like broken stick (error for SP=11 and MB=15) and niche pre-emption (error for SP=12 and MB=16) are very poor fits to the data.
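The conversion from error differences to odds ratios can be sketched as follows. The error values are those quoted in the text; the helper name `odds_ratio` is hypothetical, and the only assumption is the stated interpretation of the error as a logarithm of odds.

```python
import math

# Errors quoted in the text as (Scripps Pier, Mission Bay).
errors = {
    "power law": (1.8, 2.1),
    "lognormal": (1.9, 2.3),
    "broken stick": (11, 15),
    "niche pre-emption": (12, 16),
}


def odds_ratio(err_a, err_b):
    """Odds of model A relative to model B, treating errors as log-odds."""
    return math.exp(err_b - err_a)


for site, idx in (("SP", 0), ("MB", 1)):
    best = errors["power law"][idx]
    for name, errs in errors.items():
        ratio = odds_ratio(best, errs[idx])
        print(f"{site}: power law vs {name}: odds ratio {ratio:.3g}")
```

On this reading, the power law and lognormal models are nearly indistinguishable, while the niche-based models are disfavoured by several orders of magnitude.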

The model

The rank–abundance distribution of marine phage communities suggests ecological models of how phage and their microbial hosts interact. The aim of this study is to present the simplest model of this interaction that captures the important qualitative features of the system. The model makes four assumptions that lead to the rank–abundance relation. Three of the assumptions have a general character, while the fourth takes the form of a system of dynamical equations. The first assumption is that different predator–host pairs do not interact. While there is certainly some interaction, it is assumed to be weak. The second assumption is that there is a strong and specific interaction between a microbial host and its phage predators, which results in the most abundant microorganism (bacteria and archaea) being killed by its phage predators. This relationship has been termed ‘kill the winner’ and it predicts that specific predator–prey pairs oscillate in time as blooms of a particular microorganism are followed closely by blooms of its phage predator (Thingstad, 2000). The third assumption is that all phage–host pairs follow identical dynamics, but bloom at independent times (i.e. they are randomly distributed in time along a common cycle). This hypothesis bears some resemblance to the hypothesis of neutral evolution; it is expected that the qualitative features are captured by replacing the full complexity of the problem by one ‘average’ type. In fact, comparable calculations using a distribution of parameter values can also give a close fit to the power law (Rodriguez-Brito, 2005).

Once the dynamical equations are specified, the three assumptions above imply a definite rank–abundance relationship. Just how this works is illustrated in Fig. 1. Whatever the dynamics, the slope of the least squares fit to the power law line is given by


where N is the richness of the community, i.e. the number of populations, and Φ is the phage population size. This slope is the exponent in the power law rank–abundance relationship.
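For orientation, the slope of an ordinary least-squares line through the points (ln r, ln Φ_r) can be written in the standard textbook form below. This is a sketch, not necessarily the form the expression takes in this paper: it assumes the N sampled phage populations are indexed by rank r = 1, …, N as Φ_1 ≥ Φ_2 ≥ … ≥ Φ_N, and the symbols s and Φ_r are introduced here only for illustration.

```latex
\[
s \;=\;
\frac{N \sum_{r=1}^{N} \ln r \,\ln \Phi_r
      \;-\; \Bigl(\sum_{r=1}^{N} \ln r\Bigr)\Bigl(\sum_{r=1}^{N} \ln \Phi_r\Bigr)}
     {N \sum_{r=1}^{N} (\ln r)^{2}
      \;-\; \Bigl(\sum_{r=1}^{N} \ln r\Bigr)^{2}}
\]
```

Because ln r increases with rank while ln Φ_r does not, s is negative whenever the abundances are not all equal.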

Figure 1.

 (a) The population of microorganisms (blue dashed) and phage (red solid) as a function of time along with five points sampled at integer times (red circles). All populations and time are in arbitrary units. (b) The relation between the sizes of the bacterial and phage populations (blue solid) showing the points where the phage population is sampled (red circles). (c) The sampled values for phage as a function of rank. (d) The log–log version of part (c). R² is the coefficient of determination.

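The simplest dynamical equations to try are the classic Lotka–Volterra (LV) predator–prey dynamics. The equations are:

\[
\frac{dB}{dt} = \alpha B - \beta B \Phi, \qquad \frac{d\Phi}{dt} = -\gamma \Phi + \delta B \Phi \tag{2}
\]
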
where B denotes the size of the microbial population for a single genotype, Φ denotes the population size of the corresponding phage predator, and where α, β, γ and δ are positive constants that describe microbial growth, microbial death, phage decay and phage production, respectively.

In order to match the observed community structure, blooms must alternate with comparatively long periods of ‘hiding out’ at concentrations several orders of magnitude lower than the bloom concentrations (Wommack et al., 1999; Breitbart et al., 2004; Casas et al., 2006). For LV dynamics, such cycles can be robustly found for a broad range of parameter values using initial states near the origin, i.e. states for which both the phage and the microbial concentrations are very close to zero. Figures 2a–d show the results of adopting LV dynamics for 1000 noninteracting phage–host pairs. Transformation of these data to a log–log plot, shown in Fig. 2d, gives a decent fit to the power law (R²=0.83).

Figure 2.

 Comparison of phage population structure predicted by classical LV (a–d) and the proposed GLV (e–h) dynamics. All population sizes and time are in arbitrary units. (a) Populations of microorganisms (dashed blue line) and phage (solid red line) as a function of time. Red circles show 1000 random samples along the cycle. (b) The relationship between the sizes of the microbial and phage populations (solid blue line). Red circles show the same 1000 random samples along the cycle. (c) Rank–abundance plot of the 1000 random phage samples. (d) Log–log version of (c). The solid blue line is the least-squares fit to the ranked phage populations (red asterisks). (e–h) show the same relationships for the GLV dynamics. R² is the coefficient of determination.

The fit can be improved by noting that the high abundance phage are overrepresented on the log–log plot, i.e. too many ranks correspond to high abundance values and destroy the linearity. The dynamics would resemble a power law more closely if the blooms were shorter and more intense. Mathematically this can be accomplished using a generalized version of LV (GLV), where an exponent larger than one is added to the predator populations (Costello, 1999; Dancso et al., 1991). A GLV model with an exponent of 2 produced an almost perfect fit to the power law distribution; panels (e–h) in Fig. 2 show a fit with R²=0.99. With this exponent the equations become:

\[
\frac{dB}{dt} = \alpha B - \beta B \Phi^{2}, \qquad \frac{d\Phi}{dt} = -\gamma \Phi^{2} + \delta B \Phi^{2} \tag{3}
\]


An excellent fit to the power law is observed robustly for a range of parameter values, provided the orbit is sufficiently far from the equilibrium point. The parameters of the model as well as the orbit can be determined up to a time scale by four relatively easily measured quantities: B_ave, Φ_ave, B_max and Φ_max. If the time scale for a cycle is assumed to be a year, a reasonable choice in light of seasonal blooms of various microorganisms, the predicted period of dominance of any one microorganism is of the order of a few hours.
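The procedure of integrating one cycle, sampling it uniformly in time and fitting a line on the log–log rank–abundance plot can be sketched as below. This is a minimal illustration under stated assumptions, not the computation behind Fig. 2: the parameter values (α = β = γ = δ = 1), the starting phage minima, the step sizes and all function names are chosen here for convenience. It also assumes the phage exponent p appears in both mass-action terms and in the decay term, as the matching-exponent requirement in the text suggests. The equations are integrated in logarithmic variables with a fixed-step Runge–Kutta scheme, and a quantity conserved along exact orbits is provided as an accuracy check.

```python
import math
import random


def glv_cycle(p, phi_min, dt, alpha=1.0, beta=1.0, gamma=1.0, delta=1.0,
              max_steps=2_000_000):
    """Integrate one cycle of dB/dt = B(alpha - beta*Phi^p),
    dPhi/dt = Phi^p (delta*B - gamma) with RK4 in x = ln B, y = ln Phi.

    p = 1 is classical LV; p = 2 is the quadratic variant (assumed form).
    Returns (B, Phi) states at equally spaced times over one cycle.
    """
    def rhs(x, y):
        return (alpha - beta * math.exp(p * y),
                math.exp((p - 1) * y) * (delta * math.exp(x) - gamma))

    x0 = math.log(gamma / delta)       # B = gamma/delta, where dPhi/dt = 0
    x, y = x0, math.log(phi_min)       # start at the phage minimum
    orbit = [(math.exp(x), math.exp(y))]
    been_below = False
    for _ in range(max_steps):
        k1 = rhs(x, y)
        k2 = rhs(x + 0.5 * dt * k1[0], y + 0.5 * dt * k1[1])
        k3 = rhs(x + 0.5 * dt * k2[0], y + 0.5 * dt * k2[1])
        k4 = rhs(x + dt * k3[0], y + dt * k3[1])
        x += dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
        y += dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
        if x < x0:
            been_below = True
        elif been_below:               # B has risen back through gamma/delta
            return orbit
        orbit.append((math.exp(x), math.exp(y)))
    raise RuntimeError("cycle did not close; increase max_steps")


def invariant(B, Phi, p, alpha=1.0, beta=1.0, gamma=1.0, delta=1.0):
    """Quantity conserved along exact orbits; drift measures numerical error."""
    if p == 1:
        phage_part = -alpha * math.log(Phi)
    else:
        phage_part = alpha * Phi ** (1 - p) / (p - 1)
    return delta * B - gamma * math.log(B) + beta * Phi + phage_part


def loglog_fit(abundances):
    """Least-squares line through (ln rank, ln abundance): (slope, R^2)."""
    ranked = sorted(abundances, reverse=True)
    xs = [math.log(r) for r in range(1, len(ranked) + 1)]
    ys = [math.log(a) for a in ranked]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((u - mx) ** 2 for u in xs)
    syy = sum((v - my) ** 2 for v in ys)
    sxy = sum((u - mx) * (v - my) for u, v in zip(xs, ys))
    return sxy / sxx, sxy * sxy / (sxx * syy)


def rank_abundance_fit(p, phi_min, dt, n_pairs=1000, seed=0):
    """Sample n_pairs phage populations uniformly in time along one cycle."""
    orbit = glv_cycle(p, phi_min, dt)
    rng = random.Random(seed)
    phage = [rng.choice(orbit)[1] for _ in range(n_pairs)]
    return loglog_fit(phage)


for label, p, phi_min, dt in (("LV", 1, 0.01, 1e-3), ("GLV", 2, 0.1, 5e-4)):
    slope, r2 = rank_abundance_fit(p, phi_min, dt)
    print(f"{label}: slope = {slope:.2f}, R^2 = {r2:.3f}")
```

The printed slopes and R² values depend on the orbit chosen and are not expected to reproduce the values reported for Fig. 2. Moving `phi_min` further from the equilibrium value gives more extreme cycles, at the cost of a smaller step size `dt`.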

The orbits in (B, Φ) space (see Fig. 2f) are of the ‘canard’ type (Gavin et al., 2006; Rotstein et al., 2003) spending most of their time very close to the B and Φ axes. The crossing is very rapid and nearly linear. The buildup of the microorganisms is relatively slow and approximately exponential. The subsequent buildup of phage and dropoff of the microorganism population are very rapid. In general, the phage bloom is much shorter than the microorganism bloom. The final phage dropoff is slow with an algebraic relaxation to the minimum population.

The better fit of the GLV over LV might be interpreted biologically as representing the cooperative nature of phage predation in a local spatial–temporal region. The ocean is a gel made up of particles ranging from colloids to marine snow (Alldredge et al., 1986; Azam, 1998; Chin et al., 1998). These particles represent high local concentrations of nutrients and microorganisms are known to chemotax to these particles (Blackburn et al., 1998). Phage lysis of microorganisms on particles would create locally high concentrations of both predators and prey (i.e. a local deviation from spatially averaged mass action). The simultaneous release of many phage in the middle of such a microbial colony represents a cooperative effect. The local nature of blooms on a substrate leads to pockets of chain-reaction-type interactions. This is one possible way of achieving an effective cooperativity. Such local chain reactions have been historically modelled with higher order kinetics as a device to mimic the spatial inhomogeneity.

One consequence of adding the cooperativity function to the lysis event (δBΦ²) is that the decay exponent on Φ must match the exponent in the mass action terms to keep the populations oscillating (i.e. to maintain neutral stability of the orbits). This feature of the proposed model matches data on phage decay, which show that phage particles display rapid initial decay rates, which then decrease over time (Heldal & Bratbak, 1991; Mathias et al., 1994) and result in very long-term viability of samples.
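The link between a matched decay exponent and long-term viability can be made explicit with a short calculation. The sketch assumes an isolated phage population with no host present (B = 0) and a decay term of the form −γΦ², i.e. with the same exponent as in δBΦ². The initial population Φ₀ is notation introduced here.

```latex
\[
\frac{d\Phi}{dt} = -\gamma \Phi^{2}
\quad\Longrightarrow\quad
\Phi(t) = \frac{\Phi_0}{1 + \gamma \Phi_0 t}
\;\approx\; \frac{1}{\gamma t}
\quad \text{for } t \gg \frac{1}{\gamma \Phi_0}.
\]
```

The initial loss rate γΦ₀² is large when Φ₀ is large, but the population then declines only as 1/t, which is a power law. A linear decay term −γΦ would instead give the exponential Φ₀e^{−γt}, which falls off far faster at long times.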

In the foregoing analysis, the goal was to reproduce a power law rank–abundance distribution using a simple model. However, the data in Table 1 might allow a lognormal fit instead. After all, the odds ratio represented by the differences of the errors makes a power law only about 10% more likely. The model with the first three assumptions above can also reproduce lognormal distributions, albeit with assumption four replaced by different dynamical equations instead of equation (2) or (3). With any dynamical equations, the model connects community structure and community dynamics and provides a tool for building more elaborate models. The neutral stability of the dynamics in equation (2) or (3) is not a necessary feature. Limit cycles traversing approximately the same orbit would make similar predictions and represent viable alternatives to the proposed model, which trades simplicity for structural stability. Fitting the model parameters α, β, γ and δ to a particular biome can be carried out with minimal information: total microbial and phage concentrations, most abundant microbial and phage concentrations, and the time between blooms of the same phage–host pair.


The proposed model predicts that typical phage–host cycles involve long periods of microorganisms hiding from phage predators at very low numbers alternating with brief spurts of dominance. A given phage then only becomes abundant following a bloom of its corresponding microbial prey. Therefore, it is unlikely that the most abundant phage genotype will be the same at different time points separated by more than a few hours and there should be several orders of magnitude difference in the numbers of a specific phage present in different seasons. It also predicts that phage particles display rapid initial decay rates, which decrease over time and result in very long-term viability of samples. In addition to these predictions, which match empirical data, the proposed model represents a mechanism for how a system can display a new turnover type of power law in which the phage–host pairs keep cycling through the ranks.


  1. This estimate is based on the estimated number of prokaryotes on the planet (Whitman et al., 1998).


Arlette Baljon, Avinoam Rabinovitch and members of the MathPhage group at SDSU are thanked for helpful conversations. This work was supported by grant DEB-BE 04-21955 from the National Science Foundation. M.B. was supported by an EPA STAR Fellowship.