A fully stochastic approach bridging the microscopic behavior of individual microorganisms with macroscopic ensemble dynamics in surface flow networks
Environmental Engineering and Science Program, College of Engineering and Applied Science, University of Cincinnati, Cincinnati, Ohio, USA
Corresponding author: L. Yeghiazarian, Environmental Engineering and Science Program, College of Engineering and Applied Science, University of Cincinnati, 746 ERC, 2901 Woodside Dr., Cincinnati, OH 45221, USA. (firstname.lastname@example.org)
 Prediction of microbial surface water contamination is a formidable task because of the inherent randomness of environmental processes driving microbial fate and transport. In this article, we develop a theoretical framework of a fully stochastic model of microbial transport in watersheds, and apply the theory to a simple flow network to demonstrate its use. The framework bridges the gap between microscopic behavior of individual microorganisms and macroscopic ensemble dynamics. This scaling is accomplished within a single mathematical framework, where each microorganism behaves according to a continuous-time discrete-space Markov process, and the Markov behavior of individual microbes gives rise to a nonhomogeneous Poisson random field that describes microbial population dynamics. Mean value functions are derived, and the spatial and temporal distribution of water contamination risk is computed in a straightforward manner.
 Fecal contamination is the leading cause of surface water impairment in the US [USEPA, 2008], and fecal pathogens are capable of triggering massive outbreaks of gastrointestinal disease. The most notable outbreak took place in Milwaukee, WI in 1993; it affected over 400,000 people, contributed to more than 100 deaths and was not predicted by water resources management [MacKenzie et al., 1994]. It is well known that the difficulty in prediction of water contamination has its roots in the stochastic variability of fecal pathogens in the environment, and in the complexity of environmental systems [e.g., Bertuzzo et al., 2007; Tyrrel and Quinton, 2003]. The environmental parameters that have a high degree of influence on surface water flow and contamination include rainfall, hydraulic conductivity, soil composition, surface roughness, slope, and vegetation among others. The variability of some of them seems to evolve randomly and cannot be described deterministically at the current level of knowledge, even if the underlying phenomena were thoroughly understood [Kottegoda, 1980]. Instead, posing them in a probabilistic framework allows for a quantitative, rational treatment [Clarke, 1998; Freeze, 1980].
 The complexity of environmental systems and our lack of knowledge about the physical, biological, and chemical aspects of microbial fate and transport are yet another significant barrier to accurate prediction of water contamination events. For instance, general trends in microbial transport can be captured using standard hydrodynamic models that do not incorporate specifics of microbial behavior. This approach, however, leaves dynamic processes during and after storm events unexplained [Cho et al., 2010; Hellweger and Masopust, 2008]. This is problematic because many water contamination events and associated waterborne disease outbreaks take place after storms [Curriero et al., 2001]. For example, a specific aspect of storm and poststorm events that has recently been addressed by several groups is microbial-soil sediment interactions in general, and deposition and resuspension of microbes from bottom sediments in particular. It appears that streambeds serve as a significant, albeit previously unaccounted-for, source of microbes in surface waters, particularly during and after storm events, and inclusion of these sources, along with resuspension and deposition of microbes during transport, improves model predictions [see, e.g., Cho et al., 2010; Jamieson et al., 2005; Kim et al., 2010; Pandey et al., 2012; Yeghiazarian et al., 2006, among others]. This view necessitates the inclusion of sediment dynamics in microbial transport models. The overall conclusion, therefore, is that in order to accurately predict microbial surface water contamination events, physics-based stochastic models are needed.
 Most models of microbial transport in surface water, however, including those developed by regulatory agencies, do not necessarily account for many of the characteristics discussed above [Collins and Rutherford, 2004; Ferguson et al., 2005; Li and Duffy, 2011; McElroy, 1976; Medema and Schijven, 2001; Steets and Holden, 2003; Tian et al., 2002; Walker and Stedinger, 1999; Arnold et al.; Bicknell et al., 1997; USEPA, 2007]. Nonetheless, they are used ad hoc to determine watershed management strategies and specific measures for prevention of water contamination. It is becoming apparent that these approaches are not adequate, as is evidenced by multiple unpredicted outbreaks of waterborne disease [Curriero et al., 2001].
 In this article, we build upon our previous work to develop a framework for a fully stochastic, physics-based model for microbial transport in watersheds [Yeghiazarian et al., 2006, 2009]. The model bridges the gap between the microscopic behavior of individual microorganisms, including their interactions with soil sediments, and macroscopic ensemble dynamics. This scaling is accomplished within a single mathematical framework, in which individual behavior of particles gives rise to the macroscopic behavior of very large numbers of microbes. More specifically, each microorganism behaves according to a continuous-time discrete-space Markov process, and the Markov behavior of individual microbes collectively translates into a Poisson random field that describes microbial population dynamics [Yeghiazarian et al., 2006, 2009]. Sediment transport and interactions with microbes are integrated through coupling with the physics-based Water Erosion Prediction Project (WEPP) model [Nearing et al., 1989]. Model details are presented in section 2.
 This framework, in which the spatiotemporal distribution of an ensemble of microorganisms, each of which follows Markovian dynamics, is represented by a nonhomogeneous Poisson random field, has been developed for microbial transport on individual hillslopes [Yeghiazarian et al., 2009]. The challenge is then to expand the theory to watershed-scale transport processes. This is the objective of our current work. In this article, we develop the theoretical basis of a fully stochastic model of microbial transport in watersheds, and apply the theory to a simple flow network to demonstrate its use.
 The model parameters reflect the characteristics of Cryptosporidium parvum oocysts, a waterborne pathogen behind several notable outbreaks of waterborne disease and a subject of many studies [see, e.g., Atherholt et al., 1998; Atwill et al., 2006; Dai and Boll, 2006; Dorner et al., 2006; Searcy et al., 2006]. Cryptosporidium, like many other microbes and chemicals, is heterogeneously distributed in the environment; its transport is driven by random events; it has a low infectious dose; and it is expensive to monitor [Curriero et al., 2001; Gale, 1998; Nnane et al., 2012; Richardson et al., 1991; Yeghiazarian et al., 2009]. For these reasons, the decision-making process aiming to reduce human exposure to this and many other pathogens and contaminants must ultimately rest with mathematical models capable of capturing their occurrence and transport. This is where we see the utility of the modeling framework developed here.
 The article is organized as follows: Section 2 consists of three parts. Section 2.1 describes how Geographic Information System (GIS)-based node and link schematic networks can be used to represent flows in watersheds. Section 2.2 describes the theoretical development expanding the fully stochastic microbial transport model from the scale of individual hillslopes to watersheds. Section 2.3 describes risk analysis that can be conducted using the theory developed in section 2.2. Section 3 summarizes the results and section 4 concludes the paper.
2. Spatial and Temporal Dynamics of Microorganism Ensembles on a Stream Network
2.1. Schematic Networks
 Watersheds can be modeled within GIS as a schematic network, a graphical representation of connectivity between features of the landscape in the form of a network of nodes and links [Whiteaker et al., 2006]. Each node represents a given feature such as a subwatershed; links symbolize streams, overland flow, channels or slopes. In other words, links embody microbial transport routes. Figure 1 shows a simple schematic network symbolizing flow in three links. The assumptions are: (1) each link i has a constant flow velocity ui; (2) flow is from top to bottom of the network (from nodes with smaller indices toward nodes with larger indices). Denote by Li the length of link i, by xNi the linear coordinate of node i, and by LObs the distance between xN3 and some observation point x on link 3.
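The node-and-link representation can be sketched in a few lines of Python. This is only an illustration of the data structure, not the GIS implementation of Whiteaker et al.: the class name `Link`, the dictionary `NETWORK`, the helper `travel_time_h`, and all three link lengths are hypothetical. Only the flow velocities are taken from Table 1. The helper returns the time that water, or a microbe that stays mobile, needs to travel from the top of an upstream link to the observation point on link 3 under the constant-velocity assumption.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Link:
    length_m: float            # link length L_i (hypothetical value)
    velocity_m_per_h: float    # constant flow velocity u_i
    downstream: Optional[str]  # key of the receiving link, None for the outlet link


# Three-link network of Figure 1: links 1 and 2 both drain into link 3.
# Velocities follow Table 1; the lengths are placeholders chosen for illustration.
NETWORK: Dict[str, Link] = {
    "link1": Link(length_m=100.0, velocity_m_per_h=205.1, downstream="link3"),
    "link2": Link(length_m=150.0, velocity_m_per_h=211.21, downstream="link3"),
    "link3": Link(length_m=200.0, velocity_m_per_h=211.21, downstream=None),
}


def travel_time_h(network, start, l_obs_m):
    """Hours needed to go from the top of `start` to the observation point
    located l_obs_m metres down the outlet link."""
    link = network[start]
    elapsed = 0.0
    while link.downstream is not None:  # traverse each upstream link in full
        elapsed += link.length_m / link.velocity_m_per_h
        link = network[link.downstream]
    return elapsed + l_obs_m / link.velocity_m_per_h


for name in ("link1", "link2"):
    print(name, round(travel_time_h(NETWORK, name, l_obs_m=50.0), 3), "h")
```

Storing only the downstream neighbor of each link is enough for a dendritic network, because every link has exactly one receiving link.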
2.2. The Theory
 Each microorganism within the ensemble behaves according to a continuous-time discrete-state Markov process [Yeghiazarian et al., 2006]. The five allowed states reflect microbial interactions with soil particles, as well as settling and resuspension: (1) free (unattached) and immobile on the soil surface, (2) attached to immobile sediment on the soil surface, (3) attached to suspended sediment, (4) unattached and freely moving with flowing water, and (5) physiologically nonviable or lost from the system due to biological decay or infiltration (Figure 2). Transitions between states j and k are described in terms of transition rates gjk, where j = 1, …, 4 and k = 1, …, 5. Transition rates are critical because they are the constitutive relations that complete the model. They are determined based on the physics that drive transitions between states. For instance, transition rates from state 1 to 4, and from 2 to 3 are functions of rainfall intensity and the shear stress of the flow, which determine detachment and entrainment of particles from the soil surface into the flow. Detailed definitions of transition rates are shown in Table 1.
Table 1. Values for gij and ui are obtained from [Yeghiazarian et al., 2006]. All other values are assumed.

| Parameter | Description | Value |
| g21 | Rate of immobile sediment-attached microbial aggregate breakdown | |
| g41 | Rate of free microbe settlement onto soil surface from overland flow | |
| g12 | Rate of immobile microbe attachment to soil particles | |
| g32 | Rate of mobile sediment-attached microbial aggregate settlement onto soil surface from overland flow | |
| g42 | Rate of mobile microbe attachment to immobile sediment | |
| g23 | Rate of immobile sediment-attached microbial aggregate removal from soil surface due to rainfall impact and flow entrainment | |
| g43 | Rate of mobile microbe attachment to suspended sediment | 1.15 × 10^13 h^-1 |
| g14 | Rate of immobile microbe removal from soil surface due to rainfall impact and flow entrainment | |
| g24 | Rate of microbe release from immobile aggregate into overland flow | |
| g34 | Rate of mobile sediment-attached microbial aggregate bond breakdown | |
| gi5, i = 1, 2 | Rate of microbe biological decay | 3.48 × 10^-6 h^-1 |
| gi5, i = 3, 4 | Rate of microbe removal from system due to biological decay and infiltration | |
| µ | Mean of the underlying normal distribution for intensity Λ | 17 m^-2 kg^-1 |
| σ | Standard deviation of the underlying normal distribution for intensity Λ | |
| u1 | Flow velocity on link 1 | 205.1 m h^-1 |
| u2 | Flow velocity on link 2 | 211.21 m h^-1 |
| u3 | Flow velocity on link 3 | 211.21 m h^-1 |
| L1 | Length of link 1 | |
| L2 | Length of link 2 | |
| L3 | Length of link 3 | |
| LObs | Length between xN3 and x | |
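The five-state process can be made concrete by simulating the state history of a single microorganism. The sketch below uses the standard Gillespie scheme (exponential holding times, next state chosen in proportion to the rates) and is not taken from the original papers. The names `RATES`, `exit_rate`, and `simulate_path` are hypothetical, and the numerical rates are arbitrary placeholders rather than Table 1 values. Only the set of allowed transitions follows the definitions in Table 1.

```python
import random

# Hypothetical transition rates g_jk in h^-1 (placeholders, NOT Table 1 values).
# Keys are (j, k) pairs; only transitions listed in Table 1 are included.
RATES = {
    (1, 2): 0.5, (1, 4): 2.0, (1, 5): 0.01,
    (2, 1): 0.3, (2, 3): 1.0, (2, 4): 0.2, (2, 5): 0.01,
    (3, 2): 0.4, (3, 4): 0.6, (3, 5): 0.05,
    (4, 1): 0.3, (4, 2): 0.2, (4, 3): 0.8, (4, 5): 0.05,
}


def exit_rate(state):
    """Total rate of leaving `state`; zero for the absorbing state 5."""
    return sum(rate for (j, _), rate in RATES.items() if j == state)


def simulate_path(start_state, horizon_h, rng):
    """Return [(time_h, state), ...] for one microbe up to horizon_h."""
    t, state = 0.0, start_state
    path = [(t, state)]
    while True:
        total = exit_rate(state)
        if total == 0.0:                 # absorbed in state 5
            break
        t += rng.expovariate(total)      # exponential holding time
        if t >= horizon_h:
            break
        targets = [(k, r) for (j, k), r in RATES.items() if j == state]
        state = rng.choices([k for k, _ in targets],
                            weights=[r for _, r in targets])[0]
        path.append((t, state))
    return path


for time_h, state in simulate_path(1, horizon_h=5.0, rng=random.Random(0)):
    print(f"t = {time_h:.3f} h -> state {state}")
```

In a physics-based application the placeholder constants would be replaced by rates computed from rainfall intensity, shear stress, and sediment properties, as described in the text.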
 Let us denote N(t, x) the number of microorganisms per soil mass per unit width, at arbitrary time t, on a strip of land between x0i and x, i = 1, 2. N(t, x) represents the spatiotemporal distribution of microorganisms on the network, and is a random field driven by Markovian dynamics of individual microorganisms in the ensemble.
 At time zero, the total number of microorganisms present on link 1 or link 2 is a Poisson random variable, and the microorganisms are assumed to be uniformly distributed between x0i, i = 1, 2, and xN3 (Figure 1).
 For every t > 0, N(t, x) is a nonhomogeneous Poisson process, which means that at any time t, N(t, x) is Poisson distributed with mean mt(x), also known as the mean value function of the process. The mean value function for individual links was formulated in our previous work [Yeghiazarian et al., 2009]:
where Λ is the Poisson intensity, or the rate of the Poisson process, and p(t, x, x0) is the probability that a microorganism found in state 1 at x = x0 and t = 0, will be within (x0, x) by time t.
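One way to see why the count stays Poisson for t > 0 is the standard thinning property: if an initial Poisson number of microbes is followed and each microbe independently remains in the interval of interest with some probability, the remaining count is again Poisson, with the initial mean multiplied by that probability. The Monte Carlo sketch below illustrates this property only and does not reproduce the mean value function of the paper. The function names and the values `initial_mean=20.0` and `p_remain=0.3` are hypothetical, and a single constant probability stands in for p(t, x, x0).

```python
import math
import random


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for small means."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def thinned_count(initial_mean, p_remain, rng):
    """Poisson initial count, each microbe kept independently with p_remain."""
    n0 = sample_poisson(initial_mean, rng)
    return sum(1 for _ in range(n0) if rng.random() < p_remain)


def sample_moments(initial_mean, p_remain, n_rep=20000, seed=1):
    """Sample mean and variance of the thinned count over n_rep replicates."""
    rng = random.Random(seed)
    counts = [thinned_count(initial_mean, p_remain, rng) for _ in range(n_rep)]
    mean = sum(counts) / n_rep
    var = sum((c - mean) ** 2 for c in counts) / (n_rep - 1)
    return mean, var


mean, var = sample_moments(initial_mean=20.0, p_remain=0.3)
# For a Poisson count both moments should be close to 20 * 0.3 = 6.
print(mean, var)
```

Equal mean and variance is the signature of a Poisson count, which is why the whole distribution of N(t, x) is determined once the mean value function is known.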
 In this paper, we seek to derive the mean value function for downstream links, in this simple case represented by link 3, with no loss of generality. The results can be expanded to any dendritic network. The mean value function for any downstream link is the sum of mean value functions of upstream links:
 For the schematic network in Figure 1, the mean value function for any point x on link 3 is:
where p(t, Li + LObs, x0i) is calculated as:
where pk(t, x, x0i) is the probability that a microorganism in state k, k = 2, 4 at x = x0i and t = 0, will be within (x0i, x) by time t;
where I[.] is the indicator function equal to unity when the condition in the argument is satisfied, and equal to zero otherwise. Term (g12 + g14) in the expression for p(t, x, x0i) corresponds to the event of leaving state 1; to the event of “survival,” i.e., not transitioning to the absorbing state 5; the term corresponds to the event of transitioning from state 1 to state 2; the term corresponds to transition from state 1 to state 4; and the last term, corresponds to the event that nothing happens. Integrals p2 and p4 are constructed analogously.
 While the general form of formulae (4)-(6) is similar to those of single-link transport, the expansion of the theory to a dendritic network, more specifically to account for transport on downstream links, required modifications in the limits of integration and arguments in expressions for p2 and p4. For instance, the term in the lower integration limit in (5) and in indicator functions in (6a) and (6b) indicates the amount of time microorganisms will spend traveling on link i with velocity ui plus the amount of time spent on link 3. Further, integrals W3 and W4 both pertain to the event of microbial transitioning from state 4 to state 2, given that they survive. W3 describes transport on upstream links, and W4 on the downstream link, with limits of integration and arguments for p2 updated accordingly. For example, the last argument for p2 in W4, , indicates the terminus of the interval on link 3 where the microorganisms can end up. It is computed as follows: y is the total time of travel, is the time of travel on link 3, is the distance covered on link 3 up to some point before x, and finally is the distance between that point and x (see Figure 1).
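The additivity of mean value functions used for the downstream link can be checked with probability generating functions. The short derivation below assumes that the counts contributed by the two upstream links are independent; the symbols N_1, N_2, m_1, and m_2 are introduced here for illustration only and stand for the two contributions and their means at a fixed time and location.

```latex
% N_1, N_2: independent Poisson counts contributed by upstream links 1 and 2
% m_1, m_2: their means at the same (t, x); illustrative symbols
\begin{align}
E\left[s^{N_i}\right] &= \exp\{m_i (s - 1)\}, \qquad i = 1, 2, \\
E\left[s^{N_1 + N_2}\right] &= E\left[s^{N_1}\right] E\left[s^{N_2}\right]
  = \exp\{(m_1 + m_2)(s - 1)\}.
\end{align}
```

The last expression is the generating function of a Poisson variable with mean m_1 + m_2, so the combined count on the downstream link is again Poisson and its mean is the sum of the upstream contributions.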
2.3. Risk Calculations
 Risk is defined as the probability that the number of microorganisms exceeds a certain threshold, let us call it Nmax. As we have shown above, for any t > 0 the number of microorganisms N(t, x) forms a Poisson process with intensity Λ. Several sampling studies indicated that the number of microbes in surface waters typically follows a lognormal distribution [USEPA, 2010]. A lognormal distribution can be derived from our formulation if the Poisson intensity Λ itself is modeled as a lognormal random variable, Λ ∼ logN(µ, σ), where µ and σ are the mean and standard deviation of the underlying normal distribution. The mean value function mt(x) of the Poisson process then becomes lognormal. Since a Poisson variable with a large mean value function is, approximately, equal to its mean, the values of the Poisson process become themselves approximately lognormal. It is then straightforward to derive the expression for risk as:
where FΛ(.) is the cumulative distribution function (cdf) of Λ, , and, from (1), . In other words, the calculation of risk reduces to computing the cdf of Λ, where its distribution parameters would be estimated from field data.
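The reduction of risk to a lognormal cdf can be sketched as follows. The sketch assumes that the mean value function factors as Λ times a deterministic function of t and x, here called `g_tx` (a hypothetical name), and that the Poisson count is replaced by its mean, as argued in the text. Under these assumptions the risk is approximately 1 − FΛ(Nmax / g_tx). The value µ = 17 is taken from Table 1; σ = 1.0 and g_tx = 0.2 are placeholders.

```python
import math


def lognormal_cdf(x, mu, sigma):
    """Cdf of a lognormal variable whose logarithm is N(mu, sigma^2)."""
    if x <= 0.0:
        return 0.0
    return 0.5 * (1.0 + math.erf((math.log(x) - mu) / (sigma * math.sqrt(2.0))))


def exceedance_risk(n_max, g_tx, mu, sigma):
    """Approximate P(N(t, x) > n_max) when N is replaced by its mean Lambda * g_tx."""
    if g_tx <= 0.0:          # no microbes can be present, so no exceedance
        return 0.0
    return 1.0 - lognormal_cdf(n_max / g_tx, mu, sigma)


MU = 17.0      # Table 1
SIGMA = 1.0    # assumed; the tabulated value is not available here
G_TX = 0.2     # hypothetical value of the deterministic factor at some (t, x)

for n_max in (1e5, 1e6, 1e7, 1e8):
    print(f"Nmax = {n_max:.0e}: risk = {exceedance_risk(n_max, G_TX, MU, SIGMA):.4f}")
```

Because `g_tx` decreases with time as microbes are absorbed or pass the observation point, the same function evaluated at later times yields lower risk for every threshold.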
 The mean value function is shown in Figure 3. It calculates the cumulative mean, up to point x on link 3, so it starts with some value at time zero, and then decreases to zero as the microorganisms either enter the absorbing state or pass x.
 The results of risk calculations are shown in Figure 4. Here the probability that N(t, x) exceeds a threshold Nmax is computed at point x on link 3 at t = 0.75, 1.0, and 1.5 h. All three curves start at and then decline to zero. For a given location, probabilities of exceeding any threshold decline with time, due to the transient nature of events.
 In this article, we have expanded the stochastic theory of microorganism transport over individual hillslopes to a framework applicable to flow networks in watersheds. Each microorganism in the ensemble follows Markovian dynamics driven by microbial interactions with soil sediments on the soil surface and in the flow, giving rise to a nonhomogeneous Poisson random field that describes the ensemble behavior in space and time.
4.1. Particle Dynamics at the Microscopic Scale
 In coupling the microscopic scale of individual microorganism behavior with macroscopic ensemble behavior, our model relies on the physics of microbial dynamics at the microscopic scale. These processes are captured in the definitions of possible microbial states and the transitions between them, and in the formulation of transition probabilities gjk.
 Markov states are defined based on physical or biological microbial states (Figure 2). Both the surface water and bottom sediment contain free and attached microbes, with the degree of attachment depending on the sediment size, and viability that is subject to biological decay. While the importance of microbial attachment to sediments is recognized in general, the experimental methods commonly employed in standard field studies do not measure microbial partitioning between aqueous versus solid phase in sediment or water column samples. A common assumption is that all microbes in the water and on the streambed are attached to sediments, which would underestimate free microorganisms [Droppo et al., 2011; Pandey et al., 2012]; or that microbes are limited to only two states: waterborne and sediment-borne based on whether they were recovered from water column or streambed samples, which would not measure the fraction of microbes attached to soil sediments within either sample type [Cho et al., 2010; Kim et al., 2010]. Microbial attachment to sediments is sometimes underestimated if coarse sieves are used, which would pass all clay-sized (<2 µm) and some silt-sized (2–50 µm) particles together with any attached microorganisms [Jamieson et al., 2005]. The lack of representation of partitioning is reflected in models that accompany these studies. Partitioning, however, must be accounted for, experimentally and theoretically, because microbial fate depends on whether they are free or attached to sediments—both in the water column and on the streambed [Koken et al., 2013].
 Our model takes a step further and makes a clear distinction between free and attached states of microbes both on the soil surface and in the water flow; and it does that in the probabilistic context of Markov states (Figure 2). Each Markov state therefore represents a natural state of a microorganism, originating from its fate and transport characteristics. Markov states are mutually exclusive, and collectively they exhaust the microbial state space.
 Transition probabilities between states are also critical in the model formulation, because they afford a mechanism to incorporate process physics. For instance, transitioning from an immobile state can take place due to raindrop impact, or the shear stress of the flow; attachment of microbes to sediments is a function of sediment size, etc. [Yeghiazarian et al., 2006]. Our model accounts for partitioning between solid and aqueous phases, with differentiation between microbial attachment rates to sediments of various size classes [Yeghiazarian et al., 2004]. This is accomplished by employing respective kinetic rate coefficients in transition probabilities.
4.2. Relationship With the Two-State Erosion Approach of Parlange and Colleagues
 There is a conceptual link between our microscopic model of microbial behavior and the stochastic erosion model developed by Lisle et al. [1998]. The Lisle et al. model follows the microscopic fate of a soil particle and describes its transport as periods of rest and motion according to Markov dynamics. It shows that the deterministic erosion model by Parlange and colleagues [Hairsine and Rose, 1991; Hairsine et al., 1999; Hogarth et al., 2004; Parlange et al., 1999; Sander et al., 1996] can be obtained from the more general, stochastic model by averaging of the stochastic particle motions. Our microbial model is more involved as it contains five Markov states, and incorporates interactions between two species: microbes and soil particles [Yeghiazarian et al., 2006, 2009]. Both Lisle et al. and Yeghiazarian et al. derive the macroscopic, ensemble behavior based on the mathematical identity of forward Kolmogorov equations describing the space-time evolution of probabilities, and deterministic mass conservation equations describing the space-time evolution of concentrations of particles in each Markov state. While retaining the description of microbial behavior at the microscopic scale, in the current and in the 2009 papers [Yeghiazarian et al., 2009] we depart from the above macroscopic formulation. Instead of establishing the mathematical equivalency between the flow of probabilities and concentrations, we employ Poisson random fields to describe the ensemble behavior. This allows us, within a single stochastic framework, to seamlessly bridge the microscopic scale of individual particle dynamics with the macroscopic scale of population behavior.
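The identity between forward Kolmogorov equations and mass conservation equations can be illustrated for the state probabilities alone. The sketch below drops the spatial (advective) part, which is a simplification made here, and integrates dp_k/dt = Σ_j p_j g_jk − p_k Σ_l g_kl with explicit Euler steps. The rates are arbitrary placeholders and the function names are hypothetical. Multiplying the resulting probabilities by a total number of particles gives the expected count in each state, which is what a deterministic mass balance would track.

```python
# Hypothetical transition rates g_jk in h^-1 (placeholders, NOT Table 1 values).
RATES = {
    (1, 2): 0.5, (1, 4): 2.0, (1, 5): 0.01,
    (2, 1): 0.3, (2, 3): 1.0, (2, 4): 0.2, (2, 5): 0.01,
    (3, 2): 0.4, (3, 4): 0.6, (3, 5): 0.05,
    (4, 1): 0.3, (4, 2): 0.2, (4, 3): 0.8, (4, 5): 0.05,
}
STATES = (1, 2, 3, 4, 5)


def kolmogorov_step(p, dt):
    """One explicit Euler step: move probability p_j * g_jk * dt from j to k."""
    dp = {k: 0.0 for k in STATES}
    for (j, k), g in RATES.items():
        flow = p[j] * g * dt
        dp[j] -= flow
        dp[k] += flow
    return {k: p[k] + dp[k] for k in STATES}


def state_probabilities(t_end_h, dt=1e-3):
    """State probabilities at t_end_h for a microbe that starts in state 1."""
    p = {k: 0.0 for k in STATES}
    p[1] = 1.0
    for _ in range(round(t_end_h / dt)):
        p = kolmogorov_step(p, dt)
    return p


probs = state_probabilities(1.0)
total_microbes = 1_000_000  # illustrative ensemble size
for k in STATES:
    print(f"state {k}: p = {probs[k]:.4f}, expected count = {probs[k] * total_microbes:.0f}")
```

Each step only moves probability between states, so the total stays at one while the share of the absorbing state 5 grows with time.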
 This approach has several advantages. First, this formulation naturally leads to a lognormal distribution for the number of microorganisms at any observation point on the flow network, which is consistent with results of monitoring studies [USEPA, 2010]. Second, it derives and helps understand the spatiotemporal dynamics of risk, a challenging problem because contaminants in water are spatially heterogeneous and temporally transient. Within our framework, risk computations are simple and straightforward, and are practically reduced to calculating the lognormal cdf at any given time and location. The straightforward and computationally efficient assessment of risk is an important feature of our framework because such simplicity in a decision-making tool is likely to appeal to water resources managers who need to calculate the risk of exceeding water quality standards on a regular basis.
 The ability of our framework to capture transient events (Figures 3 and 4) is important in identification of microbial sources, as well as in development of informative monitoring strategies. Specifically, we anticipate that the understanding of how risk changes in time across a watershed, enabled by our model, will inform the characterization of nonpoint sources. This information will further enable targeted and strategic placement of sample collection sites, not only in space but also in time.
 Finally, the concept and computations demonstrated here for a simple three-link network can be easily expanded to larger networks, which is the topic of our current work.
 The authors would like to thank Astrid Jacobson of Utah State University at Logan, UT for stimulating discussions. This work was partially supported by the ARO grant W911NF-12-10385.