Construction of Food and Water Borne Pathogens’ Dose–Response Curves Using the Expanded Fermi Solution

Authors

  • Micha Peleg,

    1. Authors Peleg and Normand are with Dept. of Food Science, Univ. of Massachusetts, Amherst, MA 01003, U.S.A. Author Corradini is with Inst. de Tecnología, Facultad de Ingeniería y Ciencias Exactas, Univ. Argentina de la Empresa, Cdad. de Buenos Aires, Argentina. Direct inquiries to author Peleg (E-mail: micha.peleg@foodsci.umass.edu).
  • Mark D Normand,

  • Maria G Corradini


Abstract

Theoretically, the relationship between the number of pathogens that cause acute infection if they settle in the gut, N, and the number initially ingested, M, can be constructed from the survival probabilities at the different “stations” along the digestive tract. These probabilities are rarely known exactly, but their ranges can be estimated. If, for a given N, one generates estimates of M using random probabilities within these ranges, the estimates’ distribution will be approximately lognormal, and its cumulative (CDF) form will represent the pathogen's dose–response curve. The distribution's logarithmic mean and standard deviation can be calculated analytically from the ranges and used to plot the curve. The method was used to generate dose–response curves of hypothetical food and waterborne pathogens and to calculate their infective doses (ID) at 5%, 50%, and 95% probability. The curves were compatible with the Beta Poisson model and robust against minor perturbations in the underlying probabilities’ ranges. The calculation and plotting procedure was automated and posted on the Internet as a freely downloadable interactive Wolfram Demonstration. It allows the user to generate, modify, examine, and compare dose–response curves, and to calculate their characteristics, by moving sliders on the screen.

Introduction

In the context of gastrointestinal infection, a dose–response curve usually refers to a plot of the relationship between the number of pathogen units ingested (x-axis) and the probability or frequency that the dose will produce a clinical manifestation, such as illness or death (y-axis). The pathogen can be a food or waterborne infectious microorganism (including protozoa), a virus, and, less frequently, a bacterial spore. Dose–response curves have been an important tool in microbial risk assessment (MRA). They have been amply investigated and documented, as well as mathematically characterized (see Cassin and others 1998; Holcomb and others 1999; Haas and others 2000; Buchanan and others 2000; Kothary and Babu 2001; Strachan and others 2005, for example). The most commonly used 2-parameter mathematical model of dose–response curves is the Beta Poisson (BP) model:

$$P_{inf}(\text{dose}) = 1 - \left(1 + \frac{\text{dose}}{\beta}\right)^{-\alpha} \qquad (1)$$

where Pinf(dose) is the probability of infection after ingesting a dose (the number of pathogen units), and α and β are constants characteristic of the particular pathogen and affected by environmental circumstances (for example, Holcomb and others 1999; Teunis and others 1999; Haas and others 2000). Alternative models include the lognormal, log-logistic, simple exponential, “flexible” exponential, and Weibull–Gamma models (Holcomb and others 1999; Haas and others 2000), expanded versions of the BP and other models (for example, Bartrand and others 2008), and a version of the BP model into which the Time Post Incubation (TPI) has been incorporated (Huang and Haas 2009). The mathematical properties of these functions, their fit to simulated and experimental data, and their relation to the infectious mechanism have also been amply studied and compared (for example, Teunis and others 1996; Holcomb and others 1999; Teunis and Havelaar 2000; Xie Yang and others 2000; Latimer and others 2001). In many cases, experimental dose–response data on a particular pathogen have been limited for logistical, technical, and ethical reasons, especially when human volunteers have been involved. The problem is further complicated by the fact that pathogens’ infectivity varies dramatically, from a few units in some of the most virulent (Alam and Zurek 2006) to many thousands or even millions in the less infectious (Teunis and others 1996). Thus, while the ordinate of dose–response plots is almost always linear, the abscissa is frequently logarithmic. Also, data obtained from epidemiological and laboratory studies frequently do not cover the entire dose–response curve and are notoriously scattered. For these reasons, comparisons of the fit of the various dose–response models have not always produced a clear winner.
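The approximate Beta Poisson form in Eq. 1 can be evaluated directly. The minimal Python sketch below uses hypothetical values, α = 0.25 and β = 40, chosen only for illustration; the helper for the median infective dose is the closed-form inversion of Eq. 1 at a probability of 0.5, ID50% = β(2^(1/α) − 1). Both function names are hypothetical.

```python
def beta_poisson(dose: float, alpha: float, beta: float) -> float:
    """Approximate Beta Poisson probability of infection (Eq. 1)."""
    if dose < 0 or alpha <= 0 or beta <= 0:
        raise ValueError("need dose >= 0, alpha > 0 and beta > 0")
    return 1.0 - (1.0 + dose / beta) ** (-alpha)


def beta_poisson_id50(alpha: float, beta: float) -> float:
    """Dose at which Eq. 1 gives a 50% probability of infection."""
    return beta * (2.0 ** (1.0 / alpha) - 1.0)


# Hypothetical parameters, for illustration only.
ALPHA, BETA = 0.25, 40.0
for dose in (1, 10, 100, 1_000, 10_000):
    print(f"dose {dose:>6}: P_inf = {beta_poisson(dose, ALPHA, BETA):.3f}")
print(f"ID50% = {beta_poisson_id50(ALPHA, BETA):.0f}")
```

With these parameters, 1 + 600/40 = 16 and 16^(−0.25) = 0.5, so the sketch's ID50% is 600.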

The scatter in dose–response data is not unexpected. As has long been recognized, the clinical manifestation of an infection, and sometimes even its reporting, is the result of a stochastic process involving probabilities that can rarely be accurately assessed, let alone determined exactly (see below). Therefore, an element of uncertainty is inherent in the construction of dose–response curves, regardless of the mathematical model used to characterize them and of whether the independent variable is the actual number of pathogens ingested or its logarithm. Since infection by a food or waterborne pathogen is largely, but not exclusively, determined by the condition of the infective agent, the particulars of the medium in which it has been transmitted, and the immunological state of the people who have ingested it, the scatter observed in recorded and reported dose–response curves is not at all surprising.

Construction of a dose–response curve, in contrast with merely fitting epidemiological or experimental laboratory data, rests on certain underlying assumptions. The best known of these is the “single-hit” hypothesis and its relation to the Beta Poisson model, which was critically evaluated by Teunis and Havelaar (2000). The “stochastic approach” is based on identifying the chain of events that leads to infection or death, assigning a probability to each, and multiplying these probabilities to obtain the probability of the outcome (Teunis and others 1996). The concept has been the foundation of several risk assessment methods (see Cassin and others 1998, for example). The starting point of the methods based on it is the tacit admission that the details of the events leading to infection or death are rarely, if ever, known with certainty, hence the appeal to probabilities. The “Fermi Solution” is a general method of estimation based on the multiplication and division of several factors whose values, although unknown, can nevertheless be “reasonably assumed.” It is named after the great physicist Enrico Fermi (1901 to 1954), who made it a high art (von Baeyer 1993). We will explain the term “reasonably assumed” and describe the method and its expansion in the following sections. The objectives of this study were to use an expanded version of the Fermi Solution, originally developed for microbial risk assessment (Peleg and others 2007), to generate dose–response curves of food and waterborne pathogens when hard information on what happens to them before and after their ingestion is scant or nonexistent, and to develop a user-friendly interactive program that performs the calculations and post it on the Internet as freely downloadable software.

Methodology

The “Fermi Solution”

Suppose the number of pathogen “units” that must reach the human gut viable, or settle in it, to cause acute infection is known, or can be assumed, to be N, N ≥ 1. How many “units,” M, M ≥ N, must be ingested so that N of them survive the digestive tract's hurdles and cause the infection? Animal studies can answer the question, but they leave extrapolation to humans an open issue. When the infective agent is not lethal and does not cause permanent damage, the answer can come from studies in which it is used to infect human volunteers. In this case, however, although the number of ingested units, M, can be controlled, the actual number of surviving pathogen units that produced the acute infection, N, might have varied among the subjects. Either way, an element of uncertainty will remain, and the relationship between M and N will have to be estimated rather than accurately determined. Moreover, the response of individual humans to the same infective agent can also vary widely, depending on the agent's origin and history, the state of their immune systems, the particular food with which it has been ingested, and other factors. Another way to approach the question is to assume that the number of viable pathogen units reaching the gut, N, is the number of units ingested, M, multiplied by a series of probabilities, Pi's, that they will survive the series of hurdles posed by the host's digestive and immune systems. Identification and quantification of these hurdles, or “barriers” (Teunis and Havelaar 2002), is a major issue in what is known as the “Key Events Dose–Response Framework” (Julien and others 2009). But while listing the hurdles or barriers can be straightforward (see Buchanan and others 2009, for example), quantifying their roles is not, for the reasons already mentioned: the inherent variability in the response of individual humans, the dependence of the pathogen's infectivity on its origin, its history, and the food with which it has been ingested (for example, Peterson and others 1989; Hofmann and Eckmann 2006), and so on.

For simplicity, let us suppose that P1 is the probability that the microbial pathogen survives the stomach's acid and enzymes, P2 that it survives the pancreatic juice, P3 that it survives the bile, P4 that it survives the competition from the resident microbiota (“microflora”) in the gut, and so on. If these are indeed the major factors that determine this pathogen's survival and ability to infect (although an expert would certainly add a few other probabilities to the list), and if all these Pi's can be “reasonably estimated,” then the expected number of pathogen units that will cause acute infection, N, is:

$$N = M\,P_1 P_2 P_3 \cdots P_n = M \prod_{i=1}^{n} P_i \qquad (2)$$

Hence M, the number of ingested pathogen units that will result in N viable pathogen units, is:

$$M = \frac{N}{P_1 P_2 P_3 \cdots P_n} = \frac{N}{\prod_{i=1}^{n} P_i} \qquad (3)$$

For example, if 10 surviving pathogens in the gut are required to cause acute infection, N = 10, and there are 5 main survival probabilities, P1 = 0.05, P2 = 0.8, P3 = 0.7, P4 = 0.5, and P5 = 0.9, then M = 10/(0.05 × 0.8 × 0.7 × 0.5 × 0.9) = 10/0.0126 ≈ 794 (rounded to the nearest integer). This is the “Fermi Solution,” albeit with a fairly small number of factors. Notice that if any Pi is very close to 1, the corresponding factor can be considered unimportant and discarded. At the other extreme, if a Pi is very small, 10⁻⁴ or 10⁻⁶, say, the corresponding hurdle is overwhelming as far as the pathogen is concerned. In such a case, a very large number of pathogen units would have to be ingested to cause infection. This number's order of magnitude would be determined not only by that particular probability but by the other probabilities too.
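The arithmetic of Eq. 3 is easy to script. The minimal Python sketch below (the function name is hypothetical) reproduces the worked example above and shows how a single very small Pi dominates M.

```python
from math import prod


def fermi_ingested_dose(n_required: float, survival_probs: list[float]) -> float:
    """Original Fermi Solution (Eq. 3): M = N / (P1 * P2 * ... * Pn)."""
    if n_required < 1:
        raise ValueError("N must be at least 1")
    if any(not 0.0 < p <= 1.0 for p in survival_probs):
        raise ValueError("each survival probability must lie in (0, 1]")
    return n_required / prod(survival_probs)


# Worked example from the text: N = 10 and five survival probabilities.
probs = [0.05, 0.8, 0.7, 0.5, 0.9]
print(round(fermi_ingested_dose(10, probs)))  # 794

# An overwhelming hurdle: replacing P1 = 0.05 by 1e-4 multiplies M by 500.
print(round(fermi_ingested_dose(10, [1e-4] + probs[1:])))
```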

A “reasonable estimate” of a factor's role, expressed here as a survival probability, is not a scientifically defined term. However, it is not the same as a wild guess: experts in the field can come up with estimates on the basis of experience and/or published clinical or laboratory data, for instance, which might be fairly accurate (Peleg and others 2007). The method frequently works because, in a series of “reasonably estimated” parameter values, it is unlikely that all, or the decisive majority, will be biased in the same direction. It is more likely that overestimation of some parameter values will be at least partly compensated by underestimation of others, and vice versa. Thus, although never guaranteed, the Fermi Solution often renders an estimate that is close to the correct value.
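The compensation argument can be illustrated numerically. The Python sketch below is an illustration under stated assumptions, not part of the original method: it supposes that each of n factors is misjudged by an independent fold error whose natural logarithm is uniform between −ln k and +ln k, and compares the typical (root-mean-square) error of the product with the worst case, in which all errors push in the same direction. With five factors, each off by up to a factor of 2, the typical error of the product is about 2.4-fold (√5·ln 2/√3 ≈ 0.89 in log units), far below the worst-case 32-fold.

```python
import math
import random


def log_error_of_product(n_factors: int, max_fold_error: float,
                         trials: int = 100_000, seed: int = 1) -> tuple[float, float]:
    """Return (rms, worst-case) natural-log error of a product of n factors.

    Assumption for illustration: each factor's log error is independent and
    uniform on [-ln(max_fold_error), +ln(max_fold_error)].
    """
    rng = random.Random(seed)
    half_width = math.log(max_fold_error)
    sum_sq = 0.0
    for _ in range(trials):
        # The log error of a product is the sum of its factors' log errors.
        total = sum(rng.uniform(-half_width, half_width) for _ in range(n_factors))
        sum_sq += total * total
    rms = math.sqrt(sum_sq / trials)
    worst_case = n_factors * half_width  # every error in the same direction
    return rms, worst_case


rms, worst = log_error_of_product(n_factors=5, max_fold_error=2.0)
print(f"typical fold error ~{math.exp(rms):.2f}, worst case {math.exp(worst):.0f}")
```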

The Expanded Fermi Solution

One of the shortcomings of the original Fermi Solution, especially when the number of factors (probabilities in our case) is small, is that a substantial error in 1 or 2 of them can skew the final estimate. Also, different experts may disagree on some or all of the parameter values, which will result in a considerable discrepancy between their estimates. A way to avoid such errors and reduce potential disagreement among experts is to specify each parameter not as a single value but by its likely lower and upper bounds (Peleg and others 2007, 2011). Experts who might vigorously disagree on a particular value are more likely to reach a consensus on the range in which this value ought to lie. Once the key factors have been identified and their probability ranges decided, one can generate numerous Fermi Solution estimates using combinations of random values of the Pi's, each within its assigned range. It can be shown that the distribution of these estimates will be approximately lognormal (Peleg and others 2007, 2011): since ln M is ln N minus a sum of independent random terms, the ln Pi's, this is a manifestation of the central limit theorem. The distribution's logarithmic mean, μL, and standard deviation, σL, can then be used to calculate the sought “Best Estimate,” namely the distribution's mode (ibid). This is the Monte Carlo version of the Expanded Fermi Solution.
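The Monte Carlo procedure just described can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: it assumes independent, uniformly distributed Pi's, and the function name, draw count, and seed are arbitrary choices. The “Best Estimate” is taken as the mode of a lognormal distribution, exp(μL − σL²). For the ranges of Set 1 in Table 1, the sketch gives μL ≈ 5.52 and a standard deviation of ln M of about 0.50.

```python
import math
import random


def expanded_fermi_monte_carlo(n_required, prob_ranges, draws=100_000, seed=0):
    """Monte Carlo Expanded Fermi Solution for M at a fixed N.

    prob_ranges is a list of (p_min, p_max) pairs; each Pi is drawn uniformly
    within its range. Returns the logarithmic mean and standard deviation of
    the M estimates and the lognormal mode, exp(mu - sigma**2).
    """
    for p_min, p_max in prob_ranges:
        if not 0.0 < p_min <= p_max <= 1.0:
            raise ValueError("need 0 < p_min <= p_max <= 1 for every range")
    rng = random.Random(seed)
    log_m = []
    for _ in range(draws):
        p_product = 1.0
        for p_min, p_max in prob_ranges:
            p_product *= rng.uniform(p_min, p_max)
        log_m.append(math.log(n_required / p_product))  # ln M from Eq. 3
    mu = math.fsum(log_m) / draws
    sigma = math.sqrt(math.fsum((x - mu) ** 2 for x in log_m) / (draws - 1))
    return mu, sigma, math.exp(mu - sigma ** 2)


# Ranges of Set 1 in Table 1 (N = 10).
set1 = [(0.7, 0.8), (0.5, 0.7), (0.2, 0.8), (0.1, 0.3)]
mu_l, sigma_l, best_estimate = expanded_fermi_monte_carlo(10, set1)
print(f"mu_L = {mu_l:.3f}, sigma_L = {sigma_l:.3f}, mode = {best_estimate:.0f}")
```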

When the random numbers used to pick the parameter values within their respective ranges have a uniform distribution, reflecting “maximum ignorance,” the lognormal distribution's logarithmic mean, μL, and standard deviation, σL, can also be calculated analytically (see J. Horowitz's appendix to Peleg and others 2007). In our case, we want to estimate M for a given (fixed) N, which translates into

$$\mu_{LM} = \ln N - \sum_{i=1}^{n} \mu_{LP_i} \qquad (4)$$

and

$$\sigma_{LM} = \sqrt{\sum_{i=1}^{n} \sigma_{LP_i}^{2}} \qquad (5)$$

where μLM and σLM are the logarithmic (base e) mean and standard deviation of the distribution of M, respectively, and μLPi and σLPi are the mean and standard deviation of ln Pi.

When the lower and upper bounds of probability Pi are Pimin and Pimax, respectively, its logarithmic mean (μLPi) and standard deviation (σLPi) are calculated from

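$$\mu_{LP_i} = \frac{P_{i\max}\ln P_{i\max} - P_{i\min}\ln P_{i\min}}{P_{i\max} - P_{i\min}} - 1 \qquad (6)$$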

and

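$$\sigma_{LP_i} = \sqrt{\frac{P_{i\max}(\ln P_{i\max})^{2} - P_{i\min}(\ln P_{i\min})^{2} - 2\left(P_{i\max}\ln P_{i\max} - P_{i\min}\ln P_{i\min}\right) + 2\left(P_{i\max} - P_{i\min}\right)}{P_{i\max} - P_{i\min}} - \mu_{LP_i}^{2}} \qquad (7)$$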

where μLPi is calculated by Eq. 6.
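Eqs. 4 to 7 translate directly into code. The minimal Python sketch below assumes, as the text does, independent Pi's, each uniform on its range, and takes σLM as the square root of the summed variances of ln Pi; the function names are hypothetical. A zero-width range, such as the 1-to-1 setting used to switch off an unneeded probability in the Demonstration, is handled as the limiting case σLPi = 0. Applied to Set 1 of Table 1, the sketch reproduces the tabulated μL = 5.522, but its σLM comes out at about 0.50; the tabulated σL of 0.252 coincides numerically with the summed variances, Σσ²LPi ≈ 0.2515, and the same correspondence holds for Set 2 of Table 1 (0.231) and Set 1 of Table 2 (0.969). Readers comparing the sketch with the tables should keep this difference in mind.

```python
import math


def log_moments_uniform(p_min: float, p_max: float) -> tuple[float, float]:
    """Mean and SD of ln(P) for P uniform on [p_min, p_max] (Eqs. 6 and 7)."""
    if not 0.0 < p_min <= p_max <= 1.0:
        raise ValueError("need 0 < p_min <= p_max <= 1")
    if p_min == p_max:
        # Zero-width range: ln P is a constant (0 when the range is 1 to 1).
        return math.log(p_min), 0.0
    width = p_max - p_min

    def x_ln_x(x: float) -> float:
        return x * math.log(x)

    def antiderivative_ln_sq(x: float) -> float:
        # d/dx [x ln(x)^2 - 2x ln(x) + 2x] = ln(x)^2
        lx = math.log(x)
        return x * lx * lx - 2.0 * x * lx + 2.0 * x

    mu = (x_ln_x(p_max) - x_ln_x(p_min)) / width - 1.0  # Eq. 6
    mean_sq = (antiderivative_ln_sq(p_max) - antiderivative_ln_sq(p_min)) / width
    variance = max(mean_sq - mu * mu, 0.0)  # clip tiny negative round-off
    return mu, math.sqrt(variance)  # Eq. 7


def expanded_fermi_analytic(n_required: float, prob_ranges) -> tuple[float, float]:
    """Logarithmic mean and SD of M for a fixed N (Eqs. 4 and 5)."""
    moments = [log_moments_uniform(lo, hi) for lo, hi in prob_ranges]
    mu_m = math.log(n_required) - sum(m for m, _ in moments)  # Eq. 4
    sigma_m = math.sqrt(sum(s * s for _, s in moments))  # Eq. 5
    return mu_m, sigma_m


set1 = [(0.7, 0.8), (0.5, 0.7), (0.2, 0.8), (0.1, 0.3)]  # Table 1, Set 1
mu_m, sigma_m = expanded_fermi_analytic(10, set1)
print(f"mu_LM = {mu_m:.3f}, sigma_LM = {sigma_m:.3f}")
```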

Once μLM and σLM have been calculated in this way, the cumulative form (CDF) of the lognormal distribution can be used to generate the relationship between M, the number of pathogen units ingested, and the probability that N of them will settle in the gut and cause acute infection. A plot of this relationship on linear or log-linear coordinates (see Figure 1) is the estimated dose–response curve of the pathogen in question, whose infectivity and survivability are specified by the assigned N and the set of probability ranges, respectively.
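Given μL and σL, the dose–response curve and its infective doses follow from the lognormal CDF and its inverse. The minimal Python sketch below applies the standard library's statistics.NormalDist to ln M; the function names are hypothetical. Fed the tabulated μL = 5.522 and σL = 0.252 of Set 1 in Table 1, it reproduces that table's ID5%, ID50%, and ID95% and its infection probability at M = 300 to within the rounding of the inputs.

```python
import math
from statistics import NormalDist


def dose_response(m: float, mu_l: float, sigma_l: float) -> float:
    """Probability of infection after ingesting m units: lognormal CDF of M."""
    return NormalDist(mu_l, sigma_l).cdf(math.log(m))


def infective_dose(q: float, mu_l: float, sigma_l: float) -> float:
    """Dose M at which the probability of infection equals q (0 < q < 1)."""
    return math.exp(NormalDist(mu_l, sigma_l).inv_cdf(q))


MU_L, SIGMA_L = 5.522, 0.252  # tabulated values, Table 1, Set 1
for q in (0.05, 0.50, 0.95):
    print(f"ID{q:.0%} = {infective_dose(q, MU_L, SIGMA_L):.0f}")
print(f"P(M = 300) = {dose_response(300, MU_L, SIGMA_L):.3f}")
```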

Figure 1–.

Examples of simulated “monotonic” and sigmoid dose–response curves generated by the Expanded Fermi Solution plotted on linear and log-linear coordinates.

Interactive program to estimate dose–response curves using the Expanded Fermi Solution

The procedure to calculate and plot a dose–response curve using the described method has been automated and posted on the Internet as a Demonstration in the Wolfram Demonstrations Project (see http://demonstrations.wolfram.com/ExpandedFermiSolutionsInPathogensDoseResponseCurves/). The full screen display of the Demonstration is shown in Figure 2. Mathematica Player, the program that runs all the Demonstrations in the Project, is freely downloadable; just follow the instructions on the screen. The user need not install Mathematica® (Wolfram Research, Champaign, Ill., U.S.A.) to run the Demonstrations. Mathematica is only required for modifying a Demonstration's code, for example, to add probabilities and/or change their ranges. To create a new curve or change an existing one (see Figure 2), all the user has to do is enter new values of N and/or of the Pi's minimum and maximum. They can be entered as numbers or by moving sliders on the screen with the mouse. As soon as any of the parameters is altered, the curve corresponding to the new setting is displayed almost instantaneously. Above the plot, the new numerical values of the logarithmic mean (μL) and standard deviation (σL), calculated by Eqs. 4 to 7, also appear, accompanied by the 5%, 50%, and 95% quantiles, that is, the infective doses ID5%, ID50%, and ID95%. The Demonstration also allows the user to enter any particular value of M, by typing it or by moving its slider, and the corresponding probability is then displayed. The location of the selection is shown as a moving colored dot on the displayed curve. The curve itself can be plotted on linear or log-linear coordinates, the choice being made by clicking on the “x-axis” setter bar.

Figure 2–.

Screen display of the Wolfram Demonstration that generates dose–response curves using the Expanded Fermi Solution method. Note that on the screen, N, M, and the Pi's appear in lower case.

The ease with which N and the probabilities can be varied enables the user to generate a large number of their combinations in a very short time and to examine their potential effect on the dose–response curve and its characteristics, notably ID50%. The Demonstration also facilitates examining how N, the assumed number of surviving pathogen units needed to cause an infection, affects the risk when different amounts of the pathogen are ingested. Alternatively, the Demonstration can be used to assess how errors or uncertainties in the probabilities’ lower and upper bounds might affect the dose–response curve's shape and characteristics. The Demonstration can therefore facilitate an expert team's discussion and help its members reach a consensus. Or, if a disagreement among the experts persists, the Demonstration would enable them to quantify the differences in their assessments in clearly defined terms, that is, the corresponding ID5%, ID50%, and ID95%. The large ranges of N and M that the Demonstration allows would enable assessors to examine scenarios involving a single pathogen unit or very few, as with extremely virulent pathogens, as well as scenarios involving up to 10⁸ pathogens, which would include most, if not all, of the more “benign” ones. In its present version, the Demonstration allows the user to enter up to 6 different probabilities only; including more would require modifying the program's code. To use fewer than 6 probabilities, all the user has to do is set the lower and upper limits of the unneeded probabilities to 1 (since ln 1 = 0, such a factor changes neither μL nor σL). In the current version of the Demonstration posted on the Internet, the probabilities’ admissible ranges are all set to 0.001 to 1. Changing any of them, or zooming in on a particular probability interval, from 0.0001 to 0.0005 or from 0.5 to 0.7, say, requires replacing the corresponding current values of Pmin and Pmax by the new ones.

The method and program can be extended to probabilities involving a pretreatment of the food or water (Peleg and others 2010). For example, P1 can be the probability that the pathogen in question has survived refrigeration or washing of the food, or disinfection of the water, in which case M will be the number of pathogen units present before the treatment. Similarly, one can include in the analysis the probability of mortality from the infection. In such a case, P5 might be the probability of the pathogen establishing itself in the gut and P6 the probability that, once established there, it would cause the victim's (human or animal) death.

Comparison of the Lognormal and Beta Poisson Models

As already stated, the Beta Poisson equation is the most commonly used mathematical dose–response model. However, it has been amply demonstrated that alternative 2-parameter models, including the cumulative form of the lognormal distribution, can fit comparably well, especially when applied to highly scattered data. To demonstrate the point, we generated numerous smooth dose–response data sets with the Beta Poisson model, added random scatter of various amplitudes to them, and fitted the scattered data with the lognormal model (Eqs. 8 and 9, see below). The procedure to generate the data, choose the number of points (5 to 300 in the current web version) and the scatter's amplitude, execute the nonlinear regression, and display the comparison between the resulting dose–response curves has been automated. It is available to the reader as another freely downloadable Wolfram Demonstration, at http://demonstrations.wolfram.com/PathogenDoseResponseCurvesWithTheBetaPoissonAndLognormalMode/, whose screen display is shown in Figure 3.
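A rough, self-contained stand-in for that comparison is sketched below in Python. It differs from the Demonstration in two ways: the lognormal parameters are obtained by a probit-linearized least-squares fit (regressing the inverse standard normal CDF of P on ln M) rather than by nonlinear regression on Eq. 8, and the Beta Poisson data are noise-free. The parameters α = 1 and β = 100 are hypothetical; with α = 1 the Beta Poisson curve reduces to the log-logistic M/(M + β), which makes the result easy to check. On this grid, the largest gap between the two curves is a few hundredths in probability.

```python
import math
from statistics import NormalDist


def beta_poisson(dose: float, alpha: float, beta: float) -> float:
    """Approximate Beta Poisson probability of infection (Eq. 1)."""
    return 1.0 - (1.0 + dose / beta) ** (-alpha)


def fit_lognormal_probit(doses, probs):
    """Probit-linearized least-squares fit of a lognormal dose-response curve.

    Regresses z = inverse standard normal CDF of P on ln M, i.e. the straight
    line z = (ln M - mu_L) / sigma_L. Every P must satisfy 0 < P < 1.
    Returns (mu_L, sigma_L).
    """
    std_normal = NormalDist()
    xs = [math.log(d) for d in doses]
    zs = [std_normal.inv_cdf(p) for p in probs]
    x_bar = sum(xs) / len(xs)
    z_bar = sum(zs) / len(zs)
    sxz = sum((x - x_bar) * (z - z_bar) for x, z in zip(xs, zs))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxz / sxx
    intercept = z_bar - slope * x_bar
    return -intercept / slope, 1.0 / slope


ALPHA, BETA = 1.0, 100.0                     # hypothetical BP parameters
doses = [10 ** (k / 10) for k in range(41)]  # 1 to 10,000, log-spaced
bp = [beta_poisson(d, ALPHA, BETA) for d in doses]
mu_l, sigma_l = fit_lognormal_probit(doses, bp)
fitted = NormalDist(mu_l, sigma_l)
max_gap = max(abs(b - fitted.cdf(math.log(d))) for d, b in zip(doses, bp))
print(f"mu_L = {mu_l:.3f}, sigma_L = {sigma_l:.3f}, max |BP - lognormal| = {max_gap:.3f}")
```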

Figure 3–.

Screen display of the Wolfram Demonstration that generates scattered dose–response data with the Beta Poisson model and fits them with the lognormal model.

Figure 4 shows examples of dose–response curves generated by the program. It demonstrates that there is a considerable range of α and β values, and of M, over which the Beta Poisson and lognormal models can be used interchangeably for all practical purposes. Notice that published real dose–response curves rarely have more than 5 to 10 data points and, in many cases, a larger scatter than that shown in the figure and allowed by the program. The reason for including a much larger number of data points and a smaller scatter is to demonstrate that the lognormal model's fit is neither accidental nor merely an artifact of the scatter.

Figure 4–.

Two examples of the fit of the lognormal distribution to dose–response data generated with the Beta Poisson model. They demonstrate that in a large number of situations the 2 models can be used interchangeably.

The Beta Poisson model is a simple algebraic equation, while the cumulative form of the lognormal distribution is not: it contains the error function (erf), whose values must be calculated numerically, that is,

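$$P(M) = \frac{1}{2}\left[1 + \operatorname{erf}\left(\frac{\ln M - \mu_L}{\sigma_L\sqrt{2}}\right)\right] \qquad (8)$$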

With modern mathematical software such as Mathematica®, the software used in this research, this has no practical consequence. The function defined by Eq. 8, written in the syntax of Mathematica®, has the form

p[m_] := CDF[LogNormalDistribution[μL, σL], m]     (9)

where CDF stands for the cumulative distribution function of the distribution specified within the brackets. The function p[m] in Eq. 9, representing P(M) in the text, is evaluated and plotted just as fast as an algebraic function containing only power, exponential, or logarithmic terms. (For this reason, the Wolfram Demonstration's response to changes in the sliders appears instantaneous.) Also, p[m] as defined by the equation can be used as a model for nonlinear regression to extract μL and σL from experimental dose–response data. The lognormal distribution model's parameters may have little intuitive meaning to many practitioners. But the same can be said of the Beta Poisson model's parameters α and β and of the parameters of other dose–response models, perhaps with the exception of the exponential model. The model parameters’ lack of intuitive meaning becomes a moot issue when one compares different dose–response curves in terms of their ID5%, ID50%, and ID95%, as in the Wolfram Demonstration shown, or of alternative terms such as ID1%, ID10%, and ID99%.
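For readers without Mathematica, a minimal Python analogue of p[m] in Eq. 9 can be written directly from Eq. 8 with the standard library's math.erf. The name p and the argument order are choices made for this sketch. Evaluated with the tabulated μL and σL of Table 2, it gives infection probabilities at M = 2000 that agree with the table to within rounding.

```python
import math


def p(m: float, mu_l: float, sigma_l: float) -> float:
    """Eq. 8: lognormal dose-response curve written with the error function."""
    if m <= 0:
        return 0.0
    z = (math.log(m) - mu_l) / (sigma_l * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


# Comparing the two curves of Table 2 at the same dose, M = 2000.
for label, mu_l, sigma_l in (("Set 1", 7.79, 0.969), ("Set 2", 7.63, 0.965)):
    print(label, f"{100 * p(2000, mu_l, sigma_l):.1f}%")
```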

The Results' Robustness Against Small Perturbations

Examples of dose–response curves created with the model are shown in Figures 5 and 6. Each pair of curves was generated with 2 sets of slightly different probability ranges, whose numerical values and the resulting dose–response parameters are given in Tables 1 and 2. The figures and tables demonstrate that the method is robust against small perturbations. This was also observed in other applications of the method (Peleg and others 2007, 2010) and is therefore not surprising. It suggests that experts can reach a consensus on the final result even if minor differences of opinion concerning the magnitudes of one or more probabilities remain unsettled. More importantly, the ease with which the dose–response curves are generated enables the experts to examine immediately how their differing estimates of the ranges would affect the final assessment.

Figure 5–.

A pair of dose–response curves generated with the Expanded Fermi Solution method with slightly different probability ranges and plotted on linear coordinates. The corresponding generation, fit and “pathogenicity parameters” are listed in Table 1. Note the method's robustness against minor perturbations in the underlying probability ranges.

Figure 6–.

A pair of dose–response curves generated with the Expanded Fermi Solution method with slightly different probability ranges and plotted on log-linear coordinates. The corresponding generation, fit and “pathogenicity parameters” are listed in Table 2. Note the method's robustness against minor perturbations in the underlying probability ranges.

Table 1–.  The generation and virulence (“pathogenicity”) parameters used to generate the dose–response curves shown in Figure 5 and calculate the hypothetical pathogen's infective doses.ᵃ

| Factor/result | Set 1 | Set 2 |
|---|---|---|
| N | 10 | 20 |
| P1min–P1max | 0.7 to 0.8 | 0.6 to 0.9 |
| P2min–P2max | 0.5 to 0.7 | 0.4 to 0.6 |
| P3min–P3max | 0.2 to 0.8 | 0.3 to 0.7 |
| P4min–P4max | 0.1 to 0.3 | 0.2 to 0.8 |
| μL | 5.522 | 5.472 |
| σL | 0.252 | 0.231 |
| Probability of infection (%), M = 300 | 76.5 | 84.2 |
| Probability of infection (%), M = 200 | 18.7 | 22.5 |
| Infective dose 5% (ID5%) | 165 | 163 |
| Infective dose 50% (ID50%) | 250 | 238 |
| Infective dose 95% (ID95%) | 378 | 348 |

ᵃ μL and σL are the logarithmic mean and the standard deviation of the estimates’ distribution.

Table 2–.  The generation and virulence (“pathogenicity”) parameters used to generate the dose–response curves shown in Figure 6 and calculate the hypothetical pathogen's infective doses.ᵃ

| Factor/result | Set 1 | Set 2 |
|---|---|---|
| N | 20 | 30 |
| P1min–P1max | 0.05 to 0.3 | 0.1 to 0.5 |
| P2min–P2max | 0.7 to 0.9 | 0.5 to 0.7 |
| P3min–P3max | 0.6 to 0.9 | 0.01 to 0.6 |
| P4min–P4max | 0.3 to 0.6 | 0.4 to 0.7 |
| P5min–P5max | 0.01 to 0.5 | 0.5 to 0.9 |
| μL | 7.79 | 7.63 |
| σL | 0.969 | 0.965 |
| Probability of infection (%), M = 2000 | 42.2 | 48.8 |
| Probability of infection (%), M = 1000 | 18.1 | 27 |
| Infective dose 5% (ID5%) | 492 | 421 |
| Infective dose 50% (ID50%) | 2422 | 2059 |
| Infective dose 95% (ID95%) | 11920 | 10070 |

ᵃ μL and σL are the logarithmic mean and the standard deviation of the estimates’ distribution.

Again, the closer the probability estimates are to their correct values and the narrower their ranges, the more accurate and reliable the resulting dose–response curve will be. Had all the probabilities been known exactly, the Expanded Fermi Solution would reduce to the original Fermi Solution. But, as previously explained, such a scenario is very unlikely because of the inherent uncertainties concerning the events that lead to an infection or, in severe cases, death.

Conclusions

The proposed Expanded Fermi Solution is not intended to replace clinical or experimental methods for determining microbial dose–response curves, but to provide a way to generate such curves in situations where information on the infection either does not exist or is insufficient to develop an accurate mechanistic model. Uncertainties concerning infection by food and waterborne pathogens stem from, among other things, inherent variability in their physiological state when ingested, the number of “units” actually consumed, their fate during and after passing through the digestive tract, and the state of the host's immune system. In many cases, the virulence of a pathogen is determined in an animal model, so extrapolation of the results to humans introduces another kind of uncertainty. The same can be said about dose–response data obtained from human volunteers: although they are the most reliable, the issue of how representative a group of relatively healthy volunteers is of the population at large remains unresolved. The extrapolation issue becomes even more troublesome when it comes to assessing the risk to particularly vulnerable groups or segments of the population, such as the elderly, the chronically ill, malnourished children or infants, and people with a compromised immune system. In such cases, the Expanded Fermi Solution offers a way to integrate expertise and knowledge obtained from related cases or the literature to construct dose–response curves that could not be determined directly. To do that, one would only need to adjust the number of surviving pathogens likely to cause an acute infection or death, identify the probabilities that determine their survival and infectivity once ingested, and set their most likely lower and upper bounds. Although the method does not guarantee an accurate result, the dose–response curves it generates might have a better chance of being close to the correct ones than those based on arbitrary assumptions or guessed parameter values.

The Expanded Fermi Solution method can also be useful in establishing a decision-making protocol that can be evaluated a posteriori and, if necessary, revised. This could be done, for example, by adding or eliminating probabilities, adjusting their lower and upper bounds, or modifying the value of N when new information becomes available. The Wolfram Demonstration that generates and plots the dose–response curve and calculates its parameters (ID5%, ID50%, and ID95% in the current version) enables quick assessment of the consequences of a change or variation in the model's parameters. Through modification of the probability ranges, the method also enables the user to compare the survivability and virulence of different pathogens and to simulate the effects of environmental factors, such as the food with which the pathogen is ingested, on the dose–response curve's characteristics. Yet we should reiterate that the probabilistic approach has been developed only for situations where hard information is scarce and for simulations. It should not be construed as a reason to reduce the efforts to collect relevant clinical and epidemiological data. These have been and will continue to be the basis for constructing pathogens’ dose–response curves. Hopefully, future research will show that the clinical and probabilistic approaches are complementary, the first creating the database and its mechanistic interpretation, and the second serving as a tool for quantifying risk and for attempts at extrapolating it to populations for which data are scarce or nonexistent.

Acknowledgment

Contribution of the Massachusetts Agricultural Experiment Station at Amherst.
