Particle count estimation in dilution series experiments

ABSTRACT
Estimation of microorganism concentration in samples (bacterial cells or viral particles) has been a focal point in biomedical experiments for more than a century. Serial dilution of the samples is often used to estimate the target concentrations in immunology, virology, and the pharmaceutical industry. A new methodology, called joint likelihood estimation (JLE), is proposed to estimate particle counts, such as the number of microorganisms in a sample, from counts obtained by serially diluting the sample. It models count data from an entire single dilution series rather than using only specific dilutions. The theoretical framework is based on the binomial and the Poisson distributions and is consistent with the actual experimental process. The estimator of the target concentration is obtained by MLE with derived joint likelihood functions of the observed counts, including right-censored values. Simulations demonstrated that the new JLE method significantly increases the precision and accuracy of the estimate compared to existing methods. It can be applied to a variety of studies with similar experimental designs, especially when the number of particles in the neat sample is very large.


INTRODUCTION
Estimation of microorganism concentration in samples (bacterial cells or viral particles) has been a focal point in biomedical experiments for more than a century. Serial dilution of the samples is often used to estimate the target concentrations in immunology, virology, and the pharmaceutical industry (Ben-David & Davidson, 2014). The viral plaque assay is a well-known example, first developed to quantify plaque forming units (PFU) of bacteriophage in a sample (Dulbecco, 1952; Dulbecco & Vogt, 1954). The technique has also been applied in other areas, including plant disease assessment (Hepworth, 1996) and the estimation of bacterial densities in water samples (Myers et al., 1994).
Typically, a series of diluted samples is transferred onto multi-well plates, and spots, each representing an endpoint such as a viral plaque or bacterial colony, are counted in each well. These counts are then used to back-calculate the particle concentration in the neat (undiluted) sample. Several methods have been proposed and used. The first method is the single well method (Ben-David & Davidson, 2014), where a single dilution from the dilution series is selected as the first or the best countable one and used for the estimation of the neat sample concentration by multiplying the counts by the dilution factor. The estimates can be improved by accounting for the well and particle size to correct for bias due to overcrowding or merging of nearby spots (Ben-David & Davidson, 2014). The second method, the most probable number (MPN) (Cochran, 1950; Trumble et al., 2017), utilizes the frequency of empty wells in multiple replicates of serial dilutions. Finally, a third hybrid method (Andres Christen & Parker, 2020) estimates the neat sample concentration by fitting a Bayesian hierarchical model to the counts of the first countable wells from multiple replicates of serial dilutions.
All these methods utilize only a portion of the data (i.e., spot counts from particular wells rather than from the entire plate), which can result in relatively low precision and high error for the estimate. The new method proposed in this work, called joint likelihood estimation (JLE), models the particle count using all available information. It is rooted in the experimental design and models the process of serially sampling and diluting the samples mathematically. The counts in uncountable wells are replaced by suitable right-censored values, and the joint likelihood that combines the censored and uncensored counts is then maximized as a function of the neat sample concentration to obtain the estimate. The JLE method significantly increases the precision and accuracy of the estimate and can be applied to a variety of studies with similar experimental designs.

Experimental design
To illustrate the study design, a single vial containing a sample with an unknown number of particles (bacterial cells or viral particles) is serially diluted (Figure 1). A portion of the sample (p_1) is transferred into a new vial and mixed with a medium in p_1 to (1 − p_1) proportions. The process is repeated, creating N progressively diluted samples. Next, a portion of each diluted sample (s_1) is transferred into a multi-well plate. Hence, p_2 = s_1 / (1 − p_1) is the proportion of the particles transferred from a diluted sample into a single well. Let us denote the number of particles in the neat sample as Y_0, the number of particles in each vial as Y_1, Y_2, Y_3, …, Y_N, and the number of particles in each well as X_1, X_2, X_3, …, X_N. To estimate the particle count in the neat sample (Y_0), a likelihood function can be constructed and evaluated based on the distributions of the observed counts from the diluted samples. Note that the neat sample concentration is the count divided by the volume, but for simplicity and without loss of generality we assume the total volume of the neat sample is equal to one.

FIGURE 1 The process of a serial dilution experiment.

Statistical modeling
To be consistent with the experimental process, we assumed binomial distributions as follows:

Y_i | Y_{i−1} ~ Bin(Y_{i−1}, p_1), i = 1, 2, …, N,   (1)

where Y_0 denotes the particle count in the neat sample, and

X_i | R_i ~ Bin(R_i, p_2), i = 1, 2, …, N,   (2)

where R_i = Y_i − Y_{i+1} is the number of particles remaining in vial i after the portion p_1 is transferred onward. The marginal distributions can be obtained by the properties of the conditional binomial distribution: specifically, if X ~ Bin(n, p) and the conditional distribution of Y given X is Y | X ~ Bin(X, q), then Y is a simple binomial random variable with distribution Y ~ Bin(n, pq). Furthermore, if Y | X ~ Bin(X, q), then X − Y ~ Bin(X, 1 − q). The marginal distributions of the vial counts are then:

Y_i ~ Bin(Y_0, p_1^i), i = 1, 2, …, N, and R_i ~ Bin(Y_0, p_1^i (1 − p_1)).

Once again, by the properties of the conditional binomial distribution, the marginal distributions of the X_i's are:

X_i ~ Bin(Y_0, q_i), i = 1, 2, …, N,

where q_i = p_1^i (1 − p_1) p_2 = p_1^i s_1, since p_2 = s_1 / (1 − p_1). The X_i's are independent by construction. For example, if p_1 = s_1 = 0.1, then q_i = 0.1^(i+1); specifically, X_1 ~ Bin(Y_0, 0.01), X_2 ~ Bin(Y_0, 0.001), …, X_N ~ Bin(Y_0, 0.1^(N+1)). Alternatively, the X_i's can be modeled with Poisson distributions (Hodges & Le Cam, 1960), which count discrete occurrences over a continuous domain:

X_i ~ Pois(Y_0 q_i), i = 1, 2, …, N,

with the same values of q_i. When Y_0 is large and the probabilities q_i are very small, the binomial distribution is well approximated by the Poisson distribution, and the two estimates of Y_0 are nearly identical.
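The two-stage sampling scheme and its binomial marginals can be checked by simulation. The sketch below is in Python for illustration only (the authors' implementation is in R); it draws vial and well counts following formulas (1) and (2) and compares the empirical well-count means with Y_0 · q_i:

```python
import numpy as np

def simulate_series(Y0, p1, s1, N, rng):
    """Simulate one dilution series under the two-stage binomial model:
    vials Y_i | Y_{i-1} ~ Bin(Y_{i-1}, p1); each well samples a fraction
    p2 = s1/(1-p1) of what remains in the vial after the onward transfer."""
    p2 = s1 / (1 - p1)
    y = [Y0]
    for _ in range(N + 1):                     # one extra transfer out of vial N
        y.append(rng.binomial(y[-1], p1))
    # remainder in vial i is y[i] - y[i+1]; well i samples it with prob p2
    return np.array([rng.binomial(y[i] - y[i + 1], p2) for i in range(1, N + 1)])

rng = np.random.default_rng(1)
Y0, p1, s1, N = 10**6, 0.1, 0.1, 3
reps = np.array([simulate_series(Y0, p1, s1, N, rng) for _ in range(2000)])
# Marginally X_i ~ Bin(Y0, q_i) with q_i = p1**i * s1, so E[X_i] = Y0 * p1**i * s1,
# i.e., approximately 10**4, 10**3, 10**2 here.
print(reps.mean(axis=0))
```

The empirical means agree with the marginal result X_i ~ Bin(Y_0, p_1^i s_1) derived above.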
Let us further denote the observed particle counts in the wells as x_1, x_2, x_3, …, x_N. If the neat sample is highly concentrated, so are the first few diluted samples in the series. This can render the wells uncountable due to particle overlap. In those wells, the actual counts are replaced by censored values (right-censoring) (Hewett & Ganser, 2007). Assuming the first n wells in the series are right-censored, the joint log-likelihood function of the observed counts x_1, x_2, x_3, …, x_N for the binomial distribution model is:

ℓ_Bin(Y_0) = Σ_{i=1}^{n} log[1 − F_Bin(x_i; Y_0, q_i)] + Σ_{j=n+1}^{N} log f_Bin(x_j; Y_0, q_j),

where F_Bin(x_i; Y_0, q_i) and f_Bin(x_j; Y_0, q_j) denote the cumulative distribution function (CDF) and probability mass function (pmf) of the binomial distribution with parameters q_i and the unknown Y_0, respectively. Similarly, for the Poisson distribution, the joint log-likelihood function is:

ℓ_Pois(Y_0) = Σ_{i=1}^{n} log[1 − F_Pois(x_i; Y_0 q_i)] + Σ_{j=n+1}^{N} log f_Pois(x_j; Y_0 q_j),

where F_Pois(x_i; Y_0 q_i) and f_Pois(x_j; Y_0 q_j) denote the CDF and pmf of the Poisson distribution with rates Y_0 q_i, respectively.
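Assuming the censored wells enter the likelihood through the survival function P(X_i ≥ x_i), the two joint log-likelihoods can be sketched as follows (Python for illustration; the function and variable names are ours, not from the authors' R package):

```python
import numpy as np
from scipy import stats

def loglik_binom(Y0, x, q, n_cens):
    """Joint log-likelihood, binomial model: the first n_cens wells are
    right-censored at x_i and contribute log P(X_i >= x_i)."""
    x, q = np.asarray(x, float), np.asarray(q, float)
    cens = stats.binom.logsf(x[:n_cens] - 1, int(Y0), q[:n_cens])
    obs = stats.binom.logpmf(x[n_cens:], int(Y0), q[n_cens:])
    return cens.sum() + obs.sum()

def loglik_pois(Y0, x, q, n_cens):
    """Joint log-likelihood, Poisson model with rates Y0 * q_i."""
    x, q = np.asarray(x, float), np.asarray(q, float)
    lam = Y0 * q
    return (stats.poisson.logsf(x[:n_cens] - 1, lam[:n_cens]).sum()
            + stats.poisson.logpmf(x[n_cens:], lam[n_cens:]).sum())

# Example series with p1 = 0.25, s1 = 0.1, N = 10, first 6 wells censored
q = np.array([0.25**i * 0.1 for i in range(1, 11)])
x = np.array([120 * 4**k for k in range(5, -1, -1)] + [61, 15, 4, 1], dtype=float)
ll_true = loglik_pois(10**7, x, q, 6)
ll_bin = loglik_binom(10**7, x, q, 6)
```

As expected, the log-likelihood is higher at the true Y_0 = 10^7 than at values far from it, and the binomial and Poisson versions are nearly identical in this regime.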
Then, the estimator of Y_0 by this JLE method is obtained by maximizing the joint log-likelihood function over the search set S = {Y_0 ∈ Z_+ : c_1 r_1 < Y_0 < c_2 r_2}, where (r_1, r_2) is the range of the rough estimates obtained from the non-zero uncensored observed counts by multiplying them by 1/q_i, and c_1, c_2 are pre-determined constants, for example, c_1 = 0.5 and c_2 = 2. Specifically,

Ŷ_0 = argmax_{Y_0 ∈ S} ℓ_Bin(Y_0)   or   Ŷ_0 = argmax_{Y_0 ∈ S} ℓ_Pois(Y_0),

where ℓ_Bin and ℓ_Pois are the joint log-likelihood functions above.
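A minimal sketch of the maximization step, assuming a geometric grid search over the set S (the authors maximize with R's optimize; the grid resolution and names here are illustrative):

```python
import numpy as np
from scipy import stats

def jle_pois(x, q, n_cens, c1=0.5, c2=2.0, n_grid=5000):
    """JLE sketch (Poisson model): maximize the joint log-likelihood over a
    geometric grid of candidate neat-sample counts Y0 in (c1*r1, c2*r2)."""
    x, q = np.asarray(x, float), np.asarray(q, float)

    def loglik(Y0):
        lam = Y0 * q
        return (stats.poisson.logsf(x[:n_cens] - 1, lam[:n_cens]).sum()
                + stats.poisson.logpmf(x[n_cens:], lam[n_cens:]).sum())

    # rough estimates r_i = x_i / q_i from the non-zero uncensored counts
    rough = [xi / qi for xi, qi in zip(x[n_cens:], q[n_cens:]) if xi > 0]
    grid = np.unique(np.round(np.geomspace(c1 * min(rough), c2 * max(rough), n_grid)))
    return grid[int(np.argmax([loglik(Y0) for Y0 in grid]))]

q = np.array([0.25**i * 0.1 for i in range(1, 11)])
x = np.array([120 * 4**k for k in range(5, -1, -1)] + [61, 15, 4, 1], dtype=float)
est = jle_pois(x, q, n_cens=6)
```

On this example series (generated with Y_0 = 10^7), the estimate lands close to the true value.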

JLE method with binomial and Poisson models versus single well methods
Simulations were conducted to check the performance of the proposed JLE methods. First, for different values of Y_0, p_1, and s_1, count sequences x_1, …, x_N were generated according to the experimental process described in formulas (1) and (2). An example of simulated particle count data with Y_0 = 10^7 particles in the neat sample, 12 dilutions (N = 12), dilution factors of ×10, ×5, and ×4 (corresponding to p_1 values of 0.1, 0.2, and 0.25, respectively), and proportions of samples transferred into the wells of s_1 = 0.1 and 0.25 is presented in Table 1.
As expected, higher dilution factors (i.e., smaller p_1) and smaller portions of the samples transferred to the wells (s_1) resulted in empty wells starting earlier in the series. Hence, to obtain more non-empty wells, p_1 = 0.25 and s_1 = 0.1 were selected for the data simulation. Table 2 shows a scenario with 8 replicate dilution series of 10 dilutions each.
The data shown in Table 2 represent the simulated true counts, but in reality, densely populated wells (marked with "a" in the table) would be uncountable due to spot overlaps. These values are replaced with censored values. The maximal number of spots per well that can be distinguished depends on the sizes of the well and the spots. In this study, the censored value was set to 120 spots per well (denoted as "a" in Table 2). The threshold of 120 spots per well is the recommended default value based on the spot counting experience of the authors, taking into account well size and spot size. The cutoff can be adjusted by practitioners, and our code allows for that. The first (right-most) dilution with a count above this threshold was censored to 120. Moving backwards (to the left) from that dilution, censored values were set to 120 multiplied by the dilution factor raised to the number of dilutions removed from the first censored dilution. Table 3 shows how the censoring worked for the Table 2 counts.

TABLE 1 Simulated counts with different p_1 and s_1 for Y_0 = 10^7 and N = 12.

TABLE 2 Eight simulated samples with Y_0 = 10^7, N = 10, p_1 = 0.25 and s_1 = 0.1. a: Values too big to be counted; replaced in the observed data by censored values.

FIGURE 2 (A) Estimated particle counts by each single well for the 8 samples in Table 2. (B) Box plot of estimated particle counts for the 8 samples in Table 3, by the first countable well method (Single.1st), the best countable well method (Single.Best), and our JLE method with binomial (JLE.Bin) and Poisson (JLE.Pois) models, respectively. Red lines with points represent mean values.
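The censoring rule can be sketched as follows, under our reading that a well k dilutions to the left of the first censored one is set to threshold × factor^k (an assumption where the text is terse; Python for illustration):

```python
import numpy as np

def censor_series(x, threshold=120, factor=4):
    """Right-censor uncountable wells: the right-most well above the threshold
    becomes the threshold, and a well k dilutions to its left becomes
    threshold * factor**k. Returns the observed series and n_cens."""
    x = np.asarray(x, float).copy()
    above = np.nonzero(x > threshold)[0]
    n_cens = int(above[-1]) + 1 if above.size else 0
    for i in range(n_cens):
        x[i] = threshold * factor ** (n_cens - 1 - i)
    return x, n_cens

# A Table-2-style row of true counts (Y0 = 1e7, p1 = 0.25, s1 = 0.1)
true_counts = [250000, 62500, 15625, 3906, 977, 244, 61, 15, 4, 1]
obs, n_cens = censor_series(true_counts)
```

Here the first six wells exceed 120 spots; the sixth is censored to 120 and the wells to its left to 480, 1920, and so on, mirroring the Table 3 construction.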
One common way to do the estimation is based on selecting a single countable well in each series. The estimated particle count is then equal to the count in the selected well divided by its corresponding dilution ratio, that is, Ŷ_0 = x_i / q_i, where q_i is the dilution ratio for the selected well. To check the performance of this approach, we first applied it to the true counts at each dilution level for the 8 simulated samples in Table 2. For each sample (row in Table 2), 10 estimates of Y_0 were obtained. Figure 2A shows the estimates, with each line representing one sample. The estimates based on the first several dilutions are close to the true Y_0 = 10^7, but in reality those wells would be uncountable due to spot overlap resulting in almost complete coverage of the well. Moving to more diluted samples increased variability due to sampling errors and decreased the precision of the estimates due to the rounding effect. Highly diluted samples also produced empty wells, which worsened the estimates further.
One way to select a well from the series is to take the first countable well. For the Table 3 data, the first countable wells were found in the 7th column. Another approach is to choose the best countable well, balancing between sampling variability and potential counting errors. Counting errors may occur when two or more spots are very close together and become indistinguishable from one another, even to the human eye. Another source of counting error is undercounting by the operator, when some spots are missed during counting. Both kinds of counting error result in an undercount that gets exacerbated as the number of spots in a well increases. On the other hand, the sampling variability of the counts increases when choosing a well with a smaller number of spots. Both counting errors and sampling variability cause inaccuracy and variability in the estimates. By balancing the two, the best countable well can be chosen for the estimation. According to this rule, a well with 20 to 50 counts is preferred. If none of the well counts fall into this range, the search for the best well continues by moving the target range up by 10 from the upper bound or down by 10 from the lower bound of the 20 to 50 interval; that is, the order of preference is 50 to 60, 10 to 20, 60 to 70, 1 to 10, and 70 to 80. The ordering of the sequence of intervals was chosen to minimize the estimation errors. For example, compared with a well with 50 to 60 spots, a well with 20 to 50 spots is selected because the potential counting error of the former case outweighs the sampling variability of the latter. These ranges were selected based on the spot counting experience of the authors, taking into account well size and spot size. For the example in Table 3, the wells selected using the best countable well method are bolded.
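The best countable well rule can be sketched directly from the stated preference order of count ranges (boundary handling, i.e., treating the bounds as inclusive, is our assumption; Python for illustration):

```python
import numpy as np

# Preference order of count ranges from the text: 20-50 first, then 50-60,
# 10-20, 60-70, 1-10, 70-80 (inclusive bounds are our assumption).
RANGES = [(20, 50), (50, 60), (10, 20), (60, 70), (1, 10), (70, 80)]

def best_countable_estimate(x, q):
    """Return x_i / q_i for the first well whose count falls in the most
    preferred range; None if no well qualifies."""
    for lo, hi in RANGES:
        for xi, qi in zip(x, q):
            if lo <= xi <= hi:
                return xi / qi
    return None

# Uncensored tail of a Table-3-style series (p1 = 0.25, s1 = 0.1)
q = np.array([0.25**i * 0.1 for i in range(7, 11)])
x = np.array([61, 15, 4, 1])
est = best_countable_estimate(x, q)        # picks x = 15 (range 10 to 20)
```

Here no count falls in 20 to 50 or 50 to 60 (61 just misses), so the rule falls through to the 10 to 20 range and selects the well with 15 spots.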
The two single-well methods were compared to our new JLE methods described above (for the binomial and Poisson models) by applying them to the Table 3 data. The R function optimize (R Core Team, 2013) was used to numerically maximize the log-likelihood functions. Figure 2B shows the box plot of the 8 estimates by each of the four methods. The performance of the methods was evaluated by five statistics, along with the reported run-time in seconds: the median and the mean of the estimates, the difference between the mean estimate and the true Y_0 (mean bias) divided by Y_0, the standard deviation of the estimates (SD) divided by Y_0, the median absolute error (MAE) divided by Y_0, and the mean squared error (MSE) divided by Y_0. The calculated statistics and reported run-times are presented in Table 4.
The results indicate that the first countable well method performed better than the best countable well method, due to the former's use of wells with larger counts. The new JLE method used MLE based on the entire series and performed better than the single well methods, with smaller variation and estimation errors. The results from the two new methods are similar, confirming that the Poisson model approximates the binomial model very well for large Y_0 values. Moreover, comparing the reported run-times, the Poisson method is faster than the binomial method. The Poisson model has a lower computational cost because the pmf of the Poisson distribution is easier and faster to compute than that of the binomial distribution. The difference in run-time would be much larger for bigger sample sizes.
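The near-agreement of the two models in this regime can be checked directly; the sketch below (Python for illustration) compares binomial and Poisson log-pmf values for a typical well (Y_0 = 10^7, q_i ≈ 6 × 10^-6):

```python
import numpy as np
from scipy import stats

# For large Y0 and small q, Bin(Y0, q) is close to Pois(Y0 * q), so the two
# JLE variants give nearly identical likelihood values.
Y0, q = 10**7, 6.1e-6          # Poisson rate Y0 * q = 61
xs = np.arange(30, 100)
lb = stats.binom.logpmf(xs, Y0, q)
lp = stats.poisson.logpmf(xs, Y0 * q)
print(np.abs(lb - lp).max())   # small approximation error across the range
```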
To compare the methods in a more systematic way, 1000 experiments were simulated as described above with varying parameter values. The true counts in the neat samples were set to Y_0 = 10^4, 10^6, and 10^8, with the proportion of samples transferred into the wells s_1 = 0.1. The number of dilutions (N) and dilution factors (1/p_1) varied. Table 5 displays the results of the simulations by the single well methods and the new methods under the different designs. Besides the statistics reported in Table 4, 95% bootstrap confidence intervals (CI) were calculated for each estimate. The coverage of the bootstrap CIs, that is, the percentage of CIs that included the true Y_0, and the mean bootstrap CI length divided by the true Y_0 were computed and compared (Table 5).
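The article does not spell out its bootstrap scheme; one plausible construction is a percentile parametric bootstrap, illustrated here with a simple pooled Poisson estimator (Σx_i / Σq_i over the uncensored wells) for brevity (Python for illustration; names are ours):

```python
import numpy as np

def pooled_estimate(x, q):
    """Simple pooled Poisson MLE from uncensored wells: Y0_hat = sum(x)/sum(q)."""
    return x.sum() / q.sum()

def bootstrap_ci(x, q, B=2000, level=0.95, seed=7):
    """Percentile parametric bootstrap: re-simulate Poisson counts from the
    fitted rates Y0_hat * q_i and re-estimate B times."""
    rng = np.random.default_rng(seed)
    est = pooled_estimate(x, q)
    boots = [pooled_estimate(rng.poisson(est * q), q) for _ in range(B)]
    a = (1 - level) / 2
    return est, np.quantile(boots, [a, 1 - a])

# Uncensored tail of a Table-3-style series (p1 = 0.25, s1 = 0.1)
q = np.array([0.25**i * 0.1 for i in range(7, 11)])
x = np.array([61.0, 15.0, 4.0, 1.0])
est, ci = bootstrap_ci(x, q)
```

The percentile interval brackets the point estimate; the same recipe applies with the full JLE estimator in place of the pooled one.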
The simulations show that the estimates using the JLE method with the binomial and Poisson models have smaller SD, MAE, and MSE than those obtained by the single well methods. The best countable well method performed better than the first countable well method. The 95% bootstrap CIs for the estimates by the new methods had smaller mean lengths, and their coverage was larger than that of the 95% bootstrap CIs by the single well methods in most cases. In addition, the performance of the new methods depended on the size and the number of uncensored values in the series. Given the same count in the neat sample (Y_0), a larger dilution ratio (smaller dilution factor) and more dilution levels lead to fewer censored wells and more informative data overall. This was illustrated by the simulations.

The most probable number method
The most probable number (MPN) method (Cochran, 1950; Garthright, 1993) is based on the frequency of wells with zero spots (i.e., zero counts). Assuming m replicates for each diluted sample, the frequency of empty wells at each dilution level can be modeled with a binomial distribution as a function of Y_0 and the dilution ratio q_i, and Y_0 is estimated by maximizing the sum of the log-likelihood functions of the frequencies over the dilutions. 1000 simulations were conducted with various designs to compare the new methods with the MPN.
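A sketch of the MPN idea, using the Poisson approximation exp(-Y_0 q_i) for the empty-well probability (the exact binomial count model would use (1 - q_i)^Y_0; Python for illustration, names ours):

```python
import numpy as np
from scipy import stats, optimize

def mpn_estimate(zeros, m, q, bounds=(1e2, 1e12)):
    """MPN sketch: zeros[i] = number of empty wells among m replicates at
    dilution i. Maximize the product of binomial likelihoods of the empty-well
    counts, with P(empty) = exp(-Y0 * q_i) under the Poisson approximation."""
    zeros, q = np.asarray(zeros), np.asarray(q)

    def negll(log_y0):
        p0 = np.exp(-np.exp(log_y0) * q)           # P(empty well) at dilution i
        return -stats.binom.logpmf(zeros, m, p0).sum()

    res = optimize.minimize_scalar(negll, bounds=np.log(bounds), method="bounded")
    return float(np.exp(res.x))

# Example consistent with Y0 near 1e5: dilution ratios chosen so that the
# empty-well frequencies are informative (hypothetical numbers)
q = np.array([2e-5, 5e-6, 1.25e-6, 3.125e-7])
zeros = np.array([3, 12, 18, 19])                  # empty wells among m = 20
est = mpn_estimate(zeros, m=20, q=q)
```

With these frequencies the per-dilution implied estimates -log(zeros/m)/q_i all sit near 10^5, and the joint maximizer lands in the same neighborhood.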
To simulate dilution series with more empty (zero-count) wells, smaller dilution ratios were used, and m = 2 replicates were generated for each experimental design. Specifically, Y_0 = 10^4, 10^5, 10^6, 10^7, and 10^8 were considered, with the proportion of samples transferred into the wells s_1 = 0.1, the number of dilutions N = 10, the number of replicates m = 2, and different dilution proportions p_1. Similarly, 95% bootstrap CIs were constructed for the new methods and the MPN. Table 6 shows the results of the simulations by the new methods and the MPN under these design conditions.
A parametric method similar to the MPN, with confidence intervals (CI) and p-values for goodness of fit, was proposed by Myers et al. (1994) and applied to the virologic quantitative microculture (QMC) assay used to quantify free or blood-borne HIV in clinical trials. Trumble et al. (2017) developed an R package called SDLAssay and a web tool for Myers's method with exact and asymptotic CIs. The objective of the method is to estimate the concentration of infectious units per million (IUPM) in HIV studies, where the presence of an infectious unit is necessary and sufficient to produce a positive result in standard HIV p24 enzyme immunoassays. It assumes 10^6 patient cells in the undiluted sample. The concentration of infectious units is estimated by maximizing the likelihood functions of the frequencies of non-infectious wells. As in the MPN, the frequencies are modeled with binomial distributions with probabilities being a function of the concentration and u_i, the number of patient cells per well at the ith dilution level. Here, u_i is equivalent to 10^6 q_i in the previous case, but it needs to be a positive integer. The true concentration must be smaller than 10^6 per unit of volume because the parameter to be estimated is the concentration per 10^6 patient cells.
The results in Tables 6 and 7 show better performance of the new method, with much smaller mean bias, SD, MAE, and MSE. The coverages of the 95% CIs by the new JLE method are similar to those by the MPN and SDLAssay, but the mean CI lengths by the new method are much smaller. Therefore, for samples with a large number of particles, the estimates based on dilution series counts work better than those based on the frequencies of empty wells, which is expected because non-empty wells carry more information. However, when even a small number of spots in the wells is uncountable and the data can only indicate the presence or absence of particles, the methods based on the frequency of empty wells are the only ones that can be used to obtain the count estimate.

Conclusion
Based on the results of the simulations, compared to the single-well and MPN methods, the proposed JLE method provides more precise and accurate estimates of the particle count in the neat sample.

DISCUSSION
This article proposes a new methodology for estimating particle counts, such as the number of microorganisms in a sample, from counts obtained by serially diluting the sample. The proposed JLE method is especially applicable when the number of particles in the neat sample is very large. Two novel algorithms are proposed to estimate the particle concentration based on a single dilution series by modeling data from the entire series rather than using only specific dilutions. The theoretical framework is based on the binomial and the Poisson distributions and is consistent with the actual experimental process. The estimator is obtained by MLE with derived joint likelihood functions of the observed counts, including right-censored values.
Simulations demonstrated good performance of the new JLE method, with smaller estimation errors and variation compared to the existing methods. The two models (binomial and Poisson) produced nearly identical results for large particle counts, but the algorithm based on the Poisson model was orders of magnitude faster to compute. In conclusion, the newly proposed JLE method performs well for estimating the particle count in a neat sample compared to the conventional methods. The Poisson model is preferred to the binomial model due to its lower computational cost.
In the new JLE method, the sampling errors are taken into consideration but not the counting errors (Ben-David & Davidson, 2014; Cohen, 1954). As future directions, counting errors will be incorporated into the models, and exact CIs of the estimates will be derived together with hypothesis testing. Moreover, a Bayesian hierarchical model will be applied to the new methodology with prior information about the concentration, based on an initial rough estimate from the number of empty wells. With the help of the proposed algorithm, a novel and efficient experimental design for serial dilution assays will be developed to obtain more precise estimates with minimal experimental effort, which would be a significant step forward for these types of studies.
An R package called LoST (Lin et al., 2022) contains the function JLE, which implements the methods in this article. The R package will be submitted to CRAN shortly. In the meantime, it is available on the website: https://sites.rutgers.edu/javier-cabrera/research/

DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available from the corresponding author upon reasonable request.