Corresponding Author Shira A. Mitchell, Department of Biostatistics, Harvard University, 655 Huntington Avenue, Boston, MA 02115, USA. E-mail: smitchel@hsph.harvard.edu

Objective To present an effective classification method based on the prevalence of Schistosoma mansoni in the community.

Methods We created decision rules (defined by cut-offs for number of positive slides), which account for imperfect sensitivity, both with a simple adjustment of fixed sensitivity and with a more complex adjustment of changing sensitivity with prevalence. To reduce screening costs while maintaining accuracy, we propose a pooled classification method. To estimate sensitivity, we use the De Vlas model for worm and egg distributions. We compare the proposed method with the standard method to investigate differences in efficiency, measured by number of slides read, and accuracy, measured by probability of correct classification.

Results Modelling varying sensitivity lowers the lower cut-off more than the upper cut-off, so that regions are correctly classified as moderate rather than low and thus receive life-saving treatment. The classification method goes directly to classification on the basis of positive pools, avoiding the need to know sensitivity to estimate prevalence. For model parameter values describing worm and egg distributions among children, the pooled method with 25 slides achieves an expected 89.9% probability of correct classification, whereas the standard method with 50 slides achieves 88.7%.

Conclusions Among children, it is more efficient and more accurate to use the pooled method for classification of S. mansoni prevalence than the current standard method.

Schistosoma mansoni (S. mansoni) infection, also known as bilharzia, is a chronic disease caused by parasitic worms. It causes severe liver and intestinal damage, physical growth retardation, and cognition and memory problems. WHO reports that more than 200 million people are infected worldwide and an estimated 700 million people are at risk of infection due to their residence in tropical and subtropical areas, in poor communities without access to safe drinking water and adequate sanitation. The WHO strategy for S. mansoni control focuses on reducing disease through periodic, targeted treatment with praziquantel. WHO guidelines identify a three-tiered strategy based on community prevalence of infection: (i) in communities with a high prevalence (more than 50% infected, which we call the high group), universal treatment is conducted once a year; (ii) in communities with a moderate prevalence (more than 20% but <50% infected, which we call the moderate group), school-age children are treated once every 2 years; and (iii) in communities with a low prevalence (<20% infected, which we call the low group), chemotherapy should be available in health facilities for treatment of suspected cases (Montresor et al. 1998; Albonico et al. 2006). To implement the WHO strategy, we must rapidly and accurately classify communities into one of these three categories. We argue that, as the end step is one of classification, it is wasteful to first estimate prevalence and then classify, so instead we classify directly. This has some advantages, as we show below. Our method can easily be extended to more than three categories or to redefined cut-off values. For example, if it becomes of interest which areas have no infection at all, we may in addition define a very low category.

True prevalence of active infection is defined as the proportion of individuals with at least one worm pair capable of producing eggs. Ideally, this is the quantity that needs to be measured, but the perfect measuring instrument has yet to be devised. Currently, most control programmes are based on the detection of eggs in 47 mg of faecal smears on slides using the Kato-Katz technique. The observed prevalence is defined as the proportion of individuals who show at least one positive egg count (de Vlas & Gryseels 1992). The observed prevalence is not necessarily equal to the true prevalence, because it depends on the quantity of stool examined in the sample, the number of samples collected (and at which time intervals) and the average worm load (indirectly measured by intensity, in eggs per gram of faeces). A person may have worms while no eggs are present in the stool on a given day; eggs may be present in the stool but not in the smear taken from the sample and placed on the slide; or eggs on the slide may be missed by the reader. One can define the sensitivity of the measuring technique to be the probability that an infected individual would be so labelled, and it is clear that this probability is less than one. It has been shown that, due to the relationship between the intensity of infection and the prevalence in a region, the sensitivity of the test is correlated with prevalence, with areas of higher prevalence also exhibiting higher sensitivity of the test (de Vlas et al. 1993). The De Vlas model provides a way to quantify the relationship between sensitivity and prevalence. This complicates matters further, and it is difficult to correct for sensitivity that varies with prevalence, itself the unknown quantity (Mitchell & Pagano 2012). This article presents an alternative, direct classification scheme that bypasses the estimation of the prevalence and the associated sensitivity issues.

Current WHO recommendations for measuring prevalence are based on sample surveys of 50 children per school within defined ecological zones (Montresor et al. 1998). This sample size was possibly selected because it was considered to be the maximum number of sample slides that a survey team examines in a single day (Albonico et al. 2006). Such an approach typically involves a survey team of several staff moving with a single vehicle and necessitates entry and analysis of survey data. It is therefore often considered prohibitively expensive for a national programme to sustain parasitological surveys on a large scale where this approach is used (Brooker et al. 2005).

Motivated by the wish to reduce screening costs by reducing the number of slides examined, without losing accuracy, we propose that the classification scheme be based on pooled samples using the standard 47 mg of faeces per slide, but now composed of faecal matter from four separate children. We show that the proposed methodology has statistical properties that are comparable to the standard individual technique based on twice as many readings. We model how the number of eggs per faecal sample varies from person to person and from day to day using the De Vlas model (Mitchell & Pagano 2012).
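As a rough illustration of why a pooled slide can carry more information than an individual one, consider a deliberately simplified setting that is an assumption made here and is not the De Vlas model: every infected child has the same per-smear detection probability s, smears and children are independent, the prevalence is p, and each slide holds c = 4 smears. The symbols s and c are introduced only for this sketch.

```latex
\begin{align*}
P(\text{individual slide positive}) &= p\,\bigl[1 - (1 - s)^{c}\bigr], \\
P(\text{pooled slide positive})     &= 1 - (1 - p\,s)^{c}, \qquad c = 4 .
\end{align*}
```

Because the function t -> 1 - (1 - ts)^c is concave in t and equals zero at t = 0, the pooled probability is at least as large as the individual one for every p between 0 and 1. For small p it is close to cps, so under these assumptions each pooled slide turns positive more often at a given prevalence.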

Methods

Decision rules and OC curves

The WHO control policy is to estimate the prevalence and, on that basis, classify as low, moderate, or high according to whether this estimate falls below 20%, between 20% and 50%, or above 50%, respectively. Our proposed approach is to classify based on the number of slides reported with eggs. We look at m slides and pick decision rules, or cut-offs, d1 and d2. If d1 or fewer slides are reported to have eggs, we say we are in the low category; if more than d1 but no more than d2, we say we are in the moderate category; and if more than d2, we say we are in the high category. To decide how to choose these cut-off values, we use a metric called the Figure of Merit (FOM), which measures how well we are classifying, on average, given a particular prior for the prevalence (Olives & Pagano 2010). We want to maximize the FOM, defined as

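FOM(d1, d2) = \int_{0}^{0.2} P(S \le d1 \mid p) \pi(p) dp + \int_{0.2}^{0.5} P(d1 < S \le d2 \mid p) \pi(p) dp + \int_{0.5}^{1} P(S > d2 \mid p) \pi(p) dp    (1)

where S is the number of the m slides reported to have eggs, p is the prevalence and \pi(p) is the prior for the prevalence, so that the FOM is the probability of correct classification under the given prior.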

We use the FOM to search through all possible decision rules (d1 going from 1 to m and d2 going from d1 + 1 to m) for the highest FOM. The FOM can also be used to compare two methods, such as pooled or individualed, given certain assumptions about reality (a prior belief about the prevalence, and a model for sensitivity). The method with the higher FOM is more accurate in classifying correctly, on average.
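The search described above can be sketched in a few lines. This is a minimal illustration and not the authors' program: it treats the FOM as the probability of correct classification under the prior, assumes slides are independent with positive probability sens(p) * p, and integrates over the prevalence with a midpoint rule. The names `best_rule`, `sens`, `prior` and `grid` are hypothetical.

```python
from math import comb


def best_rule(m, sens=lambda p: 1.0, prior=lambda p: 1.0, grid=1000):
    """Search cut-offs (d1, d2) maximising the probability of correct classification.

    sens(p) is an assumed slide-level sensitivity at prevalence p and prior(p)
    an unnormalised prior density; both are illustrative inputs.
    """
    coef = [comb(m, s) for s in range(m + 1)]
    # mass[c][s]: prior-weighted probability that the true category is c
    # (0 = low, 1 = moderate, 2 = high) and exactly s slides are positive.
    mass = [[0.0] * (m + 1) for _ in range(3)]
    total = 0.0
    for i in range(grid):
        p = (i + 0.5) / grid                 # midpoint rule on [0, 1]
        w = prior(p)
        total += w
        cat = 0 if p < 0.2 else (1 if p < 0.5 else 2)
        q = sens(p) * p                      # chance that one slide is positive
        for s in range(m + 1):
            mass[cat][s] += w * coef[s] * q**s * (1.0 - q) ** (m - s)

    def cum(row):                            # cum(row)[d] = row[0] + ... + row[d]
        out, acc = [], 0.0
        for v in row:
            acc += v
            out.append(acc)
        return out

    low, mod, high = cum(mass[0]), cum(mass[1]), cum(mass[2])
    best = (-1.0, None, None)
    for d1 in range(1, m):                   # d1 starts at 1, as in the text
        for d2 in range(d1 + 1, m + 1):
            fom = (low[d1] + (mod[d2] - mod[d1]) + (high[m] - high[d2])) / total
            if fom > best[0]:
                best = (fom, d1, d2)
    return best


fom, d1, d2 = best_rule(50)                  # perfect test, flat prior
print(f"d1 = {d1}, d2 = {d2}, FOM = {fom:.3f}")
```

For a perfect test and a flat prior with m = 50, the article quotes d1 = 9 and d2 = 24, which gives a convenient check on a sketch of this kind. Passing a different `sens` or `prior` shows how the cut-offs move.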

Varying vs. constant sensitivity

As sensitivity changes with prevalence, and we need to know the sensitivity to get an unbiased estimate of prevalence, the standard approach leads to (avoidable) complications. We can sidestep this issue by directly classifying a district and forgoing an estimate of the prevalence. In this section, we examine the difference in the decision rules using the current standard individualed method when we consider the sensitivity as fixed or varying with prevalence.

The model describing variation in egg counts in a population (de Vlas & Gryseels 1992; Mitchell & Pagano 2012) is specified by four parameters: h_{0}, the number of eggs output per smear per worm pair; r, the index of aggregation of the number of eggs output per smear for a given worm load; k, the index of aggregation in the distribution of worms in the population; and M, the mean worm load in the population. Small values of k indicate more aggregation and relative overdispersion, with the worm counts highly concentrated in a small section of the population. This situation arises in populations with a low level of immunity, where variation in exposure is not countered by the development of immunity. Such low levels of immunity are seen in younger age groups, where there are lower values of k (de Vlas et al. 1992). Another reason that there could be a high level of overdispersion in worm load (i.e. a low value for k) would be community variation in exposure to infection. Such heterogeneity could arise from a community being composed of a variety of occupations (de Vlas et al. 1992). Thus, when modelling the value of k in a community, one should consider age and homogeneity of exposure. One might then have a prior distribution on k and the prevalence p, which together can be used to compute a prior distribution for M (as prevalence p is a function of k and M).

For choosing values for h_{0} and r, we rely on information we know about the community. For example, in sub-Saharan Africa, the high-fibre diet consisting largely of cassava gives a value of h_{0} close to 0.0125. We also examine h_{0} = 0.035 and 0.085, values found by de Vlas et al. (1992) for some regions (note that the h in de Vlas et al. (1992) is the number of eggs per slide, so h = 4h_{0}, because we consider four smears per slide).

The value for r, describing the variability from smear to smear for the same individual, does not depend on the endemic situation (de Vlas et al. 1993), and we take r = 1.6, as found by de Vlas & Gryseels (1992). Thus, we can obtain values for h and r based on estimates obtained from similar (geographically, nutritionally) data sets, without assuming particular values of the prevalence. Next, we vary M and k to achieve the range of prevalences from 0% to 100%. As we are testing in very young age groups, where there is very little acquired immunity, we are interested in lower ranges of k (de Vlas et al. 1992). Later in our results, we examine both a narrower and a wider range of k for comparison. We want to include small values of k at small prevalences because prevalence decreases as k decreases (the worms are more concentrated in a smaller section of the population). Also, lower values of k are more reflective of younger age groups, which are the ages of the school children sampled. For the constant sensitivity, we take it to be the corresponding average sensitivity across the range of prevalences (indicated in Table 1, next to the corresponding h_{0} value). Appendix 1 gives details on the model used.

Table 1. Decision rules for m = 50 slides using the current standard individualed method, where we consider the sensitivity as fixed or varying with prevalence

| k | h_{0} (avg sens) | Cut-off | Constant sensitivity | Sensitivity varying with prevalence |
|---|---|---|---|---|
| 0.01–0.25 | 0.0125 (0.56) | d1 | 4 | 3 |
| 0.01–0.25 | 0.0125 (0.56) | d2 | 13 | 13 |
| 0.01–0.25 | 0.035 (0.69) | d1 | 6 | 4 |
| 0.01–0.25 | 0.035 (0.69) | d2 | 16 | 17 |
| 0.01–0.25 | 0.085 (0.79) | d1 | 7 | 6 |
| 0.01–0.25 | 0.085 (0.79) | d2 | 19 | 20 |
| 0.01–0.6 | 0.0125 (0.3) | d1 | 2 | 1 |
| 0.01–0.6 | 0.0125 (0.3) | d2 | 7 | 6 |
| 0.01–0.6 | 0.035 (0.47) | d1 | 4 | 2 |
| 0.01–0.6 | 0.035 (0.47) | d2 | 11 | 10 |
| 0.01–0.6 | 0.085 (0.64) | d1 | 5 | 4 |
| 0.01–0.6 | 0.085 (0.64) | d2 | 15 | 15 |

Pooled vs. individualed

We now compare the pooled and individualed methods, assuming the more complex model of sensitivity varying with prevalence. We use the FOM as a measure of how well the methods perform in terms of classification. To demonstrate the impact of the prior on choice of cut-offs, we can look at three possible choices that capture a range of possibilities: One is the flat prior, where we weight all prevalences as being equally likely (a non-informative prior). The prior Beta (10,2) weights 90% of the probability mass between 70% and 100% prevalences, and for the Beta (2,10), 90% of the mass is between 0% and 30% prevalences. These three models provide a broad coverage of priors.
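The prior masses quoted above can be checked without any statistics library. The sketch below uses the standard identity that, for positive integers a and b, the Beta(a, b) distribution function at x equals the probability that a Binomial(a + b - 1, x) variable is at least a. The helper name `beta_cdf_int` is hypothetical.

```python
from math import comb


def beta_cdf_int(x, a, b):
    """CDF of Beta(a, b) at x for positive integers a and b (binomial-tail identity)."""
    n = a + b - 1
    return sum(comb(n, j) * x**j * (1.0 - x) ** (n - j) for j in range(a, n + 1))


mass_high = 1.0 - beta_cdf_int(0.7, 10, 2)   # Beta(10,2) mass on prevalences 70% to 100%
mass_low = beta_cdf_int(0.3, 2, 10)          # Beta(2,10) mass on prevalences 0% to 30%
print(f"{mass_high:.3f} {mass_low:.3f}")     # both about 0.887
```

Both masses come out near 0.887, in line with the roughly 90% stated in the text, and they are equal by the symmetry between Beta(10,2) and Beta(2,10).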

Undertreatment

The current standard method of classification ignores the test’s low sensitivity, maximizing the FOM assuming a perfect test and a flat prior for prevalence. Above, we compute decision rules for 50 individualed slides based on two alternative realities: constant sensitivity not equal to one and sensitivity that increases with prevalence. In this section, we compare these three decision rules using field data.

Ignoring the varying sensitivity of the diagnostic test and instead assuming perfect sensitivity unfortunately results in undertreatment of S. mansoni. If schools that are truly in a higher WHO category of prevalence are mistakenly classified into a lower category, the actions taken are inappropriate and insufficient to eliminate the disease. In this section, we use real data to quantify the number of schools classified into each category according to three decision rules based on three possible realities: (1) the decision rules assuming a perfect test (d1 = 9, d2 = 24), (2) the decision rules assuming a constant sensitivity of 0.79, corresponding to h_{0} = 0.085 and a low value for k, (d1 = 7, d2 = 19) and, finally, (3) the decision rules assuming varying sensitivity, corresponding to h_{0} = 0.085 and low k, (d1 = 6, d2 = 20).

We used school-level data from schools in Uganda [296 schools, average prevalence of S. mansoni per school: 28% (Kabatereine et al. 2004)]; Tanzania [143 schools, average prevalence of S. mansoni per school: 4.4% (Clements et al. 2006)], Mali [454 schools, average prevalence of S. mansoni per school: 10% (Clements et al. 2009)] and Cameroon [402 schools, average prevalence of S. mansoni per school: 7.3% (Ratard et al. 1990)].

Using data from these four countries, we compiled a list of 1295 schools with the measured prevalences from the field. For the purpose of simulation, we took these prevalences to be truth (likely an underestimate, due to the poor sensitivity of the test). We then simulated testing these schools using 50 slides with sensitivity modelled according to each of the three possible realities. To each of these three results, we applied the three decision rules above.
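The simulation step can be sketched as follows. The school prevalences below are made-up placeholders and not the field data; the sensitivity function, the number of repetitions and the seed are likewise assumptions chosen for illustration. The rule d1 = 9, d2 = 24 is the perfect-test rule quoted in the text.

```python
import random


def classify(s, d1, d2):
    """Category implied by s positive slides: 0 = low, 1 = moderate, 2 = high."""
    return 0 if s <= d1 else (1 if s <= d2 else 2)


def true_category(p):
    """WHO category of a prevalence p (boundary handling is an assumption)."""
    return 0 if p < 0.2 else (1 if p < 0.5 else 2)


def simulate(prevalences, d1, d2, m=50, sens=lambda p: 1.0, reps=50, seed=0):
    """Average 3 by 3 table: rows are true categories, columns are assigned ones."""
    rng = random.Random(seed)
    table = [[0.0] * 3 for _ in range(3)]
    for _ in range(reps):
        for p in prevalences:
            q = sens(p) * p                                  # slide-positive probability
            s = sum(rng.random() < q for _ in range(m))      # positive slides out of m
            table[true_category(p)][classify(s, d1, d2)] += 1.0 / reps
    return table


# Hypothetical schools: 80 low, 15 moderate and 5 high prevalence.
schools = [0.05] * 80 + [0.30] * 15 + [0.70] * 5
table = simulate(schools, d1=9, d2=24)
errors = sum(table[i][j] for i in range(3) for j in range(3) if i != j)
for row in table:
    print([round(v, 2) for v in row])
print("average misclassified schools:", round(errors, 2))
```

Swapping in a different `sens` (for example a constant below one) while keeping the same rule mimics applying a rule built for one reality to data generated under another.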

Results

Varying vs. constant sensitivity

We see in Table 1 that when sensitivity increases with prevalence, the lower cut-off point (d1) decreases relative to the d1 under the constant sensitivity model, while the upper cut-off point (d2) stays quite similar (how much each moves depends on how sensitivity increases with prevalence relative to the mean sensitivity). The lower cut-off is most critical. If we believe sensitivity is lower for lower prevalences, we would require fewer positive slides in a lower prevalence area to be convinced to classify in the middle category. As it is doubtful that sensitivity is constant as a function of prevalence, if we rely on the constant sensitivity model, we will likely misclassify moderate regions/schools into the low category.

Pooled vs. individualed

To see how well the classification scheme performs, we turn to the operating characteristic (OC) curve, which completely specifies the classification procedure probabilistically. The probabilities of classifying as low, moderate or high are plotted against the prevalence. Figure 1a,b shows the OC curves for the decision rules corresponding to the flat prior with no penalty in the pooled method based on 25 slides and the individualed method based on 50 slides, respectively. Figure 1c shows the difference between the two. A thicker line in the plot for individualed (pooled) indicates that the probability of correct classification in the individualed (pooled) method is greater than or equal to the probability of correct classification in the pooled (individualed) method. We see that the pooled estimator does better than (or at least as well as) the individualed estimator in correctly classifying the moderate category and in correctly classifying the high category in the range of prevalences 55% to 100%. In the low category, the pooled estimator has a slightly lower probability of correctly classifying as low. Overall, the behaviour of decisions based on 25 pooled slides is quite comparable to that of decisions based on 50 individual slides, even doing better at times.
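An OC curve of this kind can be tabulated directly from the binomial distribution. The sketch below is illustrative only: it assumes independent slides with a constant slide-level sensitivity `sens` (a simplification of the varying-sensitivity model) and uses the perfect-test rule d1 = 9, d2 = 24 quoted earlier. The function name `oc_point` is hypothetical.

```python
from math import comb


def oc_point(p, m, d1, d2, sens=1.0):
    """Probabilities of classifying as (low, moderate, high) at prevalence p."""
    q = sens * p                                   # chance that one slide is positive
    pmf = [comb(m, s) * q**s * (1.0 - q) ** (m - s) for s in range(m + 1)]
    return sum(pmf[: d1 + 1]), sum(pmf[d1 + 1 : d2 + 1]), sum(pmf[d2 + 1 :])


# Tabulate the curve on a coarse grid of prevalences from 0% to 100%.
curve = [(i / 10, oc_point(i / 10, m=50, d1=9, d2=24)) for i in range(11)]
for p, (low, mod, high) in curve:
    print(f"p = {p:.1f}  low = {low:.3f}  moderate = {mod:.3f}  high = {high:.3f}")
```

Plotting the three columns against p gives the three curves of one panel; repeating the calculation with a pooled slide-positive probability and a pooled rule gives the comparison panel.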

Table 2 shows the decision rules and figures of merit for pooled vs. individualed methods at different levels of k (for h_{0} = 0.0125) and for different priors. We see that the pooled method with only 25 slides outperforms the individualed method with 50 slides for low values of k. For higher values of k, the pooled method with 25 slides does almost as well as the individualed with 50. In all cases, the pooled method with 50 slides does best as measured by the FOM (the probability of correct classification under a given prior).

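Table 2. Decision rules and FOM for the pooled and individualed methods, assuming the more complex model of sensitivity varying with prevalence, for h_{0} = 0.0125 and both high and low levels of k (the subscript p denotes the pooled method)

| k | Slides | Quantity | Prior: Beta(2,10) | Prior: Uniform | Prior: Beta(10,2) |
|---|---|---|---|---|---|
| 0.01–0.25 | m_{p} = 50 | d1_{p} | 6 | 6 | 1 |
| 0.01–0.25 | m_{p} = 50 | d2_{p} | 28 | 26 | 24 |
| 0.01–0.25 | m_{p} = 50 | FOM_{p} | 89.6% | 92.8% | 99.7% |
| 0.01–0.25 | m = 50 | d1 | 4 | 3 | 1 |
| 0.01–0.25 | m = 50 | d2 | 17 | 13 | 10 |
| 0.01–0.25 | m = 50 | FOM | 85.0% | 88.7% | 99.5% |
| 0.01–0.25 | m_{p} = 25 | d1_{p} | 3 | 2 | 1 |
| 0.01–0.25 | m_{p} = 25 | d2_{p} | 15 | 13 | 10 |
| 0.01–0.25 | m_{p} = 25 | FOM_{p} | 86.2% | 89.9% | 99.6% |
| 0.01–0.6 | m_{p} = 50 | d1_{p} | 2 | 1 | 1 |
| 0.01–0.6 | m_{p} = 50 | d2_{p} | 11 | 7 | 4 |
| 0.01–0.6 | m_{p} = 50 | FOM_{p} | 80.9% | 85.1% | 99.5% |
| 0.01–0.6 | m = 50 | d1 | 2 | 1 | 1 |
| 0.01–0.6 | m = 50 | d2 | 10 | 6 | 2 |
| 0.01–0.6 | m = 50 | FOM | 78.9% | 82.2% | 99.4% |
| 0.01–0.6 | m_{p} = 25 | d1_{p} | 1 | 1 | 1 |
| 0.01–0.6 | m_{p} = 25 | d2_{p} | 7 | 4 | 2 |
| 0.01–0.6 | m_{p} = 25 | FOM_{p} | 77.3% | 77.2% | 99.0% |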

Undertreatment

Table 3 shows how the schools were classified into low, moderate and high categories of prevalence (the columns of each 3 by 3 subtable) vs. the actual prevalence categories (the rows of each 3 by 3 subtable). Note that the rows sum to the same numbers in each box, representing the actual prevalence category totals.

Table 3. Comparison of misclassification into the WHO categories using decision rules based on three possible realities: (1) the decision rules assuming a perfect test (d1 = 9, d2 = 24), (2) the decision rules assuming a constant sensitivity of 0.79, corresponding to h_{0} = 0.085 and a low value for k (d1 = 7, d2 = 19) and, finally, (3) the decision rules assuming varying sensitivity, corresponding to h_{0} = 0.085 and low k (d1 = 6, d2 = 20). For each combination, we report the total errors, that is, the sum of the off-diagonal counts

Rules used to classify: Rule 1 assumes a perfect test; Rule 2 assumes constant sensitivity; Rule 3 assumes varying sensitivity. Within each rule, the columns give the category assigned (Low, Med, High).

| Reality | True category | Rule 1: Low | Rule 1: Med | Rule 1: High | Rule 2: Low | Rule 2: Med | Rule 2: High | Rule 3: Low | Rule 3: Med | Rule 3: High |
|---|---|---|---|---|---|---|---|---|---|---|
| 1: Perfect test | Low | 1019.20 | 16.80 | 0.00 | 996.78 | 39.20 | 0.02 | 980.90 | 55.10 | 0.00 |
| 1: Perfect test | Med | 9.66 | 102.04 | 10.30 | 3.22 | 79.30 | 39.48 | 1.66 | 88.04 | 32.30 |
| 1: Perfect test | High | 0.00 | 6.24 | 130.76 | 0.00 | 0.46 | 136.54 | 0.00 | 0.74 | 136.26 |
| 1: Perfect test | Errors | 43 | | | 82.38 | | | 89.8 | | |
| 2: Constant sensitivity | Low | 1030.12 | 5.88 | 0.00 | 1016.66 | 19.34 | 0.00 | 1004.72 | 31.28 | 0.00 |
| 2: Constant sensitivity | Med | 27.22 | 93.76 | 1.02 | 12.38 | 97.78 | 11.84 | 6.82 | 107.06 | 8.12 |
| 2: Constant sensitivity | High | 0.04 | 38.54 | 98.42 | 0.00 | 9.28 | 127.72 | 0.00 | 13.42 | 123.58 |
| 2: Constant sensitivity | Errors | 72.7 | | | 52.84 | | | 59.64 | | |
| 3: Varying sensitivity | Low | 1033.04 | 2.96 | 0.00 | 1024.82 | 11.18 | 0.00 | 1016.44 | 19.56 | 0.00 |
| 3: Varying sensitivity | Med | 32.94 | 88.04 | 1.02 | 15.82 | 94.18 | 12.00 | 10.08 | 103.76 | 8.16 |
| 3: Varying sensitivity | High | 0.00 | 24.76 | 112.24 | 0.00 | 5.58 | 131.42 | 0.00 | 7.90 | 129.10 |
| 3: Varying sensitivity | Errors | 61.68 | | | 44.58 | | | 45.7 | | |

The numbers represent averages over 50 simulations. At the bottom of each 3 by 3 subtable, we list the average number of total misclassifications (a sum of the subtable's off-diagonal elements). The rows of the table indicate the sensitivity model used to simulate test results, while the columns indicate which rule was used for classification. Rule 1 corresponds to the rule that assumes reality 1 to model the sensitivity, and likewise for Rules 2 and 3.

Note that because we chose decision rules that optimize the FOM with a uniform prior, it does not necessarily hold that the fewest errors of classification will occur with the rules developed for the corresponding reality. Note that in reality 3, which is most likely closest to the way the test actually works, Rule 3 is more conservative at the lower boundary, making far fewer mistakes, while Rule 2 is slightly more conservative at the upper boundary but misclassifies many moderate regions into low (on average, 15.82 truly moderate schools are misclassified into low when using Rule 2, as opposed to only 10.08 on average when using Rule 3).

Discussion

As seen previously, the decision rules for classification change depending on whether sensitivity increases with prevalence. Thus, it is of primary importance to use this relationship in order to choose the most appropriate decision rules in either the pooled or individualed methods. Once we decide upon a model for how sensitivity increases with prevalence, we see that the pooled method offers comparable accuracy for half the number of slides read. Thus, if the laboratory technicians are trained to form pooled slides with the same ease as individualed slides, the halving of reading time will greatly increase the efficiency of monitoring the prevalence of S. mansoni. Furthermore, incorporating a more realistic sensitivity model into the decision rules protects against the limitations of the Kato-Katz method of detection, the current standard.

Halving the slide reading time (by halving the number of slides used) nearly doubles the efficiency of classification, even though the pooled method involves collection of faecal samples from twice as many children (100 rather than 50 to make the 25 pooled slides). The WHO has a protocol for collecting faecal samples from schools (Albonico et al. 2006). Asking more children to provide samples is not a serious practical limitation, as instructions are given to the group, and sample containers are cheap. Collectors of the samples need not spend significantly more time, even with double the children contributing samples. Laboratory technicians already take four smears per sample to create a slide. Thus, taking these four from different specimens does not present additional practical difficulty.

Although individual testing does provide the opportunity for selective treatment, the current work focuses on increasing the efficiency of the epidemiological survey used to rapidly assess the community, as per the WHO protocol. The results of these individual tests are only used to assess the community and provide treatment at the community level. Separately from the survey, patients who exhibit symptoms may visit the clinics and obtain an individual test.

After a long period of treatment and control in a region, prevalence should decrease, and with it the test sensitivity (as per the De Vlas model). Our methods account for this sensitivity reduction, so that even in areas of lower prevalence, the assumed sensitivities used for assessment are realistic.

More in-depth analyses could investigate whether the efficiency and accuracy further improve if using stool from more than four children. The methodology presented does not have theoretical limitations and is limited only by practical constraints. We have provided the theoretical foundation to investigate further. We chose to focus on four for the purposes of this article because our field experts advised us that dividing a slide by four was feasible, and dividing more would be difficult because it would involve smaller smears with insufficient faecal matter in which to detect eggs. Similarly, the theoretical foundation presented can easily be extended to more than three categories or redefinition of cut-off values. For example, if it becomes of interest which areas have no infection at all, we may also define a very low category.

Misclassifying a population is problematic because wrong classification results in inappropriate community treatment. An area classified into a category that is too low does not receive the treatment necessary to control the disease. An area classified into a category too high may receive drugs unnecessarily, which could result in adverse reactions, and excessive administration of drugs may lead to drug resistance (Albonico et al. 2006).

Accuracy of the prevalence estimation is all important. We have shown in this article that among children, it is more efficient and more accurate to use the pooled method for classification of S. mansoni prevalence than the current standard method.

Acknowledgements

The authors gratefully acknowledge the help given by Dr Wendi Bailey, Liverpool School of Tropical Medicine, in the laboratory aspects of this manuscript, Dr Sake J De Vlas, Erasmus University Rotterdam, for help in the modelling aspects of the manuscript, and Professor Simon Brooker, London School of Hygiene and Tropical Medicine, for kindly giving us the data. The project described was supported by Awards Number T32AI007358, R56EB006195 and U54GM088558 from the National Institutes of Health. The content is solely the responsibility of the authors and does not necessarily represent the official views of the National Institutes of Health.

Appendix 1

We define the true prevalence as the proportion of individuals with at least one worm-pair capable of producing eggs. The data consist of positive (eggs found) or negative (eggs not found) readings of faecal smear slides. We need to understand the relationship between the true prevalence and probability that eggs are found in the lab testing procedure (the sensitivity). Thus, it is necessary to model how the number of eggs per faecal sample varies from person to person and from day to day.

To this end, let P(Y = y; h_{0}x, r) be the probability of finding y eggs in a stool smear (approximately 12 mg of faecal matter) from a person with x worm-pairs. Note that the distribution of Y incorporates the variability in egg output in the stool, the variability in the number of eggs captured in the smear, and the variability in what the lab technician can actually count. De Vlas considers the negative binomial model where Y ∼ NegBin(h_{0}x, r) with mean h_{0}x and index of aggregation r (de Vlas et al. 1992). As r (the index of aggregation in the distribution of egg counts) increases, the variance decreases. The parameter r can also account for imprecision in the measurement of 12 mg of faecal matter per smear. For example, if one smear is actually 16 mg, and another is 8 mg, the difference in egg counts from smear to smear is more variable, which can be built into the model with a smaller value for r. We found in simulations that the relative benefit of pooling (compared to the individualed method) does not seem to change much when varying r (Mitchell & Pagano 2012).

If n_{m} and n_{f} represent the number of male and female worms, respectively (n_{m} + n_{f} = n), then x = min(n_{m}, n_{f}) is the number of worm-pairs. De Vlas considers n_{m} and n_{f} ∼ Bin(n, 1/2). Let P(X = x | N = n) be the probability of having x worm-pairs for an individual with worm load n. Let P(N = n; M, k) be the probability of having n worms, where N ∼ NegBin(M, k) has mean M and index of aggregation k. As k (the index of aggregation in the distribution of worms in the population) increases, the variance decreases. The overall distribution of the number of eggs per smear (y) is then obtained by summing over worm loads and worm-pairs, P(Y = y; M, k, h_{0}, r) = \sum_{n} \sum_{x} P(Y = y; h_{0}x, r) P(X = x | N = n) P(N = n; M, k) [see the appendix in (de Vlas et al. 1992)].

Let c be the number of smears per slide. In the literature, c is referred to as the composite sample size, or the pool size. We focus on the case of c = 4 due to common field practice, but our derivations are kept general and our simulation program can accommodate other values of c. To describe collecting c = 4 smears from 1 person on the same day we refer to h_{0} as the number of eggs per smear per worm-pair, so h = ch_{0} is the number of eggs per slide (i.e. sample) per worm-pair. We define prevalence as the probability of having at least one worm-pair, so p = P[X > 0] [see (de Vlas et al. 1993) Box 1.] We define the c-smear sensitivity as the sensitivity when using c smears from the same individual.
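The definitions above are enough to compute the prevalence and a single-smear sensitivity numerically. The sketch below is an illustration under stated assumptions and not the authors' simulation program: it uses the usual mean and index-of-aggregation form of the negative binomial, truncates the worm load at a hypothetical `n_max`, and reports the sensitivity for one smear, P(Y > 0 | X > 0), which is not the same as the c-smear sensitivity. The values M = 1 and M = 10 with k = 0.25 are chosen only for illustration.

```python
from math import exp, lgamma, log


def negbin_pmf(n, mean, k):
    """P(N = n) for a negative binomial with the given mean and index of aggregation k."""
    return exp(lgamma(k + n) - lgamma(k) - lgamma(n + 1)
               + k * log(k / (k + mean)) + n * log(mean / (k + mean)))


def pairs_pmf(x, n):
    """P(X = x | N = n) with X = min(males, females) and males ~ Bin(n, 1/2)."""
    if 2 * x > n:
        return 0.0
    log_p = lgamma(n + 1) - lgamma(x + 1) - lgamma(n - x + 1) - n * log(2.0)
    return exp(log_p) * (2.0 if n != 2 * x else 1.0)   # doubled unless n = 2x


def prevalence_and_sensitivity(M, k, h0, r, n_max=800):
    """Return (P(X > 0), P(Y > 0 | X > 0)) for one smear, truncating N at n_max."""
    p_infected = 0.0    # P(X > 0)
    p_detected = 0.0    # P(X > 0 and Y > 0)
    for n in range(2, n_max + 1):                      # at least two worms make a pair
        pn = negbin_pmf(n, M, k)
        for x in range(1, n // 2 + 1):
            pxn = pn * pairs_pmf(x, n)
            p_infected += pxn
            # For Y ~ NegBin(h0 * x, r), P(Y = 0) = (1 + h0 * x / r) ** (-r).
            p_detected += pxn * (1.0 - (1.0 + h0 * x / r) ** (-r))
    return p_infected, p_detected / p_infected


p1, s1 = prevalence_and_sensitivity(M=1.0, k=0.25, h0=0.0125, r=1.6)
p2, s2 = prevalence_and_sensitivity(M=10.0, k=0.25, h0=0.0125, r=1.6)
print(f"M = 1:  prevalence = {p1:.3f}, single-smear sensitivity = {s1:.3f}")
print(f"M = 10: prevalence = {p2:.3f}, single-smear sensitivity = {s2:.3f}")
```

Raising the mean worm load M at fixed k increases both quantities, which is the link between prevalence and sensitivity that the decision rules exploit.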

Note that h = ch_{0} is the number of eggs per sample (i.e. slide) per worm-pair

which does not depend on c. Γ(.) represents the gamma function, a component of various probability distributions, defined as Γ(z) = \int_{0}^{\infty} t^{z-1} e^{-t} dt for any z > 0. I(n ≠ 2x) denotes the indicator function that equals 1 if n ≠ 2x, and 0 otherwise.