The impact of diarrhoea measurement methods for under 5s in low‐ and middle‐income countries on estimated diarrhoea rates at the population level: A systematic review and meta‐analysis of methodological and primary empirical studies

Abstract
Objective: We systematically reviewed all studies published between 2000 and June 2021 that estimated under 5 diarrhoea rates in low‐ and middle‐income countries and extracted data on diarrhoea rates, measurement methods and reactivity.
Methods: We summarised data from studies that performed direct comparisons of methods, and indirectly compared studies which utilised only one method using meta‐regression to determine the association between methods and estimated diarrhoea rates.
Results: In total, 288 studies met our inclusion criteria: 4 direct comparisons and 284 studies utilising only one measurement method. Meta‐regression across all studies showed that diarrhoea rates were sensitive to method of measurement. We estimated that passive surveillance methods were associated with a 97% lower estimated rate than active surveillance (IRR = 0.03, 95% CI [0.02, 0.06]). Among active surveillance studies, a doubling of recall period was associated with a 48% lower rate (IRR = 0.52 [0.46, 0.60]), while decreased questioning frequency was associated with a higher estimated rate: at the extreme, one‐time questioning yielded an over 4× higher rate than daily questioning (IRR = 4.22 [2.73, 6.52]).
Conclusions: Estimated diarrhoea rates are sensitive to their measurement methods. There is a need for a standardisation of diarrhoea measurement methods, and for the use of other outcomes in the measurement of population‐level gastrointestinal health.

thereby obscuring the effects of the explanatory variable(s) of interest. Since diarrhoea is one of the biggest causes of death in children, this is a methodological point of considerable practical importance for surveillance, evaluation of interventions and establishing the true burden of diarrhoea-associated morbidity and mortality. The two most common methods of diarrhoea surveillance at the population level, passive and active surveillance, take different approaches. Passive surveillance relies on data collected from health facilities and therefore excludes all children with diarrhoea who do not attend a facility. Passive surveillance estimates are therefore skewed towards more severe disease and away from marginalised groups such as slum dwellers, refugees and migrants, who are less likely to visit health facilities and who are more likely to visit informal facilities than non-marginalised groups [4][5][6][7][8][9]. Passive surveillance is a useful, inexpensive tool to detect new outbreaks of severe diseases such as cholera, but is arguably less useful as an epidemiological tool for the measurement of population-level diarrhoea rates or in trials of WASH interventions.
Active surveillance, based on door-to-door surveys, provides a more complete report of diarrhoea rates than passive surveillance, but may also be subject to measurement error and bias [4]. Carers may forget events that happened in the past, particularly over longer recall periods [10,11]. They may also have a poor understanding of what diarrhoea is [12]. UNICEF and the Demographic and Health Surveys (DHS) programme have recommended a method based on asking carers if their child has had three or more loose or watery stools in any 24-h period within the previous 14 days [13]. However, this method is by no means universally applied. In addition to concerns over measurement error, concern has been expressed that diarrhoea rates may be subject to bias due to 'reactivity', where, irrespective of any true clinical difference, people report different diarrhoea rates for psychological reasons. For example, they may be the beneficiaries of an intervention and not want to appear ungrateful; they may not want to answer in a way that is socially undesirable; or they may be guided subliminally or non-verbally by the surveyors (e.g. if the question is asked in a way which steers the respondents towards a certain answer) [14,15]. As such, reactivity creates a particular concern for evaluations of WASH interventions.
In order to examine the association between the method used in diarrhoea measurement and diarrhoea rates, we conducted a systematic review of all studies published between 2000 and June 2021 that report under five diarrhoea rates in low- and middle-income countries (LMICs). We examined studies that perform direct 'head-to-head' comparisons of different methods in order to estimate differences in diarrhoea rates by method. However, we found only four such studies. We therefore obtained studies that used only one measurement method so that we could compare the estimated diarrhoea rates of each method across studies indirectly by means of meta-analytical methods [16,17].
The aims of the systematic review and meta-analysis were to determine: (1) the frequency of the use of the different diarrhoea measurement methods; (2) the association between passive and active surveillance methods and estimated diarrhoea rates; (3) the association between recall periods, questioning frequencies and prospective (diary) versus retrospective recall on estimated diarrhoea rates among active surveillance studies; and (4) the extent of reactivity in diarrhoea measurement.

Search strategy and selection criteria
We conducted a systematic review of studies published between 2000 and June 2021 that made quantitative measurements of diarrhoea rates among children under the age of 5 in LMICs (as defined by OECD) [18]. The search strategy aimed to capture any study that estimated diarrhoea rates among under 5s, including both studies that performed direct 'head-to-head' comparisons of methods, and studies estimating diarrhoea rates using only one method that we could compare indirectly. As we were only interested in population-level diarrhoea rates, we excluded studies that were not designed to capture them, such as studies which measured hospital-acquired infection, clinical trials in which diarrhoea was an adverse drug event and case-control studies in which diarrhoea was the case. Importantly, these exclusions do not include WASH trials which took place in the community setting. Studies were restricted to English or French (Table 1).
We searched the MEDLINE, Embase and PubMed databases for studies matching the inclusion and exclusion criteria. The search string (Appendix 1) included keywords relating to diarrhoea and population-level disease measurement. The string then restricted the results to human studies and studies in LMICs.
SW and RR independently screened each title and abstract, and any disagreements were resolved through discussion with RL. Full texts were screened in a similar manner. Where full texts were not available, we requested the article from the University of Warwick's Article Reach Service. We excluded unavailable studies and duplicate studies. In the event that multiple studies used the same data source, we selected a random study for inclusion.

Data extraction
RR extracted the data, with SW duplicating data extraction for a random 15% of the included studies. The random 15% extracted by SW matched the extraction by RR completely. As a single study may have included several independent reports of diarrhoea rates, we treated each report as a separate 'observation' within that study. This applied to all the direct comparison studies, and also arose in indirect studies that were conducted in more than one site, included multiple rounds of data collection and/or were trials with multiple arms. For example, a two-armed trial which estimated diarrhoea rates at baseline and end-line yielded four observations. Data extracted (Table 2) included participant demographics; study design (an observational study using primary data, including non-randomised trials; an observational study using secondary data; or a randomised controlled trial); diarrhoea rates; measurement methods; and whether the study was a direct comparison of methods. We defined direct comparison studies as those which included at least two separate arms, each with a different method of diarrhoea measurement (including altering recall period or questioning frequency), and which compared estimated diarrhoea rates between the arms.
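The counting rule for observations can be illustrated with a minimal sketch. The `StudyDesign` class and `count_observations` function are hypothetical names introduced here for illustration, and the sketch assumes that every arm is measured in every round at every site, which will not hold for all studies.

```python
from dataclasses import dataclass


@dataclass
class StudyDesign:
    """Hypothetical summary of how one study reports diarrhoea rates."""
    arms: int = 1    # trial arms (1 for a study that is not a trial)
    rounds: int = 1  # rounds of data collection, e.g. baseline and end-line
    sites: int = 1   # sites with separately reported rates


def count_observations(design: StudyDesign) -> int:
    # Assumption: each arm contributes one rate per round per site.
    return design.arms * design.rounds * design.sites


# The example from the text: two arms, measured at baseline and end-line.
two_arm_trial = StudyDesign(arms=2, rounds=2)
print(count_observations(two_arm_trial))  # 4
```

Under this assumption a single-site cross-sectional survey yields one observation, which is why the number of observations exceeds the number of studies.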

Data analysis
We indirectly compared the diarrhoea rates from included observations, including those from studies that performed direct comparisons of measurement methods, and studies using only a single method of measurement. We summarised the key variables in each individual observation, including estimated diarrhoea rates, measurement type, year and region (as defined by UNDP). We estimated a hierarchical meta-regression model for log diarrhoea rate (incidence in episodes per child year), adjusting for region and time (in years) and interactions between time and region. We also adjusted for study design, including a categorical variable with levels: (1) observational studies which use primary data sources; (2) observational studies which use secondary data sources; (3) RCT intervention arms before the intervention, or the control arm (if reported) and (4) RCT experimental arm after the intervention. The model was estimated in StataSE Version 15 using generalised least squares with random effects at the study level, to account for within-study correlation, and weighting by the study sample size [19,20].
We estimated two separate meta-regression models. The first iteration included the observations from both passive and active surveillance studies to estimate the effect of surveillance types (passive/active) on estimated diarrhoea rates, and as such included a dummy variable for surveillance type. We also estimated pooled temporal and regional trends from this model. The second iteration included only observations from active surveillance studies to examine the effects of variables exclusive to active surveillance studies on estimated diarrhoea rates. These included variables for recall period (as a continuous numeric term in days), and questioning frequency and recall type (both as categorical variables). Furthermore, any other variations found between active surveillance studies, including reactivity and questioning type (e.g. verbal or pictorial), were included (Table 3).
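As a reading aid, the meta-regression described above can be written out in symbols. This is a sketch only: the notation ($r_{ij}$, $x_{ij}$, $z_{ij}$, $u_j$) is assumed here for illustration and does not come from the original analysis code.

```latex
\begin{align}
\log r_{ij} &= \beta_0 + x_{ij}^{\top}\beta + z_{ij}^{\top}\gamma + u_j + \varepsilon_{ij},
  \qquad u_j \sim N(0, \tau^2), \\
\mathrm{IRR}_k &= \exp(\beta_k).
\end{align}
```

Here $r_{ij}$ is the diarrhoea rate (episodes per child year) for observation $i$ in study $j$; $x_{ij}$ holds the measurement-method variables (surveillance type in the first model; recall period, questioning frequency and recall type in the second); $z_{ij}$ holds the adjustment terms (region, time, their interaction and study design); and $u_j$ is the study-level random effect. Because the outcome is a log rate, exponentiating a method coefficient gives a rate ratio relative to the reference category.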

Study identification
We identified 2040 studies in total, which was reduced to 1973 after duplicates were removed. Abstract and title screening yielded 577 studies, with a further 289 excluded after full-text review. Common reasons for exclusion included not presenting data on under 5s (n = 64), not including data on diarrhoea rates (n = 58), and not being able to obtain the full text (n = 4). Thus, 288 full-text studies (Appendix 2) were included in the final review (Figure 1).
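The study flow can be checked with simple arithmetic. The sketch below uses only the counts reported in the paragraph above; the variable names are illustrative.

```python
# Counts reported in the study identification paragraph.
identified = 2040
after_deduplication = 1973
after_title_abstract_screening = 577
excluded_at_full_text = 289

# Derived quantities at each stage of the flow.
duplicates_removed = identified - after_deduplication
excluded_at_screening = after_deduplication - after_title_abstract_screening
included = after_title_abstract_screening - excluded_at_full_text

print(duplicates_removed, excluded_at_screening, included)  # 67 1396 288
```

The final figure agrees with the 288 studies reported as included.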

Study characteristics
We first describe the characteristics of the included studies, such as observations per study, data collection method and geography. As stated, many studies included more than one observation of diarrhoea rate, arising from observations at different time points, being a trial with two or more arms, or data collection in more than one location. Appendix 3 presents information on the number of observations per study. We identified 671 separate observations of population-level diarrhoea rates, and these constitute our denominator. In total, there were 646 (96%) active surveillance observations and 25 (4%) passive surveillance observations. Of the 646 active surveillance observations, 633 (94%) were retrospective, while 13 (2%) were prospective (a diary kept by the carer). Of the observations, 354 (56%) used a 14-day recall period, as recommended by UNICEF and the Demographic and Health Surveys (DHS) programme. Of the 188 observations which came from randomised controlled trials (RCTs), 53 (28%) used a 14-day recall period and 95 (51%) used a 7-day recall period. Furthermore, of the observations from RCTs, 21 (12%) questioned daily, 56 (30%) questioned weekly and 49 (26%) questioned biweekly.
None of the included active surveillance observations used non-verbal methods of diarrhoea measurement (e.g. showing carers pictures of stool), and no studies made mention of a 'gold standard' of diarrhoea measurement.
Four of the included studies performed direct head-to-head comparisons of diarrhoea measurement methods: three examining the effect of differing recall periods on diarrhoea rates, and one examining the effect of questioning frequency on estimated diarrhoea rates. No studies were identified that analysed reactivity in diarrhoea measurement.

Differences between active and passive surveillance
As stated in the Introduction, the two main categories of disease surveillance are active surveillance (community surveying) and passive surveillance (measures of visits to health facilities). No studies performed direct 'head-to-head' comparisons of active and passive surveillance. After model-based adjustment to perform an indirect comparison of passive and active surveillance, passive surveillance was associated with a 97% lower estimated diarrhoea rate than active surveillance (incidence rate ratio [IRR] = 0.03, 95% CI [0.02, 0.06]) (Table 4).
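The step from an IRR to a percentage difference is a one-line conversion, sketched below with the reported estimate and confidence limits. The function name `irr_to_percent_change` is an assumption introduced for illustration.

```python
def irr_to_percent_change(irr: float) -> float:
    """Percentage difference in rate relative to the reference category."""
    return (irr - 1.0) * 100.0


# Reported passive versus active surveillance estimate and 95% CI.
for label, irr in [("estimate", 0.03), ("lower limit", 0.02), ("upper limit", 0.06)]:
    print(f"{label}: IRR = {irr:.2f}, change = {irr_to_percent_change(irr):.0f}%")
```

An IRR of 0.03 therefore corresponds to a 97% lower rate, and the confidence limits to rates 98% and 94% lower. The same conversion applied to an IRR above 1, such as 4.22, gives a rate roughly 322% higher, that is, just over four times the reference rate.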

Impacts of factors within active surveillance studies on estimated diarrhoea rates
Specific to active surveillance studies, questioning frequency (how often participants are surveyed), recall period of diarrhoea surveying and retrospective versus prospective questioning may differ.

Questioning frequency
One study directly examined the effect of differing questioning frequencies on estimated diarrhoea rates; Zwane et al. estimated that biweekly surveys had a 7-15% lower diarrhoea rate than six-monthly surveys when using the same recall period [21]. Our indirect comparison of active surveillance observations produced comparable results; after model-based adjustment, we found that less frequent questioning was associated with an increase in estimated diarrhoea rates. For example, one-time questioning was associated with a rate over four times higher than daily questioning (IRR = 4.22 [2.73, 6.52]) (Table 4). This association is not evident graphically in the unadjusted crude data, although a large amount of variance can still be seen as questioning frequency increases (Figure 2a).

Recall period
Three studies directly examined the effect of differing recall periods on estimated diarrhoea rates, though the recall periods examined were different. Melo et al. found that diarrhoea rates were cut by a third when carers recall over 4 weeks versus 24 h [22]. Feikin et al. similarly estimated that diarrhoea rates were cut by a fifth when carers recall over 11-13 days versus 1-2 days [10]. Lee et al. found that estimated diarrhoea rates were similar for carers who recalled over a 72-h period versus a 24-h period [23], but this is a much shorter range than that investigated in the other two studies.
Based on an indirect comparison of the included active surveillance observations, we estimated that recall periods and estimated diarrhoea rates were inversely associated. After model-based adjustment, we found that a doubling of recall period was associated with a 48% reduction in diarrhoea rate (IRR = 0.52 [0.46, 0.60]; Table 4). This is also evident graphically in the crude data (Figure 2b).
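To show what a per-doubling IRR implies for two specific recall periods, the sketch below scales the reported estimate by the number of doublings between them. It assumes that the log-linear relationship holds across the whole range of recall periods, which the review does not establish; the function name `implied_rate_ratio` is introduced here for illustration.

```python
import math

IRR_PER_DOUBLING = 0.52  # reported estimate for a doubling of recall period


def implied_rate_ratio(recall_from_days: float, recall_to_days: float,
                       irr_per_doubling: float = IRR_PER_DOUBLING) -> float:
    """Rate ratio implied by moving between two recall periods.

    Assumes the same ratio applies to every doubling (a log-linear effect).
    """
    doublings = math.log2(recall_to_days / recall_from_days)
    return irr_per_doubling ** doublings


print(round(implied_rate_ratio(7, 14), 2))  # one doubling: 0.52
print(round(implied_rate_ratio(7, 28), 2))  # two doublings: 0.27
```

Under this assumption, moving from a 7-day to a 14-day recall period roughly halves the estimated rate, which is why studies using different recall periods are hard to compare directly.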

Prospective versus retrospective
No studies directly compared prospective (diary) recall designs against retrospective. We estimated through indirect  Table 4).

Main findings
We provide evidence that estimated under 5 diarrhoea rates are sensitive to the methods used in their measurement. This includes variance introduced by the choice of passive or active surveillance, as well as factors specific to active surveillance. Passive surveillance methods were associated with 97% lower diarrhoea rates than active surveillance methods. The most probable explanation is that carers do not seek health care for the majority of cases where, if asked, they would report diarrhoea. While not shown in our results, several studies on access to health care among infants in LMICs show that the propensity of carers to seek care for their under 5s with diarrhoea is influenced by diarrhoea severity, socioeconomic or legal status and other demographic characteristics [5][6][7][8][9].
Regarding active surveillance methods, we found that different questioning frequencies influence estimated diarrhoea rates. There was a trend towards lower estimated diarrhoea rates at higher questioning frequencies: one-off questioning was associated with an over four times higher estimated diarrhoea rate than daily questioning. We also found that differing recall periods were associated with a change in estimated diarrhoea rates: a doubling of recall period was associated with an approximate halving of the estimated diarrhoea rate.

Factors that result in subjective measurements during active surveillance
As the distinction between and recall of a diarrhoeal or non-diarrhoeal stool by carers is largely subjective, several cognitive factors can affect measurement. These include respondent fatigue (becoming tired of answering questions), recall bias (forgetting events that have occurred in the past), perception bias (not understanding the question being asked) and reactivity (answering differently due to experiencing an intervention).

Respondent fatigue
Declining diarrhoea rates with increasing questioning frequency (but the same recall period) suggest respondent fatigue: participants may be inclined to pay attention to their bowel movements at first, but lose motivation with further rounds of questioning.

Recall bias
Recall bias is the effect of forgetting: participants are more likely to recall recent than older events. We would expect a lower reported number of diarrhoea episodes with longer recall periods, and this was borne out by our analysis, including two of the three head-to-head comparisons [10,22]; the exception examined a much smaller gap between recall periods than the other two. This finding was corroborated by a more recent head-to-head comparison where daily recall was associated with a 30 percentage point higher estimated diarrhoea rate than fortnightly recall during a text message survey of under 5 diarrhoea in urban Tanzania [24]. While not examined in our review, it has also been reported that the effect of recall bias is more apparent for moderate diarrhoea compared to severe diarrhoea. Zafar et al., for example, found that moderate diarrhoea is reported at half the rate of severe diarrhoea during longer recall periods [11].

Other factors affecting diarrhoea measurement
Other factors outside the scope of this review can further influence estimated diarrhoea rates. For example, poor caregiver perception of diarrhoea (understanding what is or is not diarrhoea) can result in error in diarrhoea measurement. Voskuijl et al. determined that carers of children under 5 were only able to identify 56-75% of loose or watery stools and 80% of healthy stools [12]. Another relevant phenomenon is 'reactivity', whereby participants adjust their answers to a survey according to how they believe they ought to respond, regardless of any true underlying difference. We did not identify any studies of reactivity in this review, but it has been discussed as a potential explanatory factor in previous trials. Luby et al. mentioned courtesy bias as a potential source of reactivity, stating 'people who received the intervention might have been grateful and, out of courtesy, reported less diarrhoea' [14]. Wood et al. further found evidence for reactivity in clinical trials for various diseases, reporting that inadequate concealment of interventions is associated with improved treatment performance in trials, particularly for subjective outcomes [25]. It was not possible to examine reactivity through courtesy bias or inadequate concealment in our review as WASH trials by nature are unblinded.

CONCLUSION
The magnitude of the variation in diarrhoea rates, even among active methods of surveillance, suggests the need for standardisation of diarrhoea measurement methods to facilitate comparisons between studies. Despite the 14-day UNICEF and DHS standard, there does seem to be a trend towards using a 7-day recall period: it was the most frequently used recall period (51%) among RCTs in our review. All three recent large integrated WASH trials (the WASH Benefits trials in Bangladesh and Kenya, and the SHINE trial in Zimbabwe) also used a 7-day retrospective recall to measure diarrhoea [26][27][28][29], in contravention of the UNICEF and DHS guidelines. However, these three trials differed among themselves with respect to questioning frequency: the WASH Benefits trials questioned carers annually, while the SHINE trial questioned mothers every '2 to 6' months [27][28][29].
It could be argued that lack of standardisation simply introduces measurement error in trials that can be counteracted by increasing sample size. However, this is only likely to be true if it is assumed that the different methods affect only the propensity of someone with a true case of diarrhoea (or indeed enteric infection) to report a case of diarrhoea (the 'sensitivity' of the method) [30]. If, however, there is a loss of 'specificity' -the propensity of someone who did not have diarrhoea (or enteric infection) to report a case of diarrhoea -then intervention effects will also be biased across studies using different methods. It is also likely that any measurement errors will bias results towards the null, rather than towards reports of intervention effectiveness [30]. It is therefore possible that the choice of methodology is at least partially responsible for the widely varying, and often disappointing, results of evaluations on WASH interventions [14,27,29,31,32].
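The distinction between the two kinds of error can be made concrete with a small deterministic sketch. All numbers are hypothetical and chosen only for illustration; the sketch uses the conventional definitions (sensitivity as the proportion of true cases reported as cases, specificity as the proportion of true non-cases reported as non-cases) and works with risks rather than rates for simplicity.

```python
def observed_risk(true_risk: float, sensitivity: float, specificity: float) -> float:
    """Reported risk = true positives + false positives."""
    return sensitivity * true_risk + (1.0 - specificity) * (1.0 - true_risk)


def observed_risk_ratio(control_risk: float, intervention_risk: float,
                        sensitivity: float, specificity: float) -> float:
    return (observed_risk(intervention_risk, sensitivity, specificity)
            / observed_risk(control_risk, sensitivity, specificity))


# Hypothetical true risks: 10% in the control arm, 7% in the intervention arm,
# so the true risk ratio is 0.70.
CONTROL, INTERVENTION = 0.10, 0.07

# Imperfect sensitivity only: the ratio is preserved.
print(round(observed_risk_ratio(CONTROL, INTERVENTION, 0.60, 1.00), 2))  # 0.7

# Imperfect specificity only: the ratio is pulled towards 1 (the null).
print(round(observed_risk_ratio(CONTROL, INTERVENTION, 1.00, 0.95), 2))  # 0.8
```

When only sensitivity is imperfect, both arms are scaled by the same factor and the ratio is unchanged, so a larger sample can compensate for the lost precision. False-positive reports add the same absolute amount to both arms, which shrinks the apparent effect regardless of sample size.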
While a widely accepted standard would facilitate comparisons across different observational and experimental studies, this raises the question of what the optimal standard may be that would also produce reliable reports of diarrhoea rates. There is no 'gold standard' method for measurement of diarrhoea rates. In part, this is because of the difficulty in defining the underlying construct and providing a culturally and linguistically consistent definition of a case or episode of 'diarrhoea'. Direct observation by an expert might constitute a gold standard against which other methods could be compared, as has been described above in the study by Voskuijl and colleagues [12]. However, judgements among experts may not be universal. Moreover, the collection of every stool and use of experts to classify them quickly becomes impractical at larger scales.
We propose two policies to mitigate the problem. First, an agreed consensus method for the measurement of diarrhoea rates in surveys. Second, triangulation of diarrhoea rates with other observations that reflect gastrointestinal health when interventions are evaluated. Many WASH studies already include anthropometric measurements as outcomes alongside diarrhoea. Furthermore, direct measurement of environmental contamination and pathogen levels in stool samples should complement diarrhoea rates in clinical studies. This would also allow for determination of how much diarrhoea is attributable to infection (and which can be reduced by WASH interventions), rather than

((developing or less* developed or under developed or underdeveloped or middle income or low* income) adj (economy or economies)).ti,ab.
(low* adj (gdp or gnp or gross domestic or gross national)).ti,ab.
(low adj3 middle adj3 countr*).ti,ab.
(lmic or lmics or third world or lami countr*).ti,ab.
transitional countr*.ti,ab.
or/2-10
measurement or burden or survey or questionnaire or stool sample or collection or (Bristol adj2 scale) or (Amsterdam adj2 scale) or scale or pathogen or microbiological or biological or protein or olfaction or prevalence or incidence or odds or risk
exp animals/ not humans.sh.
1 and 11 and 12
14 not 13
Limit to ((english or french) and yr = "1993 -Current")

APPENDIX 3