Estimating acute human leptospirosis incidence in northern Tanzania using sentinel site and community behavioural surveillance

Abstract
Many infectious diseases lack robust estimates of incidence from endemic areas, and extrapolating incidence when data are available from only a few locations remains a major challenge in burden of disease estimation. We sought to combine sentinel surveillance with community behavioural surveillance to estimate leptospirosis incidence. We administered a questionnaire on established, locally relevant leptospirosis risk factors and recent fever to livestock-owning community members across six districts in northern Tanzania. To these responses we applied a logistic regression model, previously developed among patients with fever in Moshi Municipal and Moshi Rural Districts, that predicts leptospirosis risk from behavioural factors. We aggregated the probability of leptospirosis by district and estimated incidence in each district by scaling the incidence previously estimated for Moshi Districts by the ratio of mean predicted probabilities. We recruited 286 community participants: Hai District (n = 11), Longido District (n = 59), Monduli District (n = 56), Moshi Municipal District (n = 103), Moshi Rural District (n = 44) and Rombo District (n = 13). The mean predicted probability of leptospirosis (lower and upper bounds) by district was Hai 0.029 (0.005, 0.095), Longido 0.071 (0.009, 0.235), Monduli 0.055 (0.009, 0.206), Moshi Rural 0.014 (0.002, 0.049), Moshi Municipal 0.015 (0.004, 0.048) and Rombo 0.031 (0.006, 0.121). We estimated the annual incidence (lower and upper bounds) of human leptospirosis among livestock owners per 100,000 people by district as Hai 35 (6, 114), Longido 85 (11, 282), Monduli 66 (11, 247), Moshi Rural 17 (2, 59), Moshi Municipal 18 (5, 58) and Rombo 47 (7, 145). Community behavioural surveillance may be a useful tool for extrapolating disease incidence beyond sentinel surveillance sites.


| INTRODUCTION
Our understanding of the burden of many infectious diseases in low-resource areas is hampered by a lack of robust estimates of incidence.
The most rigorous approach to estimating incidence is through population-based cohort studies. Population-based approaches require substantial resources in order to recruit participants and maintain participation through the duration of the study (Szklo, 1998). Such studies have not been conducted for many infectious diseases in low-resource areas, including leptospirosis.
In Tanzania, leptospirosis has been identified as prevalent among patients with fever, but there are few estimates of incidence (Allan et al., 2015; de Vries et al., 2014).
We have previously estimated the annual leptospirosis incidence for Moshi Municipal District and Moshi Rural District, in the Kilimanjaro Region of Tanzania, as 75–102 cases/100,000 people during 2007–2008 and 11–18 cases/100,000 people during 2012–2014, using multiplier studies that build on hospital-based surveillance of acute leptospirosis and account for under-ascertainment (Biggs et al., 2013; Maze et al., 2016).
However, Moshi Municipal District and Moshi Rural District are only two (1.2%) of Tanzania's 169 districts, and tools are needed to infer leptospirosis incidence away from sentinel surveillance sites across a broader geographical area. Aside from our previous estimates for Moshi Districts, data on leptospirosis incidence are scarce. As part of an estimate of global incidence, Costa et al. estimated the national annual incidence of leptospirosis in Tanzania as 20.89 cases per 100,000 people (95% confidence interval [95% CI] 7.27, 38.34). They used a prediction model that incorporated previously identified environmental and population risk factors for leptospirosis, such as distance from the equator, percentage of the population urbanized, life expectancy at birth and whether the country was a tropical island (Costa et al., 2015).
While such estimates are useful, they focus on broad environmental risk factors, ignore the effects of human behaviour and do not address subnational variation. In addition, the authors acknowledged that their model was likely to be unreliable for African countries as the selected risk factors were derived almost exclusively from studies done elsewhere (Costa et al., 2015). In this context, we sought to estimate leptospirosis incidence in districts across northern Tanzania where surveillance data are lacking in order to better understand the variation in leptospirosis incidence at a subnational level.
Exposures to cattle and rodents have recently been identified as risk factors for human leptospirosis among patients with fever in northern Tanzania (Maze et al., 2018). Our case-control study used logistic regression to investigate associations between acute leptospirosis and scales of cumulative exposure to potential sources of infection that we identified from the published literature: urine of cattle, goats, pigs and rodents, and surface water (Ashford et al., 2000; Bovet, Yersin, Merien, Davis, & Perolat, 1999; Leal-Castellanos, Garcia-Suarez, Gonzalez-Figueroa, Fuentes-Allen, & Escobedo-de la Penal, 2003; Mwachui, Crump, Hartskeerl, Zinsstag, & Hattendorf, 2015; Sarkar et al., 2002; Sugunan et al., 2009). Here, we combined sentinel surveillance data with data from community risk factor questionnaires to estimate the incidence of leptospirosis across a broad geographic area of northern Tanzania. While our prediction model and our estimates contain considerable uncertainty, we think our method may have widespread use for leptospirosis and other infectious diseases.

| Study setting
Arusha and Kilimanjaro regions are the two most populous regions of northern Tanzania. Each region is divided into seven districts.
Human and animal population density, climate and farming systems vary considerably between districts of the Arusha and Kilimanjaro regions.

| Evaluation of leptospirosis prediction model among sentinel site patients with fever
The diagnostic accuracy of our previously developed multivariable leptospirosis risk factor model was evaluated among the patients with fever in whom the model was developed. As reported previously (Maze et al., 2018), these were paediatric and adult patients presenting with fever to two referral hospitals, Kilimanjaro Christian Medical Centre (KCMC) and Mawenzi Regional Referral Hospital (MRRH). In the current study, we estimated the probability of leptospirosis for each participant with fever by applying the final multivariable model of acute leptospirosis, hereafter called the logistic regression model, to the participant's exposure scores and taking the fitted values. To evaluate error arising from uncertainty in the magnitude of each risk factor's association, we compared estimated probabilities among cases and controls when each coefficient was set to the upper and lower bounds of its 95% confidence interval. The diagnostic accuracy of the model was estimated by calculating the area under the receiver operating characteristic curve (AUROC). Out-of-sample error of the final exposure-scale multivariable model was assessed using the root mean square error (RMSE) evaluated through leave-one-out cross-validation (Kohavi, 1995; Picard & Cook, 1984).
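The model evaluation steps described above (fitted probabilities from a logistic regression, AUROC and leave-one-out RMSE) can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the analysis code used in the study, which was run in Stata: the gradient-ascent fitter, the function names and the toy exposure data are all hypothetical, and the AUROC is computed in its rank (Mann–Whitney) form.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(beta, x):
    """Fitted probability for one row x; beta[0] is the intercept."""
    return sigmoid(beta[0] + sum(b * xj for b, xj in zip(beta[1:], x)))


def fit_logistic(X, y, lr=0.1, n_iter=1000):
    """Hypothetical fitter: batch gradient ascent on the mean log-likelihood."""
    beta = [0.0] * (len(X[0]) + 1)
    for _ in range(n_iter):
        grad = [0.0] * len(beta)
        for x, yi in zip(X, y):
            resid = yi - predict(beta, x)
            grad[0] += resid
            for j, xj in enumerate(x):
                grad[j + 1] += resid * xj
        beta = [b + lr * g / len(y) for b, g in zip(beta, grad)]
    return beta


def auroc(scores, labels):
    """AUROC as the probability that a randomly chosen case scores higher than
    a randomly chosen control (Mann-Whitney form; ties count one half)."""
    cases = [s for s, lab in zip(scores, labels) if lab == 1]
    controls = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = 0.0
    for c in cases:
        for d in controls:
            wins += 1.0 if c > d else 0.5 if c == d else 0.0
    return wins / (len(cases) * len(controls))


def loocv_rmse(X, y, **fit_kwargs):
    """Refit without each observation in turn, predict it, and return the RMSE."""
    sq_errors = []
    for i in range(len(y)):
        beta = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:], **fit_kwargs)
        sq_errors.append((y[i] - predict(beta, X[i])) ** 2)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# HYPOTHETICAL toy data: [cattle score, rodent score] on 0-5 scales; case = 1.
X = [[0, 0], [1, 0], [0, 1], [2, 1], [1, 2], [3, 1],
     [2, 3], [4, 2], [3, 4], [5, 3], [4, 5], [5, 5]]
y = [0, 0, 0, 0, 1, 0, 0, 1, 0, 1, 1, 1]

beta = fit_logistic(X, y)
fitted = [predict(beta, x) for x in X]
print("in-sample AUROC:", round(auroc(fitted, y), 3))
print("LOOCV RMSE:", round(loocv_rmse(X, y), 3))
```

Because each fold refits the model without the held-out participant, the LOOCV RMSE measures out-of-sample rather than in-sample error; for a binary outcome it is the square root of the mean Brier score.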

| Behavioural surveillance
All members of selected households were approached for enrolment in the study. Trained study staff members who were fluent in the participant's language administered standardized questionnaires on risk factors for zoonotic disease, including leptospirosis, that had been established in studies done in other settings (Ashford et al., 2000; Bovet et al., 1999; Leal-Castellanos et al., 2003; Mwachui et al., 2015; Sarkar et al., 2002; Sugunan et al., 2009) and adapted for the situation in northern Tanzania. We also asked whether fever had been present during the 2 weeks prior to the interview. To harmonize analysis, the questionnaires were developed in conjunction with those administered to patients with fever at our sentinel sites at KCMC and MRRH (Maze et al., 2018). Questionnaires were developed in English and translated by professional translators. Risk factors were aggregated into scales of cumulative exposure to cattle urine and rodent urine, analogous to the aggregated exposure scales developed among patients with fever at our sentinel hospital sites (Maze et al., 2018). Because questions on whether participants had fed cattle or worked in sugarcane fields were not included in the community questionnaire, the weightings used here differed slightly from those previously published (Table S1). For each participant, we calculated a score between 0 and 5 on each scale based on their questionnaire responses. A participant who had performed none of the exposure activities scored 0, and one who had performed all of the activities scored 5.
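To make the 0-to-5 scoring concrete, the sketch below computes a weighted cumulative exposure score. Apart from feeding cattle and handling aborted cattle products, which are mentioned in this study, the item names and all weights are hypothetical; the real weightings are in Table S1. Renormalising the weights over the items actually asked, so that a participant who did every asked activity still scores 5, is an assumption of this sketch rather than the authors' documented procedure.

```python
# HYPOTHETICAL items and weights for a cattle-urine exposure scale, for
# illustration only; the study's actual items and weightings are in Table S1.
CATTLE_WEIGHTS = {
    "milked_cattle": 1.0,
    "handled_aborted_cattle_products": 2.0,
    "assisted_cattle_birth": 2.0,
    "cleaned_cattle_pen": 1.0,
    "fed_cattle": 1.0,
}


def exposure_score(responses, weights, asked=None):
    """Return a cumulative exposure score on a 0-5 scale.

    responses: dict mapping item -> True if the participant did the activity
    weights:   dict mapping item -> non-negative weight
    asked:     optional set of items actually on the questionnaire; weights are
               renormalised over these items (an assumption of this sketch).
    """
    items = [i for i in weights if asked is None or i in asked]
    total = sum(weights[i] for i in items)
    if total == 0:
        raise ValueError("no weighted items were asked")
    raw = sum(weights[i] for i in items if responses.get(i, False))
    return 5.0 * raw / total


# The community questionnaire did not ask about feeding cattle.
asked = set(CATTLE_WEIGHTS) - {"fed_cattle"}
participant = {"milked_cattle": True, "handled_aborted_cattle_products": True}
print(exposure_score(participant, CATTLE_WEIGHTS, asked))  # 5 * 3 / 6 = 2.5
```

Dropping an item changes the denominator, which is one way the weightings of a community scale can drift slightly from those of the hospital-derived scale.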

| Predicted probability of acute leptospirosis
For each participant in the cross-sectional community dataset who reported fever during the preceding 2 weeks, we predicted the probability of leptospirosis during those 2 weeks by applying the logistic regression model. For participants who did not report fever, we set the probability of leptospirosis during the 2 weeks prior to the interview to zero. We assessed the effect that plausible changes in the model coefficients would have on participants' predicted probabilities by repeating the predictions using the upper and lower bounds of the 95% confidence intervals of the regression coefficients.
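The prediction step, including the zero probability assigned to participants without fever and the re-prediction at the confidence-interval bounds, can be sketched as below. The coefficient values are hypothetical placeholders, not the published estimates, and holding the intercept at its point estimate while moving only the exposure coefficients is an assumption of this sketch.

```python
import math

# HYPOTHETICAL log-odds coefficients, for illustration only; the published
# point estimates and 95% CIs are not reproduced here. Assumption: the
# intercept is held at its point estimate across all three coefficient sets.
COEFFICIENTS = {
    "point": {"intercept": -4.0, "cattle": 0.30, "rodent": 0.25},
    "lower": {"intercept": -4.0, "cattle": 0.05, "rodent": 0.02},
    "upper": {"intercept": -4.0, "cattle": 0.55, "rodent": 0.48},
}


def prob_leptospirosis(participant, coefs):
    """Predicted probability of leptospirosis in the 2 weeks before interview.
    Participants who did not report fever are assigned a probability of zero."""
    if not participant["fever_2wk"]:
        return 0.0
    z = (coefs["intercept"]
         + coefs["cattle"] * participant["cattle_score"]
         + coefs["rodent"] * participant["rodent_score"])
    return 1.0 / (1.0 + math.exp(-z))


def prob_with_bounds(participant):
    """Repeat the prediction with the point, lower and upper coefficient sets."""
    return {name: prob_leptospirosis(participant, c)
            for name, c in COEFFICIENTS.items()}


febrile = {"fever_2wk": True, "cattle_score": 3.0, "rodent_score": 2.0}
afebrile = {"fever_2wk": False, "cattle_score": 3.0, "rodent_score": 2.0}
print(prob_with_bounds(febrile))
print(prob_with_bounds(afebrile))
```

Because the exposure scores are non-negative and the intercept is held fixed, the lower-bound coefficients always give the lowest probability and the upper-bound coefficients the highest, so the three predictions can be read as a point estimate with lower and upper bounds.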

| Prediction of incidence by district
We aggregated the predicted probabilities of recent leptospirosis for individuals by district and calculated the mean for each district. We benchmarked predicted incidence in each district to that of Moshi Municipal District, where leptospirosis incidence had previously been established as 11 cases per 100,000 people per year during the study period. To do so, we multiplied the incidence of leptospirosis in Moshi Municipal District by the ratio of the mean predicted probability of leptospirosis in the relevant district to that in Moshi Municipal District.
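The benchmarking step is a single ratio: the estimated incidence in district d is I_ref × p̄_d / p̄_ref, where I_ref is the reference district's known incidence and p̄ is a district's mean predicted probability. The sketch below applies it to the mean probabilities reported in the Abstract, with Moshi Municipal District as the reference at 11 cases per 100,000 people per year. Obtaining the bounds by applying the same ratio to each district's lower and upper mean probabilities is an assumption of this sketch.

```python
# Mean predicted probabilities (lower bound, point estimate, upper bound) by
# district, as reported in the Abstract of this study.
MEAN_PROB = {
    "Hai": (0.005, 0.029, 0.095),
    "Longido": (0.009, 0.071, 0.235),
    "Monduli": (0.009, 0.055, 0.206),
    "Moshi Rural": (0.002, 0.014, 0.049),
    "Moshi Municipal": (0.004, 0.015, 0.048),
    "Rombo": (0.006, 0.031, 0.121),
}


def benchmark_incidence(mean_prob, reference, reference_incidence):
    """Scale the reference district's incidence by each district's ratio of mean
    predicted probability to the reference district's point estimate.
    Assumption: the same ratio is applied to the lower and upper probabilities."""
    ref_point = mean_prob[reference][1]
    return {
        district: tuple(reference_incidence * p / ref_point for p in probs)
        for district, probs in mean_prob.items()
    }


incidence = benchmark_incidence(MEAN_PROB, "Moshi Municipal", 11)
for district, (lo, point, hi) in incidence.items():
    print(f"{district}: {point:.0f} ({lo:.0f}, {hi:.0f}) per 100,000 per year")
```

With these inputs, Moshi Rural District comes out at about 10 (1, 36) per 100,000, as in the Results, and Longido District at about 52 (7, 172), slightly below the reported 53 (7, 174), presumably because the published mean probabilities are rounded to three decimal places.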

| Data management
Data were entered using the Cardiff Teleform system (Cardiff, Inc.) into an Access database (Microsoft Corporation). Analyses were performed using Stata, version 13.1 (StataCorp).

| Research ethics
This study obtained clearance from ethical review committees at KCMC, the National Institute for Medical Research (Tanzania), the University of Glasgow and the University of Otago, and from an Institutional Review Board at Duke University.

| RESULTS
Figure 1 shows the distribution of predicted probabilities of leptospirosis obtained using the point estimates and 95% confidence interval bounds of the coefficients from the final logistic regression model among leptospirosis cases (n = 24) and controls (n = 592). Predicted probabilities were higher among cases than among controls (Kruskal–Wallis test, p = .01). The AUROC was 0.64. Leave-one-out cross-validation of the model among patients with fever gave an RMSE of 0.193.
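For two groups, such as the cases and controls compared here, the Kruskal–Wallis statistic can be computed from pooled ranks and referred to a chi-square distribution with 1 degree of freedom. The following pure-Python sketch includes the usual correction for tied values; the function names are illustrative and the probabilities in the example are hypothetical, not study data.

```python
import math


def average_ranks(values):
    """Ranks 1..N with tied values given their average rank.
    Also returns sum(t**3 - t) over tie groups, used in the tie correction."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sum = 0
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        t = j - i + 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        tie_sum += t ** 3 - t
        i = j + 1
    return ranks, tie_sum


def kruskal_wallis_two_groups(a, b):
    """Tie-corrected Kruskal-Wallis H for two groups and its p-value."""
    values = list(a) + list(b)
    n = len(values)
    ranks, tie_sum = average_ranks(values)
    r_a, r_b = sum(ranks[:len(a)]), sum(ranks[len(a):])
    h = 12.0 / (n * (n + 1)) * (r_a ** 2 / len(a) + r_b ** 2 / len(b)) - 3 * (n + 1)
    correction = 1 - tie_sum / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all values are tied")
    h /= correction
    # With two groups H has 1 df, and P(chi2_1 > h) = erfc(sqrt(h / 2)).
    p = math.erfc(math.sqrt(max(h, 0.0) / 2))
    return h, p


# HYPOTHETICAL predicted probabilities for cases and controls.
cases = [0.08, 0.12, 0.05, 0.20, 0.09]
controls = [0.02, 0.04, 0.05, 0.03, 0.07, 0.01]
print(kruskal_wallis_two_groups(cases, controls))
```

For two groups this test is equivalent to a two-sided Mann–Whitney U test without continuity correction, which also links it to the AUROC: the AUROC equals U divided by the product of the two group sizes.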

| Cross-sectional behavioural risk factor study among livestock keepers
We enrolled 286 consenting participants and administered questionnaires to them.
The characteristics of participants, described by district, are sum-

TABLE 2 Prevalence of cattle- and rodent-related risk factors for leptospirosis among cross-sectional community study participants, by district, northern Tanzania, 2013–2015
Variable: Handled aborted cattle products

| Predicted probability of leptospirosis and estimation of incidence among livestock keepers
The mean predicted probability of leptospirosis in the previous 2 weeks among participants in each district, and the ratio of the mean predicted probability in each district to that in Moshi Municipal District, are shown in Table 4 and Figure 2. Annual leptospirosis incidence estimates are also shown in Table 4. Participants in Moshi Rural District had the lowest predicted annual incidence of leptospirosis (10 cases per 100,000 people; lower and upper bounds 1, 36), and those in Longido District had the highest (53 cases per 100,000 people; lower and upper bounds 7, 174).
When we performed a sensitivity analysis with Moshi Rural District as the benchmark, the estimated incidence ranged from 11 per 100,000 people in Moshi Rural District to 56 per 100,000 people in Longido District.
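The sensitivity analysis changes only the reference district. As the sketch below illustrates, using the point-estimate mean probabilities reported in this study and the reference incidence of 11 per 100,000 given in the text for each benchmark, swapping the benchmark rescales every district's estimate by the same factor, so the ratio between any two district estimates is unchanged. The function name is illustrative.

```python
# Point-estimate mean predicted probabilities by district, as reported in this study.
MEAN_PROB = {
    "Hai": 0.029, "Longido": 0.071, "Monduli": 0.055,
    "Moshi Rural": 0.014, "Moshi Municipal": 0.015, "Rombo": 0.031,
}


def benchmark(mean_prob, reference, reference_incidence):
    """Incidence per district = reference incidence x (district mean / reference mean)."""
    ref = mean_prob[reference]
    return {d: reference_incidence * p / ref for d, p in mean_prob.items()}


for reference in ("Moshi Municipal", "Moshi Rural"):
    est = benchmark(MEAN_PROB, reference, 11)
    low, high = min(est, key=est.get), max(est, key=est.get)
    print(f"benchmark {reference}: {low} {est[low]:.0f} to {high} {est[high]:.0f}; "
          f"high/low ratio {est[high] / est[low]:.2f}")
```

With Moshi Rural District as the reference, the sketch gives 11 and about 56 cases per 100,000 people for Moshi Rural and Longido Districts, matching the reported range; in both runs Longido's estimate is about 5.07 times that of Moshi Rural.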

| DISCUSSION
This study applied, and explored the limitations of, a relatively simple method for estimating the incidence of leptospirosis across a broad area of northern Tanzania, including areas not served by leptospirosis surveillance. The predicted incidence of leptospirosis varied across the six districts from 10 to 53 cases per 100,000 people. The existing data used to estimate incidence had significant limitations, and the estimates for different districts overlap. We suggest that our numerical estimates be viewed with caution, as the risk factors for leptospirosis were only modestly predictive. Despite these limitations, where behavioural risk factors are well defined, our approach may be useful for estimating the incidence of a range of infectious diseases in low-resource areas that are not served by sentinel surveillance.
In many low- and middle-income countries, our best estimates of the incidence of infectious diseases come from studies that have used macro-level risk factors, such as distance from the equator and per cent urbanization of the population, to estimate incidence at a national scale (Costa et al., 2015; Mogasale et al., 2014). While such estimates are useful, they focus on broad environmental risk factors and ignore the effects of human behaviour. Our approach of estimating zoonotic disease incidence from locally relevant risk factors complements the country-level environmental risk factor approach and provides a useful tool for extrapolating data from sentinel sites across broad subnational areas. Our study aimed to develop novel methods of estimating incidence; an important next step is to validate our approach by estimating leptospirosis incidence with more established methods in the districts studied here.
Our probability estimates rely on three assumptions: that the risk factors measured adequately account for leptospirosis risk, that the risk factors operate consistently across districts, and that the proportion of fevers caused by leptospirosis is similar among community members reporting fever and among hospitalized patients reporting fever. To address these assumptions, we assessed the potential for error at each stage of the modelling process and estimated confidence intervals to account for each potential error. Based on the AUROC and RMSE, our model is an imperfect predictor of disease, which limits its use in out-of-sample datasets. The wide range between the upper and lower bounds of our probability estimates reflects the poor pre-
Malaria is unlikely to account for differences in fever prevalence, as the prevalence of Plasmodium parasitemia in all study districts is low (Hochedez et al., 2015). (Costa et al., 2015), as well as data on the prevalence of Leptospira in soil, waterways and animal hosts. The parent study from which our data were collected will provide data on livestock and human Leptospira seroprevalence. While the complex relationship between Leptospira seropositivity and acute leptospirosis makes determining incidence solely through seroprevalence challenging (Cumberland, Everard, Wheeler, & Levett, 2001; Haake & Levett, 2015), seroprevalence data could provide additional evidence to support risk factor-based estimates of incidence. As well as improving our prediction model, future studies that estimate disease incidence from risk factor surveillance should include representative sampling of the entire population.
In conclusion, we extrapolated existing estimates of leptospirosis incidence across a broad geographical area using behavioural surveillance. While our approach could be improved through further data collection to develop a prediction model with greater accuracy, we propose that our approach may have application across many infectious diseases in low-resource areas.

ACKNOWLEDGEMENTS
The authors would like to thank those involved in recruitment, laboratory work, data management and study administration, in- Karia. In addition, we would like to thank the study participants as well as the clinical staff and administration at Kilimanjaro Christian Medical Centre and Mawenzi Regional Referral Hospital for their support during this study. This study was conducted in accordance with the principles of the Declaration of Helsinki.

CONFLICT OF INTEREST
The authors report no conflict of interest.

DISCLAIMER
The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the Centers for Disease Control and Prevention. Use of trade names and commercial sources is for identification only and does not imply endorsement by the US Department of Health and Human Services or the Centers for Disease Control and Prevention.