Toward Elimination of Infectious Diseases with Mobile Screening Teams: HAT in the DRC

In pursuit of Sustainable Development Goal 3 “Ensure healthy lives and promote well‐being for all at all ages,” considerable global effort is directed toward elimination of infectious diseases in general and Neglected Tropical Diseases in particular. For various such diseases, the deployment of mobile screening teams forms an important instrument to reduce prevalence toward elimination targets. There is considerable variety in planning methods for the deployment of these mobile teams in practice, but little understanding of their effectiveness. Moreover, there appears to be little understanding of the relationship between the number of mobile teams and progress toward the goals. This research considers capacity planning and deployment of mobile screening teams for one such neglected tropical disease: Human African trypanosomiasis (HAT, or sleeping sickness). We prove that the deployment problem is strongly NP‐Hard and propose three approaches to find (near) optimal screening plans. For the purpose of practical implementation in remote rural areas, we also develop four simple policies. The performance of these methods and their robustness is benchmarked for a HAT region in the Democratic Republic of Congo (DRC). Two of the four simple practical policies yield near optimal solutions, one of which also appears robust against parameter impreciseness. We also present a simple approximation of prevalence as a function of screening capacity, which appears rather accurate for the case study. While the results may serve to more effectively allocate funding and deploy mobile screening capacity, they also indicate that mobile screening may not suffice to achieve HAT elimination.


Introduction
Goal 3 of the United Nations sustainable development goals (SDGs) is to "Ensure healthy lives and promote well-being for all at all ages" (UN 2015). While much progress has been made in this area in recent years, "many more efforts are needed to fully eradicate a wide range of diseases" (WHO 2018b). Among the infectious diseases considered for eradication are the "big three" AIDS/HIV, tuberculosis and malaria, as well as a collection of neglected tropical diseases (NTDs). A reported 1.5 billion people required treatment for NTDs globally in 2016 (WHO 2018c). Among the NTDs are dengue, leprosy, and rabies, as well as lesser known diseases such as dracunculiasis and human African trypanosomiasis (HAT), also known as sleeping sickness.
In pursuit of the SDGs, the WHO has set an agenda with specific eradication and elimination targets (WHO 2018c). Eradication targets are most ambitious as they refer to reducing the global prevalence, that is, the global number of infected persons, to zero. Elimination refers to (intermediate) targets such as reducing regional or national prevalence to a given target level. For HAT, elimination is specified by the World Health Organization as a prevalence of < 1 case per 10,000 inhabitants (WHO 2018a) per focus area. This research addresses the elimination of HAT.
Given the infectious nature of HAT, timely detection and treatment form important elements of any elimination policy. Timely detection of HAT is mostly achieved through active case finding (ACF), which involves proactive screening of individuals in predetermined target groups. ACF contributes importantly to achieving the SDG related elimination goals for HAT (WHO 2013), as well as for tuberculosis (Golub et al. 2005) and leprosy (Moura et al. 2013). Remote rural areas are of special importance from the perspective of screening and treatment, as they may lack an appropriate health system infrastructure (WHO 2018a). The deployment of mobile screening teams in such areas, which actively find cases by screening the population village by village, has considerably contributed to the progress made toward elimination of HAT (Franco et al. 2017).
De Vries et al. (2016) show that the HAT prevalence dynamics vary considerably across villages and that for villages with higher prevalence levels (1 per 1000 or higher) the current WHO policies for ACF may well be insufficient to achieve the specified elimination targets. Moreover, the number of mobile teams deployed may not suffice to conduct necessary screening visits to the villages because of funding limitations. These funds tend to be reduced when prevalence decreases. Since population screening is rather expensive, this risks halting progress toward elimination, or may even result in prevalence increases (Hasker et al. 2010). Over the period 2010-2014, around 2 million people were screened annually for HAT (Franco et al. 2017). The total yearly cost of a mobile team amounts to 130,000 USD (when using rapid diagnostic testing), resulting in a cost per screened person of 2.40 USD (Bessell et al. 2018).
Given the challenges encountered with current policies and capacities, it is important to understand the relationship between mobile screening capacity and elimination progress. Obviously, this relationship crucially depends on the planning of the mobile teams: which villages to screen and with what time intervals to screen them (cf. Mpanya et al. 2012). Yet, research on how to plan screening optimally at village level appears to be lacking (WHO 2015a). In fact, planning practices are reported to vary widely (see, e.g., Paquet et al. 1994, Ruiz et al. 2002, Simarro et al. 1990). The research question that arises is: What are suitable methods to plan screening visits to villages by mobile screening teams and what mobile screening team capacity is required to meet prevalence level targets for HAT?
The word "suitable" requires some elaboration. On the one hand, a suitable planning method is one that leads to lowest prevalence levels. For practical purposes, on the other hand, planning methods need to be easy to understand and to implement (as is a strength of the current planning method recommended by the WHO). Hence, next to advanced exact methods, this study also proposes and analyzes simple planning policies.
Below, we review related literature, formally define the problem under consideration, model it mathematically, and prove its optimization version to be NP-Hard. Moreover, using a stylized version of the model, we derive a simple approximation of the relationship between capacity and prevalence. In addition, we develop three general solution approaches and four simple planning policies. The performance of these methods and their robustness are benchmarked for a HAT region in the DRC, for which we also assess the relationship between capacity and prevalence numerically.

Literature
Human African Trypanosomiasis. HAT, or sleeping sickness, is a slowly progressing parasitic disease, transmitted between humans by the Tsetse fly (Brun et al. 2010). The presented case study regards the T. b. gambiense variant of HAT, which accounts for 98% of all HAT cases (WHO 2015b). This variant develops in two phases. Infected humans are infectious for Tsetse flies in both phases (Rock et al. 2015). In the first phase, the parasite typically causes minor and unspecific symptoms such as headaches, fever, and weakness (Brun et al. 2010). The median duration of the first phase, which is often considered asymptomatic, is about 1.5 years (Checchi et al. 2008). The second phase commences when the parasite crosses the blood-brain barrier. The parasite then causes various neurological disorders, including sleeping disorders, severe suffering, and death if left untreated. Because of the severity of suffering, patients in this symptomatic second phase seek treatment, although often with significant delays (Bukachi et al. 2018, Rock et al. 2015).
The treatment delay associated with both disease phases is a major enabler of sustained transmission of HAT. Patients are a potential source of infection for the Tsetse fly (Brun et al. 2010, Fevre et al. 2006) and hence indirectly for uninfected people. They may be infectious for more than 1.5 years until they start to seek care themselves, which is a form of passive case finding (PCF) (Hasker et al. 2010). These dynamics underline the crucial importance of ACF for elimination of HAT (Hasker et al. 2010, WHO 2013). De Vries et al. (2016) study several models to capture the effects of ACF policies on HAT prevalence at village level. To be of use for village level planning, these models solely use data that are routinely collected per village: numbers of cases detected (both phases combined) and the timing of screening rounds.
The current practice of ACF is to send mobile teams to endemic villages for exhaustive population screening (Brun et al. 2010, Hasker et al. 2010, Mpanya et al. 2012). The national HAT disease control program in the DRC, for example, employs 35 such mobile screening teams. Screening campaigns are managed and coordinated on the national level (not on the health facility level). Population screening is proven to be effective (Fevre et al. 2006, Rock et al. 2015), but is also considered to be costly. The high costs are a main reason to reduce funding and thus screening activity when prevalence reduces (Hasker et al. 2010). This can in turn lead to increases in prevalence, which totaled more than 300,000 cases in 1998 alone (WHO 2015b). In 2018, the reported number of cases dropped below 1000 for the first time since the start of data collection 80 years ago. The DRC, our case study country, accounts for more than 80% of new HAT cases reported (Franco et al. 2017).
Population Screening and Treatment Planning for Infectious Diseases. In their seminal work, Blount et al. (1997) formulate and solve generic models for the optimal timing of interventions to reduce prevalence and incidence of an infectious disease. The interventions require resources of which capacity is limited. Prevalence progression at population level is modeled through an SIS model. Brandeau et al. (2003) consider an SI model for multiple populations and aim to optimize deployment of scarce resources over time and populations. They derive structural properties of optimal solutions and present an excellent overview of closely related literature.
More recently, Deo et al. (2015) consider optimal screening, testing, and treatment for HIV/AIDS in the United States. Based on an individual disease progression model (instead of an epidemiological model), they present results on capacity requirements as well as practical policies to allocate available human resources so as to maximize health (in QALYs). Deo et al. (2013) also use an individual disease progression model to maximize deployment of scarce resources over time. The objective is to reduce the burden of disease for young patients suffering from the noninfectious disease asthma in the United States. They present an easy-to-implement, myopic heuristic that is provably optimal in special cases. These studies, as well as closely related ones, rely on a detailed multistage model to capture disease progression (Bishai et al. 2007, Paltiel et al. 2005).
Various authors use extensive simulation models to study the effectiveness of HAT screening and treatment programs and the time to elimination. In addition to multiple stages of disease progression in the human population, such models typically account for vectors and animal disease reservoirs (Castaño et al. 2020, Rock et al. 2015, Rock et al. 2018). These studies typically consider one or several given screening frequencies for the population within a district. Hence, the prevalence models in these studies are homogeneous over all villages in a district (as opposed to village specific). Optimal allocation of scarcely available screening capacity, as needed to ensure screening frequencies necessary for elimination, is not considered in these studies.
In comparison to the aforementioned multi-stage and multi-host models, our study is less complex as it considers only one disease stage explicitly. Moreover, as data on vectors and animal reservoirs are not explicitly available at village level, our model only considers the human population. At the same time, the model presented below is more elaborate than the aforementioned models as it distinguishes the villages in which the district population lives and correspondingly relies on village specific prevalence functions. Moreover, it takes travel times between villages into account, as required to model the deployment of mobile teams.
De Vries et al. (2016) present an extensive econometric analysis of HAT prevalence considering a variety of functions to model prevalence per village. The functions are validated through out-of-sample predictions, using data from 2004 to 2013 for 143 villages in Bandundu (DRC). This analysis identifies LMVCC, an SIS model based function in which the carrying capacity (i.e., the equilibrium prevalence level) varies per village and over time, to best fit the data. The varying carrying capacity also serves to implicitly reflect vector and reservoir dynamics per village. Although, as we elaborate below, the LMVCC model is a simplification, it solely uses data that are routinely collected on the village level, and thereby facilitates village level predictions. The optimization models and policies presented below rely on the LMVCC function to analytically model prevalence.
Literature which explicitly considers distance and travel in the deployment of scarce resources is of importance as travel time to the nearest healthcare facility is a major determinant of healthcare utilization and several types of health outcomes (De Vries et al. 2014). Providing sufficient levels of access through spatially fixed facilities is often not feasible, particularly in sparsely populated and poor areas. For this reason, mobile healthcare units are being used in several countries (see Doerner et al. 2007 for references).
The routing problem for mobile healthcare facilities was first presented by Hodgson et al. (1998). The authors model this as a tour-location problem, which is to select tour stops and a tour so as to minimize the total travel time and to satisfy the constraint that each demand point is covered. Hachicha et al. (2000) extend this model to the multiple vehicle case and propose three heuristics to solve it. Doerner et al. (2007) model the problem as a multi-objective optimization problem, using coverage and travel time criteria. The main difference with this study is that we consider the health outcome measure prevalence in the objective, rather than logistic process measures such as travel time or coverage.
McCoy and Lee (2014) analyze optimal deployment of motorcycles to provide healthcare services in rural areas. The problem of determining the number of visits to each outreach site is modeled as a resource allocation problem with effectiveness and equity criteria. Effectiveness is modeled to depend (polynomially) on the number of visits. In comparison, our study is based on a HAT specific model for the relationship between visits and disease prevalence presented in De Vries et al. (2016). We refer to the PhD thesis by De Vries (2017) for generalizations of the presented models. In addition, we refer to Dasaklis et al. (2012) for an overview of scientific research on logistics operations for epidemic control.

Problem Formulation
This section formally models the planning problem for mobile HAT screening teams, which we refer to as the mobile screening team deployment problem (MSTD). We consider $M$ mobile teams, a set of villages $\mathcal{V} = \{1, 2, \ldots, V\}$, and planning periods $\mathcal{T} = \{1, 2, \ldots, T\}$. The problem is to determine for each mobile team $m \in \{1, \ldots, M\}$, and for each period $t \in \mathcal{T}$, which subset of $\mathcal{V}$ to visit, so as to minimize the prevalence level (that is, the number of people infected) over all villages in $\mathcal{V}$.
A planning period corresponds to one multi-day mission, where a team visits one or multiple villages per day and camps overnight in the field to minimize travel times. In the DRC, a planning period lasts 1 month and a mission lasts 20 days (the remainder of the month is spent at home and in the office). Travel to the next village typically represents only a minor part of a screening day. Following current practice, we therefore do not explicitly model travel times but stipulate that a team stays within the same region or cluster during the planning period. A cluster $c \in \mathcal{C}$ thus corresponds to a set of villages for which travel times between villages can be conveniently incorporated into the schedules. We denote the subset of villages corresponding to cluster $c$ by $\mathcal{V}_c$. The planning problem then boils down to specifying for each team and for each period which cluster $c$ it is assigned to and which villages $v \in \mathcal{V}_c$ it visits. Given this subset of villages, the team autonomously takes routing decisions.
We let binary variables $y_{ct}$ indicate whether a team is assigned to cluster $c$ in period $t$. For village $v \in \mathcal{V}$, $r_v$ denotes the fraction of the duration of a mission that is required to screen the village. We let binary variables $x_{vt}$ indicate whether village $v$ is screened in planning period $t$ and let $x_v$ denote the vector $(x_{v1}, \ldots, x_{vT})$. These variables translate into a vector of time intervals between consecutive screening rounds $\tau_v(x_v) = \{\tau_{v0}, \tau_{v1}, \ldots, \tau_{v n_v}\}$. Here, $\tau_{v0}$ represents the time between the beginning of the planning horizon and screening round 1, $\tau_{v1}$ the time between screening round 1 and screening round 2, ..., and $\tau_{v n_v}$ the time between the last screening round (i.e., screening round $n_v$) and the end of the planning horizon. For convenience of notation, we will denote this vector by $\tau_v$ from now on. We base the precise relationship between $x_v$ and $\tau_v$ on the following assumption:
ASSUMPTION 1. For each village $v \in \mathcal{V}$, the time interval between consecutive screening rounds in periods $t$ and $t + k$ equals exactly $\tau = k$ periods.
This assumption is justified when the exact timing of the screening round within the period has little impact on the development of the prevalence level, which is particularly the case for slowly evolving epidemics such as HAT. For ease of exposition, we henceforth always assume screening to take place at the end of the planning period.
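The translation from a screening vector to screening intervals under Assumption 1 can be sketched as follows. This is a minimal illustration, not the authors' implementation; the function name `screening_intervals` is hypothetical, and the sketch assumes periods numbered 1 to T with screening at the end of the period.

```python
from typing import List


def screening_intervals(x_v: List[int]) -> List[int]:
    """Map a 0/1 screening vector (x_v1, ..., x_vT) to [tau_v0, ..., tau_vn_v]."""
    horizon = len(x_v)
    # Periods are numbered 1..T; a screening in period t happens at the end of t.
    rounds = [t for t, screened in enumerate(x_v, start=1) if screened == 1]
    intervals = []
    previous = 0  # time 0 is the beginning of the planning horizon
    for t in rounds:
        intervals.append(t - previous)  # periods elapsed since the previous round
        previous = t
    # Time between the last screening round and the end of the horizon.
    intervals.append(horizon - previous)
    return intervals


# Screening rounds in periods 2 and 5 of a six-period horizon.
example = screening_intervals([0, 1, 0, 0, 1, 0])
```

By construction the intervals sum to the length of the planning horizon, and a village that is never screened yields a single interval of length T.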
Given the random nature of HAT infection, which depends on encounters between humans and Tsetse flies, it is not possible to exactly predict the new cases, that is, the incidence, nor the resulting prevalence. Instead, we consider expected HAT prevalence levels, as modeled in De Vries et al. (2016). We note that expected prevalence levels vary over time. As a result, they may meet elimination targets at some moments in time, but may increase to above target levels afterwards. To avoid bias toward certain time moments, the proposed model considers average expected prevalence levels. As a consequence, the objective does not consider prevalence and health at the end of the planning horizon, but instead during the planning horizon.
For a given solution $x_v$ and corresponding screening intervals $\tau_v(x_v)$, function $B_v(\tau_v)$ represents the resulting average expected HAT prevalence level over the planning horizon. We now define MSTD as:
$$\min \sum_{v \in \mathcal{V}} B_v(\tau_v(x_v)) \qquad (1)$$
$$\text{s.t.} \quad \sum_{v \in \mathcal{V}_c} r_v x_{vt} \le y_{ct} \qquad \forall c \in \mathcal{C},\ t \in \mathcal{T} \qquad (2)$$
$$\sum_{c \in \mathcal{C}} y_{ct} \le M \qquad \forall t \in \mathcal{T} \qquad (3)$$
$$x_{vt},\ y_{ct} \in \{0, 1\} \qquad \forall v \in \mathcal{V},\ c \in \mathcal{C},\ t \in \mathcal{T} \qquad (4)$$
For each period $t$ and cluster $c$, Constraint (2) regulates screening capacity available. Constraint (3) limits the number of teams assigned to clusters per period. We note that this model can also be applied to the case when teams make single day trips from a depot. In that case, a cluster would simply represent a collection of neighboring villages for which travel can be conveniently included in a single-day trip. Before turning to solving MSTD, we first discuss and analyze the objective function.
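The capacity and team constraints described above can be checked for a candidate plan with a few lines of code. The sketch below is illustrative only: the function name `is_feasible`, the dictionary-based data layout, and the toy data are assumptions, and the constraint forms are read from the verbal description (per cluster and period, the screening workload may not exceed one mission if a team is assigned and zero otherwise; per period, at most M teams are assigned).

```python
from typing import Dict, List


def is_feasible(
    x: Dict[str, List[int]],         # x[v][t] = 1 if village v is screened in period t
    y: Dict[str, List[int]],         # y[c][t] = 1 if a team is assigned to cluster c in period t
    r: Dict[str, float],             # r[v] = fraction of a mission needed to screen village v
    clusters: Dict[str, List[str]],  # clusters[c] = villages belonging to cluster c
    num_teams: int,                  # M, the number of mobile teams
    tol: float = 1e-9,
) -> bool:
    """Check the per-cluster capacity constraint and the per-period team limit."""
    periods = range(len(next(iter(y.values()))))
    for t in periods:
        # At most M clusters can receive a team in any period.
        if sum(y[c][t] for c in clusters) > num_teams:
            return False
        for c, villages in clusters.items():
            # Workload in a cluster must fit in one mission, and only if a team is there.
            workload = sum(r[v] * x[v][t] for v in villages)
            if workload > y[c][t] + tol:
                return False
    return True


# Hypothetical toy instance: two clusters, two periods, one team.
clusters = {"north": ["a", "b"], "south": ["c"]}
r = {"a": 0.5, "b": 0.4, "c": 0.8}
y = {"north": [1, 0], "south": [0, 1]}
x = {"a": [1, 0], "b": [1, 0], "c": [0, 1]}
plan_ok = is_feasible(x, y, r, clusters, num_teams=1)
```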
Average expected HAT prevalence level. For each village $v \in \mathcal{V}$, prevalence progression function $f_v(s) \ge 0$, $s \in [0, +\infty)$, describes the development of the expected HAT prevalence level over time in the absence of ACF. Hence it refers to the situation where $x_{vt} = 0 \ \forall t$. We refer to $s$ as the stage of progression. The reader may note that the stage of progression is a continuous variable referring to prevalence of HAT and not to be confused with the two phases of HAT described in the introduction.
We let $s_{vn}$ denote the stage of progression in village $v$ after screening round $n$. We model a screening round to decrease the expected prevalence level with a given strictly positive impact fraction $p$. As elaborated in the case study (see section 6), $p$ is the product of four variables: the average participation rate in screening rounds, the sensitivity of the screening test, the sensitivity of the confirmation test, and the fraction of infected people who proceed to treatment (Robays et al. 2004). For example, if $p = 0.8$, the effect of screening in village $v$ at stage of progression $s$ results in reducing the prevalence level from $f_v(s)$ to $0.2 \times f_v(s)$. Thus, screening leads to resetting the stage of progression to an "earlier" stage. The process is further explained in Figure 1 and defined by the following recursive relationship:
$$s_{vn} = f_v^{-1}\big((1 - p)\, f_v(s_{v,n-1} + \tau_{v,n-1})\big), \qquad n = 1, \ldots, n_v \qquad (5)$$
The average expected prevalence level relates to decision variables $\tau_v$ and progression function $f_v(s)$ as follows:
$$B_v(\tau_v) = \frac{N_v}{T} \sum_{n=0}^{n_v} \int_0^{\tau_{vn}} f_v(s_{vn} + s)\, ds \qquad (6)$$
Here, $N_v$ denotes the population size of village $v$. In an extensive modeling study, De Vries et al. (2016) investigate closed-form expressions for $f_v(s)$. Based on predictive performance and theoretical justification, variants of the function corresponding to the SIS epidemic model appear most suitable. The theoretical justification lies in the observations that (1) HAT closely resembles a (multi-host) SEIRS model (Rock et al. 2015) and (2) the number of people in the E (exposed) and R (removed) compartments are negligible given the low prevalence of the disease (De Vries et al. 2016, Rock et al. 2015). The closed-form function for disease prevalence in an SIS model is also known as the logistic function. As illustrated in Figure 1, this function implies the expected prevalence to initially grow exponentially and to level off to an equilibrium prevalence level afterwards. This equilibrium prevalence level is called the carrying capacity and represented by the dotted horizontal line in Figure 1.
The logistic function is generic in the sense that the only information about village $v \in \mathcal{V}$ it requires is $N_v$, the population size of $v$, and $K_v$, the carrying capacity of $v$ as a percentage of the population of village $v$. The latter depends on factors such as the density of Tsetse flies around the village and the intensity of passive case finding (PCF). It can be estimated based on past prevalence levels and screening rounds. The national HAT control program in the DRC recently started digital collection of these data (Bluesquare 2018, Hasker et al. 2018).
According to the logistic function, after screening round $n$, the expected prevalence level in village $v$ develops as follows as long as there is no screening:
$$f_v(s_{vn} + s) = \frac{K_v}{1 + A_v e^{-\kappa (s_{vn} + s)}} \qquad (7)$$
Here, $\kappa$ represents a constant determining the steepness of the s-shaped curve and $A_v = K_v / f_v(0) - 1$ reflects the initial prevalence level. For convenience, we define $A_{vn} = A_v e^{-\kappa s_{vn}}$. Substituting function (7) into (6) and deriving the integral yields the following expression for the average expected prevalence level:
$$B_v(\tau_v) = \frac{N_v K_v}{\kappa T} \sum_{n=0}^{n_v} \log \frac{e^{\kappa \tau_{vn}} + A_{vn}}{1 + A_{vn}} \qquad (8)$$
Here, variable $A_{vn}$, $n > 0$, equals $\infty$ when $p = 1$ and can be determined recursively when $p < 1$ (see De Vries et al. 2016):
$$A_{vn} = \frac{p + A_{v,n-1}\, e^{-\kappa \tau_{v,n-1}}}{1 - p} \qquad (9)$$
We now more precisely define MSTD as problem (1)-(4) using the HAT prevalence function (8) and prove the following result by reduction from the three-partition problem (see Appendix A):
PROPOSITION 1. MSTD is strongly NP-Hard, even when $M = |\mathcal{C}| = 1$.
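To make the prevalence model concrete, the following sketch evaluates the average expected prevalence of a single village for a given vector of screening intervals. It is a hedged illustration rather than the authors' code. It assumes that the prevalence progression function is the logistic function with carrying capacity $K_v$ and steepness $\kappa$, that $f_v$ and $K_v$ are expressed as fractions of the population, that a screening round multiplies prevalence by $1 - p$ with $0 < p < 1$, and that the result is scaled by the population size $N_v$. The function name and all numerical values are hypothetical.

```python
import math
from typing import Sequence


def average_prevalence(
    intervals: Sequence[float],  # [tau_v0, ..., tau_vn_v], in planning periods
    population: float,           # N_v
    carrying_capacity: float,    # K_v, as a fraction of the population
    initial_prevalence: float,   # f_v(0), as a fraction, with 0 < f_v(0) <= K_v
    kappa: float,                # steepness of the logistic curve (per period)
    p: float,                    # impact fraction of one screening round
) -> float:
    """Average expected number of infected people over the planning horizon."""
    if not 0.0 < p < 1.0:
        raise ValueError("this sketch assumes 0 < p < 1")
    horizon = sum(intervals)
    a = carrying_capacity / initial_prevalence - 1.0  # A_v at the start of the horizon
    integral = 0.0
    last = len(intervals) - 1
    for n, tau in enumerate(intervals):
        # Closed-form integral of K / (1 + A e^{-kappa s}) for s in [0, tau].
        integral += (carrying_capacity / kappa) * math.log(
            (math.exp(kappa * tau) + a) / (1.0 + a)
        )
        if n < last:
            # A screening round follows: prevalence drops to (1 - p) times its value.
            a = (p + a * math.exp(-kappa * tau)) / (1.0 - p)
    return population * integral / horizon


# Hypothetical village at carrying capacity (1% of 1000 people), 12-period horizon.
no_screening = average_prevalence([12], 1000, 0.01, 0.01, kappa=0.1, p=0.8)
one_round = average_prevalence([6, 6], 1000, 0.01, 0.01, kappa=0.1, p=0.8)
```

Without screening, a village that starts at its carrying capacity stays there, so the average equals $N_v K_v$; a single screening round halfway through the horizon lowers the average.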

Relationship between Capacity and Prevalence
As motivated, insight into the relationship between screening capacity (that is, the number of mobile screening teams) and prevalence is crucial to assess the resource needs for reaching elimination or other targets. We provide further insight into this relationship by considering a stylized variant of the mobile screening team deployment problem (MSTD), named MSTD$_R$. In MSTD$_R$, the constraints that at most $M$ teams can be deployed in each planning period $t$ and that teams cannot visit multiple clusters in one period are relaxed. Instead, MSTD$_R$ requires that capacity is larger than or equal to the average capacity required per period and that each village has an infinite sequence of screening intervals of equal length, denoted by parameter $\tau_v$. Note that in MSTD$_R$, $B_v$ can be defined as a function of fixed screening interval $\tau_v$ (instead of a vector of screening intervals). Note further that MSTD$_R$ makes clustering irrelevant.
In Appendix A, we prove the following proposition:
PROPOSITION 2. An optimal solution to MSTD$_R$ is attained by greedily assigning screening interval $\max\{\tau^*, \tau_R\}$ to villages in descending order of the ratio $N_v K_v / r_v$.
Here, $\tau^* = -\log(1 - p)/\kappa$. It represents the maximum screening interval that leads to eradication (zero prevalence in the long term). $\tau_R$ denotes the minimum screening interval that can feasibly be attained using the remaining screening capacity.
Figure 2 illustrates the implications of this finding. Our proof implies that, for village $v$, increasing the screening frequency decreases the expected prevalence level linearly from $N_v K_v$ to zero. Note that doing so "consumes" on average $r_v / \tau^*$ teams per planning period. Hence, doing so in descending order of the presented ratio yields a piecewise linear relationship between capacity $M$ and total average expected prevalence level. This piecewise linear relationship subsequently makes it possible to determine the minimum capacity required to reach an elimination target (e.g., prevalence of at most 1 per 10,000) in the stylized setting. Here, capacity could be expressed in terms of the number of mobile teams required or the number of people to be screened per planning period. Section 6 discusses the accuracy of this easy capacity estimation method.
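The greedy rule and the resulting capacity-prevalence relationship can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it assumes that a village screened at a fixed interval $\tau_v \ge \tau^*$ has long-run average prevalence $N_v K_v (1 - \tau^*/\tau_v)$, which is one way to express the linear decrease in screening frequency $1/\tau_v$ described above, and that such a village consumes $r_v/\tau_v$ teams per period. The function name, the dictionary keys, and the three villages are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def greedy_relaxed_plan(
    villages: List[Dict[str, float]],  # each with keys "N", "K" (fraction) and "r"
    capacity: float,                   # M, average number of teams available per period
    p: float,                          # impact fraction of a screening round
    kappa: float,                      # steepness of the logistic curve
) -> Tuple[List[float], float]:
    """Return fixed screening intervals per village and total long-run prevalence."""
    tau_star = -math.log(1.0 - p) / kappa
    # Serve villages in descending order of N_v * K_v / r_v.
    order = sorted(
        range(len(villages)),
        key=lambda i: villages[i]["N"] * villages[i]["K"] / villages[i]["r"],
        reverse=True,
    )
    intervals = [math.inf] * len(villages)  # infinite interval = never screened
    remaining = capacity
    for i in order:
        need = villages[i]["r"] / tau_star  # teams per period to screen every tau_star
        if remaining >= need:
            intervals[i] = tau_star
            remaining -= need
        elif remaining > 0.0:
            # Spend what is left: shortest interval attainable with remaining capacity.
            intervals[i] = villages[i]["r"] / remaining
            remaining = 0.0
    total = sum(
        v["N"] * v["K"] * (1.0 - tau_star / tau) for v, tau in zip(villages, intervals)
    )
    return intervals, total


# Hypothetical data; kappa is chosen so that tau_star is about 12 periods.
villages = [
    {"N": 1000, "K": 0.01, "r": 0.2},
    {"N": 500, "K": 0.01, "r": 0.05},
    {"N": 2000, "K": 0.005, "r": 0.5},
]
kappa = math.log(2) / 12
curve = [greedy_relaxed_plan(villages, m, 0.5, kappa)[1] for m in (0.0, 0.0125, 1.0)]
```

Evaluating the function over a grid of capacities traces the piecewise linear curve; the smallest capacity at which the total drops below a target level gives the capacity estimate for the stylized setting.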

Planning Methods
We present seven methods to solve problem (1)-(4). The first two methods are based on two different mathematical models which they aim to solve to (near) optimality. The third method is a heuristic, which will turn out to be especially valuable when solving larger instances. Methods four to seven are simple policies, designed for practical applicability. Below, we briefly outline each of these methods. Full details are provided in Appendix B. Although specifically developed for HAT, the methods can be applied to any disease with an increasing prevalence progression function $f_v(s)$ (De Vries 2017).
1. The Binary Linear Programming (BLP) approach takes formulation (1)-(4) as a starting point and tackles the non-linearity of $B_v(\tau_v)$ by discretizing the prevalence progression function $f_v(s)$. Details about the BLP approach are provided in Appendix B.1. The proposed discretization may restrict the optimization to an incomplete set of relevant prevalence levels, possibly excluding the optimal solution(s). Discretized formulations can, however, be ensured to be exact as follows.
First, we might pre-calculate and include all attainable prevalence levels. Since this number of prevalence levels might grow exponentially with $T$, this may be computationally infeasible for larger $T$. As we show in Appendix B.1, an alternative is to repeatedly add the actual set of prevalence levels attained by a solution to the optimization model and reoptimize.
2. The Column Generation approach uses a mixed-integer programming (MIP) formulation that defines MSTD in terms of visit schedules or visit patterns. The problem then boils down to selecting a visit pattern for each village and allocating teams to clusters. Constraints are that each village can only be visited in periods in which a team is assigned to its cluster and that in each period no more than $M$ teams are assigned. This problem has exponentially many variables, but can be approached through column generation.
The column generation subproblem that emerges can be solved as a shortest path problem, using discretization techniques (as used to solve the BLP formulation), which are further elaborated in Appendix B.2 (along with other algorithmic details). The method generates columns until the LP relaxation is solved to optimality, and then solves the binary version with the set of generated columns. Hence, this approach is not necessarily optimal.
3. Iterated Local Optimization iteratively improves the current solution $(x, y)$. Specifically, it iterates over $t \in \mathcal{T}$ and reoptimizes the planning for $t$ while keeping the planning for all other periods fixed. The solution thus found can be identical to the previous solution or a different solution with the same or lower solution value. The approach terminates when the reoptimizations did not produce improvement for any $t \in \mathcal{T}$. Per iteration, this approach requires optimizing the allocation of teams. In Appendix B.3 we show that this allocation problem is a knapsack problem, which we solve as a BLP (see Kellerer et al. 2004).
4. The Equalization Policy equalizes screening frequencies of the villages. We define
For each cluster $c$, it calculates $l_1(c)$, the total expected prevalence at the beginning of the period among villages screened in that cluster if a team would be assigned to it. Within a cluster, villages are selected for screening in decreasing order of the ratio of expected number of cases among the people screened in village $v$ over $r_v$. This can be interpreted as the expected number of cases detected per time unit spent screening in village $v$. Next, the policy assigns teams to the $M$ clusters with highest $l_1(c)$.
7. The Prevalence Increase Policy strives to screen villages for which no screening would lead to a relatively large increase in expected prevalence.
For each cluster $c$, it calculates $l_2(c)$, the total expected increase in prevalence averted for the villages screened in that cluster if a team would be assigned to it. Within a cluster, villages are selected for screening in order of highest prevalence increase per time unit spent screening. Next, the policy assigns teams to the $M$ clusters with highest $l_2(c)$.
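The cluster-scoring step shared by the last two policies can be sketched as follows. This is a simplified illustration, not the procedure of Appendix B: the function name `assign_teams` and the data layout are hypothetical, the per-village `benefit` stands for either the expected prevalence (for $l_1$) or the averted prevalence increase (for $l_2$), a uniform participation rate is assumed so that ranking by benefit per unit of screening time matches the described ratio, and villages that no longer fit in the mission are assumed to be skipped.

```python
from typing import Dict, List


def assign_teams(
    clusters: Dict[str, List[str]],  # clusters[c] = villages belonging to cluster c
    benefit: Dict[str, float],       # per-village benefit of screening this period
    r: Dict[str, float],             # r[v] = fraction of a mission needed for village v
    num_teams: int,                  # M
) -> Dict[str, List[str]]:
    """Pick the villages per cluster, then send teams to the best-scoring clusters."""
    scored = []
    for c, villages in clusters.items():
        # Highest benefit per unit of screening time first.
        ranked = sorted(villages, key=lambda v: benefit[v] / r[v], reverse=True)
        chosen, load, score = [], 0.0, 0.0
        for v in ranked:
            if load + r[v] <= 1.0 + 1e-9:  # one mission per team per period
                chosen.append(v)
                load += r[v]
                score += benefit[v]
        scored.append((score, c, chosen))
    # Assign the M teams to the clusters with the highest total score.
    scored.sort(key=lambda item: item[0], reverse=True)
    return {c: chosen for _, c, chosen in scored[:num_teams]}


# Hypothetical toy instance with two clusters.
clusters = {"a": ["v1", "v2"], "b": ["v3"]}
benefit = {"v1": 2.0, "v2": 1.0, "v3": 5.0}
r = {"v1": 0.6, "v2": 0.5, "v3": 0.9}
plan_one_team = assign_teams(clusters, benefit, r, num_teams=1)
plan_two_teams = assign_teams(clusters, benefit, r, num_teams=2)
```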

Case Study
This section addresses our research question for a case study on HAT screening in the Kwamouth health zone in the Bandundu province of the Democratic Republic of Congo. The DRC accounts for 80% of new HAT cases reported (Franco et al. 2017). We first analyze the methods designed to find (near) optimal solutions, to assess their computation times and solution values. We subsequently use one of these methods as a benchmark for the performance of the practice oriented policies in terms of average expected prevalence reduction. Next, we analyze the validity of our insights on the relationship between prevalence and capacity (see section 4). To provide insight into the robustness of the results, we also present sensitivity analyses with respect to data impreciseness. Additional results on end of horizon effects and determinants of optimal solutions can be found in De Vries (2017). The methods were implemented using Matlab R2015a and used CPLEX 12.63 as a BLP solver.

Baseline Case Description
The case study contains 239 villages in the Kwamouth health zone, and is derived from HAT screening data from 2324 villages between 2004 and 2013 (see De Vries et al. 2016, for the data). A village was included if there exists at least one record of the number of people screened and the geocoordinates of the village are known. The first criterion is required to estimate population sizes and the second is required for assigning villages to clusters. We excluded 463 villages (at least partly) due to lacking geocoordinates.
In 114 of the excluded villages, HAT cases were found in at least one screening round.The average number of cases per visit to these villages was 1.90.The average participation rate per screening round, which we denote by part, has been reported to be 71% (of the population) and to vary substantially in Bandundu (Robays et al. 2004).Hence, for now we estimate the total village population to be 1.2 times the maximum number of people participating in a screening round reported for that village, and revisit participation in the sensitivity analysis.To ensure that each village can be screened in one planning period, one large village had to be split into two halves.Both are considered separate villages in the remainder of our analyses, which increases the total number of villages to 240.
Current planning practices in the national sleeping sickness control program of the DRC largely support cluster-based planning, by which teams select a cluster per planning period of 1 month, in which they only visit villages in the selected cluster (E. Hasker, personal communication, 20-9-2016): "They define different axes, and then simply visit one axis per trip. (...) An axis can be a major road or a river by which you travel. Of course, there are not many roads. Most villages can only be accessed by one road, so if you are traveling in a given direction, it is logical to stay in that region." "The planning assumes that they screen 300 persons per day, 20 days per month, so 6000 [persons] per month, and then they return to their bases." We manually clustered the villages, following the structure of the road network. The resulting clusters, the villages, and their relative population sizes are depicted in Figure 3.
Reflecting current practice in the DRC, we assume that a team can screen 6000 persons per planning period. As the number of people participating in a screening round in village v equals $\mathit{part} \cdot N_v$, we estimate that screening of village v takes $r_v = \mathit{part} \cdot N_v / 6000$ of a planning period (where part was defined as the average participation rate). Since the total estimated number of people from the 240 villages participating in screening equals 73,521, one team would need approximately 12 months to visit the villages, which is in line with the current WHO guidelines for villages having one case in the past 3 years (WHO 2015a).
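The arithmetic behind this estimate can be written out in a few lines. The following is a minimal sketch rather than the authors' Matlab code; the function and constant names are hypothetical and only the figures 6000 and 73,521 come from the text.

```python
# Illustrative sketch; names are assumptions, only the numbers come from the text.
TEAM_CAPACITY = 6000  # persons per team per planning period (300 per day x 20 days)


def screening_time(expected_participants: float, capacity: float = TEAM_CAPACITY) -> float:
    """Fraction of a planning period needed to screen one village (r_v).

    expected_participants corresponds to part * N_v in the text.
    """
    return expected_participants / capacity


# Total expected participants over the 240 villages, as reported in the text.
total_participants = 73_521
periods_for_one_team = screening_time(total_participants)
print(round(periods_for_one_team, 2))  # 12.25
```

With one planning period per month, a single team thus needs slightly more than 12 months for one full round.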
For 70 villages we were able to obtain the carrying capacities as percentages of the population from De Vries et al. (2016). Since sufficient screening data were lacking for the other villages (at least two screening rounds and at least one disease case are required), these were randomly generated from an exponential distribution with mean 0.858%. This distribution was based on 143 carrying capacities estimated for this region by De Vries et al. (2016). We do not claim these estimates to be accurate, but consider them to be realistic enough for the purpose of the computational analysis presented in this case study. We also note that it is realistic to assume that required data will be available on a large scale in the future: the national HAT control program in the DRC recently started digital data collection (Bluesquare 2018, Hasker et al. 2018).
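As an illustration of how such values can be generated, the sketch below draws carrying capacities from an exponential distribution with mean 0.858 (in percent of the population). It uses Python's standard library rather than the Matlab environment of the study; the function name, the seed, and the count of 170 villages (240 minus the 70 with estimates) are assumptions made for the example.

```python
import random

MEAN_CARRYING_CAPACITY = 0.858  # percent of the village population, from the text


def draw_carrying_capacities(n_villages: int, seed: int = 0) -> list[float]:
    """Draw carrying capacities (in percent) from an exponential distribution."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    # random.expovariate takes the rate parameter, i.e. 1 / mean.
    return [rng.expovariate(1.0 / MEAN_CARRYING_CAPACITY) for _ in range(n_villages)]


# Hypothetical usage: villages without a data-based estimate.
capacities = draw_carrying_capacities(170)
```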
Baseline values for the impact-related parameters introduced in section 3 as well as other parameters used in the case study can be found in Table 1. We denote the sensitivity of the screening test and the confirmation test by $\mathit{sens}_{scr}$ and $\mathit{sens}_{con}$, respectively. Parameter treat represents the fraction of infected people who proceed to treatment. We note that time is measured in months. We consider a planning horizon of 36 months to examine how the practice-oriented policies, which look at the past (Equalization and Differentiation), the current situation (Max Cases), or 1 month ahead (Prevalence Increase), perform in the long term.

(Near) Optimal Methods
To identify which of the first three solution methods proposed in section 5 could serve as a suitable benchmark for the practice-oriented policies, we now consider their computation times and solution quality. To allow the calculation of the exact solution and hence of exact optimality gaps, we first consider small instances in which T ranges from 4 to 7 months, and M ranges from one to two mobile teams. The binary linear programming (BLP) approach uses a discretization consisting of 25 prevalence levels equally spaced between zero and the villages' carrying capacities. The exact solution was determined by applying this approach for a discretization containing all $2^T$ prevalence levels attainable. Table 2 shows the CPU times (sec.) and optimality gaps for the three methods.
Each of the methods yields optimal solutions or solutions within 0.1% of optimality for all instances. Yet, computation times for the BLP approach soon become impractical for a discretization using 25 prevalence levels. For the other approaches, solution times grow much more slowly, which renders them more suitable for large problem instances, such as instances considering up to 36 months, as in the case study at hand.
To assess the added value of the column generation approach over the iterated local optimization (ILO) approach, we also applied these two methods to T ∈ {12, 18, 24, 30, 36}. In each case, the approaches attain the same solution value. As the ILO is much faster, we use the solutions it produces as a reference in the remainder of this case study and refer to them as "optimized" (rather than "optimal") solutions.
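To make the discretization concrete, the sketch below builds a grid of 25 equally spaced prevalence levels for one village. It is an illustration only: the function name is hypothetical, and the choice to include both endpoints (zero and the carrying capacity) is an assumption, since the text does not specify this.

```python
def prevalence_grid(carrying_capacity: float, n_levels: int = 25) -> list[float]:
    """Equally spaced prevalence levels between 0 and the carrying capacity.

    Assumption: both endpoints are part of the grid.
    """
    step = carrying_capacity / (n_levels - 1)
    return [i * step for i in range(n_levels)]


# Hypothetical village with a carrying capacity of 12 infected persons.
grid = prevalence_grid(12.0)

# Size of the exact discretization (all 2^T levels) for the small instances.
exact_sizes = {T: 2 ** T for T in range(4, 8)}
```

For T between 4 and 7, the exact discretization thus contains between 16 and 128 levels per village.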

Performance of Practice Oriented Policies
Figure 4 compares the prevalence levels resulting from applying the policies introduced in section 5. Here, the Differentiation Policy divides villages into four equally sized classes. The first class contains the villages with highest $K_v$, etc. The policy assigns the classes 40%, 30%, 20%, and 10% of the screening capacity, respectively.
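The class construction described above can be sketched as follows. This is a simplified illustration, not the authors' implementation: the function name is hypothetical, and the handling of village counts that are not divisible by four (ceiling division) is an assumption.

```python
def class_share_of(carrying_capacity: dict[str, float],
                   shares: tuple[float, ...] = (0.40, 0.30, 0.20, 0.10)) -> dict[str, float]:
    """Map each village to the capacity share assigned to its class.

    Villages are ranked by decreasing carrying capacity K_v and cut into
    len(shares) classes of (nearly) equal size.
    """
    ranked = sorted(carrying_capacity, key=carrying_capacity.get, reverse=True)
    n_classes = len(shares)
    class_size = -(-len(ranked) // n_classes)  # ceiling division (assumption)
    return {v: shares[i // class_size] for i, v in enumerate(ranked)}


# Hypothetical carrying capacities for eight villages.
K = {"v1": 9.0, "v2": 1.0, "v3": 7.5, "v4": 0.2, "v5": 3.3, "v6": 2.1, "v7": 5.0, "v8": 0.9}
share = class_share_of(K)
```

Note that the returned value is the share of total capacity allocated to the whole class a village belongs to, not to the village itself.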
Without screening, the average expected number of people infected in the 240 villages during the next 36 months would be 341 persons. Hence, the solutions avert 52% (Equalization) up to 67% (Optimized) of average expected prevalence, which shows the substantial impact active case finding can have. Surprisingly, despite their simplicity, the Max Cases and Prevalence Increase Policies perform only 0.7% and 3.4% worse than optimized planning, respectively. The slightly poorer performance of the latter might well be caused by the S-shape of the prevalence progression function, which implies that prevalence growth is steepest for modest prevalence values, so that the policy forgoes villages with higher prevalence (yet lower prevalence increase). The Equalization and Differentiation Policies perform substantially worse.
To better understand these results, let us zoom in on the planning decisions recommended by the different methods. Figure 5a describes how the different methods "allocate" screening capacity over the population. We order the population along the x-axis in decreasing order of carrying capacity of the villages in which they live and depict on the y-axis the allocated percentage of screening capacity (in terms of people screened). A straight line from the origin would mean a perfectly equal distribution. The (near-)horizontal parts of the curve for Prevalence Increase support the hypothesis that it neglects several villages where prevalence is high but not strongly increasing. The curves for Optimized screening and the Max Cases Policy are rather similar. The main difference is that Max Cases screens slightly more people overall.
To understand why optimized screening outperforms Max Cases, we must zoom in on the timing of the screening rounds, which is analyzed in Figure 5b. Here, we depict the "average visit period" for people living in villages with a carrying capacity that is in the lowest quartile (Q1), the second quartile (Q2), and so on. (For example, visits in periods 10 and 20 yield 15 as the "average visit period.") The figure shows that optimized planning visits villages with a high carrying capacity relatively early: in comparison with Max Cases, the average visit period is 0.6 days earlier for villages in the highest quartile and 1.1 days earlier for villages in the second highest quartile. It thereby preventatively controls the epidemic in these villages. Max Cases, instead, is more reactive as it solely screens a village when the expected prevalence is relatively high. This insight will explain some of the observations presented below.
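The reactive character of Max Cases follows directly from its selection rule: villages are ranked by expected prevalence at the start of the period per unit of screening time. The sketch below illustrates this rule for a single planning period. It is a simplification under stated assumptions: it ignores the choice of cluster, fills the period greedily, and all names and numbers are hypothetical.

```python
def max_cases_selection(prevalence: dict[str, float],
                        screening_time: dict[str, float],
                        capacity: float = 1.0) -> list[str]:
    """Greedy selection of villages for one planning period.

    prevalence[v]     : expected number infected in v at the start of the period
    screening_time[v] : fraction of a period needed to screen v (r_v)
    """
    ranked = sorted(prevalence,
                    key=lambda v: prevalence[v] / screening_time[v],
                    reverse=True)
    chosen, used = [], 0.0
    for v in ranked:
        # Add the village only if it still fits in the remaining period.
        if used + screening_time[v] <= capacity + 1e-12:
            chosen.append(v)
            used += screening_time[v]
    return chosen


# Hypothetical example with three villages.
cases = {"a": 4.0, "b": 1.0, "c": 3.0}
r = {"a": 0.5, "b": 0.1, "c": 0.6}
selected = max_cases_selection(cases, r)  # ratios: a = 8, b = 10, c = 5
```

A village with a high carrying capacity but currently low expected prevalence receives a low rank under this rule, which is consistent with the later visits observed in Figure 5b.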
To assess the sensitivity of our results to changes in characteristics of our case study, we analyzed three additional variants. The first considers the situation when cases are concentrated in fewer villages. The second uses population numbers from the WorldPop database, which are estimated on the basis of satellite images (see Tatem et al. 2007). The third clusters the villages randomly. Results are described in Appendix C and confirm our findings: (1) solutions obtained for these cases avert 51%-68% of average expected prevalence, (2) the Max Cases and Prevalence Increase Policies perform only 0.8%-1.3% and 1.6%-3.0% worse than optimized planning, and (3) the other policies perform substantially worse. Numerical results presented in Appendix C suggest that our conclusions are also rather robust to excluding villages.

Accuracy of Capacity Estimation Method
Figure 6 depicts the accuracy of the capacity estimation method derived in section 4. The solid line represents the piecewise linear relationship between capacity and average expected prevalence estimated by this method. The dashed line represents the actual average prevalence level over the next 30 years when the number of people who can be screened per planning period equals 0, 100, 200, ..., 10,000, as attained by the optimized solution. (As explained, one could alternatively express capacity scenarios in terms of the number of mobile teams. However, as our case study is relatively small, this would yield few data points.) Despite neglecting routing considerations and the finite planning horizon, the capacity estimation method provides a rather accurate estimation of the capacity needed to reach a given prevalence level target. The average vertical difference between the curves equals 25 people (maximum difference: 45). The average horizontal difference (for the parts of the graph where it can be calculated) equals 330 persons or 5.5% of a team's capacity. We note that the method's accuracy appears relatively high for high prevalence levels and relatively low for low ones. For the part of the curve where capacity ≤ 3000, the average difference is 8% (measured as a percentage of the optimized solution value). For the part where 3000 < capacity ≤ 6000, this average difference is 54%, while for capacity > 6000, the average difference is 100%. This could be explained from the slow convergence toward eradication during the 30 years, which the method neglects by analyzing average prevalence over an infinite planning horizon. Next to model accuracy, which we shall discuss later, this provides a second argument to primarily use this result in the context of relatively high prevalence levels.
Although we leave formally proving this to future research, we hypothesize that the method provides a lower bound on the actual capacity required in case of an infinite planning horizon. (Note: our stylized model not only relaxes capacity constraints but also restricts solutions to constant screening intervals.) Our results suggest that the method generally provides lower bounds in case of a finite horizon as well.

Sensitivity Analysis: Carrying Capacities
As we cannot directly observe the carrying capacities of the villages, we estimate them from observables. De Vries et al. (2016) propose and fit a formula relating the carrying capacity to the average observed prevalence and the average screening frequency in the past 5 years. Due to stochasticity in prevalence levels, however, such estimates may be imprecise, which raises the question to what extent this impacts the quality of scheduling decisions. Figure 7 summarizes sensitivity analysis results. For each village, we randomly draw the "real" value of $K_v$ according to a uniform distribution on $[K_v(1 - \Delta), K_v(1 + \Delta)]$, determine the "real" optimized solution, and calculate the "real" value of the solutions that were based on the "incorrect" carrying capacities. This is repeated 100 times for each Δ ∈ {0.2, 0.4, 0.6, 0.8, 1.0}, which yields the depicted average and maximum observed optimality gaps.
The results show that solution quality is rather robust with respect to impreciseness. For example, the estimated "optimality" gap for the Max Cases Policy ranges from 1.0% for Δ = 0.2 to 12.3% for Δ = 1.0. Hence, even when the real carrying capacities deviate up to 100% from the assumed values, the estimated average gap for this policy equals only 12.3%. Furthermore, the maximum observed "optimality" gap is rather close to the average. For the solution obtained by the Max Cases Policy, for example, the maximum ranges from 1.5% for Δ = 0.2 to 20.3% for Δ = 1.0. Note that the actual optimality gap may be larger since we used the optimized solution for comparison.
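The sampling step of this experiment can be sketched as follows. The code only generates the perturbed "real" carrying capacities; solving for the optimized schedule and evaluating the policies is outside its scope. The function name, the seed, and the three example values of $K_v$ are hypothetical.

```python
import random


def perturb(values: list[float], delta: float, rng: random.Random) -> list[float]:
    """Draw each 'real' value uniformly from [k(1 - delta), k(1 + delta)]."""
    return [rng.uniform(k * (1 - delta), k * (1 + delta)) for k in values]


rng = random.Random(42)            # fixed seed for reproducibility
assumed_K = [0.4, 0.9, 2.5]        # hypothetical assumed carrying capacities
deltas = (0.2, 0.4, 0.6, 0.8, 1.0)  # values of Delta used in the text

# 100 scenarios per value of Delta, as in the described experiment.
scenarios = {d: [perturb(assumed_K, d, rng) for _ in range(100)] for d in deltas}
```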

Sensitivity Analysis: Screening Impact
There is an ongoing debate about the expected impact of active case finding (Welburn et al. 2016), which is known to vary among regions (Robays et al. 2004) and may change over time. As a consequence, the true impact of screening may deviate from the assumed impact, as quantified by parameter p. Figure 8 depicts how the quality of solutions obtained for baseline value p = 0.5 (which follows from the values in Table 1) is affected when p equals 0.1, 0.3, 0.7, or 0.9 in reality. Here, "optimized incorrect" refers to the optimized solution using p = 0.5. We use the optimized solution for the "real" values of p to estimate optimality gaps.
We observe that the estimated optimality gap for the optimized schedule using p = 0.5 is only 3.1%, 2.7%, and 9.4% when in reality p equals 0.1, 0.3, and 0.7, respectively. This shows that even a substantial over- or underestimation of impact does not necessarily have serious consequences. When p equals 0.9 in reality, however, sub-optimality increases to 50.2%. An explanation is that, as explained in section 6.3, optimized planning has the incentive to screen villages with low expected prevalence levels but high carrying capacities, so as to control the epidemic and prevent high prevalence levels. Such solutions, however, tend to overly focus on such villages in early planning periods when p is larger than the assumed value and thereby miss opportunities to address the epidemic in other villages.
The Max Cases Policy is more resistant to this because it focuses efforts on areas where expected prevalence is highest. As a consequence, it outperforms optimized planning by 4.5 and 21.4 percentage points when p equals 0.7 and 0.9 in reality, respectively. This strengthens the belief that this policy provides a good alternative to optimized planning. The Prevalence Increase Policy overly focuses on preventing prevalence from increasing, rather than maximizing prevalence decrease. The lack of focus on the latter becomes particularly visible when the screening impact is high. For example, the gap with optimized planning equals 38.7 percentage points when p equals 0.9 in reality. The schedules obtained by the Equalization and Differentiation Policies remain inferior to the optimized schedules, irrespective of p.

Sensitivity Analysis: Stochastic Participation
Our model implicitly assumes that the participation level in a screening round is fixed. In reality, however, participation is stochastic. This section examines how the quality of planning decisions is affected when participation in screening round n in village v, $\mathit{part}_{vn}$, is uniformly distributed on $[\mathit{part}(1 - \Delta), \mathit{part}(1 + \Delta)]$, with Δ ∈ {0.05, 0.10, 0.15, 0.20, 0.25}. We randomly generate $\mathit{part}_{vn}$ for our baseline case, determine the "real" optimized solution (i.e., the optimal solution value when participation is known in advance) and determine the "real" value of solutions that were based on the assumption that $\mathit{part}_{vn} = \mathit{part}$. This process is repeated 100 times for each value of Δ. Figures 9a and 9b depict the average and maximum observed optimality gaps.
The results again show that the Max Cases Policy outperforms the other practice-oriented policies and optimized planning. The latter could again be explained from the insight presented in section 6.3: optimized planning has the incentive to screen villages with low expected prevalence levels but high carrying capacities. This is, however, too conservative in expectation. For example, if participation is high for just one screening round, this can substantially reduce screening efforts needed afterwards (cf. De Vries et al. 2016).

Conclusions and Discussion
In pursuit of Sustainable Development Goal 3, considerable global effort is directed toward elimination and eradication of infectious diseases in general and Neglected Tropical Diseases in particular. For various such diseases, the deployment of mobile screening teams forms an important instrument to reduce prevalence toward the disease elimination goals. There is considerable variety in planning methods for the deployment of these mobile teams in practice, but little understanding of their effectiveness. Moreover, there appears to be no systematic understanding of the relationship between capacity, for example, in number of teams, and progress toward the goals.
We have addressed these research topics for the neglected tropical disease HAT and established that the planning problem is strongly NP-Hard. Moreover, we presented exact solution methods and a relatively simple approximation of the relationship between capacity and prevalence. We empirically analyzed the methods for a case study on a HAT region in the Democratic Republic of Congo, and presented two practically feasible planning policies which yield near optimal solutions. Moreover, the presented approximation of prevalence as a function of capacity appears rather accurate for the case study.
To assess the quality of solution methods we first developed and tested two (near) exact methods and an iterative heuristic. For the case study at hand, the heuristic delivers solutions which are very close to optimal (within 0.1%), referred to as optimized solutions. Moreover, it is fast enough to provide optimized solutions for the larger case study instances.
For the purpose of practical implementation in remote rural areas, we developed simple, practical policies, and benchmarked their performance against the optimized solutions. The Equalization Policy, which strives to visit all villages equally often, performed poorest. It was outperformed by a generalization of this policy, the Differentiation Policy, which partitions the set of villages into classes, and subsequently strives to equalize visits per class. The Differentiation Policy shares some commonalities with the current WHO policy, which distinguishes three classes. The prevalence resulting from the Differentiation Policy is still substantially above optimized prevalences.
The Max Cases Policy prioritizes in each planning period the villages which have the highest prevalence at the beginning of the planning period, per unit of time required to screen the village. The Prevalence Increase Policy prioritizes the villages with the largest growth in prevalence if left unscreened, per unit of time required to screen the village. Despite their simplicity, the Max Cases and Prevalence Increase Policies yield decisions that are only 0.7% and 3.4% worse than optimized decisions in the baseline case. The near optimality of these policies and the slightly better performance of the Max Cases Policy remain valid for three variants of the baseline case. As implementation of the Max Cases Policy only requires monthly (access to) current prevalence estimates, it appears intuitive and feasible to implement. Given its near optimality, we strongly recommend conducting an experimental study to implement and empirically evaluate the Max Cases Policy.
The sensitivity analyses suggest that solution quality is relatively insensitive to inaccuracy of input data for the better performing solution methods. This is important because of the difficulty of collecting accurate data in relevant settings. Larger inaccuracies, however, likely result in substantial sub-optimality of solutions. Thus we stress that the investment in mobile screening teams is considerably more effective when accompanied by investments in data collection for reliable parameter setting. We note that particularly the Max Cases Policy appears robust to inaccuracies, which strengthens the belief that this policy provides a good alternative to optimized planning.
For the case study, our computational results show that the presented approximation of prevalence as a function of capacity forms a practical proxy. The insight it can provide into the capacity required to achieve and maintain a prevalence level, which has been repeatedly problematic in the past, is particularly valuable. The capacity estimates make it possible to set or update capacity cost-effectively, instead of relying on experimental budget reductions which undo previously achieved prevalence reductions.
Our results indicate very substantial reductions in prevalence levels from mobile screening for the population of 73,512 people considered. More specifically, they estimate prevalence without screening to increase to 494 over a period of 3 years, whereas optimized screening can bring it down to 81 and the Max Cases Policy to 83. This confirms the empirical finding that the effectiveness of mobile teams greatly contributes toward elimination (Franco et al. 2017). At the same time, even optimal screening for 3 years still results in a prevalence level exceeding 1 case per 1000, more than 10-fold the goal set by the WHO (which implies a total prevalence target of seven or lower for the case at hand). Despite the effectiveness, we therefore doubt whether mobile screening team deployment will lead to the elimination goals in the case study region, especially as it will remain tempting to cut the large screening expenses when prevalences become low. Our results may be viewed to confirm that alternative, innovative approaches may be required to achieve the WHO targets. Current developments to improve the accessibility of treatment, for instance through the oral medication Fexinidazole, may lead in this direction. One may doubt whether the presented functions for expected prevalence remain accurate when prevalence levels approach zero, and hence whether the presented models continue to give reliable insights. To better understand how to achieve the ambitious elimination goals of the WHO, we therefore call for the development of (discrete event simulation) models which simulate cases explicitly, rather than relying on continuous expected prevalence. Such models also make it possible to analyze policies which update prevalence functions and allocation decisions dynamically, based on cases found. The cases found can then additionally include cases found by passive case finding, which have been disregarded in our study.
Future modeling work could also examine innovative case finding approaches. There have been promising pilot studies involving mini, motorcycle-based teams (Snijders et al. 2020) and a targeted door-to-door (TDD) approach (Koffi et al. 2016). The expected impact of a screening round on prevalence likely differs in comparison to traditional teams, due to differences in participation and diagnostic algorithms. This raises novel questions regarding the optimal case finding approach depending on context, how optimal screening frequencies differ per approach, and how different types of teams can complement each other (cf. Snijders et al. 2020).
Finally, future research could investigate more elaborate models or approximations of prevalence progression. Examples include models that incorporate multiple disease hosts (Castaño et al. 2020, Rock et al. 2015, 2018) and models that explicitly distinguish multiple disease stages, for example, the exposed, asymptomatic, and symptomatic phases (cf. Deo et al. 2013, 2015). An interesting resulting question then arises regarding the cost effectiveness of additional data collection: are the resulting gains towards elimination worth the additional costs of implementation and village level data collection? (cf. De Vries and Van Wassenhove 2020).
The WHO has not only formulated elimination goals for HAT, but also for other infectious diseases. For several of these diseases, among which is tuberculosis, screening by mobile teams is an important instrument in the efforts toward elimination. The methods we propose, and in particular the practical policies that perform very well, are general enough to be applied to other diseases. Of course, such application requires the present HAT-specific average expected prevalence function to be replaced by another function, specific to the disease under consideration. It will be of interest to learn whether the Max Cases Policy and Prevalence Increase Policy again perform so well, and hence are more broadly of value to effectively reach elimination goals. Further research in this direction is encouraged.

Proofs
Proof of Proposition 1.
PROOF. Consider a three-partition instance I with target value B and positive integers $r_i$ satisfying $B/4 < r_i < B/2$ for $i \in \{1, \ldots, 3T\}$ and $\sum_i r_i = TB$. The three-partition problem requires one to decide whether the integers can be partitioned into T triplets (i, j, k) such that $r_i + r_j + r_k = B$ for each of the T triplets. The three-partition problem is known to be strongly NP-complete (Garey and Johnson 1979).
A polynomial-time reduction from three-partition to MSTD is obtained as follows. We consider T + 1 planning periods, one cluster of villages, and one screening team. For each integer $i \in \{1, \ldots, 3T\}$, we introduce a village v(i), resulting in 3T villages. For village v(i), we set $N_{v(i)} = r_i$ and $r_{v(i)} = r_i/B$, which are the values used in the remainder of this proof. There is one team available. We set $A^*_v = 0$, which implies that prevalence in a given village v(i) equals 1 until the first period in which v(i) is screened and is reset to zero starting from the end of that first screening period (because of Assumption 1).
We now claim that the three-partition instance I is a yes-instance if and only if the MSTD instance has a solution of value at most $\frac{1}{2}TB$. The proof is straightforward and the main intuition is depicted in Figure A1.
If. Let $S = (s_1, s_2, \ldots, s_T)$ be a solution for I satisfying $r_i + r_j + r_k = B$ for each triplet $s_t$, t = 1, ..., T. Then a solution for MSTD is formed by simply maintaining the triplets of corresponding villages (v(i), v(j), v(k)). Now, schedule the villages corresponding to the first triplet $s_1$ at period 1, the villages corresponding to the second triplet at period 2, et cetera, until the villages of the final triplet $s_T$ are scheduled at period T. Notice that the screening of the villages corresponding to integers in triplet $s_t$ is assumed to take effect at time t, t = 1, ..., T.
The prevalence level at the beginning of the planning horizon equals $\sum_{i=1}^{n} N_{v(i)} = TB$, where n = 3T is the number of villages. Hence this is also the prevalence level during the first period, as screenings take effect when the period ends. More generally, the scheduling of villages $v(i) \in s_t$ reduces the prevalence level by $\sum_{i \in s_t} N_{v(i)} = B$ at time t, while the prevalence level during period t sums to $(T - t + 1)B$, as depicted in Figure A1. Hence the average expected prevalence level equals $\frac{1}{T+1} \sum_{t=1}^{T+1} (T - t + 1)B = \frac{1}{2}TB$.
Only if. Now suppose the MSTD instance has a solution of value at most $\frac{1}{2}TB$. As there is one team available in each period t, t = 1, ..., T, the sum of the screening capacity consumptions $r_{v(i)} = r_i/B$ of the villages v(i) screened in period t cannot exceed 1. As a result, a population of at most B people can be screened in each period. Starting with the given prevalence level of TB, and reducing the prevalence level per period with the maximum attainable value of B, we arrive at a total minimum average prevalence level of $\frac{1}{2}TB$, as is required for a yes-instance of MSTD. This value is only attained if in each period t, t = 1, ..., T + 1, the reduction in prevalence is equal to the maximum possible reduction per period of B. As $r_{v(i)} = r_i/B$, it then follows from the fact that $B/4 < r_i < B/2$ for $i \in \{1, \ldots, 3T\}$ that exactly three villages are screened in each period t, t = 1, ..., T. As the screening capacity consumptions of the villages v(i), v(j), v(k) screened in the same period t add up to 1, $r_i + r_j + r_k = B$. Hence, the T triplets of the village indices form a (certificate for a) yes-answer for three-partition instance I.
Note that the decision version of MSTD is in NP since we can calculate the value of a solution in at most $|V| \cdot |T|$ steps: for each village and for each of the time intervals $\tau_{vn}$, one needs to determine $A_{vn}$, which costs a constant amount of time. Next, the solution value is calculated by means of Equation (8). This completes the proof that the decision version of MSTD is strongly NP-complete.
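The value attained in the "If" direction can be checked numerically. The sketch below is an illustration under the assumptions of the reduction as read from the proof (every village starts fully infected, village v(i) has population $r_i$, and a screening takes effect at the end of its period); the function name and the example instance are hypothetical.

```python
from fractions import Fraction


def average_prevalence(triplets: list[tuple[int, int, int]], B: int) -> Fraction:
    """Average prevalence over T + 1 periods when triplet t is screened in period t."""
    T = len(triplets)
    prevalence = T * B                 # level at the start of the horizon
    total = 0
    for t in range(T + 1):
        total += prevalence            # level during period t + 1
        if t < T:
            # Screening takes effect at the end of the period.
            prevalence -= sum(triplets[t])
    return Fraction(total, T + 1)


# Hypothetical yes-instance with B = 20 and all r_i strictly between 5 and 10.
B = 20
triplets = [(6, 6, 8), (6, 7, 7), (6, 6, 8)]
value = average_prevalence(triplets, B)   # levels per period: 60, 40, 20, 0
```

For this instance the average equals 30, which is $\frac{1}{2}TB$ with T = 3 and B = 20.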
Proof of Proposition 2.
PROOF. Let $f_{vn}$ denote the average expected prevalence level in village v between screening rounds n and n + 1. The following lemma states how the long-term average expected prevalence level relates to $\tau_v$.
LEMMA 1. Screening village v with constant interval $\tau_v$ yields the average prevalence level given by Equation (A1).
PROOF. Our summation is the Cesàro mean of the sequence $\{f_{vn}\}_n$. De Vries et al. (2016) show that this sequence monotonically converges to the value defined in Equation (A1). The fact that the Cesàro mean of a convergent sequence yields the limit value when n → ∞ (see Hardy 2000) proves our result.
The following lemma implies that we can get rid of the maximum operator in Equation (A1):
LEMMA 2. There exists an optimal solution to Problems (10)-(12) in which the screening interval is at least $\tau^* = \frac{-\log(1 - p)}{\kappa} > 0$ for each village.
PROOF. Suppose there exists an optimal solution with $\tau_v < \tau^*$. Then increasing $\tau_v$ to $\tau^*$ does not change the value of function (A1) and does not violate capacity constraint (11). The fact that $\tau^* > 0$ follows from the assumption that p > 0.
Enforcing this lower bound implies that the second term in Equation (A1) is nonnegative. Based on this, Problems (10)-(12) can be formulated as an LP problem in the variables $\pi_v = 1/\tau_v$, with $\pi^* = 1/\tau^*$. This problem can be seen as a continuous knapsack problem with capacity M, items v, weights $r_v$, and a value for each item. The optimal solution to this problem is to "select" items in descending order of the ratio of value over weight (Kellerer et al. 2004). This corresponds to ordering the villages in descending order of this ratio, and (in this order) setting $\pi_v = \pi^*$ (i.e., $\tau_v = \tau^*$) if remaining capacity suffices and setting $\pi_v$ to the minimum possible screening frequency otherwise.
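The greedy rule for the continuous knapsack problem can be illustrated in generic form. The sketch below is the textbook version, in which the first item that no longer fits is taken fractionally; it does not use the item values of the LP above, and the function name and numbers are hypothetical. A selected fraction of 1 plays the role of $\pi_v = \pi^*$.

```python
def continuous_knapsack(values: list[float], weights: list[float],
                        capacity: float) -> list[float]:
    """Return the selected fraction of each item, greedy by value/weight ratio."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i],
                   reverse=True)
    fractions = [0.0] * len(values)
    remaining = capacity
    for i in order:
        if remaining <= 0:
            break
        # Take the whole item if it fits, otherwise the fitting fraction.
        take = min(1.0, remaining / weights[i])
        fractions[i] = take
        remaining -= take * weights[i]
    return fractions


# Generic example: ratios are 6, 5, and 4, so items are taken in listed order.
selection = continuous_knapsack([60.0, 100.0, 120.0], [10.0, 20.0, 30.0], 50.0)
```

Sorting dominates the running time, so the rule runs in O(n log n) for n villages.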

BLP Approach
The binary linear programming (BLP) approach takes formulation (1)-(4) as a starting point and tackles the non-linearity of function $B_v(\tau_v)$ by discretizing the prevalence progression function $f_v(s)$. We define a discretization as an ordered set of prevalence levels $F_v = \{f_v(s_1), f_v(s_2), \ldots\}$, where $f_v(s_1) < f_v(s_2) < \ldots$. Here, $f_v(s_1)$ is a lower bound on attainable prevalence levels.
Let $i \in F_v$ be the prevalence level of village v at the beginning of planning period t, and $j \in F_v$ be the prevalence level of village v at the end of period t. Notice that j depends on whether village v is screened in period t. Binary parameter $A^1_{vij}$ equals 1 if screening village v, which is at prevalence level i at the current period's beginning, results in prevalence level j at the next period's beginning, and 0 otherwise. Similarly, binary parameters $A^0_{vij}$ reflect the prevalence level transitions in case village v is not screened in period t.
Since we assume each screening round takes place at the end of a planning period (see section 3), the average expected prevalence level during period t only depends on the prevalence level i at the beginning of that period.Let parameters b vi represent the corresponding average expected prevalence level.Furthermore, let variables z vit indicate whether or not village v encounters prevalence level i ∈ F v at the beginning of period t.For the beginning of period 1, the expected prevalence level is indicated by binary parameters ζ vi1 , and we set z vi1 correspondingly.Using this notation, the planning and allocation problem can be formulated as the following BLP problem: Here, Equations ( B1)-(B5) model the discretized objective function.The other constraints have the same interpretation as in formulations ( 1)-( 4).We observe that both the number of constraints and the number of variables are O jVj Á jT j Á max v fjF v jg ð Þ .The proposed discretization may restrict the optimization to an incomplete set of relevant prevalence levels and hence may imply incorrect solution values and sub-optimal solutions.Discretized formulations can, however, be ensured to be exact in the following two ways.First, we can pre-calculate all attainable prevalence levels and include them into F v .Since, in principle, the number of possible prevalence levels grows exponentially with T, this is typically only feasible when T is small.
As an alternative, we can repeatedly solve the model while adding the actual set of prevalence levels $F_v^a$ encountered by each village to $F_v$. Details are provided in Algorithm 1. First, we set parameters $A^1_{vij}$ and $A^0_{vij}$ optimistically: the prevalence level $j$ at the beginning of the next period is the tightest lower bound on the actual prevalence level obtained when starting the current period with prevalence level $i$. Second, we solve the BLP, determine the actual set of prevalence levels $F_v^a$ encountered in each village, add this set to the existing set of prevalence levels $F_v$, update parameters $A^1_{vij}$ and $A^0_{vij}$, and re-solve the BLP problem. This process is repeated until we find a solution in which the prevalence levels encountered mirror the actual prevalence levels. Optimality of this solution is guaranteed by the fact that the quality of every other solution is represented optimistically by the choice of the parameters $A^1_{vij}$ and $A^0_{vij}$. Convergence of the algorithm is implied by the observations that the algorithm always terminates when a solution is obtained for the second time and that the number of possible visit patterns is finite. Computation times are a major drawback of this approach.
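The optimistic rounding and one refinement round can be illustrated for a single village with a fixed visit pattern. This is a minimal sketch, not the authors' Algorithm 1: the one-period dynamics in `step` (capped multiplicative growth and a fixed cure fraction) and all function names are hypothetical stand-ins for the prevalence progression function $f_v(s)$, and no BLP is solved here.

```python
from bisect import bisect_right

def round_down(levels, value):
    """Largest level in the sorted list `levels` that does not exceed `value`."""
    idx = bisect_right(levels, value) - 1
    if idx < 0:
        raise ValueError("value lies below the lowest discretization level")
    return levels[idx]

def step(prev, screened, growth=1.5, cap=0.10, cure=0.8):
    """Hypothetical one-period dynamics: capped growth, then an optional screening."""
    nxt = min(cap, prev * growth)
    return nxt * (1.0 - cure) if screened else nxt

def trajectories(levels, start, pattern):
    """Return (optimistic, actual) start-of-period prevalence levels for one village."""
    optimistic, actual = [round_down(levels, start)], [start]
    for screened in pattern:
        # Optimistic: apply the dynamics, then round down to a level in the set.
        optimistic.append(round_down(levels, step(optimistic[-1], screened)))
        # Actual: apply the dynamics without any rounding.
        actual.append(step(actual[-1], screened))
    return optimistic, actual

def refine(levels, start, pattern):
    """Add the actually encountered levels to the discretization (one round)."""
    _, actual = trajectories(levels, start, pattern)
    return sorted(set(levels) | set(actual))
```

Because the assumed dynamics are monotone, the rounded trajectory never exceeds the actual one, which is the sense in which the discretized model is optimistic. Once the encountered levels have been added, the two trajectories coincide for that pattern.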

Column Generation Approach
An alternative approach is to formulate the problem in terms of selecting a visit schedule or visit pattern for each of the villages. Let $P$ denote the set of patterns, each of which can be characterized by a binary vector in $\{0,1\}^{|T|}$. Subset $P_t$ represents the patterns in which a screening round takes place at time $t$. Furthermore, let variables $x_{pv}$ indicate whether pattern $p$ is assigned to village $v$ and let $b_{pv}$ denote the corresponding average expected prevalence level. Then the planning and allocation problem can be formulated as:
A main disadvantage of this formulation is that the number of patterns equals $2^{|T|}$, and hence that the number of variables grows exponentially with $|T|$.
Our third proposed solution approach is therefore based on column generation.
As is standard practice, our column generation approach starts from solving the so-called master problem (MP), that is, the LP relaxation of problem (B9)-(B13), using only a subset of the visit patterns. The resulting problem is referred to as the restricted master problem (RMP). Next, "promising" patterns are identified in the so-called pricing problem and added to the RMP. This process is repeated until no more promising patterns can be found. The resulting set of patterns is then added to Equations (B9)-(B13), which can subsequently be solved to optimality using integer programming techniques. Notice that this approach is not necessarily exact, as the optimal solution to Equations (B9)-(B13) may require patterns that were neither included initially nor generated while solving the LP relaxation.
Pricing Problem. For a given solution, the pricing problem for village $v$ corresponds to finding a pattern with negative reduced costs. Let $c_v$ denote the cluster village $v$ belongs to and let $\gamma_v$ and $\gamma_{ct}$ represent the dual variables corresponding to constraints (B10) and (B11). Then the reduced costs of column $p$ are given by (cf. Desaulniers et al. 2005, Desrosiers and Lübbecke 2005):
Consequently, the pricing problem is defined as:
LP duality implies the following lower bound on the solution value of the LP relaxation of our problem, $Z_{MP}$, and hence of $Z$ (Desrosiers and Lübbecke 2005).
Here, $Z_{RMP}$ denotes the solution value of the RMP.
In solving the pricing problem, we again encounter the difficulties caused by the non-linearity of the prevalence progression function. We deal with this as follows. First, we again discretize the progression function $f_v(s)$, yielding the set of prevalence levels $F_v$. Next, we build the graph $G(F_v)$ depicted in Figure B1. Each column of nodes represents one period, and each node within a column represents a prevalence level $i \in F_v$. A visit pattern corresponds to a sequence of red and blue arcs, which model the sequence of screening decisions. A blue (red) arc from node $i$ in period $t$ to node $j$ in period $t + 1$ represents the situation where village $v$ has prevalence level $i$ at the beginning of period $t$ and is screened (not screened) in period $t$, resulting in prevalence level $j$ at the beginning of period $t + 1$. As the node set forms a discretization of the possible prevalence levels, the prevalence level of node $j$ may not exactly equal the prevalence level $j^*$ resulting from screening at prevalence level $i$. We choose $j = \max\{j' \in F_v : j' \le j^*\}$, that is, the node with the highest prevalence level not larger than $j^*$. Hence the resulting prevalence level $j$ is a lower bound for the actual resulting prevalence level. Parameter $b_i$ denotes the average expected prevalence level in a given period when prevalence level $i$ represents the prevalence level at the beginning of the period.
It can easily be verified that the length of a given $s$-$t$ path provides a lower bound on the reduced costs of the corresponding visit pattern. Consequently, the shortest path obtained for any discretization $F_v$ yields a lower bound on $rc^*_v$ and hence can serve to provide a lower bound on $Z$ (see Equation (B16)). Moreover, if the prevalence levels visited by a given $s$-$t$ path mirror the actual prevalence levels encountered, its length exactly equals the reduced costs. This implies that one can obtain an exact solution method for the pricing problem by repeatedly solving the shortest path problem, each time adding the actual prevalence levels encountered along the path found to $F_v$. Convergence of the algorithm is again guaranteed by the observation that the algorithm terminates as soon as the same path is encountered for the second time, and the number of possible paths is finite. Optimality is guaranteed by the observation that the length of the shortest path found equals the reduced costs of the corresponding visit pattern. Since the length of each alternative path provides a lower bound on the reduced costs of its corresponding pattern, there is no alternative pattern having lower reduced costs.
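The shortest path step on the layered graph can be sketched as a small dynamic program. This is an illustration under stated assumptions rather than the authors' implementation: the reduced cost formula is not reproduced in this text, so `price[t]` is a hypothetical stand-in for the dual-based charge of screening in period $t$, and `avg_cost` and `next_level` are hypothetical callables playing the roles of $b_i$ and the prevalence transition.

```python
from bisect import bisect_right

def cheapest_pattern(levels, start_idx, avg_cost, next_level, price):
    """Dynamic program over the layered graph: one layer per period, one node per level.

    levels     : sorted list of prevalence levels (the discretization)
    start_idx  : index of the level at the beginning of the first period
    avg_cost   : avg_cost(level) -> cost of one period (plays the role of b_i)
    next_level : next_level(level, screened) -> actual level at the next period's start
    price      : price[t] = hypothetical charge for screening in period t
    """
    # best[i] = (cost, pattern) of the cheapest path reaching level index i
    best = {start_idx: (0.0, ())}
    for t in range(len(price)):
        nxt = {}
        for i, (cost, pattern) in best.items():
            for screened in (0, 1):
                actual = next_level(levels[i], screened)
                # Round down to a node; clamping at 0 assumes levels[0] is a
                # lower bound on attainable prevalence levels.
                j = max(bisect_right(levels, actual) - 1, 0)
                cand = (cost + avg_cost(levels[i]) + screened * price[t],
                        pattern + (screened,))
                if j not in nxt or cand < nxt[j]:
                    nxt[j] = cand
        best = nxt
    return min(best.values())
```

The function returns a pair `(cost, pattern)`. Keeping only the cheapest path per node is valid because future costs depend only on the node reached, and because levels are rounded down the returned cost is a lower bound on the cost of the pattern under the assumed dynamics.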
Heuristic Pricing. A common strategy to accelerate a column generation approach is to search for any column with negative reduced costs rather than the one with the most negative reduced costs (Desaulniers et al. 2002). Since many iterations tend to be needed to find the optimal pattern using the exact method, we also follow this strategy. Specifically, we first try to find a promising pattern using the local search algorithm presented next. If this fails, we apply the shortest path approach for a dense discretization (see Figure B1).
The idea behind the local search method is to repeatedly fix the schedule for village $v$ for $|T| - k$ planning periods and to optimize the schedule for the remaining $k$ periods by enumerating all options. We do so for $k = 1$ until the algorithm converges and proceed with $k = 2$ afterwards. For a given value of parameter $k$, the algorithm is explained below. As with the iterated local optimization method, we run the algorithm twice, using starting solutions $x_v = 0$ and $x_v = 1$.
Warm Start. A second acceleration strategy is to give the column generation algorithm a "warm start" by providing it with the patterns from the solutions obtained by iterated local optimization.
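The local search idea can be sketched as follows. This is a hedged illustration, not the authors' algorithm listing: `cost` is a hypothetical callable that returns the reduced costs of a binary visit pattern, and the sketch assumes that every choice of $k$ free periods is tried, which the text does not state explicitly.

```python
from itertools import combinations, product

def local_search(pattern, cost, k):
    """Re-optimize every choice of k periods by enumeration until no improvement remains."""
    best = tuple(pattern)
    best_cost = cost(best)
    improved = True
    while improved:
        improved = False
        for free in combinations(range(len(best)), k):
            # All other periods stay fixed; enumerate the 2^k options for the free ones.
            for values in product((0, 1), repeat=k):
                cand = list(best)
                for pos, val in zip(free, values):
                    cand[pos] = val
                cand = tuple(cand)
                cand_cost = cost(cand)
                if cand_cost < best_cost:
                    best, best_cost, improved = cand, cand_cost, True
    return best, best_cost

def price_heuristically(horizon, cost):
    """Run k = 1 to convergence, then k = 2, from the all-zero and all-one patterns."""
    results = []
    for start in ((0,) * horizon, (1,) * horizon):
        current, _ = local_search(start, cost, 1)
        results.append(local_search(current, cost, 2))
    return min(results, key=lambda r: r[1])
```

In a column generation loop, the returned pattern would be added to the RMP only if its cost is negative. Otherwise one would fall back on the shortest path approach, as described above.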

Iterated Local Optimization
For a given planning period $t$ and solution $(x, y)$, which satisfies $y_{ct} = 0$ for all $c \in C$, let $b_{ct}(x)$ denote the solution value improvement when choosing to additionally send a mobile team to cluster $c$ in period $t$. When considering the planning for all other periods to be fixed, it is optimal to send mobile teams to the $M$ clusters for which $b_{ct}(x)$ is largest. This idea is elaborated in the iterated local optimization (ILO) approach, as formally explained in Algorithm 4. The algorithm repeatedly selects a period $t$, sets $y_{ct} = 0$ for all clusters $c$, and reoptimizes the planning for that period while keeping the planning for all other periods fixed. The solution thus found can be identical to the initial solution, or can be another solution with the same or lower solution value. To improve the quality of the final solutions obtained, we run the algorithm for two starting solutions, $x = 0$ and the initially infeasible solution $x = 1$, and continue until no further improvements are found.
Determining $b_{ct}(x)$ is an optimization problem in itself. For a given solution $x$ and period $t$, let $b_{vt}(x)$ denote the decrease in average prevalence level when choosing to screen village $v \in V_c$ instead of choosing not to. Then the problem is to find the subset of villages in cluster $c$ that maximizes the total prevalence level decrease while not exceeding screening capacity. We note that this is equivalent to a knapsack problem with values $b_{vt}(x)$, weights $r_v$, and capacity 1. We solve this problem as a BLP (see Kellerer et al. 2004).
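The reoptimization of a single period can be sketched as a knapsack per cluster followed by a selection of the best clusters. This is an illustrative sketch only: the paper solves the knapsack as a BLP, whereas the code below swaps in brute-force enumeration, which is workable only for small clusters. The dictionaries `values` (standing for $b_{vt}(x)$) and `weights` (standing for $r_v$), and both function names, are hypothetical.

```python
from itertools import combinations

def best_village_subset(values, weights, capacity=1.0):
    """Brute-force knapsack: maximize total value subject to total weight <= capacity."""
    villages = list(values)
    best_gain, best_subset = 0.0, ()
    for size in range(1, len(villages) + 1):
        for subset in combinations(villages, size):
            # Small tolerance guards against floating-point noise in the weight sum.
            if sum(weights[v] for v in subset) <= capacity + 1e-9:
                gain = sum(values[v] for v in subset)
                if gain > best_gain:
                    best_gain, best_subset = gain, subset
    return best_gain, best_subset

def reoptimize_period(clusters, values, weights, teams):
    """Send the available teams to the clusters with the largest knapsack gain."""
    gains = {
        c: best_village_subset({v: values[v] for v in vs},
                               {v: weights[v] for v in vs})
        for c, vs in clusters.items()
    }
    chosen = sorted(gains, key=lambda c: gains[c][0], reverse=True)[:teams]
    return {c: gains[c][1] for c in chosen}
```

Here `clusters` maps each cluster to its list of villages and `teams` plays the role of $M$. The result maps each selected cluster to the villages screened there in the period under consideration.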

Results on Case Study Variants
Table C1 describes results for three variants of the baseline case. In the first variant, which we refer to as Concentrated Cases, $K_v$ has been randomly generated from a U-squared distribution with offset parameter α = 0.00237442 and vertical scale parameter β = 8.580473. The mean of this function is the same as the mean of the function used for the baseline case. The WorldPop variant uses 100 meter resolution population estimates from the WorldPop database (Tatem et al. 2007). Population of village $v$ is estimated as the total population of grid cells within a 3.5 km radius. We fitted this radius by comparing the resulting average estimated population with the average population in the baseline case. Finally, the Random Clustering variant randomly assigns villages to the six clusters, while keeping the number of villages per cluster equal.
We next assess the impact of excluding villages on our results. Specifically, we randomly select 10% of the villages to be removed from the baseline case, apply the policies, and determine the resulting expected average prevalence levels. We repeat this process 100 times. Table C2 reports the average and maximum optimality gap for the policies. It shows that the gaps are hardly affected and our general conclusions remain valid: Max Cases and Prevalence Increase yield near optimal decisions and the former slightly outperforms the latter.

Figure 2 Piecewise Linear Relationship between Capacity M and Average Expected Prevalence Level

Figure 3 Map of the 240 Villages in Kwamouth Included in Our Case Study. Clustering is Indicated by Color. Population is Indicated by Node Size

Figure 4 Average and Final (end of planning horizon of 36 months) Expected Number of People Infected in the 240 Villages for the Optimized Schedule and the Schedules Following from the Planning Policies

Figure 7 Results of the Sensitivity Analysis on Carrying Capacity Estimates (a) Average Optimality Gap (%) based on 100 Draws and (b) Maximum Optimality Gap (%) in 100 Draws

Figure 9 Results of the Sensitivity Analysis on Carrying Capacity Estimates (a) Average Optimality Gap (%) based on 100 Draws and (b) Maximum Optimality Gap (%) in 100 Draws

Figure A1 Prevalence Level Over Time in the MSTD Instance if the 3-Partition Instance is a yes-Instance

Figure B1 Graph Used to Solve the Pricing Problem for Village v as a Shortest Path Problem

Figure 1 The Black Line in this Figure Represents the Prevalence Progression Function $f_v(s)$. After Screening Round n, We are at the Stage of Progression Corresponding to the Left Red Point, after which the Prevalence Level Develops to the Right Red Point. Next, Screening Round n + 1 Decreases the Prevalence Level with Fraction p, Causing us to End up in an "Earlier" Stage of Progression: the Stage of Progression Corresponding to the Left Blue Point. Afterwards the Prevalence Level Develops to the Right Blue Point and the Process Repeats

that is, the minimum number of planning periods needed to visit all villages once. Then, in period t, we first determine for each cluster c the number N(t, c) of persons who were not screened in the past $\tau_u$ planning periods. Next, the policy selects for screening in period t the M clusters with highest N(t, c).
ec . Next, in period t, we first determine for each cluster c the number N(t, c) of people for whom the time since the last screening equals at least their target interval. Assignment of teams to clusters and to villages within the clusters is done as in the Equalization Policy. The present WHO policy is a Differentiation Policy.
6. The Max Cases Policy strives to maximize the number of cases detected per planning period.

Table 1 Baseline Case Parameters

Average Expected Prevalence Level vs. Capacity (# people screened per planning period). Dashed line: Average Prevalence Level Over the Next 30 Years Corresponding to the Optimized Solution. Solid line: Average Prevalence Level Over an Infinite Horizon, as Estimated by the Method Derived in Section 4

Table C1 Average and Final (end of planning horizon of 36 months) Expected Number of People Infected in the 240 Villages for the Optimized Schedule and the Schedules Following from the Planning Policies, for the Three Variants of the Baseline Case

Table C2 Average and Maximum Optimality Gap when Excluding 10% of the Villages from the Baseline Case, Based on 100 Random Draws

$V_c$: Subset of villages in cluster c
$y_{ct}$: 1 if cluster c is visited in period t; 0 otherwise
$x_{vt}$: 1 if village v is visited in period t; 0 otherwise
$r_v$: Fraction of available time per mission consumed when visiting village v
$M$: Number of mobile screening teams
$B_v(\tau_v)$: Average prevalence level in village v during the planning horizon, given screening intervals $\tau_v$
$A^0_{vij}$: 1 if prevalence level j succeeds prevalence level i in village v if it is not screened; 0 otherwise
$b_{vi}$: Average expected prevalence in a period if at the beginning of the period prevalence level i is encountered
$b_{ct}(x)$: Solution value improvement when choosing $y_{ct} = 1$ instead of $y_{ct} = 0$, given solution x