A spatial genetics approach to inform vector control of tsetse flies (Glossina fuscipes fuscipes) in Northern Uganda

Abstract Tsetse flies (genus Glossina) are the only vector for the parasitic trypanosomes responsible for sleeping sickness and nagana across sub‐Saharan Africa. In Uganda, the tsetse fly Glossina fuscipes fuscipes is responsible for transmission of the parasite in 90% of sleeping sickness cases, and co‐occurrence of both forms of human‐infective trypanosomes makes vector control a priority. We use population genetic data from 38 samples from northern Uganda in a novel methodological pipeline that integrates genetic data, remotely sensed environmental data, and hundreds of field‐survey observations. This methodological pipeline identifies isolated habitat by first identifying environmental parameters correlated with genetic differentiation, second, predicting spatial connectivity using field‐survey observations and the most predictive environmental parameter(s), and third, overlaying the connectivity surface onto a habitat suitability map. Results from this pipeline indicated that net photosynthesis was the strongest predictor of genetic differentiation in G. f. fuscipes in northern Uganda. The resulting connectivity surface identified a large area of well‐connected habitat in northwestern Uganda, and twenty‐four isolated patches on the northeastern margin of the G. f. fuscipes distribution. We tested this novel methodological pipeline by completing an ad hoc sample and genetic screen of G. f. fuscipes samples from a model‐predicted isolated patch, and evaluated whether the ad hoc sample was in fact as genetically isolated as predicted. Results indicated that genetic isolation of the ad hoc sample was as genetically isolated as predicted, with differentiation well above estimates made in samples from within well‐connected habitat separated by similar geographic distances. This work has important practical implications for the control of tsetse and other disease vectors, because it provides a way to identify isolated populations where it will be safer and easier to implement vector control and that should be prioritized as study sites during the development and improvement of vector control methods.


| INTRODUC TI ON
Tsetse flies (genus Glossina) are the only vectors of the trypanosome parasites that cause animal African trypanosomiasis (AAT) and human African trypanosomiasis (HAT), respectively, known as nagana and sleeping sickness. Together, the animal and human diseases pose health threats and economic burdens to vast regions of sub-Saharan Africa where they are endemic (Budd, 1999;Committee, 1998;Diall et al., 2017;Murray & Lopez, 1996;PAAT, 2000;Simarro, Diarra, Ruiz Postigo, Franco, & Jannin, 2011;Simarro, Franco, Diarra, Postigo, & Jannin, 2012). The parasites responsible for these diseases form a group of closely related taxa within the genus Trypanosoma that requires the tsetse fly vectors to complete their life cycle, and for transmission between mammal hosts. The animal infective form of sleeping sickness, AAT, is caused by multiple Trypanosoma parasites in sub-Saharan Africa, including T. congolense, T. vivax, and T. brucei brucei. The human-infective form of sleeping sickness, HAT, is caused by parasites that are closely related to T. b. brucei (Balmer et al., 2010;Sistrom et al., 2014) with two distinct host-evasion types that cause unique disease symptoms known as "chronic" and "acute" sleeping sickness. Although the parasites that cause these two forms of the human disease are currently known in the literature as subspecies T. b. gambiense and T. b. rhodesiense, respectively, the formal taxonomic rank is under revision (Berriman et al., 2005;Echodu et al., 2015;Gibson, Marshall, Marshall, & Godfrey, 1980;Jackson et al., 2010;Sistrom et al., 2016). Regardless of taxonomy, both forms of the disease cause serious human illness and are difficult to treat, and the specific drug treatment course depends on the type and stage of the infection (Fèvre, Picozzi, Jannin, Welburn, & Maudlin, 2006;Fèvre, Wissmann, Welburn, & Lutumba, 2008). Furthermore, for all forms of the disease including AAT, there are no vaccines available (Diall et al., 2017), and the drugs for treatment are expensive, can cause severe side effects, and are difficult to administer in remote villages where the disease is most prevalent (Simarro et al., 2011(Simarro et al., , 2012. Consequently, one of the most effective means of disease control is to reduce tsetse fly populations and thereby interrupt the transmission cycle. Successful vector control relies on large-scale coordination of on-the-ground measures that are based on detailed knowledge of the vector's distribution, movement patterns, and connectivity across a landscape. Without large-scale coordination, control efforts often result in reemergence of the vector after the intervention program has been halted (Aksoy, Caccone, Galvani, & Okedi, 2013;Bouyer et al., 2015;Manangwa et al., 2015;Okeyo et al., 2017;Opiro et al., 2016Opiro et al., , 2017. One strategy for coordination relies on using environmental data to model the vectors' movement patterns, or connectivity, across the landscape. Connectivity is difficult to model accurately because it depends on many factors including the species physiological requirements, such as thermal tolerance and metabolic limits, and current and past evolutionary forces acting on the resident populations. Nonetheless, models of connectivity across landscapes have greatly improved with advances in the quality and resolution of publicly available environmental data, and with the development of complex computational tools. For example, field surveys that provide GPS locations of vector presence can be combined with high-resolution satellite imagery to model landscape connectivity and to identify regions of priority in vector control planning (Elith et al., 2011;Wint & Rogers, 2000). Another advance combines genetic data with high-resolution environmental data to model genetic differentiation across the landscape, thereby inferring areas of probable species movement (Dyer, 2015;Manel & Holderegger, 2013;Segelbacher et al., 2010). These, and gene flowrelated methods (Bouyer et al., 2015;McRae, Dickson, Keitt, & Shah, 2008;McRae, Shah, & Mohapatra, 2013), have greatly improved opportunities to enhance the lasting benefits of vector control campaigns and can be applied to a range of vectors and regions where both ecological and genetic data are available.
Northern Uganda is a region with a special need for vector control because it is the only region in sub-Saharan Africa that harto date have experienced some setbacks due to resurgence of tsetse in treated areas because of either residual populations that were not eliminated or immigration from neighboring untreated areas Opiro et al., 2016;Vreysen, Seck, Sall, & Bouyer, 2013).
This highlights the need for improvements in vector control and monitoring (PATTEC, 2001;Simarro et al., 2011). The identification of isolated patches can provide useful natural settings for novel control methods and the improvement of old ones, very much as island locations have been argued by many as ideal sites to test novel protocols prior to their use in the main range of a species.
In this study, our main goal was to identify isolated geographic regions for consideration in targeted vector control of G. f. fuscipes in northern Uganda ( Figure 1) and to do so using a novel methodological pipeline that accounts for both evolutionary and ecological factors that can impact current levels of connectivity among tsetse populations. The main steps of the novel approach are summarized in Figure 2, including the types of inputs and outputs, and methods used. As input data, we used population genetic data ( O 6 ) and thereby allows for the identification of areas with the least risk of long-range colonization over both short and long time scales ranging from several to thousands of generations. We validate this pipeline by (1) conducting a field-survey searching for tsetse flies in geographic areas predicted by the model to have suitable habitat for G. f. fuscipes and to be isolated from other similarly suitable areas; (2) testing the level of genetic isolation of tsetse from one of these patches, as they should be more genetically differentiated for other tsetse populations from connected areas. We discuss how the integration of landscape genetics approaches with maximum entropy F I G U R E 1 Map showing the spatial context of the study in northern Uganda. Sampling sites used for the population genetic input data are indicated as black dots. Numbers are the same as in Table S1 (Appendix S1), where information on these sites is reported. The map also shows the distribution of the two Trypanosoma parasites, Trypanosoma brucei gambiense to the west and T. b. rhodesiense to the east (gray lines), responsible for the chronic and acute form of the HAT disease. Water bodies (rivers and lakes) are shown in light gray with the major ones identified by name. The map also reports the district names for the region 0 50 100 150 km and current movement patterns. We also discuss the possible immediate applications of this pipeline to help control tsetse flies in northern Uganda and consider its general applicability to identify isolated habitat in management and conservation of wild populations.

| Pipeline overview
The inputs for the pipeline include population genetic data, environmental data, and field-survey presence data ( reflects the genetic differentiation among populations that exceeds isolation by distance (IBD; Wright, 1943). The connectivity surface F I G U R E 2 Flow diagram of the methodological pipeline. Inputs (I 1 , I 2, and I 3 ) are shown as parallelograms, methods (M 1 -M 6 ) as rectangles, and outputs (O 1 -O 6 ) as ovals TA B L E 1 Sampling locality details include sample number (#) from Figure 1, village, district, sample size (N), latitude (lat), longitude (long), and basic diversity statistics reported in Opiro et al. (2017)  Allelic richness (AR), observed heterozygosity (H O ), expected heterozygosity (H E ), and the individual fixation index relative to the sample (F IS ) as estimated using GENALEX v6.501 (Peakall & Smouse, 2006), and Ne estimated with the LD method in NEESTIMATOR v2.01 (Do et al., 2013), "no estimate" indicates indistinguishable from infinite. The Ne estimate is marked if there was significant evidence (p value <.05) of a bottleneck under the TPM model (*), or with the mode-shift test ( §) implemented in BOTTLENECK (Piry et al., 1999).

| Population genetic data
We included analysis from 16 microsatellite loci from G. f. fuscipes from northern Uganda (Figure 1) that represented a subset of the samples described by Opiro et al., 2017  to the sample (F IS ) as estimated using GENALEX v6.501 (Peakall & Smouse, 2006).
To assess the levels and patterns of genetic differentiation among the 38 sampling sites targeted in this study, we used principal components analysis (PCA) and discriminant analysis of principal components (DAPC) with prior groupings based on sampling site and with the cross-validation formula for optimizing number of principal components to retain, using the adegenet R library (Jombart et al., 2010(Jombart et al., , 2012. We also performed clustering analysis with STRUCTURE v2.3.4 (Pritchard & Stephens, 2000;Falush et al., 2003). Finally, we test for isolation by distance (IBD) with a Mantel test with 10,000 randomizations implemented in the adegenet R library (Jombart et al., 2008(Jombart et al., , 2011. For this test, genetic distances were generated using Reynolds, Weir, and Cockerham (1983) method and geographic distances, in km, were generated using the Java-based "geographic matrix generator" v1.2.3 (Erst, 2017downloaded November 2017. To discuss the impact of demographic history on our predictions, we also present the results from Opiro et al. (2017)

| Environmental data
We selected continuous environmental variables of possible relevance to G. f. fuscipes population genetic structure based on published G. f. fuscipes laboratory and field experiments, known physiological requirements of this species, and population genetics. Although this step has an unavoidable subjectivity, we minimize this using all the environmental variables that may be relevant to tsetse distribution and dispersal, given G. f. fuscipes' know ecological requirements. Habitat preference for humid, cooler environments (Cecchi, Mattiolo, Slingenbergh, & De La Rocque, 2008;Dyer et al., 2011;Hargrove et al., 1992;Langridge, Kernaghan, & Glover, 1963;Leak, 1999), and laboratory and field experiments that indicate a negative behavioral response of this species to high temperature and low humidity (Dyer et al., 2011;Hargrove & Brady, 1992;Pollock, 1982) lead us to include surface temperature and rainfall. Field data from mark-recapture and population genetics studies that indicate G. f. fuscipes disperses a maximum of 2.5-14 km, depended on vegetation types (Bouyer et al., 2007;Challier, 1982;Cuisance, February, Dejardin, & Filledier, 1985;Hyseni et al., 2012;Rogers, 1977) led us to include vegetation parameters.
We used 13 types of environmental data (Table S1, Appendix S1) from moderate-resolution imaging spectroradiometer ( (Table S1, Appendix S1) at 1 km 2 resolution for northern Uganda.
To exclude covarying variables, we estimated pairwise straightpath distances between each of the 38 sampling sites for each of the 13 environmental variables and calculated the mean value encountered along these straight-path distances, using the Zonal Statistics tool in ArcMap 10.3 (ESRI 2014). We then performed linear regressions between straight-path means for each pair of variables, using the "pairs" function in R. We also carried out principal components analysis (PCA) in JMP v11.0 (SAS Institute Inc., Cary, NC, USA, 1989USA, -2007 to identify among the covarying variables the ones that explained the most variance. We used the results from these two types of analyses to select just one variable representative of any two or more variables that had a strong linear relationship (|R| > .80 and pvalue <.02) and that also explained the most variance in the PCA.
This reduced the number of environmental variables used as input data from 13 to five: RNF, DST, NST, NDVI, and PSN ( Figure 2: I 2 ).

| Field-survey data
We used data from Opiro et al. (2017) collected from 317 traps deployed in northern Uganda (latitude 1.542°-3.692° and longitude 31.119°-33.217°), using a field protocol previously published (Beadell et al., 2010;Echodu et al., 2013;Opiro et al., 2016Opiro et al., , 2017. In brief, traps were set out in groups of 10-15 that were placed at ~100-m intervals, and each cluster was spaced > 5 km from neighboring sites to cover a large geographic area of ~46,500 km 2 . Coordinates from multiple traps (within <2 km distance) were averaged to provide a single geospatial coordinate per sample. Very scant information on G. f. fuscipes distribution was available for this region before these surveys, because of the area was difficult to access due to civil unrest that plagued this region from the 1980s through 2009. Table S2 (Appendix S1) summarizes the geographic information for the presence data, which are the inputs together with the environmental variable(s) correlated with genetic differentiation for the MaxEnt analysis to obtain a connectivity surface (Figure 2: I 3 , M 4 and O 4 ).
This matrix together with a matrix of geospatial coordinates for each site made up the input data for the Circuitscape analysis ( Figure 2: I 1-2 ).

| M 2 : Circuit models of environmental connectivity
We used Circuitscape (McRae et al., 2013) to build models of environmental connectivity between the 38 sampling sites, using the five non covarying environmental variables selected (Figure 2: Ib).
Circuitscape employs circuit theory to predict how environmental resistance impacts species connectivity across a landscape (e.g., it produces "current" maps). Each of the 38 sampling sites was considered an "electrical source" to each of the remaining sites acting as the "ground." Environmental resistance between pairwise sourceto-ground sites is measured by applying random-walk theory on a surface that reflects the putative cost to tsetse fly movement based on the ecology of the species (McRae et al., 2013). We first created "geographic distance only" current map and pairwise resistance matrix to provide a control for IBD for the next method in the pipeline ( Figure 2: M 3 ). Then, to prepare the environmental data input as resistance surfaces, we followed an approach similar to that which is described in Wang, Savage, and Bradley Shaffer (2009). This involved assigning hypothetical costs ("resistance costs") to tsetse movement to environmental or topographic features in the landscape using 11 bins ranging from (1)

| M 3 : Testing for correlation with genetic differentiation
To quantify the contribution of the environment to genetic differentiation, we used a generalized matrix regression called multiple matrix regression with randomization (MMRR; Figure 2: M 3 ; Wang, 2013). Other methods are available for assessing environmental association (reviewed by Hall & Beissinger, 2014), including linear mixed effect modeling, geographically weighted regression (Fotheringham, Brunsdon, & Charlton, 2002), generalized dissimilarity modeling (Ferrier, Manion, Elith, & Richardson, 2007;Thomassen et al., 2010), Bayesian geographic analysis (Gautier, 2015) and ordination methods such as redundancy analysis. However, these models did not provide clear advantages over MMRR and can have higher rates of type I errors (Kierepka & Latch, 2015). Given that there were no clear advantages of other methods, we chose to use MMRR for this study.
The input for the MMRR was the F ST matrix ( We additionally provide results from partial Mantel tests (Manel, Schwartz, Luikart, & Taberlet, 2003) as a comparative and confirmatory tool. Mantel tests remain common in landscape genetics (Manel et al., 2003) and can perform better than other methods when the assumption of linearity is not violated (Kierepka & Latch, 2015;Shirk, Landguth, & Cushman, 2017;Zeller et al., 2016). However, use of Mantel tests is controversial because of weakness in accounting for spatial autocorrelation (Legendre & Fortin, 2010;Manel et al., 2003), so we interpret results of the partial Mantel test with caution and as a supplement to the results of the MMRR. The partial Mantel test was implemented with the "vegan" package (Oksanen et al., 2013) in R.
For both MMRR and the partial Mantel tests, we used 10,000 permutations to calculate empirically based p-values, and only considered models that were significantly correlated with genetic differentiation (p-value <.05; Figure 2: O 3 ) as candidates of high influence on genetic differentiation. These models were selected as inputs for modeling a G. f. fuscipes connectivity surface (see below).

| M 4 : Connectivity surface
We used a maximum entropy model (MaxEnt; Figure 2 to G. f. fuscipes dispersal and gene flow (Beadell et al., 2010;Echodu et al., 2013). Note that in this step, we used MaxEnt in a univariate rather that multivariate analyses, as we only had one environmental variable of significant correlation with genetic differentiation (see results). Thus, the output of this analysis is not meant to be a habitat suitability model, but a connectivity surface reflecting environmentally driven genetic differentiation.

| M 5 : Identifying discrete landscape patches
To identify geographic regions with low connectivity to the main tsetse habitat (isolated landscape patches), we used the connectivity surface ( Figure 2: O 4a ) as the input of a clustering analysis (Figure 2: For the clustering analysis, we used the R packages "raster" (Hijmans, 2016), "rgdal" (Bivand, Keitt, & Rowlingson, 2017) and "dismo" (Hijmans, Phillips, Leathwick, & Elith, 2017) to find discrete patches within our connectivity surface (Appendix S3). We first converted the connectivity surface into a matrix in which pixels with scores >0.50 were selected as the "environment of interest" (value of 1), while the remaining pixels, including water bodies, were considered "background" (value of 0). To evaluate sensitivity of our analysis to cutoff decisions, we explored different cutoff scores from 0.35 to 0.65 and a minimum number of pixels of 2-7 (each pixel measuring 1 km 2 ; Appendix S4). After finding that discrete patches were stable across many of the chosen cutoffs, we decided to use a cutoff of a score >0.50 with a minimum size of 4 pixels, because it produced a representative set of discrete patches useful in a management context (i.e., discrete but still large enough to be useful for testing vector control and monitoring strategies; Appendix S4). We then clustered together these discrete patches according to their distance from other patches and the known dispersal ability of tsetse flies (5-8 km; Challier, 1982;Cuisance et al., 1985;Bouyer et al., 2007) and chose to conservatively use the smallest value (5 km), as the upper limit to group together geographically close patches (Appendix S4). Thus, the final criteria to identify discrete landscape patches (Figure 2: O 5 ) in northern Uganda were as follows: a final cutoff score of >0.5, a minimum size of 4 pixels, and a clustering range of 5 km.

| M 6 : Identifying the isolated habitat patches that fall within suitable habitat
We overlaid the resulting discrete landscape patches (Figure 2: O 5 ) onto a habitat suitability model to identify and exclude the largest continuous patch of suitable habitat (referred to as the "main habitat belt" from here forward) for consideration as an isolated patch. For a habitat suitability model, we started with the existing FAO model (Wint & Rogers, 2000). However, after discovering that some of the observed GPS points of G. f. fuscipes field-survey observations we had from our survey data fell outside of the FAO model, we decided to update the habitat suitability model using MaxEnt (Elith et al., 2011). To build this updated model, we used the same five environmental parameters and presence data prepared for the main analysis pipeline (Figure 2: I 1 and I 2 ), with the same default program parameters used to build the connectivity surface. We then merged this MaxEnt output with the existing FAO model (Wint & Rogers, 2000) using the maximum suitability of each pixel from the new MaxEnt and the existing FAO models as our final updated habitat suitability model. We used the maximum value for each pixel to make our prediction of suitable habitat more inclusive. If, through this method, we potentially over predicted suitability, we would also be under predicting areas of low suitability and the number of isolated patches, making this a conservative approach.
We overlaid the patches identified in M 5 onto this updated habitat suitability map to identify patches that fell within areas modeled at above 25% suitability. In doing so, we had to exclude discrete patches located <5 km from the edge of the model because the clustering algorithm used could not account for connection with habitat outside of the study area, for example to the west where the main suitable habitat extends past the geographic frame considered.

| Validation of the methodological pipeline using field and genetic data
To determine whether flies from isolated but suitable areas identified in our pipeline were (1) present and (2) more genetically isolated from neighboring areas than would be expected based on geographic distance alone, we conducted a field survey and collected samples along three transects that crossed five of the model-defined isolated patches along the northern shore of Lake Kyoga. The survey was carried out over a period of 3 days for each transect in November of 2016. Presence data were collected by deploying 317 biconical tsetse traps as described in previous work (Beadell et al., 2010;Echodu et al., 2013;Opiro et al., 2016Opiro et al., , 2017. Traps were set out in 10-15 per group and placed at ~100-m intervals within each group. The GPS coordinates for points of suitable habitats falling in each of these transects were used to deploy on average 26 traps/transect (total of 106 traps) within ≤5 km of each transect. To maximize finding flies preference was given to sites close to water bodies, with suitable vegetation nearby. Site accessibility by road was also taken into consideration to increase survey efficiency. Flies were stored individually in 95% ethanol, and information on sex, collection date, trap number, and geographic coordinates of each trap was recorded.
One site was selected for genotypic analyses to test whether flies from this area were indeed more genetically isolated than flies from sampling sites in connected habitat separated by similar geographic distance, as this would lend support to our approach to identify isolated patch with tsetse flies using a combination of environmental, genetic, and field-survey presence data. DNA was extracted from two to four legs using Qiagen DNeasy Blood and Tissue Kit (Qiagen, Germany), following manufacturer's protocols and stored at −20°C.
Genotypic data from 16 microsatellite loci were collected using published protocols to be able to merge the new data with the existing genotypic database for this region (Opiro et al., 2017).
Pairwise F ST between all samples from (Opiro et al., 2017) and the ad hoc sample were obtained using ARLEQUIN (Excoffier & Lischer, 2010) with Wright's statistics (Wright, 1951), following the variance method developed by (Weir & Cockerham, 1984), using 10,000 permutations to obtain exact p-values. F ST was adjusted for finite populations (Rousset, 1997), using the equation F ST /(1 − F ST ), and then we compared the level of differentiation of the isolated patch to the closest sites that we sampled from within the main habitat (25-100 km away), to the level of differentiation found between pairs of sites within the main habitat belt that were separated by the same range of geographic distances (25-100 km), and tested the prediction that the ad hoc isolated patch would have higher differentiation than pairs from within the main habitat belt with a t-test in JMP.

| Population genetic data
The results of the PCA, DAPC, and STRUCTURE analyses are presented in Figures S1, S2, and S3, respectively (Appendix S2). The PCA and DAPC indicate that the majority of the variance in the genetic data correlates closely with geographic placement of the study sites ( Figures S1 and S2, Appendix S2). In contrast, the STRUCTURE results ( Figure S3, Appendix S2) indicate some substructure within the study area. The Mantel test for correlation between genetic differentiation and geographic distance indicates highly significant IBD ( Figure S4, Appendix S2; p-value <.00001).

| Environmental data
We used the 13 continuous environmental variables (Table S1, Appendix S1) and carried out linear regression and PCA analyses to assess the presence of covariation and the weight of contribution of each variable. Figure S5 (Appendix S2) shows the results for the linear regression. We found strong linear relationships between NDVI, leaf area index, fPAR, evapotranspiration, and mean annual tree cover (|R| > .80 and p < .001) and ( Figure S5, Appendix S2) between gross primary production and net photosynthesis (R value of .82 and a p-value of .001). Figure S6, Appendix S2, shows the results of the PCA of environmental data, where the first and second principal components accounted for 56.6% and 18.5% of the variation among the variables, respectively. Using the results of both methods we selected a single variable (marked by asterisk in Table S1, Appendix S1) to represent all covarying factors if |R| was >.85. NDVI was selected to represent covarying fPAR, LAI, TC, EVI, LE, and ET based on the PCA, where NDVI explained the most variance on the first axis (56% of total variance; Figure S6, Appendix S2). PSN was selected to represent covarying GPP and elevation based on the PCA, where PSN explained the most variance on the second axis (18.5% of total variance; Figure S6, Appendix S2).

| O 1 : Genetic differentiation matrix
Genetic differentiation among the 38 sites, expressed as pairwise F ST values, is published and described in detail by Opiro et al. (2017) and is available in Table S3 (Appendix S1).

| O 2 : Circuit models of environmental connectivity
Twenty resistance surfaces based on the five independent environmental variables (Figure 2: I 2 ) produce 20 current maps in Circuitscape (Figure 2: O

| O 3 : Correlation of environmental data and genetic differentiation
Results from the tests for normality and homogeneity of residuals confirmed that linearity assumptions of MMRR were not violated ( Figure S7, Appendix S2). Results from the MMRR and the partial Mantel tests are presented in

| O 4 : Connectivity surface
The environmental variable that was significantly correlated with genetic differentiation (PSN; Table 2) was used to create the connectivity surface. This connectivity surface indicates fine-scale variability of expected environment-dependent genetic differentiation in G. f. fuscipes, and the existence of a large region of well-connected landscape in the northwest along the Albert Nile and the Achwa River (green in Figure 4). There is also patchy PSN on the eastern margin of the study area (pink in Figure 4), indicating low modeled connectivity among habitats in these regions.  This equates to ~20,000 km 2 more landscape that is suitable for G. f. fuscipes in northeastern Uganda than previously recognized.

| O
Of the forty discrete landscape patches of adequate size and distance (purple outlines in Figure 4), twenty-four fell within habitat modeled at greater than 25% suitability for G. f. fuscipes (purple outlines in Figure 5a), so were retained as possible candidates for local eradication and/or testing of novel control methods. All these patches fall north and east of Lake Kyoga, in the southern parts of the study area.

| Validation of the methodological pipeline using field and genetic data
The field survey of the transects spanning three model-defined included the isolated patch averaged 0.14 (range: 0.12-0.16). This difference was statistically significant, according to a t test p-value <.0001 ( Figure 6). This result suggests that the model we built correctly identified isolated habitat patches with relative high genetic differentiation compared to pairs of samples located in more continuous habitat and that PSN, although only marginally significant, is useful to identify isolated habitat patches in this species.

| Population differentiation and genetic diversity of G. fuscipes fuscipes in the study area
The analyses of genetic differentiation among the sampling sites shown that strong IBD can create a false signal of population structure (Frantz et al., 2009;Meirmans et al., 2012;Falush et al., 2016) and can result in a STRUCTURE pattern that looks like a smooth transition between two genetic clusters, which is exactly what we find at K = 2 in this study area ( Figure S3, Appendix S2). Given these considerations, we suggest that there is little evidence of genetic structure beyond IBD in the study area. This is different from the conclusion drawn by Opiro et al., 2017; where the observed zone of genetic admixture, the so-called the transition zone, was interpreted to be the result of secondary contact between two genetically distinct clusters. This conclusion is strengthened by the fact that there is a smooth gradient of allelic richness from northwest to northeast (Table 1; Opiro et al., 2017), and by the finding of high migration rates and highly admixed individuals reported for this region that suggests free interbreeding (Opiro et al., 2017). Nonetheless, this is an important point to take into account because genetic structure caused by past allopatry, if present, can limit the accuracy of MMRR in detecting correlation of genetic differentiation and environmental F I G U R E 5 Habitat suitability maps for G. f. fuscipes in northern Uganda: (a) updated habitat suitability map obtained using 317 presence data, 12 environmental variable relevant to tsetse ecology (Table 1), and a canonical multivariate MaxEnt (Elith et al., 2011) analysis. This map also shows the twenty-four isolated patches identified by the model (gray polygons), the three transects (black segments) used for the field survey, and the location of the tsetse sample from one of the isolated patches used to validate the method; (b) FAO habitat suitability map for G. f. fuscipes (Wint & Rogers, 2000). The legend to the right of each map explains the map color scheme, ranging from dark red (highly suitable habitat) to green (unsuitable habitat). Water bodies are shown in light blue variables. Here, we take this possibility into account in the interpretation of PSN as a good predictor of genetic differentiation (below) and urge any future work that includes more than a single genetic cluster to incorporate genetic structure explicitly into tests for correlation using approaches that use population-specific covariates (Gautier, 2015).
Another important factor to consider in landscape genetics is the possible artifacts caused by demographic history. For instance, founder events can lead to genetic drift, which inflates genetic differentiation in isolated populations, and can confound landscape genetics interpretations by uncoupling patterns of differentiation from its environmental drivers. For this reason, we consider Ne estimates and bottleneck tests results (Opiro et al., 2017), which are presented in Table 1. Ne estimates varied considerably in size (average ~425; range = 17-1,549), with wide confidence intervals and some estimates not available because they were indistinguishable from infinity (Table 1; Opiro et al., 2017). Despite evidence of relatively small and variable Ne estimates, tests for bottlenecks were positive in only a few populations (sites 14, 26, 35, and 36; Figure 1, Table 1) that had been the target of recent vector control (Opiro et al., 2016(Opiro et al., , 2017.

| Environmental-dependent genetic differentiation
Results from the MMRR and the partial Mantel tests ( f. fuscipes in Uganda and our knowledge of its ecology (Dyer et al., 2011;Pollock, 1982;Rogers, 1977 (Dyer et al., 2011;Hargrove & Brady, 1992;Pollock, 1982) that often occurs near water sources along rivers and wetlands (Beadell et al., 2010;Katunguka-Rwakishaya & Kabagambe, 1996). Habitat along rivers and wetlands would also sustain higher plant growth than drier landscape further from a water source and could account for the correlation with PSN.
Assuming that PSN is a reliable predictor of dispersal patterns  (Opiro et al., 2017). Furthermore, G. f. fuscipes in the study area has small but relatively stable Ne. This suggests that the major patterns of genetic differentiation were created under conditions of migration-drift balance and that bottlenecks did not confound results in this study. This conclusion is also supported by previous population genetics studies that have shown evidence of ongoing gene flow among distinct populations separated by over 100 km (Abila et al., 2008;Beadell et al., 2010;Echodu et al., 2013;Hyseni et al., 2012;Opiro et al., 2017). Perhaps genetic drift contributes to high levels of genetic differentiation across small spatial scales and is counterbalanced periodically by rare long-distance dispersal that connects populations along the habitat corridors identified in our analyses. Taken together, results from the Circuitscape analysis and MMRR suggest that PSN can be used as a predictor of genetic differentiation and that control and monitoring activities should maximize efforts along river corridors and within wetlands in order to improve both short-and long-term success of G. f. fuscipes suppression in northern Uganda.

| Improved habitat suitability model
Northern Uganda is a region critical for HAT control, because it is the only place in the world where the two Hat diseases are likely to merge in the near future. Yet, the most recent suitability map for the main vector of the two parasites responsible for the diseases dated back to 2000 (Wint & Rogers, 2000), before the conclusion of civil unrest that plagued the region throughout the 1990s until ~2009 (Royo, 2008;Ruaudel & Timpsen, 2011;Welburn & Odit, 2002) and made this region difficult to access. The updated habitat suitability model provided in this study (Figure 5a) built on the previous one ( Figure 5b) and provides additional insights because of the addition of more than 300 presence data points (Table S2, Appendix S1). Compared to the FAO Uganda-wide suitability maps (Figure 5b; Wint & Rogers, 2000), our updated habitat suitability model indi- cates the presence of about 20,000 km 2 more suitable habitat than previously recognized at the eastern margin of G. f. fuscipes' range ( Figure 5a). The existence of a larger suitable habitat than previously thought needs to be taken into account when planning the spatial scale of control intervention and follow-up monitoring activities. This is because areas previously considered not suitable for tsetse persistence were deemed as low priority for control, and thus may be enabling the persistence of small tsetse populations and be sources for re-invasion.

| Identification of isolated patches
The PSN-based landscape connectivity surface (Figure 4 The genetic analyses confirm discontinuity in the tsetse range in this area, which further supports the contribution of PSN to patterns of genetic connectivity in tsetse populations over evolutionary time scales. We validated the existence of isolated habitat patches by conducting a field survey and genetic analyses of one population sample from one of the isolated patches. The finding of high and significant levels of genetic differentiation ( Figure 6; Table S5, Appendix S1) between flies from the isolated patch and flies from sampling sites within the main continuous habitat located at similar geographic distances confirms the ability of this methodological pipeline to detect However, as with any control measure, caution should be practiced because long-range migration is also known to occur in this system (Beadell et al., 2010;Krafsur, Marquez, & Ouma, 2008;Opiro et al., 2017).

| Comparison with other methods
The pipeline we describe ( Figure 2) and implement in this article is not the first one to propose the use of genetic data to identify isolated tsetse populations, as it builds on advancements by Bouyer et al. (2015) and others (Biek & Real, 2010;Bouyer & Lancelot, 2017;Dicko et al., 2014;Kulkarni, Desrochers, & Kerr, 2010;Laporta, Ramos, Ribeiro, & Sallum, 2011). However, it differs from previous methods in several ways that we suggest improve our predictive power to describe tsetse movement over both ecological and evolutionary time scales.
The main difference of this pipeline from previous methods is the separate use of landscape genetics and field-survey data, first to identify the environmental factors important in genetic differentiation over multiple generations (MMRR, Figure 2: M 1-3 ), and second to build a model of habitat isolation based on current-day conditions and tsetse presence (Figure 2: M 4 and M 5 ). This is different than the approach used by Bouyer et al. (2015), wherein genetic differentiation between populations was used to parameterize a landscape friction map that identified genetically isolated patches in a single step. This single step approach limits the spatial observation data to the same 37 GPS points used for the genetic analysis (Bouyer et al., 2015). However, Bouyer et al.'s (2015) approach has the advantage of removing the need to assign resistance costs, a step often criticized because of its subjectivity (Manel et al., 2003(Manel et al., , 2003Spear, Balkenhol, Fortin, McRae, & Scribner, 2010). We minimize subjectivity in this pipeline by testing four models for each variable (Table 2; positive linear, negative linear, positive exponential and negative exponential (sensu Wang et al., 2009).
Using genetic data as an input in modeling connectivity across the landscape (Figure 2: O 4a ) allows us to build an understanding of the patterns of movement along corridors of habitat over hundreds to thousands of generations. When we combine this output ( Figure 2: O 3 ) with models of connectivity using field-survey presence data, we integrate the evolutionary and ecological time scales to obtain a more complete understanding of patterns of migration over hundreds to thousands of generations and current regions of high ecological suitability. This is especially relevant in this context because long-range migration events have been described in G. f. fuscipes populations, as evidenced by the finding of migrants from distances of up to ~60 km (Beadell et al., 2010;Krafsur et al., 2008;Opiro et al., 2017).
We think this approach increases the practical utility of the isolated habitat patches that we identified (Figure 2: O 6 ) over what could be found using a landscape friction model using GPS coordinates from a limited population genetics sample (Bouyer et al., 2015;Dicko et al., 2014;Kulkarni et al., 2010;Laporta et al., 2011), or from field-survey data alone as would be possible from results provided by Wint and Rogers (2000). The fact that this pipeline (1) identified a habitat patch the indeed includes tsetse flies and that (2) the fly population in this patch is significantly more genetically differentiated than others located in connected habitat and separated by similar geographic distances lends support to the ability of the method to produce accurate models of past and current movement of the target species across the landscape.

| CON CLUS I ON AND FUTURE DIREC TIONS
Our methods pipeline builds on the progress by Bouyer et al. (2015) and others (Dicko et al., 2014;Kulkarni et al., 2010;Laporta et al., 2011) to develop an approach that produce a connectivity surface to identify isolated habitat patches that reflect both genetic connectivity and ecological connectivity at a spatial scale of interest.
Our goals were threefold: to identify isolated habitat patches as candidates for local eradication, to improve the G. f. fuscipes habitat suitability model for a particularly high disease risk region of Uganda, and to produce a pipeline that integrates historical and current species movement patterns. We achieved this by integrating genetic data from 38 G. f. fuscipes sampling sites in northern Uganda (Opiro et al., 2017), high-resolution satellite imagery, and field-survey results. By doing so, we improved our understanding of tsetse connectivity across the landscape in this region and provide relevant information for vector control by identifying areas where tsetse population occur but are at low risk of genetic exchange with other tsetse populations. These areas can then be prioritized for trials of new control and monitoring methods and improvement of old ones. Future work will include an expansion of a similar methods pipeline to the whole country and inclusion of whole genome SNP (single nucleotide polymorphisms) data to improve resolution of patterns of habitat connectivity across the genome and the landscape, and to identify genetic associations between specific regions in the genome and environmental and physiological traits relevant to disease transmission.

ACK N OWLE D G M E NT S A N D FU N D I N G I N FO R M ATI O N
We acknowledge financial support from Fogarty Global Infectious We list the environmental variable input into the model, other highly correlated variables that were not included (Table S1, Appendix S1; Figures S5 and S6, Appendix S2), and the contribution of the variable to the final model update.
TA B L E 3 Contributions of each of the five independent environmental variables to the MaxEnt habitat suitability model (Elith et al., 2011) used to update the existing FAO's habitat suitability model (Wint & Rogers, 2000)