Population structure of a vector of human diseases: Aedes aegypti in its ancestral range, Africa

Abstract Aedes aegypti, the major vector of dengue, yellow fever, chikungunya, and Zika viruses, remains of great medical and public health concern. There is little doubt that the ancestral home of the species is Africa. This mosquito invaded the New World 400‐500 years ago and later, Asia. However, little is known about the genetic structure and history of Ae. aegypti across Africa, as well as the possible origin(s) of the New World invasion. Here, we use ~17,000 genome‐wide single nucleotide polymorphisms (SNPs) to characterize a heretofore undocumented complex picture of this mosquito across its ancestral range in Africa. We find signatures of human‐assisted migrations, connectivity across long distances in sylvan populations, and of local admixture between domestic and sylvan populations. Finally, through a phylogenetic analysis combined with the genetic structure analyses, we suggest West Africa and especially Angola as the source of the New World's invasion, a scenario that fits well with the historic record of 16th‐century slave trade between Africa and Americas.


| INTRODUC TI ON
The mosquito Aedes aegypti is the major vector of diseases such as dengue, chikungunya, yellow fever, and Zika, that have plagued humanity for centuries and remain threats to millions of people worldwide. It is an invasive species with patterns of global migration that continue today (Powell, 2016).
There is little doubt that the ancestral range of the species is Africa. The ancestral form has been given the subspecies name Ae. aegypti formosus (Aaf), a dark mosquito breeding in tree holes and preferring blood meals from nonhuman wildlife (Lounibos, 1981;Powell & Tabachnick, 2013;Tabachnick, 1991). Aaf can be found today in Africa in its original sylvan habitats (larvae in tree holes and rock holes), as well as in cities and peridomestic habitats (e.g., villages, transient human dwellings, and their surroundings).
The paler form, or subspecies Ae. aegypti aegypti (Aaa), is a "domestic" mosquito, breeding in human-generated containers and preferring humans for blood meals (McBride et al., 2014). It is this form that during the last 400-500 years colonized much of the world's tropics and subtropics with the help of human movement and trade (Powell, 2016;Powell & Tabachnick, 2013), causing some of the largest outbreaks of mosquito-borne diseases, most recently the Zika outbreak (Centers for Disease Control and Prevention 2016).
Understanding the genetic structure of Ae. aegypti within Africa in high resolution and predicting the invasion dynamics and gene flow among populations can be very informative and helpful to control and predict future outbreaks of diseases they transmit. In Africa alone, more than 800 million people (~70% of the African population) are at risk for at least one of the diseases transmitted by this species (Weetman et al., 2018). Contrary to the traditional view that African Aaf is less competent for flavivirus transmission than Aaa outside Africa (Bosio, Beaty, & Black, 1998;Tabachnick et al., 1985), there is increasing evidence that the vector competence of Aaf varies considerably and is population-specific, with some African populations being as competent as those outside Africa (Diallo et al., 2008;Dickson, Sanchez-Vargas, Sylla, Fleming, & Black, 2014;Vazeille et al., 2013).
To address this challenge, we leverage a high-throughput speciesspecific genotyping single nucleotide polymorphism (SNP) chip (Evans et al., 2015). Dense genomic sampling of SNPs is extremely powerful for high-resolution analysis of historical biogeography and invasion dynamics [e.g., in the study of Aedes species (Brown et al., 2014;Kotsakiozi, Richardson et al., 2017;Rašić, Filipović, Weeks, & Hoffmann, 2014)]. The goals of this work were to (a) study the genetic structure of Ae. aegypti populations within Africa, (b) estimate the genetic diversity and differentiation among African populations and compare them with Aaa populations outside of Africa, and (c) identify the possible source(s) of the New World and Asia invasion.
Note on nomenclature: The subspecies designations Aedes aegypti formosus and Ae. aegypti aegypti were formally recognized by Mattingly (1957) with the former being a darker colored mosquito in African forests, while the latter are lighter colored with white abdominal scales found in human habitats primarily outside Africa.
While generally, collections of Ae. aegypti in Africa correspond to subspecies Ae. aegypti formosus, there are exceptions with some back migration of Aaa to Africa (particularly in East Africa) and mixed populations in West Africa which may represent the initial differentiation of Aaa (Crawford et al., 2017). Here, we use Aaf as shorthand to refer to populations in continental Africa and Aaa to refer to populations outside Africa with the explicit recognition these names are not clear-cut especially in Africa. In Table 1, we designate the ecological setting where the samples from Africa were taken to explicitly recognize the ecological diversity occupied by this species in Africa.

| Mosquito samples, DNA extraction, and genotype process
We sampled 20 populations of Aedes aegypti originating from continental Africa and nearby Reunion Island ( Figure 1, Table 1) covering a large part of the Aaf distribution. We also used 10 previously studied populations Kotsakiozi et al., 2018) of Aaa originating from the New World and Asia (Table 1).
Aedes mascarensis from the island of Mauritius was used as an outgroup; this species is very closely related to Ae. aegypti being able to form viable hybrids (Hartberg & Craig, 1970), but genetically distinct (Brown et al., 2014). Samples were either larvae preserved in 70%-90% ethanol, collected from multiple breeding sites per sampling locality, or eggs collected from multiple ovitraps set up at various locations. Eggs were reared to larvae or adults in standard laboratory conditions. DNA was extracted with Qiagen DNeasy blood/tissue kit using the standard kit protocol with an additional  (Table 1) to avoid large differences in sampling size between populations that can obscure the subsequent genetic structure analyses [for details on the effect of uneven sample size on genetic structure analyses, see (Puechmaille, 2016;Wang, 2017)]. This sample size is considered adequate for the purposes of the study, given the large number of SNPs assayed, the very low percentage of missing data (Evans et al., 2015), and the expected differentiation among populations (Gloria-Soria et al., 2016) estimated from previous studies [for details on the sampling size discussion and examples of using similar sampling size, see (Brown et al., 2014;Nazareno, Bemmels, Dick, & Lohmann, 2017;Patterson, Price, & Reich, 2006;Puckett et al., 2016)]. Here, we report results on a total, 315 mosquitoes (208 Aaf, 104 Aaa, and four Ae. mascarensis) genotyped using the Axiom_aegypti1 genotyping array (Evans et al., 2015). A total of 27,674 loci were included in the Axiom_aegypti1 SNP-Chip (overall genotyping rate 97.1%) unambiguously genotyped on the chip TA B L E 1 Population information for the Aedes aegypti samples used in this study Note. For each population, the sampling locality (with abbreviation), the ecological setting where sampled, the number of mosquitoes analyzed, the average number of SNPs obtained, and location in latitude/longitude for the samples are presented.
and passed the tests for conformance to being inherited as singlecopy Mendelian variants (Evans et al., 2015).

| Genetic structure analyses
From the 27,674 validated loci available on the Ae. aegypti SNPchip, a subset of 20,117 were variable in our dataset of 315 samples (hereafter referred to as broad dataset) including both Aaf and Aaa samples (as well as Ae. mascarensis). We further filtered this dataset eliminating highly linked loci using the-indep option (SNP window size = 500, window shift size = 50, variance inflation factor = 2) of plink (Purcell et al., 2007), so the final filtered dataset consisted of 17,069 SNPs. This allows us to use analytical procedures that assume independence across loci. The average percentage of missing data per sample in this dataset was 2%, and details on the average number of SNPs used per population are provided in Table 1.
Population genetic structure was evaluated using the Bayesian clustering method implemented in the software fastSTRUCTURE (Raj, Stephens, & Pritchard, 2014). We performed 10 independent runs, and the results were summarized and plotted using the online version of CLUMPAK (Kopelman, Mayzel, Jakobsson, Rosenberg, & Mayrose, 2015).
To complement the genetic structure analysis, we performed principal component analysis (PCA) and discriminant analysis of principal components (DAPC), using the R packages LEA (Frichot & Francois, 2015) and ADEGENET (Jombart, 2008), respectively, in R v.3.4.4 (R Core Team 2018). In DAPC analysis, the raw data is first transformed through a PCA and then a discriminant analysis (DA) is performed on the retained principal components (PCs). Thus, DAPC analysis can provide an efficient description of the genetic clusters present in the dataset using a few synthetic variables (discriminant functions). These variables are linear combinations of the original variables (raw data) that maximize the between-group variance and minimize the within-group variance.
The partitioning of the genomic variation among and within populations was evaluated through a hierarchical analysis of molecular variance, AMOVA (Excoffier, Smouse, & Quattro, 1992)

| Phylogenetic relationships
To infer the evolutionary relationships among the populations, we used a maximum likelihood (ML) analysis, as implemented in RaxML (Stamatakis, 2014), using 1,000 bootstraps and the GTR model of evolution along with the CAT model of rate heterogeneity. For the runs, we used the string "ASC" to apply an ascertainment bias correction to

| Genetic structure analyses
The results of the fastSTRUCTURE analyses on the broad (all samples) and the African dataset are shown in Figure 2 and   The results of the analysis of AMOVA are presented in Table 3.

| Genetic diversity and differentiation
The majority of the genetic variation in our dataset, regardless of the grouping, is within the populations. However, a great deal of variation (~20%) exists between groups in the first AMOVA analysis (Africa/outside Africa) confirming the pattern in Figure 2 and

| Phylogenetic analysis
The rooted ML phylogenetic tree is presented in Figure 7.

| D ISCUSS I ON
Considering the global scale, the SNP-chip data (Figures 2 and 4a) are consistent with microsatellite, and RAD-seq studies in showing that Ae. aegypti has two major genetic groups. These two groups generally correspond to the described subspecies, Ae. aegypti formosus (Aaf) in Africa and Ae. aegypti aegypti outside Africa (Aaa) with Aaa being monophyletic (Figure 7) thus implying a single out of Africa event (Brown et al., 2014;Gloria-Soria et al., 2016).
The population from Reunion Island, however, is exceptional in that it clustered with the Aedes mascarensis separately from the Aaf continental populations. Three hypotheses can be proffered for this unexpected distinction of Reunion samples. One is that Reunion Ae. aegypti has been introgressing with Ae. mascarensis (endemic to Mauritius), given the geographic closeness of the two islands, ~120 km apart.
Ae. aegypti and Ae. mascarensis can hybridize and produce fertile offspring (Hartberg & Craig, 1970). possibility is that Reunion, being ~1200 km from mainland Africa, has been isolated for considerable time, although simple isolation does not address the issue of its genetic closeness to Ae. mascarensis. A third possibility we cannot formally dismiss, is that this clustering may be an artifact of biased SNP choice. When the SNP-chip was designed (Evans et al., 2015), we did not have access to either Ae. mascarensis or the Reunion samples, so genetic variation in these populations was not incorporated into the chip design. However, even though Ae. mascarensis and Reunion samples genotyped at somewhat fewer loci (Table 1), enough loci (~13-14,000) did genotype to provide reliable data and seems unlikely this could have biased our conclusions.
Considering continental Africa alone, it is clear that ~17,000 SNPs provide better genetic resolution than that provided by 12 microsatellites [e.g., compare Figure 3 here with Figure 3b in Gloria-Soria et al.

Percentage of variation (%)
Africa/out of Africa  flow that disrupts the West-East geographic pattern ( Table 2) that had been suggested by previous studies (Bennett et al., 2016;Brown et al., 2011;Moore et al., 2013). This could be due to the fact that the forest habitat typical of ancestral Aaf was continuous across this part of Africa for a long period of time, before human habitation and cutting of forests, allowing enough time in a continuous habitat for even a poor disperser to become relatively genetically homogeneous.

TA B L E 3 Population Differentiation
Alternatively, the clustering of Kahawa, Kenya, with Cameroon samples (Figures 2, 3, and 7) may imply an old human-mediated migration across the continent. Bennett et al. (2016)  [as also seen in microsatellite data (Brown et al., 2011)]. While generally domestic collections are closely related to geographically close sylvan or peridomestic collections, the case of Nairobi, discussed above, is an exception and highlights the complex patterns of colonization that occur in Africa.
Using Ae. mascarensis as an outgroup, Aedes aegypti (sensu lato) forms a monophyletic group. Aaa outside Africa (New World and Asia) also forms a monophyletic group implying a single origin ( Figure 7). The single out of Africa origin of Aaa has been previously supported by microsatellite (Gloria-Soria et al., 2016) and RAD-seq (Brown et al., 2014) data as well as by a combination of five nuclear gene sequences and mtDNA (Bennett et al., 2016). More specifically, Bennett et al. (2016) supported West Africa as most likely origin of Aaa, in agreement with our data (Figure 7).
However, there is a major difference between Bennett et al. A recent study (Crawford et al., 2017) based on exome sequence data, suggested that Aaa may have arisen from populations of Aaf in West Africa, specifically from Senegal which was the only West African country sampled in that study. Our data indicate that, while Senegal has some genetic signal typical of Aaa outside Africa, the Angola sample displays an even stronger signal of genetic relatedness to Aaa outside Africa (Figure 2). The population from Angola F I G U R E 6 Isolation-by-distance plots for all pairs of populations from continental Africa. Statistical significance was evaluated through a Mantel test as implemented in the "ade4" R package. The original value of the correlation between the two matrices (geographic distance and genetic distance) is represented by a dot, while the histogram (a) represents the permutated values assuming the absence of spatial structure. Significant spatial structure results in the original value being out of the reference distribution. The correlation between geographic and genetic distance was plotted using the R package "MASS." The scatterplot (b) shows one single consistent cloud of points. The colored gradient from light blue to red indicates the density of the points which are also shown as red points in the background of the graph. The blue dashed line represents the regression line between the geographic and genetic distance shows admixed ancestry (Q values; 0.42-0.60) toward the New World genotype (Figure 2). Our phylogenetic analysis (Figure 7), including several West African populations (Figure 1), revealed that indeed Senegal samples are phylogenetically closely related to the Aaa, but that Angola is even closer and would be the best candidate for the origin of Aaa.
F I G U R E 7 Maximum likelihood (ML) rooted phylogenetic tree re-constructed using a panel of ~12,000 SNPs. Ae. mascarensis was used as an outgroup, and Aaa samples from New World and Asia were used to test the distinctiveness of Aaf and Aaa lineages. Bootstraps are presented on the nodes; values <70 are not shown Using genetic data, the time of origin of Aaa in the New World has been estimated to be ~400-500 years ago (Crawford et al., 2017;Gloria-Soria et al., 2016;Kotsakiozi et al., 2018). Yellow fever was first reported in the New World in 1648 (McNeill, 1976) not long after the introduction of Ae. aegypti to the New World. This is also the time of the rise of trans-Atlantic shipping by Europeans.
Ships starting their journey in Europe stopped in West Africa to pick up native Africans for the slave trade (Eltis & Richardson, 2010). It is likely that Ae. aegypti (as eggs and/or larvae) would have been introduced to those ships and they may have been already semidomesticated in the towns or coastal villages of West Africa (e.g., ovipositing in stored water containers during the prolonged dry periods in West Africa). Thus, these "proto-Aaa" mosquitoes could survive the long voyage between West Africa and New World. Interestingly, during the early period of slave trade, 1500-1650, ~70% of the trade was carried out by Portugal (Eltis & Richardson, 2010) with ships that primarily used what is today Angola as their source of slaves (Eltis & Richardson, 2010). An Angolan source of invasion is consistent with the genetic patterns observed (Figure 7).
From a public health perspective, Ae. aegypti in Africa has taken on new importance. After decades of low levels, yellow fever has been resurging in Africa (Kraemer et al., 2017). Insecticide resistance and lack of vaccine supplies are doubtlessly contributing to this resurgence. As urban environments continue to encroach on this formerly forest-adapted mosquito's habitat in Africa, it is clear that Aaf possesses the adaptive flexibility to repeatedly switch to urban breeding. This ongoing active evolution is also an attractive opportunity to study insect adaptations to human habitats, an issue of general importance in a number of medical and agricultural contexts.

ACK N OWLED G M ENTS
This work was supported by NIH grant RO1 AI101112 awarded to JRP.

CO N FLI C T O F I NTE R E S T
None declared.

AUTH O R S' CO NTR I B UTI O N S
PK carried out part of the molecular laboratory work, the data analy-