Genetic evidence for the origin of Aedes aegypti, the yellow fever mosquito, in the southwestern Indian Ocean

Abstract Aedes aegypti is among the best‐studied mosquitoes due to its critical role as a vector of human pathogens and ease of laboratory rearing. Until now, this species was thought to have originated in continental Africa, and subsequently colonized much of the world following the establishment of global trade routes. However, populations of this mosquito on the islands in the southwestern Indian Ocean (SWIO), where the species occurs with its nearest relatives referred to as the Aegypti Group, have received little study. We re‐evaluated the evolutionary history of Ae. aegypti and these relatives, using three data sets: nucleotide sequence data, 18,489 SNPs and 12 microsatellites. We found that: (a) the Aegypti Group diverged 16 MYA (95% HPD: 7–28 MYA) from its nearest African/Asian ancestor; (b) SWIO populations of Ae. aegypti are basal to continental African populations; (c) after diverging 7 MYA (95% HPD: 4–15 MYA) from its nearest formally described relative (Ae. mascarensis), Ae. aegypti moved to continental Africa less than 85,000 years ago, where it recently (<1,000 years ago) split into two recognized subspecies Ae. aegypti formosus and a human commensal, Ae. aegypti aegypti; (d) the Madagascar samples form a clade more distant from all other Ae. aegypti than the named species Ae. mascarensis, implying that Madagascar may harbour a new cryptic species; and (e) there is evidence of introgression between Ae. mascarensis and Ae. aegypti on Réunion, and between the two subspecies elsewhere in the SWIO, a likely consequence of recent introductions of domestic Ae. aegypti aegypti from Asia.

Aedes aegypti is globally distributed in the tropics and subtropics. Ancestrally, there is little doubt that it occupied sub-Saharan Africa, where populations are still found in tropical rainforests, with larvae breeding in tree holes and female adults taking bloodmeals from nonhuman mammals (Lounibos, 1981;McBride et al., 2014).
As human settlements grew in Africa, populations of this mosquito evolved to become associated with human habitats, where larvae can be found in human-generated containers and females prefer humans for bloodmeals. About 500 years ago, this human-associated form left Africa, likely via slave trade, and first invaded the New World, then subsequently Asia and the Pacific Islands, including Australia (reviewed in .
These two forms, one sylvatic in continental Africa and one domestic, primarily outside Africa, have been given subspecific names: Ae.
aegypti formosus (abbreviated Aaf) and Ae. aegypti aegypti (Aaa), respectively. The vast majority of work has been on these two subspecies that we here refer to as Aedes aegypti sensu stricto (s.s.), to distinguish them from the newly studied populations from the islands in the southwestern Indian Ocean, which we will refer to here as Aedes aegypti sensu lato (s.l.). We note that while the clear-cut distinction of the subspecies designations is questionable in some instances (e.g. Powell & Tabachnick, 2013), here we use Aaf and Aaa as convenient shorthand for African native and outside Africa populations, respectively.
While the evolutionary history and genetic diversity of Ae. aegypti s.s. have received substantial study, populations of Ae. aegypti (and close relatives) on the islands east of Africa have received comparatively little attention (Failloux, Vazeille, & Rodhain, 2002;Kotsakiozi, Evans, et al., 2018;Vazeille et al., 2001). In this region of the southwestern Indian Ocean, Ae. aegypti occupies Madagascar and numerous smaller islands between Africa and Madagascar, such F I G U R E 1 Map of the southwestern Indian Ocean, east of Africa. Estimates of island ages are from Ashwal et al. (2017), Warren et al. (2003) and Michon (2016). While Ae. aegypti is found on Madagascar and all islands highlighted, Ae. mascarensis is endemic to Mauritius, and as Mayotte and Europa, as well as islands east of Madagascar, such as Réunion and Mauritius (Figure 1). A previous study estimated that Malagasy Ae. aegypti diverged from their continental African counterparts approximately 10 MYA (Fort et al., 2012).
In addition to Ae. aegypti, two closely related species from these islands have been formally described. These are Ae. mascarensis, endemic on Mauritius (MacGregor, 1924); and Ae. pia, endemic on Mayotte (Le Goff, Brengues, & Robert, 2013). Together with Ae.
aegypti s.s. can mate in the laboratory forming fertile F 1 offspring, but hybrid viability breaks down after the first generation (Hartberg & Craig, 1970). Genetic analysis of Ae. aegypti s.l. from Réunion has suggested the possibility of natural introgression from Ae. mascarensis (Kotsakiozi, Evans, et al., 2018). Aedes pia was described in 2013 (Le Goff et al., 2013), and is basal in phylogenetic analyses to all other members of the Aegypti Group (Soghigian, Andreadis, & Livdahl, 2017

| Mosquito collection and DNA extraction
Aedes aegypti were collected by our group, as well as collaborators from various geographic locations worldwide, particularly from the islands in the southwestern Indian Ocean (Table S1) (Soghigian et al., 2017). DNA from two adult Aedes pia was extracted from pinned specimens, using phenol-chloroform and genotyped as described for the other specimens. We also collected adult specimens of Culiseta melanura, Culiseta inornata and Culex erraticus from Worcester, for use in divergence time analyses, below.

| Estimating the evolutionary history of the Aegypti Group with nucleotide data
To estimate the evolutionary relationships and date the divergence times in the Aegypti Group, we generated new sequence data for Ae.
We estimated the evolutionary history of the Aegypti Group, and the rest of the subgenus Stegomyia, with the nucleotide data set in IQ-Tree (Hoang, Chernomor, von Haeseler, Minh, & Vinh, 2018;Kalyaanamoorthy, Minh, Wong, von Haeseler, & Jermiin, 2017;Nguyen, Schmidt, von Haeseler, & Minh, 2015). We partitioned our data set according to marker, and by codon position for protein-coding genes. We then allowed IQ-Tree to choose the best-fitting substitution model per partition, and we assessed clade support using ultrafast bootstrap values. In order to determine whether our mitochondrial markers (cytochrome oxidase I and II) had concordant topologies with our nuclear data (18S, 28S, ITS2, arginine kinase and enolase), we repeated this analysis with only the mitochondrial partitions and with only nuclear partitions, and compared subsequent results with our topology from all seven markers, as well as compared our results with the phylogeny of the Aegypti Group obtained from SNPs (see below). We included several Stegomyia species missing loci (see Table S2) in order to increase the breadth of our taxonomic sampling of the subgenus.
Although missing data are generally thought not to influence phylogenetic results in terms of topology or support values (Pyron, Burbrink, & Wiens, 2013;Roure, Baurain, & Philippe, 2013;Wiens & Morrill, 2011;Wiens & Tiu, 2012), we evaluated whether the degree of missing data at some markers might influence relative branch lengths, as these branch lengths would be essential for the proportion of nucleotides in the alignment.
Next, we used BEAST2 (Bouckaert et al., 2014) to estimate divergence times among members of the Aegypti Group. We used bModelTest (Bouckaert & Drummond, 2017) to estimate substitution models for each partition for ribosomal RNA genes, and by codon position for protein-coding genes. We used a birth-death tree process and a relaxed clock model following recommendations for  Table S16 for fossils, constrains and explanation). We ran the BEAST2 analysis for 150,000,000 generations on CIPRES (Miller, Pfeiffer, & Schwartz, 2010), after which we evaluated logs to ensure convergence of parameter estimates and subsequently generated a maximum clade credibility tree with TreeAnnotator from BEAST2 with common ancestor heights and 95% highest posterior density (HPD) intervals. We ran two additional chains to confirm consistent estimates of divergence time across chains.

| SNP genotyping, calling and filtering
For fine-scale population structure and admixture analysis, we generated a SNP data set with populations of Ae. aegypti from throughout its global distribution, and from other members of the Finally, we exported all polymorphic SNPs for downstream analyses.
Resulting polymorphic SNPs were then filtered using Plink version 1.9 (Chang et al., 2015). We removed SNPs that Evans et al. (2015) found not to follow Mendelian inheritance patterns. We also removed variants present missing in more than 25% of individuals (flag --geno 0.25), alleles with a minor allele frequency lower than 0.01 (--maf 0.01) and linked loci within a 75-kb window that exceeded a variance inflation factor of 2 (--indep 75k 1 2). This window size was chosen as it exceeded the linkage decay previously reported in Ae. aegypti (Matthews et al., 2018). We refer to this data set throughout the text as the SNP data set.

| Microsatellite genotyping
For complementary estimations of population history and demographic parameters, we generated a microsatellite data set comprised of samples from Europa Island (hereafter Europa) and Africa.
Microsatellite genotyping was performed as described previously from Burkina Faso and Europa, for which data were generated in this study.

| Genetic diversity and population structure with the SNP data set
We used our population-level SNP data set to infer genetic diversity and population structure of Ae. aegypti in the southwestern Indian Ocean. Unless otherwise described, all analyses were performed in R version 3.5 (R Core Team, 2018), and maps were generated using Google Maps, or PaleoMap PaleoAtlas for GPlates (available from http://www.gplat es.org/). We used a nested analysis of molecular variance (AMOVA) from the package poppr (Kamvar, Tabima, & Grünwald, 2014), defining nested strata according to (a) species/ subspecies and (b) population (see Table S1) to assess whether regions and populations were significantly different from one another.
We also calculated pairwise F st values with the R package StAMPP (Pembleton, Cogan, & Forster, 2013) for subspecies and population pairs (see Table S1 for populations and subspecies), regardless of the sample size of each population, as F st can be reliably estimated from as few as two individuals, so long as there are many thousands of markers (Nazareno, Bemmels, Dick, & Lohmann, 2017;Patterson, Price, & Reich, 2006). Next, we used sparse non-negative matrix factorization (hereafter SNMF) as implemented in the package LEA (Frichot & François, 2015) to analyse population structure. We used the minimal cross-entropy criterion to determine the optimal number of ancestral populations (K values), and also visualized alternative K values based on biologically reasonable ancestral clusters due to similarity in K values past K = 4. We also assessed whether the re-

| Phylogenetic analysis with the SNP data set
We evaluated the evolutionary relationships among populations of Ae. aegypti, as well as other members of the Aegypti Group, using maximum-likelihood inference on the population-level SNP data set with ascertainment bias correction for invariant sites, as implemented in IQ-Tree (Hoang et al., 2018;Kalyaanamoorthy et al., 2017;Nguyen et al., 2015). We allowed IQ-Tree to select the best-fitting substitution model, and we assessed clade support using ultrafast bootstrap values. We rooted the resulting topology on the branch leading to Aedes pia based on our BEAST2 analysis (above), which was consistent with previous results in Soghigian et al. (2017) that suggested Aedes pia was sister to Aedes mascarensis + Aedes aegypti.
This choice was further supported by all other analyses that suggested Aedes pia was more distant from Aedes aegypti than Aedes mascarensis (or any other population), for example relative missing data (see online supplemental information) and population clustering results.

| Admixture analysis with the SNP data set
Because our initial analysis of population structure supported a scenario of potential admixture, we used the F3 test (Reich, Thangaraj, Patterson, Price, & Singh, 2009) implemented in Treemix (Pickrell & Pritchard, 2012) to evaluate whether there was significant evidence of admixture among populations in our SNP data set. We compared population triplets, considering whether there was significant nontree-like behaviour for all sets of populations, where negative Z scores were indicative of admixture between two parent populations to produce the third sibling population. p-values were calculated from these Z scores with false discovery rate correction for multiple comparisons with the R function p.adjust (R Core Team, 2018).
To ensure that specific characteristics of our data set in terms of missing data and uneven sampling did not bias our admixture analy-

| Estimates of demographic parameters and population history with microsatellites
We complemented our inferences on population history using microsatellite markers to further investigate the divergence events that involved African Ae. aegypti s.s. and their relatives on Europa.
Particularly, we were interested in those events that occurred more recently in evolutionary history and for which nuclear and mitochondrial genes did not resolve the divergence times or directionality. per year and a mutation rate ranging from 9 × 10 −6 to 1 × 10 -5 , consistent with previous publications (Gloria-Soria et al., 2016) and rates reported in the literature for other Diptera species (Pfeiler, Flores-López, Mada-Vélez, Escalante-Verdugo, & Markow, 2013;Schug et al., 1998). ABC allowed us to explicitly contrast different hypotheses on the population history of these species, namely whether Europa diverged prior to East and West Africa, or after East or West Africa.
The ABC analysis was performed as five independent runs on mi- Kisumu-KE). We tested three scenarios ( Figure S3) to determine the history of Ae. aegypti in continental Africa relative to Europa.

| RE SULTS
We estimated the evolutionary history and divergence times of the members of the Aegypti Group in the southwestern Indian Ocean (Figure 2; Figures S1 and S2) with nucleotide sequence data from seven markers. Our maximum-likelihood phylogenetic analyses on the full nucleotide data set, the mitochondrial-only data set and the nuclear-only data set differed in clade support values but overall recovered concordant topologies for the major lineages of Stegomyia, including monophyly of the Aegypti Group ( Figure S1). Moreover, there was no correlation between missing data in the alignment and terminal branch length (for the number of loci in the alignment per species, t 31 = 0.12, p-value >0.9; and for the proportion of missing data in the alignment per species, t 31 = −0.6, p-value >0.5). As such, we performed divergence time estimation on the combined nuclear and mitochondrial nucleotide data set. The topology recovered from our divergence time estimate was similar to the likelihood analyses; thus, we report only the results from the divergence time analysis here (but see Figure S1).
Although sparsely sampled outside the Aegypti Group, these re-   Figure S1). The first support value indicates is the ultrafast bootstrap support value from IQ-Tree, while the second is the posterior probability from  (Figure 3a). Population structure as assessed by the STRUCTURE-like SNMF was also consistent with the PCA and phylogenetic results (see Figure S4 for SNMF).
At the optimal K determined by cross-entropy, K = 3, island populations had highest identity with the genetic clusters associated with Ae. aegypti from Madagascar, Ae. mascarensis and Ae. pia, than with the clusters associated with Aaa or Aaf. However, at this and higher K values ( Figure S4), individuals from island populations frequently contained a minority of ancestry to ancestral populations associated with Aaa or Aaf, potentially indicating admixture.
A maximum-likelihood phylogenetic analysis of the SNP data indicated that all southwestern Indian Ocean populations are basal to Ae. aegypti s.s. (Figure 4). Consistent with the maximum-likeli-  Although Jeddah was a common contributor in the lowest F3 scenario across all analyses, all Asian populations tested were suggested as putative ancestors of these admixed populations (Tables S7, S8 and S19). As such, any Asian population in our analysis could also have been a source of admixture in this region. Unlike other island populations, we found no evidence of admixture within mosquito samples from Madagascar or Europa.   (Kotsakiozi, Evans, et al., 2018). As our main focus was on populations in the southwestern Indian Ocean, and because the population structure of African Ae. aegypti has been recently reported from these microsatellites (Gloria-Soria et al., 2016), we report detailed population statistics from microsatellites in our supplement but summarize our descriptive statistics on the microsatellite data set briefly here. The main microsatellite data set used for this analysis included genotypes from 12 loci with 159 total alleles (Table S10). This data set contained individuals from six populations from West Africa and four from East Africa, plus the Europa population, and had 1.02% missing data (Table S11). Population genetic statistics for individual locations are reported in Table S12.
The average observed heterozygosity (Ho) across the data set was 0.567, with Ho = 0.458 in Europa, Ho = 0.579 in West Africa and Ho = 0.577 in East Africa; see Table S12. A total of 14 out of 143 (9.7%) population-by-locus comparisons deviate significantly from Hardy-Weinberg expectations, after sequential Bonferroni correction (Table S13). The ABC analysis on the African and Europa Island samples of Ae. aegypti supports a scenario where the colonization of Africa occurred from the Indian Ocean (represented by Europa) less than 85,000 years ago (Mean: 36,500 YA, 95% CI across runs: 9,780-84,300 YA; see Tables S14 and S15), with a predicted split between West and East Africa no older than 50,000 years ago (mean: 21,000 YA, 95% CI across runs: 7,810-49,400 YA; see Tables S14 and S15), assuming an average of 10 generations per year (Scenario 1: p = .8864 and p = .8481 in Tables S14 an S14; Figure S3). Alternative scenarios were poorly supported by the analysis (p < .11; Tables S14 and S15; Figure S3). The estimated mutation rate under the best-fit scenario was ~9.5 × 10 −6 , and falls within the range of microsatellite mutation rates estimated for other Diptera (Pfeiler et al., 2013;Schug et al., 1998).

Nevertheless, our analyses indicate that one lineage of Afrotropic
Stegomyia diversified in Africa (leading to Ae. africanus, Ae. simpsoni, and others; see Figure 2). Another lineage, which entered the islands of the southwestern Indian Ocean more than 16 MYA, diversified there and gave rise to the Aegypti Group, before entering continental Africa as the Ae. aegypti s.s. lineage.

Our analyses on the divergence times of species in the Aegypti
Group on Indian Ocean islands ( Figure 2) correspond well with the ages of the islands on which these species are found, as the age of Mayotte (where Ae. pia is endemic) is less than 20 MY (Michon, 2016) and Mauritius (where Ae. mascarensis is endemic) is less than 9 MY Group on Madagascar were basal to continental African Aedes aegypti (Delatte et al., 2011). This observation is also consistent with the maximum-likelihood analysis on the primary SNP data set reported here, although branch lengths from our SNP-based likelihood analysis may be underestimating variation in more distant species and populations, as the SNP chip was designed to capture variation within Ae. aegypti. Nonetheless, the concordance between the analyses performed on our nucleotide sequence data set, our primary SNP dataset and the microsatellite estimation of population history (see below) indicates that the common ancestor of the Aegypti Group was most likely found throughout the islands in the southwestern Indian Ocean.
Our results suggest that Ae. aegypti s.s. entered Africa less than 85,000 years ago, as the divergence of Europa and continental African populations was younger than 85,000 years across all five ABC analyses ( Figure S3; Tables S14 and S15). Pollen analyses of cores from Lake Malawi in East Africa at about the same latitude as Madagascar suggest frequent fluctuations in dry-and wet-associated flora until about 80,000 years ago when present-day vegetation stabilized (Ivory, Lézine, Vincens, & Cohen, 2018;Lyons et al., 2015).
This is consistent with the unusually high and stable levels that Lake Malawi experienced over the last 70,000 years, indicative of a period of steady rain (Ivory et al., 2018;Lyons et al., 2015) that would have provided a suitable habitat typical of Aaf on continental Africa.
Interestingly, our F3 tests did not indicate admixture has occurred in Ae. aegypti from Europa, suggesting that the apparent admixture signatures detected by the STRUCTURE-like SNMF in this population ( Figure S4) may represent instead ancestral polymorphism for this lineage. If true, Ae. aegypti from Europa could be a remnant of the island-dwelling lineage that originally colonized continental Africa, although testing this hypothesis requires further sampling and analysis. Moreover, we estimated a strikingly early last common ancestor for continental Ae. aegypti s.s. in Africa, at approximately 17,000 to 25,000 years ago. This time period coincides with the end of the last glacial maximum (Clark et al., 2009), where stable Pleistocene forest refugia along the east African coast (Finch, Leng, & Marchant, 2009;Fjeldsaå & Lovett, 1997;Mumbi, Marchant, Hooghiemstra, & Wooller, 2008)   .

| A divergent lineage of Aedes aegypti on Madagascar
We find strong evidence that Ae. aegypti on Madagascar are genetically distinct, and deeply diverged from Ae. aegypti s.s., at a level equal to or greater than what is seen in the named species Ae. mascarensis (Figures 2 and 3). A previous study, including only putative Ae.
aegypti samples, suggested Madagascar Ae. aegypti separated from Europa and continental populations 7-15 MYA (Fort et al., 2012), similar to our estimate ( Figure 2). Phylogenetic analyses using mtDNA from mosquitoes from northern Madagascar that were morphologically identified as Aaf indicated that these mosquitoes were genetically distinct from other populations of Ae. aegypti (Mousson et al., 2005), and basal to Ae. mascarensis, consistent with our results. To date, no phenotypic differences are known between Ae.
A cryptic lineage-particularly one that has shown some evidence

| A recent history of introgression in Ae. aegypti
On the islands in the southwestern Indian Ocean, we find evidence for recent introgression among members of the Aegypti Group.
These results indicate a component of the admixture is Aaa from Asia, rather than from New World Aaa or from Aaf. In our analysis, Aaa from Saudi Arabia was most often identified as the Aaa parental population in admixture analyses, but we note that the inference of ancestral population origins with Treemix is limited to the populations used in the analysis, and thus, future sampling may yield different results; additionally, all Asian populations we tested had similar strength of support as Saudi Arabia to be the origin of the parental Aaa involved in introgression in these island populations (Tables S13-S15). This inference is consistent with a hypothesis of Aaa expanding eastward after the Suez Canal opened in 1869 to colonize Asia (Kotsakiozi, Gloria-Soria, Schaffner, Robert, & Powell, 2018;Powell et al., 2018). Here, we infer they also entered islands east of Africa where they encountered other species in the Aegypti Group and, in some cases, interbred. It is also possible due to the small sample sizes in some populations (e.g. Ae. mascarensis or Ae. aegypti from Sri Lanka) we have underestimated or missed potential admixture events, as the relative F3 statistics inferred from our analysis of resampled populations were slightly lower than those from the full SNP data set. However, our resampling provided qualitatively the same results as our full data set. This indicates that small sample sizes can detect admixture that is consistent with observations from larger samples of the same population, at least given the relatively large number of SNPs in our study.
The consequences of this introgression remain to be tested.
However, some Ae. aegypti on Réunion exhibit a particular scutal morph that appears intermediate between Ae. aegypti and Ae.
mascarensis (Le Goff et al., in preparation). Other Ae. aegypti from Réunion-differentiated from Aaf in scaling patterns-have been reported to be primarily a sylvatic mosquito (Bagny, Delatte, Elissa, Quilici, & Fontenille, 2009;Salvan & Mouchet, 1994); that is, most are morphologically Aaa but ecologically Aaf. Previous studies speculated that the presence of Aaa in rural locations on Réunion, rather than in urban locations typical of Aaa, was due to the widespread use of pesticides, and larval competition with Ae. albopictus (Bagny et al., 2009). We propose here that admixture with Ae. mascarensis, or a mascarensis-like population, could have resulted in behavioural shifts due to the sylvan nature of Ae. mascarensis, while the morphology typical of Aaa was maintained. Admixture may complicate efforts to control Ae. aegypti by shifting typical behaviours associated with this mosquito in a way that may allow them to evade control measures that are typically effective against them. Vector control efforts in this region thus may need to account for not only a cryptic species, but also admixed populations, and will likely require more detailed information on ecological differences between species and populations to be successful. Moreover, the primary vector of the chikungunya virus during the outbreak of 2005/6 on Réunion was attributed to Ae. albopictus (Delatte et al., 2008), rather than Ae. aegypti, the more competent vector present throughout the rest of the world. Does this mean Ae. aegypti on Réunion is less competent than most Ae. aegypti populations for chikungunya? While some information is available for Madagascar (Failloux et al., 2002;Vazeille et al., 2001), the vector competence of other members of the Aegypti Group on southwestern Indian Ocean islands is largely unknown.

| CON CLUS ION
Ae. aegypti is among the best-studied mosquitoes, yet from an evolutionary standpoint, the results presented here highlight how much more there is still to be learned. We present the first solid evidence that the common ancestor Ae. aegypti s.s. and its nearest relatives were occupying Madagascar and/or islands in the southwestern Indian Ocean, indicating that these islands gave rise only relatively recently (<85,000 YA) to Ae. aegypti s.s. on continental Africa. Moreover, we find strong evidence of a cryptic genetic lineage on Madagascar. It is also clear, however, that while our limited sampling in this region of the world has revealed important insights, more intense sampling and analyses of collections are likely to reveal a more complete picture of the evolutionary history of the Aegypti Group.

ACK N OWLED G EM ENTS
We thank T. Chiodo for laboratory assistance and processing of some samples used in this study. We also thank J. Baltzegar and  Andrea Gloria-Soria https://orcid.org/0000-0002-5401-3988