• Open Access

Spatial genetic structure of Aedes aegypti mosquitoes in mainland Southeast Asia


Dr Catherine Walton, Faculty of Life Sciences, University of Manchester, Michael Smith Building, Oxford Road, Manchester M13 9PT, UK.
Tel.: 00 44 (0)161 275 1533; fax: 00 44 (0)161 275 5082; e-mail: catherine.walton@manchester.ac.uk


Aedes aegypti mosquitoes originated in Africa and are thought to have spread recently to Southeast Asia, where they are the major vector of dengue. Thirteen microsatellite loci were used to determine the genetic population structure of A. aegypti at a hierarchy of spatial scales encompassing 36 sites in Myanmar, Cambodia and Thailand, and two sites in Sri Lanka and Nigeria. Low, but significant, genetic structuring was found at all spatial scales (from 5 to >2000 km) and significant FIS values indicated genetic structuring even within 500 m. Spatially dependent genetic-clustering methods revealed that although spatial distance plays a role in shaping larger-scale population structure, it is not the only factor. Genetic heterogeneity in major port cities and genetic similarity of distant locations connected by major roads suggest that human transportation routes have resulted in passive long-distance migration of A. aegypti. The restricted dispersal on a small spatial scale will make localized control efforts and sterile insect technology effective for dengue control. Conversely, preventing the establishment of insecticide resistance genes or spreading refractory genes in a genetic modification strategy would be challenging. These effects on vector control will depend on the relative strength of the opposing effects of passive dispersal.


Dengue is a major re-emerging public health problem in the tropics and subtropics outside Africa. An estimated 50 million people are infected annually, which can lead to hospitalization and death in severe cases (WHO 2007). The dengue virus is transmitted to humans by mosquitoes, primarily Aedes aegypti. Southeast Asia has experienced severe dengue epidemics since the 1950s, particularly in urban areas (Hammon et al. 1960; Gubler 1998a,b; Muto 1998; Prasittisuk et al. 1998; Thu et al. 2004; Kittayapong 2006). This great burden of disease in Southeast Asia stems from the invasion and spread of A. aegypti throughout the region that is thought to have occurred following its introduction from Africa by the shipping trade in the late 19th century (Smith 1956; Tabachnick and Powell 1979). Aedes aegypti is a highly domestic species that not only feeds on humans but also breeds in and around human habitation, laying its eggs in water-filled, man-made containers (e.g. water storage jars and discarded containers). This species, and the disease it transmits, have therefore proliferated in conjunction with human population growth, economic development, increased mobility and uncontrolled urbanization (Gubler and Clark 1995; Gubler 1998b; Guzman and Kouri 2002; WHO 2007).

At present, the reduction of dengue transmission relies on vector control (WHO 2008). For this reason, it is essential to gain a good understanding of the genetic population structure of A. aegypti, and the factors underlying this, particularly gene flow (Gooding 1996; Ravel et al. 2001). Genetic population structure usually results from a combination of several contemporary and historical processes such as dispersal ability of the species, mating patterns, environmental barriers to dispersal and demographic history (Balloux and Lugon-Moulin 2002). Disentangling the different roles of these factors in determining population structure is needed if the information is to be useful for vector control. For example, information on the rate of ongoing gene flow would help to predict the spread of insecticide resistance genes, the presence of which can be a major limiting factor to the success of control campaigns (Pasteur et al. 1995; Mousson et al. 2002). An understanding of gene flow and environmental barriers to mosquito dispersal is also essential if plans to genetically modify vector populations are to be realized (Scott et al. 2002; James et al. 2006).

Although it is generally the contemporary processes such as gene flow that are most relevant for vector control, it is also important to determine the demographic history of A. aegypti. This could not only help in enabling us to distinguish more clearly the contemporary factors shaping population structure, it could also help to elucidate the genetic basis for the apparent geographic variation in the susceptibility of A. aegypti to dengue virus (Gubler et al. 1979; Bosio et al. 1998). There have been several mtDNA-based studies that might have been expected to be informative of demographic history. For example, the apparent presence of two genetically divergent lineages of mtDNA has often been interpreted to indicate that colonization by A. aegypti was from multiple, divergent source populations (Gorrochotegui-Escalante et al. 2002; Bosio et al. 2005; Herrera et al. 2006; Scarpassa et al. 2008). However, the presence of nuclear copies of mtDNA (Numts) in A. aegypti (Hlaing et al. 2009) means that such inferences may be unreliable. Consequently, as yet we know relatively little about the population history of this species.

Previous large-scale studies of genetic population structure of A. aegypti using allozymes (Tabachnick and Powell 1979; Powell et al. 1980; Wallis et al. 1983; Failloux et al. 2002) indicate structuring on a worldwide scale. The overall levels of genetic differentiation are relatively low, consistent with the recent spread of this species throughout the tropics (Tabachnick and Powell 1979). More recently, microsatellite markers have been used for determining the genetic population structure in A. aegypti. Most of these studies have been on small spatial scales: regions within a country, i.e. Mexico (Ravel et al. 2001), Ivory Coast (Ravel et al. 2002), Cameroon (Paupy et al. 2008), or within cities, e.g. Ho Chi Minh City in Vietnam, Phnom Penh in Cambodia and Chiang Mai in Thailand (Huber et al. 2002, 2004; Paupy et al. 2004). Like the studies on a larger spatial scale, these small geographical-scale studies generally conclude that there is a significant level of genetic differentiation. However, these previous microsatellite studies of genetic population structure of A. aegypti in Southeast Asia have been very localized and used small numbers of markers (from three to eight loci). Further, there has been no substantial study of the broad-scale patterns of genetic structure of A. aegypti in Southeast Asia and the possible demographic factors underlying this.

The aim of this study was therefore to determine the genetic population structure of A. aegypti at a hierarchy of spatial scales across the region of mainland Southeast Asia and to determine the factors shaping structure at each scale. In particular, we wish to understand the relative roles of historical and contemporary factors shaping the population structure to enable us to take into account any potentially confounding historical effects in the estimation of contemporary gene flow. Thirteen microsatellite markers were genotyped in mosquitoes sampled from 36 sites in Myanmar, Cambodia and Thailand as well as collections from Sri Lanka and Nigeria. In addition to conventional population-based approaches, we also used landscape genetics methods (Manel et al. 2003; Storfer et al. 2006) to identify the factors shaping genetic structure. Landscape genetic approaches are individual-based rather than population-based, so should not result in misleading conclusions being drawn from the incorrect designation of a priori populations. The findings were interpreted in relation to their significance for vector control efforts and for the future utility of using landscape genetics approaches to identify the factors shaping the genetic structure of this species.

Materials and methods

Mosquito sampling

Mosquitoes from mainland Southeast Asia were collected in 2004 and 2005 from a total of 36 sites in Myanmar, Thailand and Cambodia at a hierarchy of spatial scales (Table 1). There were three main collection regions per country, each of which comprised four collection sites: three that were ∼5 km apart and a fourth that was ∼50 km away from the trio. For example, sampling within a region could include three clustered sites in suburban and/or periurban areas of a town or city and a village or town ∼50 km away. Although choice of sample sites was governed largely by spatial position, we also attempted to collect from a range of city, periurban and rural sites in order to determine if connectivity (presumably highest in cities and lowest in rural sites) influenced genetic structure within and among sites. Each collection site covered an area ∼500 m in diameter based on the ability of A. aegypti to fly up to several hundred metres (Christophers 1960; Reiter et al. 1995; Honório et al. 2003). Third- and fourth-stage larvae and pupae were collected from ∼50 different water storage containers (such as outside water storage jars, indoor cisterns and discarded cans, coconut shells, tyres) in and around people’s houses, sampling as evenly as possible throughout the collection site. Larvae and the adults (which hatched from the collected pupae) were examined under a light microscope and morphologically identified to species using taxonomic keys (Rattanarithikul and Panthusiri 1994). Identified larvae were preserved in 95% ethanol and adults were preserved by desiccation using silica gel. The locations of the sampling sites were recorded using a global positioning system. For the microsatellite genotyping, a single individual was selected at random from each container to avoid incidental sampling of close relatives.

Table 1.   Sample collection data for Aedes aegypti mosquitoes.
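| Map reference | Country | Region | Sample site | Sample code | N | Longitude | Latitude | Collection date |
|---|---|---|---|---|---|---|---|---|
| 1 | Myanmar | Yangon | North Okkalapa* | M1 | 48 | 96.17E | 16.45N | Dec-2004 |
| 1 | Myanmar | Yangon | South Okkalapa† | M2 | 39 | 96.18E | 16.85N | Dec-2004 |
| 2 | Myanmar | Meiktila | Aung San* | M5 | 28 | 95.83E | 20.85N | Jul-2005 |
| 2 | Myanmar | Meiktila | Yadana Man Aung† | M6 | 18 | 95.86E | 20.88N | Jul-2005 |
| 3 | Myanmar | Myitkyina | Yangyi Aung† | M9 | 24 | 97.39E | 25.36N | Aug-2005 |
| 3 | Myanmar | Myitkyina | Shwe Nyaungbin† | M10 | 11 | 97.38E | 25.38N | Aug-2005 |
| 3 | Myanmar | Myitkyina | Pamma Tee‡ | M11 | 20 | 97.31E | 25.37N | Aug-2005 |
| 3 | Myanmar | Myitkyina | Moe Kaung† | M12 | 34 | 96.93E | 25.30N | Aug-2005 |
| 4 | Thailand | Chiang Mai | Chiang Mai moat* | T1 | 43 | 98.98E | 18.78N | Oct-2004 |
| 4 | Thailand | Chiang Mai | Mae Hia† | T2 | 14 | 98.95E | 18.74N | Oct-2004 |
| 4 | Thailand | Chiang Mai | Ban Pong Noi† | T3 | 16 | 98.94E | 18.76N | Oct-2004 |
| 5 | Thailand | Ubon Ratchathani | Ban Khamyai Moo† | T5 | 41 | 104.86E | 15.29N | Nov-2004 |
| 5 | Thailand | Ubon Ratchathani | Wat Hat Tai* | T6 | 28 | 104.87E | 15.22N | Nov-2004 |
| 5 | Thailand | Ubon Ratchathani | Wat San Sumran* | T7 | 24 | 104.85E | 15.19N | Nov-2004 |
| 5 | Thailand | Ubon Ratchathani | Ban Kudkrasean† | T8 | 46 | 104.55E | 15.33N | Nov-2004 |
| 6 | Thailand | Songkla | Tee Main† | T9 | 21 | 100.59E | 7.20N | Jun-2005 |
| 6 | Thailand | Songkla | Kao Seng† | T10 | 28 | 100.62E | 7.17N | Jun-2005 |
| 6 | Thailand | Songkla | Ban Bang Dan† | T11 | 42 | 100.59E | 7.14N | Jun-2005 |
| 6 | Thailand | Songkla | Ban Bo Tru† | T12 | 33 | 100.40E | 7.65N | Jun-2005 |
| 7 | Cambodia | Battambang | Cham Kasamrong† | C1 | 21 | 103.19E | 13.11N | Sep-2005 |
| 7 | Cambodia | Battambang | Preak Preahsdech† | C2 | 38 | 103.21E | 13.09N | Sep-2005 |
| 7 | Cambodia | Battambang | Reusey Krok‡ | C4 | 39 | 103.02E | 13.54N | Oct-2005 |
| 8 | Cambodia | Phnom Penh | Chrang Chamreh Pir* | C5 | 27 | 104.89E | 11.63N | Sep-2005 |
| 8 | Cambodia | Phnom Penh | Svay Pak* | C6 | 35 | 104.86E | 11.66N | Sep-2005 |
| 8 | Cambodia | Phnom Penh | Sala Leak Pram† | C8 | 45 | 104.71E | 11.93N | Oct-2005 |
| 9 | Cambodia | Kratie | Thma Kre† | C9 | 40 | 106.00E | 12.55N | Nov-2005 |
| 9 | Cambodia | Kratie | Oresey villa‡ | C11 | 41 | 106.03E | 12.49N | Nov-2005 |
| 9 | Cambodia | Kratie | Kbal Snoul‡ | C12 | 44 | 106.42E | 12.07N | Nov-2005 |
| 10 | Northeast India | Assam | Dibrugarh* | IND | 6 | 96.27E | 26.76N | Dec-2005 |
| 11 | Sri Lanka | Colombo | Mattakkuliya† | SRI | 17 | 79.89E | 6.96N | Mar-2006 |

  1. The fourth sites in each cluster (i.e. M4, M8, M12, T4, T8, T12, C4, C8 and C12) are ∼50 km distant from the others within each corresponding cluster.

  2. *City, town or urban areas.

  3. †Suburban or peri-urban settlements.

  4. ‡Rural areas.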

DNA extraction and microsatellite genotyping

DNA was extracted from individual mosquitoes using a standard phenol/chloroform method (Sambrook and Russell 2001) or ammonium acetate precipitation method (Nicholls et al. 2000). Thirteen dinucleotide microsatellite loci have been characterized previously in A. aegypti that have suitable levels of variation (Slotman et al. 2007). These microsatellites were amplified in two sets of multiplex PCR in 2 μL reaction volumes. Each reaction comprised: DNA template (1 μL of a 1:400 dilution); 1 μL of primer mix (containing each primer at 0.2 mm with the forward primer of each pair fluorescently labelled with HEX, FAM or NED); and 1 μL of Qiagen Master Mix (QIAGEN, Crawley, UK). The amplification conditions used were an initial activation and denaturation step at 95°C for 15 min; 35 cycles at 94°C for 30 s, 55°C for 90 s and 72°C for 90 s; and a final extension step of 10 min at 72°C. The amplified products were run on an ABI 3730 capillary sequencer (Applied Biosystems, Warrington, UK) at the Molecular Genetics Facility, University of Sheffield (SMGF). A subset of amplified fragments was included in all runs to ensure consistency in size estimation using GeneScan ROX500-bp internal size standards (Applied Biosystems). In addition, for genotype scores of low amplitude the DNA was re-amplified and re-genotyped to confirm the genotyping. The amplified microsatellite markers were genotyped using the GeneMapper software 3.7 (Applied Biosystems).

Genetic analyses

Allelic richness (RS), observed and expected heterozygosity (HO and HE), the inbreeding coefficient (FIS) and deviation from Hardy–Weinberg equilibrium (HWE) were estimated for each locus in each population using arlequin 3.01 (Excoffier et al. 2006). arlequin was also used to test for the presence of linkage disequilibrium (LD) between all possible pairs of loci in each population for a total of 38 populations. To determine if significantly positive FIS values were due to isolation by distance within a 500-m diameter collection site, we determined if the genetic and geographic distances between individuals within a site were correlated using a Mantel test (arlequin 3.01). The genetic distance measure between individuals was the proportion of shared alleles (Bowcock et al. 1994), which was calculated in Microsatellite analyser 4.05 with 10 000 permutations (Dieringer and Schlötterer 2003).
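The individual-level genetic distance used for the within-site Mantel test can be illustrated with a minimal sketch. It assumes the distance is one minus the proportion of shared alleles averaged over loci, with diploid genotypes stored as pairs of allele sizes; the function name, the data layout and the toy genotypes are illustrative assumptions, not the Microsatellite analyser implementation.

```python
from collections import Counter


def shared_allele_distance(ind_a, ind_b):
    """Return 1 - proportion of shared alleles between two diploid individuals.

    Each individual is a list of (allele1, allele2) tuples, one per locus.
    Loci with a missing genotype (None) in either individual are skipped.
    """
    shared = 0
    compared = 0
    for geno_a, geno_b in zip(ind_a, ind_b):
        if geno_a is None or geno_b is None:
            continue
        # Multiset intersection counts alleles in common (0, 1 or 2 per locus).
        overlap = Counter(geno_a) & Counter(geno_b)
        shared += sum(overlap.values())
        compared += 1
    if compared == 0:
        raise ValueError("no loci typed in both individuals")
    return 1.0 - shared / (2 * compared)


# Toy genotypes at three hypothetical loci (allele sizes in bp).
ind1 = [(112, 114), (170, 170), (140, 146)]
ind2 = [(112, 112), (170, 172), (138, 150)]
distance = shared_allele_distance(ind1, ind2)
```

In this toy case the two individuals share one allele at each of the first two loci and none at the third, so two of six alleles are shared.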

Genetic differentiation between pairs of populations was estimated in arlequin 3.01 (Excoffier et al. 2006) using both FST (Weir and Cockerham 1984) and RST (Slatkin 1995) as the distance metric. Significance was estimated at the 5% level by 1000 permutations of the genotypes among populations. A Mantel test was implemented in arlequin 3.01 to test for isolation by distance using FST/(1 − FST) as a linearized estimate of pairwise genetic distance between populations and the logarithm of geographic distance (Slatkin 1987).
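The isolation-by-distance test described above can be sketched as follows. This is a simplified permutation Mantel test written for illustration, not the arlequin implementation; the function names, the permutation count and the toy site positions and FST values are all assumptions.

```python
import math
import random


def linearize_fst(fst):
    """Linearized genetic distance FST / (1 - FST)."""
    return fst / (1.0 - fst)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def upper_triangle(matrix, order):
    """Off-diagonal upper-triangle values after reordering rows and columns."""
    n = len(order)
    return [matrix[order[i]][order[j]] for i in range(n) for j in range(i + 1, n)]


def mantel_test(gen, geo, n_perm=1000, seed=1):
    """One-tailed Mantel test: correlation and permutation P-value."""
    identity = list(range(len(gen)))
    geo_vec = upper_triangle(geo, identity)
    r_obs = pearson(upper_triangle(gen, identity), geo_vec)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        order = identity[:]
        rng.shuffle(order)  # permute populations in the genetic matrix only
        if pearson(upper_triangle(gen, order), geo_vec) >= r_obs:
            hits += 1
    return r_obs, (hits + 1) / (n_perm + 1)


# Toy example: six hypothetical sites along a line (positions in km).
positions = [0, 5, 50, 500, 1500, 2000]
n = len(positions)
geo = [[0.0] * n for _ in range(n)]
gen = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(n):
        if i != j:
            d = abs(positions[i] - positions[j])
            geo[i][j] = math.log(d)                          # log geographic distance
            gen[i][j] = linearize_fst(0.02 + 0.005 * math.log10(d))  # made-up FST

r, p = mantel_test(gen, geo)
```

Permuting whole populations rather than individual matrix cells preserves the dependence among pairwise distances that share a population, which is why the Mantel test is used here instead of an ordinary regression test.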

Two Bayesian clustering methods, tess 2.0 (Francois et al. 2006; Chen et al. 2007) and geneland 3.1.4 (Guillot et al. 2005), were used to detect genetic clusters. Individuals are assigned probabilistically to genetic clusters based on their multilocus genotypes to maximize HWE and minimize LD. As these methods do not assume predefined populations, they are useful for identifying spatial discontinuities between samples. Both methods used here are based on the reasonable assumption that spatially proximate individuals are more likely to be genetically related than those that are not and they therefore use the spatial location of individuals as a prior. Although ideally individuals should be sampled in a spatially continuous manner, these approaches have been shown to be effective even when this is not the case (Chen et al. 2007). Explicitly taking into account spatial information is particularly useful for the optimal assignment of individuals into k clusters when the overall level of population structure is low (Chen et al. 2007), as is the case here.

In geneland, we first estimated the number of clusters (k) from five separate runs allowing k to vary from 1 to 38. Each run comprised 10 independent Markov Chain Monte Carlo (MCMC) chains of 600 000 iterations with a thinning interval of 50 and a short burn-in of 200. All runs used the correlated frequency model and spatial model. Each of the five runs generated a distribution of the posterior probability density for different values of k from which the modal value of k was taken to indicate the number of genetic clusters. Four longer runs of geneland were then performed using 1 000 000 iterations for a fixed value of k estimated from the previous runs to determine the consistency with which individuals were assigned to clusters.

Unlike geneland, k is a fixed parameter in the model used by tess. The optimal value of k was inferred by running the software for sequentially higher values of the maximal number of clusters, kmax (from 2 to 38) until the estimated number of clusters, k, was less than kmax. For each value of k, we ran 10 independent MCMC runs each with 50 000 sweeps and a burn-in period of 10 000 sweeps. As recommended by the authors, we used the no admixture and no F model. The spatial interaction parameter (ψ), which determines the extent of spatial dependence of the algorithm, was fixed at 0.6 according to Chen et al. (2007). Similar trial runs with ψ set at 0.4, 0.8 or 1.0 generated consistent results. For some values of k, we performed five longer runs of tess each with 200 000 sweeps and a burn-in of 20 000 sweeps. For each value of k used for the long runs, the overall assignment of individuals to clusters was determined using the Greedy algorithm in clumpp 1.1.1 (Jakobsson and Rosenberg 2007).

To identify the environmental factors that determine the genetic population structure of A. aegypti in Southeast Asia, we applied the software geste 2.0 (Foll and Gaggiotti 2006) to the 36 populations from Myanmar, Thailand and Cambodia. Northeast India was excluded due to its small sample size. The analysis was restricted to mainland Southeast Asia as it is more likely that the same factors underlie the population structure across this region whereas additional factors are likely to determine the wider-scale genetic structure including Sri Lanka and Africa. The geste modelling process uses a hierarchical Bayesian method to test the effect of different environmental factors on population structure. FST values are estimated for each local population (population-specific FST values) and provide information on how genetically distinct a population is relative to other populations in the sample. For example, under a model of diffusive dispersal following a single colonization event, populations furthest from the origin would have the highest FST values due to the cumulative effects of drift from repeated founder events. Population-specific FST values are related to environmental factors using a generalized linear model. Posterior probabilities are estimated for alternative models with differing environmental variables and the model with the highest posterior probability best explains the data. Deviation from the regression, i.e. how well the data fit the model, is measured by σ² and the extent of uncertainty of model parameter values is estimated by the 95% highest probability density interval (HPDI), the smallest interval containing 95% of the values.

We considered three different environmental scenarios each of which includes two environmental factors and explored the effect of the factors individually and their interaction in determining population structure. The first scenario was connectivity which included spatial separation and town size (population size of the town) as the factors. Spatial separation on its own is effectively an isolation-by-distance model. The second factor, town size, was expected to increase the connectivity between A. aegypti populations due to increased movement of people. The second scenario was a land-based range expansion, assuming a major point of entry of mosquitoes into Southeast Asia from Africa followed by diffusive dispersal, making latitude and longitude the two environmental factors. The third scenario investigates the effect of human transportation routes using distance from a port and town size as the factors. All these scenarios are of course more simplistic than the reality is likely to be, perhaps particularly the idea of a single colonization event and simple spread of A. aegypti across Southeast Asia. However, they serve as a means to attempt to distinguish the most important environmental factors shaping population structure. To check for consistency, each scenario was tested using three short and one long run. Each short run had a total of 250 000 iterations with a thinning interval of 20 including a burn in of 50 000. Each long run had a total of 2 050 000 iterations with a thinning interval of 20 including a burn in of 50 000.
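The run lengths quoted for geste are easier to compare once converted to the number of retained samples. The short calculation below assumes that the burn-in is discarded first and that thinning is then applied to the remaining iterations; the helper name is hypothetical.

```python
def retained_samples(total_iterations, burn_in, thinning):
    """Number of MCMC draws kept after discarding burn-in and thinning."""
    return (total_iterations - burn_in) // thinning


# Run settings as reported in the text.
short_run = retained_samples(250_000, 50_000, 20)
long_run = retained_samples(2_050_000, 50_000, 20)
```

Under that assumption each short run retains 10 000 samples and the long run retains 100 000, i.e. the long run is ten times the length of a short run after burn-in.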

Tests of population expansion

The populations were tested for deviation from mutation–drift equilibrium using the intra-locus k-test and inter-locus g-test (Reich et al. 1999). Both tests were conducted using the Excel Macro kgtests (Bilgin 2007). The k statistic tests whether the frequency distribution of allele lengths is more peaked than would be expected for a population of constant size. A constant-sized population is expected to have a ragged, multimodal distribution due to ancient bifurcations caused by stochastic lineage loss whereas an expanding population tends to have a unimodal distribution due to a lack of deep bifurcations in the genealogy. A model of constant population size can be rejected if there are a lower than expected number of positive k statistics for the number of loci tested. The g statistic tests for there being lower variance among loci in the variance of allele frequency sizes than expected for a constant sized population. In populations of a constant size, the dates of deep bifurcations will vary among loci whereas, in an expanding population, bifurcations for all loci will tend to date to the time of expansion. A significantly low value for the g-test was determined empirically by the fifth-percentile cut-off (Reich et al. 1999).
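The decision rule for the k-test can be sketched as a one-tailed binomial calculation: given the number of loci and the probability that a single locus yields a positive k under constant population size, how unlikely is it to see so few positive values? The function below is an illustrative sketch and not the kgtests macro. The per-locus probability is left as an input because it is not stated here, and the value 0.5 in the example is a placeholder assumption only.

```python
from math import comb


def binomial_lower_tail(n_positive, n_loci, p_positive):
    """P(X <= n_positive) for X ~ Binomial(n_loci, p_positive)."""
    return sum(
        comb(n_loci, i) * p_positive ** i * (1 - p_positive) ** (n_loci - i)
        for i in range(n_positive + 1)
    )


# Hypothetical outcome: 2 positive k statistics out of 13 loci,
# with a placeholder per-locus probability of 0.5.
p_value = binomial_lower_tail(2, 13, 0.5)
```

A small lower-tail probability would indicate fewer positive k statistics than expected under constant population size, which is the pattern interpreted as evidence of expansion.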


Summary statistics

We analysed variation in 13 microsatellite loci for a total of 1111 individuals encompassing 36 sites in mainland Southeast Asia. We also included 17 individuals from Sri Lanka and 46 individuals from West Africa (Table 1). The summary statistics, including allelic richness (RS), observed heterozygosity (HO) and deviation from HWE, are shown for each locus in each population in the Appendix. All loci were polymorphic in all populations. Allelic richness (RS) was highest for the Nigerian population, which had an average RS of 8 over all loci compared with an average of 5.6 and a maximum of 7 for all other populations. Of the 494 tests of HWE, 181 showed significant deviation (at P < 0.05), all due to a deficiency of observed heterozygotes, but this was not associated with particular populations or particular loci.

Genetic differentiation within sites

Out of the 2964 pairwise tests of LD among loci, 462 were significant compared with 148 expected at P = 0.05. This is unlikely to be due to tight physical linkage or selection since all pairs of loci showed LD in at least one population and no particular pairs of loci consistently showed LD. The LD together with the deviation from HWE could therefore be indicative of some form of inbreeding within collection sites. An alternative explanation for the deficiency of heterozygotes is the presence of null alleles. At the start of the study, we investigated this possibility for five loci (AG7, AC4, CT2, AG3 and AC7) by amplifying the same 259 individuals using both the primers designed by Slotman et al. (2007) and more exterior primer pairs that were designed from the available genome sequence (Table 2). The amplification of every individual for all loci consistently generated a heterozygote or homozygote with both primer pairs, even though overall there were slightly fewer heterozygotes than expected (157 rather than 168; Table 2). This indicates that the large number of significant FIS values in the final dataset is unlikely to be due to null alleles and instead indicates genetic structuring within a site. Although this implies very limited dispersal in these mosquitoes, when six populations with high FIS values (M2, M10, T2, T6, C3 and C12) were tested they showed no signal of isolation by distance within the ∼500-m diameter sites.
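The test counts quoted above follow directly from the study design of 13 loci and 38 populations, as the short check below shows. Variable names are illustrative.

```python
from math import comb

n_loci = 13
n_populations = 38
alpha = 0.05

hwe_tests = n_loci * n_populations               # one HWE test per locus per population
locus_pairs = comb(n_loci, 2)                    # 78 pairs of loci
ld_tests = locus_pairs * n_populations           # one LD test per locus pair per population
expected_false_positives = alpha * ld_tests      # expected by chance at the 5% level
```

This gives 494 HWE tests and 2964 LD tests, of which about 148 would be expected to be significant by chance alone, so the 462 observed is roughly three times the chance expectation.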

Table 2.   Comparison of the number of heterozygotes generated compared with the numbers expected for five microsatellite loci from the same Aedes aegypti individuals using both the original primers designed by Slotman et al. (2007) and redesigned, exterior primer pairs.
| Locus | New primers | Size range (bp) for new primers | Size range (bp) for original primers | Population | N1 | No. Hets observed with old and new primers (N1) | No. Hets expected (N1) | FIS (N1) | N2 | FIS (N2) |
|---|---|---|---|---|---|---|---|---|---|---|
| AC4 | Forward: 5′-TAAGCAAGCAGCATGTTTCG-3′ | 210–214 | 111–115 | Yangon | 24 | 12 | 14.7 | 0.186 | 48 | 0.062** |
| | Reverse: 5′-TCTCGTCTCACACGCATACAC-3′ | | | Chiang Mai | 30 | 18 | 18.9 | 0.053 | 43 | 0.056 |
| CT2 | Forward: 5′-ATGCTCTCCCAAACTCTTCG-3′ | 278–288 | 168–178 | Yangon | 23 | 16 | 12.4 | −0.295 | 48 | 0.048** |
| | Reverse: 5′-GTGCGACCAAGGTTAGATCC-3′ | | | Chiang Mai | 30 | 18 | 18.5 | 0.024 | 43 | 0.176* |
| AG7 | Forward: 5′-CCAAAGCTATGTGTTTAGTGGTAGG-3′ | 204–218 | 136–150 | Yangon | 21 | 12 | 12.9 | 0.073 | 48 | 0.05 |
| | Reverse: 5′-ACGGGCGTGTATTACAGGAG-3′ | | | Chiang Mai | 27 | 20 | 22.3 | 0.103 | 43 | 0.15* |
| AC7 | Forward: 5′-CCAGCAATAGGAAAGTCTTAGGC-3′ | 214–228 | 114–127 | Yangon | 23 | 13 | 14.4 | 0.099 | 48 | 0.082 |
| | Reverse: 5′-TGTTTAACAATCTCATTGGACTCG-3′ | | | Chiang Mai | 30 | 11 | 14.3 | 0.232 | 43 | 0.201 |
| AG3 | Forward: 5′-CGGAGAGCAGGAAAGTTCAC-3′ | 233–243 | 146–156 | Yangon | 21 | 14 | 15.4 | 0.09 | 48 | 0.119* |
| | Reverse: 5′-TTGGCGGGACTCTATTGTG-3′ | | | Chiang Mai | 30 | 23 | 24 | 0.041 | 43 | 0.059** |

  1. FIS values were estimated for the smaller number of individuals (N1) genotyped with both primer pairs and for the final sample size (N2) with the original primers.

  2. N1 = Number of individuals genotyped with modified primers.

  3. N2 = Number of individuals genotyped with original primers.

  4. Level of significance: *P < 0.05; **P < 0.01.
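The totals quoted in the text (157 observed versus 168 expected heterozygotes) can be recovered from the N1 columns of Table 2. The sketch below also uses the simple approximation FIS ≈ 1 − HO/HE; this is an assumption made for illustration, and the estimator used by arlequin may give slightly different values from those in the table.

```python
# Observed and expected heterozygote counts from Table 2 (N1 columns),
# in row order: AC4, CT2, AG7, AC7, AG3, each as (Yangon, Chiang Mai).
observed = [12, 18, 16, 18, 12, 20, 13, 11, 14, 23]
expected = [14.7, 18.9, 12.4, 18.5, 12.9, 22.3, 14.4, 14.3, 15.4, 24.0]


def approx_fis(n_het_observed, n_het_expected):
    """Simple heterozygote-deficit estimate: 1 - observed / expected."""
    return 1.0 - n_het_observed / n_het_expected


total_observed = sum(observed)
total_expected = sum(expected)
overall_fis = approx_fis(total_observed, total_expected)
per_row_fis = [approx_fis(o, e) for o, e in zip(observed, expected)]
```

Pooled over loci and populations, the deficit is modest (about 0.06 on this approximation), which is consistent with the statement that heterozygotes were only slightly fewer than expected when both primer pairs were used.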

Genetic differentiation among sites

Genetic differentiation was estimated for all pairs of populations and is summarized in Table 3. In general, although the FST-based estimates of population differentiation were not higher than RST-based estimates, far more of the FST-based tests showed significant differentiation. The relative lack of signal of population differentiation from RST is probably due to deviation from the assumed stepwise mutation model (Balloux and Lugon-Moulin 2002). The FST-based estimates revealed that even sites that were only ∼5 km apart were significantly genetically differentiated, although the level of differentiation was low (average FST values of 0.026). This observation was repeated at all spatial scales and even pairwise comparisons among populations from Africa and Asia had only low levels of differentiation (average FST values of 0.066). It is possible that there are some false positives as 5% was used as the significance level for these tests, but these will have minimal effect on the overall findings as the majority of tests were positive (808 out of 820) and typically had high significance (P < 0.0001). As spatial scale increases, the average FST value and the level of significance also increase. (The three nonsignificant comparisons at the highest spatial scale involved comparisons with Kenya or Northeast India where sample sizes were very low.) This is consistent with the signal of isolation by distance found in the populations from mainland Southeast Asia (Fig. 1). The Mantel test shows that genetic and geographic distances are significantly correlated, although the level of this correlation is relatively low (R² = 0.111; P = 0.036).

Table 3.   Average pairwise population-genetic differentiation of Aedes aegypti mosquitoes from Southeast Asia and Africa at a range of spatial scales.
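| Comparison | Spatial scale | Average pairwise FST | No. significant pairwise FST values | Average pairwise RST | Range | No. significant pairwise RST values |
|---|---|---|---|---|---|---|
| Within clusters | ∼5 km (∼3–13 km) | 0.026 | 24/27 | 0.041 | −0.008 to 0.123 | 11/27 |
| Within clusters | ∼50 km (∼20–70 km) | 0.032 | 26/27 | 0.023 | −0.019 to 0.067 | 3/27 |
| Within and between countries | ∼500 km (∼150–500 km) | 0.039 | 146/148 | 0.044 | −0.006 to 0.214 | 42/148 |
| Within and between countries | >500 km (∼500–1000 km) | 0.043 | 172/175 | 0.041 | −0.012 to 0.207 | 45/175 |
| Within and between countries | >1000 km (∼1000–1500 km) | 0.045 | 203/203 | 0.042 | −0.009 to 0.158 | 57/203 |
| Within and between countries | <2000 km (∼1500–2000 km) | 0.047 | 66/66 | 0.052 | −0.005 to 0.175 | 19/66 |
| Within and between countries | >2000 km | 0.063 | 24/24 | 0.052 | 0.009 to 0.108 | 16/24 |
| Between countries across continents | 2000 to ∼10000 km | 0.066 | 147/150 | 0.041 | −0.117 to 0.265 | 41/150 |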
Figure 1.

 Scatterplot and regression line of genetic and geographic distance for all 36 populations of A. aegypti in mainland Southeast Asia (Mantel test: R² = 0.111, P = 0.036).

Landscape factors underlying population structure

For the 38 populations (i.e. excluding the small Northeast Indian and Kenyan samples), geneland consistently estimated a highly peaked posterior probability distribution for the number of population clusters (Fig. 2) with a modal value of k = 7 and the second highest value for k = 8. There was also a small but significant posterior probability for k = 38, consistent with the small but significant level of structure detected between most populations (Table 3). For k = 7, the average FIS value for each cluster was 0.14 and FST values between clusters ranged from 0.01 to 0.06. The four long runs of geneland with k = 7 generated largely consistent results, which are represented in Fig. 3. The Nigerian population was always reported as a distinct cluster as was M4 (the site 50 km from Yangon) and T8 (the site 50 km from the cluster of three populations in Ubon Ratchathani, eastern Thailand). There was also evidence of regional clustering. The Central and Upper Myanmar populations (M5–M12) were always clustered together and half of the time they were also grouped with Yangon populations (all individuals from M1 and M2 and nine individuals from M3). In addition, 11 of the Cambodian populations (all except C4) always formed a cluster. Conversely, the Cambodian C4 population (50 km outside of Battambang) only clustered with the other Cambodian populations once and instead usually clustered with Chiang Mai and Songkhla (three out of four runs). The Thai populations from Chiang Mai (T1–T4) and Songkhla (T9–T12) always formed a cluster, even though they were more than 1000 km apart. This indicates that spatial distance alone was not responsible for the clusters detected. Similarly, the geographically distant Sri Lankan individuals were never found in a separate group but always clustered with three of the eastern Thai populations (T5–T7) and most of the individuals from the Yangon M3 population. The Yangon populations (M1–M3) were very heterogeneous compared with other populations. In addition to the clustering with the other Myanmar populations or with Sri Lanka and eastern Thailand as outlined above, M1, M2 and some individuals from M3 also sometimes clustered (two out of four times) with the populations from northern Thailand (Chiang Mai, T1–T4) and southern Thailand (Songkhla, T9–T12).

Figure 2.

 Distribution of posterior probability for the number of genetic clusters (k) estimated by geneland.

Figure 3.

 Relief map of Southeast Asia (and Sri Lanka) showing sample collection sites and genetic clusters detected by geneland. In mainland Southeast Asia, a large circle represents three sites spaced 5 km apart and a small circle represents the site 50 km distant from this. The Yangon sites are numbered according to Table 1. Each genetic cluster is represented by a different colour but the 7th cluster, Nigeria, is not shown.

When tess was run with the spatial interaction parameter (ψ) set to 0 for k = 3–38, there were no clear clusters with the exception of Nigeria. When run like this, with no spatial information taken into account, tess is equivalent to the structure program of Pritchard et al. (2000). With ψ set to 0.6, the likelihood of tess increased with increase in k, reflecting the low but significant differentiation between all populations. As kmax was increased for repeated runs of tess, the first point at which some estimates of k were lower than kmax was for k = 7 and k = 8, consistent with the geneland clustering results. However, it was only by kmax = 16–18 that k was consistently lower than kmax with k = 16 being the modal value. Figure 4 shows the cluster assignments from distruct for k = 7 and k = 8 (to provide a comparison with the geneland output) and for k = 16. At k = 7 and k = 8, the grouping structure is very comparable with that obtained from geneland, for example, the genetic distinctiveness of T8 in eastern Thailand and the genetic similarity of northern Thailand (Chiang Mai), southern Thailand (Songkhla) and C4 in Cambodia. In tess at k = 16, these latter three regions become genetically distinct from each other. The M4 population on the outskirts of Yangon also becomes distinct (as in geneland) as does C11 in Cambodia. tess and geneland also both identify the Yangon populations (M1–M3) as being genetically heterogeneous. In addition, the tess assignments for k = 7 and k = 8 indicated that sites C1–C3 in Battambang, Cambodia, and sites T9–T11 in Songkhla, southern Thailand and Sri Lanka were also very heterogeneous with individuals from these populations being assigned to several different genetic clusters. At k = 16, additional heterogeneity in the Cambodian and Nigerian samples also became apparent.

Figure 4.

 Membership of individuals to 7, 8 and 16 genetic clusters estimated by tess and clumpp.

The three alternative, but not necessarily mutually exclusive, environmental scenarios that were fitted to the data using geste are shown in Table 4. The posterior probabilities were consistent across runs and were often high. However, the fit of the models is very weak, with wide HPDIs and high σ² values, so we cannot make reliable inferences from the model results. We show the data here for two reasons. First, these data give some indication of factors that are worthy of further investigation, as they appear to indicate a role for human communication routes (via ports and roads that connect large towns), spatial distance and spatial range expansion in determining genetic population structure in this species. Secondly, the data indicate how future sampling could be carried out to ensure that reliable inferences could be made from the modelling. Although the relatively large number of populations used here should enable robust model determination (Foll and Gaggiotti 2006), most of the populations have similar and very low population-specific FST values, providing little information for the modelling. There is a small proportion of populations with higher FST values (notably M8 and M10 in Central and Upper Myanmar, T7 in Northern Thailand and C7 in Phnom Penh, Cambodia) that have experienced greater genetic drift relative to the others. It is these distinctive populations that largely determine the regression, but as they are few in number there is great uncertainty in parameter estimation. As these distinctive populations are both more inland and also tend to be more isolated, these two factors are confounded, and future sampling would need to include sites where environmental factors were not confounded.

Table 4.   Posterior probabilities for 24 different models under the three environmental scenarios from the geste analysis.
Environmental scenario | Factors | Posterior probability
 | Spatial distance | 0.331
 | Constant and spatial distance | 0.170
 | Size of city | 0.337
 | Constant and size of city | 0.177
 | Constant, spatial distance and size of city | 0.160
 | Spatial distance and size of city interaction | 0.305
Spatial range expansion | Constant | 0.0333
 | Constant and longitude | 0.0189
 | Constant and latitude | 0.0261
 | Constant, longitude and latitude | 0.0252
 | Longitude and latitude interaction | 0.896
Human transport routes | Constant | 0.120
 | Distance from port | 0.210
 | Constant and distance from port | 0.102
 | Size of city | 0.206
 | Constant and size of city | 0.098
 | Constant, distance from port and size of city | 0.108
 | Distance from port and size of city interaction | 0.572

Tests of population expansion

The results of the k- and g-tests for population expansion are shown in Table 5. In Southeast Asia, the clusters of four sites each have about half the loci (from 4 to 8 out of 13 loci) with positive values of the k-test statistic, which indicates that the allele frequency distributions are no more peaked than expected by chance. The number of loci that have positive values of k decreases when these clusters are pooled into country or the whole of Southeast Asia (from 3 to 6 out of 13 loci), with only Thailand having a significantly low number of positive k-values. Sri Lanka, with 7 out of 13 loci positive, fits in with the other Southeast Asian sites, whereas Nigeria appears to have a larger number of positive k-test values (10 out of 13). All the g-values for these groups of populations are also positive, indicating that a model of constant population size could not be rejected for any of the population groupings.
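The significance of the k-test result can be checked with a one-tailed binomial calculation. The sketch below is an illustration rather than the authors' code. It assumes, following the usual description of the test (Reich et al. 1999), that each locus has a probability of about 0.515 of giving a positive k under constant population size; this probability and the function name are assumptions not stated in the text above. Under that assumption, three positive loci out of 13 (Total Thailand) gives a P-value close to the 0.037 reported in Table 5.

```python
from math import comb


def k_test_p_value(n_positive, n_loci, p_positive=0.515):
    """One-tailed binomial probability of seeing n_positive or fewer loci
    with a positive k out of n_loci.

    p_positive is the assumed per-locus probability of a positive k under
    constant population size (an assumption of this sketch).
    """
    return sum(
        comb(n_loci, i) * p_positive**i * (1 - p_positive) ** (n_loci - i)
        for i in range(n_positive + 1)
    )


# Total (Thailand): 3 of 13 loci with positive k
p_thailand = k_test_p_value(3, 13)
print(round(p_thailand, 3))  # 0.037
```

The same function applied to the other rows (for example 6 of 13 for the whole of Southeast Asia) gives P-values above 0.05, in line with the NS entries in the table.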

Table 5.   Tests of population expansion in Aedes aegypti using the k- and g-tests for 13 microsatellite loci.
Sites | k-Test (number of positives) | g-Test
Cluster 1 (Myanmar) | 7/13 NS | 0.64
Cluster 2 (Myanmar) | 7/13 NS | 0.84
Cluster 3 (Myanmar) | 6/13 NS | 0.65
Total (Myanmar) | 5/13 NS | 0.70
Cluster 4 (Thailand) | 7/13 NS | 0.70
Cluster 5 (Thailand) | 5/13 NS | 0.62
Cluster 6 (Thailand) | 4/13 NS | 0.81
Total (Thailand) | 3/13* | 0.63
Cluster 7 (Cambodia) | 8/13 NS | 0.67
Cluster 8 (Cambodia) | 8/13 NS | 0.47
Cluster 9 (Cambodia) | 8/13 NS | 0.86
Total (Cambodia) | 7/13 NS | 0.66
Total (Southeast Asia) | 6/13 NS | 0.62
Northeast India | 9/13 NS | 1.07
Sri Lanka | 7/13 NS | 0.69
Kenya | 8/13 NS | 1.14
Nigeria | 10/13 NS | 0.42
*k-test (P-value = 0.037). NS, not significant.


Restricted dispersal on a small spatial scale

Although the overall level of genetic differentiation was low, populations that were only 5 km apart were significantly different from each other. Even within our collection areas that were no more than 500 m in diameter, estimates of the inbreeding coefficient (FIS) were often significant. The generation of the same genotypes with alternative primers indicates that this inbreeding is unlikely to be explained solely, if at all, by the presence of null alleles. Together, therefore, these data indicate that genetic structuring on a very small spatial scale (<500 m) is a general phenomenon in A. aegypti in Southeast Asia, in agreement with previous localized studies (e.g. Paupy et al. 2004). The lack of detection of a signal of isolation by distance within a collection site indicates that structuring is not due to limited dispersal in a continuous population. For some sites, preliminary analyses using the tess software showed some indication of genetic clustering within sites but, before any conclusions can be firmly drawn on this, more loci and more individuals are needed to provide the necessary resolution at this small spatial scale.
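As a rough illustration of what a positive FIS means, the inbreeding coefficient can be approximated as the heterozygote deficit relative to Hardy–Weinberg expectation, FIS ≈ 1 − HO/HE. The sketch below is not the estimator used in this study (per-locus estimates and their significance would normally come from dedicated population-genetics software). The function name is hypothetical, and the inputs are the multilocus averages listed first for the Myanmar populations in the supplementary table (HO = 0.64, gene diversity = 0.70).

```python
def approx_fis(h_obs, h_exp):
    """Approximate F_IS as the proportional deficit of heterozygotes.

    h_obs: observed heterozygosity (HO)
    h_exp: expected heterozygosity / gene diversity (HE)
    """
    if h_exp <= 0:
        raise ValueError("expected heterozygosity must be positive")
    return 1.0 - h_obs / h_exp


fis = approx_fis(0.64, 0.70)
print(f"{fis:.3f}")  # 0.086
```

A value above zero indicates fewer heterozygotes than expected under random mating, which is the pattern expected when a single collection area pools individuals from more than one genetic cluster.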

Data from mark–release–recapture studies show that A. aegypti mosquitoes have a limited flight range but can move up to a few hundred metres around their larval habitats (Reiter et al. 1995). Although such dispersal distances could be sufficient to genetically homogenize clusters within a 500-m area, there are several factors that may prevent this. First, dispersal rates are reduced where oviposition sites are abundant (Edman et al. 1998). Secondly, the frequency distribution of dispersal distances is highly skewed, with the vast majority of mosquitoes dispersing extremely short distances (Harrington et al. 2005). Thirdly, in A. aegypti, mating takes place in swarms near to the host, in and around houses (Hartberg 1971; Cabrera and Jaffe 2007). Consequently, the small-scale genetic clustering of A. aegypti may be due to the clustering of oviposition sites and hosts around human habitation coupled with low dispersal. It is possible that these genetic clusters correspond to a house or group of closely situated houses and their immediate environs, as studies of the distribution of A. aegypti have shown that they tend to be highly spatially clustered, with a house typically acting as the unit of clustering (Getis et al. 2003). However, at present, we do not know the exact spatial scale of genetic clusters nor what environmental features may form the barriers between them.

The levels of genetic differentiation here were notably lower than those detected in previous studies. Average values of FST in this study were 0.026 or 0.032 on spatial scales of 5 and 50 km (Table 3) respectively. In comparison, other microsatellite-based studies on a similar spatial scale (comparing sites within cities) had overall FST values of 0.056 (Huber et al. 2002) and 0.053 (Paupy et al. 2004) with many individual values being substantially larger (>0.1). Another study on a larger spatial scale (14 samples from three cities in Thailand, Vietnam and Cambodia) had an overall FST value of 0.117 (Huber et al. 2004) compared with an average value of 0.045 here for a similar spatial scale. This could be due to differences in polymorphism level of the microsatellites used in different studies (Hedrick 1999). However, as the markers appear to have similar diversity, at least part of the reason for the lower levels of genetic differentiation in this study could be that our 500 m sampling sites encompass multiple demes due to the highly clustered structure of A. aegypti (Wright 1921). In some other studies, larvae were collected from small areas of two to three (Huber et al. 2004) or four to five houses (Huber et al. 2002). Despite the small collection area, these studies reported some very high levels of FIS for sampling sites (up to 0.661 in the later study and 0.579 in the earlier one). This may indicate that the sampling area contains only a small number of families with high levels of inbreeding. Alternatively, if the high FIS is due to a sampling effect, for example, the pooling of a relatively small number of larval collections containing siblings, this would result in over-inflated estimates of population differentiation. Despite these differences between studies associated with differences in sampling strategy, it is clear that A. aegypti has a clustered distribution and restricted dispersal on a very small spatial scale.

Factors underlying large spatial scale population structure

Although a signal of isolation by distance was detected here, it is weak (Fig. 1), indicating that the restricted mosquito dispersal detected at a small spatial scale does not explain larger-scale population structure. Further, the Nigerian sample has relatively high genetic distinctiveness and substantially higher allelic richness than the Southeast Asian populations. This is consistent with Africa being the ancestral region of A. aegypti and suggests there has been a founder effect in its colonization of Southeast Asia. This provides evidence to support the suggestion made by several authors (Smith 1956; Tabachnick and Powell 1979; Failloux et al. 2002) that these mosquitoes have spread from Africa to Southeast Asia via shipping. As Southeast Asian A. aegypti most likely originated from East, rather than West, Africa (Tabachnick 1991), it will be necessary to confirm the founder effect by determining the genetic diversity of A. aegypti in East Africa, which was not possible here due to the small sample size.
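The strength of isolation by distance is often summarized as the slope of FST/(1 − FST) against the logarithm of geographical distance. The sketch below is not the analysis behind Fig. 1; it applies that commonly used linearization to the two average FST values reported in this study (0.026 at 5 km and 0.032 at 50 km) simply to show how shallow the relationship is. The function names are hypothetical.

```python
from math import log


def linearised_fst(fst):
    """Transform FST to FST / (1 - FST)."""
    return fst / (1.0 - fst)


def ibd_slope(fst_near, dist_near_km, fst_far, dist_far_km):
    """Slope of linearised FST against ln(distance) between two points."""
    return (linearised_fst(fst_far) - linearised_fst(fst_near)) / (
        log(dist_far_km) - log(dist_near_km)
    )


slope = ibd_slope(0.026, 5.0, 0.032, 50.0)
print(f"{slope:.4f}")  # 0.0028
```

On these averages, a tenfold increase in distance raises FST/(1 − FST) by only about 0.006, which is consistent with a weak isolation-by-distance signal.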

Despite the higher genetic diversity found in Africa, there was no evidence of a substantial population expansion in Southeast Asia. There are several reasons why the tests may be unable to detect an expansion even if one has occurred, e.g. too few loci used, too long since the time of expansion (Reich et al. 1999). However, it is also possible that the lack of an expansion signal indicates there have been multiple introductions into Southeast Asia from several genetically differentiated sources. This would result in a high variance among loci in the variance of allele frequency sizes and generate multimodal allele size frequency distributions, resulting in negative g- and k-values respectively. Even if there has been colonization from multiple sources, the overall numbers of mosquitoes introduced may have been insufficient to capture all the genetic diversity present in Africa. This certainly seems to be a likely scenario given that, over the last two to three centuries, there have been large numbers of ships coming into many Southeast Asian ports from different locations in Africa.

The spatial clustering analyses (tess and geneland) generated similar results, giving greater confidence in their findings. Both methods show that although there is some regional clustering with similarities among populations within countries, genetic clusters do not correspond obviously to spatial distance; some populations that are very close to each other are highly divergent (for example, the M4 population 50 km from the main Yangon cluster and the T8 population 50 km from the eastern Thailand cluster), whereas some populations that are far from each other are genetically similar. Some sites that are geographically distant yet genetically similar (e.g. T9–T12 in Songkhla, southern Thailand, and T1–T4, Chiang Mai, northern Thailand) are connected by major roads. In addition, many of the locations with high genetic heterogeneity are ports (Colombo in Sri Lanka, Yangon in Myanmar, Songkhla in Thailand) or major cities (Phnom Penh, Cambodia). Together, these findings indicate that there is some long-distance dispersal of A. aegypti facilitated by major human transportation routes. This evidence for passive migration supports previous similar suggestions based on population-genetic studies in the southern USA (Merrill et al. 2005) and in Southeast Asia (Huber et al. 2004). Passive dispersal likely involves the movement of immature stages of the mosquito as well as adults; eggs, larvae and pupae could easily occur in water containers transported by people, and the eggs can withstand desiccation for several months (Christophers 1960). Much of the genetic structuring caused by passive migration could be the result of the initial colonization process, when population sizes were smaller and there were empty niches to expand into. Although contemporary passive migration may have a smaller influence on structure, there is every reason to suspect that such passive dispersal is ongoing.

The historical processes of colonization and range expansion appear to have played a major role in shaping large-scale population structure in A. aegypti in Southeast Asia. This makes it more difficult to use conventional population-genetics methods to infer the contemporary factors that are of relevance for vector control, i.e. active and passive dispersal; the limitations of traditional population-genetics methods that assume migration–mutation–drift equilibrium for the inference of gene flow are well recognized (e.g. Nichols and Beaumont 1996). In this context, it is worth noting that the clustering approaches used here provided valuable information on genetic structure that was not apparent from the traditional population-genetics methods.

Implications for vector control

The very restricted dispersal of A. aegypti on a small spatial scale has several implications for vector control. Conventional control measures such as insecticides or the removal of larval habitats in and around houses are often implemented following a dengue outbreak (WHO 2006). The limited dispersal of A. aegypti indicates that this approach should be effective in removing infective populations and preventing their spread. (Of course, this does not prevent the spread of the virus by humans.) A major problem for vector control is insecticide resistance (Gubler 2002). Restricted dispersal of A. aegypti could make the implementation of a stable zone strategy to delay the spread of insecticide resistance genes difficult. In the stable zone strategy, the area treated with insecticide needs to be small relative to the scale of dispersal in order to allow the reinvasion of fitter, nonresistant genotypes into the treated area, so preventing the establishment of the resistance genes (Lenormand and Raymond 1998). The restricted dispersal of A. aegypti would also make the strategy of introducing and driving refractory genes through large geographical areas (James et al. 2006) extremely difficult. On the other hand, the use of the sterile insect technique (SIT) could be very effective in localized areas. In SIT, large numbers of sterile males are released and reduce population sizes when they mate with local females (Thomas et al. 2000 and references therein). This method is most effective when dispersal is limited and sufficient numbers of sterile males are released to generate a travelling wave of extinction (Lewis and Van Den Driessche 1993). As the extent of passive large-scale dispersal would affect all of these control measures, there is clearly a need to better understand the extent and means by which this takes place.


Our finding that Southeast Asian populations of A. aegypti are characterized by genetic structuring on a very small spatial scale indicates the need for further fine-scale studies. Such studies need to determine the exact spatial scale of genetic clusters and to identify the factors determining active dispersal, particularly which environmental features may form barriers to dispersal. An effective approach to this will be detailed landscape genetic studies (using methods such as those used in this study) in which individuals are sampled continuously and on a fine scale throughout urban, peri-urban and rural environments. It is also important to determine the role of environmental factors, particularly those leading to passive migration, in shaping large-scale population structure and the extent to which these are historical or contemporary effects. The geste-modelling approach has the potential to be extremely valuable for this if more loci are used coupled with a more extensive geographical sampling programme to give greater power to distinguish between alternative models. As all analytical methods undoubtedly have their specific limitations, the future use of a variety of methodologies with correspondingly careful interpretation is the approach most likely to lead to a meaningful understanding of population structure and gene flow in A. aegypti.


We would like to thank anonymous referees for their valuable comments, which have enabled improvements to this paper. We are grateful to all the entomology field staff from Myanmar, Thailand and Cambodia for their assistance and help in mosquito sample collections. We also thank Prof. Terry Burke and Dr Deborah Dawson from the University of Sheffield for their kind advice and help. The microsatellite genotyping was performed in the SMGF Laboratory with support from the Natural Environment Research Council (NERC), UK. This study was funded by the World Health Organization, Special Programme for Research and Training in Tropical Diseases (WHO/TDR) Collaborative Research Project Grant ID-A40198 and Research Training Grant (RTG) ID-A60987.


Statistics for microsatellite diversity. Allelic richness (RS), observed heterozygosity (HO) and population inbreeding coefficients (FIS) with associated probability are given for each locus for each population. Additionally, RS, HO and gene diversity are averaged over all loci (for each population) and FIS is averaged over all populations (for each locus). The numbers of significant deviations from Hardy–Weinberg Equilibrium (HWE) are given as proportions of the total number of loci and total number of populations.

Locus | Myanmar populations
Average RS | 5.69 | 5.31 | 5.85 | 5.62 | 6.23 | 5.23 | 4.54 | 5.69 | 5.92 | 4.15 | 5.31 | 6.15
Average HO | 0.64 | 0.62 | 0.66 | 0.61 | 0.66 | 0.66 | 0.60 | 0.68 | 0.66 | 0.60 | 0.67 | 0.70
No. significant departures from HWE | 10/13 | 7/13 | 5/13 | 8/13 | 1/13 | 1/13 | 1/13 | 3/13 | 1/13 | 1/13 | 1/13 | 1/13
Mean gene diversity over all loci | 0.70 | 0.73 | 0.76 | 0.75 | 0.73 | 0.74 | 0.73 | 0.76 | 0.74 | 0.71 | 0.74 | 0.77

Locus | Thailand populations
Average RS | 5.92 | 4.15 | 4.54 | 5.
Average HO | 0.67 | 0.60 | 0.61 | 0.67 | 0.69 | 0.71 | 0.70 | 0.68 | 0.65 | 0.65 | 0.56 | 0.66
No. significant departures from HWE | 7/13 | 2/13 | 1/13 | 5/13 | 7/13 | 3/13 | 6/13 | 11/13 | 0 | 7/13 | 10/13 | 1/13
Mean gene diversity over all loci | 0.74 | 0.68 | 0.68 | 0.72 | 0.76 | 0.79 | 0.78 | 0.77 | 0.73 | 0.72 | 0.76 | 0.68

Locus | Cambodia populations
Average RS | 5.62 | 6.54 | 6.23 | 4.77 | 5.85 | 5.62 | 4.38 | 7.38 | 6.23 | 6.85 | 7.00 | 6.77
Average HO | 0.60 | 0.67 | 0.65 | 0.59 | 0.62 | 0.63 | 0.63 | 0.71 | 0.64 | 0.65 | 0.65 | 0.66
No. significant departures from HWE | 4/13 | 6/13 | 8/13 | 8/13 | 5/13 | 6/13 | 2/13 | 0/13 | 4/13 | 6/13 | 6/13 | 5/13
Mean gene diversity over all loci | 0.75 | 0.76 | 0.74 | 0.68 | 0.74 | 0.76 | 0.73 | 0.78 | 0.72 | 0.76 | 0.77 | 0.75

Locus | Other populations
 | Kenya | Nigeria | Sri Lanka | NE India | No. significant departures from HWE in all 40 populations | Average FIS in all 40 populations
Average RS | 3.54 | 8.08 | 4.92 | 3.08 | |
Average HO | 0.79 | 0.72 | 0.68 | 0.65 | |
No. significant departures from HWE | 0 | 10/13 | 1/13 | 1/13 | |
Mean gene diversity over all loci | 0.71 | 0.80 | 0.77 | 0.67 | |