Novel genotyping approaches to easily detect genomic admixture between the major Afrotropical malaria vector species, Anopheles coluzzii and An. gambiae

Abstract The two most efficient and most recently radiated Afrotropical vectors of human malaria – Anopheles coluzzii and An. gambiae – are identified by single‐locus diagnostic PCR assays based on species‐specific markers in a 4 Mb region on chromosome‐X centromere. Inherently, these diagnostic assays cannot detect interspecific autosomal admixture shown to be extensive at the westernmost and easternmost extremes of the species range. The main aim of this study was to develop novel, easy‐to‐implement tools for genotyping An. coluzzii and An. gambiae‐specific ancestral informative markers (AIMs) identified from the Anopheles gambiae 1000 genomes (Ag1000G) project. First, we took advantage of this large set of data in order to develop a multilocus approach to genotype 26 AIMs on all chromosome arms valid across the species range. Second, we tested the multilocus assay on samples from Guinea Bissau, The Gambia and Senegal, three countries spanning the westernmost hybridization zone, where conventional species diagnostic is problematic due to the putative presence of a novel “hybrid form”. The multilocus assay was able to capture patterns of admixture reflecting those revealed by the whole set of AIMs and provided new original data on interspecific admixture in the region. Third, we developed an easy‐to‐use, cost‐effective PCR approach for genotyping two AIMs on chromosome‐3 among those included in the multilocus approach, opening the possibility for advanced identification of species and of admixed specimens during routine large scale entomological surveys, particularly, but not exclusively, at the extremes of the range, where WGS data highlighted unexpected autosomal admixture.


| INTRODUC TI ON
Two among the two most efficient and recently radiated Afrotropical vectors of human malaria -Anopheles coluzzii and An. gambiae (Coetzee et al., 2013) -are morphologically identical but genetically distinct, with most differentiation concentrated in regions of high divergence on the centromeres of the sexual chromosome and of the two autosomes (Ag, 1000G Consortium, 2017. The two species -provisionally named M and S molecular forms (della Torre et al., 2001) -were initially identified by species-specific SNPs mapping in the IGS multicopy ribosomal region within chromosome-X centromere (Favia et al., 1997), based on which a few diagnostic PCR and PCR-RFLP approaches have been developed (Fanello et al., 2002;Santolamazza et al., 2004;Wilkins et al., 2006). A second diagnostic marker, SINE-X6.1, situated again in the X-centromeric region and consisting of the exclusive insertion of a transposable element of the SINE-X200 family in An. coluzzii (Santolamazza et al., 2008), is also commonly used for species identification either alternatively or in association with IGS-markers. Hereafter we will refer to IGS and to SINE for identifications based on IGS-SNPs and SINE-X200 insertion, respectively.
Anopheles coluzzii and An. gambiae are sympatric across west and central Africa although with different biogeographic patterns of habitat segregation Simard et al., 2009), and show strong ecological differentiation, e.g., in larval breeding sites (Cheng et al., 2012;Gimonneau et al., 2012;Kamdem et al., 2012), and in strategies for surviving the dry season (Dao et al., 2014;Huestis et al., 2019). Although viable hybrid progeny are easily generated under laboratory conditions, IGS-hybrid genotypes are found in nature at a frequency generally <0.2%, thanks to the additive effect of genetic, behavioural and ecological isolating mechanisms . The strength of reproductive isolation between An. coluzzii and An. gambiae varies over space and time leading to periodical breaks in reproductive isolation resulting in extensive hybridization followed by re-establishment of strong premating barriers (Fontaine et al., 2015;Lee, Marsden, Norris, et al., 2013;Pombi et al., 2017). This incomplete isolation creates opportunities for current gene flow with relevant implication for malaria control, as clearly revealed by extensive evidence of recent adaptive introgression of alleles associated to insecticide resistance between the two species (Clarkson et al., 2014).
Differently from the situation in the rest of the range, at the western edge of the two species range populations showing heterozygous IGS-patterns are persistently found at frequencies >20% in Guinea Bissau and Senegal (Ndiath et al., 2014;Niang et al., 2014;Nwakanma et al., 2013;Oliveira et al., 2008;Vicente et al., 2017) and up to 7% in The Gambia (Caputo et al., , 2014Nwakanma et al., 2013). Moreover, in this area, IGS and SINE diagnostics are not always in agreement, as it is believed the case in the rest of the range, suggesting recombination within chromosome-X centromeric region, which was confirmed by multilocus analysis (Caputo et al., 2016). Indeed, genomic studies have revealed that in coastal Guinea Bissau extensive asymmetric introgressive hybridization from An. coluzzii to An. gambiae has eroded the major genomic islands of divergence on chromosomes-2 and −3, but to a far lesser extent on chromosome-X (Caputo et al., 2016;Lee, Marsden, Norris, et al., 2013;Marsden et al., 2011). Impacts of massive introgressive hybridisation may range from genomic erosion and species collapse to rapid adaptation and speciation (Seehausen et al., 2008;Baack & Rieseberg, 2007;Nolte & Tautz, 2010). Microsatellite and WGS analyses suggested that the coastal region of Guinea-Bissau is colonized by a stable "hybrid form" -characterised by an A. gambiaelike sex chromosome and by massive introgression of An. coluzzii autosomal alleles. This coastal "hybrid form" appears to be separated from genetically distinct An. gambiae inland populations by an intermediate area dominated by An. coluzzii (Vicente et al., 2017).
The consequences of this partitioning on malaria transmission have still to be investigated, but preliminary evidence showed that the coastal "hybrid form" in Guinea Bissau has a lower anthropophylic tendency compared to inland populations (Vicente et al., 2017) and may be resilient to insecticide resistance traits. Intriguingly, a similar pattern of genetic partitioning was reported also between coastal and inland An. gambiae populations along the Gambia River (Caputo et al., 2014). The above data clearly demonstrate that a simple gambiae/coluzzii species dichotomy based on the widely used X-centromeric diagnostic approaches is not sufficient to capture the rich diversity and complex histories of contemporary populations. There is a need for novel, easy-to-implement and inexpensive molecular tools with higher ability to detect possible introgression, to be applied on large samples collected within epidemiological or applied field surveys. To this aim,  developed a Divergence Island SNP (DIS) mass-spectrometry genotyping assay based on 15 SNPs mapping on the centromeric regions of chromosome X (N = 7), chromosomal arms-2L (N = 5) and -3L (N = 3) and reported to be fixed in the two species (Stump et al., 2005;Turner et al., 2005;White et al., 2010).

Results from the
"Pure" species, F1 and backcross hybrids were arbitrarily identified allowing two mismatched calls to provide reliable estimates of hybridization. This approach allowed the authors to study introgression between An. coluzzii and An. gambiae even in the absence of huge sequencing capacities and provided important evidence of higher than expected gene flow between the two species as well as of fluctuating hybridization in both time and space (Lee, Marsden, Norris, et al., 2013;Sanford et al., 2014;Norris et al., 2015;Hanemaaijer et al., 2018). While DIS represented a great advance compared to conventional diagnostic assays, allowing the better detection of F1 and backcross hybrids without the need of genome sequencing, it has a few pitfalls, mainly due to the fact that SNPs were chosen based on samples from a limited geographical range, before data from AG1000G project became available. First, none of the 15 SNPs map on chromosomal arm 2R and 3R. Second, the SNPs were ascertained based on a restricted geographical range in west Africa and Cameroon, overlooking the species-specific diversity in central and east Africa. Third, all SNPs map in relatively small centromeric regions, with presumably high linkage within each chromosomal arm.
The main aim of this study was to develop novel, easy-toimplement tools for genotyping An. coluzzii and An. gambiae-specific ancestral informative markers (AIMs) identified by the Ag1000G project (Ag, 1000G Consortium 2017). First, we took advantage of the large set of Ag1000G data covering most of the distribution range of the two species, in order to develop a multilocus approach to genotype a number of unlinked AIMs on all chromosome arms valid across the species range. Second, we tested the multilocus assay on samples from Guinea Bissau, The Gambia and Senegal, three countries spanning the high hybridization zone, where singlelocus species diagnostic assays are problematic due to the putative presence of a novel "hybrid form". Third, we developed an easy-touse cost-effective end-point PCR approach for genotyping two AIMs among those on chromosome-3 included in the multilocus approach, opening the possibility for advanced identification of species and of admixed specimens in routine large scale entomological surveys.

| Computational ascertainment of diagnostic variants
In order to take into account the maximum possible genetic diversity within An. coluzzii and An. gambiae genetic pools, diagnostic variants were ascertained based on genomic data from Ag1000G project Phase-1 data set (Ag, 1000G Consortium, 2017) and from Neafsey et al. (2010), which included sympatric populations from Mali). The latter populations were added in order to identify markers on chromosome arm 2L, as An. coluzzii individuals from the Ag1000G Phase 1 data set experienced introgression in the pericentromeric region of this chromosome driven by the kdr mutation.
Overall, allele frequency difference (DAF) was calculated from 675 individuals from six countries (genotyped in Phase 1 of Ag1000G, i.e., An. coluzzii from Burkina-Faso and Angola and An. gambiae from Angola, Burkina-Faso, Guinea, Gabon, Cameroon, Uganda) spanning a broad geographic range from western to eastern Africa (Ag, 1000G Consortium, 2017). Different DAF cutoffs were calculated in order to be able to find at least 30 diagnostic variants for each chromosomal arm. In the case of SNPs on chromosomal arm 2L, selection was initially carried out based on the three nonintrogressed An. coluzzii individuals from Ag1000G and all An. gambiae populations, and later selecting only variants matching with the variants with DAF ≥ 0.9 in the Malian populations genotyped in Neafsey et al., (2010). Match with variants with DAF ≥ 0.9 in the Malian populations was also carried out for chromosome-X variants in order to take into account this additional pool. This was not possible for variants on chromosome arms 2R, 3R and 3L as no matches were present between variants found in WGS data from Ag1000G and the chip array by Neafsey et al., (2010).

| Mass spectrometry assay design and validation
To allow optimal assay design, all SNPs selected from the above analysis were inspected for polymorphisms in 250 bases of the 5′ and 3′ flanking sequences with respect to the Ag1000G Phase-1 data.
We only included those flanking polymorphisms where three or more chromosomes were identified carrying the minor allele (MAF > 0.2%). Target SNPs and flanking sequences with polymorphisms identified using standard IUPAC single-letter notation were formatted for the Agena (formerly Sequenom) MassARRAY Assay Design v3.1 Software (Agena Bioscience).
The successfully designed assays were tested on the Agena iPLEX platform on 343 individuals from nine Ag1000G populations and on 26 individuals from G3 laboratory colony (Table S1).
Subsequently, a selection of a final set of assays was undertaken using the following criteria: (a) Performance in preliminary multiplex assays, (b) maximization of distribution on different chromosome arms, (c) location within coding regions, (d) maximization of physical distance between variants, and (e) maximization of the number of assays that fit into a single multiplex. This resulted in a set of 35 SNPs (Table S2 and Supplementary Material S1) for validation on the Agena iPLEX platform using the above mentioned mosquito samples included in Ag1000G.
The subset of validated SNPs was used to genotype a total of 564 indoor-resting An. coluzzii and An. gambiae individuals collected in the HHZ along a west to east transect from coastal to inland The Gambia and Senegal (N = 188; [Caputo et al., , 2011) and from coastal to inland Guinea-Bissau (N = 376; [Vicente et al., 2017]), identified based on chromosome-X linked IGS-diagnostic marker (Fanello et al., 2002) and, in case of specimens from Guinea Bissau, also by the species-specific insertion of a SINE-X200 in the same centromeric region of chromosome-X (Santolamazza et al., 2008).
Eventually, to further validate species-specific variants included in the final assay, allele frequency difference was calculated in silico on Ag1000G Phase2 data. This allowed the evaluation of DAF on a total of 938 specimens from eight An. gambiae and five An. coluzzii populations. Allele frequencies were calculated by python module scikit-allel v.1.2.1 (https://zenodo.org/badge/ lates tdoi/7890/cggh/ sciki t-allel).

| Agena iPLEX mass-spectrometry assay
Genotyping of samples was undertaken using the Agena iPLEX platform as described in Rockett et al. (Malaria Genomic Epidemiology Network, 2014) and Fabrigar et al., (2015) using primer-extension preamplified gDNA samples. For specimens from The Gambia whole genome amplification was performed directly on mosquito fragments, such as single legs or wings. Initial genotype calls were assessed using cluster plots viewed on the Agena MassARRAY TYPER v4.0.22 software. Further QC of the data included comparison with Ag1000G data, pass rates of assays across the sample sets, pass rates of samples the set of "valid" assays.

| Development of a PCR approach
Based on results obtained from the mass-spectrometry assay, sequences flanking the 7 SNPs on chromosome-3 were scored in order to identify those most suitable ones to design primers for the development of an easy-to-use and cost-effective end-point PCR-approach to easily detect signatures of autosomal introgression. SNPs from chromosome arms 2L and 2R were excluded from the design since they are heavily influenced by the introgression of the kdr mutation from An. gambiae to An. coluzzii (Ag, 1000G Consortium, 2017) and the wide presence of chromosomal inversions (Pombi et al., 2008), respectively. Primers for a Tetra-ARMS-PCR for loci 3L_129051 and 3R_42848 on chromosome arms 3L and 3R, respectively, were designed using the online program PRIMER1 (http:// prime r1.soton.ac.uk/prime r1.html) developed by (Ye et al., 2001) and specificity of primers was checked using BLAST (http://www. ncbi.nlm.nih.gov/blast). Primer sequences, fragment length and annealing temperature for amplification of the two loci are provided in Table 1. PCRs for the two loci were optimized separately in a 25 µl PCR reaction performed on a BIORAD C1000. Results were visualized on a 2.5% agarose gel stained with Midori Green Advance (NIPPON Genetics Europe) as shown in Figure  Both PCR-assays were tested on An. coluzzii and An. gambiae samples from coastal and inland Guinea Bissau for which it was possible to compare PCR and MassArray genotyping results, as well as on specimens from Burkina Faso (Koubri) for which no previous MassArray genotyping data were available. All these specimens were also genotyped for both X-centromeric diagnostic markers (IGS and SINE) used to identified samples from Guinea Bissau, The Gambia and Senegal (see above). For a subset of individuals, including all the specimens showing unclear or discordant results between PCR and mass spectrometry assay, PCR products were purified using the SureClean protocol (Bioline) and sequenced at Eurofins Genomics GmbH.

Ta (°C)
Primer  (Lee, Marsden, Nieman, et al., 2013), two did not pass all the quality filters in the WGS Ag1000G data and the remaining 7 had DAF values below the cut-off applied in the present study.
Primers were successfully designed for 122 species-specific variants and split into four multiplexes (Supplementary Material S1).
The remaining 216 SNPs failed assay design primarily due either to failure in satisfying the assay constraints or to polymorphisms in possible primer positions (Supplementary Material S1). Mass spectrometry genotyping was successful for 104 species-specific variants (Supplementary Material S1) and of these, 35 were arranged in a single multiplex mass-spectrometry assay (Table S2). Out of the latter 35 diagnostic variants, 29 (including three variants also present in the DIS method; (Lee, Marsden, Nieman, et al., 2013; Supplementary Material S1) were successfully genotyped in >75% of tested individuals. Among these 29 variants, 10 mapped on chromosome-X centromere, seven on chromosome-2L centromere, three on chromosome-2R, four on chromosome-3L centromere and five on chromosome-3R telomere.
Out of the 29 selected variants, two (3L_367248 and 3R_50590) showed a < 99% concordance between results from Illumina genotyping and from mass-spectrometry assay (N = 229 individuals) ( Of the 26 most reliable species-specific variants included in the final mass-spectrometry assay, 10 mapped in a 5.5 Mb region on chromosome-X centromere, six in a 2.2 Mb region on chromosome-2L centromere, three in a 9.5 Mb region on chromosome-2R, three in a 251 Kb region on chromosome-3L centromere and four in a 37 Kb region on chromosome-3R telomere ( Table 2).
Computation of allele frequency difference on An. gambiae (N = 655) and An. coluzzii (N = 283) specimens, including Phase-1 and Phase-2 Ag1000G specimens, produced allele frequencies of the 26 SNPs similar to those calculated from Phase-1 data set and used as DAF cutoffs for each chromosomal arm (Table S4).
We arbitrarily decided to establish a minimum number of consensus between species-specific variants to define pure An. gambiae (Ga) and An. coluzzii (Co) multilocus genotypes, as well as F1 (Co/Ga) and admixed (adm) ones, based on the mass-spectrometry assay. In agreement with the different DAF cutoffs used for the se-

| Genotyping of An. coluzzii and An. gambiae field populations
The 26 species-specific variants included in the Agena iPLEX massspectrometry assay were genotyped in 564 specimens from the westernmost extreme of An. coluzzii and An. gambiae range, where introgression between the two species is known to be widely pre- Grey shading indicates SNPs excluded from the final assay. a SNPs included also in the DIS assay proposed by Lee, Marsden, Norris, et al. (2013).

TA B L E 2 Anopheles coluzzii and
An. gambiae species-specific variants included in mass-spectrometry assay. Additional sequence and primer design data provided in Supporting Information Material the fact that genotyping was performed directly on body parts (legs, wings) preserved >10 years in silica gel, and not on extracted DNA as in the case of samples from Guinea Bissau. Despite this, >50% of all the species-specific variants, as well as >50% of loci/chromosome, were successfully amplified in 86% of specimens from The Gambia, highlighting the potentiality of the approach to be applied to suboptimal DNA templates (i.e., old samples and/or without DNA extraction). Table 3 and Figure 2 show the results of multilocus massspectrometry assay with reference to 23/26 SNPs (three speciesspecific variants, X-22164043, 2R-49438586 and 3L-380974, with pass-rate <75% in samples from Gambia/Senegal are excluded from the analyses) for 533 specimens from Guinea Bissau, Gambia and Senegal with >50% of species-specific variants overall and per chromosome successfully amplified (Table S5). Samples are classified as pure An. coluzzii (Co) or An. gambiae (Ga), Co/Ga hybrid (F1) and admixed applying the same criterion described as described above with two differences: (a) The six loci on chromosomal arm 2L are not excluded, since no signs of introgression were detected for the vgsc gene (which could have indicated linkage with kdr alleles) -with the exception of AIM 2L_1776348 which is highly introgressed in An. coluzzii but unlikely due to linkage with kdr alleles, as an adjacent AIM (2L_2431005) shows a completely different pattern; (b) a flexibility of two instead of one discordant speciesspecific variants is allowed for autosomal loci (as in Lee, Marsden, Nieman, et al., 2013;Lee, Marsden, Norris, et al., 2013) to account for the larger number of loci considered due to inclusion of the 6 chromosome-2L loci. Thus, specimens were classified as pure Co and Ga when 9/9 loci on chromosome-X and 12/14 loci on autosomes were genotyped as An. coluzzii or An. gambiae, respectively; as F1 when 9/9 loci on chromosome-X and 12/14 loci on autosomes were at the heterozygous state and adm when more discordances among loci were present.
Results show high levels of concordance between IGS-based identification and results of the genotyping of the nine loci on chromosome-X in all populations (92%, 90% and 100% in inland Guinea Bissau, The Gambia and Senegal, respectively), except that from coastal Guinea Bissau, where concordance drops to 71%, largely due to IGS-An. gambiae specimens with admixed multilocus genotypes (Table 3). Interestingly, a pure multilocus Ga genotype characterises 7/12 and 14/74 IGS-hybrids from the Gambia and coastal Guinea Bissau, respectively.
When results from genotyping of the 23 loci are taken into account, concordance remains high in the inland Senegal population (95%), but is reduced to 46%, 75% and 59% in populations from coastal, inland Guinea Bissau and coastal Gambia, respectively. In coastal populations from Guinea Bissau and The Gambia the proportion of individuals showing admixed multilocus genotypes with pure IGS-genotype is higher in Ga (100% and 96%, respectively) than in Co (53% and 26%).

TA B L E 3
Comparison between species ID obtained using the X-diagnostic marker (IGS) or the multilocus mass-spectrometry assay. Only specimens for which >50% of species-specific variants per chromosome were successfully genotyped are included. Specimens are classified as follows: (i) based on the nine loci on chromosome-X: Co and Ga when all 9/9 loci correspond to An. coluzzii or An. gambiae, respectively, F1 when all loci are at the heterozygous state and adm when discordances among loci are present; (ii) based on 9 loci on chromosome X + the 14 loci on autosomes: Co and Ga when ≥12/14 autosomal loci correspond to An. coluzzii and An. gambiae, respectively, F1 when ≥12/14 autosomal loci are consistently found at the heterozygous state and adm when discordances among loci are present Country IGS X (9 loci) Concordance X(9 loci) + autosomes (12 out of 14 loci)

| Development of a PCR approach for chromosome-3 SNP genotyping
The novel PCR approach developed to genotype two speciesspecific autosomal SNPs (3L_129051 and 3R_42848) was validated on 106 specimens from across Guinea Bissau, previously analysed by the novel mass-spectrometry assay, and tested on 146 An. coluzzii and 39 An. gambiae specimens from Burkina Faso, where no/low admixture is expected . For all Guinean specimens, DNA quality was sufficient to allow genotyping of both loci in a single PCR reaction, while for 50% of the specimens from Burkina Faso each locus had to be amplified separately, due to low DNA quality affecting efficacy of simultaneous amplification of the two loci.
In Guinea Bissau samples, results of PCR-genotyping of both loci are highly concordant (>97%) with results obtained by massspectrometry assay (Table S6). Three (2.8%) discordances are observed for locus 3L-129051: sequences of the PCR-products allowed to identify one genotyping error in the mass-spectrometry assay and two in the PCR-genotyping approach, one of which was explained by a polymorphism next to the species-specific SNP genotyped (Table   S7). For the single mismatch observed for locus 3R-42848, sequencing confirmed the mass-spectrometry assay results. Considering specimens from Guinea Bissau and Burkina Faso altogether, four discordances between PCR and Sanger sequencing were revealed (Table S7). Of these, three specimens produced a heterozygous sequence for locus 3L_129051, while showing a homozygote (AA) PCR-pattern, either due to a preferential amplification of the allele A or, as explained above, by the presence of a polymorphism in the inner primer binding site. One specimen genotyped by sequencing as homozygote CC for locus 3R_42848 produced a heterozygote PCR-pattern, suggesting an aspecific binding of the Aspecific interior primer.

| Novel tools for An. coluzzii and An. gambiae genotypization/identification
In this study we developed and validated two complementary genotyping approaches to allow discrimination among An. coluzzii, An. gambiae and admixed individuals (including F1 hybrids) and to overcome the limitations of conventional chromosome-X-linked markers (i.e., IGS and SINE).
The first approach-based on the genotyping of selected 26 AIMs   ), the bilocus PCR assay confirmed low levels of admixture expected in the region, but detected 2 An. coluzzii and 1 An. gambiae with admixture in one or both of the two autosomal loci, which was confirmed by sequencing. The bilocus PCR approach appears to be robust to PCR errors due to polymorphisms in the inner primer binding sites or to aspecific binding of inner primers, which could result in deviations from Hardy-Weinberg equilibrium, such as heterozygote excesses or null allele homozygotes. In any case, in order to exclude the occurrence of such biases it is advisable to confirm at least a proportion of admixed genomes, identified by the bilocus assay, through sequencing of surrounding PCR fragments. inland Senegal show a virtually pure AIM-pattern, confirming preliminary evidence that the evolutionary process ongoing along the coast is very limited inland. More detailed analysis of the data set does not pertain to this study, but the above evidence clearly indicates the high value of a multilocus approach to study evolutionary processes ongoing between and within these two major malaria vectors. This is instrumental in order to address this issue not only in the "far-west" but also in other regions in west Africa (e.g., to define the limits of the high hybridization zone), as well as An. coluzzii and An. gambiae throughout their range, whose relevance in malaria epidemiology and in vector control has still to be fully recognised. This is largely due to lack of easy to implement and affordable tools to detect signature of genomic admixture on a large scale and to associate these to epidemiologically relevant characters (e.g.,
We suggest that knowledge on these topics can be rapidly advanced by large scale application of the biallelic PCR assay here presented in association to one of the two conventional chromosome-X-linked diagnostics. So far, hybridization has been only revealed by unusually high number of specimens showing an IGS heterozygous pattern (Caputo et al., , 2014Ndiath et al., 2014;Niang et al., 2014;Nwakanma et al., 2013;Oliveira et al., 2008;Vicente et al., 2017). On the other hand, the high level of autosomal admixture detected by WGS in Kenya would have never been detected by routine X-linked diagnostics (AG, 1000G Consortium, 2017). We envisage that applying this new genotyping approach in the frame of large scale field studies carried out for malariological purposes (e.g., to monitor susceptibility to insecticide in populations target of vector control activities of shifts in behaviours following LLIN massive implementation) will allow to detect possible signatures of admixture between the two species. Whenever genotyping results will be found not to be consistent between the autosomal and the X-chromosome markers, further analysis could be performed using the novel multilocus mass-spectrometry assay to investigate levels of introgression across the population and the genome and to pave the way to deeper demographic studies based on larger set of neutral markers. We expect that the application of this approach on samples from several geographic regions from where little is known about An. coluzzii and An. gambiae has a great potential to provide a continental perspective of interspecific relationships and on speciation processes. In addition, both the bi-and multilocus the assay allow to detect admixture in male samples hemizygous for the chromosome-X markers, opening the unprecedented opportunity of studying the bionomics of hybrid/admixed males, with particular to swarming and mating behaviour.
More ambitiously, we propose that the two selected autosomal SNPs become complementary diagnostic markers for An. coluzzii and An. gambiae. Indeed, we are aware that this could question the actual identity of the two species, which were initially described as molecular forms (della Torre et al., 2001) and eventually raised at the species status (Coetzee et al., 2013) based on IGS-linked species-specific SNPs.
However, we believe that after 20 years from M-and S-molecular form description (now corresponding to An. coluzzii and An. gambiae), the time is mature for the recognition of the need to take into account the complex relationships between and within them. Including the genotype of the two chromosome-3 SNPs in the species definition will avoid biases in the characterization of the vectorial system, an information that may impact vector control, if the admixed populations will turn out to differ from pure ones in relation to their capacity to transmit malaria or respond to vector control measures.

ACK N OWLED G EM ENTS
This publication uses data from the MalariaGEN Anopheles gambiae 1000 Genomes Project as described online (https://www. Sequence data for 3L and 3R genotyping are available at GenBank accession numbers MW480883-MW480891.