Identification of phlebotomine sand flies (Diptera: Psychodidae) from leishmaniasis endemic areas in southeastern Mexico using DNA barcoding

Abstract Leishmaniasis, a vector‐borne disease transmitted to humans through the bite of phlebotomine sand flies, is of public health significance in southeastern Mexico. Active and continuous monitoring of vectors is an important aspect of disease control for the prediction of potential outbreaks. Thus, the correct identification of vectors is paramount in this regard. In this study, we employed DNA barcoding as a tool for identifying phlebotomine sand flies collected in localized cutaneous leishmaniasis endemic areas of Quintana Roo, Mexico. Specimens were collected using CDC light and Shannon traps as part of the Mexican Ministry of Health surveillance program. DNA extraction was carried out using a nondestructive protocol, and morphological identification based on taxonomic keys was conducted on slide‐mounted specimens. Molecular taxonomic resolution using the 658‐bp fragment of the mitochondrial cytochrome c oxidase subunit 1 (cox1) gene was 100% congruent with the morphological identification. Seven species were identified: Lutzomyia cruciata (Coquillett 1907), Lutzomyia longipalpis (Lutz & Neiva 1912), Psathyromyia shannoni (Dyar 1929), Dampfomyia deleoni (Fairchild & Hertig 1947), Dampfomyia beltrani/steatopyga (Vargas & Díaz‐Nájera 1951), Bichromomyia olmeca olmeca (Vargas & Díaz‐Nájera, 1959), and Brumptomyia mesai (Sherlock 1962). Mean intraspecific divergence ranged from 0.12% to 1.22%, while interspecific distances ranged from 11.59% to 19.29%. Neighbor‐joining (NJ) analysis using the Kimura 2‐parameter model also showed specimens of the same species to be clustered together. The study provides the first cox1 sequences for three species of sand flies and indicates the utility of DNA barcoding for phlebotomine sand flies species identification in southeastern Mexico.


| INTRODUC TI ON
Over half of tropical infectious diseases are vector-borne, with arthropods directly or indirectly involved in the transmission of pathogens to humans (WHO, 2017). Leishmaniasis, vectored by phlebotomine sand flies, is of significant public health importance in southern Mexico. It is an endemic disease in this part of the country with localized cutaneous leishmaniasis as the predominant clinical form of the disease (Ready, 2013;Velasco-Castrejón, Ibáñez-Bernal, & Rivas-Sánchez, 1994). Other forms such as diffuse cutaneous leishmaniasis, mucocutaneous leishmaniasis, and visceral leishmaniasis are, however, not uncommon (Velasco-Castrejón et al., 1994).
Leishmaniasis control is often exacerbated by the complexity of the transmission cycle that involves several vectors and reservoir hosts, depending on geographical locations (Monroy-Ostria, Hernandez-Montes, & Barker, 2000).
Unambiguous species identification is necessary to ascertain the role of each species in disease transmission (Cohnstaedt et al., 2011).
However, uncertainties about the taxonomic resolution of certain groups are common as morphological identification is based on gender-specific morphological traits. This makes identification often difficult because of isomorphism among phlebotomine sand fly of different species of the same sex, and because of the presence of species complexes (Cohnstaedt et al., 2011;Hebert, Cywinska, Ball, & de Waard, 2003;Testa, Montoya-Lerma, Cadena, Oviedo, & Ready, 2002). Furthermore, ecological niche modeling has suggested incongruency in vectors and disease distributions in Mexico (González et al., 2010), necessitating a review of vector identification techniques. Molecular identification techniques using standard mitochondrial markers have become a popular approach, especially the use of cytochrome c oxidase subunit I (cox1) for DNA barcoding (Hebert, Cywinska, et al., 2003). In this study, we present evidence of the suitability of the DNA barcoding approach to support phlebotomine sand fly identification in Mexico. We used the DNA barcode variability as a tool for molecular taxonomy in local sand fly population in southeast Mexico and provide baseline data towards the establishment of a phlebotomine sand fly barcode reference library in Mexico. The genetic relationship with other phlebotomine sand fly sequences from the new world was also investigated.

| Sample collection and identification
Samples were collected by the state health authorities as part of an entomological surveillance for monitoring transmission of diseases between October 2016 and February 2018 using CDC light and Shannon traps. Samples were stored in 70% ethanol and at −20°C prior to molecular processing. Species identification was carried out at the Centro de Investigación y Desarrollo en Ciencias de la Salud, Universidad Autónoma de Nuevo León (CIDICS-UANL) after DNA extraction. Phlebotomine sand flies were clarified and mounted in Euparal ® (Bioquip Products, Inc.) as permanent slide as described by Young and Duncan (1994) and Ibañez-Bernal (2005a). Morphological identification was carried out using published dichotomous keys (Ibañez-Bernal, 2005a, 2005bYoung & Duncan, 1994), phylogenetic classification of Galati (1995Galati ( , 2016, and the abbreviations for genera and subgenera proposed by Marcondes, (2007). Voucher specimens were deposited in the arthropod collection of CIDICS-UANL.

| DNA extraction, PCR, and sequencing
Genomic DNA extraction was carried out using a slightly modified nondestructive DNA extraction method as described by Truett et al. (2000). Briefly, whole insect bodies were put directly into individual 200 μl PCR tubes containing 20 μl of alkaline lysis buffer and frozen at −20°C for 5-6 hr. Afterward, the tubes were incubated in a PCR thermocycler for 30 min at 94°C and 4°C for 5 min to cool down. Samples were vortexed gently using the Genie 2 Vortex Mixer (Daigger Scientific), and 20 μl of the neutralizing buffer was added.
Samples were then spun briefly and stored at −80°C overnight for another freeze-thaw cycle before PCR processing. Insect samples were removed, put back in ethanol, and stored for morphological identification.
The reaction cycle consisted of an initial 1 min at 94°C, followed by a preamplification 5 cycles of 94°C for 1 min, 45°C for 1.5 min, 72°C for 1.5 min, an amplification step of 35 cycles of 94°C for 1 min, 57°C for 1.5 min, 72°C for 1.5 min with a final extension of 72°C for 5 min. PCR products were separated by electrophoresis in 1.5% agarose gel, and samples showing correct band size were purified using the QIAquick PCR purification kit and sequenced in both directions using the ABI PRISM ® BigDye ® Terminator sequencing kit (Applied Biosystems) at commercial sequencing facilities with the same primer pair.

| Sequence analysis
DNA sequences generated in both directions were edited manually using BioEdit sequence alignment Editor v.7.0.5.3 (Hall, 1999) and consensus sequences generated using the in-built ClustalW (Larkin et al., 2007). Multiple sequence alignment, base pair content, and coding positions analysis were completed in MEGA v.7 (Kumar, Stecher, & Tamura, 2016). Mean genetic distances, pairwise sequence divergences, and neighbor-joining (NJ) analysis were calculated using the Kimura 2-parameter (K2P) distance model with 1,000 bootstrap replicates (Saitou & Nei, 1987). The choice of K2P was to make results comparable with other DNA barcoding studies and because it provides conservative estimates of long branches than other models as it underestimates the number of multiple hits (Nei & Kumar, 2000).
The number of haplotypes, polymorphic sites, and nucleotide diversity were determined using DnaSP v6 (Rozas et al., 2017). org) and GenBank, was created and used in the analyses (Ibáñez-Bernal, 1999, 2000, 2001, 2003. Sequences that were submitted to databases from similar barcoding studies were given higher consideration over those not from a DNA barcoding study, and sequences less than 500 bp were excluded (Table S1).

Molecular species delimitation was accomplished using the
Automatic Barcode Gap Discovery (ABGD) software (Puillandre, Lambert, Brouillet, & Achaz, 2012). The minimum intraspecific distance (P min ) and maximum intraspecific distance (P max ) were limited to the default of 0.001 and 0.1, respectively, with the default barcode gap width of 1.5 and K2P model. shannoni, Dampfomyia (Coromyia) deleoni, Dampfomyia (Cor.) beltrani/ steatopyga, Bi. olmeca olmeca, and Brumptomyia mesai were identified (Table 1). Species discrimination of Da. beltrani/Da. steatopyga could not be accomplished because male specimens, which are needed to separate these species, were not collected in the present study.

| RE SULTS
The 44 cox1 sequences generated were uploaded to the BOLD database (www.bolds ystems.org) under the project "AAASF," and the sequences were also submitted to GenBank (accession numbers MK851242-MK851285). Final alignment of the 44 sequences obtained was 654 bp with 354 variable nucleotide positions, 234 conserved sites, and 91 parsimony informative sites. There was no stop codon, insertions, or deletions observed suggesting the absence of nuclear pseudogenes of mitochondrial origin (NUMTs). The average nucleotide compositions of the cox1 sequences were 37.5% T, 28.4% A, 18% C, and 16.1% G with mean AT richness of 65.9%. Individual species were represented between one and twenty individuals. All sequences had more T in the second and third codons than the first (Table S2). The overall mean genetic distance was 11.06%, and pairwise Kimura 2-parameter genetic distance ranged from 0% to 19.8% (Table S3). Intraspecific mean sequence divergence ranged between 0.12% and 1.22% (Appendix S1), while interspecific divergence ranged from 11.59% to 19.29% (Appendix S2). When the sequences obtained in this study were analyzed together, the highest intraspecific mean genetic distance of 1.22% was found in Lu. cruciata, followed by Pa. shannoni (1.13%). Also, 23 haplotypes were generated with a range of 1-8 haplotypes per species (Appendix S1). However, higher sequence divergence was observed when our dataset was compared with the other sand flies sequences from the new world downloaded from BOLD and GenBank. Mean intraspecific divergence ranged from 0% to 9.48%, with the highest divergence (9.48%) also found in Lu. cruciata (Table 2). High intraspecific divergence was also found in Br. mesai (9.12%), Pa. shannoni (5.47%), and Lu. longipalpis s.l (4.51%). Interspecific divergence ranged from 6% to 22.2% with the highest divergence between Psathyromyia (Forattiniella) carpenteri (Fairchild & Hertig 1953) and Da. beltrani/Da. steatopyga (Table 3).
The NJ tree using the 156 cox1 sequences dataset shows that conspecific individuals clustered together in most cases with high bootstrap support, and there was a clear separation among congeneric species (Figure 2). However, Psathyromyia (Psathyromyia)
Furthermore, Lu. longipalpis s.l., Lu. cruciata, and Pa. shannoni showed a deep split in the NJ tree which agrees with the high intraspecific genetic divergence observed in these taxa (Table 2).
Using the default ABGD settings, nine potential barcode gaps were identified with two without recursive partitions from the 44 sequences generated in the present study (Appendix S3). Barcode gap with prior intraspecific divergence values between 1% and 2.5% was considered for this study, to enable comparison with other barcoding studies and allow the use of the lower limit of the 2%-3% (Hebert, Cywinska, et al., 2003). Two values of barcode gaps were found within this range: 1.29% and 2.15%, and even though the initial partition in both values grouped species into 7, the recursive partition under TA B L E 1 List and location of phlebotomine sand flies analyzed in this study

| D ISCUSS I ON
The fundamental aim of DNA barcoding is to standardize molecular approach used in complementing morphological species identification, and this has been previously exploited in phlebotomine sand flies (Arrivillaga, Norris, Feliciangeli, & Lanzaro, 2002 USA, but barcoding was not with the main objective of the study.

Phlebotomine sand flies have been shown to exhibit A-T bias in their nucleotide composition, and the 66% A-T composition in this
study is consistent with similar results in Latin America (Azpurua et al., 2010;Contreras Gutiérrez, Vivero, Vélez, Porter, & Uribe, 2014;de Pinto et al., 2015) and India (Kumar, Srinivasan, & Jambulingam, 2012). We obtained a coherent matrix of DNA barcode sequences that differentiated all species collected without ambiguous identification. High interspecific divergence (>3%) was observed in both datasets, and these agree with the interspecific limit for insects as proposed by Hebert, Ratnasingham, & de Waard, 2003). Sequences from the seven species from the current study had a mean intraspecific divergence of <2% (Appendix S1) that is also within proposed limit of species for barcode studies (Hebert, Cywinska, et al., 2003). However, although a low mean intraspecific divergence was observed among sequences generated from the present study (Appendix S2), a much higher mean intraspecific divergence was observed in Lu. longipalpis, Lu. cruciata, Pa. shannoni, and Br. mesai when compared with sequences from other countries (Table 2). This could be due to varying geographical locations suggesting population differentiation, presence of cryptic species , and/ or possible cases of misidentification of the specimens of the cox1 sequences retrieved from GenBank, the latter being a more plausible explanation given that some of these cox1 sequences retrieved from GenBank were from unpublished studies (Table S1).
The intraspecific variability of the Lu. longipalpis s.l. population in the present study, though suggesting homogeneity with a mean divergence of 0.39%, and a maximum pairwise divergence of ~1%, produced eight haplotypes (Appendix S1) from two localities (Chunhuhub and Felipe C. Puerto). However, a higher divergence (4.51%) was observed when analyzed with sequences from Brazil, Honduras, and Colombia forming three clades in the NJ analysis ( Figure S1). This is consistent with extant literature that Lu.
High intraspecific divergence was also observed in Lu. cruciata and Pa. shannoni, which are also of potential medical importance in the Yucatan Peninsula (Pech-May et al., 2010, 2016. Pa. shannoni is a well-established species in Mexico with recent report of population divergence in southern Mexico (Florin & Rebollar-Téllez, 2013).
Our material originates from the same locality (Chetumal Othon P.
Blanco), and our results supported this hypothesis with a high maximum pairwise divergence of 4.5% (Appendix S1) and a deep split in the NJ tree ( Figure 2). This taxon is also the only one that groups with two partitions in ABGD analysis and has two BINs (BOLD:AAY4824 and BOLD:AAY4825) assigned. We, however, observed an intraspecific mean divergence of 1.13% that is within the established limit F I G U R E 2 Bootstrapped neighbor-joining (NJ) tree with 1,000 replicas showing the clustering pattern of sand flies species based on the barcoding region of the mitochondrial cox1 gene. Expanded tree is shown in Figure S1 Pa for species delimitation in barcoding studies (Hebert, Cywinska, et al., 2003;de Pinto et al., 2015) with relatively higher number of samples (n = 11). Availability of male and female samples in our materials also eliminated doubts of misidentification (Cohnstaedt et al., 2011;Florin & Rebollar-Téllez, 2013). Furthermore, there were no ambiguities in the NCBI BLAST analysis of sequences generated in this study (Table 1) Mexico, using other genetic markers and larger sample population, might be required to ascertain the composition of this complex. In addition, given that the vectorial competence of this species is still unresolved and the potential effect on the epidemiology of leishmaniasis in this endemic area is unknown (Bennett et al., 2002), this is an important issue for future research.
Although NJ analysis is essentially not a phylogenetic tool, it is an appropriate method for evaluating distances when combined with bootstrap analysis (Felsenstein, 1985). All individuals belonging to the same species grouped together and were supported by high bootstrap values. Congeneric groupings were also well-separated in the NJ tree supporting our morphological identifications.
Although cases of misidentification in DNA barcoding studies are not uncommon, this could have serious implications for end users of reference libraries (Collins & Cruickshank, 2013;Hernández-Triana et al., 2019). It appears that the incongruence observed in the NJ analysis for Pa. abonnenci ( Figure 2) seems to be one of such a case.
However, the inability to reidentify the vouchers specimens from which the sequences were generated due to lack of access and unavailability of Pa. abonnenci sequences from the current study does not allow us to make further comments on its identity. We believe, however, that these are separate species based on the clear interspecific divergence of 8.6% found between the Pa. shannoni and Pa. abonnenci sequences we analyzed (Table 3). In addition, Collins and Cruickshank (2013) suggested that NJ and other tree inference methods are indeed poor proxies to infer specimen identifications.
A similar occurrence can be found in the grouping pattern of Lu.
longipalpis and Lu. cruzi in a study in Brazil (de Pinto et al., 2015).
Furthermore, queries of the Pa. abonnenci sequences on NCBI and BOLD databases returned Pa. shannoni and Pa. bigeniculata, respectively, as the closest match with percentage identity <94%, which is low for concluding on definite species identification. Occurrence like this is likely to reduce as the reference library becomes more populated with additional sequences from sand flies species across the taxonomic spectrum of this group.
In contrast, high intraspecific divergence (Table 2) Figure 2) and showed a high mean intraspecific divergence of 9.12% and maximum pairwise intraspecific divergence of 15.56% (Table 2) compared to the 0.61% and 0.94% from the sequences generated in the current study (Appendix S1 BOLD also has high sequence divergence with sequences obtained in the current study (Table 2). However, all Lu. cruciata sequences clustered together in the NJ tree, albeit with a deep split (Figure 2).
Morphological examination of BOLD ID: HNLUZ014-17, based on the photograph uploaded in BOLD, is consistent with Lu. cruciata supporting the conclusion that the divergence observed could likely be due to genetic isolation as a result of differing geographical locations or the presence of cryptic species .
A species is considered as successfully delimited using ABGD when all its members belong to the same predicted group and no other sequences were added to it (Puillandre et al., 2012

| CON CLUS ION
In conclusion, our results are congruent with the argument that the DNA barcoding approach is a valuable tool for species identification sand flies. This study augmented available DNA barcoding data for phlebotomine sand fly species and provided three unique BINs that were not previously found in BOLD, contributing toward the establishment of a reliable reference DNA barcode library for phlebotomine sand fly identification in Mexico. Certain taxa might, however, require additional genetic markers in addition to cox1 for correct delimitation. Limited representation of species from different geographical regions in Quintana Roo and Mexico in the present study also warrants an expanded study to provide a comprehensive national barcode reference library for phlebotomine sand flies species in the region.

ACK N OWLED G M ENTS
We thank the staff of Centro Nacional de Programas Preventivos y Control de Enfermedades (CENAPRECE) that assisted with the sample collection conducted by the State Ministry of Health.

CO N FLI C T O F I NTE R E S T S
The authors declare that they have no competing interest.

AUTH O R S ' CO NTR I B UTI O N
AAA and MAR-P designed and conceived the study. NAF-S, NT-G, HH-J, PCM-A, WAP-P, and JJR-R collected and identified the samples. AAA and NAF-S did the molecular analysis. AAA and LMH-T interpreted the data. NT-G, HH-J, and MAR-P coordinated the study.
AAA wrote the initial manuscript draft. All authors read and approved the final manuscript.

E TH I C S A PPROVA L A N D CO N S E NT TO PA RTI CI PATE
The study involved the use of adult sand flies collected as part of regular entomological surveillance by local health authorities. No ethics committee approval is needed for such work.

CO N S E NT FO R PU B LI C ATI O N
Not applicable.

DATA AVA I L A B I L I T Y S TAT E M E N T
All sequences generated in the study and information about additional sequences downloaded from GenBank and BOLD databases are provided in Table S1. Data generated from the study have been deposited and available in GenBank with the accession numbers MK851242-MK851285 and on BOLD under the project AAASF.