Effective mosquito and arbovirus surveillance using metabarcoding

Abstract Effective vector and arbovirus surveillance requires timely and accurate screening techniques that can be easily upscaled. Next‐generation sequencing (NGS) is a high‐throughput technology that has the potential to modernize vector surveillance. When combined with DNA barcoding, it is termed ‘metabarcoding.’ The aim of our study was to establish a metabarcoding protocol to characterize pools of mosquitoes and screen them for virus. Pools contained 100 morphologically identified individuals, including one Ross River virus (RRV) infected mosquito, with three species present at different proportions: 1, 5, 94%. Nucleic acid extracted from both crude homogenate and supernatant was used to amplify a 269‐bp section of the mitochondrial cytochrome c oxidase subunit I (COI) locus. Additionally, a 67‐bp region of the RRV E2 gene was amplified from synthesized cDNA to screen for RRV. Amplicon sequencing was performed using an Illumina MiSeq, and bioinformatic analysis was performed using a DNA barcode database of Victorian mosquitoes. Metabarcoding successfully detected all mosquito species and RRV in every positive sample tested. The limits of species detection were also examined by screening a pool of 1000 individuals, successfully identifying the species and RRV from a single mosquito. The primers used for amplification, number of PCR cycles and total number of individuals present all have effects on the quantification of species in mixed bulk samples. Based on the results, a number of recommendations for future metabarcoding studies are presented. Overall, metabarcoding shows great promise for providing a new alternative approach to screening large insect surveillance trap catches.

Metabarcoding enhances vector surveillance by allowing the rapid identification of large numbers of insects, which could also be simultaneously screened for pathogens. Furthermore, due to the correlation between NGS reads and species abundance, metabarcoding has the potential to quantify the number of insects in a trap (Amend, Seifert, & Bruns, 2010;Carew et al., 2013;Zhou et al., 2013). The utility of metabarcoding in surveillance is dependent on its sensitivity and specificity, and also what DNA barcode databases are available.
Reference databases containing DNA barcodes corresponding to known species are essential for metabarcoding data analysis. The Barcode of Life Data Systems (BOLD) is the most comprehensive reference database currently available for insects and is regularly integrated with barcode data from the popular sequence depository GenBank (Benson, Karsch-Mizrachi, Lipman, Ostell, & Sayers, 2009;Ratnasingham & Hebert, 2007). Almost 170,000 insect species have been barcoded to date (accessed July 2016); however, this is only a small fraction of the estimated 5,000,000 insect species in the world (Chapman, 2009). Sequences belonging to unknown taxa are a common problem in environmental DNA barcoding (Cowart et al., 2015;Pawlowski et al., 2011), highlighting the importance of a comprehensive reference database prior to commencing a metabarcoding study.
The most commonly used marker for insect metabarcoding is cytochrome c oxidase subunit I (COI), although cytochrome B (CytB) and 16S rDNA have also been used (Carew et al., 2013;Kocher et al., 2017).
Mitochondrial genes such as COI and CytB are popular targets for barcoding due to their high copy number, conserved regions and lack of noncoding regions (Lin & Danforth, 2004;Yu et al., 2012). Certain nuclear genes can increase taxonomic resolution; however, they can be more difficult to amplify and are not well represented in reference databases. Using more than one marker is often beneficial in metabarcoding as it can alleviate biases in sequence recovery and improve the detection of low-frequency species (Carew et al., 2013;Gibson et al., 2014). Diagnostic markers can also be added to screen for pathogens, thereby allowing the simultaneous detection of vector species and pathogens in mixed bulk samples.
Metabarcoding also benefits from the use of smaller barcoding regions. While the universal barcoding region for COI is typically 650 bp long, regions as small as 100 bp have been shown to successfully identify specimens to species (Meusnier et al., 2008). These 'mini-barcodes' are more efficiently amplified in degraded samples and are easily sequenced by NGS technology (Hajibabaei, Shokralla, Zhou, Singer, & Baird, 2011). Due to the short read lengths of some NGS platforms, the size of the metabarcoding marker must be appropriate for the sequencing system being used. For instance, the Illumina HiSeq 2000 can only produce 100-bp single-end reads (Shokralla, Spall, Gibson, & Hajibabaei, 2012). Paired-end reads can improve coverage of longer fragments and are joined during data analysis.
This study investigates the utility of metabarcoding in surveillance, using mosquitoes as an example. Mosquitoes are important vectors of disease and the targets of surveillance programmes worldwide (Eldridge, 2000). Vector surveillance programmes monitor the detection and abundance of indigenous species, as well as exotic and invasive species such as Aedes aegypti and Aedes albopictus (Kraemer et al., 2015), and are often coupled with arbovirus detection (Knope et al., 2013). Usually, the mosquito pools used for virus screening are small (<50 specimens) due to sensitivity issues associated with cell culture (Almeida et al., 2008;Ochieng et al., 2013).
Furthermore, a subsampling system is often employed when a surveillance trap contains >1,000 mosquitoes due to the labour involved in identifying and screening large numbers of mosquitoes (Janousek & Olson, 2006). The development of a metabarcoding pipeline to upscale this process would significantly enhance vector surveillance programmes. Schneider et al. (2016) describe a metabarcoding protocol to detect mosquito species in water samples; however, an approach on whole mosquitoes from traps that also includes virus detection has yet to be performed. This study utilizes metabarcoding to determine the species composition (meaning both the presence and abundance) of bulk mosquito samples, while also screening them for an arbovirus of public health significance: Ross River virus (RRV, Family Togaviridae Genus Alphavirus). The effect of laboratory protocol variables on the identification and quantification of mosquito species present in bulk samples and also on virus detection is specifically examined. These variables include sample centrifugation during DNA extraction, subsampling, primer design, PCR cycles and sample size.

| Mosquito collection
Three mosquito species were used in this study: Aedes camptorhynchus, Anopheles annulipes and Aedes notoscriptus. The first two species were collected from surveillance traps and identified as part of the Victorian Arbovirus Disease Control Program (VADCP) using previously described methods (Batovska, Blacket, Brown, & Lynch, 2016a). Both of these species were trapped in Lake Wellington, Victoria in January 2015. The Ae. notoscriptus specimens were obtained from the Queensland Institute of Medical Research (QIMR) Berghofer colony, which was established from wild-caught material in Brisbane, Queensland in 2015. These mosquitoes were kept in an insectary at 25°C,~80% relative humidity and 12/12-hr photoperiod.
Larvae were maintained with finely ground fish food (Tetramin Tropical Fish Food, Tetra), and adults were fed with 10% sucrose solution, ad libitum. BATOVSKA ET AL. | 33 2.2 | Infecting mosquitoes with Ross River virus Ross River virus strain QML1 was isolated from human serum collected from Queensland, Australia (Jones, Lowry, Aaskov, Holmes, & Kitchen, 2010). Female Ae. notoscriptus mosquitoes (6-7 days old) were starved for 24 hrs and then transferred to 1-L plastic feeding cups. Frozen stock of RRV (10 6.8 TCID 50 /ml) was rapidly thawed and diluted 1:10 in defibrinated sheep's blood. Mosquitoes were fed with blood-virus mixture maintained at 37°C for 2 hrs by glass membrane feeders (Rutledge, Ward, & Gould, 1964). After feeding, mosquitoes were anaesthetized with CO 2 , and only fully engorged mosquitoes were maintained in new plastic feeding cups with 10% sucrose.
At 10 days postinfection, the mosquitoes were anaesthetized, and the bodies and legs of individuals were separated. Leg samples were tested for RRV using a cell culture enzyme-linked immunosorbent assay (Oliveira et al., 1995) at QIMR Berghofer, Queensland. Of the 125 Ae. notoscriptus mosquitoes fed blood-virus mixture, 23 tested positive for RRV and these were sent to AgriBio, Victoria for use in the bulk pool samples. Finally, to test the sensitivity of the method, a larger pool was prepared consisting of: 974 Ae. camptorhynchus, 25 An. annulipes and 1

| Mosquito bulk pool preparation
Ae. notoscriptus mosquitoes. These pools are referred to as the 100 virus positive pool, 100 virus negative pool and 1000 pool, respectively (Table 1).

| Nucleic acid extraction
The 100 virus positive and negative pools were homogenized in 1 ml of growth medium (Minimal Essential Medium (Life Technologies) supplemented with 7% v/v foetal bovine serum (Sigma Aldrich), 15 mM HEPES (Life Technologies), 100 lg/ml benzyl penicillin (CSL) and 5 lg/ml streptomycin B (Sigma Aldrich)) using 10 3 mm acidwashed beads (Livingstone) and a 2010 Geno/Grinder (SPEX Sam-plePrep) at 1500 rpm. To investigate the effect on virus detection, four 200 ll subsamples were prepared: two were extracted from the crude homogenate, the other two were extracted from the supernatant after centrifugation for 1 min at 5,000 g. The remaining 200 ll homogenate could not be separated from the mosquito fragment and was not processed further.
Nucleic acid (including both DNA and RNA) was extracted from each of the 200 ll subsamples using 500 ll QuickExtract DNA Extraction Solution (Epicentre) according to the manufacturers' instructions.
The dsDNA concentration of the subsamples was quantified using a NanoDrop 1000 spectrophotometer (Thermo Scientific), and they were diluted to 40 ng/ll. All subsamples were stored at -20°C.
The 1000 pool sample was extracted as stated above, with the following alterations: 35 acid-washed beads were used instead of 10; 10 ml of growth medium was added instead of 1 ml; and only a single 200 ll subsample was taken from the crude homogenate and the supernatant instead of duplicates (Table 1).
T A B L E 1 Treatments and cycle numbers for each subsample within the three bulk mosquito pools. The COI read counts for species detected in each subsample are also shown 2.5 | Primer design and amplicon production All primers listed below were created with tails attached to them at the 5 0 end: forward tail 5 0 -ACACTCTTTCCCTACACGACGCTCTTCCGATCT; reverse tail 5 0 -GACTGGAGTTCAGACGTGTGCTCTTCCGATCT. These tails serve as adapter primer sites during amplicon library preparation.
The primer was positioned in a conserved region of the COI gene, ensuring that the 269-bp fragment created was located entirely within the barcode region used in the database. Degenerate bases were placed in the 3 0 half of the primer to account for some of the known differences found between the 27 species. A neighbour-joining tree of the 269-bp region in COI was generated using sequences from the 27 species, and it was found that the differentiation between species was equivalent to what was found in Batovska et al. (2016a).  (Table 1).
For RRV, the primer pair RRVE2F and RRVE2R (Hall, Prow, & Pyke, 2011) was used, also with tails attached to the 5 0 end. Prior to amplicon production, each extraction was reverse transcribed using the SuperScript III First-Strand Synthesis System (Invitrogen). The tailed RRVE2R primer was used for the cDNA synthesis reaction with 6.5 ll of undiluted extract. All manufacturer's instructions were followed; however, RNaseOUT was omitted from the protocol due to the high concentration of input RNA. The cDNA was used to amplify a 67-bp region of the RRV E2 gene with the tailed RRV primers. The PCR reaction composition and programme is the same as above, with 40 cycles being used each time. The RRV amplicons were verified on a 2% w/v agarose gel.

| Amplicon sequencing and data analysis
The COI and RRV amplicons were purified using AMPure XP beads (Beckman Coulter) at a 2 9 beads ratio. PCR was used to attach Demultiplexed reads were quality trimmed using previously described parameters (Batovska, Cogan, Lynch, & Blacket, 2016b), and adaptors were removed using CUTADAPT version 1.9 (Martin, 2011

| Species detection in mosquito pools
Both Aedes camptorhynchus and Anopheles annulipes were detected at varying proportions in all pooled samples (Tables 1 and 2

| Species detection and quantification
This study corroborates the use of metabarcoding for species identification in bulk mosquito samples. Metabarcoding was able to detect all mosquito species (i.e., up to 3) present in each bulk sample tested, including the 1000 pool. Furthermore, species composition was consistent in subsamples (Figure 2), suggesting that one subsample from a bulk sample may be sufficient to characterize the species present.
However, due to the sensitivity of NGS and the possibility of laboratory contamination, it is recommended multiple replicate subsamples are used to improve accuracy. At a minimum, samples with unexpected results should be re-tested. It is also recommended that the bulk sample supernatant be subsampled rather than the crude homogenate, as they have no observable difference in species composition ( Figure 2), and it prevents the transfer of large mosquito fragments.
The results indicate that even though sequence read number is correlated with mosquito abundance, it is not a completely accurate measure of species frequency (Table 2). This finding is consistent with other metabarcoding studies (Carew et al., 2013;Elbrecht & Leese, 2015;Piñol, Mir, Gomez-Polo, & Agust ı, 2015;Porazinska et al., 2009). Currently in surveillance programmes, the abundance of large trap catches is often estimated using weight, which can be affected by a number of variables including the size of the insects and the level of humidity in the catch (Kesavaraju & Dickson, 2012).
Overview of the metabarcoding method used to determine the species composition of mosquito pools and screen them for a virus. The bioinformatic pipeline shown is performed for each set of amplicon reads for each sample As such, the approximate abundance derived from metabarcoding would still be useful for surveillance programmes.
While species abundance affects sequence read number, the number of PCR cycles used also has an effect on read numbers.
Raising PCR cycle number increased the proportion of reads for the single Aedes notoscriptus specimen in the 1000 pool sample, thereby improving sensitivity, while in the 100 pool samples, it led to a decrease in the proportion of Anopheles annulipes reads (Figure 3).
The inconsistent effect of PCR cycle number could be a result of primer bias. The reverse primer used in this study was specifically designed for mosquitoes, with the inclusion of degenerate bases to account for much of the interspecies variability. However, the universal LCO fragment was used as the forward primer (Folmer et al., 1994) and could have resulted in unequal species amplification due to mismatches. The use of universal primers for amplification of a range of species can cause variance in read number among taxa by several orders of magnitude (Elbrecht & Leese, 2015;Piñol et al., 2015). Approaches to reducing primer bias in metabarcoding studies include testing newly designed primers with in silico PCR to estimate taxonomic coverage (Clarke, Soubrier, Weyrich, & Cooper, 2014); using more conserved barcode regions (Kocher et al., 2017); designing multiple sets of primers that match specific taxonomic groups (Gibson et al., 2014); and utilizing PCR-free shotgun sequencing pipelines (Zhou et al., 2013). Each of these methods has limitations, and the metabarcoding approach chosen needs to be based on the sensitivity and taxonomic resolution required for the target species.

| Pathogen detection
In addition to species, metabarcoding also detected RRV in every pool containing an infected mosquito, indicating the potential of this method to improve not only species identification in surveillance programmes, but also virus detection. Ross River virus was detectable in both 100 and 1000 pools, highlighting the sensitivity of the method, and detection was not affected by centrifugation of samples prior to extraction. These results encourage testing for a wider range of arboviruses using group-reactive primers ( (Beerenwinkel, G€ unthard, Roth, & Metzner, 2012

| Future directions
Based on these results, there are a number of considerations that need to be addressed in future metabarcoding studies (Table 3). Factors such as primer design, nucleic acid extraction, PCR cycle number and analytical thresholds all need to be incorporated into the study design to achieve optimal surveillance results.
Further research is required to test pooled samples containing a wider range mosquito species, in order to better reflect the diversity of species seen in surveillance programmes. Additionally, it would be useful to expand the pathogen screening to include other arboviruses and arboviral families. The metabarcoding method could also be extended to other vector species, helping to improve regular surveillance programmes, and biosecurity response in the event of an exotic incursion (Comtet, Sandionigi, Viard, & Casiraghi, 2015).
As the number of mosquito reference barcodes increases, the databases used for the BLASTN search should include mosquito species from other regions, which could help to detect the spread of species to new geographic areas. Other data analysis methods could also be trialled with the mosquito metabarcoding data, such as clustering the reads into operational taxonomic units (OTUs), thereby allowing the detection of exotic species through the formation of novel clusters (Ji et al., 2013;Yu et al., 2012).
To improve the sensitivity and specificity of species characterization, the use of numerous barcoding markers could be explored. The use of additional markers relies on the establishment of more comprehensive and centralized databases, based on accessible, curated specimens (e.g., the ITS2 locus in Batovska et al., 2016b).
Sequence capture technology is an efficient approach to metabarcoding with multiple markers (Bragalini et al., 2014;Campana et al., 2016). This method involves using many sequence probes to capture species-specific sequences, with one recent study designing 3,901 probes for a single NGS library preparation (Campana et al., 2016). The use of one reaction decreases the amount of required DNA, cost and time. Sequence capture could also potentially improve the bias found in metabarcoding where only one primer region is used, therefore requiring a lower degree of conservation among taxa. Capture technology could also be used to expand the breadth virus screening (Blouin et al., 2016).
When combined with other emerging technologies such as the Nanopore MinION, metabarcoding may be further utilized in biosecurity by offering "handheld barcoding" in the field (Bleidorn, 2016).
This would allow real-time trap characterization and pathogen detection without having to send samples back to a laboratory, which would be particularly useful during exotic incursions or viral outbreaks (Greninger et al., 2015).

| CONCLUSION S
The results of this study suggest metabarcoding is useful for the species characterization and pathogen detection of mosquito trap catches. We successfully detected a single RRV-infected mosquito in T A B L E 3 Recommendations for metabarcoding projects based on the results from this study

Experimental factor Recommendation
Primer design Design taxon-specific primers for the target organisms to maximize species recovery by reducing amplification bias.

DNA extraction
After homogenization, centrifuge the bulk sample and subsample from the supernatant. This allows for a more homogeneous sample and does not appear to affect species or pathogen detection PCR cycle number Optimize the PCR cycle number to the bulk sample size for all target species using known samples. Choose the cycle number that provides the required sensitivity and best quantification Analytical thresholds When sequencing multiple samples in one run, apply a read count threshold for positive results when analysing the data to account for possible imperfect demultiplexing a pool of 1000, emphasizing the sensitivity of the method and its utility in surveillance and biosecurity operations. While the accurate quantitative function of metabarcoding is currently limited due to the biases involved in primer use and PCR, we were still able to provide relative abundance information for the species contained in the pools. Due to the multiplexing capability of NGS technology, metabarcoding can lower the cost and time associated with sample processing in vector surveillance programmes (Ji et al., 2013). The efficiency of metabarcoding could be further improved with the addition of new technologies such as sequence capture and handheld sequencing. Metabarcoding shows great promise for modernizing vector surveillance programmes and improving the detection of pathogens of biosecurity significance.