FastD: Fast detection of insecticide target‐site mutations and overexpressed detoxification genes in insect populations from RNA‐Seq data

Abstract Target‐site mutations and detoxification gene overexpression are two major mechanisms conferring insecticide resistance. Molecular assays applied to detect these resistance genetic markers are time‐consuming and with high false‐positive rates. RNA‐Seq data contains information on the variations within expressed genomic regions and expression of detoxification genes. However, there is no corresponding method to detect resistance markers at present. Here, we collected 66 reported resistance mutations of four insecticide targets (AChE, VGSC, RyR, and nAChR) from 82 insect species. Next, we obtained 403 sequences of the four target genes and 12,665 sequences of three kinds of detoxification genes including P450s, GSTs, and CCEs. Then, we developed a Perl program, FastD, to detect target‐site mutations and overexpressed detoxification genes from RNA‐Seq data and constructed a web server for FastD (http://www.insect-genome.com/fastd). The estimation of FastD on simulated RNA‐Seq data showed high sensitivity and specificity. We applied FastD to detect resistant markers in 15 populations of six insects, Plutella xylostella, Aphis gossypii, Anopheles arabiensis, Musca domestica, Leptinotarsa decemlineata and Apis mellifera. Results showed that 11 RyR mutations in P. xylostella, one nAChR mutation in A. gossypii, one VGSC mutation in A. arabiensis and five VGSC mutations in M. domestica were found to be with frequency difference >40% between resistant and susceptible populations including previously confirmed mutations G4946E in RyR, R81T in nAChR and L1014F in VGSC. And 49 detoxification genes were found to be overexpressed in resistant populations compared with susceptible populations including previously confirmed detoxification genes CYP6BG1, CYP6CY22, CYP6CY13, CYP6P3, CYP6M2, CYP6P4 and CYP4G16. The candidate target‐site mutations and detoxification genes were worth further validation. Resistance estimates according to confirmed markers were consistent with population phenotypes, confirming the reliability of this program in predicting population resistance at omics‐level.


| INTRODUC TI ON
Insect pests have a great impact on many aspects of human life.
Among all of these aspects, the harm to human health and the yield loss in agricultural production are the most concerning. To make matters worse, some insects serve as medium of pathogens, spreading diseases and causing damage simultaneously. For example, Anopheles gambiae spread malaria and caused millions of deaths annually in Africa (Consortium, 2017). As for agricultural production, the estimated yield loss of crops due to insect pests is over 18% globally (Oerke, 2005).
Although there are many insect pest control methods available, application of insecticides is still one of the most frequently used methods. Chemical insecticides were first introduced to control insect pests in the 1940s. Since then, thousands of insecticides have been developed to protect human health and crops. Unfortunately, long-term mismanagement of insecticide application led to the development of insecticide resistance within insect pest populations.
So far, more than 553 insect species have been reported to have developed resistance to approximately 331 insecticides (Gould et al., 2018). The development of insecticide resistance necessitates the application of higher dosages of said insecticide for controlling insect pests, which in turn causes more serious threats to human and environmental health (Kim et al., 2017;Tang et al., 2018). Insecticide resistance has become one of the most formidable obstacles in insect pest control (Gould et al., 2018).
Insecticide target-site mutations and overexpression of detoxification gene(s) are two major mechanisms conferring insecticide resistance (Ffrench-Constant, 2013). Due to long-term selection by insecticides, the individuals containing resistance-associated genotypes rapidly accumulate within populations. Generally, insecticide resistance of insect pest populations can be predicted according to the prevalence of resistance markers including target-site mutations and overexpressed detoxification genes (Sonoda, 2010). To date, most resistance cases occurred within five classes of insecticides: organophosphates, pyrethroids, carbamates, neonicotinoids and diamides (Thomas & Ralf, 2015). According to the modes of action listed by the Insecticide Resistance Action Committee (IRAC), organophosphates and carbamates target acetylcholinesterases (AChE), pyrethroids target voltage gated sodium channels (VGSC), diamides target ryanodine receptors (RyR) and neonicotinoids target nicotinic acetylcholine receptor (nAChR). In addition, metabolic resistance of these five classes of insecticides are mainly associated with three important detoxification gene families: cytochrome P450 (P450), glutathione S-transferase (GST) and carboxyl/cholinesterases (CCE) (Yan et al., 2012).
Detecting target-site mutations and overexpressed detoxification genes within insect pest populations has long been a useful method in monitoring resistance. Many methods have been developed to detect target mutations such as PCR amplification of specific alleles (PASA) (Yan et al., 2014) and random amplified polymorphic DNA (RAPD) (Ferguson & Pineda, 2010). DNA microarray has been used to detect overexpressed detoxification genes (Mavridis et al., 2019).
However, these methods are inefficient and time-consuming.
RNA-Seq data contains information allowing detection of single-nucleotide polymorphisms (SNPs) and gene expression levels (Costa et al., 2010). Thus, RNA-Seq data can be used to detect resistance markers including target-site mutations and overexpressed detoxification genes (Bonizzoni et al., 2015;De Wit et al., 2015).
Here, to monitor the resistance of the aforementioned five classes of insecticides, we collected reported target-site mutations, target gene allelic sequences and three groups of detoxification gene sequences from 82 insect species, and then developed a program,

| Resistance-associated gene sequences
We collected corresponding gene sequences from 82 insect species: 26 Hymenopterans, 21 Dipterans, 14 Lepidopterans, 10 K E Y W O R D S detoxification genes, insecticide resistance, RNA-Seq, target-site mutations, webserver Hemipterans, 6 Coleopterans, and 5 of other orders from NCBI and InsectBase (Yin et al., 2016). According to the two main mechanisms of insecticide resistance, resistance-associated genes generally include two types: target genes and detoxification genes. To collect target gene sequences, we downloaded the confirmed full cDNA sequences of VGSC, AChE, RyR, and nAChR from InsectBase.
Next, these confirmed target gene sequences were used as queries to BLAST against the NCBI GenBank for each target of each species. The first search step obtained target sequences for species from most orders. Then, we selected the obtained target sequences from species with annotated genome as the secondary queries to search against other species within the same order. These two step searches yielded most sequences of four targets in the tested species. For species still without target sequences, we used the target sequences from the closely related species as the tertiary queries to search against the genome of this species.
To collect detoxification gene sequences of different species as comprehensively as possible, genome official gene set (OGS) files for the species were downloaded. Then, we selected all the sequences annotated as "cytochrome P450" or "glutathione S-transferase" or "carboxyl/cholinesterase". For some important insect species without published genome OGSs, the detoxification gene sequences were obtained by directly searching against NCBI nucleotide database with terms: (((cytochrome P450) OR glutathione S-transferase) OR carboxyl/cholinesterase) AND "species name" [Organism].

| RNA-Seq datasets
We searched the NCBI SRA database with the term "insecticide resistance," yielding a total of 94 RNA-Seq datasets from 28 populations of 12 insect species. Among these, 51 RNA-Seq datasets from 15 populations of 6 insect species were submitted with resistance phenotypes. We downloaded the datasets from these populations, including nine RNA-Seq datasets from three P. xylostella populations (CHS, ZZ, and CHR), six RNA-Seq datasets from two A. gossypii populations (NS and KR), 16 RNA-Seq datasets from four A. arabiensis populations (Mozambique, Asendabo, Chewaka, and Tolay), eight RNA-Seq datasets from three M. domestica populations (aabys, KS8S3, and ALHF), six RNA-Seq datasets from two L. decemlineata populations (RS and SS), and six RNA-Seq datasets from A. mellifera for further analysis. CHS was a susceptible population while ZZ and CHR populations were resistant to chlorantraniliprole with a resistance level 42-fold and 65-fold (Zhu et al., 2017), respectively. NS was a susceptible population while KR was resistant to neonicotinoids with a resistance level 23.8 to 394-fold (Hirata et al., 2017).

Mozambique (MOZ) was a susceptible population while Asendabo
(ASN), Chewaka (CHW), and Tolay (TOL) were resistant to deltamethrin and DDT (Simma et al., 2019). aabys was a susceptible population. KS8S3 was a multi-resistant population (Reid et al., 2019). And ALHF was resistant to permethrin more than 1,800-fold compared with aabys population (Li, Reid, et al., 2013). SS was susceptible population while RS was resistant to imidacloprid. Three A. mellifera RNA-Seq datasets were sequenced after selection of imidacloprid, while the other three RNA-Seq datasets were set as controls.

| Preparing FastD inputs
To identify target-site mutations and detoxification genes associated with resistance phenotypes, clean reads from RNA-Seq datasets should be mapped to the corresponding target gene sequence and all detoxification gene sequences. For the detection of markers conferring resistance to chlorantraniliprole in P. xylostella populations, clean reads were mapped to RyR gene and detoxification gene sequences. For the detection of markers conferring resistance to neo-

| Simulation
First, we downloaded the genome annotation of model species, Drosophila melanogaster from NCBI. The resistance-associated genes of four abovementioned targets and three detoxification gene families were selected as reference. To generate artificial target gene reads, we used Flux-simulator v1.2.1 (Griebel et al., 2012) to simulate the process of RNA sequencing. The simulated reads were then mapped to target gene sequences using bowtie2 (Langdon, 2015) to generate alignment files (sam format). Then, the sam files were transferred into bam format using samtools v1.9 (Li et al., 2009). To simulate variants in alignments, we used BAMsurgeon v1.0 (Ewing et al., 2015) to insert 600 random single-nucleotide variants with random frequencies into bam files and transferred bam files back to sam format. At last, the sam files were submitted to FastD-TR, and we called variants and calculated their allele frequencies. We compared the detected variants with the inserted random variants and calculated the sensitivity and specificity of the detected inserted variants. The sensitivity was calculated as the ratio of the number of detected inserted variants to all inserted variants, and the specificity was defined as the ratio of the number of detected inserted variants to the number of all detected variants.
To test the accuracy of differential expression analysis of FastD-MR, we used polyester v1.24.0 (Frazee et al., 2015) to simulate RNA-Seq datasets with differential gene expression. We selected 172 detoxification genes in D. melanogaster as reference and set the expression fold changes for all genes between two groups each with three replicates. Then, all generated reads from RNA-Seq datasets were mapped to reference using bowtie2 (Langdon, 2015). The generated alignment files (sam format) were submitted to FastD-MR to calculate the fold change of each gene between two groups. We compared the detected expression fold changes with the set expression fold changes.

| Resistance-associated target-site mutation profiles
By searching against the NCBI PubMed database, we obtained 440 articles reporting resistance to organophosphates and carbamates associated with target-site mutations of AChE, 368 articles reporting resistance to pyrethroids associated with target-site mutations of VGSC, 32 articles reporting resistance to diamides associated with target-site mutations of RyR, and 81 articles reporting resistance to neonicotinoids associated with target-site mutations of nAChR.
Among these published target-site mutations, 20 target-site mutations at 17 sites on AChE were distributed among 36 insect species (Table S1); 46 target-site mutations at 29 sites on VGSC were distributed among 39 insect species (Table S2); 6 target-site mutations at 4 sites on RyR were distributed among 4 insect species (Table S3); 4 target-site mutations at 4 sites on nAChR were distributed among 4 insect species (Table S4)

| Resistance-associated gene sequences
In total, we collected 403 insecticide target gene sequences, includ-

| The workflow of FastD program
There are two parts in the FastD program, FastD-TR (Fast Detection of Target-site Resistance) to detect target-site mutations and FastD-MR (Fast Detection of Metabolic Resistance) to detect overexpressed detoxification genes.
The workflow of FastD-TR consists of six main steps: preprocessing, mapping, SNP calling, differential SNP identification, translation and visualization ( Figure 2). To detect previously unknown target-site mutations associated with insecticide resistance, both the resistant case samples and the susceptible control samples were required for the analysis. Besides, it was optional for the detection of confirmed resistance-associated target-site mutations. First, raw reads from RNA-Seq data of case and control samples should take quality control to filter out adapters and reads with low sequencing quality. The obtained clean reads were then mapped to the target gene sequences using bowtie2 (Langdon, 2015) with additional option, --no-unal (filter out unaligned reads), to generate a Sequence Alignment/Map (SAM) file (Li et al., 2009). According to the POS tag of each reads, the nucleotide corresponding to the position on reference gene were extracted for both the case and control samples by a Perl script. For each position including more than one types of corresponding nucleotides and with reads coverage ≥30 was treated as SNP. Next, allele frequency for each SNP was calculated and compared between case and control samples. SNP with allele frequency difference between case and control samples ≥40% in either direction was treated as differential SNP. Then, the codons at differential SNP positions were translated into amino acid residues. Only the nonsynonymous differential SNPs were selected as potential target-site mutations. An R script named ggseqlogo (Wagih, 2017) F I G U R E 2 Workflow of the two FastD pipelines, FastD-TR and FastD-MR.
Fast-TR was used to detect target-site mutations and FastD-MR was used to detect overexpressed detoxification genes. The workflow of FastD-TR was illustrated in the left part of the figure, and the workflow of FastD-MR was illustrated in the right part.

| Webserver
A webserver (http://www.insect-genome.com/fastd) was constructed to provide online services. The Apache HTTP server (Version 2.4.6) runs on a CentOS Linux 7.4.1708 (core) system.
The Web pages were written in HTML and Cascading Style Sheets

| Evaluation of FastD performance based on simulated data
In the simulated RNA-Seq reads of target gene, we randomly inserted 600 single-nucleotide variants into protein coding regions of four target genes and compared the detected variants with the inserted variants. Due to the low coverage of some loci, only 523 variants were inserted successfully. Using FastD-TR, we detected 469 (89.7%) variants among the inserted variants. We accessed calling performance using AUC (area under curve) in ROC (receiver operating characteristic) curve (Figure 3). ROC with a AUC of 0.870 indicated a reliable calling performance. We compared the detected allele frequencies of detected variants with their set allele frequencies. We found that the allele frequencies calculated by FastD-TR were highly correlated with their "true" allele frequencies (R 2 = .834; p < 10 -16 ). In the RNA-Seq reads simulation of detoxification gene, we set expression fold changes for 172 detoxification genes between two groups as "true" fold changes. The detected expression fold changes were compared with their true fold changes.
We found a high correlation between fold changes calculated by FastD-MR and fold changes set by polyester (R 2 = 0.929; p < 10 -16 ).
The simulation results showed the high sensitivity and specificity of FastD.

| RyR mutations and overexpressed detoxification genes in diamondback moth
We used FastD-TR to detect RyR mutations between a resistant population CHR and a susceptible population CHS, and between another resistant population ZZ and CHS. The results showed that there were 12 target-site mutations detected in CHR versus CHS group, and there were eight target-site mutations detected in ZZ versus CHS group. Among these target-site mutations, G4946E, a previously confirmed resistance-associated target-site mutation (Troczka et al., 2012), was detected in CHS, CHR and ZZ with frequency of 2.55%, 94.68% and 65.82%, respectively (Table 1).
However, FastD-MR searching showed that no detoxification gene was expressed more than twofold (|log2FoldChange|> 1, p value < .01) higher in the CHR compared with CHS. In contrast, six detoxification genes were overexpressed in the ZZ compared with CHS (Table 2). Among these genes, a confirmed resistance-associated gene CYP6BG1, which was reported to be associated with resistance to chlorantraniliprole in P. xylostella , had elevated expression levels by 3.3-fold in ZZ population compared with CHS population.

| nAChR mutations and overexpressed detoxification genes in cotton aphid
We applied FastD-TR to detect nAChR mutations between a resistant population KR and a susceptible population NS, showing that only one nAChR mutation was detected on the beta1 of nAChR. A previously confirmed mutation R81T on the beta1 subunit of nAChR was detected with a frequency of 49.85% in KR but not in NS (Table 3). By using FastD-MR, nine detoxification genes were detected with elevated expression levels more than twofold (|log2FoldChange|> 1, p value < .01) in the KR compared with NS (Table 4). Among these genes, CYP6CY22 and CYP6CY13, which were reported to be associated with neonicotinoids resistance (Hirata et al., 2017), had elevated expression levels by 39.61and 22.04-fold in the resistant KR compared with susceptible NS population, respectively.

| VGSC mutations and overexpressed detoxification genes in A. arabiensis
We applied FastD-TR to detect VGSC mutations in three com-  (Table 6). Among these genes, CYP6P3 and CYP6M2 were reported to be associated with pyrethroids resistance in A. gambiae (Müller et al., 2008;Stevenson et al., 2011); CYP6P4 and CYP4G16 were related to pyrethroids resistance in A. arabiensis (Ibrahim et al., 2016;Jones et al., 2013

| VGSC mutations and overexpressed detoxification genes in house fly
We applied FastD-TR to detect potential VGSC mutations in two

| Overexpressed detoxification genes in Colorado potato beetle
We detected nAChR mutations between a resistant population and a susceptible population in Colorado potato beetle. There was no nAChR mutation found both in the resistant and susceptible populations. By using FastD-MR, 17 detoxification genes, including 13 P450s and 4 CCEs, were detected with elevated expression levels more than twofold (|log2FoldChange|> 1, p value < .01) in the resistant compared with susceptible population with fold changes ranging from 20.07 to 2.17 (Table 9).

| Overexpressed detoxification genes in honey bee
For honey bee, there were no corresponding RNA-Seq datasets from resistant populations available. We applied FastD to detect the genetic changes of honey bee after selection of imidacloprid. Using FastD-TR, there was no target-site mutation detected both in selected samples or control samples. While four detoxification genes were detected with overexpression in selected sample compared with control sample using FastD-MR (Table 10).

| D ISCUSS I ON
Insecticide resistance monitoring is the key to sustain insecticidemediated control efficiency. Molecular detecting assays can be used to detect resistant markers accurately at early stages to avoid resistance evolution (Network, 2016). Target-site resistance, which is mainly caused by target-site mutations, and metabolic resistance, which is mainly caused by overexpressed detoxification genes, are the two main mechanisms of insecticide resistance (Ffrench-Constant, 2013). The detection of these two kinds of resistance can well reveal the mechanism of resistance of insect pest populations. researchers have adopted RNA-Seq as a method to study resistance mechanisms and detect resistant markers (David et al., 2014;Faucon et al., 2017;Mamidala et al., 2012).

PCR
ACE was a tool developed to detect previously known target-site mutations in organophosphates and carbamates target AChE from RNA-Seq data (Guo et al., 2017). Compared with ACE, FastD was  Li, Fang, et al., 2013). Estimating the resistance to herbicide and fungicide will be added in the next version of FastD program.

ACK N OWLED G M ENTS
This work was supported by National Key R&D Program of China (2019YFD1002105) and Key Project of Zhejiang Provincial Natural Science Foundation (LZ18C060001).

CO N FLI C T O F I NTE R E S T
The authors declare that they have no competing interests. Writing-review & editing (supporting).

DATA AVA I L A B I L I T Y S TAT E M E N T
The FastD program, the online version and standalone version, installation instructions, resistance-related gene sequences, and a step-by-step example on how to use were freely available at the web server: http://www.insect-genome.com/fastd. The RNA-Seq datasets used in this study were publicly available (NCBI SRA data-  TA B L E 1 0 The overexpressed detoxification genes in imidacloprid selected samples of Apis mellifera compared with the control samples