The unigenes were annotated by aligning them against sequences deposited in several public databases, including the National Center for Biotechnology Information (NCBI) nonredundant protein (nr) database, the NCBI nonredundant nucleotide sequence (nt) database, the Kyoto Encyclopedia of Genes and Genomes (KEGG), UniProt/Swiss-Prot, Gene Ontology (GO), Cluster of Orthologous Groups of proteins (COG) and UniProt/TrEMBL, using BlastX with a cutoff E-value of 10⁻⁵ (Table 2). The analyses showed that 16 460 unigenes (44.77%) had significant matches in the nr database, 11 458 unigenes (31.65%) in the nt database, and 14 441 unigenes (39.28%) in the Swiss-Prot database. In total, 17 788 unigenes (48.38%) were successfully annotated in the nr, nt, Swiss-Prot, KEGG, GO, COG and TrEMBL databases; however, 18 978 unigenes (51.62%) had no match in those databases, which could be attributable to the short sequence reads generated by the sequencing technology (Hou et al., 2011).
Table 2. Functional annotation of the Dialeurodes citri transcriptome
|Annotated databases||All sequences||≥300 bp||≥1000 bp|
|Total||17 788||10 396||2451|
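The E-value filter and the overall annotation rate described above can be sketched as follows. This is only an illustration, not the authors' pipeline: it assumes BLAST tabular output (`-outfmt 6`), in which column 1 is the query ID and column 11 is the E-value, and the function name `annotated_queries` and the toy hit lines are hypothetical.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5  # cutoff stated in the text


def annotated_queries(tabular_text, cutoff=E_VALUE_CUTOFF):
    """Return the query IDs that have at least one hit at or below the cutoff.

    Assumes 12-column BLAST tabular output: query ID in column 1,
    E-value in column 11.
    """
    kept = set()
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if len(row) < 12:
            continue  # skip blank or malformed lines
        if float(row[10]) <= cutoff:
            kept.add(row[0])
    return kept


# Toy hits (hypothetical IDs and scores, for illustration only).
toy_hits = "\n".join([
    "\t".join(["Unigene1", "subjA", "80.0", "100", "20", "0", "1", "300", "1", "100", "1e-30", "150"]),
    "\t".join(["Unigene2", "subjB", "35.0", "60", "39", "0", "1", "180", "1", "60", "0.001", "40"]),
    "\t".join(["Unigene3", "subjC", "55.0", "90", "40", "1", "1", "270", "5", "94", "2e-6", "55"]),
])

kept_ids = annotated_queries(toy_hits)

# Annotation rate reported in the text: 17 788 of 36 766 unigenes.
annotation_rate = round(100 * 17788 / 36766, 2)
```

In the toy input, `Unigene2` is dropped because its E-value (0.001) exceeds the cutoff, and the reported counts reproduce the 48.38% annotation rate.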
For GO analysis, 4850 unigenes were divided into three ontologies: 2978 unigenes (61.40%) for molecular function, 688 unigenes (14.19%) for cellular components, and 1184 unigenes (24.41%) for biological processes (Fig. 2). The molecular function category mainly comprised proteins involved in binding, predominantly Hsps, and catalytic activities including kinases, hydrolases and transferases, allowing us to identify the genes involved in the secondary metabolite synthesis pathways. Regarding cellular components, cell and cell part were highly represented. For biological processes, the genes involved in cellular and metabolic processes were both highly represented. GO annotation provided a general gene expression profile signature for D. citri, which showed that the expressed genes in this species encode diverse structural, regulatory and stress proteins.
Figure 2. Functional annotation of assembled sequences based on gene ontology (GO) categorization. GO analysis was performed at the level two for three main categories (cellular component, molecular function, and biological process).
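The ontology shares quoted for the GO analysis follow directly from the unigene counts. A short check, using only the counts given in the text (the variable names are illustrative):

```python
# Unigene counts per GO ontology, as reported in the text.
go_counts = {
    "molecular function": 2978,
    "cellular component": 688,
    "biological process": 1184,
}

total_go = sum(go_counts.values())

# Percentage of GO-annotated unigenes falling in each ontology.
go_shares = {
    name: round(100 * count / total_go, 2)
    for name, count in go_counts.items()
}
```

The three counts sum to the 4850 unigenes stated, and the rounded shares match the reported 61.40%, 14.19% and 24.41%.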
In addition, all unigenes were subjected to a search against the COG database for functional prediction and classification. In total, 5731 unigenes with hits in the nr database could be assigned to COG classification and divided into 25 specific categories (Fig. 3). ‘General function prediction’ (16.96%) represented the largest group, followed by ‘translation, ribosomal structure and biogenesis’ (10.33%), ‘post-translational modification, protein turnover, chaperones’ (8.43%), ‘replication, recombination and repair’ (3.93%), and ‘transcription’ (2.98%). Only a few unigenes were assigned to ‘nuclear structure’ (0.017%) and ‘cell motility’ (0.19%). The category ‘secondary metabolites biosynthesis, transport and catabolism’ (2.11%) was an important group because of the importance of secondary metabolite pathways to insecticide metabolism in insects.
Figure 3. Cluster of orthologous groups (COG) classification. In total, 5731 of the 36 766 sequences with nonredundant database hits were grouped into 25 COG classifications.
Metabolic pathway analysis of the unigenes was also conducted using the KEGG annotation system. This process predicted a total of 308 pathways, which represented a total of 7507 unigenes. The pathways involving the highest number of unique transcripts were ‘metabolism’ (14.82%), followed by ‘chromosome’ (3.71%) and ‘spliceosome’ (3.06%; Table S1).
Detection of gene sequences encoding insecticide detoxification enzymes
In D. citri, as in other insect species, a suite of detoxification enzymes such as GSTs, CarEs, and P450s is involved in metabolizing xenobiotics, secondary plant chemicals and insecticides. Based on this transcriptome, a large number of candidate genes and gene families related to insecticide resistance were identified, which provides valuable information for further investigation of the detailed mechanisms. Furthermore, these enzymes were searched against the GO and COG databases for functional prediction and classification in D. citri (Table 3); the GO and COG annotations associated with these candidate unigenes yielded new insights into their functions and relationships.
Table 3. Insecticide detoxification enzymes potentially involved in GO and COG annotations
|Enzymes/class||Gene numbers||GO annotation||COG annotation|
|Cytosolic GSTs|| || || |
| Delta||3||GO:0016829(2); GO:0016740(2); GO:0004364(1)||COG0625(3)|
|CarEs|| || || |
| Clade B||1||None||COG2272(1)|
| Clade D||1||GO:0016787(1)||COG2272(1)|
| Clade E||6||GO:0052689(2)||COG2272(6)|
|P450s|| || || |
| CYP3||12||GO:0005506(1); GO:0016491(2); GO:0046872(1); GO:0004497(1)||COG2124(9)|
In insects, GSTs fall into six major subclasses: sigma, omega, theta, zeta, and the insect-specific delta and epsilon (Hayes et al., 2005; Tu & Akgul, 2005). In the present study, a total of 15 unique sequences with a mean length of 623 bp, encoding specific GSTs, were identified (Table S2). Among these, 10 genes were classified into four classes by phylogenetic analysis with GST genes from Drosophila melanogaster and Acyrthosiphon pisum (Fig. 4): five in sigma, one in omega, one in epsilon, and three in delta. In addition, we screened these 10 GST unique transcripts against the GO and COG annotations. In the GO annotations, ‘catalytic activity’, ‘glutathione peroxidase activity’, ‘glutathione transferase activity’ and ‘transferase activity’ were assigned to the sigma class of GST, which is closely associated with xenobiotic detoxification; GSTs have been reported to catalyse the conjugation of reduced glutathione to the electrophilic centres of a wide range of exogenous or endogenous toxic compounds, including chemical carcinogens, insecticides, herbicides and oxidative stress products (Hayes et al., 2005). Similarly, ‘lyase activity’, ‘transferase activity’, and ‘glutathione transferase activity’ were found in the delta class of GST. Moreover, the COG annotation ‘post-translational modification, protein turnover, chaperones’ was found in the omega, epsilon, and delta classes of GST, which suggested that these three classes may display chaperone-like activity, helping unfolded proteins to maintain their correct states. Only four classes of GST were identified in D. citri, namely sigma, omega and the two insect-specific classes; theta and zeta were absent. Similar results were found in A. pisum and Myzus persicae, in which the epsilon and zeta classes of GSTs were absent from the transcriptomes (Ramsey et al., 2010). In T. vaporariorum, most of the identified GSTs were assigned to the delta class, but in D. citri only three delta-class sequences were found, far fewer than in T. vaporariorum. GSTs have been reported to play important roles in phase II detoxification of several chemical insecticide classes, e.g. pyrethroids (Lumjuan et al., 2011) and organophosphates (Melo-Santos et al., 2010), and resistance to dichloro-diphenyl-trichloroethane (DDT) and to neonicotinoids was associated with the sigma class of GST in Aedes aegypti and B. tabaci (Grant & Hammock, 1992; Rauch & Nauen, 2004). Interestingly, sigma was the most abundant class among the GST unique transcripts in D. citri, and the sigma class of GST in this insect may also be associated with insecticide resistance. Further functional studies (gene expression and RNA interference [RNAi]) are required to elucidate their role in D. citri.
Figure 4. Neighbour-joining phylogenetic analysis of the glutathione S-transferases (Gst) from Dialeurodes citri (DC), Drosophila melanogaster (Dm), and Acyrthosiphon pisum (Ap).
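For readers unfamiliar with the neighbour-joining method used for the phylogenetic trees, the core selection step can be sketched as below. This is a generic illustration of the textbook algorithm, not the software or settings used in the study; the function name `nj_pick_pair` and the five-taxon distance matrix are made up for the example.

```python
from itertools import combinations


def nj_pick_pair(dist):
    """Return the pair of taxa joined first by neighbour-joining, with its Q value.

    dist is a symmetric matrix of pairwise distances (list of lists).
    Q(i, j) = (n - 2) * d(i, j) - sum_k d(i, k) - sum_k d(j, k);
    the pair with the smallest Q is joined.
    """
    n = len(dist)
    row_sums = [sum(row) for row in dist]
    best_pair, best_q = None, None
    for i, j in combinations(range(n), 2):
        q = (n - 2) * dist[i][j] - row_sums[i] - row_sums[j]
        if best_q is None or q < best_q:
            best_pair, best_q = (i, j), q
    return best_pair, best_q


# Toy distance matrix for five hypothetical sequences (not data from this study).
toy_dist = [
    [0, 5, 9, 9, 8],
    [5, 0, 10, 10, 9],
    [9, 10, 0, 8, 7],
    [9, 10, 8, 0, 3],
    [8, 9, 7, 3, 0],
]

first_pair, first_q = nj_pick_pair(toy_dist)
```

Note that the pair joined first (taxa 0 and 1, Q = -50) is not the pair with the smallest raw distance (taxa 3 and 4, distance 3); the Q criterion corrects for how far each taxon is from all the others. A full tree is built by repeating this step on a reduced matrix after each join.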
A total of 49 unigenes with a mean length of 476 bp were identified as encoding putative CarE genes in D. citri (Table S2). Phylogenetic analysis with genes from B. tabaci, Nasonia vitripennis and A. pisum classified eight unigenes into three clades (Fig. 5). Among these, clade E contained six sequences, and DC-Unigene 29909 and DC-Unigene 31976 had high homology to clade A and clade D, respectively. A search against the GO and COG annotation databases showed that the GO terms ‘hydrolase activity’ and ‘carboxylic ester hydrolase activity’ were assigned to clade D and clade E, respectively. Importantly, in the COG database, all three clades were annotated as ‘lipid transport and metabolism’, and annotations associated with ‘metabolism’ would be starting points for studying insecticide resistance. CarEs can be divided into 13 clades (Ranson et al., 2002), only three of which were present in D. citri. Six unigenes belonging to clade E were identified, the same number as in T. vaporariorum, and clade D was also found in D. citri but was not identified in T. vaporariorum (Karatolos et al., 2011). Clade D and E enzymes are thought to be largely involved in pheromone and hormone processing in insects. Moreover, CarEs are the key esterase family associated with insecticide resistance in insects, acting through two mechanisms: overproduction and qualitative changes in enzyme structure (Baffi et al., 2007; Zhang et al., 2007; Kwon et al., 2009). CarEs involved in the detoxification of insecticides belong to clades A–C, and only one sequence was assigned to these clades in D. citri, far fewer than the 12 sequences identified in T. vaporariorum. Furthermore, the clade A sequence, DC-Unigene 34696, showed homology to a CarE gene in B. tabaci (COE1, accession ABV45410), which was closely associated with organophosphate resistance (Alon et al., 2008). Future work could therefore focus on the relationship between insecticide resistance and these sequences in D. citri.
Figure 5. Neighbour-joining phylogenetic analysis of the carboxylesterases from Dialeurodes citri (DC), Nasonia vitripennis (Nv), Bemisia tabaci (Bt) and Acyrthosiphon pisum (Ap).
In the D. citri transcriptome, a total of 53 sequences with a mean length of 653 bp were identified as encoding P450 genes (Table S2). Based on phylogenetic analyses with B. tabaci, A. pisum, and D. melanogaster, P450s from D. citri were assigned to the appropriate CYP families. Among these, 12 P450s belonged to the CYP3 family, three to the CYP4 family, and two to the CYP2 family (Fig. 6). In the GO annotation, only ‘oxidoreductase activity’ was assigned to the CYP2 family, and the annotations of the CYP3 and CYP4 families differed from each other. The CYP3 family was mapped to ‘iron ion binding’, ‘metal ion binding’, and ‘monooxygenase activity’, whereas the CYP4 family was annotated with ‘oxidation reduction’, ‘alkane 1-monooxygenase activity’, ‘electron carrier activity’ and ‘heme binding’, which suggested that P450s may have multiple functions in D. citri. Furthermore, the COG term ‘secondary metabolites biosynthesis, transport and catabolism’ was assigned to CYP2, CYP3 and CYP4, which suggested that these cytochrome P450 genes are closely associated with the metabolism of secondary metabolites and of the insecticides used against D. citri. As in T. vaporariorum, the CYP2, CYP3 and CYP4 families were found in D. citri, and a majority of the identified P450s belonged to the CYP3 family; however, unlike in T. vaporariorum, the mitochondrial families were not identified in the D. citri transcriptome. The P450s are a major family of enzymes involved in detoxification and metabolism (Tijet et al., 2001). The CYP3 and CYP4 families have been implicated in the metabolism of plant secondary metabolites and synthetic insecticides in some insect species (Karatolos et al., 2011). Genes of the families identified in the present study are therefore candidates for a role in insecticide resistance in D. citri. In B. tabaci, overexpression of CYP6CM1 has been reported to contribute to resistance to neonicotinoid insecticides (Karunker et al., 2008; Puinean et al., 2010), and in D. citri, DC-Unigene 36133 and DC-Unigene 20657 (CYP4) both had high homology with CYP6CM1. Future research on the expression of these genes could facilitate the discovery of genes involved in detoxification and resistance, and technologies such as RNAi can be adapted to identify their function. Moreover, analysis of fully sequenced insect genomes has identified 164 P450s in Ae. aegypti (Strode et al., 2008), 106 in Anopheles gambiae (Holt et al., 2002), 85 in D. melanogaster (Adams et al., 2000), and 83 in A. pisum (Ramsey et al., 2010); the number of P450s currently identified in D. citri is at the lower end of this range, and additional P450 genes may await discovery because they were absent from the present transcriptomic dataset.
Figure 6. Neighbour-joining phylogenetic analysis of cytochrome P450s from Dialeurodes citri (DC), Bemisia tabaci (Bt), Acyrthosiphon pisum (Ap), and Drosophila melanogaster (Dm).
Detection of gene sequences encoding insecticide target proteins
A number of sequences encoding insecticide target proteins, including the GABA receptor, the voltage-gated sodium channel (VGSC), nicotinic acetylcholine receptor (nAChR) subunits, the AChE enzyme, and the ryanodine receptor, were identified in the transcriptome of D. citri (Table 4). These target proteins have been reported to be associated with insecticide resistance, and mutations leading to varying degrees of insensitivity have been detected in many of them in other arthropod species. For example, in B. tabaci, resistance to endosulfan was associated with a mutation in the GABA receptor subunit gene (Houndété et al., 2010), and the M918V, L925I and T929V mutations of the VGSC were reportedly associated with resistance to pyrethroids (Chung et al., 2011). Imidacloprid resistance in Nilaparvata lugens was also associated with a single point mutation at a conserved position (Y151S) in two nAChR subunits, Nlα1 and Nlα3 (Liu et al., 2005). Furthermore, resistance to organophosphates in the B biotype of B. tabaci was associated with a point mutation (Phe392Trp) in the ace1-type AChE. B. tabaci and T. vaporariorum have also been reported to have developed substantial resistance to neonicotinoids, in which both mutations in AChRs and elevated metabolic detoxification have been found to be involved (Honda et al., 2006; Gorman et al., 2007; Alon et al., 2008). Although most of these unigenes were not full length, they will facilitate further characterization of these targets using RACE to retrieve the full-length cDNAs. Once the full-length sequences of all the unique unigenes are obtained, the alternative exons of these important genes can be characterized in further studies. Although many target genes have been obtained in D. citri, the identification of the described mutations at known ‘hot-spots’ must await further investigation. Moreover, future research will focus on the correlation of these possible single nucleotide polymorphisms with insecticide resistance by using TaqMan® assays to screen additional populations with different resistance phenotypes (Karatolos et al., 2011).
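Once full-length sequences are available, screening for the substitutions named above (for example M918V or Y151S) amounts to checking the residue at a given position. The sketch below is a simplified illustration with hypothetical helper names and toy sequences; it assumes the translated sequence is already numbered like the reference protein, which in practice requires an alignment to that reference, and it handles only one-letter labels (not three-letter forms such as Phe392Trp).

```python
import re


def parse_substitution(label):
    """Split a one-letter substitution label such as 'Y151S' into (ref, position, alt)."""
    match = re.fullmatch(r"([A-Z])(\d+)([A-Z])", label)
    if match is None:
        raise ValueError(f"unsupported substitution label: {label!r}")
    return match.group(1), int(match.group(2)), match.group(3)


def residue_status(protein, label):
    """Classify the residue of `protein` at the position named in `label`.

    Assumption: `protein` uses the same residue numbering as the reference
    from which the label was defined (positions are 1-based).
    """
    ref, pos, alt = parse_substitution(label)
    if pos > len(protein):
        return "not covered"  # partial sequence does not reach the site
    residue = protein[pos - 1]
    if residue == ref:
        return "wild type"
    if residue == alt:
        return "substitution"
    return "other residue"


# Toy sequences (hypothetical, for illustration only).
status_wild = residue_status("MKLAYGH", "Y5S")
status_mutant = residue_status("MKLASGH", "Y5S")
status_partial = residue_status("MKL", "Y5S")
```

The "not covered" outcome reflects the situation described in the text: a partial unigene may simply not span the hot-spot, which is why full-length cDNAs are needed before such mutations can be assessed.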
Table 4. Unique transcripts associated with insecticide target sites in Dialeurodes citri
|Target sites||Sequence number||Unigene ID||Insecticide class|
|GABA receptor||1||Unigene 35642||Organochlorines, Phenylpyrazoles|
|Voltage-gated sodium channel||1||Unigene 9689||Pyrethroids, Pyrethrins|
|Nicotinic acetylcholine receptor||2||Unigene 35929||Neonicotinoids|
|Acetylcholinesterase||3||Unigene 32037||Organophosphates, Carbamates|
|Ryanodine receptor||7||Unigene 29189, Unigene 29191, Unigene 29839, Unigene 32598, Unigene 32599, Unigene 23546||Flubendiamide, chlorantraniliprole|
Analysis of Hsp genes
Hsps are highly conserved proteins found in all eukaryotes and prokaryotes, and these gene families consist of stress-inducible and constitutively expressed genes (Parsell & Lindquist, 1993). A wide variety of environmental stresses, such as oxygen radicals, heavy metals, high temperature, nutrient deprivation, bacterial and viral infections, as well as malignant transformation, are stimuli for the production of Hsps (Gehrmann et al., 2004). Generally, Hsps can be divided into five families according to their molecular weight and homology: small Hsps (sHsps), Hsp60, Hsp70, Hsp90, and Hsp100 (Nover & Scharf, 1997). In the present study, Hsps were also searched against the GO and COG databases for functional prediction and classification in D. citri (Table 5).
Table 5. Heat shock proteins potentially involved in GO and COG annotations
|Hsp genes||Gene numbers||GO annotation||COG annotations|
|sHsp|| || || |
| Hsp20||7||GO:0006950(2); GO:0006497(1)||COG0071(1)|
| Hsp40||2||GO:0005524(1)||COG0484(1); COG2214(1)|
|Hsp60||4||GO:0005524(2); GO:0003994(1)||COG0607(1); COG0459(1)|
|Hsp70||9||GO:0000166(1); GO:0005524(5); GO:0001666(1)||COG0443(9)|
sHsps are a family of molecular chaperones with molecular weights ranging from 12 to 43 kDa, and they reflect the response mechanism of organisms to some extreme environmental stresses (Kim et al., 1998; Franck et al., 2004). sHsps have been suggested to contribute to thermal resistance, and may therefore extend the geographical distribution of some invasive species (Qin et al., 2005; Huang & Kang, 2007). In the present study, 18 unigenes were found to have similarities to sHsps in the transcriptome of D. citri (Table S3). Among them, nine unigenes appeared to be complete or almost complete sequences, and these nine sequences were further characterized by phylogenetic analysis with genes from B. tabaci, A. pisum and T. vaporariorum. The results showed that seven unigenes belonged to Hsp20 and two unigenes belonged to Hsp40 (Fig. 7). In the GO database, ‘protein lipidation’ and ‘ATP binding’ were assigned to Hsp20 and Hsp40, respectively, which was similar to previous studies reporting that sHsps have an ATP-independent holdase activity (Gobbo et al., 2011). Importantly, ‘response to stress’ was detected in sHsps, which suggests that sHsps may be involved in cellular stress resistance in D. citri. In the COG database, only ‘post-translational modification, protein turnover, chaperones’ was found in sHsps, which indicated that sHsps could act as molecular chaperones that block the aggregation of unfolded proteins and have a cytoprotective function under stressful conditions.
Figure 7. Neighbour-joining phylogenetic analysis of HSPs from Dialeurodes citri (DC), Bemisia tabaci (Bt), Acyrthosiphon pisum (Ap), and Trialeurodes vaporariorum (Tv).
The Hsp60 family is a group of proteins with distinct ring-shaped, or toroid, quaternary structures (Quintana & Cohen, 2005). Most studies of Hsp60 have focused on mammals and typical model organisms, indicating its possible role in certain cellular processes, such as development, thermoprotection and toxic stress response, and it has even been regarded as a potential environmental stress marker (Choresh et al., 2001; Timakov & Zhang, 2001; Chen et al., 2008). In the present study, only four unigenes encoding putative Hsp60 were identified in the database (Table S3). Among them, no sequence appeared to be complete, and all the unigenes were shorter than 600 bp. In addition, we screened these four unique transcripts against the GO and COG annotations. ‘ATP binding’ and ‘aconitate hydratase activity’ were the GO terms assigned. As for the sHsps, ‘post-translational modification, protein turnover, chaperones’ was also found for Hsp60 in the COG annotation, which showed that Hsp60 could also act as a molecular chaperone. Interestingly, Hsp60 was mapped to the annotation ‘inorganic ion transport and metabolism’, which is consistent with previous studies reporting that Hsp60 was implicated in activities such as amino acid transport, signal transduction and cellular metabolism (Ikawa & Weinberg, 1992; Jones et al., 1994; Xu & Qin, 2012).
Hsp70 is a molecular chaperone that is expressed in response to stress; it binds to its protein substrates and stabilizes them against denaturation or aggregation until conditions improve (Mayer & Bukau, 2005). In the present study, 36 unigenes were highly similar to the classic inducible Hsp70 from other organisms, and nine complete sequences were identified in the transcriptome of D. citri. Phylogenetic analysis (Fig. 7) showed that these nine complete sequences have high homology with Hsp70 identified in other whiteflies (B. tabaci and T. vaporariorum). In the GO database, the important annotation was ‘response to hypoxia’, which indicated that Hsp70 may play an important role in the response to hypoxic stress, and according to the COG annotation, Hsp70 also had a role in chaperone activities.
Hsp90 is a highly conserved molecular chaperone, and studies have shown that it has housekeeping functions in the folding, maintenance of structural integrity, and proper regulation of a subset of cytosolic proteins (Picard, 2002; Sonoda et al., 2006). Hsp100 uses an ATP-dependent protein unfoldase activity to solubilize protein aggregates or to target specific classes of proteins for degradation (Lee et al., 2004). In the present study, eight unigenes encoding putative Hsp90 were identified (Table S3), two of which appeared to be complete; however, only one Hsp100 unigene was found in our database, and that sequence was shorter than 400 bp, which was too short for phylogenetic analysis with genes from B. tabaci, A. pisum and T. vaporariorum. Neither Hsp90 nor Hsp100 could be annotated with GO terms. Like the other Hsps, Hsp90 and Hsp100 were both mapped to ‘post-translational modification, protein turnover, chaperones’ in the COG annotation, which indicated that the Hsp families were all highly conserved and function mainly as molecular chaperones, allowing cells to adapt to gradual changes in their environment and to survive in otherwise lethal conditions.
Microsatellite markers (simple sequence repeats, SSRs) are highly informative and widely used in evolutionary and genetic studies (Liu et al., 2012). To further evaluate the assembly quality and develop new molecular markers for D. citri, the 36 766 unigenes generated in the present study were used to mine potential microsatellites. In total, 149 microsatellite markers were detected, including 31 (20.81%) dinucleotide motifs, 64 (42.95%) trinucleotide motifs, 25 (16.78%) tetranucleotide motifs, 23 (15.44%) pentanucleotide motifs, five (3.36%) hexanucleotide motifs, and one (0.67%) compound SSR (Table 6). The most abundant repeat type was ATC, followed by TC, CTT, AGG, AC and ATTT; however, the number of SSRs identified in the present study was much lower than that in the transcriptome of B. tabaci, which contained 9075 SSRs. Microsatellite loci are known not to be universally abundant in arthropod genomes (Fagerberg et al., 2001); for example, SSRs appear to be rare in lepidopteran genomes, and a recent study on the butterfly Euphydryas editha detected only 92 SSRs (Mikheyev et al., 2010).
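Microsatellite mining of this kind can be illustrated with a small regular-expression scan. The sketch below is not the tool used in the study: the text does not state the minimum repeat numbers, so the thresholds in `MIN_REPEATS` are assumptions, and the function names and the toy sequence are hypothetical.

```python
import re

# Assumed minimum number of tandem repeats per motif length (not from the text).
MIN_REPEATS = {2: 6, 3: 5, 4: 4, 5: 4, 6: 4}


def is_primitive(unit):
    """True if the motif is not itself a repeat of a shorter motif (e.g. 'ATAT', 'AAA')."""
    size = len(unit)
    return not any(
        size % k == 0 and unit == unit[:k] * (size // k)
        for k in range(1, size)
    )


def find_ssrs(sequence):
    """Return (motif, repeat count, 0-based start) for each perfect SSR found."""
    sequence = sequence.upper()
    found = []
    for unit_len, min_rep in MIN_REPEATS.items():
        # A motif of unit_len bases followed by at least (min_rep - 1) copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_rep - 1))
        for match in pattern.finditer(sequence):
            unit = match.group(1)
            if is_primitive(unit):
                found.append((unit, len(match.group(0)) // unit_len, match.start()))
    return found


# Toy sequence containing (TC)7 and (ATC)5; not a real unigene.
toy_sequence = "GG" + "TC" * 7 + "GATTACA" + "ATC" * 5 + "G"
toy_ssrs = find_ssrs(toy_sequence)
```

Only perfect repeats are detected here; compound SSRs, such as the single one reported in Table 6, would need an additional rule for merging adjacent repeats.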
Table 6. Summary of simple sequence repeat (SSR) types in the Dialeurodes citri transcriptome
|Repeat motif||Number||Percentage (%)|
|Dinucleotide|| || |
| AC||9|| |
| AT||4|| |
| TC||18|| |
|Trinucleotide|| || |
| AAC||2|| |
| CCG||3|| |
| AAT||4|| |
| ACC||4|| |
| AGG||10|| |
| CTT||12|| |
| AGC||5|| |
| ATC||24|| |
|Tetranucleotide|| || |
| ACAG/ACTT/AGGT/ATGT/CAGT/CATT||6|| |
| AAAC||2|| |
| AATC||2|| |
| ATTT||6|| |
| CTTT||9|| |
|Pentanucleotide|| || |
| AAAAT/AATAG/AACTC/ACAGG/ACATT/ACCTT/AGGGG/ATGGT/CCTCT/CGAGT/GACTT/GCGGT/GCGTT||13|| |
| AACCT||2|| |
| ATATC||2|| |
| CTTTT||2|| |
| GTTTT||4|| |
|Hexanucleotide|| || |