Integrative transcriptomic and genomic analysis of odorant binding proteins and chemosensory proteins in aphids

Abstract Odorant binding proteins (OBPs) and chemosensory proteins (CSPs) play essential roles in insect chemosensory recognition. Here, we identified nine OBPs and nine CSPs from the Myzus persicae transcriptome and genome. Genomic structure analysis showed that the number and length of the introns are much higher, and this appears to be a unique feature of aphid OBP genes. Three M. persicae OBP genes (OBP3/7/8) as well as CSP1/4/6, CSP2/9 and CSP5/8 are tandem arrayed in the genome. Phylogenetic analyses of five different aphid species suggest that aphid OBPs and CSPs are conserved in single copy across all aphids (with occasional losses), indicating that each OBP and CSP class evolved from a single gene in the common ancestor of aphids without subsequent duplication. Motif pattern analysis revealed that aphid OBP and CSP motifs are highly conserved, and this could suggest the conserved functions of aphid OBPs and CSPs. Three OBPs (MperOBP6/7/10) are expressed antennae specifically, and five OBPs (MperOBP2/4/5/8/9) are expressed antennae enriched, consistent with their putative olfactory roles. M. persicae CSPs showed much broader expression profiles in nonsensory organs than OBPs. None of the nine MperCSPs were found to be antennae specific, but five of them (MperCSP1/2/4/5/6) showed higher expression levels in the legs than in other tissues. MperCSP10 mainly expressed in the antennae and legs. The broad and diverse expression patterns of M. persicae CSPs suggest their multifunctions in olfactory perception, development and other processes.


Introduction
The green peach aphid Myzus persicae (Hemiptera: Aphididae) is a major destructive polyphagous pest. They can use more than 400 plant species from more than 40 families for parthenogenetic reproduction, and use peach, its primary host, for sexual reproduction (Van Emden et al., 1969;Weber, 1986;Troncoso et al., 2005). Other than its direct feeding damage to plants, M. persicae can also transmit more than 100 plant viruses, including both persistent viruses such as potato leaf roll virus (Eskandari et al., 1979) and nonpersistent viruses such as cucumber mosaic virus (Bwye et al., 1997). In addition, M. persicae is a typical host-alternating aphid species, and exhibits highly colour-polymorphic association with different host plants. Green M. persicae aphids can produce pink offspring when they are fed on poor quality hosts (Williams et al., 2000), whereas red M. persicae aphids can produce green offspring when they are crowded on poor diets (Ueda and Takada, 1977). The ovipositing female aphids locate their hosts usually by chemical cues and antennal contacts (Read et al., 1970). The success of M. persicae in nature is due to its extremely high population adaptability to the environment, a wide genetic variability and a broad phenotypic plasticity for which chemical communications play a critical role (Weber, 1986).
Insects use their sensitive and selective olfactory organs, mainly the antennae, to perceive the surrounding world. Aphids, like other insects, rely heavily on chemical signals, including plant volatiles and species-specific pheromones, to locate correct hosts, find mates and avoid predators or parasitoids (Pickett and Glinwood, 2007). Mature sexual females release sex pheromones from their tibiae of the hind legs to attract conspecific males (Marsh, 1972;Pickett and Glinwood, 2007). The sex pheromones of aphid species usually comprise (4aS,7S,7aR)-nepetalactone and (1R,4aS,7S,7aR)-nepetalactol, a mixture of two monoterpenoids (Pickett et al., 1992;Dewhirst et al., 2010). When attacked by predators or parasitoids, most aphid species, including M. persicae, emit chemical signals known as alarm pheromones that signal other individuals to escape and defend (Nault et al., 1973;Pickett and Griffiths, 1980), or even attack the predator (eg Ceratovacuna lanigera) (Arakaki, 1989). The alarm pheromones of most aphids are a mixture of (E)-β-farnesene (Eβf), α-pinene, β-pinene and β-limonene (Pickett and Griffiths, 1980). Eβf, a sesquiterpene hydrocarbon, is the primary component of the alarm pheromones of many aphids, and thus is considered as a repellent to control aphids (Bowers et al., 1972). Chemoreception plays important roles in the complex life cycle of aphids. Studying the interaction of aphids with aphids, plants, natural enemies, competitors and mutualists will contribute to exploit novel environment-friendly strategies for aphid control.
Odorant binding proteins (OBPs) and chemosensory proteins (CSPs) are two families of small water-soluble proteins; they are highly concentrated (as high as 10 mm) in sensillum lymph of the antennal sensilla and believed to be involved in the initial chemosensory recognition of insects (Vogt and Riddiford, 1981;Calvello et al., 2003;Pelosi et al., 2006). Both OBPs and CSPs have small molecular weights, approximately 15 kDa for OBPs and 12 kDa for CSPs. The common feature of insect OBPs is that they have six highly conserved cysteines that are paired to form interlocked disulphide bridges Northey et al., 2016). Insect CSPs, on other hand, have four conserved cysteines that are linked to form disulphide bridges . So far, the genomic sequences of four aphid species have been published: the pea aphid Acyrthosiphon pisum (The International Aphid Genomics Consortium, 2010), the Russian wheat aphid Diuraphis noxia (Nicholson et al., 2015), the green peach aphid M. persicae (Mathers et al., 2017) and the soybean aphid Aphis glycines (Wenger et al., in press). Meanwhile, the transcriptomic data of several aphid species are also available (Gu et al., 2013;Xue et al., 2016). Fifteen OBPs and 13 CSPs were identified from the A. pisum genomic data (Zhou et al., 2010). Nine OBPs and nine CSPs were identified in the cotton aphid Aphis gossypii (Gu et al., 2013), and 13 OBPs and five CSPs were identified in the grain aphid Sitobion avenae (Xue et al., 2016) from their transcriptomic data. For M. persicae, only four OBPs and five CSPs were identified from the expressed sequence tags (Xu et al., 2009). The functional studies of aphid OBPs and CSPs in recognizing and discriminating the sex pheromones, the alarm pheromone and the host plant volatiles, however, are still limited. The OBP3 of A. pisum (Qiao et al., 2009), the OBP7 of S. avenae (Vandermoten et al., 2011;Zhong et al., 2012), the OBP3 of Megoura viciae and Nasonovia ribisnigri (Northey et al., 2016) and the OBP3 and OBP7 of Rhopalosiphum padi (Fan et al., 2017) have high binding affinities with the alarm pheromone Eβf. However, the in vivo evidence of OBP3 and OBP7 participating in Eβf recognition is still not conclusive.
In this study, we identified nine OBP and nine CSP genes from the M. persicae transcriptomic and genomic sequence datasets and for the first time demonstrated their genomic structures, clusters and uniquely long introns. We further analysed their motif patterns and phylogenetic relationships with the OBPs and CSPs in five other aphid species and in other hemipterans. We also examined the tissue expression profiles of M. persicae OBP and CSP genes to infer their putative functions in aphid chemoreception of semiochemicals. The evolution and differentiation of Hemiptera OBPs and CSPs are also discussed.

Sequencing, de novo assembly and functional annotation
A total of 110 059 204 raw reads were produced from the apterous M. persicae transcriptome sequencing project. After trimming adaptor sequences and removing low-quality sequences, 107 199 360 clean reads were remained. After assembly, 50 352 unigenes were generated, with lengths ranging from 201 bp to 27.17 kb and an average length of 829 bp (N50 = 1745 bp, N90 = 294 bp).
The unigenes were searched with blastx and blastn programs against the sequences in the NCBI GenBank database. The results indicated that 15 881 of the 50 352 unigenes (31.54%) had blastx hits in the nonredundant protein (nr) databases, and 14 329 unigenes (28.46%) had blastn hits in the nonredundant nucleotide sequence (nt) databases. Some unigenes are homologous to more than one species, and most of the annotated unigenes have the best hit with Hemiptera insect genes.
The Gene Ontology (GO) category analysis revealed that only 12 521 of the 50 352 unigenes (24.87%) could be annotated into different functional groups (biological process, cellular components and molecular functions). The cellular process (7650 unigenes) and metabolic process (6913 unigenes) GO categories were most abundantly represented within the biological process GO. In the cellular components GO, the transcripts were mainly distributed in the cell (4832 unigenes) and cell part (4935 unigenes). The GO analysis also showed that the unigenes involving in binding (7306 unigenes) and catalytic activity (5696 unigenes) were most abundant in the molecular function ontology (Fig. 1).

Identification of OBP and CSP genes in M. persicae
A total of nine OBP and nine CSP genes were identified from the transcriptome and genome of M. persicae using motifsearch and tblastn program ( Table 1). The identification of these OBPs and CSPs was confirmed by searching M. persicae genome sequences (https://bipaa. genouest.org/is/aphidbase/myzus_persicae). All M. persicae OBPs and CSPs are full-lengths with open reading frames (ORFs) ranging from 396 to 858 bp. The nucleotide sequences of M. persicae OBP and CSP genes were verified by molecular cloning and sequencing. We name these M. persicae OBP genes as MperOBP2-10 and the CSP genes as MperCSP1-2 and MperCSP4-10 following the nomenclatures reported for the OBPs and CSPs of A. pisum (Zhou et al., 2010), A. gossypii (Gu et al., 2013) and S. avenae (Xue et al., 2016). Like in A. gossypii (Gu et al., 2013), both OBP1 and CSP3 identified in A. pisum are missing in the M. persicae transcriptome and genome dataset. Like AgosCSP1 (Gu et al., 2013), MperCSP1 does not have a signal peptide, a signature of secretory protein. The rest of the M. persicae OBP and CSP proteins all have a signal peptide at their N-terminus (Table 1). The nine identified M. persicae OBPs can be divided into two distinct subfamilies: MperOBP2/3/4/7/8/9/10 belong to the classical OBP subfamily, which has the typical six conserved cysteines and fit the motif pattern of 'C 1 -X 15-39 -C 2 -X 3 -C 3 -X 21-44 -C 4 -X 7-12 -C 5 -X 8 -C 6 ' ( Fig. 2A, Zhou et al., 2004;Xu et al., 2009), MperOBP5 and MperOBP6 belong to the plus-C OBP subfamily, which has one additional cysteine (C6a) and a conserved proline immediately after the sixth cysteine ( Fig. 2A, Zhou et al., 2004). All of the MperCSPs have four conserved cysteines and fit the motif pattern of 'C 1 -X 6-8 -C 2 -X 16-21 -C 3 -X 2 -C 4 ' (Fig. 2B, Zhou et al., 2004). The nucleotide sequences of MperOBPs and MperCSPs have been deposited in GenBank under the accession numbers MG356454 to MG356471.

Phylogenetic analyses of Hemiptera OBPs and CSPs
MperOBPs share relatively low amino acid identities (8-59%) with each other (Supporting Information    . 3), indicating that these subgroups evolved from a single gene in the common ancestor of aphids.  suggesting a more recent evolution into OBP1 subgroup and OBP8 subgroup from a common ancestor before aphid speciation (Zhou et al., 2010). The phylogenetic tree of Hemiptera CSPs was built using 110 CSP sequences from five different aphid species (M. persicae, A. gossypii, A. pisum, A. glycines and S. avenae), three plant bugs (A. suturalis, A. lucorum and  Table S9). Like the OBPs shown in Fig. 3, the phylogenetic analysis clusters aphid CSPs into several clear clades of nine homologous subgroups (CSP1, CSP2, CSP4, CSP5, CSP6, CSP7, CSP8, CSP9 and CSP10). Interestingly, some aphid CSP subgroups are clustered with non-aphid CSP genes in conserved clades. For example, the plant bug AsutCSP2 and the aphid CSP7 subgroup are clustered in the same clade with a bootstrap value as high as 95, and the plant hopper LstrCSP1 and the aphid CSP9 subgroup are clustered in the same branch with a bootstrap value as high as 93 (Fig. 4). This suggests that the AsutCSP2 and aphid CSP7s and the LstrCSP1 and aphid CSP9s evolved from the same ancestor and may play similar roles in these sucking insects.

Genomic structure of M. persicae OBP and CSP genes
The genomic structures and the splice site of the intronexon junctions of M. persicae OBP and CSP genes were analysed based on the M. persicae genome annotations and the GT-AG rules (Modrek and Lee, 2002). The results revealed that the sizes of the genomic sequences of MperOBP genes range from 2.85 to 16.81 kb with an average length of 6.92 kb. Five OBP genes (MperOBP4/7/8/9/10) have six introns and seven exons. The remaining four OBP genes (MperOBP2/3/5/6) have 4, 5, 8 and 7 introns and 5, 6, 9 and 8 exons, respectively (Fig. 5, Table 5). The intron lengths of each MperOBP are variable; for example, the intron lengths of MperOBP3, MperOBP5 and MperOBP10 are 16.38 kb, 9.98 kb and 2.42 kb, respectively (Table 2). Cross-species comparison reveals that the numbers of introns and exons of each OBP subgroup are the same among four different aphid species (M. persicae, A. gossypii, A. pisum and A. glycines), but the lengths of the introns and exons are different among the four species (Table 2).
Motifs 1 and 2 that contain four highly conserved cysteine residues (C2 and C3 in motif 1, C5 and C6 in motif 2) of the classic OBP subfamily in insects (Zhou et al., 2004(Zhou et al., , 2010 are present in all the 102 Hemiptera OBPs (Fig. 8A). Motif 3 with highly conserved glycine (G) and aspartic acid (D) is present in the top five common motif patterns (92 of the 102 OBPs). Motif 4, which contains another characteristically conserved cysteine residue (C4), was only mapped to some OBPs. However, motif 6, containing highly conserved tryptophan (W) and conserved cysteine 4 (C6a), and motif 7, containing highly conserved lysine (K), C6b and proline (P) of the plus-C OBP subfamily, were not present in the 102 OBPs (Fig. 8A).

Expression profiles of M. persicae OBP and CSP genes in different tissues
The general and quantitative expression profiles of M. persicae OBP and CSP transcripts in different tissues (antennae, heads, legs and decapitated bodies) were examined using both semi-quantitative reverse transcription PCR (RT-PCR) and quantitative real-time PCR (qRT-PCR). Because equal amounts of cDNA (150 ng) were used in the RT-PCR reactions, the intensity of the PCR bands of MperOBP2/3/6/9/10 and MperCSP2/4/5/9 was higher than the remaining MperOBPs and MperCSPs, which can reflect their relatively high expression levels in M. persicae. Actually, the reads per kilobase per million mapped reads (RPKM) analysis also showed their relative high abundances in the M. persicae transcriptome, with the RPKM values of MperOBP2/3 and MperCSP2/4/5/9 more than 100 (Table 1).
The absolute values of the slope of all lines from template dilution plots (log cDNA dilution vs. ΔC T ) were <0.1 (Supporting Information Figs S2 and S3), and the real PCR amplification efficiencies of target and reference genes were more than 1.90 (calculated using the LinRegPCR program and listed in Supporting Information Table S5). Therefore, the efficiencies of the target and reference genes were similar in our analysis, and the ΔΔC T calculation method can be used for the relative quantification. Both RT-PCR and qRT-PCR results showed that three OBP genes (MperOBP6/7/10) were antennae specific, and their expression levels were approximately 101, 8.7 and 222 times higher in the antennae than in the body (p < 0.05), respectively (Fig. 9). MperOBP2/4/5/8/9 showed the highest expression in the antennae, and their expression levels were, respectively, 3.5, 3.8, 3.6, 6.5 and 63.9 times greater in the antennae than in the body (p < 0.05). MperOBP2/4/5/8/9 were also detectable in other tissues, such as head, leg and body (Fig. 9). MperOBP3 was the only OBP whose expression level was lower in the antennae than in the head and body.
None of the nine MperCSPs were found to be antennae specific, and five of them (MperCSP1/2/4/5/6) expressed at a higher expression level in the legs than the other three tissues (antennae, head and body) (Fig. 10), with the expression levels 20-50 times higher in the legs than in the decapitated bodies (p < 0.05). MperCSP10 expressed 41 and 39 times higher in the antenna and legs, respectively, than in the body (p < 0.05). MperCSP8 and MperCSP9 mainly expressed in the body. MperCSP7 exhibited a 2.2 times higher expression in the antennae than in the body (p < 0.05) (Fig. 10).

Discussion
With the increase in insect genome projects and transcriptome sequencing projects, particularly in recent years, very large numbers of both OBPs and CSPs have been identified in different insect species. These studies have provided an excellent chance to comparatively study insect OBPs and CSPs. For the first time, we had identified nine OBPs and nine CSPs, both from the M. persicae transcriptomic and genomic data, including four OBPs and five CSPs reported from the expressed sequence tags in M. persicae (Xu et al., 2009). The number of M. persicae OBPs and CSPs identified here is less than the number of OBPs and CSPs identified in A. pisum (15 OBPs, 13 CSPs) (Zhou et al., 2010), the same as in A. gossypii (nine OBPs, nine CSPs) (Gu et al., 2013), and similar to those of other sucking insects such as Figure 8. Motif analysis of Hemiptera odorant binding proteins (OBPs) and chemosensory proteins (CSPs). Parameters used for motif discovery were minimum width = 6, maximum width = 10 and maximum number of motif to find = 8. The upper parts in (A) and (B) list the eight motifs discovered in the Hemiptera OBPs and CSPs, respectively. All the motifs were discovered using meme (version 4.11.4;Bailey et al., 2009) (Zhou et al., 2014), S. furcifera (12 OBPs, nine CSPs) (He and He, 2014;Zhou et al., 2015) and the plant bug A. suturalis (16 OBPs, eight CSPs) (Cui et al., 2017). The high conservation in the sequence identities and genomic structures of the OBP and CSP orthologues among aphid species indicates that aphid OBPs and CSPs are conserved in a single copy across all aphids (with occasional losses), indicating that each OBP and CSP class evolved from a single gene in the common ancestor of aphids without subsequent duplication.
The absence of OBP1 and CSP3 homologous genes in M. persicae, as demonstrated in this study by transcriptome sequencing, searching the genome database and molecular cloning using gene-specific primers of A. pisum OBP1 and CSP3 (data not shown here), is consistent with the same results reported in the cotton aphid A. gossypii (Gu et al., 2013) and recently in the A. glycines genome (Wenger et al., in press). However, OBP1 homologous genes have been found in six other aphid species (A. pisum, S. avenae, M. viciae, M. dirhodum, T. salignu and P. salicis). The OBP1 orthologous genes are also not found in the plant hoppers and plant bugs in this study. It is possible that OBP1 and CSP3 genes might have been lost after aphid speciation of M. persicae, A. gossypii and A. glycines. Furthermore, there is a high amino acid identity between OBP1 and OBP8 subgroups in aphids, consistent with previous reports (Gu et al., 2013;He and He, 2014;Yuan et al., 2015;Xue et al., 2016).
The phylogenetic analyses clearly divided most of the aphid OBPs and CSPs into species-specific homology subgroups. However, some aphid OBP and CSP subgroups are clustered with non-aphid OBP and CSP genes in conserved clades. For example, aphid OBP4 and OBP5 subgroups are clustered with OBPs of plant hopper and plant bug with a bootstrap support of 100 and 84, respectively (Fig. 3), and aphid CSP7 and CSP9 subgroups are clustered with plant hopper and plant bug CSPs with a bootstrap support of 73 (Fig. 4), suggesting these genes may have a common function in these sucking insects. The motif analysis contributes to the understanding of the functional domains of insect OBPs and CSPs. The members of each OBP subgroup have a common motif pattern, particularly for members of the aphid OBP family and members of the CSP9 subgroup (Fig. 7). Five motif patterns are shared by different CSP subgroups, further supporting previous reports of greater conservation between insect CSPs than between OBPs (Fig. 7). Interestingly, A. pisum ApisCSP3 shares the same motif pattern arrangement with members of the aphid CSP1 and CSP6 subgroups as well as ApisCSP2, despite the sequences between these three subgroups (CSP1, CSP2 and CSP6) being very different (Fig. 4). This result is not consistent with a previous report on the motif analysis of Hemiptera OBPs and CSPs . This could be because different sets of OBP and CSP protein sequences were used in these two studies. The comparative sequence analyses with other Hemiptera species confirmed the results from the previous studies that CSPs are more conserved than OBPs and some highly conserved cysteine residues (C2, C3, C5 and C6) in insect OBPs (Fig. 8), suggesting their involvement in protein structure and possible functions.
The intron numbers and lengths of M. persicae OBPs are nearly equal to other orthologous genes in A. pisum and A. gossypii (Zhou et al., 2010;Gu et al., 2013), but much greater than those in aphid CSPs; this appears to be a unique feature of aphid OBP genes in general. Four gene clusters (OBP3/7/8, CSP1/4/6, CSP2/9 and CSP5/8) are found in three aphid species (Fig. 6). The conserved genomic structure and same transcriptional orientation demonstrate that these aphid OBP and CSP genes were created by gene duplication, possibly through chromosome recombination, supporting the birth-anddeath evolutionary model (Nei and Rooney, 2005;Zhou et al., 2010). The major alarm pheromone component Eβf emitted from the cornicles of aphids  is recognized by large placoid sensillum neurons, which express OR5 on the sixth antennal segment in A. pisum ; OBP3 and OBP7 are both expressed in the large placoid sensilla in M. persicae (Sun et al., 2013) and in the type II trichoid sensilla in A. pisum in the antennae (Biasio et al., 2015). OBP3 and/or OBP7 proteins in several aphid species (A. pisum, S. avenae, M. viciae, N. ribisnigri and R. padi) have high binding affinities with Eβf (Qiao et al., 2009;Vandermoten et al., 2011;Zhong et al., 2012;Fan et al., 2017), but the cellular location of all M. persicae OBPs in the antennal sensilla and their binding abilities with Eβf are still unknown. Our data reveal that MperOBP7 is antennae specific and expressed at a level that is 8.7 times higher in the antennae than in the body (p < 0.05) (Fig. 9), consistent with previous reports of the high binding affinity of this OBP to Eβf (Qiao et al., 2009;Vandermoten et al., 2011;Zhong et al., 2012;Fan et al., 2017) and its localization in antennal sensilla (Sun et al., 2013;Biasio et al., 2015). However, OBP3, which was also reported as the Eβf binding protein in several aphid species (A. pisum, S. avenae, M. viciae, N. ribisnigri and R. padi), is neither antennae specific nor antennae enriched, but is highly expressed in the heads and body in M. persicae (Fig. 9). This expression pattern is similar to that of A. pisum OBP3 (Biasio et al., 2015) and of S. avenae OBP3 (Xue et al., 2016), suggesting that aphid OBP3 proteins may have other functional roles in addition to alarm pheromone sensation. The fact that the A. pisum OBP3 was found in the type II trichoid sensilla in the antennae and in the terminal region of the body (Biasio et al., 2015) suggests that it may have dual functions of recognizing the Eβf from the environment and transporting the Eβf from the cornicles to the external environment.
Insect CSPs usually exhibit much broader expression profiles both in chemosensory organs and nonchemosensory organs than OBPs do, and play multiple roles in chemoreception, growth and development. For example, the American cockroach (Periplaneta americana) CSP gene P10 has a putative role in leg regeneration (Nomura et al., 1992), and the migratory locust (Locusta migratoria) antennae-expressed CSP gene LmigCSP3 can regulate the rapid switch between attraction and repulsion behaviours in this species (Guo et al., 2011). In the present study, M. persicae CSPs also showed much broader expression profiles in nonsensory organs than M. persicae OBPs do (Fig. 10). None of the nine MperCSPs were found to be antennae-specific. Five of them (MperCSP1/2/4/5/6) showed a higher expression level in the legs than the other three tissues (antennae, heads and body). MperCSP10 was mainly expressed in the antennae and legs. The broad and diverse expression patterns of M. persicae CSPs are consistent with their possible multifunctions in olfactory perception, development and other processes.
The sex pheromones from the Aphididae family are usually released from the tibiae of the hind legs of the mature sexual female aphids (Marsh, 1972;Pickett and Glinwood, 2007), and usually comprise of a mixture of the ubiquitous iridoids (4aS,7S,7aR)-nepetalactone and (1R,4aS,7S,7aR)-nepetalactol (Pickett et al., 1992;Dewhirst et al., 2010); however, in order to convey species integrity, the ratio of (4aS,7S,7aR)-nepetalactone to (1R,4aS,7S,7aR)-nepetalactol is species dependent, and this ratio is 1 : 1.5 for M. persicae (Dawson et al., 1990). The aphid OBPs and CSPs that can specially recognize the two ubiquitous sex pheromones are still unknown. It is reasonable to assume that the M. persicae OBPs and CSPs which are highly expressed in the antennae and legs may participate in the intersexual communication process. Till now, functional studies of the aphid OBPs and CSPs have narrowly focused on several OBPs (eg OBP3/7); the putative functional roles of other aphid OBPs and all the CSPs in the M. persicae and other aphid species still need to be elucidated. Future work towards the functions of these M. persicae OBPs and CSPs will enhance our understanding of the ecological context of aphid-aphid and aphid-plant interactions and facilitate design of novel sustainable strategies for aphid control.

Experimental procedures
Aphids, sample collection and RNA extraction M. persicae aphids were collected from Chinese cabbage (Brassica rapa L. ssp. pekinensis) leaves at Langfang Experimental Station of the Chinese Academy of Agricultural Sciences, Hebei Province, China. Apterous adults were reared continuously as a parthenogenetic colony on tobacco (Nicotiana tabacum L.) seedlings in chambers, at 18-24 °C, 65-75% relative humidity, and a 16 h : 8 h light : dark photoperiod.
About 2000 apterous aphids were dissected with fine scissors to collect their antennae, heads, legs and decapitated bodies into separate tubes and stored in liquid nitrogen. Total RNAs from these samples were extracted using Trizol reagent (Life Technologies, Carlsbad, CA, USA) following the manufacturer's protocol. The quantity and integrity of RNA samples were verified using 1.1% agarose gel electrophoresis and a NanoDrop ND-1000 spectrophotometer (Thermo Scientific, Wilmington, DE, USA).

cDNA library construction and Illumina sequencing
About 50 mg of apterous aphids was collected and kept in a 1.5 ml centrifuge tube in liquid nitrogen for total RNA extraction using the Trizol reagent. The cDNA libraries were constructed using the NEBNext Ultra RNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA) according to the manufacturer's protocol. Briefly, messenger RNAs (mRNAs) were isolated from 10 µg total RNAs from apterous aphids using oligo © 2018 The Authors. Insect Molecular Biology published by John Wiley & Sons Ltd on behalf of Royal Entomological Society, 28, 1-22 (dT) magnetic beads and fragmented into short nucleotides using the fragmentation buffer provided with the kit at 94 °C for 5 min. The cleaved mRNA was transcribed into the first-strand cDNA using a random hexamer primer and M-MuLV reverse transcriptase (RNaseH − ). The second-strand cDNA was subsequently synthesized using DNA polymerase I, deoxyribonucleotide triphosphate and ribonuclase H. After the end repair, dA-tailing and adaptor ligation, the products were amplified by PCR and purified using the QIAquick PCR purification kit (Qiagen, Valencia, CA, USA) to create the final sequencing library. The cDNA library was pair-end sequenced using a PE150 strategy on an Illumina Hiseq2500 platform (Illumina, San Diego, CA, USA).

Bioinformatics analysis
The raw reads were cleaned by removing adaptor sequences and low-quality sequences using Q20 with 99% accuracy using trimmomatic software (version 0.32) (Bolger et al., 2014). The clean reads were de novo assembled into unigenes with the short read assembly program trinity (version 20121005) with default parameters (Grabherr et al., 2011). After assembly, homology searches of all unigenes were performed using the blastx and blastn programs against the GenBank nonredundant protein (nr) and nucleotide sequence (nt) databases at the National Center for Biotechnology Information (NCBI). Matches with an E-value less than 1.0 × 10 −5 were considered to be significant and kept (Anderson and Brass, 1998). Gene names were assigned to each unigene based on the best blastx hit with the highest score value.
The BWA-MEM alignment algorithm (Li, 2013) and htseq program (version 0.6.1) (Anders et al., 2015) were used for aligning the RNA reads and counting the read numbers mapped to each gene, respectively. The abundances of the unigenes in the transcriptome were calculated by the RPKM method (Mortazavi et al., 2008), using the formula where RPKM(A) is the expression abundance of gene A, C is the number of reads that are uniquely mapped to gene A, N is the total number of reads that are uniquely mapped to all genes and L is the number of bases on gene A. The RPKM method is able to eliminate the influence of different gene lengths and sequencing discrepancies on the calculation of transcript abundance.

Annotation of putative OBP and CSP genes in M. persicae transcriptome and genome
The genomic sequences of M. persicae clone G006 assembly v2 were downloaded from AphidBase at the BioInformatics Platform for Agroecosystem Arthropods (https://bipaa.genouest. org/is/aphidbase/myzus_persicae/). Two methods were used to identify unigenes encoding putative OBPs and CSPs in the M. persicae transcriptome and genome. First, we ran the OBP motifsearch program on C1-X 15-39 -C2-X 3 -C3-X 21-44 -C4-X 7-12 -C5-X 8 -C6 (Zhou et al., 2008) and the CSP motifsearch program on C 1 -X 6-8 -C 2 -X 16-21 -C 3 -X 2 -C 4  to identify the respective putative OBP and CSP genes from the M. persicae transcriptomic and genomic databases obtained earlier. Second, a tblastn search with known OBP and CSP sequences from other aphid species as the 'query' was performed to identify OBPs and CSPs from both the transcriptomic and genomic datasets. All candidate OBP and CSP genes were manually checked using the blastx program available at the NCBI.

Verification of OBP and CSP sequences by cloning and sequencing
ORFs of each identified OBP and CSP sequence were found by orf finder graphical analysis at NCBI (https://www.ncbi.nlm. nih.gov/gorf/gorf.html). Then, gene-specific primers (Supporting Information Table S1) were designed using primer 5.0 software to clone the ORF of each M. persicae OBP and CSP gene. The template cDNA was synthesized using the Fast Quant RT Kit (TianGen, Beijing, China). PCR reactions were carried out with 200 ng antennal cDNAs with 0.5 units of Ex Taq DNA polymerase (TaKaRa, Dalian, China). The PCR amplification conditions were set as 95 °C for 2 min, followed by 36 cycles of 94 °C for 45 s, 56 °C for 1 min, 72 °C for 1 min and a final extension at 72 °C for 10 min. The PCR products were gel-purified and subcloned into the pGEM-T Easy vector (Promega, Madison, WI, USA). The inserts were confirmed by sequencing on the ABI3730XL automated sequencer (Applied Biosystems, Carlsbad, CA, USA) with standard M13 primers.

Genomic structure analysis of M. persicae OBPs and CSPs
The genomic DNA sequences of each M. persicae OBP and CSP gene were extracted using the blastn program, with the mRNA sequences of M. persicae OBPs and CSPs as 'query'. The mRNA-to-genomic DNA alignment of each OBP and CSP gene was analysed using the splign online program (https:// www.ncbi.nlm.nih.gov/sutils/splign/splign.cgi).

Motif analysis
A total of 45 OBPs and 41 CSPs from 5 different aphid species (M. persicae, A. gossypii, A. pisum, A. glycines and S. avenae) (Supporting Information Table S2) were used for motif discovery and pattern analysis. A total of 199 OBPs and 103 CSPs from 18 different Hemiptera species (Supporting Information  Table S3) were used for comparing the motif patterns between Hemiptera OBPs and CSPs. All OBP and CSP sequences used in this study have intact ORFs and similar lengths. The motifbased sequence analysis tool meme (Bailey et al., 2009) (version 4.9.1; https://meme-suite.org/), which has been widely used for the discovery of DNA and protein motifs, was used to discover and analyse the motifs in the OBP and CSP sequences. The parameters used for motif discovery were as follows: minimum width = 6, maximum width = 10 and the maximum number of motifs to find = 8. This program analyses any multiple input sequences and predicts motifs and their arrangements along an individual sequence. The outputs are two graphic presentations: one contains the similarity plots for each identified motif

Sequence and phylogenetic analysis
The putative signal peptides of derived M. persicae OBP and CSP proteins were predicted using the SignalIP 4.1 Server (https://www.cbs.dtu.dk/services/SignalP/) (Petersen et al., 2011). The percentage identity matrix of each pair of OBPs was calculated using Vector NTI 11.5 (Invitrogen Corporation, Carlsbad, USA). Owing to the wide divergence of the 237 OBPs (12.63% amino acid identity) and 110 CSPs (16.48% amino acid identity) used in this study, sequence alignments were conducted with the prank alignment program (Löytynoja and Goldman, 2010), phylogenetic trees were established based on maximum likelihood by raxml version 8 (Stamatakis, 2014) with LG substitution matrix selected by the prottest 3 program (Darriba et al., 2011). Bootstrapping was performed to compute the confidence of the branches using 1000 replicates.

Expression profiles of M. persicae OBP and CSP genes in different tissues
The cDNAs from aphid antennae, heads, legs and the body parts were synthesized using the PrimeScript RT Reagent with gDNA Eraser (TaKaRa, Dalian, China). An equal amount of cDNA (150 ng) was used as the RT-PCR and qRT-PCR templates. Gene-specific primer pairs for RT-PCR analyses were designed with Primer 3 (https://frodo.wi.mit.edu/) or Primer Premier 5 (see Supporting Information Table S4). β-Actin (GenBank Acc. XM_022309797) of M. persicae was used as the control gene to test the integrity of the cDNA. The PCR cycling profile was: 95 °C for 2 min, followed by 35 cycles of 95 °C for 30 s, 56 °C for 45 s, 72 °C for 1 min and a final extension for 10 min at 72 °C. The PCR products were separated on 1.2% agarose gels and stained with ethidium bromide. Each reaction was done at least six times with three biological replicates. In addition, the PCR products were randomly selected and verified by DNA sequencing. qRT-PCR was carried out on an ABI 7500 Real-Time PCR system (Applied Biosystems, Carlsbad, CA, USA). The β-actin and GAPDH (GenBank Acc. XM_022315441) genes are stably expressed in different tissues and developmental stages (Supporting Information Fig. S1), so were used as reference genes for normalizing the target gene expression and correcting the sample-to-sample variations. The primers used for qRT-PCR were designed with beacon designer 7.90 (Supporting Information Table S5). Each qRT-PCR reaction was conducted in a 25-µl reaction mixture containing 12.5 µl of SuperReal PreMix Plus (TianGen, Beijing, China), 0.75 µl of each primer (10 µm), 1 µl of sample cDNA (150 ng/µl), 0.5 µl of Rox Reference Dye and 9.5 µl of sterilized water. The qRT-PCR cycling parameters were 95 °C for 15 min, followed by 40 cycles of 95 °C for 10 s and 60 °C for 32 s. Then, the PCR products were heated to 95 °C for 15 s, cooled to 60 °C for 1 min, heated again to 95 °C for 30 s and cooled to 60 °C for 15 s to measure the melt curves. Negative controls without any template were included in each experiment. To ensure reproducibility, each qRT-PCR reaction for each sample was performed in three technical replicates and three biological replicates. The comparative 2 −ΔΔC T method (Livak and Schmittgen, 2001) was used to calculate the relative expressions between tissues. The comparative analyses of each target gene among various tissues were determined using a one-way nested analysis of variance, followed by Tukey's honest significance difference test using the spss statistics 18.0 software (SPSS Inc., Chicago, IL, USA). When applicable, the values were presented as the mean plus/minus the standard error.
An assumption in the ΔΔC T calculation for the comparative 2 −ΔΔC T method for quantification is that the amplification efficiencies of the target and reference genes are approximately equal (Livak and Schmittgen, 2001). To confirm this, a pilot experiment was conducted to examine the variation of ΔC T (C T, Target − C T, β-actin/GAPDH ) with template dilution. Briefly, five serial fivefold dilutions of cDNA from each sample were amplified. For each dilution, amplifications were performed in triplicate using primers for the target gene and β-actin or GAPDH. Mean C T was calculated for both target gene and β-actin or GAPDH, ΔC T was calculated, and log cDNA dilution vs. ΔC T was plotted. The real PCR amplification efficiencies of target and reference genes were calculated using the linregpcr program (version 11.0) (Ramakers et al., 2003).

Supporting Information
Additional Supporting Information may be found in the online version of this article at the publisher's web-site: Figure S1. The stable expression of M. persicae β-Actin and GAPDH in different development stages and tissues measured by RT-PCR.     Fig. 7. Table S3. The protein names and sequences of the 199 OBPs and 103 CSPs from Hemiptera insects used in Fig. 8. Table S4. Primers used in RT-PCR for determination expression levels of M. persicae OBP and CSP genes. Table S5. Primers used in real-time PCR for determination expression level of M. persicae OBP and CSP genes. The real PCR amplification efficiencies of target and reference genes were calculated using LinRegPCR program (version 11.0) (Ramakers et al., 2003).    Fig. 3. Table S9. The protein names and sequences of the Hemiptera 110 CSPs used in Fig. 4.