Dr Linda M. Field, Department of Biological Chemistry, Rothamsted Research, Harpenden, Hertfordshire AL5 2JQ, UK. Tel.: +44 158276 3133; fax: +44 158276 2595; e-mail: email@example.com
Odorant-binding proteins (OBPs) and chemosensory proteins (CSPs) are two families of small water-soluble proteins, abundant in the aqueous fluid surrounding olfactory receptor neurons in insect antennae. OBPs are involved in the first step of olfactory signal transduction, carrying airborne semiochemicals to the odorant receptors and can be classified into three groups: Classic OBPs, Plus-C OBPs and Atypical OBPs. Here, we identified and annotated genes encoding putative OBPs and CSPs in the pea aphid Acyrthosiphon pisum using bioinformatics. This identified genes encoding 13 Classic and two Plus-C OBPs and 13 CSPs. Homologous OBP sequences were also identified in nine other aphid species, allowing us to compare OBPs across several aphid and non-aphid species. We show that, although OBP sequences are divergent within a species and between different orders, there is a high similarity between orthologs within a range of aphid species. Furthermore, the phylogenetic relationships between OBP orthologs reflect the divergence of aphid evolution lineages. Our results support the ‘birth-and-death’ model as the major mechanism explaining aphid OBP sequence evolution, with the main force acting on the evolution being purifying selection.
The publication of the whole genome sequence of the pea aphid Acyrthosiphon pisum now provides a platform for the analysis of OBPs and CSPs in aphids in relation to their chemical ecology. Aphids communicate with each other and migrate between host plants using species-specific chemical signals (semiochemicals) such as pheromones and plant volatiles acting as attractants and repellents. The aphid sex pheromones are often similar amongst different species with discrimination relying on blends of two or three compounds in species-specific ratios (Pickett et al., 1992; Hardie et al., 1997). In addition, behavioural studies show that plant volatile components enhance successful mate location by male aphids (Pickett et al., 1992; Guldemond et al., 1993; Hardie et al., 1994; Lösel et al., 1996; Campbell et al., 2003). Thus, there is considerable interest in understanding how aphids respond to semiochemicals. This is further motivated by the possibility that insight into the response of aphids to host odours could help to develop novel control strategies that interfere with the interactions between pest aphids and crop plants.
Here, we report the annotation of genes encoding putative OBPs and CSPs in the genome of A. pisum and the identification of orthologous genes in nine other aphid species from two families and four tribes. We compared aphid OBP sequences with each other and with other insect OBPs. We also address the molecular evolution of the gene family as a whole and the influence of natural selection on the evolution of insect OBP genes.
Results and discussion
OBP and CSP genes in Acyrthosiphon pisum
Analyses of the A. pisum genome sequence (27 798 scaffolds) and expressed sequence tag (EST; 169 599 sequences) databases identified 15 sequences encoding putative OBPs (four of which are truncated, probably because of incomplete DNA sequencing or assembly) and 13 encoding CSPs (three of which are also truncated) (Table 1). Some of these are very similar (ApisOBP3, ApisOBP11 and ApisOBP12) with an identity of 45.2% at the amino acid level. The number of OBPs in A. pisum is small compared to that reported in Drosophila melanogaster (Hekmat-Scafe et al., 2002; Vieira et al., 2007), Anopheles gambiae and Aedes aegypti (Zhou et al., 2008) but is comparable to that in Apis mellifera (Forêt & Maleszka, 2006) and Bombyx mori (Zhou et al., 2009). Multiple factors are probably responsible for the differences in OBP numbers. It has been observed that parasitic and symbiotic lifestyles lead to genome reduction, either through redundancy of function or as a result of a simpler and more homogeneous host environment (Wernegreen, 2002; Moya et al., 2008). A. pisum, with what can be considered a parasitic lifestyle, may have relaxed selective constraint on genes related, for example, to avoidance of hazardous substances, digestive processes and food/mate location.
Table 1. Odorant-binding proteins and chemosensory proteins annotated in the Acyrthosiphon pisum genome
AphidBase scaffold ID (start..stop nt, orientation).
CN, both C- and N-terminus missing; CT, C-terminus missing; NA, not applied; ND, not detected; NT, N-terminus missing; SP, signal peptide.
EQ115283 (2092..7402, −)
EQ125284 (8441..12325, +)
EQ113858 (4821..13329, +)
EQ127504 (11369..14618, −)
EQ119281 (57020..67390, +)
EQ122941 (3782..9126, +)
EQ121833 (58105..66623, +)
EQ124790 (36319..42743, +)
EQ112785 (118532..124337, −)
EQ111016 (18155..20991, −)
EQ113328 (24063..39281, +)
EQ124790 (31961..35227, +)
EQ121843 (21060..21949, −)
EQ117403 (522..1405, +)
EQ118788 (55206..56130, −)
EQ110797 (7930..17292, −)
EQ125317 (40244..41374, +)
EQ126525 (52894..53537, +)
EQ117790 (6019..7042, +)
EQ121783 (61045..62836, +)
EQ110797 (29846..31524, −)
EQ122410 (2584..10467, −)
EQ121783 (40804..46054, +)
EQ125317 (41584..44149, −)
EQ116326 (76913..80468, −)
EQ110797 (20097..25202, −)
EQ116510 (25..818, +)
We have identified six putative OBP genes from 15 611 ESTs of the blood-sucking bug Rhodnius prolixus, a hemimetabolous insect, and found orthologs of the A. pisum OBPs. For example, ApisOBP2 has 23.2% and 26.3% identity to RproOBP3 and RproOBP6, respectively. We also found orthologs in the body louse Pediculus humanus, a human parasitic insect (Vieira et al., unpubl. data). However, only the hemimetabolous insects A. pisum and R. prolixus have paralogs with high amino acid identity. There is 36.3% identity between RproOBP3 and RproOBP6, and 80.9% between RproOBP4 and RproOBP5, which cluster with RproOBP2 with an overall identity of 38.9%. Similarly, there is 36.8% identity between ApisOBP1 and ApisOBP8, and 66.4% between ApisOBP11 and ApisOBP12, which have 60.3% and 61.0% identity to ApisOBP3, respectively. These OBP expansions by gene duplication are much smaller than those observed in the dipteran insects (Xu et al., 2003; Zhou et al., 2008). In contrast to the varying number of OBPs amongst different insect species, there is a more consistent expansion of chemoreceptor genes (odorant and/or gustatory receptors) in B. mori (Wanner & Robertson, 2008), D. melanogaster (Clyne et al., 2000; Robertson et al., 2003), Ae. aegypti (Bohbot et al., 2007; Kent et al., 2008), An. gambiae (Fox et al., 2001), Apis mellifera (Robertson & Wanner, 2006), Tribolium castaneum (Engsontia et al., 2008) and A. pisum (Smadja et al., 2009).
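The pairwise identities quoted above can be computed from aligned sequences along the following lines. This is a minimal sketch, not the procedure used in the study: the text does not state how gapped columns were treated, so the choice here (columns with a gap in either sequence are skipped) is an assumption, and the function name `percent_identity` is hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity between two rows of the same alignment.

    Assumption: columns in which either sequence has a gap are ignored,
    so the denominator is the number of ungapped aligned columns.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a.upper() == b.upper():
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Toy example (made-up peptides, not real OBP sequences).
toy_identity = percent_identity("MKT-LV", "MRTALV")
```

Other denominators (alignment length, or the length of the shorter sequence) give different percentages for the same alignment, so values are only comparable when one convention is used throughout.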
Genomic structure of Acyrthosiphon pisum OBP and CSP genes
The A. pisum OBP genes are generally sparsely distributed across 14 scaffolds (Fig. 1). In contrast, the CSP genes are clustered: ApisCSP1, ApisCSP6 and ApisCSP11 lie on EQ110797 within a 23594 base pair (bp) region; ApisCSP2 and ApisCSP9 on EQ125317 within a 3905 bp region (210 bp apart); ApisCSP5 and ApisCSP8 on EQ121783 within a 22032 bp region; and ApisCSP3 and ApisCSP12 on EQ126525 within an 8363 bp region.
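The region sizes quoted for the CSP clusters follow directly from the scaffold coordinates listed in Table 1. The sketch below reproduces that arithmetic for the three scaffolds on which more than one coordinate pair is listed; the function names are hypothetical, and the span is taken as last stop minus first start (no +1), which is the convention that matches the figures in the text.

```python
from typing import Dict, List, Tuple

Interval = Tuple[int, int]  # (start, stop) in nucleotides on one scaffold

# Coordinate pairs as listed per scaffold in Table 1 (orientation omitted).
CSP_SCAFFOLDS: Dict[str, List[Interval]] = {
    "EQ110797": [(7930, 17292), (29846, 31524), (20097, 25202)],
    "EQ125317": [(40244, 41374), (41584, 44149)],
    "EQ121783": [(61045, 62836), (40804, 46054)],
}


def cluster_span(genes: List[Interval]) -> int:
    """Distance from the first start to the last stop of a gene cluster."""
    return max(stop for _, stop in genes) - min(start for start, _ in genes)


def intergenic_gaps(genes: List[Interval]) -> List[int]:
    """Gaps between consecutive genes after sorting by start coordinate."""
    ordered = sorted(genes)
    return [nxt[0] - cur[1] for cur, nxt in zip(ordered, ordered[1:])]


spans = {scaffold: cluster_span(genes) for scaffold, genes in CSP_SCAFFOLDS.items()}
# spans == {"EQ110797": 23594, "EQ125317": 3905, "EQ121783": 22032}
```

For EQ125317, `intergenic_gaps` returns `[210]`, the separation quoted for ApisCSP2 and ApisCSP9.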
The A. pisum OBP genes have more and longer introns than their counterparts in D. melanogaster, with an average of 6.0 introns per gene (n= 11 genes) for A. pisum and 1.5 per gene (n= 51 genes) for D. melanogaster. The average intron length is 6111.8 bp (n= 77 introns; range 58 bp to 118532 bp) for A. pisum and 93.2 bp (n= 90 introns; up to 638 bp, with some genes lacking introns) for D. melanogaster (Fig. 1). These data are consistent with results showing that D. melanogaster has experienced a drastic reduction in non-coding DNA, including introns (Petrov et al., 1996; Zdobnov et al., 2002). Two of the A. pisum OBP genes, ApisOBP5 and ApisOBP6, encode proteins predicted to be Plus-C OBPs. This is the first report of this type of OBP outside the dipteran insects, suggesting that Plus-C OBPs evolved before the divergence of the aphids from the dipterans.
Orthologous genes in other aphid species
Orthologs of the A. pisum OBP genes were identified in nine other aphid species using PCR with primers designed to the A. pisum sequences (Table S1). The gene encoding OBP2 is present in all species, except Tuberolachnus salignus, and further analysis of aphid OBP2 shows that the sequences cluster into three groups, which correspond with the three aphid ‘tribes’, the Macrosiphini, the Pterocommatini and the Aphidini (Fig. 2 and Table S2). The average number of orthologous genes found was 6, 3, 5 and 1 for Macrosiphini, Aphidini, Pterocommatini and Lachnini, respectively. Only one orthologous OBP was found in Tu. salignus, which belongs to a different family from the other aphids. Thus, although we cannot exclude the presence of undetected OBP genes in these aphid species, the distribution of the OBP orthologs does reflect the life style and the host relationship of the aphid species within the tribes. The morphology of Pterocommatini and their simple life cycles on woody hosts are regarded as primitive and this tribe is placed as sister to Aphidini plus Macrosiphini (Blackman & Eastop, 2000), which is supported by our analysis of OBP genes (Fig. 2 and Table S2).
Phylogenetic analysis of aphid OBP genes
The phylogenetic relationships of the predicted OBPs in the nine aphids, and other insect species (D. melanogaster, An. gambiae, B. mori, T. castaneum, and Apis mellifera) are shown in Fig. 3. This reveals a divergent repertoire with only a few clear orthologous groups that include non-aphid species, possibly reflecting the OBP gene family's evolutionary process, dominated by a number of gene losses and lineage-specific expansions. The two orthologous groups with a clear member across all sequenced insects (apart from Hymenoptera) are those that include ApisOBP4 and ApisOBP13 (Figs 3 and 4). Interestingly, some members of the ApisOBP4 group, for example DmelOBP73a, have not been assigned as OBPs previously because of their divergence from other OBP members (Hekmat-Scafe et al., 2002; Vieira et al., 2007). The high conservation of OBPs in this group across a large number of divergent species indicates a possible crucial function for these proteins.
The phylogenetic relationship of the OBPs from the aphid species is shown in Fig. 4 and this is largely consistent with the accepted species tree (von Dohlen et al., 2006). The tree shows higher divergence times for paralogs compared with orthologs, long tree branches and a scattered phylogenetic distribution, indicating that the OBP gene family is quite old (with the most recent common ancestor (MRCA) tracing back to the origin of insects, 350–400 million years ago). In addition, the analysis including some R. prolixus OBPs also supports a highly dynamic evolutionary process for this family. In fact, in spite of the close relationship with R. prolixus, we were able to detect several lineage-specific expansions and few common orthologous groups.
Overall, the analysis of the aphid OBPs further supports the ‘birth-and-death’ evolutionary model (Nei & Rooney, 2005) as the major mechanism for the evolution of this gene family. That is, in aphids, OBPs would originate by tandem gene duplication and gradually diverge from each other in sequence (and presumably also in function) while others could eventually be lost (transiently by a pseudogenization event). This would lead to the identification of more orthologous groups at short time scales (among aphids) with better phylogeny coordination and to inferring several gene duplications and putative non-functional members (pseudogenes).
Impact of natural selection
We determined the impact of natural selection on the evolution of the OBP coding regions from the nine aphid species using only those groups with at least five sequences: OBP2, OBP3, OBP4, OBP5, OBP8 and OBP10 groups (the OBP1 group was not analysed due to the low level of sequence variability). To avoid the problem of the saturation of substitutions we analysed each orthologous group separately.
Estimates of the ω values ranged from 0.11 to 0.30 (Table S3a), which is similar to the value obtained in D. melanogaster (average ω= 0.153; Vieira et al., 2007) and points to purifying selection as a major selective force. The comparative analyses of the M0 and FR models reveal that, in general, the M0 model fits the data better than does the FR model, with the only exception being the OBP2 orthologous group (LRT; P = 0.0286). There is a slight indication of positive selection but the branch involved is very short (Table S3b). Nevertheless, the statistical power seems to be low since the only significant group (the OBP2 group) is the one with the most sequences (n= 9) (Fig. 4) and the remaining orthologous groups have a relatively low number of sequences.
For the analysis of the putative heterogeneity in the distribution of the ω rates along the coding region we contrasted the M0 and the M3 (k = 2) models. We found that in all cases the M3 model was the best fit to the data (LRT; P < 0.001). We then tested whether the heterogeneity results from some form of positive selection. After contrasting the M7 and M8 models, the null hypothesis (M7 model) was again only rejected in the large OBP2 orthologous group (LRT; P = 0.0221). In addition, the LRT between the more conservative M8a and M8 models is also significant (LRT; P= 0.0407), further supporting the positive selection analysis. More specifically, the analysis allows us to identify a single amino acid, Ser88 in ApisOBP2 with the positive selection hallmark (ω= 1.851; PP(ω>1)= 0.977). It is worth emphasizing that, despite the observed high functional constraint levels, the large time-scale analysed may obscure short and/or recent/ongoing episodes of molecular adaptation. In similar analyses of the Drosophila OBP gene family (Vieira et al., 2007), the fingerprint of positive selection was only identified using other approaches (Sánchez-Gracia & Rozas, 2008).
The genome sequence of A. pisum has allowed us to identify OBP and CSP genes in a range of aphid species. This had not been possible previously using homology cloning (Jacobs et al., 2005). The small number of OBPs in A. pisum relative to dipteran insects may be due to the highly specialized ecology of the aphid (host plant specialization) and its parasitic lifestyle. The high similarity of OBP genes in different aphid species indicates that OBP genes became divergent before aphid speciation through the mechanisms proposed by the birth-and-death model. Furthermore, lifestyle and environmental factors may be the main forces driving the expansion of insect OBPs.
Identification of sequences encoding putative OBPs and CSPs in the Acyrthosiphon pisum genome
The whole genome sequences and predicted gene model sets of A. pisum were downloaded from Aphidbase (http://genouest.org/AphidBase/) and the EST sequences retrieved from the NCBI EST database (http://www.ncbi.nlm.nih.gov/dbEST/). The genome sequences were searched using (1) an OBP ‘MotifSearch’ algorithm to identify the conserved cysteine motif C1-X8-41-C2-X3-C3-X21-47-C4-X7-15-C5-X8-C6 in the 6-frame translated sequences (Zhou et al., 2004, 2006, 2008); (2) rps-BLAST with the PBP/GOBP (pfam01395) and CSP (pfam03392) conserved domains; and (3) tBLASTn and PSI-BLAST using, as ‘query’, known OBPs from other insects. For the predicted gene set, the same basic methodology was used but with (1) BLASTp and (2) HMMER.
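The cysteine-spacing motif in step (1) can be expressed as a regular expression and scanned over the six reading frames of a nucleotide sequence. The sketch below is only an approximation of that idea: the internals of the ‘MotifSearch’ algorithm are not described here, so the regular expression, the handling of stop codons (a match may not span one) and all function names are assumptions made for illustration.

```python
import re
from itertools import product

# Standard genetic code in NCBI order (bases T, C, A, G; third base fastest).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# C1-X8-41-C2-X3-C3-X21-47-C4-X7-15-C5-X8-C6, with X = any residue except stop.
OBP_MOTIF = re.compile(
    r"C[^*]{8,41}C[^*]{3}C[^*]{21,47}C[^*]{7,15}C[^*]{8}C"
)


def translate(dna: str) -> str:
    """Translate in frame 0; codons with ambiguous bases become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_hits(dna: str):
    """Return (strand, frame offset, protein position, matched peptide) tuples."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    hits = []
    for strand, seq in (("+", dna), ("-", reverse)):
        for offset in range(3):
            protein = translate(seq[offset:])
            for match in OBP_MOTIF.finditer(protein):
                hits.append((strand, offset, match.start(), match.group()))
    return hits
```

Because `finditer` reports non-overlapping matches only, a real search would normally follow up each hit (for example with the BLAST and domain searches in steps 2 and 3) rather than treat the pattern match as an annotation in itself.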
Aphids were reared as parthenogenetic clones at 22 °C in a 16 h light: 8 h dark regime. Wingless morphs of mixed ages were collected, frozen in liquid nitrogen and stored at −80 °C prior to use.
Cloning and sequencing of OBP genes
Whole insects were ground in liquid nitrogen and total RNA was extracted using the RNAqueous kit (Ambion, Huntingdon, UK) and treated with DNaseI (Sigma, St Louis, MO, USA). RT-PCRs were done with each primer pair using Hotstart Taq DNA polymerase (Qiagen, Valencia, CA, USA). The PCR primers were designed from the pea aphid OBP sequences with Primer3 (http://www-genome.wi.mit.edu/cgi-bin/primer) (Table S1). The PCR products were run on 1% agarose gels and stained with ethidium bromide to check that the correct products were being amplified. They were then purified using a Qiagen kit and sequenced in both directions with the ABI BigDye™ Terminator Cycle Sequencing Ready Reaction Kit (PE Applied Biosystems, Foster City, CA, USA).
Multiple sequence alignments
Two types of multiple sequence alignment (MSA) were generated, one with amino acid and one with nucleotide coding sequences (CDS). Protein sequences were aligned using MAFFT (Katoh et al., 2005), with the following settings: E-INS-i with BLOSUM30 matrix, maximum 10 000 iterations, gap opening penalty ‘1.53’ (default) and offset (equivalent to gap extension penalty) ‘0’. The OBP signal peptide was removed (using PrediSi software; Hiller et al., 2004) prior to the alignment. The MSA for the CDS orthologous regions was done by first aligning the amino acid sequences and then using this alignment to guide the nucleotide CDS alignment.
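Using a protein alignment to guide the CDS alignment amounts to threading each sequence's codons onto its aligned protein row, with every protein gap becoming a three-nucleotide gap. The sketch below illustrates that step under stated assumptions: the software used for this step is not named in the text, the function name `thread_codons` is hypothetical, and the CDS is assumed to be in frame with exactly one codon per aligned residue (no stop codon, signal peptide already trimmed from both).

```python
def thread_codons(aligned_protein: str, cds: str, gap: str = "-") -> str:
    """Build a codon-aligned nucleotide row from an aligned protein row.

    Assumes `cds` encodes exactly the residues in `aligned_protein`
    (gaps excluded), in the same order and in frame.
    """
    n_residues = sum(1 for aa in aligned_protein if aa != gap)
    if len(cds) % 3 != 0 or len(cds) // 3 != n_residues:
        raise ValueError("CDS length does not match the number of residues")
    codons = iter(cds[i:i + 3] for i in range(0, len(cds), 3))
    return "".join(
        gap * 3 if aa == gap else next(codons) for aa in aligned_protein
    )


# Toy example: one gapped column in the protein row becomes '---'.
toy_row = thread_codons("MK-L", "ATGAAACTG")  # "ATGAAA---CTG"
```

Keeping gaps in multiples of three preserves the reading frame, which codon-based analyses such as the ω estimates described in this section require.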
Phylogenetic relationships between homologous OBPs (both orthologs and paralogs) were obtained using the software MrBayes v3.1.2 (Ronquist & Huelsenbeck, 2003), under the WAG evolutionary model of amino-acid evolution (Whelan & Goldman, 2001). The analysis used the default parameters except: ‘stoprule = yes’, ‘stopval = 0.005’, ‘samplefreq = 1000’ and ‘burnin = 20%’.
The impact of natural selection on the CDS of the OBP genes was inferred by analysing the non-synonymous to synonymous divergence ratio (ω=dN/dS) using the program ‘codeml’ of the software package PAML v4.1 (Yang, 2007), which estimates the ω parameter by maximum likelihood under several evolutionary scenarios, together with the phylogenetic relationships of von Dohlen et al. (2006). To assess heterogeneity across branches, the M0 model (a single ω ratio for all lineages and sites) and the free ratios model (FR; allows different ω ratios across branches) were compared. To analyse ω heterogeneity across sites, M0 was compared with M3 (k = 2) (one ω ratio for all lineages; two ω categories of sites). To determine the presence of positive selection, the M7 (one ω ratio for all lineages; 10 ω categories of sites following a beta distribution) and M8 (as M7, plus one extra category of sites with ω > 1) models were compared and, in order to be conservative, the M8a model (as M7, plus one extra category of sites with ω= 1) was compared with M8 (Swanson et al., 2003; Wong et al., 2004). The comparison between models was assessed using likelihood-ratio tests (LRTs) for hierarchical models (Anisimova et al., 2001); in the M7–M8 and M8a–M8 comparisons, a significantly higher likelihood of the alternative model than of the null model indicates positive selection in the dataset examined. The posterior probability (PP) that a given site evolves under positive selection was estimated using the Bayes empirical Bayes (BEB) method implemented in PAML.
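The likelihood-ratio tests compare twice the difference in log-likelihood between nested models with a chi-square distribution. The sketch below shows that calculation with the Python standard library only. The log-likelihood values are hypothetical placeholders, not results from this study, and the degrees of freedom must be supplied by the user as the difference in free parameters between the two models (2 is the value conventionally used for M7 versus M8; the M8a versus M8 comparison is often assessed against a mixture distribution instead, which is not implemented here).

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability of a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    half = x / 2.0
    # Closed forms for df = 1 and df = 2, i.e. Q(a, half) with a = df / 2.
    if df % 2 == 1:
        q, a = math.erfc(math.sqrt(half)), 0.5
    else:
        q, a = math.exp(-half), 1.0
    # Recurrence Q(a + 1, h) = Q(a, h) + h**a * exp(-h) / Gamma(a + 1).
    while 2 * a < df:
        q += half ** a * math.exp(-half) / math.gamma(a + 1.0)
        a += 1.0
    return q


def likelihood_ratio_test(lnl_null: float, lnl_alt: float, df: int):
    """Return (2 * delta lnL, P value) for nested null and alternative models."""
    statistic = max(0.0, 2.0 * (lnl_alt - lnl_null))
    return statistic, chi2_sf(statistic, df)


# Hypothetical log-likelihoods for a null (e.g. M7) and alternative (e.g. M8) fit.
statistic, p_value = likelihood_ratio_test(-1234.56, -1230.75, df=2)
```

With these placeholder values the statistic is 7.62 and the P value is about 0.022, so the null model would be rejected at the 5% level.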
Rothamsted Research receives grant-aided support from the BBSRC of the UK. We thank Janet Martin and Lesley Smart at Rothamsted Research who provided us with A. pisum, Myzus persicae, Metopolophium dirhodum, Megoura viciae, Nasonovia ribis-nigri, Sitobium avenae, Rhopalosiphon padi, Aphis fabae and Gia Aradottir for supplying Pterocomma salicis and Tu. salignus. FGV was supported by the predoctoral fellowship SFRH/BD/22360/2005 from the ‘Fundação para a Ciência e a Tecnología’ (Portugal). This work was funded by grant BFU2007-62927 from the ‘Dirección General de Investigación Científica y Técnica’ (Spain) to JR. We thank the International Aphid Genomics Consortium and the Baylor College of Medicine Human Genome Sequencing Centre for making the A. pisum genome sequences publicly available prior to publication.