• chemosensory;
  • olfaction;
  • mosquito;
  • semiochemicals


  1. Top of page
  2. Abstract
  3. Introduction
  4. Results and discussion
  5. Experimental procedures
  6. Acknowledgements
  7. References
  8. Supporting Information

The yellow fever mosquito Aedes aegypti is an important human health pest which vectors yellow fever and dengue viruses. Olfaction plays a crucial role in its attraction to hosts and, although the molecular basis of this is not well understood, it is likely that odorant-binding proteins (OBPs) are involved in the first step of molecular recognition. Using the OBPs of Drosophila melanogaster and Anopheles gambiae, we have defined sequence motifs based on the conserved cysteines of OBPs and developed an algorithm which has allowed us to identify 66 genes encoding putative OBPs from the genome sequence and expressed sequence tags (ESTs) of Ae. aegypti. We have also identified 11 new OBP genes for An. gambiae. We have examined all of the corresponding peptide sequences for the properties of OBPs. The predicted molecular weights fall within the expected range but the predicted isoelectric points are spread over a wider range than found previously. Comparative analyses of the 66 OBP sequences of Ae. aegypti with other dipteran species reveal some mosquito-specific genes as well as conserved homologues. The genomic organisation of Ae. aegypti OBPs suggests that a rapid expansion of OBPs has occurred, probably by gene duplication. The analyses of OBP-containing regions for microsynteny indicate a very high synteny between Ae. aegypti and An. gambiae.



It is well established that insects use a wide range of chemical signals or semiochemicals such as pheromones, plant volatiles or animal odours to detect each other and to locate suitable plant or animal hosts (Zwiebel & Takken, 2004). These semiochemicals are small hydrophobic molecules which enter the antennae and other sensory organs via pores and pass across the hydrophilic sensillum lymph surrounding the olfactory neuronal dendrites. There is evidence that this passage is facilitated by odorant-binding proteins (OBPs), including the pheromone-binding proteins (PBPs) and the so-called general odorant-binding proteins (GOBPs), which solubilize and transport the hydrophobic molecules to the insect olfactory receptors (ORs) (Xu et al., 2005; Syed et al., 2006). In the Lepidoptera it has been suggested that the two families of OBP genes originated from a common ancestral gene by gene duplication (Vogt et al., 2002).

OBPs are small (15–20 kDa, ca. 120–150 amino acids), water soluble, globular proteins with a signal peptide. They were first reported in the Lepidoptera (Vogt & Riddiford, 1981), but now many studies have described such proteins and their associated genes in a wide range of insect species. OBPs are highly concentrated in the lymph of chemosensilla (up to 10 mM) (Vogt et al., 1985; Klein, 1987) and many have been shown to bind pheromones and other odorants (for review see Pelosi et al., 2006), supporting a role in insect molecular recognition. This is further supported by the finding that most OBPs from Lepidoptera as well as some from Drosophila melanogaster are expressed specifically in the antennae (Shanbhag et al., 2001; also reviews Steinbrecht, 1998; Vogt et al., 2002; Leal, 2003; Pelosi et al., 2006). Furthermore, cells expressing particular ORs have been shown to be closely associated with cells expressing the corresponding PBPs (Steinbrecht, 1998; Krieger et al., 2005). It has been reported that the OBP LUSH of D. melanogaster is required for pheromone detection (Xu et al., 2005) and recently, Syed et al. (2006) demonstrated the involvement of a Bombyx mori PBP (BmPBP1) in facilitating pheromone activation of the corresponding OR. Although it is not clear exactly how mosquitoes locate their hosts at the molecular level and hence transmit diseases to humans, there is some evidence for an involvement of OBPs in odour recognition in Anopheles gambiae (Justice et al., 2003; Li et al., 2005).

Insect OBPs are very diverse proteins with an average of only 14% amino acid identity. The sequence analysis of genes encoding putative OBPs in the genomes of D. melanogaster (Graham & Davies, 2002; Hekmat-Scafe et al., 2002; Zhou et al., 2004; Pelosi et al., 2006) and An. gambiae (Xu et al., 2003; Zhou et al., 2004) has suggested that there are several distinct subgroups of OBPs with differing numbers of conserved cysteines. These have been classified as ‘Classic’ OBPs, which have a highly conserved pattern of six cysteine residues; ‘Plus-C’ or ‘C-Plus’ OBPs, which have at least two additional conserved cysteines and a proline immediately after the sixth cysteine (Hekmat-Scafe et al., 2002; Xu et al., 2003; Zhou et al., 2004); ‘dimer’ OBPs, which contain two Classic OBP motifs in tandem (Zhou et al., 2004); and ‘Atypical’ OBPs with an extended C-terminal region (Xu et al., 2003). So far, the majority of OBPs characterized belong to the Classic subgroup, with their six cysteines paired in three interlocked disulphide bridges (Leal et al., 1999; Scaloni et al., 1999) forming a compact structure that consists mainly of alpha-helical domains defining an internal binding pocket (Sandler et al., 2000; Lee et al., 2002; Kruse et al., 2003; Lartigue et al., 2003; Wogulis et al., 2006). It has been shown that several of the Atypical OBPs are transcribed in the chemosensory organs of adult An. gambiae (Xu et al., 2003) and that a Plus-C OBP, AgamOBP48, interacts specifically with other Classic OBPs (Andronopoulou et al., 2006). Atypical OBPs also share similar sequence motifs with Classic OBPs (Zhou et al., 2004) and are expressed in the heads of An. gambiae (Xu et al., 2003; Li et al., 2005) as well as in the antennae, and are downregulated after a blood meal (Biessmann et al., 2002; Justice et al., 2003; Biessmann et al., 2005). Phylogenetic analyses of all dipteran OBPs did not demonstrate a clear segregation among Atypical, Plus-C and Classic OBPs (Pelosi et al., 2006).

With the publication of the genome sequences of D. melanogaster (Adams et al., 2000), An. gambiae (Holt et al., 2002), Apis mellifera (The Honeybee Genome Sequencing Consortium, 2006) and Aedes aegypti (Nene et al., 2007), it has become possible to look at how many genes encoding putative OBPs are present. There are 51 in D. melanogaster (Galindo & Smith, 2001; Graham & Davies, 2002; Hekmat-Scafe et al., 2002); for An. gambiae there are three independent genome annotations, reporting up to 57 (Vogt, 2002; Xu et al., 2003; Zhou et al., 2004); and a recent report suggests 21 in the honeybee A. mellifera (Forêt & Maleszka, 2006). A genome-wide analysis of chemosensory proteins (CSPs), another subfamily of putative OBPs (Pelosi et al., 2006), from many species has also been reported (Zhou et al., 2006). However, for Ae. aegypti only one OBP (Aaeg-OBP10) has been found by conventional molecular techniques (Bohbot & Vogt, 2005), and no genome-wide analysis has been carried out. We have collated the insect OBP genes identified previously and aligned them into their distinct subgroups according to the number of, and spacing between, the conserved cysteines. This has allowed us to develop a motif search algorithm (MotifSearch) to search insect genomes and expressed sequence tags (ESTs) for all predicted transcript sequences that contain an OBP motif. This has found 66 genes encoding putative OBPs in Ae. aegypti and an additional 11 OBPs in An. gambiae. We have identified four Ae. aegypti-specific OBPs and three groups of OBPs which are present in all three dipteran species, i.e. Ae. aegypti, An. gambiae and D. melanogaster. We have examined the genomic organisation of the genes and can speculate on their possible evolutionary origins.

Results and discussion


Determination of motifs for the identification of genes encoding putative OBPs

An alignment of 33 D. melanogaster and 29 An. gambiae Classic OBPs showed that they all fitted the motif C1-X15–39-C2-X3-C3-X21–44-C4-X7–12-C5-X8-C6 and we designated this as the ‘Classic Motif’. Similarly, alignment of the 16 previously annotated Atypical OBPs in An. gambiae (Xu et al., 2003; Zhou et al., 2004) produced an ‘Atypical Motif’ of C1-X26–27-C2-X3-C3-X36–38-C4-X11–15-C5-X8-C6, which has a less flexible spacing between C3 and C4. A ‘Plus-C Motif’ of C1-X8–41-C2-X3-C3-X39–47-C4-X17–29-C4a-X9-C5-X8-C6-P-X9–11-C6a was derived from the previously reported Plus-C OBP sequences of An. gambiae (Xu et al., 2003), D. melanogaster and D. pseudoobscura (Zhou et al., 2004). In this motif the number of amino acids between C4 and C5 is increased from 7–12 to 27–39 (17–29 residues, then C4a, then nine residues), and there are two additional conserved cysteines (C4a, C6a) and a highly conserved proline immediately after C6. These motifs were then used in an algorithm, ‘MotifSearch’, that expands the motifs to all of the possible combinations and uses each combination to search peptide sequences for the occurrence of the conserved cysteines and the spacing between them. It then retrieves the motif-containing fragments and sequence IDs and finally compares them with known OBPs using Blastp with the scoring matrix BLOSUM62 (Fig. 1).
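As a rough illustration of how such cysteine-spacing motifs can drive a sequence search, the sketch below converts each motif string into a regular expression and tests a peptide against it. This is not the MotifSearch implementation: the function names, the use of one regular expression per motif in place of expanding every spacing combination, the treatment of X as any residue, and the test peptide are all assumptions made for illustration.

```python
import re

# Motifs as given in the text, written with ASCII hyphens for the ranges.
MOTIFS = {
    "Classic": "C1-X15-39-C2-X3-C3-X21-44-C4-X7-12-C5-X8-C6",
    "Atypical": "C1-X26-27-C2-X3-C3-X36-38-C4-X11-15-C5-X8-C6",
    "Plus-C": "C1-X8-41-C2-X3-C3-X39-47-C4-X17-29-C4a-X9-C5-X8-C6-P-X9-11-C6a",
}

# A token is a conserved cysteine (C1, C4a, ...), a spacer (X3, X15-39) or P.
TOKEN = re.compile(r"C\d+[a-z]?|X\d+(?:-\d+)?|P")


def motif_to_regex(motif):
    """Translate a spacing motif into a compiled regular expression.

    Assumption: X stands for any residue, so a spacer becomes '.{n}' or
    '.{lo,hi}'. The original algorithm may treat spacers differently.
    """
    parts = []
    for token in TOKEN.findall(motif):
        if token.startswith("C"):
            parts.append("C")
        elif token == "P":
            parts.append("P")
        else:
            bounds = token[1:].split("-")  # "15-39" -> ["15", "39"]
            parts.append(".{%s}" % ",".join(bounds))
    return re.compile("".join(parts))


PATTERNS = {name: motif_to_regex(motif) for name, motif in MOTIFS.items()}


def find_motifs(peptide):
    """Return the names of all motifs found anywhere in the peptide."""
    return [name for name, pattern in PATTERNS.items() if pattern.search(peptide)]


# Hypothetical peptide with Classic spacing: 20, 3, 30, 10 and 8 residues
# between the six cysteines (alanine is used only as a filler residue).
classic_like = ("M" + "C" + "A" * 20 + "C" + "A" * 3 + "C" + "A" * 30
                + "C" + "A" * 10 + "C" + "A" * 8 + "C" + "A" * 5)

# Same peptide, but with five residues between C2 and C3 instead of three.
broken = classic_like.replace("C" + "A" * 3 + "C", "C" + "A" * 5 + "C")

print(find_motifs(classic_like))
print(find_motifs(broken))
```

In a pipeline like the one described, sequences that pass this filter would then be compared with known OBPs by Blastp, which is what separates true OBPs from other cysteine-rich peptides that happen to fit the spacing.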


Figure 1. Flowchart for identification of insect odorant-binding proteins (OBPs) from genome sequences (see details in Experimental procedures).


Identification of OBPs using MotifSearch

The MotifSearch algorithm was used to search peptide sequences from the Ae. aegypti genome data. To validate MotifSearch, we also analysed the genomes of An. gambiae and D. melanogaster. Five previously reported D. melanogaster OBPs (OBP8a, OBP44a, OBP99c, OBP99d and OBP59a) and three An. gambiae OBPs (AgamOBP16, AgamOBP22 and AgamOBP42) have no OBP motif as defined by our algorithm because they lack one or more of the conserved cysteines; some have shorter 5′- or 3′-terminals and so are missing the first, or the fifth and sixth, cysteines, respectively. However, some of them have a significant similarity to other OBPs, with E-values from 10⁻² to 10⁻⁹⁰ (data not shown), so they were included in our data analysis.

For An. gambiae we found 33 Classic OBPs, 19 Plus-C OBPs and 14 Atypical OBPs (Table 1). Of the 33 Classic OBPs, four had not been annotated previously and we have named these new OBPs AgamOBP65, AgamOBP66, AgamOBP67 and AgamOBP68 (for an alignment with the Ae. aegypti Classic OBPs see Supplementary Material Fig. S1). We also identified seven new Plus-C OBPs and named them AgamOBP58, AgamOBP59, AgamOBP60, AgamOBP61, AgamOBP62, AgamOBP63 and AgamOBP64. For D. melanogaster, MotifSearch found 49 putative OBPs; 37 had the Classic Motif and all of these had been annotated previously as OBPs. The Plus-C Motif found all of the 12 known Plus-C OBPs and there were no peptide sequences with the Atypical Motif, in agreement with the previous comparison between D. melanogaster and An. gambiae (Xu et al., 2003). The total of 49 peptide sequences in the D. melanogaster genome is lower than the 51 reported previously, which included ‘C-minus OBPs’ (Hekmat-Scafe et al., 2002) that do not have any sequence features of the Classic OBPs.

Table 1.  Odorant-binding proteins (OBPs) identified from insect genomes using MotifSearch and Blastp

                                              No. motif-containing sequences   No. putative OBPs
Drosophila melanogaster (19 389 sequences)*
  Plus-C                                       14                               12
  Atypical                                      9                                0
  Total                                       125                               49 (51)
Anopheles gambiae (13 639 sequences)*
  Classic                                      95                               33
  Plus-C                                       16                               19
  Atypical                                     25                               14
  Total                                       136                               66 (57)
Aedes aegypti (16 789 sequences)*
  Plus-C                                       17                               17
  Atypical                                     22                               15
  Total                                       147                               66 (1)

  • * The number of peptide sequences, including alternately spliced transcripts, downloaded from genome databases and searched with MotifSearch.
  • The numbers in parentheses indicate the number of OBPs reported previously.
  • Two dimers (DmelOBP83cd and DmelOBP83ef) are included.

For Ae. aegypti, we identified 34 Classic OBPs, 17 Plus-C OBPs and 15 Atypical OBPs (Table 1). Their identities and names are listed in Table 2 and the alignments are presented in Supplementary Material Figs S1 (Classic OBPs), S2 (Atypical OBPs) and S3 (Plus-C OBPs). The previously reported Ae. aegypti OBP (Aaeg-OBP10) (Bohbot & Vogt, 2005) was identified by our algorithm and named as AaegOBP10 (Table 2). Our finding of Atypical OBPs in Ae. aegypti, along with the previous reports in An. gambiae but not D. melanogaster (Xu et al., 2003; Li et al., 2005) suggests that this type of OBP may be unique to mosquito species.

Table 2.  List of identified Aedes aegypti odorant-binding proteins (OBPs) and comparison to OBPs of Anopheles gambiae and Drosophila melanogaster

Ae. aegypti                                                  An. gambiae                                                      D. melanogaster
OBP        Transcript  GenBank ID  Supercont                 Homologue           E value  Chr. location*           OBP*       Homologue  E value  OBP
AaegOBP1   AAEL006454  EAT41942    1.206:50737:67086:1       ENSANGP00000015595  6E-40    3L: 4997975-4998746      AgamOBP4   CG8807     8E-23    LUSH
AaegOBP2§  AAEL000071  EAT48978    1.1:4056982:4057610:-1    ENSANGP00000022654  1E-41    2R: 39200949-39201650    AgamOBP6   CG11748    2E-16    obp19a
AaegOBP3§  AAEL000051  EAT48979    1.1:4124658:4140143:1     ENSANGP00000017057  5E-42    2R: 55987081-55987846    AgamOBP19  CG11748    3E-21    obp19a
AaegOBP4§  AAEL000073  EAT48980    1.1:4140321:4154466:-1    ENSANGP00000010314  9E-33    2L: 12288987-12289437    AgamOBP20  CG11748    6E-20    obp19a
AaegOBP5   AAEL000139  EAT48848    1.2:651524:652909:-1      ENSANGP00000027298  2E-33    2L: 26101766-26102773    OBPjj17    No Hit     NA       NA
AaegOBP6   AAEL000821  EAT48127    1.17:3417930:3418697:1    ENSANGP00000024293  2E-23    3L: 30753691-30754558    AgamOBP30  No Hit     NA       NA
AaegOBP7   AAEL000833  EAT48136    1.17:3935676:3936843:-1   ENSANGP00000025370  2E-23    X: 11416805-11417807     AgamOBP33  No Hit     NA       NA
AaegOBP8   AAEL001826  EAT47062    1.43:1541221:1541846:1    ENSANGP00000025230  2E-25    3R: 10317254-10317832    AgamOBP21  CG11218    5E-10    obp56d
AaegOBP9   AAEL002596  EAT46202    1.61:1448858:1449413:1    ENSANGP00000024861  3E-25    3L: 40168854-40169329    AgamOBP23  CG13873    4E-08    obp56g
AaegOBP10  AAEL007603  EAT40682    1.266:1269968:1270519:1   ENSANGP00000012102  8E-22    2R: 1168792-1169405      AgamOBP10  CG11218    1E-11    obp56d
AaegOBP11  AAEL002587  EAT46205    1.61:1518773:1519546:-1   ENSANGP00000020079  1E-34    3L: 40209753-40210315    AgamOBP25  CG11218    1E-09    obp56d
AaegOBP12  AAEL002617  EAT46206    1.61:1525746:1526203:1    ENSANGP00000020117  9E-31    3L: 40221019-40221499    AgamOBP28  CG8462     1E-09    obp56e
AaegOBP13  AAEL002591  EAT46207    1.61:1541151:1541767:-1   ENSANGP00000028962  1E-36    3L: 40217380-40217850    OBPjj11    CG8462     7E-12    obp56e
AaegOBP14  AAEL002605  EAT32319    1.61:1560540:1560999:1    ENSANGP00000028962  3E-37    3L: 40217380-40217850    OBPjj11    CG8462     3E-12    obp56e
AaegOBP15  AAEL002598  EAT46211    1.61:1605563:1618454:1    ENSANGP00000028453  3E-56    3L: 40219630-40220281    AgamOBP27  CG11218    1E-03    obp56d
AaegOBP16  AAEL003315  EAT45429    1.83:2455477:2456452:1    ENSANGP00000025370  2E-18    X: 11416805-11417807     AgamOBP33  No Hit     NA       NA
AaegOBP17  AAEL004339  EAT44281    1.115:1055155:1055633:1   ENSANGP00000012588  4E-07    X: 5035502-5035999       AgamOBP9   CG7592     8E-03    obp99b
AaegOBP18  AAEL004342  EAT44282    1.115:1056506:1057069:-1  ENSANGP00000013393  6E-07    2R: 17331939-17332503    AgamOBP14  CG18111    5E-03    obp99a
AaegOBP19  AAEL004343  EAT44284    1.115:1059658:1060207:-1  ENSANGP00000025244  3E-06    3L: 2853089-2853645      AgamOBP22  CG18111    8E-03    obp99a
AaegOBP20  AAEL005778  EAT42712    1.174:827019:827663:-1    ENSANGP00000012178  5E-39    2R: 22787966-22788641    AgamOBP56  CG1176     7E-11    Pbprp4
AaegOBP23  AAEL006109  EAT42362    1.189:217057:217969:-1    ENSANGP00000024749  8E-63    3L: 22025866-22026710    AgamOBP56  No Hit     NA       NA
AaegOBP24  AAEL006108  EAT42350    1.189:234195:245248:1     ENSANGP00000020545  4E-48    3L: 22028556-22029245    AgamOBP57  No Hit     NA       NA
AaegOBP25  AAEL006103  EAT42360    1.189:2037030:2054778:-1  ENSANGP00000020545  1E-48    3L: 22028556-22029245    AgamOBP57  No Hit     NA       NA
AaegOBP26  AAEL006106  EAT42362    1.189:2063847:2065664:1   ENSANGP00000024749  9E-45    3L: 22025866-22026710    AgamOBP56  No Hit     NA       NA
AaegOBP27  AAEL006176  EAT42273    1.193:1418261:1435259:-1  ENSANGP00000031972  2E-38    2R: 6152228-6154270      AgamOBP7   CG11421    4E-10    OS-F
AaegOBP28  AAEL006393  EAT42030    1.203:1485149:1497912:-1  ENSANGP00000022381  4E-40    3L: 7968500-7969832      AgamOBP31  No Hit     NA       NA
AaegOBP29  AAEL006387  EAT42031    1.203:1497040:1521142:-1  ENSANGP00000022381  5E-42    3L: 7968500-7969832      AgamOBP31  No Hit     NA       NA
AaegOBP30  NA          NA          1.203:1520641:1521601:-1  ENSANGP00000023504  2E-37    3L: 7964186-7965353      AgamOBP44  CG18111    3E-01    obp99a
AaegOBP31  AAEL006396  EAT42032    1.203:1526927:1527922:-1  ENSANGP00000022381  3E-38    3L: 7968500-7969832      AgamOBP31  No Hit     NA       NA
AaegOBP32  AAEL006398  EAT42033    1.203:1537009:1538019:-1  ENSANGP00000022381  4E-38    3L: 7968500-7969832      AgamOBP31  No Hit     NA       NA
AaegOBP33  AAEL006385  EAT42034    1.203:1538561:1539571:-1  ENSANGP00000022381  5E-40    3L: 7968500-7969832      AgamOBP31  No Hit     NA       NA
AaegOBP34  AAEL014082  EAT33639    1.1002:187323:188351:1    ENSANGP00000015595  1E-35    3L: 4997975-4998746      AgamOBP4   CG8807     3E-21    LUSH
AaegOBP35  AAEL002606  EAT46204    1.61:1497852:1498861:1    ENSANGP00000020072  2E-38    3L: 40213915-40214391    AgamOBP26  CG11218    6E-15    obp56d
AaegOBP36  AAEL008011  EAT40243    1.294:802739:803197:-1    ENSANGP00000002063  7E-25    2R: 35436492-35436969    AgamOBP15  CG11421    2E-11    OS-F
AaegOBP37  AAEL008009  EAT40244    1.294:823671:828548:-1    ENSANGP00000017897  1E-16    2R: 35434050-35434601    AgamOBP2   CG11421    6E-12    OS-F
AaegOBP38  AAEL008013  EAT40246    1.294:1185169:1186628:-1  ENSANGP00000017561  2E-66    2R: 4211084-4211688      AgamOBP3   CG11422    1E-37    OS-E
AaegOBP39  AAEL009449  EAT38681    1.397:1059780:1064175:1   ENSANGP00000014514  8E-60    2R: 35643123-35644108    AgamOBP1   CG11421    1E-41    OS-F
AaegOBP40  AAEL009597  EAT38527    1.411:257365:258240:1     ENSANGP00000025403  6E-36    2R: 17333034-17333889    AgamOBP39  CG7592     4E-05    obp99b
AaegOBP41  AAEL009599  EAT38528    1.411:258377:259352:1     ENSANGP00000023857  8E-30    2R: 17334178-17335129    AgamOBP40  CG18111    4E-02    obp99a
AaegOBP42  AAEL010666  EAT32309    1.495:848252:848790:1     ENSANGP00000016707  8E-59    2L: 45009046-45009783    AgamOBP48  CG13524    2E-05    obp58c
AaegOBP43  AAEL010662  EAT37338    1.495:849614:850509:1     ENSANGP00000016707  2E-49    2L: 45009046-45009783    AgamOBP48  CG13524    4E-07    obp58c
AaegOBP44  AAEL010718  EAT37276    1.500:470519:471178:1     ENSANGP00000022358  3E-23    3R: 32225169-32226168    AgamOBP43  No Hit     NA       NA
AaegOBP45  AAEL010714  EAT37277    1.500:485823:486827:1     ENSANGP00000025370  5E-29    X: 11416805-11417807     AgamOBP33  No Hit     NA       NA
AaegOBP46  AAEL010872  EAT37095    1.514:549927:550986:-1    ENSANGP00000025370  6E-19    X: 11416805-11417807     AgamOBP33  No Hit     NA       NA
AaegOBP47  AAEL011499  EAT36414    1.584:253363:254080:-1    ENSANGP00000016707  1E-51    2L: 45009046-45009783    AgamOBP48  CG13524    8E-06    obp58c
AaegOBP48  AAEL011494  EAT36416    1.584:270443:271204:-1    ENSANGP00000016610  5E-26    2L: 45014421-45015087    AgamOBP46  CG17284    3E-03    obp93a
AaegOBP49  AAEL011484  EAT36419    1.584:357678:358469:-1    ENSANGP00000025638  9E-16    UNKN: 22699009-22699832  OBPjj4     CG30072    2E-05    obp50c
AaegOBP50  AAEL011490  EAT36420    1.584:358749:359468:1     ENSANGP00000029708  1E-29    2L: 44998236-44999054    OBPjj16    CG13524    3E-05    obp58c
AaegOBP51  AAEL011487  EAT36423    1.584:391587:398988:-1    ENSANGP00000029708  3E-39    2L: 44998236-44999054    OBPjj16    CG13524    5E-05    obp58c
AaegOBP52  AAEL011491  EAT36425    1.584:422989:423783:1     ENSANGP00000029708  2E-12    2L: 44998236-44999054    OBPjj16    CG13524    1E-06    obp58c
AaegOBP53  AAEL011482  EAT36426    1.584:423953:424743:1     ENSANGP00000029708  8E-19    2L: 44998236-44999054    OBPjj16    CG17284    1E-04    obp93a
AaegOBP54  AAEL011481  EAT36427    1.584:434945:435608:1     ENSANGP00000029708  6E-14    2L: 44998236-44999054    OBPjj16    CG30072    8E-03    obp50c
AaegOBP55§ AAEL012377  EAT35452    1.685:122245:142217:-1    ENSANGP00000010314  3E-44    2L: 12288987-12289437    AgamOBP20  CG11748    2E-21    obp19a
AaegOBP56  AAEL013018  EAT34778    1.776:429863:431937:1     ENSANGP00000014514  5E-58    2R: 35643123-35644108    AgamOBP1   CG11421    4E-42    OS-F
AaegOBP57  AAEL000035  EAT48964    1.1:3668288:3668784:-1    ENSANGP00000018774  8E-25    2R: 29134184-29134707    AgamOBP13  CG1668     2E-07    pbprp2
AaegOBP58  AAEL014430  EAT33287    1.1115:141516:14944:-1    ENSANGP00000022358  1E-24    3R: 32225169-32226168    AgamOBP43  CG7584     3E-02    obp99c
AaegOBP60  AAEL015499  EAT32357    1.2733:6977:7459:-1       ENSANGP00000017897  3E-41    2R: 35434050-35434601    AgamOBP2   CG11421    2E-29    OS-F
AaegOBP61  AAEL015554  EAT46208    1.3221:5993:6448:1        ENSANGP00000028962  4E-38    3L: 40217380-40217850    OBPjj11    CG8462     5E-12    obp56e
AaegOBP62  AAEL015566  EAT32308    1.3337:1317:2221:-1       ENSANGP00000016707  3E-49    2L: 45009046-45009783    AgamOBP48  CG13524    5E-07    obp58c
AaegOBP63  AAEL015567  EAT37337    1.3337:2936:3570:-1       ENSANGP00000016707  4E-68    2L: 45009046-45009783    AgamOBP48  CG13524    6E-05    obp58c
AaegOBP65  AAEL002618  NA          1.61:174006:174371:1      ENSANGP00000028962  3E-25    3L: 40218226-40218764    OBPjj12    CG11218    4E-05    Obp56d

  • * Chromosome location and OBP names were obtained from VectorBase feature report.
  • † OS-E/OS-F homologues;
  • ‡ LUSH homologues;
  • § DmelOBP19a homologues. Light grey shading indicates Ae. aegypti-specific OBPs and dark grey indicates Dipteran OBPs. NA, not available.

We did not find any irregular transcript lengths among the Classic OBPs of Ae. aegypti when we compared each sequence to OBPs of An. gambiae and D. melanogaster (Supplementary Material Fig. S1). However, some of the Ae. aegypti transcripts encoding Atypical and Plus-C OBPs predict proteins that differ from other OBPs. For example, transcript AAEL009599 (AaegOBP41), an Atypical OBP, has a predicted protein with an extra 29 amino acid residues after C6 (Supplementary Material Fig. S2), and we were able to manually assemble a contig from seven ESTs in NCBI dbEST using the ContigExpress programme of Vector NTI software (InforMax, Inc., Frederick, MD) to confirm the integrity of the annotated sequence. Another transcript, AAEL010718 (AaegOBP44), an Atypical OBP, has a shorter 3′-terminal (∼60 amino acids), but nine ESTs allowed us to extend this terminal to obtain a full-length transcript, which we named EST010718N (AaegOBP44B). Two Plus-C OBP transcripts, AAEL006106 (AaegOBP26) and AAEL006109 (AaegOBP23), have very long 5′-terminals (∼106 amino acids) (Supplementary Material Fig. S3) and encode the same protein, which corresponds to seven ESTs (EB101959, DV260479, DV408951, DW217440, EB101753, BQ789643 and BQ789654). The assembly of these seven ESTs has a shorter 5′-end without changing the rest of the coding region and contains a Plus-C motif and a signal peptide; we named it EST006106N (AaegOBP23B). The transcript AAEL000139 (AaegOBP5) has a very long 3′-terminal (Supplementary Material Fig. S3) and in this case there is no matching EST. It has a homologue (AgamOBP58) in An. gambiae (similarity of 41.3% and identity of 30.5%), which again has no EST.

Comparison of Ae. aegypti OBPs with OBPs from D. melanogaster and An. gambiae

The identified Classic OBPs of Ae. aegypti have an overall amino acid sequence similarity of 21.7%, which is higher than the 13.3 and 16.6% found for D. melanogaster and An. gambiae Classic OBPs, respectively. The overall 25.5% similarity of the Plus-C OBPs of Ae. aegypti is much higher than the 9.4% for An. gambiae and slightly higher than the 22.1% for D. melanogaster. The similarity of Ae. aegypti Atypical OBPs is 36.2%, which is comparable with 38.3% for An. gambiae. When the Classic OBPs for all three dipteran species are compared, the overall amino acid identity is less than 5%. Most of the 66 Ae. aegypti OBPs have a homologue in An. gambiae with E-values better than 10⁻²² (Table 2) and similarities ranging from 16% (AaegOBP48 and AgamOBP46) to 63% (AaegOBP2 and AgamOBP6). However, there are four Classic OBPs, AaegOBP17, AaegOBP18, AaegOBP19 and AaegOBP64, that are specific to Ae. aegypti (E-values weaker than 10⁻⁷ when compared to An. gambiae OBPs and weaker than 10⁻³ when compared to D. melanogaster OBPs). These have mature proteins of molecular weights 13.5, 13.7, 13.7 and 14.1 kDa and pIs of 4.7, 5.4, 5.4 and 5.1, respectively. It is likely that these OBP genes evolved after the divergence of the two mosquito species Ae. aegypti and An. gambiae about 150 million years ago (Mya; Krzywinski et al., 2006), possibly by gene duplication events.
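The species-specificity criterion can be illustrated with a few rows of Table 2. The sketch below reads the thresholds as "no hit with an E-value of 10⁻⁷ or better in An. gambiae and none of 10⁻³ or better in D. melanogaster", which is what the Table 2 values for AaegOBP17–19 suggest; that reading, the function name and the choice of example rows are assumptions, and AaegOBP64 is omitted because its values are not listed in the table as extracted here.

```python
# (OBP, best E-value vs An. gambiae, best E-value vs D. melanogaster);
# None stands for "No Hit". Values are copied from Table 2.
HITS = [
    ("AaegOBP1", 6e-40, 8e-23),
    ("AaegOBP6", 2e-23, None),
    ("AaegOBP17", 4e-07, 8e-03),
    ("AaegOBP18", 6e-07, 5e-03),
    ("AaegOBP19", 3e-06, 8e-03),
    ("AaegOBP30", 2e-37, 3e-01),
]


def is_species_specific(agam_e, dmel_e, agam_cutoff=1e-7, dmel_cutoff=1e-3):
    """True if neither comparison gives an E-value at or below its cutoff.

    A larger E-value means a weaker match, so 'specific' here means the
    best hit in each of the other two species is weak or absent.
    """
    weak_agam = agam_e is None or agam_e > agam_cutoff
    weak_dmel = dmel_e is None or dmel_e > dmel_cutoff
    return weak_agam and weak_dmel


specific = [name for name, agam_e, dmel_e in HITS
            if is_species_specific(agam_e, dmel_e)]
print(specific)
```

AaegOBP6 and AaegOBP30 show why both conditions are needed: each lacks a convincing D. melanogaster homologue but has a strong An. gambiae hit, so they are mosquito-conserved rather than Ae. aegypti-specific.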

Sequence comparisons (Table 2) and phylogenetic analyses (Fig. 3) of OBPs of the three dipteran species also revealed nine Ae. aegypti OBPs which have homologues in both An. gambiae and D. melanogaster. These OBPs can be clustered into three groups that we have named after the D. melanogaster OBP in the group. Thus there is an OS-E/OS-F group, a LUSH group and an OBP19a group (Fig. 2). The OS-E/OS-F homologues of An. gambiae have been reported previously (Vogt, 2002). The similarities and identities of the mature proteins are 89.8 and 34.3%, respectively, for the OS-E/OS-F group, 85.6 and 28.0% for the LUSH group and 83.5 and 15.8% for the OBP19a group. These values are much higher than the overall similarity and identity of insect OBPs. It is therefore likely that these OBPs have functional roles common to dipteran insects and may have evolved from an ancestral gene before the divergence of mosquitoes and fruit flies about 250 Mya (Gaunt & Miles, 2002). Alignment of the OS-E/OS-F OBPs of the three dipteran species (Fig. 2A) reveals a highly conserved region, LKCYMNC, around the second and third conserved cysteines and there are other regions with highly conserved residues. Similarly, there are residues conserved in the other two groups of OBPs (Fig. 2B–D). The finding of conserved amino acids in the OBPs of all three dipteran species suggests that these regions may have an important function in the proteins.


Figure 3. Phylogenetic relationships of the mature dipteran odorant-binding proteins (OBPs) for the Classic OBPs (A), the Plus-C OBPs (B) and the Atypical OBPs (C). The unrooted trees were generated with MEGA4 using a maximum-likelihood model and the BLOSUM62 amino acid matrix. Bootstrap support values from 1000 replications of neighbor-joining with uncorrected distances are shown on the relevant branch points with a cut-off value of 50. The gene expansions and the dipteran OBP group lineages are indicated by vertical black bars. Protein names are abbreviated to AaegOBP, AgamOBP and DmelOBP for Aedes aegypti, Anopheles gambiae and Drosophila melanogaster OBPs, respectively.


Figure 2. Alignments of three groups of dipteran odorant-binding proteins (OBPs). (A) OS-E/OS-F group; (B) LUSH group; (C) OBP19a group. The numbers on the right of the alignment indicate amino acid residue positions in the whole sequence. The numbers above the alignment indicate amino acid position in the alignment. Dashes indicate gaps created by the GeneDoc alignment. All identical and similar amino acid residues are shaded. The letters below the alignment indicate identical residues and the numbers indicate similarity scores. Horizontal bars above the alignments show positions of alpha-helices according to the crystal structure of AgamOBP1 (A) (Wogulis et al., 2006) and LUSH (B) (Kruse et al., 2003).


Genomic organisation of OBP genes

There is no detailed genomic map for Ae. aegypti, so the genomic organisation of the OBP genes was estimated from their positions on supercontigs, obtained from the NCBI GenBank and the genome project (AaegL1.41) using the sequence ID retrieved by MotifSearch (Fig. 1) and visualized using MGAlign (Fig. 4). Many of the OBP genes are clustered together, with eight pairs being less than 1 kb apart (AaegOBP3 and AaegOBP4, AaegOBP17 and AaegOBP18, AaegOBP49 and AaegOBP50, AaegOBP52 and AaegOBP53, AaegOBP62 and AaegOBP63, AaegOBP42 and AaegOBP43, AaegOBP32 and AaegOBP33, AaegOBP40 and AaegOBP41). For each of the three OBP subgroups there is a large cluster of genes. Nine Classic OBPs are clustered on supercontig 1.61 within a 169 406 bp region (Fig. 4A); there are eight Plus-C OBP genes on supercontig 1.584 within a 182 245 bp region (Fig. 4B) and five Atypical OBP genes on supercontig 1.203 within a 54 422 bp region (Fig. 4C). The adjacent nine Classic OBP genes have a similarity of 64.6% and the six clustered Atypical OBPs are 93.1% similar. The similarity value is lower for the clustered Plus-C OBPs at only 42.1%. The OBP genes within each cluster have similar intron-exon structures and the size of most introns is about 60 bp (see below). Most Classic OBPs have one intron, most Atypical OBPs have no intron and most Plus-C OBPs have two introns. These data provide good evidence that the Ae. aegypti OBP genes evolved rapidly by gene duplication, as has been reported for An. gambiae (Xu et al., 2003) and D. melanogaster (Hekmat-Scafe et al., 2002). The Ae. aegypti-specific OBPs (AaegOBP17, AaegOBP18, AaegOBP19 and AaegOBP64) are clustered on the same supercontig 1.115 within a 61 200 bp region (Fig. 4A).
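The spacing between neighbouring genes can be checked directly from the supercontig coordinates in Table 2. The following is a minimal sketch, assuming the gap is taken as the start of the downstream gene minus the end of the upstream gene on the same supercontig; the function name and the small subset of genes are illustrative only.

```python
# (OBP, supercontig, start, end) copied from the Supercont column of Table 2.
GENES = [
    ("AaegOBP3", "1.1", 4124658, 4140143),
    ("AaegOBP4", "1.1", 4140321, 4154466),
    ("AaegOBP17", "1.115", 1055155, 1055633),
    ("AaegOBP18", "1.115", 1056506, 1057069),
    ("AaegOBP19", "1.115", 1059658, 1060207),
    ("AaegOBP40", "1.411", 257365, 258240),
    ("AaegOBP41", "1.411", 258377, 259352),
]


def close_pairs(genes, max_gap=1000):
    """Return (gene, neighbour, gap) for adjacent genes less than max_gap bp apart."""
    by_contig = {}
    for name, contig, start, end in genes:
        by_contig.setdefault(contig, []).append((start, end, name))

    pairs = []
    for items in by_contig.values():
        items.sort()  # order genes along the supercontig by start position
        for (_, end1, name1), (start2, _, name2) in zip(items, items[1:]):
            gap = start2 - end1
            if 0 <= gap < max_gap:  # skip overlapping annotations (negative gap)
                pairs.append((name1, name2, gap))
    return pairs


for pair in close_pairs(GENES):
    print(pair)
```

With these rows, AaegOBP18 and AaegOBP19 are not reported because they lie 2589 bp apart, which is consistent with only AaegOBP17 and AaegOBP18 being listed as a close pair on supercontig 1.115.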


Figure 4. Genomic organisation of Aedes aegypti odorant-binding proteins (OBPs) for the Classic OBPs (A), the Plus-C OBPs (B) and the Atypical OBPs (C). The genomic sequences (top black bar) and OBP transcript sequences (small rectangles and squares) of the genome project were aligned using the MGAlignIt programme. The large black arrows indicate the length of each OBP gene, with the head of the arrow indicating the direction of transcription. Introns are indicated by small arrows, with the head indicating the direction from the beginning to the end of the intron. The numbers below the arrows give the intron size in base pairs and the numbers between the large arrows are the number of base pairs between the genes.

We combined EST assembly and manual annotation to check the transcripts with introns larger than 60 bp, using ContigExpress of Vector NTI software (InforMax, Inc.) and GenScan. We were able to predict an alternative AaegOBP37 (AAEL008009) that has the same intron/exon structure as the adjacent AaegOBP36 without changing the original coding sequence (as annotated in the genome project). EST data support the presence of the large introns in the other Classic OBPs, AaegOBP15 (AAEL002598), AaegOBP3 (AAEL000051), AaegOBP4 (AAEL000073), AaegOBP23 (AAEL006109), AaegOBP24 (AAEL006108), AaegOBP25 (AAEL006103) and AaegOBP26 (AAEL006106) (Fig. 4). Both EST data and manual GenScan prediction showed that AaegOBP28 (AAEL006393) is the same sequence as AaegOBP29 (AAEL006387) and encodes the same protein. GenScan also predicted new Ae. aegypti transcripts AaegOBP30 and AaegOBP64 from the supercontigs 1.203 and 1.115, respectively (Table 2 and Fig. 4). Interestingly, each of two pairs (AaegOBP23 and AaegOBP24, AaegOBP25 and AaegOBP26) of OBP genes with large introns have the same sequence and encode the same protein. This clearly demonstrates that one pair of OBPs was derived by gene duplication from the other pair on the same supercontig (separated by 1 791 782 bp; Fig. 4C).

Isoelectric points and molecular weights of predicted proteins with OBP motifs

When the predicted isoelectric points (pIs) of mature OBPs of Ae. aegypti are plotted against their predicted molecular weights (MWs) (Supplementary Material Fig. S4A), they show similar pIs and MWs to those of An. gambiae (Supplementary Material Fig. S4B) and D. melanogaster (Supplementary Material Fig. S4C). The range of pIs for the dipteran OBPs is between 4 and 10, a wider range than that reported for the acidic pIs of lepidopteran OBPs. Thus the OBPs in the dipteran species can be positively or negatively charged at the physiological pH in insect antennae. The MWs of the Ae. aegypti Classic OBPs are less than 15.5 kDa, in agreement with the MWs of other insect OBPs. Most of the Plus-C OBPs have MWs between 17 and 25 kDa (except AaegOBP5 and AaegOBP23) and the Atypical OBPs have MWs between 25 and 35 kDa (except AaegOBP44). For the Atypical OBPs of both Ae. aegypti and An. gambiae there is a range of MWs between 27 and 38 kDa (Supplementary Material Fig. S4A,B); this is mainly because of the long C-terminal present after the sixth conserved cysteine. It has been suggested that this C-terminal may occupy the binding pocket of the proteins at low pH, as demonstrated for AgamOBP1 of An. gambiae (Wogulis et al., 2006), BmPBP1 of B. mori (Wojtasek & Leal, 1999; Horst et al., 2001; Lee et al., 2002) and D. melanogaster OBP LUSH (Kruse et al., 2003).

Synteny among genomic regions of An. gambiae, D. melanogaster and Ae. aegypti containing OBP genes

We used the longest OBP-containing regions (microsyntenic regions) in Ae. aegypti to analyse the presence and nature of the microsynteny among An. gambiae, D. melanogaster and Ae. aegypti for Classic OBPs (supercont1.61:1448800-1618500), Plus-C OBPs (supercont1.584:253000-437000) and Atypical OBPs (supercont1.203:1428000-1597500) (Fig. 4 and Table 3). We compared the amino acid sequences of each OBP-containing region with the syntenic region in An. gambiae and D. melanogaster obtained from the VectorBase and Ensembl precomputed tBLAT DNA-DNA comparison for these three insect species (Lawson et al., 2007). The sequence characteristics of the Ae. aegypti microsyntenic regions and comparison with syntenic regions in An. gambiae and D. melanogaster are given in Tables 3 and 4. Gene density in these regions is generally higher in An. gambiae and D. melanogaster than in Ae. aegypti. However, only three homologues out of a total of 83 genes were found in the syntenic regions of D. melanogaster with synteny qualities (see Experimental procedures) of 5.9, 6.1 and 3.9%, respectively, for the Ae. aegypti microsyntenic regions containing Classic, Plus-C and Atypical OBPs (Table 4). There is a substantial synteny quality of 72.0% in the microsyntenic region containing Classic OBPs between Ae. aegypti and An. gambiae, with all nine Ae. aegypti Classic OBPs having homologues in the An. gambiae syntenic region. There are five OBP homologues (AgamOBP46, AgamOBP47, AgamOBP48, OBPjj16 and AGAP007283) in the syntenic region of An. gambiae with a synteny quality of 30.0% to the Ae. aegypti microsyntenic region containing Plus-C OBPs. Some of these An. gambiae OBPs (AgamOBP47 and AgamOBP48) are clustered within an Ae. aegypti OBP expansion clade (Fig. 3B). These An. gambiae Plus-C OBPs might have evolved along the same monophyletic lineage with the Ae. aegypti OBPs in the clade which duplicated within Ae. aegypti. 
There are two Atypical OBPs (AgamOBP31 and AgamOBP44) out of three homologues in the An. gambiae genomic region syntenic to the Ae. aegypti microsyntenic region containing Atypical OBPs, with a synteny quality of 40.0% (Table 4). They are clustered into a separate clade, forming an An. gambiae OBP expansion group that is nevertheless closely related to one Ae. aegypti expansion group (Fig. 3C), suggesting that the duplications occurred after the speciation of Ae. aegypti and An. gambiae. The Classic OBPs are more divergent than the Atypical and Plus-C OBPs and do not form a large expansion group (Fig. 3A), but show high synteny between the two mosquito species. This may be the result of the duplications of the Classic OBPs occurring before the divergence of mosquitoes and the fruit fly.

Table 3. Sequence characteristics of Aedes aegypti odorant-binding protein (OBP) microsyntenic regions

| Microsyntenic region | Supercont1.61:1448800-1618500 | Supercont1.584:253000-437000 | Supercont1.203:1428000-1597500 |
| --- | --- | --- | --- |
| Sequence length | 169.7 kb | 184.0 kb | 169.5 kb |
| Number of genes* | 10 | 14 | 12 |
| Number of OBPs | 9 | 8 | 5 |
| Gene density | 1 gene/17.0 kb | 1 gene/13.1 kb | 1 gene/14.1 kb |
| Average exon length | 183.12 bp | 180.2 bp | 491.0 bp |
| Average intron length | 926.5 bp | 1253.3 bp | 1250.8 bp |
| Exons per gene | 2.4 | 3.1 | 2.4 |
| Introns per gene | 1.5 | 2.2 | 1.4 |
| Average gene length | 392.7 bp | 565.9 bp | 1183.9 bp |

* The total number of genes and homologous genes were obtained in Ensembl MulticontigView.

Table 4. Relative synteny quality of Anopheles gambiae and Drosophila melanogaster regions syntenic to the longest odorant-binding protein (OBP)-containing regions in each OBP subgroup of Aedes aegypti

| | Classic: An. gambiae | Classic: D. melanogaster | Plus-C: An. gambiae | Plus-C: D. melanogaster | Atypical: An. gambiae | Atypical: D. melanogaster |
| --- | --- | --- | --- | --- | --- | --- |
| Ae. aegypti region | supercont1.61:1448800-1618500 | supercont1.61:1448800-1618500 | supercont1.584:253000-437000 | supercont1.584:253000-437000 | supercont1.203:1428000-1597500 | supercont1.203:1428000-1597500 |
| Syntenic region* | 3L:40130734-40300434 | 2L:20547067-20716767 | 2L:44919513-45103513 | X:5831762-6015762 | 3L:7858694-8028194 | 3L:13876134-14045634 |
| Sequence length | 169.7 kb | 169.7 kb | 184.0 kb | 184.0 kb | 169.5 kb | 169.5 kb |
| Total no. genes† | 15 | 24 | 26 | 19 | 14 | 40 |
| Gene density | 1 gene/11.3 kb | 1 gene/7.1 kb | 1 gene/7.1 kb | 1 gene/9.7 kb | 1 gene/12.1 kb | 1 gene/4.2 kb |
| Number of OBPs | 9 | 0 | 2 | 0 | 2 | 0 |
| Homologous genes† | 9 | 1 | 6 | 1 | 3 | 1 |
| Synteny quality‡ | 72.0% | 5.9% | 30.0% | 6.1% | 40.0% | 3.9% |

* The syntenic region was obtained from the VectorBase and Ensembl precomputed tBLAT (translated BLAT) DNA-DNA comparisons for these three species.

† The total number of genes and homologous genes were obtained in Ensembl MulticontigView.

‡ Values (in %) calculated as described in Cannon et al. (2006), with collapsing tandem duplications (two homologous genes within tandem duplicated regions counted as one) and by excluding transposable elements.


The challenge of annotating eukaryotic genomes was comprehensively set out by Lewis et al. (2000). Algorithms and sequence similarity searches that identify putative gene functions can clearly play a role, as applied in the present and previous studies of OBPs (Galindo & Smith, 2001; Graham & Davies, 2002; Hekmat-Scafe et al., 2002; Xu et al., 2003; Zhou et al., 2004) and ORs (Clyne et al., 1999; Gao & Chess, 1999; Vosshall et al., 1999; Hill et al., 2002; Kim & Carlson, 2002; Robertson & Wanner, 2006; Bohbot et al., 2007; Wanner et al., 2007). The use of our OBP motifs to identify OBP genes in Ae. aegypti and to compare these with those of D. melanogaster and An. gambiae has been successful in consolidating our knowledge of these genes and in identifying previously unknown members of this important protein family. Our genome-wide analyses of the OBPs of three dipteran species have provided further information on how the OBP genes might have evolved and suggest that some have common functions in all three species. Of course, the genes identified in this way can only be annotated as encoding putative OBPs. The challenge now is to identify their ligands and determine their roles in chemoreception and in the chemical ecology of insects in general.

Experimental procedures

Retrieving peptide sequences annotated as OBPs

The Drosophila database FlyBase was searched using ‘odorant binding’ and ‘pheromone binding’ as text inputs, and both DNA and peptide sequences were retrieved. For An. gambiae OBP sequences, the gene accession numbers of previous studies (Vogt, 2002; Xu et al., 2003; Zhou et al., 2004) were used to obtain the full-length peptide sequences. Sequences were stored and maintained using Vector NTI software (InforMax, Inc.).

Identification of OBP motifs

The peptide sequences of each distinct subgroup of all annotated OBPs were aligned using ClustalX (8.1) (Thompson et al., 1997) with default gap-penalty parameters of gap opening 10.0 and extension 0.2. These alignments were then used to construct OBP sequence ‘motifs’ by counting the number of amino acid residues between the conserved cysteines in each peptide sequence. In some cases manual adjustment was necessary to align the cysteines. The resulting spacings were used to define the OBP motifs applied in the ‘MotifSearch’ algorithm (see Results).
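
As an illustration of the counting step only, the following minimal Python sketch records the number of residues between successive cysteines in a single peptide. The function name `cysteine_spacings` and the toy peptide are hypothetical and are not part of MotifSearch; the sketch also treats every cysteine alike, whereas the motifs described here are based on the conserved cysteines identified in the alignments.

```python
def cysteine_spacings(peptide):
    """Return the number of residues between each pair of successive cysteines."""
    positions = [i for i, residue in enumerate(peptide.upper()) if residue == "C"]
    # Residues strictly between two cysteines: index difference minus one.
    return [b - a - 1 for a, b in zip(positions, positions[1:])]


if __name__ == "__main__":
    # Hypothetical toy peptide, not a real OBP sequence.
    toy_peptide = "MKCAAAACGGGCLLLLLC"
    print(cysteine_spacings(toy_peptide))  # prints [4, 3, 5]
```

Collecting these spacings across all members of a subgroup gives the minimum and maximum number of residues observed between each pair of conserved cysteines, which is the kind of range a spacing motif needs.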

Identification of sequences encoding peptides with OBP motifs in the Ae. aegypti genome

The FASTA files of predicted peptide sequences of the whole genome projects of An. gambiae (AgamP3.41) and Ae. aegypti (AaegL1.41) were downloaded from the Ensembl mosquito database and VectorBase. EST sequences were downloaded from the EST database. The genome sequences were searched with an in-house algorithm, MotifSearch (Fig. 1). Briefly, the downloaded nucleotide sequences (genome sequences or ESTs) were translated in the six possible frames into peptide sequences and combined with the predicted peptide sequences of the genome projects into a single FASTA file. This input file was then searched with MotifSearch to obtain the motif-containing sequences. Apart from downloading the sequences from the databases and combining all published OBP sequences into a FASTA file for comparison, the procedure is automated: the built-in functions in MotifSearch retrieve motif-containing sequence fragments and the sequence identities from the downloaded genome sequences, compare them with published OBP sequences and identify putative OBP sequences with E-values better than a threshold value of 0.005. MotifSearch is simpler than homology searching by PSI-Blast or PHI-Blast and is an alternative annotation approach for very diverse protein families. It can be applied to any raw peptide data produced by genome sequencing projects, for small sequence motifs containing conserved residues with constant spacings between them.
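
The translation and motif-matching steps described above can be sketched in Python as follows. This is not the MotifSearch code, which is not reproduced here: all function names are hypothetical, the standard genetic code is assumed, the spacing ranges in the example are placeholders rather than the published OBP motifs (see Results), and the subsequent comparison with published OBP sequences and E-value filtering is omitted.

```python
import re
from itertools import product

# Standard genetic code, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(dna):
    """Translate one reading frame; codons with ambiguous bases become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna):
    """Return the three forward and three reverse-complement translations."""
    dna = dna.upper()
    reverse = dna.translate(COMPLEMENT)[::-1]
    return [translate(strand[offset:]) for strand in (dna, reverse) for offset in range(3)]


def spacing_motif(spacings):
    """Build a regex: cysteines separated by (min, max) counts of non-stop residues."""
    parts = ["C"]
    for low, high in spacings:
        parts.append("[^*]{%d,%d}C" % (low, high))
    return re.compile("".join(parts))


def find_motif_hits(dna, motif):
    """Return (frame index, start position, matched peptide) for each hit.

    Only non-overlapping matches within a frame are reported.
    """
    hits = []
    for frame, peptide in enumerate(six_frame_translations(dna)):
        for match in motif.finditer(peptide):
            hits.append((frame, match.start(), match.group()))
    return hits


if __name__ == "__main__":
    # Placeholder spacings and a toy sequence, chosen only to show the mechanics.
    toy_motif = spacing_motif([(2, 2), (1, 1)])
    print(find_motif_hits("TGTGCTGCTTGTAAATGT", toy_motif))
```

Expressing the motif as a regular expression keeps the search independent of overall sequence similarity, which is the property that makes a spacing-based approach attractive for very diverse protein families.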

The annotated OBP sequences were used in BLAST searches against the GenBank entries to identify Ae. aegypti-specific OBP genes and Ae. aegypti homologues in An. gambiae and D. melanogaster (Table 2). The predicted molecular weights (MWs) and isoelectric points of the putative OBPs were determined using Vector NTI software (InforMax, Inc.) and the ExPASy ProtParam tool.
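
For orientation, a predicted molecular weight of this kind is conventionally obtained by summing average residue masses and adding one water molecule for the free termini. The Python sketch below follows that convention; it is not the Vector NTI or ProtParam implementation, the function name `molecular_weight` is hypothetical, and disulphide bonds and other modifications are ignored. The isoelectric point is omitted because it depends on the set of pKa values chosen.

```python
# Average residue masses (Da) for the 20 standard amino acids.
RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886, "C": 103.1388,
    "E": 129.1155, "Q": 128.1307, "G": 57.0519, "H": 137.1411, "I": 113.1594,
    "L": 113.1594, "K": 128.1741, "M": 131.1926, "F": 147.1766, "P": 97.1167,
    "S": 87.0782, "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER_MASS = 18.01524  # one water per chain for the free N- and C-termini


def molecular_weight(peptide):
    """Return the average molecular weight (Da) of an unmodified peptide."""
    peptide = peptide.upper()
    unknown = set(peptide) - set(RESIDUE_MASS)
    if unknown:
        raise ValueError("non-standard residues: %s" % "".join(sorted(unknown)))
    if not peptide:
        return 0.0
    return sum(RESIDUE_MASS[residue] for residue in peptide) + WATER_MASS


if __name__ == "__main__":
    # Hypothetical toy peptide, not a real OBP sequence.
    print(round(molecular_weight("MKCAAAACGGGCLLLLLC"), 1))
```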

Phylogenetic analysis

The mature peptide sequences identified in the previous section and by previous studies were aligned using ClustalX (8.1) (Thompson et al., 1997) with default gap-penalty parameters of gap opening 10.0 and extension 0.2. The alignments were used to construct phylogenetic trees using MEGA4 software (Tamura et al., 2007). The final unrooted consensus trees were generated with 1000 bootstrap trials using the neighbor-joining method (Saitou & Nei, 1987) with the p-distance model and a cut-off bootstrap value of 50.
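
The p-distance used as the distance model is simply the proportion of compared alignment positions at which two sequences differ. A minimal Python sketch is given below; the function name `p_distance` is hypothetical, and the choice to skip any position where either sequence has a gap (pairwise deletion) is an assumption, as the gap treatment used in the analysis is not stated here.

```python
def p_distance(seq_a, seq_b, gap_chars="-"):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap are skipped (assumed
    pairwise deletion).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    differences = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in gap_chars or b in gap_chars:
            continue
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differences / compared


if __name__ == "__main__":
    # Toy aligned fragments: five comparable sites, one difference.
    print(p_distance("ACDE-F", "ACDQGF"))
```

A matrix of such pairwise distances is the input from which a neighbor-joining tree is built.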

Transcript/genome alignments

MGAlign (Lee et al., 2003) was used to display the exon/intron arrangements of the supercontigs of Ae. aegypti. GenScan was also used to predict possible protein-encoding sequences (CDS) of some genes from the supercontigs of the whole genome shotgun sequence of Ae. aegypti. ContigExpress of Vector NTI software (InforMax, Inc., USA) was used to assemble ESTs for the verification of some gene models from the Ae. aegypti genome project.

Analysis of microsynteny

The identified Ae. aegypti OBP sequences were used as queries in reiterative BLAST searches of the Ae. aegypti genome for further OBP-like proteins that might have been missed by MotifSearch. Regions of An. gambiae and D. melanogaster syntenic to the largest OBP-containing regions in each OBP subgroup of Ae. aegypti (Fig. 4) were analysed using the VectorBase and Ensembl precomputed tBLAT DNA-DNA comparisons for these three insect species (Lawson et al., 2007). The syntenic regions were investigated within the ‘Aedes MultiView’ section at VectorBase to examine all of the genes. The relative synteny quality in a region, expressed as a percentage, was calculated by dividing the sum of the conserved genes in both syntenic regions by the sum of the total number of genes in both regions, excluding retroelements and transposons and collapsing tandem duplications (Cannon et al., 2006).
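
The synteny quality calculation can be written out as a short Python sketch. The function name `synteny_quality` is hypothetical, and the sketch assumes that the gene counts passed in have already had transposable elements excluded and tandem duplications collapsed; it does not perform that collapsing itself. It also assumes that the ‘Homologous genes’ value in Table 4 is counted once for each of the two regions.

```python
def synteny_quality(conserved_a, conserved_b, total_a, total_b):
    """Relative synteny quality (%) between two syntenic regions.

    conserved_a, conserved_b: conserved (homologous) genes in each region.
    total_a, total_b: total genes in each region, after excluding
    transposable elements and collapsing tandem duplications (assumed
    to have been done by the caller).
    """
    total = total_a + total_b
    if total <= 0:
        raise ValueError("regions must contain at least one gene")
    return 100.0 * (conserved_a + conserved_b) / total


if __name__ == "__main__":
    # Classic OBP region: 10 genes in Ae. aegypti (Table 3), 15 genes in the
    # An. gambiae syntenic region with 9 homologous genes (Table 4).
    print(round(synteny_quality(9, 9, 10, 15), 1))  # prints 72.0
```

Under these assumptions the Classic OBP region reproduces the 72.0% reported in Table 4 for An. gambiae, and the corresponding D. melanogaster counts (one homologue, 24 genes) give 5.9%. Values for regions in which tandem duplications were collapsed cannot be recovered from the raw counts in the tables alone.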


Acknowledgements

Rothamsted Research receives grant-aided support from the Biotechnology and Biological Sciences Research Council of the United Kingdom. Xiao Li He was funded by a BBSRC initiative ‘Selective Chemical Intervention in Biological Systems’.


Supporting Information

Figure S1. Alignment of the Classic odorant-binding proteins (OBPs) of Aedes aegypti, Anopheles gambiae and Drosophila melanogaster.

Figure S2. Alignment of the Atypical odorant-binding proteins (OBPs) of Aedes aegypti, Anopheles gambiae and Drosophila melanogaster.

Figure S3. Alignment of the Plus-C odorant-binding proteins (OBPs) of Aedes aegypti, Anopheles gambiae and Drosophila melanogaster.

Figure S4. Relationship between the predicted isoelectric points and molecular weights of putative odorant-binding proteins (OBPs) in the genomes of (A) Aedes aegypti, (B) Anopheles gambiae, (C) Drosophila melanogaster. (•) Classic OBPs, (▴) Plus-C OBPs, (▪) Atypical OBPs.

Please note: Blackwell Publishing is not responsible for the content or functionality of any supplementary materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

IMB_789_sm_Supmat.pdf (895K): Supporting info item
