Comparative analysis of detoxification enzymes in Acyrthosiphon pisum and Myzus persicae


  • John S. Ramsey and Dean S. Rider contributed equally to this research.

Georg Jander, Boyce Thompson Institute for Plant Research, 1 Tower Road, Ithaca, NY 14850, USA. Tel.: + 607 354 1365; fax: + 607 254 1502; e-mail:


Herbivorous insects use detoxification enzymes, including cytochrome P450 monooxygenases, glutathione S-transferases, and carboxyl/cholinesterases, to metabolize otherwise deleterious plant secondary metabolites. Whereas Acyrthosiphon pisum (pea aphid) feeds almost exclusively from the Fabaceae, Myzus persicae (green peach aphid) feeds from hundreds of species in more than forty plant families. Therefore, M. persicae as a species would be exposed to a greater diversity of plant secondary metabolites than A. pisum, and has been predicted to require a larger complement of detoxification enzymes. A comparison of M. persicae cDNA and A. pisum genomic sequences is partially consistent with this hypothesis. There is evidence of at least 40% more cytochrome P450 genes in M. persicae than in A. pisum. In contrast, no major differences were found between the two species in the numbers of glutathione S-transferases and carboxyl/cholinesterases. However, given the incomplete M. persicae cDNA data set, the number of identified detoxification genes in this species is likely to be an underestimate.


In their long co-evolution with insect herbivores, plants have developed a variety of defences to keep from being eaten (Rosenthal & Berenbaum, 1991). These include physical barriers such as spines, tough bark and sticky sap, as well as numerous distasteful or toxic compounds that are often unique to particular plant genera or families. Defence against herbivory is likely the primary function for many of the several hundred thousand different plant secondary metabolites found in nature (Bino et al., 2004; Becerra, 2007). Nevertheless, most, and probably all, land plants are fed upon by at least one of the several hundred thousand herbivorous insect species (Schoonhoven et al., 1998). This suggests that insects, as part of an evolutionary arms race with their host plants, have developed efficient mechanisms to avoid or detoxify plant secondary metabolites.

Insect herbivores are often broadly classified as either specialists or generalists. Whereas specialists consume only a small number of closely related plant species, generalists tend to forage more widely on a variety of host plants. It is estimated that 90% of herbivorous insects are specialists that forage on three or fewer plant families (Bernays & Graham, 1988), and must therefore negotiate a relatively limited array of plant defences. In comparison with these more abundant specialists, generalist herbivores are almost certainly exposed to a greater diversity of plant defensive chemicals.

Insect responses to plant secondary metabolites can include avoidance of the most well defended tissue types, target site insensitivity, rapid passage of toxins through the gut, efflux pumps and direct metabolic detoxification. Cytochrome P450 monooxygenases (P450s) constitute the largest and most functionally diverse class of insect detoxification enzymes (Li et al., 2007). Members of the CYP3 clade have been implicated in the oxidative detoxification of furanocoumarins, alkaloids, numerous other plant secondary metabolites and synthetic insecticides (Snyder & Glendinning, 1996; Feyereisen, 1999; Scott, 1999; Mao et al., 2006). The CYP4 clade has been implicated in pheromone metabolism (Maïbèche-Coisne et al., 2004). Members of the CYP2 clade and mitochondrial targeted P450s contribute to hormone, sterol, and fatty acid metabolism (Feyereisen, 1999; Feyereisen, 2006).

Similar to P450s, carboxyl/cholinesterases (CCEs) can function broadly in xenobiotic detoxification. CCEs from aphids and other insects have been shown to hydrolyse plant secondary metabolites, organophosphates, and other man-made insecticides (Field, 2000; Li et al., 2007). Other members of the CCE superfamily have important neurological and developmental functions, or are involved in pheromone processing (Oakeshott et al., 1999).

Glutathione S-transferase (GST) enzymes, which occur in all eukaryotic organisms, function by conjugating xenobiotics and endogenously activated compounds to the thiol group of reduced glutathione, thereby targeting them for more rapid excretion or degradation (Li et al., 2007). In insects, GSTs have been associated with resistance to insecticides, including DDT, spinosad, diazinon and nitenpyram, which target the nervous system, as well as lufenuron and dicyclanil, which cause larval lethality during life stage transitions (Enayati et al., 2005; Low et al., 2007). GST enzyme activity in the generalist Myzus persicae (green peach aphid) increases upon ingestion of isothiocyanates, a class of toxic secondary metabolites found in Brassicaceae (Francis et al., 2005), suggesting that GSTs are involved in detoxification. Comparison of the legume specialist Acyrthosiphon pisum (pea aphid) and generalist Aulacorthum solani (foxglove or potato aphid) showed broad differences in GST enzyme activities that may reflect their respective host plant preferences (Francis et al., 2001).

Whereas specialist insect herbivores tend to have highly efficient detoxification mechanisms that target the predictable set of secondary metabolites in their favoured host plants, generalists that feed on a wide variety of plants would need a less specific array of constitutive or inducible detoxification enzymes. Therefore, it is often assumed that generalist herbivores must possess a greater diversity of detoxification enzymes than specialists. Research on Lepidoptera in the genus Papilio (swallowtail butterflies) showed that the generalists possess a broader array of P450s for detoxification of furanocoumarins (Li et al., 2002; Mao et al., 2007). However, since the only sequenced lepidopteran genome is that of Bombyx mori (Mita et al., 2004), an extreme host plant specialist, it is difficult to assess the actual genetic diversity of P450s and other detoxification enzymes relative to the number of host plants that can be consumed by larvae of a particular lepidopteran species.

The A. pisum genome (International Aphid Genomics Consortium, 2010), together with a large M. persicae expressed sequence tag (EST) collection (Ramsey et al., 2007), allows direct comparison of the xenobiotic detoxification enzymes in two related insect species with different feeding habits. Both are classified in the tribe Macrosiphini within the aphid sub-family Aphidinae (von Dohlen et al., 2006) and are about 95% identical at the DNA sequence level. However, A. pisum is specialized for feeding on legumes (Fabaceae) and M. persicae is a broad generalist that feeds from hundreds of species in more than forty plant families (Blackman & Eastop, 2000).

Legumes, which can be consumed by both A. pisum and M. persicae, have aphid-deterrent secondary metabolites. For instance, a hemiterpene glucoside was identified as an aphid-deterrent in Vicia hirsuta (Ohta et al., 2005) and low saponin and phenolic content in alfalfa was associated with improved A. pisum performance (Golawska & Lukasik, 2008). Although phloem feeders likely encounter a smaller repertoire of secondary metabolites than chewing insects feeding on the same plants, M. persicae almost certainly ingests secondary metabolites that a legume specialist like A. pisum would not normally encounter. This would include glucosinolates in Brassicaceae (Kim et al., 2008) and alkaloids in the Solanaceae. Tobacco-adapted strains of M. persicae are reported to have a nine-fold greater resistance to nicotine in artificial diet (Nauen et al., 1996) and nicotine vapours in air (Devine et al., 1996), suggesting routine exposure to this alkaloid when M. persicae are feeding from tobacco plants.

Given its more diverse feeding habits, M. persicae as a species would need to avoid or inactivate a greater variety of plant defences than A. pisum. Although there are defensive proteins (Walz et al., 2004) and other barriers to phloem feeding that would not be specifically targeted by detoxifying enzymes (Walling, 2008), secondary metabolites definitely have important defensive functions in many plant species. Here we compare the predicted P450s, GSTs, and esterases produced by M. persicae and A. pisum to test the hypothesis that a broad generalist insect herbivore would have a greater abundance and functional diversity of xenobiotic detoxification enzymes than a specialist.

Results and discussion

Myzus persicae cDNA sequencing, assembly, and annotation

To allow large-scale comparisons with the recently sequenced A. pisum genome (International Aphid Genomics Consortium, 2010), an existing EST collection with >10 000 M. persicae unigenes produced by Sanger sequencing (Ramsey et al., 2007) was augmented by DNA sequencing using Roche GS-FLX (454) technology. Sequencing of cDNA libraries prepared from tobacco-adapted and non-adapted isolates of M. persicae produced 142 600 ESTs, with a median read length (N50) of 235 bp and a mean read length of 200 bp. The SeqClean program was used to remove polyA tails and primer sequences, and any sequence less than 30 bp long after cleaning was discarded. This resulted in a set of 118 756 sequences, with an N50 of 173 bp and a mean length of 166 bp. Removal of 39 795 reads that aligned to an existing M. persicae Sanger-sequencing unigene set (Ramsey et al., 2007) with a blastn e-value less than 1e-10 resulted in 78 961 sequences, which were clustered using the MCL algorithm (Enright et al., 2002; Van Dongen, 2008). Of these, 34 781 sequences could not be clustered with any other sequences, whereas the remaining 44 180 sequences yielded 9201 clusters. The sequences in each cluster were assembled into contigs using the CAP3 program (Huang & Madan, 1999). The default CAP3 parameters, b = 20 and d = 200, for the sequence quality score and number of allowable differences, respectively, were used to determine whether two given sequences would be assembled into one contig. This yielded a total of 7710 contigs and 5341 singletons. These two non-overlapping datasets, the 34 781 unclustered sequences (Supplemental Table S1) and the 13 051 contigs and singletons (Supplemental Table S2), represent a significant expansion of existing cDNA sequence available for M. persicae. A FASTA file containing the sequences of these 47 832 unigenes can be downloaded from AphidBase.
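The read and unigene counts in this paragraph can be cross-checked with simple arithmetic. The following is a minimal bookkeeping sketch, not part of the original analysis; the variable names are illustrative, and the numbers are copied from the text.

```python
# Counts as reported in the paragraph above; variable names are illustrative.
cleaned_reads = 118_756       # 454 reads remaining after SeqClean trimming
matched_sanger = 39_795       # reads aligning to the existing Sanger unigene set
unclustered = 34_781          # reads that MCL could not cluster
clustered = 44_180            # reads that MCL placed into clusters
contigs = 7_710               # CAP3 contigs built from the clustered reads
singletons = 5_341            # CAP3 singletons from the clustered reads

novel_reads = cleaned_reads - matched_sanger   # reads passed on to MCL
assembled = contigs + singletons               # entries in Supplemental Table S2
unigenes = unclustered + assembled             # total new 454 unigenes

print(novel_reads)                               # 78961
print(unclustered + clustered == novel_reads)    # True
print(assembled)                                 # 13051
print(unigenes)                                  # 47832
```

Each derived value matches the figure quoted in the text (78 961 reads clustered, 13 051 contigs and singletons, 47 832 unigenes).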

Both clustered and unclustered sequence datasets were compared with the Drosophila melanogaster RefSeq protein set, the merged set of A. pisum Glean and RefSeq proteins, and the A. pisum genomic scaffolds. Approximately 10% of the 454 sequences (4890) had a hit to a D. melanogaster RefSeq protein (blastx, e-value ≤1e-3). These sequences were annotated by parsing GenBank records to retrieve gene descriptions (Supplemental Table S2). More than 25% of the 454 sequences (12 371) had a match to A. pisum Glean and/or RefSeq predicted proteins. GenBank records containing automated annotation of A. pisum proteins were parsed to annotate M. persicae orthologues according to these gene descriptions (Supplemental Tables S1 and S2).

Over 25% of the 454 unigenes (13 531) had a hit in a BLAST comparison against the A. pisum genomic scaffolds (tblastx, e-value ≤1e-4), but not to any Glean or RefSeq protein data sets (International Aphid Genomics Consortium, 2010). This discrepancy likely results from M. persicae sequences aligning to untranslated regions of the A. pisum genome. A higher stringency e-value cutoff was used for this tblastx search than for the blastx against RefSeq proteins, but it is possible that differences between these two algorithms are responsible for the increased number of 454 unigenes having positive hits against the genome. It is also possible that the A. pisum Glean and RefSeq protein sets are incomplete, and that some of the M. persicae sequences with a match in the A. pisum genome but not the Glean/RefSeq set are aligning to coding regions of genes that were missed by gene prediction programs.
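The e-value thresholds used above (≤1e-3 for blastx, ≤1e-4 for tblastx) amount to a filter over BLAST hit tables. The sketch below assumes the standard 12-column tab-delimited BLAST output, in which the e-value is the 11th column; the function name `best_hits` and the example rows are hypothetical and are not taken from the authors' scripts.

```python
import csv
import io

def best_hits(tabular_text, max_evalue):
    """Return {query_id: (subject_id, evalue)} for the lowest-e-value hit per query.

    Assumes 12-column tab-delimited BLAST output (query, subject, ..., evalue, bitscore).
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, evalue = row[0], row[1], float(row[10])
        if evalue > max_evalue:
            continue  # hit fails the e-value cutoff
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return best

# Hypothetical rows, for illustration only.
example = "\n".join([
    "contig1\tprotA\t85.0\t60\t9\t0\t1\t180\t5\t64\t2e-20\t95.1",
    "contig1\tprotB\t40.0\t50\t30\t0\t1\t150\t9\t58\t5e-04\t38.0",
    "contig2\tprotC\t35.0\t40\t26\t0\t4\t123\t2\t41\t0.5\t25.0",
])
hits = best_hits(example, max_evalue=1e-3)
print(hits)  # {'contig1': ('protA', 2e-20)}
```

With this layout, the fraction of unigenes with a hit is simply `len(hits)` divided by the number of query sequences.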

The M. persicae unigene set, consisting of >10 000 Sanger sequencing contigs (Ramsey et al., 2007) combined with 47 832 additional sequences representing expressed genes identified by 454 sequencing, was compared with the A. pisum genome sequence (International Aphid Genomics Consortium, 2010) to assess the relative abundance of xenobiotic detoxification enzymes in a specialist and a broad generalist aphid species.

Cytochrome P450s

The A. pisum genome encodes 83 predicted P450s (Supplemental Table S3). At least 58 loci both exhibit good sequence homology to other insect P450s and contain a complete P450 domain. Additionally, 25 P450-related loci in the A. pisum genome may have incomplete P450 domains. The presence of partial domains could result from gene assembly and annotation problems (e.g. aberrant exon prediction), or these loci could represent actual pseudogenes. The majority of pea aphid P450 sequences are related to members of the CYP3 clade, followed by those related to the CYP4 clade (Table 1). Of the 83 identified A. pisum P450s, 51 show evidence of expression through ESTs in public data sets. There is evidence for P450 evolution through tandem duplications, with several scaffolds in the A. pisum genome assembly possessing two or more P450 loci that are closely related to one another.

Table 1. Cytochrome P450 genes
Clade | Drosophila melanogaster | Apis mellifera | Acyrthosiphon pisum | Myzus persicae*
* Numbers based on M. persicae expressed sequence tag data.


Analyses of other sequenced arthropod genomes have identified 143 P450s in Tribolium castaneum (red flour beetle), 106 in Anopheles gambiae (malaria mosquito), 86 in B. mori (silk moth), 85 in D. melanogaster (fruit fly), 46 in Apis mellifera (honeybee) and 75 in Daphnia pulex (Li et al., 2005; Claudianos et al., 2006; Richards et al., 2008; Baldwin et al., 2009). Therefore, A. pisum appears to have a fairly typical number of P450s. However, if one assumes that some of the 83 putative P450 loci are pseudogenes with incomplete P450 domains, then the A. pisum P450 complement would be toward the lower end of sequenced insect genomes. This relatively low number of P450s may be particularly interesting in the light of mounting evidence that the pea aphid has a tendency to accumulate duplicated genes over time (International Aphid Genomics Consortium, 2010). It has been suggested that the social organization of the bee hive, which may shield the Ap. mellifera queen and larvae from environmental exposure to toxins, has permitted a relative loss of P450 loci in this species (Claudianos et al., 2006). However, this is certainly not the case in A. pisum.

A subset of A. pisum P450s that are predicted to be involved in 20-hydroxy-ecdysone biosynthesis (orthologues of the fruit fly loci disembodied and shade) appear to have undergone duplications in the A. pisum genome (Supplemental Table S3). This is in contrast to evidence from holometabolous insects, where it has been suggested that the presence of only single genes for these enzymes represents a structural and evolutionary constraint (Rewitz et al., 2007). Duplications of shade were not observed among the M. persicae ESTs, suggesting that this observation may be specific to the A. pisum genome.

Analysis of M. persicae ESTs identified more than 150 P450-related sequences. Unique genes, as opposed to allelic variants of the same P450 gene, were identified based on the contig assembly parameters described above. Many of the DNA sequences could not be confidently assigned as P450s based on translations, perhaps representing the non-coding regions, and most of the EST contigs represented incomplete coding regions. However, based on significant homology to the A. pisum P450s (e-value ≤ 1e-4), at least 115 of the M. persicae EST contigs represent expressed P450 loci. Neighbour-joining trees based on Clustal alignments (Larkin et al., 2007) of either DNA sequences or protein translations indicated that 30% to 50% of A. pisum P450 loci have an orthologue in M. persicae (Fig. 1A).

Figure 1.

Comparison of detoxification enzymes from Acyrthosiphon pisum and Myzus persicae: (A) cytochrome P450s, with clade names on the A. pisum branches as listed in Table 1; (B) esterases, with letters on the nodes representing the clades listed in Table 2; and (C) glutathione S-transferases, with class names on the branches as in Table 3. Predicted protein coding genes from A. pisum were compared with expressed sequence tag (EST) sequences from M. persicae using the Clustal alignment algorithm and a neighbour-joining tree. A. pisum sequences with putative orthologues in M. persicae are highlighted with green branches and those with no relatives present in the M. persicae EST data are indicated with red branches. In some cases, clear orthologues were not established due to many A. pisum sequences being grouped with a single M. persicae sequence (blue branches) or a single A. pisum sequence being grouped with many M. persicae sequences (orange branches). M. persicae sequences for which there is no clear A. pisum homologue are shown with black branches. A. pisum genes are represented by ACYPI numbers, and M. persicae genes are represented by contig numbers from the current analysis, with the exception of previously studied esterases that are included as GenBank identifiers. The scale bars are proportional to the number of amino acid changes between different proteins.

Although the P450 loci from the pea aphid genomic sequence could be readily assigned to clades through the construction of phylograms, the addition of fragments of P450 loci from the M. persicae EST clusters increased the level of ‘noise’ in the phylograms. This is evident in Fig. 1A, where clade assignments for the pea aphid P450s (labelled nodes in the figure) are somewhat dispersed. However, potential M. persicae orthologues were identified using phylograms of DNA sequences from the M. persicae EST clusters and the predicted pea aphid DNA sequences. Putative orthologues are indicated in Supplemental Table S3. Sequences were considered putatively orthologous if they were paired with a pea aphid gene and had bootstrap support greater than 50%. Although there is considerable uncertainty in the specific assignment of the M. persicae sequences at the clade level (Table 1), it is clear that some of the pea aphid sequences group separately from some M. persicae sequences, indicating divergence in the P450 complement between the two species. It is impossible to determine whether the remaining A. pisum loci have M. persicae orthologues, because the available EST data likely represent only a fraction of all M. persicae P450s. On the other hand, more than 100 of the M. persicae P450 ESTs have no clear match in the A. pisum genome, suggesting that there has been an expansion of this gene family in M. persicae or, conversely, a contraction in A. pisum. Most species in the Aphidinae (von Dohlen et al., 2006) are more specialized than M. persicae in their feeding habits, but the number of P450 enzymes encoded in their genomes is as yet unknown.
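The orthologue criterion (pairing with a pea aphid gene at bootstrap support above 50%) and the branch categories used in Fig. 1 can be expressed as a small classification routine. This is only a sketch under stated assumptions: the input format, the function name `classify` and the gene identifiers are hypothetical, and the original assignments were made from the phylograms rather than from code like this.

```python
from collections import defaultdict

def classify(ap_genes, groupings, min_bootstrap=50.0):
    """Label each A. pisum gene by how it groups with M. persicae contigs.

    groupings: iterable of (ap_gene, mp_contig, bootstrap_percent) tuples.
    Only groupings with support strictly greater than min_bootstrap count.
    """
    ap_to_mp = defaultdict(set)
    mp_to_ap = defaultdict(set)
    for ap, mp, support in groupings:
        if support > min_bootstrap:
            ap_to_mp[ap].add(mp)
            mp_to_ap[mp].add(ap)

    labels = {}
    for ap in ap_genes:
        partners = ap_to_mp.get(ap, set())
        if not partners:
            labels[ap] = "no M. persicae relative"          # red in Fig. 1
        elif len(partners) > 1:
            labels[ap] = "one A. pisum, many M. persicae"   # orange
        elif len(mp_to_ap[next(iter(partners))]) > 1:
            labels[ap] = "many A. pisum, one M. persicae"   # blue
        else:
            labels[ap] = "putative orthologue"              # green
    return labels

# Hypothetical identifiers and support values, for illustration only.
genes = ["ap1", "ap2", "ap3", "ap4", "ap5"]
pairs = [("ap1", "mpA", 92), ("ap2", "mpB", 70), ("ap3", "mpB", 65),
         ("ap4", "mpC", 80), ("ap4", "mpD", 60), ("ap5", "mpE", 30)]
labels = classify(genes, pairs)
for gene in genes:
    print(gene, labels[gene])
```

In this toy input, `ap5` is left without a relative because its only grouping has 30% support, which is below the threshold.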


Carboxyl/cholinesterases

There are 30 members of the CCE superfamily in the A. pisum genome (Supplemental Table S4), compared with the 24, 35, and 51 that have been identified in Ap. mellifera, D. melanogaster and An. gambiae, respectively (Table 2; Claudianos et al., 2006). All of the A. pisum genes appear functional, though several lack EST support and a few are truncated, likely due to errors with the genome assembly. In comparison, there are 19–23 identifiable CCEs in the M. persicae EST unigene set, 13 with putative A. pisum homologues and 6–10 esterase-like genes that do not have any obvious homologues in the A. pisum genome (Fig. 1B). This suggests that there is diversification of enzyme functions in the esterase gene family, at least for those that are not involved in basal metabolism that would likely be common to all aphids.

Table 2. Carboxyl/cholinesterase genes
Clade | Drosophila melanogaster | Apis mellifera | Acyrthosiphon pisum | Myzus persicae*
D (pheromone and hormone processing) | 3 | 1 | 0 | 0
H (neuro and developmental) | 4 | 0 | 0 | 0
* Numbers based on M. persicae expressed sequence tag data.

Known CCEs can be divided into 13 clades (Ranson et al., 2002), seven of which are represented in A. pisum and M. persicae (Table 2). Clades without identifiable A. pisum or M. persicae homologues are the Diptera-specific clades B and C, integument esterases (D), dipteran juvenile hormone esterases (F), lepidopteran juvenile hormone esterases (G) and the glutactin-like esterases (H). Thirteen of the A. pisum esterase genes are found in small clusters of 2–6 genes per scaffold. This is particularly apparent in clade E, where there appears to be an expansion in A. pisum. However, this level of duplication is less than that found in the Diptera, where 8–10 CCEs may cluster together (Campbell et al., 2003). In A. pisum, the largest CCE cluster consists of six genes that share no more than 60% amino acid identity with each other (Fig. 1B), suggesting fairly ancient duplication events.

Esterases in clades A–C are involved in the detoxification of xenobiotics. A. pisum and M. persicae possess esterases only in clade A, which also contains an esterase linked to organophosphate resistance in Anisopteromalus calandrae, a parasitic wasp (Zhu et al., 1999). It has been suggested that the eight clade A esterases in Ap. mellifera represent an order-specific radiation within the Hymenoptera (Claudianos et al., 2006). However, given a similar radiation in A. pisum and M. persicae, it would seem that an order-specific radiation in clade A is not unique to Ap. mellifera. Comparing A. pisum and M. persicae shows that they both have five members of clade A. However, the absence of direct homologues suggests an ancient radiation event or independent expansions of this gene family (Fig. 1B).

A. pisum shows a reduction in diversity in CCEs involved in hormone and pheromone processing (clades D–H), with only clade E having any members at all (Table 2). However, within clade E, A. pisum esterase genes have undergone a considerable expansion to 18 genes, compared with three in D. melanogaster and Ap. mellifera and five in An. gambiae (Claudianos et al., 2006). When the 18 A. pisum clade E CCEs are aligned with CCEs from other species, they form a monophyletic clade. However, when aligned with M. persicae, they split into two sub-clades, perhaps a reflection of the diversity within clade E in aphids (Fig. 1B). The 11–12 M. persicae unigenes in clade E suggest that this expansion is not unique to A. pisum. Clade E enzymes are thought to be largely involved with pheromone and hormone processing in insects, and it is interesting that this gene family has been expanded in A. pisum and M. persicae (Table 2). Although aphid alarm and sex pheromones have been identified (Dawson et al., 2005; Hatano et al., 2008; Verheggen et al., 2008), aphid pheromone communication is almost certainly less complex than that of honeybees (Slessor et al., 2005). Therefore, the relative expansion of the aphid CCE clade E family likely serves a different function.

In organophosphate-resistant M. persicae, the E4 esterase gene is often found in clusters of up to 80 virtually identical copies, and is associated with another esterase, FE4 (Field & Devonshire, 1998). Although the closest A. pisum homologue of M. persicae E4 esterase (ACYPI623066) is also clustered with other esterase genes, there is only one copy in the sequenced genome (Fig. 1B). ACYPI623066 and ACYPI559388, the A. pisum genes most similar to M. persicae FE4 esterase (Fig. 1B), also generated the most hits in comparison with the M. persicae transcript data (Supplemental Table S4).

Clades F and G contain validated juvenile hormone esterases (JHE) for Diptera and Lepidoptera, respectively. Given their phylogenetic specificity, it is perhaps not surprising that A. pisum and M. persicae do not have members of these clades. However, ACYPI381461 and ACYPI929836 in clade E are candidate aphid JHEs. The predicted coding sequence of ACYPI381461 contains two GQSAG nucleophilic elbow motifs, whereas ACYPI929836 has one. The GQSAG motif is present in the active site in all functionally validated JHEs, though there is usually only one in each protein. All A. pisum motifs align with the JHEs from D. melanogaster and lepidopteran species (Fig. 2). Other analyses show that ACYPI381461 is somewhat longer than the other aphid esterases in clade E and does not have an obvious M. persicae homologue.
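Counting GQSAG motifs in a predicted protein, as described above, is a simple string search. The sketch below is illustrative only: the function name `motif_positions` is hypothetical, and the two example sequences are made up rather than taken from ACYPI381461 or ACYPI929836.

```python
import re

MOTIF = "GQSAG"  # nucleophilic elbow motif named in the text

def motif_positions(protein, motif=MOTIF):
    """Return the 1-based start position of every occurrence of motif.

    A zero-width lookahead is used so that overlapping matches would also be found.
    """
    pattern = f"(?={re.escape(motif)})"
    return [m.start() + 1 for m in re.finditer(pattern, protein.upper())]

# Made-up sequences, for illustration only.
two_copies = "MKTLLGQSAGAASVHLLTDGQSAGSW"
one_copy = "MSTAFGQSAGGISVSL"

print(motif_positions(two_copies))  # [6, 20]
print(motif_positions(one_copy))    # [6]
```

A candidate with more than one reported position would correspond to the two-motif case described for ACYPI381461.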

Figure 2.

Alignment of known juvenile hormone esterase (JHE) sequences with the potential pea aphid JHEs. Almost all known JHE enzymes possess a GQSAG motif (in bold) in the nucleophilic elbow of the active site.

With the exception of acetylcholine esterase (AChE, clade J), members of the neurological/developmental group (Table 2, clades I–M) tend to be non-catalytic and are involved in cell–cell interactions. Seven members of this group were identified in A. pisum (5–8 in M. persicae; Table 2), fewer than the 10–12 identified in Ap. mellifera, D. melanogaster and An. gambiae (Claudianos et al., 2006). Like Ap. mellifera and An. gambiae, A. pisum has two AChEs, one of which (ACYPI102248) has a likely neurological function. In contrast, M. persicae EST data show three potential AChEs. Other CCEs found in aphids include the neuroligins (clade L), conserved structural proteins involved in synapse formation, gliotactin (clade K), which is thought to be a structural protein, and clade I, which has unknown function (Table 2).

Glutathione S-transferases

Analysis of the A. pisum genome identified 20 putative members of the GST superfamily (Fig. 1C; Supplemental Table S5). This is fewer than the number of loci identified in D. melanogaster (38) and An. gambiae (31) but more than Ap. mellifera (10), which has a reduced number of genes encoding detoxification enzymes (Ranson et al., 2001; Claudianos et al., 2006). cDNA sequences provide gene expression evidence for 15 of the 20 likely A. pisum GSTs. Insects generally harbour six different classes of GSTs (Chelvanayagam et al., 2001). However, although the A. pisum genome encodes two microsomal GSTs and GSTs in the delta (10), theta (2), and sigma (6) classes, it apparently lacks the epsilon, omega and zeta classes (Table 3). The delta and epsilon GST classes are found uniquely in insects (Ranson et al., 2001) and have been implicated in insecticide resistance. However, A. pisum has fewer members of these two GST classes than D. melanogaster and An. gambiae.

Table 3. Glutathione S-transferase genes
Class | Drosophila melanogaster | Apis mellifera | Acyrthosiphon pisum | Myzus persicae*
* Numbers based on M. persicae expressed sequence tag data.


EST data from the generalist aphid M. persicae show at least 14 and a maximum of 21 GST-like genes (Fig. 1C; Table 3). M. persicae homologues were found for all except two of the predicted A. pisum genes. Eight M. persicae genes are obviously homologous to genes identified in A. pisum, whereas others seemed to have diversified (Fig. 1C). As in the case of A. pisum, the epsilon, omega and zeta GST classes are lacking in M. persicae (Table 3), suggesting that these three GST classes may be absent from aphids in general. Analysis of cDNA libraries from specific tissue types (Ramsey et al., 2007) suggests that one GST (contig 1196) is specifically expressed in gut tissue, whereas another (contig 3648) is over-represented in the salivary glands of M. persicae.

A. pisum EST data show several alternative splicing variants that are apparently derived from a single GST gene (Fig. 1C; Supplemental Table S5). Although the M. persicae EST data also suggest GST alternative splicing, these do not occur for the same gene as in A. pisum. Such alternative splicing could be an evolutionary strategy to broaden the spectrum of effectiveness in detoxification enzymes, and might thus allow aphids to expand their host range or to effectively metabolize xenobiotics, such as plant secondary metabolites and insecticides (Ranson et al., 2001). Further analysis of the A. pisum genome showed the presence of enzymes that are potentially involved in the degradation of conjugated glutathione, including eight gamma-glutamyl transpeptidases and 34 aminopeptidases (Supplemental Table S5). These A. pisum genes, along with their M. persicae EST homologues (Supplemental Table S5), indicate that aphids have the complete pathway for degradation of xenobiotics via conjugation to glutathione.


The A. pisum genome and large-scale sequencing of M. persicae ESTs have made it possible, at least in part, to determine whether a broad generalist insect herbivore has a greater diversity of detoxification enzymes than a specialist. There is apparently no great expansion of the GSTs and CCEs in M. persicae relative to A. pisum, and the number of interspecies differences in the presence and absence of specific genes is likely to be similar to that found for other aphid gene families. In the case of the CCEs, a relatively large fraction of these proteins are involved in basal metabolic functions that are likely to be the same or similar in A. pisum and M. persicae. Although it is difficult to estimate exact gene numbers from the EST data, the P450 gene family is at least 40% larger in M. persicae than in A. pisum. If one assumes similar gene representation in EST libraries produced from the two species (115 vs. 51 unique genes), then M. persicae may encode twice as many P450s as A. pisum. This expansion may reflect the different host ranges of the two aphid species, only Fabaceae for A. pisum and 40 different plant families for M. persicae, and is consistent with the hypothesis that a phloem-feeding generalist insect herbivore would require a greater number of detoxification enzymes than a specialist.
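The two P450 comparisons in this paragraph rest on simple ratios of the counts reported earlier. The following worked arithmetic is a sketch for orientation only; the variable names are illustrative, and the ratios are not additional results.

```python
ap_p450_loci = 83       # predicted A. pisum P450 loci in the genome
ap_p450_with_est = 51   # A. pisum P450s with EST evidence of expression
mp_p450_contigs = 115   # M. persicae EST contigs scored as expressed P450 loci

# M. persicae EST-supported P450s relative to all predicted A. pisum loci.
direct_ratio = mp_p450_contigs / ap_p450_loci
# EST-supported P450s in both species (the "115 vs. 51" comparison).
est_ratio = mp_p450_contigs / ap_p450_with_est

print(f"{direct_ratio:.2f}")  # 1.39
print(f"{est_ratio:.2f}")     # 2.25
```

The first ratio corresponds to the difference of roughly 40% quoted in the text, and the second to the statement that M. persicae may encode about twice as many P450s if EST representation is similar in the two species.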

Experimental procedures

RNA isolation and cDNA synthesis

Total RNA was isolated using the RNeasy kit (Qiagen, Valencia, CA, USA) from 100 to 200 aphids, a mixture of adults and all larval stages. RNA samples were prepared from two different lineages of M. persicae: tobacco-adapted M. persicae feeding on tobacco, and non-adapted M. persicae feeding on cabbage. mRNA was purified using Oligotex (Qiagen) and precipitated to increase the concentration to ∼100 ng/μl. cDNA synthesis and normalization were performed using a modification of the Creator SMART cDNA synthesis protocol (Clontech, Mountain View, CA, USA) in conjunction with the Trimmer Direct cDNA normalization kit (Evrogen, Moscow, Russia). A modified reverse transcription primer, 5′-AAGCAGTGGTATCAACGCAGAGTGGCCGAGGTTTTGTTTTTTTTTCTTTTTTTTTTVN-3′, was used. To avoid potential problems associated with 454 sequencing of homopolymeric stretches, adaptor-ligated poly-T primers for reverse transcription were modified to disrupt the poly-T tract every several bases. The presence of the SMART IV oligo (Clontech) in the reverse transcription reaction established the 3′ end of the reverse transcription product as the reverse complement of the 5′ adaptor. Therefore, the subsequent second strand synthesis and amplification (16 cycles) was accomplished with one primer (5′ PCR primer, 5′-AAGCAGTGGTATCAACGCAGAGT-3′), which anneals to the identical 3′ ends of complementary strands.
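Two properties of the primer design described above can be checked directly from the sequences: the 5′ PCR primer matches the 5′ end of the modified reverse transcription primer, and the poly-T tract is interrupted by other bases. This is a minimal sketch; the variable names are illustrative, and the primer strings are copied from the text.

```python
import re

# Primer sequences as given in the text (V and N are IUPAC degenerate bases).
RT_PRIMER = "AAGCAGTGGTATCAACGCAGAGTGGCCGAGGTTTTGTTTTTTTTTCTTTTTTTTTTVN"
PCR_PRIMER = "AAGCAGTGGTATCAACGCAGAGT"

# The single PCR primer can prime both strands only if its sequence
# is also the 5' adaptor portion of the reverse transcription primer.
shares_adaptor = RT_PRIMER.startswith(PCR_PRIMER)

# Lengths of each uninterrupted run of three or more T residues.
t_runs = [len(run) for run in re.findall(r"T{3,}", RT_PRIMER)]

print(shares_adaptor)
print(len(t_runs), max(t_runs))
```

Several separate T runs, rather than one long homopolymer, reflect the interruption of the poly-T tract that the text describes as a way to avoid 454 homopolymer problems.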

cDNA normalization

Normalization of cDNA by duplex-specific nuclease treatment was performed to increase the relative abundance of low-expression transcripts (Zhulidov et al., 2004). Normalized cDNA was subjected to two rounds of amplification: 20 cycles in the first round and 12 cycles in the second round. cDNA was purified and sequenced with a Roche GS-FLX system.

Sequence assembly and analysis

Primer and adaptor sequences were trimmed from sequences, and any trimmed sequences less than 30 bp were discarded. In an effort to identify sequences not represented in previous Sanger EST sequencing of M. persicae, 454 sequences were aligned to an existing unigene set (Ramsey et al., 2007). 454 sequences with over 95% overlap with an existing unigene were discarded. The remaining sequences were clustered with MCL software (Enright et al., 2002), and consensus contigs were formed from each cluster with CAP3 software using default parameters (Huang & Madan, 1999). Contigs and singletons were BLASTed against the M. persicae Sanger unigene set, and sequences with more than 75% overlap with the Sanger unigene were discarded, as were sequences less than 100 bp in length.
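The overlap filters described above can be sketched as a coverage calculation. The sketch assumes that "overlap" means the fraction of a sequence's length covered by its alignments to a unigene; that definition, the function names and the example coordinates are assumptions for illustration, not the authors' implementation.

```python
def covered_fraction(seq_length, intervals):
    """Fraction of a sequence covered by aligned intervals.

    intervals: list of (start, end) query coordinates, 1-based and inclusive.
    Overlapping intervals are merged so that no position is counted twice.
    """
    covered, last_end = 0, 0
    for start, end in sorted(intervals):
        start = max(start, last_end + 1)   # skip positions already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / seq_length

def keep_sequence(seq_length, intervals, max_overlap=0.95, min_length=0):
    """Keep a sequence unless it is too short or its overlap exceeds max_overlap."""
    if seq_length < min_length:
        return False
    return covered_fraction(seq_length, intervals) <= max_overlap

# Hypothetical alignments, for illustration only.
print(covered_fraction(200, [(1, 100), (90, 150)]))   # 0.75
print(keep_sequence(200, [(1, 196)]))                 # False (98% overlap)
print(keep_sequence(200, [(1, 100), (90, 150)]))      # True
```

The second filtering step in the text would correspond to `max_overlap=0.75` and `min_length=100` under the same assumptions.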

Annotation of 454 unigenes

Annotation of M. persicae sequences relied on BLAST (Altschul et al., 1997; Tatusova & Madden, 1999) to identify the most similar sequences among Drosophila and A. pisum RefSeq proteins, as well as A. pisum genomic scaffolds. BioPerl modules and custom Perl scripts were used to parse GenBank files and extract gene descriptions for top BLAST hits.

Phylogenetic trees

Protein sequences for A. pisum were downloaded from AphidBase and/or GenBank. M. persicae ESTs were obtained from GenBank and converted into protein sequences. Trees were built from the amino acid sequences by neighbour-joining using FigTree (version 1.2.1; Andrew Rambaut).


This work was funded by USDA grant 2005-35604-15446 to GJ and by the CSIRO Office of the Chief Executive postdoctoral fellowship scheme. The authors would also like to thank Qi Sun for help with sequence assemblies, and Alexandre dos Santos Cristino and John Oakeshott for their comments and advice in the preparation of this manuscript.