Halloween genes and nuclear receptors in ecdysteroid biosynthesis and signalling in the pea aphid


Guy Smagghe, Laboratory of Agrozoology, Department of Crop Protection, Ghent University, 9000 Ghent, Belgium. Tel.: +32 9 2646150; fax: +32 9 2646239; e-mail: guy.smagghe@ugent.be


The pea aphid (Acyrthosiphon pisum) is the first insect with hemimetabolous development to have its whole genome sequenced, and it is an emerging model organism for studies in ecology, evolution and development. The insect steroid moulting hormone 20-hydroxyecdysone (20E) controls and coordinates development in insects, especially the moulting/metamorphosis process. We therefore present here a comprehensive characterization of the Halloween genes phantom, disembodied, shadow, shade, spook and spookiest, which code for the P450 enzymes that control the biosynthesis of 20E. Regarding the nuclear receptors in the pea aphid genome, we found 19 genes, representing all seven known subfamilies. The annotation and phylogenetic analysis revealed strong conservation within the class Insecta. Compared with other sequenced insect genomes, however, three orthologues are missing in the Acyrthosiphon genome, namely HR96, PNR-like and Knirps. We also cloned EcR, Usp, E75 and HR3. Finally, 3D modelling of the ligand-binding domain of Ap-EcR exhibited the canonical structural scaffold of 12 α-helices associated with a short hairpin of two antiparallel β-strands. Upon docking, 20E was located in the hormone-binding groove, supporting the hypothesis that EcR has a role in 20E signalling.


The insect steroid hormone 20-hydroxyecdysone (20E) controls and coordinates development in insects. A peak titer of 20E triggers the moulting/metamorphosis process, allowing insects to shed their old exoskeleton and enter the next developmental stage. To produce 20E, the precursor hormone ecdysone (E) is synthesized in the prothoracic glands from dietary cholesterol or phytosterols, as insects cannot synthesize the steroid precursor cholesterol de novo. E is then secreted into the body cavity and converted into 20E in various peripheral tissues, predominantly the midgut and fat body (Petryk et al., 2003; Rewitz et al., 2006a; Iga & Smagghe, 2009). The complete trafficking mechanism of the ecdysteroid precursors has not yet been elucidated in insects, but it may resemble that of steroidogenesis in vertebrates (Rees, 1995; Gilbert et al., 2002; Lafont et al., 2005; Huang et al., 2008).

Cytochrome P450 (CYP) enzymes, well known for their monooxygenase activity, constitute one of the largest enzyme families and are distributed throughout a wide variety of living organisms, from bacteria to mammals (Werck-Reichhart & Feyereisen, 2000). To date, four P450 enzymes involved in ecdysteroid biosynthesis have been identified and characterized, namely CYP306A1 (Phantom, Phm), CYP302A1 (Disembodied, Dib), CYP315A1 (Shadow, Sad) and CYP314A1 (Shade, Shd). As shown in Fig. 1, the products of phm, dib and sad sequentially convert the precursor of E, 2,22,25-trideoxyecdysone (ketodiol), into 2,22-dideoxyecdysone (ketotriol), 2-deoxyecdysone and E (Chavez et al., 2000; Warren et al., 2002, 2004; Niwa et al., 2004, 2005; Rewitz et al., 2006b). The product of shd then mediates the last step, the conversion of E into 20E (Petryk et al., 2003; Rewitz et al., 2006a; Maeda et al., 2008). In addition, CYP307A1 (Spook, Spo), its paralogue CYP307A2 (Spookier, Spok) and CYP307B1 (Spookiest, Spot), which are involved in the initial conversion of 7-dehydrocholesterol into ketodiol, have been identified, but their biochemical functions are not well understood (Namiki et al., 2005; Ono et al., 2006). Spok has so far only been identified in Drosophila, while spot has been identified in mosquitoes (Aedes aegypti and Anopheles gambiae), honey bees (Apis mellifera) and red flour beetles (Tribolium castaneum). Together, these genes are called the Halloween genes. They have been identified or predicted in multiple insect species (Niwa et al., 2004, 2005; Warren et al., 2004; Sieglaff et al., 2005; Rewitz et al., 2006a,b, 2007; Iga & Smagghe, 2009) and their function has been characterized in the fruitfly (Drosophila melanogaster), the silkmoth (Bombyx mori) and the tobacco hornworm (Manduca sexta). In addition to insects, the Halloween genes have also been identified in the genome of the crustacean Daphnia pulex (Rewitz & Gilbert, 2008), suggesting high conservation of ecdysteroid biosynthesis in the phylum Arthropoda.
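
The sequence of conversions described above can be written down as a small data structure. The following is only an illustrative sketch: the tuple layout and the helper name `is_linear_chain` are assumptions made here for clarity, and the upstream Spo/Spok/Spot steps are left out because their biochemical functions are not well understood.

```python
# Illustrative sketch only: the late steps of ecdysteroid biosynthesis as
# described in the text. Each entry is (enzyme, substrate, product).
# The Spo/Spok/Spot steps upstream of ketodiol are omitted because their
# exact reactions are not known.
ECDYSTEROID_STEPS = [
    ("Phm (CYP306A1)", "ketodiol", "ketotriol"),
    ("Dib (CYP302A1)", "ketotriol", "2-deoxyecdysone"),
    ("Sad (CYP315A1)", "2-deoxyecdysone", "E"),
    ("Shd (CYP314A1)", "E", "20E"),
]


def is_linear_chain(steps):
    """Return True if every product is the substrate of the next step."""
    return all(prev[2] == nxt[1] for prev, nxt in zip(steps, steps[1:]))


if __name__ == "__main__":
    for enzyme, substrate, product in ECDYSTEROID_STEPS:
        print(f"{substrate} --[{enzyme}]--> {product}")
    print("linear chain:", is_linear_chain(ECDYSTEROID_STEPS))
```

Written this way, the pathway makes explicit that Shd acts on the product of the three preceding enzymes, which is why the presence of a functional Shd orthologue matters for 20E production.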

Figure 1.

Summary of the biosynthesis of 20-hydroxyecdysone (20E) and the 20E regulatory cascade. In the upper part, the biosynthetic scheme presents the functions of the Halloween genes (Spo/Spok/Spot, Phm, Dib, Sad and Shd) that are boxed, while intermediate products are mentioned in bold. In the lower part, binding of 20E to the EcR-Usp complex starts the ecdysteroid cascade with the expression of the so called ‘early’ genes (EcR, E75, BR, E74 and E93) that will then be responsible for the upregulation of a set of ‘early-late’ genes (including HR3, HR4, HR38 and E78). Via FTZ-F1, the signal will eventually be passed on to the ‘late’ genes. The nuclear receptors are boxed. Redrafted after Rewitz et al. (2007) and Bonneton et al. (2008).

After secretion into the hemolymph, 20E will start the moulting/metamorphosis process by acting directly upon the transcriptional activity of specific target genes through chromosome puffing. Ashburner (1973) proposed a formal model to explain control of the transcription of the vast network of genes whose activity is induced by the hormone. As shown in Fig. 1, the first step in this cascade is the binding of 20E to a postulated receptor protein, a heterodimer formed by the ecdysone receptor (EcR) and Ultraspiracle (Usp), which are both members of the nuclear receptor (NR) superfamily (Henrich, 2005; Billas et al., 2009). Activation of this receptor complex initiates and mediates the transcription of a number of other NRs, in a cascade with at first expression of ‘early’, then ‘early-late’ and finally ‘late’ genes for a successful moulting/metamorphosis.

The NR superfamily is a group of ligand-activated transcription factors that are present in various animals. Studies of these NRs in both mammals and arthropods revealed seven distinct subfamilies (NR0–NR6) into which they can be classified. All of them possess a highly conserved DNA-binding domain (DBD), containing two C4-type zinc finger regions, that is responsible for binding of the transcription factor to the DNA. Except for the NR0 subfamily, all NRs also contain a less conserved ligand-binding domain (LBD) with which the receptor binds its ligand. Unlike most other transcription factors, NRs can be activated by binding of small lipophilic ligands, such as hormones and fatty acids, that are capable of crossing the cell membrane. Besides moulting and metamorphosis, NRs are involved in, for example, embryonic development (King-Jones & Thummel, 2005), cell differentiation (Siaussat et al., 2007) and reproduction (Raikhel et al., 1999), and they are therefore also considered important novel targets in pest insect control (Palli et al., 2005; Billas et al., 2009).

Recent genome projects on both vertebrate and insect species have contributed greatly to identifying the different NRs. In total, 48 NRs are known in humans (Robinson-Rechavi et al., 2001), over 284 are present in Caenorhabditis elegans (Gissendanner et al., 2004), 49 in mouse and 47 in rat (Zhang et al., 2004). In insects, on the other hand, the number of NRs found is markedly lower. In Drosophila, only 21 NR genes have been identified, with 20 in Anopheles, 22 in Apis, 19 in Bombyx and 21 in Tribolium (Adams et al., 2000; Holt et al., 2002; Velarde et al., 2006; Bonneton et al., 2008; Cheng et al., 2008). The latter studies also showed that these NRs, especially the DBD and LBD, are highly conserved in holometabolous insects. So far, however, no complete set of NRs from hemimetabolous insects, which do not undergo a pupal metamorphosis stage, has been described. The recent genome project on the pea aphid, Acyrthosiphon pisum (International Aphid Genomic Consortium, 2010), gives us the unique opportunity to present a comprehensive identification and characterization of the NRs in this important hemipteran insect, which is an emerging model organism for ecological, developmental and evolutionary studies (Brisson & Stern, 2006; Stern, 2008).

In this paper, we first focus on the Halloween genes that control the ecdysteroid biosynthesis pathway to build up a peak titer of 20E. In a second part, we examine the NRs present in the pea aphid, with an emphasis on identifying the pathway of hormone signalling by 20E through a regulatory cascade of NRs, especially the functional receptor formed by the EcR-Usp heterodimer, and also on the ‘early’, ‘early-late’ and ‘late’ genes. We performed a phylogenetic analysis to confirm the annotation and to investigate evolutionary traits of the pea aphid within the phylum Arthropoda. We also cloned EcR, Usp, E75 and HR3. Finally, we constructed a 3D model of the LBD of Ap-EcR to evaluate whether it exhibits the canonical structural scaffold with 12 α-helices, and then performed ligand docking to support the hypothesis that EcR has a role in 20E hormone signalling.

Results and discussion

Phylogenetic analysis of Halloween genes

Candidate A. pisum Halloween genes were obtained from AphidBase (http://www.aphidbase.com/aphidbase/) by TBLASTN using the amino acid sequences of the Apis mellifera and T. castaneum orthologues. We found three candidates for A. pisum spook (Ap-spo1, Ap-spo2 and Ap-spo3), one candidate each for phantom (Ap-phm), disembodied (Ap-dib) and shadow (Ap-sad), and three candidates for shade (Ap-shd1, Ap-shd2 and Ap-shd3). The AphidBase IDs and cross-reference numbers are shown in Table 1. The expression of these predicted sequences was confirmed by reverse transcriptase-PCR (RT-PCR) (data not shown).

Table 1.  Halloween genes identified in Acyrthosiphon pisum. GenBank IDs of the Drosophila melanogaster, Apis mellifera and Tribolium castaneum orthologues are also included

Name | Function | D. melanogaster (Dm) | T. castaneum (Tc) | A. mellifera (Am) | AphidBase ID | Refseq | identity %
CYP307A1/2, spook / spookier | Unknown | AF484415 / NM_001110990 | AAJJ01000951 |  | ACYPI001519 (Ap-spo1) | XM_001945726 | 43/45, 52

In the case of T. castaneum CYP307A1/2 and CYP307B1, the contigs on which the gene is found are given; the same applies to Apis mellifera CYP307B1. On the right, identity percentages between the A. pisum sequences and the respective orthologues are also presented.

The predicted sequences have open reading frames (ORFs) encoding the putative proteins Ap-Spo1 (518 amino acids), Ap-Spo2 (528 amino acids), Ap-Spo3 (507 amino acids), Ap-Phm (492 amino acids), Ap-Dib (493 amino acids), Ap-Sad (443 amino acids), Ap-Shd1 (518 amino acids), Ap-Shd2 (518 amino acids) and Ap-Shd3 (505 amino acids). The size of these candidates is consistent with that of typical CYP products (approx. 500 amino acids). Alignment of the A. pisum Halloween gene candidates with those of other insect orders (Lepidoptera, Diptera, Hymenoptera and Coleoptera) shows high conservation of the insect P450 motifs (helix-C, helix-I, helix-K, PERF-motif and heme-binding domain) (Fig. S1A–E). For Spo, the helix-C and helix-I structures that are usually typical for P450 proteins were not well conserved, but this is consistent with the Spo proteins of other members of the class Insecta. Only the heme-binding domain of Ap-Shd3 differs significantly from that of the other insect orthologues: Ap-Shd3 completely lacks the sequence of the heme-binding domain, which means the protein may not be functional as a P450 enzyme. RT-PCR showed, however, that the gene is transcribed in the pea aphid (data not shown). This leads to the hypothesis that the protein might have functions in the pea aphid other than those attributed to the Shd proteins so far.
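
A check for the heme-binding signature of the kind described above can be sketched as a simple pattern search. This is only a minimal illustration: the consensus used here (FxxGxRxCxG, with the cysteine acting as the heme ligand) is the commonly cited P450 signature and is an assumption, not the exact criterion used in this study, and both toy sequences are hypothetical.

```python
import re

# Assumed consensus for the P450 heme-binding loop: F-x-x-G-x-R-x-C-x-G.
# This pattern is an illustrative assumption, not the authors' criterion.
HEME_MOTIF = re.compile(r"F..G.R.C.G")


def has_heme_binding_motif(protein_seq: str) -> bool:
    """Return True if the assumed heme-binding consensus occurs in the sequence."""
    return HEME_MOTIF.search(protein_seq.upper()) is not None


if __name__ == "__main__":
    # Hypothetical toy fragments, not real Ap-Shd sequences.
    with_motif = "MKTLLPFGAGPRICLGQALA"
    without_motif = "MKTLLPQALAEMELKLLVS"
    print(has_heme_binding_motif(with_motif))
    print(has_heme_binding_motif(without_motif))
```

A sequence that fails such a check, as reported here for Ap-Shd3, would still need to be inspected in the full alignment, since a strict pattern can miss diverged but functional variants.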

The phylogenetic analysis shows two classes: the 2 Clan with Spo/Spok/Spot and Phm, and the Mito Clan with Dib, Sad and Shd (Fig. 2). In the A. pisum genome we detected two Spo-like products, Ap-Spo1 and Ap-Spo2. Both are on the same branch as the Spo orthologues of other species, and sequence comparison shows that Ap-Spo1 and Ap-Spo2 are quite similar to each other (85% identity). Ap-Spo3, however, is very different from Ap-Spo1 and Ap-Spo2, showing only 38% and 39% identity, respectively. Compared with other Halloween genes, Ap-Spo3 shows 33–42% identity with Spo/Spok orthologues and 44–47% identity with Spot orthologues, suggesting that Ap-Spo3 might be a Spot orthologue rather than a Spo orthologue. The phylogenetic analysis supports this hypothesis, since Ap-Spo3 and the Spot orthologues branch together (Fig. 2). Three Shd candidates were identified in A. pisum. Two of them, Ap-Shd1 and Ap-Shd2, are highly conserved (94% identity), suggesting that they could be duplicated genes, while the third candidate, Ap-Shd3, shows only 66% identity with both Ap-Shd1 and Ap-Shd2. Ap-Shd1 and Ap-Shd2 show 38–51% and 38–50% identity with Shd orthologues, respectively, while Ap-Shd3 exhibits 34–44% identity with those same Shd orthologues. As described above, Ap-Shd3 will probably not function as a P450 enzyme since it lacks the heme-binding domain, which is important for P450 enzyme activity. We can therefore assume that only Ap-Shd1 and/or Ap-Shd2 are likely to be responsible for converting E into 20E.
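
Identity percentages such as those quoted above are derived from pairwise alignments. A minimal sketch of one common way to compute them is given below; it assumes that the two sequences are already aligned and that gapped positions are excluded from the count, which may differ from the convention of the software used in this study. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the ungapped columns of two aligned sequences.

    Assumption: columns where either sequence has a gap ('-') are skipped.
    Other conventions (e.g. dividing by the full alignment length) exist.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == "-" or res_b == "-":
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy alignment: 5 identical residues out of 6 ungapped columns.
    print(round(percent_identity("ACDE-FG", "ACDQAFG"), 1))
```

Because the denominator depends on how gaps are treated, identity values from different tools are only comparable when the same convention is used throughout.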

Figure 2.

Phylogenetic tree of the Halloween genes. The tree was constructed using the neighbour-joining method with the full-length amino acid sequences. Bootstrap values >50, as percentages of 1000 replicates, are indicated on the tree. Aa: Aedes aegypti, Ag: Anopheles gambiae, Am: Apis mellifera, Ap: Acyrthosiphon pisum, Bm: Bombyx mori, Dm: Drosophila melanogaster, Ms: Manduca sexta, Tc: Tribolium castaneum.

Identification of the nuclear receptors in the genome of Acyrthosiphon pisum displays strong conservation in insects

All available gene prediction sets (Gnomon, Augustus, Genscan, GeneID) and all available A. pisum sequence data were used to identify the NRs in the pea aphid genome. The in silico detection of NRs in the genome is greatly facilitated by the strongly conserved DBD and LBD regions that characterize these NRs (Table 2). Blast searches were performed using the peptide sequences of all known NRs from D. melanogaster, Apis mellifera and T. castaneum. As a result, an initial 20 NR sequences were identified in the A. pisum genome, representing all seven NR subfamilies. Predicted mRNA sequences and gene models were manually edited where necessary using the Apollo Genome Annotation Curation Tool (Lewis et al., 2002). After further analysis, two NR0 sequences turned out to be duplicated Knirps-like (Kni-like) genes, which brings the total set of different NR genes to 19.

Table 2.  Nuclear receptors in Acyrthosiphon pisum

NuReBASE | Name | Product | Drosophila melanogaster | Tribolium castaneum | Apis mellifera | AphidBase ID | Refseq | Dm/Ap identity % (DBD, LBD) | Tc/Ap identity % (DBD, LBD) | Am/Ap identity % (DBD, LBD)
NR1D3 | Ecdysone-induced protein 75 | E75 | NP_524133 | TC_12440 | XP_393790 | ACYPI007773 | XM_001946050 | 95, 58 | 95, 74 | 95, 77
NR1E1 | Ecdysone-induced protein 78 | E78 | NP_524195 | TC_03935 | XP_396527 | ACYPI002307 | XM_001952697 | 96, 51 | 94*, 60 | 93, 60
NR1F4 | Hormone receptor like in 46 | HR3 | NP_788303 | TC_08909 | XP_392128.3 |  | (LOC100162388) | 97, 53 | 100, 71 | 100, 72
NR1H1 | Ecdysone receptor | EcR | NP_724456 | TC_12112 | NP_001091685.2 | ACYPI001692 | XM_001942632 | 88, 63 | 97, 75 | 99, 74
NR2A4 | Hepatocyte nuclear factor 4 | HNF4 | NP_476887.2 | TC_08726 |  | ACYPI009409 | XM_001946893 | 8874927888
NR2D1 | Hormone receptor like in 78 | HR78 | NP_524203 | TC_04598 | XP_392769 | ACYPI004234 | XM_001948276 | 87, 33 | 92, 44 | 93, 43
NR2E2 | Tailless | TLL | NP_524596 | TC_00441 | no orthologue | ACYPI009360 | XM_001945880 | 83, 26 | 83, 26 | 85, 26
NR2E3 | Hormone receptor 51 | HR51 | NP_725457 | TC_09378 | XP_396999.3 | ACYPI007601 | XM_001948835 | 96, 67 | 94, 76 | 94, 75
NR2E4 | Dissatisfaction | DSF | NP_477140 | TC_01069 | XP_624265.2 |  | (LOC100161040) | 93, 74 | 91, 82 | 84, 62
NR2E5 | Hormone Receptor 83 | HR83 | NP_649647 | TC_10460 | no orthologue | ACYPI48102 |  | 7184147628
NR2F3 | Seven up | SVP | NP_731681 | TC_01722 | XP_392402 | ACYPI005513 | XM_001943986 | 96, 97 | 94, 96 | 96, 99
NR3B4 | Estrogen-related receptor | ERR | NP_729340 | TC_09140 | NP_001155988.1 | ACYPI009262 | XM_001948964 | 92, 52 | 95, 57 | 95, 58
NR4A4 | Hormone receptor like in 38 | HR38 | NP_477119 | TC_13146 | XP_623987 | ACYPI003909 | XM_001944676 | 99, 73 | 97, 74 | 96, 71
NR5A3 | Fushi tarazu transcription factor 1 | FTZ-F1 | NP_730359 | TC_02550 | no orthologue | ACYPI003708 | XM_001945429 | 99, 67 | 99, 77 | 99, 74
NR5B1 | Hormone receptor like in 39 | HR39 | NP_476932 | TC_14986 | XP_396918 | ACYPI006350 | XM_001946992 | 83, 77 | 88, 82 | 89, 85
NR6A1 | Hormone receptor 4 | HR4 | NP_001033823 | TC_00543 | XP_394401.3 | ACYPI008092 | XM_001945691 | 92, 58 | 91, 76 | 92, 74
NR0A2 | Knirps-like-1 | KNRL-1 | NP_788552 | TC_03413 | XP_395932.2 | ACYPI49096 |  | 91, no LBD | 95, no LBD | 91, no LBD
 | Knirps-like-2 | KNRL-2 |  |  |  |  | (LOC100168450) |  |  | 
NR0A3 | Eagle | EG | NP_524206 | TC_03409 | no orthologue | ACYPI48166 |  | 87, no LBD | 95, no LBD | 92, no LBD
NR1J1 | Hormone receptor 96 | HR96 | NP_524493 | TC_10645 | XP_624213.2 | not present | not present |  |  | 
NR2E6 | Photoreceptor specific NR | ApPNR | no orthologue | TC_13148 | XP_624042.1 | not present | not present |  |  | 
NR0A1 | Knirps | KNI | NP_524187 | no orthologue | no orthologue | not present | not present |  |  | 

  • Genbank IDs of D. melanogaster and Apis mellifera orthologues are shown, along with the BeetleBase IDs of the T. castaneum NRs. On the right, identity percentages between the LBD of A. pisum NRs and the LBD of D. melanogaster, T. castaneum and Apis mellifera orthologues are also presented.

  • * E78 DBD of T. castaneum was incomplete.

  • Sequences were not available.

RT-PCR was used to confirm the presence of the predicted NR mRNAs in the transcriptome of the pea aphid (data not shown). All NR mRNAs were detected by RT-PCR, except for that of the HR83 gene. We did not obtain a conclusive result for this gene despite using several different primer pairs. Some of them yielded clear single bands, but the fragment size did not match the size predicted from the annotated gene. This means either that we amplified a wrong fragment or that the exon/intron prediction of the gene is incorrect. Sequencing of this fragment should give more information about its identity and about the transcription of this gene. These results indicate that the complete set of NRs found in the A. pisum genome, except HR83, is transcribed and that none of these genes is a pseudogene.

Table 2 presents the pea aphid orthologues for each of the previously annotated D. melanogaster (Adams et al., 2000), Apis mellifera (Velarde et al., 2006), B. mori (Cheng et al., 2008) and T. castaneum (Bonneton et al., 2008) NRs. Similar numbers of NRs were found in these five insect genomes. All NRs are also structurally very similar to their orthologues: all possess a DBD and an LBD, except for members of the NR0 subfamily, which contain only a DBD. As could be predicted from previous analyses of NRs, pairwise alignments of the conserved domains of D. melanogaster and A. pisum NRs show very high identity (71–99%) for the DBDs, while the LBDs are more divergent (26–97%; with 77% identity for HR39 being the second highest). The most divergent NRs are HR83 (NR2E5) and TLL (NR2E2), while SVP (NR2F3) shows the least divergence.
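
The comparison of DBD and LBD conservation can be reproduced from the D. melanogaster/A. pisum identity values in Table 2 with a few lines of code. The sketch below is illustrative only: it assumes that each pair of values in the table is ordered as (DBD, LBD), it leaves out HNF4 and HR83 as well as the NR0 members (which have no LBD), and the variable names are hypothetical.

```python
# Dm/Ap identity percentages read from Table 2, assumed to be ordered as
# (DBD %, LBD %). HNF4, HR83 and the NR0 members are left out of this sketch.
DM_AP_IDENTITY = {
    "E75": (95, 58), "E78": (96, 51), "HR3": (97, 53), "EcR": (88, 63),
    "HR78": (87, 33), "TLL": (83, 26), "HR51": (96, 67), "DSF": (93, 74),
    "SVP": (96, 97), "ERR": (92, 52), "HR38": (99, 73), "FTZ-F1": (99, 67),
    "HR39": (83, 77), "HR4": (92, 58),
}

# Rank the receptors from most to least conserved LBD.
by_lbd = sorted(DM_AP_IDENTITY, key=lambda name: DM_AP_IDENTITY[name][1],
                reverse=True)

# Receptors whose DBD is more conserved than their LBD.
dbd_more_conserved = [name for name, (dbd, lbd) in DM_AP_IDENTITY.items()
                      if dbd > lbd]

if __name__ == "__main__":
    print("highest LBD identity:", by_lbd[:2])
    print("lowest LBD identity:", by_lbd[-1])
    print("DBD more conserved than LBD in",
          len(dbd_more_conserved), "of", len(DM_AP_IDENTITY), "receptors")
```

In this subset, SVP is the only receptor whose LBD is at least as conserved as its DBD, which is the pattern discussed in the following paragraph.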

In general, these results indicate that NRs are very strongly conserved among insects, including outside the holometabolous insect group. The pea aphid NRs show identity percentages to their orthologues similar to those reported in earlier NR annotation publications, where the NRs of T. castaneum and Apis mellifera were compared with those of D. melanogaster, even though larger differences would be expected from the evolutionary distances between these species. The NRs that are part of the 20E regulatory cascade, the ‘early’ gene E75 (NR1D3) and the ‘early-late’ genes HR3 (NR1F4), HR4 (NR6A1), HR38 (NR4A4) and E78 (NR1E1), as well as FTZ-F1 (NR5A3), also show the same degree of conservation as reported for other species, demonstrating that all the main NR members of this cascade are present in the pea aphid. One remarkable observation was the extremely high conservation of the SVP-LBD among insects, much higher than for the other NRs (97%, 96% and 99% compared with the D. melanogaster, T. castaneum and Apis mellifera orthologues, respectively). This may suggest that the structure of SVP, the insect orthologue of the vertebrate chicken ovalbumin upstream transcription factor (COUP-TF), is critical to its function and is under strong selective pressure against amino acid replacements in the LBD of the molecule. In D. melanogaster, where two isoforms of this protein are expressed, SVP has multiple reported functions. It is required for the development of four of the eight photoreceptors that develop in the ommatidia of the eye (Hiromi et al., 1993; Begemann et al., 1995; Kramer et al., 1995), it is a key component in the control of cell proliferation in Malpighian tubules (Kerber et al., 1998) and it also has an important role as a regulator in the development of neuroblasts by acting upon the Hunchback/Krüppel switch necessary for neuroblast differentiation (Kanai et al., 2005). In Ae. aegypti, this protein also affects vitellogenesis by acting as a negative regulator of ecdysone receptor complex-mediated transactivation in the fat body (Miura et al., 2002).

Three NRs that were previously found in other insect species seem to be missing in the A. pisum genome, namely the NR1 subfamily member HR96, the NR0 subfamily member Knirps (Kni) and the NR2 subfamily member PNR-like (NR2E6).

HR96 is an orphan receptor belonging to the NR1 subfamily (NR1J1). It is closely related to EcR itself and is believed to be related to the vertebrate vitamin D receptor (VDR), PXR and CAR, all of which bind a wide variety of xenobiotics (Laudet, 1997). HR96 has been shown to play a role in the response of D. melanogaster to xenobiotics (King-Jones et al., 2006), but a function in development or metamorphosis has not yet been reported. This NR is part of the 20E signalling cascade, and it is known that this ecdysteroid-induced NR can bind to the hp27 20E response element. This suggests that HR96 can compete with EcR-Usp for binding to a common set of target sequences (Fisk & Thummel, 1995), but since this NR has no known hormone ligand, it is difficult to speculate on its actual function in the ecdysteroid cascade. The absence of this gene suggests that its role in this signalling pathway is redundant in aphids or has been taken over by another protein or NR.

A member of the NR2 subfamily that was initially identified in the honey bee, NR2E6, an orthologue of the vertebrate photoreceptor-cell-specific nuclear receptors (PNRs), is missing in the A. pisum genome. This gene is also missing in the Drosophila genomes, although it has been identified in the T. castaneum genome. The absence of NR2E6 in the A. pisum and Drosophila genomes is a secondary loss in these lineages. A function for NR2E6 in the development of the compound eye has been proposed based on mRNA in situ localization in the developing compound eyes of Apis mellifera (Velarde et al., 2006). The fact that A. pisum shares with D. melanogaster the vast majority of the genes involved in compound eye differentiation (Shigenobu et al., 2009) suggests that, in the lineages where this gene has been retained, it potentially regulates lineage-specific differences in compound eye architecture.

The third missing NR in Acyrthosiphon is Knirps (NR0A1), whereas we identified two paralogues of Knirps-like and one orthologue of Eagle. The T. castaneum, Apis mellifera and B. mori genomes also seem to lack a Knirps orthologue, as seen in the phylogenetic tree of the NR0 subfamily members. In Drosophila, Knirps has been characterized as encoding a transcriptional repressor important for the segmentation pathway (Nauber et al., 1988). The two other genes in the NR0 group, Knirps-like (knrl) and Eagle (egon), are present in the A. pisum genome. Analysis of these genes in the honey bee suggested no direct involvement in segmentation (Dearden et al., 2006), in contrast to the situation in Drosophila. In T. castaneum, however, Knirps-like has been characterized as having specific functions during head segmentation (Cerny et al., 2008). Our A. pisum analysis supports the notion that these genes have been independently duplicated in different insect lineages. At least in Dipterans and Coleopterans, Knirps and Knirps-like have retained an ancestral role during segmentation, which has likely been lost in honey bees and pea aphids. Knirps-like and Eagle may also function as transcriptional repressors, but it remains to be determined in which pathways they participate.

Further phylogenetic analysis of pea aphid nuclear receptors within the phylum Arthropoda

Besides the phylogenetic analysis of the NR0 subfamily (Fig. 3), NRs from the six other subfamilies (NR1–NR6) were also examined by phylogenetic analysis and compared with NRs from several species representing the major insect orders, such as Lepidoptera, Diptera, Hymenoptera and Coleoptera, and also from Crustacea and Arachnida (Fig. 4). This analysis showed that many of the NRs of A. pisum are closely related to those of the human louse (Pediculus humanus), as has been observed for genes throughout the genome. For a number of NRs, the T. castaneum sequences also showed very high similarity to both the A. pisum and P. humanus sequences, even though the red flour beetle is a member of the Endopterygota, while the pea aphid and the human louse belong to the Paraneoptera.

Figure 3.

Phylogenetic tree of the insect NR0 subfamily members, showing the clustering of the novel Acyrthosiphon pisum genes with their respective orthologues. Novel A. pisum members of this group are highlighted in bold. The tree was rooted using the NR0B1 and NR0B2 vertebrate sequences as outgroup. The tree was constructed using the neighbour-joining method with the maximum length of sequence, resulting in 160 complete aligned sites. Support for the branches, when present, is indicated as a percentage of 1000 bootstrap replicates of neighbour-joining. Am: Apis mellifera, Ap: Acyrthosiphon pisum, Dm: Drosophila melanogaster, Dpse: Drosophila pseudoobscura, Tc: Tribolium castaneum.

Figure 4.

Phylogenetic trees of EcR (A), E78 (B), HR39 (C) and ERR (D). The trees were constructed using the neighbour-joining method with the amino acid sequences of the LBDs of the selected sequences. Bootstrap values >50, as percentages of 1000 replicates, are indicated on the trees.

When we look at the phylogenetic trees in Fig. 4 in detail, we notice that the A. pisum NRs resemble the T. castaneum and Apis mellifera orthologues much more closely than the Diptera and Lepidoptera NRs, which often cluster together in a separate branch, even branching off before the Crustacea and Arachnida. This deviation from the normal topology, as shown in the trees of Fig. 4A and B for EcR and E78, respectively, is due to long-branch attraction caused by an acceleration of the evolutionary rate in the Mecopterida lineage (Diptera + Lepidoptera). This is consistent with the earlier findings of Bonneton et al. (2008), who found that some NRs in Mecopterida species, including EcR and E78, have undergone an increase in evolutionary rate. Other phylogenetic trees also confirm their results (data not shown).

In order to distinguish the different NR2 subfamily genes found in the genome, we also constructed a phylogenetic tree for this entire subfamily. The NR2 subfamily tree (Fig. 5) clearly shows that the PNR orthologue found in T. castaneum and Apis mellifera is missing in the pea aphid. Genes cluster together according to the group (A–F) to which they belong.

Figure 5.

Phylogenetic tree of the NR2 subfamily members with NR2A, NR2B, NR2D, NR2E and NR2F. The tree was constructed using the neighbour-joining method with the full-length protein sequences of the NR2 subfamily members. The PNR-like NRs found in Tribolium castaneum and Apis mellifera are also included, although no orthologue could be found in the pea aphid. Bootstrap values >50, as percentages of 1000 replicates, are indicated on the tree. Am: Apis mellifera, Ap: Acyrthosiphon pisum, Dm: Drosophila melanogaster, Tc: Tribolium castaneum.

Elements of the 20-hydroxyecdysone regulatory cascade

The 20E signalling cascade, as mentioned earlier in this work, is involved in moulting/metamorphosis and development. NRs play a very important role in this signalling pathway. Binding of 20E to the EcR-Usp heterodimer is the start of this signal. This complex, after binding to the hormone, will act as a transcription factor, immediately inducing expression of a number of ‘early’ genes, including the NRs E75 and HR96. These ‘early’ gene products will then be responsible for the upregulation of a set of ‘early-late’ genes, including the NRs HR3, HR4, E78 and HR39. Through FTZ-F1, this signal will then be passed on to induce expression of the ‘late’ genes (Fig. 1; Table 2). So far, most attention in this field has gone to holometabolous insects, which undergo a pupal metamorphosis stage. No extensive set of NRs for a hemimetabolous insect has been identified until now. Even though the pea aphid still undergoes several larval stages, it is possible that there are differences between the moulting processes of hemi- and holometabolous insects.

Most of the NRs involved in the 20E regulatory cascade proved to be present not only in the genome of the pea aphid but also in its transcriptome, indicating that they are transcribed. Only the HR96 gene, which was discussed above, is missing from this set of ecdysone-inducible NRs. Since its function in the moulting/metamorphosis process is still unclear, it is very difficult to speculate about the implications for the entire pathway.

EcR and Usp, the two NRs at the base of the 20E signalling cascade, were cloned and sequenced in order to confirm the annotation and sequence of both genes (Fig. S2A, B). The primers used to amplify the fragment spanning most of the cDNA are listed in Table S1. The EcR of A. pisum has the typical DBD and LBD found in its orthologues in other insects. The EcR-DBD shows the typical C4 zinc finger domains, as well as the P-box, the D-box and the A/T-box. For the EcR-LBD we observed strong conservation, with 63%, 75% and 74% identity compared with the Drosophila, Tribolium and Apis orthologues, respectively. Furthermore, the typical structure with 12 α-helices is well conserved. Both retrieved sequences confirmed the in silico analysis. Two other important NRs in the 20E signalling cascade, the ‘early’ gene E75 and the ‘early-late’ gene HR3, were also cloned and partially sequenced in order to confirm their presence in the transcriptome (Fig. S2C, D).

3D-modelling of the ligand binding pocket of Ap-EcR and ligand docking

The 3D model built for Ap-EcR-LBD exhibits the canonical structural scaffold of the EcR-LBDs, made of 12 α-helices associated with a short hairpin of two antiparallel β-strands (Fig. 6A). In addition, docking of 20E into the hormone-binding groove of Ap-EcR-LBD revealed a binding scheme similar to that found for other EcR-LBDs (e.g. from the beetles Leptinotarsa decemlineata, Tenebrio molitor and Anthonomus grandis) (Billas et al., 2003; Soin et al., 2009). Upon docking, the alkyl chain of the hormone becomes inserted into one of the two pockets located at the bottom of the hormone-binding groove (Fig. 6B). A network of nine hydrogen bonds connects the hormone to residues Glu20, Met56, Thr57, Ala112 and Tyr122, which form the binding groove (Fig. 6C). Stacking interactions with the aromatic residues Phe111 and Trp238 complete the interaction.

Figure 6.

(A) Ribbon diagram of the modelled Ap-EcR-LBD. The 12 α-helices and the two β-strands forming the 3D-structure are labelled and differently coloured. N and C indicate the N-terminal and C-terminal ends of the polypeptide chain, respectively. (B) Clipping plane across the ecdysone-binding groove showing the insertion of the alkyl chain of 20-hydroxyecdysone (20E) (represented in pink stick) in one (black star) of the two pockets located at the bottom of the groove. (C) Network of hydrogen bonds (black dotted lines) anchoring 20E (pink stick) to amino acid residues forming the hormone-binding groove of Ap-EcR-LBD. Aromatic residues involved in stacking interactions with 20E are coloured orange.

Experimental procedures

Annotation of Halloween and nuclear receptor genes

The 1.0 release of the A. pisum genome was used as the basis for the bioinformatic analysis. Putative Halloween gene sequences were identified by TBLASTN searches, using the known orthologues from Apis mellifera and T. castaneum as queries against the complete scaffold collection of the pea aphid genome.

NR protein sequences from D. melanogaster, T. castaneum and Apis mellifera were first used in BLASTP searches against the NCBI Gnomon version 1 predicted protein sequences to find putative A. pisum orthologues. When no orthologue was found in the pea aphid Gnomon protein data set, we searched the Acyr 1.0 assembly of the pea aphid genome (International Aphid Genomics Consortium, 2010, main paper) for homologous sequences using TBLASTN. After identification and localization in the genome, genes were examined and, if necessary, manually edited using the Apollo Genome Annotation Curation Tool (Lewis et al., 2002). This editing was based on alignments of the D. melanogaster, T. castaneum and Apis mellifera orthologues, together with the output of several gene prediction programs (Gnomon, Augustus, Genscan and GeneID).
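The two-tier search strategy described above (predicted proteins first, genome assembly as a fallback) can be sketched in Python as follows. This is only a schematic of the decision logic, not the pipeline actually used: the `Hit` type, the function names, the two search callables and the E-value cutoff of 1e-5 are all assumptions introduced for illustration, since no threshold is reported here.

```python
from typing import Callable, NamedTuple, Optional, Sequence, Tuple


class Hit(NamedTuple):
    subject: str   # identifier of the matched protein or scaffold
    evalue: float  # expectation value reported by the search


# A search is any callable mapping a query name to a list of hits,
# e.g. a wrapper around parsed BLASTP or TBLASTN output (hypothetical).
Search = Callable[[str], Sequence[Hit]]


def best_hit(hits: Sequence[Hit], max_evalue: float) -> Optional[Hit]:
    """Return the lowest E-value hit that passes the cutoff, if any."""
    passing = [h for h in hits if h.evalue <= max_evalue]
    return min(passing, key=lambda h: h.evalue) if passing else None


def find_candidate(
    query: str,
    protein_search: Search,
    genome_search: Search,
    max_evalue: float = 1e-5,  # assumed cutoff, not taken from the paper
) -> Optional[Tuple[str, Hit]]:
    """Try the predicted protein set first, then fall back to the genome."""
    hit = best_hit(protein_search(query), max_evalue)
    if hit is not None:
        return "protein_set", hit
    hit = best_hit(genome_search(query), max_evalue)
    if hit is not None:
        return "genome", hit
    return None  # no candidate found; the gene may be absent
```

A query that returns `None` from both tiers would correspond to a candidate missing gene, although in practice such a result still needs manual inspection before absence is concluded.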

Phylogenetic analysis

Full-length amino acid sequences for the Halloween gene orthologues in Apis mellifera, T. castaneum, D. melanogaster, Ae. aegypti, An. gambiae, M. sexta and B. mori and for the nuclear receptors of D. melanogaster, T. castaneum and Apis mellifera were collected from the GenBank database. The LBD and DBD sequences of B. mori, An. gambiae, Ae. aegypti, Culex quinquefasciatus, P. humanus corporis and Daphnia magna were retrieved by BLAST searches of A. pisum LBD sequences against the GenBank database or, if no GenBank entry was present, against the sequenced genome of the species. The chosen NR sequences were then aligned with CLUSTALW2/CLUSTALX2 (Larkin et al., 2007). The trees were constructed by the neighbour-joining method using MEGA4 software (Tamura et al., 2007). Bootstrap analysis with 1000 replicates for each branch position was used to assess support for the nodes in the tree (Felsenstein, 1985).
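The resampling step behind a bootstrap analysis can be illustrated with a short Python sketch. It shows only how pseudo-alignments are generated by drawing alignment columns with replacement; it is not the MEGA4 implementation, and the function names, the seed and the toy alignment are hypothetical. In a full analysis, a tree would be built from each replicate and node support would be the percentage of replicate trees containing that node.

```python
import random
from typing import Dict, List


def bootstrap_alignment(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """Resample alignment columns with replacement (one bootstrap replicate)."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    # Draw as many column indices as the alignment has columns.
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        name: "".join(seq[i] for i in columns)
        for name, seq in alignment.items()
    }


def bootstrap_replicates(
    alignment: Dict[str, str], n_replicates: int = 1000, seed: int = 0
) -> List[Dict[str, str]]:
    """Generate n_replicates pseudo-alignments from one alignment."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Toy alignment with made-up sequences.
toy = {"seq1": "ACDE", "seq2": "ACDF", "seq3": "GCDE"}
replicates = bootstrap_replicates(toy, n_replicates=1000)
```

Because whole columns are resampled, every sequence in a replicate keeps the same column order as the others, which preserves the site-wise correspondence that distance estimation relies on.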

Confirmation of transcription of the Halloween and NR genes

The presence of these transcripts in A. pisum RNA was examined by RT-PCR. The pea aphids were taken from a continuous colony in the Laboratory of Agrozoology at Ghent University. Total RNA was extracted from two samples, a mixture of different stages of A. pisum and a collection of newborn aphids only, using TRI Reagent (Sigma, Bornem, Belgium), based on the single-step liquid phase separation method reported by Chomczynski & Sacchi (1987). Then, cDNA was synthesized from 1 µg of this RNA in a 20 µl reaction using the First Strand cDNA synthesis kit (Roche, Berlin, Germany) according to the manufacturer's instructions. Both cDNA samples (newborn only and a mixture of stages) were used in these RT-PCR experiments. Primers were designed using Primer3 software (Rozen & Skaletsky, 2000) and are listed in Table S1.

Cloning and sequencing of the EcR, Usp, E75 and HR3 genes

Pea aphid EcR, Usp, E75 and HR3 were isolated by RT-PCR and subsequently sequenced. The same A. pisum cDNA as used for the RT-PCR experiments described above was used for the initial PCR reactions. The PCR products were then purified using the Cycle Pure kit (Omega Bio-Tek, Norcross, GA, USA) and ligated into a pGEM-T vector (Promega, Madison, WI, USA) according to the manufacturer's instructions. Afterwards, plasmids were transformed into competent Escherichia coli XL-1 Blue cells by heat shock, and the cells were plated out on a carbenicillin-containing LB agar plate. After 16 h of incubation, the colonies formed were checked by colony PCR; plasmids from several positive colonies were then purified using the Plasmid mini prep kit (Omega Bio-Tek) and sent for sequencing (Agowa, Berlin, Germany).

3D-modelling of the ligand binding pocket of Ap-EcR and ligand docking

Multiple amino acid sequence alignments were carried out with CLUSTAL-X (Thompson et al., 1997) using Risler's structural matrix for homologous amino acid residues (Risler et al., 1998). Molecular modelling of the EcR ligand-binding domain from the pea aphid (Ap-EcR-LBD; accession no. NP_001152831.1) was performed on a Silicon Graphics O2 R10000 workstation, using the programs InsightII, Homology and Discover3 (Accelrys, San Diego, CA, USA). The atomic coordinates of Tribolium Tc-EcR-LBD in complex with ecdysone (RCSB Protein Data Bank code 2NXX) (Iwema et al., 2007) were used to build the 3D model of the receptor. The high percentages of both identity (∼75%) and similarity (∼90%) that Ap-EcR-LBD shares with the template Tc-EcR-LBD allowed us to build a reasonably accurate 3D model. Steric conflicts were corrected during the model building procedure using the rotamer library (Ponder & Richards, 1987) and the search algorithm of the Homology program (Mas et al., 1992) to maintain proper side-chain orientation. An energy minimization of the final model was carried out by 150 cycles of steepest descent using the cvff forcefield of Discover. PROCHECK (Laskowski et al., 1993) was used to assess the geometric quality of the 3D model. In this respect, about 87% of the residues of the modelled Ap-EcR-LBD were assigned to the best-allowed regions of the Ramachandran plot. The remaining residues were located in the generously allowed regions of the plot, except for three residues (Asn36, Glu39 and Glu42), which occur in the non-allowed region (results not shown). Molecular cartoons were drawn with PyMol (W.L. DeLano, http://pymol.sourceforge.net). The fold recognition program Phyre (http://www.sbg.bio.ic.ac.uk/phyre/html/index.html) (Bennett-Lovsey et al., 2008), which also used 2NXX and structurally related proteins as templates, yielded a readily superposable 3D model for Ap-EcR-LBD. However, some discrepancies, essentially in the shape of the loops connecting the α-helical stretches, were observed between the Phyre model and our own modelled structure. Importantly, these discrepancies occur far from the groove responsible for the binding of ecdysone.
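To make the energy minimization step more concrete, the following Python sketch shows the general idea of a fixed number of steepest-descent cycles. It is a toy illustration only: it minimizes a simple harmonic function rather than a molecular forcefield, it is not the Discover implementation, and the step size, the gradient function and the starting coordinates are assumptions chosen for the example.

```python
from typing import Callable, List, Sequence


def steepest_descent(
    gradient: Callable[[Sequence[float]], List[float]],
    coords: Sequence[float],
    step: float = 0.1,   # assumed fixed step size
    cycles: int = 150,   # number of cycles, as in the text
) -> List[float]:
    """Move the coordinates against the energy gradient for a fixed number of cycles."""
    x = list(coords)
    for _ in range(cycles):
        grad = gradient(x)
        x = [xi - step * gi for xi, gi in zip(x, grad)]
    return x


def harmonic_gradient(
    x: Sequence[float],
    minimum: Sequence[float] = (1.0, -2.0),
    k: float = 1.0,
) -> List[float]:
    """Gradient of the toy energy E = k * sum((x_i - m_i)^2)."""
    return [2.0 * k * (xi - mi) for xi, mi in zip(x, minimum)]


# Start away from the minimum and relax towards it.
relaxed = steepest_descent(harmonic_gradient, [5.0, 5.0])
```

Steepest descent is typically used, as here, for a short initial relaxation that removes the worst steric clashes; it converges slowly near a minimum, which is why it is usually run for a limited number of cycles.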

Docking was performed with InsightII and Discover3, taking Tc-EcR-LBD in complex with ecdysone as the template. Clipping planes of Ap-EcR-LBD complexed to 20E were rendered with PyMol.


The authors are grateful for the support of the Special Research Fund of Ghent University and the Fund of Scientific Research (FWO-Vlaanderen, Belgium) to GS. PR acknowledges the financial support of Université Paul Sabatier and CNRS.