Stanley Dean Rider Jr, Department of Biochemistry and Molecular Biology, Wright State University, Dayton, OH 45435, USA. Tel.: + 1 937 775 3041; fax: + 1 937 775 3730; e-mail: email@example.com
Aphids display extraordinary developmental plasticity in response to environmental cues. These differential responses to environmental change may be due in part to changes in gene expression patterns. To understand the molecular basis of aphid developmental plasticity, we attempted to identify the chromatin-remodelling machinery in the recently sequenced pea aphid genome. We find that the pea aphid possesses a complement of metazoan histone-modifying enzymes with greater gene family diversity than that seen in a number of other arthropods. Several genes appear to have undergone recent duplication and divergence, potentially enabling greater combinatorial diversity among chromatin-remodelling complexes. The abundant aphid chromatin-modifying enzymes may facilitate the phenotypic plasticity necessary to maintain the complex aphid life cycle.
Aphid polyphenisms are sets of morphologies and behaviours among genetically identical aphids, each adapted to perform specific life functions. These adaptations include winged (alate) and wingless (apterous) body plans, as well as parthenogenetic viviparae and egg-laying oviparae, but other specialized morphologies may develop in some aphid species. Although male production may be considered in this context, males differ genetically from their progenitors because of a chromosome elimination step that occurs during oocyte maturation. Striking examples of aphid polyphenisms beyond dispersal and reproduction come from eusocial aphids, in which a soldier caste may be produced. Members of the genus Tuberaphis produce soldiers that kill invaders by injecting them with a soldier-specific venom, and soldiers from the genus Pseudoregma have modified front legs and well-developed horns for ripping invaders apart (Shingleton & Foster, 2000; Kutsukake et al., 2008). This diversity in morphology and behaviour exists among genetically identical individuals and is best explained as an epigenetic phenomenon.
Very little is known about the molecular mechanisms by which aphid developmental paths diverge to produce different morphologies. Some aphid phenotypes are established very early in development (Hardie & Lees, 1985; Shibao et al., 2003), perhaps during embryogenesis. Beyond gross morphological and behavioural differences, the aphid lifestyle, particularly that of apterous viviparae, requires individuals to adapt to a dynamic environment that may include changes in host plant metabolism, photoperiod, temperature, population density and the presence of predators and parasites. Individual aphid phenotypes and responses to environmental change are demonstrably associated with changes in gene expression (Kutsukake et al., 2004; Ghanim et al., 2006; Brisson et al., 2007; Le Trionnaire et al., 2007; Cortes et al., 2008). We predict that, in many cases, altered gene expression patterns may be attributable to modifications in chromatin structure.
The fundamental unit of chromatin is the nucleosome, a highly conserved repeating unit consisting of an octamer that contains two copies each of the four ‘core’ histone proteins, around which roughly 146–147 bp of DNA is wrapped. Both DNA methylation and the post-translational modification of histone proteins (particularly their N-terminal tails) strongly influence chromatin structure beyond the nucleosome and can affect the recruitment of transcription factors and other histone-modifying enzymes (Ashraf & Ip, 1998; Schulze & Wallrath, 2007). Details of the DNA methylation machinery in the pea aphid are presented in an accompanying paper by Walsh et al. (2010). Known histone modifications (see Fig. 1) include acetylation, methylation, ADP-ribosylation, phosphorylation and the addition of ubiquitin-like proteins (Strahl & Allis, 2000). Combinations of DNA and histone modifications are sometimes referred to as a ‘histone code’, and current understanding of how this code is interpreted is rudimentary compared with our understanding of the enzymes (chromatin-remodelling proteins) responsible for implementing individual histone or nucleosome modifications (Strahl & Allis, 2000).
Chromatin-remodelling proteins participate in a number of essential processes such as gene expression, DNA replication, chromosome morphology and recombination. These enzymes therefore play vital roles in shaping cellular identity and developmental fate. Some histone-modifying enzymes possess intrinsic histone or nucleosome modification activity that is often modulated by interactions with other proteins. Many chromatin-remodelling proteins are multi-subunit enzymes that may integrate multiple signals (Kingston et al., 1996). Depending on the context and the particular enzymes present, these complexes can add or remove the post-translational modifications associated with histone proteins (Fig. 1 and Supplemental Fig. S1). Moreover, individual components of the multi-subunit complexes may be interchanged to produce a highly diverse set of chromatin-remodelling machinery (Kingston et al., 1996; Simon & Kingston, 2009). Adding to this complexity, similar modifications can often be produced by structurally unrelated enzymes. For example, there are at least two unrelated superfamilies of histone acetyltransferases, three unrelated superfamilies of histone deacetylases and two unrelated superfamilies of histone demethylases. Here we describe some of the predicted chromatin-remodelling genes and proteins encoded in the recently assembled draft genome sequence (The International Aphid Genomics Consortium, 2010) of a developmentally plastic, hemimetabolous insect: the pea aphid, Acyrthosiphon pisum (Harris). The unique complement of histone-modifying enzymes encoded by this emerging model species may provide opportunities to understand how chromatin architecture responds to environmental change to produce different developmental outcomes.
Results and discussion
In general, the pea aphid possesses a large number of chromatin-remodelling genes (see Supplemental Table S1) when compared with other arthropods (Fig. 2 and Supplemental Table S2). As systematically detailed below, this relative increase in chromatin gene number is largely due to increases in genes coding for proteins involved in the methylation, acetylation and ubiquitination of histones. While many arthropod genomes are available for comparisons of putative orthologues, little biochemical or genetic evidence exists outside of Drosophila melanogaster to demonstrate function. Therefore, the bulk of the comparisons presented relate to the known functions of orthologues from D. melanogaster and humans.
The histone proteins H2A, H2B, H3 and H4 form the core particle of the nucleosome. Histones H3 and H4 are among the most highly conserved proteins in eukaryotes, but the other histones (H1, H2A and H2B) are often highly divergent between taxa. The linker histone (H1) is an important component of chromatin that binds outside the nucleosome core in the internucleosomal space and confers additional stability on nucleosomes by impeding their mobility (Pennings et al., 1994).
Despite a relatively steady rate of gene duplication in the pea aphid genome, our analysis indicates that the pea aphid has relatively few histone-coding loci. Over half of the histone-coding genes identified occur in uniquely organized clusters, as evidenced by six assembled scaffolds that contain combinations of different histone genes (Supplemental Fig. S2). An additional 13 scaffolds each encode a single histone gene. This dispersed organization of histone-coding genes in the pea aphid contrasts sharply with the large arrays of histone genes seen in the genomes of D. melanogaster and other invertebrates.
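The distinction drawn above between clustered and single-gene scaffolds can be made explicit with a small grouping step. The sketch below is a hypothetical illustration, not the pipeline used in this study: it assumes each histone gene has already been assigned to a scaffold and a histone type, treats a scaffold carrying at least two different histone types as a cluster, and uses invented toy annotations (`toy_genes`, `scaffold_A` and so on) rather than pea aphid data.

```python
from collections import defaultdict


def classify_scaffolds(gene_locations):
    """Split scaffolds into multi-type histone clusters and single-gene scaffolds.

    gene_locations maps a gene ID to a (scaffold ID, histone type) tuple.
    """
    by_scaffold = defaultdict(list)
    for scaffold, histone_type in gene_locations.values():
        by_scaffold[scaffold].append(histone_type)
    # Assumed definition: a cluster carries at least two different histone types.
    clustered = {s: t for s, t in by_scaffold.items() if len(set(t)) >= 2}
    # A singleton scaffold carries exactly one histone gene.
    singletons = {s: t for s, t in by_scaffold.items() if len(t) == 1}
    return clustered, singletons


def fraction_in_clusters(gene_locations):
    """Fraction of all histone genes that lie on clustered scaffolds."""
    clustered, _ = classify_scaffolds(gene_locations)
    n_clustered = sum(len(types) for types in clustered.values())
    return n_clustered / len(gene_locations)


# Hypothetical toy annotations; these are NOT pea aphid data.
toy_genes = {
    "gene_1": ("scaffold_A", "H2A"),
    "gene_2": ("scaffold_A", "H2B"),
    "gene_3": ("scaffold_A", "H3"),
    "gene_4": ("scaffold_B", "H3"),
    "gene_5": ("scaffold_B", "H4"),
    "gene_6": ("scaffold_C", "H2A.Z"),
    "gene_7": ("scaffold_D", "H3.3"),
}

clustered, singletons = classify_scaffolds(toy_genes)
print(sorted(clustered))   # ['scaffold_A', 'scaffold_B']
print(sorted(singletons))  # ['scaffold_C', 'scaffold_D']
print(round(fraction_in_clusters(toy_genes), 3))  # 0.714
```

Scaffolds that carry several copies of only one histone type fall into neither category under this definition; a real analysis would need an explicit rule for them.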
Unfortunately, the genomic organization and expression of histone genes have been examined in detail for only a few taxa, and there are large gaps in phylogenetic sampling. It therefore remains unclear whether variation in histone gene copy number and organization among taxa has any real biological consequence. Nonetheless, each of the anticipated core histone-coding genes was identified in the pea aphid genome, demonstrating conservation of the fundamental structural components of chromatin.
In addition to the proteins above, histone variants with specialized functions exist in many species and may be required, e.g., during meiosis, DNA replication, recombination, or DNA damage and repair (Ausio, 2006). In the pea aphid genome, a single-copy locus was identified that encodes H2A.Z, an evolutionarily conserved histone variant involved in many processes including chromosome segregation, gene regulation and nucleosome positioning. Another evolutionarily conserved histone gene was also identified: H3.3, the so-called ‘replacement histone’. Histone H3.3 is associated with actively transcribed genes, where it apparently replaces canonical histone H3 during transcription.
A clear homologue of the gene encoding the centromere-specific H3-like protein (cid/cenp-a) was not found. CENP-A in D. melanogaster is an essential component of the centromere and is required for proper chromosome segregation. Similar proteins have been identified in many other species, but this protein evolves fairly rapidly among taxa. However, one predicted protein (XP_001945390.1) has some homology to HCP-3, a Caenorhabditis elegans holocentric chromosome-binding protein that is analogous to centromeric H3. The predicted aphid protein contains a highly conserved domain of unknown function and is also related to a human protein (MAD2L1-binding protein) that interacts with components of the kinetochore, suggesting a link to centromere function.
We were also unable to clearly identify aphid orthologues of the genes encoding protamines (D. melanogaster mst35Ba and mst35Bb or mammalian prm1 and prm2), which are involved in spermatid chromosome condensation (Lewis et al., 2003). Likewise, an orthologue of the D. melanogaster male-specific transcript mst77F, which encodes a sperm-specific linker histone (Jayaramaiah Raja & Renkawitz-Pohl, 2005), could not be identified among the scaffolds or predicted proteins. A homologue of the Drosophila loquacious protamine RNA-binding protein is probably present (ACYPI002395; XP_001942593), but in other organisms these proteins are not restricted to binding protamine RNAs. An orthologue of HIRA was also identified (ACYPI005751; XP_001947756). In D. melanogaster, HIRA is essential for the repackaging of the sperm pronucleus following fertilization and protamine removal (Bonnefoy et al., 2007).
The pea aphid appears to depend on single loci for the production of highly conserved histone variants involved in several essential processes. However, a number of histone-like protein-coding loci present in D. melanogaster and humans were not identified in the draft pea aphid genome sequence. Our inability to identify centromeric H3, protamines or the sperm-specific H1-like protein may reflect the poor conservation of these proteins among divergent taxa or gaps in the aphid genome assembly. On the other hand, aphids possess holocentric chromosomes, and some invertebrates package sperm chromatin using histones. It is thus possible that the pea aphid possesses highly divergent genes for these histone-like proteins, or uses a system different from the one anticipated.
This set of enzymes catalyses the addition of acetyl groups to available lysine residues on core histones, a process thought to create a chromatin architecture that is accessible to the transcriptional machinery. Known histone acetyltransferases (HATs) belong to one of two superfamilies: MYST-type acetyltransferases and GCN5-related N-acetyltransferases (GNAT) (Marmorstein, 2001; Utley & Cote, 2003). MYST HATs are named for the founding members MOZ, YBF2/SAS3, SAS2 and TIP60. D. melanogaster possesses six GCN5-related HATs and four MYST-type HATs (Supplemental Table S1). MYST family members from D. melanogaster include Males absent on the first (MOF), Enoki mushroom (ENOK) and Chameau (CHM). MOF acetylates histone H4 at lysine 16 (H4K16) and is partly responsible for dosage compensation (Hilfiker et al., 1997; Rea et al., 2007). ENOK promotes cell proliferation in the mushroom bodies of the brain, the wing and the ovaries (Scott et al., 2001), while CHM activity has been shown to suppress position-effect variegation (Grienenberger et al., 2002). Thus, insect MYST family members are involved in gene regulation and development, including processes associated with aphid polyphenisms (e.g. reproduction and wing development). Interestingly, the pea aphid genome contains not only homologues of the D. melanogaster MYST family members but also duplications of mof and enok. Such duplications appear to be unique to the pea aphid among the arthropod genomes compared (see Supplemental Table S2).
GCN5 in D. melanogaster (also known as PCAF) functions as a histone H3 acetyltransferase that regulates oogenesis and morphogenesis (Carre et al., 2005). In mammals, alternative splicing may produce a second protein that lacks the N-terminal PCAF domain but retains both the HAT domain and the bromodomain (Xu et al., 1998). The complement of HATs from the GNAT and MYST superfamilies is broadly similar between the pea aphid and D. melanogaster. However, whereas the D. melanogaster genome encodes a single PCAF/GCN5 orthologue, the pea aphid possesses at least two PCAF/GCN5 paralogues that are about 78% identical, giving the pea aphid more GNAT family members than D. melanogaster. The number may be even greater if, as in mammals, alternative splicing produces HAT enzymes lacking the PCAF domain. There is also an unusual PCAF domain-only protein that shares 39–42% identity with the PCAF domains of the other two pea aphid paralogues. Whether this PCAF domain-only protein is functional remains to be determined. If it is, it may represent a member of a previously unstudied protein family.
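Identity values such as the roughly 78% between the two PCAF/GCN5 paralogues depend on how aligned columns are counted. As a minimal sketch, assuming a pairwise alignment is already available and that gap-containing columns are excluded from the denominator (one common convention among several), percent identity could be computed as follows. The sequences in the example are toy strings, not aphid proteins.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over aligned columns, skipping columns with a gap.

    Assumes both strings come from the same pairwise alignment.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # gap columns are excluded from the denominator
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0


# Toy aligned fragments, not real PCAF/GCN5 sequences.
print(round(percent_identity("MKT-LLA", "MKSQLLA"), 1))  # 83.3
```

Dividing by the full alignment length, or by the length of the shorter sequence, would give a lower value for the same pair, so identity figures are only comparable when computed the same way.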
Histone deacetylases (HDACs) remove the acetyl groups added by HATs and are represented by three structurally unrelated enzyme superfamilies. HDACs are generally associated with a chromatin architecture that represses gene expression. The HDAC superfamilies present in animals are those related to yeast reduced potassium dependency 3 (RPD3) and Silent information regulator 2 (SIR2; its members are also known as sirtuins) (Gray & Ekstrom, 2001). The third family (HD2, originally discovered in Zea mays) appears to be present only in plants and some protozoa (Lusser et al., 1997; Aravind & Koonin, 1998). All three families have been shown to regulate gene expression and developmental identity. The D. melanogaster genome encodes five RPD3-type and five SIR2-type histone deacetylases (Frye, 2000; Foglietti et al., 2006). The RPD3 superfamily includes diverse proteins that can be grouped into families based on domain architecture and further grouped into broader classes based on the sequence motifs present in their HDAC domains (Gregoretti et al., 2004). The human genome includes a few families that are absent from the D. melanogaster genome but present in some other arthropods. Thus, a comparison of RPD3 family members in the pea aphid would not be complete without considering types that may have been lost from the Drosophila lineage. A comparison of the HDAC proteins from humans, D. melanogaster and the pea aphid is presented in Fig. 3.
The pea aphid genome encodes nine RPD3-type HDAC proteins and five sirtuins. Three of these proteins belong to families (HDAC8 and HDAC10) that are represented in the human genome but not in D. melanogaster. On the other hand, the pea aphid appears to lack a class IV HDAC, whereas dipterans and coleopterans appear to possess this unusual type of HDAC (see Supplemental Table S2). Pea aphid HDAC duplications include three paralogues related to D. melanogaster RPD3 and two HDAC8 paralogues. Interestingly, HDAC8 appears to have been lost in a number of arthropod lineages, and currently the only other arthropod with more than one HDAC8 appears to be Aedes aegypti (Supplemental Table S2). The putative aphid HDAC10 protein is more similar in sequence to aphid HDAC6 (which apparently lacks a zinc finger) than to other HDAC10 proteins, but its domain architecture is much more indicative of an HDAC10 (see Supplemental Fig. S3). These aphid-specific additions to the histone acetylation pathway may have important consequences for chromatin structure and gene regulation.
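The family-size comparison above reduces to simple arithmetic on locus counts. The sketch below uses only the counts stated in the text (five RPD3-type and five SIR2-type deacetylases in D. melanogaster; nine and five in the pea aphid). The helper `compare_families` is a hypothetical name, and the fold difference is only one of several ways to express an expansion.

```python
# Locus counts as stated in the text for the two deacetylase superfamilies.
HDAC_COUNTS = {
    "RPD3-type": {"D. melanogaster": 5, "A. pisum": 9},
    "SIR2-type": {"D. melanogaster": 5, "A. pisum": 5},
}


def compare_families(counts, reference="D. melanogaster", query="A. pisum"):
    """Return (extra loci, fold difference) of query relative to reference."""
    result = {}
    for family, by_species in counts.items():
        ref, qry = by_species[reference], by_species[query]
        result[family] = (qry - ref, qry / ref)
    return result


for family, (extra, fold) in compare_families(HDAC_COUNTS).items():
    print(f"{family}: {extra:+d} loci, {fold:.1f}x")
# RPD3-type: +4 loci, 1.8x
# SIR2-type: +0 loci, 1.0x
```

On these counts, the expansion is confined to the RPD3-type superfamily, which is the pattern the paragraph describes.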
Depending on the residue and the extent of methylation, histone methylation can lead to either repression or activation of gene expression. We assessed histone methylation in the pea aphid using indirect immunofluorescence of sexual and asexual germaria. Trimethylation of histone H3 at lysine 4 (H3K4), a marker of active transcription, is abundant in somatic follicle cells, in trophocyte/nurse cell nuclei of asexual and sexual germaria, and in young sexual and asexual oocytes (Figs 4A–F). This signal is greatly reduced in asexual syncytial blastoderm nuclei, suggesting that these cells may be largely transcriptionally silent. However, the H3K4 trimethylation signal reappears later in development (Figs 4M–P). Interestingly, the H3K4 trimethylation signal is absent or reduced in specific chromatin regions (arrows in Figs 4C,F). The significance of this remains unclear. However, trimethylation of histone H3 at lysine 9 (H3K9), a marker of constitutive heterochromatin (transcriptionally silent regions), has been detected at specific locations on several pea aphid chromosomes, including the sex chromosome (Mandrioli & Borsatti, 2007). We also observed H3K9 trimethylation signal in specific chromatin regions that appear more condensed (Figs 4G–L). These H3K9-trimethylated regions may be mutually exclusive with regions of H3K4 trimethylation, which would be consistent with the expected antagonistic functions of these chromatin modifications.
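The suggestion that H3K4 and H3K9 trimethylation mark mutually exclusive regions rests on visual inspection of the images. One way to quantify it, which is an assumption here and not the method used in this study, is to threshold each antibody channel into a binary mask and compute the Jaccard index (intersection over union) of the two masks. Values near 0 indicate little overlap. The intensity grids and the threshold below are invented for illustration.

```python
def binary_mask(intensities, threshold):
    """Mark pixels whose intensity reaches an (assumed) threshold."""
    return [[value >= threshold for value in row] for row in intensities]


def jaccard_index(mask_a, mask_b):
    """Intersection over union of two equally sized binary masks."""
    intersection = union = 0
    for row_a, row_b in zip(mask_a, mask_b):
        for a, b in zip(row_a, row_b):
            if a and b:
                intersection += 1
            if a or b:
                union += 1
    return intersection / union if union else 0.0


# Invented 3x3 intensity grids for the two antibody channels.
h3k4me3 = [[9, 8, 1],
           [7, 2, 1],
           [8, 1, 0]]
h3k9me3 = [[0, 1, 9],
           [1, 8, 9],
           [0, 7, 8]]

k4_mask = binary_mask(h3k4me3, threshold=5)
k9_mask = binary_mask(h3k9me3, threshold=5)
print(jaccard_index(k4_mask, k9_mask))  # 0.0 (no shared pixels)
print(jaccard_index(k4_mask, k4_mask))  # 1.0 (identical masks)
```

With real confocal data, the result would depend strongly on background subtraction, channel registration and the choice of threshold, so any such score should be read alongside the images.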
Three broad categories of histone-methylating enzymes exist: SET proteins, DOT1-like proteins and protein arginine methyltransferases (PRMTs). SET proteins derive their name from a conserved motif observed in the Suppressor of variegation 3-9 [SU(VAR)3-9], Enhancer of zeste [E(Z)] and Trithorax [TRX] genes of D. melanogaster, which are important regulators of development and cellular identity (Dillon et al., 2005). SET domain proteins often possess additional chromatin-binding domains such as PHD fingers, chromodomains and the HMG box (see Supplemental Fig. S4). The pea aphid may have as many as 25 SET domain-containing histone methyltransferases, possibly the largest number among sequenced arthropod genomes. Many of the aphid SET proteins appear to be related to an orthologue in D. melanogaster. Phylogenetic analyses of whole protein sequences produced a number of inconsistent results, so domain architecture provided a more reliable basis for assigning orthology relationships. Three of the 25 aphid loci encode partial SET domains, possibly indicating the presence of a few pseudogenes. Each of the remaining pea aphid SET domain loci that lack a clear orthologue in D. melanogaster appears to be a duplicated locus that has undergone some divergence or has lost specific domains related to chromatin function. The duplicated loci include five loci related to eggless, three of which do not encode the expected methyl-DNA-binding domain; four loci related to Su(var)3-9 (one of which lacks chromodomains), including one that was previously cloned but is absent from the pea aphid draft genome sequence; and four additional loci encoding proteins whose domain architecture is related to CG4565, a poorly characterized protein consisting of a single SET domain. The duplications of eggless orthologues are potentially interesting because loss of the single eggless gene in D. melanogaster causes arrest of oogenesis, implicating this gene in female fertility (Clough et al., 2007). The additional eggless-related loci may also play roles in the alternative reproductive pathways of female aphids (e.g. parthenogenesis and oogenesis).
The predicted aphid Set2 protein lacks an expected Sri-2 domain, but one of the two duplicated Set2-related proteins (with partial SET domains) possesses an Sri-2 domain. Thus, the ‘duplicated’ Set2-related proteins may represent portions of a mis-assembled Set2 orthologue. Interestingly, the putative trithorax-related orthologue has an N-terminus containing five plant homeodomain (PHD) fingers and a high mobility group (HMG) box, making its domain architecture more similar to that of vertebrate Mixed lineage leukemia 2 (MLL2) orthologues than to that of the D. melanogaster trithorax-related protein. Despite as many as nine duplications of SET-type histone methyltransferases, clear orthologues of PR-Set7 (also known as dSET8) and SU(VAR)4-20 were not found. Both proteins are involved in histone H4K20 methylation, a mark that plays important roles in heterochromatin formation in other species (Ebert et al., 2006). Although ASH1 is capable of H4K20 methylation in D. melanogaster (Beisel et al., 2002), it remains unclear how histone H4 lysine 20 is methylated in the pea aphid. The remaining histone lysine methyltransferases in the pea aphid are related to Disruptor of telomeric silencing-1 (DOT1), an H3K79 methyltransferase from yeast (Feng et al., 2002; van Leeuwen et al., 2002). Two DOT1-related proteins are present in both D. melanogaster and the pea aphid.
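Because whole-sequence phylogenies gave inconsistent results, orthology was assigned here on the basis of domain architecture. One simple way to compare architectures, sketched below under stated assumptions, is to treat each protein as an ordered list of domain labels and score pairs by the length of their longest common subsequence, normalized by total length. This scoring scheme is an illustrative choice rather than the procedure used in the study, and the reference architectures (`reference_1`, `reference_2`) are hypothetical. Only the query, with five PHD fingers followed by an HMG box and a SET domain, loosely echoes the trithorax-related protein described above.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two domain lists."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a, start=1):
        for j, y in enumerate(b, start=1):
            if x == y:
                table[i][j] = table[i - 1][j - 1] + 1
            else:
                table[i][j] = max(table[i - 1][j], table[i][j - 1])
    return table[len(a)][len(b)]


def architecture_similarity(a, b):
    """Order-aware similarity in [0, 1]: 2 * LCS / (len(a) + len(b))."""
    total = len(a) + len(b)
    return 2 * lcs_length(a, b) / total if total else 1.0


def best_match(query, references):
    """Name of the reference architecture most similar to the query."""
    return max(references, key=lambda name: architecture_similarity(query, references[name]))


# Query loosely modelled on the aphid trithorax-related protein described above.
query = ("PHD",) * 5 + ("HMG", "SET")
# Hypothetical, simplified reference architectures for illustration only.
references = {
    "reference_1": ("PHD", "PHD", "SET"),
    "reference_2": ("PHD",) * 5 + ("HMG", "SET", "PostSET"),
}
print(best_match(query, references))  # reference_2
```

Using a subsequence rather than a set comparison keeps domain order and copy number in the score, which is what distinguishes, for example, five PHD fingers from two.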
Histones may also be methylated at arginine residues by protein arginine methyltransferases (PRMTs). In D. melanogaster, nine PRMT homologues have been identified (designated DART1-DART9), including Capsuleen, which is also known as DART5 (Boulanger et al., 2004). Capsuleen is the D. melanogaster orthologue of human PRMT5, a histone arginine methyltransferase associated with gene repression. However, DART5 also methylates other proteins and plays a significant role in germ cell specification (Gonsalvez et al., 2006; Anne et al., 2007). The pea aphid possesses 11 PRMT-related loci, including orthologues of DART1, DART4 and DART7, all three of which are likely to dimethylate arginine residues on histones (see Supplemental Fig. S4). DART1 and DART4 (CARMER) have recently been shown to play roles in ecdysone-mediated responses in D. melanogaster (Cakouros et al., 2004; Kimura et al., 2008). The pea aphid genome encodes three proteins related to Capsuleen and two proteins related to DART8, although one of the latter may represent a pseudogene. Two of the three remaining putative PRMTs are related to one another, but none is closely related to any of the DART proteins; these may be novel PRMTs or another type of methyltransferase. The last of the putative PRMTs may be a distant orthologue of DART3. The number of PRMT-like proteins encoded in arthropod genomes is relatively constant (see Supplemental Table S2). However, the similarities between PRMTs in D. melanogaster and the pea aphid suggest that several aphid genes may be associated with reproductive development, as seen in D. melanogaster.
Histone demethylation, like histone methylation, may be associated with gene activation or repression. Histone arginine demethylation in mammals is known to occur through peptidyl arginine deiminase (PADI) enzymes (Wang et al., 2004), but no clear orthologues of these enzymes appear to be present in D. melanogaster or in any other arthropod for which DNA sequence is currently available. Consistent with these observations, no PADI homologues were identified in the draft pea aphid sequence. Two families of histone lysine demethylases are known: flavin-dependent amine oxidases and proteins containing a motif (JmjC) that was first identified in several chromatin-related proteins, including the mouse Jumonji protein (Takeuchi et al., 2006; Forneris et al., 2008). JmjC domain-containing proteins are further categorized by the presence of other domains, and those possessing an AT-rich DNA-interacting domain (ARID) have sometimes been designated JARID family proteins. D. melanogaster possesses two flavin-type histone demethylases, both of which have orthologues in the pea aphid, as well as two JARID proteins: little imaginal discs (LID) and the D. melanogaster orthologue of Jumonji (Sasai et al., 2007; Lee et al., 2009). The D. melanogaster genome encodes two additional JmjC proteins that also carry a JmjN domain, which has been implicated as essential for the demethylation reaction of those enzymes (Lloret-Llinares et al., 2008). The D. melanogaster genome also encodes roughly nine other JmjC domain proteins that lack JmjN domains, but these proteins have not been well characterized. Several of these proteins have orthologues in the pea aphid (Fig. 5). Two contain additional functional domains: either tetratricopeptide repeats (CG5640) or a zinc finger (CG11033). The pea aphid may have as many as 19 JmjC domain proteins, including homologues related to each of the six D. melanogaster JmjN domain proteins (Fig. 5).
While the draft pea aphid genome sequence is predicted to encode at least three proteins that possess only the JmjC domain, all six of the remaining putatively complete JmjC domain-containing proteins possess a JmjN domain. The presence of the JmjN domain suggests that each of these is a bona fide histone demethylase. Four of the JmjN+JmjC loci in the pea aphid genome also encode a conserved region related to PHD zinc fingers that bind specifically to methylated histone H3. Interestingly, these four form a clade unrelated to any of the proteins encoded in the D. melanogaster genome. None of the aphid JmjC proteins appears to result from very recent duplication, as they share between 19% and 84% identity in the regions that can be aligned (Supplemental Table S3). With a complete yet expanded set of enzymes modulating histone methylation, the pea aphid may have complex means of controlling chromatin architecture and gene expression.
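The reasoning above amounts to a simple rule on domain content: JmjC alone is not taken as evidence of demethylase activity, whereas JmjN plus JmjC suggests a bona fide demethylase, with a PHD-containing subset noted separately. A minimal sketch of that rule follows. The protein identifiers and domain sets are hypothetical, and real annotations would come from a domain search rather than being typed in by hand.

```python
def classify_jmjc(proteins):
    """Assign each protein to a category from its set of domain labels."""
    categories = {}
    for protein_id, domains in proteins.items():
        if "JmjC" not in domains:
            categories[protein_id] = "not a JmjC protein"
        elif "JmjN" in domains and "PHD" in domains:
            categories[protein_id] = "JmjN+JmjC with PHD: candidate demethylase"
        elif "JmjN" in domains:
            categories[protein_id] = "JmjN+JmjC: candidate demethylase"
        else:
            categories[protein_id] = "JmjC only: activity uncertain"
    return categories


# Hypothetical proteins and domain sets, not actual pea aphid annotations.
proteins = {
    "protein_1": {"JmjN", "JmjC", "PHD"},
    "protein_2": {"JmjN", "JmjC", "ARID"},
    "protein_3": {"JmjC"},
}
for protein_id, category in classify_jmjc(proteins).items():
    print(protein_id, "->", category)
```

The rule encodes an inference rather than a measurement: domain presence suggests, but does not demonstrate, enzymatic activity.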
Ubiquitin (UB) and the small UB-like modifier (SUMO) are post-translational modifiers attached to a number of proteins, including transcription factors, chromatin-remodelling proteins and histones (Gill, 2004; Hilgarth et al., 2004). Notably, UB and SUMO can act as transcriptional repressors when attached to histone proteins, possibly by preventing acetylation of lysine residues that function in transcriptional activation (Sun & Allis, 2002; Gill, 2004; Weake & Workman, 2008). The pea aphid possesses several UB-like proteins (Fig. 6), as well as proteins that contain UB-like domains. Although D. melanogaster possesses a single SUMO protein (SMT3), it also possesses UB and several other UB-like protein-coding genes. UB-like proteins are, however, better characterized in humans, and like humans, the pea aphid possesses several SUMO-related protein-coding genes (an SMT3 orthologue and two additional proteins related to human SUMO-1). The other UB-like proteins identified in the pea aphid include two polyubiquitins of different lengths, a bi-ubiquitin, and orthologues of at least five other UB-like proteins that are present in humans and D. melanogaster. Although the human FAT10 protein possesses two UB-like domains, it is phylogenetically distant from the structurally similar bi-ubiquitin in the pea aphid. Whereas the human HDAC6 protein possesses a C-terminal zinc finger that interacts with FAT10 (Boyault et al., 2007; Kalveram et al., 2008), the pea aphid HDAC6 may lack a zinc finger. If the aphid HDAC6 truly lacks this zinc finger domain, that would be consistent with the apparent absence of a clear FAT10 orthologue in the aphid and may suggest novel features in the regulation of HDAC6 function in the pea aphid.
The attachment of UB-like proteins to their targets is a multi-step process mediated, after maturation of the modifier, by several enzymes (E1, E2 and E3 proteins; Fig. 2) (Hilgarth et al., 2004). These enzymes appear to be present in the pea aphid, based on high-scoring BLAST results for known E1, E2 and E3 proteins (Supplemental Table S1). We found that the pea aphid genome possesses genes related to the maturation protease SENP1. We identified individual loci encoding AOS1 and UBA2, which form the heterodimeric E1 protein; an orthologue of UBC9 (the E2 protein known as Lesswright in D. melanogaster); and a second protein related to Lesswright. E3 proteins confer substrate specificity and enhance the SUMO ligation process. We found seven loci encoding proteins related to protein inhibitor of activated STAT (PIAS) that may have E3 ligase activity; in humans, PIAS helps target SUMO onto a methyl-CpG-binding protein (Lyst et al., 2006). We also found an orthologue of the E3 protein RAN binding protein 2 (RANBP2), which targets SUMO onto HDAC4 (Kirsh et al., 2002). Several proteins were also identified that may be involved in the proteolytic removal of SUMO from modified substrates (see Supplemental Table S1). Although the E1, E2 and E3 proteins associated with other UB-like modifications were not examined in greater detail, complete SUMO modification pathways appear to be present in the pea aphid.
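The conclusion that complete SUMO pathways appear to be present amounts to checking that every step of the pathway has at least one candidate gene. Below is a minimal sketch under a simplified four-step model (maturation protease, E1 heterodimer, E2 and E3). The gene labels are the ones named in the text, but the `PATHWAY` table and `missing_steps` helper are hypothetical, and set membership here stands in for the BLAST-based identifications described above.

```python
# Simplified SUMO pathway model: (step, rule, gene labels). "all" means every
# listed subunit is needed; "any" means one candidate gene is enough.
PATHWAY = [
    ("maturation protease", "any", {"SENP1"}),
    ("E1 heterodimer", "all", {"AOS1", "UBA2"}),
    ("E2 conjugating enzyme", "any", {"UBC9"}),
    ("E3 ligase", "any", {"PIAS", "RANBP2"}),
]


def missing_steps(found_genes):
    """Return the pathway steps not covered by the set of identified genes."""
    missing = []
    for step, rule, required in PATHWAY:
        hits = required & found_genes
        covered = hits == required if rule == "all" else bool(hits)
        if not covered:
            missing.append(step)
    return missing


# Gene labels corresponding to the identifications described in the text.
aphid_hits = {"SENP1", "AOS1", "UBA2", "UBC9", "PIAS", "RANBP2"}
print(missing_steps(aphid_hits))             # []
print(missing_steps(aphid_hits - {"UBA2"}))  # ['E1 heterodimer']
```

Treating the E1 step as requiring both subunits reflects its heterodimeric nature, whereas any one of several E3 families is assumed to suffice.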
The smt3 gene of D. melanogaster has been shown to function in the larval-pupal transition and has been implicated in the packaging of sperm chromatin, whereas the human SUMO proteins appear to have partially overlapping but distinct functions, judging by their identified protein targets. The presence of multiple SUMO-like proteins in the pea aphid suggests that SUMO protein functions may have diversified similarly in this species.
Phosphorylation and ADP-ribosylation
Additional histone modifications of potential interest are phosphorylation and ADP-ribosylation. Although each of these modifications can be made on histones by several different enzymes, we focused on identifying genes encoding specific enzymes of interest. One gene encoding nucleosomal histone kinase-1 (NHK-1) was identified in the A. pisum genome. This is of particular interest because NHK-1 phosphorylates H2A on nucleosomal histones, and this phosphorylation is required for the deposition of specific downstream acetylation marks on histones H3 and H4 during meiosis (Ito, 2007). NHK-1 therefore provides a connection among histone phosphorylation, chromatin modifications associated with gene expression, and reproductive physiology. Evidence is also emerging that mono- and poly-ADP-ribosylation of histones are important aspects of chromatin remodelling (Tulin et al., 2002; Tulin & Spradling, 2003). A family of proteins related to poly-ADP-ribose polymerase (PARP) was identified in the pea aphid (see Supplemental Table S1).
The bulk of our efforts focused on the post-translational modification of histones. However, chromatin conformation relies heavily on the relative positions of nucleosomes along the DNA. Nucleosome positioning can be modified by a number of distantly related proteins that possess an ATPase domain related to that of the yeast Mating type switching/Sucrose non-fermenting (SWI/SNF) chromatin-remodelling protein (Becker & Horz, 2002; Mohrmann & Verrijzer, 2005; Bouazoune & Brehm, 2006). These proteins may act as motors that mobilize nucleosomes along the DNA strand. The pea aphid genome possesses many of the same chromatin-remodelling ATPases as D. melanogaster (Supplemental Table S1). However, the pea aphid sequence lacks a clear orthologue of CHD3 but may possess an additional Imitation switch (ISWI) locus. One ISWI locus occupies nearly half of the scaffold to which it belongs, while the other lies well within a scaffold of about 89 kb. Beyond the ISWI region, the two scaffolds cannot be readily aligned. The large size of the ISWI locus (∼11 kb) and its reasonably high conservation suggest that the two could be allelic, but the lack of homology in the surrounding regions and the relatively high number of sequences present in the trace archive are compelling evidence that these loci are paralogues.
In addition, the aphid orthologue of Mi-2 may be non-functional, because the predicted mRNA appears to contain a frame-shift that results in the loss of a conserved C-terminal domain. Note, however, that this shift occurs within a region that is poorly conserved and has little expressed sequence tag coverage, so the prediction may represent a mis-annotation. The possible loss of Mi-2 function is intriguing because loss-of-function mutants in D. melanogaster are usually embryonic lethal, perhaps owing to inappropriate expression of hox genes during embryogenesis (Kehle et al., 1998; Khattak et al., 2002). In contrast, in other species such as the plant Arabidopsis thaliana, null mutants of the Mi-2 orthologue (Pickle; PKL) are viable but fail to suppress pathways that lead to totipotency (Ogas et al., 1999), although sexual reproduction in pkl mutants is relatively normal. Thus, in A. thaliana, PKL helps to suppress certain aspects of embryonic development that may lead to asexual reproduction. If Mi-2 in aphids is truly non-functional, this may partly explain their increased capacity for asexual reproduction.
Relative to other arthropods, the pea aphid possesses a more diverse repertoire of histone modification machinery, which remains to be examined experimentally. Numerous examples from other species suggest that the chromatin enzymes encoded by the pea aphid are likely to be involved in developmental fate, perhaps serving vital functions in the differentiation of aphid polyphenisms, particularly reproduction. Several instances of gene duplication among the chromatin-modifying enzymes were discovered, most notably among the enzymes involved in histone acetylation and methylation. Interestingly, there appears to be a rough balance in pea aphids among the histone-modifying enzymes that perform antagonistic biochemical reactions. For example, increases in the number of HAT loci correspond roughly with increases in the number of RPD3-type HDAC loci, and increases in the number of SET domain methyltransferases coincide with increased numbers of JmjC domain demethylase genes.
Characterization of the expression profiles of these expanded chromatin-modifying genes may yield important clues to their roles in different aphid morphs. In other systems, ectopic expression of histone-modifying enzymes can have dramatic effects on cell differentiation and development, and certain chromatin-modifying proteins must be kept in relative balance for normal cellular function. In yeast, loss of the HAT GCN5 causes a transcription defect that can be ameliorated by elimination of the HDAC RPD3, indicating a general need to balance the activities of the two enzymes (Perez-Martin & Johnson, 1998). A similar phenomenon is observed in D. melanogaster with the HAT Chameau and the HDAC dRPD3 (Miotto et al., 2006). In D. melanogaster, disruption of normal levels of HAT activity in non-lethal eye models results in cell death, indicating that major changes in relative HAT activity are deleterious (Taylor et al., 2003; Lee et al., 2008). In addition, the Male-specific lethal (MSL) complex, which includes the histone acetyltransferase MOF, requires tight quantitative regulation for normal targeting to the appropriate regions of D. melanogaster chromosomes (Gu et al., 1998). Thus, it is becoming clear that balance is required in the levels or activities of histone-modifying enzymes. Could an evolutionary scenario analogous to the gene-for-gene interactions between plants and their pathogens be shaping chromatin gene duplications in the pea aphid? If additional copies of individual chromatin loci cause deleterious effects that can be squelched by additional copies of their antagonists, this could represent a previously undescribed phenomenon that has shaped the evolution of the pea aphid genome.
However, given the intricacies of chromatin remodelling, the idea that extra copies of an antagonist could simply offset extra copies of specific chromatin loci is overly simplistic. For example, in D. melanogaster, the histone demethylase LID functions in a complex that can inhibit the activity of the dRPD3 HDAC (Lee et al., 2009), and genetic interactions between dSETDB1 and SU(VAR)3-9 indicate that these genes must be present in a balanced state for normal genome function (Brower-Toland et al., 2009). Thus, a number of different control mechanisms are in place to regulate specific chromatin modification activities. It is possible, then, that the extra copies of various chromatin loci have no net negative effect and that the co-occurrence of increased copies of these genes and of their antagonists is merely a coincidence. In fact, the pea aphid genome appears to have a relatively steady rate of gene duplication across many different types of loci, including the chromatin-modifying enzymes. Even in the absence of any deleterious effects due to gene duplication, paralogous chromatin loci in the pea aphid may still be diverging in function and may have adopted unique and essential roles. This is an exciting area that merits further investigation and could provide insight into the roles of chromatin modifications in pea aphid development and the regulatory systems involved in aphid polyphenisms.
Discovery of chromatin protein genes
Chromatin-remodelling proteins were identified by BLAST searches against the pea aphid genome scaffolds (NCBI accessions EQ110773.1 to EQ133570.1), predicted RefSeq mRNAs (NCBI A. pisum mRNA accessions NM_001126134.1 to XM_001952843.1) and predicted RefSeq proteins (NCBI A. pisum protein accessions NP_001119606.1 to XP_001952878.1), all hosted on local servers. Chromatin-modifying proteins from D. melanogaster or Homo sapiens served as starting queries for the BLAST searches. Nearly all chromatin loci were examined manually by comparing putative coding regions with known proteins to identify intron/exon boundaries and the termini of each gene. In most cases, automated annotations provided equivalent or superior results. Any significant modifications to automated gene predictions were deposited, along with gene identities, into the Apollo database at AphidBase (http://www.aphidbase.com/aphidbase). Additional searches of both the D. melanogaster and human genomes were performed using three rounds of PSI-BLAST against the RefSeq proteins at NCBI, or by searching the SMART database. Estimates of genome sizes (C-values) were obtained from various databases (Gregory et al., 2006).
Determination of allelism
Predicted aphid protein-coding genes sharing greater than 95% identity but encoded on different scaffolds were treated as possible alleles rather than paralogues. To distinguish between these alternatives, allelism was defined as >95% nucleotide sequence identity between scaffolds, conserved colinearity (synteny) between the alternative scaffolds, and differences between scaffolds consisting only of small (<500 bp) insertions/deletions or single nucleotide polymorphisms. Putative alleles not meeting these criteria were considered paralogous. Individual loci were used as queries in MEGABLAST searches against the pea aphid WGS trace archive to determine the number of traces corresponding to each sequence and thus the relative coverage of the individual loci. The results from 15 microsatellite loci and two single-copy genes were used as a baseline for evaluating whether the coverage of the genes under investigation was consistent with the number of paralogues identified (see Supplemental Fig. S5).
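The allelism criteria and the coverage check above can be expressed as a small decision procedure. The sketch below is illustrative only and assumes that the three criteria have already been measured for each scaffold pair; the class and function names are hypothetical. Normalizing trace counts by locus length and taking the median density of the reference loci are likewise our assumptions, not a description of the exact calculation used for Supplemental Fig. S5.

```python
from __future__ import annotations

from dataclasses import dataclass
from statistics import median


@dataclass
class ScaffoldComparison:
    """Hypothetical summary of two scaffolds carrying near-identical genes."""
    nucleotide_identity: float  # percent nucleotide identity between the scaffolds
    syntenic: bool              # True if colinearity (synteny) is conserved
    max_indel_bp: int           # largest insertion/deletion separating the scaffolds


def classify_pair(cmp: ScaffoldComparison) -> str:
    """Call a pair 'possible alleles' only if all three criteria hold."""
    if cmp.nucleotide_identity > 95.0 and cmp.syntenic and cmp.max_indel_bp < 500:
        return "possible alleles"
    # Failing any single criterion is treated as evidence of paralogy.
    return "paralogues"


def estimated_copies(locus_traces: int, locus_length_bp: int,
                     reference_traces: list[int],
                     reference_lengths_bp: list[int]) -> float:
    """Estimate copy number as trace density (traces per kb) of a locus relative
    to the median trace density of reference single-copy loci (assumption)."""
    ref_density = median(
        t / (length / 1000) for t, length in zip(reference_traces, reference_lengths_bp)
    )
    return (locus_traces / (locus_length_bp / 1000)) / ref_density
```

For example, a 10 kb locus hit by 40 traces, compared with three 10 kb reference loci hit by 20, 22 and 18 traces, gives `estimated_copies(40, 10000, [20, 22, 18], [10000] * 3) == 2.0`, a coverage level consistent with two near-identical paralogues collapsed into one assembled locus rather than a single-copy gene.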
Orthology was determined using one of two methods: reciprocal best BLAST matches or phylogenetic tree construction. Domain organization was used as a criterion for adjusting orthology assignments when relatedness trees were ambiguous or produced inconsistent results. Conserved domains were identified in the predicted proteins by comparison with the CDART database (available at http://www.ncbi.nlm.nih.gov/Structure/cdd/wrpsb.cgi). Putative orthologues were also identified by searching clusters of orthologous groups generated by OrthoMCL from 13 arthropod proteomes (summarized in Supplemental Table S2 and available at http://insects.eugenes.org/arthropods/). Protein alignments for phylogenetic trees were generated using ClustalX, followed by the construction of neighbour-joining trees with or without bootstrapping. In some instances, consensus phylogenetic trees were also generated from protein-distance neighbour-joining trees built after bootstrapping the original dataset, using the PHYLIP package. Bayesian inference methods implemented in MrBayes were also used for some protein families (e.g. HDAC), with a wider spectrum of taxa for comparison. Phylogenetic tree images were rendered using FigTree and exported for further annotation in Adobe Illustrator.
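The reciprocal best BLAST match criterion can be sketched as follows, assuming each proteome was searched against the other and the results were saved in BLAST+ tabular format (`-outfmt 6`, where the twelfth column is the bit score). The function names and the sequence identifiers in the usage note are hypothetical, and ties in bit score are resolved by keeping the first hit encountered; this is a generic sketch of the technique, not the exact pipeline used here.

```python
from __future__ import annotations

import csv
import io


def best_hits(tabular: str) -> dict[str, str]:
    """Map each query to its highest-bit-score subject from BLAST -outfmt 6 text."""
    best: dict[str, tuple[str, float]] = {}
    for row in csv.reader(io.StringIO(tabular), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and any comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        # Keep the first hit on ties; replace only on a strictly higher bit score.
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b: str, b_vs_a: str) -> list[tuple[str, str]]:
    """Return (a, b) pairs in which each sequence is the other's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)
```

If the aphid genes `aphid_g1` and `aphid_g2` both hit `fly_A` best, but `fly_A` hits `aphid_g1` best, only `("aphid_g1", "fly_A")` is reported. `aphid_g2` is left unassigned, which is the situation in which the text above falls back on phylogenetic trees and domain organization to resolve orthology among recently duplicated paralogues.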
Indirect immunofluorescence of histone methylation
Asexual and sexual ovaries were dissected in cold Dulbecco's phosphate-buffered saline (PBS) from LSR1.G1.AC female aphids reared at either 19 °C with 16 h of light/8 h of dark (asexual) or 16 °C with 13.5 h of light/10.5 h of dark (sexual). All steps described were performed in the wells of a glass dissecting dish. Ovaries were fixed in 4% paraformaldehyde for 20 min, washed quickly three times in PBS, and placed in BBT (0.2% BSA, 0.3% Triton X-100 in PBS) for 1 h at room temperature (RT) on a shaking platform. Tissues were then blocked in BBT-NGS (BBT with 2% normal goat serum) for 1 h at RT on a shaker. Primary antibodies were diluted in BBT-NGS and incubated with the tissues overnight at 4 °C with slow shaking. Tissues were washed eight times in BBT and blocked for 1 h in BBT-NGS. Secondary antibodies were diluted in BBT-NGS and incubated with the tissues overnight at 4 °C. Tissues were washed eight times in PBS + 0.3% Triton X-100, counterstained with Hoechst (1 µg/ml) in PBS for 10 min, washed six times in PBS, equilibrated in Vectashield overnight at 4 °C, and mounted and sealed under coverslips with clear nail polish. Images were obtained using a Leica CTR6500 confocal imaging system on a Leica DM5500Q microscope with Leica LAS acquisition software. Images were imported into Adobe Photoshop and adjusted for size and brightness/contrast. The following antibodies were used at the indicated dilutions: rabbit anti-histone H3 trimethyl-Lys4 at 1:500 (#ab8580, Abcam, Cambridge, MA), rabbit anti-histone H3 trimethyl-Lys9 at 1:500 (#39162, Active Motif, Carlsbad, CA) and goat anti-rabbit IgG-Cy5 at 1:500 (Jackson ImmunoResearch, West Grove, PA).
S.D.R. was supported in part by funds from the National Institutes of Health through the National Institute of Allergy and Infectious Diseases with Grant R21AI068461. D.G.S. was supported by an NIH Ruth L. Kirschstein NRSA Postdoctoral Fellowship (GM077928). The authors thank Jeremy A. Kroemer for thoughtful discussions on the content of the manuscript.