Characteristics of the genome of Arsenophonus nasoniae, son-killer bacterium of the wasp Nasonia

Authors


Alistair C. Darby, School of Biological Sciences, University of Liverpool, Crown Street, Liverpool L69 7ZB, UK. Tel: +44 151 7954551; e-mail: alistair.darby@liv.ac.uk

Abstract

We report the properties of a draft genome sequence of the bacterium Arsenophonus nasoniae, son-killer bacterium of Nasonia vitripennis. The genome sequence data from this study are the first for a male-killing bacterium, and represent a microorganism that is unusual compared with other sequenced symbionts, in having routine vertical and horizontal transmission, two alternating hosts, and being culturable on cell-free media. The resulting sequence totals c. 3.5 Mbp and is annotated to contain 3332 predicted open reading frames (ORFs). Therefore, Arsenophonus represents a relatively large genome for an insect symbiont. The annotated ORF set suggests that the microbe is capable of a broad array of metabolic functions, well beyond those found for reproductive parasite genomes sequenced to date and more akin to horizontally transmitted and secondary symbionts. We also find evidence of genetic transfer from Wolbachia symbionts, and phage exchange with other gammaproteobacterial symbionts. These findings reflect the complex biology of a bacterium that is able to live, invade and survive multiple host environments while resisting immune responses.

Introduction

Insect research has demonstrated, and continues to do so, the influence of microorganisms on insect biology, ecology and evolution. The variety of interactions is immense. In some cases, insects require resident symbionts for their physiological function (termed primary symbionts), and the absence of these microbes is associated with impaired host function (Douglas, 1998). In other cases the symbionts are not required, but are important components of the insect defence system against natural enemies (Oliver et al., 2003, 2006; Ferrari et al., 2004). In contrast, there are a variety of interactions where the microbe is vertically transmitted, but reduces host fitness by altering reproductive biology (reproductive parasites) (Werren & O'Neill, 1997; Engelstaedter & Hurst, 2009). These microbes can affect host reproductive ecology (Jiggins et al., 2000; Charlat et al., 2007b), accelerate the pace of evolution in suppressing the microbes' action (Charlat et al., 2007a), and generate reproductive isolation (Bordenstein et al., 2001). Finally, there are other microorganisms that are facultative pathogens or commensals (Duchaud et al., 2003; Toh et al., 2006).

Inherited parasites that affect host reproduction are diverse both taxonomically and in their phenotypic effects on hosts. For example, inherited bacteria in insects can induce cytoplasmic incompatibility (CI), feminize hosts, induce parthenogenesis, and kill males (Werren & O'Neill, 1997; Engelstaedter & Hurst, 2009). In terms of microbial diversity, the aetiologic agents derive from diverse clades, including Wolbachia (Werren et al., 2008), Rickettsia (Werren et al., 1994), Cardinium (Hunter et al., 2003), Spiroplasma (Williamson & Poulson, 1979; Hurst et al., 1999b), an un-named member of the Bacteroides-Flavobacteria clade (Hurst et al., 1999a), and the genus Arsenophonus (Gherna et al., 1991). Despite this diversity of microbe and phenotype, complete genome sequences are currently available for only three reproductive parasites, all belonging to the clade Wolbachia, and all inducing CI (Wu et al., 2004; Klasson et al., 2008, 2009). A partial genome sequence is also available for a strain of Wolbachia that induces parthenogenetic reproduction in its host (Klasson et al., 2009).

In this paper, and its companion (Wilkes et al., 2009), we examine the genome of a reproductive parasite, Arsenophonus nasoniae (Gherna et al., 1991). Arsenophonus nasoniae is a member of the Enterobacteriaceae (gamma proteobacteria) that infects and interacts with the wasp Nasonia vitripennis and has several unusual characteristics. First, it is a parasite that induces killing of haploid (male) embryos (Huger et al., 1985; Werren et al., 1986; Ferree et al., 2008). As such, it represents the first ‘male-killing’ bacterium, and only the second reproductive parasite species, to have its genome sequenced. Here, its distinctness from Wolbachia is important. Second, it demonstrates an unusual mode of transmission. Whilst it does show maternal inheritance, this transmission occurs through intermediate infection of the fly pupal host: the bacterium is injected into the host during stinging by the wasp, and the infection is subsequently acquired perorally during larval wasp feeding (Huger et al., 1985; Werren et al., 1986). This mode of transmission also results in infection moving horizontally between N. vitripennis individuals following co-infection of parasitoid wasps within a host pupa (Huger et al., 1985; Skinner, 1985). In the context of symbiont genome evolution, this life cycle requires the symbiont to maintain properties associated with replication and survival in wasp and parasitized fly hosts, survival in the wasp larval gut, and transfer across the gut epithelia. The requirement to maintain these traits probably explains why this bacterium, unlike many other reproductive parasites, can be cultured on cell-free media (Werren et al., 1986). Finally, the conjunction of the sequencing of the A. nasoniae genome with that of its host N. vitripennis (Werren et al., 2010) is timely, allowing investigation of both parties in this symbiosis.

Beyond A. nasoniae being an interesting bacterium, the genus Arsenophonus (of which A. nasoniae is the type species) (Fig. 3) has been found to be widespread and common in arthropods (Hypsa & Dale, 1997; Werren, 2004), with screens suggesting around 5% of species are infected with members of this genus (Duron et al., 2008). So far, A. nasoniae is the only member of the genus known to cause male killing. Phenotypes of the other members of the genus are yet to be determined, but they are likely to be non-pathogenic and may confer benefits to the host. This microbial clade is thus likely not only to be common, but to have diverse interactions with its arthropod hosts.

Figure 3.

Bayesian phylogeny for the genus Arsenophonus using the 16S rDNA gene. Posterior probability support for each node is shown. Accession numbers are for RDP 10 database entries.

We present here a first analysis of the genome sequence of A. nasoniae. Our focus in this paper is to compare its genome to its closest relatives – members of the Enterobacteriaceae – and to examine how the genome of this bacterium differs from other members of the clade. The Enterobacteriaceae contain bacteria with a diverse array of lifestyles. Best recognized are the enteric pathogens of humans of the genera Yersinia, Escherichia, Shigella, Salmonella and Vibrio. In addition, there are a range of bacteria that interact symbiotically with invertebrate hosts. Vibrio fischeri strains are in symbiosis with squid in the light organ (McFall-Ngai & Ruby, 1991). Buchnera aphidicola, Wigglesworthia, Blochmannia and Riesia are all primary symbionts of insects (Shigenobu et al., 2000; Akman et al., 2002; Wernegreen et al., 2003; Tamas et al., 2008). Regiella insecticola and Hamiltonella defensa are inherited symbionts that function in defence of insect hosts against natural enemies (Oliver et al., 2003; Ferrari et al., 2004). Beyond this group, there are entomopathogen species, such as those in the genera Serratia and Photorhabdus (ffrench-Constant et al. 2003). Arsenophonus is thought to be most closely related to members of the genera Photorhabdus and Proteus. Photorhabdus luminescens inhabits the guts of entomopathogenic nematodes, is released into insects on nematode infection, and is responsible for the death and digestion of the insect (ffrench-Constant et al. 2003). Proteus species have a wide array of ecological interactions, but are generally studied as pathogens (Proteus vulgaris and Proteus mirabilis) (Pearson et al., 2008).

In this paper, we first describe how the A. nasoniae genome is organized. We then describe how this genome compares with those of pathogenic and symbiotic members of the Enterobacteriaceae, whether any components are obviously shared with other ‘inherited parasites’, how the genome reflects its known lifestyle, and what the genome itself can tell us about the lifestyle of the bacterium that was not apparent from previous study. In a companion paper, we examine the genome for evidence of secreted factors and secretion systems likely to be directly associated with interaction with its host, the wasp N. vitripennis (Wilkes et al., 2009).

Results and discussion

General features of symbiont genomes

Research in the area of insect symbiont genome evolution is strongly biased toward select groups of symbiotic phenotypes. These genomes produced new hypotheses concerning the functional capabilities of the microbes in relation to their contribution(s) to insect physiology (Shigenobu et al., 2000; Akman et al., 2002; Wernegreen et al., 2003), to natural enemy defence (Degnan et al., 2009) or with respect to the microbes' capabilities to manipulate host systems (Wu et al., 2004; Klasson et al., 2008, 2009). Moreover, genome sequences reveal interesting evolutionary patterns that impact the evolutionary trajectory of the symbiosis. Microbial genomes tend to become ‘streamlined’ upon entering inherited symbiosis, by losing genes made redundant by their niche within a host environment (Akman et al., 2002; Wernegreen et al., 2003; Toh et al., 2006; Tamas et al., 2008). These losses render the microbes dependent on their insect hosts, and are the most likely explanation for their fastidious nature (i.e. difficulty in growing on cell-free media). In cases where the host depends on their microbes and horizontal transmission of the symbiont is lost, the bacteria also commonly lose the genetic systems that encode recombination (Dale & Moran, 2006).

General features of the Arsenophonus nasoniae genome

The A. nasoniae genome sequence reads are assembled into 143 sequence scaffolds (containing 665 contigs), resulting in a draft genome size of 3 567 128 bp (Fig. S1, Table 1). These scaffolds are a mixture of bacterial chromosome/s and extrachromosomal DNA. On the basis of read coverage, gene annotation and biological information, the scaffolds were assigned to three putative groups, namely chromosomal, plasmid and phage DNA. The estimated chromosome size is c. 3.3 Mbp, with extrachromosomal content of ∼100 Kbp from plasmids and ∼200 Kbp from phage.
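The role of read coverage in this assignment can be illustrated with a minimal sketch. It assumes a simple rule, not stated in the text, in which scaffolds with a mean read depth well above the assembly median are flagged as candidate multi-copy (plasmid or phage) sequence. The function name, the two-fold threshold and the example depths are all hypothetical, and in the study coverage was combined with gene annotation and biological information rather than used alone.

```python
from statistics import median


def flag_high_coverage(depths, fold=2.0):
    """Return names of scaffolds whose mean read depth is at least
    `fold` times the median depth across all scaffolds.

    `depths` maps scaffold name -> mean read depth. The `fold`
    threshold is an illustrative assumption, not a value from the study.
    """
    baseline = median(depths.values())
    return sorted(name for name, depth in depths.items()
                  if depth >= fold * baseline)


# Hypothetical mean read depths (not data from the study).
example_depths = {
    "scaffold_A": 20.0,
    "scaffold_B": 22.0,
    "scaffold_C": 19.0,
    "scaffold_D": 95.0,  # candidate high-copy element
    "scaffold_E": 21.0,
}

print(flag_high_coverage(example_depths))
```

A flag of this kind is only a starting point: a scaffold with elevated depth still needs supporting annotation (for example replication or phage structural genes) before it is assigned to a group.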

Table 1. General statistics and features for the Arsenophonus nasoniae genome
Feature | Value
Current genome assembly | 3 567 128 bp
Contigs | 665 (median size 4 Kbp, maximum 43 Kbp)
Scaffolds | 143 (median size 7 Kbp, maximum 212 Kbp); 261 sequencing gaps in scaffolds
GC content of genome | 37.37%
Predicted CDS | 3332
Assigned function | 2495
Conserved hypothetical | 441
Hypothetical | 483 (including 120 hypothetical phage proteins)
Transposons | 38
Phage related | 410
Plasmid related | 50
Virulence related | >400
tRNA genes | 52
Ribosomal DNA | Estimated 8–10 copies of the 16S, 23S and 5S genes
Genes with an error | 135

Genome size and GC content

Past work has identified two clear evolutionary trajectories for predominantly vertically transmitted symbiont genomes. First, the genomes evolve to be increasingly AT rich. Second, they evolve towards reduced size. These processes are thought to be partially associated with weaker selection against mildly deleterious mutations in small populations experienced during vertical transmission (relaxing codon adaptation bias) (Moran, 1996), and the loss of function for genes rendered redundant by symbiosis.

The predicted chromosome size of A. nasoniae is significantly smaller than that of the two most closely related sequenced genomes, Pr. mirabilis (Pearson et al., 2008) and P. luminescens (Duchaud et al., 2003) (Fig. 1), but is larger than the genomes of both other secondary symbionts (Degnan et al., 2009) and obligate insect endosymbionts (Shigenobu et al., 2000; Akman et al., 2002; Wernegreen, 2002). The 37% genomic GC content (Table 1) is AT rich relative to Pr. mirabilis (GC 38%) (Pearson et al., 2008) and P. luminescens (GC 43%) (Duchaud et al., 2003). Notably, the GC content is at the bottom of the range found when controlled for genome size (Fig. 1), indicating a process of evolution towards increasing AT bias.
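For readers who wish to reproduce a GC figure of this kind from sequence data, the calculation can be sketched as follows. This is a generic illustration rather than the procedure used in the study: the function name is hypothetical, and the choice to exclude ambiguous bases (such as the N characters that fill scaffold gaps) from the denominator is an assumption.

```python
def gc_content(sequence):
    """Return the GC content of a nucleotide sequence as a percentage.

    Only unambiguous bases (A, C, G, T) are counted, so gap
    characters such as N do not dilute the estimate (an assumption
    made for this sketch).
    """
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no unambiguous bases")
    gc = seq.count("G") + seq.count("C")
    return 100.0 * gc / counted


# Toy sequence for illustration only.
print(round(gc_content("ATGCATTANNNNGGAT"), 2))
```

For a draft assembly, the same function would be applied to the concatenated scaffold sequences, or per scaffold if chromosomal and extrachromosomal material are to be compared.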

Figure 1.

Bacterial genome size plotted against GC content. Gamma proteobacteria species are open circles. Blue fill indicates obligate endosymbionts; red = Arsenophonus nasoniae, yellow = Sodalis glossinidius, light blue = Yersinia enterocolitica, green = Photorhabdus luminescens, purple = Proteus mirabilis, pink = ‘Candidatus Hamiltonella defensa’, orange = alpha proteobacteria Wolbachia species. Grey crosses indicate other bacteria.

In some respects, the A. nasoniae genome conforms to the pattern seen in vertically transmitted symbionts, notwithstanding its peroral (rather than transovarial or transovum) transmission and its appreciable levels of horizontal transmission (Huger et al., 1985; Skinner, 1985). The reduced genome size of A. nasoniae reflects modest gene loss in pathways no longer needed by virtue of its symbiotic lifestyle (see section on metabolism below and Fig. 4). The reduced GC content suggests that the evolutionary processes that drive elevated AT content in symbionts, that is to say small population sizes producing relaxed selection, apply also to A. nasoniae, despite its distinct transmission biology. This finding parallels the observation that transovum transmitted gut symbionts also hold to these patterns of evolution (Dale & Moran, 2006), and indicates that the ‘bottleneck’ in the size of the colonizing population is common in symbiosis and not dependent on the physical constraint of passage through the egg.

Figure 4.

Venn diagram showing the orthologous/paralogous gene clusters observed between the genomes of An, Arsenophonus nasoniae (this study), Pl, Photorhabdus luminescens (NC_005126), Pm, Proteus mirabilis (NC_010554) and Ye, Yersinia enterocolitica (NC_008800), as obtained using ORTHOMCL.

The Arsenophonus nasoniae gene inventory

The contiguous sequences contain 3332 predicted coding domains. Phylogenetic analysis of a set of 23 concatenated orthologous genes suggests that P. luminescens and Pr. mirabilis are the closest sequenced relatives to A. nasoniae (Figs 2, 3). The genome annotation and whole genome comparisons by ORTHOMCL clustering (Fig. 4) reveal that the gamma proteobacteria share, as expected, the greatest number of genes with A. nasoniae. Overall, 67% (2243/3332) of A. nasoniae ORFs have a homologous gene in at least one of Photorhabdus, Proteus or Yersinia. The distribution of orthologous gene clusters between A. nasoniae, Y. enterocolitica, P. luminescens and Pr. mirabilis (Fig. 4) demonstrates that the majority of genes are shared between all four taxa. The Yersinia, Proteus and Photorhabdus species share a number of genes not present in A. nasoniae, which likely reflects their larger genome sizes, but also indicates that A. nasoniae has lost some genes that are ‘core’ in other members of the Enterobacteriaceae (Figs 6 and 7 – red pathways). It is also interesting that, although P. luminescens is most closely allied to A. nasoniae in terms of the identity of shared orthologous genes, the Proteus and Yersinia species both share more orthologous gene clusters with A. nasoniae than with P. luminescens – in excess of 80 more shared clusters (Fig. 4). Of the 1089 genes that have no homologues in the Photorhabdus, Proteus or Yersinia genomes, the majority are annotated as hypothetical, phage/mobile elements, or components of flagella/secretion apparatus. Of these genes of unknown function, 25 loci show a gene phylogeny that contradicts expected topologies if derived from a gamma proteobacterial ancestor (Table S1).
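The counts quoted in this paragraph are mutually consistent, as the short check below shows. It uses only the numbers given in the text; the variable names are illustrative.

```python
# Figures quoted in the text for the A. nasoniae ORF set.
total_orfs = 3332        # predicted coding domains
with_homologue = 2243    # ORFs with a homologue in Photorhabdus, Proteus or Yersinia

# ORFs with no homologue in any of the three comparison genomes.
without_homologue = total_orfs - with_homologue

# Share of the ORF set with at least one homologue, as a percentage.
percent_shared = 100.0 * with_homologue / total_orfs

print(without_homologue)         # 1089
print(round(percent_shared, 1))  # 67.3
```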

Figure 2.

Phylogenetic position of Arsenophonus nasoniae, obtained through analysis of 25 concatenated single-copy conserved orthologous genes from 13 taxa. Tree topology obtained through maximum likelihood estimation, with node values showing maximum likelihood/neighbour-joining percentages from 1000 bootstrap replicates for each node.

Figure 6.

Comparative analysis of genes by functional KEGG categories, from the genomes of Yersinia enterocolitica (‘pathogen’), Sodalis glossinidius (secondary symbiont), Proteus mirabilis (commensal), Arsenophonus nasoniae, Buchnera aphidicola (Obligate primary symbiont), Wolbachia pipientis (reproductive parasite/secondary symbiont).

Figure 7.

KEGG Atlas estimation of Arsenophonus nasoniae metabolism. Colour coding indicates the presence and absence of pathways between taxa: green = in ALL of Proteus (pmr), Photorhabdus (plu) and Arsenophonus (arn) species; red = pmr AND plu NOT arn; pink = pmr OR plu NOT arn; purple = pmr OR plu AND arn; blue = only arn.

We particularly examined the genome of A. nasoniae for elements that are more likely derived from microbial affiliates of arthropods from outside the Enterobacteriaceae. We found two ORFs whose closest homologues lie within the alpha proteobacteria. One prominent ORF shows amino acid sequence similarity (59%) to a surface protein of a Wolbachia species infecting Drosophila melanogaster (ORF WD1041 of wMel) (Fig. 5). Members of this protein family are found widely (but not ubiquitously) in Wolbachia, Anaplasma and Ehrlichia genomes, but have not previously been found in other sequenced bacterial genomes. A second ORF has the greatest sequence similarity to a gene in a Rickettsia species encoding polypeptide deformylase, and carries an intact domain for this function. It is less clear that this ORF is derived by lateral transfer from a Rickettsia genome: whilst the Rickettsia gene is the best match to the A. nasoniae ORF (61% amino acid similarity), there is also a good match with a homologous ORF in Legionella (54% amino acid similarity). We find no ORFs that share recent ancestry with firmicute symbionts, nor with Spiroplasma, spirochaete or chlamydial genomes, although this observation should be tempered with the knowledge that these symbiont groups are less well represented in sequenced genomes.

Figure 5.

Phylogenetic position of Arsenophonus nasoniae gene ARN_18990, a Wolbachia-like surface protein. Tree topology estimated by neighbour-joining, with node values showing maximum likelihood/neighbour-joining percentages (identical for both methods) from 1000 bootstrap replicates for each node. Accessions of the proteins used: NP_966764, ZP_00373362, ZP_03334709, ZP_00372367, ZP_01314401, ZP_00372368, ZP_03334708, ZP_01314711, ZP_03788399, ZP_03787656, and ZP_01314712.

Extrachromosomal genome

In microbial evolution, lateral transfer occurs most easily for extrachromosomal DNA – plasmids and phage. These elements may have an important role in gene exchange between symbionts living in the ‘symbiotic microbiome’ (Fujii et al., 2004; Darby et al., 2005b; Moran et al., 2005). We therefore analysed the genome sequence for extrachromosomal elements and their potential origins. Approximately 10% of the genome is potentially extrachromosomal DNA. This estimate includes putative phage, prophage and plasmid sequences.

Plasmid scaffolds comprise about half of the extrachromosomal sequenced material. One plasmid, for which there is direct evidence (observed by gel electrophoresis following alkaline lysis), is small (c. 8 Kbp), is in high copy number, and may be related to a similarly sized plasmid in Arsenophonus arthropodicus (Dale et al., 2006). There is also likely to be a large conjugative plasmid of about 100–200 Kbp. Whilst there is no direct observational evidence for this plasmid, its presence is suggested by functional genes (presence of ORFs that function in conjugation or replication, including five ORFs encoding replication protein RepA), and by the greater than average depth of read coverage (Fig. S1). At this moment, the fragmented state of the draft assembly prevents us from stating whether this represents one large or many smaller plasmids. We have no direct evidence for its importance in symbiosis, but note that similar plasmids have been described in Proteus (Pearson et al., 2008), Hamiltonella (Degnan et al., 2009) and Sodalis (Darby et al., 2005b) species.

The putative plasmids in A. nasoniae encode many groups of type IV pili genes that show high homology with gamma proteobacteria conjugative transfer (tra) genes. These loci are spread across four regions (Table 2). Region one (ARN_17770 – ARN_18000) contains 19 genes, region two (ARN_20700 – ARN_21120) has 15 genes, region three (ARN_21440 – ARN_21570) contains 13 genes, and region four (ARN_36610 – ARN_36670) has five genes. All regions are smaller than those found in F and R100 plasmids (both have 35 genes) (Yoshioka et al., 1990), but are within the size range of other described conjugative pili. Region one is most akin to the tra regions of F and R100 plasmids. Region two shows most similarity to tra elements from F-like regions from Photobacterium damselae ssp. piscicida, pT99-018 and Proteus vulgaris, pRts1. In neither region one nor region two is there any evidence for the pilus assembly proteins TraQ and TraX or a functional copy of the DNA transport protein TraI (region two may be missing additional genes; see Table 2). Region three shows good synteny, homology and gene content with the Yersinia pseudotuberculosis pGDT4 pil region f (but also with Shigella sonnei pColIb-P9; Serratia entomophila pADAP and Salmonella enterica ssp. enterica serovar Heidelberg str. SL476 pSL476-91). However, there is no homology with the tra region from this plasmid. Region four is a fragment of a region that shows similarity to tra regions found on plasmids from Haemophilus influenzae biotype aegyptius, pF1947 and Erwinia tasmaniensis Et1/99, pET49 (Kube et al., 2008). The data in total suggest that regions one, two and three likely encode a pilus. However, it is impossible to predict if this pilus is involved in conjugation, as key DNA transfer genes are unaccounted for in the genome. The number of conjugative regions represented on these sequence contigs reinforces the status of these contigs as being of plasmid origin, and thus potentially important in lateral transfer between symbionts.

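Table 2. Summary of putative tra genes and functions present in the Arsenophonus nasoniae genome. + present, X present but annotated as pil genes
Function | Gene | Region 1 | Region 2 | Region 3 | Region 4 | Other | Number of ORFs
Control | finO |  |  |  |  | + | 2
DNA transport | traD | + | + |  | + |  | 2
DNA transport | traI | + | + |  | + | + | 3
Mate-pair stabilization | traG | + | + | + | + | + | 4
Mate-pair stabilization | traN | + | + |  |  |  | 1
Pilus assembly | traA | + | + | X |  |  | 4
Pilus assembly | traB | + | + | X |  |  | 3
Pilus assembly | traC | + | + | X |  |  | 3
Pilus assembly | traE | + | + | X |  |  | 3
Pilus assembly | traF |  | + | + |  |  | 1
Pilus assembly | traH | + | + | + | + |  | 3
Pilus assembly | traK | + | + | X |  |  | 4
Pilus assembly | traL | + | + | X |  |  | 3
Pilus assembly | traU | + |  | X |  |  | 2
Pilus assembly | traV | + | + | X |  |  | 3
Pilus assembly | traW | + | + | X |  |  | 3
Pilus assembly | trbC | + |  | X |  |  | 2
Pilus assembly | trhF |  | + |  |  |  | 1
Surface exclusion | traS | + |  |  | + |  | 1
Surface exclusion | traT | + |  |  |  |  | 1
Unknown | trbB | + |  |  |  |  | 1
Unknown | trbI | + |  |  |  |  | 1
Total |  | 19 | 15 | 13 | 5 | 3 |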
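The column totals of Table 2 can be cross-checked against the region sizes quoted in the text (19, 15, 13 and five genes for regions one to four). The sketch below encodes each gene's presence pattern as a five-character string, one character per column (regions 1–4, then 'Other'), where '+' and 'X' both count as present and '-' marks an empty cell. The encoding and variable names are illustrative choices made for this check.

```python
# Presence pattern per gene: Region 1, Region 2, Region 3, Region 4, Other.
# '+' = present, 'X' = present but annotated as a pil gene, '-' = absent.
TRA_GENES = {
    "finO": "----+", "traD": "++-+-", "traI": "++-++", "traG": "+++++",
    "traN": "++---", "traA": "++X--", "traB": "++X--", "traC": "++X--",
    "traE": "++X--", "traF": "-++--", "traH": "++++-", "traK": "++X--",
    "traL": "++X--", "traU": "+-X--", "traV": "++X--", "traW": "++X--",
    "trbC": "+-X--", "trhF": "-+---", "traS": "+--+-", "traT": "+----",
    "trbB": "+----", "trbI": "+----",
}

COLUMNS = ["Region 1", "Region 2", "Region 3", "Region 4", "Other"]


def column_totals(genes):
    """Count the genes marked present ('+' or 'X') in each column."""
    totals = [0] * len(COLUMNS)
    for pattern in genes.values():
        for i, mark in enumerate(pattern):
            if mark in "+X":
                totals[i] += 1
    return dict(zip(COLUMNS, totals))


print(column_totals(TRA_GENES))
```

The totals reproduce the final row of the table, which supports the column alignment used when reading the presence marks.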

Phage and transposons

The phage component of the genome is relatively high; about 370 Kbp (10%) of the draft sequence assembly has BLASTp (e-value cut off 1e-3) sequence similarity to known phage and prophage genes. The genome encodes c. 410 identifiable phage proteins currently distributed in 54 regions. The regions commonly encompass whole scaffolds or ends of scaffolds/contigs, suggesting they form break points in the assembly and may be duplicated within the genome. Regions identified as phage by sequence similarity also have high read depth coverage, indicating that they are either prophage in multiple copies, or lytic phage. The abundance of phage type regulators (n= 97) suggests that a number of the phage sequences in the shotgun genome are involved in controlling prophage elements.
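The e-value cut-off described above can be applied to BLAST results along the following lines. This is a minimal sketch, assuming the hits are available in the default 12-column tabular layout produced by BLAST+ with `-outfmt 6`, in which the e-value is the eleventh field. The function name and the example hit lines are hypothetical, and the study's actual pipeline is not described in the text.

```python
def filter_hits(lines, max_evalue=1e-3):
    """Keep BLAST tabular hits whose e-value is at or below `max_evalue`.

    Each line is expected in the default BLAST+ `-outfmt 6` layout:
    qseqid sseqid pident length mismatch gapopen qstart qend
    sstart send evalue bitscore (tab separated).
    Returns a list of (query id, subject id, e-value) tuples.
    """
    kept = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        evalue = float(fields[10])
        if evalue <= max_evalue:
            kept.append((fields[0], fields[1], evalue))
    return kept


# Hypothetical hit lines for illustration (not results from the study).
example_lines = [
    "query_1\tphage_protein_P\t45.0\t200\t110\t2\t1\t200\t5\t204\t2e-30\t120",
    "query_2\tphage_protein_Q\t28.0\t60\t43\t1\t10\t69\t3\t62\t0.5\t25.1",
]

print(filter_hits(example_lines))
```

A cut-off of 1e-3 is permissive, so hits retained by such a filter indicate sequence similarity to phage genes rather than confirmed phage function.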

Among the phage elements, there are five regions that show homology and synteny (>5 genes) to previously described symbiont phage. The largest of these regions contains 20 genes (locus_tag ARN_10670 – ARN_10970) in synteny with phage loci isolated from Sodalis glossinidius (A.C. Darby unpublished data). The other regions correspond to part of the APSE phage from the aphid secondary symbiont H. defensa (Degnan & Moran, 2008). The presence of this phage within the Sodalis chromosome (Toh et al., 2006) led Degnan & Moran (2008) to suggest that the gamma proteobacterial symbiotic community may actively be exchanging phage elements, and our data from A. nasoniae corroborate this view. The significance of such transfer for the evolution of symbiotic interactions is clear, because symbiont phage regions also include genes encoding putative virulence factors such as secreted effector proteins and toxins, and have also been implicated in mediating natural enemy resistance (Oliver et al., 2009). Notable are three ORFs within phage regions with BLASTp similarity to apoptosis inducing protein 56 of a Photobacterium species; these A. nasoniae ORFs also show similarity to ORFs in APSE-2 (Wilkes et al., 2009).

Estimating the representation of transposons and IS elements in the genome is difficult because repetitive elements are prone to be misassembled or altogether excluded from the assembly. Nevertheless, c. 40 transposon insertions are found both within phage regions and flanking putative symbiotic/pathogenic islands. Five distinct groups could be observed and described based on the alignment of the genes (Table S2). Nearly 45% (n= 17) of the putative transposase-derived ORFs were integrase elements (Group1a and 1b) similar to genes described in Shewanella denitrificans OS217. Group 1a and 1b integrase elements were commonly observed contiguously. The next largest group were ISPsy23-like ORFs (n= 5). The remaining transposase-related genes (Groups 3, 4 and two additional genes) were similar to other symbiont transposases from Sodalis and Hamiltonella species. The final 10 transposases do not group, and are generally similar to gamma-proteobacteria transposases.

Metabolism

The genomes of obligate symbionts that make a nutritional contribution to host biology typically retain genes encoding the metabolic functions for this role (Ruby, 2008). Because A. nasoniae is present in only a fraction of host individuals, the bacterium is not likely to contribute significantly to nutrition of the insect host. As expected, no gene retention bias is observed for genes with these metabolic functions. Like the tsetse symbiont S. glossinidius (Dale & Maudlin, 1999; Matthew et al., 2005), A. nasoniae is able to grow on cell-free media, but does require nutritional supplements (Werren et al., 1986; Gherna et al., 1991). Therefore, the bacterium is likely to have retained a number of key metabolic pathways common to free-living bacteria, and to have lost genes in pathways where the host environment provides the necessary metabolites. Overall, the A. nasoniae genome suggests a relatively limited biochemical repertoire: most metabolic functions are underrepresented relative to E. coli, Y. enterocolitica, P. luminescens and S. glossinidius, but all are well represented compared with the fastidious obligate mycetome-residing mutualistic symbionts such as Wigglesworthia and Buchnera species.

We overlaid A. nasoniae enzyme encoding genes (including possible pseudogenes) onto a global KEGG metabolic map to examine putative pathways present in A. nasoniae (Fig. 7). Findings for particular functions are described below.

Carbohydrate metabolism

Symbionts vary in the degree to which they maintain aspects of carbohydrate metabolism, and in the substrates that are used. The strong bias towards loss of genes involved in carbohydrate metabolism (above other pathways) observed in Wolbachia species (Wu et al., 2004) is not observed in A. nasoniae (Figs 6, 7), which carries intact pathways for glycolysis, gluconeogenesis, the TCA cycle and the pentose phosphate shunt. Functionally, A. nasoniae can utilize glucose as a substrate (Gherna et al., 1991). However, the status of glucose transport via the glucose phosphotransferase system (PTS) is uncertain, as there is no robust homologue to 6-phospho-beta-glucosidase I. This loss may be compensated for by low affinity glucose uptake by other sugar PTS (Ferenci, 1996). However, the absence of this system (paralleled in Wigglesworthia glossinidia) does suggest that glucose may be neither the most efficient nor the preferred source of carbohydrate. In terms of substrates that can be obtained to power metabolism, there is good evidence for PTS for N-acetyl-muramic acid, N-acetyl-galactosamine, N-acetyl-D-glucosamine, mannose and trehalose transport. The presence of a novel chitinase and seven putative chitin-binding proteins suggests that N-acetylglucosamine may be a key metabolite, and the lack of a pathway for N-acetylglucosamine synthesis is consistent with its acquisition via a specific PTS (Akman et al., 2002). It should be noted that chitin is freely available to A. nasoniae, particularly during the phase in the fly pupal cadaver.

Amino acids

Retention of amino acid biosynthesis typically defines primary symbionts that make a positive contribution to host anabolism. Conversely, the absence of amino acid biosynthetic capabilities is typical of secondary symbionts. The A. nasoniae genome presents no evidence for the capacity to metabolize histidine, arginine and proline. In the case of arginine and proline, their absence is due to the complete loss of the urea cycle. The loss of the histidine pathway is typical of a number of obligate parasitic species (Zientz et al., 2004) and suggests that A. nasoniae has evolved in parallel with other parasitic and facultative bacteria in this regard (Toh et al., 2006; Degnan et al., 2009). The presence of conserved ABC transporters for proline, arginine and methionine and putative ABC transporters for histidine suggest that A. nasoniae is capable of supplementing its reduced biosynthetic capacity by uptake from its environment. This strategy is consistent with the acquisition of these amino acids from the hymenopteran and dipteran hosts.

Other pathways

The gene sets of obligate symbionts typically include genes that are important for symbiosis and typically lack genes that are functionally redundant in specific host environments. Studying the presence and absence of genes with metabolic functions can thus lead to improved understanding of the symbiont/host interaction. The A. nasoniae ORF set suggests a bias for retaining enzymes and other proteins involved in lipid, nucleotide, cofactor and vitamin metabolism (Fig. 6), which are also characteristic of the genomes of Wolbachia species that infect insects (Wu et al., 2004). Genes encoding glycan biosynthesis and metabolism proteins are the only group of metabolic genes that are present in similar numbers to non-symbiotic enterics. However, this finding is not surprising because these genes are retained in all but the most reduced of genomes, e.g. insect Wolbachia and Buchnera species. The retention of genes for glycan and lipid metabolism suggests that a complex cell wall is necessary for completion of the life cycle, and may be essential for tolerating and evading the insect immune system.

DNA repair

DNA repair mechanisms are lost in obligate primary symbionts and some secondary symbionts (Dale et al., 2003), and their loss is associated with suboptimal repair and an increased rate of creation (and thus fixation) of deleterious mutations. There is no evidence of the loss of genes involved in the DNA repair and DNA replication pathways of A. nasoniae.

Motility flagellar system

Motility is postulated to be essential for transmission in symbiotic systems. For example, a species of Riesia – the symbiont of the body louse – has been shown to rely on flagellar motility to migrate between the bacteriome (tissue of symbiosis) and ovaries (tissue of transmission to the next generation) (Perotti et al., 2006). A. nasoniae has 49 complete flagella genes divided across two clusters, a c. 38 Kbp region containing 35 genes and a c. 16 Kbp region of 14 genes. The smaller cluster contains the Class 1 master regulators FlhC and D, as well as the functional control operons Tar and MotA and the FlhB operon (export pore proteins). The larger cluster contains the remaining operons in full. A. nasoniae was initially described as lacking a flagellum, based on EM pictures (Gherna et al., 1991, and refs therein). A. nasoniae flagella genes show conserved orthology and synteny with the regions in P. luminescens and Pr. mirabilis. Yet it is unclear if these flagella genes are members of one cluster (as in Pr. mirabilis), or two clusters (as in P. luminescens). The genes are currently assembled into two distinct clusters, with the ‘Flh’, ‘Mot’, and chemotaxis genes in one region of the genome and the ‘Fli’ and ‘Flg’ genes in another. Both the clusters themselves, and the order of the genes within each cluster, are distinct from arrangements seen in E. coli. A fully assembled genome is required to deduce the functionality of the flagella genetic machinery. Yet it should be noted that genes with homology to flagella components are often found in non-motile gamma-proteobacterial endosymbionts. Where a flagellum is absent, this pattern is usually associated with varied degrees of gene loss or degradation, and the retained genes are believed to represent an ancestral flagellar apparatus that has been co-opted for secretory purposes (Toft & Fares, 2008). Where this co-option occurs, gene loss most commonly involves regulatory genes (e.g. FlhC and D, FliZ and A, FlgM), genes encoding motor proteins (motA and B), and the filament and filament cap genes (FliD and C). The current assembly suggests that the FliD and FliE loci may contain frameshift mutations, but this conclusion awaits confirmation. The combination of pseudogenization in these ORFs with the presence of intact ORFs encoding other flagella constituents suggests either that the flagellum has been co-opted into a secretory role (Toft & Fares, 2008), or that a functional flagellum is only expressed at key points in the bacterium's transmission cycle.

Regulation

Arsenophonus nasoniae experiences changing environments during its infection cycle, growing and surviving in hosts from two different insect orders and in different host tissues. The genome inventory suggests that the bacterium senses changes in nutrient and cation availability, oxidative stress and osmolarity, utilizes six sigma factors (RpoD, FliA, RpoH, RpoN, RpoE, RpoS) and carries 157 predicted transcriptional regulators. There are 18 two-component signal transduction systems. Regulatory elements of the TTSSs of A. nasoniae are intact. There are also 16 ORFs with sequence similarity to LysR, which in Photorhabdus has been shown to control the transition between symbiosis and pathogenicity (Joyce & Clarke, 2003). The presence of an ORF with fepE sequence similarity (ARN_03750) indicates a regulatory mechanism for iron uptake.

The ability of this bacterium to sense its environment seems comparable to Proteus and Photorhabdus species, the latter notable for also being present in multiple host species during different phases of its lifecycle (nematode and insect). In general, the number of regulatory genes is appropriate given the number of identified metabolic networks. However, there is evidence for reduced regulation of glucose and nitrogen metabolism, suggesting that gene loss in these pathways includes regulatory systems. This gene composition may reflect development of A. nasoniae in nutrient rich host environments.

Conclusions

The draft genome assembly of A. nasoniae identifies a chromosome of c. 3 Mbp that is combined with at least two plasmids, and a variety of phage elements. The constitution of the genome indicates that A. nasoniae actively interacts with its wasp and fly pupal host environments. Systems such as flagella and regulatory components are largely intact and are expected to have an important role in the symbiosis, transmission and transitions during its life cycle. Candidates for male killing and interaction with the insect immune systems are described in a separate paper (Wilkes et al., 2009).

The genome of A. nasoniae is reduced in size relative to both P. luminescens and Pr. mirabilis, and reflects the loss of some metabolic capabilities (Fig. 7). This reduction in capability is also reflected in an apparent loss of genes needed for the uptake of glucose and of the ability to synthesise certain amino acids (e.g. histidine). These observations suggest that the niche occupied by A. nasoniae provides an alternative carbon source and unrestricted access to amino acids. We speculate, from the genomic data, that chitin may represent the alternative source of organic carbon for this bacterium, particularly in the fly pupal cadaver. The level of gene loss within metabolic pathways also indicates that this bacterium is losing its ability to live freely outside the insect host. This observation may partly explain why A. nasoniae requires specialized culture media for growth in the lab (Werren et al., 1986; Gherna et al., 1991). Notwithstanding the apparent gene losses, the genome is rich in metabolic capabilities relative to obligate endosymbionts (Shigenobu et al., 2000; Akman et al., 2002; Wernegreen, 2002), facultative symbionts (Degnan et al., 2009) and reproductive parasites (Wu et al., 2004). This reflects the biological observation that other classes of symbionts are either unculturable or can only be cultured within insect cells (Darby et al., 2005a; Dale et al., 2006). In classical thinking, more ‘complete’ symbiont genomes are held to reflect a more recent evolution to symbiosis, with the genome expected to diminish over time (Dale & Moran, 2006). Alternatively, and perhaps more likely in this case, the unusual lifecycle of the bacterium selects against loss of capabilities, because growth in different hosts, and in different compartments within the host, is essential for A. nasoniae transmission.

There is some genetic interchange between Arsenophonus and other arthropod symbionts, most convincingly involving the surface protein of Wolbachia. These two widespread symbionts co-infect the same host species (Duron et al., 2008), including N. vitripennis (Breeuwer & Werren, 1990), which provides a likely avenue for such exchanges. The likely nature of the shared ORF – encoding a cell surface protein at the interface between bacterium and its host environment – suggests a similar function in Arsenophonus and Wolbachia. Phage-mediated transfer of genes is also likely, and the identity of the phage ORFs suggests that this also occurs within the ‘symbiome’, potentially with other secondary symbionts. This sharing of ‘information’ between symbionts is likely to be very important in the evolution of symbiosis. Evolutionary innovations – be they strategies for avoiding host immune systems, or means of host manipulation – that are beneficial for one bacterial ‘species’ in the community may spread into others through lateral transfer.

The availability of genome sequences for both A. nasoniae and its host N. vitripennis (Werren et al., 2010) provides an opportunity to investigate complementary gene expression in host and symbiont during invasion and infection, as well as providing avenues of research for the identification of mechanisms of male killing. Furthermore, Arsenophonus is a widespread genus of symbiotic bacteria found in many arthropods (Duron et al., 2008). The genome sequence presented here is the first for the genus, and will provide a reference for others investigating genome evolution related to host shifts and changes in parasitic life-style.

Experimental procedures

Bacterial strain and culture conditions

Preparation of agar medium for plating of A. nasoniae: 3.6 g of Difco™ GC medium base (Becton Dickinson 228950) was added for every 100 ml of water to make a single strength base. GC medium contains per litre 15.0 g proteose peptone no. 3, 1.0 g corn starch, 4.0 g dipotassium phosphate, 1.0 g monopotassium phosphate, 5.0 g sodium chloride and 10.0 g agar. This was autoclaved at 121 °C for 20 min and allowed to cool to 50 °C. 1 ml IsoVitaleX™ enrichment (Becton Dickinson, Franklin Lakes, NJ, USA) was then added for every 100 ml of GC medium. IsoVitaleX™ enrichment contains per litre 0.01 g vitamin B12, 10.0 g L-glutamine, 1.0 g adenine, 0.03 g guanine hydrochloride, 0.013 g p-aminobenzoic acid, 0.25 g nicotinamide adenine dinucleotide, 0.1 g thiamine pyrophosphate, 0.02 g ferric nitrate, 0.003 g thiamine hydrochloride, 25.9 g L-cysteine hydrochloride, 1.1 g L-cysteine and 100.00 g dextrose. The GC medium plus enrichment was then poured into several sterile 9 mm Petri dishes and left overnight to dry. The entire procedure was carried out under sterile conditions in a laminar flow unit.
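The two per-100 ml ratios given above (3.6 g GC medium base and 1 ml enrichment) scale linearly with batch size. The following is a minimal sketch of that arithmetic only; the function name and the returned keys are illustrative and are not part of the published protocol.

```python
# Ratios taken from the text; everything else (names, rounding) is illustrative.
GC_BASE_G_PER_100_ML = 3.6      # g GC medium base per 100 ml water
ENRICHMENT_ML_PER_100_ML = 1.0  # ml enrichment per 100 ml GC medium


def scale_plating_medium(water_ml: float) -> dict:
    """Return the amounts needed for a batch made with `water_ml` of water."""
    if water_ml <= 0:
        raise ValueError("water_ml must be positive")
    factor = water_ml / 100.0
    return {
        "gc_medium_base_g": round(GC_BASE_G_PER_100_ML * factor, 2),
        "enrichment_ml": round(ENRICHMENT_ML_PER_100_ML * factor, 2),
    }


print(scale_plating_medium(500))
```

For a 500 ml batch this gives 18.0 g of base and 5.0 ml of enrichment, the latter added only after the autoclaved medium has cooled to 50 °C as described.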

Extraction of Arsenophonus nasoniae from Nasonia vitripennis pupae

Wasps in the pupal stage were used as a source for A. nasoniae to reduce contaminating environmental bacteria (wasps purge their gut contents shortly before pupation) (J. Werren, pers. comm.). The pupae were surface sterilized by immersion in 70% ethanol for 30 s so that only intra-organism bacteria were recovered. Excess ethanol was allowed to evaporate before each wasp was cut open and its body contents dissolved in a drop (2–3 µl) of 1 × PBS buffer (0.01 M phosphate buffer, 0.0027 M potassium chloride and 0.137 M sodium chloride, pH 7.4). This was then spread onto the pre-prepared agar plates using a flame-sterilized glass spreader. A single colony was then used to generate additional material for sequencing. All procedures were carried out under sterile conditions on an ethanol-sterilized surface and under a Bunsen flame. Plates were left at 25 °C for 2–4 days until bacterial growth was evident, then stored at 4 °C.
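As a worked illustration of the PBS composition quoted above, the sketch below converts the stated molarities of sodium chloride and potassium chloride into grams per litre. The molar masses (58.44 g/mol for NaCl, 74.55 g/mol for KCl) are standard values, not taken from the text, and the phosphate component is omitted because the text does not specify which phosphate salts were used.

```python
# Molarities are from the text; molar masses are standard reference values
# (an assumption of this sketch, not stated in the paper).
PBS_MOLARITY_MOL_PER_L = {"NaCl": 0.137, "KCl": 0.0027}
MOLAR_MASS_G_PER_MOL = {"NaCl": 58.44, "KCl": 74.55}


def salt_masses_g(volume_l: float) -> dict:
    """Grams of each salt needed for `volume_l` litres of 1 x PBS."""
    return {
        salt: PBS_MOLARITY_MOL_PER_L[salt] * MOLAR_MASS_G_PER_MOL[salt] * volume_l
        for salt in PBS_MOLARITY_MOL_PER_L
    }


for salt, grams in salt_masses_g(1.0).items():
    print(f"{salt}: {grams:.3f} g per litre")
```

This corresponds to roughly 8.0 g NaCl and 0.20 g KCl per litre.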

Genomic DNA preparation and sequencing

Pyrosequencing was performed by generating standard fragment and paired-end single-stranded template DNA libraries using the GS DNA Library Preparation Kits (Roche Applied Sciences, Indianapolis, IN, USA); these were then amplified by emPCR and sequenced on a GS-FLX (454 Life Sciences, Roche Applied Sciences, IN, USA). The 454 reads were assembled with Newbler (v1.1.03.24) using default assembly parameters.

Sequence analysis, annotation and comparative genomes

Assembly was performed with Newbler (Roche) and gap4 (http://staden.sourceforge.net). Protein-coding genes were identified with GLIMMER (Delcher et al., 1999) and GENEMARK (Lukashin & Borodovsky, 1998), and tRNA genes by tRNAscan-SE (Lowe & Eddy, 1997). Putative functions were inferred using BLAST against the National Center for Biotechnology Information databases (Altschul et al., 1990) and InterProScan (Hunter et al., 2009). Metabolic pathways were examined using the SEED (Overbeek et al., 2005) and KEGG databases (Kanehisa & Goto, 2000). Pathway figures were constructed using IPATH (Letunic et al., 2008). Artemis v11 was used to organize data and facilitate annotation (Rutherford et al., 2000). Repeat identification was made using MUMmer (Kurtz et al., 2004). Orthologues were defined using ORTHOMCL (Li et al., 2003).

The phylogeny was reconstructed using orthologous gene sets identified from other bacterial genomes using ORTHOMCL (Li et al., 2003), aligned with MUSCLE (Edgar, 2004) and trimmed with GBLOCKS (Castresana, 2000). Gene alignments were then concatenated and maximum likelihood trees were calculated using the JTT model implemented in PHYML, with an estimated transition/transversion ratio and a fixed proportion of invariable sites (J. Felsenstein 1993. phylip PHYLogeny Inference Package version 3.6a2, Distributed by the author, Department of Genetics, University of Washington, Seattle, WA, USA); 1000 bootstrap replicates were performed. The Bayesian MC3 approach was implemented in MrBayes v3.1 (Huelsenbeck & Ronquist, 2001).
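The concatenation step can be illustrated with a short sketch. It is not the authors' script: the function name, the input structure (one dictionary of taxon to aligned sequence per gene) and the padding of missing taxa with gap characters are all assumptions made for illustration, and the toy sequences are invented.

```python
def concatenate_alignments(alignments):
    """Join per-gene alignments into one supermatrix.

    `alignments` is a list of dicts mapping taxon name -> aligned sequence.
    Taxa missing from a gene are padded with gaps (an assumption of this
    sketch; the paper does not say how missing genes were handled).
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must have equal length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example with invented sequences.
gene_a = {"A_nasoniae": "MKT-L", "P_luminescens": "MKTAL"}
gene_b = {"A_nasoniae": "GG", "P_luminescens": "GA", "Pr_mirabilis": "GA"}
for taxon, seq in concatenate_alignments([gene_a, gene_b]).items():
    print(taxon, seq)
```

Every row of the resulting matrix has the same length, which is what tree-building programs expect from a concatenated alignment.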

Database submission

The genome has been submitted to EMBL under accession numbers FN545141-FN545284.
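For readers who wish to retrieve the records programmatically, the accession range can be expanded into individual identifiers. The sketch below assumes the range is contiguous and inclusive; the function name is illustrative, and no claim is made here about what each individual record contains.

```python
import re


def expand_accession_range(range_text: str) -> list:
    """Expand 'PREFIXnnn-PREFIXmmm' into a list of accession strings.

    Assumes a contiguous, inclusive range with a shared letter prefix.
    """
    match = re.fullmatch(r"([A-Z]+)(\d+)-([A-Z]+)(\d+)", range_text)
    if match is None:
        raise ValueError(f"unrecognised accession range: {range_text!r}")
    prefix_start, first, prefix_end, last = match.groups()
    if prefix_start != prefix_end or len(first) != len(last) or int(first) > int(last):
        raise ValueError(f"inconsistent accession range: {range_text!r}")
    width = len(first)
    return [f"{prefix_start}{n:0{width}d}" for n in range(int(first), int(last) + 1)]


accessions = expand_accession_range("FN545141-FN545284")
print(len(accessions), accessions[0], accessions[-1])
```

Under these assumptions the range covers 144 accession numbers.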

Acknowledgements

GH, MH, AD and TW were supported by funding from the MRC and NERC. Work by JW was funded by the U.S. National Science Foundation grant EF-0328363. We thank Prof Neil Hall of the Centre for Genomics Research, University of Liverpool, for bioinformatic and technical support throughout. This work was supported by the Indiana METACyt Initiative of Indiana University, through a major grant from the Lilly Endowment, Inc. The project was also supported by the Indiana Center for Insect Genomics project funded through the Indiana 21st Century Research and Technology Fund. Jade Buchanan-Carter, Zachary Smith and Nicole Di Camillo processed the DNA samples and sequencing reactions at the Center for Genomics and Bioinformatics. Phillip Steinbachs led the computing support group. Amanda Avery and Jorge Azapurga isolated and quality checked DNA, and cultured A. nasoniae for sequencing.

Conflicts of interest

The authors have declared no conflicts of interest.
