Keywords:

  • symbiont;
  • Wolbachia;
  • male-killing;
  • Photorhabdus;
  • Arsenophonus;
  • genome;
  • toxins

Abstract

Four percent of female Nasonia vitripennis carry the son-killer bacterium Arsenophonus nasoniae, a microbe with notably different biology from other inherited parasites and symbionts. In this paper, we examine a draft genome sequence of the bacterium for open reading frames (ORFs), structures and pathways involved in interactions with its insect host. The genome data suggest that A. nasoniae carries multiple type III secretion systems, and an array of toxin and virulence genes found in Photorhabdus, Yersinia and other gammaproteobacteria. Of particular note are ORFs similar to those known to affect host innate immune functioning in other bacteria, and four ORFs related to pro-apoptotic exotoxins. The genome sequences for both A. nasoniae and its Nasonia host are useful tools for examining functional genomic interactions of microbial survival in hostile immune environments, and mechanisms of passage through gut epithelia, in a whole organism context.


Introduction

Insects engage in many different interactions with enteric bacteria (bacteria from the gamma division of proteobacteria). The spectrum varies from strongly parasitic (causing death of the host) through more commensal interactions, to ones where the presence of the bacterium is beneficial or even essential (Dale & Moran, 2006). Pathogenic interactions range from passive to extremely aggressive infections. For example, Erwinia carotovora and Pseudomonas entomophila are gut-invasive pathogens, causing a loss of gut epithelial integrity that allows their dissemination throughout the insect (Vallet-Gely et al., 2008). Photorhabdus luminescens represents a very aggressive infection. A ‘partner’ of nematodes, it switches to a pathogenic lifestyle on regurgitation inside a lepidopteran haemocoel (ffrench-Constant et al., 2003). This bacterium carries a diverse arsenal used to subjugate and kill its secondary host, comprising both toxins and systems for disabling host immune responses (Waterfield et al., 2004).

In contrast to these pathogens, other members of the gammaproteobacteria are symbionts of insects. These are persistent infections, and are commonly divided into two categories. Primary symbionts (such as Buchnera, Wigglesworthia and Blochmannia) are required for host function by virtue of various anabolic roles, are typically integrated into both host anatomy (through a bacteriome) and physiology, and commonly have a long evolutionary history with their host. Secondary symbionts, such as Hamiltonella and Sodalis, are dispensable, generally less integrated into anatomy, and sometimes exist within the host haemolymph (Moran et al., 2005). Here, they must either avoid eliciting or be refractory to any clearing immune response of the host. Secondary symbionts often provide an ecologically contingent benefit, such as natural enemy resistance (Haine, 2008).

The gammaproteobacterium Arsenophonus nasoniae infects the wasp Nasonia vitripennis. First described phenotypically from its ‘son-killer’ behaviour, it represented a maternally inherited trait present in around 4% of female N. vitripennis wasps that were typified by the production of female-biased secondary sex ratios (associated with the death of 80% of sons) (Skinner, 1985). Whilst male-killing bacteria have been found to be relatively common in insects (Hurst et al., 2003), the N. vitripennis son-killer has many unusual features. First, it can be infectiously transmitted by sharing a pupal host. Second, it was isolated into cell-free culture (Werren et al., 1986), and the causal agent therefore officially named and characterized (Gherna et al., 1991). Third, the bacterium has an unusual relationship with its host, in that it combines aspects of pathogenic and symbiotic lifestyles. It invades through the gut wall and establishes a persistent, ubiquitous infection (Huger et al., 1985) where it survives in a hostile immune environment and is maternally inherited. However, rather than passing through the cytoplasm of the egg as other male-killers do, A. nasoniae is injected into the fly pupal host at oviposition, and is then ingested by early instar wasp larvae where it reinvades through the wasp larval gut (Huger et al., 1985; Werren et al., 1986). Therefore, it also routinely and alternately infects two kinds of hosts: parasitic wasps and fly pupae. Typically, the fly puparium has been injected with wasp venoms that alter its physiology (Rivers & Denlinger, 1994), but the host remains alive for several days, a time frame when the bacterium appears to replicate within the fly and is transmitted to feeding wasp larvae. In many ways, the only aspect of its symbiosis truly shared with other ‘male-killers’ is the phenotype: it kills male hosts. Male killing is achieved by blocking the formation of maternally-derived centrosomes in the unfertilized haploid (male) embryos of this haplodiploid insect (Ferree et al., 2008).

The combination of pathogenic and symbiotic features makes A. nasoniae an interesting organism. It can serve both as an insect model for gut invasion and as a model for the biology of persistent infection. In this paper, we report results from our analysis of a draft genome sequence of this organism (Darby et al., 2010), with special reference to candidate loci and genetic components of the genome that may be important for the microbe's interaction with the insect host, particularly in terms of how it invades, survives in a hostile immune environment and how it kills males. We first recapitulate the basic properties of the genome as described in Darby et al. (2010) and then examine what the genome can tell us about the ‘machinery’ used in interaction with the host, in terms of secretion systems. We then detail open reading frames (ORFs) that have protein sequence similarity to genes coding for effector molecules, as identified in other systems.

Results

The A. nasoniae draft genome analysed here was obtained through pyrosequencing of a standard fragment and paired-end single-stranded template DNA library using the GS DNA Library Preparation Kits (Roche Applied Sciences) that were then amplified by emPCR and sequenced on a GS-FLX (454 Life Sciences). The 454 reads were assembled with Newbler (v1.1.03.24) using default assembly parameters. The draft A. nasoniae genome sequence thus assembled comprised 143 sequence scaffolds (median size 7 Kbp, max 212 Kbp), in 665 contigs (median size 4 Kbp, maximum 43 Kbp), with 261 sequencing gaps in the scaffold, and resulted in a total draft genome assembly size of 3,567,128 bp, with a GC content of 37.7%. These scaffolds are a mixture of bacterial chromosomes (c. 3.2 Mbp) and extrachromosomal DNA (∼100 Kbp from plasmids and ∼200 Kbp from phage). The draft genome assembly has been deposited in EMBL (accession numbers FN545141-FN545284). Full details of the sequence and its properties can be found in Darby et al. (2009).
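Summary figures of the kind quoted above (total assembly size, median and maximum contig size, GC content) can be recomputed from any set of assembled sequences. The following is a minimal sketch, not the authors' pipeline; the function name `assembly_summary` and the toy sequences are hypothetical and stand in for contigs read from the deposited assembly.

```python
from statistics import median


def assembly_summary(contigs):
    """Return (total_bp, median_len, max_len, gc_percent) for a list of DNA strings."""
    lengths = [len(seq) for seq in contigs]
    total = sum(lengths)
    # Count G and C bases case-insensitively across all contigs.
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in contigs)
    return total, median(lengths), max(lengths), 100.0 * gc / total


# Toy sequences for illustration only; these are not A. nasoniae data.
toy_contigs = ["ATGCATAT", "GGCCAATT", "ATATATATGC"]
total_bp, median_len, max_len, gc_percent = assembly_summary(toy_contigs)
print(total_bp, median_len, max_len, round(gc_percent, 1))
```

Ambiguous bases (such as N in sequencing gaps) are counted in the total length here; whether to exclude them is a choice the source does not specify.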

From the 3332 predicted ORFs, we defined those that are most likely to be associated with interaction with the insect host. Most of our report is based on inferring the identification and putative function of annotated ORFs with significant sequence similarities (e < 1 × 10⁻¹⁰) to genes of functionally characterized microbial genomes using BLASTp search algorithms (Altschul et al., 1990), and through the detection of conserved domains as identified from Interpro search. Homology is, where stated, inferred either from gene synteny, or from phylogenetic reconstruction (trees are not shown). The ORF accessions are either noted on figures or tables, or can be found in supplementary material Table S1.
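The significance threshold described above amounts to a simple filter over BLASTp hits. The sketch below assumes the hits are available in the 12-column tab-separated layout produced by BLAST+ (`-outfmt 6`), in which the e-value and bit score are the 11th and 12th fields; the source does not state which output format was used, and the identifiers in the example (`orf_A`, `subj_1` and so on) are hypothetical.

```python
E_VALUE_CUTOFF = 1e-10  # threshold quoted in the text


def significant_hits(tabular_lines, cutoff=E_VALUE_CUTOFF):
    """Keep (query, subject, evalue, bitscore) for hits with e-value below cutoff."""
    hits = []
    for line in tabular_lines:
        # Skip blank lines and comment lines.
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue < cutoff:
            hits.append((query, subject, evalue, bitscore))
    return hits


# Hypothetical example rows: one strong hit and one weak hit.
example_lines = [
    "orf_A\tsubj_1\t62.0\t290\t100\t2\t1\t290\t1\t288\t1e-53\t211",
    "orf_B\tsubj_2\t30.0\t80\t50\t3\t1\t80\t5\t84\t2e-05\t45.1",
]
for hit in significant_hits(example_lines):
    print(hit)
```

Only the first example row passes the filter, because 2e-05 is above the cutoff.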

Microbial systems of interaction with the host can be subdivided into the apparatus associated with delivery of proteins and small molecules (secretion systems), the acquisition of molecules from the host environment (transporter systems), and the secreted bioactive proteins and molecules themselves. Below, we identify ORFs within the A. nasoniae genome likely to function in these three roles.

Secretion machinery

There are two types of bacterial machinery for secretion: ones that possess a needle and secrete into eukaryotic cells (e.g. Type III secretion systems), and others that translocate proteins, ions and small molecules across the bacterial membranes into the environment (ABC transporters, Sec-dependent translocation, and Sec-independent translocation).

Type III secretion systems

Arsenophonus nasoniae has two complete Type III secretion systems (TTSSs). The first is most closely akin to the TTSS of Yersinia sp. and the second to the Inv/Spa-like apparatus of Salmonella. The genome also contains TTSS fragment regions, numbered 1–3, with sequence similarity to Shewanella, Yersinia and Salmonella, respectively (Figs 1–3).

Figure 1. The Yersinia-like type III secretion system (TTSS) operon of Arsenophonus nasoniae. Regulation of the operon is most likely exerted by a system akin to that of Pseudomonas via the ExsD, VirF/ExsA, ExsC and ExsE homologues. LcrD, R and V are homologues of the Yersinia low calcium response genes, which activate secretion in the presence of a host cell. YopB and YopD, along with their chaperone SycD, are type III secreted pore-forming proteins in Yersinia. ‘Sct’ is a unified TTSS gene nomenclature (Hueck, 1998). Gene order and function in this operon is compared with its closest homologue: the TTSS of Yersinia enterocolitica. The light orange bars between loci indicate areas of sequence similarity and gene order conservation. Also shown is the A. nasoniae TTSS fragment region #2. Red stars indicate open reading frames pseudogenized by frame shift. Y. enterocolitica gene order taken from Toh et al. (2006).

Figure 2. The Salmonella Inv/Spa-like type III secretion system (TTSS) of Arsenophonus nasoniae. The Salmonella Inv, Spa and Prg genes, which encode the secretion machinery, all have identifiable A. nasoniae counterparts (with the exception of InvH), as does the regulator HilA. OrgA (Salmonella) and HrpE (Burkholderia) are oxygen-regulated TTSS components. A. nasoniae open reading frames similar to the effector SipC and its chaperone SipA have been identified based on synteny and size similarities. ACP is an acyl carrier protein. Gene order and function in this operon is compared with its closest homologues: the SPI-1 TTSS of Salmonella enterica serovar Typhi strain CT18 and the SSR-2 TTSS of Sodalis glossinidius strain ‘morsitans’. The light orange bars between loci indicate areas of sequence similarity and gene order conservation. Also shown is the A. nasoniae TTSS fragment region #3, which is lacking several genes of the needle complex. S. enterica gene order taken from Toh et al. (2006).

Figure 3. Gene order and function of the Arsenophonus nasoniae type III secretion system (TTSS) fragment #1 compared with its closest homologue: the TTSS of Shewanella baltica OS155. The light orange bars between loci indicate areas of sequence similarity and gene order conservation. Notable is the loss of the AraC-like transcription regulator. In A. nasoniae this region is highly pseudogenized, containing many stop codons. S. baltica gene order taken from Toh et al. (2006).

All three virulent Yersinia species contain a TTSS encoded on their large virulence plasmid, with a core, highly conserved block of about 20 Kbp. The conserved region contains 31 genes in 8 transcriptional units (lcrGVH-yopBD|yopN|lcrDR|yscN-U|virG|virF|yscA-L|lcrQ) (Hueck, 1998). Organisationally, this region of the A. nasoniae TTSS operon is very similar to that of Yersinia in terms of both gene content and order (Fig. 1). Of all the genes previously shown to be essential for the functioning of the TTSS, only YscE has no discernible homologue in the A. nasoniae operon. The ORF at the position in the genome where YscE would be expected has 29% amino acid identity to the type III export protein PscE of Pseudomonas aeruginosa, suggesting a functional equivalent. Interpro search indicates almost all the genes of the Yersinia-like TTSS of A. nasoniae contain complete motifs relevant to their function. There are four exceptions, detailed in supplementary material Table S2. The final ambiguity in terms of function concerns the VirF ORF. BLASTp similarity to VirF is spread across contiguous A. nasoniae ORFs, with the second ORF containing an intact AraC domain. Whether this frameshift is actual or a homopolymer sequence artefact awaits resolution. Despite these queries, we conclude that A. nasoniae is highly likely to carry a functional TTSS akin to that found in Yersinia or Pseudomonas.

In addition to the structural elements of the Yersinia TTSS, A. nasoniae has a cluster of regulatory genes at the end of the operon and on the opposite strand. These ORFs have sequence similarity to components of the low calcium response (Lcr) genes of Yersinia that promote effector secretion under low calcium conditions (such as those found in the presence of a host cell). Also present is a PopB-like (Pseudomonas)/YopB-like (Yersinia) ORF, a pore-forming translocation protein that functions in regulation of secretion but has also been shown to act directly against host cells. Downstream of this region are two ORFs with similarity to ExsC and ExsE of Pseudomonas. Together with the ExsD and ExsA/VirF-like ORFs, we find a complete control mechanism for TTSS gene expression in A. nasoniae of the kind described for P. aeruginosa (Rietsch et al., 2005).

The second TTSS operon of A. nasoniae is homologous to the Inv/Spa apparatus of Salmonella sp. pathogenicity island 1 (SPI-1), responsible for the invasion of intestinal epithelia (Hueck, 1998). Homology is inferred both from conservation of gene content and synteny to the TTSS of both Salmonella sp. and Shigella sp. (Galan, 1996) (Fig. 2). The core packet of genes forming the Inv/Spa system is present in A. nasoniae, with the exception of an InvH homologue, a part of an outer membrane translocation complex (Crago & Koronakis, 1998), which is apparently absent. Effectors in this operon include ORFs with sequence similarity to the pro-apoptotic molecule SipB and its chaperone SipD. Whilst BLASTp indicates the effector SipA and its cognate chaperone SipC are absent, there are two predicted ORFs either side of the SipD-like gene that we believe are likely to be an effector/chaperone pair based on their size and synteny.

Other secretion systems

Arsenophonus nasoniae, like other gammaproteobacteria, possesses both Sec and TAT (twin arginine translocase) systems for translocating proteins carrying the cognate signal sequence through the inner membrane. Proteins secreted via these systems may either remain in the periplasmic space, autotransport through the outer membrane, or move through the outer membrane via a type IV pilus (Natale et al., 2008), which the genome sequence suggests is also present. In addition to these systems, A. nasoniae possesses ORFs revealed by BLASTp to be related to a wide variety of ABC transporters that translocate small molecules and larger peptides/proteins across both cell membranes. Those likely to be important in virulence are described below (details of others in Darby et al., 2009).

Microbes that live inside living hosts are commonly limited by iron availability, and iron acquisition systems are thus regarded as ‘virulence determinants’ (Payne & Finkelstein, 1978). Arsenophonus nasoniae possesses ORFs with reciprocal BLAST match, confirmed by phylogeny (see supplementary material Fig. S3), to two systems for the translocation of chelated iron, one based on an Fe³⁺ ABC transporter system and the other on the TonB-dependent FepABCDG translocation system of Escherichia coli, with which it is syntenous. It is unclear whether A. nasoniae itself manufactures siderophores, although there is a fragment of an ORF with polyketide synthase domains discussed below that may be involved with siderophore production. Arsenophonus nasoniae also possesses an operon showing both sequence similarity and synteny to the E. coli ferrous-iron transport system FeoABC, which transports free (unchelated) ferrous iron.
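A reciprocal BLAST match of the kind referred to above is commonly operationalized as a reciprocal best hit: ORF a in one genome has ORF b as its top-scoring hit in the other, and b in turn has a as its top-scoring hit. The sketch below illustrates that logic under the assumption that bit score is used to rank hits; the source does not give the exact criterion, and the identifiers (`an_1`, `ec_fepA` and so on) are hypothetical.

```python
def best_hits(hits):
    """Map each query to its highest-bit-score subject.

    hits: iterable of (query, subject, bitscore) tuples.
    """
    best = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(forward, reverse):
    """Return sorted (a, b) pairs where a's best hit is b and b's best hit is a."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return sorted((a, b) for a, b in fwd.items() if rev.get(b) == a)


# Hypothetical hits: genome A ORFs searched against genome B, and the reverse.
forward_hits = [("an_1", "ec_fepA", 500.0), ("an_1", "ec_x", 100.0), ("an_2", "ec_y", 200.0)]
reverse_hits = [("ec_fepA", "an_1", 480.0), ("ec_y", "an_3", 300.0), ("ec_y", "an_2", 150.0)]
print(reciprocal_best_hits(forward_hits, reverse_hits))
```

In this toy data only `an_1` and `ec_fepA` are reciprocal best hits; `an_2` is dropped because `ec_y` scores better against a different ORF.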

Effector molecules and pathogenicity islands

The genome of A. nasoniae carries ORFs with sequence similarity to a variety of effector molecules. In addition, it contains three islands of effector molecules apparently undergoing pseudogenization. We first review the variety of ORFs akin to type III secreted effectors, describe a potentially novel symbiosis/pathogenicity island, and then outline ORFs encoding either potential effectors, or the synthesis of small molecule effectors. We finally describe three toxin islands that are apparently undergoing pseudogenization.

Type III effector-like ORFs. Twelve A. nasoniae ORFs with BLASTp sequence similarity to known TTSS effectors were observed, corresponding to ten different effector molecules (Table 1). As is typical of TTSS effectors, only a minority (three) of these are found within the TTSS operon itself (see above), with the others being dispersed throughout the genome. The ORFs to which these show sequence similarity variously alter host cell signalling pathways, in particular, altering host innate immune responses.

Table 1. Arsenophonus nasoniae open reading frames with sequence similarity to known type III effectors

| A. nasoniae ORF/s | Similar ORF (bit score, e-value) | Function of similar ORF | Notes on Arsenophonus nasoniae ORF | References |
| --- | --- | --- | --- | --- |
| 36660 (l = 297) | YopJ Yersinia (S = 211, e-53) | Alters ubiquitination status, disrupting MAP Kinase and NF-κB signalling, and therefore affecting innate immune signalling and cytokine production. Antiapoptotic. | Of similar length and 62% sequence similarity. Cysteine protease catalytic core and catalytic triad identified in YopJ is intact. | Orth et al. (2000) |
| 23010 (l = 726) | YopH Yersinia/SptP Salmonella (S = 132, e-29) | Tyrosine phosphatase activity provides resistance to phagocytosis. | Longer than YopH and carries two tyrosine kinase elements, with precise match at known active site. | Black & Bliska (1997); Galán (2001) |
| 10280 (l = 1221) | Effector in Shewanella (S = 441, e-121) | Not known. Effector hypothesis is raised by position in TTSS and chaperone binding site. | Within an incomplete TTSS. Contiguous to ORF with similarity to cognate chaperone. | Black & Bliska (1997); Galán (2001) |
| 35130, 02810, 26090 (l = 553) | SopA Salmonella (S = 240, e-61) | HECT-3 like ubiquitin ligase enzyme, alters ubiquitination status of proteins. | Similarity over C terminal of protein (N terminal often different in members of this family). Intact HECT-3 domain and necessary cysteine residue for catalytic activity at site 753. | Zhang et al. (2006) |
| 35620 (l = 556) | SopB Salmonella/IpgB Escherichia coli (S = 504, e-142) | Inositol phosphate phosphatase enzyme. | 66% sequence similarity over entire ORF. Active domain recognised. Resides next to cognate chaperone, SigE/IpgD. | Norris et al. (1998) |
| 23150 (l = 224) | PipA Salmonella (S = 158, e-37) | Type III secreted protein of unknown function, save important in virulence. | Sequence similarity over entire ORF. | Tenor et al. (2004) |
| 23940 (l = 197) | OspG Shigella (S = 98, e-19) | Alters phosphorylation of IκB, interfering with activation of NF-κB pathway and innate immune signalling. | Protein kinase domain intact, including lysine residue required for kinase activity. | Kim et al. (2005) |
| 24950 (l = 266) | ExoY Pseudomonas (S = 154, e-36) | Adenylate cyclase activity, creates 100 fold increase in intracellular cAMP and thus interferes with signalling pathways. | Sequence similarity at N terminal, including anthrax toxin domain. C terminal incomplete by virtue of sequencing gap means full functional assessment not possible. | Yahr et al. (1998) |
| 14250 (l = 376) | IpaD Shigella (S = 100, e-19) | Surface antigen, required for bacterial entry into epithelial cells and introduction of late effectors. Activity depends on C terminal of protein. | Within TTSS. Found proximal to cognate chaperone. 55% sequence similarity at C terminal. | Picking et al. (2005) |
| 14270 (l = 666) | SipB Salmonella (S = 123, e-26) | Important for cell entry and translocation of late effectors. Pro-apoptotic through binding to caspase-1. | Within TTSS. 49% sequence similarity at C terminal. Partial invasin domain found. | Hersh et al. (1999) |

Note: each A. nasoniae open reading frame (ORF) accession is followed by its length in amino acids (l =); each similar ORF is followed by the bit score (S =) and e-value. Where more than one A. nasoniae ORF exists, length and BLASTp data are given for the most similar ORF only. TTSS, type III secretion system.

In addition to those ORFs outlined in Table 1, we identified candidate type III effectors within the TTSS defined by synteny rather than sequence similarity (detailed above). We would also note two other ORFs that deserve investigation as candidate TTSS effectors by virtue of being both proximal to one of the likely effector ORFs in Table 1, and having pentapeptide repeat motifs (supplementary material Fig. S1). Pentapeptide repeat motifs are a soft indicator of involvement in virulence, being found in TTSS effectors such as PipB2 (Knodler & Steele-Mortimer, 2005), but also in proteins with no virulence association. In the case of the YopH–pentapeptide repeat ORF pair, a third ORF is found with N terminal similarity to the effector SopA. This ORF is clearly not a SopA homologue, but nevertheless warrants investigation.

Leucine Rich Repeat/mcf Symbiosis/Pathogenicity island. Arsenophonus nasoniae possesses one or more likely pathogenicity/symbiosis islands containing ORFs with multiple Leucine Rich Repeats (LRRs) alongside two ORFs, the first with sequence similarity to an Aeromonas cell wall enterotoxin encoded by the gene ast (Table 2) and the second a cell wall hydrolase (Fig. 4). All proteins containing LRR domains bind ligands (that may themselves be proteins, but can be RNA or polysaccharide), and many have been observed to be important in host/bacteria interactions, both in terms of bacterial pathogenicity and host immunity (Kobe & Kajava, 2001). One such bacterial gene is the poorly understood Yersinia effector YopM, a cytotoxin essential for Yersinia pathogenicity (Hines et al., 2001). Other LRR domains function in bacterial internalization in the presence of eukaryotic cells.

Table 2. Intact open reading frames with sequence similarity to toxin elements

| A. nasoniae ORF/s | Similar ORF (bit score, e-value) | Function | Notes on Arsenophonus nasoniae ORF | References |
| --- | --- | --- | --- | --- |
| 10220, 23450, 07720, 33080 (l = 486) | Aip56, Photobacterium damselae ssp. piscicida (S = 325, e-87) | Pro-apoptotic exotoxin | Of similar length and 45–54% sequence similarity across Aip56. 3 ORFs are in phage regions, one is proximal to a TTSS. | Do Vale et al. (2007) |
| 36850 (l = 1051) | cnf1: Cytotoxic necrotizing factor 1, Escherichia coli (S = 107, e-21) | C terminus is translocated into cells, and causes illegitimate activation of Rho GTPase activity, altering signalling. | Sequence similarity at N terminus, with intact cell receptor domain, and pair of membrane spanning helices. C terminus does not possess cnf1 catalytic domain that causes toxicity, and has no matches in NCBI nr. | Boquet (2001) |
| 35480 (l = 151) | Colicin V, Photorhabdus (S = 256, e-67) | Bacteriocidal | Intact colicin V production motif. | Cascales et al. (2007) |
| 22290 (l = 431) | Colicin 1b (S = 225, e-57) | Bacteriocidal | Pore forming domain complete. Lies proximal to colicin 1b immunity factor-like ORF. | Cascales et al. (2007) |
| 31950 (l = 533) | Serralysin (S = 328, e-88) | Insecticidal | Complete serralysin domain of insecticidal hemolysin of Serratia. | Tao et al. (2007) |
| 28400 (l = 651) | ast, Aeromonas hydrophila (S = 761, e = 0) | Enterotoxin | Carries signal peptide. 72% aa identity over entire Aeromonas ORF. Part of LRR/mcf island. | Sha et al. (2002) |

Note: each A. nasoniae open reading frame (ORF) number is followed by its length in amino acids (l =); each matched ORF is followed by the bit score (S =) and e-value. Where more than one A. nasoniae ORF is listed, length and BLASTp data are given for the most similar ORF only. TTSS, type III secretion system.

Figure 4. The organization of the two LRR-MCF regions of Arsenophonus nasoniae. The first, larger region contains two LRR-MCF open reading frames (ORFs) and three YopM-like genes containing 12, 11 and 30 leucine rich repeat regions, respectively. The star in LRR-MCF ORF 1 indicates the location of a stop codon that may be an artefact of 454 sequencing. Upstream is a cell wall hydrolase-like gene (best BLASTp matches to genes found in various Salmonella isolates) and an ORF with similarity to the ast enterotoxin of Aeromonas (detailed in Table 2). The second, smaller region contains a single LRR-MCF ORF followed by a gene with weak homology to MCF (e < 1 × 10⁻⁴). Between this and the transposase are, from left to right, a tagatose bisphosphate aldolase and two phosphotransferase system (PTS) ORFs. Immediately upstream is a scaffold start.

Within this region, three ORFs are particularly notable for sharing a common, but previously unrecognized, chimeric structure. They combine N terminal LRR elements coupled to a C terminal with similarity to part of the Photorhabdus gene mcf (makes caterpillars floppy) (Daborn et al., 2002) (Fig. 5). The mcf-like part of these ORFs corresponds to a section of the Photorhabdus gene with sequence similarity to the toxin B and RTX domains found in the C terminal region of mcf. The pro-apoptotic BH3 domain found in the N terminal portion of Photorhabdus mcf was not detected.

Figure 5. A comparison of the three Leucine Rich Repeat (LRR) – Makes Caterpillars Floppy (MCF) open reading frames (ORFs) of Arsenophonus nasoniae. Green arrows indicate areas containing LRR domains, the red arrows indicate areas of homology to the Repeats in Toxin (RTX) toxin gene of Vibrio sp. and the orange bars show regions of homology to the MCF gene of Photorhabdus luminescens. The black region in LRR-MCF ORF 2 indicates a sequencing gap of known size (75 amino acids).

Other ORFs with potential for toxin function/production. Table 2 additionally provides details of other ORFs where BLASTp search finds significant similarity to toxin genes. Most notable are four ORFs, dispersed throughout the genome, with sequence similarities to the gene Aip56 (apoptosis inducing protein 56) carried on a plasmid of Photobacterium damselae ssp. piscicida. Functional studies demonstrated the protein product of Aip56 is a pro-apoptotic exotoxin, the protein being sufficient to kill neutrophils, and being necessary for virulence of the bacterium (immunization of fish against Aip56 resulted in failure of infection to kill) (Do Vale et al., 2007). BLASTp indicates these ORFs are members of a small family present in pathogenic gammaproteobacteria. They are similar to type III effector C protein in three species, and to an uncharacterized secreted protein in a pathogenic E. coli. A final BLAST return recovered similarity of the C terminal of the A. nasoniae ORF to an element in APSE-2, a phage present in the aphid secondary symbiont Hamiltonella defensa. The A. nasoniae genome is the only one to date found to possess more than one copy of this family. However, the expansion of the family is not recent, and amino acid identity between members within A. nasoniae is no more than 75% in any case. Phylogenetic analysis cannot confidently resolve whether the A. nasoniae copies are monophyletic within A. nasoniae. Therefore, it is unclear if this finding represents a single origin or a serial transfer of the element.

The other notable putative ‘toxin’ found by sequence similarity searches (e < 1 × 10⁻²¹) has sequence similarity to the cnf1 gene (cytotoxic necrotizing factor 1). The cnf1 locus possesses three functional domains. The N terminal carries a cell receptor binding domain, important in adhesion to eukaryotic cells, and a pair of hydrophobic membrane spanning helices. These are conjectured to be important in initiating transfer of the bioactive C terminal domain into the cell, which then illegitimately activates host Rho GTPases (Boquet, 2001). The cnf1-like ORF of A. nasoniae carries the domains for both adhesion and transfer of the C terminal, but the C terminal is distinct (no returns on BLASTp search). This ORF deserves investigation, as it has the machinery for translocation of the C terminal into eukaryotic cells, making the C terminal potentially bioactive, but in unknown ways.

Additionally, the genome encodes 38 putative haemolysins and alkaline metalloproteases that are potentially transported by two ABC family transporters with weak sequence similarity to RTX/hemolysin transport proteins. One of the haemolysin genes has sequence similarity to serralysin, an insecticidal toxin of Serratia sp. (see Table 2).

Small molecule synthesis and secretion. Bacteria can also secrete an array of small bioactive organic molecules. These molecules may aid survival in a hostile host environment, or have direct activity against the host. Within the first category are molecules such as siderophores, which function in iron scavenging, an important determinant of fitness in iron-poor settings that typify the host environment. In the latter category are a range of secreted polyketides such as the antibiotic tetracycline, and molecules that show toxicity to eukaryotes. For instance, the Pseudomonas symbiont of Paederus beetles is known to synthesize and secrete the polyketide pederin, which deters predators of the host (Kellner, 2002).

Siderophores and secreted toxin molecules often share biosynthetic pathways and can be synthesized in condensation reactions similar to those of the fatty acid synthesis pathways. There is evidence within the genome of A. nasoniae for genes that both form and export polyketide molecules. Three ORFs are found within the genome that may function in polyketide synthesis (Supplementary material Table S3): a putative polyketide cyclase, an ORF incomplete in the current assembly that nevertheless possesses a variety of domains commonly found in polyketide synthase (PKS) enzymes, and a final ORF (proximal to the PKS-like ORF) with homology to 4′-phosphopantetheinyl transferase, that is likely to activate ketosynthase (KS) enzymes, via post-translational modification of the KS acyl carrier protein domain (Lambalot et al., 1996).

Polyketide efflux is suggested by the presence of ORFs with reciprocal BLAST match (confirmed by phylogeny, see supplementary material Fig. S3) to macrolide/antimicrobial peptide transporter systems. Macrolide transport is associated with an ABC system comprising two proteins, the ABC transporter protein itself, and MacA, a periplasmic membrane fusion protein that connects inner and outer membrane components of the transporter (Tikhonova et al., 2007). Arsenophonus nasoniae has these components coupled within the genome. Both appear intact with the expected functional motifs and share similar length and significant sequence similarity to functionally characterized genes in Yersinia (e < 1 × 10⁻¹⁰⁸).

Islands undergoing pseudogenization

The elements of a genome undergoing pseudogenization can provide insight into the historical biology of a bacterial species. Whilst homopolymer artefacts associated with sequencing make individual frame shift errors difficult to interpret, the following islands presented multiple frameshift mutations and stop codons and can thus be ascribed pseudogene status with more certainty.

RTX Island. The RTX (repeats in toxin) proteins are a family of exotoxins known from studies of mammalian pathogens to possess haemolytic, leucotoxic and leucocyte-stimulating activities (Lally et al., 1999). The A. nasoniae genome encodes ORFs with similarity to an RTX ABC transporter and has a 26 kb region with sequence similarity to RTX proteins (ARN_13180 to ARN_13310). The transporter structure is complete and shows both strong sequence similarity and synteny with RTX transporters found in Photorhabdus sp. (confirmed by phylogeny, see supplementary material Fig. S3). The region with similarity to genes encoding RTX proteins is itself probably pseudogenized, with the equivalent of three ‘functional ORFs’ disrupted by two stop codons and five frameshifts to make 10 partial ORFs. Comparable genes in Yersinia sp. show similar fragmentation and may have a common ancestry from larger complete RTX proteins seen in Providencia sp.

Clostridium/Rickettsiella/Xenorhabdus Island. Arsenophonus nasoniae possesses a 10 kb island of ORFs with sequence similarity to toxins found in Clostridium or Xenorhabdus (ARN_08100 to ARN_08150). Whilst none of the ORFs have outstanding similarity to any of the toxin elements (e > 1 × 10⁻²⁵), this series of ORFs does suggest a potential pathogenicity/symbiosis island that is undergoing pseudogenization. The island is predicted to contain six ORFs, but BLASTP sequence identity searches indicate that ancestrally it may have contained fewer. The first three ORFs show two alignments to Clostridium ORF X2-like genes from Erwinia (e < 1 × 10⁻⁴⁶) separated by a stop codon and a frameshift. Clostridium ORF X2 has a role in botulinum toxin production (Dineen et al., 2004). The final two ORFs are similar to nematicidal protein Plu2242 of Xenorhabdus/Photorhabdus (e < 1 × 10⁻⁹⁰), but the single Xenorhabdus/Photorhabdus ORF is disrupted by at least one stop codon in A. nasoniae.

Toxin complex. Insecticidal toxin complex (Tc) proteins were first isolated in P. luminescens (Bowen & Ensign, 1998) and have since been described in a variety of entomopathogens (Serratia entomophila, Xenorhabdus nematophilus, Pseudomonas entomophila) and several Yersinia sp. (Fuchs et al. 2008 and refs. therein). Three components are needed for full toxicity, a TcdA-like (TcaA, TcaB or TcdA), a TcdB-like (TcaC or TcdB) and a TccC-like component. Experiments expressing Tc genes in E. coli showed that TcdA1 on its own exhibits low levels of toxicity towards insect gut epithelium, with full toxicity achieved only with the addition of both a TcdB and a TccC-like gene product (Waterfield et al. 2005; Munch et al., 2008).

Arsenophonus nasoniae has a number of ORFs with BLASTp sequence similarity to Tc genes in P. luminescens (e < 1 × 10⁻⁵⁸), with protein sequence alignments to TcdA4 and TcdB1 (on one island) and TccC (elsewhere in the genome). The TcdA4 BLAST returns are spread over three separate predicted ORFs, with one frameshift event in the middle of the ‘gene’ and multiple stop codons in the end section. Likewise, a frameshift divides the region with sequence alignment to TcdB1 over two ORFs, although the region does include the SpvB domain seen in TcdB1 of P. luminescens (Waterfield et al., 2001). The A. nasoniae TccC-like region is interrupted by an IS100 transposase and multiple stop codons. Although further analysis of these regions is required, from the available sequence data it appears that A. nasoniae contains the pseudogenized remnants of an ancestral Tc insecticidal toxin locus.

Secreted proteins defined by signal sequence presence

A search for signal peptide motifs reveals 310 intact ORFs predicted to carry a signal sequence for sec-dependent secretion under either NN or HMM algorithms (supplementary material Table S4). As expected, many of these ORFs have best BLASTp matches to membrane-bound parts of transport (type III secretion, ABC, type IV pilus) or motile (flagellar) machinery. Other ORFs have best BLASTp matches to extracellular solute binding proteins, to lipoproteins, to proteins expected to be involved in peptidoglycan formation and breakdown, and to outer membrane proteins of undefined function, commonly described as ‘antigens’.

Of biological interest are a variety of ORFs with best BLASTp returns to genes that function in adhesion to eukaryotic cells (described in a later section), a group of five ORFs with chitin-binding domains, a superoxide dismutase, eight ORFs with peptidase/carboxypeptidase domains, and a homologue of ecotin, a serine protease inhibitor. There were also ORFs with Sel1 and ankyrin domains, both of which are involved in protein–protein interactions, potentially with eukaryotic cells. The ankyrin repeat containing protein carried four repeat units, and best BLASTp hits were to ORFs in Providencia and Proteus, the closest relatives of Arsenophonus. The Sel1-containing ORFs carried four and five repeat units. Sel1 repeats belong to the tetratricopeptide repeat family, and many prokaryotic genes involving Sel1 repeats mediate interaction with eukaryotic cells (Mittl & Schneider-Brachert, 2007). Both ORFs have best BLASTp hits to ORFs in Providencia.

We investigated in greater detail the likelihood that A. nasoniae possesses a functional homologue of ecotin. ORF ARN_08870 encodes a protein of similar length to that found in E. coli, and the four domains known to be present in ecotin (primary substrate binding, secondary substrate binding, inhibition and dimerization) are all present. All have perfect matches at the required amino acid motifs, with the exception of the dimerization motif (which matches at 10 of 11 amino acid residues). It is thus likely that this secreted protein possesses SERPIN properties.
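The motif comparison described above amounts to counting identical residues at fixed positions. A minimal Python sketch of that count follows; the function name and both 11-residue strings are hypothetical placeholders chosen only to reproduce a 10-of-11 match, and are not the real ecotin dimerization motif or the A. nasoniae sequence.

```python
# Illustrative sketch only: count identical residues between an observed
# motif and a reference motif of the same length.
def motif_matches(observed: str, reference: str) -> tuple[int, int]:
    """Return (number of matching positions, motif length)."""
    if len(observed) != len(reference):
        raise ValueError("motifs must be the same length")
    matches = sum(o == r for o, r in zip(observed.upper(), reference.upper()))
    return matches, len(reference)


# Placeholder strings, NOT real ecotin sequence data.
reference_motif = "ACDEFGHIKLM"
observed_motif = "ACDEFGHIKLV"
print(motif_matches(observed_motif, reference_motif))
```

A position-by-position count of this kind ignores conservative substitutions, so a single mismatch may or may not matter functionally.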

Outer membrane proteins determine many aspects of interaction with the host. OmpA has recently been demonstrated to be a modulator of interaction with the host immune system (Weiss et al., 2008) as well as functioning as an adhesin and invasin, a modulator of biofilm formation and a bacteriophage receptor (Smith et al., 2007). Weiss et al. (2008) demonstrated that the sequence of the four external loops of this protein (especially the first, L1) dictated whether the bacterium induced an immune response and persisted in the insect haemocoel, with insect endosymbionts having a distinct L1 from vertebrate pathogens. ORF ARN_16100 is a homologue of OmpA (as determined by phylogenetic analysis) and carries a signal peptide. The A. nasoniae OmpA-like ORF contains an insertion in the L1 region not seen in pathogenic E. coli, Salmonella typhimurium or Shigella flexneri, but characteristic of the symbiont set. The A. nasoniae insertion is, however, different from those seen in Sodalis glossinidius, P. luminescens, SOPE and a secondary endosymbiont of Craterina melbae, and more similar to that of H. defensa (Supplementary material Fig. S2). The fact that A. nasoniae contains a novel motif in a functionally important region of a gene shown to be so crucial in the switch to symbiosis makes it an important candidate for future investigation.

Adhesion to eukaryotic cells and invasin

Bacterial IgG repeats are commonly important in cell adhesion, and ORFs with these repeats are found frequently in the A. nasoniae genome. One such ORF has best sequence match to the invasin locus of Yersinia, and contains at least 12 IgG repeat domains. Two other ORF areas contain multiple IgG repeats with best sequence matches to Yersinia and, in Yersinia pestis, to invasin. The first of these matches is to a region containing three ORFs with multiple IgG repeats (4, 4 and 1, respectively). The final ORF with similarity to invasin is ARN_09430. Two of these five ORFs are identified as carrying signal peptides (ARN_19810 and ARN_09430). It should be noted that genes in this class are also secreted through the type III secretion system or phage.

The A. nasoniae genome carries two ORFs with strong sequence similarity to the attachment invasion locus (ail) protein of Yersinia. Yersinia ail encodes a 179 amino acid 17 kDa outer membrane protein that is important in cell adhesion and entry. One of the two A. nasoniae ORFs shows similarity to ail along its length, with Yersinia ail as the best BLASTp match (e < 1 × 10⁻³⁶); the second has Photorhabdus and Yersinia ail best matches (e < 1 × 10⁻⁴⁴). It is very likely that these ORFs both represent secreted outer membrane proteins in Arsenophonus, as both carry signal sequences.

Arsenophonus nasoniae also carries six ORFs containing at least partial haemagglutination activity domains (HADs), of which four are predicted to contain signal peptides (supplementary material Table S5). One of these, ARN_32080, contains a complete HAD, is predicted to carry a signal peptide and is only one codon shorter than its Proteus equivalent. Whilst A. nasoniae has several HAD-containing ORFs, it clearly has no homologues of the filamentous haemagglutinins of, for example, Bordetella sp., which are much larger ORFs (Domenighini et al., 1990).

A. nasoniae also contains a tia-like ORF with sequence similarity to an agglutinin (e < 1 × 10⁻⁴²); in enterotoxigenic E. coli, tia allows adherence to and invasion of human colon- and ileum-derived epithelial cells (Mammarappallil & Elsinghorst, 2000). The A. nasoniae ORF is fragmented by contig gaps and contains a stop codon between the region containing the signal sequence and the rest of the ORF. Further work is needed to determine its functionality.

Finally, four fimbrial adhesion operons are detected, which are most likely assembled via the chaperone-usher pathway (Soto & Hultgren, 1999). Each operon contains at least one fimbrial subunit protein, chaperone, outer membrane usher protein and adhesin. All four operons contain a fimbrial protein with a complete FimA domain, a chaperone gene with either a FimC domain or both the N- and C-terminal chaperone domains, and an outer membrane usher gene with a complete FimD or usher superfamily domain. As expected, the adhesin genes show weak sequence conservation. Fimbrial adhesins are composed of an N-terminal receptor-binding domain fused to a C-terminal pilin domain; any homology between A. nasoniae adhesins and those of other bacteria is confined to this C-terminal region. The lack of sequence conservation in the N-terminal region may be indicative of host specialization.

Discussion

Galan & Bliska (1996) described the relationship between bacteria and host as a complex cross talk. Examining the gene content of A. nasoniae provides detailed information on one side of this conversation and allows insight into the potential interactions occurring between bacteria and insect. Furthermore, it allows us to marry gene content to lifestyle, and to speculate which interactions are occurring during which stages of the symbiosis. In this respect, it is important to recognize that there are three main stages in the life cycle of A. nasoniae: bacterial invasion of the wasp larva following per oral exposure; spread within the wasp host, including survival in the face of the host immune response; and finally injection into the fly pupa on oviposition, replication in the fly host, and re-infection of the feeding wasp larvae.

During the invasion phase, genetic mechanisms are required for A. nasoniae to survive digestive enzymes, adhere to the Nasonia gut epithelia, and then pass through it into the host haemocoel. Membrane-bound secreted peptidases and the SERPIN ecotin are candidates that may improve survival in the gut environment, removing or inhibiting gut enzymes that would otherwise digest the bacterium. As far as adhesion is concerned, fimbrial-like, intimin/invasin-like, and haemagglutinin-like ORFs may be of importance in the initial binding to gut epithelia prior to internalization, as in the mammalian system. It is likely that some of the effectors secreted by the type III secretion systems are involved in entry and passage into and through the gut epithelia; these are known to be important in gut epithelia transit for Salmonella and Yersinia in mammal hosts (Hueck, 1998). The Tc complex, important in gut entry in other bacteria, appears to be undergoing pseudogenization in A. nasoniae, and is thus a less likely candidate in this system.

Arsenophonus nasoniae then establishes intercellular infections in all major tissues of the wasp host (Huger et al., 1985). Spread of the bacteria could involve the use of adhesins (both cell-surface and pilin-mediated), flagellar motility (as is seen in the movement of the related bacterium Riesia; Perotti et al., 2007) and chitin-binding apparatus. As the bacteria exist extracellularly in the host, they are likely to be exposed to the full force of the immune system. A variety of ORFs are akin to TTSS effectors known to interfere with cell signalling and innate immunity, including YopJ-, OspG- and SopA-like ORFs. One can hypothesize that the ORFs related to apoptosis-inducing protein and CNF1 genes, as well as TTSS secreted effectors (YopH/SptP-like), may be important in preventing death following phagocytosis. The ecotin-like ORF may also be important in this context, ecotin being a SERPIN known to affect digestion by neutrophils.

Arsenophonus nasoniae first attracted attention because of its son-killer phenotype, and this paper is the first to describe potential virulence components of a male-killing bacterium. However, the biochemical mechanism by which A. nasoniae induces its killing of male offspring remains elusive. Ferree et al. (2008) have shown that death of male eggs results from a lack of maternal centrosome production (male offspring ensue from unfertilized eggs and receive their centrosomes maternally, rather than from the sperm). It is also known that the factor moves across eukaryotic cell membranes – implying it is either a peptide or small molecule, or that it is injected through the type III system. If the polyketide synthase system is functional, A. nasoniae may be able to synthesize small molecules, and this would present a tempting case study. The alternative hypothesis is that the agent is an effector placed into the eggs through the secretion system, or a secreted molecule that possesses the ability to translocate across eukaryotic membranes.

Some of the likely effector molecules are also rather elusive in terms of their function. Particularly enigmatic is the island of ORFs of the LRR/mcf family. These proteins carry a pattern recognition motif fused to one or two toxin elements (RTX/toxin B). Another interesting locus is the ORF with sequence similarity to cnf1. This gene possesses an N terminal that appears functional with respect to delivery of the C terminal into host cells. However, the C terminal is without matches in BLASTp database search, and thus the function of the protein remains unclear.

Our investigation of the virulence components of the A. nasoniae genome has led to one broad conclusion: that A. nasoniae possesses ORFs related to virulence genes from a wide variety of gammaproteobacteria. Although the A. nasoniae core genome shows significantly closest similarity to Photorhabdus and Proteus (Darby et al., 2009), its virulence genome shows much more equal representation from a range of different gammaproteobacteria. It is common to find effector molecules with greatest similarity to those in Yersinia, Salmonella, Vibrio, Pseudomonas or Serratia. Predictably, many of the anti-microbial genes in Photorhabdus are absent (A. nasoniae does not have to defend an insect corpse from invasion by other bacteria, as it maintains a live host).

The genome can be compared with those of other ‘inherited parasites’. Unlike Wolbachia, there is a limited array of ORFs with ankyrin or TPR domains that apparently function in interaction with eukaryotic proteins. Rather, there are a range of ORFs associated with interactions with eukaryotic cell surfaces and epithelia (invasin, intimin, adhesin homologues). There are also ORFs likely to be involved in interactions with the host antimicrobial immune system, which are not present in other reproductive parasites. These differences reflect the distinct lifestyles of intra- and inter-cellular symbionts, and the distinction between transovarially transmitted symbionts and transovum-transmitted ones such as A. nasoniae that must cross the gut epithelium.

The genome also provides hints as to the history of A. nasoniae. Pseudogenized areas with homology to insecticidal regions from a wide range of different bacteria are observed, such as the two relic pathogenicity islands and Tc region. This pattern of pseudogenization suggests recent descent from a bacterium with more general insecticidal properties, possibly in the form of a microbe that demonstrated straightforward pathogenesis – as in Photorhabdus. Alternatively, A. nasoniae may historically have been a secondary symbiont, using the products from currently pseudogenized regions in interactions with natural enemies of the host, as speculated for the RTX and leukotoxin-like genes in the beneficial secondary symbiont Hamiltonella (Moran et al., 2005). Overall A. nasoniae's virulence factor complement is similar to Hamiltonella's. Both carry two TTSSs and a diverse assortment of effectors, multiple, possibly pseudogenized, RTX toxins and at least one CNF1-like gene (Degnan et al., 2009). With respect to this latter hypothesis, we would note that other Arsenophonus strains are likely to be secondary symbionts of insects, being vertically transmitted and showing no obvious pathology or reproductive manipulation phenotypes. However, the widespread distribution of this genus (Hypsa & Dale, 1997; Dale et al., 2004; Thao & Baumann, 2004; Werren, 2004; Duron et al., 2008) indicates significant levels of horizontal transmission and perhaps the existence of pathogenic Arsenophonus types in nature. Little is known about the phenotypes of Arsenophonus species found in other hosts, and comparative genomics is likely to be informative.

Besides the direct functional investigation of the candidate genes outlined above, comparative genomics may provide a useful tool in determining which aspects of A. nasoniae biology are likely to be involved in travel through gut epithelia and male-killing. In particular, sequencing of Arsenophonus strains that are apparently purely vertically transmitted, and act as secondary symbiont strains, may serve to develop hypotheses as to which elements of the genome are involved with invasion and male-killing. It may also serve to illuminate the elements of microbial genome evolution that occur directly following a shift in lifestyle towards obligate symbiosis.

The availability of both the host genome (Werren et al., 2010) and that of A. nasoniae provides new avenues for functional investigation of both host invasion and mechanisms of male-killing. For example, expression studies of both the host and bacterium could indicate candidate mechanisms for inhibition of maternal centrosomes. Proteomic studies of host and bacteria during oogenesis can also provide information on candidate molecules, and the availability of host and bacterial sequences greatly assists in proteomic approaches.

Experimental procedures

The A. nasoniae genome was sequenced, assembled and putative ORFs detailed as described by Darby et al. (2010). All predicted ORFs were subjected to BLASTp (Altschul et al., 1990) search against the NCBI nr database and Interpro domain searches on the standalone Interproscan tool (data release 19, 29th January 2009) running the blastprodom, coils, gene3d, hmmpanther, hmmpir, hmmpfam, hmmsmart, hmmtigr, fprintscan, patternscan, profilescan, superfamily, seg, signalp, and tmhmm programs (Quevillon et al., 2005). A virulence factor database (Chen et al., 2005) overlay was also prepared using local BLAST. Based on BLASTp, domain and virulence association information, a ‘virulence set’ of ORFs with potential roles in the interaction between bacterium and insect was selected at significance levels set below 1 × 10⁻¹⁰. To further assess functionality, ORFs of this subset were compared with their most similar matches in terms of size, presence of active domains or motifs, and location or synteny. ABC transport systems were analysed by reciprocal BLAST and phylogenetic analysis. Initially this was undertaken en masse, comparing all twenty-one ORFs annotated as ABC-like with their BLAST hits from OrthoMCL (http://www.orthomcl.org/) with e-value < e−100, aligned using MUSCLE (Edgar, 2004) with well aligned blocks extracted using Gblocks (Talavera & Castresana, 2007), and assembled using PhyML (Guindon & Gascuel, 2003) (see supplementary material Fig. S3). All ABC ORFs referred to in the text were verified by creating phylogenies using the best BLASTp hit to the first eight species in the NCBI nr database, with alignment and neighbour-joining tree assembly carried out in ClustalX v2.0.11 (http://www.clustal.org/).
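The e-value filter and the reciprocal BLAST check described above can be sketched in Python as follows. This is an assumption-laden illustration rather than the scripts actually used: the `Hit` record, the function names and the toy identifiers (`orf_A`, `ref_1`, and so on) are hypothetical, and hits are taken to be already parsed into query, subject and e-value fields.

```python
# Minimal sketch, not the authors' scripts: keep hits below an e-value
# cutoff and report query/subject pairs that are each other's best hit.
from typing import Iterable, NamedTuple


class Hit(NamedTuple):
    query: str
    subject: str
    evalue: float


E_CUTOFF = 1e-10  # significance threshold quoted in the text


def best_hits(hits: Iterable[Hit], cutoff: float = E_CUTOFF) -> dict[str, Hit]:
    """Return the lowest e-value hit per query, ignoring hits at or above cutoff."""
    best: dict[str, Hit] = {}
    for hit in hits:
        if hit.evalue >= cutoff:
            continue
        if hit.query not in best or hit.evalue < best[hit.query].evalue:
            best[hit.query] = hit
    return best


def reciprocal_best_hits(forward, reverse, cutoff: float = E_CUTOFF):
    """Return (query, subject) pairs that are mutual best hits."""
    fwd = best_hits(forward, cutoff)
    rev = best_hits(reverse, cutoff)
    return {
        (q, h.subject)
        for q, h in fwd.items()
        if h.subject in rev and rev[h.subject].subject == q
    }


# Hypothetical toy data: orf_B fails the cutoff in the forward search.
forward = [Hit("orf_A", "ref_1", 1e-50), Hit("orf_A", "ref_2", 1e-20),
           Hit("orf_B", "ref_2", 1e-5)]
reverse = [Hit("ref_1", "orf_A", 1e-48), Hit("ref_2", "orf_B", 1e-30)]
print(reciprocal_best_hits(forward, reverse))
```

Ties in e-value are resolved here by keeping the first hit encountered, which is a simplification.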

We also estimated the component of the genome that is secreted into periplasmic space through the Sec/Tat systems, using the programme SignalP (Bendtsen et al., 2004). Presence of a signal peptide was estimated using neural networks (NN) and hidden Markov models (HMM) trained on Gram-negative bacteria. A signal peptide was considered present under the NN models if all criteria were fulfilled for presence of a signal peptide. A signal peptide was considered present under HMM output if the protein was identified as ‘secreted’ (S).
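The decision rule described above, combined with the "either NN or HMM" criterion used when counting secreted ORFs, can be written as a short Python sketch. The function name and the criterion keys are hypothetical placeholders; parsing of the actual SignalP output is not shown and would depend on the program version.

```python
# Illustrative sketch of the decision rule only; the criterion names used
# as dictionary keys are placeholders, not SignalP field names.
def has_signal_peptide(nn_criteria: dict[str, bool], hmm_call: str) -> bool:
    """Return True if either model predicts a signal peptide.

    NN: positive only when every criterion is fulfilled.
    HMM: positive only when the protein is labelled 'S' (secreted).
    """
    nn_positive = bool(nn_criteria) and all(nn_criteria.values())
    hmm_positive = hmm_call == "S"
    return nn_positive or hmm_positive


# Hypothetical ORF: one NN criterion fails, but the HMM call is 'S'.
print(has_signal_peptide({"criterion_1": True, "criterion_2": False}, "S"))
```

Taking the union of the two models favours sensitivity; requiring agreement between them would give a smaller, more conservative set.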

Acknowledgements

Sequencing support for this project is provided by The Centre for Genomics and Bioinformatics (CGB) at Indiana University, which is supported in part by the METACyt Initiative of Indiana University, funded in part through a major grant from the Lilly Endowment, Inc. Computer support was provided by the University Information Technology Services (UITS) and by the CGB computing group. We thank the group leaders Phillip Steinbachs (CGB) and Craig Stewart and Richard Repasky (UITS). Additional support is provided by the Indiana Center for Insect Genomics project funded through the Indiana 21st Century Research and Technology Fund. We thank the CGB genome sequencing team, including Jade Buchanan-Carter and Zachary Smith. TW was in part funded by the MRC; GH by a grant from the NERC. Amanda Avery and Jorge Azpurua are thanked for isolation and culturing of bacterial strains. Work by JW was supported by the US National Science Foundation (EF-0328363).

References

Supporting Information

Figure S1. Two Arsenophonus nasoniae pentapeptide repeat containing open reading frames and their virulence effector-like neighbours. Green arrows depict pentapeptide repeat regions, containing 8 copies each that can be approximately described as A(D/N)LXX, where X can be any amino acid (Marchler-Bauer et al., 2007).
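The approximate repeat description A(D/N)LXX can be expressed as a regular expression, as in the Python sketch below. This is only a strict-match illustration: the caption describes the consensus as approximate, so real repeats may deviate from it, and the function name and toy protein are hypothetical.

```python
# Minimal sketch: find runs of tandem pentapeptide units matching the
# strict consensus A(D/N)LXX, where X is any residue.
import re

REPEAT_RUN = re.compile(r"(?:A[DN]L..)+")


def repeat_runs(protein: str, min_copies: int = 2) -> list[tuple[int, int]]:
    """Return (0-based start, number of copies) for each tandem run."""
    runs = []
    for match in REPEAT_RUN.finditer(protein.upper()):
        copies = (match.end() - match.start()) // 5
        if copies >= min_copies:
            runs.append((match.start(), copies))
    return runs


# Hypothetical toy protein with eight tandem units after a two-residue leader.
toy = "MK" + "ADLSG" * 3 + "ANLTQ" * 5 + "PP"
print(repeat_runs(toy))
```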

Figure S2. Multiple sequence alignment by ClustalX v2.0.11 of the N terminus of the Arsenophonus nasoniae OmpA-like open reading frame (ARN_16100) with OmpA from SOPE (Sitophilus oryzae principal endosymbiont, GenBank accession no. EU426969), Sodalis (BAE74305), CMS (C. melbae symbiont, EU684475), Hamiltonella defensa (EU682308), Photorhabdus luminescens (NP929054), Yersinia pestis (NP670036), Escherichia coli 536 (UPEC, CP000247), Salmonella typhimurium (X02006) and Shigella flexneri (AF234271). External loop structures are underlined and labelled L1-4. Based on Fig. 3 of Weiss et al. (2008).

Figure S3. Phylogenetic analysis of Arsenophonus nasoniae ABC-like ORFs. Phylogeny A shows all twenty-one open reading frames (ORFs) annotated as ABC-like, with BLAST hits from OrthoMCL (http://www.orthomcl.org/) of e-value < e−100, aligned using MUSCLE (Edgar, 2004) with well aligned blocks extracted using Gblocks (Talavera & Castresana, 2007), and assembled using PhyML (Guindon & Gascuel, 2003). Phylogenies B to L show individual analyses of ABC-like ORFs mentioned in the text. Phylogenies B to E show A. nasoniae ORFs ARN_01800-01830, iron ABC transporter-like ORFs. Phylogenies F to I show A. nasoniae ORFs ARN_00840-00860 and ARN_19400, RTX ABC transporter-like ORFs. Phylogenies J to L show A. nasoniae ORFs ARN_29850-29870, feoABC-like ORFs. Phylogenies were created using the best BLASTp hit to the first eight species in the NCBI nr database, alignment and neighbour-joining trees were assembled using clustalX v2.0.11 (http://www.clustal.org/).

Table S1. Function of open reading frame (ORF) mentioned in text and corresponding ORF reference numbers within the Arsenophonus nasoniae draft genome

Table S2. Four open reading frames in the Yersinia-like Type III secretion system of Arsenophonus nasoniae without complete motifs relevant to their function. Lengths are given in amino acids (AA)

Table S3. Open reading frames in the Arsenophonus nasoniae genome with potential contribution to Polyketide synthesis

Table S4. A list of Arsenophonus nasoniae open reading frames predicted to contain a Signal Peptide, their location within the A. nasoniae draft genome and putative annotation, as ascertained from reciprocal BLAST

Table S5. Haemagglutination activity domain (HAD) containing open reading frames (ORFs) in the Arsenophonus nasoniae genome. ORF lengths are given in amino acids (aa). Domains were identified using a Conserved Domain Database (CDD) search (Marchler-Bauer et al., 2007). Presence of a signal peptide was estimated using neural networks (NN) and hidden Markov models (HMM) trained on Gram-negative bacteria

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting materials supplied by the authors. Any queries (other than missing material) should be directed to the corresponding author for the article.

Filename                      Format  Size   Description
IMB_963_sm_Figure_S1-S3.pdf   PDF     5251K  Supporting info item
IMB_963_sm_Table_S1-S5.doc    DOC     469K   Supporting info item