Diversity and distribution of hemerythrin-like proteins in prokaryotes

  • Editor: Rustam Aminov

Correspondence: F. Bruce Ward, Institute of Cell Biology, University of Edinburgh, Mayfield Road, Edinburgh EH9 3JR, UK. Tel.: +44 (0) 131 650 5370; fax: +44 (0) 131 651 3331; e-mail: fward@staffmail.ed.ac.uk

Abstract

Hemerythrins are oxygen-binding proteins found in the body fluids and tissues of certain invertebrates. Oxygen is bound at a nonheme iron centre consisting of two oxo-bridged iron atoms coordinated by a characteristic set of conserved histidine, aspartate and glutamate residues in the motifs H–HxxxE–HxxxH–HxxxxD. It has recently been demonstrated biochemically that two bacterial proteins bearing the same motifs do in fact possess similar iron centres and bind oxygen in the same way. The recent profusion of prokaryotic genome sequence data has shown that proteins bearing hemerythrin motifs are present in a wide variety of bacteria and a few archaea. Some of these are short proteins, as in eukaryotes; others appear to consist of a hemerythrin domain fused to another domain, generally a putative signal transduction domain such as a methyl-accepting chemotaxis protein, a histidine kinase, or a GGDEF protein (cyclic di-GMP synthase). If, as initial evidence suggests, these are in fact hemerythrin-like oxygen-binding proteins, then their diversity in prokaryotes far exceeds that seen in eukaryotes. Here, a survey is presented of prokaryotic protein sequences bearing hemerythrin-like motifs, for which the designation ‘bacteriohemerythrins’ is proposed, and their possible functions are discussed.

Hemerythrins in eukaryotes

Hemerythrins are nonheme oxygen-binding proteins in which O2 binds to a di-iron centre (Stenkamp, 1994). They occur as intracellular oxygen-binding proteins in the invertebrate phyla Sipunculida (peanut worms), Priapulida (priapulid worms), some Annelida (segmented worms, including leeches and polychaete worms) and Brachiopoda (lamp shells). They have been best studied in the sipunculid worms. These organisms generally possess a multimeric (octameric or trimeric) hemerythrin in blood cells and a monomeric myohemerythrin in muscle cells (Klippens et al., 1968; Loehr et al., 1978; Long et al., 1992); these presumably play roles similar to those of haemoglobin and myoglobin in vertebrates. Brachiopods possess an octameric hemerythrin (Yano et al., 1991; Negri et al., 1994), but no brachiopod myohemerythrin appears to have been reported. Several crystal structures have been determined, and the oxygen-binding characteristics of these proteins have been well studied. Oxygen binding is reported to be cooperative in the brachiopod hemerythrins but not in the sipunculid hemerythrins (Zhang & Kurtz, 1991). More recently, myohemerythrin-like proteins have been reported in several annelid worms (Takagi & Cox, 1991; Baert et al., 1992; Demuynck et al., 1993; Coutte et al., 2001; Vergote et al., 2004). A myohemerythrin-like protein has also been reported in the pathogenic amoeba Naegleria fowleri, which can cause a rapidly fatal amoebic meningoencephalitis (Shin et al., 2001). Two hemerythrin-like sequences have also been detected in the genome of the sea anemone Nematostella vectensis (loci NEMVEDRAFT_v1g100902 and NEMVEDRAFT_v1g220584). Thus, the distribution of hemerythrins among animals is quite uneven, raising the question of whether they represent an ancestral protein lost in other lineages or a horizontally acquired protein.

Hemerythrins in prokaryotes

Two recent reports describe hemerythrin-like proteins in bacteria, specifically Methylococcus capsulatus and Desulfovibrio vulgaris. Xiong et al. (2000) reported that a putative methyl-accepting chemotaxis protein of Desulfovibrio vulgaris, DcrH, possessed a C-terminal hemerythrin-like domain. This domain was expressed in Escherichia coli and purified, and spectroscopic evidence was provided for a hemerythrin-like iron centre able to bind oxygen. The authors proposed that this oxygen-binding domain acts as an oxygen sensor for aerotaxis. They also reported hemerythrin-like sequences in Methanococcus (now Methanocaldococcus) jannaschii and Aquifex aeolicus. Isaza et al. (2006) further investigated the structure of the hemerythrin-like domain of DcrH, showing considerable similarity to animal hemerythrins but also significant differences, possibly related to the facile auto-oxidation observed in DcrH (discussed further below).

Single-domain hemerythrin-like proteins have also been reported in bacteria. Kao et al. (2004), studying copper-induced proteins in Methylococcus capsulatus, noted that the sequence of one of the induced proteins resembled that of hemerythrin. They postulated that the protein served to distribute oxygen efficiently throughout the cell, to supply oxygen to methane monooxygenase. The Methylococcus hemerythrin was expressed in Escherichia coli and purified to homogeneity (Karlsen et al., 2005). Spectroscopic analysis showed that iron was present, as in animal hemerythrins. The authors conducted a search for other hemerythrin-like proteins encoded in prokaryotic genome sequences, and reported hemerythrin-like sequences in 21 organisms representing five bacterial phyla.

Here, this investigation is extended, leading to the detection of more than 400 distinct hemerythrin-like sequences in prokaryotic genomes.

Sequence structure and conserved motifs in hemerythrins; how can a hemerythrin be recognized?

Figure 1 shows an alignment of the sequences of the well-characterized hemerythrin of Themiste dyscritum, the hemerythrin and myohemerythrin of Phascolopsis gouldii, and the two bacterial hemerythrins that have been biochemically characterized, from Methylococcus capsulatus and Desulfovibrio vulgaris. The residues involved in binding the iron centre are H25, H54, E58, H73, H77, H101, and D106 (numbering based on the T. dyscritum sequence, for which a crystal structure has been determined). These form the characteristic motifs H … HxxxE … HxxxH … HxxxxD. The spacing between these motifs is also conserved, with some variation. It is therefore proposed that any translated protein sequence that possesses these motifs with similar spacing, and that shows sufficient overall similarity to hemerythrins to be detected in a BLAST search, may be considered a putative hemerythrin. Many other residues that are invariant within the animal hemerythrins are not well conserved in the putative bacterial hemerythrin sequences; one exception is W10, which is well conserved, although not invariant. The hydrophobic residues lining the oxygen-binding pocket of hemerythrin (F55, F80, W97, L98, and I102 in the T. dyscritum sequence) are not invariant but are generally replaced by other hydrophobic residues (discussed further in ‘Possible functions of hemerythrins’ section).
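The motif part of this criterion can be illustrated as a regular-expression scan. This is a minimal sketch, not the search procedure used in the survey: the spacing windows between motifs (20 to 40, 8 to 25 and 15 to 35 residues) are hypothetical values chosen to bracket the T. dyscritum spacings (28, 14 and 23 residues), the BLAST similarity requirement is not modelled, and the names `HEMERYTHRIN_MOTIF` and `find_hemerythrin_motifs` are illustrative. The demonstration sequence is synthetic, not a real protein.

```python
import re

# Hypothetical spacing windows (residues between motifs), chosen to bracket
# the Themiste dyscritum spacings of 28, 14 and 23 residues.
HEMERYTHRIN_MOTIF = re.compile(
    r"(H)"          # H25: isolated histidine
    r".{20,40}?"
    r"(H)...(E)"    # H54xxxE58
    r".{8,25}?"
    r"(H)...(H)"    # H73xxxH77
    r".{15,35}?"
    r"(H)....(D)"   # H101xxxxD106
)


def find_hemerythrin_motifs(seq: str) -> list[tuple[int, ...]]:
    """Return 1-based positions of the seven putative iron ligands for each
    non-overlapping H...HxxxE...HxxxH...HxxxxD match in a protein sequence."""
    hits = []
    for m in HEMERYTHRIN_MOTIF.finditer(seq.upper()):
        hits.append(tuple(m.start(g) + 1 for g in range(1, 8)))
    return hits


# Synthetic test sequence: alanine filler with the seven ligands placed at
# the T. dyscritum positions (113 residues in total).
ligands = {25: "H", 54: "H", 58: "E", 73: "H", 77: "H", 101: "H", 106: "D"}
demo = "".join(ligands.get(i, "A") for i in range(1, 114))
print(find_hemerythrin_motifs(demo))  # [(25, 54, 58, 73, 77, 101, 106)]
```

The lazy quantifiers make the scan pick the nearest downstream motif; a realistic search would still need the similarity filter to discard chance matches, which are not rare for a pattern built mostly from histidines.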

Figure 1.

 Alignment of some hemerythrin sequences. Sequence comparison, using ClustalW, of three eukaryotic hemerythrins (Themiste dyscritum octameric hemerythrin, Phascolopsis gouldii octameric hemerythrin and Phascolopsis gouldii myohemerythrin) and two bacterial hemerythrins from Methylococcus capsulatus and Desulfovibrio vulgaris. The GenBank identifiers of the eukaryotic sequences are 123041, 21264441, 232242 and 253340; the bacterial sequences are loci MCA0715 and DvulDRAFT_0258.

Taxonomic distribution of putative hemerythrin-like proteins in prokaryotes

Using these criteria, the occurrence of putative hemerythrin-like proteins in prokaryotes was investigated by searching the available translated nucleotide and protein sequence databases. Putative hemerythrins were found in a wide variety of Bacteria and Archaea from different taxonomic groups (Table 1; for more detailed information, see supplementary Table S1). Some of these sequences were missing one of the characteristic histidine residues (for example, the first histidine of the HxxxH motif was occasionally replaced by glutamine, and in some cases the histidine corresponding to H25 was not detected), but where they conformed in all other respects to the hemerythrin pattern, including significant sequence similarity to hemerythrins that did possess all of the expected iron-binding residues, these were also included in the analysis.
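The inclusion rule for incomplete sequences can be sketched as a small decision function. This assumes that the residues at the seven ligand columns have already been read from an alignment to a reference hemerythrin and that the candidate was found by a similarity search; it accepts only the two deviations the text gives as examples (Gln at the first His of HxxxH, or no detectable H25), which may be narrower than the criterion actually applied. `LIGANDS`, `is_tolerated` and `include_candidate` are hypothetical names.

```python
# Ligand positions (T. dyscritum numbering) and the residue expected at each.
LIGANDS = (("H25", "H"), ("H54", "H"), ("E58", "E"), ("H73", "H"),
           ("H77", "H"), ("H101", "H"), ("D106", "D"))


def is_tolerated(label: str, observed: str) -> bool:
    """Deviations taken from the examples in the text (an assumption that
    only these are accepted): Gln instead of the first His of HxxxH (H73),
    or any residue or gap at the position corresponding to H25."""
    return (label == "H73" and observed == "Q") or label == "H25"


def include_candidate(ligand_residues: str, similar_to_complete: bool) -> bool:
    """ligand_residues: one character per ligand column ('-' for a gap).
    similar_to_complete: whether the candidate shows significant similarity
    to hemerythrins that carry all seven iron-binding residues."""
    if len(ligand_residues) != len(LIGANDS):
        raise ValueError("expected one residue per ligand position")
    deviations = [(label, aa) for (label, expected), aa
                  in zip(LIGANDS, ligand_residues) if aa != expected]
    if not deviations:
        return True  # complete H...HxxxE...HxxxH...HxxxxD set
    # A single tolerated deviation is accepted only with supporting similarity.
    return (len(deviations) == 1 and is_tolerated(*deviations[0])
            and similar_to_complete)


print(include_candidate("HHEQHHD", similar_to_complete=True))   # True
print(include_candidate("HHEQHHD", similar_to_complete=False))  # False
```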

Table 1.   Hemerythrin-like protein sequences detected in prokaryotes
  1. MCP, methyl-accepting chemotaxis protein domain; CNBP, cyclic nucleotide binding domain; HK, histidine kinase; RR, response-regulator; GGDEF, cyclic di-GMP synthase domain; AC, adenylyl cyclase domain.

PHYLUM PROTEOBACTERIA (587 sequences)
Class Alphaproteobacteria (130 sequences)
 Order Rickettsiales (29 sequences): none detected
 Order Rhizobiales (45 sequences)
  Mesorhizobium sp., BNC1: 1 single-domain hemerythrin
  Rhodopseudomonas palustris (3 strains): 1 hemerythrin-GGDEF protein or GGDEF-hemerythrin
 Order Sphingomonadales (8 sequences): none detected
 Order Rhodospirillales (8 sequences)
  Magnetospirillum magneticum: 23 single-domain hemerythrins, 4 MCP-hemerythrins, 2 CNBP-hemerythrins, 1 hemerythrin-response regulator, 2 hemerythrins with putative domains of unknown function
  Magnetospirillum magnetotacticum: 21 single-domain hemerythrins, 3 MCP-hemerythrins, 2 CNBP-hemerythrins, 2 hemerythrin-response regulators, 1 hemerythrin with putative domain of unknown function
  Magnetospirillum gryphiswaldense: 17 single-domain hemerythrins, 1 MCP-hemerythrin, 1 hemerythrin-MCP, 2 CNBP-hemerythrins, 1 hemerythrin-response regulator, 1 hemerythrin-histidine kinase, 1 hemerythrin with 2 putative domains of unknown function
  Rhodospirillum rubrum: 3 single-domain hemerythrins, 1 di-hemerythrin, 1 MCP-hemerythrin
 Order Rhodobacterales (34 sequences): none detected
 Order Caulobacterales (3 sequences): none detected
 Order Parvularculales (1 sequence): none detected
Class Betaproteobacteria (121 sequences)
 Order Burkholderiales (105 sequences)
  Burkholderia mallei (9 strains): 1 single-domain hemerythrin
  Burkholderia multivorans (1 strain): 1 single-domain hemerythrin
  Burkholderia phymatum (1 strain): 1 single-domain hemerythrin
  Burkholderia pseudomallei (11 strains): 1 single-domain hemerythrin
  Burkholderia thailandensis (1 strain): 1 single-domain hemerythrin
  Burkholderia vietnamiensis (1 strain): 1 single-domain hemerythrin
  Burkholderia xenovorans (1 strain): 3 single-domain hemerythrins
  Ralstonia eutropha (2 strains): 1 single-domain hemerythrin
  Ralstonia metallidurans (1 strain): 1 single-domain hemerythrin
  Ralstonia pickettii (2 strains): 1 or 2 single-domain hemerythrins
  Ralstonia solanacearum (2 strains): 1 single-domain hemerythrin
  Janthinobacterium lividum (1 strain): 2 single-domain hemerythrins
  Bordetella avium (1 strain): 1 single-domain hemerythrin
  Verminephrobacter eiseniae (1 strain): 1 single-domain hemerythrin
  Acidovorax avenae (1 strain): 1 single-domain hemerythrin
  Acidovorax sp. (1 strain): 2 single-domain hemerythrins
  Polaromonas naphthalenivorans (1 strain): 3 single-domain hemerythrins
  Delftia acidovorans (1 strain): 2 single-domain hemerythrins
  Comamonas testosteroni (1 strain): 4 single-domain hemerythrins
  Rhodoferax ferrireducens (1 strain): 2 GGDEF-hemerythrins, 1 hemerythrin with possible domain of unknown function
  Herminiimonas arsenicoxydans: 1 single-domain hemerythrin
 Order Neisseriales (5 sequences): none detected
 Order Rhodocyclales (3 sequences)
  Azoarcus sp. (1 strain): 5 single-domain hemerythrins
  Dechloromonas aromatica (1 strain): 10 single-domain hemerythrins, 1 GGDEF-hemerythrin, 1 hemerythrin-histidine kinase
 Order Nitrosomonadales (5 sequences): none detected
 Order Methylophilales (2 sequences): none detected
 Order Hydrogenophilales (1 sequence): none detected
Class Gammaproteobacteria (130 sequences)
 Order Vibrionales (41 sequences)
  Vibrio cholerae (10 strains): 1 hemerythrin-GGDEF protein
  Vibrio harveyi (1 strain): 2 hemerythrin-GGDEF proteins
  Vibrio parahaemolyticus (1 strain): 1 hemerythrin-GGDEF protein
  Vibrio shilonii (1 strain): 1 hemerythrin-GGDEF protein
  Vibrio splendidus (1 strain): 1 hemerythrin-GGDEF protein
  Vibrio vulnificus (2 strains): 1 hemerythrin-GGDEF protein
 Order Enterobacteriales (89 sequences): none detected
 Order Legionellales (10 sequences): none detected
 Order Pasteurellales (23 sequences): none detected
 Order Thiotrichales (16 sequences)
  Beggiatoa sp. PS: 2 AC-hemerythrins
 Order Aeromonadales (2 sequences)
  Aeromonas hydrophila (1 strain): 1 GGDEF-hemerythrin, 1 MCP-hemerythrin
  Aeromonas salmonicida (1 strain): 1 GGDEF-hemerythrin
 Order Chromatiales (4 sequences)
  Alkalilimnicola ehrlichii (1 strain): 1 single-domain hemerythrin
 Order Pseudomonadales (25 sequences)
  Acinetobacter baumannii (1 strain): 1 single-domain hemerythrin
  Pseudomonas aeruginosa (6 strains): 1 single-domain hemerythrin
  Azotobacter vinelandii (1 strain): 1 single-domain hemerythrin
 Order Alteromonadales (38 sequences)
  Shewanella spp. (14 strains): 1 MCP-hemerythrin
  Shewanella woodyi (1 strain): 1 single-domain hemerythrin
  Idiomarina spp. (2 strains): 1 MCP-hemerythrin
  Colwellia psychrerythraea (1 strain): 3 single-domain hemerythrins, 2 hemerythrin-GGDEF proteins, 1 RR-hemerythrin, 1 PAS-HK-RR-hemerythrin
  Moritella sp.: 1 hemerythrin-GGDEF protein
 Order Oceanospirillales (7 sequences)
  Oceanospirillum sp. (1 strain): 3 single-domain hemerythrins
  Alcanivorax borkumensis (1 strain): 1 single-domain hemerythrin
 Order Cardiobacteriales (1 sequence): none detected
 Order Xanthomonadales (12 sequences)
  Xanthomonas campestris (3 strains): 1 single-domain hemerythrin
  Xanthomonas axonopodis (1 strain): 1 single-domain hemerythrin
  Stenotrophomonas maltophilia (1 strain): 1 single-domain hemerythrin
 Order Methylococcales (1 sequence)
  Methylococcus capsulatus (1 strain): 1 single-domain hemerythrin
Class Deltaproteobacteria (25 sequences)
 Order Desulfovibrionales (5 sequences)
  Desulfovibrio vulgaris (2 strains): 1 single-domain hemerythrin, 1 GGDEF-hemerythrin, 2 MCP-hemerythrins
  Desulfovibrio desulfuricans (1 strain): 1 single-domain hemerythrin, 2 MCP-hemerythrins
 Order Desulfuromonadales (9 sequences)
  Geobacter uraniumreducens (1 strain): 2 single-domain hemerythrins
  Geobacter bemidjiensis (1 strain): 6 single-domain hemerythrins, 1 MCP-hemerythrin
  Geobacter metallireducens (1 strain): 2 single-domain hemerythrins
  Geobacter lovleyi (1 strain): 3 single-domain hemerythrins, 1 MCP-hemerythrin, 1 GGDEF-hemerythrin
  Geobacter sulfurreducens (1 strain): 5 single-domain hemerythrins
  Geobacter sp. (1 strain): 2 single-domain hemerythrins
  Pelobacter carbinolicus (1 strain): 1 single-domain hemerythrin
  Pelobacter propionicus (1 strain): 2 single-domain hemerythrins, 1 MCP-hemerythrin
  Desulfuromonas acetoxidans (1 strain): 5 single-domain hemerythrins, 2 hemerythrin-GGDEF proteins, 1 GGDEF-hemerythrin, 1 hemerythrin-histidine kinase
 Order Myxococcales (5 sequences)
  Anaeromyxobacter spp. (2 strains): 8 single-domain hemerythrins
 Order Desulfobacterales (2 sequences)
  Desulfotalea psychrophila (1 strain): 1 single-domain hemerythrin, 1 hemerythrin-GGDEF protein, 1 hemerythrin-histidine kinase
 Order Bdellovibrionales (1 sequence)
  Bdellovibrio bacteriovorus (1 strain): 1 single-domain hemerythrin
 Order Syntrophobacterales (2 sequences): none detected
 Other Deltaproteobacteria (1 sequence)
  Deltaproteobacterium MLMS-1 (1 strain): 2 single-domain hemerythrins, 1 putative phosphatase-hemerythrin
Class Epsilonproteobacteria (27 sequences)
 Order Campylobacterales (24 sequences)
  Campylobacter jejuni (9 strains): 2 to 5 hemerythrins, probably single domain
  Campylobacter coli (1 strain): 6 hemerythrins, probably single domain
  Campylobacter fetus (1 strain): 1 single-domain hemerythrin, 1 hemerythrin-GGDEF protein
  Campylobacter upsaliensis (1 strain): 4 hemerythrins, probably single domain
  Campylobacter lari (1 strain): 2 hemerythrins, probably single domain
  Helicobacter hepaticus (1 strain): 2 hemerythrins, probably single domain
  Thiomicrospira denitrificans (1 strain): 1 hemerythrin-GGDEF protein
 Order Nautiliales (1 sequence)
  Caminibacter mediatlanticus: 1 single-domain hemerythrin
 Other Epsilonproteobacteria
  Nitratiruptor sp.: 1 hemerythrin-GGDEF protein
Other Proteobacteria
  Magnetococcus sp. (1 strain): 4 single-domain hemerythrins, 7 MCP-hemerythrins, 1 CNBP-hemerythrin, 1 response regulator-hemerythrin, 2 hemerythrins with domains of unknown function
  Mariprofundus ferrooxydans (1 strain): 1 single-domain hemerythrin, 1 hemerythrin-GGDEF protein
PHYLUM FIRMICUTES (238 sequences)
Class Bacilli (148 sequences)
 Order Bacillales (84 sequences): none detected
 Order Lactobacillales (64 sequences)
  Symbiobacterium thermophilum (1 strain): 1 single-domain hemerythrin
Class Clostridia (57 sequences)
 Order Clostridiales (52 sequences)
  Clostridium acetobutylicum (1 strain): 1 single-domain hemerythrin
  Clostridium beijerinckii (1 strain): 7 single-domain hemerythrins
  Clostridium botulinum (4 strains): 1 single-domain hemerythrin
  Clostridium cellulolyticum (1 strain): 2 single-domain hemerythrins
  Clostridium kluyveri (1 strain): 1 single-domain hemerythrin
  Clostridium phytofermentans (1 strain): 2 single-domain hemerythrins
  Clostridium sp. (1 strain): 3 single-domain hemerythrins
  Clostridium tetani (1 strain): 1 single-domain hemerythrin
  Clostridium thermocellum (1 strain): 2 single-domain hemerythrins
  Dorea longicatena (1 strain): 1 single-domain hemerythrin
  Alkaliphilus metalliredigens (1 strain): 1 single-domain hemerythrin
  Caldicellulosiruptor saccharolyticus (1 strain): 2 single-domain hemerythrins
  Syntrophomonas wolfei (1 strain): 1 single-domain hemerythrin
  Desulfitobacterium hafniense (2 strains): 2 single-domain hemerythrins
  Ruminococcus gnavus (1 strain): 1 single-domain hemerythrin
  Ruminococcus torques (1 strain): 1 single-domain hemerythrin
  Ruminococcus obeum (1 strain): 1 single-domain hemerythrin
 Order Thermoanaerobacterales (4 sequences)
  Thermoanaerobacter tengcongensis (1 strain): 1 single-domain hemerythrin
 Order Haloanaerobiales (1 sequence): none detected
Class Mollicutes (33 sequences): none detected
PHYLUM AQUIFICAE (2 sequences)
Class Aquificae (2 sequences)
  Aquifex aeolicus (1 strain): 1 single-domain hemerythrin
  Hydrogenobacter thermophilus (1 strain): 1 single-domain hemerythrin
PHYLUM CYANOBACTERIA (40 sequences)
 Order Chroococcales (19 sequences)
  Thermosynechococcus elongatus (1 strain): 2 single-domain hemerythrins
 Order Prochlorales (12 sequences): none detected
 Order Oscillatoriales (3 sequences): none detected
 Order Nostocales (4 sequences): none detected
Class Gloeobacteria (1 sequence): none detected
PHYLUM SPIROCHAETES (19 sequences)
Class Spirochaetes (19 sequences)
  Treponema denticola (1 strain): 2 single-domain hemerythrins
PHYLUM PLANCTOMYCETES (3 sequences)
  Candidatus Kuenenia stuttgartiensis (1 strain): 1 single-domain hemerythrin, 1 CNBP-hemerythrin
PHYLUM ACIDOBACTERIA (2 sequences)
  Solibacter usitatus (1 strain): 3 single-domain hemerythrins
  Acidobacteria sp. (1 strain): 5 single-domain hemerythrins
PHYLUM CHLOROFLEXI (8 sequences): none detected
PHYLUM THERMOTOGAE (5 sequences): none detected
PHYLUM BACTEROIDETES (32 sequences): none detected
PHYLUM CHLOROBI (10 sequences): none detected
PHYLUM ACTINOBACTERIA (62 sequences): none detected
PHYLUM VERRUCOMICROBIA (1 sequence): none detected
PHYLUM LENTISPHAERAE (2 sequences): none detected
PHYLUM CHLAMYDIAE (11 sequences): none detected
PHYLUM DEINOCOCCUS-THERMUS (5 sequences): none detected
PHYLUM FUSOBACTERIA (3 sequences): none detected
PHYLUM NITROSPIRAE (2 sequences): none detected

Considering the number of genomes sequenced, hemerythrin-like proteins appear to be rare in the Archaea, Bacteroidetes, Cyanobacteria, Actinobacteria, and Spirochaetes. They may be abundant in the Acidobacteria, because both of the genome sequences available from this phylum encode multiple putative hemerythrin-like proteins, but too few genomes are available from this poorly studied group for this to be definitive. In the Firmicutes (low-GC Gram-positive bacteria), they are found in many genomes from the anaerobic order Clostridiales, but not in any of the numerous genomes of the aerobic or aerotolerant Bacillales or of the Mollicutes, and in only a single representative of the aerotolerant Lactobacillales. Within the Alphaproteobacteria, numerous hemerythrin-like sequences are present in Rhodospirillum, Magnetospirillum, and Magnetococcus, but otherwise they are found only as single sequences in three strains of Rhodopseudomonas and one of Mesorhizobium. Distribution is also patchy in the Gammaproteobacteria. Hemerythrin-like proteins appear to be common in aquatic Gammaproteobacteria such as Aeromonas, Shewanella, Colwellia, Idiomarina, Vibrio, and Oceanospirillum. However, they are absent from the Enterobacteriales, and in the Pseudomonadales they are found in all sequenced genomes of the pathogenic species Pseudomonas aeruginosa but apparently not in any of the sequenced genomes of Pseudomonas putida, Pseudomonas fluorescens, or Pseudomonas syringae. The plant pathogens Xanthomonas spp. and the related Stenotrophomonas also possess a hemerythrin-like sequence with very high identity to that of P. aeruginosa.
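Comparisons "considering the number of genomes sequenced" amount to a detection rate per taxon. As a rough illustration, the sketch below computes that rate for a few orders from counts transcribed from Table 1, assuming that each listed strain corresponds to one genome included in the order total (the table does not state this explicitly). `TABLE1_COUNTS` and `detection_rate` are hypothetical names.

```python
# (genome sequences listed for the order, strains listed with >= 1 hit),
# transcribed from Table 1. Assumes each listed strain is one genome
# counted in the order total.
TABLE1_COUNTS = {
    "Enterobacteriales": (89, 0),
    "Bacillales": (84, 0),
    "Vibrionales": (41, 10 + 1 + 1 + 1 + 1 + 2),
    "Clostridiales": (52, 1 + 1 + 4 + 1 + 1 + 1 + 1 + 1 + 1   # Clostridium spp.
                          + 1 + 1 + 1 + 1 + 2 + 1 + 1 + 1),   # other genera
    "Desulfuromonadales": (9, 9),
}


def detection_rate(genomes: int, positives: int) -> float:
    """Fraction of sequenced genomes in a taxon encoding at least one
    hemerythrin-like protein."""
    if genomes <= 0 or not 0 <= positives <= genomes:
        raise ValueError("positives must lie between 0 and the genome count")
    return positives / genomes


# Print the orders from highest to lowest detection rate.
for order, (genomes, positives) in sorted(
        TABLE1_COUNTS.items(), key=lambda kv: -detection_rate(*kv[1])):
    print(f"{order:20s} {positives:3d}/{genomes:<3d} "
          f"{detection_rate(genomes, positives):.2f}")
```

Rates computed this way are sensitive to sampling bias (for example, many strains of a single species such as Vibrio cholerae), so they support only the qualitative contrasts drawn in the text.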

Hemerythrin-like sequences are much more common in the sequenced Betaproteobacteria, being found in many sequenced species of the Burkholderiales, and in the Deltaproteobacteria, where the majority of sequenced genomes contain multiple hemerythrins. An interesting exception is the Myxococcales, where the single anaerobic representative, Anaeromyxobacter, possesses multiple putative hemerythrins, but the aerobic species of Myxococcus and Stigmatella appear to possess none. Within the Epsilonproteobacteria, the great majority of Campylobacter spp. appear to possess multiple hemerythrin-like proteins; they are also present in Helicobacter hepaticus but apparently not in Helicobacter pylori.

Overall, hemerythrin-like proteins seem to be most abundant in microaerophilic and anaerobic species, or in species such as Shewanella and Geobacter that possess multiple pathways for anaerobic respiration, although single, highly conserved putative single-domain hemerythrins occur in the mainly aerobic bacteria Pseudomonas, Stenotrophomonas, and Xanthomonas, as well as in Burkholderia spp. It may also be noteworthy that the overwhelming majority of organisms detected in the survey are motile.

During the present analysis, it became clear that the putative prokaryotic hemerythrin-like sequences fall into two categories: those around 130–170 residues in length, which appear to consist, like the animal hemerythrins, of a single hemerythrin domain, and those, like DcrH of Desulfovibrio vulgaris, which consist of a hemerythrin-like domain fused to another protein domain. While single-domain hemerythrin-like sequences are widely distributed, multi-domain hemerythrins seem to be characteristic of particular groups of bacteria, being especially common in the magnetotactic Alphaproteobacteria and the sulphate-reducing Deltaproteobacteria, as well as some aquatic Gammaproteobacteria. The multi-domain hemerythrins are further discussed in ‘Multi-domain hemerythrins’ section.
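The two categories, and the N- to C-terminal labels used in Table 2 (for example MCP-hem or hem-GGDEF), can be derived from domain annotations. This is a sketch under stated assumptions: domain boundaries are taken as given from some annotation step, the 170-residue cut-off is borrowed from the length range quoted above, and labelling the larger unannotated flank of a long protein as an unknown domain (X) is a hypothetical heuristic. `Domain`, `architecture` and all coordinates in the example are illustrative.

```python
from typing import NamedTuple


class Domain(NamedTuple):
    name: str   # label as used in Table 2, e.g. "hem", "MCP", "GGDEF"
    start: int  # first residue (1-based)
    end: int    # last residue (1-based)


# Assumed cut-off: upper end of the 130-170 residue range quoted in the text.
SINGLE_DOMAIN_MAX_LENGTH = 170


def architecture(length: int, domains: list[Domain]) -> str:
    """Return 'single-domain' or an N- to C-terminal label such as 'MCP-hem'."""
    if not any(d.name == "hem" for d in domains):
        raise ValueError("no hemerythrin domain annotated")
    ordered = sorted(domains, key=lambda d: d.start)
    if len(ordered) == 1:
        hem = ordered[0]
        if length <= SINGLE_DOMAIN_MAX_LENGTH:
            return "single-domain"
        # Long protein with only the hemerythrin domain recognized: call the
        # larger unannotated flank an unknown domain (X), as in Table 2.
        n_flank, c_flank = hem.start - 1, length - hem.end
        return "X-hem" if n_flank > c_flank else "hem-X"
    return "-".join(d.name for d in ordered)


# Hypothetical coordinates for a DcrH-like protein (MCP domain first).
print(architecture(963, [Domain("hem", 830, 960), Domain("MCP", 1, 800)]))
```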

Phylogenetic analysis

The short lengths of the core hemerythrin sequences (c. 120 amino acids) and their considerable divergence meant that standard phylogenetic methods did not yield robust rooted trees. To compare hemerythrin sequences, clustering analyses were therefore based on alignment of the conserved motifs described in ‘Sequence structure and conserved motifs in hemerythrins; how can a hemerythrin be recognized?’ section. In these comparisons, the eukaryotic hemerythrins always formed a monophyletic group to the exclusion of all putative prokaryotic hemerythrins. The myohemerythrins and myohemerythrin-like proteins formed a coherent group, with the brachiopod hemerythrins forming another group and the sipunculid hemerythrins forming several separate groups (for a discussion of the evolution of sipunculid hemerythrins, see Vanin et al., 2006). No putative prokaryotic hemerythrin or group thereof reproducibly clustered with the eukaryotic hemerythrins in the present analyses.
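The clustering procedure is not specified in detail here. As a stand-in, and not as the method actually used, the sketch below applies single-linkage clustering to p-distances computed over pre-aligned sequences (for example, motif-anchored alignments), ignoring columns where either sequence has a gap. The toy sequences and the 0.2 distance threshold are invented for illustration; `p_distance` and `single_linkage_clusters` are hypothetical names.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Proportion of differing residues over columns where neither aligned
    sequence has a gap ('-')."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 1.0
    return sum(x != y for x, y in pairs) / len(pairs)


def single_linkage_clusters(aligned: dict[str, str],
                            threshold: float) -> list[set[str]]:
    """Group sequences linked by any chain of pairwise p-distances
    <= threshold (single linkage, implemented with union-find)."""
    parent = {name: name for name in aligned}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in combinations(aligned, 2):
        if p_distance(aligned[a], aligned[b]) <= threshold:
            parent[find(a)] = find(b)

    groups: dict[str, set[str]] = {}
    for name in aligned:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=lambda g: sorted(g))


# Toy aligned sequences (not real hemerythrins).
toy = {
    "animal_1": "HAAAHLLLE",
    "animal_2": "HAAAHLLLD",
    "bact_1": "HKKKHQQQE",
    "bact_2": "HKKRHQQQE",
}
print(single_linkage_clusters(toy, threshold=0.2))
```

Single linkage is simple but prone to chaining distant sequences together through intermediates, which is one reason cluster membership of highly divergent sequences, such as those from Magnetococcus, can be unstable.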

The putative prokaryotic hemerythrin-like proteins, on the other hand, never formed a monophyletic group excluding the animal hemerythrins; rather, the group containing the animal hemerythrins always branched within the larger tree of prokaryotic hemerythrin-like sequences. Because the trees could not be robustly rooted, the question of whether animals acquired hemerythrins from bacteria, or vice versa, cannot be addressed directly. On the basis of the wide range of prokaryotic taxa in which hemerythrin-like sequences were detected, Isaza et al. (2006) have suggested that the hemerythrin domain probably evolved within bacteria and was later acquired by animals. However, the fact that the known animal hemerythrins appear to form a monophyletic group suggests either that the transfer occurred only once, in an ancestral animal, in which case hemerythrins must have been lost from those animal taxa that no longer possess them, or that hemerythrins were acquired independently by various animal taxa but always from the same source, perhaps a marine bacterial symbiont that does not appear in the present analyses. Transfer of genes from endosymbiotic bacteria to animal hosts has been described previously, an extreme example being the recent report that a species of the fly Drosophila appears to have integrated almost the entire genome of the parasite Wolbachia into its nuclear genome (Hotopp et al., 2007).

The prokaryotic hemerythrins formed a number of groups, most of which showed significant and unambiguous within-group similarity. In most cases, however, these groups did not correlate neatly with taxonomy, suggesting either that horizontal transfer has occurred many times or that the relationships between the hemerythrins are too weak for the analyses to give reliable results. Nevertheless, in addition to the animal hemerythrins, two large groups showed strong internal relationships and broad consistency with taxonomy. One of these included many of the hemerythrin-like sequences of the Betaproteobacteria; the other included all of the Campylobacter hemerythrin-like sequences and most of the other sequences from the Epsilonproteobacteria. These are described further in ‘Distribution of single-domain hemerythrin-like proteins’ section.

Multi-domain hemerythrins

In terms of taxonomic distribution, the putative multi-domain hemerythrins are so far mainly restricted to the Proteobacteria. It is clear that the magnetotactic bacteria Magnetococcus and Magnetospirillum have far more multi-domain hemerythrins than other organisms. Multi-domain hemerythrins also seem to be common in the sulphate-reducing Deltaproteobacteria and their relatives. A selection of putative multi-domain hemerythrins is shown in Table 2.

Table 2.   Examples of putative multi-domain bacteriohemerythrins
Strain; locus tag (length in amino acids): type
  1. Complex hemerythrins contain hemerythrin domains fused to other putative domains: MCP, methyl-accepting chemotaxis protein domain; CNBP, cyclic nucleotide binding protein domain; HK, histidine kinase; RR, response-regulator; GGDEF, cyclic di-GMP synthase; AC, adenylyl cyclase domain; and domains of unknown function (X). For a complete list, see supplementary Table S1.

Magnetospirillum magneticum AMB-1
  amb0220 (807): MCP-hem
  amb0723 (275): hem-RR
  amb0871 (502): hem-X
  amb1682 (906): CNBP-hem
  amb2535 (282): hem-RR
  amb3095 (870): CNBP-hem
  amb3267 (665): MCP-hem
  amb3610 (499): hem-X
  amb3793 (810): MCP-hem
  amb4156 (795): MCP-hem
Rhodospirillum rubrum ATCC 11170
  Rru_A0632 (736): MCP-hem
  Rru_A1523 (262): hem-hem
Rhodopseudomonas palustris BisA53
  RPE_2062 (507): GGDEF-hem
Dechloromonas aromatica RCB
  Daro_0965 (681): hem-HK
  Daro_3954 (667): GGDEF-hem
Rhodoferax ferrireducens DSM 15236/T118
  Rfer_1451 (277): X-hem
  Rfer_3626 (840): GGDEF-hem
  Rfer_4030 (498): GGDEF-hem
Aeromonas hydrophila ATCC 7966
  AHA_0130 (678): MCP-hem
  AHA_2968 (809): GGDEF-hem
Colwellia psychrerythraea 34H
  CPS_1744 (368): hem-GGDEF
  CPS_1939 (1328): HK-RR-hem
  CPS_3631 (722): hem-GGDEF
  CPS_4695 (519): RR-hem
Idiomarina baltica OS145
  OS145_02680 (566): MCP-hem
Marinobacter aquaeolei VT8
  MaquDRAFT_0582 (662): hem-GGDEF
Shewanella oneidensis MR-1
  SO3890 (529): MCP-hem
Shewanella sp. W3-18-1
  Sputw3181DRAFT_1600 (526): MCP-hem
Vibrio cholerae O1 N16961
  VC1216 (372): hem-GGDEF
Vibrio parahaemolyticus RIMD 2210633
  VP1289 (381): hem-GGDEF
Vibrio splendidus 12B01
  V12B01_04093 (383): hem-GGDEF
Desulfotalea psychrophila LSv54
  DP0972 (370): hem-GGDEF
  DP2039 (374): hem-GGDEF
Desulfovibrio vulgaris DP4
  DvulDRAFT_2548 (963): MCP-hem
  DvulDRAFT_2859 (689): MCP-hem
  DvulDRAFT_3043 (659): GGDEF-hem
Desulfovibrio vulgaris Hildenborough
  DVU0170 (689): MCP-hem
  DVU3106 (655): GGDEF-hem
  DVU3155, DcrH (963 or 959): MCP-hem
Desulfuromonas acetoxidans DSM 684
  Dace_0378 (1174): GGDEF-hem
  Dace_0592 (368): hem-GGDEF
  Dace_0947 (591): hem-HK-RR
  Dace_1299 (371): hem-GGDEF
Deltaproteobacterium MLMS-1
  MldDRAFT_3301, MldDRAFT_2276 (457): phosphatase-hem?
Pelobacter propionicus DSM 2379
  PproDRAFT_0329 (518): MCP-hem
Campylobacter fetus 82-40
  CFF8240_0606 (385): hem-GGDEF
Thiomicrospira denitrificans ATCC 33889
  Tmden_0967 (1004): hem-GGDEF-EAL
Mariprofundus ferrooxydans PV-1
  SPV1_05088 (373): hem-GGDEF
Kuenenia stuttgartiensis
  kustc1183 (872): CNBP-hem
Beggiatoa sp. PS
  BGP_2837 (402): AC-hem
  BGP_0713 (935): MCP-AC-hem
Haloarcula marismortui ATCC 43049
  pNG7265 (617): hem-CoA-ligase?

In terms of the attached domains, the sequences fall into a few clear groups, relating to different methods of signal transduction. One obvious group is the methyl-accepting chemotaxis protein (MCP)-hemerythrins, with a large N-terminal MCP-related domain and a C-terminal hemerythrin domain. Considering only the hemerythrin domains, these fall into four sequence similarity groups: the Magnetospirillum and Rhodospirillum sequences; the Magnetococcus sequences; the Shewanella/Idiomarina sequences (one per strain), possibly also including the Aeromonas sequences; and the Desulfovibrio/Pelobacter sequences. These groups are consistent with the taxonomy of the organisms. Within each group, the closest relatives of the MCP part of the sequence are other MCPs from related organisms, rather than other MCP-hemerythrins, suggesting that in each group the fusion of MCP and hemerythrin occurred independently. The MCP-hemerythrins of Magnetococcus do not seem to be close to those of Magnetospirillum, suggesting that they may have arisen independently rather than being transferred laterally with the magnetosome biosynthesis machinery; in fact, their closest relatives seem to be MCPs from Desulfotalea rather than from other Alphaproteobacteria. However, the sequences from Magnetococcus are generally rather divergent and tend not to cluster reliably in the analyses. One recently deposited sequence from Magnetospirillum gryphiswaldense (locus MGR2868) was the only example in this survey of an apparent hemerythrin-MCP, with the hemerythrin domain at the N-terminus and the MCP domain at the C-terminus.

A second group comprises the cyclic nucleotide-binding domain (CNBP)-hemerythrins, which have a C-terminal hemerythrin sequence fused to a large N-terminal region including a putative CNBP domain. These sequences seem to be characteristic of magnetotactic bacteria, being found only in Magnetococcus and Magnetospirillum, with the exception of one from the planctomycete Kuenenia stuttgartiensis. Unlike the MCP-hemerythrins, the CNBP-hemerythrins of Magnetococcus and Magnetospirillum do seem to be specifically related, suggesting that they may be more closely associated with the magnetotactic lifestyle. Also unlike most MCP-hemerythrins, the N-terminal part of the CNBP-hemerythrins does not closely resemble other bacterial proteins, suggesting that this may represent a new class of bacterial signalling domain. Two recently deposited sequences from the sulphur-oxidizing gammaproteobacterium Beggiatoa (BGP_2837 and BGP_0713) possess a domain related to adenylyl/guanylyl cyclases; no other sequence in our survey possessed such motifs. One of these two sequences (BGP_0713) also possesses an apparent MCP domain.

Most of the other examples contain a GGDEF domain, believed to catalyse synthesis of the second messenger cyclic-di-GMP (Jenal & Malone, 2006; Ryan et al., 2006; Cotter & Stibitz, 2007), which has been implicated in the control of virulence and biofilm formation in Pseudomonas, Vibrio, and other organisms. This suggests an intriguing link between oxygen levels and the cyclic-di-GMP signalling network. These proteins fall into two groups: those with the hemerythrin at the N-terminus (hemerythrin-GGDEF proteins), as in Vibrio, Colwellia, Desulfotalea, and Mariprofundus, and those with the hemerythrin at the C-terminus (GGDEF-hemerythrins), as in Rhodoferax, Aeromonas, and Dechloromonas. Desulfuromonas has one of each, and Rhodopseudomonas strains are split between the two types. Some of these proteins are large and appear to contain additional domains, including PAS domains, EAL domains (cyclic-di-GMP hydrolases), and histidine kinase (HK) domains, suggesting complex signalling networks.

Several other sequences contain HK-like domains without GGDEF domains. Interestingly, the hemerythrin domains of these proteins show specific sequence similarity to the hemerythrin domains of some hemerythrin-GGDEF proteins. One sequence from the unnamed Deltaproteobacterium MLMS-1 possesses a possible phosphatase domain. Several of the sequences contain CheY-like receiver domains (Galperin, 2006), which may be phosphorylation targets; in these cases, phosphorylation may alter oxygen binding, or oxygen binding may control susceptibility to phosphorylation.

A single sequence from Rhodospirillum rubrum (Rru_A1523) is the only example detected of an apparent di-hemerythrin, with two fused hemerythrin domains.

In some cases, the role of the apparent extra domain was not clear from sequence analysis. For example, in two Magnetococcus proteins of 249 and 253 residues (Mmc1_2834 and Mmc1_3109), the C-terminal regions were related to each other but not significantly to any other sequences in the database. Likewise, many of the Campylobacter hemerythrin-like sequences are long enough to encode another domain at the C-terminus, but these C-terminal regions appear to be related only to those of other Campylobacter hemerythrin-like proteins.

Bearing in mind the caveat concerning phylogenetic analysis of hemerythrins, Fig. 2 presents a sequence similarity dendrogram based on the hemerythrin domains of some of the multi-domain hemerythrin-like proteins detected in the survey. Only the hemerythrin domains were used in this analysis, as otherwise the result would have been strongly biased by relationships between the different classes of attached domain.

Figure 2.

Guide tree generated by ClustalW based on the amino acid sequences of the hemerythrin domains of selected putative multi-domain hemerythrins. Where the hemerythrin domain occurred at the C-terminus, the sequence was taken from 10 residues before the conserved tryptophan (W10 in Themiste dyscritum); where the hemerythrin domain was at the N-terminus, the sequence was truncated 10 residues after the aspartate of the final HxxxxD motif. In the single case where the hemerythrin domain was central, both ends were truncated as described. Species abbreviations are as follows: AMB1, Magnetospirillum magneticum AMB1; MS1, Magnetospirillum magnetotacticum MS1; MSR1, Magnetospirillum gryphiswaldense MSR1; Kuest, Candidatus Kuenenia stuttgartensis; Rpal, Rhodopseudomonas palustris; Rferr, Rhodoferax ferrireducens; Ahydr, Aeromonas hydrophila; Asalm, Aeromonas salmonicida; Dvulg, Desulfovibrio vulgaris; Dacet, Desulfuromonas acetoxidans; Darom, Dechloromonas aromatica; Ibalt, Idiomarina baltica; Iloho, Idiomarina loihiensis; Samaz, Shewanella amazonensis; Ssp, Shewanella sp.; Soneid, Shewanella oneidensis; Sbalt, Shewanella baltica; Sputr, Shewanella putrefaciens; Sloho, Shewanella loihica; Ddesulf, Desulfovibrio desulfuricans; Pprop, Pelobacter propionicus; Hmaris, Haloarcula marismortui; Cpsyc, Colwellia psychrerythraea; Vchol, Vibrio cholerae; Vpara, Vibrio parahaemolyticus; Vsplen, Vibrio splendidus; Mferr, Mariprofundus ferrooxydans; Tdenit, Thiomicrospira denitrificans; Beggsp, Beggiatoa sp. Abbreviations referring to strain designations, accessory domains and locus tags are as in Table 2 and supplementary Table S1.
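The trimming rules described in the Fig. 2 legend can be expressed as a short sequence-handling routine. The following is a minimal Python sketch, not the procedure actually used to prepare Fig. 2. It assumes that the hemerythrin signature can be located with a regular expression built from the motifs H–HxxxE–HxxxH–HxxxxD, and the spacer bounds (5–60 residues) and function names are hypothetical choices for illustration. For C-terminal domains, the position of the conserved tryptophan is taken as an input, because identifying it reliably requires an alignment against a reference such as the T. dyscritum sequence.

```python
import re

# Hemerythrin signature H...HxxxE...HxxxH...HxxxxD, as described in the text.
# ASSUMPTION (hypothetical): 5-60 residues between motifs; the lazy gaps pick
# the earliest complete signature starting at the leftmost possible H.
SIGNATURE = re.compile(r"H.{5,60}?H.{3}E.{5,60}?H.{3}H.{5,60}?H.{4}D")


def trim_n_terminal_domain(seq: str, flank: int = 10) -> str:
    """N-terminal hemerythrin domain: keep `flank` residues past the HxxxxD aspartate."""
    match = SIGNATURE.search(seq)
    if match is None:
        raise ValueError("no hemerythrin signature found")
    # match.end() is the index just after the aspartate of the final HxxxxD motif
    return seq[: match.end() + flank]


def trim_c_terminal_domain(seq: str, w_index: int, flank: int = 10) -> str:
    """C-terminal hemerythrin domain: start `flank` residues before the conserved Trp.

    w_index is the 0-based position of that tryptophan, supplied from an
    alignment against a reference (e.g. W10 of T. dyscritum); it is not
    detected automatically here.
    """
    if seq[w_index] != "W":
        raise ValueError("expected the conserved W at w_index")
    return seq[max(0, w_index - flank):]


# Synthetic example: spacer residues are A, the downstream tail is G.
demo = "MMMMMH" + "A" * 10 + "HAAAE" + "A" * 10 + "HAAAH" + "A" * 10 + "HAAAAD" + "G" * 20
print(trim_n_terminal_domain(demo))  # ends with the motif D plus 10 flanking G residues
```

Anchoring on a complete signature match, rather than on the last HxxxxD anywhere in the protein, avoids truncating at a chance HxxxxD inside an attached signalling domain; for real sequences the spacer bounds would need to be checked against the manual alignment.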

A few main groups can be discerned: the Alphaproteobacterial MCP-hemerythrins; the CNBP-hemerythrins; the GGDEF-hemerythrins; the Gammaproteobacterial and Deltaproteobacterial MCP-hemerythrins; the hemerythrin-GGDEF and hemerythrin-HK proteins; and the two adenylyl cyclase-hemerythrins.

If these hemerythrin-like domains are indeed oxygen sensors, then interaction of the hemerythrin-like domain with oxygen must somehow alter the activity of the associated signalling domain. The mechanism of this process has been investigated in some detail by Isaza et al. (2006) for the isolated hemerythrin domain of the MCP-hemerythrin DcrH of Desulfovibrio vulgaris. Three forms of the hemerythrin domain must be considered (Stenkamp, 1994): the deoxy-form, in which both iron atoms are present as Fe(II); the oxy-form, in which oxygen is bound as peroxide and both iron atoms are present as Fe(III); and the met-form, in which the oxygen has dissociated and the iron atoms remain oxidized. Isaza et al. reported that, in DcrH, auto-oxidation to the met-form occurred much more readily than in the animal hemerythrins. Significant conformational differences were noted in the N-terminal region (which in the intact protein would be fused to the MCP domain) between the oxidized met- and oxy-forms and the reduced deoxy-form, and it was therefore proposed that this is the mechanism by which interaction with oxygen is detected. Similar investigation of proteins with an N-terminal hemerythrin domain, such as the unique hemerythrin-MCP of Magnetospirillum gryphiswaldense or the hemerythrin-GGDEF proteins of Vibrio spp., would be very interesting.

Distribution of single-domain hemerythrin-like proteins

Putative single-domain hemerythrin-like proteins were detected predominantly in the Proteobacteria and the Firmicutes, with few examples from other phyla. Because phylogenetic analysis of the entire dataset did not yield informative results, subsets of the single-domain hemerythrins from different taxa were analysed separately in order to identify specific orthologues in related species.

The putative single-domain bacteriohemerythrins of the Alphaproteobacteria are dominated by those of the magnetotactic bacteria, with an astonishing 23 in Magnetospirillum magneticum AMB-1, 21 in Magnetospirillum magnetotacticum MS-1, and 17 in Magnetospirillum gryphiswaldense MSR-1. Only four single-domain hemerythrin sequences were detected in Magnetococcus sp. MC-1, along with two further sequences of 249 and 253 residues, long enough to contain an extra unknown domain; however, this organism, like Magnetospirillum spp., possesses multiple multi-domain bacteriohemerythrins, as discussed above. Apart from the magnetotactic bacteria, a single putative sequence was detected in Mesorhizobium sp. and three in Rhodospirillum rubrum, a very close relative of Magnetospirillum. Magnetospirillum is presumed to be derived from a Rhodospirillum-like ancestor through loss of the photosynthetic apparatus and acquisition of the magnetosome biosynthesis machinery. At some point in this evolutionary process, it appears to have greatly diversified its array of hemerythrins, presumably reflecting some feature of its new lifestyle. Phylogenetic analysis of the Alphaproteobacterial sequences suggested three large families, each containing one of the Rhodospirillum sequences and multiple Magnetospirillum sequences; however, the relationships within these groups were not strong enough for the authors to be confident in these assignments, and for the present the relationships among these sequences remain unresolved.

Among the Betaproteobacteria, a highly conserved hemerythrin-like sequence was detected in the majority of sequenced Burkholderia genomes, and a closely related sequence also occurred in all Ralstonia genomes. The differing lengths of these sequences (141, 145, or 149 residues) are due to the presence of one to three repeats of the sequence PELK near the N-terminus of the protein. More distantly related orthologues were detected in Comamonas testosteroni, Polaromonas naphthalenivorans, and Acidobacterium spp. Based on the alignments, several of these sequences seem to have incorrectly assigned start sites, giving them an apparent N-terminal extension. The majority of the Betaproteobacterial sequences share a characteristic deletion of approximately five residues between the H and HxxxE motifs relative to most other hemerythrins, suggesting shared ancestry. The more distantly related Betaproteobacterium Dechloromonas aromatica possesses a remarkable number (10) of putative single-domain hemerythrin-like sequences, but these do not appear to be closely related to those of the Burkholderiales.
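The length arithmetic above can be checked directly: if the three forms differ only in the number of PELK units, each repeat adds four residues, so the one-repeat 141-residue form implies a 137-residue protein without the repeat. A minimal Python sketch, in which the function names are hypothetical and the 137-residue baseline is inferred from the figures given rather than reported:

```python
import re

REPEAT = "PELK"
# Inferred, not reported: the 141-residue form carries one repeat, so 141 - 4 = 137.
CORE_LENGTH = 141 - len(REPEAT)


def pelk_repeats(protein: str) -> int:
    """Number of units in the longest tandem run of PELK in a protein sequence."""
    runs = re.findall(r"(?:PELK)+", protein)
    return max((len(run) // len(REPEAT) for run in runs), default=0)


def expected_length(n_repeats: int) -> int:
    """Predicted length if the sequences differ only in the number of PELK units."""
    return CORE_LENGTH + len(REPEAT) * n_repeats


print([expected_length(n) for n in (1, 2, 3)])  # [141, 145, 149]
print(pelk_repeats("MS" + "PELK" * 3 + "AQT"))   # 3
```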

In the Gammaproteobacteria, the majority of the putative single-domain hemerythrins show distinct sequence similarity. Those of Xanthomonas spp., Stenotrophomonas maltophilia, and Pseudomonas aeruginosa (present in multiple sequenced genomes) are extremely similar; those of Alkalilimnicola, Alcanivorax, and Reinekea are less close but still clearly related. However, there is also evidence that lateral gene transfer has played a part in the distribution of hemerythrin genes. The Pseudomonas resinovorans carbazole degradative plasmid pCAR1 (accession no. NC_004444) contains a 146-residue hemerythrin-like protein, designated p034, within a degradative operon. This protein shows extremely high similarity to the 149-residue hemerythrin from the Betaproteobacterium Janthinobacterium sp. J3, which is also encoded within a carbazole degradative operon (Maeda et al., 2003; Urata et al., 2004). It therefore seems very likely that P. resinovorans acquired this operon from Janthinobacterium sp. or a related organism. The putative hemerythrin of Azotobacter vinelandii also shows the characteristic deletion found mainly in Betaproteobacterial sequences, suggesting horizontal acquisition.

Within the Deltaproteobacteria, multiple hemerythrin-like sequences occur in species of Desulfovibrio, Desulfuromonas, Desulfotalea, Geobacter, and Anaeromyxobacter, as well as in strain MLMS-1. The most widely distributed cluster consists of 135-residue sequences found in almost all Geobacter and Pelobacter genomes, as well as in the unnamed arsenate-reducing strain MLMS-1. Closely related to these are the 137–150-residue single-domain hemerythrins of Desulfovibrio spp. and the 132- or 133-residue proteins of Desulfuromonas and some Geobacter spp. A second group contains the remaining Geobacter sequences, and a third contains most of the Anaeromyxobacter sequences (five or six of the eight sequences found in each of two strains), suggesting that these have recently diverged from a single common ancestral gene.

In the Epsilonproteobacteria, hemerythrin-like sequences seem to be common in Campylobacter spp. All of the Campylobacter hemerythrins are clearly related to each other, to the exclusion of short-chain hemerythrins from other organisms, and most possess a C-terminal extension of unknown function. Within the group, they clearly fall into several subgroups, and every Campylobacter genome sequenced to date contains members of at least two of them. One interesting feature is that some of the hemerythrins are apparently incomplete, lacking the C-terminal part containing the HxxxxD motif. Investigation of the DNA sequences shows that the C-terminal portion is in fact present, but is encoded in a different reading frame. In some cases, where an ATG codon follows soon after the frameshift point, the C-terminal part has been annotated as a separate gene. This might be attributed to a sequencing error or a genuine frameshift mutation, but the fact that the break-point occurs in exactly the same place in multiple aligned sequences suggests that a programmed frameshift may occur following a run of A bases.
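Because the proposed programmed frameshift coincides with a run of A bases, one way to flag candidate break-points in other Campylobacter hemerythrin genes would be to scan the nucleotide sequence for homopolymeric A tracts. A minimal Python sketch; the function name and the minimum tract length of seven bases are hypothetical choices for illustration, not values established in this survey:

```python
import re


def a_tracts(dna: str, min_len: int = 7) -> list[tuple[int, int]]:
    """Return (0-based start, length) of each run of at least min_len A bases.

    min_len = 7 is a hypothetical threshold chosen for illustration.
    """
    pattern = re.compile("A{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(dna.upper())]


# Synthetic example: one 8-base A tract starting at position 5.
print(a_tracts("ATGCC" + "A" * 8 + "GGTTGA"))  # [(5, 8)]
```

A hit is only a candidate; whether the downstream region encoding the HxxxxD motif actually lies in the shifted reading frame still has to be confirmed by translating across the tract.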

The Campylobacter jejuni, Campylobacter coli, and Campylobacter upsaliensis hemerythrins seem to fall into at least five orthologous groups. Those of group 1 occur in all strains and are about 240–270 residues in length. Group 2 sequences are about 133 residues in length and were detected in all strains apart from C. jejuni RM1221 and C. coli. Group 3 sequences are 199 residues in length and were detected in all strains apart from C. jejuni subsp. doylei 269.97. Groups 4 and 5 occur in pairs encoded adjacent to each other, suggesting that they may form heterodimers or higher-order heteromultimers, as seen in some animal hemerythrins. In some of these sequences there appears to be a possible programmed frameshift, as described above. Campylobacter hemerythrins that did not fall into any of these five groups were nevertheless clearly more similar to other Campylobacter sequences than to hemerythrins of other groups. The sequences from Helicobacter hepaticus also seemed to be related to the Campylobacter sequences, and the single Wolinella sequence may be related, more distantly, to the Campylobacter–Helicobacter group.

Within the Firmicutes, three main groups of clostridial hemerythrins were identified. The first group comprises hemerythrins of 129–130 residues, detected in almost all strains, with C. beijerinckii possessing three. The second group includes the 137-residue proteins of Clostridium botulinum, Clostridium tetani, Clostridium sp. OhILAs, and Alkaliphilus metalliredigens. The third group includes the remaining sequences, comprising Clostridium sequences of 131–143 residues as well as the single sequences of Caldicellulosiruptor, Thermoanaerobacter, and Symbiobacterium.

Possible functions of hemerythrins

The hemerythrin-like motifs on which the present analysis has been based bind iron, which in turn allows binding of oxygen. Putative functions of these proteins may be considered in several categories:

  1. Binding of oxygen as a storage mechanism (as in animals) or for delivery to oxygen-requiring enzymes (as proposed for the hemerythrin-like protein of Methylococcus capsulatus; Karlsen et al., 2005).
  2. Binding of oxygen as a sensory mechanism (as proposed for DcrH of Desulfovibrio vulgaris; Xiong et al., 2000; Isaza et al., 2006).
  3. Binding of oxygen as a detoxification mechanism.
  4. Binding of iron as a storage mechanism, as proposed for the ovohemerythrin of the leech Theromyzon tessulatum (Baert et al., 1992).
  5. Binding of iron or other metals as a detoxification mechanism; for example, the hemerythrin-like protein of the polychaete worm Neanthes diversicolor is believed to be involved in cadmium detoxification (Demuynck et al., 1993).
  6. Some other mechanism unrelated to binding of metals or oxygen.

The presence of the characteristic histidine-containing motifs implies the potential for binding iron and similar metal ions. Binding of oxygen is associated with the presence of hydrophobic amino acids lining the oxygen-binding pocket, especially L28, F55, F80, W97, L98, and I102 in the T. dyscritum sequence (Stenkamp, 1994). These residues are conserved in the Desulfovibrio MCP-hemerythrin DcrH, which has been shown to bind oxygen (Xiong et al., 2000). The conservation of these residues in the larger group of prokaryotic hemerythrin-like proteins was therefore investigated. A sample of 161 putative prokaryotic hemerythrin-like sequences (111 putative single-domain hemerythrins and the hemerythrin-like domains of 50 putative multi-domain hemerythrins), representing all of the groups detected in this analysis, was aligned manually to ensure that the hemerythrin motifs were correctly aligned (the overall sequence similarity being too low for automated alignment to guarantee this in all cases), and the residues present at the relevant positions were recorded. Because all of these residues are adjacent to residues forming the hemerythrin motifs, the corresponding residues could easily be identified, except for L28 in a few cases where the initial histidine was absent or multiple histidines were present in this region. The results are shown in Table 3. Hydrophobic residues are clearly present at these positions in the vast majority of the sequences considered, with a few exceptions, such as three Magnetospirillum single-domain hemerythrins in which I102 is replaced by D. These results are consistent with an oxygen-binding function for the great majority of prokaryotic hemerythrin-like proteins.
Magnetospirillum is known to assimilate much larger amounts of iron than most other organisms, and one might speculate that the unusual hemerythrin-like proteins lacking conserved hydrophobic residues may serve a role in iron acquisition or storage.

Table 3. Substitution of amino acid residues lining the oxygen-binding pocket

In Themiste dyscritum, the oxygen-binding pocket involves six amino acids: L28, F55, F80, W97, L98 and I102. The data show the number and relative occurrence of amino acids at the corresponding positions in (A) 111 single-domain hemerythrin-like sequences and (B) the hemerythrin-like domains of 50 representative multi-domain sequences.

A (single-domain sequences, n = 111)

| L28 | F55 | F80 | W97 | L98 | I102 |
|---|---|---|---|---|---|
| L: 64 (57.6%) | F: 108 (97.3%) | F: 40 (36.0%) | W: 103 (92.8%) | L: 62 (55.9%) | I: 69 (62.2%) |
| F: 20 (18.0%) | L: 2 (1.8%) | L: 28 (25.2%) | F: 2 (1.8%) | F: 23 (20.7%) | V: 18 (16.2%) |
| I: 14 (12.6%) | N: 1 (0.9%) | I: 19 (17.1%) | L: 2 (1.8%) | I: 16 (14.4%) | A: 9 (8.1%) |
| M: 6 (5.4%) | | V: 16 (14.4%) | V: 2 (1.8%) | W: 5 (4.5%) | F: 5 (4.5%) |
| A: 2 (1.8%) | | M: 6 (5.4%) | A: 1 (0.9%) | V: 4 (3.6%) | L: 2 (1.8%) |
| E: 1 (0.9%) | | A: 1 (0.9%) | Q: 1 (0.9%) | E: 1 (0.9%) | T: 2 (1.8%) |
| K: 1 (0.9%) | | Q: 1 (0.9%) | | | D: 3 (2.7%) |
| Q: 1 (0.9%) | | | | | |
| Not assigned: 2 (1.8%) | | | | | |
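B (hemerythrin-like domains of multi-domain sequences, n = 50)

| L28 | F55 | F80 | W97 | L98 | I102 |
|---|---|---|---|---|---|
| L: 43 (86%) | F: 46 (92%) | F: 26 (52%) | W: 45 (90%) | L: 41 (82%) | I: 44 (88%) |
| I: 3 (6%) | L: 4 (8%) | L: 21 (42%) | M: 2 (4%) | F: 6 (12%) | L: 1 (2%) |
| M: 3 (6%) | | I: 2 (4%) | L: 1 (2%) | I: 6 (12%) | M: 1 (2%) |
| G: 1 (2%) | | M: 1 (2%) | V: 1 (2%) | V: 1 (2%) | T: 1 (2%) |
| | | | E: 1 (2%) | W: 1 (2%) | V: 1 (2%) |
| | | | | | E: 1 (2%) |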
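Each entry in Table 3 is a count followed by a relative frequency: the number of sequences carrying a given residue at the aligned position, divided by the size of the set (111 in A, 50 in B). A minimal Python sketch of that tabulation, assuming the residues observed at one aligned position are supplied as a list of one-letter codes, with None for positions that could not be assigned; the function name and input format are hypothetical:

```python
from collections import Counter


def residue_frequencies(column, total=None):
    """Tabulate residues at one aligned position, most frequent first.

    column: one-letter codes, one per sequence; None marks an unassigned position.
    total:  denominator for percentages (defaults to the number of sequences).
    Returns (residue, count, percent rounded to one decimal place) tuples.
    """
    total = len(column) if total is None else total
    counts = Counter("Not assigned" if r is None else r for r in column)
    return [(res, n, round(100 * n / total, 1)) for res, n in counts.most_common()]


# Position F55 in set A (Table 3): 108 F, 2 L and 1 N among 111 sequences.
f55 = ["F"] * 108 + ["L"] * 2 + ["N"]
for res, n, pct in residue_frequencies(f55):
    print(f"{res}: {n} ({pct}%)")  # F: 108 (97.3%), then L: 2 (1.8%), N: 1 (0.9%)
```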

It should also be noted, however, that the hemerythrin-like protein of the annelid worm Neanthes diversicolor, believed to be involved in detoxification of cadmium, also possesses hydrophobic residues in these positions. The full sequence of the ovohemerythrin of the leech Theromyzon tessulatum, which is believed to act as an iron storage protein, does not appear to be available for comparison.

Assuming that the role of most of these proteins involves oxygen binding, the function might be oxygen storage, oxygen detection, or oxygen detoxification. In the case of aerobic or facultatively aerobic organisms, a plausible function is storage of oxygen for respiration under fluctuating external oxygen levels, as in animals. It is also possible that these proteins may supply oxygen specifically to metabolic enzymes with a high demand for it. Such a function has been proposed for the hemerythrin-like protein of Methylococcus capsulatus, in which it is presumed to supply oxygen to methane monooxygenase (Karlsen et al., 2005). In strictly anaerobic organisms such as Clostridium, which are highly sensitive to oxygen, a detoxification function might seem more likely, with auto-oxidation followed by re-reduction generating hydrogen peroxide, as discussed for DcrH by Xiong et al. (2000). Such a function might also be plausible in cyanobacteria, where photosynthetic oxygen production can inhibit nitrogen fixation.

However, it is also noteworthy that almost all of the organisms in which hemerythrin-like sequences have been detected are motile, raising the possibility that the principal function of these proteins is in oxygen sensing and positive or negative aerotaxis, or other responses to oxygen levels. This is consistent with the observation that hemerythrin-like domains seem to have become fused to various signalling domains in multiple separate events. Perhaps the majority of the single-domain hemerythrin-like proteins normally interact with separately encoded signalling systems to control responses to oxygen.

Further research on hemerythrins would be facilitated by identification of regulators of hemerythrin genes, or of proteins whose synthesis is coregulated with that of hemerythrins. If specific genes are consistently located adjacent to hemerythrin genes, this would suggest a possible functional relationship between them. To this end, the genes adjacent to those encoding hemerythrin-like proteins were examined in selected organisms.

In the mainly aerobic bacteria Burkholderia mallei, Burkholderia pseudomallei, and Burkholderia thailandensis, the single hemerythrin-like protein is encoded between a putative iron-dependent regulatory protein and the subunits of ubiquinol oxidase, suggesting that it may play a role in respiration under low-oxygen conditions. However, the fact that these genes are transcribed in opposite directions does not support this hypothesis.

Similarly, the microaerophilic Campylobacter spp. typically possess multiple hemerythrin-like sequences belonging to several subfamilies (see ‘Distribution of single-domain hemerythrin-like proteins’). The 199-residue group 3 hemerythrin-like sequences from all eight sequenced strains of C. jejuni, C. coli, and C. upsaliensis are encoded adjacent to an HK and a response regulator receiver protein, suggesting that they may interact with these proteins to act as an oxygen sensor. Again, however, the direction of transcription does not support the hypothesis that these genes are coregulated, and experimental evidence for C. jejuni NCTC 11168 indicates that this two-component system plays a different role, in colonization of the gastrointestinal tract (Mackichan et al., 2004).

Experimental work with microarrays or reporter genes to define the conditions under which genes encoding hemerythrins are induced may provide better evidence for their role, as would demonstration of a clear phenotype associated with mutations in these genes. Identification of upstream regulatory sequences would also help to reveal potential regulators.

Concluding remarks and suggestions for future research

From the evidence of genome sequences, hemerythrin-like proteins and domains appear to be abundant in prokaryotes, especially in microaerophilic bacteria and in those with complex anaerobic respiration pathways. To avoid confusion with the well-studied animal hemerythrins, it is proposed that these proteins should be designated ‘bacteriohemerythrins’ (by analogy with ‘myohemerythrins’, ‘neurohemerythrin’, and ‘ovohemerythrin’).

Only two of these proteins have been characterized to date: a copper-induced short-chain hemerythrin-like protein of Methylococcus capsulatus (Karlsen et al., 2005), and the hemerythrin domain of an MCP-hemerythrin from Desulfovibrio vulgaris (Xiong et al., 2000). In both cases, hemerythrin-like binding of iron and oxygen was observed. The authors have begun testing other putative hemerythrin-like proteins from organisms of interest to their laboratories, and have so far demonstrated that at least one of the hemerythrin-like sequences of Geobacter sulfurreducens also encodes a protein able to bind iron and oxygen, as indicated by spectral data (L. Bellamy & C. French, unpublished data). Thus, three of the putative hemerythrin-like proteins listed here, which are not particularly close relatives within the group, have so far been studied biochemically, and all three have been found to show hemerythrin-like properties. It therefore seems plausible that most or all of the others are also genuine hemerythrin-like proteins.

Hemerythrin-like sequences are particularly abundant in the magnetotactic bacteria Magnetospirillum and Magnetococcus. These organisms possess far greater numbers of putative hemerythrin-like proteins than other organisms for which genome sequences are available, and the reasons for this deserve investigation. It is noteworthy that these organisms are highly sensitive to oxygen concentration, producing magnetosomes only within a limited range of external oxygen levels. It therefore seems possible that hemerythrin-like proteins help these bacteria to use magnetotaxis to move to regions where the oxygen tension is sufficient for aerobic respiration but not so high as to prevent magnetosome synthesis. On the other hand, magnetotactic bacteria are also known to accumulate much larger quantities of iron than other organisms; thus, a role in iron acquisition or storage must also be considered. Equally, a series of oxygen-binding hemerythrins with different affinities for oxygen may be required to buffer intracellular oxygen levels, so that the cell can maintain the critical intracellular oxygen concentration for magnetite precipitation. Ultimately, only biochemical analysis can resolve this question.

While the authors are particularly interested in the magnetotactic bacteria, the abundance of hemerythrin-like sequences in these organisms makes them unattractive candidates for research into hemerythrin function. Organisms with one or a few hemerythrins, and in which closely related hemerythrin-like proteins are present in multiple species, will be more amenable to analysis, especially for studies using knockout mutants. For further characterization of the MCP-hemerythrins, Desulfovibrio vulgaris is an attractive target, given that DcrH has already been characterized (Xiong et al., 2000; Isaza et al., 2006). Shewanella spp., with their highly conserved single MCP-hemerythrin and no other hemerythrin-like proteins, are perhaps even more attractive candidates, because they are well-established model organisms for the study of anaerobic respiratory pathways. It would be relatively straightforward to obtain deletion mutants and to observe the effect on aerotaxis. As for the hemerythrin-GGDEF proteins, Vibrio cholerae would appear to be a particularly attractive candidate, because it is a model organism for biofilm formation, and its cyclic-di-GMP signalling system has already been shown to be involved in this process (Jenal & Malone, 2006; Ryan et al., 2006; Cotter & Stibitz, 2007).

As for the single-domain hemerythrin-like proteins, those from Burkholderia and Ralstonia might be involved in delivering oxygen to oxygenases and respiratory oxidases, but this requires experimental proof. Because B. mallei and B. pseudomallei are important human pathogens, and some Ralstonia spp. are well studied in the context of biodegradation and bioremediation, these would seem good model systems to pursue. Other single-domain hemerythrins may interact with separately encoded signalling domains. The well-characterized human pathogen P. aeruginosa, the plant-pathogenic Xanthomonas spp., and the opportunistic pathogen Stenotrophomonas, which all possess a highly conserved hemerythrin-like sequence, may be suggested as suitable candidates for further investigation.

While any bioinformatic analysis carries the caveat that the functional assignments are putative, it is hoped that this analysis will be valuable to those wishing to demonstrate the biological functions of hemerythrins in bacteria.
