Arsenic compounds have been abundant at near-toxic levels in the environment since the origin of life. In response, microbes have evolved mechanisms for arsenic resistance and enzymes that oxidize As(III) to As(V) or reduce As(V) to As(III). Formation and degradation of organoarsenicals, for example methylarsenic compounds, also occur. There is a global arsenic geocycle, in which microbial metabolism and mobilization (or immobilization) are important processes. There has been substantial recent progress in studies of the ars operon (conferring resistance to As(III) and As(V)) in many bacterial types (and related systems in Archaea and yeast) and in the understanding of arsenite oxidation and arsenate reduction by respiratory-chain-linked enzyme complexes. DNA sequencing and protein crystal structures have established the convergent evolution of three classes of arsenate reductases (that is, the classes are not of common evolutionary origin). Proposed reaction mechanisms in each case involve three cysteine thiols and S–As bond intermediates, so convergent evolution to similar mechanisms has taken place.
1 Introduction: arsenic resistance in natural and unnatural environments
Arsenic toxicity in both natural and humanly impacted environments is an important current public health issue. Recent problem sites include (a) drinking water in Taiwan, where arsenic poisoning is known as ‘Black foot’ because of the necrotic destruction of tissue, (b) well water in West Bengal, India, and Bangladesh [1,2], where World Health Organization (WHO) efforts to provide pathogen-free drinking water by placing shallow tube wells have replaced a problem of cholera with one of arsenic toxicity, (c) release of arsenic from burning coal into foods in Southwest China, and (d) the USA, where arsenic levels in drinking water from residential wells in Michigan and Wisconsin [4,5], in industrially impacted mining waters in the western USA, and in recreational waters north of Boston are far above WHO recommended levels. Useful URLs for information about global arsenic pollution problems at the time of writing include http://www.who.int/inf-fs/en/fact210.html (WHO); http://co.water.usgs.gov/trace/arsenic/ (USA); and http://bicn.com/acic/ (Bangladesh). The widespread occurrence of high arsenic in water caused the US President to state “Arsenic is a natural substance that sometimes causes problems” and to reverse the previous government's decision to accept five times lower WHO standards.
The ultimate source of arsenic on the Earth's surface is igneous activity. Arsenic is widely spread in the upper crust of the Earth, although mainly at very low concentrations, with arsenic concentrations in soil ranging from 0.1 to more than 1000 ppm (mg kg−1). In atmospheric dust, the range is 50−400 ppm. In seawater, the average arsenic level may be 2.6 ppb (μg l−1) and in fresh water about 0.4 ppb. Arsenic at significant levels is all around us (Fig. 1).
Acidic mining environments are of particular concern with regard to arsenic pollution. Microbial activities there result in very low pH (sometimes approaching pH 1), since bacterial energy metabolism and growth oxidize reduced sulfur to H2SO4. Arsenic is oxidized in parallel to AsO43−. Both bacteria and Archaea are found in these environments and are involved in arsenic transformations [9–11].
Arsenic compounds occur naturally at significant levels, especially in marine foodstuffs; for example, prawns contain levels approaching 200 ppm. In addition, arsenical compounds have been used widely in medicine (especially for protozoan diseases), in agriculture (as herbicides and animal feed additives), and in crime (as a poison) [12–14]. Arsenic is itself a carcinogen ([15,16], http://www.who.int/inf-fs/en/fact210.html). Paul Ehrlich, who won the Nobel Prize in medicine in 1908, developed arsenicals as chemotherapeutic agents, culminating with the famous arsenical compound ‘Salvarsan’, which was the best of the ‘early modern’ antimicrobial agents. In North America and elsewhere in the developed world, organoarsenicals are added at levels of 20–40 g ton−1 to feed for broiler chickens. It is thought that the aromatic organoarsenicals Roxarsone and p-arsanilic acid enhance chicken growth, perhaps by limiting diseases such as coccidiosis. Considering the number of chickens consumed annually in North America and Europe, the amount of arsenic used and then released to the environment as chicken effluent is substantial. The microbial degradation of organoarsenicals is known, but barely studied.
1.1 The arsenic global geocycle
Just as there are well-studied geocycles for carbon, nitrogen, oxygen, sulfur and other elements that are components of all living cells, there are also geocycles for toxic elements including arsenic (Fig. 1). Living cells (especially microbes) carry out redox and covalent bond chemistry and are important players in the arsenic geocycle. Higher plants and animals bio-accumulate compounds to levels far above those of the environments in which they live.
The major sources of human contamination and occupational exposure with arsenic are the burning of coal and industrial metal smelting (Fig. 1), and more recently the semiconductor industry, as well as release from arsenic-rich ores during mining. Bio-mining releasing soluble arsenic from rock produces local environments of high and toxic arsenic levels.
This arsenic geocycle is presented here for the first time (a more limited version was in Cervantes et al. ) and is patterned after a similar geocycle for mercury that we first published over 25 years ago. That geocycle figure has since been used in reviews and textbooks, usually without attribution. A similar experience is expected with greater understanding of the arsenic geocycle. To paraphrase and expand on the US President George W. Bush: arsenic occurs so broadly in nature at subtoxic and near-toxic levels that human exposure from perturbation of natural arsenic should always be of concern .
Arsenate (the main arsenic compound in seawater) is taken up by marine organisms, ranging from phytoplankton and algae to crustaceans, mollusks and fish [20,21], and is converted to small organic compounds (such as methylarsonic acid or dimethylarsinic acid; Figs. 1 and 2) or to organic storage forms that are then secreted into the environment. However, some arsenic is retained by phytoplankton and metabolized into complex organic compounds. The transformation of inorganic arsenic into lipid-soluble compounds might be an adaptive mechanism for marine phytoplankton to compensate for limited nitrate availability. More complex algal organoarsenical compounds include water-soluble arsenosugars (i.e. dimethylarsenosugars) and lipid-soluble compounds (arsenolipids). While phytoplankton and macroalgae are the primary producers of complex organoarsenic compounds in the sea (Fig. 1), these organisms are themselves consumed and metabolized by marine animals. Fish and marine invertebrates retain 99% of accumulated arsenic in organic form, and crustacean and mollusk tissues contain higher concentrations of arsenic than fish. The major organoarsenic compound isolated from marine organisms is arsenobetaine. It occurs in algae, clams, flounder, lobsters, sharks, and shrimp. It is not known how arsenosugars and arsenolipids are converted into arsenobetaine within the higher animals in the marine environment. Arsenobetaine is degraded by microbial metabolism in coastal seawater sediments to methylarsonic acid and to inorganic arsenic. This process completes the biological cycling of arsenic in marine biosystems (Fig. 1).
1.2 Methylation of arsenic
Microbial conversion of arsenic to methylarsenic was first observed over 150 years ago. It has been understood at the level of products formed since the work of Challenger and coworkers before World War II [13,22–24]. Fungi dominate among the microbes that produce volatile, garlic-smelling trimethylarsine, although bacteria and animal tissues also have this potential. Hall et al. showed that the microbial content of the mouse intestinal cecum (mostly anaerobic bacteria) methylates inorganic arsenic. Up to 40% of low levels of As(III) and As(V) was methylated in vitro by cecal contents in less than 24 h. Both monomethyl- and dimethyl-arsenic compounds were formed, and addition of potential methyl donors (produced by microbes) increased the yield of methylarsenic compounds. The conversion of arsenate to methylarsonic acid or to dimethylarsinic acid (Fig. 2) is a possible mechanism for detoxification.
Since microbial methylation has been reviewed recently [21,27], this topic will not be considered here in depth. However, Fig. 2 shows the proposed reaction pathway for microbial methylation (mostly derived from studies of whole cell transformations and crude enzyme preparations) [23,24]. The purpose here is to introduce a topic that needs the movement from geocycles to purified enzymes and genes, much as presented below for arsenite oxidation and arsenate reduction.
The conversion of water-soluble arsenate to volatile trimethylarsine is a multistep process (Fig. 2), with arsenate initially being reduced to arsenite. There then follows a sequence of methylation and reoxidation, followed by reduction of the organoarsenical intermediates (Fig. 2). Dimethylarsinic acid is the substrate for fungal conversion to arsenobetaine and arsenolipids found in marine animals [19,23].
2 Genes for arsenic resistance
Toxic metals in the environment select and maintain microbes possessing genetic determinants that confer resistance to the toxic compounds. In bacteria, heavy metal resistance genes are frequently located on plasmids. While bacterial resistance mechanisms to metallic ions, including those of arsenic, have been studied in molecular detail [28–30], much less information on this subject is available for algae and fungi. Here, we attempt to put together current knowledge on resistance to, and microbial metabolism of, arsenic compounds, with emphasis on molecular mechanisms of reduction and oxidation coming from recent crystal structures of the enzymes. Other aspects of microbial interactions with arsenicals, especially the genes for plasmid and chromosomal resistance systems, were summarized recently, and here the emphasis will be different.
Arsenate enters bacterial cells as oxyanions comparable to those of phosphate and is carried by phosphate transport membrane systems, for example the Pit and Pst systems of Escherichia coli (Fig. 3). Arsenate oxyanions in water show three pKa values (2.2, 7.0, and 11.5), comparable to 2.1, 7.2, and 12.7 for phosphate. This means that approximately equal amounts of HAsO42− and H2AsO4− occur at pH 7, whereas H3AsO4 and H2AsO4− predominate in acidic environments. Arsenite (As(III)), in contrast, appears mostly un-ionized as As(OH)3 at neutral pH, with a pKa of 9.3 for dissociation to H2AsO3−. As(OH)3 is transported into cells at neutral pH by aqua-glyceroporins (glycerol transport proteins) in bacteria, yeast (Fig. 3) and mammals, since As(OH)3 resembles the inorganic equivalent of a polyol. GlpF, which was originally identified as the glycerol facilitator in E. coli, is a member of the aqua-glyceroporin family. Disruption of the glpF gene confers resistance to antimonite, consistent with the GlpF channel allowing translocation of Sb(OH)3. It seems likely that As(OH)3 is also a GlpF substrate, but arsenite may enter E. coli by another system as well. The structure of the E. coli GlpF protein has been solved at 6.9 Å resolution and shows a tetrameric assembly of subunits, comparable to aquaporins, but with a larger central channel in each subunit. The problem for a glyceroporin membrane protein is how to allow glycerol (and arsenite) movement across the membrane while excluding the smaller water molecules.
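The speciation statement above can be checked with a short calculation. The following Python sketch computes the mole fractions of the four protonation states of a triprotic acid from its pKa values. It is only an illustration: it assumes an ideal dilute solution (activities equal to concentrations), it uses only the pKa values quoted in the text, and the function name species_fractions is hypothetical rather than taken from any cited work.

```python
# Sketch: protonation-state fractions of a triprotic oxyacid versus pH.
# Assumption: ideal dilute solution, so activities equal concentrations.

# pKa values as quoted in the text
ARSENATE_PKA = (2.2, 7.0, 11.5)
PHOSPHATE_PKA = (2.1, 7.2, 12.7)


def species_fractions(ph, pkas):
    """Return mole fractions [H3A, H2A-, HA2-, A3-] at the given pH."""
    h = 10.0 ** (-ph)
    k1, k2, k3 = (10.0 ** (-p) for p in pkas)
    # Each term is proportional to the concentration of one species.
    terms = [h ** 3, h ** 2 * k1, h * k1 * k2, k1 * k2 * k3]
    total = sum(terms)
    return [t / total for t in terms]


if __name__ == "__main__":
    labels = ("H3AsO4", "H2AsO4-", "HAsO4(2-)", "AsO4(3-)")
    for ph in (2.0, 7.0):
        fractions = species_fractions(ph, ARSENATE_PKA)
        summary = ", ".join(
            f"{name}: {frac:.3f}" for name, frac in zip(labels, fractions)
        )
        print(f"pH {ph}: {summary}")
```

With a second pKa of 7.0, the two middle species come out equal at pH 7, and at pH 2 essentially all arsenate is present as H3AsO4 or H2AsO4−, in line with the text. The same function can be called with PHOSPHATE_PKA to compare phosphate.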
Deletion of the gene for yeast aqua-glyceroporin Fps1p similarly confers resistance to Sb(III) and to As(III) [36,37]. However, the fps1Δ strain is still relatively sensitive to As(III), suggesting that the yeast cell also has an additional As(III) uptake pathway. High osmolarity, which closes the Fps1p aqua-glyceroporin channel, provides resistance to both As(III) and Sb(III); and mutants with a constitutively open glycerol channel protein are hypersensitive to As(III) and Sb(III). Recently, it was shown that the mammalian homologues of Fps1p, AQP7 and AQP9, functionally substitute for Fps1p in an fps1Δ yeast strain . Thus, it is probable that As(III) also enters mammalian cells via aqua-glyceroporins.
In addition to chromosomal genes that function for uptake of inorganic arsenic as alternative substrates to useful nutrients, many microbes possess genes that specifically confer resistance to inorganic arsenic, both arsenate (As(V)) and arsenite (As(III)), as their natural primary substrates [28,38,39]. As this has been reviewed repeatedly , these genes and the proteins will be only briefly considered. In bacteria, these resistance determinants are often found on plasmids, which has facilitated their study at the molecular level. However, basically the same determinants are found also on bacterial chromosomes, including those of E. coli, Pseudomonas aeruginosa, Bacillus subtilis and Mycobacterium tuberculosis. As more and more bacterial genomes are sequenced, it has become clear that arsenic resistance operons are ubiquitous. The chromosomal systems are functional and provide arsenic tolerance; inactivating these ars operons leads to ‘hypersensitivity’ to arsenic compounds. Bacterial ars systems confer arsenic resistance primarily by encoding a specific efflux pump that extrudes As(III) from the cytoplasm , thus lowering the intracellular concentration of the toxic arsenic.
In some plasmid-determined systems of Gram-negative bacteria, the efflux pump consists of a two-component ATPase complex. The arsA gene product is a soluble ATPase subunit , which physically associates with an integral membrane protein, the product of the arsB gene [41,42]. Prior to efflux, arsenate is enzymatically reduced to arsenite (the substrate of ArsB and the activator of ArsA ATPase activity) by the small cytoplasmic arsenate reductase, the product of the arsC gene  (see below). In most chromosomal arsenic resistance systems of Gram-negative bacteria and the plasmids and chromosomes of Gram-positive bacteria, contiguous arsB and arsC genes are found, but there is no arsA gene.
Little is known about this subject in algae and fungi. An arsenic resistance gene cluster similar to that of bacteria is found in the yeast Saccharomyces cerevisiae. There are three contiguous genes in the cluster, ARR1, ARR2 and ARR3 (previously called ACR1, ACR2 and ACR3). The first gene, ARR1, appears to produce a yeast transcriptional regulator (so it presumably functions as the equivalent of the unrelated bacterial arsR gene). Disruption of the chromosomal ARR1 gene leads to hypersensitivity to arsenite and arsenate. Deletion of the ARR3 gene (determinant of the membrane efflux protein) also eliminates both resistances, whereas disruption of ARR2 (equivalent to the bacterial arsC) eliminates arsenate resistance alone, as expected. The yeast ARR2 arsenate reductase gene also functions in E. coli. Arr3p (the protein product of ARR3) is a member of a family of arsenite efflux carriers that includes bacterial and archaeal members such as the B. subtilis SKIN (Sigma K insertion) element YqcL protein. This family is unrelated to the larger family of ArsB proteins found in many bacterial ars operons (including in E. coli and Staphylococcus aureus) and appears to be the result of convergent evolution. Thus far, no other yeast or other eukaryote orthologs of Arr2p and Arr3p have been identified.
In addition to the three ARR gene products, another yeast protein, Ycf1p, which is an ABC ATPase, also contributes to resistance to As(III) and Sb(III), by being located in the vacuolar membrane and by pumping glutathione adducts, As(GS)3 (and presumably Sb(GS)3) from the cytoplasm into the vacuole  (Fig. 3).
2.1 Varieties of genes and operons
It seems clear that arsenite membrane efflux pumps and arsenate reductases have evolved more than once, although the former are all called ArsB membrane pumps and the latter ArsC reductases for historic and functional reasons. It should be emphasized that families of ArsB and of ArsC have evolved convergently to provide similar solutions to the problem of environmental arsenic. So, for example, the E. coli ArsB is unrelated to the B. subtilis SKIN element ArsB, a protein with similar function; and the E. coli ArsC is unrelated to the B. subtilis ArsC protein. Different families of proteins may be represented in the same microbe; for example, the P. aeruginosa genome contains two arsC genes that are unrelated in sequence and probably function with different coupling proteins (see below). An analysis at the National Center for Biotechnology Information (NCBI) ‘Clusters of Orthologous Groups’ (COG) site finds 47 presumed ArsC protein sequences at http://www.ncbi.nlm.nih.gov/cgi-bin/COG/palox?seq=ArsC and 29 presumed ArsB sequences at http://www.ncbi.nlm.nih.gov/cgi-bin/COG/palox?seq=ArsB
Regardless of which family they belong to, an arsB gene is followed by and cotranscribed with an arsC gene, with few exceptions [28,39]. It is thought that in addition to separate evolutionary origins for different ArsB and ArsC families, lateral transfer of either or both genes occurred later in microbial evolution, although sufficiently long ago that the G+C content of ars genes in each organism closely resembles that of other genes. Since the early Earth atmosphere was not oxidizing, there was probably little arsenate present, and little pressure to evolve an enzyme to reduce arsenate. Therefore, we favor the hypothesis that arsenite efflux proteins evolved earlier than the arsenate reductases, which appeared only after the introduction of oxygen into the atmosphere 3.8 billion years ago by the first cyanobacteria. One can conclude that arsenic resistance including arsenite oxidation and arsenate reduction arose early in the evolution of cellular life on Earth in response to high environmental levels. The subsequent fixation of different families of arsenate reductase (ArsC) and arsenite efflux pumps (ArsB) in different bacterial lineages may have occurred later, but nevertheless prior to the origin of eukaryotic cells some 850 million years ago.
Genomic sequencing of a more diverse range of prokaryotes, plus associated physiological studies, is needed to address phylogenetic questions. A member of one of the two families of bacterial arsB and arsC genes is found in most of the more than 100 bacterial ars operons that have now been sequenced, and arsC usually follows the arsB gene, even when the families are ‘mixed’. There are a few variants from this pattern: for example, in M. tuberculosis, the arsB and arsC genes are fused into a single gene, encoding a 498-residue protein; other mycobacteria have separate genes. The mycobacterial fusion protein is related to the ArsB and ArsC proteins of the B. subtilis ars operon, with 50% (ArsB) and 37% (ArsC) identical amino acids, and only a three-amino acid linker between regions. Other exceptions include the arsC genes in the chromosomes of P. aeruginosa (accession number gi15596147), Haemophilus influenzae and Neisseria gonorrhoeae that are not associated with arsB genes. The P. aeruginosa genome contains a second copy of the arsC gene (accession numbers gi15597475 and AAC69644) within a canonical arsRBC operon. Although the two sequences are unrelated, it is thought that both are functional.
Evolutionary trees of protein sequence similarities show three separate clades of arsenate reductases (ArsC; Fig. 4), the two bacterial families and the yeast version. The first and best-studied group of arsenate reductases is referred to in Fig. 4A as the glutaredoxin/glutathione (Grx/GSH) clade and has as prototype the ArsC arsenate reductase of E. coli plasmid R773 [39,48,49]. The plasmid R773 arsenate reductase uses three separate cysteines, one in arsenate reductase and the others in GSH and Grx, in the reaction mechanism (see below). For the nine examples shown in Fig. 4A, six critical amino acids identified by mutational studies in the ArsC from plasmid R773 are completely conserved (His8, Cys12, Ser15, Arg60, Arg94 and Arg107 in the R773 numbering). Additional residues are also conserved. All the sequences shown in Fig. 4A are considered orthologs with probable or demonstrated arsenate resistance as phenotype. There are no currently known homologous sequences with different substrates (i.e. paralogs).
The second family of arsenate reductases is referred to as the thioredoxin (Trx) clade in Fig. 4B and has ArsC of S. aureus plasmid pI258 as prototype [50–52]. As described below, the pI258-type proteins are related in structure and in function [52,53] to the family of low molecular mass protein phosphotyrosine phosphatases (LMW-PTPases), which are widely found in microbes and animals, including humans [53,54]. Many Trx-linked ArsC arsenate reductases are from low G+C Gram-positive bacteria, but those from the ars operons of P. aeruginosa and Acidithiobacillus ferrooxidans are from Gram-negative proteobacteria. For both of these latter two genes, there is evidence that they confer arsenate resistance. The A. ferrooxidans arsC product has been shown to use Trx and not Grx.
The third clade of cytoplasmic arsenate reductases is represented by the yeast Saccharomyces cerevisiae Arr2p arsenate reductase [43,44], which is related to a different class of larger protein tyrosine phosphatases that includes the eukaryote Cdc25 cell cycle proteins. The closest homolog to Arr2p is the product of another S. cerevisiae gene, the uncharacterized YG4E protein. Arr2p is not found in the second yeast genome, that of Schizosaccharomyces pombe, although another S. pombe gene product, the thiosulfate sulfurtransferase rhodanese (involved in cysteine biosynthesis), is slightly related and is shown in Fig. 4C. These structurally and functionally related proteins also function with thiol redox chemistry. The evolutionary sequence relationships and protein crystal structure similarities suggest that there are common chemical reaction mechanism features between arsenate reductases, phosphatases and sulfurtransferases, each of which uses an oxyanion substrate. Finally, two human examples of the Cdc25 class of large protein tyrosine phosphatases are shown as outliers in the Arr2p clade (Fig. 4C). Arr2p and Cdc25 show sequence homology only in the short active site cysteine region.
In parallel to the distinct ArsC clades, there appear to be two distinct and unrelated families of ArsB, the membrane efflux pump (analysis not shown). Several new and unexpected findings come from this analysis. The total number of known ArsC clades, representing convergent evolution and separate families of arsenate reductase, is currently three; but this number may increase as more microbes are analyzed. Although ArsC sequences show clustering in branches (for example for Enterobacteriaceae in Gram-negative bacteria and Staphylococci in low G+C Gram-positive bacteria; Fig. 4), the deeper branchings of these trees are not overall consistent with the ‘universal tree’ of life based on 16S rRNA sequences. For example, the arsB and arsC gene products of staphylococcal plasmids belong to different families. The ArsB sequences of low G+C Gram-positive Bacillus and Staphylococcus are not significantly related. There are in addition Archaeal Ars protein sequences that are not shown in Fig. 4. Halobacterium and Archaeoglobus ArsB and ArsC sequences are closest to those of the Gram-positive organisms.
3 Arsenate reductases: convergent evolution to similar chemistry
Enzymes capable of reducing arsenate to arsenite arose independently a number of times . This point was argued above based on sequence homologies of amino acid primary sequences. In this section, the same conclusion is reached from recent X-ray crystallographic solutions of protein structure and enzymatic reaction pathways.
The cytoplasmic arsenate reductases share mechanisms based on cysteine thiol oxidation/reduction cycling coupled to the general thiol recycling enzymes Grx or Trx [48,51,60]. Nevertheless, they can be subdivided into families whose sequences are unrelated and whose mechanisms differ in detail. These families have common properties that developed by convergent evolution. It is a situation at the enzymatic level somewhat similar to the wings of birds and butterflies: they carry out the same function but have no evolutionary shared ancestor.
The bacterial arsenate reductases [48,50,51,61] are small (131 amino acid residues for S. aureus and 141 for E. coli), monomeric cytoplasmic enzymes. The S. aureus ArsC enzyme uses Trx (a small intracellular protein that functions as a general disulfide reducing agent) in vitro [50,51]. In contrast, the E. coli enzyme requires reduced GSH and Grx, which is similar to Trx but with a different coupling specificity [48,62]. These differences and other major differences [31,48,51] now are understandable in terms of recent studies of protein structure and detailed reaction mechanisms.
3.1 The E. coli GSH/Grx ArsC family
The ArsC reductase from the large resistance plasmid R773 has been well characterized enzymologically and structurally. Purified R773 ArsC exhibits arsenate reductase activity , which is dependent on the presence of reduced GSH and Grx. The enzyme has two cysteine residues, Cys12 and Cys106. However, only Cys12 is required for arsenate reduction . There is now crystallographic evidence for both covalent Cys12 thiolate-As(V) and thiolate-As(III) intermediates in the catalytic cycle .
The role of Grx in arsenate reduction was investigated [48,62]. In general, Grx can catalyze either intraprotein disulfide bond reduction or reduction of mixed disulfides between a protein Cys thiol and glutathione. The N-proximal cysteine of Grx is required for both reduction reactions, while the other cysteine is required for protein disulfide reduction but not for reduction of mixed protein–glutathione disulfides. Single cysteine mutants of each of the three E. coli Grxs were used to distinguish between these two catalytic modes in R773 ArsC-catalyzed arsenate reduction. Grx mutants lacking the second cysteine could still couple to E. coli ArsC activity. In contrast, the N-terminal cysteine mutants did not support arsenate reduction. These results indicate strongly that R773 ArsC forms a mixed intermolecular disulfide between ArsC Cys12 and GSH (or, as described below, alternatively the novel intermediate Cys12–S–As(OH)–SG). E. coli has three glutaredoxins, Grx1, Grx2 and Grx3, each of which has a Cys-Pro-Tyr-Cys dithiol consensus sequence, and all three can serve as electron donor for the reduction of arsenate by the R773 reductase, with relative efficiencies of Grx2>Grx3>Grx1.
Martin et al.  reported high-resolution X-ray crystal structures for three forms of R773 ArsC, without bound arsenic and separate crystals with complexed arsenate or arsenite. The overall structures, both secondary structure fold and tertiary structure of ArsC (Fig. 5A), are unrelated to others in protein structure databases, indicating no significant global similarity to other known proteins. This is consistent with the lack of homologous paralogs in sequence libraries (Fig. 4A). There is no relationship between the tertiary structures of the E. coli R773 and the S. aureus and B. subtilis arsenate reductases (Fig. 5), supporting the conclusion that these two classes of enzyme are not related. All three structures have a core of four β-sheet regions, although these are all parallel for the two enzymes from the Trx clade (Fig. 5), but with one antiparallel β-sheet segment for the Grx/GSH enzyme from plasmid R773.
In the crystal structures of R773 arsenate reductase, the catalytic Cys12 residue is in close proximity to Arg60, Arg94 and Arg107, which are concluded to function by stabilizing the bound substrate arsenate and lowering the pKa of the Cys12 thiolate. When arsenate was soaked into the R773 ArsC crystal, a covalent Cys12–S–As(V) bond was formed, which is likely to be an intermediate in the reaction mechanism. This covalent intermediate is analogous to the S–P bond intermediate of protein tyrosine phosphatases [66,67] and shows tetrahedral geometry with a sulfur–arsenic distance of 2.18 Å. Arg60 moves closer to the Cys12 thiol when the sulfur–arsenate bond forms, and the S–AsO3 adduct hydrogen bonds with the side chains of Arg60 and Arg107, as well as the amide N of Gly11. When arsenite was soaked into the crystal instead of arsenate, a different Cys12 S–As covalent bond was seen; and this is also considered a likely reaction intermediate. This structure appears to contain Cys12 S–As+OH−, which on addition of H2O and OH− would form the product As(OH)3, which is then released. The proposed reaction pathway and intermediates are summarized in Fig. 6A. The covalent S–As–O structure in the electron density map with arsenite has only two apparent atoms linked to the arsenic atom. There are two possible structures for the intermediate. One is Cys–S–As=O, a trivalent arsenical with a double-bonded oxygen similar to that found in compounds such as phenylarsine oxide. However, the arsenic–oxygen bond length of 1.86 Å is too long for a double-bonded oxygen. On the other hand, the distance is consistent with the second possibility, Cys–S–As+–OH−, a novel covalent enzyme–As(III) intermediate.
3.2 The Staphylococcus Trx ArsC family
The S. aureus plasmid pI258 family of arsenate reductases uses different active site cysteine thiols (Fig. 6B) than does plasmid R773 ArsC. Nevertheless, the reaction mechanisms of the two arsenate reductases have marked similarities (Fig. 6). The pI258 enzyme couples to Trx, not Grx [50,51,68]. Unlike the E. coli arsenate reductase, replacement of any of three cysteine residues (Cys10, which is similarly located to Cys12 of R773 arsenate reductase and also functions as the reaction center; and Cys82 and Cys89, for which equivalent cysteines do not occur in the E. coli enzyme) leads to an inactive enzyme in vitro. The enzyme was proteolytically cleaved after catalysis, and peptide products identified by mass spectroscopy analysis indicated a Cys82–Cys89 oxidized cystine, again an intermediate in the reaction pathway  (Fig. 6B). Zegers et al.  solved the crystal structures of both reduced and oxidized S. aureus pI258 arsenate reductase, using an inactive Cys10Ser Cys15Ala double mutant to stabilize the protein from oxidation during crystallization and irradiation. The secondary and tertiary structures of pI258 arsenate reductase (Fig. 5C,D)  are remarkably similar to those of a LMW-PTPase from mammals, a relationship predicted  based on overall sequence homology (26% amino acid identities) as well as conservation of key residues in the ‘P loop’ active site, where the tyrosine-phosphate substrate approaches the active site cysteine (Cys10 for pI258 arsenate reductase). The stability of the P loop structure requires the presence of an oxyanion [69,70].
The overall secondary structures of the reduced (active) and oxidized (Cys82–SS–Cys89; inactive) forms of the pI258 arsenate reductase are basically the same, but there is a major change in conformation, with the Cys89 thiol moving more than 10 Å to form the disulfide bond (compare Fig. 5C and D). A recent report combines further NMR spectroscopy and X-ray crystal structures for the Staphylococcus arsenate reductase to provide evidence for Cys82 nucleophilically attacking Cys10 to form a Cys10–Cys82 disulfide (Fig. 6B) and providing electrons to reduce arsenate to arsenite. This model for the Trx-linked arsenate reductase (Fig. 6B) is quite different from the proposed reaction pathway for the GSH-linked E. coli arsenate reductase (Fig. 6A). The question as to whether basically similar or fundamentally different chemistry occurs with the two enzymes is still not resolved.
The S. aureus plasmid pI258 arsenate reductase shows phosphatase activity with the model substrate p-nitrophenyl phosphate. The pI258 ArsC had a kcat of 0.5 min−1, a very high Km of 146 mM, and an overall activity far below the range found with enzymatically characterized LMW-PTPases. The oxidized arsenate reductase and mutant Cys10Ser protein were inactive, whereas mutant proteins lacking any of the other three cysteines present in the sequence retained almost full PTPase activity. Arsenate was a competitive inhibitor of phosphatase activity, with a Ki very similar to the Km for arsenate reductase activity. By all appearances, pI258 arsenate reductase is a dual-function enzyme; and the question can be raised of which activity came first, that toward protein tyrosine-phosphates or that toward arsenate. Some of us favor the latter alternative and suggest that arsenate resistance arose early in cellular life, prior to regulation of cell division by phosphorylation/dephosphorylation [66,67]. Others of us have proposed that arsenate reductases evolved later, after the atmosphere became sufficiently oxidizing to allow formation of abundant arsenate.
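To put the kinetic constants quoted above in perspective, the following minimal Python sketch computes the catalytic efficiency kcat/Km from the two values given in the text and evaluates the standard Michaelis–Menten rate law with a competitive inhibitor. The function names are illustrative only, and because no numerical Ki is given here, the inhibitor concentration and Ki in the usage lines are hypothetical placeholders, not measured values.

```python
def catalytic_efficiency(kcat_per_min: float, km_molar: float) -> float:
    """Return kcat/Km in M^-1 s^-1 from kcat in min^-1 and Km in M."""
    return (kcat_per_min / 60.0) / km_molar


def turnover_rate(kcat: float, km: float, s: float,
                  i: float = 0.0, ki: float = float("inf")) -> float:
    """Michaelis-Menten turnover per enzyme molecule (same time unit as kcat).

    A competitive inhibitor raises the apparent Km by the factor (1 + I/Ki)
    and leaves kcat unchanged: v/[E] = kcat*S / (Km*(1 + I/Ki) + S).
    """
    km_apparent = km * (1.0 + i / ki)
    return kcat * s / (km_apparent + s)


# Values quoted in the text for the pI258 ArsC phosphatase activity.
KCAT_PER_MIN = 0.5   # min^-1
KM_MOLAR = 0.146     # 146 mM expressed in M

efficiency = catalytic_efficiency(KCAT_PER_MIN, KM_MOLAR)
print(f"kcat/Km = {efficiency:.3f} M^-1 s^-1")

# At S = Km with no inhibitor the enzyme runs at half of kcat.
print(turnover_rate(KCAT_PER_MIN, KM_MOLAR, s=KM_MOLAR))

# Hypothetical inhibitor level I = Ki (placeholder numbers, not from the text):
# the apparent Km doubles, so the rate at S = Km drops to kcat/3.
print(turnover_rate(KCAT_PER_MIN, KM_MOLAR, s=KM_MOLAR, i=1.0, ki=1.0))
```

The resulting kcat/Km is well below 1 M−1 s−1, which is consistent with the statement that the phosphatase activity lies far below the range of characterized LMW-PTPases.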
Based on the structure of pI258 arsenate reductase and the relationship to protein tyrosine phosphatases, Zegers et al. proposed a detailed catalytic mechanism (Fig. 6B) with interesting similarities and differences to that for the R773 arsenate reductase (Fig. 6A). As with the R773 ArsC, the active site Cys10 of the pI258 enzyme nucleophilically attacks the arsenate, with the leaving oxygen protonated from a closely located dicarboxylic acid residue, Asp105. Arsenate is thought to be reduced to arsenite by transfer of electrons from Cys10, which then forms a disulfide bond with neighboring Cys82. Cys89 then attacks Cys82, reforming the Cys10 thiolate. This stage involves the major conformational change in the enzyme. The major difference between the Trx and Grx/GSH classes of arsenate reductases is that, in the R773 ArsC, GSH plays a role similar to that of residue Cys82 in the pI258 enzyme (Fig. 6A).
A second difference in the catalytic reaction pathways in Fig. 6A,B is the novel Cys12–S–As–SG intermediate proposed for R773 ArsC, which is similar to a bridged S–As–S adduct previously seen with As-trypanothione. In contrast, the Trx arsenate reductases are proposed to release arsenite and form a Cys10–S–S–Cys82 intermediate (Fig. 5D). An oxidized cystine intermediate was identified by mass spectrometry with a Cys89Ser mutant enzyme (that can only carry out a single reduction step because the active Cys10–SH cannot be recycled). Further molecular genetic and biochemical studies with both classes of ArsC enzyme will eventually clarify these differences in proposed intermediates (Fig. 6).
A third X-ray crystal structure for arsenate reductase was reported, this time for the enzyme encoded by the SKIN element in the chromosome of B. subtilis. The Bacillus enzyme belongs to the Trx-linked family (Fig. 4) (with 65% amino acid identity to ArsC of pI258). The proposed structures of reduced Bacillus and pI258 arsenate reductase are very similar (compare Fig. 5B and C), with only minor differences. As with the pI258 arsenate reductase, Asp105 of the Bacillus enzyme is positioned to coordinate with an arsenate bonded to Cys10. Arg16 is thought to stabilize the active site and to lower the pKa values of Cys10, Cys82 and Cys89 to facilitate the reaction sequence. The Bacillus arsenate reductase also showed phosphatase activity with p-nitrophenyl phosphate, and basically the same reaction pathway (Fig. 6B) was proposed. While the Bacillus arsC gene has been characterized in vivo, the enzyme had not previously been studied in vitro, unlike the S. aureus plasmid pI258 enzyme.
3.3 The Saccharomyces family, a third clade
The S. cerevisiae Arr2p is the only currently identified eukaryotic arsenate reductase, known to convert arsenate to arsenite [43,44]. It is unrelated to the bacterial arsenate reductases in primary sequence and appears to represent a third convergent evolutionary clade (Fig. 4). Unlike the bacterial ArsC reductases, purified Arr2p is a homodimer of two 130-residue monomers. Like the R773 arsenate reductase, the yeast Arr2p arsenate reductase uses GSH and S. cerevisiae Grx1 (or any of the three E. coli Grxs) for recycling. Moreover, Grx in which the C-distal cysteine of the CXXC consensus sequence was eliminated by mutation still couples to arsenate reduction, suggesting a catalytic pathway similar to that of R773 ArsC (Fig. 5A). Arr2p is unable to use Trx as a substitute for Grx. The yeast ARR2 gene complements an E. coli strain with a deletion of the chromosomal arsC gene.
Arr2p shows significant sequence homology in the active site region to large protein tyrosine phosphatases (PTPases) such as the human cell cycle Cdc25A (Fig. 4C). However, the Cdc25A PTPases are unrelated to the LMW-PTPases that are paralogs of the Trx clade of arsenate reductases (Fig. 4B). Arr2p shares a consensus HC(X)5R motif with Cdc25A that is part of the active site of Cdc25A [57,66]. In Arr2p, this His75C76(X)5R82 motif is also involved in catalysis, since mutations with Cys76 or Arg82 substitutions (but not a His75Ala mutational change) eliminate both arsenate resistance in vivo and arsenate reduction in vitro. In the PTPases, the conserved cysteine is a catalytic residue, and the conserved arginine stabilizes the thiol phosphate intermediate. The initial step in both enzymatic reactions is likely to utilize similar oxyanion binding sites. In both enzymes, attack by the thiolate of the conserved cysteine residue, Cys76 in Arr2p or Cys430 in Cdc25A, results in the formation of a proposed Cys–S–As intermediate or a Cys–S–P intermediate, respectively.
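The HC(X)5R consensus described above (His, Cys, any five residues, then Arg) is the kind of pattern that can be searched for in a protein sequence with a regular expression. The Python sketch below is one illustrative way to do so; the function name and the short test sequence are invented for the example, and the sequence is not that of Arr2p or Cdc25A.

```python
import re

# HC(X)5R: His, Cys, five residues of any kind, then Arg.
# The lookahead (?=...) lets overlapping occurrences be reported as well.
HCX5R = re.compile(r"(?=HC[A-Z]{5}R)")


def find_hcx5r(sequence: str) -> list[int]:
    """Return the 1-based position of the His in every HC(X)5R match."""
    seq = sequence.upper()
    return [match.start() + 1 for match in HCX5R.finditer(seq)]


# Made-up toy sequence for illustration only (not a real protein).
toy_sequence = "MKTAHCGGSAVRLL"
print(find_hcx5r(toy_sequence))   # His at position 5, Arg at position 12

# Only three residues between Cys and Arg, so the motif is absent.
print(find_hcx5r("HCAAAR"))
```

With this numbering, a match whose His sits at position n places the Cys at n+1 and the Arg at n+7, matching the His75, Cys76 and Arg82 spacing given for Arr2p.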
Considering that the yeast Arr2p and the S. aureus ArsC are homologs of PTPases [66,67], it seems possible that all three families of arsenate reductases share ancestors with present-day phosphatases, retaining remnants of the ancestral activity and active site. The question remains of which came first in evolution, arsenate reductase or protein tyrosine phosphatase activity. To date the yeast Arr2p arsenate reductase is the sole such enzyme characterized in eukaryotes. Even at the genome level, the only homologous gene product is from another currently uncharacterized S. cerevisiae open reading frame (Fig. 4). An ARR2 gene has not been found in the genome of the fission yeast S. pombe or in other fungal genomes. Clearly, more examples are needed to understand the evolutionary origin and range of the Arr2p arsenate reductase. Nevertheless, the data indicate that the evolution of arsenate reductases into PTPases (or the reverse; or the third alternative, evolution of both from an ancestral enzyme with a still different activity, such as a sulfotransferase) apparently occurred more than once. We favor the hypothesis that the ancestors of these families including ArsC and ArsB arose more than once, early after the origin of life on Earth. Since arsenic was already present in the early environment, at least one branch of each of these families quickly evolved into arsenate reductases, and another branch evolved into PTPases when phosphorylation/dephosphorylation regulatory cascades came into existence. The protein phosphatases including LMW-PTPases are broadly represented in recent releases of genomes of prokaryotes, both Archaea and Bacteria. However, the Cdc25 phosphatases appear to be limited to eukaryotes and therefore might be a later evolutionary invention, with yeast arsenate reductase and protein phosphatase paralogs (Fig. 4).
Interestingly, while purified wild-type Arr2p does not catalyze p-nitrophenyl phosphate hydrolysis, mutagenesis of residues in the active site has produced an enzyme that has gained phosphatase activity while losing reductase activity (R. Mukhopadhyay and B.P. Rosen, unpublished). The mutated ARR2 gene no longer confers arsenate resistance but complements a phosphatase-deficient yeast strain. From the ease of mutational engineering of Arr2p, it would seem that protein phosphatases could easily have evolved from arsenate reductases.
In addition to the cytoplasmic ArsC arsenate reductases of ars resistance operons, there are also periplasmic arsenate reductases that are components of respiratory electron transport chains in which arsenate is the terminal electron acceptor [71–73]. Arsenate serves as an anaerobic alternative to oxygen. The respiratory-linked reductases are poorly characterized at the protein and gene levels and will not be described further, except in the context of a hypothesized relationship to arsenite oxidase, which functions in the opposite direction (see below).
4 Arsenite oxidation, microbial and enzymatic
Oxidation of As(III) represents a potential detoxification process that allows microorganisms to tolerate higher levels of arsenite. Several examples of bacterial oxidation of arsenite to arsenate were reported as early as 1918 (reviewed in [21,28,31,74]). Anderson et al. purified and characterized arsenite oxidase from the Alcaligenes faecalis strain of Legge and Turner. This enzyme is located on the outer surface of the inner membrane and exhibits arsenite oxidation activity in the presence of azurin or cytochrome c as electron acceptor. The purified protein was initially thought to be a Mo-pterin monomer containing several metal centers, including both a [4Fe–4S] HiPIP (high potential iron protein) and a Rieske-type [2Fe–2S] center [75,77]. The Mo-pterin cofactor is released on denaturation, like that from other molybdenum proteins [78,79]. The crystal structure of arsenite oxidase was recently solved by X-ray diffraction analysis and the genes for the two structural subunit polypeptides sequenced (L.T. Phung, unpublished), providing a detailed picture of both structure and reaction pathway (Fig. 7) analogous to those described above for the unrelated cytoplasmic arsenate reductase.
The structure of arsenite oxidase shows two subunits, a larger 88-kDa polypeptide containing the Mo-pterin and a HiPIP 3Fe–4S center and a smaller 14-kDa subunit with the Rieske 2Fe–2S center (Fig. 7) [77,80]. The Mo-pterin cofactor contains two pterins, oriented relative to one another up and down in the protein, similar to other Mo-pterin cofactors in the large dimethyl sulfoxide (DMSO) reductase family of proteins . This is a diverse superfamily of evolutionarily related enzymes that vary in substrates, midpoint electric potential and direction, so that some function physiologically as oxidases (as does arsenite oxidase), while others function as reductases such as DMSO reductase and probably the respiratory arsenate reductase studied by Krafft and Macy  (which is a Mo protein with small and large subunits, both containing Fe–S centers). Arsenite oxidase is closest in sequence to the subfamilies nitrate reductase and formate dehydrogenase, but there are only about 20% identical amino acids in each pairing . The large subunit structure of arsenite oxidase has a 3Fe–4S center, which is consistent with the Ser99 in arsenite oxidase in the position where a Cys residue would be needed to bind to a 4Fe–4S center . A flat funnel-shaped cleft on the large subunit structure allows As(OH)3 to enter (possibly coordinated by residues His195, Glu203, Arg419 and His423) and after oxidation allows HAsO42− to exit the protein in the reverse direction [77,80]. Arsenite is proposed to bond closely to the Mo(VI) of the oxidized cofactor, allowing a direct nucleophilic attack and transfer of two electrons. The remainder of the proposed steps shown in Fig. 7 involve oxidation of Mo(IV) back to Mo(VI) with electron transfer initially to the 3Fe–4S HiPIP center that is approximately 14 Å distant from the Mo atom (necessitating an electron pathway with several intermediates). 
The HiPIP center would be reoxidized by transfer of two electrons to the 2Fe–2S center on the small subunit (again the distance between the two centers requires intermediates), which is reoxidized by transfer of electrons to a component of the oxygen-consuming respiratory chain, either azurin or cytochrome c [75,77]. Useful energy is not generated by this respiratory process that functions to detoxify arsenite. The cells cannot grow with arsenite as an energy source.
The DNA sequence (L.T. Phung, unpublished) was obtained with an initial PCR product using oligonucleotide primers designed from the then unpublished primary sequence of Ellis et al. The amino acid sequence translated from the large asoA gene contains only a single N-terminal residue (methionine) that was not resolved in the crystal structure. Twelve nucleotides before the start of asoA, the asoB gene for the small Rieske subunit ends. The gene for the small subunit potentially encodes 42 amino acids at its N-terminus that are not found in the crystal structure. These include a canonical signal sequence for protein export by the Tat (twin-arginine translocator) pathway [82,83] rather than the Sec (secretion of polypeptides) pathway. Tat-translocated periplasmic proteins are exported already folded in final tertiary form with bound cofactors (unlike the Sec system, which exports unfolded polypeptides that then fold outside the cell membrane). The hypothesis from the preliminary gene sequence is then that a Tat signal sequence of the small subunit suffices for Tat export of the heterodimer, consisting of the large AsoA and small AsoB subunits, with inserted Mo-pterin and Fe–S clusters that were synthesized, assembled and inserted in the cytoplasm. This follows from the primary sequence of the unprocessed protein translated from the asoAB genes (L.T. Phung, unpublished) plus extensive studies on other members of the DMSO reductase family. We hypothesize that recently discovered respiratory arsenate reductases of anaerobic bacteria [71,73,84] will be recognized as a new branch of the large DMSO reductase family once primary sequences and structures become available.
The primary amino acid sequences from the DNA sequence and the electron density map of arsenite oxidase differ in a surprising 9% of positions, 78 in AsoA and 12 in AsoB. Those differences available when the article by Ellis et al. was prepared (56 in the center of AsoA and not including the N- and C-termini) were corrected by Ellis et al. from the DNA sequence (L.T. Phung, unpublished). No residues implicated in the reaction pathway or binding of cofactors differ. Thus, the Cys22–X2–Cys25–X3–Cys28–X70–Ser99 positions involved in binding the 3Fe–4S cluster are in both sequences, as are Cys60, His62, Cys78 and His81 binding the Rieske 2Fe–2S cluster in the small subunit. G. Anderson (personal communication) subsequently identified by direct sequencing six additional N-terminal amino acid residues in the Rieske subunit that are not resolved in the crystal structure but found in the gene translation product. In sum, crystallography and gene sequencing have now made the arsenite oxidase system ready for detailed molecular genetic analysis.
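The comparison described above, counting positions at which the DNA-derived sequence and the sequence read from the electron density map disagree, can be sketched in a few lines of Python. This is only an illustration of the bookkeeping, assuming two sequences that have already been aligned to equal length with '-' marking gaps; the function names and the toy sequences are hypothetical and are not the AsoA or AsoB sequences.

```python
def count_differences(seq_a: str, seq_b: str) -> tuple[int, int]:
    """Return (differing positions, compared positions) for two aligned sequences.

    Positions where either sequence has a gap ('-') are skipped, as would be
    the case for residues not resolved in a crystal structure.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    differences = 0
    compared = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue
        compared += 1
        if a != b:
            differences += 1
    return differences, compared


def percent_difference(seq_a: str, seq_b: str) -> float:
    """Percentage of compared positions at which the two sequences differ."""
    differences, compared = count_differences(seq_a, seq_b)
    if compared == 0:
        raise ValueError("no positions could be compared")
    return 100.0 * differences / compared


# Toy sequences for illustration only (not real arsenite oxidase data).
from_dna = "MKTAYIAKQR"
from_map = "-KTAHIAKQR"   # first residue unresolved, one residue misassigned
print(count_differences(from_dna, from_map))
print(percent_difference(from_dna, from_map))
```

Skipping gapped positions mirrors the situation in the text, where terminal residues absent from the crystal structure are reported separately from residues that were assigned differently.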
Work on arsenic resistance in our laboratories has been supported by grants from the U.S. National Institutes of Health (NIH Grant GM52216 to B.P.R.) and Department of Energy (to S.S.). Colleagues have freely exchanged information and ideas over the years, especially Gretchen Anderson, Carlos Cervantes, Russ Hille, Guangyung Ji, José C. Martins, Alan McEwan, Joris Messens and Xiao-Dong Su.