Thiamine triphosphatase and the CYTH superfamily of proteins

  • Contribution to the minireview series: ‘Thiamine-dependent enzymes: new perspectives from the long-standing development on the interface of chemistry and biology’ (Coordinated by V. Bunik)

Abstract

The CYTH superfamily of proteins was named after its two founding members, the CyaB adenylyl cyclase from Aeromonas hydrophila, and the human 25-kDa thiamine triphosphatase (ThTPase). Members of this superfamily of proteins exist in all organisms, including bacteria, archaeons, fungi, plants, and animals (except birds), and can be traced back to the last universal common ancestor. Their sequences include several charged residues involved in divalent cation and triphosphate binding. Indeed, all members of the CYTH superfamily that have been characterized act on triphosphorylated substrates and require at least one divalent metal cation for catalysis. In most cases, the enzyme–substrate complex adopts a tunnel-like (β-barrel) conformation. The Nitrosomonas europaea, Escherichia coli and Arabidopsis thaliana CYTH proteins are specific inorganic tripolyphosphatases. We propose that inorganic tripolyphosphate, the simplest triphosphate compound, is the primitive substrate of CYTH proteins, other enzyme activities, such as adenylate cyclase (in A. hydrophila and Yersinia pestis), mRNA triphosphatase (in fungi and protozoans), and ThTPase (in metazoans), being secondary acquisitions. ThTPase activity is not limited to mammals, as sea anemone and zebrafish CYTH proteins are specific ThTPases. The acquisition of this enzyme activity is linked to the presence of a tryptophan involved in the binding of the thiazolium heterocycle of the thiamine molecule. Furthermore, we propose a conserved catalytic mechanism between a bacterial inorganic tripolyphosphatase and metazoan ThTPases, based on a catalytic dyad comprising a lysine and a tyrosine, explaining the alkaline pH optimum of these enzymes.

Abbreviations
AC, adenylyl cyclase
AC2, CyaB-like adenylyl cyclase from Aeromonas hydrophila
AtTTM3, tripolyphosphatase from Arabidopsis thaliana
CthTTM, tripolyphosphatase from Clostridium thermocellum
Gppp, guanosine 5′-tetraphosphate
hThTPase, human 25-kDa thiamine triphosphatase
LUCA, last universal common ancestor
m7G, 7-methylguanylate
mThTPase, mouse thiamine triphosphatase
NeuTTM, tripolyphosphatase from Nitrosomonas europaea
PDB, Protein Data Bank
PPPi, inorganic tripolyphosphate
ThDP, thiamine diphosphate
ThTP, thiamine triphosphate
ThTPase, thiamine triphosphatase
TTM, triphosphate tunnel metalloenzyme

Introduction

During the last two decades, the systematic sequencing of a large number of bacterial, archaeal and eukaryotic genomes has greatly facilitated analysis of the protein families involved in the metabolism of phosphoryl compounds, i.e. organic phosphates and polyphosphates [1-3]. By applying powerful bioinformatic methods for database screening, it became possible to identify a large number of proteins belonging to a given superfamily of domains. However, the exact biological role(s) of many of the hypothetical proteins identified as members of a family remain ill-defined. One such instance is the so-called CYTH superfamily of domains [4].

In animal tissues, the only known biochemical function of CYTH enzymes is the hydrolysis of thiamine triphosphate (ThTP). In contrast to thiamine diphosphate (ThDP), ThTP is not a coenzyme, and is generally a minor thiamine derivative (< 1% of total thiamine). However, it has been found in all organisms investigated to date, from bacteria to mammals [5]. In Escherichia coli, ThTP could be a signaling molecule that is transiently produced in response to amino acid starvation [6]. In vertebrate tissues, ThTP can phosphorylate certain proteins, and might be part of a new cellular signaling pathway [7, 8]. In E. coli [9] and in rat brain mitochondria [10], ThTP is synthesized by a chemiosmotic mechanism similar to oxidative phosphorylation: indeed, the mechanism requires a proton-motive force created by the respiratory chain, and ThTP synthesis from ThDP + inorganic phosphate is catalyzed by FoF1-ATP synthase [9]. It remains to be clarified how and why, under special circumstances, FoF1 will catalyze the phosphorylation of ThDP rather than its normal substrate ADP. In rat brain, ThTP is mainly synthesized in mitochondria [10], and is released into the cytosol, where it can be hydrolyzed by the soluble 25-kDa ThTPase. This enzyme is highly specific for ThTP, and its most obvious function would be to keep ThTP concentrations low (< 1 μM) in most mammalian cells [11].

The soluble 25-kDa ThTPase (EC 3.6.1.28) was discovered by Hashitani and Cooper in 1972 [12]. It is a Mg2+-dependent enzyme (inhibited by Ca2+) with an alkaline pH optimum, and it seems to be present in all mammalian tissues. However, no soluble specific ThTPase activity has been detected so far in other animal tissues. This enzyme was purified from bovine brain [13], and its sequencing and molecular characterization were reported in 2002 [14]. The sequence had no homology with any known mammalian protein, but shortly thereafter, Iyer and Aravind [4] pointed out that the catalytic domains of human 25-kDa ThTPase (hThTPase) [14] and CyaB-like adenylyl cyclase (AC; EC 4.6.1.1) from Aeromonas hydrophila (AC2 [15]) define a novel superfamily of domains that, according to these authors, should bind ‘organic phosphates’. This superfamily was therefore called ‘CYTH’ (CyaB-thiamine triphosphatase), and the presence of orthologs was demonstrated in all three superkingdoms of life. This suggested that CYTH is an ancient enzymatic domain, and that a representative must have been present in the last universal common ancestor (LUCA) of all extant life forms. It was proposed [4] that this enzymatic domain might play a central role at the interface between nucleotide and polyphosphate metabolism, but this role remained largely undefined: in fact, the two experimentally characterized members of the superfamily, AC2 and mammalian ThTPase, are likely to perform secondarily acquired activities. Plausibly, more ancient (prokaryotic) CYTH enzymes have more fundamental roles in organic and/or inorganic polyphosphate metabolism [4]. Using multiple alignments and secondary structure predictions, Iyer and Aravind showed that the catalytic core of CYTH enzymes contained a novel α + β scaffold with six conserved acidic residues and four basic residues. At least four of the acidic residues (generally glutamates) are likely to chelate two divalent cations that are required for catalysis, as is the case for nucleotide cyclases, polymerases, and some phosphohydrolases [3, 4, 16].

Although representatives of the CYTH superfamily were found in bacteria, archaeons, plants, and animals, Iyer and Aravind [4] found no orthologs in fungi and protozoa. Yet, in 2006, Gong et al. [17] noticed a striking similarity between recently deposited structures of bacterial and archaeal CYTH proteins (of unknown functions) and the known crystal structure of Cet1 RNA triphosphatase (EC 3.1.3.33) from Saccharomyces cerevisiae. It had been shown previously [18] that this yeast enzyme has a novel active site fold whereby an eight-strand β-barrel forms a topologically closed tunnel. Comparing Cet1 with the CYTH ortholog from the archaeon Pyrococcus furiosus [conserved hypothetical protein Pfu-838710-001; Protein Data Bank (PDB) 1YEM], Gong et al. [17] found that the structural similarity was striking (closed tunnel structure), although the primary structure conservation was low (14% identity in the DALI alignment). Nevertheless, a manual alignment revealed that the Pfu-838710-001 sequence recapitulated almost perfectly the eight-strand β-barrel of the triphosphate tunnel of Cet1. It was therefore proposed [17] that this tunnel fold was the prototype of a larger superfamily, including the CYTH branch, characterized by a β-barrel tunnel structure. This family would include enzymes able to bind triphosphorylated substrates such as ATP and ThTP and requiring divalent metal ions as activators. They were thus called triphosphate tunnel metalloenzymes (TTMs). It is not established, however, that all members of the CYTH superfamily actually have the β-barrel tunnel conformation. We therefore prefer to consider a larger CYTH superfamily (based on the N-terminal signature motif EXEXK and several other conserved charged residues) (Figs 1 and S1), including RNA triphosphatases from unicellular eukaryotes, but without assuming that all of these proteins necessarily have a tunnel-like conformation.
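As an illustration of the N-terminal EXEXK signature mentioned above, the following minimal Python sketch scans a one-letter protein sequence for the pattern Glu-X-Glu-X-Lys. The function name and the short test sequence are hypothetical and serve only as an example; an actual database search would rely on profile-based methods rather than on a single short pattern.

```python
import re

# EXEXK: Glu, any residue, Glu, any residue, Lys (one-letter codes).
# A lookahead is used so that overlapping occurrences are also reported.
CYTH_SIGNATURE = re.compile(r"(?=(E.E.K))")

def find_exexk(sequence):
    """Return (1-based start position, matched pentapeptide) for each EXEXK hit."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in CYTH_SIGNATURE.finditer(seq)]

# Short illustrative sequence (an assumption made for this sketch).
example = "MAQGLIEVERKFLPG"
for start, motif in find_exexk(example):
    print(f"EXEXK motif {motif} starting at residue {start}")
```

A hit only indicates a candidate; the other conserved charged residues would still have to be checked in an alignment.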

Figure 1.

Alignment of sequences from typical representatives of the CYTH superfamily of proteins. β-Sheets (blue boxes) and α-helices (red boxes) are indicated for hThTPase. For Cet1 from S. cerevisiae, only a truncated sequence (corresponding to the RNA triphosphatase activity) is represented, so that the residue numbered 1 corresponds to Phe303 in the whole Cet1 sequence. For H. sapiens, M. musculus, D. rerio, and Ne. vectensis, exon 2 begins at residues corresponding to Gly183. The alignment was performed with CLC sequence viewer, Version 6.8.2 (CLC bio A/S).

General features of CYTH proteins

Primary structure

In the alignments originally made by Iyer and Aravind [4], the CyaB AC from A. hydrophila was used to initiate the PSI-BLAST search. This small globular protein, also referred to as class IV AC, is unrelated to other, well-characterized bacterial and eukaryotic ACs [15]. A closely related ortholog was recently described in Yersinia pestis [19, 20] (see below).

On the basis of the CyaB sequence, Iyer and Aravind found many orthologs in various phyla, notably bacteria, archaeons, plants, and animals. The alignments based on the CyaB sequence more or less spanned the entire length of a given CYTH protein, typically containing the same set of acidic and basic residues that are supposed to be important for enzyme function. In many cases, the region of similarity to CyaB comprised the entire length of the target protein detected, although, in some instances, it comprised only a part of the protein, the rest being composed of other globular domains.

As mentioned above, the second founding member of the CYTH superfamily is hThTPase. This is also a small globular protein, with several conserved acidic and basic residues homologous to those found in the CyaB sequence. As we recently undertook a detailed study of the structure and catalytic mechanism of this enzyme [21-23], we will use the hThTPase sequence (rather than CyaB) as the basis for numbering of the conserved residues that appear to be important for substrate and cofactor binding as well as for catalysis.

Alignments of typical CYTH sequences (from all superkingdoms of life) are shown in Fig. 1 (see also Fig. S1). In most cases, the catalytic core of the CYTH enzymes contains five or six conserved acidic residues and four conserved basic residues. Two pairs of glutamates (Glu7/Glu9 and Glu157/Glu159 in hThTPase) are believed to bind magnesium ions that are required as activators. Three conserved arginines (Arg55, Arg57, and Arg125) are supposed to bind the triphosphate part of the substrate (in fact, for all CYTH enzymes biochemically characterized so far, the substrate is a triphosphorylated compound). A relatively well-conserved lysine (Lys65 in hThTPase) may be directly involved in catalysis in some cases [23, 24].
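Because conserved residues are numbered here according to a reference sequence (hThTPase), it can be useful to translate a reference position into the equivalent position of another aligned sequence. The sketch below shows one way to do this for a gapped pairwise alignment. The function name and the toy alignment are hypothetical and are not taken from Fig. 1.

```python
def equivalent_residue(aligned_ref, aligned_other, ref_position, gap="-"):
    """Find the residue of `aligned_other` that sits in the same alignment
    column as residue number `ref_position` (1-based) of `aligned_ref`.

    Returns (one-letter residue, 1-based position in the other sequence),
    or None if the other sequence has a gap there or the position is absent.
    """
    if len(aligned_ref) != len(aligned_other):
        raise ValueError("aligned sequences must have the same length")
    ref_count = 0
    other_count = 0
    for ref_char, other_char in zip(aligned_ref, aligned_other):
        if other_char != gap:
            other_count += 1  # residue numbering of the other sequence
        if ref_char != gap:
            ref_count += 1    # residue numbering of the reference
            if ref_count == ref_position:
                if other_char == gap:
                    return None
                return other_char, other_count
    return None

# Toy alignment (hypothetical sequences, for illustration only).
reference = "MA-EVEK"
homolog = "MKLEIEK"
print(equivalent_residue(reference, homolog, 3))
```

In this toy case the third reference residue (E) is aligned with the fourth residue of the second sequence, which is how equivalences such as Lys52 in NeuTTM and Lys65 in ThTPase are read from an alignment.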

Three-dimensional structure

Since the first molecular characterization of the Cet1 RNA triphosphatase from S. cerevisiae [18], the three-dimensional structures of eight other CYTH proteins have been reported (Fig. 2). In general, the active site appears to be situated within a closed tunnel composed of antiparallel β-strands, with charged residues pointing into the hydrophilic cavity. The tunnel structures are strikingly similar for seven of the nine proteins, but the Nitrosomonas europaea tripolyphosphatase (EC 3.6.1.25) and the mouse (Mus musculus) ThTPase (mThTPase) have an open β-barrel structure.

Figure 2.

Three-dimensional structures of nine representatives of the CYTH superfamily of proteins. The H. sapiens (PDB 3TVL) [23] and M. musculus (PDB 2JMU) [22] enzymes are specific ThTPases. The S. cerevisiae (PDB 1D8H) [18] and mimivirus of Ac. polyphaga (PDB 3BGY) enzymes are RNA triphosphatases [45]. The Y. pestis CYTH protein has AC activity (PDB 3N10) [19]. The N. europaea (PDB 3TYP) [24] and Ar. thaliana (PDB 3V85) [26] enzymes are specific tripolyphosphatases. No enzyme activities are known for the V. parahaemolyticus (PDB 2ACA) and the P. furiosus (PDB 1YEM) proteins. The N-terminus is shown in blue, and the C-terminus in red. All of the representations are taken from the PDB, and are X-ray crystal structures, except for the M. musculus protein, which is an NMR solution structure.

Thus, the closed tunnel conformation may not be a constant feature of CYTH proteins. It should also be pointed out that, although the crystal structure of hThTPase shows the typical closed tunnel structure, a study of mThTPase by NMR spectroscopy has revealed that the free protein in solution has an open cleft structure (Fig. 2). However, when the substrate ThTP was bound to the active site, a more closed tunnel-like conformation was observed [22]. This raises the possibility that closed tunnel-like conformations could correspond to enzyme–substrate complexes but not to free protein. It must be recalled that the proteins shown in Fig. 2 were crystallized as complexes with anions such as sulfate, a mimetic of the substrate phosphoryl group.

Catalytic properties

Only a few CYTH proteins have been functionally characterized so far, but the current evidence suggests that most of them are phosphohydrolases. It was originally proposed that the CYTH enzymatic domain would play a central role ‘at the interface between nucleotide and polyphosphate metabolism’ [4]. The idea of a fundamental role in nucleotide metabolism was largely derived from the fact that one of the founding members, CyaB from A. hydrophila, has AC activity [15]. Such an activity was also reported for the closely homologous CYTH protein from Y. pestis [20]. However, the AC activity of these proteins is weak under usual assay conditions, and its physiological relevance is doubtful (see below). On the other hand, the RNA triphosphatases from fungi and protozoa hydrolyze NTPs in the presence of Mn2+ or Co2+, but, in the presence of the more physiological Mg2+, the activity is very weak [25]. Two other well-characterized CYTH enzymes, apart from the mammalian 25-kDa ThTPase, are the tripolyphosphatases from N. europaea (NeuTTM [24]) and Arabidopsis thaliana (AtTTM3, [26]) [22, 23]. All three enzymes are very specific for their respective substrates, and have little NTPase activity. These findings suggest that CYTH enzymes do not play a central role in nucleotide metabolism. Rather, their common features appear to be the ability to bind triphosphate compounds and catalyze their chemical transformation (generally hydrolysis) in the presence of divalent metal activators. In general, the physiological activator appears to be Mg2+, whereas Ca2+ is inhibitory. Another feature of several CYTH enzymes is an alkaline pH optimum and a relatively high optimal temperature. Recent results [23, 24] have suggested that the same catalytic dyad is operative in NeuTTM and mammalian ThTPases. However, it is unlikely that the same mechanism is present in all CYTH enzymes (see next section). In conclusion, the CYTH domain appears to be a versatile fold, both structurally and functionally. Probably, nature has exploited this β-barrel structure for various purposes in many taxa.

Structure and function of biochemically characterized CYTH enzymes

So far, only a few CYTH enzymes have been biochemically characterized, and their physiological function is even less clear. Only fungal and protozoan RNA triphosphatases play a clear and indispensable role in RNA capping. Although CyaB from A. hydrophila has been shown to have AC activity, the latter activity is important only under rather extreme conditions (pH > 9, 50 °C), and the protein is not expressed under usual conditions of growth [15]. Concerning the bacterial tripolyphosphatase from N. europaea and the 25-kDa ThTPase from mammals, both enzymes hydrolyze their respective substrates with high specificity and catalytic efficiency, but the biological role of either tripolyphosphate (PPPi) or ThTP remains unclear. Interestingly, Ar. thaliana tripolyphosphatase seems to play a role in root development [26].

Class IV ACs

The first protein of this class was discovered in A. hydrophila [15], where the gene was designated cyaB and, as it was the second AC found in this organism, the protein was called AC2. AC activity was later also demonstrated in a closely related CyaB from Y. pestis [27]. The latter protein was designated YpAC-IV [20]. The recombinant YpAC-IV protein was crystallized and characterized in detail [19, 20]. The crystallized protein forms a dimer of two identical subunits (molecular mass of 20.5 kDa). It folds into an antiparallel eight-strand barrel, with nearly identical topology to Cet1 RNA triphosphatase from S. cerevisiae. However, the sequence conservation is limited to a few active site residues. In both cases, the β-barrel structure forms a topologically closed tunnel. Several conserved charged residues (Glu10, Glu12, Lys14, Arg63, Lys76, Arg113, and Glu136) lie inside the barrel core and form the likely binding sites for the phosphoryl part of the substrate (ATP or GTP) and the divalent metal activator (Mg2+ or Mn2+). These residues correspond to Glu7, Glu9, Lys11, Arg57, Lys65, Arg125 and Glu157 in hThTPase. This is in line with the fact that both enzymes bind a triphosphorylated substrate and a divalent metal activator, but, obviously, the chemistry involved in catalysis must be different (CyaB proteins have no phosphohydrolase activity).

As mentioned above, CyaB proteins do not appear to play an essential role in cAMP production in bacteria. In A. hydrophila, the physiologically important AC is AC1 (encoded by the cyaA gene), which belongs to the previously described class I ACs [15]. In contrast to cyaA, cyaB is not expressed under usual growth conditions. Moreover, the unusual properties of CyaB ACs (optimum activity at pH 9.5 and 65 °C) suggest that they could hardly play a physiological role under usual conditions of growth of this organism. Interestingly, Sismeiro et al. [15] noted a significant similarity between AC2 and the gene products of three hyperthermophilic archaeons. The structural similarity between a CyaB and a protein of unknown function from P. furiosus is also striking [17]. Therefore, horizontal gene transfer between hyperthermophilic archaeons and bacteria such as A. hydrophila and Y. pestis seems to be possible.

Tripolyphosphatases from bacteria and plants

In bacteria, the first characterized CYTH protein was CyaB from A. hydrophila, which was found to have AC activity [15]. For that reason, bacterial and archaeal CYTH proteins were annotated as hypothetical (CyaB-like) ACs, but it was not clear whether bacterial CYTH proteins other than those of A. hydrophila and Y. pestis had the same enzymatic activity. Therefore, Keppetipola et al. (2007) attempted to functionally characterize the CYTH protein from Clostridium thermocellum [28]. Unexpectedly, they found that the recombinant protein was completely devoid of nucleotidyl cyclase activity, but was a phosphohydrolase. NTPs were hydrolyzed in the presence of Mn2+ or Co2+, but not in the presence of the physiological Mg2+ cation. This is a common feature of fungal RNA triphosphatases. Most surprisingly, the Clostridium enzyme (CthTTM) was 150-fold more active in cleaving PPPi than ATP. This is in sharp contrast to Cet1 RNA triphosphatase, which does not hydrolyze PPPi (but the latter is a very potent competitive inhibitor of Cet1 ATPase). More recently, Jain and Shuman [29] demonstrated that, in the presence of Mg2+, CthTTM was able to hydrolyze not only PPPi but also guanosine 5′-tetraphosphate (Gppp). Interestingly, CthTTM was ~ 2000-fold more active on Gppp than on GTP, suggesting that the nucleoside part of NTP substrates causes hindrance of catalysis if it is positioned too close to the scissile phosphoanhydride bond. Therefore, Jain and Shuman (2008) considered the possibility that CthTTM (and possibly other TTMs) might hydrolyze long-chain polyphosphates [29]. These ubiquitous macromolecules play important roles in bacterial physiology [30]. CthTTM was indeed found to hydrolyze polyphosphates of various chain lengths, yielding Pi and pyrophosphate. However, the activity was low and required Mn2+ (≥ 10 mM) rather than Mg2+. The physiological relevance of this polyphosphatase activity is thus doubtful.

These intriguing properties of CthTTM prompted us to investigate the closely related NeuTTM, a small protein of known crystal structure (PDB 2FBL and 3TYP). In contrast to many CYTH proteins, which have a closed tunnel structure, NeuTTM has an open β-barrel structure, with a rather rigid (open cleft) cup shape [24]. This conformation seems to be stabilized by a C-terminal broken helix that is linked to the β1 strand by a hydrogen bond (Fig. 2). It is therefore unlikely that binding of a substrate could induce a tunnel-like conformation, as is the case for 25-kDa mammalian ThTPase (see below) (PDB 3TVL and 2JMU).

Functionally, NeuTTM is a remarkably specific tripolyphosphatase. PPPi is hydrolyzed with high catalytic efficiency in the presence of Mg2+. Native MS data have shown that the enzyme binds PPPi (as well as the Mg–PPPi complex) with high affinity (Kd < 1.5 μM), whereas it has low affinity for ATP. There is no AC or polyphosphatase activity. NeuTTM has no ThTPase activity, but it shares some kinetic properties with mammalian 25-kDa ThTPase, e.g. heat stability, alkaline pH optimum, and inhibition by Ca2+ (by competition with Mg2+) and Zn2+ (at an allosteric site). Most remarkably, the catalytic mechanism for both enzymes appears to involve a common catalytic dyad (Lys52 and Tyr28 in NeuTTM, and the corresponding Lys65 and Tyr39 in mThTPase).

Finally, we investigated the CYTH enzyme from E. coli, called YgiF. The recombinant protein was also found to be a specific tripolyphosphatase [31], but the catalytic efficiency was lower than for NeuTTM. This may be linked to the lack of a lysine at the locus corresponding to Lys52 in NeuTTM (Figs 1 and S1).

Very recently, the first plant CYTH protein, AtTTM3, was characterized in Ar. thaliana [26]. Its crystal structure reveals the characteristic eight-strand β-barrel tunnel structure of CYTH proteins. This enzyme has a strong and relatively specific tripolyphosphatase activity. It does not hydrolyze high molecular mass polyphosphates. It also has weak dNTPase and NTPase activity, but does not have AC or RNA triphosphatase activity. ThTPase activity was not tested. Interestingly, it is highly expressed in the root proximal meristematic zone, and AtTTM3 knockout plants show an anomalous root phenotype, suggesting a role in root development.

Taken together, these results raise the question of the possible biological roles of both PPPi and the enzymes able to hydrolyze it. PPPi and other very short-chain polyphosphates have so far not been reported to exist in any organism, but this may simply reflect the present lack of sensitive and specific detection methods. Like cAMP or perhaps ThTP, PPPi might act as an intracellular signal. When it becomes possible to detect low cellular amounts of PPPi and other short-chain polyphosphates, the latter might turn out to be just as important as their long-chain counterparts in cell biology.

RNA triphosphatases from fungi, protozoa, and DNA viruses

mRNA processing plays a critical role in the expression of eukaryotic genes. The earliest modification event is the formation of the 7-methylguanylate (m7G) cap structure, which promotes translation initiation and protects mRNA from degradation by 5′-exoribonucleases. All eukaryote species, as well as many eukaryal viruses, share a three-step capping pathway in which: (a) an RNA triphosphatase removes the γ-phosphate of the primary transcript; (b) an RNA guanylyltransferase transfers GMP from GTP to the 5′-diphosphate RNA to form a GpppRNA cap; and (c) an mRNA (guanine-N7-)-methyltransferase adds a methyl group from adenosylmethionine to the cap guanine to form the m7GpppRNA structure [32].
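The three-step capping pathway described above can be summarized as a sequence of transformations of the RNA 5′ end. The following Python sketch is a purely schematic model in which the 5′ end is represented as a text label; the function names and the string notation are assumptions made for illustration and do not describe any enzyme kinetics or real software.

```python
def rna_triphosphatase(rna):
    """Step (a): remove the gamma-phosphate, pppRNA -> ppRNA."""
    if not rna.startswith("ppp"):
        raise ValueError("substrate must carry a 5'-triphosphate end")
    return rna[1:]

def guanylyltransferase(rna):
    """Step (b): transfer GMP (from GTP) to the 5'-diphosphate end, ppRNA -> GpppRNA."""
    if not rna.startswith("pp") or rna.startswith("ppp"):
        raise ValueError("substrate must carry a 5'-diphosphate end")
    return "Gp" + rna

def cap_methyltransferase(rna):
    """Step (c): methylate the cap guanine at N7, GpppRNA -> m7GpppRNA."""
    if not rna.startswith("Gppp"):
        raise ValueError("substrate must carry an unmethylated Gppp cap")
    return "m7" + rna

def cap_transcript(primary_transcript):
    """Apply the three steps in order to a primary transcript label."""
    diphosphate_rna = rna_triphosphatase(primary_transcript)
    capped_rna = guanylyltransferase(diphosphate_rna)
    return cap_methyltransferase(capped_rna)

print(cap_transcript("pppRNA"))
```

The ordering constraint is the point of the sketch: each step accepts only the product of the preceding one, so the triphosphatase reaction must occur before guanylylation.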

In contrast to RNA guanylyl transferases (which are structurally and mechanistically conserved among fungi, mammals, and DNA viruses), RNA triphosphatases are not conserved among eukaryotes, and fall into at least two mechanistically and structurally distinct families: (a) the divalent cation-dependent RNA triphosphatases of fungi, protozoa, and DNA viruses; and (b) the divalent cation-independent RNA triphosphatases of mammals and other metazoans. The latter enzymes show extensive amino acid sequence similarity with protein tyrosine phosphatases, and do not belong to the CYTH superfamily.

The yeast mRNA capping apparatus was first investigated by Mizumoto et al. [33], who characterized a preparation of yeast guanylyltransferase that consisted of two major polypeptides, a 52-kDa guanylyltransferase corresponding to Ceg1, and an 80-kDa RNA triphosphatase identified as Cet1 (capping enzyme RNA triphosphatase 1) [34]. Shortly thereafter, Ho et al. [35] characterized the recombinant Cet1 protein. It catalyzes the Mg2+-dependent hydrolysis of the γ-phosphate of triphosphate-terminated mRNA at a rate of 1 s−1. Monomeric 80-kDa Cet1 binds to recombinant Ceg1 in vitro to form a Cet1–Ceg1 heterodimer. This interaction elicits a > 10-fold stimulation of the guanylyltransferase activity of Ceg1. Deletion analysis showed that the N-terminal 200 amino acids of Cet1 are not essential for RNA triphosphatase activity or for binding to Ceg1. However, the 201–301 part of the sequence is required for binding to Ceg1, whereas the 301–549 part (which was later shown to be a CYTH domain) is responsible for the triphosphatase activity.

Purified recombinant Cet1 also has robust ATPase activity (Km = 2.8 μM; Vmax = 25 s−1) in the presence of Mn2+. Co2+ is also an effective activator but Mg2+ is not, casting doubt on the physiological relevance of NTPase activities. Nonetheless, the hydrolysis of NTPs to NDPs in the presence of Mn2+ [25] is considered by Shuman et al. to be the ‘signature function’ of TTMs, at least those with RNA triphosphatase activity.
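To put the kinetic constants quoted above into perspective, the following sketch evaluates the Michaelis–Menten equation, v = Vmax[S]/(Km + [S]), and the ratio kcat/Km. It assumes that simple Michaelis–Menten kinetics apply and that the reported Vmax, given in s−1, can be treated as a turnover number (kcat); both are assumptions of this example, and the function names are hypothetical.

```python
def michaelis_menten_rate(substrate, km, vmax):
    """Rate v = Vmax * [S] / (Km + [S]); substrate and km in the same units."""
    return vmax * substrate / (km + substrate)

def catalytic_efficiency(kcat, km):
    """kcat / Km, in M^-1 s^-1 when kcat is in s^-1 and km in M."""
    return kcat / km

# Values quoted in the text for the Mn2+-dependent ATPase activity of Cet1.
KM_ATP = 2.8e-6  # M (2.8 micromolar)
KCAT = 25.0      # s^-1 (reported as Vmax; treated here as a turnover number)

for atp in (1e-6, 2.8e-6, 1e-4):
    rate = michaelis_menten_rate(atp, KM_ATP, KCAT)
    print(f"[ATP] = {atp:.1e} M -> v = {rate:.2f} s^-1")

print(f"kcat/Km = {catalytic_efficiency(KCAT, KM_ATP):.2e} M^-1 s^-1")
```

With these values the rate is half-maximal at 2.8 μM ATP, and kcat/Km is of the order of 10^7 M−1 s−1.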

Shortly after the biochemical characterization of Cet1 RNA triphosphatase activity, Lima et al. reported the crystal structure of the truncated Cet1 (241–539) peptide [18]. The 2.05-Å crystal structure revealed, for the first time, an active site fold whereby an eight-strand β-barrel forms a topologically closed triphosphate tunnel. This was the prototype of a family of proteins that Shuman et al. later called TTMs [17].

Members of this family with well-characterized RNA triphosphatase activity include orthologs from S. cerevisiae [36], Schizosaccharomyces pombe [37], Candida albicans [38], Plasmodium falciparum [17, 39], Trypanosoma brucei [40, 41], and Giardia lamblia [42]. They also include the RNA triphosphatases of several DNA virus capping systems [43-45].

Mutational analyses in yeast and other TTM RNA triphosphatases have highlighted a common mechanism of metal-dependent NTP hydrolysis. Four acidic (glutamate) residues appear to constitute the metal-binding site(s), and at least two basic (arginine and lysine) residues appear to bind the 5′-triphosphate [46]. A more detailed mutational analysis of the active site tunnel of Cet1 RNA triphosphatase [36] revealed an intricate network of hydrogen bonds and electrostatic interactions within the cavity, most of which are required for catalytic activity. Bisaillon and Shuman proposed a one-step in-line mechanism whereby the metal ion plus the Arg393, Arg458 and Lys456 activate the γ-phosphate for attack by water and stabilize a pentacoordinate phosphorane transition state [36]. The water molecule situated posterior to the γ-phosphate would be poised to act as the attacking nucleophile, and Glu433, coordinating this water molecule, would serve as a general base catalyst. Although speculative, this mechanism for NTP hydrolysis is different from the mechanism proposed for NeuTTM [24] and the mammalian 25-kDa ThTPase [23], both involving a lysine/tyrosine catalytic dyad. Although Bisaillon and Shuman postulate that a lysine is important for catalysis, it is not homologous to Lys65 in hThTPase. On the other hand, the glutamate that is supposed to act as a general base is homologous to Glu81 in hThTPase. Although conserved in metazoan ThTPases, the latter residue does not appear to be involved in the mechanisms of ThTP hydrolysis [23].

In general, all recombinant CYTH proteins that have been crystallized so far were obtained as homodimers, but dimerization is not important for enzyme activity in solution. It should be pointed out, however, that the homodimeric quaternary structure is required for the in vivo function and thermal stability of RNA triphosphatases from S. cerevisiae and Sc. pombe [47].

ThTPases of metazoans

The primary structure of hThTPase (as well as those of orthologs found in the genomes of many mammals, Xenopus, zebrafish, amphioxus, sea urchin, and the sea anemone) includes residues conserved in many CYTH proteins, notably the glutamates involved in divalent metal cation binding (Glu7, Glu9, Glu157, and Glu159) and the arginines involved in triphosphate binding (Arg55, Arg57, and Arg125) (Figs 1 and S1).

Concerning the three-dimensional structure, the classical β-barrel is observed in ThTPases as in other CYTH proteins. A crystal structure of hThTPase, obtained in the presence of sulfate and citrate, was deposited in 2007 in the PDB (PDB code 3BHD) by the Structural Genomics Consortium (Toronto). The asymmetric unit of the crystal contained two molecules, and each of them had a closed tunnel-like structure, as in other TTMs. One year later, the solution structure of mThTPase was determined by NMR [22]. This revealed a more dynamic structure, with an open fold in the absence of substrate and a more closed structure in the presence of ThTP (Fig. 2). The protein was a monomer in solution. Its apparent affinity for ThTP (in the absence of metal ion) appeared to be remarkably high (Kdiss < 1 μM). The mThTPase has high catalytic efficiency and is very specific for ThTP [23].

Recent studies have shown that the catalytic mechanism of 25-kDa ThTPase relies on a lysine/tyrosine dyad (Lys65 and Tyr39), the two residues being homologous to the catalytic dyad found in NeuTTM [24]. Structural studies also allowed us to explain the specificity of ThTPases: by docking of the ThTP molecule in the active site of hThTPase [23], we found that Trp53 should interact with the thiazolium part of the substrate molecule, thus playing a key role in substrate recognition and specificity. Sea anemone and zebrafish CYTH proteins, which retain the corresponding tryptophans, were also found to be specific ThTPases, although their catalytic efficiency was lower than that of mammalian ThTPases. The specificity for ThTP thus appears to be linked to a cation–π or a charge transfer interaction between the thiazolium heterocycle of thiamine and a tryptophan. The latter probably plays a role in the secondary acquisition of ThTPase activity in early metazoan CYTH enzymes in the lineage leading from cnidarians to vertebrates.

The structure and the catalytic mechanism of human 25-kDa ThTPase are shown in Fig. 3. Note that the specific binding of the triphosphate part of the substrate involves the coordinated actions of an unusually large number of charged residues, many of them being conserved in CYTH proteins that are known to be phosphohydrolases.

Figure 3.

Structure and catalytic mechanism of hThTPase. (A) Representation of the three-dimensional structure of hThTPase with ThTP docked in its active site. Two protein β-strands (residues 5–12 and 78–82) participating in the closing of the tunnel are shown in green. Important ThTPase residues are shown as yellow sticks, and the ThTP molecule is shown as magenta sticks. Antiparallel β1 and β5 strands are indicated. (B) Close-up view of the active site with a PPPi molecule from the crystallographic structure superposed and shown as thin black lines. (C) Proposed catalytic mechanism of ThTPase. Adapted from [23].

It is interesting to note that, in thiamine pyrophosphokinase, which catalyzes the pyrophosphorylation of thiamine to ThDP, a tryptophan is also involved in thiamine binding, but in this case the indole ring of tryptophan interacts with the aminopyrimidine part of thiamine [48, 49]. On the other hand, enzymes that use ThDP as a cofactor form a very versatile class [50], but they share the same ThDP-binding fold. However, ThDP binding to decarboxylases does not seem to involve charge transfer interactions with an aromatic residue of the apoenzyme [51, 52].

Evolutionary significance

An attempt to reconstitute the phylogenetic tree of the CYTH superfamily is shown in Fig. 4. Such a tree is based on molecular phylogenetic analysis, assuming that the overwhelmingly dominant pattern of heredity is ‘vertical descent’, i.e. the passage of genes down through generations by replication processes within species. During the last 10 years, however, it has become manifest that ‘horizontal’ (or ‘lateral’) gene transfer, i.e. the exchange of genetic information across species, is far more pervasive than previously thought, especially in microorganisms [53].

Figure 4.

Reconstitution of the phylogenetic tree of the CYTH superfamily. Currently, four different enzyme activities have been identified for members of the CYTH superfamily: tripolyphosphatase (in C. thermocellum, N. europaea, E. coli, and Ar. thaliana), AC (in A. hydrophila and Y. pestis), RNA triphosphatase (in fungi and some protozoans), and ThTPase (in some metazoans, including vertebrates). We hypothesize that the original activity was the hydrolysis of low molecular mass polyphosphates. Adapted from [23].

In the case of the CYTH superfamily, some genes could indeed have been acquired through lateral transfer: for instance, cyaB from A. hydrophila might originate from hyperthermophilic bacteria and archaeons [15].

Nonetheless, we shall assume that most CYTH genes evolved through vertical descent. Evolutionary links in the CYTH superfamily were first identified on the basis of sequence similarity [4]. These data have revealed a fairly high continuity of molecular evolution, even through the transition from prokaryotes to eukaryotes. However, it is by no means clear that there is a corresponding continuity in the functional characteristics of these proteins. Like the majority of protein superfamilies, the CYTH superfamily shows variations in enzyme function. This functional diversity is generally correlated with local sequence variation and domain shuffling. Typically, substrate specificity is diverse, whereas the reaction chemistry is conserved [54]. This rule does not seem to apply to CYTH enzymes: the substrates, although diverse, were all found to be triphosphorylated compounds. On the other hand, catalytic mechanisms are clearly different for class IV ACs [19] and for CYTH enzymes with phosphohydrolase activity [23, 24, 36]. It is noteworthy that the same catalytic dyad seems to be operative in the bacterial NeuTTM and in the mammalian ThTPase, but this dyad is not conserved in all CYTH enzymes. Even in animal species, this dyad is conserved in the lineage between cnidarians (Nematostella) and vertebrates, but not in nematodes (Caenorhabditis elegans) and insects (Drosophila melanogaster). In the latter cases, there are presently no clues about the catalytic activity or biological role of CYTH proteins.

It should be stressed that RNA triphosphatases of fungi and protozoans are the only CYTH enzymes with a clearly established physiological role. It is all the more surprising that not only are CYTH orthologs present in all superkingdoms of life, but they have been found in nearly every genome examined so far. The only exceptions are birds. Although several bird genomes are presently known, no sequence corresponding to a CYTH protein has been identified. Moreover, the whole region of human chromosome 14 containing the thtpa gene has disappeared in birds (30 deleted genes) [23]. On closer examination of mammalian, zebrafish and sea anemone gene organization, it appears that, in all cases, an intron is present at an equivalent position in the coding region, i.e. between Leu182 and Gly183 in the human sequence, strengthening the idea that these genes are evolutionarily related. Such similarities are in line with the emerging picture [55] of extensive conservation in gene content, structure and organization between Nematostella and vertebrates. Even chromosome-scale linkage has been preserved between Nematostella and vertebrates. In contrast, nematode and insect model systems have undergone extensive gene and intron loss, as well as genome rearrangement.

Furthermore, both in mammals and in fish, the ThTPase gene has the same genes (ZFHX2 and NGDN) as neighbors, whereas all three have disappeared in birds. In contrast to the situation in birds, the genome of the lizard Anolis carolinensis contains 26 of the 30 selected genes conserved on the same chromosome 14. The disappearance of thtpa in birds is in line with our previous findings that no soluble ThTPase activity can be detected in bird tissues [5]. On the other hand, no soluble ThTPase activity has been detected so far in extracts from fish tissues [5, 56], although the gene is present and the recombinant enzyme from the zebrafish Danio rerio specifically hydrolyzes ThTP [23]. It was also found that, in pig tissues, the 25-kDa ThTPase was catalytically inefficient [57]. It is therefore possible that, in animal tissues, the ThTPase activity of CYTH enzymes is not physiologically relevant, but that CYTH proteins may exert other functions that are unrelated to catalysis.

Such alternative functions seem to be all the more possible given that structural features appear to be better conserved than catalytic mechanisms in CYTH proteins. As emphasized by Gong et al. [17], the crystal structures that have been determined so far show remarkably similar tunnel-like structures, with eight antiparallel β-strands. Another key feature of TTMs is the presence of four conserved glutamates that are responsible for divalent cation binding. Note that metal ion binding also occurs in the absence of substrate. It is possible that the most primitive role of CYTH proteins was the binding of PPPi as well as one or two divalent metal ions in the hydrophilic cavity of the β-barrel structure. Starting from a hypothetical primitive tripolyphosphatase, several types of metal-dependent catalytic mechanisms would have evolved, and other functions unrelated to catalytic activities may also have appeared. Thus, it seems that the most salient feature of the CYTH superfamily of proteins is the relatively well-conserved three-dimensional structure combined with remarkable functional diversity.

Acknowledgements

L. Bettendorff is Research Director and P. Wins is an honorary Research Associate at the Fonds de la Recherche Scientifique-FNRS. This work was supported by grant number 2.4508.10 (L. Bettendorff) from the Fonds de la Recherche Fondamentale Collective (FRFC).
