A. Gohla, Institute for Pharmacology and Toxicology and Rudolf Virchow Center, DFG Research Center for Experimental Biomedicine, University of Würzburg, Versbacher Str. 9, 97078 Würzburg, Germany. Fax: +49 931 201 48539. Tel: +49 931 201 48977. E-mail: email@example.com
Phosphatases of the haloacid dehalogenase (HAD) superfamily of hydrolases are an ancient and very large class of enzymes that have evolved to dephosphorylate a wide range of low- and high-molecular-weight substrates, often with exquisite specificity. HAD phosphatases constitute approximately one-fifth of all human phosphatase catalytic subunits. While the overall sequence similarity between HAD phosphatases is generally very low, family members can be identified based on the presence of a characteristic Rossmann-like fold and the active site sequence DxDx(V/T). HAD phosphatases employ an aspartate residue as a nucleophile in a magnesium-dependent phosphoaspartyl transferase reaction. Although there is genetic evidence demonstrating a causal involvement of some HAD phosphatases in diseases such as cancer and cardiovascular, metabolic and neurological disorders, the physiological roles of many of these enzymes are still poorly understood. In this review, we discuss the structure and evolution of human HAD phosphatases, and summarize their known functions in health and disease.
The HAD superfamily is a large and ubiquitous class of enzymes present in the proteomes of organisms from all three superkingdoms of life. Over 19 000 unique sequences of superfamily members have been identified so far, including 183 in Homo sapiens. While originally named after the haloacid dehalogenases that are mostly found in prokaryotes, HAD superfamily members in all organisms are quantitatively dominated by enzymes that catalyze phosphoryl transfer. The majority of these phosphotransferases are phosphatases (phosphate monoester hydrolases, ∼ 79%) and ATPases (phosphoanhydride hydrolases, ∼ 20%) [2,3]. This review focuses on mammalian HAD-type phosphatases.
Although the overall sequence identity between HAD phosphatases is typically very low (often < 15%), family members can be identified by amino acid sequence alignments based on the presence of four short HAD signature motifs that contain the conserved catalytic residues.
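To make the notion of pairwise sequence identity concrete, the following is a minimal sketch of how a percent-identity value can be computed from two already-aligned sequences. The function name, the choice to ignore gapped columns in the denominator, and the toy sequences are assumptions made for illustration; published identity values depend on the alignment method and on how the denominator is defined.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over alignment columns where neither sequence has a gap.

    Assumption: gapped columns are excluded from the denominator. Other
    conventions (e.g. dividing by the shorter sequence length) give other values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


# Hypothetical toy alignment, not real HAD phosphatase sequences.
toy_a = "MKD-DGTLV"
toy_b = "MRDADGS-I"
identity = percent_identity(toy_a, toy_b)
```

In this toy case, four of the seven ungapped columns match. For real family members with identities below 15%, whole-sequence comparisons of this kind carry little signal, which is why the short signature motifs are used for identification instead.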
HAD phosphatases carry out catalysis differently from other well-known phosphatases. In contrast to enzymes from the alkaline phosphatase or tyrosine-specific phosphatase superfamilies that catalyze phosphoryl transfer using serine or cysteine nucleophiles, respectively, HAD phosphatases use an aspartate residue in the active site for nucleophilic attack [4–7]. This distinguishing feature also explains their insensitivity to commonly employed phosphatase inhibitors and may have contributed to the relatively slow appreciation of their multiple roles in mammalian cells.
Interest in mammalian HAD phosphoprotein phosphatases soared with the seminal discovery that members of the Fcp/Scp subfamilies act as key regulators of transcription by dephosphorylating the C-terminal domain of RNA polymerase II [8,9], and with the finding that the Eyes absent (Eya) family of transcription factors contains intrinsic HAD-type phosphatase activity that is crucial for organ formation [10–12]. Because of their phosphotyrosine or phosphoserine phosphatase activity, Eya and Fcp/Scp phosphatases were initially grouped with the phosphotyrosine phosphatases (PTPs) or the metal-dependent serine/threonine-directed phosphatases, respectively [4,13]. However, it is clear that mammalian HAD phosphatases constitute a much larger family of enzymes, and that they have evolved independently from classic phosphatases.
Structural and mechanistic features of HAD phosphatases
Crystallographic work on pro- and eukaryotic HAD phosphatases has revealed that all members of the HAD phosphatase superfamily share the same structural arrangement of the active core [15,16]. The residues of the catalytic machinery are positioned in a modified Rossmann fold, which is characterized by a three-layered α/β sandwich composed of repeating β-α units. The central sheet is parallel and generally consists of at least five strands sequentially arranged in a ‘54123’ order, thereby orienting four loops which contain the core residues involved in positioning the substrate, the cofactor and the catalytic groups (Fig. 1A, B). The Rossmannoid fold typical of HAD phosphatases harbors three additional structural signatures that allow the enzyme to adopt distinct conformational states and that contribute to substrate specificity: the squiggle, flap, and cap domains (discussed below) [2,14,17].
The characteristic feature of HAD-type phosphatases is a two-step phosphoaspartyl transferase mechanism. In the first step, the Asp nucleophile attacks the phosphoryl group of the substrate, which results in the formation of a phosphoaspartyl enzyme intermediate and the displacement of the substrate leaving group. In the subsequent step, a water molecule attacks the phosphoaspartyl intermediate, thus releasing free phosphate and regenerating the catalytic Asp (Fig. 2). HAD phosphatases contain a second Asp residue positioned two residues C-terminal of the Asp nucleophile (designated Asp + 2). Asp + 2 functions as a general acid/base to protonate the leaving group in the first partial reaction, and to deprotonate the water nucleophile in the second partial reaction. The chemical advantage of a catalytic Asp is its versatility: it constitutes both a good nucleophile and a good leaving group, and it can operate at low and high pH. All HAD phosphoaspartyl transferases use Mg2+ as an obligatory cofactor. Mg2+ aids in the correct positioning of the substrate phosphoryl group relative to the Asp nucleophile, and electrostatically stabilizes the required close approach of the anionic nucleophile to the dianionic substrate phosphomonoester (Fig. 1B). Furthermore, Mg2+ provides charge neutralization of the transition state. Together, the catalytic residues and the Mg2+ cofactor stabilize the trigonal bipyramidal transition state of both partial reactions.
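The two partial reactions described above can be summarized schematically as follows. This is a simplified sketch rather than a full mechanism: the protonation states shown for the nucleophile, for Asp + 2 and for the phosphate species are assumptions chosen so that both equations balance, and R–OH stands for any substrate leaving group.

```latex
\begin{align}
\mathrm{Asp{-}COO^{-}} + \mathrm{Asp_{+2}{-}COOH} + \mathrm{R{-}OPO_3^{2-}}
  &\xrightarrow{\;\mathrm{Mg^{2+}}\;}
  \mathrm{Asp{-}CO{-}OPO_3^{2-}} + \mathrm{Asp_{+2}{-}COO^{-}} + \mathrm{R{-}OH} \\
\mathrm{Asp{-}CO{-}OPO_3^{2-}} + \mathrm{Asp_{+2}{-}COO^{-}} + \mathrm{H_2O}
  &\xrightarrow{\;\mathrm{Mg^{2+}}\;}
  \mathrm{Asp{-}COO^{-}} + \mathrm{Asp_{+2}{-}COOH} + \mathrm{HPO_4^{2-}}
\end{align}
```

In the first line Asp + 2 acts as the general acid that protonates the leaving group; in the second it acts as the general base that accepts a proton from the attacking water, so that both Asp residues return to their starting states after one catalytic cycle.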
Squiggle and flap elements
The split phosphoaspartyl transferase mechanism is dependent on an initial reaction that requires solvent exclusion (to favor the Asp-based nucleophilic attack), and a subsequent reaction that involves extensive solvent contact (leading to the hydrolysis of the aspartylphosphate intermediate). Therefore, an essential aspect of catalysis is the alternation between closed and open states of the active site cavity. The basic structural features in the HAD Rossmannoid core responsible for this mobility are the unique and conserved ‘squiggle’ and ‘flap’ signature elements, which are located immediately downstream of the β1-strand of the core Rossmannoid fold (see Fig. 1A). The small squiggle domain of approximately six amino acids folds into an almost complete single helical turn, while the flap that is located C-terminally of the squiggle adopts a β-hairpin turn, thus projecting two strands out from the Rossmannoid core scaffold. The helical squiggle can switch between tightly and loosely wound conformations, thereby triggering a movement of the flap. Since the β-hairpin flap is located immediately adjacent to the active site, this squiggle-induced flap movement can partly cover the catalytic cavity. The conformational changes exerted by the squiggle and flap domains appear to constitute the minimal machinery required for solvent exclusion and solvent access at the active site.
Additional mobile inserts termed cap modules can provide more extensive shielding for the catalytic cavity than the simple flap elements. In addition, cap domains supply binding determinants for substrate selectivity, and they can also be involved in phosphatase oligomerization. Despite their structural diversity, caps can be divided into three broad categories, C0, C1 and C2, and based on this domain organization, HAD phosphatases fall within three structural subfamilies (see Table 1).
Table 1. Functionally characterized human HAD phosphatases.
Whereas C0 elements are very small, C1 and C2 caps fold into domains of considerable size that move extensively to mediate active site solvent occlusion/inclusion during the catalytic cycle (Fig. 3). Type C0 and C1 structures are inserted in between the two β-strands of the flap itself, whereas C2 caps are incorporated in the linker immediately after the β3 strand of the core domain. C0 caps represent the structurally simplest modules [19,20], and can consist of loops or β-strands, as for example in the polynucleotide kinase/phosphatase (PNKP, PDB: 3ZVL). The more elaborate and most common C1 caps are large enough to completely seal the enzyme’s active site in the closed state. C1 modules in HAD phosphatases are α-helical domains of varying complexities. For example, a tetrahelical bundle is found in the phosphoserine phosphatase (PSPH, PDB: 1L8L), and the C1 cap of Eyes absent 2 (Eya2) (PDB: 3GEB, 3HB0, 3HB1) folds into a large bundle of seven helices. The C2 caps are highly diversified modules, generally composed of α + β domains with a core β-sheet of at least three strands, to which other simple secondary structure elements can be added. Examples of C2-capped HAD phosphatases include pyridoxal 5′-phosphatase (Pdxp)/chronophin (PDB: 2OYC, 2P69) and phosphomannomutase 1 (PMM1, PDB: 2FUE).
Role of cap modules for substrate selectivity
Due to the location of the catalytic residues at the C-termini of the β-strands in the Rossmann core (see Fig. 1), the active site of HAD phosphatases is open. Therefore, C0 members such as PNKP or the RNA polymerase II C-terminal domain (CTD) phosphatases tend to process macromolecular substrates, and the bound substrate itself functions as a cap by excluding bulk solvent. In these cases, substrate selectivity may be provided by a number of invariant residues that line the entrance to the active site. However, structures of bacterial HAD phosphatases indicate that uncapped phosphatases may also utilize small substrates due to ‘pseudocapping’ by oligomerization via the flap segment. The presence of distinct C1/C2-type cap modules (or the pseudocapping by oligomerization) sterically restricts access to the catalytic cavity, and allows phosphatases to act on small molecules, which can be sequestered within the active site by cap closure. As an exception to this general rule, two capped phosphatases have been shown to act on macromolecular substrates: Eya, which can dephosphorylate the C-terminal tyrosyl residue of histone H2AX, and chronophin, which dephosphorylates not only pyridoxal 5′-phosphate, but also pSer3 of the actin-binding factor cofilin [23,25]. Thus, the active sites of capped phosphatases can be accessible to the termini of phosphoproteins.
Structural analysis of C1- and C2-type phosphatases has revealed the existence of ‘substrate specificity domains’ inserted in the caps [2,26]. These specificity modules generally consist of residues that interact with the substrate leaving group, define the electrostatic environment of the active site and activate the substrate for nucleophilic attack [26,27]. In addition, substrate binding may also stabilize the closed conformation of the cap domain, thereby providing specificity. As the number of structurally characterized mammalian HADs increases, it might become possible to identify conserved residues that are responsible for binding of particular substrate classes. However, determinants other than specific amino acid residues strategically placed in the cap modules may be important for specificity, such as those determining conformational flexibility and active-site sequestration.
HAD signature motifs
The catalytic core residues are highly conserved throughout the HAD phosphatase family and cluster into four signature motifs in the primary amino acid sequence that correspond to the four active site loops [1,3]. Therefore, the presence of these motifs provides a means of identifying family members via amino acid sequence alignments. Based on extensive work performed mostly on prokaryotes [2,14], HAD signature motif I contains the essential Asp nucleophile and has the extended consensus sequence hhhDxDx(T/V)(L/V)h (where h represents a hydrophobic residue, and x indicates any amino acid). The carboxylate group of the Asp nucleophile and the backbone carbonyl of the second Asp (Asp + 2) in motif I coordinate the essential Mg2+ in the active site. Motif II with the consensus sequence hhhhhh(S/T) contains a conserved Ser or Thr residue that helps to orient the substrate for nucleophilic attack by forming a hydrogen bond with its transferring phosphoryl group. Motif III is poorly conserved in comparison to the other motifs, and centers on a conserved Lys residue, which is spaced 18–30 residues apart from motif IV. The function of the motif III Lys is to stabilize the negative charge of the reaction intermediate together with Ser/Thr of motif II. Motif IV typically exhibits the consensus sequence (G/S)(D/S)x3-4(D/E)hhhh, but a DD signature instead of a Dx3-4D sequence is also observed [29,30]. Together with the Asp residues of motif I, the conserved acidic Asp or Glu residues of motif IV are involved in the coordination of Mg2+. Motifs I–IV are spatially arranged around a single binding cavity at the C-terminal end of the strands of the central sheet that forms the active site of HAD phosphatases (see Fig. 1A,B).
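As an illustration of how the motif I core can be located in a primary sequence, the following minimal sketch scans a protein sequence for the pattern DxDx(T/V) with a regular expression. The function name and the toy sequence are hypothetical and introduced only for this example. A five-residue pattern of this kind will also match by chance in unrelated proteins, so a hit is at best a starting point; as stated above, family membership additionally rests on the Rossmannoid fold and the remaining motifs.

```python
import re

# Core of HAD motif I as given in the text: D-x-D-x-(T/V).
# The first Asp is the nucleophile; the second is Asp + 2.
# A lookahead is used so that overlapping candidates are all reported.
MOTIF_I_CORE = re.compile(r"(?=(D[A-Z]D[A-Z][TV]))")


def find_motif_i(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the candidate Asp nucleophile, matched residues)."""
    seq = sequence.upper()
    return [(m.start() + 1, m.group(1)) for m in MOTIF_I_CORE.finditer(seq)]


# Hypothetical toy sequence, not a real protein.
toy_sequence = "MKLAVIFDLDGTLVDSE"
hits = find_motif_i(toy_sequence)
```

For the toy sequence, the only candidate starts at residue 8 (DLDGT). Extending the pattern with the flanking hydrophobic positions of the consensus would make the search more selective, at the cost of missing family members that deviate from it.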
The gene complement of human HAD phosphatases
On the basis of the above-discussed criteria that define the HAD family of phosphatases by the presence of a Rossmannoid structure of the catalytic core domain and the active site signature DxDx(V/T), we have determined the human gene complement of HAD phosphatase catalytic subunits. By database mining, we have identified 40 different genes and their corresponding protein products. Fig. 4A shows an alignment of motif I of 40 human HAD phosphatases and the consensus motif derived from this alignment, hhhDxDx(T/V)(L/I)h. The family conservation in the 14 amino acids surrounding motif I is shown in the sequence logo in Fig. 4B. Figure 4C shows motif II-IV catalytic core residues for selected human HAD phosphatases. In these proteins, we were able to unambiguously identify the active site residues by inspection of available crystal structures (see legend to Fig. 4C).
Using the Genome Reference Consortium Homo sapiens high coverage assembly GRCh37 (GRCh37e66, release February 2012), available via the Ensembl database, we have also searched for possible HAD phosphatase transcript variants. As detailed in the Supplementary Table S1, the listed 40 human HAD phosphatase genes can encode 193 protein-coding transcripts. Thirty of these transcripts are predicted to be subject to nonsense-mediated decay, and two transcripts encode a ‘protein’ product of only two amino acids. Thus, the listed 40 human HAD phosphatases could potentially generate 161 protein-coding phosphatase variants. It remains to be tested how many of these variants encode functional enzymes. An overview of currently functionally characterized human HAD phosphatases and their variants is given in Table 1.
Considering the existence of 103 genes encoding catalytic subunits of Cys-based human PTPs, 15 genes encoding catalytic subunits of the PPP family, and 16 genes encoding human PPM catalytic subunits, the 40 HAD phosphatases amount to approximately 23% (40 of 174) of all human phosphatase catalytic subunits. However, whereas PPMs (represented by PP2C and pyruvate dehydrogenase phosphatase) contain catalytic and regulatory domains on one polypeptide chain, the catalytic subunits of PPP family members (including PP1 and PP2A as the most abundant Ser/Thr phosphatases) combinatorially associate with a diverse array of non-catalytic regulatory and structural proteins, thereby generating hundreds of functionally distinct proteins. It is currently not known whether the catalytic subunits of human HAD phosphatases also associate with regulatory or targeting subunits. Therefore, the relative abundance of functionally distinct HAD phosphatases compared to phosphatases containing PTP, PPP, or PPM catalytic subunits cannot at present be precisely assessed.
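The counts quoted in the two preceding paragraphs can be checked with a few lines of arithmetic. The sketch below uses only the numbers given in the text; the variable names are illustrative.

```python
# Genes encoding phosphatase catalytic subunits, as quoted in the text.
catalytic_subunit_genes = {"PTP": 103, "PPP": 15, "PPM": 16, "HAD": 40}

total_genes = sum(catalytic_subunit_genes.values())
had_share_percent = 100.0 * catalytic_subunit_genes["HAD"] / total_genes

# Transcript bookkeeping for the 40 human HAD phosphatase genes.
protein_coding_transcripts = 193
nonsense_mediated_decay = 30   # predicted to be degraded
two_residue_products = 2       # encode only two amino acids
candidate_variants = (
    protein_coding_transcripts - nonsense_mediated_decay - two_residue_products
)

print(f"{total_genes} catalytic subunit genes, HAD share {had_share_percent:.1f}%")
print(f"{candidate_variants} candidate protein-coding HAD phosphatase variants")
```

With these inputs the total is 174 genes and the HAD share is just under 23%, and 161 transcripts remain as candidate variants. The share is a count of genes only and says nothing about the number of functionally distinct holoenzymes.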
Evolutionary history of metazoan HAD phosphatases
The HADs are not only an extremely large, but also a very old superfamily. An estimated five genes encoding HADs were already present in the last universal common ancestor. Radiation of these ur-genes has occurred in all three superkingdoms of life, resulting in at least 23 protein families. Some of these proteins independently evolved the capability to remove a phosphate group from a substrate and thus became phosphatases. As most of the radiation happened in bacteria, we asked whether HAD phosphatases have also been a target of evolution in eukaryotes. Therefore, we traced the evolution of 22 HAD phosphatase families in the kingdom arguably most relevant for humans, the metazoans (detailed methods are provided in the Supplementary Text, Doc. S1). We found that most of these families were already present in the last common ancestor of all animals. From there on, different branches have seen expansions and losses of different families (Fig. 4D). Most notable are the multiple duplications in six families at the base of the vertebrates. This expansion was coupled with the evolution of new functions, as in the case of phosphoglycolate phosphatase (PGP)/Pdxp and PMM1/2 [33,34]. In addition to these duplication-driven neofunctionalizations, there seems to be one case of ‘de novo’ evolution of a phosphatase. Although present in all analyzed genomes, only the vertebrate members of the soluble epoxide hydrolase (sEH2) phosphatases show the hallmark DxDx(V/T) motif. This sequence-based prediction needs further experimental characterization. Many evolutionary events also happened in the lineages leading to the classical model organisms Drosophila melanogaster and Caenorhabditis elegans. Both have seen expansions and losses affecting six families. Surprisingly, the PGP/Pdxp family was expanded independently in both D. melanogaster and C. elegans.
This coarse-grained analysis reveals that HAD phosphatases are an active target of evolution in metazoans. Thus, the HADs are not only a useful model to study protein evolution on the level of superfamilies [14,35], but their enormous evolutionary flexibility also makes them good candidates to analyze how evolution generates functional diversity within one protein family.
Multidomain architecture of human HAD phosphatases
HAD phosphatases have undergone a remarkable expansion during the evolution of animals (see Fig. 4). Gene duplication events are typical of higher eukaryotes, and are often accompanied by the acquisition of new domains which further diversify and specialize protein functions. Thus, whereas many prokaryotic HAD phosphatases are small proteins that appear to consist of a single hydrolase domain, some human HAD phosphatases contain additional domains that give some indications as to their subcellular localization and functions (Fig. 5).
While there are no known examples of HAD phosphatases with extracellular domains (as can be found in receptor PTPs), the C-terminal domain nuclear envelope phosphatase/dullard homolog has a transmembrane helix motif required for nuclear membrane targeting, and the mitochondrial deoxyribonucleotidase mdNT contains a mitochondrial leader sequence. The specialized functions of PNKP and sEH2 for DNA repair or lipid metabolism, respectively, have been accomplished by the fusion of HAD phosphatase domains with DNA kinase or epoxide hydrolase domains [21,37]. PNKP additionally contains a DNA binding motif and a forkhead-associated domain that mediates binding to other DNA repair proteins. The Eya phosphatase domain is embedded in a region that mediates protein-protein interactions with DNA binding proteins, and the catalytic domain is additionally fused to a transactivation domain flanked by P/S/T-rich regions. To fulfill their functions in lipid metabolism on intracellular membranes, lipin phosphatases contain an amphipathic α-helix responsible for membrane association. In addition, lipins contain a nuclear localization signal and coactivator motifs to regulate the transcription of genes involved in fatty acid metabolism. The RNA polymerase II C-terminal domain (CTD) phosphatase Fcp1 contains a transcription factor TFIIF-interacting helix and a breast cancer protein-related carboxy-terminal (BRCT) domain that binds to the phosphorylated CTD [20,40], whereas the Fcp1-related small CTD phosphatases (Scps) lack the BRCT and TFIIF-binding domains, and the ubiquitin-like CTD phosphatase (UBLCP1) is additionally equipped with a ubiquitin-like domain.
Thus, while some HAD phosphatases have an elaborate extracatalytic multidomain structure, others display no additional recognizable domains. These phosphatases may associate with regulatory or targeting subunits (although very few interacting proteins of HAD phosphatases have been described so far), or they may operate as single hydrolase domain entities whose specificity is determined by their cap structure.
We have also performed a preliminary bioinformatic exploration of the link between the combination of multiple biochemical activities in some HAD phosphatases and the evolution of these enzymes. Although many of the domains found in eukaryotic HADs were already present in prokaryotes, their fusion is eukaryote specific. The only exception is the combination of the HAD domain with a kinase domain of the AAA family in PNKP, which is also present in bacteria. As the order of the domains is inverted in the bacterial proteins, this fusion seems to have happened at least twice independently. The phylogenetic distribution of the domain architectures of lipins, PNKP and UBLCP1 suggests an origin in the last common ancestor of the eukaryotes. Also, the core architecture of CTDP1 (HAD + BRCT) was identified throughout all eukaryotes. The additional accretion of the TFIIF domain happened at the base of the vertebrates, with the exception of one protein in the beetle Tribolium castaneum. The most recent event was the fusion of HAD with the epoxide hydrolase domain at the base of the tetrapods. Thus, the enzymatic combinations found in human multi-domain HAD phosphatases serve to further diversify and specialize HAD phosphatase functions.
Roles in human health and disease
A number of HAD phosphatases play important roles in a range of human diseases, including cancer, cardiovascular, metabolic and neurological disorders (see Table 1). This overview chapter highlights those HAD phosphatases whose causal link to human disease is supported by genetic or epidemiological data.
Whenever applicable, Table 1 also contains references to the Online Mendelian Inheritance in Man (OMIM) database, a genetic database that curates the medical literature for genetic disorders. In cases in which individual genes have been associated with a physiological phenotype, OMIM provides clinical descriptions together with some genetic information. However, the listed genetic variants may be linked to a biological phenotype more by statistical association than necessarily by functional or medical analysis. It has to be taken into account that OMIM does not list all known variants for each gene, and that OMIM does not attempt to report all genes and variants identified through genome-wide association studies (GWAS). For an in-depth analysis of all variants identified for a particular gene, the reader should therefore consult additional databases, such as GWAS central (https://www.gwascentral.org), HGMD (http://www.hgmd.org), LOVD (http://www.lovd.nl/2.0/), or MutaDATABASE (http://www.mutadatabase.org/).
The unstructured C-terminal domain (CTD) of eukaryotic RNA polymerase II regulates transcription by recruiting different factors to nascent mRNA. The human CTD is composed of 52 tandem heptapeptide repeats with the sequence Y1S2P3T4S5P6S7, which are dynamically phosphorylated and dephosphorylated throughout transcription cycles. The extent and pattern of CTD phosphorylation represents a critical regulatory checkpoint for transcription and is determined by dedicated CTD kinases and phosphatases. Transcription initiation requires CTD dephosphorylation by phosphatases, which are therefore essential for the regulation of gene expression.
Fcp1 is the main serine phosphatase for the CTD and can processively dephosphorylate both pSer2 and pSer5. Varon et al. have shown that the congenital cataracts facial dysmorphism neuropathy syndrome (CCFDN; OMIM #604168) is caused by Fcp1 loss-of-function. A single-nucleotide substitution in an antisense Alu element in intron 6 of CTDP1 (encoding Fcp1) results in a rare mechanism of aberrant splicing and an Alu insertion in the processed mRNA. The insertion in the CTDP1 mRNA results in a premature termination signal 17 codons downstream of exon 6, with the mutant transcript expected to undergo nonsense-mediated decay or to produce a nonfunctional protein lacking the nuclear localization signal.
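To visualize the Y1S2P3T4S5P6S7 numbering, the following minimal sketch builds an idealized CTD from 52 copies of the consensus heptad and lists where each labelled position falls. It assumes that every repeat matches the consensus exactly, which is a simplification because not all repeats of the human CTD do; the offsets are therefore positions within this idealized repeat array, not residue numbers of the human protein, and the function name is illustrative.

```python
CONSENSUS_HEPTAD = "YSPTSPS"   # Y1 S2 P3 T4 S5 P6 S7
N_REPEATS = 52                 # repeat count quoted in the text


def ctd_sites(n_repeats: int = N_REPEATS, heptad: str = CONSENSUS_HEPTAD) -> dict[str, list[int]]:
    """Map each heptad label (e.g. 'S2') to its 1-based offsets in an idealized CTD.

    Assumption: all repeats are identical to the consensus heptad.
    """
    sites: dict[str, list[int]] = {}
    for repeat_index in range(n_repeats):
        for position, residue in enumerate(heptad, start=1):
            label = f"{residue}{position}"
            offset = repeat_index * len(heptad) + position
            sites.setdefault(label, []).append(offset)
    return sites


sites = ctd_sites()
ser2_sites = sites["S2"]   # candidate pSer2 positions
ser5_sites = sites["S5"]   # candidate pSer5 positions
```

In this idealized array each of Ser2 and Ser5 occurs once per repeat, giving 52 potential sites of each type spread over 364 residues, which illustrates how many distinct phosphorylation patterns CTD kinases and phosphatases can in principle generate.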
CCFDN is an autosomal recessive disorder prevalent among Gypsy families. This demyelinating neuropathy is characterized by progressive peripheral nerve abnormalities that lead to severe disability. It is currently not understood how nonfunctional Fcp1a results in the specific symptoms of CCFDN.
Small C-terminal domain phosphatases (Scp1-3)
Scps are structurally related to Fcp1, and control the RNA polymerase II transcription machinery by preferentially dephosphorylating the CTD on pSer5. The expression of Scp1-3 is confined to non-neuronal tissues and neuroepithelial precursor cells, where they operate in a silencing complex to epigenetically block the inappropriate expression of specific neuronal genes. Since antagonism of the Scp pathway might promote neuronal stem cell differentiation in vivo, small molecule Scp phosphatase inhibitors could be powerful tools to direct neurogenesis and promote the regeneration of neurons, for example upon neuronal injury. Zhang and colleagues have targeted the unique hydrophobic binding pocket adjacent to the Scp active site [19,46], and have identified rabeprazole as a first lead compound that selectively inhibits Scp1, but not the related Fcp1 or Dullard proteins. This study provides a promising starting point for the design and optimization of potent and specific Scp inhibitors that may facilitate neuronal differentiation to repair nervous system damage.
Besides the regulation of transcription by CTD dephosphorylation, Scps can also recognize other substrates and fulfil additional biological functions. Scp1-3 can dephosphorylate and stabilize Snail, a key transcriptional repressor of E-cadherin. Stabilization of Snail by Scp enhances E-cadherin promoter suppression and promotes cell migration in vitro. Scp1-3 also dephosphorylate and modulate the activities of Smad1 and Smad2/3 proteins, which function as critical transducers of bone morphogenetic protein- and transforming growth factor β-initiated cellular responses [49,50]. Transiently overexpressed Scp3 can dephosphorylate the tumor suppressor retinoblastoma protein 1 (pRb1) in cells, and may thereby activate Rb1 to inhibit cell cycle progression. This finding may explain a role of Scp3 in cancer: CTDSPL (encoding Scp3) resides in a chromosomal region (3p21.3) that is deleted in > 90% of major human carcinomas, including small cell lung cancer, renal cell carcinoma and breast carcinoma. Scp3 was found to be hemi- or homozygously deleted or functionally inactivated by mutations in some of these malignancies, and was additionally shown to function as a tumor suppressor in immunocompromised mice.
Polynucleotide 5′-kinase/3′-phosphatase (PNKP)
DNA damage occurs constantly by various internal agents (such as reactive oxygen species), during normal processes such as DNA replication, and by external agents (such as ultraviolet light). DNA damage is continuously repaired by several DNA repair pathways [52,53], and defects in these processes are considered to play a causative role in aging and neurological disorders, and to be an important factor in the etiology and treatment of cancer. Ionizing radiation and other internal and external DNA damaging agents often generate DNA strand breaks with incompatible termini, which first require processing before strand resynthesis and ligation by DNA polymerases and ligases can take place. Termini with 3′-phosphate and 5′-hydroxyl groups occur very frequently, and PNKP is the major enzyme that restores the chemistry of strand breaks by generating the obligatory 3′-hydroxyl and 5′-phosphate termini for repair.
PNKP is a multidomain enzyme that consists of an N-terminal forkhead-associated (FHA) domain and a C-terminal catalytic domain, composed of fused HAD phosphatase and kinase subdomains. Interestingly, the phosphatase activity of PNKP appears to be much higher than its kinase activity, which may reflect the more frequent occurrence of 3′-phosphorylated termini upon DNA damage. PNKP is a key enzyme in several DNA repair pathways (i.e., single-strand break repair, base-excision repair and double-strand break repair), because it interacts with other DNA repair proteins, notably with phosphorylated XRCC1 and XRCC4, via its FHA domain.
Genetic defects in PNKP have revealed its essential role in the developing central nervous system. Neurons are particularly sensitive to mutations in DNA repair genes, and loss-of-function mutations in PNKP (leading to substitutions in the kinase and phosphatase domains) cause early infantile epileptic encephalopathy-10 (EIEE10; OMIM #613402). EIEE10 is a severe, autosomal recessive disease that is characterized by intractable seizures, microcephaly and developmental delay.
In cancer cells, on the other hand, PNKP-mediated DNA repair can enhance the resistance to genotoxic therapeutic agents. The DNA repair capacity of tumor cells is regarded as an important factor in the clinical response to ionizing radiation and various chemotherapeutic agents, because it protects cells from genotoxic insults. Blocking DNA repair sensitizes tumor cells to apoptosis, and the recent identification of small molecule DNA repair protein inhibitors has increased clinical interest in this pharmacological concept. PNKP emerges as a particularly attractive therapeutic target due to its importance in multiple DNA repair pathways. A non-competitive, allosteric small molecule inhibitor of PNKP phosphatase activity has been identified. This polysubstituted imidopiperidine compound specifically blocks human PNKP, but not other related DNA phosphatases, PP-1cγ, or calcineurin. Given this lead structure, new inhibitory compounds will need to be identified and optimized for clinical use.
Soluble epoxide hydrolase 2 (sEH2)
Endogenous fatty acid epoxides such as the arachidonic acid-derived epoxyeicosatrienoic acids are signaling molecules that possess a wide variety of biological effects, many of which are related to cardiovascular physiology and inflammation. The human sEH2 is a homodimeric, bifunctional enzyme with a C-terminal epoxide hydrolase domain, which is responsible for the transformation of epoxyeicosatrienoic acids to the corresponding vicinal diols. sEH2 also has an N-terminal HAD-type phosphatase domain, whose function has long remained elusive [37,60]. Recent studies demonstrate that the sEH2 N-terminal domain can effectively dephosphorylate dihydroxy lipid phosphates and polyisoprenyl pyro- and monophosphates, which are metabolic precursors of cholesterol biosynthesis .
sEH2 has been identified as a heart failure susceptibility gene in a model of spontaneously hypertensive heart failure rats. Monti et al. found increased sEH2 expression and elevated epoxide hydrolase activity, leading to a more rapid hydrolysis of cardioprotective epoxyeicosatrienoic acids. Furthermore, EPHX2 gene ablation in mice protected against pressure overload-induced heart failure and cardiac arrhythmias. While a potential contribution of the sEH2 phosphatase activity was not investigated in this study, new findings indicate that sEH2 phosphatase activity may play an important pathophysiological role.
First, while the pharmacological inhibition of sEH2 epoxide hydrolase activity in vivo attenuates hypertension, this effect is markedly less pronounced than the blood pressure reduction observed upon complete EPHX2 gene deletion in mice, pointing to a role of the phosphatase domain in blood pressure elevation.
Second, sEH2 phosphatase activity appears to contribute significantly to the role of sEH2 in lipid metabolism and lipid-related disorders. sEH2 phosphatase activity leads to an elevation of cholesterol levels, whereas the sEH2 epoxide hydrolase activity lowers cholesterol levels in cells. Consistent with this, the administration of an sEH2 epoxide hydrolase inhibitor elevated cholesterol levels in vivo.
Third, several epidemiological studies link human EPHX2 polymorphisms with dyslipidemia and related disorders, such as atherosclerosis and coronary heart disease. The most frequently found EPHX2 SNP leads to an R287Q substitution, which is primarily associated with cardiovascular disease. Importantly, sEH2-R287Q displays significantly impaired epoxide hydrolase activity, but elevated phosphatase activity. sEH2-R287Q can act as a modifier of familial hypercholesterolemia caused by a heterozygous mutation in the low density lipoprotein receptor (LDLR) gene, and is associated with elevated plasma triglyceride levels (OMIM #143890). Other epidemiological studies have found an association of the sEH2-R287Q polymorphism with elevated risk for coronary artery calcification and subclinical arteriosclerosis, and EPHX2 haplotypes have also been associated with altered risk of ischemic stroke (OMIM #601367).
Based on the potent anti-inflammatory, vasodilator, and cardioprotective properties of epoxyeicosatrienoic acids, sEH2 epoxide hydrolase inhibitors are currently being developed as potential therapeutic strategies for the treatment of inflammatory disorders and cardiovascular diseases [64,65]. The findings described above add a cautionary note to the development of sEH2 hydrolase-targeted inhibitors for the treatment of cardiovascular diseases associated with dyslipidemia. Further studies are required to elucidate the mechanism of the sEH2 phosphatase-dependent increase of cholesterol levels. Ultimately, inhibitors of sEH2 phosphatase activity may offer novel therapeutic approaches for the management of dyslipidemia-related disorders.
The human 5′-nucleotidases are a large family of genetically unrelated enzymes that catalyze the dephosphorylation of (deoxy)ribonucleoside monophosphates [(d)NMPs] to the corresponding nucleosides, and function to maintain balanced cellular NTP and dNTP pools [66–68]. Characterized family members are ecto-5′-nucleotidase (eNT), cytosolic 5′-nucleotidase (cN)-IA, cN-IB, cN-II, cN-III, cytosolic 5′(3′)-deoxyribonucleotidase (cdN) and mitochondrial 5′(3′)-deoxyribonucleotidase (mdN). Additional, related sequences can be identified in databases (see Fig. 4). While eNT does not harbor typical HAD domains, HAD motif I is found in all intracellular 5′-nucleotidases. HAD motifs II-IV were identified only in mdN, cdN and cN-III. The crystal structure of human mdN has revealed the presence of a specificity motif (motif S) that forms hydrogen bonds with the substrate base, and motif S is present in all intracellular 5′-nucleotidases.
5′-Nucleotidases differ in their affinities for (d)NMPs, their subcellular localization and tissue distribution. The non-HAD-type eNT (also known as CD73) is a ubiquitous, AMP-hydrolyzing enzyme bound to the external leaflet of the plasma membrane. eNT produces extracellular adenosine, a ligand for G protein-coupled, purinergic receptors with important functions in cellular signal transduction. The HAD phosphatases cN-IA/-IB are cytosolic nucleotidases, characterized by their affinity towards AMP, while cN-II is an IMP and/or GMP preferring enzyme. cN-III is a pyrimidine 5′-nucleotidase, and mdN and cdN are 5′(3′)-pyrimidine nucleotidases. In addition to their nucleotidase activities, cN-II, cN-III and cdN have also been demonstrated to act as phosphotransferases, i.e., the phosphate from the phosphoaspartyl intermediate is transferred to another nucleoside instead of being hydrolyzed by water. In contrast, cN-IA does not display phosphotransferase activity.
Intracellular HAD-type 5′-nucleotidases play a general role in the salvage pathway to recover nucleosides that are formed during RNA and DNA degradation for the synthesis of nucleotides and in nucleic acid repair [68,69]. In addition, cN-III is involved in breakdown of pyrimidine nucleotides during erythrocyte maturation. Aside from these physiological functions, 5′-nucleotidases appear to play a role in the development of drug resistance against nucleoside analogues. Nucleoside analogues are important antimetabolites used in the treatment of cancer and viral infections [72,73]. These drugs inhibit DNA synthesis either directly, or through inhibition of DNA precursor synthesis by acting on the de novo or salvage pathways. Nucleoside analogues mimic natural nucleosides, and need to be converted to their active triphosphate forms in cells. 5′-(Deoxy)nucleotidases can dephosphorylate and thus inactivate the monophosphate forms of nucleoside analogues, and clinical data indicate that 5′-(deoxy)nucleotidases contribute to the development of nucleoside analogue resistance. Of the 5′-nucleotidases, cN-II has so far received most attention for its possible role in resistance to antimetabolites. cdN is also a good candidate for mediating nucleoside analogue resistance, due to its preference for the deoxyribonucleoside monophosphates that nucleoside analogues are designed to mimic.
Cytosolic 5′-nucleotidases-IA, -IB
Human cytosolic nucleotidases cN-IA and cN-IB are two closely related, AMP preferring 5′-nucleotidases. cN-IB is poorly characterized, but appears to be functionally similar to cN-IA. cN-IA expression predominates in the heart, where it is found associated with the contractile elements of cardiomyocytes. The main function of cN-IA appears to be the intracellular formation of adenosine from AMP under conditions of ATP breakdown, such as ischemia and hypoxia. cN-I requires a nucleoside diphosphate such as ADP for maximum activity. Thus, ATP consuming conditions increase cN-I activator (ADP) and substrate (AMP) concentrations, which together with decreased adenosine kinase activity ensure sufficient adenosine generation. The produced adenosine is then excreted and stimulates purinergic adenosine cell surface receptors, thereby increasing coronary blood flow, antagonizing the effects of catecholamines, and prolonging atrioventricular conduction time. Together, these actions of adenosine increase energy supply and reduce energy demand of the heart. A selective inhibitor of cN-IA has been developed, which effectively blocks adenosine formation in rat cardiomyocytes [reviewed in 66].
Cytosolic 5′-nucleotidase III (cN-III)
cN-III catalyzes the dephosphorylation of the 5′-pyrimidine monophosphates CMP and UMP to the corresponding nucleosides during RNA degradation in maturing erythrocytes. Although cN-III is found in various other cells and tissues, its function has mainly been studied in red blood cells. Deficiency of NT5C3 due to mutations in the gene causes autosomal recessive, non-spherocytic hemolytic anemia, characterized by a massive accumulation of pyrimidine nucleotides within the erythrocyte that interfere with glycolysis (OMIM #266120). After glucose 6-phosphate dehydrogenase and pyruvate kinase deficiencies, cN-III deficiency is the third most common cause of a red blood cell enzymopathy causing hemolysis.
Phosphoserine phosphatase (PSPH)
l-Serine is a non-essential amino acid that is available from dietary protein, protein and phospholipid degradation, and biosynthetically. The major endogenous source is the glycolytic intermediate 3-phosphoglycerate. This biosynthetic pathway involves three enzymes: 3-phosphoglycerate dehydrogenase (PHGDH), phosphoserine aminotransferase 1 (PSAT1), and PSPH, which catalyzes the final and irreversible step. l-Serine is essential for the synthesis of proteins and other biomolecules needed for cell proliferation, including nucleotides, phosphatidyl-serine and sphingosine. l-Serine is also a precursor for the neuromodulators d-serine and glycine, both of which function as endogenous ligands at the ‘glycine site’ of the N-methyl-d-aspartate receptor, an ionotropic glutamate receptor with important roles for memory and learning.
Patients with congenital defects in the l-serine synthesizing enzymes present with severe neurological abnormalities, demonstrating that the de novo synthesis of l-serine plays an essential role in the development and functioning of the central nervous system. Among these, PSPH deficiency syndrome (OMIM #614023) is an autosomal recessive disorder caused by PSPH polymorphism. The syndrome is characterized by strongly reduced enzymatic PSPH activity, due to a PSPH-D32N substitution (resulting in 50% residual phosphatase activity), and a PSPH-M52T substitution (with no detectable residual enzymatic activity). Affected individuals present with intrauterine and postnatal growth retardation, congenital microcephaly, feeding difficulties and moderate psychomotor retardation. Some of these symptoms can be alleviated by substitution treatment with oral serine [77,78].
The l-serine biosynthetic pathway is important for cell proliferation, and has therefore been extensively studied as a potential target in cancer treatment. Via a negative-selection RNAi screening using a human breast cancer xenograft model at an orthotopic site in the mouse, Sabatini and colleagues have recently shown that the serine synthesis pathway is essential for tumorigenesis in estrogen receptor-negative breast cancer. PHGDH was amplified and PHGDH protein levels were elevated in the majority of the disease cases, and RNAi-mediated inhibition of PHGDH, PSAT1 and PSPH blocked tumor formation. Interestingly, serine production was not the only important role of PHGDH in these tumor cells, and the authors showed that the pathway contributes substantially to the anaplerosis of glutamate into the tricarboxylic acid cycle by producing α-ketoglutarate. These findings suggest that inhibitors of serine biosynthesis may be of value in the treatment of estrogen receptor-negative breast tumors.
A chemical genetic screen has identified P-Ser as an inhibitor of neural progenitor proliferation that stimulates neurogenic fate commitment, terminal differentiation, and nascent neuronal survival via the activation of the metabotropic glutamate receptor 4. These results suggest that elevating P-Ser levels by inhibiting PSPH may be of therapeutic value, e.g., for the therapy of stroke or spinal cord injury. A competitive PSPH inhibitor, AP3, has been described. This compound is a structural analog of P-Ser and also functions as a metabotropic glutamate receptor antagonist.
Eyes absent (Eya)
Eya proteins belong to a novel family of proteins identified in many animals. Humans have four paralogs, designated Eya 1-4. Eya proteins have been named after their critical function in a conserved network of transcription factors collectively termed the ‘retinal determination gene network’ for their role in Drosophila eye specification. Mammalian Eya proteins are involved in the formation of many tissues and organs, and mutations in human Eya proteins cause a variety of congenital disorders [38,82].
Eya proteins are defined by a highly conserved ∼ 270 amino acid C-terminal motif referred to as the Eya domain. This domain is required for interaction with a homeodomain protein called Sine oculis in Drosophila and Six in vertebrates, and with the transcriptional regulator Dachshund, named Dachshund homolog 1 in mice. Eya and Six operate as a transcription factor complex, in which Six mediates DNA binding, and Eya uses its N-terminal domain for the activation of transcription.
The realization that the Eya domain contains embedded HAD phosphatase signature sequences, and that Eya proteins are indeed functional phosphatases, was a breakthrough in developmental biology because it provided the first example of a transcription factor with an inherent phosphatase activity [10–12]. This work also raised awareness of the large and heterogeneous group of HAD phosphatases that had previously been characterized in prokaryotes, but had gone mostly unnoticed in higher eukaryotes.
While initial work in transfected cells had indicated that Eya’s phosphatase activity could be pivotal for the Eya-mediated transcriptional activation of some Six-dependent reporter genes, recent studies in Drosophila showed that reducing Eya phosphatase activity does not globally impair transcriptional output. Eya proteins function as protein tyrosine phosphatases, although the identification of physiological Eya targets remains an important issue. It has been demonstrated that Eya3 can dephosphorylate PTyr-142 of histone H2AX, a decisive phosphorylation mark that discriminates between apoptotic or DNA repair responses to genotoxic stress [84,85]. The H2AX dephosphorylation by Eya3 promotes the recruitment of DNA repair complexes and thus renders cells resistant to apoptosis. Therefore, the phosphatase activity of Eya may block an improper apoptotic response to physiological levels of genotoxic stress by dephosphorylating H2AX on tyrosine, and this function may be critical in mammalian organogenesis. Eya proteins have also been reported to dephosphorylate threonine residues, but this activity is apparently encoded in the N-terminal portion of the protein. This non-HAD-type threonine phosphatase activity of Eya4 has recently been linked to the regulation of antiviral innate immune responses by modulating the phosphorylation state of signal transducers for intracellular pathogens.
Mutations in the EYA1, SIX1, and SIX5 genes cause branchio-oto-renal syndrome (BOR1, OMIM #113650), and branchio-otic syndrome (BOS1, OMIM #602588). Mutations in EYA1 are detected in approximately 40% of affected individuals, whereas SIX mutations are much less common. BOR1 is an autosomal dominant disorder that is characterized by fistulas or cysts in the neck, hearing loss (found in > 90% of BOR1 patients), ear malformations and abnormalities of kidney structure and function, ranging from mild renal hypoplasia to a complete lack of kidney formation. BOR1 is estimated to affect about 1 in 40 000 people. BOS1 can be caused by allelic variants of EYA1, and is characterized by branchial and otic anomalies as seen in individuals with BOR1, in the absence of renal anomalies. Molecular genetic testing is clinically available.
A deletion mutation in human EYA4 has been identified as a cause of dilated cardiomyopathy type 1J (CMD1J) and heart failure, preceded by sensorineural hearing loss (OMIM #605362). The transmission of this genetic disorder is autosomal dominant. Biochemical analysis indicated that the shortened peptide of Eya4 produced by the deletion mutation failed to bind wildtype Eya4 and Six proteins, suggesting that its functions as a transcriptional co-activator may be impaired. Mutations in human EYA4 have also been identified at the deafness, autosomal dominant nonsyndromic sensorineural 10 locus (DFNA10; OMIM #601316). Affected individuals exhibit a postlingual, progressive form of deafness that can finally lead to severe-to-profound hearing impairment. This disorder is caused by a truncation that deletes the Eya domain, but not the variable domain of Eya4. Since dilated cardiomyopathy has not been observed in individuals with DFNA10, it is the partial truncation of the Eya4 variable domain found in CMD1J that correlates with the occurrence of dilated cardiomyopathy. Because sensorineural hearing loss is generally caused by abnormalities in the hair cells of the organ of Corti in the cochlea, the phenotype of individuals affected by CMD1J and DFNA10 indicates that Eya4 is also important postdevelopmentally for the continued function of the mature organ of Corti.
In Drosophila, Eya and Sine oculis overexpression triggers tissue overgrowth, and elevated levels of Eya and Six family members have been observed in some malignant tumors in humans, including breast and ovarian cancers and malignant peripheral nerve sheath tumors [89–92]. In ovarian cancer, Eya2 is upregulated on the RNA and protein levels, in part due to genomic amplification, and this overexpression is significantly associated with short overall survival. Furthermore, the ectopic expression of Eya2 in xenograft tumors significantly promotes tumor growth in vivo. Conversely, RNA interference-mediated suppression of EYA4 expression in malignant peripheral nerve sheath tumor cells suppresses tumor growth in nude mice.
Transcriptional targets of mammalian Eya and Six proteins include not only the cell cycle regulatory genes cyclin D1 and cyclin A1, but also the proto-oncogene c-Myc, and ezrin, a regulator of the cytoskeleton and contributor to cell migration and metastasis. Indeed, both Six1 and Eya have independently been shown to mediate cancer metastasis [91,93–95], and it is the tyrosine phosphatase activity of the Eyas that is essential to promote breast cancer cell migration, invasion, and transformation in vitro. Using RNA interference-mediated depletion of Eya2 in MCF7 mammary carcinoma cells, it has recently been demonstrated that Six1 and Eya2 functionally interact during tumor progression, and that Eya2 is a necessary co-factor for many of the metastasis promoting functions of Six1. These findings suggest that targeting the Six1-Eya interaction may represent a novel strategy to inhibit breast cancer progression.
PMM2 converts mannose 6-phosphate to mannose 1-phosphate, which is then transformed to GDP-mannose. This mannose donor is needed for the initial step of protein N-glycosylation. Oligosaccharide moieties on glycoproteins can determine their folding, transport, biological activity and stability. Therefore, protein glycosylation errors can affect a broad spectrum of cellular functions, including metabolism, cell recognition, adhesion and migration, host defense and antigenicity. Defying its namesake phosphomannomutase, the PMM2 paralog PMM1 has recently been shown to act as an IMP-stimulated glucose-1,6-bisphosphatase that may be involved in brain metabolism under ischemic conditions.
While PMM1 has not been linked with any hereditary disease, mutations in the PMM2 gene cause the congenital disorder of glycosylation, type Ia (CDG1A, also known as Jaeken syndrome, PMM2-CDG or CDG-Ia; OMIM #212065). CDGs are autosomal recessive disorders, and CDG1A is the most widespread form with an estimated prevalence as high as 1 : 20 000. In individuals with enzymatically proven CDG1A, the mutation detection rate in PMM2 is as high as 100% and includes missense mutations and deletions. Molecular genetic approaches have been established to detect PMM2 sequence variants and exonic or whole-gene deletions of the PMM2 locus by sequence analysis or deletion/duplication analysis, and these methods are available for clinical testing. CDG1A usually presents as a severe neurological disorder in the neonatal period. The clinical phenotype of CDG1A is broad, with basic signs including developmental delay, cerebellar atrophy, peripheral neuropathy, hypotonia and psychomotor retardation. The lethality in the first year of life is 20% due to severe infections, liver insufficiency, or cardiomyopathy. There is currently no pharmacological option to correct the glycosylation defect in CDG1A patients, and a better understanding of the pathophysiology is needed to enable the development of therapeutic strategies.
The triglyceride core of lipid droplets is mainly synthesized through the sequential acylation of glycerol-3-phosphate. The penultimate step in triacylglycerol synthesis consists of the Mg2+-dependent dephosphorylation of phosphatidic acid to form diacylglycerol. This key step in lipid biosynthesis is catalyzed by a family of HAD-type phosphatases called lipin1-3. Emerging evidence suggests that lipins also play crucial roles in the nucleus as transcriptional coactivators that regulate the expression of genes involved in lipid metabolism. The lipins exhibit distinct patterns of tissue-specific expression and appear to play non-redundant roles, with lipin1 being principally expressed in adipose tissue, skeletal muscle, and heart; lipin2 found predominantly in the liver; and lipin3 in the intestine. All three mammalian lipins possess phosphatidic acid phosphatase activity, but lipin1 is by far the most active enzyme. Common features of all lipins are their highly conserved N- and C-terminal lipin domains (termed NLIP and CLIP). The CLIP domain contains the transcriptional coactivator motif and the HAD domains required for phosphatidic acid dephosphorylation.
Lipin1-deficiency was identified as the cause of the disturbed metabolic phenotype of the fatty liver dystrophy (fld) mouse. Fld mice are lipodystrophic, exhibit multiple defects in adipose tissue development, and are characterized by insulin resistance, peripheral neuropathy and neonatal fatty liver. While inactivating LPIN1 mutations in humans are not associated with lipodystrophy for reasons that are currently unclear, human LPIN1 polymorphisms have been linked to rhabdomyolysis (also known as autosomal recessive recurrent acute myoglobinuria, OMIM #268200). Disease onset is typically in childhood, and can be fatal due to generalized muscle weakness and kidney failure. Using homozygosity mapping, Zeharia et al. have identified six mutations in the LPIN1 gene in patients who presented with recurrent, massive rhabdomyolysis. These mutations create stop codons at residues 215, 388, and 800 or produce frame shifts as a result of exon skipping, and are thus predicted to result in truncated proteins lacking catalytic activity. Consistent with these molecular findings, analysis of the muscle tissue phospholipid contents demonstrated an accumulation of the lipin-1 substrates phosphatidic acid and lysophospholipids. Interestingly, the authors also identified one carrier for a pathogenic mutation in the LPIN1 gene (Glu769Gly) among six individuals who developed statin-induced myopathy. Several other studies have linked multiple additional polymorphisms in the LPIN1 gene (leading to lipins with impaired catalytic activities) to metabolic disease traits, such as insulin resistance and diabetes, blood pressure regulation, response to thiazolidinedione drugs, and susceptibility to statin-induced myopathy [101,103–106].
Mutations in the human LPIN2 gene, leading to lipin 2 deficiency, can cause a very rare autoinflammatory bone disease known as Majeed syndrome (OMIM #609628), which is characterized by recurrent multifocal bone and skin inflammation and dyserythropoietic anemia. The pathomechanism is currently unclear, but may be related to an accumulation of phosphatidic acid that may trigger inflammatory signaling cascades.
Conclusions and future perspectives
HAD phosphatases constitute an ancient, large and very diverse group of enzymes that have evolved to specifically dephosphorylate carbohydrates, lipids, metabolites, DNA and serine-, threonine- or tyrosine-phosphorylated proteins in humans. While they have long been regarded as metabolic phosphatases with relaxed substrate specificities that fulfil merely housekeeping functions, recent findings prove otherwise. It is now clear that loss of some HAD phosphatases causes hereditary disorders, and evidence is accumulating that several human HAD phosphatases are involved in important diseases, such as cancer, cardiovascular, metabolic and neurological disorders.
It can be expected that the characterization of the physiological substrates, biological roles and modes of regulation of human HAD phosphatases, including the functions of their numerous splice variants and disease-associated SNPs, will provide a rich field for further study.
Allosteric regulatory sites have begun to be identified in some HAD phosphatases. Together with more structures of human HAD phosphatases becoming available, this knowledge should greatly facilitate the design of specific HAD phosphatase inhibitors for potential future in vivo use.
This work was supported by grants from the DFG (SFB688, to A.G.), and by the Rudolf Virchow Center (DFG/FZ82, to A.G.).