Laboratory of Genetic Engineering and Cellular and Molecular Biology, Quilmes National University, Bernal, Buenos Aires, Argentina
P. D. Ghiringhelli, Laboratory of Genetic Engineering and Cellular and Molecular Biology, Department of Science and Technology, Quilmes National University, Roque Saenz Peña 352, Bernal 1876, Buenos Aires, Argentina
Pablo D. Ghiringhelli and Daniel E. Gomez contributed equally to this work.
Here, we review current knowledge about pseudouridine synthase and archaeosine transglycosylase (PUA)-domain-containing proteins to illustrate progress in this field. We carried out a methodological analysis of the literature on the topic, together with a ‘qualitative comparative analysis’, to give a more comprehensive review. Bioinformatics methods for identifying whole proteins or protein domains are commonly based on pairwise protein sequence comparisons; we added structural comparisons to detect the whole universe of proteins containing the PUA domain. We present an update of the proteins having this domain, focusing on those present in Homo sapiens (dyskerin, MCT1, Nip7, eIF2D and Nsun6), and explore their presence in other species. We also analyze the phylogenetic distribution of the PUA domain in different species and proteins. Finally, we performed a structural comparison of the PUA domain through data mining of structural databases and identified a structural motif that is conserved despite sequence differences, even among eukaryotes, archaea and bacteria. All data discussed in this review, both bibliographic and analytical, corroborate the functional importance of the PUA domain in RNA-binding proteins.
pseudouridine synthase and archaeosine transglycosylase
RNA binding domains
RNA-binding proteins are fundamental to many aspects of gene expression and cellular function, including important post-transcriptional processes, and are involved in every step of RNA metabolism. Most of them are composed of small RNA-binding domains (RBDs) that are needed for their recruitment to specific RNA targets. In recent years, determination of the structures of RNA–protein complexes by crystallography and NMR has increased our knowledge of these proteins and their domains. RNA-recognition modes have been extensively studied and are highly versatile. Diverse RBD families are currently known (Table 1), including the pseudouridine synthase and archaeosine transglycosylase (PUA) domain [8, 9]. Table 1 lists the RBDs found in three protein databases (Pfam, SMART and PROSITE).
Table 1. RNA-binding domains.
Domain | Description | Examples
RNA recognition motif (RRM, RBD, RNP domain) | Found in a variety of RNA-binding proteins. Also appears in a few single-stranded DNA-binding proteins | U1A, PABP, HuD, protein components of small nuclear RNPs
Double-stranded RNA-binding motif (dsRBD) | Found in a variety of RNA-binding proteins with different structures and a diversity of functions [2, 3] | RNA adenosine deaminase, endoribonuclease Dicer
K homology RNA-binding domain (KH) | First identified in the human heterogeneous nuclear ribonucleoprotein (hnRNP) K. Binds RNA and may function in RNA recognition [4, 5] | PNPases, RS3 ribosomal proteins, human onconeural ventral antigen-1 (NOVA-1)
Piwi, Argonaute and Zwille (PAZ) | Function unknown, but suggested to mediate complex formation between proteins of the Piwi and Dicer families by heterodimerization [6, 7] | Argonaute and Piwi proteins, proteins involved in gene silencing
Pseudouridine synthase and archaeosine transglycosylase (PUA) | Highly conserved and found in a wide range of archaeal, bacterial and eukaryotic proteins. Has a common RNA recognition surface, with some versatility in the way the motif binds RNA [8, 9] | Enzymes that catalyze tRNA and rRNA post-transcriptional modifications, dyskerin, eIF2D
Oligonucleotide/oligosaccharide binding (OB) fold | Five-stranded β sheet coiled to form a closed β barrel, capped by an α helix located between the third and fourth strands. Clan (42 families) | RNase E, RNase II, NusA, BEM-5, ribonuclease G
Pumilio homology domain (PUF) | Composed of eight tandem repeats that are sufficient for sequence-specific RNA binding [11, 12] | Fly Pumilio, worm FBF-1, worm FBF-2
U1-like zinc finger (Znf) (ZnF_U1) | Family of C2H2-type Znf. Binds DNA, RNA, protein and/or lipid substrates [13, 14] | TIS11d, HIV-1 NC, U1 small nuclear ribonucleoprotein C
Co-antiterminator domain (CAT_RBD) | Found at the N-terminal end of transcriptional antitermination proteins. Binds the ribonucleotidic antiterminator hairpin [15, 16] | BglG, SacY, LicT
 | Involved in formation of the telomerase holoenzyme, in addition to recognition and binding of RNA | 
S4 RBD (S4) | Small domain of 60–65 amino acid residues [8, 18] | Ribosomal proteins S4 and S9, RNA methylases
RBD abundant in apicomplexans (RAP) | Consists of multiple blocks of charged and aromatic residues in eukaryotic proteins [19, 20] | Human hypothetical protein MGC5297, mammalian FAST
Thiouridine synthases, methylases and pseudouridine synthases (THUMP) | Adopts an α/β fold similar to the C-terminal domain of translation initiation factor 3 [21, 22] | 4-Thiouridine synthases, pseudouridine synthases and RNA methylases
Several enzymes act on RNA; although they cannot be considered a single protein family, they share the common function of enzymatically modifying RNA. In these proteins, the enzymatic domain itself binds RNA and contributes to enzyme specificity. This group of enzymes includes pseudouridine (Ψ) synthase, which catalyzes the isomerization of uridine to Ψ in a variety of RNA molecules and may also function as an RNA chaperone. The domain most commonly associated with these proteins, the PUA domain, is also found in diverse proteins, including archaeal sulfate reductases, bacterial and yeast glutamate kinases, and proteins involved in ribosome biogenesis and translation initiation.
Ψ synthases from archaea, bacteria and eukarya can be classified into six families: five named after the Escherichia coli enzymes RluA, RsuA, TruA, TruB and TruD, and the Pus10 family, which has no E. coli member [23, 24]. These enzymes are substrate specific: they recognize the target uridine in the context of RNA, using the sequence or structural context of the target site. Only the TruB family has a C-terminal PUA domain. In bacteria, TruB family synthases can perform their function without any accessory factors; by contrast, their eukaryotic and archaeal counterparts function as part of ribonucleoprotein (RNP) complexes. The defining molecules of these RNPs are small nucleolar RNAs (snoRNAs), which are divided into H/ACA and C/D classes and function predominantly in the modification of rRNA. H/ACA RNPs were identified as small nucleolar RNPs (snoRNPs), but depending on their site of maturation and action, they can also be classified as small Cajal body RNPs [25, 26]. There are ~200 different H/ACA RNAs, which associate with the same four core proteins to form an H/ACA RNP: dyskerin (Cbf5 in yeast, Nap57 in rodents and Nop60B in Drosophila melanogaster), Gar1, Nhp2 (L7Ae in archaea) and Nop10 [27, 28]. These proteins are evolutionarily highly conserved, with orthologs in eukaryotes and archaea. Dyskerin provides the Ψ synthase catalytic activity of the H/ACA RNP complex, and the accessory proteins are required to bind its target RNA and trigger its enzymatic activity. Although the main function of H/ACA RNPs is Ψ synthesis, these proteins also participate in specific cleavage of the 18S rRNA precursor, and dyskerin is also needed for the normal function of the telomerase RNP and for telomere maintenance.
The main characteristics of PUA domains, their biological functions and their architecture were last reviewed in 2007. Here, we provide an update based on bioinformatics tools and data mining analysis, including a complete list of all known proteins with a PUA domain and a description of their biological functions, sequence homology and underlying structures. We also aim to expand knowledge of the molecular basis of RNA recognition by the PUA domain and of its evolution in different proteins.
Results and discussion
Major features of the PUA domain
The PUA domain is a highly conserved RNA-binding motif found in a wide range of archaeal, bacterial and eukaryotic proteins. These include proteins involved in ribosome biogenesis and translation, enzymes that catalyze tRNA and rRNA post-transcriptional modifications, and enzymes involved in proline biosynthesis. The domain has been detected in archaeal and eukaryotic Ψ synthases, archaeal archaeosine synthases (TGTs) and a family of predicted archaeal and bacterial rRNA methylases. In addition, the PUA domain has been detected in a family of eukaryotic proteins that constitute a novel type of translation factor, and in bacterial and yeast glutamate kinases (G5Ks).
PUA was initially defined as a compact domain with α/β architecture that presents a common RNA recognition surface. However, the length of the PUA domains currently detected by Pfam, PROSITE and CDD varies between 64 and 96 amino acids, depending on the protein family. The PUA domain contains highly conserved motifs centered on stretches of hydrophobic residues, as well as three highly conserved positions occupied by glycines or small amino acids (Fig. 1). The overall sequence logo (Fig. 1), created from an alignment of all retrieved PUA domain sequences, shows the conservation of these residues. All proteins studied presented between 10 and 14 polar residues, most of which are glycines (five of these glycines are conserved in position in the overall alignment), and between 2 and 7 conserved acidic residues. The crystal structure of archaeosine tRNA–guanine transglycosylase from Pyrococcus horikoshii showed that the PUA domain consists of two α helices and six β strands and folds into a β-sandwich structure similar to the oligonucleotide-binding fold.
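A sequence logo such as Fig. 1 scales each alignment column by its information content. As an illustration only, the sketch below computes per-column information content in bits, assuming an alignment of equal-length protein sequences with "-" as the gap character. Gaps are excluded from the residue frequencies and no small-sample correction is applied (logo programs usually add one). The toy alignment is hypothetical and is not PUA data.

```python
import math
from collections import Counter

MAX_BITS = math.log2(20)  # maximum information for the 20-letter amino acid alphabet


def column_information(alignment, gap="-"):
    """Per-column information content (bits) of equal-length aligned sequences.

    Information = log2(20) - Shannon entropy of the residues in the column.
    Gaps are ignored and no small-sample correction is applied.
    """
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for i in range(length):
        residues = [seq[i] for seq in alignment if seq[i] != gap]
        if not residues:
            scores.append(0.0)  # an all-gap column carries no information
            continue
        total = len(residues)
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in Counter(residues).values())
        scores.append(MAX_BITS - entropy)
    return scores


# Hypothetical toy alignment (not real PUA sequences); column 0 is an invariant glycine.
toy = ["GVLA", "GILS", "GVPA", "G-LT"]
for position, bits in enumerate(column_information(toy)):
    print(position, round(bits, 2))
```

An invariant column, like the conserved glycine positions described above, reaches the maximum of log2(20) ≈ 4.32 bits, while variable columns score lower.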
At the time of writing, the Pfam database listed the PUA domain in 3208 sequences from 2084 species. Pfam records interactions between this domain and five other domains (Nop10p, TGT, TruB_N, DKCLD and DUF1947), and 26 structures deposited in the PDB contain the domain.
PUA domain and global proteome
As mentioned above, the PUA domain is found in proteins from species of all three superkingdoms. From this universe of species, we selected a representative set of 20 to analyze the distribution and characteristics of the PUA domain: Homo sapiens, Rattus norvegicus, Mus musculus, Bos taurus, Gallus gallus, Danio rerio, Drosophila melanogaster, Arabidopsis thaliana, Anopheles gambiae, Aedes aegypti, Caenorhabditis elegans, Schizosaccharomyces pombe, Saccharomyces cerevisiae, Kluyveromyces lactis, Escherichia coli, Salmonella enterica, Shigella dysenteriae, Pyrococcus furiosus, Pyrococcus abyssi and Methanocaldococcus jannaschii. We found eight groups of PUA-domain-containing proteins, five of which are present in most of the organisms studied. Table 2 summarizes the function of these characteristic proteins and the specific role of the PUA domain in each. PUA-domain-containing proteins can have different overall functions because they are modular: the conserved PUA domain appears as a single architectural unit or in diverse combinations with a variety of other domains (Fig. 2), while retaining its ability to bind RNA.
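The modularity described above can be made concrete by storing each architecture as an ordered list of domains. The sketch below is illustrative only: the domain orders follow the descriptions of dyskerin, TruB, MCT1, eIF2D and Nip7 given in this section, "Nip7_N" is a hypothetical label for the N-terminal Nip7 domain, and the function names are our own. It classifies where the single PUA domain lies in each protein.

```python
from collections import defaultdict

# Domain order (N- to C-terminus) as described in the text of this review.
# "Nip7_N" is a hypothetical label for the N-terminal domain of Nip7.
ARCHITECTURES = {
    "dyskerin": ["DKCLD", "TruB_N", "PUA"],
    "TruB": ["TruB_N", "PUA"],
    "MCT1": ["eIF2D_N", "PUA"],
    "eIF2D": ["eIF2D_N", "PUA", "SWIB/MDM2", "SUI1"],
    "Nip7": ["Nip7_N", "PUA"],
}


def pua_position(domains):
    """Classify where the single PUA domain lies in an ordered domain list."""
    if domains.count("PUA") != 1:
        raise ValueError("expected exactly one PUA domain per protein")
    index = domains.index("PUA")
    if len(domains) == 1:
        return "sole domain"
    if index == 0:
        return "N-terminal"
    if index == len(domains) - 1:
        return "C-terminal"
    return "internal"


def group_by_position(architectures):
    """Group protein names by the position of their PUA domain."""
    groups = defaultdict(list)
    for protein, domains in architectures.items():
        groups[pua_position(domains)].append(protein)
    return dict(groups)


print(group_by_position(ARCHITECTURES))
```

Raising an error for more than one PUA domain reflects the observation that the domain occurs as a single copy per protein.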
Table 2. Function and features of the eight characteristic proteins with a PUA domain. Alternative names and abbreviations used in this work are indicated for each protein, together with the protein function, the role of the PUA domain in the specific protein and the organisms in which the protein is found.
Role of PUA domain
Alternative name: H/ACA small nucleolar ribonucleoprotein (H/ACA small nucleolar RNP) subunit 4, Cbf5, NAP57, Nop60B
Pseudouridine synthase and H/ACA ribonucleoprotein
Binds to the ACA motif of H/ACA RNA. Implicated in the stable anchoring of dyskerin to tRNAs 
Alternative name: MCT-1 (multiple copies T-cell malignancies 1); MCTS-1 protein (malignant T-cell-amplified sequence 1), Tma20
Ribosome biogenesis and translational regulation
Required for cap complex binding. The interaction with m7GTP through the PUA domain requires the presence of eIF4E
All except E. coli, Salmonella enterica and Shigella dysenteriae
May be specifically responsible for archaeosine incorporation at different sites in tRNA molecules 
Only in: P. furiosus, P. abyssi and M. jannaschii
Phosphoadenosine phosphosulfate reductase
Alternative name: PUA domain-phosphoadenosine phosphosulfate reductase proteins
Involved in taxa-specific RNA modifications.
Involved in base biosynthesis through sulfate activation and reduction [9, 42]
Only in: P. furiosus, P. abyssi and M. jannaschii
The eight groups are given below.
TGTs and phosphoadenosine phosphosulfate reductases
These proteins are present only in archaea and are distantly related to the family of queuine-specific TGTs [35, 41] and to the phosphoadenosine phosphosulfate reductase domain-containing proteins of bacteria and eukaryotes, which lack the PUA domain.
Glutamate-5-kinases (G5Ks) are present in all species studied, but only the G5Ks of Schizosaccharomyces pombe, S. cerevisiae, K. lactis, Salmonella enterica, Shigella dysenteriae and E. coli contain a PUA domain. G5K catalyzes the first, controlling step in the synthesis of the osmoprotective amino acid proline and is feedback-inhibited by proline. In E. coli, the PUA domain of G5K modulates the function of the amino acid kinase domain and can expose new surfaces upon proline binding. In higher eukaryotes, G5K activity is part of a bifunctional Δ1-pyrroline-5-carboxylate synthase that lacks the PUA domain [40, 43].
Dyskerin is a highly conserved nucleolar protein. Much is known about dyskerin because mutations in this protein cause dyskeratosis congenita, an inherited disorder whose clinical manifestations include skin hyperpigmentation (dark patches), oral leukoplakia (white patches inside the mouth) and nail dystrophy (abnormally formed nails). These mutations reduce telomerase activity, limiting the proliferative capacity of stem cells. In high-turnover cell populations, this reduced proliferative capacity leads to low blood and immune cell counts, resulting in aplastic anemia. Like all small nucleolar RNP-associated proteins, dyskerin is phylogenetically highly conserved, and it is found in all species studied. Data from studies in mice, rats and humans indicate that dyskerin is involved in at least three basic processes: maintenance of telomere integrity, ribosome biogenesis and function, and pseudouridylation of various cellular RNAs. Furthermore, recent reports suggest a role for dyskerin in the regulation of a subset of microRNAs and in translational control of mRNA. Human dyskerin is a 514-amino-acid protein; its homologs in archaea (Cbf5, 340 amino acids) and bacteria (TruB, 314 amino acids) are shorter. All of these proteins contain the TruB_N and PUA domains (in E. coli TruB, the PUA domain is currently identified as TruB–C2 by several databases). In addition, dyskerin and Cbf5 contain a dyskeratosis congenita-like domain (DKCLD) of unknown function that is typical of this protein family (Fig. 2). Human dyskerin also contains nuclear and nucleolar localization signals, which, as expected, are absent from its archaeal and bacterial homologs. TruB_N (TruB family pseudouridylate synthase N-terminal domain) is the catalytic domain and harbors the conserved aspartic acid 125, which marks the active site.
The dyskeratosis congenita-like domain is an N-terminal domain associated with the TruB_N/PUA domains of dyskerin-like proteins. As mentioned in the Introduction, members of the TruB_N family are involved in modifying bases in RNA molecules. This group consists of eukaryotic, bacterial and archaeal pseudouridine synthases similar to human dyskerin, S. cerevisiae Cbf5 and D. melanogaster Mfl (minifly protein), and includes TruB [48, 49]. Nuclear and nucleolar localization signals have very similar amino acid compositions, but the cell recognizes them as distinct signals. Proteins containing a joint nucleolar–nuclear localization signal, such as human dyskerin, can cross the nuclear envelope and accumulate in the nucleolus. Human dyskerin was first defined as a nucleolar protein with strict nucleolar localization (it is initially found in the nucleoplasm and then translocates sequentially to the nucleoli and to the coiled bodies). However, a recent study has indicated that the human DKC1 gene produces a new alternatively spliced mRNA (isoform 3) encoding a dyskerin variant that lacks the C-terminal nuclear localization signal and shows an unexpected cytoplasmic localization. Consistent with this, dyskerin expression has been described in both the cytoplasm and the nuclei of basal or parabasal cells in cervical lesions, and dyskerin staining in tumor cells has been localized mainly to the nuclei and partly to the cytoplasm. Further research is required to characterize the nucleolar and cytoplasmic forms of dyskerin. The biological functions attributed to dyskerin may all be related to its ability to bind and stabilize H/ACA snoRNAs, and binding of dyskerin to these snoRNAs via the PUA domain is essential for their stabilization.
Malignant T-cell-amplified sequence 1 (MCT1) is an oncogene initially identified in a human T-cell lymphoma; it can induce cell proliferation and activate survival-related pathways. MCT1 contains two domains: an N-terminal eIF2D_N domain (named for eIF2D, malignant T-cell-amplified sequence 1 and related proteins) and a C-terminal PUA domain (Fig. 2). The eIF2D_N domain, initially called DUF1947 by Pfam, is also found in eIF2D and in uncharacterized archaeal proteins, and was first identified in archaeal MCT1 homologs. MCT1 interacts with the cap complex via its PUA domain and then recruits the density-regulated protein (DENR/DRP), which contains the SUI1/eIF1 domain (also found in the translation initiation factor eIF1). This alters the translational profile of human cells, suggesting that the oncogenic activity of MCT1 may be linked to translation initiation or regulation of its target mRNAs. Matte-Tailliez et al. used a comparative genomics approach to identify an MCT1 homolog in the archaeon P. abyssi, and in our search we found MCT1 homologs in all species studied except E. coli, Salmonella enterica and Shigella dysenteriae. MCT1 therefore appears to be a highly conserved oncogene with the critical biological function of promoting cell proliferation. MCT1 and DENR in combination (MCT1/DENR), together with eIF2D (initially named Ligatin), are three related mammalian proteins with a dual function. These proteins can potentially substitute for the canonical initiation factor eIF2 when its activity is downregulated.
Although the functional role of eIF2D remains obscure, it is known to participate in the translation initiation of specific mRNAs. However, we cannot exclude the participation of eIF2D in the regulation of some general translational events (initiation, elongation or termination) under specific conditions or in specific cells. We found no similar proteins in the archaea and bacteria examined. eIF2D contains the eIF2D_N domain, the PUA domain and the C-terminal domains SWIB/MDM2 and SUI1/eIF1 (also called the eIF2D_C domain); these C-terminal domains also occur in DENR. eIF2D, MCT1/DENR and their yeast orthologs Tma20/Tma22 have been implicated in translation on the basis of bioinformatic, proteomic and overexpression/silencing analyses [57, 60]. Many PUA domains bind dsRNA through either the major or the minor groove, but others, including those of MCT1, eIF2D and Nip7, lack most or all of the conserved basic residues involved in these interactions; their potential contacts with rRNA must therefore involve different residues and may have an altered specificity. The eIF2D PUA domain contains six highly conserved prolines and five highly conserved leucines and, unlike the other PUA domains, shows little conservation of acidic and basic residues. The eIF2D_N domain of P. horikoshii PH0734.1 has an exposed electropositive cluster, also present in MCT1 and eIF2D, which might mediate an interaction with the ribosome.
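Residue-class statements such as the proline, leucine, acidic and basic counts above reduce to simple tallies over a domain sequence. A minimal sketch follows; the residue classes are a simplifying assumption (His is not counted as basic), and the example fragment is invented rather than taken from a real eIF2D or Nip7 sequence.

```python
from collections import Counter

# Residue classes are a simplifying assumption: His is left out of "basic"
# because it is only partially protonated at physiological pH.
CLASSES = {
    "proline": set("P"),
    "leucine": set("L"),
    "glycine": set("G"),
    "acidic": set("DE"),
    "basic": set("KR"),
}


def residue_profile(sequence):
    """Count residues of each class in a one-letter amino acid sequence."""
    counts = Counter(sequence.upper())
    return {name: sum(counts[aa] for aa in members)
            for name, members in CLASSES.items()}


# Hypothetical fragment, not a real PUA domain sequence.
print(residue_profile("PLGKDLPERGLP"))
# {'proline': 3, 'leucine': 3, 'glycine': 2, 'acidic': 2, 'basic': 2}
```

Applied column by column to an alignment rather than to a single sequence, the same tally indicates whether such residues are conserved in position.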
The Nip7 protein is involved in ribosome biogenesis and is required for proper 27S pre-rRNA processing and 60S ribosomal subunit assembly. Yeast Nip7 interacts with nucleolar proteins. Homo sapiens Nip7 (also called KD93) belongs to the UPF0113 family, and its possible molecular function is related to RNA binding. The crystal structures of P. abyssi and H. sapiens Nip7 revealed a monomeric protein composed of two interlinked α/β domains. The N-terminal domain is formed by a five-stranded antiparallel β sheet surrounded by three α helices and a 3₁₀ helix, whereas the C-terminal domain corresponds to the conserved PUA domain. The PUA domain has been shown experimentally to mediate the interaction between Nip7 and RNA in yeast and archaea. However, the RNA-binding regions of human Nip7 and of other PUA-containing proteins differ in amino acid composition and electrostatic surface, indicating that they may have different RNA-binding modes. Like that of dyskerin, the Nip7 PUA domain has conserved basic residues in addition to the characteristic residues of the PUA domain (7–10 residues).
Nsun6 is present in most of the species studied, such as Anopheles gambiae, Aedes aegypti, C. elegans, S. cerevisiae, Schizosaccharomyces pombe and K. lactis. It is the least characterized of the eight proteins presented in Table 2. It has two domains, NOL1/NOP2/Sun and PUA (detected with Pfam and PROSITE), and is classified as a class I S-adenosyl-L-methionine-dependent methyltransferase by SCOP and CDD. Of the proteins belonging to the NOL1/NOP2/Sun family, Nsun2 is the most studied because it has been linked to diseases such as autosomal recessive intellectual disability [39, 62]. Phylogenetic analysis of the proteins in this family indicates that Nsun6 is more closely related to NOP2 than to the other members. The S-adenosyl-L-methionine-dependent methyltransferases typified by the hypothetical protein MJ1653 from M. jannaschii are characterized by an N-terminal RNA-binding PUA domain and a C-terminal AdoMet-MTase catalytic domain of the class I Rossmann-fold-like type.
Comparative analysis of the PUA domain between the different organisms
Many common RNA-binding protein folds are observed in ancient RNA-binding proteins (such as aminoacyl-tRNA synthetases). However, despite obvious structural similarities, sequence homology is sometimes insufficient to detect a direct evolutionary relationship between RNA-binding motifs. Therefore, to produce a more comprehensive review, we carried out a ‘qualitative comparative analysis’ following the guidelines of the Center for Reviews and Dissemination. We searched for similar sequences using several methods, including sequence similarity, hidden Markov models (HMMs) and structural similarity. Once the sequences from all 20 species had been recovered, we tried to construct a phylogeny based only on the PUA domain sequences. We made many attempts using different approaches and algorithms, including distance-based, maximum parsimony and maximum likelihood methods, but could not consistently cluster the entire collection. This may have been due to the highly variable levels of sequence identity between some of the species (5–100%), which did not allow us to obtain phylogenetic trees with significant node support.
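The 5–100% range above refers to pairwise sequence identity. Percent identity depends on the denominator chosen; the sketch below assumes two pre-aligned sequences and divides by the number of columns in which neither sequence has a gap, which is one common convention among several.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent identity of two pre-aligned sequences of equal length.

    The denominator is the number of columns where neither sequence has a gap;
    other conventions divide by the alignment length or the shorter sequence.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not compared:
        return 0.0
    identical = sum(1 for a, b in compared if a == b)
    return 100.0 * identical / len(compared)


print(round(percent_identity("GV-A", "GILA"), 1))  # 2 of 3 gap-free columns match: 66.7
```

Because the value shifts with the convention and with alignment quality, identities near the low end of the range are hard to distinguish from chance similarity, which is one reason tree building becomes unstable.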
We eventually turned to blastclust and clans as more appropriate tools for clustering this highly divergent set of sequences. Both programs gave similar results, showing that the PUA domain clusters correlate quite well with the different protein architectures (Fig. 3). PUA domain sequences from the dyskerin, MCT1, eIF2D, Nip7 and Nsun6 proteins formed distinct clusters at higher cut-offs and are thus more closely related. Moreover, in the clans analysis, a higher-level cluster comprises those of the first three proteins mentioned above. The PUA domain sequences of Nip7 and Nsun6 from M. jannaschii and of Nsun6 from A. thaliana were not grouped into any cluster by blastclust, but clans included them in the Nip7 and Nsun6 clusters. Furthermore, the PUA domain sequences of the MCT1 proteins of B. taurus, H. sapiens, M. musculus and R. norvegicus are so closely related that blastclust cannot separate them: they cluster together even at 100% identity. The same is true for the PUA domain sequences of H. sapiens and B. taurus dyskerin and of R. norvegicus and B. taurus Nip7.
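blastclust groups sequences by single linkage: two sequences are linked when their BLAST alignment meets an identity (or score-density) threshold and a length-coverage threshold, and clusters are the connected components of the resulting graph. The sketch below reproduces only that linkage step on precomputed pairwise values; it does not run BLAST, the 0.7 coverage default mirrors the 70% coverage given in Materials and methods, and the sequence names and scores are hypothetical. It shows why identical sequences remain together even at a 100% identity cut-off, and why a pair below the coverage threshold is never linked.

```python
def single_linkage_clusters(names, pairs, min_identity, min_coverage=0.7):
    """Cluster sequences by single linkage over precomputed pairwise scores.

    pairs maps (name_a, name_b) -> (percent_identity, coverage), where coverage is
    assumed to be the fraction of both sequences covered by the alignment.
    """
    parent = {name: name for name in names}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path halving keeps trees shallow
            name = parent[name]
        return name

    for (a, b), (identity, coverage) in pairs.items():
        if identity >= min_identity and coverage >= min_coverage:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_b] = root_a  # merge the two components

    clusters = {}
    for name in names:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(members) for members in clusters.values())


# Hypothetical sequences and scores, for illustration only.
names = ["seqA", "seqB", "seqC", "seqD", "seqE"]
pairs = {
    ("seqA", "seqB"): (100.0, 1.0),
    ("seqB", "seqC"): (100.0, 1.0),
    ("seqD", "seqE"): (22.0, 0.9),
    ("seqC", "seqD"): (30.0, 0.5),  # fails the coverage threshold, never linked
}
for cutoff in (100.0, 20.0):
    print(cutoff, single_linkage_clusters(names, pairs, cutoff))
```

Lowering the identity cut-off can only merge clusters, never split them, which is why the dyskerin, MCT1, eIF2D, Nip7 and Nsun6 groups appear as distinct clusters only at the higher cut-offs.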
The PUA domain of dyskerin shows a high level of evolutionary conservation. As can be seen in Fig. 3, the TruB proteins form a cluster separate from the rest of the dyskerin-like proteins. This is consistent with the literature: despite their low sequence similarity, dyskerin shows strong structural similarity to TruB (a bacterial pseudouridine synthase) and for that reason is considered to be the pseudouridine synthase of H/ACA small nucleolar RNPs [67, 68]. However, these proteins differ in some respects: the PUA domain of TruB makes nonsequence-specific contacts with the acceptor stem of tRNA; the PUA domain of dyskerin is considerably larger than that of TruB; and the angle between the PUA domain and the core Ψ synthase fold extends the active-site cleft in the other direction.
Structural comparison of PUA-domain-containing proteins
The PUA domain structure consists of six β strands (β1–β6) and two short α helices (α1–α2). The DALI server detected a total of 197 protein structures (Z-score > 3), each belonging to one of the groups listed in Table 2. Structural homologs (Z-score > 7) identified by the DALI server show high similarity in overall PUA domain structure. To further illustrate this conformational similarity, we aligned the PUA domain structures using the S. cerevisiae structure as a template (Fig. 4C,D). The 16 PUA domain structures can be closely aligned, with an r.m.s.d. below 1.6 Å.
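The r.m.s.d. values quoted here come from dali superpositions. For readers unfamiliar with the measure, the sketch below computes the r.m.s.d. between two sets of already-matched atom coordinates (for example Cα atoms paired by a structural alignment) after optimal rigid-body superposition with the Kabsch algorithm. It assumes numpy and that atom correspondence is known in advance; it does not perform the structural alignment itself, which is the step dali carries out.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """r.m.s.d. between matched N x 3 coordinate sets after optimal superposition.

    Rows of P and Q must already correspond (e.g. aligned C-alpha atoms);
    the result is in the same units as the input coordinates (usually Å).
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both be N x 3 arrays")
    P = P - P.mean(axis=0)  # remove translation by centering both sets
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q  # 3 x 3 covariance matrix
    U, _, Vt = np.linalg.svd(H)
    d = np.sign(np.linalg.det(Vt.T @ U.T))  # -1 would indicate a reflection
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # proper rotation taking P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


if __name__ == "__main__":
    coords = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [1.5, 1.5, 0.0], [0.0, 1.5, 1.0]])
    shifted = coords + np.array([10.0, -3.0, 2.0])
    print(kabsch_rmsd(coords, shifted))  # translation only, so the value is close to 0
```

The determinant check prevents the superposition from mirroring one structure, since a mirror image is not a physically meaningful fit.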
Compared with its archaeal counterparts, eukaryotic dyskerin is longer and contains an N-terminal extension (NTE) and a C-terminal extension. These two extensions are highly conserved in eukaryotes and harbor many pathologic mutations. Using the archaeal dyskerin structure as a template, Rashid et al. proposed a structural model for human dyskerin. Although the model is incomplete, because the archaeal protein lacks the NTE and C-terminal extension regions, it shows that most pathologic dyskerin mutations converge on the same side of the PUA domain. This suggests that these mutations may affect binding to substrate RNAs or to an as yet unidentified partner in the complex. The crystal structure of yeast dyskerin (Cbf5) in complex with Gar1 and Nop10 showed that the NTE folds into a new structural layer covering the PUA domain, whereas the C-terminal extension is disordered. The structures reported by Li and colleagues [70, 71] revealed the role of the NTE: it is partially structured and forms a new architectural layer that covers the β barrel of the PUA domain and expands the eukaryotic PUA domain. A portion of the NTE is encircled by the N-terminal residues of the PUA domain and packs against the surface of the β barrel via hydrophobic interactions. A structural model of human dyskerin built using yeast Cbf5 as a template would certainly provide useful information.
Since 2007, the number of databases and the amount of available information about proteins and their functions have increased significantly. Analysis of the updated databases allowed us to identify all currently known PUA-domain-containing proteins, corroborating previous findings by Pérez-Arellano et al. and adding important information that was unavailable at that time.
This review illustrates the diversity in the distribution, protein architecture and sequence characteristics of PUA-domain-containing proteins. We surveyed public sequence and structure databases with multiple search tools to identify all proteins containing the PUA domain. The PUA domain is found in a wide variety of proteins, and we detected eight protein groups in different species from the three superkingdoms. Five of them (dyskerin, MCT1, eIF2D, Nsun6 and Nip7) are present in H. sapiens and have orthologs in most species, whereas the other three (G5K, TGTs and phosphoadenosine phosphosulfate reductases) are characteristic of archaea or bacteria. The domain occurs as a single copy per protein, either as a single architectural unit or in combination with a variety of other domains. PUA-containing proteins function as post-transcriptional RNA-modifying enzymes, translation initiation factors, glutamate kinases (in bacteria and yeast) and proteins involved in ribosome biogenesis.
Structural analysis of the PUA domain reveals high levels of conservation across different PUA-domain-containing proteins, even among different species, which attests to the importance of this protein group. Because PUA-containing proteins have important roles in the cell, their deregulation can be expected to result in aberrant cell phenotypes. Thus, a better understanding of PUA domain function in pathology, as well as in normal cell physiology, might lead to novel therapies.
Given the wealth of information available about PUA-containing proteins, one of the main questions for the future is how PUA domains became part of proteins with such diverse architectures while maintaining similar RNA-binding properties. In addition, how the activity of the PUA domain is regulated and coordinated by the cell is largely unknown and needs to be investigated. Further research is warranted to improve our understanding of this domain and its impact on physiological and pathological processes.
Materials and methods
Protein sequences corresponding to dyskerin, MCT1, eIF2D, Nip7 and Nsun6 were retrieved from the GenBank database using the blastp and psi-blast programs. Sequences from the following species were recovered: H. sapiens, R. norvegicus, Mus musculus, Bos taurus, Gallus gallus, Danio rerio, D. melanogaster, A. thaliana, Anopheles gambiae, Aedes aegypti, C. elegans, Schizosaccharomyces pombe, S. cerevisiae, K. lactis, E. coli, Salmonella enterica, Shigella dysenteriae, P. furiosus, P. abyssi and M. jannaschii.
HMM and phylogeny
To retrieve all proteins containing the PUA domain from the databases, we generated HMMs. HMMs were built from alignments of PUA domains with the hmmbuild program, and searches were carried out with the hmmsearch program from the hmmer package. Results from hmmsearch and blast were compared, and additional PUA domains were selected. Phylogenetic and molecular evolutionary analyses were conducted using mega v. 5.
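The HMM search and its comparison with blast can be scripted. The sketch below assumes HMMER 3 is installed and that hmmsearch was run with its `--tblout` option (for example `hmmsearch --tblout hits.tbl pua.hmm proteome.fasta`, after `hmmbuild pua.hmm pua_alignment.sto`); the file names, the E-value cut-off and the helper names are hypothetical. It parses the per-sequence table (target name in the first column, full-sequence E-value in the fifth) and partitions hits into those found by both methods or by only one.

```python
import subprocess


def run_hmmsearch(hmm_file, seq_db, tbl_out, evalue=1e-5):
    """Run hmmsearch (HMMER 3 must be on PATH), writing a per-sequence hit table."""
    cmd = ["hmmsearch", "--tblout", tbl_out, "-E", str(evalue), hmm_file, seq_db]
    subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)  # discard the long report


def parse_tblout(text, max_evalue=1e-5):
    """Map target name -> full-sequence E-value for rows at or below max_evalue."""
    hits = {}
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split()
        target, evalue = fields[0], float(fields[4])
        if evalue <= max_evalue:
            hits[target] = evalue
    return hits


def compare_hits(hmm_ids, blast_ids):
    """Partition sequence identifiers by which search method found them."""
    hmm_ids, blast_ids = set(hmm_ids), set(blast_ids)
    return {
        "both": hmm_ids & blast_ids,
        "hmm_only": hmm_ids - blast_ids,
        "blast_only": blast_ids - hmm_ids,
    }
```

In this scheme, the "hmm_only" set corresponds to candidate PUA domains that a profile search recovers but pairwise blast misses, which are then checked individually.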
Cluster analysis of the PUA domain
Clustering of the PUA domain dataset was performed using blastclust and clans. blastclust was run requiring pairwise alignments to cover at least 70% of the sequence length. Cluster analysis with clans was performed using PSIBLAST with the BLOSUM62 matrix and a cut-off value of 9.4926.
Structure similarity searches were conducted with the dali program, using the S. cerevisiae structure (3UAIA) as the query, both as the complete structure and as the isolated PUA domain. One hundred and nine entries with a Z-score > 8 were retrieved from the PDB using the dali structure superposition search. Fifteen additional PUA domain structures were selected (PDB ID: 3ZV0D, 1T5YA, 3U28A, 1SQWA, 3LWPA, 2EY4A, 3R90A, 3HAXA, 2J5TD, 1K8WA, 3MQKA, 2APOA, 2AUSA, 2P38B and 2FRXA). Structural superpositions, sequence alignments and r.m.s.d. values were generated and calculated with dali, followed by manual inspection and adjustment. Molecular representations were prepared with jmol (SourceForge, Dice Holdings Inc.; Phoenix, AZ, USA) and pymol (Schrödinger, LLC; Portland, OR, USA), which can be downloaded from http://www.jmol.org and http://www.pymol.org, respectively.
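The identifiers listed above (for example 3UAIA or 3ZV0D) concatenate a four-character PDB code and a chain identifier. The sketch below, with hypothetical helper names and made-up Z-scores in the usage, splits such identifiers and keeps hits above a Z-score cut-off such as the Z > 8 used here. Retaining only the best-scoring chain per PDB entry is our own illustrative choice, not necessarily the procedure followed in this study.

```python
import re

# A PDB code (digit + three alphanumerics) followed by a chain identifier.
CHAIN_ID = re.compile(r"^([0-9][0-9A-Za-z]{3})([0-9A-Za-z]+)$")


def split_chain_id(identifier):
    """Split e.g. '3UAIA' into ('3UAI', 'A'); chain IDs keep their case."""
    match = CHAIN_ID.match(identifier)
    if match is None:
        raise ValueError(f"not a PDB code + chain identifier: {identifier!r}")
    return match.group(1).upper(), match.group(2)


def select_hits(hits, min_z):
    """Keep (identifier, Z) pairs with Z > min_z, best chain per entry, best first."""
    best = {}
    for identifier, z in hits:
        if z <= min_z:
            continue  # strict cut-off, as in "Z-score > 8"
        code, chain = split_chain_id(identifier)
        if code not in best or z > best[code][1]:
            best[code] = (chain, z)
    return sorted(((code, chain, z) for code, (chain, z) in best.items()),
                  key=lambda item: -item[2])


# Z-scores below are invented for illustration only.
example = [("3UAIA", 40.0), ("3UAIB", 38.5), ("1T5YA", 9.1), ("2APOA", 7.5)]
print(select_hits(example, min_z=8.0))
```

Collapsing chains of the same entry avoids counting identical copies of one structure several times when comparing numbers of hits.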
This study was supported by Grants from Quilmes National University (Argentina). DEG and PDG are researchers from CONICET. DEG is a member of the National Cancer Institute of Argentina.