To extend the structural coverage of eukaryotic genomes, we selected 288 open reading frames (ORFs) in the yeast genome with significant homology to mouse proteins. One of these, an allantoicase (YIR029W) from Saccharomyces cerevisiae, encodes a protein with a molecular weight of 38,581 Da (residues 1–343) and a calculated isoelectric point of 5.9. Allantoicase (EC 3.5.3.4), also known as allantoate amidinohydrolase, is involved in purine degradation and facilitates the utilization of purines as secondary nitrogen sources in nitrogen-limiting conditions.1 While purine degradation converges on uric acid in all vertebrates, its further degradation varies from species to species. Uric acid is excreted by birds, reptiles, and some mammals that do not have a functional uricase gene, whereas other mammals produce allantoin. Amphibians and microorganisms produce ammonia and carbon dioxide using the uricolytic pathway.2 Allantoicase performs the second step in this pathway, hydrolysing the linear amidine allantoate to (−)-ureidoglycolate and urea. Hydrolysis of the alternative substrate (+)-ureidoglycolate to glyoxylate and urea has also been observed.3 Although allantoicase activity is not detectable in mammals, birds, reptiles, and some fishes, these organisms still contain the gene for allantoicase, suggesting an alternative function.2, 4 Here, we report the crystal structure of YIR029W determined using the semiautomated high-throughput pipeline of the Joint Center for Structural Genomics (JCSG).5
The structure of YIR029W [Fig. 1(A)] was determined to 2.40-Å resolution using the multi-wavelength anomalous dispersion (MAD) method. Data collection, model, and refinement statistics are summarized in Table I. The final model includes one protein monomer (residues 1–57, 67–187, 194–284 and 292–343) and 239 water molecules. No electron density was observed for residues 58–66, 188–193 and 285–295. The Matthews coefficient (Vm)6 for YIR029W is 2.87 Å³/Da and the estimated solvent content is 56.8%. The Ramachandran plot, produced by PROCHECK 3.4,7 shows that 85.5% of the residues are in the most favored regions and 14.5% are in additional allowed regions.
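As a rough cross-check of the solvent content quoted above, the following sketch applies the commonly used Matthews approximation, in which the protein fraction of the crystal is about 1.23/Vm. The constant 1.23 Å³/Da and the function name `solvent_fraction` are assumptions made for illustration; the text does not state which constant or program was used to obtain 56.8%.

```python
def solvent_fraction(vm: float, protein_constant: float = 1.23) -> float:
    """Estimate the fractional solvent content of a crystal.

    vm               -- Matthews coefficient in cubic angstroms per dalton
    protein_constant -- assumed value (1.23), corresponding to a typical
                        protein partial specific volume of ~0.74 cm^3/g
    """
    if vm <= 0:
        raise ValueError("Matthews coefficient must be positive")
    return 1.0 - protein_constant / vm


# Vm as reported in the text for YIR029W
vm_reported = 2.87
estimate = solvent_fraction(vm_reported)
print(f"Estimated solvent content: {100 * estimate:.1f}%")
```

With the assumed constant, the estimate comes out near 57%, close to the reported 56.8%. The small difference presumably reflects a slightly different constant or rounding in the original calculation.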
| Unit cell parameters | a = b = 107.21 Å, c = 134.92 Å, α = β = 90°, γ = 120° |
| Resolution range (Å) | 41.97–2.40 | 38.35–2.75 | 38.35–2.75 | 38.35–2.75 |
| Number of observations | 94,513 | 85,239 | 85,236 | 84,753 |
| Number of reflections | 18,532 | 12,534 | 12,549 | 12,544 |
| Completeness (%) | 99.8 (98.8)ᵃ | 99.9 (99.9)ᵃ | 99.9 (100.0)ᵃ | 99.9 (99.7)ᵃ |
| Mean I/σ(I) | 13.0 (1.8)ᵃ | 10.0 (2.5)ᵃ | 5.2 (1.9)ᵃ | 9.1 (2.1)ᵃ |
| Rsym on Iᵇ | 0.099 (0.707)ᵃ | 0.065 (0.285)ᵃ | 0.077 (0.380)ᵃ | 0.074 (0.344)ᵃ |
| Highest resolution shell (Å) | 2.46–2.40 | 2.90–2.75 | 2.90–2.75 | 2.90–2.75 |
| Model and refinement statistics |
| Resolution range (Å) | 41.97–2.40 | Data set used in refinement | λ0 MADSe |
| Number of reflections (total) | 18,506 | Cutoff criteria | \|F\| > 0 |
| Number of reflections (test) | 830 | Rcrystᶜ | 0.175 |
| Completeness (% total) | 99.8 | Rfreeᵈ | 0.222 |
| Restraints (RMS observed) |
| Bond length | 0.017 Å |
| Average isotropic B-value | 57.3 Å² |
| ESU based on R valueᵉ | 0.28 Å |
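A few derived quantities that readers often look for can be computed directly from the table entries. The sketch below uses the values listed for the first data set (the one with the 41.97–2.40 Å resolution range) and the refinement statistics. The variable names are illustrative, and the derived numbers are simple ratios rather than values reported by the authors.

```python
# Values copied from Table I (first data set and refinement statistics)
observations = 94_513          # number of observations
unique_reflections = 18_532    # number of reflections
refl_total = 18_506            # reflections used in refinement (total)
refl_test = 830                # reflections in the test set
r_cryst = 0.175
r_free = 0.222

# Average multiplicity (redundancy): observations per unique reflection
multiplicity = observations / unique_reflections

# Fraction of reflections set aside for cross-validation (Rfree)
test_fraction = refl_test / refl_total

# Gap between Rfree and Rcryst, a common rough indicator of overfitting
r_gap = r_free - r_cryst

print(f"Multiplicity:   {multiplicity:.2f}")
print(f"Test set:       {100 * test_fraction:.1f}% of reflections")
print(f"Rfree - Rcryst: {r_gap:.3f}")
```

By these ratios, the data are measured about five times over on average, roughly 4.5% of reflections form the test set, and the Rfree to Rcryst gap is below 0.05.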
The YIR029W monomer contains 24 β-strands (β1–β24), one α-helix (H2), and six 3₁₀-helices (H1, H3–H7) [Fig. 1(A, B)]. The total β-strand, α-helical, and 3₁₀-helical content is 40.9%, 2.5%, and 6.6%, respectively. YIR029W contains an N-terminal strand-helix motif and two allantoicase repeats (AR), which form two similar β-sandwich domains.8 AR1 (residues 21–187) and AR2 (residues 194–351) are connected by a flexible linker (residues 188–193) [Fig. 1(A)]. AR1 folds into a β-sandwich composed of a four-stranded (A) and a five-stranded (B) antiparallel β-sheet: A with 1423 topology (β2, β6, β9, β12) and B with 12534 topology (β3, β5, β7, β8, β10). The A and B β-sheets are slightly crossed (∼40°) with respect to each other and bury a compact hydrophobic core. The β-strands are connected by extended loops, one of which (residues 58–66) is disordered in the crystal structure. In addition, a short two-stranded antiparallel β-sheet C (β4, β11) is flanked by two loops and helix H4. The N-terminal strand-helix motif (β1, H1, H2; residues 1–20) packs against β-sheet A, where β1 forms an additional antiparallel β-strand that is hydrogen bonded to β9. H2 forms part of the interface to AR2 through interaction with β14.
AR2 has a very similar fold to AR1. The two domains are related by an approximate two-fold axis and can be superimposed with a root-mean-square deviation (RMSD) of 1.04 Å for 115 residues with 40% sequence identity. AR2 folds into a β-sandwich composed of a four-stranded (D) and a five-stranded (E) antiparallel β-sheet: D with 1423 topology (β13, β18, β21, β24) and E with 12534 topology (β14, β17, β19, β20, β22) [Fig. 1(A, B)]. The β-strands are connected by extended loops, one of which is disordered (residues 285–295). In addition, a short three-stranded antiparallel β-sheet F (β15, β16, β23) is flanked by two loops and helix H6 next to the interface region with AR1.
A structural similarity search, performed with the coordinates of YIR029W using the DALI server,9 showed the best match to be the N-terminal domain of the human DNA repair protein XRCC1 (PDB:1xna),10 with an RMSD of 2.5 Å over 123 aligned residues with 17% sequence identity to the AR2 domain. XRCC1 is also similar to the AR1 domain, where the RMSD is 2.6 Å over 122 aligned residues with 16% sequence identity. Another structural homologue is the galactose-binding domain in a sialidase from M. viridifaciens (PDB:1euu),9 where the respective RMSDs for the AR1 and AR2 domains are 2.8 Å and 2.7 Å over 118 aligned residues with 10% and 11% sequence identity. None of the DALI hits contains a second AR domain, indicating that YIR029W is the first structure of a protein containing two allantoicase repeats. Models for YIR029W homologues can be accessed at http://www1.jcsg.org/cgi-bin/models/get_mor.pl?key=YIR029W.
The crystallographic packing in the YIR029W structure indicates that a hexamer is the biologically relevant oligomeric form. A hexamer (200 kDa), composed of two trimers (100 kDa), has also been reported in biophysical studies with the allantoicase from Chlamydomonas reinhardtii.1 The hexamer consists of two planar trimers stacked on top of each other to form a barrel-like structure with 3,2-symmetry. The hexamer measures 100 Å in diameter and 60 Å in height, with a 15 Å wide inner channel [Fig. 2(A)]. The interfaces in the trimer are formed by head-to-tail interactions between residues of the AR1 domain (Glu72, Arg75, Glu78, and Asp172) of one subunit and residues of the AR2 domain (Arg238, Arg240, Lys305, and Asp332) of the adjacent subunit [Fig. 2(A)]. The interface between the two trimers is formed by side-on interactions between residues of the AR1 domain (strand β3 and the loop region of residues 122–127) of one subunit and residues of the AR2 domain (loop region of residues 194–203 and β-sheet D) of the other subunit. The subunit interactions are stabilized by seven salt bridges and account for a buried surface area of 2444 Å² per monomer.
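The 3,2-symmetry of the hexamer (point group 32, also written D3) can be illustrated by generating its six rotations from one three-fold and one perpendicular two-fold axis. The sketch below is purely geometric. Placing the three-fold along z and the two-fold along x is an arbitrary choice, and `subunit_centre` is a hypothetical point rather than a coordinate from the structure.

```python
import math

IDENTITY = ((1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0))
# Two-fold rotation about the x axis (assumed orientation)
TWO_FOLD_X = ((1.0, 0.0, 0.0), (0.0, -1.0, 0.0), (0.0, 0.0, -1.0))


def rot_z(angle_deg):
    """Rotation matrix about the z axis (assumed three-fold axis)."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested tuples."""
    return tuple(
        tuple(sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3))
        for i in range(3)
    )


def same(a, b, tol=1e-9):
    """True if two 3x3 matrices agree within a tolerance."""
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(3) for j in range(3))


def generate_group(generators):
    """Close a set of rotation generators under multiplication."""
    ops = [IDENTITY]
    frontier = [IDENTITY]
    while frontier:
        new_ops = []
        for op in frontier:
            for g in generators:
                candidate = matmul(g, op)
                if not any(same(candidate, known) for known in ops):
                    ops.append(candidate)
                    new_ops.append(candidate)
        frontier = new_ops
    return ops


def apply(op, point):
    """Apply a rotation matrix to a 3D point."""
    return tuple(sum(op[i][k] * point[k] for k in range(3)) for i in range(3))


ops = generate_group([rot_z(120.0), TWO_FOLD_X])

# Hypothetical subunit centre (in angstroms), off both symmetry axes
subunit_centre = (30.0, 0.0, 15.0)
copies = [apply(op, subunit_centre) for op in ops]
upper = [p for p in copies if p[2] > 0]   # one trimer
lower = [p for p in copies if p[2] < 0]   # the other trimer
print(len(ops), len(upper), len(lower))
```

The three rotations about the three-fold axis produce one planar trimer, and the two-fold axis maps it onto the second trimer below it, matching the stacked-trimer description of the hexamer.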
Mapping the sequence conservation of 43 known allantoicases8 onto the YIR029W structure identifies two highly similar clusters of hydrophilic residues, one in AR1 (Glu72, Arg75, Asp82, Asn108, and Asp172) and one in AR2 (Glu235, Arg238, Asp246, Asn272, and Asp332) [Fig. 2(A,B)]. The two conserved clusters lie within the head-to-tail subunit interface [Fig. 2(A)]. The active site in the galactose-binding domain coincides with the conserved cluster in the allantoicase repeats of YIR029W, suggesting that the active site of YIR029W may be located in the subunit interface [Fig. 2(C)]. An alternative active site location is the deep crevice in the AR1-AR2 domain interface [Fig. 2(A)], which contains two strictly conserved residues (Asp24 and Arg179).
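The correspondence between the two conserved clusters can be made explicit by pairing the residues in the order they are listed above. The sketch below assumes that this listing order reflects equivalent positions in the two repeats, which the text implies but does not state outright. The variable names are illustrative.

```python
# Conserved cluster residues as listed in the text: (residue type, number)
ar1_cluster = [("Glu", 72), ("Arg", 75), ("Asp", 82), ("Asn", 108), ("Asp", 172)]
ar2_cluster = [("Glu", 235), ("Arg", 238), ("Asp", 246), ("Asn", 272), ("Asp", 332)]

# Pair residues by listing order (assumed to reflect structural equivalence)
pairs = list(zip(ar1_cluster, ar2_cluster))

# Check that each pair has the same residue type in both repeats
same_type = all(a[0] == b[0] for a, b in pairs)

# Sequence offset between equivalent positions in AR1 and AR2
offsets = [b[1] - a[1] for a, b in pairs]

for (name1, num1), (name2, num2) in pairs:
    print(f"{name1}{num1} <-> {name2}{num2} (offset {num2 - num1})")
print("Same residue types:", same_type)
```

Under this pairing, every position carries the same residue type in both repeats, and the offsets stay within a narrow range (160 to 164 residues), consistent with AR2 being an internal repeat of AR1.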
The YIR029W structure reported here is the first allantoicase structure determined by X-ray crystallography, in this case using the MAD method. The information reported here, in combination with further biochemical and biophysical studies, will yield valuable insights into the functional role of allantoicase in microorganisms, invertebrates, and vertebrates.