Crystal structure of an allantoicase (YIR029W) from Saccharomyces cerevisiae at 2.4 Å resolution


In order to extend the structural coverage of eukaryotic genomes, we selected 288 open reading frames (ORFs) in the yeast genome with significant homology to mouse proteins. One of these, an allantoicase (YIR029W) from Saccharomyces cerevisiae, encodes a protein with a molecular weight of 38,581 Da (residues 1–343) and a calculated isoelectric point of 5.9. Allantoicase (EC), also known as allantoate amidinohydrolase, is involved in purine degradation and allows purines to be used as secondary nitrogen sources under nitrogen-limiting conditions.1 While purine degradation converges on uric acid in all vertebrates, its further degradation varies from species to species. Uric acid is excreted by birds, reptiles, and some mammals that lack a functional uricase gene, whereas other mammals produce allantoin. Amphibians and microorganisms produce ammonia and carbon dioxide via the uricolytic pathway.2 Allantoicase performs the second step in this pathway, hydrolysing the linear amidine allantoate to (−)-ureidoglycolate and urea. Hydrolysis of the alternative substrate (+)-ureidoglycolate to glyoxylate and urea has also been observed.3 Although allantoicase activity is not detectable in mammals, birds, reptiles, and some fishes, these organisms still carry the allantoicase gene, suggesting an alternative function.2,4 Here, we report the crystal structure of YIR029W determined using the semiautomated high-throughput pipeline of the Joint Center for Structural Genomics (JCSG).5
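A calculated isoelectric point such as the 5.9 quoted above is the pH at which the sequence-derived net charge is zero. The Python sketch below shows one way to estimate it: sum Henderson–Hasselbalch charges of ionizable groups and bisect on pH. The pKa table is an assumed textbook set (different programs use different values, so their pIs differ slightly), the function names are hypothetical, and because the YIR029W sequence is not reproduced here the usage relies on a toy peptide.

```python
"""Sketch: estimate a protein's isoelectric point from its sequence.

Assumption: the pKa values below are one common textbook set, not the
values used for the pI reported in the text.
"""

# Assumed pKa values for positively and negatively charged groups.
PKA_POSITIVE = {"Nterm": 8.6, "K": 10.8, "R": 12.5, "H": 6.5}
PKA_NEGATIVE = {"Cterm": 3.6, "D": 3.9, "E": 4.1, "C": 8.5, "Y": 10.1}


def net_charge(seq: str, ph: float) -> float:
    """Henderson-Hasselbalch net charge of a polypeptide at a given pH."""
    counts = {aa: seq.count(aa) for aa in set(seq)}
    counts["Nterm"] = 1  # one free amino terminus
    counts["Cterm"] = 1  # one free carboxy terminus
    positive = sum(
        counts.get(group, 0) / (1.0 + 10 ** (ph - pka))
        for group, pka in PKA_POSITIVE.items()
    )
    negative = sum(
        counts.get(group, 0) / (1.0 + 10 ** (pka - ph))
        for group, pka in PKA_NEGATIVE.items()
    )
    return positive - negative


def isoelectric_point(seq: str, tol: float = 1e-4) -> float:
    """Bisect on [0, 14] for the pH where the net charge changes sign."""
    low, high = 0.0, 14.0
    while high - low > tol:
        mid = (low + high) / 2.0
        if net_charge(seq, mid) > 0:
            low = mid  # still positive: the pI lies at a higher pH
        else:
            high = mid
    return (low + high) / 2.0


print(f"pI of a toy peptide: {isoelectric_point('MKDEHHHHHH'):.2f}")
```

Bisection is sufficient here because the net charge decreases monotonically with pH.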

The structure of YIR029W [Fig. 1(A)] was determined to 2.40 Å resolution using the multi-wavelength anomalous dispersion (MAD) method. Data collection, model, and refinement statistics are summarized in Table I. The final model includes one protein monomer (residues 1–57, 67–187, 194–284, and 292–343) and 239 water molecules. No electron density was observed for residues 58–66, 188–193, and 285–291. The Matthews coefficient (VM)6 for YIR029W is 2.87 Å³/Da, and the estimated solvent content is 56.8%. The Ramachandran plot produced by Procheck 3.4,7 shows that 85.5% of the residues are in the most favored regions and the remaining 14.5% are in additional allowed regions.
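The Matthews coefficient is the unit-cell volume per dalton of protein, VM = V_cell/(Z × M), where Z is the number of molecules in the cell, and the solvent fraction is commonly approximated as 1 − 1.23/VM. The sketch below applies these relations to the cell in Table I, assuming one monomer per asymmetric unit (as in the final model) and the 12 asymmetric units of space group P6₃22. The function names are illustrative.

```python
import math


def hexagonal_cell_volume(a: float, c: float) -> float:
    """Volume (Å^3) of a hexagonal cell with a = b and gamma = 120 degrees."""
    return a * a * c * math.sin(math.radians(120.0))


def matthews_coefficient(cell_volume: float, mass_da: float, z: int) -> float:
    """VM = V_cell / (Z * M) in Å^3/Da; Z is the number of molecules per cell."""
    return cell_volume / (z * mass_da)


def solvent_fraction(vm: float) -> float:
    """Matthews' approximation for the solvent fraction: 1 - 1.23 / VM."""
    return 1.0 - 1.23 / vm


volume = hexagonal_cell_volume(107.21, 134.92)
# P6_3 22 has 12 asymmetric units; one monomer per asymmetric unit gives Z = 12.
vm = matthews_coefficient(volume, 38581.0, 12)
solvent = solvent_fraction(vm)
print(f"V = {volume:.0f} Å^3, VM = {vm:.2f} Å^3/Da, solvent = {solvent:.1%}")
```

With M = 38,581 Da this gives VM of about 2.90 Å³/Da and a solvent fraction of about 57.6%, close to the reported 2.87 Å³/Da and 56.8%. The text does not state which molecular weight (for example, with or without the purification tag) was used for the published values, and programs differ slightly in the solvent-content formula they apply.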

Figure 1.

Crystal structure of YIR029W. A: Stereo ribbon diagram of Saccharomyces cerevisiae YIR029W, color-coded from the N-terminus (blue) to the C-terminus (red), showing the domain organization. Helices H1–H7 and β-strands (β1–β24) in β-sheets A–F are indicated. Residue numbers at the beginning and end of missing loops are indicated. B: Diagram showing the secondary structure elements of YIR029W superimposed on its primary sequence. Disordered regions are depicted by a dashed line with the corresponding sequence in brackets. The β-sheet assignment of each β-strand is indicated by a red letter A–F, and β-hairpins are depicted as red loops.

Table I. Summary of Crystal Parameters, Data Collection, and Refinement Statistics for YIR029W (PDB: 1o59)
Space group                      P6₃22
Unit cell parameters             a = b = 107.21 Å, c = 134.92 Å, α = β = 90°, γ = 120°

Data collection                  λ0 MADSe        λ1 MADSe        λ2 MADSe        λ3 MADSe
Wavelength (Å)                   0.9762          0.9794          0.9796          0.9184
Resolution range (Å)             41.97–2.40      38.35–2.75      38.35–2.75      38.35–2.75
Number of observations           94,513          85,239          85,236          84,753
Number of reflections            18,532          12,534          12,549          12,544
Completeness (%)ᵃ                99.8 (98.8)     99.9 (99.9)     99.9 (100.0)    99.9 (99.7)
Mean I/σ(I)ᵃ                     13.0 (1.8)      10.0 (2.5)      5.2 (1.9)       9.1 (2.1)
Rsym on Iᵃ,ᵇ                     0.099 (0.707)   0.065 (0.285)   0.077 (0.380)   0.074 (0.344)
Sigma cutoff                     0
Highest resolution shell (Å)     2.46–2.40       2.90–2.75       2.90–2.75       2.90–2.75

Model and refinement statistics
Resolution range (Å)             41.97–2.40      Data set used in refinement     λ0 MADSe
Number of reflections (total)    18,506          Cutoff criteria                 |F| > 0
Number of reflections (test)     830             Rcrystᶜ                         0.175
Completeness (% total)           99.8            Rfreeᵈ                          0.222

Stereochemical parameters
Restraints (RMS observed)
  Bond length                    0.017 Å
  Bond angle                     1.51°
Average isotropic B-value        57.3 Å²
ESU based on R valueᵉ            0.28 Å
Protein residues/atoms           317/2567
Solvent molecules                239

ᵃ Values in parentheses are for the highest resolution shell.
ᵇ Rsym = Σ|Ii − 〈Ii〉|/Σ|Ii|, where Ii is the scaled intensity of the ith measurement and 〈Ii〉 is the mean intensity for that reflection.
ᶜ Rcryst = Σ||Fobs| − |Fcalc||/Σ|Fobs|, where Fcalc and Fobs are the calculated and observed structure factor amplitudes, respectively.
ᵈ Rfree is calculated as for Rcryst, but for 4.5% of the total reflections chosen at random and omitted from refinement.
ᵉ ESU = estimated overall coordinate error.15,20
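Footnotes c and d of Table I define Rcryst and Rfree. The Python sketch below computes both from lists of observed and calculated amplitudes and flags a random 4.5% test set. It uses synthetic data and hypothetical function names, and it deliberately omits the scaling and bulk-solvent corrections that refinement programs such as REFMAC5 apply before R values are reported.

```python
import random


def r_factor(f_obs, f_calc):
    """R = sum(| |Fobs| - |Fcalc| |) / sum(|Fobs|) over the given reflections."""
    numerator = sum(abs(abs(fo) - abs(fc)) for fo, fc in zip(f_obs, f_calc))
    return numerator / sum(abs(fo) for fo in f_obs)


def flag_free_set(n_reflections, fraction=0.045, seed=0):
    """Randomly flag about `fraction` of the reflections for the Rfree test set."""
    rng = random.Random(seed)
    n_test = round(fraction * n_reflections)
    test_indices = set(rng.sample(range(n_reflections), n_test))
    return [i in test_indices for i in range(n_reflections)]


def r_work_and_free(f_obs, f_calc, is_free):
    """Return (Rcryst, Rfree): R over the working and over the test reflections."""
    work = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if not free]
    test = [(fo, fc) for fo, fc, free in zip(f_obs, f_calc, is_free) if free]
    return r_factor(*zip(*work)), r_factor(*zip(*test))


# Synthetic example: 18,506 reflections with Fcalc within +/-20% of Fobs.
rng = random.Random(1)
f_obs = [rng.uniform(10.0, 100.0) for _ in range(18506)]
f_calc = [fo * rng.uniform(0.8, 1.2) for fo in f_obs]
flags = flag_free_set(len(f_obs))
r_cryst, r_free = r_work_and_free(f_obs, f_calc, flags)
print(f"test reflections: {sum(flags)}, Rcryst = {r_cryst:.3f}, Rfree = {r_free:.3f}")
```

In real refinement the test reflections are excluded from the target function, so Rfree is typically higher than Rcryst; in this synthetic example both sets share the same error model and therefore give similar values.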

The YIR029W monomer contains 24 β-strands (β1–β24), one α-helix (H2), and six 3₁₀-helices (H1, H3–H7) [Fig. 1(A, B)]. The total β-strand, α-helical, and 3₁₀-helical contents are 40.9%, 2.5%, and 6.6%, respectively. YIR029W contains an N-terminal strand-helix motif and two allantoicase repeats (AR), which form two similar β-sandwich domains.8 AR1 (residues 21–187) and AR2 (residues 194–343) are connected by a flexible linker (residues 188–193) [Fig. 1(A)]. AR1 folds into a β-sandwich composed of a four-stranded (A) and a five-stranded (B) antiparallel β-sheet: A with 1423 topology (β2, β6, β9, β12) and B with 12534 topology (β3, β5, β7, β8, β10). The A and B β-sheets are slightly crossed (∼40°) relative to each other and bury a compact hydrophobic core. The β-strands are connected by extended loops, one of which (residues 58–66) is disordered in the crystal structure. In addition, a short two-stranded antiparallel β-sheet C (β4, β11) is flanked by two loops and helix H4. The N-terminal strand-helix motif (β1, H1, H2; residues 1–20) packs against β-sheet A, where β1 forms an additional antiparallel strand that is hydrogen-bonded to β9. H2 forms part of the interface with AR2 through its interaction with β14.

AR2 has a fold very similar to that of AR1. The two domains are related by an approximate two-fold axis and can be superimposed with a root-mean-square deviation (RMSD) of 1.04 Å over 115 residues, with 40% sequence identity. AR2 folds into a β-sandwich composed of a four-stranded (D) and a five-stranded (E) antiparallel β-sheet: D with 1423 topology (β13, β18, β21, β24) and E with 12534 topology (β14, β17, β19, β20, β22) [Fig. 1(A, B)]. The β-strands are connected by extended loops, one of which (residues 285–291) is disordered. In addition, a short three-stranded antiparallel β-sheet F (β15, β16, β23) is flanked by two loops and helix H6, next to the interface with AR1.
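An RMSD after superposition, such as the 1.04 Å quoted for the two allantoicase repeats, is usually obtained by finding the rotation that best overlays equivalent atoms (commonly the Cα atoms of aligned residues) and then computing the root-mean-square distance. The sketch below implements the Kabsch algorithm with NumPy. It assumes the residue correspondence is already known and uses made-up coordinates; it is not the authors' procedure, which the text does not describe.

```python
import numpy as np


def kabsch_rmsd(p: np.ndarray, q: np.ndarray) -> float:
    """RMSD between matched (N, 3) coordinate sets after optimal superposition."""
    if p.shape != q.shape or p.shape[1] != 3:
        raise ValueError("p and q must both have shape (N, 3)")
    # Remove translation by centring both sets on their centroids.
    p_c = p - p.mean(axis=0)
    q_c = q - q.mean(axis=0)
    # The SVD of the covariance matrix gives the optimal rotation.
    u, _, vt = np.linalg.svd(p_c.T @ q_c)
    # Flip the last axis if needed so the result is a proper rotation (det = +1).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = p_c @ rotation.T - q_c
    return float(np.sqrt((diff ** 2).sum() / len(p)))


# Toy check with hypothetical Calpha coordinates for 115 equivalent residues.
rng = np.random.default_rng(0)
ca_ar1 = rng.normal(scale=10.0, size=(115, 3))
theta = np.radians(170.0)  # an arbitrary rotation about z
rot_z = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta), np.cos(theta), 0.0],
                  [0.0, 0.0, 1.0]])
ca_ar2 = ca_ar1 @ rot_z.T + np.array([5.0, -3.0, 12.0])
print(f"RMSD = {kabsch_rmsd(ca_ar1, ca_ar2):.2e} Å")
```

The determinant correction matters: without it the SVD solution can be a reflection, which would underestimate the RMSD between a structure and its mirror image.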

A structural similarity search with the coordinates of YIR029W on the DALI server9 identified the N-terminal domain of the human DNA repair protein XRCC1 (PDB: 1xna)10 as the best match: it superimposes on the AR2 domain with an RMSD of 2.5 Å over 123 aligned residues (17% sequence identity) and on the AR1 domain with an RMSD of 2.6 Å over 122 aligned residues (16% sequence identity). Another structural homologue is the galactose-binding domain of a sialidase from Micromonospora viridifaciens (PDB: 1euu),9 with RMSDs of 2.8 Å and 2.7 Å over 118 aligned residues (10% and 11% sequence identity) for the AR1 and AR2 domains, respectively. None of the DALI hits contains a second AR domain, indicating that YIR029W is the first structure of a protein containing two allantoicase repeats. Models for YIR029W homologues can be accessed at

The crystal packing of YIR029W indicates that a hexamer is the biologically relevant oligomeric form. A hexamer (200 kDa), composed of two trimers (100 kDa), has also been reported in biophysical studies of the allantoicase from Chlamydomonas reinhardtii.1 The hexamer consists of two planar trimers stacked on top of each other to form a barrel-like structure with 32 (D₃) point-group symmetry. It measures 100 Å in diameter and 60 Å in height, with a 15 Å wide inner channel [Fig. 2(A)]. The interfaces within the trimer are formed by head-to-tail interactions between residues of the AR1 domain (Glu72, Arg75, Glu78, and Asp172) of one subunit and residues of the AR2 domain (Arg238, Arg240, Lys305, and Asp332) of the adjacent subunit [Fig. 2(A)]. The interface between the two trimers is formed by side-on interactions between residues of the AR1 domain (strand β3 and the loop of residues 122–127) of one subunit and residues of the AR2 domain (the loop of residues 194–203 and β-sheet D) of a subunit in the other trimer. The subunit interactions are stabilized by seven salt bridges and bury 2444 Å² of surface area per monomer.
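The text does not state the criterion used to count salt bridges. A common geometric definition is a side-chain carboxylate oxygen of Asp or Glu within 4.0 Å of a side-chain nitrogen of Arg or Lys. The sketch below applies that assumed criterion to two hypothetical atom lists from neighbouring subunits; atom names follow PDB conventions, while the residue numbers and coordinates in the example are invented.

```python
import math
from itertools import product

# Side-chain atoms that carry the charge (PDB atom names); an assumed definition.
ACIDIC = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}
BASIC = {"ARG": {"NE", "NH1", "NH2"}, "LYS": {"NZ"}}


def charged_atoms(atoms, table):
    """Keep atoms whose (residue name, atom name) appear in the given table.

    Each atom is a tuple (chain, residue name, residue number, atom name, (x, y, z)).
    """
    return [a for a in atoms if a[3] in table.get(a[1], set())]


def interface_salt_bridges(subunit_a, subunit_b, cutoff=4.0):
    """Residue pairs (acid, base) across two subunits with an O...N distance <= cutoff."""
    pairs = set()
    # Check acids of A against bases of B, then acids of B against bases of A.
    for acids, bases in ((subunit_a, subunit_b), (subunit_b, subunit_a)):
        for acid, base in product(charged_atoms(acids, ACIDIC), charged_atoms(bases, BASIC)):
            if math.dist(acid[4], base[4]) <= cutoff:
                pairs.add((acid[:3], base[:3]))  # one entry per residue pair
    return sorted(pairs)


# Hypothetical atoms from two neighbouring subunits.
chain_a = [("A", "GLU", 10, "OE1", (0.0, 0.0, 0.0)),
           ("A", "GLU", 10, "OE2", (1.0, 0.5, 0.0)),
           ("A", "ASP", 30, "OD1", (20.0, 0.0, 0.0))]
chain_b = [("B", "ARG", 20, "NH1", (3.2, 0.0, 0.0)),
           ("B", "ARG", 20, "NH2", (3.5, 1.0, 0.0)),
           ("B", "LYS", 40, "NZ", (30.0, 0.0, 0.0))]
print(interface_salt_bridges(chain_a, chain_b))
```

Counting residue pairs rather than atom pairs avoids reporting the same Glu–Arg interaction several times when more than one O...N distance falls under the cutoff.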

Figure 2.

A: The YIR029W hexamer in surface representation, viewed perpendicular and parallel to the three-fold axis. Upper and lower trimers are colored gray and green, respectively. One subunit of the gray trimer is shown in blue, and the clusters of conserved residues are highlighted in yellow. B: Ribbon diagram of a superposition of YIR029W AR1 (residues 1–187, yellow) and AR2 (residues 193–343, blue). The clusters of conserved residues in AR1 and AR2 (labels shown in brackets) are shown in ball-and-stick representation. C: Ribbon diagram of a superposition of YIR029W AR2 (residues 193–343, blue) and the galactose-binding domain (PDB: 1euu, white) with bound substrate (D-galactose). The cluster of conserved residues in AR2 (blue) and the D-galactose (white) are shown in ball-and-stick representation.

Mapping the sequence conservation of 43 known allantoicases8 onto the YIR029W structure identifies two highly similar clusters of hydrophilic residues, one in AR1 (Glu72, Arg75, Asp82, Asn108, and Asp172) and one in AR2 (Glu235, Arg238, Asp246, Asn272, and Asp332) [Fig. 2(A, B)]. Both conserved clusters lie within the head-to-tail subunit interface [Fig. 2(A)]. The active site of the galactose-binding domain coincides with the conserved cluster in the AR repeats of YIR029W, suggesting that the active site of YIR029W may be located in this subunit interface [Fig. 2(C)]. An alternative location is the deep crevice at the AR1–AR2 domain interface [Fig. 2(A)], which contains two strictly conserved residues (Asp24 and Arg179).
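The text does not specify how conservation was scored. One simple, commonly used measure is the Shannon entropy of each alignment column, where zero entropy marks a strictly conserved position. The sketch below uses a toy alignment and hypothetical function names to map zero-entropy columns back to residue numbers of a reference sequence (assumed to start at residue 1), which is the step needed before conserved residues can be displayed on a structure.

```python
import math
from collections import Counter


def column_entropy(column: str) -> float:
    """Shannon entropy (bits) of one alignment column, ignoring gap characters."""
    residues = [aa for aa in column if aa != "-"]
    if not residues:
        return float("nan")  # all-gap column: entropy undefined
    counts = Counter(residues)
    total = len(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def conserved_positions(alignment, reference_index=0, max_entropy=0.0):
    """Return (residue, number) in the reference for columns with low entropy."""
    reference = alignment[reference_index]
    results = []
    residue_number = 0
    for col_idx, ref_aa in enumerate(reference):
        if ref_aa == "-":
            continue  # no residue in the reference sequence at this column
        residue_number += 1
        column = "".join(seq[col_idx] for seq in alignment)
        if column_entropy(column) <= max_entropy:
            results.append((ref_aa, residue_number))
    return results


toy_alignment = ["DKRE", "DRRE", "DKR-"]
print(conserved_positions(toy_alignment))
```

Because gaps are ignored, a column filled mostly with gaps can look strictly conserved; a practical version would also require a minimum fraction of sequences to have a residue at that position.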

The YIR029W structure reported here is the first structure of an allantoicase, determined by X-ray crystallography using the MAD method. Together with further biochemical and biophysical studies, this information should provide valuable insights into the functional role of allantoicase in microorganisms, invertebrates, and vertebrates.

Materials and Methods.

Protein production and crystallization:

YIR029W (TIGR: YIR029W; Swiss-Prot: O29664) was amplified by PCR from Saccharomyces cerevisiae genomic DNA using Taq T33 polymerase (Stratagene) and primer pairs encoding the predicted 5′- and 3′-ends of YIR029W. The PCR product was cloned into plasmid pMH1, which encodes an expression and purification tag (MGSDKIHHHHHH) at the amino terminus of the full-length protein. The cloning junctions were confirmed by sequencing. Protein expression was performed in selenomethionine-containing medium using the E. coli methionine auxotrophic strain DL41. Bacteria were lysed by sonication in lysis buffer (50 mM KPO4, pH 7.8, 300 mM NaCl, 10% glycerol, 5 mM imidazole, Roche EDTA-free protease inhibitor tablets) containing 0.5 mg/mL lysozyme. Immediately after sonication, cell debris was pelleted by ultracentrifugation at 60,000 × g for 20 min (4°C). The soluble fraction was applied to a gravity-flow metal chelate column (Talon resin charged with cobalt; Clontech) equilibrated in lysis buffer. The column was then washed with seven column volumes (CV) of wash buffer (20 mM Tris, pH 7.8, 300 mM NaCl, 10% glycerol, 10 mM imidazole), and the protein was eluted with 3 CV of elution buffer (25 mM Tris, pH 7.8, 300 mM NaCl, 150 mM imidazole). The protein was then buffer-exchanged into crystallization buffer (10 mM Tris, pH 7.8, 150 mM NaCl) and concentrated to 8 mg/mL by centrifugal ultrafiltration (Orbital). It was either frozen in liquid nitrogen for later use or used immediately for crystallization trials. The protein was crystallized by the nanodroplet vapor diffusion method12 using standard JCSG crystallization protocols.5 Crystals grew in 30% ethylene glycol and were indexed in the hexagonal space group P6₃22 (Table I).
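When setting up crystallization trials it can help to express the 8 mg/mL protein stock as a molar concentration. Because 1 mg/mL equals 1 g/L, dividing by the molar mass gives mol/L. The sketch below uses the 38,581 Da monomer mass from the text; the purification tag and selenomethionine incorporation make the true mass somewhat larger, so the result is an approximation, and the function name is illustrative.

```python
def mg_per_ml_to_micromolar(conc_mg_per_ml: float, mass_da: float) -> float:
    """Convert a mass concentration (mg/mL, i.e. g/L) to micromolar."""
    mol_per_litre = conc_mg_per_ml / mass_da  # (g/L) / (g/mol) = mol/L
    return mol_per_litre * 1e6


monomer_um = mg_per_ml_to_micromolar(8.0, 38581.0)
hexamer_um = monomer_um / 6  # assumes all protein is assembled into hexamers
print(f"monomer: {monomer_um:.0f} µM, hexamer: {hexamer_um:.1f} µM")
```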

Data collection:

Anomalous diffraction data were collected on beamline 11-1 at the Stanford Synchrotron Radiation Laboratory (SSRL, Stanford, CA) at wavelengths corresponding to the inflection point (λ1), peak (λ2), and high-energy remote (λ3) of a selenium MAD experiment, together with a 2.40 Å native high-resolution data set (λ0), using the BLU-ICE data collection environment13 (Table I). The data sets were collected at 100 K on a Quantum 315 CCD detector. Data were integrated and reduced with Mosflm14 and then scaled with SCALA from the CCP4 suite.15 Data statistics are summarized in Table I.
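MAD wavelengths are usually chosen in energy units relative to the absorption edge of the anomalous scatterer, using E = hc/λ with hc ≈ 12.398 keV·Å. The short sketch below converts the four wavelengths in Table I to photon energies. λ1 and λ2 fall close to the selenium K edge (about 12.66 keV), while λ3 lies roughly 0.8 keV above it.

```python
HC_KEV_ANGSTROM = 12.39841984  # Planck constant times speed of light, in keV·Å


def wavelength_to_kev(wavelength_angstrom: float) -> float:
    """Photon energy (keV) for an X-ray wavelength given in Å."""
    return HC_KEV_ANGSTROM / wavelength_angstrom


# Wavelengths from Table I.
wavelengths = {"λ0": 0.9762, "λ1 (inflection)": 0.9794,
               "λ2 (peak)": 0.9796, "λ3 (remote)": 0.9184}
for label, lam in wavelengths.items():
    print(f"{label}: {lam:.4f} Å -> {wavelength_to_kev(lam):.4f} keV")
```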

Structure solution and refinement:

The structure was determined using the CCP4 suite15 and SOLVE/RESOLVE.16 Structure refinement was performed with REFMAC5,15 O,17 and Xfit.18 Refinement statistics are summarized in Table I. The final model in the asymmetric unit includes one protein monomer (residues 1–57, 67–187, 194–284, and 292–343), one histidine residue from the purification tag, and 239 water molecules. No electron density was observed for residues 58–66, 188–193, and 285–291, or for the remainder of the expression and purification tag.

Structure analysis and deposition:

The stereochemical quality of the model was analyzed with Procheck 3.4,7 SFcheck 4.0,15 and WHAT IF 5.0.19 Protein quaternary structure analysis used the PQS server. Figure 1(B) was adapted from an analysis using PDBsum, and all other figures were prepared with PyMOL (DeLano Scientific). Atomic coordinates and experimental structure factors of YIR029W have been deposited in the Protein Data Bank under accession code 1o59.


This work was supported by NIH Protein Structure Initiative grant P50-GM 62411 from the National Institute of General Medical Sciences. Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory, a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the National Institutes of Health (National Center for Research Resources, Biomedical Technology Program, and the National Institute of General Medical Sciences).