To extend the structural coverage of eukaryotic members of the Protein Family database (PFAM),1 we selected 400 open reading frames (ORFs) from available cDNA libraries of the Mouse genome. One of these, gi-13879369, belongs to the structurally and functionally uncharacterized protein family PF03674. This family is highly conserved, with hundreds of homologs in all kingdoms of life, and includes proteins such as BtrG from Bacillus circulans, which is part of the biosynthetic pathway for the antibiotic butirosin.2 Significant sequence homology is also found with the AIG2-like family (PF06094), whose members are plant proteins induced after bacterial infection.3 The 13879369 gene encodes a small protein with a molecular weight of 16,948 Da (residues 1–149) and a calculated isoelectric point of 5.1. Here, we report the crystal structure of 13879369, which was determined using the semi-automated, high-throughput pipeline of the Joint Center for Structural Genomics (JCSG).4
Materials and Methods.
Protein production and crystallization.
A hypothetical protein from Mouse (gi: 13879369, IMAGE: 3501534, Swiss-Prot: Q923B0) was amplified by polymerase chain reaction (PCR) from a clone obtained from the IMAGE consortium using PfuTurbo (Stratagene) and primer pairs encoding the predicted 5′- and 3′-ends. The PCR product was cloned into plasmid pMH4, which encodes an expression and purification tag (MGSDKIHHHHHH) at the amino terminus of the full-length protein. The cloning junctions were confirmed by sequencing. Protein expression was performed in a modified Terrific Broth using the Escherichia coli strain GeneHogs®. Lysozyme was added to the culture at the end of fermentation to a final concentration of 250 μg/mL. Bacteria were lysed by sonication after a freeze/thaw procedure in Lysis Buffer [50 mM Tris pH 7.9, 50 mM NaCl, 10 mM imidazole, 0.25 mM Tris(2-carboxyethyl)phosphine hydrochloride (TCEP)], and the cell debris was pelleted by centrifugation at 3400 × g for 60 min. The soluble fraction was applied to a nickel-chelating resin (Amersham Biosciences) pre-equilibrated with Lysis Buffer. The resin was washed with Wash Buffer [50 mM potassium phosphate pH 7.8, 300 mM NaCl, 40 mM imidazole, 10% (v/v) glycerol, 0.25 mM TCEP], and the target protein was eluted with Elution Buffer [20 mM Tris pH 7.9, 300 mM imidazole, 10% (v/v) glycerol, 0.25 mM TCEP]. The eluate was buffer-exchanged into Buffer Q [20 mM Tris pH 7.9, 5% (v/v) glycerol, 0.25 mM TCEP] containing 50 mM NaCl and applied to a RESOURCE Q column (Amersham Biosciences) pre-equilibrated with the same buffer. The target protein was eluted using a linear gradient of 50 to 500 mM NaCl in Buffer Q. The appropriate RESOURCE Q fractions were pooled and further purified using a Superdex 200 column (Amersham Biosciences) with elution in Crystallization Buffer [20 mM Tris pH 7.9, 150 mM NaCl, 0.25 mM TCEP]. 
The appropriate Superdex 200 fractions were pooled and concentrated for crystallization assays to 20 mg/mL by centrifugal ultrafiltration (Millipore). The protein was crystallized using the nanodroplet vapor diffusion method5 with standard JCSG crystallization protocols.4 The crystallization reagent contained 2.4 M sodium formate, 0.1 M sodium acetate (final pH 4.1). Twenty-five percent (v/v) glycerol (final concentration) was included as a cryoprotectant. The crystals were indexed in the hexagonal space group P65 (Table I).
Table I. Summary of Crystal Parameters, Data Collection, and Refinement Statistics for 13879369 from Mouse (PDB: 1vkb)
Highly redundant (∼60-fold) anomalous diffraction data were collected at the Advanced Light Source (ALS, Berkeley, USA) on beamline 8.3.1 at a wavelength of 1.743 Å (λ1), suitable for a Sulfur single-wavelength anomalous dispersion (SAD) experiment. The dataset was collected at 100 K using an ADSC CCD detector. The data were integrated and scaled using HKL2000.8 Data statistics are summarized in Table I.
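The collection wavelength quoted above can be converted to a photon energy with the standard relation E = hc/λ. The following is a minimal sketch of that conversion; the function name and the constant name are illustrative and are not part of the data collection software used in this work.

```python
# Convert an X-ray wavelength in angstroms to a photon energy in keV.
# HC_KEV_ANGSTROM is the product of Planck's constant and the speed of
# light expressed in keV * angstrom (approximately 12.398).
HC_KEV_ANGSTROM = 12.398419843


def wavelength_to_energy_kev(wavelength_angstrom: float) -> float:
    """Return the photon energy in keV for a wavelength given in angstroms."""
    if wavelength_angstrom <= 0:
        raise ValueError("wavelength must be positive")
    return HC_KEV_ANGSTROM / wavelength_angstrom


if __name__ == "__main__":
    # Wavelength reported for the Sulfur-SAD dataset.
    energy = wavelength_to_energy_kev(1.743)
    print(f"{energy:.3f} keV")
```

For the reported wavelength of 1.743 Å this gives a photon energy of roughly 7.1 keV.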
Structure solution and refinement.
The structure was determined with a highly redundant 1.90 Å Sulfur–SAD dataset using SHELX,9 SHARP,10 and ARP/wARP.11 Model completion and refinement were performed with XtalView12 and REFMAC5.6 Refinement statistics are summarized in Table I. The final model includes one monomer (residues 1–102 and 106–149), two N-terminal histidines from the purification tag, three formic acid molecules (FMT), and 111 water molecules in the asymmetric unit. No electron density was observed for residues 103–105 or for the rest of the expression and purification tag.
Validation and deposition.
Analysis of the stereochemical quality of the model was accomplished using AutoDepInputTool (http://deposit.pdb.org/adit/), MolProbity,13 SFcheck 4.0,6 and WHAT IF 5.0.14 Protein quaternary structure analysis used the PQS server (http://pqs.ebi.ac.uk/). Figures were prepared with PYMOL (DeLano Scientific). Atomic coordinates and experimental structure factors have been deposited in the Protein Data Bank (PDB) and are accessible under the code 1vkb.
Results and Discussion.
The crystal structure of 13879369 [Fig. 1(a)] was determined to 1.90 Å resolution using the Sulfur–SAD method. Data collection, model, and refinement statistics are summarized in Table I. The final model includes one monomer (residues 1–102 and 106–149), three formic acid molecules (FMT), and 111 water molecules in the asymmetric unit. The Matthews' coefficient (Vm)15 for 13879369 is 2.44 Å3/Da, and the estimated solvent content is 49.3%. The Ramachandran plot, produced by MolProbity,13 shows that 99.3% and 0.7% of the residues are in favored and allowed regions, respectively.
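The relation between the Matthews coefficient and the solvent content can be sketched as follows. This is a minimal illustration that assumes the commonly used approximation, solvent fraction ≈ 1 − 1.23/Vm, in which 1.23 Å3/Da corresponds to a typical protein partial specific volume. The function names are hypothetical, and the program actually used to obtain the value reported above may apply a slightly different constant, so a small difference from 49.3% is expected.

```python
# Estimate crystal solvent content from the Matthews coefficient (Vm).
# ASSUMPTION: the constant 1.23 A^3/Da is the conventional approximation
# for the volume occupied per dalton of protein; it is not taken from
# the paper.
PROTEIN_VOLUME_PER_DALTON = 1.23


def matthews_coefficient(cell_volume_a3: float, n_molecules: int,
                         molecular_weight_da: float) -> float:
    """Vm in A^3/Da: unit-cell volume divided by total protein mass in the cell."""
    if n_molecules <= 0 or molecular_weight_da <= 0:
        raise ValueError("n_molecules and molecular_weight_da must be positive")
    return cell_volume_a3 / (n_molecules * molecular_weight_da)


def solvent_fraction(vm: float) -> float:
    """Approximate solvent fraction (0 to 1) for a given Vm in A^3/Da."""
    if vm <= 0:
        raise ValueError("vm must be positive")
    return 1.0 - PROTEIN_VOLUME_PER_DALTON / vm


if __name__ == "__main__":
    # Vm reported in the text for 13879369.
    print(f"{100 * solvent_fraction(2.44):.1f}% solvent")
```

With Vm = 2.44 Å3/Da, this approximation gives a solvent content close to 50%, in line with the reported estimate.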
Protein 13879369 has approximate dimensions of 28 × 35 × 40 Å3 and comprises seven β-strands (β1–β7), three α-helices (H1, H3, H4), one 310 helix (H2), and extended loop regions [Fig. 1(a,b)]. The total β-strand, α-helical, and 310-helical contents are 34.5%, 9.5%, and 5.4%, respectively. Protein 13879369 folds into a five-stranded, antiparallel β-barrel (A) composed of strands β1, β2, and β5–β7. The β-barrel is flanked on one side by helices H1–H3, which pack against strands β1 and β5–β7. One of the open ends of the barrel interacts with β-sheet (B), composed of strands β3 and β4, which in turn packs against helix H4 and a long, C-terminal coil region. The β-barrel contains a disulfide bond between Cys35 and Cys62, adjacent to strands β2 and β5, respectively [Fig. 1(a,b)]. The electron density map shows alternative conformations for the cysteine side chains, indicating that the disulfide bond is only partially formed in the crystal, which may relate to the long exposure of the crystals to the intense X-ray beam during data collection.16 The structure contains a central, hydrophilic cavity gated by the strictly conserved residues Asn16 and Glu82 from the connecting regions following strand β1 and helix H3, respectively, and Tyr143 and Arg146 from the C-terminal extended region [Fig. 2(a)]. The cavity is lined with Thr9 and four tyrosine residues (Tyr7, Tyr88, Tyr115, and Tyr143), and is solvent accessible through a narrow opening of approximately 2.5 Å in diameter adjacent to Asn16. The cavity is occupied by three water molecules and two FMT molecules: FMT1 is located at the cavity entrance within hydrogen-bonding distance of Arg146, whereas FMT2 is located deep inside the cavity next to Tyr7. The location of this cavity, its sequence conservation, and the presence of bound ligands point to its possible role as the active site of this protein.
The crystallographic packing indicates that a monomer is the biologically relevant oligomeric form. A monomer is also consistent with results from analytical size exclusion chromatography and static light scattering. A structural similarity search, performed with the coordinates of the hypothetical protein using the DALI server,17 showed no significant structural similarity to known structures, suggesting that 13879369 represents a new fold. However, the recently solved structure of the hypothetical protein Ytfp from E. coli (PDB: 1xhs)18 shows significant similarity, with a root mean square deviation (RMSD) of 3.1 Å over 109 aligned residues and 22% sequence identity. Comparison with the shorter Ytfp structure (115 residues) reveals good overall agreement between the two structures, but with three significant differences at the local structural level [Fig. 2(b)]. First, Ytfp contains an open β-barrel that shows no hydrogen bonding between strands β2 and β6. Second, Ytfp contains an additional strand β8 that extends sheet B, in place of the corresponding helix H4 found in the Mouse structure. Third, the extended C-terminal coil region, which covers the cavity in the Mouse structure, is not present in Ytfp. Instead, Ytfp contains a large, positively charged crevice at this location, which suggests a different functional role for this region of the protein. Taken together, these structures suggest that the conserved hypothetical protein 13879369 from Mouse contains an internal active site and may function as an enzyme. This finding supports and adds to the current annotation of this protein and its protein family as a biosynthetic protein involved in cellular defense.2, 3
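The RMSD quoted above for the comparison with Ytfp is the root mean square of the distances between paired atoms after superposition. A minimal sketch of that final step is given below. It assumes that the residue pairing and the optimal superposition have already been computed by an alignment program, and the function name and input layout are illustrative only; this is not the DALI procedure itself.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def rmsd(coords_a: Sequence[Point], coords_b: Sequence[Point]) -> float:
    """Root mean square deviation between two paired, pre-superposed
    coordinate sets (for example, aligned C-alpha positions in angstroms)."""
    if len(coords_a) != len(coords_b):
        raise ValueError("coordinate sets must have the same length")
    if not coords_a:
        raise ValueError("coordinate sets must not be empty")
    # Sum of squared distances over all paired atoms.
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


if __name__ == "__main__":
    # Toy example: two atoms, each displaced by 1 angstrom along z.
    a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
    b = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
    print(rmsd(a, b))
```

In practice the value depends on which residues are paired and on the superposition, so an RMSD is meaningful only together with the number of aligned residues, as reported in the text.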
According to FFAS,19 the protein family including 13879369 has about 46 sequence homologs in eukaryotes, bacteria, and archaea. The Drosophila genome contains an uncharacterized entry YS11 (Q9WOY1) with 43% sequence identity. The closest sequence homologs in bacteria are the butirosin biosynthesis protein BtrG from Bacillus circulans, with 31% sequence identity, and the conserved hypothetical protein Ytfp from E. coli, with an overall sequence identity of 22%. Models for 13879369 homologs can be accessed at http://www1.jcsg.org/cgi-bin/models/get_mor.pl?key=13879369.
The 13879369 structure represents a conserved protein from Mouse whose structure has been determined by X-ray crystallography. The information reported here, in combination with further biochemical and biophysical studies, should yield valuable insight into the functional role of this protein in mammals.
This work was supported by an NIH Protein Structure Initiative grant from the National Institute of General Medical Sciences (www.nigms.nih.gov). Portions of this research were carried out at the Stanford Synchrotron Radiation Laboratory (SSRL) and the Advanced Light Source (ALS). The SSRL is a national user facility operated by Stanford University on behalf of the U.S. Department of Energy, Office of Basic Energy Sciences. The SSRL Structural Molecular Biology Program is supported by the Department of Energy, Office of Biological and Environmental Research, and by the National Institutes of Health (National Center for Research Resources, Biomedical Technology Program, and the National Institute of General Medical Sciences). The ALS is supported by the Director, Office of Science, Office of Basic Energy Sciences, Materials Sciences Division, of the U.S. Department of Energy under Contract No. DE-AC03-76SF00098 at Lawrence Berkeley National Laboratory.