The Center for Eukaryotic Structural Genomics (CESG) focuses on developing technology and methods for high-throughput structure determination of proteins from eukaryotic organisms by X-ray crystallography or NMR spectroscopy.1 The goals of this project also include identifying new or unique protein folds and characterizing proteins of unknown structure or function. By selecting targets with no close amino acid sequence relationship to proteins in the Protein Data Bank (PDB),2 CESG chose two open reading frames from Arabidopsis thaliana, At5g11950 and At2g37210, for structural characterization. These two genes encode highly conserved hypothetical proteins with molecular weights of 23.8 and 23.6 kDa, respectively. The biological functions of At5g11950 and At2g37210 in A. thaliana have not yet been established. On the basis of sequence similarity, their protein products are annotated as lysine decarboxylase (LDC)-like proteins; however, the basis for this annotation is not documented. In A. thaliana, at least 11 hypothetical proteins have been annotated as LDC-like proteins by genome analysis,3 and no biochemical evidence supporting this annotation is available. Here, we report the X-ray crystal structures of the proteins encoded by the A. thaliana gene loci At5g11950 and At2g37210 and describe the structural context of the characteristic sequence motif of this protein family.
The genes were cloned,4 and the proteins were expressed5 and purified6 following standard CESG protocols. Crystals of Se–Met-labeled At5g11950 were grown by the hanging-drop method from a 10 mg/ml protein solution in Buffer A (5 mM BisTris, 50 mM NaCl, 3.1 mM NaN₃, 0.3 mM TCEP, pH 6.0) mixed with an equal volume of well solution containing 13% (w/v) MePEG 2000, 280 mM KNO₃, and 100 mM MOPS (pH 7.0 at 293 K). Crystals were cryoprotected by serial transfer into well solutions supplemented with increasing concentrations of ethylene glycol, up to a final concentration of 25% (v/v). Single-wavelength diffraction data were collected from Se–Met-labeled At5g11950 with an APS 1 detector on beamline 19-BM (SBC-CAT) at the Advanced Photon Source, Argonne National Laboratory. The data were integrated and scaled with the HKL2000 suite.7 Location of the Se sites, phasing, and phase improvement were performed with the programs SOLVE8 and RESOLVE.9 The initial model was built with the automatic tracing procedure implemented in ARP/wARP10 and refined to 2.15 Å resolution with Refmac5.11
Crystals of Se–Met-labeled At2g37210 were grown by the hanging-drop method from a 10 mg/ml protein solution in Buffer A (see above) mixed with an equal volume of well solution containing 22% (w/v) MePEG 2000, 84 mM MgSO₄, and 100 mM BisTris (pH 6.5 at 296 K). Crystals were cryoprotected by serial transfer into well solutions supplemented with increasing concentrations of ethylene glycol, up to a final concentration of 20% (v/v). Single-wavelength diffraction data were collected with a MAR 225 detector on beamline 22-BM (SER-CAT) at the Advanced Photon Source, Argonne National Laboratory, and processed with the HKL2000 suite.7 The structure was solved by molecular replacement with MOLREP,12 using the At5g11950 structure as the search model, and refined to 1.95 Å resolution with Refmac5.11
RESULTS AND DISCUSSION
Table I summarizes the data collection, phasing, refinement, and model statistics. Coordinates and diffraction data have been deposited in the PDB under accession codes 1YDH (At5g11950) and 2A33 (At2g37210). The At5g11950 monomer adopts an α/β fold comprising eight α-helices and seven β-strands (β1α1β2α2β3α3β4α4β5α5β6α6α7β7α8) [Fig. 1(A)]. The central feature of this domain is a β-sheet of seven parallel β-strands surrounded by the eight α-helices: helices α1, α2, α3, and α8 lie on one side of the sheet, and helices α4, α5, α6, and α7 on the other. The tertiary structure of the At2g37210 monomer is almost identical to that of the At5g11950 monomer, with a root mean square deviation (rmsd) of 0.9 Å and 71% sequence identity over 167 aligned Cα positions. The only major difference between the two structures is the absence of the short helix α3 in At2g37210. The loop spanning residues 82–89 of At2g37210 was highly disordered in the electron density map and was therefore not built into the final model.
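The two comparison figures quoted above, an rmsd over aligned Cα positions and a percent identity over aligned residues, can be illustrated with the minimal sketch below. It assumes the two Cα sets have already been paired by a structural alignment and optimally superposed (the superposition itself, for example a Kabsch fit, is not shown), and it treats `-` as a gap character; the function names are hypothetical and are not the programs used in this work.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of paired (x, y, z) Calpha coordinates.

    Assumes the structures are already aligned residue-by-residue and
    superposed; this function only measures the residual deviation.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty, equal-length coordinate lists")
    sq_sum = sum(
        (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
        for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b)
    )
    return math.sqrt(sq_sum / len(coords_a))

def percent_identity(aligned_a, aligned_b):
    """Percent identical residues over aligned positions.

    Columns with a gap ('-') in either sequence are excluded from the count.
    """
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no aligned positions")
    identical = sum(1 for a, b in pairs if a == b)
    return 100.0 * identical / len(pairs)

# Toy inputs (hypothetical, not taken from 1YDH or 2A33):
print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 1), (1, 0, 1)]))  # every pair is 1 A apart
print(percent_identity("ACDE-", "ACDFG"))  # 3 of 4 ungapped columns identical
```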
Table I. Summary of Crystal Parameters, Data Collection, Phasing, Refinement and Model Statistics
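$R_{\mathrm{sym}} = \sum_h \sum_i |I_i(h) - \langle I(h) \rangle| \,/\, \sum_h \sum_i I_i(h)$, where $I_i(h)$ is the intensity of an individual measurement of reflection $h$ and $\langle I(h) \rangle$ is the mean intensity of that reflection. Values in parentheses are for the highest-resolution shell.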
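$R_{\mathrm{cryst}} = \sum_h |F_{\mathrm{obs}}(h) - F_{\mathrm{calc}}(h)| \,/\, \sum_h F_{\mathrm{obs}}(h)$, where $F_{\mathrm{obs}}$ and $F_{\mathrm{calc}}$ are the observed and calculated structure-factor amplitudes, respectively.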
Rfree was calculated in the same way as Rcryst, using a randomly selected 5.0% of the unique reflections that were omitted from structure refinement.
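The three agreement statistics defined in the Table I footnotes can be sketched as follows. This is an illustration under stated assumptions, not the HKL2000 or Refmac5 code: intensities are supplied as a dictionary keyed by hypothetical Miller-index tuples, singly measured reflections are skipped in R_sym (programs differ in how they treat them), F_calc is assumed to be already scaled to F_obs, and the 5% free set is drawn with Python's `random.Random`.

```python
import random

def r_sym(measurements):
    """R_sym from {hkl: [I_1, I_2, ...]} repeated intensity measurements."""
    num = den = 0.0
    for intensities in measurements.values():
        if len(intensities) < 2:
            continue  # assumption: singly measured reflections are left out
        mean_i = sum(intensities) / len(intensities)
        num += sum(abs(i - mean_i) for i in intensities)
        den += sum(intensities)
    return num / den

def r_factor(f_obs, f_calc):
    """R = sum|Fobs - Fcalc| / sum Fobs over paired amplitudes (Fcalc pre-scaled)."""
    num = sum(abs(fo - fc) for fo, fc in zip(f_obs, f_calc))
    den = sum(f_obs)
    return num / den

def split_free_set(hkls, fraction=0.05, seed=0):
    """Randomly flag a fraction of unique reflections as the Rfree test set.

    Returns (work_set, free_set); only the work set would enter refinement,
    and r_factor() on the free set gives Rfree.
    """
    rng = random.Random(seed)
    hkls = list(hkls)
    n_free = max(1, round(fraction * len(hkls)))
    free = set(rng.sample(hkls, n_free))
    work = [h for h in hkls if h not in free]
    return work, sorted(free)

# Toy example with made-up numbers:
print(r_factor([10.0, 20.0], [9.0, 22.0]))  # (1 + 2) / 30
```

Computing Rfree with the same `r_factor` function on the held-out reflections mirrors the footnote's statement that Rfree is calculated in the same way as Rcryst.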
Two At5g11950 subunits associate to form a tight dimer in the crystallographic asymmetric unit [Fig. 1(A)]. Dimerization buries 1806 Å² of surface area on each monomer, and the interface includes 12 hydrogen bonds. The interface between the two monomers is mostly hydrophobic (68%) and is stabilized by contacts between helices α5 and α6. The CASTp server13 was used to search for pockets or cavities on the surface of At5g11950 and identified a cleft with a surface area of 623 Å² and a volume of 322 Å³. The bottom of the cleft is formed by residues from strands β1, β2, and β3 and the loop between β3 and α3, and the walls are formed mostly by residues from helices α4 and α5.
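The per-monomer buried surface area reported above is commonly computed from solvent-accessible surface areas (SASA) as half the area lost on complex formation. The sketch below assumes that convention and that SASA values have already been obtained from an external program; the hydrophobic fraction is assumed here to be the share of buried area contributed by nonpolar atoms, which may differ from the definition used for the 68% figure. All inputs are hypothetical toy values, not measurements for 1YDH.

```python
def buried_area_per_monomer(sasa_a, sasa_b, sasa_complex):
    """Average surface area (A^2) buried per monomer when A and B form a complex."""
    # Total area lost on association, split evenly between the two partners.
    return (sasa_a + sasa_b - sasa_complex) / 2.0

def nonpolar_percent(buried_atoms):
    """Percent of buried area from nonpolar atoms.

    buried_atoms: iterable of (buried_area, is_nonpolar) pairs, one per atom.
    """
    total = sum(area for area, _ in buried_atoms)
    nonpolar = sum(area for area, is_nonpolar in buried_atoms if is_nonpolar)
    return 100.0 * nonpolar / total

# Hypothetical inputs chosen only to illustrate the arithmetic:
print(buried_area_per_monomer(9000.0, 9000.0, 15000.0))  # (18000 - 15000) / 2
print(nonpolar_percent([(50.0, True), (30.0, True), (20.0, False)]))
```

Halving the total loss gives a per-monomer value, which matches the way the interface is described in the text ("of each monomer"); some tools instead report the full total, so the convention should be checked before comparing numbers.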
The coordinates of At5g11950 and At2g37210 were submitted to the DALI server14 to identify structurally similar proteins in the PDB. DALI returned 8 and 19 structural homologs for At5g11950 and At2g37210, respectively, all with Z scores above 7.0. The best match for both proteins was TT1887 (PDB 1WEH), with Z scores of 19.1 and 20.4, rmsds of 2.4 and 2.1 Å, and sequence identities of 23 and 25% for At5g11950 and At2g37210, respectively. TT1887 is a hypothetical protein from Thermus thermophilus Hb8 that is also annotated as an LDC-like protein;15 however, its true biological activity is still unknown. The second and third best matches were nucleoside 2-deoxyribosyltransferase (PDB 1F8X; Z scores 7.5 and 7.8, rmsds 3.9 and 3.4 Å, identities 12 and 15%)16 and UDP–N-acetylglucosamine 2-epimerase (PDB 1F6D; Z scores 7.2 and 8.1, rmsds 3.0 and 2.9 Å, identities 12 and 12%).17 Despite this apparent structural similarity, At5g11950 and At2g37210 may have different functions from these enzymes, because they lack the active-site residues of both. Interestingly, At5g11950 and At2g37210 are also structurally similar to the negative transcriptional regulator NmrA (PDB 1K6I; Z scores 6.7 and 7.6, rmsds 3.5 and 3.3 Å, identities 8 and 7%), which is involved in the signaling pathway of nitrogen metabolite repression in various fungi.18
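As a small worked illustration of the Z-score cutoff, the sketch below applies a "Z > 7.0" filter to the four DALI matches whose scores are quoted above (the full 8- and 19-hit lists are not reproduced here). The helper name `hits_above` is hypothetical; it simply filters and ranks a table of scores.

```python
# Z scores quoted in the text, as (At5g11950, At2g37210) pairs.
DALI_Z = {
    "1WEH": (19.1, 20.4),  # TT1887, T. thermophilus
    "1F8X": (7.5, 7.8),    # nucleoside 2-deoxyribosyltransferase
    "1F6D": (7.2, 8.1),    # UDP-N-acetylglucosamine 2-epimerase
    "1K6I": (6.7, 7.6),    # NmrA
}

def hits_above(z_table, column, cutoff=7.0):
    """Entries whose Z score in `column` (0 or 1) exceeds `cutoff`, best first."""
    kept = [(pdb, scores[column]) for pdb, scores in z_table.items()
            if scores[column] > cutoff]
    return sorted(kept, key=lambda item: item[1], reverse=True)

print(hits_above(DALI_Z, 0))  # At5g11950
print(hits_above(DALI_Z, 1))  # At2g37210
```

With these quoted values, NmrA (Z = 6.7) falls just below the 7.0 cutoff for At5g11950 but passes it for At2g37210 (Z = 7.6).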
A VAST search19 found 18 and 14 structural neighbors of At5g11950 and At2g37210, respectively, all with VAST scores above 13.0. The four top neighbors, with VAST scores greater than 17, rmsd values below 2.0 Å, and sequence identities above 23%, are all annotated as putative LDCs: YvdD (PDB 1T35) from Bacillus subtilis, Tm1055 (PDB 1RCU) from Thermotoga maritima, and TT1465 (PDB 1WEK) and TT1887 (PDB 1WEH) from T. thermophilus Hb8.15 All four structures display an α/β fold in which 6, 7, or 8 α-helices flank a central β-sheet, arranged as in the At5g11950 and At2g37210 structures.
An FFAS03 search20 confirmed that the four putative LDCs identified by VAST share distant sequence homology with At5g11950 and At2g37210, with FFAS03 scores below −49.8 and sequence identities just over 21%. On the basis of the crystal structures and the sequence homology searches, At5g11950 and At2g37210 were classified as members of the LDC family, pfam03641; however, at present there is no biochemical evidence to support this annotation. Structural analysis of At5g11950 showed that the consensus motif PGGxGTxxE,15 located in helix α5, forms part of a cleft [Fig. 1(B)]. In addition, the conserved residues Arg98, Thr118, and Glu121 are positioned at the bottom of the cleft, which is formed by the β-sheet and helices α4 and α5 in each monomer [Fig. 1(C)]. On the basis of these findings, we speculate that the invariant residues and the consensus motif are important for biological activity, perhaps forming part of a catalytic site.
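Locating a consensus motif such as PGGxGTxxE in a sequence, where x stands for any residue, amounts to a simple pattern search. The sketch below converts the consensus string into a Python regular expression and reports every match with its 1-based start position, allowing overlapping matches. The example sequence is hypothetical and is not the At5g11950 or At2g37210 sequence.

```python
import re

def motif_to_regex(motif):
    """Compile a consensus string such as 'PGGxGTxxE' ('x' = any residue) into a regex."""
    return re.compile("".join("." if c in "xX" else re.escape(c) for c in motif))

def find_motif(sequence, motif):
    """Return (1-based start, matched substring) for every occurrence, overlaps included."""
    pattern = motif_to_regex(motif)
    hits = []
    pos = 0
    while True:
        m = pattern.search(sequence, pos)
        if m is None:
            break
        hits.append((m.start() + 1, m.group()))
        pos = m.start() + 1  # restart one residue later so overlapping hits are found
    return hits

# Hypothetical toy sequence containing two copies of the motif:
print(find_motif("MKAPGGVGTLEEAKRPGGAGTAAE", "PGGxGTxxE"))
```

Mapping the returned positions onto the secondary-structure assignment of a model would show which helix or loop carries the motif, as described for helix α5 above.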
Data were collected at the Southeast Regional Collaborative Access Team (SER-CAT) 22-BM beamline at the Advanced Photon Source, Argonne National Laboratory. Supporting institutions may be found at www.ser-cat.org/members.html. Use of the Argonne National Laboratory Structural Biology Center beamlines at the Advanced Photon Source was supported by the U.S. Department of Energy, Office of Energy Research, under Contract No. W-31-109-ENG-38. Special thanks go to all members of the CESG.