M. Kainosho, Graduate School of Science, Institute for Advanced Research, Furo-cho, Chikusa-ku, Nagoya 464-8601, Japan Fax: +81 52 747 6433 Tel: +81 52 747 6474 E-mail: firstname.lastname@example.org
J. L. Markley, Center for Eukaryotic Structural Genomics, Department of Biochemistry, University of Wisconsin-Madison, 433 Babcock Drive, Madison, WI 53706 1344, USA Fax: +1 608 262 3759 Tel: +1 608 263 9349 E-mail: email@example.com
The product of gene At3g16450.1 from Arabidopsis thaliana is a 32 kDa, 299-residue protein classified as resembling a myrosinase-binding protein (MyroBP). MyroBPs are found in plants as part of a complex with the glucosinolate-degrading enzyme myrosinase, and are suspected to play a role in myrosinase-dependent defense against pathogens. Many MyroBPs and MyroBP-related proteins are composed of repeated homologous sequences with unknown structure. We report here the three-dimensional structure of the At3g16450.1 protein from Arabidopsis, which consists of two tandem repeats. Because the size of the protein is larger than that amenable to high-throughput analysis by uniform 13C/15N labeling methods, we used stereo-array isotope labeling (SAIL) technology to prepare an optimally 2H/13C/15N-labeled sample. NMR data sets collected using the SAIL protein enabled us to assign 1H, 13C and 15N chemical shifts to 95.5% of all atoms, even at a low concentration (0.2 mM) of protein product. We collected additional NOESY data and determined the three-dimensional structure using the cyana software package. The structure, the first for a MyroBP family member, revealed that the At3g16450.1 protein consists of two independent but similar lectin-fold domains, each composed of three β-sheets.
The flowering plant Arabidopsis thaliana is an important model system for identifying plant genes and determining their functions. Analysis of the completed Arabidopsis thaliana genome revealed the presence of 25 498 genes encoding proteins from 11 000 families, including many new protein families. To investigate the biological importance of these proteins, the Center for Eukaryotic Structural Genomics (CESG) at the University of Wisconsin-Madison has established platforms for protein structure determination by X-ray crystallography and NMR spectroscopy, with protein production both by conventional heterologous gene expression in Escherichia coli and by automated cell-free technology. To date, targets for NMR analysis have been limited to proteins < 25 kDa, because this is the conventional size limit for high-throughput structure determination by NMR spectroscopy.
One of the motivations at CESG for choosing to develop a cell-free protein production platform was to be able to take advantage of the emerging new technology of optimal isotopic labeling for protein NMR spectroscopy. This approach, named stereo-array isotope labeling (SAIL), utilizes the incorporation of amino acids labeled with 2H, 13C and 15N in order to minimize spectral complexity and spin diffusion within the protein while allowing detection of all connectivities required for sequence-specific assignments and determination of sufficient constraints for high-resolution solution structures . The SAIL approach requires cell-free incorporation of the amino acids because the labeling patterns in the amino acids would become scrambled if they were incorporated in a cellular system . As its first target for investigation by the SAIL approach, CESG chose the A. thaliana gene At3g16450.1, which encodes a 32 kDa, 299-residue protein with unknown structure.
At3g16450.1 has been classified as a myrosinase-binding protein-like protein. Myrosinase is a glucosinolate-degrading enzyme , and myrosinase-binding protein (MyroBP) has been identified as a component of high-molecular-mass myrosinase complexes in extracts of Brassica napus seed . The presence of three myrosinase genes and several putative MyroBPs has been reported in A. thaliana [6–8]. The myrosinase/glucosinolate system is involved in plant defense against insects and pathogens , and hence MyroBP is implicated in this defense system, although experimental data supporting this notion are lacking . Many MyroBPs and MyroBP-related proteins have a repetitive structure with two or more homologous sequences [10,11]. The homologous domains also have sequence similarity to some plant lectins, and, because seed MyroBP from B. napus has been found to bind to p-aminophenyl-α-d-mannopyranoside and to some extent to N-acetylglucosamine, the protein has been reported to possess lectin activity . However, despite its functional importance, no three-dimensional structure has been determined for any domain of the MyroBP family.
We report here the three-dimensional structure of the At3g16450.1 protein, which consists of two homologous MyroBP-type domains. The structure, which was determined by NMR spectroscopy from a relatively low quantity of SAIL protein (approximately 60 nmol; 300 μL of 0.2 mM protein), revealed that At3g16450.1 consists of tandem lectin-like domains corresponding to the two homologous sequences (residues 1–144 and 153–299). To explore the sugar-binding activity of At3g16450.1, we investigated interactions between immobilized At3g16450.1 protein and fluorescently labeled (pyridylaminated, PA) sugars by frontal affinity chromatography (FAC). Of the carbohydrates tested, only a few PA sugars showed significant affinity for the immobilized At3g16450.1. This result is discussed in light of the possible biological function of this protein. This study demonstrates the power of the SAIL approach in determining the structure of a larger protein by semi-automated means and with a minimal amount of material. It also shows how a structure determined by NMR spectroscopy can be the springboard for easily performed functional investigations.
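The quantity of protein quoted above follows from a simple unit conversion. The sketch below is only an illustration of that arithmetic; the function name is hypothetical, and the mass estimate assumes the nominal 32 kDa molecular weight rather than the exact mass of the labeled protein.

```python
def sample_amount(conc_mM: float, volume_uL: float, mw_kDa: float):
    """Return (amount in nmol, mass in mg) for an NMR sample.

    mM x uL gives nmol directly; nmol x kDa gives micrograms.
    """
    amount_nmol = conc_mM * volume_uL
    mass_mg = amount_nmol * mw_kDa / 1000.0  # micrograms -> milligrams
    return amount_nmol, mass_mg


# 300 uL of 0.2 mM protein, nominal molecular weight 32 kDa (assumed)
amount, mass = sample_amount(0.2, 300.0, 32.0)
print(f"{amount:.0f} nmol, about {mass:.1f} mg")
```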
Preparation of SAIL At3g16450.1
At3g16450.1 is a 299-residue protein with a molecular weight of 32 kDa. In our earlier work , we assigned the backbone resonances of At3g16450.1 using samples labeled uniformly with 13C/15N or 2H/13C/15N. However, further progress towards structure determination was impeded by the problems of spectral crowding and broadened signals, as commonly seen in the NMR spectra of uniformly 13C/15N-labeled (UL) large proteins. In the present study, we used the SAIL technique  to address these problems. As an initial step, we optimized the conditions for E. coli cell-free production of At3g16450.1 with regard to reaction temperature, duration of incubation, and expression vector. For comparison purposes, [U-13C,U-15N]-labeled At3g16450.1 (UL At3g16450.1) was prepared using an E. coli in vivo expression system.
Comparison of NMR spectra of SAIL and UL At3g16450.1
Although the concentration of the SAIL protein was lower than that of the UL protein by a factor of three (SAIL, 0.2 mM; UL, 0.6 mM), the NMR spectra of SAIL At3g16450.1 exhibited higher signal-to-noise ratios than those of UL At3g16450.1. The 1H-13C constant-time HSQC spectrum of SAIL At3g16450.1 was less crowded and better resolved than that of UL At3g16450.1 (Fig. 1A,B). The extensive stereo- and regio-specific deuteration of the SAIL protein led to diminished overlaps and sharpened peaks, particularly in the methylene region, without compromising essential structural information (Fig. 1C,D). In the methyl region, the regio-specifically labeled methyl resonances from the SAIL sample were much less crowded (Fig. 1E,F). As a result of these striking spectral improvements, it became possible to use established methods to assign 95.5% of the resonances of SAIL At3g16450.1. The chemical shifts for SAIL At3g16450.1 have been deposited in the Biological Magnetic Resonance Data Bank (BMRB) with accession number 15607. In addition, 93% of the backbone carbonyl 13C shifts had been assigned previously using uniformly 13C/15N-labeled protein. These assigned chemical shifts were used as input for the talos program to obtain dihedral angle constraints.
Solution structure of SAIL At3g16450.1
Assignment of the NOE peaks of At3g16450.1 and the structure determination were accomplished by use of the cyana program [17,18]. The structural statistics are summarized in Table 1. Although the 20 conformers representing the structures of At3g16450.1 did not superimpose well when the full sequence was considered (residues 1–299), each individual domain (residues 1–144 or residues 153–299) superimposed well when considered separately (Fig. 2A,B). Residues 16–21 and 45–47 exhibited severe line broadening, probably arising from internal dynamics of these residues on the intermediate time scale for chemical shifts. As a result, these are the least well-defined regions of the N-terminal domain. The C-terminal domain yielded reasonably well-converged structures, including the side-chain conformations of residues in its core (Fig. 2C,D).
Table 1. NMR constraints and structure calculation statistics for At3g16450.1a.
a The completeness of the 1H, 13C and 15N chemical shift assignments was evaluated for the aliphatic, aromatic, backbone amide and Asn/Gln/Trp side-chain amide nuclei, excluding the carbon and nitrogen atoms not bound to 1H. Where applicable, the value given corresponds to the average over the 20 energy-refined conformers that represent the solution structure. cyana target function values were calculated before energy refinement.
Completeness of the chemical shift assignments (%)
Root mean square deviation from the averaged coordinates (Å):
Backbone atoms of residues 2–144 (N-domain): 1.12 ± 0.19
Heavy atoms of residues 2–144 (N-domain): 1.65 ± 0.16
Backbone atoms of residues 153–297 (C-domain): 0.69 ± 0.10
Heavy atoms of residues 153–297 (C-domain): 1.08 ± 0.09
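The root mean square deviations in Table 1 are reported relative to the averaged coordinates of each domain. As an illustration of how such a value can be obtained, the following sketch superimposes every conformer onto a running mean structure with the Kabsch algorithm and then reports the mean and standard deviation of the per-conformer deviations. It is an assumption-laden sketch, not the procedure used in this work: the function names and the number of refinement passes are hypothetical, and the input is taken to be an array of selected atom coordinates (for example, the backbone atoms of one domain).

```python
import numpy as np


def kabsch_fit(mobile, target):
    """Superimpose `mobile` onto `target` (both shaped (n_atoms, 3))."""
    mobile_center = mobile.mean(axis=0)
    target_center = target.mean(axis=0)
    p = mobile - mobile_center
    q = target - target_center
    u, _, vt = np.linalg.svd(p.T @ q)
    # Guard against an improper rotation (reflection).
    d = np.sign(np.linalg.det(vt.T @ u.T))
    rotation = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    return p @ rotation.T + target_center


def rmsd_to_mean(ensemble, n_iter=3):
    """Return (mean, standard deviation) of the RMSD to the mean coordinates.

    `ensemble` is shaped (n_conformers, n_atoms, 3). The mean structure is
    refined for `n_iter` passes (hypothetical default) before the deviations
    are evaluated.
    """
    coords = np.asarray(ensemble, dtype=float)
    reference = coords[0]
    for _ in range(n_iter):
        fitted = np.array([kabsch_fit(c, reference) for c in coords])
        reference = fitted.mean(axis=0)
    deviations = np.sqrt(((fitted - reference) ** 2).sum(axis=2).mean(axis=1))
    return deviations.mean(), deviations.std()
```

Evaluating the two domains separately, as in Table 1, corresponds to calling `rmsd_to_mean` once per domain on the coordinates of that domain only; fitting the full-length chain would be dominated by the undefined relative orientation of the domains.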
Residues 145–152 in the linker region between the two domains are highly disordered. In addition, a careful search failed to reveal any inter-domain NOE peaks. Thus the relative orientations of the two domains appear not to be fixed, and the overall structure of At3g16450.1 is best described as two tandem structural domains connected by a flexible linker (Fig. 3A). The secondary structural elements of At3g16450.1, extracted from the coordinates of the three-dimensional structure using the dssp algorithm , showed that each domain has a similar structure consisting of three β-sheets related by pseudo three-fold symmetry (Fig. 3B).
The coordinates of the 20 energy-refined conformers that represent the solution structure of At3g16450.1 have been deposited in the Protein Data Bank with accession code 2JZ4. A structural homology search using the program dali at the European Molecular Biology Laboratory (EMBL) [20,21] yielded the agglutinin from Maclura pomifera (Protein Data Bank code 1JOT), a plant lectin, as the closest structure. The root mean square deviation values for the N- and C-terminal domains versus the agglutinin are 2.2 and 2.0 Å, respectively. Thus each of the two domains of At3g16450.1 adopts a lectin fold. The orientation of the N-terminal domain relative to the C-terminal domain could not be defined owing to the absence of inter-domain NOEs. To confirm the molecular organization of the tandem arrangement, expression vectors were constructed that separately encoded the N-terminal half (residues 1–153) and the C-terminal half (residues 151–299) of At3g16450.1, and these were used to prepare 15N-labeled samples of each domain. The 1H-15N HSQC spectrum of each domain was well dispersed, and, when overlaid, closely approximated the spectrum of full-length At3g16450.1 (Fig. 4A,B). This result confirms the structural arrangement of At3g16450.1 as two independent tandem structural domains.
Interaction analysis of At3g16450.1 with sugars
Because each structural domain of At3g16450.1 was found to adopt a lectin fold, we assayed At3g16450.1 for possible sugar-binding activity. We utilized 13 fluorescence-labeled oligosaccharides (PA sugars) as candidates. Four PA sugars eluted from a column of immobilized At3g16450.1 more slowly than the tetra-sialyl PA-glycan used as a control PA sugar (Fig. 5A,B and Table 2). On the basis of the elution profiles, the affinities of the four PA sugars for At3g16450.1 were estimated to be low, corresponding to Kd values of 10−4 M at best. To further examine the observed interaction, we acquired 1H-15N HSQC spectra of 15N-labeled At3g16450.1 in the presence and absence of maltohexaose, (Glcα1-4Glc)3. However, addition of (Glcα1-4Glc)3 did not cause any perturbation of NMR resonances, even when the concentration of the sugar was ten times higher than that of the protein (data not shown). By contrast, NMR titration of At3g16450.1 with (Glcα1-4Glc)3-PA led to distinct chemical shift changes for certain NMR resonances (Fig. 5C), but addition of PA as the ligand resulted only in limited subtle changes. These results suggest that both PA and the (Glcα1-4Glc)3 elements contribute to the observed interactions. Residues in both the N- and C-terminal domains of At3g16450.1 were affected by the presence of PA sugars (Fig. 5C, blue and red boxes). Taken together, these binding analyses suggest that At3g16450.1 has the potential to bind PA sugars with specificity for the sugar structure, although none of the various sugars tested exhibited a strong affinity.
Table 2. Summary of results of the FAC binding assay for At3g16450.1 with various PA sugars.
In this study, we determined the solution structure of the 32 kDa At3g16450.1 protein from A. thaliana by the SAIL-NMR method. This is the first application of SAIL-NMR in a structural genomics study. It provided the first structure for a member of the hitherto structurally unexplored MyroBP family.
At3g16450.1 consists of two tandem domains, each composed of three β-sheets. The fold of each domain is nearly identical to that of an agglutinin (Protein Data Bank code 1JOT), which shares sequence identities of 26 and 33% with the N- and C-terminal domains of At3g16450.1, respectively. Sequence similarity searches performed by psi-blast  identified other MyroBPs and MyroBP-like proteins from A. thaliana and B. napus, with sequence identities to the At3g16450.1 domains ranging from 30% to 70%. The most highly conserved regions correspond to the β-strands (Fig. 6). The N- and C-terminal domains of At3g16450.1, with 51% sequence identity to each other, are superimposed with root mean square deviations of 1.3 Å for the backbone of the β-strands and 1.7 Å if the loop regions are included, indicating that all of these family members adopt a similar fold.
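Percentage sequence identities such as those quoted above depend on the alignment and on the choice of denominator. The sketch below shows one common convention, counting identical residues over the aligned (non-gap) columns of a pairwise alignment; it is an illustrative assumption and not necessarily the definition applied by psi-blast or used for the values in the text. The function name and the toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns in which neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [
        (a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"
    ]
    if not columns:
        return 0.0
    matches = sum(a == b for a, b in columns)
    return 100.0 * matches / len(columns)


# Toy alignment (hypothetical, not taken from At3g16450.1)
print(percent_identity("ACDE-G", "ACDFHG"))
```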
It has been reported that seed MyroBP from B. napus possesses lectin activity, binding to p-aminophenyl-α-d-mannopyranoside and to some extent to N-acetylglucosamine . Because myrosinase contains potential N-linked sugar-binding sites , the sugar-binding activity of MyroBP is implicated in binding to myrosinase. In the case of At3g16450.1, the protein did not show a significant affinity for sugar structures specific to N-linked glycan, but rather showed weak affinity for starch or glycolipid, raising the possibility that the lectin activity of the MyroBP family is also involved in interaction between a myrosinase complex and other molecules. It is also noteworthy that a UniGene database search  suggested that At3g16450.1 is expressed in leaf and root. Because myrosinases have also been shown to be expressed in A. thaliana leaf [6,8], it may be suspected that At3g16450.1 forms a complex with myrosinase, thereby guiding the myrosinase to a damaged site in the leaf via weak interactions with starch in the leaf or glycolipid from foreign pathogens. However, it is obvious that further study will be required to determine the biological importance of MyroBP–sugar interactions.
Many MyroBP and MyroBP-related proteins contain tandem lectin domains, as shown in Fig. 6. The tandem domains present in MyroBP family members may participate in multivalent sugar binding, as observed with other carbohydrate-binding proteins with multiple domains. Results of the NMR chemical-shift perturbation experiments (Fig. 5C) suggest that both domains of At3g16450.1 can participate in bivalent sugar binding. It is also probable that each homologous domain of the MyroBP family possesses different ligand-binding properties, thereby providing a broad binding specificity. In some proteins containing tandem homologous domains, inter-domain interactions fix the relative orientation of the domains in a specific multi-domain structure that is essential for biological function. Other proteins with tandem domains contain a flexible linker, and a specific structure may be adopted only when a target is bound. The present study suggests that At3g16450.1 belongs to the latter category.
The major problems with structural genomics studies using NMR are low solubility and molecular-weight limitations. As shown by this study, the SAIL-NMR method provides a promising approach to overcoming both of these problems. One important aspect of the SAIL technology is that the signal intensities for the SAIL protein are several times stronger than for the corresponding UL sample, thus making it possible to perform structure determination for proteins even at low concentration. In this study, the structure was determined using a 0.2 mM sample of SAIL At3g16450.1. The SAIL-NMR method offers the opportunity to determine structures of proteins with low solubility or poor yield. The SAIL method can also accelerate the process of structural analysis. The spectral simplification achieved by SAIL with this larger protein makes it possible to use semi- or fully automated methods developed for use with smaller proteins to analyze the NMR data. We are developing a software package that exploits the benefits of the SAIL method [25–27]. Finally, the SAIL method is expected to enable functional investigations of larger proteins.
The construction of pET15b (Novagen, Madison, WI, USA) harboring At3g16450.1 was performed as described previously. The vector used for cell-free production of At3g16450.1 was constructed according to a strategy described previously. DNA coding for the N-terminal histidine tag and for At3g16450.1 was subcloned into pIVEX2.3d (Roche, Pleasanton, CA, USA) between the NcoI/NdeI and NdeI/BamHI sites, respectively. Silent mutations were introduced into the N-terminal sequence to enhance the expression rate. Expression vectors coding for the N-terminal (residues 1–153) and C-terminal (residues 151–299) domains of At3g16450.1 were constructed by cloning the corresponding target sequence into the NdeI and BamHI sites of pET15b.
Preparation of labeled proteins
[U-15N]- and [U-13C, U-15N]-labeled proteins were produced by culturing Escherichia coli BL21 (DE3) strain harboring the corresponding expression vector in M9 medium containing 15NH4Cl and/or [U-13C]-labeled glucose as the sole nitrogen and carbon sources. Cells were cultured at 30 °C with shaking. Expression was induced by the addition of isopropyl thio-β-d-galactoside (IPTG) at a final concentration of 1 mM, and cells were harvested 6.5 h after induction.
SAIL At3g16450.1 was produced by E. coli cell-free expression. A total of 110 mg of SAIL amino acid mixture was used, with the amount of each individual SAIL amino acid proportional to the amino acid composition of At3g16450.1. A home-made E. coli S30 extract was used, and the reaction was performed as previously described [25,28]. The volumes of the inner and outer solutions were 10 and 40 mL, respectively. The reaction was carried out at 30 °C for 15 h with shaking. To prevent degradation of the produced protein, a protease inhibitor cocktail (Roche) was added to the reaction. The At3g16450.1 protein was purified as described previously .
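The proportioning of the SAIL amino acid mixture described above can be sketched as follows. This is a hypothetical illustration only: the text does not state whether the proportions were taken on a molar or a mass basis, so the function accepts an optional table of molar masses (an assumption) and otherwise splits the total by residue count. The example sequence is a toy string, not the At3g16450.1 sequence.

```python
from collections import Counter


def split_mixture(sequence: str, total_mg: float, molar_mass=None):
    """Divide `total_mg` among amino acids according to sequence composition.

    Without `molar_mass`, each amino acid's share is proportional to its
    residue count. With `molar_mass` (dict: one-letter code -> g/mol), the
    share is proportional to count x molar mass, i.e. equimolar per residue.
    """
    counts = Counter(sequence.upper())
    if molar_mass is None:
        shares = {aa: float(n) for aa, n in counts.items()}
    else:
        shares = {aa: n * molar_mass[aa] for aa, n in counts.items()}
    total_share = sum(shares.values())
    return {aa: total_mg * s / total_share for aa, s in shares.items()}


# Toy sequence (hypothetical); 110 mg total as stated in the text
for aa, mg in sorted(split_mixture("AAGV", 110.0).items()):
    print(aa, round(mg, 1))
```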
The NMR sample used for the structure determination contained 0.2 mM SAIL At3g16450.1 protein in 20 mM bis-Tris(2-carboxymethyl)phosphine: HCl(D19, 98%) (Cambridge Isotope Laboratories, Andover, MA, USA), 100 mM KCl, 10% D2O, pH 6.8. NMR spectra were recorded on a Bruker (Tsukuba, Japan) Avance 600 MHz spectrometer equipped with a 5 mm 1H-observe triple-resonance cryogenic probe (Bruker TXI cryoProbe), and on a Bruker Avance 800 MHz spectrometer at 27.5 °C. The spectra were processed using the programs xwinnmr version 3.5 (Bruker) or nmrpipe, and analyzed using the program sparky (T. D. Goddard and D. G. Kneller, Department of Pharmaceutical Chemistry, University of California, San Francisco, CA, USA). Backbone and β-CH resonances were assigned using 2D HSQC, and 3D HN(CO)CACB and HBHA(CO)NH spectra. Side-chain resonances were assigned using 3D H(CCCO)NH, (H)CC(CO)NH, HCCH-TOCSY, constant time-HCCH-COSY, 13C-edited NOESY and 15N-edited NOESY spectra. 15N- and 13C-edited NOESY spectra were recorded with a mixing time of 75 ms, and the inter-proton distance constraints were obtained from the NOESY peaks, which were selected and manually filtered using sparky.
Collection of conformational constraints, structure calculation and refinement
Automated NOE cross-peak assignments  and structure calculations with torsion-angle dynamics were performed using the program cyana, version 2.2 . Backbone torsion-angle constraints obtained from database searches using the program talos  were incorporated into the structure calculation. Simulated annealing with 20 000 torsion-angle dynamics time steps per conformer was performed during the cyana structure calculations. In the final cycle of the cyana protocol, 100 conformers were generated and further refined using the amber 9 software package  with a full-atom force field . The refinement comprised three stages: initial minimization, molecular dynamics, and final minimization. Minimization and molecular dynamics consisted of 1500 steps and 20 ps duration, respectively. A generalized Born implicit solvent model was used to account for the solvent effects . The force constants for distance and torsion-angle constraints were 50 kcal·mol−1·Å−2 and 200 kcal·mol−1·rad−2 respectively. From the resulting structures of this first amber refinement, we extracted backbone hydrogen-bond constraints in the regular secondary elements that were present in more than 75% of the 100 conformers. With these as additional constraints, we repeated the refinement. From the conformers that did not significantly violate experimental constraints, we selected the 20 lowest-energy structures for analysis. The structural quality was evaluated using procheck-nmr . The program molmol  was used to visualize the structures. The coordinates of the 20 energy-refined cyana conformers of At3g16450.1 have been deposited in the Protein Data Bank (accession code 2JZ4). The chemical shifts of At3g16450.1 have been deposited in the BioMagResBank (accession code 15607).
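The bookkeeping steps of the refinement protocol described above (retaining hydrogen bonds present in more than 75% of the conformers, then selecting the 20 lowest-energy conformers among those without significant constraint violations) can be sketched as below. This is not code from cyana or amber. The `Conformer` record, its fields and the 0.5 Å violation cutoff are hypothetical, since the text does not specify what counted as a significant violation.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Conformer:
    index: int
    energy: float          # refinement energy, kcal/mol
    max_violation: float   # largest distance-constraint violation, Å
    hbonds: frozenset      # (donor_residue, acceptor_residue) pairs


def frequent_hbonds(conformers, min_fraction=0.75):
    """Hydrogen bonds present in more than `min_fraction` of the conformers."""
    counts = Counter(hb for c in conformers for hb in c.hbonds)
    n = len(conformers)
    return {hb for hb, k in counts.items() if k / n > min_fraction}


def select_lowest_energy(conformers, n_keep=20, violation_cutoff=0.5):
    """Keep the `n_keep` lowest-energy conformers below the violation cutoff.

    The cutoff value is an assumption made for this sketch.
    """
    accepted = [c for c in conformers if c.max_violation <= violation_cutoff]
    return sorted(accepted, key=lambda c: c.energy)[:n_keep]
```

In this sketch the first function would be applied to the 100 conformers from the first refinement, and the second to the conformers obtained after repeating the refinement with the added hydrogen-bond constraints.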
Frontal affinity chromatography
M9.1, 210.1, 210.4 and 210.1FX were purchased from Seikagaku Kogyo Co (Tokyo, Japan). The code numbers and structures of pyridylaminated oligosaccharides refer to the GALAXY website at http://www.glycoanalysis.info/ENG/index.html. Two kinds of PA-oligosaccharides, GalNAcα1-3(Fucα1-2)Galβ1-3(Fucα1-4)GlcNAcβ1-3Galβ1-4Glc-PA and Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-6(Neu5Acα2-3Galβ1-3(Neu5Acα2-6)GlcNAcβ1-4(Neu5Acα2-6Galβ1-4GlcNAcβ1-2)Manα1-3)Manβ1-4GlcNAcβ1-4GlcNAc-PA, were obtained from Takara Bio Inc. (Otsu, Shiga, Japan). Other PA glycans were prepared by amination of the commercial oligosaccharides using 2-aminopyridine. Lewis A- and Lewis X-type glycans, Galβ1-3(Fucα1-4)GlcNAcβ1-3Galβ1-4Glc and Galβ1-4(Fucα1-3)GlcNAcβ1-3Galβ1-4Glc, were purchased from Calbiochem (San Diego, CA, USA). Cellohexaose, chitohexaose, isomaltohexaose, laminarihexaose and maltohexaose were purchased from Seikagaku Kogyo Co.
The protein At3g16450.1 containing the N-terminal histidine tag was dissolved in 10 mM HEPES buffer, pH 7.6, containing 150 mM NaCl, 1 mM CaCl2, and bound to Ni-NTA agarose. After immobilization, the agarose beads were packed into a stainless steel column (4.0 × 10 mm, GL Sciences, Tokyo, Japan).
Frontal affinity chromatography analysis was performed as described previously. PA oligosaccharides were dissolved at a concentration of 10 nM in 10 mM HEPES, pH 7.6, containing 150 mM NaCl, 1 mM CaCl2, and applied onto the At3g16450.1 column at a flow rate of 0.25 mL·min−1 at 20 °C. The elution profile was monitored by the fluorescence intensity at 400 nm (excitation at 320 nm). Tetrasialyl PA glycan Neu5Acα2-6Galβ1-4GlcNAcβ1-2Manα1-6(Neu5Acα2-3Galβ1-3(Neu5Acα2-6)GlcNAcβ1-4(Neu5Acα2-6Galβ1-4GlcNAcβ1-2)Manα1-3)Manβ1-4GlcNAcβ1-4GlcNAc-PA was used as a control sugar to determine the elution volume of the unbound oligosaccharide.
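For orientation, dissociation constants are commonly estimated in frontal affinity chromatography from the retardation of the elution front, using the relation V − V0 = Bt / ([A]0 + Kd), where V0 is the elution volume of the unbound control sugar, [A]0 is the applied sugar concentration and Bt is the effective amount of immobilized binding sites. The sketch below applies this relation; it is an assumption that this form was used in the present analysis, and the Bt value and elution volumes in the example are hypothetical, because they are not reported in the text.

```python
def fac_kd_uM(bt_nmol: float, v_mL: float, v0_mL: float, a0_uM: float = 0.01):
    """Estimate Kd (in µM) from a FAC elution front.

    bt_nmol : effective content of immobilized binding sites (assumed known)
    v_mL    : elution volume of the test sugar
    v0_mL   : elution volume of the unbound control sugar
    a0_uM   : applied sugar concentration (10 nM = 0.01 µM in this study)
    """
    retardation = v_mL - v0_mL
    if retardation <= 0:
        raise ValueError("no retardation: Kd cannot be estimated")
    # nmol / mL equals µM, so the units are consistent.
    return bt_nmol / retardation - a0_uM


# Hypothetical values: Bt = 10 nmol, front retarded by 0.1 mL
print(f"Kd ~ {fac_kd_uM(10.0, 1.1, 1.0):.0f} µM")
```

Because the applied concentration (10 nM) is far below the Kd values discussed here, the estimate reduces in practice to Kd ≈ Bt / (V − V0), so small retardations correspond to weak binding.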
NMR chemical-shift perturbation mapping
NMR samples were prepared using free [U-15N]-labeled At3g16450.1 (0.1 mM protein, 10 mM HEPES, pH 7.6, 150 mM KCl, 1 mM CaCl2) and its complex with PA sugar [same solvent composition plus 0.5 mM PA-(Glcα1-4Glc)3]. 1H-15N HSQC spectra of the isolated and titrated samples were acquired at 27.5 °C using a Bruker Avance 600 MHz NMR spectrometer.
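Chemical-shift perturbations from such pairs of 1H-15N HSQC spectra are often summarized per residue as a combined value, Δδ = [ΔδH² + (α·ΔδN)²]^1/2. The sketch below implements this common convention as an illustration; the weighting factor α = 0.14 and the 0.05 ppm cutoff are assumptions (several conventions are in use), and the text does not state that this particular formula was applied. The function names and the peak-list layout are hypothetical.

```python
import math


def combined_csp(delta_h: float, delta_n: float, alpha: float = 0.14) -> float:
    """Combined 1H/15N chemical-shift perturbation in ppm."""
    return math.sqrt(delta_h ** 2 + (alpha * delta_n) ** 2)


def perturbed_residues(free, bound, cutoff=0.05, alpha=0.14):
    """Return {residue: combined shift} for residues exceeding `cutoff`.

    `free` and `bound` map residue numbers to (1H ppm, 15N ppm) peak
    positions; residues missing from either list are skipped.
    """
    hits = {}
    for residue, (h_free, n_free) in free.items():
        if residue not in bound:
            continue
        h_bound, n_bound = bound[residue]
        shift = combined_csp(h_bound - h_free, n_bound - n_free, alpha)
        if shift > cutoff:
            hits[residue] = shift
    return hits
```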
This work was supported by the Technology Development for Protein Analyses and Targeted Protein Research Program of the Ministry of Education, Culture, Sports, Science and Technology of Japan (MEXT), by Core Research for Evolutional Science and Technology (CREST) of the Japan Science and Technology Agency (JST), by a Grant-in-Aid for Scientific Research from the Japan Society for the Promotion of Science (JSPS), by the National Institutes of Health Protein Structure Initiative (grants P50 GM64598 and U54 GM074901), and by the Volkswagen Foundation.