Characterization of disordered regions in globular proteins constitutes a significant challenge. Here, we report an approach based on 13C-detected nuclear magnetic resonance experiments for the identification and assignment of disordered regions in large proteins. Using this method, we demonstrate that disordered fragments can be accurately identified in two homologs of menin, a globular protein with a molecular weight over 50 kDa. Our work provides an efficient way to characterize disordered fragments in globular proteins for structural biology applications.
Disordered regions in proteins can play important roles in protein function. These regions are frequently involved in cell signal transduction, transcriptional regulation, molecular recognition, and protein regulation through posttranslational modification.1, 2 The identification and characterization of disordered regions in proteins has become an important task for computational protein structure prediction and for structural biology.3–6 Disordered fragments can be predicted using various bioinformatics methods3, 6, 7; however, high-resolution experimental validation and biophysical characterization of these regions remain challenging. Accurate methods for the identification of disordered fragments in globular proteins are of significant interest to the structural biology community as the presence of flexible protein segments may interfere with the production of diffraction quality crystals. Consequently, extensive protein engineering can be required to remove these flexible regions to enable crystallization or improve the quality of protein crystals. For example, the recent X-ray structure of the Drosophila effector caspase drICE required the deletion of a highly flexible internal fragment.8 Additionally, deletion of internal flexible regions in the GluR2 receptor ligand-binding domain resulted in improved diffraction of protein crystals from 2.5 to 1.5 Å.9, 10 Experimental identification of internal-disordered regions is commonly based on rapid hydrogen-deuterium exchange rates for solvent-exposed amide protons, which can be detected using mass spectrometry.11, 12 Nevertheless, accurate identification of disordered residues remains difficult, and an efficient strategy to experimentally detect such fragments in globular proteins would significantly facilitate the design of protein constructs suitable for crystallization. Furthermore, identification of disordered residues might facilitate the characterization of their role in the protein function.
Nuclear magnetic resonance (NMR) is a valuable experimental technique uniquely suited for high-resolution studies of disorder in proteins.13 However, the amide-proton-detected NMR experiments commonly used for protein studies are of limited use for characterizing intrinsically disordered proteins, because poor resonance dispersion and fast exchange of amides with water prevent the observation of a complete set of resonances. This can be partly circumvented by experiments with Hα-detection.14 In contrast, NMR experiments that directly detect 13C overcome these limitations, as the random coil carbon chemical shifts have greater dispersion than proton chemical shifts, and observation of 13C is not affected by exchange of amide protons with water.15 An additional advantage of carbon-detected experiments is the observation of resonances corresponding to the backbone Cα and C′ carbons, which allows for the detection of all amino acids, including proline, which is frequently found in disordered regions.16 The use of 13C-detected experiments for the analysis of the disordered proteins α-synuclein,17 the N-terminal region of c-Src kinase,18 and β-2-microglobulin19 at neutral pHs and ambient temperatures demonstrates the advantages of these experiments for systems where H-D exchange broadening limits the utility of proton-detected experiments. Furthermore, we have previously shown that 13C-detected experiments can be used to characterize protein–protein interactions involving disordered proteins.20
Here, we demonstrate that 13C-based NMR techniques provide a very efficient approach to characterize disordered regions within large, globular proteins. We find that this method allows for the detection of long-disordered regions as well as the highly accurate identification of even relatively short-disordered fragments (∼10 amino acids long). We used 13C-detected experiments to identify such disordered regions in menin. Menin is a tumor suppressor protein that controls cellular growth in endocrine tissues21 and also functions as an oncogenic cofactor required for leukemogenesis.22 Structural studies were recently undertaken, and while full-length menin proved to be recalcitrant to crystallization experiments, successful crystallization of the protein was achieved through the deletion of internal-disordered fragments.23, 24 The first menin to be crystallized was the homolog from Nematostella vectensis, and crystallization required truncation of the C-terminus and deletion of one internal-disordered fragment.23 More recently, human menin was crystallized, also requiring the deletion of a long internal fragment predicted to be unstructured.24 We evaluated whether these internal-disordered fragments could be identified through 13C-detected NMR experiments. As model proteins, we chose C-terminally truncated constructs of Nematostella menin (N_meninΔC corresponding to residues 1–468) and human menin with the partial deletion of an internal fragment and truncation of the C-terminus (H_meninΔΔC corresponding to residues 1–593 with the deletion of residues 465–524). Such truncated proteins were selected to achieve soluble protein while retaining significant molecular weight.
Results and Discussion
Sequence analysis using the DISOPRED2 server3 revealed that all menin homologs have multiple internal regions predicted to be disordered. We first tested whether 13C-NMR experiments could be used to identify these internal-disordered fragments in Nematostella menin. The CACO spectrum of 13C,15N-labeled N_meninΔC revealed the presence of ∼27 resonances [Fig. 1(A)]. Given the significant molecular weight of the protein (55 kDa), we expect that all observable signals correspond to the most disordered residues. Slow tumbling of the protein molecule leads to very strong broadening of resonances for structured fragments and acts as an efficient filter leaving observable signals only for highly mobile residues. To assign these observed resonances, we also collected CBCACO and CANCO experiments.25, 26 Through sequential assignment, we found that the majority of these signals correspond to an internal fragment (residues 423–440), 5 N-terminal residues and 2 C-terminal residues [Fig. 1(A)]. We found that analysis of Cβ-C′ correlations was essential for the assignment due to less significant peak overlap in this region and because the Cβ chemical shifts allow for the identification of the amino acid type [Fig. 1(B) and Supporting Information Fig. 1]. We were not able to assign several remaining peaks due to reduced intensities. Most likely, these peaks correspond to the shorter and less-disordered loops. The assigned residues 423–440 yield strong resonances, clearly indicating that this fragment is disordered in solution. Consistently, deletion of residues 426–442 in Nematostella menin was necessary to obtain diffraction quality crystals and to determine the X-ray structure of the protein.23 We also assessed whether disordered fragments can be identified using standard amide-proton-detected experiments. 
The 1H-15N heteronuclear single quantum coherence (HSQC) spectrum for N_meninΔC at pH 7.5 reveals the presence of ∼20–25 peaks for backbone amides, including a number of strongly broadened resonances (Supporting Information Fig. 2). This indicates that 13C detection offers a clear benefit for identifying disordered fragments, as these spectra are not affected by the H–D exchange phenomenon.
The internal-disordered fragment in Nematostella menin is fairly short, and its assignment based on 13C-detected experiments was relatively straightforward. However, unambiguous assignment for more complex proteins with multiple-disordered fragments would be more difficult due to increased peak overlap and complexity of the 2D spectra. Therefore, we assessed whether assignment of disordered regions based on 13C experiments could be facilitated by combining bioinformatics methods for disorder prediction and chemical shift calculation. To test this, we first used the program DISOPRED23 for the prediction of internal regions of increased disorder in N_meninΔC. Based on this method, three possible internal-disordered regions were identified: residues 177–187, 356–367, and 418–456 [Fig. 1(A), inset]. We next assumed that these disordered fragments would have chemical shifts consistent with random-coil values, which can be predicted with high accuracy.27–29 Thus, we used the ncIDP program27 to generate predicted chemical shifts for these regions. These predicted chemical shifts were used to simulate spectra with Cβ-C′, Cα-C′, and C′i-Cαi+1 correlations and compared to experimental data for N_meninΔC. Using this approach, we found that observed resonances correspond to residues 423–440, consistent with the manual assignment. Overall, this analysis validated the use of chemical shift prediction as a very efficient strategy to aid in completing the assignment of disordered regions in large proteins.
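The comparison of predicted and observed correlations described above can be illustrated with a minimal sketch. It assumes that the predicted random-coil shifts (for example, exported from a predictor such as ncIDP) and the picked peaks are already available as plain numbers. The function name, the tolerances, and the numerical values in the usage note are hypothetical and chosen only for illustration; this is not the procedure used in this work, which also relied on sequential connectivities from CANCO.

```python
from math import hypot


def match_peaks(observed, predicted, tol_ca=0.5, tol_co=0.5):
    """Assign each observed (Ca, C') peak to the closest predicted residue.

    observed:  list of (ca_ppm, co_ppm) tuples picked from a CACO-type spectrum
    predicted: dict mapping a residue label to its predicted (ca_ppm, co_ppm)
    tol_ca, tol_co: matching tolerances in ppm (hypothetical defaults)

    Returns a dict mapping the peak index to a residue label, or to None
    when no predicted position lies within both tolerances.
    """
    assignments = {}
    for i, (ca, co) in enumerate(observed):
        best_label, best_dist = None, None
        for label, (p_ca, p_co) in predicted.items():
            d_ca, d_co = abs(ca - p_ca), abs(co - p_co)
            if d_ca > tol_ca or d_co > tol_co:
                continue  # outside the tolerance box in at least one dimension
            # Distance scaled by the tolerances so both dimensions weigh equally
            dist = hypot(d_ca / tol_ca, d_co / tol_co)
            if best_dist is None or dist < best_dist:
                best_label, best_dist = label, dist
        assignments[i] = best_label
    return assignments


# Illustrative, made-up values (not data from this study)
predicted = {"res_A": (52.5, 177.8), "res_B": (56.3, 176.4), "res_C": (63.1, 176.9)}
observed = [(56.4, 176.3), (63.0, 177.1), (45.2, 174.0)]
result = match_peaks(observed, predicted)
```

With these made-up inputs, the first two peaks are matched to `res_B` and `res_C`, and the third is left unassigned. The sketch does not enforce a one-to-one mapping, so in crowded regions the candidate matches would still need to be confirmed with Cβ-C′ and sequential correlations.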
We next applied this strategy to assign unstructured fragments in human menin, which represents a more complex protein. The CACO and CBCACO experiments measured for H_meninΔΔC indicated the presence of 34 disordered residues with more significant overlap compared to the N_meninΔC spectra [Fig. 2(A)]. The unambiguous assignment of these residues based on the combination of CACO, CBCACO, and CANCO experiments was difficult due to extensive signal overlap. To aid in the assignment of these resonances, we carried out chemical shift prediction for the fragments with the highest disorder probability. Analysis of the H_meninΔΔC sequence using the DISOPRED2 program identified three potential disordered regions: 204–213, 383–401, and 457–549 [Fig. 2(B)].
Using chemical shift prediction in conjunction with 13C-detected experiments acquired for sequential assignment, we unambiguously assigned two internal regions corresponding to residues 384–397 and 525–541. As observed for Nematostella menin, we also detected several additional peaks, corresponding to two N-terminal and two C-terminal residues in H_meninΔΔC. The longest disordered fragment that we assigned, residues 525–541, was also predicted with high probability using the DISOPRED2 server [Fig. 2(B)]. These residues are unstructured in the recently reported crystal structure of human menin24 with only a short fragment observed due to crystal packing interactions [Fig. 2(C)]. We also found that of the two remaining fragments predicted with lower confidence to be disordered, only residues 384–397 are disordered in solution. This is consistent with the crystal structure of human menin, which lacks electron density for residues 386–402 [Fig. 2(C)].24 In contrast, the second fragment predicted to be disordered, encompassing residues 204–213, was not detected in 13C-NMR experiments, and, indeed, this region is well ordered in the crystal structure of human menin [Fig. 2(C) and Supporting Information Figs. S3 and S4].24
As expected, we observed very good correlations between predicted and experimentally assigned chemical shifts for all resonances detected in the menin spectra. The root-mean-square deviation (RMSD) values for Nematostella menin are 0.21, 0.19, and 0.28 ppm for Cα, Cβ, and C′ chemical shifts, respectively. The RMSD values between predicted and experimental chemical shifts for human menin are 0.34, 0.18, and 0.30 ppm for Cα, Cβ, and C′ chemical shifts, respectively. Such close agreement demonstrates that chemical shift prediction provided a very valuable tool for the rapid assignment of the spectra even in the presence of strong spectral overlap. We anticipate that the combination of 13C-detected experiments with chemical shift prediction will be highly beneficial for rapid assignment of even more complex spectra with longer disordered fragments.
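For reference, an RMSD of this kind can be computed as sketched below. The sketch assumes the standard definition, the square root of the mean squared difference over paired predicted and experimental shifts for one nucleus type; the function name is hypothetical, and the source does not state how the reported values were calculated.

```python
from math import sqrt


def shift_rmsd(predicted, experimental):
    """Root-mean-square deviation (ppm) between paired chemical shifts.

    predicted, experimental: equal-length sequences of shifts in ppm for a
    single nucleus type (e.g. all Ca shifts), listed in the same residue order.
    """
    if len(predicted) != len(experimental):
        raise ValueError("shift lists must have the same length")
    if not predicted:
        raise ValueError("at least one pair of shifts is required")
    squared = [(p - e) ** 2 for p, e in zip(predicted, experimental)]
    return sqrt(sum(squared) / len(squared))
```

The function would be applied separately to the Cα, Cβ, and C′ shift lists, since the three nuclei have different prediction accuracies and chemical shift ranges.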
We expected that an important advantage of 13C-detected experiments over detection of amide protons would be the possibility of recording spectra over a broad pH range without compromising the number of observed signals. To assess this, we tested the effect of pH on the spectra of the H_meninΔΔC protein (Fig. 3). As expected, the number of backbone amide signals in 1H-15N HSQC spectra increased dramatically with decreasing pH. At pH 8.5, we observed only approximately five to seven sharp peaks, while on lowering the pH to 7.5 and 6.5, we observed 15–18 and 30–32 sharp peaks, respectively. Even at the lowest pH, a number of peaks remained broad, which would most likely limit the assignment based on the HN-detected triple-resonance experiments. The CACO and CBCACO spectra over this pH range consistently showed ∼34 peaks, which would allow for complete assignment of disordered residues under a broad range of pH conditions. Importantly, we have recorded high-quality 13C experiments at protein concentrations as low as 50 μM, demonstrating the applicability of this approach to large proteins with low solubility or particular buffer requirements.
In summary, we have developed a simple method for the identification and assignment of disordered regions in large proteins. We have demonstrated that (1) the use of carbon-detected NMR experiments allows for the identification of disordered residues in two homologs of menin (each protein is larger than 50 kDa); (2) in contrast to standard amide-proton-detected experiments, this method allows for the complete detection of disordered residues over a broad range of pH values; (3) high-quality 13C experiments can be recorded at low protein concentrations (∼50–100 μM); and (4) the assignment of relatively complex spectra can be rapidly achieved through the combination of experimental data with chemical shift calculation. Importantly, the NMR experiments allowed for highly accurate identification of even relatively short disordered fragments (∼10 amino acids long), which are more difficult to predict using bioinformatics methods.6 We have also demonstrated that the fragment of Nematostella menin, which was previously deleted for protein crystallization,23 is disordered in solution. Therefore, identification of disordered internal fragments through this method may serve to strongly support optimization of constructs for protein crystallization. Indeed, we have very recently used this approach to support engineering of a construct of human menin and obtained high-quality crystals diffracting to 1.3–1.5 Å resolution.30 Such an NMR-based approach is complementary to the high-resolution deuterium exchange mass spectrometry (DXMS) method, which is frequently used for the detection of disordered fragments in proteins and construct optimization.11, 12, 31 In contrast to DXMS, NMR spectroscopy provides direct information regarding backbone flexibility and might be more suitable to distinguish disordered fragments from solvent-exposed and structured loops. 
Additionally, the direct high-resolution observation of disordered regions in large proteins using 13C-detected NMR may support future characterization of these regions as functional components of globular proteins.
Materials and Methods
The synthetic construct encoding Nematostella menin was ordered from GenScript and cloned into the pET32a vector. The truncation after residue 487 led to the generation of N_meninΔC, which was used for NMR experiments. The 13C,15N-labeled N_meninΔC protein was expressed by growing bacterial cells in isotopically enriched M9 minimal media. The purification was carried out following a previously described protocol.23
The synthetic gene encoding human menin with an internal deletion (Δ465–524) was ordered from GenScript and subcloned into the pET32a vector (Novagen). The construct H_meninΔΔC was made by introducing a stop codon to generate a truncation following residue 593. H_meninΔΔC was expressed in Rosetta2(DE3) cells (Novagen) grown in isotopically enriched M9 minimal media and purified by affinity chromatography on a HisTrap HP column (GE Healthcare) followed by ion exchange chromatography using Q Sepharose FF (GE Healthcare). To remove the thioredoxin-His6 tag, the protein was cleaved by TEV protease and loaded onto Ni-NTA superflow resin (Qiagen). The protein was further purified by size exclusion chromatography using HiLoad 16/60 Superdex 75 pg (GE Healthcare).
For NMR experiments, the 13C,15N-labeled N_meninΔC sample was prepared at a final concentration of 100 μM in 50 mM Tris buffer, pH 7.5, 150 mM NaCl, and 1 mM tris(2-carboxyethyl)phosphine (TCEP) with 10% D2O. The 13C,15N-labeled H_meninΔΔC sample was prepared at a final concentration of 115 μM in 50 mM Tris, pH 7.5, 50 mM NaCl, and 1 mM dithiothreitol (DTT) with 10% D2O. Samples for pH titration experiments were prepared at 50 μM in 50 mM Tris, pH 6.5/7.5/8.5, 50 mM NaCl, and 1 mM DTT with 10% D2O. NMR measurements were performed using a Bruker Avance III 600-MHz spectrometer equipped with a 5-mm TCI cryogenic probe. The following parameters were used for 13C-detected experiments: 2D CACO25 data size: 64 (t1) × 512 (t2) complex points, t1max (13C) = 16 ms, and t2max (13C) = 85.2 ms; 2D CBCACO25 data size: 70 (t1) × 512 (t2) complex points, t1max (13C) = 7.8 ms, and t2max (13C) = 85.2 ms; 2D CANCO26 data size: 50 (t1) × 512 (t2) complex points, t1max (13C) = 7.4 ms, and t2max (13C) = 85.2 ms. All 13C-detected experiments were recorded with 1H excitation in order to increase the sensitivity and processed with the in-phase anti-phase (IPAP) scheme for decoupling. These experiments were recorded with a 1-s relaxation delay and 32, 64, and 448 scans per increment, respectively. This led to total acquisition times of 2.5, 6, and 31 h. All experiments were collected at 25°C. Spectra were processed with NMRPipe32 and analyzed with Sparky (http://www.cgl.ucsf.edu/home/sparky/).
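As a rough consistency check on these acquisition times, the duration of a 2D experiment can be estimated from the parameters listed above. The sketch below assumes that each complex t1 point requires four FIDs (two for quadrature detection in t1 and two for the in-phase and anti-phase IPAP components) and that each scan lasts for the relaxation delay plus the 85.2-ms acquisition time. The factor of four and the function name are assumptions made for illustration and are not stated in the source; magnetization transfer delays, dummy scans, and instrument overhead are ignored, so the result should be read as a lower bound.

```python
def estimate_hours(n_complex_t1, scans, relax_delay_s, acq_time_s, fids_per_point=4):
    """Lower-bound estimate of the total time (hours) for a 2D experiment.

    n_complex_t1:   number of complex points in the indirect dimension
    scans:          scans per increment
    relax_delay_s:  relaxation delay in seconds
    acq_time_s:     direct-dimension acquisition time (t2max) in seconds
    fids_per_point: FIDs recorded per complex t1 point (assumed 4:
                    2 for quadrature in t1 x 2 for the IPAP components)
    """
    n_fids = n_complex_t1 * fids_per_point
    return n_fids * scans * (relax_delay_s + acq_time_s) / 3600.0


# Parameters taken from the text: (complex t1 points, scans per increment)
experiments = {"CACO": (64, 32), "CBCACO": (70, 64), "CANCO": (50, 448)}
estimates = {
    name: estimate_hours(points, scans, relax_delay_s=1.0, acq_time_s=0.0852)
    for name, (points, scans) in experiments.items()
}
```

Under these assumptions the estimates are about 2.5, 5.4, and 27 h for CACO, CBCACO, and CANCO. The first agrees with the reported time, and the other two fall somewhat below the reported 6 and 31 h, as expected for an estimate that omits the longer transfer periods and other overhead.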