Structure-based analysis of thermodynamic and mechanical properties of cavity-containing proteins – case study of plant pathogenesis-related proteins of class 10

Authors

  • Mateusz Chwastyk,

    1. Institute of Physics, Polish Academy of Sciences, Warsaw, Poland
  • Mariusz Jaskolski,

    1. Center for Biocrystallographic Research, Institute of Bioorganic Chemistry, Polish Academy of Sciences, Poznan, Poland
    2. Department of Crystallography, Faculty of Chemistry, A. Mickiewicz University, Poznan, Poland
  • Marek Cieplak

    Corresponding author
    1. Institute of Physics, Polish Academy of Sciences, Warsaw, Poland
    • Correspondence

      M. Cieplak, Institute of Physics, Polish Academy of Sciences, 02-668 Warsaw, Poland

      Fax: +48 22 116 0926

      Tel: +48 22 116 3365

      E-mail: mc@ifpan.edu.pl


Abstract

We provide theoretical comparisons of the physical properties of eighteen proteins with the pathogenesis-related proteins of class 10 (PR-10) fold, which is characterized by a large hydrophobic cavity enclosed between a curved β-sheet and a variable α-helix. Our novel algorithm for calculating the volume of internal cavities within protein structures is used to demonstrate that, although the sizes of the cavities of the investigated PR-10 proteins vary significantly, their other physical properties, such as thermodynamic and mechanical parameters or parameters related to folding, are very close. The largest variations (of the order of 20%) are predicted for the optimal folding times. We show that, on squeezing, the PR-10 proteins behave differently from typical virus capsids.

Abbreviations
CSBP: cytokinin-specific binding protein
PDB: Protein Data Bank
PR-10: pathogenesis-related proteins of class 10

Introduction

Plants, with their permanent attachment to a given environment, have to respond to stress in situ. They have developed several unique defence mechanisms against abiotic and biotic stress factors, such as pathogens, one of which involves the expression of a number of pathogenesis-related (PR) groups of proteins, divided, according to function, into seventeen classes. In this classification, PR proteins of class 10 (PR-10) are among the most curious plant proteins because no unique function can be attributed to them despite their high levels of expression and involvement in other processes, such as developmental regulation or symbiosis [1]. This mystery with respect to PR-10 function is to be contrasted with a thorough characterization of the PR-10 structure by X-ray crystallographic and NMR spectroscopic methods. Those studies have revealed a hollow cavity in the molecular core (formed by a relatively short polypeptide chain of 154–163 residues), surrounded by a seven-stranded antiparallel β-sheet crossed by a long C-terminal α-helix (α3; also denoted here as H3), resting on a V-shaped support formed by two additional helices [2, 3]. This characteristic fold, termed the PR-10 fold or Bet v 1 fold [after the first PR-10 protein (birch pollen allergen Bet v 1) for which the crystal structure was solved], is strongly suggestive of small-molecule binding. The ligands should be rather hydrophobic because the cavity interior has a mostly hydrophobic character. These physiological ligands might comprise plant hormones, termed phytohormones, which chemically belong to several divergent groups but usually have a hydrophobic part. Phytohormones are important signalling molecules in plant physiology, with a special role in development and response to stress; therefore, they fit this hypothetical scenario quite well.
Indeed, the binding of phytohormones by PR-10 proteins has been demonstrated experimentally, especially with respect to cytokinins, which are adenine-based plant hormones. As examples, the crystal structures of trans-zeatin (a cytokinin molecule) in complex with a cytokinin-specific binding protein (CSBP) from mung bean [4] or with a yellow lupine LlPR-10 protein [5] can be given. Those complexes, however, do not explain the PR-10 mystery because, although the ligand molecules are usually perfectly defined in electron density, their binding appears to be variable and nonspecific. It is therefore important to employ complementary biophysical and computational methods to study the PR-10 structure, with the hope that they will provide clues about its elusive physiological function. The biological activity aspect is further complicated by the fact that, despite the strictly conserved folding canon, various subgroups of PR-10 proteins show a low to very low level of sequence conservation. This is particularly unexpected in the H3 (or α3) region which, despite its conserved secondary structure, has almost no sequence conservation. It is plausible that the α3 helix, which is a major structural element contributing to the definition and properties of the internal cavity, by modulating its sequence (and degree of structural deformation), may ultimately regulate the ligand-binding properties of various PR-10 members.

We can sort these proteins into four groups according to their amino acid sequences. Nevertheless, the tertiary structure of each of the proteins is the same, and so we introduce standard labels for their secondary structure elements (Fig. 1). Table 1 shows the Protein Data Bank (PDB) codes [18] (by which the proteins are identified in the present study) and names of the 18 proteins under investigation. It also shows the chain selected (in cases of multiple copies in the crystallographic asymmetric unit) and cites the appropriate references. Table 2 shows the PDB codes and sequence assignment of the secondary structure elements for the 18 proteins under investigation. All of the entries correspond to single chains, except for 3IE5, which forms an S-S covalently linked homodimer, composed of two Hyp-1 chains. Some of the listed proteins (1FSK, 1FM4, 1TXC) were studied as complexes with small-molecule ligands but, for the present analysis, this aspect is disregarded. The ligand molecules of these complexes (as well as all water molecules and other components of the environment) have been omitted from the calculations.

Table 1. The proteins investigated in the present study [1]. The first column shows the PDB code; the second column lists the chain used in the present study; the third column shows the accepted designation of the protein; and the fourth column cites a reference from the literature.
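| PDB | Chain | Name | Reference |
| --- | --- | --- | --- |
| 1BV1 | A | Bet v 1a | [2] |
| 1FSK | A | Bet v 1a/Fab | [6] |
| 1LLT | A | Bet v 1a (E45S) | [7] |
| 1QMR | A | Bet v 1a | [8] |
| 1FM4 | A | Bet v 1l/deoxycholate | [9] |
| 1XDF | A | LlPR-10.2A | [10] |
| 2QIM | A | LlPR-10.2B/zeatin | [5] |
| 3E85 | A | LlPR-10.2B/DPU | [11] |
| 1ICX | A | LlPR-10.1A | [3] |
| 1IFV | A | LlPR-10.1B | [3] |
| 1TW0 | A | SPE16 | [12] |
| 1TXC | A | SPE16/ANS | [12] |
| 2BK0 | A | Api g 1 | [13] |
| 3C0V | A | CSBP/zeatin + (Ta6Br12)2+ | [14] |
| 3IE5 | A (dimer) | Hyp-1 | [15] |
| 2WQL | A | Dau c 1 | [16] |
| 1VJH | A | At1g24000.1 | [17] |
| 2FLH | B | CSBP/zeatin | [4] |
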
Table 2. Sequence assignment of segments of secondary structure, as used in the present study. The assignment was calculated using the DSSP algorithm, as implemented in the DSSP program (CMBI version of 2010-10-21) [19]. Si denotes β-strand number i and Hi denotes α-helix i. Between each pair of consecutive secondary structure elements, there is a loop formed by amino acid residues that do not belong to the aforementioned structures. These loops are not listed but are presented in Fig. 1, with the designations Li. The last two columns characterize the proteins geometrically, using the parameters Rg and w (see text). The average <Rg> over the set of proteins is 15.42 ± 0.15 Å. The average <w> over the set is 0.50 ± 0.20. In the case of 1VJH, there is no segment comprising the H2-S2-S3 elements.
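| PDB | S1 | H1 | H2 | S2 | S3 | S4 | S5 | S6 | S7 | H3 | Rg [Å] | w |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1BV1 | 2–11 | 15–24 | 26–33 | 40–45 | 53–57 | 66–75 | 80–88 | 95–106 | 112–122 | 130–153 | 15.59 | 0.51 |
| 1FSK | 2–3 (?) | 15–33* | * | 40–45 | 53–57 | 66–75 | 80–87 | 95–106 | 112–122 | 130–153 | 15.50 | 0.65 |
| 1LLT | 2–11 | 15–24 | 24–33 | 40–45 | 53–57 | 66–75 | 80–88 | 98–106 | 112–121 | 130–153 | 15.63 | 0.56 |
| 1QMR | 2–11 | 15–24 | 26–33 | 40–45 | 53–57 | 66–75 | 80–88 | 95–106 | 112–122 | 130–153 | 15.59 | 0.54 |
| 1FM4 | 2–11 | 15–24 | 26–33 | 40–45 | 53–57 | 66–75 | 80–87 | 95–106 | 112–122 | 134–153 | 15.66 | 0.10 |
| 1XDF | 2–11 | 15–24 | 26–33 | 37–44 | 52–59 | 62–74 | 79–85 | 94–105 | 111–121 | 129–151 | 15.30 | 0.31 |
| 2QIM | 2–11 | 15–24 | 26–33 | 37–44 | 52–59 | 62–74 | 79–87 | 94–105 | 111–121 | 129–152 | 15.50 | 0.70 |
| 3E85 | 3–11 | 15–24 | 26–33 | 37–44 | 52–59 | 62–74 | 79–87 | 94–105 | 111–121 | 129–152 | 15.22 | 0.67 |
| 1ICX | 2–11 | 15–22 | 26–33 | 37–44 | 52–58 | 63–74 | 79–87 | 94–105 | 111–121 | 128–151 | 15.42 | 0.49 |
| 1IFV | 2–11 | 15–23 | 26–33 | 37–44 | 52–57 | 65–74 | 79–86 | 94–106 | 110–121 | 128–151 | 15.35 | 0.71 |
| 1TW0 | 2–11 | 15–32* | * | 37–44 | 52–59 | 62–74 | 79–86 | 94–105 | 111–121 | 129–152 | 15.28 | 0.41 |
| 1TXC | 2–11 | 15–32* | * | 37–44 | 52–59 | 62–74 | 79–86 | 94–105 | 111–121 | 129–152 | 15.44 | 0.52 |
| 2BK0 | 2–11 | 15–24 | 26–33 | 40–44 | 52–56 | 65–74 | 79–86 | 94–105 | 111–121 | 129–152 | 15.31 | 0.55 |
| 3C0V | 3–11 | 15–23 | 26–33 | 38–45 | 53–58 | 66–75 | 80–88 | 97–108 | 111–121 | 132–150 | 15.15 | 0.46 |
| 3IE5 | 5–13 | 17–26 | 28–35 | 40–47 | 55–60 | 68–77 | 82–90 | 97–109 | 112–123 | 131–154 | 15.36 | 0.05 |
| 2WQL | 5–12 | 16–25 | 27–34 | 41–45 | 53–57 | 66–75 | 80–87 | 95–106 | 112–122 | 130–153 | 15.20 | 0.71 |
| 1VJH | 4–12 | 17–27 | – | – | – | 37–44 | 49–55 | 63–74 | 81–91 | 100–119 | 14.86 | 0.23 |
| 2FLH | 3–11 | 15–23 | 29–33 | 38–45 | 53–58 | 66–75 | 80–88 | 97–108 | 111–121 | 132–150 | 15.35 | 0.71 |

* For 1FSK, 1TW0 and 1TXC, a single residue range is listed for H1 and H2 together.
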
Figure 1.

The native structures of 1BV1 (left) and 1XDF (right). The β-strands (S), α-helices (H) and loops (L) are numbered consecutively from the N- to the C-terminus. Sequence assignment to the different secondary structure elements for all 18 proteins investigated in the present study is provided in Table 2.

The first group of proteins consists of 1BV1, 1FSK, 1LLT, 1QMR and 1FM4. The sequences of the first four are almost identical, whereas that of 1FM4 differs at 11 positions. The sequences of the proteins in the second group (1XDF, 2QIM and 3E85) are somewhat shorter. 2QIM and 3E85 have identical sequences and 1XDF differs at 11 positions. Figure 1 shows the fold and secondary structure assignment for 1BV1 from the first group and 1XDF from the second group. The third group comprises 1ICX, 1IFV, 1TW0 and 1TXC, which are not identical but are very similar. The last group is composed of 2BK0, 3C0V, 3IE5, 2WQL and 1VJH. These proteins differ significantly from the other groups with respect to the make-up of the amino acid sequences and yet their native structures correspond to the same folding pattern. In particular, 1VJH is a member of the major latex protein family. It has 10 evident deletions relative to the consensus amino acid sequence and is obviously missing some of the secondary structure elements that characterize the PR-10 fold.

In the present study, we provide a thorough characterization of all of these proteins based on theoretical modelling. We start with a description of the geometrical properties. We find that, even though the 18 proteins have similar radii of gyration (15.37 ± 0.29 Å), as listed in Table 2, they differ considerably in their overall shape. The shape can be described by a parameter, w, which is related to the radii associated with the main directions of the tensor of inertia. Its values vary between 0.05 and 0.71, with a mean of 0.49 ± 0.20 (Table 2). Unexpectedly, the outside surface areas, S, of the proteins, listed in Table 3, vary by up to 3%. They are of the order of 5729 ± 189 Å². We then discuss the properties of the hallmark feature of the PR-10 proteins: the internal cavity. We determine its connectivity and volume, V. The values of S and V are obtained by a novel algorithm, which involves placing spherical probes on a grid of points. When providing this characterization, we consider just one chain of the system, even if (as in the case of 3IE5) the protein is actually dimeric.

Table 3. Volume [upper value(s) in each row, Å3] of the internal cavity of the proteins; and the outer surface area (lower value, Å2) of the proteins listed in the first column, calculated with the new algorithm using two different radii of the spherical probe (for explanation, see text). Multiple values for the volume parameter indicate the existence of a major (largest) and several significant minor cavities (with a volume of at least 5% of the volume of the major cavity). The volume of the largest major cavity is indicated in bold.
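| PDB | Quantity | R = 1.30 Å | R = 1.42 Å |
| --- | --- | --- | --- |
| 1BV1 | V [Å³] | 406 ± 5 | 332 ± 1 |
| | S [Å²] | 5841 ± 9 | 5881 ± 2 |
| 1FSK | V [Å³] | 383 ± 5 | 305 ± 1; 17 ± 2 |
| | S [Å²] | 5711 ± 7 | 5759 ± 2 |
| 1LLT | V [Å³] | 393 ± 7 | 319 ± 2 |
| | S [Å²] | 5878 ± 11 | 5923 ± 2 |
| 1QMR | V [Å³] | 390 ± 6 | 320 ± 1 |
| | S [Å²] | 5792 ± 8 | 5833 ± 2 |
| 1FM4 | V [Å³] | 632 ± 12 | 529 ± 4 |
| | S [Å²] | 5928 ± 11 | 5958 ± 3 |
| 1XDF | V [Å³] | 35 ± 2; 16 ± 1; 11 ± 1 | 22 ± 1; 14 ± 1; 9 ± 2; 7 ± 1 |
| | S [Å²] | 5764 ± 14 | 5830 ± 5 |
| 2QIM | V [Å³] | 643 ± 18 | 551 ± 4 |
| | S [Å²] | 5862 ± 8 | 5901 ± 3 |
| 3E85 | V [Å³] | 677 ± 14 | 584 ± 3 |
| | S [Å²] | 5669 ± 4 | 5706 ± 2 |
| 1ICX | V [Å³] | 284 ± 17 | 229 ± 4 |
| | S [Å²] | 5767 ± 16 | 5819 ± 4 |
| 1IFV | V [Å³] | 95 ± 2; 79 ± 5 | 70 ± 1; 63 ± 1 |
| | S [Å²] | 5671 ± 9 | 5724 ± 4 |
| 1TW0 | V [Å³] | 423 ± 11; 67 ± 2 | 328 ± 3; 30 ± 1 |
| | S [Å²] | 5591 ± 10 | 5636 ± 3 |
| 1TXC | V [Å³] | 489 ± 10 | 407 ± 3 |
| | S [Å²] | 5710 ± 10 | 5745 ± 3 |
| 2BK0 | V [Å³] | 610 ± 10 | 497 ± 2 |
| | S [Å²] | 5605 ± 7 | 5651 ± 2 |
| 3C0V | V [Å³] | 258 ± 12 | 232 ± 3 |
| | S [Å²] | 5499 ± 6 | 5545 ± 3 |
| 3IE5 | V [Å³] | 454 ± 3 | 374 ± 2 |
| | S [Å²] | 5720 ± 11 | 5757 ± 3 |
| 2WQL | V [Å³] | 550 ± 3 | 469 ± 1 |
| | S [Å²] | 5526 ± 9 | 5584 ± 3 |
| 1VJH | V [Å³] | 58 ± 6 | 52 ± 2; 11 ± 1 |
| | S [Å²] | 5079 ± 6 | 5127 ± 2 |
| 2FLH | V [Å³] | 285 ± 15 | 254 ± 4 |
| | S [Å²] | 5695 ± 5 | 5750 ± 2 |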

Next, we turn to a coarse-grained molecular dynamics model and discuss how the mechanical, thermodynamic and folding properties vary across the set of 18 proteins. Our motivation for using the simplified theoretical approach is that we want to compare a comprehensive list of protein attributes within one model. Many of these attributes are measures of large conformational changes that are hard to study meaningfully with an all-atom precision for proteins of approximately 150 residues. We find that the optimal folding times differ within 20%, although other characteristics are fairly conserved and depend much less on the particular system; this applies especially to the folding temperature. Subsequently, we discuss mechanostability [20-22]. Specifically, we consider constant-speed stretching and find a pattern of four peaks on the force-displacement curves. The patterns appear to be quite similar for all the proteins. The force peaks are predicted to be in the range of 160 pN, which is higher than typical values (approximately 120 pN) found in a recent theoretical survey of 1734 proteins [23]. It is interesting that the PR-10 proteins are expected to be robust mechanically despite containing large internal cavities. The fact that the proteins investigated in the present study have cavities makes them similar to virus capsids, albeit of a much smaller size. It is thus interesting to investigate them by methods designed for the virus capsids, such as nanoindentation [24-26]. Finally, we consider the proteins when submitted to squeezing by two plates. We find that the response to squeezing is similar across the set of the proteins but different from that found for virus capsids: there are no well identified force peaks and there is a substantial variability of the reaction force across different trajectories. We note that no nanoindentation experiments have yet been implemented in the context of proteins and so our assessment of squeezing is of a theoretical value.

Geometrical properties of PR-10 proteins

Shape characterization

We first consider two parameters, Rg and w, to characterize the overall geometry of protein conformations. The former is the standard radius of gyration, which is defined in terms of all atoms in the protein. The latter is a distortion parameter [27, 28]. It depends on all three principal radii, $R_\alpha$, associated with the eigenvalues of the tensor of inertia, $D_\alpha$: $R_\alpha =$ [math formula]. $R_1$ is the smallest radius and $R_3$ is the largest. The parameter w is defined as $w = \Delta R / \bar{R}$, where $\bar{R} = \tfrac{1}{2}(R_1 + R_3)$ and $\Delta R = R_2 - \bar{R}$. The values of w are used to distinguish between globular, elongated and flat shapes. For globular shapes, w is close to 0. Elongated shapes have a substantial positive value of w because then $R_2$ is close to $R_3$ and $\Delta R \approx \tfrac{1}{2}(R_2 - R_1)$. Substantial negative values of w indicate flat shapes because then $R_2 \approx R_1$ and $\Delta R \approx \tfrac{1}{2}(R_1 - R_3)$. Table 2 lists the values of Rg and w. As noted in the Introduction, Rg varies very little across the 18 proteins, whereas w displays significant variations even within the four sequence groups. The variations suggest that the similarity of general fold and secondary structures does not necessarily imply a similar shape: the moment of inertia is sensitive to the exact, physical mass distribution.
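
As an illustration of how Rg and w can be evaluated from a set of coordinates, a minimal Python sketch is given here. It is not the code used in the present study. It assumes that w = ΔR/R̄ with R̄ = ½(R1 + R3) and ΔR = R2 − R̄, and that each principal radius is Rα = √(Dα/M), where Dα are the eigenvalues of the tensor of inertia and M is the total mass. The normalization of Rα is our assumption; any common prefactor cancels in w. The function names are hypothetical and unit masses are used by default.

```python
import math

def eig_sym3(A):
    """Eigenvalues of a real symmetric 3x3 matrix (closed-form trigonometric method)."""
    q = (A[0][0] + A[1][1] + A[2][2]) / 3.0
    p1 = A[0][1] ** 2 + A[0][2] ** 2 + A[1][2] ** 2
    p2 = sum((A[k][k] - q) ** 2 for k in range(3)) + 2.0 * p1
    if p2 == 0.0:                       # matrix is a multiple of the identity
        return [q, q, q]
    p = math.sqrt(p2 / 6.0)
    B = [[(A[i][j] - (q if i == j else 0.0)) / p for j in range(3)] for i in range(3)]
    det = (B[0][0] * (B[1][1] * B[2][2] - B[1][2] * B[2][1])
           - B[0][1] * (B[1][0] * B[2][2] - B[1][2] * B[2][0])
           + B[0][2] * (B[1][0] * B[2][1] - B[1][1] * B[2][0]))
    phi = math.acos(max(-1.0, min(1.0, det / 2.0))) / 3.0
    e_max = q + 2.0 * p * math.cos(phi)
    e_min = q + 2.0 * p * math.cos(phi + 2.0 * math.pi / 3.0)
    return [e_min, 3.0 * q - e_min - e_max, e_max]

def shape_parameters(coords, masses=None):
    """Return (Rg, w) for a list of (x, y, z) positions."""
    m = list(masses) if masses is not None else [1.0] * len(coords)
    total = sum(m)
    cm = [sum(mi * c[k] for mi, c in zip(m, coords)) / total for k in range(3)]
    inertia = [[0.0] * 3 for _ in range(3)]
    sum_r2 = 0.0
    for mi, c in zip(m, coords):
        r = [c[k] - cm[k] for k in range(3)]
        r2 = r[0] ** 2 + r[1] ** 2 + r[2] ** 2
        sum_r2 += mi * r2
        for a in range(3):
            for b in range(3):
                inertia[a][b] += mi * ((r2 if a == b else 0.0) - r[a] * r[b])
    rg = math.sqrt(sum_r2 / total)
    # ASSUMPTION: R_alpha = sqrt(D_alpha / M); only ratios of radii enter w.
    r1, r2_mid, r3 = (math.sqrt(max(d, 0.0) / total) for d in sorted(eig_sym3(inertia)))
    r_bar = 0.5 * (r1 + r3)
    return rg, (r2_mid - r_bar) / r_bar
```

For a straight rod of points this sketch gives w close to 1, for a flat ring of four points w ≈ −0.17, and for the corners of a cube w = 0, in line with the elongated, flat and globular limits described above.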

All of the PR-10 proteins investigated in the present study have a cavity (or cavities) inside their core. Thus, it is interesting to ask to what extent the cavities of various PR-10 proteins can differ. One way of characterizing the cavities is to determine their volumes. There are several groups of computational tools available for this task [29]. The first group involves setting up a three-dimensional grid of voxels and then scanning them to determine whether they are occupied by the protein atoms (represented by their, possibly scaled, van der Waals spheres) or not [30-36]. With these algorithms, it is necessary to start the counting inside the cavity or to introduce ways of closing it. The second group of methods involves filling the cavity by spherical probes [37-39]. Such algorithms may be complicated, especially when small and irregular cavities are encountered. The third group is based on the so called α-shape approximation to real shapes [40]. Algorithms in the fourth group blend the previous ideas by using Monte Carlo approaches [29, 41]. All of these methods have difficulties when it comes to distinguishing between the inside and outside of a protein and do not allow the investigator much flexibility (other than the choice of van der Waals radii and radius of the spherical probe) in selecting model parameters for the calculations and visualization.

A new algorithm for determination of the volumes of cavities and surface areas of proteins

In the present study, we introduce a new algorithm for cavity volume calculation that combines the concepts of sphere-filling and grid scanning but, in addition, applies shrinking of the enclosing box (movement of its walls) and rotations of the grid. As a by-product, it also allows the determination of the outside (external or solvent-accessible) surface of the protein. We start by imagining that a protein molecule is placed in a parallelepiped that is just sufficiently large to encompass it. We construct a grid of lattice points inside the box. The default lattice constant is taken as a = 0.2 Å (smaller values of a do not affect the obtained volumes to any significant degree). The task at hand is to determine whether a grid point can be used as the center of a spherical probe of radius R that is clear of contact with any point of the protein, which is represented by the van der Waals spheres of its atoms. If it can, then the site contributes a³ to the volume. The count depends on the value of R. In biological applications, it is meaningful to consider spheres representing the size of a water molecule. By default, the protein is represented by all of its nonhydrogen atoms but, optionally, hydrogen atoms at generated positions (with caveats about OH, NH3 and CH3 groups) can be included as well. The protein atoms are endowed with an excluded volume corresponding to the van der Waals radii as compiled previously [42]. These radii range from 1.42 Å for the O atoms in side chains to 1.88 Å for the C-α and C-β atoms.

By counting the eligible sites, the volume of the internal cavity is obtained, possibly combined with the volume outside of the protein. The latter has to be subtracted, and it is determined as follows. We construct six planar walls (of the box) made of the spherical probes and move each wall from the box boundary inwards along its normal (i.e. along each of the Cartesian directions) in steps of a (i.e. the walls move independently of each other). If a sphere in a wall starts to overlap with the protein, its movement is arrested but the remaining spheres of this wall keep moving until all of them have stopped. When the probes of a wall are able to move without overlapping with the protein atoms, we mark their positions on the lattice. When a probe changes its status to ‘overlap’, its position is marked accordingly. Using this labelling system of the lattice points, we mark the lattice sites that are outside of the protein (having no protein contact) in one way, and those on the surface of the protein in a different way. Each of the grid points with a surface mark contributes a² to the total measure of the external protein surface. The unmarked points of the lattice define the inside of the protein within the boundary of its surface. The marking of the sites is updated by the action of each wall. The total number of sites that are marked as ‘outside’ defines the external volume of the protein within the box.
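
The grid-counting idea can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions and not the algorithm of the present study: the outside region is found here by a flood fill from the box boundary instead of by the moving walls of probes, no surface area is computed, no averaging over orientations is done, and a coarse lattice constant of 1 Å is used to keep the pure-Python loop fast. All function names, and the toy shell of atoms used as a test object, are hypothetical.

```python
import math
from collections import deque

def probe_accessible_volumes(atoms, probe_radius, spacing=1.0):
    """Return (cavity_volume, outside_volume) for atoms given as (x, y, z, vdw_radius)."""
    margin = probe_radius + spacing      # guarantees a probe-free layer on the box boundary
    lo = [min(a[k] - a[3] for a in atoms) - margin for k in range(3)]
    hi = [max(a[k] + a[3] for a in atoms) + margin for k in range(3)]
    n = [int(math.ceil((hi[k] - lo[k]) / spacing)) + 1 for k in range(3)]

    def probe_fits(i, j, k):
        """True if a probe centred on grid point (i, j, k) touches no atom."""
        x, y, z = lo[0] + i * spacing, lo[1] + j * spacing, lo[2] + k * spacing
        for ax, ay, az, ar in atoms:
            reach = ar + probe_radius
            if (x - ax) ** 2 + (y - ay) ** 2 + (z - az) ** 2 < reach * reach:
                return False
        return True

    free = {(i, j, k)
            for i in range(n[0]) for j in range(n[1]) for k in range(n[2])
            if probe_fits(i, j, k)}

    # Simplification: flood fill (6-connectivity) from the boundary marks the outside.
    outside = {p for p in free if any(p[d] == 0 or p[d] == n[d] - 1 for d in range(3))}
    queue = deque(outside)
    steps = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    while queue:
        i, j, k = queue.popleft()
        for di, dj, dk in steps:
            q = (i + di, j + dj, k + dk)
            if q in free and q not in outside:
                outside.add(q)
                queue.append(q)

    cell = spacing ** 3                  # each eligible site contributes a^3
    return len(free - outside) * cell, len(outside) * cell

def toy_shell(n_atoms=150, shell_radius=6.0, atom_radius=1.9):
    """Hypothetical test object: atoms spread over a sphere (Fibonacci lattice)."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    atoms = []
    for i in range(n_atoms):
        z = 1.0 - 2.0 * (i + 0.5) / n_atoms
        rho = math.sqrt(1.0 - z * z)
        atoms.append((shell_radius * rho * math.cos(golden * i),
                      shell_radius * rho * math.sin(golden * i),
                      shell_radius * z, atom_radius))
    return atoms
```

Note that a flood fill can enter a cavity through any opening wider than the lattice spacing, whereas walls that move along straight lines cannot, so the two ways of defining the outside need not agree for cavities that are open to the solvent.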

There are many ways of orienting the protein relative to the Cartesian directions and they may result in different sphere-packing properties. To reduce such orientational effects, we average the results over a number of random orientations. An alternative justification for doing so is that different rotations correspond to different flows of the water-imitating probes. Table 3 shows the values of the volume, V, and surface area, S, for two choices of R: 1.30 and 1.42 Å. The former choice was used previously [1], where the group-two algorithm of Laskowski [37] was applied (the default value of 1.0 Å for the radius of the filling sphere was overridden). The latter choice (1.42 Å) corresponds to the van der Waals radius of the oxygen atom of a water molecule. The error bars shown in Table 3 reflect the scatter of the results as a consequence of varying the protein orientation. Ten rotations have been considered for R = 1.30 Å and 100 for R = 1.42 Å. The two choices of R yield comparable results for S. However, the estimates of V may differ by up to 25%. We consider that the water-based value of R is probably more relevant. The dependence of V and S on R between 0.1 and 1.5 Å is shown in Fig. 3 for three proteins. There appears to be a crossover at R ≈ 0.65 Å between the small-R and large-R behaviour of V. However, the crossover shifts or disappears when the van der Waals radii of the protein atoms are scaled up by a factor of 1.24 (which may be regarded as a realistic estimate of true van der Waals radii, demarcating the regions of atomic repulsions and attractions). There is no similar crossover in the dependence of S on R.
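
The averaging over random orientations can be illustrated as follows. The sketch assumes that random rotations are drawn uniformly (here from normalized Gaussian quaternions) and that the quoted error bar is the sample standard deviation over the rotations; neither detail is specified in the text, and the function names are hypothetical. The argument `measure` stands for any orientation-dependent quantity, such as a grid-based volume or surface estimate.

```python
import math
import random
import statistics

def random_rotation(rng):
    """Uniformly random 3x3 rotation matrix built from a normalised Gaussian quaternion."""
    w, x, y, z = (rng.gauss(0.0, 1.0) for _ in range(4))
    norm = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / norm, x / norm, y / norm, z / norm
    return [[1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
            [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
            [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)]]

def rotate(coords, rot):
    """Apply a rotation matrix to every (x, y, z) point."""
    return [tuple(sum(rot[a][b] * c[b] for b in range(3)) for a in range(3))
            for c in coords]

def orientation_average(coords, measure, n_rotations=10, seed=0):
    """Mean and sample standard deviation of measure(coords) over random orientations."""
    rng = random.Random(seed)
    values = [measure(rotate(coords, random_rotation(rng))) for _ in range(n_rotations)]
    return statistics.mean(values), statistics.stdev(values)
```

A rotation-invariant measure returns a vanishing scatter in this scheme, so any non-zero error bar reflects the sensitivity of the grid-based estimate to the orientation of the molecule relative to the lattice.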

Volumes of cavities in the PR-10 proteins

It should be noted that the cavity identified through our procedure may have the form of either singly or multiply connected chambers, depending on the value of R. The values of V in Table 3 are listed as either single or multiple numbers, depending on the number of significant cavities found. The rule used here is that an additional (minor) chamber is listed only if its volume is at least 5% of the volume of the largest (major) cavity. The proteins 1XDF, 1IFV and 1VJH are found to have multiple-chamber cavities (when using R = 1.42 Å), as shown in the right panel of Fig. 2. All of their subcavities are comparable in size and smaller than 70 Å³. Proteins 1FSK and 1TW0 have two cavities each but the larger cavity clearly dominates and is bigger than 305 Å³. All the other proteins have practically one cavity (see the left panel of Fig. 2). If only the major cavities are counted, one in each protein, then the average volume is <V> = 326.3 ± 167.0 Å³.
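
The 5% listing rule and the average over the major cavities can be checked with a short Python sketch. The volumes are the major-cavity values for R = 1.42 Å copied from Table 3; the function name is hypothetical, and the use of the sample standard deviation (n − 1 in the denominator) is our assumption about how the quoted scatter was obtained.

```python
import statistics

def significant_cavities(volumes, fraction=0.05):
    """Keep the major cavity and every chamber of at least `fraction` of its volume."""
    major = max(volumes)
    return sorted((v for v in volumes if v >= fraction * major), reverse=True)

# Major-cavity volumes in cubic angstroms for R = 1.42 A, copied from Table 3.
MAJOR_VOLUME = {
    "1BV1": 332, "1FSK": 305, "1LLT": 319, "1QMR": 320, "1FM4": 529, "1XDF": 22,
    "2QIM": 551, "3E85": 584, "1ICX": 229, "1IFV": 70, "1TW0": 328, "1TXC": 407,
    "2BK0": 497, "3C0V": 232, "3IE5": 374, "2WQL": 469, "1VJH": 52, "2FLH": 254,
}

mean_volume = statistics.mean(list(MAJOR_VOLUME.values()))
sd_volume = statistics.stdev(list(MAJOR_VOLUME.values()))
```

With these inputs, the mean and the sample standard deviation come out close to the quoted 326.3 ± 167.0 Å³, which suggests that the scatter quoted in the text is a sample standard deviation over the 18 proteins.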

Figure 2.

Top: cavities in 1BV1 and 1XDF as determined by using a probe with R = 1.42 Å. Bottom: the outer surfaces of the same proteins.

Figure 3.

Determination of the outer surfaces (top) and volumes of the inner cavity (bottom) for the three proteins indicated, and as a function of the probe radius R. The full symbols correspond to calculations based on the standard values of van der Waals radii for the protein atoms. The empty ones correspond to values up-scaled by a factor of 1.24.

Remarkably, the external surfaces of all the proteins show only a small variation, with <S> = 5729.4 ± 188.9 Å². Thus, in summary, the 18 PR-10 proteins considered here appear quite similar on the outside but show considerable differences inside.

Kinetic and equilibrium properties

We now consider protein properties that depend on the atomic-level interactions within the molecules, and first we discuss the equilibrium and folding properties. The dynamics are implemented within the coarse-grained approach [43-46] in which the degrees of freedom are associated with the C-α atoms. Their interactions are governed by the contact map as determined through atomic overlaps [42]. The native contacts between the C-α atoms i and j at distance rij are described by the Lennard-Jones potential

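$$V(r_{ij}) = 4\varepsilon\left[\left(\frac{\sigma_{ij}}{r_{ij}}\right)^{12} - \left(\frac{\sigma_{ij}}{r_{ij}}\right)^{6}\right]$$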

where σij is determined for each pair ij so that the potential minimum coincides with the native distance. Our estimate of the binding energy parameter, ε ≈ 110 pN·Å (with a substantial error bar of approximately 35% derived within a set of 38 distinct proteins), is based on a comparison to experiments with protein stretching [23]. This energy corresponds to approximately 800 K multiplied by the Boltzmann constant kB, so room temperature (300 K) is closer to 0.375 ε·kB−1 than to 0.3 ε·kB−1, at which most of our stretching simulations have been performed. A slightly lower temperature usually leads to better folding. Thermostatting is implemented by using Langevin noise and damping terms [47] that mimic the effects of the molecular water. Non-native contacts are purely repulsive and are given by the Lennard-Jones potential truncated at its minimum, which is located at 4 Å. Covalent couplings, such as those along the backbone, are described with a harmonic potential with a spring constant of 50 ε·Å−2. The characteristic time-scale of the simulations, τ, is of the order of 1 ns (i.e. the time needed for a bead to diffuse to a distance of approximately 5 Å through an implicit solvent). It should be noted that the nature of the solvent is reflected in the very shape of the native structure, which results, in particular, from the hydropathic properties of the residues.
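
The relation between σij and the native distance, and the conversion of ε to a temperature, can be made explicit with a small Python sketch. It assumes the standard 12-6 Lennard-Jones form, 4ε[(σ/r)^12 − (σ/r)^6], whose minimum lies at r = 2^(1/6) σ, and it reads the quoted value of ε as an energy of 110 pN·Å. The function names are hypothetical.

```python
import math

K_B = 1.380649e-23   # Boltzmann constant in J/K (exact SI value)

def sigma_for_native_distance(d_native):
    """sigma_ij chosen so that the 12-6 potential has its minimum at d_native."""
    return d_native / 2.0 ** (1.0 / 6.0)

def lennard_jones(r, sigma, epsilon=1.0):
    """Assumed standard 12-6 form: 4*eps*((sigma/r)**12 - (sigma/r)**6)."""
    x6 = (sigma / r) ** 6
    return 4.0 * epsilon * (x6 * x6 - x6)

def epsilon_in_kelvin(epsilon_pn_angstrom=110.0):
    """Convert an energy given in pN*A to kelvin (energy divided by k_B)."""
    joule = epsilon_pn_angstrom * 1e-12 * 1e-10   # pN -> N and A -> m
    return joule / K_B
```

With ε = 110 pN·Å, the conversion gives roughly 797 K, so 300 K corresponds to about 0.38 ε·kB−1, consistent with the estimate quoted above.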

Folding is implemented by starting from an extended polypeptide state and running several batches of 101 folding trajectories. Folding is declared to take place when all native contacts are established simultaneously for the first time. Generally, a native contact is considered to be present when rij is smaller than 1.5 σij. The folding time, tf, is determined by averaging median times of the first encounter of the native state over the batches. We study tf as a function of temperature, T. The results are shown in Fig. 4. Table 4 shows values of the shortest tf corresponding to an optimal temperature Topt. Optimal folding takes place in a range of temperatures between T1 and T2 such that T1 < Topt < T2. Typically, one may take Topt as ½(T1 + T2). Operationally, we define these limiting temperatures as those at which tf becomes three times as long as the optimal tf. In the case of the dimeric 3IE5, we study the folding of just one chain to allow for a comparison with all of the other proteins.
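
The operational definitions of tf, T1 and T2 can be sketched in Python as follows. The sketch assumes that the folding times are available on an ascending grid of temperatures and that the points where tf reaches three times its minimum are located by linear interpolation between neighbouring grid points; the interpolation and the function names are our assumptions.

```python
import statistics

def folding_time(batches):
    """Mean over batches of the median first-passage time within each batch."""
    return statistics.mean([statistics.median(batch) for batch in batches])

def _interpolate(t_a, f_a, t_b, f_b, target):
    """Temperature at which the straight line through two points reaches target."""
    return t_a + (target - f_a) * (t_b - t_a) / (f_b - f_a)

def optimal_window(temps, tf, factor=3.0):
    """Return (T1, T_at_minimum, T2) for a U-shaped tf(T) sampled at ascending temps.

    T1 and T2 are where tf first reaches factor * min(tf) on the low- and
    high-temperature side; None is returned if the curve never gets that high.
    """
    i_opt = min(range(len(tf)), key=tf.__getitem__)
    target = factor * tf[i_opt]
    t1 = t2 = None
    for i in range(i_opt, 0, -1):                 # walk towards lower T
        if tf[i - 1] >= target:
            t1 = _interpolate(temps[i], tf[i], temps[i - 1], tf[i - 1], target)
            break
    for i in range(i_opt, len(tf) - 1):           # walk towards higher T
        if tf[i + 1] >= target:
            t2 = _interpolate(temps[i], tf[i], temps[i + 1], tf[i + 1], target)
            break
    return t1, temps[i_opt], t2
```

The median is used within each batch because some trajectories may not fold within the simulated time, in which case the mean first-passage time is not defined while the median still is, as long as more than half of the trajectories fold.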

Table 4. Thermodynamic and kinetic properties of the model proteins investigated in the present study. The first column shows the PDB code, the next three columns refer to the folding parameters defined in the main text, and the last three columns refer to equilibrium parameters. In some cases, where two maxima are identified for the specific heat, the higher of the two is indicated in bold. The last row shows the corresponding parameters averaged over all of the proteins in Table 1.
| PDB | T1 [ε·kB−1] | T2 [ε·kB−1] | tf/100 τ | Tf [ε·kB−1] | Tmax [ε·kB−1] | TQ [ε·kB−1] |
| --- | --- | --- | --- | --- | --- | --- |
| 1BV1 | 0.097 | 0.344 | 48.8 | 0.180 | 0.67 | 0.75 |
| 1FSK | 0.092 | 0.341 | 44.7 | 0.185 | 0.67 | 0.75 |
| 1LLT | 0.117 | 0.331 | 62.6 | 0.175 | 0.63 | 0.72 |
| 1QMR | 0.106 | 0.345 | 44.3 | 0.180 | 0.68 | 0.75 |
| 1FM4 | 0.071 | 0.349 | 51.3 | 0.185 | 0.73 | 0.76 |
| 1XDF | 0.094 | 0.348 | 78.0 | 0.185 | 0.79, 0.58 | 0.76 |
| 2QIM | 0.073 | 0.336 | 66.5 | 0.175 | 0.68 | 0.72 |
| 3E85 | 0.066 | 0.333 | 50.0 | 0.176 | 0.65 | 0.73 |
| 1ICX | 0.107 | 0.336 | 46.9 | 0.184 | 0.78 | 0.74 |
| 1IFV | 0.110 | 0.348 | 42.6 | 0.189 | 0.55, 0.83 | 0.76 |
| 1TW0 | 0.141 | 0.328 | 56.4 | 0.174 | 0.61, 0.78 | 0.72 |
| 1TXC | 0.094 | 0.326 | 51.9 | 0.178 | 0.60, 0.75 | 0.71 |
| 2BK0 | 0.101 | 0.326 | 57.4 | 0.172 | 0.61 | 0.71 |
| 3C0V | 0.146 | 0.375 | 38.7 | 0.198 | 0.83, 0.58 | 0.79 |
| 3IE5 | 0.149 | 0.348 | 57.3 | 0.190 | 0.80 | 0.77 |
| 2WQL | 0.125 | 0.340 | 48.7 | 0.179 | 0.66 | 0.73 |
| 1VJH | 0.174 | 0.366 | 35.0 | 0.192 | 0.85, 0.50 | 0.79 |
| 2FLH | 0.136 | 0.357 | 39.0 | 0.188 | 0.82, 0.58 | 0.77 |
| Mean | 0.11 ± 0.03 | 0.34 ± 0.01 | 51.12 ± 10.71 | 0.18 ± 0.01 | 0.70 ± 0.08 | 0.75 ± 0.02 |

Figure 4.

Definition of the characteristic temperatures that describe the kinetic and equilibrium properties of the proteins investigated in the present study. Panels on the left refer to 1BV1 and panels on the right refer to 1XDF. In each panel, the characteristic temperatures are highlighted with a larger circle. The top panel shows the folding time as a function of temperature. The dependence is U-shaped. T1 is where tf is equal to triple of the optimal folding time on the low temperature side. T2 is similar to T1 but on the high temperature side. The lower panels show the temperature dependence of three quantities: (a) the specific heat, C, normalized by its maximum value (open circles); (b) fraction of the established native contacts, Q (open squares); and (c) probability, P0, of having all native contacts present (full circles). Tmax is the temperature at which C has its largest maximum. TQ is the temperature where Q reaches 0.5. Tf is where P0 reaches 0.5.

T1 indicates the onset of the low-T glassiness and T2 indicates the onset of strong entropic impediments. The equilibrium is characterized by three parameters, Tf, Tmax and TQ, which are also listed in Table 4 and are obtained from 10 long trajectories (typically of 50 000 τ). The first of these is the melting temperature. It is determined by studying the equilibrium probability, P0, of all native contacts being established: Tf is the temperature at which P0 = 0.5. Tmax is the temperature at which the specific heat reaches a maximum. For some proteins, there is another shoulder peak in the plot of the specific heat, either above or below Tmax. If this happens, we list two values of T and the higher maximum is highlighted in bold. Finally, TQ is defined as the temperature at which the fraction of the established native contacts crosses 0.5. It is seen that TQ almost coincides with Tmax, with both temperatures characterizing conditions at which a protein becomes globular. Tf, on the other hand, signifies closeness to the native conformation and not just to any compact conformation.
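
A minimal Python sketch of how Q, P0 and the crossing temperatures could be extracted from simulation output is given here. It assumes that each stored snapshot is a list of the current distances rij of the native contacts, that a contact counts as present when rij < 1.5 σij, and that the temperature at which a curve passes through 0.5 is found by linear interpolation on an ascending temperature grid. The data layout and the function names are hypothetical.

```python
def contact_statistics(frames, sigmas, cutoff=1.5):
    """Return (Q, P0): mean fraction of native contacts present, and the fraction
    of snapshots in which all native contacts are present simultaneously."""
    n_contacts = len(sigmas)
    q_sum = 0.0
    all_present = 0
    for frame in frames:
        present = sum(1 for r, s in zip(frame, sigmas) if r < cutoff * s)
        q_sum += present / n_contacts
        all_present += (present == n_contacts)
    return q_sum / len(frames), all_present / len(frames)

def crossing_temperature(temps, values, level=0.5):
    """First temperature (ascending grid) at which a curve drops through `level`."""
    for i in range(len(temps) - 1):
        v_a, v_b = values[i], values[i + 1]
        if v_a >= level > v_b:
            return temps[i] + (v_a - level) * (temps[i + 1] - temps[i]) / (v_a - v_b)
    return None

def temperature_of_maximum(temps, values):
    """Grid temperature at which a curve (e.g. the specific heat) is largest."""
    return temps[max(range(len(values)), key=values.__getitem__)]
```

Applying `crossing_temperature` to P0(T) would give an estimate of Tf and applying it to Q(T) an estimate of TQ, whereas `temperature_of_maximum` applied to the specific heat gives Tmax on the sampled grid.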

An inspection of Table 4 leads to the conclusion that the equilibrium parameters vary little between the proteins and so do the characteristic temperatures that relate to the kinetics, although T1 shows a bigger scatter than T2. What varies the most is the optimal folding time. It is in the range between 3500 τ (1VJH) and 7800 τ (1XDF). The average time is 5112 ± 1071 τ. We also do not observe any clear division of the properties among the groups of proteins discussed in the Introduction.

Mechanical stability

We now turn to the mechanical properties as tested by single-molecule stretching. In our model, stretching of monomeric proteins is implemented by attaching the termini of the structures to two springs. Their spring constant is taken to be equal to 0.06 ε·Å−2, as in the case of the ‘soft’ spring considered previously [44]. One of the springs is anchored, whereas the other one is moving with a speed of 0.005 Å·τ−1. A native contact is considered broken when rij exceeds 1.5 σij for the last time on the way to full stretching. The domain that is being unravelled at a displacement d is identified by monitoring the contacts that rupture and plotting the so-called scenario diagrams [44, 45] (not shown here). In the dimeric case, there are four termini: N and C belonging to the first chain and N′ and C′ belonging to the second chain. We consider four choices of the pairs of termini directly involved in pulling: N-C, C′-N, C′-C and N′-N.
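
The rupture criterion, a contact being broken when rij exceeds 1.5 σij for the last time, can be illustrated by the following Python sketch. It assumes that the contact distances have been stored at a sequence of displacements d of the moving spring; the data layout and the function name are hypothetical, and the sketch is only meant to show how data for a scenario diagram could be collected.

```python
def rupture_displacements(displacements, distances, sigmas, cutoff=1.5):
    """For each native contact, return the displacement d at which r_ij exceeds
    cutoff * sigma_ij for the last time.

    distances[t][c] is r_ij of contact c in snapshot t and displacements[t] is
    the displacement in that snapshot. A contact that is still present in the
    final snapshot yields None; one that is never present yields the first d.
    """
    last_snapshot = len(distances) - 1
    result = []
    for c, sigma in enumerate(sigmas):
        last_present = None
        for t, frame in enumerate(distances):
            if frame[c] <= cutoff * sigma:
                last_present = t          # contact may re-form after breaking
        if last_present is None:
            result.append(displacements[0])
        elif last_present == last_snapshot:
            result.append(None)
        else:
            result.append(displacements[last_present + 1])
    return result
```

Because contacts can break and re-form during stretching, the sketch records the last crossing rather than the first one, which is what the criterion in the text requires.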

We have performed stretching studies for all of the proteins listed in Table 1 and find them to be remarkably similar. Figure 5 shows examples of the force versus displacement curves (F versus d) for eight proteins. Each of the curves displays four force peaks (I–IV) that are all almost equal in height, with the caveat that the first and the last are typically approximately 0.2 ε·Å−1 higher. Their heights are listed in Table 5 and are seen to differ very little between the proteins. The typical height of a force peak is approximately 1.4 ε·Å−1 or 150 pN, assuming the calibration used previously [23]. This force is smaller than the value of approximately 200 pN found for the I27 domain of titin [48] but it arises four times for one protein chain during stretching. Even more remarkably, all the unravelling scenarios are essentially the same.

Table 5. Mechanostability of the model proteins investigated in the present study. The first column shows the PDB code and the remaining columns give the heights of the four force peaks. The data points are based on five stretching trajectories. There is no third force peak found for 1VJH. This protein has a deletion of a segment of its structure (Table 2) compared to the other PR-10 proteins in the present study. The deleted segment generally is responsible for the third force peak. The last row shows the corresponding forces averaged over all of the proteins.
PDB      FI [ε·Å−1]    FII [ε·Å−1]   FIII [ε·Å−1]  FIV [ε·Å−1]
1BV1     1.45 ± 0.06   1.22 ± 0.07   1.34 ± 0.13   1.43 ± 0.25
1FSK     1.51 ± 0.05   1.26 ± 0.17   1.44 ± 0.07   1.46 ± 0.31
1LLT     1.44 ± 0.05   1.09 ± 0.12   1.37 ± 0.11   1.22 ± 0.21
1QMR     1.55 ± 0.05   1.10 ± 0.06   1.36 ± 0.01   1.69 ± 0.06
1FM4     1.61 ± 0.03   1.32 ± 0.12   1.34 ± 0.14   1.54 ± 0.06
1XDF     1.41 ± 0.08   1.31 ± 0.11   1.48 ± 0.13   1.51 ± 0.09
2QIM     1.43 ± 0.03   1.29 ± 0.09   0.94 ± 0.13   1.42 ± 0.16
3E85     1.45 ± 0.04   1.21 ± 0.08   1.17 ± 0.09   1.33 ± 0.21
1ICX     1.49 ± 0.05   1.18 ± 0.06   1.31 ± 0.05   1.36 ± 0.16
1IFV     1.53 ± 0.05   1.14 ± 0.07   0.90 ± 0.11   1.61 ± 0.18
1TW0     1.22 ± 0.06   1.06 ± 0.04   1.22 ± 0.13   1.45 ± 0.14
1TXC     1.25 ± 0.05   1.18 ± 0.05   1.25 ± 0.08   1.31 ± 0.15
2BK0     1.45 ± 0.03   0.92 ± 0.10   1.10 ± 0.03   1.40 ± 0.16
3C0V     1.59 ± 0.04   1.28 ± 0.04   1.37 ± 0.07   1.42 ± 0.14
3IE5     1.38 ± 0.04   1.43 ± 0.11   1.58 ± 0.23   1.93 ± 0.08
2WQL     1.43 ± 0.05   1.48 ± 0.11   0.94 ± 0.15   1.53 ± 0.10
1VJH     1.52 ± 0.05   1.37 ± 0.07   –             1.26 ± 0.13
2FLH     1.66 ± 0.05   1.39 ± 0.07   1.17 ± 0.08   1.58 ± 0.13
Average  1.47 ± 0.11   1.24 ± 0.14   1.25 ± 0.20   1.47 ± 0.17

Figure 5.
Figure 5.

Examples of the force-displacement curves for a set of eight proteins.

By studying which contacts rupture at a given value of d, we can establish that the important events associated with force peaks I, II, III and IV are as shown in Fig. 6. The first force peak originates from the interaction between the two short α-helices (H1–H2), forming the V-motif, and the C-terminal helix H3. This peak corresponds essentially to the detachment of helix H3 from the support formed by the H1-H2 motif. We were able to confirm this by investigating the effect of selected contact removal during a stretching simulation; specifically, the removal of the H1-H3 and H2-H3 contacts resulted in a decrease of the peak height by approximately 0.45 ε·Å−1. The second peak originates from the interactions between the α-helix H1 and the β-strands S2, S5 and S7. As a result of removing the corresponding contacts, there was a decrease of approximately 0.60 ε·Å−1 in the peak height. In addition, we observed that the interactions between β-strands S6–S7 played an important role in three of the four proteins that were examined in the same way. The origins of the third and fourth peaks are correlated because the third peak appears as a result of shearing of the antiparallel β-strands S2, S3 and S4, whereas the β-strand S4 has contacts with S3. This means that, at the fourth peak, there is a shearing rupture of contacts between β-strands S4, S5 and S6. Hence, we were unable to use the method of contact removal for peaks three and four and had to simply monitor the unravelling process. In general, the contact removal method is approximate when it comes to the later peaks because the initial events would also be affected.

Figure 6.

Structural motifs that are ruptured at the four force peaks (I–IV) visible for 1BV1 in Fig. 5. The motifs are highlighted using nongreen colours in the native structure of 1BV1.

We now consider the more complicated case of the dimeric protein 3IE5. Snapshots taken near the end of the stretching process (when further stretching merely works against covalent bonds and therefore results in a rapid rise of F) are shown in Fig. 7, and the corresponding F–d curves are shown in Fig. 8. The N-C stretching is seen to be quite similar to the single monomer case. At the end of the process, one monomer is fully stretched and the other is dangling with a near-native shape. Other choices of the termini yield comparable peak forces but the F–d patterns and the numbers of peaks are distinct. Furthermore, the C-C′ stretching is accomplished when d is near 200 Å and large parts of the structure remain unravelled, whereas the N-N′ stretching requires pulling that lasts more than four times longer.

Figure 7.

Final conformations after full stretching of the dimeric protein 3IE5. The covalent dimer is formed via a Cys(126)-Cys(126)' disulfide bond. Each image shows the result of stretching using a different pair of ends (marked in red). The remaining two free ends are marked in blue. Primes indicate chain endings belonging to molecule B.

Figure 8.

Examples of trajectories for the stretching of the dimeric protein 3IE5. Each of the panels corresponds to the indicated choice of the termini to which the springs are attached. The symbols above each plot indicate which parts of the structure rupture at a given d. There are some trajectory-dependent changes in the assignment.

Reaction to squeezing

The way to implement squeezing in our coarse-grained model has been discussed previously [25, 26] in the context of virus capsids. Other than the smaller size of the object that is squeezed, all methodological details are the same. In particular, we place the protein between two repulsive plates that generate a repulsive potential scaling as z0–10 [49]. The plates are brought together by gradually increasing their speeds symmetrically to a combined speed of 0.005 Å·τ−1. We monitor the forces acting on the two plates and find them to be equal and opposite when averaged over a small range of separations, s. The magnitudes of these forces are averaged to obtain the total compressive force, F. The examples of plots of F as a function of s for two different squeezing directions in 1BV1 are shown in Fig. 9 (bottom), which also shows the corresponding behaviour of Q and Rg. The curves start, at the larger values of s, at various initial points because the linear extension of the protein depends on the direction. The force curves, however, come together on reducing the s parameter.

Figure 9.

Behaviour of 1BV1 on squeezing by two plates separated by a distance s. The top panel shows Q, the middle panel shows the ratio of Rg to its native value, Rg0, and the bottom panel shows the force of reaction by the protein as averaged over the two plates. Three trajectories are displayed. The thick and thin solid lines correspond to squeezing along the z-axis as defined by the PDB coordinate file. The braided line corresponds to squeezing along the y-axis. The vertical lines indicate characteristic values of s (s = 2 Rg0, s = 1.5 Rg0 and s = 1.1 Rg0) at which the forces are measured.

The force curves have a sinuous structure but generally become steeper and steeper with the decreasing value of s. For a given direction of squeezing, the various trajectories differ (the two solid lines for squeezing along the z-direction) as a result of thermal fluctuations. In most virus capsids, the F(s) curves display well defined and reasonably reproducible peaks whose heights vary within a factor of 20 among different capsids [26]. Thus, the PR-10 protein 1BV1 behaves not as a typical capsid object but, instead, more like some of the outliers, CMV and 1BDV [26], although with a lower range of forces and separations. Furthermore, the native contacts break much less readily (Fig. 9, top) because the plates confine the protein very well and there are no inter-protein contacts that rupture readily in the capsids.

To make comparisons between the PR-10 proteins, we determine three characteristic forces, Fs1, Fs2 and Fs3, which arise at three characteristic separations of s1 = 2 Rg0, s2 = 1.5 Rg0 and s3 = 1.1 Rg0, respectively. Rg0 denotes the native value of Rg. Rg itself varies substantially during squeezing, as shown in Fig. 9 (middle). Examples of conformations of 1BV1 at s1 and s2 are shown in Fig. 10 next to the native conformation. The values of the three characteristic forces derived for the 18 PR-10 proteins (in the monomeric form) are listed in Table 6. They are averaged over 50 selections of the squeezing direction and over 10 trajectories for each direction. We observe that, despite the differences in the size of the cavity, the values of Fsi are, remarkably, very close within each set for the three selected values of s. For Fs1, Fs2 and Fs3, we obtain 0.56 ± 0.06, 1.41 ± 0.11 and 4.24 ± 0.32 ε·Å−1, respectively, where the error bars correspond to variations among the proteins. A similar closeness of behaviour is expected to be found for other speeds of compression by the plates.
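The protein-to-protein averages can be recomputed directly from the entries of Table 6. The sketch below is only a consistency check on the tabulated central values: it ignores the per-protein error bars and assumes the quoted spread is a sample standard deviation, which the text does not state explicitly. The dictionary and function names are hypothetical.

```python
from statistics import mean, stdev

# Central values (Fs1, Fs2, Fs3) in ε/Å, transcribed from Table 6.
TABLE6 = {
    "1BV1": (0.56, 1.40, 4.29), "1FSK": (0.55, 1.34, 4.00),
    "1LLT": (0.50, 1.28, 4.03), "1QMR": (0.56, 1.39, 4.08),
    "1FM4": (0.55, 1.37, 4.07), "1XDF": (0.71, 1.63, 4.98),
    "2QIM": (0.49, 1.26, 3.56), "3E85": (0.56, 1.38, 4.15),
    "1ICX": (0.54, 1.40, 4.28), "1IFV": (0.61, 1.51, 4.37),
    "1TW0": (0.51, 1.27, 4.08), "1TXC": (0.51, 1.31, 4.20),
    "2BK0": (0.50, 1.30, 4.12), "3C0V": (0.61, 1.60, 4.86),
    "3IE5": (0.69, 1.49, 4.19), "2WQL": (0.54, 1.38, 4.10),
    "1VJH": (0.49, 1.50, 4.36), "2FLH": (0.60, 1.51, 4.53),
}


def column_stats(column: int) -> tuple[float, float]:
    """Mean and sample standard deviation of one force column."""
    values = [row[column] for row in TABLE6.values()]
    return mean(values), stdev(values)


for index, label in enumerate(("Fs1", "Fs2", "Fs3")):
    average, spread = column_stats(index)
    print(f"{label}: {average:.2f} ± {spread:.2f} eps/A")
```

The relative spread comes out below about 11% in each column, which is the sense in which the squeezing response is nearly uniform across the family.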

Table 6. Forces of resistance to squeezing at three characteristic separations averaged over a total of 500 trajectories, corresponding to 10 trajectories for each of the 50 random rotations of the protein.
PDB      Fs1 [ε·Å−1]   Fs2 [ε·Å−1]   Fs3 [ε·Å−1]
1BV1     0.56 ± 0.03   1.40 ± 0.07   4.29 ± 0.20
1FSK     0.55 ± 0.03   1.34 ± 0.06   4.00 ± 0.19
1LLT     0.50 ± 0.02   1.28 ± 0.06   4.03 ± 0.19
1QMR     0.56 ± 0.03   1.39 ± 0.07   4.08 ± 0.19
1FM4     0.55 ± 0.03   1.37 ± 0.07   4.07 ± 0.19
1XDF     0.71 ± 0.03   1.63 ± 0.08   4.98 ± 0.23
2QIM     0.49 ± 0.02   1.26 ± 0.06   3.56 ± 0.17
3E85     0.56 ± 0.03   1.38 ± 0.07   4.15 ± 0.19
1ICX     0.54 ± 0.03   1.40 ± 0.07   4.28 ± 0.20
1IFV     0.61 ± 0.03   1.51 ± 0.07   4.37 ± 0.20
1TW0     0.51 ± 0.02   1.27 ± 0.06   4.08 ± 0.19
1TXC     0.51 ± 0.02   1.31 ± 0.06   4.20 ± 0.20
2BK0     0.50 ± 0.02   1.30 ± 0.06   4.12 ± 0.19
3C0V     0.61 ± 0.03   1.60 ± 0.08   4.86 ± 0.23
3IE5     0.69 ± 0.03   1.49 ± 0.07   4.19 ± 0.20
2WQL     0.54 ± 0.03   1.38 ± 0.07   4.10 ± 0.19
1VJH     0.49 ± 0.02   1.50 ± 0.07   4.36 ± 0.20
2FLH     0.60 ± 0.03   1.51 ± 0.07   4.53 ± 0.21
Figure 10.

Examples of conformations (C-α traces) of 1BV1 corresponding to (A) s = 45 Å (the protein is barely touched by the plates), (B) s = s1 and (C) s = s2, respectively.

The role of the ligands

A few of the PR-10 proteins under discussion have been crystallized as ligand complexes. In our studies, we have removed such ligands so that we compare objects with similar constitution. Here, we examine the role of the ligands by considering the case of 2FLH, which can bind up to two ligands. One of these connects to just one site of the protein (Q67 indicated in Fig. 11), which does not affect any thermodynamical or mechanical properties of the protein at all. The other ligand connects to four sites of the protein (L22, E69, T139 and Y142 in Fig. 11). We have constructed this ligand, in a coarse-grained manner, out of six glycine-like beads connected into three chains: one with three covalent bonds and two with single covalent bonds. Four of these beads are then connected by the contact interactions to the four sites on the protein. (There is also an experimentally attested possibility of not connecting the ligand to Y142 because of a structural water nearby.) We used PyMol (http://www.pymol.org) to identify the optimal native placements of the beads.

Figure 11.

Top: the native conformation of 2FLH with the larger ligand. The ligand is shown in a darker shade (blue). Its contacts with L22, E69, T139 and Y142 are shown in a lighter shade (yellow). Site Q67 indicates the place where the second ligand is able to link. Bottom: an example of the conformation obtained through stretching.

We find that the inclusion of this ligand has a negligible effect on thermodynamics. For example, Tf increases from 0.19 to 0.20 ε·kB−1. It also leads to practically no change in the mechanical stability because the connections to the protein are at sites that are not involved in the formation of the relevant mechanical clamps. An example of the conformation at the final stages of stretching is shown in Fig. 11, where the ligand just dangles, connected to L22 by one contact interaction. The major impact of the two ligands is visible only in the context of the cavity volume. For example, if both ligands are connected, the volume of the cavity drops to zero. The volume drops from 254 ± 4 Å3 to 97.2 ± 9.3 Å3 if the ligand shown in Fig. 11 is connected and to 70.3 ± 0.3 Å3 if the other ligand is linked to E69. We have checked that the geometry of the larger ligand allows for an entry into the gap area in the folded protein such that it does not need to enter it during folding.

Conclusions

We have surveyed the geometrical, thermodynamic, folding and mechanical properties of 18 proteins with the PR-10 fold. We find a broad commonality in their behaviour, except for order-of-magnitude variations in the volume of the internal cavities and 20% variations in the optimal folding time. Our nongeometrical results have been obtained within the structure-based coarse-grained model. However, all-atom simulations (which are much more difficult to perform, especially in the context of protein folding) are expected to yield qualitatively similar conclusions. We have also proposed a new algorithm for the determination of the volumes of internal cavities in protein structures, as well as the area of protein surfaces.

It is appropriate to comment on the interpretation of the PR-10 cavity volumes reported in this and other studies. The values previously reported in the literature [1] are larger than those calculated here. It appears that the previous volumes may have been overestimated as a result of various deficiencies of the algorithms used. On the other hand, the cavity volumes of different proteins appear to have been estimated correctly on the relative scale (i.e. larger volumes were correctly predicted as such). We hope that the careful analysis of the volume estimation methodology reported in the present study will help to organize and unify the information available for the PR-10 family, and possibly also for other void-containing proteins.

The main reason for the systematic volume overestimation produced by the existing algorithms is that they simply count the number of voxels placed, without checking whether there is a possibility for a transition between adjacent chambers. When two voxels (that can house a ‘water probe’) are joined only at corners, we consider them as belonging to separate chambers because ‘a water probe’ cannot cross this junction directly. In this way, various small chambers may not contribute to the volume of a large chamber. It should also be noted that an important aspect of the improved accuracy of our approach is that we take into account a large number of rotations (i.e. we consider various possible placements of the grid relative to the protein). The number of voxels housing ‘water probes’ does depend on the rotation and introducing an average over many rotations is important.
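The chamber-separation idea described above can be sketched as a flood fill over the grid voxels that can house a water probe. This is an illustration only and not the spaceball implementation: it assumes that two voxels belong to the same chamber only when they share a face (the text states that corner contact does not connect chambers but does not say how edge contact is treated), and it omits the averaging over grid rotations. All names and the toy input are hypothetical.

```python
from collections import deque

# Offsets to the six face-sharing neighbours of a voxel (assumed connectivity).
FACE_NEIGHBOURS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                   (0, -1, 0), (0, 0, 1), (0, 0, -1)]


def label_chambers(probe_voxels):
    """Group probe-accessible voxels (i, j, k) into face-connected chambers.

    Returns a list of sets of voxels, largest chamber first.
    """
    remaining = set(probe_voxels)
    chambers = []
    while remaining:
        seed = remaining.pop()
        chamber = {seed}
        queue = deque([seed])
        while queue:
            i, j, k = queue.popleft()
            for di, dj, dk in FACE_NEIGHBOURS:
                neighbour = (i + di, j + dj, k + dk)
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    chamber.add(neighbour)
                    queue.append(neighbour)
        chambers.append(chamber)
    return sorted(chambers, key=len, reverse=True)


def chamber_volume(chamber, voxel_edge):
    """Volume of one chamber for cubic voxels of the given edge length."""
    return len(chamber) * voxel_edge ** 3


# Toy input: three voxels in a row plus one that touches the row at a corner.
voxels = {(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 1, 1)}
chambers = label_chambers(voxels)
print([len(c) for c in chambers])
```

In the toy input, a plain voxel count would report four voxels, whereas the connectivity check attributes only three of them to the main chamber. Repeating the labelling for many orientations of the grid and averaging the resulting volumes would correspond to the rotation averaging mentioned in the text.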

The interesting physical and geometrical features of the PR-10 proteins revealed in the present study have tempted us to speculate about their biological significance. Different PR-10 members, despite their diversity, appear to be mechanically robust and have almost identical patterns of structural rupture under the influence of mechanical force (which is quite high relative to other globular proteins). This may indicate that the PR-10 fold is quite stable. This observation is particularly intriguing in view of their hollow cores. It appears that the large void inside the proteins' centres does not make them more labile or prone to disorder. Hence, the surface of the internal cavity, even if inaccessible from the outside, should have properties that are similar to the outer, solvent-exposed surface and would need no extra factors to cement its stability.

It is also important to emphasize that the internal cavity (although as a rule quite large) can have a very different volume and topology, consisting for example of a number of smaller chambers. This may indicate that the ligands of different PR-10 proteins can be of very different chemical character, a phenomenon that has already been noted in the literature [11]. According to this view, the PR-10 proteins could be regarded as versatile ligand binders, fulfilling perhaps quite diverse roles in small-molecule signalling and transport/storage.

After the submission of the present study, pressure unfolding studies (M. Dellarole, M. Fossat and C. A. Royer, unpublished results) revealed a large volume change upon unfolding (−160 mL·mol−1) for PR-10, which is consistent with the existence of a large and largely solvent-excluded internal cavity [50]. For example, for a CSBP protein from Medicago truncatula, which is a close homologue of the 2FLH protein, the experimentally measured cavity size is 265 ± 11 Å3, which is close to our value of 254 ± 4 Å3 listed in Table 3.

Acknowledgements

The spaceball software for the calculation of protein cavities and surfaces is available via the website: http://www.ifpan.edu.pl/~cieplak/spaceball. The computer resources used in the present study were financed by the European Regional Development Fund under the Operational Programme Innovative Economy NanoFun POIG.02.02.00-00-025/09. At the Institute of Physics, this research was supported by Polish National Science Centre Grant No. 2011/01/B/ST3/02190 and by project FiberFuel ERA-NET-IB/06/2013/. MJ was supported by the Ministry of Science and Higher Education Grant No. NN 301 003739.
