Protein crystals are of widespread interest because many of them allow structure analyses at atomic resolution. For soluble proteins, the packing density of such crystals is distributed according to the Matthews Graph. For integral membrane proteins, the respective graph is similar but lies at lower density and is much broader. From visualizations of the relative positions and orientations of membrane proteins in crystals, it has been suggested that the detergent micelles surrounding these proteins form sheets, form filaments, or remain isolated in the crystal, giving rise to three distinct packing density distributions that superimpose to form the observed broad distribution. This classification was indirect because detergent is not visible in X-ray crystallography. Given the extensive work involved in analyzing detergent structure directly by neutron diffraction, it seems unlikely that a statistically relevant number of such structures will be established in the near future. Therefore, the proposed classification is here scrutinized by a simulation in which an average detergent-carrying membrane protein was randomly packed to form crystals. The analysis reproduced the three types of detergent structures together with their packing density distributions and relative frequencies, which validates the previous classification. The simulation program was also run for crystals from soluble proteins using ellipsoids as reference shapes and defining a shape factor that quantifies the deviation from the nearest ellipsoid. This series reproduced and thus explained the Matthews Graph.
The crystal packing density has become a popular parameter in protein crystallography. It is usually represented by the packing parameter VM, which is the unit cell volume divided by the protein mass within the cell and thus a specific volume, or the inverse of a density. The VM distribution, also called Matthews Graph,1 was first derived for soluble proteins, which for a long time were the only proteins that could be crystallized and structurally analyzed.2, 3 The VM value is known to correlate with crystal disorder represented by the resolution limit of the X-ray diffraction4 and with the crystal symmetry.5 The combined VM distribution of all crystal data shows a peak that is skewed toward lower packing density, which means higher VM (Fig. 1). The first X-ray diffraction pattern of a well-behaved protein crystal usually reveals the volume of its unit cell. Since the protein mass is generally known before the first crystal appears, the most probable number of protein molecules in the unit cell can be derived from the VM distribution and used as a reasonable starting point in the X-ray analysis.
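The estimate of the number of molecules per unit cell can be sketched in a few lines. The following Python fragment is a minimal illustration and not part of the original work. It assumes that the most probable copy number is simply the one whose VM lies closest to a single reference value, here the average of 2.68 Å³/Da quoted later in the text for crystals of soluble proteins, whereas a full treatment would weight each candidate with the complete VM distribution. The function names and the example cell are hypothetical.

```python
def matthews_vm(v_cell, m_r, n_molecules):
    """Packing parameter VM in Å^3/Da.

    v_cell      unit cell volume in Å^3
    m_r         mass of one protein molecule in Da
    n_molecules assumed number of molecules in the unit cell
    """
    return v_cell / (n_molecules * m_r)


def most_probable_copies(v_cell, m_r, vm_reference=2.68, max_copies=16):
    """Copy number whose VM is closest to vm_reference.

    Assumption: a single reference value stands in for the whole VM
    distribution. In a real case only copy numbers compatible with the
    space-group symmetry would be considered.
    """
    candidates = range(1, max_copies + 1)
    return min(
        candidates,
        key=lambda n: abs(matthews_vm(v_cell, m_r, n) - vm_reference),
    )


# Hypothetical example: 400,000 Å^3 cell, 35 kDa protein.
n_best = most_probable_copies(400000.0, 35000.0)
vm_best = matthews_vm(400000.0, 35000.0, n_best)
```

For this hypothetical cell, four molecules give VM ≈ 2.86 Å³/Da, which is the candidate closest to the reference value.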
After the first membrane proteins were crystallized6, 7 and after the first membrane protein structures were established,8–10 it turned out that there are three distinct groups of membrane proteins: monotopic membrane proteins (floating in one of the two lipid layers of the membrane),11–13 β-barrel proteins, and α-helical proteins. The monotopic proteins are not considered here because they are of an intermediate and also more intricate nature. Both β-barrel and α-helical proteins are fully immersed in the membrane and thus called integral membrane proteins. In the context of this study, β-barrel and α-helical membrane proteins are similar enough to be lumped together. The observed VM distribution of 315 integral membrane protein crystal structures is depicted in Figure 1.14 It resembles the VM distribution for soluble proteins, but has a higher VM average and a larger relative peak width (RPW, full width at half maximum divided by the position of the peak center), namely 0.56 versus 0.38. It can also be used for estimating the number of molecules per unit cell from an initial X-ray diffraction pattern. Because of its broadness, however, the estimate is less informative than that for soluble proteins.
The generally higher VM values of membrane protein crystals are commonly attributed to the addition of the required detergent to the solvent region. If this were correct, the packing parameter VM = Vcell/Mr = (Vprotein + Vsolvent + Vdetergent)/Mr = (Vprotein + Vsolvent)/Mr + Vdetergent/Mr would be the sum of a first term, (Vprotein + Vsolvent)/Mr, distributed like the VM of crystals from soluble proteins, and a second term, Vdetergent/Mr, that is nearly constant because the detergent volume is approximately proportional to the protein mass. Accordingly, the VM distribution for membrane protein crystals would be the Matthews Graph for soluble proteins shifted to higher VM values. Such a shift would leave the peak width unchanged while moving the peak center upward, which would reduce the RPW instead of increasing it. This contradiction points to a more intricate situation.
It has been suggested that the VM distribution of membrane protein crystals consists of three distinct distributions representing three types of detergent structures, namely isolated toroidal micelles surrounding the proteins (M0) as well as toroidal micelles fused to filaments (M1) and to sheets (M2).14 As this classification was based on indirect detergent assignments derived from protein packing geometries, further support was desirable. Such support cannot be provided by the four detergent structures in crystals established by neutron diffraction15–18 because these are far too few to account for the statistics arising from 315 crystal structures. Therefore, I simulated the crystallization of membrane proteins. This reproduced the three micelle fusion types as well as their VM distributions and their relative frequencies and thus validated the indirect visual analysis. The simulation program is given in the Supporting Information. It was also used to explain the Matthews Graph for crystals from soluble proteins (Fig. 1).
Abbreviations: M0, isolated toroidal micelles surrounding protein molecules in a zero-dimensional arrangement; M1, toroidal micelles fused into filaments in a one-dimensional arrangement; M2, toroidal micelles fused into sheets in a two-dimensional arrangement; P1[z = 1], crystal space group P1 with z = 1 molecule per asymmetric unit; PDB, Protein Data Bank; RPW, relative peak width; VM, Matthews' packing parameter (Å³/Da).
All simulations were performed using the simplest crystal space group, P1, with z = 1 molecule in the asymmetric unit (P1[z = 1]). Soluble proteins were represented by ellipsoids and it was assumed that any part of their surface may participate in crystal contacts. Crystal formation in packing scheme P1[z = 1] is explained in Figure 2(A). Molecule 1 is an ellipsoid with its principal axes aligned to a Cartesian system. All other molecules are related by pure translations. The vectors from the origin to the centers of molecules 2, 3, and 4 define the unit cell and thus its volume Vcell. The directions of these vectors were determined by a random selection of the polar angles θn and ϕn (n = 2, 3, and 4; all directions equally probable per unit solid angle), whereas their lengths were constrained by the required crystal contacts: the ellipsoids have to touch each other along the line defined by each vector [Fig. 2(A)]. Overlap was not allowed, so that many of the random arrangements had to be discarded. Note that the monitored packing parameter VM = Vcell/Mr equals 1.23·Vcell/Vprotein when VM is given in Å³/Da (Mr = protein mass, Vprotein = protein volume) under the usual assumption that all proteins have the same density of 1 Da per 1.23 Å³. Therefore, the VM value calculated in this simulation is independent of any scale factor. This agrees with the observation that VM is essentially independent of the protein size.4
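The packing procedure for ellipsoids can be sketched as follows. This Python fragment is not the program simulation.f of the Supporting Information but a minimal reconstruction under stated assumptions. It uses the fact that two identical, parallel ellipsoids touch when their center-to-center vector has length 2 in the frame in which the ellipsoid is a unit sphere, and it tests overlap only for lattice translations with coefficients up to `reach`, a hypothetical cutoff. All function names are illustrative.

```python
import itertools
import math
import random


def random_direction(rng):
    """Unit vector with equal probability per solid angle."""
    cos_theta = rng.uniform(-1.0, 1.0)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    sin_theta = math.sqrt(1.0 - cos_theta ** 2)
    return (sin_theta * math.cos(phi), sin_theta * math.sin(phi), cos_theta)


def scaled_norm(vec, semi_axes):
    """Length of vec in the frame where the ellipsoid is a unit sphere."""
    return math.sqrt(sum((v / a) ** 2 for v, a in zip(vec, semi_axes)))


def contact_vector(direction, semi_axes):
    """Translation along direction at which two parallel copies just touch."""
    scale = 2.0 / scaled_norm(direction, semi_axes)
    return tuple(scale * d for d in direction)


def has_overlap(cell, semi_axes, reach=2, tol=1e-9):
    """True if any lattice copy within `reach` cells penetrates molecule 1."""
    for n in itertools.product(range(-reach, reach + 1), repeat=3):
        if n == (0, 0, 0):
            continue
        shift = [sum(n[k] * cell[k][i] for k in range(3)) for i in range(3)]
        if scaled_norm(shift, semi_axes) < 2.0 - tol:
            return True
    return False


def vm_from_cell(cell, semi_axes):
    """VM = 1.23 * Vcell / Vprotein in Å^3/Da, independent of overall scale."""
    a, b, c = cell
    v_cell = abs(
        a[0] * (b[1] * c[2] - b[2] * c[1])
        - a[1] * (b[0] * c[2] - b[2] * c[0])
        + a[2] * (b[0] * c[1] - b[1] * c[0])
    )
    v_protein = 4.0 / 3.0 * math.pi * semi_axes[0] * semi_axes[1] * semi_axes[2]
    return 1.23 * v_cell / v_protein


def sample_vm(semi_axes, rng, max_tries=100000):
    """Draw random P1 cells until one is free of overlap and return its VM."""
    for _ in range(max_tries):
        cell = [contact_vector(random_direction(rng), semi_axes) for _ in range(3)]
        if not has_overlap(cell, semi_axes):
            return vm_from_cell(cell, semi_axes)
    raise RuntimeError("no overlap-free arrangement found")
```

Repeated calls such as `sample_vm((3.0, 2.0, 1.0), random.Random(1))` give individual VM values, and a histogram of many such values corresponds to one simulated distribution. Because only ratios of lengths enter, the semi-axes may be given in any unit.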
Two VM distributions resulting from this simulation are depicted in Figure 3(A). Surprisingly, all ellipsoids with axial ratios (largest to smallest principal axis) in the range from 3:1 to 1:1 resulted in almost identical VM distributions. Figure 3(A) shows the two extreme distributions, with RPW values of 0.23 and 0.25. All others lie essentially between them. Since the packing density depends on crystal symmetry,5 the simulation has to be compared with the observed distribution for packing scheme P1[z = 1]. Among the soluble proteins in the Protein Data Bank (PDB),3 there are 87 distinct crystals belonging to P1[z = 1]. Their VM distribution (Figs. 1 and 3) shows an RPW of 0.33, which is larger than that of the simulations. All distributions are skewed toward higher VM values.
The remaining difference between the simulated and the observed VM distribution [Fig. 3(A)] is probably caused by the simplifying assumption that proteins have an ellipsoidal shape. For analyzing the consequence of this simplification, one has to keep in mind that crystallized proteins are in general globular and that any globular distribution of atoms is well characterized by its gyration ellipsoid with volume Vgyr (see Methods). As an example, Figure 2(C) shows the gyration ellipsoid of calmodulin. The ellipsoid best-fitted to a real protein is its gyration ellipsoid uniformly expanded such that it assumes the volume Vprotein of this protein. Proteins with identical volume (i.e., identical mass Mr) but different shapes have different gyration ellipsoids. As the minimum gyration ellipsoid with volume Vgyr-min corresponds to an exactly ellipsoidal protein, the shape factor Vgyr/Vgyr-min characterizes the shape deviation of a protein from an ellipsoid.
The gyration ellipsoids were determined for all proteins that crystallized in packing scheme P1[z = 1]. In 85 out of the 87 proteins, the resulting axial ratio between the largest and the smallest principal axis was in the range 3:1–1:1, confirming the axial ratios used in the simulation. The overall average was 1.4:1. The resulting shape factors correlate clearly with the VM values [Fig. 3(B)]. Calmodulin has a comparatively large shape factor [Fig. 3(B)], indicating a strong deviation from an ellipsoid in agreement with its dumbbell shape [Fig. 2(C)].
For any shape factor above 1.0, part of the protein protrudes to the outside of the best-fitted ellipsoid, leaving other parts within this ellipsoid empty [Fig. 2(C)]. If crystal contacts use the protruding parts (or the indentations) of the protein, the resulting unit cell volume Vcell is larger (or smaller) than the simulated Vcell based on the best-fitted ellipsoid [Fig. 2(A)]. Any deviation of Vcell for a given Vprotein is a deviation of VM = 1.23·Vcell/Vprotein. Accordingly, the deviation from the ellipsoid works in both directions, reducing and increasing the VM value, which explains why the observed VM distribution is broader than the simulated ones [Fig. 3(A)]. In particular, large shape factors are found in the high VM tail region of the observed distribution [Fig. 3(B)], indicating particularly large protrusions and indentations relative to the best-fitted ellipsoid. The thick tail of the observed distribution of P1[z = 1] crystals is then explained by crystal contacts preferring protrusions over indentations, which seems reasonable because protrusions are geometrically more accessible.
Membrane proteins have polar and apolar surface regions. The apolar surface covers the membrane-immersed part, which is well approximated by a cylinder with its axis perpendicular to the membrane plane [Fig. 2(B)]. On both sides of the membrane, the cylinder is capped by protein regions with polar surfaces. The apolar surface of the cylinder is covered by a toroidal detergent micelle as confirmed in neutron diffraction analyses.15–19 The detergent keeps the protein monodisperse in solution, from which it crystallizes. In the simulation, I used a model that approximated the shape of an average crystallized membrane protein [Fig. 2(B)] for generating crystals in packing scheme P1[z = 1]. In contrast to soluble proteins, however, not all contacts were accepted. Arrangements with polar–apolar contacts, which occurred between polar surface and detergent micelle of the given model, were discarded because they are energetically unfavorable. On the other hand, detergent overlap was allowed as it simulates detergent fusion.
The resulting overall VM distribution is a flattened Gaussian that is strongly skewed to high VM values [Fig. 4(A)]. All accepted packing arrangements were automatically analyzed for the distribution of the detergent, which was done by counting the number of superimposed detergent voxels between all 28 pairs of the eight molecules in Figure 2(A). For example, extensive detergent superposition in molecule pairs 1-4, 2-6, 3-7, and 5-8 and no superposition in pairs 1-2, 1-3, etc. shows the M1 type (filaments), whereas extensive superposition within quartets 1-2-4-6 and 3-5-7-8 with no superposition in pair 1-3, etc. corresponds to type M2 (sheets). No strong superposition in any pair indicates M0 (isolated).
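The assignment of a detergent structure type from the pairwise superpositions can be sketched as follows. This Python fragment is a hypothetical reconstruction, not the analysis code of the original program. It labels the eight molecules by their lattice indices (0 or 1 along each cell edge) rather than by the numbers 1 to 8 of Figure 2(A), it treats a pair as fused when its count of superimposed detergent voxels reaches a user-chosen `threshold`, and it takes the number of independent fusion directions as the dimensionality of the detergent structure. The threshold and all names are assumptions.

```python
from itertools import combinations, product

# Eight molecules at the corners of one unit cell, labeled by lattice indices.
CORNERS = list(product((0, 1), repeat=3))


def cross(u, v):
    return (
        u[1] * v[2] - u[2] * v[1],
        u[2] * v[0] - u[0] * v[2],
        u[0] * v[1] - u[1] * v[0],
    )


def span_dimension(vectors):
    """Dimension (0 to 3) of the space spanned by integer 3-vectors."""
    vectors = [v for v in vectors if any(v)]
    if not vectors:
        return 0
    first = vectors[0]
    normal = None
    for v in vectors[1:]:
        candidate = cross(first, v)
        if any(candidate):
            normal = candidate
            break
    if normal is None:
        return 1  # all fusion directions are collinear
    if all(sum(n * x for n, x in zip(normal, v)) == 0 for v in vectors):
        return 2  # all fusion directions lie in one plane
    return 3


def classify_detergent(overlap, threshold):
    """Return 'M0', 'M1', 'M2', or 'unclassified'.

    overlap maps a pair of corner labels (a, b), ordered as in CORNERS,
    to the number of superimposed detergent voxels of that pair.
    """
    fused_directions = []
    for a, b in combinations(CORNERS, 2):  # the 28 pairs
        if overlap.get((a, b), 0) >= threshold:
            fused_directions.append(tuple(q - p for p, q in zip(a, b)))
    labels = {0: "M0", 1: "M1", 2: "M2"}
    return labels.get(span_dimension(fused_directions), "unclassified")
```

Fusion along a single cell edge gives M1, fusion within one family of parallel planes gives M2, and the absence of any fused pair gives M0. A three-dimensional network falls outside the three types and is returned as unclassified.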
The simulation resulted in a clear split between the three detergent structure types without many inconclusive cases, which validates the proposed classification14 and also corroborates the importance of detergent micelle fusion for crystallization. The VM distributions of the three types are plotted in Figure 4(A). The ratios between the average VM values are 3.40:3.19:2.51 = 1.07:1.00:0.79, which fit the observed values of 1.13:1.00:0.74 of the visually assigned distributions reasonably well.14 The frequency ratios are 47:35:18 for types M0, M1, and M2, respectively, which correspond well to the observed values 51:32:17.14 The absolute values of the VM averages are clearly smaller than those of all observed membrane protein crystals, because the simulation was performed in P1[z = 1], which is known to show lower VM values than a mixture of all space groups.5, 14
The VM difference between P1[z = 1] and the observed mixture of all space groups can be corrected using a stretch factor, which is 2.68/2.24 = 1.20 for crystals of soluble proteins (ratio of averages given in Fig. 1). The corresponding stretch factor for crystals of membrane proteins (3.90/3.24 = 1.20) carries a large error because there are only five entries in P1[z = 1]. It should be smaller than that for soluble proteins because the symmetry dependence is less pronounced.14 As the VM differences between cubic and primitive space groups are 0.7 and 1.0 Å³/Da for membrane and soluble proteins, respectively, the stretch factor for membrane proteins should be 1.00 + 0.20·0.7/1.0 = 1.14. This value was preferred as it carries a much smaller error than the directly determined one.
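The arithmetic behind the three stretch factors can be retraced with the short Python sketch below. It uses only the numbers quoted in the text. The function names are hypothetical, and the linear scaling of the excess over 1.00 with the cubic-to-primitive VM difference is the assumption stated in the paragraph above.

```python
def stretch_factor(vm_all_space_groups, vm_p1):
    """Ratio of the average VM of all space groups to that of P1[z = 1]."""
    return vm_all_space_groups / vm_p1


def scaled_stretch_factor(reference_factor, spread, reference_spread):
    """Scale the excess over 1.00 by the ratio of the symmetry-related spreads.

    spread and reference_spread are the VM differences between cubic and
    primitive space groups, in Å^3/Da.
    """
    return 1.0 + (reference_factor - 1.0) * spread / reference_spread


soluble = stretch_factor(2.68, 2.24)          # about 1.20
membrane_direct = stretch_factor(3.90, 3.24)  # about 1.20, from five entries only
membrane_scaled = scaled_stretch_factor(soluble, 0.7, 1.0)  # about 1.14
```

Using the unrounded soluble-protein factor instead of 1.20 changes the result only in the third decimal place.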
The stretched simulated VM distribution and the observed distribution for membrane proteins are shown in Figure 4(B). The peak positions correspond to each other. The RPW of the simulated distribution is 0.50 and thus only slightly lower than the 0.56 of the observed distribution. Most likely, the surface of real membrane proteins is not as smooth as that of the model, so that the distribution is broadened by contacts via protrusions and indentations. As with soluble proteins (Fig. 3), the thick tail at high VM values is probably caused by the preference of contacts between protrusions over all others.
The VM distributions of crystals from soluble and membrane proteins (Fig. 1) can be explained by simple models: an ellipsoid for soluble proteins and a standard shape for membrane proteins, which is depicted in Figure 2(B). For soluble proteins, the crystal packing density is almost independent of the shape of the ellipsoid. The deviation of real proteins from ellipsoids gives rise to a scatter of the VM values. However, strong deviations lead in general to higher VM values because protrusions are more easily accessible for crystal contacts than indentations. For membrane proteins, the VM distribution is a superposition of those for the detergent structure types M0, M1, and M2,14 which were reproduced in detail with the simulation in packing scheme P1[z = 1]. Since crystal symmetry affects the crystal packing density of membrane proteins in the same (although less pronounced) way as that of soluble proteins,14 the simulation in P1[z = 1] can be corrected by a stretch factor and then compared with the observed VM distribution for all space groups. Stretching results in a reasonable fit that validates the proposed detergent micelle classification.
The proteins in packing scheme P1[z = 1] were extracted from the PDB (status January 2011). The initial 121 entries were reduced to 87 distinct crystals from soluble proteins, with unit cells counted as different on a 10% volume criterion, as well as five distinct crystals from membrane proteins. In the simulation, the soluble proteins were approximated by ellipsoids. A large range of axial ratios of the ellipsoids was considered. All ellipsoid orientations relative to the unit cell are accounted for by the simulation procedure itself [Fig. 2(A)].
The approximation of a protein by an ellipsoid was checked with gyration tensors defined by $G_{mn} = \frac{1}{N}\sum_{j=1}^{N} x_{mj}\,x_{nj}$, where $x_{mj}$ is the coordinate $m$ of atom $j$ relative to the center of mass, with $j = 1, 2, \ldots, N$ and $m, n = 1, 2, 3$. Using principal component analysis, this tensor is diagonalized by a rotation of the coordinate system. The resulting diagonal elements are the variances in the directions of the three principal axes. The square roots of these variances are the respective standard deviations, which are used as the principal semi-axes of the gyration ellipsoid. The best-fitted ellipsoid of a given protein is its gyration ellipsoid (volume $V_{\mathrm{gyr}}$) uniformly expanded by the factor $(V_{\mathrm{protein}}/V_{\mathrm{gyr}})^{1/3}$ to fit $V_{\mathrm{protein}}$. If a protein were exactly ellipsoidal, the volume of its gyration ellipsoid would be at its minimum, $V_{\mathrm{gyr\text{-}min}} = 5^{-3/2}\,V_{\mathrm{protein}}$, when compared to all proteins that have the same volume. The expansion factor is then $5^{1/2}\,(V_{\mathrm{gyr\text{-}min}}/V_{\mathrm{gyr}})^{1/3}$. The shape factor was determined for all soluble proteins crystallized in P1[z = 1] and grouped into four VM ranges [Fig. 3(B)].
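The shape factor defined above can be computed without an explicit diagonalization, because the product of the three variances equals the determinant of the gyration tensor. The Python sketch below is an illustration under stated assumptions. It uses the unweighted centroid of the atoms in place of the center of mass, and it expects the protein volume as input (1.23 Å³/Da times the protein mass under the density assumption used in the text). The function names are hypothetical.

```python
import math


def gyration_tensor(coords):
    """3x3 gyration tensor G_mn of a list of (x, y, z) atom positions.

    Assumption: all atoms carry equal weight, so the centroid is used as
    the center of mass.
    """
    n = len(coords)
    center = [sum(p[m] for p in coords) / n for m in range(3)]
    g = [[0.0] * 3 for _ in range(3)]
    for p in coords:
        d = [p[m] - center[m] for m in range(3)]
        for m in range(3):
            for k in range(3):
                g[m][k] += d[m] * d[k] / n
    return g


def det3(m):
    """Determinant of a 3x3 matrix."""
    return (
        m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
        - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
        + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0])
    )


def gyration_volume(coords):
    """Volume of the gyration ellipsoid.

    The semi-axes are the standard deviations along the principal axes,
    so their product is sqrt(det G).
    """
    return 4.0 / 3.0 * math.pi * math.sqrt(det3(gyration_tensor(coords)))


def shape_factor(coords, v_protein):
    """Vgyr / Vgyr-min, with Vgyr-min = 5**(-3/2) * Vprotein."""
    v_gyr_min = 5.0 ** -1.5 * v_protein
    return gyration_volume(coords) / v_gyr_min
```

For points filling an ellipsoid uniformly the shape factor is close to 1.0, whereas an elongated two-lobed arrangement such as a dumbbell gives a clearly larger value.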
After taking a number of measurements of crystallized membrane proteins in the PDB using COOT,20 I decided on an average model with the following properties: the protein mass is about 110 kDa. The protein part outside the membrane is about half the total protein. The polar surface protrudes somewhat over the central membrane-immersed part. For simplicity, the cross section perpendicular to the z-axis was taken as circular although it is slightly elliptical. The apolar surface length perpendicular to the membrane is 30 Å. The polar parts are two oblate ellipsoids with a diameter of 55 Å and a thickness of 15 Å. The membrane-immersed part has a diameter of 40 Å, and the detergent micelle protrudes 20 Å from the apolar surface. The resulting model is shown in Figure 2(B). The crystallization in packing scheme P1[z = 1] was simulated using the program simulation.f described in the Supporting Information. The simulation result was more sensitive to the model than for soluble proteins, but the basic results were always reproduced.
The technical assistance by C. Schleberger is gratefully acknowledged. The author thanks O. Einsle and S. Gerhardt for discussions.