Undistorted structural analysis of soluble proteins by attenuated total reflectance infrared spectroscopy

Authors

  • Michel E. Goldberg

    1. Unité de Repliement et Modélisation des Protéines, Department of Structural Biology and Chemistry, Institut Pasteur, Paris, France
  • Alain F. Chaffotte

    Corresponding author
    1. Unité de Repliement et Modélisation des Protéines, Department of Structural Biology and Chemistry, Institut Pasteur, Paris, France
    • Institut Pasteur, 28 rue du Dr. Roux, 75724 Paris Cedex 15, France; fax: +33-1-40-61-30-43.


Abstract

Water from the solvent very strongly absorbs light in the frequency range of interest for studying protein structure by infrared (IR) spectroscopy. This renders handling of the observation cells painstaking and time consuming, and limits the reproducibility of the measurements when IR spectroscopy is applied to proteins in aqueous solutions. These difficulties are circumvented by the use of an Attenuated Total Reflectance (ATR) accessory. However, when protein solutions are studied, ATR spectroscopy suffers from several drawbacks, the most severe being nonproportionality of the observed absorbance to the protein concentration and spectral distortions that vary from protein to protein and from sample to sample. In this study, we show (1) that the nonproportionality is due to adsorption of the protein on the ATR crystal surface; (2) that the contribution of the crystal-adsorbed protein can easily be taken into account, rendering the corrected absorbance proportional to the protein concentration; (3) that the observed variable baseline distortions, likely due to changes in the penetration depth of the light beam in solutions whose refractive index depends on the protein concentration, can be easily eliminated; and (4) that ATR IR spectra thus corrected for protein adsorption and light penetration can be used to properly analyze the secondary structure of proteins in solution.

Though X-ray crystallography and NMR provide powerful tools to reveal the details of protein structures, these methods are quite demanding and not applicable to proteins that fail to crystallize, or are too large to be amenable to NMR. Other methods, although providing less information on the three-dimensional structure of proteins, are easier to use and more generally applicable, and provide important insights into protein conformation. In particular, circular dichroism in the far-UV as well as infrared spectroscopy yield valuable information on the secondary structure contents of proteins. These two methods are complementary in that circular dichroism provides good estimates of the number of residues that belong to α-helices but fails to precisely predict the β-strand content, while infrared spectroscopy predicts β-strands much better than α-helices. Yet, infrared spectroscopy is still scarcely used for protein structural studies, in spite of the considerable improvements to the speed of acquisition, precision, and quality of the spectra brought about by the advent of Fourier transform infrared spectroscopy (FTIR) (Jackson and Mantsch 1995; Backmann et al. 1996; Baello et al. 2000).

The main reason why FTIR lags behind seems to be the practical difficulties encountered in applying it to proteins in aqueous solutions. Indeed, the information needed to predict the secondary structure contents from infrared spectra is contained in the amide I and amide II absorption bands of the peptide bonds (centered at ∼1650 and 1550 cm−1, respectively), which overlap the very strong absorption band of water in the 1600 cm−1 region. This has two consequences. One is that the optical path used for FTIR measurements must be extremely short (a few microns only) to let enough light emerge from the sample cell. The second is that, in view of the very small concentration of peptide bonds compared to the water concentration (∼10−2 M peptide bonds vs. 111 M water O-H bonds in a 1 mg/mL protein solution), the protein signal is extremely small compared to the water signal, which requires the thickness and positioning of the sample cell to be rigorously the same for successive measurements. The conjunction of these two requirements results in severe technical difficulties. Indeed, only demountable cells can be practically used so that the cell windows can be efficiently cleaned after each protein sample. Since the exact path length of such cells depends on the force applied in tightening them upon reassembly, it is extremely difficult to keep the optical path strictly constant. This prompted the design of demountable (for cleaning purposes) cells that can be filled and emptied without disassembling them (for constant optical path purposes) for recording the sample and buffer spectra. But filling, emptying, cleaning, rinsing, and drying very thin cells are delicate and time-consuming operations since residual traces of liquid as well as air bubbles are not easy to avoid. To circumvent these difficulties, an “attenuated total reflectance” accessory (ATR) can be used instead of the usual transmission sample cell.
With an ATR, rather than measuring the absorption of the incident light, one measures the absorption of the evanescent wave that penetrates the solution when the incident light is reflected at the crystal/liquid interface. Loading the ATR with a solution, emptying it, cleaning the sample compartment, rinsing, and drying it are extremely rapid and simple. Furthermore, because the ATR geometry is factory defined and remains perfectly constant, the “optical path” depends only on the crystal and solution refractive indices, thus rendering it perfectly constant for a given solution. Measurements are therefore highly reproducible. Moreover, for several commercially available ATRs with a circular well (as opposed to long crystals) only minute volumes of samples are needed since 10 μL of solution amply suffice to cover the active surface of the ATR. These major advantages prompted attempts to use ATR/FTIR spectroscopy for structural studies of proteins (Raussens et al. 1998; Smith et al. 2002).

While starting FTIR studies on proteins, we indeed found it very easy and rapid to acquire spectra with an ATR. However, we also found out that their interpretation is not straightforward and may be badly misleading. In this report, we shall show that the ATR/FTIR spectra as recorded are both biased and distorted, we shall offer a physical interpretation for the bias and distortions we observed, we shall propose a simple method to correct the observed spectra so as to render them reliable, and we shall show that the corrected spectra thus obtained allow for reliable secondary structure predictions.

Results

Concentration dependence of the ATR/FTIR spectra

In order to determine the usable protein concentration range for spectral measurements and to assess the quality of the spectra obtained with our instrument, ATR/FTIR transmission spectra were recorded at different concentrations of hen egg white lysozyme, a protein chosen for its availability, stability, and well-characterized spectral and structural properties. The protein concentrations were 1, 2, 4, 6, and 8 mg/mL. The absorption spectra at these concentrations were constructed, using as a reference the transmission spectrum of the pure buffer (Fig. 1). Visual examination of the spectra revealed that the heights of the amide I and amide II peaks were not proportional to the protein concentration. This was confirmed by plotting the observed absorbance at the amide I and amide II peak frequencies as a function of the protein concentration. The straight lines thus obtained (Fig. 2) indicate the presence of two components to the absorbance. One is concentration independent and corresponds to the absorbance obtained by extrapolation to zero protein concentration (0.0072 and 0.005 arbitrary units, respectively, for the amide I and amide II bands). The second is proportional to the protein concentration. For the amide I band, the absorbance of the concentration-independent component corresponds to that of 3.5 mg/mL of the concentration-dependent component, as estimated from the slope of the amide I straight line. Similar experiments were conducted with bovine ribonuclease A and horse muscle myoglobin and provided similar results, with the absorbance of the concentration-independent components corresponding to that of 3 and 5 mg/mL, respectively, of the concentration-dependent components.
That the slopes of the straight lines obtained for the amide I and amide II bands are very similar, while the extrapolation to zero concentrations of the two lines do not coincide, suggests that the ratios of the amplitudes of the amide I and amide II bands are not the same for the concentration-dependent and concentration independent components.
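The two-component reading of Figure 2 can be illustrated with a short least-squares sketch. The absorbance values below are synthetic: they are generated from the amide I intercept (0.0072) and the 3.5 mg/mL equivalence quoted above, and are not the measured data. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def equivalent_concentration(slope, intercept):
    """Solution concentration whose absorbance equals the intercept."""
    return intercept / slope


# Synthetic amide I peak absorbances (NOT measured values): a constant
# adsorbed-protein term plus a term proportional to the concentration.
ASSUMED_INTERCEPT = 0.0072                 # quoted in the text (amide I)
ASSUMED_SLOPE = ASSUMED_INTERCEPT / 3.5    # implied by the 3.5 mg/mL equivalence
concentrations = [1, 2, 4, 6, 8]           # mg/mL, as in the experiment
absorbances = [ASSUMED_INTERCEPT + ASSUMED_SLOPE * c for c in concentrations]

slope, intercept = fit_line(concentrations, absorbances)
print(f"intercept = {intercept:.4f}, slope = {slope:.5f} per mg/mL")
print(f"equivalent concentration = "
      f"{equivalent_concentration(slope, intercept):.2f} mg/mL")
```

The intercept is the concentration-independent component, and dividing it by the slope expresses it as the solution concentration that would give the same absorbance.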

Nature of the concentration independent absorbing component

It is part of the biochemists' common knowledge that proteins tend to stick to glass surfaces, and that at protein concentrations of less than 1 mg/mL the surface gets saturated with protein. Because all our ATR/FTIR spectra were recorded at protein concentrations of at least 1 mg/mL, we hypothesized that at all concentrations investigated the diamond crystal of the ATR might behave as a glass surface and might be saturated with adsorbed protein that the evanescent wave would have to cross before penetrating the solution. Accordingly, the adsorbed protein would be responsible for the concentration-independent part of the IR light absorption. To test this hypothesis, 20-μL samples of lysozyme solutions at 1, 2, 4, and 8 mg/mL were incubated for 10 min in the ATR, which is amply sufficient to allow for adsorption of the protein onto the crystal surface (Santos et al. 2003). The solution was carefully removed from the ATR well by vacuum aspiration, 20 μL of buffer was added, and the IR spectrum was immediately recorded with only 100 accumulations to shorten the recording time and minimize possible protein desorption (Fig. 3). Although the signal-to-noise ratio was rather poor (because of a small signal and limited number of accumulations), the spectra thus obtained clearly showed the characteristics of a protein spectrum, with the two major bands at the amide I and amide II positions. This confirmed that some protein got strongly adsorbed to the crystal during the 10-min incubation of the lysozyme in the ATR. Moreover, as expected from our hypothesis, the amplitudes of the amide I and amide II bands were the same (within the noise) for the four concentrations investigated, and these amplitudes represented over 70% of the concentration-independent component of the absorption discussed in the previous section. This suggested that at least a major part of the adsorbed protein was strongly attached to the ATR crystal.
To test the stability of the adsorption, the buffer that had been added after removal of the 8 mg/mL lysozyme solution was removed by aspiration and replaced by 20 μL of fresh buffer, rinsing (i.e., aspiration followed by addition of 20 μL of buffer) was repeated twice, and the spectrum recorded. Finally, rinsing was applied three more times and the spectrum recorded again. While the peak amplitude after the first three rinses was slightly reduced compared to the initial spectrum of the adsorbed material, the spectrum after the sixth rinse was unchanged compared to that after three rinses. Thus, about half of the adsorbed protein appeared very tightly bound to the crystal, while the other half could be removed by repeated rinsing. Both the loosely and tightly adsorbed protein molecules could be efficiently removed from the crystal surface by washing the ATR sample compartment twice with a soft detergent solution (2% Hellmanex) followed by thorough rinsing with distilled water, a procedure that lasts ∼2 min (see Materials and Methods). This was demonstrated by incubating a 10-mg/mL lysozyme solution in the ATR, applying the wash-and-rinse procedure, adding 20 μL of buffer to the ATR and recording the spectrum, which showed no detectable absorption in the amide I–amide II region.

That protein adsorption onto a surface results, at saturating concentrations, in the formation of a protein monolayer is a widely accepted notion (for review, see Hlady et al. 1999). The IR absorbances we observed for the concentration-independent component are compatible with this view. Indeed, let us estimate roughly the concentration of protein in solution that would absorb the same amount of light as the monolayer of adsorbed protein. Assuming the thickness of the monolayer to be of the same order of magnitude as the diameter of the protein, it is expected to be ∼20–30 Å for a small protein like lysozyme, myoglobin or ribonuclease. In order for the evanescent wave to undergo, in the solution, an absorbance equivalent to that caused by the adsorbed material, the protein concentration must be such that the volume occupied by the protein molecules that are on the path of the evanescent wave in solution (beam diameter × penetration depth × fraction of the volume occupied by the protein) is the same as the volume of adsorbed protein it crosses (beam diameter × monolayer thickness). The penetration depth of the evanescent wave in the ATR accessory can be calculated (Griffiths and de Haseth 1986) to be ∼0.88 microns for a frequency corresponding to the amide I region. Therefore, for the evanescent wave to travel across the equivalent of 20–30 Å of protein molecules in the solution, the fraction of the volume occupied by the protein in the solution must be ∼0.23%–0.34% (i.e., 20/8800 and 30/8800, respectively). Assuming a standard value of 0.73 mL/g for the partial specific volume of the protein, this corresponds to a protein concentration of the order of 0.31%–0.46% (w/v), which compares very well with the protein concentrations of 0.3%–0.5% (i.e., 3–5 mg/mL) where the absorbance increment is equivalent to the absorbance of the adsorbed protein (see previous section).
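The order-of-magnitude estimate above can be retraced numerically. The following is a minimal sketch that uses only the values quoted in the text (0.88 micron penetration depth, 0.73 mL/g partial specific volume, 20–30 Å monolayer thickness); the function name is hypothetical.

```python
def monolayer_equivalent(thickness_angstrom,
                         penetration_depth_um=0.88,
                         partial_specific_volume=0.73):
    """Return (volume fraction in %, concentration in % w/v) of the solution
    whose protein content along the evanescent wave path equals that of an
    adsorbed layer of the given thickness."""
    depth_angstrom = penetration_depth_um * 1.0e4   # 1 micron = 10^4 angstrom
    volume_fraction = thickness_angstrom / depth_angstrom
    # mL of protein per mL of solution, divided by mL/g, gives g/mL.
    grams_per_ml = volume_fraction / partial_specific_volume
    return 100.0 * volume_fraction, 100.0 * grams_per_ml


for thickness in (20, 30):
    vol_pct, wv_pct = monolayer_equivalent(thickness)
    print(f"{thickness} A layer: {vol_pct:.2f}% (v/v), {wv_pct:.2f}% (w/v), "
          f"i.e. {10.0 * wv_pct:.1f} mg/mL")
```

Since 1% (w/v) is 10 mg/mL, both thicknesses give values inside the 3–5 mg/mL range found experimentally for the concentration-independent component.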

Absorption spectrum of the protein in solution

As shown above, the contribution of the crystal-adsorbed protein to the absorption spectrum is quite important. The experimental spectrum therefore does not correspond to the real spectrum of the protein in solution. This can be easily corrected by recording two spectra of the same protein, one at high (5–10 mg/mL), the second at low (∼0.5–1 mg/mL) protein concentration. Subtracting the absorption spectrum at low concentration from the absorption spectrum at high concentration eliminates the contribution of the adsorbed protein (since it is the same in the two samples) and yields the spectrum of the soluble protein at a concentration equal to the concentration difference between the two solutions. Figure 4A shows the absorption spectrum of lysozyme at 9.1 mg/mL obtained by subtracting the spectrum recorded at 0.9 mg/mL from that recorded at 10 mg/mL. When this was done for lysozyme solutions at various concentrations, the resulting spectra were superimposable when normalized with respect to the concentration. Furthermore, the peak absorbances obtained for the amide I and amide II bands were now strictly proportional to the “apparent” concentration (i.e., the difference between the high and the low concentrations used for recording the spectra) as shown in Figure 4B.
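The double concentration differential method can be sketched in a few lines. The three-point spectra below are synthetic and serve only to show that a concentration-independent term cancels in the subtraction; the function and variable names are hypothetical, and real spectra would of course contain many more points.

```python
def difference_spectrum(high, low, c_high, c_low):
    """Subtract the low-concentration absorbance spectrum from the
    high-concentration one and normalize per mg/mL of soluble protein."""
    if c_high <= c_low:
        raise ValueError("c_high must exceed c_low")
    if len(high) != len(low):
        raise ValueError("spectra must share the same frequency axis")
    delta_c = c_high - c_low   # the 'apparent' concentration of the text
    return [(h - l) / delta_c for h, l in zip(high, low)]


# Synthetic model (NOT measured data): observed absorbance is a fixed
# adsorbed-protein spectrum plus a solution spectrum scaling with c.
adsorbed = [0.004, 0.007, 0.005]
solution_per_mg = [0.0010, 0.0020, 0.0015]


def observed(concentration):
    return [a + concentration * s for a, s in zip(adsorbed, solution_per_mg)]


recovered = difference_spectrum(observed(10.0), observed(0.9), 10.0, 0.9)
print(recovered)   # close to solution_per_mg; the adsorbed term has cancelled
```

With 10 and 0.9 mg/mL the apparent concentration is 9.1 mg/mL, as in Figure 4A.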

It is noteworthy that the spectrum of the adsorbed protein, determined by replacing the protein solution with buffer but omitting the washings (Fig. 3), differed from the corrected spectrum of the protein in solution in terms of the amide I and amide II band shapes, as well as in the relative intensities of these two bands. This indicates that the adsorbed protein differs from the native protein in solution in terms of secondary structure contents, suggesting partial denaturation of the protein upon adsorption. Such post-adsorption changes in conformation have already been reported for several proteins (Kondo and Mijara 1996; Zoungrana et al. 1997; Van Tassel et al. 1998; Santos et al. 2003). This emphasizes the need to remove the contribution of the nonnative adsorbed protein from the experimental spectra. The double concentration differential method we propose meets this requirement and renders the absorption measurements quantitative.

Baseline adjustment

Most of the absorption spectra obtained by this differential method (see, e.g., the continuous line in Fig. 4A) showed a baseline distortion: While the absorbance should be zero throughout the 1700–1850 cm−1 region, it systematically showed a decrease with decreasing frequencies until the region of the amide I band was reached. This unexpected behavior appeared the more pronounced for spectra acquired at the higher protein concentrations. From this, we inferred that this spurious absorption might be due to a very slight reduction in the depth of penetration of the evanescent wave with increasing protein concentration, presumably due to an increase in the refractive index of the solution. This would result in a slight reduction of the water absorption in the experiment at high protein concentration compared to that at low protein concentration. Using the Spectral Subtract function of the PROTA software (Subtract.AB in GRAMS/32), a variable weight factor was assigned to the buffer versus air absorption spectrum (recorded in the ATR) and the weight factor was optimized so as to minimize the difference between the protein spectrum and the weighted buffer spectrum in the 1720–1850 cm−1 region. Applying this procedure resulted in a corrected lysozyme spectrum (dotted line in Fig. 4A), which shows a flat and constant baseline. The usefulness of such spectra, corrected as indicated above for both the adsorbed protein and the baseline distortion, will now be considered.
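The weight optimization can be illustrated with a closed-form least-squares sketch. This is not the Spectral Subtract routine of PROTA/GRAMS, whose internals are not described here; it is a stand-in under the assumption that "minimizing the difference" means minimizing the sum of squared residuals in the 1720–1850 cm−1 window. All names and the synthetic spectra are hypothetical.

```python
import math


def buffer_weight(protein, buffer, wavenumbers, window=(1720.0, 1850.0)):
    """Least-squares weight w minimizing sum((protein - w * buffer)^2)
    over the points whose wavenumber lies inside the window."""
    low, high = window
    num = 0.0
    den = 0.0
    for nu, p, b in zip(wavenumbers, protein, buffer):
        if low <= nu <= high:
            num += p * b
            den += b * b
    if den == 0.0:
        raise ValueError("buffer spectrum is zero in the fitting window")
    return num / den


def subtract_weighted_buffer(protein, buffer, wavenumbers,
                             window=(1720.0, 1850.0)):
    """Return (corrected spectrum, weight). A negative weight means that
    buffer absorbance is effectively added back to the protein spectrum."""
    weight = buffer_weight(protein, buffer, wavenumbers, window)
    corrected = [p - weight * b for p, b in zip(protein, buffer)]
    return corrected, weight


# Synthetic demonstration (NOT measured data).
wavenumbers = list(range(1500, 1851, 10))
buffer = [0.5 * math.exp(-((nu - 1640) / 60.0) ** 2) + 0.05
          for nu in wavenumbers]
true_protein = [0.1 * math.exp(-((nu - 1650) / 20.0) ** 2) if nu < 1700 else 0.0
                for nu in wavenumbers]
TRUE_WEIGHT = -0.02   # assumed deficit of water absorbance
distorted = [t + TRUE_WEIGHT * b for t, b in zip(true_protein, buffer)]

corrected, weight = subtract_weighted_buffer(distorted, buffer, wavenumbers)
print(f"fitted weight = {weight:.4f}")
```

In this toy case the fitted weight reproduces the assumed deficit and the corrected spectrum is flat in the window where the protein does not absorb.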

Secondary structure analysis

Although corrected as described above, the ATR/FTIR spectra obtained still do not coincide with FTIR spectra obtained with a conventional transmission cell, since the ATR introduces some systematic spectral distortions. Some of them, like that caused by the variation of the penetration depth with the light frequency, can be corrected (Griffiths and de Haseth 1986). Others are not readily amenable to mathematical correction. However, these distortions are constant for a given ATR accessory and reproducible from spectrum to spectrum. Thus, differences in the ATR/FTIR spectra (once corrected as discussed above for protein adsorption and baseline distortion) of different proteins should quantitatively reflect differences in their conformations and serve for secondary structure prediction. Based on this reasoning, we constructed a library of ATR/FTIR corrected spectra for proteins of known three-dimensional structures and used it to predict the secondary structure contents of proteins not represented in the library. The transmission spectra of 29 reference proteins, each at low and high concentration, were acquired using the ATR with 1000 accumulations and at 4 cm−1 resolution. For each protein, the absorption spectrum of the protein in solution was constructed, using the transmission spectrum of the diluted sample as a reference for the concentrated sample. The baseline distortion correction was applied as indicated above. Finally, the corrected spectrum of each protein was normalized and introduced in the data library as described in the Materials and Methods section. In order to test the validity of the database thus constructed, three proteins investigated in our laboratory (the maltose binding protein MBP, the peptidylprolyl cis-trans isomerase FKPA and thioredoxin TRX from Escherichia coli) were subjected to secondary structure predictions based on their FTIR spectra. Their spectra were acquired and treated in the same way as those of the database.
The structural parameters, i.e., contents in α-helices H, extended β-strands E, bends S, and hydrogen bonded turns T as defined by Kabsch and Sander (1983), were obtained from the PDB files indicated in Table 1. Secondary structure predictions were then achieved using an adapted (see Materials and Methods) version of the variable selection method (Manavalan and Johnson 1987). The reconstructed spectrum corresponding to the best fit is shown, for each protein, as a solid line in Figure 5. While the fits seem reasonably good for the maltose binding protein and the FKPA isomerase, the fit for thioredoxin deviates significantly from the experimental data in the region of the amide I band peak. This rather strong deviation of the thioredoxin fit is confirmed by the high value of its root mean square deviation (0.0222) as indicated in Table 1. Table 1 also shows the secondary structure predictions corresponding to the best fit for each test protein, together with their secondary structure features assessed from their known 3D structures. The comparison between the predicted and the assessed secondary structure contents indicates that the secondary structure contents of the three proteins were predicted with fairly good accuracy from the ATR/FTIR corrected spectra, in particular for the extended β-strands of the first two proteins, while the predictions of their α-helix, bend and turn contents were less accurate, as often reported with transmission cell FTIR.

Compared accuracies of ATR and transmission cell based secondary structure predictions

In order to compare the value of transmission cell versus ATR/FTIR spectroscopy for secondary structure predictions, each of the 20 proteins common to the ATR spectra library we constructed and to the transmission cell spectra library provided with the PROTA software was subjected to prediction based on both libraries. To this effect, the spectrum of one of these proteins was removed from the two libraries, and submitted to the secondary structure prediction procedure (to predict the secondary structure of the corresponding protein) with the spectra remaining in one or the other of the two libraries. This procedure was repeated for each protein common to both libraries. Table 2 shows the predicted secondary structure contents of these 20 proteins using the ATR library as well as their contents assessed from their 3D structure in the PDB. Figure 6 shows, for each database, the plot of the β-strand contents of the 20 proteins assessed versus their predicted contents. For the set of data points obtained with each database, a linear regression was applied. The ordinates at zero abscissa were −0.015 and 0.026 for the ATR and transmission cell database, respectively, the slopes were 1.075 and 0.937, respectively, and the correlation coefficients were 0.896 and 0.642, respectively. Thus, in terms of ordinate at zero abscissa and of slopes, the two fits compare equally well with the expected values (0 and 1, respectively), but the ATR database provides a much better correlation coefficient, which reflects a significantly more robust prediction of the β-strand contents with the ATR measurements compared to the conventional transmission cell FTIR measurements. The same method was used to analyze the spectra of the two databases, and the two databases contain the spectra of the same set of proteins.
Thus, the better correlation obtained with the ATR database, compared to the transmission database, likely reflects larger experimental errors introduced in recording the spectra with transmission cells (used in PROTA) than with the ATR.
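The leave-one-out protocol and the regression statistics used for Figure 6 can be sketched as follows. The predictor below is a deliberately trivial nearest-neighbour placeholder, not the VARSELEC or PROTA algorithm, and the library is a toy example with invented entries; only the bookkeeping (removing one protein, predicting it from the rest, regressing assessed on predicted contents) is meant to mirror the text. Placing the predicted values on the abscissa is an assumption.

```python
import math


def leave_one_out(library, predict):
    """library maps name -> (spectrum, known_content). Each protein is
    predicted from a reference set that excludes it."""
    results = {}
    for name, (spectrum, known) in library.items():
        reference = {k: v for k, v in library.items() if k != name}
        results[name] = (predict(spectrum, reference), known)
    return results


def regression_stats(xs, ys):
    """Return (intercept, slope, correlation coefficient) of y versus x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope, sxy / math.sqrt(sxx * syy)


def nearest_neighbour(spectrum, reference):
    """Placeholder predictor: content of the closest reference spectrum."""
    closest = min(reference.values(),
                  key=lambda entry: math.dist(spectrum, entry[0]))
    return closest[1]


# Toy library (invented two-point 'spectra' and beta-strand fractions).
toy_library = {
    "P1": ([0.1, 0.9], 0.10),
    "P2": ([0.2, 0.8], 0.20),
    "P3": ([0.5, 0.5], 0.45),
    "P4": ([0.8, 0.2], 0.70),
    "P5": ([0.9, 0.1], 0.85),
}

results = leave_one_out(toy_library, nearest_neighbour)
predicted = [p for p, _ in results.values()]
assessed = [a for _, a in results.values()]
intercept, slope, r = regression_stats(predicted, assessed)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}, r = {r:.3f}")
```

An ideal predictor would give an intercept of 0, a slope of 1 and a correlation coefficient of 1, which are the reference values against which the two databases are compared above.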

Discussion

The results reported in this paper demonstrate that FTIR spectra recorded with an ATR accessory suffer from two types of biases: one is related to the adsorption of proteins onto the ATR crystal, the other to the properties of the evanescent wave at the crystal/solution interface. It is shown that, at protein concentrations routinely used for FTIR measurements, the light absorption by the protein adsorbed on the crystal represents a significant fraction of the light absorption by the protein in solution. This is due to the fact that, at such concentrations, the protein covers the crystal surface with a molecular monolayer that the evanescent wave has to cross before penetrating the solution. While the adsorption of proteins at interfaces has been investigated in some detail by means of attenuated total reflectance using various detection methods (Hlady et al. 1999; Santos et al. 2003), its influence on FTIR spectra as recorded with an ATR has been generally overlooked (see, e.g., Smith et al. 2002). In this report, it is shown that (1) the crystal-adsorbed protein biases the quantitative estimate of protein concentrations from the FTIR spectra; and (2) it also distorts the shape of the spectra since even a protein as stable as hen lysozyme can undergo post-adsorption changes in conformation. Therefore, reliable quantitative measurements as well as secondary structure predictions based on FTIR spectra must eliminate the contribution of the crystal-adsorbed protein to the IR absorption. As shown in this report, this can be easily done by recording the transmission spectra of a concentrated and a diluted protein solution, and calculating the absorption spectrum using the spectrum of the diluted solution as a reference.
It should be emphasized that, while the contribution of the adsorbed protein may sometimes be considered as negligible if the FTIR spectra are recorded at very high protein concentrations, it can by no means be overlooked in spectra recorded at protein concentrations usually used for FTIR measurements. Indeed, at protein concentrations of ∼3–5 mg/mL, the contribution of the adsorbed protein is as large as that of the soluble protein.

FTIR spectra obtained by use of an ATR are also distorted by a variety of factors influencing the behavior of the evanescent wave. One of them is the variation of the penetration depth with the protein concentration, which results in a difference in the light absorption by water for the concentrated and diluted solutions. This difference is particularly important in the amide I–amide II region, which coincides with a very strong absorption band of water. As shown above, this can be easily corrected by adding to the protein spectrum a water spectrum weighted so as to minimize the absorbance of the protein solution in the 1720–1850 cm−1 region, where the protein absorbance should be zero. It should be noted that, though the physical cause of the spectrum distortion is different when transmission cells are used (irreproducibility of the cell thickness rather than influence of the protein refractive index on the evanescent wave penetration depth), the same correction procedure is routinely applied in conventional IR spectroscopy.

A second factor influencing the absorption of the evanescent wave is related to the change in the penetration depth of the evanescent wave with the light frequency. This affects all spectra recorded by means of an ATR, regardless of the solute and its concentration. Because the penetration depth is proportional to the wavelength (assuming the refractive indices of the ATR crystal and of the solution to be frequency independent, which is not rigorously correct), the penetration depth can be calculated (see formula in Griffiths and de Haseth 1986) to be ∼8%–9% longer at the wavelength of the amide II band than at that of the amide I band, thus increasing artificially the apparent intensity of the amide II, compared to amide I, band.
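The wavelength dependence can be made explicit with the standard expression for the penetration depth of an evanescent wave, d_p = λ / (2π n₁ √(sin²θ − (n₂/n₁)²)). The sketch below evaluates it under assumed values that are not given in the text: a diamond crystal index of 2.4, a 45° angle of incidence, and a frequency-independent solution index of 1.30. These are illustrative assumptions, not the parameters of the accessory used here.

```python
import math


def penetration_depth_um(wavenumber_cm1, n_crystal, n_sample, angle_deg):
    """Penetration depth (microns) of the evanescent wave:
    d_p = lambda / (2 * pi * n1 * sqrt(sin^2(theta) - (n2 / n1)^2))."""
    wavelength_um = 1.0e4 / wavenumber_cm1
    term = math.sin(math.radians(angle_deg)) ** 2 - (n_sample / n_crystal) ** 2
    if term <= 0.0:
        raise ValueError("no total internal reflection for these values")
    return wavelength_um / (2.0 * math.pi * n_crystal * math.sqrt(term))


# Assumed, illustrative parameters (NOT taken from the text).
N_DIAMOND = 2.4
N_SOLUTION = 1.30
ANGLE_DEG = 45.0

d_amide_1 = penetration_depth_um(1650.0, N_DIAMOND, N_SOLUTION, ANGLE_DEG)
d_amide_2 = penetration_depth_um(1550.0, N_DIAMOND, N_SOLUTION, ANGLE_DEG)
print(f"amide I : {d_amide_1:.3f} microns")
print(f"amide II: {d_amide_2:.3f} microns")
print(f"ratio   : {d_amide_2 / d_amide_1:.4f}")
```

With frequency-independent indices the ratio of the two depths reduces to the ratio of the two wavenumbers, so the exact percentage depends on the band positions chosen; with the assumed parameters the amide I depth comes out close to the ∼0.88 microns quoted earlier.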

Another important source of spectral distortion is the so-called “anomalous dispersion.” This effect was shown (Grdadolnik 2002) to account for 10%–15% of the amide II signal observed for bovine serum albumin in an ATR. Taken together, the penetration depth dependence on the wavelength and the anomalous dispersion thus account for a 20%–25% overestimate of the amide II band intensity compared to the amide I. These two effects account for the major part, if not the totality, of the lower ratio of the amide I to amide II amplitudes observed in the ATR (∼0.9–1.1) compared to a classical transmission cell (∼1.3–1.4).

Though a mathematical formula (Griffiths and de Haseth 1986) can easily correct the experimental ATR spectra for the wavelength dependence of the penetration depth, and a procedure was developed to correct for the anomalous dispersion (Grdadolnik 2002), these corrections were not applied in the present study because the same solvent was used throughout our experiments, resulting in the same spectral distortions for all the samples analyzed, and because systematic spectral distortions, i.e., that are the same for the proteins under investigation and for the proteins in the database, do not affect the spectrum analysis procedure we used. Thus, FTIR spectra recorded using an ATR accessory, once corrected for the crystal-adsorbed protein and for the distortion caused by the protein refractive index increment, can be readily used for secondary structure predictions, as documented by the results in Table 2. Indeed, the robustness of secondary structure predictions from the spectra recorded with the ATR appears better than that of predictions made from conventional spectra recorded with transmission cells, as indicated by a much better correlation coefficient for the ATR data compared to the transmission cell data shown in Figure 6.

The fit between predicted and assessed secondary structure contents (see Tables 1, 2) is far from being perfect. For some proteins, the discrepancy between the predicted and the assessed values is rather high. This may be due in part to the fact that the IR absorption bands of several amino acid side chains overlap the amide I and II bands, and thus contribute to the part of the IR absorption spectra used in the spectral analysis. Careful spectral analysis should therefore subtract the side chain contributions from the spectra of the unknown proteins as well as from those of the database. This could, in principle, be done by recording the spectra of the side chains alone, computing a “side-chain spectrum” from the known amino acid composition of each protein, and subtracting it from the experimental spectrum. It is likely that such side-chain corrections, included in the PROTA software for the correction of conventional transmission cell spectra, would improve the robustness of the predictions. Another way to improve the secondary structure prediction from the FTIR spectra might be to include more secondary structure types in the analysis. Indeed, the analysis reported here was based on only the following structural features (as defined by Kabsch and Sander 1983): α-helices, extended β-strands, hydrogen bonded turns and bends, all residues not belonging to one or the other of these four types being qualified as “others.” Including 3₁₀ helices, π-helices and/or isolated β-bridges in the database might improve the overall secondary structure predictions. The limited number of secondary structure types we used might account for the rather poor prediction of the β-strand content of E. coli thioredoxin, for which the 3D structure reported in the PDB indicates the presence of six residues (i.e., 5.5%) belonging to 3₁₀ helices in one of the two chains per unit cell.
It is plausible that this relatively high 3₁₀ helix content is responsible for our inability to obtain a better fit of the theoretical spectrum of thioredoxin to the experimental data (Fig. 5C), and hence, better secondary structure predictions. Considering the 3₁₀ helices as a distinct type of secondary structure (while in our analysis these helices were included in the “other” type), or grouping all the helices (α, π, and 3₁₀) in the same secondary structure type, could be easily done through minor modifications of the VARSELEC program, and might improve the robustness of the secondary structure prediction based on FTIR spectra.

There exists a variety of procedures, other than VARSELEC, that have been developed for determining the secondary structure of proteins from FTIR spectra (for reviews, see Surewicz and Mantsch 1988; Arrondo et al. 1993). Some are based on band decomposition and curve fitting including frequency and/or bandshape analysis (Cameron et al. 1982; Byler and Susi 1986; Goormaghtigh et al. 1990), others are based on the use of calibration sets (Dousseau and Pézolet 1990; Lee et al. 1990; Sarver and Krueger 1991a). A systematic comparison of their accuracy in predicting the secondary structure from FTIR spectra on the basis of data reported in the literature is not an easy task. Indeed, the studies reported rely on different procedures, but also on different databases, which makes it difficult to assess which of the procedures or the databases is responsible for the quality of the results. However, based on our experience with CD, the most critical point for obtaining good predictions from CD spectra comparison is the quality of the experimental data (both that of the sample to analyze and of the proteins in the database) as well as the proper choice of the proteins in the database to be taken into account, the latter being optimized by the Variable Selection procedure (Manavalan and Johnson 1987). We therefore chose, in our study, the VARSELEC program, which is based on the “singular value decomposition” method initially proposed for CD by Hennessy and Johnson (1981), and includes the Variable Selection procedure (Manavalan and Johnson 1987). In spite of the fact that running it is much less tedious and time consuming than using VARSELEC, the program contained in the PROTA package, which is based on the “principal component factor analysis” method (Pancoska et al. 1991), was not retained because it does not include the variable selection procedure.
Indeed, when tested on individual spectra extracted from the PROTA database using either the PROTA or the VARSELEC algorithm, VARSELEC provided a better secondary structure prediction, justifying further that the VARSELEC procedure was preferred.

It has been reported that improved predictions can be obtained using a principal component regression-based computer program that includes an “inside model space” bootstrap to take into account the variability of the amide I and amide II bands introduced by the various levels of hydration of the amide bonds (Smith et al. 2002). This is of particular interest in view of the significant differences in the absorption of exposed and buried amide groups in α-helices (Walsh et al. 2003). A variety of experimental approaches have also been reported to improve the reliability and precision of secondary structure predictions from FTIR spectroscopy. Thus, coupling FTIR with circular dichroism (Sarver and Krueger 1991b) or including hydrogen exchange in the FTIR measurements (Baello et al. 2000) was shown to improve the secondary structure predictions. None of these approaches was attempted in the present study, since our goal was limited to finding out whether ATR/FTIR can be substituted for transmission FTIR in order to simplify, and possibly improve, data collection. The results we obtained clearly indicate that, provided appropriate caution is exercised in collecting and analyzing the data, the use of an ATR accessory leads to secondary structure predictions that are at least as good as those obtained by conventional transmission cell spectroscopy, with the considerable advantage that sample handling is dramatically easier, more reproducible, and much faster. Based on this conclusion, it is likely that ATR/FTIR spectroscopy will become more widely used in investigations of protein structure.

Materials and methods

Proteins

The proteins chosen to construct the database were alcohol dehydrogenase from horse liver, carbonic anhydrase from bovine erythrocyte, catalase from bovine liver, citrate synthase from porcine liver, bovine α-chymotrypsinogen, concanavalin A, cytochrome c from horse liver, enolase from baker's yeast, glyceraldehyde-3-phosphate dehydrogenase from rabbit muscle, human hemoglobin, hexokinase from baker's yeast, soybean trypsin inhibitor, bovine pancreatic trypsin inhibitor, human α-lactalbumin, lactate dehydrogenase from rabbit muscle, bovine β-lactoglobulin, hen egg white lysozyme, myoglobin from horse muscle, hen ovalbumin, pepsin from porcine stomach, porcine pepsinogen, horseradish peroxidase, phosphoglycerate kinase from yeast, bovine ribonucleases A and S, human serum albumin, superoxide dismutase from bovine erythrocyte, thaumatin from Thaumatococcus daniellii, and triose phosphate isomerase from baker's yeast. They were all obtained from Sigma-Aldrich in the highest purity grade available and used without further purification. The test proteins were the E. coli maltose binding protein (MBP) and prolyl cis-trans isomerase (FKPA), both purified in our unit by Dr. J.-M. Betton, and thioredoxin, purified and kindly provided by the laboratory of J. Beckwith (Harvard). For each protein, 0.5 to 1 mL of a solution prepared at a concentration expected to be ∼10 mg/mL was dialyzed for at least 15 h at 4°C against 500 mL of 0.01 M sodium phosphate pH 7.0 (except for pepsin, where the pH was 5.5). The protein concentration in the dialysate was determined by spectrophotometry, using either published specific extinction coefficients or extinction coefficients calculated from the amino acid composition according to Pace et al. (1995). A 10-fold dilution of the dialysate in the dialysis buffer was prepared and used both for the spectrophotometric concentration determination and as the 100% transmission reference solution for the FTIR measurements (see below).
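The spectrophotometric concentration determination can be sketched as follows. This is only an illustration of the arithmetic: the coefficients at 280 nm (5500, 1490, and 125 M−1 cm−1 per Trp, Tyr, and cystine) are those of Pace et al. (1995), whereas the function names, the example composition, the molar mass, and the absorbance reading are hypothetical.

```python
def molar_extinction_280(n_trp, n_tyr, n_cystine=0):
    """Molar extinction coefficient at 280 nm in M^-1 cm^-1 (Pace et al. 1995)."""
    return 5500.0 * n_trp + 1490.0 * n_tyr + 125.0 * n_cystine


def stock_concentration_mg_per_ml(a280, epsilon, molar_mass,
                                  dilution=10.0, path_cm=1.0):
    """Concentration of the undiluted solution from the A280 of its dilution."""
    molar = a280 / (epsilon * path_cm)      # Beer-Lambert law, in mol/L
    return molar * molar_mass * dilution    # g/L is numerically equal to mg/mL


# Hypothetical protein: 6 Trp, 3 Tyr, 4 disulfide bonds, 14,300 g/mol.
epsilon = molar_extinction_280(6, 3, 4)
print(epsilon)  # 37970.0
# Hypothetical A280 reading of the 10-fold dilution in a 1-cm cell.
print(stock_concentration_mg_per_ml(0.25, epsilon, 14300.0))
```

The default `dilution=10.0` mirrors the 10-fold dilution of the dialysate used here for the absorbance measurement.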

Acquisition of ATR/FTIR spectra

The FTIR spectra were recorded on a PROTA FTIR Protein Analyzer from ABB-Bomem, which corresponds to an ABB-Bomem MB104 infrared spectrometer supplemented with software specifically dedicated to the analysis of infrared spectra of proteins. The PROTA software is based on the GRAMS program from Galactic Industries Corporation and includes, among other features, a database containing the FTIR absorption spectra (obtained by using conventional transmission cells) of 33 proteins of known 3D structures, a routine for subtracting the contribution of the amino acid side chains from the absorption spectrum of a protein, and a program to predict the secondary structure contents of a protein from its IR spectrum by comparison to the database. The spectrophotometer was equipped with a DuraSamplIR II 9-reflection diamond ATR accessory from SensIR Technologies and with a conventional, low-sensitivity Deuterated Tri-Glycine Sulfate (DTGS) detector. Higher sensitivity, improved signal-to-noise ratios, and shorter accumulation times would have been obtained using a narrow-range Mercury Cadmium Telluride (MCT) detector. The spectrophotometer was set up in a room where the temperature was maintained at 20° ± 2°C. The volume of the sample deposited in the ATR well was 20 μL. The ATR well was covered with a slightly convex glass cover (supplied with the ATR) to prevent solvent evaporation. All spectra were acquired in the single beam mode with a 4-cm−1 resolution. For all preliminary experiments (unless otherwise stated), the data from 300 scans were accumulated, and a buffer transmission spectrum was systematically recorded (300 scans) before and after each series of measurements. For the construction of the database as well as for the analysis of the three test proteins, accumulation was over 1000 scans and took ∼50 min for each spectrum.
The spectra of the dialysate and its 10-fold dilution were recorded one after the other without delay so as to minimize a possible drift in the output of the light source. The absorption spectrum of each solution was constructed from its single beam spectrum using the appropriate reference spectrum and the Absorbance function of the PROTA software. Uncorrected absorption spectra were constructed using the buffer single beam spectrum as a reference. Absorption spectra corrected for protein adsorption on the ATR crystal were constructed from the spectrum of the dialysate using the spectrum of the 10-fold diluted dialysate as the reference. The baseline distortion resulting from the refractive index increment due to the protein was corrected by means of the Spectral Subtract function of the PROTA software, using as subtrahend the absorption spectrum of the buffer (obtained from the single beam spectra of buffer and air, i.e., with the empty ATR) and the 1850–1720 cm−1 frequency range for the minimization. Unless otherwise stated, after each spectrum recording, the ATR well was emptied by means of an Eppendorf pipette tip connected to a vacuum pump. Next, the ATR window was washed as follows: 50 μL of a 2% Hellmanex (Hellma GmbH) detergent solution was introduced into the ATR well. After 1 min, the ATR window was thoroughly rubbed with the end of a cotton tip previously soaked in the detergent solution. The excess detergent was removed from the compartment by aspiration, and the whole washing operation was repeated a second time. The ATR well was then thoroughly rinsed by flushing it with distilled water with simultaneous aspiration of the excess water. The window and well were then carefully dried by aspiration of all traces of residual water.
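The two corrections described above can be illustrated with a minimal sketch. The Absorbance and Spectral Subtract functions of the PROTA software are not documented here, so the code below is an assumed stand-in: absorbance is taken as −log10 of the ratio of two single beam spectra, and the buffer spectrum is scaled by a least-squares factor computed over the 1850–1720 cm−1 window. All spectra are synthetic, and every numerical value (band positions, widths, the 0.980 and 0.998 water factors, the adsorbed-layer term) is hypothetical.

```python
import numpy as np


def absorbance(sample_single_beam, reference_single_beam):
    """A = -log10(I_sample / I_reference), computed point by point."""
    sample = np.asarray(sample_single_beam, dtype=float)
    reference = np.asarray(reference_single_beam, dtype=float)
    return -np.log10(sample / reference)


def subtract_buffer(wavenumbers, protein_abs, buffer_abs, window=(1720.0, 1850.0)):
    """Subtract k * buffer_abs, with k chosen by least squares so that the
    residual is as close to zero as possible inside `window` (cm^-1)."""
    wn = np.asarray(wavenumbers, dtype=float)
    protein_abs = np.asarray(protein_abs, dtype=float)
    buffer_abs = np.asarray(buffer_abs, dtype=float)
    mask = (wn >= window[0]) & (wn <= window[1])
    b = buffer_abs[mask]
    k = float(np.dot(b, protein_abs[mask]) / np.dot(b, b))
    return protein_abs - k * buffer_abs, k


# Synthetic single beam spectra (hypothetical shapes and amplitudes).
wn = np.arange(1400.0, 1901.0, 2.0)
water = 0.5 + 0.4 * np.exp(-((wn - 1640.0) / 40.0) ** 2)
amide_i = 0.05 * np.exp(-((wn - 1650.0) / 20.0) ** 2)
adsorbed = 0.3 * amide_i                      # same layer in both solutions
beam_air = np.ones_like(wn)
beam_buffer = 10.0 ** (-water)
beam_dialysate = 10.0 ** (-(0.980 * water + amide_i + adsorbed))
beam_diluted = 10.0 ** (-(0.998 * water + 0.1 * amide_i + adsorbed))

# Adsorption correction: the 10-fold dilution is the 100% reference.
raw = absorbance(beam_dialysate, beam_diluted)
# Baseline correction: subtract the scaled buffer-versus-air spectrum.
buffer_abs = absorbance(beam_buffer, beam_air)
corrected, k = subtract_buffer(wn, raw, buffer_abs)
print(round(k, 4))
```

In this toy model the adsorbed-layer term cancels because it is identical in the sample and in its 10-fold dilution, and the remaining signal corresponds to nine-tenths of the protein in solution, which is the reasoning behind using the diluted dialysate as the reference.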

Estimation of secondary structure

The secondary structure contents of the proteins were estimated using the singular value decomposition with the variable selection (VARSELEC) procedure proposed by Manavalan and Johnson (1987). Two databases were used, one consisting of the 33 reference spectra included in the PROTA software, the second consisting of the 29 protein spectra acquired with the ATR accessory. In all cases, the spectra were normalized to 1 at the peak of the amide I band, and only the 165 values corresponding to the spectral range between 1797 and 1481 cm−1 were taken into account for the analysis. Four structural types were considered: α-helix, extended β-strands, bends, and H-bonded turns; residues belonging to none of these types were grouped as “others.” The fraction of residues belonging to each structural type was assessed from the DSSP analysis (Kabsch and Sander 1983) of the PDB entry corresponding to each protein. For each secondary structure prediction, the best fit was selected from the combination(s) of spectra corresponding to the smallest RMS value and the sum of the structural fractions (between 0.90 and 1.10) closest to 1.0. In order to test the reliability of the secondary structure determination with either the PROTA database or the database we constructed, each of the 20 spectra of the reference proteins common to both databases was submitted to the VARSELEC procedure after its removal from the corresponding database.
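The principle of the analysis can be sketched as follows. This is not the VARSELEC program itself but a simplified illustration under stated assumptions: the rank of the truncated singular value decomposition (5), the amide I search region (1700–1600 cm−1), and all function names are hypothetical, and the selection rule is reduced to keeping the combination with the smallest RMS among those whose fractions sum to between 0.90 and 1.10. The database used in the example is synthetic.

```python
from itertools import combinations

import numpy as np


def normalize_amide_i(spectrum, wavenumbers, band=(1600.0, 1700.0)):
    """Scale a spectrum so that its maximum inside the amide I region is 1."""
    mask = (wavenumbers >= band[0]) & (wavenumbers <= band[1])
    return spectrum / spectrum[mask].max()


def svd_predict(basis_spectra, basis_fractions, spectrum, rank=5):
    """Truncated-SVD estimate of the structure fractions and of the fit RMS.

    basis_spectra: (n_points, n_proteins); basis_fractions: (n_types, n_proteins).
    """
    u, s, vt = np.linalg.svd(basis_spectra, full_matrices=False)
    u, s, vt = u[:, :rank], s[:rank], vt[:rank]
    weights = vt.T @ ((u.T @ spectrum) / s)       # weights on the basis proteins
    fractions = basis_fractions @ weights
    residual = spectrum - basis_spectra @ weights
    return fractions, float(np.sqrt(np.mean(residual ** 2)))


def variable_selection(spectra, fractions, spectrum, n_removed=3, rank=5,
                       sum_range=(0.90, 1.10)):
    """Try every reduced database and keep the acceptable fit with lowest RMS."""
    best = None
    n = spectra.shape[1]
    for removed in combinations(range(n), n_removed):
        keep = [i for i in range(n) if i not in removed]
        f, rms = svd_predict(spectra[:, keep], fractions[:, keep], spectrum, rank)
        if not (sum_range[0] <= f.sum() <= sum_range[1]):
            continue
        if best is None or rms < best[1]:
            best = (f, rms, removed)
    return best


# Synthetic database: 10 "proteins", 5 structural types, 165 spectral points.
rng = np.random.default_rng(0)
wavenumbers = np.linspace(1797.0, 1481.0, 165)
components = rng.random((165, 5))                  # pure-type spectra
fractions = rng.dirichlet(np.ones(5), size=10).T   # shape (5, 10)
spectra = components @ fractions                   # shape (165, 10)
true_fractions = np.array([0.4, 0.2, 0.1, 0.1, 0.2])
unknown = components @ true_fractions
predicted, rms, removed = variable_selection(spectra, fractions, unknown)
print(np.round(predicted, 3), removed)
```

With noise-free synthetic data every reduced database reproduces the input fractions; with real spectra the combinations differ, which is what the selection on RMS and on the sum of fractions is meant to exploit.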

The PDB files used for assessing the secondary structure contents in the ATR database were as follows: alcohol dehydrogenase, 1YE3; carbonic anhydrase, 1CA2; catalase, 7CAT; citrate synthase, 1CTS; α-chymotrypsinogen, 2CGA; concanavalin A, 1NLS; cytochrome c, 1AKK; enolase, 1EBH; glyceraldehyde-3-phosphate dehydrogenase, 1J0X; hemoglobin, 1BBB; hexokinase, 2YHX; soybean trypsin inhibitor, 1AVU; bovine pancreatic trypsin inhibitor, 4PTI; α-lactalbumin, 1HFZ; lactate dehydrogenase, 6LDH; β-lactoglobulin, 1BEB; lysozyme, 6LYZ; myoglobin, 1WLA; ovalbumin, 1OVA; pepsin, 4PEP; pepsinogen, 3PSG; peroxidase, 1HCH; phosphoglycerate kinase, 3PGK; ribonuclease A, 3RN3; ribonuclease S, 2RNS; serum albumin, 1AO6; superoxide dismutase, 2SOD; thaumatin, 1RQW; and triose phosphate isomerase, 1YPI.

Electronic supplemental material

The information provided in the electronic supplementary material consists of three files, all in Microsoft Word. One, named INSTRUCTIONS FOR USE.doc, is a complete description of the way in which (1) the data corresponding to the ATR/FTIR spectra corrected for protein adsorption and water-related baseline distortion should be formatted for use in the VARSELEC program, and (2) the various files introduced in the VARSELEC program have to be adapted for the analysis of FTIR spectra. A second file, S.DTA, contains the spectra of the 29 proteins of the ATR/FTIR database formatted for use in the VARSELEC program. The last file, named SS.DTA, contains the secondary structure content of the 29 proteins of the ATR/FTIR database, formatted for use in the VARSELEC program.

Table 1. Secondary structure prediction for three test proteins

Protein       H      E      S      T      O      Sum    RMS
MBP           0.36   0.16   0.08   0.17   0.24   1.01   0.0149
PDB 1LLS      0.43   0.16   0.09   0.13   0.19   1.00
FKPA          0.41   0.20   0.12   0.08   0.25   1.06   0.0119
PDB 1Q6U      0.35   0.155  0.04   0.15   0.305  1.00
TRX           0.30   0.18   0.11   0.08   0.31   0.98   0.0222
PDB 2TRX      0.26   0.255  0.05   0.24   0.195  1.00

The transmission spectra of the E. coli maltose binding protein, FKPA cis-trans isomerase, and oxidized thioredoxin were recorded both at high (16.5, 10.0, and 10.4 mg/mL, respectively) and low (1.65, 1.0, and 1.04 mg/mL, respectively) protein concentrations. The absorption spectrum of each protein was calculated taking the transmission at low protein concentration as the 100% reference. The spectra were corrected for water distortion, normalized in amplitude, and analyzed with the adapted VARSELEC method selecting the best set of 26 spectra among the 29 present in the FTIR/ATR database (3654 combinations). The results (upper line for each protein) are shown as the fraction of residues in each secondary structure type and compared to the values assessed from the PDB (lower line for each protein). The names of the PDB files used are indicated in the table under the protein name. The assessed values indicated for thioredoxin are the averages of those reported for the two chains present in the unit cell, for which the helical contents (particularly α-helices and 3–10 helices) are strikingly different. The secondary structure type symbols are H for α-helices, E for extended β-strands, S for bends, T for hydrogen bonded turns, and O for others. “Sum” indicates the sum of the relative contents of all structural types. RMS is the root-mean-square deviation calculated for the best fit.

Table 2. Secondary structure prediction for each protein of the database

Protein       H      E      S      T      O      Sum    RMS
Alc. DH       0.27   0.26   0.08   0.14   0.26   1.01   0.0110
PDB 1YE3      0.23   0.22   0.10   0.14   0.31   1.00
Carban        0.17   0.31   0.11   0.14   0.29   1.01   0.0139
PDB 1CA2      0.08   0.29   0.12   0.13   0.38   1.00
Catalase      0.23   0.25   0.15   0.13   0.31   1.06   0.0189
PDB 7CAT      0.26   0.14   0.12   0.155  0.325  1.00
Cit. Synth    0.57   0.04   0.05   0.18   0.16   0.99   0.0098
PDB 1CTS      0.54   0.015  0.065  0.14   0.24   1.00
Conca. A      −0.14  0.45   0.08   0.12   0.29   0.94   0.0259
PDB 1NLS      0      0.465  0.10   0.125  0.31   1.00
Chtgen        0.20   0.29   0.08   0.13   0.27   0.98   0.0216
PDB 2CGA      0.07   0.29   0.10   0.175  0.365  1.00
Cyto. c       0.41   0.03   0.14   0.13   0.24   0.96   0.0222
PDB 1AKK      0.34   0      0.17   0.14   0.35   1.00
Enolase       0.40   0.15   0.12   0.12   0.26   1.04   0.0074
PDB 1EBH      0.38   0.16   0.07   0.13   0.26   1.00
Hemoglobin    0.61   −0.01  0.07   0.14   0.17   0.99   0.0092
PDB 1BBB      0.67   0      0.06   0.11   0.16   1.00
Lysozyme      0.49   0.07   0.08   0.15   0.20   0.99   0.0137
PDB 6LYZ      0.30   0.06   0.07   0.28   0.29   1.00
Hexokinase    0.42   0.10   0.10   0.12   0.26   1.00   0.0122
PDB 2YHX      0.39   0.15   0.135  0.09   0.235  1.00
Myoglobin     0.54   0.07   0.07   0.11   0.22   1.01   0.0101
PDB 1WLA      0.74   0      0.025  0.13   0.105  1.00
Ovalbumin     0.24   0.25   0.12   0.16   0.31   1.08   0.0102
PDB 1OVA      0.295  0.29   0.05   0.15   0.215  1.00
Pepsin        0.16   0.34   0.11   0.13   0.27   1.01   0.0246
PDB 4PEP      0.11   0.41   0.08   0.14   0.26   1.00
Pepsinogen    0.20   0.33   0.09   0.12   0.27   1.01   0.0216
PDB 3PSG      0.12   0.36   0.06   0.13   0.33   1.00
RNase A       0.10   0.25   0.13   0.17   0.35   0.99   0.0161
PDB 3RN3      0.18   0.33   0.095  0.19   0.205  1.00
RNase S       0.07   0.30   0.10   0.23   0.32   1.01   0.0215
PDB 2RNS      0.18   0.31   0.09   0.105  0.315  1.00
Sup. dismut   0.15   0.29   0.13   0.13   0.34   1.05   0.0207
PDB 2SOD      0      0.355  0.30   0.05   0.295  1.00
Thaumatin     −0.04  0.36   0.16   0.11   0.38   1.00   0.0075
PDB 1RQW      0.105  0.31   0.17   0.145  0.27   1.00
Triose-P. I   0.32   0.16   0.11   0.11   0.30   0.99   0.0092
PDB 1YPI      0.38   0.16   0.085  0.11   0.265  1.00

For each protein that is represented in both the ATR/FTIR database and the PROTA database, the spectrum was removed from the database and the secondary structure of the corresponding protein was predicted by means of the adapted VARSELEC method, selecting the best set of 25 spectra among the 28 remaining in the database (3276 combinations). The results (upper line) are shown as the fraction of residues in each secondary structure type and compared to the values assessed from the PDB (lower line). For each protein, the name of the PDB file used is indicated in the table under the protein abbreviation. The abbreviations are Alc. DH, alcohol dehydrogenase; Carban, carbonic anhydrase; Cit. Synth, citrate synthase; Conca. A, concanavalin A; Chtgen, chymotrypsinogen; Cyto. c, cytochrome c; Sup. dismut, superoxide dismutase; Triose-P. I, triose phosphate isomerase. The symbols for secondary structure types are as in Table 1.

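The numbers of combinations quoted in the notes to Tables 1 and 2 follow directly from the number of ways of choosing which three spectra to leave out of the database. A short check, using only the Python standard library (the variable names are ours):

```python
from math import comb

# Test proteins: best set of 26 spectra among the 29 of the ATR/FTIR database.
n_test = comb(29, 26)
# Database proteins: best set of 25 among the 28 left after removing one spectrum.
n_database = comb(28, 25)

print(n_test)      # 3654
print(n_database)  # 3276
```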
Figure 1.

Crude ATR/FTIR absorption spectra of hen lysozyme. The transmission spectra of 0.01 M sodium phosphate buffer pH = 7.0 and of lysozyme solutions at different concentrations in the same buffer were recorded in single beam mode. The corresponding absorption spectra were constructed using the buffer spectrum as the 100% reference. For each protein concentration, the absorbance (in arbitrary units) is shown as a function of the light frequency. The lysozyme concentrations were 1, 2, 4, 6, and 8 mg/mL, respectively, for the spectra in the order of increasing peak height.

Figure 2.

Peak values of the amide I and II bands as a function of lysozyme concentration. The values of the absorbances at the peaks of the amide I and II bands in the spectra of Figure 1 are shown as a function of the lysozyme concentration. Squares and circles correspond to the amide I and amide II bands, respectively. The straight lines were obtained by linear regression.

Figure 3.

ATR/FTIR absorption spectrum of crystal-adsorbed lysozyme. Lysozyme (20 μL) at 8 mg/mL was incubated for 10 min in the ATR well, carefully removed by aspiration under vacuum, and replaced with 20 μL of buffer. The transmission spectrum was recorded in single beam mode with 100 accumulations. The absorption spectrum shown was then constructed using the transmission spectrum of pure buffer as the 100% reference.

Figure 4.

Corrected ATR/FTIR absorption spectra of a lysozyme solution. The transmission spectra of buffer and of lysozyme solutions at 0.9, 2.5, 5, 7.5, and 10 mg/mL were successively recorded. A new recording was then made with the 0.9-mg/mL solution. The absorption spectra were constructed from the spectrum at each concentration using that at 0.9 mg/mL as the 100% transmission reference. (A) The spectrum constructed from the 10-mg/mL solution (corresponding to a corrected concentration of 9.1 mg/mL) is shown as a solid line. The spectrum distortion due to the protein refractive index increment was corrected as indicated in Materials and Methods. The corrected spectrum is shown as a dotted line. (B) The amide I (filled circles and solid line) and amide II (filled squares and dotted line) peak heights derived from the spectra corrected only for the adsorption are shown as a function of the corrected lysozyme concentration (i.e., the concentration at which the spectrum was recorded minus 0.9 mg/mL, the concentration of the solution used as the reference). The straight lines were obtained by linear regression.

Figure 5.

Spectral analysis of test proteins. The corrected ATR/FTIR absorption spectra of E. coli maltose binding protein (A), FKPA prolyl cis-trans isomerase (B), and thioredoxin (C) were constructed and analyzed using the adapted VARSELEC method, selecting the best set of 26 spectra among the 29 spectra of the complete database. For each protein, the open circles show the experimental data and the solid line corresponds to the theoretical spectrum reconstructed from the best fit.

Figure 6.

Compared robustness of the ATR and transmission cell FTIR-based predictions of β-strand contents. The β-strand content of each protein represented in both the ATR and PROTA FTIR spectra databases was predicted by removing its spectrum from each database and analyzing either the ATR or the transmission cell FTIR spectrum with the adapted VARSELEC program using the cognate reduced database. The β-strand content assessed from the PDB is plotted vs. the predicted content. The open triangles and dotted line correspond to the PROTA-based prediction, and the closed circles and solid line correspond to the ATR-based prediction. The two straight lines were obtained by linear regression. The crossed circles correspond to the three test proteins indicated by the arrows.
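As an illustration of the kind of regression shown in this figure, the ATR-based line can be approximated from the β-strand (E) values listed in Table 2. This is only a sketch: the fitting routine (an ordinary least-squares line from NumPy) is our assumption, the PROTA-based values are not tabulated here and are therefore omitted, and the three test proteins are not included.

```python
import numpy as np

# E column of Table 2, in table order (Alc. DH ... Triose-P. I).
predicted = np.array([0.26, 0.31, 0.25, 0.04, 0.45, 0.29, 0.03, 0.15, -0.01,
                      0.07, 0.10, 0.07, 0.25, 0.34, 0.33, 0.25, 0.30, 0.29,
                      0.36, 0.16])
# Corresponding values assessed from the PDB entries.
assessed = np.array([0.22, 0.29, 0.14, 0.015, 0.465, 0.29, 0.0, 0.16, 0.0,
                     0.06, 0.15, 0.0, 0.29, 0.41, 0.36, 0.33, 0.31, 0.355,
                     0.31, 0.16])

# PDB-assessed content (y) versus predicted content (x), as in the figure.
slope, intercept = np.polyfit(predicted, assessed, 1)
r = np.corrcoef(predicted, assessed)[0, 1]
print(f"slope = {slope:.2f}, intercept = {intercept:.3f}, r = {r:.2f}")
```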

Acknowledgements

This paper is dedicated to Robert L. Baldwin in recognition of his precious contribution, as Ph.D. advisor 40 years ago, in making one of us (M.E.G.) aware of the risk of protein adsorption to glass surfaces. This research was supported by grants from the Institut Pasteur, the CNRS (National Center for Scientific Research, URA 2185) and the French Ministry of Research and Technology (ACI Prions, SDV 03/1347).
