A theoretical and experimental proteome map of Pseudomonas aeruginosa PAO1

A proteome map of Pseudomonas aeruginosa PAO1 is presented, generated by a combination of two-dimensional gel electrophoresis and protein identification by mass spectrometry. In total, 1128 spots were visualized, and 181 protein spots were characterized, corresponding to 159 different protein entries. In particular, protein chaperones and enzymes important in energy conversion and amino acid biosynthesis were identified. Spot analysis always resulted in the identification of a single protein, suggesting sufficient spot resolution, although the same protein may be detected in two or more neighboring spots, possibly indicating posttranslational modifications. Comparison to the theoretical proteome revealed an underrepresentation of membrane proteins, though the identified proteins cover all predicted subcellular localizations and all functional classes. These data provide a basis for subsequent comparative studies of the biology and metabolism of P. aeruginosa, aimed at unraveling global regulatory networks.


Introduction
The pseudomonads comprise a group of Gram-negative bacteria with a high metabolic versatility allowing them to adapt to a broad range of environmental niches. Pseudomonas aeruginosa is an opportunistic pathogen responsible for severe life-threatening infections in immunocompromised patients. For example, in individuals with cystic fibrosis, chronic colonization of the lung mucosa by P. aeruginosa is a major cause of death (Govan and Deretic 1996; Lyczak et al. 2002; Ratjen and Doring 2003).
Pseudomonas aeruginosa possesses a strong inherent antibiotic resistance, partly due to extensive efflux systems and a highly impermeable membrane (Ahmad 2002). In addition, an increasing number of P. aeruginosa strains have developed an alarming level of acquired antibiotic resistance, caused by their large and adaptable genome, which, in combination with the development of impermeable biofilms, creates an even greater challenge in the battle against P. aeruginosa infections (Hancock and Speert 2000; Singh et al. 2000; Stewart and Costerton 2001; Drenkard 2003). Given its importance as a human pathogen, P. aeruginosa represents a useful model organism. Moreover, the availability of the completed 6.3-Mbp genome of P. aeruginosa PAO1 (Stover et al. 2000), revealing 5570 annotated open reading frames (ORFs) (PseudoCAP) (Winsor et al. 2005), offers the opportunity to perform extensive proteome analyses.
In the past, studies have focused on disrupting biofilms and identifying new intracellular targets to develop novel classes of antibiotics (Stewart and Costerton 2001). Proteomic studies provide more insight into gene function and will play a vital role in unraveling the basic biology of microorganisms. Several recent P. aeruginosa studies using two-dimensional gel electrophoresis (2-DE) aimed both at exploring the adaptation of the organism to nutrient and oxygen limitation (Hummerjohann et al. 1998; Quadroni et al. 1999; Guina et al. 2003; Heim et al. 2003; Wu et al. 2005b; Siqueira Reis et al. 2010) and at understanding virulence (Hanna et al. 2000; Termine and Michel 2009), biofilm formation (Yoon et al. 2002; Southey-Pillig et al. 2005; Nigaud et al. 2010), and quorum-sensing signals (Arevalo-Ferro et al. 2003).
Here, the cytoplasmic 2-D reference map of the P. aeruginosa PAO1 proteome is presented, complementing the previously mapped P. aeruginosa membrane proteome (Nouwens et al. 2000) and periplasmic proteome (Imperi et al. 2009). 2-DE provides the reproducibility required for creating a reliable reference map, in combination with MALDI-TOF, MALDI-TOF/TOF, and ESI-MS/MS for protein identification. The experimental and theoretical proteomes were compared using the data generated from the 181 identified protein spots. The proteome map presented here may serve as a reference for future studies, allowing comparative analyses for a variety of Pseudomonas strains under diverse conditions.

2-DE and image analysis
All 2-DE separations and image analyses were carried out using GE Healthcare devices and reagents. Isoelectric focusing (IEF) was performed using IPG strips (24-cm Immobiline DryStrips with linear pH gradient range 3-10 or 4-7). The strips were rehydrated overnight in a denaturing reswelling solution (7 M urea, 2 M thiourea, 2% w/v CHAPS, DeStreak Reagent, 0.5% IPG buffer, and a trace of bromophenol blue). The samples were applied by anodic cup-loading, and IEF was performed in the Ettan IPGphorII according to Görg et al. (2000). Following IEF, proteins were reduced and alkylated as described by Bae et al. (2003) using equilibration buffer I and II (6 M urea, 30% w/v glycerol, 2% w/v SDS in 50 mM Tris-HCl, pH 8.8) containing 1% DTT and 4% iodoacetamide (IAA), respectively. Subsequently, the second dimension (SDS-PAGE) was run in 1-mm thick vertical gels (15% polyacrylamide) using the Ettan DALTsix (GE Healthcare, UK). Protein spots were visualized by colloidal CBB G-250 staining (Neuhoff et al. 1988) or MS-compatible silver nitrate staining (Shevchenko et al. 1996). Image acquisition was performed using a calibrated flatbed ImageScanner, combined with LabScan software. 2-DE maps were analyzed and spot data generated using ImageMaster 2D Platinum software. For each biological sample, six replicate gels were made.

In-gel protein digestion
Protein digestion was performed as detailed by Shevchenko et al. (1996). In short, Coomassie blue spots were excised from the gels and destained. The proteins were reduced and alkylated, after which the gel slices were sequentially hydrated and dried. Trypsin (Promega, Madison, WI) was added, followed by overnight digestion. Finally, peptides were extracted from the gel by sonication.

Mass spectrometry
Prior to mass spectrometric analysis, peptide samples were dried in a vacuum centrifuge and desalted using ZipTip C18 pipette tips (Millipore, Bedford, MA). MALDI-TOF analyses were performed on a Reflex IV (Bruker Daltonik GmbH, Bremen, Germany) operating in reflectron mode. The matrix, consisting of saturated α-cyano-4-hydroxycinnamic acid in acetone, was cocrystallized with the peptide sample by the dried droplet technique. ESI-MS/MS was performed on an LCQ Classic (ThermoFinnigan, San Jose) equipped with a nano-LC column switching system as described previously (Dumont et al. 2004).

Protein identification
Proteins were identified by searching the NCBI database using Sequest (ThermoFinnigan) and Mascot (Matrix Science, MA). One missed cleavage was allowed and a mass tolerance of 0.3 Da was used. Possible modifications such as carbamidomethylation of cysteine and oxidation of methionine were included. For unambiguous peptide-mass fingerprint identification, more than five peptides must be matched and the sequence coverage must be greater than 15%. Agreement between theoretical and experimental pI and M r was also taken into account.

Figure 1 (panels B-D). (B) The predicted charge distribution is bimodal with a minor third peak. The pI of cytoplasmic proteins (red) is typically lower than the pI of membrane proteins. (C) The pattern on a virtual two-dimensional gel electrophoresis (2-DE) gel has a butterfly appearance. To obtain a general overview, IPG strips with pH 3-10 will be used. (D) In the virtual 2D-gel, the shift of cytoplasmic proteins (red) toward lower pI, and membrane proteins (blue) toward higher pI is observed again.
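The acceptance criteria for peptide-mass fingerprint identification stated in this section can be written as a simple filter. The following is a minimal sketch only: the `PmfHit` record and the function names are illustrative assumptions and do not correspond to the Sequest or Mascot interfaces; only the thresholds (more than five peptides, coverage above 15%, 0.3 Da tolerance) come from the text.

```python
from dataclasses import dataclass


@dataclass
class PmfHit:
    """One candidate protein from a fingerprint search (hypothetical record)."""
    protein_id: str
    matched_peptides: int
    sequence_coverage: float  # percent of the sequence covered by matched peptides


MIN_PEPTIDES_EXCLUSIVE = 5      # "more than five peptides"
MIN_COVERAGE_EXCLUSIVE = 15.0   # "greater than 15%"
MASS_TOLERANCE_DA = 0.3         # mass tolerance used in the search


def within_tolerance(observed_mass: float, theoretical_mass: float,
                     tol_da: float = MASS_TOLERANCE_DA) -> bool:
    """True if an observed peptide mass lies within the tolerance window."""
    return abs(observed_mass - theoretical_mass) <= tol_da


def is_unambiguous(hit: PmfHit) -> bool:
    """Apply both acceptance criteria; each threshold is a strict inequality."""
    return (hit.matched_peptides > MIN_PEPTIDES_EXCLUSIVE
            and hit.sequence_coverage > MIN_COVERAGE_EXCLUSIVE)
```

A hit with exactly five matched peptides, or exactly 15% coverage, is rejected under this reading of the criteria. Agreement of pI and M r with the gel position was an additional, separate check.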

In silico analysis
All calculations were based on the 5570 annotated protein sequences included in the database of P. aeruginosa PAO1 (PseudoCAP) (Winsor et al. 2005). This database also provided information about predicted cellular localization and cluster of orthologous groups (COG) functional categories. The physical parameters of the proteins were computed with the ProtParam Tool at the ExPASy server (Gasteiger et al. 2005), calculating the theoretical pI as described by Bjellqvist et al. (1993) and the grand average of hydrophobicity (GRAVY) according to Kyte and Doolittle (1982). The codon adaptation index (CAI) of identified proteins was calculated with the CAI calculator (Wu et al. 2005a) using the equation of Sharp and Li (1987) and a codon usage template of highly expressed genes (Grocock and Sharp 2002). Signal peptides were predicted using SignalP 3.0 (Bendtsen et al. 2004). Statistical tests on these parameters were performed with the online QuickCalcs tools.
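For orientation, two of the parameters named above can be sketched in a few lines. This is not the ProtParam or CAI calculator implementation. GRAVY is taken as the mean Kyte-Doolittle hydropathy over all residues, and CAI as the geometric mean of the relative adaptiveness values of the codons in a coding sequence, following the definition of Sharp and Li (1987). The function names and the toy weight table in the test are assumptions for illustration; a real weight table would be derived from a reference set of highly expressed genes.

```python
import math

# Kyte-Doolittle hydropathy values for the 20 standard amino acids.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def gravy(sequence: str) -> float:
    """Grand average of hydropathy: sum of residue values / sequence length."""
    if not sequence:
        raise ValueError("empty sequence")
    return sum(KYTE_DOOLITTLE[aa] for aa in sequence.upper()) / len(sequence)


def cai(cds: str, weights: dict[str, float]) -> float:
    """Geometric mean of codon weights (relative adaptiveness, 0 < w <= 1).

    Codons absent from `weights` (for example Met, Trp, and stop codons,
    which are conventionally excluded) are skipped.
    """
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    logs = [math.log(weights[c]) for c in codons if c in weights]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))
```

A positive GRAVY value indicates a hydrophobic protein and a negative value a hydrophilic one; a CAI close to 1 means that the gene mostly uses the codons preferred in the reference set.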

Results and Discussion
Theoretical P. aeruginosa PAO1 proteome
The relatively large genome of P. aeruginosa (6.3 Mbp) contributes to its high versatility and environmental adaptability. With 5570 annotated genes, P. aeruginosa PAO1 is capable of expressing a proteome comparable in size and complexity to lower eukaryotes such as Saccharomyces cerevisiae (Stover et al. 2000).
Because physical parameters can be predicted from protein sequences using web-based tools, exploring the properties of the theoretical P. aeruginosa proteome allows appropriate conditions for 2-DE to be chosen. Although these properties may be altered by posttranslational modification for a minority of the proteins, the isoelectric point (pI) and relative molecular mass (M r ) can typically be calculated accurately.

Relative molecular mass (M r )
The 5570 annotated P. aeruginosa proteins show a unimodal mass distribution with the majority of protein masses between 10 and 50 kDa, with a long tail up to 120 kDa (Fig. 1A). This proteome contains only 239 small (<10 kDa) and 126 large (>100 kDa) proteins, while the remaining 93% have an M r suitable for regular 2-DE. Hence, no adaptation of standard 2-DE methods was needed.

Isoelectric point (pI)
The predicted isoelectric points for the 5570 P. aeruginosa proteins were calculated and showed a bimodal charge distribution with peaks around pI 5.5 and 9.5. An additional minor peak is visible around pI 7.8, while almost no proteins have a pI near 7.5 (Fig. 1B). The majority of P. aeruginosa proteins (64%) have pI-values between 4 and 7, while only 5% fall outside the range of commercial IEF strips (3-11). Taking into account the predicted protein sublocalization, a shift toward the acidic region for cytoplasmic proteins (mean pI = 6.36) and toward the alkaline region for predicted inner membrane proteins (mean pI = 8.11) is observed (Fig. 1B). The shift is universal among all three domains of life (Schwartz et al. 2001). The significantly higher (P < 0.0001) pI-value of membrane proteins is consistent with the fact that most biomembranes have negatively charged surfaces (Schwartz et al. 2001).

Theoretical 2-DE gel
Virtual 2-DE gels are generated by plotting the theoretical M r against the theoretical pI. A map was made using a linear scale on the x-axis to imitate protein mobility during isoelectric focusing and a logarithmic scale on the y-axis to represent protein migration during SDS-PAGE. The pI range was set from 3 to 11 and the M r range from 3 to 300 kDa (Fig. 1C). The theoretical proteome plot reveals a "butterfly distribution," the left wing consisting of acidic proteins, the right wing of alkaline proteins. The body part represents the minor peak near pH 8. This pattern was previously reported for Escherichia coli (Link et al. 1997b; VanBogelen et al. 1997) as well as for other bacteria (Link et al. 1997a; Urquhart et al. 1997; Drews et al. 2004) and appears to be similar for proteomes in all three domains of life (Archaea, Eubacteria, and Eukarya) (Knight et al. 2004; Weiller et al. 2004). The near absence of proteins with a pI equal to the cytoplasmic pH (between 7.2 and 7.4) (Urquhart et al. 1998) may reflect avoidance of the intracellular pH, at which such proteins would be difficult to maintain in solution. Additionally, Schwartz et al. (2001) state that the pI bimodality may be the result of the need for different pI-values depending on subcellular localization, since membrane proteins have a significantly higher pI-value than cytoplasmic proteins. This hypothesis is supported by the fact that eukaryotes show a trimodal pI distribution, with the third peak mainly consisting of nuclear proteins.
Considering subcellular localization of P. aeruginosa proteins, a shift in the virtual 2-DE gel toward the left and right side, for cytoplasmic and inner membrane proteins, respectively, is again observed (Fig. 1D). The membrane proteome of P. aeruginosa was mapped previously by Nouwens et al. (2000); this study mainly focuses on cytoplasmic proteins. The resolving power is enhanced by focusing on the pI range 4-7, within which the pI of the major part of cytoplasmic proteins falls (77.5%).
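The axis mapping used for the virtual gel can be sketched as follows, assuming normalized gel coordinates between 0 and 1. The function name and the coordinate convention (x from the acidic to the alkaline end, y as migration distance from the top of the gel) are illustrative assumptions; the pI and M r ranges are those given above.

```python
import math

PI_MIN, PI_MAX = 3.0, 11.0            # pI range of the virtual gel
MR_MIN_KDA, MR_MAX_KDA = 3.0, 300.0   # M r range of the virtual gel


def virtual_gel_position(pi: float, mr_kda: float):
    """Return (x, y) in [0, 1] for a protein, or None if it falls off the gel.

    x is linear in pI, imitating isoelectric focusing.
    y is linear in log10(M r), imitating SDS-PAGE: large proteins stay
    near the top (y close to 0), small proteins run to the bottom (y close to 1).
    """
    if not (PI_MIN <= pi <= PI_MAX and MR_MIN_KDA <= mr_kda <= MR_MAX_KDA):
        return None
    x = (pi - PI_MIN) / (PI_MAX - PI_MIN)
    log_span = math.log10(MR_MAX_KDA) - math.log10(MR_MIN_KDA)
    y = (math.log10(MR_MAX_KDA) - math.log10(mr_kda)) / log_span
    return x, y
```

Under these assumptions, a 30-kDa protein with pI 7 lands at the center of the plot, because 30 kDa is the logarithmic midpoint of the 3 to 300 kDa range.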

2-DE map of the P. aeruginosa proteome
Optimal results for protein extraction were obtained using protease inhibitors, EDTA, and DNase I. A total of 300-400 μg of proteins extracted from P. aeruginosa, exponentially growing on rich medium, were applied by anodic cup loading. To obtain a general overview, initial protein separations were performed on IPG strips with a pH range 3-10. As predicted, most visible protein spots were concentrated in the acidic region of the gel (95%). For higher resolution of cytoplasmic proteins, a switch to strips with a pH range of 4-7 was made. The estimated number of 2-DE detectable proteins with a pI between 4 and 7, an M r between 10 and 100 kDa, and low hydrophobicity (GRAVY < 0.400) is 3319. On the silver-stained gels, 1128 spots were detected using ImageMaster software (Fig. 2), accounting for approximately 34% of the theoretically detectable proteome. Under the growth conditions used, expression of the total proteome is not expected. In a general comparison with a similar expression analysis in E. coli (Richmond et al. 1999), the relative number of expressed housekeeping genes compared to the total number of gene products is consistent.
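The estimate of the 2-DE detectable proteome amounts to a three-way filter on predicted pI, M r, and GRAVY, followed by a ratio of detected spots to detectable proteins. A minimal sketch is given below; the `Protein` record and the function names are illustrative assumptions, and the treatment of the range boundaries as inclusive is a guess, since the text does not specify it.

```python
from typing import Iterable, NamedTuple


class Protein(NamedTuple):
    """Predicted physical parameters of one annotated protein (hypothetical record)."""
    locus: str
    pi: float
    mr_kda: float
    gravy: float


def is_detectable(p: Protein,
                  pi_range=(4.0, 7.0),
                  mr_range_kda=(10.0, 100.0),
                  max_gravy=0.400) -> bool:
    """True if the protein falls in the pH 4-7 window used for the reference gel."""
    return (pi_range[0] <= p.pi <= pi_range[1]
            and mr_range_kda[0] <= p.mr_kda <= mr_range_kda[1]
            and p.gravy < max_gravy)


def count_detectable(proteome: Iterable[Protein]) -> int:
    """Number of proteins passing the filter."""
    return sum(1 for p in proteome if is_detectable(p))


def detected_fraction(n_spots: int, n_detectable: int) -> float:
    """Ratio of detected spots to theoretically detectable proteins."""
    return n_spots / n_detectable


# Values reported in the text: 1128 spots, 3319 detectable proteins.
fraction = detected_fraction(1128, 3319)
```

Because one protein can appear in several spots, this ratio is only an upper estimate of the fraction of the detectable proteome actually visualized.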

Protein identification
In the reference gel with pH range 4-7, a random subset of spots distributed over the two-dimensional map was selected. One hundred and eighty-one spots were unambiguously identified by MS, originating from 159 different protein species (Table 1; Fig. S1). Spot analysis always resulted in the identification of a single protein, although the same protein may be detected in two or more neighboring spots (as discussed below).

Comparison between theoretical and experimental M r and pI
The predicted and experimental pI and mass of the identified proteins are shown in Table 1. The high correlation between predicted and experimental values for both pI and M r is displayed in the scatter plots (Fig. 3).

pI-values
Ninety-three percent of all identified proteins have an experimental pI approximating the predicted value. Thirteen proteins have an experimental pI that is at least 0.50 units lower than the predicted pI (spot numbers marked with an * in Table 1, pI-values underlined). The most common modification influencing the isoelectric point of proteins in prokaryotes is single or multiple phosphorylation (Deutscher and Saier 2005), which lowers the pI due to the negative charge of the phosphate group. Two-component sensor kinases, such as PA4886, which shows a strong pI shift (-1.91), are known to autophosphorylate (Rodrigue et al. 2000). For some of the proteins with a lowered pI-value (PA1084, PA2800, PA5076, and PA0291), a signal peptide was predicted by SignalP. After excluding these amino acids from the sequences, the theoretical masses and charges of these proteins are close to the experimental values, which indeed suggests signal peptide cleavage. The exact nature of the modification can be deciphered by dedicated mass spectrometric analysis, which was beyond the aim of this study.

M r -values
Ninety-seven percent of the identified proteins have an experimental M r matching the predicted value. Modifications influencing protein mass include isoform splicing and the addition of heavy groups, for example, ADP-ribosylation. The coverage of identified peptide fragments was well spread over the complete protein sequence. Four proteins are at least 5 kDa smaller than predicted, probably owing to the removal of a signal peptide, while 13 are larger than predicted, presumably carrying unknown modifications (spot numbers marked with an * in Table 1, M r -values underlined).
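The two deviation criteria used in this comparison (an experimental pI at least 0.50 units below the predicted value, and an experimental M r at least 5 kDa below the predicted value) can be sketched as follows. The function names are illustrative assumptions. The SucD figures in the usage lines are taken from the text, using the lowest reported experimental pI of 5.42 against a predicted pI of 5.79.

```python
def pi_shift(experimental_pi: float, predicted_pi: float) -> float:
    """Signed pI shift; negative values mean the spot is more acidic than predicted."""
    return experimental_pi - predicted_pi


def has_acidic_shift(experimental_pi: float, predicted_pi: float,
                     threshold: float = 0.50) -> bool:
    """True if the experimental pI is at least `threshold` units below the prediction."""
    return predicted_pi - experimental_pi >= threshold


def is_smaller_than_predicted(experimental_kda: float, predicted_kda: float,
                              threshold_kda: float = 5.0) -> bool:
    """True if the experimental M r is at least `threshold_kda` below the prediction."""
    return predicted_kda - experimental_kda >= threshold_kda


# SucD: the shift is visible on the gel but stays below the 0.50-unit criterion.
sucd_shift = round(pi_shift(5.42, 5.79), 2)
sucd_flagged = has_acidic_shift(5.42, 5.79)
```

A protein flagged by both functions is a candidate for signal peptide cleavage, since removing the signal sequence changes both mass and charge; a pI shift without a mass shift is more consistent with a small charged modification such as phosphorylation.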

Protein isoforms
As many as 16 proteins, especially high-abundance proteins, appear as multiple spots on the gel (spot numbers bold in Table 1). Half of these proteins show only a pI shift; the other half show a shift in both charge and mass. These spots may be artifacts caused by the high abundance or may be the result of actual posttranslational modification. Little is known, however, about the full extent of protein modification and isoforms in bacteria. SucD, for example, was found in four separate spots (41-44) (Fig. 2), with a pI range of 5.42-5.72, while the predicted pI is 5.79 (Table 1). Crystal structures have revealed a phosphorylation of SucD in E. coli (Wolodko et al. 1994), possibly explaining the lowered pI-value of the highly similar SucD in P. aeruginosa.
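Finding proteins that occur in more than one spot is a grouping operation on the list of spot identifications. The sketch below assumes the identifications are available as (spot number, protein) pairs; this data layout and the function names are illustrative. The example pairs use the SucD spots 41-44 and spot 106 (PA4053) mentioned in the text.

```python
from collections import defaultdict


def group_spots_by_protein(assignments):
    """Map each protein to the list of spot numbers in which it was identified."""
    spots = defaultdict(list)
    for spot, protein in assignments:
        spots[protein].append(spot)
    return dict(spots)


def multi_spot_proteins(assignments):
    """Keep only the proteins that were identified in two or more spots."""
    grouped = group_spots_by_protein(assignments)
    return {protein: s for protein, s in grouped.items() if len(s) > 1}


example = [(41, "SucD"), (42, "SucD"), (43, "SucD"), (44, "SucD"), (106, "PA4053")]
isoform_candidates = multi_spot_proteins(example)
```

Applied to the full identification table, the number of keys returned by `group_spots_by_protein` would correspond to the 159 protein entries and the number of input pairs to the 181 identified spots.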

Subcellular localization and GRAVY
All annotated P. aeruginosa PAO1 proteins were classified according to their predicted localization (PseudoCAP) (Fig. 4). This calculation shows that 41% of the proteome is localized in the cytoplasm and 19% is directed to the cytoplasmic membrane. A small fraction is transported to the periplasm (2%), the outer membrane (3%), or the extracellular environment (1%). The remaining one-third of the proteins cannot be localized based on their amino acid composition. This distribution of proteins at each localization is consistent across species, independent of proteome size (Gardy et al. 2005).
Among the 159 identified proteins, no extracellular proteins are found. This is not surprising since these are most likely discarded along with the growth medium during sample preparation. Outer membrane proteins and periplasmic proteins are present (12 and 16, respectively), but cytoplasmic membrane proteins are considerably underrepresented (6), consistent with the assumption that integral membrane proteins have low solubility near their isoelectric point and are thus difficult to detect under standard 2-DE conditions. The GRAVY value predicts the hydrophobicity of a protein: hydrophobic membrane proteins are believed to have a positive value. Therefore, GRAVY values ought to be linked to the subcellular localization. The calculation of the mean GRAVY values confirms this assumption for P. aeruginosa. The mean value of the total P. aeruginosa proteome is -0.075. Predicted inner membrane proteins have a significantly (P < 0.0001) higher GRAVY value (0.448) than predicted intracellular proteins (-0.193). Periplasmic and outer membrane proteins, on the other hand, typically have negative GRAVY values.
The identified proteins have a mean GRAVY value of -0.129, which is slightly lower than the total proteome value (P < 0.05). Among these proteins, only one has a GRAVY value above 0.400 (PA4053, spot 106). Therefore, the underrepresentation of cytoplasmic membrane proteins is assumed to be caused by their highly hydrophobic nature and by the chosen pI range.
Nevertheless, this is in agreement with the observation of Grocock and Sharp (2002), who pointed out that the CAI appears to be a poor statistic for organisms with a biased base composition, such as P. aeruginosa, which has a GC-content of 67%.

Functional classification
All bacterial proteomes present in the public databases, including P. aeruginosa, were classified in COG protein categories, representing major biological cell functions (Tatusov et al. 1997). The protein distribution seems to be fairly similar for all bacteria, and no COG category appears to be overrepresented in the large P. aeruginosa proteome (http://www.ncbi.nlm.nih.gov/sutils/coxik.cgi?gi=163). The 159 identified proteins represent every existing COG category (Table 1). Even some low-abundance signaling proteins were identified, indicating a good representation of the total proteome on the 2-DE gel. Half of the identified proteins are important for metabolism, particularly energy conversion and amino acid metabolism. One-quarter function in cellular processes, for example, protein turnover or cell envelope biogenesis. Other proteins play a role in translation or are poorly characterized.
The majority of identified proteins, which included large spots, function in carbohydrate metabolism and energy production. These represent enzymes from major biochemical pathways such as oxidative phosphorylation (7), reductive carboxylate pathway (4), pentose phosphate pathway (4), carbon fixation (6), citrate cycle (6), and glycolysis and gluconeogenesis (6). This high representation suggests a strong expression of these key enzymes. Other major identified proteins on the 2-DE gel correspond to chaperones (GrpE, GroEL, GroES, trigger factor, and DnaK) responsible for proper folding of newly formed proteins. Protein chaperones and energy-conversion enzymes also appear as intense spots on other bacterial 2-DE maps (Wolodko et al. 1994; Rodrigue et al. 2000; Gardy et al. 2005).

Hypothetical proteins
Apart from the classified proteins, 19 spots correspond to proteins marked as hypothetical in the Pseudomonas database, 12 of which so far lacked experimental confirmation (PA0446, PA0664, PA0976, PA1597, PA1677, PA1837, PA2806, PA3302, PA3481, PA3801, PA4458, and PA5339). Among those 19 proteins, 12 are conserved in other organisms. Their substantial expression suggests that they have biological roles in P. aeruginosa, which have so far remained elusive. Their presence on a 2-DE gel opens perspectives for comparative studies.

Conclusions
We report a proteome analysis of P. aeruginosa PAO1, a species representing many strains of either clinical or environmental importance. The theoretical and experimental proteomes were compared by generating a 2-D reference map. On this map, which focuses on cytoplasmic proteins, 181 spots were identified as corresponding to 159 different protein entries. Despite the low number of hydrophobic proteins, these results show that the spots on the 2-DE map form a satisfactory and representative subset of the P. aeruginosa proteome; proteins from all predicted subcellular localizations and all functional categories are detected and identified. Moreover, 19 proteins, so far classified as hypothetical, are now experimentally confirmed. The data provide a reference for subsequent comparative studies of the biology and metabolism of P. aeruginosa, aimed at unraveling global regulatory networks.