Glycosylphosphatidylinositol-modified (GPI) proteins are widely found in lower and higher eukaryotic organisms (Eisenhaber et al., 2001). In fungi, GPI proteins are either covalently incorporated into the cell wall network or remain attached to the plasma membrane. Various functions have been suggested for them: they may be involved in cell wall biosynthesis and remodelling, they may determine surface hydrophobicity and antigenicity, and they are thought to have a role in adhesion and virulence (Hoyer, 2001; Klis et al., 2001; Sundstrom, 2002). The predicted amino acid sequences of GPI proteins conform to a general pattern. At their N-termini, they have a hydrophobic signal sequence that directs the protein to the ER. At their C-termini, they have a second hydrophobic domain, which is cleaved off and replaced with a GPI anchor, a preformed glycolipid in the membrane of the endoplasmic reticulum (Orlean, 1997).
Two independent studies aimed at genome-wide identification of the GPI proteins of Saccharomyces cerevisiae have shown that sequence motifs can be used successfully to identify most GPI proteins (Caro et al., 1997; Hamada et al., 1998a). Hamada and co-workers used three criteria to select GPI proteins: (a) the presence of a hydrophobic tail; (b) the presence of an N-terminal signal peptide for secretion; and (c) the presence of a serine/threonine (S/T)-rich region. Although often found in GPI proteins, an S/T-rich region has not been shown to be an absolute prerequisite for attachment of a GPI anchor, and Caro et al. (1997) therefore did not use this feature as a selection criterion. Their first selection criterion was the presence of an N-terminal signal peptide for secretion; final identification of GPI proteins was based on a more specific definition of the GPI-attachment site (ω site) and its downstream sequences in the C-terminal region, as described by Udenfriend and Kodukula (1995) and Nuoffer et al. (1993). In detail, the ω site could be an Asn, Ser, Gly, Ala, Asp or Cys residue followed by two amino acids with relatively short side chains, the ω + 2 position being the more critical (Figure 1). The ω region is followed by a less well-defined spacer region of about 10 amino acids and then by a hydrophobic tail. The length of the C-terminal signal peptide normally varies between 15 and 30 residues (Udenfriend and Kodukula, 1995), but, as shown for S. cerevisiae Gas1, signal peptides of 31 residues do occur (Nuoffer et al., 1991). In the set of S. cerevisiae GPI proteins so obtained, Asn or Gly predominated at the GPI-attachment site (Caro et al., 1997).
The function of the hydrophobic tail is to retain the protein in the membrane until GPI modification. The length and hydrophobic character of the tail, rather than its specific sequence, therefore seem to be what matters for GPI anchoring (Ikezawa, 2002). Consistent with this, Coyne et al. (1993) showed GPI modification of a truncated protein carrying a synthetic tail of at least 11 Leu residues, whereas introduction of a single charged residue in the middle of the hydrophobic tail blocked GPI anchoring (Nuoffer et al., 1991).
What determines the final destination of S. cerevisiae GPI proteins, cell wall or plasma membrane, is not fully understood. Basic residues in the region immediately upstream of the GPI-attachment site (ω− region) are favoured in proteins that are predominantly localized in the plasma membrane (Vossen et al., 1997). Conversely, the presence of V, I or L four or five residues upstream of the ω site (ω-4, ω-5) and of Y or N at ω-2 has been shown to act as a positive signal for cell wall localization (Hamada et al., 1998b, 1999). Thus, GPI protein localization seems to be determined at least partly by the amino acids in the ω− region.
In this paper, we aimed at developing an algorithm for the identification of fungal GPI proteins using S. cerevisiae and Candida albicans proteins, for which GPI anchorage is known or likely, as positive controls. We have validated the algorithm on the S. cerevisiae genome and tested the predictive power of our algorithm on the human pathogenic yeast C. albicans, the fission yeast Schizosaccharomyces pombe and the filamentous ascomycete Neurospora crassa.
Materials and methods
The non-redundant open reading frames (ORFs) from the C. albicans genome were retrieved from CandidaDB (http://genolist.pasteur.fr/CandidaDB/). This genome database was created by the EU-funded consortium ‘Galar Fungail’ by performing independent annotation of assembly 19 sequence data obtained from the Stanford Genome Technology Center (http://www-sequence.stanford.edu/group/candida). The S. cerevisiae genome sequence was retrieved from the Saccharomyces Genome Database (http://genome-www.stanford.edu/Saccharomyces). For one S. cerevisiae ORF, Ecm33/Ybr078w, we modified the C-terminal region of the peptide sequence because it is now known that this ORF encodes a protein of 429 instead of 468 amino acids (Hamada et al., 1999). Annotated genome sequences of Sz. pombe were obtained from the Sanger Institute (Hinxton, UK) at http://www.sanger.ac.uk/Projects/S_pombe and the annotated N. crassa genome sequence (release 3: 2.12.2002) was downloaded from the Neurospora crassa Genome Database of the Whitehead Institute/MIT Center for Genome Research (http://www-genome.wi.mit.edu).
In silico analysis
Searches for proteins with a potential GPI-anchor addition signal at their C-termini, as defined by our algorithm, were performed using the program FUZZPRO from the EMBOSS software package at http://www.hgmp.mrc.ac.uk/Software/EMBOSS/. The selected ORFs were further screened for the presence of a signal sequence for import into the ER using SignalP version 2.0 at http://www.cbs.dtu.dk/services/SignalP-2.0/. SignalP version 2.0 uses two signal peptide prediction methods, one based on neural networks (SignalP-NN; Von Heijne, 1986) and one based on hidden Markov models (SignalP-HMM; Nielsen et al., 1997). The standard threshold value for signal peptides in both SignalP-NN (Smean) and SignalP-HMM (Sprob) is 0.5; to reduce the number of false positives, we used a more stringent cut-off value of 0.6 for both methods. Only proteins predicted to have a signal peptide by both methods were selected for further analysis. The presence of integral transmembrane (TM) domains was analysed with TMHMM version 2.0 (http://www.cbs.dtu.dk/services/TMHMM/) and PSORT II (http://psort.nibb.ac.jp). TMHMM predicts TM domains with a hidden Markov model based on the presence of hydrophobic regions of at least 18 residues (Krogh et al., 2001), and PSORT II uses the method described by Nakai and Kanehisa (1992) with a threshold value of 2.0. Proteins without any internal TM domain according to both PSORT II and TMHMM were listed as putative GPI proteins. PSORT II was also used for protein localization predictions. BLASTP searches were performed at NCBI (http://www.ncbi.nlm.nih.gov/BLAST). The results of our in silico analysis are freely accessible at http://www.pasteur.fr/recherche/unites/Galar_Fungail/.
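The order of the filters described above can be summarized in a short sketch. It assumes the outputs of FUZZPRO, SignalP, TMHMM and PSORT II have already been parsed into one record per ORF; the record type and its field names are hypothetical and do not reproduce any of these tools' real output formats. It also assumes a score equal to the 0.6 cut-off counts as a positive prediction. ORFs on which the two TM predictors disagree are flagged rather than decided, mirroring how cases such as Kre1 are reported in the Results.

```python
from dataclasses import dataclass

# Hypothetical per-ORF record; field names are illustrative only and do not
# reproduce the output formats of FUZZPRO, SignalP, TMHMM or PSORT II.
@dataclass
class OrfPredictions:
    orf_id: str
    c_terminal_motif: bool    # hit for the C-terminal GPI motif (FUZZPRO step)
    signalp_nn_smean: float   # SignalP-NN mean S score
    signalp_hmm_sprob: float  # SignalP-HMM signal peptide probability
    tmhmm_internal_tm: int    # internal TM domains predicted by TMHMM
    psort_internal_tm: int    # internal TM domains predicted by PSORT II

SIGNALP_CUTOFF = 0.6  # stricter than the standard 0.5 threshold of both methods

def classify(p: OrfPredictions) -> str:
    """Apply the filters in order: C-terminal motif, signal peptide, TM domains."""
    if not p.c_terminal_motif:
        return "rejected"
    # Both SignalP methods must predict a secretory signal peptide.
    if p.signalp_nn_smean < SIGNALP_CUTOFF or p.signalp_hmm_sprob < SIGNALP_CUTOFF:
        return "rejected"
    tm_votes = [p.tmhmm_internal_tm > 0, p.psort_internal_tm > 0]
    if not any(tm_votes):
        return "putative GPI"
    if all(tm_votes):
        return "rejected"
    return "ambiguous"  # the two TM predictors disagree; flag for manual review
```

For example, `classify(OrfPredictions("orf1", True, 0.8, 0.8, 1, 0))` returns `"ambiguous"` because only one predictor reports an internal TM domain.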
Developing a fungal-specific GPI algorithm
In order to develop an algorithm that would selectively identify GPI proteins of the pathogenic yeast C. albicans and other fungi, we analysed the C-terminal regions of the known and predicted S. cerevisiae GPI proteins identified by Caro et al. (1997), who used an algorithm (Figure 1A) based on the sequence requirements for GPI anchoring described by Coyne et al. (1993). To make the algorithm more selective for GPI proteins, we made some adjustments in accordance with Nuoffer et al. (1993) and Udenfriend and Kodukula (1995). In detail, the requirement for N or G at the GPI-attachment site (ω site) was made less stringent, whereas charged residues were no longer allowed in the 11 most C-terminal amino acids, the hydrophobic tail. We also found that none of the GPI proteins for which GPI anchorage is supported by biochemical data contained glutamine (Q) in their hydrophobic tail. With the new algorithm we then tried to identify C. albicans proteins that were known, or predicted on the basis of homology with S. cerevisiae proteins, to be GPI-anchored. Among these C. albicans proteins were Phr1, Phr2, Phr3, the Plb proteins, Kre1, Hwp1, Hyr1 and the Als proteins. Small adjustments were required to identify these C. albicans proteins with the algorithm: the uncharged, hydrophobic tail was shortened to 10 amino acids, the spacer region between the ω site and the hydrophobic tail was extended to a maximum of 19 amino acids, and L and F were allowed at position ω + 1. The length of the C-terminal signal peptide as defined by this algorithm can therefore vary from 16 to 31 residues (Figure 1B).
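The final motif, given in PROSITE-style syntax in the legend of Table 1, can be rewritten as an ordinary regular expression, which also makes the 16 to 31 residue range of the C-terminal signal peptide easy to verify. The sketch below assumes the standard PROSITE reading of the pattern (`X(4,19)` is any 4 to 19 residues, `(10)` is exactly 10 repeats, `>` anchors the match at the C-terminus); it is an illustration, not the FUZZPRO search that was actually run.

```python
import re

# Table 1 motif rewritten as a Python regex:
#   [NSGDAC]              omega site (GPI-attachment residue)
#   [GASVIETKDLF]         omega + 1
#   [GASV]                omega + 2 (small side chain, most critical)
#   .{4,19}               spacer of 4 to 19 residues of any type
#   [FILMVAGPSTCYWN]{10}$ 10 uncharged, non-Gln residues at the very C-terminus
GPI_MOTIF = re.compile(r"[NSGDAC][GASVIETKDLF][GASV].{4,19}[FILMVAGPSTCYWN]{10}$")

def has_gpi_motif(protein: str) -> bool:
    """True if the C-terminus of the protein matches the GPI motif."""
    return GPI_MOTIF.search(protein.upper().rstrip("*")) is not None

# The cleaved C-terminal peptide runs from omega + 1 to the end:
# 2 fixed positions + spacer (4..19) + tail (10) = 16..31 residues.
MIN_SIGNAL_LENGTH = 2 + 4 + 10
MAX_SIGNAL_LENGTH = 2 + 19 + 10
```

A single charged residue or glutamine anywhere in the last 10 positions is enough to abolish a match, which is why, for instance, a toy sequence ending in `LLLLKLLLLL` is rejected while the same sequence ending in ten Leu residues is accepted.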
Validation of the algorithm using the S. cerevisiae genome sequence
To test whether the algorithm thus created was still selective for S. cerevisiae GPI proteins, we analysed the S. cerevisiae genome downloaded from the Saccharomyces Genome Database (SGD). Using the FUZZPRO program from the EMBOSS software package, our algorithm selected 187 proteins (Table 1). These proteins were further analysed for the presence of an N-terminal signal peptide for secretion using SignalP version 2.0, which includes SignalP-NN and SignalP-HMM. Proteins giving a positive signal peptide prediction with both methods were subsequently analysed for the absence of internal transmembrane (TM) domains using TMHMM version 2.0 and PSORT II. Proteins that do not contain internal TM domains according to these methods were listed as predicted GPI proteins. In this way we selected 66 putative GPI proteins in S. cerevisiae (Table 2), considerably more than the 51 proteins selected with the screening method of Caro et al. (1997). For two ORFs in our list, Kre1 (a known GPI protein) and YMR158W-A (an unknown ORF), the TM domain prediction was ambiguous: TMHMM recognized a possible TM domain, but PSORT II did not.
Table 1. Genome-wide identification of putative GPI proteins. Four different fungal species were analysed using the C-terminal GPI algorithm [NSGDAC]-[GASVIETKDLF]-[GASV]-X(4,19)-[FILMVAGPSTCYWN](10)>. ORFs with a C-terminal GPI motif were screened for additional GPI protein characteristics, as described in the text.
Number of ORFs analysed
Number of ORFs with GPI motif
Number of putative GPI proteins
Table 2. Putative GPI proteins of Saccharomyces cerevisiae. S. cerevisiae GPI proteins were identified with the C-terminal GPI algorithm in a non-redundant genome file obtained from SGD (footnote a).
Three S. cerevisiae proteins for which GPI attachment has been suggested on the basis of structural similarity were not recognized by our C-terminal algorithm. Two of these, Yps2/Mkc7 and Crr1, were also not picked up by the algorithm used by Caro et al. (1997). Both have multiple positively charged residues in the C-terminal hydrophobic tail. Surprisingly, the aspartic protease Yps2 has been reported to be cleaved from membranes by a GPI-specific phospholipase (Komano and Fuller, 1995), so the C-terminal region of Yps2 may contain sequencing errors. The third, Gas5 (YOL030w), a member of the Gas1 family of GPI proteins, was not recognized by our algorithm because it has a negatively charged glutamic acid residue in the hydrophobic tail. Our algorithm also did not recognize four unknown ORFs (YPL130w, YOR214c, YJL171w and YLR042c) that were identified by Caro et al. (1997). Finally, YPL261c and Sps2 (a protein with homology to the GPI protein Ecm33), which were also identified in the screen of Caro et al. (1997), were excluded from our GPI protein list because their N-terminal regions are unlikely to be signal peptides for secretion.
For 16 of the 22 additionally identified proteins, GPI modification has been shown or predicted. Examples are the cell wall proteins Cwp2, Ssr1, Fit1 and Fit2, and Yar066w and Yhr214w, which both show similarity to the sake yeast-specific cell wall protein Awa1 (Shimoi et al., 2002). Dfg5, which shows similarity to bacterial endomannanases (Kitagaki et al., 2002) and is required for filamentous growth, Dse4, which shows similarity to endo-1,3-β-glucanases, the aspartic protease Yps6 and the phospholipase Plb3 are associated with the plasma membrane. The new algorithm therefore seems to be more effective and selective in recognizing GPI proteins than the earlier one. One of the newly identified proteins, Mnl1, which is homologous to α-mannosidases of carbohydrate-active enzyme family 47, has been reported to be localized in the endoplasmic reticulum (Nakatsukasa et al., 2001). This localization was determined with an HA tag fused to the C-terminus of the protein. If Mnl1 is GPI-modified, such a C-terminal tag would cause mislocalization of the protein, which might explain why Mnl1 was not found at the cell surface. Furthermore, the Mnl1 sequence does not contain an ER retention motif, it has no predicted TM domains (according to both TMHMM and PSORT II), and PSORT II predicts it to be localized at the cell surface with a probability of 67%. GPI proteins usually have a high percentage of S and T residues, whose side chains are potential sites for O-glycosylation (Klis et al., 2002). The S/T content in our set of GPI proteins varies from 13% to 55%, with an average of 25% for the 22 new proteins. This further suggests that most of these proteins are GPI proteins.
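The S/T content used throughout this paper as supporting evidence is a simple residue percentage. The sketch below computes it and summarizes a set of proteins as a minimum, maximum and mean, as reported above. It assumes the percentage is taken over the full-length sequence, including signal peptides, which the text does not state explicitly; the function names are illustrative.

```python
from statistics import mean

def st_content(protein: str) -> float:
    """Percentage of serine plus threonine residues in a protein sequence."""
    seq = protein.upper().rstrip("*")  # drop a trailing stop symbol, if present
    if not seq:
        raise ValueError("empty sequence")
    return 100.0 * sum(aa in "ST" for aa in seq) / len(seq)

def st_summary(proteins: list[str]) -> tuple[float, float, float]:
    """(minimum, maximum, mean) S/T content over a set of proteins."""
    values = [st_content(p) for p in proteins]
    return min(values), max(values), mean(values)
```

Applied to the 22 newly identified S. cerevisiae proteins, `st_summary` would give the range and average quoted above, provided the same full-length convention was used.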
For many of the GPI proteins we identified, our algorithm recognizes multiple sites at which the GPI-anchor signal peptide could be cleaved. According to Udenfriend and Kodukula (1995), the length of this signal peptide varies from 15 to 30 amino acids, with an average of 23 residues. Analysing the known GPI proteins of S. cerevisiae, we also found an average GPI signal peptide length of 23 amino acids. Therefore, where several cleavage sites are possible, Tables 2–3 preferentially indicate signal peptides of about this length.
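Choosing among several candidate cleavage sites can be sketched as follows: enumerate every ω position compatible with the motif, then prefer the one whose C-terminal signal peptide is closest to the 23-residue average. The tie-breaking rule (shorter peptide wins) and the function names are assumptions made for illustration; the text does not specify how ties were resolved.

```python
OMEGA = set("NSGDAC")
OMEGA_P1 = set("GASVIETKDLF")
OMEGA_P2 = set("GASV")
TAIL = set("FILMVAGPSTCYWN")
TARGET_LENGTH = 23  # average GPI signal peptide length cited in the text

def candidate_sites(protein: str) -> list[tuple[int, int]]:
    """All (omega position, signal peptide length) pairs that fit the motif.

    Positions are 1-based. The signal peptide runs from omega + 1 to the
    C-terminus and must be 16-31 residues long (2 + spacer 4..19 + tail 10).
    """
    seq = protein.upper().rstrip("*")
    n = len(seq)
    if n < 17 or not set(seq[-10:]) <= TAIL:
        return []
    sites = []
    # A 0-based omega index i leaves n - i - 1 residues after it (16..31).
    for i in range(max(0, n - 32), n - 16):
        if seq[i] in OMEGA and seq[i + 1] in OMEGA_P1 and seq[i + 2] in OMEGA_P2:
            sites.append((i + 1, n - i - 1))
    return sites

def preferred_site(protein: str):
    """Candidate closest to 23 residues; ties go to the shorter peptide (assumed)."""
    sites = candidate_sites(protein)
    if not sites:
        return None
    return min(sites, key=lambda s: (abs(s[1] - TARGET_LENGTH), s[1]))
```

For a toy sequence with two `NGA` motifs, `"M" * 12 + "NGA" + "P" * 5 + "NGA" + "P" * 5 + "L" * 10`, both ω sites are reported (positions 13 and 21, giving peptides of 25 and 17 residues), and the first is preferred because 25 is closer to 23.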
Table 3A. Putative GPI proteins of Candida albicans that give significant BLAST results. C. albicans GPI proteins were identified with a C-terminal GPI algorithm in a non-redundant genome file based on Stanford assembly 19
Table 3B. Putative GPI proteins of Candida albicans that do not give significant BLAST results. C. albicans GPI proteins were identified with a C-terminal GPI algorithm in a non-redundant genome file based on Stanford assembly 19
C. albicans GPI proteins were selected from a non-redundant genome file containing 6726 ORFs that was created by independent annotation of Stanford's contig assembly 19 by members of the EU Framework V program Galar Fungail. From this file, our algorithm selected 237 ORFs, and further analysis of these ORFs resulted in 104 putative GPI proteins (Table 3A, 3B). Extending the maximum length of the spacer region in our algorithm from 19 to 20 amino acids resulted in eight additional proteins, none of which has a predicted signal sequence according to SignalP version 2.0. For Pga1, which has homology to the GPI proteins ScKre1 and CaKre1, TMHMM recognized a possible TM domain but PSORT II did not (Table 3A). Some ORFs in genome assembly 19 are broken at the contig ends. For those ORFs, the corresponding N- and C-terminal parts in the genome file were combined as indicated in Table 3A. For Ecm33.3, a reliable signal peptide prediction is lacking because only a truncated version (no translational start codon found) is present in CandidaDB. This ORF has been confirmed by cDNA sequencing (Fradin and Hube; see CandidaDB), and it has a C-terminal GPI anchor sequence, homology with GPI proteins and an S/T content of 25%. The overall S/T content of the C. albicans GPI proteins varies from 13% to 48%, with an average of 28%, which is comparable to what we found in S. cerevisiae. For 51 of the selected ORFs, BLAST results indicated that they are related to known or putative GPI proteins. Proteins that have not yet received a functional name are now, in accordance with CandidaDB, tentatively named Pga, for predicted GPI-anchored proteins.
Among the identified ORFs are members of the Als protein family (10 ORFs), Hyr1, seven Iff/Hyr1-related ORFs, Csa1, Rbt5, two Csa1/Rbt5-related ORFs, Hwp1, Rbt1 and a Hwp1/Rbt1-related ORF, all of which are typical C. albicans proteins or related to them. For Als proteins and Hwp1, cell wall association has been shown immunologically by releasing the proteins with β-1,6-glucanase and β-1,3-glucanase, respectively (Kapteyn et al., 2001; Sundstrom, 2002). Nine ALS genes have been discovered in C. albicans thus far (Hoyer, 2001). Proper annotation of this protein family in CandidaDB has been hampered because these proteins contain almost identical regions, have internal repeats and are large (ca. 900 to more than 2000 residues in CandidaDB), which may have led to erroneous alignment of contigs. In addition, we found orthologues of known S. cerevisiae GPI proteins, e.g. Ssr1, Dfg5, Cht2, Sap9 and Kre1, as well as orthologues of known S. cerevisiae GPI protein families, e.g. four ORFs of the Gas family, three phospholipases (Plb3, 4 and 5), two members of the Ecm33 family, two Crh-family proteins and two chitinases (Cht1 and Cht2). By mass-spectrometric analysis of cell wall tryptic digests, Cht2 has recently been shown to be cell surface-associated (Iranzo et al., 2002).
Identification of putative Sz. pombe GPI proteins
Besides S. cerevisiae and C. albicans, GPI proteins have recently been identified in a variety of other fungi. Examples are the agglutinin-like sequence (Als) proteins in Candida tropicalis and Candida dubliniensis (Hoyer et al., 2001), the Epa1 adhesin of Candida glabrata (Frieman et al., 2002), Gas/Phr homologues in C. glabrata (Weig et al., 2001), Candida maltosa (Nakazawa et al., 2000) and Aspergillus fumigatus (Mouyna et al., 2000), Fem1 of Fusarium oxysporum (Schoffelmeer et al., 2001), MP1 of Penicillium marneffei (Cao et al., 1998), CAP22 of Glomerella cingulata (Hwang and Kolattukudy, 1995), Ag2 of Coccidioides immitis (Zhu et al., 1996), Psu1 (Omi et al., 1999), Yps1 (Ladds and Davey, 2000) and the Dfg5 homologue SPCC970.02 (Kitagaki et al., 2002) of Schizosaccharomyces pombe, and the Dfg5 homologue NCU03770.1 of Neurospora crassa (Kitagaki et al., 2002). All these proteins have C-terminal regions that conform to the GPI algorithm, which suggests that the algorithm is applicable to the identification of GPI proteins in other fungi. We therefore also applied our algorithm to the ORFs identified in the complete genome sequences of the fission yeast Sz. pombe and the filamentous ascomycete N. crassa. In the Sz. pombe file downloaded from the website of the Sanger Institute, we found 33 GPI protein candidates among 4950 ORFs (Table 1). For two of the selected ORFs, PSORT II recognized a possible TM domain whereas TMHMM did not. Both these proteins, SPCC1795.09 (aspartic protease) and SPCC757.12 (α-amylase), have homology with known or predicted GPI proteins from other fungi. For 19 of the identified proteins, BLAST analysis showed homology with known or putative GPI proteins (Table 4), which suggests that the number of false positives is likely to be low. As in C. albicans and S. cerevisiae, we found, for instance, α-amylases, aspartic proteases and members of the typical Gas/Phr, Ecm33 and Plb GPI protein families.
The S/T content of the Sz. pombe GPI proteins varies between 11% and 60%, with an average of 29%. Interestingly, the 14 unannotated ORFs have an average S/T content of 41%, strengthening the notion that these are indeed GPI proteins. Remarkably, SPBPJ4664.02, a large ORF of 3971 residues, contains ca. 260 copies of the amino acid sequence N[ST]STPITSST[AV][LV]. Because this protein is expected to be secreted and each repeat contains one potential N-glycosylation site and seven potential O-glycosylation sites, it may be extremely heavily glycosylated. Multiple S/T-rich repeated sequences are also present in the Als proteins (Hoyer, 2001) and hyphally-related sequences (HYR1; Bailey et al., 1996) of C. albicans and in C. glabrata Epa1 (Frieman et al., 2002), all of which are yeast GPI cell wall proteins. Unfortunately, BLAST analysis of the regions preceding and following the 12-amino-acid repeats did not give further clues about the function of SPBPJ4664.02.
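The repeat and glycosylation-site counts given for SPBPJ4664.02 can be reproduced with two regular expressions: the consensus repeat itself, and the N-glycosylation sequon N-X-S/T (X not Pro). The sketch follows the text in counting every Ser or Thr inside a repeat as a potential O-glycosylation site; it does not predict which sites are actually modified. Because the SPBPJ4664.02 sequence is not reproduced here, the check uses a synthetic string built from the consensus.

```python
import re

# Consensus repeat reported for SPBPJ4664.02.
REPEAT = re.compile(r"N[ST]STPITSST[AV][LV]")
# N-glycosylation sequon N-X-S/T with X != Pro; the lookahead lets sequons overlap.
SEQUON = re.compile(r"(?=N[^P][ST])")

def repeat_summary(protein: str) -> dict:
    """Count repeat copies, N-glycosylation sequons and S/T residues in repeats."""
    seq = protein.upper()
    repeats = REPEAT.findall(seq)
    return {
        "repeat_copies": len(repeats),
        "n_sequons": len(SEQUON.findall(seq)),
        # every S/T inside a repeat counted as a potential O-glycosylation site
        "st_in_repeats": sum(r.count("S") + r.count("T") for r in repeats),
    }
```

A single copy such as `NSSTPITSSTAL` yields one sequon (N-S-S) and seven S/T residues, matching the per-repeat numbers in the text.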
Table 4. Putative Schizosaccharomyces pombe GPI proteins. Sz. pombe GPI proteins were identified with the C-terminal GPI algorithm in a non-redundant genome file created by the Sanger Institute
To search for GPI proteins in the filamentous fungus Neurospora crassa, we used assembly 3 of the Whitehead Institute. In N. crassa we found 97 GPI protein candidates in a file containing 10 085 ORFs (Table 1). Of the selected ORFs, only NCU09729, a protein with homology to chitinase GPI proteins, gave an ambiguous TM domain prediction (PSORT II predicting one TM domain, TMHMM predicting none). For 43 of the predicted GPI proteins, BLAST analysis revealed homology with known or putative GPI proteins (Table 5), which suggests that our algorithm also identifies GPI proteins efficiently in N. crassa. The S/T content of the putative N. crassa GPI proteins varies from 12% to 37%, with an average of 21%, slightly lower than in S. cerevisiae, C. albicans and Sz. pombe. As in the three yeasts, we found orthologues of the Gas family and the Crh family in N. crassa, but no members of the Ecm33 family. Furthermore, we found a diverse range of proteolytic and carbohydrate-modifying enzymes and orthologues of known cell surface proteins, such as Sz. pombe Psu1 and C. albicans Csa1. Surprisingly, the GPI algorithm also selected the rodlet protein Ccg-2, which belongs to a class of hydrophobic cell surface proteins that share a common pattern of eight cysteine residues and are found in various filamentous fungi (Wessels, 1997) but not in yeasts.
Table 5. Putative Neurospora crassa GPI proteins. N. crassa GPI proteins were identified with the C-terminal GPI algorithm in a non-redundant genome file (release 2) that was obtained from the Whitehead Institute
The use of algorithms to identify GPI proteins has previously been shown to be a powerful method (Caro et al., 1997; Hamada et al., 1998a; Eisenhaber et al., 2001). Using such algorithms, two independent studies each identified about 50 putative GPI proteins in S. cerevisiae (Caro et al., 1997; Hamada et al., 1998a). Combining the output of these two studies suggests that approximately 70 GPI proteins are present in S. cerevisiae, as has already been proposed (Klis et al., 2002). With the algorithm used in the present study, we selected 66 candidate GPI proteins, including almost all known GPI proteins. That our screen missed Gas5 and some unknown ORFs, among which are Yor214c and Ypl130w, indicates that the requirement of our algorithm for an uncharged hydrophobic tail without glutamine residues may be slightly too strict. Alternatively, sequencing errors may be involved, as has been shown for Ecm33 (Hamada et al., 1999). Gas5 belongs to a family of GPI proteins that are involved in β-1,3-glucan remodelling but has not itself been shown directly to be GPI-anchored. For Yor214c and Ypl130w, incorporation into cell walls was observed when their C-terminal 40 amino acids were fused to reporter constructs (Hamada et al., 1999). These three ORFs each have one charged residue in the hydrophobic tail. Thus, a more complete set of proteins could possibly be obtained with an algorithm that allows one mismatch in this tail, although this might also cause more false positives to be selected.
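The one-mismatch variant suggested above was not implemented in this study; the following is only a hypothetical sketch of how it could be expressed. The motif is split into its two parts: a regular expression for the ω site, ω + 1, ω + 2 and spacer, and a count of disallowed residues (charged residues or Gln) in the 10-residue tail. With `max_tail_mismatches=0` it is equivalent to the motif in Table 1.

```python
import re

TAIL = set("FILMVAGPSTCYWN")
# omega, omega + 1, omega + 2 and a 4-19 residue spacer ending where the tail begins
HEAD = re.compile(r"[NSGDAC][GASVIETKDLF][GASV].{4,19}$")

def has_gpi_motif(protein: str, max_tail_mismatches: int = 0) -> bool:
    """C-terminal GPI motif check tolerating up to max_tail_mismatches
    disallowed residues (charged residues or Gln) in the 10-residue tail."""
    seq = protein.upper().rstrip("*")
    if len(seq) < 17:  # shortest possible match: 3 + 4 + 10 residues
        return False
    tail, head = seq[-10:], seq[:-10]
    mismatches = sum(aa not in TAIL for aa in tail)
    return mismatches <= max_tail_mismatches and HEAD.search(head) is not None
```

With one tolerated mismatch, a tail such as `LLLLELLLLL` (a single Glu, as described for Gas5) is accepted, whereas a tail with two charged residues is still rejected; how many extra false positives this admits genome-wide would have to be tested.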
After optimizing the algorithm so that all known C. albicans GPI proteins were recognized, we identified a set of 104 putative GPI proteins. This indicates that C. albicans has more distinct GPI proteins than S. cerevisiae. This may be explained at least partly by the presence in C. albicans of proteins and protein families that are specifically expressed during filamentous growth (e.g. Hwp1, Hyr-related proteins, Als3 and Als8) and/or determine surface hydrophobicity (Csa1) or function as adhesins (Als proteins, Hwp1; reviewed in Sundstrom, 2002). In a study aimed at the identification of putative GPI cell wall proteins, 54 ORFs were selected from a set of 152 putative GPI proteins by analysing the region immediately upstream of the proposed GPI-attachment site for sequence features characteristic of cell wall or plasma membrane proteins (Sundstrom, 2002). All proteins in that study for which BLAST results indicated cell wall association, except Crh12, were also recognized by our algorithm. Crh12 was not recognized because of a glutamic acid residue in its hydrophobic tail, but otherwise it shows normal GPI protein characteristics. Crh12 and its homologues Crh11 and Utr2 are structurally related to the Crh protein family of S. cerevisiae, which has been suggested to be involved in β-1,3-glucan remodelling or in the incorporation of chitin into the β-glucan network of the cell wall (Rodriguez-Pena et al., 2000). In total, 16 ORFs from the Sundstrom (2002) dataset were not found in our screen, and six of these represent second alleles of other listed GPI proteins. Sundstrom (2002) performed that screen on Stanford genome assembly 6, which comprises 9168 ORFs, whereas we used a non-redundant genome file of 6726 ORFs created by independent annotation by the Galar Fungail consortium on the basis of the more recent Stanford genome assembly 19.
Five other ORFs were not recognized by our GPI algorithm and the remaining five ORFs were discarded from our list because of other non-GPI-like properties, such as the absence of a clear signal peptide for secretion.
The GPI algorithm, which was primarily designed to identify GPI proteins of S. cerevisiae and C. albicans, also seems to be effective and selective when applied to other fungi. First, the C-terminal regions of all currently known GPI proteins from other fungal species match the GPI algorithm. Second, BLAST results indicated that 58% of the Sz. pombe and 44% of the N. crassa proteins that we identified are homologous to known or putative GPI proteins. Third, the numbers of proteins identified in the different species are comparable, although in Sz. pombe, consistent with the low amount of galactomannan in its walls (Sietsma and Wessels, 1990), we found only 33 putative GPI proteins. When the GPI algorithm was further tested on the rice blast fungus Magnaporthe grisea, we obtained a set of putative GPI proteins similar to that of the closely related N. crassa. However, the M. grisea genome file currently available from the Whitehead Institute represents an early genome assembly, and it is therefore not yet possible to provide detailed information.
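The percentages quoted above follow directly from the counts reported in the Results. The short calculation below recomputes them and, for comparison, the fraction of all analysed ORFs that were predicted to be GPI proteins in each genome. S. cerevisiae is omitted because the number of ORFs analysed is not stated in the text; the function name is illustrative.

```python
# Counts taken from the Results: (BLAST-supported, putative GPI proteins, ORFs analysed)
COUNTS = {
    "C. albicans": (51, 104, 6726),
    "Sz. pombe": (19, 33, 4950),
    "N. crassa": (43, 97, 10085),
}

def percentages(blast_hits: int, putative: int, orfs: int) -> tuple[float, float]:
    """(% of putative GPI proteins with GPI homologues, % of ORFs predicted as GPI)."""
    return 100 * blast_hits / putative, 100 * putative / orfs

for species, counts in COUNTS.items():
    homologous, genome_share = percentages(*counts)
    print(f"{species}: {homologous:.0f}% with GPI homologues, "
          f"{genome_share:.1f}% of ORFs predicted as GPI proteins")
```

Rounded, this gives 58% for Sz. pombe and 44% for N. crassa, as stated, and shows that Sz. pombe also has the lowest share of predicted GPI proteins per analysed ORF (about 0.7%, against about 1.0% for N. crassa and 1.5% for C. albicans).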
To predict which GPI proteins remain attached to the plasma membrane and which are covalently attached to β-1,6-glucan in the cell wall, we analysed the sequence requirements of the ω− region for efficient incorporation of proteins into the cell wall, as proposed by earlier studies on S. cerevisiae (Hamada et al., 1998b, 1999; Vossen et al., 1997). First, we analysed the occurrence in known S. cerevisiae cell wall proteins of V, I or L four or five residues upstream of the ω site, and of Y or N two residues upstream of the predicted ω site, features that have been shown to correlate positively with cell wall incorporation (Hamada et al., 1999). In the cell wall proteins Cwp1, Cwp2 and Ssr1, none of these amino acids is present at these positions (Table 2; Caro et al., 1997), which indicates that these features should not be applied strictly as a requirement for cell wall localization. Second, basic residues in the ω− region are often found in plasma membrane proteins but not in cell wall proteins (Vossen et al., 1997). Indeed, we also find that positively charged side chains are rare in the ω− region of the known cell wall-associated proteins of S. cerevisiae and C. albicans. However, experimental evidence supporting this observation is lacking, which makes it risky to classify the newly identified GPI proteins solely on the basis of this sequence feature. Also, recent immunological studies showed that part of the plasma membrane protein Gas1 is associated with the cell wall network, which indicates that a default localization of GPI proteins may not exist (Meyer et al., 2002). Furthermore, analysis of ω− sequence characteristics is hampered by the fact that GPI proteins often seem to have multiple potential GPI-attachment sites.
Firm conclusions on the sequence requirements for cell wall and plasma membrane localization of GPI proteins await a more detailed analysis of GPI-attachment sites and upstream regions, and the identification of GPI proteins isolated from cell walls or the plasma membrane.
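Given the caveats discussed above, the ω− features are best treated as descriptive annotations rather than as a localization classifier. The hypothetical helper below reports them for one predicted ω site: V, I or L at ω-4 or ω-5, Y or N at ω-2, and the number of basic residues upstream. Two choices are assumptions not specified in the text: the width of the scanned upstream window (10 residues by default) and counting only Lys and Arg as basic.

```python
BASIC = set("KR")  # assumption: Lys and Arg counted as basic; His ignored

def omega_minus_features(protein: str, omega: int, window: int = 10) -> dict:
    """Descriptive omega-minus region features for one predicted GPI-attachment site.

    omega is the 1-based position of the GPI-attachment residue; window (an
    assumed value) sets how many upstream residues are scanned for basic residues.
    """
    seq = protein.upper()
    i = omega - 1
    if not 0 <= i < len(seq):
        raise ValueError("omega position outside the sequence")

    def residue(offset: int) -> str:
        # "" when the position falls outside the sequence; never in the sets below
        j = i + offset
        return seq[j] if 0 <= j < len(seq) else ""

    upstream = seq[max(0, i - window):i]
    return {
        "VIL_at_omega_minus_4_or_5": residue(-4) in set("VIL") or residue(-5) in set("VIL"),
        "YN_at_omega_minus_2": residue(-2) in set("YN"),
        "basic_residues_upstream": sum(aa in BASIC for aa in upstream),
    }
```

Because many proteins have several candidate ω sites, the helper would in practice be run once per candidate site, which makes explicit how much the reported features depend on which site is assumed.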
We thank all members of the EU-funded Framework V program ‘Galar Fungail’ for valuable discussions and their contribution to gene annotations. We are indebted to Christophe d'Enfert of the Institut Pasteur (Paris) for creating the CandidaDB database and for help with initial GPI motif searches. CandidaDB was created using sequence data for Candida albicans obtained from the Stanford Genome Technology Center (Stanford, CA) which also created SGD. Sequencing of C. albicans was accomplished with the support of the NIDR and the Burroughs Wellcome Fund. The Neurospora genome sequence was obtained from Whitehead Institute/MIT Center for Genome Research and annotated genome sequences of Sz. pombe were retrieved from the Sanger Institute (Hinxton, UK). PdG was supported by the European Commission (QLRT-1999-30795).