Yersinia species utilize a type III secretion system to inject toxins, called Yops (Yersinia outer proteins), into eukaryotic cells. The N-termini of the Yops serve as type III secretion signals, but they do not share a consensus sequence. To simplify the analysis of type III secretion signals, we replaced amino acids 2–8 of the secreted protein YopE with all permutations (2^7 or 128) of synthetic serine/isoleucine sequences. The results demonstrate that amphipathic N-terminal sequences, containing four or five serine residues, are much more likely than hydrophobic or hydrophilic sequences to target YopE for secretion. Multiple linear regression analysis of the synthetic sequences was used to obtain a model for N-terminal secretion signals. The model accurately classifies the N-terminal sequences of native type III substrates as efficient secretion signals.
Many Gram-negative pathogens deploy type III secretion systems during infections of eukaryotic hosts (for a review, see Hueck, 1998). Pathogenic Yersinia species have two type III secretion systems: one is encoded on an approximately 70 kb plasmid that is required for virulence (Gemski et al., 1980; Zink et al., 1980; Portnoy et al., 1985), whereas the other is chromosomally encoded (Haller et al., 2000). The plasmid-encoded Yersinia type III secretion system comprises about 25 Ysc (Yop secretion) proteins, many of which are homologous to those found in the type III secretion systems of a wide variety of plant and animal pathogens (Hueck, 1998). Yersinia species use this type III secretion system to inject six toxins, called Yops (Yersinia outer proteins), into eukaryotic cells to inhibit phagocytosis and downregulate the host inflammatory response (for reviews, see Cornelis and Wolf-Watz, 1997; Cornelis et al., 1998). YopE, in particular, is a Rho-GTPase activating protein (Von Pawel-Rammingen et al., 2000) that induces a cytotoxic response by disrupting the actin cytoskeleton of host cells (Rosqvist et al., 1991).
Cornelis and co-workers were the first to identify Yersinia type III secretion signals. Specifically, they found that fusing the first 15 or 17 residues of YopE or the tyrosine phosphatase YopH (Guan and Dixon, 1990), respectively, to the calmodulin-dependent adenylate cyclase (Cya) reporter was sufficient to direct its export via the Yersinia type III secretion system (Sory et al., 1995). Subsequently, it was shown that the first 11 amino acids of YopE are sufficient to target the Cya reporter to the bacterial cell surface (Schesser et al., 1996). Although the N-termini of the different Yops do not share a consensus sequence, a mutational analysis indicated that the first seven amino acid residues of YopE are the most important for secretion (Schesser et al., 1996). The N-terminal secretion signal hypothesis was later questioned by Schneewind and colleagues, who suggested that it is actually the 5′ coding regions of yop mRNAs that serve as secretion signals (Anderson and Schneewind, 1997, 1999; Anderson et al., 1999). In recent work, however, we demonstrated that the N-terminus of YopE is indeed critical for secretion, whereas the sequence of the 5′ coding region of yopE mRNA is not (Lloyd et al., 2001). In addition, mutants with altered N-terminal secretion signals can be targeted for secretion by the YopE chaperone YerA (Cheng et al., 1997; Lloyd et al., 2001).
We also showed that a YopE mutant in which amino acids 2–8 were replaced by a synthetic sequence consisting of alternating serine and isoleucine residues is secreted at wild-type levels (Lloyd et al., 2001). In contrast, replacement of amino acids 2–8 of YopE, with either polyserine or polyisoleucine sequences, abolishes the secretion of YopE. These results suggested that a physical property, namely amphipathicity, of the YopE N-terminus is critical for secretion, although the limited number of synthetic sequences analysed prevented a definitive conclusion from being drawn.
Here, we report the first comprehensive analysis of the ability of synthetic N-terminal sequences to target substrates for type III secretion. Specifically, we replaced amino acids 2–8 of YopE with all (2^7 or 128) permutations of synthetic serine/isoleucine sequences. The results demonstrate that the Yersinia type III secretion system preferentially exports substrates containing amphipathic N-terminal sequences. In addition, the data obtained from the synthetic sequences were used to calculate a multiple linear regression model of N-terminal secretion signals. The model correctly classifies the N-termini of native type III substrates as secretion signals.
Sequence analysis of type III secretion substrates
Although the N-termini of type III secretion substrates serve as secretion signals, they do not share any significant amino acid homology. To gain insights into the important features of type III secretion signals, we analysed the sequences of 58 known or predicted type III secretion substrates from a wide variety of Gram-negative pathogens. These included Yersinia, Salmonella, Pseudomonas, enteropathogenic E. coli, Shigella and Xanthomonas species. The substrates included both effector proteins and exported components of the secretion machineries. We limited our analysis to the first eight amino acids of these proteins. In contrast to the better characterized Sec-dependent secretion signals that are quite hydrophobic (Pugsley, 1993), the N-termini of type III secretion substrates are slightly polar, with 57% of the N-terminal residues comprising polar or charged amino acids, and 43% consisting of hydrophobic amino acids (Table 1). No single polar residue predominates, as serine, threonine and asparagine are all abundant. Among hydrophobic residues, isoleucine is the most abundant amino acid (excluding initiating methionine residues), constituting 10% of the secretion signals.
Table 1. Amino acid composition of type III secretion signals.
[Table body not recovered. Column headings: Per cent composition; Per cent hydrophobic; Per cent hydrophilic.]
a. Includes initiating methionine residues.
For simplicity, amino acid residues were classified as hydrophobic (A, C, F, I, L, M, V, W, Y) or hydrophilic (D, E, G, H, K, N, P, Q, R, S, T). Sequences included in the analysis were: Yersinia pseudotuberculosis, YpkA, YopJ, YopH, LcrQ, YopN, LcrV, YopB, YopD, YopM, YopT, YopK, YopE, YscF, YscH, YscP, YscX and TyeA; Salmonella typhimurium, SspH 1, SspH 2, SlrP, SptP, SopB, SopE, SipA, SipB, SipC, SipD, SpiC, InvJ, PrgI and PrgJ; enteropathogenic Escherichia coli, EspA, EspB, and EspD; Pseudomonas aeruginosa, PscF, PscH, PcrV, PopB, PopD, PopN, ExoS, ExoT, ExoU, Pcr1 and Pcr3; Shigella flexneri, IpaA, IpaB, IpaC, IpaD, VirA, MxiH and MxiI; Pseudomonas syringae, HrpA, HrpW, HrpZ, AvrB, and AvrPto; and Xanthomonas campestris, AvrBs2.
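The composition analysis summarized in Table 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the hydrophobic/hydrophilic grouping is the one given in the table legend, the function name `composition` is hypothetical, and the two example peptides are made up rather than taken from the 58 substrates.

```python
from collections import Counter

# Residue grouping taken from the Table 1 legend.
HYDROPHOBIC = set("ACFILMVWY")
HYDROPHILIC = set("DEGHKNPQRST")


def composition(sequences, n_residues=8):
    """Per cent composition of the first n_residues of each sequence.

    Returns (per-residue percentages, % hydrophobic, % hydrophilic).
    """
    counts = Counter()
    for seq in sequences:
        counts.update(seq[:n_residues].upper())
    total = sum(counts.values())
    percent = {aa: 100.0 * n / total for aa, n in counts.items()}
    hydrophobic = sum(p for aa, p in percent.items() if aa in HYDROPHOBIC)
    hydrophilic = sum(p for aa, p in percent.items() if aa in HYDROPHILIC)
    return percent, hydrophobic, hydrophilic


# Made-up example peptides (NOT real type III substrates).
example = ["MSTNIKLA", "MKISNTLS"]
percent, hydrophobic, hydrophilic = composition(example)
print(f"hydrophobic: {hydrophobic:.1f}%  hydrophilic: {hydrophilic:.1f}%")
```

Supplying the first eight residues of the 58 substrates listed in the legend in place of `example` would be the way to reproduce the reported figures; those sequences are not given in the text, so they are not included here.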
Secretion of YopE variants with synthetic N-terminal sequences
Given the apparent sequence complexity of the N-termini of type III secretion substrates, we believed that analysing simpler sequences would provide important insights into the nature of type III secretion signals. Therefore, we replaced residues 2–8 of YopE with all permutations (2^7 or 128) of serine/isoleucine sequences. We focused exclusively on serine and isoleucine residues for three reasons: the aforementioned sequence analysis, the fact that a preliminary study of three synthetic serine/isoleucine sequences suggested that an amphipathic N-terminus is required for the secretion of YopE (Lloyd et al., 2001), and the fact that the native YopE N-terminus is rich in serine and isoleucine residues.
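To make the size of the library concrete, the following minimal sketch enumerates every seven-residue serine/isoleucine sequence and groups them by serine content. The function and variable names are illustrative and not part of the original study.

```python
from itertools import product
from math import comb


def all_ser_ile_sequences(length=7):
    """Every serine/isoleucine sequence of the given length, as 'S'/'I' strings."""
    return ["".join(residues) for residues in product("SI", repeat=length)]


sequences = all_ser_ile_sequences()

# Group the sequences by the number of serine residues they contain.
by_serine_count = {}
for seq in sequences:
    by_serine_count.setdefault(seq.count("S"), []).append(seq)

print(len(sequences), "sequences in total")
for k in sorted(by_serine_count):
    # The size of each group is the binomial coefficient C(7, k).
    print(f"{k} serines: {len(by_serine_count[k])} sequences (C(7,{k}) = {comb(7, k)})")
```

The groups with three or four serines are the largest (35 sequences each), which is worth keeping in mind when comparing secretion frequencies between groups.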
YopE variants containing synthetic serine/isoleucine N-terminal sequences were expressed in trans under the control of the arabinose-inducible PBAD promoter (Guzman et al., 1995) such that the 5′ UTR (untranslated region) of the yopE transcript was vector-derived. In a previous study, we demonstrated that wild-type YopE, expressed in such a manner, complemented a yopE null strain, YPIII(pIB522), and a yopE-yerA double null strain, YPIII(pIB525) (Table 2), both with respect to secretion in liquid cultures and to the ability to induce a cytotoxic effect in infected HeLa cells (Lloyd et al., 2001). The 128 YopE variants were expressed in YPIII(pIB525) and secretion was measured by growing the bacteria at 37°C in the absence of calcium, conditions known to induce the Yersinia type III secretion system (Michiels et al., 1990). As a control, a YopE mutant in which amino acids 2–8 were deleted was not secreted, thus confirming that the N-terminus of YopE is indeed required for secretion (Fig. 1). Of the YopE variants, 108 were expressed at significant levels and were classified according to their level of secretion (Fig. 1). The results demonstrate that YopE variants containing either 0, 1, 2, 3, 6 or 7 serine residues are typically not secreted at high levels, whereas YopE variants containing 4 or 5 serine residues are more likely to be secreted efficiently (Figs 2 and 3). When the experiments were performed in strain YPIII(pIB522), which expresses the YopE chaperone YerA, almost all of the YopE variants were secreted at high levels, with the homo-polyserine variant being the sole exception (data not shown). This confirms, as has previously been shown, that YerA can target YopE mutants lacking functional N-terminal targeting signals for secretion (Cheng et al., 1997; Lloyd et al., 2001).
Previous mutational analyses of YopE failed to identify missense mutations in the N-terminus of YopE that abolished secretion (Schesser et al., 1996; Anderson and Schneewind, 1997). By analysing our collection of synthetic YopE variants, we identified seven cases in which a single amino acid change was sufficient to convert a non-secreted YopE variant to one that was highly secreted (Fig. 4). Note that in all but one case, the secretion-restoring ‘mutation’ resulted in a sequence of greater amphipathic character. These results highlight the ability of the Yersinia type III secretion system to discriminate between potential substrates based on small changes in the sequence and/or hydrophobicity of the N-terminal secretion signal.
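A search for such single-substitution pairs can be sketched as follows. This is an assumption about how the comparison could be automated, not a description of how it was actually done; the secretion classes in `toy` are invented for illustration and are not the measured values.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def rescuing_substitutions(secretion):
    """Find (non-secreted, highly secreted) pairs that differ at one position.

    `secretion` maps a sequence to 0 (non-secreted), 1 (weakly secreted)
    or 2 (highly secreted).
    """
    pairs = []
    for a, b in combinations(sorted(secretion), 2):
        if len(a) != len(b) or hamming(a, b) != 1:
            continue
        if {secretion[a], secretion[b]} == {0, 2}:
            non, high = (a, b) if secretion[a] == 0 else (b, a)
            pairs.append((non, high))
    return pairs


# Hypothetical classes, for illustration only.
toy = {"IIIISII": 0, "ISIISII": 2, "SSSSSSS": 0, "ISISISI": 2}
print(rescuing_substitutions(toy))
```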
Secretion of synthetic YopE variants lacking the YerA-binding domain
Previous work by Cornelis and co-workers showed that the secretion of native YopE is inefficient in a yerA mutant, and that this is a result of the YerA-binding domain of YopE. However, a mutant YopE protein lacking the YerA binding site was secreted efficiently in the absence of YerA (Boyd et al., 2000). Therefore, to confirm that the poor secretion of YopE variants containing 0, 1, 2, 6 or 7 serine residues was not due to the presence of the YerA-binding domain, we engineered these 38 variants into a mutant YopE protein that lacked residues 50–74, which were previously shown to be required for YerA binding (Schesser et al., 1996). The YopE(Δ50–74) synthetic N-terminal variants were expressed under the control of the arabinose-inducible PBAD promoter (Guzman et al., 1995) in the yopE null strain, YPIII(pIB522) (Table 2) and Yop secretion was assayed (Fig. 5 and data not shown). The fact that the secretion of these synthetic N-terminal variants was quite poor, even though they lack the YerA-binding domain, confirms that their low level of secretion is due to the fact that they have defective N-terminal secretion signals. For the non-secreted variants, levels of soluble YopE in cytosolic extracts were similar to that of the wild type (data not shown).
Cornelis and colleagues also demonstrated that the translocation of a YopE15–Cya fusion protein into eukaryotic cells is increased in a strain lacking the other Yop effectors (Boyd et al., 2000). To test whether the secretion of these YopE(Δ50–74) synthetic N-terminal variants could be improved in the absence of other Yops, they were expressed in a multiple yop mutant strain, YPIII(pIB29MEKAJ), that lacks the secreted effectors YopE, -H, -J, -M and YpkA. The results demonstrate that the secretion of these synthetic variants is often improved when they do not have to compete with other Yops for access to the secretion machinery (Fig. 5 and data not shown).
Detailed sequence-secretion analysis using multiple linear regression
The large number of synthetic YopE variants enabled us to analyse these sequences using multiple linear regression analysis. The original 128 permutations of isoleucine and serine correspond to making a full factorial design in two levels and seven factors or positions, which allows the calculation of all linear terms, as well as two-, three-, four-factor, etc. interactions among the positions. However, the missing data for 20 of the permutations restricted the terms possible to calculate. The regression coefficients for the final model were calculated as described in Experimental procedures (Fig. 6). The model can be interpreted as follows: all linear terms, except for the coefficient for the first synthetic position, are significant and negative, and the coefficients for p4 and p5 have the largest absolute value. This means that, in general, serine increases secretion and is especially favourable at positions 4 and 5. However, it is notable that many of the two-factor interactions are negative. For example, the interpretation of the largest negative two-factor interaction is that if there is a serine in position 4, serine and isoleucine residues are equally preferred at position 3. Furthermore, the values of the three-factor interactions are positive, indicating a preference for isoleucine. Thus, the linear terms reveal that serine is favourable in all positions, but the higher terms indicate that variability of the amino acid sequence is favourable. According to the regression coefficients, a favourable combination of amino acids, among many, is Ile-X-Ser-Ser-Ile-Ser-Ser. In addition, the model separates completely the non-secreted and highly secreted sequences (Fig. 5); however, the weakly secreted sequences partially overlap with both the non-secreted and highly secreted sequences.
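The structure of such a two-level factorial model can be illustrated with a short sketch. It assumes the coding described in Experimental procedures (–1 for serine, +1 for isoleucine) and a complete set of 128 runs, in which case the model columns are mutually orthogonal and each least-squares coefficient reduces to a simple average. The real data set had missing runs and was fitted with MODDE, so this is a simplified illustration; all function names and the toy response are hypothetical.

```python
from itertools import combinations, product
from math import prod

POSITIONS = tuple(range(2, 9))  # residues 2-8 of YopE


def model_terms(max_order=3):
    """Index tuples for the intercept plus linear, two- and three-factor terms."""
    terms = [()]  # the empty tuple stands for the intercept
    for order in range(1, max_order + 1):
        terms.extend(combinations(POSITIONS, order))
    return terms


def design_row(seq, terms):
    """Code a 7-residue S/I string as -1/+1 and expand it into one model row."""
    code = {pos: (-1 if aa == "S" else 1) for pos, aa in zip(POSITIONS, seq)}
    return [prod(code[p] for p in term) for term in terms]


def fit_full_factorial(X, y):
    """Least-squares coefficients for a COMPLETE two-level factorial design.

    With all runs present the columns are orthogonal, so each coefficient
    is sum(x_ij * y_i) / n. This shortcut is not valid with missing runs.
    """
    n = len(X)
    return [sum(row[j] * yi for row, yi in zip(X, y)) / n for j in range(len(X[0]))]


terms = model_terms()
sequences = ["".join(s) for s in product("SI", repeat=7)]
X = [design_row(s, terms) for s in sequences]

# Toy response built from known (made-up) coefficients, to check recovery.
i4, i34 = terms.index((4,)), terms.index((3, 4))
y = [1.0 - 0.5 * row[i4] + 0.25 * row[i34] for row in X]
coefficients = fit_full_factorial(X, y)
print(len(terms), coefficients[0], coefficients[i4], coefficients[i34])
```

With runs missing, as in the study, the columns are only approximately orthogonal and a general least-squares solver is needed instead of the averaging shortcut.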
The set of 58 native sequences was also fit to the model after first having been transformed into binary code (–1 for hydrophilic and +1 for hydrophobic amino acids) according to the hydrophobicity scale of Kyte and Doolittle (1982) (Fig. 6). Of these, 38 were predicted to belong to the highly secreted class, 19 to the weakly secreted class and one to the non-secreted class.
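The binary transformation and model evaluation can be sketched as below. Several things here are assumptions: the hydrophobic/hydrophilic grouping is borrowed from the Table 1 legend (the exact cut-off applied to the Kyte and Doolittle scale is not stated), the peptide is made up, and apart from the intercept (1.1934, reported in Experimental procedures) the coefficients are invented placeholders rather than the fitted values shown in Fig. 6.

```python
# Grouping from the Table 1 legend; assumed to match the binary coding used.
HYDROPHOBIC = set("ACFILMVWY")
HYDROPHILIC = set("DEGHKNPQRST")


def encode_native(protein_seq):
    """Return {position: -1 or +1} for residues 2-8 of a protein sequence."""
    window = protein_seq[1:8].upper()
    if len(window) < 7:
        raise ValueError("sequence must contain at least eight residues")
    code = {}
    for pos, aa in enumerate(window, start=2):
        if aa in HYDROPHOBIC:
            code[pos] = 1
        elif aa in HYDROPHILIC:
            code[pos] = -1
        else:
            raise ValueError(f"unrecognized residue {aa!r} at position {pos}")
    return code


def predict(code, coefficients):
    """Evaluate a factorial model; keys are tuples of positions, () is the intercept."""
    total = 0.0
    for term, b in coefficients.items():
        value = 1
        for p in term:
            value *= code[p]
        total += b * value
    return total


# Made-up peptide and placeholder coefficients (only the intercept is from the text).
code = encode_native("MKTSIANLQ")
toy_coefficients = {(): 1.1934, (4,): -0.2, (3, 4): -0.1}
print(code, round(predict(code, toy_coefficients), 4))
```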
Previous studies have demonstrated that the N-terminus of YopE serves as a type III secretion signal. Specifically, the labs of Cornelis and Wolf-Watz demonstrated that fusing as few as the first 11 amino acids of YopE to the Cya reporter is sufficient to direct its export to the bacterial cell surface in a type III-dependent manner (Sory et al., 1995; Schesser et al., 1996). Furthermore, a mutational analysis suggested that the first seven amino acids of YopE are the most important for secretion (Schesser et al., 1996). Schneewind and colleagues subsequently suggested that the N-terminus of YopE is dispensable for secretion and that the 5′ coding sequence of yopE mRNA serves as a targeting signal (Anderson and Schneewind, 1997). Recent work from our laboratory, however, re-confirmed that the N-terminus of YopE serves as a secretion signal, whereas the 5′ coding region of yopE mRNA does not (Lloyd et al., 2001).
The fact that the N-termini of Yersinia type III substrates do not share a consensus sequence has made it difficult to understand how the secretion apparatus could recognize them. A sequence analysis of the N-termini of 58 known or predicted secretion substrates revealed that these sequences are slightly polar; 57% of the amino acids are polar residues and 43% are hydrophobic residues. The absence of extremely polar or hydrophobic sequences at the N-termini of type III secretion substrates could, of course, simply reflect random chance. The results obtained from our analysis of synthetic N-terminal sequences, however, indicate that such sequences, if they were to arise, would be strongly counterselected as they often adversely affect secretion.
Specifically, very hydrophobic or very polar N-terminal sequences were not, in general, efficiently secreted. In fact, of 28 sequences containing 0, 1, or 2 serine residues, only six were secreted, all at low levels. In addition, for highly polar sequences containing 6 or 7 serines, only two out of seven were secreted at high levels. Whereas the number of these polar sequences is relatively small, the results suggest that highly polar sequences are less likely to promote efficient secretion. Particularly important in this regard is the fact that the homo-polyserine sequence was the sole synthetic N-terminal sequence that abolished YopE secretion in the presence of the chaperone YerA. In contrast, amphipathic sequences were more likely to direct high levels of YopE secretion. For sequences containing three serines, nine out of 30 were secreted at high levels, 19 were secreted at low levels and only two were not secreted. For synthetic sequences containing four serines, 22 out of 30 were efficiently secreted, seven were secreted at low levels, and only one sequence was not secreted. Of the 13 synthetic sequences containing five serines, 10 were efficiently secreted, whereas the remainder were secreted at low levels. These results suggest that amphipathic to moderately polar sequences comprising four or five serine residues are optimal for secretion.
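The counts quoted in this paragraph can be collected and cross-checked with a short tally. The numbers are taken directly from the text; the split between weakly secreted and non-secreted variants is not stated for the 6–7 serine group, so it is left out. The dictionary layout and names are illustrative only.

```python
from math import comb

# Number of possible sequences per serine-count group (out of 2^7 = 128).
possible = {
    "0-2": comb(7, 0) + comb(7, 1) + comb(7, 2),
    "3": comb(7, 3),
    "4": comb(7, 4),
    "5": comb(7, 5),
    "6-7": comb(7, 6) + comb(7, 7),
}

# Expressed variants and highly secreted variants per group, as quoted in the text.
expressed = {"0-2": 28, "3": 30, "4": 30, "5": 13, "6-7": 7}
high = {"0-2": 0, "3": 9, "4": 22, "5": 10, "6-7": 2}

for group in possible:
    fraction = high[group] / expressed[group]
    print(f"{group:>3} serines: {expressed[group]:2d}/{possible[group]:2d} expressed, "
          f"{high[group]:2d} highly secreted ({fraction:.0%})")

print("total expressed:", sum(expressed.values()))
print("total highly secreted:", sum(high.values()))
```

The totals agree with the 108 expressed variants reported in the Results and with the 43 high-secretion variants (38 plus 5) described in the secretion assay section.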
It should be noted that the secretion of many of the extremely hydrophobic or polar variants, when fused to a mutant YopE protein that lacked the YerA binding site, was improved in a multiple yop mutant strain. These results indicate that these synthetic sequences are still capable of directing YopE to the secretion apparatus, but are not able to compete efficiently with native substrates. These results confirm, as first demonstrated by Cornelis and colleagues (Boyd et al., 2000), that secreted substrates compete with one another for access to the secretion apparatus.
Multiple linear regression analysis of the synthetic sequences confirmed that amphipathic sequences at the N-terminus of YopE are generally preferred. Specifically, the regression coefficients for the linear terms revealed that serine residues generally increase secretion. This reflects the fact that YopE variants with hydrophobic N-terminal sequences were not secreted or were weakly secreted. The higher order terms from the regression analysis indicate, however, that multiple consecutive serines are not as favourable, as evidenced by positive coefficients (indicating a preference for isoleucine) for the three-factor terms. Furthermore, a plot of observed secretion versus predicted secretion of the synthetic sequences clearly separated the non-secreted and highly secreted variants into distinct classes. Together, these results indicate that amphipathic sequences with alternating serine and isoleucine residues are preferred, which implies that both polar and hydrophobic interactions between the secretion apparatus and the side chains at the N-terminus of YopE are required for optimal secretion.
We believe that the results obtained using our library of synthetic sequences are relevant to native substrates as well, as 38 of the 58 native substrates examined were predicted by the model to be highly secreted, whereas 19 fell into the weakly secreted category and only one substrate was scored as non-secreted. The fact that the N-termini of native substrates are accurately predicted to be type III secretion signals suggests that it may be possible to develop algorithms with predictive capabilities. Towards this end, our model could be improved by taking into account the individual properties of each amino acid residue, as well as weighting each residue according to its prevalence at the N-termini of known type III substrates. This approach, possibly in combination with neural net algorithms, may allow the identification of type III substrates from genome sequences.
The fact that the hydrophobicity of the N-terminal secretion signal influences the efficiency of type III secretion is interesting given that secretion signal hydrophobicity plays an important role in other secretion systems. For example, the Sec-dependent secretion signal sequence consists of one or more positive charges at the N-terminus, immediately followed by a stretch of predominantly hydrophobic amino acids roughly 16–25 amino acids in length (for a review, see Pugsley, 1993). In bacteria, presecretory proteins are post-translationally targeted via the chaperone SecB to a translocon (Kumamoto and Beckwith, 1985), whereas inner membrane proteins are co-translationally inserted into the membrane by the signal recognition particle (SRP) (Ulbrandt et al., 1997; de Gier et al., 1998). Recent work by Bernstein and colleagues demonstrated that the hydrophobicity of the targeting signal determines which pathway, either SecB or SRP, is utilized by a given substrate (Lee and Bernstein, 2001). Specifically, they demonstrated that by increasing the hydrophobicity of the maltose binding protein (MBP) signal peptide, MBP is diverted from the SecB pathway into the SRP pathway.
Bacteria also possess a twin-arginine translocation (TAT) pathway of protein export, which is distinct from the Sec-dependent secretion system (Dalbey and Robinson, 1999). The TAT signal sequence consists of two conserved arginines at the N-terminus of the protein followed by a predominantly hydrophobic segment, albeit less hydrophobic than those utilized by the Sec pathway (Cristobal et al., 1999). Interestingly, von Heijne and co-workers demonstrated that by increasing the hydrophobicity of the TAT signal sequence they could re-route the TorA protein into the Sec pathway (Cristobal et al., 1999).
Together, these results demonstrate that bacterial protein secretion systems often utilize differences in secretion signal hydrophobicity to discriminate between substrates. Assuming that the Sec pathway is the primordial protein secretion system (as it is found in all bacteria and there are components of the type III/flagellar and type IV secretion systems that rely upon the Sec system for their export), it seems likely that as subsequent secretion systems evolved, new types of secretion signals must have evolved as well. Therefore, the utilization of amphipathic secretion signals by type III systems may be a way to distinguish their substrates from those of other secretion systems.
Media and growth conditions
The liquid growth medium for Yersinia strains consisted of brain–heart infusion (BHI) broth (Difco) supplemented with 5 mM EGTA and 20 mM MgCl2. Escherichia strains were grown in Luria–Bertani (LB) broth or on LB agar (Davis et al., 1980). Bacteria containing plasmids expressing YopE were grown in the presence of 100 μg ml–1 carbenicillin. Yersinia strains also were grown in 50 μg ml–1 kanamycin to maintain selection of the virulence plasmid.
The YopE variants were made by polymerase chain reaction (PCR) amplification of YPIII(pIB102) genomic DNA with Pfx DNA polymerase (Life Technologies). The 5′ primer contained an NdeI site (which encodes the ATG initiation codon), followed by seven codons containing the synthetic sequences of interest. In all cases, AGC was used to encode serine and ATC was chosen as the codon for isoleucine. The remainder of the 5′ primer was identical to codons 9–16 of the native yopE sequence. The 3′ yopE primer incorporated a HindIII site. yopE fragments were digested with NdeI and HindIII and cloned into the same sites of the pET-22b vector. The yopE constructs were then moved into the pBAD18 vector using the restriction enzymes HindIII and XbaI. This places the yopE mutants under the control of an arabinose-inducible promoter in which the Shine–Dalgarno site is from the pET-22b vector. To engineer synthetic N-terminal variants in the YopE(Δ50–74) mutant that do not bind YerA, plasmid pSA8 served as the template in the above PCR amplification step; the subsequent cloning steps were exactly the same. pSA8 was made by PCR amplification of YPIII(pIB102) genomic DNA with Pfx DNA polymerase using the primer pairs SA6/SA21 and SA22/SA5. The 5′ primer SA5 is located upstream of the yopE promoter, whereas the 3′ primer SA22 extends from codon 49 of yopE. The 5′ primer SA21 starts at codon 75 of yopE, whereas the 3′ primer SA6 is located at the end of the yopE coding sequence. The products of the above PCRs were combined and amplified with SA5 and SA6 to yield a fragment containing the yopE promoter, codons 1–49 and 75–219 of yopE. This fragment was digested with XbaI and SphI and ligated into the same sites in the pDM4 vector (Milton et al., 1996). A list of the different oligonucleotides and plasmids used in this study is available from the authors upon request.
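The variable portion of the 5′ primers follows directly from the codon choices stated above, and can be sketched as below. The NdeI recognition sequence (CATATG) and the two codons come from the text and standard usage; the function name is hypothetical, any extra bases 5′ of the restriction site are not modelled, and the yopE sequence for codons 9–16 is not given in the text, so it is left as an optional parameter rather than filled in.

```python
# Codon choices stated in the text.
CODON = {"S": "AGC", "I": "ATC"}
# NdeI recognition site; its last three bases supply the ATG start codon.
NDEI_SITE = "CATATG"


def primer_variable_region(synthetic, downstream=""):
    """Build NdeI site + seven synthetic codons (+ optional downstream sequence).

    `synthetic` is a 7-letter string of 'S' and 'I' for residues 2-8.
    `downstream` would hold codons 9-16 of yopE, which are not given here.
    """
    if len(synthetic) != 7 or set(synthetic) - set(CODON):
        raise ValueError("expected seven residues, each 'S' or 'I'")
    return NDEI_SITE + "".join(CODON[aa] for aa in synthetic) + downstream


print(primer_variable_region("SISISIS"))
```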
YopE secretion assay
Overnight cultures of Yersinia pseudotuberculosis strains expressing the different YopE variants were grown in BHI medium containing EGTA at 26°C. The cultures were diluted to OD600 of 0.2 into 2 ml of fresh medium containing 0.1% Triton X-100 and grown at 26°C for 1 h. Arabinose was added to a final concentration of 0.2%, and the cultures were grown for an additional 2 h at 37°C to induce secretion. It should be noted that full-length YopE variants that contained synthetic N-terminal sequences consisting solely of isoleucine or possessing a single serine residue were sometimes toxic when expression was fully induced with 0.2% arabinose. This was evidenced by an impairment of growth and accompanying cell lysis. Therefore, these variants, totalling eight sequences, were expressed at a lower arabinose concentration (0.02%), which did not impair cell growth. The same YopE(Δ50–74) variants did not display any toxicity and were fully induced with 0.2% arabinose. The cultures were then centrifuged at 20 800 g for 2 min. The supernatants containing the secreted Yops were passed through 0.45 μm filters and precipitated with 10% trichloroacetic acid (TCA) for 2 h on ice. The TCA precipitates were centrifuged at 20 800 g for 10 min, the supernatants were discarded, and the remaining pellets were resuspended in 100 μl of 2% SDS and precipitated with acetone at –20°C for 2 h. The samples were centrifuged at 20 800 g for 10 min, the supernatants were discarded, and the pellets were air-dried. The pellets were then resuspended in 30 μl of 8 M urea, and an equal amount of 2× sample buffer was added. The bacterial cell pellets were resuspended in 30 μl of H2O and an equal amount of 2× sample buffer. Equal amounts of culture supernatant and cell pellet fractions were separated by SDS–PAGE and transferred to a nitrocellulose membrane as described previously (Lloyd et al., 2001). YopE was detected with a polyclonal anti-YopE antiserum.
The level of secretion of each YopE mutant was classified according to the following scale: non-secreted, no YopE secretion observed; low secretion, the level of YopE in the supernatant fraction was less than that seen in the pellet fraction; and high secretion, the level of YopE in the supernatant fraction was equal to, or greater than, that observed in the pellet fraction. YopE secretion was measured in three independent experiments. For the non-secreted variants, no YopE secretion was observed in any of the experiments. For the low-secretion variants, YopE secretion was observed in at least one experiment and the amount of YopE in the culture supernatant was always less than that observed in the cell pellet fraction. For 38 of the high-secretion variants, an equal or greater amount of YopE was seen in the culture supernatants in at least two of the experiments. For the remaining five highly secreted variants, an equal or greater amount of YopE was seen in the culture supernatants in one experiment.
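The classification scale can be written as a small decision rule. This sketch assumes that each experiment is summarized by a pair of numbers (supernatant and pellet band intensities); in the study the comparison was made on immunoblots, and no numerical quantification is described, so the numeric inputs and the function name are illustrative assumptions.

```python
def classify_variant(experiments):
    """Classify a YopE variant from (supernatant, pellet) pairs, one per experiment.

    - "non-secreted": no YopE detected in the supernatant in any experiment
    - "high": supernatant >= pellet in at least one experiment
    - "low": secreted at least once, but always less than the pellet fraction
    """
    if not any(supernatant > 0 for supernatant, _ in experiments):
        return "non-secreted"
    if any(supernatant > 0 and supernatant >= pellet
           for supernatant, pellet in experiments):
        return "high"
    return "low"


# Hypothetical intensity values for three independent experiments.
print(classify_variant([(0, 5), (0, 4), (0, 6)]))
print(classify_variant([(1, 5), (0, 4), (2, 6)]))
print(classify_variant([(5, 5), (1, 4), (7, 6)]))
```

The three classes map onto the response values 0, 1 and 2 used in the regression analysis.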
Multiple linear regression analysis
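Initially, a model with all linear, two-factor and three-factor terms was fit to the data using the following equation:
$$y = b_0 + \sum_{i=2}^{8} b_i p_i + \sum_{2 \le i < j \le 8} b_{ij}\, p_i p_j + \sum_{2 \le i < j < k \le 8} b_{ijk}\, p_i p_j p_k$$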
in which b0 is the model intercept, b2, b3 . . . , etc. are regression coefficients and p2, p3, etc. are –1 for serine and +1 for isoleucine in position 2 and position 3, etc. The y-vector was defined as 0 for non-secreted, 1 for weakly secreted and 2 for highly secreted sequences. Native type III secretion sequences were assigned a value of 2. In total, 64 regression coefficients were calculated. The R² (explained variance in secretion) was 0.86 and the cross-validated R², here denoted Q², was 0.142. The model matrix was almost orthogonal (condition number 2.8), which made it possible to calculate confidence intervals for the coefficients. Coefficients that were not significant at the 90% level were deleted, and the model was recalculated. However, if a higher term was significant, corresponding terms with lower complexity were kept to preserve the model hierarchy. The intercept (b0) of the model was 1.1934. The R² for the model was 0.78 and the Q² was 0.59, and the model matrix was close to orthogonal (condition number 1.52). To analyse native sequences, all residues were converted to binary form (–1 for hydrophilic and +1 for hydrophobic residues) and the y-vector was assigned a value of 2. Calculations were performed with the statistical package MODDE (Umetrics AB,