The hallmark of the immune system is its ability to recognize and distinguish between self and nonself (potential pathogens). T cells do this by recognizing peptides that are bound to major histocompatibility complex (MHC) receptors. A number of methods for predicting the binding of peptides to MHC molecules have been developed (for review, see Schirle et al. 2001) since the first motif methods were presented (Rothbard and Taylor 1988; Sette et al. 1989). The discovery of allele-specific motifs (Falk et al. 1991) led to the development of more accurate algorithms (Pamer et al. 1991; Rötzschke et al. 1991). The simpler prediction tools assume that the amino acid at each position along the peptide sequence contributes a given binding energy, and that these contributions can be added independently to yield the overall binding energy of the peptide (Parker et al. 1994; Meister et al. 1995; Stryhn et al. 1996). Similar approaches are used by the EpiMatrix method (Schafer et al. 1998), the BIMAS method (Parker et al. 1994), and the SYFPEITHI method (Rammensee et al. 1999). These predictions, however, fail to recognize correlated effects, in which the binding affinity of a given amino acid at one position is influenced by amino acids at other positions in the peptide. Two adjacent amino acids may, for example, compete for the space in a pocket in the MHC molecule. Artificial neural networks (ANNs) are ideally suited to take such correlations into account, and neural network methods for predicting whether or not a peptide binds MHC molecules have been developed previously (Brusic et al. 1994; S. Buus, S.L. Lauemøller, P. Worning, C. Kesmir, T. Frimurer, S. Corbet, A. Fomsgaard, J. Hilden, A. Holm, and S. Brunak, in prep.). Brusic et al. (1994) use a conventional sparse (orthogonal) encoding of the 20-letter amino acid alphabet as well as 6- and 9-letter reduced alphabets. The conventional sparse encoding of the amino acids ignores their chemical similarities.
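To make the two ideas above concrete, the following minimal Python sketch illustrates an additive, position-independent scoring scheme and the conventional sparse (orthogonal) encoding. It is not taken from any of the cited methods. The function names, the example 9-mer, and all matrix values are hypothetical and chosen for illustration only.

```python
# Illustrative sketch only: names and numbers below are assumptions,
# not values from BIMAS, SYFPEITHI, EpiMatrix, or any published matrix.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids

def sparse_encode(peptide):
    """Sparse (orthogonal) encoding: 20 binary inputs per residue, one set to 1."""
    vector = []
    for residue in peptide:
        one_hot = [0] * len(AMINO_ACIDS)
        one_hot[AMINO_ACIDS.index(residue)] = 1
        vector.extend(one_hot)
    return vector

def additive_score(peptide, matrix):
    """Sum per-position contributions, treating each position independently."""
    return sum(matrix[pos].get(residue, 0.0) for pos, residue in enumerate(peptide))

def dot(u, v):
    """Plain dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))

# Hypothetical scoring matrix for 9-mers: one dict of residue scores per position.
toy_matrix = [{} for _ in range(9)]
toy_matrix[1] = {"L": 2.0, "M": 1.5}  # made-up values for position 2
toy_matrix[8] = {"V": 2.0, "L": 1.0}  # made-up values for position 9

peptide = "ILKEPVHGV"  # arbitrary example 9-mer
encoded = sparse_encode(peptide)
score = additive_score(peptide, toy_matrix)

# In the sparse encoding, any two different residues are orthogonal, so
# leucine is no closer to isoleucine than it is to aspartate.
leu_ile = dot(sparse_encode("L"), sparse_encode("I"))
leu_asp = dot(sparse_encode("L"), sparse_encode("D"))
```

Because the score is a plain sum over positions, a scheme of this kind cannot represent the case in which the contribution of one residue depends on its neighbor. The equal overlap of zero between any pair of distinct residues shows what is meant by the sparse encoding ignoring chemical similarity.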
Here we use a combination of several sequence encoding strategies to take these similarities explicitly into account. In addition to the conventional sparse encoding, the encoding schemes are defined in terms of Blosum matrices and hidden Markov models.
More detailed predictions of peptide binding have been made by dividing binding affinities into classes of affinity ranges; by inverting the networks, it was found that the different classes are associated with different binding sequence motifs (Adams and Koziol 1995). Neural networks have also been trained to predict MHC binding using different affinity thresholds (Gulukota et al. 1997). Mamitsuka (1998) trained the transition and emission probabilities of a fully connected hidden Markov model using a steepest descent algorithm so as to minimize the differences between the predicted and target probabilities for each peptide. With this method he obtained better results than with neural networks or hidden Markov models. We have previously developed matrix methods (Lauemøller et al. 2001) and ANNs that are special in that they are trained to predict quantitative (continuous) values for binding affinities between peptides and the human MHC molecule HLA-A2 (S. Buus, S.L. Lauemøller, P. Worning, C. Kesmir, T. Frimurer, S. Corbet, A. Fomsgaard, J. Hilden, A. Holm, and S. Brunak, in prep.). Buus et al. have demonstrated that neural networks trained to perform quantitative predictions of peptide/MHC binding are superior to conventional classification neural networks trained to predict binding versus nonbinding.
In this paper we describe an improved method that extends the neural network approach (described by S. Buus, S.L. Lauemøller, P. Worning, C. Kesmir, T. Frimurer, S. Corbet, A. Fomsgaard, J. Hilden, A. Holm, and S. Brunak, in prep.). The method combines several neural networks, each defined using a different sequence encoding strategy, including a hidden Markov model encoding, to achieve a more accurate prediction of the peptide/MHC binding affinity.