Improved pan-specific MHC class I peptide-binding predictions using a novel representation of the MHC-binding cleft environment

Authors

  • S. Carrasco Pro,

    1. Laboratorio de Bioinformática y Biología Molecular, Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
  • M. Zimic,

    1. Laboratorio de Bioinformática y Biología Molecular, Laboratorios de Investigación y Desarrollo, Facultad de Ciencias y Filosofía, Universidad Peruana Cayetano Heredia, Lima, Peru
  • M. Nielsen

    Corresponding author
    1. Center for Biological Sequence Analysis, Department of Systems Biology, Technical University of Denmark, Lyngby, Denmark
    2. Instituto de Investigaciones Biotecnológicas, Universidad Nacional de San Martín, Buenos Aires, Argentina
    • Correspondence
      Morten Nielsen
      Center for Biological Sequence Analysis
      Department of Systems Biology
      Technical University of Denmark
      Building 208, Kemitorvet
      Lyngby, Denmark
      Tel: +45 4525 2425
      Fax: +45 4593 1585
      e-mail: mniel@cbs.dtu.dk


Abstract

Major histocompatibility complex (MHC) molecules play a key role in cell-mediated immune responses by presenting bound peptides for recognition by cells of the immune system. Several in silico methods have been developed to predict the binding affinity of a given peptide to a specific MHC molecule. One of the current state-of-the-art methods for MHC class I is NetMHCpan, whose core ingredient is a pseudo-sequence representation of the amino acid environment of the MHC class I binding cleft. New and large MHC–peptide-binding data sets are constantly being made available, and new structures of MHC class I molecules with a bound peptide have also been published. To test whether the NetMHCpan method can be improved by integrating this new information, we created new pseudo-sequence definitions for the MHC-binding cleft environment from sequence and structural analyses of different MHC data sets, including human leukocyte antigen (HLA), non-human primate (chimpanzee, macaque and gorilla) and other animal (cattle, mouse and swine) alleles. Using these constructs, we showed that by focusing on MHC sequence positions found to be polymorphic across the MHC molecules used to train the method, NetMHCpan achieved a significant increase in predictive performance, in particular for non-human MHCs. This study hence shows that improved performance of MHC-binding methods can be achieved not only by accumulating more MHC–peptide-binding data but also by refining the definition of the MHC-binding environment to include information from non-human species.
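
The idea of restricting a pseudo-sequence to polymorphic positions can be illustrated with a minimal sketch. This is not the NetMHCpan implementation: the function names (`polymorphic_positions`, `pseudo_sequences`), the toy aligned sequences and the list of candidate cleft positions are all assumptions made up for illustration. The sketch keeps only those candidate alignment columns where more than one residue is observed across the training alleles, then concatenates the residues at those columns into one pseudo-sequence per allele.

```python
from typing import Dict, List


def polymorphic_positions(aligned: Dict[str, str], candidates: List[int]) -> List[int]:
    """Return the candidate columns (0-based) where more than one residue occurs.

    `aligned` maps a (hypothetical) allele name to its aligned sequence;
    `candidates` lists columns assumed to line the binding cleft.
    """
    seqs = list(aligned.values())
    if not seqs:
        return []
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    # A column is polymorphic if the set of residues seen there has size > 1.
    return [p for p in candidates if len({s[p] for s in seqs}) > 1]


def pseudo_sequences(aligned: Dict[str, str], positions: List[int]) -> Dict[str, str]:
    """Concatenate the residues at `positions` into one pseudo-sequence per allele."""
    return {name: "".join(seq[p] for p in positions) for name, seq in aligned.items()}


# Toy, made-up aligned fragments (not real MHC sequences).
toy_alleles = {
    "allele_A": "GSHSMRYF",
    "allele_B": "GSHSLRYF",
    "allele_C": "GPHSMKYF",
}
# Hypothetical candidate cleft columns.
contact_candidates = [0, 1, 4, 5, 7]

positions = polymorphic_positions(toy_alleles, contact_candidates)
pseudo = pseudo_sequences(toy_alleles, positions)
print(positions)
print(pseudo)
```

In this toy example, columns 0 and 7 are conserved across the three sequences and are therefore dropped, so each allele is described only by the positions that can distinguish it from the others.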
