Keywords:

  • feature selection;
  • high-dimension feature;
  • oligopeptides;
  • quantitative sequence–activity model;
  • support vector machine

Five hundred and thirty-one physicochemical property parameters of amino acids were used directly as descriptors to characterize the structure of oligopeptides. Based on support vector regression (SVR), a novel rapid selection method, the binary matrix resetting filter (BMRF), was proposed for nonlinear selection of high-dimensional features; multiround last-elimination (MRLE) was then used for fine screening. The retained descriptors were used to construct an SVR regression model, which was applied to quantitative sequence–activity model (QSAM) analysis of two oligopeptide systems. Compared with 16 widely used kinds of amino acid descriptors, four QSAM modeling methods, and four feature selection methods, this approach shows a significant improvement in modeling performance, especially in external prediction. Furthermore, the biochemical significance of each retained descriptor can be stated directly, which significantly improves the interpretability of the established QSAM model. The method has strong potential as a tool for regression analysis of high-dimensional data, such as QSAM modeling of peptides or even proteins.
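To make the descriptor step concrete, the following minimal sketch shows one plausible reading of using per-residue properties "directly" as descriptors: each residue's property vector is concatenated in sequence order to form one feature row per peptide. The abstract does not spell out the encoding, so this layout is an assumption. The names `TOY_PROPERTIES` and `encode_peptide` are hypothetical, and the numeric values are arbitrary placeholders rather than real physicochemical data. BMRF and MRLE are not sketched, since their details are not given here.

```python
from typing import Dict, List

# Hypothetical toy table: 3 residues x 2 properties.
# The values are arbitrary placeholders, NOT real physicochemical data;
# the abstract describes 531 properties per amino acid.
TOY_PROPERTIES: Dict[str, List[float]] = {
    "A": [0.1, 1.0],
    "G": [0.2, 2.0],
    "V": [0.3, 3.0],
}


def encode_peptide(sequence: str, table: Dict[str, List[float]]) -> List[float]:
    """Concatenate per-residue property vectors in sequence order.

    Assumed encoding: a peptide of length n with p properties per residue
    yields a feature row of length n * p.
    """
    features: List[float] = []
    for residue in sequence:
        if residue not in table:
            raise KeyError(f"no properties for residue {residue!r}")
        features.extend(table[residue])
    return features


# One feature row for the dipeptide "AV": properties of A, then of V.
dipeptide_features = encode_peptide("AV", TOY_PROPERTIES)
```

Under this assumed layout, a peptide of length n described by 531 properties per residue would give 531 × n candidate columns, which is why a feature selection step precedes the final regression model.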