ROC-Based Utility Function Maximization for Feature Selection and Classification with Applications to High-Dimensional Protease Data

Authors

  • Zhenqiu Liu,

    Corresponding author
    1. Division of Biostatistics, University of Maryland Greenebaum Cancer Center, 22 South Greene Street, Baltimore, Maryland 21201, U.S.A.
      email: zliu@umm.edu
  • Ming Tan

    1. Division of Biostatistics, University of Maryland Greenebaum Cancer Center, 22 South Greene Street, Baltimore, Maryland 21201, U.S.A.

Abstract

Summary: In medical diagnosis, the diseased and nondiseased classes are usually unbalanced, and one class may be more important than the other depending on the purpose of the diagnosis. Most standard classification methods, however, are designed to maximize overall accuracy and cannot explicitly assign different costs to different classes. In this article, we propose a novel nonparametric method to directly maximize the weighted specificity and sensitivity of the receiver operating characteristic curve. Combining advances in machine learning, optimization theory, and statistics, the proposed method has excellent generalization properties and assigns different error costs to different classes explicitly. We present experiments that compare the proposed algorithms with support vector machines and regularized logistic regression using data from a study on HIV-1 protease as well as six publicly available datasets. Our main conclusion is that the performance of the proposed algorithm is significantly better in most cases than that of the other classifiers tested. A software package in MATLAB is available upon request.
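
To make the idea of a weighted sensitivity and specificity concrete, the following is a minimal sketch in Python rather than the MATLAB package mentioned above. It assumes the utility takes the simple form w * sensitivity + (1 - w) * specificity at a fixed decision threshold; the abstract does not give the exact formula, so this form, the function name `weighted_utility`, the weight `w`, and the toy data are all illustrative assumptions and not the authors' method.

```python
import numpy as np

def weighted_utility(y_true, scores, threshold, w=0.5):
    """Assumed utility: w * sensitivity + (1 - w) * specificity.

    y_true    : array of 0/1 labels (1 = diseased, 0 = nondiseased)
    scores    : classifier scores, larger means more likely diseased
    threshold : scores >= threshold are predicted diseased
    w         : hypothetical weight on sensitivity, between 0 and 1
    """
    y_true = np.asarray(y_true)
    pred = np.asarray(scores) >= threshold
    pos = y_true == 1
    neg = ~pos
    sensitivity = np.mean(pred[pos])    # true positive rate
    specificity = np.mean(~pred[neg])   # true negative rate
    return w * sensitivity + (1 - w) * specificity

# Toy unbalanced data: 3 diseased, 5 nondiseased (made up for illustration).
y = [1, 1, 1, 0, 0, 0, 0, 0]
s = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.2, 0.1]

# Weighting sensitivity more heavily favors the lower threshold here.
u_high = weighted_utility(y, s, threshold=0.5, w=0.8)
u_low = weighted_utility(y, s, threshold=0.35, w=0.8)
print(u_high, u_low)
```

With equal weights the utility reduces to balanced accuracy; raising `w` penalizes missed diseased cases more heavily than false alarms, which is one way different error costs can be attached to the two classes.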
