Z-BAG: A CLASSIFICATION ENSEMBLE SYSTEM WITH POSTERIOR PROBABILISTIC OUTPUTS

Authors

  • Zhonghui Xu

    1. Perinatology Research Branch, Department of Health and Human Services, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, and Detroit, Michigan, USA
    • These authors have contributed equally to this manuscript and should be considered joint first authors.

  • Călin Voichiţa

    1. Department of Computer Science, Wayne State University, Detroit, Michigan, USA
    • These authors have contributed equally to this manuscript and should be considered joint first authors.

  • Sorin Drăghici

    1. Perinatology Research Branch, Department of Health and Human Services, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, and Detroit, Michigan, USA
    2. Department of Computer Science, Wayne State University, Detroit, Michigan, USA
    3. Department of Obstetrics & Gynecology, and Department of Clinical and Translational Science, Wayne State University, Detroit, Michigan, USA
  • Roberto Romero

    1. Perinatology Research Branch, Department of Health and Human Services, National Institute of Child Health and Human Development, National Institutes of Health, Bethesda, Maryland, and Detroit, Michigan, USA
    2. Center for Molecular Medicine and Genetics, Wayne State University, Detroit, Michigan, USA

Sorin Draghici, Department of Computer Science, Wayne State University, 5057 Woodward Ave., Suite 3010, Detroit, MI 48202, USA; e-mail: sorin@wayne.edu.

Abstract

Ensemble systems improve on the generalization of single classifiers by aggregating the predictions of a set of base classifiers. Assessing classification reliability (posterior probability) is crucial in a number of applications, such as biomedical and diagnostic applications, where the cost of a misclassified input vector can be unacceptably high. Available methods are limited to either calibrating the posterior probability on an aggregated decision value or obtaining a posterior probability for each base classifier and aggregating the results. We propose a method that uses the distribution of the decision values from the base classifiers to compute a summary statistic, which is subsequently used to generate the posterior probability. Three approaches are considered for fitting the probabilistic output to the statistic: the standard Gaussian CDF, isotonic regression, and linear logistic regression. Even though this study focuses on a bagged support vector machine ensemble (Z-bag), our approach is not limited by the aggregation method selected, the choice of base classifiers, or the statistic used. Performance is assessed on one artificial and 12 real-world data sets from the UCI Machine Learning Repository. Our approach achieves generalization comparable to or better than that of existing ensemble calibration methods, in both accuracy and posterior estimation, while lowering computational cost.
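
As a rough illustration of the idea summarized in the abstract, the sketch below turns the decision values of a set of base classifiers into a single statistic and maps it to a probability with the standard Gaussian CDF. The abstract does not define the statistic, so the choice here (mean of the decision values divided by their sample standard deviation) is an assumption, and the function names `z_statistic` and `posterior_positive` are hypothetical. The decision values are plain numbers standing in for the outputs of bagged classifiers on one input vector.

```python
import math
import statistics


def gaussian_cdf(z):
    """Standard Gaussian CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def z_statistic(decision_values):
    """Summarize base-classifier decision values as mean / sample std.

    ASSUMPTION: this particular statistic is illustrative only; the
    abstract does not specify which statistic the method uses.
    """
    if len(decision_values) < 2:
        raise ValueError("need at least two base-classifier decision values")
    mean = statistics.fmean(decision_values)
    sd = statistics.stdev(decision_values)
    if sd == 0.0:
        # All base classifiers agree exactly: saturate in the direction
        # of the shared decision value (or stay neutral if it is zero).
        return 0.0 if mean == 0.0 else math.copysign(math.inf, mean)
    return mean / sd


def posterior_positive(decision_values):
    """Map the statistic to an estimated P(class = +1) via the Gaussian CDF."""
    return gaussian_cdf(z_statistic(decision_values))


# Hypothetical decision values from five base classifiers for one input.
values = [0.8, 1.1, 0.4, 0.9, 1.3]
print(round(posterior_positive(values), 3))
```

Consistently positive decision values with little spread push the estimate toward 1, while values scattered around zero keep it near 0.5. The isotonic regression and linear logistic alternatives named in the abstract would replace `gaussian_cdf` with a mapping fitted on held-out data.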
