IMPROVING MULTILABEL CLASSIFICATION BY AVOIDING IMPLICIT NEGATIVITY WITH INCOMPLETE DATA

Authors


Abstract

Many real-world problems require multilabel classification, in which each training instance is associated with a set of labels. Many learning algorithms exist for multilabel classification; however, these algorithms assume implicit negativity, meaning that any label missing from the training data is automatically treated as negative. Additionally, many of the existing algorithms do not support incremental learning, in which new labels may be encountered later in the learning process. A novel multilabel adaptation of the backpropagation algorithm is proposed that does not assume implicit negativity. In addition, this algorithm can infer missing labels in the training data using a naïve Bayesian approach. It can also be trained incrementally, because it dynamically accommodates new labels. The proposed algorithm is compared with existing multilabel algorithms using data sets from multiple domains, and performance is measured with standard multilabel evaluation metrics. The results show that the proposed algorithm improves classification performance for all metrics by an overall average of 7.4% when at least 40% of the labels are missing from the training data, and by 18.4% when at least 90% of the labels are missing.
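To make the idea of avoiding implicit negativity concrete, the following minimal sketch contrasts two ways of computing a per-instance loss and its gradient for a sigmoid output layer. It is an illustration under stated assumptions, not the algorithm proposed in this paper: the encoding of labels as 1 (positive), 0 (explicit negative), and None (missing), the use of binary cross-entropy, and the function names masked_bce and implicit_negative_bce are all hypothetical choices made for this example. The naïve Bayesian inference of missing labels and the incremental handling of new labels are not shown.

```python
import math


def sigmoid(z):
    """Logistic function mapping a logit to a probability."""
    return 1.0 / (1.0 + math.exp(-z))


def masked_bce(logits, labels):
    """Binary cross-entropy over observed labels only.

    labels uses 1 = positive, 0 = explicit negative, None = missing
    (a hypothetical encoding chosen for this sketch).
    Returns (mean loss over observed labels, gradient w.r.t. each logit).
    """
    loss = 0.0
    grads = []
    observed = 0
    for z, y in zip(logits, labels):
        if y is None:
            # Missing label: contributes no loss and no error signal.
            grads.append(0.0)
            continue
        p = sigmoid(z)
        loss -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
        grads.append(p - y)  # derivative of BCE w.r.t. the logit
        observed += 1
    if observed:
        loss /= observed
        grads = [g / observed for g in grads]
    return loss, grads


def implicit_negative_bce(logits, labels):
    """Baseline for comparison: treat every missing label as negative."""
    filled = [0 if y is None else y for y in labels]
    return masked_bce(logits, filled)


# One instance with three candidate labels; only the first is annotated.
logits = [2.0, -1.0, 0.5]
labels = [1, None, None]

loss_masked, grad_masked = masked_bce(logits, labels)
loss_implicit, grad_implicit = implicit_negative_bce(logits, labels)
```

In this sketch, the masked variant sends no error signal to the two unannotated outputs, whereas the implicit-negativity baseline pushes both of them toward zero even though their true values are unknown.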

Ancillary