Adaptive concept drift detection

Abstract

An established method to detect concept drift in data streams is to perform statistical hypothesis testing on the multivariate data in the stream. Statistical theory offers rank-based statistics for this task. However, these statistics depend on a fixed set of characteristics of the underlying distribution. Thus, they work well whenever the change in the underlying distribution affects the properties measured by the statistic, but they perform poorly if the drift influences the characteristics captured by the test statistic only to a small degree. To address this problem, we show how uniform convergence bounds in learning theory can be adjusted for adaptive concept drift detection. In particular, we present three novel drift detection tests whose test statistics are dynamically adapted to match the actual data at hand. The first is based on a rank statistic on density estimates for a binary representation of the data, the second compares average margins of a linear classifier induced by the 1-norm support vector machine (SVM), and the third is based on the average zero-one, sigmoid, or stepwise linear error rate of an SVM classifier. We compare these new approaches with the maximum mean discrepancy method, the StreamKrimp system, and the multivariate Wald–Wolfowitz test. The results indicate that the new methods detect concept drift reliably and that they perform favorably in a precision-recall analysis. Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 311-327, 2009
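The idea behind the third test, using a classifier's error rate as a data-adaptive drift statistic, can be illustrated with a minimal sketch. This is not the authors' procedure: it assumes a generic classifier two-sample test, swaps in a 2-norm linear SVM trained by Pegasos-style subgradient descent, and uses a plain Hoeffding bound in place of the uniform convergence bounds of the article. All function names and parameters (`train_linear_svm`, `detect_drift`, `gaussian_window`, `lam`, `delta`) are hypothetical.

```python
import math
import random


def train_linear_svm(points, labels, lam=0.01, epochs=20, seed=0):
    """Pegasos-style subgradient descent on the hinge loss (2-norm SVM).

    Assumption: this stands in for the SVM variants named in the abstract.
    The last weight acts as a (regularized) bias term.
    """
    rng = random.Random(seed)
    w = [0.0] * (len(points[0]) + 1)
    order = list(range(len(points)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decaying step size
            x = list(points[i]) + [1.0]
            margin = labels[i] * sum(wj * xj for wj, xj in zip(w, x))
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink (regularizer)
            if margin < 1.0:  # hinge loss is active for this example
                w = [wj + eta * labels[i] * xj for wj, xj in zip(w, x)]
    return w


def predict(w, x):
    score = sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1


def detect_drift(window_a, window_b, delta=0.01, seed=0):
    """Label the two windows -1/+1, train on half, test on the other half.

    Without drift the labels carry no information, so the held-out
    zero-one error should stay near 0.5. Drift is flagged when the error
    falls below 0.5 by more than a Hoeffding margin (an assumed,
    simplified threshold).
    """
    rng = random.Random(seed)
    data = [(x, -1) for x in window_a] + [(x, 1) for x in window_b]
    rng.shuffle(data)
    half = len(data) // 2
    train, test = data[:half], data[half:]
    w = train_linear_svm([x for x, _ in train], [y for _, y in train], seed=seed)
    error_rate = sum(1 for x, y in test if predict(w, x) != y) / len(test)
    eps = math.sqrt(math.log(1.0 / delta) / (2.0 * len(test)))
    return error_rate, error_rate < 0.5 - eps


def gaussian_window(n, mean, seed):
    """Synthetic 2-D window with unit variance, for illustration only."""
    rng = random.Random(seed)
    return [(rng.gauss(mean, 1.0), rng.gauss(mean, 1.0)) for _ in range(n)]


if __name__ == "__main__":
    reference = gaussian_window(200, 0.0, seed=1)
    shifted = gaussian_window(200, 3.0, seed=2)
    print(detect_drift(reference, shifted))
```

Because the classifier is retrained for each pair of windows, the statistic adapts to whatever feature separates them, which is the property the abstract contrasts with fixed rank-based statistics.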
