Fast and accurate detection of changes in data streams



Change detection is one of the most important tasks in time series analysis. When the series is very long, or when it is rapidly updated, it has to be treated as a stream. This means that the change detection algorithm must process each sample in O(1) time and memory. A good algorithm must be generic in terms of the types of changes it can detect. Above all, a good algorithm must offer a favorable and controllable ratio of the detection delay (the number of samples needed to detect a change) to the rate of false positives. We present a change-point detection algorithm called ProTO, which dynamically manages a set of candidate change-points whose expected size is a controllable constant. In terms of per-sample processing cost, ProTO is comparable with the fastest known algorithm, the Page-Hinkley Test (PHT). Yet, because PHT is limited to just one candidate, ProTO outperforms it in terms of the ratio of the delay to the false positive rate, as well as in terms of robustness. We provide variants of ProTO for detecting changes in the mean or the variance of the stream, and experiment with two realistic applications, as well as with synthetic data. On real problems, ProTO compares favorably with state-of-the-art algorithms implemented in the R-package, which require more than O(1) time per sample.
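
To make the O(1) time and memory requirement concrete, the following is a minimal sketch of the PHT baseline in its common textbook form for detecting an increase in the mean. It is not ProTO. The class name, the helper `first_alarm`, and the parameters `delta` (tolerated drift) and `threshold` (alarm level) are illustrative assumptions, not taken from this work.

```python
class PageHinkley:
    """One-sided Page-Hinkley test for an upward shift in the mean.

    Textbook form, shown for illustration only. `delta` and `threshold`
    are assumed tuning parameters.
    """

    def __init__(self, delta=0.05, threshold=20.0):
        self.delta = delta          # magnitude of change that is tolerated
        self.threshold = threshold  # alarm level for the test statistic
        self.n = 0                  # number of samples seen so far
        self.mean = 0.0             # running mean of the stream
        self.cum = 0.0              # cumulative deviation from the running mean
        self.cum_min = 0.0          # smallest cumulative deviation seen so far

    def update(self, x):
        """Process one sample in O(1) time and memory; return True on alarm."""
        self.n += 1
        self.mean += (x - self.mean) / self.n
        self.cum += x - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


def first_alarm(stream, **kwargs):
    """Return the index of the first alarm in `stream`, or None if none fires."""
    detector = PageHinkley(**kwargs)
    for i, x in enumerate(stream):
        if detector.update(x):
            return i
    return None


if __name__ == "__main__":
    # Synthetic stream: the mean jumps from 0 to 1 at index 100.
    stream = [0.0] * 100 + [1.0] * 100
    print(first_alarm(stream))
```

The detector keeps only a handful of scalars, so its state does not grow with the stream. The single running minimum `cum_min` plays the role of the one candidate change-point to which, as noted above, PHT is limited.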