Get access

Time-decayed correlated aggregates over data streams

Authors


  • A preliminary version of this paper appeared in the Proceedings of the 9th SIAM International Conference on Data Mining (SDM) 2009.

Abstract

Data stream analysis frequently relies on identifying correlations and posing conditional queries on the data after it has been seen. Correlated aggregates form an important example of such queries, which ask for an aggregation over one dimension of stream elements which satisfy a predicate on another dimension. Since recent events are typically more important than older ones, time decay should also be applied to downweight less significant values. We present space-efficient algorithms as well as space lower bounds for the time-decayed correlated sum, a problem at the heart of many related aggregations. By considering different fundamental classes of decay functions, we separate cases where efficient approximations with relative error or additive error guarantees are possible, from other cases where linear space is necessary to approximate. In particular, we show that no efficient algorithms with relative error guarantees are possible for the popular sliding window and exponential decay models, resolving an open problem. This negative result for the exponential decay holds even if the stream is allowed to be processed in multiple passes. The results are surprising, since efficient approximations are known for other data stream problems under these decay models. This is a step toward better understanding which sophisticated queries can be answered on massive streams using limited memory and computation. Copyright © 2009 Wiley Periodicals, Inc. Statistical Analysis and Data Mining 2: 294-310, 2009

Get access to the full text of this article

Ancillary