Knowledge discovery in data streams with regression tree methods
Article first published online: 31 OCT 2011
Copyright © 2011 John Wiley & Sons, Inc.
Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
Volume 2, Issue 1, pages 69–78, January/February 2012
How to Cite
Alberg, D., Last, M. and Kandel, A. (2012), Knowledge discovery in data streams with regression tree methods. WIREs Data Mining Knowl Discov, 2: 69–78. doi: 10.1002/widm.51
- Issue published online: 19 DEC 2011
This paper presents an advanced review of regression tree methods for mining data streams. Batch regression tree methods are known for their simplicity, interpretability, accuracy, and efficiency. They use fast, greedy divide-and-conquer algorithms that recursively partition the training data into smaller subsets. The result is a tree-shaped model with splitting rules in the internal nodes and predictions in the leaves. Most batch regression tree methods take a complete dataset and build a model from it; in general, the resulting tree cannot be modified when new data are acquired later. Their successors, the incremental model tree and interval tree algorithms, can build and retrain a model step by step by incorporating new numerical training instances as they become available. Moreover, these algorithms produce more compact and more accurate models than batch regression tree algorithms because they use intervals or functional models together with a change detection mechanism, which makes them a more suitable choice for regression analysis of data streams. Finally, this review summarizes the performance results of the reviewed methods and distills 10 requirements for the successful implementation of a regression tree algorithm in the data stream mining area. © 2011 Wiley Periodicals, Inc.
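As an illustration of the greedy divide-and-conquer partitioning described in the abstract, the following minimal Python sketch builds a batch regression tree on a single numeric feature. It is not one of the reviewed algorithms: the split criterion (sum of squared errors), the function names `sse`, `best_split`, `build_tree`, and `predict`, and the parameters `min_leaf` and `max_depth` are all assumptions chosen for illustration.

```python
from statistics import mean


def sse(ys):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = mean(ys)
    return sum((y - m) ** 2 for y in ys)


def best_split(xs, ys, min_leaf):
    """Greedy search for the threshold that minimises the children's total SSE.

    Returns (threshold, error), or None if no split leaves at least
    `min_leaf` instances on each side.
    """
    best = None
    values = sorted(set(xs))
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2  # candidate threshold between adjacent values
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue
        err = sse(left) + sse(right)
        if best is None or err < best[1]:
            best = (t, err)
    return best


def build_tree(xs, ys, min_leaf=2, depth=0, max_depth=3):
    """Recursively partition the data; leaves predict the mean target."""
    split = best_split(xs, ys, min_leaf) if depth < max_depth else None
    # Stop when no valid split exists or the split does not reduce the error.
    if split is None or split[1] >= sse(ys):
        return {"predict": mean(ys)}
    t = split[0]
    left = [(x, y) for x, y in zip(xs, ys) if x <= t]
    right = [(x, y) for x, y in zip(xs, ys) if x > t]
    return {
        "threshold": t,  # splitting rule stored in the internal node
        "left": build_tree([x for x, _ in left], [y for _, y in left],
                           min_leaf, depth + 1, max_depth),
        "right": build_tree([x for x, _ in right], [y for _, y in right],
                            min_leaf, depth + 1, max_depth),
    }


def predict(tree, x):
    """Follow the splitting rules from the root down to a leaf."""
    while "threshold" in tree:
        tree = tree["left"] if x <= tree["threshold"] else tree["right"]
    return tree["predict"]
```

Because `build_tree` needs the complete dataset before it can choose any split, the resulting model has to be rebuilt from scratch when new instances arrive. This is the limitation of batch methods that the incremental algorithms discussed in the review are designed to address.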