Knowledge discovery in data streams with regression tree methods

This paper presents an advanced review of regression tree methods for mining data streams. Batch regression tree methods are known for their simplicity, interpretability, accuracy, and efficiency. They use fast, greedy divide-and-conquer algorithms that recursively partition the training data into smaller subsets. The result is a tree-shaped model with splitting rules in the internal nodes and predictions in the leaves. Most batch regression tree methods take a complete dataset and build a model from it; generally, the resulting tree cannot be updated when new data arrive later. Their successors, incremental model trees and interval trees, can build and update a model step by step, incorporating new numerical training instances as they become available. Moreover, these algorithms produce more compact and accurate models than batch regression tree algorithms because they use intervals or functional models together with a change detection mechanism, which makes them better suited to regression analysis of data streams. Finally, this review summarizes the reported performance of the reviewed methods and distills ten requirements for a successful implementation of a regression tree algorithm in the data stream mining area. © 2011 Wiley Periodicals, Inc.
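To make the contrast between batch and incremental tree learning concrete, the following is a minimal sketch, not taken from any of the reviewed algorithms, of a single streaming leaf. It assumes one numeric attribute, a fixed, hypothetical set of candidate thresholds, and standard deviation reduction (SDR) as the split criterion. SDR is one common choice for regression trees, but it is an assumption here. The leaf absorbs instances one at a time, keeping only running counts, sums, and sums of squares per candidate split, so it never needs to revisit past data. The class names `Stats` and `StreamingLeaf` are illustrative only.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Stats:
    """Running sufficient statistics (count, sum, sum of squares) of the target."""
    n: int = 0
    s: float = 0.0
    ss: float = 0.0

    def add(self, y: float) -> None:
        self.n += 1
        self.s += y
        self.ss += y * y

    def mean(self) -> float:
        return self.s / self.n if self.n else 0.0

    def sd(self) -> float:
        # Population standard deviation from the running sums; clamp tiny
        # negative values caused by floating-point cancellation.
        if self.n == 0:
            return 0.0
        var = self.ss / self.n - (self.s / self.n) ** 2
        return math.sqrt(max(var, 0.0))


@dataclass
class StreamingLeaf:
    """Hypothetical leaf that learns incrementally over one numeric attribute."""
    thresholds: list
    total: Stats = field(default_factory=Stats)
    split_stats: dict = field(default_factory=dict)

    def __post_init__(self) -> None:
        # For each candidate threshold t: statistics for x <= t and x > t.
        self.split_stats = {t: (Stats(), Stats()) for t in self.thresholds}

    def update(self, x: float, y: float) -> None:
        """Incorporate one training instance without storing it."""
        self.total.add(y)
        for t, (left, right) in self.split_stats.items():
            (left if x <= t else right).add(y)

    def predict(self) -> float:
        # Constant leaf prediction: mean of the targets seen so far.
        return self.total.mean()

    def best_split(self):
        """Return (threshold, SDR) of the candidate with the largest SDR."""
        best_t, best_sdr = None, 0.0
        n = self.total.n
        for t, (left, right) in self.split_stats.items():
            if left.n == 0 or right.n == 0:
                continue
            sdr = (self.total.sd()
                   - (left.n / n) * left.sd()
                   - (right.n / n) * right.sd())
            if sdr > best_sdr:
                best_t, best_sdr = t, sdr
        return best_t, best_sdr


# Usage: feed a small stream one instance at a time.
leaf = StreamingLeaf(thresholds=[2.5, 5.0, 7.5])
for x, y in [(1, 1.0), (2, 1.2), (3, 0.9), (6, 5.1), (7, 4.8), (8, 5.0)]:
    leaf.update(x, y)
print(leaf.predict(), leaf.best_split())
```

In a full incremental tree, a leaf like this would be split only once enough evidence has accumulated (for example, via a statistical bound on the SDR difference between the two best candidates), and a change detector would monitor prediction error to trigger adaptation. Both of those steps are omitted here.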