A preliminary version appeared as ‘MARBLES: Mining Association Rules Buried in Long Event Sequences’, in Proceedings of the SIAM International Conference on Data Mining (SDM), 2012 [1].
Article first published online: 12 AUG 2013
© 2013 Wiley Periodicals, Inc.
Statistical Analysis and Data Mining: The ASA Data Science Journal
Volume 7, Issue 2, pages 93–110, April 2014
How to Cite
Cule, B., Tatti, N. and Goethals, B. (2014), MARBLES: Mining association rules buried in long event sequences. Statistical Analy Data Mining, 7: 93–110. doi: 10.1002/sam.11199
This article is a part of the special issue based on the Best of SDM 2012, Statistical Analysis and Data Mining, volume 7, issue 1.
- Issue published online: 23 APR 2014
- Manuscript Revised: 1 JUL 2013
- Manuscript Accepted: 1 JUL 2013
- Manuscript Received: 20 JUL 2012
Keywords
- association rules;
- closed patterns;
- confidence boost;
- sequential data
Sequential pattern discovery is a well-studied field in data mining. Episodes are sequential patterns that describe events that often occur in the vicinity of each other. Episodes can impose restrictions on the order of the events, which makes them a versatile technique for describing complex patterns in the sequence. Most of the research on episodes deals with special cases such as serial and parallel episodes, while discovering general episodes is surprisingly understudied. This is particularly true when it comes to discovering association rules between them.
In this paper, we propose an algorithm that mines association rules between two general episodes. On top of the traditional definitions of frequency and confidence, we introduce two novel confidence measures for the rules. The major challenge in mining these association rules is pattern explosion. To limit the output, we aim to eliminate all redundant rules. We define the class of closed association rules and show that this class contains all non-redundant output. To make the algorithm efficient, we use further pruning steps along the way. First, we generate only free and closed frequent episodes, from which we create candidate rules. We then speed up the evaluation of the rules and prune the remaining non-closed rules from the output. Finally, we provide the user with the additional option of using a confidence boost threshold to remove the less informative rules from the output.
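To make the traditional notions of frequency and confidence concrete, the following is a minimal sketch, not the algorithm proposed in the paper. It assumes the common fixed-window definition: the frequency of an episode is the number of sliding windows that contain it, and the confidence of a rule X ⇒ Y, where Y extends X, is the ratio of the two frequencies. It further assumes serial episodes only (a fixed order of events), one event per time stamp, and full windows only. The function names `occurs_in_order`, `window_frequency` and `rule_confidence` are hypothetical, and the two novel confidence measures, general episodes, closedness and confidence boost are not covered.

```python
from typing import Hashable, Sequence


def occurs_in_order(window: Sequence[Hashable], episode: Sequence[Hashable]) -> bool:
    """Return True if the events of `episode` appear in `window` in the given order."""
    remaining = iter(window)
    # `event in remaining` advances the iterator, so each match must come
    # strictly after the previous one (a subsequence test).
    return all(event in remaining for event in episode)


def window_frequency(sequence: Sequence[Hashable], episode: Sequence[Hashable], width: int) -> int:
    """Count the full sliding windows of `width` consecutive events that contain `episode`."""
    return sum(
        occurs_in_order(sequence[start:start + width], episode)
        for start in range(len(sequence) - width + 1)
    )


def rule_confidence(
    sequence: Sequence[Hashable],
    antecedent: Sequence[Hashable],
    consequent: Sequence[Hashable],
    width: int,
) -> float:
    """Confidence of antecedent => consequent, assuming consequent extends antecedent."""
    antecedent_count = window_frequency(sequence, antecedent, width)
    if antecedent_count == 0:
        return 0.0  # assumption: a rule whose antecedent never occurs gets confidence 0
    return window_frequency(sequence, consequent, width) / antecedent_count


events = list("abcabdabc")
print(window_frequency(events, ("a", "b"), 3))
print(window_frequency(events, ("a", "b", "c"), 3))
print(rule_confidence(events, ("a", "b"), ("a", "b", "c"), 3))
```

In this toy sequence, the rule reads "when a is followed by b within a window of three events, how often is that window completed by c". Even such a small example shows why the number of candidate rules grows quickly: every frequent episode can serve as the antecedent of each of its frequent extensions.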