Mining Top-Rank-k Erasable Itemsets by PID_lists

Authors

  • Zhihong Deng

    Corresponding author
    • Key Laboratory of Machine Perception (Ministry of Education), School of Electronics Engineering and Computer Science, Peking University, Beijing 100871, People's Republic of China

Author to whom all correspondence should be addressed: e-mail: zhdeng@cis.pku.edu.cn.

Abstract

Mining erasable itemsets is an emerging data mining task. In this paper, we present a new data representation called a PID_list, which keeps track of the id_nums (identification numbers) of the products that include an itemset. On the basis of the PID_list, we propose a new algorithm called VM for mining top-rank-k erasable itemsets efficiently. The VM algorithm avoids both the time-consuming process of calculating the gain of candidate itemsets and repeated scans of the database, and therefore greatly accelerates the mining task. To evaluate the VM algorithm, we conducted experiments on six synthetic product databases. Our performance study shows that the VM algorithm is efficient and much faster than the MIKE algorithm, which is the first algorithm for the problem of mining top-rank-k erasable itemsets.
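
The PID_list idea can be illustrated with a small sketch. It is not the VM algorithm itself, which the abstract does not spell out, but a brute-force enumeration built on several assumptions: a product "includes" an itemset when it uses at least one of the itemset's items; the gain of an itemset is the total profit of those products; the PID_list of an itemset is the union of its items' PID_lists; and the top-rank-k itemsets are those whose gain is among the k smallest distinct gain values. The toy database and all names (PRODUCTS, build_item_pid_lists, pid_list_of, gain, top_rank_k) are hypothetical.

```python
from itertools import combinations

# Hypothetical toy product database: product id -> (component items, profit).
PRODUCTS = {
    1: ({"a", "b"}, 100),
    2: ({"b", "c"}, 200),
    3: ({"c", "d"}, 50),
    4: ({"a", "d"}, 150),
}


def build_item_pid_lists(products):
    """Scan the database once and map each item to the ids of products using it."""
    pid_lists = {}
    for pid, (items, _profit) in products.items():
        for item in items:
            pid_lists.setdefault(item, set()).add(pid)
    return pid_lists


def pid_list_of(itemset, item_pid_lists):
    """Assumed combination rule: the PID_list of an itemset is the union
    of the PID_lists of its items, so no further database scan is needed."""
    result = set()
    for item in itemset:
        result |= item_pid_lists.get(item, set())
    return result


def gain(pid_list, products):
    """Assumed gain: total profit of the products in the PID_list."""
    return sum(products[pid][1] for pid in pid_list)


def top_rank_k(products, k, max_size=2):
    """Brute-force illustration (not VM): enumerate itemsets up to max_size
    and keep those whose gain is among the k smallest distinct gains."""
    item_pid_lists = build_item_pid_lists(products)
    items = sorted(item_pid_lists)
    gains = {}
    for size in range(1, max_size + 1):
        for itemset in combinations(items, size):
            gains[itemset] = gain(pid_list_of(itemset, item_pid_lists), products)
    kept_gains = set(sorted(set(gains.values()))[:k])
    return {itemset: g for itemset, g in gains.items() if g in kept_gains}
```

Under these assumptions, calling top_rank_k(PRODUCTS, 2) on the toy data keeps the itemsets {d} (gain 200) and {a}, {c} (gain 250 each). The point of the sketch is only that, once the per-item PID_lists are built, every candidate's gain follows from set unions rather than from rescanning the product database.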
