Mining erasable itemsets is an emerging data mining task. In this paper, we present a new data representation called a PID_list, which keeps track of the id_nums (identification numbers) of the products that include an itemset. On the basis of the PID_list, we propose a new algorithm called VM for mining top-rank-k erasable itemsets efficiently. The VM algorithm avoids both the time-consuming process of calculating the gain of candidate itemsets and repeated scans of the database, which greatly accelerates mining. To evaluate the VM algorithm, we conducted experiments on six synthetic product databases. Our performance study shows that the VM algorithm is efficient and much faster than the MIKE algorithm, the first algorithm for the problem of mining top-rank-k erasable itemsets.
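
As a rough illustration of the PID_list idea, the following Python sketch builds one PID_list per item in a single pass over a toy product database and then obtains the gain of a larger itemset by merging lists instead of rescanning. It is not the VM algorithm itself. It assumes the usual definition from the erasable-itemset literature, in which the gain of an itemset is the total profit of the products that contain at least one of its items. The sample data and the names `PRODUCTS`, `build_pid_lists`, `merge` and `gain` are hypothetical.

```python
# Hypothetical toy product database: product id -> (component items, profit).
PRODUCTS = {
    1: ({"a", "b"}, 100),
    2: ({"b", "c"}, 200),
    3: ({"c"}, 50),
    4: ({"a", "d"}, 150),
}


def build_pid_lists(products):
    """Scan the database once and map each item to {product id: profit}."""
    pid_lists = {}
    for pid, (items, profit) in products.items():
        for item in items:
            pid_lists.setdefault(item, {})[pid] = profit
    return pid_lists


def merge(pid_list_x, pid_list_y):
    """PID_list of the combined itemset.

    Assumption: it is the union of the two input PID_lists, so a product
    that appears in both lists is counted only once.
    """
    return {**pid_list_x, **pid_list_y}


def gain(pid_list):
    """Gain of an itemset, read directly from its PID_list (assumed definition)."""
    return sum(pid_list.values())


pid_lists = build_pid_lists(PRODUCTS)
gain_a = gain(pid_lists["a"])
gain_ab = gain(merge(pid_lists["a"], pid_lists["b"]))
```

On this toy data, `gain_a` is 250 (products 1 and 4) and `gain_ab` is 450 (products 1, 2 and 4, with product 1 counted once). After the initial pass, no further scan of the database is needed to evaluate a candidate itemset.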