Mining frequent itemsets from transactional datasets is a well known problem with good algorithmic solutions. In the case of uncertain data, however, several new techniques have been proposed. Unfortunately, these proposals often suffer when a lot of items occur with many different probabilities. Here we propose an approach based on sampling by instantiating "possible worlds" of the uncertain data, on which we subsequently run optimized frequent itemset mining algorithms. As such we gain efficiency at a surprisingly low loss in accuracy. These is confirmed by a statistical and an empirical evaluation on real and synthetic data.
|Title of host publication||Advances in Knowledge Discovery and Data Mining (14th Pacific-Asia Conference, PAKDD 2010, Hyderabad, India, June 21-24, 2010. Proceedings, Part I)|
|Editors||M.J. Zaki, J.X. Yu, B. Ravindran, V. Pudi|
|Place of Publication||Berlin|
|Publication status||Published - 2010|
|Name||Lecture Notes in Computer Science|