Abstract
Frequent itemset mining is a well-studied and important problem in the data mining community. An abundance of mining algorithms exists, each with its own flavor and characteristics, but almost all suffer from two major shortcomings. First, frequent itemset mining algorithms generally perform an exhaustive search over a huge pattern space. Second, most algorithms assume that the input data fits into main memory. The first problem was recently tackled in [2] by directly sampling the required number of patterns from the pattern space. This paper extends the direct sampling approach by casting the algorithm into the MapReduce framework, effectively removing the requirement that the data fit into main memory. The results show that the algorithm scales well to large data sets, while its memory requirements depend solely on the required number of patterns in the output.
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the 2nd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (BigMine'13, Chicago IL, USA, August 11, 2013; in conjunction with SIGKDD'13) |
| Place of Publication | New York NY |
| Publisher | Association for Computing Machinery, Inc. |
| Pages | 55-62 |
| ISBN (Print) | 978-1-4503-2324-6 |
| DOIs | |
| Publication status | Published - 2013 |
| Title | Direct out-of-memory distributed parallel frequent pattern mining |