Direct out-of-memory distributed parallel frequent pattern mining

  • Z. Rong
  • J. De Knijf

    Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

    Abstract

    Frequent itemset mining is a well-studied and important problem in the data mining community. An abundance of different mining algorithms exists, each with its own flavor and characteristics, but almost all suffer from two major shortcomings. First, frequent itemset mining algorithms in general perform an exhaustive search over a huge pattern space. Second, most algorithms assume that the input data fits into main memory. The first problem was recently tackled in the work of [2], by directly sampling the required number of patterns from the pattern space. This paper extends the direct sampling approach by casting the algorithm into the MapReduce framework, effectively removing the requirement that the data fit into main memory. The results show that the algorithm scales well for large data sets, while the memory requirements depend solely on the required number of patterns in the output.
    Original language: English
    Title of host publication: Proceedings of the 2nd International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (BigMine'13, Chicago IL, USA, August 11, 2013; in conjunction with SIGKDD'13)
    Place of Publication: New York NY
    Publisher: Association for Computing Machinery, Inc.
    Pages: 55-62
    ISBN (Print): 978-1-4503-2324-6
    Publication status: Published - 2013
