Mining compressing sequential problems

T.L. Hoang, F. Mörchen, D. Fradkin, T.G.K. Calders

    Research output: Chapter in Book/Report/Conference proceedingConference contributionAcademicpeer-review

    2 Downloads (Pure)

    Abstract

    Compression based pattern mining has been successfully applied to many data mining tasks. We propose an approach based on the minimum description length principle to extract sequential patterns that compress a database of sequences well. We show that mining compressing patterns is NP-Hard and belongs to the class of inapproximable problems. We propose two heuristic algorithms to mining compressing patterns. The ¿rst uses a two-phase approach similar to Krimp for itemset data. To overcome performance with the required candidate generation we propose GoKrimp, an e¿ective greedy algorithm that directly mines compressing patterns. We conduct an empirical study on six real-life datasets to compare the proposed algorithms by run time, compressibility, and classi¿cation accuracy using the patterns found as features for SVM classi¿ers.
    Original languageEnglish
    Title of host publicationProceedings of the Twelfth SIAM International Conference on Data Mining (SDM 2012, Anaheim CA, USA, April 26-28, 2012)
    PublisherSociety for Industrial and Applied Mathematics (SIAM)
    Pages319-330
    ISBN (Print)978-1-61197-232-
    Publication statusPublished - 2012
    Eventconference; Twelfth SIAM International Conference on Data Mining -
    Duration: 1 Jan 2012 → …

    Conference

    Conferenceconference; Twelfth SIAM International Conference on Data Mining
    Period1/01/12 → …
    OtherTwelfth SIAM International Conference on Data Mining

    Fingerprint

    Dive into the research topics of 'Mining compressing sequential problems'. Together they form a unique fingerprint.

    Cite this