Finding robust itemsets under subsampling

N. Tatti, F. Moerchen, T.G.K. Calders

    Onderzoeksoutput: Bijdrage aan tijdschriftTijdschriftartikelAcademicpeer review

    11 Citaten (Scopus)
    2 Downloads (Pure)

    Samenvatting

    Mining frequent patterns is plagued by the problem of pattern explosion, making pattern reduction techniques a key challenge in pattern mining. In this article we propose a novel theoretical framework for pattern reduction by measuring the robustness of a property of an itemset such as closedness or nonderivability. The robustness of a property is the probability that this property holds on random subsets of the original data. We study four properties, namely an itemset being closed, free, non-derivable, or totally shattered, and demonstrate how to compute the robustness analytically without actually sampling the data. Our concept of robustness has many advantages: Unlike statistical approaches for reducing patterns, we do not assume a null hypothesis or any noise model and, in contrast to noise-tolerant or approximate patterns, the robust patterns for a given property are always a subset of the patterns with this property. If the underlying property is monotonic then the measure is also monotonic, allowing us to efficiently mine robust itemsets. We further derive a parameter-free technique for ranking itemsets that can be used for top-k approaches. Our experiments demonstrate that we can successfully use the robustness measure to reduce the number of patterns and that ranking yields interesting itemsets. Keywords: Pattern reduction, robust itemsets, closed itemsets, free itemsets, non-derivable itemsets, totally shattered itemsets
    Originele taal-2Engels
    Pagina's (van-tot)20-1/27
    Aantal pagina's27
    TijdschriftACM Transactions on Database Systems
    Volume39
    Nummer van het tijdschrift3
    DOI's
    StatusGepubliceerd - 2014

    Vingerafdruk

    Duik in de onderzoeksthema's van 'Finding robust itemsets under subsampling'. Samen vormen ze een unieke vingerafdruk.

    Citeer dit