Mining frequent itemsets in a stream

T.G.K. Calders, N. Dexters, J.J.M. Gillis, B. Goethals

    Research output: Contribution to journal, Article, Academic, peer-reviewed

    52 Citations (Scopus)

    Abstract

    Mining frequent itemsets in a data stream is a difficult problem, as itemsets arrive in rapid succession and storing parts of the stream is typically impossible. Nonetheless, it has many useful applications, e.g., opinion and sentiment analysis of social networks. Current stream mining algorithms are based on approximations. In earlier work, mining frequent items in a stream under the max-frequency measure proved to be effective. In this paper, we extend that work from items to itemsets. First, we present an optimized incremental algorithm for mining frequent itemsets in a stream; the algorithm maintains a very compact summary of the stream for selected itemsets. Second, we show that further compacting the summary is non-trivial. Third, we establish a connection between the size of a summary and results from number theory. Fourth, we report the results of extensive experiments on both synthetic and real-world datasets, showing the efficiency of the algorithm in terms of both time and space.
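
The abstract names the max-frequency measure without defining it. The sketch below is a naive illustration only, assuming that the max-frequency of an itemset is its highest relative frequency over all windows that end at the most recent transaction, optionally restricted to windows of at least a minimum length. The function name `max_frequency` and the parameter `min_window` are hypothetical, and the code stores the whole stream, which is exactly what the compact summary described in the abstract is meant to avoid.

```python
from typing import Iterable, Sequence


def max_frequency(
    stream: Sequence[Iterable[str]],
    itemset: Iterable[str],
    min_window: int = 1,
) -> float:
    """Naive max-frequency of `itemset` over all windows ending at the last transaction.

    Assumption (not taken from the abstract): a window is the last k
    transactions of the stream, for every k >= min_window.
    """
    target = frozenset(itemset)
    hits = 0      # transactions in the current window that contain the itemset
    best = 0.0    # highest relative frequency seen so far

    # Grow the window backwards from the most recent transaction.
    for length, transaction in enumerate(reversed(stream), start=1):
        if target <= set(transaction):
            hits += 1
        if length >= min_window:
            best = max(best, hits / length)
    return best


if __name__ == "__main__":
    stream = [{"a", "b"}, {"c"}, {"a", "b", "c"}, {"a", "b"}]
    print(max_frequency(stream, {"a", "b"}, min_window=3))
```

With `min_window=3`, the candidate windows are the last three and the last four transactions, so the result is the larger of 2/3 and 3/4. This recomputation costs time linear in the stream length per query; it is meant to make the measure concrete, not to reproduce the incremental algorithm of the paper.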
    Original language: English
    Pages (from-to): 233-255
    Journal: Information Systems
    Volume: 39
    Issue number: 1
    Status: Published - 2014
