BFSPMiner: an effective and efficient batch-free algorithm for mining sequential patterns over data streams

M. Hassani (Corresponding author), D. Töws, A. Cuzzocrea, T. Seidl

Research output: Contribution to journal › Article › Academic › peer-review


Abstract

Supporting sequential pattern mining over data streams is currently a relevant problem in data stream mining research. Current proposals in the literature are based on the well-known PrefixSpan approach and are indeed able to effectively bound the error of the discovered patterns. This approach divides the target stream into a collection of manageable chunks (batches), i.e., pieces of the stream, in order to gain effectiveness and efficiency. Unfortunately, mining patterns from stream chunks introduces additional errors with respect to the basic application scenario, in which the target stream is mined continuously in a non-batch manner. There are several reasons for this. First, since batches are processed individually, patterns that contain items from two consecutive batches are lost. Second, in most batch-based approaches, the decision about the frequency of a pattern is made locally inside a single batch. Thus, if a pattern is frequent in the stream but its items are scattered over different batches, it will be continuously pruned out and will never become frequent, because the algorithm lacks a “complete-picture” perspective. To address these problems, this paper introduces and experimentally assesses BFSPMiner, a Batch-Free Sequential Pattern Miner algorithm for effectively and efficiently mining patterns in streams without being constrained to traditional batch-based processing. This allows us, for instance, to discover frequent patterns that would be lost under alternative batch-based stream mining processing models. We complement our analytical contributions with a comprehensive experimental campaign that evaluates BFSPMiner on real-world data stream sets and compares it with current batch-based stream sequential pattern mining algorithms.
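The two failure modes described in the abstract (patterns cut by batch boundaries, and patterns pruned by batch-local frequency decisions) can be made concrete with a toy example. The sketch below is not BFSPMiner. It assumes a deliberately simplified pattern model in which a "pattern" is an ordered pair of consecutive events, and the function names (`batch_based_counts`, `batch_free_counts`, `frequent`) and thresholds are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Toy pattern model (an assumption for illustration, not BFSPMiner's model):
# a "pattern" is an ordered pair of consecutive events.

def batch_based_counts(stream, batch_size, local_min_sup):
    """Mine each batch in isolation and keep only locally frequent pairs."""
    totals = Counter()
    for start in range(0, len(stream), batch_size):
        batch = stream[start:start + batch_size]
        local = Counter(zip(batch, batch[1:]))  # pairs never cross the batch edge
        for pattern, count in local.items():
            # Local decision: a pair infrequent in this batch is discarded,
            # so its occurrences here never reach the global count.
            if count >= local_min_sup:
                totals[pattern] += count
    return totals

def batch_free_counts(stream):
    """Process events one at a time, carrying state across any would-be batch edge."""
    counts = Counter()
    previous = None
    for event in stream:
        if previous is not None:
            counts[(previous, event)] += 1
        previous = event
    return counts

def frequent(counts, min_sup):
    """Return the set of patterns whose count reaches the global threshold."""
    return {pattern for pattern, count in counts.items() if count >= min_sup}

stream = ["x", "x", "x", "a",
          "b", "y", "y", "a",
          "b", "z", "a", "b"]

free = batch_free_counts(stream)
batched = batch_based_counts(stream, batch_size=4, local_min_sup=2)
print(free[("a", "b")])          # 3
print(batched[("a", "b")])       # 0
print(frequent(free, 3))         # {('a', 'b')}
print(frequent(batched, 3))      # set()
```

With batches of four events, two of the three occurrences of ("a", "b") straddle a batch boundary, and the third is pruned because it occurs only once inside its own batch. The batch-based counter therefore never reports ("a", "b"), while the batch-free counter sees all three occurrences and finds it frequent at a global threshold of 3.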

Keywords

  • Sequential pattern mining
  • Data streams
  • Batch-free

Original language: English
Pages (from-to): 223-239
Number of pages: 17
Journal: International Journal of Data Science and Analytics
Volume: 8
Issue number: 3
Publication status: Published, 1 Oct 2019
Event: 6th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (KDD 2017), Halifax, Canada
Duration: 14 Aug 2017 to 14 Aug 2017
