BFSPMiner: an effective and efficient batch-free algorithm for mining sequential patterns over data streams

M. Hassani (Corresponding author), D. Töws, A. Cuzzocrea, T. Seidl

Research output: Contribution to journalArticleAcademicpeer-review

5 Downloads (Pure)

Abstract

Supporting sequential pattern mining from data streams is nowadays a relevant problem in the area of data stream mining research. Actual proposals available in the literature are based on the well-known PrefixSpan approach and are, indeed, able to effectively bound the error of discovered patterns. This approach foresees the idea of dividing the target stream in a collection of manageable chunks, i.e., pieces of stream, in order to gain into effectiveness and efficiency. Unfortunately, mining patterns from stream chunks indeed introduce additional errors with respect to the basic application scenario where the target stream is mined continuously, in a non-batch manner. This is due to several reasons. First, since batches are processed individually, patterns that contain items from two consecutive batches are lost. Secondly, in most batch-based approaches, the decision about the frequency of a pattern is done locally inside a single batch. Thus, if a pattern is frequent in the stream but its items are scattered over different batches, it will be continuously pruned out and will never become frequent due to the algorithm’s lack of the “complete-picture” perspective. In order to address so-delineated pattern mining problems, this paper introduces and experimentally assesses BFSPMiner, a Batch-Free Sequential Pattern Miner algorithm for effectively and efficiently mining patterns in streams without being constrained to the traditional batch-based processing. This allows us, for instance, to discover frequent patterns that would be lost according to alternative batch-based stream mining processing models. We complement our analytical contributions by means of a comprehensive experimental campaign of BFSPMiner against real-world data stream sets and in comparison with current batch-based stream sequential pattern mining algorithms.

Keywords
Sequential pattern mining Data streams Batch-free
Original languageEnglish
Pages (from-to)223-239
JournalInternational Journal of Data Science and Analytics
Volume8
Issue number3
DOIs
Publication statusPublished - 2019
Event6th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (KDD 2017) - Halifax, Canada
Duration: 14 Aug 201714 Aug 2017

Fingerprint

Miners
Processing
Data mining

Cite this

@article{a44453cb36464a0b8f2ad6c46f2841e9,
title = "BFSPMiner: an effective and efficient batch-free algorithm for mining sequential patterns over data streams",
abstract = "Supporting sequential pattern mining from data streams is nowadays a relevant problem in the area of data stream mining research. Actual proposals available in the literature are based on the well-known PrefixSpan approach and are, indeed, able to effectively bound the error of discovered patterns. This approach foresees the idea of dividing the target stream in a collection of manageable chunks, i.e., pieces of stream, in order to gain into effectiveness and efficiency. Unfortunately, mining patterns from stream chunks indeed introduce additional errors with respect to the basic application scenario where the target stream is mined continuously, in a non-batch manner. This is due to several reasons. First, since batches are processed individually, patterns that contain items from two consecutive batches are lost. Secondly, in most batch-based approaches, the decision about the frequency of a pattern is done locally inside a single batch. Thus, if a pattern is frequent in the stream but its items are scattered over different batches, it will be continuously pruned out and will never become frequent due to the algorithm’s lack of the “complete-picture” perspective. In order to address so-delineated pattern mining problems, this paper introduces and experimentally assesses BFSPMiner, a Batch-Free Sequential Pattern Miner algorithm for effectively and efficiently mining patterns in streams without being constrained to the traditional batch-based processing. This allows us, for instance, to discover frequent patterns that would be lost according to alternative batch-based stream mining processing models. We complement our analytical contributions by means of a comprehensive experimental campaign of BFSPMiner against real-world data stream sets and in comparison with current batch-based stream sequential pattern mining algorithms.KeywordsSequential pattern mining Data streams Batch-free",
author = "M. Hassani and D. T{\"o}ws and A. Cuzzocrea and T. Seidl",
year = "2019",
doi = "10.1007/s41060-017-0084-8",
language = "English",
volume = "8",
pages = "223--239",
journal = "International Journal of Data Science and Analytics",
issn = "2364-415X",
publisher = "Springer",
number = "3",

}

BFSPMiner : an effective and efficient batch-free algorithm for mining sequential patterns over data streams. / Hassani, M. (Corresponding author); Töws, D.; Cuzzocrea, A.; Seidl, T.

In: International Journal of Data Science and Analytics, Vol. 8, No. 3, 2019, p. 223-239.

Research output: Contribution to journalArticleAcademicpeer-review

TY - JOUR

T1 - BFSPMiner

T2 - an effective and efficient batch-free algorithm for mining sequential patterns over data streams

AU - Hassani, M.

AU - Töws, D.

AU - Cuzzocrea, A.

AU - Seidl, T.

PY - 2019

Y1 - 2019

N2 - Supporting sequential pattern mining from data streams is nowadays a relevant problem in the area of data stream mining research. Actual proposals available in the literature are based on the well-known PrefixSpan approach and are, indeed, able to effectively bound the error of discovered patterns. This approach foresees the idea of dividing the target stream in a collection of manageable chunks, i.e., pieces of stream, in order to gain into effectiveness and efficiency. Unfortunately, mining patterns from stream chunks indeed introduce additional errors with respect to the basic application scenario where the target stream is mined continuously, in a non-batch manner. This is due to several reasons. First, since batches are processed individually, patterns that contain items from two consecutive batches are lost. Secondly, in most batch-based approaches, the decision about the frequency of a pattern is done locally inside a single batch. Thus, if a pattern is frequent in the stream but its items are scattered over different batches, it will be continuously pruned out and will never become frequent due to the algorithm’s lack of the “complete-picture” perspective. In order to address so-delineated pattern mining problems, this paper introduces and experimentally assesses BFSPMiner, a Batch-Free Sequential Pattern Miner algorithm for effectively and efficiently mining patterns in streams without being constrained to the traditional batch-based processing. This allows us, for instance, to discover frequent patterns that would be lost according to alternative batch-based stream mining processing models. We complement our analytical contributions by means of a comprehensive experimental campaign of BFSPMiner against real-world data stream sets and in comparison with current batch-based stream sequential pattern mining algorithms.KeywordsSequential pattern mining Data streams Batch-free

AB - Supporting sequential pattern mining from data streams is nowadays a relevant problem in the area of data stream mining research. Actual proposals available in the literature are based on the well-known PrefixSpan approach and are, indeed, able to effectively bound the error of discovered patterns. This approach foresees the idea of dividing the target stream in a collection of manageable chunks, i.e., pieces of stream, in order to gain into effectiveness and efficiency. Unfortunately, mining patterns from stream chunks indeed introduce additional errors with respect to the basic application scenario where the target stream is mined continuously, in a non-batch manner. This is due to several reasons. First, since batches are processed individually, patterns that contain items from two consecutive batches are lost. Secondly, in most batch-based approaches, the decision about the frequency of a pattern is done locally inside a single batch. Thus, if a pattern is frequent in the stream but its items are scattered over different batches, it will be continuously pruned out and will never become frequent due to the algorithm’s lack of the “complete-picture” perspective. In order to address so-delineated pattern mining problems, this paper introduces and experimentally assesses BFSPMiner, a Batch-Free Sequential Pattern Miner algorithm for effectively and efficiently mining patterns in streams without being constrained to the traditional batch-based processing. This allows us, for instance, to discover frequent patterns that would be lost according to alternative batch-based stream mining processing models. We complement our analytical contributions by means of a comprehensive experimental campaign of BFSPMiner against real-world data stream sets and in comparison with current batch-based stream sequential pattern mining algorithms.KeywordsSequential pattern mining Data streams Batch-free

U2 - 10.1007/s41060-017-0084-8

DO - 10.1007/s41060-017-0084-8

M3 - Article

VL - 8

SP - 223

EP - 239

JO - International Journal of Data Science and Analytics

JF - International Journal of Data Science and Analytics

SN - 2364-415X

IS - 3

ER -