Samenvatting
Supporting sequential pattern mining from data streams is nowadays a relevant problem in the area of data stream mining research. Actual proposals available in the literature are based on the well-known PrefixSpan approach and are, indeed, able to effectively bound the error of discovered patterns. This approach foresees the idea of dividing the target stream in a collection of manageable chunks, i.e., pieces of stream, in order to gain into effectiveness and efficiency. Unfortunately, mining patterns from stream chunks indeed introduce additional errors with respect to the basic application scenario where the target stream is mined continuously, in a non-batch manner. This is due to several reasons. First, since batches are processed individually, patterns that contain items from two consecutive batches are lost. Secondly, in most batch-based approaches, the decision about the frequency of a pattern is done locally inside a single batch. Thus, if a pattern is frequent in the stream but its items are scattered over different batches, it will be continuously pruned out and will never become frequent due to the algorithm’s lack of the “complete-picture” perspective. In order to address so-delineated pattern mining problems, this paper introduces and experimentally assesses BFSPMiner, a Batch-Free Sequential Pattern Miner algorithm for effectively and efficiently mining patterns in streams without being constrained to the traditional batch-based processing. This allows us, for instance, to discover frequent patterns that would be lost according to alternative batch-based stream mining processing models. We complement our analytical contributions by means of a comprehensive experimental campaign of BFSPMiner against real-world data stream sets and in comparison with current batch-based stream sequential pattern mining algorithms.
Keywords
Sequential pattern mining Data streams Batch-free
Keywords
Sequential pattern mining Data streams Batch-free
Originele taal-2 | Engels |
---|---|
Pagina's (van-tot) | 223-239 |
Aantal pagina's | 17 |
Tijdschrift | International Journal of Data Science and Analytics |
Volume | 8 |
Nummer van het tijdschrift | 3 |
DOI's | |
Status | Gepubliceerd - 1 okt. 2019 |
Evenement | 6th International Workshop on Big Data, Streams and Heterogeneous Source Mining: Algorithms, Systems, Programming Models and Applications (KDD 2017) - Halifax, Canada Duur: 14 aug. 2017 → 14 aug. 2017 |