TY - JOUR
T1 - On the application of sequential pattern mining primitives to process discovery
T2 - overview, outlook and opportunity identification
AU - Hassani, Marwan
AU - van Zelst, Sebastiaan J.
AU - van der Aalst, Wil M.P.
PY - 2019/11
Y1 - 2019/11
N2 - Sequential pattern mining (SPM) is a well-studied theme in data mining, in which one aims to discover common sequences of item sets in a large corpus of temporal itemset data. Due to the sequential nature of data streams, supporting SPM in streaming environments is commonly studied in the area of data stream mining as well. On the other hand, stream-based process discovery (PD), originating from the field of process mining, focusses on learning process models on the basis of online event data. In particular, the main goal of the models discovered is to describe the underlying generating process in an end-to-end fashion. As both SPM and PD use data that are comparable in nature, that is, both involve time-stamped instances, one expects that techniques from the SPM domain are (partly) transferable to the PD domain. However, thus far, little work has been done in the intersection of the two fields. In this focus article, we therefore study the possible application of SPM techniques in the context of PD. We provide an overview of the two fields, covering their commonalities and differences, highlight the challenges of applying them, and present an outlook and several avenues for future work. This article is categorized under: Algorithmic Development > Spatial and Temporal Data Mining; Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining; Fundamental Concepts of Data and Knowledge > Big Data Mining.
AB - Sequential pattern mining (SPM) is a well-studied theme in data mining, in which one aims to discover common sequences of item sets in a large corpus of temporal itemset data. Due to the sequential nature of data streams, supporting SPM in streaming environments is commonly studied in the area of data stream mining as well. On the other hand, stream-based process discovery (PD), originating from the field of process mining, focusses on learning process models on the basis of online event data. In particular, the main goal of the models discovered is to describe the underlying generating process in an end-to-end fashion. As both SPM and PD use data that are comparable in nature, that is, both involve time-stamped instances, one expects that techniques from the SPM domain are (partly) transferable to the PD domain. However, thus far, little work has been done in the intersection of the two fields. In this focus article, we therefore study the possible application of SPM techniques in the context of PD. We provide an overview of the two fields, covering their commonalities and differences, highlight the challenges of applying them, and present an outlook and several avenues for future work. This article is categorized under: Algorithmic Development > Spatial and Temporal Data Mining; Fundamental Concepts of Data and Knowledge > Key Design Issues in Data Mining; Fundamental Concepts of Data and Knowledge > Big Data Mining.
KW - data streams
KW - distributed sequential pattern mining
KW - process mining
KW - sequential pattern mining
UR - http://www.scopus.com/inward/record.url?scp=85065641261&partnerID=8YFLogxK
U2 - 10.1002/widm.1315
DO - 10.1002/widm.1315
M3 - Article
AN - SCOPUS:85065641261
SN - 1942-4787
VL - 9
JO - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
JF - Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery
IS - 6
M1 - e1315
ER -