TY - GEN
T1 - Scalable process discovery with guarantees
AU - Leemans, S.J.J.
AU - Fahland, D.
AU - van der Aalst, W.M.P.
PY - 2015
Y1 - 2015
N2 - Considerable amounts of data, including process event data, are collected and stored by organisations nowadays. Discovering a process model from recorded process event data is the aim of process discovery algorithms. Many techniques have been proposed, but none combines scalability with quality guarantees, e.g. can handle billions of events or thousands of activities, and produces sound models (without deadlocks and other anomalies), and guarantees to rediscover the underlying process in some cases. In this paper, we introduce a framework for process discovery that computes a directly-follows graph by passing over the log once, and applying a divide-and-conquer strategy. Moreover, we introduce three algorithms using the framework. We experimentally show that it sacrifices little compared to algorithms that use the full event log, while it gains the ability to cope with event logs of 100,000,000 traces and processes of 10,000 activities.
KW - Big data
KW - Scalable process mining
KW - Block-structured process discovery
KW - Directly-follows graphs
KW - Rediscoverability
U2 - 10.1007/978-3-319-19237-6_6
DO - 10.1007/978-3-319-19237-6_6
M3 - Conference contribution
SN - 978-3-319-19236-9
T3 - Lecture Notes in Business Information Processing
SP - 85
EP - 101
BT - Enterprise, Business-Process and Information Systems Modeling (16th International Conference, BPMDS 2015, 20th International Conference, EMMSAD 2015, Held at CAiSE 2015, Stockholm, Sweden, June 8-9, 2015, Proceedings)
A2 - Gaaloul, K.
A2 - Schmidt, R.
A2 - Nurcan, S.
A2 - Guerreiro, S.
A2 - Ma, Q.
PB - Springer
CY - Berlin
ER -