Over the last decade, process mining techniques have matured, and more and more organizations have started to use process mining to analyze their operational processes. The current hype around "big data" illustrates the desire to analyze ever-growing data sets. Process mining starts from event logs, i.e., multisets of traces (sequences of events), and for the widespread application of process mining it is vital to be able to handle "big event logs". Some event logs are "big" because they contain many traces. Others are big because they contain many different activities. Most of the more advanced process mining algorithms (both for process discovery and conformance checking) scale very badly in the number of activities. For these algorithms, it could help to split the big event log (containing many activities) into a collection of smaller event logs (each containing fewer activities), run the algorithm on each of these smaller logs, and merge the results into a single result. This paper introduces a generic framework for doing exactly that, and makes it concrete by implementing algorithms for decomposed process discovery and decomposed conformance checking using Integer Linear Programming (ILP) based algorithms. ILP-based process mining techniques provide precise results and formal guarantees (e.g., perfect fitness), but are known to scale badly in the number of activities. A small case study shows that we can gain orders of magnitude in run-time. However, in some cases there is a trade-off between run-time and quality.
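The split, run, and merge idea described in the abstract can be illustrated with a minimal Python sketch. It is not the paper's framework: the function names (`project`, `directly_follows`, `decomposed_discovery`) are hypothetical, the activity clusters are chosen by hand, and a simple directly-follows relation stands in for the ILP-based discovery algorithm purely to keep the example short.

```python
from collections import Counter

def project(log, activities):
    """Project every trace onto a set of activities, keeping multiplicities."""
    sublog = Counter()
    for trace, count in log.items():
        sublog[tuple(a for a in trace if a in activities)] += count
    return sublog

def directly_follows(log):
    """Toy stand-in for a discovery algorithm (assumption, not ILP-based):
    count how often activity a is directly followed by activity b."""
    pairs = Counter()
    for trace, count in log.items():
        for a, b in zip(trace, trace[1:]):
            pairs[(a, b)] += count
    return pairs

def decomposed_discovery(log, clusters):
    """Split the log per activity cluster, run the stand-in algorithm on
    each smaller log, and merge the partial results by set union."""
    merged = set()
    for cluster in clusters:
        merged |= set(directly_follows(project(log, cluster)))
    return merged

# An event log as a multiset of traces: trace -> number of occurrences.
log = Counter({("a", "b", "c", "d"): 3, ("a", "c", "b", "d"): 2})
# Hand-picked, overlapping activity clusters (an assumption of this sketch).
clusters = [{"a", "b", "c"}, {"b", "c", "d"}]
relation = decomposed_discovery(log, clusters)
```

Each sublog mentions fewer activities than the original log, which is where an algorithm that scales badly in the number of activities would save time. How the clusters are chosen affects what the merged result can express, which mirrors the trade-off between run-time and quality noted in the abstract.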
|Title of host publication||Business Process Management Workshops (BPM 2014 International Workshops, Eindhoven, The Netherlands, September 7-8, 2014, Revised Papers)|
|Editors||F. Fournier, J. Mendling|
|Place of Publication||Dordrecht|
|Publication status||Published - 2015|
|Name||Lecture Notes in Business Information Processing|