Abstract
One of the main challenges in applying process mining to real event data is the presence of noise and rare behaviour. Applying process mining algorithms directly to raw event data typically results in complex, incomprehensible, and, in some cases, even inaccurate analyses. As a result, correct and/or important behaviour may be concealed. In this paper, we propose an event data repair method that detects and repairs outlier behaviour within the given event data. The method is probabilistic and is based on the occurrence frequency of activities in specific contexts. Our approach allows for the removal of infrequent behaviour, which enables us to obtain a more global view of the process. The proposed method has been implemented in both the ProM and the RapidProM frameworks. Using these implementations, we conduct a collection of experiments which show that we are able to detect and modify most types of outlier behaviour in the event data. Our evaluation demonstrates that repairing event logs upfront helps to improve process discovery results.
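
The idea of judging an activity by how often it occurs in a given context can be illustrated with a minimal sketch. This is not the implementation from the paper or from ProM/RapidProM. It assumes, purely for illustration, that a context is the pair of directly preceding and directly following activities, that an event is an outlier when its conditional probability in that context falls below a hypothetical `threshold`, and that repair means substituting the most frequent activity seen in the same context. The names `context_counts` and `repair_log` are made up for this example.

```python
from collections import Counter, defaultdict

# Artificial markers so the first and last events also have a context.
START, END = "<start>", "<end>"


def context_counts(log):
    """Count how often each activity occurs between a (previous, next) pair.

    Assumption: a context is just the direct neighbours of an event.
    """
    counts = defaultdict(Counter)
    for trace in log:
        padded = [START, *trace, END]
        for i in range(1, len(padded) - 1):
            counts[(padded[i - 1], padded[i + 1])][padded[i]] += 1
    return counts


def repair_log(log, threshold=0.1):
    """Return a repaired copy of the log; the input log is not modified.

    An event is replaced when P(activity | context) < threshold, using the
    most frequent activity observed in that context as the substitute.
    Contexts are always read from the original trace, not the repaired one.
    """
    counts = context_counts(log)
    repaired = []
    for trace in log:
        padded = [START, *trace, END]
        new_trace = list(trace)
        for i in range(1, len(padded) - 1):
            seen = counts[(padded[i - 1], padded[i + 1])]
            probability = seen[padded[i]] / sum(seen.values())
            if probability < threshold:
                new_trace[i - 1] = seen.most_common(1)[0][0]
        repaired.append(new_trace)
    return repaired


if __name__ == "__main__":
    # 19 regular traces and one trace with a rare activity "x".
    log = [["a", "b", "c"]] * 19 + [["a", "x", "c"]]
    print(repair_log(log, threshold=0.1)[-1])
```

In this toy log, "x" occurs once out of twenty times between "a" and "c", so with a threshold of 0.1 it is replaced by "b". Larger context windows and other repair strategies are possible; the choices above are only meant to make the conditional-probability idea concrete.
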
Original language | English |
---|---|
Title of host publication | Business Information Systems - 21st International Conference, BIS 2018, Proceedings |
Editors | W. Abramowicz, A. Paschke |
Place of Publication | Cham |
Publisher | Springer |
Pages | 115-131 |
Number of pages | 17 |
ISBN (Electronic) | 978-3-319-93931-5 |
ISBN (Print) | 978-3-319-93930-8 |
Publication status | Published - 1 Jan 2018 |
Event | 21st International Conference on Business Information Systems (BIS 2018), Berlin, Germany. Duration: 18 Jul 2018 → 20 Jul 2018. https://link.springer.com/conference/bis |
Publication series
Name | Lecture Notes in Business Information Processing |
---|---|
Volume | 320 |
ISSN (Print) | 1865-1348 |
Conference
Conference | 21st International Conference on Business Information Systems (BIS 2018) |
---|---|
Abbreviated title | BIS2018 |
Country/Territory | Germany |
City | Berlin |
Period | 18/07/18 → 20/07/18 |
Keywords
- Conditional probability
- Data cleansing
- Event log preprocessing
- Log repair
- Outlier detection
- Process mining