Information-preserving abstractions of event data in process mining

Sander J.J. Leemans (Corresponding author), Dirk Fahland

    Research output: Contribution to journal › Article › Academic › peer-review


    Abstract

    Process mining aims at obtaining information about processes by analysing their past executions in event logs, event streams, or databases. Discovering a process model from a finite amount of event data therefore requires correctly inferring infinitely many unseen behaviours. To do so, many process discovery techniques leverage abstractions of the finite event data to infer and preserve behavioural information about the underlying process. However, the fundamental information-preserving properties of these abstractions are not yet well understood. In this paper, we study the information-preserving properties of the “directly follows” abstraction and its limitations. We overcome these limitations by proposing and studying two new abstractions that preserve even more information in the form of finite graphs. We then show how, and characterize when, process behaviour can be unambiguously recovered through characteristic footprints in these abstractions. Our characterization defines large classes of practically relevant processes covering various complex process patterns. We prove that the information and the footprints preserved in the abstractions suffice to unambiguously rediscover the exact process model from a finite event log. Furthermore, we show that all three abstractions are relevant in practice for inferring process models from event logs, and we outline the implications for process mining techniques.
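    To make the idea of a finite abstraction of event data concrete, the sketch below computes two such abstractions from a small event log: a directly-follows graph (which activity immediately follows which, plus start and end activities) and a minimum self-distance per activity (the fewest events observed between two consecutive occurrences of the same activity within a trace). This is a minimal illustration under common textbook definitions, not the paper's formal construction. The function names, the list-of-lists log representation, and the example traces are all hypothetical, and the paper's precise definitions may differ in detail.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# Hypothetical representation: a trace is a list of activity names,
# and a log is a list of traces.
Trace = List[str]


def directly_follows(log: List[Trace]) -> Tuple[Dict[Tuple[str, str], int], Set[str], Set[str]]:
    """Count directly-follows pairs (a, b) and collect start and end activities."""
    edges: Dict[Tuple[str, str], int] = defaultdict(int)
    starts: Set[str] = set()
    ends: Set[str] = set()
    for trace in log:
        if not trace:
            continue  # an empty trace contributes no edges or start/end activities
        starts.add(trace[0])
        ends.add(trace[-1])
        # Each adjacent pair of events yields one directly-follows observation.
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1
    return dict(edges), starts, ends


def minimum_self_distance(log: List[Trace]) -> Dict[str, int]:
    """For every activity that repeats within some trace, return the smallest
    number of events observed between two consecutive occurrences of it."""
    msd: Dict[str, int] = {}
    for trace in log:
        last_seen: Dict[str, int] = {}
        for i, act in enumerate(trace):
            if act in last_seen:
                gap = i - last_seen[act] - 1  # events strictly in between
                msd[act] = min(gap, msd.get(act, gap))
            last_seen[act] = i
    return msd


# Hypothetical example log.
log = [["a", "b", "c", "d"], ["a", "c", "b", "d"], ["a", "b", "e", "b", "d"]]

if __name__ == "__main__":
    edges, starts, ends = directly_follows(log)
    print(edges, starts, ends)
    print(minimum_self_distance(log))
```

In this toy log, the directly-follows graph contains two-way edges both between b and c and between b and e, so on its own it does not tell these two situations apart. The minimum self-distance adds the observation that b recurs with exactly one event (e) in between, which hints at a short loop rather than interleaving. This is one informal illustration of why richer abstractions than "directly follows" can preserve more information.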

    Original language: English
    Pages (from-to): 1143–1197
    Number of pages: 55
    Journal: Knowledge and Information Systems
    Volume: 62
    Issue number: 3
    Publication status: Published - 1 Mar 2020

    Keywords

    • Directly follows
    • Inclusive choice
    • Information preservation
    • Language abstraction
    • Minimum self-distance
    • Model abstraction
    • Process mining
    • Rediscoverability
