Scalable process discovery and conformance checking

    Research output: Contribution to journal › Article › Academic › peer-review


    Abstract

    Organisations nowadays collect and store considerable amounts of data, including process events. Discovering a process model from such event data and verifying the quality of the discovered models are important steps in process mining. Many discovery techniques have been proposed, but none of them combines scalability with strong quality guarantees. We would like such techniques to handle billions of events or thousands of activities, to produce sound models (without deadlocks and other anomalies), and to guarantee that the underlying process can be rediscovered when sufficient information is available. In this paper, we introduce a framework for process discovery that ensures these properties while passing over the log only once, and we present three algorithms that use the framework. To measure the quality of discovered models for such large logs, we introduce a model–model and model–log comparison framework that applies a divide-and-conquer strategy to measure recall, fitness, and precision. We experimentally show that these discovery and measuring techniques sacrifice little compared to other algorithms, while gaining the ability to cope with event logs of 100,000,000 traces and processes of 10,000 activities on a standard computer.
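
The abstract describes discovery that passes over the log only once, and the keywords mention directly-follows graphs. As a rough illustration of how those two ideas can fit together, the sketch below counts directly-follows pairs while reading each trace a single time. It is not the paper's algorithm: the function name `directly_follows`, the `START`/`END` markers, and the toy log are assumptions made purely for this example.

```python
from collections import Counter
from typing import Iterable, Sequence

# Hypothetical artificial markers for the beginning and end of a trace.
START, END = "<start>", "<end>"


def directly_follows(traces: Iterable[Sequence[str]]) -> Counter:
    """Count how often one activity is directly followed by another.

    Each trace is read exactly once, so memory depends on the number of
    distinct activity pairs rather than on the number of events.
    """
    dfg: Counter = Counter()
    for trace in traces:
        prev = START
        for activity in trace:
            dfg[(prev, activity)] += 1
            prev = activity
        dfg[(prev, END)] += 1
    return dfg


# Toy event log (illustrative data only): three traces over activities a, b, c.
log = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]
dfg = directly_follows(log)
```

Because `traces` can be any iterable, the same function could in principle consume a generator that streams traces from disk, which is one way a single-pass approach avoids holding the whole log in memory.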

    Original language: English
    Pages (from-to): 599-631
    Number of pages: 33
    Journal: Software and Systems Modeling
    Volume: 17
    Issue number: 2
    Early online date: 8 Jul 2016
    Publication status: Published - 1 May 2018

    Keywords

    • Algorithm evaluation
    • Big data
    • Block-structured process discovery
    • Conformance checking
    • Directly-follows graphs
    • Rediscoverability
    • Scalable process mining
