Scalable process discovery and conformance checking

S.J.J. Leemans, D. Fahland, W.M.P. van der Aalst

Research output: Contribution to journal, Article, Academic, peer-review

164 Citations (Scopus)
158 Downloads (Pure)


Considerable amounts of data, including process events, are collected and stored by organisations nowadays. Discovering a process model from such event data and verifying the quality of discovered models are important steps in process mining. Many discovery techniques have been proposed, but none of them combines scalability with strong quality guarantees. We would like such techniques to handle billions of events or thousands of activities, to produce sound models (without deadlocks and other anomalies), and to guarantee that the underlying process can be rediscovered when sufficient information is available. In this paper, we introduce a framework for process discovery that ensures these properties while passing over the log only once, and we introduce three algorithms that use the framework. To measure the quality of discovered models for such large logs, we introduce a model–model and model–log comparison framework that applies a divide-and-conquer strategy to measure recall, fitness, and precision. We experimentally show that these discovery and measuring techniques sacrifice little compared to other algorithms, while gaining the ability to cope with event logs of 100,000,000 traces and processes of 10,000 activities on a standard computer.
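As an illustration of what "passing over the log only once" can look like, the following minimal sketch builds a directly-follows graph (one of the keywords listed for this article) in a single pass over a log. It is not the authors' implementation. The function name `directly_follows_graph`, the representation of a log as an iterable of activity-name lists, and the toy log are all assumptions made for this example.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def directly_follows_graph(
    traces: Iterable[Sequence[str]],
) -> Tuple[Counter, Counter, Counter]:
    """Count directly-follows pairs, start activities and end activities.

    Hypothetical helper: each trace is read exactly once, and only the
    counters are kept in memory, never the traces themselves.
    """
    edges: Counter = Counter()   # (a, b) -> how often b directly follows a
    starts: Counter = Counter()  # activity -> how often it starts a trace
    ends: Counter = Counter()    # activity -> how often it ends a trace

    for trace in traces:
        if not trace:
            continue  # an empty trace contributes no activities or edges
        starts[trace[0]] += 1
        ends[trace[-1]] += 1
        # Pair every event with its immediate successor in the same trace.
        for a, b in zip(trace, trace[1:]):
            edges[(a, b)] += 1

    return edges, starts, ends


# Toy log (assumed data): three traces over the activities a, b, c.
log = [["a", "b", "c"], ["a", "c", "b"], ["a", "b", "c"]]
edges, starts, ends = directly_follows_graph(log)
print(edges[("a", "b")], starts["a"], ends["c"])
```

In this sketch the counters grow with the number of distinct activities and activity pairs, not with the number of events, which is why a summary of this kind can be built from a very large log without keeping the log in memory.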

Original language: English
Pages (from-to): 599-631
Number of pages: 33
Journal: Software and Systems Modeling
Issue number: 2
Early online date: 8 Jul 2016
Publication status: Published - 1 May 2018


Keywords

  • Algorithm evaluation
  • Big data
  • Block-structured process discovery
  • Conformance checking
  • Directly-follows graphs
  • Rediscoverability
  • Scalable process mining

