Assessing process discovery scalability in data intensive environments

S. Hernández, J. Ezpeleta, S.J. Van Zelst, W.M.P. Van Der Aalst

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review



Tremendous developments in Information Technology (IT) have enabled us to store and process huge amounts of data at unprecedented rates. This phenomenon has a large impact on business processes. The field of process discovery, originating from the area of process mining, is concerned with automatically discovering process models from event data related to the execution of business processes. In this paper, we assess the scalability of applying process discovery techniques in data-intensive environments. We propose ways to compute the internal data abstractions used by the discovery techniques within the MapReduce framework. The combination of MapReduce and process discovery enables us to tackle much bigger event logs in less time. Our generic approach scales linearly in terms of the data size and the number of computational resources used, and thus shows great potential for the adoption of process discovery in a Big Data context.
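To make the idea of computing a discovery abstraction in MapReduce style more concrete, here is a minimal sketch in plain Python. It is an illustration under stated assumptions, not the authors' implementation. The abstract does not name the abstractions it computes, so the sketch assumes one commonly used in process discovery, the directly-follows relation: how often activity a is immediately followed by activity b within the same case. The event log, the function names and the in-memory `shuffle` helper are all hypothetical. A real deployment would run the two map/reduce rounds as Hadoop jobs.

```python
from collections import defaultdict

# Hypothetical toy event log: (case_id, activity, timestamp).
EVENTS = [
    ("c1", "register", 1),
    ("c2", "register", 2),
    ("c1", "check", 3),
    ("c2", "check", 4),
    ("c1", "pay", 5),
    ("c2", "reject", 6),
]


def shuffle(pairs):
    """Group values by key, standing in for the MapReduce shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def map_event(event):
    """Round 1 map: key each event by its case identifier."""
    case_id, activity, timestamp = event
    return case_id, (timestamp, activity)


def reduce_case(values):
    """Round 1 reduce: order one case by time, emit directly-follows pairs."""
    trace = [activity for _, activity in sorted(values)]
    return [((a, b), 1) for a, b in zip(trace, trace[1:])]


def directly_follows_counts(events):
    """Two chained map/reduce rounds yielding directly-follows frequencies."""
    by_case = shuffle(map_event(e) for e in events)
    emitted = []
    for values in by_case.values():
        emitted.extend(reduce_case(values))
    # Round 2: identity map, then sum the counts per (a, b) pair.
    by_pair = shuffle(emitted)
    return {pair: sum(ones) for pair, ones in by_pair.items()}


DFG = directly_follows_counts(EVENTS)
```

In this sketch, each case is reduced independently and the pair counts are summed independently per key, so both rounds can be partitioned across workers. This is the property that lets an abstraction of this kind fit the MapReduce model.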

Original language: English
Title of host publication: Proceedings - 2015 2nd IEEE/ACM International Symposium on Big Data Computing, BDC 2015, 7-10 December 2015, Limassol, Cyprus
Place of Publication: Piscataway
Publisher: Institute of Electrical and Electronics Engineers
Number of pages: 6
ISBN (Electronic): 978-0-7695-5696-3
Publication status: Published - 11 Feb 2016
Event: 2nd IEEE/ACM International Symposium on Big Data Computing, BDC 2015 - Limassol, Cyprus
Duration: 7 Dec 2015 to 10 Dec 2015


Conference: 2nd IEEE/ACM International Symposium on Big Data Computing, BDC 2015


  • Automated process discovery
  • Big Data
  • Hadoop
  • MapReduce
  • Process mining
  • ProM
  • Scalability

