DB-XES: enabling process discovery in the large

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-review

3 Citations (Scopus)
1 Downloads (Pure)

Abstract

Dealing with the abundance of event data is one of the main process discovery challenges. Current process discovery techniques can efficiently handle imported event log files that fit in the computer’s memory. Once data files get bigger, scalability drops quickly because the speed of data access becomes a limiting factor. This paper proposes a new technique based on relational database technology as a solution for scalable process discovery. A relational database is used both for storing event data (i.e., we move the location of the data) and for pre-processing the event data (i.e., we move some computations from analysis time to insertion time). To this end, we first introduce DB-XES, a database schema that resembles the standard XES structure, provide a transparent way to access event data stored in DB-XES, and show how this greatly reduces the memory requirements of state-of-the-art process discovery techniques. Second, we show how to move the computation of intermediate data structures to the database engine to reduce the time required during process discovery. The work presented in this paper is implemented in the ProM tool, and a range of experiments demonstrates the feasibility of our approach.
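
The idea of moving computation from analysis time to insertion time can be illustrated with a minimal sketch. It is not the DB-XES schema or the ProM implementation: the table names (`event`, `dfr`), the column names, and the functions `create_schema`, `insert_event`, and `directly_follows` are all hypothetical, and the directly-follows relation is assumed here only as one common example of an intermediate structure used in process discovery (the abstract does not name a specific one). The sketch uses Python's built-in `sqlite3` module and updates the precomputed relation each time an event is inserted, so that reading it later needs no scan over the event table.

```python
import sqlite3


def create_schema(conn):
    """Create a toy event table and a precomputed directly-follows table.

    Hypothetical, heavily simplified layout; not the actual DB-XES schema.
    """
    conn.executescript(
        """
        CREATE TABLE event (
            trace_id TEXT    NOT NULL,
            position INTEGER NOT NULL,
            activity TEXT    NOT NULL,
            PRIMARY KEY (trace_id, position)
        );
        CREATE TABLE dfr (
            source TEXT    NOT NULL,
            target TEXT    NOT NULL,
            freq   INTEGER NOT NULL,
            PRIMARY KEY (source, target)
        );
        """
    )


def insert_event(conn, trace_id, activity):
    """Append an event to a trace and update the directly-follows counts."""
    # Look up the most recent event of this trace, if there is one.
    last = conn.execute(
        "SELECT position, activity FROM event "
        "WHERE trace_id = ? ORDER BY position DESC LIMIT 1",
        (trace_id,),
    ).fetchone()
    position = 0 if last is None else last[0] + 1
    conn.execute(
        "INSERT INTO event (trace_id, position, activity) VALUES (?, ?, ?)",
        (trace_id, position, activity),
    )
    if last is not None:
        # Insertion-time pre-processing: count the pair (previous, current).
        updated = conn.execute(
            "UPDATE dfr SET freq = freq + 1 WHERE source = ? AND target = ?",
            (last[1], activity),
        ).rowcount
        if updated == 0:
            conn.execute(
                "INSERT INTO dfr (source, target, freq) VALUES (?, ?, 1)",
                (last[1], activity),
            )


def directly_follows(conn):
    """Read the precomputed relation without touching the event table."""
    rows = conn.execute("SELECT source, target, freq FROM dfr")
    return {(source, target): freq for source, target, freq in rows}


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    for trace_id, activity in [("t1", "a"), ("t1", "b"), ("t2", "a"), ("t1", "c")]:
        insert_event(conn, trace_id, activity)
    print(directly_follows(conn))
```

In this sketch the analysis step only reads the small `dfr` table, whose size depends on the number of distinct activity pairs rather than on the number of events, which is the kind of trade-off the abstract describes.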

Original language: English
Title of host publication: Data-Driven Process Discovery and Analysis
Subtitle of host publication: 6th IFIP WG 2.6 International Symposium, SIMPDA 2016, Graz, Austria, December 15-16, 2016, Revised Selected Papers
Editors: P. Ceravolo, C. Guetl, S. Rinderle-Ma
Place of Publication: Dordrecht
Publisher: Springer
Pages: 53-77
Number of pages: 25
ISBN (Electronic): 978-3-319-74161-1
ISBN (Print): 978-3-319-74160-4
DOIs: https://doi.org/10.1007/978-3-319-74161-1_4
Publication status: Published - 2018
Event: 6th International Symposium on Data-Driven Process Discovery and Analysis (SIMPDA 2016) - Graz, Austria
Duration: 15 Dec 2016 to 16 Dec 2016
Conference number: 6

Publication series

Name: Lecture Notes in Business Information Processing
Volume: 307
ISSN (Print): 1865-1348

Conference

Conference: 6th International Symposium on Data-Driven Process Discovery and Analysis (SIMPDA 2016)
Abbreviated title: SIMPDA 2016
Country: Austria
City: Graz
Period: 15/12/16 to 16/12/16

Keywords

  • Big event data
  • Process discovery
  • Process mining
  • Relational database

Cite this

Syamsiyah, A., van Dongen, B. F., & van der Aalst, W. M. P. (2018). DB-XES: enabling process discovery in the large. In P. Ceravolo, C. Guetl, & S. Rinderle-Ma (Eds.), Data-Driven Process Discovery and Analysis: 6th IFIP WG 2.6 International Symposium, SIMPDA 2016, Graz, Austria, December 15-16, 2016, Revised Selected Papers (pp. 53-77). (Lecture Notes in Business Information Processing; Vol. 307). Dordrecht: Springer. https://doi.org/10.1007/978-3-319-74161-1_4
@inproceedings{c24f89dd45154c22bc015f836eeada62,
title = "DB-XES: enabling process discovery in the large",
abstract = "Dealing with the abundance of event data is one of the main process discovery challenges. Current process discovery techniques are able to efficiently handle imported event log files that fit in the computer’s memory. Once data files get bigger, scalability quickly drops since the speed required to access the data becomes a limiting factor. This paper proposes a new technique based on relational database technology as a solution for scalable process discovery. A relational database is used both for storing event data (i.e. we move the location of the data) and for pre-processing the event data (i.e. we move some computations from analysis-time to insertion-time). To this end, we first introduce DB-XES as a database schema which resembles the standard XES structure, we provide a transparent way to access event data stored in DB-XES, and we show how this greatly improves on the memory requirements of the state-of-the-art process discovery techniques. Secondly, we show how to move the computation of intermediate data structures to the database engine, to reduce the time required during process discovery. The work presented in this paper is implemented in ProM tool, and a range of experiments demonstrates the feasibility of our approach.",
keywords = "Big event data, Process discovery, Process mining, Relational database",
author = "A. Syamsiyah and {van Dongen}, B.F. and {van der Aalst}, W.M.P.",
year = "2018",
doi = "10.1007/978-3-319-74161-1_4",
language = "English",
isbn = "978-3-319-74160-4",
series = "Lecture Notes in Business Information Processing",
publisher = "Springer",
pages = "53--77",
editor = "P. Ceravolo and C. Guetl and S. Rinderle-Ma",
booktitle = "Data-Driven Process Discovery and Analysis",
address = "Dordrecht",
}
