TY - JOUR
T1 - Enabling efficient process mining on large data sets
T2 - realizing an in-database process mining operator
AU - Dijkman, Remco
AU - Gao, Juntao
AU - Syamsiyah, Alifah
AU - van Dongen, Boudewijn
AU - Grefen, Paul
AU - ter Hofstede, Arthur
PY - 2020/3/1
Y1 - 2020/3/1
AB - Process mining can be used to analyze business processes based on logs of their execution. These execution logs are often obtained by querying a database and storing the results in a file. The mining itself is then done on the file, such that the data processing power of the database cannot be used after the log is extracted. Enabling process mining directly on a database therefore provides additional flexibility and efficiency. To help facilitate this, this paper formally defines a database operator that extracts the ‘directly follows’ relation—one of the relations that is at the heart of process mining—from an operational database. It defines the operator using the well-known relational algebra and formally proves equivalence properties of the operator that are useful for query optimization. Subsequently, it presents time-complexity properties of the operator. Finally, it presents an implementation of the operator as part of the H2 DBMS and demonstrates that this implementation extracts the ‘directly follows’ relation from a database with an arbitrary database structure within a fraction of a second; several orders of magnitude faster than is currently possible.
KW - Database management system
KW - Formal methods
KW - Process mining
KW - Relational algebra
UR - http://www.scopus.com/inward/record.url?scp=85065715620&partnerID=8YFLogxK
U2 - 10.1007/s10619-019-07270-1
DO - 10.1007/s10619-019-07270-1
M3 - Article
AN - SCOPUS:85065715620
SN - 0926-8782
VL - 38
SP - 227
EP - 253
JO - Distributed and Parallel Databases
JF - Distributed and Parallel Databases
IS - 1
ER -