TY - CHAP
T1 - Discovering process models with genetic algorithms using sampling
AU - Bratosin, C.C.
AU - Sidorova, N.
AU - Aalst, van der, W.M.P.
PY - 2010
Y1 - 2010
N2 - Process mining, a new business intelligence area, aims at discovering process models from event logs. Complex constructs, noise and infrequent behavior make process mining a challenging problem. A genetic mining algorithm, which applies genetic operators to search the space of all possible process models, handles these challenges successfully. Its drawback is a long computation time caused by the high cost of fitness evaluation, whose running time depends linearly on the number of process instances in the log. By using a sampling-based approach, i.e., evaluating fitness on a sample of the log instead of the whole log, we drastically reduce the computation time. When the desired fitness is achieved on the sample, we check the fitness on the whole log; if it is not achieved there, we increase the sample size and continue the computation iteratively. Our experiments show that sampling works well even for relatively small logs and reduces the total computation time by a factor of 6 to 15.
U2 - 10.1007/978-3-642-15387-7_8
DO - 10.1007/978-3-642-15387-7_8
M3 - Chapter
SN - 978-3-642-15386-0
T3 - Lecture Notes in Computer Science
SP - 41
EP - 50
BT - Knowledge-Based and Intelligent Information and Engineering Systems (14th International Conference, KES'2010, Cardiff, UK, September 8-10, 2010. Proceedings)
A2 - Setchi, R.
A2 - Jordanov, I.
A2 - Howlett, R.J.
A2 - Jain, L.C.
PB - Springer
CY - Berlin
T2 - 14th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems (KES'2010)
Y2 - 8 September 2010 through 10 September 2010
ER -