Signature extraction is a critical preprocessing step in forensic log analysis because it enables sophisticated analysis techniques to be applied to logs. Currently, most signature extraction frameworks use either rule-based approaches or hand-crafted algorithms. Rule-based systems are error-prone and require high maintenance effort. Hand-crafted algorithms rely on heuristics and tend to work well only for specialized use cases. In this paper, we present a novel approach to extracting signatures from forensic logs that is based on a neural language model. This language model learns to identify the mutable and non-mutable parts of a log message, and we use this information to extract signatures. Neural language models have been shown to work extremely well for learning complex relationships in natural language text. We experimentally demonstrate that our model can detect which parts are mutable with an accuracy of 86.4%. We also show how the extracted signatures can be used for clustering log lines.
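The idea of turning mutable/non-mutable labels into signatures, and then clustering log lines by signature, can be illustrated with a minimal sketch. This is not the paper's implementation: the per-token flags below are hand-written stand-ins for whatever a trained model would predict, and the log lines, the `<*>` wildcard, and the function names `extract_signature` and `cluster_by_signature` are all hypothetical choices made for illustration.

```python
from collections import defaultdict

WILDCARD = "<*>"  # hypothetical placeholder for a mutable token


def extract_signature(tokens, mutable_flags):
    """Replace every token flagged as mutable with a wildcard."""
    if len(tokens) != len(mutable_flags):
        raise ValueError("tokens and mutable_flags must have the same length")
    return " ".join(WILDCARD if m else t for t, m in zip(tokens, mutable_flags))


def cluster_by_signature(labeled_lines):
    """Group raw log lines by the signature derived from their flags."""
    clusters = defaultdict(list)
    for tokens, flags in labeled_lines:
        clusters[extract_signature(tokens, flags)].append(" ".join(tokens))
    return dict(clusters)


# Made-up log lines. The flags are written by hand here; in the setting the
# abstract describes, they would come from the learned model instead.
labeled_lines = [
    ("Accepted password for alice from 10.0.0.5".split(),
     [False, False, False, True, False, True]),
    ("Accepted password for bob from 10.0.0.9".split(),
     [False, False, False, True, False, True]),
    ("Connection closed by 10.0.0.5".split(),
     [False, False, False, True]),
]

clusters = cluster_by_signature(labeled_lines)
for signature, lines in clusters.items():
    print(signature, "->", len(lines), "line(s)")
```

In this sketch, lines that differ only in their mutable tokens collapse to the same signature, which is what makes the signature usable as a clustering key.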
|Title of host publication||2017 5th International Symposium on Digital Forensic and Security, ISDFS 2017|
|Editors||Bela Genge, Piroska Haller|
|Place of Publication||Piscataway|
|Publisher||Institute of Electrical and Electronics Engineers|
|Number of pages||6|
|Publication status||Published - 1 May 2017|
Thaler, S. M., Menkovski, V., & Petkovic, M. (2017). Towards a neural language model for signature extraction from forensic logs. In B. Genge, & P. Haller (Eds.), 2017 5th International Symposium on Digital Forensic and Security, ISDFS 2017. Piscataway: Institute of Electrical and Electronics Engineers. https://doi.org/10.1109/ISDFS.2017.7916497