Intrusion detection systems (IDS) are used in operational IT environments to strengthen security. Even with firewalls, system hardening, patch management and other preventive security controls, intrusions may still occur because of remaining weaknesses. An IDS detects intrusions either by matching predefined signatures or by detecting anomalies in the behaviour of users, networks, applications or systems. Signature-based IDS depend on patterns, rules or signatures that have been developed for known attacks. New attacks, such as zero-day attacks, and attacks whose signatures change are typically not detected by signature-based IDS. Anomaly-based IDS detect intrusions by identifying abnormal behaviour, such as unexpected network communication, service requests or user behaviour. Because they do not rely on signatures, anomaly-based IDS can also detect previously unseen attacks, and they can reach good accuracy if the number of false alerts is kept low. However, because an anomaly alert only indicates that something is abnormal, an operator has to analyse each anomaly event manually to verify and classify the alert. This keeps the usability of anomaly-based IDS low, even when high accuracy is achieved. This thesis presents an experiment in classifying anomaly alerts automatically through supervised machine learning. The experiment is performed on web application attacks, with anomaly alerts received from the SilentDefense anomaly-based intrusion detection system. After several machine learning methods are considered, naïve Bayes is selected for the experiment. A naïve Bayesian network is built from attack features extracted from the anomaly event data. The accuracy of the network is tested with two different datasets. Incremental learning is briefly considered, and a threshold is introduced to prevent misclassifications.
The experimental results suggest that anomalies can be classified with a high true positive (TP) rate and a low misclassification rate, using simple (stateless) features and a self-learning naïve Bayesian network. A TP rate of over 90%, as formulated in the research goal, might be achievable in practice for any well-defined attack class. With the use of a threshold, a false positive (FP) rate of 1% or less can be reached.
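To make the approach concrete, the following is a minimal sketch of a naïve Bayes alert classifier with incremental learning and a rejection threshold. It is an illustration under stated assumptions, not the implementation used in the thesis: the feature names (`quote`, `union_keyword`, `script_tag`, and so on), the class labels, the Bernoulli (present/absent) feature model with Laplace smoothing, and the reading of the threshold as a minimum posterior probability are all hypothetical choices made for this example.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesAlertClassifier:
    """Bernoulli naive Bayes over binary alert features (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                           # Laplace smoothing constant
        self.class_counts = Counter()                # label -> number of alerts seen
        self.feature_counts = defaultdict(Counter)   # label -> feature -> count
        self.vocabulary = set()                      # all features seen so far

    def learn(self, features, label):
        """Incrementally add one labelled alert, given as a set of features."""
        self.class_counts[label] += 1
        for feature in set(features):
            self.feature_counts[label][feature] += 1
            self.vocabulary.add(feature)

    def posteriors(self, features):
        """Return P(label | features) for every known label."""
        present = set(features) & self.vocabulary
        total = sum(self.class_counts.values())
        log_scores = {}
        for label, n in self.class_counts.items():
            score = math.log(n / total)  # class prior
            for feature in self.vocabulary:
                count = self.feature_counts[label][feature]
                p = (count + self.alpha) / (n + 2 * self.alpha)
                score += math.log(p if feature in present else 1.0 - p)
            log_scores[label] = score
        # Normalise in a numerically stable way.
        peak = max(log_scores.values())
        scaled = {label: math.exp(s - peak) for label, s in log_scores.items()}
        norm = sum(scaled.values())
        return {label: value / norm for label, value in scaled.items()}

    def classify(self, features, threshold=0.9):
        """Return the most probable label, or 'unclassified' below the threshold."""
        post = self.posteriors(features)
        label = max(post, key=post.get)
        return label if post[label] >= threshold else "unclassified"


# Hypothetical training data: each alert is reduced to a set of stateless features.
model = NaiveBayesAlertClassifier()
model.learn({"quote", "union_keyword"}, "sqli")
model.learn({"quote", "select_keyword"}, "sqli")
model.learn({"quote", "union_keyword", "select_keyword"}, "sqli")
model.learn({"script_tag", "angle_bracket"}, "xss")
model.learn({"angle_bracket", "onerror_attr"}, "xss")
model.learn({"script_tag", "angle_bracket"}, "xss")

clear_alert = model.classify({"quote", "union_keyword"})
ambiguous_alert = model.classify({"quote", "angle_bracket"})
print(clear_alert, ambiguous_alert)
```

In this sketch, an alert whose best posterior falls below the threshold is left for the operator instead of being assigned a class, which trades some coverage for fewer misclassifications. Calling `learn` again with newly verified alerts updates the counts in place, which is one simple way to realise incremental learning.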
|Date of Award||31 Jan 2014|
|Supervisor||Sandro Etalle (Supervisor 1) & Damiano Bolzoni (External coach)|