Digital Waste Disposal: an automated framework for analysis of spam emails

Mina Sheikhalishahi, Andrea Saracino (Corresponding author), Fabio Martinelli, Antonio La Marra, Mohammed Mejri, Nadia Tawbi

Research output: Contribution to journal › Article › Academic › peer-review

6 Citations (Scopus)

Abstract

Automated analysis and classification of spam emails is a challenging task that is vital to identifying botnet structures and fighting cybercrime. In this work, we propose an automated methodology and the resulting framework based on innovative categorical divisive clustering, used both for grouping and for classification of spam messages. In particular, the grouping is exploited to identify campaigns of similar spam emails, while the classification is used to label specific emails according to the spammer's goal (e.g., phishing, malware distribution, advertisement). This work introduces the CCTree algorithm, both as a clustering algorithm and as a classification algorithm, in two operative modes, batch and dynamic, to handle both large data sets and data streams. The CCTree is then applied to large sets of spam emails for campaign identification and labeling. The performance of the algorithm is reported for both clustering and classification, and a comparison between the batch and dynamic approaches is presented and discussed.
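
As an illustration of what categorical divisive clustering can look like, the following is a minimal sketch and not the authors' CCTree implementation. It assumes that each email is described by a few categorical attributes, that a node is split on the attribute with the highest Shannon entropy, and that splitting stops when the summed entropy of a node or its size falls below a threshold. The attribute names (`language`, `has_attachment`, `links`), the function names, and the thresholds `max_impurity` and `min_size` are hypothetical and chosen only for this example.

```python
import math
from collections import Counter, defaultdict


def entropy(values):
    """Shannon entropy (in bits) of a list of categorical values."""
    total = len(values)
    counts = Counter(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def divisive_cluster(records, attributes, max_impurity=0.5, min_size=2):
    """Recursively split records (dicts of categorical features).

    Assumed splitting rule for this sketch: split on the attribute with
    the highest entropy; stop when the node is small or nearly pure.
    Returns a list of clusters, each a list of records.
    """
    if len(records) <= min_size:
        return [records]
    ent = {a: entropy([r[a] for r in records]) for a in attributes}
    # Node impurity is taken here as the sum of the attribute entropies.
    if sum(ent.values()) <= max_impurity:
        return [records]
    split_attr = max(attributes, key=lambda a: ent[a])
    if ent[split_attr] <= 0:
        # No attribute can separate the records any further.
        return [records]
    groups = defaultdict(list)
    for r in records:
        groups[r[split_attr]].append(r)
    clusters = []
    for group in groups.values():
        clusters.extend(
            divisive_cluster(group, attributes, max_impurity, min_size)
        )
    return clusters


# Hypothetical toy data: two groups of emails with identical features.
emails = [
    {"language": "en", "has_attachment": "yes", "links": "many"},
    {"language": "en", "has_attachment": "yes", "links": "many"},
    {"language": "en", "has_attachment": "yes", "links": "many"},
    {"language": "it", "has_attachment": "no", "links": "few"},
    {"language": "it", "has_attachment": "no", "links": "few"},
    {"language": "it", "has_attachment": "no", "links": "few"},
]
clusters = divisive_cluster(emails, ["language", "has_attachment", "links"])
```

On this toy data the sketch splits once on `language` and leaves two homogeneous groups, which is the kind of grouping the abstract associates with spam campaigns. How the published algorithm selects attributes, measures purity, and handles the dynamic (streaming) mode is described in the paper itself.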

Original language: English
Pages (from-to): 499-522
Number of pages: 24
Journal: International Journal of Information Security
Volume: 19
Issue number: 5
Early online date: 25 Sept 2019
Publication status: Published - 1 Oct 2020

Funding

This study was funded by H2020 C3ISP Project (GA 700294).

Funders (funder number)
H2020 C3ISP Project (GA 700294)
Horizon 2020 Framework Programme
Horizon 2020 (700294)

    Keywords

    • Classification
    • Clustering
    • Dynamic clustering
    • Spam campaign detection
    • Spam email
