TY - JOUR
T1 - Few images, many insights
T2 - Illicit Content Detection Using a Limited Number of Images
AU - Cascavilla, Giuseppe
AU - Catolino, Gemma
AU - Conti, Mauro
AU - Mellios, Dimos
AU - Tamburri, Damian A.
PY - 2024/12
Y1 - 2024/12
N2 - The anonymity and untraceability benefits of the dark web have increased its popularity exponentially. The cost of these technical benefits is that such anonymity has created fertile ground for illicit activity. Hence, in collaboration with cybersecurity practitioners and law-enforcement agencies, the research community has proposed approaches for recognizing and classifying illicit activities. Most of these approaches exploit textual content from dark web markets, whereas few have used images originating from them. This article investigates alternative techniques for recognizing illegal activities from images. The significant contributions of our work are threefold: (a) We investigate label-agnostic learning techniques, such as One-Shot and Few-Shot learning, that use Siamese Neural Networks. Our approach handles small-scale datasets with promising accuracy. In particular, the Siamese Neural Network approach reaches 90.9% accuracy in 5-Shot experiments over a 10-class dataset. (b) This study's satisfactory findings facilitate the creation of effective tools to assist authorities in identifying illicit content on the Web. Moreover, our proof-of-concept approach demonstrated the ability to recognize illegal images using a limited number of files, reducing the time needed to collect illegal images. (c) We provide a complete labeled dataset of 3,570 images from 55 different categories, collected from dark web markets, that can be used for future research activities.
AB - The anonymity and untraceability benefits of the dark web have increased its popularity exponentially. The cost of these technical benefits is that such anonymity has created fertile ground for illicit activity. Hence, in collaboration with cybersecurity practitioners and law-enforcement agencies, the research community has proposed approaches for recognizing and classifying illicit activities. Most of these approaches exploit textual content from dark web markets, whereas few have used images originating from them. This article investigates alternative techniques for recognizing illegal activities from images. The significant contributions of our work are threefold: (a) We investigate label-agnostic learning techniques, such as One-Shot and Few-Shot learning, that use Siamese Neural Networks. Our approach handles small-scale datasets with promising accuracy. In particular, the Siamese Neural Network approach reaches 90.9% accuracy in 5-Shot experiments over a 10-class dataset. (b) This study's satisfactory findings facilitate the creation of effective tools to assist authorities in identifying illicit content on the Web. Moreover, our proof-of-concept approach demonstrated the ability to recognize illegal images using a limited number of files, reducing the time needed to collect illegal images. (c) We provide a complete labeled dataset of 3,570 images from 55 different categories, collected from dark web markets, that can be used for future research activities.
KW - Convolutional Neural Network
KW - Dark web
KW - Few-Shot learning
KW - One-Shot learning
KW - Siamese Neural Network
UR - http://www.scopus.com/inward/record.url?scp=85217850051&partnerID=8YFLogxK
U2 - 10.1145/3696458
DO - 10.1145/3696458
M3 - Article
SN - 2157-6904
VL - 15
JO - ACM Transactions on Intelligent Systems and Technology
JF - ACM Transactions on Intelligent Systems and Technology
IS - 6
M1 - 130
ER -