While social media has taken a fixed place in our daily life, its steadily growing prominence also exacerbates the problem of hostile contents and hate-speech. These destructive phenomena call for automatic hate-speech detection, which, however, is facing two major challenges, namely i) the dynamic nature of online content causing significant data-drift over time, and ii) a high class-skew, as hate-speech represents a relatively small fraction of the overall online content. The first challenge naturally calls for a batch mode active learning solution, which updates the detection system by querying human domain-experts to annotate meticulously selected batches of data instances. However, little prior work exists on batch mode active learning with high class-skew, and in particular for the problem of hate-speech detection. In this work, we propose a novel partition-based batch mode active learning framework to address this problem. Our framework falls into the so-called screening approach, which pre-selects a subset of most uncertain data items and then selects a representative set from this uncertainty space. To tackle the class-skew problem, we use a data-driven skew-specialized cluster representation, with a higher potential to “cherry pick” minority classes. In extensive experiments we demonstrate substantial improvements in terms of G-Means, and F1 measure, over several baseline approaches and multiple datasets, for highly imbalanced class ratios.
|Titel||Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track - European Conference, ECML PKDD 2020, Proceedings|
|Redacteuren||Yuxiao Dong, Dunja Mladenic, Craig Saunders|
|ISBN van geprinte versie||9783030676698|
|Status||Gepubliceerd - 2021|
|Evenement||2020 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2020) - Virtual, Online, Ghent, België|
Duur: 14 sep. 2020 → 18 sep. 2020
|Naam||Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)|
|ISSN van geprinte versie||0302-9743|
|ISSN van elektronische versie||1611-3349|
|Congres||2020 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2020)|
|Verkorte titel||ECML PKDD 2020|
|Periode||14/09/20 → 18/09/20|
Bibliografische notaPublisher Copyright:
© 2021, Springer Nature Switzerland AG.