PS3: Partition-based Skew-Specialized Sampling for Batch Mode Active Learning in Imbalanced Text Data

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-reviewed


While social media has taken a fixed place in our daily life,
its steadily growing prominence also exacerbates the problem of hostile
content and hate-speech. These destructive phenomena call for automatic
hate-speech detection, which, however, faces two major challenges,
namely i) the dynamic nature of online content, which causes significant
data-drift over time, and ii) a high class-skew, as hate-speech represents
a relatively small fraction of the overall online content. The first
challenge naturally calls for a batch mode active learning solution, which
updates the detection system by querying human domain-experts to
annotate meticulously selected batches of data instances. However, little
prior work exists on batch mode active learning with high class-skew,
and in particular for the problem of hate-speech detection. In this work,
we propose a novel partition-based batch mode active learning framework
to address this problem. Our framework follows the so-called screening
approach, which pre-selects a subset of the most uncertain data items and
then selects a representative set from this uncertainty space. To tackle
the class-skew problem, we use a data-driven skew-specialized cluster
representation, with a higher potential to "cherry pick" minority classes.
In extensive experiments we demonstrate substantial improvements in
terms of G-Means and F1 measure over several baseline approaches
and multiple datasets, for highly imbalanced class ratios.
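
The screening approach described above (first keep the most uncertain items, then pick a representative batch from them) can be illustrated with a minimal generic sketch. This is not the PS3 procedure: the abstract does not specify the skew-specialized cluster representation, so a plain greedy farthest-point selection stands in for the representativeness step. The function names, the margin-based uncertainty score, and the `screen_size` and `batch_size` parameters are all assumptions made for illustration.

```python
import math


def margin_uncertainty(p_pos):
    """Binary-prediction uncertainty: 1.0 at p=0.5, 0.0 at p=0 or p=1.

    Assumed scoring rule for this sketch; the abstract names none.
    """
    return 1.0 - abs(2.0 * p_pos - 1.0)


def screen_then_select(features, p_pos, screen_size, batch_size):
    """Return indices of a batch chosen by screening, then diversity selection.

    features: list of equal-length float vectors (hypothetical text embeddings)
    p_pos:    model probability of the minority class for each item
    """
    # Step 1 (screening): keep the `screen_size` most uncertain items.
    ranked = sorted(
        range(len(p_pos)),
        key=lambda i: margin_uncertainty(p_pos[i]),
        reverse=True,
    )
    pool = ranked[:screen_size]
    if not pool or batch_size <= 0:
        return []

    # Step 2 (representativeness): greedy farthest-point selection in the pool.
    # This is a stand-in for the paper's cluster-based step, not a copy of it.
    chosen = [pool[0]]  # start from the single most uncertain item
    while len(chosen) < min(batch_size, len(pool)):
        best = max(
            (i for i in pool if i not in chosen),
            key=lambda i: min(math.dist(features[i], features[c]) for c in chosen),
        )
        chosen.append(best)
    return chosen
```

Calling `screen_then_select(features, p_pos, screen_size=100, batch_size=10)` would return ten indices to send to annotators. The two-stage design keeps the costly pairwise distance computation restricted to the screened pool rather than the whole unlabeled set.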
Original language: English
Title: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Status: Published - 18 Sep 2020

