PS3: Partition-based Skew-Specialized Sampling for Batch Mode Active Learning in Imbalanced Text Data

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review


While social media has taken a fixed place in our daily life,
its steadily growing prominence also exacerbates the problem of hostile
content and hate-speech. These destructive phenomena call for automatic
hate-speech detection, which, however, faces two major challenges,
namely i) the dynamic nature of online content causing significant
data-drift over time, and ii) a high class-skew, as hate-speech represents
a relatively small fraction of the overall online content. The first
challenge naturally calls for a batch mode active learning solution, which
updates the detection system by querying human domain-experts to
annotate meticulously selected batches of data instances. However, little
prior work exists on batch mode active learning with high class-skew,
and in particular for the problem of hate-speech detection. In this work,
we propose a novel partition-based batch mode active learning framework
to address this problem. Our framework falls into the so-called screening
approach, which pre-selects a subset of the most uncertain data items and
then selects a representative set from this uncertainty space. To tackle
the class-skew problem, we use a data-driven skew-specialized cluster
representation, with a higher potential to "cherry pick" minority classes.
In extensive experiments we demonstrate substantial improvements in
terms of G-Means and F1 measure over several baseline approaches
and multiple datasets, for highly imbalanced class ratios.
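
The generic screening pattern named in the abstract (pre-select the most uncertain items, then pick a representative batch from them) can be illustrated with a minimal sketch. This is not the PS3 method: the abstract does not specify the partitioning or the skew-specialized cluster representation, so the second step below substitutes plain greedy farthest-point selection. The function names, the uncertainty formula, and the parameters `screen_size` and `batch_size` are all assumptions made for illustration.

```python
import math


def uncertainty(p_pos):
    """Binary uncertainty score: 1.0 at p = 0.5, 0.0 at p = 0 or p = 1."""
    return 1.0 - abs(2.0 * p_pos - 1.0)


def screen_and_select(features, p_pos, screen_size, batch_size):
    """Return indices of a batch to send for annotation.

    Step 1 (screening): keep the `screen_size` most uncertain items.
    Step 2 (representation): greedy farthest-point selection inside that
    pool, used here as a stand-in for the paper's skew-specialized
    cluster representation, which the abstract does not describe.
    """
    if not features or screen_size <= 0 or batch_size <= 0:
        return []

    # Step 1: rank all items by uncertainty, most uncertain first.
    ranked = sorted(range(len(features)),
                    key=lambda i: uncertainty(p_pos[i]),
                    reverse=True)
    pool = ranked[:screen_size]

    # Step 2: seed with the most uncertain item, then repeatedly add the
    # pool item whose distance to the already chosen items is largest.
    batch = [pool[0]]
    min_dist = {i: math.dist(features[i], features[batch[0]])
                for i in pool[1:]}
    while len(batch) < batch_size and min_dist:
        nxt = max(min_dist, key=min_dist.get)
        batch.append(nxt)
        del min_dist[nxt]
        for i in min_dist:
            min_dist[i] = min(min_dist[i],
                              math.dist(features[i], features[nxt]))
    return batch
```

A call such as `screen_and_select(features, p_pos, screen_size=100, batch_size=10)` would return ten indices drawn from the hundred most uncertain items. Plain diversity-based selection like this has no built-in preference for minority-class items, which is the gap the skew-specialized representation in the abstract is meant to address.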
Original language: English
Title of host publication: The European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases
Publication status: Published - 18 Sep 2020
