PS3: Partition-Based Skew-Specialized Sampling for Batch Mode Active Learning in Imbalanced Text Data

Research output: Chapter in Book/Report/Conference proceeding; Conference contribution; Academic; peer-reviewed

1 Citation (Scopus)

Abstract

While social media has taken a fixed place in our daily lives, its steadily growing prominence also exacerbates the problem of hostile content and hate speech. These destructive phenomena call for automatic hate-speech detection, which, however, faces two major challenges: i) the dynamic nature of online content, which causes significant data drift over time, and ii) a high class skew, as hate speech represents a relatively small fraction of overall online content. The first challenge naturally calls for a batch mode active learning solution, which updates the detection system by querying human domain experts to annotate carefully selected batches of data instances. However, little prior work exists on batch mode active learning under high class skew, particularly for hate-speech detection. In this work, we propose a novel partition-based batch mode active learning framework to address this problem. Our framework follows the so-called screening approach, which pre-selects a subset of the most uncertain data items and then selects a representative set from this uncertainty space. To tackle the class-skew problem, we use a data-driven, skew-specialized cluster representation with a higher potential to "cherry pick" minority classes. In extensive experiments, we demonstrate substantial improvements in G-Means and F1 measure over several baseline approaches on multiple datasets with highly imbalanced class ratios.
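
The two-stage screening idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the PS3 method itself: the abstract does not specify the partitioning or the skew-specialized cluster representation, so the second stage below substitutes a simple greedy farthest-point selection as a stand-in for cluster-based representative selection. The function names (`entropy`, `screen_and_select`, `g_mean`), the parameters `pool_size` and `batch_size`, and the toy data are all assumptions made for illustration. The `g_mean` helper uses the common binary definition, the geometric mean of sensitivity and specificity, which is assumed rather than taken from the paper.

```python
import math


def entropy(probs):
    """Shannon entropy of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def screen_and_select(features, probs, pool_size, batch_size):
    """Generic screening sketch (hypothetical helper, not the PS3 algorithm).

    Stage 1 keeps the pool_size most uncertain items (highest entropy).
    Stage 2 greedily picks items that are far apart in feature space,
    a simple stand-in for cluster-based representative selection.
    Returns the indices of the items to send for annotation.
    """
    ranked = sorted(range(len(features)),
                    key=lambda i: entropy(probs[i]), reverse=True)
    pool = ranked[:pool_size]
    if not pool or batch_size <= 0:
        return []
    selected = [pool[0]]  # start from the most uncertain item
    while len(selected) < min(batch_size, len(pool)):
        # Pick the pool item whose nearest already-selected item is farthest.
        best = max((i for i in pool if i not in selected),
                   key=lambda i: min(math.dist(features[i], features[j])
                                     for j in selected))
        selected.append(best)
    return selected


def g_mean(y_true, y_pred, positive=1):
    """Binary G-Mean: sqrt(sensitivity * specificity) (assumed definition)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


# Toy data (made up): 2-D features and binary class probabilities.
features = [(0.0, 0.0), (0.0, 0.1), (5.0, 5.0), (5.0, 5.1), (9.0, 9.0)]
probs = [[0.5, 0.5], [0.51, 0.49], [0.5, 0.5], [0.9, 0.1], [0.99, 0.01]]
batch = screen_and_select(features, probs, pool_size=3, batch_size=2)
score = g_mean([1, 1, 0, 0, 0, 0], [1, 0, 0, 0, 0, 1])
```

Unlike plain accuracy, a G-Mean of this form drops to zero when either class is entirely misclassified, which is why it is often reported for imbalanced problems such as the one described here.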

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases. Applied Data Science and Demo Track - European Conference, ECML PKDD 2020, Proceedings
Editors: Yuxiao Dong, Dunja Mladenic, Craig Saunders
Publisher: Springer
Pages: 68-84
Number of pages: 17
ISBN (print): 9783030676698
Status: Published - 2021
Event: 2020 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2020) - Virtual, Online, Ghent, Belgium
Duration: 14 Sep 2020 to 18 Sep 2020
https://ecmlpkdd2020.net/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12461 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 2020 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2020)
Abbreviated title: ECML PKDD 2020
Country/Territory: Belgium
City: Ghent
Period: 14/09/20 to 18/09/20

Bibliographical note

Publisher Copyright:
© 2021, Springer Nature Switzerland AG.
