Beyond Bag-of-Concepts: Vectors of Locally Aggregated Concepts

Maarten Grootendorst, Joaquin Vanschoren

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, Academic, peer-reviewed

5 Citations (Scopus)

Abstract

Bag-of-Concepts, a model that counts the frequency of clustered word embeddings (i.e., concepts) in a document, has demonstrated the feasibility of leveraging clustered word embeddings to create features for document representation. However, information is lost because the word embeddings themselves are not used in the resulting feature vector. This paper presents a novel text representation method, Vectors of Locally Aggregated Concepts (VLAC). Like Bag-of-Concepts, it clusters word embeddings to generate its features. However, instead of counting the frequency of clustered word embeddings, VLAC takes each cluster’s sum of residuals with respect to its centroid and concatenates those sums to create a feature vector. The resulting feature vectors contain more discriminative information than Bag-of-Concepts because they additionally include these first-order statistics. The proposed method is tested on four different data sets for single-label classification and compared with several baselines, including TF-IDF and Bag-of-Concepts. Results indicate that combining VLAC features with TF-IDF yields significant improvements in performance, regardless of which word embeddings are used.
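The difference between the two representations described in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation: the function names (`assign_clusters`, `bag_of_concepts`, `vlac`) are hypothetical, the centroids are assumed to have been computed beforehand (for example by k-means over word embeddings), nearest-centroid assignment by Euclidean distance is an assumption, and any normalization the paper may apply is omitted.

```python
import numpy as np


def assign_clusters(embeddings, centroids):
    """Return the index of the nearest centroid for each word embedding.

    Assumption: nearest means smallest squared Euclidean distance.
    """
    diffs = embeddings[:, None, :] - centroids[None, :, :]  # (n_words, k, d)
    dists = (diffs ** 2).sum(axis=2)                         # (n_words, k)
    return dists.argmin(axis=1)


def bag_of_concepts(embeddings, centroids):
    """Count how many of the document's word embeddings fall in each cluster."""
    labels = assign_clusters(embeddings, centroids)
    return np.bincount(labels, minlength=len(centroids)).astype(float)


def vlac(embeddings, centroids):
    """Sum residuals (embedding minus centroid) per cluster and concatenate."""
    labels = assign_clusters(embeddings, centroids)
    k, d = centroids.shape
    features = np.zeros((k, d))
    for c in range(k):
        members = embeddings[labels == c]
        if len(members) > 0:
            features[c] = (members - centroids[c]).sum(axis=0)
    return features.ravel()  # length k * d


# Toy data (made up for illustration): two concepts in a 2-D embedding space
# and a document of three words.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
doc = np.array([[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]])

boc_vec = bag_of_concepts(doc, centroids)   # one count per concept
vlac_vec = vlac(doc, centroids)             # one d-dimensional residual sum per concept
print(boc_vec)
print(vlac_vec)
```

In this toy case the Bag-of-Concepts vector only records that two words fall in the first concept and one in the second, while the VLAC vector also records in which direction those words deviate from each centroid. The VLAC vector has k times d entries instead of k.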

Original language: English
Title: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2019, Proceedings
Editors: Ulf Brefeld, Elisa Fromont, Andreas Hotho, Arno Knobbe, Marloes Maathuis, Céline Robardet
Place of publication: Cham
Publisher: Springer
Pages: 681-696
Number of pages: 16
ISBN (electronic): 978-3-030-46147-8
ISBN (print): 978-3-030-46146-1
Status: Published - 2020
Event: 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019) - Würzburg, Germany
Duration: 16 Sep 2019 to 20 Sep 2019
Conference number: 19
http://ecmlpkdd2019.org/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11907 LNAI
ISSN (print): 0302-9743
ISSN (electronic): 1611-3349

Conference

Conference: 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019)
Abbreviated title: ECML PKDD 2019
Country/Territory: Germany
City: Würzburg
Period: 16/09/19 to 20/09/19
