Beyond Bag-of-Concepts: Vectors of Locally Aggregated Concepts

Maarten Grootendorst, Joaquin Vanschoren

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review

Abstract

Bag-of-Concepts, a model that counts the frequency of clustered word embeddings (i.e., concepts) in a document, has demonstrated the feasibility of leveraging clustered word embeddings to create features for document representation. However, information is lost because the word embeddings themselves are not used in the resulting feature vector. This paper presents a novel text representation method, Vectors of Locally Aggregated Concepts (VLAC). Like Bag-of-Concepts, it clusters word embeddings for its feature generation. However, instead of counting the frequency of clustered word embeddings, VLAC takes each cluster’s sum of residuals with respect to its centroid and concatenates those sums to create a feature vector. The resulting feature vectors contain more discriminative information than Bag-of-Concepts because they additionally include these first-order statistics. The proposed method is tested on four different data sets for single-label classification and compared with several baselines, including TF-IDF and Bag-of-Concepts. Results indicate that combining VLAC features with TF-IDF yields significant improvements in performance, regardless of which word embeddings are used.
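
The contrast the abstract draws between counting concepts and summing residuals can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`assign_concepts`, `bag_of_concepts`, `vlac`) are hypothetical, the centroids are assumed to be given already (for example from a prior k-means run over word embeddings), nearest-centroid assignment by Euclidean distance is an assumption, and any normalization the full method may apply is omitted.

```python
import numpy as np

def assign_concepts(embeddings, centroids):
    """Assign each word embedding to its nearest centroid (assumed: Euclidean)."""
    # Pairwise squared distances, shape (n_words, n_clusters)
    dists = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def bag_of_concepts(embeddings, centroids):
    """Count how many of the document's words fall into each cluster."""
    labels = assign_concepts(embeddings, centroids)
    return np.bincount(labels, minlength=len(centroids))

def vlac(embeddings, centroids):
    """Sum the residuals (word vector minus centroid) per cluster, then concatenate."""
    labels = assign_concepts(embeddings, centroids)
    k, d = centroids.shape
    features = np.zeros((k, d))
    for j in range(k):
        members = embeddings[labels == j]
        if len(members):
            features[j] = (members - centroids[j]).sum(axis=0)
    return features.ravel()  # length k * d

# Toy data (made up for illustration): 2 concepts in a 2-dimensional embedding space
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
doc = np.array([[1.0, 0.0], [0.0, 2.0], [11.0, 9.0]])  # three word vectors

boc = bag_of_concepts(doc, centroids)  # one count per concept
v = vlac(doc, centroids)               # one d-dimensional residual sum per concept
```

With k clusters and d-dimensional embeddings, the count vector has k entries while the residual-based vector has k * d entries, which is where the additional first-order information is carried.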

Original language: English
Title of host publication: Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2019, Proceedings
Editors: Ulf Brefeld, Elisa Fromont, Andreas Hotho, Arno Knobbe, Marloes Maathuis, Céline Robardet
Place of Publication: Cham
Publisher: Springer
Pages: 681-696
Number of pages: 16
ISBN (Electronic): 978-3-030-46147-8
ISBN (Print): 978-3-030-46146-1
Publication status: Published - 2020
Event: 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019) - Wurzburg, Germany
Duration: 16 Sep 2019 → 20 Sep 2019
Conference number: 19
http://ecmlpkdd2019.org/

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 11907 LNAI
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019)
Abbreviated title: ECML PKDD 2019
Country: Germany
City: Wurzburg
Period: 16/09/19 → 20/09/19

Keywords

  • Bag of Concepts
  • Vector of Locally Aggregated Descriptors
  • Vectors of Locally Aggregated Concepts
