Abstract
Bag-of-Concepts, a model that counts the frequency of clustered word embeddings (i.e., concepts) in a document, has demonstrated the feasibility of leveraging clustered word embeddings to create features for document representation. However, information is lost because the word embeddings themselves are not used in the resulting feature vector. This paper presents a novel text representation method, Vectors of Locally Aggregated Concepts (VLAC). Like Bag-of-Concepts, it clusters word embeddings to generate features. However, instead of counting the frequency of clustered word embeddings, VLAC takes each cluster’s sum of residuals with respect to its centroid and concatenates these sums to create a feature vector. The resulting feature vectors contain more discriminative information than those of Bag-of-Concepts because they additionally include these first-order statistics. The proposed method is tested on four different data sets for single-label classification and compared with several baselines, including TF-IDF and Bag-of-Concepts. Results indicate that combining VLAC features with TF-IDF yields significant improvements in performance, regardless of which word embeddings are used.
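The contrast between counting concepts and aggregating residuals can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `assign_concepts`, `boc_features`, and `vlac_features` are hypothetical, the centroids are assumed to be already fitted (for example with k-means), nearest-centroid assignment by Euclidean distance is an assumption, and any normalization step is omitted.

```python
import numpy as np


def assign_concepts(doc_vectors, centroids):
    """Return, for each word vector, the index of its nearest centroid.

    Euclidean distance is an assumption made for this sketch.
    """
    dists = np.linalg.norm(
        doc_vectors[:, None, :] - centroids[None, :, :], axis=2
    )
    return dists.argmin(axis=1)


def boc_features(doc_vectors, centroids):
    """Bag-of-Concepts style features: how many word vectors fall in each cluster."""
    centroids = np.asarray(centroids, dtype=float)
    doc_vectors = np.asarray(doc_vectors, dtype=float).reshape(-1, centroids.shape[1])
    nearest = assign_concepts(doc_vectors, centroids)
    return np.bincount(nearest, minlength=len(centroids))


def vlac_features(doc_vectors, centroids):
    """VLAC style features: per cluster, the sum of residuals (vector - centroid),
    concatenated over all clusters into one vector of length k * d."""
    centroids = np.asarray(centroids, dtype=float)
    k, d = centroids.shape
    doc_vectors = np.asarray(doc_vectors, dtype=float).reshape(-1, d)
    nearest = assign_concepts(doc_vectors, centroids)

    features = np.zeros((k, d))
    for i in range(k):
        members = doc_vectors[nearest == i]
        if len(members):
            features[i] = (members - centroids[i]).sum(axis=0)
    return features.ravel()


# Toy example: two 2-dimensional "concepts" and a three-word document.
toy_centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
toy_document = np.array([[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]])

print(boc_features(toy_document, toy_centroids))   # counts per cluster
print(vlac_features(toy_document, toy_centroids))  # concatenated residual sums
```

In this toy case the count vector has one entry per cluster, while the residual-based vector has one entry per cluster and embedding dimension, which is where the additional first-order information comes from.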
Original language | English |
---|---|
Title of host publication | Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2019, Proceedings |
Editors | Ulf Brefeld, Elisa Fromont, Andreas Hotho, Arno Knobbe, Marloes Maathuis, Céline Robardet |
Place of Publication | Cham |
Publisher | Springer |
Pages | 681-696 |
Number of pages | 16 |
ISBN (Electronic) | 978-3-030-46147-8 |
ISBN (Print) | 978-3-030-46146-1 |
Publication status | Published - 2020 |
Event | 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019) - Wurzburg, Germany. Duration: 16 Sept 2019 → 20 Sept 2019. Conference number: 19. http://ecmlpkdd2019.org/ |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11907 LNAI |
ISSN (Print) | 0302-9743 |
ISSN (Electronic) | 1611-3349 |
Conference
Conference | 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019) |
---|---|
Abbreviated title | ECML PKDD 2019 |
Country/Territory | Germany |
City | Wurzburg |
Period | 16/09/19 → 20/09/19 |
Keywords
- Bag of Concepts
- Vector of Locally Aggregated Descriptors
- Vectors of Locally Aggregated Concepts