Abstract
Bag-of-Concepts, a model that counts the frequency of clustered word embeddings (i.e., concepts) in a document, has demonstrated the feasibility of leveraging clustered word embeddings to create features for document representation. However, information is lost because the word embeddings themselves are not used in the resulting feature vector. This paper presents a novel text representation method, Vectors of Locally Aggregated Concepts (VLAC). Like Bag-of-Concepts, it clusters word embeddings to generate its features. However, instead of counting the frequency of clustered word embeddings, VLAC takes each cluster’s sum of residuals with respect to its centroid and concatenates those sums to create a feature vector. The resulting feature vectors contain more discriminative information than Bag-of-Concepts because they additionally include these first-order statistics. The proposed method is tested on four different data sets for single-label classification and compared with several baselines, including TF-IDF and Bag-of-Concepts. Results indicate that combining VLAC features with TF-IDF yields significant improvements in performance, regardless of which word embeddings are used.
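
The difference between counting concepts and aggregating residuals can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`assign_concepts`, `bag_of_concepts`, `vlac`) are hypothetical, the centroids are hard-coded toy values rather than the output of a clustering step over real word embeddings, nearest-centroid assignment by Euclidean distance is an assumption, and any normalization the paper may apply is omitted.

```python
import numpy as np

def assign_concepts(embeddings, centroids):
    """Index of the nearest centroid (concept) for each word embedding."""
    # Squared Euclidean distance between every embedding and every centroid.
    dists = ((embeddings[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dists.argmin(axis=1)

def bag_of_concepts(embeddings, centroids):
    """Count how many of the document's words fall into each concept."""
    labels = assign_concepts(embeddings, centroids)
    return np.bincount(labels, minlength=len(centroids))

def vlac(embeddings, centroids):
    """Sum the residuals (embedding - centroid) per concept and concatenate."""
    labels = assign_concepts(embeddings, centroids)
    features = np.zeros(centroids.shape)
    residuals = embeddings - centroids[labels]
    # Unbuffered accumulation so repeated labels all contribute to the sum.
    np.add.at(features, labels, residuals)
    return features.ravel()  # length = number of concepts * embedding size

# Toy data (assumed): two concepts in a 2-dimensional embedding space,
# and one document consisting of three word embeddings.
centroids = np.array([[0.0, 0.0], [10.0, 10.0]])
doc = np.array([[1.0, 0.0], [0.0, 2.0], [9.0, 10.0]])

boc = bag_of_concepts(doc, centroids)
vec = vlac(doc, centroids)
print(boc)
print(vec)
```

In this toy case Bag-of-Concepts records only that two words belong to the first concept and one to the second, whereas the VLAC vector also records where those words lie relative to each centroid, which is the additional first-order information the abstract refers to.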
Original language | English |
---|---|
Title | Machine Learning and Knowledge Discovery in Databases - European Conference, ECML PKDD 2019, Proceedings |
Editors | Ulf Brefeld, Elisa Fromont, Andreas Hotho, Arno Knobbe, Marloes Maathuis, Céline Robardet |
Place of publication | Cham |
Publisher | Springer |
Pages | 681-696 |
Number of pages | 16 |
ISBN (electronic) | 978-3-030-46147-8 |
ISBN (print) | 978-3-030-46146-1 |
DOIs | |
Status | Published - 2020 |
Event | 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019) - Wurzburg, Germany. Duration: 16 Sep 2019 → 20 Sep 2019. Conference number: 19. http://ecmlpkdd2019.org/ |
Publication series
Name | Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) |
---|---|
Volume | 11907 LNAI |
ISSN (print) | 0302-9743 |
ISSN (electronic) | 1611-3349 |
Conference
Conference | 2019 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD 2019) |
---|---|
Short title | ECML PKDD 2019 |
Country/Region | Germany |
City | Wurzburg |
Period | 16/09/19 → 20/09/19 |
Internet address |