Abstract
Original language | English |
---|---|
Title of host publication | Proceedings of the International Conference Recent Advances in Natural Language Processing 2019 |
Publisher | Association for Computational Linguistics (ACL) |
Number of pages | 10 |
Publication status | Accepted/In press - 2019 |
Event | International Conference Recent Advances in Natural Language Processing, Varna, Bulgaria (2 Sep 2019 → 4 Sep 2019), http://lml.bas.bg/ranlp2019/start.php |
Conference
Conference | International Conference Recent Advances in Natural Language Processing |
---|---|
Abbreviated title | RANLP |
Country | Bulgaria |
City | Varna |
Period | 2/09/19 → 4/09/19 |
Internet address | http://lml.bas.bg/ranlp2019/start.php |
Keywords
- text classification
- term extraction
- character word embeddings
- invoice classification
Term based semantic clusters for very short text classification. / Paalman, Jasper; Mullick, Shantanu; Zervanou, Kalliopi; Zhang, Yingqian. Proceedings of the International Conference Recent Advances in Natural Language Processing 2019. Association for Computational Linguistics (ACL), 2019.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Academic › peer-review
TY - GEN
T1 - Term based semantic clusters for very short text classification
AU - Paalman, Jasper
AU - Mullick, Shantanu
AU - Zervanou, Kalliopi
AU - Zhang, Yingqian
PY - 2019
Y1 - 2019
N2 - Very short texts, such as tweets and invoices, present challenges in classification. Although term occurrences are strong indicators of content, the sparsity of very short texts makes it difficult to capture important semantic relationships. A solution calls for a method that not only considers term occurrence but also handles sparseness well. In this work, we introduce such an approach, Term Based Semantic Clusters (TBSeC), which employs terms to create distinctive semantic concept clusters. These clusters are ranked using a semantic similarity function, which in turn defines a semantic feature space that can be used for text classification. Our method is evaluated on an invoice classification task. Compared to well-known content representation methods, the proposed method performs competitively.
KW - text classification
KW - term extraction
KW - character word embeddings
KW - invoice classification
M3 - Conference contribution
BT - Proceedings of the International Conference Recent Advances in Natural Language Processing 2019
PB - Association for Computational Linguistics (ACL)
ER -
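The abstract describes TBSeC only at a high level: terms are grouped into semantic concept clusters, the clusters are scored with a semantic similarity function, and those scores define a feature space for a classifier. The following is a hypothetical sketch of that general pipeline, not the authors' implementation. It assumes fastText-style character trigram embeddings (suggested by the "character word embeddings" keyword) built from hashed pseudo-random vectors, uses spherical k-means as a stand-in for the paper's clustering step, and takes a text's maximum cosine similarity to each cluster centroid as one feature per cluster. All names (`embed_term`, `kmeans`, `semantic_features`, `DIM`, `k`) and the toy vocabulary are invented for illustration.

```python
import numpy as np

DIM = 64  # hypothetical embedding size


def char_ngrams(word, n=3):
    """Character n-grams with boundary markers, e.g. 'tax' -> ['<ta', 'tax', 'ax>']."""
    padded = f"<{word}>"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def embed_term(term, dim=DIM):
    """Toy subword embedding (assumption): sum of one fixed random vector per trigram, unit-normalized."""
    vec = np.zeros(dim)
    for gram in char_ngrams(term.lower()):
        # Deterministic seed per trigram (Python's built-in hash() is salted per process).
        seed = sum(ord(c) * 31 ** i for i, c in enumerate(gram)) % (2 ** 32)
        vec += np.random.default_rng(seed).standard_normal(dim)
    norm = np.linalg.norm(vec)
    return vec / norm if norm > 0 else vec


def kmeans(X, k, iters=20, seed=0):
    """Spherical k-means on unit-norm rows of X (stand-in for the paper's clustering step)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]  # fancy indexing returns a copy
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # Assign each term to the centroid with the highest cosine similarity.
        labels = np.argmax(X @ centroids.T, axis=1)
        for j in range(k):
            members = X[labels == j]
            if len(members):
                c = members.mean(axis=0)
                norm = np.linalg.norm(c)
                if norm > 0:
                    centroids[j] = c / norm  # keep centroids on the unit sphere
    return centroids, labels


def semantic_features(text, centroids):
    """One feature per cluster: best cosine match between any term of the text and that centroid."""
    terms = text.lower().split()
    if not terms:
        return np.zeros(len(centroids))
    T = np.stack([embed_term(t) for t in terms])
    return (T @ centroids.T).max(axis=0)


# Usage on a tiny, made-up vocabulary of invoice-like terms.
vocab = ["invoice", "invoices", "payment", "payments", "taxi", "taxis", "hotel", "hotels"]
X = np.stack([embed_term(t) for t in vocab])
centroids, labels = kmeans(X, k=4)
feats = semantic_features("Hotel invoice", centroids)
print(feats.shape)  # (4,)
```

The resulting vectors (one dimension per cluster) could be passed to any standard classifier. Reducing "ranking" to a per-cluster similarity score is only one possible reading of the abstract; character trigrams are used here because they let morphological variants such as "invoice" and "invoices" share most of their vector, which is one way to soften sparsity in very short texts.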